Data Lakes – Deep Diving into Data Like Never Before

As data analytics, machine learning and AI continue to evolve rapidly, so, too, does the need to acquire, access and catalogue the large volumes of data that power that analysis. This need has given rise to the “data lake”.

The standard model for data storage has been the data warehouse. In a traditional data warehouse, the data must be carefully classified and formatted before it is loaded (schema on write). Because the data is so formally structured, the questions asked of it must be carefully defined, as well. A data warehouse is expensive, too, and affordable only to corporations large enough to absorb the cost of designing, building, housing and maintaining the data center infrastructure and its associated software.

The Data Lake Difference

The data lake is also a storage repository but with several significant differences:

  • The data lake can hold all types of data: structured, semi-structured and unstructured.
  • The data doesn’t have to be filtered or sorted before storage – that happens when the data is accessed (schema on read).
  • The costs of a data lake are vastly lower, thanks to scalable, on-demand storage on a cloud platform like Microsoft Azure, which also eliminates costly on-premises infrastructure.

Optimus Information recently asked Ryan O’Connor, Chief Technical Strategist, and Milan Mosny, Chief Data Architect, to talk more about data lakes and how Optimus is using the technology to further the business goals of our clients.


Q. How do you define a data lake?

Milan: A data lake holds data that is large in volume, velocity or variety. This is data acquired from logs, clickthrough records, social media, web interactions and other sources.


Q. So, when would a business use a data lake versus a data warehouse?

Milan: A business unit will use a data lake to answer questions that a warehouse can’t answer. These are questions that need huge amounts of data that won’t necessarily be present in a warehouse. The data lake can supply answers that increase the agility of the decision making or the agility of the business processes. Without a data lake, a business has to use an ETL (extract, transform and load) process: define the ETL, build it and then load the data into the warehouse before it can even begin to frame the questions it wants answered. The data lake eliminates the need for that whole ETL process and saves enormous amounts of time.


Q. Is there a minimum size or amount of data needed to start a data lake?

Milan: I wouldn’t worry about minimum sizes. The best way to approach creating your own data lake is to start with a variety of data and grow the lake from there. One of the strategic strengths of a lake is that it holds so many different kinds of data from multiple (and different) sources. Variety is the key and that’s where I would focus.


Q. Data lakes are typically on cloud platforms like Azure. Can a data lake be on premises?

Milan: It can be, but only really big companies can justify the cost of running the extra servers needed to store the data. Why would you even bother when Azure and other cloud platforms are so scalable and affordable? It doesn’t make much sense financially. Plus, Azure offers so many of today’s powerful data lake technologies: Spark, a lightning-fast unified analytics engine, Azure Databricks and Azure Data Lake Analytics. In fact, Microsoft has a suite of superb Azure analytics tools for data lakes. The nice thing about these tools is that they work directly against Azure storage, which is extremely affordable. So, you dump your data into storage on Azure and then spin up the analysis tools as you need them – without having to keep a cluster running just to hold the data.
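To make that concrete, here is a minimal, hypothetical sketch of the schema-on-read pattern Milan describes: raw JSON files sit cheaply in Azure Data Lake Storage, and structure is applied only when a question is asked from a Spark session (for example, in Azure Databricks). The storage path, column names and authentication setup are illustrative assumptions, not part of any specific implementation.

```python
# Minimal PySpark sketch (e.g., in an Azure Databricks notebook): read raw JSON
# straight from data lake storage and apply structure only at query time
# (schema on read). Paths and column names are illustrative placeholders.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("schema-on-read-demo").getOrCreate()

# Raw clickstream files dumped into cheap storage, with no upfront modelling
raw = spark.read.json("abfss://raw@mydatalake.dfs.core.windows.net/clickstream/2019/*.json")

# Structure is imposed only when we ask the question
daily_clicks = (
    raw.withColumn("day", F.to_date("eventTimestamp"))
       .groupBy("day", "pageUrl")
       .count()
       .orderBy("day")
)
daily_clicks.show()
```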


Q. Since a data lake can hold all sorts of data from different sources, how do you manage a data lake?

Ryan: The key is how you organize the ecosystem of ETLs, jobs and tools. You can use Azure Data Factory or Azure Data Catalog, which lets you manage the documentation around your datasets: what each one contains, how it can be used and so on. As Milan said, Microsoft has recognized the massive impact of data lakes and has already produced some tremendous tools specifically for them.


Q. How is Optimus going to introduce data lakes technology to its customers?

Ryan: Well, we are already implementing data lakes in our analytics practice. What we’re offering clients right now is a one-week Proof of Concept (PoC) for $7500 CAD in which Optimus will do the following:

  • Identify a business question needing a large dataset that cannot be answered with a client’s current BI architecture
  • Ingest the data into Azure Data Lake Storage
  • Define 1 curated zone
  • Create a curated dataset using Spark, Azure Databricks or Azure Data Lake Analytics with R, Python or U-SQL (a sketch of this step follows this list)
  • Create 1 Power BI dashboard with visuals that reflect the answer to the business question
  • Provide a Knowledge Transfer
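As a rough illustration of the curated-dataset step above, the following PySpark sketch promotes raw point-of-sale exports into a single curated zone. The storage containers, file layout and column names are hypothetical and would differ for every engagement.

```python
# Hypothetical PySpark job (Azure Databricks or similar): promote raw sales data
# into a single curated zone as a cleaned, query-ready dataset. Storage paths,
# container names and columns are assumptions for illustration only.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("curate-sales").getOrCreate()

RAW = "abfss://raw@clientdatalake.dfs.core.windows.net/pos-exports/"
CURATED = "abfss://curated@clientdatalake.dfs.core.windows.net/sales/"

raw_sales = spark.read.option("header", True).csv(RAW)

curated = (
    raw_sales
    .dropDuplicates(["transaction_id"])                    # basic cleansing
    .withColumn("amount", F.col("amount").cast("double"))  # enforce types
    .withColumn("sale_date", F.to_date("sale_date"))
    .filter(F.col("amount") > 0)
)

# Write the curated dataset in a columnar format that Power BI (or further
# Spark / Azure Data Lake Analytics jobs) can consume efficiently
curated.write.mode("overwrite").parquet(CURATED)
```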

Q. Speaking of Power BI, Optimus is a huge fan of this tool, correct?

Milan: That’s right. We love it because we can build out stuff quickly for our customers, especially when it comes to PoCs. For visualization of data, nothing beats Power BI, especially when it’s applied to data lakes. It can connect to Hadoop clusters, to large storage volumes – in fact, it can connect to just about anything, including APIs.


Q. What is the purpose of the “one-week PoC”? What will your customers get out of it?

Ryan: We are only doing one curated zone as part of our offer. A customer would have multiple business problems they want answered, of course, but this one-week PoC gives them a taste of what is possible. A large project would require a full analysis phase, architecture, build-out, test and deploy methodology. A platform would also need to be chosen to show the data.

Milan: Our customers can expect us to set up the basic structures on Azure for them, and we’ll give them examples of the business questions they can build around, so they can see how to expand the approach to other areas as well.

 

A data lake can bring enormous opportunity for powerful data analytics that can drive significant business results. How it is set up and used determines how successful a role it can play in your company’s data analysis. Optimus Information can help by showing you what a data lake can do with our one-week PoC offer. Take advantage of this offer here.


More Resources:

Think Big: How Design Plus Data Will Change Your Business

Is design thinking catching your attention? It should. Data insights not available before now can transform your business models and allow you to lead in your industry when you incorporate elements such as predictive analytics, mobile dashboards and machine learning. This wave of change is forcing data architects to re-think and re-design how programs and applications must be built. To truly innovate, design teams need to push the design-thinking envelope on almost every project.

“You can have data without information, but you cannot have information without data.”
– Daniel Keys Moran, computer programmer and science fiction writer.

Since the invention of the first computer, the world has been on a digital light-speed journey – one that has seen massive change in how we interact with our world and with each other. Today, there are more than 2.5 billion[i] smartphones carried in people’s pockets – each more powerful than the computers used to run the spacecraft that landed the first men on the Moon.[ii] In particular, how we interact with and gain insight from data has gone through an incredible transformation. We have evolved from the days of simple historical reporting to a world where insight arrives in real time.

The Way It Was

Reporting has always been a critical element for a business to thrive and we have been accustomed to seeing our reports – our data – in fairly standard and historic terms. Let’s take a straightforward quarterly sales report at a consumer retail company, for example. Simple data, like units sold, prices received, cost of goods, volume of shipments and so forth, would be gathered and stored over a three-month period and then used to generate a few charts and graphs. Conclusions would be drawn from this static data and the company would shift strategy based on the conclusions.

Perhaps the conclusions were accurate and maybe they weren’t. Regardless, that’s how it’s been done for a long time: based on the data available.

The Way It Is

Today, the capability exists to break data down in far greater detail, to do it in real time and, through disciplines like machine learning and artificial intelligence, to draw highly focused and accurate conclusions – not at the end of a business quarter but at the end of each day and, in many cases, as events happen.

IoT Changes Shipping Industry – Reduces Risk and Cost

A client that operates a fleet of tankers equipped with IoT sensors wanted to move beyond its basic data reports and drill deeper into the technical data gathered aboard its vessels. Optimus utilized elements from Microsoft’s IoT Suite, including Azure Data Factory, to create visually appealing reports and dashboards that contained information gathered from thousands of sensors throughout the fleet.

The results meant a far more in-depth data analysis than the company had been getting, delivering more accurate insight for more accurate business decisions. When it comes to tankers, a simple mistake can cost millions in terms of lost time, environmental disasters, financial penalties, missed deadlines and more.

Optimus solved the client’s existing problem while building a platform for continuous improvement with data analysis using Microsoft Azure tools. Because the data can be aggregated in the cloud, the client can analyze greater amounts of data over an extended period of time, thus further enhancing their shipboard operational analysis and implementing global cost saving efforts as a result.

Now, a business can make highly informed decisions immediately and adjust accordingly. Of course, it’s not simply analyzing a few traditional data points, like sales; it’s analyzing where those sales took place, in which store locations, even in which aisles or departments, at what time of day, from which shelf the customer chose a purchase and what the customer’s likely income level is. In other words, the more highly specialized the data, the more highly specialized and precise the conclusions that can be drawn.

Because it’s possible to generate highly detailed data and analyze it from so many different perspectives, every sector of the economy is making use of data analysis.

In the manufacturing sector, factory operations are being revolutionized[iii] by both big data and analytics. Sensors generate endless streams of data on the health of production line equipment, data that’s being examined by the minute for the slightest indication of a potential problem or defect. Conclusions are drawn and actions implemented immediately to avoid any breakdown and disruption in the production process. There’s a positive ripple effect to this: customers don’t experience delays and the company doesn’t experience a loss of revenue.

The virtually unlimited storage capacity of the cloud, coupled with highly sophisticated algorithms that can perform serious analysis in literally seconds, is placing tremendous demands on data architects. Programs and applications must be agile enough to allow for updates, added features and improvements without delay. This has meant developing new architecture that can not only run a program at lightning speed but can also be altered or updated in the areas where it needs improvement, much like making incremental improvements to a car model without re-designing the whole car every time.

Gone are the days of a monolithic software structure where data warehouses needed a year or more to be designed and several more months for data to be inputted. If missing data was discovered, it would mean an entire rebuilding of the program.

Microservices and Teams

Today, Optimus Information designs architecture so that updates, changes or improvements can be made to one area of a program or application without having to open up the whole program. By using microservices in our software development, Optimus has created functional teams whose responsibility is to just one area of a program. A team focuses only on its specific area and generates improvements without impacting other teams or resulting in an overhaul of an entire software product. Tremendous amounts of time are saved for our clients and the cost of updates or re-designs is driven down dramatically.

Optimus applies the same method to data gathering. By means of advanced tooling, our clients can store raw data, without pre-aggregating it, run a query on that raw data and have the answers they need in a matter of seconds. Previously, it would take weeks to get a result because the data would have to be assessed and compartmentalized as it was gathered and placed into structured environments before a query could be run. This is what we call modern data warehousing. The focus is on agility and speed.
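As an illustrative example of that approach (not a description of any specific client system), raw, un-aggregated files can be registered directly with Spark SQL and queried on demand; the storage path and field names below are hypothetical.

```python
# Illustrative only: querying raw, un-aggregated event files directly with
# Spark SQL, rather than pre-aggregating them into fixed warehouse tables.
# The storage path and field names are hypothetical.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("query-raw-data").getOrCreate()

spark.read.parquet("abfss://raw@lake.dfs.core.windows.net/telemetry/") \
     .createOrReplaceTempView("telemetry_raw")

# The question is defined at query time, not at load time
answer = spark.sql("""
    SELECT device_id,
           date_trunc('day', event_time) AS day,
           avg(engine_temp)              AS avg_engine_temp
    FROM telemetry_raw
    WHERE event_time >= '2019-01-01'
    GROUP BY device_id, date_trunc('day', event_time)
""")
answer.show()
```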

Down the Road from Microsoft by Design

Optimus specializes in working with IT departments of companies that don’t or can’t spend the time and money to develop the cloud-based software architecture needed today. Optimus uses a suite of leading-edge services on the Microsoft Azure platform that allows us to select exactly the right components to solve a client’s problem. We are also physically located close to Microsoft’s Vancouver and Redmond development centres.

Optimus is a Microsoft Gold Partner and, in that role, we work very closely with Microsoft on new product previews and trials that are in development, giving feedback that improves our customers’ end product. Optimus employees have often kicked the tires on new Azure features before they are released. This keeps us at the forefront of rapidly changing technology and lets us give feedback as enhancements are designed.

If you want to enhance and sharpen the results of your data analysis, we invite you to contact us. We are happy to explore some “what-if” scenarios with you to help propel your data insights – and your business – forward exponentially. Reach out and schedule a virtual coffee anytime.

Game Changers: The Role of Big Data in the Future of Credit Unions

In 2002, Billy Beane was the general manager of the Oakland Athletics in Major League Baseball. Oakland was a small-market club with a similarly sized budget, and it struggled to be competitive.

Because Oakland didn’t have the money of big market teams like the New York Yankees or Los Angeles Dodgers, Beane knew he couldn’t hope to attract the high-priced talent – the superstars – to play in Oakland.

Enter Paul DePodesta, aged 27, an economics graduate from Harvard with an analytical mind and a love of baseball. His arrival on the doorstep of the Oakland A’s gave birth to data analysis in professional sports.

He analyzed player stats, using computer algorithms, and his results allowed Oakland to sign inexpensive players that other teams dismissed. The A’s were propelled into the stratosphere of success, thanks to big data.

The A’s finished the 2002 season with 103 wins, the same number as the New York Yankees – but with a budget about a tenth the size.

This is the “secret sauce” in data analytics: the ability to take substantial amounts of information – in the case of Oakland, endless baseball player statistics – look for patterns and capitalize on what is found.

Credit Unions, Machine Learning and Data Analytics

Credit unions in Canada are rapidly embarking on the same exploration. Using machine learning and data analytics, these financial firms are finding ways to improve service to their clients while, at the same time, discovering nuggets of information in the vast amounts of data they collect that can then be turned into business opportunities.

Virtually every customer transaction within a credit union is electronic, and the amounts of data being collected are staggering. The need to analyze this information is what drives credit unions today to embrace machine learning and data analytics.

Matthew Maguire is the Chief Data Officer at Co-Op Financial Services, a California-based company that operates an interlinked network of ATMs throughout the U.S. and Canada. He argues that machine learning and data analysis are critical for mid-sized credit unions as they work to reinforce current customer relationships and build new ones.

“Data is coming in from different places and the challenge is… how do you make it all connect?[i]” he said.

Credit unions are moving quickly into data analysis. Through machine learning, which unearths customer transaction patterns by using algorithms, credit unions are learning a great deal about their customers and are designing strategies to capitalize on that in order to drive sales.

But, for credit unions, data enables other capabilities. Patterns of fraud can be easier to spot and shut down through data analysis.

When a client invests with a credit union, regulations require the client to complete what’s called a Know Your Client form, which essentially draws a profile of risk tolerance and investment objectives. If the client’s portfolio strays from that profile and becomes riskier, big data can alert the financial institution and the problem can be corrected before any monetary loss accrues to the client – or to hundreds of thousands of clients.

Chris Catliff is the president and CEO of Blueshore Financial, a B.C.-based credit union with more than $3 billion in assets. His vision of the future of credit unions is predicated on the power of data analytics in combination with machine learning.

He envisions the day very soon when a client approaching a branch receives a text message saying the client is already checked in at the branch. As they walk through the door, their customer profile and picture pop up on a screen [ii] at a concierge desk and they’re greeted by name.

Blueshore’s ATMs will respond to a customer’s biometrics and offer a transaction based on a pattern of previous transactions. Up-sell opportunities will present themselves, so staff can suggest options – situations that might never occur without data analysis.

Service, he said, “has to be electronic transactions with the introduction of superior, human touch at various critical points. It’s high tech and high touch.”

Explore Your Data Potential

Like the members they serve, every credit union is unique. It is imperative for a credit union to work with data specialists who can marry the individual needs of each credit union with high levels of expertise across big data, data analysis and machine learning.

One of our strengths here at Optimus is our track record in the areas of data gathering, analysis, machine learning, dashboarding and data visualization, through which we help our clients tailor data mining and analysis to their business goals.

At the end of the day, it’s all about staying competitive and, like the Oakland Athletics, reaching the pinnacle of success by embracing and employing new strategies to achieve that success.

 

[i] https://www.pymnts.com/big-data/2018/credit-unions-big-data-authentication-aml-kyc/
[ii] http://enterprise-magazine.com/features/betting-big-on-big-data/

 

4 Ways Azure is Rising to Meet Data Warehouse Demands

In today’s data-first world, IT infrastructure is the foundation for strategic decision-making, with companies requiring larger quantities of data in shorter periods of time. This is putting the traditional data model – where data from systems like CRM, ERP and LOB applications is extracted, transformed and loaded (ETL) into the data warehouse – under pressure. The problem is compounded by increased data volumes from social apps, connected devices (IoT) and other emerging data sources.

The need to gather data from traditional, transactional systems, like ERP, CRM and LOB, and then integrate this data with social, mobile and connected devices has driven the adoption of big data storage technologies such as Hadoop. At Optimus, we’re finding more and more users demand predictive, real-time analytics to make use of their data, something that can’t be done with traditional data warehouse tools. Consequently, organizations are considering cloud-based solutions such as Azure to transform their data warehouse infrastructure.

Microsoft knows this and is growing its solution portfolio accordingly. Below are four ways in which Microsoft Azure is adapting to meet the demands of today’s modern data warehouse.

1. Consistently High Performance for All Volumes of Data

Microsoft is working to solve the problem of achieving high performance on large datasets through MPP technologies, in-memory columnstore and optimizations to the core query engine. In particular, Optimus is seeing SQL Server emerge as a leader in performance and scalability. SQL Server can exploit servers with large numbers of cores and complex vector instructions, terabytes of memory and local flash storage that provides high I/O bandwidth. When workloads are optimized for this inherent parallelism and concurrency, it is not uncommon for a single server to outperform large distributed databases.

In one example, Microsoft and Intel teamed up to create a 100 terabyte data warehouse using a single server, four Xeon E7 processors and SQL Server 2016. According to the report, “The system was able to load a complex schema derived from TPC-H at 1.6TB/hour, and it took just 5.3 seconds to run a complex query (the minimum cost supplier query) on the entire 100TB database.”
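As a hedged illustration of the columnstore capability described above, the sketch below creates a clustered columnstore index and runs a typical warehouse-style aggregation. The server, database and table names are placeholders; the T-SQL itself is standard SQL Server syntax, issued here through Python’s pyodbc.

```python
# Sketch only: add a clustered columnstore index to a large fact table to get
# the compressed, scan-friendly storage that analytical queries benefit from.
# Connection string and table name are placeholders.
import pyodbc

conn = pyodbc.connect(
    "DRIVER={ODBC Driver 17 for SQL Server};"
    "SERVER=mywarehouse.example.com;DATABASE=SalesDW;Trusted_Connection=yes;"
)
cursor = conn.cursor()

# Columnstore storage compresses the fact table and speeds up analytical scans
cursor.execute("CREATE CLUSTERED COLUMNSTORE INDEX cci_FactSales ON dbo.FactSales;")
conn.commit()

# Typical warehouse-style aggregation that benefits from the columnstore
cursor.execute("""
    SELECT OrderDateKey, SUM(SalesAmount) AS TotalSales
    FROM dbo.FactSales
    GROUP BY OrderDateKey
    ORDER BY OrderDateKey;
""")
for row in cursor.fetchall():
    print(row.OrderDateKey, row.TotalSales)
```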

2. Storing Integrated Data

Companies are looking for ways to store integrated – both relational and non-relational – data of any size, type and speed without forcing changes to applications as data scales.

Enter the Azure Data Lake Store. Data Lake makes it simple for everyone, from analysts to developers and data scientists, to access, add and modify data, regardless of its state.

Facilitating all of this is Azure HDInsight, a cloud-based Hadoop and Spark cluster service. HDInsight lets your team create analytic clusters and turn data into actionable insights. In addition to this fully managed Hadoop service, Microsoft provides PolyBase, which lets you query relational and non-relational data in Hadoop with a single, T-SQL-based query model.
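For illustration only, a PolyBase-style query might join an ordinary SQL Server table with an external table that has already been mapped over Hadoop files; all names and connection details below are assumptions.

```python
# Hypothetical example of the single T-SQL query model PolyBase enables:
# joining a relational dimension table with an external table that PolyBase
# has already mapped over files in Hadoop/HDInsight. Names are illustrative.
import pyodbc

conn = pyodbc.connect(
    "DRIVER={ODBC Driver 17 for SQL Server};"
    "SERVER=sqlserver.example.com;DATABASE=SalesDW;Trusted_Connection=yes;"
)
cursor = conn.cursor()

# dbo.DimCustomer is an ordinary SQL Server table;
# ext.WebClickstream is assumed to be an EXTERNAL TABLE backed by Hadoop files.
cursor.execute("""
    SELECT c.Region, COUNT(*) AS Clicks
    FROM ext.WebClickstream AS w
    JOIN dbo.DimCustomer AS c
        ON c.CustomerKey = w.CustomerKey
    GROUP BY c.Region;
""")
for region, clicks in cursor.fetchall():
    print(region, clicks)
```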

3. Built with Hybrid Data Storage at the Core

While the cloud continues to gain popularity, companies are realizing that they still need to keep at least some information on-premises. Microsoft is acutely aware of this and has built Azure accordingly. Their data warehousing and big data tools are designed to span on-premises and cloud warehouses. Microsoft’s hybrid deployment is designed to provide the control and performance of on-premises with the scalability and redundancy of the cloud. Optimus is seeing users access and integrate data seamlessly, while leveraging advanced analytics capabilities, all through Azure.

4. Machine Learning and Big Data in Real-Time

Traditional advanced analytics applications use outdated methods of transferring data from the warehouse into the application tier to procure intelligence, resulting in unacceptably high latency and little scalability.

In contrast, Microsoft has transformed integrated analytics with machine learning in the cloud. The Cortana Intelligence Suite, coupled with R Server, can be deployed both on-premises with SQL Server and in the cloud with HDInsight. The resultant solution is one that solves for hybrid, scales seamlessly and enables real-time analytics.

There are many factors driving companies to consider an Azure Cloud data warehouse migration. To learn more, check out our e-Book, Building a Modern Data Warehouse on Azure.

Does Your Data Warehouse Belong in the Azure Cloud? Here are Some Things to Consider

It’s no secret: Microsoft Azure is hot right now. This is demonstrated by their 97% growth in Q2 2017. With more organizations migrating their data infrastructure to the cloud every day, some companies are asking themselves: does my data warehouse belong in Azure? While there’s no simple answer to this question, there are some ways in which you can begin to assess your current data infrastructure’s suitability for an Azure Cloud migration.

The Cost Factor

The team at Optimus has found cost to be one of the top drivers, if not the top driver, for cloud adoption. There are several factors businesses should consider where cost in the cloud is concerned:

  • If your business is cyclical (i.e. retail with high volume throughout the holiday season), the cloud pay-as-you-go model makes strong financial sense. Cyclical companies can burst to the cloud when they need to, saving them from buying new servers that may only be required a few weeks per year. Conversely, it may not be cost effective to move workloads that are required to run at a stable level 24/7/365 to the cloud, especially if they are running on equipment that does not need upgrading in the foreseeable future.
  • At Optimus, we have found that many organizations prefer opex over capex. Opex tends to be easier to manage over the long term, especially for fast-growing businesses where a significant capex could stall growth. The more a business transitions to the Azure pay-as-you-go model, the more they shift their data warehouse costs from a capex to an opex.
  • The apportioning of data costs across departments is significantly simplified in Azure. Pricing for individual workloads is made transparent, and data usage is easily tracked.

When considering leveraging Azure for your data warehouse, it is important to remember that a cloud migration is not an all-or-nothing endeavour. Every business will have certain workloads that make financial sense in the cloud and certain workloads that should remain on-premises. Perform an accurate assessment of your current data infrastructure to determine your cloud suitability.

What are Your Data Governance Requirements?

Meeting data governance and regulatory requirements is at the forefront of the mind of anyone considering an Azure migration, and for good reason. Moving a legacy on-premises data infrastructure to the cloud means navigating a difficult regulatory landscape.

Your industry may determine your suitability for an Azure Cloud data warehouse migration. Certain sectors, such as financial and healthcare, have strict data governance laws to comply with. You need to make sure your – and your client’s – data remains within certain jurisdictions, something that may prove challenging and will influence your choice of what data to move to Azure.

Do you need to retain control over user authentication? If yes, you’ll need to look at the feasibility of this with various applications. Your service provider will be able to assess this with you and make the right recommendations.

Latency: Still a Consideration?

The short answer is yes. In particular instances where the speed of data transaction is mission-critical, an internal data warehouse may be required. This is common in the financial industry, where trading companies are under increasing pressure to host their servers physically close to a stock exchange’s computers. In an industry where transactions are conducted in microseconds, speed is priority number one.

While Azure has made significant improvements to latency times, the fact remains that the closer two computers are to each other, the faster they can communicate. At Optimus, we have seen companies with these types of operational concerns benefit from leaving some of their data on-premises. However, because the amount of data required to perform at a high level is typically small, leveraging the public cloud is still a viable option for most organizations.

There are many factors to keep in mind when considering a data warehouse migration to Azure. To learn more, check out our e-Book, Building a Modern Data Warehouse on Azure.

Infographic – The Modern Data Warehouse Framework

Check out our latest infographic, The Modern Data Warehouse Framework!

As organizations collect and process increasing amounts of data from a growing number of data sources, data systems must evolve to keep up. In order to make the best data-driven decisions possible, you need to reimagine the way you look at data warehousing.

We took a look at how to transition your data warehouse to the cloud and put together our top 8 recommendations for building a modern data warehouse on Azure.

 

Download the PDF here

 


Power BI for Mobile: Take Your Data on the Road

One area where the Power BI software stack really shines is the mobile space. The Power BI product line includes three apps: one each for iOS, Windows Phone and Android. These apps allow you to take anything you can generate in Power BI and make it readily available to any stakeholder with a mobile phone or tablet. With a couple of swipes, users can quickly interact with all your analysis. Power BI allows you to bring together the advantages of mobile devices, big data systems and compelling visualizations in a way that permits everyone involved to make better decisions.

The Power of the Dashboard

It’s one thing to produce an informative chart, but it’s quite another to deploy a fully interactive dashboard that can fetch real-time updates. Power BI permits you to tie together data from a variety of sources, including numerous non-Microsoft products. For the end user, the guy in marketing who just needs to see today’s report, the component that makes it all accessible is the dashboard.

Power BI dashboards allow you to publish any type of common infographic, geospatial information or visualization. If you need a bubble chart that displays the YTD performance of your company’s retail outlets, there’s an out-of-the-box solution for that with Power BI. It also allows you to create maps and overlay existing information onto those maps. Instead of just seeing that Store #325 is performing well, an app user can pull up the dashboard and see on the map whether that’s a one-off phenomenon or a regional trend.

Making Information Accessible

In the world of data analytics, a lot of work goes into empowering decision makers who may not have a strong technical background. It’s extremely beneficial to give those people an app that allows them to quickly sort through the available data in a clear format. If your buyers can quickly bounce between multiple years’ worth of data and make comparisons, they can make important decisions faster.

Power BI also allows you to determine how the dashboard accesses the available information. Rather than simply presenting users with a static set of reports, you can configure queries that allow them to sift through the data in a self-guided fashion. If someone needs access to a real-time inventory report, your dashboard can be configured to fetch that information from the company’s SQL Server installations. This allows members of your organization who might not be data scientists to rapidly develop insights that can guide their choices.

Cross-Platform Compatibility

Keeping everyone in your business on the same page can be a challenge. Microsoft has gone to great lengths to ensure that the Power BI apps display information faithfully on every platform and function in a similar fashion. The hypothetical users in our example will have no trouble grabbing an art department iPhone and finding everything they need.

Data Sources

Any data source that can be accessed inside Office or Power BI can be presented within the app’s dashboard. If you need to present data from an Excel sheet in an appealing manner to someone on the other side of the planet, the app can make that happen. It also allows you to connect to commonly used data sources, such as SQL Server Reports, and outside sources, such as Google Analytics, Salesforce or MailChimp. You can even mix and match functionality, for example, overlaying Salesforce data on Google Maps.

Conclusion

Business intelligence is about putting the right information in the right hands and in a format that makes a visually compelling case. Your company will likely invest a lot of effort in the coming years into producing analysis and generating insights. With Power BI’s mobile app, you can ensure that the people who need those insights have access to them with the touch of a finger. The app allows you to pass along analysis to stakeholders in a secure environment that makes interacting with the data easy. In short, it makes all your data analytics faster, more appealing and more accessible.

If you have questions about getting started with Power BI or want to push the toolset further, give us a call. We’re always happy to answer any questions.

 

 

Building an On-Site Data Warehouse Using the Microsoft Stack

Most IT specialists are aware of individual components within SQL Server and may use one or two of them regularly for data analytics or general operations. Outside of IT, others may be familiar with one of several analytics-oriented SQL Server add-ons such as those for Excel that are targeted at data analysis or data mining in general.

It is likely that far fewer staff, if any, have combined SQL Server components into a complete data analytics stack that lets the organization build a high-performance analytics solution from on-premises and external data sources and deliver meaningful insights across the enterprise.

Fortunately, Microsoft provides all the tools and services necessary for a traditional, on-site data analytics system. This article outlines these components and their general usage. Of necessity, it cannot cover all the intricacies that may arise out of business-specific requirements.

Normalized vs. De-Normalized Databases

Relational databases use a normalized data organization optimized for rapid insert and update operations while minimizing data reads, which makes them ideal for OLTP processing. These types of databases are not suitable for data warehouses, which use an OLAP design that de-normalizes the data. This organization is optimized for the high-volume read performance required for analytical processing.
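A toy example of why that distinction matters (made-up data, not tied to any specific product above): normalized OLTP tables must be joined at read time, whereas an OLAP-style, de-normalized table is pre-joined so analytical reads scan a single wide table.

```python
# Toy illustration: normalized tables joined at read time versus a
# de-normalized, OLAP-style fact table built once for read-heavy analysis.
import pandas as pd

# Normalized (OLTP-style): facts reference dimension keys
orders = pd.DataFrame({
    "order_id": [1, 2, 3],
    "customer_id": [10, 11, 10],
    "product_id": [100, 101, 100],
    "qty": [2, 1, 5],
})
customers = pd.DataFrame({"customer_id": [10, 11], "region": ["West", "East"]})
products = pd.DataFrame({"product_id": [100, 101], "category": ["Widget", "Gadget"]})

# De-normalized (OLAP-style): one wide table optimized for analytical reads
fact_sales = (
    orders.merge(customers, on="customer_id")
          .merge(products, on="product_id")
)

# The analytical query now needs no joins at all
print(fact_sales.groupby(["region", "category"])["qty"].sum())
```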

Other Reasons to Use a Separate Data Analytics Database

In addition to optimizing read performance, creating a separate OLAP database versus utilizing the OLTP corporate database has several advantages:

  • Analytical queries that summarize large quantities of data and pull related data items from disparate sources will not bog down corporate database servers
  • Historical data can be cached rather than archived, which is typical on OLTP databases
  • You are free to consolidate data from several sources including off-premise databases, social media sites or a variety of document formats such as XML or Excel.
  • Cleansing and optimizing your data for analysis will be automated using your business-specific rules, queries and schedule.

The Microsoft Data Analytics Stack

SQL Server 2016 contains the entire suite of components necessary to build your first data warehouse. The three critical components are Integration Services, Analysis Services and Reporting Services.

SQL Server Integration Services (SSIS)

SSIS is SQL Server’s ETL component. It connects to your data sources, migrates data, restructures it according to business-specific rules and populates the database. It manages acquiring data from multiple sources, cleaning that data and performing type conversions and any other transformations required to streamline analysis. SQL Server Agent permits automatic scheduling of these processes so the database always contains the most relevant and timely data.

Use the particular business problems you are solving to guide ETL preparation and specification. A user-centric approach pares down and organizes the data being collected for the most efficient extraction and presentation.

SQL Server Analysis Services (SSAS)

SSAS is critical to creating the most useful and highest-performing database queries. This component transforms normalized or relational data into multidimensional database “cubes” that are tuned to deliver rapid results to ad hoc queries. It does so through data indexing, data aggregation and enabling complex data calculations to be embedded at the cube level where processing is most efficient.

SQL Server Reporting Services (SSRS)

SSRS is used to query the SQL Server database as well. It offers everything required to build detailed, complex reports with visualizations, plus facilities to manage and distribute reports securely. Users may view reports via the SSRS Web Portal, Power BI or a SharePoint server. End users are able to export reports directly to Excel or PowerPoint.

SQL Server Data Tools

This package works within Visual Studio (VS) and provides development tools for all the integration, analysis and reporting steps outlined previously. It is free and downloads a VS shell if you do not already have VS installed. SSDT lets you design and build your OLAP database and SQL Server content in much the same manner as you would for a VS application. It replaces the previous Business Intelligence Development Studio.

Conclusion

Microsoft’s SQL Server plus SSDT support development of a complete data analytics system that includes an efficient OLAP database, an optimized data schema and a full pipeline of data acquisition, pre-processing, analysis, delivery and visualization. Many organizations without a current data analytics solution may not have expertise in all the SQL Server components required, and those that do have typically only applied their skills toward OLTP-type solutions.

Thus, many businesses employ an experienced partner for architecting a data analytics system that is scalable, extendable and highly secure. The most capable consultants are able to train and guide in-house staff through the entire development and maintenance lifecycle such that the enterprise becomes self-sufficient at further expanding and optimizing the system.

If you need help or advice on your data analytics system, let us know. Our team of data experts has worked with businesses large and small across all industries and has the right specialized technical skills to guide you towards a solution.

 

 

 

Data Analytics in the Cloud: Where to Start?

An enterprise-wide data analytics system pulling data from multiple sources, correlating and presenting results in relevant, insightful visualizations to enable prompt, informed decisions is the dream of many organizations. Those in the lead already reap the benefits of faster, high-accuracy, proactive decisions that data analytics provides.

Getting to that point requires exquisite planning and execution by stakeholders from top to bottom and across departments in order to achieve an implementation that is useful, maintainable and flexible enough to handle ever-improving data analytics technology and an increasingly competitive business environment.

Don’t Go Too Big Too Fast

Data analytics systems are most valuable when shared across the organization. Therefore, cross-departmental input and commitments are vital as well as a high degree of project transparency. Collecting requirements and then creating an all-at-once solution months or quarters later is courting disaster. That sometimes works for limited software development projects, but data analytics initiatives necessarily demand a much larger scope in both time and space.

Adopt an incremental mindset from the start by applying a less risky phased and flexible tack to your data initiative’s development. The project should produce regular and continuous adjustments to functionality, features and metrics without unnecessary thrash. This paradigm is most likely to produce a quality product with high acceptance across the business.

Gain Executive Stakeholder Buy-In

With the correct attitude regarding initiative progression, gain C-Suite buy-in via detailed assessment and quantification of business needs to which data analytics capabilities can add measurable value. These come from conversations with department heads, people managers, project managers and line workers involved in operational activities that probably benefit from insights provided through readily accessible business analytics.

Collect Technical Requirements

After executive endorsement, pinpoint technical needs, features and functions required to create a system meeting the project’s strategic goals. These requirements are derived from various operational aspects:

  • Existing data analytics-related processes, e.g. CRM, and their supporting software and infrastructure
  • Identifying existing data sources and creating a baseline of the what, when and how of data storage and processing
  • Where applicable, data sharing patterns, especially where data transformation steps are required
  • The current collection of in-house, open source or cloud-based tools and services utilized

Turn Technical Requirements into KPIs

Concurrently with technical requirements acquisition, work closely with stakeholders to develop meaningful metrics and KPIs. Examples include metrics around analytics data acquisition, storage and cleaning. Marketing metrics might measure campaign response. High-interest sales metrics centre on conversions and ROI, whereas support metrics include customer satisfaction and retention. Be open to innovative metrics that a new data analytics system could enable.

Ask for Resource Commitments

While collaborating on KPIs, initiate frank discussions about the people and resources that stakeholder departments or teams are willing to commit to the project. The benefits of such commitments should already have been incorporated into specific KPIs that benefit those teams.

Choosing a Software Model

Inconsistent use of software models – open source, in-house or cloud-based SaaS – is common in companies and often results from an organic acquisition of tools and projects over time. Your data analytics implementation will not correct old problems, but its choice of software model should be based on technology availability, costs and the ability to scale and adapt as your company expands.

For instance, even with a strong in-house IT development capability, the benefits of basing your data analytics implementation on cloud-based SaaS are compelling.

First of all, removing the constraint of higher capital needs and their approval alone makes a strong argument for choosing pay-as-you-go cloud SaaS. Furthermore, this complements your phased approach, as infrastructure and services are easily scaled and maintenance is automatic. Finally, today’s cloud SaaS from the best providers is fully customizable, which enables rapid functionality development and ease of modification during ongoing development.

Additional Tips

  • Expect dirty data, especially from social media, and deal with it at the source where possible. Employ tools such as import.io, diffbot and ScraperWiki in this battle. Especially during testing, consider importing customized, on-demand data sets. 
  • Be sure data analytics dashboards and reports are highly customizable but easy to learn. This area is the face of your initiative for the majority of users. Also, ensure dashboard functionality works for your mobile users.
  • Build in extensibility. This means anticipating new data sources and leaving room for the latest in predictive analysis technology.
  • If you are using a phased, results-oriented approach, you will have plenty of opportunities to celebrate small victories. Relish and share these milestones.

Conclusion

Data analytics have a proven track record of providing enterprises with streamlined and on-target decision-making improvements based on “right-time” data flows from inside and outside the company. Implementing the best system for your company requires intense and thorough planning followed by step-wise development and deployment.

Realize that, even as your project begins achieving its end goals, ongoing business needs and changing markets will call for continued growth of your data analytics capability. If you already chose cloud-based SaaS as your software core, then the next growth and adjustment phase will be much easier than the first, especially if you stick to your iterative development paradigm.

If you have questions about how to get started working in the cloud, let us know. We’re happy to share our knowledge and set you on the right path.

 

 

Automated Testing for SSRS Reports

Motivations for SSRS Report Testing

Both data warehouse developers and end users of data analytics reports have a keen interest in the accuracy and appearance of data content. As SSRS reports are being developed, they are typically tested piecemeal during construction, and as a whole when completed, for both of these aspects.

However, it is always possible to overlook certain report items or their representation after report deployment. Furthermore, issues with changes in data sources or the ETL flow may introduce anomalies or errors that affect data values or their presentation at a later date.

SSRS Report Testing Automation Issues

To increase report testing efficiency and accuracy, automated testing is key during both development and maintenance of SSRS reports, especially for organizations utilizing dozens or hundreds of reports on a regular basis. Automation can decrease development time and play an important role in correcting discrepancies in report data post-deployment, which could otherwise negatively impact confidence in the data warehouse.

The complexity of interactions within SSRS itself and with the many other components of SQL Server, however, makes the creation of fully automated testing a tricky business. Furthermore, the high degree of customization possible for SSRS reports implies that customized testing approaches are probably required, in addition to standardization of report content and layout wherever feasible.

SSRS Testing Areas for Automation

Unit Testing

There are a large number of characteristics that can be tested during development via unit tests. Unit tests can migrate to test suites for troubleshooting post-deployment bugs quickly. These are a few example coverage areas for such automated tests:

  • If data is not reaching a report, validate the data source, check that the correct mappings to the dataset are being used and that each column’s Visibility property is set correctly.
  • If data is not present and the report uses stored procedures, validate the SP parameters, their data types, conversions and formatting requirements.
  • The failure of data to group as expected can be tested by examining the grouping expressions and verifying that the Aggregate function is applied to numeric columns.
  • Create a query test framework that can take individual queries or query scripts as input. Such a framework would contain validated comparison datasets, including metadata, to run against actual queries and datasets with each report’s specific data source, parameters or configuration (a minimal sketch of this idea follows this list).
  • Test report rendering into formats used by the organization. Initial tests require manual inspection of results, but subsequent tests could be automated using input reports under version control. The most useful of these tests is to output the report in XML format, which is likely to be complete, free of data conversions and most amenable to automated comparison tests.
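One possible shape for such a framework, sketched under assumptions (the connection string, query text and baseline file are placeholders, not an existing setup), uses Python’s unittest with pandas for the dataset comparison:

```python
# One possible shape for the query test framework described above: run the
# dataset query behind a report and compare the result to a validated baseline.
# Connection details, query text and baseline path are all assumptions.
import unittest
import pandas as pd
import pyodbc

CONN_STR = (
    "DRIVER={ODBC Driver 17 for SQL Server};"
    "SERVER=reports-db.example.com;DATABASE=SalesDW;Trusted_Connection=yes;"
)

class SalesByRegionDatasetTest(unittest.TestCase):
    def test_matches_validated_baseline(self):
        with pyodbc.connect(CONN_STR) as conn:
            actual = pd.read_sql(
                "SELECT Region, SUM(SalesAmount) AS TotalSales "
                "FROM dbo.FactSales GROUP BY Region ORDER BY Region;",
                conn,
            )
        expected = pd.read_csv("baselines/sales_by_region.csv")
        # Column names and values must match the validated comparison dataset
        pd.testing.assert_frame_equal(actual, expected, check_dtype=False)

if __name__ == "__main__":
    unittest.main()
```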

A possible starting point for building a permanent RS unit testing framework can be found here: http://rsunit.codeplex.com/

Layout Testing

Automating report layout tests typically presents the greatest difficulties. If your organization uses the SSRS web portal, however, you can take advantage of a number of web UI automation tools that facilitate such testing.

Selenium WebDriver is a free tool for testing web page layout and functionality, and it works with Firefox, IE, Chrome and Safari. Automation test scripts are written in Java, C#, Ruby, Python or JavaScript. Scripts use the WebDriver API, which drives either a live browser or a headless one.
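A minimal Python-flavoured WebDriver sketch against an SSRS web portal might look like the following; the portal URL, report path and the element waited on are placeholder assumptions.

```python
# Minimal Selenium WebDriver sketch for smoke-testing an SSRS web portal report.
# The URL, report path and the element checked are placeholders; real layout
# tests would assert on many more elements.
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

driver = webdriver.Chrome()  # or Firefox(), Edge(), etc.
try:
    driver.get("http://reports.example.com/Reports/report/Sales/QuarterlySales")

    # Wait until the report viewer has rendered something, then check the title
    WebDriverWait(driver, 30).until(
        EC.presence_of_element_located((By.CSS_SELECTOR, "div[data-reportviewer]"))
    )
    assert "QuarterlySales" in driver.title
finally:
    driver.quit()
```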

Other UI-based testing tools are also available such as the open-source Sikuli, freeware AutoIt or TestComplete, which is a proprietary tool by SmartBear.

Conclusion

Test automation has its limits, of course. This is especially so with regard to SSRS reports testing. For instance, usability testing is clearly out of scope for automation. Automation is further complicated in SSRS, since data is often manipulated beneath the covers beyond control of the report itself. For example, data may be manipulated outside of queries in embedded VB.Net code or in external class libraries.

Even so, automating report testing wherever feasible always pays off. Choose the low-hanging fruit first before expanding into other areas, such as troubleshooting suites and GUI-based layout testing. As you progress, you are likely to find that developing a mindset aimed at automation frequently instills a virtuous cycle of discovering new opportunities for further test automation.

In SSRS testing areas that present the most difficult obstacles, use third-party tools or engage an experienced automation consultant who can demonstrate the automation methods most appropriate for your SSRS development and usage scenarios.