Building a On-Site Data Warehouse Using Microsoft Stack

Most IT specialists are aware of individual components within SQL Server and may use one or two of them regularly for data analytics or general operations. Outside of IT, others may be familiar with one of several analytics-oriented SQL Server add-ons such as those for Excel that are targeted at data analysis or data mining in general.

It is likely that far fewer staff, if any, have combined SQL Server components into a complete data analytics stack that enables the organization to build a high-performance data analytics solution using on-premise and external data sources that deliver meaningful analytics across the enterprise.

Fortunately, Microsoft provides all the tools and services necessary for a traditional, on-site data analytics system. This article outlines these components and their general usage. Of necessity, it cannot cover all the intricacies that may arise out of business-specific requirements.

Normalized vs. De-Normalized Databases

Relational databases use a normalized data organization optimized for rapid insert and update operations while minimizing data reads, which makes them ideal for OLTP processing. These types of databases are not suitable for data warehouses, which use an OLAP design that de-normalizes the data. This organization is optimized for the high-volume read performance required for analytical processing.

Other Reasons to Use a Separate Data Analytics Database

In addition to optimizing read performance, creating a separate OLAP database versus utilizing the OLTP corporate database has several advantages:

  • Analytical data queries requiring summarization of large data quantities will not bog down corporate database servers that collect related data items from disparate resources
  • Historical data can be cached rather than archived, which is typical on OLTP databases
  • You are free to consolidate data from several sources including off-premise databases, social media sites or a variety of document formats such as XML or Excel.
  • Cleansing and optimizing your data for analysis will be automated using your business-specific rules, queries and schedule.

The Microsoft Data Analytics Stack

SQL Server 2016 contains the entire suite of components necessary to build your first data warehouse. The three critical components are Integration Services, Analysis Services and Reporting Services.

SQL Server Integration Services (SSIS)

SSIS is SQL Server’s ETL component that connects to your data sources, migrates data, restructures it according to business-specific rules and populates the database. It manages acquiring data from multiple sources, cleaning this data plus performing type conversions and any other transformations required to streamline analysis. SQL Server Agent permits automatic scheduling of these processes so the database always contains the most relevant and timely data.

Use the particular business problems you are solving to guide ETL preparation and specification. A user-centric approach pares down and organizes the data being collected for the most efficient extraction and presentation.

SQL Server Analysis Services  (SSAS)

SSAS is critical to creating the most useful and highest-performing database queries. This component transforms normalized or relational data into multidimensional database “cubes” that are tuned to deliver rapid results to ad hoc queries. It does so through data indexing, data aggregation and enabling complex data calculations to be embedded at the cube level where processing is most efficient.

SQL Server Reporting Services (SSRS)

SSRS is used to make queries on an SQL Server database as well. It offers everything required to build detailed, complex reports with visualizations and facilities to manage and distribute reports securely. Users may view reports via the SSRS Web Portal, Power BI or a SharePoint server. End users are able to import reports directly to Excel or PowerPoint.

SQL Server Data Tools

This package works within Visual Studio (VS) and provides development tools for all the integration, analysis and reporting steps outlined previously. It is free and downloads a VS shell if you do not already have VS installed. SSDT lets you design and build your OLAP database and SQL Server content in much the same manner as you would for a VS application. It replaces the previous Business Intelligence Development Studio.

Conclusion

Microsoft’s SQL Server plus SSDT support development of a complete data analytics analysis system that includes an efficient OLAP database, optimized data schema and a full pipeline of data acquisition, pre-processing, analysis, delivery and visualization. Many organizations without a current data analytics solution may not have expertise in all the SQL Server components required, and those that do have typically only applied their skills toward OLTP type solutions.

Thus, many businesses employ an experienced partner for architecting a data analytics system that is scalable, extendable and highly secure. The most capable consultants are able to train and guide in-house staff through the entire development and maintenance lifecycle such that the enterprise becomes self-sufficient at further expanding and optimizing the system.

If you need help or advice on your data analytics system, let us know. Our team of data experts have worked with businesses large and small across all industries and have the right specialized technical skills to guide you towards a solution.