Data Analytics in the Cloud: Where to Start?

Where should you start when doing data analytics in the cloud?

An enterprise-wide data analytics system that pulls data from multiple sources, correlates them and presents results in relevant, insightful visualizations to enable prompt, informed decisions is the dream of many organizations. Those in the lead already reap the benefits of the faster, more accurate, proactive decisions that data analytics in the cloud provides.

Getting to that point requires exquisite planning and execution by stakeholders from top to bottom and across departments in order to achieve an implementation that is useful, maintainable and flexible enough to handle ever-improving data analytics technology and an increasingly competitive business environment.

Don’t Go Too Big Too Fast

Data analytics systems are most valuable when shared across the organization. Therefore, cross-departmental input and commitments are vital, as is a high degree of project transparency. Collecting requirements and then delivering an all-at-once solution months or quarters later is courting disaster. That sometimes works for limited software development projects, but data analytics initiatives necessarily demand a much larger scope in both duration and organizational reach.

Adopt an incremental mindset from the start by applying a less risky phased and flexible tack to your data initiative’s development. The project should produce regular and continuous adjustments to functionality, features and metrics without unnecessary thrash. This paradigm is most likely to produce a quality product with high acceptance across the business.

Gain Executive Stakeholder Buy-In

With the correct attitude regarding initiative progression, gain C-Suite buy-in via a detailed assessment and quantification of the business needs to which data analytics capabilities can add measurable value. These come from conversations with department heads, people managers, project managers and line workers involved in operational activities that would likely benefit from insights provided through readily accessible business analytics.

Collect Technical Requirements

After executive endorsement, pinpoint technical needs, features and functions required to create a system meeting the project’s strategic goals. These requirements are derived from various operational aspects:

  • Existing data analytics-related processes, e.g. CRM, and their supporting software and infrastructure
  • Existing data sources, with a baseline of the what, when and how of data storage and processing
  • Where applicable, data sharing patterns, especially where data transformation steps are required
  • The current collection of in-house, open-source and cloud-based tools and services in use

Turn Technical Requirements into KPIs

Concurrently with technical requirements acquisition, work closely with stakeholders to develop meaningful metrics and KPIs. Examples include metrics around analytics data acquisition, storage and cleaning. Marketing metrics might measure campaign response. High-interest sales metrics center on conversions and ROI, whereas support metrics include customer satisfaction and retention. Be open to innovative metrics that a new data analytics system could enable.

Ask for Resource Commitments

While collaborating on KPIs, initiate frank discussions about the staff and resources that stakeholder departments or teams are willing to commit to the project. The benefits of such commitments should already be reflected in KPIs that matter to those stakeholders.

Choosing a Software Model

Inconsistent use of software models, such as open source, in-house or cloud-based SaaS, is common in companies. This often results from an organic acquisition of tools and projects over time. Your data analytics implementation will not correct old problems, but its choice of software model should be based on technology availability, costs and the ability to scale and adapt as your company expands.

For instance, even with a strong in-house IT development capability, the benefits of basing your data analytics implementation on cloud-based SaaS are compelling.

First, removing the need for large capital outlays and their approval process alone makes a forceful argument for choosing pay-as-you-go cloud SaaS. Furthermore, this complements your phased approach, as infrastructure and services are easily scaled and maintenance is automatic. Finally, today’s cloud SaaS from the best providers is highly customizable, which enables rapid functionality development and ease of modification during ongoing development.

Additional Tips

  • Expect dirty data, especially from social media, and deal with it at the source where possible. Employ tools such as Diffbot and ScraperWiki in this battle. Especially during testing, consider importing customized, on-demand data sets.
  • Be sure data analytics dashboards and reports are highly customizable but easy to learn. This area is the face of your initiative for the majority of users. Also, ensure dashboard functionality works for your mobile users.
  • Build in extensibility. This means anticipating new data sources and leaving room for the latest in predictive analysis technology.
  • If you are using a phased, results-oriented approach, you will have plenty of opportunities to celebrate small victories. Relish and share these milestones.
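The first tip above, cleaning dirty data as close to the source as possible, can be sketched in a few lines of Python. This is an illustrative example only; the function names and cleaning rules are assumptions, not part of any particular tool:

```python
import html
import re

def clean_record(raw: str) -> str:
    """Normalize one scraped text field: unescape HTML entities,
    strip leftover tags, collapse whitespace."""
    text = html.unescape(raw)              # "&amp;" -> "&"
    text = re.sub(r"<[^>]+>", "", text)    # drop leftover markup
    return re.sub(r"\s+", " ", text).strip()

def dedupe(records):
    """Drop duplicate records (case-insensitive after cleaning)
    while preserving first-seen order."""
    seen = set()
    out = []
    for r in records:
        key = clean_record(r).lower()
        if key and key not in seen:
            seen.add(key)
            out.append(clean_record(r))
    return out
```

Dedicated tools such as those mentioned above handle far messier cases, but even a thin normalization layer like this at the ingestion point keeps junk out of downstream analytics.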


Data analytics has a proven track record of streamlining and sharpening enterprise decision-making based on “right-time” data flows from inside and outside the company. Implementing the best system for your company requires intense and thorough planning followed by step-wise development and deployment.

Realize that even as your project begins achieving its end goals, ongoing business needs and changing markets call for continued growth of your data analytics capability. If you already chose cloud-based SaaS as your software core, then the next growth and adjustment phase will be much easier than the first, especially if you stick to your iterative development paradigm.

If you have questions about how to get started working in the cloud, let us know. We’re happy to share our knowledge and set you on the right path.


(Note: This blog post has been updated with new information.)


Cloud Data Warehousing with Azure SQL

If you are running an on-premise data analytics stack on Microsoft’s SQL Server, but running into maintenance, cost and scaling issues, you can consider moving your data system to a cloud-based database service such as Azure SQL Database. Especially for your first data analytics stack, Azure SQL Database provides low startup costs with the ability to easily expand as business grows.

Advantages of Azure SQL Database

There are several benefits to moving on-premise SQL Server infrastructure to Azure:

  • Physical acquisition, provisioning and maintenance of SQL Server deployments are a thing of the past. Furthermore, decreasing or increasing data infrastructure is instantaneous with SQL Database elastic pools.
  • Azure assists existing database migration to Azure SQL Database with wizard-based tools.
  • Stored and transmitted data can be encrypted, including with client-side keys via Always Encrypted.
  • Microsoft accommodates third-party and open-source technologies, such as Python, Java, Node.js and PHP.
  • SQL developers feel right at home using SQLCMD or SQL Server Management Studio for development.
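As a small illustration of that familiarity, connecting to Azure SQL Database from application code mostly comes down to a standard connection string. The server and credential values below are placeholders, and the driver name assumes the Microsoft ODBC Driver 18 is installed:

```python
def azure_sql_conn_str(server: str, database: str,
                       user: str, password: str) -> str:
    """Assemble an ODBC connection string for Azure SQL Database.
    Azure requires encrypted connections, so Encrypt=yes is set."""
    return (
        "Driver={ODBC Driver 18 for SQL Server};"
        f"Server=tcp:{server}.database.windows.net,1433;"
        f"Database={database};"
        f"Uid={user};Pwd={password};"
        "Encrypt=yes;TrustServerCertificate=no;Connection Timeout=30;"
    )
```

The resulting string can be handed to pyodbc.connect() or any other ODBC-aware client.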

SQL Database Limitations

Although counterparts to the SQL Server components SSIS, SSAS and SSRS are available on Azure, there are still areas where the Azure version is not completely fleshed out. For instance, only a growing subset of T-SQL features is available, including cursors, transactions, triggers, most data types and operators, plus logical, arithmetic and string functions.

Additionally, many T-SQL statements in SQL Database do not support every option available in SQL Server 2016, such as CREATE/ALTER options for databases, logins, tables, users and views. Also missing are collation of system objects, cross-database queries with three- or four-part names, the data collector, database diagrams and Database Mail, some events, and certain database characteristics that were managed manually in SQL Server but are automatic in SQL Database.

For a full list of differences, see Azure SQL Database Transact-SQL differences.

Additional Azure Capabilities

SSRS is actually replaced with a separate service, SQL Reporting, which incurs a separate charge for reports. It is not a general reporting service since it only works with SQL databases. It does offer a nearly identical development interface to traditional SSRS.

Azure Tables is a storage service targeted at non-relational database storage, which is a type preferred for data analysis processes. It stores up to 100TB of data via an Azure Storage account and supplies data in row form. Additional advantages include less cost than straight Azure storage and easy scaling.

Built on top of Hadoop, HDInsight offers unstructured data storage plus a number of tools, such as Sqoop, Pig and Hive for query processing. Your in-house SQL Server, Excel or SQL Database are all able to connect to this service.

Data Factory is Microsoft’s SaaS analogue to SSIS. It visually coordinates other services to transform raw, unstructured data via data flow pipelines into clean, transformed data ready for analysis engines such as HDInsight or Azure Machine Learning for predictive analytics.

In lieu of SQL Reporting, you can utilize Microsoft’s SaaS Power BI for report, dashboard and visualization creation. You can use this tool in conjunction with your on-premise SQL Server installation or stored spreadsheets too.

Steps to Migrating from SQL Server to SQL Database

SQL Database is, in theory, backward-compatible all the way to SQL Server 2005. Even so, the first step in migration is to test for and fix any compatibility issues that may exist with SQL Database V12.

There are several methods to determine compatibility, including the use of SQL Server Data Tools, the SqlPackage utility, SQL Server Management Studio’s Export Data Tier wizard and the Azure SQL Migration Wizard. SSDT, SSMS and SAMW can also be used to fix any migration issues found in your database.

The next step is to create an Azure SQL Database logical server and migrate your existing data to it. Although other methods exist, the use of SQL Server transaction replication is the recommended solution since it minimizes live database downtime. Other solutions are to export/import BACPAC files when connection bandwidth is low or unreliable or use the SSMS database deploy wizard for smaller databases.
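For the low-bandwidth BACPAC route, the export is typically driven by the SqlPackage utility. The sketch below merely composes the command line; the server and file names are placeholders, and the exact executable name or path depends on your installation:

```python
def sqlpackage_export_cmd(server: str, database: str,
                          bacpac_path: str) -> list:
    """Compose a SqlPackage invocation that exports a database to a
    BACPAC file; the resulting list can be passed to subprocess.run()."""
    return [
        "SqlPackage",
        "/Action:Export",
        f"/SourceServerName:{server}",
        f"/SourceDatabaseName:{database}",
        f"/TargetFile:{bacpac_path}",
    ]
```

Importing the resulting .bacpac into Azure SQL Database can then be done from the Azure portal or with a corresponding /Action:Import invocation.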

Running SQL Server in the Cloud Directly

There is nothing stopping you from beginning or continuing your SQL Server-based data analytics development and deployment without Azure SQL Database. Amazon AWS provides SQL Server instances at any scale, with the advantages of computational, networking and storage elasticity on a pay-as-you-go basis. With a bit more heavy lifting, you could do the same thing on Azure or AWS by utilizing their Virtual Machine services directly for your own SQL Server deployment.


Running data analytics in the public cloud brings all the usual benefits of cloud-based operation, the most important of which are elastic storage for big-data crunching systems and high availability to both in-house and mobile users across the enterprise.

Whether or not your business should consider a cloud-based data analytics deployment depends on several factors including TCO, data volume, bandwidth requirements, security and the need to scale operations up or down quickly.

A wise approach is to work with an experienced Optimus data analytics consultant to collate all factors and develop a full data architectural solution. Our experts can guide you towards the best solution for your needs.




SSIS Best Practices

What are the SSIS Best Practices?

SSIS represents a complete rewrite of its early predecessor Data Transformation Services. Usability, parallelism and performance have all been vastly improved over the years resulting in an SQL Server component aimed at high-volume, high-performance ETL applications. Here are several additional tuning steps that can further increase SSIS performance for your particular jobs.

Managing Parallel Execution

MaxConcurrentExecutables and EngineThreads are two SSIS properties used to optimize SSIS’ parallel execution capabilities. The first specifies the number of data flow tasks that are run in parallel by the SSIS runtime engine. Its default value is -1, which allocates a number of threads equal to the number of processors/cores plus 2. If the value of MaxConcurrentExecutables is less than the number of parallel tasks available, only that many tasks run simultaneously. When one completes, another is added.

EngineThreads limits the number of source and worker threads used to execute the data flow tasks in the data flow pipeline engine. Thus, its effect is “nested” within the overall data flow parallelism influenced by MaxConcurrentExecutables. Its default value is 10, which indicates up to 10 data source and 10 worker threads will be allocated.

In order to tune these values for an ETL job, turn on PipelineExecutionTrees event logging, which tells you how many execution trees are used per data flow task. Use this number to fine-tune the setting of EngineThreads.

One caveat here is that these values cannot override precedence constraints between tasks, which may decrease the amount of execution parallelism available.
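The interplay of these defaults can be made concrete with a short sketch; Python is used here purely for illustration, since SSIS itself exposes these values as package properties:

```python
def max_concurrent_default(logical_processors: int) -> int:
    """SSIS resolves the MaxConcurrentExecutables default of -1 to
    the number of logical processors plus 2."""
    return logical_processors + 2

def tasks_running_now(max_concurrent: int, runnable_tasks: int) -> int:
    """Only up to max_concurrent of the runnable tasks execute at
    once; the rest wait until a slot frees up."""
    return min(max_concurrent, runnable_tasks)
```

On a four-core machine, for example, the default allows six concurrent executables, so ten runnable tasks would execute six at a time.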

Use Logging Judiciously

SSIS logging is very helpful in tracking down bugs in package code or for monitoring packages already debugged and deployed. It allows selective logging of the events of interest and permits logging them to any of several destinations.

The LoggingMode property within a package enables and disables logging easily without package modification. Use it wisely, as logging adds SSIS runtime overhead. Be aware also that a decrease in performance caused by excessive logging may mask race conditions in code execution, typically the opposite of what you want when debugging intermittent package problems.

Taking Pressure Off TempDB and the Transaction Log

During high-volume data transfers, changing the defaults for two SSIS settings will avoid uncontrolled growth of the transaction log and tempdb memory usage. By default, Rows per batch and Maximum insert commit size are set to their maximum values of -1 and 2 GB, respectively. The first permits all data rows to be processed as a single batch, whereas the second causes all rows in the data transfer to be committed in one go. For high-volume transfers, set these to more reasonable values that match your execution environment.
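The effect of batching on commit behavior can be illustrated with a back-of-the-envelope calculation (the row counts below are hypothetical):

```python
import math

def commit_batches(total_rows: int, rows_per_batch: int) -> int:
    """Number of commits issued when total_rows are loaded in batches
    of rows_per_batch; the -1 default means a single batch, so the
    transaction log must absorb the entire load at once."""
    if rows_per_batch <= 0:  # -1 default: one giant batch
        return 1
    return math.ceil(total_rows / rows_per_batch)
```

For example, loading 10 million rows with a batch size of 100,000 spreads the work over 100 commits instead of one log-swamping transaction.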

Avoiding Defragmentation When Loading Large Data Volumes with Keys

When transferring several to hundreds of millions of rows from one table to another, you may experience a sharp slowdown in execution if the target table carries a large number of clustered or non-clustered indexes, since each insert fragments them further. Fragmentation may reach as high as 90 percent.

It is possible to periodically halt the transfer and rebuild the indices, but a more efficient solution is to drop the indices completely, turn the target table into a heap and transfer all remaining data to this heap. Once the raw transfer is complete, rebuild the indices. This will decrease the overall execution time of the entire transfer several fold.

Save Time with Checkpoints

The Checkpoint feature lets a package restart from the last point at which it failed instead of from the very beginning. When Checkpoint is enabled on a package, execution status is logged to the file named in the CheckpointFileName property. Deleting this file will cause the package to start from the beginning.

Additionally, set CheckpointUsage, SaveCheckpoints on the package and FailPackageOnFailure for data flow tasks you want to have restart capability. Note that Checkpoint only works at the package level and not inside data flow tasks, which will start at the beginning of their internal sequence.

When performing long, high-volume ETL operations, Checkpoint can save many hours. It is also a great aid in debugging new packages. In the latter case, use it in conjunction with the Boolean DelayValidation property, which controls SSIS early and late validation of packages and components prior to execution.
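The restart behavior Checkpoint provides can be mimicked in a few lines. This is a conceptual sketch of the mechanism, not SSIS code, and the JSON checkpoint format here is an assumption:

```python
import json
import os

def run_with_checkpoint(steps, checkpoint_file, fail_at=None):
    """Run named steps in order, recording each completion in
    checkpoint_file; on restart, completed steps are skipped,
    mirroring how a package resumes from its checkpoint file."""
    done = set()
    if os.path.exists(checkpoint_file):
        with open(checkpoint_file) as f:
            done = set(json.load(f))
    executed = []
    for name, action in steps:
        if name in done:
            continue  # already completed in a previous run
        if name == fail_at:
            raise RuntimeError(f"step {name} failed")
        action()
        done.add(name)
        with open(checkpoint_file, "w") as f:
            json.dump(sorted(done), f)
        executed.append(name)
    os.remove(checkpoint_file)  # success: clear the checkpoint
    return executed
```

A run that fails mid-way leaves the checkpoint file behind; the next run picks up at the failed step rather than repeating the earlier, possibly hours-long, ones.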


SQL Server Integration Services supports data warehousing through high-volume data ETL operations. The focal point of SSIS development is the SSIS package, created within Business Intelligence Development Studio. Packages are used for various workflows, including data cleansing, standardization, integration and administrative tasks.

At the package, task and system levels, there are a number of tweaks possible to optimize SSIS performance for the job at hand or to assist the debugging of new or modified packages. Always bear in mind that these tips are not hard-and-fast rules, due to internal SSIS dynamic optimizations that happen at runtime. Thorough testing and monitoring are always a wise approach when applying these practices to your particular tasks.

This is just the tip of the iceberg for optimizing your SSIS, and there are many more tips and tricks to make your data systems clean and efficient. If you need help or have questions about your current system’s performance, our data analytics specialists will be able to guide you. Give us a call; we’re always happy to help.


Contact us to learn more.

Power BI vs QlikView

What are the key differences between Power BI and QlikView?

In the world of business intelligence, putting the right tools in the hands of the right people is critical to success. Two of the most popular tools in the industry today are Microsoft Power BI and QlikView. Both provide modeling capabilities and allow you to generate visualizations and dashboards, but there are key differences between the two products. Our BI experts at Optimus have conducted a side-by-side comparison.

Power BI Benefits

As tends to be the case with the majority of Microsoft products, the greatest strength of Power BI is its level of integration with other Microsoft products. If you’re already using Office 365 and SQL Server, you have a good chunk of the most common data sources for using Power BI already in place. It’s a fairly quick jump from entering data into an Excel spreadsheet to producing a visualization in Power BI.

Microsoft’s current commitment to cloud-based computing extends a lot of the advantages of Power BI further. For example, the Power BI mobile app can be fully connected to the rest of the software stack. If you want a sales person in the field to have access to real-time information that’s being updated on your SQL Server installation, the dashboards you create in the mobile app can stream that information live to a phone or tablet.

QlikView Benefits

QlikView is widely regarded as the more mature and robust of the two BI solutions. It’s built especially well for analytic purposes, and anyone who comes from a programming or data science background is likely to find it less limiting. QlikView allows you to get into the guts of its scripting system and tinker around with any models you create, and that can be very beneficial if you need low-level access. You can also produce a wider range of visualizations and presentations using QlikView.

One area where QlikView really excels is in raw speed. By keeping its analytics capabilities in-memory, it allows anyone with sufficient processing resources to crunch a ton of data and develop insights fast. The ability to use QlikView on locally based computers is also important to managed organizations that have strict requirements regarding the availability and storage of sensitive information.

Power BI Disadvantages

Data science people will find Power BI’s interface limiting. The cube system makes it very hard to get into the low-level elements of models to make minor tweaks. This can be a net benefit for users who don’t want to wade into the weeds, but it makes Power BI the weaker of the two when it comes to raw processing power and the ability to generate unique insights.

Power BI, like almost the entire modern Microsoft software stack, is very dependent upon cloud-computing resources. While this increases availability, it also imposes regulatory and legal limitations on its use. If you work in an industry that has strict requirements for data storage, using Power BI may simply not be an acceptable choice. Many companies also just prefer to house their storage and processing capabilities on-site.

QlikView Disadvantages

The features that make QlikView powerful also make it difficult. QlikView comes with a very steep learning curve, and the situation can be worse for people who don’t have a data science background. For some users, QlikView is simply too complex and difficult to master.

QlikView also struggles with a lack of specific purpose. This is another case where its somewhat freeform approach to being an analytics platform is a double-edged sword. Its functionality can be a bit of a sprawl, and some companies simply don’t have the time or people required to make the most of all that power.

The Right Tools for You

The choice between Power BI and QlikView doesn’t lead to a decision where one is clearly better than the other. You need to be serious about evaluating your people and your organization before you adopt either one. If your company is already married to the Microsoft stack and doesn’t require advanced analytics and modeling capabilities, then Power BI is the choice for you. If you find the available capabilities in programs like Excel to be too limiting, you should take a long look at QlikView. Power BI excels in ease-of-use and deployment, while QlikView benefits immensely from allowing data-centric people to get into the weeds. The critical thing is to balance the pros and cons of each package against the demands of running your operation on a daily basis.

If you’re looking to switch BI stacks or add to your current roster, speak with our BI experts. We can help you determine the best tools for your needs and your organization.


Power BI for Office 365

How can you integrate Power BI for Office 365?

If you’re like many folks in today’s business world, there’s a good chance you’re already using Microsoft’s Office 365 software suite. You may even already feel like you have a fairly data-centric approach to using Office, especially if your company is a heavy Excel user. Putting all that information to use, however, requires powerful data visualization and publishing tools, and that’s where Power BI comes in.

What is Power BI?

Power BI is a business intelligence package designed by Microsoft. BI systems are a way to allow companies to quickly convert their data into usable analysis, presentations and dashboards. For example, if you want to present real-time inventory tracking information to internal stakeholders, you can use Power BI to create a dashboard that allows those individuals to have access to visuals that explain everything in detail. Power BI also empowers you to keep that information up-to-date and even makes it easier to connect with your on-the-go stakeholders by presenting your dashboards within mobile apps.

Power BI actually refers to four components of the same service. As an end user, the most obvious component to you is likely to be the Power BI desktop. This is where you create visualizations and move data around. On the backend, however, are the Power BI gateways, where all the processing and magic happen. On the access side of things are presentations and apps. Tying all three of those elements together is the main Power BI service.


Microsoft actually makes a basic version of Power BI freely available. For many smaller businesses, this may be all they need. As is the case with the Office 365 ecosystem, Microsoft encourages companies to opt-in to a subscription-based system that grants them access to more usage and raw processing capabilities with Power BI.

The pro version costs $9.99 per user per month. It allows each user 10 GB of data capacity versus 1 GB for free users. Free users can stream 10,000 rows of data per hour to their dashboards, while pro users can stream 1 million rows per hour. Pro users also have much better access to collaborative tools.
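Those streaming limits translate directly into dashboard load times. A quick calculation, using the rates quoted above (which Microsoft may revise):

```python
def hours_to_stream(rows: int, tier: str) -> float:
    """Hours needed to push `rows` to a dashboard at the quoted
    limits: 10,000 rows/hour (free) vs 1,000,000 rows/hour (pro)."""
    rates = {"free": 10_000, "pro": 1_000_000}
    return rows / rates[tier]
```

Streaming a million rows would take 100 hours on the free tier but a single hour on pro.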

Integration with Office 365

Power BI is available as part of the enterprise version of Office 365, making installation and integration easy. Any information you have access to in programs like Excel can easily be brought into Power BI. It’s also thoroughly integrated with products like SQL Server, so you can quickly pull information from your databases. Likewise, anything you produce in Power BI can be published as a presentation or an accessible dashboard within an app or programs like Excel, PowerPoint or Word.


It’s easy to get lost in the publishing side of Power BI and forget that it comes with a slew of powerful functions. Power BI allows you to run powerful analytics on your data. If you’re looking to produce a sales forecast, you can quickly pull live information from your SQL Server installation into Power BI.

Power BI allows you to develop a workflow that suits your existing approach. If you’d prefer most of your output to fit within existing Excel templates, Power BI is designed to allow you to readily drop calculated items into your spreadsheets. It also allows you to develop data models. If you want to check performance indicators, you can quickly establish criteria and ask Power BI to tell you how close to your targets you currently are. It can even help you project well into the future.

Visualizations are nice, but where Power BI exceeds expectations is in providing a high level of interactivity. Once you’ve created a dashboard, users have access to tools that let them sort through the information. You can even configure real-time updates, so users can check in regularly and see the current state of your business.

Power BI comes preconfigured with tools designed for a wide array of industries. If you need to collect human resources data and make it available to others, you’ll find an out-of-the-box solution for that with Power BI. Same goes for other common business functions, including marketing, finances and operations.

You can also use Power BI to quickly search all of your company’s available data. By using the unifying power of cloud-based solutions, Power BI can interpret your queries and sift through any data source your business has access to. This includes many non-Microsoft products, including applications like Google Analytics, QuickBooks and SalesForce.


Making your information more accessible, both physically and mentally, to stakeholders is one of the biggest challenges any company faces. Power BI allows you to quickly develop insights from your available data and produce dashboards and presentations that are compelling. Where a normal Office 365 user might produce a couple charts, a Power BI user can provide detailed, interactive graphics that drive decisions. Power BI makes your information more accessible in the most basic and human ways imaginable.


Contact us to get started.



Steps to Getting Good BI Requirements

What are the steps needed to get good BI requirements?

Half the battle for many projects is getting started. BI projects are no different in that respect. A critical phase in realizing your BI project vision is developing thorough project requirements in clear, crisp, implementable and measurable terms.

The temptation is to rush this phase, which often produces excessive churn or even failure as unguided interpretations and showstopper issues appear during implementation or testing. It is vital that you get your BI project off to the best possible start by establishing sound requirements via rigorous steps.

KPIs to Validate Requirements

Requirements are, frankly, meaningless if their intentions and results are not measurable. Key Performance Indicators, or KPIs, are a good approach to ensuring your requirements focus on the right values. The right KPIs are often derived from current organizational metrics which your project would presumably improve upon.

For instance, if one of the strategic goals of your BI project is to improve customer retention, then a set of relevant KPIs might include data concerning new customer acquisition, customer relationship strength, ROI on loyalty programs, repeat sales and customer attrition among other metrics.

Improvements to these metrics, plus additional metrics based on new insights, should be stated directly within the requirements, e.g. “A 25% decrease in customer attrition rates over the next two quarters.” It is also important to state the methods by which new metrics will be evaluated, to ensure they are on par with existing ones.
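A requirement phrased that way is easy to evaluate mechanically. The sketch below checks the example “25% decrease in attrition” target; the function names and figures are hypothetical:

```python
def attrition_rate(lost_customers: int, starting_customers: int) -> float:
    """Fraction of customers lost over the measurement period."""
    return lost_customers / starting_customers

def meets_target(baseline_rate: float, current_rate: float,
                 required_decrease: float = 0.25) -> bool:
    """True if attrition fell by at least required_decrease relative
    to the baseline, e.g. the 25% reduction stated in the example."""
    return (baseline_rate - current_rate) / baseline_rate >= required_decrease
```

Encoding the evaluation method alongside the requirement keeps everyone measuring the KPI the same way.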

Acquiring Requirements

One often overlooked technique in developing BI requirements is acquiring them across all teams and departments that have stakes in the project. Naturally, this requires that you have identified all affected stakeholders beforehand. This “reqs” collection phase can be employed at any time, but conversations are often facilitated by an existing pro forma set of requirements. First meetings should be formatted as brainstorming sessions, whereas later iterations should be tightly structured.

During requirement review meetings, new requirements and metrics will be discovered. Additionally, you may find that data relevant to your project is restricted by departmental policy, or that there are regulatory compliance issues for particular data that must be considered. It is far better to discover such issues in the requirements phase than in the development or deployment stages.

Take it for granted that you will be unable to accommodate everyone’s requirements, especially those irrelevant to the strategic vision. During the first phase of acquisition, it is possible the vision can be tweaked, but you should resist major or ongoing changes to it as much as possible.

BI Data Acquisition

During high-level requirements discovery and definition, pay attention to the nuts and bolts of how the project will acquire the project data from a technical point of view. This is especially true if databases, operating systems, file formats and applications relevant to your project are not uniform across the enterprise, which is often the case with companies still in their early stages of digital transformation.

Assign the most technically savvy members of your team to digging out the details of how project data will be acquired, stored and managed. Expect to hit snags, but do not expect those outside the project to change their habits or constraints mid-stream. This data acquisition analysis naturally leads to another set of project requirements that must be stated up front in order to minimize surprises later in the project.


To summarize, once you have developed and vetted strategic goals for a particular BI project, it’s time to roll up your sleeves to tackle a well-defined, thorough set of measurable requirements for said project. These must encompass the viewpoints of all stakeholders up to the point that they support the project’s strategic vision.

State a set of pro forma requirements related to the business decisions supported by the proposed BI system. Use these requirements to generate detailed discussions regarding metrics, data acquisition, data sharing and regulatory compliance with all stakeholders. Iterate with stakeholders on the requirements with increasing structure to avoid heading down rabbit holes. Also, evaluate technical issues relative to obtaining data, storing data and formatting data and make sure these are reflected in the requirements also.

Always bear in mind that despite the seeming tediousness of requirements development, a thorough job pays enormous dividends in subsequent project stages through avoidance of foreseeable issues. These are far more difficult to correct in later project stages and could even spell disaster for the entire project.

If you have questions about your next BI project’s requirements, contact us today.

How SSIS Fits into an ETL Process

Microsoft SQL Server Integration Services – SSIS – is a common solution for many organizations’ data management requirements with regard to their Business Intelligence systems. Specifically, SSIS and SQL Server together streamline Extract, Transform and Load – ETL – operations on data warehouses. The process is essential to providing efficient, accessible data flows to BI system applications.

Using SSIS “As-Is” to Design Your ETL Process

Guidance about utilizing SSIS/ETL capabilities typically centers on using SSIS control and data flows to extract data and apply multiple transformations before storing the processed results in the original or an alternative database. The approach’s greatest strength is that it runs as a one-step process without the need for intermediate storage. However, that same aspect leads to certain drawbacks, discussed below, that motivate an alternative development approach based on staging tables.

Standard SSIS Extract/Transform

Data transformation employs SSIS’ large selection of transformation operations, such as sorting, merging, unions, joins and data type conversions. Extraction is accomplished using either a flat file or an OLE DB connection. Transformations are defined with graphical control and data flows that may include multicasting data to parallel processes. These process flows may have separate endpoints or may re-combine via a union operation.
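As a rough illustration, the T-SQL below approximates what such a data flow does when two parallel branches are type-converted, re-combined with a union and then sorted. This is a sketch only; the table and column names are hypothetical, not part of any real schema.

```sql
-- Sketch: the relational equivalent of an SSIS data flow that extracts
-- from two sources, converts data types, unions the branches and sorts
-- the combined result. All object names are hypothetical.
SELECT CAST(OrderId   AS INT)  AS OrderId,
       CAST(OrderDate AS DATE) AS OrderDate,
       Amount
FROM   Sales_NorthAmerica            -- first branch of the flow
UNION ALL
SELECT CAST(OrderId   AS INT),
       CAST(OrderDate AS DATE),
       Amount
FROM   Sales_Europe                  -- second branch, re-combined via union
ORDER  BY OrderDate;                 -- sort applied after the union
```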

The Use of Staging Tables

A staging-table ETL architecture provides several benefits over standard SSIS ETL development, whose main deficiency is that it slows down development and debugging.

Standard SSIS Drawbacks

Because standard development utilizes the built-in one-step process, each data flow test run extracts a batch of data from the original source, which consumes additional time. This problem is compounded if your tests are running on a production database because extra care must be taken not to corrupt its state.

An additional problem is that SSIS data flows are specified at a very low level of abstraction, which makes their development and debugging tedious, much as assembly code slows production compared to C# or Python, for example.

Separating Extract and Transform

Staging separates the steps of extraction and transformation in the top-level control flow. Rather than working with the entire data source, a copy of the data is pre-fetched into a staging table with minimal transformation, if any. The only filtering necessary would be to avoid performance issues such as an overly large table.

In SSIS, an easy way to create the staging table is to edit the data source destination table properties with the option to create a new table, which results in fetching all the correct columns. Data transformations are implemented as database views in lieu of specifying SSIS transformations. These can be written as SQL queries using CREATE VIEW with SELECT statements for the fields of interest.
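A transformation written as a view might look like the following sketch, which selects the fields of interest from a pre-fetched staging table; the table and column names are invented for illustration.

```sql
-- Sketch: a transformation defined as a database view over a staging
-- table instead of an SSIS data-flow component. Names are hypothetical.
CREATE VIEW dbo.vw_StagedOrders AS
SELECT OrderId,
       OrderDate,
       CustomerId,
       Amount
FROM   dbo.stg_Orders                 -- staging table filled by the extract step
WHERE  OrderDate >= '2015-01-01';     -- only the rows and fields of interest
```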

Tips When Using Staging

While building a staging query, it is a good practice to change field names into more human-readable form as necessary and to rename IDs as Keys to accommodate BI application usage. The multiple staging tables created in this fashion are later joined into dimension and fact tables.
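The renaming and joining practices above might be sketched as follows, with IDs renamed to Keys for BI tool consumption and two staging tables joined into a dimension view; every object name here is hypothetical.

```sql
-- Sketch: renaming cryptic source columns into human-readable form,
-- renaming IDs as Keys for BI applications, and joining staging tables
-- into a dimension view. All names are hypothetical.
CREATE VIEW dbo.vw_DimCustomer AS
SELECT c.CustomerId  AS CustomerKey,    -- ID renamed to Key
       c.Cust_Nm     AS CustomerName,   -- source name made readable
       r.Region_Desc AS Region
FROM   dbo.stg_Customers AS c
JOIN   dbo.stg_Regions   AS r
       ON r.RegionId = c.RegionId;
```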

Note that using database views for larger data sets may create performance bottlenecks in production, although views are much more efficient at the development stage. You can get around this by later converting the views to stored procedures that break the view operations into smaller pieces run during the ETL transformation step.
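One possible shape of that conversion, sketched with hypothetical names: the view’s logic moves into a stored procedure that materializes the results into a physical table during the transformation phase.

```sql
-- Sketch: view logic converted into a stored procedure that materializes
-- results into a physical table during the ETL transform step, so the
-- work runs once per load instead of per query. Names are hypothetical.
CREATE PROCEDURE dbo.usp_LoadDimCustomer AS
BEGIN
    TRUNCATE TABLE dbo.DimCustomer;    -- rebuild the dimension each run

    INSERT INTO dbo.DimCustomer (CustomerKey, CustomerName, Region)
    SELECT c.CustomerId, c.Cust_Nm, r.Region_Desc
    FROM   dbo.stg_Customers AS c
    JOIN   dbo.stg_Regions   AS r
           ON r.RegionId = c.RegionId;
END;
```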

Memory Performance Tips for Standard and Staging Approaches

Regardless of which architectural approach is used to develop your ETL process, a key aspect to bear in mind is that transformation processes should be small enough to run entirely in memory. This practice avoids performance-crushing disk swaps.

  • If SQL Server and SSIS are running on the same system, SQL Server typically gains priority, which means that SSIS becomes a bottleneck
  • Apply SELECT only to the columns actually needed
  • Avoid a Commit size of zero when inserting into a B-tree, to avoid memory spills
  • Use narrow data types to reduce memory allocation but avoid excessive data casts
  • Move sorts to the end of the transformation chain
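Two of the tips above can be sketched in a single query: select only the columns the transformation actually needs, and cast to a narrow type once rather than repeatedly. The table and column names are hypothetical.

```sql
-- Sketch: SELECT only the needed columns and use narrow data types,
-- reducing the memory each SSIS buffer row consumes. Names hypothetical.
SELECT OrderId,                                  -- unused columns omitted entirely
       CAST(Quantity AS SMALLINT) AS Quantity,   -- narrow type, cast exactly once
       UnitPrice
FROM   dbo.stg_Orders;
-- Any required ORDER BY belongs at the end of the transformation chain.
```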

To determine whether your implementation avoids touching the disk during transformations, keep an eye on the SSIS performance counter Buffers spooled. If transformation remains in memory but performance lags, check the obvious first: whether the data source system can deliver data fast enough.

The staging approach in particular requires more disk space, so be mindful of overall resource utilization such as CPU usage, I/O bandwidth and network usage.


SSIS ETL is an essential pillar in supporting BI information systems. It provides powerful data transformation capabilities to match raw, large data stores to the narrower, optimized data needs of BI applications that are so essential to modern enterprise IT. The architectural approaches to SSIS ETL process development presented here offer trade-offs between performance and development/debugging effort that may improve the efficiency of your organization’s SSIS utilization.

Contact us today if you have any questions about SSIS and your BI system. We’re always happy to help.


BI: Difference Between Cloud and On-premise

What are the differences between Cloud and On-premise?

The model for how companies acquire and field software for business intelligence purposes has changed dramatically in the last five years. One major factor driving this change has been the emergence of cloud-based computing options and software as a service. Modern operations are increasingly comfortable having large amounts of their data, processing capacities and even actual software hosted remotely. One of the biggest challenges in the BI sector today is figuring out just how much of your infrastructure you want to have off- or on-premise. We’ll take a look at the pros and cons of both approaches and how a mixed environment may also be beneficial.


Comparing cloud to on-premise options in terms of security can be a bit tricky. Cloud-based solutions benefit immensely from regular updates by the vendors supplying them. Poorly updated software is one of the leading security risks in almost every sector of the business world. On the other hand, large farms of remotely hosted systems are very attractive targets to hackers by virtue of their size alone. SaaS providers see themselves in a perpetual battle against interlopers, and they put significant resources to work on a daily basis because their reputations are on the line. This awareness of risk in the cloud sector goes a long way toward alleviating concerns about security. In fact, all the major recent data breaches reviewed by Bankrate occurred at on-site data centers rather than in cloud-based systems.


When companies start thinking about moving to off-premise solutions, the first thing that crosses their minds is often pricing. On-site solutions often carry very steep upfront costs, while cloud-based systems are typically billed on a recurring basis. It’s also important to appreciate that every piece of software that’s run on-premise has to be hosted on a piece of hardware. Installing, maintaining and replacing those systems are costs that accumulate rapidly. When a server dies at a SaaS provider, that’s a vendor problem, and it’s usually handled quickly and with no interruptions of service.

Costs also can vary dramatically among vendors. For example, Microsoft’s Power BI integrates seamlessly with its Office 365 products, and it’s available to small businesses in a free version. The paid version starts at $9.99 per month, per user. Other vendors, such as SAP, sell products with pricing options that are opaque and require negotiation.


Getting up and running with BI systems for the first time is challenging regardless of whether you elect to use a cloud-based or on-premise solution. The big difference is that on-site systems require significantly more hardware setup. They may also demand multiple visits from the vendor to verify that everything is running as expected.

For companies that are looking for a fire-and-forget solution, SaaS wins hands down. The dashboards built for the cloud-based systems are typically designed to offer cross-platform compatibility. That means you can have folks in the office using Windows desktops and sharing information with individuals in the field who might be working on mobile platforms based on iOS or Android.


For all the setup headaches that on-premise systems can entail, they tend to offer much better customization. Cloud vendors, however, shouldn’t be discounted. There’s a good chance that you won’t require an on-site, customized setup unless your company is heavily into building its own solutions.

One of the main drivers of the adoption of cloud computing is the need for increased cooperation within companies. By moving services and data to the cloud, many businesses have dramatically increased the availability of information throughout their operations. For companies that have interests spread across the country or globe, this can reduce friction.


Every computing system encounters bottlenecks. SaaS solutions are often bottlenecked by network capacity and speed. On-site solutions are frequently limited by the available hardware. Cloud services providers offer solutions that allow clients to throw more resources at big problems when they require speed and power. This sort of instant scalability is hard to replicate at your own location without building your own supercomputing cluster.

Mixed Solutions

Few companies have wholly embraced cloud-based ecosystems, and almost all operations are likely to end up using some mixture of both solutions. Many offerings take this fact into account. For example, Tableau mixes cloud storage and computing options with a desktop application that blurs the line between on- and off-site solutions. It is also designed to allow users to import local files and connect to remotely served data. SaaS providers are increasingly accommodating toward the needs of firms with ingrained desktop computing cultures.


SaaS systems are growing in popularity, and many companies are trying to move as much of their BI infrastructure into the cloud as possible. As vendors continue to prove themselves capable of handling concerns about security and availability, this trend is likely to continue over the next five years. Increased competition in the sector should also encourage cheaper and more varied options, and that’s a win for nearly everyone looking to improve their BI resources.

Setting up a solution can be challenging, especially if you don’t have the right specialists on your team. Optimus Information has implemented both SaaS cloud BI platforms and on-premise solutions, and we have the right BI technical experts to move you forward without the financial burden of a full-time BI staff.

If you have questions about whether a cloud or on-premise BI solution is right for you, contact us today. We’re always happy to help.

(Note: This blog has been updated with new information.)

Product comparison: Power BI, Tableau, QlikView and SAP

Few topics that come up during a business meeting are at once as simple and as challenging to address as the question of how the company is doing. It wasn’t long ago that someone involved in the process would toss around printouts of a few Excel sheets and declare the discussion over. In recent years, however, business intelligence has taken on a life of its own. Even in small operations, hard data is in demand.

There are four popular tools used to handle BI work these days. They are:

  • Power BI
  • Tableau
  • QlikView
  • SAP

Let’s take a look at the pros and cons of each option.

Power BI

Power BI gains a lot of its strength from the fact that it’s built by Microsoft. That means companies that are deep into the MS Office ecosystem can quickly use their existing data from Excel and Access to produce new insights. It’s built to handle data from a number of different inputs, and you can create everything from widgets to full-fledged dashboards with ease using Power BI. It’s a popular choice for companies that need seamless integration and dedicated support.

For those who want visualizations and ease of use, Power BI is a great choice. It was originally designed as an add-on for Excel and has grown into a fully formed product in its own right. Power BI is a relative newcomer, and it’s generally not seen as a good option for companies that need to power through big data sets or produce complex visualizations. It does, however, support analytics in R, a programming language popular with statisticians and data miners, so advanced data people shouldn’t instantly dismiss its usefulness.

Microsoft is moving toward a monthly pricing model for most of its major business applications, and Power BI is no different. The basic version is available for free, and it’s designed to process 10,000 rows of information each hour. The advanced version is available for a monthly fee of $9.99 per user. Both allow you to create dashboards that are compatible with Windows, Apple iOS and Android devices.


Tableau

Tableau is designed to put a lot of analytic power in the hands of people who don’t have PhDs in stats. It handles many types of analysis that are often thought of as advanced functions, including trend analysis, regression modeling and correlations. It also allows you to quickly derive metadata from your existing information sets. You can simply group points together and produce new visualizations with a few clicks.

One thing Tableau does incredibly well is mapping. It comes with worldwide map data out of the box, and it also does a very good job of recognizing what formats correspond to what countries. You can readily throw together a visualization of your company’s country-by-country performance in Tableau by simply importing an Excel sheet. Within a matter of minutes, you’ll have a presentation element that looks professional.

Tableau is widely considered the BI industry standard for visualization projects. It is, however, lacking in deeper analysis capabilities. Many users therefore pair it with a separate analysis engine because it’s so hard to sacrifice Tableau’s gorgeous visual output. Tableau is available in both free and paid versions.


QlikView

If you’re looking for something that’s powerful and able to rapidly mine data, QlikView may be the option for you. It has a bit of a reputation for a steep learning curve, but QlikView also is likely the bit-for-bit analysis champ among the big BI tools on the market today. It’s a popular choice for users who value analytics above visualization.

QlikView does visualizations, and it even can produce dashboards. It’s just not the preferred choice for those who want to produce beautiful data. If you’re looking for simple charts, it’s a solid contender.

QlikView is available in a personal edition that’s free. The personal edition is fully featured. Both the free and paid versions allow cloud deployment of applications on the company’s QlikSense platform. The paid version’s business model is built on allowing a larger application size, more storage and unlimited users.


SAP

SAP BusinessObjects is the preferred BI tool for companies that require predictive capabilities. It’s designed specifically for managing customer relationships, tracking finances and handling supply chains. If you’re looking for a program that can quickly compile your data and answer basic questions about profitability and efficiency, SAP is likely to be your BI weapon of choice. The package also comes with a variety of reporting and data visualization options.

Where the SAP offerings take a hit is pricing. It’s far from being as cost-effective as other BI options on this list. The range of offerings is diverse, and the company does not make a point of disclosing licensing fees upfront. Generally, you should expect to spend a couple thousand dollars to get started with SAP. The opaque licensing regime and poorly streamlined family of programs often turns potential users off SAP.


There isn’t a clear contender for the biggest BI tool in all the land. If clean and effortless visualizations are your thing, Tableau will be your choice. QlikView offers raw power. Power BI provides easy integration. SAP is a great choice, but only if money isn’t an object.

If you need help figuring out which option is the best for you, ask one of our experts. We can help you choose and get you started in the right direction.

Contact us today

Best Practices for Successful Business Intelligence

Few, if any, companies doubt a successful Business Intelligence system’s value to their internal and customer-facing operations, or the crucial role it plays in making accurate, lightning-fast tactical and strategic decisions that affect the business’ competitive position. The difficult bit is ensuring that your company’s Business Intelligence deployment is fetching the best data, crunching it in a timely fashion and presenting actionable insights to the right decision makers at the right time.

The team at Optimus has traversed this path before and offers several tips to ensure a quick start and maximum return from your Business Intelligence projects.

Get Everyone On Board

Once you develop a unique strategic vision around your Business Intelligence project, share it with anyone who has a stake in its outcome. The ensuing discussions will crystallize short-term tactical and long-term strategic goals and produce a detailed roadmap with a timeline for development and installation.

Understand Your Data’s Context

As project functional requirements are shared across stakeholders, issues are bound to arise about which data is procured and how it is captured, stored and processed. These issues will differ depending on the nature of the data and its source. For instance, customer data must meet regulatory compliance requirements. Other data may have restrictions on distribution, and still other data streams will have format or periodicity characteristics that create technical problems.

Assigning technical personnel to the requirements gathering process as early as possible with a mission to uncover such issues will increase the likelihood that they are dealt with in the most efficient manner.

Aim for Agility

Approach the next phases – requirements gathering, specification, development, testing and deployment – with an eye toward flexibility and agility. Although it is theoretically possible to anticipate all the omissions, glitches and miscalculations in the requirements stage, in practice it never works out that way.

To mitigate this fact of life, take a page from agile software development methodologies. Break the project into small, independent components and apply short iterations over progressively more functionality. This approach yields continuous results while avoiding lengthy and disruptive project halts when inevitable technical hurdles arise.

Produce Measurable Solutions

At all times, keep in mind that the project should be producing not just functionality but measurable solutions to problems within the company. It is easy to check off features, but more difficult to judge the value of each feature to the end solution, which may be, in part, judged by subjective criteria.

Various methods to deal with this aspect include the use of working wireframe mock-ups along with usability tests, identifying and validating action loops, the use of responsibility matrix models and the development of user personas. Separately or together, these techniques are used to validate the BI system’s usefulness and ease of use.

Ensure Maximum Accessibility

One of the most valuable improvements in BI over the decades has been the increased accessibility of business intelligence tools across an enterprise. Whereas in the early days of BI, data experts were required to collect, massage and interpret data, modern Business Intelligence software automates those processes. Today, non-experts can develop actionable reports and dashboards relevant to their job function right at their desk.

One caveat is in order, however. Although your new BI project should facilitate maximum accessibility as an end goal, do not distribute it widely until the bugs are shaken out. Furthermore, where possible, augment existing processes, workflows, apps or websites with your solution in lieu of creating a standalone app that is easier to ignore. On the other hand, develop a mobile app version of your project for those employees who typically work away from the office.

Plan for Training

Training is essential to your project in order to raise awareness among employees, tout its value and increase usage. Assigning one or more technical evangelists to spread the word is a superb way to start. These project ambassadors must be backed up with a suite of training material including downloadable multimedia content, ready-made presentations for department leaders and face-to-face classes as necessary.


Start your BI project development by ensuring everyone is on the same page with regard to the vision and the benefits of your BI solution. Collect stakeholder input to ensure a fuller set of requirements.

Deal with technical issues related to your solution, its development and its deployment through short-cycle agile methods, while bearing in mind that the finished system should offer maximum accessibility and collaboration.

Finally, plan well ahead for the training necessary for maximum uptake and efficient usage of the project whether it be a plug-in, standalone app or a web-based tool.

Transforming your company into a fast-acting, data-driven organization that stays two steps ahead of the competition is not a task for the faint of heart. It is made infinitely harder by not following several tried and true approaches that raise your chances of success.


Contact us to learn more.

(Note: This article has been updated with new information)