10 Tips for Success with Big Data – Part 1

The age of big data has been a boon for anyone in the business intelligence world. Creating reports, apps and visuals that drive decision-making is simply easier when you have a large data set to draw upon. There are, however, a number of issues to keep in mind. Here are 10 tips for anyone who’s looking to use big data more successfully for BI purposes.

1. Maintain Data Formats

It’s tempting to reformat a data set for greater ease of use before putting it to work. It’s not unusual, for example, to retrieve a data set remotely as JSON and then discard the keys, keeping only the values, because the information in the keys seems useless. Those key and value pairings often offer insights that only become useful further down the road. By maintaining the original format of a data set, you preserve information like time sequences and references. That can be beneficial if you’re later asked to track down a specific data point as part of a discussion.
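As a rough illustration of the JSON case, the sketch below flattens a payload for convenience while carrying the keys along instead of throwing them away. The payload, the field names (`order_ref`, `total`) and the helper `find_by_ref` are all made up for this example.

```python
import json

# Hypothetical payload: keys are timestamps, values hold a reference and a number.
raw = (
    '{"2023-01-01T00:00:00Z": {"order_ref": "A-17", "total": 120.5},'
    ' "2023-01-02T00:00:00Z": {"order_ref": "A-18", "total": 98.0}}'
)

records = json.loads(raw)

# The tempting shortcut: keep only the numbers. Timestamps and references are gone.
totals_only = [entry["total"] for entry in records.values()]

# The alternative: flatten for ease of use, but keep the keys as ordinary fields.
rows = [
    {"timestamp": ts, "order_ref": entry["order_ref"], "total": entry["total"]}
    for ts, entry in records.items()
]


def find_by_ref(rows, ref):
    """Return the rows matching a reference so a data point can be traced later."""
    return [row for row in rows if row["order_ref"] == ref]
```

With `rows`, a question like "when did order A-18 come in?" can still be answered. With `totals_only`, it cannot.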

2. Security Matters

It can be a lot of fun splashing around a big pool of data, but it’s also important to be prepared to be told no and to tell others no. Security protocols exist for a reason. Your BI systems should already have industry-standard security in place, and you shouldn’t undercut it by failing to set limits on authorization. As exciting as it can be to share data, it’s always critical to be sure that you and those you share it with have a right to access it.

3. Price Traps

At the petabyte scale, the cost of storing and transmitting data can be staggering. It’s easy to buy into the argument from vendors that big data costs mere pennies per gigabyte, but a petabyte is a million gigabytes, and those pennies add up. Likewise, vendors love to price SaaS systems on a per-user basis. You always want to make sure that your operation is paying the lowest feasible prices for its BI systems, and that often means negotiating with vendors. Whenever possible, try to arrive at flat prices or low rates with strict limitations in place.

It’s also important to bear in mind that many vendors are hoping you’ll go over your limits. Make sure your BI implementations shut down access to resources before they cause your fees to go through the roof. Remotely hosted storage and processing providers have built their business models on the belief that people rarely show restraint when playing with a fun toy. Contain yourself.
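To make the "pennies per gigabyte" math concrete, and to show one way a hard stop might look, here is a minimal sketch. The price of $0.02 per gigabyte is a made-up figure, and `UsageGuard`, `BudgetExceeded` and `monthly_storage_cost` are hypothetical names, not part of any vendor’s API.

```python
GB_PER_PB = 1_000_000  # decimal units: 1 petabyte = 1,000,000 gigabytes


def monthly_storage_cost(petabytes, price_per_gb_usd):
    """Cost of storing the given volume for one month at a flat per-GB price."""
    return petabytes * GB_PER_PB * price_per_gb_usd


class BudgetExceeded(Exception):
    """Raised when a request would push spending past the agreed limit."""


class UsageGuard:
    """Tracks spending and refuses work that would exceed the monthly budget."""

    def __init__(self, monthly_budget_usd, price_per_gb_usd):
        self.monthly_budget_usd = monthly_budget_usd
        self.price_per_gb_usd = price_per_gb_usd
        self.spent_usd = 0.0

    def charge(self, gigabytes):
        cost = gigabytes * self.price_per_gb_usd
        if self.spent_usd + cost > self.monthly_budget_usd:
            # Shut down access before the fee is incurred, not after.
            raise BudgetExceeded(
                f"request of ${cost:.2f} would exceed the "
                f"${self.monthly_budget_usd:.2f} budget"
            )
        self.spent_usd += cost
        return cost
```

At the assumed rate, `monthly_storage_cost(1.0, 0.02)` works out to roughly $20,000 a month for a single petabyte. A real deployment would more likely lean on the quota or billing-alert features the provider offers, but the principle is the same: the limit is enforced before the spending happens.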

4. Don’t Let Data Delay Decisions

There’s always pressure in the world of BI to have the freshest data. The problem with this attitude is that it can inhibit decision-making processes and ultimately undermine the value of analysis. Your operation cannot afford to be impaired by a ceaseless wait for new data.

De-emphasizing the importance of using the absolute freshest data can also help you realize speed and efficiency gains. For example, it’s easy to see how caching your data can improve performance. It does, however, come at the cost of forgoing access to the absolute freshest data. If the arrival of a small sliver of data can disrupt the decisions being made based upon it, that fact will often raise more questions about the volatility of what’s being studied than about the freshness of the data.
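The caching trade-off can be sketched in a few lines. The class below is a hypothetical, minimal time-based cache: it reuses a result until it is older than a chosen age, then reloads it. The names `TTLCache`, `loader` and `max_age_seconds` are illustrative only.

```python
import time


class TTLCache:
    """Serves a cached value until it is older than max_age_seconds."""

    def __init__(self, loader, max_age_seconds, clock=time.monotonic):
        self._loader = loader          # function that fetches fresh data
        self._max_age = max_age_seconds
        self._clock = clock            # injectable so the cache can be tested
        self._value = None
        self._loaded_at = None

    def get(self):
        now = self._clock()
        is_stale = self._loaded_at is None or now - self._loaded_at >= self._max_age
        if is_stale:
            # Pay the cost of a fresh load only when the cached copy has expired.
            self._value = self._loader()
            self._loaded_at = now
        return self._value
```

Choosing `max_age_seconds` is the decision the tip describes: a larger value means faster reports and fewer expensive queries, at the price of data that may be that many seconds out of date.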

5. Don’t Discard the Outliers

The presentation of a large data set almost always leads to questions about outliers. There’s a strong temptation to discard the outliers and present the cleanest visuals possible. The willingness to retain outliers, however, can be a signal of quality and honesty. Outliers can and should be discussed when they’re discovered. If you expect to be asked about the lack of smoothness that outliers can create, the wisest choice may be to explain that throwing them out inhibits discussion and is frequently a sign of manipulation. It’s always better to follow the data wherever it goes.
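One way to keep outliers in the picture is to flag them rather than delete them. The sketch below uses the common 1.5 × IQR rule, which is a choice made for this example and not something the tips prescribe. The function name `flag_outliers` and the sample numbers are made up.

```python
import statistics


def flag_outliers(values, k=1.5):
    """Label each value as an outlier or not, without removing anything."""
    q1, _, q3 = statistics.quantiles(values, n=4)  # quartile cut points
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [{"value": v, "is_outlier": v < low or v > high} for v in values]


# Hypothetical daily figures with one unusual day.
daily_sales = [10, 12, 11, 13, 12, 11, 95]
flagged = flag_outliers(daily_sales)
outliers = [row["value"] for row in flagged if row["is_outlier"]]
```

Every original value is still present in `flagged`, so a chart can highlight the unusual points and the discussion about them can happen in the open.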

Stay tuned for part two of this two-part blog post.