Essentials of Data Governance


In the era of emerging technologies, data has become essential for organizations. With rapid digital transformation across industries, gaining a competitive advantage is crucial for thriving in the market. Today, data is the new “oil” that forms the core of an organization’s business growth. However, the rate of data generation has become enormous. A recent report by Towards Data Science puts data generation at a whopping 2.5 quintillion bytes each day, and current projections estimate that the volume of data generated will rise to 133 zettabytes by 2025.

In recent years, the number of data breach cases has doubled, and the possibility of a breach is an imminent threat to any business. To bolster data protection, it is of utmost importance to have a robust data governance framework. As per IBM’s data breach reports, the average cost of a data breach stands at $3.86 million, while in the USA alone it averages $8.64 million.

A robust data governance framework is needed to tackle such challenges. Standard data governance ensures data security, quality, and integrity while providing traceability of data origins. Data governance can be implemented successfully when high-quality data is readily available along with crucial information about the data types, which is achievable with a data catalog. Besides, an organization attains firmer control over its data usage policies when a regulatory body imposes stricter guidelines. Several robust regulations today place a strong emphasis on data governance; the most well-known is the General Data Protection Regulation (GDPR). Furthermore, a data governance approach reaches its ultimate goal within an enterprise through its essential components, namely processes, policies, access controls, and data protection, which encompass the entire data-related workflow within an organization. Tech giants such as Microsoft have contributed significantly to data governance requirements with the Azure Purview offering, which has achieved wide acceptance in the industry.

This article delves into data governance and its regulations.

Data Governance Overview

Data governance is a strategy that incorporates an organization’s practices, processes, and technical requirements into a framework through which the organization can standardize its workflow, thereby protecting and appropriately managing its data assets. A useful data governance model must be scalable, ensuring that all policies, processes, and use cases are applied accurately to transform a business into a data-driven enterprise.

Another crucial aspect of data governance is conducting risk assessments and ensuring compliance. Successful integration of data governance is determined by efficient data management and data security within the framework. An ideal governance policy must address the critical components of data storage, the original data source, and a well-defined data access strategy. Furthermore, data governance solutions focus on providing response plans for data misuse and unauthorized access.

Data governance and data management are often used synonymously, but it is essential to understand that data governance forms a significant part of a data management model.

Data Catalog

A data catalog acts as the inventory of the critical data assets in an organization, and the use of metadata helps manage that data more efficiently. Data professionals benefit from a data catalog as it helps in collecting and organizing data, making data easier to access, and enriching metadata to support data discovery and governance. Because the data generated in the day-to-day functioning of an organization is enormous, finding relevant data for specific tasks becomes challenging. Additionally, data accessibility is demanding due to the various legal regulations of the organization and of a particular country’s government. The key factors to understand are how data moves within an organization, which individuals will have access to it, and the purposes for which they want to access it. Such tracking protects the data by limiting access to unauthorized personnel. Thus a data catalog plays a crucial role in addressing some of these challenges.

  • A data catalog provides all the essential data required by an organization from a single point, reducing the time spent searching for data.
  • Creating a business vocabulary.
  • Preventing data lakes from turning into data swamps.
  • Identifying the different structures of the data.
  • Availability of high-quality and reliable data.
  • Possibilities for data reuse.

An organization can achieve a competitive advantage with the appropriate use of data. Therefore the data should be trustworthy and come from appropriate sources. Some of an organization’s key members, such as C-level executives, use data for business decisions. Thus, a data catalog becomes useful for examining cost-saving and operational-efficiency factors with a keen eye on fraud and risk analysis.
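To make the idea concrete, the following is a minimal, hypothetical sketch of what a data catalog entry might capture for a single dataset: descriptive metadata, ownership, sensitivity, and lineage. The field names and values are illustrative assumptions, not any specific product’s schema.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class CatalogEntry:
    """Illustrative metadata record for one dataset in a data catalog."""
    name: str                      # business-friendly dataset name
    source_system: str             # where the data originates
    owner: str                     # accountable data steward
    sensitivity: str               # e.g. "public", "personal", "sensitive-personal"
    glossary_terms: List[str] = field(default_factory=list)  # business vocabulary
    lineage: List[str] = field(default_factory=list)         # upstream datasets

# Example entry: a sales table sourced from a CRM system (hypothetical values).
sales_orders = CatalogEntry(
    name="Sales Orders",
    source_system="CRM",
    owner="sales-data-steward@example.com",
    sensitivity="personal",
    glossary_terms=["order", "customer"],
    lineage=["crm.raw_orders"],
)

# A catalog supports discovery: find every dataset tagged with a glossary term.
def find_by_term(entries, term):
    return [e.name for e in entries if term in e.glossary_terms]

print(find_by_term([sales_orders], "customer"))  # ['Sales Orders']
```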

Data Governance Framework

A data governance framework allows an organization to focus on its business goals and data management challenges while providing the right means to attain them quickly and securely. Besides, the results of a data governance integration are scalable and measurable.

Figure. Key Participants in a Data Governance Framework. Source

 

Some of the essentials of a data governance framework are:

  • Use Cases

The data governance framework must address critical factors such as the use cases for the various business scenarios in an organization. Data governance use cases should link the need for a data governance framework to its contribution towards achieving business goals. Ideally, use cases are derived from significant factors in an organization such as revenue, cost, and the associated risks. These use cases address the enrichment of products and services, innovation, market opportunities, and the ability to achieve them at a reduced cost of maintenance with efficiency, auditing, and data protection.

  • Quantification

Quantifying data is an absolute necessity, as it demonstrates the impact of data governance integration in the organization. A business needs to show that it is covering all the categorized use cases, with evidence to monitor performance and provide future insights.

  • Technical Benefits

With technology added to the workflow, data governance solutions can efficiently address some of the critical components, thereby ensuring efficiency. The data governance framework must address factors such as the need for technology investment and the primary members who will work with data-related processes. A technical infusion in the workflow also enables easier discoverability of data definitions, data categories, and data lineage, and the appropriate classification of data as trustworthy or untrustworthy. It also makes it possible to create a feedback mechanism for resolving regulatory issues and policies concerning data usage.

  • Scalability

The data governance policies should be capable of providing scalable results. Using a scalable model provides growth opportunities for an organization by addressing the problems in a data lifecycle. The primary focus is to introduce new tools to reduce operational costs and provide data protection for business growth.

Data Governance Processes

The data governance processes comprise the following.

  • The organization must be mindful of the essential documents such as regulatory guidelines, statutes, company policies, and strategies.
  • A clearly defined workflow in which legal mandates, policies, and objectives are synchronized to help the organization meet data governance and management compliance.
  • Data metrics to be incorporated to measure the performance and the quality of the data.
  • Principles of data governance to be met.
  • Identification of the data security and privacy threats.
  • Control measures to ensure smoother data flow with a precise analysis of the risks.

Data Governance Policies

Under data governance, various policies determine the effectiveness of the organization’s operational strategies. Policies related to data accessibility, data usage, and data integrity are crucial for successful data governance implementation. The most important policies that an organization must follow for successful data management are as follows.

  • Data Structure Policy
  • Data Access Policy
  • Data Usage Policy
  • Data Integration Policy

 Privacy and Compliance Requisites

Organizations handle a significant amount of highly sensitive data. Therefore, an organization needs to follow the regulatory compliance requirements of data governance. In the context of business, privacy refers to an individual’s right to control which personal data is collected and used and which sensitive information should be restricted. Under EU data protection rules, personal data is defined as data that contains an individual’s name, address, telephone number, or email address. Sensitive personal data, on the other hand, is distinguished clearly as data containing information on a person’s ethnicity, political opinions, religion, race, health, criminal convictions, or trade union membership. Such data is subject to stricter guidelines that must be followed with due diligence.
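As a rough illustration of the distinction above, the sketch below classifies record fields into ordinary personal data and sensitive personal data. The field lists simply mirror the categories named in this section; they are illustrative assumptions, not an authoritative legal mapping.

```python
# Illustrative field categories drawn from the definitions above (not legal advice).
PERSONAL_DATA_FIELDS = {"name", "address", "telephone_number", "email_address"}
SENSITIVE_PERSONAL_DATA_FIELDS = {
    "ethnicity", "political_opinion", "religion", "race",
    "health_information", "criminal_conviction", "trade_union_membership",
}

def classify_field(field_name: str) -> str:
    """Return the governance category a field falls into."""
    if field_name in SENSITIVE_PERSONAL_DATA_FIELDS:
        return "sensitive-personal"   # stricter handling rules apply
    if field_name in PERSONAL_DATA_FIELDS:
        return "personal"
    return "non-personal"

record = {"name": "Jane Doe", "health_information": "redacted", "order_total": 42.0}
print({f: classify_field(f) for f in record})
# {'name': 'personal', 'health_information': 'sensitive-personal', 'order_total': 'non-personal'}
```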

Role of General Data Protection Regulation (GDPR)

The General Data Protection Regulation (GDPR) was established in 2016. Its primary aim was to provide a framework for data privacy standards. GDPR states that any company looking to conduct business in Europe must adhere to its data protection norms. The regulation has strict guidelines that ensure the protection and privacy of EU citizens’ personal data. The mandate was an update to the previous Data Protection Directive in Europe.


Figure. Crucial Requirements of GDPR. Source

 

Under GDPR, the mandate’s scope extends territorially, providing a well-defined law for processing personal data for anyone offering business services in Europe. Organizations or individuals aiming to provide their services without a physical presence in Europe are also monitored under GDPR guidelines; this includes online businesses that require users to accept cookies to access their services. GDPR also differentiates the various data types and specifies which data is considered personal data under the mandate.

Furthermore, direct and indirect data are interlinked with the identification of data subjects. Data subjects are people who can be identified from the information presented in the data. The data in this context is personal information such as names, addresses, IP addresses, biometric data logs, citizenship-based identification, email addresses, and profession.

Additionally, the GDPR mandate ensures that data is collected within the limits of the law and is highly secured while it exists in the organization’s records, with stricter rules for its use. The primary categories of GDPR data governance requirements are:

  • Personal data must be classified, and personally identifiable data must have limited usability. Individuals can access their data and hold the right to request removal or rectification of personal data. The mandate also states mandatory data processing requirements and data portability.
  • Data protection is a must, and it should cover all aspects of safeguarding personal data collected. Also, there must be confidentiality, integrity, and availability of the data collected for business purposes. The organizations should also adhere to data restoration regulations for scenarios that may involve data loss due to technical failure or accidents.
  • The collected data must be well-documented as per legal procedures.

Access Controls

Access controls form an integral part of access governance, regulating who can access data. The critical areas covered comprise guidelines specifying who can access and view the data. Additionally, they require the purpose of data access within the organization to be stated. Enforcing access controls eliminates unauthorized access to data.

As per the GDPR mandate, some of the data protection requirements must enforce specific procedures.

  • There must be accountability associated with data protection requirements. Data protection personnel must be appointed to manage data and monitor its activities for organizations involved in data processing activities. The appointed individuals must ensure that the data protection standards are met.
  • Data storage is an essential factor for data privacy. Therefore, organizations must maintain a data map and data inventory to track the source of data and where it is stored. The source includes the system from which the data was generated, and data lineage is tracked to provide comprehensive data protection.
  • Data accuracy is paramount, and organizations must keep up-to-date data to achieve high-quality data. Also, data quality reporting must be followed to keep up with data quality standards.

Data Protection

Some of the data protection capabilities that support a governance framework include:

  • Data intelligence provisions for gaining insights with 360-degree visibility of data.
  • Identifying remedies for security and privacy issues.
  • Protecting sensitive data with access governance and ensuring no overexposed data exists, using data governance methods.
  • Integrating artificial intelligence capabilities to identify dark data and its relationships.
  • Assigning labels automatically to protect data throughout its workflow and lifecycle.
  • Rapid data breach notification and investigation.
  • Automated procedures for classifying sensitive and personal data.
  • Automated compliance and policy checks.
  • In-depth assessment of risk scores, with metrics depending on the data type, location, and access consent (a minimal scoring sketch follows this list).
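The following is a minimal, hypothetical sketch of how such a risk score might be computed from data type, storage location, and access consent. The weights and categories are invented for illustration only.

```python
# Hypothetical risk weights; a real governance tool would derive these from policy.
TYPE_WEIGHTS = {"non-personal": 1, "personal": 3, "sensitive-personal": 5}
LOCATION_WEIGHTS = {"inside-eu": 1, "outside-eu": 2}

def risk_score(data_type: str, location: str, has_consent: bool) -> int:
    """Combine simple factors into a single comparable risk score."""
    score = TYPE_WEIGHTS.get(data_type, 1) * LOCATION_WEIGHTS.get(location, 1)
    if not has_consent:
        score *= 2   # missing access consent doubles the risk in this sketch
    return score

print(risk_score("sensitive-personal", "outside-eu", has_consent=False))  # 20
print(risk_score("personal", "inside-eu", has_consent=True))              # 3
```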

Reimagining Data Governance with Microsoft Azure Purview

Azure Purview is a unified data governance service from Microsoft. The service enables managing and governing on-premises, multi-cloud, and software-as-a-service (SaaS) data. Users have access to a holistic, up-to-date map of the data thanks to automated data discovery. Besides, classification of sensitive data is more manageable, along with end-to-end data lineage. With Azure Purview, data consumers are assured of valuable and trustworthy data. Some of the key features of Azure Purview are discussed in the following section.

  • Unified mapping of data

The Purview data map feature establishes the foundation of practical data usage while following data governance standards. With Purview, it is possible to automate the management of metadata from hybrid sources. Consumers can take advantage of data classification with built-in classifiers and Microsoft Information Protection sensitivity labels. Finally, all the data can be easily integrated using Apache Atlas APIs.
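As a rough illustration of that Atlas-style integration, the sketch below searches a catalog through an Atlas-compatible REST endpoint. The endpoint URL, entity fields, and authentication token are placeholders and assumptions, not a verified Purview contract; adapt them to your environment.

```python
import requests

# Placeholder values; substitute your catalog endpoint and a valid bearer token.
ATLAS_ENDPOINT = "https://<your-catalog>/api/atlas/v2"
TOKEN = "<access-token>"

def search_catalog(keyword: str, limit: int = 10):
    """Run a basic keyword search against an Atlas-compatible catalog API."""
    response = requests.get(
        f"{ATLAS_ENDPOINT}/search/basic",
        params={"query": keyword, "limit": limit},
        headers={"Authorization": f"Bearer {TOKEN}"},
        timeout=30,
    )
    response.raise_for_status()
    # Each returned entity typically carries a name, type, and classification labels.
    return response.json().get("entities", [])

for entity in search_catalog("sales"):
    print(entity.get("displayText"), entity.get("typeName"), entity.get("classificationNames"))
```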


Figure. Unified Data Mapping using Azure Purview. Source

 

  • Trusted Data

Purview offers a data catalog feature that allows easier searching of data using technical terms from the data vocabulary. The data can be easily identified according to its sensitivity level.

  • Business Insights

The data supply chain can be interpreted conveniently, from raw data through to business insights. Purview offers the option to automatically scan the Power BI environment and analytical workspaces. Besides, all assets can be discovered, along with their lineage, in the Purview data map.

  • Maximizing Business Value

SQL Server data becomes more discoverable with a unified data governance service. It is possible to connect SQL Server to a Purview data map to achieve automated scanning and data classification.

  • Purview Data Catalog

The Purview data catalog supports importing existing data dictionaries, providing a business-grade glossary of terms that makes data discovery more efficient.

Conclusion

Business enterprises generate a staggering amount of data daily. Used appropriately, that data can be an asset for gaining business value. Therefore, organizations need reliable data that can provide meaningful business insights. Advanced technologies such as artificial intelligence and data analytics provide an effective way of integrating data governance into the operational workflow. Today, tech giants like Microsoft, with data governance offerings such as Azure Purview, have paved the way for other organizations to adopt data governance. Many startups have followed in these footsteps, acknowledging the importance of data governance for high-quality data and data privacy, and now offer several data governance solutions in the market. A robust data governance framework is essential for maintaining the data integrity of the business and its customers.

 

 

Integrate Data Silos with Azure Synapse Analytics

The Roadblock for Digital Transformation


Source: Harvard Business Review

It is clearly established that Digital Transformation is the key to success and even survival for organizations, even more so with the current global crisis due to COVID-19. 64% of executives believe that they have less than four years to complete digital transformation or they will go out of business. 91% of global executives surveyed by Harvard Business Review feel that effective data and analytics strategies are essential for digital transformation. This data-driven culture is critical to spark innovation and drive efficiencies, which is crucial for survival.

But, 80% of the respondents also say that their organizations are struggling to become mature users of data and analytics even though 79% of the employees use data and analytics at least once a week. What gets in the way of organizations effectively using data and analytics for business decisions?

More than half (55%) of the executives say the key roadblock stems from data silos and difficulty managing data coming from multiple systems. Digital transformation leads to a lot of data being captured across various systems which can be extremely valuable. However, less than 20% of this data can ever be analyzed due to the silos. This is mainly because of the disconnect between Big Data analytics, enterprise Data Warehousing, Analytics, and Artificial Intelligence/Machine Learning.

Simplifying Analytics

The need of the hour is to simplify analytics in a manner that breaks down these silos and makes the most of the data available for analysis without having to jump through hoops. In an ideal world, streaming operational data should be available for immediate analysis to generate reports and run models on the data. This is not a trivial problem.

Operational data is a mix of structured and unstructured data which is generally stored in a Data Lake, not suitable for analytics. Hence, the operational data needs to be imported into a Data Warehouse. The reporting and analytics services can then run on the Data Warehouse.

This creates three key issues. 

  1. Lag between the operational and analytics data stores due to the ELT pipeline. 
  2. Balancing the operational, ELT, reporting, and analytics workloads in the cloud. 
  3. Efficient and effective model management.

Organizations would really benefit from a framework which effectively addresses these issues and removes the roadblocks to data maturity. Azure Synapse Analytics is a step in the right direction with a big promise – Limitless Analytics Services in the cloud.

Azure Synapse Analytics to the Rescue


Source: Microsoft

Microsoft has launched Azure Synapse Analytics to fulfill the promise of limitless analytics services. This service creates a single place to collaborate for Data Engineers, Database Administrators, Data Scientists, Business Intelligence Analysts, and Business Users with everyone accessing the same data.

The service offers a distributed query processing engine, a versatile form factor for computing (cluster/serverless), and a single experience for users to manage the end-to-end process. This provides the much-required flexibility in scaling and a great user experience, which promotes collaboration.

Many features of Azure Synapse Analytics are now generally available with many more in the pipeline. We believe that this service will evolve rapidly into the standard for analytics at scale for organizations.

Benefits of Azure Synapse Analytics

Azure Synapse Analytics allows teams to seamlessly work together. However, the benefits go beyond this. Some additional benefits are:

1. Unified Experience

 


Source: Microsoft

Azure Synapse Analytics allows users to ingest, prepare, manage, serve, visualize, and analyze data through a unified experience. Users can bring their analytics to where the data is located, rather than switching to a different interface. This gives a big boost to productivity.


2. Limitless Scale

Azure Synapse Analytics enables limitless scaling for data and analytics in the cloud. Data professionals can derive insights from all the data across data warehouses and big data analytics systems at speed. They can query both relational and non-relational data at petabyte scale using the T-SQL language. Furthermore, they can benefit from a versatile form factor of clusters and serverless computing. Finally, they can run analytics alongside mission-critical workloads with intelligent workload management, workload isolation, and limitless concurrency.
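As a rough sketch of that serverless, T-SQL-style access, the snippet below submits an ad-hoc query over Parquet files in a data lake through a Synapse serverless SQL endpoint. The server name, credentials, and file path are placeholders; the OPENROWSET pattern follows the commonly documented approach, but treat the details as assumptions to adapt.

```python
import pyodbc

# Placeholder connection details for a Synapse serverless SQL endpoint.
conn = pyodbc.connect(
    "DRIVER={ODBC Driver 17 for SQL Server};"
    "SERVER=<workspace>-ondemand.sql.azuresynapse.net;"
    "DATABASE=master;UID=<user>;PWD=<password>"
)

# Query Parquet files directly in the lake - no ETL into a warehouse table first.
query = """
SELECT TOP 10 *
FROM OPENROWSET(
    BULK 'https://<storage-account>.dfs.core.windows.net/<container>/sales/*.parquet',
    FORMAT = 'PARQUET'
) AS sales
"""

for row in conn.cursor().execute(query):
    print(row)
```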


3. Integrate Business Intelligence and Machine Learning

Azure Synapse Analytics allows users to integrate Power BI and Azure Machine Learning within the Azure Synapse Studio. Then BI professionals and Data Scientists can tap into the available data immediately to create faster insights.


4. Cloud-Native HTAP Implementation

The announcement of Azure Synapse Link (Preview) brings cloud-native hybrid transactional and analytical processing (HTAP) to Azure Cosmos DB, with plans to expand it to other data stores in the future. It creates a tight, seamless integration between Azure Cosmos DB and Azure Synapse Analytics, enabling users to run near real-time analytics over operational data stored in Azure Cosmos DB.
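A minimal sketch of what that looks like from a Synapse Spark notebook is shown below: reading the Cosmos DB analytical store through a linked service. The linked service and container names are placeholders, and the "cosmos.olap" format follows the commonly documented Synapse Link pattern; adapt it to your workspace.

```python
# Run inside a Synapse Spark notebook, where `spark` is the provided SparkSession.
orders = (
    spark.read
         .format("cosmos.olap")                                  # Cosmos DB analytical store
         .option("spark.synapse.linkedService", "<CosmosDbLinkedService>")
         .option("spark.cosmos.container", "<orders-container>")
         .load()
)

# Near real-time aggregate over operational data, without touching the transactional store.
orders.groupBy("status").count().show()
```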


Want to learn more? Click here for a quick and informative video that demonstrates the power of Synapse Analytics Link.

5. Price-Performance

Price-performance is also a critical part of data solutions. According to Microsoft, Azure Synapse Analytics offers better price-performance as compared to Google BigQuery and Amazon Redshift based on field tests done by GigaOm.


Source: GigaOm Report

The TPC-H and TPC-DS results published by Microsoft show a significant price reduction for Azure Synapse Analytics as compared to the others.


Source: Microsoft

Speed

As demonstrated in this video from Ignite 2019, Azure Synapse Analytics can be blazing fast in a petabyte-scale environment combining relational and non-relational data. This can be a game changer for organizations where faster decision making can lead to substantial profit increase.


Getting Started


We have multiple offers that make it easy for organizations to get started with Azure Synapse Analytics no matter what stage they are at in the process. 

  1. Just getting started?  We offer a free two-hour lunch and learn workshop to help you understand this service. 
  2. Do you already know about the service but need help figuring out your next step?  We can conduct an assessment, strategy, and roadmap workshop that will provide your organization a plan with how to move forward. 
  3. Do you have a roadmap but need help with implementation? We can get you started with the first pilot which can be completed in 2-4 weeks. Once you have experienced the value from the pilot, we can help you with the implementation as per the roadmap.

 

Contact us at info@optimusinfo.com to get started.

Starting a Data Project


It’s exciting to hear ‘Data is the new Oil’ or the ‘new Gold’ or the new ‘something valuable’. What I dread, though, is the day we hear ‘Data is the new fad and a complete waste of money’. I hope that day never comes!

A lot will depend on how businesses approach data projects. Right now, it could go either way. There are many organizations throwing money at data projects to ensure they are not left behind. There are many more who are not even getting started fearing the outcome or the futility of it. If you belong to either camp, I will share a simple process to maximize the return on your data projects.

Where Data Projects fail

Data projects are complex and resource intensive and hence have many failure points. Most are the failure points of any complex project: data availability, data quality, team quality, teamwork, communication, and so on. There is one, though, which is unique to data projects and at the root of all failed projects. It’s what I call the ‘rabbit hole question’. If a data project begins with this question, it is likely to fail.

The Rabbit Hole Question

This is the question I most often hear from companies wanting to start data projects. It is some variation of ‘What can I do with my data?’. I agree that it is the most natural question to ask; however, it is not the question that is going to set you up for success. It is the dream question for the salesperson, who can now engage the solution architects, who will then build an exciting solution. A solution that is likely to cost a lot of money and take a lot of time. Worse, it may not yield any results. Why? Because it’s the ‘rabbit hole question’.

This question propels everyone to start thinking about everything that could be done with the data, or everywhere the algorithm or the tool could be applied. There are many possibilities and hence many potential projects. But there is no way to figure out what we will get at the end of these projects. We will only discover it as we go along. And chances are we may not like what we see in the end, if we see anything at all.

Avoiding the Trap

So, how do we avoid the ‘rabbit hole question’? Where do we start and how do we proceed to maximize our chance of success? The answer is to flip the question – ask “What can my data do for me?”. Better still, use a top-down approach of starting with your Business Objectives. The graphic below illustrates a more sensible approach to data projects.

Figure. A more sensible approach to data projects.

The key is to break the process into two phases – Planning and Execution. Planning requires little time but a lot of thinking, and it is crucial for success.

During planning, it is important to stop thinking about the data you have and what to do with it. Instead, start with the key objectives for your business. Next, think about the Actions required to achieve those objectives. That leads us to thinking about the kind of decisions we need to take. Then we can ask the question – “What insights do I need to take these decisions?”. These required insights then lead us to the relevant data and findings.

In this process, we may find that we do not have some of the required data. We can then start collecting it. In the meantime, we can switch to execution with the data we already have. We can use the data and findings to generate relevant insights. These insights then drive the appropriate decisions. These decisions then guide us to the actions required to achieve our objectives.

Data Strategy Workshop

In our experience, the knowledge required for Planning is available in the organization. It usually sits in different silos though. Also, we find that the key stakeholders are usually not aligned.

Hence, we recommend conducting a Data Strategy workshop. Such a workshop aligns all stakeholders around the business objectives. It then allows the group to connect the objectives all the way to the Data they have.


The outcome of the Workshop is an aligned Data & AI Roadmap. We can then jump into execution with the least effort and cost. The initial success then builds confidence in the organization for further projects. It also frees up time of critical resources to contribute to these projects.


Optimus has already conducted Data & AI workshops for various organizations with fantastic results. If you would like your organization to have a clearly defined, cost effective, Data & AI Roadmap, please contact us at rajeev.roy@optimusinfo.com 

 

 

Data Lakes – Deep Diving into Data Like Never Before

As data analytics, machine learning and AI continue to rapidly evolve, so, too, does the need to acquire, access and catalogue large amounts of data required to power data analysis. This has given rise to something called a “data lake”.

The standard model for data storage has been the data warehouse, but in a traditional data warehouse the data must be classified and formatted carefully before being loaded into the warehouse (schema on write). Because the data is so formally structured, the questions must be carefully defined as well. A data warehouse is expensive, too, and affordable only to corporations large enough to support the enormous costs needed to design, build, house and maintain the data center infrastructure and associated software.

The Data Lake Difference

The data lake is also a storage repository but with several significant differences:

  • The data lake can hold all types of data: structured, semi-structured and unstructured.
  • The data doesn’t have to be filtered or sorted before storage – that happens when the data is accessed (schema on read); a brief sketch of schema on read follows this list.
  • The costs of a data lake are vastly diminished thanks to scalable storage on demand in a cloud-based platform like Microsoft Azure which also eliminates costly infrastructure.
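Here is a minimal, illustrative sketch of schema on read with PySpark: raw, semi-structured files land in the lake as-is, and a schema is applied only at query time. The storage path and field names are placeholders.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("schema-on-read").getOrCreate()

# Raw clickstream JSON was dumped into the lake without any upfront modelling.
clicks = spark.read.json("abfss://<container>@<account>.dfs.core.windows.net/raw/clicks/")

# The schema is inferred and applied only now, at read time.
clicks.printSchema()

# Ask a question of the raw data directly - no ETL into a warehouse first.
clicks.groupBy("page").agg(F.count("*").alias("views")).orderBy(F.desc("views")).show(10)
```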

Optimus Information recently asked Ryan O’Connor, Chief Technical Strategist and Milan Mosny, Chief Data Architect, to talk more about data lakes and how Optimus is using the technology to further the business goals of our clients.


Q. How do you define a data lake?

Milan: A data lake holds data that is large in volume, velocity or variety. This is data acquired from logs, clickthrough records,  social media, web interactions and other sources.


Q. So, when would a business use a data lake versus a data warehouse?

Milan: A business unit will use a data lake to answer questions that a warehouse can’t answer. These are questions that need huge amounts of data that won’t necessarily be present in a warehouse. The data lake can supply answers that will increase the agility of the decision making or the agility of the business processes. Without a data lake, a business will have to use an ETL (extract, transform and load) process; they will have to define the ETL, build it and then load the data into the warehouse before they can begin to create the questions to get the answers they’re looking for. The data lake eliminates the need for the whole ETL process and saves enormous amounts of time.


Q. Is there a minimum size or amount of data needed to start a data lake?

Milan: I wouldn’t worry about minimum sizes. The best way to approach creating your own data lake is to start with a variety of data and then grow the lake from that point of view. One of the strategic strengths of a lake is that it holds so many different kinds of data from multiple (and different) sources. Variety is the key and that’s where I would focus.


Q. Data lakes are typically on cloud platforms like Azure. Can a data lake be on premises?

Milan: It can be, but only really big companies can justify the cost of running the extra servers needed to store the data. Why would you even bother when Azure and other cloud platforms are so scalable and affordable? It doesn’t make much sense, financially. Plus, Azure contains so many of today’s powerful data lake technologies like Spark, a lightning-fast unified analytics engine, Azure Databricks and Azure Data Lake Analytics. In fact, Microsoft has a suite of superb Azure analytics tools for data lakes. The nice thing about these tools is that you can work on storage which is extremely affordable with Azure. So, you dump your data into storage on Azure and then you can spin up the analysis tools as you need them – without having to spin up the Azure cluster at the same time.


Q. Since a data lake can hold all sorts of data from different sources, how do you manage a data lake?

Ryan: The key is how you organize the ecosystem of ETLs, jobs and tools. You can use Azure Data Factory or Azure Data Catalogue which lets you manage the documentation around the datasets, what’s in each dataset and how it can be used and so on. As Milan said, Microsoft has recognized the massive impact of data lakes and has already produced some tremendous tools specifically for them.


Q. How is Optimus going to introduce data lakes technology to its customers?

Ryan: Well, we are already implementing data lakes in our analytics practice. What we’re offering clients right now is a one-week Proof of Concept (PoC) for $7500 CAD in which Optimus will do the following:

  • Identify a business question needing a large dataset that cannot be answered with a client’s current BI architecture
  • Ingest data into Azure Data Lake Storage
  • Define 1 curated zone
  • Create a curated dataset using Spark, Azure Databricks or Azure Data Lake Analytics with R, Python or USQL (a sketch of this step follows the list)
  • Create 1 Power BI dashboard with visuals that reflect the answer to the business question
  • Provide a Knowledge Transfer
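As a rough illustration of the curation step above, the PySpark sketch below reads raw files from the lake’s landing area, cleans and filters them, and writes the result to a curated zone. Zone names and columns are placeholders for whatever the PoC’s business question requires.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("curate-zone").getOrCreate()

LAKE = "abfss://<container>@<account>.dfs.core.windows.net"

# Read the raw landing zone as-is.
raw = spark.read.option("header", True).csv(f"{LAKE}/raw/transactions/")

# Curate: cast types, drop obvious bad rows, keep only the fields the question needs.
curated = (
    raw.withColumn("amount", F.col("amount").cast("double"))
       .filter(F.col("amount").isNotNull() & (F.col("amount") > 0))
       .select("transaction_id", "store_id", "amount", "transaction_date")
)

# Write the curated zone in Parquet so Power BI (or further analysis) can pick it up.
curated.write.mode("overwrite").parquet(f"{LAKE}/curated/transactions/")
```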

Q. Speaking of Power BI, Optimus is a huge fan of this tool, correct?

Milan: That’s right. We love it because we can build out stuff quickly for our customers, especially when it comes to PoCs. For visualization of data, nothing beats Power BI, especially when it’s applied to data lakes. It can connect to Hadoop clusters, to large storage volumes – in fact, it can connect to just about anything, including APIs.


Q. What is the purpose of the “one-week PoC”? What will your customers get out of it?

Ryan: We are only doing one curated zone as part of our offer. A customer would have multiple business problems they would want to answer, of course, but this one-week PoC gives them a taste of what is possible. A large project would require a full analyze phase, architecture, build out, test and deploy methodology. A platform would also need to be chosen to show the data.

Milan: Our customers can expect us to set up the basic structures on Azure for them and we’ll give them examples of business questions around which they want to build so they can see how to expand it to other areas, as well.

 

A data lake can bring enormous opportunity for powerful data analytics that can drive significant business results. How it is set up and used is the key to how successful a role it can play in your company’s use of data analysis. Optimus Information can help by showing you what a data lake can do with our one-week PoC offer. Take advantage of this offer here.



Think Big: How Design Plus Data Will Change Your Business

Is design thinking catching your attention? It should. Data insights not available before can now transform your business models and allow you to lead in your industry when you incorporate elements such as predictive analytics, mobile dashboards and machine learning. This wave of change is forcing data architects to re-think and re-design how programs and applications must be built. To truly innovate, design teams need to push the design thinking envelope on almost every project.

“You can have data without information, but you cannot have information without data.”
– Daniel Keys Moran, computer programmer and science fiction writer.

Since the invention of the first computer, the world has been on a digital light-speed journey – one that has seen massive change in how we interact with our world and with each other. Today, there are more than 2.5 billion[i] smart phones carried in people’s pockets – each more powerful than the computers used to run the spacecraft that landed the first men on the Moon.[ii] In particular, how we interact with and gain insight from data has gone through an incredible transformation. We have evolved from the days of simple, historical reporting to a world where data is analyzed as it is generated.

The Way It Was

Reporting has always been a critical element for a business to thrive and we have been accustomed to seeing our reports – our data – in fairly standard and historic terms. Let’s take a straightforward quarterly sales report at a consumer retail company, for example. Simple data, like units sold, prices received, cost of goods, volume of shipments and so forth, would be gathered and stored over a three-month period and then used to generate a few charts and graphs. Conclusions would be drawn from this static data and the company would shift strategy based on the conclusions.

Perhaps the conclusions were accurate and maybe they weren’t. Regardless, that’s how it’s been done for a long time: based on the data available.

The Way It Is

Today, the capability exists to break down data into far greater detail, do it in real-time and through disciplines like machine learning and artificial intelligence, draw highly focused and accurate conclusions not at the end of a business quarter but at the end of each day, and, in many cases, as it happens.

IoT Changes Shipping Industry – Reduces Risk and Cost

A client that operates a fleet of tankers equipped with IoT sensors wanted to move beyond its basic data reports and drill deeper into the technical data gathered aboard its vessels. Optimus utilized elements from Microsoft’s IoT Suite, including Azure Data Factory, to create visually appealing reports and dashboards that contained information gathered from thousands of sensors throughout the fleet.

The results meant a far more in-depth data analysis than the company had been getting, delivering more accurate insight for more accurate business decisions. When it comes to tankers, a simple mistake can cost millions in terms of lost time, environmental disasters, financial penalties, missed deadlines and more.

Optimus solved the client’s existing problem while building a platform for continuous improvement with data analysis using Microsoft Azure tools. Because the data can be aggregated in the cloud, the client can analyze greater amounts of data over an extended period of time, thus further enhancing their shipboard operational analysis and implementing global cost saving efforts as a result.

Now, a business can make highly informed decisions immediately and adjust accordingly. Of course, it’s not simply analyzing a few traditional data points, like sales; it’s analyzing where those sales took place, in which store locations, even in which aisles or departments, at what time of day, from which shelf the customer chose a purchase, what the customer’s likely income level is– in other words, the more highly specialized the data, the more highly specialized and precise the conclusions that can be drawn.

Because it’s possible to generate highly detailed data and analyze it from so many different perspectives, every sector of the economy is making use of data analysis.

In the manufacturing sector, factory operations are being revolutionized[iii] by both big data and analytics. Sensors generate endless streams of data on the health of production line equipment, data that’s being examined by the minute for the slightest indication of a potential problem or defect. Conclusions are drawn and actions implemented immediately to avoid any breakdown and disruption in the production process. There’s a positive ripple effect to this: customers don’t experience delays and the company doesn’t experience a loss of revenue.

The virtually unlimited storage capacity in the cloud, coupled with highly sophisticated computer algorithms that can perform serious analysis in, literally, seconds, is placing tremendous demands on data architects. Programs and applications must be agile enough to allow for updates, added features and improvements without delay. This has meant developing new architecture that can not only run a program at lightning speed but can be altered or updated in the areas where it needs improvement, much like making incremental improvements to a car model but without re-designing the whole car every time.

Gone are the days of a monolithic software structure where data warehouses needed a year or more to be designed and several more months for data to be inputted. If missing data was discovered, it would mean an entire rebuilding of the program.

Microservices and Teams

Today, Optimus Information designs architecture so that updates, changes or improvements can be made to one area of a program or application without having to open up the whole program. By using microservices in our software development, Optimus has created functional teams whose responsibility is to just one area of a program. A team focuses only on its specific area and generates improvements without impacting other teams or resulting in an overhaul of an entire software product. Tremendous amounts of time are saved for our clients and the cost of updates or re-designs is driven down dramatically.

Optimus applies the same method to data gathering. By means of advanced tooling, our clients can store raw data, without pre-aggregating it, run a query on that raw data and have the answers they need in a matter of seconds. Previously, it would take weeks to get a result because the data would have to be assessed and compartmentalized as it was gathered and placed into structured environments before a query could be run. This is what we call modern data warehousing. The focus is on agility and speed.

Down the Road from Microsoft by Design

Optimus specializes in working with IT departments of companies that don’t or can’t spend the time and money to develop the cloud-based software architecture needed today. Optimus uses a suite of leading-edge services on the Microsoft Azure platform that allow us to select exactly the right components to solve a client’s problem. We are physically located close to Microsoft’s Vancouver and Redmond development centres.

Optimus is a Microsoft Gold Partner and, in that role, we work very closely with Microsoft on new product previews and trials that are in development, giving feedback that improves our customers’ end product. Optimus employees have often already kicked the tires on new Azure features before they are released. This keeps us at the forefront of rapidly changing technology and lets us give feedback as enhancements are designed.

If you want to enhance and sharpen the results of your data analysis, we invite you to contact us. We are happy to explore some “what-if” scenarios with you to help propel your data insights – and your business – forward exponentially. Reach out and schedule a virtual coffee anytime.

Game Changers: The Role of Big Data in the Future of Credit Unions

In 2002, Billy Beane was the general manager of the Oakland Athletics in Major League Baseball. Oakland was a small-market club with a budget to match, and it struggled to be competitive.

Because Oakland didn’t have the money of big market teams like the New York Yankees or Los Angeles Dodgers, Beane knew he couldn’t hope to attract the high-priced talent – the superstars – to play in Oakland.

Enter Paul DePodesta, aged 27, an economics graduate from Harvard with an analytical mind and a love of baseball. His arrival on the doorstep of the Oakland A’s gave birth to data analysis in professional sports.

He analyzed player stats, using computer algorithms, and his results allowed Oakland to sign inexpensive players that other teams dismissed. The A’s were propelled into the stratosphere of success, thanks to big data.

The A’s finished the 2002 season with 103 wins, the same number as the New York Yankees – but with a budget about a tenth the size.

This is the “secret sauce” in data analytics: the ability to take substantial amounts of information – in the case of Oakland, endless baseball player statistics – look for patterns and capitalize on what is found.

Credit Unions, Machine Learning and Data Analytics

Credit unions in Canada are rapidly embarking on the same exploration. Using machine learning and data analytics, these financial firms are finding ways to improve service to their clients while, at the same time, discovering nuggets of information from the vast amounts of data they collect, that can then be turned into business opportunities.

Virtually every customer transaction within a credit union is electronic, and the amounts of data being collected are staggering. The need to analyze this information is what drives credit unions today to embrace machine learning and data analytics.

Matthew Maguire is the Chief Data Officer at Co-Op Financial Services, a California-based company that operates an interlinked system of ATM machines throughout the U.S. and Canada. He argues that machine learning and data analysis are critical for mid-sized credit unions as they work to reinforce current customer relationships and build new ones.

“Data is coming in from different places and the challenge is… how do you make it all connect?[i]” he said.

Credit unions are moving quickly into data analysis. Through machine learning, which unearths customer transaction patterns by using algorithms, credit unions are learning a great deal about their customers and are designing strategies to capitalize on that in order to drive sales.

But, for credit unions, data enables other capabilities. Patterns of fraud can be easier to spot and shut down through data analysis.

When a client invests with a credit union, regulations require the client to complete what’s called a Know Your Client form, which essentially draws a profile of risk tolerance and investment objectives. If the client’s portfolio strays from that profile and becomes riskier, big data can alert the financial institution and the problem can be corrected before any monetary loss accrues to the client – or to hundreds of thousands of clients.
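As a toy illustration of the kind of check described above, the sketch below compares a portfolio’s current risk level against the risk tolerance captured on a Know Your Client profile and flags any drift. The thresholds and fields are invented for illustration only.

```python
# Hypothetical risk ladder: higher number = riskier.
RISK_LEVELS = {"conservative": 1, "balanced": 2, "growth": 3, "aggressive": 4}

def flag_kyc_drift(client_profile: str, portfolio_risk: str) -> bool:
    """Return True when the portfolio has drifted riskier than the KYC profile allows."""
    return RISK_LEVELS[portfolio_risk] > RISK_LEVELS[client_profile]

clients = [
    {"id": "C-001", "kyc_profile": "balanced", "portfolio_risk": "aggressive"},
    {"id": "C-002", "kyc_profile": "growth", "portfolio_risk": "balanced"},
]

# Scan every client and surface the ones needing a correction before losses accrue.
alerts = [c["id"] for c in clients if flag_kyc_drift(c["kyc_profile"], c["portfolio_risk"])]
print(alerts)  # ['C-001']
```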

Chris Catliff is the president and CEO of Blueshore Financial, a B.C.-based credit union with more than $3 billion in assets. His vision of the future of credit unions is predicated on the power of data analytics in combination with machine learning.

He envisions the day very soon when a client approaching a branch receives a text message saying the client is already checked in at the branch. As they walk through the door, their customer profile and picture pop up on a screen [ii] at a concierge desk and they’re greeted by name.

Blueshore’s ATM machines will respond to a customer’s biometrics and offer a transaction based on a pattern of previous transactions. Up-sell opportunities will present themselves, so staff can suggest options – situations that might never occur without data analysis.

Service, he said, “has to be electronic transactions with the introduction of superior, human touch at various critical points. It’s high tech and high touch.”

Explore Your Data Potential

Like the members they serve, every credit union is unique. It is imperative for a credit union to work with data specialists who can marry the individual needs of each credit union with high levels of expertise across big data, data analysis and machine learning.

One of our strengths here at Optimus is our track-record in the areas of data gathering, analysis, machine learning, dashboarding and data visualization, through which we help our clients tailor data mining and analysis to their business goals.

At the end of the day, it’s all about staying competitive and, like the Oakland Athletics, reaching the pinnacle of success by embracing and employing new strategies to achieve that success.

 

[i] https://www.pymnts.com/big-data/2018/credit-unions-big-data-authentication-aml-kyc/
[ii] http://enterprise-magazine.com/features/betting-big-on-big-data/

 

4 Ways Azure is Rising to Meet Data Warehouse Demands

In today’s data-first world, IT infrastructure is the foundation for strategic decision-making, with companies requiring larger quantities of data in shorter periods of time. This is putting the traditional data model – where data from systems like CRM, ERP and LOB applications is extracted, transformed and loaded (ETL) into the data warehouse – under pressure. The problem is compounded by increased data volumes from social apps, connected devices (IoT) and emerging sources of data.

The need to gather data from traditional, transactional systems, like ERP, CRM and LOB, and then integrate this data with social, mobile and connected devices has driven the adoption of big data storage technologies such as Hadoop. At Optimus, we’re finding more and more users demand predictive, real-time analytics to make use of their data, something that can’t be done with traditional data warehouse tools. Consequently, organizations are considering cloud-based solutions such as Azure to transform their data warehouse infrastructure.

Microsoft knows this and is growing its solution portfolio accordingly. Below are four ways in which Microsoft Azure is adapting to meet the demands of today’s modern data warehouse.

1. Consistently High-Performance for all Volumes of Data

Microsoft is working to solve the problem of achieving high levels of performance for large datasets through MPP technologies, in-memory columnstore, and optimizations to the core query engine. In particular, Optimus is seeing SQL Server emerge as a leader in performance and scalability. SQL Server supports a large number of cores with complex vector instructions, holds terabytes of memory, and contains local flash storage that provides high I/O bandwidth. When optimized for inherent parallelism and concurrency, it is not uncommon for users to outperform large distributed databases.

In one example, Microsoft and Intel teamed up to create a 100 terabyte data warehouse using a single server, four Xeon E7 processors and SQL Server 2016. According to the report, “The system was able to load a complex schema derived from TPC-H at 1.6TB/hour, and it took just 5.3 seconds to run a complex query (the minimum cost supplier query) on the entire 100TB database.”

2. Storing Integrated Data

Companies are looking for ways to store integrated – both relational and non-relational – data of any size, type and speed without forcing changes to applications as data scales.

Enter the Azure Data Lake Store. Data Lake makes it simple for everyone, from analysts to developers and data scientists, to access, add and modify data, regardless of its state.

Facilitating all of this is Azure HDInsight, a cloud-based Hadoop and Spark cluster. HDInsight lets your team create analytic clusters, manipulating data into actionable insights. In addition to a fully managed Hadoop service, Microsoft has included PolyBase in HDInsight, which provides the ability to query relational and non-relational data in Hadoop with a single, T-SQL-based query model.
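A rough sketch of that single-query model is shown below, with the T-SQL submitted from Python for consistency with the other examples: an external table over files in Hadoop is joined to an ordinary relational table in one statement. Server names, locations and formats are placeholders, and the exact external-table options should be checked against your SQL Server/PolyBase version.

```python
import pyodbc

conn = pyodbc.connect(
    "DRIVER={ODBC Driver 17 for SQL Server};"
    "SERVER=<sql-server>;DATABASE=Sales;UID=<user>;PWD=<password>"
)
conn.autocommit = True   # DDL below takes effect immediately in this sketch
cursor = conn.cursor()

# One-time setup: expose click logs stored in Hadoop as an external table (PolyBase).
setup_statements = [
    """CREATE EXTERNAL DATA SOURCE HadoopLogs
       WITH (TYPE = HADOOP, LOCATION = 'hdfs://<namenode>:8020')""",
    """CREATE EXTERNAL FILE FORMAT CsvFormat
       WITH (FORMAT_TYPE = DELIMITEDTEXT,
             FORMAT_OPTIONS (FIELD_TERMINATOR = ','))""",
    """CREATE EXTERNAL TABLE dbo.ClickLogs (
           CustomerId INT, Url NVARCHAR(400), ClickedAt DATETIME2)
       WITH (LOCATION = '/logs/clicks/',
             DATA_SOURCE = HadoopLogs, FILE_FORMAT = CsvFormat)""",
]
for stmt in setup_statements:
    cursor.execute(stmt)

# A single T-SQL query now joins non-relational click data with relational customer data.
rows = cursor.execute("""
    SELECT c.CustomerName, COUNT(*) AS Clicks
    FROM dbo.ClickLogs AS l
    JOIN dbo.Customers AS c ON c.CustomerId = l.CustomerId
    GROUP BY c.CustomerName
""").fetchall()
print(rows)
```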

3. Built with Hybrid Data Storage at the Core

While the cloud continues to gain popularity, companies are realizing that they still need to keep at least some information on-premises. Microsoft is acutely aware of this and has built Azure accordingly. Their data warehousing and big data tools are designed to span on-premises and cloud warehouses. Microsoft’s hybrid deployment is designed to provide the control and performance of on-premises with the scalability and redundancy of the cloud. Optimus is seeing users access and integrate data seamlessly, while leveraging advanced analytics capabilities, all through Azure.

4. Machine Learning and Big Data in Real-Time

Traditional advanced analytics applications use outdated methods of transferring data from the warehouse into the application tier to procure intelligence, resulting in unacceptably high latency and little scalability.

In contrast, Microsoft has transformed integrated analytics with machine learning in the cloud. The Cortana Intelligence Suite, coupled with R Server, can be deployed both on-premises with SQL Server and in the cloud with HDInsight. The resultant solution is one that solves for hybrid, scales seamlessly and enables real-time analytics.

There are many factors driving companies to consider an Azure Cloud data warehouse migration. To learn more, check out our e-Book, Building a Modern Data Warehouse on Azure.

Does Your Data Warehouse Belong in the Azure Cloud? Here are Some Things to Consider

It’s no secret: Microsoft Azure is hot right now. This is demonstrated by their 97% growth in Q2 2017. With more organizations migrating their data infrastructure to the cloud every day, some companies are asking themselves: does my data warehouse belong in Azure? While there’s no simple answer to this question, there are some ways in which you can begin to assess your current data infrastructure’s suitability for an Azure Cloud migration.

The Cost Factor

The team at Optimus has found cost to be one of, if not the top driver for cloud adoption. There are several factors businesses should consider where cost in the cloud is concerned:

  • If your business is cyclical (i.e. retail with high volume throughout the holiday season), the cloud pay-as-you-go model makes strong financial sense. Cyclical companies can burst to the cloud when they need to, saving them from buying new servers that may only be required a few weeks per year. Conversely, it may not be cost effective to move workloads that are required to run at a stable level 24/7/365 to the cloud, especially if they are running on equipment that does not need upgrading in the foreseeable future.
  • At Optimus, we have found that many organizations prefer opex over capex. Opex tends to be easier to manage over the long term, especially for fast-growing businesses where a significant capex could stall growth. The more a business transitions to the Azure pay-as-you-go model, the more they shift their data warehouse costs from a capex to an opex.
  • The apportioning of data costs across departments is significantly simplified in Azure. Pricing for individual workloads is made transparent, and data usage is easily tracked.

When considering leveraging Azure for your data warehouse, it is important to remember that a cloud migration is not an all-or-nothing endeavour. Every business will have certain workloads that make financial sense in the cloud and certain workloads that should remain on-premises. Perform an accurate assessment of your current data infrastructure to determine your cloud suitability.

What are Your Data Governance Requirements?

Meeting data governance and regulatory requirements is at the forefront of the mind of anyone considering an Azure migration, and for good reason. Moving an on-premises legacy data infrastructure to the cloud is a difficult landscape to navigate.

Your industry may determine your suitability for an Azure Cloud data warehouse migration. Certain sectors, such as financial and healthcare, have strict data governance laws to comply with. You need to make sure your – and your client’s – data remains within certain jurisdictions, something that may prove challenging and will influence your choice of what data to move to Azure.

Do you need to retain control over user authentication? If yes, you’ll need to look at the feasibility of this with various applications. Your service provider will be able to assess this with you and make the right recommendations.

Latency: Still a Consideration?

The short answer is yes. In particular instances where the speed of data transaction is mission-critical, an internal data warehouse may be required. This is common in the financial industry, where trading companies are under increasing pressure to host their servers physically close to a stock exchange’s computers. In an industry where transactions are conducted in microseconds, speed is priority number one.

While Azure has made significant improvements to latency times, the fact remains that the closer two computers are to each other, the faster they can communicate. At Optimus, we have seen companies with these types of operational concerns benefit from leaving some of their data on-premises. However, because the amount of data required to perform at a high level is typically small, leveraging the public cloud is still a viable option for most organizations.

There are many factors to keep in mind when considering a data warehouse migration to Azure. To learn more, check out our e-Book, Building a Modern Data Warehouse on Azure.

Infographic – The Modern Data Warehouse Framework

Check out our latest infographic, The Modern Data Warehouse Framework!

As organizations collect and process increasing amounts of data from a growing number of data sources, data systems must evolve to keep up. In order to make the best data-driven decisions possible, you need to reimagine the way you look at data warehousing.

We took a look at how to transition your data warehouse to the cloud and put together our top 8 recommendations for building a modern data warehouse on Azure.

 

Download the PDF here

 


Infographic – The Modern Data Warehouse Framework

 

Contact us to learn more.

Power BI for Mobile: Take Your Data on the Road

One area where the Power BI software stack really shines is the mobile space. The Power BI product line includes three apps: one each for iOS, Windows Phone and Android. These apps allow you to take anything you can generate in Power BI and make it readily available to any stakeholder with a mobile phone or tablet. With a couple of swipes, users can quickly interact with all your analysis. Power BI allows you to bring together the advantages of mobile devices, big data systems and compelling visualizations in a way that permits everyone involved to make better decisions.

The Power of the Dashboard

It’s one thing to produce an informative chart, but it’s quite another to deploy a fully interactive dashboard that can fetch real-time updates. Power BI permits you to tie together data from a variety of sources, including numerous non-Microsoft products. For the end user, the guy in marketing who just needs to see today’s report, the component that makes it all accessible is the dashboard.

Power BI dashboards allow you to publish any type of common infographic, geospatial information or visualization. If you need a bubble chart that displays the YTD performance of your company’s retail outlets, there’s an out-of-the-box solution for that with Power BI. It also allows you to create maps and overlay existing information onto those maps. Instead of just seeing that Store #325 is performing well, an app user can pull up the dashboard and see on the map whether that’s a one-off phenomenon or a regional trend.

Making Information Accessible

In the world of data analytics, a lot of work goes into empowering decision makers who may not have a strong technical background. It’s extremely beneficial to give those people an app that allows them to quickly sort through the available data in a clear format. If your buyers can quickly bounce between multiple years’ worth of data and make comparisons, they can make important decisions faster.

Power BI also allows you to determine how the dashboard accesses the available information. Rather than simply presenting users a static set of reports, you can configure queries that allow them to sift through in a self-guided fashion. If someone needs access to a real-time inventory report, your dashboard can be configured to fetch that information from the company’s SQL Server installations. This allows members of your organization who might not be data scientists to rapidly develop insights that can guide their choices. 

Cross-Platform Compatibility

Keeping everyone in your business on the same page can be a challenge. Microsoft has gone to great lengths to ensure that the Power BI apps display information faithfully and function in a similar fashion on every platform. The hypothetical users from our examples will have no trouble grabbing an art department iPhone and finding everything they need.

Data Sources

Any data source that can be accessed inside Office or Power BI can be presented within the app’s dashboard. If you need to present data from an Excel sheet in an appealing manner to someone on the other side of the planet, the app can make that happen. It also allows you to connect to commonly used data sources, such as SQL Server Reports, and outside sources, such as Google Analytics, Salesforce or MailChimp. You can even mix and match functionality, for example, overlaying Salesforce data on Google Maps.

Conclusion

Business intelligence is about putting the right information in the right hands, in a format that makes a visually compelling case. Your company will likely invest a lot of effort in the coming years into producing analysis and generating insights. With Power BI’s mobile app, you can ensure that the people who need those insights have access to them with the touch of a finger. The app allows you to pass along analysis to stakeholders in a secure environment that makes interacting with the data easy. In short, it makes all your data analytics faster, more appealing and more accessible.

If you have questions about getting started with Power BI or want to push the toolset further, give us a call. We’re always happy to answer any questions.