
GROUP BY vs DISTINCT in SQL Server

Yesterday, I was travelling on the metro when I overheard two college students puzzling over the difference between GROUP BY and DISTINCT.

GROUP BY and DISTINCT cause a lot of confusion, and I couldn’t help myself, so I jumped into their conversation and explained the differences.

Their problem was understanding where to use GROUP BY and where to use DISTINCT, given that both return the same results in most scenarios when no aggregates are used, and which one has better performance.

GROUP BY and DISTINCT both generate the same query execution plan when no aggregates are used, so there is no appreciable difference in performance in this situation.

GROUP BY becomes important when aggregates are used. DISTINCT is just for filtering out duplicate records from query result sets.

Ideally, when aggregates are not used in a query, use DISTINCT to enhance readability. When aggregates are used, GROUP BY is the right tool, because DISTINCT can only remove duplicates; it cannot group rows for aggregation.

Here are some examples:

DISTINCT

SELECT DISTINCT Name, RollNo
 FROM Student

GROUP BY

SELECT Name, RollNo
 FROM Student
 GROUP BY Name, RollNo

GROUP BY with Aggregate Function

SELECT Name, RollNo, Sum(Marks) TotalMarks
 FROM Student
 GROUP BY Name, RollNo
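
There is also one thing DISTINCT cannot express at all: filtering on an aggregate. That requires GROUP BY with HAVING. Here is a quick sketch using the same table (the threshold value is just for illustration):

SELECT Name
 FROM Student
 GROUP BY Name
 HAVING Sum(Marks) > 200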


Include more than 1000 Values in Oracle IN Clause

In Oracle, we can’t include more than 1000 values in an “IN” clause; exceeding the limit raises “ORA-01795: maximum number of expressions in a list is 1000”.

To get around this limitation, split the comma-delimited list of values into two or more parts (using SUBSTR or a similar function), place each part in the IN clause of a separate SQL statement, and combine the statements with the UNION operator.

Sample Code

First, create two hidden parameters that hold the first and second halves of the selected parameter values. This can be done with custom report code.

Here is sample code that breaks the parameter into two hidden parameters of half the original length.

Public Function Group1(ByVal parameter As Parameter) As String
 Dim s As String = String.Empty
 If parameter.IsMultiValue Then
 ' First half of the selected values, comma separated
 For i As Integer = 0 To (parameter.Count \ 2) - 1
 s = s & CStr(parameter.Value(i)) & ","
 Next
 s = s.TrimEnd(","c)
 Else
 s = CStr(parameter.Value(0))
 End If
 Return s
End Function

Public Function Group2(ByVal parameter As Parameter) As String
 Dim s As String = String.Empty
 If parameter.IsMultiValue Then
 ' Second half of the selected values, comma separated
 For i As Integer = parameter.Count \ 2 To parameter.Count - 1
 s = s & CStr(parameter.Value(i)) & ","
 Next
 s = s.TrimEnd(","c)
 Else
 s = String.Empty
 End If
 Return s
End Function

Sample Query

Now we use two SQL queries, each including one of the hidden parameters, joined by the UNION operator:

SELECT * FROM TABLE_1 WHERE VALUE_1 IN (:prmHiddenInspectorFirstGroup1)
UNION
SELECT * FROM TABLE_1 WHERE VALUE_1 IN (:prmHiddenInspectorFirstGroup2)

Obviously, this example breaks when you get to 2000 values, but it is presented here as a sample approach that can be easily modified to suit more advanced situations.
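
As an aside, another common workaround is to keep a single query and combine the partial lists with OR instead of UNION; each individual IN list just has to stay under 1000 expressions:

SELECT * FROM TABLE_1
WHERE VALUE_1 IN (:prmHiddenInspectorFirstGroup1)
 OR VALUE_1 IN (:prmHiddenInspectorFirstGroup2)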


Accessing MySQL from Java

This post shows how to access MySQL from Java: creating a database, setting the CLASSPATH, writing a Java database class, and checking the results.

Create MySQL Database

First we need to create a database.

C:\WDev\Research\Programming\Java\Database>mysql -u root -p
 Enter password: *****
 ERROR 1045 (28000): Access denied for user 'root'@'localhost' (using password: YES)

C:\WDev\Research\Programming\Java\Database>mysql -u root -p
 Enter password: *****
 Welcome to the MySQL monitor. Commands end with ; or \g.
 Your MySQL connection id is 26
 Server version: 5.1.59-community MySQL Community Server (GPL)

Copyright (c) 2000, 2011, Oracle and/or its affiliates. All rights reserved.

Oracle is a registered trademark of Oracle Corporation and/or its
 affiliates. Other names may be trademarks of their respective
 owners.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

mysql> CREATE DATABASE School;
 Query OK, 1 row affected (0.00 sec)

mysql> quit

Setting CLASSPATH in Environment Variables on Your Windows System

Now, download the MySQL JDBC connector (Connector/J) and add its JAR to your CLASSPATH.

Java Database Class

Create your Java database class.

import java.sql.*;

public class Database
{
    public static void main (String[] args)
    {
        Connection conn = null;
        String userName = "root";
        String password = "mysql";

        // Establish the connection to the School database
        try
        {
            String url = "jdbc:mysql://localhost/School";
            Class.forName ("com.mysql.jdbc.Driver").newInstance ();
            conn = DriverManager.getConnection (url, userName, password);
            System.out.println ("Database connection established");
        }
        catch (Exception e)
        {
            System.err.println ("Error message: " + e.getMessage ());
        }

        // Create the student table and insert some sample rows
        try
        {
            Statement s = conn.createStatement ();
            int count;
            s.executeUpdate ("DROP TABLE IF EXISTS student");
            s.executeUpdate (
                "CREATE TABLE student ("
                + "id INT UNSIGNED NOT NULL AUTO_INCREMENT,"
                + "PRIMARY KEY (id),"
                + "name CHAR(40), phone CHAR(40))");

            count = s.executeUpdate (
                "INSERT INTO student (name, phone)"
                + " VALUES"
                + "('John Rhodes', '123'),"
                + "('Bob Smith', '456'),"
                + "('Larry Jones', '789'),"
                + "('Sandra Bullocks', '4444')");
            s.close ();
            System.out.println (count + " rows were inserted");
        }
        catch (Exception e)
        {
            System.err.println ("Error message: " + e.getMessage ());
        }

        // Query the table and print each row
        try
        {
            Statement s = conn.createStatement ();
            s.executeQuery ("SELECT id, name, phone FROM student");
            ResultSet rs = s.getResultSet ();
            int count = 0;
            while (rs.next ())
            {
                int idVal = rs.getInt ("id");
                String nameVal = rs.getString ("name");
                if (rs.wasNull ())
                    nameVal = "(no name available)";

                String catVal = rs.getString ("phone");
                System.out.println (
                    "id = " + idVal
                    + ", name = " + nameVal
                    + ", phone = " + catVal);
                ++count;
            }
            rs.close ();
            s.close ();
            System.out.println ("count = " + count + " rows were retrieved");
        }
        catch (Exception e)
        {
            System.err.println ("Error message: " + e.getMessage ());
        }

        // Insert one more row using a parameterized prepared statement
        try
        {
            PreparedStatement s;
            String nameVal = "Peter";
            String phoneVal = "555";
            s = conn.prepareStatement (
                "INSERT INTO student (name, phone) VALUES(?,?)");
            s.setString (1, nameVal);
            s.setString (2, phoneVal);
            int count = s.executeUpdate ();
            s.close ();
            System.out.println (count + " rows were inserted");
        }
        catch (Exception e)
        {
            System.err.println ("Error message: " + e.getMessage ());
        }
        finally
        {
            // Always close the connection, whether or not the inserts succeeded
            if (conn != null)
            {
                try
                {
                    conn.close ();
                    System.out.println ("Database connection terminated");
                }
                catch (Exception e) { /* ignore close errors */ }
            }
        }
    }
}

Executing the Code

And then watch the code in action.

 C:\WDev\Research\Programming\Java\Database>javac Database.java

C:\WDev\Research\Programming\Java\Database>java Database
 Database connection established
 4 rows were inserted
 id = 1, name = John Rhodes, phone = 123
 id = 2, name = Bob Smith, phone = 456
 id = 3, name = Larry Jones, phone = 789
 id = 4, name = Sandra Bullocks, phone = 4444
 count = 4 rows were retrieved
 1 rows were inserted
 Database connection terminated

C:\WDev\Research\Programming\Java\Database>


Avoiding the SQL “Division by Zero Encountered” Error

When writing queries that perform lots of calculations, it is very easy to forget to ensure that our queries are set up to avoid the “Division by zero encountered” error.

Luckily, there are three easy workarounds to avoid getting this ugly error in our SELECT statements:

  • CASE statement
  • NULLIF/ISNULL functions
  • SET ARITHABORT OFF and SET ANSI_WARNINGS OFF

Using the CASE statement, your query will look like the following:

SELECT CASE WHEN [Denominator] = 0 THEN 0 ELSE [Numerator] / [Denominator] END AS [Percentage]
 FROM [Table1]

If the denominator or divisor is 0 in the above query, the result becomes 0. Otherwise, the division operation is performed.

Using the NULLIF and ISNULL functions, your query will look like the following:

SELECT ISNULL ([Numerator] / NULLIF ([Denominator], 0), 0) AS [Percentage]
 FROM [Table1]

What this does is change the denominator into NULL if it is zero. Any number divided by NULL results in NULL, so if the denominator is 0, the result of the division will be NULL. The ISNULL function then returns 0 in place of that NULL value.
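
A quick sanity check with literal values (these use integer division):

SELECT ISNULL (10 / NULLIF (0, 0), 0) -- denominator is 0, so this returns 0
 SELECT ISNULL (10 / NULLIF (5, 0), 0) -- normal division, returns 2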

Lastly, using the SET ARITHABORT and SET ANSI_WARNINGS, your query will look like the following:

SET ARITHABORT OFF
 SET ANSI_WARNINGS OFF

SELECT [Numerator] / [Denominator]

With both ARITHABORT and ANSI_WARNINGS set to OFF, SQL Server will return a NULL value in a calculation involving a divide-by-zero error. To return a 0 value instead of a NULL value, you can put the division operation inside an ISNULL function:

SET ARITHABORT OFF
 SET ANSI_WARNINGS OFF

SELECT ISNULL ([Numerator] / [Denominator], 0)


Working with Clustered Index in MS SQL Server

Clustered indexes sort and store the data rows in the table based on their key values. There can only be one clustered index per table, because the data rows themselves can only be sorted in one order.

Creating a Table with a Clustered Index

To better explain how SQL Server creates clustered indexes, let’s create a new table by executing the following command.

Create Table Employee
 (
 EmpId Int,
 EmpName varchar(20)
 )

When you first create a new table, no index is created by default; the table is stored as a heap.

Now, add a few records to the table:

Insert Into Employee Values (4, Replicate ('a',20))
 Insert Into Employee Values (6, Replicate ('b',20))

After inserting the data into the Employee table, we can view the contents of the table by executing the following command.

Select EmpID From Employee

The output should be as follows:

Empid
4
6
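
At this point the table is still a heap. To actually add a clustered index on EmpId, the command looks like the following (the index name is just illustrative):

Create Clustered Index IX_Employee_EmpId
 On Employee (EmpId)

Once the index exists, the data rows themselves are stored in EmpId order.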

Clustering rearranges the data pages into the order of the index, so the row data is stored in index order. This is why only one clustered index can be created on a given database table. A clustered index can greatly increase overall retrieval speed, but usually only where the data is accessed sequentially in the same or reverse order of the clustered index, or when a range of items is selected.

Some points to remember when using clustered indexes:

  • The index is reordered every time it changes (i.e., on UPDATE, INSERT or DELETE).
  • Clustered indexes affect the physical order of data, so there can be only one per table.
  • They reorder the way the records in the table are physically stored.

When you create a clustered index, try to create it as a UNIQUE clustered index, not a non-unique clustered index. While SQL Server will allow you to create a non-unique clustered index, it will make it unique for you by adding a 4-byte “uniqueifier” to the index key to guarantee uniqueness. This increases the size of the key and reduces performance.
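
For the Employee example above, the unique form would look like this, assuming EmpId values really are unique:

-- drop the earlier illustrative index first, then recreate it as unique
Drop Index IX_Employee_EmpId On Employee

Create Unique Clustered Index IX_Employee_EmpId
 On Employee (EmpId)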


SQL Query Optimization at the Stored Procedure Level

You can realize some big performance gains optimizing your SQL queries at the stored procedure level. Proper handling of local variables and floating point numbers as well as query and database settings all present opportunities for query optimization.

Here are some examples illustrating common SQL query optimization opportunities.

Local Variables at Stored Procedure Level

By using local variables at the stored procedure level, we can enable or disable parameter sniffing as needed. SQL Server’s execution environment sniffs the parameter values in the statement call that triggered a compilation or recompilation and passes them to the query optimizer, which uses them to build the query execution plan.

If the number of rows retrieved varies widely across values of the filtered column (a skewed distribution), the sniffed plan may be wrong for most other calls, and we can use local variables to disable parameter sniffing.

Here is a stored procedure with parameter sniffing enabled:

create procedure [dbo].[GetAnswers]
@questionid as int
as
set nocount on
begin
select * from Answers where Question_Id=@questionid
end

The same stored procedure can be written this way to avoid parameter sniffing:

ALTER procedure [dbo].[GetAnswers_local]
@questionid as int
as
begin
declare @questionid_local as int
set @questionid_local =@questionid
select * from Answers where Question_Id=@questionid_local
end
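
A related alternative, not shown in the original procedures, is to leave the parameter alone and have the query inside the procedure request a fresh plan on every execution, at the cost of a recompile per call:

select * from Answers where Question_Id=@questionid
option (recompile)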

For more information on parameter sniffing, check out this post.

Using Table Variables

Using table variables, we can store intermediate result sets at the stored procedure level and perform various SQL operations on them, such as different types of joins.
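
A minimal sketch, reusing the Answers table from above (the variable name, columns and threshold are illustrative):

declare @AnswerCounts table (Question_Id int, AnswerCount int)

-- stage an intermediate result set in the table variable
insert into @AnswerCounts (Question_Id, AnswerCount)
select Question_Id, count(*) from Answers group by Question_Id

-- then join against it like any other table
select a.*
from Answers a
join @AnswerCounts c on c.Question_Id = a.Question_Id
where c.AnswerCount > 10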

Floating Point at Application Level

If we are seeing performance issues, we can move floating point operations out of the stored procedure and handle them at the application level.

SELECT a.ShampooBrands_Id AS SHAMPOOBRANDID,
 SUM(Value * p.Weight) AS ProportionAware,
 SUM(SQUARE(Value) * p.Weight) AS CorrectedSumOfSquares
FROM Answers (nolock) a, ShampooBrands (nolock) b, Panelists (nolock) p
WHERE a.ShampooBrands_Id = b.Id
 AND a.Panelist_Id = p.Id
 AND b.ShampooSegmentSet_Id IN (SELECT id FROM ShampooSegmentSets (nolock) WHERE ShampooSegment_Id = 1)
 AND Question_Id = 2
 AND Panelist_Id IN (SELECT id FROM Panelists)
GROUP BY a.ShampooBrands_Id

This sample query, which has been sanitized, came from an actual client project.

At the time of import into the SQL table, each floating point value was multiplied by 10^9 and converted into a numeric value to avoid floating point handling.

After the stored procedure finished operating on the numeric data, the results were sent to the application, where they were divided by 10^9 to restore the floating point values. If Weight were kept as a floating point number, computing SUM(Value * Weight) for ProportionAware would be an expensive operation, since it runs across all the rows in the result set.

Nolock

We can also use NOLOCK with table names to ignore locks held on a table by other transactions.

SELECT ShampooBrands.Id, ShampooBrands.Name FROM ShampooBrands WITH (NOLOCK)

Using NOLOCK politely asks SQL Server to ignore locks and read directly from the tables. This means you completely circumvent the lock system, which is a major performance and scalability improvement.

However, it also means your code is living dangerously: you might read the not-necessarily-valid uncommitted modifications of a running transaction. This is a calculated risk.

Nocount

Finally, we can SET NOCOUNT ON at the stored procedure level. This stops the message that shows the count of the number of rows affected by a Transact-SQL statement or stored procedure from being returned as part of the result set.

When SET NOCOUNT is ON, the count is not returned. When SET NOCOUNT is OFF, the count is returned.
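
A small sketch of the effect (the filter value is arbitrary):

set nocount on -- suppresses the "(n row(s) affected)" message
select * from Answers where Question_Id = 1
set nocount off -- restores the default behaviour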


MySQL to Microsoft SQL Server Migration

Need to migrate data from MySQL into Microsoft SQL Server 2005 and later? Here’s how to do it with ODBC.

Setup ODBC Connection to MySQL Database

I’m going to assume that you already have MySQL and Microsoft SQL Server set up, which makes the first step getting the MySQL ODBC Connector.

For this article, I downloaded the MySQL ODBC Connector 5.1.

The setup of this connector is pretty simple:

  1. Open the ODBC Data Source Administrator from Control Panel -> Administrative Tools. Under the System DSN tab, press the Add button.
  2. On the Create New Data Source dialog that appears, choose MySQL ODBC 5.1 Driver and then press the Finish button.
  3. A MySQL connection configuration dialog will appear. Enter your MySQL database account information, preferably the “root” account, which has full access to your databases in MySQL. In this case, my database is called “tigerdb”. Do not change the port from 3306 unless you defined a different one during your MySQL server installation.
  4. Press the Test button to ensure your connection settings are set properly, then press the OK button when you’re done.

Create a Microsoft SQL Link to your MySQL Database

At this point, you are ready to establish a link to the MySQL database from your Microsoft SQL Server Management Studio. Open a query window and run the following SQL statement:

EXEC master.dbo.sp_addlinkedserver
@server = N'MYSQL',
@srvproduct=N'MySQL',
@provider=N'MSDASQL',
@provstr=N'DRIVER={MySQL ODBC 5.1 Driver}; SERVER=localhost; DATABASE=tigerdb; USER=user; PASSWORD=password; OPTION=3'

This script will produce a link to your MySQL database through the ODBC connection you just created. The link will appear in Microsoft SQL Server Management Studio under Server Objects -> Linked Servers.


If it doesn’t show up in the treeview, press the refresh button.
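
You can also verify the link from a query window; sp_testlinkedserver raises an error if the connection cannot be established:

EXEC sp_testlinkedserver N'MYSQL'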

Import Data between the Databases

Create a new database in Microsoft SQL Server. I called mine “testMySQL”. In the query window, run the following SQL statement to import the table “users” from the MySQL database tigerdb into the newly created Microsoft SQL database testMySQL.

SELECT * INTO testMySQL.dbo.users
FROM openquery(MYSQL, 'SELECT * FROM tigerdb.users')
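
The link is not necessarily read-only, either. Provided the ODBC provider allows writes, rows can be pushed back to MySQL through the same mechanism. A sketch, assuming the users table has name and phone columns (illustrative):

INSERT INTO openquery(MYSQL, 'SELECT name, phone FROM tigerdb.users')
VALUES ('John Rhodes', '123')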

That’s it!


Outsourcing Business Intelligence: Hybrid Onshore-Offshore Delivery

I have been involved in many outsourced business intelligence projects for our customers, across the full range of delivery models: onsite, offsite, offshore and hybrid. While there are still good reasons to choose each of these models, I find the hybrid model tends to work best.

At Optimus, the different delivery models typically work as follows:

  • Onsite: 100% of the work done at the client’s location
  • Offsite: almost 100% of the work done at our local office
  • Offshore: 95% of the work done at our offshore delivery center
  • Hybrid: 30% of the work done locally and 70% done at our offshore delivery center

The final quality of the work is the same for each delivery model.

However, I have noticed that the hybrid approach brings a significant benefit to our customers.

The main benefits are as follows:

  1. Lower Cost
  2. Quick Delivery Cycles
  3. Partner that takes greater ownership of projects and client’s risk
  4. Economies of scale, expertise, and best practices
  5. Increased flexibility

How Hybrid Onshore-Offshore Works


We apply agile development methods to business intelligence to deliver on hybrid onshore-offshore projects.

Our local team starts with evaluation and planning of the technology, tools and architecture together with your key team members and other stakeholders. A list of project items is created based on the high-level requirements. These requirements are then fleshed out in the project backlog, and the items in the backlog are arranged into sprints based on priority and timelines.

The Optimus local team then performs interviews with the client teams and end users to document detailed requirements.

With detailed requirements in place, the offshore team starts the work on design and development. The testing team is involved from the early cycles to provide quick feedback and plan the next delivery cycles. We hold daily scrum meetings to go over feedback, and run review sessions with the onsite team, offshore team and client team (if needed).

Once the sprint delivery has been tested by the offshore team, it is handed over to the onsite team to perform a walkthrough with the client team. The client team performs the final testing, and the delivery item is deployed to the production environment.

This process allows client-facing elements of the project to be performed during business hours (local time zone) with the client. Elements that don’t require client interaction, like development and testing, are performed overnight at our facility in Noida, India.

In some cases, you can even give feedback at the end of the day and see it implemented the next morning.

Learn more about our report development services.


Gartner Says BI is the Focus of Companies: Some Strategy Tips

Gartner research predicts that business intelligence and analytics will need to scale up to support explosive growth in data sources.

Many companies are developing strategies to take complete advantage of their business intelligence initiatives and data, trying to extract the maximum value at the minimum risk.

The main values that these companies are trying to extract are as follows:

  • Improved decision making throughout the organization
  • Quicker decision making
  • Scalable solutions
  • Lower operational cost
  • Shorter solution development time
  • Ease of maintenance
  • Ease of use

In order to have a successful business intelligence strategy, organizations have to encourage processes, people and technology to work in an integrated way, and address all three in their BI strategy and planning.

We have helped our customers with business intelligence initiatives. We would be happy to share our experience and answer any questions you may have. Please feel free to contact us.


5 Steps to Better Self-Serve BI Reporting

Companies seeking true value from their data focus on the users of the reports. This is even more important these days as companies move to self-serve BI. I’ve spoken with a number of organizations that think they could be getting more from their self-serve BI.

If you are embarking on a self-serve BI project, or you are concerned that you could be getting more from self-serve BI, then these are important user-centered points to keep in mind.

  1. Understand the needs of self-serve BI end-users.
  2. Ensure the data is correct.
  3. Train end-users to use the tools.
  4. Prioritise your reports.
  5. Give detailed requirements.
  6. Validate your reports against usage scenarios.

A brief summary of each of the above points follows:

  1. Understand the needs of self-serve BI end-users

    Before starting any self-serve BI initiative, you should understand the needs of end-users. Senior management, business unit heads and any other stakeholders should be involved in determining the needs.

    Early involvement will help eliminate redundant reporting. It also provides an opportunity to map out the business units that can be satisfied with a single report, like an Employee Performance Report shared between Department Managers and HR.

  2. Ensure the data is correct

    Big companies accumulate data sources, and data can very quickly get out of control.

    Data sources often include internal applications, external feeds, and various software-as-a-service applications. Having a single version of correct data is a must.

    This can be achieved by developing a data warehouse or a central database. Having correct data in one place is a key first step to giving your organization the tools to make good decisions.

  3. Train end-users to use the tools

    Training should be provided to the end users on the self-serve BI tool. This will help them understand its capabilities. Training is critical; without it, end users may end up creating ineffective reports.

  4. Prioritise your reports

    Management should sit down together, decide which reports need to be made, and order them by priority. The reports will provide maximum value when the highest-priority reports are developed and deployed first.

    This will also help prevent end users from being overwhelmed by a large number of new reports all at once, giving them time to adapt to the tool.

  5. Give detailed requirements

    Detailed requirements are necessary in the report development process. This step requires effort from both the business analyst and the end user. Detailed whiteboarding sessions will significantly reduce the number of iterations needed after the report is developed.

  6. Validate your reports against usage scenarios

    Test cases from the end users help significantly in the successful development of the report. End users should provide test cases covering all possible scenarios. This also helps in combining business logic with application logic, resulting in correctly written queries.

We have worked on many self-serve BI reporting projects. Contact us and we’ll be happy to share our experience with you and address any questions you may have.
