GROUP BY vs DISTINCT in SQL Server

Yesterday, I was travelling in the metro where I overheard two college students puzzling over the difference between GROUP BY and DISTINCT.

GROUP BY and DISTINCT cause a lot of confusion and I couldn’t help myself so I jumped into their conversation and explained the differences.

Their problem was understanding where to use GROUP BY and where to use DISTINCT when both return same results in most scenarios when no aggregates are used, and which one had better performance.

GROUP BY and DISTINCT both generate the same query execution plan when no aggregates are used, so there is no appreciable difference in performance in this situation.

GROUP BY becomes important when aggregates are used. DISTINCT is just for filtering out duplicate records from query result sets.

Ideally, when aggregates are not used in a query, then one should use DISTINCT to enhance readability. But when aggregates are used GROUP BY performs much better.

Here are some examples:

DISTINCT

[code lang=”sql”]SELECT DISTINCT Name, RollNo
FROM Student[/code]

GROUP BY

[code lang=”sql”]SELECT Name, RollNo
FROM Student
GROUP BY Name, RollNo[/code]

GROUP BY with Aggregate Function

[code lang=”sql”]SELECT Name, RollNo, Sum(Marks) TotalMarks
FROM Student
GROUP BY Name, RollNo[/code]


0 replies

Leave a Reply

Want to join the discussion?
Feel free to contribute!

Leave a Reply

Your email address will not be published. Required fields are marked *