Yesterday, I was travelling in the metro where I overheard two college students puzzling over the difference between GROUP BY and DISTINCT.
GROUP BY and DISTINCT cause a lot of confusion and I couldn’t help myself so I jumped into their conversation and explained the differences.
Their problem was understanding where to use GROUP BY and where to use DISTINCT when both return same results in most scenarios when no aggregates are used, and which one had better performance.
GROUP BY and DISTINCT both generate the same query execution plan when no aggregates are used, so there is no appreciable difference in performance in this situation.
GROUP BY becomes important when aggregates are used. DISTINCT is just for filtering out duplicate records from query result sets.
Ideally, when aggregates are not used in a query, then one should use DISTINCT to enhance readability. But when aggregates are used GROUP BY performs much better.
Here are some examples:
SELECT DISTINCT Name, RollNo FROM Student
SELECT Name, RollNo FROM Student GROUP BY Name, RollNo
GROUP BY with Aggregate Function
SELECT Name, RollNo, Sum(Marks) TotalMarks FROM Student GROUP BY Name, RollNo