Author | Message |
---|---|
se1k1h1mawar1
Posts: 121
|
Posted 09:25 Oct 20, 2015 |
I have a few things that I would like somebody's insights on about SQL queries. 1) If I understand it correctly, there are some queries that can be done either by 'distinct' or 'group by'. For instance,
would return same result as
The above use of 'group by' probably does not fully utilize the power of 'group by' (i.e. use of it with aggregates) or simply it is a wrong use, but the results the database returns seem to be the same. My question is, is 'distinct' generally less expensive compared to 'group by', or is it totally case dependent and these two are just not comparable because of their functional differences?
2) which one of the three shown below is the least expensive? My guess would be 'intersect' or 'in()'. Cross join is most likely the most expensive since it looks at all of the Cartesian cross products, 'intersect' and 'in()' might be a tie because they both only search through a specific group/list of records for matches.
Any insights will be greatly appreciated. Thank you, Last edited by se1k1h1mawar1 at
09:30 Oct 20, 2015.
|
cysun
Posts: 2935
|
Posted 10:32 Oct 20, 2015 |
1. GROUP BY without aggregation is conceptually wrong because aggregation is needed to reduce a "group" to one record. You may not get an error from some DBMS - most likely because they apply a default "aggregation" by picking the first or last record in a group, but you shouldn't do it. With that said, in this particular case I'd guess DISTINCT is more efficient because it can discard duplicates immediately while GROUP BY needs to produce all the results then discard duplicates group by group. 2. Although equi-joins look like cross joins with additional conditions, DBMS query planners are smart enough to recognize them and implement them as inner joins (otherwise it'd be disastrous to equi-join two large tables). Because joins are so common in queries, join operation is highly optimized in DBMS so queries with joins are usually more efficient than the alternatives. In this example, I'd guess the join one to be the most efficient, followed by the subquery, then the INTERSECT. The reason I think the subquery is more efficient than the INTERSECT is because they both need to run two queries, but the subquery only needs to keep the intermediate result set of one. These are just educated guesses. The bottomline is that there are usually many ways to execute a query, and which way to choose is decided by the query planner and optimizer, which are different from one DBMS to another. It's possible that a DBMS decides to execute all three queries in the same way, in which case so they'd have exactly the same performance. Most DBMS provide a way to display a query plan which can give you a much better idea of how a query will be executed. In PostgreSQL you can do EXPLAIN <query> to see the query plan of a query. In the 2nd part of the course we'll discuss query execution and hopefully by then some terms and concepts here will become more clear. 3. Yes, an inner join would do exactly the same as the equi-join. |
se1k1h1mawar1
Posts: 121
|
Posted 12:29 Oct 20, 2015 |
I see! Thank you for such a thorough reply. It seems much more clearer now. Also, thank you for pointing out my mistake on identifying the equi-join! Sincerely, Last edited by se1k1h1mawar1 at
12:58 Oct 20, 2015.
|
AndersonChristoph
Posts: 34
|
Posted 11:31 Oct 21, 2015 |
Questions & answers this well-defined should be part of a CS422 FAQ page or something. |