reset password
Author Message
se1k1h1mawar1
Posts: 121
Posted 09:25 Oct 20, 2015 |

I have a few things that I would like somebody's insights on about SQL queries.

1)  If I understand it correctly, there are some queries that can be done either by 'distinct' or 'group by'. For instance,

select distinct department_id from courses;

would return same result as

select department_id from courses group by department_id;

The above use of 'group by' probably does not fully utilize the power of 'group by' (i.e. use of it with aggregates) or simply it is a wrong use, but the results the database returns seem to be the same.

My question is, is 'distinct' generally less expensive compared to 'group by', or is it totally case dependent and these two are just not comparable because of their functional differences?

 

2)  which one of the three shown below is the least expensive? My guess would be 'intersect' or 'in()'. Cross join is most likely the most expensive since it looks at all of the Cartesian cross products, 'intersect' and 'in()' might be a tie because they both only search through a specific group/list of records for matches.

    -- 23. List the names that appear in both the students table and the faculty
    --     table.


            select name from students
            intersect
            select name from faculty;

            select name from students where name in (select name from faculty);

            select s.name from students s, faculty f where s.name = f.name;

        
3)  Could I have done an inner join as shown immediately below on the queries like the one at the bottom(#26), or would I have gotten an undesired result with different sets of entries? I assume inner join would have been fine because inner joins (my understanding is that inner joins are the same thing as equi-joins, and the way it works can be visualized as the intersection representation of the Venn Diagram...) only returns complete match(s) unlike full-outer joins, left joins, or right joins. We are only concerned about complete match cases for this case. Did I miss some important factors to take into consideration?

select count(*) from departments d inner join courses c    on d.id = c.department_id where d.name = 'Computer Science';
 

        -- 26. Find the number of courses offered by the Computer Science Department.

        select count(*) from departments d, courses c
            where d.name = 'Computer Science' and d.id = c.department_id;

Any insights will be greatly appreciated. Thank you,
 

Last edited by se1k1h1mawar1 at 09:30 Oct 20, 2015.
cysun
Posts: 2935
Posted 10:32 Oct 20, 2015 |

1. GROUP BY without aggregation is conceptually wrong because aggregation is needed to reduce a "group" to one record. You may not get an error from some DBMS - most likely because they apply a default "aggregation" by picking the first or last record in a group, but you shouldn't do it.

With that said, in this particular case I'd guess DISTINCT is more efficient because it can discard duplicates immediately while GROUP BY needs to produce all the results then discard duplicates group by group.

2. Although equi-joins look like cross joins with additional conditions, DBMS query planners are smart enough to recognize them and implement them as inner joins (otherwise it'd be disastrous to equi-join two large tables). Because joins are so common in queries, join operation is highly optimized in DBMS so queries with joins are usually more efficient than the alternatives.

In this example, I'd guess the join one to be the most efficient, followed by the subquery, then the INTERSECT. The reason I think the subquery is more efficient than the INTERSECT is because they both need to run two queries, but the subquery only needs to keep the intermediate result set of one.

These are just educated guesses. The bottomline is that there are usually many ways to execute a query, and which way to choose is decided by the query planner and optimizer, which are different from one DBMS to another. It's possible that a DBMS decides to execute all three queries in the same way, in which case so they'd have exactly the same performance.

Most DBMS provide a way to display a query plan which can give you a much better idea of how a query will be executed. In PostgreSQL you can do EXPLAIN <query> to see the query plan of a query.

In the 2nd part of the course we'll discuss query execution and hopefully by then some terms and concepts here will become more clear.

3. Yes, an inner join would do exactly the same as the equi-join.

se1k1h1mawar1
Posts: 121
Posted 12:29 Oct 20, 2015 |

I see! Thank you for such a thorough reply. It seems much more clearer now. Also, thank you for pointing out my mistake on identifying the equi-join!   Sincerely,

Last edited by se1k1h1mawar1 at 12:58 Oct 20, 2015.
AndersonChristoph
Posts: 34
Posted 11:31 Oct 21, 2015 |

Questions & answers this well-defined should be part of a CS422 FAQ page or something.