Revisions to SQL Performance: SELECT DISTINCT versus GROUP BY

added 1 character in body

edited Dec 19, 2012 at 17:01

I'm fairly sure that GROUP BY and DISTINCT have roughly the same execution plan.

Since we don't have the explain plans we have to guess, but IMO the difference here is that the inline subquery gets executed AFTER the GROUP BY but BEFORE the DISTINCT.

So if your query returns 1M rows and gets aggregated to 1k rows:

The GROUP BY query would have run the subquery 1000 times,
Whereas the DISTINCT query would have run the subquery 1000000 times.
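
As a rough sketch of the two query shapes being compared (the original statements aren't reproduced here, so the select list and join below are assumptions pieced together from the table names used in this answer):

```sql
-- Shape 1 (assumed): DISTINCT.
-- The scalar subquery is part of every row that DISTINCT has to compare,
-- so it may be evaluated for each joined row before duplicates are dropped.
SELECT DISTINCT
       ITEMS.ITEM_ID,
       (SELECT COUNT(PKID)
          FROM ITEM_PARENTS
         WHERE PARENT_ITEM_ID = ITEMS.ITEM_ID) AS CHILD_COUNT
  FROM ITEMS
  JOIN ITEM_TRANSACTIONS
    ON ITEMS.ITEM_ID = ITEM_TRANSACTIONS.ITEM_ID;

-- Shape 2 (assumed): GROUP BY.
-- The rows are collapsed to one per ITEM_ID first, so the scalar subquery
-- only needs to be evaluated once per resulting group.
SELECT ITEMS.ITEM_ID,
       (SELECT COUNT(PKID)
          FROM ITEM_PARENTS
         WHERE PARENT_ITEM_ID = ITEMS.ITEM_ID) AS CHILD_COUNT
  FROM ITEMS
  JOIN ITEM_TRANSACTIONS
    ON ITEMS.ITEM_ID = ITEM_TRANSACTIONS.ITEM_ID
 GROUP BY ITEMS.ITEM_ID;
```

Oracle can also cache scalar subquery results, so the real execution counts may well be lower than the worst case described here; only the trace will tell.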

The tkprof explain plan would help demonstrate this hypothesis.
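
A minimal sketch of how such a trace could be collected, assuming you have the ALTER SESSION privilege and can read the server's trace directory (the identifier and file names below are placeholders, not anything from the question):

```sql
-- Tag the trace file so it is easy to find (hypothetical tag name).
ALTER SESSION SET tracefile_identifier = 'distinct_vs_groupby';
ALTER SESSION SET timed_statistics = TRUE;
ALTER SESSION SET sql_trace = TRUE;

-- Run the DISTINCT query here, then the GROUP BY query.

ALTER SESSION SET sql_trace = FALSE;

-- Then, from the OS shell on the database server, format the trace:
--   tkprof <tracefile>.trc report.txt sys=no
```

In the resulting report, comparing the row counts shown for the ITEM_PARENTS step under each statement should indicate how many times the subquery actually ran.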

While we're discussing this, I think it's important to note that the way the query is written is misleading both to the reader and to the optimizer: you obviously want to find all rows from item/item_transactions that have a TASK_INVENTORY_STEP.STEP_TYPE with a value of "TYPE A".

IMO your query would have a better plan and would be more easily readable if written like this:

SELECT ITEMS.ITEM_ID,
       ITEMS.ITEM_CODE,
       ITEMS.ITEMTYPE,
       ITEM_TRANSACTIONS.STATUS,
       (SELECT COUNT(PKID) 
          FROM ITEM_PARENTS 
         WHERE PARENT_ITEM_ID = ITEMS.ITEM_ID) AS CHILD_COUNT
  FROM ITEMS
  JOIN ITEM_TRANSACTIONS 
    ON ITEMS.ITEM_ID = ITEM_TRANSACTIONS.ITEM_ID 
   AND ITEM_TRANSACTIONS.FLAG = 1
 WHERE EXISTS (SELECT NULL
                 FROM JOB_INVENTORY   
                 JOIN TASK_INVENTORY_STEP 
                   ON JOB_INVENTORY.JOB_ITEM_ID=TASK_INVENTORY_STEP.JOB_ITEM_ID
                WHERE TASK_INVENTORY_STEP.STEP_TYPE = 'TYPE A'
                  AND ITEMS.ITEM_ID = JOB_INVENTORY.ITEM_ID)

In many cases, a DISTINCT can be a sign that the query is not written properly (because a good query shouldn't return duplicates).

Note also that 4 tables are not used in your original select.

added 240 characters in body

edited Dec 19, 2012 at 16:56

Vincent Malgrat

I'm fairly sure that GROUP BY and DISTINCT have roughly the same execution plan.

Since we don't have the explain plans we have to guess, but IMO the difference here is that the inline subquery gets executed AFTER the GROUP BY but BEFORE the DISTINCT.

So if your query returns 1M rows and gets aggregated to 1k rows:

The GROUP BY query would have run the subquery 1000 times,
Whereas the DISTINCT query would have run the subquery 1000000 times.

The tkprof explain plan would help demonstrate this hypothesis.

While we're discussing this, I think it's important to note that the way the query is written is misleading both to the reader and to the optimizer: you obviously want to find all rows from item/item_transactions that have a TASK_INVENTORY_STEP.STEP_TYPE with a value of "TYPE A".

IMO your query would have a better plan and would be more easily readable if written like this:

SELECT ITEMS.ITEM_ID,
       ITEMS.ITEM_CODE,
       ITEMS.ITEMTYPE,
       ITEM_TRANSACTIONS.STATUS,
       (SELECT COUNT(PKID) 
          FROM ITEM_PARENTS 
         WHERE PARENT_ITEM_ID = ITEMS.ITEM_ID) AS CHILD_COUNT
  FROM ITEMS
  JOIN ITEM_TRANSACTIONS 
    ON ITEMS.ITEM_ID = ITEM_TRANSACTIONS.ITEM_ID 
   AND ITEM_TRANSACTIONS.FLAG = 1
 WHERE EXISTS (SELECT NULL
                 FROM JOB_INVENTORY   
                 JOIN TASK_INVENTORY_STEP 
                   ON JOB_INVENTORY.JOB_ITEM_ID=TASK_INVENTORY_STEP.JOB_ITEM_ID
                WHERE TASK_INVENTORY_STEP.STEP_TYPE = 'TYPE A'
                  AND ITEMS.ITEM_ID = JOB_INVENTORY.ITEM_ID)

In many cases, a DISTINCT can be a sign that the query is not written properly (because it is extremely rare that a good query returns duplicates).

Note also that 4 tables are not used in your original select.

added 905 characters in body

edited Dec 19, 2012 at 16:50

Vincent Malgrat

I'm fairly sure that GROUP BY and DISTINCT have roughly the same execution plan.

Since we don't have the explain plans we have to guess, but IMO the difference here is that the inline subquery gets executed AFTER the GROUP BY but BEFORE the DISTINCT.

So if your query returns 1M rows and gets aggregated to 1k rows:

The GROUP BY query would have run the subquery 1000 times,
Whereas the DISTINCT query would have run the subquery 1000000 times.

The tkprof explain plan would help demonstrate this hypothesis.

Also I think the way the query is written is misleading both to the reader and to the optimizer: you obviously want to find all rows from item/item_transactions that have a TASK_INVENTORY_STEP.STEP_TYPE with a value of "TYPE A".

IMO your query would have a better plan and would be more easily readable if written like this:

SELECT ...
  FROM ITEMS
  JOIN ITEM_TRANSACTIONS 
    ON ITEMS.ITEM_ID = ITEM_TRANSACTIONS.ITEM_ID 
   AND ITEM_TRANSACTIONS.FLAG = 1
 WHERE EXISTS (SELECT NULL
                 FROM JOB_INVENTORY   
                 JOIN TASK_INVENTORY_STEP 
                   ON JOB_INVENTORY.JOB_ITEM_ID=TASK_INVENTORY_STEP.JOB_ITEM_ID
                WHERE TASK_INVENTORY_STEP.STEP_TYPE = 'TYPE A'
                  AND ITEMS.ITEM_ID = JOB_INVENTORY.ITEM_ID)

In many cases, a DISTINCT can be a sign that the query is not written properly (because it is extremely rare that a good query returns duplicates).

Note also that 4 tables are not used in your original select.

answered Dec 19, 2012 at 16:38

Vincent Malgrat
