302

Can someone please explain what the partition by keyword does and give a simple example of it in action, as well as why one would want to use it? I have a SQL query written by someone else and I'm trying to figure out what it does.

An example of partition by:

SELECT empno, deptno, COUNT(*) 
OVER (PARTITION BY deptno) DEPT_COUNT
FROM emp

The examples I've seen online seem a bit too in-depth.

1

7 Answers

302

The PARTITION BY clause sets the range of records that will be used for each "GROUP" within the OVER clause.

In your example SQL, DEPT_COUNT will return the number of employees within that department for every employee record. (It is as if you're de-normalising the emp table; you still return every record in the emp table.)

empno   deptno   DEPT_COUNT
1       10       3
2       10       3
3       10       3 <- three because there are three "deptno = 10" records
4       20       2
5       20       2 <- two because there are two "deptno = 20" records

If there were another column (e.g., state), you could count how many departments are in that state.

It is like getting the results of a GROUP BY (SUM, AVG, etc.) without aggregating the result set (i.e., without collapsing matching records into a single row).

It is useful when you use the MIN ... OVER or MAX ... OVER functions to get, for example, the lowest and highest salary in the department and then use them in a calculation against the current record's salary. This needs no subselect, and avoiding the subselect is typically much faster.
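A minimal sketch of the salary comparison described above. The table name emp_sal_demo, the sal column and the sample rows are assumptions for illustration; the question's query only shows empno and deptno.

```sql
-- Hypothetical sample data; the sal column is assumed for illustration.
CREATE TABLE emp_sal_demo (empno INT, deptno INT, sal INT);
INSERT INTO emp_sal_demo VALUES (1, 10, 1000);
INSERT INTO emp_sal_demo VALUES (2, 10, 2000);
INSERT INTO emp_sal_demo VALUES (3, 10, 3000);
INSERT INTO emp_sal_demo VALUES (4, 20, 1500);
INSERT INTO emp_sal_demo VALUES (5, 20, 2500);

-- Every row is returned. MIN and MAX are computed per department and
-- repeated on each row of that department, with no subselect.
SELECT empno,
       deptno,
       sal,
       MIN(sal) OVER (PARTITION BY deptno)       AS dept_min_sal,
       MAX(sal) OVER (PARTITION BY deptno)       AS dept_max_sal,
       sal - MIN(sal) OVER (PARTITION BY deptno) AS above_dept_min
FROM emp_sal_demo
ORDER BY empno;
```

With these five rows the result should be:

empno  deptno  sal   dept_min_sal  dept_max_sal  above_dept_min
1      10      1000  1000          3000          0
2      10      2000  1000          3000          1000
3      10      3000  1000          3000          2000
4      20      1500  1500          2500          0
5      20      2500  1500          2500          1000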

Read the linked AskTom article for further details.

3
  • 6
    LAST_VALUE - returns last salary, MAX returns highest salary Commented Dec 29, 2012 at 21:55
  • 2
    Do you mean "without a sub select, which is much slower"? I guess I'm confused if the sub select is slower or faster than last over and min over. I would imagine a sub select would be slower, but the english grammar in the answer doesn't suggest that.
    – Jason
    Commented Apr 22, 2019 at 1:29
  • This approach reduces the number of times the rows get processed, making it more efficient than a subselect. Most noticeable in very large data sets.
    – Guy
    Commented Jun 12, 2019 at 20:21
228

The concept is very well explained by the accepted answer, but I find that the more examples one sees, the better it sinks in. Here's an incremental example:

  1. Boss says "get me number of items we have in stock grouped by brand"

You say: "no problem"

SELECT 
      BRAND
      ,COUNT(ITEM_ID) 
FROM 
      ITEMS
GROUP BY 
      BRAND;

Result:

+--------------+---------------+
|  Brand       |   Count       | 
+--------------+---------------+
| H&M          |     50        |
+--------------+---------------+
| Hugo Boss    |     100       |
+--------------+---------------+
| No brand     |     22        |
+--------------+---------------+
  2. The boss says "Now get me a list of all items, with their brand AND number of items that the respective brand has"

You may try:

 SELECT 
      ITEM_NR
      ,BRAND
      ,COUNT(ITEM_ID) 
 FROM 
      ITEMS
 GROUP BY 
      BRAND;

But you get:

ORA-00979: not a GROUP BY expression 

This is where the OVER (PARTITION BY BRAND) comes in:

 SELECT 
      ITEM_NR
      ,BRAND
      ,COUNT(ITEM_ID) OVER (PARTITION BY BRAND) 
 FROM 
      ITEMS;

Which means:

  • COUNT(ITEM_ID) - get the number of items
  • OVER - Over the set of rows
  • (PARTITION BY BRAND) - that have the same brand

And the result is:

+--------------+---------------+----------+
|  Items       |  Brand        | Count()  |
+--------------+---------------+----------+
|  Item 1      |  Hugo Boss    |   100    | 
+--------------+---------------+----------+
|  Item 2      |  Hugo Boss    |   100    | 
+--------------+---------------+----------+
|  Item 3      |  No brand     |   22     | 
+--------------+---------------+----------+
|  Item 4      |  No brand     |   22     | 
+--------------+---------------+----------+
|  Item 5      |  H&M          |   50     | 
+--------------+---------------+----------+

etc...
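To try both queries end to end, here is a small self-contained sketch. The table name items_demo and its rows are hypothetical and far smaller than the counts shown above. 'H&M' is left out only because the & character triggers a substitution prompt in SQL*Plus by default.

```sql
-- Hypothetical sample data for illustration only.
CREATE TABLE items_demo (item_id INT, item_nr VARCHAR(20), brand VARCHAR(20));
INSERT INTO items_demo VALUES (1, 'Item 1', 'Hugo Boss');
INSERT INTO items_demo VALUES (2, 'Item 2', 'Hugo Boss');
INSERT INTO items_demo VALUES (3, 'Item 3', 'Hugo Boss');
INSERT INTO items_demo VALUES (4, 'Item 4', 'No brand');
INSERT INTO items_demo VALUES (5, 'Item 5', 'No brand');

-- GROUP BY collapses the rows: one row per brand.
SELECT brand, COUNT(item_id) AS brand_count
FROM items_demo
GROUP BY brand
ORDER BY brand;

-- OVER (PARTITION BY ...) keeps all five rows and repeats the
-- per-brand count on every row of that brand.
SELECT item_nr,
       brand,
       COUNT(item_id) OVER (PARTITION BY brand) AS brand_count
FROM items_demo
ORDER BY item_nr;
```

The first query should return two rows (Hugo Boss 3, No brand 2). The second should return all five items, with 3 next to each Hugo Boss item and 2 next to each No brand item.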

6
  • 3
    If I want to get one result for each group, how do I get it?
    – Viuu -a
    Commented Dec 11, 2017 at 12:38
  • Do you know if OVER PARTITION BY can be used in a WHERE clause? Commented Oct 24, 2018 at 21:19
  • I suggest you ask a question on SO, give specifics and explain what you want to achieve
    – Andrejs
    Commented Oct 25, 2018 at 8:43
  • @Viuu-a: Then you probably will want to use a simple GROUP BY. Commented Mar 11, 2020 at 12:26
  • Good example. However, with lower counts you would not need the "etc..." at the end; as it stands, the result would have to list hundreds of items. I guess many will overlook that all rows in this table are listed. Commented Jul 23, 2024 at 8:41
30

This is the SQL extension called analytics. The OVER in the SELECT statement tells Oracle that the function is an analytic function, not a GROUP BY function. The advantage of using analytics is that you can collect sums, counts, and a lot more with just one pass through the data, instead of looping through the data with subselects or, worse, PL/SQL.

It does look confusing at first, but it will quickly become second nature. No one explains it better than Tom Kyte, so the link above is great.

Of course, reading the documentation is a must.
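As a rough sketch of "sums, counts, and a lot more" in a single statement, the query below puts several analytic functions side by side. The table name emp_analytics_demo, the sal column and the rows are assumptions for illustration. Whether the engine really makes one pass over the data depends on the optimizer, so check the execution plan for your own case.

```sql
-- Hypothetical sample data; the sal column is assumed for illustration.
CREATE TABLE emp_analytics_demo (empno INT, deptno INT, sal INT);
INSERT INTO emp_analytics_demo VALUES (1, 10, 1000);
INSERT INTO emp_analytics_demo VALUES (2, 10, 2000);
INSERT INTO emp_analytics_demo VALUES (3, 20, 1500);
INSERT INTO emp_analytics_demo VALUES (4, 20, 2500);
INSERT INTO emp_analytics_demo VALUES (5, 20, 3000);

SELECT empno,
       deptno,
       sal,
       -- per-department totals, repeated on every row
       COUNT(*) OVER (PARTITION BY deptno)                    AS dept_count,
       SUM(sal) OVER (PARTITION BY deptno)                    AS dept_total,
       -- adding ORDER BY inside OVER turns the sum into a running total
       SUM(sal) OVER (PARTITION BY deptno ORDER BY empno)     AS running_total,
       -- ranking within the department, highest salary first
       RANK()   OVER (PARTITION BY deptno ORDER BY sal DESC)  AS sal_rank
FROM emp_analytics_demo
ORDER BY deptno, empno;
```

With these rows the result should be:

empno  deptno  sal   dept_count  dept_total  running_total  sal_rank
1      10      1000  2           3000        1000           2
2      10      2000  2           3000        3000           1
3      20      1500  3           7000        1500           3
4      20      2500  3           7000        4000           2
5      20      3000  3           7000        7000           1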

9
EMPNO     DEPTNO DEPT_COUNT

 7839         10          4
 5555         10          4
 7934         10          4
 7782         10          4 --- 4 records in table for dept 10
 7902         20          4
 7566         20          4
 7876         20          4
 7369         20          4 --- 4 records in table for dept 20
 7900         30          6
 7844         30          6
 7654         30          6
 7521         30          6
 7499         30          6
 7698         30          6 --- 6 records in table for dept 30

Here we get the count for each deptno on every row. For deptno 10 there are 4 records in the emp table, so each of those rows shows 4; the same applies to deptno 20 and 30.

1
  • 13
    No explanation to the question of how PARTITION BY works. Just the example output alone does not fully answer the question. Commented Apr 2, 2013 at 14:03
3

The OVER (PARTITION BY ...) clause works as if we were partitioning the data by client_id, creating a subset of rows for each client_id:

select e.client_id, e.operation_date,
       count(*) over (partition by e.client_id) as operationctrbyclient
from client_operations e
order by e.client_id;

For every row, this query returns the total number of operations done by that row's client_id.
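Note that row_number() and count(*) answer different questions over the same partition: the first numbers the rows within each client, the second repeats the client's total on every row. A minimal sketch, assuming a hypothetical client_operations_demo table with only the two columns used above (the DATE '...' literals are ANSI syntax, which Oracle accepts):

```sql
-- Hypothetical sample data for illustration only.
CREATE TABLE client_operations_demo (client_id INT, operation_date DATE);
INSERT INTO client_operations_demo VALUES (1, DATE '2020-01-01');
INSERT INTO client_operations_demo VALUES (1, DATE '2020-01-05');
INSERT INTO client_operations_demo VALUES (1, DATE '2020-01-09');
INSERT INTO client_operations_demo VALUES (2, DATE '2020-01-03');

SELECT client_id,
       operation_date,
       -- 1, 2, 3, ... restarting for each client, in date order
       ROW_NUMBER() OVER (PARTITION BY client_id ORDER BY operation_date) AS operation_seq,
       -- the client's total number of operations, same on every row
       COUNT(*)     OVER (PARTITION BY client_id)                         AS operation_count
FROM client_operations_demo
ORDER BY client_id, operation_date;
```

For client 1 this should give operation_seq 1, 2, 3 with operation_count 3 on each row; for client 2 a single row with 1 and 1.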

1

I think this example shows a small nuance in how partitioning and GROUP BY interact. My example is from Oracle 12, in case the behaviour turns out to be a compiler bug.

I tried:

SELECT t.data_key
,      SUM ( CASE when t.state = 'A' THEN 1 ELSE 0 END) 
OVER   (PARTITION BY t.data_key) count_a_rows
,      SUM ( CASE when t.state = 'B' THEN 1 ELSE 0 END) 
OVER   (PARTITION BY t.data_key) count_b_rows
,      SUM ( CASE when t.state = 'C' THEN 1 ELSE 0 END) 
OVER   (PARTITION BY t.data_key) count_c_rows
,      COUNT (1) total_rows
from mytable t
group by t.data_key;  -- This does not compile: the compiler complains that t.state isn't in the GROUP BY and doesn't recognize the aggregation I'm looking for

This, however, works as expected:

SELECT distinct t.data_key
,      SUM ( CASE when t.state = 'A' THEN 1 ELSE 0 END) 
OVER   (PARTITION BY t.data_key) count_a_rows
,      SUM ( CASE when t.state = 'B' THEN 1 ELSE 0 END) 
OVER   (PARTITION BY t.data_key) count_b_rows
,      SUM ( CASE when t.state = 'C' THEN 1 ELSE 0 END) 
OVER   (PARTITION BY t.data_key) count_c_rows
,      COUNT (1) 
OVER   (PARTITION BY t.data_key) total_rows
from mytable t;

It produces the number of rows in each state for each value of the external key "data_key". So, if data_key = 'APPLE' had 3 rows with state 'A', 2 rows with state 'B', and 1 row with state 'C', the corresponding row for 'APPLE' would be 'APPLE', 3, 2, 1, 6.
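For comparison, the same counts can be written as a plain GROUP BY, which returns one row per data_key without needing DISTINCT. This is a sketch against a hypothetical mytable_demo table whose rows match the 'APPLE' case described above:

```sql
-- Hypothetical sample data: 3 x 'A', 2 x 'B', 1 x 'C' for APPLE.
CREATE TABLE mytable_demo (data_key VARCHAR(20), state VARCHAR(1));
INSERT INTO mytable_demo VALUES ('APPLE', 'A');
INSERT INTO mytable_demo VALUES ('APPLE', 'A');
INSERT INTO mytable_demo VALUES ('APPLE', 'A');
INSERT INTO mytable_demo VALUES ('APPLE', 'B');
INSERT INTO mytable_demo VALUES ('APPLE', 'B');
INSERT INTO mytable_demo VALUES ('APPLE', 'C');

-- Ordinary aggregates (no OVER clause) combined with GROUP BY:
-- the rows are collapsed to one row per data_key.
SELECT t.data_key,
       SUM(CASE WHEN t.state = 'A' THEN 1 ELSE 0 END) AS count_a_rows,
       SUM(CASE WHEN t.state = 'B' THEN 1 ELSE 0 END) AS count_b_rows,
       SUM(CASE WHEN t.state = 'C' THEN 1 ELSE 0 END) AS count_c_rows,
       COUNT(*)                                       AS total_rows
FROM mytable_demo t
GROUP BY t.data_key;
-- Returns a single row: APPLE, 3, 2, 1, 6
```

The rule of thumb is to use either aggregates with GROUP BY or analytic functions with OVER for a given result, not a mixture of both on ungrouped columns.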

2
  • Your query isn't right. First line should be 'SELECT t.data_key, sum(case...) count_a_rows, ... from mytable t group by t.data_key'. Your second one works, because you are using analytic functions instead of the group by aggregate functions. Basically your first try was a little of both which wouldn't make sense. Commented Mar 27, 2023 at 20:25
  • Thanks, that's helpful clarification.
    – georgejo
    Commented Jan 31, 2024 at 16:32
1

You can think of an analytic function as adding a derived query and joining to it.

SELECT e.empno, e.deptno, A.DEPT_COUNT
FROM emp e
INNER JOIN (
    SELECT deptno, COUNT(*) as DEPT_COUNT
    FROM emp
    GROUP BY deptno
) A
ON e.deptno = A.deptno

I used to have these all over my code until windowing functions (analytic functions) were supported in SQL Server.

Notice that this is two queries and one join, as opposed to a scalar subquery in the SELECT list, which would execute the subquery for every row in the emp table. Which one is more efficient depends on how many rows are returned; for large rowsets, the "Joined Derived Analytics" method is usually faster.

If "OVER PARTITION BY" uses the same mechanism, the same applies.
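For reference, here is a sketch of the two other forms mentioned above, the per-row scalar subquery and the analytic function, which should return the same rows as the derived-table join. The table name emp_join_demo and its rows are hypothetical. Which form is fastest depends on the engine and the data, so compare execution plans rather than relying on this example.

```sql
-- Hypothetical sample data for illustration only.
CREATE TABLE emp_join_demo (empno INT, deptno INT);
INSERT INTO emp_join_demo VALUES (1, 10);
INSERT INTO emp_join_demo VALUES (2, 10);
INSERT INTO emp_join_demo VALUES (3, 10);
INSERT INTO emp_join_demo VALUES (4, 20);
INSERT INTO emp_join_demo VALUES (5, 20);

-- Scalar subquery in the SELECT list: logically evaluated once per row.
SELECT e.empno,
       e.deptno,
       (SELECT COUNT(*)
        FROM emp_join_demo d
        WHERE d.deptno = e.deptno) AS dept_count
FROM emp_join_demo e
ORDER BY e.empno;

-- Analytic form: same result, no join and no subquery.
SELECT empno,
       deptno,
       COUNT(*) OVER (PARTITION BY deptno) AS dept_count
FROM emp_join_demo
ORDER BY empno;
```

Both queries should return five rows with dept_count 3, 3, 3, 2, 2.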
