Postgres - group rows by user, return one row per user in each group

Question

I have a purchases table:

-----------------
user_id | amount
-----------------
1       | 12                     
1       | 4                     
1       | 8     
2       | 23                    
2       | 45                    
2       | 7

I want a query that returns one row per user_id: the row with the smallest amount for that user. My result set should be:

-----------------
user_id | amount
-----------------                   
1       | 4                                        
2       | 7

Using DISTINCT on the user_id column ensures I don't get duplicate users, but I don't know how to make it return the row with the smallest amount for each user.

Gordon Linoff · Accepted Answer · 2021-04-03 01:27:59Z


You can use distinct on:

select distinct on (user_id) p.*
from purchases p
order by user_id, amount;

Note: If you just want the smallest amount, then group by would be the typical solution:

select user_id, min(amount) as amount
from purchases
group by user_id;

Distinct on is a convenient Postgres extension to standard SQL that makes it easy to get one row per group, and it often performs better than other methods.
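
As a minimal sketch of how this can be tried out, assuming the table is named purchases as in the question and that both columns are integers (the column types are an assumption, since the question shows only the column names):

```sql
-- Assumed schema: the question gives the column names but not their types.
create table purchases (
    user_id integer not null,
    amount  integer not null
);

insert into purchases (user_id, amount) values
    (1, 12), (1, 4), (1, 8),
    (2, 23), (2, 45), (2, 7);

-- distinct on keeps the first row of each user_id group;
-- order by decides which row is first (here: the smallest amount).
select distinct on (user_id) user_id, amount
from purchases
order by user_id, amount;
```

With the sample data this returns (1, 4) and (2, 7). The leftmost order by expressions have to match the distinct on expressions, which is why user_id comes first in the order by.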


Paul Maxwell · Accepted Answer · 2021-04-03 01:49:57Z

If you need to output the whole row that holds the smallest amount, e.g. the table includes a transaction date and you need it in the output, then a convenient method is to use row_number() over() to select the wanted rows, e.g.

CREATE TABLE mytable(
   user_id  INTEGER  NOT NULL
  ,amount   INTEGER  NOT NULL
  ,trandate DATE   NOT NULL
);
INSERT INTO mytable(user_id,amount,trandate) VALUES (1,12,'2020-09-12');
INSERT INTO mytable(user_id,amount,trandate) VALUES (1,4,'2020-10-02');
INSERT INTO mytable(user_id,amount,trandate) VALUES (1,8,'2020-11-12');
INSERT INTO mytable(user_id,amount,trandate) VALUES (2,23,'2020-12-02');
INSERT INTO mytable(user_id,amount,trandate) VALUES (2,45,'2021-01-12');
INSERT INTO mytable(user_id,amount,trandate) VALUES (2,7,'2021-02-02');

select
user_id, amount, trandate
from (
    select user_id, amount, trandate
        , row_number() over(partition by user_id order by amount) as rn
    from mytable
    ) t
where rn = 1

result:

+---------+--------+------------+
| user_id | amount |  trandate  |
+---------+--------+------------+
|       1 |      4 | 2020-10-02 |
|       2 |      7 | 2021-02-02 |
+---------+--------+------------+

A demonstration of this is available on db<>fiddle.
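
If two rows for the same user share the smallest amount, row_number() picks one of them arbitrarily. A sketch of one way to make the choice deterministic, assuming the earliest trandate should win (that rule is an assumption, not something the question specifies):

```sql
select
user_id, amount, trandate
from (
    select user_id, amount, trandate
        -- trandate is an assumed tie-breaker for rows with equal amounts
        , row_number() over(partition by user_id order by amount, trandate) as rn
    from mytable
    ) t
where rn = 1;
```

Using rank() with order by amount alone would instead return every row tied for the minimum.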

Why would you not use select user_id, min(amount) from purchases group by user_id? Isn't that more performant? Honest question.
@hrmnjt Using min(amount) doesn't automatically align with its transaction date, but using row_number() returns the whole row, so you can keep the minimum amount and its transaction date (or any other data on that row) together. So it's not a question of performance, but of data integrity.
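
To illustrate the point made in this exchange: a plain group by can return the minimum amount, but getting the matching trandate then requires joining the aggregate back to the table. A sketch against the mytable example from the answer (the alias names m and lo are only illustrative):

```sql
-- min(amount) alone: one row per user, but trandate is not available.
select user_id, min(amount) as amount
from mytable
group by user_id;

-- Joining the aggregate back recovers the full row(s) holding the minimum.
select m.user_id, m.amount, m.trandate
from mytable m
join (
    select user_id, min(amount) as amount
    from mytable
    group by user_id
) lo on lo.user_id = m.user_id
    and lo.amount = m.amount;
```

Unlike row_number(), the join returns every row tied for the minimum, so a user can appear more than once.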
