Using array_agg to make array out of N to 1 columns

Question

I have a database where rows in table A could have N >= 0 associated rows in table B. I am trying to construct a query that returns all of the columns of A with an array of all of the associated rows in table B. It looks like this should be possible according to the documentation. It looks like I want case 4 aggregate_name ( * ) [ FILTER ( WHERE filter_clause ) ] but when I try to use array_agg(*) it results in an error:

No function matches the given name and argument types. You might need to add explicit type casts.

What am I missing here? I have seen the string concatenation approach but if I could just select everything without having to specify columns that would be easier I think.

SQL:

CREATE TABLE containers (
  id bigint PRIMARY KEY
);

CREATE TABLE elements (
  id bigint PRIMARY KEY,
  container_id bigint REFERENCES containers(id)
);

insert into containers VALUES (1), (2), (3);
insert into elements VALUES (1, 1), (2, 1), (3, 2);

SELECT *
FROM containers
JOIN (
  SELECT elements.id, array_agg(*)
  FROM elements
  GROUP BY elements.id
) AS elements ON elements.container_id=containers.id

db_fiddle: https://dbfiddle.uk/C8XgWVLW

Questions need to be stand alone... please also put the contents of your fiddle directly into the question. — Dale K, Commented Mar 3 at 19:59
@DaleK The question has been updated to contain the SQL for the example. — Cogito Ergo Sum, Commented Mar 3 at 20:11
What would be your expected results? Would you want arrays of [1, 1] [2, 1] etc? What if the columns of elements were different types, such as strings or dates? — Charlieface, Commented Mar 3 at 21:04

JNevill · Accepted Answer · 2025-03-03 20:25:14Z

You don't share your desired output for this sample data, so I'm sharing two options.

One, using array_agg which will give a less descriptive output as the column names are not preserved, merely the column ordinals:

SELECT *
FROM containers
JOIN (
  SELECT container_id, array_agg(elements)
  FROM elements
  GROUP BY container_id
) AS elements ON elements.container_id=containers.id

id	container_id	array_agg
1	1	{"(1,1)","(2,1)"}
2	2	{"(3,2)"}

https://dbfiddle.uk/8XJ4FlaE

And another using json_agg() which serializes the data with the column names:

SELECT *
FROM containers
JOIN (
  SELECT elements.container_id, json_agg(elements)
  FROM elements
  GROUP BY elements.container_id
) AS elements ON elements.container_id=containers.id

id	container_id	json_agg
1	1	[{"id":1,"container_id":1}, {"id":2,"container_id":1}]
2	2	[{"id":3,"container_id":2}]

https://dbfiddle.uk/h0HuyzB5

You'll notice two key changes from your attempt:

We are selecting and grouping on container_id in your subquery since that is your join criteria and desired level of granularity.
We supply the table name (or alias) to the aggregation function so that all columns are included in the array aggregation.

Lastly, both of these can be rewritten without the subquery:

SELECT c.id, json_agg(e)
FROM containers c
INNER JOIN elements e
    ON c.id = e.container_id
GROUP BY c.id

This looks like what I needed. What concept am I missing to know to use the table name in a situation like that? I know it works but I don't really know why. I will also accept this as the answer once the timer allows me to. — Cogito Ergo Sum, Commented Mar 3 at 20:52
To put in procedural terms (as opposed to set-based/declarative terms) You can kind of think of a table as multi-dimensional array and a SQL select statement as row-by-row operation. So instead of specifying the column, which would be a scalar value, you are specifying a table, which would be a single-dimensional array. Because you are aggregating with GROUP BY then the output ends up with a single row containing a multi-dimensional output. — JNevill, Commented Mar 3 at 21:52

Guillaume Outters · Accepted Answer · 2025-03-03 20:30:08Z

1

You'll aggregate complete records, not a list of fields of the record: that is, pass the table name instead of *.

Additionally, you seem to want to group by container_id.

All in all, the query you'll like may be:

SELECT *
FROM containers
JOIN (
  SELECT container_id, array_agg(elements) elements
  FROM elements
  GROUP BY container_id
) AS elements ON elements.container_id=containers.id;

(see it in your fiddle, adapted)

answered Mar 3 at 20:30

Guillaume Outters

2,8381 gold badge18 silver badges21 bronze badges

Add a comment |

Zegarek · Accepted Answer · 2025-03-04 10:13:31Z

SELECT id, array_agg((e.*))
FROM containers
JOIN elements AS e(eid,id)
USING(id)
GROUP BY id;

id	array_agg
1	{"(1,1)","(2,1)"}
2	{"(3,2)"}

_{demo at db<>fiddle}

Your array_agg(*) concept isn't incorrect but that's not the right, or complete syntax for it.
An asterisk * as an aggregate function argument doesn't mean the same thing as it does in a select list: I take that you expected it to be interpreted as all columns, the way it works when used directly in the select list. Meanwhile, as an aggregate function argument, it means the opposite: no columns.

arg_data_type
An input data type on which this aggregate function operates. To create a zero-argument aggregate function, write * in place of the list of argument specifications. (An example of such an aggregate is count(*).)

It's also mentioned in the doc that you linked:

The fourth form invokes the aggregate once for each input row; since no particular input value is specified, it is generally only useful for the count(*) aggregate function.

There's no zero-argument version of the array_agg() function defined in a standard PostgreSQL build, so an array_agg(*) won't work. Note how the error message you got says the function it couldn't find is a function with no arguments (empty parentheses instead of an asterisk or a list of types corresponding to all fields):

ERROR:  function array_agg() does not exist
LINE 4:   SELECT elements.id, array_agg(*)
                              ^
HINT:  No function matches the given name and argument types. You might need to add explicit type casts.

There is only a single-element version, and you can by all means pass whole rows into it, as long as you parenthesise and qualify the * expression, like in the example up top.

Aliasing the elements fields to match join column names enables JOIN..USING and stops it from listing them twice.

There's a slight advantage of array_agg((e.*)) over already mentioned array_agg(e): the former lets you mix things in. If you don't add anything, it works exactly the same:

SELECT id
     , array_agg(e)
     , array_agg((e))
     , array_agg((e.*))
FROM containers
JOIN elements AS e(eid,id)
USING(id)
GROUP BY id;

id	array_agg	array_agg	array_agg
1	{"(1,1)","(2,1)"}	{"(1,1)","(2,1)"}	{"(1,1)","(2,1)"}
2	{"(3,2)"}	{"(3,2)"}	{"(3,2)"}

If you do add something, the tablename-as-select-expression syntax will nest the record:

SELECT id
     , array_agg((e.* , 'x' , id*e.eid))
     , array_agg((e   , 'x' , id*e.eid))
FROM containers
JOIN elements AS e(eid,id)
USING(id)
GROUP BY id;

id	array_agg	array_agg
1	{"(1,1,x,1)","(2,1,x,2)"}	{"("(1,1)",x,1)","("(2,1)",x,2)"}
2	{"(3,2,x,6)"}	{"("(3,2)",x,6)"}

Keep in mind that if you do mix something in, the resulting type will no longer be a regular array type elements[] but a pseudo-type record[] and those cannot be used in column definitions (e.g. create table..as.. and create view..as.. will refuse to accept a query with a pseudo-type column).

Collectives™ on Stack Overflow

Using array_agg to make array out of N to 1 columns

3 Answers 3

Linked

Hot Network Questions

Collectives™ on Stack Overflow

3 Answers 3

Linked

Related