Postgresql small index for row contains query

Question

I want to store several hundred gigabytes of geodata in a PostgreSQL database. I want to query the data by position, time, and/or a unique identifier for each object.

My table layout is similar to this:

    CREATE TABLE objects (
        id  int             NOT NULL,
        at  timestamp       NOT NULL,
        pos geometry(Point) NOT NULL
        /* Other columns irrelevant to the question */
    ) PARTITION BY RANGE (at);

BRIN indexes on at and pos serve my needs well, since they speed up the queries while staying relatively small.

A B-tree index on id quickly grows to several gigabytes, so server memory cannot hold many of them. BRIN is not suitable here, because the ids are widely scattered within each page, so the minimum and maximum are not useful statistics.

Is there an alternative index type that can speed up queries like SELECT ... FROM objects WHERE id = x with a smaller index?

The table is append-only: new rows are only ever added for the newest timestamps.

You could CLUSTER on id and use BRIN, especially if you're not insert/update heavy.
– Evan Carroll, Commented Aug 8, 2018 at 10:04
@EvanCarroll good suggestion, although that will make the pos or at BRIN indexes less effective.
– dtech, Commented Aug 8, 2018 at 10:27
Where is id coming from? Who assigns them, and where does the querent obtain them to incorporate into the query at issue? Are you free to reorganize them? It seems strange that at and pos are naturally correlated so as to be mutually useful for BRIN. One would naively expect at and id to be in that situation.
– jjanes, Commented Aug 8, 2018 at 21:17
@jjanes id is a unique identifier for the physical object; think car license plate. It's a number but non-sequential, so for this use case it is more or less random. If you think about it in terms of cars and license plates, it is easy to see why (license_plate, time) and pos are correlated: if two records for the same license plate are 1 second apart, the pos of the two records will also be very close together.
– dtech, Commented Aug 9, 2018 at 11:57
Thanks for the explanation. Now I can see how time and pos can be highly correlated conditioned on the same id, but I don't see how such conditioned correlation gets you compatibility for effective BRIN indexes. I think you would need unconditional correlation for that.
– jjanes, Commented Aug 9, 2018 at 12:10

jjanes · Accepted Answer · 2018-08-09 12:28:15Z

It seems like each value of id has many occurrences in the table. Given that, you can use a GIN index, whose compressed posting lists save space. GIN indexes are usually used with array types or their equivalents, but the btree_gin extension gives you GIN operator classes for scalar types.
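
A minimal sketch of that setup, assuming the objects table from the question. The index name objects_id_gin and the id value are made up for illustration. On PostgreSQL versions before 11, which cannot create an index on a partitioned parent, the CREATE INDEX would have to be run against each partition instead.

```sql
-- btree_gin is one of PostgreSQL's contrib modules; it provides GIN
-- operator classes for scalar types such as int.
CREATE EXTENSION IF NOT EXISTS btree_gin;

-- Hypothetical index name. With many rows per id, each distinct id is
-- stored once, followed by a compressed list of row pointers.
CREATE INDEX objects_id_gin ON objects USING gin (id);

-- Equality lookups on id can then use the GIN index.
SELECT * FROM objects WHERE id = 12345;
```

Comparing the size of this index with the existing B-tree on a single partition is probably the quickest way to see whether the saving is worth it for this data.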

However, GIN indexes have several limitations. They don't support ORDER BY, which doesn't seem to matter to you. They also don't support index-only scans, which may be important to you. Even if the index shrinks enough to fit in RAM, each tuple pointer found in the index still has to be resolved by visiting the table. If your B-tree index doesn't fit in RAM, surely your table doesn't either, so you will still have lots of random seeking into the table.
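
To check where the time actually goes, the buffer counters in EXPLAIN output can help. A sketch, with a placeholder id value:

```sql
-- GIN indexes are only used through bitmap scans, so the plan should show a
-- Bitmap Index Scan on the GIN index feeding a Bitmap Heap Scan on the table.
-- In the "Buffers:" lines, "read" counts pages that were not found in
-- shared_buffers. A node's counts include those of its children, so the
-- heap's own share is the Bitmap Heap Scan count minus the Bitmap Index
-- Scan count beneath it.
EXPLAIN (ANALYZE, BUFFERS)
SELECT * FROM objects WHERE id = 12345;
```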

Indeed, that was almost certainly the problem in the first place: not random seeks against the B-tree index, but random seeks against the table. When a BRIN index drives table access, you naturally visit the table with high locality of reference. It is not that the BRIN index itself is so effective; rather, the same condition that allows BRIN to be used also makes table access efficient for the particular set of rows being returned. If you were to replace the BRIN indexes with equivalent B-tree indexes, you would probably find that the storage requirement goes up while performance stays mostly the same.
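
A rough way to look at this on a single partition, assuming a hypothetical partition named objects_2018_08 that has already been analyzed:

```sql
-- Physical-vs-logical ordering per column, as estimated by ANALYZE.
-- Values near 1 or -1 mean rows with nearby values sit on nearby pages,
-- which is the locality that makes table access cheap.
SELECT attname, correlation
FROM pg_stats
WHERE tablename = 'objects_2018_08'
  AND attname IN ('id', 'at');

-- On-disk size of every index on that partition, for comparing a BRIN
-- index with a B-tree or GIN index on the same column.
SELECT indexname,
       pg_size_pretty(
           pg_relation_size(format('%I.%I', schemaname, indexname)::regclass)
       ) AS index_size
FROM pg_indexes
WHERE tablename = 'objects_2018_08';
```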
