Currently, I'm storing a large amount of time-series data in the following PostgreSQL table:
create table view (
    user_id     uuid,
    product_id  integer,
    banner_id   integer,
    campaign_id integer,
    price       integer,
    created_at  timestamp
);
I get around 80 million rows in this table per day and then aggregate them daily into another table. I've decided to shrink my aggregation window to a 1-minute tumbling window, aligned to the top of the hour. I'm doing this to save storage space, since the table keeps growing.
This means that I will aggregate every minute and store the results in another table. However, I'm afraid that the constant deletions (one per minute) will lock the table and block inserts. Here's the deletion query:
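As a rough sketch of the window alignment, the bounds of the most recently completed 1-minute window could be computed as shown here. It assumes the job derives the bounds from the database clock and that created_at is written in the session's time zone. Both are assumptions for illustration, and the column aliases window_start and window_end are hypothetical.

```sql
-- Bounds of the most recently completed 1-minute tumbling window.
-- localtimestamp is used because created_at is "timestamp" (no time zone);
-- this assumes rows are stamped in the session's time zone.
select
    date_trunc('minute', localtimestamp) - interval '1 minute' as window_start,
    date_trunc('minute', localtimestamp)                       as window_end;
```

Because date_trunc('minute', ...) zeroes out the seconds, every window starts on a whole minute and the windows line up with the top of the hour.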
delete from view where created_at between '2023-01-01 13:58:00' and '2023-01-01 13:59:00'
I run it after my aggregation query, which looks a bit like this:
select * from view where created_at between '2023-01-01 13:58:00' and '2023-01-01 13:59:00'
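One detail about the window bounds: BETWEEN is inclusive on both ends, so a row stamped exactly '2023-01-01 13:59:00' matches both the 13:58 window and the 13:59 window. A minimal sketch of the same two statements over a half-open range, assuming each window should own its start but not its end:

```sql
-- Aggregation read for the window [13:58:00, 13:59:00).
select *
from view
where created_at >= '2023-01-01 13:58:00'
  and created_at <  '2023-01-01 13:59:00';

-- Matching delete over the same half-open range, run afterwards.
delete from view
where created_at >= '2023-01-01 13:58:00'
  and created_at <  '2023-01-01 13:59:00';
```

With these bounds, consecutive windows never overlap, so a boundary row is aggregated and deleted exactly once.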
I have around 900 inserts per second on the table. The tumbling window lets me not worry about race conditions or missing data, but I'm concerned the deletes might interfere with the inserts.
Therefore, my questions are:
- Is it recommended to delete daily or every hour instead of every minute?
- Would creating an unlogged table help in this scenario?
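For reference, a minimal sketch of what the unlogged variant from the second question could look like, assuming the same columns. The table name view_unlogged is hypothetical. Note that unlogged tables skip the write-ahead log, so their contents are truncated after a crash and are not replicated to standbys.

```sql
-- Hypothetical unlogged copy of the table definition above.
create unlogged table view_unlogged (
    user_id     uuid,
    product_id  integer,
    banner_id   integer,
    campaign_id integer,
    price       integer,
    created_at  timestamp
);

-- Alternatively, an existing table can be switched in place
-- (this rewrites the table and takes an exclusive lock while it runs):
-- alter table view set unlogged;
```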