I'm using PostgreSQL on Windows for a Planet OSM database and have noticed a considerable decrease in performance after upgrading from v10 to v11 or v12. Here are the details of the experiment I conducted to figure out what is causing the issue.

Installed PostgreSQL 10 from scratch. Created a database and a table.

CREATE TABLE ways (
    id bigint NOT NULL,
    version int NOT NULL,
    user_id int NOT NULL,
    tstamp timestamp without time zone NOT NULL,
    changeset_id bigint NOT NULL,
    tags hstore,
    nodes bigint[]
);

Imported ways data from a file and added a primary key.

SET synchronous_commit TO OFF;
COPY ways FROM 'E:\ways.txt';
ALTER TABLE ONLY ways ADD CONSTRAINT pk_ways PRIMARY KEY (id);

The file is 365GB in size.

The copy operation took 3.5h and the resulting table size is 253GB. The primary key operation took 20 minutes and occupied 13GB of disk space.
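
For reference, figures like these can be read back with PostgreSQL's built-in size functions. This is a minimal sketch that assumes the table and constraint names used above; it is not necessarily how the numbers in this post were obtained.

```sql
-- Size of the table itself (heap plus TOAST) and of the primary key index.
SELECT pg_size_pretty(pg_table_size('ways'))       AS table_size,
       pg_size_pretty(pg_relation_size('pk_ways')) AS pk_index_size;
```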

Then I uninstalled PostgreSQL v10, deleted the data directory and installed v11 from scratch. I created the same kind of database and table. v11 is not able to handle large files, so I piped the data through the cmd type command, and then added the primary key with the same command as above. synchronous_commit was turned off beforehand, as above.

COPY ways FROM PROGRAM 'cmd /c "type E:\ways.txt"';

The copy operation took 7 hours and adding the primary key took 1h 40m! The resulting table and primary key sizes are the same as in v10. I also observed a very high load on the disk drive (quite often at 100%).

v12 performs the same as v11.

Here are the changes in the v11 default postgresql.conf file compared to the v10 one. Differences in the Authentication, Replication and Logging sections are skipped.

-#replacement_sort_tuples = 150000
+#max_parallel_maintenance_workers = 2
+#parallel_leader_participation = on
~max_wal_size = 1GB     (commented out in v10)
~min_wal_size = 80MB    (commented out in v10)
+#enable_parallel_append = on
+#enable_partitionwise_join = off
+#enable_partitionwise_aggregate = off
+#enable_parallel_hash = on
+#enable_partition_pruning = on
+#jit_above_cost = 100000
+#jit_inline_above_cost = 500000
+#jit_optimize_above_cost = 500000
+#jit = off
+#jit_provider = 'llvmjit'
+#vacuum_cleanup_index_scale_factor = 0.1
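
Since most of these lines are commented-out defaults, the effective values can also be compared directly on each server. The following is a minimal sketch using the pg_settings view. The list of names is just the sort-, index- and WAL-related entries from the diff above, plus maintenance_work_mem, which is added here as an assumption because it also affects index builds. A name that does not exist in a given version simply returns no row.

```sql
-- Run on both the v10 and the v11/v12 server and compare the output.
SELECT name, setting, unit, source
FROM pg_settings
WHERE name IN ('replacement_sort_tuples',          -- removed in v11
               'max_parallel_maintenance_workers', -- new in v11
               'parallel_leader_participation',    -- new in v11
               'maintenance_work_mem',             -- not in the diff; added for context
               'max_wal_size',
               'min_wal_size')
ORDER BY name;
```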

So what is hurting the performance?

Update

I have managed to split the 365GB file into 2GB chunks with the split Unix utility in a MinGW shell, like so:

split -C 2G ways.txt

Then I imported the files into a clean database with the following cmd command:

for /f %f in ('dir /b') do psql -U postgres -w -d osm -t -c "set client_encoding TO 'UTF8'; copy ways from 'D:\ways\%f';"

The operation took ~3.5 hours, which is the same as v10!

Prior to that I set parallel_leader_participation = on and synchronous_commit = off in the config file and restarted the server.

Then I logged into the psql interactive terminal and ran

ALTER TABLE ONLY ways ADD CONSTRAINT pk_ways PRIMARY KEY (id);

It took 1h 10m, which is 30m faster than with the default settings (after the type command, if that really matters) but still more than 3 times slower than in v10.
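
One experiment that might help isolate the index-build slowdown is to repeat the primary key step with the parallel index build introduced in v11 switched off for the session. This is an untested sketch, not a known fix: it only assumes that max_parallel_maintenance_workers (one of the new settings in the diff above) could be involved, and it reuses the table and constraint names from this post.

```sql
-- Inspect the current values first (maintenance_work_mem is shown only for context).
SHOW max_parallel_maintenance_workers;
SHOW maintenance_work_mem;

-- Session-level change: 0 disables parallel workers for CREATE INDEX,
-- so the primary key index is built serially, as in v10.
SET max_parallel_maintenance_workers = 0;

-- Drop the existing constraint (if any) and time the rebuild.
ALTER TABLE ONLY ways DROP CONSTRAINT IF EXISTS pk_ways;
ALTER TABLE ONLY ways ADD CONSTRAINT pk_ways PRIMARY KEY (id);

-- Restore the server default for the rest of the session.
RESET max_parallel_maintenance_workers;
```

If the serial build lands close to the v10 timing, the parallel build path is a likely suspect; if not, the cause is elsewhere.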

  • Did you run SET synchronous_commit TO OFF; on the 11 or 12 installation as well? But the redirection through type is most probably the killer here. Did you try this on a Linux box? There this "bug" with the 2GB limit does not exist (assuming you are talking about this). In general it is recommended to run Postgres on Linux anyway (for production use), as it has better performance than on Windows. Commented Nov 28, 2019 at 15:52
  • Another option might be pgloader which can use multiple threads for parallel loading of large files - but I don't know how easy it is to get this working on Windows. Commented Nov 28, 2019 at 15:59
  • Yes, synchronous_commit is turned off in all cases prior to copy. I don't think type is the killer here, because adding the primary key shows the same degraded performance as well. Commented Nov 28, 2019 at 17:43
  • I should admit that type does impact the copy operation, but the considerable slowdown in the indexing remains in question. Commented Jan 25, 2020 at 17:26
  • Somehow synchronous_commit = on has 0 effect on Windows 10, but a dramatic effect on Windows Server 2019 (x7 slowdown). Commented May 10, 2021 at 19:20
