Showing posts with label mysql. Show all posts

Tuesday, June 16, 2026

The insert benchmark on a small server, IO-bound workload : Postgres 19 beta1

This has results for Postgres versions 19 beta1, 18.4 and 17.10 with the Insert Benchmark on a small server using an IO-bound workload. I also used MySQL 8.4.8 to see where performance was different.

Postgres continues to be boring in a good way. It is hard to find performance regressions.

 tl;dr

  • create index (the l.x step) is faster in Postgres 19 beta1. A Postgres expert told me that the sort algorithm was changed to be more CPU efficient.
  • the write heavy steps (l.i1, l.i2) are 5% and 9% faster in 19 beta1 vs Postgres 17.10 (relative QPS of 1.05 and 1.09 in the table below)
  • the second write heavy step (l.i2) is more than 20X faster in MySQL 8.4.8 vs Postgres because of the CPU overhead from get_actual_variable_range in Postgres. I have written about this before.

Builds, configuration and hardware

I compiled Postgres from source using -O2 -fno-omit-frame-pointer for versions 19 beta1, 18.4 and 17.10.

I compiled MySQL 8.4.8 from source as well.

The server is a Beelink SER7 with a Ryzen 7 7840HS CPU with 8 cores and AMD SMT disabled, and 32G of RAM. Storage is one SSD for the OS and an NVMe SSD for the database using ext4 with discard enabled. The OS is Ubuntu 24.04.

For 17.10 the config file is named conf.diff.cx10a_c8r32 (cx10a) and is here.

For Postgres 18 and 19 the config file is conf.diff.cx10b_c8r32 (cx10b) which is as similar as possible to the config for version 17.

For MySQL 8.4.8 the config file is my.cnf.cz12a_c8r32.

The Benchmark

The benchmark is explained here and is run with 1 client.

The point query (qp100, qp500, qp1000) and range query (qr100, qr500, qr1000) steps are run for 3600 seconds each.

The benchmark steps are:

  • l.i0
    • insert 800M rows per table in PK order. The table has a PK index but no secondary indexes. There is one connection per client.
  • l.x
    • create 3 secondary indexes per table. There is one connection per client.
  • l.i1
    • use 2 connections/client. One inserts 4M rows per table and the other does deletes at the same rate as the inserts. Each transaction modifies 50 rows (big transactions). This step is run for a fixed number of inserts, so the run time varies depending on the insert rate.
  • l.i2
    • like l.i1 but each transaction modifies 5 rows (small transactions) and 1M rows are inserted and deleted per table.
    • Wait for S seconds after the step finishes to reduce variance during the read-write benchmark steps that follow. The value of S is a function of the table size.
  • qr100
    • use 3 connections/client. One does range queries and performance is reported for this. The second does 100 inserts/s and the third does 100 deletes/s. The second and third are less busy than the first. The range queries use covering secondary indexes. If the target insert rate is not sustained then that is considered to be an SLA failure. If the target insert rate is sustained then the step does the same number of inserts for all systems tested. This step is frequently not IO-bound for the IO-bound workload.
  • qp100
    • like qr100 except uses point queries on the PK index
  • qr500
    • like qr100 but the insert and delete rates are increased from 100/s to 500/s
  • qp500
    • like qp100 but the insert and delete rates are increased from 100/s to 500/s
  • qr1000
    • like qr100 but the insert and delete rates are increased from 100/s to 1000/s
  • qp1000
    • like qp100 but the insert and delete rates are increased from 100/s to 1000/s
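
Because l.i1 and l.i2 run for a fixed number of inserts, the number of insert transactions per table follows from the row counts and rows per transaction listed above. This is a minimal sketch of that arithmetic. The WriteStep class and its field names are mine, not part of the benchmark client, and it assumes each insert transaction inserts exactly the stated number of rows.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WriteStep:
    """Hypothetical container for the write-heavy steps described above."""
    name: str
    rows_inserted: int  # rows inserted per table during the step
    rows_per_txn: int   # rows modified by each transaction

    def insert_transactions(self) -> int:
        # The step runs for a fixed number of inserts, so this count is fixed too.
        return self.rows_inserted // self.rows_per_txn


steps = [
    WriteStep("l.i1", rows_inserted=4_000_000, rows_per_txn=50),
    WriteStep("l.i2", rows_inserted=1_000_000, rows_per_txn=5),
]

for step in steps:
    print(step.name, step.insert_transactions())
```

Under these assumptions l.i2 commits 2.5X more insert transactions than l.i1 while inserting 4X fewer rows, which is why per-transaction overheads matter more for l.i2.
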
Results

The performance summary with charts is here.

This table lists relative QPS per benchmark step and relative QPS is:
    (QPS for my version / QPS for Postgres 17.10)
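
The relative QPS formula above can be sketched in a few lines. The absolute rates used here are hypothetical and only illustrate the arithmetic. The function names are mine.

```python
def relative_qps(qps: float, base_qps: float) -> float:
    """QPS for some version divided by QPS for the base version."""
    if base_qps <= 0:
        raise ValueError("base QPS must be positive")
    return qps / base_qps


def percent_change(rqps: float) -> float:
    """Convert relative QPS into a percent change versus the base version."""
    return (rqps - 1.0) * 100.0


# Hypothetical absolute rates, not measurements from this post.
base_rate = 1000.0
new_rate = 1090.0

r = relative_qps(new_rate, base_rate)
print(f"rQPS={r:.2f} change={percent_change(r):+.0f}%")
```

A relative QPS of 1.09 means the version is about 9% faster than the base version and a value of 0.97 means it is about 3% slower.
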

The background in the table cells is blue for big improvements and yellow for regressions. There are no regressions here. 

The improvements here for Postgres 19 beta1 are similar to what I reported for the cached workload.

The index create (l.x) step is much faster in 19 beta1. I usually ignore results on this step but I am curious if something was done in 19 beta1 to improve index create. A Postgres expert told me that the sort algorithm for index create was changed in version 19 to be more CPU efficient.

For the write-heavy steps (l.i1, l.i2):
  • there are improvements in 19 beta1 (5% for l.i1 and 9% for l.i2 per the table below). The CPU overhead is lower in 19 beta1 compared to 17.10 (see cpupq here).
  • throughput for the l.i2 step is more than 20X larger for MySQL than for Postgres. From vmstat I see that the CPU overhead (cpupq here) is more than 10X larger with Postgres vs MySQL. From flamegraphs the problem is the CPU overhead in get_actual_variable_range. I have written about this before (see here). The Postgres query planner uses too much CPU skipping old row versions to figure out selectivity for a query. There are too many old versions because Postgres doesn't collect them ASAP and vacuum takes time. The flamegraphs are in subdirectories here.
For the range query steps (qr100, qr500, qr1000) throughput is ~3% less in 19 beta1 vs 17.10 and ~1% less in 18.4 vs 17.10. For 19 beta1 there is a small increase in CPU overhead (see cpupq here, here and here). I already have flamegraphs for MySQL 8.4.8 and Postgres 19 beta1. Soon I will have them for Postgres 17.10 and 18.4 to try to explain this.

dbms            l.i0    l.x     l.i1    l.i2    qr100   qp100   qr500   qp500   qr1000  qp1000
PG 17.10        1.00    1.00    1.00    1.00    1.00    1.00    1.00    1.00    1.00    1.00
PG 18.4         1.01    1.03    1.00    1.00    0.98    1.00    0.99    0.99    0.99    0.99
PG 19 beta1     1.01    1.15    1.05    1.09    0.97    1.01    0.96    1.01    0.97    1.00
MySQL 8.4.8     0.77    0.89    0.76    21.62   0.61    1.07    0.66    0.93    0.85    0.84

Thursday, June 11, 2026

Write-heavy sysbench tests, a large server, modern Postgres and MySQL

This has results for modern Postgres and MySQL using write-heavy tests from sysbench and a large server. I think there are regressions in Postgres that arrive in some of versions 16, 17, 18 and 19 beta1 but I am far from certain and this blog post is just another step in my journey to figure that out.

tl;dr

  • Postgres suffers a lot from throughput variation while MySQL+InnoDB does not
  • InnoDB gets much better average throughput on 6 of 10 tests, similar throughput on one, and Postgres does better on 3 of 10 tests
  • For tests from which I provided vmstat and iostat results, Postgres does more write IO per operation. In some cases InnoDB uses more CPU, in other cases it does not.

Builds, configuration and hardware

I compiled:
  • Postgres from source for versions 15.17, 16.13, 17.9, 18.3 and 19 beta1.
  • MySQL from source for version 8.4.7
I used a 48-core server from Hetzner
  • an ax162s with an AMD EPYC 9454P 48-Core Processor with SMT disabled
  • 2 Intel D7-P5520 NVMe storage devices with RAID 1 (3.8T each) using ext4
  • 128G RAM
  • Ubuntu 24.04
Configuration files for Postgres:
  • the config file is named conf.diff.cx10a_c32r128 (x10a_c32r128) and is here for versions 15, 16 and 17.
  • for Postgres 18 I used conf.diff.cx10b_c32r128 (x10b_c32r128) which is as close as possible to the Postgres 17 config and uses io_method=sync
Benchmark

I used sysbench and my usage is explained here. Normally I run 32 of the 42 microbenchmarks listed in that blog post using tables small enough to be cached by the DBMS. Most test only one type of SQL statement.

The tests can be called microbenchmarks. They are very synthetic. But microbenchmarks also make it easy to understand which types of SQL statements have great or lousy performance. Performance testing benefits from a variety of workloads -- both more and less synthetic.

But I did things differently here:
  • I only run the write-heavy tests (to save time)
  • The tables are larger than memory and cannot be cached
  • Each test (microbenchmark) is run for 2 hours when I normally run each for 15 minutes
  • After each test a vacuum is done
The purpose is to search for regressions from new CPU overhead and mutex contention related to MVCC GC (vacuum for Postgres, purge for InnoDB).

Results

I provide charts below with relative QPS. The relative QPS is the following:
(QPS for some version) / (QPS for Postgres 15.17)
When the relative QPS is > 1 then some version is faster than the base version. When it is < 1 then there might be a regression. When the relative QPS is 1.2 then some version is about 20% faster than the base version.

The per-test results from vmstat and iostat can help to explain why something is faster or slower because they show how much HW is used per request, including CPU overhead per operation (cpu/o) and context switches per operation (cs/o), which are often a proxy for mutex contention.

Results: writes

The table below has relative QPS for Postgres 16 to 19 and then InnoDB all relative to the throughput for Postgres 15.17. Columns 1 to 4 have results for Postgres and the numbers in yellow highlight the tests where there is a regression in Postgres. For column 5 (MySQL with InnoDB) the numbers in yellow and red indicate tests where InnoDB's throughput is less than Postgres. And then the numbers in green indicate tests where InnoDB's throughput is much larger than Postgres.

Note that when relative QPS (rQPS) is 0.90 then throughput dropped by ~10%.

Summary:
  • throughput for Postgres drops after version 15.17. I don't know yet whether this is a regression.
  • throughput for InnoDB is much better than Postgres in 6 of 10 tests, similar in one test, and much worse in 3 of 10 tests.
The sections that follow this one have more detail on results from the update-index, update-zipf and insert tests.

Relative to: Postgres 15.17
col-1 : Postgres 16.13
col-2 : Postgres 17.9
col-3 : Postgres 18.3
col-4 : Postgres 19 beta1
col-5 : MySQL 8.4.7

col-1   col-2   col-3   col-4   col-5
0.94    0.97    0.98    1.02    1.88    update-inlist
0.94    0.90    0.88    0.92    1.43    update-index
0.91    0.86    0.87    0.92    1.19    update-nonindex
0.96    0.99    0.98    0.98    0.71    update-one
0.92    0.83    0.81    0.85    0.93    update-zipf
0.95    0.93    0.84    0.81    1.71    write-only
0.94    0.94    0.90    0.92    1.14    read-write_range=10
0.95    0.96    0.95    0.95    1.93    read-write_range=100
0.89    0.82    0.80    0.84    1.01    delete
1.05    1.05    1.01    1.10    0.53    insert
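
Since every column above is relative to Postgres 15.17, dividing col-5 by col-4 gives InnoDB throughput relative to Postgres 19 beta1. This is a small sketch of that step using three rows copied from the table. The dictionary layout and function name are mine.

```python
# Relative QPS versus Postgres 15.17, copied from col-4 and col-5 above.
rqps_vs_pg15 = {
    "update-index": {"pg19b1": 0.92, "mysql847": 1.43},
    "update-zipf": {"pg19b1": 0.85, "mysql847": 0.93},
    "insert": {"pg19b1": 1.10, "mysql847": 0.53},
}


def innodb_vs_pg19(test: str) -> float:
    """InnoDB throughput divided by Postgres 19 beta1 throughput for one test."""
    row = rqps_vs_pg15[test]
    return row["mysql847"] / row["pg19b1"]


for test in rqps_vs_pg15:
    print(f"{test}: {innodb_vs_pg19(test):.2f}")
```

The ratio is above 1 when InnoDB is faster than Postgres 19 beta1 and below 1 when Postgres is faster. The inputs are rounded to two decimals, so the results are approximate.
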

Results: update-index

Summary:
  • Postgres suffers from too much variance
  • Average throughput is ~1.55X larger for InnoDB than for Postgres
  • Per operation, Postgres does ~1.20X more write IO (KB written) to storage than InnoDB
  • Per operation, InnoDB uses more CPU and does more context switches. While autovacuum was enabled and was likely running during the test, my measurements exclude the manual vacuum done at the end of each test.
iostat, vmstat normalized by operation rate
r/s     rMB/s   w/s     wMB/s   r/o     rKB/o   wKB/o   o/s     dbms
35503.0 373.7   58795.7 1345.1  1.375   14.824  53.351  25817   PG 19b1
33140.6 517.8   53449.6 1735.3  0.827   13.226  44.326  40090   MySQL 8.4.7

cs/s    cpu/s   cs/o    cpu/o   dbms
176167  14.4     6.824  .000557 PG 19b1
661395  41.9    16.498  .001046 MySQL 8.4.7
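
The per-operation columns above appear to be the per-second rates divided by the operation rate (o/s). This is a minimal sketch that recomputes them for the PG 19b1 rows. It assumes that 1 MB is 1024 KB, and the function name is mine.

```python
def per_operation(rate_per_sec: float, ops_per_sec: float) -> float:
    """Normalize a per-second rate by the operation rate."""
    return rate_per_sec / ops_per_sec


# Values copied from the PG 19b1 rows above.
ops = 25817                                  # o/s
r_o = per_operation(35503.0, ops)            # storage reads per operation
wkb_o = per_operation(1345.1 * 1024, ops)    # KB written per operation
cs_o = per_operation(176167, ops)            # context switches per operation
cpu_o = per_operation(14.4, ops)             # cpu/s divided by o/s

print(f"r/o={r_o:.3f} wKB/o={wkb_o:.3f} cs/o={cs_o:.3f} cpu/o={cpu_o:.6f}")
```

The recomputed values can differ from the table in the last digit because the per-second rates shown above are already rounded.
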

Results: update-zipf

Summary:
  • Postgres suffers from too much variance
  • Average throughput is ~1.09X larger for InnoDB than for Postgres
  • Per operation, Postgres does ~1.30X more write IO (KB written) to storage than InnoDB
  • Per operation, InnoDB uses more CPU and does more context switches. While autovacuum was enabled and was likely running during the test, my measurements exclude the manual vacuum done at the end of each test.
iostat, vmstat normalized by operation rate
r/s     rMB/s   w/s     wMB/s   r/o     rKB/o   wKB/o   o/s     dbms
55595.5 620.7   64264.4 1352.3  0.622   7.110   15.490  89396   PG 19b1
27405.9 428.2   37465.1 1133.6  0.282   4.508   11.933  97270   MySQL 8.4.7

cs/s    cpu/s   cs/o    cpu/o   dbms
424392  27.2     4.747  .000304 PG 19b1
1213054 44.5    12.471  .000458 MySQL 8.4.7

Results: insert

Summary:
  • Postgres suffers from too much variance
  • Average throughput is ~2.06X larger for Postgres than for InnoDB
  • Per operation, Postgres does ~1.67X more write IO (KB written) to storage than InnoDB
  • Per operation, Postgres uses more CPU and does more context switches. This is the opposite of what happens above for update-index and update-zipf.

iostat, vmstat normalized by operation rate
r/s     rMB/s   w/s     wMB/s   r/o     rKB/o   wKB/o   o/s     dbms
1615.5  56.0    15321.7 1170.9  0.007   0.242   5.059   237009  PG 19b1
3.6     0.1     8275.4  340.7   0.000   0.000   3.029   115155  MySQL 8.4.7

cs/s    cpu/s   cs/o    cpu/o   dbms
1214563 46.0    10.547  .000399 PG 19b1
800827  50.5     3.379  .000213 MySQL 8.4.7

Friday, April 10, 2026

MySQL 9.7.0 vs sysbench on a small server

This has results from sysbench on a small server with MySQL 9.7.0 and 8.4.8. Sysbench is run with low concurrency (1 thread) and a cached database. The purpose is to search for changes in performance, often from new CPU overheads.

I tested MySQL 9.7.0 with and without the hypergraph optimizer enabled. I don't expect it to help much because the queries run here are simple. I hope to learn it doesn't hurt performance in that case.

tl;dr

  • Throughput improves on two tests with the Hypergraph optimizer in 9.7.0 because they get better query plans.
  • One read-only test and several write-heavy tests have small regressions from 8.4.8 to 9.7.0. This might be from new CPU overheads but I don't see obvious problems in the flamegraphs. 

Builds, configuration and hardware

I compiled MySQL from source for versions 8.4.8 and 9.7.0.

The server is an ASUS ExpertCenter PN53 with AMD Ryzen 7 7735HS, 32G RAM and an m.2 device for the database. More details on it are here. The OS is Ubuntu 24.04 and the database filesystem is ext4 with discard enabled.

The my.cnf file is here for 8.4. I call this the z12a config and variants of it are used for MySQL 5.6 through 8.4.

For 9.7 I use two configs, z13a and z13b. The hypergraph optimizer is enabled with z13b and not with z13a.

All DBMS versions use the latin1 character set as explained here.

Benchmark

I used sysbench and my usage is explained here. To save time I only run 32 of the 42 microbenchmarks and most test only 1 type of SQL statement. Benchmarks are run with the database cached by InnoDB.

The tests are run using 1 table with 50M rows. The read-heavy microbenchmarks run for 600 seconds and the write-heavy for 1800 seconds.

Results

The microbenchmarks are split into 4 groups -- 1 for point queries, 2 for range queries, 1 for writes. For the range query microbenchmarks, part 1 has queries that don't do aggregation while part 2 has queries that do aggregation. 

I provide tables below with relative QPS. When the relative QPS is > 1 then some version is faster than the base version. When it is < 1 then there might be a regression.  The relative QPS (rQPS) is:
(QPS for some version) / (QPS for MySQL 8.4.8) 

Results: point queries

I describe performance changes (changes to relative QPS, rQPS) in terms of basis points. Performance changes by one basis point when the difference in rQPS is 0.01. When rQPS decreases from 0.95 to 0.85 then it changed by 10 basis points.
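
The basis point definition above is specific to this post: one basis point is a change of 0.01 in rQPS, which is larger than the 0.01% used in finance. This is a small sketch of the conversion. The function name is mine.

```python
def basis_points(rqps_before: float, rqps_after: float) -> int:
    """Change in rQPS expressed in units of 0.01, as defined in this post."""
    return round((rqps_after - rqps_before) * 100)


# The example from the text: rQPS decreases from 0.95 to 0.85.
print(basis_points(0.95, 0.85))

# Versus the base version (rQPS = 1.00), an rQPS of 0.93 is a drop of 7.
print(basis_points(1.00, 0.93))
```

A negative result is a drop in rQPS and a positive result is an improvement.
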

This shows the rQPS for MySQL 9.7.0 using both the z13a and z13b configs. It is relative to the throughput from MySQL 8.4.8.
  • Throughput with MySQL 9.7.0 is similar to 8.4.8 except for point-query where there are regressions as rQPS drops by 5 and 7 basis points. The point-query test uses simple queries that fetch one column from one row by PK. From vmstat metrics the CPU overhead per query for 9.7.0 is ~8% larger than for 8.4.8, with and without the hypergraph optimizer. I don't see anything obvious in the flamegraphs.
z13a    z13b
0.99    1.01    hot-points
0.95    0.93    point-query
0.99    1.01    points-covered-pk
1.00    1.01    points-covered-si
0.98    1.00    points-notcovered-pk
0.99    1.01    points-notcovered-si
1.00    1.02    random-points_range=1000
0.99    1.01    random-points_range=100
0.96    1.00    random-points_range=10

Results: range queries without aggregation

I describe performance changes (changes to relative QPS, rQPS) in terms of basis points. When rQPS decreases from 0.95 to 0.85 then it changed by 10 basis points.

This shows the rQPS for MySQL 9.7.0 using both the z13a and z13b configs. It is relative to the throughput from MySQL 8.4.8.
  • Throughput with MySQL 9.7.0 is similar to 8.4.8. I am skeptical there is a regression for the scan test with the z13b config. I suspect that is noise.
z13a    z13b
0.99    0.99    range-covered-pk
0.99    0.99    range-covered-si
0.99    0.99    range-notcovered-pk
0.98    0.98    range-notcovered-si
1.00    0.96    scan

Results: range queries with aggregation

I describe performance changes (changes to relative QPS, rQPS) in terms of basis points. When rQPS decreases from 0.95 to 0.85 then it changed by 10 basis points.

This shows the rQPS for MySQL 9.7.0 using both the z13a and z13b configs. It is relative to the throughput from MySQL 8.4.8.
  • There might be small regressions in several tests with rQPS dropping by a few points but I will ignore that for now.
  • There is a large improvement for the read-only-distinct test with the z13b config. The query for this test is select distinct c from sbtest where id between ? and ? order by c. The reason for the performance improvement is that the hypergraph optimizer chooses a better plan, see here.
  • There is a large improvement for the read-only test with range=10000. This test uses the read-only version of the classic sysbench transaction (see here). One of the queries it runs is the query used by read-only-distinct. So it benefits from the better plan for that query. 
z13a    z13b
0.97    0.97    read-only-count
0.98    1.26    read-only-distinct
0.96    0.95    read-only-order
0.99    1.15    read-only_range=10000
0.97    1.00    read-only_range=100
0.96    0.97    read-only_range=10
0.99    0.99    read-only-simple
0.97    0.96    read-only-sum

Results: writes

I describe performance changes (changes to relative QPS, rQPS) in terms of basis points. When rQPS decreases from 0.95 to 0.85 then it changed by 10 basis points.

This shows the rQPS for MySQL 9.7.0 using both the z13a and z13b configs. It is relative to the throughput from MySQL 8.4.8.
  • There might be several small regressions here. I don't see obvious problems in the flamegraphs.
z13a    z13b
0.95    0.92    delete
1.00    1.01    insert
0.97    0.98    read-write_range=100
0.96    0.95    read-write_range=10
0.97    0.96    update-index
0.97    0.92    update-inlist
0.95    0.93    update-nonindex
0.95    0.92    update-one
0.95    0.93    update-zipf
0.97    0.95    write-only

Thursday, April 9, 2026

Sysbench vs MySQL on a small server: another way to view the regressions

This post provides another way to see the performance regressions in MySQL from versions 5.6 to 9.7. It complements what I shared in a recent post. The workload here is cached by InnoDB and my focus is on regressions from new CPU overheads. 

The good news is that there are few regressions after 8.0. The bad news is that there were many prior to that and these are unlikely to be undone.

    tl;dr

    • for point queries
      • there are large regressions from 5.6.51 to 5.7.44, 5.7.44 to 8.0.28 and 8.0.28 to 8.0.45
      • there are few regressions from 8.0.45 to 8.4.8 to 9.7.0
    • for range queries without aggregation
      • there are large regressions from 5.6.51 to 5.7.44 and 5.7.44 to 8.0.28
      • there are mostly small regressions from 8.0.28 to 8.0.45, but scan has a large regression
      • there are few regressions from 8.0.45 to 8.4.8 to 9.7.0
    • for range queries with aggregation
      • there are large regressions from 5.6.51 to 5.7.44 with two improvements
      • there are large regressions from 5.7.44 to 8.0.28
      • there are small regressions from 8.0.28 to 8.0.45
      • there are few regressions from 8.0.45 to 8.4.8 to 9.7.0
    • for writes
      • there are large regressions from 5.6.51 to 5.7.44 and 5.7.44 to 8.0.28
      • there are small regressions from 8.0.28 to 8.0.45
      • there are few regressions from 8.0.45 to 8.4.8
      • there are a few small regressions from 8.4.8 to 9.7.0

    Builds, configuration and hardware

    I compiled MySQL from source for versions 5.6.51, 5.7.44, 8.0.28, 8.0.45, 8.4.8 and 9.7.0.

    The server is an ASUS ExpertCenter PN53 with AMD Ryzen 7 7735HS, 32G RAM and an m.2 device for the database. More details on it are here. The OS is Ubuntu 24.04 and the database filesystem is ext4 with discard enabled.

    The my.cnf files are here for 5.6, 5.7 and 8.4. I call these the z12a configs.

    For 9.7 I use the z13a config. It is as close as possible to z12a and adds two options for gtid-related features to undo a default config change that arrived in 9.6. 

    All DBMS versions use the latin1 character set as explained here.

    Benchmark

    I used sysbench and my usage is explained here. To save time I only run 32 of the 42 microbenchmarks and most test only 1 type of SQL statement. Benchmarks are run with the database cached by InnoDB.

    The tests are run using 1 table with 50M rows. The read-heavy microbenchmarks run for 600 seconds and the write-heavy for 1800 seconds.

    Results

    The microbenchmarks are split into 4 groups -- 1 for point queries, 2 for range queries, 1 for writes. For the range query microbenchmarks, part 1 has queries that don't do aggregation while part 2 has queries that do aggregation. 

    I provide tables below with relative QPS. When the relative QPS is > 1 then some version is faster than the base version. When it is < 1 then there might be a regression.  The relative QPS (rQPS) is:
    (QPS for some version) / (QPS for base version) 
    Results: point queries

    MySQL 5.6.51 gets from 1.18X to 1.61X more QPS than 9.7.0 on point queries. It is easier for me to write about this in terms of relative QPS (rQPS) which is as low as 0.62 for MySQL 9.7.0 vs 5.6.51. I define a basis point to mean a change of 0.01 in rQPS.

    Summary:
    • from 5.6.51 to 9.7.0
      • the median regression is a drop in rQPS of 27 basis points
    • from 5.6.51 to 5.7.44
      • the median regression is a drop in rQPS of 11 basis points
    • from 5.7.44 to 8.0.28
      • the median regression is a drop in rQPS of 25 basis points
    • from 8.0.28 to 8.0.45
      • 7 of 9 tests get more QPS with 8.0.45
      • 2 tests have regressions where rQPS drops by ~6 basis points
    • from 8.0.45 to 8.4.8
      • there are few regressions
    • from 8.4.8 to 9.7.0
      • there are few regressions
    This has (QPS for 9.7.0) / (QPS for 5.6.51) and is followed by tables that show the difference between the latest point release in adjacent versions.
    • the largest regression is an rQPS drop of 38 basis points for point-query. Compared to most of the other tests in this section, this query does less work in the storage engine which implies the regression is from code above the storage engine.
    • the smallest regression is an rQPS drop of 15 basis points for random-points_range=1000. The regression for the same query with a shorter range (=10, =100) is larger. That implies, at least for this query, that the regression is for something above the storage engine (optimizer, parser, etc).
    • the median regression is an rQPS drop of 27 basis points
    0.65    hot-points
    0.62    point-query
    0.72    points-covered-pk
    0.78    points-covered-si
    0.73    points-notcovered-pk
    0.76    points-notcovered-si
    0.85    random-points_range=1000
    0.73    random-points_range=100
    0.66    random-points_range=10
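
The summary numbers for this table can be recomputed from the rQPS values. This sketch derives the median drop in basis points and the "X more QPS" range for 5.6.51, which is the inverse of rQPS. The variable names are mine.

```python
from statistics import median

# rQPS for 9.7.0 relative to 5.6.51, copied from the table above.
rqps = {
    "hot-points": 0.65,
    "point-query": 0.62,
    "points-covered-pk": 0.72,
    "points-covered-si": 0.78,
    "points-notcovered-pk": 0.73,
    "points-notcovered-si": 0.76,
    "random-points_range=1000": 0.85,
    "random-points_range=100": 0.73,
    "random-points_range=10": 0.66,
}

med = median(rqps.values())
drop_bp = round((1.0 - med) * 100)

# How much more QPS 5.6.51 gets than 9.7.0 is 1 / rQPS.
speedup_low = 1.0 / max(rqps.values())
speedup_high = 1.0 / min(rqps.values())

print(f"median rQPS={med:.2f} drop={drop_bp} basis points")
print(f"5.6.51 gets {speedup_low:.2f}X to {speedup_high:.2f}X more QPS")
```
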

    This has: (QPS for 5.7.44) / (QPS for 5.6.51)
    • the largest regression is an rQPS drop of 14 basis points for hot-points.
    • the next largest regression is an rQPS drop of 13 basis points for random-points with range=10. The regressions for that query are smaller when a larger range is used =100, =1000 and this implies the problem is above the storage engine. 
    • the median regression is an rQPS drop of 11 basis points
    0.86    hot-points
    0.90    point-query
    0.89    points-covered-pk
    0.90    points-covered-si
    0.89    points-notcovered-pk
    0.88    points-notcovered-si
    1.00    random-points_range=1000
    0.89    random-points_range=100
    0.87    random-points_range=10
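
The argument that a regression which shrinks as the range grows points to code above the storage engine can be illustrated with a toy cost model. It assumes the time per query is a fixed overhead (parse, optimize) plus a per-row cost in the storage engine. All numbers below are hypothetical, in arbitrary time units, and are not measurements.

```python
def rqps_with_fixed_overhead(fixed_old: float, fixed_new: float,
                             per_row: float, rows: int) -> float:
    """rQPS (new / old) when only the fixed per-query overhead changes."""
    old_time = fixed_old + per_row * rows
    new_time = fixed_new + per_row * rows
    # QPS is the inverse of time per query, so QPS_new / QPS_old = old / new.
    return old_time / new_time


# Hypothetical costs: the fixed overhead grows by 20%, per-row cost is unchanged.
results = [rqps_with_fixed_overhead(10.0, 12.0, 0.1, n) for n in (10, 100, 1000)]

for n, r in zip((10, 100, 1000), results):
    print(f"range={n}: rQPS={r:.2f}")
```

In this model rQPS moves toward 1.0 as the range grows because the unchanged per-row work dominates the query time. If the per-row cost had grown instead, the regression would get larger with the range.
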

    This has: (QPS for 8.0.28) / (QPS for 5.7.44)
    • the largest regression is an rQPS drop of 66 basis points for random-points with range=1000. The regression for that same query with smaller ranges (=10, =100) is smaller. This implies the problem is in the storage engine.
    • the second largest regression is an rQPS drop of 35 basis points for hot-points
    • the median regression is an rQPS drop of 25 basis points
    0.65    hot-points
    0.82    point-query
    0.74    points-covered-pk
    0.75    points-covered-si
    0.76    points-notcovered-pk
    0.84    points-notcovered-si
    0.34    random-points_range=1000
    0.75    random-points_range=100
    0.86    random-points_range=10

    This has: (QPS for 8.0.45) / (QPS for 8.0.28)
    • at last, there are many improvements. Some are from a fix for bug 102037 which I found with help from sysbench
    • the regressions, with rQPS drops by ~6 basis points, are for queries that do less work in the storage engine relative to the other tests in this section
    1.20    hot-points
    0.93    point-query
    1.13    points-covered-pk
    1.19    points-covered-si
    1.09    points-notcovered-pk
    1.04    points-notcovered-si
    2.48    random-points_range=1000
    1.12    random-points_range=100
    0.94    random-points_range=10

    This has: (QPS for 8.4.8) / (QPS for 8.0.45)
    • there are few regressions from 8.0.45 to 8.4.8
    0.99    hot-points
    0.96    point-query
    0.99    points-covered-pk
    0.98    points-covered-si
    1.00    points-notcovered-pk
    0.99    points-notcovered-si
    1.00    random-points_range=1000
    1.00    random-points_range=100
    0.98    random-points_range=10

    This has: (QPS for 9.7.0) / (QPS for 8.4.8)
    • there are few regressions from 8.4.8 to 9.7.0
    0.99    hot-points
    0.95    point-query
    0.99    points-covered-pk
    1.00    points-covered-si
    0.98    points-notcovered-pk
    0.99    points-notcovered-si
    1.00    random-points_range=1000
    0.99    random-points_range=100
    0.96    random-points_range=10
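
One way to sanity check these tables is that the product of the rQPS values for adjacent versions should be close to the rQPS for 9.7.0 vs 5.6.51. This sketch does that for two tests with values copied from the tables above. The match is approximate because each value is rounded to two decimals.

```python
from math import prod

# rQPS per step: 5.6.51 -> 5.7.44 -> 8.0.28 -> 8.0.45 -> 8.4.8 -> 9.7.0
chained = {
    "point-query": [0.90, 0.82, 0.93, 0.96, 0.95],
    "random-points_range=1000": [1.00, 0.34, 2.48, 1.00, 1.00],
}

# rQPS for 9.7.0 relative to 5.6.51 from the first table in this section.
reported = {
    "point-query": 0.62,
    "random-points_range=1000": 0.85,
}

for test, steps in chained.items():
    print(f"{test}: product={prod(steps):.3f} reported={reported[test]:.2f}")
```
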

    Results: range queries without aggregation

    MySQL 5.6.51 gets from 1.35X to 1.52X more QPS than 9.7.0 on range queries without aggregation. It is easier for me to write about this in terms of relative QPS (rQPS) which is as low as 0.66 for MySQL 9.7.0 vs 5.6.51. I define a basis point to mean a change of 0.01 in rQPS.

    Summary:
    • from 5.6.51 to 9.7.0
      • the median regression is a drop in rQPS of 33 basis points
    • from 5.6.51 to 5.7.44
      • the median regression is a drop in rQPS of 16 basis points
    • from 5.7.44 to 8.0.28
      • the median regression is a drop in rQPS of ~10 basis points
    • from 8.0.28 to 8.0.45
      • the median regression is a drop in rQPS of 5 basis points
    • from 8.0.45 to 8.4.8
      • there are few regressions from 8.0.45 to 8.4.8
    • from 8.4.8 to 9.7.0
      • there are few regressions from 8.4.8 to 9.7.0
    This has (QPS for 9.7.0) / (QPS for 5.6.51) and is followed by tables that show the difference between the latest point release in adjacent versions.
    • all tests have large regressions with an rQPS drop that ranges from 26 to 34 basis points
    • the median regression is an rQPS drop of 33 basis points
    0.66    range-covered-pk
    0.67    range-covered-si
    0.66    range-notcovered-pk
    0.74    range-notcovered-si
    0.67    scan

    This has: (QPS for 5.7.44) / (QPS for 5.6.51)
    • all tests have large regressions with an rQPS drop that ranges from 12 to 17 basis points
    • the median regression is an rQPS drop of 16 basis points
    0.85    range-covered-pk
    0.84    range-covered-si
    0.84    range-notcovered-pk
    0.88    range-notcovered-si
    0.83    scan

    This has: (QPS for 8.0.28) / (QPS for 5.7.44)
    • 4 of 5 tests have regressions with an rQPS drop that ranges from 10 to 14 basis points
    • the median regression is ~10 basis points
    • rQPS improves for the scan test
    0.86    range-covered-pk
    0.89    range-covered-si
    0.90    range-notcovered-pk
    0.90    range-notcovered-si
    1.04    scan

    This has: (QPS for 8.0.45) / (QPS for 8.0.28)
    • all tests are slower in 8.0.45 than 8.0.28, but the regression for 3 of 5 is <= 5 basis points
    • rQPS in the scan test drops by 21 basis points
    • the median regression is an rQPS drop of 5 basis points
    0.96    range-covered-pk
    0.95    range-covered-si
    0.91    range-notcovered-pk
    0.96    range-notcovered-si
    0.79    scan

    This has: (QPS for 8.4.8) / (QPS for 8.0.45)
    • there are few regressions from 8.0.45 to 8.4.8
    0.95    range-covered-pk
    0.95    range-covered-si
    0.98    range-notcovered-pk
    0.99    range-notcovered-si
    0.98    scan

    This has: (QPS for 9.7.0) / (QPS for 8.4.8)
    • there are few regressions from 8.4.8 to 9.7.0
    0.99    range-covered-pk
    0.99    range-covered-si
    0.99    range-notcovered-pk
    0.98    range-notcovered-si
    1.00    scan

    Results: range queries with aggregation

    Summary:
    • from 5.6.51 to 9.7.0
      • the median result is a drop in rQPS of ~30 basis points
    • from 5.6.51 to 5.7.44
      • the median result is a drop in rQPS of ~10 basis points
    • from 5.7.44 to 8.0.28
      • the median result is a drop in rQPS of ~12 basis points
    • from 8.0.28 to 8.0.45
      • the median result is an rQPS drop of 5 basis points
    • from 8.0.45 to 8.4.8
      • there are few regressions from 8.0.45 to 8.4.8
    • from 8.4.8 to 9.7.0
      • there are few regressions from 8.4.8 to 9.7.0
    This has (QPS for 9.7.0) / (QPS for 5.6.51) and is followed by tables that show the difference between the latest point release in adjacent versions.
    • the median result is a drop in rQPS of ~30 basis points
    • rQPS for the read-only-distinct test improves by 25 basis points
    0.67    read-only-count
    1.25    read-only-distinct
    0.75    read-only-order
    1.02    read-only_range=10000
    0.74    read-only_range=100
    0.66    read-only_range=10
    0.69    read-only-simple
    0.66    read-only-sum

    This has: (QPS for 5.7.44) / (QPS for 5.6.51)
    • the median result is an rQPS drop of ~10 basis points
    • rQPS improves by 45 basis points for read-only-distinct and by 23 basis points for read-only with the largest range (=10000)
    0.86    read-only-count
    1.45    read-only-distinct
    0.93    read-only-order
    1.23    read-only_range=10000
    0.96    read-only_range=100
    0.88    read-only_range=10
    0.85    read-only-simple
    0.86    read-only-sum

    This has: (QPS for 8.0.28) / (QPS for 5.7.44)
    • the median result is an rQPS drop of ~12 basis points
    0.91    read-only-count
    0.94    read-only-distinct
    0.89    read-only-order
    0.86    read-only_range=10000
    0.87    read-only_range=100
    0.85    read-only_range=10
    0.90    read-only-simple
    0.87    read-only-sum

    This has: (QPS for 8.0.45) / (QPS for 8.0.28)
    • the median result is an rQPS drop of 5 basis points
    0.89    read-only-count
    0.95    read-only-distinct
    0.95    read-only-order
    0.97    read-only_range=10000
    0.94    read-only_range=100
    0.95    read-only_range=10
    0.93    read-only-simple
    0.93    read-only-sum

    This has: (QPS for 8.4.8) / (QPS for 8.0.45)
    • there are few regressions from 8.0.45 to 8.4.8
    0.99    read-only-count
    0.98    read-only-distinct
    0.99    read-only-order
    1.00    read-only_range=10000
    0.98    read-only_range=100
    0.97    read-only_range=10
    0.97    read-only-simple
    0.98    read-only-sum

    This has: (QPS for 9.7.0) / (QPS for 8.4.8)
    • there are few regressions from 8.4.8 to 9.7.0
    0.97    read-only-count
    0.98    read-only-distinct
    0.96    read-only-order
    0.99    read-only_range=10000
    0.97    read-only_range=100
    0.96    read-only_range=10
    0.99    read-only-simple
    0.97    read-only-sum

    Results: writes

    Summary:
    • from 5.6.51 to 9.7.0
      • the median result is a drop in rQPS of ~33%
    • from 5.6.51 to 5.7.44
      • the median result is an rQPS drop of ~13%
    • from 5.7.44 to 8.0.28
      • the median result is an rQPS drop of ~18%
    • from 8.0.28 to 8.0.45
      • the median result is an rQPS drop of ~9%
    • from 8.0.45 to 8.4.8
      • there are few regressions from 8.0.45 to 8.4.8
    • from 8.4.8 to 9.7.0
      • the median result is an rQPS drop of ~4%
    This has (QPS for 9.7.0) / (QPS for 5.6.51) and is followed by tables that show the difference between the latest point release in adjacent versions.
    • the median result is an rQPS drop of ~33%
    0.56    delete
    0.54    insert
    0.72    read-write_range=100
    0.66    read-write_range=10
    0.88    update-index
    0.74    update-inlist
    0.60    update-nonindex
    0.58    update-one
    0.60    update-zipf
    0.67    write-only

    This has: (QPS for 5.7.44) / (QPS for 5.6.51)
    • the median result is an rQPS drop of ~13%
    • rQPS improves by 21% for update-index and by 5% for update-inlist
    0.82    delete
    0.80    insert
    0.94    read-write_range=100
    0.88    read-write_range=10
    1.21    update-index
    1.05    update-inlist
    0.86    update-nonindex
    0.85    update-one
    0.86    update-zipf
    0.94    write-only

    This has: (QPS for 8.0.28) / (QPS for 5.7.44)
    • the median result is an rQPS drop of ~18%
    0.80    delete
    0.77    insert
    0.87    read-write_range=100
    0.85    read-write_range=10
    0.94    update-index
    0.79    update-inlist
    0.81    update-nonindex
    0.80    update-one
    0.81    update-zipf
    0.83    write-only

    This has: (QPS for 8.0.45) / (QPS for 8.0.28)
    • the median result is an rQPS drop of ~9%
    0.91    delete
    0.90    insert
    0.94    read-write_range=100
    0.94    read-write_range=10
    0.80    update-index
    0.92    update-inlist
    0.91    update-nonindex
    0.92    update-one
    0.91    update-zipf
    0.89    write-only

    This has: (QPS for 8.4.8) / (QPS for 8.0.45)
    • there are few regressions from 8.0.45 to 8.4.8
    0.98    delete
    0.98    insert
    0.98    read-write_range=100
    0.98    read-write_range=10
    0.99    update-index
    0.99    update-inlist
    0.99    update-nonindex
    0.99    update-one
    0.99    update-zipf
    0.99    write-only

    This has: (QPS for 9.7.0) / (QPS for 8.4.8)
    • the median result is an rQPS drop of ~4%
    0.95    delete
    1.00    insert
    0.97    read-write_range=100
    0.96    read-write_range=10
    0.97    update-index
    0.97    update-inlist
    0.95    update-nonindex
    0.95    update-one
    0.95    update-zipf
    0.97    write-only
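
    The adjacent-version tables compose: multiplying the five per-step ratios for a test should roughly reproduce the 9.7.0 vs 5.6.51 ratio from the first table, up to rounding because every table is rounded to two digits. A minimal sketch for three of the write tests, using only numbers copied from the tables above (the variable names are mine):

```python
from math import prod

# rQPS for adjacent versions, copied from the tables above, in this order:
# 5.7.44/5.6.51, 8.0.28/5.7.44, 8.0.45/8.0.28, 8.4.8/8.0.45, 9.7.0/8.4.8
adjacent = {
    "delete":       [0.82, 0.80, 0.91, 0.98, 0.95],
    "insert":       [0.80, 0.77, 0.90, 0.98, 1.00],
    "update-index": [1.21, 0.94, 0.80, 0.99, 0.97],
}

# rQPS for 9.7.0 relative to 5.6.51, copied from the first writes table.
overall = {"delete": 0.56, "insert": 0.54, "update-index": 0.88}

# The ratios telescope, so their product approximates the end-to-end ratio.
compounded = {test: prod(steps) for test, steps in adjacent.items()}

for test, value in compounded.items():
    print(f"{test:14s} compounded={value:.3f} reported={overall[test]:.2f}")
```

    This also shows where the loss comes from. For insert almost all of it arrives by 8.0.45, while update-index gains in 5.7.44 and gives most of that back by 8.0.45.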

    Friday, April 3, 2026

    CPU-bound sysbench on a large server: Postgres, MySQL and MariaDB

    This post has results for CPU-bound sysbench vs Postgres, MySQL and MariaDB on a large server using older and newer releases. 

    The goal is to measure:

    • how performance changes over time from old versions to new versions
    • performance between modern MySQL, MariaDB and Postgres

    The context here is a collection of microbenchmarks using a large server with high concurrency. Results on other workloads might be different. But you might be able to predict performance for a more complex workload using the data I share here.

    tl;dr

    • for point queries
      • Postgres is faster than MySQL, MySQL is faster than MariaDB
      • modern MariaDB suffers from huge regressions that arrived in 10.5 and remain in 12.x
    • for range queries without aggregation
      • MySQL is about as fast as MariaDB, both are faster than Postgres (often 2X faster)
    • for range queries with aggregation
      • MySQL is about as fast as MariaDB, both are faster than Postgres (often 2X faster)
    • for writes
      • Postgres is much faster than MariaDB and MySQL (up to 4X faster)
      • MariaDB is between 1.3X and 1.5X faster than MySQL
    • on regressions
      • Postgres tends to be boring with few regressions from old to new versions
      • MySQL and MariaDB are exciting, with more regressions to debug
    Hand-wavy summary

    My hand-wavy summary about performance over time has been the following. It needs a revision, but also needs to be concise. 

    Modern Postgres is about as fast as old Postgres, with some improvements. It has done great at avoiding perf regressions.

    Modern MySQL at low concurrency has many performance regressions from new CPU overheads (code bloat). At high concurrency it is faster than old MySQL because the improvements for concurrency are larger than the regressions from code bloat.

    Modern MariaDB at low concurrency has similar perf as old MariaDB. But at high concurrency it has large regressions for point queries, small regressions for range queries and some large improvements for writes. Note that many things use point queries internally - range scan on non-covering index, updates, deletes. The regressions arrive in 10.5, 10.6, 10.11 and 11.4.

    For results on a small server with a low concurrency workload, I have many posts including:
    Builds, configuration and hardware

    I compiled:
    • Postgres from source for versions 12.22, 13.23, 14.21, 15.16, 16.12, 17.8 and 18.2.
    • MySQL from source for versions 5.6.51, 5.7.44, 8.0.44, 8.4.7 and 9.5.0
    • MariaDB from source for versions 10.2.30, 10.2.44, 10.3.39, 10.4.34, 10.5.29, 10.6.25, 10.11.15, 11.4.10, 11.8.6, 12.2.2 and 12.3.1
    I used a 48-core server from Hetzner
    • an ax162s with an AMD EPYC 9454P 48-Core Processor with SMT disabled
    • 2 Intel D7-P5520 NVMe storage devices with RAID 1 (3.8T each) using ext4
    • 128G RAM
    • Ubuntu 22.04 running the non-HWE kernel (5.15.0-118-generic). The server has since been updated to Ubuntu 24.04 and I am repeating tests.
    Configuration files for Postgres:
    • the config file is named conf.diff.cx10a_c32r128 (x10a_c32r128) and is here for versions 12, 13, 14, 15, 16 and 17.
    • for Postgres 18 I used conf.diff.cx10b_c32r128 (x10b_c32r128) which is as close as possible to the Postgres 17 config and uses io_method=sync
    The my.cnf files for MySQL are here: 5.6.51, 5.7.44, 8.0.4x, 8.4.x, 9.x.0.

    The my.cnf files for MariaDB are here: 10.2, 10.3, 10.4, 10.5, 10.6, 10.11, 11.4, 11.8, 12.2, 12.3.

    I thought I was using the latin1 charset for all versions of MariaDB and MySQL but I recently learned I was using something like utf8mb4 on recent versions (maybe MariaDB 11.4+ and MySQL 8.0+). See here for details. I will soon repeat tests using latin1 for all versions. For some tests, the use of a multi-byte charset increases CPU overhead by up to 5%, which reduces throughput by a similar amount.

    With Postgres I have been using a multi-byte charset for all versions.

    Benchmark

    I used sysbench and my usage is explained here. I now run 32 of the 42 microbenchmarks listed in that blog post. Most test only one type of SQL statement. Benchmarks are run with the database cached by the DBMS.

    The read-heavy microbenchmarks are run for 600 seconds and the write-heavy for 900 seconds. The benchmark is run with 40 clients and 8 tables with 10M rows per table. The database is cached.

    The purpose is to search for regressions from new CPU overhead and mutex contention. I use the small server with low concurrency to find regressions from new CPU overheads and then larger servers with high concurrency to find regressions from new CPU overheads and mutex contention.

    The tests can be called microbenchmarks. They are very synthetic. But microbenchmarks also make it easy to understand which types of SQL statements have great or lousy performance. Performance testing benefits from a variety of workloads -- both more and less synthetic.

    Results

    The microbenchmarks are split into 4 groups -- 1 for point queries, 2 for range queries, 1 for writes. For the range query microbenchmarks, part 1 has queries without aggregation while part 2 has queries with aggregation. 

    I provide charts below with relative QPS. The relative QPS is the following:
    (QPS for some version) / (QPS for base version)
    When the relative QPS is > 1 then some version is faster than base version.  When it is < 1 then there might be a regression. When the relative QPS is 1.2 then some version is about 20% faster than base version.
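
    The definition above is simple enough to write down as code. This is a minimal sketch; the function names and the QPS numbers are hypothetical and only illustrate the arithmetic.

```python
def relative_qps(qps_version: float, qps_base: float) -> float:
    """Return (QPS for some version) / (QPS for base version)."""
    if qps_base <= 0:
        raise ValueError("QPS for the base version must be positive")
    return qps_version / qps_base


def interpret(rqps: float) -> str:
    """Label a relative QPS value following the rules in the text."""
    if rqps > 1:
        return "faster than base"
    if rqps < 1:
        return "possible regression"
    return "same as base"


# Hypothetical QPS numbers, not taken from any result in this post.
base_qps = 1000.0
new_qps = 1200.0

r = relative_qps(new_qps, base_qps)
print(f"rQPS = {r:.2f} ({interpret(r)})")
```

    With these made-up numbers the relative QPS is 1.2, which reads as the new version being about 20% faster than the base version.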

    The per-test results from vmstat and iostat can help to explain why something is faster or slower because they show how much HW is used per request, including CPU overhead per operation (cpu/o) and context switches per operation (cs/o), which are often a proxy for mutex contention.
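
    A sketch of how per-operation metrics like cpu/o and cs/o can be derived is below. It assumes they are the average vmstat values divided by the average QPS. The exact formula and units used by my scripts are not shown in this post, so the class, the function names and all of the numbers here are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class RunSummary:
    qps: float         # average queries per second for the test
    cpu_util: float    # assumed: average vmstat CPU utilization (us + sy), in percent
    cs_per_sec: float  # assumed: average vmstat context switches per second


def cpu_per_op(run: RunSummary) -> float:
    """CPU overhead per operation (cpu/o): CPU utilization normalized by QPS."""
    return run.cpu_util / run.qps


def cs_per_op(run: RunSummary) -> float:
    """Context switches per operation (cs/o): context switch rate normalized by QPS."""
    return run.cs_per_sec / run.qps


# Hypothetical runs for two DBMS on the same test, not real results.
dbms_a = RunSummary(qps=100_000, cpu_util=60.0, cs_per_sec=200_000)
dbms_b = RunSummary(qps=50_000, cpu_util=70.0, cs_per_sec=2_000_000)

# Ratios above 1 mean dbms_b uses more HW per operation than dbms_a.
cpu_ratio = cpu_per_op(dbms_b) / cpu_per_op(dbms_a)
cs_ratio = cs_per_op(dbms_b) / cs_per_op(dbms_a)

print(f"cpu/o ratio = {cpu_ratio:.2f}X, cs/o ratio = {cs_ratio:.2f}X")
```

    Normalizing by QPS is what makes statements like "uses 2X more CPU" meaningful when the two systems run at different throughput.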

    The spreadsheet with charts is here and in some cases is easier to read than the charts below. Files with performance summaries are archived here.

    The relative QPS numbers are also here for:
    Files with HW efficiency numbers, average values from vmstat and iostat normalized by QPS, are here for:
    Results: MySQL vs MariaDB vs Postgres

    HW efficiency metrics are here. They have metrics from vmstat and iostat normalized by QPS.

    Point queries
    • Postgres is faster than MySQL, MySQL is faster than MariaDB
    • MySQL gets about 2X more QPS than MariaDB on 5 of the 9 tests
    • a table for relative QPS by test is here
    • from HW efficiency metrics for the random-points.range1000 test:
      • Postgres is 1.35X faster than MySQL, MySQL is more than 2X faster than MariaDB
      • MariaDB uses 2.28X more CPU and does 23.41X more context switches than MySQL
      • Postgres uses less CPU but does ~1.93X more context switches than MySQL
    Range queries without aggregation
    • MySQL is about as fast as MariaDB, both are faster than Postgres (often 2X faster)
    • MariaDB has lousy results on the range-notcovered-si test because it must do many point lookups to fetch columns not in the index and MariaDB has problems with point queries at high concurrency
    • a table for relative QPS by test is here
    • from HW efficiency metrics for the scan test:
      • MySQL is 1.2X faster than Postgres and 1.5X faster than MariaDB
      • MariaDB uses 1.19X more CPU and does ~1000X more context switches than MySQL
      • Postgres uses 1.55X more CPU but does fewer context switches than MySQL
    Range queries with aggregation
    • MySQL is about as fast as MariaDB, both are faster than Postgres (often 2X faster)
    • a table for relative QPS by test is here
    • from HW efficiency metrics for read-only-count
      • MariaDB is 1.22X faster than MySQL, MySQL is 4.2X faster than Postgres
      • MariaDB uses 1.22X more CPU than MySQL but does ~2X more context switches
      • Postgres uses 4.11X more CPU than MySQL and does 1.08X more context switches
      • Query plans are here and MySQL + MariaDB benefit from the InnoDB clustered index
    • from HW efficiency metrics for read-only_range=10
      • MySQL is 1.2X faster than Postgres and 1.5X faster than MariaDB
      • MariaDB uses 1.19X more CPU and does ~1000X more context switches than MySQL
      • Postgres uses 1.55X more CPU but does fewer context switches than MySQL
    Writes
    • Postgres is much faster than MariaDB and MySQL (up to 4X faster)
    • MariaDB is between 1.3X and 1.5X faster than MySQL
    • a table for relative QPS by test is here
    • from HW efficiency metrics for insert
      • Postgres is 3.03X faster than MySQL, MariaDB is 1.32X faster than MySQL
      • MySQL uses ~1.5X more CPU than MariaDB and ~2X more CPU than Postgres
      • MySQL does ~1.3X more context switches than MariaDB and ~2.9X more than Postgres
    Results: MySQL

    HW efficiency metrics are here. They have metrics from vmstat and iostat normalized by QPS.

    Point queries
    • For 7 of 9 tests QPS is ~1.8X larger or more in 5.7.44 than in 5.6.51
    • For 2 tests there are small regressions after 5.6.51 -- points-covered-si & points-notcovered-si
    • a table for relative QPS by test is here
    • from HW efficiency metrics for points-covered-si:
      • the regression is explained by an increase in CPU
    Range queries without aggregation
    • there is a small regression from 5.6 to 5.7 and a larger one from 5.7 to 8.0
    • a table for relative QPS by test is here
    • from HW efficiency metrics for range-covered-pk:
      • CPU overhead grows by up to 1.4X after 5.6.51, this is true for all of the tests
    Range queries with aggregation
    • regressions after 5.6.51 here are smaller than in the other groups, but 5.7 tends to do better than 8.0, 8.4 and 9.5
    • a table for relative QPS by test is here
    • HW efficiency metrics are here for read-only_range=100
      • QPS changes because CPU/query changes
    Writes
    • QPS improves after 5.6 by up to ~7X
    • a table for relative QPS by test is here
    • HW efficiency metrics are here for insert
      • QPS improves after 5.6.51 because CPU per statement drops
    Results: MariaDB

    HW efficiency metrics are here. They have metrics from vmstat and iostat normalized by QPS.

    Point queries
    • QPS for 6 of 9 tests drops in half (or more) from 10.2 to 12.3
    • a table for relative QPS is here
    • most of the regressions arrive in 10.5 and the root cause might be the removal of support for innodb_buffer_pool_instances, which leaves only one buffer pool instance
    • HW efficiency metrics are here for points-covered-pk
      • there are large increases in CPU overhead and the context switch rate starting in 10.5
    Range queries without aggregation
    • for range-covered-* and range-notcovered-pk there is a small regression in 10.4
    • for range-notcovered-si there is a large regression in 10.5 because this query does frequent point lookups on the PK to get missing columns
    • for scan there is a regression in 10.5 that goes away, but the regressions return in 10.11 and 11.4 
    • a table for relative QPS by test is here
    • HW efficiency metrics are here
    Range queries with aggregation
    • for most tests there are small regressions in 10.4 and 10.5
    • a table for relative QPS by test is here
    • HW efficiency metrics are here
    Writes
    • for most tests modern MariaDB is faster than 10.2
    • a table for relative QPS by test is here
    • HW efficiency metrics are here
    Results: Postgres

    HW efficiency metrics are here. They have metrics from vmstat and iostat normalized by QPS.

    Point queries
    • QPS for hot-points increased by ~2.5X starting in Postgres 17.x
    • otherwise QPS is stable from 12.22 through 18.2
    • a table for relative QPS by test is here
    • HW efficiency metrics for the hot-points test are here
      • CPU drops by more than half starting in 17.x
    Range queries without aggregation
    • QPS is stable for the range-notcovered-* and scan tests
    • QPS drops almost in half for the range-covered-* tests
    • a table for relative QPS by test is here
    • all versions use the same query plan for the range-covered-pk test
    • HW efficiency metrics are here for range-covered-pk and for range-covered-si
      • An increase in CPU overhead explains the regressions for range-covered-*
      • I hope to get flamegraphs and thread stacks for these tests to explain what happens
    Range queries with aggregation
    • QPS is stable from 12.22 through 18.2
    • a table for relative QPS by test is here
    • HW efficiency metrics are here
    Writes
    • QPS is stable for 5 of 10 tests
    • QPS improves by up to 1.7X for the other 5 tests, most of that arrives in 17.x
    • a table for relative QPS by test is here
    • HW efficiency metrics are here for update-index


      CPU-bound sysbench on a large server: Postgres 12 to 19 beta1

      This has results from sysbench on a small server with Postgres versions 12 through 19 beta1. Sysbench is run with high concurrency (40 conne...