
Environment

  • MySQL 8.0.36, Community Edition
  • 256 GB RAM, innodb_buffer_pool_size = 192G
  • Data volume ~1.2 TB (SSD, NVMe)
  • Master-slave replication (GTID + ROW format), used for planned failover

Background & Research

We conduct a planned master-slave failover drill every week. After switching, the new master's Buffer Pool is "cold", and query latency rises noticeably for the first 10–15 minutes; some complex reporting queries even time out. I reviewed the MySQL official documentation on innodb_buffer_pool_dump_at_shutdown and innodb_buffer_pool_load_at_startup, and performed the following tests:

  1. Enabled innodb_buffer_pool_dump_at_shutdown = ON and innodb_buffer_pool_dump_pct = 80 on the old master
  2. After failover, the new master's ib_buffer_pool file was about 2.1 GB
  3. The startup log showed InnoDB: Buffer pool(s) load completed at 240303 03:12:45, but the high-latency window persisted, merely shrinking from ~15 minutes to ~8 minutes
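For reference, the configuration described in step 1 would look roughly like this in my.cnf (a sketch; the section layout and exact values are assumptions matching the description above):

```ini
[mysqld]
# Persist the buffer pool's page list on shutdown and reload it on startup
innodb_buffer_pool_dump_at_shutdown = ON
innodb_buffer_pool_load_at_startup  = ON
# Dump only the most recently used 80% of pages in each buffer pool
innodb_buffer_pool_dump_pct         = 80
```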

I also consulted Percona blog posts about --innodb-buffer-pool-load-abort and manually triggering LOAD, but could not find a systematic comparison between dump/restore and natural warm-up (no dump, letting traffic gradually fill the pool) specifically for failover scenarios.

Specific Questions

  1. In MySQL 8.0, does innodb_buffer_pool_dump/restore only recover the "address mapping" of pages rather than the actual page content? If so, does the first access to these pages after failover still trigger physical IO, meaning the latency spike is merely "dispersed" rather than eliminated?
  2. For a failover scenario, are there better warm-up strategies (e.g., running SELECT /*+ JOIN_ORDER(...) */ against hot tables on the slave before switching; or using Percona's innodb_buffer_pool_load_now combined with specific SQL warm-up)? If yes, are there any best practices or benchmark data?
  3. When the Buffer Pool size (192G) is much larger than the number of pages the dump file can describe, does MySQL 8.0's LRU algorithm rapidly evict pages that were just loaded during restore, thereby weakening the restore effect?
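As a side note on question 3: whether restore-loaded pages survive can be observed directly by sampling the buffer pool statistics before and after the load completes. A sketch using the standard MySQL 8.0 information_schema table (column meanings per the official documentation):

```sql
-- How full each buffer pool instance is, and how much LRU churn it sees
SELECT POOL_ID,
       POOL_SIZE,            -- total pages in this instance
       DATABASE_PAGES,       -- pages currently holding data
       OLD_DATABASE_PAGES,   -- pages sitting in the "old" LRU sublist
       PAGES_MADE_YOUNG,     -- promotions from old to young sublist
       NUMBER_PAGES_READ
FROM information_schema.INNODB_BUFFER_POOL_STATS;
```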

Expectations

I would like to understand how to minimize cold-cache latency after failover in a large Buffer Pool scenario. Source-level explanations (e.g., logic in buf0dump.cc or buf0lru.cc) are also very welcome.


1 Answer

The BP dump file is a list of page IDs, not the page contents. It's not necessary for the dump file to contain page contents, and doing so would take a lot of disk space: in your case, up to 192 GB.

Upon loading, the code reads the page IDs and fetches the corresponding pages in a background thread. This takes a nonzero amount of time, so even if it worked perfectly it wouldn't give you a warm BP immediately.
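The progress of that background thread is observable, and the load can be driven by hand. For example, using the standard MySQL 8.0 status and system variables:

```sql
-- Progress of the background load thread
SHOW STATUS LIKE 'Innodb_buffer_pool_load_status';

-- Trigger a load immediately without restarting
SET GLOBAL innodb_buffer_pool_load_now = ON;

-- Abort a load in progress
SET GLOBAL innodb_buffer_pool_load_abort = ON;
```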

But I think the BP dump file is not relevant to your switchover.

The BP dump on your replica instance is not the same as the BP dump on the source instance, unless for some reason you run exactly the same queries on both instances for some period of time before the switchover. This is unlikely.

The dump file from the source can't be used by the replica instance. It would still be a list of page IDs, but those IDs don't refer to pages holding the same data on the replica. Two instances of InnoDB don't necessarily organize pages the same way, even for the same data.

So even if you dump and load the BP on the replica, it would represent the list of pages that were in RAM on the replica, not the pages in RAM on the old source instance. Unless you were running queries on the replica, the BP on the replica won't include a lot of the pages needed to optimize queries (it may include pages that were needed to perform INSERT/UPDATE/DELETE through replication).

I think the only way to improve the warmup time is to run, before the switchover, a sample of the typical SELECT queries from the source instance against the replica for at least a few minutes (the longer the better), to prime the BP with a similar working set of data. You need to do this repeatedly: with InnoDB's midpoint-insertion LRU, a newly read page lands in the old sublist and is only promoted to the young sublist if it is accessed again, so pages touched just once are the first to be evicted.

Then do the switchover. If you had primed the BP, you won't have to dump or load the BP on the replica instance.

Obviously, you want to run just read-only queries on the replica (SELECT). Don't re-run INSERT/UPDATE/DELETE or DDL queries.
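A minimal warm-up sketch along these lines, assuming hypothetical hot tables `orders` and `order_items` with an assumed index `idx_created_at`; in practice you would replay representative SELECTs captured from the source (e.g. from the slow log or performance_schema), in a loop, until the switchover:

```sql
-- Run on the replica, repeatedly, before the switchover.
-- Touch both a hot secondary index and the clustered index so both
-- sets of pages are read, and re-read, which promotes them into the
-- young LRU sublist.
SELECT COUNT(*)
FROM orders FORCE INDEX (idx_created_at)
WHERE created_at >= NOW() - INTERVAL 1 DAY;

SELECT SUM(quantity)
FROM order_items FORCE INDEX (PRIMARY)
WHERE order_id BETWEEN 1000000 AND 2000000;
```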

