Fix secondary DB stuck on stale versions when atomic_flush=true#14439
Open
hx235 wants to merge 1 commit into
Open
Fix secondary DB stuck on stale versions when atomic_flush=true#14439hx235 wants to merge 1 commit into
hx235 wants to merge 1 commit into
Conversation
✅ clang-tidy: No findings on changed linesCompleted in 363.3s. |
3e409ae to
b3010a1
Compare
Contributor
Author
|
linter not related |
Contributor
Author
|
Ah history md was not auto-generated - just pushed a new one on top of clean testing signals (excluding the unrelated linter warning) |
b3010a1 to
5fbf050
Compare
Contributor
Author
|
Turning into draft mode in order to work better on "TODO: consider cross-CF read consistency requirement in secondary db and whether that conflicts with this change." |
5fbf050 to
86188bb
Compare
Summary:
Bug: When `atomic_flush=true` and manifest rotation races with primary
compaction, the secondary/follower DB can permanently stop installing
new versions. Symptoms include remote compaction failing with "Cannot
find matched SST files", secondary serving permanently stale data
(release builds), and crashing on manifest switch (debug builds).
Root cause: `VersionEditHandlerPointInTime` uses a staging map
(`atomic_update_versions_`) for all-or-nothing atomic group recovery.
When the secondary reads a manifest where some SST files were already
compacted away by the primary, the atomic group becomes permanently
incomplete and the staging map is never cleared. All subsequent version
creation — including force-creates in `OnAtomicGroupReplayBegin()` and
`CheckIterationResult()` — routes through the stale staging map instead
of `versions_`, so versions are trapped and never installed.
Fix: Override `OnAtomicGroupReplayEnd()` in `ManifestTailer` to clear
the staging map when the atomic group is incomplete. This is safe
because:
- All staging map entries are nullptr (no versions are created while
`in_atomic_group_` is true), so nothing is lost.
- Pre-atomic-group versions are already saved in `versions_` by
`OnAtomicGroupReplayBegin()`.
- The all-or-nothing guarantee for primary best-effort recovery
(`VersionEditHandlerPointInTime`) is preserved — only the subclass
behavior changes.
The fix can create temporary cross-CF inconsistency on the secondary:
when an incomplete atomic group is cleared, some CFs may have their
versions installed while others don't (e.g., CF1 has data but CF0's
atomic-flush SST was already deleted by CompactRange on the primary).
This is mitigated by the next TryCatchUpWithPrimary call, which reads
the updated manifest containing CompactRange results and restores
cross-CF consistency.
Test Plan:
- `CompactionServiceTest.AtomicFlushRemoteCompactionMissingFile`:
end-to-end test triggering the bug via remote compaction timing.
Without fix: remote compaction fails ("CompactionService failed to
run the compaction job"). With fix: compaction succeeds.
- `DBSecondaryTest.AtomicFlushSecondaryIncompleteGroup`: reproduces
the bug scenario (incomplete atomic group from CompactRange racing
with secondary manifest read). Verifies SST installation, cross-CF
consistency via NewIterators, and recovery via TryCatchUpWithPrimary.
Without fix: crashes in ManifestTailer::OnColumnFamilyAdd.
- `DBSecondaryTest.AtomicFlushSecondaryMultiCFConsistency`: baseline
cross-CF consistency with atomic_flush + WAL disabled.
$ ./db_secondary_test --gtest_filter="DBSecondaryTest.AtomicFlush*"
[ PASSED ] 2 tests.
$ ./compaction_service_test --gtest_filter="CompactionServiceTest.AtomicFlushRemoteCompactionMissingFile"
[ PASSED ] 1 test.
86188bb to
c6f19d8
Compare
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Context/Summary:
Bug: When
atomic_flush=trueand manifest rotation races with primarycompaction, the secondary/follower DB can permanently stop installing
new versions. Symptoms include remote compaction failing with "Cannot
find matched SST files", secondary serving permanently stale data
(release builds), and crashing on manifest switch (debug builds).
Root cause:
VersionEditHandlerPointInTimeuses a staging map(
atomic_update_versions_) for all-or-nothing atomic group recovery.When the secondary reads a manifest where some SST files were already
compacted away by the primary, the atomic group becomes permanently
incomplete and the staging map is never cleared. All subsequent version
creation — including force-creates in
OnAtomicGroupReplayBegin()andCheckIterationResult()— routes through the stale staging map insteadof
versions_, so versions are trapped and never installed.Fix: Override
OnAtomicGroupReplayEnd()inManifestTailerto clearthe staging map when the atomic group is incomplete. This is safe
because:
- All staging map entries are nullptr (no versions are created while
in_atomic_group_is true), so nothing is lost.- Pre-atomic-group versions are already saved in
versions_byOnAtomicGroupReplayBegin().- The all-or-nothing guarantee for primary best-effort recovery
(
VersionEditHandlerPointInTime) is preserved — only the subclassbehavior changes.
The fix can create temporary cross-CF inconsistency on the secondary:
when an incomplete atomic group is cleared, some CFs may have their
versions installed while others don't (e.g., CF1 has data but CF0's
atomic-flush SST was already deleted by CompactRange on the primary).
This is mitigated by the next TryCatchUpWithPrimary call, which reads
the updated manifest containing CompactRange results and restores
cross-CF consistency.
Test Plan:
-
CompactionServiceTest.AtomicFlushRemoteCompactionMissingFile:end-to-end test triggering the bug via remote compaction timing.
Without fix: remote compaction fails ("CompactionService failed to
run the compaction job"). With fix: compaction succeeds.
-
DBSecondaryTest.AtomicFlushSecondaryIncompleteGroup: reproducesthe bug scenario (incomplete atomic group from CompactRange racing
with secondary manifest read). Verifies SST installation, cross-CF
consistency via NewIterators, and recovery via TryCatchUpWithPrimary.
Without fix: crashes in ManifestTailer::OnColumnFamilyAdd.
-
DBSecondaryTest.AtomicFlushSecondaryMultiCFConsistency: baselinecross-CF consistency with atomic_flush + WAL disabled.