Fix tiered compaction incorrectly moving range tombstones upwards#14795
Fix tiered compaction incorrectly moving range tombstones upwards#14795joshkang97 wants to merge 3 commits into
Conversation
|
| Check | Count |
|---|---|
concurrency-mt-unsafe |
3 |
cppcoreguidelines-avoid-non-const-global-variables |
3 |
| Total | 6 |
Details
db_stress_tool/db_stress_common.cc (2 warning(s))
db_stress_tool/db_stress_common.cc:25:56: warning: variable 'wbm' is non-const and globally accessible, consider making it const [cppcoreguidelines-avoid-non-const-global-variables]
db_stress_tool/db_stress_common.cc:26:49: warning: variable 'rate_limiter' is non-const and globally accessible, consider making it const [cppcoreguidelines-avoid-non-const-global-variables]
db_stress_tool/db_stress_tool.cc (4 warning(s))
db_stress_tool/db_stress_tool.cc:41:5: warning: variable 'fault_fs_for_crash_report' is non-const and globally accessible, consider making it const [cppcoreguidelines-avoid-non-const-global-variables]
db_stress_tool/db_stress_tool.cc:416:9: warning: function is not thread safe [concurrency-mt-unsafe]
db_stress_tool/db_stress_tool.cc:439:7: warning: function is not thread safe [concurrency-mt-unsafe]
db_stress_tool/db_stress_tool.cc:448:9: warning: function is not thread safe [concurrency-mt-unsafe]
Codex Code Review - OBSOLETESuperseded by a newer AI review. Expand to see the original review.🟡 Codex Code ReviewAuto-triggered after CI passed — reviewing commit 4dd4947 ❌ Codex review failed before producing findings. ℹ️ About this responseGenerated by Codex CLI. Limitations:
Commands:
|
|
@joshkang97 has imported this pull request. If you are a Meta employee, you can view this in D106528471. |
Claude Code Review - OBSOLETESuperseded by a newer AI review. Expand to see the original review.✅ Claude Code ReviewAuto-triggered after CI passed — reviewing commit 4dd4947 SummaryCorrect, well-scoped bug fix for an off-by-one inconsistency between point key routing ( High-severity findings (0): No high-severity findings. Full review (click to expand)Findings🔴 HIGHNone. 🟡 MEDIUMM1. Test does not verify data correctness —
|
| Context | Code executes? | Assumptions hold? | Action needed? |
|---|---|---|---|
| kNonLastRange | YES | YES — primary bug scenario | Fixed by this PR |
| kFullRange | YES | YES | Safe |
| kDisabled | NO | N/A | Safe |
| WritePreparedTxnDB | Potentially | Same seqno routing | Safe |
| Remote compaction | YES | Same CompactionJob logic (not overridden) | Safe |
| User-defined timestamps | YES | Seqno logic orthogonal | Safe |
| FIFO/Universal | Partial | Universal supports per-key placement; FIFO does not | Safe |
| Snapshots | YES | Placement level doesn't affect visibility | Safe |
Assumption stress-test results:
- Claim: "split them at the next seqno" — Verified.
[proximal_after_seqno_+1, kMaxSequenceNumber)≡seq > proximal_after_seqno_. No gap, no overlap. - Claim: overflow guard prevents issues — Verified. At kMaxSequenceNumber, proximal filters out all tombstones (correct — nothing routes to proximal).
- CompactForTieringCollector path — Verified unchanged. Uses original
proximal_after_seqno_at line 2590.
Positive Observations
- Surgical fix: Minimal (5 lines of logic + 3 lines of comments), precisely targets root cause.
- Good comment: Clearly explains WHY the +1 adjustment is needed.
- Proper overflow guard: Prevents integer overflow at kMaxSequenceNumber.
- Well-designed test: Sync points for determinism, targets kNonLastRange, confirmed to fail with old code.
- No performance impact: Single comparison + conditional increment per output file.
ℹ️ About this response
Generated by Claude Code.
Review methodology: claude_md/code_review.md
Limitations:
- Claude may miss context from files not in the diff
- Large PRs may be truncated
- Always apply human judgment to AI suggestions
Commands:
/claude-review [context]— Request a code review/claude-query <question>— Ask about the PR or codebase
|
@joshkang97 has imported this pull request. If you are a Meta employee, you can view this in D106528471. |
|
@joshkang97 has imported this pull request. If you are a Meta employee, you can view this in D106528471. |
🟡 Codex Code ReviewAuto-triggered after CI passed — reviewing commit f1126ee ❌ Codex review failed before producing findings. ℹ️ About this responseGenerated by Codex CLI. Limitations:
Commands:
|
✅ Claude Code ReviewAuto-triggered after CI passed — reviewing commit f1126ee SummaryWell-scoped, correct bug fix for an off-by-one error in tiered compaction range tombstone routing. The fix aligns range tombstone High-severity findings (0): No high-severity findings. Full review (click to expand)Findings🔴 HIGHNone. 🟡 MEDIUMM1. Test covers only
|
| Context | Affected? | Safe? | Notes |
|---|---|---|---|
| WritePreparedTxnDB | Yes (uses same compaction) | Yes | Same CompactionJob code path |
| ReadOnly DB | No | N/A | No compaction |
| CompactionService | Yes (shared code) | Yes | FinishCompactionOutputFile is in shared CompactionJob, not CompactionServiceJob |
| User-defined timestamps | Yes | Yes | Timestamps are orthogonal to seqno routing |
| MemPurge | No | N/A | Different code path |
| FIFO compaction | No | N/A | SupportsPerKeyPlacement() is false |
| Universal compaction | Possible | Yes | Can support per-key placement; fix applies |
Saturation analysis: When proximal_after_seqno_ == kMaxSequenceNumber (default), SupportsPerKeyPlacement() returns false, so the fixed code path is never entered. The saturation check (if (range_del_split_seqno < kMaxSequenceNumber)) is a correct defensive measure for the theoretical case.
Data consumer analysis: keep_seqno_range is constructed in exactly one location and consumed by exactly one function (AddRangeDels). No cross-component data consumer issues.
Assumption stress-test: The claim "seq_ >= X+1 is equivalent to seq_ > X for unsigned integers" holds for all values where X+1 doesn't overflow. The saturation check prevents overflow. Valid for all reachable values of proximal_after_seqno_.
Positive Observations
- The fix is surgically precise: 6 lines of logic change, no refactoring, no side effects
- The boundary math is correct: proximal keeps
[proximal_after_seqno_+1, kMaxSequenceNumber)and last keeps[0, proximal_after_seqno_+1), giving a perfect partition of[0, kMaxSequenceNumber)with no gap or overlap - The test directly reproduces the corruption scenario from the bug description
- The test uses sync points (not timing) for determinism
- The release note is concise and appropriate for external users
- No performance impact (code runs once per output file, not per key)
- No serialization or compatibility impact
ℹ️ About this response
Generated by Claude Code.
Review methodology: claude_md/code_review.md
Limitations:
- Claude may miss context from files not in the diff
- Large PRs may be truncated
- Always apply human judgment to AI suggestions
Commands:
/claude-review [context]— Request a code review/claude-query <question>— Ask about the PR or codebase
|
@joshkang97 merged this pull request in 26a501b. |
Summary
Fixes an off-by-one bug in how sequence numbers are handled when splitting range tombstones across output levels in per-key-placement (tiered) compaction. The bug was either introduced or propagated in #13256.
Point entries move to proximal output only when their sequence number is strictly greater than
proximal_after_seqno_, but range tombstones were previously split using an inclusive lower bound at that same seqno. That allowed a range tombstone at the boundary to be emitted to the proximal level while point keys at the same seqno stayed in the last level, which could create overlapping files in the proximal level (caught byforce_consistency_checksasL<n> has overlapping ranges).This matters when
proximal_output_range_type_iskNonLastRange: the compaction only owns the selected proximal-level input range, so existing last-level data at the split boundary must stay in the last level. InkFullRange, the compaction owns the relevant proximal-level range, so newer last-level data can be safely emitted to proximal output.The fix splits range tombstones at
proximal_after_seqno_ + 1(saturating atkMaxSequenceNumber), so the half-open[lower, upper)tombstone filter lands on the same boundary as the strictseqno > proximal_after_seqno_rule used for point keys.Test Plan
New regression test
PrecludeLastLevelTestBase.RangeDelAtProximalSeqnoBoundaryStaysInLastLeveluses a targeted manual compaction to exercise thekNonLastRangecase directly, verifying a boundary range tombstone stays in the last level instead of widening the proximal output range.Worked example:
Verification (debug build,
make -j64 tiered_compaction_test):CompactFilescall: