Fix tiered compaction incorrectly moving range tombstones upwards by joshkang97 · Pull Request #14795 · facebook/rocksdb

joshkang97 · 2026-05-27T16:44:18Z

Summary

Fixes an off-by-one bug in how sequence numbers are handled when splitting range tombstones across output levels in per-key-placement (tiered) compaction. The bug was either introduced or propagated in #13256.

Point entries move to proximal output only when their sequence number is strictly greater than proximal_after_seqno_, but range tombstones were previously split using an inclusive lower bound at that same seqno. That allowed a range tombstone at the boundary to be emitted to the proximal level while point keys at the same seqno stayed in the last level, which could create overlapping files in the proximal level (caught by force_consistency_checks as L<n> has overlapping ranges).

This matters when proximal_output_range_type_ is kNonLastRange: the compaction only owns the selected proximal-level input range, so existing last-level data at the split boundary must stay in the last level. In kFullRange, the compaction owns the relevant proximal-level range, so newer last-level data can be safely emitted to proximal output.

The fix splits range tombstones at proximal_after_seqno_ + 1 (saturating at kMaxSequenceNumber), so the half-open [lower, upper) tombstone filter lands on the same boundary as the strict seqno > proximal_after_seqno_ rule used for point keys.

Test Plan

New regression test PrecludeLastLevelTestBase.RangeDelAtProximalSeqnoBoundaryStaysInLastLevel uses a targeted manual compaction to exercise the kNonLastRange case directly, verifying a boundary range tombstone stays in the last level instead of widening the proximal output range.

Worked example:

Initial state:

  L6 (last level):   Put(Key 2)@s1,  Put(Key 12)@s2,  RangeDel[Key 2, Key 12)@s3
  L5 (proximal):     file A [Key 0 .. Key 4]@s4-s5     file B [Key 5 .. Key 9]@s6-s7

  preclude_last_level_min_seqno is forced to 0 via sync point.

Manual CompactFiles selects only L5 file B + the L6 file (output level 6).
Only part of the proximal level is selected, so this is the kNonLastRange case:

  max_last_level_seqno  = 3
  proximal_after_seqno_ = max(0, 3) = 3

  Point keys Key 5 (s6) and Key 9 (s7): both seqno > 3 -> proximal output (L5)   OK

  Range tombstone @s3:
    Old (buggy):  proximal keep range [3, MAX) includes s3, so the tombstone is
                  emitted to the proximal output. The output is built from L5 input
                  file B [Key 5 .. Key 9], but the tombstone covers [Key 2, Key 12),
                  so the proximal output file starts at Key 2 and spills past Key 5
                  into the existing, untouched L5 file A [Key 0 .. Key 4].
                  Two overlapping files in L5 -> Corruption.

    New (fixed):  split at proximal_after_seqno_ + 1 = 4.
                  proximal keep [4, MAX) excludes s3; last-level keep [0, 4) includes
                  s3 -> tombstone stays in the last level (L6).   OK

Verification (debug build, make -j64 tiered_compaction_test):

Without the fix, the test fails at the CompactFiles call:

tiered_compaction_test.cc:2940: Failure
Corruption: force_consistency_checks(DEBUG): VersionBuilder: L5 has overlapping ranges:
  file #11 largest key: Key(4) seq:5, type:1 (Put) vs.
  file #17 smallest key: Key(2) seq:3, type:15 (range deletion)

With the fix, the test passes (range deletions absent from L5, still present in L6).

github-actions · 2026-05-27T16:48:49Z

⚠️ clang-tidy: 6 warning(s) on changed lines

Completed in 524.5s.

Summary by check

Check	Count
`concurrency-mt-unsafe`	3
`cppcoreguidelines-avoid-non-const-global-variables`	3
Total	6

Details

db_stress_tool/db_stress_common.cc (2 warning(s))

db_stress_tool/db_stress_common.cc:25:56: warning: variable 'wbm' is non-const and globally accessible, consider making it const [cppcoreguidelines-avoid-non-const-global-variables]
db_stress_tool/db_stress_common.cc:26:49: warning: variable 'rate_limiter' is non-const and globally accessible, consider making it const [cppcoreguidelines-avoid-non-const-global-variables]

db_stress_tool/db_stress_tool.cc (4 warning(s))

db_stress_tool/db_stress_tool.cc:41:5: warning: variable 'fault_fs_for_crash_report' is non-const and globally accessible, consider making it const [cppcoreguidelines-avoid-non-const-global-variables]
db_stress_tool/db_stress_tool.cc:416:9: warning: function is not thread safe [concurrency-mt-unsafe]
db_stress_tool/db_stress_tool.cc:439:7: warning: function is not thread safe [concurrency-mt-unsafe]
db_stress_tool/db_stress_tool.cc:448:9: warning: function is not thread safe [concurrency-mt-unsafe]

github-actions · 2026-05-27T17:03:19Z

Codex Code Review - OBSOLETE

Superseded by a newer AI review. Expand to see the original review.

🟡 Codex Code Review

Auto-triggered after CI passed — reviewing commit 4dd4947

❌ Codex review failed before producing findings.

WARNING: proceeding, even though we could not update PATH: Refusing to create helper binaries under temporary dir "/tmp" (codex_home: AbsolutePathBuf("/tmp/codex-home"))
error: the argument '--base <BRANCH>' cannot be used with '[PROMPT]'

Usage: codex exec review --commit <SHA> --base <BRANCH> --title <TITLE> --model <MODEL> --config <key=value> --dangerously-bypass-approvals-and-sandbox --output-last-message <FILE> [PROMPT]

For more information, try '--help'.

ℹ️ About this response

Generated by Codex CLI.
Review methodology: claude_md/code_review.md

Limitations:

Codex may miss context from files not in the diff
Large PRs may be truncated
Always apply human judgment to AI suggestions

Commands:

/codex-review [context] — Request a code review
/codex-query <question> — Ask about the PR or codebase

meta-codesync · 2026-05-27T17:07:20Z

@joshkang97 has imported this pull request. If you are a Meta employee, you can view this in D106528471.

github-actions · 2026-05-27T17:15:08Z

Claude Code Review - OBSOLETE

Superseded by a newer AI review. Expand to see the original review.

✅ Claude Code Review

Auto-triggered after CI passed — reviewing commit 4dd4947

Summary

Correct, well-scoped bug fix for an off-by-one inconsistency between point key routing (seq > proximal_after_seqno_) and range tombstone filtering (seq >= proximal_after_seqno_) during tiered compaction with per-key placement. The fix aligns both to use seq > proximal_after_seqno_ by incrementing the split seqno by 1 (with proper overflow guard). No correctness, performance, or compatibility concerns.

High-severity findings (0):

No high-severity findings.

Full review (click to expand)

Findings

🔴 HIGH

None.

🟡 MEDIUM

M1. Test does not verify data correctness — `tiered_compaction_test.cc:2949-2957`

Issue: The regression test only checks num_range_deletions metadata counts to verify range tombstones stayed in L6. It does not verify functional correctness (e.g., that point lookups for deleted keys return NOT_FOUND, or that range scans produce correct results).
Root cause: The test validates placement but not the end-to-end semantics of the range deletion.
Suggested fix: Add point lookups for keys within the deleted range (Key(2)..Key(12)) to verify they return the expected values given the snapshot and deletion, and/or a full iterator scan to verify correctness.

M2. Missing test for kFullRange case — `tiered_compaction_test.cc`

Issue: The test only exercises the kNonLastRange code path (level-style compaction with partial proximal input). The kFullRange path (universal compaction with all proximal files included) uses the same keep_seqno_range logic but is not tested with the boundary condition.
Root cause: kFullRange typically has keep_in_last_level_through_seqno_ = 0, so the boundary condition is less likely to trigger, but coverage would increase confidence.
Suggested fix: Consider adding a companion test using universal compaction to exercise kFullRange.

🟢 LOW / NIT

L1. No test for `proximal_after_seqno_ == kMaxSequenceNumber` edge case

Issue: When proximal_after_seqno_ == kMaxSequenceNumber, the overflow guard prevents the increment. This is correct because per-key placement is effectively disabled. However, no test verifies this.
Suggested fix: Low priority — this case is an effective no-op and is guarded by the check.

L2. No test for range tombstones straddling the boundary

Issue: The test has a single range tombstone at the boundary seqno. Testing with multiple range tombstones at seqnos on both sides of the boundary would more thoroughly validate the split logic.
Suggested fix: Consider extending the test or adding a companion test.

Cross-Component Analysis

Context	Code executes?	Assumptions hold?	Action needed?
kNonLastRange	YES	YES — primary bug scenario	Fixed by this PR
kFullRange	YES	YES	Safe
kDisabled	NO	N/A	Safe
WritePreparedTxnDB	Potentially	Same seqno routing	Safe
Remote compaction	YES	Same CompactionJob logic (not overridden)	Safe
User-defined timestamps	YES	Seqno logic orthogonal	Safe
FIFO/Universal	Partial	Universal supports per-key placement; FIFO does not	Safe
Snapshots	YES	Placement level doesn't affect visibility	Safe

Assumption stress-test results:

Claim: "split them at the next seqno" — Verified. [proximal_after_seqno_+1, kMaxSequenceNumber) ≡ seq > proximal_after_seqno_. No gap, no overlap.
Claim: overflow guard prevents issues — Verified. At kMaxSequenceNumber, proximal filters out all tombstones (correct — nothing routes to proximal).
CompactForTieringCollector path — Verified unchanged. Uses original proximal_after_seqno_ at line 2590.

Positive Observations

Surgical fix: Minimal (5 lines of logic + 3 lines of comments), precisely targets root cause.
Good comment: Clearly explains WHY the +1 adjustment is needed.
Proper overflow guard: Prevents integer overflow at kMaxSequenceNumber.
Well-designed test: Sync points for determinism, targets kNonLastRange, confirmed to fail with old code.
No performance impact: Single comparison + conditional increment per output file.

ℹ️ About this response

Generated by Claude Code.
Review methodology: claude_md/code_review.md

Limitations:

Claude may miss context from files not in the diff
Large PRs may be truncated
Always apply human judgment to AI suggestions

Commands:

/claude-review [context] — Request a code review
/claude-query <question> — Ask about the PR or codebase

meta-codesync · 2026-05-29T18:22:17Z

@joshkang97 has imported this pull request. If you are a Meta employee, you can view this in D106528471.

meta-codesync · 2026-05-29T19:03:37Z

@joshkang97 has imported this pull request. If you are a Meta employee, you can view this in D106528471.

github-actions · 2026-05-29T19:40:36Z

🟡 Codex Code Review

Auto-triggered after CI passed — reviewing commit f1126ee

❌ Codex review failed before producing findings.

WARNING: proceeding, even though we could not update PATH: Refusing to create helper binaries under temporary dir "/tmp" (codex_home: AbsolutePathBuf("/tmp/codex-home"))
error: the argument '--base <BRANCH>' cannot be used with '[PROMPT]'

Usage: codex exec review --commit <SHA> --base <BRANCH> --title <TITLE> --model <MODEL> --config <key=value> --dangerously-bypass-approvals-and-sandbox --output-last-message <FILE> [PROMPT]

For more information, try '--help'.

ℹ️ About this response

Generated by Codex CLI.
Review methodology: claude_md/code_review.md

Limitations:

Codex may miss context from files not in the diff
Large PRs may be truncated
Always apply human judgment to AI suggestions

Commands:

/codex-review [context] — Request a code review
/codex-query <question> — Ask about the PR or codebase

github-actions · 2026-05-29T19:57:47Z

✅ Claude Code Review

Auto-triggered after CI passed — reviewing commit f1126ee

Summary

Well-scoped, correct bug fix for an off-by-one error in tiered compaction range tombstone routing. The fix aligns range tombstone keep_seqno_range boundaries with the strict > comparison used for point keys by incrementing by 1 (with saturation). The test is well-targeted and the release note is appropriate.

High-severity findings (0):

No high-severity findings.

Full review (click to expand)

Findings

🔴 HIGH

None.

🟡 MEDIUM

M1. Test covers only `kNonLastRange` case -- `tiered_compaction_test.cc`

Issue: The new test exercises the kNonLastRange case (partial proximal level ownership via CompactFiles). The kFullRange case, where the compaction owns the entire proximal range, is not tested for this specific boundary condition.
Root cause: The kFullRange case is less likely to manifest as corruption (no untouched proximal files to overlap with), but the off-by-one bug still applies logically.
Suggested fix: Consider adding a follow-up test for kFullRange to confirm the boundary tombstone is correctly placed. Not blocking since the core bug path (kNonLastRange) is covered.

M2. No test for tombstone at `proximal_after_seqno_ + 1` going to proximal -- `tiered_compaction_test.cc`

Issue: The test verifies the boundary tombstone at seq == proximal_after_seqno_ stays in last level, but does not verify that a tombstone at seq == proximal_after_seqno_ + 1 correctly goes to the proximal level.
Suggested fix: Adding a companion assertion for the "just above boundary" case would strengthen the regression test and guard against future off-by-two regressions.

🟢 LOW / NIT

L1. Snapshot in test is not released before compaction -- `tiered_compaction_test.cc`

Issue: ManagedSnapshot snapshot(db_.get()) is taken after the Put calls but before the DeleteRange. The snapshot is held throughout the test and released by RAII at scope exit. This is intentional (it prevents the range tombstone from being dropped during compaction since there's a live snapshot below it), but a brief comment explaining why the snapshot is needed would improve readability.

L2. `l6_range_deletions` variable declared but accumulation loop could be simplified -- `tiered_compaction_test.cc`

Issue: The test declares uint64_t l6_range_deletions = 0 then iterates L6 files accumulating num_range_deletions. Since the test creates a very specific file layout (1 L6 file), a simpler assertion like ASSERT_EQ(1, GetLevelFileMetadatas(6).size()); ASSERT_GT(GetLevelFileMetadatas(6)[0]->num_range_deletions, 0) would be more direct. This is purely stylistic and non-blocking.

L3. Comment quality is good -- `compaction_job.cc`

Observation: The added comment clearly explains why the +1 adjustment is needed and references the [lower, upper) filter semantics. This is well-written and sufficient.

Cross-Component Analysis

Context	Affected?	Safe?	Notes
WritePreparedTxnDB	Yes (uses same compaction)	Yes	Same CompactionJob code path
ReadOnly DB	No	N/A	No compaction
CompactionService	Yes (shared code)	Yes	`FinishCompactionOutputFile` is in shared `CompactionJob`, not `CompactionServiceJob`
User-defined timestamps	Yes	Yes	Timestamps are orthogonal to seqno routing
MemPurge	No	N/A	Different code path
FIFO compaction	No	N/A	`SupportsPerKeyPlacement()` is false
Universal compaction	Possible	Yes	Can support per-key placement; fix applies

Saturation analysis: When proximal_after_seqno_ == kMaxSequenceNumber (default), SupportsPerKeyPlacement() returns false, so the fixed code path is never entered. The saturation check (if (range_del_split_seqno < kMaxSequenceNumber)) is a correct defensive measure for the theoretical case.

Data consumer analysis: keep_seqno_range is constructed in exactly one location and consumed by exactly one function (AddRangeDels). No cross-component data consumer issues.

Assumption stress-test: The claim "seq_ >= X+1 is equivalent to seq_ > X for unsigned integers" holds for all values where X+1 doesn't overflow. The saturation check prevents overflow. Valid for all reachable values of proximal_after_seqno_.

Positive Observations

The fix is surgically precise: 6 lines of logic change, no refactoring, no side effects
The boundary math is correct: proximal keeps [proximal_after_seqno_+1, kMaxSequenceNumber) and last keeps [0, proximal_after_seqno_+1), giving a perfect partition of [0, kMaxSequenceNumber) with no gap or overlap
The test directly reproduces the corruption scenario from the bug description
The test uses sync points (not timing) for determinism
The release note is concise and appropriate for external users
No performance impact (code runs once per output file, not per key)
No serialization or compatibility impact

ℹ️ About this response

Generated by Claude Code.
Review methodology: claude_md/code_review.md

Limitations:

Claude may miss context from files not in the diff
Large PRs may be truncated
Always apply human judgment to AI suggestions

Commands:

/claude-review [context] — Request a code review
/claude-query <question> — Ask about the PR or codebase

meta-codesync · 2026-05-29T20:13:58Z

@joshkang97 merged this pull request in 26a501b.

fix

4dd4947

meta-cla Bot added the CLA Signed label May 27, 2026

joshkang97 requested a review from xingbowang May 27, 2026 17:07

release note

db92796

release note update

f1126ee

meta-codesync Bot closed this in 26a501b May 29, 2026

facebook-github-tools Bot added the Merged label May 29, 2026

Uh oh!

Conversation

joshkang97 commented May 27, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Test Plan

github-actions Bot commented May 27, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

⚠️ clang-tidy: 6 warning(s) on changed lines

Summary by check

Details

github-actions Bot commented May 27, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codex Code Review - OBSOLETE

🟡 Codex Code Review

meta-codesync Bot commented May 27, 2026

github-actions Bot commented May 27, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Claude Code Review - OBSOLETE

✅ Claude Code Review

Summary

Findings

🔴 HIGH

🟡 MEDIUM

M1. Test does not verify data correctness — tiered_compaction_test.cc:2949-2957

M2. Missing test for kFullRange case — tiered_compaction_test.cc

🟢 LOW / NIT

L1. No test for proximal_after_seqno_ == kMaxSequenceNumber edge case

L2. No test for range tombstones straddling the boundary

Cross-Component Analysis

Positive Observations

meta-codesync Bot commented May 29, 2026

meta-codesync Bot commented May 29, 2026

github-actions Bot commented May 29, 2026

🟡 Codex Code Review

github-actions Bot commented May 29, 2026

✅ Claude Code Review

Summary

Findings

🔴 HIGH

🟡 MEDIUM

M1. Test covers only kNonLastRange case -- tiered_compaction_test.cc

M2. No test for tombstone at proximal_after_seqno_ + 1 going to proximal -- tiered_compaction_test.cc

🟢 LOW / NIT

L1. Snapshot in test is not released before compaction -- tiered_compaction_test.cc

L2. l6_range_deletions variable declared but accumulation loop could be simplified -- tiered_compaction_test.cc

L3. Comment quality is good -- compaction_job.cc

Cross-Component Analysis

Positive Observations

meta-codesync Bot commented May 29, 2026

Labels

2 participants

joshkang97 commented May 27, 2026 •

edited

Loading

github-actions Bot commented May 27, 2026 •

edited

Loading

github-actions Bot commented May 27, 2026 •

edited

Loading

github-actions Bot commented May 27, 2026 •

edited

Loading

M1. Test does not verify data correctness — `tiered_compaction_test.cc:2949-2957`

M2. Missing test for kFullRange case — `tiered_compaction_test.cc`

L1. No test for `proximal_after_seqno_ == kMaxSequenceNumber` edge case

M1. Test covers only `kNonLastRange` case -- `tiered_compaction_test.cc`

M2. No test for tombstone at `proximal_after_seqno_ + 1` going to proximal -- `tiered_compaction_test.cc`

L1. Snapshot in test is not released before compaction -- `tiered_compaction_test.cc`

L2. `l6_range_deletions` variable declared but accumulation loop could be simplified -- `tiered_compaction_test.cc`

L3. Comment quality is good -- `compaction_job.cc`