Skip to content

Fix tiered compaction incorrectly moving range tombstones upwards#14795

Closed
joshkang97 wants to merge 3 commits into
facebook:mainfrom
joshkang97:fix_read_triggered_compactions
Closed

Fix tiered compaction incorrectly moving range tombstones upwards#14795
joshkang97 wants to merge 3 commits into
facebook:mainfrom
joshkang97:fix_read_triggered_compactions

Conversation

@joshkang97

@joshkang97 joshkang97 commented May 27, 2026

Copy link
Copy Markdown
Contributor

Summary

Fixes an off-by-one bug in how sequence numbers are handled when splitting range tombstones across output levels in per-key-placement (tiered) compaction. The bug was either introduced or propagated in #13256.

Point entries move to proximal output only when their sequence number is strictly greater than proximal_after_seqno_, but range tombstones were previously split using an inclusive lower bound at that same seqno. That allowed a range tombstone at the boundary to be emitted to the proximal level while point keys at the same seqno stayed in the last level, which could create overlapping files in the proximal level (caught by force_consistency_checks as L<n> has overlapping ranges).

This matters when proximal_output_range_type_ is kNonLastRange: the compaction only owns the selected proximal-level input range, so existing last-level data at the split boundary must stay in the last level. In kFullRange, the compaction owns the relevant proximal-level range, so newer last-level data can be safely emitted to proximal output.

The fix splits range tombstones at proximal_after_seqno_ + 1 (saturating at kMaxSequenceNumber), so the half-open [lower, upper) tombstone filter lands on the same boundary as the strict seqno > proximal_after_seqno_ rule used for point keys.

Test Plan

New regression test PrecludeLastLevelTestBase.RangeDelAtProximalSeqnoBoundaryStaysInLastLevel uses a targeted manual compaction to exercise the kNonLastRange case directly, verifying a boundary range tombstone stays in the last level instead of widening the proximal output range.

Worked example:

Initial state:

  L6 (last level):   Put(Key 2)@s1,  Put(Key 12)@s2,  RangeDel[Key 2, Key 12)@s3
  L5 (proximal):     file A [Key 0 .. Key 4]@s4-s5     file B [Key 5 .. Key 9]@s6-s7

  preclude_last_level_min_seqno is forced to 0 via sync point.

Manual CompactFiles selects only L5 file B + the L6 file (output level 6).
Only part of the proximal level is selected, so this is the kNonLastRange case:

  max_last_level_seqno  = 3
  proximal_after_seqno_ = max(0, 3) = 3

  Point keys Key 5 (s6) and Key 9 (s7): both seqno > 3 -> proximal output (L5)   OK

  Range tombstone @s3:
    Old (buggy):  proximal keep range [3, MAX) includes s3, so the tombstone is
                  emitted to the proximal output. The output is built from L5 input
                  file B [Key 5 .. Key 9], but the tombstone covers [Key 2, Key 12),
                  so the proximal output file starts at Key 2 and spills past Key 5
                  into the existing, untouched L5 file A [Key 0 .. Key 4].
                  Two overlapping files in L5 -> Corruption.

    New (fixed):  split at proximal_after_seqno_ + 1 = 4.
                  proximal keep [4, MAX) excludes s3; last-level keep [0, 4) includes
                  s3 -> tombstone stays in the last level (L6).   OK

Verification (debug build, make -j64 tiered_compaction_test):

  • Without the fix, the test fails at the CompactFiles call:
    tiered_compaction_test.cc:2940: Failure
    Corruption: force_consistency_checks(DEBUG): VersionBuilder: L5 has overlapping ranges:
      file #11 largest key: Key(4) seq:5, type:1 (Put) vs.
      file #17 smallest key: Key(2) seq:3, type:15 (range deletion)
    
  • With the fix, the test passes (range deletions absent from L5, still present in L6).
@meta-cla meta-cla Bot added the CLA Signed label May 27, 2026
@github-actions

github-actions Bot commented May 27, 2026

Copy link
Copy Markdown

⚠️ clang-tidy: 6 warning(s) on changed lines

Completed in 524.5s.

Summary by check

Check Count
concurrency-mt-unsafe 3
cppcoreguidelines-avoid-non-const-global-variables 3
Total 6

Details

db_stress_tool/db_stress_common.cc (2 warning(s))
db_stress_tool/db_stress_common.cc:25:56: warning: variable 'wbm' is non-const and globally accessible, consider making it const [cppcoreguidelines-avoid-non-const-global-variables]
db_stress_tool/db_stress_common.cc:26:49: warning: variable 'rate_limiter' is non-const and globally accessible, consider making it const [cppcoreguidelines-avoid-non-const-global-variables]
db_stress_tool/db_stress_tool.cc (4 warning(s))
db_stress_tool/db_stress_tool.cc:41:5: warning: variable 'fault_fs_for_crash_report' is non-const and globally accessible, consider making it const [cppcoreguidelines-avoid-non-const-global-variables]
db_stress_tool/db_stress_tool.cc:416:9: warning: function is not thread safe [concurrency-mt-unsafe]
db_stress_tool/db_stress_tool.cc:439:7: warning: function is not thread safe [concurrency-mt-unsafe]
db_stress_tool/db_stress_tool.cc:448:9: warning: function is not thread safe [concurrency-mt-unsafe]
@github-actions

github-actions Bot commented May 27, 2026

Copy link
Copy Markdown

Codex Code Review - OBSOLETE

Superseded by a newer AI review. Expand to see the original review.

🟡 Codex Code Review

Auto-triggered after CI passed — reviewing commit 4dd4947


Codex review failed before producing findings.

WARNING: proceeding, even though we could not update PATH: Refusing to create helper binaries under temporary dir "/tmp" (codex_home: AbsolutePathBuf("/tmp/codex-home"))
error: the argument '--base <BRANCH>' cannot be used with '[PROMPT]'

Usage: codex exec review --commit <SHA> --base <BRANCH> --title <TITLE> --model <MODEL> --config <key=value> --dangerously-bypass-approvals-and-sandbox --output-last-message <FILE> [PROMPT]

For more information, try '--help'.

ℹ️ About this response

Generated by Codex CLI.
Review methodology: claude_md/code_review.md

Limitations:

  • Codex may miss context from files not in the diff
  • Large PRs may be truncated
  • Always apply human judgment to AI suggestions

Commands:

  • /codex-review [context] — Request a code review
  • /codex-query <question> — Ask about the PR or codebase
@meta-codesync

meta-codesync Bot commented May 27, 2026

Copy link
Copy Markdown

@joshkang97 has imported this pull request. If you are a Meta employee, you can view this in D106528471.

@joshkang97 joshkang97 requested a review from xingbowang May 27, 2026 17:07
@github-actions

github-actions Bot commented May 27, 2026

Copy link
Copy Markdown

Claude Code Review - OBSOLETE

Superseded by a newer AI review. Expand to see the original review.

✅ Claude Code Review

Auto-triggered after CI passed — reviewing commit 4dd4947


Summary

Correct, well-scoped bug fix for an off-by-one inconsistency between point key routing (seq > proximal_after_seqno_) and range tombstone filtering (seq >= proximal_after_seqno_) during tiered compaction with per-key placement. The fix aligns both to use seq > proximal_after_seqno_ by incrementing the split seqno by 1 (with proper overflow guard). No correctness, performance, or compatibility concerns.

High-severity findings (0):

No high-severity findings.

Full review (click to expand)

Findings

🔴 HIGH

None.

🟡 MEDIUM

M1. Test does not verify data correctness — tiered_compaction_test.cc:2949-2957
  • Issue: The regression test only checks num_range_deletions metadata counts to verify range tombstones stayed in L6. It does not verify functional correctness (e.g., that point lookups for deleted keys return NOT_FOUND, or that range scans produce correct results).
  • Root cause: The test validates placement but not the end-to-end semantics of the range deletion.
  • Suggested fix: Add point lookups for keys within the deleted range (Key(2)..Key(12)) to verify they return the expected values given the snapshot and deletion, and/or a full iterator scan to verify correctness.
M2. Missing test for kFullRange case — tiered_compaction_test.cc
  • Issue: The test only exercises the kNonLastRange code path (level-style compaction with partial proximal input). The kFullRange path (universal compaction with all proximal files included) uses the same keep_seqno_range logic but is not tested with the boundary condition.
  • Root cause: kFullRange typically has keep_in_last_level_through_seqno_ = 0, so the boundary condition is less likely to trigger, but coverage would increase confidence.
  • Suggested fix: Consider adding a companion test using universal compaction to exercise kFullRange.

🟢 LOW / NIT

L1. No test for proximal_after_seqno_ == kMaxSequenceNumber edge case
  • Issue: When proximal_after_seqno_ == kMaxSequenceNumber, the overflow guard prevents the increment. This is correct because per-key placement is effectively disabled. However, no test verifies this.
  • Suggested fix: Low priority — this case is an effective no-op and is guarded by the check.
L2. No test for range tombstones straddling the boundary
  • Issue: The test has a single range tombstone at the boundary seqno. Testing with multiple range tombstones at seqnos on both sides of the boundary would more thoroughly validate the split logic.
  • Suggested fix: Consider extending the test or adding a companion test.

Cross-Component Analysis

Context Code executes? Assumptions hold? Action needed?
kNonLastRange YES YES — primary bug scenario Fixed by this PR
kFullRange YES YES Safe
kDisabled NO N/A Safe
WritePreparedTxnDB Potentially Same seqno routing Safe
Remote compaction YES Same CompactionJob logic (not overridden) Safe
User-defined timestamps YES Seqno logic orthogonal Safe
FIFO/Universal Partial Universal supports per-key placement; FIFO does not Safe
Snapshots YES Placement level doesn't affect visibility Safe

Assumption stress-test results:

  • Claim: "split them at the next seqno" — Verified. [proximal_after_seqno_+1, kMaxSequenceNumber)seq > proximal_after_seqno_. No gap, no overlap.
  • Claim: overflow guard prevents issues — Verified. At kMaxSequenceNumber, proximal filters out all tombstones (correct — nothing routes to proximal).
  • CompactForTieringCollector path — Verified unchanged. Uses original proximal_after_seqno_ at line 2590.

Positive Observations

  • Surgical fix: Minimal (5 lines of logic + 3 lines of comments), precisely targets root cause.
  • Good comment: Clearly explains WHY the +1 adjustment is needed.
  • Proper overflow guard: Prevents integer overflow at kMaxSequenceNumber.
  • Well-designed test: Sync points for determinism, targets kNonLastRange, confirmed to fail with old code.
  • No performance impact: Single comparison + conditional increment per output file.

ℹ️ About this response

Generated by Claude Code.
Review methodology: claude_md/code_review.md

Limitations:

  • Claude may miss context from files not in the diff
  • Large PRs may be truncated
  • Always apply human judgment to AI suggestions

Commands:

  • /claude-review [context] — Request a code review
  • /claude-query <question> — Ask about the PR or codebase
@meta-codesync

meta-codesync Bot commented May 29, 2026

Copy link
Copy Markdown

@joshkang97 has imported this pull request. If you are a Meta employee, you can view this in D106528471.

@meta-codesync

meta-codesync Bot commented May 29, 2026

Copy link
Copy Markdown

@joshkang97 has imported this pull request. If you are a Meta employee, you can view this in D106528471.

@github-actions

Copy link
Copy Markdown

🟡 Codex Code Review

Auto-triggered after CI passed — reviewing commit f1126ee


Codex review failed before producing findings.

WARNING: proceeding, even though we could not update PATH: Refusing to create helper binaries under temporary dir "/tmp" (codex_home: AbsolutePathBuf("/tmp/codex-home"))
error: the argument '--base <BRANCH>' cannot be used with '[PROMPT]'

Usage: codex exec review --commit <SHA> --base <BRANCH> --title <TITLE> --model <MODEL> --config <key=value> --dangerously-bypass-approvals-and-sandbox --output-last-message <FILE> [PROMPT]

For more information, try '--help'.

ℹ️ About this response

Generated by Codex CLI.
Review methodology: claude_md/code_review.md

Limitations:

  • Codex may miss context from files not in the diff
  • Large PRs may be truncated
  • Always apply human judgment to AI suggestions

Commands:

  • /codex-review [context] — Request a code review
  • /codex-query <question> — Ask about the PR or codebase
@github-actions

Copy link
Copy Markdown

✅ Claude Code Review

Auto-triggered after CI passed — reviewing commit f1126ee


Summary

Well-scoped, correct bug fix for an off-by-one error in tiered compaction range tombstone routing. The fix aligns range tombstone keep_seqno_range boundaries with the strict > comparison used for point keys by incrementing by 1 (with saturation). The test is well-targeted and the release note is appropriate.

High-severity findings (0):

No high-severity findings.

Full review (click to expand)

Findings

🔴 HIGH

None.

🟡 MEDIUM

M1. Test covers only kNonLastRange case -- tiered_compaction_test.cc
  • Issue: The new test exercises the kNonLastRange case (partial proximal level ownership via CompactFiles). The kFullRange case, where the compaction owns the entire proximal range, is not tested for this specific boundary condition.
  • Root cause: The kFullRange case is less likely to manifest as corruption (no untouched proximal files to overlap with), but the off-by-one bug still applies logically.
  • Suggested fix: Consider adding a follow-up test for kFullRange to confirm the boundary tombstone is correctly placed. Not blocking since the core bug path (kNonLastRange) is covered.
M2. No test for tombstone at proximal_after_seqno_ + 1 going to proximal -- tiered_compaction_test.cc
  • Issue: The test verifies the boundary tombstone at seq == proximal_after_seqno_ stays in last level, but does not verify that a tombstone at seq == proximal_after_seqno_ + 1 correctly goes to the proximal level.
  • Suggested fix: Adding a companion assertion for the "just above boundary" case would strengthen the regression test and guard against future off-by-two regressions.

🟢 LOW / NIT

L1. Snapshot in test is not released before compaction -- tiered_compaction_test.cc
  • Issue: ManagedSnapshot snapshot(db_.get()) is taken after the Put calls but before the DeleteRange. The snapshot is held throughout the test and released by RAII at scope exit. This is intentional (it prevents the range tombstone from being dropped during compaction since there's a live snapshot below it), but a brief comment explaining why the snapshot is needed would improve readability.
L2. l6_range_deletions variable declared but accumulation loop could be simplified -- tiered_compaction_test.cc
  • Issue: The test declares uint64_t l6_range_deletions = 0 then iterates L6 files accumulating num_range_deletions. Since the test creates a very specific file layout (1 L6 file), a simpler assertion like ASSERT_EQ(1, GetLevelFileMetadatas(6).size()); ASSERT_GT(GetLevelFileMetadatas(6)[0]->num_range_deletions, 0) would be more direct. This is purely stylistic and non-blocking.
L3. Comment quality is good -- compaction_job.cc
  • Observation: The added comment clearly explains why the +1 adjustment is needed and references the [lower, upper) filter semantics. This is well-written and sufficient.

Cross-Component Analysis

Context Affected? Safe? Notes
WritePreparedTxnDB Yes (uses same compaction) Yes Same CompactionJob code path
ReadOnly DB No N/A No compaction
CompactionService Yes (shared code) Yes FinishCompactionOutputFile is in shared CompactionJob, not CompactionServiceJob
User-defined timestamps Yes Yes Timestamps are orthogonal to seqno routing
MemPurge No N/A Different code path
FIFO compaction No N/A SupportsPerKeyPlacement() is false
Universal compaction Possible Yes Can support per-key placement; fix applies

Saturation analysis: When proximal_after_seqno_ == kMaxSequenceNumber (default), SupportsPerKeyPlacement() returns false, so the fixed code path is never entered. The saturation check (if (range_del_split_seqno < kMaxSequenceNumber)) is a correct defensive measure for the theoretical case.

Data consumer analysis: keep_seqno_range is constructed in exactly one location and consumed by exactly one function (AddRangeDels). No cross-component data consumer issues.

Assumption stress-test: The claim "seq_ >= X+1 is equivalent to seq_ > X for unsigned integers" holds for all values where X+1 doesn't overflow. The saturation check prevents overflow. Valid for all reachable values of proximal_after_seqno_.

Positive Observations

  • The fix is surgically precise: 6 lines of logic change, no refactoring, no side effects
  • The boundary math is correct: proximal keeps [proximal_after_seqno_+1, kMaxSequenceNumber) and last keeps [0, proximal_after_seqno_+1), giving a perfect partition of [0, kMaxSequenceNumber) with no gap or overlap
  • The test directly reproduces the corruption scenario from the bug description
  • The test uses sync points (not timing) for determinism
  • The release note is concise and appropriate for external users
  • No performance impact (code runs once per output file, not per key)
  • No serialization or compatibility impact

ℹ️ About this response

Generated by Claude Code.
Review methodology: claude_md/code_review.md

Limitations:

  • Claude may miss context from files not in the diff
  • Large PRs may be truncated
  • Always apply human judgment to AI suggestions

Commands:

  • /claude-review [context] — Request a code review
  • /claude-query <question> — Ask about the PR or codebase
@meta-codesync

meta-codesync Bot commented May 29, 2026

Copy link
Copy Markdown

@joshkang97 merged this pull request in 26a501b.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

2 participants