Fix false-positive corruption error with remote compaction#14808
Fix false-positive corruption error with remote compaction#14808joshkang97 wants to merge 1 commit into
Conversation
|
@joshkang97 has imported this pull request. If you are a Meta employee, you can view this in D107128357. |
✅ clang-tidy: No findings on changed linesCompleted in 288.8s. |
🟡 Codex Code ReviewAuto-triggered after CI passed — reviewing commit e7cc99e ❌ Codex review failed before producing findings. ℹ️ About this responseGenerated by Codex CLI. Limitations:
Commands:
|
✅ Claude Code ReviewAuto-triggered after CI passed — reviewing commit e7cc99e SummaryCorrect, well-targeted fix for a real false-positive corruption error in the remote compaction path. The serialization change and test are sound. However, the same systemic issue (missing serialization entries) affects 4 additional High-severity findings (0): No high-severity findings. Full review (click to expand)Findings🟡 MEDIUMM1. Four additional
|
| Context | Does code execute? | Assumptions hold? | Action needed? |
|---|---|---|---|
| WritePreparedTxnDB | No (no compaction_service usage) | N/A | None |
| ReadOnly DB | No (no compaction in read-only) | N/A | None |
| User-defined timestamps | Possible | Yes (UDT has separate verification bypass) | None |
| FIFO / Universal compaction | Possible via remote | Yes (flag is compaction-style agnostic) | None |
| Old primary + new worker | Yes | Yes (ignore_unknown_options=true skips unknown field) |
None |
| New primary + old worker | Yes | Yes (defaults to true, same as current behavior) |
None |
Backward/forward compatibility: Fully compatible. The ConfigOptions with ignore_unknown_options = true in CompactionServiceResult::Read() ensures old readers silently ignore the new field, and new readers use the safe default true when reading old data.
Aggregation correctness: CompactionJobStats::Add() uses has_accurate_num_input_records &= stats.has_accurate_num_input_records, so if any subcompaction signals inaccuracy, the aggregate correctly becomes false.
Positive Observations
- The fix is minimal, well-scoped, and addresses the exact root cause.
- The implementation follows existing serialization patterns precisely (
OptionType::kBoolean,OptionVerificationType::kNormal,OptionTypeFlags::kNone). - The test reuses established infrastructure (
CompactionServiceTestfixture,GenerateTestData(), existing SyncPoint). - The release note is clear and appropriately concise.
- No performance impact -- serialization happens once per remote compaction (cold path).
ℹ️ About this response
Generated by Claude Code.
Review methodology: claude_md/code_review.md
Limitations:
- Claude may miss context from files not in the diff
- Large PRs may be truncated
- Always apply human judgment to AI suggestions
Commands:
/claude-review [context]— Request a code review/claude-query <question>— Ask about the PR or codebase
|
@joshkang97 merged this pull request in 768de6e. |
Summary
Fix a false-positive compaction corruption error in the remote compaction path. Remote workers can mark
CompactionJobStats::num_input_recordsas unreliable when input iteration uses seek/skip behavior, but the remote result serialization was droppinghas_accurate_num_input_records. The primary then deserialized the flag as its defaulttrue, trusted an unreliable zero input-record count, and raisedCompaction number of input keys does not match number of keys processed.This change serializes the accuracy flag with the rest of
CompactionJobStatsso the primary preserves the remote worker's "do not verify this count" signal. It also adds a targeted regression test and a release note for the bug fix.Test Plan
New regression test that triggers corruption error without the fix by modifying
has_accurate_num_input_records=false