Add remote compaction format compatibility test to check_format_compatible.sh (#14798)#14798
Add remote compaction format compatibility test to check_format_compatible.sh (#14798)#14798hx235 wants to merge 1 commit into
Conversation
|
@hx235 has exported this pull request. If you are a Meta employee, you can view the originating Diff in D106321150. |
|
| Check | Count |
|---|---|
cert-err58-cpp |
4 |
| Total | 4 |
Details
tools/ldb_cmd.cc (4 warning(s))
tools/ldb_cmd.cc:5469:19: warning: initialization of 'kInputFile' with static storage duration may throw an exception that cannot be caught [cert-err58-cpp]
tools/ldb_cmd.cc:5470:19: warning: initialization of 'kResultFile' with static storage duration may throw an exception that cannot be caught [cert-err58-cpp]
tools/ldb_cmd.cc:5537:51: warning: initialization of 'ARG_JOB_DIR' with static storage duration may throw an exception that cannot be caught [cert-err58-cpp]
tools/ldb_cmd.cc:5590:50: warning: initialization of 'ARG_JOB_DIR' with static storage duration may throw an exception that cannot be caught [cert-err58-cpp]
…tible.sh (facebook#14798) Summary: Add cross-version format compatibility testing for remote compaction to `check_format_compatible.sh`. Two new ldb commands (`remote_compaction_primary`, `remote_compaction_worker`) coordinate via local files: primary's `CompactRange()` writes `input.bin` through `Schedule()` and polls for `result.bin` through `Wait()`; worker polls for `input.bin`, calls `OpenAndCompact()`, writes `result.bin`. For each old ref in `db_forward_with_options_refs` (10.4.fb+), the script tests both directions — current primary + old worker and old primary + current worker — to catch serialization incompatibilities. Old refs lacking the commands are skipped gracefully. Differential Revision: D106321150
Codex Code Review - OBSOLETESuperseded by a newer AI review. Expand to see the original review.🟡 Codex Code ReviewAuto-triggered after CI passed — reviewing commit c74dd23 ❌ Codex review failed before producing findings. ℹ️ About this responseGenerated by Codex CLI. Limitations:
Commands:
|
Claude Code Review - OBSOLETESuperseded by a newer AI review. Expand to see the original review.✅ Claude Code ReviewAuto-triggered after CI passed — reviewing commit c74dd23 SummaryWell-structured PR that adds cross-version remote compaction format compatibility testing via two new ldb commands and shell script integration. The design follows existing High-severity findings (1):
Full review (click to expand)Findings🔴 HIGHH1. Worker silently ignores
|
| Context | Applies? | Safe? | Notes |
|---|---|---|---|
| LD_LIBRARY_PATH for old binary | No | N/A | OLD_LDB="./ldb" runs without env wrapper; only CURRENT_LDB uses env |
| DB lock conflict (primary + worker) | Yes | Yes | OpenAndCompact opens as secondary instance; no lock conflict |
| Stale input.bin from previous run | No | N/A | rm -rf "$test_dir" && mkdir -p "$test_dir" cleans before each test |
| Multi-subcompaction race | No | N/A | Default max_subcompactions=1 with num_levels=2 produces exactly one Schedule/Wait cycle |
| SANITY_CHECK mode | Yes | Yes | Guarded by if [ ! "$SANITY_CHECK" ] around the ldb save and test execution |
| SHORT_TEST mode | Yes | Yes | Runs only first ref in db_forward_with_options_refs; old branch may lack command, gets skipped |
| Cross-filesystem rename in AtomicWriteStringToFile | No | N/A | tmp file and target are in the same directory (job_dir) |
| Word splitting of CURRENT_LDB | Yes | Yes | Path /tmp/rocksdb_cs_compat_${USER:-$$}/current_ldb never contains spaces |
| Both directions tested | Yes | Yes | First loop calls run_cs_compat_test twice: current-primary+old-worker AND old-primary+current-worker |
Positive Observations
- The design elegantly reuses the existing
check_format_compatible.shinfrastructure without requiring changes to the build system or test framework. - Testing both serialization directions (current-to-old and old-to-current) is thorough.
- The
grep -q remote_compaction_primarycheck gracefully skips old branches that lack the commands. - The
env LD_LIBRARY_PATH=...pattern for running the saved current binary is a clean solution for dynamic linking. - The test data generation (4 flushes x 20 keys) is sufficient to produce a real L0->L1 compaction that exercises the core CompactionServiceInput/Result serialization paths.
- The PR includes comprehensive manual testing across multiple scenarios (same-version, cross-version, SANITY_CHECK, SHORT_TEST, CI break-compat rehearsal).
ℹ️ About this response
Generated by Claude Code.
Review methodology: claude_md/code_review.md
Limitations:
- Claude may miss context from files not in the diff
- Large PRs may be truncated
- Always apply human judgment to AI suggestions
Commands:
/claude-review [context]— Request a code review/claude-query <question>— Ask about the PR or codebase
…tible.sh (facebook#14798) Summary: Add cross-version format compatibility testing for remote compaction to `check_format_compatible.sh`. Two new ldb commands (`remote_compaction_primary`, `remote_compaction_worker`) coordinate via local files: primary's `CompactRange()` writes `input.bin` through `Schedule()` and polls for `result.bin` through `Wait()`; worker polls for `input.bin`, calls `OpenAndCompact()`, writes `result.bin`. For each old ref in `db_forward_with_options_refs` (10.4.fb+), the script tests both directions — current primary + old worker and old primary + current worker — to catch serialization incompatibilities. Old refs lacking the commands are skipped gracefully. Differential Revision: D106321150
|
…tible.sh (facebook#14798) Summary: Add cross-version format compatibility testing for remote compaction to `check_format_compatible.sh`. Two new ldb commands (`remote_compaction_primary`, `remote_compaction_worker`) coordinate via local files: primary's `CompactRange()` writes `input.bin` through `Schedule()` and polls for `result.bin` through `Wait()`; worker polls for `input.bin`, calls `OpenAndCompact()`, writes `result.bin`. For each old ref in `db_forward_with_options_refs` (10.4.fb+), the script tests both directions — current primary + old worker and old primary + current worker — to catch serialization incompatibilities. Old refs lacking the commands are skipped gracefully. Differential Revision: D106321150
Codex Code Review - OBSOLETESuperseded by a newer AI review. Expand to see the original review.🟡 Codex Code ReviewAuto-triggered after CI passed — reviewing commit 7b5018d ❌ Codex review failed before producing findings. ℹ️ About this responseGenerated by Codex CLI. Limitations:
Commands:
|
✅ Claude Code ReviewAuto-triggered after CI passed — reviewing commit 7b5018d SummaryWell-structured PR that adds cross-version format compatibility testing for remote compaction serialization (CompactionServiceInput/Result). The design is sound -- using actual CompactRange + OpenAndCompact exercises the real serialization path rather than synthetic round-trips, which is the right choice for a format compatibility test. The file-based coordination protocol is simple and effective. High-severity findings (1):
Full review (click to expand)Findings🔴 HIGHH1. Orphaned worker processes on script interruption --
|
| Context | Executes? | Safe? | Notes |
|---|---|---|---|
| SANITY_CHECK mode | No | Yes | Guarded |
| SHORT_TEST mode | Yes (first ref) | Yes | Works correctly |
| Old refs without commands | No | Yes | grep -q skips gracefully |
| Default max_subcompactions=1 | Yes | Yes | Single Schedule() call |
Positive Observations
- Atomic write via
tmp+renameprevents partial reads - Bidirectional testing covers both serialization directions
- Graceful skip for old refs lacking the commands
- Consistent error handling following LDBCommand patterns
num_levels=2cleanly constrains to single compaction job
ℹ️ About this response
Generated by Claude Code.
Review methodology: claude_md/code_review.md
Limitations:
- Claude may miss context from files not in the diff
- Large PRs may be truncated
- Always apply human judgment to AI suggestions
Commands:
/claude-review [context]— Request a code review/claude-query <question>— Ask about the PR or codebase
…tible.sh (facebook#14798) Summary: Add cross-version format compatibility testing for remote compaction to `check_format_compatible.sh`. Two new ldb commands (`remote_compaction_primary`, `remote_compaction_worker`) coordinate via local files: primary's `CompactRange()` writes `input.bin` through `Schedule()` and polls for `result.bin` through `Wait()`; worker polls for `input.bin`, calls `OpenAndCompact()`, writes `result.bin`. For each old ref in `db_forward_with_options_refs` (10.4.fb+), the script tests both directions — current primary + old worker and old primary + current worker — to catch serialization incompatibilities. Old refs lacking the commands are skipped gracefully. Differential Revision: D106321150
…tible.sh (facebook#14798) Summary: Add cross-version format compatibility testing for remote compaction to `check_format_compatible.sh`. Two new ldb commands (`remote_compaction_primary`, `remote_compaction_worker`) coordinate via local files: primary's `CompactRange()` writes `input.bin` through `Schedule()` and polls for `result.bin` through `Wait()`; worker polls for `input.bin`, calls `OpenAndCompact()`, writes `result.bin`. For each old ref in `db_forward_with_options_refs` (10.4.fb+), the script tests both directions — current primary + old worker and old primary + current worker — to catch serialization incompatibilities. Old refs lacking the commands are skipped gracefully. Differential Revision: D106321150
…tible.sh (facebook#14798) Summary: Add cross-version format compatibility testing for remote compaction to `check_format_compatible.sh`. Two new ldb commands coordinate via local files: - `remote_compaction_primary`: opens an existing DB and runs `CompactRange()` through `LocalFileCompactionService`, which writes `input.bin` via `Schedule()` and polls for `result.bin` via `Wait()`. - `remote_compaction_worker`: polls for `input.bin`, calls `OpenAndCompact()`, writes `result.bin`. The test script creates a DB using `generate_random_db.sh` with the primary's ldb binary (new optional 3rd argument) so the OPTIONS file matches the primary's version. An overlap key is written to ensure `CompactRange` triggers a real compaction (not a trivial move). For each old ref in `db_forward_with_options_refs`, the script tests both directions -- current primary + old worker and old primary + current worker -- to catch wire-format incompatibilities in `CompactionServiceInput`/`CompactionServiceResult`. Old refs lacking the commands are skipped gracefully. A same-version smoke test runs before cross-version tests to catch integration failures early. The saved ldb binary falls back from `test_dir` to `$PWD/tmp/cs` if `test_dir` is on a noexec filesystem (e.g. /dev/shm in CI Docker containers). Failure to run the saved binary is a hard error with exit code and error message printed. Differential Revision: D106321150
…tible.sh (facebook#14798) Summary: Add cross-version format compatibility testing for remote compaction to `check_format_compatible.sh`. Two new ldb commands coordinate via local files: - `remote_compaction_primary`: opens an existing DB and runs `CompactRange()` through `LocalFileCompactionService`, which writes `input.bin` via `Schedule()` and polls for `result.bin` via `Wait()`. - `remote_compaction_worker`: polls for `input.bin`, calls `OpenAndCompact()`, writes `result.bin`. The test script creates a DB using `generate_random_db.sh` with the primary's ldb binary (new optional 3rd argument) so the OPTIONS file matches the primary's version. An overlap key is written to ensure `CompactRange` triggers a real compaction (not a trivial move). For each old ref in `db_forward_with_options_refs`, the script tests both directions -- current primary + old worker and old primary + current worker -- to catch wire-format incompatibilities in `CompactionServiceInput`/`CompactionServiceResult`. Old refs lacking the commands are skipped gracefully. A same-version smoke test runs before cross-version tests to catch integration failures early. The saved ldb binary falls back from `test_dir` to `$PWD/tmp/cs` if `test_dir` is on a noexec filesystem (e.g. /dev/shm in CI Docker containers). Failure to run the saved binary is a hard error with exit code and error message printed. Differential Revision: D106321150
…tible.sh (facebook#14798) Summary: Add cross-version format compatibility testing for remote compaction to `check_format_compatible.sh`. Two new ldb commands coordinate via local files: - `remote_compaction_primary`: opens an existing DB and runs `CompactRange()` through `LocalFileCompactionService`, which writes `input.bin` via `Schedule()` and polls for `result.bin` via `Wait()`. - `remote_compaction_worker`: polls for `input.bin`, calls `OpenAndCompact()`, writes `result.bin`. The test script creates a DB using `generate_random_db.sh` with the primary's ldb binary (new optional 3rd argument) so the OPTIONS file matches the primary's version. An overlap key is written to ensure `CompactRange` triggers a real compaction (not a trivial move). For each old ref in `db_forward_with_options_refs`, the script tests both directions -- current primary + old worker and old primary + current worker -- to catch wire-format incompatibilities in `CompactionServiceInput`/`CompactionServiceResult`. Old refs lacking the commands are skipped gracefully. A same-version smoke test runs before cross-version tests to catch integration failures early. The saved ldb binary falls back from `test_dir` to `$PWD/tmp/cs` if `test_dir` is on a noexec filesystem (e.g. /dev/shm in CI Docker containers). Failure to run the saved binary is a hard error with exit code and error message printed. Differential Revision: D106321150
🟡 Codex Code ReviewAuto-triggered after CI passed — reviewing commit c02deb5 ❌ Codex review failed before producing findings. ℹ️ About this responseGenerated by Codex CLI. Limitations:
Commands:
|
|
This pull request has been merged in 5177a8e. |
Summary:
Add cross-version format compatibility testing for remote compaction to
check_format_compatible.shas primary and worker RocksDB instances can run different versions.Two new ldb commands coordinate via local files:
remote_compaction_primary: opens an existing DB and runsCompactRange()throughLocalFileCompactionService, which writesinput.binviaSchedule()and polls forresult.binviaWait().remote_compaction_worker: polls forinput.bin, callsOpenAndCompact(), writesresult.bin.The test script creates a DB using
generate_random_db.shwith the primary's ldb binary (new optional 3rd argument) so the OPTIONS file matches the primary's version. An overlap key is written to ensureCompactRangetriggers a real compaction (not a trivial move). For each old ref indb_forward_with_options_refs, the script tests both directions -- current primary + old worker and old primary + current worker -- to catch wire-format incompatibilities inCompactionServiceInput/CompactionServiceResult. Old refs lacking the commands are skipped gracefully.Test plan:
1. generate_random_db.sh 3rd arg (custom ldb) -- PASSED:
2. Simulated noexec: error message with exit code -- VERIFIED:
3. SANITY_CHECK mode -- PASSED:
4. SHORT_TEST mode -- PASSED:
5. Cross-version ldb tests (sibling PR fix verification) -- ALL PASSED:
Four ldb binaries built from OSS repo:
old_ldb: 11.1.fb (no kReadTriggered) + this PR ldb commands, strict ParseArraybroken_ldb: main (has kReadTriggered) + this PR ldb commands, strict ParseArray, -1 hack removedfixed_ldb: main (has kReadTriggered) + this PR + sibling PR SizeTolerantDeserializeArrayfixed_old_ldb: 11.1.fb (no kReadTriggered) + this PR + sibling PR SizeTolerantDeserializeArrayTest 1: Bug repro -- primary without fix (main, has kReadTriggered) + worker without fix (11.1.fb, no kReadTriggered)
Test 2: Bug repro -- primary without fix (11.1.fb, no kReadTriggered) + worker without fix (main, has kReadTriggered)
Test 3: Fix -- primary with fix (main, has kReadTriggered) + worker without fix (11.1.fb, no kReadTriggered)
Test 4: Fix -- primary with fix (11.1.fb, no kReadTriggered) + worker without fix (main, has kReadTriggered)
Test 5: Sanity -- primary with fix (main) + worker with fix (main)
6. CI break-compat rehearsal -- FAIL (expected: FAIL):
Branches pushed to upstream (facebook/rocksdb):
11.1.fb-no-rtc-with-rc-ldb: 11.1.fb + remote compaction ldb commands (narrower counts)main-with-rtc-rc-ldb-break-compat: main + this PR + -1 hack removed + PR Fall back to local compaction and report kUseLocal on remote result parse failure (#14799) #14799 fallback-to-local revertedCI job: https://github.com/facebook/rocksdb/actions/runs/26896515583/job/79336958934
7. CI smoke test -- PASS (expected: PASS):
Branch
rc-ldb-smoke-test: upstream/main + this PR only (5 files).CI job: https://github.com/facebook/rocksdb/actions/runs/26891267696