Skip to content

CPU corruption injector: gdb register flip into one db_stress op (#14858)#14858

Closed
hx235 wants to merge 1 commit into
mainfrom
export-D107999835
Closed

CPU corruption injector: gdb register flip into one db_stress op (#14858)#14858
hx235 wants to merge 1 commit into
mainfrom
export-D107999835

Conversation

@hx235

@hx235 hx235 commented Jun 16, 2026

Copy link
Copy Markdown
Contributor

Summary:

This PR is the injection layer of the CPU corruption injector, runs inside gdb and randomly corrupts a register by bit flip in exactly one db_stress op (i.e, write, foreground compaction and flush) per stress test run. Detection layer is at db_stress (#14852); orchestration layer is coming up.

How one run works

  • The orchestration layer, coming up, randomly picks which stress test op instance (so corruption can land at different points in the LSM shape journey) and which target_fn of that op (so to cap instructions to step under a reasonable limit; injector.py in this PR randomly picks which instruction within the target_fn to inject (so corruption can land at different points of a target_fn).
  • Attach: gdb starts with injector.py's parameters passed via -iex and the db_stress command after --args, so db_stress runs unmodified. Example:
gdb --batch --nx \
  -iex "py import sys; sys.argv=['injector.py','--op','write','--op_index','42','--entry_fn','rocksdb::MemTable::Add','--target_fn','rocksdb::MemTable::Add','--corruptions_per_op','1','--seed','7','--dir','<rundir>']" \
  -x tools/cpu_corruption_injector/injector.py \
  --args <db_stress> --threads=1 --verify_cpu_corruption_dir=<rundir>  ...
  • Navigate: The orchestration layer will pick op_index. entry_fn is called exactly once per stress test run's op so the op_index-th op is its op_index-th call. injector_navigate.py breaks on entry_fn and set a gdb ignore-count of op_index-1 to fast-forward to op_index-th one. It also breaks at the first target_fn within that entry_fn.

  • Warm up: injector_critical_instruction.py will choose "critical instruction" (those that move key/value bytes with general-purpose or vector registers or set a branch flag) uniformly within the chosen target_fn by the orchestration layer. In order to do that, it needs to approximate how many such instructions within the target_fn. Hence we have this warm-up phase. It single-steps the instruction within the first encoutering of target_fn to count and draw the critical instruction index, then corrupt that index at a later call.

  • Corrupt: on a later call of target_fn, injector_critical_instruction.py single-step to the m-th critical instruction and bit-flip the register through injector_register_corruption.py. The way to corrupt register depends on what instruction it is. If the current call of target_fn's m-th instruction is not a critical instruction, we will try next target_fn till running out of target_fn.

  • Record: injector_telemetry.py provides telemetry to capture the corruption for later analysis.

Test plan:

  1. Isolated tests (real gdb-captured x/i fixtures): test_inject_critical_instruction

  2. E2E test on navigation, inject, telemetry will be done in the later orchestration PR. Below is inject.json from such run

{
  "injection_result": "injected",
  "db_stress_crash_signal": null,
  "op": "write",
  "op_index": 279,
  "entry_fn": "rocksdb::MemTable::Add",
  "target_fn": "rocksdb::MemTable::Add",
  "critical_instruction_index": 37,
  "corruptions": [
    {
      "instruction": "mov    %rsi,0x8c8(%rbx)",
      "register": "rsi",
      "corruption_type": "bit_flip",
      "before": "0x7fffee4c64d8",
      "after": "0x7fffee4c64c8",
      "details": {
        "source": "rocksdb::Arena::AllocateAligned @ ./fbcode/internal_repo_rocksdb/repo/memory/arena.cc:135",
        "call_chain": [
          "rocksdb::Arena::AllocateAligned @ ./fbcode/internal_repo_rocksdb/repo/memory/arena.cc:135",
          "rocksdb::ConcurrentArena::AllocateAligned(unsigned long, unsigned long, rocksdb::Logger*)::{lambda()#1}::operator()() const @ fbcode/internal_repo_rocksdb/repo/memory/concurrent_arena.h:65",
          "rocksdb::ConcurrentArena::AllocateImpl<rocksdb::ConcurrentArena::AllocateAligned(unsigned long, unsigned long, rocksdb::Logger*)::{lambda()#1}>(unsigned long, bool, rocksdb::ConcurrentArena::AllocateAligned(unsigned long, unsigned long, rocksdb::Logger*)::{lambda()#1} const&) @ fbcode/internal_repo_rocksdb/repo/memory/concurrent_arena.h:145",
          "rocksdb::ConcurrentArena::AllocateAligned @ fbcode/internal_repo_rocksdb/repo/memory/concurrent_arena.h:63",
          "rocksdb::InlineSkipList<rocksdb::MemTableRep::KeyComparator const&>::AllocateNode @ fbcode/internal_repo_rocksdb/repo/memtable/inlineskiplist.h:868",
          "rocksdb::InlineSkipList<rocksdb::MemTableRep::KeyComparator const&>::AllocateKey @ fbcode/internal_repo_rocksdb/repo/memtable/inlineskiplist.h:855",
          "rocksdb::(anonymous namespace)::SkipListRep::Allocate @ ./fbcode/internal_repo_rocksdb/repo/memtable/skiplistrep.cc:36",
          "rocksdb::MemTable::Add @ ./fbcode/internal_repo_rocksdb/repo/db/memtable.cc:1157"
        ]
      }
    }
  ],
  "ops_seen": 279,
  "critical_instructions_seen": 38
}

Reviewed By: pdillinger

Differential Revision: D107999835

@meta-codesync

meta-codesync Bot commented Jun 16, 2026

Copy link
Copy Markdown

@hx235 has exported this pull request. If you are a Meta employee, you can view the originating Diff in D107999835.

@github-actions

github-actions Bot commented Jun 16, 2026

Copy link
Copy Markdown

✅ clang-tidy: No findings on changed lines

Completed in 0.0s.

@meta-codesync meta-codesync Bot changed the title CPU corruption injector: gdb register flip into one db_stress op Jun 16, 2026
meta-codesync Bot pushed a commit that referenced this pull request Jun 16, 2026
)

Summary:

Injection layer of the CPU corruption injector (tools/cpu_corruption_injector/injector.py), runs inside gdb and corrupt a register by bit flip in exactly one db_stress op (i.e, write, foreground compaction and flush) per stress test run. Detection is at db_stress (#14852); orchestration is coming up.

How one run works 
- The orchestration layer, coming up, randomly picks which op instance (so corruption lands at different points in the LSM's life) and which target_fn per run (so it has a reasonable number of instructions to step under a reasonable time limit); injector.py picks which instruction within target_fn.
- Attach: gdb starts with injector.py's parameters passed via -iex and the db_stress command after --args, so db_stress runs unmodified. Example:
```
gdb --batch --nx \
  -iex "py import sys; sys.argv=['injector.py','--op','write','--op_index','42','--entry_fn','rocksdb::MemTable::Add','--target_fn','rocksdb::MemTable::Add','--corruptions_per_op','1','--seed','7','--dir','<rundir>']" \
  -x tools/cpu_corruption_injector/injector.py \
  --args <db_stress> --threads=1 --verify_cpu_corruption_dir=<rundir>  ...
```
- Reach the op: entry_fn is called exactly once per stress test run's op so the op_index-th op is its op_index-th call. The orchestration layer picks op_index . `injector_navigate.py` breaks on entry_fn and set a gdb ignore-count of op_index-1 to fast-forward to op_index-th one. 

- Warm up: `injector_critical_instruction.py` will choose "critical instruction" (those that move key/value bytes with general-purpose or vector registers or set a branch flag) uniformly within the chosen `target_fn` (within `entry_fn`) by the orchestration layer. In order to do that, it needs to approximate how many such instructions within `target_fn`. Hence we have this warm-up phase. It single-steps the first call of target_fn to count and pick the critical instruction index, then corrupt that index at a later call. 

- Corrupt: on a later call of target_fn, `injector_critical_instruction.py` single-step to the m-th critical instruction and bit-flip the register through `injector_register_corruption.py`. The way to corrupt register depends on what instruction it is. 

- Record: `injector_telemetry.py` provides telemetry to capture the corruption for later analysis.


**Test plan:**
1. Isolated tests (real gdb-captured x/i fixtures): test_inject_critical_instruction 6/6

2. E2E test on navigation, inject, telemetry will be done in the later orchestration PR. Below is inject.json from such run
```
{
  "injection_result": "injected",
  "db_stress_crash_signal": null,
  "op": "write",
  "op_index": 279,
  "entry_fn": "rocksdb::MemTable::Add",
  "target_fn": "rocksdb::MemTable::Add",
  "critical_instruction_index": 37,
  "corruptions": [
    {
      "instruction": "mov    %rsi,0x8c8(%rbx)",
      "register": "rsi",
      "corruption_type": "bit_flip",
      "before": "0x7fffee4c64d8",
      "after": "0x7fffee4c64c8",
      "details": {
        "source": "rocksdb::Arena::AllocateAligned @ ./fbcode/internal_repo_rocksdb/repo/memory/arena.cc:135",
        "call_chain": [
          "rocksdb::Arena::AllocateAligned @ ./fbcode/internal_repo_rocksdb/repo/memory/arena.cc:135",
          "rocksdb::ConcurrentArena::AllocateAligned(unsigned long, unsigned long, rocksdb::Logger*)::{lambda()#1}::operator()() const @ fbcode/internal_repo_rocksdb/repo/memory/concurrent_arena.h:65",
          "rocksdb::ConcurrentArena::AllocateImpl<rocksdb::ConcurrentArena::AllocateAligned(unsigned long, unsigned long, rocksdb::Logger*)::{lambda()#1}>(unsigned long, bool, rocksdb::ConcurrentArena::AllocateAligned(unsigned long, unsigned long, rocksdb::Logger*)::{lambda()#1} const&) @ fbcode/internal_repo_rocksdb/repo/memory/concurrent_arena.h:145",
          "rocksdb::ConcurrentArena::AllocateAligned @ fbcode/internal_repo_rocksdb/repo/memory/concurrent_arena.h:63",
          "rocksdb::InlineSkipList<rocksdb::MemTableRep::KeyComparator const&>::AllocateNode @ fbcode/internal_repo_rocksdb/repo/memtable/inlineskiplist.h:868",
          "rocksdb::InlineSkipList<rocksdb::MemTableRep::KeyComparator const&>::AllocateKey @ fbcode/internal_repo_rocksdb/repo/memtable/inlineskiplist.h:855",
          "rocksdb::(anonymous namespace)::SkipListRep::Allocate @ ./fbcode/internal_repo_rocksdb/repo/memtable/skiplistrep.cc:36",
          "rocksdb::MemTable::Add @ ./fbcode/internal_repo_rocksdb/repo/db/memtable.cc:1157"
        ]
      }
    }
  ],
  "ops_seen": 279,
  "critical_instructions_seen": 38
}
```

Differential Revision: D107999835
@meta-codesync meta-codesync Bot force-pushed the export-D107999835 branch from 150e711 to babebbd Compare June 16, 2026 07:03
@github-actions

github-actions Bot commented Jun 16, 2026

Copy link
Copy Markdown

Codex Code Review - OBSOLETE

Superseded by a newer AI review. Expand to see the original review.

🟡 Codex Code Review

Auto-triggered after CI reached the early-review threshold — reviewing commit babebbd


Codex review failed before producing findings.

WARNING: proceeding, even though we could not create PATH aliases: Refusing to create helper binaries under temporary dir "/tmp" (codex_home: AbsolutePathBuf("/tmp/codex-home"))
error: the argument '--base <BRANCH>' cannot be used with '[PROMPT]'

Usage: codex exec review --commit <SHA> --base <BRANCH> --title <TITLE> --model <MODEL> --config <key=value> --dangerously-bypass-approvals-and-sandbox --output-last-message <FILE> [PROMPT]

For more information, try '--help'.

ℹ️ About this response

Generated by Codex CLI.
Review methodology: claude_md/code_review.md

Limitations:

  • Codex may miss context from files not in the diff
  • Large PRs may be truncated
  • Always apply human judgment to AI suggestions

Commands:

  • /codex-review [context] — Request a code review
  • /codex-query <question> — Ask about the PR or codebase
@github-actions

Copy link
Copy Markdown

🟡 Codex Code Review

Auto-triggered after CI passed — reviewing commit babebbd


Codex review failed before producing findings.

WARNING: proceeding, even though we could not create PATH aliases: Refusing to create helper binaries under temporary dir "/tmp" (codex_home: AbsolutePathBuf("/tmp/codex-home"))
error: the argument '--base <BRANCH>' cannot be used with '[PROMPT]'

Usage: codex exec review --commit <SHA> --base <BRANCH> --title <TITLE> --model <MODEL> --config <key=value> --dangerously-bypass-approvals-and-sandbox --output-last-message <FILE> [PROMPT]

For more information, try '--help'.

ℹ️ About this response

Generated by Codex CLI.
Review methodology: claude_md/code_review.md

Limitations:

  • Codex may miss context from files not in the diff
  • Large PRs may be truncated
  • Always apply human judgment to AI suggestions

Commands:

  • /codex-review [context] — Request a code review
  • /codex-query <question> — Ask about the PR or codebase
@github-actions

github-actions Bot commented Jun 16, 2026

Copy link
Copy Markdown

Claude Code Review - OBSOLETE

Superseded by a newer AI review. Expand to see the original review.

✅ Claude Code Review

Auto-triggered after CI reached the early-review threshold — reviewing commit babebbd


Summary

This PR adds a defensive CPU corruption injector (Python gdb scripts) and a db_stress verification layer (--verify_cpu_corruption_dir) that detects silent data corruption by reading back the entire keyspace after each write/flush/compaction op. The C++ changes also refactor TestCompactFiles and TestCompactRange from void to Status. The design is well-motivated, the separation between injection (Python/gdb) and detection (C++/db_stress) is clean, and the startup validation is thorough.

No high-severity findings.

Full review (click to expand)

Findings

🔴 HIGH

No high-severity findings.

🟡 MEDIUM

M1. ClassifyReadBack does not verify wide-column entity data — db_stress_test_base.cc:~955
  • Issue: The new ClassifyReadBack uses db_->Get() and compares the returned value byte-for-byte against GenerateValue(expected.GetValueBase(), ...). When use_put_entity_one_in > 0, some keys are stored via PutEntity with wide columns. While db_->Get() returns the default column value (which matches GenerateValue), the non-default wide columns are not verified. A CPU corruption that corrupts only a non-default wide column would go undetected.
  • Root cause: The tool uses Get() for simplicity. The default column carries the canonical value, so the primary corruption vector is covered.
  • Suggested fix: Acceptable for the stated scope. Consider adding a comment noting the limitation, or using GetEntity for entity keys.
M2. Python: _in_allowed_module uses broad substring matching — injector_navigate.py:~260
  • Issue: _in_allowed_module checks if tokens like "memcpy", "zstd", "hash_bytes" appear as substrings of the lowered function name. This could match unintended functions (e.g., my_memcpy_helper in a non-rocksdb library), causing the injector to single-step through irrelevant code (wasting time, not causing incorrect behavior).
  • Root cause: Substring matching chosen for simplicity over exact symbol matching.
  • Suggested fix: Acceptable trade-off. The false-positive case (stepping extra functions) is benign.

🟢 LOW / NIT

L1. Duplicated verification logic with TODO — db_stress_test_base.cc:~1038
  • Issue: The full-keyspace read-back duplicates the canonical VerifyDb/VerifyOrSyncValue verifier (no_batched_ops_stress.cc:3293). The new ClassifyReadBack is simpler but could diverge over time.
  • Suggested fix: Track the acknowledged TODO. Simplicity justified by constrained environment (threads=1, no fault injection).
L2. WriteDataCorruption uses raw C file I/O — db_stress_test_base.cc:~988
  • Issue: Uses fopen/fputs/fclose instead of RAII wrappers.
  • Suggested fix: Minor style issue in error-reporting path that runs at most once.
L3. DataCorruption struct uses Slice with borrowed lifetime — db_stress_test_base.cc:~934
  • Issue: Safe in current code (consumed immediately), but fragile if captured across scopes.
  • Suggested fix: A comment noting the borrowed lifetime would help.
L4. Python: parse_known_args silently ignores unknown arguments — injector.py:~100
  • Issue: Typos in argument names silently discarded.
  • Suggested fix: Consider parse_args or logging unknown args.
L5. Python: op_index silently clamped to 1 — injector.py:~106
  • Issue: max(parsed.op_index, 1) silently clamps non-positive values.
  • Suggested fix: Consider raising an error for non-positive values.
L6. No unit tests for C++ verification code
  • Issue: C++ verification functions lack unit tests. E2E tests planned for orchestration PR.
  • Suggested fix: Acceptable deferral per PR description.

Cross-Component Analysis

Context Executes? Assumptions hold? Action
WritePreparedTxnDB Possible YES - Get reads latest committed Safe
ReadOnly DB NO N/A Safe
User-defined timestamps YES YES - supplies GetNowNanos() read timestamp Safe
BlobDB YES YES - Get dereferences blobs Safe
FIFO/Universal compaction YES YES Safe
Concurrent memtable writes NO - threads=1 enforced N/A Safe
test_batches_snapshots NO - startup rejects N/A Safe
test_multi_ops_txns NO - startup rejects N/A Safe

Assumption stress-test: All three core claims verified with no counterexample found. ExpectedValue::Exists() is safe under threads=1. All fault injection flags are checked. The void->Status refactoring is complete across all override sites.

Positive Observations

  • Thorough startup validation rejecting all fault injection and multi-threaded configurations
  • Clean Python/C++ separation with JSON contract
  • Correct void->Status refactoring preserving existing control flow
  • Well-documented Python modules with clear gdb lifecycle management
  • Good defensive coding patterns throughout

The full review is also saved to review-findings.md.


ℹ️ About this response

Generated by Claude Code.
Review methodology: claude_md/code_review.md

Limitations:

  • Claude may miss context from files not in the diff
  • Large PRs may be truncated
  • Always apply human judgment to AI suggestions

Commands:

  • /claude-review [context] — Request a code review
  • /claude-query <question> — Ask about the PR or codebase
@github-actions

Copy link
Copy Markdown

✅ Claude Code Review

Auto-triggered after CI passed — reviewing commit babebbd


Summary

This PR adds a well-structured CPU corruption injection framework for db_stress: a Python gdb-based injector that flips register bits in one op, and a C++ detection layer that does full-keyspace read-back verification. The design is sound, the startup validation is thorough, and the code is well-documented. The virtual function signature change from void to Status for TestCompactRange is a clean improvement.

High-severity findings (0):

No high-severity findings.

Full review (click to expand)

Findings

🔴 HIGH

None.

🟡 MEDIUM

M1. Missing --use_timed_put_one_in guard may cause false positive "lost" reports — db_stress_tool.cc
  • Issue: The startup validation does not reject --use_timed_put_one_in > 0 combined with --verify_cpu_corruption_dir. Timed puts can expire via TTL-triggered compaction, causing keys to disappear from the DB even though the expected state model still thinks they exist. The full-keyspace read-back in MaybeVerifyCpuCorruption would classify such an expired key as a "lost" data corruption — a false positive.
  • Root cause: The startup guard checks fault injection flags and incompatible test modes but does not account for TTL-based key expiration.
  • Suggested fix: Add FLAGS_use_timed_put_one_in != 0 (and possibly FLAGS_compaction_ttl > 0) to the startup rejection block, or document that timed puts are safe because the single-op injection is too fast for TTL expiration.
M2. No validation against --compaction_ttldb_stress_tool.cc
  • Issue: Similar to M1, --compaction_ttl > 0 can cause keys to be dropped during compaction. If a compaction op is the injected op and the full-keyspace read-back runs after it, keys dropped by TTL compaction would look like "lost" corruption.
  • Suggested fix: Same as M1 — either reject this combination at startup or document why it's safe.
M3. ClassifyReadBack does not handle Status::IOError gracefully — db_stress_test_base.cc:~945-950
  • Issue: If Get() returns an unexpected status (e.g., IOError due to a CPU-corruption-induced bad file offset), the code calls port::ImmediateExit(1) with only a stderr message. A CPU-corrupted register could produce an IOError from the read-back itself. This exit path does not write a data_corruption.*.json file, so the orchestration layer would have to infer what happened from the exit code alone.
  • Suggested fix: Consider classifying unexpected Get() errors as "detected-corruption" and writing the JSON file before exiting.
M4. WriteDataCorruption uses fopen/fputs/fclose without RAII — db_stress_test_base.cc:~985-995
  • Issue: If there's an interruption between fopen and fclose, the file handle leaks. Minor risk since this is a diagnostic tool.
  • Suggested fix: Consider std::ofstream or a cleanup guard for RAII consistency.

🟢 LOW / NIT

L1. DataCorruption Slice members are non-owning — db_stress_test_base.cc:~932-936
  • Issue: The Slice members point into db_value and expected_scratch which are alive in the calling scope. Safe but fragile if the struct is ever stored beyond the current scope.
  • Suggested fix: Consider a lifetime comment or owning std::string members.
L2. Missing test for testb/testw/testl/testq and cmpxchg rejection — test_injector_critical_instruction.py
  • Issue: The _CMP_TEST_RE regex handles test[bwlq]?\b and uses \b to reject cmpxchg, but neither is tested.
  • Suggested fix: Add test cases for testb (positive) and cmpxchg (negative rejection).
L3. _MAX_STEPS_PER_CALL = 20000 is not configurable — injector_navigate.py
  • Issue: Hardcoded cap. Well-commented but not exposed as a CLI parameter.
  • Suggested fix: Consider making it configurable if needed for large target functions.

Cross-Component Analysis

Context Executes? Assumptions hold? Action?
WritePreparedTxnDB YES YES (threads=1) Safe
User-defined timestamps YES YES (read ts set) Safe
Merge ops YES YES (Get returns merged) Safe
PutEntity YES YES (Get returns default col) Safe
BlobDB YES YES (transparent deref) Safe
FIFO/Universal compaction YES YES Safe
Concurrent writers NO (threads=1) N/A N/A
Timed put / TTL POSSIBLE May NOT hold See M1/M2

Positive Observations

  1. Thorough startup validation correctly rejects incompatible configurations
  2. Well-documented Python modules with clear architecture comments
  3. Clean virtual function signature change for TestCompactRange
  4. Correct Slice lifetime management in DataCorruption usage
  5. gdb signal handling (stop nopass print) prevents nested gdb conflicts
  6. Well-scoped instruction decoder focuses on data-carrying operations only
  7. Defensive Python error handling prevents gdb from leaving db_stress wedged

ℹ️ About this response

Generated by Claude Code.
Review methodology: claude_md/code_review.md

Limitations:

  • Claude may miss context from files not in the diff
  • Large PRs may be truncated
  • Always apply human judgment to AI suggestions

Commands:

  • /claude-review [context] — Request a code review
  • /claude-query <question> — Ask about the PR or codebase
@hx235 hx235 requested a review from pdillinger June 16, 2026 20:54
hx235 pushed a commit to hx235/rocksdb that referenced this pull request Jun 18, 2026
…ook#14852)

Summary:

Detection layer of the CPU corruption injector (facebook#14858). With `--verify_cpu_corruption_dir=<dir>`, db_stress reads back the full keyspace after every write/manual flush/manual compaction op and compares it to the expected-values model, classifying any mismatch by `kind`: `lost` / `resurrected` / `wrong-value` (silent data corruption) or `detected-corruption` (a status/checksum-caught error). Each finding is written to `<dir>/data_corruption.<tid>.json` ({kind, cf, key, value_from_db, value_from_expected, op_status}) and routed through db_stress's standard `VerificationAbort` for a clean exit-1. A startup guard requires `--threads=1` and all fault injection off so the read-back is single-writer and the only corruption present is the injected one

Bonus: a minor refactoring into the surrounding error handling code in these ops 

**Test plan:**

1.Startup guard rejects misconfiguration:
```
--threads=2           -> exit 1: "--verify_cpu_corruption_dir requires --threads=1"
--read_fault_one_in=5 -> exit 1: "requires all fault injection off"
```
2.No false positive (clean CORE preset run, no injection):
```
$ db_stress --verify_cpu_corruption_dir=<dir> --threads=1 (full protections, all *_fault_one_in=0) ...
exit 0; no data_corruption.<tid>.json produced; "Verification successful"
```
3.Write-path cpu corruption injection (coming up, e.g, gdb flips a register inside MemTable::Add), then the immediate post-op read-back catches it. Real `<dir>/data_corruption.<tid>.json`:

silent data corruption -- write returned OK but the key is gone on read-back:
```
{"kind":"lost","cf":0,"key":9814,"value_from_db":"","value_from_expected":"010000000504070609080B0A0D0C0F0E","op_status":"Get: NotFound"}
```
detected corruption -- read-back Get returns Corruption via the memtable per-key checksum:
```
{"kind":"detected-corruption","cf":0,"key":139,"value_from_db":"","value_from_expected":"","op_status":"Get: Corruption: Corrupted memtable entry, per key-value checksum verification failed."
```

4.See PR [todo]'s spread in the outcome for verification of detection

Differential Revision: D107999834
meta-codesync Bot pushed a commit that referenced this pull request Jun 18, 2026
Summary:

Detection layer of the CPU corruption injector (#14858). With `--verify_cpu_corruption_dir=<dir>`, db_stress reads back the full keyspace after every write/manual flush/manual compaction op and compares it to the expected-values model, classifying any mismatch by `kind`: `lost` / `resurrected` / `wrong-value` (silent data corruption) or `detected-corruption` (a status/checksum-caught error). Each finding is written to `<dir>/data_corruption.<tid>.json` ({kind, cf, key, value_from_db, value_from_expected, op_status}) and routed through db_stress's standard `VerificationAbort` for a clean exit-1. A startup guard requires `--threads=1` and all fault injection off so the read-back is single-writer and the only corruption present is the injected one

Bonus: a minor refactoring into the surrounding error handling code in these ops 

**Test plan:**

1.Startup guard rejects misconfiguration:
```
--threads=2           -> exit 1: "--verify_cpu_corruption_dir requires --threads=1"
--read_fault_one_in=5 -> exit 1: "requires all fault injection off"
```
2.No false positive (clean CORE preset run, no injection):
```
$ db_stress --verify_cpu_corruption_dir=<dir> --threads=1 (full protections, all *_fault_one_in=0) ...
exit 0; no data_corruption.<tid>.json produced; "Verification successful"
```
3.Write-path cpu corruption injection (coming up, e.g, gdb flips a register inside MemTable::Add), then the immediate post-op read-back catches it. Real `<dir>/data_corruption.<tid>.json`:

silent data corruption -- write returned OK but the key is gone on read-back:
```
{"kind":"lost","cf":0,"key":9814,"value_from_db":"","value_from_expected":"010000000504070609080B0A0D0C0F0E","op_status":"Get: NotFound"}
```
detected corruption -- read-back Get returns Corruption via the memtable per-key checksum:
```
{"kind":"detected-corruption","cf":0,"key":139,"value_from_db":"","value_from_expected":"","op_status":"Get: Corruption: Corrupted memtable entry, per key-value checksum verification failed."
```

4.See PR [todo]'s spread in the outcome for verification of detection

Differential Revision: D107999834
meta-codesync Bot pushed a commit that referenced this pull request Jun 18, 2026
)

Summary:

This PR is the injection layer of the CPU corruption injector, runs inside gdb and randomly corrupts a register by bit flip in exactly one db_stress op (i.e, write, foreground compaction and flush) per stress test run. Detection layer is at db_stress (#14852); orchestration layer is coming up.

__How one run works__ 
- The orchestration layer, coming up, randomly picks which stress test `op` instance (so corruption can land at different points in the LSM shape journey) and which `target_fn` of that `op` (so to cap instructions to step under a reasonable limit; `injector.py` in this PR randomly picks which instruction within the `target_fn` to inject (so corruption can land at different points of a `target_fn`).
- Attach: gdb starts with injector.py's parameters passed via -iex and the db_stress command after --args, so db_stress runs unmodified. Example:
```
gdb --batch --nx \
  -iex "py import sys; sys.argv=['injector.py','--op','write','--op_index','42','--entry_fn','rocksdb::MemTable::Add','--target_fn','rocksdb::MemTable::Add','--corruptions_per_op','1','--seed','7','--dir','<rundir>']" \
  -x tools/cpu_corruption_injector/injector.py \
  --args <db_stress> --threads=1 --verify_cpu_corruption_dir=<rundir>  ...
```
- Navigate: The orchestration layer will pick op_index. `entry_fn` is called exactly once per stress test run's op so the op_index-th op is its op_index-th call. `injector_navigate.py` breaks on `entry_fn` and set a gdb ignore-count of op_index-1 to fast-forward to op_index-th one. It also breaks at the first `target_fn` within that `entry_fn`. 

- Warm up: `injector_critical_instruction.py` will choose "critical instruction" (those that move key/value bytes with general-purpose or vector registers or set a branch flag) uniformly within the chosen `target_fn` by the orchestration layer. In order to do that, it needs to approximate how many such instructions within the `target_fn`. Hence we have this warm-up phase. It single-steps the instruction within the first encoutering of `target_fn` to count and draw the critical instruction index, then corrupt that index at a later call. 

- Corrupt: on a later call of `target_fn`, `injector_critical_instruction.py` single-step to the m-th critical instruction and bit-flip the register through `injector_register_corruption.py`. The way to corrupt register depends on what instruction it is. If the current call of `target_fn`'s m-th instruction is not a critical instruction, we will try next `target_fn` till running out of `target_fn`.

- Record: `injector_telemetry.py` provides telemetry to capture the corruption for later analysis.


**Test plan:**
1. Isolated tests (real gdb-captured x/i fixtures): test_inject_critical_instruction

2. E2E test on navigation, inject, telemetry will be done in the later orchestration PR. Below is inject.json from such run
```
{
  "injection_result": "injected",
  "db_stress_crash_signal": null,
  "op": "write",
  "op_index": 279,
  "entry_fn": "rocksdb::MemTable::Add",
  "target_fn": "rocksdb::MemTable::Add",
  "critical_instruction_index": 37,
  "corruptions": [
    {
      "instruction": "mov    %rsi,0x8c8(%rbx)",
      "register": "rsi",
      "corruption_type": "bit_flip",
      "before": "0x7fffee4c64d8",
      "after": "0x7fffee4c64c8",
      "details": {
        "source": "rocksdb::Arena::AllocateAligned @ ./fbcode/internal_repo_rocksdb/repo/memory/arena.cc:135",
        "call_chain": [
          "rocksdb::Arena::AllocateAligned @ ./fbcode/internal_repo_rocksdb/repo/memory/arena.cc:135",
          "rocksdb::ConcurrentArena::AllocateAligned(unsigned long, unsigned long, rocksdb::Logger*)::{lambda()#1}::operator()() const @ fbcode/internal_repo_rocksdb/repo/memory/concurrent_arena.h:65",
          "rocksdb::ConcurrentArena::AllocateImpl<rocksdb::ConcurrentArena::AllocateAligned(unsigned long, unsigned long, rocksdb::Logger*)::{lambda()#1}>(unsigned long, bool, rocksdb::ConcurrentArena::AllocateAligned(unsigned long, unsigned long, rocksdb::Logger*)::{lambda()#1} const&) @ fbcode/internal_repo_rocksdb/repo/memory/concurrent_arena.h:145",
          "rocksdb::ConcurrentArena::AllocateAligned @ fbcode/internal_repo_rocksdb/repo/memory/concurrent_arena.h:63",
          "rocksdb::InlineSkipList<rocksdb::MemTableRep::KeyComparator const&>::AllocateNode @ fbcode/internal_repo_rocksdb/repo/memtable/inlineskiplist.h:868",
          "rocksdb::InlineSkipList<rocksdb::MemTableRep::KeyComparator const&>::AllocateKey @ fbcode/internal_repo_rocksdb/repo/memtable/inlineskiplist.h:855",
          "rocksdb::(anonymous namespace)::SkipListRep::Allocate @ ./fbcode/internal_repo_rocksdb/repo/memtable/skiplistrep.cc:36",
          "rocksdb::MemTable::Add @ ./fbcode/internal_repo_rocksdb/repo/db/memtable.cc:1157"
        ]
      }
    }
  ],
  "ops_seen": 279,
  "critical_instructions_seen": 38
}
```

Differential Revision: D107999835
@meta-codesync meta-codesync Bot force-pushed the export-D107999835 branch from babebbd to 29d0654 Compare June 18, 2026 22:16
meta-codesync Bot pushed a commit that referenced this pull request Jun 18, 2026
Summary:

Detection layer of the CPU corruption injector (#14858). With `--verify_cpu_corruption_dir=<dir>`, db_stress reads back the full keyspace after every write/manual flush/manual compaction op and compares it to the expected-values model, classifying any mismatch by `kind`: `lost` / `resurrected` / `wrong-value` (silent data corruption) or `detected-corruption` (a status/checksum-caught error). Each finding is written to `<dir>/data_corruption.<tid>.json` ({kind, cf, key, value_from_db, value_from_expected, op_status}) and routed through db_stress's standard `VerificationAbort` for a clean exit-1. A startup guard requires `--threads=1` and all fault injection off so the read-back is single-writer and the only corruption present is the injected one

Bonus: a minor refactoring into the surrounding error handling code in these ops 

**Test plan:**

1.Startup guard rejects misconfiguration:
```
--threads=2           -> exit 1: "--verify_cpu_corruption_dir requires --threads=1"
--read_fault_one_in=5 -> exit 1: "requires all fault injection off"
```
2.No false positive (clean CORE preset run, no injection):
```
$ db_stress --verify_cpu_corruption_dir=<dir> --threads=1 (full protections, all *_fault_one_in=0) ...
exit 0; no data_corruption.<tid>.json produced; "Verification successful"
```
3.Write-path cpu corruption injection (coming up, e.g, gdb flips a register inside MemTable::Add), then the immediate post-op read-back catches it. Real `<dir>/data_corruption.<tid>.json`:

silent data corruption -- write returned OK but the key is gone on read-back:
```
{"kind":"lost","cf":0,"key":9814,"value_from_db":"","value_from_expected":"010000000504070609080B0A0D0C0F0E","op_status":"Get: NotFound"}
```
detected corruption -- read-back Get returns Corruption via the memtable per-key checksum:
```
{"kind":"detected-corruption","cf":0,"key":139,"value_from_db":"","value_from_expected":"","op_status":"Get: Corruption: Corrupted memtable entry, per key-value checksum verification failed."
```

4.See PR [todo]'s spread in the outcome for verification of detection

Differential Revision: D107999834
meta-codesync Bot pushed a commit that referenced this pull request Jun 18, 2026
)

Summary:

This PR is the injection layer of the CPU corruption injector, runs inside gdb and randomly corrupts a register by bit flip in exactly one db_stress op (i.e, write, foreground compaction and flush) per stress test run. Detection layer is at db_stress (#14852); orchestration layer is coming up.

__How one run works__ 
- The orchestration layer, coming up, randomly picks which stress test `op` instance (so corruption can land at different points in the LSM shape journey) and which `target_fn` of that `op` (so to cap instructions to step under a reasonable limit; `injector.py` in this PR randomly picks which instruction within the `target_fn` to inject (so corruption can land at different points of a `target_fn`).
- Attach: gdb starts with injector.py's parameters passed via -iex and the db_stress command after --args, so db_stress runs unmodified. Example:
```
gdb --batch --nx \
  -iex "py import sys; sys.argv=['injector.py','--op','write','--op_index','42','--entry_fn','rocksdb::MemTable::Add','--target_fn','rocksdb::MemTable::Add','--corruptions_per_op','1','--seed','7','--dir','<rundir>']" \
  -x tools/cpu_corruption_injector/injector.py \
  --args <db_stress> --threads=1 --verify_cpu_corruption_dir=<rundir>  ...
```
- Navigate: The orchestration layer will pick op_index. `entry_fn` is called exactly once per stress test run's op so the op_index-th op is its op_index-th call. `injector_navigate.py` breaks on `entry_fn` and set a gdb ignore-count of op_index-1 to fast-forward to op_index-th one. It also breaks at the first `target_fn` within that `entry_fn`. 

- Warm up: `injector_critical_instruction.py` will choose "critical instruction" (those that move key/value bytes with general-purpose or vector registers or set a branch flag) uniformly within the chosen `target_fn` by the orchestration layer. In order to do that, it needs to approximate how many such instructions within the `target_fn`. Hence we have this warm-up phase. It single-steps the instruction within the first encoutering of `target_fn` to count and draw the critical instruction index, then corrupt that index at a later call. 

- Corrupt: on a later call of `target_fn`, `injector_critical_instruction.py` single-step to the m-th critical instruction and bit-flip the register through `injector_register_corruption.py`. The way to corrupt register depends on what instruction it is. If the current call of `target_fn`'s m-th instruction is not a critical instruction, we will try next `target_fn` till running out of `target_fn`.

- Record: `injector_telemetry.py` provides telemetry to capture the corruption for later analysis.


**Test plan:**
1. Isolated tests (real gdb-captured x/i fixtures): test_inject_critical_instruction

2. E2E test on navigation, inject, telemetry will be done in the later orchestration PR. Below is inject.json from such run
```
{
  "injection_result": "injected",
  "db_stress_crash_signal": null,
  "op": "write",
  "op_index": 279,
  "entry_fn": "rocksdb::MemTable::Add",
  "target_fn": "rocksdb::MemTable::Add",
  "critical_instruction_index": 37,
  "corruptions": [
    {
      "instruction": "mov    %rsi,0x8c8(%rbx)",
      "register": "rsi",
      "corruption_type": "bit_flip",
      "before": "0x7fffee4c64d8",
      "after": "0x7fffee4c64c8",
      "details": {
        "source": "rocksdb::Arena::AllocateAligned @ ./fbcode/internal_repo_rocksdb/repo/memory/arena.cc:135",
        "call_chain": [
          "rocksdb::Arena::AllocateAligned @ ./fbcode/internal_repo_rocksdb/repo/memory/arena.cc:135",
          "rocksdb::ConcurrentArena::AllocateAligned(unsigned long, unsigned long, rocksdb::Logger*)::{lambda()#1}::operator()() const @ fbcode/internal_repo_rocksdb/repo/memory/concurrent_arena.h:65",
          "rocksdb::ConcurrentArena::AllocateImpl<rocksdb::ConcurrentArena::AllocateAligned(unsigned long, unsigned long, rocksdb::Logger*)::{lambda()#1}>(unsigned long, bool, rocksdb::ConcurrentArena::AllocateAligned(unsigned long, unsigned long, rocksdb::Logger*)::{lambda()#1} const&) @ fbcode/internal_repo_rocksdb/repo/memory/concurrent_arena.h:145",
          "rocksdb::ConcurrentArena::AllocateAligned @ fbcode/internal_repo_rocksdb/repo/memory/concurrent_arena.h:63",
          "rocksdb::InlineSkipList<rocksdb::MemTableRep::KeyComparator const&>::AllocateNode @ fbcode/internal_repo_rocksdb/repo/memtable/inlineskiplist.h:868",
          "rocksdb::InlineSkipList<rocksdb::MemTableRep::KeyComparator const&>::AllocateKey @ fbcode/internal_repo_rocksdb/repo/memtable/inlineskiplist.h:855",
          "rocksdb::(anonymous namespace)::SkipListRep::Allocate @ ./fbcode/internal_repo_rocksdb/repo/memtable/skiplistrep.cc:36",
          "rocksdb::MemTable::Add @ ./fbcode/internal_repo_rocksdb/repo/db/memtable.cc:1157"
        ]
      }
    }
  ],
  "ops_seen": 279,
  "critical_instructions_seen": 38
}
```

Differential Revision: D107999835
@meta-codesync meta-codesync Bot force-pushed the export-D107999835 branch from 29d0654 to 1c9e0ab Compare June 18, 2026 22:19
hx235 pushed a commit to hx235/rocksdb that referenced this pull request Jun 18, 2026
…ook#14852)

Summary:

Detection layer of the CPU corruption injector (facebook#14858). With `--verify_cpu_corruption_dir=<dir>`, db_stress reads back the full keyspace after every write/manual flush/manual compaction op and compares it to the expected-values model, classifying any mismatch by `kind`: `lost` / `resurrected` / `wrong-value` (silent data corruption) or `detected-corruption` (a status/checksum-caught error). Each finding is written to `<dir>/data_corruption.<tid>.json` ({kind, cf, key, value_from_db, value_from_expected, op_status}) and routed through db_stress's standard `VerificationAbort` for a clean exit-1. A startup guard requires `--threads=1` and all fault injection off so the read-back is single-writer and the only corruption present is the injected one

Bonus: a minor refactoring into the surrounding error handling code in these ops 

**Test plan:**

1.Startup guard rejects misconfiguration:
```
--threads=2           -> exit 1: "--verify_cpu_corruption_dir requires --threads=1"
--read_fault_one_in=5 -> exit 1: "requires all fault injection off"
```
2.No false positive (clean CORE preset run, no injection):
```
$ db_stress --verify_cpu_corruption_dir=<dir> --threads=1 (full protections, all *_fault_one_in=0) ...
exit 0; no data_corruption.<tid>.json produced; "Verification successful"
```
3.Write-path cpu corruption injection (coming up, e.g, gdb flips a register inside MemTable::Add), then the immediate post-op read-back catches it. Real `<dir>/data_corruption.<tid>.json`:

silent data corruption -- write returned OK but the key is gone on read-back:
```
{"kind":"lost","cf":0,"key":9814,"value_from_db":"","value_from_expected":"010000000504070609080B0A0D0C0F0E","op_status":"Get: NotFound"}
```
detected corruption -- read-back Get returns Corruption via the memtable per-key checksum:
```
{"kind":"detected-corruption","cf":0,"key":139,"value_from_db":"","value_from_expected":"","op_status":"Get: Corruption: Corrupted memtable entry, per key-value checksum verification failed."
```

4.See PR facebook#14866 test plan's spread in the outcome for verification of detection

Differential Revision: D107999834
hx235 pushed a commit to hx235/rocksdb that referenced this pull request Jul 1, 2026
…ook#14852)

Summary:

Detection layer of the CPU corruption injector (facebook#14858). With `--verify_cpu_corruption_dir=<dir>`, db_stress reads back the full keyspace after every write/manual flush/manual compaction op and compares it to the expected-values model, classifying any mismatch by `kind`: `lost` / `resurrected` / `wrong-value` (silent data corruption) or `detected-corruption` (a status/checksum-caught error). Each finding is written to `<dir>/data_corruption.<tid>.json` ({kind, cf, key, value_from_db, value_from_expected, op_status}) and routed through db_stress's standard `VerificationAbort` for a clean exit-1. A startup guard requires `--threads=1` and all fault injection off so the read-back is single-writer and the only corruption present is the injected one

Bonus: a minor refactoring into the surrounding error handling code in these ops 

**Test plan:**

1.Startup guard rejects misconfiguration:
```
--threads=2           -> exit 1: "--verify_cpu_corruption_dir requires --threads=1"
--read_fault_one_in=5 -> exit 1: "requires all fault injection off"
```
2.No false positive (clean CORE preset run, no injection):
```
$ db_stress --verify_cpu_corruption_dir=<dir> --threads=1 (full protections, all *_fault_one_in=0) ...
exit 0; no data_corruption.<tid>.json produced; "Verification successful"
```
3.Write-path cpu corruption injection (coming up, e.g, gdb flips a register inside MemTable::Add), then the immediate post-op read-back catches it. Real `<dir>/data_corruption.<tid>.json`:

silent data corruption -- write returned OK but the key is gone on read-back:
```
{"kind":"lost","cf":0,"key":9814,"value_from_db":"","value_from_expected":"010000000504070609080B0A0D0C0F0E","op_status":"Get: NotFound"}
```
detected corruption -- read-back Get returns Corruption via the memtable per-key checksum:
```
{"kind":"detected-corruption","cf":0,"key":139,"value_from_db":"","value_from_expected":"","op_status":"Get: Corruption: Corrupted memtable entry, per key-value checksum verification failed."
```

4.See PR facebook#14866 test plan's spread in the outcome for verification of detection

Reviewed By: pdillinger

Differential Revision: D107999834
meta-codesync Bot pushed a commit that referenced this pull request Jul 1, 2026
Summary:
Pull Request resolved: #14852

Detection layer of the CPU corruption injector (#14858). With `--verify_cpu_corruption_dir=<dir>`, db_stress reads back the full keyspace after every write/manual flush/manual compaction op and compares it to the expected-values model, classifying any mismatch by `kind`: `lost` / `resurrected` / `wrong-value` (silent data corruption) or `detected-corruption` (a status/checksum-caught error). Each finding is written to `<dir>/data_corruption.<tid>.json` ({kind, cf, key, value_from_db, value_from_expected, op_status}) and routed through db_stress's standard `VerificationAbort` for a clean exit-1. A startup guard requires `--threads=1` and all fault injection off so the read-back is single-writer and the only corruption present is the injected one

Bonus: a minor refactoring into the surrounding error handling code in these ops

**Test plan:**

1.Startup guard rejects misconfiguration:
```
--threads=2           -> exit 1: "--verify_cpu_corruption_dir requires --threads=1"
--read_fault_one_in=5 -> exit 1: "requires all fault injection off"
```
2.No false positive (clean CORE preset run, no injection):
```
$ db_stress --verify_cpu_corruption_dir=<dir> --threads=1 (full protections, all *_fault_one_in=0) ...
exit 0; no data_corruption.<tid>.json produced; "Verification successful"
```
3.Write-path cpu corruption injection (coming up, e.g, gdb flips a register inside MemTable::Add), then the immediate post-op read-back catches it. Real `<dir>/data_corruption.<tid>.json`:

silent data corruption -- write returned OK but the key is gone on read-back:
```
{"kind":"lost","cf":0,"key":9814,"value_from_db":"","value_from_expected":"010000000504070609080B0A0D0C0F0E","op_status":"Get: NotFound"}
```
detected corruption -- read-back Get returns Corruption via the memtable per-key checksum:
```
{"kind":"detected-corruption","cf":0,"key":139,"value_from_db":"","value_from_expected":"","op_status":"Get: Corruption: Corrupted memtable entry, per key-value checksum verification failed."
```

4.See PR #14866 test plan's spread in the outcome for verification of detection

Reviewed By: pdillinger

Differential Revision: D107999834

fbshipit-source-id: 18decbf51dc56eec62b735136e2dfb4e1175b773
meta-codesync Bot pushed a commit that referenced this pull request Jul 1, 2026
)

Summary:

This PR is the injection layer of the CPU corruption injector, runs inside gdb and randomly corrupts a register by bit flip in exactly one db_stress op (i.e, write, foreground compaction and flush) per stress test run. Detection layer is at db_stress (#14852); orchestration layer is coming up.

__How one run works__ 
- The orchestration layer, coming up, randomly picks which stress test `op` instance (so corruption can land at different points in the LSM shape journey) and which `target_fn` of that `op` (so to cap instructions to step under a reasonable limit; `injector.py` in this PR randomly picks which instruction within the `target_fn` to inject (so corruption can land at different points of a `target_fn`).
- Attach: gdb starts with injector.py's parameters passed via -iex and the db_stress command after --args, so db_stress runs unmodified. Example:
```
gdb --batch --nx \
  -iex "py import sys; sys.argv=['injector.py','--op','write','--op_index','42','--entry_fn','rocksdb::MemTable::Add','--target_fn','rocksdb::MemTable::Add','--corruptions_per_op','1','--seed','7','--dir','<rundir>']" \
  -x tools/cpu_corruption_injector/injector.py \
  --args <db_stress> --threads=1 --verify_cpu_corruption_dir=<rundir>  ...
```
- Navigate: The orchestration layer will pick op_index. `entry_fn` is called exactly once per stress test run's op so the op_index-th op is its op_index-th call. `injector_navigate.py` breaks on `entry_fn` and set a gdb ignore-count of op_index-1 to fast-forward to op_index-th one. It also breaks at the first `target_fn` within that `entry_fn`. 

- Warm up: `injector_critical_instruction.py` will choose "critical instruction" (those that move key/value bytes with general-purpose or vector registers or set a branch flag) uniformly within the chosen `target_fn` by the orchestration layer. In order to do that, it needs to approximate how many such instructions within the `target_fn`. Hence we have this warm-up phase. It single-steps the instruction within the first encoutering of `target_fn` to count and draw the critical instruction index, then corrupt that index at a later call. 

- Corrupt: on a later call of `target_fn`, `injector_critical_instruction.py` single-step to the m-th critical instruction and bit-flip the register through `injector_register_corruption.py`. The way to corrupt register depends on what instruction it is. If the current call of `target_fn`'s m-th instruction is not a critical instruction, we will try next `target_fn` till running out of `target_fn`.

- Record: `injector_telemetry.py` provides telemetry to capture the corruption for later analysis.


**Test plan:**
1. Isolated tests (real gdb-captured x/i fixtures): test_inject_critical_instruction

2. E2E test on navigation, inject, telemetry will be done in the later orchestration PR. Below is inject.json from such run
```
{
  "injection_result": "injected",
  "db_stress_crash_signal": null,
  "op": "write",
  "op_index": 279,
  "entry_fn": "rocksdb::MemTable::Add",
  "target_fn": "rocksdb::MemTable::Add",
  "critical_instruction_index": 37,
  "corruptions": [
    {
      "instruction": "mov    %rsi,0x8c8(%rbx)",
      "register": "rsi",
      "corruption_type": "bit_flip",
      "before": "0x7fffee4c64d8",
      "after": "0x7fffee4c64c8",
      "details": {
        "source": "rocksdb::Arena::AllocateAligned @ ./fbcode/internal_repo_rocksdb/repo/memory/arena.cc:135",
        "call_chain": [
          "rocksdb::Arena::AllocateAligned @ ./fbcode/internal_repo_rocksdb/repo/memory/arena.cc:135",
          "rocksdb::ConcurrentArena::AllocateAligned(unsigned long, unsigned long, rocksdb::Logger*)::{lambda()#1}::operator()() const @ fbcode/internal_repo_rocksdb/repo/memory/concurrent_arena.h:65",
          "rocksdb::ConcurrentArena::AllocateImpl<rocksdb::ConcurrentArena::AllocateAligned(unsigned long, unsigned long, rocksdb::Logger*)::{lambda()#1}>(unsigned long, bool, rocksdb::ConcurrentArena::AllocateAligned(unsigned long, unsigned long, rocksdb::Logger*)::{lambda()#1} const&) @ fbcode/internal_repo_rocksdb/repo/memory/concurrent_arena.h:145",
          "rocksdb::ConcurrentArena::AllocateAligned @ fbcode/internal_repo_rocksdb/repo/memory/concurrent_arena.h:63",
          "rocksdb::InlineSkipList<rocksdb::MemTableRep::KeyComparator const&>::AllocateNode @ fbcode/internal_repo_rocksdb/repo/memtable/inlineskiplist.h:868",
          "rocksdb::InlineSkipList<rocksdb::MemTableRep::KeyComparator const&>::AllocateKey @ fbcode/internal_repo_rocksdb/repo/memtable/inlineskiplist.h:855",
          "rocksdb::(anonymous namespace)::SkipListRep::Allocate @ ./fbcode/internal_repo_rocksdb/repo/memtable/skiplistrep.cc:36",
          "rocksdb::MemTable::Add @ ./fbcode/internal_repo_rocksdb/repo/db/memtable.cc:1157"
        ]
      }
    }
  ],
  "ops_seen": 279,
  "critical_instructions_seen": 38
}
```

Reviewed By: pdillinger

Differential Revision: D107999835
@meta-codesync meta-codesync Bot force-pushed the export-D107999835 branch from 1c9e0ab to 82b207a Compare July 1, 2026 02:08
)

Summary:

This PR is the injection layer of the CPU corruption injector, runs inside gdb and randomly corrupts a register by bit flip in exactly one db_stress op (i.e, write, foreground compaction and flush) per stress test run. Detection layer is at db_stress (#14852); orchestration layer is coming up.

__How one run works__ 
- The orchestration layer, coming up, randomly picks which stress test `op` instance (so corruption can land at different points in the LSM shape journey) and which `target_fn` of that `op` (so to cap instructions to step under a reasonable limit; `injector.py` in this PR randomly picks which instruction within the `target_fn` to inject (so corruption can land at different points of a `target_fn`).
- Attach: gdb starts with injector.py's parameters passed via -iex and the db_stress command after --args, so db_stress runs unmodified. Example:
```
gdb --batch --nx \
  -iex "py import sys; sys.argv=['injector.py','--op','write','--op_index','42','--entry_fn','rocksdb::MemTable::Add','--target_fn','rocksdb::MemTable::Add','--corruptions_per_op','1','--seed','7','--dir','<rundir>']" \
  -x tools/cpu_corruption_injector/injector.py \
  --args <db_stress> --threads=1 --verify_cpu_corruption_dir=<rundir>  ...
```
- Navigate: The orchestration layer will pick op_index. `entry_fn` is called exactly once per stress test run's op so the op_index-th op is its op_index-th call. `injector_navigate.py` breaks on `entry_fn` and set a gdb ignore-count of op_index-1 to fast-forward to op_index-th one. It also breaks at the first `target_fn` within that `entry_fn`. 

- Warm up: `injector_critical_instruction.py` will choose "critical instruction" (those that move key/value bytes with general-purpose or vector registers or set a branch flag) uniformly within the chosen `target_fn` by the orchestration layer. In order to do that, it needs to approximate how many such instructions within the `target_fn`. Hence we have this warm-up phase. It single-steps the instruction within the first encoutering of `target_fn` to count and draw the critical instruction index, then corrupt that index at a later call. 

- Corrupt: on a later call of `target_fn`, `injector_critical_instruction.py` single-step to the m-th critical instruction and bit-flip the register through `injector_register_corruption.py`. The way to corrupt register depends on what instruction it is. If the current call of `target_fn`'s m-th instruction is not a critical instruction, we will try next `target_fn` till running out of `target_fn`.

- Record: `injector_telemetry.py` provides telemetry to capture the corruption for later analysis.


**Test plan:**
1. Isolated tests (real gdb-captured x/i fixtures): test_inject_critical_instruction

2. E2E test on navigation, inject, telemetry will be done in the later orchestration PR. Below is inject.json from such run
```
{
  "injection_result": "injected",
  "db_stress_crash_signal": null,
  "op": "write",
  "op_index": 279,
  "entry_fn": "rocksdb::MemTable::Add",
  "target_fn": "rocksdb::MemTable::Add",
  "critical_instruction_index": 37,
  "corruptions": [
    {
      "instruction": "mov    %rsi,0x8c8(%rbx)",
      "register": "rsi",
      "corruption_type": "bit_flip",
      "before": "0x7fffee4c64d8",
      "after": "0x7fffee4c64c8",
      "details": {
        "source": "rocksdb::Arena::AllocateAligned @ ./fbcode/internal_repo_rocksdb/repo/memory/arena.cc:135",
        "call_chain": [
          "rocksdb::Arena::AllocateAligned @ ./fbcode/internal_repo_rocksdb/repo/memory/arena.cc:135",
          "rocksdb::ConcurrentArena::AllocateAligned(unsigned long, unsigned long, rocksdb::Logger*)::{lambda()#1}::operator()() const @ fbcode/internal_repo_rocksdb/repo/memory/concurrent_arena.h:65",
          "rocksdb::ConcurrentArena::AllocateImpl<rocksdb::ConcurrentArena::AllocateAligned(unsigned long, unsigned long, rocksdb::Logger*)::{lambda()#1}>(unsigned long, bool, rocksdb::ConcurrentArena::AllocateAligned(unsigned long, unsigned long, rocksdb::Logger*)::{lambda()#1} const&) @ fbcode/internal_repo_rocksdb/repo/memory/concurrent_arena.h:145",
          "rocksdb::ConcurrentArena::AllocateAligned @ fbcode/internal_repo_rocksdb/repo/memory/concurrent_arena.h:63",
          "rocksdb::InlineSkipList<rocksdb::MemTableRep::KeyComparator const&>::AllocateNode @ fbcode/internal_repo_rocksdb/repo/memtable/inlineskiplist.h:868",
          "rocksdb::InlineSkipList<rocksdb::MemTableRep::KeyComparator const&>::AllocateKey @ fbcode/internal_repo_rocksdb/repo/memtable/inlineskiplist.h:855",
          "rocksdb::(anonymous namespace)::SkipListRep::Allocate @ ./fbcode/internal_repo_rocksdb/repo/memtable/skiplistrep.cc:36",
          "rocksdb::MemTable::Add @ ./fbcode/internal_repo_rocksdb/repo/db/memtable.cc:1157"
        ]
      }
    }
  ],
  "ops_seen": 279,
  "critical_instructions_seen": 38
}
```

Reviewed By: pdillinger

Differential Revision: D107999835
@meta-codesync meta-codesync Bot force-pushed the export-D107999835 branch from 82b207a to a48e3b8 Compare July 1, 2026 02:19
@meta-codesync meta-codesync Bot closed this in 3745f2e Jul 1, 2026
@meta-codesync meta-codesync Bot added the Merged label Jul 1, 2026
@meta-codesync

meta-codesync Bot commented Jul 1, 2026

Copy link
Copy Markdown

This pull request has been merged in 3745f2e.

meta-codesync Bot pushed a commit that referenced this pull request Jul 1, 2026
Summary:

Orchestration layer of the CPU corruption injector -- the glue between detection (#14852) and injection (#14858
).

One CPU-corruption injection says little on its own. What matters is the OUTCOME DISTRIBUTION over many injections -- how often a corruption is silently absorbed (NO_EFFECT), crashes the process (CRASH), is caught by an integrity check (CORRUPTION), or more importantly slips through as a silent data corruption (SDC) and the paths frequently leading to those outcomes. A trustworthy distribution needs a (somewhat) repeatable and reproducible harness as well as a db_stress configuration in which an injected corruption is both reachable and attributable to the chosen stress test op (i.e, write, foreground compaction or flush).

Therefore this PR implements a runner that can launch N independent runs for a chosen op type (i.e, write, foreground compaction or flush). Each run picks where to inject, runs db_stress under gdb via the `injector,py` (#14858), and is classified into one outcome bucket (#14852). 

The runner has DB_STRESS_PRESET -- the pinned db_stress config that isolates a single injected corruption (single-threaded, integrity checks on, other fault injection off, auto-compaction off). The runner also does gdb preflight that fails fast when the build or gdb cannot support injection because, for example, the hard-coded `target_fn` has changed its name, provides parallel launching of many runs and one summary.json per campaign (the outcome distribution plus each run's record). The whole run set is reproducible from one logged base_seed (run i uses base_seed + i).

**Test plan:**

Build: `make DEBUG_LEVEL=0 EXTRA_CXXFLAGS="-g -fno-inline" db_stress` 

# 1. Preflight (`verify_injection_site`) catches a build that can't support injection

Before doing any work, the runner has gdb confirm — on this exact binary — that every injection-site function resolves and gdb can read its source line. A good build logs:

```
INFO gdb check OK for op=compaction: rocksdb::CompactionJob::Run, rocksdb::CompactionIterator::NextFromInput, rocksdb::BlockBuilder::Add
```

A build that cannot support injection (functions renamed, fully inlined, or absent) fails fast with exit 2 before any run — forced here by pointing `--stress_cmd` at a non-db_stress binary:

```
$ python3 tools/cpu_corruption_injector/runner.py --op compaction --runs 1 --stress_cmd /bin/ls --report_dir /tmp/icc/preflight_demo
ERROR gdb could not set a breakpoint on these functions in ls (renamed, fully inlined, or not in this build?): rocksdb::CompactionJob::Run, rocksdb::CompactionIterator::NextFromInput, rocksdb::BlockBuilder::Add
Function "rocksdb::CompactionJob::Run" not defined.
Function "rocksdb::CompactionIterator::NextFromInput" not defined.
# exit code 2
```

So a broken/inlined build is rejected up front instead of silently producing `NO_INJECTION` runs.

# 2. Compaction op -- 100 runs
```
$ python3 tools/cpu_corruption_injector/runner.py --op compaction --runs 100 --stress_cmd /data/users/huixiao/rocksdb/db_stress  --report_dir /tmp/icc/preflight_demo
```
**Runs' outcomes (`summary.json`):**

| SDC | CORRUPTION | CRASH | NO_EFFECT | NO_INJECTION | ERROR |
| --- | --- | --- | --- | --- | --- |
| 9 | 5 | 6 | 79 | 1 | 0 |

**Spread:**
- target_fn x outcome: `NextFromInput` {SDC 9, CORRUPTION 3, CRASH 2, NO_EFFECT 34, NO_INJECTION 1}; `BlockBuilder::Add` {CORRUPTION 2, CRASH 4, NO_EFFECT 45}.
- corruption_type x outcome: `bit_flip` {SDC 8, CORRUPTION 3, CRASH 6, NO_EFFECT 32}; `flag_flip` {SDC 1, CORRUPTION 2, NO_EFFECT 42}; `lane_bit_flip` {NO_EFFECT 5}.

**Analysis:** all 9 SDCs land on the read/iterate path (`NextFromInput`); corrupting the output writer (`BlockBuilder::Add`) never produced an SDC (its blocks are checksummed -- corruption there is caught or inert). The 5 detected `CORRUPTION`s are compaction's key-order and record-count cross-checks firing (both CompactRange and CompactFiles origins appear), correctly bucketed by the fix.

### A representative compaction SDC: `run_00000`

What we corrupted (`inject.json`):

```json
{"op":"compaction","op_index":17,"entry_fn":"rocksdb::CompactionJob::Run","target_fn":"rocksdb::CompactionIterator::NextFromInput","injection_result":"injected","db_stress_crash_signal":null,
 "corruptions":[{"instruction":"mov    %rdx,0x160(%r12)","register":"rdx","corruption_type":"bit_flip","before":"0x10","after":"0x18",
   "details":{"source":"rocksdb::CompactionIterator::NextFromInput @ db/compaction/compaction_iterator.cc:719"}}]}
```

The recorded silent corruption (`data_corruption.<tid>.json`):

```json
{"kind":"wrong-value","cf":0,"key":70,"value_from_db":"010000000504070609080B0A0D0C0F0E070F105E78787878","value_from_expected":"010000000504070609080B0A0D0C0F0E","op_status":"Get: OK"}
```

**Walkthrough:** a single bit flip on `rdx` (`0x10 -> 0x18`, 16 -> 24) at `CompactionIterator::NextFromInput` (`compaction_iterator.cc:719`) -- `rdx` holds the value LENGTH stored into the iterator's record (offset `0x160`). The length reads 24 instead of 16, so compaction copies a value 8 bytes too long into the output SST, absorbing adjacent bytes. The internal key is untouched (`ParseInternalKey` passes); the over-long value is the single Slice fed to both the paranoid validator and the SST builder, so the file is self-consistent and every checksum agrees. On read-back `Get(key=70)` returns OK with the wrong bytes -- `value_from_db` is the expected value (`...0F0E`) **plus 8 trailing bytes** (`070F105E78787878`). Silent: read OK, all checks pass, the value visibly grew. `classify()` routes `kind=wrong-value` to the SDC bucket.

### A representative compaction CORRUPTION (detected): `run_00007`

What we corrupted (`inject.json`):

```json
{"op":"compaction","op_index":41,"entry_fn":"rocksdb::CompactionJob::Run","target_fn":"rocksdb::CompactionIterator::NextFromInput","injection_result":"injected","db_stress_crash_signal":"SIGABRT",
 "corruptions":[{"instruction":"mov    (%rbx),%rdi","register":"rdi","corruption_type":"bit_flip","before":"0x7fffeee8c1c0","after":"0x7fffeee8c1c2",
   "details":{"source":"rocksdb::IterKey::SetKeyImpl @ ./db/dbformat.h:941","call_chain":["rocksdb::IterKey::SetKeyImpl @ ./db/dbformat.h:941","rocksdb::CompactionIterator::NextFromInput @ db/compaction/compaction_iterator.cc:781"]}}]}
```

The recorded detection (`data_corruption.<tid>.json`):

```json
{"kind":"detected-corruption","cf":-1,"key":-1,"value_from_db":"","value_from_expected":"","op_status":"compactfiles: Corruption: Compaction sees out-of-order keys."}
```

**Walkthrough:** a bit flip on `rdi` (`...c1c0 -> ...c1c2`, a key pointer) at `IterKey::SetKeyImpl` (`dbformat.h:941`), reached from `NextFromInput`, mis-sets the iterator's key so the next emitted key is out of order. Compaction's key-order check catches it and returns `compactfiles: Corruption: Compaction sees out-of-order keys`. The op then takes `SIGABRT`, but `classify()` reads the recorded `data_corruption` result before the crash signal, so the run is correctly bucketed `CORRUPTION` (the bucketization fix; pre-fix this surfaced as `CRASH`). `classify()` routes `kind=detected-corruption` to the `CORRUPTION` bucket.

# 3. Flush op -- 100 runs
```
$ python3 tools/cpu_corruption_injector/runner.py --op flush --runs 100 --stress_cmd /data/users/huixiao/rocksdb/db_stress  --report_dir /tmp/icc/preflight_demo
```
**Runs' outcomes (`summary.json`):**

| SDC | CORRUPTION | CRASH | NO_EFFECT | NO_INJECTION | ERROR |
| --- | --- | --- | --- | --- | --- |
| 2 | 12 | 11 | 74 | 1 | 0 |

**Spread:**
- target_fn x outcome: `BlockBuilder::Add` {SDC 1, CORRUPTION 5, CRASH 5, NO_EFFECT 49}; `NextFromInput` {SDC 1, CORRUPTION 7, CRASH 6, NO_EFFECT 25, NO_INJECTION 1}.
- corruption_type x outcome: `bit_flip` {SDC 2, CORRUPTION 9, CRASH 9, NO_EFFECT 32}; `flag_flip` {CORRUPTION 3, CRASH 2, NO_EFFECT 42}.

**Analysis:** flush mirrors compaction's mechanisms (shared iterator/builder). The 2 SDCs are a value/key-pointer corruption that slips past the checksums; the 12 corruptions are caught by the flush-time key-order / key-size integrity checks.

### A representative flush SDC: `run_00027`

What we corrupted (`inject.json`):

```json
{"op":"flush","op_index":16,"entry_fn":"rocksdb::FlushJob::Run","target_fn":"rocksdb::BlockBuilder::Add","injection_result":"injected","db_stress_crash_signal":null,
 "corruptions":[{"instruction":"mov    (%rdi),%rax","register":"rax","corruption_type":"bit_flip","before":"0x7fffef059400","after":"0x7fffef059440",
   "details":{"source":"rocksdb::Slice::data @ ./include/rocksdb/slice.h:58","call_chain":["rocksdb::Slice::data @ ./include/rocksdb/slice.h:58","rocksdb::BlockBuilder::AddWithLastKeyImpl @ table/block_based/block_builder.cc:351"]}}]}
```

The recorded silent corruption (`data_corruption.<tid>.json`):

```json
{"kind":"lost","cf":0,"key":763,"value_from_db":"","value_from_expected":"010000000504070609080B0A0D0C0F0E","op_status":"Get: NotFound"}
```

**Walkthrough:** a bit flip on `rax` (a key/value data pointer, `...9400 -> ...9440`) at `Slice::data` (`slice.h:58`), reached from `BlockBuilder::AddWithLastKeyImpl` while the flush builds the output block, makes the builder read key bytes from the wrong address, so key 763's entry is written wrong and the key is dropped from the flushed SST. On read-back `Get(key=763)` returns `NotFound` for a committed key -- silent. `classify()` routes `kind=lost` to the SDC bucket.

### A representative flush CORRUPTION (detected): `run_00047`

What we corrupted (`inject.json`):

```json
{"op":"flush","op_index":7,"entry_fn":"rocksdb::FlushJob::Run","target_fn":"rocksdb::CompactionIterator::NextFromInput","injection_result":"injected","db_stress_crash_signal":"SIGABRT",
 "corruptions":[{"instruction":"cmp    $0x7,%rax","register":"eflags","corruption_type":"flag_flip","before":"0x216","after":"0x256",
   "details":{"source":"rocksdb::ParseInternalKey @ ./db/dbformat.h:523","call_chain":["rocksdb::ParseInternalKey @ ./db/dbformat.h:523","rocksdb::CompactionIterator::NextFromInput @ db/compaction/compaction_iterator.cc:731"]}}]}
```

The recorded detection (`data_corruption.<tid>.json`):

```json
{"kind":"detected-corruption","cf":-1,"key":-1,"value_from_db":"","value_from_expected":"","op_status":"flush: Corruption: Corrupted Key: Internal Key too small. Size=16. "}
```

**Walkthrough:** a flag flip (`eflags 0x216 -> 0x256`) on a `cmp $0x7,%rax` branch in `ParseInternalKey` (`dbformat.h:523`), reached from `NextFromInput`, makes the parser mis-judge the internal-key size, so the flush emits a malformed key and the key-size integrity check returns `flush: Corruption: Corrupted Key: Internal Key too small`. The op takes `SIGABRT`; `classify()` reads the recorded `data_corruption` before the signal and buckets `CORRUPTION` (bucketization fix). `classify()` routes `kind=detected-corruption` to the `CORRUPTION` bucket.

# 4. Write op (`MemTable::Add`) -- two key spaces
```
$ python3 tools/cpu_corruption_injector/runner.py --op write --runs 100 --stress_cmd /data/users/huixiao/rocksdb/db_stress  --report_dir /tmp/icc/preflight_demo
```

A write injection corrupts a single `MemTable::Add` (a Put/Delete/DeleteRange). The corruption is reachable and attributable, but whether it surfaces as a *silent* write SDC depends heavily on the key space. A silent write SDC needs the affected/mispositioned key to have other live versions to fall through to -- which only happens in a dense, multi-version memtable. We therefore run two write campaigns: the default `max_key=1000`, then a small `max_key=8`. The contrast is what motivates randomizing `max_key` (see PR #14867  for `--randomize_stress_flags`).

### 4a. Default `max_key=1000` -- 100 runs (no silent write SDC)

**Runs' outcomes (`summary.json`):**

| SDC | CORRUPTION | CRASH | NO_EFFECT | NO_INJECTION | ERROR |
| --- | --- | --- | --- | --- | --- |
| 0 | 31 | 13 | 56 | 0 | 0 |

With a 1000-key space almost every write touches a distinct key, so a corrupted entry has no older live version to mask it: a value/key byte flip is caught at write by the per-key checksum (`VerifyEncodedEntry` -> `CORRUPTION`), and a structural flip tends to crash (`CRASH`) rather than silently mis-read. `ERROR=0`, `NO_INJECTION=0`. No write op silently corrupted data -- every reachable corruption was caught or crashed.

### 4b. Small `max_key=8` -- 100 runs (surfaces 2 silent write SDCs)

**Runs' outcomes (`summary.json`):**

| SDC | CORRUPTION | CRASH | NO_EFFECT | NO_INJECTION | ERROR |
| --- | --- | --- | --- | --- | --- |
| 2 | 33 | 8 | 57 | 0 | 0 |

Shrinking the key space makes each key hold ~125 versions (`ops_per_thread` / `max_key`), so a misplaced entry can fall through to an older version of the *same* key and be returned silently -- the per-key checksum (bytes intact) and on-seek verify cannot see a pure link-position error. 

### A representative write SDC: `run_00028` (skiplist misposition -> silent stale read, flush catches)

What we corrupted (`inject.json`):

```json
{"op":"write","op_index":317,"entry_fn":"rocksdb::MemTable::Add","target_fn":"rocksdb::MemTable::Add","injection_result":"injected","db_stress_crash_signal":null,
 "corruptions":[{"instruction":"cmp    %rbx,-0xb8(%rbp)","register":"eflags","corruption_type":"flag_flip","before":"0x216","after":"0x217",
   "details":{"source":"rocksdb::MemTable::Add @ db/memtable.cc:1319",
              "call_chain":["rocksdb::MemTable::Add @ db/memtable.cc:1319"]}}]}
```

The recorded silent corruption (`data_corruption.<tid>.json`):

```json
{"kind":"resurrected","cf":0,"key":1,"value_from_db":"110000001514171619181B1A1D1C1F1E0100030205040706","value_from_expected":"","op_status":"Get: OK"}
```

**Walkthrough:** a flag flip (CF, `eflags 0x216 -> 0x217`) on the `cmp` that produces `KeyIsAfterNode` inside `InlineSkipList::Insert` (`inlineskiplist.h:1253`; the `@ memtable.cc:1319` in the record is inlining line-drift) inverts the key comparison, so the Delete tombstone for key 1 is linked at the wrong position. The stored bytes and per-key checksum are intact, so neither the checksum nor on-seek verify sees anything wrong -- on read-back `Get(key=1)` returns OK with key 1's live value for a key that was Deleted (`kind=resurrected`, silent). A follow-up `Flush()` in unit test repro *does* catch it: the full-scan order check returns `Corruption: Out-of-order keys found in skiplist` -- caught only after the silent read, not during it. 

### A representative write CORRUPTION (detected) `max_key=1000 or 8` : `run_00018`

Where `run_00028`'s pure link-position error is invisible to the per-key checksum, this run shows a byte-level corruption that the checksum *catches* at write time. What we corrupted (`inject.json`):

```json
{"op":"write","op_index":106,"entry_fn":"rocksdb::MemTable::Add","target_fn":"rocksdb::MemTable::Add","injection_result":"injected","db_stress_crash_signal":null,
 "corruptions":[{"instruction":"mov    %rsi,(%rdi)","register":"rsi","corruption_type":"bit_flip","before":"0x7fffeec2a21c","after":"0x7fffeec2a25c",
   "details":{"source":"rocksdb::Slice::Slice @ ./include/rocksdb/slice.h:39",
              "call_chain":["rocksdb::Slice::Slice @ ./include/rocksdb/slice.h:39","rocksdb::GetVarint32 @ ./util/coding.h:280","rocksdb::MemTable::VerifyEncodedEntry @ db/memtable.cc:1102","rocksdb::MemTable::Add @ db/memtable.cc:1189"]}}]}
```

The recorded detection (`data_corruption.<tid>.json`):

```json
{"kind":"detected-corruption","cf":-1,"key":-1,"value_from_db":"","value_from_expected":"","op_status":"put: Corruption: ProtectionInfo mismatch"}
```

**Walkthrough:** a bit flip on `rsi` (`0x7fffeec2a21c -> 0x7fffeec2a25c`) at `Slice::Slice` (`slice.h:39`) while `MemTable::Add` re-parses the just-encoded entry through `VerifyEncodedEntry` (`memtable.cc:1102`) corrupts the Slice the verifier reads, so the recomputed per-key protection info no longer matches and the put returns `Corruption: ProtectionInfo mismatch`.

Reviewed By: pdillinger

Differential Revision: D108367345
meta-codesync Bot pushed a commit that referenced this pull request Jul 1, 2026
Summary:
Pull Request resolved: #14866

Orchestration layer of the CPU corruption injector -- the glue between detection (#14852) and injection (#14858
).

One CPU-corruption injection says little on its own. What matters is the OUTCOME DISTRIBUTION over many injections -- how often a corruption is silently absorbed (NO_EFFECT), crashes the process (CRASH), is caught by an integrity check (CORRUPTION), or more importantly slips through as a silent data corruption (SDC) and the paths frequently leading to those outcomes. A trustworthy distribution needs a (somewhat) repeatable and reproducible harness as well as a db_stress configuration in which an injected corruption is both reachable and attributable to the chosen stress test op (i.e, write, foreground compaction or flush).

Therefore this PR implements a runner that can launch N independent runs for a chosen op type (i.e, write, foreground compaction or flush). Each run picks where to inject, runs db_stress under gdb via the `injector,py` (#14858), and is classified into one outcome bucket (#14852).

The runner has DB_STRESS_PRESET -- the pinned db_stress config that isolates a single injected corruption (single-threaded, integrity checks on, other fault injection off, auto-compaction off). The runner also does gdb preflight that fails fast when the build or gdb cannot support injection because, for example, the hard-coded `target_fn` has changed its name, provides parallel launching of many runs and one summary.json per campaign (the outcome distribution plus each run's record). The whole run set is reproducible from one logged base_seed (run i uses base_seed + i).

**Test plan:**

Build: `make DEBUG_LEVEL=0 EXTRA_CXXFLAGS="-g -fno-inline" db_stress`

# 1. Preflight (`verify_injection_site`) catches a build that can't support injection

Before doing any work, the runner has gdb confirm — on this exact binary — that every injection-site function resolves and gdb can read its source line. A good build logs:

```
INFO gdb check OK for op=compaction: rocksdb::CompactionJob::Run, rocksdb::CompactionIterator::NextFromInput, rocksdb::BlockBuilder::Add
```

A build that cannot support injection (functions renamed, fully inlined, or absent) fails fast with exit 2 before any run — forced here by pointing `--stress_cmd` at a non-db_stress binary:

```
$ python3 tools/cpu_corruption_injector/runner.py --op compaction --runs 1 --stress_cmd /bin/ls --report_dir /tmp/icc/preflight_demo
ERROR gdb could not set a breakpoint on these functions in ls (renamed, fully inlined, or not in this build?): rocksdb::CompactionJob::Run, rocksdb::CompactionIterator::NextFromInput, rocksdb::BlockBuilder::Add
Function "rocksdb::CompactionJob::Run" not defined.
Function "rocksdb::CompactionIterator::NextFromInput" not defined.
# exit code 2
```

So a broken/inlined build is rejected up front instead of silently producing `NO_INJECTION` runs.

# 2. Compaction op -- 100 runs
```
$ python3 tools/cpu_corruption_injector/runner.py --op compaction --runs 100 --stress_cmd /data/users/huixiao/rocksdb/db_stress  --report_dir /tmp/icc/preflight_demo
```
**Runs' outcomes (`summary.json`):**

| SDC | CORRUPTION | CRASH | NO_EFFECT | NO_INJECTION | ERROR |
| --- | --- | --- | --- | --- | --- |
| 9 | 5 | 6 | 79 | 1 | 0 |

**Spread:**
- target_fn x outcome: `NextFromInput` {SDC 9, CORRUPTION 3, CRASH 2, NO_EFFECT 34, NO_INJECTION 1}; `BlockBuilder::Add` {CORRUPTION 2, CRASH 4, NO_EFFECT 45}.
- corruption_type x outcome: `bit_flip` {SDC 8, CORRUPTION 3, CRASH 6, NO_EFFECT 32}; `flag_flip` {SDC 1, CORRUPTION 2, NO_EFFECT 42}; `lane_bit_flip` {NO_EFFECT 5}.

**Analysis:** all 9 SDCs land on the read/iterate path (`NextFromInput`); corrupting the output writer (`BlockBuilder::Add`) never produced an SDC (its blocks are checksummed -- corruption there is caught or inert). The 5 detected `CORRUPTION`s are compaction's key-order and record-count cross-checks firing (both CompactRange and CompactFiles origins appear), correctly bucketed by the fix.

### A representative compaction SDC: `run_00000`

What we corrupted (`inject.json`):

```json
{"op":"compaction","op_index":17,"entry_fn":"rocksdb::CompactionJob::Run","target_fn":"rocksdb::CompactionIterator::NextFromInput","injection_result":"injected","db_stress_crash_signal":null,
 "corruptions":[{"instruction":"mov    %rdx,0x160(%r12)","register":"rdx","corruption_type":"bit_flip","before":"0x10","after":"0x18",
   "details":{"source":"rocksdb::CompactionIterator::NextFromInput @ db/compaction/compaction_iterator.cc:719"}}]}
```

The recorded silent corruption (`data_corruption.<tid>.json`):

```json
{"kind":"wrong-value","cf":0,"key":70,"value_from_db":"010000000504070609080B0A0D0C0F0E070F105E78787878","value_from_expected":"010000000504070609080B0A0D0C0F0E","op_status":"Get: OK"}
```

**Walkthrough:** a single bit flip on `rdx` (`0x10 -> 0x18`, 16 -> 24) at `CompactionIterator::NextFromInput` (`compaction_iterator.cc:719`) -- `rdx` holds the value LENGTH stored into the iterator's record (offset `0x160`). The length reads 24 instead of 16, so compaction copies a value 8 bytes too long into the output SST, absorbing adjacent bytes. The internal key is untouched (`ParseInternalKey` passes); the over-long value is the single Slice fed to both the paranoid validator and the SST builder, so the file is self-consistent and every checksum agrees. On read-back `Get(key=70)` returns OK with the wrong bytes -- `value_from_db` is the expected value (`...0F0E`) **plus 8 trailing bytes** (`070F105E78787878`). Silent: read OK, all checks pass, the value visibly grew. `classify()` routes `kind=wrong-value` to the SDC bucket.

### A representative compaction CORRUPTION (detected): `run_00007`

What we corrupted (`inject.json`):

```json
{"op":"compaction","op_index":41,"entry_fn":"rocksdb::CompactionJob::Run","target_fn":"rocksdb::CompactionIterator::NextFromInput","injection_result":"injected","db_stress_crash_signal":"SIGABRT",
 "corruptions":[{"instruction":"mov    (%rbx),%rdi","register":"rdi","corruption_type":"bit_flip","before":"0x7fffeee8c1c0","after":"0x7fffeee8c1c2",
   "details":{"source":"rocksdb::IterKey::SetKeyImpl @ ./db/dbformat.h:941","call_chain":["rocksdb::IterKey::SetKeyImpl @ ./db/dbformat.h:941","rocksdb::CompactionIterator::NextFromInput @ db/compaction/compaction_iterator.cc:781"]}}]}
```

The recorded detection (`data_corruption.<tid>.json`):

```json
{"kind":"detected-corruption","cf":-1,"key":-1,"value_from_db":"","value_from_expected":"","op_status":"compactfiles: Corruption: Compaction sees out-of-order keys."}
```

**Walkthrough:** a bit flip on `rdi` (`...c1c0 -> ...c1c2`, a key pointer) at `IterKey::SetKeyImpl` (`dbformat.h:941`), reached from `NextFromInput`, mis-sets the iterator's key so the next emitted key is out of order. Compaction's key-order check catches it and returns `compactfiles: Corruption: Compaction sees out-of-order keys`. The op then takes `SIGABRT`, but `classify()` reads the recorded `data_corruption` result before the crash signal, so the run is correctly bucketed `CORRUPTION` (the bucketization fix; pre-fix this surfaced as `CRASH`). `classify()` routes `kind=detected-corruption` to the `CORRUPTION` bucket.

# 3. Flush op -- 100 runs
```
$ python3 tools/cpu_corruption_injector/runner.py --op flush --runs 100 --stress_cmd /data/users/huixiao/rocksdb/db_stress  --report_dir /tmp/icc/preflight_demo
```
**Runs' outcomes (`summary.json`):**

| SDC | CORRUPTION | CRASH | NO_EFFECT | NO_INJECTION | ERROR |
| --- | --- | --- | --- | --- | --- |
| 2 | 12 | 11 | 74 | 1 | 0 |

**Spread:**
- target_fn x outcome: `BlockBuilder::Add` {SDC 1, CORRUPTION 5, CRASH 5, NO_EFFECT 49}; `NextFromInput` {SDC 1, CORRUPTION 7, CRASH 6, NO_EFFECT 25, NO_INJECTION 1}.
- corruption_type x outcome: `bit_flip` {SDC 2, CORRUPTION 9, CRASH 9, NO_EFFECT 32}; `flag_flip` {CORRUPTION 3, CRASH 2, NO_EFFECT 42}.

**Analysis:** flush mirrors compaction's mechanisms (shared iterator/builder). The 2 SDCs are a value/key-pointer corruption that slips past the checksums; the 12 corruptions are caught by the flush-time key-order / key-size integrity checks.

### A representative flush SDC: `run_00027`

What we corrupted (`inject.json`):

```json
{"op":"flush","op_index":16,"entry_fn":"rocksdb::FlushJob::Run","target_fn":"rocksdb::BlockBuilder::Add","injection_result":"injected","db_stress_crash_signal":null,
 "corruptions":[{"instruction":"mov    (%rdi),%rax","register":"rax","corruption_type":"bit_flip","before":"0x7fffef059400","after":"0x7fffef059440",
   "details":{"source":"rocksdb::Slice::data @ ./include/rocksdb/slice.h:58","call_chain":["rocksdb::Slice::data @ ./include/rocksdb/slice.h:58","rocksdb::BlockBuilder::AddWithLastKeyImpl @ table/block_based/block_builder.cc:351"]}}]}
```

The recorded silent corruption (`data_corruption.<tid>.json`):

```json
{"kind":"lost","cf":0,"key":763,"value_from_db":"","value_from_expected":"010000000504070609080B0A0D0C0F0E","op_status":"Get: NotFound"}
```

**Walkthrough:** a bit flip on `rax` (a key/value data pointer, `...9400 -> ...9440`) at `Slice::data` (`slice.h:58`), reached from `BlockBuilder::AddWithLastKeyImpl` while the flush builds the output block, makes the builder read key bytes from the wrong address, so key 763's entry is written wrong and the key is dropped from the flushed SST. On read-back `Get(key=763)` returns `NotFound` for a committed key -- silent. `classify()` routes `kind=lost` to the SDC bucket.

### A representative flush CORRUPTION (detected): `run_00047`

What we corrupted (`inject.json`):

```json
{"op":"flush","op_index":7,"entry_fn":"rocksdb::FlushJob::Run","target_fn":"rocksdb::CompactionIterator::NextFromInput","injection_result":"injected","db_stress_crash_signal":"SIGABRT",
 "corruptions":[{"instruction":"cmp    $0x7,%rax","register":"eflags","corruption_type":"flag_flip","before":"0x216","after":"0x256",
   "details":{"source":"rocksdb::ParseInternalKey @ ./db/dbformat.h:523","call_chain":["rocksdb::ParseInternalKey @ ./db/dbformat.h:523","rocksdb::CompactionIterator::NextFromInput @ db/compaction/compaction_iterator.cc:731"]}}]}
```

The recorded detection (`data_corruption.<tid>.json`):

```json
{"kind":"detected-corruption","cf":-1,"key":-1,"value_from_db":"","value_from_expected":"","op_status":"flush: Corruption: Corrupted Key: Internal Key too small. Size=16. "}
```

**Walkthrough:** a flag flip (`eflags 0x216 -> 0x256`) on a `cmp $0x7,%rax` branch in `ParseInternalKey` (`dbformat.h:523`), reached from `NextFromInput`, makes the parser mis-judge the internal-key size, so the flush emits a malformed key and the key-size integrity check returns `flush: Corruption: Corrupted Key: Internal Key too small`. The op takes `SIGABRT`; `classify()` reads the recorded `data_corruption` before the signal and buckets `CORRUPTION` (bucketization fix). `classify()` routes `kind=detected-corruption` to the `CORRUPTION` bucket.

# 4. Write op (`MemTable::Add`) -- two key spaces
```
$ python3 tools/cpu_corruption_injector/runner.py --op write --runs 100 --stress_cmd /data/users/huixiao/rocksdb/db_stress  --report_dir /tmp/icc/preflight_demo
```

A write injection corrupts a single `MemTable::Add` (a Put/Delete/DeleteRange). The corruption is reachable and attributable, but whether it surfaces as a *silent* write SDC depends heavily on the key space. A silent write SDC needs the affected/mispositioned key to have other live versions to fall through to -- which only happens in a dense, multi-version memtable. We therefore run two write campaigns: the default `max_key=1000`, then a small `max_key=8`. The contrast is what motivates randomizing `max_key` (see PR #14867  for `--randomize_stress_flags`).

### 4a. Default `max_key=1000` -- 100 runs (no silent write SDC)

**Runs' outcomes (`summary.json`):**

| SDC | CORRUPTION | CRASH | NO_EFFECT | NO_INJECTION | ERROR |
| --- | --- | --- | --- | --- | --- |
| 0 | 31 | 13 | 56 | 0 | 0 |

With a 1000-key space almost every write touches a distinct key, so a corrupted entry has no older live version to mask it: a value/key byte flip is caught at write by the per-key checksum (`VerifyEncodedEntry` -> `CORRUPTION`), and a structural flip tends to crash (`CRASH`) rather than silently mis-read. `ERROR=0`, `NO_INJECTION=0`. No write op silently corrupted data -- every reachable corruption was caught or crashed.

### 4b. Small `max_key=8` -- 100 runs (surfaces 2 silent write SDCs)

**Runs' outcomes (`summary.json`):**

| SDC | CORRUPTION | CRASH | NO_EFFECT | NO_INJECTION | ERROR |
| --- | --- | --- | --- | --- | --- |
| 2 | 33 | 8 | 57 | 0 | 0 |

Shrinking the key space makes each key hold ~125 versions (`ops_per_thread` / `max_key`), so a misplaced entry can fall through to an older version of the *same* key and be returned silently -- the per-key checksum (bytes intact) and on-seek verify cannot see a pure link-position error.

### A representative write SDC: `run_00028` (skiplist misposition -> silent stale read, flush catches)

What we corrupted (`inject.json`):

```json
{"op":"write","op_index":317,"entry_fn":"rocksdb::MemTable::Add","target_fn":"rocksdb::MemTable::Add","injection_result":"injected","db_stress_crash_signal":null,
 "corruptions":[{"instruction":"cmp    %rbx,-0xb8(%rbp)","register":"eflags","corruption_type":"flag_flip","before":"0x216","after":"0x217",
   "details":{"source":"rocksdb::MemTable::Add @ db/memtable.cc:1319",
              "call_chain":["rocksdb::MemTable::Add @ db/memtable.cc:1319"]}}]}
```

The recorded silent corruption (`data_corruption.<tid>.json`):

```json
{"kind":"resurrected","cf":0,"key":1,"value_from_db":"110000001514171619181B1A1D1C1F1E0100030205040706","value_from_expected":"","op_status":"Get: OK"}
```

**Walkthrough:** a flag flip (CF, `eflags 0x216 -> 0x217`) on the `cmp` that produces `KeyIsAfterNode` inside `InlineSkipList::Insert` (`inlineskiplist.h:1253`; the `@ memtable.cc:1319` in the record is inlining line-drift) inverts the key comparison, so the Delete tombstone for key 1 is linked at the wrong position. The stored bytes and per-key checksum are intact, so neither the checksum nor on-seek verify sees anything wrong -- on read-back `Get(key=1)` returns OK with key 1's live value for a key that was Deleted (`kind=resurrected`, silent). A follow-up `Flush()` in unit test repro *does* catch it: the full-scan order check returns `Corruption: Out-of-order keys found in skiplist` -- caught only after the silent read, not during it.

### A representative write CORRUPTION (detected) `max_key=1000 or 8` : `run_00018`

Where `run_00028`'s pure link-position error is invisible to the per-key checksum, this run shows a byte-level corruption that the checksum *catches* at write time. What we corrupted (`inject.json`):

```json
{"op":"write","op_index":106,"entry_fn":"rocksdb::MemTable::Add","target_fn":"rocksdb::MemTable::Add","injection_result":"injected","db_stress_crash_signal":null,
 "corruptions":[{"instruction":"mov    %rsi,(%rdi)","register":"rsi","corruption_type":"bit_flip","before":"0x7fffeec2a21c","after":"0x7fffeec2a25c",
   "details":{"source":"rocksdb::Slice::Slice @ ./include/rocksdb/slice.h:39",
              "call_chain":["rocksdb::Slice::Slice @ ./include/rocksdb/slice.h:39","rocksdb::GetVarint32 @ ./util/coding.h:280","rocksdb::MemTable::VerifyEncodedEntry @ db/memtable.cc:1102","rocksdb::MemTable::Add @ db/memtable.cc:1189"]}}]}
```

The recorded detection (`data_corruption.<tid>.json`):

```json
{"kind":"detected-corruption","cf":-1,"key":-1,"value_from_db":"","value_from_expected":"","op_status":"put: Corruption: ProtectionInfo mismatch"}
```

**Walkthrough:** a bit flip on `rsi` (`0x7fffeec2a21c -> 0x7fffeec2a25c`) at `Slice::Slice` (`slice.h:39`) while `MemTable::Add` re-parses the just-encoded entry through `VerifyEncodedEntry` (`memtable.cc:1102`) corrupts the Slice the verifier reads, so the recomputed per-key protection info no longer matches and the put returns `Corruption: ProtectionInfo mismatch`.

Reviewed By: pdillinger

Differential Revision: D108367345

fbshipit-source-id: 250b069c8e9d63a2dc257f6de86f1d9040284dd6
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment