Per-StressTest fault injection with env/fs cleanup#14750
Closed
hx235 wants to merge 2 commits into
Closed
Conversation
Summary: Pure mechanical refactor: replace all direct FLAGS_db / FLAGS_expected_values_dir / FLAGS_secondaries_base reads with accessor methods on StressTest. No new flags, no parameters, no behavior change. Prepares for multi-DB stress test where each StressTest instance has its own DB. Changes: - GetDbPath(), GetExpectedValuesDir(), GetSecondariesBase() accessors return the corresponding FLAGS values directly - Replace ~15 FLAGS_db references with GetDbPath() in db_stress_test_base.cc - Move SharedState constructor from .h to .cc (needs full StressTest type for GetExpectedValuesDir()) - Move DbStressListener constructor from .h to .cc (same reason) - Replace FLAGS_db / FLAGS_expected_values_dir in db_stress_driver.cc with accessor calls - NO changes to db_crashtest.py or db_stress_gflags.cc Differential Revision: D104959943
Summary:
Prerequisite for multi-DB stress test support where each StressTest instance owns one DB. Moves fault injection from a single global to per-StressTest instance so each DB gets isolated fault injection — errors injected into one DB do not leak to others.
FS/Env architecture change:
Before (upstream):
raw_fs → FaultInjectionTestFS (global) → DbStressFSWrapper → db_stress_env → DB::Open
After (this diff):
Global: raw_env (no wrappers)
Used outside StressTest: non-FS ops (threads, time, sleep), test framework FS setup (dirs), DB destruction
Used inside StressTest: cleanup ops that must succeed (external file delete)
Per StressTest:
raw_fs → DbStressFSWrapper (db_stress_fs_, always)
→ FaultInjectionTestFS (db_fault_injection_fs_, optional)
→ CompositeEnvWrapper (db_env_, always) → DB::Open
Expected values state: Env::Default() (always local PosixEnv even when raw_env is remote)
FS layer order swapped (DbStressFSWrapper now innermost). Safe because:
- Error injection returns early — inner wrapper never executes (same behavior)
- DbStressFSWrapper assertions do not modify data (checksum validation, IOActivity checks)
- MANIFEST rename tracking slightly better for crash simulation in new order
Stored members (per StressTest):
- db_stress_fs_: DbStressFSWrapper. Always active regardless of fault injection flags.
- db_fault_injection_fs_: FaultInjectionTestFS. Only when fault injection flags set. Direct access for Enable/Disable/SetThreadLocal.
- db_env_: CompositeEnvWrapper. Always present. THE env for all DB I/O (options_.env).
Eliminated globals: db_stress_env → renamed to raw_env (no wrappers, DbStressFSWrapper moved to per-StressTest); db_stress_listener_env, db_stress_raw_fs removed; fault_fs_guard, fault_env_guard moved to per-StressTest.
Other changes:
- CleanupOutputDirectory simplified (uses raw_env, no disable/enable needed)
- SstFileManager recreated with per-StressTest env in Open()
- Remote compaction override env uses options.env directly (fixes pre-existing silent bug)
- Comments added: Env::Default() always local, DbStressDestroyDb MANIFEST explanation, fault injection log path in TEST_TMPDIR
Differential Revision: D104959945
|
@hx235 has exported this pull request. If you are a Meta employee, you can view the originating Diff in D104959945. |
|
| Check | Count |
|---|---|
concurrency-mt-unsafe |
2 |
cppcoreguidelines-avoid-non-const-global-variables |
4 |
| Total | 6 |
Details
db_stress_tool/db_stress_common.cc (2 warning(s))
db_stress_tool/db_stress_common.cc:22:25: warning: variable 'raw_env' is non-const and globally accessible, consider making it const [cppcoreguidelines-avoid-non-const-global-variables]
db_stress_tool/db_stress_common.cc:22:25: warning: variable 'raw_env' provides global access to a non-const object; consider making the pointed-to data 'const' [cppcoreguidelines-avoid-non-const-global-variables]
db_stress_tool/db_stress_test_base.cc (2 warning(s))
db_stress_tool/db_stress_test_base.cc:128:31: warning: function is not thread safe [concurrency-mt-unsafe]
db_stress_tool/db_stress_test_base.cc:3883:7: warning: function is not thread safe [concurrency-mt-unsafe]
db_stress_tool/db_stress_tool.cc (2 warning(s))
db_stress_tool/db_stress_tool.cc:39:49: warning: variable 'fault_fs_for_crash_report' is non-const and globally accessible, consider making it const [cppcoreguidelines-avoid-non-const-global-variables]
db_stress_tool/db_stress_tool.cc:39:49: warning: variable 'fault_fs_for_crash_report' provides global access to a non-const object; consider making the pointed-to data 'const' [cppcoreguidelines-avoid-non-const-global-variables]
hx235
added a commit
to hx235/rocksdb
that referenced
this pull request
May 20, 2026
Summary: Pull Request resolved: facebook#14750 Prerequisite for multi-DB stress test support where each StressTest instance owns one DB. Moves fault injection from a single global to per-StressTest instance so each DB gets isolated fault injection; errors injected into one DB do not leak to others. FS/Env architecture change: Before (upstream): raw_fs → FaultInjectionTestFS (global) → DbStressFSWrapper → db_stress_env → DB::Open After (this diff): Global: raw_env (no wrappers) Used outside StressTest: non-FS ops (threads, time, sleep), test framework FS setup (dirs), DB destruction Used inside StressTest: cleanup ops that must succeed (external file delete) Per StressTest: raw_fs → DbStressFSWrapper (db_stress_fs_, always) → FaultInjectionTestFS (db_fault_injection_fs_, optional) → CompositeEnvWrapper (db_env_, always) → DB::Open Expected values state: Env::Default() (always local PosixEnv even when raw_env is remote) FS layer order swapped (DbStressFSWrapper now innermost). Safe because: - Error injection returns early; inner wrapper never executes (same behavior) - DbStressFSWrapper assertions do not modify data (checksum validation, IOActivity checks) - MANIFEST rename tracking slightly better for crash simulation in new order Stored members (per StressTest): - db_stress_fs_: DbStressFSWrapper. Always active regardless of fault injection flags. - db_fault_injection_fs_: FaultInjectionTestFS. Only when fault injection flags set. Direct access for Enable/Disable/SetThreadLocal. - db_env_: CompositeEnvWrapper. Always present. THE env for all DB I/O (options_.env). Eliminated globals: db_stress_env → renamed to raw_env (no wrappers, DbStressFSWrapper moved to per-StressTest); db_stress_listener_env, db_stress_raw_fs removed; fault_fs_guard, fault_env_guard moved to per-StressTest. Other changes: - CleanupOutputDirectory simplified (uses raw_env, no disable/enable needed) - SstFileManager recreated with per-StressTest env in Open() - Remote compaction override env uses options.env directly (fixes pre-existing silent bug) - Comments added: Env::Default() always local, DbStressDestroyDb MANIFEST explanation, fault injection log path in TEST_TMPDIR - TestFSWritableFile::Close() now mirrors the production FSWritableFile close boundary. After the first Close() attempt, later explicit or destructor Close() calls are wrapper-level no-ops, while FaultInjectionTestFS still records the first close attempt for crash/recovery simulation. - Added targeted fault_injection_fs_test coverage for injected metadata-close failures to ensure FaultInjectionTestFS does not retry the inner Close() path. Differential Revision: D104959945
hx235
pushed a commit
to hx235/rocksdb
that referenced
this pull request
May 22, 2026
Summary:
Prerequisite for multi-DB stress test support where each StressTest instance owns one DB. Moves fault injection from a single global to per-StressTest instance so each DB gets isolated fault injection; errors injected into one DB do not leak to others.
FS/Env architecture change:
Before (upstream):
raw_fs → FaultInjectionTestFS (global) → DbStressFSWrapper → db_stress_env → DB::Open
After (this diff):
Global: raw_env (no wrappers)
Used outside StressTest: non-FS ops (threads, time, sleep), test framework FS setup (dirs), DB destruction
Used inside StressTest: cleanup ops that must succeed (external file delete)
Per StressTest:
raw_fs → DbStressFSWrapper (db_stress_fs_, always)
→ FaultInjectionTestFS (db_fault_injection_fs_, optional)
→ CompositeEnvWrapper (db_env_, always) → DB::Open
Expected values state: Env::Default() (always local PosixEnv even when raw_env is remote)
FS layer order swapped (DbStressFSWrapper now innermost). Safe because:
- Error injection returns early; inner wrapper never executes (same behavior)
- DbStressFSWrapper assertions do not modify data (checksum validation, IOActivity checks)
- MANIFEST rename tracking slightly better for crash simulation in new order
Stored members (per StressTest):
- db_stress_fs_: DbStressFSWrapper. Always active regardless of fault injection flags.
- db_fault_injection_fs_: FaultInjectionTestFS. Only when fault injection flags set. Direct access for Enable/Disable/SetThreadLocal.
- db_env_: CompositeEnvWrapper. Always present. THE env for all DB I/O (options_.env).
Eliminated globals: db_stress_env → renamed to raw_env (no wrappers, DbStressFSWrapper moved to per-StressTest); db_stress_listener_env, db_stress_raw_fs removed; fault_fs_guard, fault_env_guard moved to per-StressTest.
Other changes:
- CleanupOutputDirectory simplified (uses raw_env, no disable/enable needed)
- SstFileManager recreated with per-StressTest env in Open()
- Remote compaction override env uses options.env directly (fixes pre-existing silent bug)
- Comments added: Env::Default() always local, DbStressDestroyDb MANIFEST explanation, fault injection log path in TEST_TMPDIR
- TestFSWritableFile::Close() now mirrors the production FSWritableFile close boundary. After the first Close() attempt, later explicit or destructor Close() calls are wrapper-level no-ops, while FaultInjectionTestFS still records the first close attempt for crash/recovery simulation.
- Added targeted fault_injection_fs_test coverage for injected metadata-close failures to ensure FaultInjectionTestFS does not retry the inner Close() path.
Reviewed By: anand1976
Differential Revision: D104959945
hx235
pushed a commit
to hx235/rocksdb
that referenced
this pull request
May 22, 2026
Summary:
Prerequisite for multi-DB stress test support where each StressTest instance owns one DB. Moves fault injection from a single global to per-StressTest instance so each DB gets isolated fault injection; errors injected into one DB do not leak to others.
FS/Env architecture change:
Before (upstream):
raw_fs → FaultInjectionTestFS (global) → DbStressFSWrapper → db_stress_env → DB::Open
After (this diff):
Global: raw_env (no wrappers)
Used outside StressTest: non-FS ops (threads, time, sleep), test framework FS setup (dirs), DB destruction
Used inside StressTest: cleanup ops that must succeed (external file delete)
Per StressTest:
raw_fs → DbStressFSWrapper (db_stress_fs_, always)
→ FaultInjectionTestFS (db_fault_injection_fs_, optional)
→ CompositeEnvWrapper (db_env_, always) → DB::Open
Expected values state: Env::Default() (always local PosixEnv even when raw_env is remote)
FS layer order swapped (DbStressFSWrapper now innermost). Safe because:
- Error injection returns early; inner wrapper never executes (same behavior)
- DbStressFSWrapper assertions do not modify data (checksum validation, IOActivity checks)
- MANIFEST rename tracking slightly better for crash simulation in new order
Stored members (per StressTest):
- db_stress_fs_: DbStressFSWrapper. Always active regardless of fault injection flags.
- db_fault_injection_fs_: FaultInjectionTestFS. Only when fault injection flags set. Direct access for Enable/Disable/SetThreadLocal.
- db_env_: CompositeEnvWrapper. Always present. THE env for all DB I/O (options_.env).
Eliminated globals: db_stress_env → renamed to raw_env (no wrappers, DbStressFSWrapper moved to per-StressTest); db_stress_listener_env, db_stress_raw_fs removed; fault_fs_guard, fault_env_guard moved to per-StressTest.
Other changes:
- CleanupOutputDirectory simplified (uses raw_env, no disable/enable needed)
- SstFileManager recreated with per-StressTest env in Open()
- Remote compaction override env uses options.env directly (fixes pre-existing silent bug)
- Comments added: Env::Default() always local, DbStressDestroyDb MANIFEST explanation, fault injection log path in TEST_TMPDIR
- TestFSWritableFile::Close() now mirrors the production FSWritableFile close boundary. After the first Close() attempt, later explicit or destructor Close() calls are wrapper-level no-ops, while FaultInjectionTestFS still records the first close attempt for crash/recovery simulation.
- Added targeted fault_injection_fs_test coverage for injected metadata-close failures to ensure FaultInjectionTestFS does not retry the inner Close() path.
Reviewed By: anand1976
Differential Revision: D104959945
meta-codesync Bot
pushed a commit
that referenced
this pull request
May 22, 2026
Summary: Pull Request resolved: #14757 Pull Request resolved: #14750 Prerequisite for multi-DB stress test support where each StressTest instance owns one DB. Moves fault injection from a single global to per-StressTest instance so each DB gets isolated fault injection; errors injected into one DB do not leak to others. FS/Env architecture change: Before (upstream): raw_fs → FaultInjectionTestFS (global) → DbStressFSWrapper → db_stress_env → DB::Open After (this diff): Global: raw_env (no wrappers) Used outside StressTest: non-FS ops (threads, time, sleep), test framework FS setup (dirs), DB destruction Used inside StressTest: cleanup ops that must succeed (external file delete) Per StressTest: raw_fs → DbStressFSWrapper (db_stress_fs_, always) → FaultInjectionTestFS (db_fault_injection_fs_, optional) → CompositeEnvWrapper (db_env_, always) → DB::Open Expected values state: Env::Default() (always local PosixEnv even when raw_env is remote) FS layer order swapped (DbStressFSWrapper now innermost). Safe because: - Error injection returns early; inner wrapper never executes (same behavior) - DbStressFSWrapper assertions do not modify data (checksum validation, IOActivity checks) - MANIFEST rename tracking slightly better for crash simulation in new order Stored members (per StressTest): - db_stress_fs_: DbStressFSWrapper. Always active regardless of fault injection flags. - db_fault_injection_fs_: FaultInjectionTestFS. Only when fault injection flags set. Direct access for Enable/Disable/SetThreadLocal. - db_env_: CompositeEnvWrapper. Always present. THE env for all DB I/O (options_.env). Eliminated globals: db_stress_env → renamed to raw_env (no wrappers, DbStressFSWrapper moved to per-StressTest); db_stress_listener_env, db_stress_raw_fs removed; fault_fs_guard, fault_env_guard moved to per-StressTest. Other changes: - CleanupOutputDirectory simplified (uses raw_env, no disable/enable needed) - SstFileManager recreated with per-StressTest env in Open() - Remote compaction override env uses options.env directly (fixes pre-existing silent bug) - Comments added: Env::Default() always local, DbStressDestroyDb MANIFEST explanation, fault injection log path in TEST_TMPDIR - TestFSWritableFile::Close() now mirrors the production FSWritableFile close boundary. After the first Close() attempt, later explicit or destructor Close() calls are wrapper-level no-ops, while FaultInjectionTestFS still records the first close attempt for crash/recovery simulation. - Added targeted fault_injection_fs_test coverage for injected metadata-close failures to ensure FaultInjectionTestFS does not retry the inner Close() path. Reviewed By: anand1976 Differential Revision: D104959945 fbshipit-source-id: 7cf9bb494dec2b372528d5f119c023b6d392ffca
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary:
Prerequisite for multi-DB stress test support where each StressTest instance owns one DB. Moves fault injection from a single global to per-StressTest instance so each DB gets isolated fault injection — errors injected into one DB do not leak to others.
FS/Env architecture change:
Before (upstream):
raw_fs → FaultInjectionTestFS (global) → DbStressFSWrapper → db_stress_env → DB::Open
After (this diff):
Global: raw_env (no wrappers)
Used outside StressTest: non-FS ops (threads, time, sleep), test framework FS setup (dirs), DB destruction
Used inside StressTest: cleanup ops that must succeed (external file delete)
FS layer order swapped (DbStressFSWrapper now innermost). Safe because:
Stored members (per StressTest):
Eliminated globals: db_stress_env → renamed to raw_env (no wrappers, DbStressFSWrapper moved to per-StressTest); db_stress_listener_env, db_stress_raw_fs removed; fault_fs_guard, fault_env_guard moved to per-StressTest.
Other changes:
Differential Revision: D104959945