Skip to content

Add async I/O release UAF regression coverage#14844

Closed
xingbowang wants to merge 1 commit into
facebook:mainfrom
xingbowang:2026_06_09_main
Closed

Add async I/O release UAF regression coverage#14844
xingbowang wants to merge 1 commit into
facebook:mainfrom
xingbowang:2026_06_09_main

Conversation

@xingbowang

@xingbowang xingbowang commented Jun 10, 2026

Copy link
Copy Markdown
Contributor

Summary

  • Add a controlled async filesystem regression test for released direct I/O async reads, verifying released handles are aborted before a later Poll() can complete them.
  • Add a MultiScan-style regression test for skipping a pending range remainder before seeking into a later async read.
  • Extend db_stress/crashtest coverage so MultiScan can stop early after --num_iterations results and randomize --multiscan_use_async_io again.

Test Plan

CI

@meta-cla meta-cla Bot added the CLA Signed label Jun 10, 2026
@github-actions

github-actions Bot commented Jun 10, 2026

Copy link
Copy Markdown

✅ clang-tidy: No findings on changed lines

Completed in 82.2s.

Add a controlled async filesystem regression test that models io_uring completing a released async read while polling another read. Without the ReleaseBlock async-cancellation fix, the test reproduces the stale callback as an ASAN heap-use-after-free in RandomAccessFileReader::ReadAsyncCallback; with the fix, the released request is aborted before Poll can complete it.

Add MultiScan async IO stress coverage

Have db_stress sometimes abandon a prepared MultiScan after --num_iterations results, which exercises cleanup of prefetched blocks when callers do not drain all ranges. Re-enable randomized --multiscan_use_async_io in db_crashtest so crash stress can cover the async MultiScan path again.
@meta-codesync

meta-codesync Bot commented Jun 10, 2026

Copy link
Copy Markdown

@xingbowang has imported this pull request. If you are a Meta employee, you can view this in D108191370.

@github-actions

Copy link
Copy Markdown

🟡 Codex Code Review

Auto-triggered after CI passed — reviewing commit d668ad8


Codex review failed before producing findings.

WARNING: proceeding, even though we could not create PATH aliases: Refusing to create helper binaries under temporary dir "/tmp" (codex_home: AbsolutePathBuf("/tmp/codex-home"))
error: the argument '--base <BRANCH>' cannot be used with '[PROMPT]'

Usage: codex exec review --commit <SHA> --base <BRANCH> --title <TITLE> --model <MODEL> --config <key=value> --dangerously-bypass-approvals-and-sandbox --output-last-message <FILE> [PROMPT]

For more information, try '--help'.

ℹ️ About this response

Generated by Codex CLI.
Review methodology: claude_md/code_review.md

Limitations:

  • Codex may miss context from files not in the diff
  • Large PRs may be truncated
  • Always apply human judgment to AI suggestions

Commands:

  • /codex-review [context] — Request a code review
  • /codex-query <question> — Ask about the PR or codebase
@github-actions

Copy link
Copy Markdown

✅ Claude Code Review

Auto-triggered after CI passed — reviewing commit d668ad8


Summary

Well-structured PR adding valuable regression test coverage for async I/O use-after-free bugs. The ControlledAsyncFS test infrastructure is well-designed and the two new regression tests exercise critical UAF scenarios. The db_stress early-exit logic and async IO re-enablement are reasonable additions.

High-severity findings (0):

No high-severity findings.

Full review (click to expand)

Findings

🟡 MEDIUM

M1. ControlledAsyncFS::AbortIO skips callback for handles not in pending list -- util/io_dispatcher_test.cc:283
  • Issue: When AbortIO() is called with a handle that is NOT in pending_handles_ (e.g., already completed by a stray Poll()), the continue skips the callback invocation. The filesystem contract (file_system.h:813-815) states "AbortIO implementation should ensure that all the read requests related to io_handles should be aborted and it should call the callback for these io_handles."
  • Root cause: The if (it == pending_handles_.end()) { continue; } guard silently skips handles that were already completed. While this is acceptable in this controlled test context (the test carefully manages which handles are pending), it does not strictly match the filesystem contract.
  • Suggested fix: Add a comment documenting the deviation from the full contract.
M2. ControlledAsyncFS lacks thread safety -- util/io_dispatcher_test.cc:233
  • Issue: pending_handles_ vector, abort_count_, and stray_completion_count_ are accessed without synchronization. If ReadSet or IODispatcher invokes Poll(), AbortIO(), or ReadAsync() from different threads, this would be a data race.
  • Root cause: The test infrastructure assumes single-threaded usage, which is valid for the current tests but not documented.
  • Suggested fix: Add a comment at the class level: // Thread-safety: single-threaded test use only.
M3. Premature re-enablement of multiscan_use_async_io -- tools/db_crashtest.py:524
  • Issue: The TODO comment that was removed said "re-enable after resolving 'Req failed: Unknown error -14' errors". The PR re-enables the flag but doesn't mention whether the underlying errors were resolved.
  • Suggested fix: Add a sentence to the PR description confirming the "-14" errors are resolved by the abort-on-release fix, or link to the fix commit.

🟢 LOW / NIT

L1. ControlledAsyncRandomAccessFile does not override MultiRead -- util/io_dispatcher_test.cc:210
  • Issue: If the IODispatcher falls back to sync reads, MultiRead calls bypass ControlledAsyncFS's tracking since ReadTrackingRandomAccessFile is never created in this inheritance path.
  • Suggested fix: Not a bug for current tests (async-only), but worth noting if reused.
L2. Test uses assert instead of ASSERT_* -- util/io_dispatcher_test.cc:258
  • Issue: assert(!io_handles.empty()) in Poll() will be stripped in release builds.
  • Suggested fix: Use ASSERT_FALSE or handle gracefully.
L3. Early-exit variables could be const -- db_stress_tool/db_stress_test_base.cc:2093
  • Issue: early_exit and abandon_scan_after_early_exit are set once and never modified.
  • Suggested fix: const bool early_exit = thread->rand.OneIn(2);

Cross-Component Analysis

Context Affected? Assessment
No io_uring YES Tests properly skip with ROCKSDB_GTEST_SKIP
Production code NO All changes are test-only
Sync fallback Partially ControlledAsyncRandomAccessFile doesn't track MultiRead (L1)
db_stress early exit YES Correctness verified -- cmp_iter stays synchronized via Seek on next range
Handle lifecycle OK del_fn is called by ReadSet::DeleteAsyncIOHandle() after AbortIO

Assumption stress-test:

  • Claim: "ReleaseBlock(0) ensures Poll() never completes block 0." Verified: ReleaseAsyncIOForBlock calls fs_->AbortIO() which removes handle from pending_handles_, then DeleteAsyncIOHandle calls del_fn. Subsequent Poll() cannot find the handle.
  • Claim: "Early exit doesn't cause verification failures." Verified: break happens before iter->Next(), so both iterators are at the same verified position. Next range re-Seeks both.

Positive Observations

  • The ControlledAsyncFS design is elegant -- it uses real file reads for data while controlling only the async scheduling, avoiding the complexity of mocking file content.
  • The complete_stray_first_ flag is a clever way to deterministically test io_uring's out-of-order completion behavior.
  • The CreateAndOpenSST signature extension with env_override as the last default parameter is backward-compatible and clean.
  • The early-exit op_logs ("E" for abandon, "R" for range-skip) provide good debugging context.
  • The test properly separates two UAF scenarios (released block completed by stray vs. skipped range remainder blocking later read).

ℹ️ About this response

Generated by Claude Code.
Review methodology: claude_md/code_review.md

Limitations:

  • Claude may miss context from files not in the diff
  • Large PRs may be truncated
  • Always apply human judgment to AI suggestions

Commands:

  • /claude-review [context] — Request a code review
  • /claude-query <question> — Ask about the PR or codebase
@meta-codesync meta-codesync Bot closed this in 4975341 Jun 10, 2026
@meta-codesync meta-codesync Bot added the Merged label Jun 10, 2026
@meta-codesync

meta-codesync Bot commented Jun 10, 2026

Copy link
Copy Markdown

@xingbowang merged this pull request in 4975341.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

1 participant