Skip to content

Add read-scoped block buffers for scan reads#14806

Closed
xingbowang wants to merge 22 commits into
facebook:mainfrom
xingbowang:codex/multiscan-retained-provider
Closed

Add read-scoped block buffers for scan reads#14806
xingbowang wants to merge 22 commits into
facebook:mainfrom
xingbowang:codex/multiscan-retained-provider

Conversation

@xingbowang

@xingbowang xingbowang commented May 30, 2026

Copy link
Copy Markdown
Contributor

Summary

Add read-scoped block buffers for scan reads. Introduce an experimental read-scoped block buffer provider API, configured through ReadOptions::read_scoped_block_buffer_provider, so supported block-based table iterator scans and MultiScan data-block reads can use caller-provided read-scoped storage for final data-block contents.

When configured, supported provider-backed scan data-block reads bypass the data-block cache while preserving normal index/filter block-cache behavior. Known-uncompressed reads can attach provider cleanup to provider-backed read buffers without copying. Compressed reads decompress directly into provider-backed output, while maybe-compressed reads that turn out to be uncompressed copy once into provider-backed final contents. mmap reads ignore the provider.

Extend AlignedBuffer and RandomAccessFileReader direct-I/O paths to support external aligned allocations, then use that support for read-scoped iterator, async I/O, and MultiRead scratch buffers. Centralize read-scoped I/O policy, keep coalesced async reads safe when blocks are released before completion, and validate provider lease contracts.

Add focused coverage for read-scoped ownership, compressed and uncompressed blocks, direct I/O, data-block cache bypass behavior, invalid provider leases, async release handling, and stress-test provider invariants. Add public API release notes for read-scoped block buffers.

Bonus change: Fixed a flaky test in ReserveThread

Testing

  • CI
@meta-cla meta-cla Bot added the CLA Signed label May 30, 2026
@github-actions

github-actions Bot commented May 30, 2026

Copy link
Copy Markdown

⚠️ clang-tidy: 2 warning(s) on changed lines

Completed in 470.6s.

Summary by check

Check Count
bugprone-argument-comment 2
Total 2

Details

file/random_access_file_reader_test.cc (1 warning(s))
file/random_access_file_reader_test.cc:114:21: warning: argument name 'direct_io_buffer' in comment does not match parameter name 'direct_io_buffer_context' [bugprone-argument-comment]
table/block_fetcher.cc (1 warning(s))
table/block_fetcher.cc:359:23: warning: argument name 'direct_io_buffer' in comment does not match parameter name 'direct_io_buffer_context' [bugprone-argument-comment]
@xingbowang xingbowang marked this pull request as draft May 30, 2026 23:45
Introduce the experimental ReadScopedBlockBufferProvider API and ReadOptions::read_scoped_block_buffer_provider so supported block-based table iterator and MultiScan data-block reads can use caller-provided read-scoped storage. Provider-backed scan reads bypass block cache, attach provider cleanup to uncompressed read buffers without copying, and decompress compressed blocks directly into provider-backed output.

Add an independent MultiScanArgs::bypass_block_cache / IODispatcher JobOptions bypass knob for scan reads that should skip data-block cache lookup and insertion while preserving fill_cache=false behavior. Plumb the option through MultiScan, db_bench, db_stress, and IODispatcher.

Extend AlignedBuffer and RandomAccessFileReader direct-I/O paths to support external aligned allocations, then use that support for read-scoped iterator, async I/O, and MultiRead scratch buffers. Centralize read-scoped I/O policy, keep coalesced async reads safe when blocks are released before completion, and validate provider lease contracts.

Add focused coverage for read-scoped ownership, compressed and uncompressed blocks, direct I/O, cache bypass behavior, invalid provider leases, async release handling, and stress-test provider invariants. Add public API release notes for read-scoped block buffers.
@xingbowang xingbowang force-pushed the codex/multiscan-retained-provider branch from 0772171 to 4ae4eab Compare May 31, 2026 11:59
CreateAndOpenSST returns BlockBasedTable instances whose Rep keeps a reference to the InternalKeyComparator passed through TableReaderOptions. Keep that comparator in fixture-owned storage alongside the existing ImmutableOptions and EnvOptions storage so iterator creation does not read a stack object after the helper returns.
RandomAccessFileReader::Read tracks direct-I/O progress in a local current_size variable after adding caller-owned aligned buffers. When the read uses the internal temporary AlignedBuffer and copies into caller scratch, keep the buffer's CurrentSize in sync before calling AlignedBuffer::Read().

This avoids a debug assertion in checksum verification with use_direct_reads when no caller-owned direct I/O buffer is supplied.
Delete copy and move operations for the stress read-scoped provider, whose destructor validates allocation lifetime invariants.

Also split the debug-only cleanup pointer selection in CreateAndPinBlockFromBuffer so clang-tidy no longer sees the initial value overwritten before use.
@github-actions

github-actions Bot commented May 31, 2026

Copy link
Copy Markdown

Codex Code Review - OBSOLETE

Superseded by a newer AI review. Expand to see the original review.

🟡 Codex Code Review

Auto-triggered after CI passed — reviewing commit c5f5998


Codex review failed before producing findings.

WARNING: proceeding, even though we could not update PATH: Refusing to create helper binaries under temporary dir "/tmp" (codex_home: AbsolutePathBuf("/tmp/codex-home"))
error: the argument '--base <BRANCH>' cannot be used with '[PROMPT]'

Usage: codex exec review --commit <SHA> --base <BRANCH> --title <TITLE> --model <MODEL> --config <key=value> --dangerously-bypass-approvals-and-sandbox --output-last-message <FILE> [PROMPT]

For more information, try '--help'.

ℹ️ About this response

Generated by Codex CLI.
Review methodology: claude_md/code_review.md

Limitations:

  • Codex may miss context from files not in the diff
  • Large PRs may be truncated
  • Always apply human judgment to AI suggestions

Commands:

  • /codex-review [context] — Request a code review
  • /codex-query <question> — Ask about the PR or codebase
@meta-codesync

meta-codesync Bot commented May 31, 2026

Copy link
Copy Markdown

@xingbowang has imported this pull request. If you are a Meta employee, you can view this in D106999951.

@github-actions

github-actions Bot commented May 31, 2026

Copy link
Copy Markdown

Claude Code Review - OBSOLETE

Superseded by a newer AI review. Expand to see the original review.

✅ Claude Code Review

Auto-triggered after CI passed — reviewing commit c5f5998


Summary

This PR adds a well-structured ReadScopedBlockBufferProvider API for caller-managed block storage in scan reads, plus bypass_block_cache for MultiScan. The core design is sound, but there are several correctness issues in the IODispatcher integration paths and test coverage gaps.

High-severity findings (3):

  • [io_dispatcher_imp.cc:375-380] ReadSet::SyncRead() hardcodes use_cache=true, ignoring bypass_block_cache and provider settings for fallback reads.
  • [io_dispatcher_imp.cc:1057-1060] ExecuteSyncIO() still uses AlignedBuf aligned_buf with the old MultiRead(AlignedBuf*) API, inconsistent with the new AlignedBuffer& signature.
  • [io_dispatcher_imp.cc:706-715] SubmitJob calls LookupAndPinBlocksInCache unconditionally even when bypass_block_cache=true, defeating the purpose of cache bypass.
Full review (click to expand)

Findings

🔴 HIGH

H1. ReadSet::SyncRead() ignores bypass_block_cache and provider -- io_dispatcher_imp.cc:375-380
  • Issue: SyncRead() is the fallback when a block isn't prefetched or async IO fails. It hardcodes use_cache=true and use_block_cache_for_lookup=true in the call to RetrieveBlock. When bypass_block_cache=true or a ReadScopedBlockBufferProvider is configured, this sync fallback path will still insert/lookup blocks in the block cache, creating inconsistent behavior between prefetched and on-demand reads within the same scan.
  • Root cause: SyncRead was written before bypass_block_cache existed and was not updated by this PR.
  • Suggested fix: Compute use_cache using ShouldUseBlockCacheForIteratorDataBlocks(rep->table_options, job_->job_options.read_options, job_->job_options.bypass_block_cache) and pass that to RetrieveBlock.
H2. ExecuteSyncIO() uses stale AlignedBuf API -- io_dispatcher_imp.cc:1057-1060
  • Issue: The diff changes MultiRead to take AlignedBuffer& instead of AlignedBuf*, but ExecuteSyncIO() still uses AlignedBuf aligned_buf and passes direct_io ? &aligned_buf : nullptr. Since the signature changed to AlignedBuffer&, this code will fail to compile or silently use the wrong overload. Additionally, when a ReadScopedBlockBufferProvider is configured, the sync IO path should use provider-backed storage for direct IO reads.
  • Root cause: ExecuteSyncIO was not updated to match the new MultiRead signature. The diff is truncated so we cannot confirm this was addressed later.
  • Suggested fix: Replace AlignedBuf aligned_buf with AlignedBuffer direct_io_buffer and wire up provider-backed allocation for the sync path.
H3. SubmitJob calls LookupAndPinBlocksInCache when bypass_block_cache=true -- io_dispatcher_imp.cc:706-715
  • Issue: SubmitJob iterates all blocks and calls LookupAndPinBlocksInCache unconditionally. When bypass_block_cache=true, this performs unnecessary cache lookups for every block, adding overhead to a path specifically designed to avoid cache interaction.
  • Root cause: SubmitJob was not updated to check bypass_block_cache before cache lookup.
  • Suggested fix: Guard the cache lookup with ShouldUseBlockCacheForIteratorDataBlocks() or check job->job_options.bypass_block_cache before calling LookupAndPinBlocksInCache.

🟡 MEDIUM

M1. CreateAndPinBlockFromBuffer -- missing read_buffer_cleanup at call sites
  • Issue: The diff adds const SharedCleanablePtr& read_buffer_cleanup to CreateAndPinBlockFromBuffer. The PollAndProcessAsyncIO and ExecuteSyncIO call sites need updating to pass the appropriate cleanup for provider-backed reads. Without this, provider-backed uncompressed blocks in the no-cache path will fail with "read-scoped block buffer provider requires read buffer cleanup".
  • Suggested fix: Ensure both call sites pass the appropriate cleanup from the read buffer's provider lease.
M2. std::function overhead in AlignedBuffer::Allocator -- util/aligned_buffer.h
  • Issue: AlignedBuffer::Allocator uses std::function with type erasure overhead. RocksDB typically avoids std::function on performance-sensitive paths.
  • Suggested fix: Consider a function pointer + context pair, or document that this is acceptable for the scan/provider path.
M3. AsyncIOState missing provider wiring -- io_dispatcher_imp.cc:957-1015
  • Issue: ExecuteAsyncIO still uses the old aligned_buf field and doesn't wire up provider-backed allocation for async reads with direct IO.
  • Suggested fix: Wire up MakeReadScopedAlignedBuffer for the async direct IO path.
M4. Hot-path shared_ptr check in GetReadScopedBlockBufferProvider
  • Issue: GetReadScopedBlockBufferProvider checks a shared_ptr on every block read via ShouldUseBlockCacheForIteratorDataBlocks. Consider caching at iterator construction.

🟢 LOW / NIT

L1. ReadAsync parameter ordering -- random_access_file_reader.h
  • Issue: direct_io_buffer added after dbg creates non-intuitive ordering of two defaulted parameters.
L2. bypass_block_cache vs fill_cache naming confusion -- options.h
  • Issue: The dual cache control mechanisms could confuse users. Consider cross-referencing in docs.

Cross-Component Analysis

Context Code executes? Assumptions hold? Action needed?
WritePreparedTxnDB YES (scans) YES - provider is read-path only Safe
ReadOnly DB YES YES Safe
CompactionService NO N/A Safe
User-defined timestamps YES YES - orthogonal Safe
BlobDB NO - separate path N/A Safe
Concurrent iterators YES Provider must be thread-safe Per API contract
Direct IO + non-aligned reads YES Alignment contract covers this Verify with tests
Immortal tables (mmap) YES Provider bypasses cache, mmap may still be used Verify block_contents_pinned

Positive Observations

  • Well-designed ReadScopedBlockBufferProvider API with clear ownership semantics via SharedCleanablePtr
  • AllocateReadScopedBlockBuffer validation provides defense-in-depth against misbehaving providers
  • StressReadScopedBlockBufferProvider in db_stress provides comprehensive invariant checking
  • AlignedBuffer::AllocateNewBufferWithStatus is a clean extension for external allocation
  • LookupAndPinBlocksInCache graceful null handling is the right approach
  • Good test coverage for direct IO external buffer paths

ℹ️ About this response

Generated by Claude Code.
Review methodology: claude_md/code_review.md

Limitations:

  • Claude may miss context from files not in the diff
  • Large PRs may be truncated
  • Always apply human judgment to AI suggestions

Commands:

  • /claude-review [context] — Request a code review
  • /claude-query <question> — Ask about the PR or codebase
@github-actions

github-actions Bot commented May 31, 2026

Copy link
Copy Markdown

Codex Code Review - OBSOLETE

Superseded by a newer AI review. Expand to see the original review.

🟡 Codex Code Review

Auto-triggered after CI passed — reviewing commit cbf3237


Codex review failed before producing findings.

WARNING: proceeding, even though we could not update PATH: Refusing to create helper binaries under temporary dir "/tmp" (codex_home: AbsolutePathBuf("/tmp/codex-home"))
error: the argument '--base <BRANCH>' cannot be used with '[PROMPT]'

Usage: codex exec review --commit <SHA> --base <BRANCH> --title <TITLE> --model <MODEL> --config <key=value> --dangerously-bypass-approvals-and-sandbox --output-last-message <FILE> [PROMPT]

For more information, try '--help'.

ℹ️ About this response

Generated by Codex CLI.
Review methodology: claude_md/code_review.md

Limitations:

  • Codex may miss context from files not in the diff
  • Large PRs may be truncated
  • Always apply human judgment to AI suggestions

Commands:

  • /codex-review [context] — Request a code review
  • /codex-query <question> — Ask about the PR or codebase
@github-actions

github-actions Bot commented May 31, 2026

Copy link
Copy Markdown

Claude Code Review - OBSOLETE

Superseded by a newer AI review. Expand to see the original review.

✅ Claude Code Review

Auto-triggered after CI passed — reviewing commit cbf3237


Summary

Large, well-structured PR that introduces read-scoped block buffer providers and cache bypass for scan reads. The core concept is sound and the implementation covers many paths (direct I/O, async I/O, compressed/uncompressed blocks, stress testing). However, several correctness gaps exist where the new bypass and provider logic is not consistently applied across all read paths, and the IODispatcher sync fallback ignores the new settings entirely.

High-severity findings (3):

  • [io_dispatcher_imp.cc: SyncRead] ReadSet::SyncRead() hardcodes use_cache=true, ignoring bypass_block_cache and provider settings when async I/O falls back to sync.
  • [io_dispatcher_imp.cc: SubmitJob] SubmitJob() unconditionally calls LookupAndPinBlocksInCache for all blocks, performing cache lookups even when bypass_block_cache=true.
  • [io_dispatcher_imp.cc: ExecuteSyncIO] ExecuteSyncIO() still uses the old AlignedBuf type and passes pointer/nullptr to MultiRead, not the new AlignedBuffer& reference. This will fail to compile against the changed MultiRead signature.

The full review with 5 medium and 5 low findings, cross-component analysis, and positive observations is in review-findings.md.


ℹ️ About this response

Generated by Claude Code.
Review methodology: claude_md/code_review.md

Limitations:

  • Claude may miss context from files not in the diff
  • Large PRs may be truncated
  • Always apply human judgment to AI suggestions

Commands:

  • /claude-review [context] — Request a code review
  • /claude-query <question> — Ask about the PR or codebase
@github-actions

github-actions Bot commented May 31, 2026

Copy link
Copy Markdown

Codex Code Review - OBSOLETE

Superseded by a newer AI review. Expand to see the original review.

🟡 Codex Code Review

Auto-triggered after CI passed — reviewing commit dc9c2e6


Codex review failed before producing findings.

WARNING: proceeding, even though we could not update PATH: Refusing to create helper binaries under temporary dir "/tmp" (codex_home: AbsolutePathBuf("/tmp/codex-home"))
error: the argument '--base <BRANCH>' cannot be used with '[PROMPT]'

Usage: codex exec review --commit <SHA> --base <BRANCH> --title <TITLE> --model <MODEL> --config <key=value> --dangerously-bypass-approvals-and-sandbox --output-last-message <FILE> [PROMPT]

For more information, try '--help'.

ℹ️ About this response

Generated by Codex CLI.
Review methodology: claude_md/code_review.md

Limitations:

  • Codex may miss context from files not in the diff
  • Large PRs may be truncated
  • Always apply human judgment to AI suggestions

Commands:

  • /codex-review [context] — Request a code review
  • /codex-query <question> — Ask about the PR or codebase
@github-actions

github-actions Bot commented May 31, 2026

Copy link
Copy Markdown

Claude Code Review - OBSOLETE

Superseded by a newer AI review. Expand to see the original review.

✅ Claude Code Review

Auto-triggered after CI passed — reviewing commit dc9c2e6


Summary

Large, well-structured PR that introduces read-scoped block buffer providers for scan reads, allowing caller-provided storage to bypass the block cache. The refactoring of AlignedBuf to AlignedBuffer for direct I/O paths is clean and the stress test provider is thorough. However, there are correctness issues in the IODispatcher integration that need attention.

High-severity findings (2):

  • [util/io_dispatcher_imp.cc:SyncRead] ReadSet::SyncRead() hardcodes use_cache=true, defeating bypass_data_block_cache and provider-based cache bypass for blocks that fall back to synchronous reads.
  • [util/io_dispatcher_imp.cc:SubmitJob] SubmitJob() unconditionally calls LookupAndPinBlocksInCache() even when bypass_data_block_cache=true, performing unnecessary cache lookups that the feature is designed to avoid.
Full review (click to expand)

Findings

🔴 HIGH

H1. ReadSet::SyncRead hardcodes use_cache=true, bypassing cache bypass logic -- util/io_dispatcher_imp.cc:356
  • Issue: The existing ReadSet::SyncRead() method calls RetrieveBlock<Block_kData> with use_cache=true and use_block_cache_for_lookup=true. When bypass_data_block_cache=true or a read_scoped_block_buffer_provider is configured, blocks that cannot be served from async I/O or the initial SubmitJob dispatch fall back to SyncRead() via ReadIndex(). These blocks will be read through the cache, violating the caller's intent. The NewDataBlockIterator() path correctly uses ShouldUseDataBlockCacheForIterator(), but SyncRead() does not.
  • Root cause: The SyncRead fallback path was not updated to respect the new cache bypass and provider configurations.
  • Suggested fix: Compute use_cache using ShouldUseDataBlockCacheForIterator(rep->table_options, job_->job_options.read_options, job_->job_options.bypass_data_block_cache) and pass it to RetrieveBlock. When not using cache, also pass the ReadScopedBlockBufferProviderRef to enable provider-backed block storage.
H2. SubmitJob unconditionally performs cache lookup even with bypass -- util/io_dispatcher_imp.cc:713
  • Issue: In SubmitJob(), the loop at line 709 calls LookupAndPinBlocksInCache() for every block handle regardless of bypass_data_block_cache or provider configuration. While the PR correctly changes LookupAndPinBlocksInCache() to handle a null block cache (by returning OK early), when a block cache IS configured but bypass_data_block_cache=true, the function still performs a full cache lookup (including cache key computation, statistics updates, and potential hash table lookup). This is unnecessary overhead for the bypass case and could pollute cache statistics.
  • Root cause: The SubmitJob method pre-dates the bypass feature and was not updated to skip cache lookups.
  • Suggested fix: Guard the cache lookup with the same ShouldUseDataBlockCacheForIterator() check. When bypassing, add all blocks directly to block_indices_to_read without attempting cache lookup.

🟡 MEDIUM

M1. ReadSet destructor memory release logic change may double-release -- util/io_dispatcher_imp.cc:~ReadSet
  • Issue: The old destructor guarded memory release with if (auto dispatcher_data = dispatcher_data_.lock()) and checked block_sizes_[i] > 0 && pinned_blocks_[i].GetValue(). The new code checks pinned_blocks_[i].GetValue() || async_io_map_.find(i) != async_io_map_.end() and calls ReleasePrefetchMemory(i). For blocks that were sync-read during ReadIndex() fallback (Case 3 in ReadIndex), block_sizes_[i] may be set (from SubmitJob) but TryAcquireMemory() was never called. The new destructor would call ReleasePrefetchMemory() for these blocks, potentially releasing memory that was never acquired, corrupting the dispatcher's memory accounting.
  • Suggested fix: ReleasePrefetchMemory() should check whether memory was actually acquired for this block before calling dispatcher_data->ReleaseMemory(). Consider adding a per-block flag indicating whether memory was acquired.
M2. GetBlockContents has_trailer not set for provider-backed uncompressed blocks -- table/block_fetcher.cc:GetBlockContents
  • Issue: In the new provider path within GetBlockContents(), when read_scoped_buf_lease_.cleanup.get() != nullptr && compression_type() == kNoCompression, the code sets contents_->data and contents_->cleanup but does NOT set contents_->has_trailer (the #ifndef NDEBUG block at the end of GetBlockContents only runs for the else branch that goes through heap_buf_). The provider path returns early via the else if branch, skipping the final has_trailer assignment. This could cause debug-mode assertion failures in code that expects has_trailer to be set.
  • Suggested fix: Add #ifndef NDEBUG contents_->has_trailer = footer_.GetBlockTrailerSize() > 0; #endif to the provider branch, matching the pattern used in the non-provider code paths.
M3. AlignedBuffer::AllocateNewBufferWithStatus leaves stale state on error -- util/aligned_buffer.h:AllocateNewBufferWithStatus
  • Issue: If the external allocator returns an error (null data, short buffer, misaligned, etc.), the function returns the error Status but does NOT reset bufstart_, capacity_, or buf_ from their previous values. If the AlignedBuffer had a previous allocation, it retains that stale state. Subsequent calls to BufferStart() or Capacity() would return the old buffer's values, which could lead to use-after-free if the old buffer was already released.
  • Suggested fix: On error, leave the AlignedBuffer state unchanged (which the current code already does -- the issue is only if the allocator partially modifies buf_ via the owner field before returning error). Alternatively, clear internal state on error.
M4. std::shared_ptr in ReadOptions adds atomic overhead on every copy -- include/rocksdb/options.h:ReadOptions
  • Issue: read_scoped_block_buffer_provider is a std::shared_ptr in ReadOptions. ReadOptions is copied frequently in hot paths (e.g., per-iterator creation, per-MultiGet call). Each copy increments/decrements an atomic reference count. This is the first shared_ptr field in ReadOptions, setting a precedent that could accumulate overhead.
  • Suggested fix: Consider using a raw pointer or UnownedPtr with the caller responsible for lifetime, similar to table_index_factory. Alternatively, document that this is intentional for the experimental API.

🟢 LOW / NIT

L1. BlockContents default constructor leaves backing_size uninitialized implicitly -- table/format.h:BlockContents
  • Issue: size_t backing_size = 0 has an in-class initializer, but the copy of BlockContents(const Slice& _data) constructor does not mention it. The in-class initializer handles this correctly in C++11+, but it may be clearer to be explicit in the move assignment operator comment.
L2. CreateAndPinBlockFromBuffer test sync point captures by value unnecessarily -- util/io_dispatcher_imp.cc:CreateAndPinBlockFromBuffer
  • Issue: The #ifndef NDEBUG block creates test_read_buffer_cleanup as a copy of read_buffer_cleanup, then uses effective_read_buffer_cleanup to switch between test and production. The copy involves atomic ref count operations. This is debug-only so performance is not critical, but the pattern is unnecessarily complex.
L3. NOLINTNEXTLINE comment on stress gflag -- db_stress_tool/db_stress_gflags.cc
  • Issue: Only one of the two new DEFINE_bool macros has a NOLINTNEXTLINE suppression. The read_scoped_block_buffer_provider flag definition is missing it, which may trigger a clang-tidy warning.
L4. Missing db_bench support for read_scoped_block_buffer_provider -- tools/db_bench_tool.cc
  • Issue: The PR adds bypass_data_block_cache to db_bench but does NOT add support for configuring read_scoped_block_buffer_provider in db_bench. For benchmarking the provider feature, users would need to modify db_bench. Consider adding a built-in simple provider for benchmarking.

Cross-Component Analysis

Context Affected? Analysis
WritePreparedTxnDB No Provider only affects read paths, not visibility or transactions
ReadOnly DB / SecondaryInstance Safe Provider-backed reads work identically to normal reads
CompactionService / Remote No Provider is on ReadOptions which is per-read, not serialized
User-defined timestamps Safe Provider is orthogonal to key ordering
BlobDB Safe Blob reads use separate BlobFileReader::ReadFromFile, which was correctly updated
FIFO / Universal compaction No Provider only affects scan reads, not compaction
Prefix seek Safe Provider is block-level, independent of seek mode
Concurrent writers Safe Provider allocations use per-provider synchronization
Snapshots Safe Provider is memory management, not visibility

Positive Observations

  • The StressReadScopedBlockBufferProvider is well-designed with comprehensive invariant checking (alignment, size, lifecycle tracking, abort on violation).
  • The AlignedBuffer external allocator design cleanly separates allocation policy from the buffer management, enabling provider-backed direct I/O without duplicating the read loop.
  • The AllocateReadScopedBlockBuffer() validation function properly checks all lease contract requirements (non-null data, sufficient size, alignment, non-null cleanup).
  • The refactoring from AlignedBuf (type alias for unique_ptr<void>) to AlignedBuffer (full class) for direct I/O paths is a good cleanup that makes the ownership model more explicit.
  • The ShouldUseDataBlockCacheForIterator() centralization is a good pattern for consistent cache bypass decisions.
  • Compressed block decompression into provider-backed memory avoids an extra copy that would otherwise be needed.

ℹ️ About this response

Generated by Claude Code.
Review methodology: claude_md/code_review.md

Limitations:

  • Claude may miss context from files not in the diff
  • Large PRs may be truncated
  • Always apply human judgment to AI suggestions

Commands:

  • /claude-review [context] — Request a code review
  • /claude-query <question> — Ask about the PR or codebase
@xingbowang xingbowang marked this pull request as ready for review June 1, 2026 12:07
@xingbowang

Copy link
Copy Markdown
Contributor Author
  • H1: not valid. ReadSet::SyncRead() now computes use_data_block_cache = ShouldUseDataBlockCacheForIterator(...) and passes it to RetrieveBlock(). When cache is bypassed/provider is set, RetrieveBlock() gets use_cache=false, and then passes the ReadScopedBlockBufferProvider into
    ReadAndParseBlockFromFile().
  • H2: not valid. SubmitJob() now checks use_data_block_cache before cache lookup. If false, it directly puts all blocks in block_indices_to_read; LookupAndPinBlocksInCache() is only called in the else branch.
  • M1: not valid as stated. Sync fallback through ReadIndex() moves the sync-read block out of pinned_blocks_ before returning, so the destructor’s pinned_blocks_[i].GetValue() condition will not fire for successfully consumed fallback blocks. Pending never-dispatched blocks also are not in async_io_map_,
    so they are not released.
  • M2: not valid. GetBlockContents() sets contents_->has_trailer after the whole if / else if / else chain, so the provider-backed uncompressed branch also gets it in debug builds.
  • M3: not valid as stated. AllocateNewBufferWithStatus() validates the local ExternalAllocation first and only assigns bufstart_, capacity_, cursize_, and buf_ after all checks pass. On error, the previous AlignedBuffer state is intentionally unchanged and still owned by buf_.
@meta-codesync

meta-codesync Bot commented Jun 1, 2026

Copy link
Copy Markdown

@xingbowang has imported this pull request. If you are a Meta employee, you can view this in D106999951.

@xingbowang

Copy link
Copy Markdown
Contributor Author

For M4, it is true that we are doing 1-3 copies on ReadOptions depending on the public API

  • NewIterator: 2 copies
    • public _read_options -> local read_options
    • local read_options -> ArenaWrappedDBIter::read_options_
  • NewMultiScan: 3 copies
    • public read_options -> MultiScan::read_options
    • MultiScan::read_options_ -> DBImpl::NewIterator() local read_options
    • local read_options -> ArenaWrappedDBIter::read_options_
  • Get: 1 copy
    • public _read_options -> local read_options
  • MultiGet: 1 copy
    • public _read_options -> local read_options
  • NewIterators: 2 copies per iterator path
    • public _read_options -> local read_options
    • local read_options -> each ArenaWrappedDBIter::read_options_

There are multiple options to avoid this.

  • Option 1: limit the feature to iterator only
    • Bind the shared_ptr to DBIterator, then set it as a private var in ReadOptions with raw pointer type
    • Pro: safe. no copy. limit the scope to iterator, which is what we need right now.
    • Con: Make it harder to support this in future for Get and MultiGet. Private var in public API ReadOptions is strange. Some feature uses BlockBasedIterator directly would require separate plumbing.
  • Option 2: Use a new ReadScope like what we have for snapshot and bounds.
    • Create a ReadScope object to hold the shared_ptr life time, then use a raw pointer to ReadScope inside ReadOptions.
    • Pro: Similar to snapshot and bounds
    • Con: It is similar to using a raw pointer.

Comparing raw pointer with ReadScope, the difference is semantic:

  • Raw ReadScopedBlockBufferProvider*: exposes one implementation detail directly in ReadOptions.
  • Raw ReadScope*: exposes a lifetime container / read context. The provider is one capability owned by that context.

Why ReadScope* is cleaner:

  • It makes lifetime explicit: “this scope must outlive the iterator/read operation.”
  • It can later hold other read-scoped state, not just the block buffer provider.
  • It gives Get/MultiGet a natural future API without adding new parameters.
  • It avoids making ReadOptions look like it owns provider memory or that the provider applies broadly everywhere.

So implementation-wise both are raw pointers; API-wise ReadScope* is a better abstraction.

@meta-codesync

meta-codesync Bot commented Jun 1, 2026

Copy link
Copy Markdown

@xingbowang has imported this pull request. If you are a Meta employee, you can view this in D106999951.

@github-actions

github-actions Bot commented Jun 1, 2026

Copy link
Copy Markdown

Codex Code Review - OBSOLETE

Superseded by a newer AI review. Expand to see the original review.

🟡 Codex Code Review

Auto-triggered after CI passed — reviewing commit 5036514


Codex review failed before producing findings.

WARNING: proceeding, even though we could not update PATH: Refusing to create helper binaries under temporary dir "/tmp" (codex_home: AbsolutePathBuf("/tmp/codex-home"))
error: the argument '--base <BRANCH>' cannot be used with '[PROMPT]'

Usage: codex exec review --commit <SHA> --base <BRANCH> --title <TITLE> --model <MODEL> --config <key=value> --dangerously-bypass-approvals-and-sandbox --output-last-message <FILE> [PROMPT]

For more information, try '--help'.

ℹ️ About this response

Generated by Codex CLI.
Review methodology: claude_md/code_review.md

Limitations:

  • Codex may miss context from files not in the diff
  • Large PRs may be truncated
  • Always apply human judgment to AI suggestions

Commands:

  • /codex-review [context] — Request a code review
  • /codex-query <question> — Ask about the PR or codebase
@github-actions

github-actions Bot commented Jun 1, 2026

Copy link
Copy Markdown

Claude Code Review - OBSOLETE

Superseded by a newer AI review. Expand to see the original review.

✅ Claude Code Review

Auto-triggered after CI passed — reviewing commit 5036514


Summary

This PR adds a well-structured ReadScopedBlockBufferProvider API for scan reads, cache bypass for MultiScan, and extends AlignedBuffer/RandomAccessFileReader for external aligned allocations. The design is sound overall, but the diff is truncated so some areas could not be fully verified. Two high-severity issues were identified in the IODispatcher integration.

High-severity findings (2):

  • [io_dispatcher_imp.cc] ReadSet::SyncRead() always passes use_cache=true, ignoring bypass_data_block_cache and provider settings -- blocks that fall back to sync read will use the cache even when bypass is configured.
  • [io_dispatcher_imp.cc] ExecuteSyncIO() still uses old AlignedBuf + pointer-based MultiRead API; must be updated to use AlignedBuffer& reference and provider-backed buffers when configured.
Full review (click to expand)

Findings

🔴 HIGH

H1. ReadSet::SyncRead() ignores cache bypass -- util/io_dispatcher_imp.cc:394
  • Issue: SyncRead() calls RetrieveBlock<Block_kData>() with hardcoded use_cache=true and use_block_cache_for_lookup=true. When bypass_data_block_cache=true or a read-scoped provider is configured, blocks that couldn't be prefetched fall through to SyncRead() and re-enter the cache path.
  • Root cause: ShouldUseDataBlockCacheForIterator() is integrated into NewDataBlockIterator and CreateAndPinBlockFromBuffer but not SyncRead.
  • Suggested fix: Compute use_cache from ShouldUseDataBlockCacheForIterator() and pass it to RetrieveBlock. For the no-cache path, construct the block directly.
H2. ExecuteSyncIO() not updated for new MultiRead signature -- util/io_dispatcher_imp.cc:1072
  • Issue: MultiRead changed from AlignedBuf* to AlignedBuffer&. ExecuteSyncIO() still passes direct_io ? &aligned_buf : nullptr, which won't compile. When a provider is configured, this path should also use provider-backed buffers.
  • Suggested fix: Replace with AlignedBuffer direct_io_buffer; and pass by reference. Propagate provider cleanup to CreateAndPinBlockFromBuffer.

🟡 MEDIUM

M1. ExecuteAsyncIO() may not use provider-backed direct IO buffer
  • Issue: Async IO path calls ReadAsync() with &async_state->aligned_buf. The AsyncIOState gains new fields but it's unclear from the truncated diff if ExecuteAsyncIO() fully uses provider-backed buffers.
M2. LookupAndPinBlocksInCache still runs for all blocks before bypass check
  • Issue: SubmitJob() does cache lookup for ALL blocks unconditionally, even with bypass_data_block_cache=true.
  • Suggested fix: Guard with ShouldUseDataBlockCacheForIterator().
M3. AlignedBuffer::AllocateNewBufferWithStatus() doesn't support copy_data
  • Issue: Only handles fresh allocation; if a caller needs data preservation with a custom allocator, data will be lost.
M4. std::function overhead in every AlignedBuffer instance
  • Issue: 32+ bytes added to all AlignedBuffer instances for the common no-provider case.
M5. Raw pointer ReadOptions::read_scoped_block_buffer_provider lifetime risk
  • Issue: ReadOptions is copied throughout the stack. Provider lifetime is the caller's responsibility but hard to uphold given internal copying.

🟢 LOW / NIT

L1. ReadScopedBlockBufferProviderRef type verbosity -- could use a raw pointer
L2. MakeReadScopedAlignedBuffer lambda captures by value unnecessarily
L3. Missing NOLINTNEXTLINE on read_scoped_block_buffer_provider gflag
L4. BlockContents::operator=(BlockContents&&) not marked noexcept

Cross-Component Analysis

Context Affected? Assessment
WritePreparedTxnDB No Orthogonal to read_callback
ReadOnly DB Safe Read-only operation
CompactionService Safe Compaction uses own ReadOptions
Compaction reads Safe Provider not set in compaction ReadOptions
Secondary cache Safe Provider bypasses data-block cache entirely

Positive Observations

  • Clean provider/cache separation via BlockContents::cleanup vs allocation
  • Comprehensive validation in AllocateReadScopedBlockBuffer()
  • Zero-copy uncompressed blocks and direct decompression into provider storage
  • Thorough stress test provider with leak detection
  • bypass_data_block_cache properly propagated through all MultiScanArgs operations

ℹ️ About this response

Generated by Claude Code.
Review methodology: claude_md/code_review.md

Limitations:

  • Claude may miss context from files not in the diff
  • Large PRs may be truncated
  • Always apply human judgment to AI suggestions

Commands:

  • /claude-review [context] — Request a code review
  • /claude-query <question> — Ask about the PR or codebase
Document that successful ReadScopedBlockBufferProvider leases are cleaned up even when later I/O or decompression fails.

Ignore the provider when mmap reads are enabled, and pass mmap eligibility through the iterator and MultiScan data-block cache/provider decisions.

Avoid provider-backed serialized read scratch for possibly compressed data blocks. Compressed blocks now use normal read scratch and decompress into provider-backed final block contents; actual uncompressed blocks from possibly compressed tables are copied once into provider-backed final contents.

Add dispatcher and iterator coverage for mmap ignoring the provider, and verify compressed direct reads do not use aligned provider scratch.
Document that the provider backs final data-block contents, while RocksDB may still use ordinary temporary scratch for serialized block bytes when a block may be compressed.

Call out that mmap reads ignore the provider and that data-block cache bypass options do not affect index/filter blocks.
Use ReadOptions::read_scoped_block_buffer_provider as the signal for supported scan data-block reads to bypass the data-block cache. Remove MultiScanArgs::bypass_data_block_cache and the corresponding IODispatcher job option, stress flag, and db_bench flag.

Update IODispatcher tests to exercise the provider-backed cache bypass path directly and keep provider ownership checks after read sets are released.
Remove stale wording left from the explicit MultiScan data-block cache bypass option. Rename local test variables so the cache-bypass tests describe provider-backed jobs/read sets instead of a bypass option.
Document ReadScopedBlockBufferProvider::Lease in terms of final provider-backed BlockContents ownership instead of enumerating implementation-specific ways RocksDB may populate the allocation.
Describe the provider as one API configured through ReadOptions and clarify that supported block-based table iterator scans and MultiScan reads share the provider-backed data-block cache bypass behavior.
Explain why read_buffer_requires_cleanup exists: known-uncompressed provider-backed file-read buffers must attach their cleanup directly to BlockContents, while maybe-compressed reads can use ordinary scratch and copy/decompress into provider-backed final contents later.
…ained-provider

# Conflicts:
#	util/io_dispatcher_imp.cc
#	util/io_dispatcher_test.cc
ReserveThreads() depends on the background thread pool's waiting-thread accounting. Under high process-level contention, the test can reach a sync point before the expected number of threads is reservable, producing transient partial reservations such as 1 instead of 2 or 0 instead of 1.

Replace the cleanup-only workaround with an eventual assertion for positive reservation checks. Partial successful reservations are released before retrying so the retry does not perturb the test state, while zero-capacity checks remain immediate.
…ained-provider

# Conflicts:
#	unreleased_history/bug_fixes/compaction_service_inaccurate_input_record_count
@meta-codesync

meta-codesync Bot commented Jun 6, 2026

Copy link
Copy Markdown

@xingbowang has imported this pull request. If you are a Meta employee, you can view this in D106999951.

@github-actions

github-actions Bot commented Jun 6, 2026

Copy link
Copy Markdown

Codex Code Review - OBSOLETE

Superseded by a newer AI review. Expand to see the original review.

🟡 Codex Code Review

Auto-triggered after CI passed — reviewing commit 7c8c8e5


Codex review failed before producing findings.

WARNING: proceeding, even though we could not update PATH: Refusing to create helper binaries under temporary dir "/tmp" (codex_home: AbsolutePathBuf("/tmp/codex-home"))
error: the argument '--base <BRANCH>' cannot be used with '[PROMPT]'

Usage: codex exec review --commit <SHA> --base <BRANCH> --title <TITLE> --model <MODEL> --config <key=value> --dangerously-bypass-approvals-and-sandbox --output-last-message <FILE> [PROMPT]

For more information, try '--help'.

ℹ️ About this response

Generated by Codex CLI.
Review methodology: claude_md/code_review.md

Limitations:

  • Codex may miss context from files not in the diff
  • Large PRs may be truncated
  • Always apply human judgment to AI suggestions

Commands:

  • /codex-review [context] — Request a code review
  • /codex-query <question> — Ask about the PR or codebase
@github-actions

github-actions Bot commented Jun 6, 2026

Copy link
Copy Markdown

Claude Code Review - OBSOLETE

Superseded by a newer AI review. Expand to see the original review.

✅ Claude Code Review

Auto-triggered after CI passed — reviewing commit 7c8c8e5


Summary

This PR adds an experimental ReadScopedBlockBufferProvider API for scan reads to use caller-provided storage for data-block contents, bypassing the data-block cache. The changes are well-structured and cover the full read path (sync, async, direct I/O, compressed, uncompressed). The design follows existing RocksDB patterns (raw pointer in ReadOptions, provider abstraction). A few issues were identified, mostly around performance overhead and edge cases.

High-severity findings (2):

  • [util/aligned_buffer.h] New std::function<...> allocator_ member adds ~32 bytes to every AlignedBuffer instance, including the vast majority that never use an external allocator.
  • [table/block_based/block_based_table_reader.cc:1831] LookupAndPinBlocksInCache assert replaced with early OK return -- behavioral change needs careful caller verification.
Full review (click to expand)

Findings

🔴 HIGH

H1. AlignedBuffer std::function allocator adds memory overhead to all instances -- util/aligned_buffer.h
  • Issue: The new Allocator allocator_ field (std::function<Status(size_t, size_t, ExternalAllocation*)>) is stored in every AlignedBuffer instance. A std::function typically occupies 32-48 bytes. The overwhelming majority of AlignedBuffer instances (in FilePrefetchBuffer, BlockFetcher, WritableFileWriter, SequentialFileReader, ReadaheadRandomAccessFile, etc.) never use an external allocator.
  • Root cause: The allocator is stored as an instance member rather than being passed only when needed.
  • Suggested fix: Consider an alternative design that avoids per-instance overhead:
    1. Pass the allocator as a parameter to AllocateNewBufferUsingAllocator instead of storing it.
    2. Use a raw function pointer + void* context (8+8 bytes, and zero when null) instead of std::function.
    3. Create a derived AlignedBufferWithAllocator class.
      Note that FSAllocationPtr buf_ already uses std::function as the deleter, so AlignedBuffer already has that cost; this adds a second std::function member.
H2. LookupAndPinBlocksInCache assert replaced with early OK return -- block_based_table_reader.cc:1831
  • Issue: The old code had assert(block_cache) enforcing that this method is only called when a block cache exists. The PR replaces it with if (!block_cache) return Status::OK(). This changes the contract: callers (block_based_table_iterator.cc:927, io_dispatcher_imp.cc:739) now silently get an empty CachableEntry and proceed as if the lookup "succeeded" but found nothing.
  • Root cause: When the provider bypasses data-block cache, some code paths may still call LookupAndPinBlocksInCache. The assert would fire, so it was relaxed.
  • Suggested fix: This is likely correct behavior for the provider case, but add a comment explaining why the early return is safe and under what conditions block_cache can be null. Also consider adding a TEST_SYNC_POINT to validate in tests.

🟡 MEDIUM

M1. Cache bypass is all-or-nothing when provider is set
  • Issue: ShouldUseDataBlockCacheForIterator returns false whenever a provider is configured, meaning data blocks are never cached. There is no option to populate the cache alongside the provider.
  • Suggested fix: Document this trade-off clearly or consider a future option.
M2. Extra memcpy for maybe-compressed blocks that turn out uncompressed -- block_fetcher.cc
  • Issue: For maybe_compressed_ blocks, the provider is not used in PrepareBufferForBlockFromFile; instead, GetBlockContents copies to the provider via CopyBufferToReadScopedBlockContents. This adds an extra copy compared to the non-provider path.
  • Suggested fix: Document the expected cost.
M3. ReadAsync stores raw direct_io_buffer pointer in ReadAsyncInfo -- random_access_file_reader.cc
  • Issue: The direct_io_buffer_ pointer stored in ReadAsyncInfo is used in the async callback. If the caller destroys the buffer before the callback fires, this is use-after-free.
  • Suggested fix: Add a comment documenting the lifetime requirement.
M4. BlockContents dual ownership model -- table/format.h
  • Issue: BlockContents now has two ownership mechanisms: allocation and cleanup. Add an assertion that both are not set simultaneously.
M5. PR scope is large (36 files, 2342 insertions)
  • Suggested fix: Consider splitting in future iterations.

🟢 LOW / NIT

L1. Comments still reference AllocateNewBuffer -- util/aligned_buffer.h:54-57
L2. Raw pointer not documented for serialization/remote compaction contexts
L3. Stress test provider not enabled by default in any CI configuration

Cross-Component Analysis

Context Executes? Safe? Action?
WritePreparedTxnDB YES YES Safe
ReadOnly DB YES YES Safe
Remote compaction NO (raw ptr) N/A Document
mmap reads Ignored YES Safe
Get/MultiGet NO N/A Documented

Positive Observations

  • Clean Lease abstraction with SharedCleanablePtr lifetime management
  • Consistent with existing ReadOptions patterns (raw pointers)
  • Thorough stress test provider with alignment/leak/double-free validation
  • Good error propagation through provider allocation failures
  • Correct mmap bypass at GetReadScopedBlockBufferProvider level
  • Pragmatic flaky test fix for ReserveThreads
  • Good new direct I/O test coverage

ℹ️ About this response

Generated by Claude Code.
Review methodology: claude_md/code_review.md

Limitations:

  • Claude may miss context from files not in the diff
  • Large PRs may be truncated
  • Always apply human judgment to AI suggestions

Commands:

  • /claude-review [context] — Request a code review
  • /claude-query <question> — Ask about the PR or codebase
@meta-codesync

meta-codesync Bot commented Jun 6, 2026

Copy link
Copy Markdown

@xingbowang has imported this pull request. If you are a Meta employee, you can view this in D106999951.

@github-actions

github-actions Bot commented Jun 6, 2026

Copy link
Copy Markdown

Codex Code Review - OBSOLETE

Superseded by a newer AI review. Expand to see the original review.

🟡 Codex Code Review

Auto-triggered after CI passed — reviewing commit 3813ab2


Codex review failed before producing findings.

WARNING: proceeding, even though we could not update PATH: Refusing to create helper binaries under temporary dir "/tmp" (codex_home: AbsolutePathBuf("/tmp/codex-home"))
error: the argument '--base <BRANCH>' cannot be used with '[PROMPT]'

Usage: codex exec review --commit <SHA> --base <BRANCH> --title <TITLE> --model <MODEL> --config <key=value> --dangerously-bypass-approvals-and-sandbox --output-last-message <FILE> [PROMPT]

For more information, try '--help'.

ℹ️ About this response

Generated by Codex CLI.
Review methodology: claude_md/code_review.md

Limitations:

  • Codex may miss context from files not in the diff
  • Large PRs may be truncated
  • Always apply human judgment to AI suggestions

Commands:

  • /codex-review [context] — Request a code review
  • /codex-query <question> — Ask about the PR or codebase
@github-actions

github-actions Bot commented Jun 6, 2026

Copy link
Copy Markdown

Claude Code Review - OBSOLETE

Superseded by a newer AI review. Expand to see the original review.

✅ Claude Code Review

Auto-triggered after CI passed — reviewing commit 3813ab2


Summary

Well-structured PR that adds an experimental read-scoped block buffer provider API for scan reads. The implementation correctly integrates with BlockFetcher, decompression, direct I/O, and cache bypass logic. The dual ownership model in BlockContents follows established patterns (similar to PinnableSlice). All identified call sites are updated. MultiGet/Get exclusion is intentional per the TODO. The main concerns are around missing integration-level tests for the table reader path, std::function overhead in direct I/O allocator, and ensuring async buffer lifetime safety.

High-severity findings (0):

No high-severity findings.

Medium findings (5): M1: Missing integration test for table scan with provider. M2: std::function overhead in AlignedBuffer::Allocator. M3: ReadAsyncInfo::direct_io_buffer_ raw pointer lifetime. M4: RetrieveMultipleBlocks intentionally does not pass provider (document limitation). M5: LookupAndPinBlocksInCache null cache contract relaxation.

The full review with detailed findings, cross-component analysis, and positive observations is in review-findings.md.


ℹ️ About this response

Generated by Claude Code.
Review methodology: claude_md/code_review.md

Limitations:

  • Claude may miss context from files not in the diff
  • Large PRs may be truncated
  • Always apply human judgment to AI suggestions

Commands:

  • /claude-review [context] — Request a code review
  • /claude-query <question> — Ask about the PR or codebase
@meta-codesync

meta-codesync Bot commented Jun 8, 2026

Copy link
Copy Markdown

@xingbowang has imported this pull request. If you are a Meta employee, you can view this in D106999951.

@github-actions

github-actions Bot commented Jun 8, 2026

Copy link
Copy Markdown

🟡 Codex Code Review

Auto-triggered after CI passed — reviewing commit 3c58f6f


Codex review failed before producing findings.

WARNING: proceeding, even though we could not update PATH: Refusing to create helper binaries under temporary dir "/tmp" (codex_home: AbsolutePathBuf("/tmp/codex-home"))
error: the argument '--base <BRANCH>' cannot be used with '[PROMPT]'

Usage: codex exec review --commit <SHA> --base <BRANCH> --title <TITLE> --model <MODEL> --config <key=value> --dangerously-bypass-approvals-and-sandbox --output-last-message <FILE> [PROMPT]

For more information, try '--help'.

ℹ️ About this response

Generated by Codex CLI.
Review methodology: claude_md/code_review.md

Limitations:

  • Codex may miss context from files not in the diff
  • Large PRs may be truncated
  • Always apply human judgment to AI suggestions

Commands:

  • /codex-review [context] — Request a code review
  • /codex-query <question> — Ask about the PR or codebase
@meta-codesync

meta-codesync Bot commented Jun 8, 2026

Copy link
Copy Markdown

@xingbowang merged this pull request in 8208586.

@github-actions

github-actions Bot commented Jun 8, 2026

Copy link
Copy Markdown

✅ Claude Code Review

Auto-triggered after CI passed — reviewing commit 3c58f6f


Summary

Well-structured addition of an experimental read-scoped block buffer provider API for scan reads. The design correctly bypasses data-block cache for provider-backed reads while preserving index/filter cache behavior. Key concerns center on the MultiRead assertion requiring all callers to provide a non-null AlignedBufferAllocationContext, potential persistent-cache insertion of provider-backed blocks, and performance overhead from std::function type erasure on direct-I/O hot paths. The BlockContents ownership extension (cleanup + backing_size) is clean and the AssertSingleOwner() invariant is well-placed.

High-severity findings (2):

  • [util/io_dispatcher_imp.cc:1074] MultiRead caller passes nullptr when direct_io=false, will fire assert(direct_io_buffer != nullptr) added in RandomAccessFileReader::MultiRead. (If this file is updated in the truncated diff portion, disregard.)
  • [table/block_fetcher.cc:~376+] Persistent cache insertion (InsertCompressedBlockToPersistentCacheIfNeeded / InsertUncompressedBlockToPersistentCacheIfNeeded) still runs for provider-backed blocks, potentially inserting provider-scoped data into the persistent cache where it will outlive the provider lease.

The full review with 2 HIGH, 6 MEDIUM, and 4 LOW findings plus cross-component analysis and positive observations is in review-findings.md.


ℹ️ About this response

Generated by Claude Code.
Review methodology: claude_md/code_review.md

Limitations:

  • Claude may miss context from files not in the diff
  • Large PRs may be truncated
  • Always apply human judgment to AI suggestions

Commands:

  • /claude-review [context] — Request a code review
  • /claude-query <question> — Ask about the PR or codebase
anand1976 pushed a commit that referenced this pull request Jun 10, 2026
Summary:
Add read-scoped block buffers for scan reads. Introduce an experimental read-scoped block buffer provider API, configured through ReadOptions::read_scoped_block_buffer_provider, so supported block-based table iterator scans and MultiScan data-block reads can use caller-provided read-scoped storage for final data-block contents.

When configured, supported provider-backed scan data-block reads bypass the data-block cache while preserving normal index/filter block-cache behavior. Known-uncompressed reads can attach provider cleanup to provider-backed read buffers without copying. Compressed reads decompress directly into provider-backed output, while maybe-compressed reads that turn out to be uncompressed copy once into provider-backed final contents. mmap reads ignore the provider.

Extend AlignedBuffer and RandomAccessFileReader direct-I/O paths to support external aligned allocations, then use that support for read-scoped iterator, async I/O, and MultiRead scratch buffers. Centralize read-scoped I/O policy, keep coalesced async reads safe when blocks are released before completion, and validate provider lease contracts.

Add focused coverage for read-scoped ownership, compressed and uncompressed blocks, direct I/O, data-block cache bypass behavior, invalid provider leases, async release handling, and stress-test provider invariants. Add public API release notes for read-scoped block buffers.

Bonus change: Fixed a flaky test in ReserveThread

## Testing

- CI

Pull Request resolved: #14806

Reviewed By: anand1976

Differential Revision: D106999951

Pulled By: xingbowang

fbshipit-source-id: b1f23d4bab6318b6373ba2ca99a5c4d6a842dc5a
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

1 participant