Add read-scoped block buffers for scan reads#14806
Conversation
|
| Check | Count |
|---|---|
bugprone-argument-comment |
2 |
| Total | 2 |
Details
file/random_access_file_reader_test.cc (1 warning(s))
file/random_access_file_reader_test.cc:114:21: warning: argument name 'direct_io_buffer' in comment does not match parameter name 'direct_io_buffer_context' [bugprone-argument-comment]
table/block_fetcher.cc (1 warning(s))
table/block_fetcher.cc:359:23: warning: argument name 'direct_io_buffer' in comment does not match parameter name 'direct_io_buffer_context' [bugprone-argument-comment]
Introduce the experimental ReadScopedBlockBufferProvider API and ReadOptions::read_scoped_block_buffer_provider so supported block-based table iterator and MultiScan data-block reads can use caller-provided read-scoped storage. Provider-backed scan reads bypass block cache, attach provider cleanup to uncompressed read buffers without copying, and decompress compressed blocks directly into provider-backed output. Add an independent MultiScanArgs::bypass_block_cache / IODispatcher JobOptions bypass knob for scan reads that should skip data-block cache lookup and insertion while preserving fill_cache=false behavior. Plumb the option through MultiScan, db_bench, db_stress, and IODispatcher. Extend AlignedBuffer and RandomAccessFileReader direct-I/O paths to support external aligned allocations, then use that support for read-scoped iterator, async I/O, and MultiRead scratch buffers. Centralize read-scoped I/O policy, keep coalesced async reads safe when blocks are released before completion, and validate provider lease contracts. Add focused coverage for read-scoped ownership, compressed and uncompressed blocks, direct I/O, cache bypass behavior, invalid provider leases, async release handling, and stress-test provider invariants. Add public API release notes for read-scoped block buffers.
0772171 to
4ae4eab
Compare
CreateAndOpenSST returns BlockBasedTable instances whose Rep keeps a reference to the InternalKeyComparator passed through TableReaderOptions. Keep that comparator in fixture-owned storage alongside the existing ImmutableOptions and EnvOptions storage so iterator creation does not read a stack object after the helper returns.
RandomAccessFileReader::Read tracks direct-I/O progress in a local current_size variable after adding caller-owned aligned buffers. When the read uses the internal temporary AlignedBuffer and copies into caller scratch, keep the buffer's CurrentSize in sync before calling AlignedBuffer::Read(). This avoids a debug assertion in checksum verification with use_direct_reads when no caller-owned direct I/O buffer is supplied.
Delete copy and move operations for the stress read-scoped provider, whose destructor validates allocation lifetime invariants. Also split the debug-only cleanup pointer selection in CreateAndPinBlockFromBuffer so clang-tidy no longer sees the initial value overwritten before use.
Codex Code Review - OBSOLETESuperseded by a newer AI review. Expand to see the original review.🟡 Codex Code ReviewAuto-triggered after CI passed — reviewing commit c5f5998 ❌ Codex review failed before producing findings. ℹ️ About this responseGenerated by Codex CLI. Limitations:
Commands:
|
|
@xingbowang has imported this pull request. If you are a Meta employee, you can view this in D106999951. |
Claude Code Review - OBSOLETESuperseded by a newer AI review. Expand to see the original review.✅ Claude Code ReviewAuto-triggered after CI passed — reviewing commit c5f5998 SummaryThis PR adds a well-structured High-severity findings (3):
Full review (click to expand)Findings🔴 HIGHH1.
|
| Context | Code executes? | Assumptions hold? | Action needed? |
|---|---|---|---|
| WritePreparedTxnDB | YES (scans) | YES - provider is read-path only | Safe |
| ReadOnly DB | YES | YES | Safe |
| CompactionService | NO | N/A | Safe |
| User-defined timestamps | YES | YES - orthogonal | Safe |
| BlobDB | NO - separate path | N/A | Safe |
| Concurrent iterators | YES | Provider must be thread-safe | Per API contract |
| Direct IO + non-aligned reads | YES | Alignment contract covers this | Verify with tests |
| Immortal tables (mmap) | YES | Provider bypasses cache, mmap may still be used | Verify block_contents_pinned |
Positive Observations
- Well-designed
ReadScopedBlockBufferProviderAPI with clear ownership semantics viaSharedCleanablePtr AllocateReadScopedBlockBuffervalidation provides defense-in-depth against misbehaving providersStressReadScopedBlockBufferProviderin db_stress provides comprehensive invariant checkingAlignedBuffer::AllocateNewBufferWithStatusis a clean extension for external allocationLookupAndPinBlocksInCachegraceful null handling is the right approach- Good test coverage for direct IO external buffer paths
ℹ️ About this response
Generated by Claude Code.
Review methodology: claude_md/code_review.md
Limitations:
- Claude may miss context from files not in the diff
- Large PRs may be truncated
- Always apply human judgment to AI suggestions
Commands:
/claude-review [context]— Request a code review/claude-query <question>— Ask about the PR or codebase
Codex Code Review - OBSOLETESuperseded by a newer AI review. Expand to see the original review.🟡 Codex Code ReviewAuto-triggered after CI passed — reviewing commit cbf3237 ❌ Codex review failed before producing findings. ℹ️ About this responseGenerated by Codex CLI. Limitations:
Commands:
|
Claude Code Review - OBSOLETESuperseded by a newer AI review. Expand to see the original review.✅ Claude Code ReviewAuto-triggered after CI passed — reviewing commit cbf3237 SummaryLarge, well-structured PR that introduces read-scoped block buffer providers and cache bypass for scan reads. The core concept is sound and the implementation covers many paths (direct I/O, async I/O, compressed/uncompressed blocks, stress testing). However, several correctness gaps exist where the new bypass and provider logic is not consistently applied across all read paths, and the IODispatcher sync fallback ignores the new settings entirely. High-severity findings (3):
The full review with 5 medium and 5 low findings, cross-component analysis, and positive observations is in ℹ️ About this responseGenerated by Claude Code. Limitations:
Commands:
|
Codex Code Review - OBSOLETESuperseded by a newer AI review. Expand to see the original review.🟡 Codex Code ReviewAuto-triggered after CI passed — reviewing commit dc9c2e6 ❌ Codex review failed before producing findings. ℹ️ About this responseGenerated by Codex CLI. Limitations:
Commands:
|
Claude Code Review - OBSOLETESuperseded by a newer AI review. Expand to see the original review.✅ Claude Code ReviewAuto-triggered after CI passed — reviewing commit dc9c2e6 SummaryLarge, well-structured PR that introduces read-scoped block buffer providers for scan reads, allowing caller-provided storage to bypass the block cache. The refactoring of High-severity findings (2):
Full review (click to expand)Findings🔴 HIGHH1. ReadSet::SyncRead hardcodes use_cache=true, bypassing cache bypass logic --
|
| Context | Affected? | Analysis |
|---|---|---|
| WritePreparedTxnDB | No | Provider only affects read paths, not visibility or transactions |
| ReadOnly DB / SecondaryInstance | Safe | Provider-backed reads work identically to normal reads |
| CompactionService / Remote | No | Provider is on ReadOptions which is per-read, not serialized |
| User-defined timestamps | Safe | Provider is orthogonal to key ordering |
| BlobDB | Safe | Blob reads use separate BlobFileReader::ReadFromFile, which was correctly updated |
| FIFO / Universal compaction | No | Provider only affects scan reads, not compaction |
| Prefix seek | Safe | Provider is block-level, independent of seek mode |
| Concurrent writers | Safe | Provider allocations use per-provider synchronization |
| Snapshots | Safe | Provider is memory management, not visibility |
Positive Observations
- The
StressReadScopedBlockBufferProvideris well-designed with comprehensive invariant checking (alignment, size, lifecycle tracking, abort on violation). - The
AlignedBufferexternal allocator design cleanly separates allocation policy from the buffer management, enabling provider-backed direct I/O without duplicating the read loop. - The
AllocateReadScopedBlockBuffer()validation function properly checks all lease contract requirements (non-null data, sufficient size, alignment, non-null cleanup). - The refactoring from
AlignedBuf(type alias forunique_ptr<void>) toAlignedBuffer(full class) for direct I/O paths is a good cleanup that makes the ownership model more explicit. - The
ShouldUseDataBlockCacheForIterator()centralization is a good pattern for consistent cache bypass decisions. - Compressed block decompression into provider-backed memory avoids an extra copy that would otherwise be needed.
ℹ️ About this response
Generated by Claude Code.
Review methodology: claude_md/code_review.md
Limitations:
- Claude may miss context from files not in the diff
- Large PRs may be truncated
- Always apply human judgment to AI suggestions
Commands:
/claude-review [context]— Request a code review/claude-query <question>— Ask about the PR or codebase
|
|
@xingbowang has imported this pull request. If you are a Meta employee, you can view this in D106999951. |
|
For M4, it is true that we are doing 1-3 copies on ReadOptions depending on the public API
There are multiple options to avoid this.
Comparing raw pointer with ReadScope, the difference is semantic:
Why ReadScope* is cleaner:
So implementation-wise both are raw pointers; API-wise ReadScope* is a better abstraction. |
|
@xingbowang has imported this pull request. If you are a Meta employee, you can view this in D106999951. |
Codex Code Review - OBSOLETESuperseded by a newer AI review. Expand to see the original review.🟡 Codex Code ReviewAuto-triggered after CI passed — reviewing commit 5036514 ❌ Codex review failed before producing findings. ℹ️ About this responseGenerated by Codex CLI. Limitations:
Commands:
|
Claude Code Review - OBSOLETESuperseded by a newer AI review. Expand to see the original review.✅ Claude Code ReviewAuto-triggered after CI passed — reviewing commit 5036514 SummaryThis PR adds a well-structured High-severity findings (2):
Full review (click to expand)Findings🔴 HIGHH1.
|
| Context | Affected? | Assessment |
|---|---|---|
| WritePreparedTxnDB | No | Orthogonal to read_callback |
| ReadOnly DB | Safe | Read-only operation |
| CompactionService | Safe | Compaction uses own ReadOptions |
| Compaction reads | Safe | Provider not set in compaction ReadOptions |
| Secondary cache | Safe | Provider bypasses data-block cache entirely |
Positive Observations
- Clean provider/cache separation via
BlockContents::cleanupvsallocation - Comprehensive validation in
AllocateReadScopedBlockBuffer() - Zero-copy uncompressed blocks and direct decompression into provider storage
- Thorough stress test provider with leak detection
bypass_data_block_cacheproperly propagated through all MultiScanArgs operations
ℹ️ About this response
Generated by Claude Code.
Review methodology: claude_md/code_review.md
Limitations:
- Claude may miss context from files not in the diff
- Large PRs may be truncated
- Always apply human judgment to AI suggestions
Commands:
/claude-review [context]— Request a code review/claude-query <question>— Ask about the PR or codebase
Document that successful ReadScopedBlockBufferProvider leases are cleaned up even when later I/O or decompression fails. Ignore the provider when mmap reads are enabled, and pass mmap eligibility through the iterator and MultiScan data-block cache/provider decisions. Avoid provider-backed serialized read scratch for possibly compressed data blocks. Compressed blocks now use normal read scratch and decompress into provider-backed final block contents; actual uncompressed blocks from possibly compressed tables are copied once into provider-backed final contents. Add dispatcher and iterator coverage for mmap ignoring the provider, and verify compressed direct reads do not use aligned provider scratch.
Document that the provider backs final data-block contents, while RocksDB may still use ordinary temporary scratch for serialized block bytes when a block may be compressed. Call out that mmap reads ignore the provider and that data-block cache bypass options do not affect index/filter blocks.
Use ReadOptions::read_scoped_block_buffer_provider as the signal for supported scan data-block reads to bypass the data-block cache. Remove MultiScanArgs::bypass_data_block_cache and the corresponding IODispatcher job option, stress flag, and db_bench flag. Update IODispatcher tests to exercise the provider-backed cache bypass path directly and keep provider ownership checks after read sets are released.
Remove stale wording left from the explicit MultiScan data-block cache bypass option. Rename local test variables so the cache-bypass tests describe provider-backed jobs/read sets instead of a bypass option.
Document ReadScopedBlockBufferProvider::Lease in terms of final provider-backed BlockContents ownership instead of enumerating implementation-specific ways RocksDB may populate the allocation.
Describe the provider as one API configured through ReadOptions and clarify that supported block-based table iterator scans and MultiScan reads share the provider-backed data-block cache bypass behavior.
Explain why read_buffer_requires_cleanup exists: known-uncompressed provider-backed file-read buffers must attach their cleanup directly to BlockContents, while maybe-compressed reads can use ordinary scratch and copy/decompress into provider-backed final contents later.
…ained-provider # Conflicts: # util/io_dispatcher_imp.cc # util/io_dispatcher_test.cc
ReserveThreads() depends on the background thread pool's waiting-thread accounting. Under high process-level contention, the test can reach a sync point before the expected number of threads is reservable, producing transient partial reservations such as 1 instead of 2 or 0 instead of 1. Replace the cleanup-only workaround with an eventual assertion for positive reservation checks. Partial successful reservations are released before retrying so the retry does not perturb the test state, while zero-capacity checks remain immediate.
…ained-provider # Conflicts: # unreleased_history/bug_fixes/compaction_service_inaccurate_input_record_count
|
@xingbowang has imported this pull request. If you are a Meta employee, you can view this in D106999951. |
Codex Code Review - OBSOLETESuperseded by a newer AI review. Expand to see the original review.🟡 Codex Code ReviewAuto-triggered after CI passed — reviewing commit 7c8c8e5 ❌ Codex review failed before producing findings. ℹ️ About this responseGenerated by Codex CLI. Limitations:
Commands:
|
Claude Code Review - OBSOLETESuperseded by a newer AI review. Expand to see the original review.✅ Claude Code ReviewAuto-triggered after CI passed — reviewing commit 7c8c8e5 SummaryThis PR adds an experimental High-severity findings (2):
Full review (click to expand)Findings🔴 HIGHH1. AlignedBuffer
|
| Context | Executes? | Safe? | Action? |
|---|---|---|---|
| WritePreparedTxnDB | YES | YES | Safe |
| ReadOnly DB | YES | YES | Safe |
| Remote compaction | NO (raw ptr) | N/A | Document |
| mmap reads | Ignored | YES | Safe |
| Get/MultiGet | NO | N/A | Documented |
Positive Observations
- Clean
Leaseabstraction withSharedCleanablePtrlifetime management - Consistent with existing ReadOptions patterns (raw pointers)
- Thorough stress test provider with alignment/leak/double-free validation
- Good error propagation through provider allocation failures
- Correct mmap bypass at
GetReadScopedBlockBufferProviderlevel - Pragmatic flaky test fix for
ReserveThreads - Good new direct I/O test coverage
ℹ️ About this response
Generated by Claude Code.
Review methodology: claude_md/code_review.md
Limitations:
- Claude may miss context from files not in the diff
- Large PRs may be truncated
- Always apply human judgment to AI suggestions
Commands:
/claude-review [context]— Request a code review/claude-query <question>— Ask about the PR or codebase
|
@xingbowang has imported this pull request. If you are a Meta employee, you can view this in D106999951. |
Codex Code Review - OBSOLETESuperseded by a newer AI review. Expand to see the original review.🟡 Codex Code ReviewAuto-triggered after CI passed — reviewing commit 3813ab2 ❌ Codex review failed before producing findings. ℹ️ About this responseGenerated by Codex CLI. Limitations:
Commands:
|
Claude Code Review - OBSOLETESuperseded by a newer AI review. Expand to see the original review.✅ Claude Code ReviewAuto-triggered after CI passed — reviewing commit 3813ab2 SummaryWell-structured PR that adds an experimental read-scoped block buffer provider API for scan reads. The implementation correctly integrates with BlockFetcher, decompression, direct I/O, and cache bypass logic. The dual ownership model in BlockContents follows established patterns (similar to PinnableSlice). All identified call sites are updated. MultiGet/Get exclusion is intentional per the TODO. The main concerns are around missing integration-level tests for the table reader path, High-severity findings (0): No high-severity findings. Medium findings (5): M1: Missing integration test for table scan with provider. M2: The full review with detailed findings, cross-component analysis, and positive observations is in ℹ️ About this responseGenerated by Claude Code. Limitations:
Commands:
|
|
@xingbowang has imported this pull request. If you are a Meta employee, you can view this in D106999951. |
🟡 Codex Code ReviewAuto-triggered after CI passed — reviewing commit 3c58f6f ❌ Codex review failed before producing findings. ℹ️ About this responseGenerated by Codex CLI. Limitations:
Commands:
|
|
@xingbowang merged this pull request in 8208586. |
✅ Claude Code ReviewAuto-triggered after CI passed — reviewing commit 3c58f6f SummaryWell-structured addition of an experimental read-scoped block buffer provider API for scan reads. The design correctly bypasses data-block cache for provider-backed reads while preserving index/filter cache behavior. Key concerns center on the High-severity findings (2):
The full review with 2 HIGH, 6 MEDIUM, and 4 LOW findings plus cross-component analysis and positive observations is in ℹ️ About this responseGenerated by Claude Code. Limitations:
Commands:
|
Summary: Add read-scoped block buffers for scan reads. Introduce an experimental read-scoped block buffer provider API, configured through ReadOptions::read_scoped_block_buffer_provider, so supported block-based table iterator scans and MultiScan data-block reads can use caller-provided read-scoped storage for final data-block contents. When configured, supported provider-backed scan data-block reads bypass the data-block cache while preserving normal index/filter block-cache behavior. Known-uncompressed reads can attach provider cleanup to provider-backed read buffers without copying. Compressed reads decompress directly into provider-backed output, while maybe-compressed reads that turn out to be uncompressed copy once into provider-backed final contents. mmap reads ignore the provider. Extend AlignedBuffer and RandomAccessFileReader direct-I/O paths to support external aligned allocations, then use that support for read-scoped iterator, async I/O, and MultiRead scratch buffers. Centralize read-scoped I/O policy, keep coalesced async reads safe when blocks are released before completion, and validate provider lease contracts. Add focused coverage for read-scoped ownership, compressed and uncompressed blocks, direct I/O, data-block cache bypass behavior, invalid provider leases, async release handling, and stress-test provider invariants. Add public API release notes for read-scoped block buffers. Bonus change: Fixed a flaky test in ReserveThread ## Testing - CI Pull Request resolved: #14806 Reviewed By: anand1976 Differential Revision: D106999951 Pulled By: xingbowang fbshipit-source-id: b1f23d4bab6318b6373ba2ca99a5c4d6a842dc5a
Summary
Add read-scoped block buffers for scan reads. Introduce an experimental read-scoped block buffer provider API, configured through ReadOptions::read_scoped_block_buffer_provider, so supported block-based table iterator scans and MultiScan data-block reads can use caller-provided read-scoped storage for final data-block contents.
When configured, supported provider-backed scan data-block reads bypass the data-block cache while preserving normal index/filter block-cache behavior. Known-uncompressed reads can attach provider cleanup to provider-backed read buffers without copying. Compressed reads decompress directly into provider-backed output, while maybe-compressed reads that turn out to be uncompressed copy once into provider-backed final contents. mmap reads ignore the provider.
Extend AlignedBuffer and RandomAccessFileReader direct-I/O paths to support external aligned allocations, then use that support for read-scoped iterator, async I/O, and MultiRead scratch buffers. Centralize read-scoped I/O policy, keep coalesced async reads safe when blocks are released before completion, and validate provider lease contracts.
Add focused coverage for read-scoped ownership, compressed and uncompressed blocks, direct I/O, data-block cache bypass behavior, invalid provider leases, async release handling, and stress-test provider invariants. Add public API release notes for read-scoped block buffers.
Bonus change: Fixed a flaky test in ReserveThread
Testing