Skip MultiScan cache lookup when block cache is disabled by xingbowang · Pull Request #14785 · facebook/rocksdb

xingbowang · 2026-05-25T14:31:45Z

Summary

Return OK from BlockBasedTable::LookupAndPinBlocksInCache when a table has no block cache installed.
Teach IODispatcher to detect the missing block cache once and send all requested blocks directly to the read path instead of probing for impossible cache hits.
Add IO dispatcher coverage for no-block-cache tables, including verification that SubmitJob skips cache lookups.

Test Plan

CI

github-actions · 2026-05-25T14:37:10Z

✅ clang-tidy: No findings on changed lines

Completed in 266.9s.

Tables configured without a block cache cannot produce cache hits during MultiScan Prepare, so probing the cache for every candidate block only adds CPU overhead. Teach IODispatcher to detect the missing block cache once and send every requested block directly to the read path. Keep LookupAndPinBlocksInCache safe for no-cache configurations by returning OK when no block cache is installed. Existing cache-enabled behavior is unchanged: the dispatcher still looks up each block, pins hits, and reads only misses, including when ReadOptions::fill_cache is false. Tests: built io_dispatcher_test; ran timeout 60 ./io_dispatcher_test --gtest_filter=IODispatcherTest.LookupAndPinBlocksInCacheNoOpsWithoutBlockCache:IODispatcherTest.SubmitJobWithoutBlockCacheSkipsCacheLookup:IODispatcherTest.FillCacheFalseStillLooksUpExistingCacheEntries; ran git diff --check.

github-actions · 2026-05-25T15:02:33Z

🟡 Codex Code Review

Auto-triggered after CI passed — reviewing commit 5ab3614

❌ Codex review failed before producing findings.

WARNING: proceeding, even though we could not update PATH: Refusing to create helper binaries under temporary dir "/tmp" (codex_home: AbsolutePathBuf("/tmp/codex-home"))
error: the argument '--base <BRANCH>' cannot be used with '[PROMPT]'

Usage: codex exec review --commit <SHA> --base <BRANCH> --title <TITLE> --model <MODEL> --config <key=value> --dangerously-bypass-approvals-and-sandbox --output-last-message <FILE> [PROMPT]

For more information, try '--help'.

ℹ️ About this response

Generated by Codex CLI.
Review methodology: claude_md/code_review.md

Limitations:

Codex may miss context from files not in the diff
Large PRs may be truncated
Always apply human judgment to AI suggestions

Commands:

/codex-review [context] — Request a code review
/codex-query <question> — Ask about the PR or codebase

github-actions · 2026-05-25T15:20:13Z

✅ Claude Code Review

Auto-triggered after CI passed — reviewing commit 5ab3614

Summary

Clean, low-risk PR that fixes a real bug (assertion failure when IODispatcher is used with no_block_cache=true) and adds a sensible optimization to skip N pointless cache lookups. Follows established RocksDB patterns throughout.

High-severity findings (0):

No high-severity findings.

Full review (click to expand)

Findings

🔴 HIGH

None.

🟡 MEDIUM

M1. `reserve()` over-allocates in common case -- `io_dispatcher_imp.cc:708`

Issue: block_indices_to_read.reserve(job->block_handles.size()) is called unconditionally. In the with-cache path, most blocks may be cache hits, so only a small fraction of indices are added. This over-allocates memory for the common high-cache-hit-rate case.
Root cause: The reserve() was added as a blanket optimization but is only truly beneficial for the no-cache path where all indices are added.

Suggested fix: Move reserve() inside the if (!has_block_cache) branch only, or use a conditional strategy:

if (!has_block_cache) {
  block_indices_to_read.reserve(job->block_handles.size());
  // ... populate all indices
} else {
  // no reserve -- let vector grow dynamically for the (likely small) miss set
}

M2. Consider `LIKELY` hint for `has_block_cache` -- `io_dispatcher_imp.cc:711`

Issue: Most production deployments use a block cache, making has_block_cache == true the overwhelmingly common case. Adding LIKELY(has_block_cache) would help branch prediction on the hot path.
Suggested fix: if (LIKELY(has_block_cache)) { ... } else { ... } (swap branch order).

🟢 LOW / NIT

L1. Pre-existing: silent block drop on `LookupAndPinBlocksInCache` error -- `io_dispatcher_imp.cc:717-718`

Issue: In the original (with-cache) code, if (!s.ok()) { continue; } skips blocks that fail cache lookup -- they are neither cached nor added to the disk-read list, resulting in a silent drop. This is a pre-existing issue, not introduced by this PR, but worth noting.
Suggested fix: Consider adding the block index to block_indices_to_read on error so it can be read from disk as a fallback.

L2. Simpler no-cache index fill -- `io_dispatcher_imp.cc:712-714`

Issue: The no-cache loop for (i=0; i<size; ++i) emplace_back(i) is a sequential iota. Could use std::iota or resize+iota for slightly cleaner code.

Suggested fix:

block_indices_to_read.resize(job->block_handles.size());
std::iota(block_indices_to_read.begin(), block_indices_to_read.end(), size_t{0});

L3. Header documentation for null-cache behavior -- `block_based_table_reader.h`

Issue: LookupAndPinBlocksInCache now has a new contract: returns Status::OK() with null out_parsed_block when no block cache is configured. This should be documented in the header declaration.
Suggested fix: Add a brief comment: // Returns OK with empty out_parsed_block when no block cache is configured.

Cross-Component Analysis

Context	Executes?	Assumptions hold?	Action needed?
WritePreparedTxnDB	YES (via iterator)	YES -- cache config is orthogonal to txn visibility	Safe
ReadOnly DB	YES	YES -- common no-cache scenario	Enabled by this PR
CompactionService	YES	YES -- remote workers often use no_block_cache	Enabled by this PR
User-defined timestamps	YES	YES -- orthogonal	Safe
FIFO / Universal compaction	YES	YES -- compaction style doesn't affect cache config	Safe
Prefix seek	YES	YES -- orthogonal	Safe
BlockCacheLookupForReadAheadSize	NO when no cache	N/A -- guarded by `block_cache.get()` check at iterator.cc:74	Safe

Assumption stress test results:

"Returning OK with null is safe for all callers" -- VERIFIED. Both callers (SubmitJob and BlockCacheLookupForReadAheadSize) correctly handle OK-with-null as cache miss. BlockCacheLookupForReadAheadSize is additionally never called when no cache.
"Checking block_cache once is sufficient" -- VERIFIED. rep_->table_options is immutable after table construction.
"Dictionary skip is safe" -- VERIFIED. Dictionary is only needed for block_cache.LookupFull() decompression context. No cache means no LookupFull. Normal RetrieveBlock handles its own decompression.

Positive Observations

Correct pattern usage: Follows the established if (!block_cache) guard pattern used throughout RocksDB (MaybeReadBlockAndLoadToCache, EraseFromCache, TEST_BlockInCache).
Good test design: Three well-structured tests covering: (1) direct API behavior, (2) SubmitJob integration with sync point verification that cache is NOT consulted, (3) regression guard ensuring fill_cache=false still does lookups when cache exists.
Sensible optimization: Eliminates N cache key generations + N dictionary lookups + N cache probes that would all return null anyway.
Enables real use cases: CompactionService remote workers, embedded systems, and streaming workloads frequently use no_block_cache=true.
Appropriate use of TEST_SYNC_POINT: Allows precise verification that the function is entered (or not) without introducing flakiness.

ℹ️ About this response

Generated by Claude Code.
Review methodology: claude_md/code_review.md

Limitations:

Claude may miss context from files not in the diff
Large PRs may be truncated
Always apply human judgment to AI suggestions

Commands:

/claude-review [context] — Request a code review
/claude-query <question> — Ask about the PR or codebase

meta-cla Bot added the CLA Signed label May 25, 2026

xingbowang force-pushed the codex/multiscan-no-block-cache branch from ed91273 to 5ab3614 Compare May 25, 2026 14:45

xingbowang marked this pull request as draft May 25, 2026 14:56

xingbowang closed this May 26, 2026

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Skip MultiScan cache lookup when block cache is disabled#14785

Skip MultiScan cache lookup when block cache is disabled#14785
xingbowang wants to merge 1 commit into
facebook:mainfrom
xingbowang:codex/multiscan-no-block-cache

xingbowang commented May 25, 2026 •

edited

Loading

github-actions Bot commented May 25, 2026 •

edited

Loading

github-actions Bot commented May 25, 2026

github-actions Bot commented May 25, 2026

Findings

🔴 HIGH

🟡 MEDIUM

M1. `reserve()` over-allocates in common case -- `io_dispatcher_imp.cc:708`

M2. Consider `LIKELY` hint for `has_block_cache` -- `io_dispatcher_imp.cc:711`

🟢 LOW / NIT

L1. Pre-existing: silent block drop on `LookupAndPinBlocksInCache` error -- `io_dispatcher_imp.cc:717-718`

L2. Simpler no-cache index fill -- `io_dispatcher_imp.cc:712-714`

L3. Header documentation for null-cache behavior -- `block_based_table_reader.h`

Cross-Component Analysis

Positive Observations

Labels

2 participants

Uh oh!

Conversation

xingbowang commented May 25, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Test Plan

github-actions Bot commented May 25, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

✅ clang-tidy: No findings on changed lines

github-actions Bot commented May 25, 2026

🟡 Codex Code Review

github-actions Bot commented May 25, 2026

✅ Claude Code Review

Summary

Findings

🔴 HIGH

🟡 MEDIUM

M1. reserve() over-allocates in common case -- io_dispatcher_imp.cc:708

M2. Consider LIKELY hint for has_block_cache -- io_dispatcher_imp.cc:711

🟢 LOW / NIT

L1. Pre-existing: silent block drop on LookupAndPinBlocksInCache error -- io_dispatcher_imp.cc:717-718

L2. Simpler no-cache index fill -- io_dispatcher_imp.cc:712-714

L3. Header documentation for null-cache behavior -- block_based_table_reader.h

Cross-Component Analysis

Positive Observations

Labels

2 participants

xingbowang commented May 25, 2026 •

edited

Loading

github-actions Bot commented May 25, 2026 •

edited

Loading

M1. `reserve()` over-allocates in common case -- `io_dispatcher_imp.cc:708`

M2. Consider `LIKELY` hint for `has_block_cache` -- `io_dispatcher_imp.cc:711`

L1. Pre-existing: silent block drop on `LookupAndPinBlocksInCache` error -- `io_dispatcher_imp.cc:717-718`

L2. Simpler no-cache index fill -- `io_dispatcher_imp.cc:712-714`

L3. Header documentation for null-cache behavior -- `block_based_table_reader.h`