Skip to content

Allow skipping lmax index and filter block prefetches for external file ingestion (#14859)#14859

Open
joshkang97 wants to merge 1 commit into
facebook:mainfrom
joshkang97:export-D108678511
Open

Allow skipping lmax index and filter block prefetches for external file ingestion (#14859)#14859
joshkang97 wants to merge 1 commit into
facebook:mainfrom
joshkang97:export-D108678511

Conversation

@joshkang97

@joshkang97 joshkang97 commented Jun 16, 2026

Copy link
Copy Markdown
Contributor

Summary:

External SST ingestion can spend noticeable foreground commit time opening the newly ingested table reader and prefetching index/filter blocks, especially for large files going directly to the last level. This adds IngestExternalFileOptions::prefetch_lmax_index_and_filter_blocks, defaulting to existing behavior, so bulk-load callers can skip that commit-time metadata prefetch when they do not need the cache warmed immediately.

The option is threaded through prepared ingestion handles into the manifest writer path and drives the existing LoadTableHandlers() prefetch_index_and_filter_in_cache argument.

Benchmarks:

db_bench --benchmarks=ingestexternalfile --num=2200000 --ingest_external_file_num_batches=1 --ingest_external_file_batch_size=1 --ingest_external_file_use_file_info=true --ingest_external_file_fill_cache=true --cache_index_and_filter_blocks=true --bloom_bits=10 --statistics=true --stats_level=3 --ingest_external_file_prefetch_lmax_index_and_filter_blocks=<true|false>

| ingest_external_file_prefetch_lmax_index_and_filter_blocks | rocksdb.ingest.external.file.run.micros | rocksdb.table.open.io.micros | index/filter cache adds | last-level read bytes |
| true | 3459 us | 2560 us | 1 / 1 | 3551559 |
| false | 2112 us | 1145 us | 0 / 0 | 1333 |

Differential Revision: D108678511

@meta-cla meta-cla Bot added the CLA Signed label Jun 16, 2026
@meta-codesync

meta-codesync Bot commented Jun 16, 2026

Copy link
Copy Markdown

@joshkang97 has exported this pull request. If you are a Meta employee, you can view the originating Diff in D108678511.

@github-actions

github-actions Bot commented Jun 16, 2026

Copy link
Copy Markdown

✅ clang-tidy: No findings on changed lines

Completed in 261.9s.

…le ingestion (facebook#14859)

Summary:

External SST ingestion can spend noticeable foreground commit time opening the newly ingested table reader and prefetching index/filter blocks, especially for large files going directly to the last level. This adds `IngestExternalFileOptions::prefetch_lmax_index_and_filter_blocks`, defaulting to existing behavior, so bulk-load callers can skip that commit-time metadata prefetch when they do not need the cache warmed immediately.

The option is threaded through prepared ingestion handles into the manifest writer path and drives the existing `LoadTableHandlers()` `prefetch_index_and_filter_in_cache` argument. 

Benchmarks:

```
db_bench --benchmarks=ingestexternalfile --num=2200000 --ingest_external_file_num_batches=1 --ingest_external_file_batch_size=1 --ingest_external_file_use_file_info=true --ingest_external_file_fill_cache=true --cache_index_and_filter_blocks=true --bloom_bits=10 --statistics=true --stats_level=3 --ingest_external_file_prefetch_lmax_index_and_filter_blocks=<true|false>
```

| `ingest_external_file_prefetch_lmax_index_and_filter_blocks` | `rocksdb.ingest.external.file.run.micros` | `rocksdb.table.open.io.micros` | index/filter cache adds | last-level read bytes |
| `true` | `3459 us` | `2560 us` | `1 / 1` | `3551559` |
| `false` | `2112 us` | `1145 us` | `0 / 0` | `1333` |

Differential Revision: D108678511
@meta-codesync meta-codesync Bot changed the title Allow skipping lmax index and filter block prefetches for external file ingestion Jun 16, 2026
@github-actions

Copy link
Copy Markdown

🟡 Codex Code Review

Auto-triggered after CI passed — reviewing commit 63df8f1


Codex review failed before producing findings.

WARNING: proceeding, even though we could not create PATH aliases: Refusing to create helper binaries under temporary dir "/tmp" (codex_home: AbsolutePathBuf("/tmp/codex-home"))
error: the argument '--base <BRANCH>' cannot be used with '[PROMPT]'

Usage: codex exec review --commit <SHA> --base <BRANCH> --title <TITLE> --model <MODEL> --config <key=value> --dangerously-bypass-approvals-and-sandbox --output-last-message <FILE> [PROMPT]

For more information, try '--help'.

ℹ️ About this response

Generated by Codex CLI.
Review methodology: claude_md/code_review.md

Limitations:

  • Codex may miss context from files not in the diff
  • Large PRs may be truncated
  • Always apply human judgment to AI suggestions

Commands:

  • /codex-review [context] — Request a code review
  • /codex-query <question> — Ask about the PR or codebase
@github-actions

Copy link
Copy Markdown

✅ Claude Code Review

Auto-triggered after CI passed — reviewing commit 63df8f1


Summary

Clean, well-scoped PR that adds a useful performance knob for bulk-load callers. The implementation correctly threads a transient per-file flag through the ingestion path to selectively skip index/filter prefetching at commit time. No correctness issues found.

High-severity findings (0):

No high-severity findings.

Full review (click to expand)

Findings

🟡 MEDIUM

M1. FileMetaData struct bloat -- db/version_edit.h:340
  • Issue: Adding bool skip_index_and_filter_blocks_prefetch to FileMetaData increases the per-file memory footprint for every SST file in the DB, not just ingested ones. FileMetaData is one of the most frequently allocated structures in RocksDB (one per SST file, kept in memory for the entire file lifetime). The field is only useful during the single LoadTableHandlers call at ingestion commit time -- it's dead weight afterwards.
  • Root cause: Using a persistent in-memory struct (FileMetaData) to carry a transient signal. An alternative would be to pass the skip set directly to LoadTableHandlersHelper (e.g., a std::unordered_set<uint64_t> of file numbers to skip prefetch for), avoiding the struct bloat entirely.
  • Suggested fix: Consider passing prefetch-skip information out-of-band (e.g., via the files_meta vector in LoadTableHandlersHelper or a companion set), rather than adding a permanent field to FileMetaData. If the current approach is kept, add a comment noting the field is transient and only meaningful during ingestion commit.
M2. Missing test for num_levels=1 (Lmax == L0) -- db/external_sst_file_test.cc
  • Issue: When num_levels=1, Lmax is level 0. The condition file->picked_level == cfd_->NumberLevels() - 1 evaluates to picked_level == 0. This means with prefetch_lmax_index_and_filter_blocks=false, a file ingested to L0 in a single-level DB would skip prefetch. This is technically correct (L0 IS Lmax), but it's an edge case that should be tested.
  • Suggested fix: Add a test case with num_levels=1 to confirm and document intent.

🟢 LOW / NIT

L1. Test does not verify default (true) behavior -- db/external_sst_file_test.cc:468
  • Issue: No test verifies that prefetch_lmax_index_and_filter_blocks=true preserves existing behavior (Lmax files DO get index/filter cached).
  • Suggested fix: Add a test case that ingests to Lmax with the default and verifies cache adds.
L2. Option documentation could mention fill_cache interaction -- include/rocksdb/options.h:2955
  • Issue: The doc comment doesn't mention that fill_cache=false already suppresses block cache population entirely, making this option redundant in that case.
  • Suggested fix: Add a note about the interaction.
L3. Naming polarity inversion -- db/external_sst_file_ingestion_job.h:184 vs db/version_edit.h:342
  • Issue: IngestedFileInfo::prefetch_lmax_index_and_filter_blocks (positive) vs FileMetaData::skip_index_and_filter_blocks_prefetch (negative) inverts the boolean polarity. The conversion !file->prefetch_lmax_index_and_filter_blocks at the boundary is easy to get wrong during future modifications. Minor style concern.

Cross-Component Analysis

Context Affected? Analysis
Non-ingestion FileMetaData Safe Default false means no skip. Only set true in AssignLevelsForOneBatch.
Manifest persistence Correct Not persisted. On restart, rebuilt as false. Correct since prefetch is only relevant at commit time.
VersionBuilder copy Correct ApplyFileAddition copies via new FileMetaData(meta), preserving the flag.
Multiple LoadTableHandlersHelper callers Safe Default false leaves non-ingestion callers unaffected.
MergeForSameColumnFamily Correct Per-file flags preserved; options equality enforced.
CreateEquivalentFileIngestingCompactions Safe Copies flag but it's irrelevant for compaction.

Positive Observations

  • Default true preserves backwards compatibility.
  • Correctly included in operator== = default for IngestExternalFileOptions.
  • Good stress test randomization coverage.
  • Well-integrated db_bench support.
  • Test uses statistics tickers for precise verification.

ℹ️ About this response

Generated by Claude Code.
Review methodology: claude_md/code_review.md

Limitations:

  • Claude may miss context from files not in the diff
  • Large PRs may be truncated
  • Always apply human judgment to AI suggestions

Commands:

  • /claude-review [context] — Request a code review
  • /claude-query <question> — Ask about the PR or codebase
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

1 participant