Allow skipping lmax index and filter block prefetches for external file ingestion (#14859)#14859
Allow skipping lmax index and filter block prefetches for external file ingestion (#14859)#14859joshkang97 wants to merge 1 commit into
Conversation
|
@joshkang97 has exported this pull request. If you are a Meta employee, you can view the originating Diff in D108678511. |
✅ clang-tidy: No findings on changed linesCompleted in 261.9s. |
…le ingestion (facebook#14859) Summary: External SST ingestion can spend noticeable foreground commit time opening the newly ingested table reader and prefetching index/filter blocks, especially for large files going directly to the last level. This adds `IngestExternalFileOptions::prefetch_lmax_index_and_filter_blocks`, defaulting to existing behavior, so bulk-load callers can skip that commit-time metadata prefetch when they do not need the cache warmed immediately. The option is threaded through prepared ingestion handles into the manifest writer path and drives the existing `LoadTableHandlers()` `prefetch_index_and_filter_in_cache` argument. Benchmarks: ``` db_bench --benchmarks=ingestexternalfile --num=2200000 --ingest_external_file_num_batches=1 --ingest_external_file_batch_size=1 --ingest_external_file_use_file_info=true --ingest_external_file_fill_cache=true --cache_index_and_filter_blocks=true --bloom_bits=10 --statistics=true --stats_level=3 --ingest_external_file_prefetch_lmax_index_and_filter_blocks=<true|false> ``` | `ingest_external_file_prefetch_lmax_index_and_filter_blocks` | `rocksdb.ingest.external.file.run.micros` | `rocksdb.table.open.io.micros` | index/filter cache adds | last-level read bytes | | `true` | `3459 us` | `2560 us` | `1 / 1` | `3551559` | | `false` | `2112 us` | `1145 us` | `0 / 0` | `1333` | Differential Revision: D108678511
5bdcd6d to
63df8f1
Compare
🟡 Codex Code ReviewAuto-triggered after CI passed — reviewing commit 63df8f1 ❌ Codex review failed before producing findings. ℹ️ About this responseGenerated by Codex CLI. Limitations:
Commands:
|
✅ Claude Code ReviewAuto-triggered after CI passed — reviewing commit 63df8f1 SummaryClean, well-scoped PR that adds a useful performance knob for bulk-load callers. The implementation correctly threads a transient per-file flag through the ingestion path to selectively skip index/filter prefetching at commit time. No correctness issues found. High-severity findings (0): No high-severity findings. Full review (click to expand)Findings🟡 MEDIUMM1. FileMetaData struct bloat --
|
| Context | Affected? | Analysis |
|---|---|---|
| Non-ingestion FileMetaData | Safe | Default false means no skip. Only set true in AssignLevelsForOneBatch. |
| Manifest persistence | Correct | Not persisted. On restart, rebuilt as false. Correct since prefetch is only relevant at commit time. |
| VersionBuilder copy | Correct | ApplyFileAddition copies via new FileMetaData(meta), preserving the flag. |
| Multiple LoadTableHandlersHelper callers | Safe | Default false leaves non-ingestion callers unaffected. |
| MergeForSameColumnFamily | Correct | Per-file flags preserved; options equality enforced. |
| CreateEquivalentFileIngestingCompactions | Safe | Copies flag but it's irrelevant for compaction. |
Positive Observations
- Default
truepreserves backwards compatibility. - Correctly included in
operator== = defaultforIngestExternalFileOptions. - Good stress test randomization coverage.
- Well-integrated db_bench support.
- Test uses statistics tickers for precise verification.
ℹ️ About this response
Generated by Claude Code.
Review methodology: claude_md/code_review.md
Limitations:
- Claude may miss context from files not in the diff
- Large PRs may be truncated
- Always apply human judgment to AI suggestions
Commands:
/claude-review [context]— Request a code review/claude-query <question>— Ask about the PR or codebase
Summary:
External SST ingestion can spend noticeable foreground commit time opening the newly ingested table reader and prefetching index/filter blocks, especially for large files going directly to the last level. This adds
IngestExternalFileOptions::prefetch_lmax_index_and_filter_blocks, defaulting to existing behavior, so bulk-load callers can skip that commit-time metadata prefetch when they do not need the cache warmed immediately.The option is threaded through prepared ingestion handles into the manifest writer path and drives the existing
LoadTableHandlers()prefetch_index_and_filter_in_cacheargument.Benchmarks:
|
ingest_external_file_prefetch_lmax_index_and_filter_blocks|rocksdb.ingest.external.file.run.micros|rocksdb.table.open.io.micros| index/filter cache adds | last-level read bytes ||
true|3459 us|2560 us|1 / 1|3551559||
false|2112 us|1145 us|0 / 0|1333|Differential Revision: D108678511