Skip to content

External file ingestion, support fast prepare path from DB generated files (#14854)#14854

Closed
joshkang97 wants to merge 1 commit into
facebook:mainfrom
joshkang97:export-D108440584
Closed

External file ingestion, support fast prepare path from DB generated files (#14854)#14854
joshkang97 wants to merge 1 commit into
facebook:mainfrom
joshkang97:export-D108440584

Conversation

@joshkang97

@joshkang97 joshkang97 commented Jun 15, 2026

Copy link
Copy Markdown
Contributor

Summary:

External SST ingestion can already skip the open-and-scan metadata path when ingesting files produced by SstFileWriter. This extends that fast path to live SST files produced by a DB, so applications that move DB-generated files between DBs can request prepared metadata directly from the source DB and pass it through ingestion.

The API is intentionally limited to live table files whose metadata is already available from the DB and cached table reader. Callers remain responsible for keeping the source file alive and unchanged until ingestion completes.

Reviewed By: xingbowang

Differential Revision: D108440584

@meta-cla meta-cla Bot added the CLA Signed label Jun 15, 2026
@meta-codesync

meta-codesync Bot commented Jun 15, 2026

Copy link
Copy Markdown

@joshkang97 has exported this pull request. If you are a Meta employee, you can view the originating Diff in D108440584.

@github-actions

github-actions Bot commented Jun 15, 2026

Copy link
Copy Markdown

✅ clang-tidy: No findings on changed lines

Completed in 397.0s.

@github-actions

github-actions Bot commented Jun 15, 2026

Copy link
Copy Markdown

Codex Code Review - OBSOLETE

Superseded by a newer AI review. Expand to see the original review.

🟡 Codex Code Review

Auto-triggered after CI passed — reviewing commit fa09b9b


Codex review failed before producing findings.

WARNING: proceeding, even though we could not create PATH aliases: Refusing to create helper binaries under temporary dir "/tmp" (codex_home: AbsolutePathBuf("/tmp/codex-home"))
error: the argument '--base <BRANCH>' cannot be used with '[PROMPT]'

Usage: codex exec review --commit <SHA> --base <BRANCH> --title <TITLE> --model <MODEL> --config <key=value> --dangerously-bypass-approvals-and-sandbox --output-last-message <FILE> [PROMPT]

For more information, try '--help'.

ℹ️ About this response

Generated by Codex CLI.
Review methodology: claude_md/code_review.md

Limitations:

  • Codex may miss context from files not in the diff
  • Large PRs may be truncated
  • Always apply human judgment to AI suggestions

Commands:

  • /codex-review [context] — Request a code review
  • /codex-query <question> — Ask about the PR or codebase
@meta-codesync meta-codesync Bot changed the title External file ingestion, support fast prepare path from DB generated files Jun 15, 2026
joshkang97 added a commit to joshkang97/rocksdb that referenced this pull request Jun 15, 2026
…files (facebook#14854)

Summary:

External SST ingestion can already skip the open-and-scan metadata path when ingesting files produced by `SstFileWriter`. This extends that fast path to live SST files produced by a DB, so applications that move DB-generated files between DBs can request prepared metadata directly from the source DB and pass it through ingestion.

The API is intentionally limited to live table files whose metadata is already available from the DB and cached table reader. Callers remain responsible for keeping the source file alive and unchanged until ingestion completes.

Reviewed By: xingbowang

Differential Revision: D108440584
@github-actions

github-actions Bot commented Jun 15, 2026

Copy link
Copy Markdown

Claude Code Review - OBSOLETE

Superseded by a newer AI review. Expand to see the original review.

✅ Claude Code Review

Auto-triggered after CI passed — reviewing commit fa09b9b


Summary

Well-structured PR that cleanly extends the existing PreparedFileInfo fast-path to DB-generated files. The implementation correctly uses Version/CFD ref-counting to safely access FileMetaData outside the mutex, validates paths against live table files, and populates PreparedFileInfo in a format compatible with the existing ingestion pipeline. The test covers the happy path and two error cases with sync point verification that the fast path is taken.

High-severity findings (0):

No high-severity findings.

Full review (click to expand)

Findings

🟡 MEDIUM

M1. no_io=false allows blocking I/O but API docs imply metadata is "already available" -- db/db_impl/db_impl.cc
  • Issue: The PR description says "metadata is already available from the DB and cached table reader," but the implementation passes no_io=false to GetTableProperties, which means it CAN perform disk I/O if the table reader is not cached. This is functionally correct but the API documentation and PR description are misleading about the performance characteristics.
  • Root cause: For live files, the table reader is usually cached, but eviction can cause cache misses. With no_io=false, the call blocks on I/O to re-read properties. Alternatively, no_io=true would return Status::Incomplete() on cache miss, which would be surprising to callers.
  • Suggested fix: Either (a) document in the db.h API comment that this may perform I/O if the table reader has been evicted from cache, or (b) keep the current behavior but clarify the PR description. The current behavior (allowing I/O) is the safer choice.
M2. Test does not verify smallest/largest InternalKey correctness -- db/external_sst_file_test.cc
  • Issue: The test checks file_size and key_largest_seqno but does not verify that prepared_file_info->smallest and prepared_file_info->largest are correct. These are the most critical fields for ingestion correctness.
  • Root cause: Omission in test coverage.
  • Suggested fix: Add assertions like:
    ParsedInternalKey smallest;
    ASSERT_OK(ParseInternalKey(prepared_file_info->smallest.Encode(), &smallest, false));
    ASSERT_EQ(smallest.user_key.ToString(), Key(10));
M3. Missing test for files with range deletions -- db/external_sst_file_test.cc
  • Issue: DB-generated files can contain range deletions, which affect the smallest/largest bounds differently (range deletion endpoints use kTypeRangeDeletion with kMaxSequenceNumber). No test exists for the new DB-generated path with range deletions.
  • Suggested fix: Add a test that creates a DB file with both point keys and range deletions, extracts PreparedFileInfo, and verifies the bounds.
M4. Missing test for nullptr file_info argument -- db/external_sst_file_test.cc
  • Issue: The implementation correctly handles file_info == nullptr by returning InvalidArgument, but the test does not exercise this path.
  • Suggested fix: Add ASSERT_TRUE(source_db->GetPreparedFileInfoForExternalSstIngestion(file_path, nullptr).IsInvalidArgument());
M5. Missing test for files after bottom-level compaction (seqno zeroing) -- db/external_sst_file_test.cc
  • Issue: After bottom-level compaction, sequence numbers in keys can be zeroed out. The ingestion path handles this correctly (if (largest_seqno == 0) { smallest_seqno = 0; }), but this edge case is untested.
  • Suggested fix: Add a test that compacts a file to the bottommost level, extracts PreparedFileInfo, and ingests it.

🟢 LOW / NIT

L1. Comment removed from prepared_file_info.h without replacement -- table/prepared_file_info.h
  • Issue: The struct-level documentation was removed but no replacement was added. The options.h forward declaration has an updated comment, but the actual struct definition should also be documented.
  • Suggested fix: Add a brief comment above the struct.
L2. #include <atomic> added but not needed -- db/external_sst_file_test.cc
  • Issue: The diff adds #include <atomic> but the new test code doesn't use any atomic operations.
  • Suggested fix: Remove the unnecessary include.
L3. Release note could mention allow_db_generated_files requirement
  • Issue: The release note doesn't mention that the caller must set allow_db_generated_files=true when ingesting with the returned PreparedFileInfo.
  • Suggested fix: Add this usage requirement to the release note.

Cross-Component Analysis

Context Does code execute? Assumptions hold? Action needed?
WritePreparedTxnDB Yes (inherits DBImpl) Yes - read-only metadata extraction Safe
ReadOnly DB Possible Would work for metadata extraction Safe
SecondaryInstance Possible Files are available, metadata in version Safe
User-defined timestamps Yes FileMetaData keys include timestamps; ingestion path handles padding Safe
BlobDB Yes Blob references are in values not keys; metadata is correct Safe
FIFO/Universal compaction Yes Different compaction but FileMetaData format is the same Safe

Assumption stress-test results:

  1. "Path must exactly match" - Verified. String equality against TableFileName() output. Symlinks fail by design.
  2. "file_meta pointer valid after mutex release" - Verified. Version Ref prevents deallocation.
  3. "Seqno overwrite from FileDescriptor is correct" - Verified. fd.largest_seqno/fd.smallest_seqno track actual bounds.
  4. "InternalKeys from FileMetaData are valid for ingestion" - Verified. FileMetaData stores actual keys (not shortened), with real sequence numbers correct for allow_db_generated_files=true.

Positive Observations

  • Clean separation of concerns: metadata extraction is a pure read operation
  • Correct Ref/Unref pattern with RAII cleanup via Defer
  • Path validation prevents metadata extraction for non-owned files
  • Sync point test verifies the fast path is actually taken
  • Good error path coverage: resets file_info to nullptr on failure
  • Compatible with existing IngestExternalFileArg::file_infos infrastructure

ℹ️ About this response

Generated by Claude Code.
Review methodology: claude_md/code_review.md

Limitations:

  • Claude may miss context from files not in the diff
  • Large PRs may be truncated
  • Always apply human judgment to AI suggestions

Commands:

  • /claude-review [context] — Request a code review
  • /claude-query <question> — Ask about the PR or codebase
…files (facebook#14854)

Summary:

External SST ingestion can already skip the open-and-scan metadata path when ingesting files produced by `SstFileWriter`. This extends that fast path to live SST files produced by a DB, so applications that move DB-generated files between DBs can request prepared metadata directly from the source DB and pass it through ingestion.

The API is intentionally limited to live table files whose metadata is already available from the DB and cached table reader. Callers remain responsible for keeping the source file alive and unchanged until ingestion completes.

Reviewed By: xingbowang

Differential Revision: D108440584
@github-actions

Copy link
Copy Markdown

🟡 Codex Code Review

Auto-triggered after CI passed — reviewing commit f4b9853


Codex review failed before producing findings.

WARNING: proceeding, even though we could not create PATH aliases: Refusing to create helper binaries under temporary dir "/tmp" (codex_home: AbsolutePathBuf("/tmp/codex-home"))
error: the argument '--base <BRANCH>' cannot be used with '[PROMPT]'

Usage: codex exec review --commit <SHA> --base <BRANCH> --title <TITLE> --model <MODEL> --config <key=value> --dangerously-bypass-approvals-and-sandbox --output-last-message <FILE> [PROMPT]

For more information, try '--help'.

ℹ️ About this response

Generated by Codex CLI.
Review methodology: claude_md/code_review.md

Limitations:

  • Codex may miss context from files not in the diff
  • Large PRs may be truncated
  • Always apply human judgment to AI suggestions

Commands:

  • /codex-review [context] — Request a code review
  • /codex-query <question> — Ask about the PR or codebase
@github-actions

Copy link
Copy Markdown

✅ Claude Code Review

Auto-triggered after CI passed — reviewing commit f4b9853


Summary

Clean, well-structured PR that extends the PreparedFileInfo fast-path to DB-generated SST files. The implementation correctly handles refcounting, mutex scope, and metadata extraction. The TOCTOU concern (file compacted after metadata retrieval) is documented as a caller responsibility.

High-severity findings (0):

No high-severity findings.

Full review (click to expand)

Findings

🔴 HIGH

None.

🟡 MEDIUM

M1. no_io=false may cause unexpected latency when table reader is not cached -- db/db_impl/db_impl.cc:6058
  • Issue: GetTableProperties is called with no_io=false, meaning it will open the SST file from disk if the table reader is not already in the table cache. This is documented in the API comment ("This may open the table file"), but callers might not expect a metadata-only call to perform significant I/O including reading the full index/filter blocks.
  • Root cause: The API intentionally trades latency for correctness (it needs real table properties). The comment in db.h does note this possibility.
  • Suggested fix: Consider adding a no_io option or documenting expected latency more prominently. Alternatively, the API doc is already sufficient -- this is a design choice, not a bug.
M2. Test does not verify smallest InternalKey or key_smallest_seqno -- db/external_sst_file_test.cc:787
  • Issue: The test verifies file_size and key_largest_seqno but does not verify:
    • prepared_file_info->smallest / prepared_file_info->largest InternalKeys
    • key_smallest_seqno
    • table_properties.num_entries or table_properties.num_range_deletions
      These are the critical fields consumed by GetIngestedFileInfoFromFileInfo and should be validated.
  • Suggested fix: Add assertions for smallest/largest keys (parse them and check user_key, seqno, type) and key_smallest_seqno.
M3. Missing test coverage for range deletions and multiple keys -- db/external_sst_file_test.cc
  • Issue: The test only covers a single point key. DB-generated files commonly have range deletions, multiple keys with varying seqnos, and files at different levels. The smallest/largest from FileMetaData include range deletion boundaries, and this path should be tested.
  • Suggested fix: Add a test case with Put + DeleteRange + Flush, then verify the prepared metadata bounds include the range deletion endpoints.

🟢 LOW / NIT

L1. Comment removed from prepared_file_info.h without full replacement -- table/prepared_file_info.h:15-17
  • Issue: The old comment ("Produced by SstFileWriter::Finish and consumed by ExternalSstFileIngestionJob...") was removed. While the PreparedFileInfo doc in options.h was updated, the internal header now has no doc comment at all. Since this is an internal-only header, the impact is low, but a brief comment would help future readers.
  • Suggested fix: Add a one-line comment like: // See PreparedFileInfo forward-declaration in options.h for documentation.
L2. file_infos doc comment simplified, lost some detail -- include/rocksdb/options.h:3007
  • Issue: The old file_infos comment explained the parallel-to-external_files semantics (same size and order) which was useful context. The new comment ("Optimizes for prepare performance, see PreparedFileInfo for details.") loses this structural requirement.
  • Suggested fix: Keep the note about parallel ordering: "Parallel to external_files (same size and order). Optimizes for prepare performance, see PreparedFileInfo for details."
L3. Test doesn't verify nullptr output parameter error path -- db/external_sst_file_test.cc
  • Issue: The implementation checks file_info == nullptr and returns InvalidArgument, but this path is not tested.
  • Suggested fix: Add: ASSERT_TRUE(source_db->GetPreparedFileInfoForExternalSstIngestion(file_path, nullptr).IsInvalidArgument());
L4. ReadOptions default-constructed without setting any options -- db/db_impl/db_impl.cc:6029
  • Issue: ReadOptions read_options; is default-constructed. This is fine for GetTableProperties, but for consistency with other call sites, consider whether io_activity or other fields should be set.
  • Suggested fix: Nit -- acceptable as-is since GetTableProperties only uses read_options for FindTable which primarily cares about no_io.
L5. Inconsistent #include <atomic> addition -- db/external_sst_file_test.cc:8
  • Issue: #include <atomic> was added to the test file but no std::atomic usage is visible in the new test code. This may be a leftover from a previous iteration.
  • Suggested fix: Remove #include <atomic> if unused.

Cross-Component Analysis

Context Does code execute? Assumptions hold? Action needed?
WritePreparedTxnDB YES (inherits DBImpl) YES -- metadata extraction is read-only Safe
ReadOnly DB YES (inherits DBImpl) YES -- method only reads metadata Safe
Secondary DB YES (inherits DBImpl) YES -- but secondary may have stale metadata Document caveat
User-defined timestamps YES YES -- FileMetaData.smallest/largest include timestamps, and GetIngestedFileInfoFromFileInfo handles UDT padding Safe
BlobDB YES PARTIAL -- SST with blob references could be ingested, but blob files aren't transferred. This is a pre-existing limitation, not introduced by this PR. Not this PR's scope
Concurrent compaction YES YES -- Version ref keeps FileMetaData alive. File could be deleted after Version unref, but that's the documented TOCTOU responsibility on the caller. Safe

Assumption Stress Test

Claim: "FileDescriptor seqno fields match actual file contents"

  • Precondition: The MANIFEST correctly tracks smallest/largest seqno for each file.
  • This is a fundamental RocksDB invariant maintained by compaction and flush. If it's violated, many other things break. The override is correct because TableProperties.key_largest_seqno may be UINT64_MAX for old SST files that predate this property, while FileDescriptor always has the correct values from the MANIFEST.
  • Verdict: Sound.

Claim: "Path must exactly match a live table file"

  • Precondition: Caller provides the exact path as reported by GetLiveFilesMetaData().
  • The validation compares against TableFileName(cf_paths, file_number, path_id). This could fail if the caller uses a different but equivalent path (e.g., with symlinks or different casing on case-insensitive filesystems). This is documented behavior and matches the strict-path design intent.
  • Verdict: Acceptable -- strict matching is the right choice for safety.

Positive Observations

  • Clean separation of concerns: The method only extracts metadata -- it doesn't modify any DB state (except potentially populating the table cache, which is a read-side cache).
  • Correct refcounting: The Defer pattern with file_version->Ref()/Unref() and file_cfd->Ref()/UnrefAndTryDelete() correctly keeps the metadata alive while accessing it outside the mutex.
  • Good error validation: The method validates the file name format, checks the file exists in a live version, and validates the path matches -- preventing misuse.
  • SyncPoint in test: The test uses SyncPoint to verify the fast path is actually taken, preventing silent fallback to the slow path.
  • API documentation: The db.h comment clearly documents the table cache side effect.

ℹ️ About this response

Generated by Claude Code.
Review methodology: claude_md/code_review.md

Limitations:

  • Claude may miss context from files not in the diff
  • Large PRs may be truncated
  • Always apply human judgment to AI suggestions

Commands:

  • /claude-review [context] — Request a code review
  • /claude-query <question> — Ask about the PR or codebase
@meta-codesync meta-codesync Bot closed this in 5f66923 Jun 16, 2026
@meta-codesync meta-codesync Bot added the Merged label Jun 16, 2026
@meta-codesync

meta-codesync Bot commented Jun 16, 2026

Copy link
Copy Markdown

This pull request has been merged in 5f66923.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment