Skip to content

Fix backward iteration over embedded blob SSTs#14902

Open
pdillinger wants to merge 1 commit into
facebook:mainfrom
pdillinger:embedded_blob_fix1
Open

Fix backward iteration over embedded blob SSTs#14902
pdillinger wants to merge 1 commit into
facebook:mainfrom
pdillinger:embedded_blob_fix1

Conversation

@pdillinger

Copy link
Copy Markdown
Contributor

Summary:
D110140135/#14896 worked around a db_stress failure by disabling test_backward_scan whenever embedded-blob SSTs are ingested. The real limitation was in EmbeddedBlobResolvingIterator: DBIter requires the underlying iterator's value to be pinned for backward iteration (Prev/SeekForPrev), and rebuilt wide-column entity values (with same-file blob columns) were stored in an iterator-owned std::string that was cleared on every reposition, so IsValuePinned() returned false and DBIter returned NotSupported (or tripped the IsValuePinned() assertion in FindValueForCurrentKeyUsingSeek). Whole-value same-file blobs were already pinned and worked.

This change makes the rebuilt wide-column value pinnable like a whole-value blob: it is moved into a heap buffer and pinned into resolved_pinned_value_, whose cleanup ResetState() hands to the PinnedIteratorsManager across repositioning so the value survives backward iteration. The now-redundant resolved_value_ member is removed and value()/IsValuePinned() simplified. The Get/MultiGet resolution path (MaybeResolveEmbeddedValue) is unchanged. The db_stress test_backward_scan workaround is removed now that backward iteration over embedded blobs (including wide-column entities) is supported.

Test Plan:
Added DBBlobIndexTest.EmbeddedBlobBackwardIteration: ingests an embedded-blob SST with whole-value blobs plus a mixed wide-column entity (inline default column + embedded same-file blob column), then iterates backward via SeekToLast/Prev and SeekForPrev, verifying values and columns. Verified it fails before the fix (NotSupported at the entity) and passes after.

Full db_blob_index_test and sst_file_reader_test suites pass. make format-auto clean. Direct db_stress run with
ingest_external_file_with_embedded_blobs=1, use_put_entity_one_in=3, test_backward_scan=1, iterpercent>0 ran with no divergence/NotSupported/ assert.

…cebook#14896)

Summary:
D110140135/facebook#14896 worked around a db_stress failure by disabling
test_backward_scan whenever embedded-blob SSTs are ingested. The real
limitation was in EmbeddedBlobResolvingIterator: DBIter requires the
underlying iterator's value to be pinned for backward iteration
(Prev/SeekForPrev), and rebuilt wide-column entity values (with same-file
blob columns) were stored in an iterator-owned std::string that was cleared
on every reposition, so IsValuePinned() returned false and DBIter returned
NotSupported (or tripped the IsValuePinned() assertion in
FindValueForCurrentKeyUsingSeek). Whole-value same-file blobs were already
pinned and worked.

This change makes the rebuilt wide-column value pinnable like a whole-value
blob: it is moved into a heap buffer and pinned into resolved_pinned_value_,
whose cleanup ResetState() hands to the PinnedIteratorsManager across
repositioning so the value survives backward iteration. The now-redundant
resolved_value_ member is removed and value()/IsValuePinned() simplified.
The Get/MultiGet resolution path (MaybeResolveEmbeddedValue) is unchanged.
The db_stress test_backward_scan workaround is removed now that backward
iteration over embedded blobs (including wide-column entities) is supported.

Test Plan:
Added DBBlobIndexTest.EmbeddedBlobBackwardIteration: ingests an embedded-blob
SST with whole-value blobs plus a mixed wide-column entity (inline default
column + embedded same-file blob column), then iterates backward via
SeekToLast/Prev and SeekForPrev, verifying values and columns. Verified it
fails before the fix (NotSupported at the entity) and passes after.

Full db_blob_index_test and sst_file_reader_test suites pass. make
format-auto clean. Direct db_stress run with
ingest_external_file_with_embedded_blobs=1, use_put_entity_one_in=3,
test_backward_scan=1, iterpercent>0 ran with no divergence/NotSupported/
assert.
@pdillinger pdillinger requested a review from anand1976 June 30, 2026 23:47
@meta-cla meta-cla Bot added the CLA Signed label Jun 30, 2026
@github-actions

Copy link
Copy Markdown

✅ clang-tidy: No findings on changed lines

Completed in 181.8s.

@github-actions

github-actions Bot commented Jul 1, 2026

Copy link
Copy Markdown

🟡 Codex Code Review

Auto-triggered after CI passed — reviewing commit 517e5a9


Codex review failed before producing findings.

WARNING: proceeding, even though we could not create PATH aliases: Refusing to create helper binaries under temporary dir "/tmp" (codex_home: AbsolutePathBuf("/tmp/codex-home"))
error: the argument '--base <BRANCH>' cannot be used with '[PROMPT]'

Usage: codex exec review --commit <SHA> --base <BRANCH> --title <TITLE> --model <MODEL> --config <key=value> --dangerously-bypass-approvals-and-sandbox --output-last-message <FILE> [PROMPT]

For more information, try '--help'.

ℹ️ About this response

Generated by Codex CLI.
Review methodology: claude_md/code_review.md

Limitations:

  • Codex may miss context from files not in the diff
  • Large PRs may be truncated
  • Always apply human judgment to AI suggestions

Commands:

  • /codex-review [context] — Request a code review
  • /codex-query <question> — Ask about the PR or codebase
@github-actions

github-actions Bot commented Jul 1, 2026

Copy link
Copy Markdown

✅ Claude Code Review

Auto-triggered after CI passed — reviewing commit 517e5a9


Summary

This PR correctly fixes backward iteration over embedded blob SSTs by making rebuilt wide-column entity values pinnable via a heap-allocated buffer pinned into resolved_pinned_value_. The approach is sound: the PinSlice/DelegateCleanupsTo/Reset lifecycle is correct, the DBIter contract is satisfied, and the test coverage is appropriate. The change is well-scoped and removes the db_stress workaround cleanly.

No high-severity findings.

Full review (click to expand)

Findings

🔴 HIGH

None.

🟡 MEDIUM

M1. Heap allocation per wide-column value resolution on iterator path -- embedded_blob_resolving_iterator.h:340
  • Issue: Every resolved wide-column entity now incurs a new std::string heap allocation (plus the subsequent delete in cleanup). This is on the iterator hot path for SSTs with wide-column entities.
  • Root cause: The design chose a heap allocation to satisfy PinnableSlice's cleanup model. The previous resolved_value_ (stack-owned std::string member) avoided heap allocation but couldn't be pinned.
  • Mitigating factors: Wide-column entity resolution already performs heavyweight deserialization/serialization with multiple allocations (vector of WideColumn, rewritten_columns, etc.), so this additional allocation is marginal in relative terms. The PR description's FIXME comment in MaybeResolveEmbeddedValue (line 1514) already acknowledges this path is "very heavyweight."
  • Suggested improvement (not blocking): Consider using PinnableSlice::PinSelf() instead -- move the resolved value into resolved_pinned_value_'s internal self_space_ buffer and mark it as self-pinned. This would avoid the heap allocation while still satisfying the DelegateCleanupsTo contract (PinSelf doesn't register cleanups, so DelegateCleanupsTo is a no-op, but the data_ pointer would become dangling after Reset). So the current approach is actually necessary for correctness. This finding is informational only.
M2. No test for backward iteration with blob cache -- db_blob_index_test.cc:433
  • Issue: The new EmbeddedBlobBackwardIteration test uses default options which configure no explicit blob cache. The ResolveEmbeddedBlobPinned path (no BlobSource) uses a different pinning mechanism than ResolveEmbeddedBlobCached. Both paths should be tested for backward iteration to ensure the pinning works in both cases.
  • Suggested fix: Add a variant of the test with options.blob_cache = NewLRUCache(...) to exercise the cached blob path during backward iteration.

🟢 LOW / NIT

L1. The resolved_value_ removal simplifies the class nicely
  • The elimination of resolved_value_ and the if/else in value() is a clean simplification. The unified path through resolved_pinned_value_ reduces cognitive load and potential for inconsistency.
L2. Consider adding a forward-then-reverse direction-change test -- db_blob_index_test.cc
  • Issue: The test exercises SeekToLast+Prev and SeekForPrev, but doesn't test the direction-change case (e.g., Next followed by Prev on the same iterator). This is the path that exercises FindValueForCurrentKeyUsingSeek which has the assert(iter_.iter()->IsValuePinned()) at db_iter.cc:1464.
  • Suggested fix: Add a sub-test that does SeekToFirst, Next (to k2), then Prev (back to k1) to exercise the direction-change path.

Cross-Component Analysis

Context Does code execute? Assumptions hold? Action needed?
WritePreparedTxnDB YES (via DBIter) YES - pinning is orthogonal to visibility Safe
ReadOnly DB YES YES - read path only Safe
User-defined timestamps YES YES - pinning is independent of timestamp handling Safe
FIFO/Universal compaction N/A - ingested SSTs, not compaction-produced N/A Safe
No PinnedIteratorsManager Possible in forward-only iteration Safe - value_is_pinned_ check requires PIM Handled
Concurrent writers N/A - iterator operates on immutable SST N/A Safe
BlobDB (separate-file blobs) Code exits early (not same-file) YES Safe

Assumption Stress Test

Claim: "PinSlice assert(!pinned_) will never fire"

  • Precondition: resolved_pinned_value_ must not already be pinned when PinSlice is called.
  • For whole-value blob path: MaybeResolveEmbeddedValue pins directly into resolved_pinned_value_, then !value_pinned is false, so PinSlice is NOT called. Correct.
  • For wide-column path: MaybeResolveEmbeddedValue does NOT pin into resolved_pinned_value_ (value_pinned=false), so PinSlice is called on an unpinned PinnableSlice. Correct.
  • Guard: value_prepared_ prevents double-materialization. Safe.

Claim: "No double-free of heap string"

  • When PIM active: DelegateCleanupsTo moves cleanup to PIM, clearing resolved_pinned_value_'s cleanup list. Then Reset() calls DoCleanup() which is a no-op (function==nullptr). String deleted when PIM::ReleasePinnedData() runs.
  • When PIM inactive: DelegateCleanupsTo is NOT called. Reset() calls DoCleanup() which runs ReleaseResolvedValueBuffer, deleting the string. Correct.
  • Verified via Cleanable::DelegateCleanupsTo implementation (cleanable.cc:51-66): after delegation, source's cleanup_.function is set to nullptr.

Claim: "Slice returned by value() remains valid during backward iteration"

  • DBIter saves pinned_value_ = iter_.value() which is a Slice pointing to the heap string's data.
  • On reposition, ResetState() delegates the heap string's cleanup to PIM before Reset().
  • PIM keeps the cleanup alive until ReleasePinnedData() is called at the end of FindValueForCurrentKey.
  • The Slice in pinned_value_ remains valid throughout the backward scan. Correct.

Positive Observations

  • The fix is minimal and well-targeted -- it addresses the exact issue (wide-column values not being pinnable) without overengineering.
  • The resolved_value_ removal is a net simplification of the class.
  • The test directly verifies the previously-failing scenario (backward iteration over wide-column entity with embedded blob columns).
  • The db_stress workaround removal is clean -- the comment explaining why it was needed is now obsolete.
  • The cleanup callback pattern (static function + heap-allocated buffer) follows established RocksDB conventions for PinnableSlice lifecycle management.

ℹ️ About this response

Generated by Claude Code.
Review methodology: claude_md/code_review.md

Limitations:

  • Claude may miss context from files not in the diff
  • Large PRs may be truncated
  • Always apply human judgment to AI suggestions

Commands:

  • /claude-review [context] — Request a code review
  • /claude-query <question> — Ask about the PR or codebase

@anand1976 anand1976 left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@meta-codesync

meta-codesync Bot commented Jul 1, 2026

Copy link
Copy Markdown

@pdillinger has imported this pull request. If you are a Meta employee, you can view this in D110271453.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

2 participants