Fix backward iteration over embedded blob SSTs#14902
Conversation
…cebook#14896) Summary: D110140135/facebook#14896 worked around a db_stress failure by disabling test_backward_scan whenever embedded-blob SSTs are ingested. The real limitation was in EmbeddedBlobResolvingIterator: DBIter requires the underlying iterator's value to be pinned for backward iteration (Prev/SeekForPrev), and rebuilt wide-column entity values (with same-file blob columns) were stored in an iterator-owned std::string that was cleared on every reposition, so IsValuePinned() returned false and DBIter returned NotSupported (or tripped the IsValuePinned() assertion in FindValueForCurrentKeyUsingSeek). Whole-value same-file blobs were already pinned and worked. This change makes the rebuilt wide-column value pinnable like a whole-value blob: it is moved into a heap buffer and pinned into resolved_pinned_value_, whose cleanup ResetState() hands to the PinnedIteratorsManager across repositioning so the value survives backward iteration. The now-redundant resolved_value_ member is removed and value()/IsValuePinned() simplified. The Get/MultiGet resolution path (MaybeResolveEmbeddedValue) is unchanged. The db_stress test_backward_scan workaround is removed now that backward iteration over embedded blobs (including wide-column entities) is supported. Test Plan: Added DBBlobIndexTest.EmbeddedBlobBackwardIteration: ingests an embedded-blob SST with whole-value blobs plus a mixed wide-column entity (inline default column + embedded same-file blob column), then iterates backward via SeekToLast/Prev and SeekForPrev, verifying values and columns. Verified it fails before the fix (NotSupported at the entity) and passes after. Full db_blob_index_test and sst_file_reader_test suites pass. make format-auto clean. Direct db_stress run with ingest_external_file_with_embedded_blobs=1, use_put_entity_one_in=3, test_backward_scan=1, iterpercent>0 ran with no divergence/NotSupported/ assert.
✅ clang-tidy: No findings on changed linesCompleted in 181.8s. |
🟡 Codex Code ReviewAuto-triggered after CI passed — reviewing commit 517e5a9 ❌ Codex review failed before producing findings. ℹ️ About this responseGenerated by Codex CLI. Limitations:
Commands:
|
✅ Claude Code ReviewAuto-triggered after CI passed — reviewing commit 517e5a9 SummaryThis PR correctly fixes backward iteration over embedded blob SSTs by making rebuilt wide-column entity values pinnable via a heap-allocated buffer pinned into No high-severity findings. Full review (click to expand)Findings🔴 HIGHNone. 🟡 MEDIUMM1. Heap allocation per wide-column value resolution on iterator path --
|
| Context | Does code execute? | Assumptions hold? | Action needed? |
|---|---|---|---|
| WritePreparedTxnDB | YES (via DBIter) | YES - pinning is orthogonal to visibility | Safe |
| ReadOnly DB | YES | YES - read path only | Safe |
| User-defined timestamps | YES | YES - pinning is independent of timestamp handling | Safe |
| FIFO/Universal compaction | N/A - ingested SSTs, not compaction-produced | N/A | Safe |
| No PinnedIteratorsManager | Possible in forward-only iteration | Safe - value_is_pinned_ check requires PIM | Handled |
| Concurrent writers | N/A - iterator operates on immutable SST | N/A | Safe |
| BlobDB (separate-file blobs) | Code exits early (not same-file) | YES | Safe |
Assumption Stress Test
Claim: "PinSlice assert(!pinned_) will never fire"
- Precondition:
resolved_pinned_value_must not already be pinned when PinSlice is called. - For whole-value blob path: MaybeResolveEmbeddedValue pins directly into resolved_pinned_value_, then
!value_pinnedis false, so PinSlice is NOT called. Correct. - For wide-column path: MaybeResolveEmbeddedValue does NOT pin into resolved_pinned_value_ (value_pinned=false), so PinSlice is called on an unpinned PinnableSlice. Correct.
- Guard:
value_prepared_prevents double-materialization. Safe.
Claim: "No double-free of heap string"
- When PIM active: DelegateCleanupsTo moves cleanup to PIM, clearing resolved_pinned_value_'s cleanup list. Then Reset() calls DoCleanup() which is a no-op (function==nullptr). String deleted when PIM::ReleasePinnedData() runs.
- When PIM inactive: DelegateCleanupsTo is NOT called. Reset() calls DoCleanup() which runs ReleaseResolvedValueBuffer, deleting the string. Correct.
- Verified via Cleanable::DelegateCleanupsTo implementation (cleanable.cc:51-66): after delegation, source's cleanup_.function is set to nullptr.
Claim: "Slice returned by value() remains valid during backward iteration"
- DBIter saves
pinned_value_ = iter_.value()which is a Slice pointing to the heap string's data. - On reposition, ResetState() delegates the heap string's cleanup to PIM before Reset().
- PIM keeps the cleanup alive until ReleasePinnedData() is called at the end of FindValueForCurrentKey.
- The Slice in pinned_value_ remains valid throughout the backward scan. Correct.
Positive Observations
- The fix is minimal and well-targeted -- it addresses the exact issue (wide-column values not being pinnable) without overengineering.
- The
resolved_value_removal is a net simplification of the class. - The test directly verifies the previously-failing scenario (backward iteration over wide-column entity with embedded blob columns).
- The db_stress workaround removal is clean -- the comment explaining why it was needed is now obsolete.
- The cleanup callback pattern (static function + heap-allocated buffer) follows established RocksDB conventions for PinnableSlice lifecycle management.
ℹ️ About this response
Generated by Claude Code.
Review methodology: claude_md/code_review.md
Limitations:
- Claude may miss context from files not in the diff
- Large PRs may be truncated
- Always apply human judgment to AI suggestions
Commands:
/claude-review [context]— Request a code review/claude-query <question>— Ask about the PR or codebase
|
@pdillinger has imported this pull request. If you are a Meta employee, you can view this in D110271453. |
Summary:
D110140135/#14896 worked around a db_stress failure by disabling test_backward_scan whenever embedded-blob SSTs are ingested. The real limitation was in EmbeddedBlobResolvingIterator: DBIter requires the underlying iterator's value to be pinned for backward iteration (Prev/SeekForPrev), and rebuilt wide-column entity values (with same-file blob columns) were stored in an iterator-owned std::string that was cleared on every reposition, so IsValuePinned() returned false and DBIter returned NotSupported (or tripped the IsValuePinned() assertion in FindValueForCurrentKeyUsingSeek). Whole-value same-file blobs were already pinned and worked.
This change makes the rebuilt wide-column value pinnable like a whole-value blob: it is moved into a heap buffer and pinned into resolved_pinned_value_, whose cleanup ResetState() hands to the PinnedIteratorsManager across repositioning so the value survives backward iteration. The now-redundant resolved_value_ member is removed and value()/IsValuePinned() simplified. The Get/MultiGet resolution path (MaybeResolveEmbeddedValue) is unchanged. The db_stress test_backward_scan workaround is removed now that backward iteration over embedded blobs (including wide-column entities) is supported.
Test Plan:
Added DBBlobIndexTest.EmbeddedBlobBackwardIteration: ingests an embedded-blob SST with whole-value blobs plus a mixed wide-column entity (inline default column + embedded same-file blob column), then iterates backward via SeekToLast/Prev and SeekForPrev, verifying values and columns. Verified it fails before the fix (NotSupported at the entity) and passes after.
Full db_blob_index_test and sst_file_reader_test suites pass. make format-auto clean. Direct db_stress run with
ingest_external_file_with_embedded_blobs=1, use_put_entity_one_in=3, test_backward_scan=1, iterpercent>0 ran with no divergence/NotSupported/ assert.