block GetCreationTimeOfOldestFile on async file open completion (#14723)#14723
block GetCreationTimeOfOldestFile on async file open completion (#14723)#14723joshkang97 wants to merge 1 commit into
Conversation
|
@joshkang97 has exported this pull request. If you are a Meta employee, you can view the originating Diff in D104285992. |
✅ clang-tidy: No findings on changed linesCompleted in 388.6s. |
…book#14723) Summary: ### Context `GetCreationTimeOfOldestFile()` assumed `max_open_files` = -1 meant every live SST had a pinned table reader. That is not true with `open_files_async`: recovery intentionally skips loading table files, `DB::Open()` returns, and `BGWorkAsyncFileOpen()` pins readers later. Any caller — e.g. fb_rocksdb's daily report at `FbRocksDb.cpp:1025` — invoking the API in the window between `DB::Open` returning and the background opener completing trips a debug assert. Reported in https://fb.workplace.com/groups/rocksdb/permalink/31668956482726231/. Removing the assert alone is insufficient. For legacy DBs whose manifest does not carry `file_creation_time`, `FileMetaData::TryGetFileCreationTime()` falls back to the pinned reader; with no reader, it returns `kUnknownFileCreationTime` and the function silently returns 0 (the "info unavailable" sentinel). The caller cannot distinguish "no info" from "raced with async open." ### Changes - Remove the invalid debug assert in `Version::GetCreationTimeOfOldestFile`. - Add a private helper `DBImpl::WaitForAsyncFileOpen()` that blocks on `bg_cv_` while `bg_async_file_open_state_ == kScheduled`. The synchronization machinery (`bg_async_file_open_state_` + `bg_cv_`) already exists — the destructor wait loop in `db_impl.cc:674-687` uses the same pattern. The helper is a no-op when `open_files_async = false`, and bails on `shutting_down_` so `DB::Close()` is not blocked by an in-flight caller. - Call `WaitForAsyncFileOpen()` at the top of `DBImpl::GetCreationTimeOfOldestFile()` (inside the `max_open_files == -1` branch). - Document the blocking behavior in `include/rocksdb/db.h`. - Replace the regression test with one that uses a `DBImpl::WaitForAsyncFileOpen::BeforeWait` sync point: spawn a thread that calls `GetCreationTimeOfOldestFile`, deterministically confirm it blocks inside the wait, release async open, confirm the caller wakes with the real value. ### Potential Followups (not included here) - Apply the same wait to `GetLiveFilesMetaData` and `GetColumnFamilyMetaData` — both zero out `oldest_ancester_time` / `file_creation_time` in the SST metadata they return during the async-open window (`db/version_set.cc:7877-7878`, `2090-2091`, `2172-2173`). - Address compaction-picker effects: TTL/periodic file selection (`db/version_set.cc:4039`, `4092-4094`), bottommost over-marking (`db/version_set.cc:4700-4701`), FIFO TTL/temperature pickers (`db/compaction/compaction_picker_fifo.cc:105-107`, `167-170`, `401-411`), and tiered-compaction output time inheritance (`db/compaction/compaction.cc:981`, `1000`). - Harden `FbRocksDb.cpp:1025` to check `status.ok()` instead of `status.code() != kNotSupported`. Differential Revision: D104285992
81c5a18 to
543039d
Compare
…book#14723) Summary: ### Context `GetCreationTimeOfOldestFile()` assumed `max_open_files` = -1 meant every live SST had a pinned table reader. That is not true with `open_files_async`: recovery intentionally skips loading table files, `DB::Open()` returns, and `BGWorkAsyncFileOpen()` pins readers later. Any caller — e.g. fb_rocksdb's daily report at `FbRocksDb.cpp:1025` — invoking the API in the window between `DB::Open` returning and the background opener completing trips a debug assert. Reported in https://fb.workplace.com/groups/rocksdb/permalink/31668956482726231/. Removing the assert alone is insufficient. For legacy DBs whose manifest does not carry `file_creation_time`, `FileMetaData::TryGetFileCreationTime()` falls back to the pinned reader; with no reader, it returns `kUnknownFileCreationTime` and the function silently returns 0 (the "info unavailable" sentinel). The caller cannot distinguish "no info" from "raced with async open." ### Changes - Remove the invalid debug assert in `Version::GetCreationTimeOfOldestFile`. - Add a private helper `DBImpl::WaitForAsyncFileOpen()` that blocks on `bg_cv_` while `bg_async_file_open_state_ == kScheduled`. The synchronization machinery (`bg_async_file_open_state_` + `bg_cv_`) already exists — the destructor wait loop in `db_impl.cc:674-687` uses the same pattern. The helper is a no-op when `open_files_async = false`, and bails on `shutting_down_` so `DB::Close()` is not blocked by an in-flight caller. - Call `WaitForAsyncFileOpen()` at the top of `DBImpl::GetCreationTimeOfOldestFile()` (inside the `max_open_files == -1` branch). - Document the blocking behavior in `include/rocksdb/db.h`. - Replace the regression test with one that uses a `DBImpl::WaitForAsyncFileOpen::BeforeWait` sync point: spawn a thread that calls `GetCreationTimeOfOldestFile`, deterministically confirm it blocks inside the wait, release async open, confirm the caller wakes with the real value. ### Potential Followups (not included here) - Apply the same wait to `GetLiveFilesMetaData` and `GetColumnFamilyMetaData` — both zero out `oldest_ancester_time` / `file_creation_time` in the SST metadata they return during the async-open window (`db/version_set.cc:7877-7878`, `2090-2091`, `2172-2173`). - Address compaction-picker effects: TTL/periodic file selection (`db/version_set.cc:4039`, `4092-4094`), bottommost over-marking (`db/version_set.cc:4700-4701`), FIFO TTL/temperature pickers (`db/compaction/compaction_picker_fifo.cc:105-107`, `167-170`, `401-411`), and tiered-compaction output time inheritance (`db/compaction/compaction.cc:981`, `1000`). - Harden `FbRocksDb.cpp:1025` to check `status.ok()` instead of `status.code() != kNotSupported`. Differential Revision: D104285992
543039d to
d094290
Compare
Claude Code Review - OBSOLETESuperseded by a newer AI review. Expand to see the original review.✅ Claude Code ReviewAuto-triggered after CI passed — reviewing commit d094290 SummaryThis PR correctly addresses the debug assert crash and race in High-severity findings (2):
Full review (click to expand)Findings🔴 HIGHH1. Missing API documentation update —
|
| Context | Does code execute? | Assumptions hold? | Action needed? |
|---|---|---|---|
| WritePreparedTxnDB | YES (inherits DBImpl) | YES | None |
| ReadOnly DB | YES (supports open_files_async) | YES | None — WaitForAsyncFileOpen works the same |
| SecondaryInstance | NO (no open_files_async support) | N/A | None |
| FIFO compaction | Not directly — FIFO reads creation_time via table properties, not this API | N/A | Pre-existing separate issue (mentioned in PR follow-ups) |
| Concurrent callers | YES | YES — each caller gets its own stack-local once_flag; WaitForAsyncFileOpen is thread-safe (acquires mutex_) | None |
| Shutdown during wait | YES | Partially — wait exits but retry still runs | See M2 |
Debated findings resolved:
-
Multi-CF retry logic (initially flagged as HIGH by 5+ agents): After debate, determined to be a false positive.
WaitForAsyncFileOpen()completes ALL CFs' file opening globally. When the loop proceeds to CF2, its firstversion->GetCreationTimeOfOldestFile(&ctime)call sees the already-pinned readers and returns real values. Thecall_onceonly needs to retry the CF that triggered the wait. If async opening failed for some CFs, returning 0 is correct (creation time is genuinely unknown). -
SuperVersion lifetime during wait (initially flagged as HIGH): After verification, determined to be safe.
GetAndRefSuperVersion()increments the ref count, preventing the Version and its FileMetaData from being freed during the wait. Theversionpointer remains valid. -
Test 1 sync point logic (initially flagged as HIGH): After verification, determined to be correct. For modern DBs,
TryGetFileCreationTime()returns the manifest value directly (not 0), so thectime == 0condition is false andWaitForAsyncFileOpen()is never called. The sync point is never reached,waitedstays false, and the test correctly validates this behavior. -
Thread safety of concurrent callers (initially flagged as HIGH): After analysis, determined to be safe. Each call to
GetCreationTimeOfOldestFile()creates its own stack-localstd::once_flag.WaitForAsyncFileOpen()acquiresmutex_properly and the wait loop correctly handles spurious wakeups by re-checking bothbg_async_file_open_state_andshutting_down_.
Positive Observations
- The fix correctly identifies that modern DBs (manifest carries
file_creation_time) never need the wait — the optimization to only wait whenctime == 0is well-designed. WaitForAsyncFileOpen()follows the established pattern from the destructor wait loop (db_impl.cc:774-783), ensuring consistency.- The
shutting_down_check prevents the wait from blockingDB::Close(). - Test 2 uses a well-designed sync point dependency chain to deterministically test the blocking behavior with a simulated legacy DB.
- The release note is clear and at the right level for external consumption.
ℹ️ About this response
Generated by Claude Code.
Review methodology: claude_md/code_review.md
Limitations:
- Claude may miss context from files not in the diff
- Large PRs may be truncated
- Always apply human judgment to AI suggestions
Commands:
/claude-review [context]— Request a code review/claude-query <question>— Ask about the PR or codebase
…book#14723) Summary: ### Context `GetCreationTimeOfOldestFile()` assumed `max_open_files` = -1 meant every live SST had a pinned table reader. That is not true with `open_files_async`: recovery intentionally skips loading table files, `DB::Open()` returns, and `BGWorkAsyncFileOpen()` pins readers later. Any caller — e.g. fb_rocksdb's daily report at `FbRocksDb.cpp:1025` — invoking the API in the window between `DB::Open` returning and the background opener completing trips a debug assert. Reported in https://fb.workplace.com/groups/rocksdb/permalink/31668956482726231/. Removing the assert alone is insufficient. For legacy DBs whose manifest does not carry `file_creation_time`, `FileMetaData::TryGetFileCreationTime()` falls back to the pinned reader; with no reader, it returns `kUnknownFileCreationTime` and the function silently returns 0 (the "info unavailable" sentinel). The caller cannot distinguish "no info" from "raced with async open." ### Changes - Remove the invalid debug assert in `Version::GetCreationTimeOfOldestFile`. - Add a private helper `DBImpl::WaitForAsyncFileOpen()` that blocks on `bg_cv_` while `bg_async_file_open_state_ == kScheduled`. The synchronization machinery (`bg_async_file_open_state_` + `bg_cv_`) already exists — the destructor wait loop in `db_impl.cc:674-687` uses the same pattern. The helper is a no-op when `open_files_async = false`, and bails on `shutting_down_` so `DB::Close()` is not blocked by an in-flight caller. - Call `WaitForAsyncFileOpen()` at the top of `DBImpl::GetCreationTimeOfOldestFile()` (inside the `max_open_files == -1` branch). - Document the blocking behavior in `include/rocksdb/db.h`. - Replace the regression test with one that uses a `DBImpl::WaitForAsyncFileOpen::BeforeWait` sync point: spawn a thread that calls `GetCreationTimeOfOldestFile`, deterministically confirm it blocks inside the wait, release async open, confirm the caller wakes with the real value. ### Potential Followups (not included here) - Apply the same wait to `GetLiveFilesMetaData` and `GetColumnFamilyMetaData` — both zero out `oldest_ancester_time` / `file_creation_time` in the SST metadata they return during the async-open window (`db/version_set.cc:7877-7878`, `2090-2091`, `2172-2173`). - Address compaction-picker effects: TTL/periodic file selection (`db/version_set.cc:4039`, `4092-4094`), bottommost over-marking (`db/version_set.cc:4700-4701`), FIFO TTL/temperature pickers (`db/compaction/compaction_picker_fifo.cc:105-107`, `167-170`, `401-411`), and tiered-compaction output time inheritance (`db/compaction/compaction.cc:981`, `1000`). - Harden `FbRocksDb.cpp:1025` to check `status.ok()` instead of `status.code() != kNotSupported`. Differential Revision: D104285992
d094290 to
98b3995
Compare
✅ Claude Code ReviewAuto-triggered after CI passed — reviewing commit 98b3995 SummarySound bugfix that correctly addresses the High-severity findings (0): No high-severity findings. Full review (click to expand)Findings🔴 HIGHNone. 🟡 MEDIUMM1. Consider checking background error after wait —
|
| Context | Safe? | Notes |
|---|---|---|
| Shutdown (CancelAllBackgroundWork) | Yes | shutting_down_ set under mutex_ with bg_cv_.SignalAll() — wait always terminates |
| Destructor wait loop | Yes | Both use same bg_cv_; no priority inversion |
| DBImplReadOnly / DBImplSecondary | Yes | MarkAsyncFileOpenNotNeeded() sets state to kComplete; wait is no-op |
| WritePreparedTxnDB | Yes | Inherits DBImpl; no impact on creation time queries |
| Multiple concurrent callers | Yes | Each gets own stack-local once_flag; all wait on same bg_cv_ (standard pattern) |
| BGWorkAsyncFileOpen failure | Safe but imprecise | State transitions to kComplete regardless; API returns 0 (see M1) |
Positive Observations
- Correct synchronization: Reuses established
mutex_+bg_cv_pattern from destructor wait loop (db_impl.cc:774-783) — proven reliable. - Lazy evaluation: Modern DBs never block; only legacy DBs without manifest
file_creation_timetrigger the wait. Good performance preservation. - FileMetaData sharing insight: Correctly leverages the fact that
FileMetaDataobjects are shared between versions, so readers pinned byBGWorkAsyncFileOpenare visible through any version referencing the same metadata. - Test design: Tests use deterministic sync point dependencies rather than sleep-based timing, following RocksDB best practices.
- Surgical scope: Minimal change footprint — only touches the affected code paths without architectural disruption.
ℹ️ About this response
Generated by Claude Code.
Review methodology: claude_md/code_review.md
Limitations:
- Claude may miss context from files not in the diff
- Large PRs may be truncated
- Always apply human judgment to AI suggestions
Commands:
/claude-review [context]— Request a code review/claude-query <question>— Ask about the PR or codebase
|
This pull request has been merged in e07ccc3. |
Summary:
Context
GetCreationTimeOfOldestFile()assumedmax_open_files= -1 meant every live SST had a pinned table reader. That is not true withopen_files_async: recovery intentionally skips loading table files,DB::Open()returns, andBGWorkAsyncFileOpen()pins readers later. Any caller — e.g. fb_rocksdb's daily report atFbRocksDb.cpp:1025— invoking the API in the window betweenDB::Openreturning and the background opener completing trips a debug assert. Reported in https://fb.workplace.com/groups/rocksdb/permalink/31668956482726231/.Removing the assert alone is insufficient. For legacy DBs whose manifest does not carry
file_creation_time,FileMetaData::TryGetFileCreationTime()falls back to the pinned reader; with no reader, it returnskUnknownFileCreationTimeand the function silently returns 0 (the "info unavailable" sentinel). The caller cannot distinguish "no info" from "raced with async open."Changes
Version::GetCreationTimeOfOldestFile.DBImpl::WaitForAsyncFileOpen()that blocks onbg_cv_whilebg_async_file_open_state_ == kScheduled. The synchronization machinery (bg_async_file_open_state_+bg_cv_) already exists — the destructor wait loop indb_impl.cc:674-687uses the same pattern. The helper is a no-op whenopen_files_async = false, and bails onshutting_down_soDB::Close()is not blocked by an in-flight caller.WaitForAsyncFileOpen()at the top ofDBImpl::GetCreationTimeOfOldestFile()(inside themax_open_files == -1branch).include/rocksdb/db.h.DBImpl::WaitForAsyncFileOpen::BeforeWaitsync point: spawn a thread that callsGetCreationTimeOfOldestFile, deterministically confirm it blocks inside the wait, release async open, confirm the caller wakes with the real value.Potential Followups (not included here)
GetLiveFilesMetaDataandGetColumnFamilyMetaData— both zero outoldest_ancester_time/file_creation_timein the SST metadata they return during the async-open window (db/version_set.cc:7877-7878,2090-2091,2172-2173).db/version_set.cc:4039,4092-4094), bottommost over-marking (db/version_set.cc:4700-4701), FIFO TTL/temperature pickers (db/compaction/compaction_picker_fifo.cc:105-107,167-170,401-411), and tiered-compaction output time inheritance (db/compaction/compaction.cc:981,1000).FbRocksDb.cpp:1025to checkstatus.ok()instead ofstatus.code() != kNotSupported.Differential Revision: D104285992