Add file ingestion histograms#14814
Conversation
✅ clang-tidy: No findings on changed linesCompleted in 259.4s. |
Adds a public `Statistics` histogram, `INGEST_EXTERNAL_FILE_TIME` (`"rocksdb.ingest.external.file.micros"`), recording the end-to-end latency in microseconds of each `IngestExternalFile(s)` call. Ingestion timing was previously only available through per-thread `perf_context` counters, which require setting a `PerfLevel` and are not aggregated, so there was no process-wide latency distribution (p50/p99/max) for dashboards. It is recorded with an RAII `StopWatch` at the top of `DBImpl::IngestExternalFiles` -- one sample per call (not per column family), covering all return paths. It is null-safe and self-gating on the stats level, so there is no cost when statistics are off, and ingestion is not a hot path. Java bindings are kept in sync per the `statistics.h` requirement; the C API needs no change. New `ExternalSSTFileTest.IngestionTimingHistogram` ingests two external files with statistics enabled and asserts the histogram count goes 1 -> 2 with `max > 0`, confirming per-call (not per-CF) recording. The existing `StatisticsTest.SanityHistograms` covers enum/name-map sync. - `make -j192 statistics_test external_sst_file_test` -- build OK - `./statistics_test --gtest_filter='*Sanity*'` -- 2 PASSED - `./external_sst_file_test --gtest_filter='*IngestionTimingHistogram*'` -- PASSED, and 5/5 under `COERCE_CONTEXT_SWITCH=1 --gtest_repeat=5` - `make format-auto` / `make check-sources` -- clean ghstack-source-id: 50abdc2 Pull-Request: #14814
Codex Code Review - OBSOLETESuperseded by a newer AI review. Expand to see the original review.🟡 Codex Code ReviewAuto-triggered after CI reached the early-review threshold — reviewing commit a63aa8f ❌ Codex review failed before producing findings. ℹ️ About this responseGenerated by Codex CLI. Limitations:
Commands:
|
Claude Code Review - OBSOLETESuperseded by a newer AI review. Expand to see the original review.✅ Claude Code ReviewAuto-triggered after CI reached the early-review threshold — reviewing commit a63aa8f SummaryClean, well-structured change that adds a new No high-severity findings. Full review (click to expand)Findings🔴 HIGHNone. 🟡 MEDIUMNone. 🟢 LOW / NITL1. Missing test for error-path histogram recording --
|
| Context | Executes? | Safe? | Notes |
|---|---|---|---|
| WritePreparedTxnDB | Yes | Yes | Inherits DBImpl, timing-only change |
| ReadOnly DB | No | N/A | Ingestion not supported |
| SecondaryInstance | No | N/A | Ingestion not supported |
| User-defined timestamps | Yes | Yes | Timing only, no key comparison |
| FIFO/Universal compaction | Yes | Yes | Timing only |
| Remote compaction | No | N/A | Not a compaction path |
StopWatch zero-cost verification:
- When
stats_ == nullptr:stats_enabled_= false,start_time_= 0 (noNowMicros()call), destructor is a no-op. - When stats level <=
kExceptTimers: same zero-cost behavior. - When enabled: two
NowMicros()calls (construction + destruction), negligible for a cold-path operation.
PERF_TIMER_GUARD interaction: Both are independent RAII objects. PERF_TIMER_GUARD writes to thread-local perf_context, StopWatch writes to shared Statistics. No conflict.
Positive Observations
- Correct use of RAII pattern ensures all return paths are covered without code duplication.
- Comment in the source code clearly documents the design intent (one sample per call, not per CF).
- Java bindings are kept in sync with correct sequential byte value (0x43).
- Release note is concise and appropriate for external consumption.
- Test follows existing histogram test patterns (
FLUSH_TIMEinflush_job_test.cc).
ℹ️ About this response
Generated by Claude Code.
Review methodology: claude_md/code_review.md
Limitations:
- Claude may miss context from files not in the diff
- Large PRs may be truncated
- Always apply human judgment to AI suggestions
Commands:
/claude-review [context]— Request a code review/claude-query <question>— Ask about the PR or codebase
Codex Code Review - OBSOLETESuperseded by a newer AI review. Expand to see the original review.🟡 Codex Code ReviewAuto-triggered after CI reached the early-review threshold — reviewing commit 25eb468 ❌ Codex review failed before producing findings. ℹ️ About this responseGenerated by Codex CLI. Limitations:
Commands:
|
Claude Code Review - OBSOLETESuperseded by a newer AI review. Expand to see the original review.✅ Claude Code ReviewAuto-triggered after CI reached the early-review threshold — reviewing commit 25eb468 SummaryClean, well-implemented addition of a new histogram following established RocksDB patterns. The StopWatch RAII placement is correct, Java bindings are properly synchronized, and the enum ordering is maintained. No correctness or compatibility issues found. High-severity findings (0): Full review (click to expand)Findings🔴 HIGHNone. 🟡 MEDIUMM1. Test does not cover failed ingestion or stats-disabled paths --
|
| Check | Result |
|---|---|
| Enum before HISTOGRAM_ENUM_MAX | Correct |
| HistogramsNameMap ordering matches enum | Correct |
| SanityHistograms test will pass | Yes -- both size and positional ordering maintained |
| Java JNI byte 0x43 unique | Yes -- next available after 0x42 |
| Java bidirectional mapping | Correctly implemented |
| C API changes needed | No -- C API uses integer types |
| StopWatch null-safe | Yes -- checks statistics != nullptr |
| StopWatch stats-level gated | Yes -- requires > kExceptTimers |
| All return paths covered (RAII) | Yes |
| Thread safety | OK -- StopWatch is stack-local; Statistics is thread-safe |
| IngestExternalFile delegates to IngestExternalFiles | Verified at db_impl.cc:6664-6672 |
Positive Observations
- Follows the exact same
StopWatchpattern used byDB_GET,DB_WRITE,DB_MULTIGET - RAII design ensures timing is captured on all exit paths including early error returns
- No performance concern -- ingestion is not a hot path, and StopWatch is zero-cost when stats are disabled
- Java bindings properly updated in all three required locations
- Release note is concise and appropriate
ℹ️ About this response
Generated by Claude Code.
Review methodology: claude_md/code_review.md
Limitations:
- Claude may miss context from files not in the diff
- Large PRs may be truncated
- Always apply human judgment to AI suggestions
Commands:
/claude-review [context]— Request a code review/claude-query <question>— Ask about the PR or codebase
🟡 Codex Code ReviewAuto-triggered after CI passed — reviewing commit 25eb468 ❌ Codex review failed before producing findings. ℹ️ About this responseGenerated by Codex CLI. Limitations:
Commands:
|
✅ Claude Code ReviewAuto-triggered after CI passed — reviewing commit 25eb468 SummaryClean, well-structured PR that follows established RocksDB patterns for adding a histogram. The StopWatch placement, enum ordering, Java bindings, name mapping, and test are all correct. No correctness, performance, or compatibility issues found. High-severity findings (0): No high-severity findings. Full review (click to expand)Findings🔴 HIGHNone. 🟡 MEDIUMNone. 🟢 LOW / NITL1. Consider testing multi-CF ingestion count --
|
| Context | Executes? | Safe? | Notes |
|---|---|---|---|
| WritePreparedTxnDB | Yes | Yes | Timing only, no behavioral impact |
| ReadOnly DB | No | N/A | Returns NotSupported before reaching IngestExternalFiles |
| Secondary Instance | No | N/A | Returns NotSupported |
| User-defined timestamps | Yes | Yes | Timing only |
| Concurrent ingestions | Yes | Yes | StopWatch is stack-local RAII, thread-safe |
Assumption verification:
- "Null-safe": Verified -- StopWatch constructor checks
statistics &&before any dereference (stop_watch.h:26,29,34) - "No cost when disabled": Verified --
start_time_is 0 when stats disabled, no clock calls (stop_watch.h:42-43) - "One sample per call": Verified -- StopWatch at function entry,
IngestExternalFile(singular) delegates toIngestExternalFiles(plural) atdb_impl.cc:6672 - "Covers all return paths": Verified -- RAII destructor fires on every exit
StatisticsImpl array sizing: HISTOGRAM_ENUM_MAX change is handled automatically via INTERNAL_HISTOGRAM_ENUM_MAX (statistics_impl.h:38-39). The SanityHistograms test validates enum-to-name-map consistency.
Positive Observations
- Correct use of RAII
StopWatchpattern, consistent withDB_GET,DB_WRITE,COMPACTION_TIME, and other existing histograms - Enum naming convention (
*_TIME->*.microsstring) matches established patterns - Java binding hex value
0x43is correctly sequenced - Test is deterministic and follows
ExternalSSTFileTestconventions - Release note is appropriately concise for external users
- No performance concern: ingestion is a cold path, and StopWatch has zero overhead when statistics are disabled
ℹ️ About this response
Generated by Claude Code.
Review methodology: claude_md/code_review.md
Limitations:
- Claude may miss context from files not in the diff
- Large PRs may be truncated
- Always apply human judgment to AI suggestions
Commands:
/claude-review [context]— Request a code review/claude-query <question>— Ask about the PR or codebase
Stack from ghstack (oldest at bottom):
Summary
Adds a public
Statisticshistogram,INGEST_EXTERNAL_FILE_TIME("rocksdb.ingest.external.file.micros"), recording the end-to-end latency in microseconds of eachIngestExternalFile(s)call. Ingestion timing was previously only available through per-threadperf_contextcounters, which require setting aPerfLeveland are not aggregated, so there was no process-wide latency distribution (p50/p99/max) for dashboards.It is recorded with an RAII
StopWatchat the top ofDBImpl::IngestExternalFiles-- one sample per call (not per column family), covering all return paths. It is null-safe and self-gating on the stats level, so there is no cost when statistics are off, and ingestion is not a hot path. Java bindings are kept in sync per thestatistics.hrequirement; the C API needs no change.Test Plan
New
ExternalSSTFileTest.IngestionTimingHistogramingests two external files with statistics enabled and asserts the histogram count goes 1 -> 2 withmax > 0, confirming per-call (not per-CF) recording. The existingStatisticsTest.SanityHistogramscovers enum/name-map sync.make -j192 statistics_test external_sst_file_test-- build OK./statistics_test --gtest_filter='*Sanity*'-- 2 PASSED./external_sst_file_test --gtest_filter='*IngestionTimingHistogram*'-- PASSED, and 5/5 underCOERCE_CONTEXT_SWITCH=1 --gtest_repeat=5make format-auto/make check-sources-- clean