Skip to content

Add file ingestion histograms#14814

Closed
joshkang97 wants to merge 3 commits into
gh/joshkang97/3/basefrom
gh/joshkang97/3/head
Closed

Add file ingestion histograms#14814
joshkang97 wants to merge 3 commits into
gh/joshkang97/3/basefrom
gh/joshkang97/3/head

Conversation

@joshkang97

@joshkang97 joshkang97 commented Jun 3, 2026

Copy link
Copy Markdown
Contributor

Stack from ghstack (oldest at bottom):

Summary

Adds a public Statistics histogram, INGEST_EXTERNAL_FILE_TIME ("rocksdb.ingest.external.file.micros"), recording the end-to-end latency in microseconds of each IngestExternalFile(s) call. Ingestion timing was previously only available through per-thread perf_context counters, which require setting a PerfLevel and are not aggregated, so there was no process-wide latency distribution (p50/p99/max) for dashboards.

It is recorded with an RAII StopWatch at the top of DBImpl::IngestExternalFiles -- one sample per call (not per column family), covering all return paths. It is null-safe and self-gating on the stats level, so there is no cost when statistics are off, and ingestion is not a hot path. Java bindings are kept in sync per the statistics.h requirement; the C API needs no change.

Test Plan

New ExternalSSTFileTest.IngestionTimingHistogram ingests two external files with statistics enabled and asserts the histogram count goes 1 -> 2 with max > 0, confirming per-call (not per-CF) recording. The existing StatisticsTest.SanityHistograms covers enum/name-map sync.

  • make -j192 statistics_test external_sst_file_test -- build OK
  • ./statistics_test --gtest_filter='*Sanity*' -- 2 PASSED
  • ./external_sst_file_test --gtest_filter='*IngestionTimingHistogram*' -- PASSED, and 5/5 under COERCE_CONTEXT_SWITCH=1 --gtest_repeat=5
  • make format-auto / make check-sources -- clean
[ghstack-poisoned]
@meta-cla meta-cla Bot added the CLA Signed label Jun 3, 2026
@github-actions

github-actions Bot commented Jun 3, 2026

Copy link
Copy Markdown

✅ clang-tidy: No findings on changed lines

Completed in 259.4s.

[ghstack-poisoned]
joshkang97 added a commit that referenced this pull request Jun 3, 2026
Adds a public `Statistics` histogram, `INGEST_EXTERNAL_FILE_TIME` (`"rocksdb.ingest.external.file.micros"`), recording the end-to-end latency in microseconds of each `IngestExternalFile(s)` call. Ingestion timing was previously only available through per-thread `perf_context` counters, which require setting a `PerfLevel` and are not aggregated, so there was no process-wide latency distribution (p50/p99/max) for dashboards.

It is recorded with an RAII `StopWatch` at the top of `DBImpl::IngestExternalFiles` -- one sample per call (not per column family), covering all return paths. It is null-safe and self-gating on the stats level, so there is no cost when statistics are off, and ingestion is not a hot path. Java bindings are kept in sync per the `statistics.h` requirement; the C API needs no change.

New `ExternalSSTFileTest.IngestionTimingHistogram` ingests two external files with statistics enabled and asserts the histogram count goes 1 -> 2 with `max > 0`, confirming per-call (not per-CF) recording. The existing `StatisticsTest.SanityHistograms` covers enum/name-map sync.

- `make -j192 statistics_test external_sst_file_test` -- build OK
- `./statistics_test --gtest_filter='*Sanity*'` -- 2 PASSED
- `./external_sst_file_test --gtest_filter='*IngestionTimingHistogram*'` -- PASSED, and 5/5 under `COERCE_CONTEXT_SWITCH=1 --gtest_repeat=5`
- `make format-auto` / `make check-sources` -- clean

ghstack-source-id: 50abdc2
Pull-Request: #14814
@github-actions

github-actions Bot commented Jun 3, 2026

Copy link
Copy Markdown

Codex Code Review - OBSOLETE

Superseded by a newer AI review. Expand to see the original review.

🟡 Codex Code Review

Auto-triggered after CI reached the early-review threshold — reviewing commit a63aa8f


Codex review failed before producing findings.

WARNING: proceeding, even though we could not update PATH: Refusing to create helper binaries under temporary dir "/tmp" (codex_home: AbsolutePathBuf("/tmp/codex-home"))
error: the argument '--base <BRANCH>' cannot be used with '[PROMPT]'

Usage: codex exec review --commit <SHA> --base <BRANCH> --title <TITLE> --model <MODEL> --config <key=value> --dangerously-bypass-approvals-and-sandbox --output-last-message <FILE> [PROMPT]

For more information, try '--help'.

ℹ️ About this response

Generated by Codex CLI.
Review methodology: claude_md/code_review.md

Limitations:

  • Codex may miss context from files not in the diff
  • Large PRs may be truncated
  • Always apply human judgment to AI suggestions

Commands:

  • /codex-review [context] — Request a code review
  • /codex-query <question> — Ask about the PR or codebase
@github-actions

github-actions Bot commented Jun 3, 2026

Copy link
Copy Markdown

Claude Code Review - OBSOLETE

Superseded by a newer AI review. Expand to see the original review.

✅ Claude Code Review

Auto-triggered after CI reached the early-review threshold — reviewing commit a63aa8f


Summary

Clean, well-structured change that adds a new INGEST_EXTERNAL_FILE_TIME histogram following established RocksDB patterns. The implementation is correct: RAII StopWatch is null-safe, self-gating on stats level, stack-allocated, and covers all return paths. File ingestion is a cold path, so the timing overhead is negligible. Java bindings, enum ordering, and name mapping are all correct.

No high-severity findings.

Full review (click to expand)

Findings

🔴 HIGH

None.

🟡 MEDIUM

None.

🟢 LOW / NIT

L1. Missing test for error-path histogram recording -- db/external_sst_file_test.cc
  • Issue: The test only covers successful ingestion paths. Since StopWatch uses RAII, the histogram is also recorded on error paths (e.g., invalid arguments, null column family). A test verifying the count increments even on a failed ingestion would document this intentional behavior.
  • Root cause: The current test focuses on the happy path only.
  • Suggested fix: Consider adding an assertion that the histogram count increments even when IngestExternalFile returns a non-OK status. This is optional since the RAII behavior is guaranteed by StopWatch's implementation, and existing histogram tests in the codebase (e.g., FLUSH_TIME tests) follow the same happy-path-only pattern.
L2. Histogram name vs enum name slight mismatch -- monitoring/statistics.cc
  • Issue: The enum is INGEST_EXTERNAL_FILE_TIME but the string name is rocksdb.ingest.external.file.micros. The suffix differs (TIME vs micros). This is not a bug -- other histograms follow the same pattern (e.g., COMPACTION_TIME maps to rocksdb.compaction.times.micros, FLUSH_TIME maps to rocksdb.flush.write.time). The naming is consistent with existing conventions.
  • Suggested fix: No change needed. Just noting for awareness.

Cross-Component Analysis

Context Executes? Safe? Notes
WritePreparedTxnDB Yes Yes Inherits DBImpl, timing-only change
ReadOnly DB No N/A Ingestion not supported
SecondaryInstance No N/A Ingestion not supported
User-defined timestamps Yes Yes Timing only, no key comparison
FIFO/Universal compaction Yes Yes Timing only
Remote compaction No N/A Not a compaction path

StopWatch zero-cost verification:

  • When stats_ == nullptr: stats_enabled_ = false, start_time_ = 0 (no NowMicros() call), destructor is a no-op.
  • When stats level <= kExceptTimers: same zero-cost behavior.
  • When enabled: two NowMicros() calls (construction + destruction), negligible for a cold-path operation.

PERF_TIMER_GUARD interaction: Both are independent RAII objects. PERF_TIMER_GUARD writes to thread-local perf_context, StopWatch writes to shared Statistics. No conflict.

Positive Observations

  • Correct use of RAII pattern ensures all return paths are covered without code duplication.
  • Comment in the source code clearly documents the design intent (one sample per call, not per CF).
  • Java bindings are kept in sync with correct sequential byte value (0x43).
  • Release note is concise and appropriate for external consumption.
  • Test follows existing histogram test patterns (FLUSH_TIME in flush_job_test.cc).

ℹ️ About this response

Generated by Claude Code.
Review methodology: claude_md/code_review.md

Limitations:

  • Claude may miss context from files not in the diff
  • Large PRs may be truncated
  • Always apply human judgment to AI suggestions

Commands:

  • /claude-review [context] — Request a code review
  • /claude-query <question> — Ask about the PR or codebase
@github-actions

github-actions Bot commented Jun 3, 2026

Copy link
Copy Markdown

Codex Code Review - OBSOLETE

Superseded by a newer AI review. Expand to see the original review.

🟡 Codex Code Review

Auto-triggered after CI reached the early-review threshold — reviewing commit 25eb468


Codex review failed before producing findings.

WARNING: proceeding, even though we could not update PATH: Refusing to create helper binaries under temporary dir "/tmp" (codex_home: AbsolutePathBuf("/tmp/codex-home"))
error: the argument '--base <BRANCH>' cannot be used with '[PROMPT]'

Usage: codex exec review --commit <SHA> --base <BRANCH> --title <TITLE> --model <MODEL> --config <key=value> --dangerously-bypass-approvals-and-sandbox --output-last-message <FILE> [PROMPT]

For more information, try '--help'.

ℹ️ About this response

Generated by Codex CLI.
Review methodology: claude_md/code_review.md

Limitations:

  • Codex may miss context from files not in the diff
  • Large PRs may be truncated
  • Always apply human judgment to AI suggestions

Commands:

  • /codex-review [context] — Request a code review
  • /codex-query <question> — Ask about the PR or codebase
@github-actions

github-actions Bot commented Jun 3, 2026

Copy link
Copy Markdown

Claude Code Review - OBSOLETE

Superseded by a newer AI review. Expand to see the original review.

✅ Claude Code Review

Auto-triggered after CI reached the early-review threshold — reviewing commit 25eb468


Summary

Clean, well-implemented addition of a new histogram following established RocksDB patterns. The StopWatch RAII placement is correct, Java bindings are properly synchronized, and the enum ordering is maintained. No correctness or compatibility issues found.

High-severity findings (0):
No high-severity findings.

Full review (click to expand)

Findings

🔴 HIGH

None.

🟡 MEDIUM

M1. Test does not cover failed ingestion or stats-disabled paths -- db/external_sst_file_test.cc:301
  • Issue: The test only exercises the happy path (successful ingestion with StatsLevel::kAll). It does not verify that (a) the histogram still records timing on a failed ingestion, or (b) the histogram is not recorded when stats level is kExceptTimers or lower.
  • Root cause: Missing edge case test coverage.
  • Suggested fix: Add a test case that triggers a failed ingestion (e.g., comparator mismatch) and asserts hd.count == 1 with hd.max > 0. Add another test case with StatsLevel::kExceptTimers and assert hd.count == 0.
M2. Test does not cover multi-CF ingestion (per-call vs per-CF claim) -- db/external_sst_file_test.cc:301
  • Issue: The PR claims "one sample per call, not per column family" but the test only uses single-CF ingestion via GenerateAndAddExternalFile. The multi-CF path through IngestExternalFiles with multiple args is not tested.
  • Root cause: The test comment references this behavior but doesn't verify it.
  • Suggested fix: Add a test case that calls IngestExternalFiles with 2+ column families and asserts exactly one histogram sample is recorded.

🟢 LOW / NIT

L1. Minor naming pattern inconsistency -- monitoring/statistics.cc:389
  • Issue: The enum name is INGEST_EXTERNAL_FILE_TIME but the string name is "rocksdb.ingest.external.file.micros". Other DB-level operations use a "rocksdb.db.*" prefix (e.g., FLUSH_TIME maps to "rocksdb.db.flush.micros", DB_GET maps to "rocksdb.db.get.micros").
  • Suggested fix: Consider "rocksdb.db.ingest.external.file.micros" for consistency with the db.* namespace. This is a public-facing string so it's worth getting right before release. Not blocking.

Cross-Component Analysis

Check Result
Enum before HISTOGRAM_ENUM_MAX Correct
HistogramsNameMap ordering matches enum Correct
SanityHistograms test will pass Yes -- both size and positional ordering maintained
Java JNI byte 0x43 unique Yes -- next available after 0x42
Java bidirectional mapping Correctly implemented
C API changes needed No -- C API uses integer types
StopWatch null-safe Yes -- checks statistics != nullptr
StopWatch stats-level gated Yes -- requires > kExceptTimers
All return paths covered (RAII) Yes
Thread safety OK -- StopWatch is stack-local; Statistics is thread-safe
IngestExternalFile delegates to IngestExternalFiles Verified at db_impl.cc:6664-6672

Positive Observations

  • Follows the exact same StopWatch pattern used by DB_GET, DB_WRITE, DB_MULTIGET
  • RAII design ensures timing is captured on all exit paths including early error returns
  • No performance concern -- ingestion is not a hot path, and StopWatch is zero-cost when stats are disabled
  • Java bindings properly updated in all three required locations
  • Release note is concise and appropriate

ℹ️ About this response

Generated by Claude Code.
Review methodology: claude_md/code_review.md

Limitations:

  • Claude may miss context from files not in the diff
  • Large PRs may be truncated
  • Always apply human judgment to AI suggestions

Commands:

  • /claude-review [context] — Request a code review
  • /claude-query <question> — Ask about the PR or codebase
@github-actions

github-actions Bot commented Jun 3, 2026

Copy link
Copy Markdown

🟡 Codex Code Review

Auto-triggered after CI passed — reviewing commit 25eb468


Codex review failed before producing findings.

WARNING: proceeding, even though we could not update PATH: Refusing to create helper binaries under temporary dir "/tmp" (codex_home: AbsolutePathBuf("/tmp/codex-home"))
error: the argument '--base <BRANCH>' cannot be used with '[PROMPT]'

Usage: codex exec review --commit <SHA> --base <BRANCH> --title <TITLE> --model <MODEL> --config <key=value> --dangerously-bypass-approvals-and-sandbox --output-last-message <FILE> [PROMPT]

For more information, try '--help'.

ℹ️ About this response

Generated by Codex CLI.
Review methodology: claude_md/code_review.md

Limitations:

  • Codex may miss context from files not in the diff
  • Large PRs may be truncated
  • Always apply human judgment to AI suggestions

Commands:

  • /codex-review [context] — Request a code review
  • /codex-query <question> — Ask about the PR or codebase
@github-actions

github-actions Bot commented Jun 3, 2026

Copy link
Copy Markdown

✅ Claude Code Review

Auto-triggered after CI passed — reviewing commit 25eb468


Summary

Clean, well-structured PR that follows established RocksDB patterns for adding a histogram. The StopWatch placement, enum ordering, Java bindings, name mapping, and test are all correct. No correctness, performance, or compatibility issues found.

High-severity findings (0):

No high-severity findings.

Full review (click to expand)

Findings

🔴 HIGH

None.

🟡 MEDIUM

None.

🟢 LOW / NIT

L1. Consider testing multi-CF ingestion count -- db/external_sst_file_test.cc
  • Issue: The test verifies histogram count increments with two single-CF ingestion calls, but does not test a single IngestExternalFiles call with multiple column families to confirm exactly one sample per call (not per CF).
  • Suggested fix: Add a test case that calls IngestExternalFiles with 2+ CFs and asserts hd.count == 1 for that single call. This would directly validate the "one sample per call, not per CF" claim in the PR description and comments.
L2. Pre-existing: Missing Java bindings for NUM_OP_PER_TRANSACTION and MULTISCAN_PREPARE_ITERATORS
  • Issue: These two histograms exist in the C++ Histograms enum (statistics.h:753,756) but are missing from java/rocksjni/portal.h and java/src/main/java/org/rocksdb/HistogramType.java. This is a pre-existing issue not introduced by this PR.
  • Suggested fix: Address in a separate PR to keep this one focused.

Cross-Component Analysis

Context Executes? Safe? Notes
WritePreparedTxnDB Yes Yes Timing only, no behavioral impact
ReadOnly DB No N/A Returns NotSupported before reaching IngestExternalFiles
Secondary Instance No N/A Returns NotSupported
User-defined timestamps Yes Yes Timing only
Concurrent ingestions Yes Yes StopWatch is stack-local RAII, thread-safe

Assumption verification:

  • "Null-safe": Verified -- StopWatch constructor checks statistics && before any dereference (stop_watch.h:26,29,34)
  • "No cost when disabled": Verified -- start_time_ is 0 when stats disabled, no clock calls (stop_watch.h:42-43)
  • "One sample per call": Verified -- StopWatch at function entry, IngestExternalFile (singular) delegates to IngestExternalFiles (plural) at db_impl.cc:6672
  • "Covers all return paths": Verified -- RAII destructor fires on every exit

StatisticsImpl array sizing: HISTOGRAM_ENUM_MAX change is handled automatically via INTERNAL_HISTOGRAM_ENUM_MAX (statistics_impl.h:38-39). The SanityHistograms test validates enum-to-name-map consistency.

Positive Observations

  • Correct use of RAII StopWatch pattern, consistent with DB_GET, DB_WRITE, COMPACTION_TIME, and other existing histograms
  • Enum naming convention (*_TIME -> *.micros string) matches established patterns
  • Java binding hex value 0x43 is correctly sequenced
  • Test is deterministic and follows ExternalSSTFileTest conventions
  • Release note is appropriately concise for external users
  • No performance concern: ingestion is a cold path, and StopWatch has zero overhead when statistics are disabled

ℹ️ About this response

Generated by Claude Code.
Review methodology: claude_md/code_review.md

Limitations:

  • Claude may miss context from files not in the diff
  • Large PRs may be truncated
  • Always apply human judgment to AI suggestions

Commands:

  • /claude-review [context] — Request a code review
  • /claude-query <question> — Ask about the PR or codebase
[ghstack-poisoned]
@joshkang97 joshkang97 closed this Jun 8, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

1 participant