Skip to content

Add file ingestion histograms (#14836)#14836

Closed
joshkang97 wants to merge 1 commit into
facebook:mainfrom
joshkang97:export-D107721260
Closed

Add file ingestion histograms (#14836)#14836
joshkang97 wants to merge 1 commit into
facebook:mainfrom
joshkang97:export-D107721260

Conversation

@joshkang97

@joshkang97 joshkang97 commented Jun 8, 2026

Copy link
Copy Markdown
Contributor

Summary:

Adds a public Statistics histogram, INGEST_EXTERNAL_FILE_TIME ("rocksdb.ingest.external.file.micros"), recording the end-to-end latency in microseconds of each IngestExternalFile(s) call. Ingestion timing was previously only available through per-thread perf_context counters, which require setting a PerfLevel and are not aggregated, so there was no process-wide latency distribution (p50/p99/max) for dashboards.

It is recorded with an RAII StopWatch at the top of DBImpl::IngestExternalFiles -- one sample per call (not per column family), covering all return paths. It is null-safe and self-gating on the stats level, so there is no cost when statistics are off, and ingestion is not a hot path. Java bindings are kept in sync per the statistics.h requirement; the C API needs no change.

Reviewed By: xingbowang

Differential Revision: D107721260

@joshkang97 joshkang97 force-pushed the export-D107721260 branch from f2dbd3e to 75aae94 Compare June 8, 2026 23:48
@meta-cla meta-cla Bot added the CLA Signed label Jun 8, 2026
@meta-codesync

meta-codesync Bot commented Jun 8, 2026

Copy link
Copy Markdown

@joshkang97 has exported this pull request. If you are a Meta employee, you can view the originating Diff in D107721260.

@github-actions

github-actions Bot commented Jun 8, 2026

Copy link
Copy Markdown

✅ clang-tidy: No findings on changed lines

Completed in 258.4s.

Summary:

Adds a public `Statistics` histogram, `INGEST_EXTERNAL_FILE_TIME` (`"rocksdb.ingest.external.file.micros"`), recording the end-to-end latency in microseconds of each `IngestExternalFile(s)` call. Ingestion timing was previously only available through per-thread `perf_context` counters, which require setting a `PerfLevel` and are not aggregated, so there was no process-wide latency distribution (p50/p99/max) for dashboards.

It is recorded with an RAII `StopWatch` at the top of `DBImpl::IngestExternalFiles` -- one sample per call (not per column family), covering all return paths. It is null-safe and self-gating on the stats level, so there is no cost when statistics are off, and ingestion is not a hot path. Java bindings are kept in sync per the `statistics.h` requirement; the C API needs no change.

Reviewed By: xingbowang

Differential Revision: D107721260
@meta-codesync meta-codesync Bot changed the title Add file ingestion histograms Jun 9, 2026
@joshkang97 joshkang97 force-pushed the export-D107721260 branch from 75aae94 to fa4d9ea Compare June 9, 2026 00:20
@github-actions

github-actions Bot commented Jun 9, 2026

Copy link
Copy Markdown

🟡 Codex Code Review

Auto-triggered after CI passed — reviewing commit fa4d9ea


Codex review failed before producing findings.

WARNING: proceeding, even though we could not create PATH aliases: Refusing to create helper binaries under temporary dir "/tmp" (codex_home: AbsolutePathBuf("/tmp/codex-home"))
error: the argument '--base <BRANCH>' cannot be used with '[PROMPT]'

Usage: codex exec review --commit <SHA> --base <BRANCH> --title <TITLE> --model <MODEL> --config <key=value> --dangerously-bypass-approvals-and-sandbox --output-last-message <FILE> [PROMPT]

For more information, try '--help'.

ℹ️ About this response

Generated by Codex CLI.
Review methodology: claude_md/code_review.md

Limitations:

  • Codex may miss context from files not in the diff
  • Large PRs may be truncated
  • Always apply human judgment to AI suggestions

Commands:

  • /codex-review [context] — Request a code review
  • /codex-query <question> — Ask about the PR or codebase
@github-actions

github-actions Bot commented Jun 9, 2026

Copy link
Copy Markdown

✅ Claude Code Review

Auto-triggered after CI passed — reviewing commit fa4d9ea


Summary

Clean, well-scoped addition of two histogram metrics for IngestExternalFiles latency. The implementation is correct, thread-safe, and has no performance concerns. No high-severity issues found.

No high-severity findings.

Full review (click to expand)

Findings

🔴 HIGH

No high-severity findings.

🟡 MEDIUM

M1. Missing test for kExceptTimers stats level — db/external_sst_file_test.cc
  • Issue: The test only verifies StatsLevel::kAll (timers enabled) but does not verify that histograms are NOT recorded when the stats level is kExceptTimers or lower. The implementation gates recording on stats_->get_stats_level() > StatsLevel::kExceptTimers, but this condition is never tested.
  • Suggested fix: Add a test case that sets StatsLevel::kExceptTimers, performs a successful ingestion, and asserts histogram counts remain 0.
M2. Missing test for multi-CF ingestion — db/external_sst_file_test.cc
  • Issue: The test comment says "One sample per call (not per CF)" but this is only tested with single-CF ingestion calls via IngestExternalFile. The code records timing in IngestExternalFiles (plural), which handles multi-CF ingestion. No test validates that a single multi-CF call produces exactly one sample per histogram.
  • Suggested fix: Add a test that calls IngestExternalFiles with multiple column families and verifies count == 1 for each histogram.

🟢 LOW / NIT

L1. Enum naming: _TIME vs _MICROSinclude/rocksdb/statistics.h
  • Issue: The new enums use _TIME suffix (INGEST_EXTERNAL_FILE_PREPARE_TIME) while the most recent histogram additions use _MICROS (MULTISCAN_PREPARE_MICROS). Both conventions exist in the codebase (COMPACTION_TIME, FLUSH_TIME vs STALL_MICROS, DB_MUTEX_WAIT_MICROS), so this is not wrong, but it creates mild inconsistency with the newest additions.
  • Suggested fix: Consider renaming to INGEST_EXTERNAL_FILE_PREPARE_MICROS / INGEST_EXTERNAL_FILE_RUN_MICROS to match recent convention, or keep as-is since _TIME has ample precedent.
L2. PR description mentions "RAII StopWatch" but implementation uses manual NowMicros()
  • Issue: The PR description says "recorded with an RAII StopWatch" but the actual implementation uses three manual NowMicros() calls. The manual approach is correct (and arguably better for two-phase measurement), but the description is misleading.
  • Suggested fix: Update PR description to match implementation.
L3. Redundant stats level guard — db/db_impl/db_impl.cc
  • Issue: The manual check stats_->get_stats_level() > StatsLevel::kExceptTimers is redundant with the guard inside reportTimeToHistogram() (statistics.h:844). However, the manual check serves a useful purpose: it avoids calling NowMicros() when stats are disabled. This is a valid optimization pattern.
  • Assessment: Acceptable as-is. No action needed.

Cross-Component Analysis

Context Affected? Assessment
WritePreparedTxnDB No Ingestion path unchanged
ReadOnly DB No Ingestion not supported
User-defined timestamps No Timing is orthogonal
Concurrent ingestion Safe stats_ is thread-safe, RecordTimeToHistogram is thread-safe
Java bindings Yes 0x43/0x44 values correct, non-conflicting, within signed byte range

Thread safety: stats_ is set once at DB open and never modified. Reading it without mutex is safe and follows established patterns throughout the codebase (e.g., DB_GET StopWatch at db_impl.cc:3144).

Performance: Three NowMicros() calls on the ingestion path (not a hot path). Self-gating when stats are disabled. No concern.

Positive Observations

  • Clean separation of prepare vs run phases, providing actionable insight (prepare = no write blocking, run = write blocking under mutex)
  • Proper null-safety and stats-level gating
  • Only recording on success avoids skewing latency distributions with error paths
  • Java bindings kept in sync with correct hex value assignment
  • Release note is concise and informative
  • Test covers both success and failure paths

ℹ️ About this response

Generated by Claude Code.
Review methodology: claude_md/code_review.md

Limitations:

  • Claude may miss context from files not in the diff
  • Large PRs may be truncated
  • Always apply human judgment to AI suggestions

Commands:

  • /claude-review [context] — Request a code review
  • /claude-query <question> — Ask about the PR or codebase
@meta-codesync

meta-codesync Bot commented Jun 9, 2026

Copy link
Copy Markdown

This pull request has been merged in 1ede57e.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment