Expose BackupEngine C API additions: StopBackup and rate limiters#14722
Expose BackupEngine C API additions: StopBackup and rate limiters#14722rajsuvariya-stripe wants to merge 6 commits into
Conversation
Add rocksdb_backup_engine_stop_backup() to c.h and db/c.cc so that language bindings using the C API (e.g. Rust via librocksdb-sys) can cancel an in-progress backup without needing a custom C++ shim. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> Committed-By-Agent: claude
Add rocksdb_backup_engine_options_set_backup_rate_limiter() and rocksdb_backup_engine_options_set_restore_rate_limiter() to c.h and db/c.cc so that language bindings (e.g. Rust via librocksdb-sys) can pass a full RateLimiter object — including kAllIo mode for read+write throttling — without needing a custom C++ shim. The existing backup_rate_limit / restore_rate_limit uint64_t setters only throttle writes; these new functions expose the richer shared_ptr<RateLimiter> fields that override the uint64_t limits. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> Committed-By-Agent: claude
| } | ||
|
|
||
| void rocksdb_create_backup_options_set_flush_before_backup( | ||
| rocksdb_create_backup_options_t* options, unsigned char v) { |
There was a problem hiding this comment.
why not use a bool here instead of the unsigned char v?
There was a problem hiding this comment.
Good catch — <stdbool.h> is included and bool does appear in a few places (e.g. rocksdb_multi_get_entity_cf, rocksdb_load_latest_options, rocksdb_write_buffer_manager_*). However, those look like inconsistencies
introduced in newer additions rather than an intentional shift, because:
- The file explicitly documents the convention at line 40: "Bools have the type unsigned char (0 == false; rest == true)"
- The overwhelming majority of the file — including all existing backup engine option setters (set_share_table_files, set_sync, set_destroy_old_data, set_backup_log_files) — uses unsigned char
Staying consistent with the immediate family of backup functions and the documented convention seems like the right call here. That said, if you'd prefer we use bool to match the newer additions and move the file
toward bool uniformity, I'm happy to make that change.
There was a problem hiding this comment.
i am fine as long as code owners are fine with the format.
| rocksdb_create_backup_options_t* options, void* callback_arg, | ||
| void (*callback)(void* arg)) { |
There was a problem hiding this comment.
Curious - how do we plan to use the callback_arg. The C++ api will call the callback_function without any arguments (std::function<void()>) - so at best we can track the number of times the callback function was called -- but that can be achieved other ways as well instead of passing callback args to this translation layer.
There was a problem hiding this comment.
The C++ callback is std::function<void()> with no arguments. The C API bridges this by capturing callback_arg in a lambda that calls the C function pointer with that arg: [callback, callback_arg]]() { callback(callback_arg); }. This is the standard C idiom for stateful callbacks (same pattern as pthread_create, qsort_r, etc.) — it lets callers pass context without requiring global state. Without callback_arg, a Rust or C caller would have to use a global variable to track progress state.
| rocksdb_env_destroy(benv); | ||
| CheckNoError(err); | ||
|
|
||
| int progress_count = 0; |
There was a problem hiding this comment.
progress_count is just incremented through the callback function, but I don't see validations if it was ever incremented or called into.
There was a problem hiding this comment.
Fixed — added assert(progress_count > 0) after the backup call. Also set callback_trigger_interval_size = 1 when opening the backup engine so the callback fires even on the small test database (the default 4MB threshold would never be reached in a unit test).
| { | ||
| rocksdb_ratelimiter_t* limiter = | ||
| rocksdb_ratelimiter_create_with_mode(1024 * 1024, 100 * 1000, 10, 2, | ||
| 0); | ||
| rocksdb_backup_engine_options_set_backup_rate_limiter(bdo, limiter); | ||
| rocksdb_backup_engine_options_set_restore_rate_limiter(bdo, limiter); | ||
| rocksdb_ratelimiter_destroy(limiter); | ||
| } | ||
|
|
There was a problem hiding this comment.
can this test evolved to see if there's going to be any error in actually running the backup with provided rate limiter?
There was a problem hiding this comment.
Fixed in the latest commit. For the rate limiter test, added a block that opens a real backup engine with both backup_rate_limiter and restore_rate_limiter set, runs a backup against the live DB, and asserts no error is returned.
| extern ROCKSDB_LIBRARY_API void rocksdb_backup_engine_stop_backup( | ||
| rocksdb_backup_engine_t* be); |
There was a problem hiding this comment.
would like to see a test for this method too.
There was a problem hiding this comment.
Fixed in the latest commit. For stop_backup, added a call to rocksdb_backup_engine_stop_backup in the backup_with_progress_callback test after the backup completes (safe to call even when no backup is in progress, exercises the code path).
|
Hey @jaykorean - Can you help review this PR? Thanks! |
|
@hemal-stripe I'm sorry, I currently don't have bandwidth at the moment. Added my teammates to help out. |
|
| Check | Count |
|---|---|
cert-err58-cpp |
4 |
clang-analyzer-optin.core.EnumCastOutOfRange |
4 |
concurrency-mt-unsafe |
9 |
cppcoreguidelines-avoid-non-const-global-variables |
5 |
cppcoreguidelines-pro-type-member-init |
1 |
cppcoreguidelines-special-member-functions |
1 |
| Total | 24 |
Details
db/blob/db_blob_direct_write_test.cc (1 warning(s))
db/blob/db_blob_direct_write_test.cc:127:7: warning: class 'ActiveBlobVisibilityWritableFile' defines a non-default destructor but does not define a move constructor or a move assignment operator [cppcoreguidelines-special-member-functions]
db/db_wal_test.cc (1 warning(s))
db/db_wal_test.cc:117:5: warning: uninitialized record type: 'fs_stat' [cppcoreguidelines-pro-type-member-init]
db_stress_tool/db_stress_common.cc (4 warning(s))
db_stress_tool/db_stress_common.cc:22:25: warning: variable 'raw_env' is non-const and globally accessible, consider making it const [cppcoreguidelines-avoid-non-const-global-variables]
db_stress_tool/db_stress_common.cc:22:25: warning: variable 'raw_env' provides global access to a non-const object; consider making the pointed-to data 'const' [cppcoreguidelines-avoid-non-const-global-variables]
db_stress_tool/db_stress_common.cc:25:56: warning: variable 'wbm' is non-const and globally accessible, consider making it const [cppcoreguidelines-avoid-non-const-global-variables]
db_stress_tool/db_stress_common.cc:26:49: warning: variable 'rate_limiter' is non-const and globally accessible, consider making it const [cppcoreguidelines-avoid-non-const-global-variables]
db_stress_tool/db_stress_test_base.cc (6 warning(s))
db_stress_tool/db_stress_test_base.cc:261:7: warning: function is not thread safe [concurrency-mt-unsafe]
db_stress_tool/db_stress_test_base.cc:4025:7: warning: function is not thread safe [concurrency-mt-unsafe]
db_stress_tool/db_stress_test_base.cc:4566:7: warning: function is not thread safe [concurrency-mt-unsafe]
db_stress_tool/db_stress_test_base.cc:4600:11: warning: function is not thread safe [concurrency-mt-unsafe]
db_stress_tool/db_stress_test_base.cc:4639:11: warning: function is not thread safe [concurrency-mt-unsafe]
db_stress_tool/db_stress_test_base.cc:5373:7: warning: function is not thread safe [concurrency-mt-unsafe]
db_stress_tool/db_stress_tool.cc (4 warning(s))
db_stress_tool/db_stress_tool.cc:41:5: warning: variable 'fault_fs_for_crash_report' is non-const and globally accessible, consider making it const [cppcoreguidelines-avoid-non-const-global-variables]
db_stress_tool/db_stress_tool.cc:402:7: warning: function is not thread safe [concurrency-mt-unsafe]
db_stress_tool/db_stress_tool.cc:416:9: warning: function is not thread safe [concurrency-mt-unsafe]
db_stress_tool/db_stress_tool.cc:448:9: warning: function is not thread safe [concurrency-mt-unsafe]
include/rocksdb/file_system.h (4 warning(s))
include/rocksdb/file_system.h:195:10: warning: The value '3' provided to the cast expression is not in the valid range of values for 'FileOpenContract' [clang-analyzer-optin.core.EnumCastOutOfRange]
include/rocksdb/file_system.h:195:10: warning: The value '3' provided to the cast expression is not in the valid range of values for 'FileOpenContract' [clang-analyzer-optin.core.EnumCastOutOfRange]
include/rocksdb/file_system.h:195:10: warning: The value '3' provided to the cast expression is not in the valid range of values for 'FileOpenContract' [clang-analyzer-optin.core.EnumCastOutOfRange]
include/rocksdb/file_system.h:195:10: warning: The value '3' provided to the cast expression is not in the valid range of values for 'FileOpenContract' [clang-analyzer-optin.core.EnumCastOutOfRange]
tools/ldb_cmd.cc (4 warning(s))
tools/ldb_cmd.cc:5469:19: warning: initialization of 'kInputFile' with static storage duration may throw an exception that cannot be caught [cert-err58-cpp]
tools/ldb_cmd.cc:5470:19: warning: initialization of 'kResultFile' with static storage duration may throw an exception that cannot be caught [cert-err58-cpp]
tools/ldb_cmd.cc:5537:51: warning: initialization of 'ARG_JOB_DIR' with static storage duration may throw an exception that cannot be caught [cert-err58-cpp]
tools/ldb_cmd.cc:5590:50: warning: initialization of 'ARG_JOB_DIR' with static storage duration may throw an exception that cannot be caught [cert-err58-cpp]
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> Committed-By-Agent: claude
27eed2a to
a23a485
Compare
|
@pdillinger @archang19 I have fixed the lint issues and removed the progress callback for now (I will raise a separate PR for that). Can you please review this PR now? |
Codex Code Review - OBSOLETESuperseded by a newer AI review. Expand to see the original review.🟡 Codex Code ReviewAuto-triggered after CI passed — reviewing commit a23a485 ❌ Codex review failed before producing findings. ℹ️ About this responseGenerated by Codex CLI. Limitations:
Commands:
|
Claude Code Review - OBSOLETESuperseded by a newer AI review. Expand to see the original review.✅ Claude Code ReviewAuto-triggered after CI passed — reviewing commit a23a485 SummaryClean C API extension that correctly follows existing patterns for shared_ptr rate limiter assignment and StopBackup delegation. The main gaps are missing test coverage for High-severity findings (1):
Full review (click to expand)Findings🔴 HIGHH1. Missing test for
|
| Context | Assumptions hold? | Action needed? |
|---|---|---|
| WritePreparedTxnDB | YES (backup options, not DB operations) | None |
| ReadOnly DB | YES | None |
| Concurrent writers | YES (StopBackup is atomic/thread-safe) | None |
| Multiple BackupEngine instances | YES (independent stop flags) | None |
Ownership semantics verified: shared_ptr copy from limiter->rep to options->rep.backup_rate_limiter correctly increments reference count. Destroying the C wrapper is safe.
Positive Observations
- Implementation correctly follows the existing
rocksdb_options_set_ratelimiter()pattern - shared_ptr ownership model avoids lifetime issues at the C/C++ boundary
- End-to-end test provides functional validation beyond set/get roundtrips
ℹ️ About this response
Generated by Claude Code.
Review methodology: claude_md/code_review.md
Limitations:
- Claude may miss context from files not in the diff
- Large PRs may be truncated
- Always apply human judgment to AI suggestions
Commands:
/claude-review [context]— Request a code review/claude-query <question>— Ask about the PR or codebase
…rate limiter, add docs - Test that StopBackup is one-way: after calling stop_backup, verify create_new_backup returns an error (Status::Incomplete) - Add end-to-end restore test in the rate limiter block to exercise the restore_rate_limiter path - Document StopBackup irreversibility in c.h - Document that set_backup/restore_rate_limiter overrides the uint64_t rate limit when non-null Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> Committed-By-Agent: claude
|
@archang19 @pdillinger can you help with review for this PR? I have fixed some of the comments from the LLM review. |
|
/claude-review |
1 similar comment
|
/claude-review |
|
@xingbowang has imported this pull request. If you are a Meta employee, you can view this in D107654882. |
✅ Claude Code ReviewRequested by @xingbowang SummaryClean, well-structured PR that correctly exposes BackupEngine C API additions. The shared_ptr ownership, thread safety, and error propagation are all correct. The rate limiter setter pattern matches the existing High-severity findings (0): No high-severity findings. Full review (click to expand)Findings🔴 HIGHNone. 🟡 MEDIUMM1. Missing getter functions for rate limiters —
|
| Context | Affected? | Assessment |
|---|---|---|
| BackupEngineReadOnly | No | StopBackup not applicable; rate limiters only used during backup/restore |
| Multiple BackupEngine instances | No | Each instance has independent stop_backup_ flag and options |
| Concurrent stop + backup | Safe | Atomic memory_order_release/acquire on stop_backup_ flag |
| shared_ptr lifetime | Safe | Copy into BackupEngineOptions increments refcount; C wrapper destruction only decrements it |
| Rate limiter override of uint64_t | Verified | backup_engine.cc constructor checks if (options_.backup_rate_limiter == nullptr && options_.backup_rate_limit > 0) |
Positive Observations
- Correct ownership model: The shared_ptr copy into
BackupEngineOptionsensures the RateLimiter stays alive even afterrocksdb_ratelimiter_destroyis called on the C wrapper. The test correctly validates this pattern. - Good documentation: The
c.hcomments accurately describe one-way StopBackup semantics and rate limiter override behavior. - Consistent patterns: The implementation follows established C API conventions (
if (limiter)guard,SaveErrorfor error propagation,ROCKSDB_LIBRARY_APIon declarations). - Thorough integration test: The rate limiter test creates a full engine, performs backup and restore, validating end-to-end functionality.
ℹ️ About this response
Generated by Claude Code.
Review methodology: claude_md/code_review.md
Limitations:
- Claude may miss context from files not in the diff
- Large PRs may be truncated
- Always apply human judgment to AI suggestions
Commands:
/claude-review [context]— Request a code review/claude-query <question>— Ask about the PR or codebase
|
Please fix build failure |
rate_restore_path was 200 bytes but snprintf appends ".rate_restore" (13 bytes) to dbname which is also 200 bytes, potentially needing 212 bytes total. GCC's -Werror=format-truncation= rejects this. Increase buffer to 220 bytes to satisfy the static analysis. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> Committed-By-Agent: claude
Without an early ShouldStopBackup() check, CreateNewBackupWithMetadata
proceeded into GetFileDbIdentities -> SstFileDumper, which tried to look
up an unregistered comparator, returned Status::OK() with a null pointer,
and triggered assert(user_comparator) in sst_file_dumper.cc.
Add the check at the top of CreateNewBackupWithMetadata so it returns
Incomplete("Backup stopped") immediately, consistent with the checks
already present in CopyOrCreateFile.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Committed-By-Agent: claude
|
@xingbowang I have fixed the two issues which were causing the build failures, can you rerun these builds?
|
🟡 Codex Code ReviewAuto-triggered after CI passed — reviewing commit 6dddbad ❌ Codex review failed before producing findings. ℹ️ About this responseGenerated by Codex CLI. Limitations:
Commands:
|
✅ Claude Code ReviewAuto-triggered after CI passed — reviewing commit 6dddbad SummaryClean, well-implemented PR that exposes existing BackupEngine functionality through the C API. The code follows established patterns exactly and introduces no correctness or performance concerns. The early High-severity findings (0): Full review (click to expand)Findings🔴 HIGHNone. 🟡 MEDIUMM1. StopBackup declaration placement in
|
| Concern | Assessment |
|---|---|
| shared_ptr lifetime after wrapper destroy | Safe — shared_ptr copy increments refcount |
| StopBackup on read-only engine | N/A at C API level — C API opens read-write |
| Early ShouldStopBackup vs GarbageCollect | No issue — early check prevents starting work |
| Concurrent StopBackup + CreateNewBackup | Safe — atomic with proper memory ordering |
| Performance impact | None — single atomic load on cold path |
Positive Observations
- Perfect adherence to existing C API patterns
- Good
c.hdocumentation explaining one-way StopBackup behavior - Comprehensive end-to-end test: backup + restore with rate limiters
- Correct shared_ptr lifetime management in tests
ℹ️ About this response
Generated by Claude Code.
Review methodology: claude_md/code_review.md
Limitations:
- Claude may miss context from files not in the diff
- Large PRs may be truncated
- Always apply human judgment to AI suggestions
Commands:
/claude-review [context]— Request a code review/claude-query <question>— Ask about the PR or codebase
|
@xingbowang has imported this pull request. If you are a Meta employee, you can view this in D107654882. |
|
@xingbowang merged this pull request in 0493154. |
Summary
Extends RocksDB's C API with two
BackupEnginecapabilities needed bylanguage bindings (e.g. Rust via librocksdb-sys) that consume the C API:
StopBackup: Add
rocksdb_backup_engine_stop_backup()to allow cancellingan in-progress backup.
Rate limiters: Add
rocksdb_backup_engine_options_set_backup_rate_limiter()androcksdb_backup_engine_options_set_restore_rate_limiter()to expose theshared_ptr<RateLimiter>fields onBackupEngineOptions. The existinguint64_tsetters only throttle writes; these expose the richerRateLimiterobject that supports read+write throttling (e.g.
kAllIomode).Test plan
db/c_test.ccoverStopBackupand rate limitersetter/getter roundtrips, plus opening a real backup engine with rate
limiters set and running a backup end-to-end
make checkpasses with no regressions