Skip to content

Finish C API code generation (continues #14572)#14868

Closed
zaidoon1 wants to merge 28 commits into
facebook:mainfrom
zaidoon1:c-api-codegen
Closed

Finish C API code generation (continues #14572)#14868
zaidoon1 wants to merge 28 commits into
facebook:mainfrom
zaidoon1:c-api-codegen

Conversation

@zaidoon1

Copy link
Copy Markdown
Contributor

Summary

This continues and finishes #14572 ("Add semi-automated code generation for RocksDB C API bindings") by @xingbowang. The original author is unavailable to finish it, so I've taken it over. All 13 of the original commits are preserved (this branch was created from the PR head and builds on top of it — git log shows the original Xingbo Wang authorship intact); my follow-up work is in the commits prefixed C API codegen:.

The underlying design is unchanged and is the original author's: hand-written source templates (tools/c_api_gen/c_base.h / c_base.cc) plus two generators (auto-discovery from the C++ headers + a spec-driven generator) are inlined into a single, self-contained, @generated include/rocksdb/c.h and db/c.cc. This grows the public C API by 668 functions while keeping c.h a single includable header with no -I requirement (so bindgen and other FFI tools keep working unchanged).

This branch reconciles the PR with ~4 months of main and addresses the outstanding review feedback (clang-tidy bot, the automated code review, the c.h self-containedness discussion, and @pdillinger's points about include/rocksdb hygiene and @generated marking).

What changed on top of the original PR

Reconciled with current main

  • Merged current main (conflicts were confined to the generated/test files) and regenerated. Reconciled the 14 C API functions main added since the merge-base (e.g. rocksdb_set_db_options, the backup-engine rate limiters, memtable_batch_lookup_optimization, optimize_multiget_for_io, …) and restored 5 enum constants that upstream had added by hand (rocksdb_txndb_write_policy_*, ..._index_block_search_type_auto, rocksdb_blob_cache_read_byte).

Maintainer feedback (@pdillinger)

  • No non-user-includable files in include/rocksdb. Moved the hand-written templates out of include/rocksdb/ and db/ to tools/c_api_gen/c_base.{h,cc}. They were #include-ing generated fragments, which broke make check-headers and was shipped by make install. include/rocksdb/ now contains only the user-facing, self-contained, @generated c.h.
  • c.h / c.cc carry the // @generated marker.

Backward compatibility (zero ABI break)

  • The generator derived each wrapper's C type purely from the C++ field, which had silently changed 5 already-shipped signatures (e.g. rocksdb_writeoptions_disable_WAL intunsigned char). Added an ABI type-pinning layer (tools/c_api_gen/abi_type_overrides.json) so already-shipped functions keep their historical C signature (the body still casts to the real field type). A repo-wide diff against the merge-base now reports 0 ABI drift.
  • New check_api_compatibility.py gate (wired into CI + make) fails on any removed/changed public function or removed enum/typedef symbol, vs a reference revision. Intentional changes go in an allowlist with a reason.

Correctness (from the automated review)

  • Restored 5 option setters that were declared in c.h but defined nowhere (link failure for downstream bindings such as rust-rocksdb). Added check_api_completeness.py (dependency-free; runs in CI + make) asserting every declared function has exactly one definition — this is the gate that would have caught it.
  • CopyStringVector now null-checks malloc; the WAL filter std::moves the WriteBatch; the backup exclude-files callback captures by value instead of the wrapper pointer.

Build / CI robustness

  • Removed the dead C_API_CODEGEN_STAMP Makefile prerequisite (it was a silent no-op).
  • The make check staleness check is now opt-out-able (behind SKIP_FORMAT_BUCK_CHECKS) and skips gracefully when clang++ is unavailable, so make check works without the codegen toolchain. CI remains the authoritative gate.
  • Pinned clang-format consistently through regen_all.py / verify_generated_up_to_date.py (CI uses clang-format-21) so regeneration is byte-reproducible across environments.
  • Cleared all 20 clang-tidy warnings the bot reported on db/c.cc changed lines (fixed in the c_base.cc template, not the generated output).

Test coverage

  • Added gen_roundtrip_tests.py, which derives 462 set→get→assert round-trip checks across 25 option objects from the same generated fragments and wires them into db/c_test.c. Coverage now tracks the generated surface automatically.

Docs

  • Added the unreleased_history/public_api_changes note and fixed claude_md/add_public_api.md, which still told contributors to hand-edit the now-@generated c.h/c.cc.

Test Plan

  • make c_test && ./c_testpasses, including the 462 generated round-trip assertions (a successful link also confirms the API is complete).
  • python3 tools/c_api_gen/check_api_completeness.py — all 1737 declared functions defined exactly once.
  • python3 tools/c_api_gen/check_api_compatibility.py --ref <release> — 1070 reference functions + 229 enum/typedef symbols preserved, 0 removed/changed.
  • python3 tools/c_api_gen/verify_generated_up_to_date.py — generated output is stable.
  • include/rocksdb/c.h confirmed self-contained (only <stdbool.h>, <stddef.h>, <stdint.h>).

cc @xingbowang

xingbowang and others added 24 commits April 6, 2026 15:44
Summary:
RocksDB's C API (include/rocksdb/c.h / db/c.cc) is large and
hand-maintained, with hundreds of repetitive setter/getter wrappers that
follow strict naming and type conventions. Adding new C++ fields requires
manually adding matching C wrappers, which is tedious and error-prone.

This PR introduces a semi-automated code generation system for the
mechanical parts of the C API while keeping complex wrappers (callbacks,
ownership-transfer, multi-get families) hand-written.

The system has two complementary generators:

1. Auto-discovery (auto_simple_bindings.py): Scans selected C++
   public option/metadata structs and auto-generates getter/setter wrappers
   for simple scalar, enum, string, and chrono fields. Generated .inc
   files are checked in and compiled into c.h/c.cc via #include.
   Fields that cannot be auto-generated must be explicitly blocklisted in
   auto_simple_bindings_blocklist.json with a policy and reason.

2. Spec-driven generator (generate_c_api.py): Takes spec.json as
   input and emits consistent boilerplate for method-style wrappers whose
   C shape cannot be inferred from the C++ header alone (e.g. rocksdb_put,
   rocksdb_transaction_commit, WriteBatch methods).

Coverage improvement: This PR adds 725 new C API functions, growing the
public C API surface from 1,053 to 1,778 exported functions -- a 68.9%
increase. The bulk of the new coverage (436 functions) comes from
auto-discovered option struct setters/getters that were previously missing.

Coverage breakdown by family:
- Option structs (auto-discovered):     436 new functions
- Metadata structs (auto-discovered):    89 new functions
- ReadOptions (auto-discovered):         46 new functions
- JobInfo structs (auto-discovered):     46 new functions
- Spec-driven (subset wrappers):         56 new functions
- DB simple operations:                  27 new functions
- Transaction/TransactionDB/WriteBatch:  29 new functions

The generated code emits the same style as today's hand-written wrappers
(SaveError, Slice(), malloc-owned buffers, unsigned char booleans) and is
organized in clearly marked generated sections within c.h/c.cc.

Test Plan:
- Existing db_test.c C API tests pass (1743 lines of tests extended/verified)
- python3 tools/c_api_gen/regen_all.py && git diff --exit-code -- include/rocksdb/c_api_gen db/c_api_gen verifies generated output is stable
- python3 tools/c_api_gen/validate_generated_equivalence.py --ref HEAD verifies equivalence with reference hand-written wrappers
# Conflicts:
#	utilities/trie_index/trie_index_factory.cc
…ream)

Reconciles the semi-automated C API code generation branch (PR facebook#14572,
authored by Xingbo Wang) with ~4 months of upstream main.

Conflicts were limited to the generated/test files: include/rocksdb/c.h,
db/c.cc, db/c_test.c.

Reconciliation strategy (generated files are resolved by regeneration, not
hand-merge):

- Bucket A (already generated by this branch; upstream added hand-written
  copies): rocksdb_options_{get,set}_memtable_batch_lookup_optimization,
  rocksdb_readoptions_{get,set}_optimize_multiget_for_io,
  rocksdb_block_based_options_set_uniform_cv_threshold,
  rocksdb_transactiondb_options_set_write_policy. Verified the generated
  signatures are ABI-identical to upstream's hand-written ones; the generated
  versions win and upstream's hand-written copies are dropped on regeneration.

- Bucket B (simple new C++ option fields; auto-discovery now emits them from
  the merged headers): rocksdb_options_{get,set}_async_wal_precreate,
  rocksdb_options_{get,set}_use_direct_io_for_compaction_reads.

- Bucket C (complex wrappers; ported by hand into the c_base.* templates):
  rocksdb_set_db_options, rocksdb_backup_engine_stop_backup,
  rocksdb_backup_engine_options_set_backup_rate_limiter,
  rocksdb_backup_engine_options_set_restore_rate_limiter.

- New upstream ReadOptions field read_scoped_block_buffer_provider (a complex
  provider pointer) added to auto_simple_bindings_blocklist.json with
  policy "manual"; the fail-closed generator surfaced it automatically.

include/rocksdb/c.h and db/c.cc regenerated via tools/c_api_gen/regen_all.py;
db/c_test.c resolved by hand (kept both sides' new test coverage). All 14
upstream-added C API functions verified present; db/c.cc syntax-checks clean
under C++20.
…lates

Addresses RocksDB maintainer feedback and the build-system issues found while
taking over PR facebook#14572.

Maintainer concern (pdillinger): include/rocksdb must contain only
user-includable headers. The hand-written generator template c_base.h was a
non-includable "SOURCE TEMPLATE" living in include/rocksdb and still carrying
unresolved #include "c_api_gen/*.inc" directives, which broke `make
check-headers` and was shipped by `make install`. Move the templates next to
the generator inputs they are:
  include/rocksdb/c_base.h -> tools/c_api_gen/c_base.h
  db/c_base.cc             -> tools/c_api_gen/c_base.cc
include/rocksdb/ now contains only the user-facing, self-contained, @generated
c.h. Generator/validator paths and the generated banners are updated to match;
the inlined c.h/c.cc are unaffected at the API level.

Build system:
- Remove the dead `$(OBJ_DIR)/db/c.o: $(C_API_CODEGEN_STAMP)` prerequisites
  (the variable was never defined, so they were silent no-ops) and the orphan
  tools/c_api_gen/stamp .gitignore entry. Regeneration requires clang++ and is
  intentionally not on the compile path; the committed c.h/c.cc are
  self-contained.
- `make check`'s C API staleness check is moved inside the
  SKIP_FORMAT_BUCK_CHECKS guard (so it can be opted out like the other format
  checks) and now runs the robust temp-dir verifier instead of regenerating in
  place + `git diff`. It skips gracefully (with a message) when clang++ is not
  installed, so `make check` works without the codegen toolchain. CI remains
  the authoritative gate.

clang-format reproducibility:
- Thread a pinned `--clang-format` through regen_all.py and down from
  verify_generated_up_to_date.py, and format with an explicit
  `--style=file:.clang-format`. This closes the hole where the regenerated
  c.h/c.cc were formatted with an unpinned clang-format while CI pinned
  clang-format-21, which could produce spurious staleness failures. README
  documents the required clang/clang-format versions.

Cleanup:
- Remove a duplicated _find_clang_binary/get_clang_binary pair (a merge
  artifact; the first definitions were dead) and broaden the clang++ version
  fallback list (newest first, incl. clang++-21).
- CI: `apt-get update` before installing clang-format-21.

Generated c.h/c.cc/.inc refreshed; tools/c_api_gen/verify_generated_up_to_date.py
passes and db/c.cc syntax-checks clean under C++20.
…s gate

Fixes a P0 link-time breakage for downstream C bindings (rust-rocksdb, etc.):
five public C API functions were declared in include/rocksdb/c.h but their
hand-written implementations had been removed during the codegen migration and
never regenerated, so they were defined nowhere. They build fine into
librocksdb (the symbols are simply absent) and RocksDB's own c_test does not
call them, so CI never noticed -- but any binding that references them fails to
link.

Restore the implementations in tools/c_api_gen/c_base.cc (verbatim from the
historical/upstream versions, preserving their signatures and therefore ABI):
- rocksdb_options_set_bottommost_compression_options
- rocksdb_options_set_bottommost_compression_options_zstd_max_train_bytes
- rocksdb_options_set_bottommost_compression_options_use_zstd_dict_trainer
- rocksdb_options_set_bottommost_compression_options_max_dict_buffer_bytes
- rocksdb_options_set_max_bytes_for_level_multiplier_additional
These set multiple fields of a nested struct / take a caller-owned array, which
is exactly why the simple auto-generator cannot emit them; a comment documents
that they are intentionally hand-written.

Add tools/c_api_gen/check_api_completeness.py: a cheap, dependency-free gate
(no clang/libclang needed) asserting every `extern ROCKSDB_LIBRARY_API`
function declared in c.h has exactly one definition in c.cc. This is the check
that would have caught the bug above automatically; it is wired into
`make check-c-api-gen` (runs unconditionally, before the clang-gated staleness
check) and into CI.

Running the new gate also surfaced a pre-existing upstream bug unrelated to
codegen: rocksdb_writebatch_wi_create_from(const char*, size_t) has been
declared in c.h but defined nowhere in RocksDB for a long time.
WriteBatchWithIndex has no constructor from a serialized representation (unlike
plain WriteBatch), so the function is unimplementable as declared and no
consumer can have been linking against it successfully. Remove the dead
declaration so the API is honest and the completeness gate passes.

After the fix: all 1737 declared C API functions have exactly one definition;
verify_generated_up_to_date passes; db/c.cc syntax-checks clean under C++20.
RocksDB's C API has a strong backward-compatibility guarantee, but the
auto-generator derived each wrapper's C type purely from the C++ field type.
For functions that shipped before codegen this silently changed 5 public
signatures -- an ABI break for every downstream binding (Rust, Go, Python, ...)
that was compiled against the old headers:

  rocksdb_writeoptions_disable_WAL               int      -> unsigned char
  rocksdb_restore_options_set_keep_log_files     int      -> unsigned char
  rocksdb_block_based_options_set_checksum       char     -> int
  rocksdb_block_based_options_set_format_version int      -> uint32_t
  rocksdb_block_based_options_set_block_size     size_t   -> uint64_t

rocksdb_writeoptions_disable_WAL in particular is one of the most widely used C
API functions.

Add an ABI type-pinning layer:
- tools/c_api_gen/abi_type_overrides.json records, per function, the C type
  that must be preserved for compatibility, with a reason.
- auto_simple_bindings.py applies these overrides to the generated specs: the
  public C parameter/return type is pinned to the historical type while the
  body still casts to the real C++ field type via
  static_cast<decltype(field)>(value), so behavior is identical and only the
  signature is held stable.

Policy: only already-shipped functions get overrides; brand-new functions use
the natural C++ -> C mapping. A repo-wide signature diff against the merge-base
now reports 0 ABI drifts (previously 5). The hardened equivalence validator
(next commit) enforces this so future drift fails CI.

verify_generated_up_to_date and the link-completeness check pass; db/c.cc
syntax-checks clean under C++20.
…ze validator

Adds the authoritative, ongoing backward-compatibility enforcement that the
ABI-preservation work needs, and fixes the existing validator's misleading
defaults (per review feedback that it provided near-zero real assurance).

New: tools/c_api_gen/check_api_compatibility.py
- Compares the current public c.h against a reference revision at the
  SIGNATURE level (return type + parameter types; parameter names are ignored
  since they do not affect the C ABI). Robust: no false positives from
  formatting or function-body token differences.
- Fails on any removed function or changed signature; brand-new functions are
  reported, never an error.
- Intentional, reviewed incompatibilities are recorded in
  api_compatibility_allowlist.json with a reason. Seeded with the one
  intentional removal (rocksdb_writebatch_wi_create_from, which was never
  implemented).
- Dependency-free (git + python). Wired into `make check-c-api-gen` (graceful
  skip if the baseline ref is not resolvable locally) and into CI against
  origin/main. Confirmed: 1070 reference functions preserved, 668 new, 0
  removed/changed.

Fix: validate_generated_equivalence.py
- `--ref` was defaulting to HEAD, i.e. comparing the generated output against
  itself (a tautology now that the wrappers are generated). Make `--ref`
  required and document that it must be a pre-migration revision; note that it
  is a best-effort body-equivalence aid and check_api_compatibility.py is the
  authoritative back-compat gate.

README: document the source templates, the verification tools, and
abi_type_overrides.json.
…se.cc

All of these live in the hand-written template (the generators emit clean
code), so they are fixed in tools/c_api_gen/c_base.cc and propagated by
regeneration; db/c.cc is never edited directly.

Correctness:
- CopyStringVector: null-check the malloc result before strdup'ing into it
  (previously segfaulted on allocation failure).
- WalFilter::LogRecordFound: std::move the WriteBatch out of the local
  c_new_batch into *new_batch instead of copying a soon-to-be-destroyed batch.
- set_exclude_files_callback: capture the callback and state by value rather
  than the options wrapper pointer, so the stored std::function does not depend
  on the wrapper outliving it (matches the safer progress_callback pattern).

clang-tidy (clears all 20 warnings the bot reported on db/c.cc changed lines):
- SliceTransformWrapper: initialize rep_ and declare the rule-of-5 (it owns
  rep_ and is neither copyable nor movable)
  [cppcoreguidelines-pro-type-member-init, -special-member-functions].
- Use std::make_shared for the VectorRepFactory and the two WriteBufferManager
  allocations [modernize-make-shared].
- Add braces to the table-properties positional accessors, the WAL filter
  destructor, and the callback logger [readability-braces-around-statements].
- Split "Slice a, b;" declarations [readability-isolate-declaration].

The remaining clang-tidy findings are on pre-existing wrapper structs
(rocksdb_comparator_t, rocksdb_mergeoperator_t, ...) that this PR does not
modify and that the bot did not flag; left untouched to avoid churning
unrelated code. db/c.cc syntax-checks clean under C++20; completeness,
compatibility, and staleness checks pass.
…at check

Merge-completeness fix. While reconciling with main, regeneration overwrote
three enums that upstream had extended by hand after the branch diverged,
dropping five public enum constants from c.h (a backward-incompatible removal
that broke C consumers and the merged c_test.c which references them):

- rocksdb tickers enum: add rocksdb_blob_cache_read_byte and bump
  rocksdb_total_metric_count 85 -> 86
- index block search type enum: add
  rocksdb_block_based_table_index_block_search_type_auto = 2
- transaction DB write policy enum (was entirely absent):
  rocksdb_txndb_write_policy_write_{committed,prepared,unprepared}

Restored in tools/c_api_gen/c_base.h.

My function-level reconciliation and the link-completeness check only covered
functions, so these slipped through. Extend check_api_compatibility.py to also
diff non-function public symbols (enum constants and typedefs) against the
reference: it now confirms 229 enum/typedef symbols are preserved and would
have flagged this regression. (Enum constants do not affect linking, so the
completeness check cannot see them; this is the right gate.)
~44% of the newly generated C API functions had no test coverage, the bulk
being auto-discovered option getters/setters. Rather than hand-writing
hundreds of assertions (which would drift as the API grows), derive the tests
from the same checked-in generated fragments.

Add tools/c_api_gen/gen_roundtrip_tests.py: it parses the generated
declaration fragments, pairs every rocksdb_<obj>_set_<field> with its
rocksdb_<obj>_get_<field>, and for each option object with a parameterless
create/destroy emits a create -> set sentinel -> assert getter returns it ->
destroy test. The sentinel is chosen from the getter's return type
(unsigned char -> 1, const char* -> "test", else 42), which round-trips
reliably even through ABI-pinned setter parameter types because the generated
setters are direct field assignments. No clang is required (it only reads the
committed .inc files).

Output c_api_gen/c_generated_roundtrip_tests.c.inc (232 assertions across 25
option objects) is included by db/c_test.c and invoked from main() in a new
"generated_option_roundtrips" phase. Wired into regen_all.py so coverage
tracks the generated surface automatically. c_test.c compiles clean.
…nerated files

- Add the required unreleased_history/public_api_changes entry for the large
  set of new C API functions.
- add_public_api.md Step 5 and the file checklist still told contributors to
  hand-edit include/rocksdb/c.h and db/c.cc, which are now @generated and would
  be clobbered on regeneration. Point manual wrappers at the source templates
  (tools/c_api_gen/c_base.h / c_base.cc) + regen, and mark c.h/c.cc @generated.
… enum type

P2 cleanup:
- Remove tools/c_api_gen/generated/c_preview.{h,cc}.inc. These were a checked-in
  preview output of generate_c_api.py, not referenced by regen_all.py or any
  build/CI step, so they silently drifted. Update the README to describe
  regen_all.py as the entry point and to preview spec output to a scratch path
  instead.
- Document why FamilyConfig.enum_c_type is per-family (option families ship
  enums as int, job-info families as uint32_t, e.g. the pre-existing
  rocksdb_flushjobinfo_flush_reason): unifying them would be an ABI break, so
  the apparent inconsistency is intentional and must stay.
Running c_test surfaced a real bug in the generated round-trip tests:
rocksdb_options_get_use_fsync(obj) == 42 failed. use_fsync is a bool field
whose spec-driven C wrapper uses the legacy `int` getter shape (not
`unsigned char`), so the getter-return-type heuristic picked the scalar
sentinel 42 -- which a bool field collapses to 1.

A bool field is not distinguishable from a real scalar by the C signature
alone, so test with 1 and 0 instead: both round-trip through bool, scalar, and
enum fields, and exercising both directions also verifies the setter actually
changes the value regardless of its default. 462 assertions across 25 option
objects.
@meta-cla meta-cla Bot added the CLA Signed label Jun 19, 2026
@github-actions

github-actions Bot commented Jun 19, 2026

Copy link
Copy Markdown

✅ clang-tidy: No findings on changed lines

Completed in 73.5s.

zaidoon1 added 2 commits June 19, 2026 01:55
…heck ref)

Addresses the red checks on the PR:

- check-format-and-targets (clang-format): my hand-written additions to the
  c_base templates weren't clang-format-clean and propagated into the generated
  c.h/c.cc. Reflowed the two backup-engine rate-limiter declarations (return
  type on its own line) and the exclude-files callback body to match
  clang-format, and shortened the generated banner's "Edit ..." line so it
  stays <= 80 columns for the longer `c_base.cc` edit target (clang-format was
  reflowing it). Verified with git-clang-format against the merge base:
  "clang-format did not modify any files".

- check-format-and-targets (check-sources): replaced a non-ASCII em dash in a
  regen_all.py comment with "--" (RocksDB requires ASCII-only source files).

- build-linux-clang-21-no_test_run: the backward-compat CI step referenced
  origin/main, which `git fetch origin main` does not create (it writes
  FETCH_HEAD). Use --ref FETCH_HEAD.

Verified locally: check-sources, check-buck-targets (BUCK unchanged),
git-clang-format, link-completeness, backward-compat (FETCH_HEAD), and the
generated-up-to-date check all pass.

(The 3 build-windows-vs2022 failures are unrelated CI infra: CMake reports
"could not find any instance of Visual Studio" during configuration, before any
compilation; all Linux/macOS CMake builds pass.)
The backward-compat CI step runs `git fetch` and the checker runs `git show`,
but the containerized checkout (/__w/rocksdb/rocksdb) is owned by a different
user, so git aborts with "detected dubious ownership" (exit 128). Mark the
workspace as a safe directory before the git operations.
@meta-codesync

meta-codesync Bot commented Jun 19, 2026

Copy link
Copy Markdown

@xingbowang has imported this pull request. If you are a Meta employee, you can view this in D109149150.

@xingbowang

Copy link
Copy Markdown
Contributor

Peter had another comment before that I have not addressed. Please address accordingly. THanks.

I STILL object to include/rocksdb containing files that are not user-includable headers.
For c.h (which phabriactor won't show to me), generated files need "@generated" in them, for infrastructure reasons. And the notice about being generated should be immediately after copyright, to minimize confusion.

@zaidoon1

zaidoon1 commented Jun 19, 2026

Copy link
Copy Markdown
Contributor Author

Both of these are already taken care of in this PR. I think Peter was looking at the original #14572, before the cleanup.

The non-includable file he means is c_base.h, the hand-written template that pulls in the generated fragments. I moved it and the matching .cc out of the public headers and put them next to the generator:

include/rocksdb/c_base.h  ->  tools/c_api_gen/c_base.h
db/c_base.cc              ->  tools/c_api_gen/c_base.cc

So the only thing this PR touches under include/rocksdb/ is c.h:

$ git diff --name-status main..HEAD -- include/rocksdb/
M	include/rocksdb/c.h

c.h is includable on its own: the only includes are <stdbool.h>, <stddef.h> and <stdint.h>, so no -I is needed (that's what keeps bindgen working), and make check-headers passes (the build-linux-unity-and-headers job is green). The generated .inc fragments live in the top-level c_api_gen/ dir, not under include/.

The @generated line is there too, right after the license header. Phabricator hides the file from you, so here's the top of include/rocksdb/c.h:

//  Copyright (c) Meta Platforms, Inc. and affiliates.
//  This source code is licensed under both the GPLv2 (found in the
//  COPYING file in the root directory) and Apache 2.0 License
//  (found in the LICENSE.Apache file in the root directory).
// @generated
// -----------------------------------------------------------------------------
// Auto-generated by tools/c_api_gen/regen_all.py.
// DO NOT EDIT THIS FILE DIRECTLY.
// Edit tools/c_api_gen/c_base.h or the inputs under tools/c_api_gen/,
// then rerun: python3 tools/c_api_gen/regen_all.py
// -----------------------------------------------------------------------------

db/c.cc has the same marker in the same place.

@xingbowang

xingbowang commented Jun 23, 2026

Copy link
Copy Markdown
Contributor

From Peter:

Devmate says to fix the buck c_test:
Add the generated fragments as headers on the c_test_bin target in fbcode/rocks/buckifier/defs.bzl:
cpp_binary(
name = "c_test_bin",
srcs = ["db/c_test.c"],
headers = native.glob(["c_api_gen/**/*.inc"]),
compiler_flags = ROCKSDB_COMPILER_FLAGS,
include_paths = ROCKSDB_INCLUDE_PATHS,
...
)
(end devmate stuff)
If awkward to add to this diff, it could be in a preceding diff this one depends on.
Obviously rebase to get the Windows CI fix.
There are some legit concerns raised by review assistant about quietly losing CI coverage of check-c-api-gen due to some of the ways things can be skipped. Perhaps there should be an environment variable->make variable that says "upgrade any check-c-api-gen warnings to errors" so that the core CI jobs intended to test that flow are not accidentally regressed to a skipped (maybe with warning that no one will see) flow.
Based on my previous review of this line of work and addressing the main concerns I had, generally LGTM.

zaidoon1 added 2 commits June 23, 2026 23:50
…ew feedback

Follow-up to maintainer feedback on the PR.

check-c-api-gen could silently lose coverage (a reviewer concern): the make
target is skippable (no clang++ -> skip; unresolvable compat ref -> skip; the
whole thing is behind SKIP_FORMAT_BUCK_CHECKS), and CI doesn't even run the
make target -- it called the three checkers directly. So a CHECK_C_API_GEN_STRICT
flag that only lived in the make target would be inert in CI.

- Add CHECK_C_API_GEN_STRICT: when set, every "skip" in `make check-c-api-gen`
  becomes a hard error. Treat 0/no/false (not just empty) as "off" so
  CHECK_C_API_GEN_STRICT=0 doesn't accidentally enable it.
- Actually use it: the dedicated build-linux-clang-21-no_test_run CI job now runs
  `make check-c-api-gen CHECK_C_API_GEN_STRICT=1 API_COMPAT_REF=FETCH_HEAD
  CLANG_FORMAT_BINARY=clang-format-21` (single source of truth instead of three
  ad-hoc python invocations), so if a prerequisite ever goes missing the job
  hard-fails instead of quietly passing.
- Fix the clang-availability gate: it tested `$(CXX)` verbatim, which mis-detects
  a ccache-prefixed/versioned CXX (e.g. "ccache clang++-21") -- both as a false
  "no clang" skip and, for g++, as a false "have clang" that then crashes the
  generator. Detect a clang the way the generator does. Correspondingly, make
  the generator's _find_clang_binary pick the clang *token* out of a multi-word
  $CXX (use the bare clang for -ast-dump, not the launcher).
- Fix the stale "does NOT modify the working tree" comment (the verifier
  regenerates in place and compares against a pre-run snapshot).

Buck c_test: db/c_test.c #includes the generated c_api_gen/*.inc, so under Buck's
sandbox the c_test_bin target needs them as headers. That target lives in the
internal //rocks/buckifier:defs.bzl (the OSS buckifier only emits a parameterless
add_c_test_wrapper()), so the headers glob has to be added there. Document the
coupling in buckifier/targets_builder.py so it isn't lost. Make/CMake already
resolve the include via -I. / PROJECT_SOURCE_DIR.
@zaidoon1

zaidoon1 commented Jun 24, 2026

Copy link
Copy Markdown
Contributor Author

Went through all three:

Buck c_test: the c_test_bin target isn't in the OSS tree. The OSS buckifier only emits a parameterless add_c_test_wrapper() (BUCK:474, written by buckifier/targets_builder.py), and that wrapper is defined in the internal //rocks/buckifier:defs.bzl. So the headers = native.glob(["c_api_gen/**/*.inc"]) has to go on the cpp_binary inside that internal defs.bzl - I can't make or validate that from the fork, so it'll need to be the preceding internal diff you mentioned. I added a comment in buckifier/targets_builder.py documenting the coupling so it doesn't get lost on the next regen. (Make and CMake already resolve the include via -I. and include_directories(${PROJECT_SOURCE_DIR}), so it's only Buck's sandbox that needs the headers declared.)

Windows CI: I merged upstream/main to pick up #14869/#14872

check-c-api-gen skipping: good catch, and it was worse than just the skips - the dedicated build-linux-clang-21-no_test_run job wasn't going through the make target at all (it called the checkers directly), so a strict flag on the target alone would've been inert. So I did both:

  • Added CHECK_C_API_GEN_STRICT, which turns every skip (missing clang++/clang-format, or an unresolvable compat baseline) into a hard error. 0/no/false count as off so =0 doesn't surprise anyone.
  • Wired it into that CI job - it now runs make check-c-api-gen CHECK_C_API_GEN_STRICT=1 API_COMPAT_REF=FETCH_HEAD CLANG_FORMAT_BINARY=clang-format-21 as the single gate, so it fails loudly instead of quietly skipping if a prerequisite ever disappears.

While there I also fixed the clang-detection gate (it tested $(CXX) verbatim and mis-handled a ccache-prefixed/versioned CXX).

@github-actions

Copy link
Copy Markdown

🟡 Codex Code Review

Auto-triggered after CI passed — reviewing commit 25d3b04


Codex review failed before producing findings.

WARNING: proceeding, even though we could not create PATH aliases: Refusing to create helper binaries under temporary dir "/tmp" (codex_home: AbsolutePathBuf("/tmp/codex-home"))
error: the argument '--base <BRANCH>' cannot be used with '[PROMPT]'

Usage: codex exec review --commit <SHA> --base <BRANCH> --title <TITLE> --model <MODEL> --config <key=value> --dangerously-bypass-approvals-and-sandbox --output-last-message <FILE> [PROMPT]

For more information, try '--help'.

ℹ️ About this response

Generated by Codex CLI.
Review methodology: claude_md/code_review.md

Limitations:

  • Codex may miss context from files not in the diff
  • Large PRs may be truncated
  • Always apply human judgment to AI suggestions

Commands:

  • /codex-review [context] — Request a code review
  • /codex-query <question> — Ask about the PR or codebase
@github-actions

Copy link
Copy Markdown

✅ Claude Code Review

Auto-triggered after CI passed — reviewing commit 25d3b04


Summary

This is a well-structured, large-scale PR that finishes the C API code generation infrastructure. The design is sound: hand-written templates + two generators (spec-driven and auto-discovery) producing .inc fragments inlined into the final c.h/c.cc, with comprehensive CI gates for completeness, backward compatibility, and staleness. The ABI type-pinning layer is a thoughtful solution to the historical signature drift problem. The 462 auto-generated round-trip tests provide good coverage of the generated getters/setters.

High-severity findings (2):

  • [c_base.cc:1066] CopyString does not check malloc return value for NULL before calling memcpy, causing undefined behavior (crash/segfault) on allocation failure.
  • [c_base.cc:1080] CopyStringVector does not check individual strdup return values, producing a partially-valid array with NULL entries on allocation failure.
Full review (click to expand)

Findings

🔴 HIGH

H1. CopyString missing malloc NULL check -- tools/c_api_gen/c_base.cc:1066
  • Issue: CopyString calls malloc(slice.size()) and immediately passes the result to memcpy without checking for NULL. On allocation failure, this is undefined behavior (typically a segfault).
  • Root cause: The function was originally hand-written without the check, and the PR preserves it unchanged. However, the PR description specifically claims "CopyStringVector now null-checks malloc" -- the sister function CopyString was missed.
  • Suggested fix: Add a NULL check: if (result == nullptr) return nullptr; after the malloc call. Callers must also be audited to handle a NULL return.
H2. CopyStringVector missing per-element strdup NULL check -- tools/c_api_gen/c_base.cc:1080
  • Issue: While CopyStringVector correctly checks the outer malloc result, it does not check individual strdup calls. If strdup fails for element N, result[N] will be NULL, and the caller receives a partially-valid array with no indication of which elements failed.
  • Root cause: The NULL check improvement from the PR description only addressed the array-level allocation.
  • Suggested fix: Check each strdup return; on failure, free all previously allocated strings and the array, then return NULL.

🟡 MEDIUM

M1. Missing static_assert for rocksdb_writestallcondition_t reinterpret_cast -- c_api_gen/c_generated_jobinfo_metadata_subset.cc.inc:76
  • Issue: rocksdb_writestallinfo_cur() and _prev() use reinterpret_cast<const rocksdb_writestallcondition_t*>(&info->rep.condition.cur) to cast a WriteStallCondition enum member address to a struct pointer. While the struct rocksdb_writestallcondition_t { WriteStallCondition rep; } should be layout-compatible, there is no static_assert guarding this assumption -- unlike all other reinterpret_cast-ed types in the file which have both sizeof and offsetof assertions.
  • Root cause: The WriteStallCondition wrapper was not included in the static_assert block at lines 667-728.
  • Suggested fix: Add static_assert(sizeof(rocksdb_writestallcondition_t) == sizeof(WriteStallCondition), ...) and static_assert(offsetof(rocksdb_writestallcondition_t, rep) == 0, ...).
M2. Missing static_assert for rocksdb_slice_t to Slice reinterpret_cast -- tools/c_api_gen/c_base.cc:2647
  • Issue: rocksdb_batched_multi_get_cf_slice casts const rocksdb_slice_t* to const Slice* with a comment "they have identical memory layout" but no static_assert to enforce this. While Slice currently has the same {const char*, size_t} layout, a future refactor adding a member would silently break this.
  • Root cause: The cast was added without compile-time verification.
  • Suggested fix: Add static_assert(sizeof(rocksdb_slice_t) == sizeof(Slice), ...) and verify field offsets match.
M3. std::call_once lambda uses unnecessary [&] capture -- table/block_based/block_based_table_factory.cc:1133
  • Issue: The lambda in std::call_once(once, [&]() { ... }) captures by reference, but does not actually use any local variables -- it only calls ObjectLibrary::Default() and RegisterBuiltinTrieIndexFactory. The [&] is misleading and could accidentally capture parameters if the lambda body is modified.
  • Root cause: Copy-paste or coding habit.
  • Suggested fix: Change [&] to [].
M4. check-c-api-gen target missing from .PHONY -- Makefile:811
  • Issue: The check-c-api-gen target is not listed in the .PHONY declaration. If a file named check-c-api-gen were created in the build directory (unlikely but possible), Make would consider the target up-to-date and skip it.
  • Root cause: The .PHONY list was updated to include check-c-api-c_test but not check-c-api-gen.
  • Suggested fix: Add check-c-api-gen to the .PHONY list on line 811.
M5. git config --global --add safe.directory '*' in CI -- .github/workflows/pr-jobs.yml
  • Issue: Setting safe.directory to '*' globally marks ALL directories as safe for git. While this is inside a CI container that is ephemeral, it's an overly broad permission that could mask genuine ownership problems.
  • Root cause: The container checkout has a different user, and the compat checker needs to run git show.
  • Suggested fix: Only mark $GITHUB_WORKSPACE as safe (the first git config line already does this). Remove the '*' line.

🟢 LOW / NIT

L1. Pre-existing: CopyString uses reinterpret_cast for malloc return -- tools/c_api_gen/c_base.cc:1066
  • Issue: reinterpret_cast<char*>(malloc(...)) is used instead of static_cast<char*>(malloc(...)). Both work, but static_cast is idiomatic for void* -> T* conversions.
  • Suggested fix: Use static_cast<char*>.
L2. Enum getter type inconsistency across families -- tools/c_api_gen/auto_simple_bindings.py
  • Issue: The auto-generator uses int for enum getters in option families but uint32_t for listener/job-info families. This is intentional (documented in FamilyConfig.enum_c_type) and preserves ABI, but it creates an inconsistent API surface for new users.
  • Suggested fix: Document this inconsistency in the C API header or the README. No code change needed since changing it would break ABI.
L3. GetMapEntryAt O(n^2) iteration pattern -- tools/c_api_gen/c_base.cc
  • Issue: GetMapEntryAt is O(n) per call due to std::advance on std::map iterators, making sequential iteration O(n^2). The comment already documents this. Used for table_properties map access in CompactionJobInfo.
  • Suggested fix: Already documented; not a hot path (accessed from C API callbacks during compaction events). Consider adding an iterator-based C API in the future if this becomes a bottleneck.
L4. Roundtrip tests only cover objects with parameterless create/destroy
  • Issue: The test generator (gen_roundtrip_tests.py) only creates tests for option objects that have _create() taking no args and a matching _destroy(). Objects like ReadOptions (which has a parameterless create) are covered, but complex objects requiring constructor parameters (e.g., DB, Transaction) are not.
  • Suggested fix: This is expected and documented -- DB operations have separate test coverage in c_test.c. No action needed, but consider documenting the coverage gap.
L5. DECL_RE in check_api_completeness.py may miss #ifdef-guarded declarations
  • Issue: The regex-based declaration scanner doesn't understand preprocessor conditionals. If a function is declared inside #ifdef _WIN32, it would be counted as declared on all platforms but may only be defined conditionally.
  • Suggested fix: Currently not an issue since c.h has no platform-conditional declarations, but worth noting for future changes.

Cross-Component Analysis

Context Code executes? Assumptions hold? Action needed?
WritePreparedTxnDB N/A N/A C API wrappers are thin, no transaction-layer assumptions
ReadOnly DB N/A N/A Wrappers delegate to C++ methods which handle this
User-defined timestamps Yes (via _with_ts functions) Yes Properly wrapped with ts/tslen parameters
FIFO / Universal compaction N/A N/A No compaction-specific logic in wrappers
Concurrent writers N/A N/A Thread safety delegated to C++ layer
Windows / MSVC Yes Yes c.h is pure C with _WIN32 DLL export guards

Positive Observations

  • Comprehensive static_asserts: The PR adds layout assertions for 15+ wrapper structs, validating that reinterpret_cast between C++ types and their C wrapper structs is safe. This is a strong defensive pattern.
  • ABI type-pinning is elegant: Rather than silently changing signatures or maintaining manual overrides scattered throughout the code, the abi_type_overrides.json + static_cast<decltype(field)> pattern centralizes ABI compatibility in one place.
  • Three-layer CI gate (completeness + compatibility + staleness) provides defense-in-depth against common code-generation migration bugs.
  • Exclude-files callback fix: Capturing callback + state by value in the lambda (line 1769) rather than capturing the wrapper pointer is the correct pattern, avoiding a use-after-free if the options object is destroyed before the lambda runs.
  • WalFilter WriteBatch move: Using std::move(c_new_batch.rep) (line 368) instead of copy is both correct and efficient since c_new_batch is a local about to be destroyed.

ℹ️ About this response

Generated by Claude Code.
Review methodology: claude_md/code_review.md

Limitations:

  • Claude may miss context from files not in the diff
  • Large PRs may be truncated
  • Always apply human judgment to AI suggestions

Commands:

  • /claude-review [context] — Request a code review
  • /claude-query <question> — Ask about the PR or codebase
@meta-codesync

meta-codesync Bot commented Jun 24, 2026

Copy link
Copy Markdown

@xingbowang has imported this pull request. If you are a Meta employee, you can view this in D109149150.

@xingbowang

Copy link
Copy Markdown
Contributor

There is some internal CI failures. I am fixing it. Should be merged in a day or 2.

@meta-codesync meta-codesync Bot closed this in 1cec28d Jun 24, 2026
@meta-codesync meta-codesync Bot added the Merged label Jun 24, 2026
@meta-codesync

meta-codesync Bot commented Jun 24, 2026

Copy link
Copy Markdown

@xingbowang merged this pull request in 1cec28d.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

2 participants