Skip to content

feat(write_thread): add writer state transition validation guards#14904

Open
darion-yaphet wants to merge 1 commit into
facebook:mainfrom
darion-yaphet:write-thread-state-transition-guards
Open

feat(write_thread): add writer state transition validation guards#14904
darion-yaphet wants to merge 1 commit into
facebook:mainfrom
darion-yaphet:write-thread-state-transition-guards

Conversation

@darion-yaphet

@darion-yaphet darion-yaphet commented Jul 1, 2026

Copy link
Copy Markdown

Summary

Every writer state transition in WriteThread is funneled through SetState(), but until now the set of legal transitions was only implicit — encoded indirectly in the control flow of JoinBatchGroup, ExitAsBatchGroupLeader, the parallel-memtable path, and a handful of scattered comments. There was no single place that stated "a writer in state X may only move to Y or Z," which makes the lock-free handoff protocol harder to reason about and easy to break silently during refactors. A bad transition would not fail loudly; it would just corrupt the writer state machine and surface later as a hang or a mysterious assertion far from the root cause.

This PR makes the writer state machine explicit and self-checking, without changing any runtime behavior.

  • Add IsValidStateTransition(old_state, new_state), a static helper that encodes the allowed writer state transitions as executable rules. It rejects no-op transitions (old == new), rejects publishing the internal-only states (STATE_INIT, STATE_LOCKED_WAITING) as a destination, and otherwise enforces the per-state transition table via a switch on old_state.
  • Assert on it inside SetState() on both paths: once before the fast atomic-publish compare_exchange_strong, and again on the STATE_LOCKED_WAITING slow path after the observed state has been re-read. The slow-path check matters because the waker may overwrite the observed state between the initial relaxed load and taking the mutex.
  • Document the full transition table directly in write_thread.h, next to the state enum, including the special role of STATE_LOCKED_WAITING as a temporary wait marker installed by BlockingAwaitState(). Because that marker hides the writer's prior state from the waker, SetState() accepts any normal wake-up state out of it — the comment now spells this out so the asymmetry is not mistaken for a bug.

Why debug-only

IsValidStateTransition() is only ever called from assert(), so it compiles out entirely in release builds (NDEBUG / -fno-rtti). SetState() is on the hot write path, hit thousands of times per second under load, so paying zero cost in production is deliberate. The value is concentrated where it belongs: debug, ASAN/TSAN, and stress builds, where an illegal transition now fails immediately at the offending call site instead of much later.

What this does not change

  • No change to the lock-free handoff protocol, memory ordering, or the CAS in SetState().
  • No change to release-build behavior — the guard is purely an assertion.
  • No API or option surface change; IsValidStateTransition() is a private static helper.

Test Plan

  • make check (debug) passes.
  • ASSERT_STATUS_CHECKED=1 make check passes.
  • COERCE_CONTEXT_SWITCH=1 stress runs of the write-path tests (db_write_test, write_batch_test, db_test) to drive the new assertions under contention and exercise the STATE_LOCKED_WAITING slow path.
  • Confirmed release build is unaffected: the helper is fully optimized out when NDEBUG is set.
Introduce IsValidStateTransition(), a static helper that encodes the
writer state machine as executable rules, and assert on it in SetState()
on both the fast atomic-publish path and the STATE_LOCKED_WAITING slow
path. This catches illegal state transitions in debug and stress builds
while leaving the lock-free handoff protocol and release-build behavior
unchanged. Document the full set of allowed transitions and the
STATE_LOCKED_WAITING wait-marker semantics in the header next to the
state enum.
@meta-cla meta-cla Bot added the CLA Signed label Jul 1, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

1 participant