Enterprise-grade C library for atomic, process-shared deduplication using SHA-256 hashing

hamkee-dev-group/hkgate

hkgate

Enterprise-grade C library providing an in-flight de-duplication gate keyed only by SHA-256(body) using process-shared memory (POSIX shm_open + mmap).

Semantics (exact after hash finalization)

Guaranteed

  • After SHA-256 finalization, hk_gate_try_acquire() is atomic across all processes:
    • If the lock is absent (or stale), exactly one caller gets HK_ALLOW and the lock is created/refreshed.
    • While the lock exists and is not stale, all callers for the same hash get HK_DROP.
  • hk_gate_release() removes the lock and is idempotent. The next acquire after a release returns HK_ALLOW unless another process acquires first.
  • Locks older than ttl_seconds are considered stale, recovered, and do not block indefinitely (stale recovery is counted).
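
The guarantees above can be illustrated with a small single-process model. This is only a sketch of the semantics, not the library's implementation: it has no shared memory and no locking, and every name in it (toy_gate, toy_gate_try_acquire, TOY_ALLOW, TOY_FULL, and so on) is hypothetical. It also assumes that a clock that moves backwards is treated as age zero, which the README does not specify.

```c
#include <stdint.h>
#include <string.h>

#define TOY_SLOTS 8
#define TOY_HASH_LEN 32 /* SHA-256 digest size in bytes */

/* Hypothetical result codes; TOY_FULL is an assumption for a full table. */
enum toy_result { TOY_ALLOW, TOY_DROP, TOY_FULL };

struct toy_entry {
    uint8_t  hash[TOY_HASH_LEN];
    uint64_t acquired_sec; /* when the lock was created or refreshed */
    int      used;
};

struct toy_gate {
    struct toy_entry slots[TOY_SLOTS];
    uint64_t ttl_seconds;
    uint64_t stale_recovered_count;
};

static void toy_gate_init(struct toy_gate *g, uint64_t ttl_seconds)
{
    memset(g, 0, sizeof(*g));
    g->ttl_seconds = ttl_seconds;
}

static enum toy_result toy_gate_try_acquire(struct toy_gate *g,
                                            const uint8_t *hash,
                                            uint64_t now_sec)
{
    struct toy_entry *free_slot = NULL;

    for (int i = 0; i < TOY_SLOTS; i++) {
        struct toy_entry *e = &g->slots[i];
        if (!e->used) {
            if (free_slot == NULL)
                free_slot = e; /* remember the first free slot */
            continue;
        }
        if (memcmp(e->hash, hash, TOY_HASH_LEN) != 0)
            continue;
        /* Assumption: a backwards clock jump counts as age 0. */
        uint64_t age = now_sec >= e->acquired_sec ? now_sec - e->acquired_sec : 0;
        if (age <= g->ttl_seconds)
            return TOY_DROP; /* live lock: duplicate is dropped */
        /* Stale lock: refresh it, count the recovery, allow the caller. */
        e->acquired_sec = now_sec;
        g->stale_recovered_count++;
        return TOY_ALLOW;
    }
    if (free_slot == NULL)
        return TOY_FULL;
    memcpy(free_slot->hash, hash, TOY_HASH_LEN);
    free_slot->acquired_sec = now_sec;
    free_slot->used = 1;
    return TOY_ALLOW;
}

/* Idempotent: releasing an absent hash is a no-op. */
static void toy_gate_release(struct toy_gate *g, const uint8_t *hash)
{
    for (int i = 0; i < TOY_SLOTS; i++) {
        struct toy_entry *e = &g->slots[i];
        if (e->used && memcmp(e->hash, hash, TOY_HASH_LEN) == 0)
            e->used = 0;
    }
}
```

With ttl_seconds set to 10, a first acquire at t=100 is allowed, a second at t=105 is dropped, and an acquire at t=117 against a lock taken at t=106 is allowed again as a stale recovery.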

Not possible (by design)

  • It is not possible to deterministically drop duplicates before reading and hashing the full body, because the key is content-derived and no client-provided early key exists.
  • Therefore, multiple identical requests may hash concurrently in an unavoidable overlap window before the first finishes hashing.

Architecture

  • Shared memory segment layout:
    • header (magic/version/config + atomic counters)
    • array of shard locks (pthread_mutex_t, PTHREAD_PROCESS_SHARED, robust if available)
    • fixed-capacity hash table entries
  • Hash table:
    • deterministic fixed-capacity open addressing per shard segment (linear probing)
    • no pointers stored in shared memory (only plain structs)
  • Sharding:
    • shard = hash[0] % shard_count
    • each shard owns a contiguous region of capacity_entries / shard_count
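
The sharding and probing rules above can be sketched as index arithmetic. The function and struct names below are hypothetical, and the choice of hash bytes 1..8 as the probe start is an assumption made for illustration; the README only states that probing is linear within a shard's region. Because hash[0] is a single byte, this mapping can reach at most 256 distinct shards.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Contiguous slice of the entry array owned by one shard. */
struct shard_region {
    size_t first; /* index of the shard's first entry */
    size_t count; /* number of entries in the shard */
};

/* shard = hash[0] % shard_count, as described in the README. */
static size_t shard_index(const uint8_t *hash, size_t shard_count)
{
    assert(shard_count > 0);
    return (size_t)hash[0] % shard_count;
}

/* Each shard owns capacity_entries / shard_count consecutive entries. */
static struct shard_region shard_region_for(size_t shard, size_t shard_count,
                                            size_t capacity_entries)
{
    assert(shard_count > 0 && shard < shard_count);
    assert(capacity_entries % shard_count == 0);
    size_t per_shard = capacity_entries / shard_count;
    struct shard_region r = { shard * per_shard, per_shard };
    return r;
}

/* Linear probing: the attempt-th slot, wrapping inside the region.
 * Assumption: the starting offset is taken from hash bytes 1..8. */
static size_t probe_slot(const uint8_t *hash, struct shard_region r,
                         size_t attempt)
{
    assert(r.count > 0);
    uint64_t start = 0;
    for (int i = 1; i <= 8; i++)
        start = (start << 8) | hash[i];
    return r.first + (size_t)((start + attempt) % r.count);
}
```

Since only indices are computed, nothing here requires storing pointers in the shared segment, which matches the plain-struct layout described above.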

Sizing guidance

Shared memory must accommodate:

sizeof(header)
+ shard_count * sizeof(pthread_mutex_t)
+ capacity_entries * sizeof(entry)

Example: shard_count=256, capacity_entries=262144 (1024 entries/shard) uses roughly:

  • mutexes: 256 * ~40 bytes ≈ 10 KiB (sizeof(pthread_mutex_t) is implementation-dependent)
  • entries: 262144 * 64 bytes (entry struct) = 16 MiB
  • plus header/alignment

In practice, allocate >= 20 MiB for this example.
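
The same estimate can be computed in code. This is a rough sketch: ASSUMED_HEADER_BYTES is a placeholder guess (the README does not give the header size), ASSUMED_ENTRY_BYTES takes the 64-byte entry size from the example above, and gate_segment_bytes is a hypothetical helper, not part of the library.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define ASSUMED_HEADER_BYTES 4096u /* placeholder: real header size unknown */
#define ASSUMED_ENTRY_BYTES  64u   /* entry size quoted in the example */

/* header + shard_count mutexes + capacity_entries entries, before
 * any alignment padding the real layout may add. */
static size_t gate_segment_bytes(size_t shard_count, size_t capacity_entries)
{
    assert(shard_count > 0 && capacity_entries > 0);
    return (size_t)ASSUMED_HEADER_BYTES
         + shard_count * sizeof(pthread_mutex_t)
         + capacity_entries * (size_t)ASSUMED_ENTRY_BYTES;
}
```

For shard_count=256 and capacity_entries=262144 the result is slightly above 16 MiB on common platforms, so a 20 MiB allocation leaves headroom for the header and alignment.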

Operational notes

  • TTL source: wall-clock seconds (time(NULL) / caller-provided now_sec). If the wall clock jumps, TTL behavior follows the jump.
  • Crash safety: uses robust mutexes where supported (PTHREAD_MUTEX_ROBUST). If a process dies while holding a shard lock, the next locker recovers (EOWNERDEAD + pthread_mutex_consistent), increments error_count, and continues.
    • If robust mutexes are not available, build falls back to non-robust; deadlock freedom after owner death is then not guaranteed.
  • Stale recovery: on lookup during try_acquire, if an entry is older than TTL it is deleted and the new acquire is allowed; stale_recovered_count increments.

Build

Dependencies:

  • OpenSSL development headers/libs (libssl, libcrypto)
  • pthread

make
make test

Optional sanitizer builds:

make asan
# ThreadSanitizer generally conflicts with robust process-shared mutexes; provided for best-effort only.
make tsan

Examples

  • examples/demo_multiproc.c – simple multi-process acquire behavior
  • examples/bench.c – multi-process stress/throughput reporting

API

See include/hkgate.h.

Minimal flow:

  1. Stream request body into SHA-256 context via hk_sha256_init/update/final.
  2. Call hk_gate_try_acquire() with the finalized hash.
  3. If HK_ALLOW, process request; when complete, an external agent calls hk_gate_release() for that hash.
