inputs
Ok, you're handing out inexpensive coupon cards,
which might have a GUID QR code printed on them.
A card might be displayed in a store window,
where many users could access it.
Devices, like Android phones, have GUIDs.
It's not quite clear if you have a user_device_id, or
a user_id with some many-to-many relationship with devices.
For simplicity I will assume the former.
So we have several kinds of entity IDs, such as cards, devices, and offers, each having per-ID quota limits.
I assume there is a distribution over entity activity,
perhaps Zipfian, where some entities are far more popular
or active than other entities.
In addition to quota values, each entity should also
store a recently seen rate of redemption events,
to help us assess risk of imminent quota violation.
The fact that some offers are more popular than others,
and can be handled differently,
is essential to keeping the "at risk" query rate down.
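To make the per-entity state concrete, here is a minimal sketch of what one record might hold. The field names (`quota`, `redeemed`, `recent_rate`) and the `seconds_to_quota` helper are my own inventions for illustration, not anything from your schema.

```python
from dataclasses import dataclass


@dataclass
class Entity:
    entity_id: str             # e.g. a card GUID or a device GUID
    quota: int                 # max redemptions allowed in the current time range
    redeemed: int = 0          # redemptions counted so far
    recent_rate: float = 0.0   # recently seen redemptions per second

    def seconds_to_quota(self) -> float:
        """Estimate time until the quota is hit if the recent rate continues."""
        remaining = self.quota - self.redeemed
        if remaining <= 0:
            return 0.0
        if self.recent_rate <= 0:
            return float("inf")
        return remaining / self.recent_rate
```

An entity with a large `seconds_to_quota()` is "boring"; a small value means it deserves closer attention.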
fast path
Independent of the backend technology you choose for this
(kafka, redis, rdbms), I recommend you apply a boolean label
to each entity: "fast" vs "slow" path for processing.
The fast-path entities are "boring"; they are far from
their limit so there is little danger of soon exceeding their quotas.
audit
Each redemption transaction event
shall be appended to a common event log.
This is easy to do, cheaply and reliably,
even during a network partition.
We append to a local log, which later will show up
in an eventually consistent log.
Thanks to transaction guids, logging an event is idempotent.
Such logging is highly available.
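Here is a rough sketch of why transaction GUIDs make logging idempotent. The `EventLog` class is a hypothetical in-memory stand-in for whatever log you actually use (kafka topic, table, file); only the keying-by-GUID idea matters.

```python
import uuid


class EventLog:
    """Append-only log keyed by transaction GUID (in-memory stand-in)."""

    def __init__(self):
        self._events = {}  # txn_id -> event record

    def append(self, txn_id, entity_ids, ts):
        # setdefault leaves an existing record untouched,
        # so a retried or replayed append is harmless.
        self._events.setdefault(
            txn_id,
            {"txn_id": txn_id, "entity_ids": tuple(entity_ids), "ts": ts},
        )

    def merge(self, other):
        """Fold another node's local log into this one; safe to repeat."""
        for txn_id, event in other._events.items():
            self._events.setdefault(txn_id, event)

    def __len__(self):
        return len(self._events)


local = EventLog()
txn = str(uuid.uuid4())
local.append(txn, ["card-1", "device-1"], ts=1.0)
local.append(txn, ["card-1", "device-1"], ts=1.0)  # retry after a timeout

central = EventLog()
central.merge(local)
central.merge(local)  # same batch delivered twice after a partition heals
```

Both logs end up holding exactly one event, no matter how many times the append or the merge is retried.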
At "slow" intervals (every 100ms, every 10s, whatever)
those events will be rolled up into counters.
And we will evaluate the related entity to see if
it is in danger of "soon" violating any quota.
If so, we update that boolean attribute,
turning it into a slow-path entity.
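The periodic rollup could look something like the sketch below. The function name `roll_up`, the event shape, and the "will it hit quota within `horizon_s` seconds at the rate just observed" test are all assumptions of mine; you would pick your own notion of "soon".

```python
from collections import Counter


def roll_up(events, quotas, counts, interval_s, horizon_s):
    """Fold one interval of logged events into counters.

    Returns the set of entity IDs that look likely to reach their quota
    within horizon_s seconds, i.e. candidates for the slow path.
    """
    batch = Counter()
    for event in events:
        for entity_id in event["entity_ids"]:
            batch[entity_id] += 1

    at_risk = set()
    for entity_id, n in batch.items():
        counts[entity_id] = counts.get(entity_id, 0) + n
        rate = n / interval_s                      # redemptions per second
        remaining = quotas[entity_id] - counts[entity_id]
        if remaining <= rate * horizon_s:
            at_risk.add(entity_id)
    return at_risk
```

Only entities that actually saw traffic in the interval get evaluated, which keeps the work proportional to the event volume rather than to the total number of entities.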
rate of change
So to recap, our sharded backend infrastructure is publishing
a highly available mapping from entity ID to a boolean attribute.
For most entities it reports "fast-path".
The update rate, promoting from fast- to slow-path, is very low.
chasing 9's
Depending on your backend stability, and where you set
the "transition to slow path" thresholds, it is
possible for a partition to happen, a flash mob to arrive,
and then some entity gets hammered on both sides of
the partition, violating its quota.
The CAP theorem tells us: that's life. Pick your tradeoffs.
Fortunately the quality of the logged records is very good,
so you can monitor quota violations and intelligently
match thresholds and business goals against what your
infrastructure is actually observed to do.
slow path
All nodes have access to a (possibly stale) entity --> path mapping.
If the entity is marked "slow path", then we forward to
a smallish number of specialized nodes which do the
transaction "carefully", meaning they acquire locks.
Personally, I would choose a postgres RDBMS to do that,
but there are lots of other ways to acquire distributed locks.
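As a sketch of what "carefully" might mean, here is a check-and-increment done inside one locked transaction. To keep it runnable without a server I use the stdlib `sqlite3` module and `BEGIN IMMEDIATE` as a stand-in; with postgres you would more likely lock the specific rows instead. The table layout and the function names are hypothetical.

```python
import sqlite3

UPDATE_SQL = (
    "UPDATE entity SET redeemed = redeemed + 1 "
    "WHERE id = ? AND redeemed < quota"
)


def open_store():
    # isolation_level=None: we issue BEGIN / COMMIT / ROLLBACK ourselves.
    conn = sqlite3.connect(":memory:", isolation_level=None)
    conn.execute(
        "CREATE TABLE entity (id TEXT PRIMARY KEY, quota INTEGER, redeemed INTEGER)"
    )
    return conn


def redeem_carefully(conn, entity_ids):
    """Redeem against every entity atomically, or reject with no changes."""
    conn.execute("BEGIN IMMEDIATE")  # take the write lock up front
    ok = False
    try:
        # Each UPDATE touches a row only if it is still under quota;
        # all() stops at the first entity that is already at its limit.
        ok = all(
            conn.execute(UPDATE_SQL, (entity_id,)).rowcount == 1
            for entity_id in entity_ids
        )
    finally:
        conn.execute("COMMIT" if ok else "ROLLBACK")
    return ok
```

If any one entity is at its limit, the rollback undoes the increments already applied to the others, so a rejected redemption consumes no quota.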
time passes
The OP mentioned a "time range" associated with each entity.
So a fast-path entity that heated up may be promoted
to slow-path, then time passes and we see the entity has
cooled down, so we COMMIT an rdbms transaction and
then make a separate update to mark the entity as fast-path.
There's hysteresis on the entity's thresholds,
to ensure the write rate for such path transitions is low.
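The hysteresis can be as simple as two different thresholds, one for each direction. In this sketch `utilization` is assumed to be the fraction of quota used within the current time range, and the 0.8 / 0.5 values are placeholders I made up; tune them against what your logs show.

```python
def next_path(current, utilization, heat_up_at=0.8, cool_down_at=0.5):
    """Return the new path label ('fast' or 'slow') for an entity.

    The gap between the two thresholds means an entity hovering around
    either one does not flap between paths on every evaluation.
    """
    if current == "fast" and utilization >= heat_up_at:
        return "slow"
    if current == "slow" and utilization <= cool_down_at:
        return "fast"
    return current
```

An entity at 0.7 utilization stays on whichever path it is already on, so small fluctuations generate no writes to the mapping.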
leases
Think of fast-path as an optimistic lock for an entity,
which grants all worker nodes permission to log redemption events
against that entity.
If a proposed redemption involves any slow-path entity,
then the request is forwarded to a smaller set of nodes
where we make different CAP tradeoffs, in the interest of safety and consistency.
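Putting the routing rule into code, a worker node's decision might be sketched like this. The `route` function and its return labels are hypothetical, and treating an entity missing from the local mapping as slow-path is a conservative assumption of mine, not something the design above requires.

```python
def route(entity_ids, path_of):
    """Decide how to handle a proposed redemption.

    path_of is this node's (possibly stale) entity -> path mapping.
    """
    # Unknown entities are treated as slow-path, to err on the safe side.
    if any(path_of.get(entity_id, "slow") == "slow" for entity_id in entity_ids):
        return "forward_to_careful_nodes"
    return "log_locally"


path_of = {"card-1": "fast", "device-1": "fast", "card-2": "slow"}
decision_a = route(["card-1", "device-1"], path_of)
decision_b = route(["card-2", "device-1"], path_of)
```

A single slow-path entity in the request is enough to send the whole redemption to the careful nodes.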