
We maintain counters in a relational database where each incoming request increments or decrements a counter, subject to a maximum limit.

Throughput is 300 requests per second, and many (or all) requests may target the same counter.

For correctness, each request currently performs the following steps synchronously, in a single transaction (sketched below):

  • Read the counter to check whether it can be incremented further.
  • Insert a log entry representing the delta.
  • Update (increment/decrement) the counter row.

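For concreteness, here is a minimal sketch of that transaction, assuming PostgreSQL accessed via psycopg2; the counters and counter_log tables and their columns are hypothetical stand-ins for the actual schema:

    import psycopg2

    # Hypothetical schema, for illustration only:
    #   counters(id BIGINT PRIMARY KEY, value BIGINT, max_value BIGINT)
    #   counter_log(counter_id BIGINT, delta BIGINT, created_at TIMESTAMPTZ)

    def apply_delta(conn, counter_id, delta):
        # Assumes the counter row already exists.
        with conn:  # commits on success, rolls back on exception
            with conn.cursor() as cur:
                # 1. Read the counter; FOR UPDATE takes the row-level lock
                #    that serializes concurrent requests on the same counter.
                cur.execute(
                    "SELECT value, max_value FROM counters "
                    "WHERE id = %s FOR UPDATE",
                    (counter_id,))
                value, max_value = cur.fetchone()
                if not 0 <= value + delta <= max_value:
                    return False  # limit reached; reject the request
                # 2. Insert a log entry representing the delta.
                cur.execute(
                    "INSERT INTO counter_log (counter_id, delta, created_at) "
                    "VALUES (%s, %s, now())",
                    (counter_id, delta))
                # 3. Update the counter row itself.
                cur.execute(
                    "UPDATE counters SET value = value + %s WHERE id = %s",
                    (delta, counter_id))
                return True

The row lock taken in step 1 is held until commit, so all three steps sit inside the serialized window.
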
The main issue is high contention: when many transactions update the same counter row, row-level locks cause updates to serialize, leading to increased latency.

Counters - each counter is essentially the budget (maximum amount) that a running offer can consume; once usage exceeds that budget, we want to stop showing the offer to users.

Can using more advanced transactional techniques—such as Serializable Snapshot Isolation (SSI) or other MVCC-based approaches—significantly reduce contention and improve throughput for this kind of highly contended counter update, while still keeping the updates synchronous in the same transaction?
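
For reference, "keeping it synchronous under SSI" would look like the same transaction run at SERIALIZABLE isolation with a retry loop, since conflicting transactions abort instead of blocking. A minimal sketch, again assuming PostgreSQL/psycopg2 and the hypothetical schema above:

    import psycopg2.extensions
    from psycopg2 import errors

    def apply_delta_ssi(conn, counter_id, delta, max_retries=5):
        # Must be set while no transaction is in progress.
        conn.set_session(
            isolation_level=psycopg2.extensions.ISOLATION_LEVEL_SERIALIZABLE)
        for _ in range(max_retries):
            try:
                with conn:
                    with conn.cursor() as cur:
                        cur.execute(
                            "SELECT value, max_value FROM counters WHERE id = %s",
                            (counter_id,))
                        value, max_value = cur.fetchone()
                        if not 0 <= value + delta <= max_value:
                            return False
                        cur.execute(
                            "INSERT INTO counter_log (counter_id, delta, created_at) "
                            "VALUES (%s, %s, now())",
                            (counter_id, delta))
                        cur.execute(
                            "UPDATE counters SET value = value + %s WHERE id = %s",
                            (delta, counter_id))
                        return True
            except errors.SerializationFailure:
                continue  # a concurrent writer won; try again
        raise RuntimeError("counter update did not commit after retries")

Note that in PostgreSQL, two concurrent UPDATEs of the same row still conflict at SERIALIZABLE: the loser receives a serialization failure rather than simply waiting, so hot-row contention becomes retries rather than going away.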

  • I highly doubt you can meaningfully optimize such a simple operation while preserving strong consistency. Typical real-time systems simply drop this requirement and move to either eventual consistency, or even allow inconsistencies. But to fully understand what can be done, more info is required: what are this "counter" and "delta", really? Commented Dec 27, 2025 at 21:24
  • Added counter use-case @freakish Commented Dec 27, 2025 at 21:33
  • Right, so one way is to move this data out of the DB to some custom server and process the data in memory, and then push the results to the DB through a background worker. This is the eventual-consistency way I've talked about. Another way is to completely get rid of the transaction and allow inconsistencies, i.e. going above limits. This is what many financial processors allow, because such situations are simply very rare, and it is cheaper to handle such cases manually. Commented Dec 27, 2025 at 21:44
  • Decrementing counters atomically is easy and efficient. The problem is that the logging step turns this single atomic update into a much longer transaction. How to resolve this is going to depend on your business requirements, but could involve unconditionally logging the intention to decrement the counter (compare: event sourcing), or adding the log in a separate transaction after updating the counter has succeeded and been committed (see the sketch after these comments). Commented Dec 28, 2025 at 7:25
  • So it's a bounded counter, @amon. Hence we first check whether the counter can be reduced further. The business use case is: read the current counter value and check whether the operation is allowed; if yes, write a log entry recording the success (we have to record at what time which counter was decremented by which user); then update the counter (inc/dec), all in one transaction. That way parallel requests can't race where both reads see the operation as allowed, both writes succeed, and the counter exceeds its limit, as it otherwise possibly could. We were thinking that using SSI isolation might remove the locking. Commented Dec 28, 2025 at 8:38
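
To make amon's suggestion concrete, here is a minimal sketch (same assumed PostgreSQL/psycopg2 setup and hypothetical schema as above) in which a single conditional UPDATE enforces the limit atomically, and the log entry is written in a separate, later transaction:

    def apply_delta_short_txn(conn, counter_id, delta):
        # First, short transaction: the WHERE clause enforces the limit,
        # so no prior read (and no long-held row lock) is needed.
        with conn:
            with conn.cursor() as cur:
                cur.execute(
                    "UPDATE counters SET value = value + %s "
                    "WHERE id = %s AND value + %s BETWEEN 0 AND max_value",
                    (delta, counter_id, delta))
                if cur.rowcount == 0:
                    return False  # limit reached (or unknown counter)
        # Second, independent transaction: log after the update committed.
        # If this step fails, the counter has moved but the log entry is
        # missing, so a reconciliation job would be needed.
        with conn:
            with conn.cursor() as cur:
                cur.execute(
                    "INSERT INTO counter_log (counter_id, delta, created_at) "
                    "VALUES (%s, %s, now())",
                    (counter_id, delta))
        return True

This keeps the lock on the hot row held for one statement instead of three, at the cost of a small window where the counter and the log can disagree.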

1 Answer


300 increments (updates) a second sustained would suggest an extremely heavily-used website!

Given that this controls the continued presentation of an offer to the users and is updated on each claim of the offer, my suggestion would be to crudely meter the number of concurrent presentations so as to reduce the average contention on this field - for example, only attempt to present the offer when the current time is evenly divisible by 10 seconds, or something like that.
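
One crude reading of that metering idea, as a sketch with illustrative (made-up) numbers:

    import time

    def offer_window_open(period_s=10, window_s=1):
        # Only attempt to present the offer during the first window_s
        # seconds of every period_s-second interval; fewer presentations
        # mean fewer claims and less average contention on the counter row.
        return int(time.time()) % period_s < window_s

A request handler would check offer_window_open() before showing the offer at all, so the hot counter is only exercised during the open window.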

Also, if you aren't willing to potentially present the offer and then yank it away when it turns out the limit has been reached, accepting some degree of softness on the limit would be best.

Making the workings of an offer fit the apparent limitations of the technology at hand is usually the far saner business solution.

  • Rejecting a few orders would be a better user experience than overselling or retracting a discount. Users cannot find out whether an order was rejected legitimately, so they cannot complain. Commented Dec 28, 2025 at 10:49
  • That's correct, @Basilevs; we plan to reject the order and handle it gracefully on the UI side (something like "the offer is not available anymore, please retry"), which is better than cancelling it later or overselling. We were thinking that increment operations on a cache might be faster, so the cache could act as a gatekeeper here, and in the background a worker could scan the logs and update the counters (see the sketch after these comments). Would that be a feasible approach? Commented Dec 28, 2025 at 15:59
  • @Steve the idea here is whether SSI or any other advanced technique can really help with the contention, or whether we have to update the counters only in the background, in batches, using some sort of workers, etc. Commented Dec 28, 2025 at 16:00
  • @tusharRawat whatever caching you are trying to do, the DBMS already has a better one, with and without replication. Do not attempt it. Commented Dec 28, 2025 at 16:06
  • @Basilevs, the DBMS might have caching, but on update it must acquire locks, correct? At 300 RPS updating the same counter without any race condition, all of our requests will serialise; tail latencies can then grow as requests queue in the DB waiting for the update, and that is the problem here. Commented Dec 28, 2025 at 16:24
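
As an illustration of the cache-as-gatekeeper idea from the comments above, a minimal sketch assuming Redis via redis-py, with a hypothetical key layout; the counter key is assumed to be seeded from the database when the offer goes live, and a separate background worker (not shown) would drain the pending list into the DB in batches:

    import redis

    r = redis.Redis()  # assumed local Redis instance

    def try_claim(counter_id, delta, max_value):
        key = f"counter:{counter_id}"  # hypothetical key layout
        # Atomic in-memory increment; no row lock, no transaction.
        new_value = r.incrby(key, delta)
        if new_value > max_value:
            r.decrby(key, delta)  # undo: budget exhausted
            return False
        # Queue the accepted delta; a background worker drains this list
        # and applies the deltas to the database in batches.
        r.rpush(f"counter:{counter_id}:pending", delta)
        return True

This is the eventual-consistency route described earlier in the comments: the database counter lags the cache, and a crash between the increment and the flush can lose or replay deltas unless the worker reconciles against the log.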
