
We are designing a coupon system handling 300 requests per second on redemptions, where updating global and per-user coupon counters directly in the database within a transaction causes contention and latency.

For example, updating counters such as offer_budget appears to be the bottleneck: many users may be claiming the same offer, causing serialisation/locking when we use transactions for safe updates. The user_id_coupon_id counters are not really the problem, as it's unlikely that the same user will redeem the same offer at the same time. Our maximum latency expectation on the client is 200-300 ms per redeem call.

Because of this, our design is:

  1. Enforce coupon budgets (global and per-user limits) using Redis counters for low-latency checks and updates.
  2. Write each coupon redemption as an immutable log record in the database.
  3. Once step 2 succeeds, we consider the redemption successful and return success to the client.
  4. Periodically, we reconcile/sync counters back to the database using background jobs that scan our log table since the last checkpoint (the checkpoint records how far we have scanned, so we don't have to scan the entire table to rebuild counters every time).
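A rough sketch of steps 1-3 (Python with the redis-py client, SQLite standing in for our log store; the key and function names are illustrative, not our production code):

```python
import sqlite3

import redis  # pip install redis

r = redis.Redis()

# Lua keeps the limit checks and both increments atomic on the Redis side,
# so two concurrent redeems cannot both pass the check and overshoot.
REDEEM_LUA = """
local used      = tonumber(redis.call('GET', KEYS[1]) or '0')
local user_used = tonumber(redis.call('GET', KEYS[2]) or '0')
if used >= tonumber(ARGV[1]) or user_used >= tonumber(ARGV[2]) then
    return 0  -- global budget or per-user limit exhausted
end
redis.call('INCR', KEYS[1])
redis.call('INCR', KEYS[2])
return 1
"""
redeem_script = r.register_script(REDEEM_LUA)

db = sqlite3.connect("coupons.db")
db.execute("""CREATE TABLE IF NOT EXISTS redemption_log (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    user_id TEXT, offer_id TEXT, event TEXT,
    created_at TEXT DEFAULT CURRENT_TIMESTAMP)""")

def redeem(user_id: str, offer_id: str, global_limit: int, per_user_limit: int) -> bool:
    # Step 1: low-latency budget check + increment in Redis.
    ok = redeem_script(
        keys=[f"offer_budget:{offer_id}", f"user_offer:{user_id}:{offer_id}"],
        args=[global_limit, per_user_limit],
    )
    if not ok:
        return False  # rejected fast; nothing to undo
    # Step 2: immutable log record; its success is the client's success (step 3).
    db.execute(
        "INSERT INTO redemption_log (user_id, offer_id, event) VALUES (?, ?, 'redeem')",
        (user_id, offer_id))
    db.commit()
    return True
```

One edge case worth noting: if the log insert fails after the Redis increments succeed, the Redis counters over-count until they are corrected, e.g. by a compensating decrement or by the reconciliation job in step 4.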

This works for redemptions, but refunds introduce complexity.

On order cancellation or refund, what is the correct and safest approach to handle coupon refunds? Specifically, how should the system increase the global offer budget and per-user (user_id, offer_id) counters and log these changes?

Should refunds (a) increment the Redis counters back and write a refund log, or (b) move both redemption and refund logic entirely into database transactions (updating counters and logs atomically), accepting higher latency but relying on strong consistency, assuming ~300 requests per second?

In practice, what approach do large-scale systems use to balance safety, latency, and correctness in this case?

  • Correctness and latency seem to be irrelevant for refunds. One user cannot predict or rely on refunds done by another, so there is no way for them to notice an invalid rejection of a coupon if some refunds are "in progress". Commented Dec 21, 2025 at 13:12
  • @Basilevs What approach should we take for refunds to increment the counter back? The same as in the redeem flow? Commented Dec 21, 2025 at 14:41
  • We can't do a transactional update (incrementing the budget and user-offer counters plus writing the log) for refunds, because the background worker is also rolling the logs up into the counters, so the budget might get incremented twice. Commented Dec 21, 2025 at 15:09

1 Answer


It's hard to work out your exact business logic here. I'm going to assume:

  1. You print and distribute lots of identical coupons, e.g. 10% off product X.
  2. Customers are limited to n of the same type of coupon.
  3. Types of coupons are limited to m total usages, i.e. although you printed 100,000 coupons, you will only redeem, say, 100.

You already have a working solution for these counters, but it doesn't take into account refunds.

In which case the simplest option would be to double up and have refund counters for the same keys. Then, instead of checking used < limit, you just check used - refunds < limit.
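A rough sketch of what that check could look like, reusing the Redis client and log table from the sketch in your question; it is shown for the global budget key only, and the per-user pair works the same way (key names are illustrative):

```python
# Atomic net-usage check: a redeem passes only while (used - refunds) < limit.
REDEEM_NET_LUA = """
local used    = tonumber(redis.call('GET', KEYS[1]) or '0')
local refunds = tonumber(redis.call('GET', KEYS[2]) or '0')
if (used - refunds) >= tonumber(ARGV[1]) then
    return 0  -- net usage has already hit the limit
end
redis.call('INCR', KEYS[1])
return 1
"""
redeem_net = r.register_script(REDEEM_NET_LUA)

def try_redeem(offer_id: str, limit: int) -> bool:
    return bool(redeem_net(
        keys=[f"offer_budget:{offer_id}", f"offer_refunds:{offer_id}"],
        args=[limit],
    ))

def refund(user_id: str, offer_id: str) -> None:
    # Mirror the redeem path: append the immutable refund record first,
    # so the log stays the source of truth for reconciliation...
    db.execute(
        "INSERT INTO redemption_log (user_id, offer_id, event) VALUES (?, ?, 'refund')",
        (user_id, offer_id))
    db.commit()
    # ...then bump the live refund counter. The used counter is never
    # decremented, and only the reconciliation job touches the database
    # counters, which sidesteps the double-apply concern from the comments.
    r.incr(f"offer_refunds:{offer_id}")
```

Keeping both counters append-only means the reconciliation job stays a pure replay of the log.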

In practice, my experience is that with refunds, companies are much more worried about fraud than whether the marketing budget is a few percent under or over.

I would expect them to use atomic transactions to ensure coupons cannot be reused, or exceed whatever limits have been set on a per-customer basis.

But I would not expect them to be overly worried about going over or under the total usage, so an end-of-day, or once-per-X-transactions, check on the total coupons used would be fine.
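That periodic check could be as simple as folding the log entries past the last checkpoint into a durable counter. A sketch, again reusing the log table from your question; the offer_counters table and checkpoint handling are illustrative:

```python
db.execute("""CREATE TABLE IF NOT EXISTS offer_counters (
    offer_id TEXT PRIMARY KEY, used INTEGER NOT NULL DEFAULT 0)""")

def reconcile(offer_id: str, last_checkpoint_id: int) -> int:
    """Fold log entries newer than the checkpoint into the durable counter."""
    rows = db.execute(
        """SELECT event, COUNT(*), MAX(id) FROM redemption_log
           WHERE offer_id = ? AND id > ? GROUP BY event""",
        (offer_id, last_checkpoint_id)).fetchall()
    if not rows:
        return last_checkpoint_id  # nothing new since the checkpoint
    counts = {event: n for event, n, _ in rows}
    net = counts.get("redeem", 0) - counts.get("refund", 0)
    # Upsert the net delta; refunds simply subtract from usage here.
    db.execute(
        """INSERT INTO offer_counters (offer_id, used) VALUES (?, ?)
           ON CONFLICT(offer_id) DO UPDATE SET used = used + excluded.used""",
        (offer_id, net))
    db.commit()
    return max(max_id for _, _, max_id in rows)  # new checkpoint to persist
```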

Your Redis solution seems good, but it might be overkill given memory prices.


Additional Clarification.

I have seen a similar problem at a large retailer: stock levels. The retailer has an e-commerce website, and the requirement is to put "Low Stock" warnings/adverts against products that have fewer than X items left in stock. When there are zero items left, the product should no longer be displayed.

Because the stock levels change with each purchase and need to be queried for every view of the web page, this would require the website to be regenerated from the database on every view, rather than be cached or loaded from a CDN, which would break the website under load.

Updating the product's stock level with a purchase is not an issue: a purchase requires server-side and database activity anyway, and purchases are much less frequent than page views, so the purchase pages are not cached.

The solution for the product view page is to check the stock levels on cache refresh only.
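A toy sketch of that idea, with an in-process TTL dict standing in for the real page cache/CDN (the names, thresholds, and the 5-minute TTL are all made up):

```python
import time

CACHE_TTL_SECONDS = 300
STOCK_TABLE = {"sku-123": 3, "sku-456": 0}  # stand-in for the real stock query
_page_cache: dict[str, tuple[float, str | None]] = {}

def render_product_page(product_id: str) -> str | None:
    """Return cached HTML (None = hide the product), rechecking stock only on expiry."""
    now = time.time()
    cached = _page_cache.get(product_id)
    if cached and cached[0] > now:
        return cached[1]                     # stock may be stale, by design
    stock = STOCK_TABLE.get(product_id, 0)   # the only stock read per TTL window
    if stock == 0:
        html = None                          # hidden until the next refresh notices a return
    elif stock < 5:
        html = f"<p>Only {stock} left - Low Stock!</p>"
    else:
        html = "<p>In stock</p>"
    _page_cache[product_id] = (now + CACHE_TTL_SECONDS, html)
    return html
```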

This means that some products which should not be shown, or should be shown with the "low stock" tag, are incorrectly shown, or shown without the tag.

It also means some purchases will attempt to buy products which have no stock. These purchases will fail: "sorry, we have run out of that item".

However, it requires no further infrastructure, i.e. it's cheap, and it scales.

Sometimes products which the website thinks are out of stock will have had returns or cancelled orders, so sometimes sales are gained from the incorrect display.

When a sale is unable to complete, the customer can be offered an alternative, whereas if they couldn't find the product at all, the whole basket may have been lost as they would go elsewhere.

This solution keeps the static cache requirement for the view page, so it scales much more cheaply than your Redis cache, with negligible downsides.

  • Hey @Ewan, that makes sense: we could maintain separate refund counters alongside redemption counters. In redeem, we'd validate used − refunds < limit, and in refund, ensure used − refunds ≥ 0, with async rollups from logs to the DB. However, this doubles the Redis counter cardinality (e.g., user_offer_id, offer_budget), which can be expensive. When you say the Redis approach might be overkill, do you mean it's better to handle both redeem and refund flows purely via DB transactions? Would MySQL/PostgreSQL realistically handle ~200 RPS safely here? Commented Dec 21, 2025 at 19:10
  • I mean to say: can a transactional DB like MySQL/PostgreSQL keep latency under 200-300 ms with 300 requests per second of contention? Commented Dec 21, 2025 at 19:17
  • Let's say for whatever reason using the DB naively was a bottleneck. You could have a local memory cache which rolls up to the DB every 1000 coupons used. Now each box only reads/writes once every 3 seconds. You might go over the limit by, worst case, number of nodes * 1000, but on the plus side you don't pay for Redis. Maybe you even cancel orders after placement if they have gone over the budget, if it's a big deal. (See the sketch after these comments.) Commented Dec 21, 2025 at 19:43
  • We were also thinking of separating counters into two buckets: low contention (like user_offer_id) and high contention (like offer_budget), updating low-contention counters in a transaction and high-contention ones with cache counters plus background sync from logs to the DB. This can help us save cache memory, since high-cardinality counters like user_offer_id can be read from the DB directly to perform redemptions/refunds, while low-cardinality counters like offer_budget (at any point in time, the total number of offers is limited) can stay in cache and be operated on there for redemption/refund. Commented Dec 21, 2025 at 19:56
  • I'd have to see the whole code, I guess, but it sounds like you will have a race condition to me: Customer A starts a purchase, counter check OK, counter++; Customer B starts a purchase, counter check fails; Customer A cancels/fails the purchase. The counter is now incorrect and you lost Customer B's sale. Commented Dec 26, 2025 at 18:40
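To make the batching idea from the comments concrete, here is one per-node sketch; BATCH_SIZE and the flush callback are illustrative:

```python
import threading

BATCH_SIZE = 1000

class LocalBudgetCounter:
    """Counts redemptions in process memory, flushing one delta to the DB per batch."""

    def __init__(self, flush_delta):
        self._flush_delta = flush_delta  # e.g. runs UPDATE ... SET used = used + delta
        self._pending = 0
        self._lock = threading.Lock()

    def record_redemption(self) -> None:
        with self._lock:
            self._pending += 1
            if self._pending >= BATCH_SIZE:
                self._flush_delta(self._pending)  # one DB write per BATCH_SIZE redeems
                self._pending = 0
```

The trade-off is exactly the one stated in the comment: a worst-case overshoot of number of nodes * BATCH_SIZE, in exchange for not running Redis at all.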
