Timeline for answer to Handling high-throughput counter updates without contention by candied_orange
Current License: CC BY-SA 4.0
Post Revisions
15 events
| when | what | by | license | comment | |
|---|---|---|---|---|---|
| Jan 4 at 8:26 | comment | added | tusharRawat | Agreed @candied_orange, we can have this problem with one worker too. I tried to solve it another way; please check the follow-up question: softwareengineering.stackexchange.com/questions/460659/… Let me know your thoughts. | |
| Dec 31, 2025 at 19:55 | comment | added | candied_orange | @tusharRawat when you’re talking about workers going down, the problem is temporal contention. You need to know how far through the log it got so whatever comes back up doesn’t drop or dupe. But if you think about it, that’s exactly the same problem you had when there was only one aggregating worker. So long as you don’t change the allocated territory, a worker can go down, come back up, and use whatever trick you were already using to restart where it left off. Same as when you only had one worker reading the log. It’s just a little behind the others now. | |
| Dec 31, 2025 at 13:15 | comment | added | tusharRawat | @Basilevs, it is because of contention that we aggregate counters in the background instead of on every request. On a request we just log the change and return success; background workers then derive the counters from the logs. The question now is how, with a set of workers scanning the logs in parallel and aggregating counters at runtime, to avoid 2 or more workers picking the same record to process, which I think candied is trying to answer here. | |
| Dec 31, 2025 at 7:28 | comment | added | Basilevs | OP mentioned that contention happens on a single counter, so aggregator isolation does nothing. | |
| Dec 31, 2025 at 6:53 | comment | added | tusharRawat | Agreed, but even if we restrict the workers to, say, 4, each with an assigned range: if 1 worker goes down and another comes up, how do we make sure the new one attaches to the old range, and that the old one does not come back up and claim the same territory as before? Basically, once one worker has acquired a territory, no other worker should be able to claim that territory at the same time. I am treating the workers as just our app servers here, e.g. 4 machines running. | |
| Dec 31, 2025 at 1:01 | comment | added | candied_orange | @tusharRawat The whole point of marking out territory for the workers ahead of time is to entirely prevent the possibility of 2 workers aggregating the same counter. The need for an atomic operation has been narrowed down to when the number of workers changes, and with it their ranges of territory. You must do that safely. But you don't need to do that often. | |
| Dec 30, 2025 at 19:20 | comment | added | tusharRawat | Hm, makes sense. We will probably also need some checkpoint on our Aggregators table, right? Also, increasing the number of workers when we are lagging behind is still something we will have to do carefully: the inc/dec operations here are not idempotent, so 2 workers aggregating the same counter at the same time would give wrong results. | |
| Dec 30, 2025 at 13:45 | history | edited | candied_orange | CC BY-SA 4.0 | added 47 characters in body |
| Dec 30, 2025 at 13:43 | comment | added | candied_orange | @tusharRawat Added some tables myself. | |
| Dec 30, 2025 at 13:36 | history | edited | candied_orange | CC BY-SA 4.0 | adding tables |
| Dec 30, 2025 at 13:31 | history | edited | candied_orange | CC BY-SA 4.0 | adding tables |
| Dec 29, 2025 at 11:20 | history | edited | candied_orange | CC BY-SA 4.0 | added 240 characters in body |
| Dec 27, 2025 at 17:39 | comment | added | tusharRawat | Added the log table structure to the question. | |
| Dec 27, 2025 at 8:29 | comment | added | tusharRawat | Correct. The challenging part we are trying to figure out is making sure that, at any point in time, different workers aggregating from the same table don't pick the same counters. Even if they do, only one worker should be able to proceed for that turn, and the others that picked the same counter should be rejected or retry. We want strong guarantees that different aggregation workers work on disjoint sets. | |
| Dec 26, 2025 at 23:38 | history | answered | candied_orange | CC BY-SA 4.0 |
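
The comment thread above describes two ideas: each aggregation worker owns a fixed, disjoint territory of counters, and a worker that goes down resumes from a checkpoint in the log so it neither drops nor duplicates updates. Below is a minimal in-memory sketch of that idea, not the answer's actual implementation. The names `owner_of`, `WorkerState`, `run_worker`, and `NUM_WORKERS` are hypothetical, and hashing the counter id with CRC32 is an assumption made for illustration (the thread speaks of assigned ranges). A real system would presumably persist the checkpoint and the counter values together in one transaction.

```python
import zlib
from dataclasses import dataclass, field

# Assumed fixed. Changing it changes every territory, so per the thread that
# step needs to be done safely (atomically), but only rarely.
NUM_WORKERS = 4


def owner_of(counter_id: str, num_workers: int = NUM_WORKERS) -> int:
    """Map a counter to exactly one worker, stably across restarts."""
    return zlib.crc32(counter_id.encode("utf-8")) % num_workers


@dataclass
class WorkerState:
    """State one worker would persist: its counters and its log checkpoint."""
    worker_id: int
    checkpoint: int = 0  # index of the next log entry to read
    counters: dict = field(default_factory=dict)


def run_worker(state: WorkerState, log: list, max_entries=None) -> None:
    """Apply log entries from the checkpoint onward, skipping other territories.

    max_entries limits how many entries are scanned in this call; it is used
    here only to simulate a worker going down part-way through the log.
    """
    scanned = 0
    while state.checkpoint < len(log):
        if max_entries is not None and scanned >= max_entries:
            break
        counter_id, delta = log[state.checkpoint]
        if owner_of(counter_id) == state.worker_id:
            state.counters[counter_id] = state.counters.get(counter_id, 0) + delta
        # Advancing the checkpoint together with the counter update is what
        # keeps a restart from re-applying a non-idempotent inc/dec.
        state.checkpoint += 1
        scanned += 1


if __name__ == "__main__":
    log = [("a", 1), ("b", 2), ("a", -1), ("c", 5), ("b", 3), ("a", 4)]
    states = [WorkerState(i) for i in range(NUM_WORKERS)]
    run_worker(states[0], log, max_entries=2)  # worker 0 "goes down" early
    for s in states:                           # everyone runs (or resumes)
        run_worker(s, log)
    for s in states:
        print(s.worker_id, s.checkpoint, s.counters)
```

Because `owner_of` depends only on the counter id and the worker count, a restarted worker picks up the same territory it had before, which is the point candied_orange makes about a recovering worker simply being "a little behind the others".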