  • Can you edit your question to include more information about the problem? It is difficult to write an answer to this question with such vague information. Commented Feb 27, 2024 at 15:34
  • Agree with Greg: we need more information to be able to offer suitable advice. Why are you summing the state of individual replicas? Why do other replicas need to know it? Bear in mind that you shouldn't rely on a cache for essential system state. It should be backed by a proper data layer (e.g. a database) that your jobs can fall back on if the data is not in the cache. Commented Feb 27, 2024 at 15:45
  • Sure, I added more context to the question; feel free to ask further questions. I'm also open to using technologies or solutions other than the ones I mentioned. Commented Feb 27, 2024 at 17:28
  • Appreciate the edit. Still don’t understand “The problem with this solution is that the system relies upon external components to be hermetic.” What’s that mean? Commented Feb 27, 2024 at 20:51
  • What I meant is that the proposed solution doesn't seem good, because it makes the scheduling system rely on a completely separate system (the component that listens for replica failures) to function correctly, even though the scheduling system is agnostic to that component's existence. I might be wrong about this, but I think it would be better to design the scheduling system so that it is fault tolerant on its own. Commented Feb 27, 2024 at 21:26