fix(cron): scale missed-job grace window with schedule frequency#2112
Closed
ticketclosed-wontfix wants to merge 1 commit intoNousResearch:mainfrom
Closed
fix(cron): scale missed-job grace window with schedule frequency#2112ticketclosed-wontfix wants to merge 1 commit intoNousResearch:mainfrom
ticketclosed-wontfix wants to merge 1 commit intoNousResearch:mainfrom
Conversation
Replace hardcoded 120-second grace period with a dynamic window that scales with the job's scheduling frequency (half the period, clamped to [120s, 2h]). Daily jobs now catch up if missed by up to 2 hours instead of being silently skipped after just 2 minutes.
fee97a6 to
8382a7e
Compare
Contributor
|
Merged via #2449. Cherry-picked with authorship preserved. Great fix — the production scenario in the PR description made this easy to evaluate. Thanks! |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
What
Replaces the hardcoded 120-second grace window in
get_due_jobs()with a dynamic window that scales with the job's scheduling frequency.Formula:
min(period / 2, 2 hours), floored at 120 seconds.Why
The current hardcoded 120-second threshold silently skips recurring cron jobs that fire even 2 minutes late. For daily jobs this is problematic — a brief gateway reconnect, network blip, or load spike at the scheduled time causes the job to be fast-forwarded to the next day with no retry.
This was hit in production: a ~1-minute gateway reconnect at 07:05 caused a daily briefing (scheduled at
5 7 * * *) to be silently skipped. The other two daily jobs at 07:00 and 07:10 ran fine — only the one whose tick landed during the reconnect was lost.Changes
cron/jobs.py: Added_compute_grace_seconds(schedule)helper that calculates grace as half the schedule period, clamped to [120s, 7200s]. Updatedget_due_jobs()to use it instead of the hardcoded120. Updated the log message to include the grace value for observability.tests/cron/test_jobs.py: Updated two existing tests to match the new dynamic grace behaviour for hourly jobs.How to test
hermes cron create "test" --schedule "0 9 * * *" --name test-dailynext_run_atto 10 minutes ago in~/.hermes/cron/jobs.jsonpython -c "from cron.scheduler import tick; tick()"— the job should fire (within 2h grace)next_run_atto 3 hours ago — the job should be fast-forwarded, not firedPlatforms tested
Note
Happy to adjust the approach if you'd prefer to make the grace window user-configurable (e.g. a per-job
grace_secondsfield or a global config option), or if you'd rather use a different calculation for the grace period. Open to feedback on the formula.