[components-api] Intermittent internal API failures / retry internal requests
Open, Medium · Public · BUG REPORT

Description

Steps to replicate the issue (include links if applicable):

  • Run a deploy

What happens?:

Sometimes the internal API calls fail with a read timeout, e.g.:

Deployment ID: 20250828-130049-x5ucaa3ign
Created: 20250828-130049
Status: failed
Long status: 
  Got exception: Failed run for component redis: HTTPSConnectionPool(host='api.svc.tools.eqiad1.wikimedia.cloud', port=30003): Read timed out. (read timeout=20)

Builds:
  add-dangling-edits-to-group(skipped): id:no-build-needed Component re-uses build from cluebotng-reviewer
  add-edits-to-queue(skipped): id:no-build-needed Component re-uses build from cluebotng-reviewer
  add-reported-edits(skipped): id:no-build-needed Component re-uses build from cluebotng-reviewer
  add-reviews-from-huggle(skipped): id:no-build-needed Component re-uses build from cluebotng-reviewer
  add-reviews-from-report(skipped): id:no-build-needed Component re-uses build from cluebotng-reviewer
  celery-flower(skipped): id:no-build-needed Component re-uses build from cluebotng-reviewer
  celery-worker(skipped): id:no-build-needed Component re-uses build from cluebotng-reviewer
  cleanup-user-records(skipped): id:no-build-needed Component re-uses build from cluebotng-reviewer
  cluebotng-reviewer(successful): id:cluebotng-review-buildpacks-pipelinerun-h2ppb You can see the logs with `toolforge build logs cluebotng-review-buildpacks-pipelinerun-h2ppb`
  export-statistics(skipped): id:no-build-needed Component re-uses build from cluebotng-reviewer
  grafana-alloy(skipped): id:cluebotng-review-buildpacks-pipelinerun-frx2d Reusing existing build
  grant-review-access-from-wikipedia-rights(skipped): id:no-build-needed Component re-uses build from cluebotng-reviewer
  import-training-data(skipped): id:no-build-needed Component re-uses build from cluebotng-reviewer
  mark-edits-as-deleted(skipped): id:no-build-needed Component re-uses build from cluebotng-reviewer
  mark-edits-as-having-data(skipped): id:no-build-needed Component re-uses build from cluebotng-reviewer
  redis(successful): id:cluebotng-review-buildpacks-pipelinerun-5n759 You can see the logs with `toolforge build logs cluebotng-review-buildpacks-pipelinerun-5n759`
  update-edit-classifications(skipped): id:no-build-needed Component re-uses build from cluebotng-reviewer

Runs:
  add-dangling-edits-to-group(successful): [info] (Job add-dangling-edits-to-group is already up to date)
  add-edits-to-queue(successful): [info] (Job add-edits-to-queue is already up to date)
  add-reported-edits(successful): [info] (Job add-reported-edits is already up to date)
  add-reviews-from-huggle(successful): [info] (Job add-reviews-from-huggle is already up to date)
  add-reviews-from-report(successful): [info] (Job add-reviews-from-report is already up to date)
  celery-flower(successful): [info] (Job celery-flower is already up to date)
  celery-worker(successful): [info] (Job celery-worker is already up to date)
  cleanup-user-records(successful): [info] (Job cleanup-user-records is already up to date)
  cluebotng-reviewer(successful): [info] (Job cluebotng-reviewer created)
  export-statistics(successful): [info] (Job export-statistics is already up to date)
  grafana-alloy(successful): [info] (Job grafana-alloy is already up to date)
  grant-review-access-from-wikipedia-rights(successful): [info] (Job grant-review-access-from-wikipedia-rights is already up to date)
  import-training-data(successful): [info] (Job import-training-data is already up to date)
  mark-edits-as-deleted(successful): [info] (Job mark-edits-as-deleted is already up to date)
  mark-edits-as-having-data(successful): [info] (Job mark-edits-as-having-data is already up to date)
  redis(failed): HTTPSConnectionPool(host='api.svc.tools.eqiad1.wikimedia.cloud', port=30003): Read timed out. (read timeout=20)
  update-edit-classifications(skipped): Skipped due to previous failure

Deployment ID: 20250828-105932-qebopov37q
Created: 20250828-105932
Status: failed
Long status: 
  Got exception: Failed run for component update-edit-classifications: HTTPSConnectionPool(host='api.svc.tools.eqiad1.wikimedia.cloud', port=30003): Read timed out. (read timeout=20)

Builds:
  add-dangling-edits-to-group(skipped): id:no-build-needed Component re-uses build from cluebotng-reviewer
  add-edits-to-queue(skipped): id:no-build-needed Component re-uses build from cluebotng-reviewer
  add-reported-edits(skipped): id:no-build-needed Component re-uses build from cluebotng-reviewer
  add-reviews-from-huggle(skipped): id:no-build-needed Component re-uses build from cluebotng-reviewer
  add-reviews-from-report(skipped): id:no-build-needed Component re-uses build from cluebotng-reviewer
  celery-flower(skipped): id:no-build-needed Component re-uses build from cluebotng-reviewer
  celery-worker(skipped): id:no-build-needed Component re-uses build from cluebotng-reviewer
  cleanup-user-records(skipped): id:no-build-needed Component re-uses build from cluebotng-reviewer
  cluebotng-reviewer(successful): id:cluebotng-review-buildpacks-pipelinerun-z9gk5 You can see the logs with `toolforge build logs cluebotng-review-buildpacks-pipelinerun-z9gk5`
  export-statistics(skipped): id:no-build-needed Component re-uses build from cluebotng-reviewer
  grafana-alloy(skipped): id:cluebotng-review-buildpacks-pipelinerun-vphm9 Reusing existing build
  grant-review-access-from-wikipedia-rights(skipped): id:no-build-needed Component re-uses build from cluebotng-reviewer
  import-training-data(skipped): id:no-build-needed Component re-uses build from cluebotng-reviewer
  irc-relay(skipped): id:cluebotng-review-buildpacks-pipelinerun-28bqc Reusing existing build
  mark-edits-as-deleted(skipped): id:no-build-needed Component re-uses build from cluebotng-reviewer
  mark-edits-as-having-data(skipped): id:no-build-needed Component re-uses build from cluebotng-reviewer
  redis(successful): id:cluebotng-review-buildpacks-pipelinerun-hr5fw You can see the logs with `toolforge build logs cluebotng-review-buildpacks-pipelinerun-hr5fw`
  update-edit-classifications(skipped): id:no-build-needed Component re-uses build from cluebotng-reviewer

Runs:
  add-dangling-edits-to-group(successful): [info] (Job add-dangling-edits-to-group updated)
  add-edits-to-queue(successful): [info] (Job add-edits-to-queue updated)
  add-reported-edits(successful): [info] (Job add-reported-edits updated)
  add-reviews-from-huggle(successful): [info] (Job add-reviews-from-huggle updated)
  add-reviews-from-report(successful): [info] (Job add-reviews-from-report updated)
  celery-flower(successful): [info] (Job celery-flower updated)
  celery-worker(successful): [info] (Job celery-worker updated)
  cleanup-user-records(successful): [info] (Job cleanup-user-records updated)
  cluebotng-reviewer(successful): [info] (Job cluebotng-reviewer created)
  export-statistics(successful): [info] (Job export-statistics updated)
  grafana-alloy(successful): [info] (Job grafana-alloy updated)
  grant-review-access-from-wikipedia-rights(successful): [info] (Job grant-review-access-from-wikipedia-rights updated)
  import-training-data(successful): [info] (Job import-training-data updated)
  irc-relay(successful): [info] (Job irc-relay updated)
  mark-edits-as-deleted(successful): [info] (Job mark-edits-as-deleted updated)
  mark-edits-as-having-data(successful): [info] (Job mark-edits-as-having-data updated)
  redis(successful): [info] (Job redis created)
  update-edit-classifications(failed): HTTPSConnectionPool(host='api.svc.tools.eqiad1.wikimedia.cloud', port=30003): Read timed out. (read timeout=20)

As the end consumer, there is very little that can be done other than re-triggering the deployment, which consumes time and resources.

What should have happened instead?:

Internal API calls should be retried with a backoff to allow for transient issues.
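To illustrate the kind of retry being asked for, here is a minimal sketch of exponential backoff with jitter, using only the standard library. The helper name `retry_with_backoff` and all of its parameters are hypothetical and are not taken from components-api; which exceptions to retry on is also an assumption.

```python
import random
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def retry_with_backoff(
    call: Callable[[], T],
    retry_on: Tuple[Type[BaseException], ...] = (TimeoutError, ConnectionError),
    attempts: int = 4,
    base_delay: float = 1.0,
    max_delay: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run `call` until it succeeds or `attempts` tries have been made.

    Hypothetical helper for illustration only; not the components-api code.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return call()
        except retry_on:
            if attempt == attempts:
                raise  # out of attempts: surface the last error unchanged
            # Exponential backoff (1s, 2s, 4s, ...) capped at max_delay,
            # with full jitter so parallel callers do not retry in lockstep.
            delay = min(max_delay, base_delay * 2 ** (attempt - 1))
            sleep(random.uniform(0, delay))
    raise AssertionError("unreachable")
```

If the client is built on `requests` (which the `HTTPSConnectionPool` message suggests, but this is an assumption), `retry_on=(requests.exceptions.ReadTimeout,)` would be the natural choice. Note that a read timeout does not prove the server did nothing: the job may have been created even though the response never arrived, so any retried request should be safe to repeat.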

The timeout in ToolforgeClient was bumped under T376710, so it appears the jobs API is not finishing the response within 20 seconds.

Investigation is needed to determine whether this is a slow path/regression or whether the read timeout needs adjusting.
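One way to start that investigation is to time the slow call repeatedly and compare the durations against the 20 second read timeout. The sketch below is only an illustration: `time_calls` and `summarize` are hypothetical helpers, and the 80% "near timeout" threshold is an arbitrary assumption.

```python
import statistics
import time
from typing import Callable, Dict, List

READ_TIMEOUT_SECONDS = 20.0  # value taken from the error messages above


def time_calls(
    call: Callable[[], object],
    samples: int = 5,
    clock: Callable[[], float] = time.monotonic,
) -> List[float]:
    """Return the wall-clock duration of each of `samples` invocations."""
    durations = []
    for _ in range(samples):
        start = clock()
        call()
        durations.append(clock() - start)
    return durations


def summarize(
    durations: List[float], timeout: float = READ_TIMEOUT_SECONDS
) -> Dict[str, float]:
    """Summarize durations relative to the configured read timeout."""
    if not durations:
        raise ValueError("no durations to summarize")
    return {
        "samples": len(durations),
        "median": statistics.median(durations),
        "max": max(durations),
        # Calls that took at least 80% of the timeout (assumed threshold).
        "near_timeout": sum(1 for d in durations if d >= 0.8 * timeout),
    }
```

A median well below the timeout with occasional outliers would point towards transient slowness, where retries help; a median close to 20 seconds would suggest a consistently slow path or a timeout that is simply too low.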

Details

Title                               Reference                                 Author  Source Branch                   Dest Branch
_do_run - retry runtime errors      repos/cloud/toolforge/components-api!139  damian  feature/retry-jobs-http-errors  main
delete_job_if_exists -> delete_job  repos/cloud/toolforge/components-api!138  damian  feature/retry-jobs              main

Event Timeline

Restricted Application added a subscriber: Aklapper.
DamianZaremba renamed this task from "[components-api] retry internal api requests on failure" to "[components-api] Intermittent internal API failures / retry internal requests". · Aug 28 2025, 1:44 PM
fgiunchedi triaged this task as Medium priority. · Sep 1 2025, 2:36 PM

Another example in production:

{
    "deploy_id": "20250916-145825-hmaalsrpe6",
    "creation_time": "20250916-145825",
    "builds": {
        "add-dangling-edits-to-group": {
            "build_id": "no-build-needed",
            "build_status": "skipped",
            "build_long_status": "Component re-uses build from cluebotng-reviewer"
        },
        "add-edits-to-queue": {
            "build_id": "no-build-needed",
            "build_status": "skipped",
            "build_long_status": "Component re-uses build from cluebotng-reviewer"
        },
        "add-reported-edits": {
            "build_id": "no-build-needed",
            "build_status": "skipped",
            "build_long_status": "Component re-uses build from cluebotng-reviewer"
        },
        "add-reviews-from-huggle": {
            "build_id": "no-build-needed",
            "build_status": "skipped",
            "build_long_status": "Component re-uses build from cluebotng-reviewer"
        },
        "add-reviews-from-report": {
            "build_id": "no-build-needed",
            "build_status": "skipped",
            "build_long_status": "Component re-uses build from cluebotng-reviewer"
        },
        "celery-flower": {
            "build_id": "no-build-needed",
            "build_status": "skipped",
            "build_long_status": "Component re-uses build from cluebotng-reviewer"
        },
        "celery-worker": {
            "build_id": "no-build-needed",
            "build_status": "skipped",
            "build_long_status": "Component re-uses build from cluebotng-reviewer"
        },
        "cleanup-user-records": {
            "build_id": "no-build-needed",
            "build_status": "skipped",
            "build_long_status": "Component re-uses build from cluebotng-reviewer"
        },
        "cluebotng-reviewer": {
            "build_id": "cluebotng-review-buildpacks-pipelinerun-vvzs7",
            "build_status": "successful",
            "build_long_status": "You can see the logs with `toolforge build logs cluebotng-review-buildpacks-pipelinerun-vvzs7`"
        },
        "export-statistics": {
            "build_id": "no-build-needed",
            "build_status": "skipped",
            "build_long_status": "Component re-uses build from cluebotng-reviewer"
        },
        "grafana-alloy": {
            "build_id": "cluebotng-review-buildpacks-pipelinerun-zfpzc",
            "build_status": "skipped",
            "build_long_status": "Reusing existing build"
        },
        "grant-review-access-from-wikipedia-rights": {
            "build_id": "no-build-needed",
            "build_status": "skipped",
            "build_long_status": "Component re-uses build from cluebotng-reviewer"
        },
        "import-training-data": {
            "build_id": "no-build-needed",
            "build_status": "skipped",
            "build_long_status": "Component re-uses build from cluebotng-reviewer"
        },
        "mark-edits-as-deleted": {
            "build_id": "no-build-needed",
            "build_status": "skipped",
            "build_long_status": "Component re-uses build from cluebotng-reviewer"
        },
        "mark-edits-as-having-data": {
            "build_id": "no-build-needed",
            "build_status": "skipped",
            "build_long_status": "Component re-uses build from cluebotng-reviewer"
        },
        "redis": {
            "build_id": "cluebotng-review-buildpacks-pipelinerun-ppv5j",
            "build_status": "skipped",
            "build_long_status": "Reusing existing build"
        },
        "update-edit-classifications": {
            "build_id": "no-build-needed",
            "build_status": "skipped",
            "build_long_status": "Component re-uses build from cluebotng-reviewer"
        }
    },
    "runs": {
        "add-dangling-edits-to-group": {
            "run_status": "successful",
            "run_long_status": "[info] (Job add-dangling-edits-to-group is already up to date)"
        },
        "add-edits-to-queue": {
            "run_status": "successful",
            "run_long_status": "[info] (Job add-edits-to-queue is already up to date)"
        },
        "add-reported-edits": {
            "run_status": "successful",
            "run_long_status": "[info] (Job add-reported-edits is already up to date)"
        },
        "add-reviews-from-huggle": {
            "run_status": "successful",
            "run_long_status": "[info] (Job add-reviews-from-huggle is already up to date)"
        },
        "add-reviews-from-report": {
            "run_status": "successful",
            "run_long_status": "[info] (Job add-reviews-from-report is already up to date)"
        },
        "celery-flower": {
            "run_status": "successful",
            "run_long_status": "[info] (Job celery-flower is already up to date)"
        },
        "celery-worker": {
            "run_status": "successful",
            "run_long_status": "[info] (Job celery-worker is already up to date)"
        },
        "cleanup-user-records": {
            "run_status": "successful",
            "run_long_status": "[info] (Job cleanup-user-records is already up to date)"
        },
        "cluebotng-reviewer": {
            "run_status": "failed",
            "run_long_status": "HTTPSConnectionPool(host='api.svc.tools.eqiad1.wikimedia.cloud', port=30003): Read timed out. (read timeout=20)"
        },
        "export-statistics": {
            "run_status": "skipped",
            "run_long_status": "Skipped due to previous failure"
        },
        "grafana-alloy": {
            "run_status": "skipped",
            "run_long_status": "Skipped due to previous failure"
        },
        "grant-review-access-from-wikipedia-rights": {
            "run_status": "skipped",
            "run_long_status": "Skipped due to previous failure"
        },
        "import-training-data": {
            "run_status": "skipped",
            "run_long_status": "Skipped due to previous failure"
        },
        "mark-edits-as-deleted": {
            "run_status": "skipped",
            "run_long_status": "Skipped due to previous failure"
        },
        "mark-edits-as-having-data": {
            "run_status": "skipped",
            "run_long_status": "Skipped due to previous failure"
        },
        "redis": {
            "run_status": "skipped",
            "run_long_status": "Skipped due to previous failure"
        },
        "update-edit-classifications": {
            "run_status": "skipped",
            "run_long_status": "Skipped due to previous failure"
        }
    },
    "tool_config": {
        "config_version": "v1beta1",
        "source_url": null,
        "components": {
            "add-dangling-edits-to-group": {
                "build": {
                    "reuse_from": "cluebotng-reviewer"
                },
                "run": {
                    "command": "./manage.py add_dangling_edits_to_group",
                    "cpu": null,
                    "emails": null,
                    "filelog": null,
                    "filelog_stderr": null,
                    "filelog_stdout": null,
                    "memory": null,
                    "mount": null,
                    "retry": null,
                    "schedule": "13 21 * * *",
                    "timeout": null
                },
                "component_type": "scheduled"
            },
            "add-edits-to-queue": {
                "build": {
                    "reuse_from": "cluebotng-reviewer"
                },
                "run": {
                    "command": "./manage.py add_edits_to_queue",
                    "cpu": null,
                    "emails": null,
                    "filelog": null,
                    "filelog_stderr": null,
                    "filelog_stdout": null,
                    "memory": null,
                    "mount": null,
                    "retry": null,
                    "schedule": "13 6 * * *",
                    "timeout": null
                },
                "component_type": "scheduled"
            },
            "add-reported-edits": {
                "build": {
                    "reuse_from": "cluebotng-reviewer"
                },
                "run": {
                    "command": "./manage.py add_reported_edits",
                    "cpu": null,
                    "emails": null,
                    "filelog": null,
                    "filelog_stderr": null,
                    "filelog_stdout": null,
                    "memory": null,
                    "mount": null,
                    "retry": null,
                    "schedule": "55 * * * *",
                    "timeout": null
                },
                "component_type": "scheduled"
            },
            "add-reviews-from-huggle": {
                "build": {
                    "reuse_from": "cluebotng-reviewer"
                },
                "run": {
                    "command": "./manage.py add_reviews_from_huggle",
                    "cpu": null,
                    "emails": null,
                    "filelog": null,
                    "filelog_stderr": null,
                    "filelog_stdout": null,
                    "memory": null,
                    "mount": null,
                    "retry": null,
                    "schedule": "23 */2 * * *",
                    "timeout": null
                },
                "component_type": "scheduled"
            },
            "add-reviews-from-report": {
                "build": {
                    "reuse_from": "cluebotng-reviewer"
                },
                "run": {
                    "command": "./manage.py add_reviews_from_report",
                    "cpu": null,
                    "emails": null,
                    "filelog": null,
                    "filelog_stderr": null,
                    "filelog_stdout": null,
                    "memory": null,
                    "mount": null,
                    "retry": null,
                    "schedule": "15 * * * *",
                    "timeout": null
                },
                "component_type": "scheduled"
            },
            "celery-flower": {
                "build": {
                    "reuse_from": "cluebotng-reviewer"
                },
                "run": {
                    "command": "run-flower",
                    "cpu": null,
                    "emails": null,
                    "filelog": null,
                    "filelog_stderr": null,
                    "filelog_stdout": null,
                    "memory": null,
                    "mount": null,
                    "port": 5555,
                    "replicas": null,
                    "health_check_script": null,
                    "health_check_http": "/healthcheck"
                },
                "component_type": "continuous"
            },
            "celery-worker": {
                "build": {
                    "reuse_from": "cluebotng-reviewer"
                },
                "run": {
                    "command": "run-celery",
                    "cpu": null,
                    "emails": null,
                    "filelog": null,
                    "filelog_stderr": null,
                    "filelog_stdout": null,
                    "memory": null,
                    "mount": null,
                    "port": null,
                    "replicas": 2,
                    "health_check_script": null,
                    "health_check_http": null
                },
                "component_type": "continuous"
            },
            "cleanup-user-records": {
                "build": {
                    "reuse_from": "cluebotng-reviewer"
                },
                "run": {
                    "command": "./manage.py cleanup_user_records",
                    "cpu": null,
                    "emails": null,
                    "filelog": null,
                    "filelog_stderr": null,
                    "filelog_stdout": null,
                    "memory": null,
                    "mount": null,
                    "retry": null,
                    "schedule": "13 1 * * *",
                    "timeout": null
                },
                "component_type": "scheduled"
            },
            "cluebotng-reviewer": {
                "build": {
                    "repository": "https://github.com/cluebotng/reviewer.git",
                    "ref": "refs/tags/v0.9.1",
                    "use_latest_versions": true
                },
                "run": {
                    "command": "web",
                    "cpu": null,
                    "emails": null,
                    "filelog": null,
                    "filelog_stderr": null,
                    "filelog_stdout": null,
                    "memory": null,
                    "mount": null,
                    "port": 8000,
                    "replicas": 2,
                    "health_check_script": null,
                    "health_check_http": "/internal/health/"
                },
                "component_type": "continuous"
            },
            "export-statistics": {
                "build": {
                    "reuse_from": "cluebotng-reviewer"
                },
                "run": {
                    "command": "./manage.py export_statistics",
                    "cpu": null,
                    "emails": null,
                    "filelog": null,
                    "filelog_stderr": null,
                    "filelog_stdout": null,
                    "memory": null,
                    "mount": null,
                    "retry": null,
                    "schedule": "13 9 * * *",
                    "timeout": null
                },
                "component_type": "scheduled"
            },
            "grafana-alloy": {
                "build": {
                    "repository": "https://github.com/cluebotng/external-grafana-alloy.git",
                    "ref": "refs/tags/v0.2.8",
                    "use_latest_versions": true
                },
                "run": {
                    "command": "run-alloy",
                    "cpu": null,
                    "emails": null,
                    "filelog": null,
                    "filelog_stderr": null,
                    "filelog_stdout": null,
                    "memory": null,
                    "mount": null,
                    "port": 8118,
                    "replicas": null,
                    "health_check_script": null,
                    "health_check_http": "/health"
                },
                "component_type": "continuous"
            },
            "grant-review-access-from-wikipedia-rights": {
                "build": {
                    "reuse_from": "cluebotng-reviewer"
                },
                "run": {
                    "command": "./manage.py grant_review_access_from_rights",
                    "cpu": null,
                    "emails": null,
                    "filelog": null,
                    "filelog_stderr": null,
                    "filelog_stdout": null,
                    "memory": null,
                    "mount": null,
                    "retry": null,
                    "schedule": "27 * * * *",
                    "timeout": null
                },
                "component_type": "scheduled"
            },
            "import-training-data": {
                "build": {
                    "reuse_from": "cluebotng-reviewer"
                },
                "run": {
                    "command": "./manage.py import_training_data",
                    "cpu": null,
                    "emails": null,
                    "filelog": null,
                    "filelog_stderr": null,
                    "filelog_stdout": null,
                    "memory": null,
                    "mount": null,
                    "retry": null,
                    "schedule": "15 2 * * *",
                    "timeout": null
                },
                "component_type": "scheduled"
            },
            "mark-edits-as-deleted": {
                "build": {
                    "reuse_from": "cluebotng-reviewer"
                },
                "run": {
                    "command": "./manage.py mark_edits_as_deleted",
                    "cpu": null,
                    "emails": null,
                    "filelog": null,
                    "filelog_stderr": null,
                    "filelog_stdout": null,
                    "memory": null,
                    "mount": null,
                    "retry": null,
                    "schedule": "13 4 * * *",
                    "timeout": null
                },
                "component_type": "scheduled"
            },
            "mark-edits-as-having-data": {
                "build": {
                    "reuse_from": "cluebotng-reviewer"
                },
                "run": {
                    "command": "./manage.py mark_edits_with_training_data",
                    "cpu": null,
                    "emails": null,
                    "filelog": null,
                    "filelog_stderr": null,
                    "filelog_stdout": null,
                    "memory": null,
                    "mount": null,
                    "retry": null,
                    "schedule": "13 3 * * *",
                    "timeout": null
                },
                "component_type": "scheduled"
            },
            "redis": {
                "build": {
                    "repository": "https://github.com/cluebotng/external-redis.git",
                    "ref": "main",
                    "use_latest_versions": true
                },
                "run": {
                    "command": "redis-server",
                    "cpu": null,
                    "emails": null,
                    "filelog": null,
                    "filelog_stderr": null,
                    "filelog_stdout": null,
                    "memory": null,
                    "mount": "all",
                    "port": 6379,
                    "replicas": null,
                    "health_check_script": null,
                    "health_check_http": null
                },
                "component_type": "continuous"
            },
            "update-edit-classifications": {
                "build": {
                    "reuse_from": "cluebotng-reviewer"
                },
                "run": {
                    "command": "./manage.py update_edit_classification",
                    "cpu": null,
                    "emails": null,
                    "filelog": null,
                    "filelog_stderr": null,
                    "filelog_stdout": null,
                    "memory": null,
                    "mount": null,
                    "retry": null,
                    "schedule": "30 */2 * * *",
                    "timeout": null
                },
                "component_type": "scheduled"
            }
        }
    },
    "status": "failed",
    "long_status": "Got exception: Failed run for component cluebotng-reviewer: HTTPSConnectionPool(host='api.svc.tools.eqiad1.wikimedia.cloud', port=30003): Read timed out. (read timeout=20)",
    "force_build": false,
    "force_run": false
}

Creating another deployment fixed things:

{
    "deploy_id": "20250916-151215-ln6nqx2jiw",
    "creation_time": "20250916-151215",
    "builds": {
        "add-dangling-edits-to-group": {
            "build_id": "no-build-needed",
            "build_status": "skipped",
            "build_long_status": "Component re-uses build from cluebotng-reviewer"
        },
        "add-edits-to-queue": {
            "build_id": "no-build-needed",
            "build_status": "skipped",
            "build_long_status": "Component re-uses build from cluebotng-reviewer"
        },
        "add-reported-edits": {
            "build_id": "no-build-needed",
            "build_status": "skipped",
            "build_long_status": "Component re-uses build from cluebotng-reviewer"
        },
        "add-reviews-from-huggle": {
            "build_id": "no-build-needed",
            "build_status": "skipped",
            "build_long_status": "Component re-uses build from cluebotng-reviewer"
        },
        "add-reviews-from-report": {
            "build_id": "no-build-needed",
            "build_status": "skipped",
            "build_long_status": "Component re-uses build from cluebotng-reviewer"
        },
        "celery-flower": {
            "build_id": "no-build-needed",
            "build_status": "skipped",
            "build_long_status": "Component re-uses build from cluebotng-reviewer"
        },
        "celery-worker": {
            "build_id": "no-build-needed",
            "build_status": "skipped",
            "build_long_status": "Component re-uses build from cluebotng-reviewer"
        },
        "cleanup-user-records": {
            "build_id": "no-build-needed",
            "build_status": "skipped",
            "build_long_status": "Component re-uses build from cluebotng-reviewer"
        },
        "cluebotng-reviewer": {
            "build_id": "cluebotng-review-buildpacks-pipelinerun-vvzs7",
            "build_status": "skipped",
            "build_long_status": "Reusing existing build"
        },
        "export-statistics": {
            "build_id": "no-build-needed",
            "build_status": "skipped",
            "build_long_status": "Component re-uses build from cluebotng-reviewer"
        },
        "grafana-alloy": {
            "build_id": "cluebotng-review-buildpacks-pipelinerun-zfpzc",
            "build_status": "skipped",
            "build_long_status": "Reusing existing build"
        },
        "grant-review-access-from-wikipedia-rights": {
            "build_id": "no-build-needed",
            "build_status": "skipped",
            "build_long_status": "Component re-uses build from cluebotng-reviewer"
        },
        "import-training-data": {
            "build_id": "no-build-needed",
            "build_status": "skipped",
            "build_long_status": "Component re-uses build from cluebotng-reviewer"
        },
        "mark-edits-as-deleted": {
            "build_id": "no-build-needed",
            "build_status": "skipped",
            "build_long_status": "Component re-uses build from cluebotng-reviewer"
        },
        "mark-edits-as-having-data": {
            "build_id": "no-build-needed",
            "build_status": "skipped",
            "build_long_status": "Component re-uses build from cluebotng-reviewer"
        },
        "redis": {
            "build_id": "cluebotng-review-buildpacks-pipelinerun-ppv5j",
            "build_status": "skipped",
            "build_long_status": "Reusing existing build"
        },
        "update-edit-classifications": {
            "build_id": "no-build-needed",
            "build_status": "skipped",
            "build_long_status": "Component re-uses build from cluebotng-reviewer"
        }
    },
    "runs": {
        "add-dangling-edits-to-group": {
            "run_status": "successful",
            "run_long_status": "[info] (Job add-dangling-edits-to-group is already up to date)"
        },
        "add-edits-to-queue": {
            "run_status": "successful",
            "run_long_status": "[info] (Job add-edits-to-queue is already up to date)"
        },
        "add-reported-edits": {
            "run_status": "successful",
            "run_long_status": "[info] (Job add-reported-edits is already up to date)"
        },
        "add-reviews-from-huggle": {
            "run_status": "successful",
            "run_long_status": "[info] (Job add-reviews-from-huggle is already up to date)"
        },
        "add-reviews-from-report": {
            "run_status": "successful",
            "run_long_status": "[info] (Job add-reviews-from-report is already up to date)"
        },
        "celery-flower": {
            "run_status": "successful",
            "run_long_status": "[info] (Job celery-flower is already up to date)"
        },
        "celery-worker": {
            "run_status": "successful",
            "run_long_status": "[info] (Job celery-worker is already up to date)"
        },
        "cleanup-user-records": {
            "run_status": "successful",
            "run_long_status": "[info] (Job cleanup-user-records is already up to date)"
        },
        "cluebotng-reviewer": {
            "run_status": "successful",
            "run_long_status": "[info] (Job cluebotng-reviewer created)"
        },
        "export-statistics": {
            "run_status": "successful",
            "run_long_status": "[info] (Job export-statistics is already up to date)"
        },
        "grafana-alloy": {
            "run_status": "successful",
            "run_long_status": "[info] (Job grafana-alloy is already up to date)"
        },
        "grant-review-access-from-wikipedia-rights": {
            "run_status": "successful",
            "run_long_status": "[info] (Job grant-review-access-from-wikipedia-rights is already up to date)"
        },
        "import-training-data": {
            "run_status": "successful",
            "run_long_status": "[info] (Job import-training-data is already up to date)"
        },
        "mark-edits-as-deleted": {
            "run_status": "successful",
            "run_long_status": "[info] (Job mark-edits-as-deleted is already up to date)"
        },
        "mark-edits-as-having-data": {
            "run_status": "successful",
            "run_long_status": "[info] (Job mark-edits-as-having-data is already up to date)"
        },
        "redis": {
            "run_status": "successful",
            "run_long_status": "[info] (Job redis is already up to date)"
        },
        "update-edit-classifications": {
            "run_status": "successful",
            "run_long_status": "[info] (Job update-edit-classifications is already up to date)"
        }
    },
    "tool_config": {
        "config_version": "v1beta1",
        "source_url": null,
        "components": {
            "add-dangling-edits-to-group": {
                "build": {
                    "reuse_from": "cluebotng-reviewer"
                },
                "run": {
                    "command": "./manage.py add_dangling_edits_to_group",
                    "cpu": null,
                    "emails": null,
                    "filelog": null,
                    "filelog_stderr": null,
                    "filelog_stdout": null,
                    "memory": null,
                    "mount": null,
                    "retry": null,
                    "schedule": "13 21 * * *",
                    "timeout": null
                },
                "component_type": "scheduled"
            },
            "add-edits-to-queue": {
                "build": {
                    "reuse_from": "cluebotng-reviewer"
                },
                "run": {
                    "command": "./manage.py add_edits_to_queue",
                    "cpu": null,
                    "emails": null,
                    "filelog": null,
                    "filelog_stderr": null,
                    "filelog_stdout": null,
                    "memory": null,
                    "mount": null,
                    "retry": null,
                    "schedule": "13 6 * * *",
                    "timeout": null
                },
                "component_type": "scheduled"
            },
            "add-reported-edits": {
                "build": {
                    "reuse_from": "cluebotng-reviewer"
                },
                "run": {
                    "command": "./manage.py add_reported_edits",
                    "cpu": null,
                    "emails": null,
                    "filelog": null,
                    "filelog_stderr": null,
                    "filelog_stdout": null,
                    "memory": null,
                    "mount": null,
                    "retry": null,
                    "schedule": "55 * * * *",
                    "timeout": null
                },
                "component_type": "scheduled"
            },
            "add-reviews-from-huggle": {
                "build": {
                    "reuse_from": "cluebotng-reviewer"
                },
                "run": {
                    "command": "./manage.py add_reviews_from_huggle",
                    "cpu": null,
                    "emails": null,
                    "filelog": null,
                    "filelog_stderr": null,
                    "filelog_stdout": null,
                    "memory": null,
                    "mount": null,
                    "retry": null,
                    "schedule": "23 */2 * * *",
                    "timeout": null
                },
                "component_type": "scheduled"
            },
            "add-reviews-from-report": {
                "build": {
                    "reuse_from": "cluebotng-reviewer"
                },
                "run": {
                    "command": "./manage.py add_reviews_from_report",
                    "cpu": null,
                    "emails": null,
                    "filelog": null,
                    "filelog_stderr": null,
                    "filelog_stdout": null,
                    "memory": null,
                    "mount": null,
                    "retry": null,
                    "schedule": "15 * * * *",
                    "timeout": null
                },
                "component_type": "scheduled"
            },
            "celery-flower": {
                "build": {
                    "reuse_from": "cluebotng-reviewer"
                },
                "run": {
                    "command": "run-flower",
                    "cpu": null,
                    "emails": null,
                    "filelog": null,
                    "filelog_stderr": null,
                    "filelog_stdout": null,
                    "memory": null,
                    "mount": null,
                    "port": 5555,
                    "replicas": null,
                    "health_check_script": null,
                    "health_check_http": "/healthcheck"
                },
                "component_type": "continuous"
            },
            "celery-worker": {
                "build": {
                    "reuse_from": "cluebotng-reviewer"
                },
                "run": {
                    "command": "run-celery",
                    "cpu": null,
                    "emails": null,
                    "filelog": null,
                    "filelog_stderr": null,
                    "filelog_stdout": null,
                    "memory": null,
                    "mount": null,
                    "port": null,
                    "replicas": 2,
                    "health_check_script": null,
                    "health_check_http": null
                },
                "component_type": "continuous"
            },
            "cleanup-user-records": {
                "build": {
                    "reuse_from": "cluebotng-reviewer"
                },
                "run": {
                    "command": "./manage.py cleanup_user_records",
                    "cpu": null,
                    "emails": null,
                    "filelog": null,
                    "filelog_stderr": null,
                    "filelog_stdout": null,
                    "memory": null,
                    "mount": null,
                    "retry": null,
                    "schedule": "13 1 * * *",
                    "timeout": null
                },
                "component_type": "scheduled"
            },
            "cluebotng-reviewer": {
                "build": {
                    "repository": "https://github.com/cluebotng/reviewer.git",
                    "ref": "refs/tags/v0.9.1",
                    "use_latest_versions": true
                },
                "run": {
                    "command": "web",
                    "cpu": null,
                    "emails": null,
                    "filelog": null,
                    "filelog_stderr": null,
                    "filelog_stdout": null,
                    "memory": null,
                    "mount": null,
                    "port": 8000,
                    "replicas": 2,
                    "health_check_script": null,
                    "health_check_http": "/internal/health/"
                },
                "component_type": "continuous"
            },
            "export-statistics": {
                "build": {
                    "reuse_from": "cluebotng-reviewer"
                },
                "run": {
                    "command": "./manage.py export_statistics",
                    "cpu": null,
                    "emails": null,
                    "filelog": null,
                    "filelog_stderr": null,
                    "filelog_stdout": null,
                    "memory": null,
                    "mount": null,
                    "retry": null,
                    "schedule": "13 9 * * *",
                    "timeout": null
                },
                "component_type": "scheduled"
            },
            "grafana-alloy": {
                "build": {
                    "repository": "https://github.com/cluebotng/external-grafana-alloy.git",
                    "ref": "refs/tags/v0.2.8",
                    "use_latest_versions": true
                },
                "run": {
                    "command": "run-alloy",
                    "cpu": null,
                    "emails": null,
                    "filelog": null,
                    "filelog_stderr": null,
                    "filelog_stdout": null,
                    "memory": null,
                    "mount": null,
                    "port": 8118,
                    "replicas": null,
                    "health_check_script": null,
                    "health_check_http": "/health"
                },
                "component_type": "continuous"
            },
            "grant-review-access-from-wikipedia-rights": {
                "build": {
                    "reuse_from": "cluebotng-reviewer"
                },
                "run": {
                    "command": "./manage.py grant_review_access_from_rights",
                    "cpu": null,
                    "emails": null,
                    "filelog": null,
                    "filelog_stderr": null,
                    "filelog_stdout": null,
                    "memory": null,
                    "mount": null,
                    "retry": null,
                    "schedule": "27 * * * *",
                    "timeout": null
                },
                "component_type": "scheduled"
            },
            "import-training-data": {
                "build": {
                    "reuse_from": "cluebotng-reviewer"
                },
                "run": {
                    "command": "./manage.py import_training_data",
                    "cpu": null,
                    "emails": null,
                    "filelog": null,
                    "filelog_stderr": null,
                    "filelog_stdout": null,
                    "memory": null,
                    "mount": null,
                    "retry": null,
                    "schedule": "15 2 * * *",
                    "timeout": null
                },
                "component_type": "scheduled"
            },
            "mark-edits-as-deleted": {
                "build": {
                    "reuse_from": "cluebotng-reviewer"
                },
                "run": {
                    "command": "./manage.py mark_edits_as_deleted",
                    "cpu": null,
                    "emails": null,
                    "filelog": null,
                    "filelog_stderr": null,
                    "filelog_stdout": null,
                    "memory": null,
                    "mount": null,
                    "retry": null,
                    "schedule": "13 4 * * *",
                    "timeout": null
                },
                "component_type": "scheduled"
            },
            "mark-edits-as-having-data": {
                "build": {
                    "reuse_from": "cluebotng-reviewer"
                },
                "run": {
                    "command": "./manage.py mark_edits_with_training_data",
                    "cpu": null,
                    "emails": null,
                    "filelog": null,
                    "filelog_stderr": null,
                    "filelog_stdout": null,
                    "memory": null,
                    "mount": null,
                    "retry": null,
                    "schedule": "13 3 * * *",
                    "timeout": null
                },
                "component_type": "scheduled"
            },
            "redis": {
                "build": {
                    "repository": "https://github.com/cluebotng/external-redis.git",
                    "ref": "main",
                    "use_latest_versions": true
                },
                "run": {
                    "command": "redis-server",
                    "cpu": null,
                    "emails": null,
                    "filelog": null,
                    "filelog_stderr": null,
                    "filelog_stdout": null,
                    "memory": null,
                    "mount": "all",
                    "port": 6379,
                    "replicas": null,
                    "health_check_script": null,
                    "health_check_http": null
                },
                "component_type": "continuous"
            },
            "update-edit-classifications": {
                "build": {
                    "reuse_from": "cluebotng-reviewer"
                },
                "run": {
                    "command": "./manage.py update_edit_classification",
                    "cpu": null,
                    "emails": null,
                    "filelog": null,
                    "filelog_stderr": null,
                    "filelog_stdout": null,
                    "memory": null,
                    "mount": null,
                    "retry": null,
                    "schedule": "30 */2 * * *",
                    "timeout": null
                },
                "component_type": "scheduled"
            }
        }
    },
    "status": "successful",
    "long_status": "Finished at 2025-09-16 15:12:41.067456",
    "force_build": false,
    "force_run": false
}

This morning an auto-triggered cluebotng-review deploy failed to start 1 job because the API gateway timed out, and the job went offline as a result.

It happens intermittently, but often enough that a second deploy is regularly needed just to converge the jobs.

Currently the runs are handled async, so it is probably not too hard to introduce a "retry" status and some simple backoff logic.
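
To make the "retry" status plus backoff idea concrete, here is a minimal sketch. It is not taken from components-api: `RunStatus`, `backoff_delay`, `run_with_retry`, `start_run` and `set_status` are all hypothetical names, the sketch is synchronous for brevity, and it retries on the built-in `TimeoutError`/`ConnectionError` (with `requests` one would presumably pass `requests.exceptions.Timeout` instead).

```python
import random
import time
from enum import Enum
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


class RunStatus(Enum):
    # "successful" and "failed" appear in the deployment output;
    # "retry" is the hypothetical new status proposed here.
    SUCCESSFUL = "successful"
    RETRY = "retry"
    FAILED = "failed"


def backoff_delay(attempt: int, base: float = 1.0, cap: float = 30.0) -> float:
    """Exponential backoff with full jitter.

    Returns a random delay in [0, min(cap, base * 2**attempt)] seconds,
    where attempt is 0 for the first retry.
    """
    return random.uniform(0.0, min(cap, base * (2 ** attempt)))


def run_with_retry(
    start_run: Callable[[], T],
    set_status: Callable[[RunStatus, str], None],
    max_attempts: int = 3,
    retry_on: Tuple[Type[BaseException], ...] = (TimeoutError, ConnectionError),
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call start_run, retrying transient failures with backoff.

    set_status is called with RETRY after each failed attempt that will be
    retried, and with SUCCESSFUL or FAILED once the outcome is final.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            result = start_run()
        except retry_on as error:
            if attempt == max_attempts:
                set_status(
                    RunStatus.FAILED,
                    f"Giving up after {attempt} attempts: {error}",
                )
                raise
            set_status(RunStatus.RETRY, f"Attempt {attempt} failed: {error}")
            sleep(backoff_delay(attempt - 1))
        else:
            set_status(RunStatus.SUCCESSFUL, f"Succeeded on attempt {attempt}")
            return result
    # Only reachable when the loop body never ran.
    raise ValueError("max_attempts must be at least 1")
```

This assumes that re-sending a run request is safe. The "already up to date" messages in the run output suggest it is, but I have not verified that for every job type. The jitter is there so that several components failing on the same gateway timeout do not all retry at the same instant.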

@dcaro as the sort of components person, do you know of a wider plan to resolve this, or should I look at an MR? I have quite a few outstanding (which require work to rebase), so I don't want to just keep adding to the pile.

https://gitlab.wikimedia.org/repos/cloud/toolforge/components-api/-/merge_requests/139 is the simplest way I can think of to handle a good chunk of these without changing components-api too much.