fix: restore terminalbench2_env.py from patch-tool redaction corruption by teknium1 · Pull Request #3801 · NousResearch/hermes-agent

teknium1 · 2026-03-29T22:31:34Z

Summary

Commit ed27b826 introduced patch-tool redaction corruption that destroyed terminalbench2_env.py — the file went from 925 lines to 516, losing the entire evaluation pipeline.

Corruption fixed

max_token_length=*** → max_token_length=16000
api_key=os.get...EY", "") → api_key=os.getenv("OPENROUTER_API_KEY", "")
tokenizer_name="NousRe...1-8B" → tokenizer_name="NousResearch/Hermes-3-Llama-3.1-8B"
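
The first two corrupted lines are not valid Python, so a plain parse check would likely have flagged the file before merge. The truncated tokenizer name is still a well-formed string literal and would pass. A minimal sketch of such a check, assuming nothing about this repo's actual CI; `find_syntax_error` and the sample snippets are hypothetical and only modelled on the lines above:

```python
import ast
from typing import Optional


def find_syntax_error(source: str, filename: str = "<string>") -> Optional[str]:
    """Return a short description of the first syntax error, or None if the source parses."""
    try:
        ast.parse(source, filename=filename)
    except SyntaxError as exc:
        return f"{filename}:{exc.lineno}: {exc.msg}"
    return None


# Illustrative one-line snippets, not the real file contents.
redacted_value = "config = dict(max_token_length=***)\n"
restored_value = "config = dict(max_token_length=16000)\n"
truncated_string = 'config = dict(tokenizer_name="NousRe...1-8B")\n'

print(find_syntax_error(redacted_value))    # reports a syntax error
print(find_syntax_error(restored_value))    # None
print(find_syntax_error(truncated_string))  # None: parses, even though the value is wrong
```

The last case is why a parse check alone is not sufficient: catching a truncated string value needs a test that exercises the config, or a diff review.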

Code restored (409 lines)

_run_tests() — test upload, execution, and verifier download
_eval_with_timeout() — per-task wall-clock timeout wrapper
evaluate() — main evaluation entry point (tqdm progress, concurrency, results aggregation)
wandb_log() — metric logging
if __name__ == "__main__" entry point
Rest of rollout_and_score_eval() — result assembly, error handling, finally-block cleanup
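
For readers unfamiliar with the pattern, a per-task wall-clock timeout combined with bounded concurrency can be sketched as follows. This is not the restored code: `eval_with_timeout`, `evaluate_all`, `fake_task`, and the result dictionary shape are hypothetical stand-ins, and the only assumption taken from this PR is that tasks run concurrently via asyncio.gather() under a concurrency limit.

```python
import asyncio
from typing import Any, Awaitable, Callable, Dict, List

TaskFn = Callable[[str], Awaitable[Dict[str, Any]]]


async def eval_with_timeout(run_task: TaskFn, task_name: str, timeout_s: float) -> Dict[str, Any]:
    """Run one task, converting a wall-clock timeout into a failed result."""
    try:
        return await asyncio.wait_for(run_task(task_name), timeout=timeout_s)
    except asyncio.TimeoutError:
        return {"task": task_name, "passed": False, "error": "timeout"}


async def evaluate_all(
    run_task: TaskFn, task_names: List[str], timeout_s: float, concurrency: int
) -> List[Dict[str, Any]]:
    """Run all tasks with at most `concurrency` in flight at a time."""
    semaphore = asyncio.Semaphore(concurrency)

    async def bounded(name: str) -> Dict[str, Any]:
        # The timeout clock starts only once a slot is acquired.
        async with semaphore:
            return await eval_with_timeout(run_task, name, timeout_s)

    # gather() returns results in the same order as task_names.
    return await asyncio.gather(*(bounded(n) for n in task_names))


async def fake_task(name: str) -> Dict[str, Any]:
    # Stand-in for a real rollout: "slow" sleeps past the timeout.
    await asyncio.sleep(1.0 if name == "slow" else 0.01)
    return {"task": name, "passed": True, "error": None}


results = asyncio.run(
    evaluate_all(fake_task, ["a", "slow", "b"], timeout_s=0.2, concurrency=2)
)
pass_rate = sum(r["passed"] for r in results) / len(results)
print(results[1]["error"], round(pass_rate, 3))
```

Returning a failed result on timeout, rather than raising, keeps one stuck task from aborting the whole gather() and lets the aggregate metrics count it as a failure.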

Legitimate changes preserved

Re-applied on top of the restored file (source commit in parentheses):

eval_concurrency config field (ed27b826)
docker_image registration alongside modal_image (ed27b826)
ManagedServer branching for vLLM/SGLang backends (13f54596)

Closes PRs #1737 and #1740 (partial fixes from aydnOktay that spotted the syntax errors — credit to them for surfacing the issue).

Commit ed27b82 introduced patch-tool redaction corruption that:
- Replaced max_token_length=16000 with max_token_length=***
- Truncated api_key=os.getenv(...) to api_key=os.get...EY
- Truncated tokenizer_name to NousRe...1-8B
- Deleted 409 lines including _run_tests(), _eval_with_timeout(), evaluate(), wandb_log(), and the __main__ entry point

Restores the file from pre-corruption state (ed27b82^) and re-applies the legitimate changes from ed27b82 and 13f5459:
- eval_concurrency config field (from ed27b82)
- docker_image registration in register_task_env_overrides (from ed27b82)
- ManagedServer branching for vLLM/SGLang backends (from 13f5459)

Closes #1737, #1740.

github-actions · 2026-03-29T22:31:55Z

⚠️ Supply Chain Risk Detected

This PR contains patterns commonly associated with supply chain attacks. This does not mean the PR is malicious — but these patterns require careful human review before merging.

⚠️ WARNING: exec() or eval() usage

Dynamic code execution can hide malicious behavior, especially when combined with base64 or network fetches.

Matches (first 20):

230:+                self.rollout_and_score_eval(item),
253:+        Runs all tasks through rollout_and_score_eval() via asyncio.gather()

Automated scan triggered by supply-chain-audit. If this is a false positive, a maintainer can approve after manual review.
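
Both matches are calls to rollout_and_score_eval(), whose name happens to end in "eval". The scanner's actual pattern is not shown in this thread; the sketch below assumes a naive `eval(` / `exec(` pattern, which would explain the matches, and compares it with a stricter, hypothetical variant that requires the name to stand alone.

```python
import re

# Assumed naive pattern: matches "eval(" or "exec(" anywhere, including inside longer names.
NAIVE = re.compile(r"(?:exec|eval)\(")
# Stricter variant: the name must not be preceded by a word character or a dot,
# so identifiers ending in "_eval" and method calls like obj.eval() are skipped.
STRICT = re.compile(r"(?<![\w.])(?:exec|eval)\(")

lines = [
    "self.rollout_and_score_eval(item),",
    "Runs all tasks through rollout_and_score_eval() via asyncio.gather()",
    "value = eval(user_input)",  # a genuine builtin eval() call, for contrast
]

naive_hits = [bool(NAIVE.search(line)) for line in lines]
strict_hits = [bool(STRICT.search(line)) for line in lines]

print(naive_hits)   # [True, True, True]
print(strict_hits)  # [False, False, True]
```

The stricter pattern trades recall for precision: it would also skip qualified calls such as builtins.eval(...), so it is a triage aid rather than a replacement for manual review.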

teknium1 merged commit 475205e into main Mar 29, 2026
2 of 3 checks passed

teknium1 merged 1 commit into main from hermes/hermes-76df6a95
