fix: restore terminalbench2_env.py from patch-tool redaction corruption #3801

Merged
teknium1 merged 1 commit into main from hermes/hermes-76df6a95
Mar 29, 2026

Conversation

teknium1 (Contributor) commented Mar 29, 2026

Summary

Commit ed27b826 introduced patch-tool redaction corruption that destroyed terminalbench2_env.py — the file went from 925 lines to 516, losing the entire evaluation pipeline.

Corruption fixed

  • max_token_length=*** → max_token_length=16000,
  • api_key=os.get...EY", "") → os.getenv("OPENROUTER_API_KEY", "")
  • tokenizer_name="NousRe...1-8B" → "NousResearch/Hermes-3-Llama-3.1-8B"
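
The first two corrupted values above are no longer valid Python, so a plain syntax check would flag them, while the truncated tokenizer name is still a well-formed string literal and would pass. A minimal sketch of such a check with the standard-library `ast` module follows; the helper name `syntax_error_of` and the one-line snippets are illustrative stand-ins, not code from the repository.

```python
import ast
from typing import Optional


def syntax_error_of(source: str) -> Optional[str]:
    """Return the SyntaxError message for `source`, or None if it parses."""
    try:
        ast.parse(source)
    except SyntaxError as exc:
        return exc.msg
    return None


# Illustrative one-line stand-ins for the corrupted values listed above.
corrupted = {
    "max_token_length": "cfg = dict(max_token_length=***)",
    "api_key": 'cfg = dict(api_key=os.get...EY", ""))',
    "tokenizer_name": 'cfg = dict(tokenizer_name="NousRe...1-8B")',
}

for field, snippet in corrupted.items():
    caught = syntax_error_of(snippet) is not None
    print(f"{field}: {'syntax error' if caught else 'parses, not caught'}")
```

A syntax check alone would therefore not be enough to catch every redaction artifact; the truncated string would need a separate content check.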

Code restored (409 lines)

  • _run_tests() — test upload, execution, and verifier download
  • _eval_with_timeout() — per-task wall-clock timeout wrapper
  • evaluate() — main evaluation entry point (tqdm progress, concurrency, results aggregation)
  • wandb_log() — metric logging
  • if __name__ == "__main__" entry point
  • Rest of rollout_and_score_eval() — result assembly, error handling, finally-block cleanup
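
For readers unfamiliar with the pattern, the per-task timeout and bounded concurrency described above can be sketched as follows. This is not the restored code: the function signatures, the result dictionary, and `fake_rollout` are assumptions made for illustration. Only `asyncio.gather()` and the `eval_concurrency` field are named in this PR.

```python
import asyncio


async def eval_with_timeout(rollout, item, timeout_s):
    """Run one task, turning a wall-clock timeout into a failed result."""
    try:
        return await asyncio.wait_for(rollout(item), timeout=timeout_s)
    except asyncio.TimeoutError:
        return {"task": item, "passed": False, "error": "timeout"}


async def evaluate(items, rollout, eval_concurrency, timeout_s):
    """Run all tasks, with at most `eval_concurrency` in flight at once."""
    limit = asyncio.Semaphore(eval_concurrency)

    async def bounded(item):
        # The timeout clock starts only after a concurrency slot is acquired.
        async with limit:
            return await eval_with_timeout(rollout, item, timeout_s)

    # gather() returns results in the same order as `items`.
    results = await asyncio.gather(*(bounded(item) for item in items))
    passed = sum(1 for r in results if r["passed"])
    pass_rate = passed / len(results) if results else 0.0
    return {"pass_rate": pass_rate, "results": results}


async def fake_rollout(item):
    # Hypothetical stand-in for rollout_and_score_eval(); "slow" exceeds the timeout.
    await asyncio.sleep(5 if item == "slow" else 0)
    return {"task": item, "passed": True, "error": None}


summary = asyncio.run(
    evaluate(["a", "slow", "b"], fake_rollout, eval_concurrency=2, timeout_s=0.05)
)
print(summary["pass_rate"])
```

Wrapping the timeout inside the semaphore means time spent queueing for a slot does not count against a task's budget, which is one reasonable reading of "per-task wall-clock timeout".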

Legitimate changes preserved

Re-applied from the corrupting commit and the one that landed after it:

  • eval_concurrency config field (ed27b826)
  • docker_image registration alongside modal_image (ed27b826)
  • ManagedServer branching for vLLM/SGLang backends (13f54596)

Closes PRs #1737 and #1740 (partial fixes from aydnOktay that spotted the syntax errors — credit to them for surfacing the issue).

Commit ed27b82 introduced patch-tool redaction corruption that:
- Replaced max_token_length=16000 with max_token_length=***
- Truncated api_key=os.getenv(...) to api_key=os.get...EY
- Truncated tokenizer_name to NousRe...1-8B
- Deleted 409 lines including _run_tests(), _eval_with_timeout(),
  evaluate(), wandb_log(), and the __main__ entry point

Restores the file from pre-corruption state (ed27b82^) and re-applies
the three legitimate changes from ed27b82 and the following commit:
- eval_concurrency config field (from ed27b82)
- docker_image registration in register_task_env_overrides (from ed27b82)
- ManagedServer branching for vLLM/SGLang backends (from 13f5459)

Closes #1737, #1740.
@github-actions

⚠️ Supply Chain Risk Detected

This PR contains patterns commonly associated with supply chain attacks. This does not mean the PR is malicious — but these patterns require careful human review before merging.

⚠️ WARNING: exec() or eval() usage

Dynamic code execution can hide malicious behavior, especially when combined with base64 or network fetches.

Matches (first 20):

230:+                self.rollout_and_score_eval(item),
253:+        Runs all tasks through rollout_and_score_eval() via asyncio.gather()
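
Both matches contain `eval(` only as the tail of the method name `rollout_and_score_eval`, not as a call to the builtin. The scanner's actual rule is not shown in this PR; the sketch below assumes a naive substring-style pattern and compares it with a hypothetical stricter one that requires the name to stand alone.

```python
import re

# Assumed patterns: the real supply-chain-audit rule is not shown in this PR.
NAIVE = re.compile(r"(exec|eval)\s*\(")
# Hypothetical stricter rule: the name must not follow a word character or a dot.
STRICT = re.compile(r"(?<![\w.])(exec|eval)\s*\(")

flagged_lines = [
    "self.rollout_and_score_eval(item),",
    "Runs all tasks through rollout_and_score_eval() via asyncio.gather()",
]
real_call = "result = eval(user_input)"

for line in flagged_lines + [real_call]:
    print(bool(NAIVE.search(line)), bool(STRICT.search(line)), line)
```

Under these assumed patterns the two flagged lines match only the naive rule, while a direct `eval(...)` call matches both.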

Automated scan triggered by supply-chain-audit. If this is a false positive, a maintainer can approve after manual review.

@teknium1 teknium1 merged commit 475205e into main Mar 29, 2026
2 of 3 checks passed
