Skip to content

Detect skill install failures and surface them as agent failure context; fix needex/skills fallback crash#42642

Merged
pelikhan merged 6 commits into
mainfrom
copilot/detect-skills-installation-issues
Jul 1, 2026
Merged

Detect skill install failures and surface them as agent failure context; fix needex/skills fallback crash#42642
pelikhan merged 6 commits into
mainfrom
copilot/detect-skills-installation-issues

Conversation

Copilot AI commented Jul 1, 2026

Copy link
Copy Markdown
Contributor

impeccable-skills-reviewer crashed in the activation step because vars.IMPECCABLE_SKILLS_REF || 'needex/skills' resolved to the non-existent needex/skills repo, and install_frontmatter_skills.cjs had no error handling — the unhandled rejection killed the step.

Root cause fix

Remove the || 'needex/skills' fallback. An unset var yields empty string → parseSkillSpecs("") returns [] → no install steps run (safe no-op).

Skill install failure detection (end-to-end)

install_frontmatter_skills.cjs — wrap each skill install in try/catch; on failure warn and append {skill, error} to /tmp/gh-aw/skill_install_failures.json (shared file within the activation job runner), then continue.

collect_skill_install_failures.cjs (new) — runs after all install steps (if: always()), reads the shared file, sets failure_count / errors step outputs.

Go compiler (compiler_activation_job_builder.go) — appends the collect step and exposes skill_install_failure_count / skill_install_errors as activation job outputs.

Conclusion job (notify_comment_conclusion_helpers.go) — passes GH_AW_SKILL_INSTALL_FAILURE_COUNT / GH_AW_SKILL_INSTALL_ERRORS to handle_agent_failure.cjs when the workflow has skills configured.

handle_agent_failure.cjs — reads the env vars, adds buildSkillInstallFailureContext() rendering skill_install_failure.md into the failure issue/comment body, adds skill_install_failures to buildFailureMatchCategories.

main_workflow_schema.json — adds skill_install_failures to the failure_categories pattern.

Copilot AI and others added 2 commits July 1, 2026 04:54
Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>
Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>
Copilot AI changed the title Add skill install failure detection and fix needex/skills fallback Jul 1, 2026
Copilot AI requested a review from pelikhan July 1, 2026 05:03
@pelikhan pelikhan marked this pull request as ready for review July 1, 2026 05:05
Copilot AI review requested due to automatic review settings July 1, 2026 05:05
@github-actions

github-actions Bot commented Jul 1, 2026

Copy link
Copy Markdown
Contributor

PR Code Quality Reviewer completed the code quality review.

@github-actions

github-actions Bot commented Jul 1, 2026

Copy link
Copy Markdown
Contributor

🧠 Matt Pocock Skills Reviewer has completed the skills-based review. ✅

@github-actions

github-actions Bot commented Jul 1, 2026

Copy link
Copy Markdown
Contributor

Design Decision Gate 🏗️ completed the design decision gate check.

No ADR enforcement needed: PR #42642 does not have the 'implementation' label and has only 21 new lines of code in business logic directories (threshold: 100).

@github-actions

github-actions Bot commented Jul 1, 2026

Copy link
Copy Markdown
Contributor

Test Quality Sentinel completed test quality analysis.

No test files were added or modified in this PR. Test Quality Sentinel skipped.

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This pull request hardens frontmatter skill installation in gh-aw workflows by preventing activation-step crashes on gh skill install failures, capturing those failures, and surfacing them in the agent-failure issue/comment context. It also removes a broken fallback (needex/skills) that could resolve to a non-existent repo and cause unhandled rejections.

Changes:

  • Wrap each gh skill install invocation to continue on failure while recording {skill, error} into a shared runner file and step summary.
  • Add an always() collector step + activation job outputs so conclusion handling can pass skill-install failure context into handle_agent_failure.cjs.
  • Extend failure category matching/schema to include skill_install_failures, and remove the needex/skills fallback from the Impeccable workflow.
Show a summary per file
File Description
pkg/workflow/notify_comment_conclusion_helpers.go Conditionally passes skill-install failure outputs into the agent-failure handler env when skills are configured.
pkg/workflow/compiler_activation_job_builder.go Adds an always() collector step and exposes skill_install_failure_count / skill_install_errors as activation outputs.
pkg/parser/schemas/main_workflow_schema.json Adds skill_install_failures to the allowed failure_categories pattern.
actions/setup/md/skill_install_failure.md New failure-context template rendered into agent failure issues/comments.
actions/setup/js/install_frontmatter_skills.cjs Records per-skill install failures instead of hard-failing the activation step.
actions/setup/js/collect_skill_install_failures.cjs New script that reads the shared failures file and publishes step outputs.
actions/setup/js/handle_agent_failure.cjs Adds failure category + renders skill_install_failure.md context when installs fail.
.github/workflows/impeccable-skills-reviewer.md Removes `
.github/workflows/impeccable-skills-reviewer.lock.yml Regenerated workflow reflecting the removed fallback and new collector step/outputs.
.github/workflows/mattpocock-skills-reviewer.lock.yml Regenerated workflow reflecting the new collector step/outputs.

Review details

Tip

Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

  • Files reviewed: 10/10 changed files
  • Comments generated: 1
  • Review effort level: Low
Comment on lines +96 to +110
function appendSkillInstallFailure(skillSpec, errorMessage) {
let failures = [];
try {
if (fs.existsSync(SKILL_FAILURES_FILE)) {
const raw = fs.readFileSync(SKILL_FAILURES_FILE, "utf8");
failures = JSON.parse(raw);
}
} catch (parseErr) {
// If reading/parsing fails, start fresh and warn so the issue is visible in logs
core.warning(`Could not read skill install failures file, starting fresh: ${parseErr instanceof Error ? parseErr.message : String(parseErr)}`);
failures = [];
}
failures.push({ skill: skillSpec, error: errorMessage });
fs.writeFileSync(SKILL_FAILURES_FILE, JSON.stringify(failures, null, 2), "utf8");
}
@github-actions github-actions Bot mentioned this pull request Jul 1, 2026

@github-actions github-actions Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review: #42642 — Skill install failure detection + needex/skills fallback fix

Overall this is a well-structured change. The root cause fix (removing || 'needex/skills') is correct and the end-to-end failure-surfacing pipeline (write → collect → output → env var → conclusion job → issue/comment) is sound. A few non-blocking suggestions below.

What's good
  • Early return at Go level (len(skillRefs) == 0return nil) ensures the collect step is never injected for workflows with no skills configured — avoids needless overhead.
  • if: always() on the collect step correctly captures failures even if a prior install step hard-fails.
  • Template syntax ({skills}) matches renderTemplate's /\{(\w+)\}/g regex — no mismatch.
  • Conditional env var injection in notify_comment_conclusion_helpers.go (guarded by len(SkillReferences) > 0 || len(Skills) > 0) prevents pollution of workflows without skills.
  • skill_install_failures added to failure_categories schema pattern — correctly allows users to opt-in to issue creation for this category.

Inline suggestions (3 flagged):

  1. collect_skill_install_failures.cjs L24 — missing element-level type validation; malformed entries produce undefined:undefined output.
  2. collect_skill_install_failures.cjs L39 and install_frontmatter_skills.cjs L153 — error messages with embedded newlines corrupt the \n-delimited serialization format used to pass errors between steps.
  3. handle_agent_failure.cjs L2136 — indexOf(":") parsing is fragile if skill names ever include a colon (e.g., URL-style refs); worth using a more robust delimiter long-term.

🧵 Reviewed using Impeccable skills by Impeccable Skills Reviewer · 93.3 AIC · ⌖ 7.02 AIC · ⊞ 4.9K

if (!Array.isArray(parsed)) {
return [];
}
return parsed;

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The return parsed here passes through without element-level validation. If the JSON file contains entries with missing or non-string skill/error fields (e.g., due to a partial write or crash mid-flush), the .map(f => ${f.skill}:${f.error}) on line 39 will silently produce undefined:undefined entries in the output.

Consider adding a guard before returning:

return parsed.filter(
  (e) => e && typeof e.skill === 'string' && typeof e.error === 'string'
);

@copilot please address this.

core.info(`Skill install failures detected: ${failureCount}`);

core.setOutput("failure_count", String(failureCount));
core.setOutput("errors", failures.map(f => `${f.skill}:${f.error}`).join("\n"));

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If f.error contains embedded newlines (which can happen if exec.exec captures multi-line stderr into its Error message), the join("\\n") here will produce a multi-entry string where lines belonging to one failure spill into adjacent entries. The parser in handle_agent_failure.cjs splits on "\\n" to identify entries, so a newline inside an error message would corrupt the format.

Sanitize before serializing:

core.setOutput(
  'errors',
  failures
    .map(f => `${f.skill}:${f.error.replace(/\n/g, ' ')}`)
    .join('\n')
);

@copilot please address this.

try {
await exec.exec("gh", command.args);
} catch (err) {
const errorMessage = err instanceof Error ? err.message : String(err);

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same serialization concern as in collect_skill_install_failures.cjs: err.message from a failed exec.exec can include multi-line stderr output. Newlines in this value will corrupt the "\\n"-delimited format when the errors are later joined for the step output.

Consider normalizing at the source:

const errorMessage = (err instanceof Error ? err.message : String(err)).replace(/\n/g, ' ');

@copilot please address this.

for (const errorLine of errorLines) {
const colonIdx = errorLine.indexOf(":");
if (colonIdx >= 0) {
const skill = errorLine.slice(0, colonIdx);

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The indexOf(":") parser is subtly fragile when paired with the current serialization format. The skill name is taken as everything before the first colon, but the error portion (everything after) inherits all remaining text — including any colon-containing prefixes like Error: ....

This currently works because skill names (owner/repo) don't contain colons. However, if the upstream serialization ever changes to include a URL-style skill ref (e.g., https://github.com/org/repo), skill extraction would silently break.

A more defensive alternative would be to use a structured delimiter (e.g., a tab or a null byte) or serialize as JSON in the step output. Minor suggestion — no change required if the skill-name contract is stable.

@copilot please address this.

@github-actions github-actions Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

REQUEST_CHANGES — One blocking bug kills the core feature; two more can re-introduce the crash this PR is meant to prevent.

Blocking issues

Missing template placeholder (blocks the entire feature)

skill_install_failure_context is computed and passed to renderTemplateFromFile, but neither agent_failure_comment.md nor agent_failure_issue.md contains the {skill_install_failure_context} placeholder. renderTemplate silently drops unknown keys. Users will never see skill install failure details in the failure comment/issue.

Colon delimiter fragility

collect_skill_install_failures.cjs serialises failures as skill:error joined by newlines; the consumer in handle_agent_failure.cjs splits on the first colon. gh CLI error messages always contain colons (exit status 1: HTTP 404: Not Found), so the skill name and error text will be split at the wrong position. Use a tab or JSON serialisation instead.

Unguarded writeFileSync with missing directory prerequisite

appendSkillInstallFailure calls fs.writeFileSync with no mkdirSync guard and no try/catch. If /tmp/gh-aw/ does not exist (fresh runner), or a write error occurs, the exception propagates out of the catch block that called it and crashes the install step — exactly the regression this PR aims to avoid.

Non-blocking concern

!hasSkillInstallFailures in the isSuccessful guard will suppress success notifications for agents that completed their task correctly but happened to have a skill install failure. Skill install failures should probably be surfaced as warnings without overriding the agent's own success/failure signal.

🔎 Code quality review by PR Code Quality Reviewer · 90.7 AIC · ⌖ 7.54 AIC · ⊞ 1.6K
Comment /review to run again

credential_auth_error_context: credentialAuthErrorContext,
assignment_errors_context: assignmentErrorsContext,
assign_copilot_failure_context: assignCopilotFailureContext,
skill_install_failure_context: skillInstallFailureContext,

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

skill_install_failure_context is passed to the template renderer but neither agent_failure_comment.md nor agent_failure_issue.md contains the {skill_install_failure_context} placeholder — the failure context is silently discarded and users never see it.

💡 Fix

Both template files end with a tightly-packed chain of context placeholders on a single line. {skill_install_failure_context} must be inserted into that chain so the rendered output actually includes it.

In agent_failure_comment.md and agent_failure_issue.md, find the placeholder chain (the line that starts with {secret_verification_context}) and add {skill_install_failure_context} in the appropriate position — right after {assign_copilot_failure_context} to match the insertion order used throughout the rest of the file:

-...{assign_copilot_failure_context}{create_discussion_errors_context}...
+...{assign_copilot_failure_context}{skill_install_failure_context}{create_discussion_errors_context}...

Without this change, buildSkillInstallFailureContext() runs, populates the context object, and is then silently dropped by renderTemplate because there is no matching placeholder. The entire feature introduced by this PR — surfacing skill install failures in the failure comment/issue — does not work end-to-end.

core.info(`Skill install failures detected: ${failureCount}`);

core.setOutput("failure_count", String(failureCount));
core.setOutput("errors", failures.map(f => `${f.skill}:${f.error}`).join("\n"));

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Colon-delimited skill:error output is ambiguous — gh error messages routinely contain colons, so the parser in handle_agent_failure.cjs will split at the wrong position and produce a truncated skill name and garbled error text.

💡 Fix

The serialization here:

failures.map(f => `${f.skill}:${f.error}`).join("\n")

collides with any colon inside an error message like "exit status 1: HTTP 404: Not Found". The consumer uses indexOf(":"), so it splits at the first colon regardless.

The simplest fix is to use a tab character as the delimiter (rare in skill specs and error messages):

core.setOutput("errors", failures.map(f => `${f.skill}\t${f.error}`).join("\n"));

And in handle_agent_failure.cjs, replace indexOf(":") with indexOf("\t"):

const tabIdx = errorLine.indexOf("\t");
if (tabIdx >= 0) {
  const skill = errorLine.slice(0, tabIdx);
  const error = errorLine.slice(tabIdx + 1);
  skillList += `- \`${skill}\`: ${error}\n`;
}

Alternatively, serialise as JSON and parse it back — the data is already structured.

failures = [];
}
failures.push({ skill: skillSpec, error: errorMessage });
fs.writeFileSync(SKILL_FAILURES_FILE, JSON.stringify(failures, null, 2), "utf8");

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/tmp/gh-aw/ is never created before writeFileSync, so the very first call to appendSkillInstallFailure will throw ENOENT if the directory does not exist — which then propagates out of the catch block and crashes the install loop.

💡 Fix

Add a mkdirSync call with { recursive: true } before the write, or at the top of appendSkillInstallFailure:

function appendSkillInstallFailure(skillSpec, errorMessage) {
  let failures = [];
  try {
    if (fs.existsSync(SKILL_FAILURES_FILE)) {
      const raw = fs.readFileSync(SKILL_FAILURES_FILE, "utf8");
      const parsed = JSON.parse(raw);
      failures = Array.isArray(parsed) ? parsed : [];
    }
  } catch (parseErr) {
    core.warning(`...`);
    failures = [];
  }
  failures.push({ skill: skillSpec, error: errorMessage });
  try {
    fs.mkdirSync(require("path").dirname(SKILL_FAILURES_FILE), { recursive: true });
    fs.writeFileSync(SKILL_FAILURES_FILE, JSON.stringify(failures, null, 2), "utf8");
  } catch (writeErr) {
    core.warning(`Could not write skill install failures file: ${writeErr instanceof Error ? writeErr.message : String(writeErr)}`);
  }
}

Note that the existing review comment already points out the missing Array.isArray guard on parsed and the unguarded write — this is the directory-existence gap that makes both failure paths worse: even if writeFileSync is later wrapped in try/catch, without the mkdirSync the first write in a fresh runner will always throw.

!isTimedOut &&
!hasAssignmentErrors &&
!hasAssignCopilotFailures &&
!hasSkillInstallFailures &&

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A skill install failure marks an otherwise-successful agent run as not successful, suppressing success notifications even when the agent completed its task correctly.

💡 Rationale

!hasSkillInstallFailures is ANDed into the isSuccessful condition. The issue: skill install failures are activation-time decorators — they affect what context the agent had available, not whether the agent succeeded at its assigned task. If a skill fails to install but the agent completes normally, the workflow should still be marked successful (with a warning surfaced via skill_install_failure_context).

Consider removing !hasSkillInstallFailures from the isSuccessful guard and surfacing skill install failures purely as warning context in the comment/issue body. If a skill failure should always be treated as a hard failure, that intent should be explicit in a comment here.

@github-actions github-actions Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Skills-Based Review 🧠

Applied /tdd and /codebase-design — requesting changes primarily on test coverage gaps and one latent encoding bug in the error serialization protocol.

📋 Key Themes & Highlights

Key Themes

  • Missing tests for new failure paths: The three files that form the failure-detection pipeline (install_frontmatter_skills.cjs, collect_skill_install_failures.cjs, handle_agent_failure.cjs) each introduce new behaviour with no corresponding test coverage. A regression in any of them would fail silently.
  • Fragile error serialization: The skill:error + newline-join protocol between collect_skill_install_failures.cjs and buildSkillInstallFailureContext will misformat errors that contain newlines (e.g. multi-line shell output or Node stack traces).
  • Minor condition asymmetry in Go compiler: notify_comment_conclusion_helpers.go guards the env-var injection with the raw (unfiltered) skills list while the compiler filters blanks, making the two sides of the pipeline diverge on edge inputs.

Positive Highlights

  • ✅ Root cause fix is clean and minimal — removing the || 'needex/skills' fallback is exactly the right call.
  • if: always() on the collect step is correct and necessary — good defensive thinking.
  • appendSkillInstallFailure correctly handles corrupt/missing file via try/catch and warns before resetting.
  • ✅ The failure template skill_install_failure.md covers all the common resolution paths clearly.
  • ✅ Adding skill_install_failures to both the schema pattern and buildFailureMatchCategories closes the loop end-to-end.

🧠 Reviewed using Matt Pocock's skills by Matt Pocock Skills Reviewer · 109.9 AIC · ⌖ 8.93 AIC · ⊞ 6.6K
Comment /matt to run again

core.warning(`Failed to install skill '${skillSpec}': ${errorMessage}`);
failures.push({ skill: skillSpec, error: errorMessage });
appendSkillInstallFailure(skillSpec, errorMessage);
}

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[/tdd] The new try/catch error-recording path in main() has no test coverage — an exec failure is never exercised in the test suite.

💡 Suggested test addition

Add a case to install_frontmatter_skills.test.cjs where exec.exec rejects for one skill and assert:

  1. core.warning is called with the skill name
  2. The shared failures file is written with the correct {skill, error} entry
  3. writeSkillSummary receives the non-empty failures array
it('records a failure entry when a skill install throws', async () => {
  process.env.GH_AW_SKILL_DIR = '.claude/skills';
  process.env.GH_AW_FRONTMATTER_SKILLS = 'bad/repo@abc';
  global.exec.exec = vi.fn().mockRejectedValue(new Error('exit code 1'));
  global.core.warning = vi.fn();

  await script.main();

  expect(global.core.warning).toHaveBeenCalledWith(expect.stringContaining('bad/repo@abc'));
  const written = JSON.parse(fs.readFileSync('/tmp/gh-aw/skill_install_failures.json', 'utf8'));
  expect(written).toEqual([{ skill: 'bad/repo@abc', error: expect.stringContaining('exit code 1') }]);
});

Without this, a regression in the catch branch (e.g. silently swallowing the error) would go undetected.

@copilot please address this.

core.info(`Skill install failures detected: ${failureCount}`);

core.setOutput("failure_count", String(failureCount));
core.setOutput("errors", failures.map(f => `${f.skill}:${f.error}`).join("\n"));

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[/tdd] New file with zero test coverage — and the errors output encoding is also a latent parsing bug.

The format \${f.skill}:${f.error}`joined by\nwill be mishandled bybuildSkillInstallFailureContextinhandle_agent_failure.cjs` if an error message itself contains a newline (e.g. a multi-line shell error or Node stack trace). The split-on-newline in the consumer will fragment the error into multiple lines, some of which won't have a colon, and will be rendered as bare bullet points.

💡 Suggested fixes

Option A – JSON-encode the errors output (most robust):

core.setOutput('errors', JSON.stringify(failures));

Then in handle_agent_failure.cjs:

const errorEntries = JSON.parse(skillInstallErrors || '[]');
for (const { skill, error } of errorEntries) {
  skillList += `- \`${skill}\`: ${error}\n`;
}

Option B – strip newlines from the error string before joining:

core.setOutput('errors', failures.map(f => `${f.skill}:${f.error.replace(/\n/g, ' ')}`).join('\n'));

Also add a test file collect_skill_install_failures.test.cjs that covers:

  • happy path: no failures file → failure_count=0, empty errors
  • single failure recorded → correct outputs
  • corrupt/unreadable failures file → graceful warning, no throw

@copilot please address this.

* @param {string} skillInstallErrors - Newline-separated list of "skill:error" entries
* @returns {string} Formatted context string, or empty string if no failures
*/
function buildSkillInstallFailureContext(hasSkillInstallFailures, skillInstallErrors) {

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[/tdd] buildSkillInstallFailureContext and the new skill_install_failures category in buildFailureMatchCategories are not covered in handle_agent_failure.test.cjs.

💡 Suggested test additions

Add scenarios to handle_agent_failure.test.cjs (following the existing category test pattern):

it('includes skill install failure context when GH_AW_SKILL_INSTALL_FAILURE_COUNT is set', async () => {
  process.env.GH_AW_SKILL_INSTALL_FAILURE_COUNT = '1';
  process.env.GH_AW_SKILL_INSTALL_ERRORS = 'bad/repo:Error: exit code 1';
  // ... run handler and assert issue/comment body contains skill failure text
  expect(createdIssueBody).toContain('Skill Installation Failed');
  expect(createdIssueBody).toContain('bad/repo');
});

it('excludes skill install context when failure count is 0', async () => {
  process.env.GH_AW_SKILL_INSTALL_FAILURE_COUNT = '0';
  // ... run handler and assert issue body does NOT contain skill failure text
  expect(createdIssueBody).not.toContain('Skill Installation Failed');
});

Without these, the rendering path for the new category is untested and a template format change could silently break the failure report.

@copilot please address this.

ctx.steps = append(ctx.steps, " script: |\n")
ctx.steps = append(ctx.steps, generateGitHubScriptWithRequire("collect_skill_install_failures.cjs"))

ctx.outputs["skill_install_failure_count"] = "${{ steps.collect-skill-install-failures.outputs.failure_count || '0' }}"

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[/codebase-design] The collect-skill-install-failures step and its two outputs (skill_install_failure_count, skill_install_errors) are not asserted in activation_skills_step_test.go, leaving the wiring unverified by the existing test.

💡 Suggested assertion additions

In TestBuildActivationJob_AddsFrontmatterSkillsInstallSteps, add:

assert.Contains(t, steps, "collect-skill-install-failures", "expected collect failures step")
assert.Contains(t, steps, "collect_skill_install_failures.cjs", "expected collect failures script")

// Verify job outputs are wired
outputsStr := fmt.Sprintf("%v", job.Outputs)
assert.Contains(t, outputsStr, "skill_install_failure_count")
assert.Contains(t, outputsStr, "skill_install_errors")

This ensures the output names stay in sync between the Go compiler and the downstream references in notify_comment_conclusion_helpers.go.

@copilot please address this.

envVars = append(envVars, fmt.Sprintf(" GH_AW_STALE_LOCK_FILE_FAILED: ${{ needs.%s.outputs.stale_lock_file_failed }}\n", constants.ActivationJobName))
// SkillReferences holds structured skill refs (with github-token/github-app); Skills holds raw
// skill specs from simple frontmatter. Both result in activation-job skill install steps.
if len(data.SkillReferences) > 0 || len(data.Skills) > 0 {

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[/codebase-design] The condition here uses data.Skills (raw, unfiltered) while addActivationSkillInstallSteps in the compiler filters out empty skill strings before deciding whether to emit the collect step. If data.Skills = [""] (one blank entry), this block injects GH_AW_SKILL_INSTALL_FAILURE_COUNT/GH_AW_SKILL_INSTALL_ERRORS but the activation job never sets those outputs, so they evaluate to empty string in the conclusion job.

💡 Suggested fix

Apply the same blank-filter here, or extract a shared helper:

func hasNonEmptySkills(data *WorkflowData) bool {
    if len(data.SkillReferences) > 0 {
        return true
    }
    for _, s := range data.Skills {
        if strings.TrimSpace(s) != "" {
            return true
        }
    }
    return false
}

Then use hasNonEmptySkills(data) in both places. This is low-severity in practice since the || '0' / || '' defaults handle it, but the asymmetry makes the logic harder to reason about.

@copilot please address this.

@pelikhan

pelikhan commented Jul 1, 2026

Copy link
Copy Markdown
Collaborator

@copilot run pr-finisher skill

Copilot AI and others added 2 commits July 1, 2026 05:52
Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>
Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>

Copilot AI commented Jul 1, 2026

Copy link
Copy Markdown
Contributor Author

@copilot run pr-finisher skill

Ran pr-finisher pass and pushed follow-up fixes in bd9874d (plus b8c0d8b) to harden skill install failure handling, wire skill-install context into failure templates, and add targeted tests/validation updates.

@pelikhan pelikhan merged commit 1c56034 into main Jul 1, 2026
26 of 28 checks passed
@pelikhan pelikhan deleted the copilot/detect-skills-installation-issues branch July 1, 2026 06:04
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

3 participants