Skip to content

Add status conditions to MCPToolConfig CRD#4474

Open
ChrisJBurns wants to merge 4 commits intomainfrom
add-conditions-mcptoolconfig
Open

Add status conditions to MCPToolConfig CRD#4474
ChrisJBurns wants to merge 4 commits intomainfrom
add-conditions-mcptoolconfig

Conversation

@ChrisJBurns
Copy link
Copy Markdown
Collaborator

@ChrisJBurns ChrisJBurns commented Mar 31, 2026

Summary

MCPToolConfig was the only CRD in the operator without standard Kubernetes conditions in its status struct. Every other CRD (MCPServer, EmbeddingServer, MCPExternalAuthConfig) uses Conditions []metav1.Condition to report validation state and readiness. This adds the same pattern to MCPToolConfig for consistency, enabling generic tooling and operators to check .status.conditions uniformly across all CRDs.

  • Add Conditions field to MCPToolConfigStatus with standard kubebuilder list markers
  • Define ConditionToolConfigValid type and reason constants
  • Set Valid=True on successful reconciliation, Valid=False on error paths (with static messages to avoid leaking errors into conditions)
  • Regenerate CRD manifests and deepcopy
  • Add envtest integration tests for the MCPToolConfig controller

Closes #4472

Type of change

  • New feature

Test plan

  • Unit tests (task test)
  • Integration tests (task operator-test-integration) — new suite at cmd/thv-operator/test-integration/mcp-toolconfig/
  • Linting (task lint-fix)

Changes

File Change
cmd/thv-operator/api/v1alpha1/toolconfig_types.go Add Conditions field and condition constants
cmd/thv-operator/api/v1alpha1/zz_generated.deepcopy.go Regenerated deepcopy for new field
cmd/thv-operator/controllers/toolconfig_controller.go Set Valid condition on success and error paths
cmd/thv-operator/controllers/toolconfig_controller_test.go Add condition assertions and ObservedGeneration test
cmd/thv-operator/test-integration/mcp-toolconfig/suite_test.go New envtest suite setup (ToolConfig + MCPServer controllers)
cmd/thv-operator/test-integration/mcp-toolconfig/mcptoolconfig_controller_integration_test.go 10 integration tests covering conditions, hash tracking, referencing servers, and finalizer deletion protection
deploy/charts/operator-crds/files/crds/toolhive.stacklok.dev_mcptoolconfigs.yaml Regenerated CRD manifest
deploy/charts/operator-crds/templates/toolhive.stacklok.dev_mcptoolconfigs.yaml Regenerated Helm-wrapped CRD

Does this introduce a user-facing change?

MCPToolConfig resources now report a Valid condition in .status.conditions, visible via kubectl get mcptoolconfig -o yaml. This is additive and does not affect existing manifests.

Special notes for reviewers

The implementation closely follows the MCPExternalAuthConfig controller pattern since both CRDs share the same status shape (ObservedGeneration, ConfigHash, ReferencingServers). Error messages in conditions are kept static to avoid leaking internal errors — the underlying errors are logged and returned for requeue.

Generated with Claude Code

MCPToolConfig was the only CRD in the operator without standard
Kubernetes conditions in its status. This adds a Conditions field
and sets a Valid condition during reconciliation, bringing it in
line with MCPServer, EmbeddingServer, and MCPExternalAuthConfig.

Closes #4472

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@github-actions github-actions bot added the size/S Small PR: 100-299 lines changed label Mar 31, 2026
Keep condition messages user-facing and static. The underlying error
is already logged via logger.Error and returned for requeue.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@github-actions github-actions bot added size/S Small PR: 100-299 lines changed and removed size/S Small PR: 100-299 lines changed labels Mar 31, 2026
Verify condition-setting, hash tracking, referencing server updates,
and finalizer-based deletion protection using envtest.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@github-actions github-actions bot added size/L Large PR: 600-999 lines changed and removed size/S Small PR: 100-299 lines changed size/L Large PR: 600-999 lines changed labels Mar 31, 2026
@ChrisJBurns
Copy link
Copy Markdown
Collaborator Author

@claude Please review this PR for major and critical issues only — correctness bugs, security concerns, data loss risks, or architectural problems. Do not raise stylistic, minor, or low-level issues.

After your review, please recommend whether this PR should be approved or if changes are needed.

@claude
Copy link
Copy Markdown
Contributor

claude bot commented Mar 31, 2026

Claude finished @ChrisJBurns's task in 2m 31s —— View job


PR Review: Add status conditions to MCPToolConfig CRD

  • Gather context and read changed files
  • Review types and constants
  • Review controller logic
  • Review tests
  • Final assessment

Assessment: Approve

The implementation is correct, consistent with the codebase pattern, and well-tested. I found one behavioral divergence from the reference implementation worth flagging, but it is not a blocking issue.


One behavioral divergence from MCPExternalAuthConfig

In handleConfigHashChange, the error path advances the hash before it's safe to do so.

toolconfig_controller.go lines 124–141:

// Hash is written to status BEFORE checking if findReferencingMCPServers succeeds
toolConfig.Status.ConfigHash = configHash
toolConfig.Status.ObservedGeneration = toolConfig.Generation

referencingServers, err := r.findReferencingMCPServers(ctx, toolConfig)
if err != nil {
    // ...sets Valid=False...
    r.Status().Update(ctx, toolConfig)   // <-- persists the new hash + Valid=False
    return ctrl.Result{}, err
}

The reference implementation (mcpexternalauthconfig_controller.go lines 148–156) sets the hash in memory the same way, but on a findReferencingMCPServers error it returns immediately without calling Status().Update. This means the MCPExternalAuthConfig controller retries the full handleConfigHashChange path on the next requeue.

The ToolConfig controller persists the new hash even on failure. On the next requeue the hash matches, so handleConfigHashChange is never re-entered — the controller falls through to updateReferencingServers instead. The outcome is functionally equivalent (MCPServer annotations are a best-effort trigger and servers do eventually get listed), but the MCPServer annotation updates in handleConfigHashChange lines 159–171 are permanently skipped for that config change event if the initial list call fails transiently. This is a minor correctness divergence from the reference, not a data-loss or security risk.


Everything else looks correct

  • Condition constants and field markers follow exactly the same kubebuilder patterns as the other CRDs.
  • Valid=True before Valid=False on the error paths: meta.SetStatusCondition is idempotent and the in-memory condition is overwritten before the status write, so no stale True is ever persisted on an error path.
  • Static condition messages (no internal errors in .status.conditions) is the right call and matches commit 7921097.
  • ObservedGeneration is correctly set on both success and error paths.
  • Deepcopy regeneration is correct and minimal.
  • CRD manifests match the type definition.
  • Tests cover the important scenarios: finalizer, hash, condition, ObservedGeneration tracking, referencing server list updates, and deletion protection. Integration suite correctly registers both controllers since ToolConfig watches MCPServer changes.
@codecov
Copy link
Copy Markdown

codecov bot commented Mar 31, 2026

Codecov Report

❌ Patch coverage is 40.90909% with 13 lines in your changes missing coverage. Please review.
✅ Project coverage is 69.64%. Comparing base (e69251a) to head (277cfc9).

Files with missing lines Patch % Lines
.../thv-operator/controllers/toolconfig_controller.go 40.90% 12 Missing and 1 partial ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #4474      +/-   ##
==========================================
- Coverage   69.64%   69.64%   -0.01%     
==========================================
  Files         491      491              
  Lines       50304    50324      +20     
==========================================
+ Hits        35036    35046      +10     
- Misses      12580    12590      +10     
  Partials     2688     2688              

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
Move ConfigHash and ObservedGeneration writes after the
findReferencingMCPServers call so a transient failure requeues
into handleConfigHashChange and MCPServer annotation updates
are not permanently skipped. Also regenerate CRD API docs.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@github-actions github-actions bot added size/L Large PR: 600-999 lines changed and removed size/L Large PR: 600-999 lines changed labels Mar 31, 2026
@ChrisJBurns
Copy link
Copy Markdown
Collaborator Author

@claude The behavioral divergence you flagged has been addressed in commit 277cfc9 — the hash is now only persisted after findReferencingMCPServers succeeds, matching the MCPExternalAuthConfig pattern. Please re-review and confirm whether this resolves your concern.

@claude
Copy link
Copy Markdown
Contributor

claude bot commented Mar 31, 2026

Claude finished @ChrisJBurns's task in 41s —— View job


Re-review: Verifying fix for hash persistence divergence

  • Read trigger comment and understand request
  • Read current toolconfig_controller.go to verify commit 277cfc9 fix
  • Confirm whether the concern is resolved

Concern resolved — fix is correct

Commit 277cfc9 addresses the divergence exactly as needed. The ordering in handleConfigHashChange is now:

// 1. Lookup first (lines 124–131)
referencingServers, err := r.findReferencingMCPServers(ctx, toolConfig)
if err != nil {
    // Return WITHOUT calling Status().Update() and WITHOUT writing the hash.
    // Error will requeue; next attempt re-enters handleConfigHashChange.
    return ctrl.Result{}, fmt.Errorf("failed to find referencing MCPServers: %w", err)
}

// 2. Only set hash in memory after successful lookup (lines 133–135)
toolConfig.Status.ConfigHash = configHash
toolConfig.Status.ObservedGeneration = toolConfig.Generation

// 3. Persist to API server (lines 146–149)
if err := r.Status().Update(ctx, toolConfig); err != nil { ... }

The comment at line 128–130 also documents the intent clearly:

"Don't persist the new hash on error — returning the error will requeue, and on the next attempt handleConfigHashChange will be re-entered so that MCPServer annotation updates are not permanently skipped."

This now matches the MCPExternalAuthConfig pattern exactly. On a transient findReferencingMCPServers failure, the controller correctly retries the full hash-change path including the MCPServer annotation updates.

Recommendation: Approve

No remaining concerns. The implementation is correct, the divergence from the reference pattern is resolved, and the fix is well-commented.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

size/L Large PR: 600-999 lines changed

1 participant