Skip to content

Add guest init binary, hooks, and wiring for go-microvm runtime#4358

Draft
JAORMX wants to merge 3 commits intomainfrom
feature/propolis-runtime
Draft

Add guest init binary, hooks, and wiring for go-microvm runtime#4358
JAORMX wants to merge 3 commits intomainfrom
feature/propolis-runtime

Conversation

@JAORMX
Copy link
Copy Markdown
Collaborator

@JAORMX JAORMX commented Mar 25, 2026

The go-microvm runtime skeleton was solid but couldn't boot MCP server
images because it lacked guest-side infrastructure. go-microvm uses
gvproxy networking (virtio-net), so the guest must configure its own
network via DHCP — libkrun's built-in init doesn't do this.

This adds:

  • thv-vm-init: PID 1 guest binary that boots (mounts, DHCP, SSH via
    go-microvm/guest/boot), reads /etc/thv-entrypoint.json for the original
    OCI cmd/env/workdir, execs the MCP server as a child, forwards
    signals, and halts the VM cleanly on exit
  • RootFS hooks: InjectInitBinary (embeds compiled binary),
    InjectEntrypoint (captures OCI config before init override),
    InjectEntrypointOverride (explicit command), InjectSSHKeys
  • libkrun backend with WithUserNamespaceUID(1000,1000) for virtiofs
    passthrough, WithCleanDataDir, WithLogLevel, WithImageCache
  • HTTP readiness probe (exponential backoff, accepts any response)
  • Recovered VM lifecycle fix: StopWorkload/RemoveWorkload now kill
    orphaned runner processes for VMs without handles
  • Build infrastructure: build-vm-init task with CGO_ENABLED=0 GOOS=linux,
    wired as dependency of the main build task
@JAORMX JAORMX marked this pull request as draft March 25, 2026 05:21
@github-actions github-actions bot added the size/XL Extra large PR: 1000+ lines changed label Mar 25, 2026
Copy link
Copy Markdown
Contributor

@github-actions github-actions bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Large PR Detected

This PR exceeds 1000 lines of changes and requires justification before it can be reviewed.

How to unblock this PR:

Add a section to your PR description with the following format:

## Large PR Justification

[Explain why this PR must be large, such as:]
- Generated code that cannot be split
- Large refactoring that must be atomic
- Multiple related changes that would break if separated
- Migration or data transformation

Alternative:

Consider splitting this PR into smaller, focused changes (< 1000 lines each) for easier review and reduced risk.

See our Contributing Guidelines for more details.


This review will be automatically dismissed once you add the justification section.

@jhrozek
Copy link
Copy Markdown
Contributor

jhrozek commented Mar 25, 2026

Code Review Notes

Nice work on the go-microvm runtime skeleton — the architecture is clean, the init binary design is elegant, and the test coverage for hooks/permissions/state/lifecycle is solid. A few things I noticed that are worth discussing:


1. VirtioFS mounts silently drop ReadOnly flag

buildVirtioFSMounts in permissions.go:55-71 only copies Tag and HostPath from runtime.Mount — the ReadOnly field is never propagated. This means --volume /path:/path:ro mounts become read-write inside the guest VM.

I checked upstream and VirtioFSMount in go-microvm v0.0.23 only has Tag and HostPath fields, so this can't be fixed on the ToolHive side alone. Would adding a ReadOnly field to VirtioFSMount (and plumbing it through hypervisor.FilesystemMount) be feasible upstream?

In the meantime, it might be worth either logging a warning when read-only mounts are requested, or rejecting them with a clear error so users aren't silently getting weaker isolation than they expect.

No catalog entries currently use read mounts, so this is low practical impact today — but it's a correctness gap relative to how Docker handles the same permission profile.

2. doc.go has wrong runtime name

doc.go:9 says TOOLHIVE_RUNTIME=microvm but register.go:13 defines RuntimeName = "go-microvm".

3. Egress policy: empty AllowHost + InsecureAllowAll=false = unrestricted

buildEgressPolicy in permissions.go:28-29 returns nil (no policy = unrestricted) when AllowHost is empty, even when InsecureAllowAll is false. The intent of that configuration is deny-all, but the VM gets full network access.

Removing the len(out.AllowHost) == 0 early return would fix this — but it depends on whether go-microvm treats an empty AllowedHosts slice as deny-all. Worth verifying upstream.

4. Toolhive runtime env vars don't reach the child MCP server process

This one is more significant than it looks on the surface. In main.go:58:

cmd.Env = append(os.Environ(), ep.Env...)

The Toolhive runtime vars (MCP_TRANSPORT, MCP_PORT, API keys, egress proxy settings) are written to /etc/environment by the InjectEnvFile hook. However, boot.Run loads that file but only passes the vars to the SSH server config — it does not call os.Setenv. So os.Environ() inside the init binary contains only PATH + OCI image ENV (via .krun_config.json), and ep.Env is also OCI image ENV (captured from the image config). The Toolhive runtime vars are absent from the child process entirely.

The init binary likely needs to load /etc/environment itself and layer the env correctly:

  • OCI image ENV as the base (defaults)
  • Toolhive runtime vars on top (overrides, matching Docker's convention where runtime > image)

5. SSH private key persists on disk after VM stop

GenerateKeyPair in deploy.go:117 writes a private key to vmDir. Only RemoveWorkload cleans up data dirs — StopWorkload does not. Since the comment says "we don't actively SSH into the guest," the private key could be deleted immediately after reading the public key content.

6. Minor: deploy.go:55 uses fmt.Sprintf for path construction

vmDir := fmt.Sprintf("%s/vms/%s", c.opts.dataDir, name)

Should be filepath.Join(c.opts.dataDir, "vms", name) for defense-in-depth (upstream sanitization handles it today, but filepath.Join is the idiomatic approach).

@JAORMX JAORMX force-pushed the feature/propolis-runtime branch from 4b58bff to 1f6d0c7 Compare March 30, 2026 05:54
@github-actions github-actions bot added size/XL Extra large PR: 1000+ lines changed and removed size/XL Extra large PR: 1000+ lines changed labels Mar 30, 2026
@JAORMX JAORMX force-pushed the feature/propolis-runtime branch from 1f6d0c7 to 1f1278b Compare March 31, 2026 07:21
@github-actions github-actions bot added size/XL Extra large PR: 1000+ lines changed and removed size/XL Extra large PR: 1000+ lines changed labels Mar 31, 2026
JAORMX and others added 3 commits March 31, 2026 12:52
The go-microvm runtime skeleton was solid but couldn't boot MCP server
images because it lacked guest-side infrastructure. go-microvm uses
gvproxy networking (virtio-net), so the guest must configure its own
network via DHCP — libkrun's built-in init doesn't do this.

This adds:
- thv-vm-init: PID 1 guest binary that boots (mounts, DHCP, SSH via
  go-microvm/guest/boot), reads /etc/thv-entrypoint.json for the original
  OCI cmd/env/workdir, execs the MCP server as a child, forwards
  signals, and halts the VM cleanly on exit
- RootFS hooks: InjectInitBinary (embeds compiled binary),
  InjectEntrypoint (captures OCI config before init override),
  InjectEntrypointOverride (explicit command), InjectSSHKeys
- libkrun backend with WithUserNamespaceUID(1000,1000) for virtiofs
  passthrough, WithCleanDataDir, WithLogLevel, WithImageCache
- HTTP readiness probe (exponential backoff, accepts any response)
- Recovered VM lifecycle fix: StopWorkload/RemoveWorkload now kill
  orphaned runner processes for VMs without handles
- Build infrastructure: build-vm-init task with CGO_ENABLED=0 GOOS=linux,
  wired as dependency of the main build task

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Fix runtime name in doc.go (microvm -> go-microvm)
- Warn when ReadOnly VirtioFS mounts are requested (upstream limitation)
- Warn when empty AllowHost with InsecureAllowAll=false gives unrestricted
  egress instead of deny-all (upstream limitation)
- Load /etc/environment in init binary so ToolHive runtime vars (MCP_TRANSPORT,
  API keys, etc.) reach the child MCP server process
- Keep SSH private key for debugging access, update comment
- Use filepath.Join instead of fmt.Sprintf for path construction
- Use errors.As instead of type assertion for exec.ExitError

Upstream issues filed: go-microvm#53 (ReadOnly), go-microvm#54 (deny-all)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The go-microvm library provides no direct stdin/stdout access to
running VMs. Bridge stdio over TCP using port forwarding: a relay
inside the guest VM accepts connections and pipes them to the child
MCP server's stdin/stdout, while the host dials through an additional
port forward.

Guest side (thv-vm-init):
- Detect stdio mode via MCP_TRANSPORT and THV_STDIO_RELAY_PORT env vars
- TCP relay with accept loop and first-byte probe detection to
  distinguish readiness probes from real data connections
- Signal forwarding (SIGTERM/SIGINT) to child process

Host side (gomicrovm):
- Remove stdio transport rejection from DeployWorkload
- Allocate relay port, add second port forward, inject env var
- AttachToWorkload dials TCP relay with half-close wrapper
- TCP readiness probe for post-boot verification
- Persist relay port in VM state for recovery

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@JAORMX JAORMX force-pushed the feature/propolis-runtime branch from b09a2fe to ab3231f Compare March 31, 2026 09:53
@github-actions github-actions bot added size/XL Extra large PR: 1000+ lines changed and removed size/XL Extra large PR: 1000+ lines changed labels Mar 31, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

size/XL Extra large PR: 1000+ lines changed

2 participants