Skip to content

Simplify VMCP Configuration #3125

@jerm-dro

Description

@jerm-dro

Summary

VirtualMCPServer configuration is defined at least thrice: once in the CRD spec (cmd/thv-operator/api/v1alpha1/), again in the server config (pkg/vmcp/config/), and once more in pkg/vmcp/config/yaml_loader.go. Adding or modifying any configurable feature requires touching multiple layers, with bugs at any step going undetected until production.

Motivation

Developer Velocity

Every new configurable feature must traverse the entire configuration pipeline: CRD type → Converter → Server Config → Execution. This multiplies the effort required for even simple additions and creates friction for contributors.

Quality / Bug Prevention

Missing or incorrect transformations at any step result in silent configuration bugs. Real examples:

Integration tests pass because they configure execution-layer data structures directly, bypassing transformation layers entirely.

Confusing Documentation

The current architecture produces two different configuration formats—YAML tags for CLI (snake_case) vs CRD spec (camelCase). Documenting the same configuration twice with subtle differences creates user confusion and maintenance burden. See PR #3070 which fixed documentation that incorrectly mixed these conventions.

Higher Value, Lower Cost Testing

Issue #2852 proposes moving e2e tests to integration tests. This is blocked by the current architecture—integration tests that configure data structures directly miss config plumbing bugs. Simplifying the types makes this transition safe.

API Extensibility

Fewer, more powerful data structures describing vmcp configuration make it easier to eventually support mutating and querying this data via API.

Current Architecture (Problem)

Configuration flows through multiple transformations:

mcpv1alpha1.VirtualMCPServer → vmcpconfig.Config -> rawConfig during unmarshall -> vmcpconfig.Config  again  → Data Structures Used in Execution

Key problematic areas:

  1. Duplicative type definitions: VirtualMCPServerSpec (CRD) and Config (server) define the same concepts with different types
  2. Manual conversion: cmd/thv-operator/pkg/vmcpconfig/converter.go (900+ lines) manually maps CRD types to config types
  3. YAML loader duplication: pkg/vmcp/config/yaml_loader.go has its own rawConfig types that mirror the CRD types
  4. Inconsistent validation: CRD webhook validation differs from server startup validation

Proposed Solution

Make the CRD Spec embed the server config type as a RawExtension.

The approach:

  1. Define a canonical pkg/vmcp/config.Config type as the single source of truth for the CRD-to-ConfigMap path
  2. The CRD VirtualMCPServerSpec.Config field is a runtime.RawExtension containing the JSON-serialized vmcpconfig.Config
  3. Webhook validation unmarshals the JSON and applies the same validation as server startup
  4. The k8s operator continues to write its config to the same exact location, we just change the type.

Example CRD structure after change:

apiVersion: mcp.toolhive.io/v1alpha1
kind: VirtualMCPServer
metadata:
  name: my-vmcp
spec:
  # k8s config is not included in the config object.
  podTemplateSpec: ... 
  serviceType: ClusterIP
  
  # Server configuration uses the canonical config type
  config:
    groupRef:
      name: my-group
    incoming_auth:
      type: oidc
      oidc:
        issuer: https://...
    composite_tools:
      - name: my-tool
        steps: [...]

Acceptance Criteria

  • The rawConfig types in pkg/vmcp/config/yaml_loader.go are deleted
  • The duplicative CRD spec fields are consolidated into a single config field using RawExtension.
  • CRD webhook validation uses the exact same validation as server startup (pkg/vmcp/config.Validator)
  • The 900+ line converter (cmd/thv-operator/pkg/vmcpconfig/converter.go) is significantly reduced or eliminated
  • Documentation references a single configuration schema, not platform-specific variants
  • E2e tests validating config transformations can be safely converted to integration tests with no loss of coverage

Out of Scope

  • Changes to K8s-specific fields (serviceType, podTemplateSpec)
  • Changes to the vmcp server runtime behavior
  • MCPServer CRD (focus is on VirtualMCPServer)
  • Reducing duplicative data types within the server itself (e.g., config.Config → execution-layer types). This is a separate problem that should be addressed independently. PR Composite Tools: Fix plumbing for on-error-continue config #3118's bug was actually in this layer, not the CRD conversion layer.

Tasks

  • Delete the rawConfig.
  • Modify the shape of the config.Config to be k8s friendly, including modifying the yaml and json tags to use camelcase instead of snake case.
  • Pull out the unique functionality from VirtualMCPServerSpec (i.e. discovery/resolution bits).
  • Replace the remainder of the config with config.Config. Make minor kubebuilder annotations and tweaks of config.Config to satisfy generation.
  • Update docs.

Related

Metadata

Metadata

Assignees

Labels

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions