
Acceptance testing — cold start request latency when optimizing over 200 tools #3759

@jerm-dro

Description


Parent Epic

stacklok/stacklok-epics#201

Depends on: #3733

Context

The epic's acceptance criteria require supporting search over ~150 tools with reasonable latency (less than ~5s per request on a cold start). This issue adds dedicated acceptance tests that validate this requirement with a margin, testing over 200 tools to ensure headroom.

A "cold start" means no cached embeddings exist in the SQLite store. The test starts measuring request latency only once the vMCP server is ready to accept requests; from that point, the optimizer must generate embeddings for the tools on demand and answer a FindTool query within the latency budget.

Requirements

  1. Create an acceptance test that:
    • Deploys a vMCP server with 200+ tools (realistic tool names and descriptions) in a kind cluster
    • Waits for the vMCP server to be ready for requests
    • Measures latency of the first FindTool request (which triggers embedding generation and similarity search with no cached embeddings)
    • Asserts the request latency is under 5 seconds
  2. The test must run in a kind cluster with a real EmbeddingServer deployed, since it measures latency of both embedding generation and similarity search against a live service.
  3. The test does not need to run in CI, but should be checked into the repo for future manual benchmarking and regression detection.

High-Level Implementation

  • Create a test fixture that generates 200+ realistic MCP tool definitions (varied names, descriptions, and schemas)
  • Write an acceptance test (e.g., Chainsaw or Go integration test targeting the kind cluster) that waits for the vMCP server to be ready, then times a FindTool request from a cold state (no cached embeddings)
  • Assert the request latency is < 5s
  • Check the test into the repo so it can be run manually against a kind cluster for future benchmarking
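One way to build the 200+ tool fixture is to combine a list of services with a list of verbs, giving each verb its own description template and input schema. The Go sketch below assumes a minimal `ToolDef` shape (name, description, JSON Schema); `ToolDef`, `services`, `actions`, and `generateTools` are hypothetical, and the vocabularies are illustrative only.

```go
package main

import "fmt"

// ToolDef is a hypothetical minimal shape for an MCP tool definition: a
// name, a description, and a JSON Schema describing the input arguments.
type ToolDef struct {
	Name        string
	Description string
	InputSchema map[string]any
}

// services pairs a short ID (used in tool names) with a human-readable noun
// (used in descriptions). Illustrative list: 21 services.
var services = []struct{ id, noun string }{
	{"github", "GitHub issue"}, {"jira", "Jira ticket"}, {"slack", "Slack message"},
	{"gmail", "Gmail draft"}, {"gcal", "calendar event"}, {"drive", "Drive file"},
	{"notion", "Notion page"}, {"linear", "Linear issue"}, {"pagerduty", "PagerDuty incident"},
	{"datadog", "Datadog monitor"}, {"s3", "S3 object"}, {"postgres", "Postgres table"},
	{"stripe", "Stripe invoice"}, {"zendesk", "Zendesk ticket"}, {"confluence", "Confluence page"},
	{"sentry", "Sentry error"}, {"k8s", "Kubernetes deployment"}, {"terraform", "Terraform workspace"},
	{"okta", "Okta user"}, {"hubspot", "HubSpot contact"}, {"salesforce", "Salesforce lead"},
}

// actions gives each verb a description template (%s is the singular noun;
// "%ss" pluralizes it) and the name of its required string argument, so
// schemas differ across verbs. 10 verbs x 21 services = 210 tools.
var actions = []struct{ verb, template, param string }{
	{"list", "List %ss, optionally filtered by a search query.", "query"},
	{"get", "Fetch one %s by its ID.", "id"},
	{"create", "Create a new %s from the supplied fields.", "fields"},
	{"update", "Update fields on an existing %s.", "id"},
	{"delete", "Permanently delete one %s by its ID.", "id"},
	{"search", "Run a full-text search across %ss.", "query"},
	{"comment", "Add a comment to the given %s.", "body"},
	{"assign", "Assign the given %s to a user or team.", "assignee"},
	{"export", "Export %ss created since a given date as CSV.", "since"},
	{"watch", "Subscribe to change notifications for the given %s.", "id"},
}

// generateTools returns one ToolDef per (service, action) pair, named
// "<service>_<verb>" so every name is unique.
func generateTools() []ToolDef {
	tools := make([]ToolDef, 0, len(services)*len(actions))
	for _, s := range services {
		for _, a := range actions {
			tools = append(tools, ToolDef{
				Name:        s.id + "_" + a.verb,
				Description: fmt.Sprintf(a.template, s.noun),
				InputSchema: map[string]any{
					"type": "object",
					"properties": map[string]any{
						a.param: map[string]any{"type": "string"},
					},
					"required": []string{a.param},
				},
			})
		}
	}
	return tools
}
```

Combinatorial generation keeps the fixture compact, but its descriptions are templated. Hand-written or captured descriptions from real MCP servers would better match the "realistic tool names and descriptions" requirement.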

Acceptance Tests

  • Request latency under 5s: With 200 tools and a running EmbeddingServer in a kind cluster, the first FindTool request after vMCP is ready completes in under 5 seconds
  • Search quality at scale: FindTool returns relevant results when searching over 200 tools (results should be semantically appropriate to the query, not arbitrary matches)
  • Test checked in: The acceptance test is checked into the repo for future manual benchmarking against a kind cluster
