Acceptance testing — cold start request latency when optimizing over 200 tools #3759
Closed
Labels
enhancement, kubernetes, telemetry
Description
Parent Epic
stacklok/stacklok-epics#201
Depends on: #3733
Context
The epic acceptance criteria require searching over ~150 tools with reasonable latency (less than ~5s per request on a cold start). This issue adds dedicated acceptance tests that validate this requirement with a margin, testing over 200 tools to ensure headroom.
A "cold start" means no pre-cached embeddings exist in the SQLite store. The test measures request latency once the vMCP server is ready to accept requests — the optimizer must generate embeddings for tools on-demand and respond to a FindTool query within the latency budget.
Requirements
- Create an acceptance test that:
  - Deploys a vMCP server with 200+ tools (realistic tool names and descriptions) in a kind cluster
  - Waits for the vMCP server to be ready for requests
  - Measures latency of the first FindTool request (which triggers embedding generation and similarity search with no cached embeddings)
  - Asserts the request latency is under 5 seconds
- The test must run in a kind cluster with a real EmbeddingServer deployed, since it measures latency of both embedding generation and similarity search against a live service.
- The test does not need to run in CI, but should be checked into the repo for future manual benchmarking and regression detection.
High-Level Implementation
- Create a test fixture that generates 200+ realistic MCP tool definitions (varied names, descriptions, and schemas)
- Write an acceptance test (e.g., Chainsaw or Go integration test targeting the kind cluster) that waits for the vMCP server to be ready, then times a FindTool request from a cold state (no cached embeddings)
- Assert the request latency is < 5s
- Check the test into the repo so it can be run manually against a kind cluster for future benchmarking
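One possible shape for the fixture generator is sketched below. It assumes a simplified `ToolDef` type (hypothetical, not the project's actual type) and builds names by combining verbs with resources. A real fixture would likely want richer, hand-written descriptions to better exercise semantic search.

```go
package main

import (
	"fmt"
	"strings"
)

// ToolDef is a hypothetical, minimal stand-in for an MCP tool definition.
type ToolDef struct {
	Name        string
	Description string
	InputSchema map[string]any
}

// GenerateTools returns n tool definitions with unique names built from
// verb/resource pairs. Once all pairs are used, names get a _vN suffix.
func GenerateTools(n int) []ToolDef {
	if n < 0 {
		n = 0
	}
	verbs := []string{"list", "get", "create", "update", "delete", "search", "export", "archive"}
	resources := []string{"invoice", "customer", "repository", "pull_request", "calendar_event",
		"email", "ticket", "deployment", "dashboard", "user", "document", "alert", "pipeline"}
	perRound := len(verbs) * len(resources) // 104 unique pairs per round

	tools := make([]ToolDef, 0, n)
	for i := 0; i < n; i++ {
		verb := verbs[i%len(verbs)]
		resource := resources[(i/len(verbs))%len(resources)]
		name := verb + "_" + resource
		if round := i / perRound; round > 0 {
			name = fmt.Sprintf("%s_v%d", name, round+1)
		}
		tools = append(tools, ToolDef{
			Name: name,
			Description: fmt.Sprintf("Use this tool to %s %s records.",
				verb, strings.ReplaceAll(resource, "_", " ")),
			InputSchema: map[string]any{
				"type": "object",
				"properties": map[string]any{
					resource + "_id": map[string]any{"type": "string"},
				},
			},
		})
	}
	return tools
}

func main() {
	tools := GenerateTools(200)
	fmt.Println(len(tools), "tools generated; first:", tools[0].Name)
}
```

Generating the fixture deterministically keeps benchmark runs comparable over time, since every run embeds the same tool set.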
Acceptance Tests
- Request latency under 5s: With 200 tools and a running EmbeddingServer in a kind cluster, the first FindTool request after vMCP is ready completes in under 5 seconds
- Search quality at scale: FindTool returns relevant results when searching over 200 tools (not just any 200; results should be semantically appropriate)
- Test checked in: The acceptance test is checked into the repo for future manual benchmarking against a kind cluster
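The search-quality criterion could be checked with a small top-k helper like the sketch below. `QualityCase`, `HitInTopK`, and `HitRate` are hypothetical names, the `search` callback stands in for a real FindTool call, and the issue does not specify a value of k or a pass threshold.

```go
package main

import "fmt"

// QualityCase pairs a natural-language query with the tool name that a
// semantically appropriate search is expected to surface.
type QualityCase struct {
	Query    string
	Expected string
}

// HitInTopK reports whether expected appears among the first k results.
func HitInTopK(results []string, expected string, k int) bool {
	if k < 0 {
		k = 0
	}
	if k > len(results) {
		k = len(results)
	}
	for _, name := range results[:k] {
		if name == expected {
			return true
		}
	}
	return false
}

// HitRate returns the fraction of cases whose expected tool is in the
// top k results returned by search. It returns 0 for an empty case list.
func HitRate(cases []QualityCase, search func(query string) []string, k int) float64 {
	if len(cases) == 0 {
		return 0
	}
	hits := 0
	for _, c := range cases {
		if HitInTopK(search(c.Query), c.Expected, k) {
			hits++
		}
	}
	return float64(hits) / float64(len(cases))
}

func main() {
	// Stub search that always returns the same ranking.
	search := func(query string) []string {
		return []string{"create_invoice", "list_invoice", "get_customer"}
	}
	cases := []QualityCase{
		{Query: "bill a client", Expected: "create_invoice"},
		{Query: "restart the service", Expected: "update_deployment"},
	}
	fmt.Printf("top-3 hit rate: %.2f\n", HitRate(cases, search, 3))
}
```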