refactor: Go SDK usage improvements from module review#2967
Conversation
Address findings from Go Fan report on modelcontextprotocol/go-sdk:
1. Extract generic paginateAll() helper in connection.go to deduplicate
identical cursor-loop pagination across listTools, listResources,
and listPrompts (~45 lines of boilerplate removed).
2. Eliminate resourceContents intermediate type in tool_result.go by
using sdk.ResourceContents directly for JSON unmarshaling, removing
field-by-field copy in the resource content conversion.
3. Pass explicit &sdk.ServerOptions{} instead of nil in mcptest/server.go
to guard against future SDK changes that might not accept nil options.
4. Add TTL-based eviction to filteredServerCache in routed.go to prevent
unbounded memory growth. Cache entries now expire after the session
timeout (30min), evicted lazily on each getOrCreate call.
5. Add transport ownership documentation to transportConnector type
clarifying that the SDK session owns the transport after Connect().
Closes #2911
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
There was a problem hiding this comment.
Pull request overview
Refactors several Go SDK integration points to reduce duplication and improve operational safety, based on findings from the modelcontextprotocol/go-sdk module review.
Changes:
- Introduces a generic pagination helper in the MCP connection layer to remove duplicated cursor-loop logic.
- Removes a redundant local
resourceContentstype in favor of unmarshaling directly intosdk.ResourceContents. - Adds TTL-based eviction to the routed-mode
(backend, session)filtered server cache and makes server options explicit in the MCP test server.
Reviewed changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated 5 comments.
Show a summary per file
| File | Description |
|---|---|
| internal/mcp/connection.go | Extracts generic pagination helper and refactors listTools/listResources/listPrompts to use it. |
| internal/mcp/tool_result.go | Simplifies embedded resource unmarshaling by using sdk.ResourceContents directly. |
| internal/mcp/http_transport.go | Documents transport lifecycle/ownership expectations for SDK transports created by connectors. |
| internal/server/routed.go | Adds TTL tracking + lazy eviction for routed filtered server cache; aligns TTL with SessionTimeout. |
| internal/testutil/mcptest/server.go | Passes explicit &sdk.ServerOptions{} to sdk.NewServer() in tests. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| all := make([]T, len(first.Items), max(len(first.Items), 1)) | ||
| copy(all, first.Items) | ||
| logConn.Printf("list%s: received page of %d %s from serverID=%s", itemKind, len(first.Items), itemKind, serverID) | ||
|
|
||
| cursor := first.NextCursor | ||
| for cursor != "" { | ||
| result, err := c.getSDKSession().ListTools(c.ctx, &sdk.ListToolsParams{Cursor: cursor}) | ||
| page, err := fetch(cursor) | ||
| if err != nil { | ||
| return nil, err | ||
| } | ||
| allTools = append(allTools, result.Tools...) | ||
| logConn.Printf("listTools: received page of %d tools (total so far: %d) from serverID=%s", len(result.Tools), len(allTools), c.serverID) | ||
| cursor = result.NextCursor | ||
| all = append(all, page.Items...) | ||
| logConn.Printf("list%s: received page of %d %s (total so far: %d) from serverID=%s", itemKind, len(page.Items), itemKind, len(all), serverID) | ||
| cursor = page.NextCursor | ||
| } | ||
| logConn.Printf("listTools: received %d tools total from serverID=%s", len(allTools), c.serverID) | ||
| return marshalToResponse(&sdk.ListToolsResult{Tools: allTools}) | ||
| logConn.Printf("list%s: received %d %s total from serverID=%s", itemKind, len(all), itemKind, serverID) | ||
| return all, nil |
There was a problem hiding this comment.
paginateAll() changes the log prefix from the prior per-method prefixes (e.g., "listTools:"/"listResources:") to "list%s:" with a lowercase plural kind (e.g., "listtools:"). This contradicts the PR description of preserving identical logging behavior and can break log filtering/alerts keyed on the existing prefixes. Consider passing an explicit log prefix ("listTools"/"listResources"/"listPrompts") into paginateAll, or logging through a callback supplied by the caller so the emitted messages remain unchanged.
| for k, entry := range c.servers { | ||
| if now.Sub(entry.lastUsed) > c.ttl { | ||
| logRouted.Printf("[CACHE] Evicting expired server: key=%s (idle %s)", k, now.Sub(entry.lastUsed).Round(time.Second)) | ||
| delete(c.servers, k) | ||
| } | ||
| } | ||
|
|
||
| if entry, ok := c.servers[key]; ok { | ||
| entry.lastUsed = now | ||
| return entry.server | ||
| } | ||
|
|
||
| logRouted.Printf("[CACHE] Creating new filtered server: backend=%s, session=%s", backendID, sessionID) | ||
| server := creator() | ||
| c.servers[key] = &filteredServerEntry{server: server, lastUsed: now} |
There was a problem hiding this comment.
filteredServerCache logs raw session identifiers ("session=%s" and the full "key=%s" which includes sessionID). In this codebase session IDs are API keys and should be truncated before logging to avoid secret leakage (see internal/auth/header.go TruncateSessionID and existing usage in internal/server/session.go). Update these log lines to use auth.TruncateSessionID(sessionID) (and avoid embedding the full sessionID inside the logged key).
| c.mu.Lock() | ||
| defer c.mu.Unlock() | ||
|
|
||
| // Lazy eviction of expired entries | ||
| for k, entry := range c.servers { | ||
| if now.Sub(entry.lastUsed) > c.ttl { | ||
| logRouted.Printf("[CACHE] Evicting expired server: key=%s (idle %s)", k, now.Sub(entry.lastUsed).Round(time.Second)) | ||
| delete(c.servers, k) | ||
| } | ||
| } |
There was a problem hiding this comment.
getOrCreate() now scans and potentially logs/evicts every cache entry on every call (the for k, entry := range c.servers loop). Since this path runs per request, the O(n) eviction can become a CPU and lock-contention hotspot as the cache grows. Consider reducing eviction frequency (e.g., only evict when now passes a nextEvictAt, evict a bounded number of entries per call, or run periodic cleanup in a background goroutine).
| servers map[string]*sdk.Server | ||
| servers map[string]*filteredServerEntry | ||
| ttl time.Duration | ||
| mu sync.RWMutex |
There was a problem hiding this comment.
filteredServerCache declares mu sync.RWMutex but getOrCreate() always takes the write lock (Lock) and never uses RLock/RUnlock. If there’s no read-only fast path, consider switching this to sync.Mutex to better reflect usage and avoid confusion about intended concurrency behavior.
| mu sync.RWMutex | |
| mu sync.Mutex |
| // getOrCreate returns a cached server or creates a new one. | ||
| // Expired entries are lazily evicted on each call. | ||
| func (c *filteredServerCache) getOrCreate(backendID, sessionID string, creator func() *sdk.Server) *sdk.Server { | ||
| key := fmt.Sprintf("%s/%s", backendID, sessionID) | ||
| now := time.Now() | ||
|
|
||
| server, _ := syncutil.GetOrCreate(&c.mu, c.servers, key, func() (*sdk.Server, error) { | ||
| logRouted.Printf("[CACHE] Creating new filtered server: backend=%s, session=%s", backendID, sessionID) | ||
| return creator(), nil | ||
| }) | ||
| c.mu.Lock() | ||
| defer c.mu.Unlock() | ||
|
|
||
| // Lazy eviction of expired entries | ||
| for k, entry := range c.servers { | ||
| if now.Sub(entry.lastUsed) > c.ttl { | ||
| logRouted.Printf("[CACHE] Evicting expired server: key=%s (idle %s)", k, now.Sub(entry.lastUsed).Round(time.Second)) | ||
| delete(c.servers, k) | ||
| } | ||
| } | ||
|
|
||
| if entry, ok := c.servers[key]; ok { | ||
| entry.lastUsed = now | ||
| return entry.server | ||
| } | ||
|
|
||
| logRouted.Printf("[CACHE] Creating new filtered server: backend=%s, session=%s", backendID, sessionID) | ||
| server := creator() | ||
| c.servers[key] = &filteredServerEntry{server: server, lastUsed: now} | ||
| return server |
There was a problem hiding this comment.
The new TTL-based eviction behavior in filteredServerCache (lastUsed tracking + lazy eviction) isn’t covered by tests. Since internal/server has routed_test.go, it would be good to add a focused unit test for getOrCreate() verifying that entries older than ttl are evicted and recreated, while recently-used entries are retained (using a controllable clock or very short ttl).
Summary
Addresses the quick-win and best-practice findings from the Go Fan module review of
modelcontextprotocol/go-sdk(#2911).Changes
1. Extract pagination helper (
connection.go)Introduces a generic
paginateAll[T]()helper that replaces three identical cursor-loop pagination implementations inlistTools,listResources, andlistPrompts. Eliminates ~45 lines of duplicated boilerplate while preserving identical logging behavior.2. Eliminate
resourceContentsintermediate type (tool_result.go)The local
resourceContentsstruct mirroredsdk.ResourceContentsfield-for-field for JSON unmarshaling. Since the SDK type works directly for this purpose, the local type is removed andsdk.ResourceContentsis used inline. The field-by-field copy in the resource content conversion is also eliminated.3. Explicit server options (
mcptest/server.go)Passes
&sdk.ServerOptions{}instead ofniltosdk.NewServer()— more defensive and documents intent, guarding against future SDK changes.4. Cache eviction for
filteredServerCache(routed.go)The routed-mode server cache previously grew unboundedly (one entry per backend×session pair, never evicted). Now includes:
SessionTimeout(30 minutes)getOrCreate()callroutedSessionTimeoutvariable shared between cache TTL and SDK options5. Transport ownership documentation (
http_transport.go)Adds lifecycle/ownership documentation to the
transportConnectortype, clarifying that the SDK session owns the returned transport afterConnect().Testing
make agent-finishedpasses (format, build, lint, all tests)Closes #2911