-
Notifications
You must be signed in to change notification settings - Fork 198
Implement RoutingStorage (LRU + Redis fallback) #4210
Description
Description
Create pkg/transport/session/storage_routing.go implementing RoutingStorage, a two-tier session storage that checks an LRU-bounded in-memory cache first and falls back to a Redis-backed Storage on cache misses. RoutingStorage implements the session.Storage interface and is the core building block for session-sticky routing across multiple proxyrunner replicas.
Context
Part of epic #263 — Horizontal Scaling for Proxyrunner (THV-0047). When the proxyrunner runs as multiple StatefulSet replicas, each replica must be able to resolve any session to its originating backend pod regardless of which replica was used during the MCP initialize request. RoutingStorage provides this guarantee by persisting the session_id → backend_url binding to Redis (delivered by the vMCP epic, RC-6/RC-7), while keeping a bounded local LRU cache to minimize hot-path Redis round-trips.
The BackendReplicas and SessionCacheSize fields added to RunConfig in TASK-001 (#265) are the configuration source: SessionCacheSize sets the LRU capacity passed to NewRoutingStorage; the downstream wiring tasks (TASK-005 and TASK-006) inject a RoutingStorage instance into the proxy Manager constructors.
Dependencies: #265 (TASK-001 — Extend RunConfig with replica and cache-size fields)
Blocks: TASK-005 (Session-aware backend routing in proxy transports), TASK-006 (Wire LRU cache-size bound in proxy transports)
Acceptance Criteria
-
pkg/transport/session/storage_routing.gois created in thesessionpackage with package-level copyright header matching the project style (// SPDX-FileCopyrightText: Copyright 2025 Stacklok, Inc./// SPDX-License-Identifier: Apache-2.0) -
RoutingStoragestruct is unexported; onlyNewRoutingStorageand theStorageinterface methods are exported -
NewRoutingStorage(maxLocalEntries int, remote Storage) *RoutingStoragepanics (or returns an error) whenmaxLocalEntries <= 0; a caller passing zero should be handled by the caller applying the 1000 default before calling this constructor -
Storewrites the session toremotefirst, then promotes the entry into the local LRU cache; if the remote write fails the local cache is not updated and the error is returned -
Loadchecks the local LRU cache first; on a hit it returns the cached session without contactingremote; on a miss it fetches fromremote, promotes the result into the local cache, and returns it;ErrSessionNotFoundfromremotepropagates unchanged -
Deleteremoves the entry from both the local LRU cache andremote;remote.Deleteis called even if the local cache did not contain the entry; errors fromremote.Deleteare returned to the caller -
DeleteExpireddelegates entirely toremote.DeleteExpired; after the remote call completes without error, any entries in the local LRU whose session ID is also absent from the cache (i.e., cannot be found by a Load) are evicted — or alternatively the local cache is not modified (either approach is acceptable since the local cache will self-correct on the next Load) -
Closecallsremote.Close(), clears the local LRU cache, and returns any error fromremote.Close() - All five methods satisfy the
session.Storageinterface (Store,Load,Delete,DeleteExpired,Close) - The type assertion
var _ Storage = (*RoutingStorage)(nil)compiles without error (add as a compile-time check in the source file) - Unit tests in
pkg/transport/session/storage_routing_test.gocover the scenarios listed in the Testing Strategy section below - All existing tests in
pkg/transport/session/continue to pass without modification -
go vet ./pkg/transport/session/...reports no issues - The LRU implementation uses an existing dependency (e.g.
github.com/hashicorp/golang-lru/v2) already present ingo.sum; adding a new direct dependency requires team review before merging
Technical Approach
Recommended Implementation
Create a new file pkg/transport/session/storage_routing.go. The struct holds two fields: an LRU cache (keyed by string, valued as Session) and a remote Storage. The LRU cache is guarded by the package's existing concurrency conventions — use a thread-safe LRU from hashicorp/golang-lru/v2 (already present in go.sum as a transitive dependency) to avoid a manual mutex. For DeleteExpired, delegating entirely to remote is the correct behavior because Redis owns the TTL source of truth; the local cache will self-correct as sessions expire and are re-fetched on miss. The remote Storage implementation is provided by the vMCP epic (RC-6/RC-7) and injected at construction time — RoutingStorage has no direct Redis client dependency.
Patterns & Frameworks
- Follow
LocalStorageinpkg/transport/session/storage_local.goexactly for method signatures, doc comments, and error variable reuse (ErrSessionNotFoundfromerrors.go) - Use
github.com/hashicorp/golang-lru/v2for the thread-safe LRU cache; preferlru.New[string, Session](maxLocalEntries)from that package — the generic API avoids type assertions - The
remote Storagefield follows the same constructor-injection pattern used throughout ToolHive (compare withsession.NewManagerWithStorageinmanager.go:104) - Add a compile-time interface check
var _ Storage = (*RoutingStorage)(nil)at the top of the file, consistent with other type assertions in the codebase - File-level SPDX header and
package sessiondeclaration must be present; follow the existing file header instorage_local.goexactly
Code Pointers
pkg/transport/session/storage_local.go— Primary reference for implementingStorage;RoutingStoragefollows the same method signatures, doc comment style, and error handling patterns; theLoadlogic (check → miss → returnErrSessionNotFound) is the template for the two-tier lookuppkg/transport/session/storage.go— TheStorageinterface definition; all five methods must be satisfiedpkg/transport/session/errors.go—ErrSessionNotFound,ErrSessionAlreadyExists— reuse these sentinel errors rather than defining new onespkg/transport/session/manager.go:104—NewManagerWithStorageshows how a customStorageis injected;RoutingStorageis theStorageargument passed here by TASK-005pkg/transport/session/storage_test.go— Test patterns:t.Parallel(),testify/assert+testify/require, context creation, andErrSessionNotFoundassertion style to follow in the new test filepkg/runner/config.go(lines 46–213, updated by registry: Implement automatic tag bump #265) —RunConfig.SessionCacheSizeis the configuration source; TASK-006 propagates it toNewRoutingStorage
Component Interfaces
// RoutingStorage is an LRU-bounded, two-tier session storage.
// Local LRU cache is checked first; on miss, falls back to the Redis Storage.
// Evicted local entries are transparently recovered from Redis on the next Load.
type RoutingStorage struct {
local *lru.Cache[string, Session] // thread-safe LRU from hashicorp/golang-lru/v2
remote Storage
}
// NewRoutingStorage constructs a RoutingStorage.
// maxLocalEntries: LRU cap; must be > 0
// remote: Redis-backed Storage from the vMCP epic (RC-6/RC-7)
func NewRoutingStorage(maxLocalEntries int, remote Storage) *RoutingStorage
// Storage interface methods — all five must be implemented:
func (r *RoutingStorage) Store(ctx context.Context, session Session) error
func (r *RoutingStorage) Load(ctx context.Context, id string) (Session, error)
func (r *RoutingStorage) Delete(ctx context.Context, id string) error
func (r *RoutingStorage) DeleteExpired(ctx context.Context, before time.Time) error
func (r *RoutingStorage) Close() error
// Compile-time interface guard (add near the top of the file):
var _ Storage = (*RoutingStorage)(nil)Session-to-backend-URL metadata convention (shared with TASK-005):
"backend_pod" → pod name, e.g. "mcp-server-0"
"backend_url" → full pod URL, e.g. "http://mcp-server-0.mcp-server.default.svc:8080"
These keys are written into the session's metadata map via session.SetMetadata before the session is passed to Store. RoutingStorage does not parse or validate these keys — it stores and retrieves the Session object opaquely.
Testing Strategy
Create pkg/transport/session/storage_routing_test.go in package session. Use t.Parallel() on all top-level and sub-tests. Use testify/assert and testify/require following the existing style in storage_test.go. For the remote dependency, use NewLocalStorage() as a test double (it already satisfies Storage).
Unit Tests
-
StorethenLoad: store a session viaRoutingStorage; verify the session is returned byLoadwithout contacting remote (use a counter-wrappedStorageto assert zero remoteLoadcalls after a local hit) -
Loadcache miss: create aRoutingStoragewith an empty local cache but a populated remote; callLoad; verify the session is returned and subsequently served from the local cache (no second remote call) -
Loadnot found: callLoadfor an unknown session ID; assertErrSessionNotFoundis returned -
Deleteremoves from both tiers: store a session; callDelete; verifyLoadreturnsErrSessionNotFoundand the remote storage also no longer contains the session -
Deletenon-existent: callDeletefor an unknown ID; assert no error is returned (consistent withLocalStoragebehavior) -
Storeremote failure: use a stubStoragethat returns an error onStore; verify the local cache is not populated and the error is propagated - LRU eviction + recovery: construct a
RoutingStoragewithmaxLocalEntries=2; store three sessions; verify the first session is no longer in the local cache but can still be recovered viaLoad(fetched from remote) -
DeleteExpireddelegates to remote: verify that expired sessions written to the underlying remote storage are cleaned up afterDeleteExpiredis called onRoutingStorage -
Closecloses remote: assert thatremote.Close()is called; verify subsequentLoadcalls return errors consistent with closed storage
Integration Tests
- No integration tests required for this task; the Redis-backed remote
Storageis tested in the vMCP epic (RC-6/RC-7) and not available in this package
Edge Cases
-
maxLocalEntries <= 0passed toNewRoutingStorage— assert a panic or non-nil error is returned depending on the chosen constructor contract - Concurrent
StoreandLoadcalls from multiple goroutines — run under-raceto verify the thread-safe LRU prevents data races -
Loadafter remoteStore(bypass local cache): populate the remote directly and verifyRoutingStorage.Loadfetches and caches the session correctly
Out of Scope
- Redis
Storageimplementation (RC-6/RC-7) — delivered by the vMCP epic;RoutingStorageaccepts it as an injectedStorageinterface value - Wiring
RoutingStorageinto the proxy transports (TASK-005) - Propagating
SessionCacheSizetoNewRoutingStoragein the proxy constructors (TASK-006) - Load balancing / round-robin selection of backend pods on
initialize— handled in TASK-005 - Per-session backend URL assignment logic — handled in TASK-005
- Changes to
StatefulSetreplica count wiring (TASK-002) or graceful shutdown (TASK-003)
References
- Epic #263: Horizontal Scaling for Proxyrunner (THV-0047)
- TASK-001 #265: Extend RunConfig with replica and cache-size fields
- RFC THV-0047
pkg/transport/session/storage_local.go— ReferenceStorageimplementationpkg/transport/session/storage.go—Storageinterface definitionpkg/transport/session/manager.go:104—NewManagerWithStorageconstructorpkg/transport/session/errors.go— Reusable session error sentinelspkg/transport/session/storage_test.go— Test style and patterns