perf: parallelize version checks, tune SQLite/HTTP, fix flaky IPC test#16
perf: parallelize version checks, tune SQLite/HTTP, fix flaky IPC test#16kevinelliott merged 5 commits intomainfrom
Conversation
New agents: OpenClaw, Junie CLI, Mistral Vibe, Letta Code, Deep Agents CLI, Trae Agent, Kiro CLI, ForgeCode, Hermes Agent, Roo Code CLI, Tabnine CLI, Cortex Code, FetchCoder Bump catalog version to 1.0.25 Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Voice/conversational AI agent management CLI by ElevenLabs. Bump catalog version to 1.0.26 Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Forge CLI: Multi-agent TDD orchestrator by Red Green Labs - nono: Kernel-enforced agent sandbox by Always Further - mcp2cli: MCP/OpenAPI/GraphQL to CLI bridge Bump catalog version to 1.0.27 Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Research + fix pass across the hot paths: Performance - Parallelize GetLatestVersion loops in CLI list/update, TUI, systray via new internal/versionfetch helper (errgroup + sem of 8); 10-20x on typical N-agent workloads - BinaryStrategy.Detect: deterministic match phase + parallel version extraction (concurrency 4) - exec.LookPath memoized via sync.Map keyed on PATH+name (all platforms) - brew info per-agent coalesced via sync.Map + sync.Once (5m TTL) - pip PyPI fallback uses shared http.Client instead of curl subprocess - catalog.Refresh coalesced via singleflight; honors stored ETag with If-None-Match / 304 - SQLite: _busy_timeout=5000, _synchronous=NORMAL, SetMaxOpenConns(1); migrations guarded by PRAGMA user_version - REST handlers share getAgentsWithCache helper honoring Detection.CacheDuration (+ ?refresh=true) - REST: ReadHeaderTimeout + MaxHeaderBytes; real /status with version and last-refresh timestamps - gRPC: keepalive, 16MiB msg caps, panic-recovery unary+stream interceptors Oddities - Fix IPC listenForNotifications not exiting on ctx cancel (SetReadDeadline poll loop); TestListenForNotificationsContextCanceled now 25/25 under -race - Spinner: go-isatty TTY check; honor NO_COLOR and TERM=dumb - Unify --no-color via output.NoColor() helper - Detector: channel buffer sized by applicable strategies - catalog: log cache-save failures via slog instead of silent drop - systray: replace shutdown time.Sleep(100ms) with ctx+WaitGroup handshake; checkUpdates actually fetches latest versions when AutoCheck=true - Config.Catalog.RefreshOnStart marked deprecated (dead flag) Tests - New: migration version guard, REST cache helper, gRPC recovery, platform lookpath cache, singleflight coalescing, 304 ETag, brew/pip caching, binary parallelism ordering, detector channel sizing, versionfetch helper Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…1019 - errcheck: use comma-ok form on type assertions in platform lookpath cache, catalog singleflight result, brew latest-version cache - errcheck: annotate the intentional _ = g.Wait() in versionfetch - gofmt: reformat struct tag alignment in brew.go and test case comments in versionfetch_test.go - goimports: group golang.org/x/sync/singleflight as third-party (with -local github.com/kevinelliott/agentmanager) - misspell: cancelled -> canceled, behaviour -> behavior, Honour -> Honor - SA1019: suppress deprecated RefreshOnStart read in TUI display with //nolint:staticcheck (retained for backward-compat display) Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
There was a problem hiding this comment.
Pull request overview
This PR targets hot-path performance and reliability improvements across agent detection, version checks, catalog refresh, and API surfaces, while fixing a flaky IPC notification test and hardening output behavior in non-TTY environments.
Changes:
- Parallelized/optimized agent version and binary detection workflows; added memoization and TTL caching for repeated lookups.
- Improved catalog/REST/gRPC robustness (ETag + singleflight refresh, server hardening options, status/version reporting).
- SQLite startup tuning + migration guard using
PRAGMA user_version, plus new/updated tests and changelog entries.
Reviewed changes
Copilot reviewed 34 out of 35 changed files in this pull request and generated 6 comments.
Show a summary per file
| File | Description |
|---|---|
| pkg/storage/sqlite_test.go | Adds migration guard tests around PRAGMA user_version. |
| pkg/storage/sqlite.go | Tunes SQLite DSN/pool and adds user_version-guarded migration fast path. |
| pkg/platform/windows.go | Switches to cached LookPath helper. |
| pkg/platform/platform.go | Introduces global exec.LookPath memoization keyed by PATH. |
| pkg/platform/linux.go | Switches to cached LookPath helper. |
| pkg/platform/darwin.go | Switches to cached LookPath helper. |
| pkg/platform/cache_test.go | Adds tests for cachedLookPath correctness, invalidation, concurrency safety. |
| pkg/ipc/ipc.go | Adds read-deadline polling to make notification listener responsive to ctx cancellation. |
| pkg/installer/providers/pip.go | Replaces curl subprocess with shared http.Client + JSON decode for PyPI. |
| pkg/installer/providers/brew.go | Adds process-wide TTL cache and per-key coalescing for brew info latest-version lookups. |
| pkg/detector/strategies/strategies_test.go | Adds test asserting BinaryStrategy parallelism preserves deterministic order. |
| pkg/detector/strategies/binary.go | Two-phase binary detect: sequential match + bounded-parallel version extraction. |
| pkg/detector/detector_test.go | Adds regression test around applicable-strategy channel sizing. |
| pkg/detector/detector.go | Prefilters applicable strategies and sizes channels accordingly. |
| pkg/config/config.go | Marks RefreshOnStart deprecated with clarifying comments. |
| pkg/catalog/manager_test.go | Adds tests for singleflight coalescing, ETag/304 behavior, cache-save error logging. |
| pkg/catalog/manager.go | Adds singleflight refresh, ETag If-None-Match handling, warning log on cache save failures. |
| pkg/api/rest/server_test.go | Updates /status behavior expectations and adds detection-cache helper tests. |
| pkg/api/rest/server.go | Adds server options (version/start time), cache-aware agent detection, HTTP server hardening, richer /status. |
| pkg/api/grpc/server_test.go | Adds tests for panic recovery interceptors and version option wiring. |
| pkg/api/grpc/server.go | Adds keepalive, message size limits, panic recovery interceptors, and version/start-time options. |
| internal/versionfetch/versionfetch_test.go | Adds unit tests for concurrent version fetching helper (ordering, errors, concurrency cap, cancellation). |
| internal/versionfetch/versionfetch.go | New helper to concurrently fill LatestVersion with bounded parallelism. |
| internal/tui/app.go | Switches update checks to parallel version fetch helper. |
| internal/systray/systray.go | Uses parallel version fetch helper; replaces sleep-based shutdown with ctx/WG coordination; wires REST version/start time. |
| internal/cli/upgrade.go | Uses unified output.NoColor when constructing printer. |
| internal/cli/output/spinner.go | Disables spinner output on non-TTY; honors NO_COLOR and TERM=dumb; avoids ANSI in pipes. |
| internal/cli/output/colors.go | Adds shared NoColor helper and Printer.NoColor() accessor. |
| internal/cli/doctor.go | Uses unified output.NoColor when constructing printer. |
| internal/cli/catalog.go | Uses unified no-color derivation for printer/spinner. |
| internal/cli/agent.go | Uses unified no-color; replaces sequential version checks with parallel helper; prints aggregated per-agent errors. |
| go.sum | Adds checksums for new module dependencies (e.g., golang.org/x/sync). |
| go.mod | Promotes go-isatty to direct dep; adds golang.org/x/sync. |
| catalog.json | Bumps catalog version/date and adds new agents/metadata. |
| CHANGELOG.md | Documents fixes (IPC flake, spinner non-TTY, unified no-color), deprecation note, known issue. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| err := store.db.QueryRowContext(ctx, | ||
| "SELECT name FROM sqlite_master WHERE type='table' AND name='installations'", | ||
| ).Scan(&name) | ||
| if err == nil { | ||
| t.Fatal("installations table should not exist after fast-path migrate") | ||
| } |
There was a problem hiding this comment.
This assertion treats any non-nil Scan error as success. That can let the test pass if the query fails for reasons other than “no such table” (e.g., a transient DB error). It would be more robust to explicitly assert sql.ErrNoRows (and fail on any other error).
| "brew": { | ||
| "method": "brew", | ||
| "package": "openclaw", | ||
| "command": "brew install openclaw", | ||
| "update_cmd": "brew upgrade openclaw", | ||
| "uninstall_cmd": "brew uninstall openclaw", | ||
| "platforms": ["darwin"], | ||
| "metadata": { | ||
| "cask": "true" | ||
| } |
There was a problem hiding this comment.
The Homebrew install method is marked as a cask via metadata ("cask": "true"), but the command/update_cmd/uninstall_cmd don’t include --cask, and BrewProvider currently only infers cask-ness from the command (presence of "--cask"/"cask") or metadata["type"]=="cask". As written, this is likely to be treated as a formula and brew info/install may fail. Consider changing these commands to use brew ... --cask openclaw (and/or align metadata to what the provider expects).
| return path, err | ||
| } | ||
|
|
||
| // resetLookPathCache clears the memoization cache. Intended for tests. |
There was a problem hiding this comment.
resetLookPathCache() reassigns the package-level sync.Map (lookPathCache = sync.Map{}). That struct assignment is not safe if any other goroutine is concurrently calling cachedLookPath (it can race even though sync.Map operations are safe). Since this is only for tests, consider clearing the existing map via Range+Delete, or guarding reset with a mutex/atomic swap and documenting that it must only be called when no other goroutines can access the cache.
| onceI, _ := brewLatestVersionOnce.LoadOrStore(key, &sync.Once{}) | ||
| once, ok := onceI.(*sync.Once) | ||
| if !ok { | ||
| // Fallback — recompute directly rather than risk a nil-pointer panic. | ||
| return p.fetchLatestVersionUncached(ctx, packageName, isCask) | ||
| } | ||
| once.Do(func() { | ||
| version, err := p.fetchLatestVersionUncached(ctx, packageName, isCask) | ||
| brewLatestVersionCache.Store(key, brewLatestEntry{ | ||
| version: version, | ||
| err: err, | ||
| cachedAt: time.Now(), | ||
| }) |
There was a problem hiding this comment.
The per-key sync.Once coalescing caches whatever result the first caller computes, including context cancellation/deadline errors from that caller’s ctx. This means one short-deadline (or canceled) request can poison the cache for other concurrent callers (and for up to brewLatestVersionTTL). Consider treating context.Canceled / context.DeadlineExceeded as non-cacheable (don’t Store), or running the underlying fetch with a context that isn’t tied to the first caller (e.g., a bounded background context) and only using the caller ctx to wait/abort.
| v, err, _ := m.refreshGroup.Do("catalog-refresh", func() (interface{}, error) { | ||
| return m.doRefresh(ctx) | ||
| }) | ||
| if err != nil { | ||
| return nil, err | ||
| } | ||
| if v == nil { | ||
| return nil, nil | ||
| } | ||
| result, ok := v.(*RefreshResult) | ||
| if !ok { | ||
| return nil, fmt.Errorf("unexpected singleflight result type %T", v) | ||
| } | ||
| return result, nil |
There was a problem hiding this comment.
singleflight.Group.Do does not accept a context, so the first caller’s ctx is the one used by doRefresh(). If that ctx is canceled/short-deadline, it will cancel the shared HTTP fetch and all concurrent Refresh callers will fail even if their ctx is still valid. Consider running doRefresh with a context that isn’t derived from the first caller (e.g., background + internal timeout), and let individual callers respect their own ctx while waiting for the shared result (DoChan/select).
| v, err, _ := m.refreshGroup.Do("catalog-refresh", func() (interface{}, error) { | |
| return m.doRefresh(ctx) | |
| }) | |
| if err != nil { | |
| return nil, err | |
| } | |
| if v == nil { | |
| return nil, nil | |
| } | |
| result, ok := v.(*RefreshResult) | |
| if !ok { | |
| return nil, fmt.Errorf("unexpected singleflight result type %T", v) | |
| } | |
| return result, nil | |
| ch := m.refreshGroup.DoChan("catalog-refresh", func() (interface{}, error) { | |
| refreshCtx, cancel := context.WithTimeout(context.Background(), 30*time.Second) | |
| defer cancel() | |
| return m.doRefresh(refreshCtx) | |
| }) | |
| select { | |
| case <-ctx.Done(): | |
| return nil, ctx.Err() | |
| case res := <-ch: | |
| if res.Err != nil { | |
| return nil, res.Err | |
| } | |
| if res.Val == nil { | |
| return nil, nil | |
| } | |
| result, ok := res.Val.(*RefreshResult) | |
| if !ok { | |
| return nil, fmt.Errorf("unexpected singleflight result type %T", res.Val) | |
| } | |
| return result, nil | |
| } |
| // match the number of goroutines that will actually send (previously | ||
| // sized to len(strategies), which over-allocated and, on the edge, risked | ||
| // a mismatch if senders outnumbered the buffer). |
There was a problem hiding this comment.
The comment about channel sizing is misleading: when the channels were sized to len(strategies), the buffer could not be smaller than the number of sending goroutines (applicable strategies are a subset), so there wasn’t a risk of “senders outnumbering the buffer”. If the intent is just to avoid over-allocation, please adjust the wording to reflect that (and avoid implying a prior correctness bug).
| // match the number of goroutines that will actually send (previously | |
| // sized to len(strategies), which over-allocated and, on the edge, risked | |
| // a mismatch if senders outnumbered the buffer). | |
| // reflect only the strategies that may actually run and send. Sizing | |
| // these channels to len(strategies) would also be correct, but could | |
| // allocate more buffer space than necessary when some strategies are not applicable. |
Consolidates the cache-aware detection and parallel latest-version fetch sequence into a new internal/orchestrator package. The four call sites that previously open-coded this flow (agent list, agent update, the TUI loader, and the systray refresh) now delegate to orchestrator.Pipeline.DetectAndCheckVersions, which removes the drift that previously required commit bf82ca8 to patch up agent update. Line-count changes across call sites: internal/cli/agent.go: +122 / -122 (list & update simplified) internal/tui/app.go: +18 / -78 loadDataWithRefresh internal/systray/systray.go: +15 / -68 refreshAgentsWithCache Net across the three edits: 57 insertions, 228 deletions (duplicated cache-logic collapsed into the new package). The new package itself is 255 lines of code plus 601 lines of tests (97.4% coverage). Behavior preserved: - agent list: drops version-check errors silently - agent update: aggregates errors and prints them (now passes ForceRefresh so the flow matches the original, which always ran a fresh detection before choosing updates) - TUI: surfaces fatal load errors via dataLoadedMsg{err}, swallows version-check errors - Systray: keeps the agentsMu write-lock swap and sets TolerateCatalogError=true to match the original tolerant startup Guardrails honored: - No changes to catalog.Manager, detector.Detector, installer.Manager, or storage.Store public types - internal/versionfetch is still the engine; Pipeline wraps it - pkg/api/rest and pkg/api/grpc keep their own cache helper (PR #16) Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Promote the Unreleased section to 1.1.0 and capture everything landed since v1.0.24: the parallel version-check perf work (#16), the google.golang.org/grpc CVE-2026-33186 security bump (#17), the golangci-lint pinning (#18), and the shared detect+version-check pipeline refactor (#19). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Summary
Team audit + fix pass across the hot paths. 5 research agents identified issues; 4 implementation agents fixed them across 34 files (+1877 / -208). All existing tests continue to pass; ~15 new tests added.
Performance wins
GetLatestVersionloopserrgroup+ semaphore(8) in newinternal/versionfetchhelper; used by CLI list/update, TUI, systrayBinaryStrategy.Detectbrew infoper-agentsync.Map+sync.Oncecoalescing, 5-min TTLcurlsubprocess with sharedhttp.Clientexec.LookPathsync.Mapcache keyed on PATH+name (darwin/linux/windows)catalog.Refreshsingleflightcoalescing; ETagIf-None-Match/304; warn-log on cache write_busy_timeout=5000,_synchronous=NORMAL, pool=1;PRAGMA user_versionmigration guardgetAgentsWithCachehonorscfg.Detection.CacheDuration+?refresh=true;ReadHeaderTimeout/MaxHeaderBytes; real/statuswith version + last-refresh timestampsOddities fixed
TestListenForNotificationsContextCanceled—conn.Receive()blocked past ctx cancel. Fix:SetReadDeadline(500ms)poll loop. 25/25 pass under-race.go-isattycheck;NO_COLOR/TERM=dumbhonored.--no-colorunified viaoutput.NoColor()+Printer.NoColor().log/slog.time.Sleep(100ms)→ ctx + WaitGroup handshake.checkUpdatesplaceholder → real parallel version fetch whenAutoCheck=true.RefreshOnStartflag marked deprecated.cover.out/coverage.outremoved from working tree (already gitignored).Tests added
Migration version guard, REST cache helper, gRPC recovery, platform lookpath cache, singleflight coalescing, 304 ETag, brew/pip caching, binary parallelism ordering, detector channel sizing,
versionfetchhelper.Test plan
make check(fmt, vet, lint, test) greengo test ./... -race -short— all packages PASS locallygo test ./pkg/ipc/... -race -count=25 -run TestListenForNotificationsContextCanceled— 25/25 PASSagentmgr agent list— visually faster on a machine with multiple installed agentsagentmgr agent update --all— faster version-check phase; error aggregation unchangedagentmgr tui— first paint unchanged in feel; refresh visibly fastercheckUpdatesnotifications correctGET /api/v1/agentswith cache hit ≪ with cache miss;?refresh=truebypassesGET /api/v1/statusreturns realversion+ non-zerolast_catalog_refreshafter a refresh--no-colorandNO_COLOR=1and non-TTY stdout all suppress ANSI correctlyNotes
ld: warning: ignoring duplicate libraries: '-lobjc'remains (cosmetic; auto-linked by bothgetlantern/systrayandprogrium/darwinkit).🤖 Generated with Claude Code