Skip to content

Performance Review #23

Description

@spboyer

Migrated from spboyer/waza#445
Last validated: March 3, 2026 — 25 still present, 2 partial, 1 fixed (#12). See comments for details.

Go Performance Audit — Dual-Model Expert Review

Audited by: Turk (Go Performance Specialist)
Models used: GPT-5.3-Codex (28 findings) + Claude Opus 4.6 (23 findings)
Codebase: 239 Go files, ~53K LOC


Summary

30 unique findings across 8 categories. 19 findings overlap between both models (highest confidence), 7 unique to Codex, 4 unique to Opus. 3 severity disagreements resolved below.


Agreement — Both Models Found These (highest confidence)

# Severity File Issue
1 🔴 P0 orchestration/runner.go:658-671 O(N²) stop-on-error scan — re-scans all outcomes each iteration
2 🔴 P0 orchestration/runner.go:1167+ Graders recreated per run × test × grader (+ config reload)
3 🔴 P0 orchestration/runner.go:1081-1140 Fixture files re-read per run (redundant I/O)
4 🟡 P1 cmd/waza/cmd_run.go:607 context.Background() — no signal cancellation, Ctrl+C broken
5 🟡 P1 graders/inline_script_grader.go:142-160 Temp script file created/deleted per Grade() call
6 🟡 P1 graders/program_grader.go:47-55 .waza.yaml loaded per grader construction
7 🟡 P1 webapi/store.go:258-299 Summaries recomputed per request (no caching)
8 �� P1 webapi/store.go:47-128 Full reload under write lock / race on first load
9 🟡 P1 cache/cache.go:110-153 Global mutex held during disk I/O
10 🟡 P1 tokens/bpe/tokenizer.go:201-332 Regex + slice churn in hot encode paths
11 🟡 P1 tokens/bpe/tokenizer.go:605 Decode buffer starts at zero capacity
12 ✅ Fixed cmd/waza/dev/links.go:336-351 HTTP response body not drained — breaks keep-alive (Fixed: body now properly closed)
13 🟡 P1 cmd/waza/dev/links.go:191 goldmark.New() recreated per file
14 🟡 P1 jsonrpc/handlers.go:260-281 h.runs map never pruned — memory leak
15 🟢 P2 execution/session_events_collector.go:54-118 Shared state mutation without synchronization
16 🟢 P2 trigger/runner.go:56-64 Goroutines dont check ctx.Done() before semaphore
17 🟢 P2 spinner/spinner.go:19-30 time.After in loop leaks timers
18 🟢 P2 checks/token_limits.go:73-79 Double string allocation on file content
19 🟢 P2 jsonrpc/transport.go:44-67 Marshal + append newline per message

Unique to GPT-5.3-Codex

# Severity File Issue
20 🟡 P1 webserver/server.go:46-50 Missing ReadTimeout, WriteTimeout, IdleTimeout
21 🟡 P1 execution/copilot.go:266-283 Temp workspaces retained until engine shutdown (disk/inode buildup)
22 🟡 P1 orchestration/runner.go:174-178 Shutdown uses possibly-cancelled context
23 🟡 P1 tokens/bpe/tokenizer.go:475-589 EncodeTrimPrefix growing progress map
24 🟢 P2 cmd/waza/cmd_run.go:994-1000 Full-buffer marshal for large results (use streaming)
25 🟢 P2 cmd/waza/dev/links.go:55-63 Default transport not tuned for batch URL checks
26 🟢 P2 jsonrpc/handlers.go:266-267 Detached context from request lifecycle

Unique to Claude Opus 4.6

# Severity File Issue
27 🟢 P2 orchestration/runner.go:533-547 Template context copies spec.Inputs map per CSV row
28 🟢 P2 cache/cache.go:64-88 Three separate json.Marshal calls for cache key
29 🟢 P2 webapi/handlers.go:119 json.Encoder.Encode() error silently ignored
30 🟢 P2 execution/copilot.go:186 Shutdown accepts ctx but ignores it — RemoveAll could hang

Severity Disagreements (Resolved)

File Issue Codex Says Opus Says Resolution
orchestration/runner.go:658 O(N²) scan 🟢 P2 🔴 P0 P0 — compounds with run count
jsonrpc/handlers.go:260 Runs map never pruned 🔴 P0 🟢 P2 P1 — real leak, but MCP server is short-lived
orchestration/runner.go:1081 Fixture re-read per run 🟢 P2 🔴 P0 P0 — multiplicative with runs × tests

🏆 Prioritized Action List

P0 — Fix Now (biggest impact)

  1. Cache graders per spec — create once, reuse across runs. Eliminates Adding Microsoft SECURITY.MD #2 and test #6 together.
  2. Cache fixture file content — read once at test start, reuse across runs. Fixes This repo is missing important files #3.
  3. Replace O(N²) scan with boolean flag — track anyFailed once. Fixes This repo is missing a LICENSE file #1.
  4. Wire signal.NotifyContext — one-line fix in cmd_run.go. Fixes chore(deps): Bump go.opentelemetry.io/otel/sdk from 1.38.0 to 1.40.0 #4.

P1 — Fix Soon

  1. Cache RunSummary in store — compute on load, invalidate on reload. Fixes Generate a task.yaml from a copilot session log #7.
  2. Reuse inline script grader temp files — materialize once per grader instance. Fixes chore(deps): Bump rollup from 4.58.0 to 4.59.0 in /site #5.
  3. RWMutex + narrow lock scope in cache — I/O outside critical section. Fixes Design discussion: do we need to run snippets in a sandbox? #9.
  4. Pre-allocate tokenizer slicesmake([]int, 0, len/4) heuristic. Fixes [E1] Decouple ExecutionResponse from Copilot SDK + Multi-Agent Engine Support #10, Bring Your Own Model — Ollama, OpenAI, Anthropic, OpenCode engines #11.
  5. Drain HTTP response bodies — ✅ Fixed. Design discussion: logging Copilot sessions #12 resolved.
  6. Add server timeouts — ReadTimeout, WriteTimeout, IdleTimeout. Fixes Epic: Consolidate configuration into .waza.yaml #20.
  7. Prune JSON-RPC runs map — delete from h.runs alongside h.cancelFuncs. Fixes feat: Map OpenAI Evals YAML format → waza graders #14.

P2 — When Convenient

  1. Reuse goldmark parser (feat: Eval & Grader Registry — design doc #13)
  2. Compact JSON for cache/subprocess (Indicate cost comparison of models #19)
  3. Remaining items (feat: Go-module-style grader/eval references #15-18, feat: Three-state token budget in waza check (ok/warning/error) #22-30)

This audit was generated by running two independent model passes and synthesizing the overlap. Full decision records are in .ai-team/decisions.md.

Metadata

Metadata

Labels

No labels
No labels

Type

No type

Fields

No fields configured for issues without a type.

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions