Improve `waza run` concurrency: shared Copilot client + auto-sized workers (#135) by Copilot · Pull Request #221 · microsoft/waza

Copilot · 2026-04-22T21:01:17Z

waza run --workers 6 on a suite of 3 tasks + 3 trigger tests only ever runs 3 concurrent sessions because tasks and triggers are executed in serial phases, each with its own worker semaphore. The issue asked for research and recommendations; this PR adds the design doc and ships Phase 1 of the plan (R2 + R3).

Adds docs/design/135-improve-concurrency.md (following the existing docs/design/ convention) capturing the diagnosis and a phased proposal, including a §8 "Critique resolutions" section that addresses the B1/B2/B3 review feedback.

Diagnosis

Phase barrier in cmd/waza/cmd_run.go: runner.RunBenchmark fully drains before trigger.Runner.RunDetailed starts. Each phase has its own semaphore sized at Workers, so Workers > items_per_phase is unused capacity.
Per-call Copilot clients in graders: internal/graders/prompt_grader.go (gradeIndependent, runPairwiseOnce) calls copilot.NewClient(...) + client.Stop() every invocation, spawning/tearing down the embedded Copilot CLI subprocess each time. The engine itself already shares one client; graders don't.
The engine instance is already shared between the task runner and trigger runner and is mutex-protected, so there is no concurrency-safety blocker for unification.

Recommendations (prioritized)

R1 — Unified worker pool spanning task + trigger jobs. Introduce a Job interface, build the pool once in cmd_run.go, have TestRunner and trigger.Runner expose Plan() []Job, then submit both sets and demux results by Job.Kind(). Fixes the reported 3/6 utilization. Follow-up PR.
R2 — Process-wide singleton Copilot SDK client (per @richardpark-msft). All per-call state already lives on the session; expose execution.SharedClient(...) behind a sync.Once, switch prompt_grader to it, and drop the per-call Stop(). Removes the dominant local-run latency. Implemented in this PR.
R3 — Auto-size --workers default to min(NumCPU, totalJobs, 8) (currently hard-coded 4) and log a one-line "capped to N (phase)" notice. Implemented in this PR. The conservative DefaultWorkerCap = 8 ceiling addresses B1 from the design review until R4 lands.
R4 (stretch) — Split --workers (job concurrency) from --sessions (Copilot session concurrency) as a second semaphore inside the pool. Defer until the single knob proves insufficient.
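
As a rough illustration of R1, the sketch below pushes task and trigger jobs through one pool and groups results by Kind(). The Job shape, simpleJob, and runPool are assumptions made for this example; the real interface is still an open design question for the follow-up PR.

```go
package main

import (
	"fmt"
	"sync"
)

// Job is a hypothetical shape for the interface R1 proposes.
type Job interface {
	Kind() string
	Run() string
}

// simpleJob is a placeholder implementation used only in this sketch.
type simpleJob struct{ kind, name string }

func (j simpleJob) Kind() string { return j.kind }
func (j simpleJob) Run() string  { return j.name + " done" }

// runPool drains every job through a single pool of workers and groups
// the results by Job.Kind(). workers must be at least 1.
func runPool(jobs []Job, workers int) map[string][]string {
	queue := make(chan Job)
	results := make(map[string][]string)
	var mu sync.Mutex
	var wg sync.WaitGroup

	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for j := range queue {
				out := j.Run()
				mu.Lock()
				results[j.Kind()] = append(results[j.Kind()], out)
				mu.Unlock()
			}
		}()
	}
	for _, j := range jobs {
		queue <- j
	}
	close(queue)
	wg.Wait()
	return results
}

func main() {
	jobs := []Job{
		simpleJob{"task", "t1"}, simpleJob{"task", "t2"}, simpleJob{"task", "t3"},
		simpleJob{"trigger", "g1"}, simpleJob{"trigger", "g2"}, simpleJob{"trigger", "g3"},
	}
	res := runPool(jobs, 6)
	fmt.Println(len(res["task"]), len(res["trigger"])) // prints "3 3"
}
```

With both job sets in one queue, all 6 workers have something to pick up, which is the utilization gap R1 is meant to close.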

Implementation in this PR

R2 — shared Copilot SDK client

New internal/execution/sdkclient.go exposing SharedClient(opts) CopilotClient and ShutdownSharedClient(ctx) error, both guarded by sync.Once.
CopilotEngine now defaults to the shared client and tracks an ownsClient flag; Shutdown only calls client.Stop() for engines that own their client (test path). The shared SDK process is stopped exactly once via a top-level defer execution.ShutdownSharedClient(...) in cmd_run.go::runCommandE, so it survives the loop that builds one engine per --model value.
CopilotEngine exposes CopilotClient() so the runner can plumb the shared client to graders.
graders.Context gains a CopilotClient execution.CopilotClient field. TestRunner.buildGraderContext populates it via an interface assertion against the engine.
prompt_grader.gradeIndependent and runPairwiseOnce use a new shared openGradingSession helper that prefers Context.CopilotClient when set (no Start/Stop overhead, session is Disconnect'd on cleanup) and falls back to the legacy per-call copilot.NewClient(...)/Stop() path otherwise so direct grader tests (prompt_grader_test.go) keep working.
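
The lifecycle described in this list can be sketched as follows. This is a simplified stand-in, not the contents of sdkclient.go: Client, Engine, and NewEngine are hypothetical types, the real functions take options and a context, and the sketch assumes that an injected client (the test path) is the one an engine owns.

```go
package main

import (
	"fmt"
	"sync"
)

// Client stands in for the SDK client; it only counts Stop calls.
type Client struct{ stops int }

func (c *Client) Stop() { c.stops++ }

var (
	sharedOnce   sync.Once
	shutdownOnce sync.Once
	shared       *Client
)

// SharedClient builds the process-wide client on first use.
func SharedClient() *Client {
	sharedOnce.Do(func() { shared = &Client{} })
	return shared
}

// ShutdownSharedClient stops the shared client at most once. It is meant to
// run after all users have finished, for example from a top-level defer.
func ShutdownSharedClient() {
	shutdownOnce.Do(func() {
		if shared != nil {
			shared.Stop()
		}
	})
}

// Engine tracks whether it owns its client.
type Engine struct {
	client     *Client
	ownsClient bool
}

// NewEngine uses the injected client if one is given (assumed test path),
// otherwise it borrows the shared client.
func NewEngine(injected *Client) *Engine {
	if injected != nil {
		return &Engine{client: injected, ownsClient: true}
	}
	return &Engine{client: SharedClient()}
}

// Shutdown stops the client only when this engine owns it.
func (e *Engine) Shutdown() {
	if e.ownsClient {
		e.client.Stop()
	}
}

func main() {
	for _, model := range []string{"model-a", "model-b"} {
		e := NewEngine(nil)
		fmt.Println(model, "uses shared client:", e.client == SharedClient())
		e.Shutdown() // does not stop the shared client
	}
	ShutdownSharedClient()
	ShutdownSharedClient() // second call is a no-op
	fmt.Println("shared client stops:", SharedClient().stops) // prints 1
}
```

The point of the ownsClient flag is that per-engine Shutdown stays safe to call in the model loop, while the one real Stop happens at process teardown.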

R3 — auto-sized workers

New internal/orchestration/workers.go with ResolveWorkers(requested, jobs, phase, out) int and DefaultWorkerCap = 8.
When --workers is unset: defaults to min(runtime.NumCPU(), jobs, 8).
When --workers exceeds the job count: clamps and emits workers=N capped to M (phase).
Wired into TestRunner.runConcurrent and trigger.Runner.RunDetailed. The CLI's "Parallel:" line now prints workers=auto when the auto-default applies.

Design doc updates

§8 "Critique resolutions" addressing B1 (cap), B2 (DI path = Context.CopilotClient), B3 (process-wide singleton lifecycle decoupled from per-engine Shutdown), N4 (use symbol names), and S3 (revised order to R3 → R2 → R1 → R4).

Tests

internal/execution/sdkclient_test.go — SharedClient returns the same instance, ShutdownSharedClient is idempotent and safe when no client was constructed.
internal/orchestration/workers_test.go — auto-sizing respects the cap, explicit values are clamped to job count and emit a notice, no notice when within budget.
Race-enabled tests pass for internal/execution, internal/orchestration, internal/trigger, internal/graders, and cmd/waza.

Open questions for reviewers

Prefer Job interface in internal/orchestration, or keep packages self-contained and have Plan() return work descriptors that the CLI hands to a thin pool? (R1 follow-up)
@richardpark-msft — confirm the SDK is safe sharing one Client across N concurrent CreateSession calls without per-call Start/Stop. The implementation now relies on this.
R4 now or later? The default cap of 8 is in place as a stopgap.

Agent-Logs-Url: https://github.com/microsoft/waza/sessions/55c73a11-8dec-401a-925f-3f2a0536fc78 Co-authored-by: spboyer <7681382+spboyer@users.noreply.github.com>

spboyer · 2026-04-28T19:40:35Z

Rubber-duck critique (Opus 4.6)

Diagnosis verification — ✅ all three claims verified against code

Claim	Verdict
Phase barrier in `cmd_run.go` (L762 → L804–808)	✅ confirmed
Per-call `copilot.NewClient` in `prompt_grader.go` (L62, L313)	✅ confirmed; pairwise calls `runPairwiseOnce` twice per invocation
Engine is shared, mutex-protected, concurrency-safe	✅ confirmed (`sessionsMu`, `usageCollectorsMu`, `workspacesMu`, `startOnce`, `shutdownOnce`)

Blocking issues

B1 — R3 without R4 is dangerous. min(NumCPU, totalJobs) on a 64-core CI runner with 100 jobs gives 64 concurrent Copilot sessions. No evidence the SDK or backend handles this gracefully. Either ship R4 alongside R3, or cap the auto-default conservatively (e.g., min(NumCPU, totalJobs, 8)) until R4 lands.

B2 — R2 design is incomplete: dependency injection for graders is unspecified. prompt_grader.go has zero imports from internal/execution. The doc says "switch to shared client" but doesn't say HOW the grader gets access — through graders.Context? A package-level singleton? A factory threaded from the runner? Pick one and document the wiring.

B3 — "Process-wide singleton" is wrong for multi-model runs. cmd_run.go creates a new CopilotEngine per model iteration (line 504 loop, engine at L641). If the singleton is torn down when one engine shuts down, the next model can't use it. Clarify: is it per-engine (existing sync.Once already does this) or truly process-wide (must outlive multiple engine.Shutdown() calls)?

Non-blocking

N1 — prompt_grader_test.go:167 uses real copilot.NewClient directly (integration test, gated by skipIfCopilotNotEnabled). Acknowledge in R2 plan.
N2 — R1's Plan() []Job API underestimates preamble complexity in RunBenchmark: hooks (before_run), skill validation, agent grader augmentation, dataset/template/range expansion, baseline-mode branching. Doc should explicitly state whether R1 applies to baseline mode.
N3 — hooks.Runner is shared across goroutines today. Verify it's already goroutine-safe before R1 amplifies concurrency.
N4 — Line numbers are already stale after the #222 rename ("refactor: complete vocabulary renames — BenchmarkSpec→EvalSpec, TestRunner→EvalRunner (#166)"); e.g., runConcurrent is L784, not L776. Use function/symbol names instead of line numbers, since those survive renames.

Suggestions

S1 — "Halves wall-clock" claim has no data. Trigger tests use CancelOnSkillInvocation and terminate early, so the gain may be 10–20%, not 50%. Add a before/after benchmark to the validation plan, or qualify the claim.
S2 — Grader network concurrency under high --workers could hit Copilot LLM-as-judge rate limits. Currently out of scope but could undermine gains from R1+R2.
S3 — Proposed order should be R2 → R1 → R3+R4. R3 needs totalJobs counting BOTH tasks and triggers, which only makes sense after R1 unifies them.

Final verdict

Question	Verdict
Diagnosis accurate?	✅ yes
R1 sound?	⚠️ mostly — clarify baseline mode + preamble factoring
R2 sound?	❌ partially — fix B2 (DI path) and B3 (lifecycle)
R3 sound?	⚠️ risky without R4 — fix B1
R4 sound?	✅ but mis-prioritized; should ship with R3
Order correct?	❌ should be R2 → R1 → R3+R4

Must address before implementation: B1, B2, B3. Can defer: N2, N4, S1.

spboyer · 2026-05-11T14:47:40Z

@copilot implement the plan

Copilot · 2026-05-11T15:03:39Z

> Rubber-duck critique (Opus 4.6)
>
> Diagnosis verification — ✅ all three claims verified against code...

Resolutions captured in a new §8 of the design doc and reflected in the implementation that just landed:

B1 — Auto-default is now min(NumCPU, jobs, DefaultWorkerCap=8) via orchestration.ResolveWorkers. The 8-worker ceiling protects 64-core CI runners until R4 lands; explicit --workers N still wins.
B2 — Wiring is via graders.Context.CopilotClient (execution.CopilotClient interface). TestRunner.buildGraderContext populates it from the engine through an inline interface assertion (no new public engine method exposed beyond CopilotClient()); prompt_grader.openGradingSession uses it when set and falls back to the legacy per-call copilot.NewClient/Stop() path so prompt_grader_test.go:167 keeps working unchanged (N1).
B3 — Singleton is process-wide, not per-engine. execution.SharedClient is built once via sync.Once, and CopilotEngine.Shutdown no longer calls client.Stop() when the engine was built on top of the shared client. cmd_run.go::runCommandE defers execution.ShutdownSharedClient to stop the underlying SDK process exactly once after every --model iteration completes.
N4 — Doc references switched to symbol names; line numbers retained only as historical context.
S3 — Order revised to R3 → R2 → R1 → R4. R3 + R2 ship in this PR; R1 (unified pool) and R4 (split workers/sessions) are explicit follow-ups.

S1 (wall-clock claim) and S2 (judge rate-limits) are noted but not measured in this PR — they fit naturally into the R1 validation plan.
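
The B2 wiring can be pictured with the following sketch. It is an illustration under assumptions, not the code in prompt_grader.go: SessionClient, GraderContext, and the newLegacy callback are hypothetical stand-ins for execution.CopilotClient, graders.Context, and the per-call copilot.NewClient(...)/Stop() path.

```go
package main

import "fmt"

// SessionClient is a hypothetical stand-in for execution.CopilotClient.
type SessionClient interface {
	CreateSession() string
}

type fakeClient struct{ name string }

func (c fakeClient) CreateSession() string { return c.name + "-session" }

// GraderContext mirrors the idea of graders.Context carrying an optional client.
type GraderContext struct {
	CopilotClient SessionClient
}

// openGradingSession prefers the injected client and falls back to a
// per-call client. The returned cleanup must always be called.
func openGradingSession(ctx GraderContext, newLegacy func() (SessionClient, func())) (string, func()) {
	if ctx.CopilotClient != nil {
		// Shared path: no client Start/Stop; only session cleanup would go here.
		return ctx.CopilotClient.CreateSession(), func() {}
	}
	// Legacy path: build a client for this call and stop it on cleanup.
	client, stop := newLegacy()
	return client.CreateSession(), stop
}

func main() {
	legacyStops := 0
	newLegacy := func() (SessionClient, func()) {
		return fakeClient{"legacy"}, func() { legacyStops++ }
	}

	s, cleanup := openGradingSession(GraderContext{CopilotClient: fakeClient{"shared"}}, newLegacy)
	cleanup()
	fmt.Println(s, legacyStops) // prints "shared-session 0"

	s, cleanup = openGradingSession(GraderContext{}, newLegacy)
	cleanup()
	fmt.Println(s, legacyStops) // prints "legacy-session 1"
}
```

Keeping the fallback behind the same helper is what lets direct grader tests run without a runner-provided context.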

> @copilot implement the plan

Phase 1 of the plan is implemented in this PR (R3 + R2). R1 (unified worker pool) and R4 (split --workers/--sessions) will follow as separate PRs per the revised §8 / S3 ordering.

Highlights:

execution.SharedClient + ShutdownSharedClient (process-wide sync.Once); CopilotEngine no longer stops the client during per-engine Shutdown.
prompt_grader now uses gradingContext.CopilotClient when present, eliminating the per-call copilot.NewClient(...)/Stop() overhead. Legacy path retained for direct-instantiation tests.
orchestration.ResolveWorkers auto-sizes to min(NumCPU, jobs, 8) and logs workers=N capped to M (phase) when the requested count exceeds available work.

Race-enabled tests pass for internal/execution, internal/orchestration, internal/trigger, internal/graders, and cmd/waza. Validation found no CodeQL alerts.

spboyer

This improves waza run concurrency by sharing the Copilot SDK client and auto-sizing workers; I found one session cleanup issue and one doc mismatch to address.

Issues to address:

internal/graders/prompt_grader.go:70 - shared-client prompt grader sessions are disconnected but not deleted
docs/design/135-improve-concurrency.md:230 - lifecycle docs reference a stale SharedClient signature and shutdown hook

Copilot

Pull request overview

This PR broadens Waza’s evaluation/runtime surface: it introduces shared Copilot SDK client lifecycle and new execution request controls, adjusts concurrency defaults via worker auto-sizing, and extends project configuration + documentation (including .agent.md support and instruction_files) alongside assorted web/dashboard tooling updates.

Changes:

Add process-wide shared Copilot SDK client plumbing and extend execution/session event handling to support grader-style ephemeral sessions and richer request options.
Auto-size/clamp worker concurrency for task/trigger phases and introduce worker resolution helpers.
Expand project config/schema/docs/examples (custom eval/task naming, instruction_files, .agent.md discovery) and update web tooling (ESLint, Playwright port config).

File	Description
web/src/hooks/useSSE.ts	Fix SSE run state reset logic using a ref-backed current run.
web/src/components/TrajectoryViewer.tsx	Refactor event description derivation to a single expression.
web/src/components/TrajectoryDiff.tsx	Make tool alignment helper internal (non-exported).
web/src/components/RunsTable.tsx	Add ESLint disable directive for TanStack Table usage.
web/playwright.config.ts	Allow configuring Playwright preview port via env var.
web/package.json	Add ESLint + related tooling dependencies.
web/eslint.config.js	Introduce flat ESLint config for TS/React hooks/refresh rules.
web/e2e/run-detail.spec.ts	Remove unused fixture import.
web/e2e/fixtures/mock-data.ts	Remove unused Playwright `Page` type import.
web/e2e/dashboard.spec.ts	Remove unused fixture imports.
web/dist/index.html	Update built dashboard entrypoint asset hash reference.
version.txt	Bump repo version.
site/src/content/docs/reference/waza-yaml.mdx	Document `files` config section in YAML reference.
site/src/content/docs/reference/statistical-fields.mdx	Clarify per-run flat `usage` block vs aggregate usage.
site/src/content/docs/reference/schema.mdx	Document `instruction_files` and `files` sections in schema reference.
site/src/content/docs/quick-start.mdx	Update onboarding for skills vs custom agents and install flows.
site/src/content/docs/index.mdx	Clarify install command context (Bash environments).
site/src/content/docs/guides/graders.mdx	Add note about prompt grader mapping + tool_constraint auto-injection.
site/src/content/docs/getting-started.mdx	Update installation guidance and remove SKILL `type` field mention.
site/package.json	Bump Astro version.
site/astro.config.mjs	Add “Evaluating Custom Agents” to Guides nav.
schemas/task.schema.json	Add `instruction_files` support at task level.
schemas/eval.schema.json	Add `instruction_files` support at eval config level.
schemas/config.schema.json	Add `files` naming conventions to project config schema.
internal/workspace/workspace_test.go	Add tests for nested skill discovery + custom eval filename resolution.
internal/wizard/wizard.go	Remove skill “type” selection; validate name/description in wizard.
internal/webapi/store.go	Map transcript events via typed Copilot SDK event data helpers.
internal/webapi/additional_test.go	Update web API tests to new Copilot SDK typed event data.
internal/validation/schema.go	Fix comment to reflect `LoadEvalSpec` naming.
internal/validation/schema_test.go	Add schema tests for `instruction_files` and custom-agent example.
internal/utils/testdata/sample_events_slog.jsonl	Update sample tool completion event shape to include `success`.
internal/utils/logging_test.go	Update slog mapping test to typed `assistant.message` event.
internal/utils/copilot_log_iterator_test.go	Update iterator tests to typed session event constants/data.
internal/trigger/runner.go	Auto-size workers for trigger tests; centralize MCP server conversion.
internal/transcript/transcript_test.go	Update transcript tests for typed event data + renamed stimulus type.
internal/suggest/suggest.go	Switch eval YAML validation struct to `EvalSpec`.
internal/suggest/suggest_test.go	Update tests for `EvalSpec` rename.
internal/suggest/prompt.go	Update prompt text to reference `EvalSpec` not `BenchmarkSpec`.
internal/skill/agent.go	Add parsing/loading helpers for `.agent.md` frontmatter.
internal/skill/agent_test.go	Add tests for `.agent.md` parsing and filename detection.
internal/scoring/scoring.go	Include skill body in heuristic scorer trigger/clarity detection.
internal/scoring/scoring_test.go	Add coverage for trigger detection in SKILL body content.
internal/scaffold/scaffold.go	Read project file naming config; parameterize eval template/task filenames.
internal/scaffold/scaffold_test.go	Add tests for custom task glob/suffix and project file naming config.
internal/projectconfig/schema_parity_test.go	Assert schema defaults match new `files` Go defaults.
internal/projectconfig/config.go	Add `files` config, defaults, and validation.
internal/projectconfig/config_test.go	Add coverage for `files` defaults/merge and invalid config errors.
internal/orchestration/workers.go	Add worker auto-sizing/clamping utility.
internal/orchestration/workers_test.go	Test worker resolution behavior and cap logging.
internal/orchestration/skill_discovery.go	Extend discovery to support `.agent.md` in skill directories.
internal/orchestration/skill_discovery_test.go	Add discovery tests for `.agent.md` and SKILL priority.
internal/orchestration/csv_integration_test.go	Update orchestration CSV tests to `EvalSpec`/`EvalRunner`.
internal/orchestration/agent_graders.go	Inject implicit tool_constraint grader for agents declaring tools.
internal/models/testcase.go	Add `instruction_files`; rename stimulus/expectation types with aliases.
internal/models/testcase_test.go	Add test for loading `instruction_files`.
internal/models/outcome.go	Add per-run `Usage` pointer on `RunResult`.
internal/models/grader_validation_test.go	Update to `LoadEvalSpec`.
internal/models/grader_params_test.go	Update to `LoadEvalSpec`.
internal/models/events_test.go	Update transcript event JSON round-trip to typed data structures.
internal/models/baseline_test.go	Update baseline YAML serialization test to `EvalSpec`.
internal/jsonrpc/handlers.go	Update JSON-RPC handlers to load `EvalSpec`.
internal/graders/trigger_grader_test.go	Update tests to renamed stimulus type.
internal/graders/run_test.go	Update tests to renamed expectation type.
internal/graders/inline_script_grader_test.go	Update to typed assistant message event data.
internal/graders/grader.go	Add `Executor` interface to grader context.
internal/execution/usage.go	Populate per-run `Usage` and re-aggregate from `RunResult.Usage`.
internal/execution/skill_injection_test.go	Add tests for instruction system message formatting.
internal/execution/session_usage_collector.go	Update usage collection to typed event data helpers.
internal/execution/session_events_collector.go	Update event parsing to typed Copilot SDK event data helpers.
internal/execution/session_events_collector_test.go	Update tests to typed SDK data + add helper for error messages.
internal/execution/sdkclient.go	Add shared singleton Copilot SDK client API.
internal/execution/sdkclient_test.go	Add tests for shared client singleton + shutdown idempotency.
internal/execution/mock.go	Expand mock output to echo task/context/resources; support skipping workspace capture.
internal/execution/mock_engine_test.go	Add tests for mock output including resource content/truncation.
internal/execution/engine.go	Expand execution request options (instructions/tools/streaming/ephemeral/etc).
internal/execution/engine_response_test.go	Update message extraction tests for typed assistant message events.
internal/execution/copilot_engine_test.go	Add tests for grader-style ephemeral session behaviors and options.
internal/embedded/zcopilot_windows_arm64.go	Bump embedded Copilot CLI bundle version/hash (Windows ARM64).
internal/embedded/zcopilot_windows_amd64.go	Bump embedded Copilot CLI bundle version/hash (Windows AMD64).
internal/embedded/zcopilot_linux_arm64.go	Bump embedded Copilot CLI bundle version/hash (Linux ARM64).
internal/embedded/zcopilot_linux_amd64.go	Bump embedded Copilot CLI bundle version/hash (Linux AMD64).
internal/embedded/zcopilot_darwin_arm64.go	Bump embedded Copilot CLI bundle version/hash (macOS ARM64).
internal/embedded/zcopilot_darwin_amd64.go	Bump embedded Copilot CLI bundle version/hash (macOS AMD64).
internal/embedded/zcopilot_1.0.49_windows_arm64.exe.zst	Add new embedded Copilot CLI binary blob (LFS pointer).
internal/embedded/zcopilot_1.0.49_windows_arm64.exe.license	Add license for embedded Copilot CLI (Windows ARM64).
internal/embedded/zcopilot_1.0.49_windows_amd64.exe.zst	Add new embedded Copilot CLI binary blob (LFS pointer).
internal/embedded/zcopilot_1.0.49_windows_amd64.exe.license	Add license for embedded Copilot CLI (Windows AMD64).
internal/embedded/zcopilot_1.0.49_linux_arm64.zst	Add new embedded Copilot CLI binary blob (LFS pointer).
internal/embedded/zcopilot_1.0.49_linux_arm64.license	Add license for embedded Copilot CLI (Linux ARM64).
internal/embedded/zcopilot_1.0.49_linux_amd64.zst	Add new embedded Copilot CLI binary blob (LFS pointer).
internal/embedded/zcopilot_1.0.49_linux_amd64.license	Add license for embedded Copilot CLI (Linux AMD64).
internal/embedded/zcopilot_1.0.49_darwin_arm64.zst	Add new embedded Copilot CLI binary blob (LFS pointer).
internal/embedded/zcopilot_1.0.49_darwin_arm64.license	Add license for embedded Copilot CLI (macOS ARM64).
internal/embedded/zcopilot_1.0.49_darwin_amd64.zst	Add new embedded Copilot CLI binary blob (LFS pointer).
internal/embedded/zcopilot_1.0.49_darwin_amd64.license	Add license for embedded Copilot CLI (macOS AMD64).
internal/embedded/zcopilot_1.0.2_windows_arm64.exe.zst	Remove old embedded Copilot CLI binary blob (LFS pointer).
internal/embedded/zcopilot_1.0.2_windows_amd64.exe.zst	Remove old embedded Copilot CLI binary blob (LFS pointer).
internal/embedded/zcopilot_1.0.2_linux_arm64.zst	Remove old embedded Copilot CLI binary blob (LFS pointer).
internal/embedded/zcopilot_1.0.2_linux_amd64.zst	Remove old embedded Copilot CLI binary blob (LFS pointer).
internal/embedded/zcopilot_1.0.2_darwin_arm64.zst	Remove old embedded Copilot CLI binary blob (LFS pointer).
internal/embedded/zcopilot_1.0.2_darwin_amd64.zst	Remove old embedded Copilot CLI binary blob (LFS pointer).
internal/embedded/generate/generate.go	Allow pinning Copilot CLI bundler version via env var.
internal/discovery/discovery.go	Respect configured eval filename during skill eval discovery.
internal/discovery/discovery_test.go	Add tests for discovery using custom eval filename.
internal/copilotconfig/mcp.go	Centralize robust MCP server config conversion/validation.
internal/config/config.go	Rename BenchmarkConfig → EvalConfig with backward-compatible alias.
internal/config/config_test.go	Update config tests to EvalConfig naming.
internal/cache/cache.go	Update cache key + grader checks to accept `EvalSpec`.
install.sh	Detect WSL and print guidance for native Windows installs.
go.sum	Bump copilot-sdk module version checksums.
go.mod	Bump copilot-sdk module version and adjust semver dependency placement.
extension.yaml	Bump azd extension version.
examples/README.md	Add custom-agent/required-skills-demo/rubrics sections to examples index.
examples/custom-agent/trigger_tests.yaml	Add trigger tests for the custom-agent example.
examples/custom-agent/tasks/review-clean-code.yaml	Add example task fixture for “clean code” review.
examples/custom-agent/tasks/find-xss.yaml	Add example task fixture for XSS detection.
examples/custom-agent/tasks/find-sql-injection.yaml	Add example task fixture for SQL injection detection.
examples/custom-agent/security-reviewer.agent.md	Add example `.agent.md` custom agent definition.
examples/custom-agent/README.md	Add documentation for the custom-agent example.
examples/custom-agent/fixtures/xss.html	Add XSS fixture HTML file.
examples/custom-agent/fixtures/vulnerable.py	Add SQL injection fixture Python file.
examples/custom-agent/fixtures/clean.go	Add “clean” fixture Go file (excluded from builds).
examples/custom-agent/eval.yaml	Add example eval spec targeting a custom agent.
examples/ci/README.md	Update CI docs to reference binary install method.
docs/TUTORIAL.md	Update tutorial to mention custom agents and refine install steps.
docs/SKILLS_CI_INTEGRATION.md	Update CI integration to prefer binary install and revise troubleshooting.
docs/RELEASE.md	Update release instructions to tag-push-first workflow narrative.
docs/GUIDE.md	Update guide for `.agent.md` support and new install/source build guidance.
docs/GETTING-STARTED.md	Update getting started to mention custom agents and remove SKILL `type`.
docs/DEMO-GUIDE.md	Update demo guide install instructions wording and minor cleanup.
DEMO-SCRIPT.md	Update demo script install comment.
cmd/waza/workspace_resolve.go	Pass configured eval filename into workspace detection; improve error text.
cmd/waza/workspace_resolve_test.go	Add test for configured eval filename resolution.
cmd/waza/tokens/suggest.go	Ensure shared Copilot client is shut down on suggest completion.
cmd/waza/tokens/helpers.go	Pass configured eval filename into token command detection.
cmd/waza/newtask/converters.go	Update copilot log parsing to typed event data helpers.
cmd/waza/newtask/converters_test.go	Update test case stimulus naming.
cmd/waza/dev/loop.go	Pass configured eval filename into dev workspace detection.
cmd/waza/dev/copilot.go	Shut down shared Copilot client; use project config for workspace detection.
cmd/waza/cmd_suggest.go	Shut down shared Copilot client after suggest command.
cmd/waza/cmd_serve.go	Only start MCP stdio server when stdin is a terminal to avoid EOF crash.
cmd/waza/cmd_run_test.go	Update runner injection to `EvalRunner` and config types.
cmd/waza/cmd_run_signal_test.go	Add SIGTERM cancellation integration test for `waza run`.
cmd/waza/cmd_quality.go	Shut down shared Copilot client after quality command.
cmd/waza/cmd_new.go	Respect project file naming config in scaffolding; remove SKILL `type` metadata.
cmd/waza/cmd_new_test.go	Add tests for custom file naming; assert SKILL frontmatter no longer includes `type`.
cmd/waza/cmd_new_task.go	Ensure shared Copilot client shutdown in new-task command teardown.
cmd/waza/cmd_new_task_test.go	Update test case stimulus naming.
cmd/waza/cmd_models.go	Ensure shared Copilot client is shut down in models command teardown.
cmd/waza/cmd_grade.go	Update to `LoadEvalSpec` and config types.
cmd/waza/cmd_eval_test.go	Add test verifying eval scaffolding respects custom file naming config.
cmd/waza/cmd_coverage.go	Extend coverage discovery to `.agent.md` while preserving SKILL priority.
cmd/waza/cmd_coverage_test.go	Add coverage tests for agent discovery and SKILL priority.
cmd/waza/cmd_check.go	Use configured detect options when attempting workspace auto-detection.
cmd/waza/cmd_check_workspace_test.go	Add tests for nested skill path detection with separated evals.
.squad/decisions/inbox/saul-doc-audit-2025-07-15.md	Add doc audit decision record for onboarding entry points.
.squad/decisions/inbox/livingston-custom-agent-docs.md	Add decision record for custom agent docs structure.
.squad/decisions/inbox/linus-mock-includes-resources.md	Add decision record for mock engine echo behavior.
.squad/decisions/inbox/linus-custom-agent-support.md	Add decision record for representing agents as SkillInfo.
.squad/decisions/inbox/linus-agent-tool-injection.md	Add decision record for implicit tool_constraint injection.
.squad/agents/saul/history.md	Update documentation audit history notes.
.squad/agents/livingston/history.md	Update Livingston’s history with custom agent docs completion.
.github/workflows/waza-eval.yml	Expand workflow path filters to include execution/orchestration/grader changes.
.github/workflows/pages.yml	Trigger Pages build after successful Release; bump Node version.

Copilot's findings

Files not reviewed (2)

site/package-lock.json: Language not supported
web/package-lock.json: Language not supported

Files reviewed: 192/202 changed files
Comments generated: 8

Copilot

Copilot's findings

Files reviewed: 16/17 changed files
Comments generated: 1

Copilot

Copilot's findings

Files reviewed: 17/18 changed files
Comments generated: 3

Copilot

Copilot's findings

Files reviewed: 17/18 changed files
Comments generated: 1

spboyer

The requested changes have been addressed and the review threads are resolved.

Copilot

Copilot's findings

Files reviewed: 22/23 changed files
Comments generated: 0 new

Initial plan

1c48c0e

Copilot AI assigned Copilot and spboyer Apr 22, 2026

Copilot started work on behalf of spboyer April 22, 2026 21:01

Copilot AI linked an issue Apr 22, 2026 that may be closed by this pull request

Improve concurrency design #135

Closed

docs(design): add #135 concurrency research and recommendations

edaf59d

Agent-Logs-Url: https://github.com/microsoft/waza/sessions/55c73a11-8dec-401a-925f-3f2a0536fc78 Co-authored-by: spboyer <7681382+spboyer@users.noreply.github.com>

Copilot AI changed the title ~~[WIP] Research concurrency design improvements for waza~~ Research and recommendations for #135: improve waza run concurrency Apr 22, 2026

Copilot finished work on behalf of spboyer April 22, 2026 21:06

Copilot AI requested a review from spboyer April 22, 2026 21:06

Copilot started work on behalf of spboyer May 11, 2026 14:47

Copilot AI added 2 commits May 11, 2026 15:00

feat(#135): R2 shared Copilot client + R3 auto-size workers

578da5f

refactor(#135): drop dead workers fallback in trigger runner

ed80e3a

Copilot AI changed the title ~~Research and recommendations for #135: improve waza run concurrency~~ Improve waza run concurrency: shared Copilot client + auto-sized workers (#135) May 11, 2026

Copilot finished work on behalf of spboyer May 11, 2026 15:04

spboyer mentioned this pull request May 20, 2026

Improve concurrency design #135

Closed

spboyer requested changes May 22, 2026

Comment thread internal/graders/prompt_grader.go Outdated

Comment thread docs/design/135-improve-concurrency.md Outdated

Copilot started work on behalf of spboyer May 22, 2026 20:26

fix(#135): delete fresh prompt-grader sessions; fix design doc lifecycle

333c816

Copilot AI requested review from Copilot and spboyer and removed request for Copilot May 22, 2026 20:32

Copilot finished work on behalf of spboyer May 22, 2026 20:34

fix(#135): update concurrency PR for review

a2a9685

Resolve PR #221 conflicts with current main and keep prompt graders on the shared Copilot execution path while preserving ephemeral session cleanup. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Copilot AI review requested due to automatic review settings May 23, 2026 14:16

spboyer marked this pull request as ready for review May 23, 2026 14:16

Copilot started reviewing on behalf of spboyer May 23, 2026 14:16

Copilot AI reviewed May 23, 2026

Copilot AI added 2 commits May 23, 2026 10:24

merge main into concurrency PR

eb5d5bf

Resolve latest main conflict by preserving BYOK provider wiring alongside the shared Copilot client lifecycle. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

fix(#135): satisfy worker cap lint

f4fb923

Check the capping notice write result to satisfy errcheck. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Copilot AI review requested due to automatic review settings May 23, 2026 14:28

Copilot started reviewing on behalf of spboyer May 23, 2026 14:28

Copilot AI reviewed May 23, 2026

Comment thread internal/orchestration/workers_test.go Outdated

spboyer mentioned this pull request May 23, 2026

Fix PR 221 concurrency feedback #289

Closed

Copilot AI added 2 commits May 23, 2026 11:14

fix(#135): address PR 221 review feedback

cb7f75b

Use utils.Ptr for shared client bool options and make the worker auto-size test safe on single-core runners. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

test: synchronize signal cancellation helper #135

877e601

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Copilot AI review requested due to automatic review settings May 23, 2026 19:47

Copilot started reviewing on behalf of spboyer May 23, 2026 19:48

Merge remote-tracking branch 'origin/main' into copilot/improve-concurrency-design

44a4b08

Copilot AI reviewed May 23, 2026

Comment thread internal/execution/copilot.go Outdated

Comment thread internal/trigger/runner.go Outdated

Comment thread cmd/waza/cmd_run.go Outdated

fix(#135): address Cloud Code Review feedback

e23bf0c

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Copilot AI review requested due to automatic review settings May 23, 2026 19:53

Copilot started reviewing on behalf of spboyer May 23, 2026 19:53

Copilot AI reviewed May 23, 2026

Comment thread cmd/waza/cmd_run.go

Copilot AI added 2 commits May 23, 2026 16:01

fix(#135): allow workers auto default

efa9528

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Merge remote-tracking branch 'origin/main' into copilot/improve-concurrency-design

c0b14ea

Copilot AI review requested due to automatic review settings May 23, 2026 20:04

Copilot started reviewing on behalf of spboyer May 23, 2026 20:04

spboyer approved these changes May 23, 2026

Copilot AI reviewed May 23, 2026

spboyer merged commit c842fdf into main May 23, 2026
8 checks passed

spboyer deleted the copilot/improve-concurrency-design branch May 23, 2026 20:09

spboyer mentioned this pull request May 23, 2026

Release v0.34.0 #294

Merged

15 tasks

Copilot AI commented Apr 22, 2026 • edited

Diagnosis

Recommendations (prioritized)

Implementation in this PR

Tests

Open questions for reviewers

spboyer commented Apr 28, 2026

Rubber-duck critique (Opus 4.6)

Diagnosis verification — ✅ all three claims verified against code

Blocking issues

Non-blocking

Suggestions

Final verdict

spboyer commented May 11, 2026

Copilot AI commented May 11, 2026 • edited

Rubber-duck critique (Opus 4.6)

Diagnosis verification — ✅ all three claims verified against code...

spboyer left a comment

Copilot AI left a comment

Pull request overview

Copilot's findings

Copilot AI left a comment

Copilot's findings

Copilot AI left a comment

Copilot's findings

Copilot AI left a comment

Copilot's findings

spboyer left a comment

Copilot AI left a comment

Copilot's findings

4 participants
