Improve waza run concurrency: shared Copilot client + auto-sized workers (#135) #221

Merged: spboyer merged 14 commits into main from copilot/improve-concurrency-design on May 23, 2026

Conversation

Copilot AI commented Apr 22, 2026

Contributor

waza run --workers 6 on a suite of 3 tasks + 3 trigger tests only ever runs 3 concurrent sessions because tasks and triggers are executed in serial phases, each with its own worker semaphore. The issue asked for research and recommendations; this PR adds the design doc and ships Phase 1 of the plan (R2 + R3).

Adds docs/design/135-improve-concurrency.md (following the existing docs/design/ convention) capturing the diagnosis and a phased proposal, including a §8 "Critique resolutions" section that addresses the B1/B2/B3 review feedback.

Diagnosis

  • Phase barrier in cmd/waza/cmd_run.go: runner.RunBenchmark fully drains before trigger.Runner.RunDetailed starts. Each phase has its own semaphore sized at Workers, so any Workers value above the number of items in a phase is unused capacity.
  • Per-call Copilot clients in graders: internal/graders/prompt_grader.go (gradeIndependent, runPairwiseOnce) calls copilot.NewClient(...) + client.Stop() every invocation, spawning/tearing down the embedded Copilot CLI subprocess each time. The engine itself already shares one client; graders don't.
  • Engine instance is already shared between the task runner and trigger runner and is mutex-protected, so no concurrency-safety blocker for unification.
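
The phase barrier described above can be reproduced in isolation. The following is a minimal sketch, not waza's code: runPhase is a hypothetical helper that runs n jobs behind a semaphore sized at workers and reports the peak number of jobs in flight. With two serial phases of 3 jobs each and workers = 6, the peak in either phase can never exceed 3.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
	"time"
)

// runPhase runs n jobs behind a semaphore of size workers and returns the
// peak number of jobs that were in flight at the same time.
// Hypothetical helper for illustration only; not part of waza.
func runPhase(n, workers int) int64 {
	sem := make(chan struct{}, workers)
	var inFlight, peak int64
	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			sem <- struct{}{}        // acquire a worker slot
			defer func() { <-sem }() // release it when the job ends

			cur := atomic.AddInt64(&inFlight, 1)
			// Record the highest in-flight count seen so far.
			for {
				old := atomic.LoadInt64(&peak)
				if cur <= old || atomic.CompareAndSwapInt64(&peak, old, cur) {
					break
				}
			}
			time.Sleep(10 * time.Millisecond) // stand-in for a Copilot session
			atomic.AddInt64(&inFlight, -1)
		}()
	}
	wg.Wait() // the phase drains fully before the caller continues
	return peak
}

func main() {
	const workers = 6
	taskPeak := runPhase(3, workers)    // phase 1: tasks
	triggerPeak := runPhase(3, workers) // phase 2: starts only after phase 1 returns
	fmt.Println("peak concurrency per phase:", taskPeak, triggerPeak)
}
```

Because each phase holds only 3 jobs, the extra 3 worker slots are never used, which matches the 3/6 utilization reported in the issue.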

Recommendations (prioritized)

  • R1 — Unified worker pool spanning task + trigger jobs. Introduce a Job interface, build the pool once in cmd_run.go, have TestRunner and trigger.Runner expose Plan() []Job, then submit both sets and demux results by Job.Kind(). Fixes the reported 3/6 utilization. Follow-up PR.
  • R2 — Process-wide singleton Copilot SDK client (per @richardpark-msft). All per-call state already lives on the session; expose execution.SharedClient(...) behind a sync.Once, switch prompt_grader to it, and drop the per-call Stop(). Removes the dominant local-run latency. Implemented in this PR.
  • R3 — Auto-size --workers default to min(NumCPU, totalJobs, 8) (currently hard-coded 4) and log a one-line "capped to N (phase)" notice. Implemented in this PR. The conservative DefaultWorkerCap = 8 ceiling addresses B1 from the design review until R4 lands.
  • R4 (stretch) — Split --workers (job concurrency) from --sessions (Copilot session concurrency) as a second semaphore inside the pool. Defer until the single knob proves insufficient.
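
As a rough illustration of R1, the sketch below feeds task and trigger jobs into one pool and groups the results by Kind(). The Job interface shape, simpleJob, and runPool are assumptions made for this example; the PR only names Job, Plan() and Job.Kind(), and the real design may differ.

```go
package main

import (
	"fmt"
	"sync"
)

// Job is a hypothetical shape for the interface R1 proposes.
type Job interface {
	Kind() string // e.g. "task" or "trigger"
	Run() string  // executes the job and returns a result summary
}

// simpleJob is an illustrative Job implementation.
type simpleJob struct{ kind, name string }

func (j simpleJob) Kind() string { return j.kind }
func (j simpleJob) Run() string  { return j.name + " done" }

// runPool executes all jobs on one shared pool of workers (workers must be
// at least 1) and demultiplexes the results by Job.Kind().
func runPool(jobs []Job, workers int) map[string][]string {
	queue := make(chan Job)
	results := make(map[string][]string)
	var mu sync.Mutex
	var wg sync.WaitGroup

	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for j := range queue {
				out := j.Run()
				mu.Lock()
				results[j.Kind()] = append(results[j.Kind()], out)
				mu.Unlock()
			}
		}()
	}
	for _, j := range jobs {
		queue <- j
	}
	close(queue)
	wg.Wait()
	return results
}

func main() {
	var jobs []Job
	for i := 0; i < 3; i++ {
		jobs = append(jobs, simpleJob{"task", fmt.Sprintf("task-%d", i)})
		jobs = append(jobs, simpleJob{"trigger", fmt.Sprintf("trigger-%d", i)})
	}
	res := runPool(jobs, 6) // all 6 jobs are eligible to run concurrently
	fmt.Println(len(res["task"]), len(res["trigger"]))
}
```

With a single queue there is no barrier between tasks and triggers, so all six workers can be busy at once.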

Implementation in this PR

R2 — shared Copilot SDK client

  • New internal/execution/sdkclient.go exposing SharedClient(opts) CopilotClient and ShutdownSharedClient(ctx) error, both guarded by sync.Once.
  • CopilotEngine now defaults to the shared client and tracks an ownsClient flag; Shutdown only calls client.Stop() for engines that own their client (test path). The shared SDK process is stopped exactly once via a top-level defer execution.ShutdownSharedClient(...) in cmd_run.go::runCommandE, so it survives the per-model (--model) engine loop.
  • CopilotEngine exposes CopilotClient() so the runner can plumb the shared client to graders.
  • graders.Context gains a CopilotClient execution.CopilotClient field. TestRunner.buildGraderContext populates it via an interface assertion against the engine.
  • prompt_grader.gradeIndependent and runPairwiseOnce use a new shared openGradingSession helper that prefers Context.CopilotClient when set (no Start/Stop overhead, session is Disconnect'd on cleanup) and falls back to the legacy per-call copilot.NewClient(...)/Stop() path otherwise so direct grader tests (prompt_grader_test.go) keep working.
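
The lifecycle in the bullets above can be sketched as follows. This is an assumption-laden illustration, not the code in sdkclient.go: fakeClient stands in for the Copilot SDK client, and the lowercase sharedClient, shutdownSharedClient, and engine names are hypothetical. It also assumes shutdown is only called from a single top-level defer after all work has finished.

```go
package main

import (
	"fmt"
	"sync"
)

// fakeClient stands in for the Copilot SDK client; it only counts Stop calls.
type fakeClient struct{ stops int }

func (c *fakeClient) Stop() { c.stops++ }

var (
	sharedOnce   sync.Once
	shutdownOnce sync.Once
	shared       *fakeClient
)

// sharedClient lazily builds the single process-wide client.
func sharedClient() *fakeClient {
	sharedOnce.Do(func() { shared = &fakeClient{} })
	return shared
}

// shutdownSharedClient stops the shared client at most once and is a no-op
// when no client was ever constructed.
func shutdownSharedClient() {
	shutdownOnce.Do(func() {
		if shared != nil {
			shared.Stop()
		}
	})
}

// engine mimics the ownsClient idea: only engines that created their own
// client stop it during Shutdown.
type engine struct {
	client     *fakeClient
	ownsClient bool
}

func newEngine() *engine { return &engine{client: sharedClient()} }

func (e *engine) Shutdown() {
	if e.ownsClient {
		e.client.Stop()
	}
}

func main() {
	defer shutdownSharedClient() // runs once, after every model iteration
	for _, model := range []string{"model-a", "model-b"} {
		e := newEngine()
		fmt.Println(model, "uses shared client:", e.client == sharedClient())
		e.Shutdown() // does not stop the shared client
	}
}
```

Keeping the stop call out of the per-engine Shutdown is what lets the second model iteration reuse the client that the first one used.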

R3 — auto-sized workers

  • New internal/orchestration/workers.go with ResolveWorkers(requested, jobs, phase, out) int and DefaultWorkerCap = 8.
  • When --workers is unset: defaults to min(runtime.NumCPU(), jobs, 8).
  • When --workers exceeds the job count: clamps and emits workers=N capped to M (phase).
  • Wired into TestRunner.runConcurrent and trigger.Runner.RunDetailed. The CLI's "Parallel:" line now prints workers=auto when the auto-default applies.

Design doc updates

  • §8 "Critique resolutions" addressing B1 (cap), B2 (DI path = Context.CopilotClient), B3 (process-wide singleton lifecycle decoupled from per-engine Shutdown), N4 (use symbol names), and S3 (revised order to R3 → R2 → R1 → R4).

Tests

  • internal/execution/sdkclient_test.go: SharedClient returns the same instance; ShutdownSharedClient is idempotent and safe when no client was constructed.
  • internal/orchestration/workers_test.go — auto-sizing respects the cap, explicit values are clamped to job count and emit a notice, no notice when within budget.
  • Race-enabled tests pass for internal/execution, internal/orchestration, internal/trigger, internal/graders, and cmd/waza.

Open questions for reviewers

  1. Prefer Job interface in internal/orchestration, or keep packages self-contained and have Plan() return work descriptors that the CLI hands to a thin pool? (R1 follow-up)
  2. @richardpark-msft — confirm the SDK is safe sharing one Client across N concurrent CreateSession calls without per-call Start/Stop. The implementation now relies on this.
  3. R4 now or later? Default cap of 8 is in place as a stopgap.
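
For reviewers weighing question 3, this is one way the R4 split could look: a job-level semaphore sized by --workers and a second, smaller semaphore sized by --sessions that is held only around the session-bound part of a job. runWithLimits and its parameters are hypothetical; nothing like it exists in this PR.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// runWithLimits runs n jobs with at most `workers` jobs in flight and at most
// `sessions` of them inside the session-bound section at once. It returns the
// peak number of concurrent "sessions" observed. Hypothetical sketch of R4.
func runWithLimits(n, workers, sessions int) int64 {
	jobSem := make(chan struct{}, workers)
	sessionSem := make(chan struct{}, sessions)
	var active, peak int64
	var wg sync.WaitGroup

	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			jobSem <- struct{}{}
			defer func() { <-jobSem }()

			// Local work (workspace setup, grading prep) would run here,
			// limited only by the job semaphore.

			sessionSem <- struct{}{} // second semaphore: Copilot session slot
			cur := atomic.AddInt64(&active, 1)
			for {
				old := atomic.LoadInt64(&peak)
				if cur <= old || atomic.CompareAndSwapInt64(&peak, old, cur) {
					break
				}
			}
			atomic.AddInt64(&active, -1)
			<-sessionSem
		}()
	}
	wg.Wait()
	return peak
}

func main() {
	peak := runWithLimits(20, 8, 2)
	fmt.Println("peak sessions within limit:", peak <= 2)
}
```

The design choice here is that session pressure on the backend is bounded independently of how many jobs are doing local work.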

Copilot AI linked an issue Apr 22, 2026 that may be closed by this pull request
Copilot AI changed the title from "[WIP] Research concurrency design improvements for waza" to "Research and recommendations for #135: improve waza run concurrency" Apr 22, 2026
Copilot AI requested a review from spboyer April 22, 2026 21:06
spboyer commented Apr 28, 2026

Member

Rubber-duck critique (Opus 4.6)

Diagnosis verification — ✅ all three claims verified against code

| Claim | Verdict |
| --- | --- |
| Phase barrier in cmd_run.go (L762 → L804–808) | ✅ confirmed |
| Per-call copilot.NewClient in prompt_grader.go (L62, L313) | ✅ confirmed; pairwise calls runPairwiseOnce twice per invocation |
| Engine is shared, mutex-protected, concurrency-safe | ✅ confirmed (sessionsMu, usageCollectorsMu, workspacesMu, startOnce, shutdownOnce) |

Blocking issues

B1 — R3 without R4 is dangerous. min(NumCPU, totalJobs) on a 64-core CI runner with 100 jobs gives 64 concurrent Copilot sessions. No evidence the SDK or backend handles this gracefully. Either ship R4 alongside R3, or cap the auto-default conservatively (e.g., min(NumCPU, totalJobs, 8)) until R4 lands.

B2 — R2 design is incomplete: dependency injection for graders is unspecified. prompt_grader.go has zero imports from internal/execution. The doc says "switch to shared client" but doesn't say HOW the grader gets access — through graders.Context? A package-level singleton? A factory threaded from the runner? Pick one and document the wiring.

B3 — "Process-wide singleton" is wrong for multi-model runs. cmd_run.go creates a new CopilotEngine per model iteration (line 504 loop, engine at L641). If the singleton is torn down when one engine shuts down, the next model can't use it. Clarify: is it per-engine (existing sync.Once already does this) or truly process-wide (must outlive multiple engine.Shutdown() calls)?

Non-blocking

  • N1: prompt_grader_test.go:167 uses the real copilot.NewClient directly (integration test, gated by skipIfCopilotNotEnabled). Acknowledge this in the R2 plan.
  • N2 — R1's Plan() []Job API underestimates preamble complexity in RunBenchmark: hooks (before_run), skill validation, agent grader augmentation, dataset/template/range expansion, baseline-mode branching. Doc should explicitly state whether R1 applies to baseline mode.
  • N3: hooks.Runner is shared across goroutines today. Verify it's already goroutine-safe before R1 amplifies concurrency.
  • N4: Line numbers are already stale after the #222 rename (refactor: complete vocabulary renames, BenchmarkSpec→EvalSpec and TestRunner→EvalRunner, #166); e.g., runConcurrent is L784, not L776. Use function/symbol names instead of line numbers, since they are rename-proof.

Suggestions

  • S1 — "Halves wall-clock" claim has no data. Trigger tests use CancelOnSkillInvocation and terminate early, so the gain may be 10–20%, not 50%. Add a before/after benchmark to the validation plan, or qualify the claim.
  • S2 — Grader network concurrency under high --workers could hit Copilot LLM-as-judge rate limits. Currently out of scope but could undermine gains from R1+R2.
  • S3 — Proposed order should be R2 → R1 → R3+R4. R3 needs totalJobs counting BOTH tasks and triggers, which only makes sense after R1 unifies them.

Final verdict

| Question | Verdict |
| --- | --- |
| Diagnosis accurate? | ✅ yes |
| R1 sound? | ⚠️ mostly; clarify baseline mode + preamble factoring |
| R2 sound? | ❌ partially; fix B2 (DI path) and B3 (lifecycle) |
| R3 sound? | ⚠️ risky without R4; fix B1 |
| R4 sound? | ✅ but mis-prioritized; should ship with R3 |
| Order correct? | ❌ should be R2 → R1 → R3+R4 |

Must address before implementation: B1, B2, B3. Can defer: N2, N4, S1.

spboyer commented May 11, 2026

Member

@copilot implement the plan

Copilot AI commented May 11, 2026

Contributor Author

Rubber-duck critique (Opus 4.6)

Diagnosis verification — ✅ all three claims verified against code...

Resolutions captured in a new §8 of the design doc and reflected in the implementation that just landed:

  • B1 — Auto-default is now min(NumCPU, jobs, DefaultWorkerCap=8) via orchestration.ResolveWorkers. The 8-worker ceiling protects 64-core CI runners until R4 lands; explicit --workers N still wins.
  • B2 — Wiring is via graders.Context.CopilotClient (execution.CopilotClient interface). TestRunner.buildGraderContext populates it from the engine through an inline interface assertion (no new public engine method exposed beyond CopilotClient()); prompt_grader.openGradingSession uses it when set and falls back to the legacy per-call copilot.NewClient/Stop() path so prompt_grader_test.go:167 keeps working unchanged (N1).
  • B3 — Singleton is process-wide, not per-engine. execution.SharedClient is built once via sync.Once, and CopilotEngine.Shutdown no longer calls client.Stop() when the engine was built on top of the shared client. cmd_run.go::runCommandE defers execution.ShutdownSharedClient to stop the underlying SDK process exactly once after every --model iteration completes.
  • N4 — Doc references switched to symbol names; line numbers retained only as historical context.
  • S3 — Order revised to R3 → R2 → R1 → R4. R3 + R2 ship in this PR; R1 (unified pool) and R4 (split workers/sessions) are explicit follow-ups.

S1 (wall-clock claim) and S2 (judge rate-limits) are noted but not measured in this PR — they fit naturally into the R1 validation plan.
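
The B2 wiring can be pictured with the sketch below: a grader context carries an optional client, and the session helper prefers it, falling back to a per-call client that it stops itself. All types here (client, stubClient, graderContext) and the newClient factory parameter are hypothetical stand-ins, and session-level cleanup (Disconnect or delete) is deliberately left out.

```go
package main

import "fmt"

// client is a hypothetical stand-in for execution.CopilotClient.
type client interface {
	CreateSession() string
	Stop()
}

// stubClient records whether Stop was called.
type stubClient struct {
	name    string
	stopped bool
}

func (c *stubClient) CreateSession() string { return c.name + "-session" }
func (c *stubClient) Stop()                 { c.stopped = true }

// graderContext mirrors the idea of graders.Context carrying an optional client.
type graderContext struct {
	CopilotClient client // nil when the grader is constructed directly, e.g. in tests
}

// openGradingSession returns a session and a cleanup function.
// Shared path: reuse the injected client and never stop it.
// Legacy path: build a client per call and stop it during cleanup.
func openGradingSession(ctx graderContext, newClient func() client) (string, func()) {
	if ctx.CopilotClient != nil {
		return ctx.CopilotClient.CreateSession(), func() {}
	}
	c := newClient()
	return c.CreateSession(), c.Stop
}

func main() {
	shared := &stubClient{name: "shared"}
	s, cleanup := openGradingSession(graderContext{CopilotClient: shared}, nil)
	cleanup()
	fmt.Println(s, shared.stopped) // prints "shared-session false"

	legacy := &stubClient{name: "legacy"}
	s, cleanup = openGradingSession(graderContext{}, func() client { return legacy })
	cleanup()
	fmt.Println(s, legacy.stopped) // prints "legacy-session true"
}
```

Returning the cleanup function from the helper keeps the ownership decision in one place, so callers such as gradeIndependent and runPairwiseOnce do not need to know which path they are on.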

@copilot implement the plan

Phase 1 of the plan is implemented in this PR (R3 + R2). R1 (unified worker pool) and R4 (split --workers/--sessions) will follow as separate PRs per the revised §8 / S3 ordering.

Highlights:

  • execution.SharedClient + ShutdownSharedClient (process-wide sync.Once); CopilotEngine no longer stops the client during per-engine Shutdown.
  • prompt_grader now uses gradingContext.CopilotClient when present, eliminating the per-call copilot.NewClient(...)/Stop() overhead. Legacy path retained for direct-instantiation tests.
  • orchestration.ResolveWorkers auto-sizes to min(NumCPU, jobs, 8) and logs workers=N capped to M (phase) when the requested count exceeds available work.

Race-enabled tests pass for internal/execution, internal/orchestration, internal/trigger, internal/graders, and cmd/waza. Validation found no CodeQL alerts.

Copilot AI changed the title from "Research and recommendations for #135: improve waza run concurrency" to "Improve waza run concurrency: shared Copilot client + auto-sized workers (#135)" May 11, 2026

@spboyer spboyer left a comment

Member

This improves waza run concurrency by sharing the Copilot SDK client and auto-sizing workers; I found one session cleanup issue and one doc mismatch to address.

Issues to address:

  • internal/graders/prompt_grader.go:70 - shared-client prompt grader sessions are disconnected but not deleted
  • docs/design/135-improve-concurrency.md:230 - lifecycle docs reference a stale SharedClient signature and shutdown hook

Comment thread internal/graders/prompt_grader.go Outdated
Comment thread docs/design/135-improve-concurrency.md Outdated
Copilot AI requested review from Copilot and spboyer and removed request for Copilot May 22, 2026 20:32
Resolve PR #221 conflicts with current main and keep prompt graders on the shared Copilot execution path while preserving ephemeral session cleanup.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Copilot AI review requested due to automatic review settings May 23, 2026 14:16
@spboyer spboyer marked this pull request as ready for review May 23, 2026 14:16

Copilot AI left a comment

Contributor

Pull request overview

This PR broadens Waza’s evaluation/runtime surface: it introduces shared Copilot SDK client lifecycle and new execution request controls, adjusts concurrency defaults via worker auto-sizing, and extends project configuration + documentation (including .agent.md support and instruction_files) alongside assorted web/dashboard tooling updates.

Changes:

  • Add process-wide shared Copilot SDK client plumbing and extend execution/session event handling to support grader-style ephemeral sessions and richer request options.
  • Auto-size/clamp worker concurrency for task/trigger phases and introduce worker resolution helpers.
  • Expand project config/schema/docs/examples (custom eval/task naming, instruction_files, .agent.md discovery) and update web tooling (ESLint, Playwright port config).
Summary per file (each row lists the file path, then a description of the change):
web/src/hooks/useSSE.ts Fix SSE run state reset logic using a ref-backed current run.
web/src/components/TrajectoryViewer.tsx Refactor event description derivation to a single expression.
web/src/components/TrajectoryDiff.tsx Make tool alignment helper internal (non-exported).
web/src/components/RunsTable.tsx Add ESLint disable directive for TanStack Table usage.
web/playwright.config.ts Allow configuring Playwright preview port via env var.
web/package.json Add ESLint + related tooling dependencies.
web/eslint.config.js Introduce flat ESLint config for TS/React hooks/refresh rules.
web/e2e/run-detail.spec.ts Remove unused fixture import.
web/e2e/fixtures/mock-data.ts Remove unused Playwright Page type import.
web/e2e/dashboard.spec.ts Remove unused fixture imports.
web/dist/index.html Update built dashboard entrypoint asset hash reference.
version.txt Bump repo version.
site/src/content/docs/reference/waza-yaml.mdx Document files config section in YAML reference.
site/src/content/docs/reference/statistical-fields.mdx Clarify per-run flat usage block vs aggregate usage.
site/src/content/docs/reference/schema.mdx Document instruction_files and files sections in schema reference.
site/src/content/docs/quick-start.mdx Update onboarding for skills vs custom agents and install flows.
site/src/content/docs/index.mdx Clarify install command context (Bash environments).
site/src/content/docs/guides/graders.mdx Add note about prompt grader mapping + tool_constraint auto-injection.
site/src/content/docs/getting-started.mdx Update installation guidance and remove SKILL type field mention.
site/package.json Bump Astro version.
site/astro.config.mjs Add “Evaluating Custom Agents” to Guides nav.
schemas/task.schema.json Add instruction_files support at task level.
schemas/eval.schema.json Add instruction_files support at eval config level.
schemas/config.schema.json Add files naming conventions to project config schema.
internal/workspace/workspace_test.go Add tests for nested skill discovery + custom eval filename resolution.
internal/wizard/wizard.go Remove skill “type” selection; validate name/description in wizard.
internal/webapi/store.go Map transcript events via typed Copilot SDK event data helpers.
internal/webapi/additional_test.go Update web API tests to new Copilot SDK typed event data.
internal/validation/schema.go Fix comment to reflect LoadEvalSpec naming.
internal/validation/schema_test.go Add schema tests for instruction_files and custom-agent example.
internal/utils/testdata/sample_events_slog.jsonl Update sample tool completion event shape to include success.
internal/utils/logging_test.go Update slog mapping test to typed assistant.message event.
internal/utils/copilot_log_iterator_test.go Update iterator tests to typed session event constants/data.
internal/trigger/runner.go Auto-size workers for trigger tests; centralize MCP server conversion.
internal/transcript/transcript_test.go Update transcript tests for typed event data + renamed stimulus type.
internal/suggest/suggest.go Switch eval YAML validation struct to EvalSpec.
internal/suggest/suggest_test.go Update tests for EvalSpec rename.
internal/suggest/prompt.go Update prompt text to reference EvalSpec not BenchmarkSpec.
internal/skill/agent.go Add parsing/loading helpers for .agent.md frontmatter.
internal/skill/agent_test.go Add tests for .agent.md parsing and filename detection.
internal/scoring/scoring.go Include skill body in heuristic scorer trigger/clarity detection.
internal/scoring/scoring_test.go Add coverage for trigger detection in SKILL body content.
internal/scaffold/scaffold.go Read project file naming config; parameterize eval template/task filenames.
internal/scaffold/scaffold_test.go Add tests for custom task glob/suffix and project file naming config.
internal/projectconfig/schema_parity_test.go Assert schema defaults match new files Go defaults.
internal/projectconfig/config.go Add files config, defaults, and validation.
internal/projectconfig/config_test.go Add coverage for files defaults/merge and invalid config errors.
internal/orchestration/workers.go Add worker auto-sizing/clamping utility.
internal/orchestration/workers_test.go Test worker resolution behavior and cap logging.
internal/orchestration/skill_discovery.go Extend discovery to support .agent.md in skill directories.
internal/orchestration/skill_discovery_test.go Add discovery tests for .agent.md and SKILL priority.
internal/orchestration/csv_integration_test.go Update orchestration CSV tests to EvalSpec/EvalRunner.
internal/orchestration/agent_graders.go Inject implicit tool_constraint grader for agents declaring tools.
internal/models/testcase.go Add instruction_files; rename stimulus/expectation types with aliases.
internal/models/testcase_test.go Add test for loading instruction_files.
internal/models/outcome.go Add per-run Usage pointer on RunResult.
internal/models/grader_validation_test.go Update to LoadEvalSpec.
internal/models/grader_params_test.go Update to LoadEvalSpec.
internal/models/events_test.go Update transcript event JSON round-trip to typed data structures.
internal/models/baseline_test.go Update baseline YAML serialization test to EvalSpec.
internal/jsonrpc/handlers.go Update JSON-RPC handlers to load EvalSpec.
internal/graders/trigger_grader_test.go Update tests to renamed stimulus type.
internal/graders/run_test.go Update tests to renamed expectation type.
internal/graders/inline_script_grader_test.go Update to typed assistant message event data.
internal/graders/grader.go Add Executor interface to grader context.
internal/execution/usage.go Populate per-run Usage and re-aggregate from RunResult.Usage.
internal/execution/skill_injection_test.go Add tests for instruction system message formatting.
internal/execution/session_usage_collector.go Update usage collection to typed event data helpers.
internal/execution/session_events_collector.go Update event parsing to typed Copilot SDK event data helpers.
internal/execution/session_events_collector_test.go Update tests to typed SDK data + add helper for error messages.
internal/execution/sdkclient.go Add shared singleton Copilot SDK client API.
internal/execution/sdkclient_test.go Add tests for shared client singleton + shutdown idempotency.
internal/execution/mock.go Expand mock output to echo task/context/resources; support skipping workspace capture.
internal/execution/mock_engine_test.go Add tests for mock output including resource content/truncation.
internal/execution/engine.go Expand execution request options (instructions/tools/streaming/ephemeral/etc).
internal/execution/engine_response_test.go Update message extraction tests for typed assistant message events.
internal/execution/copilot_engine_test.go Add tests for grader-style ephemeral session behaviors and options.
internal/embedded/zcopilot_windows_arm64.go Bump embedded Copilot CLI bundle version/hash (Windows ARM64).
internal/embedded/zcopilot_windows_amd64.go Bump embedded Copilot CLI bundle version/hash (Windows AMD64).
internal/embedded/zcopilot_linux_arm64.go Bump embedded Copilot CLI bundle version/hash (Linux ARM64).
internal/embedded/zcopilot_linux_amd64.go Bump embedded Copilot CLI bundle version/hash (Linux AMD64).
internal/embedded/zcopilot_darwin_arm64.go Bump embedded Copilot CLI bundle version/hash (macOS ARM64).
internal/embedded/zcopilot_darwin_amd64.go Bump embedded Copilot CLI bundle version/hash (macOS AMD64).
internal/embedded/zcopilot_1.0.49_windows_arm64.exe.zst Add new embedded Copilot CLI binary blob (LFS pointer).
internal/embedded/zcopilot_1.0.49_windows_arm64.exe.license Add license for embedded Copilot CLI (Windows ARM64).
internal/embedded/zcopilot_1.0.49_windows_amd64.exe.zst Add new embedded Copilot CLI binary blob (LFS pointer).
internal/embedded/zcopilot_1.0.49_windows_amd64.exe.license Add license for embedded Copilot CLI (Windows AMD64).
internal/embedded/zcopilot_1.0.49_linux_arm64.zst Add new embedded Copilot CLI binary blob (LFS pointer).
internal/embedded/zcopilot_1.0.49_linux_arm64.license Add license for embedded Copilot CLI (Linux ARM64).
internal/embedded/zcopilot_1.0.49_linux_amd64.zst Add new embedded Copilot CLI binary blob (LFS pointer).
internal/embedded/zcopilot_1.0.49_linux_amd64.license Add license for embedded Copilot CLI (Linux AMD64).
internal/embedded/zcopilot_1.0.49_darwin_arm64.zst Add new embedded Copilot CLI binary blob (LFS pointer).
internal/embedded/zcopilot_1.0.49_darwin_arm64.license Add license for embedded Copilot CLI (macOS ARM64).
internal/embedded/zcopilot_1.0.49_darwin_amd64.zst Add new embedded Copilot CLI binary blob (LFS pointer).
internal/embedded/zcopilot_1.0.49_darwin_amd64.license Add license for embedded Copilot CLI (macOS AMD64).
internal/embedded/zcopilot_1.0.2_windows_arm64.exe.zst Remove old embedded Copilot CLI binary blob (LFS pointer).
internal/embedded/zcopilot_1.0.2_windows_amd64.exe.zst Remove old embedded Copilot CLI binary blob (LFS pointer).
internal/embedded/zcopilot_1.0.2_linux_arm64.zst Remove old embedded Copilot CLI binary blob (LFS pointer).
internal/embedded/zcopilot_1.0.2_linux_amd64.zst Remove old embedded Copilot CLI binary blob (LFS pointer).
internal/embedded/zcopilot_1.0.2_darwin_arm64.zst Remove old embedded Copilot CLI binary blob (LFS pointer).
internal/embedded/zcopilot_1.0.2_darwin_amd64.zst Remove old embedded Copilot CLI binary blob (LFS pointer).
internal/embedded/generate/generate.go Allow pinning Copilot CLI bundler version via env var.
internal/discovery/discovery.go Respect configured eval filename during skill eval discovery.
internal/discovery/discovery_test.go Add tests for discovery using custom eval filename.
internal/copilotconfig/mcp.go Centralize robust MCP server config conversion/validation.
internal/config/config.go Rename BenchmarkConfig → EvalConfig with backward-compatible alias.
internal/config/config_test.go Update config tests to EvalConfig naming.
internal/cache/cache.go Update cache key + grader checks to accept EvalSpec.
install.sh Detect WSL and print guidance for native Windows installs.
go.sum Bump copilot-sdk module version checksums.
go.mod Bump copilot-sdk module version and adjust semver dependency placement.
extension.yaml Bump azd extension version.
examples/README.md Add custom-agent/required-skills-demo/rubrics sections to examples index.
examples/custom-agent/trigger_tests.yaml Add trigger tests for the custom-agent example.
examples/custom-agent/tasks/review-clean-code.yaml Add example task fixture for “clean code” review.
examples/custom-agent/tasks/find-xss.yaml Add example task fixture for XSS detection.
examples/custom-agent/tasks/find-sql-injection.yaml Add example task fixture for SQL injection detection.
examples/custom-agent/security-reviewer.agent.md Add example .agent.md custom agent definition.
examples/custom-agent/README.md Add documentation for the custom-agent example.
examples/custom-agent/fixtures/xss.html Add XSS fixture HTML file.
examples/custom-agent/fixtures/vulnerable.py Add SQL injection fixture Python file.
examples/custom-agent/fixtures/clean.go Add “clean” fixture Go file (excluded from builds).
examples/custom-agent/eval.yaml Add example eval spec targeting a custom agent.
examples/ci/README.md Update CI docs to reference binary install method.
docs/TUTORIAL.md Update tutorial to mention custom agents and refine install steps.
docs/SKILLS_CI_INTEGRATION.md Update CI integration to prefer binary install and revise troubleshooting.
docs/RELEASE.md Update release instructions to tag-push-first workflow narrative.
docs/GUIDE.md Update guide for .agent.md support and new install/source build guidance.
docs/GETTING-STARTED.md Update getting started to mention custom agents and remove SKILL type.
docs/DEMO-GUIDE.md Update demo guide install instructions wording and minor cleanup.
DEMO-SCRIPT.md Update demo script install comment.
cmd/waza/workspace_resolve.go Pass configured eval filename into workspace detection; improve error text.
cmd/waza/workspace_resolve_test.go Add test for configured eval filename resolution.
cmd/waza/tokens/suggest.go Ensure shared Copilot client is shutdown on suggest completion.
cmd/waza/tokens/helpers.go Pass configured eval filename into token command detection.
cmd/waza/newtask/converters.go Update copilot log parsing to typed event data helpers.
cmd/waza/newtask/converters_test.go Update test case stimulus naming.
cmd/waza/dev/loop.go Pass configured eval filename into dev workspace detection.
cmd/waza/dev/copilot.go Shutdown shared Copilot client; use project config for workspace detection.
cmd/waza/cmd_suggest.go Shutdown shared Copilot client after suggest command.
cmd/waza/cmd_serve.go Only start MCP stdio server when stdin is a terminal to avoid EOF crash.
cmd/waza/cmd_run_test.go Update runner injection to EvalRunner and config types.
cmd/waza/cmd_run_signal_test.go Add SIGTERM cancellation integration test for waza run.
cmd/waza/cmd_quality.go Shutdown shared Copilot client after quality command.
cmd/waza/cmd_new.go Respect project file naming config in scaffolding; remove SKILL type metadata.
cmd/waza/cmd_new_test.go Add tests for custom file naming; assert SKILL frontmatter no longer includes type.
cmd/waza/cmd_new_task.go Ensure shared Copilot client shutdown in new-task command teardown.
cmd/waza/cmd_new_task_test.go Update test case stimulus naming.
cmd/waza/cmd_models.go Ensure shared Copilot client is shutdown in models command teardown.
cmd/waza/cmd_grade.go Update to LoadEvalSpec and config types.
cmd/waza/cmd_eval_test.go Add test verifying eval scaffolding respects custom file naming config.
cmd/waza/cmd_coverage.go Extend coverage discovery to .agent.md while preserving SKILL priority.
cmd/waza/cmd_coverage_test.go Add coverage tests for agent discovery and SKILL priority.
cmd/waza/cmd_check.go Use configured detect options when attempting workspace auto-detection.
cmd/waza/cmd_check_workspace_test.go Add tests for nested skill path detection with separated evals.
.squad/decisions/inbox/saul-doc-audit-2025-07-15.md Add doc audit decision record for onboarding entry points.
.squad/decisions/inbox/livingston-custom-agent-docs.md Add decision record for custom agent docs structure.
.squad/decisions/inbox/linus-mock-includes-resources.md Add decision record for mock engine echo behavior.
.squad/decisions/inbox/linus-custom-agent-support.md Add decision record for representing agents as SkillInfo.
.squad/decisions/inbox/linus-agent-tool-injection.md Add decision record for implicit tool_constraint injection.
.squad/agents/saul/history.md Update documentation audit history notes.
.squad/agents/livingston/history.md Update Livingston’s history with custom agent docs completion.
.github/workflows/waza-eval.yml Expand workflow path filters to include execution/orchestration/grader changes.
.github/workflows/pages.yml Trigger Pages build after successful Release; bump Node version.

Copilot's findings

Files not reviewed (2)
  • site/package-lock.json: Language not supported
  • web/package-lock.json: Language not supported
  • Files reviewed: 192/202 changed files
  • Comments generated: 8

Comment thread internal/execution/sdkclient.go
Comment thread internal/execution/session_events_collector.go
Comment thread web/playwright.config.ts
Comment thread web/src/components/RunsTable.tsx
Comment thread internal/skill/agent.go
Comment thread version.txt
Comment thread extension.yaml
Comment thread site/src/content/docs/guides/graders.mdx
Copilot AI added 2 commits May 23, 2026 10:24
Resolve latest main conflict by preserving BYOK provider wiring alongside the shared Copilot client lifecycle.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Check the capping notice write result to satisfy errcheck.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Copilot AI review requested due to automatic review settings May 23, 2026 14:28

Copilot AI left a comment

Contributor

Copilot's findings

  • Files reviewed: 16/17 changed files
  • Comments generated: 1

Comment thread internal/orchestration/workers_test.go Outdated
Copilot AI added 2 commits May 23, 2026 11:14
Use utils.Ptr for shared client bool options and make the worker auto-size test safe on single-core runners.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Copilot AI review requested due to automatic review settings May 23, 2026 19:47

Copilot AI left a comment

Contributor

Copilot's findings

  • Files reviewed: 17/18 changed files
  • Comments generated: 3

Comment thread internal/execution/copilot.go Outdated
Comment thread internal/trigger/runner.go Outdated
Comment thread cmd/waza/cmd_run.go Outdated
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Copilot AI review requested due to automatic review settings May 23, 2026 19:53

Copilot AI left a comment

Contributor

Copilot's findings

  • Files reviewed: 17/18 changed files
  • Comments generated: 1

Comment thread cmd/waza/cmd_run.go
Copilot AI added 2 commits May 23, 2026 16:01
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Copilot AI review requested due to automatic review settings May 23, 2026 20:04

@spboyer spboyer left a comment

Member

The requested changes have been addressed and the review threads are resolved.

Copilot AI left a comment

Contributor

Copilot's findings

  • Files reviewed: 22/23 changed files
  • Comments generated: 0 new

@spboyer spboyer merged commit c842fdf into main May 23, 2026
8 checks passed
@spboyer spboyer deleted the copilot/improve-concurrency-design branch May 23, 2026 20:09
@spboyer spboyer mentioned this pull request May 23, 2026
15 tasks

Development

Successfully merging this pull request may close these issues.

Improve concurrency design

4 participants