docs: Add CI/CD integration guide (GitHub Actions, Azure DevOps) #97
spboyer wants to merge 10 commits into
- Add decision entry (2026-03-05): Token Diff Distribution Strategy (Issue microsoft#81)
  - Approved: CLI-first architecture with optional thin wrapper action
  - Implementation: `waza tokens diff` command + `.github/actions/token-diff`
  - Rationale: Serves all users (GitHub, non-GitHub CI, azd) without vendor lock-in
- Add session log: 2026-03-05T00-26-rusty-token-diff-design.md
  - Outcome: CLI + wrapper recommendation
  - Key design choices documented
  - Related Issue microsoft#81

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- Merge 3 inbox decisions into decisions.md; delete inbox files
- Write orchestration log for 4 waves (triage, CI guide, PR gen, verification)
- Create session log documenting full pipeline activation across 7 agents
- Captured: PR reviews, issue triage, multi-platform CI/CD guide, 5 new feature PRs
- Status: 3 PRs approved, 1 rebased, 2 follow-ups in flight; no blockers

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Pull request overview
Expands the docs site’s CI/CD guidance into a multi-platform “CI/CD Integration” guide, aiming to show how to install and run waza in common pipeline systems (GitHub Actions, Azure DevOps, GitLab CI) and capture best practices/troubleshooting.
Changes:
- Rewrites `ci-cd.mdx` into a comprehensive CI/CD integration guide with examples for GitHub Actions, Azure DevOps, and GitLab CI.
- Updates the site lockfile (minor metadata changes).
- Adds/updates multiple `.squad/` session and orchestration logs documenting the parallel-work pipeline.
Reviewed changes
Copilot reviewed 12 out of 13 changed files in this pull request and generated 7 comments.
| File | Description |
|---|---|
| site/src/content/docs/guides/ci-cd.mdx | Major expansion of CI/CD documentation (install methods, platform examples, best practices, troubleshooting). |
| site/package-lock.json | Lockfile metadata updates (adds peer: true flags to some entries). |
| .squad/orchestration-log/2026-03-05T00-37-pipeline-wave-4.md | New orchestration log entry for wave 4. |
| .squad/orchestration-log/2026-03-05T00-37-pipeline-wave-3.md | New orchestration log entry for wave 3. |
| .squad/orchestration-log/2026-03-05T00-37-pipeline-wave-2.md | New orchestration log entry for wave 2. |
| .squad/orchestration-log/2026-03-05T00-37-pipeline-wave-1.md | New orchestration log entry for wave 1. |
| .squad/log/2026-03-05T00-36-issue-assignment-pipeline.md | New session log capturing the assignment pipeline directive. |
| .squad/log/2026-03-05T00-36-full-pipeline-activation.md | New session log summarizing the full pipeline activation. |
| .squad/log/2026-03-05T00-26-rusty-token-diff-design.md | New session note documenting token-diff distribution strategy. |
| .squad/decisions.md | Appends multiple decisions and summaries related to the pipeline and docs work. |
| .squad/agents/rusty/history.md | Adds a new “Batch PR Review & Issue Triage” history entry (currently dated in the future). |
| .squad/agents/livingston/history.md | Adds a completion summary for Issue #89 and notes about doc structure/patterns. |
| .squad/agents/linus/history.md | Adds notes about Windows CRLF stability and a test-variant reminder. |
Files not reviewed (1)
- site/package-lock.json: Language not supported
Comments suppressed due to low confidence (2)
site/src/content/docs/guides/ci-cd.mdx:265
- This Azure DevOps snippet assumes `waza tokens diff --ref1/--ref2` and parses `.exceeded` from the JSON output. The implemented command is `waza tokens compare [ref1] [ref2]`, and its JSON report nests budget info under `summary.exceededCount` (no top-level `exceeded` boolean). Update the command and jq query (or rely on the `--strict` exit code) so the pipeline gates correctly.
- script: |
waza tokens diff \
--ref1 origin/main \
--ref2 HEAD \
--format json > token-diff.json
EXCEEDED=$(jq '.exceeded' token-diff.json)
if [ "$EXCEEDED" == "true" ]; then
echo "##vso[task.logissue type=error;]Token budget exceeded"
exit 1
fi
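One way to bring the quoted step in line with this comment is sketched below. It is not the guide's final text: it assumes `waza` and `jq` are already installed on the agent, that `origin/main` has been fetched, and that `--format json` on `waza tokens compare` emits the report containing `summary.exceededCount`. The `displayName` is illustrative.

```yaml
# Sketch only: assumes waza and jq are on PATH and origin/main is available locally.
steps:
  - bash: |
      set -euo pipefail
      # Refs are positional, per the review comment; --format json is assumed
      # to produce the report that carries summary.exceededCount.
      waza tokens compare origin/main HEAD --format json > token-diff.json
      # jq -e exits non-zero when the expression is false or null, so a missing
      # field fails the step instead of passing silently.
      if ! jq -e '.summary.exceededCount == 0' token-diff.json > /dev/null; then
        echo "##vso[task.logissue type=error;]Token budget exceeded"
        exit 1
      fi
    displayName: Check token budget
```

If `--strict` already makes the command exit non-zero when a budget is exceeded, as the comment suggests, the `jq` check could likely be dropped and the exit code used directly.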
site/src/content/docs/guides/ci-cd.mdx:6
`Aside` is imported here but never used in the document. Remove the unused import or add the intended `<Aside ...>` blocks so the MDX stays tidy and avoids unused-import warnings in tooling.
import { Tabs, TabItem, Aside } from '@astrojs/starlight/components';
| Use `waza tokens diff` to track token usage across PRs and fail if budgets are exceeded: | ||
|
|
||
| ```yaml | ||
| strategy: | ||
| matrix: | ||
| model: | ||
| - gpt-4o | ||
| - claude-sonnet-4.6 | ||
| - claude-opus-4 | ||
| max-parallel: 3 | ||
|
|
||
| steps: | ||
| - name: Run evals for ${{ matrix.model }} | ||
| run: waza run --model "${{ matrix.model }}" -o "results-${{ matrix.model }}.json" | ||
| - name: Check token budget | ||
| run: | | ||
| waza tokens diff origin/main HEAD \ | ||
| --format table \ | ||
| --strict |
The CLI in this repo currently provides `waza tokens compare [ref1] [ref2]` (with `--format` and `--strict`), not `waza tokens diff`. Also, `tokens compare` takes refs as positional args (no `--ref1`/`--ref2`). Update these examples to match the actual command so they’re copy/pasteable.
This issue also appears on line 255 of the same file.
See below for a potential fix:
Use `waza tokens compare` to track token usage across PRs and fail if budgets are exceeded:
```yaml
- name: Check token budget
run: |
waza tokens compare origin/main HEAD \
|
|
||
| # Windows (PowerShell) | ||
| $ProgressPreference = 'SilentlyContinue' | ||
| Invoke-WebRequest -Uri "https://github.com/microsoft/waza/releases/latest" -OutFile waza.exe |
The Windows install example downloads `https://github.com/microsoft/waza/releases/latest` to `waza.exe`, which will save the HTML redirect page rather than the actual binary. Prefer the existing `install.ps1` flow (as documented in `site/src/content/docs/reference/releases.mdx`) or link to a specific release asset.
| Invoke-WebRequest -Uri "https://github.com/microsoft/waza/releases/latest" -OutFile waza.exe | |
| Invoke-WebRequest -Uri "https://raw.githubusercontent.com/microsoft/waza/main/install.ps1" -OutFile install.ps1 | |
| ./install.ps1 |
|
|
||
| permissions: | ||
| contents: read | ||
| pull-requests: write |
This workflow posts PR comments via `github.rest.issues.createComment`, which requires `issues: write` permission. Currently only `pull-requests: write` is granted; update the permissions block (or switch to an API that matches the granted scopes) so the comment step works on PRs.
| pull-requests: write | |
| pull-requests: write | |
| issues: write |
| issue_number: context.issue.number, | ||
| owner: context.repo.owner, | ||
| repo: context.repo.repo, | ||
| body: `### Eval Results: ${{ matrix.model }}\n\n✅ **Passed:** ${results.passed}\n❌ **Failed:** ${results.failed}` |
The PR comment script reads the JSON output and expects `results.passed`/`results.failed`, but `EvaluationOutcome` uses `summary.succeeded` and `summary.failed` (and other fields under `summary`). Adjust the field names (or use `waza run --format github-comment` to generate a ready-to-post Markdown comment) so the posted counts aren’t undefined.
| body: `### Eval Results: ${{ matrix.model }}\n\n✅ **Passed:** ${results.passed}\n❌ **Failed:** ${results.failed}` | |
| body: `### Eval Results: ${{ matrix.model }}\n\n✅ **Passed:** ${results.summary.succeeded}\n❌ **Failed:** ${results.summary.failed}` |
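For context, the suggested field names could fit into a complete comment step roughly as follows. This is a hedged sketch rather than the workflow from the PR: the `results-<model>.json` file name mirrors the `-o` argument quoted earlier in this review, the step name and `MODEL` environment variable are illustrative, and the JSON shape (`summary.succeeded`, `summary.failed`) is taken from the review comment, not verified against the CLI.

```yaml
# Sketch only: step name, MODEL variable, and results file name are assumptions.
- name: Comment eval results on the PR
  if: github.event_name == 'pull_request'
  uses: actions/github-script@v7
  env:
    # Passing the matrix value through env keeps it out of the script text.
    MODEL: ${{ matrix.model }}
  with:
    script: |
      const fs = require('fs');
      const model = process.env.MODEL;
      const results = JSON.parse(fs.readFileSync(`results-${model}.json`, 'utf8'));
      // Counts are read from `summary`, per the review comment.
      const { succeeded, failed } = results.summary;
      await github.rest.issues.createComment({
        issue_number: context.issue.number,
        owner: context.repo.owner,
        repo: context.repo.repo,
        body: `### Eval Results: ${model}\n\n✅ **Passed:** ${succeeded}\n❌ **Failed:** ${failed}`
      });
```

Reading the model name from `process.env` instead of interpolating `${{ matrix.model }}` into the script avoids quoting problems if a model name ever contains characters that are special in JavaScript.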
|
|
||
| ## Troubleshooting | ||
| ```bash | ||
| waza tokens check ./evals/my-skill/eval.yaml |
`waza tokens check` validates token limits for markdown files (e.g., `SKILL.md` or skill directories). Passing an eval spec path like `./evals/my-skill/eval.yaml` won’t do what this section suggests. Update the example to point at the skill content you want to budget (for example, the skill directory or `SKILL.md`).
| waza tokens check ./evals/my-skill/eval.yaml | |
| waza tokens check ./evals/my-skill/SKILL.md |
| ```bash | ||
| waza compare baseline-eval.yaml --model gpt-4o | ||
| ``` |
The `waza compare` command compares result JSON files (`waza compare <result1.json> <result2.json> ...`) and doesn’t take `--model` or an eval YAML path. This section’s example (`waza compare baseline-eval.yaml --model ...`) won’t work as written; update it to show running evaluations first (producing JSON outputs) and then comparing those outputs.
See below for a potential fix:
Then in CI, run both evaluations and compare their JSON result files:
```bash
# Run baseline model
waza run baseline-eval.yaml --output baseline-results.json
# Run same eval with a different model
waza run baseline-eval.yaml --model gpt-4o --output gpt-4o-results.json
# Compare the two runs
waza compare baseline-results.json gpt-4o-results.json
| --- | ||
| title: CI/CD Integration | ||
| description: Automate evaluation runs in GitHub Actions. | ||
| description: Integrate waza into your CI/CD pipeline — GitHub Actions, Azure DevOps, GitLab CI, and more. | ||
| --- |
This guide updates `guides/ci-cd.mdx`, but Issue #89 acceptance criteria references a new page `ci-integration.mdx`. If the canonical URL is now `guides/ci-cd`, please update the issue/PR text or add a redirect/alias so readers don’t end up with mismatched links and sidebar entries.
Codecov Report
❌ Patch coverage is
Additional details and impacted files
@@ Coverage Diff @@
## main #97 +/- ##
=======================================
Coverage ? 72.21%
=======================================
Files ? 128
Lines ? 14252
Branches ? 0
=======================================
Hits ? 10292
Misses ? 3196
Partials ? 764
)
* fix: add regression test and CHANGELOG for waza suggest deadlock fix

  Co-authored-by: spboyer <7681382+spboyer@users.noreply.github.com>

* fix: correct misspelling cancelled→canceled for lint

  Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: spboyer <7681382+spboyer@users.noreply.github.com>
Co-authored-by: Shayne Boyer <spboyer@live.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…soft#56)
* fix: make site base path configurable + remove unused workflow

  - Make astro.config.mjs read site/base from ASTRO_SITE and ASTRO_BASE env vars with GitHub Pages defaults. Azure SWA can now build with ASTRO_BASE=/ so images resolve correctly at root.
  - Remove unused squad-heartbeat.yml workflow.

  Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* fix: normalize base path and prevent // in nav links

  Address review feedback: normalize ASTRO_BASE to prevent double-slash URLs when base is '/'. Also fix Header.astro to strip trailing slash from BASE_URL before concatenating nav link paths.

  Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

---------

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
* fix: repair broken TestCopilotExecute_StartRespectsTimeout mock setup

  The test was mocking Start() expecting Execute() to call it, but Execute() calls CreateSession() directly; Start() is only called by Initialize(). Fixed to test Initialize() error propagation directly, which is the actual regression guard for the deadlock fix. This test failure was blocking CI on all open PRs.

  Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* fix: address review feedback: rename test, use errors.New

  - Rename TestCopilotExecute_StartRespectsTimeout to TestCopilotExecute_InitializePropagatesStartError to match what the test actually asserts
  - Use errors.New instead of fmt.Errorf for constant string (staticcheck S1028)
  - Remove unused fmt import

  Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

---------

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…ft#48 (microsoft#63)
* feat: add FileWriter service and refactor waza init inventory microsoft#48

  - Create internal/scaffold/writer.go with FileWriter type that encapsulates the create-if-missing + skip-if-exists pattern
  - FileWriter returns structured Inventory with per-entry outcomes (created/skipped)
  - Inventory.Fprint() renders aligned table with emoji indicators: ➕ for created, ✅ (already exists) for skipped
  - Refactor cmd/waza/cmd_init.go to use FileWriter instead of inline write loop
  - Inventory is always visible (not gated behind --verbose)
  - Add 8 tests in writer_test.go covering: create-if-missing, skip-if-exists, mixed outcomes, parent directory creation, inventory output, relative paths, empty content handling, and CreatedCount

  Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* docs: update Linus history with FileWriter work (microsoft#48)

  Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* feat: refactor waza new to use shared FileWriter microsoft#58 (microsoft#66)

  * feat: refactor waza new to use shared FileWriter microsoft#58

    Replace the inline write loop in cmd_new.go with the shared FileWriter from internal/scaffold/writer.go. Malformed SKILL.md detection still runs before FileWriter; the file is removed so FileWriter creates it fresh. Inventory now uses consistent ➕/✅ emoji indicators (always visible, not gated behind --verbose), matching the waza init behavior.

    Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

  * chore: update squad state for microsoft#58

    Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

  ---------

  Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* chore: remove .squad/ files from PR branch

  Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* chore: remove .squad/ files from PR branch

  Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* fix: improve error handling in FileWriter stat checks microsoft#48

  - Directory branch: explicitly handle IsNotExist vs other stat errors
  - File branch: detect directory-at-file-path type mismatch
  - Both branches: return errors on permission failures instead of masking
  - Add regression tests for type-mismatch error paths

  Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* fix: improve error handling in FileWriter stat checks microsoft#48

  - Directory branch: explicitly handle IsNotExist vs other stat errors
  - File branch: detect directory-at-file-path type mismatch
  - Both branches: return errors on permission failures instead of masking
  - Add regression tests for type-mismatch error paths

  Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* fix: improve error handling in FileWriter stat checks microsoft#48

  - Directory branch: explicitly handle IsNotExist vs other stat errors
  - Directory branch: error when path exists but is not a directory
  - File branch: detect directory-at-file-path type mismatch
  - Both branches: return errors on permission failures instead of masking
  - Add regression tests for type-mismatch error paths

  Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* fix: gofmt writer_test.go

  Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

---------

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Bumps [svgo](https://github.com/svg/svgo) from 4.0.0 to 4.0.1.
- [Release notes](https://github.com/svg/svgo/releases)
- [Commits](svg/svgo@v4.0.0...v4.0.1)

---
updated-dependencies:
- dependency-name: svgo
  dependency-version: 4.0.1
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
1bc0b8a to c92a810
Closing: rebuilding with clean branch (had leaked content from other PRs).
Pull request was closed
…t#159)
Closes microsoft#97

MockEngine now creates a temp workspace directory and sets `ExecutionResponse.WorkspaceDir`, enabling FileGrader and other workspace-dependent graders to work in test scenarios.

### Changes
- **`workspace.go`**: Extract `setupWorkspaceResources()` as a shared package-level helper with path-traversal protection
- **`mock.go`**: `Execute()` creates a temp dir, populates resources, sets `WorkspaceDir`; `Shutdown()` cleans it up
- **`copilot.go`**: `setupResources()` now delegates to the shared helper (no behaviour change)

### How it works
Same pattern as CopilotEngine: each `Execute()` call creates a fresh `waza-mock-*` temp directory, writes request resources into it, and sets `WorkspaceDir` on the response. `Shutdown()` removes it.

All existing tests pass.

---------

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Closes #89
Adds a comprehensive CI/CD integration guide to the GitHub Pages docs site at `site/src/content/docs/guides/ci-cd.mdx` covering: