Add comprehensive test suite for Q CLI provider by gutosantos82 · Pull Request #16 · awslabs/cli-agent-orchestrator

gutosantos82 · 2025-10-31T00:59:09Z

Summary

Adds complete test coverage for the Q CLI provider with 51 tests achieving 100% code coverage.

What's Included

42 unit tests: Fast, mocked tests covering initialization, status detection, message extraction, handoff scenarios, and edge cases
9 integration tests: End-to-end tests with real Q CLI operations
Test fixtures: Realistic terminal output with ANSI codes for all Q CLI states
Documentation: Comprehensive README with usage examples and CI/CD integration

Running Tests

# Unit tests (fast, <1s)
pytest test/providers/test_q_cli_unit.py -v

# Integration tests (requires Q CLI)
pytest test/providers/test_q_cli_integration.py -v

tuanknguyen

Great addition overall! The main change I request is to follow the uv convention throughout to stay consistent with the repo. Another ask is to add the example GitHub Actions you had as an actual yml file in .github/workflows (with a few small modifications)

Addresses all review feedback from PR awslabs#16

tuanknguyen

LGTM!

…t stabilization Three bugs fixed in Gemini CLI provider (lessons #16, #17, #18): 1. Ink TUI keeps idle prompt visible during processing — added PROCESSING_SPINNER_PATTERN (Braille dots + "esc to cancel") check in get_status() to avoid premature COMPLETED when MCP tools are still executing. 2. Premature COMPLETED between text output and MCP tool call — replaced sequential wait_for_status + terminal count check with combined _wait_for_supervisor_done() polling in supervisor E2E tests. 3. Ink TUI shows idle prompt before -i prompt is processed — added _uses_prompt_interactive flag so initialize() waits for COMPLETED (not IDLE) when -i is used, preventing lost messages. E2E test improvements across all providers: - Accept both "idle" and "completed" as valid post-initialization states - Add stabilization delay after COMPLETED detection in handoff/assign tests - Add missing get_terminal_status import in test_handoff.py E2E results: 13/14 pass (7/7 Gemini, 6/7 Codex — 1 pre-existing Codex supervisor_assign output extraction failure). Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* feat(opencode): Phase 1 — foundation primitives for OpenCode CLI provider - Add ProviderType.OPENCODE_CLI = "opencode_cli" to the provider enum - Add OPENCODE_CONFIG_DIR / OPENCODE_AGENTS_DIR / OPENCODE_CONFIG_FILE path constants pointing at ~/.aws/opencode_cli/ - New OpenCodeAgentConfig Pydantic model (description, mode, permission) that serializes to OpenCode-compatible YAML frontmatter via frontmatter.dumps() - New cao_tools_to_opencode_permission() translator: two-step algorithm from §9 of the design doc (shorthand expansion + CAO-category → OpenCode tool mapping + hardcoded non-vocabulary deny/allow policies) - New opencode_config.py read-modify-write helper for the shared opencode.json (upsert_mcp_server, upsert_agent_tools, remove_agent_tools, read_config, write_config) - Port 5 TUI probe captures into test/providers/fixtures/ (plain + ANSI variants for idle-splash, idle-post-completion, processing, completed, permission states) - 54 new unit tests covering all Phase 1 modules; all 1368 tests pass Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * fix(opencode): Phase 1 review items — ANSI fixture, model cleanup, guard-rail Item 1: Replace opencode_cli_processing.ansi.txt with a genuine PROCESSING frame re-captured via tmux probe (md5 9cbe2723, distinct from completed frame). Add test/providers/fixtures/OPENCODE_FIXTURES.md documenting all fixture sources and the remaining idle_post_completion.ansi.txt reuse. Item 2: Remove dead Pydantic v1 `class Config: exclude_none = True` block from OpenCodeAgentConfig — it is a no-op under Pydantic v2. Item 3: Add inline comment to OpenCodeAgentConfig.permission documenting the deliberate Phase 1 type simplification and when to widen it. Item 4: Replace unreachable `else: result[tool] = "deny"` in opencode_permissions.py with `raise AssertionError(...)` so any future tool added to ALL_OPENCODE_TOOLS without a policy update fails loudly. Item 5: Add test_noop_on_completely_missing_file to TestRemoveAgentTools — exercises the read_config() skeleton-return path when opencode.json does not exist yet. All 1369 tests pass; mypy/black/isort clean. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * feat(install): add opencode_cli provider branch with --auto-approve flag Adds the `opencode_cli` branch to `cao install` (Phase 2): - writes agent `<name>.md` with YAML frontmatter (description, mode, permission) using compose_agent_prompt for the body and cao_tools_to_opencode_permission for the per-tool allow/ask/deny map - `--auto-approve` flag emits `allow` instead of `ask` for permitted tools; has no effect on other providers - if the agent profile declares mcpServers, upserts top-level mcp/tools entries (default-deny) and per-agent tool re-enables into opencode.json - full unit-test coverage in test/cli/commands/test_install_opencode.py (fresh install, idempotency, auto-approve, MCP wiring, config preservation, safe filename) Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * fix(install): apply Phase 2 review polish - Rename agent_config_oc → agent_config in opencode_cli branch for consistency with Kiro/Q/Copilot sibling branches (Item 1) - Strengthen test_agent_md_has_body: assert sentinel prompt text via profile.prompt frontmatter field instead of weak non-empty check (Item 2) - Bump live smoke-test subprocess timeout 30s → 60s to survive cold-cache npm plugin installs on CI (Item 4) Items 3 (MCP collision coverage already in Phase 1) and 5 (context-file parent mkdir — out of Phase 2 scope) intentionally not addressed. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * feat(providers): add OpenCodeCliProvider (Phase 3) Implements the opencode_cli runtime provider per §8 of the design doc: - OpenCodeCliProvider with full BaseProvider interface (initialize, get_status, extract_last_message_from_script, exit_cli, cleanup) - 5-state detection (IDLE/PROCESSING/COMPLETED/WAITING_USER_ANSWER/ERROR) with line-level position guard against stale alt-screen esc-interrupt remnants (lesson #16) - COMPLETED vs IDLE-post-completion distinguished by checking for a subsequent ▣ token after the last full completion marker - 120s initialize() timeout for first-run npm install cold-start (§8.2) - Inline-env launch command with all stability env vars (§5) - --model flag included only when profile.model is set (§3.1 exception) - Registered in ProviderManager; "opencode_cli" added to PROVIDERS_REQUIRING_WORKSPACE_ACCESS in launch.py - 43 unit tests at 96% line coverage against Phase 1 fixtures Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * docs: add Phase 3 OpenCode provider runtime development walkthrough report Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * fix(opencode): Phase 3 review polish — correct report line count, add dual-pattern comment Item 1: development report corrected 125 → 332 lines for opencode_cli.py. Item 4: inline comment at extract_last_message_from_script explains why the unanchored r"┃\s{2}" is used instead of the module-level USER_MESSAGE_PATTERN. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * feat(opencode): Phase 4 — e2e test, provider docs, README/CHANGELOG - test/e2e/conftest.py: add require_opencode fixture (skips if opencode not on PATH) - test/e2e/test_assign.py: add TestOpenCodeCliAssign with data_analyst, report_generator, and assign_with_callback tests covering all four orchestration modes - docs/opencode-cli.md: new provider doc covering prerequisites, quick start, config isolation, permission/tool mapping, MCP wiring, known limitations, troubleshooting - README.md: add opencode_cli row to provider table + cao launch example - CHANGELOG.md: add Unreleased entry announcing OpenCode CLI provider Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * docs: add Phase 4 OpenCode e2e and docs development walkthrough report Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * fix(opencode): translate CAO mcpServer format to OpenCode's opencode.json format (Phase 3 regression) CAO profiles store MCP servers with {type: "stdio", command: str, args: list}. OpenCode's opencode.json requires {type: "local", command: list, enabled: true}. The install branch was passing raw CAO config directly, causing OpenCode to reject the config with "Configuration is invalid: Invalid input mcp.cao-mcp-server". Fix: add translate_mcp_server_config() to opencode_config.py and call it in the opencode_cli install branch before upsert_mcp_server(). Also translates env→environment. 6 unit tests added for the translator. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * docs(opencode): add --yolo DANGEROUS caveat to permission troubleshooting (Phase 4 review polish) Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * docs: update Phase 4 report with e2e results and Phase 3 regression fix notes Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * feat(opencode): native skill discovery via OPENCODE_CONFIG_DIR/skills symlink At install time, create a skills → SKILLS_DIR symlink under OPENCODE_CONFIG_DIR so OpenCode auto-discovers CAO skills through its native skill tool (§5.1). Uses profile.system_prompt or profile.prompt as the lean agent body — the skill catalog is no longer baked into the OpenCode system prompt. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * fix(opencode): handle Nm Ns duration format and extend extraction buffer Two status/extraction bugs revealed by e2e runs with full system prompts: 1. COMPLETION_MARKER_PATTERN now matches the Nm Ns duration format that OpenCode emits for responses that take more than 60 seconds (e.g. "1m 8s"). The old pattern only matched the pure-seconds form, causing get_status() to stall at PROCESSING indefinitely for longer turns. 2. Add extraction_tail_lines property to BaseProvider (default None) and override to 2000 in OpenCodeCliProvider. terminal_service.get_output uses this value for the LAST-mode tmux capture so long responses don't push the user-message marker (┃ ) beyond the 200-line default window. Status-check captures are unaffected. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * refactor(opencode): single capture-pane per get_output call; add wrong-target symlink test Item 2: Eliminate double capture-pane in get_output(mode=LAST). Previously the function always captured at 200 lines then recaptured if the provider declared extraction_tail_lines. Now FULL mode returns after a single capture at the default depth; LAST mode resolves extract_lines from the provider once and makes exactly one capture before the retry loop. Item 1: Add test_warns_and_skips_when_symlink_points_elsewhere to TestEnsureSkillsSymlink, covering the branch at opencode_config.py:37-42 where the target is a symlink that resolves to a different directory. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * refactor(opencode): rename on-disk config directory from opencode_cli to opencode Change OPENCODE_CONFIG_DIR from ~/.aws/opencode_cli to ~/.aws/opencode in constants.py; OPENCODE_AGENTS_DIR and OPENCODE_CONFIG_FILE update transitively. Update all path string references in docs, CHANGELOG, and the constants unit test. Provider identifier (ProviderType.OPENCODE_CLI.value == "opencode_cli") is unchanged. Add CHANGELOG migration note for users who need to re-run cao install. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * fix(opencode): fall back to first agent-indented line when user message scrolled off viewport OpenCode renders in alt-screen mode so the tmux scrollback only holds the current visible frame (~41 lines, history_size≈2). For long responses the user-message bar (┃ ) scrolls off the top before extraction runs, causing "No user message found". When no ┃ is found before the completion marker, scan for the first 5-space-indented agent line as the left boundary instead of raising. The visible frame already contains only the current turn's content, so multi-turn disambiguation is not needed here. Adds unit test test_fallback_extracts_when_user_message_scrolled_off. e2e: 3/3 PASSED in 161s on port 9888. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * refactor(terminal_service): guard build_skill_catalog() call and update skill-delivery comments Cleanup 1: build_skill_catalog() now runs only when the provider is in RUNTIME_SKILL_PROMPT_PROVIDERS, skipping the file reads/YAML parsing/Pydantic validation for providers that deliver skills natively (OpenCode symlink, Kiro skill:// resources) or via install-time baking (Q, Copilot). The skill_prompt kwarg at the create_provider call site simplifies to skill_prompt=skill_prompt since the guard now lives one line above. Cleanup 2: update comments in the RUNTIME_SKILL_PROMPT_PROVIDERS block and create_terminal Steps 3b/4 to reflect Phase 5's native OpenCode skill discovery. Adds two new tests asserting the lazy-call invariant: - test_build_skill_catalog_called_for_runtime_prompt_provider (call_count == 1) - test_build_skill_catalog_not_called_for_native_or_baked_provider (parametrized over opencode_cli, kiro_cli, q_cli, copilot_cli; assert_not_called) Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * docs: update Phase 6 dev report with alt-screen extraction fix and cleanup commits Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * docs(opencode): correct extraction_tail_lines docstring + black format + cleanup polish Item 1: black reformats terminal_service.py line 155 from a three-line expression to the single 100-char form black prefers. Item 2: rewrite extraction_tail_lines docstring — the old text claimed responses push ┃ beyond a 200-line window, which is wrong; OpenCode's alt-screen mode caps history_size near 2 making the override a no-op. Docstring now accurately describes the belt-and-braces rationale and cross-references the within-viewport fallback. Item 3: add single-turn alt-screen assumption comment to the normal extraction path. Item 4: CHANGELOG migration note gains a rm -rf cleanup hint for pre-release users. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * docs(opencode): correct OPENCODE_DISABLE_MOUSE rationale and add UX callout - Update design doc stability-env-vars entry: footer patterns (ctrl+p, esc interrupt) are pinned and scroll-safe; the completion marker (▣ agent · model · Ns) is conversation content and scrolls off, preventing COMPLETED detection if mouse reporting is enabled - Add 'Scrolling enters tmux copy mode' Known Limitations entry in opencode-cli.md explaining the trade-off and how to work around it Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * untrack out of scope docs * fix(opencode): align auto-approve, agent ID, and MCP cleanup semantics Addresses PR 193 review feedback. - Drop `cao install --auto-approve` and the `auto_approve` arg on the permission translator. CAO owns the permission decision, so `cao_tools_to_opencode_permission` now emits only `allow` or `deny`. `cao launch --auto-approve` retains its repo-wide meaning (skip CAO's confirmation prompt) and no longer has a provider-specific reinterpretation. - Remove stale `agent.<id>.tools` entries when a profile is reinstalled without `mcpServers` so revoked MCP grants do not persist. - Introduce `to_opencode_agent_id()` and use it consistently for the installed `.md` filename and the `agent.<id>.tools` key in `opencode.json`, keeping both aligned with the runtime `opencode --agent` argument for slash-containing profile names. - Strip phase numbers and design-doc section references from shipped source files and the provider doc per reviewer request. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * docs(opencode): drop reference to removed design doc from changelog Addresses review comment: the design doc link in the Unreleased entry referred to a file that is not included in this PR. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * refactor(opencode): scope extraction_tail_lines to OpenCodeCliProvider The property was opencode-specific but lived on BaseProvider, which meant every provider carried an attribute it had no use for. Remove it from the base class, keep it as a provider-local property on OpenCodeCliProvider, and have terminal_service.get_output read it via a getattr capability check so the base class stays agnostic. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * test(opencode): close codecov gaps in cli provider and permission translator Adds four tests to cover lines flagged as missing on the codecov report: - get_status ERROR fallback for non-empty output with no recognized marker - extract_last_message_from_script residual ``┃`` line + blank-line branches when raw_response contains leftover bar-prefixed lines - extract_last_message_from_script empty-response ValueError - cao_tools_to_opencode_permission AssertionError when a tool appears in ALL_OPENCODE_TOOLS without a matching policy Brings opencode_cli.py and opencode_permissions.py to 100% patch coverage. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * docs(opencode): mark provider experimental — single-agent flows only Add a warning badge to docs/opencode-cli.md and tag the README provider table row, both linking the post-settle inbox-delivery deadlock tracked in #203. Multi-agent flows are not yet reliable on opencode_cli; this signals the constraint to evaluators before they hit it. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com> Co-authored-by: haofeif <56006724+haofeif@users.noreply.github.com>

Add unit tests and integration tests

70bf63d

tuanknguyen suggested changes Oct 31, 2025

View reviewed changes

chore: standardize test suite to uv convention

0593c3a

Addresses all review feedback from PR awslabs#16

tuanknguyen approved these changes Oct 31, 2025

View reviewed changes

tuanknguyen merged commit 9f95a9f into awslabs:main Oct 31, 2025

gutosantos82 deleted the add-q-cli-tests-clean branch November 2, 2025 22:18

haofeif mentioned this pull request Apr 18, 2026

feat(memory): add persistent agent memory system (Phase 1) #179

Draft

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add comprehensive test suite for Q CLI provider#16

Add comprehensive test suite for Q CLI provider#16
tuanknguyen merged 2 commits into
awslabs:mainfrom
gutosantos82:add-q-cli-tests-clean

gutosantos82 commented Oct 31, 2025

Uh oh!

tuanknguyen left a comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

tuanknguyen left a comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

gutosantos82 commented Oct 31, 2025

Summary

What's Included

Running Tests

Uh oh!

tuanknguyen left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

tuanknguyen left a comment

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants