End-to-end integration test: single agent receives and completes a task #24
Closed
Labels
- prio:high (Important, should be prioritized)
- scope:medium (1-3 days of work)
- spec:agent-system (DESIGN_SPEC Section 3 - Agent System)
- spec:providers (DESIGN_SPEC Section 9 - Model Provider Layer)
- spec:security (DESIGN_SPEC Section 12 - Security & Approval System)
- spec:task-workflow (DESIGN_SPEC Section 6 - Task & Workflow Engine)
- spec:tools (DESIGN_SPEC Section 11 - Tool & Capability System)
- type:test (Test coverage, test infrastructure)
Description
Context
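A comprehensive end-to-end test that validates the full single-agent pipeline, from task assignment through LLM calls and tool invocation to task completion or failure. This is the capstone test for milestone M3 and confirms that the agent system, model provider layer, tool system, security checks, and task engine integrate correctly.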
Acceptance Criteria
- Scenario 1: Agent with file tools creates a file from a task description
- Scenario 2: Agent without tools answers a question (text-only response)
- Scenario 3: A denied tool permission is handled gracefully (clear error, no crash)
- Scenario 4: Reaching the maximum number of iterations results in a clean failure with an informative message
- Mocked LLM provider used (no real API calls in CI)
- Happy path and error paths both covered
- Cost tracking validated: costs recorded correctly for each scenario
- Status transitions validated: correct lifecycle states observed
- Optional flag to run the tests against a real LLM for manual integration testing
- Tests are deterministic and reproducible
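The acceptance criteria above can be sketched as a small pytest-style harness. Everything here is hypothetical: `ScriptedProvider`, `Reply`, `run_agent`, the `Status` values, and the `cost_usd` field are illustrative stand-ins, not the project's actual agent engine, provider interface, or the tool registry from #15. The point is the shape of the test: a scripted provider replays canned LLM turns so each run is deterministic and makes no API calls, and each test asserts on the final status, the message, the recorded cost, and the observed lifecycle transitions.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Callable, Dict, List, Optional, Set


class Status(Enum):
    # Hypothetical lifecycle states; the real task engine may use different ones.
    ASSIGNED = "assigned"
    IN_PROGRESS = "in_progress"
    COMPLETED = "completed"
    FAILED = "failed"


@dataclass
class Reply:
    """One scripted LLM turn: a tool call if `tool` is set, else a final text answer."""
    text: str = ""
    tool: Optional[str] = None
    args: Dict[str, str] = field(default_factory=dict)
    cost_usd: float = 0.0


class ScriptedProvider:
    """Deterministic stand-in for an LLM provider: replays canned replies, never calls an API."""

    def __init__(self, replies: List[Reply]) -> None:
        self._replies = list(replies)

    def complete(self, history: List[str]) -> Reply:
        return self._replies.pop(0)


@dataclass
class Result:
    status: Status
    message: str
    cost_usd: float
    transitions: List[Status]


def run_agent(provider: ScriptedProvider, tools: Dict[str, Callable[..., str]],
              allowed: Set[str], max_iterations: int = 5) -> Result:
    """Minimal single-agent loop used only to illustrate the scenarios (hypothetical)."""
    transitions = [Status.ASSIGNED, Status.IN_PROGRESS]
    history: List[str] = []
    cost = 0.0
    for _ in range(max_iterations):
        reply = provider.complete(history)
        cost += reply.cost_usd  # every LLM call is charged, even if the task fails later
        if reply.tool is None:  # a text-only answer completes the task
            transitions.append(Status.COMPLETED)
            return Result(Status.COMPLETED, reply.text, cost, transitions)
        if reply.tool not in allowed:  # denied tool: fail cleanly instead of raising
            transitions.append(Status.FAILED)
            return Result(Status.FAILED, f"permission denied for tool '{reply.tool}'",
                          cost, transitions)
        history.append(tools[reply.tool](**reply.args))  # tool output goes back to the model
    transitions.append(Status.FAILED)
    return Result(Status.FAILED, f"max iterations ({max_iterations}) reached",
                  cost, transitions)


def test_file_tool_creates_file() -> None:
    files: Dict[str, str] = {}  # in-memory fake filesystem, so the test touches no disk

    def write_file(path: str, content: str) -> str:
        files[path] = content
        return f"wrote {path}"

    provider = ScriptedProvider([
        Reply(tool="write_file", args={"path": "hello.txt", "content": "hi"}, cost_usd=0.25),
        Reply(text="Created hello.txt", cost_usd=0.5),
    ])
    result = run_agent(provider, {"write_file": write_file}, allowed={"write_file"})
    assert result.status is Status.COMPLETED
    assert files == {"hello.txt": "hi"}
    assert result.cost_usd == 0.75  # 0.25 + 0.5 is exact in binary floating point
    assert result.transitions == [Status.ASSIGNED, Status.IN_PROGRESS, Status.COMPLETED]


def test_max_iterations_fails_cleanly() -> None:
    provider = ScriptedProvider([Reply(tool="noop")] * 3)  # model never stops calling tools
    result = run_agent(provider, {"noop": lambda: "ok"}, allowed={"noop"}, max_iterations=3)
    assert result.status is Status.FAILED
    assert "max iterations (3)" in result.message
```

Scenarios 2 and 3 follow the same pattern: a single `Reply(text=...)` for the text-only answer, and a tool call with `allowed=set()` for the permission denial. In this sketch a denied tool ends the task as `FAILED`; feeding the error back to the model and letting it recover is an equally valid reading of "handled gracefully" and would be a design decision for the real engine. The optional real-LLM run could be gated with `pytest.mark.skipif` on an environment variable whose name the project would choose.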
Dependencies
- Depends on #15, "Design and implement basic tool system (registry, invocation, results)" (single-task execution lifecycle)
Design Spec Reference
Sections 3.1, 6.1, and 11.1: Agent System, Task Execution, and Tool System