End-to-end integration test: single agent receives and completes a task #24

@Aureliolo

Description

Context

A comprehensive end-to-end test that validates the full single-agent pipeline. This is the capstone test for M3 and ensures that all components integrate properly.

Acceptance Criteria

  • Scenario 1: Agent with file tools creates a file from a task description
  • Scenario 2: Agent without tools answers a question (text-only response)
  • Scenario 3: Tool permission denied is handled gracefully (clear error, no crash)
  • Scenario 4: Max iterations reached results in clean failure with informative message
  • Mocked LLM provider used (no real API calls in CI)
  • Happy path and error paths both covered
  • Cost tracking validated: costs recorded correctly for each scenario
  • Status transitions validated: correct lifecycle states observed
  • Optional flag to run against a real LLM for manual integration testing
  • Tests are deterministic and reproducible
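
The criteria above could be exercised with a scripted stand-in for the LLM provider. The following is a minimal sketch only: `ScriptedProvider`, `Reply`, `Result`, `Status`, `run_task`, and the lifecycle state names are all hypothetical and are not taken from the project's codebase or the design spec. Tool execution is stubbed out, so Scenario 1's file creation is not shown, and costs are kept as integer micro-dollars so that assertions stay exact.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Optional


class Status(Enum):
    # Hypothetical lifecycle states; the real names live in the design spec.
    ASSIGNED = "assigned"
    IN_PROGRESS = "in_progress"
    COMPLETED = "completed"
    FAILED = "failed"


@dataclass
class Reply:
    """One scripted LLM turn: final text, or a tool request when `tool` is set."""
    text: str = ""
    tool: Optional[str] = None
    cost_micro_usd: int = 0


class ScriptedProvider:
    """Stand-in provider that replays canned replies, so CI makes no API calls."""

    def __init__(self, replies):
        self._replies = list(replies)
        self.calls = 0

    def complete(self, prompt: str) -> Reply:
        # Once the script runs out, keep repeating the last reply.
        reply = self._replies[min(self.calls, len(self._replies) - 1)]
        self.calls += 1
        return reply


@dataclass
class Result:
    status: Status
    output: str
    cost_micro_usd: int
    history: list = field(default_factory=list)


def run_task(provider, prompt, allowed_tools=(), max_iterations=3):
    """Toy agent loop: ask the provider until it answers, fails, or runs out of turns."""
    history = [Status.ASSIGNED, Status.IN_PROGRESS]
    cost = 0

    def finish(status, output):
        history.append(status)
        return Result(status, output, cost, history)

    for _ in range(max_iterations):
        reply = provider.complete(prompt)
        cost += reply.cost_micro_usd
        if reply.tool is None:
            return finish(Status.COMPLETED, reply.text)
        if reply.tool not in allowed_tools:
            return finish(Status.FAILED, f"permission denied for tool '{reply.tool}'")
        # Permitted tool: a real agent would execute it and feed the result back.
        prompt = f"{prompt}\n[tool {reply.tool} ok]"
    return finish(Status.FAILED, f"max iterations ({max_iterations}) reached")


def test_max_iterations_fails_cleanly():
    # Scenario 4: the model keeps requesting a tool and never produces an answer.
    provider = ScriptedProvider([Reply(tool="write_file", cost_micro_usd=100)])
    result = run_task(provider, "Create notes.txt", allowed_tools=("write_file",))
    assert result.status is Status.FAILED
    assert "max iterations" in result.output
    assert result.cost_micro_usd == 300  # three turns at 100 each
```

Because the provider is fully scripted, each run produces the same status history and cost totals, which is what makes the tests deterministic. An opt-in real-LLM run would swap `ScriptedProvider` for a real client behind a flag; how that flag is wired is left open here.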

Dependencies

Design Spec Reference

Sections 3.1, 6.1, and 11.1: Agent System, Task Execution, and Tool System

Metadata

Assignees

No one assigned

    Labels

    • prio:high (Important, should be prioritized)
    • scope:medium (1-3 days of work)
    • spec:agent-system (DESIGN_SPEC Section 3 - Agent System)
    • spec:providers (DESIGN_SPEC Section 9 - Model Provider Layer)
    • spec:security (DESIGN_SPEC Section 12 - Security & Approval System)
    • spec:task-workflow (DESIGN_SPEC Section 6 - Task & Workflow Engine)
    • spec:tools (DESIGN_SPEC Section 11 - Tool & Capability System)
    • type:test (Test coverage, test infrastructure)
