Implement graceful shutdown with cooperative timeout strategy (DESIGN_SPEC §6.7) #130

@Aureliolo

Context

When the process receives SIGTERM or SIGINT (user Ctrl+C, Docker stop, systemd shutdown), the framework needs to stop cleanly without losing work or leaving incurred costs unrecorded. The MVP implements the cooperative-with-timeout strategy behind a ShutdownStrategy protocol.

Acceptance Criteria

ShutdownStrategy Protocol

  • ShutdownStrategy protocol defined
  • Protocol is pluggable — new strategies can be registered via config
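
As a rough illustration of the two criteria above, the protocol and a config-driven registry might look like the sketch below. The method name `shutdown`, its signature, and the helpers `register_strategy`, `build_strategy`, and `NoopStrategy` are all assumptions made for this example; DESIGN_SPEC §6.7 remains the authority on the real interface.

```python
from __future__ import annotations

import asyncio
from typing import Callable, Protocol, runtime_checkable


@runtime_checkable
class ShutdownStrategy(Protocol):
    """Structural interface (assumed shape, not taken from the spec)."""

    async def shutdown(self, agent_tasks: set[asyncio.Task]) -> None:
        ...


# Hypothetical registry: maps the strategy name used in config to a factory.
_STRATEGIES: dict[str, Callable[..., ShutdownStrategy]] = {}


def register_strategy(name: str, factory: Callable[..., ShutdownStrategy]) -> None:
    """Make a strategy selectable by name."""
    _STRATEGIES[name] = factory


def build_strategy(name: str, **options: object) -> ShutdownStrategy:
    """Instantiate the strategy registered under `name` with config options."""
    try:
        factory = _STRATEGIES[name]
    except KeyError:
        raise ValueError(f"unknown shutdown strategy: {name!r}") from None
    return factory(**options)


class NoopStrategy:
    """Placeholder strategy used only to demonstrate registration."""

    async def shutdown(self, agent_tasks: set[asyncio.Task]) -> None:
        return None


register_strategy("noop", NoopStrategy)
```

Because `ShutdownStrategy` is a `typing.Protocol`, a new strategy only needs a matching `shutdown` method; it does not have to inherit from anything.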

Cooperative with Timeout Strategy (Default / MVP)

  • shutdown_event (asyncio.Event) set on signal receipt — agents check at turn boundaries
  • Stop accepting new tasks (drain gate closes)
  • Wait up to grace_seconds (default: 30) for agents to exit cooperatively
  • Force-cancel remaining agents (task.cancel()) after grace period
  • Cleanup phase (cleanup_seconds, default: 5): persist cost records, close provider connections, flush logs
  • Configurable via graceful_shutdown: YAML config block
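
The sequence in the list above (set the event, close the drain gate, wait, force-cancel, clean up) can be sketched with plain asyncio primitives. This is a minimal sketch under stated assumptions: the class name `CooperativeTimeoutStrategy`, the `accepting_tasks` flag, the `cleanup` callback parameter, and the return value are illustrative and do not come from the spec.

```python
import asyncio
from dataclasses import dataclass, field


@dataclass
class CooperativeTimeoutStrategy:
    grace_seconds: float = 30.0    # default from the acceptance criteria
    cleanup_seconds: float = 5.0   # default from the acceptance criteria
    shutdown_event: asyncio.Event = field(default_factory=asyncio.Event)
    accepting_tasks: bool = True   # hypothetical "drain gate" flag

    async def shutdown(self, agent_tasks, cleanup=None):
        """Run the shutdown sequence; return the tasks that had to be cancelled."""
        # 1. Signal agents; they are expected to check this at turn boundaries.
        self.shutdown_event.set()
        # 2. Close the drain gate so no new tasks are accepted.
        self.accepting_tasks = False

        forced = set()
        if agent_tasks:
            # 3. Give agents up to grace_seconds to exit on their own.
            _, pending = await asyncio.wait(agent_tasks, timeout=self.grace_seconds)
            # 4. Force-cancel whatever is still running.
            for task in pending:
                task.cancel()
            if pending:
                # Wait for the cancellations to be delivered.
                await asyncio.gather(*pending, return_exceptions=True)
            forced = pending

        # 5. Bounded cleanup phase (persist cost records, close connections, ...).
        if cleanup is not None:
            try:
                await asyncio.wait_for(cleanup(), timeout=self.cleanup_seconds)
            except asyncio.TimeoutError:
                pass  # cleanup overran its budget; exit anyway
        return forced
```

In this sketch, `grace_seconds` and `cleanup_seconds` would be populated from the `graceful_shutdown:` config block. Returning the force-cancelled tasks is one way to let the caller decide which tasks to mark INTERRUPTED.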

New TaskStatus: INTERRUPTED

  • INTERRUPTED added to TaskStatus enum as a non-terminal state
  • Valid transitions updated: IN_PROGRESS → INTERRUPTED, INTERRUPTED → ASSIGNED (reassignment on restart)
  • INTERRUPTED indicates process shutdown, distinct from FAILED (agent error) and CANCELLED (user action)
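
To make the new state and its transitions concrete, here is a small sketch. It lists only the statuses named in this issue; the real TaskStatus enum presumably has more members, and the string values, the `NEW_TRANSITIONS` table, and `can_transition` are assumptions for illustration.

```python
from enum import Enum


class TaskStatus(Enum):
    # Only the members mentioned in this issue; string values are assumed.
    ASSIGNED = "assigned"
    IN_PROGRESS = "in_progress"
    INTERRUPTED = "interrupted"  # process shutdown; non-terminal
    FAILED = "failed"            # agent error
    CANCELLED = "cancelled"      # user action


# The two transitions this issue adds; existing transitions are omitted here.
NEW_TRANSITIONS = {
    TaskStatus.IN_PROGRESS: {TaskStatus.INTERRUPTED},
    TaskStatus.INTERRUPTED: {TaskStatus.ASSIGNED},  # reassignment on restart
}


def can_transition(current: TaskStatus, target: TaskStatus) -> bool:
    """Return True if `current -> target` is one of the transitions added here."""
    return target in NEW_TRANSITIONS.get(current, set())
```

Because INTERRUPTED is non-terminal, it needs an outgoing edge; the table gives it exactly one, back to ASSIGNED.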

Signal Handling

  • SIGINT (Ctrl+C) handled cross-platform
  • SIGTERM handled on Unix; signal.signal() fallback on Windows (no loop.add_signal_handler())
  • In-flight LLM calls: log request start (input token count) before each provider call for cost audit
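
One possible shape for the cross-platform handler installation is sketched below. It tries `loop.add_signal_handler()` first and falls back to `signal.signal()` when the event loop does not implement it, as is the case on Windows. The function name `install_signal_handlers` and its return value are assumptions for this example.

```python
import asyncio
import signal


def install_signal_handlers(loop, shutdown_event):
    """Set `shutdown_event` on SIGINT/SIGTERM.

    Returns a dict mapping signal name to the mechanism that was used.
    Must be called from the main thread.
    """

    def request_shutdown():
        shutdown_event.set()

    def fallback_handler(signum, frame):
        # signal.signal() handlers run outside the event loop's callback
        # machinery, so hand the work back to the loop thread-safely.
        loop.call_soon_threadsafe(request_shutdown)

    installed = {}
    for sig in (signal.SIGINT, signal.SIGTERM):
        try:
            loop.add_signal_handler(sig, request_shutdown)
            installed[sig.name] = "loop.add_signal_handler"
        except NotImplementedError:
            # Event loops on Windows do not implement add_signal_handler().
            signal.signal(sig, fallback_handler)
            installed[sig.name] = "signal.signal"
    return installed
```

A caller would typically run this once at startup with `asyncio.get_running_loop()` and the same `shutdown_event` that agents poll at turn boundaries.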

Testing

  • Unit tests for ShutdownStrategy protocol
  • Unit tests for cooperative timeout with simulated agents
  • Integration test: send SIGINT → agents stop → tasks marked INTERRUPTED → cleanup runs

Dependencies

Design Spec Reference

  • §6.7 — Graceful Shutdown Protocol (Strategy 1: Cooperative with Timeout)

Metadata

Assignees

No one assigned

    Labels

    • prio:high (Important, should be prioritized)
    • scope:medium (1-3 days of work)
    • spec:agent-system (DESIGN_SPEC Section 3 - Agent System)
    • spec:task-workflow (DESIGN_SPEC Section 6 - Task & Workflow Engine)
    • type:feature (New feature implementation)
