Problem
When a GitHub Actions workflow sets timeout-minutes on a step that runs awf, the agent container is not reliably terminated when the timeout fires. The agent process inside the Docker container continues running past the step timeout, consuming runner time until the job-level (6-hour) or workflow-level (72-hour) timeout is hit.
GH Actions enforces timeout-minutes by sending SIGTERM to the step process (awf), followed by SIGKILL after a short grace period (~10 s). The awf Node.js process has a SIGTERM handler (src/cli.ts:1895–1898) that calls performCleanup() → stopContainers() → docker compose down -v. However:
docker compose down -v is slow — it gracefully stops services and tears down volumes, which can take 10–30 seconds.
If GH Actions sends SIGKILL to awf before docker compose down completes, awf is killed immediately while the Docker container (awf-agent) keeps running as an orphan.
Even in the non-SIGKILL path, there is a window where the container is still running after the step timeout fires.
The root cause is that the SIGTERM handler does not immediately kill the container before embarking on the slower graceful cleanup path.
Context
AWF already has --agent-timeout <minutes> (src/cli.ts:1402, src/docker-manager.ts:1996–2022), which uses docker stop -t 10 awf-agent when the internal timer fires. But this is a separate mechanism from GH Actions step-level timeout-minutes, which signals the awf host process directly.
Root Cause
src/cli.ts:1895–1898 — the SIGTERM handler calls await performCleanup('SIGTERM') which calls stopContainers() → docker compose down -v. This is too slow to reliably complete before GH Actions sends SIGKILL.
src/docker-manager.ts:2089 (stopContainers) — uses docker compose down -v with default stop timeouts. No fast-path kill of awf-agent when called under signal pressure.
Proposed Solution
1. Fast-kill the container at the top of the SIGTERM/SIGINT handlers
In src/cli.ts, before calling the slow performCleanup(), immediately stop the container so the agent can't outlive the awf process:
process.on('SIGTERM', async () => {
  // Fast-kill the container immediately so it cannot outlive this process.
  // docker compose down (called in performCleanup) is too slow and may be
  // interrupted by a follow-up SIGKILL from the GH Actions runner.
  try {
    await execa('docker', ['stop', '-t', '3', 'awf-agent'], { reject: false });
  } catch {
    /* best-effort */
  }
  await performCleanup('SIGTERM');
  process.exit(143);
});
A 3-second graceful window for the container gives the agent a chance to flush logs, while still completing well within the GH Actions grace period before SIGKILL.
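Since the heading covers both SIGTERM and SIGINT, the same ordering could be shared through one handler factory. The following is only a sketch: ShutdownDeps, makeShutdownHandler, and exitCodeForSignal are hypothetical names that do not exist in awf, and the dependencies are injected so the ordering can be exercised without Docker. The exit codes follow the usual shell convention of 128 plus the signal number, which is where the 143 above comes from.

```typescript
// Signal numbers on Linux: SIGINT = 2, SIGTERM = 15.
const SIGNAL_NUMBERS = { SIGINT: 2, SIGTERM: 15 } as const;
type HandledSignal = keyof typeof SIGNAL_NUMBERS;

// Conventional exit code for "terminated by signal": 128 + signal number.
function exitCodeForSignal(signal: HandledSignal): number {
  return 128 + SIGNAL_NUMBERS[signal];
}

// Hypothetical dependency bundle (not part of awf).
interface ShutdownDeps {
  fastStop: () => Promise<void>; // e.g. docker stop -t 3 awf-agent
  cleanup: (signal: HandledSignal) => Promise<void>; // e.g. performCleanup
  exit: (code: number) => void; // e.g. process.exit
}

// Returns a handler that stops the container first, then runs the slow cleanup.
function makeShutdownHandler(
  signal: HandledSignal,
  deps: ShutdownDeps,
): () => Promise<void> {
  return async () => {
    try {
      await deps.fastStop();
    } catch {
      // Best-effort: a failed fast stop must not block the regular cleanup.
    }
    await deps.cleanup(signal);
    deps.exit(exitCodeForSignal(signal));
  };
}
```

Under these assumptions, registration would look like process.on('SIGINT', makeShutdownHandler('SIGINT', deps)), with the same call repeated for 'SIGTERM'.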
2. Document --agent-timeout as the preferred workaround
Until a fix ships, users can set --agent-timeout <minutes> in their awf invocation to cap agent execution at the AWF level, which already does docker stop -t 10 awf-agent correctly. The compiled GH Actions workflow could accept a timeout-minutes input and pass it as --agent-timeout. This is a gh-aw CLI concern, but the AWF documentation should still surface the option.
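As a rough illustration of that mapping, a workflow compiler could derive the flag from the step timeout along these lines. agentTimeoutArgs is a hypothetical helper, and the one-minute margin (so that the AWF timer fires before the runner's step timeout) is an assumption of this sketch rather than anything awf or gh-aw defines.

```typescript
// Hypothetical helper: turn a step's timeout-minutes into awf CLI arguments.
// The default one-minute margin is an assumption, not awf or gh-aw behaviour.
function agentTimeoutArgs(
  stepTimeoutMinutes: number,
  marginMinutes = 1,
): string[] {
  if (!Number.isFinite(stepTimeoutMinutes) || stepTimeoutMinutes <= 0) {
    // No usable step timeout: pass nothing and leave awf's behaviour unchanged.
    return [];
  }
  // Aim to fire slightly before the runner's step timeout, but never
  // request less than one minute.
  const agentMinutes = Math.max(
    1,
    Math.floor(stepTimeoutMinutes) - marginMinutes,
  );
  return ['--agent-timeout', String(agentMinutes)];
}
```

With a one-minute step timeout the margin cannot be honoured, so the two timers would coincide and the step-level signal path still matters.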
3. (Optional) Add --stop-timeout to stopContainers
Expose a stopTimeoutSeconds parameter in stopContainers() so callers from signal handlers can request a faster teardown (e.g., docker compose down --timeout 5) instead of the default 10-second container stop grace period.
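A minimal sketch of how that parameter might translate into the docker compose invocation is shown below. composeDownArgs is a hypothetical helper, not the existing stopContainers() code; it only builds the argument list, on the assumption that the caller passes it to whatever process runner awf already uses.

```typescript
// Hypothetical helper: build the argv for `docker compose down`.
// When stopTimeoutSeconds is omitted, Docker's default stop timeout applies.
function composeDownArgs(stopTimeoutSeconds?: number): string[] {
  const args = ['compose', 'down', '-v'];
  if (stopTimeoutSeconds !== undefined) {
    if (!Number.isInteger(stopTimeoutSeconds) || stopTimeoutSeconds < 0) {
      throw new RangeError(
        `stopTimeoutSeconds must be a non-negative integer, got ${stopTimeoutSeconds}`,
      );
    }
    // --timeout sets the per-container shutdown timeout in seconds.
    args.push('--timeout', String(stopTimeoutSeconds));
  }
  return args;
}
```

A signal handler could then call composeDownArgs(5), while the normal exit path keeps calling composeDownArgs() and retains today's behaviour.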