Problem
Daytona-backed workflow runs can become brittle when the sandbox, OpenClaw runtime, or selected model provider fails after the claw has already been created. A run can appear active while the agent is blocked on missing workspace files, unavailable tool capabilities, or repeated model-provider idle timeouts.
This issue is intentionally sanitized. It does not include any task/story-specific details from the workflow that exposed the problem.
Observed failure classes
- The agent attempted to read files under an expected cloned repository path, but the path was missing in the Daytona workspace.
- The agent attempted an elevated shell command even though OpenClaw reported elevated execution was unavailable for the current runtime/provider policy.
- The selected model provider timed out after the OpenClaw idle timeout and no fallback profile was configured, leaving the session stalled until the remote connection dropped.
Code context
- Daytona provisioning is launched asynchronously from pkg/hub/workflow_creator.go after the claw DB row is inserted and marked "provisioning".
- pkg/hub/server.go sets Daytona claws to "starting", then runs bootstrapDaytona in the background.
- bootstrapDaytona currently pins OpenClaw to 2026.5.20, installs optional Nix/Docker, configures the gateway, stages plugin dependencies, writes workspace files, configures GitHub credentials, optionally clones repos, then sets bootstrap_ok=1 and starts claw-bridge.
- Repository clone verification exists in the GitHub App path, but workspace repository availability should be treated as a first-class readiness gate whenever a workflow declares required repositories.
- detectToolLoop currently catches repeated edit/write/read errors, but not repeated exec/elevated/tool-policy failures.
Desired behavior
A workflow run should fail fast or recover cleanly instead of leaving the operator with a stuck agent session.
- Workspace readiness should be explicit:
  - Before setting bootstrap_ok=1 or starting the bridge, verify every configured workspace repository is present at the expected path and has a .git directory.
  - If repositories cannot be cloned because credentials are missing/unavailable, mark the claw as failed with a sanitized reason instead of starting the agent against an incomplete workspace.
  - Persist a concise bootstrap diagnostic that the UI can show without requiring SSH/log spelunking.
- Tool policy should match the runtime:
  - If Daytona/OpenClaw direct runtime cannot use elevated execution, the agent should be told that up front in injected context or OpenClaw config.
  - Detect repeated "exec failed: elevated is not available" errors and inject a corrective hub message, similar to the existing tool-loop detection.
  - Consider exposing tool capability status in the claw details panel so operators know whether elevated, browser, file, and process tools are available.
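One possible shape for the repeated-failure detection, as a sketch only. The policyLoopDetector type and its threshold are assumptions for illustration; how detectToolLoop is actually structured is not shown in this issue. The only string taken from the source is the error text itself.

```go
package main

import "strings"

// elevatedUnavailable is the error text quoted in this issue.
const elevatedUnavailable = "exec failed: elevated is not available"

// policyLoopDetector counts consecutive tool results that contain the
// elevated-unavailable error. Hypothetical type, not the existing detectToolLoop.
type policyLoopDetector struct {
	threshold   int // consecutive failures required before intervening
	consecutive int
}

// observe records one tool result and reports whether a corrective hub
// message should be injected now. It fires once, when the streak first
// reaches the threshold, and resets on any result without the error.
func (d *policyLoopDetector) observe(toolOutput string) bool {
	if !strings.Contains(toolOutput, elevatedUnavailable) {
		d.consecutive = 0
		return false
	}
	d.consecutive++
	return d.consecutive == d.threshold
}
```

Firing only on the threshold crossing keeps the hub from injecting the same guidance on every subsequent failure.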
- Model-provider timeouts should be recoverable:
  - Detect OpenClaw/provider idle timeout failures and mark the current turn as failed/retryable instead of leaving the session stuck.
  - Add optional fallback model/profile configuration for workflow runs, especially for providers/models known to have long first-token latency.
  - Surface a clear UI/status message such as "model timeout: no fallback configured" with a retry action.
  - Consider increasing or configuring idle timeout for slow models, but only with a visible status so true hangs are still detectable.
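The timeout handling could be sketched as a small classifier. Everything here is hypothetical: the exact text OpenClaw or the bridge emits on an idle timeout is not given in this issue, so the "idle timeout" substring match is a placeholder, and timeoutOutcome and classifyTimeout are illustrative names.

```go
package main

import "strings"

// timeoutOutcome is a hypothetical record of how a timed-out turn is handled.
type timeoutOutcome struct {
	Status          string // classified status to persist
	Retryable       bool   // whether the UI should offer a retry action
	FallbackProfile string // profile to retry with, empty if none configured
	Message         string // short text for the UI/status line
}

// isIdleTimeout uses a placeholder match; the real marker must come from the
// actual bridge/OpenClaw messages.
func isIdleTimeout(errText string) bool {
	return strings.Contains(strings.ToLower(errText), "idle timeout")
}

// classifyTimeout reports false when errText is not an idle timeout.
// Otherwise it picks the first configured fallback profile, if any.
func classifyTimeout(errText string, fallbacks []string) (timeoutOutcome, bool) {
	if !isIdleTimeout(errText) {
		return timeoutOutcome{}, false
	}
	out := timeoutOutcome{Status: "model_timeout", Retryable: true}
	if len(fallbacks) > 0 {
		out.FallbackProfile = fallbacks[0]
		out.Message = "model timeout: retrying with fallback " + fallbacks[0]
	} else {
		out.Message = "model timeout: no fallback configured"
	}
	return out, true
}
```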
- Connection loss should not hide root cause:
  - If SSH/Daytona disconnects after runtime failure, preserve the last known classified failure in the claw record.
  - Status should distinguish offline, model_timeout, bootstrap_failed, and workspace_incomplete where possible.
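As an illustration of the disconnect rule, the status transition might look like the following. The four failure and offline status values come from this issue; the clawStatus type, the "running" value, and the function names are assumptions.

```go
package main

// clawStatus is a hypothetical string type for the claw record's status.
type clawStatus string

const (
	statusRunning             clawStatus = "running" // assumed healthy state
	statusOffline             clawStatus = "offline"
	statusModelTimeout        clawStatus = "model_timeout"
	statusBootstrapFailed     clawStatus = "bootstrap_failed"
	statusWorkspaceIncomplete clawStatus = "workspace_incomplete"
)

// isClassifiedFailure reports whether s names a root cause rather than a
// connection state.
func isClassifiedFailure(s clawStatus) bool {
	switch s {
	case statusModelTimeout, statusBootstrapFailed, statusWorkspaceIncomplete:
		return true
	}
	return false
}

// onDisconnect returns the status to store when the SSH/Daytona connection
// drops: an already classified failure is kept, anything else becomes offline.
func onDisconnect(current clawStatus) clawStatus {
	if isClassifiedFailure(current) {
		return current
	}
	return statusOffline
}
```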
Proposed implementation plan
- Add a Daytona bootstrap readiness checklist and store per-step results in bootstrap_status or a structured column/table.
- Promote repository clone/verify into a reusable required-workspace step that runs independently of the credential mechanism.
- Extend tool-loop detection to include process/elevated/tool-policy failures.
- Add model timeout classification based on bridge/OpenClaw messages, then expose it in claw status and UI.
- Add tests for:
  - missing required repository prevents bootstrap_ok=1 and bridge start;
  - repeated elevated exec failures cause a corrective hub message;
  - model timeout/fallback-unavailable messages are persisted as a classified failure.
Acceptance criteria
- A Daytona workflow with missing required repos does not start the agent; it fails with a sanitized, actionable bootstrap error.
- Repeated unavailable elevated execution does not loop silently; the hub injects guidance or blocks the tool path.
- Model idle timeout with no fallback produces a visible failed/retryable status and preserved diagnostic.
- No workflow issue/story content, secrets, tokens, or raw long logs are included in persisted failure comments or GitHub issue comments.