feat: make Daytona workflow runs resilient to workspace, tool-policy, and model timeout failures (#331) by elasticclaw-factory[bot] · Pull Request #339 · elasticclaw/elasticclaw

elasticclaw-factory · 2026-06-03T11:06:20Z

This PR addresses issue #331 by making Daytona-backed workflow runs fail fast or recover cleanly instead of leaving stuck agent sessions.

Changes

1. Workspace readiness gate (bootstrapDaytona)

- Before setting bootstrap_ok=1 or starting the bridge, verify every configured repository is present at the expected path with a .git directory
- If repos are missing, mark the claw as failed with a sanitized, actionable bootstrap error stored in bootstrap_diagnostic
- Uses the new setBootstrapStatusWithDiagnostic helper to persist both status and diagnostic for UI display

2. Tool-loop detection (detectToolLoop)

Extended to catch repeated exec/elevated/tool-policy failures:
- exec failed:
- elevated is not available
- tool-policy
Injects a corrective hub message after 3+ occurrences, similar to existing edit/write/read loop detection
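
The detection rule can be sketched as follows. This is an illustration under assumptions, not the actual detectToolLoop: the three marker strings and the threshold of 3 come from the description, while the names `failureMarkers`, `countPolicyFailures`, and `isToolLoop` are hypothetical, as is the choice to count any mix of markers across a slice of recent tool outputs.

```go
package main

import (
	"fmt"
	"strings"
)

// Marker strings listed in the PR description.
var failureMarkers = []string{"exec failed:", "elevated is not available", "tool-policy"}

// Threshold from the description ("3+ occurrences").
const loopThreshold = 3

// countPolicyFailures counts the outputs that contain at least one marker.
// Each output is counted once even if it matches several markers.
func countPolicyFailures(outputs []string) int {
	n := 0
	for _, out := range outputs {
		for _, m := range failureMarkers {
			if strings.Contains(out, m) {
				n++
				break
			}
		}
	}
	return n
}

// isToolLoop reports whether enough failures occurred to warrant a
// corrective message. Assumption: mixed failure kinds count together.
func isToolLoop(outputs []string) bool {
	return countPolicyFailures(outputs) >= loopThreshold
}

func main() {
	outputs := []string{
		"exec failed: permission denied",
		"ok",
		"elevated is not available",
		"blocked by tool-policy",
	}
	fmt.Println(isToolLoop(outputs))     // prints true
	fmt.Println(isToolLoop(outputs[:2])) // prints false
}
```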

3. Model timeout classification (heartbeat handler)

- When the gateway is unhealthy for 8+ consecutive checks (~4 minutes) while streaming, classify the failure as 'model timeout'
- Persists the diagnostic in the bootstrap_diagnostic column and broadcasts it to the UI as a retryable failure
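
A hedged sketch of the consecutive-check rule. The threshold of 8 and the "while streaming" condition come from the description; 8 checks in about 4 minutes suggests roughly 30 seconds between checks, but the interval is not stated. The type `heartbeatState` and the method `observe` are hypothetical names, and resetting the counter on a healthy check is an assumption implied by "consecutive".

```go
package main

import "fmt"

// Threshold from the PR description (8+ consecutive unhealthy checks).
const unhealthyThreshold = 8

// heartbeatState tracks consecutive unhealthy gateway checks for one claw.
type heartbeatState struct {
	consecutiveUnhealthy int
}

// observe records one health check and reports whether the run should be
// classified as a model timeout. A healthy check resets the streak.
func (s *heartbeatState) observe(healthy, streaming bool) bool {
	if healthy {
		s.consecutiveUnhealthy = 0
		return false
	}
	s.consecutiveUnhealthy++
	return streaming && s.consecutiveUnhealthy >= unhealthyThreshold
}

func main() {
	s := &heartbeatState{}
	for check := 1; check <= 10; check++ {
		if s.observe(false, true) {
			// prints "classified as model timeout at check 8"
			fmt.Printf("classified as model timeout at check %d\n", check)
			break
		}
	}
}
```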

4. Bootstrap diagnostic persistence

- Added bootstrap_diagnostic column to claws table (schema + migration)
- Added BootstrapDiagnostic field to types.Claw
- stopAgentWithReason now persists the sanitized diagnostic to the DB and includes it in the broadcast payload
- setBootstrapStatusWithDiagnostic helper added for gating failures

5. Tests

Added daytona_resilience_test.go with tests for:
- elevated failure loop detection
- tool-policy failure loop detection
- single failure not triggering loop
- mixed failure patterns
- bootstrap output sanitization for workspace diagnostics

Verification

- go build ./... passes
- go test ./pkg/hub/... passes (all 100+ tests)

Closes #331

greptile-apps · 2026-06-03T11:09:27Z

Comments Outside Diff (1)

pkg/hub/server.go, lines 2103-2114

The bootstrapDiagnostic field is fetched from the DB (the query and scan were both updated correctly), but it is not included in the initial WebSocket payload emitted to reconnecting clients. Any user who opens the dashboard after an agent has already failed will receive bootstrap_status but silently lose the diagnostic that explains why it failed — defeating the purpose of this PR.
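
To illustrate the fix being requested, here is a small sketch of an initial payload that carries the diagnostic alongside the status. It is not the code in pkg/hub/server.go: the struct `clawSnapshot`, the function `initialPayload`, the `id` field, and the JSON key `bootstrap_diagnostic` are assumptions for illustration; only `bootstrap_status` is named in the comment above.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// clawSnapshot is a hypothetical shape for the state sent to a client that
// connects after the agent has already failed.
type clawSnapshot struct {
	ID                  string `json:"id"`
	BootstrapStatus     string `json:"bootstrap_status"`
	BootstrapDiagnostic string `json:"bootstrap_diagnostic,omitempty"`
}

// initialPayload serializes the snapshot, including the diagnostic when set.
func initialPayload(c clawSnapshot) ([]byte, error) {
	return json.Marshal(c)
}

func main() {
	b, err := initialPayload(clawSnapshot{
		ID:                  "claw-1",
		BootstrapStatus:     "failed",
		BootstrapDiagnostic: "repository api missing at expected path",
	})
	if err != nil {
		panic(err)
	}
	fmt.Println(string(b))
}
```

With `omitempty`, claws that have no diagnostic produce the same payload as before, so existing clients would not see an extra empty field.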

Reviews (1): Last reviewed commit: "feat: make Daytona workflow runs resilie..."

greptile-apps · 2026-06-03T11:41:15Z

Reviews (2): Last reviewed commit: "Address Daytona resilience review feedba..."

Address Daytona resilience review feedback

0a3ce79

marccampbell merged commit 7f0b35e into main Jun 3, 2026
11 checks passed

marccampbell merged 2 commits into main from feat/daytona-resilience-331
