Brokered Azure lease allocation can time out before a late lease becomes ready

## Summary

Brokered Azure Linux lease allocation can time out after the CLI/coordinator wait window with `no lease was returned`, even though a lease from the same Azure path can later appear as active/ready and be usable via `crabbox run --id`.

This makes OpenClaw's Azure-backed Crabbox default unreliable for proof runs: the user-facing command fails as unavailable, while the underlying Azure VM may still be provisioning or may become usable too late for the caller.

## Environment

- Date observed: 2026-06-05
- Client OS/shell: Windows, PowerShell
- Crabbox CLI: `0.26.0`
- Broker: `https://crabbox.openclaw.ai`
- Auth: GitHub broker auth for org `openclaw`
- Repository/worktree: `C:\oc-work\oc-87735`
- Repo config: OpenClaw `.crabbox.yaml`
- Default OpenClaw provider in that repo: `azure`
- Azure config in repo: `location: eastus2`
- Tested provider/type: `azure`, `Standard_D4ads_v6`, `market=on-demand`, `target=linux`
- Local `rsync`: installed and runnable (`C:\Users\marti\.local\bin\rsync.cmd`, rsync `3.4.2`)

`crabbox doctor --provider azure` reached the broker/provider and reported `provider=azure coordinator_secrets=ready`. The only local doctor failure was the existing Windows config permission warning:

```text
failed  config   C:\Users\marti\AppData\Roaming\crabbox\config.yaml: permissions 0666 want 0600
ok      broker   auth=github owner=martin_cleary@yahoo.co.uk org=openclaw default_type=Standard_D32ads_v6
ok      provider provider=azure coordinator_secrets=ready
```

## Reproduction

From `C:\oc-work\oc-87735`:

```powershell
pnpm crabbox:run -- --type Standard_D4ads_v6 --market on-demand --idle-timeout 10m --ttl 20m --timing-json --no-sync --no-hydrate --stop-after always --shell -- "echo CRABBOX_AZURE_SMOKE_OK; uname -srm; whoami; pwd"
```

This is intentionally a tiny no-sync/no-hydrate command so the result isolates lease allocation/SSH readiness rather than repo sync or test setup.

## Observed Behavior

The command waited for a coordinator lease for the full 10-minute acquire window, then failed:

```text
[crabbox] bin=..\..\Users\marti\.local\bin\crabbox.exe version=0.26.0 provider=azure providers=...
recording run run_87a63bdd35c2
coordinator lease class=standard preferred_type=Standard_D4ads_v6 keep=false slug=amber-barnacle idle_timeout=10m0s ttl=20m0s
waiting for coordinator lease provider=azure slug=amber-barnacle elapsed=30s timeout=10m0s
...
waiting for coordinator lease provider=azure slug=amber-barnacle elapsed=9m30s timeout=10m0s
timed out waiting for coordinator lease after 10m0s provider=azure target=linux type=Standard_D4ads_v6 slug=amber-barnacle lease=cbx_ddea6cab6b52; no lease was returned; next_action=check coordinator/cloud logs and retry, then run `crabbox stop --provider azure --target linux --id cbx_ddea6cab6b52` if a late lease appears
```
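For scripted follow-up, the lease id and the hinted cleanup command can be pulled out of that timeout line. This is a minimal sketch: the parsing rules are inferred from this single log sample and are an assumption, not a documented Crabbox output format, and `parse_timeout_line` is a hypothetical helper name.

```python
import re

# Timeout line copied from the log above.
TIMEOUT_LINE = (
    "timed out waiting for coordinator lease after 10m0s provider=azure "
    "target=linux type=Standard_D4ads_v6 slug=amber-barnacle "
    "lease=cbx_ddea6cab6b52; no lease was returned; next_action=check "
    "coordinator/cloud logs and retry, then run `crabbox stop --provider azure "
    "--target linux --id cbx_ddea6cab6b52` if a late lease appears"
)


def parse_timeout_line(line):
    """Return the key=value fields before the first ';' plus the hinted command.

    Assumption: fields are space-separated key=value pairs and the cleanup
    command is the only backtick-quoted span, as in the sample above.
    """
    head = line.split(";", 1)[0]
    fields = dict(re.findall(r"(\w+)=([^\s;]+)", head))
    hint = re.search(r"`([^`]+)`", line)
    fields["cleanup_command"] = hint.group(1) if hint else None
    return fields


if __name__ == "__main__":
    info = parse_timeout_line(TIMEOUT_LINE)
    print(info["lease"])
    print(info["cleanup_command"])
```

Recording the lease id at timeout matters here because the lease is not yet visible to `status` at that moment, so the timeout message is the only place the id appears.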

Immediately after the timeout, the hinted late lease id was not visible to the user:

```text
crabbox status --provider azure --id cbx_ddea6cab6b52
coordinator GET /v1/leases/cbx_ddea6cab6b52: http 404: {"error":"not_found"}
```

In the same troubleshooting session, a separate Azure attempt from another chat showed the more worrying late-lease behavior directly: the command timed out after 10 minutes, but the reported late lease later appeared in the user-visible lease list as active/ready:

```text
crabbox-harbor-crab-78988ccd active Standard_D4ads_v6 20.101.44.161 lease=cbx_2513f241d618 slug=harbor-crab keep=false target=linux
```

Status/inspect showed it was ready:

```text
cbx_2513f241d618 slug=harbor-crab provider=azure target=linux state=active type=Standard_D4ads_v6 host=20.101.44.161 ready=true has_host=true idle_timeout=1h30m0s
```

A no-sync attach command against that late/ready Azure lease succeeded:

```powershell
pnpm crabbox:run -- --provider azure --id cbx_2513f241d618 --no-sync --no-hydrate --timing-json --stop-after never --shell -- "echo CRABBOX_AZURE_REUSE_OK; uname -srm; whoami; pwd"
```

Output:

```text
CRABBOX_AZURE_REUSE_OK
Linux 7.0.0-1004-azure x86_64
crabbox
/work/crabbox/cbx_2513f241d618/oc-87735
```

Timing summary:

```json
{"provider":"azure","leaseId":"cbx_2513f241d618","slug":"harbor-crab","syncMs":0,"syncSkipped":true,"commandMs":1845,"totalMs":2353,"exitCode":0,"runId":"run_c85ec4db1c50","machineType":"Standard_D4ads_v6"}
```

## Additional Cleanup Evidence

After filing this issue, the portal showed both late Azure leases as active:

- `cbx_2513f241d618` / `harbor-crab`
- `cbx_ddea6cab6b52` / `amber-barnacle`

`harbor-crab` released successfully:

```text
crabbox stop --provider azure --target linux --id cbx_2513f241d618
released lease=cbx_2513f241d618 server=crabbox-harbor-crab-78988ccd
```

`amber-barnacle` is more concerning. It is still visible as active/ready even though it was requested with `keep=false`, its `idle_for=28m2s` exceeds `idle_timeout=10m0s`, and its `expires` timestamp is already in the past:

```text
cbx_ddea6cab6b52 slug=amber-barnacle provider=azure target=linux state=active type=Standard_D4ads_v6 host=52.157.75.123 ready=true has_host=true idle_for=28m2s idle_timeout=10m0s expires=2026-06-05T18:37:13.214Z
```
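The overdue condition in that status line can be checked mechanically. The sketch below assumes the status output is space-separated `key=value` pairs with Go-style durations, as in the sample. `overdue_reasons` and `parse_duration` are hypothetical helpers, and the `now` value in the usage example is illustrative only.

```python
import re
from datetime import datetime, timedelta, timezone

# Status line copied from the output above.
STATUS_LINE = (
    "cbx_ddea6cab6b52 slug=amber-barnacle provider=azure target=linux "
    "state=active type=Standard_D4ads_v6 host=52.157.75.123 ready=true "
    "has_host=true idle_for=28m2s idle_timeout=10m0s "
    "expires=2026-06-05T18:37:13.214Z"
)


def parse_duration(text):
    """Parse a duration such as '28m2s' or '1h30m0s' into a timedelta."""
    m = re.fullmatch(r"(?:(\d+)h)?(?:(\d+)m)?(?:(\d+(?:\.\d+)?)s)?", text)
    if not text or m is None:
        raise ValueError("unrecognized duration: %r" % text)
    hours, minutes, seconds = m.groups()
    return timedelta(
        hours=int(hours or 0),
        minutes=int(minutes or 0),
        seconds=float(seconds or 0),
    )


def overdue_reasons(line, now):
    """List the reasons an active lease should already have been released."""
    fields = dict(re.findall(r"(\w+)=(\S+)", line))
    reasons = []
    if fields.get("state") != "active":
        return reasons
    if "idle_for" in fields and "idle_timeout" in fields:
        if parse_duration(fields["idle_for"]) > parse_duration(fields["idle_timeout"]):
            reasons.append("idle_for exceeds idle_timeout")
    if "expires" in fields:
        # Normalize the trailing 'Z' so fromisoformat accepts it on older Pythons.
        expires = datetime.fromisoformat(fields["expires"].replace("Z", "+00:00"))
        if expires < now:
            reasons.append("expires is in the past")
    return reasons


if __name__ == "__main__":
    # Illustrative timestamp; in practice use datetime.now(timezone.utc).
    now = datetime(2026, 6, 5, 19, 0, tzinfo=timezone.utc)
    print(overdue_reasons(STATUS_LINE, now))
```

A check like this could flag leases that the coordinator still reports as active after both the idle timeout and the TTL have passed.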

A manual release attempt with a long local timeout failed at the broker release endpoint:

```text
crabbox stop --provider azure --target linux --id cbx_ddea6cab6b52
Post "https://crabbox.openclaw.ai/v1/leases/cbx_ddea6cab6b52/release": context deadline exceeded
```

Follow-up `list`, `status`, and `inspect` calls still showed it as active/ready. This issue therefore covers both delayed visibility of late leases and a release timeout for at least one late Azure lease.

## Expected Behavior

One of these should happen:

1. Azure allocation returns the lease once the VM becomes SSH-ready, within the configured wait window for normal OpenClaw proof runs.
2. If Azure provisioning is legitimately slow, the CLI/coordinator reports a precise capacity/provisioning-delay state instead of a generic acquire timeout.
3. If a lease is still provisioning after the caller times out, late lease cleanup/status is reliable: the hinted lease id should be visible, inspectable, and stoppable once it exists.
4. The CLI should not leave the operator in a state where the proof run fails but a paid Azure lease later becomes active outside the failed run's control path.
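Until items 3 and 4 hold, a caller-side workaround is to keep polling the hinted lease id after the timeout and release it once it appears. The sketch below uses only the `crabbox status` and `crabbox stop` invocations shown in this report. Detecting readiness by looking for `ready=true` in the output is an assumption based on the sample status line, and `reap_late_lease` and `run_cli` are hypothetical names. It does not address the release endpoint timeout seen for `amber-barnacle`.

```python
import subprocess
import time


def run_cli(args):
    """Run a command and return its combined stdout and stderr as text."""
    proc = subprocess.run(args, capture_output=True, text=True)
    return proc.stdout + proc.stderr


def reap_late_lease(lease_id, attempts=20, delay_s=30.0, run=run_cli, sleep=time.sleep):
    """Poll for a late lease and release it once it reports ready=true.

    Returns the output of the stop command if the lease appeared,
    or None if it never became visible within the polling budget.
    """
    status_cmd = ["crabbox", "status", "--provider", "azure", "--id", lease_id]
    stop_cmd = [
        "crabbox", "stop", "--provider", "azure",
        "--target", "linux", "--id", lease_id,
    ]
    for attempt in range(attempts):
        # Assumption: a visible, ready lease prints ready=true, while a
        # not-yet-visible lease prints the 404 not_found error instead.
        if "ready=true" in run(status_cmd):
            return run(stop_cmd)
        if attempt < attempts - 1:
            sleep(delay_s)
    return None


if __name__ == "__main__":
    print(reap_late_lease("cbx_ddea6cab6b52"))
```

The `run` and `sleep` parameters are injectable so the polling logic can be exercised without a broker. The defaults (20 attempts, 30 seconds apart) are arbitrary illustrative values.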

## Why This Looks Reportable

This does not appear to be a local user-auth problem:

- Broker auth is configured and works for the `openclaw` org.
- `crabbox doctor --provider azure` reaches the broker/provider and reports Azure coordinator secrets ready.
- An already-ready Azure lease can be attached to and used successfully.
- AWS brokered leases are usable from the same machine/session.

This also does not appear to be repo sync/test setup, because the failing repro uses `--no-sync --no-hydrate` and only tries to run `echo`, `uname`, `whoami`, and `pwd`.

Related PR history suggests small Azure Linux brokered warmups have previously completed well inside 10 minutes:

- #39 reported brokered Azure Linux `Standard_D2ads_v6` warmup around 2m25s.
- #111 reported brokered Azure `Standard_D2ads_v6` warmup around 1m55s.

The current symptom is therefore either a real Azure capacity/provisioning latency issue that needs better surfacing, or a coordinator/CLI late-lease lifecycle bug.

<img width="1669" height="663" alt="Image" src="https://github.com/user-attachments/assets/aef6fbe2-e686-406d-ae9f-f865eebcce1b" />

The screenshot above is from the UI, where I could see both boxes listed as available afterwards.

## Acceptance Criteria

- A fresh brokered Azure no-sync smoke either succeeds or returns a clear, actionable capacity/provisioning status.
- Late Azure leases created by timed-out attempts are consistently visible to `status`/`inspect`/`list` once they exist.
- Timed-out attempts do not leave untracked active Azure leases, or the CLI provides a reliable cleanup command that works after late provisioning completes.
- If the right fix is a longer/default Azure acquire timeout for `Standard_D4ads_v6`/managed OS disk paths, document that expectation in the provider docs and OpenClaw `.crabbox.yaml` guidance.
