apply_wal_with_fallback: DELETE fallback uncaught — crashes on APFS external SSDs

## Bug Description

`apply_wal_with_fallback()` in `hermes_state.py` fails completely when `~/.hermes` is on an APFS external SSD. Both WAL and DELETE journal modes throw "disk I/O error". The DELETE fallback is uncaught, so the exception propagates up and crashes every caller that depends on a SQLite connection — kanban dispatcher, SessionDB init, API server, holographic memory store, etc.

## Environment

- macOS 26.5
- APFS external SSD (Thunderbolt / USB-C)
- `~/.hermes` lives on the external volume
- SQLite 3.x (system default)

## Root Cause

The fix from #22032 added `apply_wal_with_fallback()` with `_WAL_INCOMPAT_MARKERS` including `"disk i/o error"`. WAL failures matching these correctly trigger a DELETE fallback on line 160. However, when DELETE *also* fails with a disk I/O error (as seen on APFS external SSDs), that exception is NOT caught — it propagates out unhandled:

```python
except sqlite3.OperationalError as exc:
    msg = str(exc).lower()
    if not any(marker in msg for marker in _WAL_INCOMPAT_MARKERS):
        raise
    _log_wal_fallback_once(db_label, exc)
    conn.execute("PRAGMA journal_mode=DELETE")   # <-- UNCAUGHT
    return "delete"
```

Impact on callers:

- **SessionDB.__init__** (`hermes_state.py:354`): caught by its own `except`, sets `_last_init_error`, re-raises → session DB stays `None`, features like `/resume`, `/title`, `/history` silently break
- **kanban_db.connect()** (`kanban_db.py:1050`): caught by its own `except`, closes connection, re-raises → kanban dispatcher crashes every 60s when the dashboard is open
- **ResponseStore init** (`gateway/platforms/api_server.py:349`): same pattern → response store unavailable
- **MemoryStore init** (`plugins/memory/holographic/store.py:134`): same pattern → holographic memory store fails
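The uncaught path can be reproduced without an external SSD. The following is a minimal sketch, not the project's code: `FailingJournalConnection`, `apply_wal_with_fallback_current`, and `reproduce` are hypothetical names, the marker tuple is a stand-in for the real `_WAL_INCOMPAT_MARKERS`, and the logging call is omitted. It forces every `journal_mode` change to fail with "disk I/O error" and shows that the second failure escapes the function.

```python
import sqlite3

# Assumption: stand-in for the real marker tuple in hermes_state.py.
_WAL_INCOMPAT_MARKERS = ("disk i/o error",)


class FailingJournalConnection(sqlite3.Connection):
    """Hypothetical test double: every journal_mode change fails."""

    def execute(self, sql, *args):
        if sql.strip().lower().startswith("pragma journal_mode="):
            raise sqlite3.OperationalError("disk I/O error")
        return super().execute(sql, *args)


def apply_wal_with_fallback_current(conn):
    """Mirrors the control flow of the current handler (logging omitted)."""
    try:
        conn.execute("PRAGMA journal_mode=WAL")
        return "wal"
    except sqlite3.OperationalError as exc:
        msg = str(exc).lower()
        if not any(marker in msg for marker in _WAL_INCOMPAT_MARKERS):
            raise
        # The second failure is raised from inside the except block
        # and is not handled anywhere in this function.
        conn.execute("PRAGMA journal_mode=DELETE")
        return "delete"


def reproduce():
    """Return the message of the escaping exception, or None if none escaped."""
    conn = sqlite3.connect(":memory:", factory=FailingJournalConnection)
    try:
        apply_wal_with_fallback_current(conn)
    except sqlite3.OperationalError as exc:
        return str(exc)
    finally:
        conn.close()
    return None
```

Calling `reproduce()` returns the message of the exception raised by the DELETE pragma, which is the error each caller listed above ends up seeing.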

## Workaround

Manually set `journal_mode=DELETE` and run `VACUUM` on each affected database:

```bash
sqlite3 ~/.hermes/state.db "PRAGMA journal_mode=DELETE; VACUUM;"
sqlite3 ~/.hermes/kanban/default.db "PRAGMA journal_mode=DELETE; VACUUM;"
```

This takes the database out of WAL mode. WAL is the only journal mode SQLite persists in the database file, so clearing it means subsequent connections start in DELETE mode and never trigger the WAL fallback path.
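To confirm the workaround took effect, the journal mode can be read back from each file. This is a small sketch using only the standard `sqlite3` module; `current_journal_mode` is a hypothetical helper name, not part of Hermes.

```python
import sqlite3


def current_journal_mode(db_path):
    """Return the journal mode SQLite reports for the database at db_path."""
    conn = sqlite3.connect(str(db_path))
    try:
        # PRAGMA journal_mode without an argument only queries the mode.
        (mode,) = conn.execute("PRAGMA journal_mode").fetchone()
        return mode.lower()
    finally:
        conn.close()
```

For example, `current_journal_mode(Path.home() / ".hermes" / "state.db")` (with `from pathlib import Path`) should report `delete` after the commands above have run successfully.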

## Proposed Fix

Wrap the DELETE fallback in a try/except. If both WAL and DELETE fail, log a warning and continue with the connection's default journal mode.

```python
def apply_wal_with_fallback(
    conn: sqlite3.Connection,
    *,
    db_label: str = "state.db",
) -> str:
    try:
        conn.execute("PRAGMA journal_mode=WAL")
        return "wal"
    except sqlite3.OperationalError as exc:
        msg = str(exc).lower()
        # Only fall back for known WAL-incompatibility errors; re-raise the rest.
        if not any(marker in msg for marker in _WAL_INCOMPAT_MARKERS):
            raise
        _log_wal_fallback_once(db_label, exc)
        try:
            conn.execute("PRAGMA journal_mode=DELETE")
            return "delete"
        except sqlite3.OperationalError as delete_exc:
            # Neither pragma could be applied: keep the connection alive
            # instead of crashing the caller.
            logger.warning(
                "%s: both WAL and DELETE journal_mode failed "
                "(WAL: %s, DELETE: %s). "
                "Continuing with default journal mode.",
                db_label, exc, delete_exc,
            )
            # Still reports "delete" (non-WAL) to callers; the mode actually
            # in effect is whatever the connection already had.
            return "delete"
```

### Tests to update

`test_captures_cause_on_failed_init` in `tests/test_hermes_state_wal_fallback.py` currently expects `SessionDB()` to *raise* when both pragmas fail. With the fix, SessionDB would succeed (both errors caught internally). Update the test to verify:
1. `SessionDB()` succeeds despite both journal_mode pragmas failing
2. The connection is usable for reads/writes
3. A warning is logged (new test or extend the existing one)
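
The three checks above could be sketched as follows. This is an illustration under stated assumptions rather than the real test: it exercises a condensed copy of the proposed function instead of `SessionDB()`, omits `_log_wal_fallback_once`, and uses hypothetical names (`FailingJournalConnection`, `_ListHandler`, the `hermes_state_sketch` logger, and a stand-in marker tuple).

```python
import logging
import sqlite3

logger = logging.getLogger("hermes_state_sketch")  # hypothetical logger name
_WAL_INCOMPAT_MARKERS = ("disk i/o error",)  # stand-in for the real tuple


class FailingJournalConnection(sqlite3.Connection):
    """Hypothetical test double: every journal_mode change fails."""

    def execute(self, sql, *args):
        if sql.strip().lower().startswith("pragma journal_mode="):
            raise sqlite3.OperationalError("disk I/O error")
        return super().execute(sql, *args)


def apply_wal_with_fallback(conn, *, db_label="state.db"):
    """Condensed copy of the proposed fix (one-time fallback log omitted)."""
    try:
        conn.execute("PRAGMA journal_mode=WAL")
        return "wal"
    except sqlite3.OperationalError as exc:
        if not any(m in str(exc).lower() for m in _WAL_INCOMPAT_MARKERS):
            raise
        try:
            conn.execute("PRAGMA journal_mode=DELETE")
        except sqlite3.OperationalError as delete_exc:
            logger.warning(
                "%s: both WAL and DELETE journal_mode failed (WAL: %s, DELETE: %s)",
                db_label, exc, delete_exc,
            )
        return "delete"


class _ListHandler(logging.Handler):
    """Collects log records so the test can inspect them."""

    def __init__(self):
        super().__init__()
        self.records = []

    def emit(self, record):
        self.records.append(record)


def test_connection_survives_double_pragma_failure():
    handler = _ListHandler()
    logger.addHandler(handler)
    conn = sqlite3.connect(":memory:", factory=FailingJournalConnection)
    try:
        # 1. No exception escapes even though both pragmas fail.
        apply_wal_with_fallback(conn, db_label="test.db")
        # 2. The connection still handles writes and reads.
        conn.execute("CREATE TABLE t (x INTEGER)")
        conn.execute("INSERT INTO t VALUES (1)")
        assert conn.execute("SELECT x FROM t").fetchone() == (1,)
        # 3. Exactly one warning describing the double failure was logged.
        warnings = [r for r in handler.records if r.levelno == logging.WARNING]
        assert len(warnings) == 1
        assert "both WAL and DELETE" in warnings[0].getMessage()
    finally:
        logger.removeHandler(handler)
        conn.close()
```

In the real suite the same assertions would presumably go through `SessionDB()` and pytest's `caplog` fixture; the custom handler here only keeps the sketch free of test-framework dependencies.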

### All callers (would benefit from the fix without any changes)

- `hermes_state.py:354` — SessionDB.__init__
- `hermes_cli/kanban_db.py:1050` — kanban_db.connect()
- `gateway/platforms/api_server.py:349` — ResponseStore init
- `plugins/memory/holographic/store.py:134` — MemoryStore init
