[DGX Spark][Sandbox] Hermes gateway kanban dispatcher logs repeated Traceback under shields-up — sqlite3 OperationalError on readonly database #4299

@mercl-lau

Description

When a NemoHermes sandbox has shields up (read-only lockdown), the Hermes gateway's kanban dispatcher fails on every tick (every few seconds) with sqlite3.OperationalError while opening kanban.db: the connection issues PRAGMA journal_mode=WAL, which needs write access to the database. Each failure writes an ERROR line and a full Traceback to gateway.log. The gateway health endpoint still returns 200, so the failure is silent to the user, but kanban task scheduling is completely broken under shields-up and the log fills with recurring stack traces.

Environment

Device:        DGX Spark (NVIDIA GB10, spark-8158)
OS:            Ubuntu 24.04.4 LTS (aarch64)
Architecture:  aarch64
Node.js:       v22.22.3
npm:           10.9.8
Docker:        Docker 29.2.1
OpenShell CLI: openshell 0.0.44
NemoClaw:      v0.0.52
NemoHermes:    v0.0.52
Hermes Agent:  v2026.5.16

Steps to Reproduce

  1. nemohermes onboard --name hermes-ollama (provider: Ollama, model: qwen3.6:35b)
  2. Wait for sandbox to reach Phase=Ready
  3. nemohermes hermes-ollama shields up
  4. Wait 30 seconds
  5. openshell sandbox exec -n hermes-ollama -- sh -c 'head -50 /tmp/gateway.log'
  6. openshell sandbox exec -n hermes-ollama -- sh -c 'grep -c Traceback /tmp/gateway.log'

Expected Result

gateway.log under shields-up should contain at most documented non-fatal warnings (e.g. channel_directory Permission denied). It MUST NOT contain "Traceback", "FATAL", or "Aborting" (per DevTest T6047907 expected result).
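The check in steps 5 and 6 can also be scripted. The following is a minimal sketch, not part of any DevTest tooling: it counts the log lines that contain each forbidden marker, the same way `grep -c` does per pattern. The names `FORBIDDEN_MARKERS`, `count_forbidden`, and `log_is_clean` are hypothetical.

```python
# Markers that the expected result says must not appear in gateway.log.
FORBIDDEN_MARKERS = ("Traceback", "FATAL", "Aborting")


def count_forbidden(log_text):
    """Return {marker: number of lines containing it}, like `grep -c` per marker."""
    lines = log_text.splitlines()
    return {m: sum(1 for line in lines if m in line) for m in FORBIDDEN_MARKERS}


def log_is_clean(log_text):
    """Return True when none of the forbidden markers appears in the log."""
    return not any(count_forbidden(log_text).values())
```

Passing the contents of /tmp/gateway.log from the sandbox to `log_is_clean` gives a pass/fail result; documented warnings such as the channel_directory message do not match any marker and leave the log clean.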

The kanban dispatcher should either:

  • Skip write operations when the filesystem is read-only
  • Put kanban.db under a writable path (e.g. /tmp)
  • Gracefully degrade without logging ERROR + Traceback on every tick
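As an illustration of the first and third options, here is a minimal sketch of a tick wrapper that skips work on a read-only database and warns once instead of logging a Traceback on every tick. It is not the Hermes implementation: `ReadOnlyAwareTicker`, `is_readonly_error`, and the logger name are hypothetical, and the read-only check assumes the error message shown in this report.

```python
import logging
import sqlite3

log = logging.getLogger("kanban.sketch")  # hypothetical logger name


def is_readonly_error(exc):
    """Best-effort check for SQLITE_READONLY failures."""
    # sqlite_errorname exists on Python 3.11+; fall back to the message text.
    name = getattr(exc, "sqlite_errorname", "") or ""
    return name.startswith("SQLITE_READONLY") or "readonly database" in str(exc)


class ReadOnlyAwareTicker:
    """Run one unit of dispatcher work per tick, degrading on read-only DBs."""

    def __init__(self, connect):
        self._connect = connect  # zero-argument callable returning a connection
        self._warned = False
        self.skipped = 0

    def tick(self, work):
        """Return True if work ran and committed, False if the tick was skipped."""
        try:
            conn = self._connect()
            try:
                work(conn)
                conn.commit()
            finally:
                conn.close()
        except sqlite3.OperationalError as exc:
            if not is_readonly_error(exc):
                raise  # unrelated failures still surface
            self.skipped += 1
            if not self._warned:
                log.warning("kanban database is read-only; skipping ticks")
                self._warned = True
            return False
        self._warned = False  # warn again if the database turns read-only later
        return True
```

With this shape, a shields-up sandbox would produce one WARNING line rather than an ERROR plus Traceback per tick, and normal dispatching resumes once the database becomes writable again.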

Actual Result

gateway.log contains 7 Traceback entries within 30 seconds of shields-up:

ERROR gateway.run: kanban dispatcher: tick failed on board default
Traceback (most recent call last):
  File "/opt/hermes/gateway/run.py", line 4693, in _tick_once_for_board
    conn = _kb.connect(board=slug)
  File "/opt/hermes/hermes_cli/kanban_db.py", line 928, in connect
    apply_wal_with_fallback(conn, db_label=f"kanban.db ({path.name})")
  File "/opt/hermes/hermes_state.py", line 152, in apply_wal_with_fallback
    conn.execute("PRAGMA journal_mode=WAL")
sqlite3.OperationalError: attempt to write a readonly database

The error repeats indefinitely on every tick cycle.
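The traceback shows the failure at PRAGMA journal_mode=WAL, before any kanban table is touched. One possible way to tolerate that, sketched here under assumptions, is to treat a read-only failure at this step as "keep the current journal mode". The function name `enable_wal_if_writable` is hypothetical, and this is not the body of the real apply_wal_with_fallback, whose implementation is not shown in this report.

```python
import sqlite3


def enable_wal_if_writable(conn):
    """Try to switch to WAL; on a read-only database keep the current mode.

    Returns the journal mode in effect afterwards (for example "wal" or "delete").
    """
    try:
        row = conn.execute("PRAGMA journal_mode=WAL").fetchone()
    except sqlite3.OperationalError as exc:
        # Assumption: the read-only case is identified by the message text
        # seen in the report ("attempt to write a readonly database").
        if "readonly database" not in str(exc):
            raise
        # Querying the journal mode does not write, so it works read-only.
        row = conn.execute("PRAGMA journal_mode").fetchone()
    return row[0]
```

This only removes the failure at connect time. Any later write by the dispatcher would still fail on a read-only database, so the dispatcher would also need to skip writes or use a writable path.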

Logs

$ openshell sandbox exec -n hermes-ollama -- sh -c 'grep -c Traceback /tmp/gateway.log'
7

$ openshell sandbox exec -n hermes-ollama -- sh -c 'head -50 /tmp/gateway.log'
WARNING gateway.run: No user allowlists configured.
WARNING gateway.platforms.api_server: No API key configured
ERROR gateway.run: kanban dispatcher: tick failed on board default
Traceback (most recent call last):
  File "/opt/hermes/gateway/run.py", line 4693, in _tick_once_for_board
    conn = _kb.connect(board=slug)
  File "/opt/hermes/hermes_cli/kanban_db.py", line 928, in connect
    apply_wal_with_fallback(conn, db_label=f"kanban.db ({path.name})")
  File "/opt/hermes/hermes_state.py", line 152, in apply_wal_with_fallback
    conn.execute("PRAGMA journal_mode=WAL")
sqlite3.OperationalError: attempt to write a readonly database
[the ERROR + Traceback block above appears 7 times in 30 seconds]

WARNING gateway.channel_directory: Channel directory: failed to write:
  [Errno 13] Permission denied: '/sandbox/.hermes/.channel_directory_*.tmp'

Note: The channel_directory warning is documented non-fatal noise.
The kanban sqlite3 Traceback is NOT documented and is a new error path.

NVB#6228504

Labels

  • NV QA (Bugs found by the NVIDIA QA Team)
  • area: observability (Logging, metrics, tracing, diagnostics, or debug output)
  • integration: hermes (Hermes integration behavior)
  • platform: dgx-spark (Affects DGX Spark hardware or workflows)