Skip to content

fix: guard json.loads() and use atomic writes for persistent state#29019

Open
annguyenNous wants to merge 1 commit into
NousResearch:mainfrom
annguyenNous:fix/guard-json-loads-and-atomic-writes
Open

fix: guard json.loads() and use atomic writes for persistent state#29019
annguyenNous wants to merge 1 commit into
NousResearch:mainfrom
annguyenNous:fix/guard-json-loads-and-atomic-writes

Conversation

@annguyenNous

Copy link
Copy Markdown
Contributor

Problem

Multiple files have unguarded json.loads() calls that crash on malformed/non-JSON responses, and several persistent state files use non-atomic write_text() that can corrupt on crash.

Fix

JSON decode guards (5 files):

  • weixin.py: _api_post() and _api_get() now catch json.JSONDecodeError and raise descriptive RuntimeError
  • browser_cdp_tool.py: Malformed CDP WebSocket frames are skipped instead of crashing the attach-to-target loop
  • osv_check.py: Non-JSON OSV API responses (502/503 HTML) return [] instead of crashing
  • api_server.py: Corrupted SQLite cache entries return None instead of propagating JSONDecodeError

Atomic writes (6 files):

  • skills_hub.py: HubLockFile and HubTapFile now use atomic_json_write() (temp+fsync+replace)
  • model_metadata.py: Context length cache uses atomic_yaml_write()
  • delivery.py: Cron delivery output uses tempfile+fsync+os.replace
  • run.py: Voice mode preferences use atomic_json_write()
  • environments/base.py: JSON store uses atomic_json_write()
  • checkpoint_manager.py: Project metadata uses atomic_json_write()

Resource management (1 file):

  • vision_tools.py: PIL Image.open() handle now closed in finally block to prevent fd leak during resize loops

Before vs After

File Before After
weixin.py API calls json.JSONDecodeError crash on non-200 HTML RuntimeError with context
CDP WebSocket Crash on malformed frame Skip and continue
osv_check.py Crash on 502 HTML body Return empty list
api_server.py cache Crash on corrupted SQLite Return None (cache miss)
skills_hub.py writes Partial write on crash Atomic replace
model_metadata.py cache Corrupted YAML on crash Atomic replace
vision_tools.py PIL fd leak until GC Explicit close in finally

Tests

  • All 11 modified files pass py_compile syntax check
  • Zero behavioral changes for happy paths — only error handling improved

JSON decode guards (5 files):
- weixin.py: wrap _api_post/_api_get json.loads in try/except
- browser_cdp_tool.py: skip malformed CDP WebSocket frames
- osv_check.py: return [] on non-JSON OSV API responses
- api_server.py: return None on corrupted SQLite cache entries

Atomic writes (6 files):
- skills_hub.py: use atomic_json_write for lock/taps files
- model_metadata.py: use atomic_yaml_write for context cache
- delivery.py: use tempfile+fsync+os.replace for delivery output
- run.py: use atomic_json_write for voice mode preferences
- environments/base.py: use atomic_json_write for JSON store
- checkpoint_manager.py: use atomic_json_write for project metadata

Resource management (1 file):
- vision_tools.py: close PIL Image in finally block to prevent fd leak
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

comp/gateway Gateway runner, session dispatch, delivery comp/tools Tool registry, model_tools, toolsets P2 Medium — degraded but workaround exists tool/browser Browser automation (CDP, Playwright) tool/skills Skills system (list, view, manage) tool/vision Vision analysis and image generation type/bug Something isn't working

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants