Fix/misc bug fixes#1974
Conversation
- agent: remove misplaced _set_interrupt(False) from _hydrate_todo_store() that silently cleared pending interrupts on every run_conversation call - agent: fix read_file in both _PATH_SCOPED_TOOLS and _PARALLEL_SAFE_TOOLS causing unnecessary batch serialization when path arg is missing - agent: fix temp file leak in _materialize_data_url_for_vision() when base64.b64decode() raises an exception - state: add missing thread lock to session_count() and message_count() preventing race conditions with concurrent gateway requests - state: add intermediate commits between schema migration steps to prevent version/schema mismatch on mid-migration crashes - tools: make firecrawl import lazy in web_tools.py so non-Firecrawl backends (Tavily, Parallel) don't crash on module import - tools: guard fcntl import in memory_tool.py for Windows compatibility per CONTRIBUTING.md cross-platform rules - gateway: wrap JSONL transcript write in try/except in session.py to prevent gateway crash on disk full or permission errors - gateway: add OSError handling to _read_pid_record() in status.py to handle concurrent PID file writes gracefully - cron: guard lock_fd.close() with None check in scheduler.py finally block to prevent AttributeError if open() never completed - cron: narrow RuntimeError catch in _deliver_result() to only catch event-loop-related errors, preventing spurious retries on unrelated RuntimeErrors from _send_to_platform
|
Orb Code Review (powered by GLM 5.1 on Orb Cloud) SummaryFixes 11 independently discovered bugs across the agent core, gateway, cron scheduler, and tools. Each fix is small and targeted. Key highlights:
ArchitectureAll fixes follow the existing patterns in the codebase. The bug fixes are orthogonal — no cross-dependencies between them. IssuesWarning — except RuntimeError as e:
if "event loop" not in str(e).lower():
raiseString-matching on exception messages is fragile — if Python changes the Suggestion — if current_version < 2:
try:
cursor.execute("ALTER TABLE sessions ADD COLUMN ...")
cursor.execute("UPDATE schema_version SET version = 2")
self._conn.commit()
except sqlite3.OperationalError:
self._conn.rollback()
passThis is a minor improvement — the current code is already better than the original (single commit for all migrations). Note — Cross-file impact
Assessmentapprove ✅ — Well-structured batch of targeted bug fixes. Each addresses a real issue. Good test additions for the path normalization fix. The PR description's bug table is excellent for review traceability. |
1 similar comment
|
Orb Code Review (powered by GLM 5.1 on Orb Cloud) SummaryFixes 11 independently discovered bugs across the agent core, gateway, cron scheduler, and tools. Each fix is small and targeted. Key highlights:
ArchitectureAll fixes follow the existing patterns in the codebase. The bug fixes are orthogonal — no cross-dependencies between them. IssuesWarning — except RuntimeError as e:
if "event loop" not in str(e).lower():
raiseString-matching on exception messages is fragile — if Python changes the Suggestion — if current_version < 2:
try:
cursor.execute("ALTER TABLE sessions ADD COLUMN ...")
cursor.execute("UPDATE schema_version SET version = 2")
self._conn.commit()
except sqlite3.OperationalError:
self._conn.rollback()
passThis is a minor improvement — the current code is already better than the original (single commit for all migrations). Note — Cross-file impact
Assessmentapprove ✅ — Well-structured batch of targeted bug fixes. Each addresses a real issue. Good test additions for the path normalization fix. The PR description's bug table is excellent for review traceability. |
Salvages the three substantive low-severity fixes from Gutslabs' #1974 "misc bug fixes" bundle. The other 8 claims in that PR were either already fixed on main with superior implementations (state lock, firecrawl lazy import, fcntl/msvcrt guard, path normalization, schema migrations) or did not survive review. - run_agent: `_materialize_data_url_for_vision` uses `NamedTemporaryFile(delete=False)`; if `base64.b64decode` raises on a corrupt data URL the temp file would persist forever. Wrap the write in try/except and `os.unlink` the temp on failure. - gateway/session: `append_to_transcript` JSONL write had no error handling, so disk-full / read-only-fs / permission errors crashed the message handler. The SQLite write above is the primary store, so swallow OSError on the JSONL fallback with a debug log. - gateway/status: `_read_pid_record` reads `pid_path.read_text()` after an `exists()` check; if the PID file is deleted between the two calls (concurrent gateway restart) we hit an unhandled OSError. Catch it and return None. Adds a regression test for the tempfile cleanup; the other two paths are defensive try/excepts on infrequent OSError that don't warrant dedicated tests. Co-authored-by: Teknium <127238744+teknium1@users.noreply.github.com>
|
Salvaged via #23591 — merge commit 3af3c4e on main with your authorship preserved (rebase merge). After review, only 3 of the 11 claims survived:
The other 8 claims were either already fixed on main with superior implementations (state lock, firecrawl lazy import, fcntl/msvcrt guard with proper Windows locking, path normalization, schema migrations rewritten declaratively) or didn't survive analysis (the interrupt clear is immediately re-asserted by run_conversation; cron lock_fd is guaranteed non-None by the outer try block structure; etc). Thanks for the contribution — credit preserved in git log for the salvaged fixes. |
Salvages the three substantive low-severity fixes from Gutslabs' NousResearch#1974 "misc bug fixes" bundle. The other 8 claims in that PR were either already fixed on main with superior implementations (state lock, firecrawl lazy import, fcntl/msvcrt guard, path normalization, schema migrations) or did not survive review. - run_agent: `_materialize_data_url_for_vision` uses `NamedTemporaryFile(delete=False)`; if `base64.b64decode` raises on a corrupt data URL the temp file would persist forever. Wrap the write in try/except and `os.unlink` the temp on failure. - gateway/session: `append_to_transcript` JSONL write had no error handling, so disk-full / read-only-fs / permission errors crashed the message handler. The SQLite write above is the primary store, so swallow OSError on the JSONL fallback with a debug log. - gateway/status: `_read_pid_record` reads `pid_path.read_text()` after an `exists()` check; if the PID file is deleted between the two calls (concurrent gateway restart) we hit an unhandled OSError. Catch it and return None. Adds a regression test for the tempfile cleanup; the other two paths are defensive try/excepts on infrequent OSError that don't warrant dedicated tests. Co-authored-by: Teknium <127238744+teknium1@users.noreply.github.com>
Salvages the three substantive low-severity fixes from Gutslabs' NousResearch#1974 "misc bug fixes" bundle. The other 8 claims in that PR were either already fixed on main with superior implementations (state lock, firecrawl lazy import, fcntl/msvcrt guard, path normalization, schema migrations) or did not survive review. - run_agent: `_materialize_data_url_for_vision` uses `NamedTemporaryFile(delete=False)`; if `base64.b64decode` raises on a corrupt data URL the temp file would persist forever. Wrap the write in try/except and `os.unlink` the temp on failure. - gateway/session: `append_to_transcript` JSONL write had no error handling, so disk-full / read-only-fs / permission errors crashed the message handler. The SQLite write above is the primary store, so swallow OSError on the JSONL fallback with a debug log. - gateway/status: `_read_pid_record` reads `pid_path.read_text()` after an `exists()` check; if the PID file is deleted between the two calls (concurrent gateway restart) we hit an unhandled OSError. Catch it and return None. Adds a regression test for the tempfile cleanup; the other two paths are defensive try/excepts on infrequent OSError that don't warrant dedicated tests. Co-authored-by: Teknium <127238744+teknium1@users.noreply.github.com>
Salvages the three substantive low-severity fixes from Gutslabs' NousResearch#1974 "misc bug fixes" bundle. The other 8 claims in that PR were either already fixed on main with superior implementations (state lock, firecrawl lazy import, fcntl/msvcrt guard, path normalization, schema migrations) or did not survive review. - run_agent: `_materialize_data_url_for_vision` uses `NamedTemporaryFile(delete=False)`; if `base64.b64decode` raises on a corrupt data URL the temp file would persist forever. Wrap the write in try/except and `os.unlink` the temp on failure. - gateway/session: `append_to_transcript` JSONL write had no error handling, so disk-full / read-only-fs / permission errors crashed the message handler. The SQLite write above is the primary store, so swallow OSError on the JSONL fallback with a debug log. - gateway/status: `_read_pid_record` reads `pid_path.read_text()` after an `exists()` check; if the PID file is deleted between the two calls (concurrent gateway restart) we hit an unhandled OSError. Catch it and return None. Adds a regression test for the tempfile cleanup; the other two paths are defensive try/excepts on infrequent OSError that don't warrant dedicated tests. Co-authored-by: Teknium <127238744+teknium1@users.noreply.github.com>
Salvages the three substantive low-severity fixes from Gutslabs' NousResearch#1974 "misc bug fixes" bundle. The other 8 claims in that PR were either already fixed on main with superior implementations (state lock, firecrawl lazy import, fcntl/msvcrt guard, path normalization, schema migrations) or did not survive review. - run_agent: `_materialize_data_url_for_vision` uses `NamedTemporaryFile(delete=False)`; if `base64.b64decode` raises on a corrupt data URL the temp file would persist forever. Wrap the write in try/except and `os.unlink` the temp on failure. - gateway/session: `append_to_transcript` JSONL write had no error handling, so disk-full / read-only-fs / permission errors crashed the message handler. The SQLite write above is the primary store, so swallow OSError on the JSONL fallback with a debug log. - gateway/status: `_read_pid_record` reads `pid_path.read_text()` after an `exists()` check; if the PID file is deleted between the two calls (concurrent gateway restart) we hit an unhandled OSError. Catch it and return None. Adds a regression test for the tempfile cleanup; the other two paths are defensive try/excepts on infrequent OSError that don't warrant dedicated tests. Co-authored-by: Teknium <127238744+teknium1@users.noreply.github.com>
Salvages the three substantive low-severity fixes from Gutslabs' NousResearch#1974 "misc bug fixes" bundle. The other 8 claims in that PR were either already fixed on main with superior implementations (state lock, firecrawl lazy import, fcntl/msvcrt guard, path normalization, schema migrations) or did not survive review. - run_agent: `_materialize_data_url_for_vision` uses `NamedTemporaryFile(delete=False)`; if `base64.b64decode` raises on a corrupt data URL the temp file would persist forever. Wrap the write in try/except and `os.unlink` the temp on failure. - gateway/session: `append_to_transcript` JSONL write had no error handling, so disk-full / read-only-fs / permission errors crashed the message handler. The SQLite write above is the primary store, so swallow OSError on the JSONL fallback with a debug log. - gateway/status: `_read_pid_record` reads `pid_path.read_text()` after an `exists()` check; if the PID file is deleted between the two calls (concurrent gateway restart) we hit an unhandled OSError. Catch it and return None. Adds a regression test for the tempfile cleanup; the other two paths are defensive try/excepts on infrequent OSError that don't warrant dedicated tests. Co-authored-by: Teknium <127238744+teknium1@users.noreply.github.com>
Salvages the three substantive low-severity fixes from Gutslabs' NousResearch#1974 "misc bug fixes" bundle. The other 8 claims in that PR were either already fixed on main with superior implementations (state lock, firecrawl lazy import, fcntl/msvcrt guard, path normalization, schema migrations) or did not survive review. - run_agent: `_materialize_data_url_for_vision` uses `NamedTemporaryFile(delete=False)`; if `base64.b64decode` raises on a corrupt data URL the temp file would persist forever. Wrap the write in try/except and `os.unlink` the temp on failure. - gateway/session: `append_to_transcript` JSONL write had no error handling, so disk-full / read-only-fs / permission errors crashed the message handler. The SQLite write above is the primary store, so swallow OSError on the JSONL fallback with a debug log. - gateway/status: `_read_pid_record` reads `pid_path.read_text()` after an `exists()` check; if the PID file is deleted between the two calls (concurrent gateway restart) we hit an unhandled OSError. Catch it and return None. Adds a regression test for the tempfile cleanup; the other two paths are defensive try/excepts on infrequent OSError that don't warrant dedicated tests. Co-authored-by: Teknium <127238744+teknium1@users.noreply.github.com>
Summary
Fix 11 independently discovered bugs across the agent core, gateway, cron scheduler, and tools — none tracked in existing issues.
Bugs Fixed
run_agent.py_hydrate_todo_store()unconditionally called_set_interrupt(False), silently clearing pending interrupts on everyrun_conversationcallrun_agent.pyread_filelisted in both_PATH_SCOPED_TOOLSand_PARALLEL_SAFE_TOOLS— when called without a path arg, the path-scoped check returnedFalsefirst, unnecessarily serializing the entire batchrun_agent.py_materialize_data_url_for_vision()created aNamedTemporaryFile(delete=False)but never cleaned it up ifbase64.b64decode()raisedhermes_state.pysession_count()andmessage_count()were the onlySessionDBmethods that didn't acquireself._lock— race condition under concurrent gateway requestshermes_state.pyweb_tools.pyfrom firecrawl import Firecrawlcrashed the entire web tools module for users on Tavily or Parallel backendsmemory_tool.pyimport fcntl— immediateImportErroron Windows, violating CONTRIBUTING.md cross-platform rulesgateway/session.pyappend_to_transcript()JSONL write had no error handling — disk full or permission errors crashed the gateway message handlergateway/status.py_read_pid_record()calledread_text()withouttry/except— concurrent PID file writes caused unhandledOSErrorcron/scheduler.pyfinallyblock calledlock_fd.close()unconditionally, butlock_fdcould still beNoneifopen()never completed →AttributeErrorcron/scheduler.pyexcept RuntimeErrorin_deliver_result()was too broad — anyRuntimeErrorfrom_send_to_platform(not just event-loop errors) triggered a spurious retry in a new thread, risking double deliveryTest Plan
pytest tests/ -vpassesweb_toolswithout firecrawl package installed (Tavily/Parallel backend)memory_toolon Windows (fcntl guard)tick()handlesKeyboardInterruptduring lock acquisition withoutAttributeError