Skip to content

fix: rollback MemoryManager state when provider registration fails#10051

Open
nightq wants to merge 6 commits into
NousResearch:mainfrom
nightq:fix/issue-9948-memory-manager-rollback
Open

fix: rollback MemoryManager state when provider registration fails#10051
nightq wants to merge 6 commits into
NousResearch:mainfrom
nightq:fix/issue-9948-memory-manager-rollback

Conversation

@nightq

@nightq nightq commented Apr 15, 2026

Copy link
Copy Markdown

Summary

Fixes failed external provider registration leaving MemoryManager in a poisoned state that blocks all future external providers.

Root Cause

add_provider() mutated _has_external = True and appended to _providers before calling provider.get_tool_schemas(). If schema loading raised an exception, the manager retained the failed provider in its list and kept _has_external = True, causing all subsequent external providers to be rejected with "external provider X is already registered".

Fix

Load tool schemas before mutating any manager state. Only set _has_external = True and append to _providers after get_tool_schemas() succeeds.

Test Plan

  • New regression test verifies failed provider doesn't register and doesn't block subsequent providers
  • Existing MemoryManager tests still pass

Closes #9948

nightq added 6 commits April 15, 2026 11:41
Fixes NousResearch#9999

Root cause: _sanitize_api_messages compared raw tool_call_id strings
without stripping whitespace, causing valid tool results to be treated
as orphaned when IDs had leading/trailing spaces.
Fix: strip whitespace in _get_tool_call_id_static and when collecting
result_call_ids from tool messages.
Fixes NousResearch#9980

Root cause: _send_raw_message hardcoded 'chat_id' as receive_id_type,
causing [230001] invalid receive_id errors when sending to user open_ids
(prefix 'ou_') or union_ids (prefix 'on_').
Fix: Add _detect_receive_id_type() that checks ID prefix (oc_→chat_id,
ou_→open_id, on_→union_id) and use it in _send_raw_message.
Fixes NousResearch#9950

Root cause: Prompts discarded for having no reasoning were not added to
completed_in_batch, causing --resume to retry them indefinitely.
Fix: Append prompt_index to completed_in_batch before continuing past
the discard branch.
… env var

Fixes NousResearch#9869

Root cause: Hindsight cloud client had hardcoded 30s timeout, causing
hindsight_reflect to fail for longer operations.
Fix: Read timeout from HINDSIGHT_TIMEOUT env var with default of 300s.
…t cwd

Fixes NousResearch#9949

Root cause: Relative paths in skills.external_dirs were resolved against
the process cwd via Path.resolve(), causing inconsistent behavior across
CLI/gateway/cron contexts.
Fix: Check if the path is relative and resolve it against the config
file's parent directory instead of cwd.
Fixes NousResearch#9948

Root cause: add_provider() set _has_external=True and appended to
_providers BEFORE calling get_tool_schemas(). If schemas raised, the
manager was left in a half-registered state that blocked all future
external providers.
Fix: Call get_tool_schemas() first and only mutate state after it
succeeds.
@alt-glitch alt-glitch added type/bug Something isn't working P2 Medium — degraded but workaround exists tool/memory Memory tool and memory providers comp/plugins Plugin system and bundled plugins labels Apr 26, 2026
@alt-glitch

Copy link
Copy Markdown
Collaborator

Likely duplicate of #9997 — both fix the same root cause: add_provider() mutates state before schema validation in MemoryManager.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

comp/plugins Plugin system and bundled plugins duplicate This issue or pull request already exists P2 Medium — degraded but workaround exists tool/memory Memory tool and memory providers type/bug Something isn't working

Projects

None yet

Development

Successfully merging this pull request may close these issues.

MemoryManager add_provider() leaves failed external provider half-registered

2 participants