fix(memory): prevent stuck typing on embedding model switch #27175

Open
dakshaymehta wants to merge 2 commits into openclaw:main from dakshaymehta:fix/embedding-model-switch-typing-stuck

Conversation

@dakshaymehta
Contributor

Summary

Fixes #27143: changing the embedding model (e.g. text-embedding-3-small → text-embedding-3-large) causes the gateway to get stuck showing an infinite "typing..." indicator on Telegram after replying, and it then stops responding to new messages.

Root cause: A race condition in MemoryIndexManager during safe reindex triggered by the model change. The reindex mutates shared state (this.db, this.vector.*) on the singleton manager instance while concurrent async operations (agent memory search, session warm) use the same instance. Additionally, a falsy dimension check in ensureVectorTable failed to drop stale vec0 tables when dimensions were reset to undefined during the reindex, and an unguarded vec0 INSERT could throw unhandled errors that corrupt the indexing pipeline.

Changes

  1. Fix falsy dimension check in ensureVectorTable (manager-sync-ops.ts:223)

    • Changed this.vector.dims && this.vector.dims !== dimensions to this.vector.dims !== undefined && this.vector.dims !== dimensions
    • The truthy check skipped dropVectorTable() when dims was reset to undefined during safe reindex, leaving a stale vec0 table with wrong dimensions
  2. Add reindexing guard flag (manager-sync-ops.ts)

    • New protected reindexing = false flag set during runSafeReindex
    • ensureVectorReady returns false when reindexing, causing search to gracefully fall back to the non-vec0 cosine similarity path instead of operating on a temp database that may be closed mid-query
  3. Wrap vec0 INSERT in try-catch (manager-embedding-ops.ts:771-783)

    • The vec0 INSERT was the only vec0 write NOT wrapped in error handling — an unhandled throw from a dimension mismatch could corrupt the entire indexing pipeline
    • Now logs the error and continues indexing remaining chunks
  4. Improve dropVectorTable error logging (manager-sync-ops.ts:244)

    • Changed from log.debug to log.warn — a failed table drop is a significant issue that was invisible at debug level
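To make changes 1 and 3 concrete, here is a minimal sketch, not the code from manager-sync-ops.ts or manager-embedding-ops.ts. Every name in it (shouldDropOld, shouldDropNew, insertVectors, Chunk) is a hypothetical stand-in. The first part compares the two guard expressions side by side; the second shows a per-chunk try-catch that logs a failed insert and keeps going. With these stand-ins, the two guards differ for a defined but falsy stored value such as 0, and both return false when the stored value is undefined.

```typescript
// Hypothetical stand-ins for illustration; none of these names come from the PR's source files.
type StoredDims = number | undefined;

// Old guard: truthiness test on the stored dimension count.
function shouldDropOld(stored: StoredDims, incoming: number): boolean {
  return Boolean(stored) && stored !== incoming;
}

// New guard: explicit comparison against undefined.
function shouldDropNew(stored: StoredDims, incoming: number): boolean {
  return stored !== undefined && stored !== incoming;
}

interface Chunk {
  id: string;
  embedding: number[];
}

// Insert each chunk's vector; a chunk whose insert throws is logged and skipped
// so the remaining chunks are still processed.
function insertVectors(
  chunks: Chunk[],
  insertOne: (chunk: Chunk) => void,
  warn: (message: string) => void,
): number {
  let inserted = 0;
  for (const chunk of chunks) {
    try {
      insertOne(chunk);
      inserted += 1;
    } catch (err) {
      const reason = err instanceof Error ? err.message : String(err);
      warn(`vector insert failed for chunk ${chunk.id}: ${reason}`);
    }
  }
  return inserted;
}
```

In this sketch the caller supplies insertOne and warn, so the same loop can be exercised with a fake insert that throws on a dimension mismatch.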

Test plan

  • All 222 memory module tests pass (pnpm test src/memory/)
  • All 67 typing/dispatch tests pass (pnpm test src/channels/typing.test.ts src/auto-reply/reply/reply-flow.test.ts src/auto-reply/reply/followup-runner.test.ts)
  • Lint + format pass (pnpm check)
  • Manual verification: change embedding model in config, restart gateway, send Telegram message — typing should stop after reply and gateway should remain responsive

🤖 Generated with Claude Code

@greptile-apps
Contributor

greptile-apps bot commented Feb 26, 2026

Greptile Summary

Fixes a race condition that caused Telegram to show infinite "typing..." and become unresponsive after changing embedding models. The core issue was concurrent async operations accessing shared MemoryIndexManager state during safe reindex.

Key changes:

  • Added reindexing guard flag that prevents concurrent operations from querying the temp database during reindex by making ensureVectorReady() return false, forcing searches to use the safe non-vec0 fallback path
  • Fixed dimension check in ensureVectorTable() from truthy to explicit !== undefined check for correct handling of falsy dimension values
  • Wrapped previously unhandled vec0 INSERT in try-catch to prevent dimension mismatch errors from corrupting the indexing pipeline (all other vec0 operations were already wrapped)
  • Improved error visibility by changing failed table drop logging from debug to warn level

The implementation correctly sets/resets the reindexing flag in all code paths (success at manager-sync-ops.ts:1109 and error at manager-sync-ops.ts:1116), and the fallback behavior in manager-search.ts:71 safely handles queries during reindex using cosine similarity on the chunks table.
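The guard-and-fallback behaviour described here can be sketched as follows. This is an illustration under assumptions, not the code in manager-sync-ops.ts or manager-search.ts. SketchIndexManager and cosineSimilarity are hypothetical names, the real ensureVectorReady presumably checks more than the flag, and try/finally is only one way to reset the flag on both paths (the summary above says the PR resets it separately on the success and error paths).

```typescript
// Hypothetical sketch; class and function names are not taken from the PR's source files.
class SketchIndexManager {
  protected reindexing = false;

  // Reports whether the vector path may be used. While a reindex is running
  // this returns false so callers take the fallback path instead.
  ensureVectorReady(): boolean {
    return !this.reindexing;
  }

  // Runs the rebuild with the flag raised; finally resets it whether the
  // rebuild resolves or rejects.
  async runSafeReindex(rebuild: () => Promise<void>): Promise<void> {
    this.reindexing = true;
    try {
      await rebuild();
    } finally {
      this.reindexing = false;
    }
  }
}

// Plain cosine similarity, the kind of computation a non-vec0 fallback could
// run over stored chunk embeddings. Returns 0 for mismatched or zero vectors.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length || a.length === 0) return 0;
  let dot = 0;
  let normA = 0;
  let normB = 0;
  a.forEach((x, i) => {
    const y = b[i] ?? 0;
    dot += x * y;
    normA += x * x;
    normB += y * y;
  });
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}
```

A search routine built on this sketch would call ensureVectorReady() first and, on false, score candidates with cosineSimilarity rather than querying the vector table.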

Confidence Score: 5/5

  • This PR is safe to merge with no identified issues
  • The implementation correctly addresses the race condition with a well-designed guard flag, fixes the dimension check logic, and adds proper error handling. All code paths properly set/reset the reindexing flag (both success and error paths), the fallback behavior is already well-tested, and the changes follow existing error handling patterns in the codebase. The PR includes passing tests (222 memory module tests, 67 typing/dispatch tests), and the atomic reindex test suite validates error handling during reindex operations.
  • No files require special attention

Last reviewed commit: 66bd50b

@openclaw-barnacle (bot) added the gateway (Gateway runtime) and size: S labels and removed the size: XS label on Feb 26, 2026
@openclaw-barnacle

This pull request has been automatically marked as stale due to inactivity.
Please add updates or it will be closed.

@openclaw-barnacle (bot) added the stale (Marked as stale due to inactivity) label on Mar 3, 2026
Labels

gateway (Gateway runtime), size: S, stale (Marked as stale due to inactivity)

Development

Successfully merging this pull request may close these issues.

[Bug]: change embedding model causing infinite typing status in Telegram