fix(memory): prevent stuck typing on embedding model switch by dakshaymehta · Pull Request #27175 · openclaw/openclaw

dakshaymehta · 2026-02-26T04:28:35Z

Summary

Fixes #27143 — changing the embedding model (e.g. text-embedding-3-small → text-embedding-3-large) causes the gateway to get stuck with an infinite "typing..." indicator on Telegram after replying, becoming unresponsive to new messages.

Root cause: A race condition in MemoryIndexManager during safe reindex triggered by the model change. The reindex mutates shared state (this.db, this.vector.*) on the singleton manager instance while concurrent async operations (agent memory search, session warm) use the same instance. Additionally, a falsy dimension check in ensureVectorTable failed to drop stale vec0 tables when dimensions were reset to undefined during the reindex, and an unguarded vec0 INSERT could throw unhandled errors that corrupt the indexing pipeline.
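The interleaving described in the root cause can be illustrated with a small, self-contained sketch. Everything in it is hypothetical: `SharedManager`, `FakeDb`, `tick`, and `demoRace` are stand-ins invented for illustration, not code from the repository. It only shows how a reindex that swaps a shared handle on a singleton can leave a concurrent search holding a temp database that has since been closed.

```typescript
// Hypothetical stand-ins for illustration; not the project's actual classes.
type FakeDb = { name: string; closed: boolean };

// Yields to the event loop once, simulating an awaited I/O or embedding call.
const tick = (): Promise<void> => Promise.resolve();

class SharedManager {
  db: FakeDb = { name: "main", closed: false };

  // Captures the current handle, awaits, then uses the captured handle.
  async search(): Promise<string> {
    const db = this.db;
    await tick();
    if (db.closed) {
      throw new Error(`query ran against closed database "${db.name}"`);
    }
    return `results from ${db.name}`;
  }

  // Points the shared handle at a temp database, then closes it and swaps back.
  async reindex(): Promise<void> {
    const original = this.db;
    const temp: FakeDb = { name: "temp", closed: false };
    this.db = temp;
    await tick(); // a concurrent search() can start while this is pending
    temp.closed = true;
    this.db = original;
  }
}

// Starts a reindex and a search on the same instance and reports the search outcome.
async function demoRace(): Promise<string> {
  const manager = new SharedManager();
  const reindexDone = manager.reindex();
  const searchOutcome = manager
    .search()
    .then((result) => result, (err: Error) => `failed: ${err.message}`);
  await reindexDone;
  return searchOutcome;
}
```

In this sketch the search captures the temp handle while the reindex is pending, so `demoRace()` resolves to a failure message rather than search results. The real manager has more shared state (`this.db`, `this.vector.*`), but the shape of the hazard is the same.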

Changes

Fix falsy dimension check in ensureVectorTable (manager-sync-ops.ts:223)
- Changed this.vector.dims && this.vector.dims !== dimensions to this.vector.dims !== undefined && this.vector.dims !== dimensions
- The truthy check skipped dropVectorTable() when dims was reset to undefined during safe reindex, leaving a stale vec0 table with wrong dimensions

Add reindexing guard flag (manager-sync-ops.ts)
- New protected reindexing = false flag set during runSafeReindex
- ensureVectorReady returns false when reindexing, causing search to gracefully fall back to the non-vec0 cosine similarity path instead of operating on a temp database that may be closed mid-query

Wrap vec0 INSERT in try-catch (manager-embedding-ops.ts:771-783)
- The vec0 INSERT was the only vec0 write NOT wrapped in error handling; an unhandled throw from a dimension mismatch could corrupt the entire indexing pipeline
- Now logs the error and continues indexing remaining chunks

Improve dropVectorTable error logging (manager-sync-ops.ts:244)
- Changed from log.debug to log.warn, since a failed table drop is a significant issue that was invisible at debug level
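The two dimension predicates from the first change can be compared side by side. This is a minimal sketch: the helper names `shouldDropOld` and `shouldDropNew` and the `Dims` type are hypothetical, and only the two boolean expressions are taken from the description above. The sizes 1536 and 3072 are the default output dimensions of text-embedding-3-small and text-embedding-3-large.

```typescript
// Hypothetical helpers; only the boolean expressions mirror the PR description.
type Dims = number | undefined;

// Old predicate: truthiness test on the stored dimension.
function shouldDropOld(stored: Dims, incoming: number): boolean {
  return Boolean(stored && stored !== incoming);
}

// New predicate: explicit comparison against undefined.
function shouldDropNew(stored: Dims, incoming: number): boolean {
  return stored !== undefined && stored !== incoming;
}

const cases: Array<[Dims, number]> = [
  [1536, 3072], // switch from the small model to the large one
  [1536, 1536], // same model, nothing to drop
  [0, 3072], // falsy but defined stored value
  [undefined, 3072], // stored value reset during reindex
];

for (const [stored, incoming] of cases) {
  console.log(
    `stored=${String(stored)} incoming=${incoming}`,
    `old=${shouldDropOld(stored, incoming)}`,
    `new=${shouldDropNew(stored, incoming)}`,
  );
}
```

Evaluated in isolation, the two predicates agree when the stored value is undefined (both return false) and differ only for a falsy but defined value such as 0. Whether the surrounding code in ensureVectorTable handles the undefined case elsewhere is not something this sketch can show.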

Test plan

- All 222 memory module tests pass (pnpm test src/memory/)
- All 67 typing/dispatch tests pass (pnpm test src/channels/typing.test.ts src/auto-reply/reply/reply-flow.test.ts src/auto-reply/reply/followup-runner.test.ts)
- Lint + format pass (pnpm check)
- Manual verification: change embedding model in config, restart gateway, send Telegram message; typing should stop after reply and gateway should remain responsive

🤖 Generated with Claude Code

greptile-apps · 2026-02-26T04:33:18Z

Greptile Summary

Fixes a race condition that caused Telegram to show infinite "typing..." and become unresponsive after changing embedding models. The core issue was concurrent async operations accessing shared MemoryIndexManager state during safe reindex.

Key changes:

- Added reindexing guard flag that prevents concurrent operations from querying the temp database during reindex by making ensureVectorReady() return false, forcing searches to use the safe non-vec0 fallback path
- Fixed dimension check in ensureVectorTable() from truthy to explicit !== undefined check for correct handling of falsy dimension values
- Wrapped previously unhandled vec0 INSERT in try-catch to prevent dimension mismatch errors from corrupting the indexing pipeline (all other vec0 operations were already wrapped)
- Improved error visibility by changing failed table drop logging from debug to warn level
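The "log and continue" behavior for the vector insert can be sketched as follows. All names here (`Chunk`, `insertVector`, `indexChunks`, and the `warn` callback) are hypothetical, and a `Map` stands in for the vec0 table; the actual code in manager-embedding-ops.ts is not reproduced.

```typescript
// Hypothetical types and names throughout; not the project's actual API.
interface Chunk {
  id: string;
  embedding: number[];
}

// Stand-in for a vec0 INSERT that rejects vectors of the wrong length.
function insertVector(table: Map<string, number[]>, dims: number, chunk: Chunk): void {
  if (chunk.embedding.length !== dims) {
    throw new Error(`expected ${dims} dims, got ${chunk.embedding.length}`);
  }
  table.set(chunk.id, chunk.embedding);
}

// Indexes every chunk; a failed insert is reported and skipped, not rethrown.
function indexChunks(
  chunks: Chunk[],
  dims: number,
  warn: (message: string) => void,
): Map<string, number[]> {
  const table = new Map<string, number[]>();
  for (const chunk of chunks) {
    try {
      insertVector(table, dims, chunk);
    } catch (err) {
      const message = err instanceof Error ? err.message : String(err);
      warn(`vector insert failed for ${chunk.id}: ${message}`);
    }
  }
  return table;
}
```

With the try-catch inside the loop, one mismatched chunk produces a single warning and the remaining chunks are still indexed. The trade-off is that a skipped chunk is missing from the vector table until a later sync picks it up, which is why surfacing the failure in the log matters.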

The implementation correctly sets/resets the reindexing flag in all code paths (success at manager-sync-ops.ts:1109 and error at manager-sync-ops.ts:1116), and the fallback behavior in manager-search.ts:71 safely handles queries during reindex using cosine similarity on the chunks table.
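A minimal sketch of the guard flag follows. The names `reindexing`, `ensureVectorReady`, and `runSafeReindex` come from the PR description; the `GuardedIndex` class, the `search` return values, and the synchronous `work` callback are assumptions made to keep the example short. The PR resets the flag separately on the success and error paths; `try`/`finally` is used here as an equivalent simplification, not as a description of the actual code.

```typescript
// Hypothetical class; only the three member names mirror the PR description.
class GuardedIndex {
  protected reindexing = false;
  private vectorAvailable = true;

  // Reports the vector path as unavailable for the duration of a reindex.
  ensureVectorReady(): boolean {
    if (this.reindexing) {
      return false;
    }
    return this.vectorAvailable;
  }

  // Picks the vec0 path when ready, otherwise the cosine similarity fallback.
  search(): "vec0" | "cosine-fallback" {
    return this.ensureVectorReady() ? "vec0" : "cosine-fallback";
  }

  // Runs the reindex work with the flag set, resetting it on success or error.
  runSafeReindex(work: () => void): void {
    this.reindexing = true;
    try {
      work();
    } finally {
      this.reindexing = false;
    }
  }
}
```

A search issued while `work` is running takes the fallback path, and the flag is cleared even when `work` throws, so later searches return to the vec0 path.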

Confidence Score: 5/5

This PR is safe to merge with no identified issues
The implementation correctly addresses the race condition with a well-designed guard flag, fixes the dimension check logic, and adds proper error handling. All code paths properly set/reset the reindexing flag (both success and error paths), the fallback behavior is already well-tested, and the changes follow existing error handling patterns in the codebase. The PR includes passing tests (222 memory module tests, 67 typing/dispatch tests), and the atomic reindex test suite validates error handling during reindex operations.
No files require special attention

Last reviewed commit: 66bd50b

openclaw-barnacle · 2026-03-03T05:22:38Z

This pull request has been automatically marked as stale due to inactivity.
Please add updates or it will be closed.

fix(memory): prevent stuck typing on embedding model switch (openclaw…

66bd50b

…#27143)

openclaw-barnacle bot added the size: XS label Feb 26, 2026

test: normalize paths in symlink safety tests for Windows CI

8a1d7bf

openclaw-barnacle bot added the gateway (Gateway runtime) and size: S labels and removed the size: XS label Feb 26, 2026

openclaw-barnacle bot added the stale label (Marked as stale due to inactivity) Mar 3, 2026

dakshaymehta wants to merge 2 commits into openclaw:main from dakshaymehta:fix/embedding-model-switch-typing-stuck
