Description
Summary
After memory extraction via session.commit(), the semantic processor generates .overview.md for the parent memory directory. When this overview text exceeds the embedding model's context length, OpenAIDenseEmbedder.embed() raises an unhandled RuntimeError. This exception appears to block the uvicorn event loop, causing the entire HTTP server to become unresponsive (process alive, port open, but all endpoints hang).
Environment
- OpenViking: installed via pipx (latest as of 2026-03-17)
- OS: macOS arm64 (Darwin 25.3.0)
- Embedding: Ollama `nomic-embed-text` (8192 token context, 768 dim) via OpenAI-compatible API
- VLM: Bailian `qwen3-max`
- Mode: local, bound to `127.0.0.1:1933`
- Integration: OpenClaw `memory-openviking` plugin
Steps to Reproduce
1. Accumulate enough memories in a directory (e.g. `viking://user/default/memories/preferences`)
2. Trigger a new memory extraction (e.g. via session commit from OpenClaw)
3. Memory extractor writes the new memory file successfully
4. Semantic processor runs on the parent directory (`recursive=False`)
5. Generated `.overview.md` aggregates all file summaries → text exceeds the embedding model's token limit
6. Embedding queue calls `OpenAIDenseEmbedder.embed()` with the oversized text
7. Ollama returns HTTP 400: "the input length exceeds the context length"
8. `embed()` raises `RuntimeError` → `collection_schemas.py:on_dequeue` propagates the exception
9. Server hangs: all HTTP endpoints stop responding, `curl` times out
Relevant Logs

```
INFO - Processing semantic generation for: viking://user/default/memories/preferences (recursive=False)
WARNING - Candidate data is None for label index 4 (label: ...), skipping.
INFO - Created memory file: viking://user/default/memories/preferences/mem_04c2ef28-...md
INFO - Enqueued memory for vectorization
```

stderr:

```
openai.BadRequestError: Error code: 400 - {'error': {'message': 'the input length exceeds the context length', ...}}
RuntimeError: OpenAI API error: Error code: 400 - ...
```
After this error, no further log output appears and all HTTP requests time out.
Root Cause Analysis
Two issues combine:
1. No input truncation guard in `OpenAIDenseEmbedder.embed()` (`openai_embedders.py`): text is passed directly to the API without any length check. When the embedding model has a limited context window, oversized input causes a hard API error.
2. Unhandled exception in the embedding queue blocks uvicorn: the `RuntimeError` from the embedder propagates through `collection_schemas.py:on_dequeue` and appears to block or crash the async event loop, making the entire server unresponsive.
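The second issue can be illustrated with a minimal asyncio sketch (hypothetical names; this is not OpenViking's actual queue code, only a model of the failure mode): if the dequeue callback lets an exception escape, the consumer task dies and the queue stalls, whereas a per-item `try/except` keeps the worker alive.

```python
import asyncio

async def safe_consumer(queue, on_dequeue, failures):
    """Drain the queue, isolating per-item failures so the worker task survives."""
    while True:
        item = await queue.get()
        try:
            await on_dequeue(item)
        except Exception as exc:
            failures.append((item, exc))  # record the failure and move on
        finally:
            queue.task_done()

async def fragile_embed(text):
    # Stand-in for OpenAIDenseEmbedder.embed(): hard-fails on oversized input.
    if len(text) > 10:
        raise RuntimeError("the input length exceeds the context length")
    return [0.0, 0.0, 0.0]

async def demo():
    queue, failures = asyncio.Queue(), []
    worker = asyncio.create_task(safe_consumer(queue, fragile_embed, failures))
    for text in ["ok", "this overview text is far too long", "also ok"]:
        queue.put_nowait(text)
    await queue.join()  # completes only because per-item failures are caught
    worker.cancel()
    return failures

failed = asyncio.run(demo())
print(f"{len(failed)} item(s) failed; the loop kept draining the queue")
```

Without the `try/except`, the `RuntimeError` on the second item would end the consumer task and `queue.join()` would never return, which matches the observed "process alive, endpoints hang" symptom.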
Expected Behavior
- Embedding input should be truncated (or chunked) before being sent to the provider
- Embedding failures should be caught gracefully without blocking the HTTP server
- The server should remain responsive even if individual vectorization tasks fail
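The chunked alternative mentioned above could look roughly like the sketch below (a hypothetical helper, not OpenViking code): split oversized text into overlapping character windows that fit the context, embed each window, and mean-pool the vectors. Character counts are a crude proxy for tokens; a real implementation would use the model's tokenizer.

```python
def chunk_text(text, max_chars=24000, overlap=200):
    """Split text into overlapping character windows that fit the model's context."""
    if len(text) <= max_chars:
        return [text]
    chunks, start = [], 0
    step = max_chars - overlap  # overlap preserves context across boundaries
    while start < len(text):
        chunks.append(text[start:start + max_chars])
        start += step
    return chunks

def embed_long_text(text, embed_fn, max_chars=24000):
    """Embed oversized text by chunking it and mean-pooling the chunk vectors."""
    chunks = chunk_text(text, max_chars)
    vectors = [embed_fn(chunk) for chunk in chunks]
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]
```

Mean-pooling loses some fidelity versus indexing each chunk as its own record (the approach discussed in #530), but either way the provider never sees input longer than its context window.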
Workaround
Monkey-patched `OpenAIDenseEmbedder.embed()` to truncate input to 24000 chars (~6000-8000 tokens) before calling the API. The server remains stable after the patch.
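The patch was roughly of the following shape (a hypothetical reconstruction; the real signature of `OpenAIDenseEmbedder.embed()` in `openai_embedders.py` may differ):

```python
MAX_CHARS = 24000  # ~6000-8000 tokens at roughly 3-4 chars per token

def patch_embedder(embedder_cls):
    """Wrap embed() so oversized input is truncated before reaching the API."""
    original_embed = embedder_cls.embed

    def safe_embed(self, text, *args, **kwargs):
        if len(text) > MAX_CHARS:
            text = text[:MAX_CHARS]  # lossy, but keeps the server alive
        return original_embed(self, text, *args, **kwargs)

    embedder_cls.embed = safe_embed
    return embedder_cls
```

Truncation silently drops the tail of the overview, so it is only a stopgap; chunked vectorization (#530) would be the proper fix.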
Related Issues
- [Bug]: add-resource sends oversized input to OpenAI embeddings API during repo import #616 — same oversized embedding problem, but for `add-resource` (closed; only addressed that path)
- Long memory indexing should use first-class chunked vectorization instead of relying on single-record embedding #530 — long memory should use chunked vectorization (design discussion)
- Embedding truncation and chunking should have clearer responsibilities across memory, file, and directory vectorization #531 — truncation vs chunking responsibilities (design discussion)
- Bug: process alive but HTTP endpoints hang, causing AbortError in memory_store/auto-capture #527 — same symptom, likely same root cause
- Memory extraction triggers O(n²) semantic reprocessing — token cost grows quadratically with memory count #505 — O(n²) semantic reprocessing amplifies the problem