
feat(session): add chunked vectorization for long memories #734

Merged
chenjw merged 1 commit into volcengine:main from deepakdevp:feat/chunked-memory-vectorization
Mar 18, 2026

Conversation

@deepakdevp
Contributor

Summary

  • Long memories are now split into multiple independent vector records for better retrieval precision
  • Each chunk is independently searchable, replacing the current approach of averaging chunks into a single lossy vector
  • Configurable via SemanticConfig.memory_chunk_chars and memory_chunk_overlap
  • Short memories behave identically to before

Fixes #530

Problem

When a memory exceeds the embedder's max token limit, the current code chunks it internally but merges all chunks into a single averaged vector (via _chunk_and_embed in base.py). This averaging is lossy — a long memory covering "Python debugging" AND "Docker networking" becomes one blended vector that matches neither topic precisely. Retrieval quality degrades as memories grow.
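The loss described above can be shown with a toy example. The 2-D vectors and `cosine()` helper below are illustrative, not OpenViking's actual embedder:

```python
import math

def cosine(a, b):
    # Standard cosine similarity between two vectors.
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

python_debugging = [1.0, 0.0]   # chunk about "Python debugging"
docker_networking = [0.0, 1.0]  # chunk about "Docker networking"

# Current approach: one averaged vector for the whole memory.
averaged = [(a + b) / 2 for a, b in zip(python_debugging, docker_networking)]

# A query vector about Python debugging matches the blend only partially...
print(round(cosine(python_debugging, averaged), 3))    # 0.707
# ...but matches a dedicated per-chunk vector exactly.
print(round(cosine(python_debugging, python_debugging), 3))  # 1.0
```

With real high-dimensional embeddings the effect is the same in kind: the average sits between the topics and ranks below any memory dedicated to either one.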

Solution

Chunk long memories before enqueuing to the EmbeddingQueue. Each chunk becomes a separate vector record:

  • Chunk URIs: memory_uri#chunk_0001, memory_uri#chunk_0002, etc.
  • Each chunk's parent_uri points to the original memory for deduplication
  • The base memory record is still enqueued with its abstract as the vector text
  • Chunking prefers paragraph boundaries (\n\n) with configurable overlap
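The splitting behavior described above can be sketched as follows. The function name and defaults mirror the PR description; the real `_chunk_text()` in `openviking/session/compressor.py` may differ in detail:

```python
def chunk_text(text: str, chunk_size: int = 2000, overlap: int = 200) -> list[str]:
    """Split text into chunks, preferring paragraph boundaries ("\\n\\n")."""
    if len(text) <= chunk_size:
        return [text]  # short memories: single vector, as before
    chunks = []
    start = 0
    while start < len(text):
        end = min(start + chunk_size, len(text))
        if end < len(text):
            # Prefer the last paragraph break in the window, but only
            # if it keeps the chunk at least half full.
            boundary = text.rfind("\n\n", start, end)
            if boundary > start + chunk_size // 2:
                end = boundary + 2  # include the double newline
        chunks.append(text[start:end].strip())
        if end >= len(text):
            break
        start = end - overlap  # overlap for context continuity
    return chunks

# Chunk URIs follow the memory_uri#chunk_NNNN scheme (URI shown is hypothetical):
uris = [f"viking://mem/123#chunk_{i:04d}" for i in range(1, 3)]
```

A memory of 5,000 characters with the defaults yields three chunks; one of 1,500 characters passes through untouched.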

Changes Made

  • openviking_cli/utils/config/parser_config.py: Added memory_chunk_chars (2000) and memory_chunk_overlap (200) to SemanticConfig
  • openviking/session/compressor.py: Modified _index_memory() to chunk long memories before enqueuing. Added _chunk_text() static method with paragraph-aware splitting.
  • tests/misc/test_semantic_config.py: 5 new tests for chunking logic (short text, long text, overlap, paragraph boundaries, custom config)

Configuration

{
  "semantic": {
    "memory_chunk_chars": 2000,
    "memory_chunk_overlap": 200
  }
}

Type of Change

  • New feature (non-breaking change which adds functionality)

Testing

  • Tests added (5 new chunking tests, 12 total pass)
  • Tested on macOS
  • Short memories unchanged (backward compatible)

Checklist

  • My code follows the code style of this project
  • I have performed a self-review of my own code
  • I have added tests that prove my feature works
  • New and existing unit tests pass locally with my changes

Long memories are now split into multiple vector records for better
retrieval precision. Each chunk gets its own URI (memory_uri#chunk_NNNN)
and is independently searchable, while the base record retains the
abstract as its vector.

Chunking prefers paragraph boundaries and uses configurable overlap
for context continuity. Short memories (under memory_chunk_chars)
are unchanged — single vector as before.

New SemanticConfig fields:
- memory_chunk_chars (default 2000): chunk threshold
- memory_chunk_overlap (default 200): overlap between chunks

Fixes volcengine#530.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@chenjw
Collaborator

chenjw commented Mar 18, 2026

Thank you so much for your contribution! The issue with the average vector you mentioned is indeed lossy — this problem was introduced by the commit at #642, and we will roll it back.
Additionally, you proposed representing a single Viking URI with multiple vectors. However, I see a drawback to this approach: the Viking URI still serves as the unit of retrieval, and if that retrieval unit is too large, it will still consume an excessive number of context tokens.
Our current design philosophy is to split both documents and memories into relatively small URI units based on semantics during the preprocessing phase. This way, the embedding and URI remain in a 1:1 correspondence, and there is no need for chunking.
The current implementation of memory is still quite basic and has not yet achieved this. We will soon initiate a refactoring of the memory module, aiming to partially achieve this goal. Therefore, from a long-term planning perspective, chunking of embeddings may not be necessary.

@deepakdevp
Contributor Author

Understood — the 1:1 URI-to-embedding design makes sense as the long-term direction. Splitting memories into smaller semantic units during preprocessing is cleaner than chunking at embedding time.

Happy to close this PR since the memory refactoring will address the root issue properly. The lossy averaging rollback from #642 should resolve the immediate problem.

If there's anything I can help with on the memory module refactoring when it starts, let me know.

@deepakdevp deepakdevp closed this Mar 18, 2026
@github-project-automation github-project-automation bot moved this from Backlog to Done in OpenViking project Mar 18, 2026
@chenjw
Collaborator

chenjw commented Mar 18, 2026

> Understood — the 1:1 URI-to-embedding design makes sense as the long-term direction. Splitting memories into smaller semantic units during preprocessing is cleaner than chunking at embedding time.
>
> Happy to close this PR since the memory refactoring will address the root issue properly. The lossy averaging rollback from #642 should resolve the immediate problem.
>
> If there's anything I can help with on the memory module refactoring when it starts, let me know.

We have an ongoing discussion in #712, and your participation and suggestions are warmly welcome. The current solution does not yet include a mechanism for splitting memory files.

@chenjw chenjw reopened this Mar 18, 2026
@chenjw
Collaborator

chenjw commented Mar 18, 2026

We have discussed this issue internally. In fact, we already enforce size limits per URI for PDF, Markdown, and HTML files. However, this constraint has not yet been implemented for code and memory. Until this constraint is in place, the 1:N vectorization approach is better than nothing — so we have decided to accept this change. Thank you again for your contribution!

@chenjw
Collaborator

chenjw commented Mar 18, 2026

@qin-ptr

@chenjw chenjw mentioned this pull request Mar 18, 2026
Contributor

@qin-ptr qin-ptr left a comment


Review Summary

This PR introduces chunked vectorization for long memories to improve retrieval precision. The approach is sound, but there's a critical implementation bug: the base record still embeds the full content instead of just the abstract, defeating the purpose of chunking.

Main Issue: The code comment claims "uses abstract as vector text" for the base record, but EmbeddingMsgConverter.from_context(memory) actually uses memory.vectorize.text, which contains the full content (set in memory_extractor.py:456). This needs to be fixed before merge.

See inline comments for details.

🤖 I am a bot owned by @qin-ctx.

change_type: One of "added" or "modified"
"""
from openviking.storage.queuefs.embedding_msg_converter import EmbeddingMsgConverter
from openviking_cli.utils.config import get_openviking_config
Contributor


[Bug] (blocking)

The base record still uses full content instead of abstract, contradicting the PR description and comment.

Root cause: When a memory is created, memory_extractor.py:456 sets memory.set_vectorize(Vectorize(text=payload.content)), so memory.vectorize.text contains the full content. When you call EmbeddingMsgConverter.from_context(memory) without resetting it, the base record will still try to embed the full long content.

Impact:

  • Base record may exceed embedding token limits and fail
  • Even if successful, produces a lossy vector (same problem this PR aims to solve)
  • Code behavior contradicts the comment "uses abstract as vector text"

Fix: Before enqueueing the base record, reset vectorization text to abstract:

if vectorize_text and len(vectorize_text) > semantic.memory_chunk_chars:
    # ... existing chunk logic ...
    # Reset base record to use abstract
    memory.set_vectorize(Vectorize(text=memory.abstract))

# Now base record uses abstract as intended
embedding_msg = EmbeddingMsgConverter.from_context(memory)
await self.vikingdb.enqueue_embedding_msg(embedding_msg)

@@ -80,3 +83,54 @@ def test_batch_splitting():
assert len(batches[0]) == 50
Contributor


[Suggestion] (non-blocking)

Current tests only cover _chunk_text() unit behavior. Consider adding integration tests to verify:

  1. Long memories are split into correct number of chunk records
  2. Base record's vectorization text is abstract (not full content)
  3. Chunk URIs follow the memory_uri#chunk_NNNN format
  4. parent_uri correctly points to original memory URI

This would catch issues like the base record vectorization bug.
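A sketch of what such an integration check could assert, using hypothetical plain-dict records in place of OpenViking's real memory and embedding-message types:

```python
# Hypothetical record builder standing in for the real enqueue path:
# chunk records first, then the base record carrying only the abstract.
def build_records(memory_uri: str, abstract: str, chunks: list[str]) -> list[dict]:
    records = [
        {"uri": f"{memory_uri}#chunk_{i:04d}", "parent_uri": memory_uri, "text": c}
        for i, c in enumerate(chunks, start=1)
    ]
    # Base record keeps the abstract, not the full content (the bug above).
    records.append({"uri": memory_uri, "parent_uri": None, "text": abstract})
    return records

records = build_records("viking://mem/42", "short abstract", ["chunk A", "chunk B"])

# 3. Chunk URIs follow the memory_uri#chunk_NNNN format.
assert records[0]["uri"] == "viking://mem/42#chunk_0001"
# 4. parent_uri points back to the original memory URI.
assert all(r["parent_uri"] == "viking://mem/42" for r in records[:-1])
# 2. Base record's vectorization text is the abstract, not full content.
assert records[-1]["text"] == "short abstract"
```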

if boundary > start + chunk_size // 2:
    end = boundary + 2  # Include the double newline

chunks.append(text[start:end].strip())
Copy link
Copy Markdown
Contributor


[Suggestion] (non-blocking)

The overlap calculation may be slightly inaccurate because:

  • Line 212: chunks.append(text[start:end].strip()) strips whitespace
  • Line 213: start = end - overlap uses the original end position

Actual overlap between chunks may differ from configured overlap value. Consider either:

  1. Document that overlap is approximate
  2. Calculate boundaries first, then strip uniformly
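A toy demonstration of the concern (the text and sizes are contrived for illustration): stripping a chunk shifts its content relative to the un-stripped `end` used for the next `start`, so the realized overlap can differ from the configured value.

```python
text = "aaaa   " + "b" * 10  # trailing spaces inside the first window
overlap = 3

end = 7                       # first chunk covers text[0:7] == "aaaa   "
chunk1 = text[0:end].strip()  # "aaaa" -- 3 chars shorter after strip
start = end - overlap         # 4: configured overlap of 3 characters
chunk2 = text[start:].strip()

# The configured 3-character overlap (text[4:7]) was all whitespace and
# vanished in strip(), so chunk1 and chunk2 share zero characters.
print(chunk1, chunk2)
```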

@chenjw chenjw merged commit 1efb9fb into volcengine:main Mar 18, 2026
11 checks passed


Projects

Status: Done

Development

Successfully merging this pull request may close these issues.

Long memory indexing should use first-class chunked vectorization instead of relying on single-record embedding

3 participants