
feat(session): add chunked vectorization for long memories #734

Merged
chenjw merged 1 commit into volcengine:main from deepakdevp:feat/chunked-memory-vectorization
Mar 18, 2026

Conversation

@deepakdevp
Contributor

Summary

  • Long memories are now split into multiple independent vector records for better retrieval precision
  • Each chunk is independently searchable, replacing the current approach of averaging chunks into a single lossy vector
  • Configurable via SemanticConfig.memory_chunk_chars and memory_chunk_overlap
  • Short memories behave identically to before

Fixes #530

Problem

When a memory exceeds the embedder's max token limit, the current code chunks it internally but merges all chunks into a single averaged vector (via _chunk_and_embed in base.py). This averaging is lossy — a long memory covering "Python debugging" AND "Docker networking" becomes one blended vector that matches neither topic precisely. Retrieval quality degrades as memories grow.
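The loss described above can be shown with a toy example. The 2-D vectors and `cosine()` helper below are illustrative, not OpenViking's actual embedder:

```python
import math

def cosine(a, b):
    # Standard cosine similarity between two vectors.
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

python_debugging = [1.0, 0.0]   # chunk about "Python debugging"
docker_networking = [0.0, 1.0]  # chunk about "Docker networking"

# Current approach: one averaged vector for the whole memory.
averaged = [(a + b) / 2 for a, b in zip(python_debugging, docker_networking)]

# A query vector about Python debugging matches the blend only partially...
print(round(cosine(python_debugging, averaged), 3))    # 0.707
# ...but matches a dedicated per-chunk vector exactly.
print(round(cosine(python_debugging, python_debugging), 3))  # 1.0
```

With real high-dimensional embeddings the effect is the same in kind: the average sits between the topics and ranks below any memory dedicated to either one.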

Solution

Chunk long memories before enqueuing to the EmbeddingQueue. Each chunk becomes a separate vector record:

  • Chunk URIs: memory_uri#chunk_0001, memory_uri#chunk_0002, etc.
  • Each chunk's parent_uri points to the original memory for deduplication
  • The base memory record is still enqueued with its abstract as the vector text
  • Chunking prefers paragraph boundaries (\n\n) with configurable overlap
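The splitting behavior described above can be sketched as follows. The function name and defaults mirror the PR description; the real `_chunk_text()` in `openviking/session/compressor.py` may differ in detail:

```python
def chunk_text(text: str, chunk_size: int = 2000, overlap: int = 200) -> list[str]:
    """Split text into chunks, preferring paragraph boundaries ("\\n\\n")."""
    if len(text) <= chunk_size:
        return [text]  # short memories: single vector, as before
    chunks = []
    start = 0
    while start < len(text):
        end = min(start + chunk_size, len(text))
        if end < len(text):
            # Prefer the last paragraph break in the window, but only
            # if it keeps the chunk at least half full.
            boundary = text.rfind("\n\n", start, end)
            if boundary > start + chunk_size // 2:
                end = boundary + 2  # include the double newline
        chunks.append(text[start:end].strip())
        if end >= len(text):
            break
        start = end - overlap  # overlap for context continuity
    return chunks

# Chunk URIs follow the memory_uri#chunk_NNNN scheme (URI shown is hypothetical):
uris = [f"viking://mem/123#chunk_{i:04d}" for i in range(1, 3)]
```

A memory of 5,000 characters with the defaults yields three chunks; one of 1,500 characters passes through untouched.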

Changes Made

  • openviking_cli/utils/config/parser_config.py: Added memory_chunk_chars (2000) and memory_chunk_overlap (200) to SemanticConfig
  • openviking/session/compressor.py: Modified _index_memory() to chunk long memories before enqueuing. Added _chunk_text() static method with paragraph-aware splitting.
  • tests/misc/test_semantic_config.py: 5 new tests for chunking logic (short text, long text, overlap, paragraph boundaries, custom config)

Configuration

{
  "semantic": {
    "memory_chunk_chars": 2000,
    "memory_chunk_overlap": 200
  }
}

Type of Change

  • New feature (non-breaking change which adds functionality)

Testing

  • Tests added (5 new chunking tests, 12 total pass)
  • Tested on macOS
  • Short memories unchanged (backward compatible)

Checklist

  • My code follows the code style of this project
  • I have performed a self-review of my own code
  • I have added tests that prove my feature works
  • New and existing unit tests pass locally with my changes

Long memories are now split into multiple vector records for better
retrieval precision. Each chunk gets its own URI (memory_uri#chunk_NNNN)
and is independently searchable, while the base record retains the
abstract as its vector.

Chunking prefers paragraph boundaries and uses configurable overlap
for context continuity. Short memories (under memory_chunk_chars)
are unchanged — single vector as before.

New SemanticConfig fields:
- memory_chunk_chars (default 2000): chunk threshold
- memory_chunk_overlap (default 200): overlap between chunks

Fixes volcengine#530.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@chenjw
Collaborator

chenjw commented Mar 18, 2026

Thank you so much for your contribution! The issue with the average vector you mentioned is indeed lossy — this problem was introduced by the commit at #642, and we will roll it back.
Additionally, you proposed representing a single Viking URI with multiple vectors. However, I see a drawback to this approach: the Viking URI still serves as the unit of retrieval, and if that retrieval unit is too large, it will still consume an excessive number of context tokens.
Our current design philosophy is to split both documents and memories into relatively small URI units based on semantics during the preprocessing phase. This way, the embedding and URI remain in a 1:1 correspondence, and there is no need for chunking.
The current implementation of memory is still quite basic and has not yet achieved this. We will soon initiate a refactoring of the memory module, aiming to partially achieve this goal. Therefore, from a long-term planning perspective, chunking of embeddings may not be necessary.

@deepakdevp
Contributor Author

Understood — the 1:1 URI-to-embedding design makes sense as the long-term direction. Splitting memories into smaller semantic units during preprocessing is cleaner than chunking at embedding time.

Happy to close this PR since the memory refactoring will address the root issue properly. The lossy averaging rollback from #642 should resolve the immediate problem.

If there's anything I can help with on the memory module refactoring when it starts, let me know.

@deepakdevp deepakdevp closed this Mar 18, 2026
@github-project-automation github-project-automation bot moved this from Backlog to Done in OpenViking project Mar 18, 2026
@chenjw
Collaborator

chenjw commented Mar 18, 2026

> Understood — the 1:1 URI-to-embedding design makes sense as the long-term direction. Splitting memories into smaller semantic units during preprocessing is cleaner than chunking at embedding time.
>
> Happy to close this PR since the memory refactoring will address the root issue properly. The lossy averaging rollback from #642 should resolve the immediate problem.
>
> If there's anything I can help with on the memory module refactoring when it starts, let me know.

We have an ongoing discussion in #712, and your participation and suggestions are warmly welcome. The current solution does not yet include a mechanism for splitting memory files.

@chenjw chenjw reopened this Mar 18, 2026
@chenjw
Collaborator

chenjw commented Mar 18, 2026

We have discussed this issue internally. In fact, we already enforce size limits per URI for PDF, Markdown, and HTML files. However, this constraint has not yet been implemented for code and memory. Until this constraint is in place, the 1:N vectorization approach is better than nothing — so we have decided to accept this change. Thank you again for your contribution!

@chenjw
Collaborator

chenjw commented Mar 18, 2026

@qin-ptr

@chenjw chenjw mentioned this pull request Mar 18, 2026
Contributor

@qin-ptr qin-ptr left a comment


Review Summary

This PR introduces chunked vectorization for long memories to improve retrieval precision. The approach is sound, but there's a critical implementation bug: the base record still embeds the full content instead of just the abstract, defeating the purpose of chunking.

Main Issue: The code comment claims "uses abstract as vector text" for the base record, but EmbeddingMsgConverter.from_context(memory) actually uses memory.vectorize.text, which contains the full content (set in memory_extractor.py:456). This needs to be fixed before merge.

See inline comments for details.

🤖 I am a bot owned by @qin-ctx.

change_type: One of "added" or "modified"
"""
from openviking.storage.queuefs.embedding_msg_converter import EmbeddingMsgConverter
from openviking_cli.utils.config import get_openviking_config
Contributor


[Bug] (blocking)

The base record still uses full content instead of abstract, contradicting the PR description and comment.

Root cause: When a memory is created, memory_extractor.py:456 sets memory.set_vectorize(Vectorize(text=payload.content)), so memory.vectorize.text contains the full content. When you call EmbeddingMsgConverter.from_context(memory) without resetting it, the base record will still try to embed the full long content.

Impact:

  • Base record may exceed embedding token limits and fail
  • Even if successful, produces a lossy vector (same problem this PR aims to solve)
  • Code behavior contradicts the comment "uses abstract as vector text"

Fix: Before enqueueing the base record, reset vectorization text to abstract:

if vectorize_text and len(vectorize_text) > semantic.memory_chunk_chars:
    # ... existing chunk logic ...
    # Reset base record to use abstract
    memory.set_vectorize(Vectorize(text=memory.abstract))

# Now base record uses abstract as intended
embedding_msg = EmbeddingMsgConverter.from_context(memory)
await self.vikingdb.enqueue_embedding_msg(embedding_msg)

@@ -80,3 +83,54 @@ def test_batch_splitting():
assert len(batches[0]) == 50
Contributor


[Suggestion] (non-blocking)

Current tests only cover _chunk_text() unit behavior. Consider adding integration tests to verify:

  1. Long memories are split into correct number of chunk records
  2. Base record's vectorization text is abstract (not full content)
  3. Chunk URIs follow the memory_uri#chunk_NNNN format
  4. parent_uri correctly points to original memory URI

This would catch issues like the base record vectorization bug.
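A sketch of what such an integration check could assert, using hypothetical plain-dict records in place of OpenViking's real memory and embedding-message types:

```python
# Hypothetical record builder standing in for the real enqueue path:
# chunk records first, then the base record carrying only the abstract.
def build_records(memory_uri: str, abstract: str, chunks: list[str]) -> list[dict]:
    records = [
        {"uri": f"{memory_uri}#chunk_{i:04d}", "parent_uri": memory_uri, "text": c}
        for i, c in enumerate(chunks, start=1)
    ]
    # Base record keeps the abstract, not the full content (the bug above).
    records.append({"uri": memory_uri, "parent_uri": None, "text": abstract})
    return records

records = build_records("viking://mem/42", "short abstract", ["chunk A", "chunk B"])

# 3. Chunk URIs follow the memory_uri#chunk_NNNN format.
assert records[0]["uri"] == "viking://mem/42#chunk_0001"
# 4. parent_uri points back to the original memory URI.
assert all(r["parent_uri"] == "viking://mem/42" for r in records[:-1])
# 2. Base record's vectorization text is the abstract, not full content.
assert records[-1]["text"] == "short abstract"
```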

if boundary > start + chunk_size // 2:
    end = boundary + 2  # Include the double newline

chunks.append(text[start:end].strip())
Copy link
Copy Markdown
Contributor


[Suggestion] (non-blocking)

The overlap calculation may be slightly inaccurate because:

  • Line 212: chunks.append(text[start:end].strip()) strips whitespace
  • Line 213: start = end - overlap uses the original end position

Actual overlap between chunks may differ from configured overlap value. Consider either:

  1. Document that overlap is approximate
  2. Calculate boundaries first, then strip uniformly
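A toy demonstration of the concern (the text and sizes are contrived for illustration): stripping a chunk shifts its content relative to the un-stripped `end` used for the next `start`, so the realized overlap can differ from the configured value.

```python
text = "aaaa   " + "b" * 10  # trailing spaces inside the first window
overlap = 3

end = 7                       # first chunk covers text[0:7] == "aaaa   "
chunk1 = text[0:end].strip()  # "aaaa" -- 3 chars shorter after strip
start = end - overlap         # 4: configured overlap of 3 characters
chunk2 = text[start:].strip()

# The configured 3-character overlap (text[4:7]) was all whitespace and
# vanished in strip(), so chunk1 and chunk2 share zero characters.
print(chunk1, chunk2)
```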

@chenjw chenjw merged commit 1efb9fb into volcengine:main Mar 18, 2026
11 checks passed


Projects

Status: Done

Development

Successfully merging this pull request may close these issues.

Long memory indexing should use first-class chunked vectorization instead of relying on single-record embedding

3 participants