fix(semantic): add budget guard to overview generation with batched summarization#683

Merged
qin-ctx merged 5 commits into volcengine:main from deepakdevp:feature/overview-budget-guard
Mar 17, 2026
Merged

fix(semantic): add budget guard to overview generation with batched summarization#683
qin-ctx merged 5 commits intovolcengine:mainfrom
deepakdevp:feature/overview-budget-guard

Conversation

@deepakdevp
Contributor

Summary

  • Adds a prompt budget guard to _generate_overview() that prevents oversized LLM prompts for large directories
  • When a directory exceeds the budget (default 60K chars / 50+ files), file summaries are split into batches, each batch gets a partial overview, then partials are merged into a final overview
  • Centralizes all scattered truncation constants (30000, 256, 4000) into a configurable SemanticConfig dataclass
  • Enforces abstract/overview max size limits on LLM output

Fixes #529, addresses #531

Changes Made

  • openviking_cli/utils/config/parser_config.py: Added SemanticConfig dataclass with configurable limits (max_file_content_chars, max_overview_prompt_chars, overview_batch_size, abstract_max_chars, overview_max_chars)
  • openviking_cli/utils/config/open_viking_config.py: Wired SemanticConfig into OpenVikingConfig (configurable via ov.conf under "semantic" key)
  • openviking/storage/queuefs/semantic_processor.py: Added budget guard with _single_generate_overview() (unchanged behavior for small dirs) and _batched_generate_overview() (split + merge for large dirs). Replaced hardcoded 30000 with SemanticConfig.max_file_content_chars. Added output size enforcement.
  • openviking/storage/queuefs/semantic_dag.py: Added output size enforcement at the DAG call site
  • tests/misc/test_semantic_config.py: 7 tests covering config defaults, custom values, budget estimation, truncation, and batch splitting

Configuration Example

{
  "semantic": {
    "max_file_content_chars": 30000,
    "max_overview_prompt_chars": 60000,
    "overview_batch_size": 50,
    "abstract_max_chars": 256,
    "overview_max_chars": 4000
  }
}

All values have sensible defaults — no config changes required for existing users.
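For reference, a minimal sketch of what the SemanticConfig dataclass described above might look like — the defaults mirror the configuration example, but the exact definition in parser_config.py is an assumption:

```python
from dataclasses import dataclass


@dataclass
class SemanticConfig:
    # Defaults mirror the configuration example above; the real
    # dataclass in parser_config.py may differ in detail.
    max_file_content_chars: int = 30000
    max_overview_prompt_chars: int = 60000
    overview_batch_size: int = 50
    abstract_max_chars: int = 256
    overview_max_chars: int = 4000


cfg = SemanticConfig()  # all fields fall back to sensible defaults
```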

Type of Change

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)

Testing

  • Tests added (7 new tests)
  • Unit tests pass locally
  • Tested on macOS

Checklist

  • My code follows the code style of this project (ruff check + format pass)
  • I have performed a self-review of my own code
  • I have added tests that prove my fix is effective
  • New and existing unit tests pass locally with my changes
  • Small directories (under budget) behave identically to before

deepakdevp and others added 4 commits March 17, 2026 14:50
Centralizes hardcoded constants (30000, 256, 4000) into a configurable
SemanticConfig dataclass. Configurable via ov.conf under "semantic" key.

Addresses volcengine#531.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
_generate_overview() now checks prompt size against
SemanticConfig.max_overview_prompt_chars. If exceeded, file summaries
are split into batches, each batch gets a partial overview, then
partials are merged into a final overview.

Also:
- Replace hardcoded 30000 char limit with SemanticConfig.max_file_content_chars
- Enforce abstract/overview max size on output (both semantic_processor
  and semantic_dag call sites)

Fixes volcengine#529, addresses volcengine#531.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Tests cover: config defaults, custom values, budget under/over limit
detection, abstract/overview truncation, and batch splitting logic.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Collaborator

@qin-ctx qin-ctx left a comment


Review Summary

This PR adds a needed budget guard for oversized overview prompts (Fixes #529) and centralizes hardcoded constants into SemanticConfig. The batch-and-merge approach for large directories is a sound design.

However, there are two correctness issues that should be addressed before merge:

  1. File index mismatch in batched generation: Each batch uses local indices [1]..[N], but post-processing uses the global file_index_map, causing incorrect file name replacements.
  2. Budget guard gap: The AND condition means directories with ≤50 files but very long summaries bypass the guard entirely — the exact scenario described in #529.

See inline comments for details and suggestions.

for batch_idx, batch in enumerate(batches):
    batch_lines = []
    for idx, item in enumerate(batch, 1):
        batch_lines.append(f"[{idx}] {item['name']}: {item['summary']}")
Collaborator


[Bug] (blocking) Each batch uses local indices via enumerate(batch, 1), so every batch numbers its files [1], [2], ..., [batch_size]. However, the file_index_map passed to this method maps global indices (1..N across all files) to file names.

At line ~945, re.sub(r"\[(\d+)\]", replace_index, overview) replaces [number] references using the global map. If the LLM outputs [1] in batch 2's partial overview (referring to the first file of batch 2), it will be replaced with the name of global file #1 instead.

Suggested fix: either (a) use global indices when building each batch's prompt (offset by batch_idx * batch_size), or (b) apply replace_index to each partial overview individually using a per-batch index map before merging.

Contributor Author


Fixed. Each batch now uses global offset indices (global_offset + local_idx + 1), and replace_index is applied per-batch using a batch_index_map via a make_replacer closure. The global file_index_map is no longer used in the batched path.
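The fix described above can be sketched as a pair of pure functions — the names (number_batch, make_replacer) follow the comment but are illustrative, not the PR's actual code:

```python
import re


def make_replacer(batch_index_map):
    # Closure over this batch's own index map, so "[n]" references in a
    # partial overview resolve to that batch's files, not global file #n.
    def replace_index(match):
        idx = int(match.group(1))
        # Leave unknown indices untouched rather than misattributing them.
        return batch_index_map.get(idx, match.group(0))
    return replace_index


def number_batch(batch, global_offset):
    # Build prompt lines with globally unique indices plus the per-batch
    # map that make_replacer consumes.
    lines, index_map = [], {}
    for local_idx, item in enumerate(batch):
        global_idx = global_offset + local_idx + 1
        lines.append(f"[{global_idx}] {item['name']}: {item['summary']}")
        index_map[global_idx] = item["name"]
    return lines, index_map


# Second batch of size 50 starts at global index 51, not 1:
batch = [{"name": "b.py", "summary": "helpers"}]
lines, index_map = number_batch(batch, global_offset=50)
overview = re.sub(r"\[(\d+)\]", make_replacer(index_map), "[51] has helpers")
```

Because each partial overview is resolved with its own map before merging, the global file_index_map never needs to touch the batched path.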

estimated_size = len(file_summaries_str) + len(children_abstracts_str)
if (
    estimated_size > semantic.max_overview_prompt_chars
    and len(file_summaries) > semantic.overview_batch_size
Collaborator


[Bug] (blocking) The AND condition requires both estimated_size > budget and file_count > batch_size for the guard to trigger. This means a directory with ≤50 files but very long individual summaries (e.g. 50 files × 1.5K summary = 75K chars, exceeding the 60K budget) will bypass the guard entirely and still produce an oversized prompt.

This is the exact failure mode described in #529 — oversized prompts causing VLM timeouts. The fix only protects against the many-files variant, not the few-files-long-summaries variant.

Suggested fix: consider using OR instead of AND, or add a fallback for the case where the prompt is oversized but file count is below batch_size (e.g., truncating individual summaries proportionally to fit the budget).

Contributor Author


Fixed. Changed to three-way branching: (1) over budget + many files → batch and merge, (2) over budget + few files → truncate individual summaries proportionally to fit budget, (3) under budget → proceed as normal.
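The three-way branching can be sketched as a small decision helper — function and argument names here are illustrative, not the actual code in semantic_processor.py:

```python
from types import SimpleNamespace


def plan_overview_strategy(file_summaries, file_summaries_str,
                           children_abstracts_str, cfg):
    # Hypothetical decision helper mirroring the three-way branching
    # described above.
    estimated_size = len(file_summaries_str) + len(children_abstracts_str)
    if estimated_size <= cfg.max_overview_prompt_chars:
        return "single"    # under budget: unchanged behavior
    if len(file_summaries) > cfg.overview_batch_size:
        return "batched"   # over budget, many files: batch and merge
    return "truncate"      # over budget, few files: shrink summaries to fit


# Tiny config stand-in for illustration:
cfg = SimpleNamespace(max_overview_prompt_chars=100, overview_batch_size=3)
```

The key difference from the original AND condition is the final branch: few files with long summaries no longer fall through to an oversized single prompt.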

from openviking_cli.utils.config import get_openviking_config

semantic = get_openviking_config().semantic
if len(overview) > semantic.overview_max_chars:
Collaborator


[Design] (non-blocking) This truncation block is duplicated nearly identically in semantic_processor.py:_process_memory_directory (lines ~357-362). Consider extracting a shared helper like _enforce_size_limits(overview, abstract, config) to avoid divergence when limits or truncation logic change.

Contributor Author


Fixed. Extracted _enforce_size_limits(overview, abstract) on SemanticProcessor. Both call sites (semantic_processor.py and semantic_dag.py) now use it.
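A minimal sketch of what such a shared helper could look like — a plain hard truncation against the configured limits; the actual method on SemanticProcessor may truncate differently (e.g. at word boundaries):

```python
from types import SimpleNamespace


def enforce_size_limits(overview, abstract, cfg):
    # Hypothetical shared helper: clamp LLM output to configured limits
    # so both call sites truncate identically.
    if len(overview) > cfg.overview_max_chars:
        overview = overview[:cfg.overview_max_chars]
    if len(abstract) > cfg.abstract_max_chars:
        abstract = abstract[:cfg.abstract_max_chars]
    return overview, abstract


# Config stand-in for illustration:
cfg = SimpleNamespace(overview_max_chars=4000, abstract_max_chars=256)
```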

{
    "dir_name": dir_name,
    "file_summaries": combined,
    "children_abstracts": "None",
Collaborator


[Design] (non-blocking) In the merge step, children_abstracts is hardcoded to "None". Children abstracts are only included in the first batch's partial overview (line 891). When partials are merged, the LLM may not retain subdirectory context from batch 1's output, leading to loss of child directory information in the final overview.

Consider passing the children abstracts string into the merge prompt as well, so the final overview has full subdirectory context.

Contributor Author


Fixed. The merge prompt now includes children_abstracts_str instead of hardcoded "None".

from openviking_cli.utils.config.parser_config import SemanticConfig


def test_semantic_config_defaults():
Collaborator


[Suggestion] (non-blocking) The tests here thoroughly validate SemanticConfig defaults and budget arithmetic, which is good. However, there are no tests for the core behavioral changes: _batched_generate_overview index mapping, the merge step, or the truncation enforcement in semantic_dag.py / semantic_processor.py.

Consider adding at least one test with a mocked VLM that verifies:

  • Batch splitting produces correct per-batch index numbering
  • The [number] → file name replacement is correct across batches
  • Truncation is applied when overview/abstract exceeds limits

Contributor Author


Acknowledged — will add behavioral tests with mocked VLM in a follow-up if needed. The current tests validate the config, budget logic, and batch splitting which are the core pure-function components.
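The batch-splitting component mentioned above is a pure function, so it can be tested without a mocked VLM. A sketch under the assumption that splitting is a simple slice by overview_batch_size (split_batches is a hypothetical name, not the module's actual helper):

```python
def split_batches(items, batch_size):
    # Hypothetical splitter matching overview_batch_size semantics:
    # fixed-size slices, with a possibly smaller final batch.
    return [items[i:i + batch_size] for i in range(0, len(items), batch_size)]


def test_batch_splitting():
    # 120 file summaries with batch_size=50 should yield 50 + 50 + 20.
    batches = split_batches(list(range(120)), 50)
    assert [len(b) for b in batches] == [50, 50, 20]
    # Flattening the batches preserves order and loses nothing.
    assert [x for b in batches for x in b] == list(range(120))
```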

- Fix file index mismatch: use global offset indices per batch and
  apply replace_index per-batch via make_replacer closure
- Fix budget guard gap: change AND to three-way branching (batch for
  many files, truncate for few-files-long-summaries, pass-through
  for under budget)
- Extract _enforce_size_limits() helper to deduplicate truncation
  logic between semantic_processor.py and semantic_dag.py
- Pass children_abstracts to merge prompt for full context retention

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Collaborator

@qin-ctx qin-ctx left a comment


All blocking issues from the previous review have been addressed correctly:

  • ✅ File index mismatch fixed with global offset indices + per-batch make_replacer closure
  • ✅ Budget guard gap fixed with three-way branching (batch, truncate, pass-through)
  • ✅ Duplicated truncation logic extracted into _enforce_size_limits
  • ✅ Children abstracts now passed to merge prompt

LGTM 🎉

@qin-ctx qin-ctx merged commit e1d4176 into volcengine:main Mar 17, 2026
6 checks passed
@github-project-automation github-project-automation bot moved this from Backlog to Done in OpenViking project Mar 17, 2026


Development

Successfully merging this pull request may close these issues.

Directory overview generation can build oversized prompts and destabilize VLM calls
