fix: increase RERANK_CONTEXT_SIZE default 2048→4096, configurable via env var, fix template overhead underestimate #453

Merged
tobi merged 1 commit into tobi:main from builderjarvis:fix/rerank-context-size on Mar 28, 2026
@builderjarvis

Contributor

Problem

qmd query crashes on longer documents (session transcripts, CJK text, large markdown files) with:

Error: The input lengths of some of the given documents exceed the context size.
Try to increase the context size to at least 2207 or use another model that supports longer contexts.

This affects multiple reported issues: #91, #290, #291, #314

Root cause

Two compounding issues:

  1. RERANK_CONTEXT_SIZE = 2048 is too small for documents longer than ~1600 tokens. The Qwen3 reranker template overhead is higher than estimated, so even after truncation, some chunks still exceed the context window.

  2. RERANK_TEMPLATE_OVERHEAD = 200 underestimates the actual Qwen3 chat template overhead. Measured at ~350 tokens on real queries; truncation budgets based on 200 allow documents through that still overflow the context.
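
The interaction of the two issues can be checked with a little arithmetic. The sketch below is illustrative only: the function names are hypothetical and not taken from the qmd source, and it assumes the truncation budget is simply the context size minus the assumed overhead, with the ~350-token measurement used as the actual overhead.

```typescript
// Illustrative arithmetic only; these helper names are hypothetical,
// not functions from the qmd codebase.

// Tokens left for the document after reserving room for the template.
function documentBudget(contextSize: number, assumedOverhead: number): number {
  return Math.max(0, contextSize - assumedOverhead);
}

// A document truncated to the full budget still has to fit alongside
// the real template overhead, which may differ from the assumed one.
function overflows(
  contextSize: number,
  assumedOverhead: number,
  actualOverhead: number,
): boolean {
  return documentBudget(contextSize, assumedOverhead) + actualOverhead > contextSize;
}

// Old settings: budget 2048 - 200 = 1848; 1848 + 350 = 2198 > 2048.
const oldOverflow = overflows(2048, 200, 350);

// New settings: budget 4096 - 512 = 3584; 3584 + 350 = 3934 <= 4096.
const newOverflow = overflows(4096, 512, 350);

console.log({ oldOverflow, newOverflow });
```

Under these assumptions, a document truncated to the old budget needs roughly 2198 tokens, which is in the same range as the 2207 reported in the error message above.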

Fix

  • Bump RERANK_CONTEXT_SIZE default from 2048 → 4096
  • Make it overridable via QMD_RERANK_CONTEXT_SIZE env var for users with tighter memory budgets or very long documents
  • Bump RERANK_TEMPLATE_OVERHEAD from 200 → 512 so the truncation budget correctly accounts for actual template overhead

The 4096 default comfortably fits real-world long documents while staying well below the 40 960-token auto size.
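
A minimal sketch of how an env-var override with a 4096 fallback might look. This is not the code from the PR: `resolveContextSize` and its validation rules are assumptions made for illustration, and only the `QMD_RERANK_CONTEXT_SIZE` name and the default value come from the description above.

```typescript
const DEFAULT_RERANK_CONTEXT_SIZE = 4096;

// Hypothetical helper: read QMD_RERANK_CONTEXT_SIZE from an env-like object
// and fall back to the default when it is unset or not a positive integer.
function resolveContextSize(
  env: Record<string, string | undefined>,
  fallback: number = DEFAULT_RERANK_CONTEXT_SIZE,
): number {
  const raw = env["QMD_RERANK_CONTEXT_SIZE"];
  if (raw === undefined) return fallback;

  const trimmed = raw.trim();
  // Accept plain decimal digits only, so "abc", "-1", or "4k" are rejected.
  if (!/^\d+$/.test(trimmed)) return fallback;

  const parsed = Number.parseInt(trimmed, 10);
  return parsed > 0 ? parsed : fallback;
}

// In a Node.js process this would typically be called as
// resolveContextSize(process.env).
console.log(resolveContextSize({ QMD_RERANK_CONTEXT_SIZE: "8192" }));
console.log(resolveContextSize({}));
```

Falling back silently on invalid input is one possible design choice; a stricter variant could throw or log a warning instead.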

…e via QMD_RERANK_CONTEXT_SIZE env var, fix RERANK_TEMPLATE_OVERHEAD underestimate 200→512

Default 2048 was too small for longer documents (session transcripts, CJK
text, large markdown files). After truncation the Qwen3 reranker template
adds more overhead than the original 200-token estimate, causing node-llama-cpp
to throw 'input lengths exceed context size'.

Fixes: tobi#91 tobi#290 tobi#291 tobi#314
zeattacker pushed a commit to zeattacker/qmd that referenced this pull request Mar 26, 2026
Merges dev-upstream-fixes (cherry-picked PRs tobi#462, tobi#463, tobi#455, tobi#418,
tobi#456, tobi#442, tobi#453) into dev. Resolved mcp/server.ts bind conflict —
keep 0.0.0.0 for Docker container accessibility.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@tobi tobi merged commit 616776e into tobi:main Mar 28, 2026
jaylfc added a commit to jaylfc/qmd that referenced this pull request Apr 5, 2026
fix: increase RERANK_CONTEXT_SIZE default 2048→4096, configurable via env var, fix template overhead underestimate
tanarchytan referenced this pull request in tanarchytan/lotl Apr 8, 2026
fix: increase RERANK_CONTEXT_SIZE default 2048→4096, configurable via env var, fix template overhead underestimate
lucndm pushed a commit to lucndm/qmd that referenced this pull request Jun 7, 2026
fix: increase RERANK_CONTEXT_SIZE default 2048→4096, configurable via env var, fix template overhead underestimate