
[GTC] fix(tokenizer): bound pathological token counting #1036

Merged
esengine merged 1 commit into esengine:main from GTC2080:GTC/bound-pathological-token-counting on May 17, 2026

GTC2080 (Contributor) commented May 16, 2026

What

Adds bounded token counting for oversized text and routes the hot token-estimation paths through it: request estimation, context-shrink ordering, /context breakdowns, desktop context telemetry, /cost, streaming token-rate estimates, and subagent distillation metrics. Normal-sized inputs are still counted exactly; pathologically large strings are estimated from a sample instead.
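A minimal sketch of the idea, assuming a threshold-plus-sampling design: inputs up to a hypothetical MAX_EXACT_CHARS are counted exactly, and longer inputs are estimated by counting three fixed-size windows (head, middle, tail) and scaling the observed tokens-per-character ratio to the full length. The constants and the countExact parameter (standing in for the real BPE counter) are assumptions; the PR does not state its actual thresholds or the signature of countTokensBounded.

```typescript
// Hypothetical limits; the PR does not publish its real values.
const MAX_EXACT_CHARS = 20_000;
const SAMPLE_CHARS = 4_000;

// Stand-in for the project's exact BPE counter (assumed shape).
type Counter = (text: string) => number;

function countTokensBounded(text: string, countExact: Counter): number {
  // Normal-sized input: exact count, no estimation.
  if (text.length <= MAX_EXACT_CHARS) return countExact(text);

  // Oversized input: count three non-overlapping windows only.
  // Because text.length > MAX_EXACT_CHARS > 3 * SAMPLE_CHARS, `mid` is >= 0
  // and the windows never overlap.
  const mid = Math.floor((text.length - SAMPLE_CHARS) / 2);
  const windows = [
    text.slice(0, SAMPLE_CHARS),
    text.slice(mid, mid + SAMPLE_CHARS),
    text.slice(text.length - SAMPLE_CHARS),
  ];
  const sampledChars = windows.reduce((sum, w) => sum + w.length, 0);
  const sampledTokens = windows.reduce((sum, w) => sum + countExact(w), 0);

  // Scale the sampled tokens-per-character ratio to the full length.
  return Math.ceil((sampledTokens / sampledChars) * text.length);
}

// Usage with a toy counter (about one token per 4 characters).
const toyCounter: Counter = (s) => Math.ceil(s.length / 4);
console.log(countTokensBounded("A".repeat(100_000), toyCounter));
```

Sampling several windows rather than only the head keeps the estimate closer when content changes partway through, though any fixed sample can still miss local density changes. For context budgeting and telemetry, a proportional figure is usually good enough.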

Why

Fixes #558. One reproduction is a very large, repetitive payload such as "A".repeat(100_000) flowing into a token-counting path: exact BPE counting can spend several seconds on it before shrink/healing gets a chance to reduce the content. The bounded counter avoids that stall while still producing a proportional estimate for context budgeting and UI telemetry.
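A toy illustration of how repetitive input can become expensive (not the project's tokenizer): a naive greedy BPE loop that rescans every adjacent pair before each merge does O(n) work per merge, and on a long run of one character it performs about n/2 merges, so total work grows roughly quadratically. The naiveBpeCount function and its one-entry rank table are hypothetical.

```typescript
// Toy greedy BPE (hypothetical): before every merge, scan all adjacent pairs
// and merge the lowest-ranked one. `scans` counts pair lookups as a proxy for work.
function naiveBpeCount(
  text: string,
  ranks: Map<string, number>,
): { tokens: number; scans: number } {
  const parts = Array.from(text);
  let scans = 0;
  for (;;) {
    let best = -1;
    let bestRank = Infinity;
    for (let i = 0; i + 1 < parts.length; i++) {
      scans++;
      const rank = ranks.get(parts[i] + parts[i + 1]);
      if (rank !== undefined && rank < bestRank) {
        bestRank = rank;
        best = i;
      }
    }
    if (best < 0) break; // no mergeable pair left
    // Replace the two pieces with their concatenation.
    parts.splice(best, 2, parts[best] + parts[best + 1]);
  }
  return { tokens: parts.length, scans };
}

// Doubling the input length roughly quadruples the scan count.
const ranks = new Map([["AA", 0]]);
for (const n of [1_000, 2_000, 4_000]) {
  console.log(n, naiveBpeCount("A".repeat(n), ranks).scans);
}
```

Real tokenizers are far more optimized than this loop, but the growth pattern shows why a single huge, repetitive piece of text is the worst case, and why the sampling approach sketched for this PR only ever hands small windows to the exact counter.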

How to verify

npm run verify

Adds regression coverage for countTokensBounded("A".repeat(100_000)) and for a /context breakdown over a 100 KB tool result; both tests assert that the pathological path completes in under 1 second.
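The PR does not show its test code. One way to express a wall-clock bound like the 1-second limit above is a small helper that times a call with performance.now() and fails if the call exceeds the limit. assertFasterThan is a hypothetical name, and the real suite may rely on its test runner's own timeout instead.

```typescript
import { strict as assert } from "node:assert";
import { performance } from "node:perf_hooks";

// Hypothetical helper: run `fn`, fail if it takes `limitMs` or longer,
// otherwise return its result.
function assertFasterThan<T>(limitMs: number, fn: () => T): T {
  const start = performance.now();
  const result = fn();
  const elapsedMs = performance.now() - start;
  assert.ok(elapsedMs < limitMs, `took ${elapsedMs.toFixed(1)} ms, limit is ${limitMs} ms`);
  return result;
}

// Usage with a stand-in workload; in the real suite `fn` would call the
// bounded counter on "A".repeat(100_000).
const payload = "A".repeat(100_000);
const measured = assertFasterThan(1_000, () => payload.length);
console.log(measured);
```

Wall-clock limits can be flaky on slow CI machines; a generous bound such as 1 second, well below a multi-second stall, keeps the test meaningful without being brittle.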

Checklist

  • npm run verify passes locally (lint + typecheck + tests + comment-policy gate)
  • No Co-Authored-By: Claude trailer in commits
  • Comments follow CONTRIBUTING.md (no module-essay headers, no incident history)
  • No edits to CHANGELOG.md — release notes are maintainer-written at release time

GTC2080 marked this pull request as ready for review May 16, 2026 16:11
esengine merged commit 8fd3989 into esengine:main May 17, 2026
5 checks passed
esengine (Owner) commented

Thanks @GTC2080 — clean fix and great test coverage on the perf bound. Merged.

ChasLui pushed a commit to ChasLui/DeepSeek-Reasonix that referenced this pull request May 23, 2026
GTC2080 deleted the GTC/bound-pathological-token-counting branch May 31, 2026 10:43


Development

Successfully merging this pull request may close these issues.

Audit countTokens callers — add upper-char bound to defend non-MCP paths from pathological BPE input
