
Record message boundaries during tokenization for checkpoint creation #16

Closed
aldehir wants to merge 10 commits into master from claude/checkpoint-message-span-tokens-XAStE

Conversation

aldehir commented May 28, 2026

Owner

Overview

This PR refactors how the server handles message boundaries in chat prompts to enable more efficient context reuse through checkpoints. Instead of computing the token index of the last user message after tokenization, message boundaries are now recorded during tokenization itself via a new map_idx_to_role mapping in server_tokens.
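A minimal sketch of the idea, not the code in this PR: `toy_tokens` is a hypothetical stand-in for server_tokens, and the key/value types and the body of set_role_marker() are assumptions. It only illustrates a sparse map from starting token index to role.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-in for server_tokens; only the role-marker part is sketched.
struct toy_tokens {
    std::vector<int>              tokens;          // token ids
    std::map<size_t, std::string> map_idx_to_role; // starting token index -> role

    // Record that a message with the given role starts at token index idx.
    // Assumption: a marker may sit at most one past the last token.
    void set_role_marker(size_t idx, const std::string & role) {
        assert(idx <= tokens.size());
        map_idx_to_role[idx] = role;
    }

    // Read-only view of the recorded markers, ordered by token index.
    const std::map<size_t, std::string> & get_role_map() const {
        return map_idx_to_role;
    }
};
```

Because the map is sparse, only one entry per message is stored, regardless of how many tokens the message spans.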

Key Changes

  1. New message boundary tracking in server_tokens:

    • Added map_idx_to_role map to record the role of messages at their starting token indices
    • Added set_role_marker(), get_role_map(), last_role_boundary(), and next_role_boundary() methods to record and query message boundaries
    • Updated clone() to preserve role markers
  2. Refactored tokenization pipeline:

    • Extracted process_mtmd_prompt_impl() so the same code serves both the path that adds special tokens and the path that does not
    • Added new tokenize_spans() function that tokenizes a rendered chat prompt while recording message boundaries from byte offsets
    • Updated tokenize_input_prompts() to accept optional files and spans parameters for the OAI-compatible chat path
    • String prompts now route through tokenize_spans() to capture message boundaries
  3. Improved server_tokens::push_back() for media handling:

    • Refactored to properly handle media chunks when merging token sequences
    • Now correctly advances through media chunks and their associated token placeholders
  4. Simplified checkpoint logic in server_context_impl:

    • Replaced n_before_user parameter with direct queries to role boundaries via last_role_boundary() and next_role_boundary()
    • Removed prompt_get_n_before_user() helper function (now redundant)
    • Checkpoints are created before the next user message boundary instead of before the last user message
    • Removed n_before_user field from task_params
  5. Message span handling in completions:

    • Message spans (byte offsets into the rendered prompt) are now passed directly to tokenize_input_prompts()
    • Role markers are recorded during tokenization rather than computed afterward
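
The span-to-token mapping described in items 2 and 5 can be sketched roughly as follows. This is an illustration under stated assumptions, not the PR's tokenize_spans(): `toy_span`, `toy_result`, `toy_tokenize` and `toy_tokenize_spans` are hypothetical names, the tokenizer is a one-token-per-byte placeholder, and spans are assumed to arrive sorted by start offset.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical span: a message's role and the byte offset where it starts
// in the rendered prompt (only the start offset is needed for this sketch).
struct toy_span {
    std::string role;
    size_t      begin;
};

struct toy_result {
    std::vector<int>              tokens;
    std::map<size_t, std::string> role_at; // token index -> role
};

// Placeholder tokenizer (assumption): one token per byte.
static void toy_tokenize(const std::string & text, std::vector<int> & out) {
    for (unsigned char c : text) {
        out.push_back(static_cast<int>(c));
    }
}

// Tokenize the prompt segment by segment, cutting at each span start, and
// record the token count reached at every cut as that message's boundary.
static toy_result toy_tokenize_spans(const std::string & prompt,
                                     const std::vector<toy_span> & spans) {
    toy_result res;
    size_t pos = 0;
    for (const toy_span & s : spans) {
        assert(s.begin >= pos && s.begin <= prompt.size()); // sorted, in range
        toy_tokenize(prompt.substr(pos, s.begin - pos), res.tokens); // text before this message
        res.role_at[res.tokens.size()] = s.role;                     // message starts here
        pos = s.begin;
    }
    toy_tokenize(prompt.substr(pos), res.tokens); // remainder after the last cut
    return res;
}
```

With empty spans the function reduces to plain tokenization of the whole string. One caveat of any segment-wise scheme: a real subword tokenizer may produce different tokens at a cut than it would for the uncut string, so the cut points matter.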

Benefits

  • More efficient: Message boundaries are computed once during tokenization rather than in a separate pass
  • Cleaner separation of concerns: Tokenization produces both the tokens and their metadata (role markers), so the checkpoint logic only reads the recorded boundaries
  • Better support for multimodal prompts: Media files and message boundaries are processed together
  • Enables more granular checkpoint creation: Can create checkpoints at any message boundary, not just before the last user message

Additional information

This change maintains backward compatibility for all prompt formats (arrays, mixed tokens/strings, multimodal objects) while improving the efficiency of the OAI-compatible chat path.

Requirements

https://claude.ai/code/session_01JDuY9p5h2bgpBpm5LpS7QN

claude added 5 commits May 28, 2026 17:48
Record a sparse token-index -> role mapping during prompt tokenization
(mirroring map_idx_to_media) instead of re-tokenizing a prefix to locate
the last user message. Checkpoints are now created before every user
message rather than only the last one.

https://claude.ai/code/session_01ER9K2qxFRD8YvyFgqgWWEE

Spans are already ordered by position, so iterate them directly instead
of copying and sorting. Move the has_mtmd assert in push_back into the
media branch so it only fires when actually copying a media chunk.

https://claude.ai/code/session_01ER9K2qxFRD8YvyFgqgWWEE

Extract the two role_map loops that compute the last/next user-message
token boundaries into server_tokens::last_role_boundary and
server_tokens::next_role_boundary, keeping the checkpoint logic in the
slot loop focused on policy rather than map traversal.

…_prompts

Collapse the three-way branch in handle_completions_impl into a single
tokenize_input_prompts call that threads optional media files and message
spans through tokenize_input_subprompt. A plain string prompt now routes
through tokenize_input_prompt_with_spans (empty spans/files reduce it to
plain tokenization), folding in the former process_mtmd_prompt and
spans-only branches per the existing TODOs.

Preserve today's behavior by flagging a prompt as multimodal only when
media files are actually attached, so text-only prompts on a multimodal
model keep has_mtmd=false (and remain eligible for prefix caching and
speculative decoding).

Move the has_mtmd guard in push_back(server_tokens&) to a single
top-level assertion (has_mtmd || source has no media) instead of
re-checking inside the loop, and rename tokenize_input_prompt_with_spans
to tokenize_spans to match the tokenize_mixed naming.

claude added 5 commits May 28, 2026 21:24

…ompt

Move add_special/parse_special ahead of files/spans so the parameter
order matches tokenize_input_subprompt.

In push_back(server_tokens&), copy media chunks via mtmd_input_chunk_copy
and append the NULL placeholders directly instead of delegating to the
chunk push_back overload. Remove two explanatory comments.

Revert the cosmetic assert-hoist and explicit chunk-copy changes; the
b9192bf version already handled mtmd copies correctly by delegating to
the chunk push_back overload.

Drop the _impl indirection; give process_mtmd_prompt an add_special
parameter defaulting to true. tokenize_spans passes first && add_special.
Keeps the original by-value signature (accepts the per-segment copies)
for a smaller diff.
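
A rough sketch of what passing `first && add_special` achieves when a prompt is tokenized in segments: special tokens are added at most once, at the very start. The names `toy_tokenize_one`, `toy_tokenize_segments` and `TOY_BOS` are hypothetical, and the one-token-per-byte tokenizer is a placeholder, not the server's code.

```cpp
#include <string>
#include <vector>

static const int TOY_BOS = 256; // hypothetical BOS id, outside the byte range

// Placeholder tokenizer (assumption): optional BOS, then one token per byte.
static std::vector<int> toy_tokenize_one(const std::string & text, bool add_special) {
    std::vector<int> out;
    if (add_special) {
        out.push_back(TOY_BOS);
    }
    for (unsigned char c : text) {
        out.push_back(static_cast<int>(c));
    }
    return out;
}

// Tokenize each segment in order; only the first segment may receive
// special tokens, and only when the caller asked for them.
static std::vector<int> toy_tokenize_segments(const std::vector<std::string> & segments,
                                              bool add_special) {
    std::vector<int> all;
    bool first = true;
    for (const std::string & seg : segments) {
        std::vector<int> part = toy_tokenize_one(seg, first && add_special);
        all.insert(all.end(), part.begin(), part.end());
        first = false;
    }
    return all;
}
```

Without the `first &&` guard, every segment would start with its own BOS token in this toy setup.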
Replace the last_role_boundary/next_role_boundary helpers with one loop
over the (ascending) role map at the call site, computing both the last
user boundary and the next one after the batch start together. Removes
the now-unused server_tokens methods.
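
The single-loop approach from the last commit might look roughly like the sketch below. It is an interpretation, not the code from this PR: `toy_boundaries` and `toy_find_boundaries` are hypothetical names, -1 as the "none" value is an assumption, and so is the strict `idx > batch_start` comparison.

```cpp
#include <map>
#include <string>

struct toy_boundaries {
    int last_user        = -1; // start index of the last user message, -1 if none
    int next_after_start = -1; // first user boundary past batch_start, -1 if none
};

// One pass over the role map computes both values. std::map iterates keys
// in ascending order, so the first match past batch_start is the nearest one
// and the final user entry seen is the last one.
static toy_boundaries toy_find_boundaries(const std::map<int, std::string> & role_map,
                                          int batch_start) {
    toy_boundaries b;
    for (const auto & kv : role_map) {
        if (kv.second != "user") {
            continue;
        }
        b.last_user = kv.first;
        if (b.next_after_start < 0 && kv.first > batch_start) {
            b.next_after_start = kv.first;
        }
    }
    return b;
}
```

For a role map with user messages starting at tokens 5, 20 and 40 and a batch starting at token 10, this yields 40 as the last user boundary and 20 as the next one.
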
aldehir closed this May 31, 2026

2 participants