Record message boundaries during tokenization for checkpoint creation #16
Closed
aldehir wants to merge 10 commits into
Record a sparse token-index -> role mapping during prompt tokenization (mirroring map_idx_to_media) instead of re-tokenizing a prefix to locate the last user message. Checkpoints are now created before every user message rather than only the last one. https://claude.ai/code/session_01ER9K2qxFRD8YvyFgqgWWEE
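To make the sparse mapping concrete, here is a minimal sketch of the idea, not the PR's actual code. The `role_tracker` type, its `idx_to_role` member, and the `user_boundaries()` helper are hypothetical names chosen for illustration; only the `set_role_marker` name appears in the PR description, and its signature here is an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-in for the role bookkeeping described above.
struct role_tracker {
    // Sparse map: token index where a message starts -> role of that message.
    // Only message starts are stored, so the map stays small for long prompts.
    std::map<std::size_t, std::string> idx_to_role;

    // Assumed signature; the PR only names the method.
    void set_role_marker(std::size_t idx, const std::string & role) {
        idx_to_role[idx] = role;
    }

    // Start indices of all user messages in ascending order. Each one is a
    // candidate position for a checkpoint "before a user message".
    std::vector<std::size_t> user_boundaries() const {
        std::vector<std::size_t> out;
        for (const auto & kv : idx_to_role) {
            if (kv.second == "user") {
                out.push_back(kv.first);
            }
        }
        return out;
    }
};
```

Because `std::map` iterates in key order, the boundaries come out sorted without an extra pass, which is what makes "a checkpoint before every user message" a simple walk over the map.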
Spans are already ordered by position, so iterate them directly instead of copying and sorting. Move the has_mtmd assert in push_back into the media branch so it only fires when actually copying a media chunk. https://claude.ai/code/session_01ER9K2qxFRD8YvyFgqgWWEE
Extract the two role_map loops that compute the last/next user-message token boundaries into server_tokens::last_role_boundary and server_tokens::next_role_boundary, keeping the checkpoint logic in the slot loop focused on policy rather than map traversal.
…_prompts Collapse the three-way branch in handle_completions_impl into a single tokenize_input_prompts call that threads optional media files and message spans through tokenize_input_subprompt. A plain string prompt now routes through tokenize_input_prompt_with_spans (empty spans/files reduce it to plain tokenization), folding in the former process_mtmd_prompt and spans-only branches per the existing TODOs. Preserve today's behavior by flagging a prompt as multimodal only when media files are actually attached, so text-only prompts on a multimodal model keep has_mtmd=false (and remain eligible for prefix caching and speculative decoding).
Move the has_mtmd guard in push_back(server_tokens&) to a single top-level assertion (has_mtmd || source has no media) instead of re-checking inside the loop, and rename tokenize_input_prompt_with_spans to tokenize_spans to match the tokenize_mixed naming.
…ompt Move add_special/parse_special ahead of files/spans so the parameter order matches tokenize_input_subprompt.
In push_back(server_tokens&), copy media chunks via mtmd_input_chunk_copy and append the NULL placeholders directly instead of delegating to the chunk push_back overload. Remove two explanatory comments.
Revert the cosmetic assert-hoist and explicit chunk-copy changes; the b9192bf version already handled mtmd copies correctly by delegating to the chunk push_back overload.
Drop the _impl indirection; give process_mtmd_prompt an add_special parameter defaulting to true. tokenize_spans passes first && add_special. Keeps the original by-value signature (accepts the per-segment copies) for a smaller diff.
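The `first && add_special` rule can be illustrated with a toy version of segment-wise tokenization. This is a sketch under stated assumptions: `toy_tokenize` is a whitespace splitter standing in for the real tokenizer, `message_span`, `spans_result`, and `tokenize_spans_sketch` are hypothetical names, and spans are assumed to be sorted by byte offset. A real tokenizer may also merge text across segment edges, which this toy ignores.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <map>
#include <sstream>
#include <string>
#include <vector>

using token = std::string;

// Hypothetical span: byte offset in the rendered prompt where a message starts.
struct message_span {
    std::size_t begin;
    std::string role;
};

// Toy tokenizer (assumption): whitespace split, optional leading "<bos>".
std::vector<token> toy_tokenize(const std::string & text, bool add_special) {
    std::vector<token> out;
    if (add_special) {
        out.push_back("<bos>");
    }
    std::istringstream in(text);
    std::string word;
    while (in >> word) {
        out.push_back(word);
    }
    return out;
}

struct spans_result {
    std::vector<token> tokens;
    std::map<std::size_t, std::string> role_at; // token index -> role
};

// Tokenize the prompt segment by segment, recording the token index at which
// each message starts. Only the first tokenized segment may add special tokens.
spans_result tokenize_spans_sketch(const std::string & prompt,
                                   const std::vector<message_span> & spans,
                                   bool add_special) {
    spans_result res;
    bool first = true;
    std::size_t pos = 0;
    for (const auto & s : spans) {
        const std::size_t cut = std::min(s.begin, prompt.size());
        if (cut > pos) {
            auto toks = toy_tokenize(prompt.substr(pos, cut - pos), first && add_special);
            res.tokens.insert(res.tokens.end(), toks.begin(), toks.end());
            first = false;
            pos = cut;
        }
        res.role_at[res.tokens.size()] = s.role; // message starts at the next token
    }
    // Trailing segment after the last boundary.
    auto tail = toy_tokenize(prompt.substr(pos), first && add_special);
    res.tokens.insert(res.tokens.end(), tail.begin(), tail.end());
    return res;
}
```

Passing `first && add_special` keeps the beginning-of-sequence token from being repeated at every message boundary, so the concatenated result has one leading special token at most.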
Replace the last_role_boundary/next_role_boundary helpers with one loop over the (ascending) role map at the call site, computing both the last user boundary and the next one after the batch start together. Removes the now-unused server_tokens methods.
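The single-pass computation described in this commit could look roughly like the sketch below. It is an illustration, not the code from the PR: `role_boundaries`, `scan_role_map`, and the `npos` sentinel are hypothetical, and treating "next" as strictly greater than the batch start is an assumption, since the commit message does not say whether the comparison is inclusive.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>

// Sentinel meaning "no such boundary" (hypothetical).
static const std::size_t npos = static_cast<std::size_t>(-1);

struct role_boundaries {
    std::size_t last_user; // start of the last user message, or npos
    std::size_t next_user; // first user boundary after batch_start, or npos
};

// One pass over the ascending role map computes both values together.
role_boundaries scan_role_map(const std::map<std::size_t, std::string> & role_map,
                              std::size_t batch_start) {
    role_boundaries out = { npos, npos };
    for (const auto & kv : role_map) {
        if (kv.second != "user") {
            continue;
        }
        out.last_user = kv.first; // keys ascend, so the final assignment wins
        if (out.next_user == npos && kv.first > batch_start) {
            out.next_user = kv.first; // first user boundary past the batch start
        }
    }
    return out;
}
```

Folding both lookups into one loop relies on the map being ordered by token index. With an unordered container the "last assignment wins" trick would no longer give the last boundary.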
Overview
This PR refactors how the server handles message boundaries in chat prompts to enable more efficient context reuse through checkpoints. Instead of computing the token index of the last user message after tokenization, message boundaries are now recorded during tokenization itself via a new `map_idx_to_role` mapping in `server_tokens`.
Key Changes
New message boundary tracking in `server_tokens`:
- `map_idx_to_role` map to record the role of messages at their starting token indices
- `set_role_marker()`, `get_role_map()`, `last_role_boundary()`, and `next_role_boundary()` methods to query message boundaries
- `clone()` to preserve role markers
Refactored tokenization pipeline:
- `process_mtmd_prompt_impl()` to support both special-token-adding and non-adding paths
- `tokenize_spans()` function that tokenizes a rendered chat prompt while recording message boundaries from byte offsets
- `tokenize_input_prompts()` to accept optional `files` and `spans` parameters for the OAI-compatible chat path
- `tokenize_spans()` to capture message boundaries
Improved `server_tokens::push_back()` for media handling:
Simplified checkpoint logic in `server_context_impl`:
- Replaced the `n_before_user` parameter with direct queries to role boundaries via `last_role_boundary()` and `next_role_boundary()`
- Removed the `prompt_get_n_before_user()` helper function (now redundant)
- Removed the `n_before_user` field from `task_params`
Message span handling in completions:
- `tokenize_input_prompts()`
Benefits
Additional information
This change maintains backward compatibility for all prompt formats (arrays, mixed tokens/strings, multimodal objects) while improving the efficiency of the OAI-compatible chat path.
Requirements
https://claude.ai/code/session_01JDuY9p5h2bgpBpm5LpS7QN