Record message boundaries during tokenization for checkpoint creation by aldehir · Pull Request #16 · aldehir/llama.cpp

aldehir · 2026-05-28T21:02:45Z

Overview

This PR refactors how the server handles message boundaries in chat prompts to enable more efficient context reuse through checkpoints. Instead of computing the token index of the last user message after tokenization, message boundaries are now recorded during tokenization itself via a new map_idx_to_role mapping in server_tokens.

Key Changes

New message boundary tracking in server_tokens:
- Added map_idx_to_role map to record the role of messages at their starting token indices
- Added set_role_marker(), get_role_map(), last_role_boundary(), and next_role_boundary() methods to query message boundaries
- Updated clone() to preserve role markers
Refactored tokenization pipeline:
- Extracted process_mtmd_prompt_impl() to support both special-token-adding and non-adding paths
- Added new tokenize_spans() function that tokenizes a rendered chat prompt while recording message boundaries from byte offsets
- Updated tokenize_input_prompts() to accept optional files and spans parameters for the OAI-compatible chat path
- String prompts now route through tokenize_spans() to capture message boundaries
Improved server_tokens::push_back() for media handling:
- Refactored to properly handle media chunks when merging token sequences
- Now correctly advances through media chunks and their associated token placeholders
Simplified checkpoint logic in server_context_impl:
- Replaced n_before_user parameter with direct queries to role boundaries via last_role_boundary() and next_role_boundary()
- Removed prompt_get_n_before_user() helper function (now redundant)
- Checkpoints are created before the next user message boundary instead of before the last user message
- Removed n_before_user field from task_params
Message span handling in completions:
- Message spans (byte offsets into the rendered prompt) are now passed directly to tokenize_input_prompts()
- Role markers are recorded during tokenization rather than computed afterward

Benefits

More efficient: Message boundaries are computed once during tokenization rather than in a separate pass
Cleaner separation of concerns: Tokenization handles both text and metadata (role markers)
Better support for multimodal prompts: Media files and message boundaries are processed together
Enables more granular checkpoint creation: Can create checkpoints at any message boundary, not just before the last user message

Additional information

This change maintains backward compatibility for all prompt formats (arrays, mixed tokens/strings, multimodal objects) while improving the efficiency of the OAI-compatible chat path.

Requirements

I have read and agree with the contributing guidelines
AI usage disclosure: NO

https://claude.ai/code/session_01JDuY9p5h2bgpBpm5LpS7QN

Record a sparse token-index -> role mapping during prompt tokenization (mirroring map_idx_to_media) instead of re-tokenizing a prefix to locate the last user message. Checkpoints are now created before every user message rather than only the last one. https://claude.ai/code/session_01ER9K2qxFRD8YvyFgqgWWEE

Spans are already ordered by position, so iterate them directly instead of copying and sorting. Move the has_mtmd assert in push_back into the media branch so it only fires when actually copying a media chunk. https://claude.ai/code/session_01ER9K2qxFRD8YvyFgqgWWEE

Extract the two role_map loops that compute the last/next user-message token boundaries into server_tokens::last_role_boundary and server_tokens::next_role_boundary, keeping the checkpoint logic in the slot loop focused on policy rather than map traversal.

…_prompts Collapse the three-way branch in handle_completions_impl into a single tokenize_input_prompts call that threads optional media files and message spans through tokenize_input_subprompt. A plain string prompt now routes through tokenize_input_prompt_with_spans (empty spans/files reduce it to plain tokenization), folding in the former process_mtmd_prompt and spans-only branches per the existing TODOs. Preserve today's behavior by flagging a prompt as multimodal only when media files are actually attached, so text-only prompts on a multimodal model keep has_mtmd=false (and remain eligible for prefix caching and speculative decoding).

Move the has_mtmd guard in push_back(server_tokens&) to a single top-level assertion (has_mtmd || source has no media) instead of re-checking inside the loop, and rename tokenize_input_prompt_with_spans to tokenize_spans to match the tokenize_mixed naming.

…ompt Move add_special/parse_special ahead of files/spans so the parameter order matches tokenize_input_subprompt.

In push_back(server_tokens&), copy media chunks via mtmd_input_chunk_copy and append the NULL placeholders directly instead of delegating to the chunk push_back overload. Remove two explanatory comments.

Revert the cosmetic assert-hoist and explicit chunk-copy changes; the b9192bf version already handled mtmd copies correctly by delegating to the chunk push_back overload.

Drop the _impl indirection; give process_mtmd_prompt an add_special parameter defaulting to true. tokenize_spans passes first && add_special. Keeps the original by-value signature (accepts the per-segment copies) for a smaller diff.

Replace the last_role_boundary/next_role_boundary helpers with one loop over the (ascending) role map at the call site, computing both the last user boundary and the next one after the batch start together. Removes the now-unused server_tokens methods.

claude added 5 commits May 28, 2026 17:48

github-actions Bot added examples server labels May 28, 2026

claude added 5 commits May 28, 2026 21:24

server: align tokenize_spans argument order with tokenize_input_subpr…

af22c55

…ompt Move add_special/parse_special ahead of files/spans so the parameter order matches tokenize_input_subprompt.

server: copy mtmd chunks explicitly in push_back, drop stray comments

612de86

In push_back(server_tokens&), copy media chunks via mtmd_input_chunk_copy and append the NULL placeholders directly instead of delegating to the chunk push_back overload. Remove two explanatory comments.

server: restore push_back(server_tokens&) to its prior form

5af8688

Revert the cosmetic assert-hoist and explicit chunk-copy changes; the b9192bf version already handled mtmd copies correctly by delegating to the chunk push_back overload.

aldehir closed this May 31, 2026

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Record message boundaries during tokenization for checkpoint creation#16

Record message boundaries during tokenization for checkpoint creation#16
aldehir wants to merge 10 commits into
masterfrom
claude/checkpoint-message-span-tokens-XAStE

aldehir commented May 28, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

aldehir commented May 28, 2026

Overview

Key Changes

Benefits

Additional information

Requirements

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants