Stop the constrained decoder at sentence boundaries#517
Conversation
The deterministic constrained decoder ran to its token budget even after finishing a sentence, so it could spill into the next one. After committing a token carrying an ASCII '.'/'!'/'?', it now decodes the accumulated bytes and stops when they end a sentence, reusing SentenceBoundaryClassifier so decimals, list numbers, single-letter initials, and abbreviations don't stop early. Constrained path only (the default-off developer flag); the shipping sampler path is unchanged. Pre-commit stop checks were folded into one helper to keep the decode loop under the cyclomatic-complexity budget.
| if Self.completesSentence(generatedBytes, lastTokenBytes: tokenBytes) { | ||
| stopReason = "sentence_boundary" | ||
| break | ||
| } |
There was a problem hiding this comment.
Token-split may produce unclosed quotation marks
completesSentence evaluates generatedBytes immediately after the token that carries ., !, or ? is appended. If the tokenizer emits a closing quote as a separate subsequent token (e.g. token N = "stop.", token N+1 = '"'), the check fires after token N when generatedBytes ends with stop. — endsSentence returns true, the loop breaks, and the closing " is never appended. The walk-back logic in endsSentence only helps when the closing punctuation is already in the buffer. For "He said \"Good morning.\"" split across two tokens this would yield He said "Good morning. in the output. Consider deferring the sentence-end break by one additional token when the current check fires, to allow a potential closing-quote token to land first.
| private static func completesSentence(_ generatedBytes: [UInt8], lastTokenBytes: [UInt8]) -> Bool { | ||
| guard lastTokenBytes.contains(where: isSentenceTerminatorByte) else { | ||
| return false | ||
| } | ||
| // swiftlint:disable:next optional_data_string_conversion | ||
| let decoded = String(decoding: generatedBytes, as: UTF8.self) | ||
| return SentenceBoundaryClassifier.endsSentence(decoded) | ||
| } |
There was a problem hiding this comment.
Full-buffer decode on every terminator token
String(decoding: generatedBytes, as: UTF8.self) re-decodes the entire accumulated byte array every time a token carries a terminator byte. For the intended short completions under a small maxPredictionTokens budget this is negligible, but if the budget grows or a sentence contains multiple terminator bytes (e.g. U.S. followed by a real period), this is called multiple times and scales with total completion length. A suffix-only decode — e.g. keeping a rolling decodedSoFar: String updated incrementally, or only decoding the last ~50 bytes — would keep per-token cost constant.
Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!
Summary
Adds a sentence-boundary stop to the deterministic constrained decoder so a completion ends cleanly
at the end of one sentence instead of running into the next. After committing a token that carried an
ASCII sentence terminator, the decoder decodes the accumulated bytes and stops when they end a
sentence (a terminal
.,!, or?, allowing a trailing run of closing quotes/brackets). Theperiod case reuses the existing
SentenceBoundaryClassifierdisambiguation, so decimals ("1.2"),list numbers ("1."), single-letter initials ("U.S."), and abbreviations ("e.g.", "Mr.") do not stop
generation early.
This affects only the constrained decode path (the
cotabbyConstrainedDecoderEnableddeveloperflag, default off); the shipping sampler path is unchanged.
Validation
Linked issues
None. Brings the constrained decoder closer to the intended "predict a short continuation, stop at a
sentence boundary" behavior.
Risk / rollout notes
behavior. It is a building block toward making that path good enough to enable by default (which
still needs on-device evaluation).
./!/?, so the steady decode pathstays on raw byte accumulation with a single decode at the end. Non-ASCII terminators (CJK 。!?)
are not yet handled and fall through to the existing token-budget stop.
Greptile Summary
Adds sentence-boundary stopping to the constrained decoder (
cotabbyConstrainedDecoderEnableddev flag, default off). After each committed token, if its bytes contain an ASCII terminator (./!/?), the accumulated completion is decoded andSentenceBoundaryClassifier.endsSentenceis consulted; the loop breaks withstopReason = \"sentence_boundary\"when a true sentence end is detected. The PR also refactors the pre-commit EOS/newline checks into a singlepreCommitStopReasonhelper to stay under the cyclomatic-complexity budget.SentenceBoundaryClassifier.endsSentencewalks back past trailing whitespace and closing punctuation (quotes, brackets) before checking the terminal character, correctly delegating period disambiguation to the existingisTerminalPeriodlogic so decimals, initials, and abbreviations do not trigger early stops.Confidence Score: 4/5
Safe to merge — the new logic is gated behind a default-off developer flag and does not touch the shipped sampler path.
The SentenceBoundaryClassifier changes are thoroughly unit-tested and the walk-back logic is correct. The main behavioural gap is in LlamaRuntimeCore: when a tokenizer emits a sentence terminator and a closing quotation mark in consecutive separate tokens, the sentence-boundary check fires after the terminator token and the closing quote is never appended, producing an unclosed quote. This is specific to tokenizer splitting behaviour and affects only the dev-flag constrained path, but it is a real output-quality issue worth addressing before the path is enabled by default.
The token-split edge case is isolated to LlamaRuntimeCore.swift around the completesSentence call site at line 381.
Important Files Changed
Sequence Diagram
sequenceDiagram participant DecodeLoop as "Decode Loop" participant PreCommit as "preCommitStopReason" participant Accept as "engine.acceptToken" participant CS as "completesSentence" participant SBC as "SentenceBoundaryClassifier" DecodeLoop->>PreCommit: tokenID, options, profile PreCommit-->>DecodeLoop: nil or eos/single_line (break) DecodeLoop->>Accept: acceptToken(sequenceID, tokenID) Accept-->>DecodeLoop: ok or error (break) DecodeLoop->>CS: completesSentence(generatedBytes, lastTokenBytes) CS->>CS: guard lastTokenBytes contains terminator byte CS->>SBC: endsSentence(String decoding generatedBytes) SBC->>SBC: skip trailing whitespace SBC->>SBC: skip closing punctuation SBC->>SBC: inspect last visible character SBC-->>CS: true or false CS-->>DecodeLoop: true means stopReason sentence_boundary and breakReviews (1): Last reviewed commit: "Stop the constrained decoder at sentence..." | Re-trigger Greptile