Skip to content

Stop the constrained decoder at sentence boundaries#517

Merged
FuJacob merged 1 commit into
mainfrom
feat/constrained-sentence-stop
Jun 2, 2026
Merged

Stop the constrained decoder at sentence boundaries#517
FuJacob merged 1 commit into
mainfrom
feat/constrained-sentence-stop

Conversation

@FuJacob

@FuJacob FuJacob commented Jun 2, 2026

Copy link
Copy Markdown
Owner

Summary

Adds a sentence-boundary stop to the deterministic constrained decoder so a completion ends cleanly
at the end of one sentence instead of running into the next. After committing a token that carried an
ASCII sentence terminator, the decoder decodes the accumulated bytes and stops when they end a
sentence (a terminal ., !, or ?, allowing a trailing run of closing quotes/brackets). The
period case reuses the existing SentenceBoundaryClassifier disambiguation, so decimals ("1.2"),
list numbers ("1."), single-letter initials ("U.S."), and abbreviations ("e.g.", "Mr.") do not stop
generation early.

This affects only the constrained decode path (the cotabbyConstrainedDecoderEnabled developer
flag, default off); the shipping sampler path is unchanged.

Validation

xcodebuild ... test ... CODE_SIGNING_ALLOWED=NO CODE_SIGNING_REQUIRED=NO \
  -only-testing:CotabbyTests/SentenceBoundaryClassifierTests \
  -only-testing:CotabbyTests/SuggestionSessionReconcilerTests
# ** TEST SUCCEEDED ** — incl. 8 new endsSentence cases (terminal vs abbreviation/decimal/initial,
#   closing-punctuation walk-back, trailing whitespace, empty string)

swiftlint --strict --quiet   # exit 0 (kept runConstrainedDecode under the cyclomatic-complexity
                             # budget by folding the pre-commit stop checks into one helper)

Linked issues

None. Brings the constrained decoder closer to the intended "predict a short continuation, stop at a
sentence boundary" behavior.

Risk / rollout notes

  • Constrained path only, which is default-off, so there is no change to shipped suggestion
    behavior. It is a building block toward making that path good enough to enable by default (which
    still needs on-device evaluation).
  • The boundary decode runs only when a token carried an ASCII ./!/?, so the steady decode path
    stays on raw byte accumulation with a single decode at the end. Non-ASCII terminators (CJK 。!?)
    are not yet handled and fall through to the existing token-budget stop.

Greptile Summary

Adds sentence-boundary stopping to the constrained decoder (cotabbyConstrainedDecoderEnabled dev flag, default off). After each committed token, if its bytes contain an ASCII terminator (./!/?), the accumulated completion is decoded and SentenceBoundaryClassifier.endsSentence is consulted; the loop breaks with stopReason = \"sentence_boundary\" when a true sentence end is detected. The PR also refactors the pre-commit EOS/newline checks into a single preCommitStopReason helper to stay under the cyclomatic-complexity budget.

  • SentenceBoundaryClassifier.endsSentence walks back past trailing whitespace and closing punctuation (quotes, brackets) before checking the terminal character, correctly delegating period disambiguation to the existing isTerminalPeriod logic so decimals, initials, and abbreviations do not trigger early stops.
  • The constrained path is guarded by a developer flag and is not active in shipped builds, so there is no impact on production suggestion behaviour.

Confidence Score: 4/5

Safe to merge — the new logic is gated behind a default-off developer flag and does not touch the shipped sampler path.

The SentenceBoundaryClassifier changes are thoroughly unit-tested and the walk-back logic is correct. The main behavioural gap is in LlamaRuntimeCore: when a tokenizer emits a sentence terminator and a closing quotation mark in consecutive separate tokens, the sentence-boundary check fires after the terminator token and the closing quote is never appended, producing an unclosed quote. This is specific to tokenizer splitting behaviour and affects only the dev-flag constrained path, but it is a real output-quality issue worth addressing before the path is enabled by default.

The token-split edge case is isolated to LlamaRuntimeCore.swift around the completesSentence call site at line 381.

Important Files Changed

Filename Overview
Cotabby/Services/Runtime/LlamaRuntimeCore.swift Adds completesSentence, preCommitStopReason, and isSentenceTerminatorByte helpers to stop the constrained decode loop at sentence boundaries; refactors pre-commit EOS/newline checks into a single helper. Logic is correct but has a token-boundary edge case when . and a trailing closing quote arrive in separate tokens.
Cotabby/Support/SentenceBoundaryClassifier.swift Adds endsSentence and the isSentenceClosingPunctuation Character extension. Walk-back logic (whitespace → closing punctuation → terminator) is sound, and isTerminalPeriod delegation correctly handles decimals, initials, and abbreviations.
CotabbyTests/SentenceBoundaryClassifierTests.swift Adds 8 well-chosen endsSentence unit tests covering terminal vs. non-terminal periods, !/?, trailing whitespace, closing-punctuation walk-back, and empty string. All expected behaviours are exercised.

Sequence Diagram

sequenceDiagram
    participant DecodeLoop as "Decode Loop"
    participant PreCommit as "preCommitStopReason"
    participant Accept as "engine.acceptToken"
    participant CS as "completesSentence"
    participant SBC as "SentenceBoundaryClassifier"

    DecodeLoop->>PreCommit: tokenID, options, profile
    PreCommit-->>DecodeLoop: nil or eos/single_line (break)
    DecodeLoop->>Accept: acceptToken(sequenceID, tokenID)
    Accept-->>DecodeLoop: ok or error (break)
    DecodeLoop->>CS: completesSentence(generatedBytes, lastTokenBytes)
    CS->>CS: guard lastTokenBytes contains terminator byte
    CS->>SBC: endsSentence(String decoding generatedBytes)
    SBC->>SBC: skip trailing whitespace
    SBC->>SBC: skip closing punctuation
    SBC->>SBC: inspect last visible character
    SBC-->>CS: true or false
    CS-->>DecodeLoop: true means stopReason sentence_boundary and break
Loading

Fix All in Codex Fix All in Claude Code

Reviews (1): Last reviewed commit: "Stop the constrained decoder at sentence..." | Re-trigger Greptile

Greptile also left 2 inline comments on this PR.

The deterministic constrained decoder ran to its token budget even after finishing a sentence, so it could spill into the next one. After committing a token carrying an ASCII '.'/'!'/'?', it now decodes the accumulated bytes and stops when they end a sentence, reusing SentenceBoundaryClassifier so decimals, list numbers, single-letter initials, and abbreviations don't stop early.

Constrained path only (the default-off developer flag); the shipping sampler path is unchanged. Pre-commit stop checks were folded into one helper to keep the decode loop under the cyclomatic-complexity budget.
@FuJacob FuJacob merged commit 562bd98 into main Jun 2, 2026
4 checks passed
@FuJacob FuJacob deleted the feat/constrained-sentence-stop branch June 2, 2026 01:07
Comment on lines +381 to +384
if Self.completesSentence(generatedBytes, lastTokenBytes: tokenBytes) {
stopReason = "sentence_boundary"
break
}

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Token-split may produce unclosed quotation marks

completesSentence evaluates generatedBytes immediately after the token that carries ., !, or ? is appended. If the tokenizer emits a closing quote as a separate subsequent token (e.g. token N = "stop.", token N+1 = '"'), the check fires after token N when generatedBytes ends with stop.endsSentence returns true, the loop breaks, and the closing " is never appended. The walk-back logic in endsSentence only helps when the closing punctuation is already in the buffer. For "He said \"Good morning.\"" split across two tokens this would yield He said "Good morning. in the output. Consider deferring the sentence-end break by one additional token when the current check fires, to allow a potential closing-quote token to land first.

Fix in Codex Fix in Claude Code

Comment on lines +240 to +247
private static func completesSentence(_ generatedBytes: [UInt8], lastTokenBytes: [UInt8]) -> Bool {
guard lastTokenBytes.contains(where: isSentenceTerminatorByte) else {
return false
}
// swiftlint:disable:next optional_data_string_conversion
let decoded = String(decoding: generatedBytes, as: UTF8.self)
return SentenceBoundaryClassifier.endsSentence(decoded)
}

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Full-buffer decode on every terminator token

String(decoding: generatedBytes, as: UTF8.self) re-decodes the entire accumulated byte array every time a token carries a terminator byte. For the intended short completions under a small maxPredictionTokens budget this is negligible, but if the budget grows or a sentence contains multiple terminator bytes (e.g. U.S. followed by a real period), this is called multiple times and scales with total completion length. A suffix-only decode — e.g. keeping a rolling decodedSoFar: String updated incrementally, or only decoding the last ~50 bytes — would keep per-token cost constant.

Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!

Fix in Codex Fix in Claude Code

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant