v0.6.3.0 refactor(llm-client): multi-turn ChatMessage[] API (#149) #152

Merged: jayzalowitz merged 1 commit into main from jayzalowitz/llm-multiturn on May 5, 2026

@jayzalowitz (Owner)

Summary

Closes #149. `LlmClient.generate` and `generateStream` now accept `string | ChatMessage[]`. Each provider translates the array to its native chat-completion shape. The `User:` / `Assistant:` prompt-flattening workaround in `@skytwin/assistant` is gone — `reply()` and `replyStream()` pass the conversation history directly.

Pure refactor — no new user-visible feature — but unblocks #148 (action-intent routing) and removes a comment-laden workaround that's been load-bearing since assistant phase 1.

Backward-compatible: existing string callers (decision-engine's LLM strategies, every provider integration test, anything outside this monorepo importing `@skytwin/llm-client`) work unchanged.

What landed

`@skytwin/llm-client` public API

```ts
type ChatMessage = { role: 'system' | 'user' | 'assistant'; content: string };

LlmClient.generate(prompt: string | ChatMessage[], options?): Promise
LlmClient.generateStream(prompt: string | ChatMessage[], options?): AsyncIterable

// Helpers (also re-exported)
toMessages(input: string | ChatMessage[]): ChatMessage[]
splitSystemAndConversation(messages, fallbackSystem?): { system, conversation }
```
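
As a minimal sketch of the normalization step, assuming `toMessages` wraps a bare string in a single user turn (as the commit message describes) and returns a shallow copy of an array input (the copy is an assumption, not confirmed by the source):

```ts
// Hypothetical re-implementation for illustration; the real helper in
// @skytwin/llm-client may differ in detail.
type ChatMessage = { role: 'system' | 'user' | 'assistant'; content: string };

// Normalize either input form to a message array.
// A bare string becomes one user turn; an array is shallow-copied (assumption).
function toMessages(input: string | ChatMessage[]): ChatMessage[] {
  return typeof input === 'string'
    ? [{ role: 'user', content: input }]
    : [...input];
}

// Legacy string callers and new array callers converge on the same shape.
const fromString = toMessages('What is on my calendar today?');
const fromArray = toMessages([
  { role: 'user', content: 'What is on my calendar today?' },
  { role: 'assistant', content: 'You have two meetings.' },
  { role: 'user', content: 'Move the first one to 3pm.' },
]);

console.log(fromString.length, fromArray.length);
```

Because every provider can call a helper like this first, the string path and the array path share one translation routine per provider, which is what keeps existing string callers working unchanged.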

Provider translations

| Provider | Before | After |
| --- | --- | --- |
| Anthropic | `messages: [{role: 'user', content: }]` + `system` top-level | Pass-through `messages` array; system-role messages hoisted to the `system` field |
| OpenAI | Hardcoded system + user pair | Pass-through `messages`; falls back to `options.systemPrompt` only when no inline system message |
| Google/Gemini | Fake `user: ` + `model: "Understood."` pair to emulate system | Native `system_instruction` field; assistant role correctly translated to `'model'`. Saves tokens AND removes a drift hazard |
| Ollama | `/api/generate` with `systemPrompt + "\n\n" + prompt` flattened | `/api/chat` with native `messages` array. Both endpoints exist on every modern Ollama server. Response parsing moved from `{response}` to `{message: {content}}` |
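
The Gemini row is the translation with the most moving parts, so here is a hedged sketch of what it could look like. `toGeminiBody` and the interface names are hypothetical, and joining several inline system messages with blank lines is an assumption; only `system_instruction`, `contents`, `parts`, and the `'model'` role come from the Gemini REST request format.

```ts
type ChatMessage = { role: 'system' | 'user' | 'assistant'; content: string };

// Only the fields relevant to this translation are modelled here.
interface GeminiContent {
  role: 'user' | 'model';
  parts: { text: string }[];
}
interface GeminiRequestBody {
  system_instruction?: { parts: { text: string }[] };
  contents: GeminiContent[];
}

// Hypothetical translator: system turns go to system_instruction,
// everything else becomes a `contents` entry with assistant -> model.
function toGeminiBody(messages: ChatMessage[], fallbackSystem?: string): GeminiRequestBody {
  const inlineSystem = messages.filter((m) => m.role === 'system').map((m) => m.content);
  // Assumption: multiple inline system messages are joined with blank lines.
  const system = inlineSystem.length > 0 ? inlineSystem.join('\n\n') : fallbackSystem;

  const contents: GeminiContent[] = [];
  for (const m of messages) {
    if (m.role === 'system') continue;
    contents.push({
      role: m.role === 'assistant' ? 'model' : 'user',
      parts: [{ text: m.content }],
    });
  }

  const body: GeminiRequestBody = { contents };
  if (system) body.system_instruction = { parts: [{ text: system }] };
  return body;
}

const geminiBody = toGeminiBody([
  { role: 'system', content: 'Context: the user prefers short answers.' },
  { role: 'user', content: 'Hi' },
  { role: 'assistant', content: 'Hello!' },
  { role: 'user', content: 'What can you do?' },
]);
console.log(JSON.stringify(geminiBody, null, 2));
```

Nothing in the resulting `contents` array is synthetic: the old `"Understood."` acknowledgement turn has no counterpart here, which is where the token saving comes from.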

`AssistantService` cleanup

  • `reply()` and `replyStream()` now pass the trimmed `ChatTurn[]` directly to `LlmClient.generate` / `generateStream`. `ChatTurn` and `ChatMessage` are structurally identical, so the change is just dropping the `formatHistoryAsPrompt` call.
  • New private `composeSystemPrompt(enrichment?)` helper shared between `reply()` and `replyStream()` so the two paths cannot drift on the prepend-context-block step.
  • `formatHistoryAsPrompt` stays exported for back-compat (no known external callers; plan to remove on the next major bump).
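
A rough sketch of why the cleanup is small, under stated assumptions: `ChatTurn` is declared here with the same shape as `ChatMessage`, the enrichment is modelled as a plain string, and `BASE_SYSTEM_PROMPT` and `buildMessages` are hypothetical names that do not appear in the PR.

```ts
type ChatMessage = { role: 'system' | 'user' | 'assistant'; content: string };
// Assumed shape: the PR only states that ChatTurn and ChatMessage are
// structurally identical.
type ChatTurn = { role: 'system' | 'user' | 'assistant'; content: string };

// Hypothetical default prompt, for illustration only.
const BASE_SYSTEM_PROMPT = 'You are a helpful assistant.';

// Single place where the context block is prepended, so the sync and
// streaming paths cannot diverge on this step.
function composeSystemPrompt(enrichment?: string): string {
  return enrichment ? `${enrichment}\n\n${BASE_SYSTEM_PROMPT}` : BASE_SYSTEM_PROMPT;
}

// Hypothetical shared builder: both reply() and replyStream() would call this
// and hand the result straight to the LLM client.
function buildMessages(history: ChatTurn[], enrichment?: string): ChatMessage[] {
  // ChatTurn[] spreads into ChatMessage[] with no conversion because
  // TypeScript compares the two types structurally.
  return [{ role: 'system', content: composeSystemPrompt(enrichment) }, ...history];
}

const history: ChatTurn[] = [
  { role: 'user', content: 'Remind me what we discussed.' },
  { role: 'assistant', content: 'We talked about your travel plans.' },
  { role: 'user', content: 'Book the earlier flight.' },
];

const enriched = buildMessages(history, 'Twin profile: prefers morning flights.');
const plain = buildMessages(history);
console.log(enriched[0], plain[0]);
```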

Inline-system precedence

`splitSystemAndConversation` makes inline system messages WIN over `options.systemPrompt`. This matters: the assistant package injects its enrichment context (twin profile + memories) as a system turn at the head of the array; the route also passes the default system prompt via `options.systemPrompt`. Without this precedence, generic instructions would silently override the personalized context — exactly the regression Phase 2b was supposed to prevent.
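
The precedence rule can be illustrated with a small sketch. This is a hypothetical re-implementation, not the package source; in particular, joining several inline system messages with blank lines is an assumption.

```ts
type ChatMessage = { role: 'system' | 'user' | 'assistant'; content: string };

// Hypothetical sketch: peel system messages off the array; use the fallback
// (e.g. options.systemPrompt) only when the array carries none.
function splitSystemAndConversation(
  messages: ChatMessage[],
  fallbackSystem?: string,
): { system: string | undefined; conversation: ChatMessage[] } {
  const inline = messages.filter((m) => m.role === 'system').map((m) => m.content);
  const conversation = messages.filter((m) => m.role !== 'system');
  // Inline system content wins; the fallback is never merged in alongside it.
  const system = inline.length > 0 ? inline.join('\n\n') : fallbackSystem;
  return { system, conversation };
}

// Inline context present: the generic fallback is ignored.
const withInline = splitSystemAndConversation(
  [
    { role: 'system', content: 'Twin profile + memories' },
    { role: 'user', content: 'What should I prioritise today?' },
  ],
  'Generic instructions',
);

// No inline system turn: the fallback is used.
const withoutInline = splitSystemAndConversation(
  [{ role: 'user', content: 'What should I prioritise today?' }],
  'Generic instructions',
);

console.log(withInline.system, '|', withoutInline.system);
```

With this rule, a route that always passes a default `options.systemPrompt` cannot accidentally displace the personalized system turn the assistant placed at the head of the array.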

Safety / invariants

  • API surface stable — string callers untouched; the new array path is opt-in.
  • Inline system precedence asserted in tests for both Anthropic + OpenAI.
  • Provider response parsing verified per-provider in the new tests.
  • Decision-engine LLM strategies (LlmCandidateGenerator, LlmSituationStrategy) NOT migrated — their PromptBuilder-built strings work as-is, and changing them would conflate this refactor with their own re-shaping. Tracked separately if it ever becomes valuable.

Test plan

  • `pnpm build` — 20 packages green
  • `pnpm test` — 40 packages green; 28 new tests + 1 updated
  • `pnpm lint` — clean
  • Manual: configure each provider in turn, send a multi-turn assistant conversation, verify the model responds with conversational coherence (vs. treating each message as standalone)
  • Manual: pull the worker log, confirm the LLM request bodies have the right shape per-provider (no fake `Understood.` from Gemini, no `User:` / `Assistant:` flattening from Anthropic)
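
The request-shape checks above could also be expressed as code. The following is a sketch for the Ollama case only, split into a pure request builder and a pure response parser so both can be asserted on without a running server. `buildOllamaChatRequest` and `parseOllamaChatResponse` are hypothetical names; the `/api/chat` path, the `messages` array, and the `{message: {content}}` response shape are the parts taken from the PR.

```ts
type ChatMessage = { role: 'system' | 'user' | 'assistant'; content: string };

interface OllamaChatRequest {
  url: string;
  body: { model: string; messages: ChatMessage[]; stream: boolean };
}

// Build the non-streaming chat request. Ollama's chat endpoint accepts
// system/user/assistant roles, so the messages pass through untouched.
function buildOllamaChatRequest(
  baseUrl: string,
  model: string,
  messages: ChatMessage[],
): OllamaChatRequest {
  return {
    url: `${baseUrl.replace(/\/+$/, '')}/api/chat`,
    body: { model, messages, stream: false },
  };
}

// Parse the new response shape: {message: {content}} instead of {response}.
function parseOllamaChatResponse(json: unknown): string {
  const content = (json as { message?: { content?: unknown } } | null)?.message?.content;
  if (typeof content !== 'string') {
    throw new Error('Unexpected Ollama /api/chat response shape');
  }
  return content;
}

const request = buildOllamaChatRequest('http://localhost:11434/', 'example-model', [
  { role: 'system', content: 'Be brief.' },
  { role: 'user', content: 'Hello' },
]);
const reply = parseOllamaChatResponse({ message: { role: 'assistant', content: 'Hi there.' } });
console.log(request.url, reply);
```

Sending the request is then a single `fetch(request.url, { method: 'POST', body: JSON.stringify(request.body) })` call, and a response in the old `{response}` shape fails loudly in the parser instead of producing an empty reply.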

Phase 2 — last remaining work

🤖 Generated with Claude Code

Closes #149. LlmClient.generate and generateStream now accept
string | ChatMessage[]. Each provider translates the array to its
native chat-completion shape:

- Anthropic: pass-through messages, system messages hoisted to top-level
- OpenAI: pass-through messages, no-duplicate-system guard
- Google/Gemini: native system_instruction (no fake user/model pair),
  assistant role correctly translated to 'model'
- Ollama: switched from /api/generate (concat prompt) to /api/chat
  (native messages array); response shape moved from {response} to
  {message: {content}}

AssistantService.reply and replyStream now pass ChatTurn[] directly
to LlmClient — formatHistoryAsPrompt is no longer in the main path.
Kept exported as legacy for back-compat. New private composeSystemPrompt
helper shared between sync + streaming paths so they cannot drift.

New @skytwin/llm-client public API:
- ChatMessage type
- toMessages helper (string -> [{role:user, content}])
- splitSystemAndConversation (peels system messages, fallback to
  options.systemPrompt when none inline; inline wins so the assistant
  context block is never overridden)

Backward-compatible: existing string callers (decision-engine LLM
strategies, every provider integration test) work unchanged.

Tests: 28 new (including 7 helper tests and 15 per-provider request-body
shape tests), plus 1 updated (the assistant history-cap test). Full suite
green across 40 packages; lint clean.

Unblocks #148 (action-intent routing) — the intent classifier can now
look at structured turns instead of regexing a flattened prompt.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings May 5, 2026 19:46

Copilot AI left a comment

Pull request overview

Refactors @skytwin/llm-client to support native multi-turn chat by allowing LlmClient.generate() / generateStream() to accept string | ChatMessage[], updating each provider to translate message arrays into its native wire format, and updating @skytwin/assistant to pass trimmed conversation history directly (dropping prompt-flattening in the main path).

Changes:

  • Extend LlmClient + provider function signatures to accept string | ChatMessage[], with shared helpers (toMessages, splitSystemAndConversation) and new translation tests.
  • Update providers (Anthropic/OpenAI/Gemini/Ollama) to use message-array-native request bodies, including system-message precedence handling.
  • Update AssistantService.reply() / replyStream() to pass ChatTurn[] directly and centralize system prompt composition.

Reviewed changes

Copilot reviewed 14 out of 14 changed files in this pull request and generated 4 comments.

| File | Description |
| --- | --- |
| VERSION | Bumps version to 0.6.3.0. |
| CHANGELOG.md | Documents the multi-turn API refactor and provider behavior changes. |
| packages/llm-client/src/types.ts | Adds ChatMessage + updates provider function types to accept multi-turn input. |
| packages/llm-client/src/messages.ts | Adds helpers to normalize prompts and split system vs conversation messages. |
| packages/llm-client/src/llm-client.ts | Updates generate/generateStream signatures and preserves provider-chain behavior. |
| packages/llm-client/src/index.ts | Re-exports ChatMessage and message helpers from package root. |
| packages/llm-client/src/providers/anthropic.ts | Hoists system messages to top-level system and supports message-array prompts for sync/stream. |
| packages/llm-client/src/providers/openai.ts | Uses native messages array with inline-system precedence over options.systemPrompt. |
| packages/llm-client/src/providers/google.ts | Translates to Gemini contents + system_instruction and maps assistant → model role. |
| packages/llm-client/src/providers/ollama.ts | Switches from /api/generate to /api/chat and updates response parsing. |
| packages/llm-client/src/tests/messages.test.ts | Adds unit tests for toMessages and splitSystemAndConversation. |
| packages/llm-client/src/tests/provider-multiturn.test.ts | Adds provider request-shape assertions for multi-turn translation. |
| packages/assistant/src/assistant-service.ts | Passes ChatTurn[] directly to LlmClient and dedupes system prompt composition. |
| packages/assistant/src/tests/assistant-service.test.ts | Updates history-cap test to assert on message arrays rather than flattened prompts. |

Comment on lines +44 to +47
export interface ChatMessage {
role: 'system' | 'user' | 'assistant';
content: string;
}
Comment on lines +11 to +16
* Anthropic takes `system` as a top-level field separate from the
* `messages` array, so we split system messages out of the conversation
* — see `splitSystemAndConversation`. Adjacent same-role messages are
* NOT merged here (Anthropic accepts them) but the API rejects empty
* conversations, so we always have at least one message after the split.
*/
const { system, conversation } = splitSystemAndConversation(
toMessages(prompt),
options.systemPrompt,
);
Comment on lines +40 to +42
* native top-level `system` field). When both are present, the array
* system messages take precedence — the assistant injects context as a
* system turn at the head of the array.
@jayzalowitz merged commit 334f497 into main on May 5, 2026 (12 checks passed)

Development

Successfully merging this pull request may close these issues.

Assistant phase 3: multi-turn LlmClient API + drop prompt-flattening workaround
