feat(#187 AC#6 + AC#8): Smart/Smarter mode toggle + zero-cost helper#253
Merged
Pull request overview
Implements two user-facing pieces of the embedded-LLM rollout: (1) a Smart/Smarter mode toggle in the web Settings “AI brain” section that reorders the provider chain and auto-saves, and (2) a shared @skytwin/llm-client helper to estimate per-call LLM cost in integer cents (including a zero-cost invariant for local providers).
Changes:
- Add `estimateLlmCostCents()` / `isZeroCostProvider()` with vitest coverage in `@skytwin/llm-client`.
- Add Smart/Smarter mode UI + pure chain-manipulation helpers to `apps/web` Settings, including a blocked Smarter path that focuses the “Add a provider” control.
- Document both features in `CHANGELOG.md`.
Reviewed changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated 6 comments.
| File | Description |
|---|---|
| `packages/llm-client/src/index.ts` | Re-export cost helpers from the package entrypoint. |
| `packages/llm-client/src/cost.ts` | New cost estimation + zero-cost provider helper (rate table + rounding). |
| `packages/llm-client/src/__tests__/cost.test.ts` | New vitest suite covering cost estimation and zero-cost invariants. |
| `apps/web/public/js/pages/settings.js` | Adds Smart/Smarter toggle UI, helper functions, and auto-save mode switching. |
| `CHANGELOG.md` | Adds an unreleased entry describing the new toggle and cost helper. |
Comment on lines +26 to +37 in `packages/llm-client/src/cost.ts`

```ts
const RATE_DECICENTS_PER_M_TOKENS: Record<AIProviderName, { input: number; output: number }> = {
  // Anthropic — based on Claude 3.5 Haiku list price ($0.80/$4.00 per
  // 1M). Deci-cents/M: input 8, output 40.
  anthropic: { input: 8, output: 40 },
  // OpenAI — based on GPT-4o-mini list price ($0.15/$0.60 per 1M).
  // Deci-cents/M: input 1.5 → rounded up to 2 (we use integers and
  // round UP everywhere so the cap stays conservative), output 6.
  openai: { input: 2, output: 6 },
  // Google — based on Gemini 1.5 Flash list price ($0.075/$0.30 per
  // 1M for prompts <128k tokens). Deci-cents/M: input 1 (rounded up
  // from 0.75), output 3.
  google: { input: 1, output: 3 },
```
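The rate comments in this hunk hinge on one unit conversion: $1 = 100¢ = 1,000 deci-cents, so a list price in dollars per 1M tokens becomes deci-cents per 1M tokens by multiplying by 1,000 (the round-1 fix commit makes the same point: $0.80 = 80¢ = 800 deci-cents). A minimal sketch applying it to the list prices quoted above; `dollarsPerMToDeciCentsPerM` is a hypothetical helper, not part of `@skytwin/llm-client`:

```typescript
// Hypothetical helper (not in @skytwin/llm-client): convert a list price in
// US dollars per 1M tokens into integer deci-cents per 1M tokens.
// $1 = 100¢ = 1,000 deci-cents.
function dollarsPerMToDeciCentsPerM(dollarsPerM: number): number {
  // Decimal prices such as 0.075 are not exactly representable in binary,
  // so snap to 6 decimal places first, then round UP so a fractional
  // deci-cent never makes the table cheaper than list price.
  const deciCents = Math.round(dollarsPerM * 1_000 * 1e6) / 1e6;
  return Math.ceil(deciCents);
}

// List prices quoted in the hunk's comments.
const anthropic = { input: dollarsPerMToDeciCentsPerM(0.8), output: dollarsPerMToDeciCentsPerM(4.0) }; // 800 / 4000
const openai = { input: dollarsPerMToDeciCentsPerM(0.15), output: dollarsPerMToDeciCentsPerM(0.6) }; // 150 / 600
const google = { input: dollarsPerMToDeciCentsPerM(0.075), output: dollarsPerMToDeciCentsPerM(0.3) }; // 75 / 300
```

These are the values the round-1 commit moves the table to; the hunk's 8/40, 2/6, and 1/3 are each 100× too small.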
Comment on lines +27 to +46 in `packages/llm-client/src/__tests__/cost.test.ts`

```ts
    // 1M input + 1M output at 8/40 deci-cents per million → 4.8¢ → 5¢.
    expect(estimateLlmCostCents('anthropic', 1_000_000, 1_000_000)).toBe(5);
  });

  it('estimates openai cost in integer cents', () => {
    // 1M output at 6 deci-cents per million → 0.6¢ → 1¢ (rounded up).
    expect(estimateLlmCostCents('openai', 0, 1_000_000)).toBe(1);
  });

  it('estimates google cost in integer cents', () => {
    // 1M output at 3 deci-cents per million → 0.3¢ → 1¢ (rounded up).
    expect(estimateLlmCostCents('google', 0, 1_000_000)).toBe(1);
  });

  it('rounds up so the cap-enforcement direction is always safe', () => {
    // Anthropic at small token counts (10k input, 10k output) →
    // (8*10_000 + 40*10_000) / 1_000_000 = 0.48 deci-cents → ceil = 1 →
    // ceil(1/10) = 1¢. The exact float would be 0.048¢; rounding up
    // means cap checks see 1¢ instead of zeroing out tiny usage.
    expect(estimateLlmCostCents('anthropic', 10_000, 10_000)).toBe(1);
```
Comment on lines +1516 to +1523 in `apps/web/public/js/pages/settings.js`

```js
    await saveAIProviders(userId, _aiChain.map((p, i) => ({
      provider: p.provider,
      apiKey: p.apiKey || '',
      model: p.model,
      baseUrl: p.baseUrl,
      priority: i,
      enabled: p.enabled !== false,
    })));
```
Comment on lines +1510 to +1535 in `apps/web/public/js/pages/settings.js`

```js
  // Re-render the pill + provider chain optimistically so the click
  // produces an immediate visual change while the save round-trips.
  document.getElementById('ai-mode-toggle').innerHTML = renderModeToggle(_aiChain);
  document.getElementById('ai-provider-chain').innerHTML = renderProviderChain(_aiChain);

  try {
    await saveAIProviders(userId, _aiChain.map((p, i) => ({
      provider: p.provider,
      apiKey: p.apiKey || '',
      model: p.model,
      baseUrl: p.baseUrl,
      priority: i,
      enabled: p.enabled !== false,
    })));
    // Re-fetch from the server so the pill reflects the persisted state
    // (handles edge cases like an existing-but-disabled embedded entry
    // that was re-enabled by applySmartMode, where the server response
    // may carry extra fields not in our optimistic copy).
    const { renderSettings } = await import('./settings.js');
    await renderSettings(document.getElementById('page-content'), userId);
  } catch (err) {
    document.getElementById('page-content').insertAdjacentHTML(
      'afterbegin',
      `<div class="error-banner">Failed to switch mode: ${escapeHtml(err.message)}</div>`,
    );
  }
```
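The round-1 and round-2 commits change this handler in two ways: the previous chain is snapshotted and restored if the save fails, and the banner text is derived defensively so a non-`Error` rejection does not print `undefined`. A DOM-free sketch of that pattern; `ProviderEntry`, `switchModeOptimistically`, and the `render`/`save` callbacks are hypothetical stand-ins for `_aiChain`, the `innerHTML` re-render, and `saveAIProviders`, not code from `settings.js`:

```typescript
// Hypothetical stand-in for an entry in `_aiChain`.
interface ProviderEntry {
  provider: string;
  priority: number;
  enabled?: boolean;
}

// Readable message for any rejection value (Error, string, object, undefined).
function errorMessage(err: unknown): string {
  return err instanceof Error ? err.message : String(err);
}

// Optimistic update with rollback: render the new chain immediately, and if
// the save rejects, re-render the snapshot instead of leaving the unsaved
// order on screen under an error banner.
async function switchModeOptimistically(
  current: ProviderEntry[],
  next: ProviderEntry[],
  render: (chain: ProviderEntry[]) => void,
  save: (chain: ProviderEntry[]) => Promise<void>,
): Promise<{ chain: ProviderEntry[]; error?: string }> {
  const snapshot = current.map((p) => ({ ...p })); // taken before touching the UI
  render(next); // immediate visual feedback
  try {
    await save(next);
    return { chain: next };
  } catch (err) {
    render(snapshot); // roll back
    return { chain: snapshot, error: `Failed to switch mode: ${errorMessage(err)}` };
  }
}
```

Returning the error string rather than writing the banner directly keeps the sketch testable; in the page, the caller would insert it with `escapeHtml` as the hunk does.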
Comment on lines +1224 to +1357 in `apps/web/public/js/pages/settings.js`

```js
// #187 AC#6: providers that count as "Smarter" — i.e. external paid APIs
// the user is choosing to delegate the harder thinking to. `ollama` lives
// on a third rail: it's local like `embedded` but the user installed it
// themselves, so we treat it as Smarter too (the operator chose it
// deliberately and may have a beefier model than the embedded default).
const SMARTER_PROVIDERS = new Set(['anthropic', 'openai', 'google', 'ollama']);

/**
 * Determine the user's current AI mode from their provider chain.
 *
 *   'smart'   — top enabled provider is `embedded` (Smart mode default
 *               per #187 AC#6).
 *   'smarter' — top enabled provider is hosted / Ollama (BYO API path).
 *   'none'    — no enabled providers; the LlmClient will return null and
 *               callers fall back to local AI + built-in rules.
 *
 * Pure helper so the mode pill, the action handler, and any future audit
 * route all agree on one definition.
 */
export function detectAIMode(chain) {
  const enabled = chain.filter((p) => p.enabled !== false);
  if (enabled.length === 0) return 'none';
  const top = enabled.slice().sort((a, b) => (a.priority ?? 0) - (b.priority ?? 0))[0];
  if (!top) return 'none';
  if (top.provider === 'embedded') return 'smart';
  if (SMARTER_PROVIDERS.has(top.provider)) return 'smarter';
  return 'none';
}

/**
 * Reorder the chain so `embedded` is the top-priority enabled provider.
 * Adds an `embedded` entry if one doesn't exist yet so first-time-Smart
 * users get a working configuration in one click. Returns the new chain
 * (does not mutate the input).
 *
 * The embedded entry uses `'auto'` as the model so the runtime resolves
 * the first GGUF in the detected modelDir — matching the convention
 * `apps/web/public/js/components/embedded-llm-card.js` uses for fresh
 * installs.
 */
export function applySmartMode(chain) {
  const next = chain.map((p) => ({ ...p }));
  let embedded = next.find((p) => p.provider === 'embedded');
  if (!embedded) {
    embedded = {
      provider: 'embedded',
      model: 'auto',
      apiKey: '',
      baseUrl: undefined,
      priority: 0,
      enabled: true,
      hasApiKey: false,
      apiKeyPreview: '',
    };
    next.push(embedded);
  } else {
    embedded.enabled = true;
  }
  // Rebuild priorities so embedded is at 0 and the rest preserve their
  // relative order. This is the contract the API expects (priorities are
  // unique sequential integers).
  const others = next.filter((p) => p !== embedded);
  others.sort((a, b) => (a.priority ?? 0) - (b.priority ?? 0));
  return [embedded, ...others].map((p, i) => ({ ...p, priority: i }));
}

/**
 * Reorder the chain so the first non-embedded enabled provider becomes
 * top-priority. Returns null if there's nothing to switch to (caller
 * should surface a "configure a paid provider first" message).
 */
export function applySmarterMode(chain) {
  const next = chain.map((p) => ({ ...p }));
  next.sort((a, b) => (a.priority ?? 0) - (b.priority ?? 0));
  const smarterIdx = next.findIndex((p) => SMARTER_PROVIDERS.has(p.provider));
  if (smarterIdx === -1) return null;
  const target = next[smarterIdx];
  target.enabled = true;
  const others = next.filter((p) => p !== target);
  return [target, ...others].map((p, i) => ({ ...p, priority: i }));
}

// In-memory state for the current chain being edited
let _aiChain = [];

/**
 * #187 AC#6: render the Smart / Smarter mode pill above the provider
 * chain. Active mode is highlighted; clicking the inactive pill reorders
 * priorities and auto-saves.
 *
 * Disabled states (rendered as helper text under the inactive pill):
 *   - Switch-to-Smarter is disabled when no hosted/Ollama provider exists
 *     in the chain (we don't auto-add one because the user has to supply
 *     an API key).
 *   - Switch-to-Smart is always available — if no embedded entry exists
 *     yet, `applySmartMode` adds one with `model: 'auto'` so the runtime
 *     picks up the first GGUF in the detected model directory.
 */
function renderModeToggle(providers) {
  const mode = detectAIMode(providers);
  const hasSmarterCandidate = providers.some((p) => SMARTER_PROVIDERS.has(p.provider));

  const pill = (label, isActive, action, helperText) => `
    <div style="flex: 1; min-width: 0;">
      <button class="btn ${isActive ? 'btn-primary' : 'btn-outline'} btn-sm"
              style="width: 100%; padding: 0.5rem 0.75rem; font-size: 0.85rem;"
              data-action="${action}"
              ${isActive ? 'disabled' : ''}>
        ${isActive ? '✓ ' : ''}${label}${isActive ? '' : ' →'}
      </button>
      ${helperText ? `<div style="font-size: 0.7rem; color: var(--text-dim); margin-top: 0.25rem;">${helperText}</div>` : ''}
    </div>
  `;

  return `
    <div style="display: flex; gap: 0.5rem; margin-bottom: 0.75rem;">
      ${pill(
        'Smart (free, on-device)',
        mode === 'smart',
        'switch-to-smart',
        mode === 'smart'
          ? 'Embedded model is your top choice.'
          : 'No API costs, runs offline.',
      )}
      ${pill(
        'Smarter (paid API)',
        mode === 'smarter',
        hasSmarterCandidate ? 'switch-to-smarter' : 'switch-to-smarter-blocked',
        mode === 'smarter'
          ? 'Your paid provider is the top choice.'
          : hasSmarterCandidate
            ? 'Sharper reasoning on tricky calls.'
            : 'Add a paid provider below first.',
      )}
```
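To make the helper contracts concrete, here is a typed transcription of `detectAIMode` and `applySmarterMode` from the hunk, followed by a usage example. It is a sketch, not the shipped file: the `Provider`, `AIMode`, and `byPriority` names are added here. The example shows the behavior the round-2 commit documents: `applySmarterMode` scans the whole chain by priority, enabled or not, and force-enables the entry it promotes.

```typescript
type Provider = { provider: string; priority?: number; enabled?: boolean };
type AIMode = 'smart' | 'smarter' | 'none';

const SMARTER_PROVIDERS = new Set(['anthropic', 'openai', 'google', 'ollama']);

const byPriority = (a: Provider, b: Provider): number => (a.priority ?? 0) - (b.priority ?? 0);

function detectAIMode(chain: Provider[]): AIMode {
  // Only enabled entries count; the lowest priority number is the "top".
  const top = chain.filter((p) => p.enabled !== false).sort(byPriority)[0];
  if (!top) return 'none';
  if (top.provider === 'embedded') return 'smart';
  return SMARTER_PROVIDERS.has(top.provider) ? 'smarter' : 'none';
}

function applySmarterMode(chain: Provider[]): Provider[] | null {
  // Copy, then scan ALL entries by priority (enabled or not).
  const next = chain.map((p) => ({ ...p })).sort(byPriority);
  const target = next.find((p) => SMARTER_PROVIDERS.has(p.provider));
  if (!target) return null;
  target.enabled = true; // force-enable the promoted entry
  const others = next.filter((p) => p !== target);
  // Renumber so priorities stay unique and sequential.
  return [target, ...others].map((p, i) => ({ ...p, priority: i }));
}

// Usage: Anthropic is configured but disabled below the embedded model.
const chain: Provider[] = [
  { provider: 'embedded', priority: 0, enabled: true },
  { provider: 'anthropic', priority: 1, enabled: false },
];
const before = detectAIMode(chain); // 'smart'
const switched = applySmarterMode(chain); // anthropic promoted to priority 0 and enabled
const after = switched ? detectAIMode(switched) : 'none'; // 'smarter'
const blocked = applySmarterMode([{ provider: 'embedded', priority: 0 }]); // null
```

The `null` case is what the UI maps to `switch-to-smarter-blocked`.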
Comment on lines +3 to +6 in `CHANGELOG.md`

```markdown
## [unreleased] — Smart / Smarter mode toggle + zero-cost helper (#187 AC#6 + AC#8)

Two pieces close out the user-visible side of the embedded LLM story:
```
Comment on lines +28 to +37 in `packages/llm-client/src/cost.ts`

```ts
  // 1M). Deci-cents/M: input 8, output 40.
  anthropic: { input: 8, output: 40 },
  // OpenAI — based on GPT-4o-mini list price ($0.15/$0.60 per 1M).
  // Deci-cents/M: input 1.5 → rounded up to 2 (we use integers and
  // round UP everywhere so the cap stays conservative), output 6.
  openai: { input: 2, output: 6 },
  // Google — based on Gemini 1.5 Flash list price ($0.075/$0.30 per
  // 1M for prompts <128k tokens). Deci-cents/M: input 1 (rounded up
  // from 0.75), output 3.
  google: { input: 1, output: 3 },
```
Comment on lines +62 to +64 in `packages/llm-client/src/cost.ts`

```ts
  const rate = RATE_DECICENTS_PER_M_TOKENS[provider];
  if (!rate) return 0;
  // Deci-cents per million tokens × tokens / 1M = deci-cents.
```
Comment on lines +64 to +69 in `packages/llm-client/src/cost.ts`

```ts
  // Deci-cents per million tokens × tokens / 1M = deci-cents.
  // Cents = ceil(deci-cents / 10). Integer math throughout.
  const deciCents = (rate.input * tokensIn + rate.output * tokensOut);
  const tokenScale = 1_000_000;
  const scaledDeciCents = Math.ceil(deciCents / tokenScale);
  return Math.ceil(scaledDeciCents / 10);
```
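The hunk rounds twice: once from rate × tokens to whole deci-cents, then from deci-cents to cents. For non-negative integer inputs this equals a single ceiling over the combined divisor, because ⌈⌈x/a⌉/b⌉ = ⌈x/(ab)⌉ for positive integers a and b. A quick check of that identity; the two function names are illustrative only, and the sample values include the figures from the test hunk (480,000 for 10k + 10k tokens and 48,000,000 for 1M + 1M at the 8/40 rates):

```typescript
// Two-step rounding as written in the hunk: token scale first, then cents.
function twoStepCents(rateTimesTokens: number): number {
  const scaledDeciCents = Math.ceil(rateTimesTokens / 1_000_000);
  return Math.ceil(scaledDeciCents / 10);
}

// Single ceiling over the combined divisor (1,000,000 tokens × 10 deci-cents per cent).
function oneStepCents(rateTimesTokens: number): number {
  return Math.ceil(rateTimesTokens / 10_000_000);
}

// Edge cases around the cent boundary plus the test-hunk figures.
const samples = [0, 1, 480_000, 9_999_999, 10_000_000, 10_000_001, 48_000_000];
for (const x of samples) {
  console.log(x, twoStepCents(x), oneStepCents(x));
}
```

Either form rounds up, so the choice is readability; note that the hunk's `deciCents` variable actually holds deci-cents × tokens until it is divided by `tokenScale`.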
Comment on lines +1290 to +1302 in `apps/web/public/js/pages/settings.js`

```js
/**
 * Reorder the chain so the first non-embedded enabled provider becomes
 * top-priority. Returns null if there's nothing to switch to (caller
 * should surface a "configure a paid provider first" message).
 */
export function applySmarterMode(chain) {
  const next = chain.map((p) => ({ ...p }));
  next.sort((a, b) => (a.priority ?? 0) - (b.priority ?? 0));
  const smarterIdx = next.findIndex((p) => SMARTER_PROVIDERS.has(p.provider));
  if (smarterIdx === -1) return null;
  const target = next[smarterIdx];
  target.enabled = true;
  const others = next.filter((p) => p !== target);
```
Comment on lines +1349 to +1356 in `apps/web/public/js/pages/settings.js`

```js
        'Smarter (paid API)',
        mode === 'smarter',
        hasSmarterCandidate ? 'switch-to-smarter' : 'switch-to-smarter-blocked',
        mode === 'smarter'
          ? 'Your paid provider is the top choice.'
          : hasSmarterCandidate
            ? 'Sharper reasoning on tricky calls.'
            : 'Add a paid provider below first.',
```
Comment on lines +1530 to +1534 in `apps/web/public/js/pages/settings.js`

```js
  } catch (err) {
    document.getElementById('page-content').insertAdjacentHTML(
      'afterbegin',
      `<div class="error-banner">Failed to switch mode: ${escapeHtml(err.message)}</div>`,
    );
```
Comment on lines +27 to +46 in `packages/llm-client/src/__tests__/cost.test.ts`

```ts
    // 1M input + 1M output at 8/40 deci-cents per million → 4.8¢ → 5¢.
    expect(estimateLlmCostCents('anthropic', 1_000_000, 1_000_000)).toBe(5);
  });

  it('estimates openai cost in integer cents', () => {
    // 1M output at 6 deci-cents per million → 0.6¢ → 1¢ (rounded up).
    expect(estimateLlmCostCents('openai', 0, 1_000_000)).toBe(1);
  });

  it('estimates google cost in integer cents', () => {
    // 1M output at 3 deci-cents per million → 0.3¢ → 1¢ (rounded up).
    expect(estimateLlmCostCents('google', 0, 1_000_000)).toBe(1);
  });

  it('rounds up so the cap-enforcement direction is always safe', () => {
    // Anthropic at small token counts (10k input, 10k output) →
    // (8*10_000 + 40*10_000) / 1_000_000 = 0.48 deci-cents → ceil = 1 →
    // ceil(1/10) = 1¢. The exact float would be 0.048¢; rounding up
    // means cap checks see 1¢ instead of zeroing out tiny usage.
    expect(estimateLlmCostCents('anthropic', 10_000, 10_000)).toBe(1);
```
jayzalowitz added a commit that referenced this pull request May 11, 2026

…UT /ai, rollback optimistic state on save failure, copy fix for Ollama

Copilot's review of PR #253 caught four substantive issues:

1. Cost rate table off by 100×. The original draft stored `{ input: 8, output: 40 }` for Anthropic and called the unit "deci-cents per 1M", but $0.80 = 80¢ = 800 deci-cents, so the table needed 800/4000. Same conversion error on OpenAI (now 150/600) and Google (now 75/300). Tests updated to pin the expected dollar-equivalent cents outputs ($4.80 for 1M+1M Anthropic, $0.60 OpenAI, $0.30 Google) so the regression can't recur. Embedded + Ollama still return 0¢ at any volume.
2. `PUT /api/settings/:userId/ai` rejected `embedded` as an invalid provider. The Smart mode toggle inserts an `embedded` entry via `applySmartMode`; the pre-existing validation set didn't include it, so the round-trip 400'd at the API. Added `embedded` to `validProviders`.
3. `switchAIBrainMode` now snapshots the previous chain before the optimistic re-render and rolls back on save failure. Previously the UI would show the reordered state with only an error banner on top, visually implying success when the server rejected.
4. Smarter pill copy now says "paid API or Ollama". `SMARTER_PROVIDERS` includes Ollama (local, free), and the earlier copy would have confused Ollama users into thinking the option didn't apply.

Test plan: llm-client 137/137 (was 136, +1 new rounding-up regression test); api 544/544. Build clean.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
jayzalowitz added a commit that referenced this pull request May 12, 2026

…low, fix smarter-mode doc + err-message defensiveness

Four substantive Copilot round-2 findings on PR #253 addressed:

1. `estimateLlmCostCents` now THROWS when given a provider name not in the rate table, instead of silently returning 0. The `AIProviderName` type is the source of truth at compile time; this adds a runtime guard for paths that cast a DB string. Silent-zero would have hidden a real data/config bug as fake-free usage.
2. Token-count safe-integer guard: throws on non-finite, negative, or > 2e12 token counts. Prevents IEEE-754 rounding from silently producing a wrong cents value if an untrusted aggregator passes a bogus number. 2e12 is far beyond any real prompt and well below the 2^53 safe-integer ceiling.
3. `applySmarterMode` docstring corrected to match behavior: it scans the full chain by priority regardless of enabled state, then force-enables the chosen entry. The prior doc claimed "first non-embedded *enabled* provider", which implied a filter we never applied.
4. `switchAIBrainMode` catch block now uses the `err instanceof Error ? err.message : String(err)` defensive pattern. A non-Error rejection (string, object, undefined) would otherwise produce "Failed to switch mode: undefined" on the banner.

3 new test cases cover the throw paths (unknown provider, bad token counts, overflow). llm-client 140/140 green; api builds clean.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
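Taken together, the two review rounds describe a helper roughly like the following. This is a reconstruction from the commit messages for illustration, not the merged `cost.ts`. The corrected rates (800/4000, 150/600, 75/300 deci-cents per 1M tokens), the throw on unknown providers, and the 2e12 token guard come from the commits. The local `AIProviderName` union, the error messages, and returning 0 for zero-cost providers before validating token counts (which keeps the "zero at any volume" invariant) are assumptions.

```typescript
// Reconstruction for illustration; not the merged cost.ts.
// Local stand-in for the package's AIProviderName type (members assumed).
type AIProviderName = 'anthropic' | 'openai' | 'google' | 'ollama' | 'embedded';

// Deci-cents per 1M tokens ($1 = 1,000 deci-cents), corrected in round 1.
const RATE_DECICENTS_PER_M_TOKENS: Record<AIProviderName, { input: number; output: number }> = {
  anthropic: { input: 800, output: 4000 }, // $0.80 / $4.00 per 1M
  openai: { input: 150, output: 600 }, // $0.15 / $0.60 per 1M
  google: { input: 75, output: 300 }, // $0.075 / $0.30 per 1M
  ollama: { input: 0, output: 0 }, // local runtime
  embedded: { input: 0, output: 0 }, // local runtime
};

// Round-2 guard on token counts.
const MAX_TOKENS = 2e12;

function rateFor(provider: string): { input: number; output: number } {
  // hasOwnProperty rather than `in`, so inherited keys like "toString" are rejected.
  if (!Object.prototype.hasOwnProperty.call(RATE_DECICENTS_PER_M_TOKENS, provider)) {
    throw new Error(`No cost rate for LLM provider "${provider}"`);
  }
  return RATE_DECICENTS_PER_M_TOKENS[provider as AIProviderName];
}

function isZeroCostProvider(provider: string): boolean {
  return provider === 'embedded' || provider === 'ollama';
}

function assertTokenCount(label: string, n: number): void {
  if (!Number.isFinite(n) || n < 0 || n > MAX_TOKENS) {
    throw new RangeError(`${label} must be finite and within [0, ${MAX_TOKENS}], got ${n}`);
  }
}

function estimateLlmCostCents(provider: string, tokensIn: number, tokensOut: number): number {
  const rate = rateFor(provider); // throws for providers missing from the table
  if (rate.input === 0 && rate.output === 0) return 0; // zero at any volume
  assertTokenCount('tokensIn', tokensIn);
  assertTokenCount('tokensOut', tokensOut);
  // Each product stays below 2^53 under the 2e12 cap.
  // deci-cents = product / 1e6; cents = ceil(deci-cents / 10), done as one division.
  const rateTimesTokens = rate.input * tokensIn + rate.output * tokensOut;
  return Math.ceil(rateTimesTokens / 10_000_000);
}

estimateLlmCostCents('anthropic', 1_000_000, 1_000_000); // 480 ($4.80)
estimateLlmCostCents('openai', 0, 1_000_000); // 60 ($0.60)
estimateLlmCostCents('embedded', Number.MAX_SAFE_INTEGER, 1); // 0
```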
Owner · Author

Round-2 review reply:
Already fixed in round 1 (efc0822):
Round-2 findings fixed in 0d20331:
Closes the user-visible side of the embedded LLM story.

AC#6: Settings → AI brain now leads with a two-pill Smart/Smarter mode toggle. Active mode is computed from the user's provider chain (top-priority enabled = embedded → Smart; hosted/Ollama → Smarter). Clicking the inactive pill reorders priorities and auto-saves through `PUT /api/settings/:userId/ai`.

- Switching to Smart adds an `embedded` entry with `model: 'auto'` if the chain doesn't have one, so first-time-Smart users get a working configuration in one click.
- Switching to Smarter when no paid provider exists routes to a `switch-to-smarter-blocked` action that focuses the "+ Add a provider…" dropdown rather than failing silently.

Pure helpers (`detectAIMode`, `applySmartMode`, `applySmarterMode`) are factored out at the top of `settings.js` with module exports so the mode pill, the action handler, and any future audit route all agree on one definition.

AC#8: New `estimateLlmCostCents(provider, tokensIn, tokensOut)` helper in `@skytwin/llm-client`. Per-provider rate table for hosted APIs; absolute zero for `embedded` and `ollama`. Rounds up to the nearest cent so spend-cap enforcement stays conservative: the failure direction is "approval required," never "silently past the cap." Also exports `isZeroCostProvider(provider)` for callers that want to render a free badge.

The future spend-recording call site can compute `estimateLlmCostCents(response.provider, tokensIn, tokensOut)` and trust local-runtime calls to record zero, with no embedded-special-case branch needed at the recording site.

Test plan: 10 new vitest cases for `cost.ts` (load-bearing one: embedded/ollama return 0 regardless of token volume). 16 cases smoke-tested for the JS helpers via Node ESM import. Toggle visually verified in Chrome across three provider-chain scenarios (Smart-active, Smarter-active, no-paid-provider). Full llm-client suite: 136/136 green.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
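Why rounding up is the safe direction for a spend cap, as a minimal sketch. `checkSpendCap` and its `'allowed' | 'approval-required'` result are hypothetical; no such call site exists yet. The point is that a fractional-cent call rounded down to 0¢ slips past a cap that is already fully used, while rounding up forces approval.

```typescript
// Hypothetical cap check; not an API in @skytwin/llm-client.
type CapDecision = 'allowed' | 'approval-required';

function checkSpendCap(spentCents: number, estimatedCents: number, capCents: number): CapDecision {
  // Integer cents on both sides: any call that would push spend past the
  // cap needs approval.
  return spentCents + estimatedCents > capCents ? 'approval-required' : 'allowed';
}

// A call whose exact cost is a fraction of a cent, against a fully used 100¢ cap.
const exactCents = 0.048;
const roundedDown = checkSpendCap(100, Math.floor(exactCents), 100); // 'allowed' (silently past the cap)
const roundedUp = checkSpendCap(100, Math.ceil(exactCents), 100); // 'approval-required'
```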
Branch updated from `0d20331` to `3505fe5`.
Summary
Closes the user-visible side of the embedded LLM story. Two pieces:
- `estimateLlmCostCents()` helper in `@skytwin/llm-client` that returns 0 for `embedded`/`ollama` and a per-provider rate for hosted APIs. The future spend-recording call site can drop this in without an embedded-special-case branch.

Together these take #187 from 5/8 → 7/8 closed. Remaining: AC#1 bundling (distribution work paired with #188) and AC#4 (Piper TTS, blocked on `piper` on PATH).

AC#6: Smart/Smarter mode toggle

`detectAIMode(chain)` → `'smart' | 'smarter' | 'none'` based on the top-priority enabled provider:
- `embedded` at top → Smart
- `anthropic`/`openai`/`google`/`ollama` at top → Smarter

Click the inactive pill → priorities reorder + auto-save:
- `applySmartMode(chain)` promotes `embedded` to priority 0 (adds a fresh `embedded` entry with `model: 'auto'` if missing, re-enables a disabled one).
- `applySmarterMode(chain)` promotes the first hosted/Ollama entry to priority 0. Returns `null` if no candidate exists; in that case the action routes to `switch-to-smarter-blocked`, which focuses the "+ Add a provider…" dropdown so the user's eye is drawn to the next step instead of failing silently.

After the save round-trip, `renderSettings` re-runs so the pill and the provider chain agree on the persisted state (handles cases like server-side normalization adding fields the optimistic copy didn't have).

Pure helpers are exported from `apps/web/public/js/pages/settings.js` so the pill, the action handler, and any future audit route all agree on one definition. They use the same module-level `export function` style as the other helpers in this file.

Screenshot (Smart active, Anthropic configured as fallback)

`✓ Smart (free, on-device)` is highlighted; `Smarter (paid API) →` is the call-to-action; helper text under each pill explains what it does.

AC#8: `estimateLlmCostCents()` helper

`packages/llm-client/src/cost.ts` exports `estimateLlmCostCents()` and `isZeroCostProvider()`.

The rate table is keyed by provider, in deci-cents per million tokens, at the list price of the cheapest model we expose in `PROVIDER_MODELS` for each family. Local-runtime providers (`embedded`, `ollama`) carry zero per-token cost; that's the load-bearing piece of AC#8.

Rounding is `ceil` everywhere so spend-cap enforcement stays conservative: the failure direction is "approval required," never "silently past the cap."

No call site is wired yet. The current spend-recording path (`spendRepository.create`) isn't called from any LLM code path today, so AC#8 is trivially satisfied at runtime. This PR adds the helper as a single source of truth so that when LLM-cost recording does land (separate issue), it can compute `estimateLlmCostCents(response.provider, tokensIn, tokensOut)` and trust that local-runtime calls record zero.

Test plan
- 10 new vitest cases for `cost.ts`: `embedded` returns 0 for any token volume (incl. `Number.MAX_SAFE_INTEGER`); `ollama` returns 0; `isZeroCostProvider` distinguishes local-runtime from hosted.
- 16 cases smoke-tested for the JS helpers via Node ESM import (`detectAIMode`, `applySmartMode`, `applySmarterMode` across empty / embedded-top / anthropic-top / ollama-top / all-disabled / lower-priority-embedded / skip-disabled scenarios + null-return when no candidate).
- Toggle visually verified in Chrome at `http://localhost:3201/#/settings` across three scenarios (Smart-active, Smarter-active, no-paid-provider → `switch-to-smarter-blocked`).
- `pnpm --filter @skytwin/llm-client build` clean.
- `pnpm --filter @skytwin/llm-client test`: 136/136 green.

Notes for reviewers
- `applySmartMode` adding an embedded entry with `model: 'auto'` mirrors what `embedded-llm-card.js` uses for fresh installs: the runtime resolves the first GGUF in the detected modelDir.
- … `embedded === 0 AND ollama === 0` as an invariant.
- … `apps/web`, so the helpers were smoke-tested via Node ESM import. If/when Playwright lands, the toggle's three-scenario behavior deserves an E2E.

🤖 Generated with Claude Code