feat(runtime-api): daemon API quartet for whalescale (#561 #562 #563 #564) #567
…564)

Bridge work to unblock whalescale-desktop's Settings/Composer/Archived-chats flows without requiring a daemon recompile per dev-port or client-side aggregation.

#561 / whalescale#255: CORS allow-list configurable

* Add `[runtime_api] cors_origins` config field, `--cors-origin URL` (repeatable) flag on `deepseek serve --http`, and `DEEPSEEK_CORS_ORIGINS` env var. User entries stack on top of the built-in defaults (localhost:3000, localhost:1420, tauri://localhost). Resolution preserves first-seen order and drops empty/duplicate values; invalid HeaderValues log a warning and are skipped.
* Refactor `cors_layer()` to read merged origins from `RuntimeApiState`.

#562 / whalescale#256: `PATCH /v1/threads/{id}` accepts the full editable field set

* Extend `UpdateThreadRequest` with `allow_shell`, `trust_mode`, `auto_approve`, `model`, `mode`, `title`, `system_prompt`. Each is optional; missing means no change. An empty string clears `title`/`system_prompt`. An empty `model`/`mode` is rejected with 400.
* Add `title: Option<String>` to `ThreadRecord` (additive, no schema bump per documented criteria; old readers ignore the field without misinterpretation). `list_threads_summary` now returns the user-set title when present, falling back to the derived input-summary title.
* `thread.updated` event payload now carries a `changes` map with only the fields that actually changed.

#563 / whalescale#260: list-archived-only filter

* New `archived_only=true` query param on `GET /v1/threads` and `GET /v1/threads/summary`. Backed by a new `ThreadListFilter` enum (`ActiveOnly` | `IncludeArchived` | `ArchivedOnly`). `archived_only` takes precedence over `include_archived`. Default behavior unchanged.

#564 / whalescale#261: `GET /v1/usage` aggregation

* New `RuntimeThreadManager::aggregate_usage` walks all threads/turns, filters by inclusive `since`/`until` RFC 3339 bounds, accumulates token totals + cost (via `pricing::calculate_turn_cost_from_usage`), and groups by `day` (default), `model`, `provider`, or `thread`.
* New `GET /v1/usage` route with `since`/`until`/`group_by` query params; `since > until` and unknown `group_by` are rejected with 400. Empty time ranges yield empty `buckets` (never 404).

5 new tests cover preflight Allow-Origin echoing for both default and extra origins, the extended PATCH field set + clear-by-empty + 400 paths, the archived_only filter on list + summary endpoints, and the /v1/usage envelope + validation errors. Existing 13 runtime_api tests continue to pass; the parity gates and full workspace test suite are clean.

`docs/RUNTIME_API.md` and `config.example.toml` updated to document the new params, body shape, endpoint, and CORS knob.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
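The origin resolution rule described for #561 (defaults first, user entries stacked on top, first-seen order kept, empty and duplicate values dropped) can be sketched roughly as follows. This is a minimal illustration, not the code from the PR: `merge_origins` is a hypothetical helper, the `http://` scheme on the two localhost defaults is an assumption (the commit message lists only host and port), and the `HeaderValue` validity check is left out.

```rust
use std::collections::HashSet;

// Built-in defaults named in the commit message.
// ASSUMPTION: the `http://` scheme is a guess; the source lists only host:port.
const DEFAULT_ORIGINS: [&str; 3] = [
    "http://localhost:3000",
    "http://localhost:1420",
    "tauri://localhost",
];

/// Hypothetical helper: stack user-supplied origins on top of the defaults.
/// Keeps first-seen order, trims whitespace, and drops empty or duplicate
/// entries. The real code also skips values that are not valid header
/// values; that step is omitted here.
fn merge_origins(defaults: &[&str], extra: &[&str]) -> Vec<String> {
    let mut seen = HashSet::new();
    let mut merged = Vec::new();
    for origin in defaults.iter().chain(extra.iter()) {
        let origin = origin.trim();
        if origin.is_empty() {
            continue;
        }
        // `insert` returns false when the value was already present.
        if seen.insert(origin.to_string()) {
            merged.push(origin.to_string());
        }
    }
    merged
}
```

Calling `merge_origins(&DEFAULT_ORIGINS, &["http://localhost:5173", ""])` would return the three defaults followed by the one extra origin, which matches the "stack on top of the defaults" wording.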
Pull request overview
Adds four runtime-daemon HTTP API enhancements needed by whalescale-desktop (CORS configuration, richer thread PATCH, archived-only listing, and usage aggregation), updating both the runtime data model and the route table/documentation.
Changes:
- Make runtime API CORS allow-list extensible via CLI/env/config while preserving built-in dev defaults.
- Extend threads API: archived-only filtering for list endpoints; broaden `PATCH /v1/threads/{id}` to multiple editable fields including `title`.
- Add `GET /v1/usage` to aggregate token/cost totals and buckets across threads/turns.
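The PATCH semantics summarized above (every field optional, a missing field means no change, an empty string clears `title`/`system_prompt`, an empty `model` is rejected) might look roughly like the sketch below. `ThreadPatch`, `ThreadState`, and `apply_patch` are hypothetical names covering only three of the fields, and the `Err` branch stands in for the HTTP 400 response; whether the real handler trims whitespace before the emptiness check is not stated in the PR.

```rust
/// Hypothetical subset of the PATCH body. `None` means "leave unchanged".
#[derive(Debug, Default)]
struct ThreadPatch {
    title: Option<String>,
    system_prompt: Option<String>,
    model: Option<String>,
}

/// Hypothetical stand-in for the stored thread record.
#[derive(Debug, PartialEq)]
struct ThreadState {
    title: Option<String>,
    system_prompt: Option<String>,
    model: String,
}

/// Applies a patch and returns the names of the fields that actually
/// changed (mirroring the `changes` map on `thread.updated`).
/// An `Err` stands in for a 400 response.
fn apply_patch(
    thread: &mut ThreadState,
    patch: ThreadPatch,
) -> Result<Vec<&'static str>, String> {
    // Validate before mutating so a rejected request leaves the thread untouched.
    if patch.title.is_none() && patch.system_prompt.is_none() && patch.model.is_none() {
        return Err("empty patch".to_string());
    }
    if patch.model.as_deref() == Some("") {
        return Err("model must not be empty".to_string());
    }

    let mut changes = Vec::new();
    if let Some(title) = patch.title {
        // An empty string clears the user-set title.
        let next = if title.is_empty() { None } else { Some(title) };
        if thread.title != next {
            thread.title = next;
            changes.push("title");
        }
    }
    if let Some(prompt) = patch.system_prompt {
        // Same clear-by-empty rule as `title`.
        let next = if prompt.is_empty() { None } else { Some(prompt) };
        if thread.system_prompt != next {
            thread.system_prompt = next;
            changes.push("system_prompt");
        }
    }
    if let Some(model) = patch.model {
        if thread.model != model {
            thread.model = model;
            changes.push("model");
        }
    }
    Ok(changes)
}
```

Validating the whole request before touching the record keeps a rejected PATCH from partially applying, which seems the safer default for a settings-style endpoint.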
Reviewed changes
Copilot reviewed 6 out of 6 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| docs/RUNTIME_API.md | Documents new query params, PATCH body shape, usage endpoint, and CORS configuration sources/defaults. |
| crates/tui/src/runtime_threads.rs | Adds title, thread list filter enum, update request fields, and implements usage aggregation over stored turns. |
| crates/tui/src/runtime_api.rs | Wires new query params, CORS layer configuration, and /v1/usage route/handler into the router. |
| crates/tui/src/main.rs | Adds --cors-origin flag and merges CLI/env/config into runtime API options. |
| crates/tui/src/config.rs | Introduces [runtime_api] config table for additional CORS origins and merges it during config overlay. |
| config.example.toml | Adds documented example section for [runtime_api] cors_origins. |
```rust
let mut buckets: BTreeMap<String, UsageBucket> = BTreeMap::new();
let mut totals = UsageTotals::default();

for thread in self.store.list_threads()? {
    let turns = self.store.list_turns_for_thread(&thread.id)?;
    for turn in turns {
        if let Some(s) = since
            && turn.created_at < s
        {
            continue;
        }
        if let Some(u) = until
            && turn.created_at > u
        {
            continue;
        }
        let Some(usage) = turn.usage.as_ref() else {
            continue;
        };
        let cached = usage.prompt_cache_hit_tokens.unwrap_or(0) as u64;
        let reasoning = usage.reasoning_tokens.unwrap_or(0) as u64;
        let input = usage.input_tokens as u64;
        let output = usage.output_tokens as u64;
        let cost = crate::pricing::calculate_turn_cost_from_usage(&thread.model, usage)
            .unwrap_or(0.0);

        totals.input_tokens += input;
        totals.output_tokens += output;
        totals.cached_tokens += cached;
        totals.reasoning_tokens += reasoning;
        totals.cost_usd += cost;
        totals.turns += 1;

        let key = match group_by {
            UsageGroupBy::Day => turn.created_at.format("%Y-%m-%d").to_string(),
            UsageGroupBy::Model => thread.model.clone(),
            UsageGroupBy::Provider => provider_label_for_model(&thread.model).to_string(),
            UsageGroupBy::Thread => thread.id.clone(),
        };
        let bucket = buckets.entry(key.clone()).or_insert_with(|| UsageBucket {
            key,
            ..UsageBucket::default()
        });
        bucket.input_tokens += input;
        bucket.output_tokens += output;
        bucket.cached_tokens += cached;
        bucket.reasoning_tokens += reasoning;
        bucket.cost_usd += cost;
        bucket.turns += 1;
    }
}

let group_by_str = match group_by {
    UsageGroupBy::Day => "day",
    UsageGroupBy::Model => "model",
    UsageGroupBy::Provider => "provider",
    UsageGroupBy::Thread => "thread",
}
.to_string();

Ok(UsageAggregation {
    since,
    until,
    group_by: group_by_str,
    totals,
    buckets: buckets.into_values().collect(),
```
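The excerpt above maps `UsageGroupBy` back to its string form; the request-side validation that the PR describes (default `day`, unknown `group_by` rejected, `since > until` rejected) is not shown. A minimal sketch of that validation, assuming hypothetical helper names `parse_group_by` and `validate_bounds` and using `Err` in place of the 400 response, could be:

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum UsageGroupBy {
    Day,
    Model,
    Provider,
    Thread,
}

/// Hypothetical parser for the `group_by` query param.
/// A missing value falls back to `day`; anything unrecognized is an error.
fn parse_group_by(raw: Option<&str>) -> Result<UsageGroupBy, String> {
    match raw {
        None | Some("day") => Ok(UsageGroupBy::Day),
        Some("model") => Ok(UsageGroupBy::Model),
        Some("provider") => Ok(UsageGroupBy::Provider),
        Some("thread") => Ok(UsageGroupBy::Thread),
        Some(other) => Err(format!("unknown group_by: {other}")),
    }
}

/// Hypothetical bounds check. Generic over any ordered timestamp type so the
/// sketch stays free of date-time dependencies. Equal bounds are allowed
/// because both ends are inclusive.
fn validate_bounds<T: PartialOrd>(since: Option<T>, until: Option<T>) -> Result<(), String> {
    match (since, until) {
        (Some(s), Some(u)) if s > u => Err("since must not be later than until".to_string()),
        _ => Ok(()),
    }
}
```

Keeping the bounds check generic is only a convenience for the sketch; the actual handler works with parsed RFC 3339 timestamps.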
```diff
+    let filter = resolve_thread_filter(query.include_archived, query.archived_only);
     let threads = state
         .runtime_threads
-        .list_threads(query.include_archived.unwrap_or(false), query.limit)
+        .list_threads(filter, query.limit)
         .await
```
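The body of `resolve_thread_filter` is not part of the excerpt. Given the stated rule that `archived_only` takes precedence over `include_archived` and that the default behavior is unchanged, one plausible shape is sketched below. The `Option<bool>` parameter types are inferred from the call site, and the `matches` method is a hypothetical addition for illustration.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum ThreadListFilter {
    ActiveOnly,
    IncludeArchived,
    ArchivedOnly,
}

/// Sketch of the precedence rule: `archived_only` wins over
/// `include_archived`; with neither set, only active threads are listed.
fn resolve_thread_filter(
    include_archived: Option<bool>,
    archived_only: Option<bool>,
) -> ThreadListFilter {
    if archived_only.unwrap_or(false) {
        ThreadListFilter::ArchivedOnly
    } else if include_archived.unwrap_or(false) {
        ThreadListFilter::IncludeArchived
    } else {
        ThreadListFilter::ActiveOnly
    }
}

impl ThreadListFilter {
    /// Hypothetical predicate: does a thread with this `archived` flag pass?
    fn matches(self, archived: bool) -> bool {
        match self {
            ThreadListFilter::ActiveOnly => !archived,
            ThreadListFilter::IncludeArchived => true,
            ThreadListFilter::ArchivedOnly => archived,
        }
    }
}
```

Replacing the boolean argument with a three-state enum avoids having to pass two flags whose combination could be contradictory.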
```rust
/// Resolve the user-supplied CORS origins for `deepseek serve --http`.
///
/// Sources, in priority order (later sources extend earlier ones):
```
Code Review
This pull request introduces several enhancements to the runtime API, including configurable CORS origins, extended thread management capabilities (such as title support and filtering), and a new usage aggregation endpoint. My review identified a significant performance bottleneck in the aggregate_usage function due to inefficient file system operations, a need for more robust provider labeling in the usage tracking logic, and a potential inaccuracy in historical cost reporting when thread models are updated. I have provided a code suggestion to improve the provider labeling logic.
```rust
for thread in self.store.list_threads()? {
    let turns = self.store.list_turns_for_thread(&thread.id)?;
```
The current implementation of aggregate_usage has a significant performance bottleneck. It iterates over all threads and, for each thread, calls list_turns_for_thread, which performs a full directory scan of the turns/ directory. This results in
```rust
fn provider_label_for_model(model: &str) -> &'static str {
    if model.starts_with("deepseek-ai/") {
        "nvidia-nim"
    } else if model.starts_with("deepseek-") {
        "deepseek"
    } else if model.starts_with("openai/") || model.starts_with("anthropic/") {
        "openrouter"
    } else {
        "unknown"
    }
}
```
The provider_label_for_model logic is incomplete and may mislabel usage buckets. For example, OpenRouter and Novita models often use the deepseek/ prefix (e.g., deepseek/deepseek-v4-pro), which would currently be labeled as "unknown". Additionally, the deepseek-ai/ prefix is ambiguous as it is used by both NVIDIA NIM and self-hosted SGLang instances.
```diff
 fn provider_label_for_model(model: &str) -> &'static str {
     if model.starts_with("deepseek-ai/") {
         "nvidia-nim"
     } else if model.starts_with("deepseek-") {
         "deepseek"
-    } else if model.starts_with("openai/") || model.starts_with("anthropic/") {
+    } else if model.starts_with("openai/") || model.starts_with("anthropic/") || model.starts_with("deepseek/") {
         "openrouter"
     } else {
         "unknown"
     }
 }
```
```rust
let cost = crate::pricing::calculate_turn_cost_from_usage(&thread.model, usage)
    .unwrap_or(0.0);
```
Usage cost calculation in aggregate_usage uses the current model of the thread for all historical turns. Since a thread's model can now be changed via PATCH /v1/threads/{id}, this will lead to inaccurate historical cost reporting if a thread was switched between models with different pricing (e.g., switching from pro to flash). To fix this accurately, the model used for each turn should be persisted in the TurnRecord.
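One way to act on this suggestion is to store the model on each turn and fall back to the thread's current model for turns written before the field existed. The sketch below is purely illustrative: `ThreadSketch`, `TurnSketch`, the optional `model` field, and `pricing_model` are all hypothetical and do not reflect the actual `ThreadRecord`/`TurnRecord` layout.

```rust
/// Hypothetical stand-in for the thread record.
struct ThreadSketch {
    model: String,
}

/// Hypothetical stand-in for a turn record with a per-turn model.
/// `None` represents legacy turns persisted before the field existed.
struct TurnSketch {
    model: Option<String>,
}

/// Picks the model to price a turn with: the model recorded on the turn
/// if present, otherwise the thread's current model.
fn pricing_model<'a>(thread: &'a ThreadSketch, turn: &'a TurnSketch) -> &'a str {
    turn.model.as_deref().unwrap_or(thread.model.as_str())
}
```

Making the field optional would keep the change additive in the same sense as the new `title` field, at the cost of legacy turns still being priced with the thread's current model.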
Picks up the v0.8.10 patch release contents:

* Daemon API quartet for whalescale-desktop integration (#561-#564, PR #567).
* Bug cluster: macOS seatbelt cargo registry (#558), MCP SIGTERM shutdown (#420), Linux PR_SET_PDEATHSIG (#421).
* npm install on older glibc fix (#555/#560 via #556 + #565).
* Shell cwd workspace-boundary validation (#524).
* Memory help/docs polish (#497 via #569).
* Onboarding language picker (#566).
* Whale nicknames interleaved with Simplified Chinese.

First-time contributors credited in CHANGELOG: @staryxchen, @shentoumengxin, @Vishnu1837, @20bytes.

Workspace `Cargo.toml`, all 9 internal path-dep version pins, and `npm/deepseek-tui/package.json` all bumped to 0.8.10. `Cargo.lock` regenerated and committed alongside.

Verified locally:

* cargo fmt --all -- --check
* cargo clippy --workspace --all-targets --all-features --locked -- -D warnings
* cargo test --workspace --all-features --locked
* bash scripts/release/check-versions.sh

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Summary
Bridge work to unblock whalescale-desktop's Composer/Settings/Archived-chats flows. Each endpoint solves a current daemon-side limitation that whalescale was either working around (per-turn override) or shipping as cut (Settings → Usage / Archived chats). All four endpoints land together because they share the same `runtime_api.rs` route table and `runtime_threads.rs` data model.

- #561 / whalescale#255: `[runtime_api] cors_origins` config field, `--cors-origin URL` (repeatable) flag on `deepseek serve --http`, and `DEEPSEEK_CORS_ORIGINS` env var. User entries stack on top of built-in defaults (localhost:3000, localhost:1420, tauri://localhost).
- #562 / whalescale#256: `PATCH /v1/threads/{id}` extended from `archived`-only to the full editable field set: `allow_shell`, `trust_mode`, `auto_approve`, `model`, `mode`, `title`, `system_prompt`. Empty-string clears `title`/`system_prompt`. New `title` field on `ThreadRecord` is additive, so there is no schema_version bump (old readers ignore it without misinterpretation).
- #563 / whalescale#260: `archived_only=true` query param on `GET /v1/threads` and `/v1/threads/summary`. Backed by a new `ThreadListFilter` enum.
- #564 / whalescale#261: `GET /v1/usage?since=&until=&group_by=<day|model|provider|thread>` aggregates token totals + cost (via `pricing.rs`) across all threads/turns. Empty time ranges yield empty `buckets` (never 404).

`docs/RUNTIME_API.md` and `config.example.toml` updated. After this merges I will comment the new endpoint shapes on whalescale#255/256/260/261 so the desktop side can wire them.

Test plan

- `cargo fmt --all -- --check` clean
- `cargo clippy --workspace --all-targets --all-features --locked -- -D warnings` clean
- `cargo test --workspace --all-features --locked`: 2011 main tests + all crate suites green; 5 new tests added:
  - `cors_layer_appends_extra_origins_and_keeps_defaults`: preflight Allow-Origin echo for both default and extra origins; non-allowed origin omitted
  - `cors_layer_skips_invalid_origins`: invalid `HeaderValue` strings logged + skipped, layer build does not panic
  - `patch_thread_accepts_extended_field_set`: every new field round-trips; empty-string clears title; empty model rejected; empty body still 400
  - `list_threads_archived_only_filter_matches_only_archived`: both `/v1/threads` and `/v1/threads/summary` honor the filter; precedence over `include_archived`
  - `usage_endpoint_returns_empty_aggregation_for_fresh_store`: empty-store envelope shape, all four `group_by` values accepted, bad ISO-8601 / inverted bounds / unknown group_by rejected
- Parity gates: `deepseek-tui-core::snapshot`, `deepseek-protocol::parity_protocol`, `deepseek-state::parity_state`
- `git diff --exit-code -- Cargo.lock` clean (no lockfile drift)

🤖 Generated with Claude Code