
feat(runtime-api): daemon API quartet for whalescale (#561 #562 #563 #564) #567

Merged
Hmbown merged 1 commit into main from feat/v0.8.10-daemon-api
May 4, 2026

@Hmbown Hmbown commented May 4, 2026

Owner

Summary

Bridge work to unblock whalescale-desktop's Composer/Settings/Archived-chats flows. Each endpoint solves a current daemon-side limitation that whalescale was either working around (per-turn override) or shipping as cut (Settings → Usage / Archived chats). All four endpoints land together because they share the same runtime_api.rs route table and runtime_threads.rs data model.

docs/RUNTIME_API.md and config.example.toml are updated. After this merges I will post the new endpoint shapes on whalescale#255/256/260/261 so the desktop side can wire them up.

Test plan

  • cargo fmt --all -- --check clean
  • cargo clippy --workspace --all-targets --all-features --locked -- -D warnings clean
  • cargo test --workspace --all-features --locked — 2011 main tests + all crate suites green; 5 new tests added:
    • cors_layer_appends_extra_origins_and_keeps_defaults — preflight Allow-Origin echo for both default and extra origins; non-allowed origin omitted
    • cors_layer_skips_invalid_origins — invalid HeaderValue strings logged + skipped, layer build does not panic
    • patch_thread_accepts_extended_field_set — every new field round-trips; empty-string clears title; empty model rejected; empty body still 400
    • list_threads_archived_only_filter_matches_only_archived — both /v1/threads and /v1/threads/summary honor the filter; precedence over include_archived
    • usage_endpoint_returns_empty_aggregation_for_fresh_store — empty-store envelope shape, all four group_by values accepted, bad ISO-8601 / inverted bounds / unknown group_by rejected
  • Parity gates green: deepseek-tui-core::snapshot, deepseek-protocol::parity_protocol, deepseek-state::parity_state
  • git diff --exit-code -- Cargo.lock clean (no lockfile drift)

🤖 Generated with Claude Code

…564)

Bridge work to unblock whalescale-desktop's Settings/Composer/Archived-chats
flows without requiring a daemon recompile per dev-port or client-side
aggregation.

#561 / whalescale#255 — CORS allow-list configurable
* Add `[runtime_api] cors_origins` config field, `--cors-origin URL`
  (repeatable) flag on `deepseek serve --http`, and `DEEPSEEK_CORS_ORIGINS`
  env var. User entries stack on top of the built-in defaults
  (localhost:3000, localhost:1420, tauri://localhost). Resolution preserves
  first-seen order and drops empty/duplicate values; invalid HeaderValues
  log a warning and are skipped.
* Refactor `cors_layer()` to read merged origins from `RuntimeApiState`.
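
The merge rule described above (defaults first, user entries stacked on top, first-seen order kept, empty and duplicate values dropped) could be sketched as follows. This is a minimal illustration, not the code from the PR: `merge_cors_origins` and `DEFAULT_ORIGINS` are hypothetical names, the `http://` scheme on the two localhost defaults is an assumption, and the invalid-`HeaderValue` check is left out because it needs the `http` crate.

```rust
use std::collections::HashSet;

/// Built-in defaults named in the commit message. The `http://` scheme on the
/// two localhost entries is an assumption made for this sketch.
const DEFAULT_ORIGINS: [&str; 3] = [
    "http://localhost:3000",
    "http://localhost:1420",
    "tauri://localhost",
];

/// Hypothetical helper: defaults come first, user-supplied entries follow.
/// First-seen order is preserved; empty and duplicate values are dropped.
fn merge_cors_origins(extra: &[&str]) -> Vec<String> {
    let mut seen = HashSet::new();
    let mut merged = Vec::new();
    for origin in DEFAULT_ORIGINS.iter().copied().chain(extra.iter().copied()) {
        let trimmed = origin.trim();
        if trimmed.is_empty() {
            continue;
        }
        // `insert` returns false when the value was already present.
        if seen.insert(trimmed.to_string()) {
            merged.push(trimmed.to_string());
        }
    }
    merged
}
```

Calling `merge_cors_origins(&["http://localhost:5173", "", "tauri://localhost"])` would yield the three defaults followed by the single new origin.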

#562 / whalescale#256 — `PATCH /v1/threads/{id}` accepts the full editable
field set
* Extend `UpdateThreadRequest` with `allow_shell`, `trust_mode`,
  `auto_approve`, `model`, `mode`, `title`, `system_prompt`. Each is
  optional; missing means no change. Empty-string clears `title`/
  `system_prompt`. Empty `model`/`mode` rejected with 400.
* Add `title: Option<String>` to `ThreadRecord` (additive, no schema bump
  per documented criteria — old readers ignore the field without
  misinterpretation). `list_threads_summary` now returns the user-set title
  when present, falling back to the derived input-summary title.
* `thread.updated` event payload now carries a `changes` map with only the
  fields that actually changed.
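
The PATCH semantics above can be illustrated with a reduced sketch covering only `title` and `model`. The struct and function names here (`Thread`, `UpdateRequest`, `apply_update`) are hypothetical stand-ins rather than the PR's `ThreadRecord` and `UpdateThreadRequest`, and the error strings are placeholders for whatever the handler maps to HTTP 400.

```rust
#[derive(Debug, Default, PartialEq)]
struct Thread {
    title: Option<String>,
    model: String,
}

/// Every field is optional; `None` means "leave unchanged".
#[derive(Debug, Default)]
struct UpdateRequest {
    title: Option<String>,
    model: Option<String>,
}

/// Returns the names of the fields that actually changed, or an error
/// message that a handler would translate into a 400 response.
fn apply_update(thread: &mut Thread, req: UpdateRequest) -> Result<Vec<&'static str>, String> {
    if req.title.is_none() && req.model.is_none() {
        return Err("empty update body".to_string());
    }
    // Validate before mutating so a rejected request leaves the thread untouched.
    if let Some(model) = &req.model {
        if model.trim().is_empty() {
            return Err("model must not be empty".to_string());
        }
    }
    let mut changes = Vec::new();
    if let Some(title) = req.title {
        // An empty string clears the user-set title.
        let new_title = if title.is_empty() { None } else { Some(title) };
        if thread.title != new_title {
            thread.title = new_title;
            changes.push("title");
        }
    }
    if let Some(model) = req.model {
        if thread.model != model {
            thread.model = model;
            changes.push("model");
        }
    }
    Ok(changes)
}
```

Returning only the changed field names mirrors the `changes` map on the `thread.updated` event: a request that restates the current value produces an empty list.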

#563 / whalescale#260 — list-archived-only filter
* New `archived_only=true` query param on `GET /v1/threads` and
  `GET /v1/threads/summary`. Backed by a new `ThreadListFilter` enum
  (`ActiveOnly` | `IncludeArchived` | `ArchivedOnly`). `archived_only`
  takes precedence over `include_archived`. Default behavior unchanged.
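
A possible shape for the precedence rule is sketched below. The enum variants and the `resolve_thread_filter` name come from this PR, but the function body and the `matches` helper are assumptions written for illustration.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum ThreadListFilter {
    ActiveOnly,
    IncludeArchived,
    ArchivedOnly,
}

/// `archived_only` wins over `include_archived`; with neither set,
/// the default (active threads only) is unchanged.
fn resolve_thread_filter(
    include_archived: Option<bool>,
    archived_only: Option<bool>,
) -> ThreadListFilter {
    if archived_only.unwrap_or(false) {
        ThreadListFilter::ArchivedOnly
    } else if include_archived.unwrap_or(false) {
        ThreadListFilter::IncludeArchived
    } else {
        ThreadListFilter::ActiveOnly
    }
}

impl ThreadListFilter {
    /// Hypothetical helper: does a thread with this archived flag pass the filter?
    fn matches(self, archived: bool) -> bool {
        match self {
            ThreadListFilter::ActiveOnly => !archived,
            ThreadListFilter::IncludeArchived => true,
            ThreadListFilter::ArchivedOnly => archived,
        }
    }
}
```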

#564 / whalescale#261 — `GET /v1/usage` aggregation
* New `RuntimeThreadManager::aggregate_usage` walks all threads/turns,
  filters by inclusive `since`/`until` RFC 3339 bounds, accumulates token
  totals + cost (via `pricing::calculate_turn_cost_from_usage`), and
  groups by `day` (default), `model`, `provider`, or `thread`.
* New `GET /v1/usage` route. `since`/`until`/`group_by` query params,
  `since > until` and unknown `group_by` rejected with 400. Empty time
  ranges yield empty `buckets` (never 404).
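
The query validation described above might look roughly like this. `UsageGroupBy` is the enum name used in the PR; `parse_group_by`, `validate_bounds`, and `in_range` are hypothetical helpers, and timestamps are plain Unix seconds here as a stand-in for parsed RFC 3339 values so the sketch needs no date-time crate.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum UsageGroupBy {
    Day,
    Model,
    Provider,
    Thread,
}

/// A missing `group_by` falls back to `day`; unknown values are an error
/// that the handler would report as HTTP 400.
fn parse_group_by(raw: Option<&str>) -> Result<UsageGroupBy, String> {
    match raw {
        None | Some("day") => Ok(UsageGroupBy::Day),
        Some("model") => Ok(UsageGroupBy::Model),
        Some("provider") => Ok(UsageGroupBy::Provider),
        Some("thread") => Ok(UsageGroupBy::Thread),
        Some(other) => Err(format!("unknown group_by: {other}")),
    }
}

/// Rejects inverted bounds. Equal bounds are allowed because both ends are inclusive.
fn validate_bounds(since: Option<i64>, until: Option<i64>) -> Result<(), String> {
    match (since, until) {
        (Some(s), Some(u)) if s > u => Err("since must not be after until".to_string()),
        _ => Ok(()),
    }
}

/// Inclusive range check applied to each turn's timestamp.
fn in_range(ts: i64, since: Option<i64>, until: Option<i64>) -> bool {
    since.map_or(true, |s| ts >= s) && until.map_or(true, |u| ts <= u)
}
```

A range that matches no turns simply leaves the bucket map empty, which is consistent with the "never 404" behavior noted above.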

5 new tests cover preflight Allow-Origin echoing for both default and
extra origins, the extended PATCH field set + clear-by-empty + 400 paths,
the archived_only filter on list + summary endpoints, and the
/v1/usage envelope + validation errors. Existing 13 runtime_api tests
continue to pass; the parity gates and full workspace test suite are clean.

`docs/RUNTIME_API.md` and `config.example.toml` updated to document the
new params, body shape, endpoint, and CORS knob.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings May 4, 2026 06:59

Copilot AI left a comment

Contributor

Pull request overview

Adds four runtime-daemon HTTP API enhancements needed by whalescale-desktop (CORS configuration, richer thread PATCH, archived-only listing, and usage aggregation), updating both the runtime data model and the route table/documentation.

Changes:

  • Make runtime API CORS allow-list extensible via CLI/env/config while preserving built-in dev defaults.
  • Extend threads API: archived-only filtering for list endpoints; broaden PATCH /v1/threads/{id} to multiple editable fields including title.
  • Add GET /v1/usage to aggregate token/cost totals and buckets across threads/turns.

Reviewed changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 3 comments.

| File | Description |
| --- | --- |
| docs/RUNTIME_API.md | Documents new query params, PATCH body shape, usage endpoint, and CORS configuration sources/defaults. |
| crates/tui/src/runtime_threads.rs | Adds title, thread list filter enum, update request fields, and implements usage aggregation over stored turns. |
| crates/tui/src/runtime_api.rs | Wires new query params, CORS layer configuration, and /v1/usage route/handler into the router. |
| crates/tui/src/main.rs | Adds --cors-origin flag and merges CLI/env/config into runtime API options. |
| crates/tui/src/config.rs | Introduces [runtime_api] config table for additional CORS origins and merges it during config overlay. |
| config.example.toml | Adds documented example section for [runtime_api] cors_origins. |


Comment on lines +856 to +921
let mut buckets: BTreeMap<String, UsageBucket> = BTreeMap::new();
let mut totals = UsageTotals::default();

for thread in self.store.list_threads()? {
    let turns = self.store.list_turns_for_thread(&thread.id)?;
    for turn in turns {
        if let Some(s) = since
            && turn.created_at < s
        {
            continue;
        }
        if let Some(u) = until
            && turn.created_at > u
        {
            continue;
        }
        let Some(usage) = turn.usage.as_ref() else {
            continue;
        };
        let cached = usage.prompt_cache_hit_tokens.unwrap_or(0) as u64;
        let reasoning = usage.reasoning_tokens.unwrap_or(0) as u64;
        let input = usage.input_tokens as u64;
        let output = usage.output_tokens as u64;
        let cost = crate::pricing::calculate_turn_cost_from_usage(&thread.model, usage)
            .unwrap_or(0.0);

        totals.input_tokens += input;
        totals.output_tokens += output;
        totals.cached_tokens += cached;
        totals.reasoning_tokens += reasoning;
        totals.cost_usd += cost;
        totals.turns += 1;

        let key = match group_by {
            UsageGroupBy::Day => turn.created_at.format("%Y-%m-%d").to_string(),
            UsageGroupBy::Model => thread.model.clone(),
            UsageGroupBy::Provider => provider_label_for_model(&thread.model).to_string(),
            UsageGroupBy::Thread => thread.id.clone(),
        };
        let bucket = buckets.entry(key.clone()).or_insert_with(|| UsageBucket {
            key,
            ..UsageBucket::default()
        });
        bucket.input_tokens += input;
        bucket.output_tokens += output;
        bucket.cached_tokens += cached;
        bucket.reasoning_tokens += reasoning;
        bucket.cost_usd += cost;
        bucket.turns += 1;
    }
}

let group_by_str = match group_by {
    UsageGroupBy::Day => "day",
    UsageGroupBy::Model => "model",
    UsageGroupBy::Provider => "provider",
    UsageGroupBy::Thread => "thread",
}
.to_string();

Ok(UsageAggregation {
    since,
    until,
    group_by: group_by_str,
    totals,
    buckets: buckets.into_values().collect(),
Comment on lines +599 to 603
  let filter = resolve_thread_filter(query.include_archived, query.archived_only);
  let threads = state
      .runtime_threads
-     .list_threads(query.include_archived.unwrap_or(false), query.limit)
+     .list_threads(filter, query.limit)
      .await
Comment thread crates/tui/src/main.rs

/// Resolve the user-supplied CORS origins for `deepseek serve --http`.
///
/// Sources, in priority order (later sources extend earlier ones):

@gemini-code-assist gemini-code-assist Bot left a comment

Contributor

Code Review

This pull request introduces several enhancements to the runtime API, including configurable CORS origins, extended thread management capabilities (such as title support and filtering), and a new usage aggregation endpoint. My review identified a significant performance bottleneck in the aggregate_usage function due to inefficient file system operations, a need for more robust provider labeling in the usage tracking logic, and a potential inaccuracy in historical cost reporting when thread models are updated. I have provided a code suggestion to improve the provider labeling logic.

Comment on lines +859 to +860
for thread in self.store.list_threads()? {
    let turns = self.store.list_turns_for_thread(&thread.id)?;

Contributor

high

The current implementation of aggregate_usage has a significant performance bottleneck. It iterates over all threads and, for each thread, calls list_turns_for_thread, which performs a full directory scan of the turns/ directory. This results in $O(N_{\text{threads}} \times M_{\text{total turns}})$ file system operations and JSON parses. For a user with a large number of threads and turns, this will be extremely slow and inefficient.
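
One way to avoid the repeated scans, sketched under the assumption that each stored turn carries its thread id and that the store can hand back all turns in one listing, is to group turns by thread in a single pass. `TurnRecord` is named elsewhere in this review; its fields here and the `group_turns_by_thread` helper are hypothetical.

```rust
use std::collections::HashMap;

/// Reduced stand-in for a stored turn; both fields are assumptions for this sketch.
#[derive(Debug, Clone, PartialEq)]
struct TurnRecord {
    thread_id: String,
    input_tokens: u64,
}

/// Groups every turn under its thread id in one pass over the full listing,
/// so the cost is linear in the total number of turns rather than
/// (threads x total turns).
fn group_turns_by_thread(all_turns: Vec<TurnRecord>) -> HashMap<String, Vec<TurnRecord>> {
    let mut by_thread: HashMap<String, Vec<TurnRecord>> = HashMap::new();
    for turn in all_turns {
        by_thread
            .entry(turn.thread_id.clone())
            .or_default()
            .push(turn);
    }
    by_thread
}
```

The aggregation loop could then look up each thread's turns in the map instead of calling `list_turns_for_thread` per thread. Whether the store exposes such a bulk listing is not shown in this PR.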

Comment on lines +632 to +642
fn provider_label_for_model(model: &str) -> &'static str {
    if model.starts_with("deepseek-ai/") {
        "nvidia-nim"
    } else if model.starts_with("deepseek-") {
        "deepseek"
    } else if model.starts_with("openai/") || model.starts_with("anthropic/") {
        "openrouter"
    } else {
        "unknown"
    }
}

Contributor

medium

The provider_label_for_model logic is incomplete and may mislabel usage buckets. For example, OpenRouter and Novita models often use the deepseek/ prefix (e.g., deepseek/deepseek-v4-pro), which would currently be labeled as "unknown". Additionally, the deepseek-ai/ prefix is ambiguous as it is used by both NVIDIA NIM and self-hosted SGLang instances.

Suggested change
fn provider_label_for_model(model: &str) -> &'static str {
    if model.starts_with("deepseek-ai/") {
        "nvidia-nim"
    } else if model.starts_with("deepseek-") {
        "deepseek"
    } else if model.starts_with("openai/")
        || model.starts_with("anthropic/")
        || model.starts_with("deepseek/")
    {
        "openrouter"
    } else {
        "unknown"
    }
}

Comment on lines +879 to +880
let cost = crate::pricing::calculate_turn_cost_from_usage(&thread.model, usage)
    .unwrap_or(0.0);

Contributor

medium

Usage cost calculation in aggregate_usage uses the current model of the thread for all historical turns. Since a thread's model can now be changed via PATCH /v1/threads/{id}, this will lead to inaccurate historical cost reporting if a thread was switched between models with different pricing (e.g., switching from pro to flash). To fix this accurately, the model used for each turn should be persisted in the TurnRecord.
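
A minimal sketch of the suggested fix: give each turn an optional model field and fall back to the thread's current model for turns written before the field existed. The `Thread` and `Turn` structs, `pricing_model`, `total_cost`, and the per-token price closure are all hypothetical; the real pricing goes through `pricing::calculate_turn_cost_from_usage`, whose internals are not shown in this PR.

```rust
struct Thread {
    model: String,
}

/// Reduced stand-in for a turn record. `model` is the proposed new field,
/// optional so that older records without it still deserialize.
struct Turn {
    model: Option<String>,
    output_tokens: u64,
}

/// Prefer the model recorded on the turn; fall back to the thread's current model.
fn pricing_model<'a>(thread: &'a Thread, turn: &'a Turn) -> &'a str {
    turn.model.as_deref().unwrap_or(thread.model.as_str())
}

/// Sums cost across turns using a caller-supplied per-token price lookup
/// (a placeholder for the real pricing function).
fn total_cost(thread: &Thread, turns: &[Turn], price_per_token: impl Fn(&str) -> f64) -> f64 {
    turns
        .iter()
        .map(|turn| price_per_token(pricing_model(thread, turn)) * turn.output_tokens as f64)
        .sum()
}
```

With this shape, changing a thread's model through PATCH only affects the cost of turns that lack a recorded model.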

@Hmbown Hmbown merged commit 0047b32 into main May 4, 2026
12 checks passed
@Hmbown Hmbown deleted the feat/v0.8.10-daemon-api branch May 4, 2026 07:18
Hmbown added a commit that referenced this pull request May 4, 2026
Picks up the v0.8.10 patch release contents:
* Daemon API quartet for whalescale-desktop integration (#561-#564,
  PR #567).
* Bug cluster: macOS seatbelt cargo registry (#558), MCP SIGTERM
  shutdown (#420), Linux PR_SET_PDEATHSIG (#421).
* npm install on older glibc fix (#555/#560 via #556 + #565).
* Shell cwd workspace-boundary validation (#524).
* Memory help/docs polish (#497 via #569).
* Onboarding language picker (#566).
* Whale nicknames interleaved with Simplified Chinese.

First-time contributors credited in CHANGELOG: @staryxchen,
@shentoumengxin, @Vishnu1837, @20bytes.

Workspace `Cargo.toml`, all 9 internal path-dep version pins, and
`npm/deepseek-tui/package.json` all bumped to 0.8.10. `Cargo.lock`
regenerated and committed alongside.

Verified locally:
* cargo fmt --all -- --check
* cargo clippy --workspace --all-targets --all-features --locked -- -D warnings
* cargo test --workspace --all-features --locked
* bash scripts/release/check-versions.sh

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
MMMarcinho pushed a commit to MMMarcinho/DeepSeek-TUI that referenced this pull request May 6, 2026
MMMarcinho pushed a commit to MMMarcinho/DeepSeek-TUI that referenced this pull request May 6, 2026
2 participants