feat(tools): byte-level schema canonicalize for prefix-cache stability#2499
feat(tools): byte-level schema canonicalize for prefix-cache stability#2499HUQIANTAO wants to merge 1 commit into
Conversation
There was a problem hiding this comment.
HUQIANTAO has reached the 50-review limit for trial accounts. To continue receiving code reviews, upgrade your plan.
There was a problem hiding this comment.
Code Review
This pull request introduces a new schema_canonicalize module to recursively canonicalize JSON schemas, ensuring deterministic serialization and prefix-cache stability by sorting object keys and elements in required and dependentRequired arrays. This canonicalization is applied to tool input schemas in the registry. Feedback suggests optimizing the sorting of object keys by using sort_unstable_by instead of sort_by to avoid unnecessary allocations.
| // drain(), so we swap to a temporary and rebuild. | ||
| let old = std::mem::take(map); | ||
| let mut entries: Vec<(String, Value)> = old.into_iter().collect(); | ||
| entries.sort_by(|a, b| a.0.cmp(&b.0)); |
There was a problem hiding this comment.
Since the keys of a JSON object are guaranteed to be unique, the stability of the sorting algorithm is not required. Using sort_unstable_by instead of sort_by avoids allocating temporary helper memory and is generally faster.
| entries.sort_by(|a, b| a.0.cmp(&b.0)); | |
| entries.sort_unstable_by(|a, b| a.0.cmp(&b.0)); |
7cee9cd to
95c80c9
Compare
There was a problem hiding this comment.
HUQIANTAO has reached the 50-review limit for trial accounts. To continue receiving code reviews, upgrade your plan.
|
Harvested into I fixed the formatting failure locally with |
…ility When MCP servers return tool schemas, the field order within each schema object and the order of entries in required / dependentRequired arrays can vary across reconnections. This causes the serialized tool catalog bytes to change even when the logical schema is unchanged, busting DeepSeek's KV prefix cache. Add schema_canonicalize::canonicalize_schema which recursively: - Sorts every required array alphabetically - Sorts every dependentRequired sub-array alphabetically - Rebuilds object keys in alphabetical order - Recurses into all nested objects and arrays The canonicalize step runs after schema_sanitize in build_api_tools, so each tool's input_schema is first cleaned then byte-stabilized. The existing OnceLock api_cache pins the result, ensuring the tool catalog bytes are identical across reads and across process restarts. 8 unit tests cover: required sorting, dependentRequired sorting, equivalent-ordering byte match, recursive nesting, empty schemas, deeply nested schemas, non-required array preservation, and key ordering.
95c80c9 to
5067d8b
Compare
There was a problem hiding this comment.
HUQIANTAO has reached the 50-review limit for trial accounts. To continue receiving code reviews, upgrade your plan.
|
Closing: this slice was harvested upstream (per the maintainer comments) — the work is in main, no need to keep the open PR alive. Thanks for the review! |
Summary
When MCP servers return tool schemas, the field order within each schema object and the order of entries in
required/dependentRequiredarrays can vary across reconnections. This causes the serialized tool catalog bytes to change even when the logical schema is unchanged, busting DeepSeek's KV prefix cache.This PR adds
schema_canonicalize::canonicalize_schemawhich recursively:requiredarray alphabeticallydependentRequiredsub-array alphabeticallyThe canonicalize step runs after
schema_sanitizeinbuild_api_tools, so each tool'sinput_schemais first cleaned then byte-stabilized. The existingOnceLockapi_cache pins the result, ensuring the tool catalog bytes are identical across reads and across process restarts.Motivation
This is the highest-value, lowest-risk optimization identified in the cache audit. The
serde_json::Value::Objectbacked byIndexMap(viapreserve_orderfeature) preserves insertion order from the MCP server, which can differ between reconnections. By canonicalizing after sanitize, we guarantee that the same logical schema always produces the same bytes, preventing unnecessary prefix cache busts.Changes
crates/tui/src/tools/schema_canonicalize.rs(199 lines including 8 unit tests)crates/tui/src/tools/registry.rs- addcanonicalize_schemacall aftersanitizeinbuild_api_toolscrates/tui/src/tools/mod.rs- add module declarationTesting
8 unit tests covering:
sorts_required_array- basic required array sortingequivalent_ordering_matches- different field orders produce identical bytessorts_dependent_required- dependentRequired sub-array sortingrecursive_into_properties- nested schema canonicalizationpreserves_non_required_array_order- non-required arrays keep semantic orderhandles_empty_schema- empty object edge casehandles_deeply_nested- deeply nested schema recursionkey_order_is_alphabetical_after_canonicalize- object key orderingAll existing tests continue to pass. No new dependencies added (uses existing
serde_json).Risk Assessment
Extremely low risk:
OnceLock)preserve_orderschema_sanitizealready runs first