refs(#2264): harden PrefixFingerprint with full tool JSON hash#2477
refs(#2264): harden PrefixFingerprint with full tool JSON hash#2477encyc wants to merge 2 commits into
Conversation
Phase 1.5 — upgrade PrefixFingerprint::compute() to hash the full tool JSON serialization (name + description + schema) instead of just tool names. This catches schema/description drift in addition to name changes. - Serialize each tool via serde_json::to_string, sort by name, join - New test: fingerprint_detects_schema_change_not_just_name_change - All 21 prefix_cache tests pass - Aligned with prompt_zones.rs tool_catalog_digest approach
|
Warning You have reached your daily quota limit. Please wait up to 24 hours and I will start processing your requests again! |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: be5dc46fb7
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| let joined = tool_names.join(","); | ||
| let mut serialized: Vec<String> = tools | ||
| .iter() | ||
| .filter_map(|t| serde_json::to_string(t).ok()) |
There was a problem hiding this comment.
Hash the serialized API tool payload
When tool metadata changes in fields that are not sent to the chat API, this now reports a tool-prefix drift even though the request prefix is byte-for-byte unchanged. I checked the chat request builders: they serialize tools through tool_to_chat_for_base_url, which keeps only the function name, description, parameters, and sometimes strict; fields like allowed_callers, defer_loading, input_examples, and cache_control are internal-only. Hashing the whole internal Tool here will make sessions re-pin and show cache invalidation for changes to those fields that cannot affect DeepSeek's prefix cache.
Useful? React with 👍 / 👎.
Addresses chatgpt-codex review: the previous full serde_json::to_string included internal-only fields (allowed_callers, defer_loading, input_examples, cache_control) that are never sent to the chat API. This caused spurious drift detection when those fields changed. - New tool_to_api_json() helper mirrors tool_to_chat() serialization: only type, name, description, parameters, strict - Doc comment fixed: 'sorted by name' → 'sorted lexicographically by JSON text' (greptile review)
Refs #2264 (Phase 1.5 — conservative path).
Summary
Upgrades
PrefixFingerprint::compute()to hash the full tool JSONserialization (name + description + input_schema) instead of just
tool names. Follows the same
tool_catalog_digestapproach fromthe Phase 1
prompt_zonesmodule.Before: tool hash =
SHA-256("read_file,write_file")After: tool hash =
SHA-256(joined sorted JSON blobs)What changed
PrefixFingerprint::compute(): serialize each tool viaserde_json::to_string,sort by JSON text for deterministic ordering, join with
\ntools_sha256fieldfingerprint_detects_schema_change_not_just_name_change— verifies that changing a tool's description produces a different hash
Files
crates/tui/src/prefix_cache.rsTesting
Greptile Summary
This PR hardens
PrefixFingerprint::compute()by replacing the tool-name-only hash with a hash over the full API-shaped tool JSON (name + description + input_schema + strict), catching schema and description drift that the old approach missed. A new private helpertool_to_api_jsonmirrors the structure oftool_to_chatin the client layer, and a new test confirms that changing a tool's description produces a differenttools_sha256.PrefixFingerprint::compute()now sorts and hashes full serialized tool JSON blobs instead of just joined tool names, consistent with thetool_catalog_digestpattern inprompt_zones.tool_to_api_jsonintentionally excludes internal-only fields (allowed_callers,defer_loading,input_examples,cache_control) that are stripped before the API call.to_api_tool_name()encoding used bytool_to_chat; for tools whose names contain special characters the fingerprint JSON and wire JSON differ, though drift detection remains correct because the encoding is injective.Confidence Score: 5/5
Safe to merge; the change is additive and all 21 existing prefix_cache tests continue to pass alongside the new schema-change detection test.
The core drift-detection logic is correct: because to_api_tool_name is an injective transformation, hashing raw tool names still detects every change that would affect the wire bytes. The only finding is a doc comment that overstates the fidelity of the serialisation; it does not affect runtime behaviour.
No files require special attention beyond the documentation note on tool_to_api_json.
Important Files Changed
Reviews (2): Last reviewed commit: "fix: hash only API-visible tool fields, ..." | Re-trigger Greptile