Skip to content

refs(#2264): harden PrefixFingerprint with full tool JSON hash#2477

Closed
encyc wants to merge 2 commits into
Hmbown:mainfrom
encyc:pref/prefix-full-tool-hash
Closed

refs(#2264): harden PrefixFingerprint with full tool JSON hash#2477
encyc wants to merge 2 commits into
Hmbown:mainfrom
encyc:pref/prefix-full-tool-hash

Conversation

@encyc

@encyc encyc commented Jun 1, 2026

Copy link
Copy Markdown
Contributor

Refs #2264 (Phase 1.5 — conservative path).

Summary

Upgrades PrefixFingerprint::compute() to hash the full tool JSON
serialization (name + description + input_schema) instead of just
tool names. Follows the same tool_catalog_digest approach from
the Phase 1 prompt_zones module.

Before: tool hash = SHA-256("read_file,write_file")
After: tool hash = SHA-256(joined sorted JSON blobs)

What changed

  • PrefixFingerprint::compute(): serialize each tool via serde_json::to_string,
    sort by JSON text for deterministic ordering, join with \n
  • Doc comment updated on tools_sha256 field
  • New test: fingerprint_detects_schema_change_not_just_name_change
    — verifies that changing a tool's description produces a different hash

Files

File Change
crates/tui/src/prefix_cache.rs +24 / −6

Testing

  • 21 prefix_cache tests pass (21/21)
  • New test confirms schema change detection

Greptile Summary

This PR hardens PrefixFingerprint::compute() by replacing the tool-name-only hash with a hash over the full API-shaped tool JSON (name + description + input_schema + strict), catching schema and description drift that the old approach missed. A new private helper tool_to_api_json mirrors the structure of tool_to_chat in the client layer, and a new test confirms that changing a tool's description produces a different tools_sha256.

  • PrefixFingerprint::compute() now sorts and hashes full serialized tool JSON blobs instead of just joined tool names, consistent with the tool_catalog_digest pattern in prompt_zones.
  • tool_to_api_json intentionally excludes internal-only fields (allowed_callers, defer_loading, input_examples, cache_control) that are stripped before the API call.
  • The helper does not apply the to_api_tool_name() encoding used by tool_to_chat; for tools whose names contain special characters the fingerprint JSON and wire JSON differ, though drift detection remains correct because the encoding is injective.

Confidence Score: 5/5

Safe to merge; the change is additive and all 21 existing prefix_cache tests continue to pass alongside the new schema-change detection test.

The core drift-detection logic is correct: because to_api_tool_name is an injective transformation, hashing raw tool names still detects every change that would affect the wire bytes. The only finding is a doc comment that overstates the fidelity of the serialisation; it does not affect runtime behaviour.

No files require special attention beyond the documentation note on tool_to_api_json.

Important Files Changed

Filename Overview
crates/tui/src/prefix_cache.rs Upgrades PrefixFingerprint::compute() to hash full tool JSON (name + description + input_schema) via a new tool_to_api_json helper, adds a test verifying schema-change detection; tool name is serialised raw rather than through to_api_tool_name(), so the fingerprint bytes diverge from the actual wire bytes for tools with special characters in their names.

Fix All in Codex Fix All in Claude Code Fix All in Cursor

Reviews (2): Last reviewed commit: "fix: hash only API-visible tool fields, ..." | Re-trigger Greptile

Phase 1.5 — upgrade PrefixFingerprint::compute() to hash the full tool
JSON serialization (name + description + schema) instead of just tool
names. This catches schema/description drift in addition to name changes.

- Serialize each tool via serde_json::to_string, sort by name, join
- New test: fingerprint_detects_schema_change_not_just_name_change
- All 21 prefix_cache tests pass
- Aligned with prompt_zones.rs tool_catalog_digest approach
@gemini-code-assist

Copy link
Copy Markdown
Contributor

Warning

You have reached your daily quota limit. Please wait up to 24 hours and I will start processing your requests again!

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: be5dc46fb7

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread crates/tui/src/prefix_cache.rs Outdated
let joined = tool_names.join(",");
let mut serialized: Vec<String> = tools
.iter()
.filter_map(|t| serde_json::to_string(t).ok())

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Hash the serialized API tool payload

When tool metadata changes in fields that are not sent to the chat API, this now reports a tool-prefix drift even though the request prefix is byte-for-byte unchanged. I checked the chat request builders: they serialize tools through tool_to_chat_for_base_url, which keeps only the function name, description, parameters, and sometimes strict; fields like allowed_callers, defer_loading, input_examples, and cache_control are internal-only. Hashing the whole internal Tool here will make sessions re-pin and show cache invalidation for changes to those fields that cannot affect DeepSeek's prefix cache.

Useful? React with 👍 / 👎.

Comment thread crates/tui/src/prefix_cache.rs Outdated
Addresses chatgpt-codex review: the previous full serde_json::to_string
included internal-only fields (allowed_callers, defer_loading,
input_examples, cache_control) that are never sent to the chat API.
This caused spurious drift detection when those fields changed.

- New tool_to_api_json() helper mirrors tool_to_chat() serialization:
  only type, name, description, parameters, strict
- Doc comment fixed: 'sorted by name' → 'sorted lexicographically
  by JSON text' (greptile review)
@Hmbown

Hmbown commented Jun 1, 2026

Copy link
Copy Markdown
Owner

Thanks @encyc — harvested the PrefixFingerprint hardening into main for v0.8.49 in 37cfd97 and c22b60c, preserving the #2264 follow-up trail. Closing this PR as integrated and credited in the changelog.

@Hmbown Hmbown closed this Jun 1, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants