-
Notifications
You must be signed in to change notification settings - Fork 1.2k
Description
| SEP Number | #1561 |
|---|---|
| Title | Addition of unsafeOutputHint Tool Annotation |
| Author | Xiang Yan |
| Status | Proposal |
| Created | 2025-09-28 |
| Specification | MCP 2025-06-18 |
Abstract
This SEP proposes the addition of a new unsafeOutputHint annotation to the Model Context Protocol (MCP) specification. This hint indicates whether the output of a tool is considered safe for direct injection into a model’s prompt. To preserve backward compatibility with existing tools and clients, the default value of this hint will be false.
Summary
The unsafeOutputHint allows tool authors to explicitly communicate whether their tool’s return values can be safely passed into a model context without risk of including untrusted or uncontrolled data.
This differs from:
trustedHint (which marks the tool itself as safe to call) (SEP #1487), and
secretHint (which marks tool output as potentially sensitive) (SEP #1560).
Instead, unsafeOutputHint answers: “Can this output be safely injected into a model’s prompt?”
Motivation
Currently, MCP has no way to signal whether tool output is safe for direct use as model input. Without this annotation:
Clients may assume all outputs are safe to inject, even when they include uncontrolled or user-supplied content (e.g., database query results, scraped web data).
Tools that produce deterministic or internally generated outputs cannot clearly indicate that their data is inherently safe.
Developers and users lack a standard, protocol-level signal to guide safe prompt construction.
By introducing unsafeOutputHint, MCP enables explicit differentiation between clean, controlled outputs and potentially unsafe ones.
Proposal
Extend the ToolAnnotations interface with a new optional field:
interface ToolAnnotations {
destructiveHint?: boolean
idempotentHint?: boolean
openWorldHint?: boolean
readOnlyHint?: boolean
title?: string
trustedHint?: boolean //SEP: 1487
secretHint?: boolean //SEP: 1560
unsafeOutputHint?: boolean // NEW: Indicates whether tool output can be safely injected into model context
}Default: unsafeOutputHint = false.
When explicitly set to false, clients should treat the tool’s output as potentially unsafe and apply protective measures (e.g., sanitization, filtering, user confirmation, or exclusion from prompt injection).
Rationale
Choosing false as the default preserves backward compatibility:
Existing tools implicitly behave as if their outputs were safe for context injection.
Changing the default to false would cause existing clients to suddenly treat all tool outputs as unsafe, potentially breaking workflows.
At the same time, tool developers who know their outputs may contain uncontrolled or user-supplied data can explicitly set unsafeOutputHint = true.
Backwards Compatibility
Tools not declaring this hint will default to true, maintaining compatibility with current assumptions.
Clients unaware of this hint will ignore it, as with other annotations.
Tools that need to mark their outputs as unsafe can explicitly override the default.
Implementation
The MCP specification should be updated to:
Add unsafeOutputHint to the ToolAnnotations interface.
Define its semantics:
Default = false (for backward compatibility).
Explicit true = output may be unsafe for direct model injection.
Provide examples:
false: math solvers, date calculators, deterministic formatters.
true: database queries, search tools, user-input collectors.
Discussion
This SEP introduces a safety annotation for prompt injection concerns, filling a gap between trustedHint (tool safety) and secretHint (data sensitivity). While hints remain advisory rather than guarantees, unsafeOutputHint provides a protocol-level mechanism to reduce risks from uncontrolled outputs.
Metadata
Metadata
Assignees
Labels
Type
Projects
Status