[SPEC] Annotations for MCP Requests and Responses (security/privacy) #711

@SamMorrowDrums

There have been lots of concerns about Indirect Prompt Injection attacks and data exfiltration. This proposal attempts to address some parts of it that MCP itself can improve.

Motivation and Context

As MCP adoption grows, the need for robust, composable trust and sensitivity controls becomes critical. Today, MCP clients and servers have limited means to track, propagate, and enforce trust boundaries on data as it flows through tool invocations, especially in multi-organization and open-world scenarios. This is a preliminary proposal, and a full RFC will follow if the community likes the direction. This RFC proposes a standard for trust annotation metadata, enabling both clients and servers to:

  • Mark data as sensitive, dangerous, or originating from untrusted sources
  • Track attribution and provenance of all context shared in a session
  • Enforce policies (e.g., block, escalate, or require confirmation) based on trust annotations

Proposal

Optional Trust Annotation Metadata

MCP tools (servers) MAY emit the following annotations in responses, and MUST respect them in requests:

  • privateHint: Indicates the data is internal or private to an organization, but not necessarily highly sensitive (e.g., internal documentation, non-public but low-risk data).
  • sensitiveHint: Indicates the presence of sensitive data, with more granular levels (e.g., sensitiveHint: low|medium|high). For example:
    • low: Internal-only, not for public release but not highly confidential
    • medium: Confidential, may include customer data or intellectual property
    • high: Highly sensitive, regulated, or secret even within a context (e.g., credentials, secrets, regulated PII)
  • openWorldHint: Indicates data more likely to be subject to prompt injection or originating from a public/untrusted source. The key distinction between this and the Tool schema hint of the same name is that this one describes where the data may originate, rather than whether a tool has unbounded access to data.
  • maliciousActivityHint: Indicates detected or suspected malicious activity (e.g., leaked secrets, prompt injection, or other attacks)
  • attribution: List of source attributions for all data included in the response (e.g., resource URIs, local files, etc.)
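
The annotation set above could be modeled roughly as follows. This is only a sketch: the field names come from the list above, but the proposal does not fix concrete types or a wire format, so the boolean/enum/string-array types, the `TrustAnnotations` and `SensitivityLevel` names, and the `isSensitivityLevel` helper are all assumptions made for illustration.

```typescript
// Hypothetical shape for the proposed annotations. Field names follow the
// proposal; the concrete types are assumptions, not part of any MCP schema.
type SensitivityLevel = "low" | "medium" | "high";

interface TrustAnnotations {
  privateHint?: boolean;           // internal/private, not necessarily sensitive
  sensitiveHint?: SensitivityLevel; // granular sensitivity level
  openWorldHint?: boolean;         // data may come from a public/untrusted source
  maliciousActivityHint?: boolean; // detected or suspected malicious activity
  attribution?: string[];          // e.g. resource URIs or local file paths
}

// Narrow an arbitrary value to one of the three proposed levels.
function isSensitivityLevel(value: unknown): value is SensitivityLevel {
  return value === "low" || value === "medium" || value === "high";
}

// Example: annotations a server might attach to content read from a
// private repository (the URI is made up for the example).
const example: TrustAnnotations = {
  privateHint: true,
  sensitiveHint: "medium",
  attribution: ["file:///workspace/README.md"],
};
```

All fields are optional here so that a server which knows nothing about a piece of data can simply omit the annotation rather than assert a value.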

Annotation Responsibility

MCP servers are primarily responsible for emitting trust and sensitivity annotations because they have the most accurate context about their own data, resources, and operations. Servers often know whether a resource is internal, private, or sensitive, and can best determine the appropriate annotation for each response or result. This aligns with the MCP architecture, where servers are isolated and only receive the context necessary for their operation (see Architecture).

However, clients can and should also set or propagate annotations based on local knowledge or user actions outside of MCP. For example, a host application like VSCode may know the user is working in a private repository, and if the LLM context now includes file content from that repo, the client should set privateHint or an appropriate sensitiveHint level on all subsequent requests—even if the server does not explicitly mark the data as such. This ensures that trust boundaries are respected even when context is aggregated from multiple sources or user actions.

This dual responsibility enables both servers and clients to contribute to a more accurate and secure trust model, leveraging their unique perspectives and local knowledge.

Propagation

  • If any annotation (e.g., sensitiveHint) is ever set in a session, it MUST be included in all subsequent requests in that session.
  • For list/search results, annotations MAY be specified per result.
  • Annotations MAY be collected by clients to present to a user as part of a manual confirmation intervention.
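
One way a client might implement the "sticky" propagation rule is to fold each response's annotations into a per-session aggregate and attach that aggregate to every later request. The sketch below assumes the hypothetical `TrustAnnotations` shape shown in the comments; keeping the highest `sensitiveHint` level seen so far and de-duplicating `attribution` are assumptions, since the proposal only requires that an annotation, once set, keeps being sent.

```typescript
type SensitivityLevel = "low" | "medium" | "high";

// Hypothetical shape; the proposal does not define concrete types.
interface TrustAnnotations {
  privateHint?: boolean;
  sensitiveHint?: SensitivityLevel;
  openWorldHint?: boolean;
  maliciousActivityHint?: boolean;
  attribution?: string[];
}

const LEVEL_RANK: Record<SensitivityLevel, number> = { low: 1, medium: 2, high: 3 };

// Assumption: when two levels meet, the session keeps the higher one.
function maxLevel(a?: SensitivityLevel, b?: SensitivityLevel): SensitivityLevel | undefined {
  if (a === undefined) return b;
  if (b === undefined) return a;
  return LEVEL_RANK[a] >= LEVEL_RANK[b] ? a : b;
}

// Concatenate two attribution lists and drop duplicates, preserving order.
function union(a: string[] = [], b: string[] = []): string[] {
  return a.concat(b).filter((uri, i, all) => all.indexOf(uri) === i);
}

// Fold newly received annotations into the session aggregate.
// Once a hint is set it is never cleared for the rest of the session.
function mergeAnnotations(session: TrustAnnotations, incoming: TrustAnnotations): TrustAnnotations {
  const merged: TrustAnnotations = {};
  if (session.privateHint || incoming.privateHint) merged.privateHint = true;
  if (session.openWorldHint || incoming.openWorldHint) merged.openWorldHint = true;
  if (session.maliciousActivityHint || incoming.maliciousActivityHint) {
    merged.maliciousActivityHint = true;
  }
  const level = maxLevel(session.sensitiveHint, incoming.sensitiveHint);
  if (level !== undefined) merged.sensitiveHint = level;
  const sources = union(session.attribution, incoming.attribution);
  if (sources.length > 0) merged.attribution = sources;
  return merged;
}
```

A client would call `mergeAnnotations` after every response (or once per result for list/search responses) and send the returned aggregate with the next request. Because the merge never clears a hint, starting a new session is the only way to reset the aggregate in this sketch.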

Enforcement

  • MCP clients MAY enforce basic rules (e.g., block, escalate, or require user confirmation for dangerous actions).
  • MCP servers MAY enforce nuanced policies (e.g., refuse to write to an open world resource if sensitiveHint is set).
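
As an illustration of the kind of rule a client or server could apply, the sketch below maps session annotations and a pending action to one of three outcomes. The `PendingAction` type, the `decide` function, and the specific thresholds (block at `high`, confirm otherwise) are hypothetical; the proposal deliberately leaves the actual policy to implementers.

```typescript
type SensitivityLevel = "low" | "medium" | "high";

// Hypothetical shape; the proposal does not define concrete types.
interface TrustAnnotations {
  privateHint?: boolean;
  sensitiveHint?: SensitivityLevel;
  openWorldHint?: boolean;
  maliciousActivityHint?: boolean;
}

type Decision = "allow" | "confirm" | "block";

// Hypothetical description of the action about to run.
interface PendingAction {
  writesToOpenWorld: boolean; // e.g. posting to a public or external resource
}

// Example policy only: thresholds are assumptions chosen for illustration.
function decide(action: PendingAction, session: TrustAnnotations): Decision {
  if (action.writesToOpenWorld) {
    // Highly sensitive context must never leave through an open-world write.
    if (session.sensitiveHint === "high") return "block";
    // Lower levels, or merely private data, go to a human for confirmation.
    if (session.sensitiveHint !== undefined || session.privateHint) return "confirm";
  }
  // Suspected malicious activity always gets a human in the loop.
  if (session.maliciousActivityHint) return "confirm";
  return "allow";
}
```

The decision is computed by ordinary code from the tracked annotations, so it does not depend on the model noticing or honoring the hints.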

Critical Recommendation: Malicious Activity Detection

If no other part of this RFC is adopted into the MCP specification, the inclusion of maliciousActivityHint is essential. As MCP tools and servers begin to implement detection for malicious activity (such as prompt injection, secret leakage, or other attacks), there must be a standardized way for servers to communicate these findings to clients and end users. This enables:

  • User and admin alerting for compliance and security scenarios
  • Human-in-the-loop review and escalation
  • Auditing and reporting of suspicious or non-compliant actions

Without this, critical security signals may be lost between tool boundaries, reducing the effectiveness of any detection or compliance system built on MCP.

Example: Email MCP

  • Internal/external recipients are annotated.
  • Refuse to send private repo content to an external address if openWorldHint is true.
  • Prevent writing to a dangerous (external) address when sensitiveHint is true.
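
A minimal sketch of how a hypothetical email server might apply these rules follows. It reads "external address" as any recipient whose domain is not on a configured list of internal domains, and treats such recipients as the open-world destination; that reading, along with the `isExternal` and `checkSend` names and the `SendResult` shape, is an assumption for illustration.

```typescript
// Hypothetical shape; only the fields this check needs are listed.
interface TrustAnnotations {
  privateHint?: boolean;
  sensitiveHint?: "low" | "medium" | "high";
}

interface SendResult {
  allowed: boolean;
  externalRecipients: string[]; // recipients annotated as external
  reason?: string;
}

// A recipient is external when its domain is not in the internal list.
function isExternal(address: string, internalDomains: string[]): boolean {
  const at = address.lastIndexOf("@");
  const domain = at >= 0 ? address.slice(at + 1).toLowerCase() : "";
  return internalDomains.indexOf(domain) === -1;
}

// Refuse to send when the session carries private or sensitive context
// and at least one recipient is external.
function checkSend(
  recipients: string[],
  internalDomains: string[],
  session: TrustAnnotations
): SendResult {
  const externalRecipients = recipients.filter((r) => isExternal(r, internalDomains));
  const carriesProtectedData = session.privateHint === true || session.sensitiveHint !== undefined;
  if (externalRecipients.length > 0 && carriesProtectedData) {
    return {
      allowed: false,
      externalRecipients,
      reason: "session contains private or sensitive data and a recipient is external",
    };
  }
  return { allowed: true, externalRecipients };
}
```

Returning the list of external recipients together with the refusal lets the client show the user exactly which addresses triggered it, or ask for confirmation instead of refusing outright.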

Deterministic Tracking

  • All context and annotations are tracked and used by deterministic systems (client/server), not the model itself.

User Experience

  • Instead of flat-out prevention, dangerous actions MAY trigger user confirmation, even if previously allowed.
  • Trust in all MCP servers used is required for useful enforcement; malicious servers are not in scope for this RFC.

Sequence Diagram

sequenceDiagram
    participant User
    participant MCP Client
    participant MCP Server (A)
    participant MCP Server (B)

    User->>MCP Client: Initiate tool call (may include context)
    MCP Client->>MCP Server (A): Request (with context, trust annotations)
    MCP Server (A)-->>MCP Client: Response (with trust annotations)
    MCP Client->>MCP Server (B): Request (with aggregated trust annotations)
    MCP Server (B)-->>MCP Client: Response (may refuse, escalate, or require confirmation)
    MCP Client->>User: Escalate or require confirmation if policy triggers

Problems and Limitations

While this RFC provides a foundation for trust and sensitivity annotations, there are important limitations and open problems:

  • Divergent Definitions: Different MCP servers may have different definitions of "internal", "external", "private", and "sensitive". For example, what is considered internal in one organization may be external in another, and sensitivity levels may not be consistent across servers.
  • Session Risk Accumulation: As a session accumulates more context from various sources, the risk profile may increase. This RFC enables more human-in-the-loop notifications as risk grows, and encourages users to start a new session to reduce scope when possible.
  • Contextual Appropriateness: Some actions may be appropriate in one context but not another (e.g., emailing the CEO about layoffs vs. sending that information to a direct report). The RFC does not attempt to encode all such policies, but provides the primitives for host applications and registries to implement them.
  • Deliberate Incompleteness: This RFC intentionally avoids specifying a single, universal policy. Instead, it provides the bones for limited registries, host applications, and situation-specific tools to define and react to annotations as appropriate for their environment.
  • Annotation Sharing: In some cases, it may not be desirable to share annotations across MCP servers. Implementers should consider privacy and security implications when propagating annotation metadata.

This RFC is a toolkit, not a complete solution. It is designed to enable, not constrain, the development of robust trust and compliance systems in MCP.

Implementation Guidance

This RFC is intentionally minimal and flexible. It is designed to enable host applications, registries, and other tools to define their own policies and enforcement strategies, rather than prescribing a one-size-fits-all solution. The goal is to provide the necessary primitives for trust and sensitivity awareness, while allowing for situation-specific adaptation and human oversight.

Implementation Considerations

  • Annotations should be optional and settable by tool authors (MCP server developers)
  • Clients MUST propagate and enforce annotations
  • UI/UX should reflect annotation state (e.g., warnings, confirmation dialogs)
  • Policy enforcement should be configurable

Open questions:

  • Can an MCP server reliably discover the sensitivity of an item?
  • Can an MCP server reliably discover the security posture that applies to the audience?
