Data Scrubbing
Data handling is the standardized context for how we want SDKs to help users filter data.
SDKs should not include PII or other sensitive data in the payload by default. When building an SDK, we may come across an API that provides useful information for debugging a problem. If that API returns data considered PII, we guard it behind a flag called Send Default PII. This is an SDK option called send-default-pii and is disabled by default, which means that naturally sensitive data is not sent by default.
When a user manually sets the data on the scope (user, contexts, tags, data, request, response, etc.), this data should not be gated by the Send Default PII flag and should always be attached to all outgoing telemetry. This also applies to the data that the user manually sets on a span, log, metric and other types of telemetry (directly or, for example, via BeforeSend).
Certain sensitive data must never be sent through SDK instrumentation, regardless of any configuration:
- HTTP Headers: The keys of known sensitive headers are added, while their values must be replaced with "[Filtered]". The SDK performs a partial, case-insensitive match of header names against the following terms to determine if they are sensitive:
["auth", "token", "secret", "password", "passwd", "pwd", "key", "jwt", "bearer", "sso", "saml", "csrf", "xsrf", "credentials"]
SDKs should only replace sensitive data with "[Filtered]" when the data is gathered automatically through instrumentation. If a user explicitly provides data (for example, by setting a request object on the scope), the SDK must not modify it.
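The header rule above can be sketched as follows. This is a minimal illustration rather than any SDK's actual implementation: the function names `is_sensitive_header` and `scrub_instrumented_headers` are hypothetical, and the sketch assumes headers arrive as a plain name-to-value mapping gathered by instrumentation.

```python
from typing import Dict

# Terms taken from the list above; matching is partial and case-insensitive.
SENSITIVE_HEADER_TERMS = [
    "auth", "token", "secret", "password", "passwd", "pwd", "key",
    "jwt", "bearer", "sso", "saml", "csrf", "xsrf", "credentials",
]
FILTERED = "[Filtered]"


def is_sensitive_header(name: str) -> bool:
    """Return True if any sensitive term occurs anywhere in the header name."""
    lowered = name.lower()
    return any(term in lowered for term in SENSITIVE_HEADER_TERMS)


def scrub_instrumented_headers(headers: Dict[str, str]) -> Dict[str, str]:
    """Keep every header key, but replace the values of sensitive headers.

    Hypothetical helper: intended only for headers collected automatically.
    Headers a user set explicitly on the scope should bypass it entirely.
    """
    return {
        name: FILTERED if is_sensitive_header(name) else value
        for name, value in headers.items()
    }
```

Because the match is partial, a header such as `X-Api-Key` is filtered by the `key` term even though no term equals the full header name.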
Some examples of data guarded by send_default_pii: false:
- When attaching data of HTTP requests and/or responses to events
- Request Body: "raw" HTTP bodies (bodies which cannot be parsed as JSON or FormData) are removed
- HTTP Headers: header values containing information about the user are replaced with "[Filtered]"
- User-specific information (e.g. the current user ID according to the web framework in use) is not collected and therefore not sent at all.
- On desktop applications
- The username of the user logged in to the device is not included. This is often a person's name.
- The machine name is not included, for example Bruno's laptop
- SDKs don't set {{auto}} as user.ip_address. Setting this value instructs the server to keep the connection's IP address.
- Server SDKs remove the IP address of incoming HTTP requests.
The Sentry server is always aware of the connecting IP address and can use it for logging on some platforms, namely JavaScript and iOS/macOS/tvOS. All other platforms require the event to include user.ip_address={{auto}}, which happens if sendDefaultPii is set to true.
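A minimal sketch of the IP address behavior described above, assuming the event is a plain dictionary with an optional `user` entry. The function name `apply_default_ip` is hypothetical and not part of any SDK.

```python
from typing import Any, Dict

AUTO_IP = "{{auto}}"  # marker asking the server to keep the connection's IP


def apply_default_ip(event: Dict[str, Any], send_default_pii: bool) -> Dict[str, Any]:
    """Set user.ip_address to {{auto}} only when Send Default PII is enabled.

    Hypothetical helper. It mutates and returns the event. An ip_address the
    user set explicitly is left untouched regardless of the flag.
    """
    user = event.get("user") or {}
    if "ip_address" in user:
        return event  # user-provided data is never gated or modified
    if send_default_pii:
        event["user"] = {**user, "ip_address": AUTO_IP}
    return event
```

With the flag disabled, the event is returned without any `ip_address`, so the SDK never asks the server to retain the connecting address.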
Before sending events to Sentry, SDKs should invoke callbacks. That allows users to remove any sensitive data client-side.
before-send and event-processors can be used to register a callback with custom logic to remove sensitive data.
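As an illustration, a user-defined callback might look like the sketch below. The function name `strip_sensitive_fields` and the specific fields it removes are assumptions chosen for the example. The `(event, hint)` signature follows the Sentry Python SDK's `before_send` option, where returning the event sends it and returning `None` drops it.

```python
from typing import Any, Dict, Optional


def strip_sensitive_fields(
    event: Dict[str, Any], hint: Dict[str, Any]
) -> Optional[Dict[str, Any]]:
    """Example callback: remove data the application considers sensitive.

    The fields below (user.email, request.data.password) are illustrative
    assumptions; a real callback would target whatever the app must not send.
    """
    user = event.get("user")
    if isinstance(user, dict):
        user.pop("email", None)

    request = event.get("request") or {}
    data = request.get("data")
    if isinstance(data, dict) and "password" in data:
        data["password"] = "[Filtered]"

    return event  # returning None instead would discard the event


# In the Python SDK this would be registered roughly as:
#   sentry_sdk.init(before_send=strip_sensitive_fields)
```

Because this callback is written by the user, it may modify any part of the event, including data the SDK itself would never alter.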
Since Cookie and Set-Cookie headers can contain a mix of sensitive and non-sensitive data, SDKs should parse the cookie header and filter values on a per-key basis, depending on the SDK setting and the sensitivity of the cookie value. If the SDK cannot parse each cookie key-value pair, the entire cookie header must be replaced with "[Filtered]". An unfiltered, raw cookie header value must never be sent.
This selective filtering prevents capturing sensitive data while retaining harmless contextual information for debugging. For example, a sensitive session cookie's value is replaced with "[Filtered]", but a non-sensitive cookie for the theme preference can be sent as-is.
When attached as span attributes, the results should be as follows:
- http.request.header.cookie.user_session: "[Filtered]"
- http.request.header.cookie.theme: "dark-mode"
- http.request.header.set_cookie.theme: "light-mode"
- http.request.header.cookie: "[Filtered]" (Used as a fallback if the cookie header cannot be parsed)
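The per-key cookie filtering could be sketched as below for a Cookie request header. Several things here are assumptions: the function name `cookie_span_attributes`, the term list `SENSITIVE_COOKIE_TERMS` (the text does not specify how cookie sensitivity is decided), and the simple `name=value; name=value` parsing, which does not cover Set-Cookie attributes such as `Path` or `HttpOnly`.

```python
from typing import Dict

FILTERED = "[Filtered]"
# Hypothetical term list; how an SDK decides cookie sensitivity is not
# specified in the text above.
SENSITIVE_COOKIE_TERMS = ["session", "auth", "token", "secret", "csrf", "xsrf", "jwt"]


def cookie_span_attributes(
    raw_cookie: str, prefix: str = "http.request.header.cookie"
) -> Dict[str, str]:
    """Turn a Cookie header into per-cookie span attributes.

    Sensitive cookie values are replaced with "[Filtered]". If any pair
    cannot be parsed, the whole header collapses to a single filtered
    attribute so the raw value is never emitted.
    """
    attributes: Dict[str, str] = {}
    for pair in raw_cookie.split(";"):
        pair = pair.strip()
        if not pair:
            continue  # tolerate trailing or doubled separators
        name, separator, value = pair.partition("=")
        name = name.strip()
        if not separator or not name:
            return {prefix: FILTERED}  # unparseable: filter everything
        sensitive = any(term in name.lower() for term in SENSITIVE_COOKIE_TERMS)
        attributes[f"{prefix}.{name}"] = FILTERED if sensitive else value.strip()
    return attributes
```

The fallback returns early on the first malformed pair, which keeps the guarantee that no part of an unparsed header leaks through.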
App state can be critical to help developers reproduce bugs. For that reason, SDKs often collect app state and append it to events through auto instrumentation.
When attaching data that could potentially include sensitive data or PII, it's important to:
- Add a note in the docs to notify developers.
- Mark that part of the protocol on Relay as such. This allows data scrubbing to run on those fields.
Some examples of auto instrumentation that could attach sensitive data:
- A SQL integration that includes the query. If a user doesn't use parameterized queries, and appends sensitive data to it, the SDK could include that in the event payload.
- Desktop apps including window title.
- A web framework routing instrumentation attaching route to and from.
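To illustrate the SQL case from the list above, the sketch below contrasts the query text an integration would observe with and without parameterized queries. It uses Python's standard `sqlite3` module; the table and values are made up for the example, and no SDK instrumentation is involved.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT)")
conn.execute("INSERT INTO users (email) VALUES (?)", ("jane@example.com",))

email = "jane@example.com"

# String interpolation: the sensitive value becomes part of the query text,
# so an integration recording the query would capture it.
interpolated_query = f"SELECT id FROM users WHERE email = '{email}'"

# Parameterized query: the query text contains only a placeholder, and the
# value travels separately as a bound parameter.
parameterized_query = "SELECT id FROM users WHERE email = ?"
rows = conn.execute(parameterized_query, (email,)).fetchall()
```

Only the interpolated form embeds the email address in the string itself, which is why a SQL integration that attaches the query can end up sending sensitive data.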