Sentry Blog

Works on my machine: how we use AI to reproduce reported bugs

Mon, 08 Jun 2026 09:00:00 GMT

Sentry's SDK teams maintain and support SDKs for a vast ecosystem of languages and frameworks. See our release registry for a source of truth. We're currently at 159 published packages across the entire ecosystem. If you use it, we probably support it.

All of these SDKs are open source and have their own GitHub repositories that we maintain on a daily basis. And like any other open source project, we get tons of bug reports and issues on these.

In this post, I'll talk about a Claude skill we've been leveraging to help make our reproduction flow smoother and reduce triage time and fatigue.

Bug triage flow

Sometimes bugs are easy to fix: a missing null check, a missing conditional branch, or some other small oversight.

Other times, they aren't so easy for a plethora of reasons:

Tedious setup, or "boilerplate", just to get the environment ready
Esoteric code paths
Legacy versions
Edge case interactions no one thought of
Data races and other concurrency problems
Forked libraries with different contracts

Boilerplate

Particularly for our SDK bugs, the boilerplate factor is quite annoying. Let's take a recent example. To reproduce it, we would need to set up the following:

A Python venv with the correct version
A new Django boilerplate app with the correct version
A Sentry SDK with the correct version
Create a Django View that reproduces and showcases the exact problem which is applicable only to HTTPS proxies
Run everything, trigger the view and hope that it shows the problem in question

All of this is necessary just to acknowledge that the problem the original user reported is real and replicable. Once reproduced, it's typically much easier to roll out the actual fix.

Reproduction papertrail

Another recurring discussion within the teams was how to keep track of all these one-off boilerplate apps that we used to test SDK logic, and reproduce/fix problems.

Ideally we would have a shared repository of these apps with backlinks to the issues, but no one wanted the burden of maintaining yet another collection of apps on top of everything else we already do. Several SDK engineers had their own ad-hoc collection of apps they used for their day-to-day SDK development.

`repro` skill + repository

Enter LLMs. Turns out LLMs are pretty good at doing some of the tedious stuff mentioned above.

Even if they cannot get to the root of a hairy problem, they at least set up the boilerplate and give me a playground with all the correct parameters which I can move forward with, massively reducing tedium.

So I wrote up and iterated on a Claude skill that:

Takes a GitHub issue URL as input
Parses the SDK language and issue number
Gathers metadata on language version, framework version, SDK version
Makes a new directory and branch from the language/issue-number
Attempts to create a minimal reproduction using standard tooling for the language (uv, npm, bundle, etc.)
Tries to run the reproduction, bails out if it's too complicated
Writes up clear instructions for running the reproduction
Makes a PR
Optionally adds a backlink to the PR to the original user issue (using Claude's AskUserQuestion tool)
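
The first few steps in that list are mechanical enough to sketch in plain Python. This is an illustration only, not the skill itself (which is a Markdown file of instructions): it assumes SDK repositories are named sentry-<language>, and the function names and the issue number in the example are hypothetical.

```python
import re
from dataclasses import dataclass
from urllib.parse import urlparse


@dataclass(frozen=True)
class IssueRef:
    owner: str
    repo: str
    number: int

    @property
    def language(self) -> str:
        # Assumption: SDK repos follow the "sentry-<language>" naming pattern.
        return self.repo.removeprefix("sentry-")

    @property
    def branch(self) -> str:
        # Directory and branch are both derived from language/issue-number.
        return f"{self.language}/{self.number}"


def parse_issue_url(url: str) -> IssueRef:
    """Split a GitHub issue URL into owner, repo, and issue number."""
    parsed = urlparse(url)
    match = re.fullmatch(r"/([^/]+)/([^/]+)/issues/(\d+)/?", parsed.path)
    if parsed.netloc != "github.com" or match is None:
        # Bad input: stop early instead of guessing.
        raise ValueError(f"not a GitHub issue URL: {url!r}")
    owner, repo, number = match.groups()
    return IssueRef(owner, repo, int(number))


def gh_issue_view_command(url: str) -> list[str]:
    """Build the gh CLI call that fetches the issue title and body as JSON."""
    return ["gh", "issue", "view", url, "--json", "title,body"]


ref = parse_issue_url("https://github.com/getsentry/sentry-python/issues/1234")
print(ref.language, ref.branch)  # python python/1234
```

Rejecting anything that is not an issue URL up front mirrors the "give the agent an out" idea: a clear failure is more useful than a reproduction built on a wrong guess.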

Note that we only ask the LLM to attempt a reproduction and to stop if it gets too complicated. This sort of logic is very effective when working with agents: if we ask too much of them, they will often stumble. If we give them an out, they're more likely to explain the challenge than to push through it blindly.

Example run on the Python issue

Continuing with the above Python example, the skill created this reproduction. We can see that it created a minimal Django app and gave very clear instructions to run the reproduction. Using this basic setup, I was able to roll out the subsequent fix very rapidly. I probably saved a few hours of figuring out how to set up Django with an HTTPS proxy correctly and then examining how that interacts with our SDK logic.

Lessons on writing skills

Skills are very generic Markdown files so it's a bit opaque how to make them reliable and avoid having them go off the rails.

Some insights I have from writing this one:

Use CLIs to interact with other systems; here we're using the gh CLI to perform GitHub operations
Split out the work to be done into clear steps
Add an Error Handling section explaining what's not allowed and what to do with bad inputs
Use other in-built tools such as AskUserQuestion for user input or validation

Full automation?

We will play around with fully automating this flow on GitHub issues in the future. A major concern voiced by several engineers here is increased bot noise. We're already drowning in bot communication on several fronts so we want to be careful how many of these we enable automatically. The right amount of automation in any given problem space is not always full automation, and a pair of human eyes in the right places is absolutely necessary.

Errors, traces, logs, metrics: when to reach for what

Fri, 05 Jun 2026 09:00:00 GMT

When should I reach for a log, a trace, or a metric? I hit that question constantly when I instrument code, and I watch coding agents hit it too. It sounds like it should be obvious. Errors, traces, logs, and metrics are the four kinds of telemetry most apps run on, four tools in one box, and they overlap enough that the honest answer is every developer's favourite: it depends. You can stuff context into span attributes instead of logging it. You can count log events instead of emitting a metric. You can add a duration to a log and call it a span.

[I had a spiderman meme here but legal told me it would be infringing so I removed it]

But the fact that you can doesn't mean you should. Each signal exists because it answers a different question, and feeds a different workflow once it lands. Left without solid guidelines, the default is to reach for whatever's most familiar or already there, and miss what the other kinds are for.

This post is the guidance I wanted to have, for myself and my robots. Want just the skill? Skip to the end.

In Sentry, errors, traces, logs, and metrics all come from one SDK, included on every plan. Errors and tracing have been around for years (2012 and 2020), structured logs landed last year, and Application Metrics completed the set back in May of this year. If you've had your application instrumented with Sentry for a while, errors and traces are probably already flowing, with logs and metrics left as tools for you to complete your telemetry story.

Errors, traces, logs, metrics: one question each

Errors: "What just broke?"

A stack trace and an exception type, grouped into an Issue that gets deduplicated, assigned, and tracked until it's resolved. If your code threw an exception, it's an error.

Traces: "Did the request flow the way it was supposed to?"

A trace is a waterfall of timed spans. It's how you follow a request across your services and see where the time went: the DB query that dragged, the API call that timed out, the LLM tool call that took 8 seconds instead of 200ms.

Metrics: "How's this trending over time?"

Counters, gauges, and distributions, each kept as an individual measurement you can slice by any attribute and drill from an aggregate back into the samples (and the trace) behind it. Not just "12,000 checkouts this week," but 8,400 from the US, 2,600 from the EU, and 1,000 from everywhere else, and how that line moved across the last deploy. Metrics are a historical signal as much as a right-now one, which makes them an easy candidate for dashboards and alerts (but you can still set up alerts on pretty much all signals from Sentry).

Logs: "What was happening at this point in the code?"

The state of the system at one specific moment, captured as a structured event: config values, feature flags, the inputs and outputs of a function, the user ID. Logs are the trail through a function's decision tree: the markers you drop at the points where the code makes a choice, so that later, a human or an agent can follow the reasoning. They fill in the why once errors and traces have told you what broke and where the time went.

A real(ish) world example

Let's say you run a storefront with a React frontend and a Python API. Support starts forwarding tickets: for a chunk of logged-in customers, the product recommendations on the account page look generic, showing bestsellers rather than the personalized picks they're used to. The vibes are off.

Did anything crash?

First place I'd look is Issues. No exception in the React app, no failed request, every call to /recommendations/{user_id} came back 200. As far as error tracking is concerned, the app is perfectly healthy.

Was anything slow, or did the request go off-path?

Pull a trace for one of the affected requests. The route and the database queries are auto-instrumented; I added a few named spans for the recommendation steps:

The request loaded the user, evaluated the ranking_v2 flag, queried recommendations_v2, fell back to popular items, and ranked them. The path is right and the timing's fine. That recommendations_v2 query succeeded (returning zero rows is a perfectly successful query), so the code did what it was built to do and fell back. The trace tells me the request flowed as designed. It can't tell me the design just quietly failed this user. On the surface, everything is fine.

Can we dig a little deeper?

Search the logs for the user from the ticket, and the structured log from inside the handler will give you the state at the moment it decided to fall back.

This user got bucketed into the ranking_v2 feature flag, which reads personalized picks from a new recommendations_v2 table. The table shipped, but the rows were never backfilled, so the lookup came back empty. To the code, an empty result is a perfectly valid "no personalized recs for this user," the same thing a brand-new user with no history would get. So it falls back to bestsellers and returns 200.

Why not just attach this data on the span? You could set outcome and candidate_count as span attributes. But traces might be sampled, and the one request a customer is complaining about usually ends up being the one that's sampled out (at least with my luck). A span attribute is great for reading a trace you've found; it can't help you find one. Logs aren't sampled.

How many people hit it?

One affected customer is a support ticket. Knowing whether it's a small subset of users or a significant chunk is the difference between fixing it Monday and paging someone tonight. A recommendations.served counter, tagged with ranking_version and outcome, draws the line:

The v2 path is serving almost nothing but fallbacks, v1 is normal, and the drop lines up with the flag rollout. Scope and trigger, without opening a single trace.

No one signal cracked it; each ruled something out. No Issues in the feed meant it wasn't a crash. The metric said it wasn't a one-off: the whole v2 cohort was falling back. The trace, where one was sampled, showed the path running exactly as designed, which is why it slipped through. The log, pulled up by the user_id from the ticket, said why, and I never needed the trace to get to it.

When to reach for what

I use this as a gut check:

What you want to know	Reach for
Something crashed, show the stack trace	Errors
How long did this take? Which step is slow?	Traces
Did the request flow through the steps I expected?	Traces
What was the state when the code made this decision?	Logs
What did this function receive and return?	Logs
How often does X happen? Is the rate normal?	Metrics
Did something change after the deploy?	Metrics

The tricky cases are the overlaps, and of course there is nuance to all of this because the same value can show up in more than one signal.

Span attribute or metric?

If it's context about one request's flow through the system and you want it while reading that trace, it's a span attribute. It rides on the span in the waterfall. If it's a standalone value you want to chart, alert on, or slice over time across all requests, it's a metric. The same number can warrant both: candidate_count as a span attribute lets me read one request; recommendations.served as a metric lets me watch the rate. One is for inspecting a single flow, the other for watching the aggregate.

Log or span?

The span is the timed node in the flow, and most of them are auto-instrumented, so you rarely write them. The log is the decision-point state inside that node, and you always write it on purpose. Span answers where and how long; log answers what was true and why.

Log or metric?

A log is one request's story, the needle. A metric is the aggregate, the question of whether the haystack is normal. When you want to find the specific request that went wrong, that's a log. When you want to know how many requests went wrong, that's a metric.
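
To make the split concrete, here is a small sketch in plain Python with no Sentry SDK involved. The events are made up for illustration and shaped like the walkthrough's log attributes; the log question filters down to one user's record, while the metric question only needs counts.

```python
from collections import Counter

# Made-up events for illustration, shaped like the walkthrough's log attributes.
events = [
    {"user_id": 101, "ranking_version": "v1", "outcome": "personalized"},
    {"user_id": 102, "ranking_version": "v2", "outcome": "fallback"},
    {"user_id": 103, "ranking_version": "v2", "outcome": "fallback"},
    {"user_id": 104, "ranking_version": "v1", "outcome": "personalized"},
    {"user_id": 105, "ranking_version": "v1", "outcome": "fallback"},
]

# Log question: what happened to the user from the ticket? (the needle)
needle = [e for e in events if e["user_id"] == 103]

# Metric question: how often does each outcome occur per version? (the haystack)
served = Counter((e["ranking_version"], e["outcome"]) for e in events)


def fallback_rate(version: str) -> float:
    """Share of requests for one ranking version that ended in a fallback."""
    total = sum(n for (v, _), n in served.items() if v == version)
    return served[(version, "fallback")] / total


print(needle[0]["outcome"])  # fallback
print(f"v1: {fallback_rate('v1'):.0%}, v2: {fallback_rate('v2'):.0%}")  # v1: 33%, v2: 100%
```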

Error or log?

If it needs a stack trace and should be tracked as an Issue, it's an error. If it's an unexpected-but-handled condition worth recording, it's a log. If it's truly non-critical, calling logger.warning with exc_info=True captures the traceback in logs without creating noise in your error feed.

What the instrumentation looks like

Everything above came out of one endpoint: the GET /recommendations/{user_id} route from the walkthrough, the function that loads the user, checks the ranking_v2 flag, queries recommendations_v2, and falls back to popular items when it comes back empty. Here's that same handler with the instrumentation in place.

Most of it you don't write. The FastAPI integration traces the request, the database integration traces every query, so you get the path and the timing without a single hand-written span.

What you do place by hand are the deliberate signals: a span attribute or two to enrich the flow, the decision-point log, and the metric.


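import sentry_sdk
from sentry_sdk import logger

# `app`, `db`, and `flag_enabled` come from the application itself
# (the FastAPI app, the data layer, and the feature-flag helper).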

# The route is auto-instrumented. FastAPI gives you the request span;
# the DB integration gives you a span for every query below. You write none of it.
@app.get("/recommendations/{user_id}")
def get_recommendations(user_id: int):
    user = db.get_user(user_id)                          # auto-instrumented db span
    use_v2 = flag_enabled("ranking_v2", user)
    ranking_version = "v2" if use_v2 else "v1"

    candidates = db.personalized_recs(user_id, version=ranking_version)  # auto db span
    outcome = "personalized" if candidates else "fallback"
    items = candidates or db.popular_items()             # auto db span on the fallback

    # SPAN ATTRIBUTE: context about THIS request's flow, read inside the trace.
    # It rides on the auto-instrumented request span; no new span needed.
    span = sentry_sdk.get_current_span()
    if span is not None:  # None when there is no active span (e.g. tracing disabled)
        span.set_data("ranking_version", ranking_version)
        span.set_data("recommendation.outcome", outcome)

    # LOG: the trail through the decision tree, the state at the moment the
    # code chose personalized vs. fallback. The only signal that records *why*.
    logger.info(
        "recommendations lookup",
        attributes={
            "user_id": user_id,
            "ranking_version": ranking_version,
            "flag.ranking_v2": use_v2,
            "source_table": f"recommendations_{ranking_version}",
            "candidate_count": len(candidates),
            "outcome": outcome,
        },
    )

    # METRIC: the rate across all requests, sliceable by version and outcome.
    sentry_sdk.metrics.count(
        "recommendations.served",
        1,
        attributes={"ranking_version": ranking_version, "outcome": outcome},
    )

    return items

Three deliberate touches, each carrying a piece the others can't. The span attribute tags the request's flow with the ranking path so it's right there when I open the trace. The log records what the function decided and why, at the instant it decided. The metric counts the outcome with enough dimension to slice it later.

If you do want a sub-operation timed in the waterfall (say the ranking step, or a call to an external recommender), you can wrap it in a custom span with sentry_sdk.start_span.

Beyond what you write, the SDK fills in even more on its own. Frontend SDKs tag everything with the browser, OS, and release. Call sentry_sdk.set_user() once and that user follows the errors, spans, logs, and metrics for the request. And because all four come from the same SDK, they share a trace_id and correlate on their own: every log carries the trace it belongs to, and you can jump from a metric spike straight into the traces behind it, without gluing four vendors together to get there.

All of this is ready for you to use and included in every plan. The deliberate signals (the span attributes, the decision-point logs, the metrics) are the ones you place yourself, and they only help if you do it ahead of time, at the spots where your code makes a decision worth questioning later.

Right tool for the job

The split above isn't just conceptual. It's baked into the APIs, and each one is tuned for its job. The Metrics API is built for emitting counts and measures you'll aggregate. The span API is built for measuring durations and the shape of a request. The log API integrates with your favourite structured logging library, so the lines you already write become queryable events. Reaching for the API that matches the workflow usually means reaching for the one that matches the kind of value you have: a count, a duration, or a moment.

Sampling falls out of the same logic. Traces are best as a sampled representation of your traffic: you don't need every request to understand where time goes, so a percentage is plenty (and cheaper). Logs are the opposite: you keep all of them, because the entire point is to find the one rare request that went sideways, and you can't find what you sampled away. Metrics aren't sampled either; as with logs, you filter out what you don't want rather than sampling, in this case with before_send_metric. Match the retention to the question: a representative sample for "where does time go," every single event for "what happened to this request."
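
A toy simulation shows why the two retention rules differ. It is not how Sentry makes sampling decisions; the hash-based keep/drop rule, the 10% rate, and the request IDs are all assumptions picked for illustration.

```python
import hashlib

SAMPLE_RATE = 0.10  # assumed trace sample rate for this sketch


def trace_is_sampled(request_id: str, rate: float = SAMPLE_RATE) -> bool:
    """Deterministic keep/drop decision derived from a hash of the request ID."""
    digest = hashlib.sha256(request_id.encode()).digest()
    bucket = int.from_bytes(digest[:8], "big") / 2**64  # uniform in [0, 1)
    return bucket < rate


request_ids = [f"req-{i}" for i in range(10_000)]

traces_kept = [r for r in request_ids if trace_is_sampled(r)]
logs_kept = list(request_ids)  # logs are not sampled: every request keeps its log

print(f"traces kept: {len(traces_kept) / len(request_ids):.1%}")
print(f"logs kept:   {len(logs_kept) / len(request_ids):.1%}")
```

The kept traces are plenty to see where time goes, but any one specific request has only about a one-in-ten chance of having a trace, while its log is always there.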

You're not the only one debugging your codebase anymore

Cody from Modem instrumented his AI agent to find out where it was spending time. He worked with Codex to wrap the async work and the logical chunks (everything that runs before the call to the model, say) in spans. Cache hits and time-to-first-token became metrics he could watch over time. Values that only meant something next to a specific operation stayed as span attributes, and the lightweight "this happened here" markers became logs. The span-attribute-versus-metric call wasn't always obvious to him; his rule was that if a value only made sense in the context of a span, it lived on the span.

With the tracing in place, he pointed Codex at the Sentry data through the MCP server, feeding it real runs from his Playwright tests in development, and gave it one goal: optimize the code path. The agent read the spans, found work that could run in parallel, and rewrote the code to stop awaiting results until they were actually needed.

It could do that because a trace is a structured dependency tree with timing on every node, a format an agent can reason about directly. Hand it the same information as a stream of log lines and it would have to reconstruct the call graph from timestamps and string matching first.
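
The kind of rewrite described here can be sketched with asyncio. This is not Cody's code: the function names and sleep durations are stand-ins, and the only point is the shape of the change, which is starting independent work early and awaiting each result at the point it is needed.

```python
import asyncio
import time


async def load_context() -> str:
    await asyncio.sleep(0.05)  # stand-in for I/O, e.g. a database read
    return "context"


async def load_tools() -> str:
    await asyncio.sleep(0.05)  # stand-in for a second, independent I/O call
    return "tools"


async def sequential() -> tuple[str, str]:
    # Each await blocks the next call, so the waits add up.
    context = await load_context()
    tools = await load_tools()
    return context, tools


async def concurrent() -> tuple[str, str]:
    # Start both operations right away...
    context_task = asyncio.create_task(load_context())
    tools_task = asyncio.create_task(load_tools())
    # ...and await each result only where it is actually needed.
    context = await context_task
    tools = await tools_task
    return context, tools


def timed(coro_fn) -> tuple[tuple[str, str], float]:
    """Run a coroutine function to completion and return (result, seconds)."""
    start = time.perf_counter()
    result = asyncio.run(coro_fn())
    return result, time.perf_counter() - start


seq_result, seq_seconds = timed(sequential)
con_result, con_seconds = timed(concurrent)
print(f"sequential: {seq_seconds:.3f}s, concurrent: {con_seconds:.3f}s")
```

In a trace waterfall, the sequential version shows up as two spans laid end to end with no data dependency between them, which is the pattern an agent can spot from span timing alone.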

But what about wide events?

There's a popular argument that the four signals are overkill: emit one rich, wide event per request and derive the rest later. It's half right.

Emit wide, absolutely. The best version of any signal is a structured event packed with context (the flag that was on, the user, the inputs and the outputs), not a bare number or a one-line string.

But the shape you emit is the shape you get to work with. One fat event in a columnar store charts fine after the fact, but it can't group itself into a deduplicated Issue, render itself as a waterfall, or fire a real-time alert on a threshold you haven't defined yet. Those are workflows, and each needs its data in a particular shape.

So emit wide, into the signal whose workflow you actually need. That's why the handler emits both a metric and a log: same decision, same trace, two shapes, because watching a rate and reconstructing one request are different jobs.

Getting started

Logs and metrics are the two you probably haven't turned on yet — they’re relatively new to Sentry, and people are still just finding them. Both are included on every plan.

You don't have to wire them up by hand. Point your coding agent at Sentry's setup skills for your stack and it installs the SDK, turns on tracing, logs, and metrics, and drops instrumentation at the decision points. Then aim it at your Sentry data through the MCP server and give it something real: your slowest trace, your newest issue.

Prefer to grab just the decision framework? It's a skill of its own:

npx skills add getsentry/sentry-for-ai --skill sentry-instrumentation-guide

The telemetry you emit to debug is the same telemetry it reads to help.

How we cut build times by two-thirds by deleting our CMS

Thu, 28 May 2026 09:00:00 GMT

At Sentry, we're obsessed with things not breaking. It's kind of our whole deal. But for a while, our own marketing site was testing that obsession.

Much of what you see on sentry.io (the marketing site, blog, open source microsite, etc.) was running on a fleet of legacy Gatsby sites powered by a traditional headless CMS. On paper, it worked. In practice, we were juggling a fragile web of plugins, restrictive schemas, and external API dependencies that loved to fail right when we needed to ship.

So, we did what any sane engineering team would do: we ripped it out and replaced it with Astro, Markdown, and AI-driven automation.

The problem: the "headless" headache

Our old stack was starting to feel like a Rube Goldberg machine.

The build bottleneck: Gatsby's consolidated data layer was convenient, but as our content grew, our build times ballooned to ~14 minutes/build. At an average of 95 builds/day, this ended up being around 22 build hours used daily.
The CMS tax: We managed the content of ~2500 pages in a single CMS instance. We have different page schemas and connected component schemas without the ability to have conditional fields, so we ended up buying a conditional fields plugin to avoid hitting the schema limit. This made for an additional annual subscription on top of our monthly subscription, and scalability was still limited.
External fragility: Every build relied on external CMS and marketing automation system APIs via Gatsby plugins. During the last month before we started the rebuild, the CMS's Gatsby plugin failed 3-5 times a day (an issue that was never resolved, even after we submitted a support ticket), and the marketing automation API also failed multiple times a day due to rate limits (more on that below).

The solution: Astro and the power of "just files"

We migrated the framework to Astro. We chose it because it's built for the modern web — fast by default and incredibly flexible.

Vite-powered speed: Moving to Vite meant our local development and production builds finally felt like they belonged in 2026. We reduced our build times from ~14 minutes to less than 4 minutes, resulting in a savings of ~15.8 build hours daily.
Framework agnostic: Astro lets us use the best tool for the job. If a component works better in React, we use React. If it's a simple static partial, it's just HTML/CSS.
Vercel for the heavy lifting: We offloaded image processing to Vercel, ensuring our assets are optimized without dragging down the build process.
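
The build-hour figures follow directly from the numbers quoted in this post. A quick check, treating "less than 4 minutes" as exactly 4 for a conservative estimate:

```python
BUILDS_PER_DAY = 95
OLD_MINUTES_PER_BUILD = 14
NEW_MINUTES_PER_BUILD = 4  # "less than 4 minutes", rounded up

old_hours = BUILDS_PER_DAY * OLD_MINUTES_PER_BUILD / 60
new_hours = BUILDS_PER_DAY * NEW_MINUTES_PER_BUILD / 60
saved_hours = old_hours - new_hours

print(f"before: {old_hours:.1f} build hours/day")   # before: 22.2 build hours/day
print(f"after:  {new_hours:.1f} build hours/day")   # after:  6.3 build hours/day
print(f"saved:  {saved_hours:.1f} build hours/day")  # saved:  15.8 build hours/day
```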

But the biggest shift wasn't the framework — it was how we handled content. We ditched the headless CMS UI for Markdown and Frontmatter.

AI-native content management (without the SaaS bloat)

Instead of paying for an "AI Add-on" from a CMS provider, we built a direct integration with Claude Skills.

Now, when someone needs to update the site, they don't log into a bloated dashboard. They use a skill-driven workflow that:

Guides the user through a process that precisely updates the Markdown files and Frontmatter directly.
Generates a live preview.
Drafts a Pull Request for review.

Why go custom instead of using a CMS-integrated AI?

Zero dependencies: Content lives in the repo. No API outages mean no failed builds.
Unlimited schemas: With Frontmatter, we define the structure. If we need a new field or schema type, we just add it. No subscription tiers, no restrictions.
The "Sentry" way: For a company with a developer-first culture that values the deeply technical, managing content as code feels right. It's version-controlled, peer-reviewed, and lives right next to the components that render it.

The process: how we did it

Moving a site with ~2500 pages between our marketing site and blog is a massive undertaking. We had a team of 2.5 developers and a two-month window to get it done.

Because of the small team size and large site volume, we relied on Claude Code for much of the coding. Our developers spent the bulk of their time on planning, scoping, and developing requirements, then reviewing the code, directing changes, and fine-tuning the output.

Scoping

This was easier for this project for 2 reasons:

We use a monorepo for these websites so the bots had the full context of what was being built and migrated
We did not implement any net-new design

Since we've been working in the existing codebase for several years, we had some ideas of where we could easily remove code bloat. We took a deeper look at the pages in certain directories to validate those ideas and, based on the content, replaced bespoke pages with templates. As a result, we consolidated ~200 pages into 3 templates, making the site DRY and significantly easier to maintain.

Building with bots

We started with the data. Our headless CMS was already plugged into our Gatsby site and the pre-existing parts of the Astro site, and it provided JSON files for each available schema, so we gave the planning agent the existing CMS schemas and had it duplicate them in Frontmatter. Since we were already connected to the CMS's API, we had an agent swarm pull down the data, map it to Frontmatter files, and download the images, saving them locally either as assets (for all non-meta images that needed optimization) or in the public directory (for SEO images).

Once the data was in place, we gave the planning agents the location of the existing template files in Gatsby, directions on where to place them in the new framework, and the data each template used, and asked the agents to interview us about any missing information. From there, the planning agent would pass along the build tests to general purpose agents for the build, which was passed on to general purpose agents for testing.

After that round, our team would review & fix any regressions, which was followed with AI PR reviews using both Sentry's Seer AI code review tool and Cursor's Bugbot (along with other quality checks built into the repo, including secret scanning and our standard automated tests).

Testing with bots

As part of our development process, we experimented with Claude running visual regression tests with Playwright and a homegrown MCP we built to compare visual elements from our Gatsby site to the new Astro replacement.

The DOM-inspector MCP

Big shoutout to Dylan Coots on our team who built a DOM Inspector MCP that uses Puppeteer (headless Chrome) to connect to a locally running dev server and programmatically inspect, measure, and interact with elements on the page. It was designed to find UI layout issues like spacing shifts, element dimensions, and computed styles that can be passed along to a bot to fix.

Core Architecture

DOMInspector class — the central object that owns the Puppeteer browser/page lifecycle. It has two modes: a fully-owned browser (launched by the class) and a session-managed mode where an external page is passed in via the static fromPage() factory method.
Browser management — launches a headless Chrome instance with memory-constrained flags (--max-old-space-size=256, limited renderer processes) and handles graceful shutdown with a 5-second timeout before force-killing the process by PID if needed.
DOM inspection methods — includes inspectElement() (dimensions + computed CSS), measureDistance() (pixel/rem gap between two elements including which CSS property creates it), measureLayoutShift() (reloads the page and diffs element positions before/after a transition), and debugPage() (scans the DOM for common component patterns when a selector isn't found).
Interaction & navigation — interactWithElement() clicks a selector and measures before/after state; navigateToUrl() navigates to a new URL and clears stale console logs; setViewport() supports named presets (mobile/tablet/desktop/large) or any custom pixel width with auto-calculated height.
Utility methods — screenshot() (full page or scoped to a selector, returned as base64), evaluateJs() (runs arbitrary async JS in the page context), waitForSelector(), getPageContent() (text/HTML/outerHTML with truncation safety), and getConsoleLogs() (buffered, capped at 500 entries).
CLI entrypoint — main() only runs when the file is executed directly (not require()'d), accepts --url and --port flags, and runs a quick inspection of a hardcoded component (.WhoSentYouWrapper) as a smoke test.
extractUrlFromText() — a helper exported alongside the class for parsing localhost URLs out of natural-language strings, suggesting this is meant to be called from an MCP server that receives user text prompts.

What worked for us

The Playwright visual regression tests and the DOM Inspector MCP worked best together. Our visual testing workflow started with Playwright tests for each template to identify visual regressions. The results were passed to another agent, which made the fixes. We followed that with the DOM Inspector MCP to fine-tune elements that were still off after the Playwright-driven fixes. We found the DOM Inspector to be more accurate with smaller, element-based inspections. Even then it wasn't 100%, but it did save us time on fixing tedious styling issues.

Updating content (also with bots)

Since updating content in the CMS was painful, we wanted to make it easy to update content without needing deep technical knowledge of Frontmatter or code in general, so we made some Claude skills for it.

Skills for the command line

For non-developers, understanding git operations (or even being in the terminal) can be intimidating. But, we saw the value of using a PR-based workflow for quality and consistency. So, we made some utility skills to help with this:

/new-branch — this would pull down the main branch on origin to prevent any avoidable merge conflicts, add a prefix to the branch name to know it came from a skill, and avoid any cruft from past branch checkouts from being included in the new PR.
/deploy-local-preview — since starting up a local dev server to preview your work takes a few lines in the terminal, we created a skill that does this for users. The skill will navigate to the site selected, spin up a dev server, and deploy a local preview.

Skills to update content

For each of our page types, we built skills that create a Frontmatter file, ask the user for each field, upload images (with image size checks), call /deploy-local-preview to check the work, and use the GitHub CLI to create a pull request. This provides guardrails to make sure all the required information is given, avoids navigating a cumbersome CMS UI, prevents massive image files from being used (we added a polite reminder to compress to under 250 KB, and the skill won't accept a larger file), and keeps page updates strictly focused on content, not code.
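
The image guardrail amounts to a size check before a file is accepted. A minimal sketch, assuming the limit means 250 × 1024 bytes; the function name and the wording of the message are hypothetical and not the skill's actual implementation.

```python
import os
import tempfile

MAX_IMAGE_BYTES = 250 * 1024  # assumed reading of the 250 KB limit


def check_image_size(path: str, limit: int = MAX_IMAGE_BYTES) -> tuple[bool, str]:
    """Return (accepted, message) for an image about to be added to the repo."""
    size = os.path.getsize(path)
    if size >= limit:
        return False, (
            f"{os.path.basename(path)} is {size / 1024:.0f} KB. "
            f"Please compress it to under {limit // 1024} KB and try again."
        )
    return True, "ok"


# Usage: a throwaway 300 KB file stands in for an oversized upload.
with tempfile.TemporaryDirectory() as tmp:
    big = os.path.join(tmp, "hero.png")
    with open(big, "wb") as f:
        f.write(b"\0" * (300 * 1024))
    accepted, message = check_image_size(big)

print(accepted, message)
```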

Things to consider

Since we've built out skills with AI to update the content, there are a few things to keep in mind:

Compute is expensive. You don't need Opus to deploy a local preview when Haiku will do the job. We set default models on certain skills to make sure the model is right for the task. We also set the default model in the repo to Opus 4.6 to save on usage.
Use skills to catch large image files and other common things that will slow down performance. We added file size limits to our page skills to prevent massive images from getting uploaded to our codebase and slowing down the build and site performance.
Protect sensitive parts of the site with hooks. The models have access to the whole codebase, so block changes to anything you don't want touched. For example, we protected our Content Security Policy with a PreToolUse hook that prevents any changes to our CSP.
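
As an illustration of that last point, here is a rough sketch of what a PreToolUse hook script could look like. It relies on Claude Code passing the pending tool call to the hook as JSON on stdin and treating exit code 2 as "block this call". The protected path is hypothetical, and this is not our actual hook.

```python
import json
import sys

# Hypothetical path: point this at wherever your CSP is actually defined.
PROTECTED_PATHS = ("src/security/csp.ts",)
# Tools that modify files; read-only tools are left alone.
WRITE_TOOLS = {"Edit", "Write", "MultiEdit"}


def should_block(event: dict, protected=PROTECTED_PATHS) -> bool:
    """Return True when the pending tool call would write to a protected file."""
    if event.get("tool_name") not in WRITE_TOOLS:
        return False
    file_path = event.get("tool_input", {}).get("file_path", "")
    normalized = file_path.replace("\\", "/")
    return any(normalized == p or normalized.endswith("/" + p) for p in protected)


def main() -> int:
    event = json.load(sys.stdin)
    if should_block(event):
        # stderr is shown to the model so it knows why the edit was refused.
        print("Blocked: this file holds the Content Security Policy.", file=sys.stderr)
        return 2  # exit code 2 blocks the tool call
    return 0
```

To use it as a script, end the file with `sys.exit(main())` and register it as a PreToolUse hook in your Claude Code settings.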

Fixing the "rate limit problem"

While we were under the hood, we tackled another recurring nightmare: API rate limits for our forms.

Our forms relied on fetching fields from our marketing automation system during the build. If we hit a rate limit, the build broke. To fix this, we built a service using Vercel Blob. We now fetch and store form fields in a fast, reliable blob store at the start of the build.

This reduced our marketing automation system API calls to nearly zero during the critical build phase, removing yet another point of failure.
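
The pattern is "fetch once, read from the store everywhere else". The sketch below shows the idea with a local JSON file standing in for Vercel Blob. `fetch_fields` is a hypothetical callable that wraps the marketing automation API, and the fallback to the last stored copy is an assumption added for illustration, not a description of our service.

```python
import json
from pathlib import Path
from typing import Callable


def refresh_form_cache(fetch_fields: Callable[[], dict], cache_file: Path) -> dict:
    """Run once at the start of the build: fetch form fields and store them."""
    try:
        fields = fetch_fields()
    except Exception:
        # Assumption: if the API is rate limited, reuse the last stored copy
        # instead of failing the build.
        if cache_file.exists():
            return json.loads(cache_file.read_text())
        raise
    cache_file.write_text(json.dumps(fields))
    return fields


def load_form_fields(cache_file: Path) -> dict:
    """Used by every page during the build: no API call, just a read."""
    return json.loads(cache_file.read_text())
```

However many pages render a form, the API is called once per build rather than once per form.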

The results: reliability as a feature

The shift from a heavy, API-dependent CMS to a lean, file-based Astro site has been a game-changer for our productivity.

Metric	Before (Gatsby + CMS)	After (Astro + Claude)
Average Build Time	14 Minutes	< 4 Minutes
Web Vitals Score	89	97
Broken Staging Builds	Frequent (API/Plugin issues)	95% Reduction
Content Schema Limits	Restricted by Plan	Unlimited
Vibe Check	Frustrating	High-Five Worthy

By moving our content into the codebase and using Claude to bridge the gap for non-technical users, we didn't just speed up our site — we made our entire deployment pipeline more resilient.

Because at the end of the day, the best way to fix a broken build is to remove the things that break it in the first place.

You don’t need to pick one: how Sentry and OpenTelemetry work together

Wed, 27 May 2026 00:00:00 GMT

You already instrumented the backend with OpenTelemetry. Your services emit spans. Your teams know the OTel APIs. Maybe you already run a Collector. So when you start evaluating Sentry, the obvious question is:

Do you need to replace your OpenTelemetry setup with the Sentry SDK?

No.

The practical answer is usually: keep OpenTelemetry where it already works, add the Sentry SDK where it gives you more application context, and send OpenTelemetry Protocol (OTLP) events to Sentry. For a web app, that often means using the Sentry SDK on the frontend for browser tracing, errors, logs, Session Replay, and source maps, while keeping OpenTelemetry on the backend for existing service instrumentation.

One scope note: OTLP can carry traces, logs, and metrics. At this moment, Sentry's OTLP ingest supports logs and traces, not metrics. We're considering adding support for them in the future.

The important part is separating two decisions that often get lumped together:

How traces stay connected across frontend and backend.
How backend OTLP events are exported to Sentry.

Once you separate those, the architecture gets a lot easier to reason about.

Sentry vs OpenTelemetry is the wrong question

The first decision is trace linking. If a user clicks a button in your React app and that click triggers a backend request, the frontend and backend need to agree on the same distributed trace context. In this example, the Sentry frontend SDK sends W3C traceparent headers (configurable through the propagateTraceparent option), and the OpenTelemetry backend continues the trace.

That linking is handled by the frontend SDK configuration:

Sentry.init({
  integrations: [
    Sentry.browserTracingIntegration(),
  ],
  tracesSampleRate: 1.0,
  // ensure traceparent headers get sent
  propagateTraceparent: true,
  tracePropagationTargets: [
    'localhost',
    '127.0.0.1',
    /^http:\/\/localhost:8000\/api\//,
    /^http:\/\/127\.0\.0\.1:8000\/api\//,
    // your backend endpoint here
  ],
})
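
If you have never looked inside a traceparent header, the sketch below shows what the backend does with it. This is a simplified, standard-library illustration of the W3C Trace Context version 00 format (version, trace ID, parent span ID, flags). It is not the OpenTelemetry propagator itself, and the function names are made up for the example.

```python
import re
import secrets
from typing import Optional

# version-traceid-parentid-flags, all lowercase hex
_TRACEPARENT_RE = re.compile(
    r"^([0-9a-f]{2})-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$"
)


def parse_traceparent(header: str) -> Optional[dict]:
    """Parse a traceparent header; return None if it is malformed."""
    match = _TRACEPARENT_RE.match(header.strip())
    if match is None:
        return None
    version, trace_id, span_id, flags = match.groups()
    # Version ff and all-zero IDs are invalid per the spec.
    if version == "ff" or set(trace_id) == {"0"} or set(span_id) == {"0"}:
        return None
    return {
        "trace_id": trace_id,
        "parent_span_id": span_id,
        "sampled": bool(int(flags, 16) & 0x01),
    }


def continue_trace(header: str) -> str:
    """Build the header a backend span would send on: same trace, new span ID."""
    ctx = parse_traceparent(header)
    if ctx is None:
        # No valid parent: start a fresh trace instead.
        return f"00-{secrets.token_hex(16)}-{secrets.token_hex(8)}-01"
    flags = "01" if ctx["sampled"] else "00"
    return f"00-{ctx['trace_id']}-{secrets.token_hex(8)}-{flags}"
```

The trace ID stays the same across the hop, which is what makes the browser span and the backend span show up as one trace.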

The second decision is export. After your backend creates telemetry, where do those OTLP events go?

There are two common options:

Send OTLP events directly from the backend to Sentry's OTLP endpoint.
Send OTLP events to an OpenTelemetry Collector, then have the Collector forward them to Sentry's OTLP endpoint.

The trace-continuation step from the first decision is what lets a Sentry-instrumented browser action become the parent of backend OpenTelemetry work, regardless of which OTLP export option you choose.

If you want the reference docs for these pieces, start with linking Sentry SDKs with OpenTelemetry SDKs, sending OpenTelemetry traces directly to Sentry, sending OpenTelemetry logs directly to Sentry, and forwarding OpenTelemetry data to Sentry.

Direct OTLP vs Collector forwarding

Direct OTLP and Collector forwarding both end at Sentry's OTLP endpoint. The difference is whether your service talks to Sentry itself or talks to a Collector first.

Approach	Use it when	What you get	Tradeoff
Direct OTLP to Sentry	You have one backend service or project and want the smallest setup	Fewer moving parts and a short path from service to Sentry	Less central control over processing, sampling, and routing
Collector forwarding	You have multiple services, already run a Collector, need processing, or want multi-vendor routing	Centralized routing, batching, processing, sampling, and easier vendor evaluation	Another component to deploy and operate

Direct OTLP is the simplest path for a single backend project: the service exports straight to Sentry's OTLP endpoint, with no Collector in between.
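
As a rough sketch of what "direct" means in code, the helper below builds the two things an OTLP HTTP exporter needs: an endpoint and an auth header. The `x-sentry-auth` header format and the `SENTRY_OTLP_PUBLIC_KEY` name mirror the Collector config shown later in this post. `SENTRY_OTLP_TRACES_ENDPOINT` is a hypothetical variable name for the full traces URL of your project, so copy the real value from your project settings rather than guessing it.

```python
import os
from typing import Mapping


def direct_otlp_settings(env: Mapping[str, str] = os.environ) -> dict:
    """Collect exporter settings for sending OTLP straight to Sentry."""
    # Hypothetical variable name: the full OTLP traces URL for one Sentry project.
    endpoint = env["SENTRY_OTLP_TRACES_ENDPOINT"]
    public_key = env["SENTRY_OTLP_PUBLIC_KEY"]
    return {
        "endpoint": endpoint,
        # Same header the Collector config uses to authenticate with Sentry.
        "headers": {"x-sentry-auth": f"sentry sentry_key={public_key}"},
    }
```

With the OpenTelemetry Python SDK, a dict like this could be passed as keyword arguments to the OTLP HTTP span exporter, for example `OTLPSpanExporter(**direct_otlp_settings())`.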

Collector forwarding is the better fit when your observability setup is already more than one app talking to one destination. It gives you one place to receive telemetry from many services, batch it, process it, and send it to one or more backends.

That last part matters when you are evaluating Sentry. You can keep routing telemetry to an existing vendor while also forwarding a copy to Sentry, then compare the debugging experience without rewriting backend instrumentation.

There is one important Sentry-specific detail for larger setups: a generic OTLP HTTP exporter points at one Sentry project endpoint with one project key. If you send every service through that one exporter, every service lands in the same Sentry project. For multi-project routing, use the Sentry exporter. It can route OTLP events to projects based on a resource attribute like service.name, and it can auto-create missing projects when configured with the right Sentry API permissions.

A demo architecture

Let's check out a demo project that uses the Collector forwarding path:

React + @sentry/react
  -> fetch with traceparent
  -> FastAPI + OpenTelemetry
  -> OpenTelemetry Collector
  -> Sentry OTLP endpoint

The frontend lives in frontend/. It is a React + Vite app using @sentry/react. The backend lives in backend/. It is a FastAPI service using the OpenTelemetry SDK, FastAPI instrumentation, SQLAlchemy instrumentation, manual spans, and standard logging in checkout logic. The Collector lives in collector/ and forwards those OTLP events to Sentry.

The point of the demo is not that every layer uses the same SDK. The point is that every layer participates in the same trace, with backend logs attached to that debugging context.

The frontend uses the Sentry SDK

The Sentry setup is in frontend/src/instrument.ts. It enables browser tracing, Session Replay, logs, and trace propagation:

Sentry.init({
  dsn: import.meta.env.VITE_SENTRY_DSN || undefined,
  environment: import.meta.env.MODE,
  sendDefaultPii: true,
  integrations: [
    Sentry.browserTracingIntegration(),
    Sentry.replayIntegration({
      maskAllText: false,
      blockAllMedia: false,
    }),
  ],
  enableLogs: true,
  tracesSampleRate: 1.0,
  propagateTraceparent: true,
  tracePropagationTargets: [
    'localhost',
    '127.0.0.1',
    /^http:\/\/localhost:8000\/api\//,
    /^http:\/\/127\.0\.0\.1:8000\/api\//,
  ],
  replaysSessionSampleRate: 1.0,
  replaysOnErrorSampleRate: 1.0,
})

The frontend checkout flow is in frontend/src/hooks/useCheckoutLab.ts. It creates Sentry spans for user-facing work, logs useful state changes, and captures unexpected errors:

await Sentry.startSpan(
  {
    name: 'Run checkout scenario',
    op: 'ui.checkout',
    attributes: {
      scenario,
      itemCount,
      totalCents,
    },
  },
  async () => {
    const order = await createCheckout(cart, scenario)
    setLastOrder(order)
  },
)

The actual request is a normal fetch call:

const response = await fetch(`${API_BASE_URL}/api/checkout`, {
  method: 'POST',
  headers: { 'content-type': 'application/json' },
  body: JSON.stringify({ items, scenario }),
})

There is no manual trace header code in that request. The Sentry browser tracing integration handles the propagation as long as the destination matches tracePropagationTargets.

The backend keeps OpenTelemetry

The backend does not install or initialize the Sentry SDK. Its OpenTelemetry setup is in backend/app/core/observability.py.

It creates an OpenTelemetry TracerProvider, attaches service resource attributes, exports spans over OTLP HTTP, and registers W3C trace-context propagation:

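# Imports this excerpt relies on (OpenTelemetry Python API/SDK and the OTLP HTTP exporter).
from opentelemetry import propagate, trace
from opentelemetry.baggage.propagation import W3CBaggagePropagator
from opentelemetry.exporter.otlp.proto.http.trace_exporter import OTLPSpanExporter
from opentelemetry.propagators.composite import CompositePropagator
from opentelemetry.sdk.resources import Resource
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor
from opentelemetry.trace.propagation.tracecontext import TraceContextTextMapPropagator

# `settings` and `parse_resource_attributes` are defined elsewhere in the demo app.
resource = Resource.create(
    {
        **parse_resource_attributes(settings.otel_resource_attributes),
        "service.name": settings.otel_service_name,
        "deployment.environment": settings.app_environment,
    }
)
provider = TracerProvider(resource=resource)

# Only export when an endpoint is configured.
if settings.otel_exporter_otlp_traces_endpoint:
    exporter = OTLPSpanExporter(endpoint=settings.otel_exporter_otlp_traces_endpoint)
    provider.add_span_processor(BatchSpanProcessor(exporter))

trace.set_tracer_provider(provider)
# Accept and emit W3C traceparent/tracestate and baggage headers.
propagate.set_global_textmap(
    CompositePropagator(
        [
            TraceContextTextMapPropagator(),
            W3CBaggagePropagator(),
        ]
    )
)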

It also configures OTLP log export. The backend creates a LoggerProvider, attaches an OTLPLogExporter, and adds an OpenTelemetry LoggingHandler to the app logger:

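# Imports this excerpt relies on (OpenTelemetry Python logs SDK and the OTLP HTTP exporter).
import logging

from opentelemetry._logs import set_logger_provider
from opentelemetry.exporter.otlp.proto.http._log_exporter import OTLPLogExporter
from opentelemetry.sdk._logs import LoggerProvider, LoggingHandler
from opentelemetry.sdk._logs.export import BatchLogRecordProcessor

# `build_resource` and `settings` are defined elsewhere in the demo app.
provider = LoggerProvider(resource=build_resource(settings))
exporter = OTLPLogExporter(endpoint=settings.otel_exporter_otlp_logs_endpoint)
provider.add_log_record_processor(BatchLogRecordProcessor(exporter))
set_logger_provider(provider)

# Bridge standard-library logging into the OpenTelemetry log pipeline.
otel_handler = LoggingHandler(level=logging.INFO, logger_provider=provider)
app_logger = logging.getLogger("checkout_trace_lab")
app_logger.addHandler(otel_handler)
app_logger.setLevel(logging.INFO)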

The default backend endpoints are local:

OTEL_EXPORTER_OTLP_TRACES_ENDPOINT=http://localhost:4318/v1/traces
OTEL_EXPORTER_OTLP_LOGS_ENDPOINT=http://localhost:4318/v1/logs

That means the backend exports OTLP traces and logs to the Collector, not directly to Sentry.

The FastAPI app also allows the browser trace headers through CORS:

allow_headers=[
    "content-type",
    "sentry-trace",
    "baggage",
    "traceparent",
    "tracestate",
]

This is easy to miss. If your browser is making cross-origin requests and CORS blocks the propagation headers, your frontend and backend traces can split apart even if both sides are instrumented correctly.
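
One cheap way to catch this early is a startup check or unit test that compares your CORS allow-list against the propagation headers. The sketch below is a hypothetical helper, not part of the demo. The header set is taken from the allow-list above.

```python
# Headers the browser SDK may attach for trace propagation (from the list above).
PROPAGATION_HEADERS = {"sentry-trace", "baggage", "traceparent", "tracestate"}


def missing_propagation_headers(allow_headers: list[str]) -> list[str]:
    """Return the propagation headers that the CORS allow-list would block."""
    allowed = {header.strip().lower() for header in allow_headers}
    # A wildcard allow-list lets every header through in FastAPI's CORSMiddleware.
    if "*" in allowed:
        return []
    return sorted(PROPAGATION_HEADERS - allowed)
```

An empty result means the headers can reach the backend. Anything else is a candidate explanation for traces that split at the browser boundary.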

The checkout flow adds manual OTel spans and logs

The backend service code in backend/app/services/checkout.py models a checkout workflow. It creates manual OpenTelemetry spans for business operations:

def validate_cart(engine: Engine, payload: CheckoutRequest) -> dict[str, ProductRow]:
    with tracer.start_as_current_span("checkout.validate_cart") as span:
        requested_ids = [item.product_id for item in payload.items]
        span.set_attribute("cart.item_count", len(payload.items))
        ...

The checkout path includes spans for:

checkout.validate_cart
checkout.reserve_inventory
checkout.calculate_tax
checkout.payment
checkout.write_order
checkout.send_confirmation

It also emits backend logs for the same business steps, such as checkout.started, checkout.cart_validated, checkout.inventory_reserved, checkout.payment_approved, checkout.order_written, and checkout.completed. Those logs go through Python's standard logging API, then the OpenTelemetry LoggingHandler exports them through OTLP.

This is the part OTel-first teams care about most. Those spans and logs stay in the OpenTelemetry pipeline. You do not need to rewrite them with Sentry.startSpan() or Sentry logging APIs just to view the trace and related logs in Sentry.

The Collector forwards OTLP to Sentry

The Collector config is in collector/otel-collector.yaml. It receives OTLP over gRPC and HTTP:

receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317
      http:
        endpoint: 0.0.0.0:4318

It batches the data:

processors:
  batch:
    send_batch_size: 1024

Then it forwards to Sentry with the OTLP HTTP exporter:

exporters:
  debug:
    verbosity: basic
  otlphttp/sentry:
    endpoint: ${env:SENTRY_OTLP_ENDPOINT}
    headers:
      x-sentry-auth: "sentry sentry_key=${env:SENTRY_OTLP_PUBLIC_KEY}"
    compression: gzip
    encoding: proto

Reminder: this demo uses generic otlphttp for a single Sentry project. For multi-project routing or automatic project creation, swap it for the sentry exporter.

The configured pipelines send to both the debug exporter and Sentry:

service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [batch]
      exporters: [debug, otlphttp/sentry]
    logs:
      receivers: [otlp]
      processors: [batch]
      exporters: [debug, otlphttp/sentry]

That is the Collector forwarding pattern: the backend sends OTLP events to one local endpoint, and the Collector decides where that telemetry goes next.

What each layer is responsible for

The cleanest way to understand this architecture is by ownership.

The frontend Sentry SDK owns browser-specific debugging context:

Browser spans and frontend transactions
Frontend errors and React error boundaries
Session Replay
Frontend logs
Source maps through the Sentry Vite plugin
Trace propagation to the backend

The backend OpenTelemetry SDK owns backend instrumentation:

FastAPI request spans
SQLAlchemy spans
HTTPX spans if the backend calls other services
Manual checkout spans
Backend logs exported with OpenTelemetry
Resource attributes such as service.name and deployment.environment
OTLP export to the Collector

The Collector owns routing and processing:

Receiving OTLP on 4317 and 4318
Batching telemetry
Printing debug output locally
Forwarding to Sentry's OTLP endpoint
Providing the place to add sampling, transforms, filters, or multi-vendor routing later

This is why Sentry and OpenTelemetry are not competing choices here. They are doing different jobs in the same observability pipeline.

What you can see in Sentry

In the demo, I ran the checkout scenarios from the same frontend session. In Sentry, they show up as one connected trace that starts in React and continues through the FastAPI backend. You can see the normal checkout, slow payment, inventory miss, payment declined, and backend crash work in the same distributed trace instead of jumping between separate tools.

The backend logs are associated with that trace too. Logs like checkout.started, checkout.inventory_reserved, checkout.payment_slow_path, and checkout.completed come from Python's standard logging API, get exported through OTLP, and land in Sentry attached to the same debugging context as the spans.

That is the useful part of the setup: the frontend SDK gives you browser context, the backend keeps its OpenTelemetry spans and logs, and Sentry gives you one place to inspect the full request path.

A decision tree for your own app

Use direct OTLP to Sentry when:

You have one backend service or one project.
You want the fewest moving parts.
You do not need central processing or multi-destination routing yet.

Use Collector forwarding when:

You already run an OpenTelemetry Collector.
You have multiple backend services.
You need services to land in separate Sentry projects.
You need sampling, filtering, transforms, or batching outside the app process.
You want to send telemetry to Sentry and another vendor while evaluating Sentry.

Add the Sentry backend SDK later when:

You need error monitoring. OpenTelemetry does not capture and send errors to Sentry—only the Sentry SDK can capture backend exceptions and link them to the trace, so you can jump from an error to the full request path and inspect the related logs, spans, and context.
You want profiling.
You want Sentry's Application Metrics.
You want other Sentry features that are not represented by your current OpenTelemetry data.

That last step is optional, not a prerequisite. You can start with Sentry on the frontend and OpenTelemetry on the backend, then decide later whether adding the backend Sentry SDK is worth it for your services.
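
If it helps to see the export half of that decision tree as code, here is a small sketch. The `Setup` fields are hypothetical names for the conditions listed above, and the function encodes nothing beyond those bullets.

```python
from dataclasses import dataclass


@dataclass
class Setup:
    backend_services: int = 1
    runs_collector: bool = False
    needs_separate_projects: bool = False
    needs_processing: bool = False  # sampling, filtering, transforms, batching
    multi_vendor: bool = False  # sending to Sentry and another vendor


def choose_export_path(setup: Setup) -> str:
    """Return "collector" or "direct" following the decision tree above."""
    if (
        setup.runs_collector
        or setup.backend_services > 1
        or setup.needs_separate_projects
        or setup.needs_processing
        or setup.multi_vendor
    ):
        return "collector"
    return "direct"
```

Any single Collector condition is enough to tip the choice. Direct OTLP is the answer only when none of them apply.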

Start with the smallest change that preserves your trace

If your backend already uses OpenTelemetry, do not start by rewriting instrumentation. Start by deciding where OTLP events should go.

For a single backend, direct OTLP to Sentry is usually enough.

For multiple services, vendor evaluation, or anything that needs routing and processing, put a Collector in the middle.

Then link your frontend Sentry SDK to your backend OTel SDK with W3C traceparent propagation. That gives you the useful part first: one trace that starts where the user action starts and continues through the backend code you already instrumented.

You do not need to pick Sentry or OpenTelemetry. Use both where they fit.

Your agent can't fix what it can't see

Tue, 26 May 2026 16:00:00 GMT

Agents are getting better and better at fixing bugs. They're even getting better at testing their work, thanks to headless browsers, sandboxes, simulators, etc.

But what about the bugs that only show up once you bring in different browsers, languages, extensions, internet speeds, and all the other variables that get mixed in the second you ship to prod? Or all the bugs that only show up when you account for… well, humans being humans and doing weird stuff you didn't expect them to do?

The bottleneck for self-healing software isn't agent intelligence. It's that agents have no idea what actually broke. They're debugging from source code alone, which is roughly as effective as diagnosing a server outage by skimming the README. What they're missing is production context: the stack trace, the request payload, the environment, the breadcrumbs leading up to the failure.

Your agents need someone/something telling them what's breaking in the wild and giving them the context they need to understand why.

We built Sentry MCP and the Sentry CLI to make that context available to both humans and, increasingly just as important, their agents. You can wire up a system today where a Sentry alert triggers an agent, the agent investigates the issue using the same evidence you would, and a draft PR with a fix lands in your repo before you open a browser.

Why draft PRs, not auto-merge

Let's be honest about what's realistic. A system that detects, fixes, tests, deploys, and monitors its own patches without human involvement is not something you should build today. That's how you get a very exciting incident review.

The useful version is more modest: a production error fires, an agent investigates it with real Sentry context, writes a small fix with a regression test, and opens a draft PR. A human is very much in the loop.

That's not fully autonomous, but it's not trivial either. Most bugs sit in a queue, triaged, prioritized, assigned, waiting, and often lose out to new features. Seer diagnoses the root cause in under two minutes. A complete Autofix run, from root cause analysis to an opened PR, takes about six minutes.

An agent that opens a reviewable, mergeable fix six minutes after the error fires is a meaningful change to your mean time to resolution, even if a human still clicks merge.

Two ways to give your agent production context

Sentry MCP is the right choice for agents that support the Model Context Protocol (Claude Code, Cursor, Codex, Windsurf, VS Code with Copilot). Your agent connects to the hosted server, authenticates via OAuth, and gets structured access to issues, events, traces, and Seer analysis. No local install required.

# One-liner for any MCP-compatible client
npx add-mcp https://mcp.sentry.dev/mcp

# Or for Claude Code specifically
claude mcp add --transport http sentry https://mcp.sentry.dev/mcp

If your client doesn't support the one-liner, add the config manually:

{
  "mcpServers": {
    "sentry": {
      "url": "https://mcp.sentry.dev/mcp"
    }
  }
}

The Sentry CLI is the right choice for scripted workflows, CI pipelines, or any automation where you need structured output you can pipe to jq or feed into another process.

curl https://cli.sentry.dev/install -fsS | bash
sentry auth login

Here's what that looks like:

$ sentry issue list

Issues in acme/checkout:
╭──────────────┬──────────────────────────────────────────────────────┬──────┬─────┬────────┬───────┬──────────────╮
│ SHORT ID     │ ISSUE                                                │ SEEN │ AGE │ EVENTS │ USERS │ TRIAGE       │
├──────────────┼──────────────────────────────────────────────────────┼──────┼─────┼────────┼───────┼──────────────┤
│ CHECKOUT-P1  │ TimeoutError: Payment charge exceeded 30s            │   3h │  3h │  1.8k  │   340 │ High  86%    │
├──────────────┼──────────────────────────────────────────────────────┼──────┼─────┼────────┼───────┼──────────────┤
│ CHECKOUT-N7  │ TypeError: Cannot read property 'total'              │   1d │  5d │    215 │    82 │ High  71%    │
├──────────────┼──────────────────────────────────────────────────────┼──────┼─────┼────────┼───────┼──────────────┤
│ API-34       │ RateLimitError: Too many requests to /v1/charges     │   3d │ 21d │     67 │    24 │ Med   42%    │
╰──────────────┴──────────────────────────────────────────────────────┴──────┴─────┴────────┴───────┴──────────────╯
Tip: Use 'sentry issue view ' to view details.

CHECKOUT-P1 is at the top, a timeout in the checkout service with 1.8k events and an 86% fixability score. Drill in:

$ sentry issue view CHECKOUT-P1

CHECKOUT-P1: TimeoutError: Payment charge exceeded 30s
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
╭────────────┬─────────────────────────────────────────────╮
│ Status     │ ● Unresolved (Ongoing)                      │
│ Fixability │ High (86%)                                  │
│ Level      │ error                                       │
│ Platform   │ node                                        │
│ Project    │ checkout-service                            │
│ Events     │ 1832                                        │
│ Users      │ 340                                         │
│ First seen │ 3 hours ago                                 │
│ Last seen  │ 12 minutes ago                              │
│ Culprit    │ chargeCustomer (src/payment.ts)             │
│ Link       │ https://acme.sentry.io/issues/CHECKOUT-P1/  │
╰────────────┴─────────────────────────────────────────────╯

Tip: Use 'sentry issue explain CHECKOUT-P1' for AI root cause analysis

Looks like a straightforward timeout. An agent with just this would add retry logic or bump the timeout. But run sentry issue explain:

$ sentry issue explain CHECKOUT-P1

ℹ Starting root cause analysis, it can take several minutes...

Root Cause Analysis Complete
━━━━━━━━━━━━━━━━━━━━━━━━━━━━

Cause #0: The checkout service's /charge endpoint times out
waiting for the payment service, which blocks on an inventory
availability check. The inventory service's check_stock query
regressed from ~200ms to ~28s after migration
0047_drop_unused_indexes removed the compound index on
(product_id, warehouse_id).

Repository: acme/inventory-service
Affected: src/queries/check_stock.ts:18
First seen: release-3.1.0 (deployed 3h ago)

Reproduction steps:
1. User submits checkout → POST /charge
2. Payment service calls inventory.check_stock(items)
3. check_stock runs full table scan (missing index) → 28s
4. Payment call exceeds 30s timeout → TimeoutError bubbles up to checkout

To create a plan, run: sentry issue plan CHECKOUT-P1

The root cause isn't in the checkout service at all. It's a dropped database index in the inventory service, two hops away in the trace. No amount of retry logic in payment.ts fixes that.

From alert to draft PR

When a Sentry alert fires on a new or regressed issue, a webhook triggers a worker that checks out your repo and runs a coding agent with a prompt grounded in the specific issue:

A production error was captured by Sentry. The issue ID is CHECKOUT-P1.

Use Sentry MCP to retrieve the full issue details: stack trace,
breadcrumbs, tags, release, environment, distributed traces,
suspect commits, and Seer analysis.

Based on the evidence:

1. Identify the root cause. Follow traces across services.
2. Make the smallest safe fix in the right repository.
3. Add or update a regression test that covers this failure.
4. Run the test suite.
5. Open a draft PR with the Sentry issue link, root-cause
   summary, files changed, and test results.
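
The worker side of this can stay small. The sketch below shows one way to decide whether an alert should start a run and to fill the issue ID into the prompt above. The alert dict shape (`kind`, `issue_id`) is hypothetical and is not Sentry's webhook payload format. Validating the ID before it reaches the prompt is a design choice added here so free-form webhook text can't rewrite the agent's instructions.

```python
import re

PROMPT_TEMPLATE = """A production error was captured by Sentry. The issue ID is {issue_id}.

Use Sentry MCP to retrieve the full issue details: stack trace,
breadcrumbs, tags, release, environment, distributed traces,
suspect commits, and Seer analysis.

Based on the evidence:

1. Identify the root cause. Follow traces across services.
2. Make the smallest safe fix in the right repository.
3. Add or update a regression test that covers this failure.
4. Run the test suite.
5. Open a draft PR with the Sentry issue link, root-cause
   summary, files changed, and test results.
"""

# Short IDs in this post look like CHECKOUT-P1 or API-34.
_SHORT_ID_RE = re.compile(r"[A-Z0-9][A-Z0-9_-]*-[A-Z0-9]+")


def should_trigger(alert: dict) -> bool:
    """Only new or regressed issues start an agent run (hypothetical alert shape)."""
    return alert.get("kind") in {"new", "regressed"}


def build_prompt(issue_id: str) -> str:
    """Fill the issue ID into the prompt, rejecting anything that isn't a short ID."""
    if not _SHORT_ID_RE.fullmatch(issue_id):
        raise ValueError(f"unexpected issue ID: {issue_id!r}")
    return PROMPT_TEMPLATE.format(issue_id=issue_id)
```

The rest of the worker (checking out the repo, launching the coding agent) depends on your agent and CI setup, so it is left out.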

The agent pulls the issue via MCP. The distributed trace shows the checkout call chaining through the payment service into an inventory check that's taking 28 seconds. Metrics confirm the inventory service's p99 spiked from 200ms to 28s three hours ago. Suspect commits point at a migration in acme/inventory-service that dropped a compound index. Session replay shows users rage-clicking "Pay" while nothing happens, generating duplicate charge attempts.

sentry issue plan CHECKOUT-P1 lays out the fix: restore the compound index on (product_id, warehouse_id). A draft PR lands in acme/inventory-service with the migration, a root-cause summary linking back to the Sentry trace, and a regression test.

Try it with Cursor Automations

We publish a cookbook recipe for this exact workflow using Cursor's Automations feature. It walks through connecting your repo to Sentry, adding the MCP server to an automation, and configuring a webhook alert to trigger on regressed issues.

Because Sentry knows the release history and suspect commits, the agent doesn't search the entire repo for the problem. It starts where the evidence points. For regressed issues specifically, it can identify which commit reintroduced the bug, read the original fix, and understand what went wrong the second time around.

What's next

The more telemetry your app sends to Sentry (traces, metrics, logs, session replays), the harder the bugs an agent can tackle. Today it's dropped indexes across service boundaries. Six months ago it was null checks. The merge rate on Autofix PRs has climbed from 41% to 46% in that time, and the diagnosis complexity is growing with it.

There are real limits. Bugs that need product judgment, issues in code the agent can't reach, and problems where there isn't enough telemetry to connect the dots: those still need you. But the surface area of what agents can fix is expanding every month.

Connect Sentry MCP to your editor or install the CLI. Hook up your repos for code mappings and tracing. Run sentry issue explain on something that's been sitting in your backlog and see what it finds.

Check out the Seer Autofix docs for more on coding agent handoff to Claude Code and Cursor.

The product analytics you already have

Thu, 21 May 2026 09:00:00 GMT

You already have everything you need.

If you're using Sentry, you have traces, structured logs, and now application metrics. Most teams use that stuff for debugging and stop there. But get this: that same data can answer most of the product questions you've been sending to a separate analytics tool, maintained by a separate team, with a separate data model and a separate bill. (Not all of them. We'll get honest about the gaps later.)

This isn't a post about whether product analytics tools should exist. It's about the fact that developers have been sitting on top of a goldmine of product insight and outsourcing the questions to someone else. You don't have to.

There's a reason this matters more now than it did five years ago. The line between "product manager" and "software engineer" is blurring. Engineers are increasingly expected to think about adoption, retention, and user behavior, not just uptime and latency. If you're a product engineer (and increasingly, that's just "engineer"), the tools you already use for debugging are the same tools you should be using to answer product questions. You just haven't been querying them that way.

Every product question maps to telemetry you already have

"How many users completed onboarding this week?" That's a counter metric: metrics.count("onboarding.completed", 1). You can slice it by plan tier, country, or referral source with attributes you're already setting.

"What's our p95 checkout latency by region?" That's a distribution on a span. The trace already has the timing. You just need to query it.

"Why did signups drop on Tuesday?" That's a structured log. The signup service logged signup.failed with reason: email_validation_error and deploy_sha: a3f9b2c. The log is already correlated to the trace that shows the full request lifecycle, the span that errored, and the release that introduced it. From the trace, you're one click away from the issue, which links you to the commit and the line of code that caused it.

These are the same questions your PM asks their analytics tool. The difference is that when you answer them from your telemetry, the answer comes with context. The analytics tool gives you the "what." Your telemetry gives you the "what," the "why," and a direct path to the code that's responsible.

A customer told us recently that they were tired of analytics tools giving them a heads-up after the damage was done. They wanted to be proactive. The way they got there was setting up alerts and monitors on the telemetry they were already collecting. When a business-critical metric starts trending down, the alert fires before the weekly dashboard review catches it. And because the alert is on a span or a metric that's connected to the full trace, the investigation starts with context, not a context-switch to a different tool. They didn't change the data they were collecting. They changed how they were watching it.

What it looks like in practice

Say you want to know whether users are adopting your new export feature, whether it's performant, and whether paid users behave differently than free users.

You don't need to file a ticket asking someone to instrument this in an analytics tool. You already have the three primitives: spans, logs, and application metrics.

Spans for request-level context. Spans tell you how requests move through your system and how long each operation takes. The export API handler already has a span. Add business-level attributes (like export.user_tier, which lets you slice by plan):

with sentry_sdk.start_span(op="export.generate", name="Generate export file") as span:
    span.set_attribute("export.file_size_mb", file_size_mb)
    span.set_attribute("export.format", export_format)
    span.set_attribute("export.user_tier", user.plan)
    span.set_attribute("export.row_count", row_count)

Query adoption, performance, and errors in one place:

span.op:export.generate | count(), avg(span.duration), count_if(span.status:internal_error)
  group by export.user_tier, export.format

export.user_tier  | export.format | count() | avg(span.duration) | error_count
──────────────────|───────────────|─────────|────────────────────|────────────
pro               | xlsx          | 1,204   | 3.8s               | 89
pro               | csv           | 3,847   | 1.2s               | 24
free              | csv           | 12,493  | 0.9s               | 3
free              | xlsx          | 891     | 4.1s               | 142

You can already see the story: within each tier, xlsx exports are roughly 3-5x slower than csv (about 3.2x on pro and 4.6x on free), and free-tier xlsx is erroring on 16% of requests. You didn't need an analytics tool to surface that. You needed to query the spans you already had.
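
If you want to check the arithmetic behind that reading, the sketch below recomputes error rates and the xlsx slowdown from the rows in the result table. The helper names are made up for the example, and the numbers are copied from the table above.

```python
# (tier, format): (request count, average duration in seconds, error count)
ROWS = {
    ("pro", "xlsx"): (1204, 3.8, 89),
    ("pro", "csv"): (3847, 1.2, 24),
    ("free", "csv"): (12493, 0.9, 3),
    ("free", "xlsx"): (891, 4.1, 142),
}


def error_rate(tier: str, fmt: str) -> float:
    """Fraction of requests that ended in an error for one tier and format."""
    count, _, errors = ROWS[(tier, fmt)]
    return errors / count


def xlsx_slowdown(tier: str) -> float:
    """How many times slower xlsx is than csv within one tier."""
    return ROWS[(tier, "xlsx")][1] / ROWS[(tier, "csv")][1]


for tier in ("pro", "free"):
    print(tier, f"xlsx errors: {error_rate(tier, 'xlsx'):.1%}",
          f"slowdown: {xlsx_slowdown(tier):.1f}x")
```

The same ratios could be computed in the query itself. The point is only that the table already contains everything needed to spot the problem.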

Structured logs for discrete business events. Logs capture discrete events with enough detail to help you debug what happened and why. When a user upgrades their plan, you want to record the plan before and after the change, where the upgrade was triggered, and any revenue impact. That's a business event worth recording, but it's not a performance-sensitive operation you need to trace end-to-end. Log it:

sentry_sdk.logger.info(
    "plan.upgraded",
    previous_plan=previous_plan,
    new_plan=new_plan,
    user_id=user.id,
    upgrade_source=source,  # "paywall", "settings", "checkout"
    mrr_delta_usd=mrr_change,
)

Now you can query plan upgrades by source, see which upgrade path drives the most revenue, and if something breaks in the upgrade flow, the log entry is already linked to the trace context that shows you where it failed.

Query your logs for upgrade events from the past 30 days:

plan.upgraded | count(), sum(mrr_delta_usd)
  group by upgrade_source

upgrade_source | count() | sum(mrr_delta_usd)
───────────────|─────────|───────────────────
paywall        | 842     | $27,340
settings       | 214     | $8,120
checkout       | 1,307   | $51,890

Checkout drives the most upgrades, and it also converts at a higher average MRR per upgrade ($39.70 vs. $32.47 for paywall). That's the kind of insight your PM is running a separate tool to get, and it's sitting in your logs.
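The per-upgrade averages are just the summed MRR delta divided by the upgrade count for each source. A small sketch with the rows copied from the result above; the dictionary layout is only for illustration:

```python
# Rows copied from the log query result above:
# upgrade_source -> (upgrade count, summed MRR delta in USD)
upgrades = {
    "paywall": (842, 27_340),
    "settings": (214, 8_120),
    "checkout": (1_307, 51_890),
}

# Average MRR gained per upgrade, by source.
avg_mrr_per_upgrade = {
    source: total_mrr / count for source, (count, total_mrr) in upgrades.items()
}

# Print sources from highest to lowest average.
for source, avg in sorted(avg_mrr_per_upgrade.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{source:<9} ${avg:.2f} per upgrade")
```

Settings lands in between at about $37.94 per upgrade, even though it drives the fewest upgrades.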

Application Metrics for KPIs and health signals. Metrics are how you track the measures that tell you whether things are healthy, both in your service and your business. Things like checkout conversion rate, signups and traffic over time, and error budget burn. These are the signals you track continuously, get alerted on, and study for trends:

from sentry_sdk import metrics

metrics.count("export.completed", 1, attributes={"tier": user.plan, "format": export_format})
metrics.distribution("export.file_size", file_size_mb, attributes={"tier": user.plan})

This gives you a counter and distribution you can alert on, visualize, and use to observe trends over time without building or maintaining an analytics pipeline.

That's three primitives in one SDK covering every angle you need: request-level detail from spans, discrete business events from logs, and aggregated trends from metrics, all queryable in one tool and all connected to each other.

Compare this to the usual workflow: someone instruments a feature_export_used event in an analytics tool, builds a dashboard, checks it weekly. Three weeks later they notice adoption is flat. They ask engineering if there are issues. Engineering checks Sentry, finds the export is timing out for files over 50MB, which covers most real-world usage. Three weeks lost because the analytics tool could see the symptom but not the cause.

With your telemetry, you can set up a monitor on that span's error rate and duration. When the timeout starts happening, the alert fires on the span itself, not on a downstream analytics metric that takes weeks to reflect the problem. The metric shows the count dropping. The span shows the duration spiking. The log shows the error. And all three point back to the same trace, the same release, the same line of code.

To go one step further, you can ask Seer if anything is broken that may be causing adoption to stay flat.

The skills already transfer

If you're already using Sentry for debugging, the only mental shift is realizing that "business telemetry" and "system telemetry" aren't different categories.

The business question ("did the user convert?") and the engineering question ("did the request succeed?") are the same question asked at different altitudes. You don't need a separate tool to ask the business question. You need to add purchase.value_usd to the span you already have, log the purchase.completed event with the attributes that matter, and increment the counter.

# On the span
span.set_data("purchase.value_usd", order.total)
span.set_data("purchase.item_count", len(order.items))

# As a structured log
sentry_sdk.logger.info(
    "purchase.completed",
    value_usd=order.total,
    item_count=len(order.items),
    coupon_applied=bool(order.coupon),
    user_id=user.id,
)

# As a metric
metrics.count("purchase.completed", 1, attributes={"coupon": str(bool(order.coupon))})
metrics.distribution("purchase.value", order.total, attributes={"plan": user.plan})

One customer framed it well: when there's a discrepancy in a business metric, it shows up in other parts of the system too. That's what observability means. Everything is connected. The moment you start treating your telemetry as the source of truth for product questions, you stop needing a second system to answer them.

Where this gets hard (and why it's getting easier)

Let's be honest about the gaps.

Multi-session retention analysis, behavioral cohorts ("users who did X but not Y within 14 days"), and cross-session funnel conversion are genuinely difficult to reconstruct from raw telemetry. Spans are request-scoped. Logs are event-scoped. Metrics are time-series. Stitching together a user's journey across sessions and days requires aggregation infrastructure that observability tools haven't traditionally built.

If your PM needs a 30-day retention curve segmented by acquisition channel, you can't just GROUP BY your way there today.

But most of the data is already there. Your spans, logs, and metrics all carry user IDs, timestamps, and business attributes. The missing piece is the aggregation and visualization layer. That's a query engine problem, not a data model problem. OpenTelemetry is making this easier to solve every quarter, because once instrumentation is standardized, the aggregation layer becomes commoditized. The gap is real, and it's shrinking.
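To make the "aggregation layer" concrete, here is a rough sketch of the kind of cross-session computation a GROUP BY alone doesn't give you. It assumes you have already exported (user ID, day) pairs from your logs or spans; the function name, the event shape, and the sample data are all hypothetical, and this is not an existing Sentry feature:

```python
from collections import defaultdict
from datetime import date


def day_n_retention(events, n):
    """Fraction of users active again exactly n days after their first active day.

    events: iterable of (user_id, date) pairs, e.g. exported from logs or spans.
    """
    active_days = defaultdict(set)
    for user_id, day in events:
        active_days[user_id].add(day)

    if not active_days:
        return 0.0

    retained = 0
    for days in active_days.values():
        first_day = min(days)
        # A user counts as retained if any activity falls exactly n days later.
        if any((day - first_day).days == n for day in days):
            retained += 1
    return retained / len(active_days)


# Hypothetical sample data: four users, two of whom return after 7 days.
events = [
    ("u1", date(2026, 5, 1)), ("u1", date(2026, 5, 8)),
    ("u2", date(2026, 5, 1)),
    ("u3", date(2026, 5, 2)), ("u3", date(2026, 5, 9)),
    ("u4", date(2026, 5, 3)), ("u4", date(2026, 5, 4)),
]
print(day_n_retention(events, 7))
```

The point is that the inputs are fields your telemetry already carries. What's missing is somewhere to run this per-user stitching at scale.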

We think there's interesting work to be done here, and we plan to dig into some of these harder use cases in future posts, showing how far you can get with Sentry's existing query tools even for cross-session analysis.

For the questions that matter day-to-day as a developer building and shipping features, the gap doesn't exist. Is my feature being adopted? Is it performant? Is it erroring? Are paid users behaving differently than free users? Which upgrade path drives the most revenue? You can answer all of these right now, from the telemetry you already have, without waiting for anyone.

Try it

Pick one feature you shipped recently that you're curious about. There most certainly is an opportunity to:

Add business-level attributes to the spans on the critical path of that feature's functionality
Add Sentry Logs as high-cardinality wide events that include the details you would want to query by (user plan, surface, activity data)
Sprinkle in application metrics with attributes that will be useful for building dashboards

We have found that agents are quite good at following these types of requests, and we have some skills you can deploy in your local IDE. Once you're done, ask Seer Agent about the best ways to create dashboards, monitors and alerts for them.

See how long it takes before the analytics dashboard for that feature stops being the thing anyone opens first.

You already have the data, you already have the tools, and you've just been letting someone else ask your questions for you.

New ways to agentically build and edit dashboards

Thu, 14 May 2026 09:00:00 GMT

The traditional dashboard workflow, teams slowly handcrafting visualizations to track critical KPIs, is dying in a world of AI agents.

A few years ago, in a pre-agentic-everything world, we tried to make it easier for developers to monitor critical experiences. We introduced Insights pages, which were pre-configured dashboards any Sentry user could adopt instantly that surfaced common health signals, like Web and Mobile Vitals.

The idea was right, but there was a problem: while many companies share common signals, every organization is unique. Without meaningful customization, most teams still ended up having to slog through manually building dashboards themselves. So we kept iterating.

Large language models are what finally made on-demand, customizable dashboards a reality. Visualizations remain one of the most information-dense ways for humans (and agents) to communicate. What changed is the cost of creating those views.

Instead of assembling dashboards widget by widget, you can now prompt an agent to create or edit dashboards directly in Sentry, or use the Sentry CLI to connect Sentry to other models, and generate a dashboard tailored to the task at hand.

Insights pages are now Sentry-built Dashboards. Clone them, then ask an agent to customize them for your project. Dashboards can now be created in seconds, used for the lifetime of a project or investigation, and discarded once they stop providing value.

What's new

Agentic dashboard creation & editing (beta): All organizations with AI-powered features enabled can now create and edit dashboards in Sentry using an agent-powered chat experience.
Insights are now Sentry dashboards: We replaced Insights pages with clonable, editable Sentry dashboards that you can customize to fit your specific use case.
Dashboard creation & editing via the Sentry CLI: You can create and manage Sentry dashboards from your terminal via the all-new Sentry CLI.

Note: Sentry dashboards use issue, tracing and application metrics data. You can also query against multiple event types in Sentry and save queries to dashboards.

Agentic dashboard generation and editing

You can now create and edit dashboards in Sentry agentically via the same capabilities that power Seer, Sentry's AI debugger. (Note: while Seer is an add-on option, agentic dashboard creation is a separate feature that is available for free.) When you create or edit a dashboard, you still have the option to add or edit manually, but you can now just tell Sentry what you want and we'll put it together for you automatically. Creating and editing dashboards with AI makes going from concept to final dashboard much faster than manual creation.

It's a best practice to use agentic dashboard creation as a starting point for a new dash or edits to an existing dash, and then verify and tweak the updates to make sure they're exactly what you want. This experience is still in open beta, so we're still ironing out some of the kinks.

Many more dashboards can now easily be created. With this in mind, we've created a markdown widget to encourage you to leave notes and document what you've built.

Dashboard revision history

Dashboards now maintain a revision history. Any edits made through the UI, Seer agent, or API are automatically tracked. To view prior revisions, click the clock icon in the upper-right corner.

If you, or a 🤖, make a mistake, you can restore a known-good version by selecting a previous revision and clicking Revert to Selection.

Sentry use case: fixing and monitoring jest tests

Last week, we increased the number of CI runners we use for jest tests, moving from 4 to 8 on our main branch. We've had 4 runners for years to parallelize the work. Total CI time depends on how long the slowest runner takes to do its work; to make sure that everything finishes at the same time we measure the duration to run each test file and parallelize based on that. More runners should mean faster CI time.

Instead, we noticed that overall CI time had been regressing for a few days, going from 6 minutes up to 10. The culprit was that the balancer script wasn't running successfully because tests were failing out. Once we fixed some flaky tests, that time went down from ~10 minutes to less than 4. Our new runner configuration did improve overall CI time after all!
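For readers curious what duration-based balancing looks like, here is a minimal sketch using the classic longest-processing-time-first heuristic. This is not our actual balancer script: the function name, the file names, and the durations are made up for illustration.

```python
import heapq


def balance_test_files(durations, runner_count):
    """Assign test files to runners, slowest files first.

    durations: dict mapping test file path -> measured duration in seconds.
    Returns a list of (total_seconds, [files]) tuples, one per runner.
    """
    # Min-heap of (total_seconds, runner_index): the least-loaded runner pops first.
    heap = [(0.0, i) for i in range(runner_count)]
    heapq.heapify(heap)
    assignments = [[] for _ in range(runner_count)]

    # Place the slowest files first; each goes to the runner with the least work so far.
    for path, seconds in sorted(durations.items(), key=lambda kv: kv[1], reverse=True):
        total, idx = heapq.heappop(heap)
        assignments[idx].append(path)
        heapq.heappush(heap, (total + seconds, idx))

    totals = [0.0] * runner_count
    for total, idx in heap:
        totals[idx] = total
    return [(totals[i], assignments[i]) for i in range(runner_count)]


# Hypothetical measured durations in seconds.
durations = {
    "a.spec.tsx": 90, "b.spec.tsx": 60, "c.spec.tsx": 45,
    "d.spec.tsx": 30, "e.spec.tsx": 30, "f.spec.tsx": 15,
}
plan = balance_test_files(durations, 2)
for total, files in plan:
    print(total, files)
```

A scheme like this only works when the duration measurements exist. If the step that records them fails, the split falls back to something uneven and the slowest runner sets total CI time, which is the failure mode described above.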

To prevent this from happening again we used agentic dashboard creation (plus some manual metric creation) to whip up some new dashboards (and monitors) to keep tabs on flakes, balance failures, and overall slowness.

Insights are now Sentry dashboards

In the Dashboard nav item, you'll see the option to explore Sentry Built dashboards.

This is a collection of dashboards pre-built by Sentry to address common monitoring use cases. The Sentry-built dashboards cannot be edited, but they can be duplicated to create custom dashboards. You can think of them as templates that can be adapted to address your specific use case.

On the frontend side, the Web Vitals dashboard surfaces LCP, INP, CLS, and TTFB across your real users, broken down by page, with a Performance Score that flags which pages have the most room to improve. Frontend Session Health connects deployments to crash and error rates, so you can see when a release tanked your stability. Frontend Assets shows you which JS and CSS assets are slow or render-blocking, useful when LCP regressed and you don't yet know why.

On the backend side, you get dashboards for slow queries (with drill-downs into individual query summaries and sample events), cache hit/miss rates, queue throughput and processing latency, and outbound API requests grouped by domain. The Backend Overview ties these together with p50/p75 duration and the most time-consuming queries and domains.

For mobile, there's Mobile Vitals (cold/warm app starts, slow and frozen frame rates, TTID/TTFD), Mobile Session Health (crash-free sessions and users, with release annotations), and dedicated drill-downs for app starts and screen rendering.

There are also framework-specific dashboards: a Next.js Overview with a tree-based SSR view for finding performance bottlenecks, and a Laravel Overview tuned for Laravel-specific metrics.

Creating dashboards via the Sentry CLI

The Sentry CLI includes a full dashboard command, bringing dashboard creation and management out of the browser and into your shell. You can list, view, create, and modify dashboards, including adding, editing, and deleting individual widgets, without ever leaving your terminal. With the new dashboard command, dashboards become code-adjacent artifacts: scriptable, repeatable, and reviewable. Define dashboards in shell scripts or CI pipelines, commit them alongside your application code, and roll them out the same way you ship features.

Every command supports --json output, making it straightforward for scripts, internal tools, or AI coding agents to provision and update dashboards programmatically (that means you can easily create dashboards from Claude Code, GitHub Copilot, Cursor, or your agent of choice).

Sentry use case: investigating integrations

In order to test out a potential yabeda integration for application metrics, one of our engineers created a custom dashboard with the CLI and documented the process.

Get started

Whether you're cloning a Sentry-built dashboard as a starting point, prompting an agent to spin one up from scratch, or scripting dashboard creation directly from your terminal, the goal is the same: spend less time building the view and more time acting on what it shows you:

Sentry Dashboards are available now on all plans. Open Dashboards to see them.
Custom Dashboards are available on all plans. Click Create Dashboard to start from scratch, or duplicate a pre-built dashboard.
Agentic Dashboard creation is live now for organizations with AI features enabled. If you don't see the Generate Dashboard option, check your org's AI settings.

Have questions or feedback?

Join the conversation in our Discord

New to Sentry?

Try Sentry for free

From vibe code to production-ready: observability for Next.js and Supabase apps

Mon, 11 May 2026 09:00:00 GMT

The way we build software has drastically changed over the past few years. What hasn't changed is that this software ends up in front of real people: you, me, my mom.

And when those users inevitably run into something broken, you as the application's developer need to be equipped with the right tools, context and understanding of what broke, where it broke, and how to fix it as quickly as possible.

Every day we're inching closer to self-healing software. If you are building a Next.js application and are using Supabase as the backend service, the tooling described below can help you get one step closer to a self-closing loop of producing quality software and fixing what slipped through the cracks with minimal disruption.

TL;DR

Supabase gives you query performance insights, row-level security (RLS) advisories, and edge function logs out of the box, but it can't trace across your full stack
Sentry fills that gap: distributed traces from your Next.js frontend through Supabase Edge Functions to Postgres, all in one place
Log draining from Supabase into Sentry gives you a single source of truth for errors, traces, and infrastructure logs
Sentry auto-detects N+1 queries, slow spans, and performance regressions without manual configuration
Seer, Sentry's AI debugger, can suggest a likely root cause for new issues automatically and hand off fixes to your coding agent

The stack problem agents create

AI-assisted development has a specific failure mode: agents write working code that has no observability built in. You could end up with a Next.js app that talks to Supabase via three different connection methods (direct Postgres, the Supabase JS SDK, and Drizzle, because the agent kept switching strategies), edge functions running in Deno, and no unified view of what's actually happening at runtime.

The other failure mode is subtler. Agents forget indexes. They could end up writing N+1 queries that are invisible locally because your dev database has 40 rows. You ship, your database grows to 400 rows, and suddenly a search query takes ten seconds. Sentry catches this automatically, but only if it's instrumented correctly from the start.

Getting that instrumentation right requires understanding a few things about how Supabase and Sentry fit together.

Supabase's built-in observability and its limits

Supabase has solid built-in observability. The Query Performance panel in the dashboard shows which queries run most often and which consume the most time. That's where you start when performance is the problem. The Advisors surface security issues like missing RLS policies and rank them by severity. The Index Advisor flags missing indexes before they become production incidents.

The Logs section gives you structured logs from every Supabase subsystem: edge functions, the Postgres REST API (PostgREST), the connection pooler, storage, and cron jobs. You can query them with SQL directly in the dashboard.

That's genuinely useful. But it's bounded by what Supabase can see, which is everything that happens inside Supabase. It can't tell you that a slow Postgres query was triggered by a specific user action in your Next.js frontend, or that an edge function timeout caused a cascade of errors in your API layer. For that, you need distributed tracing across the full stack.

Connecting Supabase logs to Sentry

The fastest way to get Supabase data into Sentry is the log drain. In the Supabase dashboard, under Logs > Drain, you add a destination and paste your Sentry data source name (DSN). All logs from that Supabase project start flowing into a corresponding Sentry project.

A few things worth knowing about this:

It's currently all-or-nothing. You can't filter by log level on the Supabase side before the drain
Once logs are in Sentry, you can filter by severity (severity:warn, severity:error) in the Log Explorer
Keep the log drain in its own Sentry project, separate from your Next.js app and your edge functions. This keeps the signal clean and makes it easier to set project-specific alerts

The reason to bother with this, beyond convenience, is that Sentry can correlate these infrastructure logs with traces from your application layer. When an edge function throws an error, you can see the full request path: Next.js page load → API route → edge function → Postgres query, with timing for each span.

For a step-by-step walkthrough of this setup, see the Supabase log drain monitoring recipe.

Instrumenting Next.js and Supabase Edge Functions

This is where most agent-generated setups go wrong. Next.js is a full-stack framework that runs in multiple runtimes: Node.js on the server, V8 in the browser, and potentially edge runtimes. Supabase Edge Functions run in Deno. These are not the same environment, and they need separate Sentry projects and separate SDK configurations.

The Sentry CLI handles this detection automatically:

npx sentry@latest init

For the Next.js app, your sentry.server.config.ts should include the Supabase integration to get automatic instrumentation of database queries:

import * as Sentry from "@sentry/nextjs";
import { createClient } from "@supabase/supabase-js";

const supabaseClient = createClient(
  process.env.NEXT_PUBLIC_SUPABASE_URL!,
  process.env.SUPABASE_SERVICE_ROLE_KEY!
);

Sentry.init({
  dsn: process.env.SENTRY_DSN,
  tracesSampleRate: 0.1,
  integrations: [
    // Instruments Supabase queries as spans in your traces
    // so you can see exactly which DB calls are slow
    Sentry.supabaseIntegration(supabaseClient, Sentry, {
      tracing: true,
      breadcrumbs: true,
    }),
  ],
});

Without the Supabase integration, your traces will show that an API route was slow, but not which query caused it. With it, every Supabase SDK call becomes a named span with timing data. See the Next.js integrations docs for the full list of what's available.

For edge functions running in Deno, initialize Sentry at the top of each function before any other imports:

import * as Sentry from "npm:@sentry/deno";

Sentry.init({
  dsn: Deno.env.get("SENTRY_DSN"),
  tracesSampleRate: 1.0, // sample everything in edge functions; volume is usually low
});

Deno.serve(async (req) => {
  return await Sentry.withIsolationScope(async () => {
    // your handler code
  });
});

The reason for separate projects: when Sentry's AI features (more on this below) analyze an issue, they work within a project's context. Mixing Next.js errors with Deno errors and Postgres logs in a single project makes that analysis noisier and less useful.

Automatic detection: N+1 queries, slow spans, and Web Vitals

Once instrumented, Sentry starts surfacing issues you didn't know to look for.

N+1 queries get detected automatically. If your code fetches a list of posts and then queries the database once per post to get comments, Sentry identifies the pattern and creates a performance issue. This is the kind of "logic" agents like to write constantly. It's the natural way to express the functionality, and it's invisible until you have real traffic.
Slow spans appear in the Trace Explorer. You can see exactly which database query, API call, or server-side render is consuming time, with the full request context attached.
Core Web Vitals for the frontend (LCP, INP, CLS) show up in the Next.js performance dashboard alongside your API latency and server transaction data. Having frontend and backend performance in one place makes it easier to figure out whether a slow page is a rendering problem or a slow API response.
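To see why the N+1 pattern stays invisible on a small dev database, here is a self-contained sketch. It uses Python and an in-memory SQLite database purely for illustration (the pattern is the same through the Supabase SDK); the schema and helper names are hypothetical, and the query counter stands in for the database spans Sentry would record.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE posts (id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE comments (id INTEGER PRIMARY KEY, post_id INTEGER, body TEXT);
    """
)
# A dev-sized dataset: 40 posts with one comment each.
conn.executemany(
    "INSERT INTO posts (id, title) VALUES (?, ?)",
    [(i, f"post {i}") for i in range(1, 41)],
)
conn.executemany(
    "INSERT INTO comments (post_id, body) VALUES (?, ?)",
    [(i, f"comment on post {i}") for i in range(1, 41)],
)

stats = {"queries": 0}


def run(sql, params=()):
    """Execute a query and count it, standing in for one database span."""
    stats["queries"] += 1
    return conn.execute(sql, params).fetchall()


def comments_n_plus_one():
    # 1 query for the list, then 1 query per post: N+1 round trips.
    posts = run("SELECT id FROM posts")
    return {
        pid: run("SELECT body FROM comments WHERE post_id = ?", (pid,))
        for (pid,) in posts
    }


def comments_single_query():
    # One JOIN returns the same data in a single round trip.
    rows = run(
        "SELECT p.id, c.body FROM posts p LEFT JOIN comments c ON c.post_id = p.id"
    )
    grouped = {}
    for pid, body in rows:
        grouped.setdefault(pid, [])
        if body is not None:
            grouped[pid].append((body,))
    return grouped


stats["queries"] = 0
slow = comments_n_plus_one()
n_plus_one_queries = stats["queries"]

stats["queries"] = 0
fast = comments_single_query()
single_queries = stats["queries"]

print(n_plus_one_queries, single_queries)
```

Both functions return the same data. The difference is 41 round trips versus 1, and that gap grows linearly with the number of posts.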

The prebuilt Next.js dashboard in Sentry covers most of what you need out of the box and doesn't count against your dashboard quota.

Setting up agents to instrument correctly

Two things make the difference between an agent that instruments your app correctly and one that produces outdated, incomplete configuration.

MCPs over training data

Both Sentry and Supabase have Model Context Protocol (MCP) servers. When your coding agent has access to the Sentry MCP, it can query your actual issues, traces, and project configuration in real time instead of guessing based on training data that might be two years old. Sentry's SDK has changed significantly, and agents without current context will often configure it as if it's only for error monitoring, missing performance tracing entirely.

Skills files

For Claude Code, this is .claude/. For Cursor and others, .agents/. These files give your agent project-specific context that persists across sessions. Take a look at our Agent Skills documentation for a detailed breakdown of all the skills Sentry offers.

A practical workflow: when you need to add Sentry to a project, go to the Sentry docs, find the SDK for your framework, copy the setup prompt they provide, and give that to your agent. The docs include current best practices and the right SDK version. Don't just tell the agent to "add Sentry." It will find a way to do it, and the result will probably work, but it won't be right.

Monitoring beyond errors

Errors are the obvious case. But some of the most useful monitoring is for things that aren't errors.

Log-based monitors let you alert on patterns in your log stream. If you're draining Supabase logs into Sentry, you can create a monitor that fires when the count of connection received logs drops below a threshold in a given hour. Not an error, just a signal that something might be wrong with your database connectivity. In the Sentry UI: Alerts > Create Alert > Logs, filter by message content, set a count threshold, and assign it to yourself or a team.
Dynamic alerting is useful when you don't know your normal thresholds yet. Set an alert to use anomaly detection instead of a fixed value. Sentry's ML figures out what "normal" looks like for your transaction response times and fires when something falls outside that pattern. Start with dynamic, tune to specific values once you understand your baseline.
Sentry CLI for dashboards: The new Sentry CLI has a dashboards command that an agent can use to build a custom dashboard from your actual trace data. Point it at your project, ask it to build a performance dashboard for your application, and it will inspect your active transactions and spans to figure out what's worth visualizing. The output isn't perfect (you'll want to review widget configurations), but it's a reasonable starting point that takes about thirty seconds instead of thirty minutes.

Seer: from alert to fix

Seer is Sentry's AI debugger. It has access to your full issue history, traces, logs, and session replays. You can ask it plain questions: "which of my open issues are getting worse?" or "what are my slowest database queries?" and it will pull from your actual data to answer.

The more interesting capability is Autofix. Configure it in your Sentry project settings by connecting your repository. When a new issue comes in, Seer automatically suggests a likely root cause and, if you want, generates a draft PR with a suggested fix. You can configure how far it goes: root cause only, or full fix with updated tests.

For the Supabase security advisory workflow: the Supabase MCP exposes RLS policy issues and other advisories. An agent with both the Supabase MCP and the Sentry MCP can fetch those advisories and create Sentry issues from them, putting security problems into the same workflow as application errors. From there, Seer can pick them up and attempt fixes automatically.

This is what "self-healing software" actually looks like in practice: not magic, but a pipeline where new issues get triaged, analyzed, and handed to a coding agent without you having to be the one who notices them first.

Where to start

The fastest path is three steps: run npx sentry@latest init to instrument your Next.js app, add the Supabase integration to your server config for query-level spans, and set up a log drain from Supabase into its own Sentry project. That gets you unified tracing across the full stack. From there, connect your repo to Seer and let it start suggesting fixes for new issues as they come in.

The Sentry Supabase integration docs cover setup end to end. Supabase has their own Sentry monitoring guide and a separate guide for edge function monitoring.

Monitor Unreal Engine Game Performance with Application Metrics

Fri, 08 May 2026 00:00:00 GMT

Your Unreal game can ship with zero errors and still not feel great. Stutters during combat, a frame-rate cliff on the big boss, rubber-banding in multiplayer: none of it shows up as a crash and none of it shows up in Sentry, leaving you without any visibility into what your players are actually experiencing in the wild. Well, until now.

Unreal Engine already gives you plenty of tools to measure game performance and collect runtime stats, but all that data stays on the dev's machine.

The Unreal SDK's new automatic performance metrics feature closes this gap by piping FPS, frame time, network health, and other common game telemetry straight to Sentry, so your team gets actionable insight into where performance breaks down, on which hardware, for which players. Pair it with Release & Health and you can watch the performance impact of each release land over time.

A quick note before we dig in: every gamedev has used a profiler at some point. Automatic performance metrics are a different-but-related tool. Both go after the same problem at different layers: metrics find where the game is slowing down, profiling explains why.

What Sentry now tracks

Currently, the Unreal SDK auto-instruments metrics for several key areas that impact overall performance, including frame time, network and game-specific stats.

Frame time

The most direct read on whether your game feels responsive. Frame times tell you "how long the engine spent on each frame"; breaking it down by thread tells you which subsystem is the bottleneck.

Average FPS
Total frame time
Game thread work time
Render thread work time
GPU frame time

Comparing game thread vs render thread vs GPU time is the classic way to tell whether you're CPU-bound or GPU-bound and which team (gameplay, rendering, content) owns the fix.
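As a rough illustration of that comparison, the sketch below labels a frame by whichever of the three timings dominates. It is a simplification written in Python for readability, not SDK code: the function name and the sample numbers are hypothetical, and a real analysis would also account for threads waiting on each other.

```python
def classify_bottleneck(game_thread_ms, render_thread_ms, gpu_ms):
    """Return a label for whichever of the three timings is largest."""
    timings = {
        "CPU-bound (game thread)": game_thread_ms,
        "CPU-bound (render thread)": render_thread_ms,
        "GPU-bound": gpu_ms,
    }
    return max(timings, key=timings.get)


# Hypothetical averages in milliseconds for two hardware groups.
print(classify_bottleneck(game_thread_ms=14.2, render_thread_ms=9.1, gpu_ms=8.0))
print(classify_bottleneck(game_thread_ms=6.0, render_thread_ms=7.0, gpu_ms=15.5))
```

The first case points at gameplay code, the second at rendering or content.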

FPS metric example (grouped by GPU)

Network insights

Multiplayer performance lives or dies by connection quality, and crash reporting can't see any of it. These metrics tell you whether packet loss, latency or bandwidth starvation is quietly degrading the experience.

Incoming/outgoing bandwidth
Packet throughput and loss
Client ping and jitter
Active connection count

Server builds additionally get per-client ping averages, per-client bandwidth and saturated-connection counts for load-shedding analysis (see the full list of network metrics).

These metrics only exist during active multiplayer sessions. Singleplayer games without networking emit nothing here and some values are client-only (ping, jitter) or server-only (active clients, saturation).

Ping metric example

Game stats

A small grab-bag of engine-level signals that often explain hitches the frame-time breakdown alone can't.

Number of active UObjects
Physical memory used by the process
Duration of the blocking GC pause

A UObject count that climbs steadily between GCs is a classic leak signature and correlating it with GC pause duration often reveals exactly when a leak starts hurting player experience.

Unlike frame time, these are sampled on a slower cadence: memory and object count every 60 seconds, GC pause emitted after each collection cycle. Values change slowly enough that per-frame resolution would be wasted throughput.

Used Memory metric example (grouped by platform, console-only)

Sampling performance metrics

Emitting a metric every frame would add overhead of its own. To avoid that, the SDK samples at a fixed interval, emitting one data point every N frames for per-frame metrics like frame time and FPS, and every N seconds for slower-changing ones like memory use or network health. The defaults are conservative and tunable per project:

~2 samples per second for frame time at 60 FPS
Every 10 seconds for network
Every 60 seconds for game stats

On any single client this is sparse: a hitch on a non-sampled frame won't be captured. But across many players the aggregate distribution converges on the real picture. You want to know "what's the p95 frame time on RTX 3050 hardware?", not "what did frame #47312 look like on dev's laptop." If you need tighter resolution, simply dial the interval down.
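The every-N-frames idea can be sketched in a few lines. This is an illustration in Python, not the SDK's actual implementation; the class name is hypothetical, and the interval of 30 frames is an assumption inferred from "~2 samples per second at 60 FPS".

```python
class FrameSampler:
    """Emit one data point every `interval` frames."""

    def __init__(self, interval, emit):
        if interval < 1:
            raise ValueError("interval must be at least 1")
        self.interval = interval
        self.emit = emit
        self._frames_since_sample = 0

    def on_frame(self, frame_time_ms):
        # Count every frame, but only hand one in `interval` to the emitter.
        self._frames_since_sample += 1
        if self._frames_since_sample >= self.interval:
            self._frames_since_sample = 0
            self.emit(frame_time_ms)


samples = []
sampler = FrameSampler(interval=30, emit=samples.append)

# Simulate 10 seconds of gameplay at 60 FPS.
for _ in range(600):
    sampler.on_frame(16.7)

print(len(samples))
```

Ten seconds of frames produce 20 data points, so the per-client cost stays flat no matter how long the session runs.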

Metrics attributes

An aggregate FPS number on its own doesn't tell you much. What makes it useful is breaking it down: per GPU, per platform, per level. Every automatic metric is tagged with context attributes so you can do exactly that:

GPU model name
Number of CPU cores
Total physical RAM
Screen resolution
Current game map/level name

Metrics also carry the release version, operating system, and crucially the trace ID of whatever was happening when they were emitted. That last one is what separates metrics-in-Sentry from a standalone monitoring tool: spot a frame-time spike in the dashboard, click into the sample and you land in the full trace for that moment alongside any errors and spans captured with it.

For example, group FPS (game.perf.fps) by GPU (gpu.name) and the answer to "what FPS do RTX 3080 players actually see versus RTX 3050?" is one query away. Swap the grouping to OS (os.name) and you can compare memory footprint across Xbox, PlayStation and Switch.
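Conceptually, grouping by an attribute just buckets samples by that attribute's value and then aggregates each bucket. The sketch below illustrates that with made-up FPS values. The sample shape (`value` plus `attributes`) and the helper names are assumptions for illustration, not Sentry's query engine.

```javascript
// Nearest-rank percentile over a list of numbers.
function percentile(values, p) {
  const sorted = [...values].sort((a, b) => a - b);
  const rank = Math.ceil((p / 100) * sorted.length);
  return sorted[Math.max(0, rank - 1)];
}

// Bucket sample values by the value of one attribute.
function groupBy(samples, attribute) {
  const groups = new Map();
  for (const sample of samples) {
    const key = sample.attributes[attribute];
    if (!groups.has(key)) groups.set(key, []);
    groups.get(key).push(sample.value);
  }
  return groups;
}

// Made-up game.perf.fps samples, tagged with gpu.name.
const samples = [
  { value: 118, attributes: { 'gpu.name': 'RTX 3080' } },
  { value: 124, attributes: { 'gpu.name': 'RTX 3080' } },
  { value: 61, attributes: { 'gpu.name': 'RTX 3050' } },
  { value: 48, attributes: { 'gpu.name': 'RTX 3050' } },
];

const medianFpsByGpu = {};
for (const [gpu, values] of groupBy(samples, 'gpu.name')) {
  medianFpsByGpu[gpu] = percentile(values, 50);
}

console.log(medianFpsByGpu);
```

Swapping `'gpu.name'` for another attribute key changes the breakdown without touching the samples, which is why tagging at emit time matters.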

Try it out and tell us what's next

Automatic performance metrics are enabled by default in Unreal SDK 1.11.0. See the Unreal SDK metrics docs for more on engine-version requirements and advanced configuration. Automatic metrics work on desktop, consoles and Android (with iOS support coming soon).

Ship a build with automatic performance metrics enabled and let it run for a few sessions. That's often enough to see whether hardware segmentation, frame-time percentiles or network health are already surfacing something worth fixing.

And since the feature is still experimental, what gets measured next is up for grabs. If there's a signal you wish we were capturing, open an issue on the Unreal SDK repo, as that's the best way to shape where this goes.

Have questions or feedback?

Join the conversation in our Discord
Email us at gaming-updates@sentry.io

New to Sentry?

Try Sentry for free

Fixing JavaScript observability, one library at a time

Thu, 07 May 2026 00:00:00 GMT

Over the past few weeks, we have been driving a cross-ecosystem effort to replace the "monkey-patching" that powers all JavaScript APM tools today with something built into the runtime. Here is why, how, and where it stands.

This applies to server-side JavaScript only (Node.js, Bun, Deno, Cloudflare Workers). Browsers do not have diagnostics_channel and lack the async context propagation primitives needed to polyfill it.

Monkey-patching does not scale

My teammate Sigrid wrote a detailed breakdown of why monkey-patching is failing and how TracingChannel solves it.

The short version: every JavaScript APM tool, including Sentry's, instruments libraries by intercepting require() and import calls at runtime using import-in-the-middle (IITM) and require-in-the-middle (RITM). This breaks with ECMAScript Modules (ESM), does not work in non-Node runtimes, conflicts with bundlers, and couples us to internal implementation details we do not control. The SDK also must load before the library it instruments, or instrumentation silently does nothing.

This is not a Sentry-specific problem. Every APM vendor maintaining JavaScript instrumentation deals with the same fragility. The ecosystem is stuck.

Most library maintainers do not think about observability. They do not know what they would need to expose, and adopting something like OpenTelemetry means taking on an implementation burden, not just adding a standard. APMs managed to patch their way around this for years, so nobody on the library side ever had to figure it out.

But there's a better way.

TracingChannels - observability without patching

In late 2025, we were working with Pooya Parsa (creator of Nitro, h3, and the unjs ecosystem) on the best way to build a Sentry SDK for the Nitro framework. During that conversation, my teammate Sigrid suggested we look into TracingChannel, a built-in API from Node's diagnostics_channel module. Sigrid's blog post covers that API in depth, but the core idea is simple: if a library publishes structured events on a TracingChannel, any APM tool can subscribe to those events without patching anything. The library just says "a query started" and "a query ended," and whoever is listening can create spans from that.

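// Library side (e.g. inside mysql2)
const { tracingChannel } = require('node:diagnostics_channel');

const queryChannel = tracingChannel('mysql2:query');

// `connection`, `sql`, `host` and `port` come from the surrounding driver code.
// tracePromise publishes start/end events around the call, then
// asyncStart/asyncEnd (and error on rejection) when the promise settles.
queryChannel.tracePromise(async () => {
  return await connection.query(sql);
}, { query: sql, serverAddress: host, serverPort: port });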

The cost of this added code is minimal, so it is an easy sell for library maintainers. On the APM side, we just subscribe to that tracing channel and receive the events. No IITM, no RITM, no loader hooks, no initialization ordering. Zero overhead when nobody is listening. Works across Node, Bun, and Deno. Bundler-safe. The API has been available since Node 18, and dc-polyfill covers runtimes that lack it, which already matches our support range.
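For a sense of what the subscriber side can look like, here is a minimal sketch using Node's `diagnostics_channel`. The channel name `example-db:query`, the `runQuery` stand-in, and the shape of the `finished` records are assumptions for illustration. A real APM subscriber would create spans instead of pushing to an array.

```javascript
const { tracingChannel } = require('node:diagnostics_channel');

// Hypothetical channel name, used only for this sketch.
const channel = tracingChannel('example-db:query');

const events = [];   // order in which channel events fired
const finished = []; // one record per settled operation

// The same context object is passed to every event of one traced call,
// so the subscriber can stash its own bookkeeping on it.
channel.subscribe({
  start(ctx) {
    events.push('start');
    ctx.startedAt = performance.now();
  },
  end() {
    events.push('end');
  },
  asyncStart() {
    events.push('asyncStart');
  },
  asyncEnd(ctx) {
    events.push('asyncEnd');
    finished.push({
      query: ctx.query,
      durationMs: performance.now() - ctx.startedAt,
      failed: 'error' in ctx,
    });
  },
  error() {
    events.push('error');
  },
});

// Stand-in for a library call; a real driver would do I/O here.
function runQuery(sql) {
  return channel.tracePromise(async () => [{ ok: true }], { query: sql });
}

const pending = runQuery('SELECT 1');
pending.then(() => console.log(events.join(' -> ')));
```

Note that `start` and `end` bracket the synchronous part of the call, while `asyncStart` and `asyncEnd` fire once the promise settles. That is why the span is closed in `asyncEnd` here.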

Everyone agrees, nobody is pushing

After learning enough about the TracingChannel API and how to make it work with OpenTelemetry, I opened an issue on OTel JS in November 2025 to discuss TracingChannel support.

The response was positive. A while after, someone from the OTel team even created a draft API approach for integrating TracingChannel into the OTel SDK.

But there is no significant push to drive ecosystem adoption. The draft exists; the ecosystem work does not.

Everyone agrees that TracingChannel is the future of JavaScript observability, but nobody is doing the work of getting libraries to adopt it. We have many instrumentations across databases, web frameworks, message queues, and AI providers that need TracingChannel support. That is a mountain of upstream PRs, each requiring understanding the library's internals, writing a proposal that maintainers will accept, implementing the changes, and iterating on review feedback.

So I thought "fine, why not just get the ball rolling?"

The first step was proving the pattern works. I had already built TracingChannel support by hand in h3, srvx, unstorage, db0, and Nitro as part of the earlier SDK work. The unjs ecosystem was receptive and moved fast, which gave us shipped examples to point to and an end-to-end mental model: how events should be shaped, how context propagation flows, how to make it work with OTel, and what semantic conventions to follow.

We also learned early that you can't just say "hey, you should use TracingChannel." A request like that is begging to be shelved and left to collect dust. Instead, as we did with Nitro, we say "Hey, we will do it for you and help you own it." Accepting code into a repository adds a maintenance burden, so we offer to help own it and make it part of the library.

With that in mind, I reached out to pg, mysql2, and redis to gauge their interest, offering to fully own this 'til it ships and provide support even after. These are the top database driver libraries in the ecosystem, accounting for over 60 million downloads per week combined. If we can get TracingChannel in them, we can get other libraries. All three said yes and were open to receiving a PR.

I also reached out to Stephen Belanger, the creator of the diagnostics_channel API in Node.js core. He is now helping push this forward, providing feedback on proposals and acting as the voice of authority which is sometimes needed to convince maintainers.

So one by one, we're making this happen across the ecosystem.

For context on how this fits into the bigger picture: my team is working on making our SDK runtime-agnostic, and we are pursuing multiple paths in parallel, most of which have an immediate effect. The TracingChannel initiative is the long-term play. We cannot expect users to upgrade to new library versions overnight, and we probably won't convince every library to adopt TracingChannel at the same time, so the migration will be gradual.

Scaling it with AI

Here is the practical reality: one person adding TracingChannel support to 44 libraries by hand is just not going to happen. I do not know the internals of any of them. I had never looked at the Redis protocol implementation or mysql2's query pipeline before this project.

So I built a feedback loop using Claude Code that handles the per-library heavy lifting via four skills:

Research and Propose. Given a library name, Claude researches its async model, existing OTel instrumentation, maintenance status, and internal architecture, then drafts a proposal following all the patterns we have established. I review and adjust before it goes anywhere.
Implement. Given an approved proposal, Claude produces a working implementation with tests, handling tracePromise/traceCallback selection, hasSubscribers guards, Node 18 compatibility, and integration tests against real services via Docker.
Capture Review Feedback. When a PR gets reviewed upstream, Claude triages every comment, assesses validity, suggests responses, and flags patterns that should inform future proposals. I decide what to act on and handle all communication with maintainers myself.
Update the Tracker. Claude fetches the latest status of every upstream PR and keeps the migration tracker current.
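The Implement step mentions choosing between tracePromise and traceCallback and adding hasSubscribers guards. As a rough illustration of the callback-style case, here is a sketch for a hypothetical cache client. The channel name, `rawGet`, and the `hasAnySubscribers` helper are assumptions, not code from any of the upstream PRs. The guard checks the five underlying channels directly so it does not rely on newer TracingChannel conveniences.

```javascript
const { tracingChannel } = require('node:diagnostics_channel');

// Hypothetical channel name for a callback-style cache client.
const getChannel = tracingChannel('example-cache:get');

const EVENT_NAMES = ['start', 'end', 'asyncStart', 'asyncEnd', 'error'];

// Guard that works on older runtimes: check the five underlying channels.
function hasAnySubscribers(channel) {
  return EVENT_NAMES.some((name) => channel[name].hasSubscribers);
}

// Stand-in for the library's real implementation; the callback is the last argument.
function rawGet(key, callback) {
  setImmediate(() => callback(null, `value-for-${key}`));
}

function get(key, callback) {
  if (!hasAnySubscribers(getChannel)) {
    return rawGet(key, callback); // nobody is listening: skip tracing entirely
  }
  // Position -1 tells traceCallback that the callback is the last argument.
  return getChannel.traceCallback(rawGet, -1, { key }, undefined, key, callback);
}
```

A promise-returning API would use tracePromise in the same spot. The guard keeps the untraced path identical to the original function, which tends to make the change easier for maintainers to accept.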

Each cycle feeds the next one. Learnings from one library's review process improve the next library's proposal. The knowledge compounds and is dumped into a LEARNING.md file to guide future work.

To clarify the human/AI split: Claude handles research, boilerplate implementation, and pattern application. I handle architecture decisions, insertion point identification, all maintainer communication, and final review of every line before it ships. Critically, every commit is co-authored and AI involvement is made transparent. Library maintainers interact with a human, not with an AI. I kept certain parts human-led because that shows respect to the maintainer's work, which is critical to convincing them to adopt code into their library.

This approach turned what would be a multi-year solo effort into a production line where I can keep dishing out proposals every day, start implementations in parallel, learn from them all and integrate the learnings into pending and future work.

10 merged, 34 to go

We are tracking 44 instrumentations across four categories. Here is where things stand:

Category	Total	Merged	PR Open	In Discussion	Not Started
OTel-provided	24	4	2	6	12
Sentry-built	10	0	0	1	9
Other ecosystem	8	5	2	1	0
Logging	2	1	0	0	1
Total	44	10	4	8	22

Notable wins:

mysql2 - Merged. One of the most popular database drivers in the npm ecosystem.
node-redis and ioredis - Both merged. The two dominant Redis clients now ship TracingChannel support.
h3, srvx, unstorage - All merged. The unjs ecosystem was early and enthusiastic. This touches Nitro, which in turn touches Nuxt and other downstream frameworks.

We also helped establish ecosystem coordination through an e18e umbrella issue and the untracing spec that standardizes TracingChannel usage for library authors.

What this means for Sentry

This flips the instrumentation model. Libraries own the contract, and we subscribe to it. Every problem described above (ESM breakage, init ordering, runtime lock-in, bundler conflicts) goes away. Our instrumentation code gets simpler, and we stop maintaining runtime-specific hacks.

This also benefits every APM tool, not just Sentry. Driving it builds trust with library maintainers and the broader community, sure, but several maintainers have specifically called out that they appreciate the approach because it helps everyone and is not biased towards any one APM provider.

The flywheel is starting

Take node-redis as a case study. During our collaboration with the Redis team, they were already working on their own first-party OpenTelemetry instrumentation. They wanted our TracingChannel proposal to align with and power that instrumentation. We re-implemented their already shipped metrics plugin using tracing channels and it worked without changing a single test. Now, we are helping them with traces.

Shortly after mysql2 shipped TracingChannel support, someone independently built mysql2-otel-instrumentation, a pure diagnostics_channel subscriber that replaces OTel's monkey-patched @opentelemetry/instrumentation-mysql2. The motivation was exactly the problem we are solving: RITM was not working. A library adds TracingChannel support, and the subscribers manifest on their own.

What's next

We have open PRs against Express, PostgreSQL (pg), Knex, and GraphQL, the kind of libraries where TracingChannel support means millions of applications get better observability without changing a line of their own code. MongoDB, Mongoose, Prisma, and Hono are in active discussion, and we have drafted proposals for Koa and Consola. There are still 20+ libraries on the list we have not reached out to yet, including Node's built-in HTTP module, Kafka clients, and AI provider SDKs.

Beyond individual library adoption, the next layer is reducing duplication on the consumer side. Right now, every APM tool that subscribes to a TracingChannel has to independently map library payloads to OpenTelemetry semantic conventions. We are designing a shared mapper registry, a set of co-maintained modules that translate TracingChannel events into standardized spans and attributes. The goal is to build and prove this internally at Sentry first, then open-source it so any APM vendor can plug in. If a library ships TracingChannel support and a mapper exists, instrumentation becomes automatic.
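The registry is still being designed, so the following is only a sketch of the general shape such a mapper could take. The `registerMapper` and `mapEvent` names are hypothetical. The payload fields are the ones from the mysql2 snippet earlier in this post, and the attribute keys follow OpenTelemetry's database and server semantic conventions.

```javascript
// Hypothetical registry: channel name -> function that turns a payload into span data.
const mappers = new Map();

function registerMapper(channelName, mapper) {
  mappers.set(channelName, mapper);
}

// Returns span data for a known channel, or undefined when no mapper exists.
function mapEvent(channelName, payload) {
  const mapper = mappers.get(channelName);
  return mapper ? mapper(payload) : undefined;
}

// Example mapper for the payload shape shown in the mysql2 snippet above.
registerMapper('mysql2:query', (payload) => ({
  name: 'mysql2 query',
  attributes: {
    'db.system.name': 'mysql',
    'db.query.text': payload.query,
    'server.address': payload.serverAddress,
    'server.port': payload.serverPort,
  },
}));

const span = mapEvent('mysql2:query', {
  query: 'SELECT 1',
  serverAddress: 'db.internal',
  serverPort: 3306,
});
console.log(span.attributes);
```

The point of keeping mappers as plain functions keyed by channel name is that any APM tool could reuse them without sharing the rest of its SDK.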

The long-term picture is an ecosystem where libraries emit events as a first-class concern, mappers are community-maintained, and APM tools compete on what they do with the data rather than on how creatively they can patch your dependencies. We are not there yet, but the flywheel is turning.

You can help by talking about tracing channels and advocating for their adoption in the libraries you use. If you maintain a library and want to add TracingChannel support, the untracing conventions and our published proposals are a good starting point.

Improved debugging for Expo apps with the React Native SDK

Wed, 06 May 2026 09:00:00 GMT

Events from Expo apps account for about 75% of the total event volume we receive from React Native apps. That number made it an easy decision to invest in updates to the Sentry React Native SDK to improve the debugging and performance workflow for your Expo apps.

With these updates, you can now:

Filter issues by OTA update channel or version to instantly narrow down whether a problem is tied to a specific update
Get alerted on emergency launches so you know when your OTA pipeline is failing before users report it
Track EAS Build health in Sentry so you don't have to dig through build logs to find out what broke
See the full picture of navigation performance including prefetch timing and asset loading

Automatic OTA update context on every event

When you ship over-the-air updates with Expo Updates, things can go wrong in ways that are invisible without the right context. Which update channel was the user on? Which runtime version? Was this the embedded bundle or a downloaded update?

Now, every Sentry event is automatically enriched with an ota_updates context, with no other setup required. You get the update ID, channel, runtime version, launch duration, and whether the app is using embedded assets. All of this is captured out of the box in Expo projects.

We also set searchable tags (expo.updates.channel, expo.updates.runtime_version, expo.updates.update_id) on every event, so you can filter your issue stream down to a specific channel or update with a single search query.

Emergency launch detection

Expo Updates performs an emergency launch when it fails to load the latest OTA update and falls back to the embedded bundle. When this happens, your users are silently running an older version of your app, and you might not even know.

The SDK now detects emergency launches at startup and automatically sends a warning-level event to Sentry with the reason. You can set up an alert on the expo.updates.emergency_launch tag and know immediately when your update pipeline is broken in production.

EAS build hooks: Track build failures in Sentry

Build failures in EAS Build happen on remote infrastructure, outside your local environment and outside your app. Until now, debugging them meant digging through EAS build logs manually.

With the new EAS Build Hooks, you can send build lifecycle events directly to Sentry. Add three script entries to your package.json (or just one if you prefer the combined on-complete hook) and every failed build will send an EASBuildError event with the build platform, profile, build ID, git commit hash, and CI status.

You can also capture successful builds to give you a complete picture of your build pipeline health right inside Sentry. All events are tagged with eas.* tags for easy filtering and alerting.

Setup is minimal: point the hook scripts at the ones the SDK provides, set your DSN as an EAS secret, and you're done. Check out the EAS Build Hooks documentation for setup instructions.

Performance spans for Expo Router prefetching

Expo Router v5 introduced router.prefetch() to preload routes before the user navigates to them. It's a great tool for perceived performance, but until now, prefetch timing was invisible in your traces.

Wrapping your router with Sentry.wrapExpoRouter(useRouter()) now creates a navigation.prefetch span for each prefetch call. You can see exactly how long route preloading takes alongside your other navigation spans, and identify routes where prefetching is slow or unnecessary.

Expo constants and environment context

Every event from an Expo app is automatically enriched with an expo_constants context containing metadata about the execution environment: where the app is running (Expo Go, standalone, bare), the app name and version from app.json, Expo SDK version, EAS project ID, and debug mode status.

Combined with the OTA updates context, this gives you a complete picture of the environment for every event without writing a single line of configuration code.

Image and asset loading instrumentation

Image and asset loading is one of the biggest contributors to how fast your app feels. We've added automatic performance spans for two of the most common Expo packages:

expo-image: Wrap Image with Sentry.wrapExpoImage(Image) once at startup, and every Image.prefetch() and Image.loadAsync() call gets a performance span.
expo-asset: Wrap Asset with Sentry.wrapExpoAsset(Asset) for spans on Asset.loadAsync().

Both wrappers are safe to call multiple times, create spans only when a trace is active (zero overhead otherwise), and don't require expo-image or expo-asset to be installed. They're peer dependencies.

Getting started

All of the OTA update and constants context is enabled by default. No configuration, no extra dependencies. For build hooks and performance instrumentation, setup is a few lines of code. Make sure you're on version 8.10 to get the latest improvements and fixes.

Expo also allows you to pull in issue details and replays from Sentry for errors occurring in your EAS deployments.

Don't have a Sentry account yet? Sign up for free; the developer plan includes everything you need to instrument an Expo app end to end. Have feedback on these integrations or ideas for what should come next? Open an issue on getsentry/sentry-react-native or drop into our Discord.

Introducing Application Metrics: Track the signal, see the spike, jump to the trace

Tue, 05 May 2026 00:00:00 GMT

A few weeks ago we had a bug with Session Replay. Replays were failing in some browsers once more than 1,000 video segments loaded. We had no idea how often it happened or who was hitting it, and because the failure didn't always produce an error, we had no way to find affected users to reproduce it.

Before, we could've answered this with spans or logs, but it's clunky — spans are often sampled, so you can miss outliers; logs are less structured and tend to change over time. Both are better suited for investigation. Metrics are ideal for tracking known behaviors over time. So we set up a metric in the Sentry SDK with a user and provider attribute, filtered for sessions over 1,000 segments, and had a repro case in minutes.

That's the job Application Metrics is for: track the signals you care about, and attach the context you might need later. When something breaks, the data is already there waiting.

Full events, not pre-aggregated counters

Metrics tools designed for tracking infrastructure telemetry tend to aggregate, stripping out information like user, IP address, and region. They're just a counter.

Sentry's Application Metrics stores full events, including high-cardinality fields like user. So you're able to ask not just "was the checkout experience slow in my application?", but "was the checkout experience slow for users on the east coast?", or "was a specific user's scheduled job causing a queue backlog?"

Same SDK, one line of code

If you're on a recent Sentry SDK, metrics are already enabled. No new dependencies or sidecar — just one more line.

There are three types you'll reach for most:

Counter — increment a number each time something happens. Think payment.declined, search.zero_results, or email.failed. Good for tracking rates and totals you want to alert on.
Distribution — record a value each time something happens, then ask questions about the spread. How long did that job take? How many items were in the queue? Use this when the average isn't the whole story.
Gauge — track a current value over time. queue.depth, cache.size, active.connections. The number you'd want on a dashboard.

You can attach attributes to all three. That's where Application Metrics differs from many infrastructure monitoring tools, which pre-aggregate and strip context. When you attach user.id, region, or projectId, the event is stored with that context intact — so when a distribution spikes, you're not just looking at a number, you're looking at a number tied to a specific user, in a specific region, on a specific project.
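To see why keeping attributes matters, compare a pre-aggregated counter with a store of full events. This is a toy in-memory model with made-up data, not the Sentry SDK API. `recordCount` and `sumWhere` are hypothetical helpers.

```javascript
// Pre-aggregated counter: cheap, but context is gone once values are summed.
let declinedTotal = 0;

// Event store: every data point keeps the attributes it was recorded with.
const metricEvents = [];

function recordCount(name, value, attributes) {
  declinedTotal += value;
  metricEvents.push({ name, value, attributes });
}

// Sum the values of all events with this name that match a predicate on attributes.
function sumWhere(name, predicate) {
  return metricEvents
    .filter((event) => event.name === name && predicate(event.attributes))
    .reduce((total, event) => total + event.value, 0);
}

recordCount('payment.declined', 1, { 'user.id': 'u1', region: 'us-east' });
recordCount('payment.declined', 1, { 'user.id': 'u2', region: 'eu-west' });
recordCount('payment.declined', 1, { 'user.id': 'u1', region: 'us-east' });

// The counter can only answer "how many?".
console.log(declinedTotal); // 3

// The event store can also answer "how many for this user, in this region?".
const forUserU1 = sumWhere('payment.declined', (attrs) => attrs['user.id'] === 'u1');
console.log(forUserU1); // 2
```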

Click a spike, see the trace

By storing full metrics events — including the trace ID — metrics become part of a broader trace-connected debugging workflow.

When a metric reaches an unexpected threshold (a background job backing up with unsent emails; a UI component taking painfully long to load) you can jump from that metric to traces, logs, and errors, and get a full picture of what actually went wrong around the time your pager went off:

Is a 429 error happening in a loop at the same time that a distribution measuring React component load times spikes?
Is an upstream email service running slow at the same time that a gauge measuring queue depth increases?

How we actually used this to find a Session Replay bug

To investigate the Session Replay problem, we began by adding a distribution that tracked the number of video segments loaded. We included the high-cardinality projectId attribute.

Here's the code we added to start tracking video segments in replays:

// `replay`, `events` and `videoEvents` come from the surrounding React code.
const replayId = replay?.getReplay().id;
const projectId = replay?.getReplay().project_id;

const onLoadAllEvents = useEffectEvent(() => {
  const attributes = {
    projectId: String(projectId),
    replayId,
  };

  Sentry.metrics.distribution('replay.eventCount', events?.length ?? 0, {
    attributes,
  });

  Sentry.metrics.distribution('replay.videoEventCount', videoEvents?.length ?? 0, {
    attributes,
  });
});

See getsentry/sentry#114001.

We attached replayId and projectId as attributes on each metric so we could isolate high event counts to specific projects and replays. Since we were having trouble reproducing the problem, this would help us catch the issue red-handed and trace it back to a specific organization.

With that in place, we quickly learned two things:

The issue was rare — just 7 occurrences in the past week.
We had the exact users affected.

From there, we could trace those sessions, reproduce the issue, and fix it.

Because each metric event carries a trace ID, we could go further. We added targeted logs to see exactly what the user was doing when >1,000 frames loaded — were they scrubbing the video, loading many videos in succession, etc. Next time we saw a replay.videoEventCount over 1,000, we jumped to the connected trace, saw the log lines, and had the context to fix the bug.

Metrics vs. everything else

Metrics aren't a replacement for errors, traces, or logs. They fill a specific gap: tracking interesting, well-understood events in your application with high fidelity.

Not every event needs to be a metric. Logs are great during investigation. But when you find a signal you care about long-term — something that tracks application health — turn it into a metric.

Good candidates: business KPIs tied to code execution (payment.declined, search.zero_results), application health indicators (job.retried, email.failed), resource utilization (queue.depth, cache.hit_rate), and success/failure rates you want to alert on.

Not the right fit: infrastructure metrics like CPU and memory (use your infra tool), forensic debugging (use Sentry Logs), or request-level performance and connectivity (use Sentry Tracing).

Start with the metric your team checks first

Every Sentry plan comes with 5GB of Application Metrics. If you're on a recent SDK version, you already have access.

Pick the one signal your team reaches for first when something goes wrong. Maybe it's checkout.failed, maybe it's queue.depth, maybe it's deployment.duration. Instrument it, attach the attributes you'd want to filter on — user, project, region, whatever matters for that metric — and set an alert threshold.

When it fires, click through to the trace, find the context around the spike, and fix it.

Start a free Application Metrics trial in Explore > Metrics, or check out the Application Metrics docs.

Two commands to Sentry: now on Stripe Projects

Wed, 29 Apr 2026 07:00:00 GMT

Two commands. That's how little it takes to go from nothing to a fully configured Sentry project with error monitoring, performance tracing, and session replay:

stripe projects init my-app
stripe projects add sentry/project

No signup form. No email verification dance. No dashboard tab-switching to copy-paste a DSN into your .env. Your account is created, your project is provisioned, and five environment variables land in your working directory, ready for your SDK to pick up.

And if you're using a coding agent? It does the same thing, except you didn't type the commands. You just said "add error monitoring."

What this actually is

Sentry is now a provider in Stripe Projects. Stripe Projects is a CLI workflow that lets developers (and their AI agents) discover, provision, and manage infrastructure services directly from the terminal. Think of it as a package manager, but for the services your app depends on at runtime.

Here's the full catalog:

$ stripe projects catalog sentry

SERVICES
    project   ● Free tier
    seer      ● Paid

PLANS
    developer ● Free
    team      ● $29/month
    business  ● $89/month

Two deployable services (a Sentry project and Seer AI), three plan tiers. All manageable from the CLI. Billing goes through your existing Stripe payment method, with no separate Sentry billing setup.

The "just tell your agent" part

When you run stripe projects init, it scaffolds agent skill files into your project:

.agents/skills/stripe-projects-cli/SKILL.md
.claude/skills/stripe-projects-cli/SKILL.md
.cursor/rules/stripe-projects-cli.md

These teach Claude Code, Cursor, or any coding agent how to use the Stripe Projects CLI. The agent reads the skill, discovers available services via stripe projects catalog sentry --json, and provisions what you need.

A real interaction looks like this:

You: "Add error monitoring to this project"

Agent: [runs stripe projects catalog sentry --json]
Agent: [runs stripe projects add sentry/project --no-interactive --accept-tos]
Agent: "Done. Sentry is provisioned. Your DSN and auth token are in .env.
        I can integrate the Sentry SDK into your app next."

The agent doesn't need special Sentry knowledge. It just needs the Stripe Projects CLI and the skill file that init already created. Provisioning becomes a step it handles between writing your code and running your tests.

Once the account is provisioned, you can ask the agent to instrument your app with sentry init. Since the auth token and DSN are already in the environment, the Sentry CLI knows exactly which project to target, with no configuration prompts, no guesswork.

Upgrades, downgrades, and the billing dance

Plans are first-class resources:

stripe projects add sentry/team                          # start on Team ($29/mo)
stripe projects upgrade sentry-plan sentry/business      # upgrade to Business
stripe projects downgrade sentry-plan sentry/team        # change your mind

All non-interactive, all scriptable. Billing happens through Stripe's Shared Payment Token, so your existing Stripe payment method pays for Sentry with zero billing configuration on the Sentry side.

You can also add Seer (our AI debugging assistant) as a separate service:

stripe projects add sentry/seer      # $40/active contributor/month
stripe projects remove sentry-seer   # if you change your mind

The magic login

This one's my favorite. When you need your Sentry dashboard:

stripe projects open sentry

This mints a single-use magic login URL. Click it (or let the CLI open it), and you're logged into your Sentry dashboard. You skip the password prompt, the single sign-on (SSO) redirect, and the "which account was this again?" moment. Straight to your issues page.

It's the kind of feature that sounds trivial but has a surprisingly dense security model once you start thinking about two-factor authentication (2FA) users, expired passwords, and SSO bypass prevention.

Multi-team collaboration

Here's where it gets interesting. Stripe's protocol distinguishes between the account owner (the Stripe account's email) and the actor (the person running the CLI command). We use this to build a proper collaboration model:

First team member runs stripe projects add sentry/project → creates the Sentry org
Second team member runs the same command → joins the same Sentry org
Everyone shares one org, one billing setup, one set of projects

We tie the Sentry organization to the Stripe organization, so everyone on the same Stripe account ends up in the same Sentry org. No per-developer silos, no "who created this and why can't I see it" conversations.

Credential rotation

stripe projects rotate sentry-project

New DSN, new auth token, old ones revoked. The fresh credentials land in your .env automatically. The dashboard stays closed.

How to try it

Install the Stripe CLI and the Projects plugin:
```
stripe plugin install projects
```

Initialize and add Sentry:

stripe projects init my-app
stripe projects add sentry/project

That's it. Your SENTRY_DSN and SENTRY_AUTH_TOKEN are in .env, ready for any Sentry SDK to pick up.
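If you want a quick sanity check before wiring up an SDK, something like the sketch below can confirm that both variables are present. It assumes the .env file uses plain KEY=VALUE lines. `parseEnv` and `missingSentryVars` are hypothetical helpers, and the values shown are placeholders.

```javascript
// Minimal .env parser for KEY=VALUE lines (no quoting or multiline support).
function parseEnv(text) {
  const vars = {};
  for (const line of text.split(/\r?\n/)) {
    const trimmed = line.trim();
    if (!trimmed || trimmed.startsWith('#')) continue; // skip blanks and comments
    const eq = trimmed.indexOf('=');
    if (eq === -1) continue; // not a KEY=VALUE line
    vars[trimmed.slice(0, eq).trim()] = trimmed.slice(eq + 1).trim();
  }
  return vars;
}

// Returns the names of the two variables mentioned above that are missing or empty.
function missingSentryVars(vars) {
  return ['SENTRY_DSN', 'SENTRY_AUTH_TOKEN'].filter((key) => !vars[key]);
}

// Placeholder content; in a real project you would read the file with fs.readFileSync.
const exampleEnv = [
  '# example values only',
  'SENTRY_DSN=https://public-key@example.invalid/1',
  'SENTRY_AUTH_TOKEN=placeholder-token',
].join('\n');

const missing = missingSentryVars(parseEnv(exampleEnv));
console.log(missing.length === 0 ? 'ready' : `missing: ${missing.join(', ')}`);
```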

For the full lifecycle (catalog browsing, plan management, Seer, deep links, credential rotation), check the Stripe Projects documentation.

If you're building with AI coding agents and want error monitoring that provisions itself, give it a try. And if your agent breaks something in the process, well, you'll have Sentry to tell you about it.

Sentry's integration with Perforce is now generally available

Wed, 29 Apr 2026 00:00:00 GMT

Perforce meets Sentry

If you work in game development, VFX, or any industry dealing with large binary assets, chances are your codebase lives in Perforce P4. It's the version control system behind some of the biggest games and creative projects in the world — and until now, it's been one of the last major SCMs without first-class Sentry support.

Today, we're changing that. The Sentry + Perforce P4 integration is now generally available for all Sentry organizations.

What you get

The integration connects your P4 server directly to Sentry, unlocking the same source-code-aware debugging workflows that Git-based teams have relied on for years:

Stack trace linking — Click from an error's stack trace directly to the corresponding file in your Perforce P4 depot or P4 Code Review (formerly Helix Swarm) instance.
Commit tracking — Associate Perforce P4 changelists with Sentry releases so you always know exactly what code shipped.
Suspect commits — Sentry automatically identifies which changelists likely introduced an error, cutting your triage time.
Suggested assignees — Get assignment recommendations based on changelist authorship — the person who last touched the code gets surfaced first.
P4 Code Review (formerly Swarm) linking — If your team uses the P4 Code Review (formerly Helix Swarm) application for code reviews and browser-based depot browsing, Sentry links directly to your P4 Code Review instance. Stack trace links open the exact file in P4 Code Review's web UI, so reviewers and investigators can jump from an error straight into the code without needing a local workspace.

On-demand source context

The most requested capability during our beta: show me the code. Previously, showing the source code for native crashes required users to upload source maps to Sentry. Without them, Sentry could still show stack traces, but with no source context.

With on-demand SCM source context, Sentry fetches source code directly from your Perforce P4 depot and displays it inline in the stack trace — even when your crash dumps or error reports don't include embedded source. Expand any in-app frame and Sentry pulls the relevant lines on the fly, with the error line highlighted.

This is especially valuable for native game development workflows where minidumps and crash reports rarely carry source context. Instead of switching to your IDE or running p4 print manually, the code is right there in the issue.

How it works

Setup takes just a few minutes:

Connect your server — Go to Settings > Integrations > Perforce and enter your credentials.
Configure code mappings — Map your Sentry projects to Perforce P4 depots and set up path translations.
Enable source context (optional) — Turn on SCM source context in your project's General Settings to get inline code in stack traces.
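
To make the code-mapping step concrete, here is a minimal sketch of what a path translation does conceptually: it rewrites the build-machine prefix of a stack frame path into a depot path. The function name `to_depot_path`, the two roots, and the example file are all hypothetical. The real mapping is configured in Sentry's settings, not in code.

```python
from typing import Optional


def to_depot_path(frame_path: str, stack_root: str, depot_root: str) -> Optional[str]:
    """Rewrite a stack-frame path into a depot path, or return None if it doesn't match."""
    # Normalize Windows separators so one mapping covers both path styles.
    normalized = frame_path.replace("\\", "/")
    root = stack_root.replace("\\", "/")
    if not normalized.startswith(root):
        return None
    return depot_root + normalized[len(root):]


# Hypothetical mapping: build machine checkout -> Perforce depot stream.
print(to_depot_path("C:\\build\\game\\src\\player.cpp", "C:/build/game/", "//depot/game/main/"))
```

Frames that fall outside the configured root return `None` in this sketch, which mirrors the idea that only mapped paths can be linked to a depot file.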

Sentry communicates with your Perforce P4 server using the P4Python library, executing lightweight read-only commands (p4 depots, p4 changes, p4 print). Each organization maintains isolated credentials, and we support both password-based auth and pre-generated P4 tickets for LDAP environments.

How to get started

During beta, we worked closely with multiple game studios to battle-test the integration against real-world Perforce P4 deployments. We resolved concurrency challenges around P4 trust and ticket file isolation, ensuring connections stay clean and independent — whether you have one project or hundreds hitting the same or multiple servers.

The Perforce integration is available now for all Sentry organizations.

Read the Docs

Have questions or feedback?

Join the conversation in our Discord
Email us at gaming-updates@sentry.io

New to Sentry?

Try Sentry for free

Introducing Seer Agent: The answer is already in Sentry. Now you can ask for it.

Tue, 28 Apr 2026 07:00:00 GMT

This is a story about an engineer's night that could have been bad, but ended up… not so bad.

A few weeks ago, on a Saturday, our AI debugger, Seer, started failing.

Note the big scary spike on the right.

The errors were generic failures from the LLM calls, nothing that pointed at a root cause. Most of the team wasn't scheduled to be on this weekend, and it just so happened Indragie, our Head of AI, was online. He started paging engineers.

While he waited for people to come online, he opened up a tool we've been testing internally for a few months now: Seer Agent. Indragie told Seer Agent a bit about what he was seeing, and asked it to figure out what was going on.

It came back in seconds. The model calls were being rate-limited in specific regions for a specific model, even though we had enough provisioned throughput to handle the traffic. The rate limiting turned out to be a symptom of an upstream infrastructure outage on the provider's side, which we confirmed after the incident, but Seer Agent had already pointed us at the exact region-and-model pattern that made the provider's role obvious. Everything else was fine.

That's the kind of finding that would normally start with someone pulling up a dashboard, filtering by region, cross-referencing traffic against error rate, noticing the shape, and then working backwards to why one specific region was stumbling. Indragie knows his stuff, but he's not contributing to the codebase day to day, he's management ;), so it would have taken him at least half an hour to get there. If we're being honest, probably longer.

He had the root cause ready before the on-call engineer joined the channel.

That's the job Seer Agent is for: to investigate any issue in your application from 'big super visible outage that has people shouting at you on Twitter' to 'things are running slow and you don't know why'.

Today, we're rolling out Seer Agent to everyone in open beta.

The problem isn't always an issue

Seer's original premise was simple: when Sentry catches an issue, Seer reads the stack trace, the trace data, the logs, replays, commit history, and the code, and tells you what's wrong. It works well because the investigation has a concrete starting point (the issue), and the data you need is already linked to it.

But a lot of debugging doesn't start with an error.

Sometimes it starts the way Indragie's example started: you do have an issue, but the error message isn't the most helpful and the real failure is somewhere upstream that the stack trace doesn't reach.

In all of those cases, you know something about the symptom. You just don't know where to look.

So you start manually: open the trace explorer, write a query, filter by environment, group by region, switch to logs, pivot on a tag, go look at the service that's upstream of this one, check its error rates, go back to traces, try a different span attribute. You're not debugging yet. You're navigating to where the debugging will happen.

Seer Agent is the tool that does that navigation for you. You describe what you're seeing, and it does the traversal across all of the context Sentry has on your system and tells you what it found.

Your telemetry is already a graph

You can already search across your telemetry in Sentry's Explore product. You can write queries against traces, filter logs, pivot on attributes. Explore is powerful, and for people who already know the ins and outs of their Sentry data it's the fastest way to answer a specific question.

The problem with starting a debugging session in Explore is that you have to know the shape of your data before you can ask anything. If you don't know which service is upstream of the failing one, you can't filter for it. If you don't know what span attribute to group by, the group-by is a shrug. Explore rewards operators who already have the map.

Seer Agent doesn't search your telemetry the way a generic LLM with a search tool would. Sentry's telemetry is already trace-connected. When an error happens, Sentry knows the trace it happened in. It also knows the spans inside that trace, the logs emitted during those spans, the deploy that was live at the time, and the commits in that deploy. The agent walks those connections directly. It isn't guessing at time ranges and hoping the right rows show up in a text search; it's traversing a graph that was built at ingest.

Concretely: if you ask about an error, Seer Agent can pull the exact trace that produced it, the exact spans in that trace, the exact logs emitted by those spans, and the exact source lines the spans came from, without a single WHERE timestamp BETWEEN clause. Then it can walk the same graph in the other direction: which other services participated in traces that touched this endpoint, which of them were unhealthy at the same moment, and what their error rates looked like.
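
As a rough illustration of the difference between following links and filtering by time, here is a toy in-memory version of that walk. The `Span` and `Trace` classes, the `error_to_trace` index, and the sample data are all hypothetical. This is not Sentry's storage model, only a sketch of ID-based traversal.

```python
from dataclasses import dataclass, field


@dataclass
class Span:
    span_id: str
    service: str
    logs: list[str] = field(default_factory=list)


@dataclass
class Trace:
    trace_id: str
    spans: list[Span] = field(default_factory=list)


# Hypothetical ingest-time indexes: every error records the trace it occurred in.
traces = {
    "t1": Trace("t1", [
        Span("s1", "api", ["POST /summarize"]),
        Span("s2", "llm-gateway", ["429 from provider"]),
    ]),
}
error_to_trace = {"err-42": "t1"}


def context_for(error_id: str) -> dict:
    """Follow error -> trace -> spans -> logs by ID, with no time-range filter."""
    trace = traces[error_to_trace[error_id]]
    return {
        "trace_id": trace.trace_id,
        "services": [s.service for s in trace.spans],
        "logs": [line for s in trace.spans for line in s.logs],
    }


print(context_for("err-42"))
```

Every hop is a key lookup, so the result contains only rows that belong to the failing request, not rows that merely happened around the same time.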

That's what made Indragie's investigation fast. He didn't tell Seer Agent "look at region-level error rates for the Vertex AI provider." He gave it the Sentry issue. It pulled the trace, saw which regions the failing calls were routed to, cross-referenced against recent calls to other models that went through the same provider, noticed that one specific model family was failing in specific regions while others were fine, and surfaced the pattern. Four steps of manual pivoting, done in one pass.

Fixing the hard issues

Some bugs are fun to investigate and tackle yourself. "Lmao look at this silly line of code, who wrote this — oh no, it was me."

Others are not. They're big and ugly and complex and require you to have (or quickly obtain) an absurd amount of context in your brain just to know where to start. Not coincidentally, these are things Seer Agent is very good at.

Failures whose root cause is upstream of your service. Your stack trace ends at your own call site; the real cause is a 429 from someone else's data center. Without Seer Agent you go find the provider's status page, check whether the region you use is affected, and correlate against your own traffic. Seer Agent correlates the traffic against the request shape (provider, model, region, time) and tells you whether the failure is distributed in a pattern that indicates an upstream cause before you open another tab.

Failures that don't trigger a clean alert. A slow degradation on a single endpoint, a 1% error rate that started two hours ago, a tail-latency increase that's only visible in p99. These are the investigations that start with "I noticed this and I want to know if it's real." Seer Agent can pull the baseline for you, compare the current window against it, and tell you whether the thing you noticed is statistically interesting or noise.

Failures that span services. An issue fires in service A, but the real cause is that service B started returning malformed responses ten minutes ago. A trace-connected graph is the only way to see this cleanly, and a human walking the graph manually will lose context two hops in. The agent doesn't.
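
The "is it real or is it noise" question from the baseline comparison above can be sketched with a two-proportion z-score. This is one simple statistical check chosen for illustration; the post doesn't say which method Seer Agent uses. The function name and all of the numbers are assumptions.

```python
import math


def error_rate_z_score(base_errors, base_total, cur_errors, cur_total):
    """Two-proportion z-score: how far the current error rate sits from the baseline."""
    p_base = base_errors / base_total
    p_cur = cur_errors / cur_total
    # Pooled rate under the assumption that both windows share one true error rate.
    pooled = (base_errors + cur_errors) / (base_total + cur_total)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / base_total + 1 / cur_total))
    if std_err == 0:
        # No errors (or only errors) in both windows: nothing to compare.
        return 0.0
    return (p_cur - p_base) / std_err


# Hypothetical numbers: 0.2% errors over the baseline week vs. 1% in the last two hours.
z = error_rate_z_score(base_errors=200, base_total=100_000, cur_errors=50, cur_total=5_000)
print(f"z = {z:.1f}")  # a |z| above roughly 3 is unlikely to be noise
```

A small current window with only a handful of requests would produce a much lower score for the same rates, which is the point: the check weighs the size of the jump against how much data backs it.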

The bottleneck moves from "where do I look" to "what do I do about what I found," which is where you actually want your engineers spending their time.

Multiplayer Mode in Slack

Seer Agent for Slack is still in active development, but it's in beta and ready to use today. You can start an investigation the same way you'd ask an on-call engineer, by DMing or mentioning it in an incident channel, without having to bounce to the Sentry UI while you're trying to put out a fire. Here's an example of how we used Seer Agent in Slack while building it:

The more interesting thing is that the investigation becomes multiplayer. In the Sentry UI, Seer Agent is a solo tool. But in Slack, anyone in the channel can redirect it mid-step, add context the agent didn't have, or just watch the traversal and learn the system a little better. The investigation also stays in the thread after the incident resolves, so when the same pattern shows up next month, someone can search for it instead of starting over.

You can also trigger Autofix directly from Slack. Sentry alerts now include a "Fix with Seer" button and an initial read on the likely error. Clicking it kicks off the full Autofix workflow. This is currently in public beta. Read more about it in the docs.

The setup is light. If you already have the Sentry Slack integration installed, the Fix with Seer button on your error alerts is already live. If you don't, install it from Settings → Integrations. To DM Seer Agent or @mention it in a channel, run /sentry link in Slack to connect your account — if the Sentry app is already installed, just pick the same workspace when you link and the new features turn on.

What we're building next

A few of the things on the short list, roughly ordered by when you'll see them:

Auto-triage on incident creation. Right now, you have to go to Sentry or Slack and prompt Seer. The better version is the one where an incident getting created automatically fires off an investigation and posts the findings back to the incident channel before anyone has to ask. There's a design for this on our side, and we're starting with our own incident workflow.

Proactive follow-ups. When the agent finishes an analysis, it should suggest the next question, not wait for you to figure out what to ask next. "Do you want me to check whether this pattern exists in other services?" is a cheap prompt to generate and a large quality-of-life win for investigations that run long.

Message queueing and forceful interrupts. Small items, but both high-frequency complaints: you can't queue a follow-up while the agent is thinking, and sometimes you want to kill the current step and redirect without losing the session. Both are on the near-term list.

How to try it

Seer Agent is in open beta for all Sentry users. Open any page in Sentry, hit Cmd + / or click the "Ask Seer" button, and ask it something.

Peek at the docs here, and we'll run a workshop on it next month if you want to watch the team drive it live.

If you find a case where it falls over, tell us. Half of what's on the "what we're building" list above came from people using it and telling us exactly where the agent went wrong.

Two years without cookies on the site, here's where we ended up

Mon, 27 Apr 2026 00:00:00 GMT

In January 2024, I wrote about removing all advertising cookies and user tracking from sentry.io. It was eight months into the decision at the time, and we were still figuring out what broke and what surprised us. That post struck a nerve: it became one of the most-read things we've ever published, probably because everyone building or running a product on the web was watching the same cookie deprecation timeline and wondering what would actually happen if someone just ripped the bandaid off.

It's been over two years now. We never put the cookies back (also never planned to). And the way we spend our growth budget has changed pretty dramatically as a result. Not because we planned some grand strategy from the start, but because removing cookies forced us to rethink where we put our money, and what we actually expected it to do. Roughly 70% of our growth budget now goes to awareness. Here's a sample of what that actually looks like:

We signed a multi-year deal with the Golden State Warriors, Valkyries, and Chase Center.
We brought Syntax.fm into Sentry in 2023 rather than starting our own corporate podcast.
We spend a decent amount on billboards and OOH every year.
Podcasts, Reddit, YouTube, third-party newsletters, and influencers are huge channels for us.
We donated $750,000 to open source maintainers through our Open Source Pledge these last two years (and more prior).

Along with the core business model and everything else we do, it's safe to say these investments are working. Our new activated users have been growing exponentially.

This type of investment is pretty uncommon for a company that sells to developers, but we started a lot smaller before getting to these bigger plays. Here's what I've learned so far.

The ways people discover software are changing

If you're building a product, you've probably noticed this already: the ways people find and evaluate tools are shifting under everyone's feet.

There used to be a fairly predictable path. Someone Googles a problem, clicks a few results, maybe reads a comparison post, signs up for a trial. You could grow a product by being good at showing up in that flow: writing content, running some ads, doing SEO.

That path is getting less reliable. Referral traffic from Google is down and zero-click searches are up. Semrush's 2025 data shows around 60% of Google searches now end without a click to any website. When AI Overviews appear, organic click-through rates drop roughly 40%. People's feeds and inboxes are flooded with increasingly competent AI-generated content and outreach, making it harder for anything genuine to cut through.

Meanwhile, companies are scrambling to figure out AEO/GEO (answer engine optimization / generative engine optimization — trying to get LLMs to recommend their products), because it's a million if not billion dollar play for the future. Unfortunately the emerging playbook for that is flooding the zone with listicles, AI-generated templates, or taking the low road against competitors with comparison pages.

In short: there is less organic traffic to go around, LLMs are increasingly making recommendations on behalf of the people who used to click through to your site, and the internet is getting noisier. If you're trying to get your product in front of people, the old playbook is degrading fast.

Because of all of this, we decided the way to grow Sentry isn't by out-producing the noise. It's by showing up in channels where authenticity still compounds, where real people actually spend time, where there's emotional connection, and doing so in phases so we can actually see what's working.

Awareness spend pays off differently now

One thing that changed how I think about where to spend: the channels that drive awareness are now the same channels that LLMs pull from when making recommendations.

YouTube has overtaken Reddit as the most frequently cited social platform in AI-generated answers. Data from Bluefish (reported in Adweek) shows YouTube appeared as a cited source in about 16% of LLM answers over the past six months, compared to 10% for Reddit. Goodie AI's analysis of 6.1 million citations shows YouTube's share of social citations roughly doubled between August and December 2025.

A developer watching a technical YouTube video where someone uses Sentry to debug a hydration error is one thing. An LLM citing that video when someone asks "what's the best tool for catching hydration errors?" is another thing entirely. Brand awareness, discoverability, and LLM recommendations aren't separate problems anymore — they're the same problem.

A dollar spent on a developer influencer video feeds all three at the same time. Here's a great LinkedIn thread on that.

This was one of the unexpected upsides I couldn't have predicted when we removed cookies. In the original post I talked about how going cookieless forced us into self-reported attribution (asking signups how they heard about us) and holistic measurement (looking at blended data rather than pixel-level tracking). That mindset shift is exactly what made it possible to invest confidently in awareness, because these channels don't show up cleanly in any tracking dashboard. If we'd kept our old attribution stack, I'm honestly not sure we would have had the conviction to make some of these bigger bets.

The trap of only doing what's measurable

Here's where I'll get a little inside-baseball on how companies think about spending money to grow, because I think it's useful context for anyone building a product, even a side project.

Research from the IPA (popularized by LinkedIn's B2B Institute) suggests the optimal split between brand building and direct-response is around 60/40 in favor of brand. But most companies do the opposite; Refine Labs found that the typical company puts roughly 80% of budget into things with immediate measurable results (paid search, retargeting) and only 20% into things that build long-term awareness. The reason is simple: it's way easier to justify spending money when you can point to a dashboard that says "we spent X and got Y signups."

This leads to a cycle I've seen at multiple companies: you invest in SEO, paid search, and retargeting ads. Your dashboards show good results. You max those channels out. Growth flatlines. You restructure, expand topics, double down. It doesn't move fast enough. Someone panics and calls for a rebrand or website overhaul.

The harder but more interesting question is: how do you measure someone seeing your product in a YouTube video, or hearing about you on a podcast, or noticing your logo on a billboard and then Googling you three weeks later? A Wynter survey found that 50% of B2B SaaS companies don't even try to track brand awareness. Half the industry isn't measuring it, while the channels that are measurable are getting worse.

At Sentry we've been fortunate to have leadership that sees the value in awareness spend, because internally it's a hard sell at a lot of companies (you're essentially asking for budget without a clean ROI story). When we removed third-party advertising cookies from our website (I wrote about that whole journey here), it stripped away our reliance on tracking data, which was equal parts frustrating and eye-opening. It forced us into a few habits that have ended up benefiting growth:

Trusting instincts. We all know intuitively that we're influenced by what we see on YouTube, what we read on Reddit, what our peers recommend, even a well-placed billboard. But it's hard to act on that when you can't prove it in a spreadsheet.

Looking at holistic data. When we made a big investment in a top 5 tech podcast, we saw our unattributed, direct, and organic signups rise that same quarter. We didn't need granular tracking to connect those dots; the overall numbers backed it up.

Not being afraid to test and fail. We have failed a handful of times on podcasts, influencers, and channels. But for roughly every failure, there's one channel that opens up a lot of growth, which is a worthy tradeoff if you're thinking a few years down the road.

A framework for how we invested

Here's the general approach that's worked for us at Sentry. It's helped us stay close and authentic with our audience as we scaled. Think of it as concentric circles:

Start with known quantities first. Focus on influencers, podcasts, and newsletters that solely speak to the people you're building for and nobody else. For us it's developers, so we focus on highly technical content that no marketing team or salesperson would be interested in. Newsletters like TLDR Web Dev or Bytes, podcasts like Syntax.fm.

Then as you saturate those, go to the broader category. For us it's tech: anyone interested in building applications.

Finally, figure out what your audience cares about beyond your category. We asked Reddit to map which other subreddits our target audience frequents. They came back with finance, tech, comedy, and sports (specifically basketball and F1). That insight led us to partner with the Warriors and Valkyries and to sponsor podcasts like WTF1 and The Race. Seeing your brand sponsor the things your audience loves creates a deeper connection. Do more people than developers watch basketball? Sure. But the diehard devs in the Bay Area and beyond will hopefully see this and connect with Sentry on a deeper level because of it.

The key is to do this in waves and stages. If you do it all at once, you won't know what worked.

Knowing if it's working

None of this works if you're not paying attention to timing: when a campaign goes live, how long it should take to see a response, and which of your signup channels should be moving.

I like to take a weekly time series of our signups by channel and just annotate which weeks we do major things. It's not perfect attribution. But it's directional, and directional is enough to keep stacking good decisions while ditching experiments that don't land. I don't want to come off as flippant, but sometimes I think we overcomplicate this. Tracking awareness can be simpler than people expect.
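
For a sense of how little machinery the annotated time series needs, here is a minimal sketch. The signup numbers, the channel, and the campaign week are invented for illustration, and `lift_after` is a hypothetical helper, not something from our tooling.

```python
from statistics import mean

# Hypothetical weekly signups for one channel ("direct"), oldest week first.
weekly_signups = [410, 395, 420, 405, 470, 510, 525, 540]
# Hypothetical annotation: index of the week a podcast sponsorship went live.
campaign_week = 4


def lift_after(series, start, window=4):
    """Relative change in mean weekly signups, `window` weeks before vs. after `start`."""
    before = series[max(0, start - window):start]
    after = series[start:start + window]
    return mean(after) / mean(before) - 1


lift = lift_after(weekly_signups, campaign_week)
print(f"{lift:+.1%}")
```

A before-and-after average like this can't separate a campaign from seasonality or another launch in the same week, which is why running investments one at a time matters.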

Back in the cookies post, I mentioned that we salvaged about 50% of our attribution data after going cookieless and instituted a self-reported "how did you hear about us" survey. Two years in, that survey has become one of our most valuable data sources. It's how we learned that YouTube was driving more awareness than we thought, that certain podcast sponsorships were landing, and that developer word-of-mouth was far more influential than any display ad campaign we ever ran. The irony of losing our tracking pixels and gaining better insight into what actually works hasn't worn off.

Bottom line

These channels unlocked new levels of growth I didn't fully see coming. I'm not saying this is the right approach for every company or that you should ditch measurement entirely. But if the predictable channels are plateauing, it's worth experimenting with a phased approach: one investment at a time.

Two years ago we removed all advertising cookies and told ourselves we'd see what happens. What happened is that we stopped optimizing for what was easy to measure and started investing in what we actually believed would work. That led to us understanding our customers better, and that ultimately has led to so many learnings and growth booms along the way.

When agents orchestrate agents, who's watching?

Thu, 23 Apr 2026 00:00:00 GMT

You used to monitor services.

Then you started monitoring AI calls inside services.

Now your AI agent is spinning up other AI agents to complete tasks. Your old monitoring instincts need to evolve.

This isn't hypothetical. Agentic architectures are already in production. Coding agents are calling search agents; orchestrators are spawning specialized sub-agents for retrieval, planning, and execution. Teams are shipping these systems faster than they're figuring out how to watch them.

The problem isn't that agents fail. It's that when they do, you often can't tell which agent introduced the failure, or whether anything technically failed at all.

Traditional tracing wasn't built for this

In a traditional stack, debugging a request means following one thread from entry point to database. One service, one owner, one place to look.

In a multi-agent system, a single user action might trigger a planner agent, three tool-call agents, a validation agent, and a write agent. That's six actors, potentially across different models, different prompts, and very different latency budgets. Errors don't always surface as exceptions. A bad output from a sub-agent might not throw an error at all. It might just start the spiral, propagating as context corruption further down the chain. The orchestrator thinks it succeeded. The user sees something wrong. You open your logs and find nothing obviously broken.

If you want to see what this looks like in practice, this breakdown of a real multi-agent debugging session shows exactly how a silent tool failure two hops upstream can corrupt final output without triggering a single error. It's a good illustration of why the instinct to "read the logs" stops working at this level of complexity. In this world, little missteps compound and avalanche.

This post focuses on what that complexity looks like when you're operating at scale, across teams, with enterprise reliability expectations.

The visibility problem compounds with scale

One agent is readable. Two agents are manageable. Five agents calling each other conditionally, with branching logic and shared context? It's a different category of problem entirely.

You're no longer debugging code execution. You're debugging emergent behavior across a distributed decision graph. The same way microservices made "it's slow somewhere in the stack" a meaningless statement without traces, multi-agent systems make "the AI did something wrong" nearly impossible to act on without the right instrumentation.

Most teams discover this the hard way. Maybe that's a sudden uptick in user churn with no clear cause, or an LLM silently returning bad data three hops down the chain. A token cost bill that tripled overnight. No alert fired because no single component technically crossed a threshold.

Distributed tracing solved this exact problem for microservices. The question is whether your AI pipeline is instrumented to handle the next version of that problem.

What actual, useful multi-agent monitoring looks like

Getting visibility into multi-agent systems isn't a new product category. It's about applying the right primitives with the right granularity. Sentry's AI observability tooling is built on the same foundation as its distributed tracing, which means the mental model transfers even as the complexity scales. Here's what that actually requires:

Trace continuity across agent handoffs. The trace ID needs to follow the task through every agent invocation, not restart at each boundary. You need to see the full tree: who called what, in what order, with what inputs and outputs. A flat list of spans with the same parent doesn't offer the same value when you need to understand which agent in the middle of the chain introduced a bad state.

Per-agent span attribution. Latency, token usage, model version, prompt hash, and output signal should be attributable to each agent individually, not rolled up to the top-level call. Knowing your orchestrator took 4.2 seconds tells you almost nothing. Knowing it was waiting 3.8 seconds on a retrieval sub-agent that returned low-confidence results tells you exactly where to go. This level of attribution is possible by attaching metadata such as model version, token counts, and prompt identifiers to each span during instrumentation.

Failure mode differentiation. Agent timeout, bad tool call output, context window overflow, model refusal, and hallucination downstream of a technically valid response are completely different problems with completely different fixes. Grouping them all as "AI errors" is the equivalent of logging every 500 as "server error." Technically accurate, operationally useless.

Cost and token attribution at the task level. A task that spawned six agents and consumed 40K tokens is a different animal than one that consumed 4K. You need this at query time, broken down per transaction, per user, and per feature. Not buried in an end-of-month billing aggregate. Just tag spans with cost and usage metadata at each agent boundary during instrumentation.

Nested span trees showing agent relationships. Sentry's trace view shows agent invocations as nested spans, so you can see which agent called which, in what order, and what each one consumed. When multiple agents are calling a shared tool or a downstream agent is being invoked by more than one parent, that structure is visible in the trace.
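
The requirements above can be sketched without any SDK as a small span tree: one trace ID shared across handoffs, per-agent token and latency fields, and a roll-up for task-level cost. The `AgentSpan` class and the token counts are hypothetical; the 4.2 and 3.8 second figures reuse the orchestrator example above. A real setup would record this through Sentry spans, not a hand-rolled class.

```python
from dataclasses import dataclass, field


@dataclass
class AgentSpan:
    agent: str
    trace_id: str
    tokens: int = 0
    duration_s: float = 0.0
    children: list["AgentSpan"] = field(default_factory=list)

    def child(self, agent: str, tokens: int = 0, duration_s: float = 0.0) -> "AgentSpan":
        # Handoffs reuse the parent's trace ID instead of starting a new trace.
        span = AgentSpan(agent, self.trace_id, tokens, duration_s)
        self.children.append(span)
        return span

    def total_tokens(self) -> int:
        """Task-level cost: this span's tokens plus everything beneath it."""
        return self.tokens + sum(c.total_tokens() for c in self.children)

    def slowest_child(self) -> "AgentSpan":
        """The direct child that accounts for the most wall-clock time."""
        return max(self.children, key=lambda c: c.duration_s)


root = AgentSpan("orchestrator", trace_id="trace-1", tokens=500, duration_s=4.2)
root.child("retrieval", tokens=3000, duration_s=3.8)
planner = root.child("planner", tokens=1200, duration_s=0.3)
planner.child("validator", tokens=300, duration_s=0.1)

print(root.total_tokens(), root.slowest_child().agent)
```

Because each node keeps its own numbers, the same tree answers both questions: what the whole task cost, and which agent in the middle of the chain was responsible.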

Where Sentry fits

Sentry already has the primitives: distributed tracing, spans, breadcrumbs, performance metrics. If you're using the Sentry SDK in your AI pipeline, you're closer than you think.

For supported frameworks, setup is minimal. Sentry auto-instruments agent invocations, tool calls, and LLM requests across the major AI frameworks in both Python and Node.js, including OpenAI, Anthropic, Google GenAI, LangChain, LangGraph, Pydantic AI, OpenAI Agents SDK, and Vercel AI SDK. Install the SDK and enable tracing to capture baseline visibility. Sentry can group similar failures across runs based on error patterns and metadata. For multi-agent systems, you'll typically extend this with custom spans and tags to reflect your agent architecture.

Here's what that looks like for the OpenAI Agents SDK in Python:


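```
import sentry_sdk
from sentry_sdk.integrations.openai_agents import OpenAIAgentsIntegration

sentry_sdk.init(
    dsn="YOUR_DSN",
    # Capture every trace; lower this in high-traffic production environments.
    traces_sample_rate=1.0,
    # Auto-instrument agent invocations, tool calls, and LLM requests.
    integrations=[OpenAIAgentsIntegration()],
)
```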

If your framework isn't on the supported list, manual instrumentation takes about 10 lines of code per span type using Sentry's gen_ai.* span conventions such as gen_ai.invoke_agent, gen_ai.execute_tool, and gen_ai.request.

One thing worth knowing before you set this up: all AI integrations capture prompt and response content by default, since recordInputs and recordOutputs both default to true. If your prompts or responses contain sensitive data, set both to false. Make sure your privacy policy permits capturing this content before going to production with the defaults enabled.

Either way, you end up with a trace tree showing nested agent invocations, tool executions, and LLM calls as child spans. That gives you visibility into execution and performance. Understanding output correctness and decision quality still requires additional validation layers on top.

Seer can help reduce time-to-triage. When a multi-agent task fails and you have a trace spanning five agents, Seer can analyze the error context, surface the most likely source of degradation, and give you a starting point grounded in your actual production data rather than five equally plausible places to begin.

For a full setup guide across supported frameworks, the AI agent observability guide covers instrumentation in detail. Start there if you're setting this up for the first time.

Practical starting point: instrument your orchestrator first. Get the top-level task as a transaction, with each agent call as a child span. Even partial visibility is better than none when you're trying to triage a production degradation at 2am.

This is the readiness question

Teams adopting agentic systems are going to face the same question their SRE teams faced when they migrated from monolith to microservices: how do we know this is working?

It's not a question that stays abstract for long. The first time an agent-orchestrated workflow produces a wrong answer at scale, or quietly runs up a token bill nobody can attribute, or degrades in a way that no individual span flagged, that's when the question becomes urgent. By then, the teams that already instrumented are triaging. The teams that didn't are left guessing.

The teams that answer that question first with real traces, real attribution, and real alerting are the ones that get to keep running agents in production. The others roll it back after the first incident they can't explain.

Multi-agent observability isn't a nice-to-have at scale. It's table stakes for anyone taking agents beyond the prototype phase. The complexity doesn't ask permission before it shows up in production. It's already there.

No more monkey-patching: Better observability with tracing channels

Tue, 21 Apr 2026 00:00:00 GMT

Almost every production application uses a number of different tools and libraries, whether that's a library to communicate with a database, a cache, or frameworks like Nest.js or Nitro. To observe what's going on in production, application developers reach for Application Performance Monitoring (APM) tools like Sentry.

But there's an inherent problem: the performance data that APM tools need is most often not coming natively from the libraries themselves. The task of getting this data is delegated to APM tools like Sentry or OpenTelemetry, which instrument crucial functionality of a library on their behalf.

What is instrumentation?

The most fundamental requirement to make an application observable is the ability to instrument each of its components and the libraries it uses. Instrumentation is the process of adding code to a program to monitor and analyze its internal operations and generate diagnostic data. It's exactly what the Sentry SDKs and OpenTelemetry instrumentation are doing under the hood.

Consider a typical HTTP client library. Application developers want to know when a request starts and completes, along with some metadata like URL, status code and headers. Today, libraries handle this inconsistently: some provide custom hooks like emitter.on('request', ...), while others offer vendor-specific middleware to intercept requests. In these cases, Sentry and OpenTelemetry can write plug-ins that emit observability data.

This works, but it puts the burden on the library or framework (e.g. Nuxt) to consciously design an instrumentation API and identify the right places to expose it. Hooks and interceptors allow injecting observability code at the correct spots, but APM maintainers are entirely dependent on library authors to keep those APIs stable over time. On top of that, there is no shared convention (each library exposes different hook shapes and different metadata) so APM maintainers must write and maintain very different plugins for each library.

How server-side JavaScript is instrumented

The traditional approach to JavaScript instrumentation is "monkey-patching". That's modifying library code at runtime so that library functions not only do their original job, but also emit observability data. This is only possible in CommonJS (CJS), where modules are mutable and synchronously loaded.

However, the ecosystem is shifting. As server-side JavaScript moves further toward ES Modules (ESM), this approach breaks down. ES modules are immutable and loaded asynchronously, which means you simply can't patch imports at runtime the same way anymore. For further information: the ESM Observability Instrumentation Guide covers this topic in greater detail.

The current workaround (and a way to "patch" imports) is using Module Customization Hooks paired with the --import flag. A popular hook is import-in-the-middle/hook.mjs. It works, but it's brittle, complex, and feels like what it is: a workaround.

Both monkey-patching in CJS and Module Customization Hooks in ESM share the same fundamental flaw: they apply instrumentation "from the outside". The library itself is passive. The question worth asking is: what if libraries were active participants in their own observability and emitted telemetry data themselves?

This would be possible through diagnostics APIs like Tracing Channels.

Libraries should emit their own telemetry

Rather than waiting for APM tools to reach in and grab data, libraries can proactively expose their internal operations using tools built directly into the runtime. The right tool for this is Diagnostics Channels, and more specifically, Tracing Channels. Those features are being developed by the Node.js Diagnostics Working Group.

A huge shoutout to Stephen Belanger, the creator of the diagnostics_channel API in Node.js, who founded the working group and has been instrumental in pushing this topic forward. He's been providing feedback on proposals and acting as a voice of authority, which is sometimes exactly what's needed to convince library maintainers to get on board.

Diagnostics Channels

Diagnostics Channels are a high-performance, synchronous event system built directly into Node.js. They're also supported in Bun, Deno, and Cloudflare Workers (via the Node.js compatibility flag), making them a cross-runtime primitive.

Their primary use case is one-off events, for example "a connection was opened" (node-redis publishes events like this). The limitation is that they don't inherently represent a full lifecycle: you have to manually link start and stop events to measure duration.

Tracing Channels

Tracing Channels solve exactly that limitation. A Tracing Channel is a bundle of related Diagnostics Channels that automatically creates sub-channels for a complete operation lifecycle: start, end, asyncStart, asyncEnd, and error. More importantly, a TracingChannel automatically propagates context across async boundaries. This means APM tools can correlate a database query back to the incoming HTTP request that caused it, without any manual bookkeeping.

Together, they give library and framework authors a standardized way to expose internal operations without coupling to any specific logging or tracing vendor. The library emits structured events and observability tools decide what to do with them.

How libraries can implement Tracing Channels

Tracing Channels have essentially zero cost when unused: if no subscriber is listening, emitting data costs almost nothing. Library authors can therefore add tracing channels without worrying about penalizing users who don't need observability. The benefits are twofold: no more monkey-patching, and users no longer need to pass --import flags to preload instrumentation in ESM.

Naming and consistency: The channel is the contract

Tracing Channels should always be scoped to the library that emits them, using the npm package name as the namespace. Since package names are globally unique, this keeps channel names collision-free. For example, mysql2 ships mysql2:query which would emit tracing:mysql2:query:start and all other channels. And the unstorage library ships unstorage.get which emits tracing:unstorage.get:start and so on. The untracing package is working to establish broader naming standards across the ecosystem.
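To make the naming scheme concrete, here is a small sketch that derives the sub-channel names a Tracing Channel exposes for a given base name. It is written in Python purely as an illustration and is not a Node.js API; the helper name is made up. The tracing:<name>:<event> layout and the five event names are taken from the Node.js diagnostics_channel documentation.

```python
# The five lifecycle events a Node.js TracingChannel exposes.
TRACING_EVENTS = ("start", "end", "asyncStart", "asyncEnd", "error")


def tracing_channel_names(base_name):
    """Return the sub-channel names for a tracing channel called base_name."""
    return [f"tracing:{base_name}:{event}" for event in TRACING_EVENTS]


mysql2_channels = tracing_channel_names("mysql2:query")
unstorage_channels = tracing_channel_names("unstorage.get")
```

Because the base name is the only variable, a subscriber that knows the package's channel name can compute every sub-channel it needs without consulting the library.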

Equally important: Always emit a consistent data structure. Sentry and other APM tools can only provide automatic instrumentation if they know what shape your payload will have.

The pattern itself is straightforward. The library wraps its operation in a tracePromise call:

// Library side (e.g. inside ioredis)
import dc from "node:diagnostics_channel";

const commandChannel = dc.tracingChannel("ioredis:command");

// In the command execution path:
commandChannel.tracePromise(
  async () => {
    return await executeCommand(cmd);
  },
  { command: cmd.name, args: cmd.args },
);

And on the consumer side, an SDK like Sentry subscribes to those events:

// Consumer side (e.g. Sentry SDK)
import dc from "node:diagnostics_channel";

dc.tracingChannel("ioredis:command").subscribe({
  start(payload) {
    // create span
  },
  asyncEnd(payload) {
    // finish span
  },
  error({ error }) {
    // record error
  },
});

The library and the observability tool never need to know about each other. The channel is the contract.

The ecosystem is already moving

In early February 2026, we (Andrei, Jan and Sigrid) from Sentry attended OTel Unplugged EU and brought up the topic "Prepare for better JS ESM Support", which was voted onto the list of top priorities for the OpenTelemetry ecosystem.

So this isn't a theoretical proposal. A growing number of well-known libraries have already shipped or merged PRs for Diagnostics Channel and Tracing Channel support.

On the framework and HTTP side, undici (Node.js's built-in HTTP client) has shipped Diagnostics Channels since Node 20.12, and fastify (docs), nitro (PR), and h3 (PR) have native support as well. On the database side, unstorage (PR) and mysql2 (docs) already use Tracing Channels, and pg / pg-pool are actively working on it. Redis clients aren't far behind either: ioredis (PR) and node-redis (PR) already support Tracing Channels.

None of this happens without the people willing to do the work. A massive shoutout to Sentry engineer Abdelrahman Awad (@logaretm) for driving Tracing Channel implementations across multiple libraries. And a special thanks to Pooya Parsa (@pi0), whose openness to collaborate in h3 and nitro was instrumental in formalizing this approach and showing the ecosystem what it could look like.

The vision ahead

We're still in a "chicken and egg" phase. Libraries need to add channels before APM tools have strong reasons to listen to them, and APM tools need to start listening before authors feel the pressure to add them.

The goal is universal JS observability: a world where Node.js, Bun, and Deno share the same diagnostic patterns, and instrumentation just works without monkey-patching in CJS, without --import flags in ESM, and without fragile workarounds. Libraries become active drivers of observability, emitting the data they consider most relevant to their users.

Debugging multi-agent AI: When the failure is in the space between agents

Thu, 16 Apr 2026 00:00:00 GMT

I've been building a multi-agent research system. The idea is simple: give it a controversial technical topic like "Should we rewrite our Python backend in Rust?", and three agents work on it. An Advocate argues for it, a Skeptic argues against, and a Synthesizer reads both briefs blind and produces a balanced analysis. Each agent has its own model, its own tools, its own system prompt.

It worked great in testing. Then I noticed the Synthesizer kept producing analyses that leaned heavily toward one side. Not wrong, but noticeably lopsided. I mean, rewriting the Sentry monorepo in Rust is arguably a bad idea, but it was arguing against the rewrite even on points where I knew the case for it was clearly stronger.

I eventually traced it to the Skeptic's web_search tool. The Advocate was returning 3-4 solid data points per query. The Skeptic, however, was searching for different terms that didn't match the data as well, and was getting back a single generic result. So the Advocate's brief was well-sourced with citations, and the Skeptic's brief was... vibes. The Synthesizer did what any reasonable reader would do: it weighted the better-sourced argument more heavily.

The bug was in a tool call, inside one agent, that silently degraded the input to a completely different agent two steps later. I only found it by clicking through the trace and reading tool outputs at each step.

What is multi-agent observability?

Multi-agent observability is visibility into how multiple AI agents coordinate, hand off work, and influence each other's decisions.

You probably already know single-agent observability: one reasoning chain, some tool calls, a response. The multi-agent version tracks a graph of interconnected reasoning chains where the output of one agent becomes the input of another. A failure anywhere in the graph can silently corrupt everything downstream.

If you're running a single agent with a few tools, standard agent observability has you covered. But the moment you have agents calling other agents, delegating subtasks, or running in parallel with results merged later, you need a different level of visibility.

Why single-agent monitoring doesn't cut it here

Your existing agent monitoring tells you that Skeptic ran in 3.1 seconds and consumed 2,400 tokens. It does not tell you that Skeptic's web_search returned weak results, that the brief it produced was thin compared to the Advocate's, and that the Synthesizer produced a biased analysis because one of its inputs was poor.

There are three specific reasons this falls apart.

Blame is distributed. When the final output is wrong, you can't point at one agent. The Advocate built a reasonable argument from what its tools gave it. The Synthesizer did a reasonable synthesis of what it received. The bug is in the interaction between them, and no single agent's logs will show it.

The worst failures look fine. In traditional software, things throw errors. In multi-agent AI, an agent returns a plausible-but-thin result, the next agent incorporates it without question, and by the time the final output arrives, weak data has been confidently summarized through multiple layers. You'd never know unless you compared the raw inputs.

You can't test every path. A single agent with 5 tools has 5 possible actions per step. Three agents with 5 tools each, running in parallel and merging results? The number of possible execution paths is absurd. You need to observe what actually happens in production because you can't pre-test every combination.
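A rough back-of-the-envelope count shows how quickly this grows. The sketch below assumes each agent takes a fixed number of steps and picks one of its tools at every step. It ignores tool arguments, model nondeterminism, and the order in which parallel agents interleave, so treat the result as a lower bound rather than a model of real agent behavior.

```python
def tool_sequences(num_tools, steps):
    """Distinct tool-call sequences for one agent taking `steps` steps."""
    return num_tools ** steps


def combined_paths(num_agents, num_tools, steps):
    """Combinations when independent agents each follow their own sequence."""
    return tool_sequences(num_tools, steps) ** num_agents


single_agent = tool_sequences(5, 3)      # 5 * 5 * 5 = 125
three_agents = combined_paths(3, 5, 3)   # 125 ** 3 = 1,953,125
```

Even with only three steps per agent, three agents already produce close to two million combinations before a merge step reads any of them.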

Most "multi-agent" examples are actually single-agent

Before going further, I want to be honest. I built a multi-agent startup idea validator as my first attempt at this playground, and then realized... it was fake multi-agent. A "Market Analyst" handing off to a "Technical Advisor" handing off to a "Devil's Advocate" is just one agent with different tools. A single agent with all the tools and a comprehensive system prompt produces the same output with less latency and less cost.

Microsoft's Cloud Adoption Framework puts it directly: "Don't assume role separation requires multiple agents. Distinct roles might suggest multiple agents, but they don't automatically justify a multi-agent architecture."

Multi-agent earns its pain when:

Objectives genuinely conflict. An agent told to "argue for" and "argue against" in the same prompt produces mediocre output at both. A generator and a critic need to be separate, or the critic pulls its punches.
Information must be isolated. If Agent A seeing Agent B's work would bias the result, they can't share a context window. Advocate/skeptic. Blind peer review.
Different models serve different roles. Cheap fast model for research, expensive capable model for synthesis. One agent means one model.
Tasks should run in parallel. Two independent research tasks running concurrently as separate agents is genuinely faster than one agent doing them sequentially.
Security boundaries require separation. The agent reading user PII shouldn't have database write access.

If your use case doesn't hit at least two of these, start with a single agent and save yourself the debugging pain I'm about to describe.

Common multi-agent architecture patterns

Each pattern produces a different trace shape and breaks in its own way.

Orchestrator / Worker

One agent routes tasks to specialists. This is the most common pattern in the OpenAI Agents SDK, LangGraph, and custom implementations.

POST /api/research (http.server)
└── gen_ai.invoke_agent "Research Director"
    ├── gen_ai.request "chat gpt-5.4"                         ← plan subtasks
    ├── gen_ai.execute_tool "delegate_research"
    │   └── gen_ai.invoke_agent "Web Research Agent"
    │       ├── gen_ai.request "chat gpt-5.4-mini"
    │       ├── gen_ai.execute_tool "web_search"
    │       └── gen_ai.request "chat gpt-5.4-mini"            ← summarize
    ├── gen_ai.execute_tool "delegate_analysis"
    │   └── gen_ai.invoke_agent "Data Analysis Agent"
    │       ├── gen_ai.request "chat gpt-5.4-mini"
    │       ├── gen_ai.execute_tool "query_database"
    │       └── gen_ai.request "chat gpt-5.4-mini"
    └── gen_ai.request "chat gpt-5.4"                         ← synthesize

How it breaks: The orchestrator misclassifies the task and routes to the wrong specialist, who then does perfect work on the wrong problem. Or it passes insufficient context, and the specialist hallucinates what's missing.

Parallel with merge

Independent agents work concurrently on the same problem, and a final agent merges results. This is what the balanced research system uses, and it's the pattern I think has the most interesting debugging challenges.

Advocate workflow .............. 3.2s  (parallel)
├── gen_ai.invoke_agent "Advocate"
│   ├── gen_ai.request "chat gpt-5.4-mini"         ← plan research
│   ├── gen_ai.execute_tool "web_search"           ← find evidence
│   ├── gen_ai.execute_tool "fetch_benchmark"      ← get numbers
│   └── gen_ai.request "chat gpt-5.4-mini"         ← write brief

Skeptic workflow ............... 2.8s  (parallel)
├── gen_ai.invoke_agent "Skeptic"
│   ├── gen_ai.request "chat gpt-5.4-mini"         ← plan research
│   ├── gen_ai.execute_tool "web_search"           ← find counter-evidence
│   └── gen_ai.request "chat gpt-5.4-mini"         ← write brief

Synthesizer workflow ........... 4.1s  (sequential, after both)
└── gen_ai.invoke_agent "Synthesizer"
    └── gen_ai.request "chat gpt-5.4"              ← blind analysis

How it breaks: Uneven tool quality. If one agent's tool calls return richer data, the merge agent naturally weights that side more heavily. The merge agent has no way to know its inputs were unequal, because it only sees the finished briefs, not the raw tool results underneath. This is the bug I had the pleasure of dealing with while crafting this blog post.

Peer handoffs

Agents transfer control directly to each other. The OpenAI Agents SDK handoff() pattern works this way.

POST /api/chat (http.server)
└── gen_ai.invoke_agent "Triage Agent"
    ├── gen_ai.request "chat gpt-5.4-mini"
    ├── gen_ai.handoff "from Triage Agent to Billing Agent"
    └── gen_ai.invoke_agent "Billing Agent"
        ├── gen_ai.request "chat gpt-5.4-mini"
        ├── gen_ai.execute_tool "check_balance"
        ├── gen_ai.handoff "from Billing Agent to Dispute Specialist"
        └── gen_ai.invoke_agent "Dispute Specialist"
            ├── gen_ai.request "chat gpt-5.4"
            └── gen_ai.execute_tool "file_dispute"

How it breaks: State management at the handoff. When Agent A transfers to Agent B, what gets passed? Full conversation history? A summary? Just the last message? Pass everything and you blow context windows. Summarize and you lose nuance. Bugs in the handoff protocol are the hardest to find because they look like bugs in the receiving agent.
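One way to make the handoff contract explicit is to pass a small structured payload instead of raw conversation history or a free-form summary. The sketch below is illustrative only: the Handoff class and its fields are hypothetical and not part of the OpenAI Agents SDK, and the example values are invented.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class Handoff:
    from_agent: str
    to_agent: str
    original_request: str                      # the user's words, never summarized
    facts: dict = field(default_factory=dict)  # structured findings so far
    open_question: str = ""                    # what the receiver should resolve

    def to_prompt(self) -> str:
        """Serialize with sorted keys so payloads are easy to diff in a trace."""
        return json.dumps(asdict(self), indent=2, sort_keys=True)


handoff = Handoff(
    from_agent="Billing Agent",
    to_agent="Dispute Specialist",
    original_request="I was charged twice for my subscription.",
    facts={"balance_checked": True, "duplicate_charge_found": True},
    open_question="Should a dispute be filed for the duplicate charge?",
)
```

Keeping the original request verbatim alongside the structured facts means the receiving agent never has to work from a summary of a summary, and a bug in the payload shows up as a visible diff rather than as odd behavior in the receiver.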

What makes multi-agent debugging different

There are a few specific problems you only hit when multiple agents are involved.

Blame attribution across boundaries. When a multi-agent system returns wrong output, the question is: did the right agent receive the task? Did it get the right context? Did it do bad work with good input, or good work with bad input? Without traces that span the full agent graph, you're reading each agent's logs in isolation trying to reconstruct what happened at the boundaries.
Silent cascading failures. This is the one that got me. An agent returns a plausible response, the downstream agent accepts it, and the final output is wrong, but every span shows status: ok. To catch these, you need to be able to compare input and output at each agent boundary and see the full prompt and response at each LLM call. Token counts and latency alone won't help.
Context drift across handoffs. Every time an agent summarizes before passing to the next, information is lossy-compressed. After three handoffs, the original user intent can be barely recognizable. In a trace, you can see this by reading the prompts in sequence: the first agent has the full query, the second has a summary, the third has a summary of a summary. The fix is usually architectural (pass structured data instead of natural language), but you have to see the drift before you can fix it.
Cost explosion without attribution. In our research system, the Synthesizer uses gpt-5.4 while the researchers use gpt-5.4-mini. Without per-agent cost tracking, you'd see total spend growing but wouldn't know the Synthesizer accounts for 60% of the cost despite running only once per query.
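As a worked example of per-agent attribution, the sketch below totals cost by agent from a list of span-like records. The token counts and per-token prices are invented for illustration; they are neither real model pricing nor numbers from the research system, and the record shape is a simplification of what a trace exposes.

```python
from collections import defaultdict

# Hypothetical prices in USD per 1,000 tokens; substitute your provider's rates.
PRICE_PER_1K = {"gpt-5.4": 0.03, "gpt-5.4-mini": 0.002}

# Hypothetical usage for one run: (agent name, model, total tokens).
usage = [
    ("Advocate", "gpt-5.4-mini", 2600),
    ("Skeptic", "gpt-5.4-mini", 2400),
    ("Synthesizer", "gpt-5.4", 500),
]


def cost_share_by_agent(records):
    """Return each agent's fraction of total spend."""
    cost = defaultdict(float)
    for agent, model, tokens in records:
        cost[agent] += tokens / 1000 * PRICE_PER_1K[model]
    total = sum(cost.values())
    return {agent: value / total for agent, value in cost.items()}


shares = cost_share_by_agent(usage)
```

With these made-up numbers the Synthesizer ends up at roughly 60% of spend while using less than a tenth of the tokens, which is the kind of skew a total-spend graph hides.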

A debugging walkthrough with the balanced research system

Here's how I actually found the bug from the opening. The Synthesizer was producing lopsided analyses, and I wanted to figure out why.

Comparing the parallel agents

First thing I did was look at both research agent workflows side by side in the trace view:

Advocate workflow .......................... 3.2s  ✓
├── gen_ai.invoke_agent "Advocate" ......... 3.1s
│   ├── gen_ai.request "chat gpt-5.4-mini" . 0.6s  ← plan
│   ├── gen_ai.execute_tool "web_search" ... 0.2s  ← "rust performance"
│   ├── gen_ai.execute_tool "web_search" ... 0.1s  ← "rust adoption"
│   ├── gen_ai.execute_tool "fetch_benchmark" 0.1s ← rust benchmarks
│   └── gen_ai.request "chat gpt-5.4-mini" . 1.8s  ← write brief

Skeptic workflow ........................... 2.8s  ✓
├── gen_ai.invoke_agent "Skeptic" .......... 2.7s
│   ├── gen_ai.request "chat gpt-5.4-mini" . 0.5s  ← plan
│   ├── gen_ai.execute_tool "web_search" ... 0.1s  ← "python migration costs"
│   └── gen_ai.request "chat gpt-5.4-mini" . 1.9s  ← write brief

The asymmetry was immediately obvious. The Advocate made 3 tool calls. The Skeptic made 1.
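The same comparison can be automated. The sketch below counts gen_ai.execute_tool spans per agent and flags a run when the busiest agent made more than a chosen multiple of the quietest agent's calls. The span records mirror the trace above but are hand-written dictionaries; the record shape and the 2x threshold are assumptions, not something Sentry provides.

```python
# Hand-written records mirroring the tool spans in the trace above.
spans = [
    {"agent": "Advocate", "op": "gen_ai.execute_tool", "tool": "web_search"},
    {"agent": "Advocate", "op": "gen_ai.execute_tool", "tool": "web_search"},
    {"agent": "Advocate", "op": "gen_ai.execute_tool", "tool": "fetch_benchmark"},
    {"agent": "Skeptic", "op": "gen_ai.execute_tool", "tool": "web_search"},
]


def tool_call_counts(records, agents):
    """Count tool spans per agent, keeping agents that made zero calls."""
    counts = {agent: 0 for agent in agents}
    for record in records:
        if record["op"] == "gen_ai.execute_tool" and record["agent"] in counts:
            counts[record["agent"]] += 1
    return counts


def is_lopsided(counts, max_ratio=2.0):
    """True when the busiest agent exceeds max_ratio times the quietest."""
    low, high = min(counts.values()), max(counts.values())
    if high == 0:
        return False  # nobody called a tool, so there is nothing to compare
    return low == 0 or high / low > max_ratio


counts = tool_call_counts(spans, ["Advocate", "Skeptic"])
```

A count mismatch is only a hint. The tool outputs still need to be read, which is the next step.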

Inspecting the tool results

Clicking into the Advocate's web_search spans, each returned 3-4 data points:

["Rust programs typically run 2-5x faster than equivalent Python...",
 "Discord switched from Go to Rust... latency drop from 50ms to 1ms",
 "Figma rewrote their multiplayer server... memory usage by 10x"]

The Skeptic's single web_search had searched for "python migration costs":

["No specific data found for 'python migration costs'. Consider refining your search terms."]

So the Skeptic wrote its brief from general knowledge with no citations, while the Advocate had 10+ data points from 3 tool calls.

Following it to the Synthesizer

Clicking the Synthesizer's gen_ai.request span and reading the prompt confirmed it. It received one well-sourced brief with citations and benchmark data, and one brief with general arguments and no data. It weighted the better-sourced one more heavily, which is exactly what you'd want a synthesizer to do. The problem was upstream.

The fix

Two options: improve the Skeptic's prompt to try multiple search queries when the first returns weak results, or improve the web_search tool to handle broader query terms. I did both. Watched the traces afterward, and both agents were producing comparably sourced briefs.
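A minimal sketch of the tool-side half of that idea: retry with progressively broader queries when a search comes back empty. The search backend here is a stub with canned data, and the broadening rule (dropping the leading word) is a deliberately naive placeholder, not the actual fix.

```python
# Stub index standing in for a real search backend (placeholder data).
FAKE_INDEX = {
    "migration costs": ["placeholder result about migration costs"],
}


def raw_search(query):
    return FAKE_INDEX.get(query, [])


def broaden(query):
    """Naive broadening: drop the first word, or return None when out of words."""
    words = query.split()
    return " ".join(words[1:]) if len(words) > 1 else None


def web_search(query, min_results=1):
    """Return (query_used, results), broadening until enough results come back."""
    current = query
    while current is not None:
        results = raw_search(current)
        if len(results) >= min_results:
            return current, results
        current = broaden(current)
    return query, []  # nothing found; the caller can surface this explicitly


used_query, results = web_search("python migration costs")
```

Returning the query that actually matched keeps the broadening visible in the tool span, so a trace shows when an agent's original search terms were too narrow.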

The root cause was a weak tool result for one agent that cascaded through the pipeline as information asymmetry. Without seeing every tool call and every prompt in the trace, I would have blamed the Synthesizer's prompt for being biased.

Auto-instrumenting multi-agent frameworks

Sentry auto-instruments the OpenAI Agents SDK, LangGraph, and other frameworks. The integration activates automatically when the package is detected. Here's the setup for the balanced research system:



import asyncio

import sentry_sdk
from agents import Agent, Runner, function_tool, ModelSettings

sentry_sdk.init(
    send_default_pii=True,   # captures prompts and responses in spans
    traces_sample_rate=1.0,
    enable_logs=True,
)

@function_tool
def web_search(query: str) -> str:
    """Search the web for information on a topic."""
    ...

advocate = Agent(
    name="Advocate",
    model="gpt-5.4-mini",
    model_settings=ModelSettings(temperature=0.3),
    instructions="Build the strongest case FOR the position...",
    tools=[web_search],
)

skeptic = Agent(
    name="Skeptic",
    model="gpt-5.4-mini",
    model_settings=ModelSettings(temperature=0.3),
    instructions="Build the strongest case AGAINST the position...",
    tools=[web_search],
)

synthesizer = Agent(
    name="Synthesizer",
    model="gpt-5.4",
    model_settings=ModelSettings(temperature=0.5),
    instructions="Produce balanced analysis from two research briefs...",
)

async def analyze(topic: str):
    # Parallel execution: two independent trace trees
    advocate_result, skeptic_result = await asyncio.gather(
        Runner.run(advocate, topic),
        Runner.run(skeptic, topic),
    )

    synthesis_input = f"""
    Brief A: {advocate_result.final_output}
    Brief B: {skeptic_result.final_output}
    """
    return await Runner.run(synthesizer, synthesis_input)

SENTRY_DSN is read from the environment. send_default_pii=True is what enables prompt and response capture in spans, which is essential for debugging the handoff problems described above. The SDK creates gen_ai.invoke_agent spans for each agent, gen_ai.execute_tool spans for tool calls, and gen_ai.request spans for LLM calls with token counts and model info.

For JavaScript/TypeScript with the Vercel AI SDK or LangChain, use tracesSampler to capture AI routes at 100%:



import * as Sentry from "@sentry/node";

Sentry.init({
  dsn: process.env.SENTRY_DSN,
  sendDefaultPii: true,
  tracesSampler: ({ name, attributes, inheritOrSampleWith }) => {
    if (attributes?.['sentry.op']?.startsWith('gen_ai.')) {
      return 1.0;
    }
    if (name?.includes('/api/chat') || name?.includes('/api/agent')) {
      return 1.0;
    }
    return inheritOrSampleWith(0.2);
  },
});
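For comparison, a rough Python counterpart using the SDK's traces_sampler option. This is a sketch: the route names are the same placeholders as above, it only matches on the transaction name (the gen_ai op check from the JavaScript version is left out), and it assumes the sampling context carries the transaction name under transaction_context and the upstream decision under parent_sampled.

```python
AI_ROUTE_MARKERS = ("/api/chat", "/api/agent")  # placeholder route names


def traces_sampler(sampling_context):
    """Sample AI routes at 100%, otherwise inherit the parent decision or use 20%."""
    name = sampling_context.get("transaction_context", {}).get("name") or ""
    if any(marker in name for marker in AI_ROUTE_MARKERS):
        return 1.0
    parent_sampled = sampling_context.get("parent_sampled")
    if parent_sampled is not None:
        return float(parent_sampled)  # keep distributed traces consistent
    return 0.2


# Wiring it up (requires sentry-sdk to be installed):
# sentry_sdk.init(send_default_pii=True, traces_sampler=traces_sampler)
```

Keeping the sampler a plain function makes it easy to unit test the routing rules without initializing the SDK.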

For more on why you should sample AI traces at 100%, see the companion post on sampling strategies for agentic applications.

Building multi-agent dashboards

Pre-built agent dashboards show per-model and per-tool aggregates. For multi-agent systems, you need to slice by agent. Some dashboards you can build with the Sentry CLI (or follow our hands-on dashboards cookbook):

Per-agent cost attribution:

sentry dashboard widget add 'Multi-Agent Monitoring' "Cost by Agent" \
  --display table --dataset spans \
  --query "sum:gen_ai.usage.total_tokens" "count" \
  --where "span.op:gen_ai.invoke_agent" \
  --group-by "gen_ai.agent.name" \
  --sort "-sum:gen_ai.usage.total_tokens"

This is how I found out the Synthesizer was 60% of my cost despite running once per query (because it uses gpt-5.4 instead of gpt-5.4-mini).

Tool reliability by agent:

sentry dashboard widget add 'Multi-Agent Monitoring' "Tool Errors by Agent" \
  --display table --dataset spans \
  --query "failure_rate" "count" \
  --where "span.op:gen_ai.execute_tool" \
  --group-by "gen_ai.agent.name" "gen_ai.tool.name" \
  --sort "-failure_rate"

If the Skeptic's web_search returns empty results 15% of the time while the Advocate's returns empty 3% of the time, you've found your lopsided synthesis problem before users report it.

Agent duration comparison:

sentry dashboard widget add 'Multi-Agent Monitoring' "Agent Duration p95" \
  --display bar --dataset spans \
  --query "p95:span.duration" \
  --where "span.op:gen_ai.invoke_agent" \
  --group-by "gen_ai.agent.name"

Agents doing similar work should take similar time. Big duration gaps between parallel agents usually mean one is making more (or fewer) tool calls than expected.

What I'd recommend if you're building multi-agent systems

Based on debugging this system and reading a lot of traces:

Capture prompts and responses at every agent boundary. This is the send_default_pii=True flag. Token counts show cost. But the prompts, responses, and tool input/output data are where you'll actually find bugs. The handoff boundaries between agents are where most multi-agent issues live.

Name your agents clearly. "Agent" and "Sub-Agent" in your trace view tell you nothing. "Advocate" and "Skeptic" and "Synthesizer" tell a story you can follow.

Compare parallel agents. When agents run concurrently and their outputs merge, the merge agent can't tell if its inputs were equally good. But you can tell from the traces. Look for asymmetry in tool call counts, token usage, and duration between agents that should be doing similar work.

Sample at 100%. This matters even more for multi-agent than single-agent. A run that fails on a specific combination of tool results might happen 1 in 50 times. At 10% sampling, only 1 run in 500 is both failing and sampled, so you'll need about 500 runs on average before you capture one. See how to sample AI traces at 100% for the setup.

Alert on tool failure rates per agent, not globally. A tool that fails 5% globally might fail 20% for one specific agent because of how it formulates queries. Global averages hide per-agent problems.

Connect to your full stack. A slow web_search tool might be caused by rate limiting from an upstream API, not an agent issue. Multi-agent traces that sit inside your existing distributed traces let you see everything.
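Two of the recommendations above come down to simple arithmetic, sketched here. The sampling part assumes failures and sampling decisions are independent. The per-agent part uses made-up call counts chosen so the blended rate looks healthy while one agent is struggling.

```python
def expected_runs_until_captured(failing_fraction, sample_rate):
    """Average number of runs before a failing run is also sampled."""
    return 1 / (failing_fraction * sample_rate)


def chance_of_capture(failing_fraction, sample_rate, runs):
    """Probability of capturing at least one failing run within `runs` runs."""
    return 1 - (1 - failing_fraction * sample_rate) ** runs


runs_at_10_percent = expected_runs_until_captured(1 / 50, 0.10)
runs_at_100_percent = expected_runs_until_captured(1 / 50, 1.0)
captured_within_500 = chance_of_capture(1 / 50, 0.10, 500)

# Hypothetical web_search usage per agent: (calls, failures).
web_search_calls = {"Advocate": (750, 0), "Skeptic": (250, 50)}


def rate(calls, failures):
    return failures / calls if calls else 0.0


per_agent_rate = {agent: rate(*stats) for agent, stats in web_search_calls.items()}
global_rate = rate(
    sum(calls for calls, _ in web_search_calls.values()),
    sum(failures for _, failures in web_search_calls.values()),
)
```

Note that 500 runs is only the average: under these assumptions there is still more than a one-in-three chance of having captured nothing after 500 runs at 10% sampling. In the second half, the global failure rate works out to 5% while the Skeptic alone sits at 20%.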

Getting started

If you're already using Sentry for agent monitoring, multi-agent traces work automatically. The SDKs detect agent invocations, handoffs, and tool calls.

Starting fresh:

pip install sentry-sdk or npm install @sentry/node
Initialize with traces_sample_rate=1.0 and send_default_pii=True
Run your multi-agent workflow. Spans appear in Sentry's trace view.

For setup across 10+ frameworks, see the AI agent observability guide.

Grave improvements: Native crash postmortems via Android tombstones

Wed, 15 Apr 2026 00:00:00 GMT

Native crashes on Android have always been harder to debug than they should be.

The platform has its own crash reporter (debuggerd) that captures the crashing thread, every other running thread, register state, and memory maps into a file called a tombstone. Tombstones have been a part of Android for a long time; in fact, they've been there in one form or another since Android's first commit.

The problem: for most of Android's life, you couldn't read tombstones programmatically from inside your app. That left SDK-based native crash reporting (like ours) stuck replicating infrastructure the platform already had — at the cost of binary overhead, incomplete Java frame symbolication, and a C++ fork we had to maintain against a moving AOSP target.

Android 11 (SDK level 30) introduced ApplicationExitInfo. Android 12 (SDK level 31) added access to the trace input stream for ApplicationExitInfo.REASON_CRASH_NATIVE.

Sentry's Android SDK, as of version 8.30.0, reads that stream on all devices running Android 12 and above and ships it as a native crash event. This dramatically improves crash reporting for Android apps that use native code, whether your team wants just basic crash alerts or deep debugging info.

Let's dive into how things worked before, what it took to wire it into the SDK without breaking the existing NDK integration, and the improvements these changes bring.

Before tombstones: a fork chasing a moving target

Before tombstone support, the Native SDK (sentry-native) was used in the Android SDK as the primary native error reporting source. Since Android is based on Linux, a considerable part of the SDK could be reused. For those parts that couldn't, work on integrating Android-specific code began in 2019.

Specifically, the integration of libunwindstack (the AOSP platform unwinder still used today to produce stack traces for debuggerd and, in turn, tombstones) was a key moment for supporting native crashes in Sentry's Android SDK.

Why, you ask? Because the Native Development Kit (NDK) did not offer a general-purpose stack walker (narrator: it still does not).

libunwindstack is not part of the NDK, but part of the Android Open Source Project (AOSP) platform code, and thus is not directly accessible to app developers via the usual means. Sentry forked a repository that patched the platform code to build with the NDK, and has since maintained that fork without any changes landing in the upstream patched version. This provided stack tracing capabilities inside the rather complicated Android Runtime (ART) environment, where stack traces mix classic native code, native code that is part of the VM execution, and Java/Kotlin frames that also appear as native frames, since they are either interpreted, JITed, or AOTed.

While this can already be challenging for a normal stack-walker, it is also a problem from the perspective of symbolication: there are more OEM builds than Sentry can realistically collect platform binaries from. So, while a core set of libraries will likely exist in our backend stores, we cannot rely on all of them being available. Thus, symbolication on Android happens on the client-side.

Considerable restructuring in the platform code, however, made manual upstream alignment very hard over time. In addition, libunwindstack is a C++ library: although it is light on standard library usage, it still has to be statically linked against the C++ standard library so that it stays isolated at runtime from ABI-incompatible versions of that library.

It also introduced a couple of challenges:

The biggest issue always was size: since we must package binaries for x86, x86_64, armeabi-v7a, and arm64-v8a, we currently add around 1MiB of stripped binary to every app that needs native error reporting or instrumentation. The Sentry SDK code only accounts for 20% of that size; the rest is libunwindstack and the C++ infra it depends on.
Incomplete implementation: since the library size is already significant, certain features have been excluded from the build: there is currently no DEX/OAT symbolication (meaning none of the Java frames are symbolicated), and there is incomplete support for locating DWARF CFI in OAT frames, which often leads to dramatically shortened stack traces in release builds.
Limited context: Since the inproc backend, which handles the crashes on Android, doesn't stop any threads by design, it only provides the stack trace of the crashed thread. On Android in particular, that is often far too little context to uncover the root cause of a crash.

So while all of these are fixable, the effort required is significant and would also lead to a long-term commitment to maintain against the moving target that is AOSP. Introducing tombstone support allows us to fix all the issues mentioned for users who run on Android 12+, which is a significantly growing portion of the incoming events and user base. At the same time, it opens the door to work on better solutions for edge cases.

Android 12+ accounts for ~69% of 2B+ Android error events ingested over the past 30 days.

What tombstones give us

The problems outlined above (size, incomplete traces, missing Java symbolication, and maintenance burden) all stem from replicating the platform crash infrastructure that already exists on the device.

Tombstones fix each of them:

All threads, fully symbolicated. Where the inproc backend could only capture the crashing thread, tombstones provide stack traces and register sets for every thread at the moment of the crash. On Android, the crashing thread is often just the victim of a problem that originated in another thread. Seeing all of them is the difference between a solvable crash and an enigma.
Java/Kotlin frames resolved. The platform unwinder has full access to ART internals that a forked NDK build cannot have. DEX/OAT symbolication, which we deliberately excluded to limit binary size, comes for free.
No binary overhead for stack traces. Tombstones are produced by the platform's own libunwindstack, the same library we have been forking and shipping. The ~1MiB binary weight for all supported ABIs drops to zero for apps that rely on tombstones alone.
Maintenance shifted to the platform. We consume structured output instead of tracking AOSP restructuring and keeping a C++ fork buildable against the NDK.
Register memory context. Memory dumps around pointer values in the crashing thread's registers show the data being operated on at the point of the crash. (Not yet integrated into the Sentry event payload or UI.)
Symbol resolution. Since we now have modules and resolved symbols on the client, we can also strip non-actionable trace contents like runtime-internal frames before sending.

How to use it

Tombstone support is available since version 8.30.0 of sentry-android-core.

If your app runs on Android 12+ and you enable tombstones, you will automatically get more complete reports delivered for all native crashes that affect your app. If you used Native SDK/NDK integration, you will automatically get better stack traces for all your threads and still see the context you created on the native side.

If you have never used the Native SDK interfaces in your native code directly, you can evaluate your options for disabling the NDK integration. Once enough of an app's users have moved to Android 12+, there is no further benefit in running both integrations.

If, however, the Native SDK interface is still in direct use, both integrations work together without any visible degradation in user experience.

If you want to turn on the feature, you can do so programmatically via SentryAndroidOptions:

SentryAndroid.init(context) { options ->
    options.isTombstoneEnabled = true
}

Or declaratively in your AndroidManifest.xml:

<meta-data android:name="io.sentry.tombstone.enable" android:value="true" />

Since tombstones capture every thread at the moment of the crash, you can inspect any of them directly in the issue detail view:

The "Most Relevant" view strips the trace down to the actionable frames, the ones that drive issue grouping and naming, isolating inApp JNI frames, but excluding Jetpack Compose layers:

Expanding the collapsed frames reveals the complete picture: from __libc_init through process startup, the Android message loop, the native/Java runtime boundary crossings, and up through the view layer to the crash site:

The implementation challenges

Tombstone support touched on many layers of the SDK because native crash reporting intersects with session management, event deduplication, envelope caching, event enrichment, and the existing NDK integration that already handles the same class of crashes through a completely different mechanism.

Sharing infrastructure with ANR detection

The most immediate architectural challenge was that the SDK already had an integration consuming ApplicationExitInfo: the ANR integration, which handles REASON_ANR. Both integrations need the same lifecycle: query the historical exit list, skip already-reported entries, distinguish the latest (enrichable) entry from older (historical) ones, persist a "last reported" timestamp marker, wait for the previous session to flush, and block until the event is written to disk.

Duplicating this would have been the faster path, but the implementation instead extracted a generic history dispatcher parameterized by a policy interface. Each integration implements the policy (target reason, historical flag, report builder), while the dispatcher owns traversal, ordering, deduplication, and flush coordination. The envelope cache's timestamp marker system was similarly generalized so both ANR and tombstone markers are handled polymorphically.
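The split between dispatcher and policy can be sketched roughly as follows. This is a simplified illustration in Python rather than the SDK's actual Java code, and every name in it (ExitRecord, ExitPolicy, TombstonePolicy, dispatch_history) is hypothetical:

```python
from dataclasses import dataclass
from typing import Protocol


@dataclass(frozen=True)
class ExitRecord:
    """Hypothetical stand-in for one ApplicationExitInfo entry."""
    reason: str
    timestamp_ms: int


class ExitPolicy(Protocol):
    """What each integration supplies; names are illustrative only."""
    target_reason: str

    def build_report(self, record: ExitRecord, enrich: bool) -> dict: ...


class TombstonePolicy:
    target_reason = "CRASH_NATIVE"

    def build_report(self, record: ExitRecord, enrich: bool) -> dict:
        return {"type": "tombstone", "ts": record.timestamp_ms, "enrich": enrich}


def dispatch_history(records, policy, last_reported_ms):
    """Own filtering, ordering, and dedup; the policy only builds reports.

    Returns the reports plus the new "last reported" marker.
    """
    pending = sorted(
        (r for r in records
         if r.reason == policy.target_reason and r.timestamp_ms > last_reported_ms),
        key=lambda r: r.timestamp_ms,
    )
    reports = []
    for i, record in enumerate(pending):
        is_latest = i == len(pending) - 1  # only the newest entry is enrichable
        reports.append(policy.build_report(record, enrich=is_latest))
    new_marker = pending[-1].timestamp_ms if pending else last_reported_ms
    return reports, new_marker
```

An ANR integration would plug in its own policy with a different target reason and report builder, while the traversal logic stays in one place.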

This refactoring had a cascading consequence for event processing. The existing ANR event processor was tightly coupled to ANR assumptions and enriched every "backfillable" event (which are events that don't have access to the live scope of the session they emerge from, but the scope can forensically be reconstructed) as though it were an ANR. With tombstones now also flowing through as backfillable events, the processor was generalized with an enrichment strategy interface. ANR-specific logic (exception synthesis from textual thread dumps, background/foreground fingerprinting, profile-based culprit identification) moved into a dedicated enricher. At the same time, the shared path (scope backfilling, options backfilling, device/OS context) became the generic default that tombstones are fully served by without needing their own enricher.

A considerable part of this new infrastructure can now be reused for other ApplicationExitInfo categories, likely even when the resulting artifacts won't be events (but rather entries in SDK client reports).

Coexisting with the NDK integration

The deeper problem was that tombstones and the existing Sentry NDK integration (using sentry-native-ndk) report the same crash. The Native SDK catches the signal at runtime via its own signal handler and writes an envelope to the "outbox". The tombstone is generated by the platform's debuggerd, which is invoked after the Native SDK's signal handler chains to the previous handler, but the tombstone only arrives through ApplicationExitInfo on the next launch, after the process has been killed.

If both integrations are active, every native crash produces a duplicate. We need both to get the full picture: the richer stack traces, thread coverage, and up-to-date memory maps from the tombstone, combined with user-supplied scope data from the Native SDK. So we can't simply turn one off in favor of the other.

Solving this required correlating the two events by timestamp (within a 5-second tolerance) and merging them into one. The correlation itself was trivial. The complexity came from the different paths the two events take before they can be merged.
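As a rough illustration of why the correlation itself is the easy part, here is a minimal sketch in Python (not the SDK's Java implementation). The function name and the choice to pick the closest candidate are assumptions; only the 5-second tolerance comes from the text above:

```python
TOLERANCE_MS = 5_000  # the 5-second tolerance mentioned above


def find_matching_crash(tombstone_ts_ms, outbox_timestamps_ms):
    """Return the outbox timestamp closest to the tombstone, or None.

    Only candidates within TOLERANCE_MS count as the same crash.
    """
    candidates = [
        ts for ts in outbox_timestamps_ms
        if abs(ts - tombstone_ts_ms) <= TOLERANCE_MS
    ]
    if not candidates:
        return None
    # Assumption: prefer the nearest timestamp if several fall in the window.
    return min(candidates, key=lambda ts: abs(ts - tombstone_ts_ms))
```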

The Native SDK serializes envelopes to a shared app directory (the "outbox"), which acts as a signal to the Android SDK that an envelope is ready to send. For a native crash, this signal arrives too late to be picked up by the normal outbox sending infrastructure during the crash. So, on the next start, that infrastructure loads every envelope fully into memory, because its sole purpose is to send them to the backend. If we reused it for merge discovery, we would deserialize every queued envelope into memory just to find the one native crash event worth merging. On a device that has been offline and accumulated envelopes, this means a spike in memory pressure and CPU load for what is almost always a single match.

Instead, a lightweight scan phase streams through each envelope file, parsing only item headers and extracting the platform and timestamp fields via streaming JSON, without deserializing the full event. A bounded input stream tracks position within each envelope item and skips unread bytes to correctly advance to the next item. Full deserialization only happens once a timestamp match is found. The resulting streaming envelope/event parsing infrastructure can likely be reused in other parts of the SDK.
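The header-only part of that scan can be sketched in a few lines of Python. This is an illustration of the idea, not the SDK's code: it assumes every item header carries an explicit "length" field, and it stops at the item headers rather than streaming fields out of the event payload. The function name is hypothetical:

```python
import io
import json


def scan_item_headers(stream):
    """Yield each item header of an envelope without reading item payloads.

    Assumes every item header carries an explicit "length" (a simplification).
    """
    stream.readline()  # envelope header; not needed for the scan
    while True:
        line = stream.readline()
        if not line.strip():
            break  # end of file
        header = json.loads(line)
        yield header
        # Skip the payload plus its trailing newline instead of parsing it.
        stream.seek(header["length"] + 1, io.SEEK_CUR)
```

Because the skip is driven by the declared length, payloads that contain newlines or binary data do not confuse the scan.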

The merged event carries a TombstoneMerged exception mechanism (alongside the existing Tombstone and signalhandler mechanisms) so the backend, developers, and customers can distinguish provenance.

Session and `crashedLastRun` lifecycle

Native crash reporting interacts with session tracking in ways that require careful coordination. When the tombstone integration processes a crash, it needs to end the previous session as crashed and set crashedLastRun to true. But the NDK integration has its own mechanism for this: a crash marker file checked by the session finalizer on next launch.

A dedicated marker hint (deliberately distinct from the one used for ANRs) was introduced so that the envelope cache can recognize tombstone events during session persistence: when it sees the hint, it ends the previous session as crashed with the crash timestamp. The session finalizer then detects the already-crashed state and sets crashedLastRun accordingly, without re-processing the NDK crash marker. Crucially, the native crash marker file is still cleaned up regardless of whether the tombstone integration handled the crash. Otherwise, the NDK path would re-report it on every subsequent launch.

The protobuf dependency problem

Android tombstones use a protobuf format defined in AOSP (tombstone.proto). The initial implementation used protobuf-javalite for decoding, which immediately caused version conflicts for SDK consumers already using protobuf (usually via Firebase). Within a month of the initial release, we replaced it with epitaph, a handwritten decoder for the tombstone protobuf encoding, free of transitive dependencies and weighing around 30KiB. We also added a scheduled CI workflow to monitor AOSP for changes to the tombstone protobuf schema, so we know early if any consequential format changes land in the platform.
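To give a sense of what a handwritten decoder involves, here is a minimal Python sketch of the protobuf wire format. It is not epitaph's code, it handles only varint and length-delimited fields, and the function names are illustrative. Mapping field numbers to tombstone fields would come from tombstone.proto, which is not reproduced here:

```python
def read_varint(buf, pos):
    """Decode one base-128 varint starting at pos; return (value, next_pos)."""
    result = shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:  # a clear high bit marks the last byte
            return result, pos
        shift += 7


def iter_fields(buf):
    """Yield (field_number, value) for varint and length-delimited fields."""
    pos = 0
    while pos < len(buf):
        tag, pos = read_varint(buf, pos)
        field_number, wire_type = tag >> 3, tag & 0x07
        if wire_type == 0:  # varint
            value, pos = read_varint(buf, pos)
        elif wire_type == 2:  # length-delimited: bytes, string, nested message
            length, pos = read_varint(buf, pos)
            value, pos = buf[pos:pos + length], pos + length
        else:
            raise ValueError(f"wire type {wire_type} not handled in this sketch")
        yield field_number, value
```

Nested messages arrive as length-delimited bytes, so a decoder can recurse into them with the same loop and ignore field numbers it does not recognize.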

The unifying theme across these challenges is that native crash reporting is not a self-contained feature. It sits at the intersection of the SDK's event pipeline, session lifecycle, disk caching, and the existing NDK integration, each of which had been designed with the assumption that it was the only actor in its domain.

Adding tombstone support meant teaching these components to share: the history dispatcher with ANR detection, the outbox with the NDK integration, the session finalizer with a new crash source, and the event processor with a new category of event. We chose refactoring over duplication at each of these intersection points, which made the initial PRs larger and the review cycles a bit longer, but left the architecture at least as clean as we found it. In particular, the common Java SDK core did not see any behavioral changes.

Closing the gap

Tombstone support closes a gap that has existed since Sentry first shipped native crash reporting on Android: the difference between what the platform knows about a crash and what the SDK could tell you.

That gap might seem arbitrary, since we could replicate parts of the platform's own crash infrastructure inside the app, but doing so came at the cost of binary size, maintenance burden, and still-incomplete results. With ApplicationExitInfo providing programmatic access to the same data that debuggerd produces, we can now offer richer crash context with less overhead and fewer moving parts.

Of course, the limitation is real: this only works on Android 12 and above. For older devices and apps that need instrumentation of their native code beyond error reporting, the NDK integration remains available, and the two coexist cleanly. But with Android 12+ now representing 75% (according to apilevels.com, as of 03/2026) of cumulative usage distribution, the balance has tipped. For most apps, tombstone support is the primary native crash reporting path today, and sentry-native-ndk is the fallback.

Tombstone support is available in sentry-android-core 8.30.0 and above. See the Android SDK docs for configuration details and guidance on whether to keep or drop the NDK integration for your app.

Sample AI traces at 100% without sampling everything

Thu, 09 Apr 2026 00:00:00 GMT

A little while ago, when agents were telling me "You're absolutely right!", I was building webvitals.com. You put in a URL, it kicks off an API request to a Next.js API route that invokes an agent with a few tools to scan it and provide AI generated suggestions to improve your… you guessed it… Web Vitals. Do we even care about these anymore?

I had the tracesSampleRate set to 100% in development, but in production, I sampled it down to 10% because… well that's what our instrumentation recommends. Kyle wrote a great blog post explaining that "Watching everything is watching nothing". But AI is non-deterministic. And when I was debugging an error from a tool call, I realized I was missing very important spans emitted from the Vercel AI SDK because of that sampling strategy.

An agent run with 7 tool calls doesn't get partially sampled. You either capture the whole span tree or you lose it entirely. This is how head-based sampling works.

I was chasing ghosts.

Agent Runs Are Span Trees, and Sampling Is All-or-Nothing

A typical agent execution looks like this in Sentry's trace view:

POST /api/chat (http.server)
└── gen_ai.invoke_agent "Research Agent"
    ├── gen_ai.request "chat claude-sonnet-4-6"        ← initial reasoning
    ├── gen_ai.execute_tool "search_docs"              ← tool call
    ├── gen_ai.request "chat claude-sonnet-4-6"        ← process results
    ├── gen_ai.execute_tool "summarize"                ← second tool call
    ├── gen_ai.request "chat claude-sonnet-4-6"        ← decides to hand off
    └── gen_ai.execute_tool "transfer_to_writer"       ← handoff via tool
        └── gen_ai.invoke_agent "Writer Agent"
            ├── gen_ai.request "chat gemini-2.5-flash"
            └── gen_ai.execute_tool "format_output"

That's 11 spans in a single run. The sampling decision happens once, at the root: the POST /api/chat HTTP transaction. Every child span inherits that decision. If the root is dropped, all 11 spans disappear.

This is fundamentally different from sampling HTTP requests, where dropping one GET /api/users is no big deal because the next one is basically identical.

Agent runs are not identical. Each one makes different decisions, calls different tools, processes different data. An agent that hallucinated on run 67 might work perfectly on run 420. If your sample rate dropped 67, you'll never know what went wrong.
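To put a number on that, here's a quick back-of-the-envelope sketch. It assumes each run is sampled independently at a uniform rate, and the function name is mine, not part of any SDK:

```python
def chance_of_capturing_any(sample_rate, failing_runs):
    """Probability that at least one of the failing runs was sampled."""
    return 1 - (1 - sample_rate) ** failing_runs


# One bad run at a 10% sample rate: a 10% chance you have the trace.
one_failure = chance_of_capturing_any(0.1, 1)

# Even if the same failure happens five times, you still miss it more often than not.
five_failures = chance_of_capturing_any(0.1, 5)
```

At a 10% rate, five occurrences of the same failure still leave you with only about a 41% chance of having a single trace of it.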

How Head-Based Sampling Actually Works (and Why It Matters Here)

Both the Sentry JavaScript and Python SDKs use head-based sampling: the decision is made at the start of the trace, before any child spans exist.

In the JavaScript SDK, SentrySampler.shouldSample() is explicit about this:

// We only sample based on parameters (like tracesSampleRate or tracesSampler)
// for root spans. Non-root spans simply inherit the sampling decision
// from their parent.

Non-root spans don't get a vote. If the root span was dropped, tracesSampler is never called for any child, including your gen_ai.request and gen_ai.execute_tool spans. They inherit the parent's fate.

In Python, the same logic lives in Transaction._set_initial_sampling_decision(). The traces_sampler callback receives a sampling_context dict with transaction_context (containing op and name) and parent_sampled. It only fires for root transactions.

This means head-based sampling doesn't support independently sampling gen_ai child spans at a different rate than their parent transaction. There's no "sample 100% of LLM calls but 10% of HTTP requests." If the HTTP request is dropped, the LLM calls inside it are dropped too.
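Here's a toy model of that inheritance, to make the mechanics concrete. It is not the SDK's implementation; the Span class and start_root helper are made up for illustration, and the random source is injectable only so the behavior is easy to check:

```python
import random
from dataclasses import dataclass, field


@dataclass
class Span:
    op: str
    sampled: bool
    children: list = field(default_factory=list)

    def start_child(self, op):
        # Children never consult the sampler; they copy the parent's decision.
        child = Span(op=op, sampled=self.sampled)
        self.children.append(child)
        return child


def start_root(op, sample_rate, rng=random.random):
    """The only place a sampling decision is made in this toy model."""
    return Span(op=op, sampled=rng() < sample_rate)


# A root that loses the coin flip takes every gen_ai child down with it.
root = start_root("http.server", sample_rate=0.1, rng=lambda: 0.5)
agent = root.start_child("gen_ai.invoke_agent")
llm_call = agent.start_child("gen_ai.request")
```

In this run, root.sampled, agent.sampled, and llm_call.sampled are all False, no matter how interesting the LLM call turned out to be.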

I'd love to walk through a few different scenarios to show the difference in filtering approaches based on whether or not the root span is from an agent or the application.

Scenario 1: The `gen_ai` Span IS the Root

Sometimes your agent run is the root span. Maybe it's a cron job that's running an agent, a queue consumer processing an AI task, or a CLI script. In these cases, tracesSampler sees the gen_ai.* operation directly and you can match on it:

JavaScript:

Sentry.init({
  dsn: process.env.SENTRY_DSN,
  tracesSampler: ({ name, attributes, inheritOrSampleWith }) => {
    // Standalone gen_ai root spans - always sample
    if (attributes?.['sentry.op']?.startsWith('gen_ai.') || attributes?.['gen_ai.system']) {
      return 1.0;
    }

    return inheritOrSampleWith(0.2);
  },
});

Python:

def traces_sampler(sampling_context):
    # Fall back to "" in case op is missing or None
    op = sampling_context.get("transaction_context", {}).get("op") or ""

    # Standalone gen_ai root spans - always sample
    if op.startswith("gen_ai."):
        return 1.0

    parent = sampling_context.get("parent_sampled")
    if parent is not None:
        return float(parent)

    return 0.2

sentry_sdk.init(dsn="...", traces_sampler=traces_sampler)

This is the easy case. The hard case is next.

Scenario 2: The `gen_ai` Spans Are Children of an HTTP Transaction

This is the common case in web applications. A user hits POST /api/chat, your framework creates an http.server root span, and somewhere inside that request handler your agent runs. By the time the first gen_ai.request span is created, the sampling decision was already made for the HTTP transaction.

The fix: identify which routes trigger AI calls and sample those routes at 100%.

JavaScript:

Sentry.init({
  dsn: process.env.SENTRY_DSN,
  tracesSampler: ({ name, attributes, inheritOrSampleWith }) => {
    // Standalone gen_ai root spans
    if (attributes?.['sentry.op']?.startsWith('gen_ai.') || attributes?.['gen_ai.system']) {
      return 1.0;
    }

    // HTTP routes that serve AI features - always sample
    if (name?.includes('/api/chat') ||
        name?.includes('/api/agent') ||
        name?.includes('/api/generate')) {
      return 1.0;
    }

    return inheritOrSampleWith(0.2);
  },
});

Python:

def traces_sampler(sampling_context):
    tx_context = sampling_context.get("transaction_context", {})
    # Fall back to "" in case op is missing or None
    op = tx_context.get("op") or ""
    name = tx_context.get("name", "")

    # Standalone gen_ai root spans
    if op.startswith("gen_ai."):
        return 1.0

    # HTTP routes that serve AI features - always sample
    if op == "http.server" and any(
        p in name for p in ["/api/chat", "/api/agent", "/api/generate"]
    ):
        return 1.0

    # Honour parent decision in distributed traces
    parent = sampling_context.get("parent_sampled")
    if parent is not None:
        return float(parent)

    return 0.2

sentry_sdk.init(dsn="...", traces_sampler=traces_sampler)

Replace the route strings with whatever paths your AI features live on. If your entire app is AI-powered, skip the tracesSampler and just set tracesSampleRate: 1.0.
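Since traces_sampler is a plain function over a dict, you can sanity-check your route matching without sending anything to Sentry. Here's a minimal sketch: the sampler repeats the Scenario 2 logic so the snippet runs standalone, and the ctx helper plus the hand-built contexts are my own stand-ins for what the SDK would pass in:

```python
def traces_sampler(sampling_context):
    # Same logic as the Scenario 2 sampler, repeated so this runs standalone.
    tx_context = sampling_context.get("transaction_context", {})
    op = tx_context.get("op") or ""
    name = tx_context.get("name") or ""

    if op.startswith("gen_ai."):
        return 1.0
    if op == "http.server" and any(
        p in name for p in ["/api/chat", "/api/agent", "/api/generate"]
    ):
        return 1.0

    parent = sampling_context.get("parent_sampled")
    if parent is not None:
        return float(parent)
    return 0.2


def ctx(op, name, parent_sampled=None):
    """Build a minimal, hand-written sampling_context for testing."""
    return {
        "transaction_context": {"op": op, "name": name},
        "parent_sampled": parent_sampled,
    }


assert traces_sampler(ctx("http.server", "POST /api/chat")) == 1.0
assert traces_sampler(ctx("http.server", "GET /api/users")) == 0.2
assert traces_sampler(ctx("http.server", "GET /api/users", parent_sampled=False)) == 0.0
assert traces_sampler(ctx("gen_ai.invoke_agent", "Research Agent")) == 1.0
```

Dropping checks like these into your test suite catches the case where someone renames an AI route and quietly puts it back on the baseline rate.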

The Cost Math: AI API Bills Dwarf Observability Costs

The instinct to sample AI traces at a lower rate usually comes from cost concerns. Let's look at the actual numbers.

What	Cost per event
Claude Sonnet 4 input (1K tokens)	~$0.003
Claude Sonnet 4 output (1K tokens)	~$0.015
Gemini 2.5 Flash input (1K tokens)	~$0.00015
Gemini 2.5 Flash output (1K tokens)	~$0.0006
A typical agent run (3 LLM calls, 2 tool calls)	$0.02-$0.15
Sentry span events for that agent run (~9 spans)	Fraction of a cent

The LLM calls themselves are 10-100x more expensive than the monitoring. You're already paying for the AI call; dropping the observability span to save a fraction of a cent per call is like skipping the dashcam to save on gas.
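Here's the arithmetic for one run using the Claude Sonnet 4 prices from the table. The token counts per call (2,000 in, 500 out) are my assumption for illustration; plug in your own:

```python
# Prices per 1K tokens, taken from the table above.
CLAUDE_INPUT_PER_1K = 0.003
CLAUDE_OUTPUT_PER_1K = 0.015


def run_cost(llm_calls, input_tokens_per_call, output_tokens_per_call):
    """LLM cost in dollars for one agent run at the Claude Sonnet 4 prices."""
    input_cost = llm_calls * input_tokens_per_call / 1000 * CLAUDE_INPUT_PER_1K
    output_cost = llm_calls * output_tokens_per_call / 1000 * CLAUDE_OUTPUT_PER_1K
    return input_cost + output_cost


# 3 LLM calls, each with an assumed 2,000 input and 500 output tokens.
cost = run_cost(llm_calls=3, input_tokens_per_call=2000, output_tokens_per_call=500)
```

That works out to about $0.04 per run (6K input tokens at $0.003 plus 1.5K output tokens at $0.015), which sits inside the $0.02-$0.15 range in the table.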

When 100% Tracing Isn't Feasible: Metrics and Logs as a Safety Net

If you genuinely can't sample AI routes at 100%, because of, say, massive scale or strict budget constraints, you can still capture the important signals from every AI call using Sentry Metrics and Logs. Both are independent of trace sampling.

JavaScript - emit metrics on every LLM call:



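import * as Sentry from "@sentry/nextjs"; // assumption: swap in the Sentry package for your framework

// After every LLM call, regardless of trace sampling: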
Sentry.metrics.distribution("gen_ai.token_usage", result.usage.totalTokens, {
  unit: "none",
  attributes: {
    model: "claude-sonnet-4-6",
    user_id: user.id,
    endpoint: "/api/chat",
  },
});

Sentry.metrics.distribution("gen_ai.latency", responseTimeMs, {
  unit: "millisecond",
  attributes: { model: "claude-sonnet-4-6" },
});

Sentry.metrics.count("gen_ai.calls", 1, {
  attributes: {
    model: "claude-sonnet-4-6",
    status: result.error ? "error" : "success",
  },
});

Python - emit metrics on every LLM call:



import sentry_sdk

sentry_sdk.metrics.distribution(
    "gen_ai.token_usage",
    result.usage.total_tokens,
    attributes={
        "model": "claude-sonnet-4-6",
        "user_id": str(user.id),
        "endpoint": "/api/chat",
    },
)

sentry_sdk.metrics.distribution(
    "gen_ai.latency",
    response_time_ms,
    unit="millisecond",
    attributes={"model": "claude-sonnet-4-6"},
)

sentry_sdk.metrics.count(
    "gen_ai.calls",
    1,
    attributes={
        "model": "claude-sonnet-4-6",
        "status": "error" if error else "success",
    },
)

You can also log every call with structured attributes for searchability:

JavaScript:

Sentry.logger.info("LLM call completed", {
  model: "claude-sonnet-4-6",
  user_id: user.id,
  input_tokens: result.usage.promptTokens,
  output_tokens: result.usage.completionTokens,
  latency_ms: responseTimeMs,
  status: "success",
});

Python:

sentry_sdk.logger.info(
    "LLM call completed",
    model="claude-sonnet-4-6",
    user_id=str(user.id),
    input_tokens=result.usage.prompt_tokens,
    output_tokens=result.usage.completion_tokens,
    latency_ms=response_time_ms,
    status="success",
)

Here's what each telemetry layer gives you:

Signal	Traces (sampled)	Metrics (100%)	Logs (100%)
Full span tree with prompts/responses	Yes	No	No
Token usage distributions (p50, p99)	Partial	Yes	No
Cost attribution by model/user	Partial	Yes	Yes
Error rates by model/endpoint	Partial	Yes	Yes
Latency distributions	Partial	Yes	No
Searchable per-call records	Yes	No	Yes

The recommended approach: Use tracesSampler to capture 100% of AI-related routes. If that's not possible, combine a lower trace rate with metrics and logs emitted on every call. Traces give you the debugging depth; metrics and logs give you the aggregate picture.

Once you're emitting these metrics, you can build custom dashboards that go beyond what the pre-built AI Agents dashboard shows. The Sentry CLI makes this scriptable:

# Find your most expensive users - the pre-built dashboard doesn't group by user
sentry dashboard create 'AI Cost Attribution'
sentry dashboard widget add 'AI Cost Attribution' "Most Expensive Users" \
  --display table --dataset spans \
  --query "sum:gen_ai.usage.total_tokens" \
  --where "span.op:gen_ai.request" \
  --group-by "user.id" \
  --sort "-sum:gen_ai.usage.total_tokens" \
  --limit 20

# Cost per conversation - find runaway multi-turn sessions
sentry dashboard widget add 'AI Cost Attribution' "Cost per Conversation" \
  --display table --dataset spans \
  --query "sum:gen_ai.usage.total_tokens" "count" \
  --where "span.op:gen_ai.request" \
  --group-by "gen_ai.conversation.id" \
  --sort "-sum:gen_ai.usage.total_tokens" \
  --limit 20

The pre-built dashboard gives you per-model and per-tool aggregates. Custom dashboards answer the business questions: who's driving cost, which features justify their AI spend, and which conversations are spiraling.

The Full Production Config

Here's a complete setup that samples AI routes at 100%, everything else at your baseline, and emits metrics as a safety net:

JavaScript:



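import * as Sentry from "@sentry/nextjs"; // assumption: swap in the Sentry package for your framework

Sentry.init({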
  dsn: process.env.SENTRY_DSN,
  tracesSampler: ({ name, attributes, inheritOrSampleWith }) => {
    if (attributes?.['sentry.op']?.startsWith('gen_ai.') || attributes?.['gen_ai.system']) {
      return 1.0;
    }
    if (name?.includes('/api/chat') || name?.includes('/api/agent')) {
      return 1.0;
    }
    return inheritOrSampleWith(0.2);
  },
});

// Wrapper for any LLM call - emit metrics regardless of sampling
function trackLLMCall(model, usage, latencyMs, userId) {
  Sentry.metrics.distribution("gen_ai.token_usage", usage.totalTokens, {
    attributes: { model, user_id: userId },
  });
  Sentry.metrics.distribution("gen_ai.latency", latencyMs, {
    unit: "millisecond",
    attributes: { model },
  });
  Sentry.metrics.count("gen_ai.calls", 1, {
    attributes: { model, status: "success" },
  });
}

Python:



import sentry_sdk

def traces_sampler(sampling_context):
    tx = sampling_context.get("transaction_context", {})
    # Fall back to "" in case op or name is missing or None
    op, name = tx.get("op") or "", tx.get("name") or ""

    if op.startswith("gen_ai."):
        return 1.0
    if op == "http.server" and any(
        p in name for p in ["/api/chat", "/api/agent"]
    ):
        return 1.0

    parent = sampling_context.get("parent_sampled")
    if parent is not None:
        return float(parent)
    return 0.2

sentry_sdk.init(
    dsn="...",
    traces_sampler=traces_sampler,
)

# Wrapper for any LLM call - emit metrics regardless of sampling
def track_llm_call(model, usage, latency_ms, user_id):
    sentry_sdk.metrics.distribution(
        "gen_ai.token_usage", usage.total_tokens,
        attributes={"model": model, "user_id": str(user_id)},
    )
    sentry_sdk.metrics.distribution(
        "gen_ai.latency", latency_ms,
        unit="millisecond",
        attributes={"model": model},
    )
    sentry_sdk.metrics.count(
        "gen_ai.calls", 1,
        attributes={"model": model, "status": "success"},
    )

Quick Reference

Situation	What to do
AI is the core product	`tracesSampleRate: 1.0` - sample everything
AI is one feature in a larger app	`tracesSampler` with AI routes at 1.0, baseline for the rest
Can't afford 100% on AI routes	Lower trace rate + metrics/logs on every call
Already using `tracesSampler`	Add AI route matching to your existing logic
Sample rate is already 1.0	No change needed

The underlying principle: agent runs are high-value, low-volume (relative to HTTP traffic), and expensive to reproduce. Sample them accordingly.

If you're just getting started with AI monitoring, check out our companion post on the developer's guide to AI agent monitoring, which covers the full setup across 10+ frameworks, the pre-built dashboards, and a real debugging walkthrough.

For framework-specific setup, see our AI monitoring docs. If you're using an AI coding assistant, install the Sentry CLI skill (npx skills add https://cli.sentry.dev) to configure your sampling, build custom dashboards, and investigate issues directly from your editor.

AI agent observability: The developer's guide to agent monitoring

Tue, 07 Apr 2026 00:00:00 GMT

Most discussions about agent observability read like outdated compliance checklists with "AI" substituted for older technologies. They emphasize comprehensive logging, evaluation metrics, and governance frameworks—but provide no actual code examples or guidance for real debugging scenarios.

Effective agent monitoring requires two essential components: dashboards showing aggregate behavior across all agents, and detailed traces explaining specific failures. Most platforms provide only one. Here's what having both looks like in practice.

What is Agent Observability?

Agent observability provides complete visibility into AI agent operations: model invocations, tool selections, decision sequences, handoffs, token consumption, and associated costs.

Traditional application monitoring focuses on requests, errors, and response times. This works adequately for stateless HTTP services where requests are independent.

AI agents operate fundamentally differently. A single agent execution might involve multiple model calls, tool invocations, sub-agent transfers, and reasoning loops—all interdependent. When outputs are incorrect, failure points could be anywhere: incorrect tool responses, context window limitations, wrong function selection, or lost state during handoffs.

Agent observability provides comprehensive visibility into the complete decision-making process across these interconnected operations. Agent quality assessment, workflow debugging, and cost control all require this visibility level.

Why Traditional Monitoring Fails for AI Agents

Standard APM tools report that POST /api/chat returned status 200 in 4.2 seconds. They won't reveal that internally, the agent executed 5 model calls, with the third call selecting an incorrect tool that returned outdated information, which the model then accurately summarized as garbage.

A "log everything later" approach produces dashboards showing counts and averages without enabling deeper investigation. An agent producing incorrect output might have completed 12 model calls, executed 4 tools, transferred to a sub-agent, then generated incorrect output. Aggregate metrics indicate error rate increases. They don't indicate where reasoning failed.

The solution requires structured tracing based on consistent standards, allowing dashboards, traces, and alerts to communicate uniformly.

The OpenTelemetry Standard for Agent Observability

The OpenTelemetry gen_ai semantic conventions establish standardized instrumentation for agent systems. Instead of custom logging, every AI operation produces a structured span containing consistent attributes. Core operations include:

Span Operation	Captured Information
`gen_ai.request`	Single model call: model name, prompt, response, token counts
`gen_ai.invoke_agent`	Complete agent lifecycle from task initiation to final output
`gen_ai.execute_tool`	Tool/function invocation: name, input, output, duration

These compose hierarchically:

POST /api/chat (http.server)
└── gen_ai.invoke_agent "Research Agent"
    ├── gen_ai.request "chat claude-sonnet-4-6"        ← initial reasoning
    ├── gen_ai.execute_tool "search_docs"              ← tool call
    ├── gen_ai.request "chat claude-sonnet-4-6"        ← process results
    ├── gen_ai.execute_tool "summarize"                ← second tool call
    ├── gen_ai.request "chat claude-sonnet-4-6"        ← decides to hand off
    └── gen_ai.execute_tool "transfer_to_writer"       ← handoff via tool
        └── gen_ai.invoke_agent "Writer Agent"
            ├── gen_ai.request "chat gemini-2.5-flash"
            └── gen_ai.execute_tool "format_output"

This is an open standard, not proprietary. Any platform following it can ingest these spans. The span operation follows the pattern gen_ai.{operation_name}. For manual instrumentation, gen_ai.request covers all model calls. SDK auto-instrumentation may generate more specific operations like gen_ai.chat or gen_ai.embeddings depending on API calls. Because these are structured spans rather than unstructured logs, they enable both dashboards and trace visualization.
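Because every operation is a span with a predictable op, a finished trace can be walked like any other tree. The sketch below is a minimal illustration in plain JavaScript. The exampleTrace object, including its status and children fields, is an assumed in-memory shape chosen for this example, not the format Sentry or OpenTelemetry exports.

```javascript
// Assumed in-memory shape for a finished trace (illustrative only).
const exampleTrace = {
  op: "http.server",
  name: "POST /api/chat",
  status: "ok",
  children: [
    {
      op: "gen_ai.invoke_agent",
      name: "Research Agent",
      status: "ok",
      children: [
        { op: "gen_ai.request", name: "chat claude-sonnet-4-6", status: "ok", children: [] },
        { op: "gen_ai.execute_tool", name: "search_docs", status: "error", children: [] },
        { op: "gen_ai.request", name: "chat claude-sonnet-4-6", status: "ok", children: [] },
      ],
    },
  ],
};

// Walks the span tree depth-first and tallies model calls, tool calls, and failed tools.
function summarizeTrace(span, acc = { modelCalls: 0, toolCalls: 0, failedTools: [] }) {
  if (span.op === "gen_ai.request") acc.modelCalls += 1;
  if (span.op === "gen_ai.execute_tool") {
    acc.toolCalls += 1;
    if (span.status === "error") acc.failedTools.push(span.name);
  }
  for (const child of span.children) summarizeTrace(child, acc);
  return acc;
}

const summary = summarizeTrace(exampleTrace);
console.log(summary);
```

The same walk is what a trace view does visually: the HTTP span reports success, while the failed search_docs call only shows up once you descend into the agent's children.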

Key Metrics for AI Agent Monitoring

Before selecting tools, track these measurements for production agents:

Reliability metrics:

Agent error rate — percentage of agent executions that fail or produce errors
Tool failure rate — identifies unreliable tools and their impact on agent success
Latency (p50, p95) — per-agent and per-model tracking to identify regressions

Cost metrics:

Token usage — input, output, cached, and reasoning tokens per model. Cached and reasoning tokens are subsets, not cumulative. Incorrect calculation means fictional cost dashboards.
Cost per model — compare similar workloads. Example: claude-sonnet-4-6 costs $10.8K weekly while gemini-2.5-flash-lite handles equivalent volume for $645.
Cost per user/tier — identifies which users or pricing levels consume most AI resources
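The subset rule for token counts is easy to get wrong, so here is a minimal sketch of a cost estimate that respects it. It assumes cached tokens are counted inside the input total and reasoning tokens inside the output total. The prices and the field names (inputTokens, cachedInputTokens, outputTokens, reasoningTokens) are hypothetical, so substitute your provider's real rates and your own span attributes.

```javascript
// Hypothetical per-million-token prices in USD; replace with your provider's rates.
const PRICES = { input: 3.0, cachedInput: 0.3, output: 15.0 };

function estimateCostUsd(usage, prices = PRICES) {
  // Cached tokens are already counted inside inputTokens, so split the total instead of adding.
  const uncachedInput = usage.inputTokens - usage.cachedInputTokens;
  // Reasoning tokens are already counted inside outputTokens, so they get no separate term.
  const cost =
    uncachedInput * prices.input +
    usage.cachedInputTokens * prices.cachedInput +
    usage.outputTokens * prices.output;
  return cost / 1_000_000;
}

const usage = {
  inputTokens: 1_000_000,
  cachedInputTokens: 400_000,
  outputTokens: 100_000,
  reasoningTokens: 60_000,
};
console.log(estimateCostUsd(usage).toFixed(2)); // "3.42"
```

Adding the cached and reasoning counts on top of the totals would bill 460,000 tokens twice in this example, which is the kind of error that makes a cost dashboard fictional.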

Quality metrics:

Tool call frequency — tracks how often agents invoke each tool and invocation sequence
Token efficiency — average tokens per successful completion. Growing numbers suggest inflating prompts or context windows.
Cache hit rate — percentage of input tokens served from cache. If caching is enabled but this metric isn't improving, something needs investigation.

Comprehensive platforms following OpenTelemetry conventions surface these metrics automatically from trace data.

Auto-instrumentation for 10+ Frameworks

Sentry auto-instruments major AI frameworks in Python and Node.js, including OpenAI, Anthropic, Google GenAI, LangChain, LangGraph, Pydantic AI, OpenAI Agents SDK, Vercel AI SDK, and others. You don't need to create spans manually: install the SDK, enable tracing, and supported libraries are picked up automatically.

Complete setup:

import sentry_sdk

sentry_sdk.init(
    dsn="YOUR_DSN",
    traces_sample_rate=1.0,
)
# OpenAI, Anthropic, LangChain, LangGraph, Pydantic AI,
# Google GenAI -- all auto-instrumented when detected.

That's the entire configuration. Making Anthropic or OpenAI calls produces visible spans.

Pre-built Agent Monitoring Dashboards

Most observability platforms include pre-built agent monitoring dashboards. Once instrumentation is active, Sentry's AI Agents dashboard provides three views:

AI Agents Overview

Displays agent runs, duration, total model calls, tokens consumed, and tool invocations. This is the "is everything functioning?" view.

AI Agents Model Details

Per-model cost projections, token breakdown (input/output/cached/reasoning), and latency. This automatically displays cost metrics.

AI Agents Tool Details

Per-tool invocation frequency, error rates, and p95 latency. A tool failing 12% of the time appears here before users report problems.

These dashboards appear immediately once spans flow. However, they display aggregates: per-model totals, per-tool error rates, overall agent counts. They answer technical questions and highlight problems—but what about business-level inquiries?

Custom Agent Monitoring Dashboards

Pre-built dashboards show aggregate health signals. They don't show who drives AI costs, which features justify spending, or whether caching strategies save money. Addressing these questions requires slicing trace data by custom dimensions: user tier, feature flag, experiment group.

Some platforms enable custom queries against span data. With the Sentry CLI, you can script this—and its agent skill system allows AI coding assistants like Claude Code to build dashboards:

"Who are my most expensive users?"

sentry dashboard create 'AI Cost Attribution'

sentry dashboard widget add 'AI Cost Attribution' "Most Expensive Users" \
  --display table --dataset spans \
  --query "sum:gen_ai.usage.total_tokens" \
  --where "span.op:gen_ai.request" \
  --group-by "user.id" \
  --sort "-sum:gen_ai.usage.total_tokens" \
  --limit 20

"Which pricing tier is eating my AI budget?"

Tag users with their plan, then group in the dashboard:

sentry_sdk.set_tag("user_tier", user.plan)  # "free", "pro", "enterprise"

sentry dashboard widget add 'AI Cost Attribution' "AI Cost by Tier" \
  --display bar --dataset spans \
  --query "sum:gen_ai.usage.total_tokens" \
  --where "span.op:gen_ai.request" \
  --group-by "user_tier" \
  --sort "-sum:gen_ai.usage.total_tokens"

A breakdown like this might reveal, for example, that free-tier users consume 60% of your AI budget. The same tagging pattern works for any dimension: team, feature_flag, experiment_group.
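To make the grouping concrete, here is a rough sketch of the aggregation such a widget performs: sum a token count per tag value, then sort descending. The span records and their tags and totalTokens fields are hypothetical stand-ins, since in practice this data lives in Sentry rather than in your code.

```javascript
// Sums totalTokens per distinct value of the given tag, sorted from largest to smallest.
function sumTokensBy(spans, tagKey) {
  const totals = new Map();
  for (const span of spans) {
    const group = span.tags[tagKey] ?? "(untagged)";
    totals.set(group, (totals.get(group) ?? 0) + span.totalTokens);
  }
  return [...totals.entries()].sort((a, b) => b[1] - a[1]);
}

// Hypothetical span records for illustration.
const spans = [
  { tags: { user_tier: "free" }, totalTokens: 1200 },
  { tags: { user_tier: "pro" }, totalTokens: 800 },
  { tags: { user_tier: "free" }, totalTokens: 600 },
  { tags: {}, totalTokens: 100 },
];

const byTier = sumTokensBy(spans, "user_tier");
console.log(byTier);
```

The untagged bucket is worth watching: spans that never received the tag cannot be attributed to any tier, so a large untagged total usually means the tag is being set too late in the request.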

"Which agents are token-hungry?"

sentry dashboard widget add 'AI Cost Attribution' "Avg Tokens per Agent" \
  --display table --dataset spans \
  --query "avg:gen_ai.usage.total_tokens" "count" \
  --where "span.op:gen_ai.invoke_agent" \
  --group-by "gen_ai.agent.name" \
  --sort "-avg:gen_ai.usage.total_tokens"

If "Research Agent" averages 15K tokens per run while "Summarizer Agent" averages 2K, you know where to focus prompt optimization.

"Is my prompt caching actually saving money?"

sentry dashboard widget add 'AI Cost Attribution' "Cache Hit Rate" \
  --display line --dataset spans \
  --query "sum:gen_ai.usage.input_tokens.cached" "sum:gen_ai.usage.input_tokens" \
  --where "span.op:gen_ai.request"

If the cached-to-total ratio isn't improving after enabling caching, your prompt structure needs investigation.
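The widget plots two sums, and the number you care about is their ratio. A minimal sketch of that calculation follows, assuming cached input tokens are a subset of total input tokens. The daily figures are invented for illustration.

```javascript
// Returns the share of input tokens served from cache, or null when there was no input.
function cacheHitRate(cachedInputTokens, inputTokens) {
  if (inputTokens <= 0) return null;
  return cachedInputTokens / inputTokens;
}

// Hypothetical daily totals before and after enabling prompt caching.
const daily = [
  { day: "Mon", cached: 0, input: 50_000 },
  { day: "Tue", cached: 20_000, input: 50_000 },
  { day: "Wed", cached: 30_000, input: 60_000 },
];

for (const { day, cached, input } of daily) {
  console.log(day, cacheHitRate(cached, input));
}
```

Returning null for an empty bucket, rather than 0, keeps quiet periods from looking like caching failures on a chart.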

Why Tracing Matters for Agent Monitoring

Dashboards show totals. Traces show decisions.

A dashboard indicates error rates increased or latency spiked. A trace identifies which agent, which model call, and which tool caused it.

Distributed tracing already captures complete span hierarchies for requests: browser interactions, HTTP calls, server routing, database queries. Agent observability integrates into this. Your gen_ai.* spans appear as children within existing traces, so model calls, tool executions, MCP server interactions, and sub-agent transfers sit alongside regular application spans. No separate system required.

This integration is powerful. You're examining agent data within full request context, from user click to final tool response, with agent decisions as one layer in the entire stack.

Here's what this looks like in Sentry's trace view:

Single request, end-to-end: from user clicking "Send Message" through API, agent orchestration with model calls and MCP server interactions, through handoff to second agent. Clicking any span reveals model, tokens, cost, and system prompt details.

Agent Observability Best Practices

Whatever platform you choose, implement these practices:

Use structured tracing, not logs. Unstructured logs can't reconstruct reasoning chains. OpenTelemetry gen_ai spans provide searchable, filterable hierarchies powering dashboards and trace views simultaneously.
Sample AI traces at 100%. Agent runs are span hierarchies. Sampling drops complete executions, not individual calls. If tracesSampleRate is below 1.0, you're losing entire agent runs. Use tracesSampler to keep AI routes at 100% while sampling everything else at baseline. (Detailed sampling guide)
Track cost by user, not just by model. The pre-built dashboard shows per-model totals. You need per-user and per-tier attribution for business decisions about rate limiting, pricing, and model routing.
Monitor tool reliability separately. A tool failing 5% of the time might not appear in overall error rates, but causes 1 in 20 agent runs to produce bad output. Your dashboard should surface per-tool error rates distinctly.
Connect AI monitoring to your full stack. Agent failure might stem from slow database queries, failed external API calls, or frontend timeouts. Isolated AI monitoring can't reveal these root causes.
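The sampling practice above can be sketched as a tracesSampler function. This is a minimal example under stated assumptions: the route prefixes and baseline rate are hypothetical, and it assumes the sampling context exposes name and parentSampled, as recent versions of the Sentry JavaScript SDK do. Check the sampling guide for the fields your SDK version provides.

```javascript
// Hypothetical route prefixes that trigger agent runs in this app.
const AI_ROUTE_PREFIXES = ["/api/chat", "/api/agents"];
// Hypothetical baseline rate for everything else.
const BASELINE_RATE = 0.1;

// Returns a sample rate between 0 and 1 for a new trace.
function tracesSampler({ name = "", parentSampled }) {
  // Keep every trace that starts on an AI route so no agent run is dropped.
  if (AI_ROUTE_PREFIXES.some((prefix) => name.includes(prefix))) return 1.0;
  // Otherwise follow the upstream decision so distributed traces stay intact.
  if (typeof parentSampled === "boolean") return parentSampled ? 1.0 : 0.0;
  // Root traces on non-AI routes fall back to the baseline rate.
  return BASELINE_RATE;
}

console.log(tracesSampler({ name: "POST /api/chat" }));
console.log(tracesSampler({ name: "GET /health" }));
```

Pass the function as the tracesSampler option in Sentry.init in place of a fixed tracesSampleRate. Matching on the span name is a simplification, so if your AI traffic is not identifiable by route, match on whatever attribute distinguishes it.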

Full-Stack Agent Observability

Agent observability becomes most powerful when layered on top of comprehensive APM platforms, linking agent spans to errors, performance traces, session replays, and logs across your entire system.

Isolated AI monitoring shows gen_ai spans separately. You see that Research Agent completed 8 model calls costing $0.04. What remains invisible is why it made 8 instead of 3: your search_docs tool executes a slow Postgres query timing out, causing the agent to retry with rephrased queries repeatedly.

When agent spans share context with your broader infrastructure, everything clarifies. Errors include their complete span hierarchy. Session replays show user interactions triggering bad agent runs. Upstream issues (sluggish vector databases, unreliable external APIs) appear in the same trace as resulting agent behavior.

Four Steps to First Trace

Install the SDK: pip install sentry-sdk or npm install @sentry/node
Initialize with tracing enabled
Make an AI call; spans and dashboards populate automatically
(Optional) Install the CLI skill for your AI assistant:

npx skills add https://cli.sentry.dev

If your framework is auto-instrumented, you're done. If not, manual instrumentation requires approximately 10 lines per span type.

For comprehensive guidance on capturing 100% of AI traces, see our companion post on sampling strategies for agentic applications.

Try Sentry at no cost - AI monitoring is included across all plans.

AI Agent Monitoring FAQs

What is agent observability?

Agent observability is complete visibility into AI agent operations: model calls, tool selections, decision chains, handoffs, token consumption, and costs. It transcends traditional monitoring by tracking complete reasoning sequences across multi-turn interactions.

How is agent monitoring different from LLM monitoring?

LLM monitoring measures individual model calls (latency, tokens, errors). Agent monitoring tracks complete agent cycles: multi-step reasoning, tool execution, agent-to-agent transfers, and how individual calls combine into workflows.

What metrics should I track for AI agents?

Minimum metrics: agent error rate, tool failure rate, latency (p50/p95), token usage per model, cost per user/tier, and cache hit rate. These divide into reliability (functioning properly?), cost (expenditure?), and quality (improving?) categories.

What tools support agent observability?

OpenTelemetry gen_ai semantic conventions represent the emerging standard. Sentry, LangSmith, Langfuse, Arize, and Datadog all provide agent observability with distinct approaches. Sentry distinguishes itself through full-stack context: agent data connected to errors, performance traces, session replays, and logs unified in one system.

Send your existing OpenTelemetry traces to Sentry

Thu, 02 Apr 2026 00:00:00 GMT

You spent months instrumenting your app with OpenTelemetry. Ripping it out to adopt a new observability backend is not an option.

Sentry's OTLP endpoint means you don't have to. In fact, two environment variables are all you need and your existing traces start showing up in Sentry's trace explorer.

Sentry's OTLP support is currently in open beta. This means you can start using it today, but there are some known limitations we'll cover later.

Why OTLP: keep your instrumentation, just change the destination

The main advantage of using OpenTelemetry is that your instrumentation stays vendor-neutral. Your instrumentation code uses OpenTelemetry's standard APIs, and OTLP (the protocol) sends that data to any compatible backend. This means you can switch observability backends anytime by changing a few configuration lines. This is particularly useful if you:

Are already heavily invested in the OpenTelemetry ecosystem
Want to keep your instrumentation flexible or already use OpenTelemetry in other parts of your stack

If you're starting from scratch and only need Sentry, the native Sentry SDK provides full support for all Sentry features (including span events, session replay, and profiling), while OTLP support is still in beta and has some limitations. We'll compare both approaches later in this guide.

Prerequisites

Before we start, you need:

A Sentry account (the free tier works fine)
Node.js 18 or later installed
Basic familiarity with Express.js

If you don't have a Sentry project yet, create one now. Select Express as the platform when prompted. You can skip the DSN setup instructions because you'll use the OTLP endpoint instead.

Get your Sentry OTLP credentials

Sentry provides dedicated OTLP endpoints for each project. To find those:

Click Settings in the left sidebar.
Under the Organization section in the Settings sidebar, click Projects.
Find your project in the list and click on it to open the project settings.
In the project settings sidebar, click Client Keys (DSN) under the SDK Setup section.
Select the OpenTelemetry tab. Click the Expand button to see all OTLP endpoint values.

Sentry UI showing Settings > Client Keys (DSN) > OpenTelemetry tab with OTLP endpoints visible

Keep this tab open. We'll use the following values in the next step:

OTLP Traces Endpoint: The URL where Sentry receives traces (which looks like https://o{ORG_ID}.ingest.us.sentry.io/api/{PROJECT_ID}/integration/otlp/v1/traces)
OTLP Traces Endpoint Headers: The authentication header value. Copy only the value after x-sentry-auth= (which looks like sentry sentry_key={YOUR_PUBLIC_KEY}). The demo app's instrument.js file sends this as the x-sentry-auth header. You're just providing the value, not the header name.

Connect your OpenTelemetry app to Sentry

For this example, we'll start with a sample book recommendation service that you can grab from our GitHub repo. It already has OpenTelemetry tracing instrumentation wrapped into it. You don't need to change your instrumentation code. Just point it at Sentry's OTLP endpoint.

Clone the starter app

Run the following commands to clone the book recommendation app:

git clone https://github.com/getsentry/otlp-tracing-sentry.git
cd otlp-tracing-sentry
npm install

This app includes:

OpenTelemetry SDK (already configured)
Custom tracing spans throughout the code
Multi-level trace instrumentation (database queries, API calls, and parallel operations)

Configure Sentry as the OTLP destination

Create a .env file in the project root:

cp .env.example .env

Now edit .env and add your Sentry OTLP credentials from the previous step:

OTEL_EXPORTER_OTLP_TRACES_ENDPOINT=https://o{YOUR_ORG_ID}.ingest.us.sentry.io/api/{YOUR_PROJECT_ID}/integration/otlp/v1/traces
OTEL_EXPORTER_OTLP_TRACES_HEADERS=sentry sentry_key={YOUR_PUBLIC_KEY}
OTEL_SERVICE_NAME=book-recommendation-service
PORT=3000

Replace the placeholder values with your actual Sentry credentials. The OTEL_SERVICE_NAME will help you filter traces later in Sentry.
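Most setup problems at this step come from leftover placeholders or from pasting the header name along with its value. The sketch below is a hypothetical startup check, not part of the demo app. It only tests the shapes described above and does not contact Sentry.

```javascript
// Matches the endpoint shape shown above; the host portion can differ by region.
const ENDPOINT_PATTERN = /^https:\/\/[^/]+\/api\/[^/]+\/integration\/otlp\/v1\/traces$/;

// Returns a list of human-readable problems; an empty list means the values look plausible.
function checkOtlpEnv(env) {
  const problems = [];
  const endpoint = env.OTEL_EXPORTER_OTLP_TRACES_ENDPOINT ?? "";
  const auth = env.OTEL_EXPORTER_OTLP_TRACES_HEADERS ?? "";

  if (!ENDPOINT_PATTERN.test(endpoint)) {
    problems.push("endpoint does not look like a Sentry OTLP traces URL");
  }
  if (/[{}]/.test(endpoint) || /[{}]/.test(auth)) {
    problems.push("placeholder braces are still present");
  }
  if (auth.startsWith("x-sentry-auth=")) {
    problems.push("header value includes the header name; keep only the part after x-sentry-auth=");
  } else if (!auth.startsWith("sentry sentry_key=")) {
    problems.push("header value should start with 'sentry sentry_key='");
  }
  return problems;
}

console.log(
  checkOtlpEnv({
    OTEL_EXPORTER_OTLP_TRACES_ENDPOINT: "https://o123.ingest.us.sentry.io/api/456/integration/otlp/v1/traces",
    OTEL_EXPORTER_OTLP_TRACES_HEADERS: "x-sentry-auth=sentry sentry_key=abc123",
  })
);
```

Calling checkOtlpEnv(process.env) before starting the SDK and logging any returned problems can save a round of wondering why no traces arrive.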

That's it. You've just connected OpenTelemetry to Sentry with two lines of configuration.

Generate a trace and watch it appear in Sentry

Start the application:

npm start

You should see the following:

OpenTelemetry tracing initialized
Service: book-recommendation-service
Book Recommendation Service running on http://localhost:3000

Generate a trace

In a new terminal window, send a request to create a book recommendation:

curl -X POST http://localhost:3000/recommend \
  -H "Content-Type: application/json" \
  -d '{"userId": "user123"}'

You'll get a JSON response with book recommendations:

{
  "userId": "user123",
  "userName": "Alice Johnson",
  "recommendations": [
    {
      "bookId": 201,
      "title": "Project Hail Mary",
      "score": 0.95,
      "availability": 6
    }
  ]
}

View the trace in Sentry

Now let's see what this looks like in the Sentry Traces view:

Go to your Sentry project.
Navigate to Explore in the left sidebar, then click Traces.

Sentry Traces page showing span samples

The page displays a list of span samples from your traces. Each row represents a span with its duration and description. Click on the Trace Samples tab to switch to viewing complete traces.

Trace Samples tab showing the expanded trace with all spans

Click on a trace to open the waterfall view. You'll see a multi-level trace showing the complete request flow, including nested operations, parallel operations, and how long each one takes.

Explore span attributes

Click on any span in the waterfall to see its attributes.

Trace waterfall view with span attributes panel

The waterfall view makes it easier to see where your app spends its time, instead of guessing which async call wandered off on its own.

For example, the getUserProfile span includes attributes like:

action: SELECT
category: db
db.operation: SELECT
db.system: postgresql

These attributes make your traces searchable. You can filter traces by user ID, database operations, or any custom attribute you add.

How the OpenTelemetry instrumentation works

Let's look at how the app creates these traces. Here's what's happening behind the scenes so you can reuse the same patterns in your own app.

OpenTelemetry SDK initialization

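The instrument.js file sets up the OpenTelemetry SDK and configures the OTLP exporter:

// Imports assume CommonJS and the OTLP/HTTP exporter package; adjust to match your project
const { NodeSDK } = require('@opentelemetry/sdk-node');
const { OTLPTraceExporter } = require('@opentelemetry/exporter-trace-otlp-http');
const { getNodeAutoInstrumentations } = require('@opentelemetry/auto-instrumentations-node');
const { Resource } = require('@opentelemetry/resources');
const { ATTR_SERVICE_NAME } = require('@opentelemetry/semantic-conventions');

// Configure the OTLP trace exporter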
const traceExporter = new OTLPTraceExporter({
  url: process.env.OTEL_EXPORTER_OTLP_TRACES_ENDPOINT,
  headers: {
    'x-sentry-auth': process.env.OTEL_EXPORTER_OTLP_TRACES_HEADERS || '',
  },
});

// Create SDK instance
const sdk = new NodeSDK({
  resource: new Resource({
    [ATTR_SERVICE_NAME]: 'book-recommendation-service',
  }),
  traceExporter,
  instrumentations: [getNodeAutoInstrumentations()],
});

sdk.start();

These are the key parts:

OTLPTraceExporter sends traces to Sentry's OTLP endpoint.
NodeSDK initializes OpenTelemetry with automatic instrumentation.
getNodeAutoInstrumentations() automatically traces HTTP requests, database calls, and other operations.

Custom spans

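The index.js file imports instrument.js first, then creates custom spans for business operations:

require('./instrument');
const { trace } = require('@opentelemetry/api');

const tracer = trace.getTracer('book-recommendation-service', '1.0.0');

// Assumed helper: the snippets below call delay() to simulate I/O latency
const delay = (ms) => new Promise((resolve) => setTimeout(resolve, ms));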

Here's how we create a span for the database query:

async function getUserProfile(userId) {
  return tracer.startActiveSpan('getUserProfile', async (span) => {
    span.setAttribute('db.system', 'postgresql');
    span.setAttribute('db.operation', 'SELECT');
    span.setAttribute('user.id', userId);

    await delay(50);

    const profile = {
      userId,
      name: 'Alice Johnson',
      preferences: ['fiction', 'mystery', 'sci-fi']
    };

    span.end();
    return profile;
  });
}

The startActiveSpan method creates a new span and makes it the "active" span. Any child spans created inside this function automatically become children of this span.

Nested spans

We can create nested operations by starting new spans within a parent span:

async function getReadingHistory(userId) {
  return tracer.startActiveSpan('getReadingHistory', async (span) => {
    span.setAttribute('db.system', 'postgresql');
    span.setAttribute('user.id', userId);

    await delay(60);
    const history = [
      { bookId: 101, title: 'The Great Gatsby', rating: 5 },
      { bookId: 102, title: '1984', rating: 4 },
      { bookId: 103, title: 'To Kill a Mockingbird', rating: 5 }
    ]; // Get data

    // Nested operation
    const filtered = await tracer.startActiveSpan('filterRecentBooks', async (childSpan) => {
      childSpan.setAttribute('books.count', history.length);
      await delay(20);
      const recent = history.slice(0, 2);
      childSpan.end();
      return recent;
    });

    span.end();
    return filtered;
  });
}

This creates the nested structure we saw in the Sentry waterfall view.

Parallel operations

For operations that run concurrently, use Promise.all:

async function checkBookAvailability(bookIds) {
  return tracer.startActiveSpan('checkBookAvailability', async (span) => {

    const checks = await Promise.all([
      tracer.startActiveSpan('checkWarehouse1', async (s) => {
        s.setAttribute('warehouse.id', 'US-EAST-1');
        await delay(40);
        s.end();
        return { warehouse: 'US-EAST-1', available: 2 };
      }),
      tracer.startActiveSpan('checkWarehouse2', async (s) => {
        s.setAttribute('warehouse.id', 'US-WEST-1');
        await delay(45);
        s.end();
        return { warehouse: 'US-WEST-1', available: 3 };
      })
    ]);

    span.end();
    return checks;
  });
}

Sentry shows these spans running in parallel on the waterfall view, making it clear which operations we can optimize.
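What the waterfall shows visually can also be put into numbers: for concurrent children, the parent's wall-clock time tracks the slowest child rather than the sum. The sketch below is illustrative only. It uses the simulated delays from checkBookAvailability as idealized timings and assumes both children start at the same instant.

```javascript
// Each span is { name, startMs, endMs }, measured relative to the parent's start.
// Assumes at least one child span is passed in.
function parallelismReport(childSpans) {
  const duration = (s) => s.endMs - s.startMs;
  const summedMs = childSpans.reduce((total, s) => total + duration(s), 0);
  const wallClockMs =
    Math.max(...childSpans.map((s) => s.endMs)) - Math.min(...childSpans.map((s) => s.startMs));
  const slowest = childSpans.reduce((a, b) => (duration(b) > duration(a) ? b : a));
  return { summedMs, wallClockMs, slowest: slowest.name };
}

const report = parallelismReport([
  { name: "checkWarehouse1", startMs: 0, endMs: 40 },
  { name: "checkWarehouse2", startMs: 0, endMs: 45 },
]);
console.log(report);
```

Here the two checks add up to 85 ms of work but cost 45 ms of wall-clock time, so speeding up checkWarehouse1 would not shorten the request at all. The slowest child is the one worth optimizing.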

OTLP vs native Sentry SDK

If you're already on OpenTelemetry, you can stay there until it makes sense to change it up.

If you're starting fresh and only using Sentry, use the native SDK — you'll get more features and less config. Here's how they differ in implementation.

Setup and configuration

OTLP:

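// instrument.js
const { NodeSDK } = require('@opentelemetry/sdk-node');
const { OTLPTraceExporter } = require('@opentelemetry/exporter-trace-otlp-http');
const { getNodeAutoInstrumentations } = require('@opentelemetry/auto-instrumentations-node');

const traceExporter = new OTLPTraceExporter({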
  url: process.env.OTEL_EXPORTER_OTLP_TRACES_ENDPOINT,
  headers: {
    'x-sentry-auth': process.env.OTEL_EXPORTER_OTLP_TRACES_HEADERS
  },
});

const sdk = new NodeSDK({
  traceExporter,
  instrumentations: [getNodeAutoInstrumentations()],
});

sdk.start();

Native Sentry SDK:

// instrument.js
const Sentry = require('@sentry/node');

Sentry.init({
  dsn: process.env.SENTRY_DSN,
  tracesSampleRate: 1.0,
});

Creating spans

OTLP:

const { trace } = require('@opentelemetry/api');

const tracer = trace.getTracer('my-service', '1.0.0');

async function getUserProfile(userId) {
  return tracer.startActiveSpan('getUserProfile', async (span) => {
    span.setAttribute('user.id', userId);
    // Your code here
    span.end();
    return result;
  });
}

Native Sentry SDK:

const Sentry = require('@sentry/node');

async function getUserProfile(userId) {
  return Sentry.startSpan(
    {
      op: 'db.query',
      name: 'getUserProfile',
      attributes: { 'user.id': userId },
    },
    async () => {
      // Your code here
      return result;
    }
  );
}

With the OpenTelemetry API, you must call span.end() yourself. The native Sentry SDK automatically ends the span when the callback completes.
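A manually ended span stays open if the code throws before reaching span.end(). One way to guard against that is a small wrapper around startActiveSpan, sketched below. The withSpan helper is hypothetical and not part of OpenTelemetry or the demo app. It takes the tracer as an argument so it needs no imports, and it uses the numeric status code 2, which corresponds to SpanStatusCode.ERROR in @opentelemetry/api.

```javascript
// Runs fn inside an active span and guarantees the span ends, even when fn throws.
function withSpan(tracer, name, fn) {
  return tracer.startActiveSpan(name, async (span) => {
    try {
      return await fn(span);
    } catch (err) {
      // Attach the failure to the span before rethrowing so it is visible in the trace.
      span.recordException(err);
      span.setStatus({ code: 2, message: String(err) }); // 2 is SpanStatusCode.ERROR
      throw err;
    } finally {
      span.end();
    }
  });
}

// Usage with the tracer from the examples above:
// const profile = await withSpan(tracer, 'getUserProfile', async (span) => {
//   span.setAttribute('user.id', userId);
//   return loadProfile(userId); // loadProfile is a placeholder for your own code
// });
```

This gives OpenTelemetry code roughly the same ergonomics as the callback form of the native SDK, at the cost of one extra helper to maintain.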

When to use OTLP

Use OpenTelemetry with OTLP if you:

Already have OpenTelemetry instrumentation in your codebase
Send traces to multiple observability backends
Need vendor-neutral instrumentation
Work with AI or LLM frameworks that use OpenTelemetry by default
Use the OpenTelemetry Collector for processing traces

When to use native Sentry

Use the native Sentry SDK if you:

Are starting from scratch without existing instrumentation
Use Sentry as your only observability backend
Need features that are currently limited in the OTLP beta (such as span events, full span link support, and searchable array attributes)
Want automatic integration with Sentry error tracking and session replay

Known limitations (open beta)

As of the time this was published, Sentry's OTLP support is in open beta and a few things don't work yet. Here's what to watch out for.

Span events are dropped

OpenTelemetry span events are not supported. If your instrumentation adds events to spans, they will be dropped during ingestion.

// This event will be dropped
span.addEvent('cache-miss', { key: 'user:123' });

If you need to track events, use span attributes or create separate spans.
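As a rough illustration of the attribute workaround, the helper below copies an event's fields onto the span itself. The recordEventAsAttributes function and its event.<name>. prefix are naming choices made for this sketch, not a Sentry or OpenTelemetry convention.

```javascript
// Copies an event's fields onto the span as attributes under an "event.<name>." prefix.
function recordEventAsAttributes(span, eventName, fields = {}) {
  // A boolean marker makes "did this event happen?" a simple filter.
  span.setAttribute(`event.${eventName}`, true);
  for (const [key, value] of Object.entries(fields)) {
    span.setAttribute(`event.${eventName}.${key}`, value);
  }
}

// Usage, in place of span.addEvent('cache-miss', { key: 'user:123' }):
// recordEventAsAttributes(span, 'cache-miss', { key: 'user:123' });
```

Attributes hold one value per key, so a second event with the same name on the same span overwrites the first. If the event can repeat, or its timing matters, a separate child span is the better fit.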

Span links have limited support

Span links are ingested and displayed in the trace view, but you cannot search, filter, or aggregate by them. You can see the links when viewing a trace, but they won't appear in trace queries.

Array attributes have limited support

Array attributes work the same way as span links. Sentry ingests and displays them, but you cannot use them in search queries or aggregations.

// This array attribute will display but won't be searchable
span.setAttribute('book.genres', ['fiction', 'mystery', 'sci-fi']);

If you need searchable arrays, consider using separate attributes or joining the array into a string.
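The string-joining workaround might look like the sketch below. The setListAttribute helper and the .count suffix are hypothetical. Whether you can match individual items inside the joined string depends on the search syntax available to you, so treat this as a starting point.

```javascript
// Stores a list as one delimited string plus a count so both are plain scalar attributes.
function setListAttribute(span, key, values, separator = ",") {
  span.setAttribute(key, values.join(separator));
  span.setAttribute(`${key}.count`, values.length);
}

// Usage, in place of span.setAttribute('book.genres', ['fiction', 'mystery', 'sci-fi']):
// setListAttribute(span, 'book.genres', ['fiction', 'mystery', 'sci-fi']);
```

Pick a separator that cannot appear inside the values themselves, otherwise the joined string becomes ambiguous.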

OpenTelemetry traces FAQs

Why aren't my traces appearing in Sentry?

Verify the OTLP endpoint and headers in your .env match the values from Settings > Client Keys (DSN) > OpenTelemetry (OTLP). Traces can take 30-60 seconds to appear after being sent. Check the console for OpenTelemetry tracing initialized.

To verify the exporter is sending data, enable debug logging by adding this to instrument.js:

const { diag, DiagConsoleLogger, DiagLogLevel } = require('@opentelemetry/api');

diag.setLogger(new DiagConsoleLogger(), DiagLogLevel.DEBUG);

See the OpenTelemetry troubleshooting guide for more options.

Why are my spans appearing flat instead of nested?

You're likely using startSpan instead of startActiveSpan. The active span becomes the parent for any child spans created within its scope.

How do I reduce memory usage from tracing?

Lower your sampling rate. In production, capturing 100% of traces is rarely necessary. Add these to your .env to sample 10%:

OTEL_TRACES_SAMPLER=traceidratio
OTEL_TRACES_SAMPLER_ARG=0.1

How do I add more context to my spans?

Use span.setAttribute() to attach user IDs, request IDs, feature flags, or any other relevant data. The more context you add, the easier it is to filter and debug in Sentry.

How do I find slow operations in my app?

Add spans around suspected bottlenecks, then use Sentry's waterfall view to see exactly where time is being spent.

Can I set up alerts on trace data?

Yes. Sentry alerts can notify you when traces exceed performance thresholds or show error patterns.

Can I trace requests across multiple services?

Yes. OpenTelemetry's automatic context propagation handles this as long as every service in the chain is instrumented; no extra span code is needed.

Logging in Next.js is hard (But it doesn't have to be)

Mon, 30 Mar 2026 00:00:00 GMT

A typical Next.js deployment can execute code in up to three different runtimes: Edge, Node.js, and the browser.

You may already be capturing logs from server-side code, but if you are not capturing the full request from middleware through server rendering to the browser, you are missing critical debugging information when problems arise.

TL;DR: A typical Next.js deployment can run in up to three environments: Node, Edge, and the browser. Most JavaScript logging libraries target Node; far fewer are compatible with Edge and the browser. LogTape and Sentry both provide runtime-agnostic logging in JavaScript.

Why logging in Next.js is hard

Problem 1: Most Loggers Assume Node.js

Most loggers were built specifically for Node.js, relying on APIs like AsyncLocalStorage or fs that aren't available in the browser or Edge runtimes.

You'll often see Pino (or wrappers such as Next-Logger) suggested as the best logger for Next.js, but neither is actually a good fit.

Pino, and by extension Next-Logger, is designed for Node.js, and uses a polyfill to work in the browser. But that polyfill means surrendering the performance benefits the library has in Node, and you still cannot capture logs in Edge functions or middleware (running on Edge).

Problem 2: Missing out on client-side logging

It's easy to assume you don't even need to capture client-side logs, because your "frontend code" is all server-side.

By default, Next.js uses Server Components for all pages and components. So by default, any logs emitted from your "frontend code" will actually be captured in your server-side logs.

However, once you add a use client boundary for interactive components, that code will be executed in the browser.

Your "frontend code" is really a mix of Server and Client Components that work together to render a single page and log to two different places.

We have to solve that fragmentation and make sure all frontend code, no matter where it runs, is captured and logged to the same place.

Problem 3: Trace-connected structured logging

Logging is only one part of observability; on its own, it is most useful when you are debugging locally. In production, once you're collecting logs from dozens, hundreds, or even thousands of requests, you need a way to tie related logs together for querying and aggregation.

Tracing adds a unique ID to each request in your app and appends that ID as structured data to every log you send. Then later, you can query logs based on that ID to find every related log from that same request, along with other telemetry in your monitoring platform, such as errors in Sentry.
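To show the idea without any SDK, here is a toy sketch of trace-connected logging. Everything in it is hypothetical: real tracing libraries generate and propagate the ID for you, whereas here it is passed in by hand and logs are pushed into a plain array.

```javascript
// Returns a logger that stamps every record with the same traceId.
function createRequestLogger(traceId, sink) {
  const emit = (level) => (message, attributes = {}) =>
    // traceId is spread last so caller attributes cannot overwrite it.
    sink.push({ level, message, ...attributes, traceId });
  return { info: emit("info"), warn: emit("warn"), error: emit("error") };
}

// Simulate two requests writing to one shared log store from different runtimes.
const store = [];
const requestA = createRequestLogger("trace-a", store);
const requestB = createRequestLogger("trace-b", store);
requestA.info("middleware: session validated", { runtime: "edge" });
requestB.info("render: product page", { runtime: "node" });
requestA.error("client: checkout button failed", { runtime: "browser" });

// Query by trace ID to recover everything that happened during request A.
const requestALogs = store.filter((record) => record.traceId === "trace-a");
console.log(requestALogs.map((record) => record.message));
```

The interleaved store is what production logs look like, and the filter at the end is the query that tracing makes possible: one ID pulls the Edge and browser records for the same request back together.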

Adding tracing to Next.js is actually easy, but it is still a step you have to take, and there are several ways to do it.

We're going to pair a JavaScript logging library with Sentry to instrument Next.js with trace-connected logs. In a future post, we'll cover and compare another way to instrument Next.js with tracing, using OpenTelemetry.

What to look for in a Next.js logger

What should that logger do? I recently compared all of the current popular JavaScript logging libraries and broke down why you should be using a logging library in the first place. The same holds mostly true for Next.js, but we need to be even more specific.

When evaluating a logging solution for Next.js, consider:

Runtime Match: Needs to run on Node, Browser, and Edge runtimes (if using Edge).
Tracing Support: Next.js apps are multi-service by default. Tracing connects logs from multiple sources under a single trace.
Production Features: Filtering for data redaction and noise reduction; context management and child loggers to improve structured logging and make later querying and aggregation easier.

As you might expect, feature-wise, libraries have been coalescing and imitating one another's good practices, to the point where they have become similar.

Still, the biggest difference to keep an eye out for is runtime support and performance.

There are two practical fits that cover the full scope of a Next.js app, and they are not mutually exclusive:

LogTape with the Sentry sink
Sentry.logger with the Sentry Next.js SDK

LogTape

Mentioned in my other post, LogTape is one of the newest libraries on the scene and is quickly becoming a favorite dedicated logging library. It's built from the ground up with no dependencies and runs natively in all JavaScript runtimes.

LogTape's context management and categories are especially useful for tagging and organizing your logs for more efficient querying later.

Configure categories and the Sentry sink

import { configure } from "@logtape/logtape";
import { getSentrySink } from "@logtape/sentry";

await configure({
  sinks: {
    sentry: getSentrySink(),
  },
  loggers: [
    {
      category: ["next-app"],
      lowestLevel: "info",
      sinks: ["sentry"],
    },
    {
      category: ["next-app", "middleware"],
      lowestLevel: "info",
      sinks: ["sentry"],
    },
    {
      category: ["next-app", "client"],
      lowestLevel: "info",
      sinks: ["sentry"],
    },
  ],
});

Each logger you define can be filtered in Sentry (for example category: next-app.client) to fetch only the logs from a particular category.

Explicit context in a client component

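You can also add contexts to loggers to automatically append data:

"use client";

import { useEffect, useMemo } from "react";
import { getLogger } from "@logtape/logtape";

const logger = getLogger(["next-app", "client"]);

// Component name is illustrative; the original declaration is not shown
export default function OrderConfirmation({ orderId }) {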
  // Automatically includes the orderId in all future logs from this component
  const ctx = useMemo(() => logger.with({ orderId }), [orderId]);

  useEffect(() => {
    const fromQuery = new URLSearchParams(window.location.search).get("orderId");
    if (fromQuery && fromQuery !== orderId) {
      ctx.with({ fromQuery }).warn(
        "Confirmation orderId {orderId} does not match URL query {fromQuery}; check redirects and rewrites.",
      );
    }
  }, [orderId, ctx]);

  return <p>Thanks for your order.</p>;
}

You can read the full Structured Logging with LogTape post to get a deeper look at best practices for structured logging with LogTape and Sentry.

Sentry Next.js SDK

But, you might not need an additional logging library at all.

Sentry's Next.js SDK includes Sentry Logs, a logging library built into Sentry's SDKs for multiple platforms, not just JavaScript.

Sentry's Logger for Next.js is also runtime-agnostic, providing logging everywhere your Next.js app can run. And if you are already using Sentry, or plan to use Sentry for capturing errors and tracing anyway, you can add logging without adding any new dependencies.



import * as Sentry from "@sentry/nextjs";

Sentry.init({
  dsn: process.env.SENTRY_DSN,
  enableLogs: true,
});

Sentry.logger.info("Checkout completed", {
  orderId: order.id,
  userId: user.id,
  userTier: user.subscription,
  cartValue: cart.total,
  itemCount: cart.items.length,
  paymentMethod: "stripe",
});

Scopes and contexts work a little differently than in LogTape, but the functionality is similar.

You can use Sentry.withScope to set context data that will automatically be included on every log emitted inside the callback.

"use client";

import * as Sentry from "@sentry/nextjs";

function setGlobalAndIsolationScopes() {
  Sentry.getGlobalScope().setAttributes({ service: "checkout", version: "2.1.0" });
  Sentry.getIsolationScope().setAttributes({ org_id: "org_demo_001", user_tier: "pro" });
}

function calcShipping() {
  Sentry.logger.info("calcShipping: rate lookup", { carrier: "demo_carrier" });
  return 12.5;
}

function checkout() {
  const shipping = calcShipping();
  Sentry.logger.info("checkout: shipping computed", { shipping_usd: shipping });
}

function onCheckout() {
  setGlobalAndIsolationScopes();
  Sentry.withScope((scope) => {
    scope.setAttribute("checkout_id", crypto.randomUUID());
    checkout(); // nested logs inherit global + isolation + checkout_id
  });
}

In the Sentry Log Explorer, opening the checkout: shipping computed entry shows the fields passed to that log call (shipping_usd: 12.5) along with the attributes merged from the enclosing scopes (service, version, org_id, user_tier, and checkout_id).

Getting trace-connected logs end to end

Finally, we want more than just messages from our logs. We want structured data with useful debugging information. When we see a log, we want to know where it came from, what triggered it, and what else happened as a part of that request.

Tracing monitors the execution and timing of requests throughout an app. If there is a function or service that ends up causing slowdowns or even errors for users of the app, tracing data is how we collect information and ultimately discover the problem.

When we configure Sentry, every request will be assigned a unique "Trace ID" that will link all data connected to that request together.

Use Sentry's setup wizard to automatically instrument your Next.js app with tracing and logs.

npx @sentry/wizard@latest -i nextjs

You should end up with three files similar to the following.


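import * as Sentry from "@sentry/nextjs";

Sentry.init({
  // In instrumentation-client.ts, read the DSN from a NEXT_PUBLIC_-prefixed variable so it is available in the browser
  dsn: process.env.SENTRY_DSN,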
  tracesSampleRate: process.env.NODE_ENV === "development" ? 1.0 : 0.1,
  enableLogs: true,
});

They'll be named instrumentation-client.ts, sentry.server.config.ts, and sentry.edge.config.ts. One entry point per runtime.

Then use Sentry.logger as above, or integrate with an existing logger, like LogTape.

If your app was already instrumented with console.log, you should try to upgrade to structured logging, but you can still forward console output to Sentry with the consoleLoggingIntegration.

That's all you need to do. Every request will now contain a trace ID that will follow through the entire app, across runtimes, allowing you to query for all of the logs and traces related to that request.

Querying the logs

With logs throughout the application and connected with a trace ID, you can start querying for logs for debugging, custom dashboards, alerts, and more.

In Explore > Logs you can search for logs based on any of the structured data properties.

Sentry will automatically inject several useful attributes, like the environment, which shows us this log came from the development server. This log also has a browser attribute, applied automatically, which shows that the request came from Chrome. You'll also notice the Chrome icon on the right. Server-side logs won't contain either of these.

If we wanted to filter the logs down to only those from the browser, we could search has: browser, or browser.name: Chrome to see a specific browser.

It's not uncommon to add a service or component attribute to logs, to make it more clear where a log was emitted from. You can use scopes with the Sentry logger or categories with LogTape to broadly append a queryable attribute like this to all logs in a stack.

Read my post about structured logging to get a better idea about the type of data you might want to append to your logs.

Every attribute shown here, including any additional data you append to the logs, is queryable. To see the other logs that were a part of this same request, we just need to click on the trace ID.

Next steps

Setting up a logger that captures the full surface area of your Next.js app is the first major step, but how you instrument your logs and make use of that data is what really matters.

Implement structured logs with plenty of high-cardinality data.
Add contextual data, like the name of the service or component that triggered the log.
Audit your existing logs and start replacing old console.log statements.
Learn more about how to query logs.

With structured logs, packed with useful high-cardinality data to query, you (or your LLM) will be able to quickly debug new issues, as they come in. You can write queries yourself, and configure dashboards to visualize aggregate data.

Try using Sentry's Seer AI to query logs with natural language. You can use the Sentry MCP server, or click the "Ask Seer" button on the log explorer page. Rolling out now, you can even ask Seer to create custom dashboard widgets for you from your log data, or other data you might correlate with logs.

Add logs now, cover all of your surfaces, and tomorrow's bugs will be much more approachable.

Next.js observability gaps and how to close them

Tue, 24 Mar 2026 00:00:00 GMT

This blog is based on a recent live workshop. You can watch the full livestream on YouTube.

Next.js gives you a lot for free: server-side rendering, file-based routing, edge runtimes. What it doesn’t give you is a clear picture of what’s actually happening in production. The framework’s three-runtime architecture (client, server, edge) means errors can surface in one layer while originating in another, database queries hide behind ORM abstractions, and server actions swallow useful error messages before they ever reach the browser.

This post walks through a few specific observability gaps in Next.js apps, why they exist, and how to close them with Sentry.

TL;DR

Next.js production builds strip error details from server actions. The client sees “An error occurred in a server component render” with zero context. Sentry captures the original server-side exception with full stack traces.
Hydration errors are among the most common and least helpful errors in React. Sentry provides an HTML diff view that shows exactly which DOM nodes diverged between server and client renders.
Logs and metrics aren't sampled like traces. You get 100% of that data regardless of your tracesSampleRate, so use them for anything that can't afford gaps.
Server actions don’t emit OpenTelemetry spans, so they need manual instrumentation with withServerActionInstrumentation to appear in your traces.
Database queries through ORMs like Drizzle are invisible to tracing by default. Adding an integration for your database client (like libSQL for Turso) surfaces every query as a span.
AI agent monitoring using the Vercel AI SDK integration gives you per-model token usage, cost breakdowns, and tool call traces without leaving Sentry.

Three runtimes, three config files

Next.js runs code in different environments. Running the Sentry wizard gets you started:

npx @sentry/wizard@latest -i nextjs

The wizard creates separate initialization files for each: instrumentation-client.ts for the browser, sentry.server.config.ts for Node.js, and sentry.edge.config.ts for edge runtimes.

Alongside those config files, the wizard adds a global error boundary (global-error.tsx) and wraps your next.config.ts with withSentryConfig. The wrapper handles source map uploads for readable stack traces and configures tunnel routing, which sends Sentry data through your own server to avoid ad blockers.

A few things worth noting about the config:

Sample rates matter. Set tracesSampleRate to 1.0 in development and 0.1 to 0.2 (10–20%) in production. Going higher burns through quota fast.
sendDefaultPii attaches user IP addresses to replays and events. Optional, but useful for correlating sessions to real users.
Edge config can differ. If your middleware just reroutes requests, you can safely disable tracing in the edge config to reduce noise.

One more thing about the setup: call Sentry.setUser() once after authentication to propagate user context across errors, logs, traces, and replays.

Hydration errors: common and not very helpful

Hydration is the process where React attaches event handlers to server-rendered HTML, making it interactive. Hydration errors happen when the markup rendered by React on the client doesn’t match the initial server-rendered HTML, or when invalid HTML was sent by the server, and React couldn’t fix it.

The classic cause: a theme toggle that reads from localStorage. The server renders the light theme (it has no access to localStorage), the client reads the stored dark theme preference, and React throws a hydration error because the HTML doesn’t match.

In production, the browser gives you almost nothing useful. You get a minified React error pointing to a decoder URL, and a stack trace full of chunk files.

The HTML diff that actually helps

To help you debug hydration errors, Sentry provides a diff tool that shows the differences between client-rendered and server-rendered HTML. If you have Session Replay enabled, Sentry will detect hydration errors and bring them into your issue stream.

The diff shows before (server) and after (client) in a format that looks like a GitHub PR review. Seeing the page before and after React has hydrated helps you find the element or attribute that caused the error. The easiest ones to spot are text content mismatches, incorrectly nested HTML elements, and attribute changes.

If you’re already using Session Replay, you get automatic grouped hydration error issues for free. They’re generated from Replays, so they have no impact on your error quota.

The fix for theme-related hydration errors is usually straightforward: defer the theme read to a useEffect so the initial server and client renders match, then apply the stored preference after hydration completes.

Server actions are a tracing blind spot

Server actions are Next.js’s pattern for handling form submissions and mutations, essentially typed POST requests. Sentry automatically instruments most operations, but server actions require manual setup.

The reason: server actions don’t emit OTel spans Sentry can hook into. Because of how Turbopack bundles them, auto-instrumentation is very hard and extremely error-prone. It would require building a Next.js server actions compiler, which is not something that seems reasonable to do.

Without instrumentation, a server action shows up as an anonymous HTTP POST. With it, you get a named span, timing data, and (critically) distributed trace continuity between client and server.

Wrapping a server action

Wrap your server actions with Sentry.withServerActionInstrumentation(). Here’s what that looks like:

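"use server";

import * as Sentry from "@sentry/nextjs";
import { headers } from "next/headers";

// The function name is illustrative; authenticateUser is assumed to be defined elsewhere in your app
export async function login(formData: FormData) {
  return Sentry.withServerActionInstrumentation(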
    "login", // Name that appears in Sentry traces
    {
      headers: await headers(), // Connects client and server traces
      formData,
      recordResponse: true,
    },
    async () => {
      // Your actual login logic
      const result = await authenticateUser(formData);
      return result;
    },
  );
}

The withServerActionInstrumentation wrapper creates named spans for each action, captures timing and errors, connects client and server traces via headers, and attaches form data to Sentry events.

The headers parameter is what makes distributed tracing work. Sentry reads the trace ID and baggage from the request headers to stitch together the client-initiated trace with the server-side execution. Without it, you get two disconnected traces instead of one continuous picture.
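
To make the header mechanics a little more concrete, here is a minimal sketch of parsing a sentry-trace header value, which has the form traceId-spanId with an optional sampled flag. The helper name parseSentryTrace and the returned shape are assumptions for illustration, not the SDK's internal API. The SDK does this for you when you pass headers.

```typescript
// Illustrative only: the SDK reads this header internally when you pass `headers`.
interface ParsedTrace {
  traceId: string;
  parentSpanId: string;
  sampled: boolean | undefined; // undefined when the header carries no sampling decision
}

// Hypothetical helper, not part of @sentry/nextjs.
function parseSentryTrace(header: string): ParsedTrace | undefined {
  // Format: <32 hex trace id>-<16 hex span id>, optionally followed by -0 or -1
  const match = /^([0-9a-f]{32})-([0-9a-f]{16})(?:-([01]))?$/.exec(header.trim());
  if (!match) return undefined;
  const flag: string | undefined = match[3];
  return {
    traceId: match[1] as string,
    parentSpanId: match[2] as string,
    sampled: flag === undefined ? undefined : flag === "1",
  };
}
```

If the header is missing or malformed, there is nothing to continue from, which is why the server side ends up starting a separate trace.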

Production error messages are useless (by design)

There’s another reason server action observability matters. In production builds, Next.js intentionally strips error details from server-side failures before they reach the client. What the user sees: “An error occurred in a server component render. The specific message is omitted in production builds to avoid leaking sensitive details.”

This is the right security decision. It’s also completely useless for debugging. But because Sentry instruments the server side directly, you still get the full exception (for example, “Database connection lost during authentication”) instead of the sanitized nothing. This alone justifies the setup cost if you’re using server actions for anything important.

Logs and metrics: choosing the right signal

Errors, logs, and metrics serve different purposes, and the distinction matters for how you instrument a Next.js app.

Errors (Sentry.captureException) — something is broken and needs fixing. Creates an issue, triggers alerts, feeds into Seer for root cause analysis.
Logs (Sentry.logger) — contextual breadcrumbs. What happened before, during, and after a failure. High-cardinality, queryable, trace-connected.
Metrics (Sentry.metrics) — counters, durations, gauges. Good for dashboards and alerts on aggregate patterns.

To enable logs, add enableLogs: true to each of your Sentry init files:

// instrumentation-client.ts, sentry.server.config.ts, sentry.edge.config.ts
Sentry.init({
  dsn: process.env.NEXT_PUBLIC_SENTRY_DSN,
  tracesSampleRate: 0.1,
  enableLogs: true,
});

Once enabled, Sentry.logger sends structured logs from anywhere in your application:



import * as Sentry from "@sentry/nextjs";

Sentry.logger.info("User added talk to schedule", {
  userId: session.user.id,
  talkId: talk.id,
  action: "add_to_schedule",
});

Because logs are trace-connected, when you open an issue in Sentry, you see every log emitted during that trace. You can also navigate to the Log Explorer, filter by any attribute (like talkId or userId), and build alerts or dashboards from the results.

One important distinction: logs and metrics aren’t sampled. If your tracesSampleRate is 10%, you’ll still get 100% of your logs and metric data points. Traces use statistical sampling and Sentry extrapolates aggregate numbers, but logs and metrics give you exact counts.
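
The difference comes down to simple arithmetic. The sketch below illustrates the idea and is not Sentry's implementation: at a sample rate of 0.1, each stored trace stands in for roughly ten real requests, so any aggregate built from traces is an estimate, while a log count is exact. The function name and the numbers are made up for the example.

```typescript
// Estimate a total from a sampled count: each sampled item represents 1 / sampleRate items.
function extrapolateCount(sampledCount: number, sampleRate: number): number {
  if (sampleRate <= 0 || sampleRate > 1) {
    throw new RangeError("sampleRate must be in (0, 1]");
  }
  return sampledCount / sampleRate;
}

const tracesStored = 120; // hypothetical number of traces kept at tracesSampleRate 0.1
const estimatedRequests = extrapolateCount(tracesStored, 0.1); // an estimate of about 1,200
const logsStored = 1187; // hypothetical; logs are not sampled, so this count is exact
```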

Database queries disappear behind your ORM

If you’re using an ORM like Drizzle with a database like Turso, your traces will show server actions and API routes, but the actual SQL queries inside them are invisible by default. You’ll see that a request took 850ms but not why.

Fixing this requires two things: wiring up the database client integration and adding it to your Sentry server config.

For a Turso (libSQL) database, add the libsqlIntegration to your server config:

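// sentry.server.config.ts
import * as Sentry from "@sentry/nextjs";
import { createClient } from "@libsql/client";
// Also import libsqlIntegration from the libSQL integration package you installed.

// The environment variable names here are illustrative
const client = createClient({
  url: process.env.TURSO_DATABASE_URL!,
  authToken: process.env.TURSO_AUTH_TOKEN,
});

Sentry.init({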
  dsn: process.env.SENTRY_DSN,
  tracesSampleRate: 0.1,
  integrations: [
    libsqlIntegration({ client }),
  ],
});

You’ll also need to add @libsql/client to the serverExternalPackages in your next.config.ts so it bundles correctly.

Once configured, every Drizzle query surfaces as a span with the actual SQL, even though you wrote your queries using Drizzle’s TypeScript API. The integration records the SQL statements that Drizzle sends through the database client and shows them in the trace waterfall. This means you can use the Query Insights view to see operations per minute, average duration, and get automatic alerts for N+1 queries or slow database calls.

The same pattern applies to other databases. For Postgres (including Neon), the Sentry Node SDK includes Postgres instrumentation by default, so you might not need any custom configuration. For Supabase, there’s a dedicated Supabase integration.

AI agent monitoring: tracing token spend back to users

If your Next.js app includes AI features (chat interfaces, agent workflows, generated content, etc.), you probably have a decent-sized bill from your model provider. What you probably don’t have is a breakdown of which features, which users, or which agent paths are responsible for that cost.

The vercelAIIntegration adds instrumentation for the AI SDK by Vercel to capture spans using the AI SDK’s built-in telemetry. This integration is enabled by default in the Node runtime, but not in the Edge runtime.

For each AI function call, you can enable detailed telemetry:




import { streamText } from "ai";
import { anthropic } from "@ai-sdk/anthropic";

const result = await streamText({
  model: anthropic("claude-sonnet-4-20250514"),
  prompt: userMessage,
  experimental_telemetry: {
    isEnabled: true,
    functionId: "search-agent", // Shows up in Sentry as the span name
    recordInputs: true,
    recordOutputs: true,
  },
});

Setting functionId in experimental_telemetry makes it easier to correlate captured spans with function calls. If you have multiple agents, say a router that delegates to a search agent and an info agent, each using different models, each gets its own named span in the trace.

In Sentry’s Agent Monitoring view, you get:

Model cost breakdown — which models you’re using, how much, and what it costs
Token usage — input and output tokens per model, per request
Tool call visibility — every tool invocation, including errors, linked back to the triggering trace
Full trace context — AI calls shown alongside database queries, API calls, and everything else in the request

That last point is the one that matters most. If an AI response takes five seconds, is it because the model is slow, or because the tool call triggered a slow database query? The trace waterfall shows you both in the same view, rather than requiring you to cross-reference your Anthropic dashboard with your application logs.

Both recordInputs and recordOutputs default to true. Set these to false if your prompts or responses contain sensitive data you don’t want sent to Sentry.

Closing the gaps

A lot of Next.js observability problems look like missing data until you know what to look for. Anonymous POST requests that are actually server actions. 850ms responses with no explanation. Hydration errors pointing at minified decoder URLs. Once you've seen each one, the fix can be straightforward. But the first time they show up in production, they're easy to lose hours on.

Three runtimes, three configs. Next.js splits across client, server, and edge. Instrument all of them, but configure each appropriately.
Hydration errors need visual diffs. The browser error message is useless in production. Sentry’s diff tool shows you the actual DOM divergence.
Server actions need manual wrapping. No OTel spans means no auto-instrumentation. Use withServerActionInstrumentation and pass headers for distributed tracing.
Logs and metrics aren’t sampled. Unlike traces, you get every single one. Use them for the data that can’t afford gaps.
ORM queries are invisible by default. Add a database integration to see actual SQL in your traces and catch N+1 queries automatically.
AI monitoring connects cost to context. Token spend is meaningless without knowing which users, features, and code paths generated it.

Get started with the Next.js SDK docs, or check out the debugging Next.js series on YouTube for more stuff like this.

Seer fixes Seer: How Seer pointed us toward a bug and helped fix an outage

Fri, 20 Mar 2026 00:00:00 GMT

On February 21, 2026, Sentry's AI-powered issue summarization experienced an outage in the EU region. Approximately 80-90% of requests to Seer's Issue Summary API endpoint failed, disabling AI Summary cards on new issues and generating over 40,000 error events.

The root cause traced back to an upstream incident: Google Cloud Platform declared unavailability for gemini-2.5-flash-lite in several EU regions. However, Sentry had provisioned throughput capacity in europe-west1 with guaranteed resources. The outage should have been minor—Sentry was only using 12% of provisioned capacity.

The actual problem stemmed from application code, not infrastructure. A latency optimization feature blocklisted every Gemini region in the EU, including the one with guaranteed capacity.

How Seer Routes LLM Calls in the EU

Seer executes gemini-2.5-flash-lite through GCP Vertex AI. The EU deployment maintains provisioned throughput in europe-west1, providing reserved capacity during demand spikes. Several other EU regions use Standard pay-as-you-go capacity without guaranteed availability.

The LLM client implements a region fallback mechanism with temporary blocklisting: regions accumulating 6 failures within a short window are temporarily removed from rotation. This optimization reduces latency during Autofix sessions, which trigger 50-100 LLM calls.
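
A minimal sketch of this kind of count-based temporary blocklist, for illustration only. This is not Seer's code: the class name, the window length, and the block duration are assumptions, and only the threshold of 6 comes from the description above.

```typescript
class RegionBlocklist {
  private readonly threshold = 6;      // failures that trigger a block (from the post)
  private readonly windowMs = 60_000;  // assumed length of the "short window"
  private readonly blockMs = 300_000;  // assumed duration of a temporary block
  private readonly failures = new Map<string, number[]>();   // region -> failure timestamps (ms)
  private readonly blockedUntil = new Map<string, number>(); // region -> block expiry (ms)

  recordFailure(region: string, now: number): void {
    // Keep only failures that are still inside the window, then add the new one.
    const recent = (this.failures.get(region) ?? []).filter((t) => now - t < this.windowMs);
    recent.push(now);
    this.failures.set(region, recent);
    if (recent.length >= this.threshold) {
      this.blockedUntil.set(region, now + this.blockMs);
    }
  }

  isBlocked(region: string, now: number): boolean {
    return (this.blockedUntil.get(region) ?? 0) > now;
  }

  // Regions still in rotation. An empty result is the "no regions left" case.
  available(regions: string[], now: number): string[] {
    return regions.filter((region) => !this.isBlocked(region, now));
  }
}
```

Note that this version counts failures only. Successful calls never enter the decision, which matters for what happened next.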

A critical invariant should exist: never blocklist provisioned throughput regions. That capacity represents paid-for, guaranteed resources. Sentry enforced this rule in the US deployment but omitted it from the EU configuration.

The Cascade

When europe-west1 returned 504 Deadline Exceeded errors during the GCP incident, six failures triggered blocklisting. All traffic shifted to Standard PayGo regions unprepared for full load. europe-west4 returned 429 RESOURCE_EXHAUSTED and was blocklisted. Then europe-central2. Within minutes, every EU region was blocklisted, and calls returned LlmNoRegionsToRunError—no allowed regions remained.

Critically, most calls to europe-west1 succeeded because provisioned throughput absorbed the load. The blocklist triggered on raw failure count regardless of success rate, enabling a region handling the vast majority of traffic to be banned for having six clustered failures.

The Code Problem

The original blocklist logic:

def should_blocklist(region: str, model: str, error_count: int) -> bool:
    return error_count >= BLOCKLIST_THRESHOLD

The required fix:

def should_blocklist(region: str, model: str, error_count: int) -> bool:
    if is_provisioned_throughput_region(region, model):
        return False  # Never blocklist PT regions

    return error_count >= BLOCKLIST_THRESHOLD

The US deployment hardcoded an exception for its PT region. When EU provisioned throughput was added after a previous incident, the blocklist code wasn't updated. Configuration relied on developers remembering to maintain a separate, manually-updated allowlist—a classic gap between infrastructure provisioning and application awareness.

A secondary issue: the blocklist threshold of 6 errors was hardcoded based on months-old load patterns. Sentry is replacing it with an error-rate-based approach.
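
The post does not describe the replacement in detail, so the following is only one possible shape for an error-rate-based check that also keeps the provisioned throughput exemption. The function name, the stats shape, and both thresholds are assumptions for illustration.

```typescript
interface RegionStats {
  successes: number;
  failures: number;
}

// Both thresholds are assumed values for illustration.
const MIN_SAMPLES = 20;     // do not judge a region on too few calls
const MAX_ERROR_RATE = 0.5; // block only when at least half of recent calls fail

function shouldBlocklist(stats: RegionStats, isProvisionedThroughput: boolean): boolean {
  if (isProvisionedThroughput) return false; // never blocklist reserved capacity
  const total = stats.successes + stats.failures;
  if (total < MIN_SAMPLES) return false;
  return stats.failures / total >= MAX_ERROR_RATE;
}
```

Under these assumed thresholds, a region with 94 successes and 6 clustered failures stays in rotation, whereas a raw count of 6 would have removed it.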

Seer Debugging Seer

Sentry's AI debugging tool proved essential for understanding the blast radius of its own outage. Standard monitoring detected the alert, but Seer's analysis of the LlmNoRegionsToRunError issue determined the impact in seconds.

Seer identified that failed issue summaries caused ~42,000 errors, with spam detection (~1,600) and autofix (~850) also affected. It confirmed >99% of events occurred in the EU deployment and traced the blocklisting cascade through breadcrumb trails.

The analysis reached the region blocklisting mechanism autonomously. Engineers, applying knowledge of provisioned throughput architecture, recognized that the PT region shouldn't have been blocklisted. Seer confirmed calls to the PT region mostly succeeded during the GCP incident—the precise combination of facts needed to identify the fix.

The Lesson

Latency optimizations can create failure modes worse than having no optimization at all. Circuit breakers opening too aggressively, blocklists ignoring reserved capacity, or fallback chains amplifying failures can transform upstream provider incidents into complete service outages.

The bug exploited a mundane gap: the distance between "we provisioned GCP capacity" and "our code knows we provisioned GCP capacity." Organizations routing LLM requests across multiple regions should audit circuit breakers to ensure reserved regions receive special protection. This fix required six lines of code.

For Seer's analytical capabilities, consult the Seer documentation.

You're probably overdue for a Sentry SDK upgrade

Thu, 19 Mar 2026 00:00:00 GMT

Session Replay. Structured logs. AI monitoring. Automatic OpenTelemetry tracing. Feature flag tracking. If you haven't seen these in your Sentry dashboard, your SDK version is probably the reason.

Whether you're on @sentry/react, @sentry/nextjs, @sentry/vue, @sentry/angular, @sentry/sveltekit, or any other @sentry/* package, they all version together. When we say v10, we mean all of them.

Here's the thing: based on npm download numbers, roughly half of all Sentry JavaScript SDK installs are still on v8 or older.

The numbers

We pulled npm download stats for the major @sentry/* packages. Here's where weekly installs land as of March 2026:

Package	Weekly total	Still on v7	v7 + v8 combined
`@sentry/node`	14.9M	4.8M (32%)	7.3M (49%)
`@sentry/browser`	14.5M	3.2M (22%)	7.3M (50%)
`@sentry/react`	9.9M	2.0M (20%)	4.9M (49%)
`@sentry/nextjs`	3.7M	524K (14%)	1.5M (41%)
`@sentry/vue`	1.1M	307K (28%)	642K (59%)

The pattern holds across every package: roughly half of all installs are two or more major versions behind current.

If that's you, this post is a map of what you're missing and how to close the gap.

The SDK isn't just an error catcher anymore

The Sentry SDK started as a crash reporter. Today it's a full observability client: errors, performance traces, session replays, structured logs, cron monitors, user feedback, and AI agent monitoring. Each capability feeds context into the others. A replay shows you what the user did before the error. A trace shows you which microservice was slow. Logs give you the application-level "why."

If you're on an old version, you have the error. On the current version, you have the story.

What you're missing

Session Replay (v8+)

Session Replay captures what happened in the browser before, during, and after an error. It reconstructs the DOM, user clicks, navigation, and console output into a video-like playback. It's privacy-aware by default: text and inputs are masked, and you control what gets captured.

The key part: replays link directly to errors and traces. When you're looking at a bug report, you can watch the user reproduce it. No more having to ask users what happened. No more waiting for them to get back to you. No more guessing what "it doesn't work" means.

Available in @sentry/browser, @sentry/react, @sentry/vue, @sentry/angular, and @sentry/svelte.

Structured Logs (v9+)

Sentry.logger.info(), Sentry.logger.error(), and four more severity levels, with structured attributes that link to traces and errors.

Sentry.logger.error('Payment processing failed', {
  orderId: 'order-123',
  amount: 99.99,
  gateway: 'stripe',
  retryCount: 3,
});

These aren't console.log replacements floating in CloudWatch or Datadog. They're logs that show up in the same Sentry issue, linked to the trace that was active when they fired.

The Logs API also supports template strings that Sentry can group and search:

Sentry.logger.info(Sentry.logger.fmt`User ${userId} completed checkout for order ${orderId}`, {
  amount: 99.99,
  paymentMethod: 'credit_card',
});
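
To see why a tagged template helps with grouping, here is a small stand-alone sketch. It is a hypothetical stand-in, not the SDK's fmt implementation, and the returned shape and the %s placeholder are assumptions: the point is only that a tag function receives the static text and the interpolated values separately, so the template stays constant while the values change.

```typescript
interface ParameterizedMessage {
  template: string;  // stable across calls, usable as a grouping key
  params: unknown[]; // the interpolated values
  message: string;   // the fully rendered string
}

// Hypothetical stand-in for a tagged template such as Sentry.logger.fmt.
function fmt(strings: TemplateStringsArray, ...values: unknown[]): ParameterizedMessage {
  let template = "";
  let message = "";
  strings.forEach((chunk, i) => {
    template += chunk;
    message += chunk;
    if (i < values.length) {
      template += "%s"; // assumed placeholder
      message += String(values[i]);
    }
  });
  return { template, params: values, message };
}

const first = fmt`User ${"u_1"} completed checkout for order ${"o_9"}`;
const second = fmt`User ${"u_2"} completed checkout for order ${"o_7"}`;
// first.template and second.template are identical; only params and message differ.
```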

AI Monitoring (v9+/v10+)

If you're calling LLMs from your backend, the SDK can instrument those calls automatically. OpenAI and LangChain support landed in v9. Anthropic and Vercel AI SDK support followed in v10. With AI monitoring, you get token usage, latency, and error tracking for every call.

Sentry.init({
  dsn: '__DSN__',
  integrations: [
    Sentry.openAIIntegration(),       // OpenAI
    Sentry.anthropicAIIntegration(),  // Anthropic/Claude
    Sentry.vercelAIIntegration(),     // Vercel AI SDK
    Sentry.langChainIntegration(),    // LangChain
  ],
});

OpenTelemetry Tracing (v8+)

v8 rebuilt performance monitoring on OpenTelemetry. The old mental model treated "transactions" and "spans" as separate concepts with manual lifecycle management. The new model: everything is a span, and the lifecycle is automatic.

// v7: manual transaction management
const transaction = Sentry.startTransaction({ name: 'checkout' });
const span = transaction.startChild({ op: 'db.query' });
// ... do work ...
span.finish();
transaction.finish();

// v8+: just wrap your code
const result = Sentry.startSpan({ name: 'checkout', op: 'db.query' }, () => {
  return db.query('SELECT ...');
});

That's less code, but it's also a different relationship with instrumentation. You don't manage span lifecycles. You describe what you're measuring, and the SDK handles the rest. Nested spans work automatically:

Sentry.startSpan({ name: 'checkout' }, () => {
  Sentry.startSpan({ name: 'validate-cart', op: 'function' }, () => {
    // automatically a child of 'checkout'
    validateCart();
  });
  Sentry.startSpan({ name: 'charge-card', op: 'db.query' }, () => {
    // also a child of 'checkout'
    chargeCard();
  });
});

On Node.js, Express, Fastify, Hapi, Postgres, MongoDB, Redis, Prisma, GraphQL, MySQL, and Mongoose are all auto-instrumented with zero manual setup. Just Sentry.init().

What changed under the hood

The features above are the headline reasons to upgrade. Here's the compressed version of what changed structurally at each major version.

v8: Package consolidation

@sentry/tracing
@sentry/hub
@sentry/integrations
@sentry/replay

These were all merged into the core SDKs. Integrations became functions (new BrowserTracing() became browserTracingIntegration()) for better tree-shaking. User Feedback widget, Cron Monitoring, and FID collection shipped. Angular v14+ became required. The new scope model (getCurrentScope(), getIsolationScope(), getGlobalScope()) was introduced, deprecating Hub, getCurrentHub(), and configureScope() with console warnings. This is the single biggest source of breaking-change noise when upgrading from v7.

v9: Deprecated API removal

Hub
getCurrentHub()
configureScope()

The deprecated scope APIs were deleted. @sentry/utils merged into @sentry/core, @sentry/types deprecated. ES2020 became the baseline. Feature flag tracking arrived with built-in LaunchDarkly and OpenFeature support. Node.js 18 became the minimum.

v10: OpenTelemetry v2

The underlying OpenTelemetry dependencies upgraded to v2.x. FID collection removed in favor of INP (Interaction to Next Paint), the metric Google actually uses for Core Web Vitals. @sentry/node-core shipped a lightweight mode for teams that want error tracking, logs, and metrics without full OpenTelemetry instrumentation. Next.js Turbopack support landed.

Quick reference: features by version

Feature	Minimum version	Packages	Docs
Cron Monitoring	v7+	node, all server SDKs	Set Up Crons
Session Replay	v8+	browser, react, vue, angular, svelte	Set Up Session Replay
User Feedback Widget	v8+	browser, all frontend SDKs	Set Up User Feedback
Structured Logs	v9+	all	Set Up Logs
Feature Flag Tracking	v9+	all	Set Up Feature Flags
AI Monitoring (OpenAI, LangChain)	v9+	node	Set Up AI Agent Monitoring
AI Monitoring (Anthropic, Vercel AI)	v10+	node	Set Up AI Agent Monitoring
INP (as sole Core Web Vital)	v10+	browser, all frontend SDKs	—

Deprecated packages you might still have

If you see any of these in your lockfile, you're at least two major versions behind:

@sentry/tracing (merged into core SDKs in v8)
@sentry/hub (merged into @sentry/core in v8)
@sentry/integrations (merged into core SDKs in v8)
@sentry/replay (merged into @sentry/browser in v8)
@sentry/types (deprecated in v9, use @sentry/core)
@sentry/utils (deprecated in v9, use @sentry/core)

These packages still resolve on npm, so they won't break your install. But they're unmaintained and they add dead weight to your node_modules. If you see them, your Sentry setup needs attention.
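If you'd rather script that check than eyeball it, here is a minimal sketch. It assumes you pass in a parsed package.json object, and the helper name findDeprecatedSentryPackages is made up for illustration. It only looks at direct dependencies, so transitive entries in your lockfile would still need a separate look.

```javascript
// Deprecated packages and notes, copied from the list above.
const DEPRECATED_SENTRY_PACKAGES = new Map([
  ['@sentry/tracing', 'merged into core SDKs in v8'],
  ['@sentry/hub', 'merged into @sentry/core in v8'],
  ['@sentry/integrations', 'merged into core SDKs in v8'],
  ['@sentry/replay', 'merged into @sentry/browser in v8'],
  ['@sentry/types', 'deprecated in v9, use @sentry/core'],
  ['@sentry/utils', 'deprecated in v9, use @sentry/core'],
]);

// Hypothetical helper: takes a parsed package.json object and returns the
// deprecated Sentry packages it declares directly.
function findDeprecatedSentryPackages(pkg) {
  const declared = { ...pkg.dependencies, ...pkg.devDependencies };
  return Object.keys(declared)
    .filter((name) => DEPRECATED_SENTRY_PACKAGES.has(name))
    .map((name) => ({ name, note: DEPRECATED_SENTRY_PACKAGES.get(name) }));
}

// Usage with an inline object standing in for a real package.json.
const report = findDeprecatedSentryPackages({
  dependencies: { '@sentry/node': '^7.0.0', '@sentry/tracing': '^7.0.0' },
  devDependencies: { '@sentry/types': '^7.0.0' },
});
console.log(report.map((entry) => `${entry.name}: ${entry.note}`).join('\n'));
```

An empty result only means nothing deprecated is declared directly; it says nothing about what your other dependencies pull in.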

Security and performance

Even if you don't care about new features, staying current keeps you patched and lean.

Security:

CVE-2023-46729: SSRF risk via insufficient validation of the Next.js tunnel route. Patched in v7.77.0, but only users on a current v7.x got the fix. If you were pinned to an older v7 release, you stayed vulnerable.
IP address inference removed by default. Starting in v9 (fully enforced in v10.4.0), the SDK no longer instructs the Sentry backend to infer user IP addresses unless you set sendDefaultPii: true. Privacy-by-default.
fetchProxyScriptNonce removed in v9. The SvelteKit option was dropped due to security concerns around CSP bypass.
Transitive dependency CVE patches (fast-xml-parser, rollup, tar, nuxt) only land on the current major version line. If you're pinned to v7, you're not getting these fixes.

Performance:

@sentry/browser base bundle: 26 KB gzipped (v10). Tree-shaking flags can bring it down to ~24.5 KB.
ES5 polyfills dropped in v8, ES2020 baseline in v9. Smaller transpiled output for the vast majority of environments that support these natively.
6 legacy packages removed in v8, 2 more deprecated in v9. Simpler dependency graph, less duplication in your node_modules.
Replay bundle reduced by ~20 KB via tree-shaking improvements (v7.73.0+).

Let your AI assistant handle it

You don't have to upgrade or configure Sentry by hand. Sentry publishes agent skills: instruction sets that teach AI coding assistants how to work with Sentry in your project. They work with Claude Code, Cursor, GitHub Copilot, OpenAI Codex, and more.

The newest skill, sentry-sdk-upgrade, can handle the entire migration for you. It runs a 4-phase workflow: Detect (reads your package.json, finds Sentry configs, greps for deprecated patterns) → Recommend (categorizes changes as auto-fixable, AI-assisted, or manual-review) → Guide (applies changes file by file with explanations) → Cross-Link (verifies the build passes and suggests new features to enable). It covers v7→v8, v8→v9, and v9→v10.

Install the skills with one command:

npx skills add getsentry/sentry-for-ai --skill sentry-sdk-upgrade

Then ask your assistant to do the work:

What to say	What happens
"Upgrade my Sentry SDK to v10"	Detects your version, scans for deprecated APIs, migrates code file by file
"Add Sentry to my React app"	Sets up `@sentry/react` with error boundaries and routing
"Enable Sentry logging"	Configures Structured Logs in your `Sentry.init()`
"Monitor my OpenAI calls"	Adds `openAIIntegration()` with token tracking
"Add performance monitoring"	Configures tracing with the right integrations for your framework
"Fix the recent Sentry errors"	Pulls issues from Sentry and applies fixes

The upgrade skill is especially useful for the v7→v8 jump, where the number of API changes is large but most of them are mechanical renames. Your assistant can also wire up new features like Session Replay, Logs, or AI monitoring after the upgrade without you looking up the config. Skills are versioned and can be committed to your repo with dotagents so every team member gets the same setup.

"But upgrading is painful"

Let's be honest about the effort, then talk about the tooling.

v7 to v8 is the biggest jump. The performance monitoring API was rewritten on OpenTelemetry. Tracing concepts changed (transactions became spans), import order matters for Node.js auto-instrumentation, and 6 packages were consolidated. Expect a few hours for a typical app, more for complex setups with custom instrumentation.

v8 to v9 is moderate. Deprecated APIs were removed: Hub, getCurrentHub(), configureScope(), and others. If you fixed the deprecation warnings that v8 printed, this is straightforward. The hard gate is Node.js 18 minimum.

v9 to v10 is genuinely easy. 8 breaking changes, mostly internal. The CHANGELOG itself says "minimal breaking changes." OpenTelemetry v2 under the hood, FID removed (INP replaces it), and a handful of internal API cleanups.

Each jump gives you warning first. APIs are deprecated with console warnings for a full major version before they're removed. You don't get surprised.

The sentry-sdk-upgrade agent skill can automate much of this. It detects deprecated patterns with grep, applies mechanical renames automatically, and walks through complex changes with explanations. For the v7→v8 jump especially — where you'd otherwise be manually renaming dozens of integration constructors and scope APIs — the skill handles the tedious parts so you can focus on the genuinely tricky changes like custom instrumentation rewrites.
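To give a feel for what "detects deprecated patterns" means, here is a rough sketch of that kind of scan. This is not the skill's actual implementation: the function name scanForDeprecatedApis and the regular expressions are assumptions, and they only cover the handful of APIs named in this post.

```javascript
// Patterns for APIs this post names as deprecated in v8 and removed in v9,
// plus the class-based integration constructor mentioned earlier.
const DEPRECATED_PATTERNS = [
  { api: 'getCurrentHub()', regex: /\bgetCurrentHub\s*\(/ },
  { api: 'configureScope()', regex: /\bconfigureScope\s*\(/ },
  { api: 'new BrowserTracing()', regex: /\bnew\s+(?:\w+\.)?BrowserTracing\s*\(/ },
];

// Hypothetical helper: returns one hit per deprecated API found on each line.
function scanForDeprecatedApis(source) {
  const hits = [];
  source.split('\n').forEach((text, index) => {
    for (const { api, regex } of DEPRECATED_PATTERNS) {
      if (regex.test(text)) hits.push({ line: index + 1, api });
    }
  });
  return hits;
}

// Usage on a small v7-style snippet.
const legacySource = [
  "import * as Sentry from '@sentry/browser';",
  "Sentry.configureScope((scope) => scope.setTag('team', 'checkout'));",
  'const hub = Sentry.getCurrentHub();',
  'Sentry.init({ integrations: [new Sentry.BrowserTracing()] });',
].join('\n');
console.log(scanForDeprecatedApis(legacySource));
```

A line-based regex scan like this misses aliased imports and multi-line calls, which is part of why the trickier changes still need a human or an assistant reading the code.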

Migration guides with before/after code for every breaking change are available on docs.sentry.io for each jump: v7 to v8, v8 to v9, and v9 to v10.

The practical upgrade path

Check where you stand.

# Check your version now
npm ls @sentry/react @sentry/nextjs @sentry/vue @sentry/angular @sentry/sveltekit @sentry/node 2>/dev/null | grep @sentry

Point your AI assistant at it, or read the migration guide. If you have Sentry agent skills installed, tell your assistant "upgrade my Sentry SDK to v10" — it'll detect your version, scan for deprecated APIs, and walk through the migration file by file. Otherwise, read the migration guide for your jump on docs.sentry.io (v7 to v8, v8 to v9, v9 to v10).
If you're more than 2 major versions behind, upgrade one version at a time. Going v7 to v10 in one PR is a recipe for confusing errors. Go v7 to v8, verify, then v8 to v9, and so on.
Start with the core SDK, then enable new features incrementally. Get the base upgrade working first. Then add Session Replay. Then Logs. Each one is an independent init() option or integration, so you don't have to adopt everything at once. Or install agent skills and let your AI coding assistant configure them for you.
Use debug: true during migration.
```
Sentry.init({
  dsn: '__DSN__',
  debug: true, // Logs SDK decisions to the console
});
```
This surfaces configuration issues, dropped events, and integration problems immediately.
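The "one major version at a time" advice above can be written down as a tiny planner. This is only a sketch: parseMajor and planUpgradePath are hypothetical helpers, and the version string is assumed to look like what npm ls prints (for example 7.120.3).

```javascript
// Hypothetical helper: extracts the major version from strings like
// "7.120.3" or "^8.0.0".
function parseMajor(version) {
  const match = /(\d+)\./.exec(version);
  if (!match) throw new Error(`Cannot parse version: ${version}`);
  return Number(match[1]);
}

// Lists each single-major jump between the installed version and the target.
function planUpgradePath(installedVersion, targetMajor) {
  const steps = [];
  for (let major = parseMajor(installedVersion); major < targetMajor; major += 1) {
    steps.push(`v${major} to v${major + 1}`);
  }
  return steps;
}

console.log(planUpgradePath('7.120.3', 10));
// → ['v7 to v8', 'v8 to v9', 'v9 to v10']
```

Each entry maps to one migration guide and, ideally, one PR that you verify before starting the next.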

Close the gap

The SDK team ships weekly. Every release you skip adds to the distance between where you are and what's available. With 4.8 million weekly @sentry/node installs still on v7, we know this isn't a small problem. That's why we've invested heavily in migration guides and agent skills — including the sentry-sdk-upgrade skill that can handle the migration for you — to make the path forward clear.

Pick one version jump. Read the migration guide. Close the gap.

# Check your version now
npm ls @sentry/react @sentry/nextjs @sentry/vue @sentry/angular @sentry/sveltekit @sentry/node 2>/dev/null | grep @sentry

Migration guides: docs.sentry.io/platforms/javascript/migration
Changelog: github.com/getsentry/sentry-javascript/blob/develop/CHANGELOG.md
Agent skills: docs.sentry.io/ai/agent-skills

Fair Source Software in the AI age

Tue, 17 Mar 2026 00:00:00 GMT

Have you noticed AI recently? Yeah, us too. Generative AI is wreaking havoc on the software status quo, and that includes licensing, and that generates … opinions.

Sentry has a long history of having opinions about software licensing. We started life as an unlicensed side project in 2008, then went through BSD, to BSL, to writing our own license, FSL. Most recently, in 2024, we launched Fair Source to carve out an industry niche for the best of source-available licensing (including FSL): simple non-compete, eventually Open Source. Fair Source adoption is growing.

So what's going on with AI? How does it impact software licensing? Specifically, does Fair Source still work as intended? Is it still a safe option for your company? Spoiler alert: yes. Let's dive in.

The new AI moment

The software industry has taken a huge leap forward. As Andrej Karpathy put it, this shift happened "not gradually and over time in the 'progress as usual' way, but specifically this last December." The last round of AI models in 2025 (Opus 4.5 on November 24, Codex 5.2 on December 11) were the first ones good enough to depend on as standalone agents in a harness such as Claude Code or OpenCode, rather than as a glorified autocomplete within traditional IDEs like VS Code or Cursor.

On top of that, OpenClaw, the open-source AI personal assistant, exploded in popularity, demonstrating both the viability of vibe-coding ("I ship code I don't read"), and the overwhelming demand for agents to do more than just write code.

This Cambrian explosion raises questions across the software industry and more broadly in society. As far as licensing goes, what is the status quo that AI is upending?

Standard model

Since the 1970s, the international community has considered software to be "literary works for copyright purposes" (WIPO FAQ). This forms the basis for what we might call the standard model of software licensing: a human writes software, and the law automatically recognizes their copyright. The author is then free to give permission to others to use the software, modify it, distribute it, and so forth. The legal instrument for this is a license agreement.

A small subset of license agreements meet the criteria of the Open Source Definition (OSD), a document maintained by the Open Source Initiative (OSI) since 1998. (OSI does not have a legal trademark on the term "Open Source," but they do have a clear socio-historical claim on it.) Software under these licenses is Open Source software (OSS).

Another set of licenses meet the criteria of the Fair Source Definition (FSD), a document we wrote in 2023 to launch Fair Source, a movement complementary to Open Source that encourages companies to safely share their core software products. Software under these licenses is Fair Source software (FSS).

Open Source	Fair Source
read, run, modify, distribute	read, run, modify, distribute
	simple non-compete
	eventually Open Source

For completeness, Microsoft's Software License is an example of a license that fits neither OSD nor FSD, so the software they release under it is neither OSS nor FSS.

In practice, most companies are careful to choose the right license for their goals, and to respect the licenses of others. For example, at Sentry, we have an extensive internal policy on software licensing. We use FOSSA to help us manage our compliance with licenses of software we consume. Of course, we also go above and beyond the license terms of the OSS we consume, proactively funding its maintainers as a member of the Open Source Pledge (which we also started btw).

How AI disrupts licensing

LLMs disrupt software licensing in at least three ways:

LLMs are trained on public source code without much effort to respect license terms. Is it "fair use"?
LLMs make it very easy to rewrite libraries, potentially obviating copyleft licenses.
The output of LLMs will seemingly not be subject to copyright protection.

The second and third are even more of an issue since the December Leap. Let's look at each in turn, and then consider the implications for Fair Source software.

No putting the genie back in the bottle

LLMs are generating more and more of our code, but how were they trained? On publicly available sources, and we can say with near certainty that LLMs are not complying with license requirements, whether that's the strong restrictions of copyleft licenses like GPL, or even the minimal attribution restrictions of permissive licenses like MIT and BSD. When was the last time your coding agent provided an attribution notice with its suggestions? But even with this imputed use by LLMs, what is to be done about it now?

We've seen copyright holders in other industries like books, music and photography sue the LLM providers for copyright infringement. Although we have not seen court decisions come out of these suits yet, we have seen some of them result in monetary settlements. However, there is an important distinction between those media and open source software. Books, music and photographs carry no express license terms granting broad rights to use the work, whereas open source licenses grant exactly such rights, in a manner that by their very definition is "technology-neutral". That makes claims of copyright infringement and breach of contract much harder to sustain.

While we have seen an attempt by developers to enforce their copyrights against the LLM providers, that is facing challenges due to the difficulty in providing specific examples of copied code. This supports the model companies' "fair use" position. What if the infringement is not even done by making a copy of the code, but creating a derivative work based on existing software that has been completely rewritten by an LLM?

No stopping permissive rewrites

There has been a longstanding tension in the industry around the Next.js web framework. It's one of the most popular, and technically it is Open Source under the MIT license, but it can really only be used as a first-class citizen on one hosting platform. The OpenNext project exists to support Next.js apps on other platforms, but it has challenges. Because of this, Steve Faulkner from Cloudflare announced a new project called vinext that reimplements the Next.js API surface in the Vite framework, offering much better compatibility than OpenNext. What's notable is that Steve did it in a week using agentic coding.

In the wake of this, Steve Ruiz joked about taking tldraw's test suite private, since the test suite was a major baseline for vinext. People like Malte Ubl and Gergely Orosz took him seriously, showing just how much uncertainty there is, now that AI agents have brought the cost of coding down so much. Next.js is MIT, so it's fair game for Cloudflare to do a rewrite like this, so long as they provide attribution.

Much more controversial was a rewrite of a venerable Python library, chardet. The long-time maintainer made a good-faith effort to do a "clean-room" reimplementation. The controversy is that he then licensed it under MIT instead of the original author's choice, LGPL. The maintainer argued that it does not trigger the terms of the LGPL because he did not "modify a copy of the Library" (as the LGPL says), but rather did a ground-up rewrite. Chardet seems to be present in the training data of the LLM in question, but the maintainer presents a metrics-based case that the new code is not derived from the old codebase.

What, though, is the legal status of the LLM-generated output?

No copyrights on electric sheep

For a decade, Stephen Thaler has tried to win a copyright assignment for his "Creativity Machine" on images it hallucinated during a simulated near-death experience (yeah that's a rabbit hole). Last week the U.S. Supreme Court declined to hear Thaler's case, letting stand a lower court ruling that a significant human element is necessary to receive copyright protection (the EU has a similar requirement). The U.S. Copyright Office is backing this up with what they will grant registration for, in line with the ground rules for their ongoing AI initiative (p. 2):

In the Office's view, it is well-established that copyright can protect only material that is the product of human creativity. Most fundamentally, the term "author," which is used in both the Constitution and the Copyright Act, excludes non-humans.

No surprise, then, that their report last January on copyrightability states (p. iii):

Copyright does not extend to purely AI-generated material, or material where there is insufficient human control over the expressive elements.
Whether human contributions to AI-generated outputs are sufficient to constitute authorship must be analyzed on a case-by-case basis.

The contest now shifts to the definition of "sufficient human control." However, the January report already draws a significant line: "prompts alone do not provide sufficient human control to make users of an AI system the authors of the output. Prompts essentially function as instructions that convey unprotectible ideas." (p. 18).

If prompts don't count, does human code review? Would review need to result in a significant human-authored change, or is reviewing the code enough? How is this demonstrated? If human maintainers look at some code but not other code, is the code they looked at under copyright, and the code they didn't, isn't? How much longer until there is no human code review at all? There seems to be precious little keeping AI-generated code within the bounds of copyright.

No worries with Fair Source

Fair Source was designed to allow companies to share code for their core software products without compromising their business model. It still does, even if the company uses AI to generate code. The key is that Fair Source offers another enforcement mechanism besides copyright infringement for rightholders. Software licenses are considered contracts between parties, and "breach of contract" is a separate violation of law that still applies, even if "copyright infringement" does not.

You can't use clones of Fair Source software to compete with the software you're cloning.

Since Sentry is leading the Fair Source movement, we want to make our position clear: you can't use clones of Fair Source software to compete with the software you're cloning. LLMs just make the process faster, they don't fundamentally alter the equation. Just because technology makes it easier to copy or make a derivative work, that doesn't make it permitted — and because FSL is a contract with its own terms that you agree to when you access the source code, the copyright status of the code doesn't really matter.

We are definitely in a shifting landscape regarding IP rights and artificially generated code. Cloud computing was a technology shift that highlighted some of the inherent limitations of Open Source licensing. AI is further turning OSS on its head, amplifying the distinction between OSS and FSS. It is more important than ever to make the right decision about how to license your project. Sentry is full steam ahead with Fair Source.

Choosing a JavaScript logging library: The 2026 definitive guide

Mon, 16 Mar 2026 00:00:00 GMT

With AI writing more and more of our code, properly monitoring and debugging that code has become an increasingly critical part of the development workflow that can't be ignored. Luckily, we have more time than ever to implement the right tools to do so.

Implementing a production-ready logging solution is easy to do, and provides you and your LLM Agents with a wealth of debugging information from your app, across users and environments.

Why you need a logging library

If you're still using console.log for debugging, you might be wondering why you should bother with a logging library.

High performance - Many logging libraries write asynchronously, which can outperform native console logging.
Structured Outputs - Output structured objects rather than strings, and simplify managing additional context and child loggers.
Transports and Sinks - Send logs to one or more destinations, including the console, files, streams, and observability platforms.
Filtering - Filter logs by severity, category, or other criteria to reduce noise. Redact sensitive data before it leaves your application.
Integrations - Integrate with web frameworks, ORMs, and other libraries to automatically log context and errors with a consistent API across all layers of your application.
Trace-connected logging - With Sentry, logs are automatically trace-connected to errors and other events, making it easier to debug and correlate issues.
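To make a few of these ideas concrete (structured records, severity filtering, child loggers, multiple sinks), here is a deliberately tiny sketch. createLogger is a hypothetical factory written for this post, not the API of any library below, and the numeric level values are arbitrary.

```javascript
// Severity ranks; higher means more severe.
const LEVELS = { debug: 20, info: 30, warn: 40, error: 50 };

// Hypothetical factory, for illustration only.
function createLogger({ level = 'info', context = {}, sinks = [] } = {}) {
  const emit = (name, message, fields = {}) => {
    if (LEVELS[name] < LEVELS[level]) return; // filtering by severity
    // Structured output: one object per log, not a formatted string.
    const record = { level: name, time: Date.now(), ...context, ...fields, message };
    for (const sink of sinks) sink(record); // fan out to every destination
  };
  return {
    debug: (message, fields) => emit('debug', message, fields),
    info: (message, fields) => emit('info', message, fields),
    warn: (message, fields) => emit('warn', message, fields),
    error: (message, fields) => emit('error', message, fields),
    // Child loggers carry extra context into every record they emit.
    child: (extra) => createLogger({ level, sinks, context: { ...context, ...extra } }),
  };
}

// Usage: one sink prints JSON lines, another keeps records in memory.
const captured = [];
const logger = createLogger({
  sinks: [(record) => console.log(JSON.stringify(record)), (record) => captured.push(record)],
});
logger.debug('Cache miss'); // dropped: below the "info" threshold
logger.child({ userId: 'u-123' }).info('User action', { action: 'login' });
```

Real libraries add the parts that are hard to get right, such as asynchronous writes, serializers, and error handling in sinks, which is the argument for using one instead of growing this sketch.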

Picking a logging library

Here's how the big four stack up at a glance.

Library	Version	Runtime	Released	Transports / Sinks	Minified + gzip	Dependencies	Tree-shakable
Pino	10.2.0	Node	2016	✓	3.3 KB	11	❌
Winston	3.17.0	Node	2010	✓	38.3 KB	17	❌
Bunyan	1.8.15	Node	2012	✓	5.6 KB	0	❌
LogTape	2.0.2	Universal	2023	✓	8.3 KB	0	✅

Source: Bundlephobia API.

Quick selection guide

Pick Pino when you're Node-only and care most about speed and a small bundle.
Pick Winston when you want the most transports and configuration options and bundle size isn't a concern.
Pick Bunyan only if you're maintaining an existing codebase that already uses it (not recommended for new projects).
Pick LogTape when you need one logger for Node + browser/edge, or when writing a library that must work everywhere without forcing a choice on the app.

All of the libraries above support custom transports or sinks, so you can pipe logs to whatever backend you use. If you use Sentry for errors and performance, Sentry’s logging capabilities and integrations for Pino, Winston, Bunyan, and LogTape let you send logs into the same place as your issues and traces, so you can search and correlate without juggling multiple tools.

Pino

Best for: Node backends where speed and small bundle size matter.

GitHub: pinojs/pino · Docs: getpino.io · npm: pino

// Setup
const pino = require('pino');
const logger = pino({ name: 'user-service' });

// Usage
logger.info('Request received');

const child = logger.child({ userId: 'u-123', action: 'login' });
child.info('User action');

Pino was created in 2016 by Matteo Collina, creator of Fastify and member of the Node.js Technical Steering Committee. It’s one of the most popular and fastest JSON loggers for Node.js; it can run in the browser via a polyfill, but you lose most of the speed benefits there.

Key features:

Reported to be ~2.5x faster than Winston
Smallest bundle here (3.3 KB gzipped)
Node.js only; browser via polyfill
Pluggable transports and a wide ecosystem

Pino's popularity grew quickly as it provided a huge leap in performance at a smaller size than the competition at the time, and it provides sensible defaults out of the box. Every log will automatically include a timestamp, pid, and level, along with any structured data you provide.

Winston

Best for: Node apps that need a rich ecosystem of transports and familiar, flexible configuration.

GitHub: winstonjs/winston · Docs: GitHub README · npm: winston

// Setup
const winston = require('winston');
const logger = winston.createLogger({
  level: 'info',
  format: winston.format.json(),
  defaultMeta: { service: 'user-service' },
  transports: [new winston.transports.Console()],
});

// Usage
logger.info('Request received');
// Winston takes the message first and the metadata object second
logger.info('User action', { userId: 'u-123', action: 'login' });

Winston was released in 2010 by Charlie Robbins, a former Node.js Foundation board member (now OpenJS Foundation). It's the most popular and one of the oldest logging libraries for Node.js, with a large ecosystem and many built-in transports. The trade-off is that it has the largest bundle size (38.3 KB) and the most dependencies (17) in this comparison.

Key features:

Many built-in transports (console, file, HTTP, and many community options) with flexible formatting
Mature, well-documented, and widely used in production
Not tree-shakeable; you pay for the full feature set in bundle size
Node.js only
No data redaction

To call Winston "legacy" would be a disservice, but it does follow an older design pattern that leads to a larger bundle size and more dependencies. Without question, Winston is your choice for mature, well-established Node.js applications that need a wide range of transports and flexible configuration right out of the box.

While all of the libraries mentioned in this list offer custom filtering capabilities, Winston does not explicitly support data redaction. Most loggers offer some form of redaction function that uses regex to replace private or sensitive data before it leaves your application.
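If your logger has no redaction built in, a regex-based pass over the fields before they reach the logger gets you most of the way. The sketch below is an assumption-heavy illustration: the redact helper and both patterns are hypothetical, and real rules should be tuned to the data your app actually handles.

```javascript
// Hypothetical patterns: email addresses and 13 to 16 digit card-like numbers.
const REDACTION_RULES = [
  { regex: /[\w.+-]+@[\w-]+(?:\.[\w-]+)+/g, replacement: '[email]' },
  { regex: /\b\d{13,16}\b/g, replacement: '[card]' },
];

// Recursively replaces sensitive substrings in strings, arrays, and objects.
function redact(value) {
  if (typeof value === 'string') {
    return REDACTION_RULES.reduce(
      (text, { regex, replacement }) => text.replace(regex, replacement),
      value,
    );
  }
  if (Array.isArray(value)) return value.map(redact);
  if (value && typeof value === 'object') {
    return Object.fromEntries(Object.entries(value).map(([key, item]) => [key, redact(item)]));
  }
  return value; // numbers, booleans, null, undefined pass through untouched
}

console.log(redact({ email: 'ada@example.com', note: 'card 4111111111111111 declined', amount: 5 }));
// → { email: '[email]', note: 'card [card] declined', amount: 5 }
```

Calling redact on the metadata object right before you log it keeps the rule set in one place, whichever logger you end up with.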

Bunyan

Best for: Node services that want a simple, JSON-first API and minimal dependencies.

GitHub: trentm/node-bunyan · Docs: GitHub README · npm: bunyan

// Setup
const bunyan = require('bunyan');
const logger = bunyan.createLogger({ name: 'user-service' });

// Usage
logger.info('Request received');
logger.info({ userId: 'u-123', action: 'login' }, 'User action');

Bunyan was created by Trent Mick in 2012, making it one of the oldest libraries in this list. It's a simple JSON-first logger with zero dependencies and a small bundle size.

Key features:

Zero dependencies; small bundle (5.6 KB gzipped)
Node.js only
No data redaction

Some libraries are small and simple, and so don't require updating often. That said, it's been 5 years since the last release, and it doesn't appear there has been much activity in the GitHub repository. At this time, I am not recommending Bunyan for new projects, though it remains one of the most popular libraries for Node.js.

Like Winston, Bunyan does not support data redaction.

LogTape

Best for: Modern TypeScript applications and libraries designed to run on Node, Deno, Bun, browsers, and edge.

GitHub: dahlia/logtape · Docs: logtape.org · npm: @logtape/logtape

// Setup
import { configure, getConsoleSink, getLogger } from "@logtape/logtape";

await configure({
  sinks: { console: getConsoleSink() },
  loggers: [{ category: ["user-service"], lowestLevel: "info", sinks: ["console"] }],
});

// Usage

const logger = getLogger(["user-service"]);
logger.info("Request received");
logger.info("User action", { userId: "u-123", action: "login" });

LogTape is the newest library here (2023) and the only one in this list that’s fully tree-shakable and runs natively on every major JavaScript runtime: Node, Deno, Bun, browsers, and edge. Their comparison page goes deep on how it stacks up against Pino, Winston, Bunyan, and others.

LogTape is reported to be ~2x faster than Pino, and over 10x faster than Winston. Its cross-runtime compatibility comes with a slight increase in bundle size: at 8.3 KB it is the third smallest library in the list, behind Pino and Bunyan.

Key features:

Universal runtime: Perfect for full stack and serverless applications
Zero dependencies and tree-shakable
Hierarchical categories
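Hierarchical categories are easiest to picture as prefix matching on arrays: a setting for ["user-service"] also covers ["user-service", "auth"] unless something more specific exists. The sketch below shows that idea only. It is not LogTape's implementation, both helper names are made up, and LogTape's own resolution rules may differ.

```javascript
// Hypothetical sketch of prefix matching on category arrays.
function isUnderCategory(parent, category) {
  return parent.length <= category.length && parent.every((part, i) => part === category[i]);
}

// Picks the most specific (longest) configured prefix for a category.
function resolveConfig(configs, category) {
  return configs
    .filter((config) => isUnderCategory(config.category, category))
    .sort((a, b) => b.category.length - a.category.length)[0];
}

const categoryConfigs = [
  { category: ['user-service'], lowestLevel: 'info' },
  { category: ['user-service', 'auth'], lowestLevel: 'debug' },
];

console.log(resolveConfig(categoryConfigs, ['user-service', 'auth', 'login']).lowestLevel); // debug
console.log(resolveConfig(categoryConfigs, ['user-service', 'billing']).lowestLevel); // info
```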

LogTape is our "Editor's Favorite". New to the scene, LogTape is the only option in our list that runs natively on Bun, Deno, browsers, and edge platforms like Cloudflare Workers and Vercel Edge Functions.

Special mention

Sentry Logger

Best for: Sentry users who want to capture logs, including existing console logs, on any runtime.

GitHub: getsentry/sentry · Docs: docs.sentry.io

// Setup
import * as Sentry from "@sentry/node"; // or the Sentry SDK package for your platform

Sentry.init({
  dsn: process.env.SENTRY_DSN,
  enableLogs: true,
});

// Usage

Sentry.logger.info("User signed up", {
  userId: user.id,
  plan: "pro",
  referrer: "google",
});

If we are including libraries that aren’t dedicated to logging, Sentry actually takes the award for the newest logging library. The Sentry monitoring platform added logging in late 2025, which can be accessed via the same SDKs you may already be using if you are monitoring your application with Sentry.

Sentry’s logger is the only other logger besides LogTape in this list that is runtime agnostic. You can use Sentry in the browser, or anywhere you deploy your JavaScript code.

Sentry offers a wide selection of SDKs, meaning you can use Sentry’s standard logging across languages. Have a Go backend? Sentry supports logging there too.

Summary

Pino: Small and fast for Node; best when performance and bundle size are your top priority.
Winston: Most options and transports; best when you want one mature, configurable logger and aren't constrained by bundle size.
Bunyan: Tiny and simple JSON logger; currently not recommended for new projects as it may no longer be receiving updates.
LogTape: Universal runtimes, zero deps, tree-shakable, library-friendly; best when you need one logger for Node and browser/edge, or when libraries need to log without forcing a choice on the app.
Sentry: Easily integrates into existing projects using the Sentry SDK; universal runtime; multiple SDKs for other languages.

What to do next

After picking a logging library, you'll want to start collecting structured logs and send them to a monitoring platform like Sentry. All of the libraries mentioned above support sending logs to external destinations via transports or sinks.

How to query and aggregate logs on Sentry
Trace-connected structured logging with LogTape and Sentry
[Video] Production Logging for JS with LogTape + Sentry

Routing OpenTelemetry logs to Sentry using OTLP

Thu, 05 Mar 2026 00:00:00 GMT

If you've already instrumented your app with OpenTelemetry, you don't have to rip it out to use Sentry. Set two environment variables and your logs start flowing into Sentry, with no SDK changes and no re-instrumentation. Here's how to set it up in a sample app, and when the native Sentry SDK might be the better call.

Why you'd use OTLP instead of the native SDK

The main advantage of OTLP is that your logging code stays decoupled from any specific observability backend. You can switch where logs go by changing a few config lines. That's useful if you:

Already have OpenTelemetry logging in place
Want to send logs to multiple backends
Need vendor-neutral instrumentation
Work with AI or LLM frameworks that use OpenTelemetry by default
Want to use the broader OpenTelemetry ecosystem

If you're starting from scratch and only need Sentry, the native Sentry SDK is probably the better call. With the native SDK, you get issue creation from logs, session replay integration, automatic breadcrumbs, and built-in error correlation. Ingesting OpenTelemetry traces and logs with Sentry via OTLP endpoints is still in beta and currently lacks these integrated features.

Guide prerequisites

Before we start, you need:

A Sentry account (the free tier works)
Node.js 18 or later installed
Basic familiarity with Express.js

If you don't have a Sentry project yet, create one now. Select Express as the platform. You can skip the DSN setup instructions because you'll use the OTLP endpoint instead.

Get your Sentry OTLP credentials

Sentry exposes separate OTLP endpoints for logs and traces. In this guide, we're focusing on the Logs endpoint. To find your OTLP credentials:

Click Settings in the left sidebar.
Under the Organization section in the Settings sidebar, click Projects.
Find your project in the list and click on it to open the project settings.
In the project settings sidebar, click Client Keys (DSN) under the SDK Setup section.
Select the OpenTelemetry tab. Click the Expand button to see all OTLP endpoint values.

Keep this tab open. We'll use the following values in the next step:

OTLP Logs Endpoint: The URL where Sentry receives logs (which looks like https://o{ORG_ID}.ingest.us.sentry.io/api/{PROJECT_ID}/integration/otlp/v1/logs)
OTLP Logs Endpoint Headers: The authentication header (which looks like x-sentry-auth=sentry sentry_key={YOUR_PUBLIC_KEY})

One thing worth knowing: most OTLP exporters expect headers as raw key/value pairs, not full header strings. You'll need to parse the header in your app. We'll handle this in the setup below.
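For reference, that parsing step can look roughly like the sketch below. parseOtlpHeaders is a hypothetical helper (the starter app may do this differently), and it assumes the value is a comma-separated list of key=value pairs. The one detail that matters for Sentry is splitting on the first = only, because the header value itself contains sentry_key=.

```javascript
// Hypothetical helper: turns "k1=v1,k2=v2" into { k1: 'v1', k2: 'v2' }.
function parseOtlpHeaders(raw) {
  const headers = {};
  for (const pair of raw.split(',')) {
    const separator = pair.indexOf('='); // first "=" only; the value may contain more
    if (separator === -1) continue; // skip malformed entries
    const key = pair.slice(0, separator).trim();
    const value = pair.slice(separator + 1).trim();
    if (key) headers[key] = value;
  }
  return headers;
}

// "abc123" is a placeholder, not a real key.
console.log(parseOtlpHeaders('x-sentry-auth=sentry sentry_key=abc123'));
// → { 'x-sentry-auth': 'sentry sentry_key=abc123' }
```

The resulting object is the shape most exporters accept for their headers option.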

Connect your OpenTelemetry app to Sentry

We'll use a sample payment processing service that already has OpenTelemetry logging instrumentation. You don't need to touch the logging code itself. Just point it at Sentry's OTLP endpoint.

Clone the starter app

Run the following commands to clone the payment processing app:

git clone https://github.com/getsentry/otlp-logging-sentry.git
cd otlp-logging-sentry
npm install

This app includes the OpenTelemetry SDK already configured, structured logging throughout, multiple log severity levels (INFO, DEBUG, WARN, and ERROR), and rich log attributes for every entry.

Configure Sentry as the OTLP destination

Create a .env file in the project root:

cp .env.example .env

Now edit .env and add your Sentry OTLP credentials from the previous step:

OTEL_EXPORTER_OTLP_LOGS_ENDPOINT=https://o{YOUR_ORG_ID}.ingest.us.sentry.io/api/{YOUR_PROJECT_ID}/integration/otlp/v1/logs
OTEL_EXPORTER_OTLP_LOGS_HEADERS=x-sentry-auth=sentry sentry_key={YOUR_PUBLIC_KEY}
OTEL_SERVICE_NAME=payment-processing-service
PORT=3000

Replace the placeholders with your actual Sentry credentials. The OTEL_SERVICE_NAME will let you filter logs by service in Sentry later.

That's it. With those two config lines in place, the app sends its OpenTelemetry logs to Sentry as soon as it starts.
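
A leftover placeholder in .env is an easy mistake to miss, so a startup check can help. The sketch below is not part of the starter app: the function name missingOtlpConfig is hypothetical, and it assumes that any remaining "{" in a value means a placeholder was not replaced.

```javascript
// Hypothetical startup check (not part of the starter app): report which
// required variables are unset or still contain a "{PLACEHOLDER}".
function missingOtlpConfig(env) {
  const required = [
    'OTEL_EXPORTER_OTLP_LOGS_ENDPOINT',
    'OTEL_EXPORTER_OTLP_LOGS_HEADERS',
  ];
  return required.filter((name) => !env[name] || env[name].includes('{'));
}

const problems = missingOtlpConfig(process.env);
if (problems.length > 0) {
  console.warn(`Check your .env file, these look unset or unedited: ${problems.join(', ')}`);
}
```

Returning the list of offending names, rather than throwing, leaves it to the caller to decide whether a misconfiguration should stop the app.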

Test the integration

Start the app:

npm start

You should see:

OpenTelemetry logging initialized
Service: payment-processing-service
Payment Processing Service running on http://localhost:3000

Generate some logs

In a new terminal window, send a request to process a payment:

curl -X POST http://localhost:3000/process-payment \
  -H "Content-Type: application/json" \
  -d '{"userId": "user123", "amount": 99.99, "paymentMethod": "credit_card"}'

You'll get a JSON response confirming the payment:

{
  "success": true,
  "transactionId": "txn_1730123456789_abc123def",
  "amount": 99.99,
  "currency": "USD",
  "status": "completed"
}

View the logs in Sentry

Now let's see what this looks like in Sentry's Logs view:

Go to your Sentry project.
Navigate to Explore in the left sidebar, then click Logs.

You'll see a list of log entries from your payment processing workflow. Each log shows a timestamp, severity indicator (colored dot), and message.

Explore log attributes

Click on any log entry to expand it and see all its attributes.

For example, the High-risk transaction detected log includes attributes like the following:

fraud_check.score: 97.98
fraud_check.threshold: 70
fraud_check.reason: unusual_amount_pattern
user.id: user123
transaction.id: txn_1762164637756_0hscczobm
severity: warn

All of these are searchable. To add any attribute as a filter, hover over it, click the overflow menu (three dots), and select Add to filter.

How OpenTelemetry logging works

Here's what's happening under the hood, in case you're applying these patterns to your own app.

OpenTelemetry SDK initialization

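The instrument.js file configures the OTLP exporter and wires up the logger provider:

// Imports reconstructed from the identifiers used below.
// The OTLP/HTTP exporter package is an assumption.
import { logs } from '@opentelemetry/api-logs';
import { OTLPLogExporter } from '@opentelemetry/exporter-logs-otlp-http';
import { Resource } from '@opentelemetry/resources';
import { BatchLogRecordProcessor, LoggerProvider } from '@opentelemetry/sdk-logs';
import { ATTR_SERVICE_NAME } from '@opentelemetry/semantic-conventions';

// Configure the OTLP log exporter
const logExporter = new OTLPLogExporter({
  url: process.env.OTEL_EXPORTER_OTLP_LOGS_ENDPOINT,
  headers: {
    // The exporter wants the bare header value, so strip the "x-sentry-auth=" prefix
    'x-sentry-auth': process.env.OTEL_EXPORTER_OTLP_LOGS_HEADERS?.replace('x-sentry-auth=', '') || '',
  },
});

// Create logger provider
const loggerProvider = new LoggerProvider({
  resource: new Resource({
    [ATTR_SERVICE_NAME]: 'payment-processing-service',
  }),
});

loggerProvider.addLogRecordProcessor(new BatchLogRecordProcessor(logExporter));

// Register the provider with the logs API. Without this, logs.getLogger()
// returns a no-op logger. (Added on the assumption that the excerpt omitted it.)
logs.setGlobalLoggerProvider(loggerProvider);

// Make logger provider available globally
global.loggerProvider = loggerProvider;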

These are the key parts:

OTLPLogExporter sends logs to the OTLP endpoint.
LoggerProvider manages the logging system.
BatchLogRecordProcessor groups log records before export, which reduces network overhead at scale.
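
To make the batching idea concrete, here is a toy sketch. It is not the OpenTelemetry implementation, and ToyBatcher and its parameters are hypothetical names. It only flushes when the buffer fills or when flush() is called, whereas the real processor also flushes on a timer.

```javascript
// Toy batcher, NOT the OpenTelemetry implementation: buffers records and
// hands them to exportFn in groups of at most maxBatchSize.
class ToyBatcher {
  constructor(exportFn, maxBatchSize = 3) {
    this.exportFn = exportFn;
    this.maxBatchSize = maxBatchSize;
    this.queue = [];
  }

  // Called once per log record; triggers an export when the buffer is full.
  onEmit(record) {
    this.queue.push(record);
    if (this.queue.length >= this.maxBatchSize) this.flush();
  }

  // Export whatever is buffered as a single batch.
  flush() {
    if (this.queue.length === 0) return;
    const batch = this.queue;
    this.queue = [];
    this.exportFn(batch);
  }
}

const exported = [];
const batcher = new ToyBatcher((batch) => exported.push(batch), 3);
for (let i = 1; i <= 7; i++) batcher.onEmit({ body: `log ${i}` });
batcher.flush(); // flush the remainder, as a shutdown hook would
console.log(exported.map((batch) => batch.length)); // [ 3, 3, 1 ]
```

Seven records become three export calls instead of seven, which is the network saving the batch processor is after.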

Emitting structured logs

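The index.js file imports instrument.js first, then creates a logger and emits records:

// Imports reconstructed from the surrounding text and the identifiers used below
import './instrument.js';
import { logs, SeverityNumber } from '@opentelemetry/api-logs';

const logger = logs.getLogger('payment-processing-service', '1.0.0');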

Here's how we emit a structured log:

function log(severity, severityNumber, message, attributes = {}) {
  logger.emit({
    severityNumber,
    severityText: severity,
    body: message,
    attributes,
  });
}

// Example usage
log('INFO', SeverityNumber.INFO, 'Payment request received', {
  'user.id': userId,
  'payment.amount': amount,
  'payment.method': paymentMethod,
  'transaction.id': transactionId,
});

Each call to logger.emit() takes a severity level, a message body, and a set of attributes. The attributes are what make logs searchable — the more context you add here, the easier it is to find specific events later.

Log severity levels

OpenTelemetry defines 24 severity numbers, grouped into six named levels. The SeverityNumber enum exposes the base value of each level:

// TRACE (most detailed)
log('TRACE', SeverityNumber.TRACE, 'Function entry', { /* attributes */ });
// DEBUG (debugging info)
log('DEBUG', SeverityNumber.DEBUG, 'Validating payment', { /* attributes */ });
// INFO (informational)
log('INFO', SeverityNumber.INFO, 'Payment received', { /* attributes */ });
// WARN (warnings)
log('WARN', SeverityNumber.WARN, 'High-risk transaction', { /* attributes */ });
// ERROR (errors)
log('ERROR', SeverityNumber.ERROR, 'Payment failed', { /* attributes */ });
// FATAL (critical)
log('FATAL', SeverityNumber.FATAL, 'System failure', { /* attributes */ });
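
Each named level covers a block of four severity numbers in the OpenTelemetry logs data model (1 to 4 for TRACE, up to 21 to 24 for FATAL). The sketch below shows that mapping. The function name severityRange is hypothetical, and the code uses plain numbers so it runs without the OpenTelemetry packages.

```javascript
// Map an OpenTelemetry severity number (1-24) to its named level.
// Anything outside that range is treated as UNSPECIFIED.
function severityRange(severityNumber) {
  if (!Number.isInteger(severityNumber) || severityNumber < 1 || severityNumber > 24) {
    return 'UNSPECIFIED';
  }
  const names = ['TRACE', 'DEBUG', 'INFO', 'WARN', 'ERROR', 'FATAL'];
  // Each level spans four consecutive numbers: 1-4, 5-8, ..., 21-24.
  return names[Math.floor((severityNumber - 1) / 4)];
}

console.log(severityRange(9));  // INFO
console.log(severityRange(14)); // WARN
console.log(severityRange(24)); // FATAL
```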

Adding rich attributes

The more attributes you add, the easier it is to debug issues. Here's an example from the fraud detection path:

log('WARN', SeverityNumber.WARN, 'High-risk transaction detected', {
  'user.id': userId,
  'transaction.id': transactionId,
  'fraud_check.score': 85.2,
  'fraud_check.threshold': 70,
  'fraud_check.reason': 'unusual_amount_pattern',
});

All these attributes are searchable in Sentry, so you can find specific transactions quickly without scanning log text.

OTLP vs native Sentry SDK

Both approaches send logs to Sentry. The difference is in what you get automatically.

Setup and configuration

OTLP

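// instrument.js
import { OTLPLogExporter } from '@opentelemetry/exporter-logs-otlp-http'; // package name assumed
import { BatchLogRecordProcessor, LoggerProvider } from '@opentelemetry/sdk-logs';

const logExporter = new OTLPLogExporter({
  url: process.env.OTEL_EXPORTER_OTLP_LOGS_ENDPOINT,
  headers: {
    // Strip the "x-sentry-auth=" prefix so only the header value is sent
    'x-sentry-auth': process.env.OTEL_EXPORTER_OTLP_LOGS_HEADERS?.replace('x-sentry-auth=', '') || '',
  },
});
const loggerProvider = new LoggerProvider({ /* resource config */ });
loggerProvider.addLogRecordProcessor(new BatchLogRecordProcessor(logExporter));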

Native Sentry SDK

// instrument.js
import * as Sentry from '@sentry/node'; // assuming the Node SDK, since this is an Express app

Sentry.init({
  dsn: process.env.SENTRY_DSN,
  enableLogs: true, // Required for structured logging
});

Note that Sentry.logger requires Sentry JavaScript SDK v9.41.0 or above.

Emitting logs

OTLP

import { logs, SeverityNumber } from '@opentelemetry/api-logs';

const logger = logs.getLogger('my-service', '1.0.0');
logger.emit({
  severityNumber: SeverityNumber.INFO,
  severityText: 'INFO',
  body: 'Payment request received',
  attributes: {
    'user.id': userId,
    'payment.amount': amount,
  },
});

Native Sentry SDK

import * as Sentry from '@sentry/node'; // assuming the Node SDK

Sentry.logger.info('Payment request received', {
  'user.id': userId,
  'payment.amount': amount,
});

With OpenTelemetry, you specify both severityNumber and severityText manually. The Sentry SDK infers both from the method you call (info(), warn(), and so on). The SDK also associates logs with errors, transactions, and user sessions automatically, without any extra setup.
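
If you stay on OpenTelemetry but want the shorter call style, a thin wrapper can fill in both severity fields for you. This is a minimal sketch rather than anything shipped by either SDK: createLevelLogger is a hypothetical name, and it accepts any object with an emit() method, so a stand-in logger is used here in place of a real one.

```javascript
// Base severity numbers from the OpenTelemetry logs data model.
const SEVERITY = { trace: 1, debug: 5, info: 9, warn: 13, error: 17, fatal: 21 };

// Hypothetical wrapper: builds level-named methods around any object with emit().
function createLevelLogger(otelLogger) {
  const wrapper = {};
  for (const [level, severityNumber] of Object.entries(SEVERITY)) {
    wrapper[level] = (message, attributes = {}) =>
      otelLogger.emit({
        severityNumber,
        severityText: level.toUpperCase(),
        body: message,
        attributes,
      });
  }
  return wrapper;
}

// Usage with a stand-in logger that records what it receives.
const emitted = [];
const levelLogger = createLevelLogger({ emit: (record) => emitted.push(record) });
levelLogger.warn('High-risk transaction detected', { 'user.id': 'user123' });
console.log(emitted[0].severityNumber, emitted[0].severityText); // 13 WARN
```

This only removes the repetition at the call site. It does not add the automatic association with errors, transactions, and sessions that the native SDK provides.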

Log levels

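OTLP

logger.emit({ severityNumber: SeverityNumber.DEBUG, severityText: 'DEBUG', body: 'message' });
logger.emit({ severityNumber: SeverityNumber.INFO, severityText: 'INFO', body: 'message' });
logger.emit({ severityNumber: SeverityNumber.WARN, severityText: 'WARN', body: 'message' });
logger.emit({ severityNumber: SeverityNumber.ERROR, severityText: 'ERROR', body: 'message' });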

Native Sentry SDK

Sentry.logger.debug('message', { /* attributes */ });
Sentry.logger.info('message', { /* attributes */ });
Sentry.logger.warn('message', { /* attributes */ });
Sentry.logger.error('message', { /* attributes */ });

What's next

You now have OpenTelemetry logs flowing into Sentry. A few ways to get more value from here:

Add context to your logs. The more attributes you add, the easier it is to debug issues. Add user IDs, request IDs, transaction IDs, feature flags, or any relevant business context to every log entry.
Use consistent attribute naming. Follow OpenTelemetry Semantic Conventions for standardized attribute names. This keeps your logs consistent and easier to search across services.
Set up alerts. Configure Sentry alerts to notify you when certain log patterns appear — ERROR logs exceeding a threshold, or high-risk transactions crossing a fraud score cutoff.
Combine logs with traces. If you're also sending traces to Sentry, you can correlate them with logs to get a complete picture of your application's behavior.
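
One way to apply the first two suggestions is to attach shared context once and let every log inherit it. The sketch below assumes a hypothetical withContext helper and a stand-in emit function. The request.id key is illustrative only, so check the Semantic Conventions for the names that fit your data.

```javascript
// Hypothetical helper: returns an emit function that merges shared context
// into every record's attributes. Per-call attributes win on key conflicts.
function withContext(emit, baseAttributes) {
  return (record) =>
    emit({ ...record, attributes: { ...baseAttributes, ...record.attributes } });
}

// Stand-in for logger.emit so the example runs without any SDK installed.
const records = [];
const emitWithContext = withContext((record) => records.push(record), {
  'user.id': 'user123',
  'request.id': 'req-42', // illustrative attribute name
});

emitWithContext({ body: 'Payment request received', attributes: { 'payment.amount': 99.99 } });
console.log(records[0].attributes);
```

Because the context is merged in one place, attribute names stay consistent across every log the request produces.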

OTLP logging support is still in open beta. If you run into a limitation not listed here, open an issue on GitHub. That's the fastest way to get it on our radar.

`repro` skill + repository