An eval platform is more than just a test runner. Evals require shared definitions of "good," reliable data pipelines, labelling workflows, versioning, and trust in results across many teams and model changes. Phillip Hetzel explains the design principles behind Braintrust's platform in this session from AI Engineer Europe. https://lnkd.in/e9bTXvsK
About us
Braintrust is the AI observability platform helping teams measure, evaluate, and improve AI in production. By connecting evals and observability in one workflow, teams at Notion, Stripe, Zapier, Vercel, and Ramp ship quality AI products at scale.
- Website
- https://braintrust.dev/
- Industry
- Software Development
- Company size
- 51-200 employees
- Headquarters
- San Francisco
- Type
- Privately Held
- Founded
- 2023
Products
Braintrust
Automated Testing Software
Braintrust is the AI observability platform. By connecting evals and observability in one workflow, Braintrust gives builders the visibility to understand how AI behaves in production and the tools to improve it. Teams at Notion, Stripe, Zapier, Vercel, and Ramp use Braintrust to compare models, test prompts, and catch regressions — turning production data into better AI with every release.
Locations
- San Francisco, US (Primary)
Updates
Evals course module ten: building a multi-turn chat app. Move from single-turn to multi-turn use cases by building a chatbot CLI app with production logging. Use init_logger, wrap_openai, and @traced to capture every conversation as a single trace. More here → https://lnkd.in/dJqiKkAF
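The SDK calls named above can be sketched roughly as follows. This is a minimal sketch, assuming the Braintrust Python SDK and an OpenAI-compatible client with API keys configured; the project name, model name, and the `build_messages` helper are illustrative, not part of any SDK. SDK imports live inside `main()` so the sketch reads (and the helper runs) without the packages installed.

```python
# Sketch: a multi-turn chat CLI whose whole conversation is logged
# as one Braintrust trace, with each model call as a child span.

def build_messages(history, user_input):
    """Pure helper (illustrative): assemble messages for the next call."""
    return [
        {"role": "system", "content": "You are a helpful assistant."},
        *history,
        {"role": "user", "content": user_input},
    ]

def main():
    # SDK imports kept here so the file can be read and the helper
    # tested without braintrust/openai installed.
    from braintrust import init_logger, traced, wrap_openai
    from openai import OpenAI

    init_logger(project="chat-cli")   # send traces to this Braintrust project
    client = wrap_openai(OpenAI())    # log every OpenAI call as a span

    @traced  # one trace for the whole conversation
    def run_conversation():
        history = []
        while True:
            user_input = input("you> ")
            if user_input in {"quit", "exit"}:
                break
            resp = client.chat.completions.create(
                model="gpt-4o-mini",
                messages=build_messages(history, user_input),
            )
            reply = resp.choices[0].message.content
            history += [
                {"role": "user", "content": user_input},
                {"role": "assistant", "content": reply},
            ]
            print(f"bot> {reply}")

    run_conversation()

if __name__ == "__main__":
    main()
```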
For AI PMs, evals are the new PRD. At Product-Led Alliance Summit New York, Ameya Bhatawdekar discussed the new product development loop and how to translate every element of a traditional PRD into its eval equivalent. Watch here → https://lnkd.in/gs2DHeSV
Evals course module nine: how to analyze your eval results. Learn about the four ways to analyze eval data: experiment comparison, Loop queries, the Braintrust MCP server, and manual filtering in the UI. More here → https://lnkd.in/gjFmpPvU
Evals course module eight: how to read a trace. Learn about traces and how to navigate them in the Braintrust UI. Understand span types (root, LLM, scorer, function, task, tool) and use chain-of-thought reasoning to debug scores. More here → https://lnkd.in/gu26fpk9
If you're building AI products but aren't writing evals, this is the place to start. In Evals for engineers, solutions engineer Doug Guthrie will show you how to:
- Instrument an agent with the Braintrust SDK
- Look at traces across model calls, tool use, and outputs
- Build datasets from failure modes and write scoring functions
- Iterate on your prompt and measure quality over time
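For a taste of the scoring-function step, here is a hedged sketch of two hand-written scorers. The names and heuristics (`exact_match`, `contains_all`) are illustrative, not part of the Braintrust SDK; the convention is simply that a scorer returns a value between 0 and 1.

```python
# Sketch: hand-rolled scoring functions for eval results.

def exact_match(output: str, expected: str) -> float:
    """1.0 if the model output equals the expected answer (case/space-insensitive)."""
    return 1.0 if output.strip().lower() == expected.strip().lower() else 0.0

def contains_all(output: str, required_terms: list[str]) -> float:
    """Partial credit: the fraction of required terms present in the output."""
    if not required_terms:
        return 1.0
    hits = sum(term.lower() in output.lower() for term in required_terms)
    return hits / len(required_terms)
```

Scorers like these can then be run over a dataset of failure cases to track quality release over release.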
Braintrust reposted this
PMs aren't writing PRDs anymore. Braintrust's Ameya Bhatawdekar on why the new PM superpower in the age of AI is evals. TL;DR: Find where your AI fails → build better evals → ship with confidence
Earning stakeholder trust means making signals from your eval and observability data legible across your organization. Braintrust does this in three ways:
- Dashboards aggregate metrics across logs and experiments.
- Custom trace views turn complex traces into domain-specific interfaces.
- Loop translates natural-language questions into SQL over your production data.
Read more → https://lnkd.in/gjb-6uGu
Evals course module seven: how to deal with nondeterminism. Learn why the same eval can produce different scores across runs and what to do about it: how temperature affects variance, and how trial_count averages results into a reliable signal. More here → https://lnkd.in/dtZstRi6
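A quick way to see why trial averaging helps is a self-contained simulation. The Gaussian noise model below is purely illustrative (it is not how any eval actually scores); it just shows that averaging each example's score over several trials, as trial_count does, shrinks run-to-run spread.

```python
# Sketch: averaging over trials reduces the variance of noisy eval scores.
import random
import statistics

def noisy_score(rng):
    """Stand-in for a nondeterministic per-run score, clamped to [0, 1]."""
    return min(1.0, max(0.0, rng.gauss(0.7, 0.15)))

rng = random.Random(0)

# 200 single-run scores vs. 200 scores each averaged over 5 trials.
single_runs = [noisy_score(rng) for _ in range(200)]
averaged_runs = [
    statistics.mean(noisy_score(rng) for _ in range(5))  # like trial_count=5
    for _ in range(200)
]

# The averaged scores cluster much more tightly around the true mean.
print("single-run stdev:", statistics.stdev(single_runs))
print("averaged stdev:  ", statistics.stdev(averaged_runs))
```

With independent noise, averaging n trials cuts the standard deviation by roughly a factor of √n, which is why a modest trial_count already yields a much more stable signal.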
Braintrust x Nasdaq. Thank you to Wing Venture Capital, and congratulations to everyone on this year's Enterprise Tech 30.