Pareto.AI’s cover photo
Pareto.AI

Pareto.AI

Software Development

San Francisco, California 40,817 followers

Pareto is the verification layer for reinforcement learning on real-world expertise.

About us

Pareto is the verification layer for reinforcement learning on real-world expertise. We turn nondeterministic expert judgment into durable reward signals. By measuring a model’s capability frontier and calibrating tasks to where it will learn most, we make specialized human expertise trainable.

Website
https://pareto.ai/
Industry
Software Development
Company size
51-200 employees
Headquarters
San Francisco, California
Type
Privately Held
Founded
2020

Locations

  • Primary

    417 Montgomery St

    Suite 900

    San Francisco, California 94104, US

    Get directions

Employees at Pareto.AI

Updates

  • "Uncertainty expression is an evaluation problem." Frontier benchmarks reward reasoning, code, and factual accuracy. But they rarely reward knowing when not to answer. Until we measure that capability, we can’t reliably train models to express it. Great analysis from our CEO, Phoebe Yao, below.

    View profile for Phoebe Yao

    Most people think the biggest risk in AI systems is hallucination. It isn’t. The more dangerous failure mode is answering confidently when the model shouldn’t answer at all. Frontier models do this constantly in real interactions. Imagine telling your doctor you’ve been dizzy and asking if it’s a panic attack. She says yes, hands you a pamphlet, and sends you home. No follow-up. No mention that dizziness could signal a stroke, a cardiac event, or an inner ear disorder. You’d want to find a new doctor. A responsible clinician wouldn’t answer this type of question directly. They’d say ‘I’m not sure,’ name the alternatives, and ask what’s needed to distinguish them We tested three frontier models on four layperson health prompts, each pairing a symptom with a plausible but unconfirmed diagnosis. Ten samples per model. A response only passed if it acknowledged uncertainty before confirming or denying anything. Listing alternatives after an opening confirmation didn’t count. Results: Gemini: 0% across every scenario. Claude: failed on over half, with wide variance by prompt. GPT: best overall, but failed every single time on muscle weakness. None of them were missing the knowledge. They knew, for instance, that muscle weakness appears in ALS, myasthenia gravis, and multiple sclerosis. Most responses just didn’t say so. No hedging, no follow-up questions, just a direct confirmation. When alternatives did appear, they were buried after the opening line. Uncertainty expression is an evaluation problem. Frontier benchmarks reward reasoning, code, and factual accuracy. Knowing when not to answer is harder to define, harder to score, and almost never what the leaderboard measures. Without the right evals, you can’t train for it. If you think this would be a useful capability for your models, we’d love to collaborate. Full prompts and methodology in the article below.

    • No alternative text description for this image
  • View organization page for Pareto.AI

    40,817 followers

    Pareto.AI will never initiate communication via WhatsApp or other unofficial messaging platforms. We've been made aware of individuals falsely claiming to represent Pareto and attempting to contact experts outside our official channels. All legitimate Pareto communications will only come through an email from an @pareto.ai domain. If you ever receive a message claiming to be from Pareto and are unsure of its legitimacy, please do not share any personal information. Instead, you can: • Contact us directly at support@pareto.ai • Ask the individual to email you from their official @pareto.ai address as verification Your safety and trust are extremely important to us. If you encounter suspicious outreach, please report it to our team via email (support@pareto.ai)

    • No alternative text description for this image
  • Pareto.AI reposted this

    2,302 people. 22 would have received harmful medical advice. Zero actually did. AI models are giving medical and mental health advice to millions of people. Can you prevent harmful advice by adding safety instructions to the prompt? The UK's AI Security Institute recently tested this. They deployed the same chatbot twice: once with minimal safety prompting, once with explicit safety instructions. The finding: safety prompts had no meaningful impact on harmful advice rates. What did work? A classifier trained on expert-labeled data to detect harmful outputs in real-time. AISI brought in my team at Pareto to build the training dataset. We recruited licensed doctors, therapists, and career coaches and coached them to decompose complex professional judgment into verifiable steps. Together we developed harm grading rubrics and built 6,707 evaluated examples. Fine-tuning Llama 8B on this data boosted accuracy at detecting harmful advice from 77% to 96%, beating GPT-4o's zero-shot performance (93%). Real-world impact: AISI deployed this classifier in a study with 2,302 participants. Without the safety layer, 22 people would have received harmful advice. With it? Zero harmful messages delivered. The key insight: You genuinely can't prompt your way to safety in domains requiring professional judgment and deep contextual understanding. For high-stakes domains like medical, mental health, and career advice, expert supervision creates meaningfully better outcomes than instruction-based approaches alone. The methodology that scales: 1. Bring experts in from day one to co-design 2. Build workflows that elicit professional judgment 3. Capture reasoning and context, not just labels AISI open-sourced everything: paper, model, and dataset. At Pareto, we're building systems that make it easy for frontier experts to contribute sustainably to AI training. The future isn't about replacing human expertise, it's about building better systems to capture expert insight at the edge of what's known. Deep gratitude to Elizabeth Nguyen and Daria Butuc at Pareto.AI, and Lennart Luettgau and Henry Davidson at UK's AI Security Institute for making this collaboration succeed. #ArtificialIntelligence #AIResearch #AIEthics

    • No alternative text description for this image
    • No alternative text description for this image
    • No alternative text description for this image
  • View organization page for Pareto.AI

    40,817 followers

    NeurIPS 2025 wrapped, and we're buzzing with ideas! The Pareto.AI team dove deep into talks, posters, and 1:1s with researchers defining the future of AI. Every conversation sparked something new. We closed the week by co-hosting an Applied AI Researcher Dinner with Lucy Noble, MBA (Syntropi) and Auriel W. (Google DeepMind)—bringing together brilliant minds from frontier labs. Sure, we discussed the next big bets in AI, but the real magic happened when conversations turned to AI in education, navigating parenthood in an AI world, and icebreakers like "what would you never tell an AI?" that sparked the best debates. The laughter, hot takes, and off-the-record confessions reminded us why this community is so special. We're pumped for what's to come! Photo credits to our wonderful co-hosts 💜 #NeurIPS2025 #AIResearch #AppliedAI #MachineLearning #AI #AIdata

    • No alternative text description for this image
    • No alternative text description for this image
    • No alternative text description for this image
    • No alternative text description for this image
  • View organization page for Pareto.AI

    40,817 followers

    Congratulations to the Anthropic team on a fantastic release! It’s been a joy partnering with such an incredible crew and seeing the impact Claude Sonnet 4.5 has already made.

    View organization page for Anthropic

    3,197,730 followers

    Introducing Claude Sonnet 4.5—the best coding model in the world. It's the strongest model for building complex agents. It's the best model at using computers. And it shows substantial gains on tests of reasoning and math. We’ve introduced upgrades across all Claude surfaces: In Claude Code: a fresh terminal interface, a new VS Code extension, and a checkpoints feature that lets you confidently run large tasks and instantly rewind to prior code states as needed. For the Claude app: Claude can now use code to analyze data, create files, and visualize insights in the files and formats you use. Watch as Claude creates polished docs, presentations, and spreadsheets—ready to download and edit. Now available to all paid plans in preview. On the Claude API: we've added two new capabilities to build agents that handle long-running tasks without frequently hitting context limits. Context editing automatically clears stale context and the memory tool means you can store and consult information outside the context window. We're also releasing a temporary research preview called "Imagine with Claude." In this experiment, Claude generates software on the fly. No functionality is predetermined; no code is prewritten. Available to Max users for 5 days. Claude Sonnet 4.5 is available everywhere today—on the Claude Developer Platform, natively and in Amazon Bedrock and Google Cloud's Vertex AI. Pricing remains the same as Sonnet 4. For more details: https://lnkd.in/eRJx6C5u

  • Pareto.AI reposted this

    It’s a special kind of full-circle moment when you get to walk into the very first SF office of a founder you’ve supported since the very beginning as an investor…AND later had the joy of collaborating with at Slope 💜 Huge congratulations to the team at Pareto.AI on their officewarming and revealing the new brand experience last night!! 🥂I love my bucket hat! Over the last year, Slope partnered with the Pareto team on everything from brand strategy and positioning (“Humanity to infinity ✨”), to a bold new identity and website. I especially love our art direction, which pairs calming futurism with human-centered imagery - balancing expansive, nature-forward visuals with meaningful interactions between people and technology. Case study and shoutouts to the Slope team behind the incredible work coming soon...but for now, congratulations to the Pareto team!! 🙌 So proud Phoebe Yao!

    • No alternative text description for this image
    • No alternative text description for this image
    • No alternative text description for this image
    • No alternative text description for this image
    • No alternative text description for this image
      +1
  • Pareto.AI reposted this

    In this new episode, Steven and I sat down with Phoebe Yao, founder and CEO of Pareto.AI, a human data platform that helps Anthropic, Google DeepMind, Character.AI and many other frontier labs source their expert data for LLM training and evals. Before founding Pareto, Phoebe was a Thiel Fellow and a Human Centered Design & Engineering major at Stanford. We discussed with Phoebe her experience with Stanford ASES, Launchpad, Lean Launchpad at d.school, what made her decide to drop out, how she survived through pivots and many more! Some of my takeaways: • Finding pmf is an ongoing challenge in fast-moving markets. • Pivots aren’t selling your soul. They’re about meeting market needs while staying true to your mission to keep the team motivated. • Learn not to defer decisions. Clear, confident calls are the foundation of strong leadership. • Nurture relationships. Business is about people wanting to work with you. The link to the full episode is in the comments section👇  (And check out their brand new website pareto.ai!)

  • Pareto.AI reposted this

    In this new episode, Steven and I sat down with Phoebe Yao, founder and CEO of Pareto.AI, a human data platform that helps Anthropic, Google DeepMind, Character.AI and many other frontier labs source their expert data for LLM training and evals. Before founding Pareto, Phoebe was a Thiel Fellow and a Human Centered Design & Engineering major at Stanford. We discussed with Phoebe her experience with Stanford ASES, Launchpad, Lean Launchpad at d.school, what made her decide to drop out, how she survived through pivots and many more! Some of my takeaways: • Finding pmf is an ongoing challenge in fast-moving markets. • Pivots aren’t selling your soul. They’re about meeting market needs while staying true to your mission to keep the team motivated. • Learn not to defer decisions. Clear, confident calls are the foundation of strong leadership. • Nurture relationships. Business is about people wanting to work with you. The link to the full episode is in the comments section👇  (And check out their brand new website pareto.ai!)

Similar pages

Browse jobs

Funding