We partnered with @trajectorylabs to post-train NVIDIA Nemotron 3 Ultra for legal. Here’s what we found:
1) Open-weight models can reach frontier legal performance.
On our Legal Agent Benchmark (LAB), Nemotron 3 Ultra started at a 0% all-pass rate. After post-training, it
Trajectory
Building the platform for Continual Learning
Joined December 2025
- Replying to @trajectorylabs: 4/ The shift is that the most valuable training signal already lives inside companies. Every brief, edit, correction, review, approval, and workflow is proprietary data that can make their models better. Open models give companies the weights. Trajectory helps them turn their
- 3/ On held-out Harvey LAB tasks, base Nemotron 3 Ultra had a 0% all-pass rate. After post-training, it reached 5.8% — between Sonnet 4.6 and Opus 4.6, and above post-trained Nemotron 3 Super. On rubric criteria, it reached 83%, alongside leading closed models.
- Replying to @trajectorylabs: 2/ A few weeks ago, we post-trained NVIDIA Nemotron 3 Super on Harvey LAB, a set of expert tasks and trajectories in legal domains. Then Nemotron 3 Ultra shipped, pushing the frontier even further. Because Trajectory’s learning layer is model-agnostic, we pointed the same
- Trajectory reposted: Nemotron 3 Ultra from @nvidia is out today and available on Tinker day one! The flagship from the Nemotron family is built for long-running agents; @trajectorylabs have been using it in early access to power continual learning workflows.
- Trajectory reposted: We worked with @trajectorylabs to run their SDPO++ algorithm on APEX-Agents and see what it could do with real production data. Pass rates went from 5% to 25% on GPT-OSS-120B, and the curve is still climbing. Read more about our work together in their blog post below.
- 5 Days of Trajectory 🏹 Day 5: Scaling SDPO to Agentic Tasks. Continual learning means you must train on data from production. But production gives you one example per task. A user makes a request once. You get one trajectory, not a batch. However, current RL algorithms
- 🏹 5 Days of Trajectory. Day 4: Why We’re Building Trajectory. AI is the most capable software ever built. You correct it. You teach it what you want. However, the next session starts, and the learning is gone. This is deeply unnatural: nothing intelligent works this
- We’re taking a quick break for the 5 days of Trajectory, but wanted to take this time to say that we’ve been named to @Redpoint’s 2026 Infrared 100 as one of the companies shaping the future of AI infrastructure. We're so grateful for the recognition so early in our journey,
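
Post 3/ above quotes two different scores for the same model: a 5.8% all-pass rate and 83% on rubric criteria. A small sketch may help show how both can hold at once. The posts do not define either metric, so the definitions here are assumptions: the criteria rate is taken as the fraction of individual rubric criteria passed, pooled across tasks, and the all-pass rate as the fraction of tasks where every criterion passed. The function names, task names, and pass/fail flags are hypothetical toy data, not LAB results.

```python
from typing import Dict, List

# Hypothetical toy data: one list of pass/fail flags per task, one flag per
# rubric criterion. These are NOT real LAB results.
TOY_RESULTS: Dict[str, List[bool]] = {
    "task_a": [True, True, True, True],
    "task_b": [True, True, True, False],
    "task_c": [True, False, True, True],
    "task_d": [True, True, False, True],
}


def criteria_pass_rate(results: Dict[str, List[bool]]) -> float:
    """Assumed definition: share of individual criteria passed, pooled over tasks."""
    flags = [ok for task_flags in results.values() for ok in task_flags]
    return sum(flags) / len(flags)


def all_pass_rate(results: Dict[str, List[bool]]) -> float:
    """Assumed definition: share of tasks where every criterion passed."""
    passed_tasks = sum(all(task_flags) for task_flags in results.values())
    return passed_tasks / len(results)


if __name__ == "__main__":
    print(f"criteria pass rate: {criteria_pass_rate(TOY_RESULTS):.1%}")
    print(f"all-pass rate:      {all_pass_rate(TOY_RESULTS):.1%}")
```

Under these assumed definitions, a single failed criterion sinks a whole task, so the all-pass rate can sit far below the criteria rate even when most criteria are met.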


















