Announcing our new $115M pre-seed and early-stage Fund III—dedicated to startups from the remarkable 600,000-person-strong @Cal network, based in Berkeley and worldwide.🧵
Introducing Agent Arena: real-world agentic evals at scale.
How do you evaluate agents doing actual work? We measure millions of live sessions where real users accomplish real tasks.
On Arena, models now get web search, filesystem, and terminal tools to complete complex
Super excited to launch Agent Mode on Arena. This is a huge milestone. Real agentic work has been hard to benchmark… until now. See how top frontier models handle multi-step workflows with search, bash, and file writing. Come break things, run deep research, and see who takes
Agent Arena gives every model access to a Claude-Code-like harness and a computer. Our users went nuts, generating millions of real traces per week. We used this data to build the first large-scale benchmark of agent usefulness in the wild.
We analyze agents by collecting many
Excited to announce @fliptexts’ $1.4M pre-seed, and that we’ve joined @a16z @speedrun!
When @notolegv, @ajiang_xyz and I started building in January, we were stunned at how outdated money apps still are. So we built the finance-expert friend we always wished we had: one that
Lots of exciting news to share today!
1. @RunLLM is now @Herald_Dev. The new name reflects the fact that our AI SRE is the only product on the market that operates autonomously — teaching itself about your product & infra, detecting early warning signs of incidents, and
The Redpoint InfraRed 100 is now live.
These are the companies building the infrastructure that powers everything happening in AI right now, from world models and agent runtimes to the sandboxes, databases, and security tools agents depend on.
Congratulations to this year's
Thanks @Forbes for the coverage. We want to give all defenders access to frontier-level security, today.
We're offering $5M in credits to maintainers of critical OSS. Apply here:
The top 5 labs in Text Arena rankings by category show that frontier models have distinct strengths and tradeoffs.
#1 @AnthropicAI, Claude Opus 4.7
- The most consistently dominant model overall, ranking in the top tier across nearly every major category.
#2 @GoogleDeepMind, Gemini
Today we're launching the Open Defense Initiative: up to $5 million in @depthfirstlabs credits for critical open source projects to find and fix real, exploitable vulnerabilities.
The timing matters: frontier models can autonomously discover and exploit vulnerabilities in
.@depthfirstlabs closes $80M Series B led by Meritech Capital, less than 90 days after their Series A led by Accel!
Qasim Mithani, Daniele Perito, and @andreamichi lead this top applied AI lab on a mission to secure the world’s software.
The timing couldn't be more critical. AI
depthfirst has raised an $80M Series B at a $580M valuation.
Attackers are using AI to break into systems faster than ever before. depthfirst is on a mission to stop this.
RT + Comment “depthfirst” and I’ll send you a FREE vibe coding security agent.