When using Sonnet 4.5 in Devin (probably works in other agents too?), I found it surprisingly effective to just add "when you're done, self-critique your work until you're sure it's correct". Had a few cool cases where it caught issues I would have flagged in a first-pass review.
Joined April 2021
- Sonnet 4.5 feels like the biggest qualitative jump since the newer Sonnet 3.5 came out. I've been using it for the last few days in Devin & found some new behaviors I haven't seen in other models:
  - it can manage its own context well -> it starts to write down notes in markdown
  Quoted post: We rebuilt Devin for Claude Sonnet 4.5. Available starting today as an Agent Preview that's over 2x faster and 12% better on our Jr. Developer Evals.
- Very excited about the launch of ryan-100T-2025-07-28. Given the positive results in early testing, we think this massive 18-year training run is just scratching the surface of what's possible.
  Quoted post: Cognition is the first AI lab to win a verified gold medal at the IOI. Our human, ryanbAI (@ryanbai1412), placed 7th overall. Impressively, ryanbAI competed under the exact same conditions as all human contestants.
- there's a discrete speed threshold that separates "sync" and "async" coding agent experiences. SWE-grep/SWE-grep-mini are pushing the Pareto frontier of what's possible below this line, so you can get more done faster while remaining in flow.
  Quoted post: Introducing SWE-grep and SWE-grep-mini: Cognition's model family for fast agentic search at >2,800 TPS. Surface the right files to your coding agent 20x faster. Now rolling out gradually to Windsurf users via the Fast Context subagent, or try it in our new playground!
- [1/6] Excited to share "RLVF: Learning from Verbal Feedback without Overgeneralization". Our method C3PO fine-tunes an LLM from one sentence of feedback and decreases overgeneralization (i.e., applying the feedback when it should not be applied). Details: austrian-code-wizard.github.io/c3po-website/
- merged first Devin PR of the day in the Uber
- Replying to @cognition and @cognition_labs: the future of agents is parallelization!
- Devin is the best agent out there by an order of magnitude & I can't wait to see it used widely in the real world. Huge kudos to @cognition_labs. Couldn't be more excited to join the team.
  Quoted post: Today we're excited to introduce Devin, the first AI software engineer. Devin is the new state-of-the-art on the SWE-Bench coding benchmark, has successfully passed practical engineering interviews from leading AI companies, and has even completed real jobs on Upwork. Devin is
- out: slop, in: understanding your code
- Just shipped Devin project templates! These are pre-configured repos for building apps from scratch with Devin. They're designed to be simple to run & test locally, so Devin has a built-in feedback loop. If you ask Devin nicely, it will even deploy your apps. Might add more
- I barely use a regular IDE anymore... We've worked hard on shipping new IDE + interactivity features in Devin, and it's improved my own success with Devin a lot. Excited to hear your feedback and feature requests!
  Quoted post: Introducing Devin 2.0: a new agent-native IDE experience. Generally available today starting at $20.
- Cognition has signed a definitive agreement to acquire Windsurf. The acquisition includes Windsurf's IP, product, trademark and brand, and strong business. Above all, it includes Windsurf's world-class people, whom we're privileged to welcome to our team. We are also honoring
- had some cool runs this week where Devin successfully debugged full-stack changes; getting a video demo of the change is awesome
- Devin now has full computer use capabilities and can share screen recordings. You can control desktop apps, build and QA mobile apps, and automate tedious work. Here are some examples that blew our team away:
  1. Making a desktop game