Good to see more unbiased, open-source benchmarking efforts in AI code review. Martian is trying to measure what actually matters for code review agents: catching bugs, improving code, and getting suggestions adopted by developers, instead of just repeating product claims. Also really nice to see Kodus among the tools that showed one of the biggest gains after the offline fixes, at around +10% F1. There’s still a long way to go for the whole category, but more open evaluation like this is good for everyone building seriously in the space.