Every dev has an AI “assistant” in their editor now. LLMs are great for the day-to-day, but let’s be real: writing code is the fun part.
Code review? Not so much.
So we had to ask: Can LLMs actually review PRs? Or do they just throw out generic suggestions that sound useful but don’t hold up in practice?
We ran a benchmark comparing Kody against raw LLMs (GPT and Claude) to see which one really delivers meaningful code reviews. The early data makes one thing clear: they’re not the same.
⚠️ One thing before we dive in: this benchmark is a work in progress. We know the dataset is still small, but the goal is clear: push LLMs to their limits—and see where they break.
See what we found: https://kodus.io/en/benchmarking-code-reviews-kody-vs-raw-llms-gpt-claude/