Head of growth at Kodus
Every dev has an AI “assistant” in their editor now. LLMs are great for the day-to-day, but let’s be real: writing code is the fun part.
Code Review? Not so much.
So we had to ask: Can LLMs actually review PRs? Or do they just throw out generic suggestions that sound useful but don’t hold up in practice?
We ran a benchmark comparing Kody against raw LLMs (GPT and Claude) to see which actually delivers meaningful code reviews. The early data makes one thing clear: they're not the same.
⚠️ One thing before we dive in: this benchmark is a work in progress. We know the dataset is still small, but the goal is clear: push LLMs to their limits—and see where they break.
See what we found: https://kodus.io/en/benchmarking-code-reviews-kody-vs-raw-llms-gpt-claude/
LLMs alone aren't great at reviewing code—they produce noisy, irrelevant, or even incorrect comments far too often.
We open-sourced Kodus, our AI-powered code review platform built specifically to address this problem. Instead of relying purely on GPT models, we use a deterministic, AST-based rule engine to provide precise, structured context directly to the LLM. The result is a dramatically reduced noise rate, fewer hallucinations, and comments you can actually trust (and merge).
A quick rundown:
- Hybrid approach (AST + GPT): Precise, deterministic context feeding into the LLM reduces false positives and irrelevant suggestions.
- Self-hostable & Open Source: Run on your own infra/cloud—no code leakage, no data privacy concerns.
- Customizable rule engine: Easily define and share context-specific review rules across your team and community.
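To make the hybrid idea concrete, here is a minimal sketch in Python of how deterministic AST checks can feed structured facts into an LLM review prompt. This is an illustration of the pattern, not Kodus's actual engine: the function names (`collect_findings`, `build_review_prompt`) and the specific rules are invented for this example.

```python
import ast
import textwrap

def collect_findings(source: str) -> list[str]:
    """Walk the AST and emit deterministic findings -- no LLM involved."""
    findings = []
    tree = ast.parse(source)
    for node in ast.walk(tree):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            if ast.get_docstring(node) is None:
                findings.append(
                    f"line {node.lineno}: function '{node.name}' has no docstring"
                )
        if isinstance(node, ast.ExceptHandler) and node.type is None:
            findings.append(
                f"line {node.lineno}: bare 'except:' swallows all exceptions"
            )
    return findings

def build_review_prompt(source: str) -> str:
    """Attach the AST findings as grounded context for the LLM reviewer."""
    facts = "\n".join(f"- {f}" for f in collect_findings(source)) or "- no rule violations"
    return textwrap.dedent(f"""\
        You are reviewing this code. Ground every comment in the facts
        below; do not speculate beyond them.

        Facts from static analysis:
        {facts}

        Code under review:
        {source}
        """)

snippet = "def risky(a, b):\n    try:\n        return a / b\n    except:\n        return None\n"
print(build_review_prompt(snippet))
```

Because the facts come from a deterministic pass over the syntax tree, the LLM is asked to explain and prioritize real findings rather than guess at problems, which is what cuts down the noise and hallucinated comments.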
We'd love your feedback, suggestions, or criticisms—especially if you've experienced frustration with purely GPT-based review tools.