Just Java
Frank (@frankconnolly) • Aug 19, 2025

New Java Benchmark for Coding LLMs puts GPT-5 at the top

Foojay.io • From foojay.io • Aug 18, 2025 • 8m read time

The Brokk Power Ranking introduces a new open-source benchmark for evaluating coding LLMs using 93 real-world Java tasks from large codebases. GPT-5 dominates performance across all categories and price points, though it suffers from slower inference speeds. The benchmark addresses limitations of existing tools like SWE-bench by using fresh, complex tasks that better reflect real-world coding scenarios. Chinese models performed worse than expected, and the study reveals that context length and task complexity significantly impact model performance.
