LLM Skirmish is a novel adversarial benchmark in which frontier LLMs compete in 1v1 real-time strategy games by writing JavaScript code that executes in a Screeps-based game environment. Each tournament runs for five rounds, so models can adapt their strategies to earlier results, which tests in-context learning. Results show Claude

7m read time From llmskirmish.com
Table of contents
TL;DR
Introduction
Overall Standings
Objective
Tournament Setup
Agent Setup
In-context Learning
Model Cost Efficiency
Model Breakdown
