LLM Skirmish is a novel adversarial benchmark in which frontier LLMs compete in 1v1 real-time strategy games by writing JavaScript code that executes in a Screeps-based game environment. Each tournament runs over five rounds, so models can adapt their strategies to previous results, which tests in-context learning. Results show Claude
7 min read. From llmskirmish.com
Table of contents
TL;DR
Introduction
Overall Standings
Objective
Tournament Setup
Agent Setup
In-context Learning
Model Cost Efficiency
Model Breakdown