Appwrite has launched Appwrite Arena, an open-source benchmark that evaluates how well large language models understand Appwrite's services, SDKs, and APIs. It tests models across 191 questions in 9 service categories (Auth, Databases, Functions, Storage, Sites, Messaging, Realtime, CLI, and Foundation) using both deterministic multiple-choice and AI-judged open-ended formats. Models are tested with and without Appwrite Skills files to measure how much documentation context improves performance. Early results show GPT-5.4 leads with skills enabled, while Claude Opus 4.6 leads without. All questions, answers, and scores are fully open source on GitHub.
