AI labs are increasingly using crowdsourced benchmarking platforms like Chatbot Arena to evaluate models, but experts argue these benchmarks have significant flaws. They criticize the lack of construct validity and allege that AI labs may exploit these benchmarks for exaggerated claims. Experts suggest more dynamic, diverse,

4m read time. From techcrunch.com