The discontinuation of Hugging Face’s Open LLM Leaderboard has led to the creation of the LLM Evaluation Framework, a tool designed for reproducible and extensible benchmarking of large language models (LLMs). The framework supports multiple model backends, quantized models, and comprehensive benchmarks, and produces detailed evaluation reports.
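As a rough illustration of the kind of local, leaderboard-style run the framework enables, here is a minimal sketch assuming an EleutherAI lm-evaluation-harness backend; the model name and task group are placeholders, and the framework's actual API may differ:

```python
# Illustrative sketch only: a local leaderboard-style evaluation via
# EleutherAI's lm-evaluation-harness (`pip install lm-eval`), which this
# framework is assumed to resemble. Model and task group are placeholders.
import json

import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",  # Hugging Face transformers backend
    model_args="pretrained=Qwen/Qwen2.5-0.5B-Instruct,dtype=bfloat16",
    tasks=["leaderboard"],  # Open LLM Leaderboard v2 task group in recent lm-eval releases
    batch_size="auto",
)

# Keep the per-task metrics around for the reporting step described later.
with open("results.json", "w") as f:
    json.dump(results["results"], f, indent=2, default=str)
```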
Table of contents
- LLM Evaluation Framework
- Replicate Huggingface Open LLM Leaderboard Locally
- 🧩 Empowering Transparent and Reproducible LLM Evaluations
- 🚀 Getting Started
- 🧪 Example: Evaluating Your Model on the LEADERBOARD Benchmark
- 📊 Reporting and Results
- 📄 How the Evaluation Report Looks
  - 1. 📊 Summary of Metrics
  - 2. 📈 Normalized Scores
  - 3. 🔍 Task Samples (Detailed Examples)
- ⚙️ Customization
- 🔧 Extending the Framework
- 🤝 Contributing