Large Language Models (LLMs) have immense potential to transform healthcare by tackling complex medical tasks and improving patient care. However, deploying LLMs in the medical domain comes with serious challenges, above all the need for accuracy and reliability. The Open Medical-LLM Leaderboard addresses these challenges by providing a standardized platform for evaluating and comparing the performance of LLMs on healthcare benchmarks. The leaderboard draws on datasets such as MedQA, MedMCQA, PubMedQA, and medically relevant MMLU subsets. To submit a model for evaluation, it must be in the safetensors format, compatible with Transformers AutoClasses, and publicly accessible.
Table of contents

- Datasets, Tasks, and Evaluation Setup
- Insights and Analysis
- Submitting Your Model for Evaluation
- What's next? Expanding the Open Medical-LLM Leaderboard
- Credits and Acknowledgments
- About Open Life Science AI
- Citation