
ML CMU

A framework for validating LLM-as-a-judge systems when a rating task admits more than one correct answer, a condition called rating indeterminacy. Instead of forcing raters to pick a single option, the approach elicits response sets (every option a rater considers correct), aggregates the resulting disagreement into multi-label vectors, and measures human-judge agreement with continuous metrics such as mean squared error (MSE). Experiments with nine commercial LLMs on eleven rating tasks show that traditional forced-choice agreement metrics select suboptimal judge systems, whereas the proposed multi-label approach identifies judges that perform well on downstream tasks such as content filtering and prevalence estimation.
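The aggregation and agreement steps summarized above can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the authors' implementation: the option labels, the toy ratings, and the helper names `multi_label_vector` and `multi_label_mse` are hypothetical. We assume that entry k of a multi-label vector is the fraction of responses whose response set includes option k, and that the judge's vector is built the same way from several sampled response sets.

```python
from statistics import fmean
from typing import Sequence

# Hypothetical option labels for one rating task (illustrative assumption).
OPTIONS = ("yes", "no", "unsure")


def multi_label_vector(
    response_sets: Sequence[set[str]], options: Sequence[str] = OPTIONS
) -> list[float]:
    """Aggregate the response sets collected for one item into a multi-label vector.

    Assumption: entry k is the fraction of response sets that contain options[k].
    Entries need not sum to 1, since a single response set may endorse several options.
    """
    if not response_sets:
        raise ValueError("need at least one response set per item")
    n = len(response_sets)
    return [sum(opt in rs for rs in response_sets) / n for opt in options]


def multi_label_mse(
    human: Sequence[Sequence[float]], judge: Sequence[Sequence[float]]
) -> float:
    """Mean squared error between human and judge vectors, averaged over items and options."""
    if len(human) != len(judge):
        raise ValueError("human and judge vectors must cover the same items")
    sq_errors: list[float] = []
    for h_vec, j_vec in zip(human, judge):
        if len(h_vec) != len(j_vec):
            raise ValueError("vectors must have one entry per option")
        sq_errors.extend((h - j) ** 2 for h, j in zip(h_vec, j_vec))
    return fmean(sq_errors)


# Made-up data: two items, three human raters each, each rater giving a response set.
human_sets = [
    [{"yes"}, {"yes", "unsure"}, {"yes"}],
    [{"no"}, {"yes", "no"}, {"no", "unsure"}],
]
# Hypothetical judge: two sampled response sets per item, aggregated the same way.
judge_sets = [
    [{"yes"}, {"yes"}],
    [{"no"}, {"no", "yes"}],
]

human = [multi_label_vector(item) for item in human_sets]
judge = [multi_label_vector(item) for item in judge_sets]
print(multi_label_mse(human, judge))  # 1/24, about 0.0417
```

Because each vector records how often every option is endorsed, a lower MSE rewards a judge for reproducing the spread of acceptable answers, not just for matching a single majority-vote label as a forced-choice agreement metric would.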

Validating LLM-as-a-Judge Systems under Rating Indeterminacy

A Framework for Meta-Evaluation under Rating Indeterminacy