Advances in machine learning have let models accept much larger inputs. This post introduces BABILong, a benchmark for testing NLP models on long documents. It evaluates how well generative models handle long contexts and pick out the relevant details. The research team has also conducted an analysis