LongICLBench Benchmark: Evaluating Large Language Models on Long In-Context Learning for Extreme-Label Classification

TLDR: Large language models (LLMs) have become better at processing long text sequences, but they still struggle to understand long inputs and to perform complex tasks over them. To measure this gap, researchers have introduced the LongICLBench benchmark, which evaluates how well LLMs perform long in-context learning on extreme-label classification tasks.

