University of Zurich researchers are releasing Ranke-4B, a family of 4-billion-parameter language models trained from scratch on 80B tokens of historical data, each with a knowledge cutoff at a specific date (1913, 1929, 1933, 1939, or 1946). Unlike modern LLMs prompted to roleplay a historical period, these models are truly time-locked: they cannot draw on information from after their cutoff dates because it does not exist in their training data. The models serve as research tools for the humanities and social sciences, enabling exploration of historical discourse patterns and worldviews without hindsight contamination. The project will publicly release its pretraining data, checkpoints, and code, with a responsible access framework for handling sensitive historical content.

From github.com
