MaxText is a high-performance, scalable LLM written in pure Python/JAX. It targets TPUs and GPUs for both training and inference, and aims to be a launching-off point for ambitious LLM projects: it achieves high-performance training and scales to tens of thousands of chips. MaxText is heavily inspired by MinGPT/NanoGPT.

7 min read · From github.com
Table of contents

- Overview
- Table of Contents
- Getting Started
- Runtime Performance Results
- Comparison to Alternatives
- Features and Diagnostics
