RecurrentGemma is an open-weights language model from Google DeepMind, based on the Griffin architecture. It combines local attention with linear recurrences, which keeps inference fast when generating long sequences. The Flax implementation is recommended for most users. The repository includes the model implementation, examples for sampling and fine-tuning, and dependency management with Poetry.
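To illustrate why mixing local attention with linear recurrences helps with long sequences, here is a minimal sketch in plain Python. It is not RecurrentGemma's code or API: the function `linear_recurrence_step`, the class `LocalWindowCache`, the fixed decay value `0.9`, and the window size are all hypothetical names and values chosen for illustration. The point it shows is that both pieces of per-step state stay bounded no matter how many tokens are generated.

```python
from collections import deque


def linear_recurrence_step(h, x, a):
    """One step of an elementwise linear recurrence: h_t = a * h_{t-1} + (1 - a) * x_t.

    Simplified stand-in for a recurrent layer; the state h has a fixed size.
    """
    return [a_i * h_i + (1.0 - a_i) * x_i for h_i, x_i, a_i in zip(h, x, a)]


class LocalWindowCache:
    """Keeps only the most recent `window` entries, as a local-attention cache might."""

    def __init__(self, window):
        self.items = deque(maxlen=window)  # older entries are dropped automatically

    def append(self, item):
        self.items.append(item)

    def __len__(self):
        return len(self.items)


def run(num_steps, dim=4, window=8):
    """Process `num_steps` dummy inputs and return the final state and cache length."""
    h = [0.0] * dim
    a = [0.9] * dim  # hypothetical fixed decay; a real model would compute gates from the input
    cache = LocalWindowCache(window)
    for t in range(num_steps):
        x = [float(t)] * dim  # dummy input vector for step t
        h = linear_recurrence_step(h, x, a)
        cache.append(x)
    return h, len(cache)


if __name__ == "__main__":
    state, cache_len = run(1000)
    print(len(state), cache_len)
```

In this sketch, the recurrent state always has `dim` entries and the cache never holds more than `window` entries, so memory and per-token work do not grow with sequence length, unlike a global attention cache that grows with every generated token.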