GitHub - mattmireles/gemma-tuner-multimodal: Fine-tune Gemma 4 and 3n with audio, images and text on Apple Silicon, using PyTorch and Metal Performance Shaders.

An open-source toolkit for fine-tuning Google's Gemma 4 and 3n models on text, images, and audio with LoRA on Apple Silicon Macs, using PyTorch's MPS backend. It reads local CSV datasets and can stream from GCS and BigQuery, so large datasets can be used for training without copying terabytes to local disk. The project presents three differentiators: it is the only Apple-Silicon-native path for audio+text LoRA fine-tuning, it requires no NVIDIA GPU, and it includes a guided CLI wizard for setup. Supported modalities are text-only (instruction/completion), image+text (captioning/VQA), and audio+text. The toolkit loads Hugging Face checkpoints, trains with PEFT LoRA, and exports merged HF/SafeTensors weights.
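To make "exports merged weights" concrete, here is a minimal sketch of what merging a LoRA adapter means in the standard LoRA formulation: the low-rank update `B @ A`, scaled by `alpha / r`, is added to the frozen base weight. This is not the toolkit's code. The helper names (`matmul`, `merge_lora`) and the toy matrices are assumptions made purely for illustration, and plain Python lists stand in for tensors.

```python
# Minimal sketch of merging a LoRA adapter into a frozen weight matrix.
# All names (matmul, merge_lora, W, A, B, alpha, r) are illustrative
# assumptions, not identifiers from the toolkit.

def matmul(x, y):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*y)]
            for row in x]

def merge_lora(W, A, B, alpha, r):
    """Return W + (alpha / r) * (B @ A), the standard LoRA merge."""
    scale = alpha / r
    delta = matmul(B, A)  # shape (d_out, d_in), rank at most r
    return [[w + scale * d for w, d in zip(w_row, d_row)]
            for w_row, d_row in zip(W, delta)]

# Frozen 2x3 base weight (d_out=2, d_in=3) and a rank-1 adapter.
W = [[1.0, 0.0, 0.0],
     [0.0, 1.0, 0.0]]
B = [[1.0],
     [2.0]]             # d_out x r
A = [[0.5, 0.0, 1.0]]   # r x d_in

merged = merge_lora(W, A, B, alpha=2, r=1)
print(merged)
```

After the merge, the adapter matrices are no longer needed at inference time: the result is a single weight matrix with the same shape as the original, which is why merged weights can be saved in the same format as an ordinary checkpoint.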

#python

#deep-learning

#gemma

#lora

Apr 07 • 12m read time • From github.com

Table of contents

LoRA for Gemma 4 & 3n - why not just use…?
What you can build with this
Supported models
Architecture (what actually calls what)
Requirements
Installation
CLI cheat sheet
Text-only fine-tuning
Image fine-tuning
Gemma 3n / Gemma 4 on Apple Silicon
Data: CSVs, GCS, BigQuery
Training visualizer (optional)
NVIDIA Granary & streaming
Apple Silicon knobs
CI & tests
Experiment index
Troubleshooting
Contributing
Acknowledgments
License
