SpidR is a self-supervised speech representation model that learns linguistic units from unlabeled audio using masked prediction, self-distillation, and online clustering. The model can be pretrained in 15-24 hours on 16 GPUs and outperforms previous methods on language modeling tasks. The repository provides pretrained checkpoints (SpidR and DinoSR on LibriSpeech 960h), training code, and utilities for SLURM cluster deployment. Models require audio standardization and are available via PyPI or torch.hub.
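Since the models require standardized input audio, a minimal sketch of per-utterance standardization (zero mean, unit variance) is shown below. The function name `standardize` and the exact epsilon are illustrative assumptions, not the repository's API; consult the repository's documentation for the precise preprocessing it expects.

```python
import numpy as np

def standardize(waveform: np.ndarray) -> np.ndarray:
    """Standardize a mono waveform to zero mean and unit variance.

    Hypothetical helper: SpidR expects standardized audio, but the
    repository's own preprocessing may differ in detail.
    """
    waveform = waveform.astype(np.float32)
    # Small epsilon guards against division by zero on silent audio.
    return (waveform - waveform.mean()) / (waveform.std() + 1e-8)

# Example on a synthetic 1-second signal at 16 kHz (LibriSpeech's rate).
audio = np.sin(2 * np.pi * 440 * np.linspace(0, 1, 16000))
out = standardize(audio)
```

The same transform applies whether the checkpoint is loaded from PyPI or via `torch.hub`.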
