Meta introduces SAM Audio, a unified multimodal model that isolates sounds from complex audio mixtures using text, visual, or temporal prompts. Built on the Perception Encoder Audiovisual (PE-AV), it achieves state-of-the-art performance across speech, music, and general sound separation. The release includes SAM Audio-Bench