An AI-powered multimodal project employs several deep learning models, including transformers, to generate chords, beats, lyrics, melody, and tabs for songs taken from YouTube videos. The system includes U-Net for audio source separation, Pitch-Net for melody tracking, Beat-Net for tempo tracking, and Chord-Net for chord recognition. It supports multiple languages and can produce editable sheet music. By combining STFT, MFCC, and chroma features as model inputs, it aims to generalize better from minimal training data.
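The description does not say how these features are computed. Below is a minimal, standard-library-only sketch of two of them: a magnitude STFT, and a chroma profile built by folding STFT bin energies into the 12 pitch classes. It assumes equal temperament with A4 = 440 Hz (MIDI note 69). It uses a naive DFT only so that it stays self-contained; a real pipeline would use an FFT library such as NumPy or librosa. MFCCs would add a mel filterbank, a logarithm, and a DCT on top of the same STFT frames, and are omitted here for brevity. The function names are hypothetical and do not come from the project.

```python
import math

PITCH_CLASSES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]


def hann(n):
    """Periodic Hann window of length n."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / n) for i in range(n)]


def stft_magnitude(signal, n_fft=1024, hop=512):
    """Magnitude STFT using a naive DFT (O(n_fft^2) per frame, fine for a sketch).

    Returns a list of frames, each holding n_fft // 2 + 1 non-negative-frequency bins.
    """
    window = hann(n_fft)
    n_bins = n_fft // 2 + 1
    frames = []
    for start in range(0, len(signal) - n_fft + 1, hop):
        seg = [signal[start + i] * window[i] for i in range(n_fft)]
        mags = []
        for k in range(n_bins):
            re = im = 0.0
            for t, x in enumerate(seg):
                angle = 2 * math.pi * k * t / n_fft
                re += x * math.cos(angle)
                im -= x * math.sin(angle)
            mags.append(math.hypot(re, im))
        frames.append(mags)
    return frames


def chroma(frames, sr, n_fft, fmin=27.5):
    """Fold per-bin energy into 12 pitch classes (assumes A4 = 440 Hz = MIDI 69)."""
    out = []
    for mags in frames:
        c = [0.0] * 12
        for k, m in enumerate(mags):
            freq = k * sr / n_fft
            if freq < fmin:
                continue  # skip DC and bins below A0
            midi = 69 + 12 * math.log2(freq / 440.0)
            c[round(midi) % 12] += m * m  # accumulate energy, not magnitude
        peak = max(c) or 1.0
        out.append([v / peak for v in c])  # normalise so the strongest class is 1.0
    return out


if __name__ == "__main__":
    sr, n_fft = 7040, 512  # chosen so that 440 Hz falls exactly on bin 32
    tone = [math.sin(2 * math.pi * 440.0 * t / sr) for t in range(2 * n_fft)]
    frames = stft_magnitude(tone, n_fft=n_fft, hop=n_fft // 2)
    profile = chroma(frames, sr, n_fft)
    best = max(range(12), key=lambda i: profile[0][i])
    print(PITCH_CLASSES[best])  # A
```

For a pure 440 Hz tone, the chroma profile peaks at A. The two neighbouring bins leak a quarter of the peak energy into G# and A#, which shows why chord and pitch models usually see smoothed or harmonically weighted chroma rather than raw bin folding.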