Source Code & Drawings: https://github.com/Andreaswt/audio-cnn
Discord & More: https://andreastrolle.com
Modal: https://modal.com/andreas

Hi 🤙 In this video, you'll learn to train and deploy an audio classification CNN from scratch with PyTorch. I'll cover all the required concepts, so no prior experience is needed. The model will classify sounds like a dog barking or birds chirping from an audio file. You'll work with advanced techniques like Residual Networks (ResNet), data mixing, and Mel Spectrograms to build a robust training pipeline. Afterwards, we'll build a dashboard using Next.js and React to upload audio and visualize the model's internal layers to see what it "sees". The project uses Python, PyTorch, Next.js, React, and Tailwind, based on the T3 Stack. You can build along with me from start to finish. All services used are 100% free for you to use.

Features
🧠 Deep Audio CNN for sound classification
🧱 ResNet-style architecture with residual blocks
🎼 Mel Spectrogram audio-to-image conversion
🎛️ Data augmentation with Mixup & Time/Frequency Masking
⚡ Serverless GPU inference with Modal
📊 Interactive Next.js & React dashboard
👁️ Visualization of internal CNN feature maps
📈 Real-time audio classification with confidence scores
🌊 Waveform and Spectrogram visualization
🚀 FastAPI inference endpoint
⚙️ Optimized training with AdamW & OneCycleLR scheduler
📈 TensorBoard integration for training analysis
🛡️ Batch Normalization for stable & fast training
🎨 Modern UI with Tailwind CSS & Shadcn UI
✅ Pydantic data validation for robust API requests
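
To make the Mixup item above concrete, here is a minimal sketch in plain Python (no PyTorch), not the code from the video. The function names `mixup` and `sample_lambda` and the default `alpha=0.2` are assumptions chosen for illustration. The idea is to blend two training examples and their one-hot labels with the same weight.

```python
import random


def sample_lambda(alpha=0.2):
    """Draw a mixing weight in [0, 1] from Beta(alpha, alpha).

    alpha=0.2 is an illustrative default, not a value taken from the video.
    """
    return random.betavariate(alpha, alpha)


def mixup(x_a, x_b, y_a, y_b, lam):
    """Blend two examples and their one-hot labels with weight lam."""
    x_mixed = [lam * a + (1.0 - lam) * b for a, b in zip(x_a, x_b)]
    y_mixed = [lam * a + (1.0 - lam) * b for a, b in zip(y_a, y_b)]
    return x_mixed, y_mixed


# Two tiny "spectrogram rows" with one-hot labels for classes 0 and 1.
x, y = mixup([1.0, 2.0], [3.0, 4.0], [1.0, 0.0], [0.0, 1.0], lam=0.25)
print(x, y)
```

In a real pipeline the same blend would be applied to whole batches of spectrogram tensors, and the loss would be weighted by the same `lam`.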

Dataset: https://github.com/karolpiczak/ESC-50

Timestamps
0:00 Demo
2:08 Neural Networks
28:21 CNNs
01:17:28 CNN hyperparameters
01:28:43 Audio in CNNs
01:39:06 Model architecture
01:54:36 Implementing network
02:19:46 Training program
03:52:12 Training
03:56:09 TensorBoard
04:00:10 Inference endpoint
04:59:22 Frontend
06:27:24 Visualization discussion
06:35:26 Results
06:36:27 Exercises

A comprehensive tutorial covering the complete process of building a convolutional neural network from scratch using PyTorch to classify audio files. The guide starts with neural network fundamentals including neurons, activation functions, and training concepts like forward pass, backpropagation, and loss optimization. It then dives deep into CNN theory, explaining kernels, feature maps, spatial information preservation, and how CNNs extract hierarchical features from images. The practical implementation includes converting audio to spectrograms, training on serverless GPUs with Modal, achieving 83% accuracy, and building a Next.js frontend to visualize the model's convolutional layer outputs and feature extraction process.
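
As a rough illustration of how a kernel turns an input into a feature map, here is a dependency-free sketch in plain Python. It is not the tutorial's PyTorch code, and the name `conv2d_valid` is hypothetical. It slides the kernel over the input without padding or kernel flipping, which is the cross-correlation that deep learning libraries call convolution.

```python
def conv2d_valid(image, kernel):
    """Slide kernel over image (no padding, stride 1) and return the feature map."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    feature_map = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            # Weighted sum of the image patch under the kernel.
            acc = 0
            for di in range(kh):
                for dj in range(kw):
                    acc += image[i + di][j + dj] * kernel[di][dj]
            row.append(acc)
        feature_map.append(row)
    return feature_map


# A horizontal-difference kernel responds where values jump from 0 to 1.
image = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]
kernel = [[-1, 1]]
print(conv2d_valid(image, kernel))
```

The output is nonzero only at the column where the edge sits, which is the sense in which a feature map preserves spatial information.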

Train a Convolutional Neural Network from Scratch: PyTorch, Next.js, React, Tailwind, Python (2025)