Best of Deep Learning — November 2024

1
Video
3Blue1Brown·2y
Large Language Models explained briefly
The video explains what large language models (LLMs) are, how they work, and why training them is so demanding. An LLM predicts the next word in a sequence by assigning a probability to each candidate word, and it learns those probabilities from vast amounts of text. Transformers, introduced in 2017, process text in parallel rather than one word at a time, which makes computation more efficient. Pre-training is followed by reinforcement learning with human feedback to refine the model's predictions. The scale of the data and computation involved is enormous, and training relies on specialized hardware such as GPUs.
177
3
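The next-word prediction described in the summary above can be illustrated with a minimal sketch. The probability table here is a hypothetical, hand-written stand-in; a real LLM computes these probabilities from its learned weights, and nothing below comes from the video itself.

```python
import random

# Hypothetical toy distribution over the next word for the context
# "the cat sat on the". A real LLM would compute these values.
NEXT_WORD_PROBS = {"mat": 0.6, "sofa": 0.25, "roof": 0.1, "moon": 0.05}


def sample_next_word(probs, rng):
    """Draw one word at random, weighted by its probability."""
    words = list(probs)
    weights = [probs[w] for w in words]
    return rng.choices(words, weights=weights, k=1)[0]


def most_likely_word(probs):
    """Greedy choice: always return the highest-probability word."""
    return max(probs, key=probs.get)


rng = random.Random(0)  # fixed seed so repeated runs draw the same samples
greedy = most_likely_word(NEXT_WORD_PROBS)
samples = [sample_next_word(NEXT_WORD_PROBS, rng) for _ in range(5)]
print(greedy, samples)
```

Sampling instead of always taking the most likely word is what lets the same prompt produce different continuations.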
2
Article
gitconnected·2y
Let’s Build our own GPT Model from Scratch with PyTorch
Learn how to build a basic Generative Pre-trained Transformer (GPT) model from scratch using PyTorch. This tutorial covers auto-regressive models, character-level tokenization, data batching, and training on text in the style of William Shakespeare. It provides a detailed implementation of a bigram language model, including multi-head attention, the forward pass and training loop, and the generation of new text tokens.
43
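Character-level tokenization and the bigram idea mentioned in the summary above can be sketched in plain Python. This is a count-based illustration under stated assumptions, not the tutorial's PyTorch model: the sample text and the helper names (`encode`, `decode`, `most_likely_next`) are hypothetical.

```python
from collections import Counter, defaultdict

text = "to be or not to be"  # hypothetical sample text

# Character-level tokenization: every distinct character gets an integer id.
chars = sorted(set(text))
stoi = {ch: i for i, ch in enumerate(chars)}
itos = {i: ch for ch, i in stoi.items()}


def encode(s):
    """Map a string to a list of integer token ids."""
    return [stoi[ch] for ch in s]


def decode(ids):
    """Map a list of token ids back to a string."""
    return "".join(itos[i] for i in ids)


# Bigram statistics: count how often each character follows another.
ids = encode(text)
bigram_counts = defaultdict(Counter)
for current_id, next_id in zip(ids, ids[1:]):
    bigram_counts[current_id][next_id] += 1


def most_likely_next(ch):
    """Return the character that most often follows `ch` in the text."""
    next_id, _ = bigram_counts[stoi[ch]].most_common(1)[0]
    return itos[next_id]


print(decode(encode(text)), most_likely_next("t"))
```

A neural bigram model learns a table with the same role as `bigram_counts`, but as trainable parameters rather than raw counts.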
3
Article
daily.dev·2y
Project Sauron: building a two-tower retrieval model for personalized recommendations at daily.dev
Project Sauron by daily.dev uses two-tower retrieval models to deliver personalized content to developers, significantly boosting engagement metrics. The model employs deep learning to process user and post features, creating highly relevant recommendations. Efforts are ongoing to improve the model's accuracy and address concerns such as diversity in recommendations and cold-start user issues.
27
2
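The retrieval step of a two-tower model, as described in the summary above, can be sketched as a dot product between a user embedding and each post embedding. The vectors and post ids below are hypothetical stand-ins for the outputs of the two towers; this is not daily.dev's implementation.

```python
def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


# Hypothetical embeddings standing in for the outputs of the user tower
# and the post tower. In a real system both are produced by neural networks.
user_embedding = [0.9, 0.1, 0.0]
post_embeddings = {
    "rust-async-guide": [0.8, 0.2, 0.1],
    "css-grid-tips": [0.0, 0.1, 0.9],
    "go-concurrency": [0.7, 0.0, 0.2],
}


def retrieve_top_k(user_vec, posts, k):
    """Rank posts by similarity to the user vector and keep the best k."""
    ranked = sorted(posts, key=lambda pid: dot(user_vec, posts[pid]), reverse=True)
    return ranked[:k]


top_posts = retrieve_top_k(user_embedding, post_embeddings, k=2)
print(top_posts)
```

Because the two towers only meet at the dot product, post embeddings can be computed ahead of time and searched quickly when a user arrives.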
4
Article
Daily Dose of Data Science | Avi Chawla | Substack·2y
Categorization of Clustering Algorithms
The post provides an overview of six different types of clustering algorithms beyond the commonly known KMeans. These include centroid-based, connectivity-based, density-based, graph-based, distribution-based, and compression-based algorithms. The visual summary highlights key features and examples like DBSCAN and Gaussian Mixture Models. Additionally, the post promotes an open-source framework called Dynamiq for developing AI applications with AI Agents and LLMs, designed to streamline complex workflows.
22
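As a point of reference for the centroid-based category named in the summary above, here is a minimal one-dimensional k-means sketch. The data points and starting centroids are hypothetical, and the other categories (density-based, distribution-based, and so on) use different mechanics that this sketch does not cover.

```python
def kmeans_1d(points, centroids, iterations=10):
    """Centroid-based clustering on 1-D data with fixed starting centroids."""
    clusters = [[] for _ in centroids]
    for _ in range(iterations):
        # Assignment step: put each point in the cluster of its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda i: abs(p - centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster.
        centroids = [
            sum(c) / len(c) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
    return centroids, clusters


points = [1.0, 1.2, 0.8, 9.0, 9.5, 10.5]  # hypothetical data
centroids, clusters = kmeans_1d(points, centroids=[0.0, 5.0])
print(centroids, clusters)
```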
5
Article
Towards AI·2y
An Introduction to PyTorch versus TensorFlow for Deep Learning
PyTorch and TensorFlow are the most popular frameworks in the deep learning community. They provide reusable building blocks for defining neural network architectures and run the underlying computations efficiently on GPUs. Without these frameworks, deep learning models have to be coded from scratch with NumPy, which is more cumbersome and slower because it lacks GPU acceleration. Familiarity with these frameworks makes developing neural networks significantly easier.
21
2
6
Article
Daily Dose of Data Science | Avi Chawla | Substack·2y
A Hands-on Demo of Autoencoders
Autoencoders are powerful tools in machine learning, useful for tasks such as dimensionality reduction, anomaly detection, data denoising, and detecting multivariate covariate shift. The post provides a hands-on demo that uses PyTorch Lightning to train an autoencoder and explains the roles of its two key components, the encoder and the decoder. It shows how to implement and train the model, along with training conveniences such as epoch and batch iteration, checkpoint saving, and multi-GPU support. The post presents autoencoders as a practical way to address covariate shift in real-world ML models.
17
1
7
Article
Medium·2y
The Softmax Activation Function with Keras
The Softmax activation function is essential for neural networks that perform multiclass classification. It converts logits, the raw outputs of the last layer of a neural network, into a discrete probability distribution over the target classes. Softmax ensures that the probabilities are nonnegative and sum to 1. During training, the model learns to raise the logit of the correct class relative to the others, which improves the accuracy of its class predictions. The post explains how Softmax works, why it matters in neural networks, and how to implement it in Keras.
15
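The conversion from logits to probabilities described in the summary above can be sketched in plain Python. This is an illustration of the formula only, not the post's Keras code; the logit values are hypothetical.

```python
import math


def softmax(logits):
    """Turn a list of logits into probabilities that sum to 1."""
    # Subtracting the maximum logit avoids overflow in exp() and does not
    # change the result, because softmax is invariant to constant shifts.
    largest = max(logits)
    exps = [math.exp(x - largest) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


logits = [2.0, 1.0, 0.1]  # hypothetical outputs of a network's last layer
probs = softmax(logits)
print(probs)
```

The largest logit always receives the largest probability, so the predicted class is the same before and after the conversion.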
8
Article
Community Picks·2y
varungodbole/prompt-tuning-playbook: A playbook for effectively prompting post-trained LLMs
This playbook by Varun Godbole and Ellie Pavlick provides strategies and best practices for effectively prompting post-trained large language models (LLMs). It covers the concepts of pre-training vs. post-training, considerations for creating prompts, the importance of human annotation, and how to iterate on system instructions. The guide also emphasizes the empirical nature of prompt engineering and offers insights into making instructions clear and concise for better model performance.
13
3
9
Article
Machine Learning Mastery·2y
Mastering the Art of Hyperparameter Tuning: Tips, Tricks, and Tools
Machine learning models rely on hyperparameters, which are configuration values set before training rather than learned from the data, and choosing them well has a direct effect on model performance. Effective hyperparameter tuning is challenging because the number of possible combinations is vast. Grid search and random search are the techniques most commonly used to find good settings. Additional strategies such as cross-validation, early stopping, and domain knowledge can further improve the tuning process. Automated methods such as Bayesian optimization balance exploration and exploitation, which makes the search more efficient.
12
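Grid search and random search, both named in the summary above, can be contrasted in a short sketch. The `validation_score` function is a hypothetical stand-in for training a model and scoring it on held-out data, and the search ranges are assumptions chosen for illustration.

```python
import itertools
import math
import random


def validation_score(learning_rate, batch_size):
    """Hypothetical objective that peaks at learning_rate=0.01, batch_size=64."""
    lr_term = (math.log10(learning_rate) + 2) ** 2
    batch_term = ((batch_size - 64) / 64) ** 2
    return -lr_term - batch_term


def grid_search(grid):
    """Evaluate every combination of the listed values and keep the best."""
    names = list(grid)
    candidates = [
        dict(zip(names, values))
        for values in itertools.product(*grid.values())
    ]
    return max(candidates, key=lambda params: validation_score(**params))


def random_search(n_trials, rng):
    """Evaluate randomly drawn settings and keep the best."""
    candidates = [
        {
            # Learning rates are sampled log-uniformly between 1e-4 and 1e-1.
            "learning_rate": 10 ** rng.uniform(-4, -1),
            "batch_size": rng.choice([32, 64, 128]),
        }
        for _ in range(n_trials)
    ]
    return max(candidates, key=lambda params: validation_score(**params))


grid = {"learning_rate": [0.1, 0.01, 0.001], "batch_size": [32, 64, 128]}
best_grid = grid_search(grid)
best_random = random_search(n_trials=9, rng=random.Random(0))
print(best_grid, best_random)
```

Both searches here spend nine evaluations. Grid search only ever tries three distinct learning rates, while random search tries nine, which is one reason random search is often preferred when only a few hyperparameters matter.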
