Best of Deep Learning: June 2024

  1. Article · Medium

    Building an AI Text-to-Video Model from Scratch Using Python

    This post walks through building an AI text-to-video model from scratch in Python. It explains how GANs work, describes the GAN architecture and training process, and shows how to generate videos from text prompts.

  2. Article · freeCodeCamp

    Practical Guide to Linear Algebra in Data Science and AI

    Linear algebra is a practical tool for solving real-world problems in data science and AI. It is applied across many industries, and its core concepts are essential for working with machine learning, deep learning, computer vision, and generative AI. The article provides a 2024 linear algebra roadmap to guide your learning, along with a list of resources for mastering the subject.

  3. Article · Machine Learning Mastery

    5 Free YouTube Channels Dedicated to Machine Learning Education

    Discover five YouTube channels that offer free, high-quality tutorials on machine learning, data science, and programming: StatQuest with Josh Starmer, Codebasics, freeCodeCamp, Sentdex, and Data School. Their content ranges from beginner to advanced and covers machine learning algorithms, Python programming, statistical analysis, and deep learning. The post stresses that hands-on practice is essential for effective learning.

  4. Video · Community Picks

    Let's reproduce GPT-2 (124M)

    This video walks through reproducing the GPT-2 (124M) model: loading the pretrained weights, implementing the model from scratch, and generating text. It also introduces the Tiny Shakespeare dataset, shows how to use it for training, and demonstrates how to compute the loss and run optimization in PyTorch.
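    To make the loss calculation concrete, here is a minimal stdlib-only sketch of the next-token cross-entropy a GPT-style model is trained on. The logits and targets are toy values, not GPT-2 outputs; in PyTorch the equivalent is `torch.nn.functional.cross_entropy` applied to the logits and target token ids.

```python
import math

def log_softmax(logits):
    """Numerically stable log-softmax over a list of floats."""
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_sum for x in logits]

def next_token_loss(logits_per_position, targets):
    """Mean cross-entropy: the average of -log p(target) over all positions.

    logits_per_position[t] holds the vocabulary logits predicted at position t,
    and targets[t] is the id of the token that actually comes next.
    """
    total = 0.0
    for logits, target in zip(logits_per_position, targets):
        total -= log_softmax(logits)[target]
    return total / len(targets)

# Toy example: vocabulary of 4 tokens, sequence of 3 positions.
# Uniform logits give a loss of ln(4), roughly where an untrained model starts.
uniform = [[0.0] * 4 for _ in range(3)]
print(round(next_token_loss(uniform, [1, 2, 3]), 4))  # 1.3863
```

    The same reasoning gives a sanity check for GPT-2's 50,257-token vocabulary: a freshly initialized model should start near ln(50257), about 10.8.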

  5. Article · Towards Data Science

    Understanding Transformers

    Transformers, introduced in 2017, revolutionized sequence transduction models. Because they rely entirely on the attention mechanism, they can process all positions of a sequence in parallel, which significantly improves training efficiency and the handling of long-range dependencies compared with earlier models such as RNNs, LSTMs, and CNNs. Key components of a transformer include tokenization, embeddings, the attention mechanism, the encoder, and the decoder. GPT models, which derive from the transformer, focus on generative tasks and omit the encoder stack; after pre-training on large text corpora, they are highly effective at tasks such as text generation.
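    The attention mechanism at the core of the transformer can be sketched in plain Python as scaled dot-product attention, softmax(Q K^T / sqrt(d_k)) V. This is a single-head, unmasked sketch with toy matrices; GPT-style decoders additionally apply a causal mask so each token attends only to earlier positions, and real implementations compute all rows at once as matrix multiplications.

```python
import math

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def attention(Q, K, V):
    """Scaled dot-product attention: softmax(Q K^T / sqrt(d_k)) V.

    Q, K, V are lists of row vectors (one row per token). Each query row is
    handled independently, which is why all positions can run in parallel.
    """
    d_k = len(K[0])
    outputs = []
    for q in Q:
        # Similarity of this query with every key, scaled by sqrt(d_k).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k) for k in K]
        weights = softmax(scores)
        # Output is the attention-weighted average of the value vectors.
        outputs.append([sum(w * v[j] for w, v in zip(weights, V)) for j in range(len(V[0]))])
    return outputs

Q = [[1.0, 0.0]]
K = [[1.0, 0.0], [0.0, 1.0]]
V = [[10.0, 0.0], [0.0, 10.0]]
out = attention(Q, K, V)
print(out)  # the query matches the first key, so the first value dominates
```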

  6. Article · The New Stack

    RAG vs. Fine-Tuning Models: What’s the Right Approach?

    Retrieval-Augmented Generation (RAG) retrieves relevant documents and uses them to generate contextually accurate responses, which makes it well suited to dynamic environments such as enterprise search and customer support. Fine-tuning trains a model on a specific dataset for a specialized task, yielding consistent, improved performance on targeted applications. The choice between RAG and fine-tuning depends on whether you need adaptability or task-specific expertise.
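    The retrieval half of RAG can be sketched as follows. This is a toy that assumes bag-of-words cosine similarity in place of the learned embedding model and vector store a production system would use, and it stops at building the augmented prompt (the LLM call is omitted). Fine-tuning, by contrast, changes the model's weights rather than its input.

```python
import math
import re
from collections import Counter

def bag_of_words(text):
    """Lowercased word counts; a stand-in for a real embedding model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a, b):
    dot = sum(a[w] * b[w] for w in a)  # Counter returns 0 for missing words
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def retrieve(query, documents, k=1):
    """Return the k documents most similar to the query."""
    q = bag_of_words(query)
    ranked = sorted(documents, key=lambda d: cosine(q, bag_of_words(d)), reverse=True)
    return ranked[:k]

def build_prompt(query, documents, k=1):
    """Augment the user question with retrieved context before calling an LLM."""
    context = "\n".join(retrieve(query, documents, k))
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"

docs = [
    "Refunds are processed within 5 business days.",
    "Our office is open Monday to Friday.",
]
print(build_prompt("How long do refunds take?", docs))
```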

  7. Article · Daily Dose of Data Science | Avi Chawla | Substack

    4 Strategies for Multi-GPU Training

    This post discusses four strategies for multi-GPU training: model parallelism, tensor parallelism, data parallelism, and pipeline parallelism.
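    Data parallelism is the easiest of the four to illustrate without hardware. In the sketch below, each simulated "device" holds a full copy of the model, computes the gradient on its own shard of the batch, and the per-device gradients are averaged, which is what an all-reduce does. The 1-D linear model and the data are toy assumptions; with equal-sized shards, the averaged gradient matches the full-batch gradient exactly. Model, tensor, and pipeline parallelism instead split the model itself across devices.

```python
def grad_mse(w, xs, ys):
    """Gradient of mean squared error for the 1-D model y_hat = w * x."""
    return sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)

def data_parallel_grad(w, xs, ys, num_devices):
    """Simulate data parallelism. Assumes the batch splits evenly across devices."""
    shard = len(xs) // num_devices
    # Each device computes a local gradient on its own slice of the batch.
    grads = [
        grad_mse(w, xs[i * shard:(i + 1) * shard], ys[i * shard:(i + 1) * shard])
        for i in range(num_devices)
    ]
    # All-reduce: average the local gradients so every replica applies the same update.
    return sum(grads) / num_devices

xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]
print(grad_mse(0.5, xs, ys), data_parallel_grad(0.5, xs, ys, num_devices=2))  # -22.5 -22.5
```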

  8. Article · Medium

    Meet HUSKY: A New Agent Optimized for Multi-Step Reasoning

    HUSKY is a new open-source language agent developed by Meta AI, the Allen Institute for AI, and the University of Washington. It is designed for complex tasks involving numerical, tabular, and knowledge-based reasoning, and it works in two stages: generating the next action, then executing it with an expert model. HUSKY alternates between these stages until the task is solved. It was trained and evaluated on a variety of datasets and performs competitively against frontier models such as GPT-4.
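    The generate-then-execute loop described above can be sketched as follows. Everything here is hypothetical: `EXPERTS`, `solve`, and `scripted_generator` are illustrative names, the experts are tiny Python functions rather than HUSKY's specialized models, and the action generator is a hard-coded script standing in for the LLM.

```python
import math
from typing import Callable

# Hypothetical expert modules; in HUSKY these would be specialized models.
EXPERTS: dict[str, Callable[[str], str]] = {
    "add": lambda args: str(sum(float(a) for a in args.split(","))),
    "multiply": lambda args: str(math.prod(float(a) for a in args.split(","))),
}

def solve(task: str, generate_action: Callable[[str, list], tuple[str, str]],
          max_steps: int = 10) -> str:
    """Alternate between (1) generating the next action and (2) executing it
    with the matching expert, until the generator emits 'finish'."""
    history: list[tuple[str, str, str]] = []  # (tool, input, output) per step
    for _ in range(max_steps):
        tool, tool_input = generate_action(task, history)
        if tool == "finish":
            return tool_input
        output = EXPERTS[tool](tool_input)
        history.append((tool, tool_input, output))
    raise RuntimeError("step budget exhausted")

def scripted_generator(task, history):
    """Hypothetical stand-in for the action-generator LLM, solving (2 + 3) * 4."""
    if not history:
        return "add", "2,3"
    if len(history) == 1:
        return "multiply", f"{history[0][2]},4"
    return "finish", history[-1][2]

print(solve("What is (2 + 3) * 4?", scripted_generator))  # 20.0
```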

  9. Article · Hacker News

    From Scratch - Generative Adversarial Networks

    Generative Adversarial Networks (GANs) are a generative AI method that trains two models simultaneously: a Generator (G) and a Discriminator (D). G learns to generate samples from a given distribution, while D learns to distinguish real samples from generated ones. During training, D is updated to maximize the probability of classifying samples correctly, and G is updated to maximize the probability that D makes a mistake. In the post's implementation, the Discriminator has four linear layers with dropout and ReLU activations.
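    The two competing objectives can be written as loss functions over the discriminator's output probabilities. This is a minimal sketch, assuming the widely used non-saturating generator loss rather than the post's exact code; in PyTorch both terms are usually computed with `torch.nn.BCELoss`.

```python
import math

def discriminator_loss(d_real, d_fake):
    """Binary cross-entropy for D: push D(x) toward 1 on real samples and
    D(G(z)) toward 0 on generated ones. Inputs are D's output probabilities."""
    real_term = sum(-math.log(p) for p in d_real) / len(d_real)
    fake_term = sum(-math.log(1 - p) for p in d_fake) / len(d_fake)
    return real_term + fake_term

def generator_loss(d_fake):
    """Non-saturating G loss: minimize -log D(G(z)), i.e. maximize the
    probability that D mistakes generated samples for real ones."""
    return sum(-math.log(p) for p in d_fake) / len(d_fake)

# When D cannot tell real from fake (outputs 0.5 everywhere), its loss is 2 ln 2.
print(round(discriminator_loss([0.5, 0.5], [0.5, 0.5]), 4))  # 1.3863
```

    A full training loop alternates a gradient step on `discriminator_loss` (updating only D) with a step on `generator_loss` (updating only G).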

  10. Article · GoPenAI

    Understanding Kolmogorov-Arnold Networks (KANs) and Their Application in Variational Autoencoders

    Kolmogorov-Arnold Networks (KANs) are based on the Kolmogorov-Arnold representation theorem, which states that any continuous function of several variables on a bounded domain can be represented using only one-dimensional continuous functions and addition. These networks could revolutionize neural network design, particularly for Variational Autoencoders (VAEs), by improving efficiency, interpretability, and flexibility. Their key technique is to represent the learnable one-dimensional functions with splines (piecewise polynomials). Although the post implements a standard VAE, it discusses how KAN layers could be incorporated and highlights future research directions for KAN-based models.
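    For reference, the theorem in its usual textbook form (our notation, not necessarily the post's): every continuous f on the unit cube [0,1]^n can be written with continuous univariate inner functions φ_{q,p} and outer functions Φ_q as

```latex
f(x_1, \dots, x_n) = \sum_{q=0}^{2n} \Phi_q\!\left( \sum_{p=1}^{n} \phi_{q,p}(x_p) \right)
```

    A KAN turns this into a trainable architecture by making the univariate functions learnable splines and stacking such layers.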

  11. Article · Medium

    Want to Learn Quantization in The Large Language Model?

    This post is a detailed guide to quantization for large language models. It explains what quantization is, why it is needed, and what it gains you, covers methods such as asymmetric and symmetric quantization, and gives step-by-step PyTorch code for quantizing and de-quantizing model weight parameters.
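    The two schemes can be compared in a few lines. This sketch uses plain Python lists and per-tensor 8-bit quantization as an assumption; the post works with PyTorch tensors, and its exact rounding and clamping conventions may differ. Symmetric quantization maps values onto signed integers with a zero point of 0, while asymmetric quantization maps the [min, max] range onto unsigned integers using a scale and a zero point.

```python
def symmetric_quantize(weights, bits=8):
    """Map floats to signed ints in [-(2^(b-1) - 1), 2^(b-1) - 1], zero point 0.
    Assumes the weights are not all zero."""
    q_max = 2 ** (bits - 1) - 1              # 127 for int8
    scale = max(abs(w) for w in weights) / q_max
    return [round(w / scale) for w in weights], scale

def asymmetric_quantize(weights, bits=8):
    """Map [min, max] onto unsigned ints [0, 2^b - 1] with a scale and zero point."""
    q_min, q_max = 0, 2 ** bits - 1          # 0..255 for uint8
    w_min, w_max = min(weights), max(weights)
    scale = (w_max - w_min) / (q_max - q_min)
    zero_point = round(q_min - w_min / scale)
    q = [min(max(round(w / scale) + zero_point, q_min), q_max) for w in weights]
    return q, scale, zero_point

def dequantize(q, scale, zero_point=0):
    """Recover approximate floats: w is approximately scale * (q - zero_point)."""
    return [scale * (v - zero_point) for v in q]

weights = [-1.2, 0.0, 0.4, 2.54]
q_sym, s = symmetric_quantize(weights)
q_asym, s_a, zp = asymmetric_quantize(weights)
print(q_sym, dequantize(q_sym, s))
print(q_asym, zp, dequantize(q_asym, s_a, zp))
```

    The asymmetric scheme uses the full 0 to 255 range even when the weights are skewed, which typically gives a smaller scale (finer resolution) than the symmetric scheme at the cost of storing a zero point.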

  12. Article · Machine Learning News

    Meet Tsinghua University’s GLM-4-9B-Chat-1M: An Outstanding Language Model Challenging GPT 4V, Gemini Pro (on vision), Mistral and Llama 3 8B

    Tsinghua University's GLM-4-9B is a powerful language model that, according to the post, outperforms GPT-4 and Gemini on a range of benchmarks. It supports multi-round dialogue, code execution, web browsing, and more, and the GLM-4-9B-Chat-1M variant handles a context length of up to 1 million tokens. The post credits GLM-4-9B with a versatile architecture, strong performance on vision tasks, and higher overall accuracy than existing models, and sees opportunities for it in natural language processing, computer vision, and code generation. Its release marks a milestone for language models and sets a new benchmark for open-source models.

  13. Article · Towards AI

    TensorFlow: The Hidden Gem of Data Science

    TensorFlow is an open-source machine learning framework that lets data scientists build, train, and evaluate sophisticated models. Its advantages include production-level scalability, interoperable graph export, and support for low-level operations across multiple hardware accelerators. Companies such as Airbnb, Google, and Intel use TensorFlow. Even so, it is sometimes overlooked because of its perceived complexity, competition from other frameworks, and a preference for higher-level APIs.

  14. Article · Medium

    The Math Behind KAN — Kolmogorov-Arnold Networks

    Discover the math behind Kolmogorov-Arnold Networks (KANs), a revolutionary alternative to Multi-Layer Perceptrons (MLPs) in the world of AI and neural networks. Learn about the limitations of MLPs, how KANs leverage the Kolmogorov-Arnold representation theorem, and the advantages of using KANs in terms of accuracy, interpretability, and scalability.
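    The structural difference from an MLP can be shown with a tiny KAN layer: learnable one-dimensional functions sit on the edges, and each output node simply sums them. This is a minimal sketch, assuming degree-1 (piecewise-linear) splines with hand-set knot values for readability; the KAN formulation uses higher-order B-splines whose coefficients are learned by gradient descent, which is omitted here.

```python
import bisect

class PiecewiseLinear:
    """A learnable 1-D function: values at fixed grid points, linearly
    interpolated in between (a degree-1 spline). Inputs outside the grid
    are clamped to the end values."""
    def __init__(self, grid, values):
        self.grid = grid        # sorted x-coordinates of the knots
        self.values = values    # learnable y-values at the knots

    def __call__(self, x):
        g, v = self.grid, self.values
        if x <= g[0]:
            return v[0]
        if x >= g[-1]:
            return v[-1]
        i = bisect.bisect_right(g, x) - 1
        t = (x - g[i]) / (g[i + 1] - g[i])
        return (1 - t) * v[i] + t * v[i + 1]

def kan_layer(x, phis):
    """Output j is the sum over inputs i of phis[j][i](x[i]). An MLP would
    instead apply a fixed activation to a weighted sum at each node."""
    return [sum(phi(xi) for phi, xi in zip(row, x)) for row in phis]

grid = [-2.0, -1.0, 0.0, 1.0, 2.0]
square = PiecewiseLinear(grid, [4.0, 1.0, 0.0, 1.0, 4.0])  # approximates x^2
identity = PiecewiseLinear(grid, grid[:])                   # exactly x on [-2, 2]
# One output node computing approximately x0^2 + x1.
print(kan_layer([1.5, 0.5], [[square, identity]]))  # [3.0]
```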

  15. Article · Daily Dose of Data Science | Avi Chawla | Substack

    An Intuitive Guide to Non-Linearity of ReLU

    The post explains how the ReLU activation function introduces non-linearity into neural networks and how combinations of ReLU units can capture non-linear curves. It also emphasizes that several ReLU units are needed to approximate such curves well.
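    The idea can be checked numerically. Each ReLU unit contributes one "hinge", so a sum of shifted, scaled ReLUs is a piecewise-linear curve, and adding units adds bends. The target function x^2 and the knot positions below are illustrative choices, not taken from the post.

```python
def relu(x):
    return max(0.0, x)

def abs_via_relu(x):
    # Two ReLU units already give a non-linear function: |x| = ReLU(x) + ReLU(-x).
    return relu(x) + relu(-x)

def approx_square(x, knots=(0.0, 1.0, 2.0, 3.0)):
    """Piecewise-linear approximation of x**2 on [0, 3] built as a sum of
    ReLU hinges: each extra unit bends the line at one more knot."""
    y = 0.0
    prev_slope = 0.0
    for a, b in zip(knots, knots[1:]):
        slope = (b * b - a * a) / (b - a)       # secant slope of x^2 on [a, b]
        y += (slope - prev_slope) * relu(x - a)  # add only the change in slope
        prev_slope = slope
    return y

print([approx_square(x) for x in (0.5, 1.5, 2.5)])  # [0.5, 2.5, 6.5]
```

    The approximation is exact at the knots and off by at most 0.25 between them on this grid; more knots (more ReLU units) shrink that error, which is the point the post makes about needing multiple units.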

  16. Article · Machine Learning News

    Knock Knock: A New Python Library to Get a Notification when Your Training is Complete with just Two Additional Lines of Code

    KnockKnock is a Python library that sends automated notifications when deep learning model training completes or crashes. With just two additional lines of code, users receive real-time alerts, which makes long training runs easier to monitor without watching the terminal.
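    The underlying pattern is a decorator around the training function. This is a hypothetical sketch of that pattern, not KnockKnock's actual API: `notify_when_done` and the `send` callback are illustrative names, and in practice `send` would deliver the message through a channel such as email or a chat app.

```python
import functools
import time
import traceback

def notify_when_done(send):
    """Decorator factory (hypothetical): call send(message) when the wrapped
    training function finishes or crashes. `send` is any callable taking a str."""
    def decorator(train_fn):
        @functools.wraps(train_fn)
        def wrapper(*args, **kwargs):
            start = time.time()
            try:
                result = train_fn(*args, **kwargs)
            except Exception:
                # Report the crash with its traceback, then re-raise it.
                send(f"{train_fn.__name__} crashed after {time.time() - start:.1f}s:\n"
                     f"{traceback.format_exc()}")
                raise
            send(f"{train_fn.__name__} finished in {time.time() - start:.1f}s, result: {result}")
            return result
        return wrapper
    return decorator

messages = []

@notify_when_done(send=messages.append)
def train(epochs):
    return {"epochs": epochs, "loss": 0.1}

train(3)
print(messages[0])
```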

  17. Article · GoPenAI

    Yoga-LLM, Part 2: Instruction Fine-tuning

    This tutorial fine-tunes a large language model (LLM) to answer questions about yoga. The process uses instruction tuning with parameter-efficient methods such as LoRA. The post discusses the choice of tools and frameworks, including HuggingFace, Unsloth, and LitGPT, and details the implementation steps, including data preparation and training-parameter settings. It also covers inference, using the fine-tuned Gemma 2B model to demonstrate the result.
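    LoRA, the parameter-efficient method mentioned above, can be sketched without a framework: the pre-trained weight W stays frozen and only a low-rank update B A is trained, giving W x + (alpha / r) B A x. The rank `r`, scaling `alpha`, and initialization (small Gaussian A, zero B) follow the usual LoRA convention but are assumptions here, not settings from the post; libraries such as HuggingFace PEFT implement this for real models.

```python
import random

def matvec(M, x):
    return [sum(m * xi for m, xi in zip(row, x)) for row in M]

class LoRALinear:
    """Frozen weight W (d_out x d_in) plus a trainable low-rank update B @ A,
    with A: r x d_in and B: d_out x r. Output: W x + (alpha / r) * B (A x)."""
    def __init__(self, W, r=2, alpha=4, seed=0):
        d_out, d_in = len(W), len(W[0])
        rng = random.Random(seed)
        self.W = W                                            # frozen pre-trained weights
        self.A = [[rng.gauss(0, 0.01) for _ in range(d_in)] for _ in range(r)]
        self.B = [[0.0] * r for _ in range(d_out)]            # zero init: layer starts as W x
        self.scaling = alpha / r

    def __call__(self, x):
        base = matvec(self.W, x)
        delta = matvec(self.B, matvec(self.A, x))
        return [b + self.scaling * d for b, d in zip(base, delta)]

    def trainable_params(self):
        return len(self.A) * len(self.A[0]) + len(self.B) * len(self.B[0])

W = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]
layer = LoRALinear(W, r=1)
print(layer([1.0, 2.0, 3.0, 4.0]), layer.trainable_params(), "trainable vs", 4 * 4)
```

    The savings grow with layer size: for a 4096 x 4096 weight with r = 8, LoRA trains 2 * 4096 * 8 = 65,536 parameters instead of 16,777,216, about 0.4%.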

  18. Article · GoPenAI

    Supervised fine tuning (SFT) of Microsoft Phi2 for Text2SQL Task (Part II)

    This article discusses the supervised fine-tuning of the Microsoft Phi2 model for the Text2SQL task. It covers data preparation, loading the dataset, preparing input, loading the pre-trained model, setting up a data collator, model training, model evaluation, and concludes with potential areas for improvement.
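    The data-collator step typically pads each batch and masks the prompt tokens so the loss is computed only on the SQL answer. The sketch below assumes the prompt and answer have already been tokenized (the ids are made up), uses a hypothetical `collate` function rather than the post's code, and relies on -100, the default `ignore_index` of PyTorch's `CrossEntropyLoss`.

```python
IGNORE_INDEX = -100  # default ignore_index of torch.nn.CrossEntropyLoss

def collate(examples, pad_id=0):
    """examples: list of (prompt_ids, answer_ids) pairs.
    Returns padded input_ids, attention_mask and labels; labels keep only the
    answer tokens and mask out prompt and padding positions."""
    max_len = max(len(p) + len(a) for p, a in examples)
    batch = {"input_ids": [], "attention_mask": [], "labels": []}
    for prompt, answer in examples:
        ids = prompt + answer
        pad = max_len - len(ids)
        batch["input_ids"].append(ids + [pad_id] * pad)
        batch["attention_mask"].append([1] * len(ids) + [0] * pad)
        batch["labels"].append([IGNORE_INDEX] * len(prompt) + answer + [IGNORE_INDEX] * pad)
    return batch

batch = collate([([11, 12, 13], [21, 22]), ([14, 15], [23])])
print(batch["labels"])  # [[-100, -100, -100, 21, 22], [-100, -100, 23, -100, -100]]
```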