Best of LLM: October 2024

  1. Article
    Machine Learning Mastery · 2y

    7 LLM Projects to Boost Your Machine Learning Portfolio

    Explore seven interesting projects designed to enhance your machine learning portfolio with large language models (LLMs). From creating a retrieval-based Q&A app and an LLM-powered workflow automation agent to developing a text-to-SQL query generator and an AI-powered documentation generator for codebases, the guide covers essential components and integration requirements. Gain hands-on experience with vector databases, frameworks, and APIs, and build innovative applications that simplify complex tasks.

  2. Video
    freeCodeCamp · 1y

    Generative AI for Developers – Comprehensive Course

    This comprehensive course on generative AI covers essential concepts like large language models, data preprocessing, and advanced techniques such as fine-tuning and prompt engineering. Through hands-on projects using tools like Hugging Face, OpenAI, and LangChain, participants will learn to build real-world applications including text summarization and custom chatbots. The course also delves into vector databases, AI pipelines, and deployment techniques using platforms like Google Cloud Vertex AI and AWS Bedrock.

  3. Article
    Quastor Daily · 2y

    How Stripe synchronizes time across their distributed system

    Stripe employs both physical and logical clocks to maintain accurate timekeeping in their distributed billing system. Physical clocks are synchronized using protocols like NTP and PTP to avoid drift, while logical clocks order events without depending on real-world time, using methods like Vector and Lamport Clocks. Stripe combines these approaches with hybrid logical clocks to ensure accurate billing and simulate future events for testing purposes. Additional highlights include a deep dive on caching in system design, eBay's use of LLMs for developer productivity, and Eventbrite’s CSRF defense mechanisms.
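
    The logical-clock half of that story is simple to sketch. Below is a minimal Lamport clock in Python, an illustration of the general technique rather than anything from Stripe's codebase:

```python
class LamportClock:
    """Minimal Lamport logical clock: orders events without wall-clock time."""

    def __init__(self):
        self.time = 0

    def local_event(self):
        # Any local event advances the clock by one tick.
        self.time += 1
        return self.time

    def send(self):
        # A send is a local event; the message carries the new timestamp.
        self.time += 1
        return self.time

    def receive(self, msg_time):
        # On receive, jump past both our own clock and the message's timestamp,
        # so the receive event is ordered after the send event.
        self.time = max(self.time, msg_time) + 1
        return self.time


# Two nodes exchanging a message: the receiver's timestamp always lands
# after the sender's, preserving causal order.
a, b = LamportClock(), LamportClock()
t_send = a.send()
t_recv = b.receive(t_send)
```

    Vector clocks extend this by keeping one counter per node, which lets them detect concurrent events that a single Lamport counter cannot distinguish.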

  4. Article
    LangChain · 2y

    Memory for agents

    Memory is pivotal in enhancing the performance and user experience of AI agents by storing information about previous interactions. This concept is application-specific, meaning different agents will remember different things based on their use case. The main types of memory—procedural, semantic, and episodic—mimic human memory. Each type serves a distinct purpose, from recalling how to perform tasks to storing facts or remembering past interactions. Methods to update agent memory include 'in the hot path', 'in the background', and user feedback. LangChain integrates these concepts to improve its functionalities, providing robust tools for developers.
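
    The three memory types can be pictured with a toy store. A hedged sketch, not LangChain's actual API (its memory primitives are considerably richer):

```python
from dataclasses import dataclass, field


@dataclass
class AgentMemory:
    """Toy store mirroring the three memory types described above."""
    semantic: dict = field(default_factory=dict)    # facts about the user/world
    episodic: list = field(default_factory=list)    # past interactions
    procedural: dict = field(default_factory=dict)  # how to perform tasks

    def remember_fact(self, key, value):
        # 'In the hot path': memory is updated during the interaction itself.
        self.semantic[key] = value

    def log_interaction(self, user_msg, agent_msg):
        # Episodic memory could instead be written 'in the background'
        # by a job that summarizes finished conversations.
        self.episodic.append({"user": user_msg, "agent": agent_msg})

    def recall(self, key, default=None):
        return self.semantic.get(key, default)


mem = AgentMemory()
mem.remember_fact("name", "Ada")
mem.log_interaction("Hi, I'm Ada", "Nice to meet you, Ada!")
```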

  5. Video
    Tech With Tim · 2y

    Advanced Multi-Agent AI App Walkthrough (Python, Langflow, Streamlit & More!)

    This post provides a walkthrough on building an advanced multi-agent AI application using Python, Langflow, Streamlit, and other tools. The application can handle multiple tasks with different language models and integrates a full front end for interaction. Key technologies include Langflow for low-code AI flows, Streamlit for front-end development, and Astra DB as a vector database to implement retrieval-augmented generation features. The tutorial offers a comprehensive guide to setting up the application, integrating AI features, customizing flows, and connecting with different tools and APIs.

  6. Article
    The Developing Dev · 2y

    The Case Against Documentation

    While documentation is often seen as essential, it comes with several drawbacks such as high maintenance costs and the risk of becoming outdated. Documentation should be evaluated for its return on investment (ROI). Certain types of documentation, like high-level system diagrams and popular API documentation, can be more beneficial. Inline documentation is preferred for its ease of maintenance. Advances in large language models (LLMs) may soon provide dynamic, up-to-date explanations directly from the codebase.

  7. Article
    neo4j · 2y

    Turn Your CSVs Into Graphs Using LLMs

    The post explores how large language models (LLMs) can assist in creating data models from CSV files for use in Neo4j, emphasizing iterative approaches to avoid data complexity distractions. It discusses using LangChain, prompt engineering for generating consistent outputs, and converting CSV data into Cypher statements for Neo4j. The post also highlights important considerations like adding unique identifiers and creating data import scripts, offering a step-by-step methodology to streamline the process.
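
    The CSV-to-Cypher step can be illustrated without an LLM in the loop. A minimal sketch (the post has the model generate these statements; here the mapping is hand-written, with a hypothetical `Person` label):

```python
import csv
import io


def rows_to_cypher(csv_text, label, id_column):
    """Turn CSV rows into idempotent Cypher MERGE statements.

    MERGE on a unique identifier (rather than CREATE) is what makes the
    import script safe to re-run, echoing the post's advice to add
    unique IDs before generating statements.
    """
    statements = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        props = ", ".join(f"{k}: {v!r}" for k, v in row.items() if k != id_column)
        statements.append(
            f"MERGE (n:{label} {{id: {row[id_column]!r}}}) SET n += {{{props}}}"
        )
    return statements


csv_text = "id,name,city\n1,Ada,London\n2,Grace,Arlington\n"
for stmt in rows_to_cypher(csv_text, "Person", "id"):
    print(stmt)
```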

  8. Article
    Product Hunt · 2y

    DevKit 3.0 - Now with VSCode Extension & the latest LLMs

    DevKit 3.0 adds a VSCode extension and support for the latest large language models (LLMs). The highly rated tool first launched in January 2022, and the 3.0 release was featured in October 2024.

  9. Article
    Firebase Developers · 2y

    Persisting LLM chat history to Firestore

    Learn how to persist chat history for LLM-powered applications using Firestore. The post walks through maintaining in-memory chat history with LangChain's RunnableWithMessageHistory and transitioning to persistent storage using Firestore. Users will understand how to enhance their chat applications by saving session history, making conversations more context-aware and meaningful.
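
    The persistence pattern itself is independent of the SDK. A rough sketch with an in-memory stand-in for Firestore (the post uses LangChain's real Firestore integration; `FakeFirestore` here is purely illustrative):

```python
class FakeFirestore:
    """Stand-in for a Firestore collection, used only to keep this
    sketch self-contained; the post talks to real Firestore."""

    def __init__(self):
        self.docs = {}

    def set(self, doc_id, data):
        self.docs[doc_id] = data

    def get(self, doc_id):
        return self.docs.get(doc_id, {"messages": []})


class PersistentChatHistory:
    """Loads a session's messages on start, writes back on every turn."""

    def __init__(self, store, session_id):
        self.store, self.session_id = store, session_id
        self.messages = store.get(session_id)["messages"]

    def add(self, role, content):
        self.messages.append({"role": role, "content": content})
        self.store.set(self.session_id, {"messages": self.messages})


db = FakeFirestore()
h = PersistentChatHistory(db, "session-1")
h.add("human", "Hello!")
h.add("ai", "Hi! How can I help?")
# A later process reading the same session_id sees the full history,
# which is exactly what makes the conversation context-aware.
h2 = PersistentChatHistory(db, "session-1")
```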

  10. Article
    neo4j · 2y

    New GraphAcademy Course: Building Knowledge Graphs With LLMs

    Discover how to build and query knowledge graphs using large language models (LLMs) in the new GraphAcademy course. Learn to convert unstructured data into structured, insightful graphs using Neo4j LLM Graph Builder and Python. This hands-on course covers setting schemas, interpreting results, and developing retrievers, requiring a solid understanding of Neo4j, LLM integration, and Cypher.

  11. Article
    Data Science Central · 1y

    LLM Chunking, Indexing, Scoring and Agents, in a Nutshell

    The post explains key concepts in AI and LLM terminology: chunking, indexing, agents, and relevancy scores. Chunking involves splitting a text corpus into small entities, indexing assigns identifiers for efficient retrieval, agents determine the type of information retrieved for a user prompt, and relevancy scores help rank the most relevant text entities in response to queries. The implementation examples are based on the author's experience with the xLLM system, tailored towards enhancing performance in enterprise contexts.
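
    The mechanical pieces, chunking, indexing, and relevancy scoring, can be sketched in a few lines. A naive word-overlap score stands in for xLLM's actual scoring, which is considerably more involved:

```python
def chunk(corpus, size=40):
    """Chunking: split a text corpus into small entities of roughly
    `size` characters, breaking on word boundaries."""
    words, chunks, current = corpus.split(), [], ""
    for w in words:
        if current and len(current) + 1 + len(w) > size:
            chunks.append(current)
            current = w
        else:
            current = f"{current} {w}".strip()
    if current:
        chunks.append(current)
    return chunks


def build_index(chunks):
    """Indexing: assign each chunk an identifier for efficient retrieval."""
    return {i: c for i, c in enumerate(chunks)}


def relevancy_scores(index, query):
    """Score each chunk by word overlap with the query, highest first;
    a crude proxy for real relevancy scoring."""
    q = set(query.lower().split())
    return sorted(
        ((len(q & set(c.lower().split())), i) for i, c in index.items()),
        reverse=True,
    )


chunks = chunk("one two three four five six", size=10)
index = build_index(chunks)
```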

  12. Article
    Hacker News · 2y

    microsoft/BitNet: Official inference framework for 1-bit LLMs

    bitnet.cpp is a new framework designed for the fast and efficient inference of 1-bit LLMs like BitNet b1.58. It supports CPU inference with plans for NPU and GPU support. Benchmark results show significant speedups and energy reductions on ARM and x86 CPUs. This framework can run large models like the 100B BitNet b1.58 model on a single CPU, making it suitable for local device execution. Installation and setup instructions are provided for various platforms.
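
    The "1.58-bit" idea is ternary weights {-1, 0, +1} sharing one scale per tensor. A toy absmean quantizer in plain Python, only to show the arithmetic; bitnet.cpp's packed low-level kernels are where the speedups actually come from:

```python
def absmean_ternary_quantize(weights, eps=1e-8):
    """Quantize a list of weights to {-1, 0, +1} with one shared scale,
    following the absmean scheme described for BitNet b1.58:
    scale by the mean absolute value, then round and clip to ternary."""
    scale = sum(abs(w) for w in weights) / len(weights) + eps
    q = [max(-1, min(1, round(w / scale))) for w in weights]
    return q, scale


def dequantize(q, scale):
    """Recover approximate weights: each ternary value times the scale."""
    return [v * scale for v in q]


w = [0.9, -0.4, 0.05, -1.2]
q, s = absmean_ternary_quantize(w)
```

    Because every weight is one of three values, matrix multiplication reduces to additions and subtractions of scaled activations, which is what enables the reported CPU speedups and energy savings.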

  13. Article
    Towards Data Science · 2y

    Scaling RAG from POC to Production

    Retrieval Augmented Generation (RAG) is becoming a key architecture for large-scale applications of AI, balancing the capabilities of large language models with the accuracy of indexed data. Scaling from a proof of concept (POC) to production presents multiple challenges, including performance, data management, and risk mitigation. Addressing these challenges involves architectural components such as scalable vector databases, caching mechanisms, advanced search techniques, and a Responsible AI layer. Strategic planning and integration into existing workflows are crucial for successful scaling.

  14. Article
    Towards Data Science · 2y

    How to Choose the Architecture for Your GenAI Application

    Choosing the right architecture for a GenAI application involves balancing creativity and risk. The guide offers a framework with eight architectural patterns: generating each time, response/prompt caching, pre-generated templates, small language models, assembled reformat, ML selection of template, fine-tuning, and implementing guardrails. These approaches help manage cost, latency, and risk while meeting specific use case requirements.
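
    The response/prompt-caching pattern, for one, fits in a few lines. A sketch with a hypothetical `fake_llm` standing in for the expensive model call:

```python
import hashlib


class ResponseCache:
    """Serve repeated prompts from a cache instead of regenerating,
    trading freshness and variety for lower cost and latency."""

    def __init__(self, generate):
        self.generate = generate  # the expensive LLM call
        self.store = {}
        self.hits = 0

    def ask(self, prompt):
        # Light normalization so trivially different prompts share a key;
        # production systems often cache on embeddings instead.
        key = hashlib.sha256(prompt.strip().lower().encode()).hexdigest()
        if key in self.store:
            self.hits += 1
            return self.store[key]
        self.store[key] = self.generate(prompt)
        return self.store[key]


calls = []


def fake_llm(prompt):
    calls.append(prompt)
    return f"answer to: {prompt}"


cache = ResponseCache(fake_llm)
cache.ask("What is RAG?")
cache.ask("what is rag?   ")  # normalizes to the same key: cache hit
```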

  15. Article
    Codemotion · 2y

    Generative AI Prompt Patterns for Software Engineering

    Developers are transitioning to roles like Prompt Engineers and Code Reviewers as large language models (LLMs) enhance their coding capabilities. Integrating AI tools such as AWS Bedrock for efficient and scalable solutions is crucial. Key generative AI prompt patterns include Full-Context Code Analysis, LLM Method Replacement, Context Reducer, and Comments Replacement, which improve code quality and streamline workflows. Collaboration between human expertise and AI is essential for advanced, adaptable software development.

  16. Article
    It's FOSS · 2y

    I Ran 9 Popular LLMs on Raspberry Pi 5; Here's What I Found

    The Raspberry Pi 5, equipped with a 4-core Cortex-A76 CPU, up to 8GB of RAM, and a VideoCore VI GPU, was used to test various large language models (LLMs) for their efficiency and performance. Models tested include Phi-3.5B, Gemma2-2B, Qwen2.5-3B, Mistral-7B, and Llama 2-7B, among others. Key metrics were inference time, accuracy, and resource utilization. Notably, models under 7 billion parameters generally performed well, with specific strengths found in different LLMs such as Qwen2.5's speed and Gemma2's efficiency. The results highlight the Pi's capability to handle AI tasks given the proper model selection.

  17. Article
    Hacker News · 2y

    homebrewltd/ichigo: Llama3.1 learns to Listen

    Ichigo, previously known as llama3-s, is a custom-built early-fusion speech model with improved multiturn capabilities and the ability to refuse inaudible queries. This model was rebranded and continues to evolve with cleaner data and enhanced functionality. It leverages techniques inspired by Meta's Chameleon paper and incorporates noise-synthetic data for better user experience. The project is open for collaboration and aims to advance text-based LLMs to have native listening capabilities.

  18. Article
    NVIDIA Developer · 2y

    An Introduction to Model Merging for LLMs

    Model merging combines the weights of multiple customized LLMs to optimize resource use and enhance model performance. Techniques such as Model Soup, SLERP, Task Arithmetic, TIES-Merging, and DARE are explored to provide various strategies for effective model merging. This approach reduces experimentation waste and offers cost-effective alternatives for training, making it a valuable method for increasing the utility of LLMs.
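
    The simplest of these, a uniform Model Soup, just averages checkpoints elementwise. A toy sketch over plain Python lists; real merges operate on framework tensors:

```python
def model_soup(state_dicts):
    """Uniform 'Model Soup': average the weights of several fine-tuned
    checkpoints that share an architecture, key by key."""
    n = len(state_dicts)
    return {
        k: [sum(sd[k][i] for sd in state_dicts) / n
            for i in range(len(state_dicts[0][k]))]
        for k in state_dicts[0]
    }


# Two hypothetical fine-tunes of the same tiny "model":
a = {"layer.w": [1.0, 2.0]}
b = {"layer.w": [3.0, 4.0]}
merged = model_soup([a, b])  # {"layer.w": [2.0, 3.0]}
```

    SLERP differs in that it interpolates along the sphere between two checkpoints rather than along a straight line, while TIES-Merging and DARE prune or rescale conflicting parameter deltas before combining them.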

  19. Article
    freeCodeCamp · 2y

    Which Tools to Use for LLM-Powered Applications: LangChain vs LlamaIndex vs NIM

    Considering building an application with a Large Language Model? LangChain, LlamaIndex, and NVIDIA NIM offer unique features to help you. LangChain is versatile for developing applications with data-aware and agent-driven components. LlamaIndex excels in data indexing and retrieval, optimizing how large language models access and process information. NVIDIA NIM focuses on high-performance model deployment, offering scalable and secure solutions. Each tool's strengths cater to different aspects of LLM application development, making your choice dependent on your specific needs, be it flexible integration, efficient data handling, or fast and secure deployment.

  20. Article
    NVIDIA Developer · 2y

    Evaluating Medical RAG with NVIDIA AI Endpoints and Ragas

    Retrieval-augmented generation (RAG) is revolutionizing the medical field by combining large language models with external knowledge retrieval to provide accurate and contextually relevant information. This hybrid approach is particularly beneficial in drug discovery and clinical trial screening. However, evaluating RAG systems for medical applications poses unique challenges, including scalability, lack of benchmarks, and the need for domain-specific metrics. The post discusses using LangChain NVIDIA AI endpoints and the Ragas evaluation framework to address these challenges, with a detailed tutorial on setting up and evaluating medical RAG systems using a synthetic dataset.

  21. Article
    PyTorch · 1y

    Deploying LLMs with TorchServe + vLLM

    The vLLM engine is optimized for executing large language models and can be easily deployed using the vllm serve command. For production environments, it's beneficial to pair vLLM with TorchServe, which offers essential features like custom metrics and model versioning. The post outlines steps to deploy the Llama-3.1-70B model using a custom Docker image, showcasing configurations for efficient GPU utilization and asynchronous request handling. Key functionalities of vLLM, such as PagedAttention and continuous batching, are highlighted along with the integration process with TorchServe.
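
    Continuous batching is easiest to see with a toy scheduler. This sketch models only the scheduling idea, with "tokens needed" as the unit of work; vLLM does this per decode step on the GPU, alongside PagedAttention's block-based KV-cache management:

```python
from collections import deque


def continuous_batching(requests, max_batch=2):
    """Toy continuous-batching scheduler: every step advances each active
    request by one token, retires finished requests immediately, and pulls
    waiting requests into freed slots, so no slot idles waiting for the
    longest sequence in a batch to finish."""
    waiting = deque(requests)  # each request: (request_id, tokens_needed)
    active, finished, steps = {}, [], 0
    while waiting or active:
        # Fill free slots before the next decode step.
        while waiting and len(active) < max_batch:
            rid, need = waiting.popleft()
            active[rid] = need
        steps += 1
        for rid in list(active):
            active[rid] -= 1  # one decode step for each active request
            if active[rid] == 0:
                finished.append(rid)
                del active[rid]  # slot freed this very step

    return finished, steps


done, steps = continuous_batching([("a", 1), ("b", 3), ("c", 1)])
```

    With static batching, "c" could not start until the whole first batch (including the 3-token "b") finished; here it slips into "a"'s freed slot a step earlier.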

  22. Article
    Medium · 2y

    Building ReAct Agents from Scratch: A Hands-On Guide using Gemini

    ReAct (Reason + Act) is a framework for building AI agents that integrate reasoning and task execution. Using large language models like Gemini, ReAct agents dynamically solve problems by choosing appropriate tools and iteratively working toward solutions. This guide provides a detailed explanation of the framework and a step-by-step process for building a ReAct agent from scratch. It covers tool interaction, iterative problem-solving, and the benefits of combining reasoning with action within an LLM-centric framework.
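
    The core ReAct loop is small. A sketch with a scripted stand-in in the slot where Gemini would sit, so the control flow is visible without a live model:

```python
def react_agent(question, llm, tools, max_steps=5):
    """Minimal ReAct loop: the model alternates Thought/Action steps,
    each Action runs a tool whose Observation is fed back into the
    transcript, until the model emits a final Answer."""
    transcript = f"Question: {question}\n"
    for _ in range(max_steps):
        step = llm(transcript)
        transcript += step + "\n"
        if step.startswith("Answer:"):
            return step.removeprefix("Answer:").strip()
        if step.startswith("Action:"):
            name, arg = step.removeprefix("Action:").strip().split(" ", 1)
            observation = tools[name](arg)
            transcript += f"Observation: {observation}\n"
    return None  # gave up within the step budget


# Scripted stand-in for the LLM: calls a tool once, then answers.
def scripted_llm(transcript):
    if "Observation:" not in transcript:
        return "Action: calculator 6*7"
    return "Answer: 42"


tools = {"calculator": lambda expr: eval(expr, {"__builtins__": {}})}
print(react_agent("What is 6*7?", scripted_llm, tools))  # 42
```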

  23. Article
    BigData Boutique blog · 2y

    Innovating Search Experience with Amazon OpenSearch and Amazon Bedrock

    Delivering an exceptional search experience is vital in digital marketplaces. This post discusses how to enhance search functionality using Amazon OpenSearch and Amazon Bedrock, focusing on overcoming synonym deficiencies with large language models (LLMs). Practical steps and approaches, including naive synonym expansion, semantic search with embeddings, query-time synonym expansion, and neural sparse retrieval, are shared, providing insight into their effectiveness in improving search accuracy and user experience.
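
    Query-time synonym expansion, one of the approaches listed, can be sketched with a hand-written synonym table; in the post, an LLM supplies the synonyms and OpenSearch executes the expanded query:

```python
def expand_query(query, synonyms):
    """Query-time synonym expansion: OR each term with its synonyms so a
    search for 'sofa' also matches documents that say 'couch'."""
    parts = []
    for term in query.lower().split():
        alts = synonyms.get(term, [])
        parts.append(f"({' OR '.join([term] + alts)})" if alts else term)
    return " AND ".join(parts)


# Hand-written table standing in for LLM-generated synonyms.
synonyms = {"sofa": ["couch", "settee"], "cheap": ["affordable", "budget"]}
print(expand_query("cheap sofa", synonyms))
# (cheap OR affordable OR budget) AND (sofa OR couch OR settee)
```

    Expanding at query time rather than index time means new synonyms take effect immediately, at the cost of slightly larger queries.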

  24. Article
    Community Picks · 2y

    Prompt Engineering

    Unlock the power of advanced prompt engineering techniques to improve AI model interactions and performance. Learn about various methods like Chain of Questions and Chain of Numerical Reasoning, and discover how fine-tuning and Retrieval-Augmented Generation (RAG) enhance AI capabilities. Dive into practical examples and interactive courses to master prompt crafting for more accurate and relevant AI responses.

  25. Article
    Towards Data Science · 2y

    Efficient Document Chunking Using LLMs: Unlocking Knowledge One Block at a Time

    Learn how to use Large Language Models (LLMs) like OpenAI's GPT-4o for efficient document chunking based on the concept of 'ideas.' The goal is to create blocks of text where each expresses a unified concept without overlapping. This involves parsing a document into manageable token sizes and then dividing these into coherent chunks. Key considerations include handling token limits and ensuring overlapping content is appropriately managed. The post provides practical steps and code examples to implement this method.
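
    The first, mechanical stage, cutting the document into pieces that fit a context window, might look like this; a whitespace tokenizer stands in for a real one, and the LLM-driven split into single-'idea' blocks is out of scope without a live model:

```python
def split_into_token_windows(text, max_tokens=50, tokenizer=str.split):
    """Cut a document into consecutive windows of at most `max_tokens`
    tokens, each small enough to hand to the LLM for the second stage
    (dividing the window into coherent, non-overlapping 'idea' chunks)."""
    tokens = tokenizer(text)
    return [
        " ".join(tokens[i:i + max_tokens])
        for i in range(0, len(tokens), max_tokens)
    ]
```

    A production version would use the model's own tokenizer and overlap adjacent windows slightly, so an idea straddling a window boundary is not silently cut in half.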