Best of Data Processing · September 2024

  1. Article
    Community Picks · 2y

    Building an Advanced RAG System With Self-Querying Retrieval

    Learn how to build an advanced Retrieval Augmented Generation (RAG) system that leverages self-querying retrieval to improve search relevance. This tutorial covers extracting metadata filters from natural language queries, combining metadata filtering with vector search, and generating structured outputs using LLMs. The guide focuses on developing an investment assistant to answer financial questions using MongoDB as the vector store and LangGraph for orchestration.
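The self-querying flow described above (extract metadata filters from the question, then run vector search only over the matching documents) can be sketched as follows. This is a minimal in-memory illustration, not the tutorial's code: it makes no MongoDB or LangGraph calls, `extract_filters` is a hypothetical rule-based stand-in for the LLM step, and the documents, tickers, and two-dimensional embeddings are made up for the example.

```python
import math
import re
from dataclasses import dataclass


@dataclass
class Doc:
    text: str
    metadata: dict
    embedding: list


def extract_filters(query: str) -> dict:
    """Hypothetical stand-in for the LLM step: pull metadata filters out of the query."""
    filters = {}
    year = re.search(r"\b(19|20)\d{2}\b", query)
    if year:
        filters["year"] = int(year.group(0))
    for ticker in ("AAPL", "MSFT"):  # assumed ticker vocabulary
        if re.search(rf"\b{ticker}\b", query.upper()):
            filters["ticker"] = ticker
    return filters


def cosine(a, b) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def self_query_search(query, query_embedding, docs, k=2):
    """Filter by extracted metadata first, then rank the survivors by similarity."""
    filters = extract_filters(query)
    candidates = [
        d for d in docs
        if all(d.metadata.get(key) == value for key, value in filters.items())
    ]
    ranked = sorted(
        candidates,
        key=lambda d: cosine(query_embedding, d.embedding),
        reverse=True,
    )
    return ranked[:k]


# Made-up corpus; a real system would store embeddings from an embedding model.
DOCS = [
    Doc("AAPL 2023 revenue grew on services", {"ticker": "AAPL", "year": 2023}, [0.9, 0.1]),
    Doc("AAPL 2022 supply chain risks", {"ticker": "AAPL", "year": 2022}, [0.2, 0.8]),
    Doc("MSFT 2023 cloud revenue", {"ticker": "MSFT", "year": 2023}, [0.95, 0.05]),
]

hits = self_query_search("What was AAPL revenue in 2023?", [1.0, 0.0], DOCS)
```

Without the filter step, the MSFT document would rank first for this query embedding; with it, only the AAPL 2023 document is considered, which is the relevance gain the summary refers to.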

  2. Article
    MDN Blog · 2y

    Efficient data handling with the Streams API

    The Streams API allows efficient data handling in JavaScript by enabling processing of data as it arrives, making it suitable for continuous data sources and real-time applications. Key concepts include chunks, backpressure, and piping, and the API includes abstractions like ReadableStream, WritableStream, and TransformStream. The post provides a practical example of building a Node.js application to transform data streams and explores various real-world use cases such as video streaming, data visualization, and file transfer systems.

  3. Article
    Community Picks · 2y

    AI engineering requires no academia or ML – just problem-solving

    AI engineering doesn't require academia or machine learning expertise. Tejas Kumar, an AI DevRel Engineer at DataStax, emphasizes that it involves applying AI to solve problems, often through AI API requests. Key techniques include fine-tuning, transfer learning, and optimizing model architecture to reduce costs. To mitigate AI hallucinations, Kumar recommends Retrieval-Augmented Generation (RAG); to ensure privacy, he recommends running models locally using tools like LLVM or llama.cpp. More insights will be shared at the Shift Conference in Zadar.

  4. Article
    The New Stack · 2y

    Boost LLM Results: When to Use Knowledge Graph RAG

    Retrieval-augmented generation (RAG) systems sometimes fail to go deep enough into document sets, leading to shallow or incorrect responses. Using knowledge graphs can enhance RAG systems by connecting related documents more effectively. This method is especially useful for legal documents, technical documentation, research publications, and interconnected websites. Knowledge graphs use well-defined connections like HTML links, specialized keywords, and document structures to improve information retrieval and accuracy.
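One way to picture the idea above is retrieval followed by expansion along explicit links, so that a document the query never matched directly can still reach the LLM. The sketch below is a toy under stated assumptions: keyword overlap stands in for vector search, and the document ids, texts, and `LINKS` table are invented for illustration rather than taken from the article.

```python
import re
from collections import deque

# Made-up documents and explicit links (e.g. HTML links or cross-references).
DOCS = {
    "contract": "Master services agreement; see the liability appendix.",
    "appendix": "Liability is capped at the fees paid in the prior year.",
    "pricing": "Fees are listed per seat per month.",
    "blog": "Company picnic photos.",
}
LINKS = {"contract": ["appendix"], "appendix": ["pricing"], "pricing": [], "blog": []}


def keyword_seeds(query, docs):
    """Stand-in for vector search: return ids of docs sharing a word with the query."""
    terms = set(re.findall(r"\w+", query.lower()))
    return [
        doc_id for doc_id, text in docs.items()
        if terms & set(re.findall(r"\w+", text.lower()))
    ]


def expand(seeds, links, max_hops=1):
    """Breadth-first expansion from the seed docs along links, up to max_hops."""
    seen = list(seeds)  # preserves discovery order
    queue = deque((seed, 0) for seed in seeds)
    while queue:
        node, hops = queue.popleft()
        if hops == max_hops:
            continue
        for neighbor in links.get(node, []):
            if neighbor not in seen:
                seen.append(neighbor)
                queue.append((neighbor, hops + 1))
    return seen


seeds = keyword_seeds("services agreement", DOCS)
context_ids = expand(seeds, LINKS, max_hops=1)
```

Here the query matches only the contract, but one hop along its link pulls in the appendix that actually states the liability cap. Raising `max_hops` trades more context for more noise.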

  5. Article
    Hacker News · 2y

    getzep/graphiti: Build and query dynamic, temporally-aware Knowledge Graphs

    Graphiti builds dynamic, temporally aware Knowledge Graphs that manage evolving relationships between entities over time. It supports the ingestion of both unstructured and structured data and offers hybrid search functionality combining semantic and full-text search. Designed for scalability, Graphiti can handle large datasets and is tailored for applications in sales, customer service, health, and finance. Essential requirements include Python 3.10+, Neo4j 5.21+, and an OpenAI API key for LLM inference and embedding.
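To make "temporally aware" concrete, the sketch below stores each relationship with a validity interval and answers questions as of a given date. It does not use Graphiti's API or Neo4j; `Edge`, `TemporalGraph`, and the rule that a new fact supersedes the previous one for the same source and relation are all assumptions made for this illustration.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class Edge:
    source: str
    relation: str
    target: str
    valid_from: date
    valid_to: Optional[date] = None  # None means the fact is still current


class TemporalGraph:
    """Hypothetical in-memory graph whose edges carry validity intervals."""

    def __init__(self):
        self.edges = []

    def add_fact(self, source, relation, target, when):
        # Assumption: a relation is single-valued, so a new fact closes the old one
        # instead of deleting it. History stays queryable.
        for edge in self.edges:
            if (edge.source == source and edge.relation == relation
                    and edge.valid_to is None):
                edge.valid_to = when
        self.edges.append(Edge(source, relation, target, when))

    def as_of(self, source, relation, when):
        """Return the targets that were valid for (source, relation) on `when`."""
        return [
            edge.target for edge in self.edges
            if edge.source == source
            and edge.relation == relation
            and edge.valid_from <= when
            and (edge.valid_to is None or when < edge.valid_to)
        ]


graph = TemporalGraph()
graph.add_fact("alice", "works_at", "Acme", date(2020, 1, 1))
graph.add_fact("alice", "works_at", "Globex", date(2023, 6, 1))
```

Querying `graph.as_of("alice", "works_at", date(2021, 1, 1))` returns the earlier employer, while a 2024 date returns the current one, because the old edge was closed rather than overwritten.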

  6. Article
    Hacker News · 2y

    feldera/feldera: The Feldera Incremental Computation Engine

    Feldera is a high-performance incremental computation engine capable of incrementally evaluating arbitrary SQL programs. It efficiently processes inserts, updates, and deletes without recomputing older data and supports both live and historical data queries. The engine offers fast out-of-the-box performance, handles large datasets, guarantees consistency, and connects to various data sources. It's suitable for complex analytical tasks and feature engineering pipelines.
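The core idea of processing inserts, updates, and deletes without recomputing older data can be illustrated with a single aggregate view. The sketch below is not Feldera's engine or API; `IncrementalSumView` is a hypothetical class that maintains the result of a query like `SELECT region, SUM(amount) ... GROUP BY region` by applying weighted changes (+1 for an insert, -1 for a delete, and an update expressed as a delete plus an insert).

```python
from collections import defaultdict


class IncrementalSumView:
    """Hypothetical view: per-key SUM(amount), updated from row-level changes only."""

    def __init__(self):
        self.totals = defaultdict(int)
        self.counts = defaultdict(int)  # live rows per key, to drop empty groups

    def apply(self, changes):
        """Apply (weight, key, amount) changes; return the keys whose output changed.

        weight is +1 for an inserted row and -1 for a deleted row.
        """
        touched = set()
        for weight, key, amount in changes:
            self.totals[key] += weight * amount
            self.counts[key] += weight
            touched.add(key)
        for key in touched:
            if self.counts[key] == 0:  # the group has no rows left
                del self.totals[key]
                del self.counts[key]
        return touched

    def snapshot(self):
        """Current contents of the view as a plain dict."""
        return dict(self.totals)


view = IncrementalSumView()
view.apply([(+1, "eu", 10), (+1, "eu", 5), (+1, "us", 7)])  # three inserts
view.apply([(-1, "eu", 5), (+1, "eu", 8)])                  # update 5 -> 8
```

Each call touches only the groups named in the change set, so the cost follows the size of the change rather than the size of the table. General SQL (joins, nested aggregates) needs considerably more machinery than this single-operator example.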