Best of Data Processing — September 2024

1
Article
Community Picks·2y
Building an Advanced RAG System With Self-Querying Retrieval
Learn how to build an advanced Retrieval Augmented Generation (RAG) system that leverages self-querying retrieval to improve search relevance. This tutorial covers extracting metadata filters from natural language queries, combining metadata filtering with vector search, and generating structured outputs using LLMs. The guide focuses on developing an investment assistant to answer financial questions using MongoDB as the vector store and LangGraph for orchestration.
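As a rough illustration of the self-querying idea summarized above, the sketch below splits a question into metadata filters and a semantic part, applies the filters first, then ranks what is left by similarity. Everything in it is an assumption made for illustration: the toy corpus, the `ticker` and `year` fields, the rule-based `extract_filters` (standing in for the LLM step), and the bag-of-words `embed` (standing in for a real embedding model). It does not use MongoDB or LangGraph.

```python
import math
import re
from collections import Counter

# Hypothetical toy corpus: each document has text plus metadata fields.
DOCS = [
    {"text": "Apple reported record iPhone revenue", "ticker": "AAPL", "year": 2023},
    {"text": "Apple revenue declined on weak iPad sales", "ticker": "AAPL", "year": 2022},
    {"text": "Microsoft cloud revenue grew strongly", "ticker": "MSFT", "year": 2023},
]

# Hypothetical lookup table; a real system would ask an LLM for structured output.
KNOWN_TICKERS = {"apple": "AAPL", "microsoft": "MSFT"}


def extract_filters(query: str) -> dict:
    """Stand-in for the LLM step: pull metadata filters out of the query."""
    filters = {}
    lowered = query.lower()
    for name, ticker in KNOWN_TICKERS.items():
        if name in lowered:
            filters["ticker"] = ticker
    year = re.search(r"\b(19|20)\d{2}\b", query)
    if year:
        filters["year"] = int(year.group())
    return filters


def embed(text: str) -> Counter:
    """Toy bag-of-words vector, used here in place of a real embedding."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[term] * b[term] for term in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def self_query_search(query: str, docs: list, top_k: int = 1):
    """Filter by extracted metadata, then rank survivors by similarity."""
    filters = extract_filters(query)
    candidates = [d for d in docs if all(d.get(k) == v for k, v in filters.items())]
    query_vec = embed(query)
    ranked = sorted(candidates, key=lambda d: cosine(query_vec, embed(d["text"])), reverse=True)
    return filters, ranked[:top_k]


filters, hits = self_query_search("How did Apple revenue do in 2023?", DOCS)
```

Filtering before ranking keeps documents from the wrong company or year out of the results, even when their wording closely matches the question.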
2
Article
MDN Blog·2y
Efficient data handling with the Streams API
The Streams API allows efficient data handling in JavaScript by enabling processing of data as it arrives, making it suitable for continuous data sources and real-time applications. Key concepts include chunks, backpressure, and piping, and the API includes abstractions like ReadableStream, WritableStream, and TransformStream. The post provides a practical example of building a Node.js application to transform data streams and explores various real-world use cases such as video streaming, data visualization, and file transfer systems.
3
Article
Community Picks·2y
AI engineering requires no academia or ML – just problem-solving
AI engineering doesn't require an academic background or machine learning expertise. Tejas Kumar, an AI DevRel Engineer at DataStax, emphasizes that it involves applying AI to solve problems, often through AI API requests. Key techniques include fine-tuning, transfer learning, and optimizing model architecture to reduce costs. To mitigate AI hallucinations, Kumar recommends Retrieval-Augmented Generation (RAG); to ensure privacy, he recommends running models locally using tools like LLVM or llama.cpp. More insights will be shared at the Shift Conference in Zadar.
4
Article
The New Stack·2y
Boost LLM Results: When to Use Knowledge Graph RAG
Retrieval-augmented generation (RAG) systems sometimes fail to go deep enough into document sets, leading to shallow or incorrect responses. Using knowledge graphs can enhance RAG systems by connecting related documents more effectively. This method is especially useful for legal documents, technical documentation, research publications, and interconnected websites. Knowledge graphs use well-defined connections like HTML links, specialized keywords, and document structures to improve information retrieval and accuracy.
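One way to picture the approach summarized above is to retrieve seed documents first and then follow explicit links outward to pull in connected context. The sketch below is a minimal illustration under stated assumptions: the document ids, texts, and `links` lists are invented, and keyword overlap stands in for vector search. It is not the article's implementation.

```python
from collections import deque

# Hypothetical document set; "links" models explicit connections such as HTML links.
DOCS = {
    "contract": {"text": "Master services agreement between the parties", "links": ["appendix-a"]},
    "appendix-a": {"text": "Appendix A defines the liability cap", "links": ["definitions"]},
    "definitions": {"text": "Definitions of terms used in all appendices", "links": []},
    "press": {"text": "Press release about a new office", "links": []},
}


def seed_matches(query: str, docs: dict) -> list:
    """Keyword overlap as a stand-in for the initial vector search."""
    terms = set(query.lower().split())
    return [doc_id for doc_id, doc in docs.items()
            if terms & set(doc["text"].lower().split())]


def expand(seeds: list, docs: dict, max_depth: int = 1) -> list:
    """Breadth-first traversal of links, up to max_depth hops from any seed."""
    visited = set()
    order = []
    queue = deque((seed, 0) for seed in seeds)
    while queue:
        doc_id, depth = queue.popleft()
        if doc_id in visited or doc_id not in docs:
            continue
        visited.add(doc_id)
        order.append(doc_id)
        if depth < max_depth:
            for linked in docs[doc_id]["links"]:
                queue.append((linked, depth + 1))
    return order


seeds = seed_matches("services agreement", DOCS)
one_hop = expand(seeds, DOCS, max_depth=1)
two_hops = expand(seeds, DOCS, max_depth=2)
```

Here the query only matches the contract directly, but traversal also surfaces the appendix and, at two hops, the shared definitions that a plain similarity search would miss.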
5
Article
Hacker News·2y
getzep/graphiti: Build and query dynamic, temporally-aware Knowledge Graphs
Graphiti builds dynamic, temporally aware Knowledge Graphs that manage evolving relationships between entities over time. It supports the ingestion of both unstructured and structured data and offers hybrid search functionality combining semantic and full-text search. Designed for scalability, Graphiti can handle large datasets and is tailored for applications in sales, customer service, health, and finance. Essential requirements include Python 3.10+, Neo4j 5.21+, and an OpenAI API key for LLM inference and embedding.
6
Article
Hacker News·2y
feldera/feldera: The Feldera Incremental Computation Engine
Feldera is a high-performance incremental computation engine capable of incrementally evaluating arbitrary SQL programs. It efficiently processes inserts, updates, and deletes without recomputing older data and supports both live and historical data queries. The engine offers fast out-of-the-box performance, handles large datasets, guarantees consistency, and connects to various data sources. It's suitable for complex analytical tasks and feature engineering pipelines.
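To make "incremental evaluation" concrete, the sketch below maintains the result of a grouped sum as changes arrive instead of rescanning all rows. It is a simplified illustration, not Feldera's API: the class name, the `(key, amount, weight)` change format, and the use of weight `+1` for an insert and `-1` for a delete are assumptions chosen for this example.

```python
from collections import defaultdict


class IncrementalGroupSum:
    """Maintains the result of SELECT key, SUM(amount) ... GROUP BY key under changes."""

    def __init__(self):
        self.totals = defaultdict(int)
        self.counts = defaultdict(int)

    def apply(self, changes):
        """Apply (key, amount, weight) changes; weight +1 inserts, -1 deletes."""
        for key, amount, weight in changes:
            self.totals[key] += weight * amount
            self.counts[key] += weight
            if self.counts[key] == 0:
                # No rows remain in this group, so it leaves the view.
                del self.totals[key]
                del self.counts[key]

    def view(self) -> dict:
        """Current query result, computed without touching older rows."""
        return dict(self.totals)


view = IncrementalGroupSum()
view.apply([("a", 10, 1), ("b", 5, 1), ("a", 7, 1)])
first = view.view()

# An update is modeled as deleting the old row and inserting the new one.
view.apply([("a", 10, -1), ("a", 12, 1)])
second = view.view()

view.apply([("b", 5, -1)])
third = view.view()
```

Each call to `apply` does work proportional to the size of the change, not the size of the table, which is the property the summary describes.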
