Daily Dose of Data Science | Avi Chawla | Substack

Two Skills to Fix the Context Gap in Claude Code

Claude Code has two context gaps that CLAUDE.md can't fix. The first is web scraping: web_fetch summarizes pages instead of returning raw content, and curl gets blocked by anti-bot systems. The second is backend integration: fragmented state discovery, unqueryable auth configs, and costly retry loops. Two open-source skills address these. Bright Data adds a four-tier scraping fallback with residential IPs, CAPTCHA solving, and pre-built extractors for 40+ platforms that return clean JSON. InsForge acts as a backend context engineering layer; in a RAG app test it cut token consumption from 10.4M to 3.7M (about 64% fewer tokens) with zero errors. A demo builds a Google Docs clone with real-time editing, Google OAuth, and AI chat from a single prompt, using a scraped YouTube tutorial as the build spec. The post also covers Blockify, an open-source RAG preprocessing engine that converts chunks into structured IdeaBlocks with metadata, reducing corpus size by 40x and improving retrieval relevance by 2.3x.
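The idea of a tiered scraping fallback can be pictured as an ordered chain of fetchers, where each tier is tried only if the previous one is blocked or returns nothing. The sketch below is a generic illustration of that pattern, not Bright Data's implementation or API: the summary does not name the four tiers, so the tier labels, the stub fetchers, and `fetch_with_fallback` are all hypothetical.

```python
from typing import Callable, Optional, Sequence, Tuple

# A fetcher takes a URL and returns page content, or None if it got nothing.
Fetcher = Callable[[str], Optional[str]]


class ScrapeError(Exception):
    """Raised when every tier in the chain has failed."""


def fetch_with_fallback(
    url: str, tiers: Sequence[Tuple[str, Fetcher]]
) -> Tuple[str, str]:
    """Try each (name, fetcher) pair in order; return the first non-empty result."""
    errors = []
    for name, fetch in tiers:
        try:
            content = fetch(url)
        except Exception as exc:  # a blocked or failed tier falls through to the next
            errors.append(f"{name}: {exc}")
            continue
        if content:
            return name, content
        errors.append(f"{name}: empty response")
    raise ScrapeError("; ".join(errors))


# Hypothetical stand-ins for real fetchers, used only to exercise the chain.
def blocked(url: str) -> Optional[str]:
    raise ConnectionError("blocked by anti-bot check")


def empty(url: str) -> Optional[str]:
    return None


def ok(url: str) -> Optional[str]:
    return "<html>raw page</html>"


tiers = [("tier_1", blocked), ("tier_2", empty), ("tier_3", ok), ("tier_4", ok)]
used_tier, page = fetch_with_fallback("https://example.com", tiers)
print(used_tier)  # tier_3: the first two tiers failed, the fourth was never tried
```

Ordering the tiers from cheapest to most expensive means the costlier options are only paid for when the simpler ones fail.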

#ai-agents

#crawling

#mcp

#rag

#claude-code

Apr 30 • 7m read time • From blog.dailydoseofds.com

Table of contents

Two skills to fix the context gap in Claude Code

Naive RAG vs Blockify
