Best of Crawling, January 2025

  1.
    Article
    Daily Dose of Data Science | Avi Chawla | Substack · 1y

    5 Agentic AI Design Patterns

    Explore five agentic AI design patterns that make AI agents more effective: reflection, tool use, ReAct (reason and act), planning, and multi-agent collaboration. Learn how Firecrawl Extract simplifies web scraping by turning plain-English prompts into clean, structured data. Discover additional resources on machine learning techniques and data science from Daily Dose of Data Science.

  2.
    Video
    Tech With Tim · 1y

    Web Scraping 101: A Million Dollar SaaS Idea

    The video explores a high-potential web scraping SaaS idea that targets inefficiencies in influencer marketing. It outlines a project to build a system that identifies video sponsorships on YouTube, with detailed steps for data collection and analysis using Bright Data's web scraping API. The project aims to help companies find suitable influencers and track competitors, while addressing challenges such as scaling data collection and handling API token limits.

  3.
    Article
    Community Picks · 1y

    lightpanda-io/browser: The open-source browser made for headless usage

    Lightpanda is an open-source headless browser designed for efficient web automation, AI agents, LLM training, scraping, and testing. It has a significantly lower memory footprint and faster execution times than Chrome. The browser supports JavaScript execution and web APIs, is compatible with tools like Playwright and Puppeteer, and is built in the Zig programming language. Installation and configuration instructions are provided for both Linux and macOS.

  4.
    Video
    Oxylabs · 1y

    Building a Real Estate Monitoring System

    Alex discusses building a real estate monitoring system. He covers the types of data that can be extracted from real estate websites, use cases for that data such as price comparisons and market trends, and the main challenges: getting fresh data, overcoming anti-bot measures, and scaling the system. He then recommends Oxylabs' Real Estate Scraper API for handling these challenges efficiently.

  5.
    Article
    swizec.com · 1y

    Server-side React that renders as png, pdf, or interactive webapp

    React can be rendered as PNG, PDF, static HTML, or an interactive webapp simply by changing the URL. The process relies on server-side rendering (SSR), with components that support CSS-in-JS and load data via useQuery. Query parameters in the URL select the output format, using TanStack Start and TanStack Router alongside Puppeteer. The approach aims for sophisticated rendering with minimal effort from product engineers.

  6.
    Article
    Planet Python · 1y

    Create Project-Less Python Utilities with uv and Inline Script Metadata

    Learn how to create and run Python utility scripts with inline metadata using uv. This method avoids the need for a full Python project and simplifies dependency management by embedding metadata directly into the script. The post provides an example script for searching and fetching details from the Google Books API, along with additional examples for summarizing YouTube videos and scraping articles.

  7.
    Video
    Aaron Jack · 1y

    How to Build Powerful Web Scrapers with AI - 3 Steps

    Combining AI with web scraping makes it possible to build applications and services that extract and transform web data efficiently. The video covers the challenges of traditional web scraping, such as brittle scripts and diverse HTML structures, and explains how AI can standardize the process. It walks through step-by-step methods using tools like Puppeteer and Selenium, with proxies to avoid detection and manage large-scale scraping. It also discusses example applications and briefly covers reducing costs by running models locally.