Best of Crawling: May 2024

  1. Article
    Planet Python · 2y

    Web scraping as an API service

    This post discusses using web scraping as an API service in system-to-system integrations. It explains why web scraping should be avoided in backend integrations and introduces Playwright as a tool for generating Python code for web scraping.

  2. Article
    Machine Learning News · 2y

    ScrapeGraphAI: A Web Scraping Python Library that Uses LLMs to Create Scraping Pipelines for Websites, Documents, and XML Files

    ScrapeGraphAI is a web scraping library that simplifies data collection using large language models (LLMs) and direct graph logic. It aims to reduce the time and technical skill that web scraping projects require, so users can focus on analyzing the extracted data.

  3. Article
    Community Picks · 2y

    Scrapy vs. Crawlee

    A comparison of Scrapy and Crawlee, two web scraping libraries. Crawlee has features that Scrapy lacks, such as autoscaling, headless browsing, and support for JavaScript-rendered websites. Crawlee supports JavaScript and TypeScript, while Scrapy supports only Python. Each library has its own advantages, and the choice depends on individual needs.

  4. Article
    Community Picks · 2y

    VinciGit00/Scrapegraph-ai: Python scraper based on AI

    ScrapeGraphAI is a Python web scraping library that uses AI to create scraping pipelines for websites, documents, and XML files. It is designed to make it easy to extract specific information from these sources.