Best of Crawling: November 2024

  1. Video
    YouTube · 2y

    This is how I scrape 99% websites via LLM

    Explore how advances in AI, particularly large language models (LLMs), are changing web scraping in 2024. Learn best practices for scraping internet data at scale, building autonomous web scrapers, and handling complex web interactions. The video demonstrates several kinds of scraping tasks: scraping public websites, handling complex page manipulations, and more sophisticated tasks that require reasoning. It also covers services such as OpenAI, AgentQL, and SpiderCloud that help extract web content efficiently.

  2. Article
    Real Python · 2y

    Introduction to Web Scraping With Python – Real Python

    Web scraping is the process of collecting and parsing raw data from the web, and Python offers powerful tools for it. This video course offers 12 lessons covering techniques such as string methods, regular expressions, and HTML parsing. It includes downloadable resources, subtitles, transcripts, an interactive quiz, and a certificate of completion to help you scrape data from websites effectively.
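As a rough illustration of the three techniques the course names, the sketch below extracts a page title with string methods, a regular expression, and an HTML parser. The sample HTML and all function names are made up for this example, and the standard-library `html.parser` stands in for whatever parsing library the course actually uses.

```python
import re
from html.parser import HTMLParser

# Hypothetical sample page, inlined so the example needs no network access.
HTML = (
    "<html><head><title>Profile: Aphrodite</title></head>"
    "<body><h1>Aphrodite</h1></body></html>"
)


def title_with_string_methods(html):
    """Find the title by searching for the exact tag strings."""
    start = html.find("<title>")
    if start == -1:
        return None
    start += len("<title>")
    end = html.find("</title>", start)
    if end == -1:
        return None
    return html[start:end]


def title_with_regex(html):
    """Find the title with a pattern that tolerates case and spacing."""
    match = re.search(
        r"<title[^>]*>(.*?)</title\s*>", html, re.IGNORECASE | re.DOTALL
    )
    return match.group(1).strip() if match else None


class TitleParser(HTMLParser):
    """Collect the text found inside the <title> element."""

    def __init__(self):
        super().__init__()
        self._in_title = False
        self.title = None

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag names before calling this method.
        if tag == "title":
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title = (self.title or "") + data


def title_with_parser(html):
    """Find the title by walking the document with an HTML parser."""
    parser = TitleParser()
    parser.feed(html)
    parser.close()
    return parser.title.strip() if parser.title is not None else None
```

The string-method version only works when the tags appear exactly as written, which is why the regular expression and the parser tend to hold up better on untidy markup such as `<TITLE >`.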

  3. Article
    Community Picks · 1y

    FlareSolverr/FlareSolverr: Proxy server to bypass Cloudflare protection

    FlareSolverr is a proxy server designed to bypass Cloudflare and DDoS-GUARD protection using Selenium with an undetected Chrome driver. It opens URLs with user parameters, solves Cloudflare challenges, and returns HTML code and cookies. Installation using Docker is recommended due to its dependencies on an external browser. Memory consumption can be high, so it should be used cautiously on low-RAM machines. It supports multiple architectures and provides examples for making requests via Bash, Python, and PowerShell. Users can also create permanent sessions to avoid repeatedly solving challenges.
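A minimal sketch of what a request to FlareSolverr might look like from Python, assuming an instance is listening on its default local address `http://localhost:8191/v1`. The helper names (`build_get_payload`, `send`, `extract_html_and_cookies`) are hypothetical; the `request.get` command and the `url`, `maxTimeout`, and `session` fields follow the JSON API described in the project's README.

```python
import json
from urllib import request

# Assumption: FlareSolverr runs locally (for example via Docker) on its default port.
FLARESOLVERR_URL = "http://localhost:8191/v1"


def build_get_payload(url, session=None, max_timeout_ms=60000):
    """Build the JSON body for a `request.get` command."""
    payload = {"cmd": "request.get", "url": url, "maxTimeout": max_timeout_ms}
    if session is not None:
        # Reusing a session lets the proxy keep cookies between requests,
        # so the challenge does not have to be solved every time.
        payload["session"] = session
    return payload


def send(payload, endpoint=FLARESOLVERR_URL):
    """POST a command to FlareSolverr and return the decoded JSON reply."""
    req = request.Request(
        endpoint,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with request.urlopen(req) as resp:
        return json.load(resp)


def extract_html_and_cookies(result):
    """Pull the page HTML and the cookie list out of a reply."""
    solution = result.get("solution", {})
    return solution.get("response", ""), solution.get("cookies", [])
```

With a running instance, `extract_html_and_cookies(send(build_get_payload("https://example.com")))` would return the solved page and its cookies; the URL here is only a placeholder.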

  4. Article
    InfoSec Write-ups · 2y

    Dark Web Scraping Using AI: Tools, Techniques, and Challenges

    Learn how to use AI for scraping dark web data by leveraging Python and the Llama model. This guide covers setting up the necessary tools, including Streamlit, LangChain, Selenium, and BeautifulSoup, in a Python virtual environment. It demonstrates a step-by-step process to create a web scraper, retrieve and clean webpage content, and analyze the scraped data using Llama for accurate and relevant insights.
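The "retrieve and clean" step can be pictured with the sketch below, which strips scripts and styles from a page and splits the remaining text into pieces small enough to hand to a language model. It is an assumption-laden stand-in: the guide uses BeautifulSoup, while this version uses the standard-library `html.parser` so it runs without extra packages, and the names `clean_html` and `split_into_chunks` and the 6000-character limit are illustrative choices.

```python
from html.parser import HTMLParser


class VisibleTextParser(HTMLParser):
    """Collect text nodes, ignoring anything inside <script> or <style>."""

    SKIPPED = {"script", "style"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text that is outside skipped elements.
        if self._skip_depth == 0 and data.strip():
            self.parts.append(data.strip())


def clean_html(html):
    """Return the visible text of a page, one text node per line."""
    parser = VisibleTextParser()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.parts)


def split_into_chunks(text, max_chars=6000):
    """Split text into consecutive pieces of at most max_chars characters."""
    return [text[i:i + max_chars] for i in range(0, len(text), max_chars)]
```

Each chunk could then be passed to the model in turn along with the user's question; that step is left out here because it depends on how Llama is served.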

  5. Article
    Towards Dev · 1y

    Data-Driven Football Insights: From Web Scraping to Visualization Using Airflow, Dbt Cloud, and AWS Tech Stack

    This project automates the collection, storage, and analysis of football data using Apache Airflow, dbt Cloud, and AWS. The workflow scrapes data from the web with Python, stores it in Amazon S3, loads and processes it in Amazon Redshift, transforms it with dbt Cloud, and visualizes it in Amazon QuickSight. This integrated approach offers a scalable way to manage and analyze detailed football statistics.