Best of Crawling, September 2024

  1.
    Article
    Hacker News · 2y

    Using GPT-4o for web scraping

    A developer experimented with GPT-4o's structured outputs to build an AI-assisted web scraper. The model handled both simple and complex tables well, but it struggled with combined rows and with generating XPaths. Cost is a concern because of the volume of characters the model has to process. Possible future improvements include better UX through capturing browser events and further refinement of the HTML cleanup step.

  2.
    Video
    Be A Better Dev · 2y

    The BEST Project Idea to Learn AWS

    This post shares an engaging project idea to learn AWS hands-on, focusing on essential services like EC2, RDS, S3, IAM, VPC, and CloudWatch. The project involves building a web scraper to extract data from websites, providing practical skills that are valuable for a career in cloud engineering. The author emphasizes the importance of hands-on experience over certifications for solidifying AWS concepts.

  3.
    Article
    UX Planet · 2y

    Automate 90% of Your Design Job Search with This AI Workflow

    Discover an AI-driven workflow to automate your job search process. This system uses tools like Bardeen for scraping job listings and Google Sheets with Apps Script to filter and organize them. It then uses GPT-4o mini to analyze job descriptions against specific criteria, helping you focus quickly on the most relevant opportunities.

  4.
    Article
    Hacker News · 2y

    finic-ai/finic: Create Playwright-based browser agents to scrape websites and automate tasks.

    Finic is a cloud platform that simplifies deploying and managing browser-based automation agents, focusing on fault-tolerant execution. It supports Playwright for DOM interaction and BeautifulSoup for HTML parsing. Features include cloud deployment, secure credential management, monitoring, and advanced functionalities like self-healing selectors and session impersonation.

  5.
    Video
    YouTube · 2y

    This Open Source Scraper CHANGES the Game!!!

    An open-source application can scrape data from any website given just the URL and the data fields to extract, presenting the results in formats like JSON, Excel, and Markdown. The tool leverages GPT models for cost-efficient operation and can handle intricate tasks such as CAPTCHA navigation and infinite scrolling. It avoids libraries like Firecrawl in favor of more control, but still aims for reliable data extraction through OpenAI's structured outputs.

  6.
    Article
    Medium · 2y

    Mastering Web Scraping: From Bypassing CAPTCHAs to Building Simple Scrapers

    Web scraping is a powerful tool for automating data collection, including on sites protected by security measures like CAPTCHAs and Cloudflare. This post covers techniques for bypassing CAPTCHAs with Python libraries, creating voting bots with tools like Automatio.ai, and building simple scrapers with Python and JavaScript. It emphasizes ethical practices in web scraping and includes real-life case studies demonstrating the practical benefits of using Python for data analysis and cost savings.