Best of Crawling — December 2024

1
Article
Community Picks·1y
How to scrape Google Maps data using Python
Learn how to build a Google Maps scraper using Crawlee and Python to extract hotel data including names, ratings, reviews, prices, and amenities. The guide covers setting up the environment, connecting to Google Maps, handling dynamic content, and managing infinite scrolling. It also explains how to use proxies for large-scale scraping and create an interactive analysis dashboard with the exported data.
90
4
2
Article
Community Picks·1y
One Million Screenshots
Explore over a million rendered homepages from the web interactively: zoom, pan, and click as you would in Google Maps. This visual dataset could help you find websites you've been looking for or discover new ones. Check out the FAQ for more details and learn about the Screenshot API if you're interested in the data.
57
10
3
Article
Machine Learning News·1y
Meet Steel.dev: An Open Source Browser API for AI Agents and Apps
Steel.dev is an open-source tool that simplifies web automation for AI applications by abstracting complex browser interactions through a RESTful API. It reduces the need for detailed scripts and expertise in frameworks like Puppeteer, Selenium, and Playwright. The tool features a modular architecture that allows easy management and interaction with headless browsers, facilitating tasks such as data extraction and form completion while ensuring scalability for large-scale projects.
51
4
Article
The New Stack·1y
5 Python Libraries Every Data Engineer Should Know
Python is a powerful language for data engineering, enhanced by essential third-party libraries. For beginners, Beautiful Soup 4 and Requests are ideal for web scraping and sending HTTP requests. Intermediate users may benefit from Apache Airflow for workflow automation and Boto3 for integrating AWS services. Advanced users can leverage Pandas for comprehensive data manipulation and analysis.
36
5
Article
Daily Dose of Data Science | Avi Chawla | Substack·1y
Our Agentic Workflow to Write and Publish Social Content
A personal multi-agent app was developed to automate the creation and publication of social media content. The tech stack includes CrewAI for building workflows, FireCrawl for web scraping, and Typefully for post scheduling. The app processes content from a blog or newsletter, understands the writing style, and drafts posts for LinkedIn and X, publishing them via Typefully's API. Detailed insights and code are accessible in CrewAI's documentation.
35
1
6
Video
ByteGrad·1y
Web Scraping With GPT-4 Vision AI & Playwright Is Ridiculously EASY - I Can't Believe This Works
Advances in AI, particularly GPT-4 Vision, have simplified web scraping. The video covers calling a site's network requests directly, extracting the data embedded in the HTML of server-side rendered websites, and using proxies to avoid detection. It outlines both text-based and vision-based scraping with Playwright and Smart Proxy, using modern AI models to extract the data.
34
7
Article
Hacker News·1y
steel-dev/steel-browser: 🚧 Open Source Browser API for AI Agents & Apps. Steel Browser is a batteries-included browser instance that lets you build automate the web without worrying about infrastructu
Steel.dev is an open-source browser API that allows developers to build AI apps and agents for web interaction without building automation infrastructure from scratch. It offers full browser control, session management, proxy support, debugging tools, and more. The API supports popular frameworks such as Puppeteer, Playwright, and Selenium, and can be run locally or deployed to various cloud platforms. The project is in public beta and invites contributions and feedback.
18
