This post goes beyond the trivial examples often used to explain Java's CompletableFuture by implementing a simple web crawler that starts at a given page and follows links until it reaches a target website. The example uses Java's HttpClient for downloading pages and combines concurrency management, async operations, and recursion to make the crawler efficient. It also addresses practical challenges such as memory usage, concurrency limits, and avoiding revisiting pages, making it a useful example for both beginner and seasoned developers.
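As a minimal sketch of the HttpClient setup such a crawler might start from (the URL, timeout, and class name here are illustrative placeholders, not the article's code):

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.time.Duration;

public class FetchSketch {
    public static void main(String[] args) {
        // Illustrative settings; a crawler typically follows redirects
        // and bounds how long it waits for unresponsive hosts.
        HttpClient client = HttpClient.newBuilder()
                .followRedirects(HttpClient.Redirect.NORMAL)
                .connectTimeout(Duration.ofSeconds(5))
                .build();

        // Placeholder URL; the article crawls from its own seed page.
        HttpRequest request = HttpRequest.newBuilder(URI.create("https://example.com/"))
                .GET()
                .build();

        // client.sendAsync(request, HttpResponse.BodyHandlers.ofString())
        // returns a CompletableFuture<HttpResponse<String>> immediately,
        // which the crawler can chain with thenApply/thenCompose instead
        // of blocking a thread per download.
        System.out.println(request.uri().getHost()); // prints example.com
    }
}
```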

18 min read · From concurrencydeepdives.com
Table of contents
- Requirements
- HttpClient API
- Baseline code
- Recursion with thenCompose
- Improving the crawler
- Memory usage and metrics
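The techniques the contents list above hints at (recursion with thenCompose, and tracking visited pages to avoid revisits) can be sketched offline with a stubbed link graph; names and structure here are my own illustration, not the article's implementation:

```java
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;

public class CrawlerSketch {
    // Stubbed "link graph" standing in for real pages; a real crawler
    // would fetch each URL with HttpClient.sendAsync and parse links.
    static final Map<String, List<String>> LINKS = Map.of(
            "start", List.of("a", "b"),
            "a", List.of("target"),
            "b", List.of("a"));

    // Thread-safe visited set so concurrent branches skip seen pages.
    static final Set<String> visited = ConcurrentHashMap.newKeySet();

    // Hypothetical async fetch: resolves to a page's outgoing links.
    static CompletableFuture<List<String>> fetchLinks(String page) {
        return CompletableFuture.supplyAsync(
                () -> LINKS.getOrDefault(page, List.of()));
    }

    // Recursive search chained with thenCompose: follow links until the
    // target is reached, without ever blocking a thread on a result.
    static CompletableFuture<Boolean> crawl(String page, String target) {
        if (page.equals(target)) {
            return CompletableFuture.completedFuture(true);
        }
        if (!visited.add(page)) { // already seen: prune this branch
            return CompletableFuture.completedFuture(false);
        }
        return fetchLinks(page).thenCompose(links -> {
            CompletableFuture<Boolean> found = CompletableFuture.completedFuture(false);
            for (String link : links) {
                // Chain link visits sequentially; stop once found.
                found = found.thenCompose(f -> f
                        ? CompletableFuture.completedFuture(true)
                        : crawl(link, target));
            }
            return found;
        });
    }

    public static void main(String[] args) {
        System.out.println(crawl("start", "target").join()); // prints true
    }
}
```

This chains the links sequentially for simplicity; the article's "Improving the crawler" section presumably deals with fanning out concurrently and capping the number of in-flight requests.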
