Best of Computer Vision: June 2025

  1.
    Article
    Daily Dose of Data Science | Avi Chawla | Substack · 46w

    48 Most Popular Open ML Datasets

    A compilation of 48 widely used open machine learning datasets, organized by domain: computer vision (ImageNet, COCO), natural language processing (SQuAD, GLUE), recommendation systems (MovieLens and the new Yambda-5B), tabular data (UCI datasets, Titanic), reinforcement learning (OpenAI Gym), and multimodal learning (LAION-5B, VQA). Each dataset is briefly described with its primary use case and key characteristics, making the list a reference guide for researchers and practitioners choosing datasets for their ML projects.

  2.
    Article
    Hacker News · 45w

    Making eyesite

    A developer created Eyesite, a web application that enables eye-controlled navigation as an affordable alternative to Apple Vision Pro. The project uses WebGazer.js for eye tracking and requires a 9-point calibration step for accuracy. Key design decisions included hiding the eye cursor to maintain immersion, giving visual feedback by making buttons glow when gazed upon, and using large UI elements to compensate for tracking jitter. The spacebar serves as the click mechanism, mimicking Vision Pro's look-and-pinch interaction model.

  3.
    Article
    Hugging Face · 46w

    ScreenSuite - The most comprehensive evaluation suite for GUI Agents!

    ScreenSuite is an evaluation framework for GUI agents that unifies 13 benchmarks across perception, grounding, single-step actions, and multi-step agent capabilities. The suite evaluates vision language models (VLMs) on their ability to interact with graphical interfaces using only visual input, without accessibility trees or DOM metadata. It includes Dockerized environments for Ubuntu and Android testing, supports both local and remote sandbox execution, and provides standardized evaluation of leading VLMs such as the Qwen-2.5-VL series, UI-TARS, and GPT-4o on GUI automation tasks.

  4.
    Article
    Ars Technica · 44w

    MIT student prints AI polymer masks to restore paintings in hours

    MIT graduate student Alex Kachkine developed an art restoration technique using AI-generated polymer films that can restore damaged paintings in hours instead of months. The method creates transparent masks with thousands of precisely color-matched regions that can be applied to artwork and removed when needed, making the restoration reversible. An AI model identified damage patterns and generated over 57,000 different colors to restore a 15th-century painting with 5,612 damaged regions in just 3.5 hours. This approach could help make the 70% of institutional art collections currently hidden due to damage accessible to the public again.

  5.
    Article
    AI · 44w

    Build Image Search and run on your PC within 200 lines of python

    A step-by-step guide to building an image search system using CLIP embeddings and vector indexing. The tutorial demonstrates how to create a multi-modal search engine that accepts natural language queries like 'a cute animal' and returns visually relevant images without manual tagging. The implementation uses CocoIndex for data processing, CLIP for generating embeddings, Qdrant for vector storage, and FastAPI for the search API, all contained within 200 lines of Python code.
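
    The retrieval step described above can be sketched roughly as follows. This is a minimal illustration only, not the tutorial's code: it uses no CocoIndex, CLIP, Qdrant, or FastAPI calls, the tiny 3-dimensional vectors are hypothetical stand-ins for real CLIP embeddings, and the `cosine` and `search` helpers are names invented for this sketch.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def search(query_vec, index, top_k=2):
    """Return the top_k (name, score) pairs, highest cosine similarity first."""
    scored = [(name, cosine(query_vec, vec)) for name, vec in index.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)[:top_k]

# Hypothetical toy "embeddings"; real CLIP vectors have hundreds of dimensions.
index = {
    "kitten.jpg": [0.9, 0.1, 0.0],
    "puppy.jpg": [0.8, 0.3, 0.1],
    "truck.jpg": [0.0, 0.2, 0.9],
}

# Stand-in for the text embedding of a query such as 'a cute animal'.
query = [1.0, 0.2, 0.0]
results = search(query, index)
for name, score in results:
    print(f"{name}: {score:.3f}")
```

    In a setup like the one summarized here, an embedding model would produce the vectors and a vector store would handle the ranking at scale; the brute-force loop above only shows the idea of ranking images by similarity to a text query.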

  6.
    Video
    Coding with Lewis · 44w

    I Built an AI That Knows 200,000 Game Characters

    A developer built an AI system that recognizes over 200,000 video game characters by scraping data from gaming databases, collecting character images, and using embeddings with a vector database for similarity matching. The project uses NVIDIA's G-Assist platform and demonstrates techniques for data collection, image processing, and building AI-powered gaming tools.

  7.
    Article
    Visually AI · 43w

    Midjourney Video disrupts the competition

    Midjourney launched V1 video generation, allowing users to animate existing images through auto mode or manual prompting with motion control. The system offers image-to-motion capabilities with extensions up to 20 seconds. Meanwhile, MiniMax released Hailuo 02 with improved motion dynamics and 1080p output, while Higgsfield introduced Canvas for precise video editing control. Additional updates include Freepik adding Google Veo 3 and Hailuo 02, Adobe's Firefly mobile app launch, and new tools from ElevenLabs, Runway, and Krea.