Best of Computer Vision — November 2024

1
Article
Community Picks·1y
🤗 Transformers
🤗 Transformers provides APIs and tools for easily downloading and training state-of-the-art pretrained models for natural language processing, computer vision, audio, and multimodal tasks. It supports interoperability between PyTorch, TensorFlow, and JAX, so a model can be trained in one framework and loaded for inference in another. The library also offers documentation, tutorials, and task-oriented how-to guides to help users get started.
2
Article
DEV·1y
Ollama-OCR for High-Precision OCR with Ollama
Llama 3.2-Vision is a highly capable multimodal large language model that accepts text and image inputs and excels at visual recognition and image reasoning. This guide shows how to implement OCR using Ollama-OCR with Llama 3.2-Vision. Key features include high-accuracy text recognition, support for multiple image formats, and customizable prompts. The guide also walks through installing Ollama and the Llama 3.2-Vision model.
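Ollama-OCR relies on a local Ollama installation. To show the underlying mechanism, and not the Ollama-OCR package's own API, the sketch below sends an image plus a custom prompt directly to Ollama's REST endpoint `/api/generate`. It assumes Ollama is running on its default port 11434 and that the model is available under the tag `llama3.2-vision`; the helper names `build_ocr_request` and `run_ocr` and the default prompt are hypothetical.

```python
import base64
import json
import urllib.request

# Assumption: a local Ollama server listening on its default port.
OLLAMA_URL = "http://localhost:11434/api/generate"

# Hypothetical default prompt; the OCR behaviour is steered entirely by this text.
DEFAULT_PROMPT = "Extract all readable text from this image. Return plain text only."


def build_ocr_request(image_bytes: bytes,
                      prompt: str = DEFAULT_PROMPT,
                      model: str = "llama3.2-vision") -> dict:
    """Build the JSON body for Ollama's /api/generate endpoint."""
    return {
        "model": model,
        "prompt": prompt,
        # Ollama expects images as a list of base64-encoded strings.
        "images": [base64.b64encode(image_bytes).decode("ascii")],
        # Request a single complete JSON reply instead of a token stream.
        "stream": False,
    }


def run_ocr(image_path: str, prompt: str = DEFAULT_PROMPT) -> str:
    """Send an image to the local Ollama server and return the generated text."""
    with open(image_path, "rb") as f:
        body = build_ocr_request(f.read(), prompt)
    req = urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        # With stream=False, the generated text is in the "response" field.
        return json.loads(resp.read())["response"]
```

After `ollama pull llama3.2-vision`, a call such as `run_ocr("receipt.png", "List every price on this receipt")` would return the model's transcription; changing only the prompt is what "customizable prompts" amounts to at this level.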
3
Article
Hacker News·1y
Hand Tracking for Mouse Input
The post describes a project that uses hand tracking to emulate mouse input through finger pinches, inspired by Apple Vision Pro. The Python version of MediaPipe initially performed poorly, but the web version worked well, and the author handled real-time communication between the web frontend and a Python backend over WebSocket. The project later moved to Tauri, producing a more efficient desktop app with a Rust backend. It also explored a mode inspired by Meta Quest that works with a front-facing camera. Jitter and latency were addressed with techniques such as the One Euro Filter.
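The One Euro Filter is an adaptive low-pass filter: when the hand is nearly still it uses a low cutoff frequency to suppress jitter, and as the signal speeds up it raises the cutoff to reduce lag. Below is a minimal Python sketch for one scalar signal, meant to be applied separately to each cursor coordinate. The pinch check is a hypothetical illustration, not the post's code; it assumes MediaPipe-style hand landmarks in normalized image coordinates, where index 4 is the thumb tip and index 8 the index fingertip. The 0.05 threshold and the parameter defaults are assumptions.

```python
import math


def smoothing_factor(dt: float, cutoff: float) -> float:
    """Alpha of a first-order low-pass filter with the given cutoff (Hz)."""
    r = 2 * math.pi * cutoff * dt
    return r / (r + 1)


class OneEuroFilter:
    """One Euro Filter for a single scalar signal (e.g. the cursor's x)."""

    def __init__(self, t0: float, x0: float, min_cutoff: float = 1.0,
                 beta: float = 0.0, d_cutoff: float = 1.0):
        self.min_cutoff = min_cutoff  # cutoff used when the signal is still
        self.beta = beta              # how fast the cutoff grows with speed
        self.d_cutoff = d_cutoff      # cutoff used to smooth the derivative
        self.t_prev = t0
        self.x_prev = x0
        self.dx_prev = 0.0

    def __call__(self, t: float, x: float) -> float:
        dt = t - self.t_prev
        if dt <= 0:
            return self.x_prev  # ignore duplicate or out-of-order samples
        # Smoothed estimate of the signal's speed.
        dx = (x - self.x_prev) / dt
        a_d = smoothing_factor(dt, self.d_cutoff)
        dx_hat = a_d * dx + (1 - a_d) * self.dx_prev
        # Faster motion -> higher cutoff -> less smoothing and less lag.
        cutoff = self.min_cutoff + self.beta * abs(dx_hat)
        a = smoothing_factor(dt, cutoff)
        x_hat = a * x + (1 - a) * self.x_prev
        self.t_prev, self.x_prev, self.dx_prev = t, x_hat, dx_hat
        return x_hat


THUMB_TIP, INDEX_TIP = 4, 8  # MediaPipe hand-landmark indices


def is_pinching(landmarks, threshold: float = 0.05) -> bool:
    """Hypothetical pinch test: thumb and index tips closer than `threshold`."""
    x1, y1 = landmarks[THUMB_TIP][:2]
    x2, y2 = landmarks[INDEX_TIP][:2]
    return math.hypot(x2 - x1, y2 - y1) < threshold
```

In use, one filter per axis would receive the same per-frame timestamp. `min_cutoff` controls how much a stationary cursor is steadied, while `beta` decides how aggressively smoothing backs off during fast moves, which is the jitter-versus-latency trade-off the post describes.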
4
Article
PyImageSearch·1y
Create a 3D Object from Your Images with TripoSR in Python
Learn how to create a 3D object from a single image using TripoSR, a state-of-the-art model for fast feed-forward 3D reconstruction. The guide walks through setting up the environment, uploading and preparing the image, initializing and running the TripoSR model, and generating and exporting the 3D model. It highlights TripoSR's speed and accuracy, with inputs processed in under 0.5 seconds on an NVIDIA A100 GPU, and its applications in fields such as e-commerce, game development, and virtual reality.
5
Video
Community Picks·1y
OpenCV tutorial for beginners | FULL COURSE in 3 hours with Python
Felipe presents a comprehensive three-hour OpenCV course in Python. It covers fundamentals such as image representation, reading and writing images and videos, and basic operations like cropping and resizing, then moves on to more advanced topics: color spaces, blurring, thresholding, and edge detection. Two hands-on projects, color detection and face anonymization, apply these concepts.
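Two of the course topics, thresholding and the HSV color mask used for color detection, come down to simple per-pixel rules. The NumPy sketch below reproduces the documented rules of OpenCV's `cv2.threshold` with `cv2.THRESH_BINARY` and of `cv2.inRange`, so they can be inspected without OpenCV installed. It is an illustration, not the course's code; the function names and the example HSV bounds are assumptions.

```python
import numpy as np


def threshold_binary(gray: np.ndarray, thresh: float, maxval: int = 255) -> np.ndarray:
    """Rule of cv2.threshold(..., cv2.THRESH_BINARY): maxval where pixel > thresh, else 0."""
    return np.where(gray > thresh, maxval, 0).astype(np.uint8)


def in_range(img: np.ndarray, lower, upper) -> np.ndarray:
    """Rule of cv2.inRange: 255 where every channel lies in [lower, upper], inclusive."""
    lower = np.asarray(lower)
    upper = np.asarray(upper)
    inside = np.all((img >= lower) & (img <= upper), axis=-1)
    return inside.astype(np.uint8) * 255


# Binary threshold at 127: only values strictly above 127 become white.
gray = np.array([[10, 127, 128], [200, 50, 255]], dtype=np.uint8)
binary = threshold_binary(gray, 127)

# Color detection on an HSV image. In OpenCV's 8-bit HSV, hue spans 0-179;
# the yellow-ish bounds below are illustrative assumptions.
hsv = np.array([[[25, 150, 150], [90, 150, 150]]], dtype=np.uint8)
mask = in_range(hsv, (20, 100, 100), (30, 255, 255))
```

With OpenCV available, the same results would come from `cv2.threshold(gray, 127, 255, cv2.THRESH_BINARY)[1]` and `cv2.inRange(hsv, lower, upper)`; the mask is then typically combined with contour or bounding-box detection to locate the colored object.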
6
Article
Towards AI·1y
Transformers For Images!!
This post explores how transformers are applied to image processing in computer vision, covering three main approaches: Pixel Transformers, Vision Transformers (ViT) from Google Brain, and Swin Transformers from Microsoft. It discusses the limitations of CNNs and shows how to reduce the computational cost of attending over every pixel, by working on image patches and, in Swin, by using window attention and hierarchical patches.
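The efficiency argument can be made concrete: self-attention scores every token against every other token, so its cost grows with the square of the token count. The sketch below splits an image into non-overlapping patches as ViT does, then counts query-key pairs for one token per pixel, for 16×16 patches, and for Swin-style attention confined to non-overlapping windows. The 224×224 input with 16-pixel patches (as in ViT-B/16) and the 7×7 windows over a 56×56 token grid (as in the first stage of Swin-T) are used only as illustrative settings; the counts ignore heads, embedding width, and window shifting.

```python
import numpy as np


def patchify(image: np.ndarray, patch: int) -> np.ndarray:
    """Split an (H, W, C) image into flattened, non-overlapping patches.

    Returns shape (num_patches, patch * patch * C), ordered row-major over
    the patch grid: one token per patch, as in ViT.
    """
    h, w, c = image.shape
    assert h % patch == 0 and w % patch == 0, "image must tile exactly"
    grid = image.reshape(h // patch, patch, w // patch, patch, c)
    grid = grid.transpose(0, 2, 1, 3, 4)  # (rows, cols, patch, patch, C)
    return grid.reshape(-1, patch * patch * c)


def global_attention_pairs(num_tokens: int) -> int:
    """Query-key pairs scored by full self-attention."""
    return num_tokens ** 2


def window_attention_pairs(grid_side: int, window: int) -> int:
    """Pairs when attention stays inside non-overlapping window x window blocks."""
    num_windows = (grid_side // window) ** 2
    return num_windows * (window * window) ** 2


image = np.zeros((224, 224, 3), dtype=np.float32)
tokens = patchify(image, 16)
print(tokens.shape)                                  # (196, 768)
print(global_attention_pairs(224 * 224))             # 2517630976, pixel tokens
print(global_attention_pairs(tokens.shape[0]))       # 38416, patch tokens
print(global_attention_pairs(56 * 56), window_attention_pairs(56, 7))  # 9834496 153664
```

Patching shrinks the pair count by 65,536 (256², since each patch holds 256 pixels), and windowing divides the 56×56 case by 64, the number of windows. For a fixed window size, window attention therefore scales linearly rather than quadratically with the number of tokens.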