Best of Computer Vision · July 2024

  1. Article
    Community Picks · 2y

    face-api.js

    face-api.js is a JavaScript API built on top of the tensorflow.js core for face detection and recognition in browsers. It supports multiple models, such as SSD MobileNetV1, Tiny Face Detector, and MTCNN, each optimized for different needs. The library also provides lightweight, fast 68-point face landmark detection, face recognition using a ResNet-34 model, and face expression recognition. It runs in both browser and Node.js environments and includes comprehensive examples for setup and usage.

  2. Article
    Semaphore · 2y

    OpenAI API Alternatives

    OpenAI is a well-known AI provider with robust language models like GPT-4o, but alternatives exist that offer similar capabilities with potential benefits such as lower costs, specialized features, and flexibility. Key alternatives include Google Cloud AI APIs, Anthropic Claude API, AI21 Labs, Cohere, and Hugging Face Transformers, among others. Additionally, specialized AI services for tasks like text-to-speech, computer vision, natural language processing, and image generation are available from providers such as Amazon Polly, Microsoft Azure Cognitive Services, and Amazon Titan.

  3. Article
    DevOps.com · 2y

    Image Recognition

    Stay updated with the latest news and updates in image recognition technology.

  4. Article
    Machine Learning News · 2y

    LaMMOn: An End-to-End Multi-Camera Tracking Solution Leveraging Transformers and Graph Neural Networks for Enhanced Real-Time Traffic Management

    Researchers from the University of Tennessee at Chattanooga and Leibniz University Hannover developed LaMMOn, a multi-camera tracking model built on transformers and graph neural networks. LaMMOn integrates modules for object detection, tracking, trajectory clustering, and generating object embeddings from text. This reduces the need for manual labeling and for hand-crafting new matching rules, and the model achieves high performance on datasets such as CityFlow and TrackCUIP at competitive real-time processing speeds.
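    The cross-camera association step that such trackers rely on can be illustrated with a minimal sketch: match detections from two cameras by cosine similarity of their object embeddings. This is illustrative only; the function names, greedy matching, and threshold are assumptions, not LaMMOn's actual code, which uses learned matching rather than a fixed rule.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def match_across_cameras(embs_cam1, embs_cam2, threshold=0.8):
    """Greedily match object embeddings from two camera views.

    Returns (index_cam1, index_cam2) pairs whose similarity exceeds
    the threshold. Real multi-camera trackers use more robust
    assignment (e.g. the Hungarian algorithm) plus temporal cues.
    """
    matches = []
    used = set()
    for i, e1 in enumerate(embs_cam1):
        best_j, best_sim = None, threshold
        for j, e2 in enumerate(embs_cam2):
            if j in used:
                continue
            sim = cosine_similarity(e1, e2)
            if sim > best_sim:
                best_j, best_sim = j, sim
        if best_j is not None:
            used.add(best_j)
            matches.append((i, best_j))
    return matches
```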

  5. Article
    Hacker News · 2y

    KwaiVGI/LivePortrait: Make one portrait alive!

    LivePortrait is a GitHub repository containing the official PyTorch implementation of the paper "LivePortrait: Efficient Portrait Animation with Stitching and Retargeting Control". The repository includes an initial version of the inference code and models, with continuous updates. Users can clone the repo, set up a conda environment, install the necessary dependencies, download pretrained weights, and run various scripts to animate portraits. The post also reports performance evaluation results on an RTX 4090 GPU and provides a Gradio interface for easier use.
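    The setup steps described above can be sketched as a shell session. The repo URL follows from the post title; the Python version, weight locations, and inference script name are assumptions, so check the repository's README before running.

```shell
# Clone the repo and enter it
git clone https://github.com/KwaiVGI/LivePortrait
cd LivePortrait

# Create and activate a conda environment
# (Python version assumed; see the README)
conda create -n LivePortrait python=3.9 -y
conda activate LivePortrait

# Install dependencies
pip install -r requirements.txt

# Download the pretrained weights into the directory layout
# the README describes (links are in the repo)

# Run inference (script name assumed; the README documents
# the flags for the source portrait and driving video)
python inference.py
```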

  6. Article
    The Verge · 2y

    The ‘godmother of AI’ has a new startup already worth $1 billion

    Fei-Fei Li, known as the 'godmother of AI,' has founded a startup named World Labs, already valued at over $1 billion. World Labs aims to develop AI capable of advanced reasoning by processing visual data in a human-like manner. The startup focuses on creating models that understand three-dimensional worlds, which could have applications in robotics, augmented reality, virtual reality, and computer vision. Backed by major investors, World Labs intends to push the boundaries of AI in various industries including healthcare and manufacturing.

  7. Article
    Real Python · 2y

    Hugging Face Transformers Quiz – Real Python

    Test your understanding of Hugging Face Transformers with this 6-question interactive quiz. This popular library is used for transformer models in natural language processing, computer vision, and other machine learning tasks. There's no time limit and you'll receive a score at the end, with a maximum of 100%. Good luck!

  8. Article
    ProAndroidDev · 2y

    Building On-Device Face Recognition In Android

    The post explains how to build an on-device face recognition app for Android using FaceNet, TensorFlow Lite, MediaPipe, and ObjectBox. It covers the face recognition pipeline, from detecting and cropping faces with MediaPipe, to converting the crops into embeddings with FaceNet running on TensorFlow Lite, to storing and managing those embeddings with ObjectBox. The implementation then performs a nearest-neighbor search over the stored embeddings to recognize faces.
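    The final matching step, a nearest-neighbor search over stored embeddings, can be sketched in plain Python. This is illustrative only: the app itself implements it in Kotlin on top of ObjectBox, and the names and threshold here are assumptions.

```python
import math

def l2_distance(a, b):
    """Euclidean distance between two embedding vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def nearest_face(query, gallery, threshold=1.0):
    """Return the name of the closest stored embedding, or None.

    `gallery` maps person names to FaceNet-style embeddings.
    The threshold rejects matches that are too far away; its
    value depends on the embedding model and distance metric.
    """
    best_name, best_dist = None, threshold
    for name, emb in gallery.items():
        dist = l2_distance(query, emb)
        if dist < best_dist:
            best_name, best_dist = name, dist
    return best_name
```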

  9. Article
    Hacker News · 2y

    ShaShekhar/aaiela

    This project allows users to modify images using audio commands. It incorporates several AI models including Detectron2 for object detection, Faster Whisper for audio transcription, and Stable Diffusion for text-to-image inpainting. Users can upload an image, give an audio command, and see the image modified based on their spoken instructions. The project also supports both local and API-based language models, with adjustable settings for customization.

  10. Article
    Medium · 2y

    Learn Transformer Fine-Tuning and Segment Anything

    The post describes how to fine-tune Meta’s Segment Anything Model (SAM) to produce high-fidelity masks in new domains, using river pixel segmentation as a running example. It covers the project requirements, SAM’s architecture, prompt configuration, and the training process. The post offers practical advice on dataset management, training on Google Colab and GCP, and choosing among different prompt types for better segmentation results.
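    Segmentation fine-tuning like this is commonly evaluated (and often trained) with overlap metrics such as the Dice coefficient. A minimal plain-Python version for binary masks follows as a sketch; it is not code from the post, and the function name and epsilon are illustrative.

```python
def dice_coefficient(pred, target, eps=1e-6):
    """Dice coefficient between two flat binary masks (lists of 0/1).

    Dice = 2 * |A ∩ B| / (|A| + |B|); 1.0 means perfect overlap.
    The eps term avoids division by zero when both masks are empty.
    """
    intersection = sum(p * t for p, t in zip(pred, target))
    total = sum(pred) + sum(target)
    return (2.0 * intersection + eps) / (total + eps)
```

    In training, 1 minus this value (Dice loss), often combined with focal or cross-entropy loss, is a common objective for mask predictions.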

  11. Article
    freeCodeCamp · 2y

    How Does Knowledge Distillation Work in Deep Learning Models?

    Deep learning models can be resource-intensive, prompting the need for more efficient alternatives. Knowledge distillation transfers knowledge from a complex 'teacher' model to a simpler 'student' model, allowing the latter to achieve high performance with lower computational demands. This method improves model compression, generalization, and accessibility in fields like computer vision, NLP, and edge computing. Despite its challenges, such as computational overhead and hyperparameter tuning, knowledge distillation offers a pathway to creating smaller, efficient models suitable for a wide range of applications.
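    The core of the teacher-to-student transfer is a softened distribution match: the student is trained to reproduce the teacher's temperature-scaled output distribution. A minimal plain-Python sketch of that distillation term follows (the temperature scaling and T² factor follow the standard formulation; the function names are illustrative, and a real setup would combine this with cross-entropy on the hard labels):

```python
import math

def softmax(logits, temperature=1.0):
    """Softmax with a temperature; higher T gives softer distributions."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """KL divergence between softened teacher and student distributions.

    In practice this term is weighted against the usual cross-entropy
    on the hard labels by a hyperparameter (often called alpha).
    """
    p = softmax(teacher_logits, temperature)  # soft targets from teacher
    q = softmax(student_logits, temperature)  # student predictions
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
    return kl * temperature ** 2  # T^2 keeps the gradient scale comparable
```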