NVIDIA researchers published LLaMA-Mesh, a fine-tuned version of LLaMA-3.1-8B-Instruct that can both generate and understand 3D mesh objects. The approach represents meshes in the text-based OBJ format, so the LLM can process 3D data as ordinary text tokens. Vertex coordinates are quantized to integers to reduce the token count. The supervised fine-tuning dataset contains two types of mesh samples, generation tasks and recognition tasks, along with LLM-augmented dialogue samples. Results show 3D generation quality competitive with dedicated mesh models, while the base model's language capabilities are largely preserved.
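To make the quantization step concrete, here is a minimal sketch of how OBJ vertex coordinates could be mapped to integers before the text is tokenized. This is not the authors' code: the function name `quantize_obj`, the default of 64 bins, and the choice of a single shared min/max range across all axes are assumptions made for illustration.

```python
def quantize_obj(obj_text: str, bins: int = 64) -> str:
    """Rescale OBJ vertex coordinates to integers in [0, bins - 1].

    Hypothetical helper: the bin count and the shared min/max scaling
    are illustrative assumptions, not details taken from the summary.
    """
    lines = obj_text.strip().splitlines()
    # Collect the x, y, z values of every vertex line ("v x y z").
    verts = [
        [float(x) for x in line.split()[1:4]]
        for line in lines
        if line.startswith("v ")
    ]
    if not verts:
        return obj_text

    lo = min(c for v in verts for c in v)
    hi = max(c for v in verts for c in v)
    # A degenerate mesh (all coordinates equal) maps everything to 0.
    scale = (bins - 1) / (hi - lo) if hi > lo else 0.0

    out = []
    for line in lines:
        if line.startswith("v "):
            coords = [float(x) for x in line.split()[1:4]]
            q = [int(round((c - lo) * scale)) for c in coords]
            out.append("v " + " ".join(str(i) for i in q))
        else:
            # Face lines and anything else pass through unchanged.
            out.append(line)
    return "\n".join(out)


triangle = """
v 0.0 0.0 0.0
v 1.0 0.0 0.0
v 0.0 1.0 0.0
f 1 2 3
"""
print(quantize_obj(triangle))
```

Integer coordinates such as `63` generally take fewer tokens than decimal strings such as `0.984375`, which is the motivation the summary gives for quantizing. Face lines (`f 1 2 3`) only index vertices, so they are left untouched.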