In this video, we dive into an intriguing research paper from NVIDIA, titled "LLaMA-Mesh: Unifying 3D Mesh Generation with Language Models."

With LLaMA-Mesh, NVIDIA researchers enable large language models (LLMs) to understand and generate 3D mesh objects using text alone.

We explain the OBJ format, which LLaMA-Mesh uses to encode 3D mesh objects as text. Then, we describe how the model was created by fine-tuning LLaMA-3.1-8B-Instruct on a text-3D instruction dataset crafted by the researchers. Finally, we review interesting results from the paper.

Paper page - https://arxiv.org/abs/2411.09595
Code - https://github.com/nv-tlabs/LLaMA-Mesh
NVIDIA's blog - https://research.nvidia.com/labs/toronto-ai/LLaMA-Mesh/
Our blog - https://aipapersacademy.com/llama-mesh/
-----------------------------------------------------------------------------------------------
✉️ Join the newsletter - https://aipapersacademy.com/newsletter/

👍 Please like & subscribe if you enjoy this content

Support us - https://paypal.me/aipapersacademy

The video was edited using VideoScribe - https://tidd.ly/44TZEiX
-----------------------------------------------------------------------------------------------

Chapters:
0:00 Introduction
1:17 How Can LLMs Understand 3D?
3:00 Building LLaMA-Mesh
4:22 Results

AI Papers Academy

NVIDIA researchers published LLaMA-Mesh, a fine-tuned version of LLaMA-3.1-8B-Instruct that can both generate and understand 3D mesh objects. The approach relies on OBJ, a plain-text format for 3D objects in which each vertex and face is written as a line of text, so the LLM can process 3D data as ordinary text tokens. Vertex coordinates are quantized to integers, which shortens each coordinate and reduces the token count. The supervised fine-tuning dataset combines mesh generation samples, mesh recognition samples, and LLM-augmented dialogue samples. Results show 3D generation quality competitive with dedicated mesh models, while the base model's language capabilities are largely preserved.
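
To make the OBJ-as-text idea concrete, here is a minimal Python sketch of how a mesh could be quantized and written out as the kind of text an LLM would see. This is an illustration, not the authors' code: the function names (quantize_vertices, to_obj_text), the bin count of 64, and the tetrahedron example are all assumptions chosen for the demo.

```python
def quantize_vertices(vertices, bins=64):
    """Map float coordinates to integers in [0, bins - 1].

    Assumption: one shared scale across all three axes, taken from the
    overall min and max coordinate. The bin count is illustrative.
    """
    lo = min(c for v in vertices for c in v)
    hi = max(c for v in vertices for c in v)
    span = (hi - lo) or 1.0  # avoid division by zero for a degenerate mesh
    return [
        tuple(min(bins - 1, int((c - lo) / span * bins)) for c in v)
        for v in vertices
    ]


def to_obj_text(vertices, faces):
    """Serialize vertices and faces as OBJ lines: 'v x y z' and 'f i j k'."""
    lines = [f"v {x} {y} {z}" for x, y, z in vertices]
    # OBJ face indices are 1-based and refer to the vertex lines above.
    lines += ["f " + " ".join(str(i) for i in face) for face in faces]
    return "\n".join(lines)


# Hypothetical example mesh: a tetrahedron with float coordinates.
vertices = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)]
faces = [(1, 2, 3), (1, 2, 4), (1, 3, 4), (2, 3, 4)]

obj_text = to_obj_text(quantize_vertices(vertices), faces)
print(obj_text)
```

Each printed line is plain text, so the whole mesh can be placed directly in a prompt or a training target. Integer coordinates take fewer characters than long decimals, which is the motivation for quantizing before serializing.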

LLaMA-Mesh by Nvidia: LLM for 3D Mesh Generation