LLaVA-UHD is a method for building encoder-decoder models that can handle high-resolution images at any aspect ratio. It splits up large images into smaller slices, preserves fine visual details, and outperforms standard models and specialized high-res systems.
Sort: