LLaVA-NeXT: Improved Reasoning, OCR, and World Knowledge

[TOC]

Compared with LLaVA-1.5, the biggest update in LLaVA-NeXT is input resolution: it supports three resolutions, 672x672, 336x1344, and 1344x336.
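To make the three resolutions concrete, here is a minimal sketch of how each one breaks down into 336x336 tiles and how an input image might be matched to one of them. It assumes a 336x336 base tile (the input size LLaVA-1.5 uses) and reads each resolution as width x height. The names `tile_grid` and `pick_resolution` and the selection rule (keep the most image pixels, then waste the least padding) are illustrative assumptions, not the project's confirmed implementation.

```python
TILE = 336  # assumed base tile size (LLaVA-1.5's input resolution)

# The three resolutions from the note, read as (width, height); the ordering is an assumption.
SUPPORTED = [(672, 672), (336, 1344), (1344, 336)]


def tile_grid(resolution, tile=TILE):
    """Return (columns, rows) of tile-sized patches that cover a resolution."""
    width, height = resolution
    return width // tile, height // tile


def pick_resolution(image_size, candidates=SUPPORTED):
    """Hypothetical selection rule: prefer the candidate that preserves the most
    image pixels after an aspect-preserving resize, breaking ties by least padding."""
    img_w, img_h = image_size
    best, best_key = None, None
    for cand_w, cand_h in candidates:
        # Largest scale at which the image still fits inside the candidate.
        scale = min(cand_w / img_w, cand_h / img_h)
        fit_w, fit_h = round(img_w * scale), round(img_h * scale)
        # Upscaling adds no real detail, so cap at the original pixel count.
        effective = min(fit_w * fit_h, img_w * img_h)
        wasted = cand_w * cand_h - effective
        key = (effective, -wasted)
        if best_key is None or key > best_key:
            best, best_key = (cand_w, cand_h), key
    return best


if __name__ == "__main__":
    for res in SUPPORTED:
        cols, rows = tile_grid(res)
        print(res, "->", cols, "x", rows, "tiles")
    print(pick_resolution((2000, 500)))
```

Under these assumptions, every supported resolution is exactly four tiles (2x2, 1x4, or 4x1), i.e. four times the pixels of a single 336x336 input, and a wide 2000x500 image would be matched to 1344x336.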

The 34B model reportedly outperforms Gemini Pro.

It all feels rather brute-force.


In addition to Vicuna-1.5 (7B and 13B), we consider more LLMs, including Mistral-7B and Nous-Hermes-2-Yi-34B.


LLaVA-NeXT: Improved reasoning, OCR, and world knowledge