LLaVA-NeXT: Improved reasoning, OCR, and world knowledge

Posted by Kylin on April 25, 2024


Abstract

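Compared with LLaVA-1.5, the biggest update in LLaVA-NeXT is higher input resolution: it supports three resolutions, 672x672, 336x1344, and 1344x336.

The 34B model reportedly outperforms Gemini Pro.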

Method

Dynamic High-Resolution

This feels like a pretty brute-force approach.

[Figure: screenshot 2024-04-25 11.02.19]
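To make the "brute-force" impression concrete, here is a minimal sketch of how a dynamic high-resolution scheme could work with the three resolutions listed above. It is not the authors' implementation. The 336-pixel tile side, the 14-pixel patch size, the extra downsampled global view, and the function names (`select_best_resolution`, `tile_boxes`, `max_visual_tokens`) are all assumptions made for illustration.

```python
from typing import List, Tuple

# Candidate canvases (width, height), taken from the three resolutions in the post.
CANDIDATES: List[Tuple[int, int]] = [(672, 672), (336, 1344), (1344, 336)]
TILE = 336   # assumed tile side (the input size of the LLaVA-1.5 vision encoder)
PATCH = 14   # assumed ViT patch size


def select_best_resolution(size: Tuple[int, int],
                           candidates: List[Tuple[int, int]] = CANDIDATES) -> Tuple[int, int]:
    """Pick the canvas that keeps the most image pixels after an
    aspect-preserving resize; break ties by the least unused canvas area."""
    w, h = size
    best, best_effective, best_waste = None, -1, float("inf")
    for cw, ch in candidates:
        scale = min(cw / w, ch / h)              # fit the image inside the canvas
        sw, sh = int(w * scale), int(h * scale)  # resized image size
        effective = min(sw * sh, w * h)          # never count upsampled pixels
        waste = cw * ch - effective              # canvas area left as padding
        if effective > best_effective or (effective == best_effective and waste < best_waste):
            best, best_effective, best_waste = (cw, ch), effective, waste
    return best


def tile_boxes(canvas: Tuple[int, int], tile: int = TILE) -> List[Tuple[int, int, int, int]]:
    """Return (left, top, right, bottom) crop boxes covering the canvas row by row."""
    cw, ch = canvas
    return [(x, y, x + tile, y + tile)
            for y in range(0, ch, tile)
            for x in range(0, cw, tile)]


def max_visual_tokens(canvas: Tuple[int, int], tile: int = TILE, patch: int = PATCH) -> int:
    """Upper bound on visual tokens: every tile plus one global view,
    each encoded as (tile // patch) ** 2 patch tokens."""
    per_view = (tile // patch) ** 2
    return (len(tile_boxes(canvas, tile)) + 1) * per_view


if __name__ == "__main__":
    for size in [(1000, 1000), (2000, 500), (500, 2000)]:
        canvas = select_best_resolution(size)
        print(size, "->", canvas, len(tile_boxes(canvas)), "tiles,",
              max_visual_tokens(canvas), "tokens at most")
```

Under these assumptions every one of the three canvases splits into four 336x336 tiles, so the visual token budget grows roughly fivefold compared with a single 336x336 input, which is where the brute-force feeling comes from.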

Data Mixture

  • High-quality User Instruct Data: diverse tasks with high-quality responses
  • Multimodal Document/Chart Data: following Qwen-VL-7B-Chat, ChartQA is added

Scaling LLM backbone

In addition to Vicuna-1.5 (7B and 13B), we consider more LLMs, including Mistral-7B and Nous-Hermes-2-Yi-34B.

Model Card

[Figure: screenshot 2024-04-25 10.59.53]

Reference