[TOC]
Abs
Compared with LLaVA-1.5, the biggest update in LLaVA-NeXT is input resolution: it now supports three resolutions, 672x672, 336x1344, and 1344x336.
The 34B model reportedly outperforms Gemini Pro.
Method
Dynamic High-Resolution
This feels like a pretty brute-force approach.
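To make the idea concrete, here is a minimal sketch of how one of the three supported resolutions could be chosen for an input image and then cut into fixed-size tiles. It is not the official implementation. The 336-pixel tile size is inferred from the resolutions listed above, the sizes are read as (width, height), and the names `select_resolution` and `tile_boxes` are hypothetical helpers made up for illustration.

```python
from typing import List, Tuple

TILE = 336  # assumed base tile size (every listed resolution is a multiple of 336)

# Candidate resolutions from the notes, read here as (width, height); this ordering is an assumption.
CANDIDATES: List[Tuple[int, int]] = [(672, 672), (336, 1344), (1344, 336)]


def select_resolution(width: int, height: int,
                      candidates: List[Tuple[int, int]] = CANDIDATES) -> Tuple[int, int]:
    """Pick the candidate that keeps the most image pixels and wastes the least padding."""
    best, best_key = None, None
    for cw, ch in candidates:
        # Scale the image to fit inside the candidate while keeping its aspect ratio.
        scale = min(cw / width, ch / height)
        scaled_w, scaled_h = round(width * scale), round(height * scale)
        # Pixels of real content that survive (upscaling adds no information).
        effective = min(scaled_w * scaled_h, width * height)
        # Canvas area not covered by real content.
        wasted = cw * ch - effective
        key = (effective, -wasted)
        if best_key is None or key > best_key:
            best, best_key = (cw, ch), key
    return best


def tile_boxes(resolution: Tuple[int, int], tile: int = TILE) -> List[Tuple[int, int, int, int]]:
    """Return (left, top, right, bottom) crop boxes covering the resolution, row by row."""
    w, h = resolution
    return [(x, y, x + tile, y + tile)
            for y in range(0, h, tile)
            for x in range(0, w, tile)]


if __name__ == "__main__":
    res = select_resolution(2000, 500)   # a wide image
    print(res, len(tile_boxes(res)))
```

Under these assumptions, each of the three resolutions splits into exactly four 336x336 tiles (2x2, 1x4, or 4x1), so the number of tiles per image stays fixed while the aspect ratio varies.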
Data Mixture
- High-quality User Instruct Data: diverse instructions with high-quality responses
- Multimodal Document/Chart Data: following Qwen-VL-7B-Chat, ChartQA is added
Scaling LLM backbone
In addition to Vicuna-1.5 (7B and 13B), we consider more LLMs, including Mistral-7B and Nous-Hermes-2-Yi-34B.