MacawLLM

MULTI-MODAL LANGUAGE MODELING WITH IMAGE, AUDIO, VIDEO

Posted by Kylin on June 25, 2023

[TOC]

Abstract

Multi-modal LLM: spanning visual, audio, and text data

Macaw-LLM has three main components:

  • modality module for encoding multi-modal data

  • cognitive module for harnessing pretrained LLMs

  • alignment module for harmonizing diverse representations. (novel contribution)
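
To make the division of labor between the three components concrete, here is a toy sketch in plain Python. It is not the authors' implementation. The dimensions, the random "encoder", the linear projection, and the choice of attention over the LLM's token embedding table are all assumptions made for illustration, and every function name is hypothetical. It only shows one plausible way an alignment step could map modality features into the space a pretrained LLM already understands.

```python
import math
import random

random.seed(0)

# All sizes below are hypothetical and chosen only to keep the example small.
D_MODALITY = 6   # width of the modality encoder's features
D_TEXT = 4       # width of the LLM's token embeddings
VOCAB = 10       # vocabulary size of the LLM


def rand_matrix(rows, cols):
    """Random matrix standing in for pretrained or learned weights."""
    return [[random.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def softmax(row):
    """Numerically stable softmax over one list of scores."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def modality_module(n_frames):
    """Stand-in for a pretrained image/audio/video encoder (assumption)."""
    return rand_matrix(n_frames, D_MODALITY)


def alignment_module(features, projection, token_embeddings):
    """Project features to text width, then attend over the embedding table.

    Each output row is a convex combination of token embeddings, so it lies
    in the same space as ordinary text tokens.
    """
    queries = matmul(features, projection)                # (n, D_TEXT)
    keys_t = [list(col) for col in zip(*token_embeddings)]  # (D_TEXT, VOCAB)
    scores = matmul(queries, keys_t)                      # (n, VOCAB)
    weights = [softmax([s / math.sqrt(D_TEXT) for s in row]) for row in scores]
    return matmul(weights, token_embeddings)              # (n, D_TEXT)


def cognitive_module_input(aligned, text_ids, token_embeddings):
    """Prepend aligned modality vectors to the embedded text prompt."""
    return aligned + [token_embeddings[i] for i in text_ids]


token_embeddings = rand_matrix(VOCAB, D_TEXT)
projection = rand_matrix(D_MODALITY, D_TEXT)
features = modality_module(n_frames=3)
aligned = alignment_module(features, projection, token_embeddings)
sequence = cognitive_module_input(aligned, [1, 5, 7], token_embeddings)
print(len(sequence), len(sequence[0]))  # 6 4
```

The resulting sequence (three aligned modality vectors followed by three text embeddings) is what a pretrained LLM would consume in this sketch. In a real system the random matrices would be replaced by pretrained encoders and learned weights.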
