[TOC]
Abstract
Multi-modal LLM: spans visual, audio, and text modalities
Macaw-LLM has three main components:
- modality module: encodes multi-modal data
- cognitive module: harnesses pretrained LLMs
- alignment module: harmonizes diverse representations (the main, novel contribution)
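
To make the three-component split concrete, here is a minimal sketch of how the modules could hand data to one another. It is not Macaw-LLM's actual implementation: the class names, the stub encoders, and the toy dimensions are all hypothetical. The per-modality linear projection in the alignment step is only an assumed stand-in for whatever alignment mechanism the paper uses.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Vector = List[float]


@dataclass
class ModalityModule:
    """Encodes raw inputs of each modality into feature vectors (stub encoders)."""
    encoders: Dict[str, Callable[[object], List[Vector]]]

    def encode(self, inputs: Dict[str, object]) -> Dict[str, List[Vector]]:
        return {name: self.encoders[name](raw) for name, raw in inputs.items()}


@dataclass
class AlignmentModule:
    """Maps modality features into the LLM embedding space.

    Assumption: a plain linear projection per modality, used as a stand-in.
    """
    projections: Dict[str, List[Vector]]  # one matrix per modality, llm_dim rows

    def align(self, features: Dict[str, List[Vector]]) -> List[Vector]:
        aligned: List[Vector] = []
        for name, vectors in features.items():
            matrix = self.projections[name]
            for v in vectors:
                # Matrix-vector product: one output value per matrix row.
                aligned.append([sum(w * x for w, x in zip(row, v)) for row in matrix])
        return aligned


@dataclass
class CognitiveModule:
    """Stands in for a pretrained LLM that consumes a sequence of embeddings."""
    embed_text: Callable[[str], List[Vector]]

    def build_input(self, aligned: List[Vector], prompt: str) -> List[Vector]:
        # Prefix the text embeddings with the aligned multi-modal embeddings.
        return aligned + self.embed_text(prompt)


# Toy wiring: hypothetical 3-d image features, 2-d audio features, 2-d LLM space.
modality = ModalityModule(encoders={
    "image": lambda raw: [[1.0, 2.0, 3.0]],
    "audio": lambda raw: [[0.5, 0.5]],
})
alignment = AlignmentModule(projections={
    "image": [[1.0, 0.0, 0.0], [0.0, 1.0, 1.0]],  # 3-d -> 2-d
    "audio": [[1.0, 1.0], [1.0, -1.0]],           # 2-d -> 2-d
})
cognitive = CognitiveModule(
    embed_text=lambda text: [[0.0, 0.0] for _ in text.split()]
)

features = modality.encode({"image": "img.png", "audio": "clip.wav"})
sequence = cognitive.build_input(alignment.align(features), "describe the scene")
print(len(sequence))  # 2 aligned multi-modal vectors + 3 text token vectors
```

The point of the sketch is the data flow: only the alignment step needs to know both the encoder output sizes and the LLM embedding size, so the encoders and the LLM can stay pretrained and interchangeable.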