KylinChen | Blog

MLLM Architecture

A Detailed Look at Classic MLLM Architectures

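Flamingo [1] has two main components. Perceiver Resampler: similar to DETR, it uses multiple Perceiver Resampler modules to produce a fixed set of 64 tokens; its main role is to extract a fixed-length feature representation from an image, which resolves the problem of inconsistent feature-map sizes across images and even multi-frame video. XAttn-Dense: every LLM layer gets an added cross-a...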

Posted by Kylin on January 22, 2024

LLM Architecture

A Detailed Look at Classic LLM Architectures

Reference

Posted by Kylin on January 22, 2024

Common LLM Interview Questions

Interview for LLM

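Basics. 1. Why are mainstream LLMs all decoder-only? [1] (Meituan-1) First compare the options side by side, then explain why. LLM architectures fall into three main categories: encoder-only, encoder-decoder, and decoder-only. To begin with, pure encoder + MLM models (BERT) are only suited to NLU, not to generation. Today's encoder-decoder and decoder-only...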

Posted by Kylin on January 22, 2024

Exact Parameter Count of GPT-2

Estimating LLM Parameter Counts

LLM parameter count. Example: computing the parameter count of GPT2-small. { "activation_function": "gelu_new", "architectures": [ "GPT2LMHeadModel" ], "attn_pdrop": 0.1, "bos_token_id": 50256, "embd_pdrop": 0.1, "eo...
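As a rough illustration of the kind of calculation this post describes, the sketch below counts GPT2-small parameters layer by layer. The config excerpt above is cut off before the size fields, so the hyperparameters here (vocabulary 50257, 1024 positions, hidden size 768, 12 layers) are assumed from the published GPT2-small configuration, and the helper names are made up for this example.

```python
# Hyperparameters assumed from the published GPT2-small config; the excerpt
# above is truncated before these fields appear.
VOCAB_SIZE = 50257
N_POSITIONS = 1024
N_EMBD = 768
N_LAYER = 12


def linear(n_in, n_out):
    """Parameters of a dense layer: weight matrix plus bias vector."""
    return n_in * n_out + n_out


def layer_norm(n):
    """Parameters of a LayerNorm: one scale and one shift vector."""
    return 2 * n


def gpt2_param_count(vocab, n_pos, d, n_layer):
    # Token embeddings plus learned position embeddings.
    embeddings = vocab * d + n_pos * d
    # Attention: fused QKV projection, then the output projection.
    attention = linear(d, 3 * d) + linear(d, d)
    # MLP: expand to 4*d, then project back to d.
    mlp = linear(d, 4 * d) + linear(4 * d, d)
    # Each block has two LayerNorms, one before attention and one before the MLP.
    block = layer_norm(d) + attention + layer_norm(d) + mlp
    # The LM head shares its weights with the token embedding, so it adds nothing.
    return embeddings + n_layer * block + layer_norm(d)


total = gpt2_param_count(VOCAB_SIZE, N_POSITIONS, N_EMBD, N_LAYER)
print(f"{total:,}")  # 124,439,808
```

Under these assumptions the total matches the commonly quoted figure of roughly 124M parameters for GPT2-small.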

Posted by Kylin on January 21, 2024

LLM Science Exam Review

Kaggle LLM Science Exam Review

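Background. Objectives: have an LLM answer multiple-choice questions (one correct option out of five). Data: multiple-choice questions written by GPT-3.5 from Wikimedia data (with easy questions filtered out). Note that the largest model that can run on Kaggle is about 7B parameters (because the hardware is a 16 GB P100). The train set provides 200 sample questions; the full test set contains 4,000 samples. Metrics: MAP@3 (Mean Average Pre...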
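A minimal sketch of how MAP@3 can be computed for this setting, assuming each question has exactly one correct option and each prediction is a ranked list of up to three option labels. The function name and the sample data are illustrative, not taken from the competition code.

```python
def map_at_3(predictions, answers):
    """Mean Average Precision at 3 with a single correct label per question.

    predictions: list of ranked label lists, best guess first.
    answers: list of the correct labels, one per question.
    """
    total = 0.0
    for ranked, answer in zip(predictions, answers):
        # Only the first three guesses count; a hit at rank k scores 1/k.
        for rank, label in enumerate(ranked[:3], start=1):
            if label == answer:
                total += 1.0 / rank
                break
    return total / len(answers)


# Illustrative data: hit at rank 1, hit at rank 2, and a miss.
sample_predictions = [["A", "B", "C"], ["B", "A", "C"], ["C", "D", "E"]]
sample_answers = ["A", "A", "A"]
score = map_at_3(sample_predictions, sample_answers)
print(score)  # 0.5
```

With one correct answer per question, the metric reduces to the average reciprocal rank of the correct option, truncated at three guesses.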

Posted by Kylin on January 21, 2024

IMC 2023 Review

Image Matching Challenge 2023 Review

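Background. Objectives: reconstruct a 3D model or keypoints of a scene from a set of images (SfM, structure from motion); the goal is to output the pose (rotation matrix and translation vector) of each of N images. Metrics: submissions are evaluated by the mean Average Accuracy (mAA) of the estimated poses. Given a set of cameras, parameterized by their rotation matrices and translation vectors, and the hidden ground truth, we compute, for every pair of images among the \(N\), the rotation ( \(\epsi...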

Posted by Kylin on January 21, 2024

SkipDecode

Autoregressive Skip Decoding with Batching and Caching for Efficient LLM Inference

5 Jul 2023, https://arxiv.org/abs/2307.02628 Abstract. Background: early-exit strategies reduce computational cost. Challenges: existing methods are hard to combine with batch inference and key-value caching. Method: It o...

Posted by Kylin on January 12, 2024

Survey on Decoding Algorithm

Optimizing Mainstream Decoding Algorithms

Non-autoregressive decoding [97, 104, 108]: research originally proposed for machine translation. [271]: a survey of non-autoregressive translation. Disadvantage: so far, the conditional dependence between output tokens remains unknown, so the decoding results...

Posted by Kylin on January 11, 2024

Emu2 Training Details

Generative Multimodal Models are In-Context Learners

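Abstract: the paper keeps emphasizing scaling up, which raises a question: are the performance gains and the advantages of the current architecture actually tied to the 37B parameter count? Architecture. Training. 1) First stage. Data: image-text and video-text data. Objective: captioning loss (text loss). Trained components: linear projection, M...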

Posted by Kylin on January 9, 2024

H2O filtering KV cache

Heavy-Hitter Oracle for Efficient Generative Inference of Large Language Models

Abstract. Challenge: in LLM inference, the KV cache scales linearly with sequence length and batch size. Insight: a small portion of tokens contributes most of the value when computing...
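To make the linear scaling concrete, here is a back-of-the-envelope sketch of KV cache size for standard multi-head attention. The model dimensions below are hypothetical values picked for illustration, not figures from the paper, and fp16 storage (2 bytes per value) is assumed.

```python
def kv_cache_bytes(batch_size, seq_len, n_layer, n_head, head_dim, bytes_per_value=2):
    """Approximate KV cache size for standard multi-head attention.

    The leading factor of 2 accounts for storing both keys and values.
    """
    return 2 * batch_size * seq_len * n_layer * n_head * head_dim * bytes_per_value


# Hypothetical model: 32 layers, 32 heads of dimension 128, fp16 cache.
base = kv_cache_bytes(batch_size=1, seq_len=1024, n_layer=32, n_head=32, head_dim=128)
longer = kv_cache_bytes(batch_size=1, seq_len=2048, n_layer=32, n_head=32, head_dim=128)
batched = kv_cache_bytes(batch_size=8, seq_len=1024, n_layer=32, n_head=32, head_dim=128)

print(base // 1024**2, "MiB")     # 512 MiB
print(longer // 1024**2, "MiB")   # 1024 MiB: doubling the sequence doubles the cache
print(batched // 1024**2, "MiB")  # 4096 MiB: 8x the batch gives 8x the cache
```

Because every factor enters as a plain product, evicting a fraction of cached tokens shrinks the cache by the same fraction, which is why token-level filtering is an attractive lever.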

Posted by Kylin on January 8, 2024
