Kylin Page

A fool who dreams.

Getting Started with GPU Analysis

GPU Analysis from zero to zero

[TOC] Basics: PyTorch custom operators, numba, triton. Analysis Tools: torch.autograd.profiler, e.g. with torch.autograd.profiler.profile(use_cuda=True) as prof: torch.square(b); prof.export_chrome_trace("log...
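The truncated snippet follows the standard legacy autograd-profiler pattern; here is a minimal runnable sketch, assuming a CUDA device. The tensor `b` and the `"log.json"` filename are illustrative assumptions (the excerpt's filename is cut off).

```python
import torch

# A minimal sketch of the profiling pattern in the excerpt.
# `b` and the trace filename are illustrative assumptions.
b = torch.randn(1024, 1024, device="cuda")

with torch.autograd.profiler.profile(use_cuda=True) as prof:
    torch.square(b)

# Summarize per-op timings, then export a trace viewable at chrome://tracing.
print(prof.key_averages().table(sort_by="cuda_time_total", row_limit=10))
prof.export_chrome_trace("log.json")
```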

LLM Inference Optimization Review (August 2024)

LLM Inference Paper Review, August 2024

[TOC] RelayAttention1: Scenario: a long system prompt. Problem: for batched requests, KV caches are transferred from off-chip DRAM to on-chip SRAM multiple times, i.e., once per request, since requests are handled independently. Solution: RelayAttention allows reading t...
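To make the idea concrete, here is a rough sketch, my reconstruction rather than the paper's code: attention is split into a system-prompt pass, whose KV can be broadcast across the whole batch and therefore read from DRAM once, and a per-request pass; the two partial outputs are recombined exactly via their softmax log-normalizers.

```python
import torch
import torch.nn.functional as F

def relay_attention_sketch(q, k_sys, v_sys, k_req, v_req):
    """Sketch only: attention over [system KV ; request KV] computed as two
    passes. k_sys/v_sys may have batch size 1 and broadcast over the batch,
    so the shared system-prompt KV is read once instead of once per request."""
    scale = q.shape[-1] ** -0.5
    # Pass 1: queries against the shared system-prompt KV.
    s_sys = (q @ k_sys.transpose(-2, -1)) * scale
    lse_sys = torch.logsumexp(s_sys, dim=-1, keepdim=True)
    o_sys = F.softmax(s_sys, dim=-1) @ v_sys
    # Pass 2: queries against the per-request KV.
    s_req = (q @ k_req.transpose(-2, -1)) * scale
    lse_req = torch.logsumexp(s_req, dim=-1, keepdim=True)
    o_req = F.softmax(s_req, dim=-1) @ v_req
    # Recombine: each pass is weighted by its share of the total softmax mass,
    # sigmoid(lse_sys - lse_req) = Z_sys / (Z_sys + Z_req).
    w_sys = torch.sigmoid(lse_sys - lse_req)
    return w_sys * o_sys + (1 - w_sys) * o_req
```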

Displaced Patch Pipeline Parallelism

Model Inference Optimization in the DiT Era

[TOC] GitHub project: https://github.com/xdit-project/xDiT?tab=readme-ov-file Abs Reference

KnowLA

Enhancing Parameter-Efficient Fine-Tuning with Knowledge Adaptation

[TOC] Abs: leveraging knowledge graph embeddings to improve the effectiveness of PEFT: it inserts an adaptation layer into an LLM to integrate the embeddings of entities appearing in the input te...
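As a rough illustration of what "an adaptation layer integrating entity embeddings" could look like, here is a hypothetical sketch; the module name, gating, and wiring are my assumptions, not the paper's code.

```python
import torch
import torch.nn as nn

class EntityAdaptationLayer(nn.Module):
    """Hypothetical sketch: project KG entity embeddings into the LLM hidden
    space and add them (gated) at the token positions where entities appear."""
    def __init__(self, kg_dim: int, hidden_dim: int):
        super().__init__()
        self.proj = nn.Linear(kg_dim, hidden_dim)
        self.gate = nn.Linear(hidden_dim * 2, 1)

    def forward(self, hidden, entity_emb, entity_mask):
        # hidden: (B, T, H); entity_emb: (B, T, Dkg);
        # entity_mask: (B, T) float mask, 1 where the token maps to a KG entity.
        injected = self.proj(entity_emb)
        g = torch.sigmoid(self.gate(torch.cat([hidden, injected], dim=-1)))
        return hidden + entity_mask.unsqueeze(-1) * g * injected
```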

Survey on Graph and RAG

A Survey of GraphRAG

[TOC] RAG. Why RAG is used: improving answer accuracy ("increase LLM accuracy"); reducing hallucinations ("reducing LLM hallucination"); training-free real-time information updates and domain-knowledge augmentation ("handling train...

BLIP-2 Code Walkthrough

A Detailed Look at the BLIP-2 Training Code

[TOC] Most code analyses found online focus on the inference side and ignore the blip2qformer training code, so this post analyzes it. Inference (or stage-2): during BLIP-2 inference only the image-side block is active, so the flow is relatively simple: the learnable tokens output by N Blip2QFormerLayer layers serve as image features and then pass through an MLP...
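A heavily simplified sketch of that image-side flow follows; it keeps only cross-attention, whereas the real Blip2QFormerLayer also contains self-attention and feed-forward sublayers, and all dimensions here are assumptions.

```python
import torch
import torch.nn as nn

class QFormerInferenceSketch(nn.Module):
    """Simplified sketch of the BLIP-2 stage-2 image path described above."""
    def __init__(self, num_layers=12, num_queries=32, dim=768, llm_dim=4096):
        super().__init__()
        self.query_tokens = nn.Parameter(torch.zeros(1, num_queries, dim))
        self.layers = nn.ModuleList(
            nn.MultiheadAttention(dim, num_heads=12, batch_first=True)
            for _ in range(num_layers)
        )
        self.proj = nn.Linear(dim, llm_dim)  # maps query outputs into the LLM space

    def forward(self, image_feats):
        # image_feats: (B, P, dim) ViT features, assumed already at Q-Former width.
        q = self.query_tokens.expand(image_feats.size(0), -1, -1)
        for attn in self.layers:
            # Learnable queries cross-attend to the frozen image features.
            q = q + attn(q, image_feats, image_feats)[0]
        return self.proj(q)  # (B, num_queries, llm_dim), fed to the language model
```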

An Overview of Multimodal Technology Development

An Overview of Multimodal Technology Development

[TOC] CLIP: contrastive learning builds a bridge between images and text (2021, OpenAI). Contrastive Language-Image Pre-training. Detailed analysis1: a typical two-tower model with two encoders, one for images and one for text; after the image and text pass through their respective encoders, a simple dot product represents the cross-modal interaction (similarity). During training, assume a batch contains N (image, text)...
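The training objective that follows from this setup is the symmetric InfoNCE loss over the N x N similarity matrix; a minimal sketch, where the temperature value and tensor names are illustrative assumptions:

```python
import torch
import torch.nn.functional as F

def clip_contrastive_loss(image_emb, text_emb, temperature=0.07):
    """image_emb, text_emb: (N, D) embeddings for N matched (image, text) pairs."""
    image_emb = F.normalize(image_emb, dim=-1)
    text_emb = F.normalize(text_emb, dim=-1)
    # (N, N) similarity matrix: entry (i, j) is image i vs. text j.
    logits = image_emb @ text_emb.t() / temperature
    targets = torch.arange(logits.size(0), device=logits.device)
    # The N matched pairs on the diagonal are positives; the other N^2 - N
    # entries are negatives. Average the image->text and text->image losses.
    return (F.cross_entropy(logits, targets) +
            F.cross_entropy(logits.t(), targets)) / 2
```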

LLaVA-NeXT: Improved Reasoning, OCR, and World Knowledge

LLaVA-NeXT: Improved reasoning, OCR, and world knowledge

[TOC] Abs: Compared with LLaVA-1.5, LLaVA-NeXT's biggest update is resolution support, now covering three resolutions: 672x672, 336x1344, and 1344x336. The 34B model reportedly outperforms Gemini-Pro. Method: Dynamic High-Resolution (which feels quite brute-force). Data Mixture: High-quality User Instru...
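As a toy illustration of "dynamic high-resolution", here is a hypothetical helper that picks the supported resolution closest in aspect ratio to the input image; the selection rule is my assumption, not LLaVA-NeXT's code, which then resizes/pads the image and splits it into 336x336 tiles.

```python
def pick_resolution(orig_w, orig_h,
                    candidates=((672, 672), (336, 1344), (1344, 336))):
    """Hypothetical sketch: choose the candidate resolution whose aspect
    ratio best matches the input image."""
    target = orig_w / orig_h
    return min(candidates, key=lambda wh: abs(wh[0] / wh[1] - target))

# Example: a tall screenshot maps to the 336x1344 layout.
print(pick_resolution(540, 1920))  # -> (336, 1344)
```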

Accelerating LLM Serving with Tree-Based Speculative Decoding and Verification

Accelerating Large Language Model Serving with Tree-Based Speculative Inference and Verification

[TOC] Abs: Two contributions on top of speculative decoding: 1) predictions are organized into a tree, where every node represents a candidate token sequence; 2) the tree-based predictions can be verified in parallel. Intro insight: speculate in parallel, organize as a tree, verify in parallel. But there are two challenges: 1...
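A simplified greedy-decoding sketch of tree verification follows; this is my assumption of the mechanics, not the paper's algorithm, which also covers sampling. In a real system all tree prefixes are scored in one forward pass using a tree attention mask, rather than one model call per prefix as written here.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    token: int
    children: list = field(default_factory=list)

def verify_token_tree(root, prefix, target_next_token):
    """Walk the speculation tree, accepting children that match the target
    model's greedy choice. `root` is a dummy node whose children are the
    candidate first tokens; `target_next_token(tokens)` is a hypothetical
    stand-in for the target model's greedy next-token prediction."""
    accepted, node = [], root
    while True:
        want = target_next_token(prefix + accepted)
        match = next((c for c in node.children if c.token == want), None)
        if match is None:
            # No candidate matches: keep the target's own token as a bonus, stop.
            return accepted + [want]
        accepted.append(want)
        node = match
```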

Llama 3 Distillation in Practice

Knowledge distillation from an ensemble of teachers trained on a small dataset with no performance penalty

[TOC] Theory. Baby Llama: knowledge distillation from an ensemble of teachers trained on a small dataset with no performance penalty. Abstract: first, two teachers are SFT-ed on the 10M-word BabyLM dataset: GPT-2 and small...
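The core of distilling from a teacher ensemble can be sketched as averaging the teachers' softened distributions and mixing the KD term with the hard-label loss; the temperature, weighting, and averaging scheme below are my assumptions, not necessarily Baby Llama's exact recipe.

```python
import torch
import torch.nn.functional as F

def ensemble_distill_loss(student_logits, teacher_logits_list, labels,
                          T=2.0, alpha=0.5):
    """student_logits: (B, V); teacher_logits_list: list of (B, V); labels: (B,).
    T (temperature) and alpha (KD weight) are illustrative hyperparameters."""
    # Average the teachers' temperature-softened distributions.
    teacher_probs = torch.stack(
        [F.softmax(t / T, dim=-1) for t in teacher_logits_list]
    ).mean(dim=0)
    # KL divergence between the averaged teacher distribution and the
    # student's softened distribution (scaled by T^2, the usual convention).
    kd = F.kl_div(F.log_softmax(student_logits / T, dim=-1),
                  teacher_probs, reduction="batchmean") * T * T
    ce = F.cross_entropy(student_logits, labels)
    return alpha * kd + (1 - alpha) * ce
```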