Kylin Page

A fool who dreams.

Blip2 Code Analysis

A Detailed Walkthrough of the Blip2 Training Code

Looking at the code analyses available online, most concentrate on the inference-side code and overlook the training code of blip2qformer, so let's analyze it here. Inference (or stage-2): the code analyzed online is mostly inference-side. When Blip2 runs inference, only the image-side block is active, so the pipeline is relatively simple: the learnable tokens output after passing through N Blip2QFormerLayer blocks serve as the image features, which then go through an mlp...
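Below is a minimal PyTorch sketch of the inference path this excerpt describes: learnable query tokens cross-attend to frozen image features through a stack of Q-Former-style layers, and the resulting tokens are projected by an MLP into the LLM embedding space. The module name, layer count, and dimensions (`TinyQFormer`, 32 queries, dim 768, llm_dim 4096) are illustrative assumptions, not the actual transformers API.

```python
import torch
import torch.nn as nn

# Minimal sketch (not the real BLIP-2 code): learnable queries cross-attend
# to frozen image patch features, then an MLP projects them for the LLM.
class TinyQFormer(nn.Module):
    def __init__(self, num_queries=32, dim=768, num_layers=2, llm_dim=4096):
        super().__init__()
        self.query_tokens = nn.Parameter(torch.randn(1, num_queries, dim))
        self.layers = nn.ModuleList([
            nn.MultiheadAttention(dim, num_heads=8, batch_first=True)
            for _ in range(num_layers)
        ])
        self.proj = nn.Sequential(nn.Linear(dim, llm_dim), nn.GELU(),
                                  nn.Linear(llm_dim, llm_dim))

    def forward(self, image_feats):                      # (B, P, dim) patch features
        q = self.query_tokens.expand(image_feats.size(0), -1, -1)
        for attn in self.layers:                         # queries attend to image patches
            q = q + attn(q, image_feats, image_feats)[0]
        return self.proj(q)                              # (B, num_queries, llm_dim)

image_feats = torch.randn(2, 257, 768)                   # e.g. ViT patch tokens
soft_prompt = TinyQFormer()(image_feats)                 # fed to the LLM as a visual prefix
print(soft_prompt.shape)                                 # torch.Size([2, 32, 4096])
```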

A Survey of Multimodal Technology Development

An Overview of Multimodal Technology Development

CLIP: building an image-text bridge with contrastive learning (2021, OpenAI). Contrastive Language Image Pre-train, detailed analysis 1. A typical dual-tower model with two encoders, one for images and one for text; after the image and the text pass through their respective encoders, a simple dot product represents the interaction (similarity) between the two modalities. During training, assume a batch contains N (image, text) pairs...
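A minimal sketch of the contrastive objective described here, assuming the two towers have already produced N paired embeddings: a dot-product similarity matrix and a symmetric cross-entropy where the i-th image matches the i-th text. The `clip_loss` helper and the temperature value are illustrative, not OpenAI's implementation.

```python
import torch
import torch.nn.functional as F

# Sketch of the symmetric contrastive loss over a batch of N (image, text) pairs.
def clip_loss(image_emb, text_emb, temperature=0.07):
    image_emb = F.normalize(image_emb, dim=-1)
    text_emb = F.normalize(text_emb, dim=-1)
    logits = image_emb @ text_emb.t() / temperature      # (N, N) similarity matrix
    targets = torch.arange(logits.size(0))               # diagonal = positive pairs
    loss_i = F.cross_entropy(logits, targets)            # image -> text direction
    loss_t = F.cross_entropy(logits.t(), targets)        # text -> image direction
    return (loss_i + loss_t) / 2

loss = clip_loss(torch.randn(8, 512), torch.randn(8, 512))
```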

LLaVA-NeXT: Improved Reasoning, OCR, and World Knowledge

LLaVA-NeXT: Improved reasoning, OCR, and world knowledge

Abs: Compared with LLaVA-1.5, the biggest update in LLaVA-NeXT is resolution support, accommodating three resolutions: 672x672, 336x1344, and 1344x336. The 34B model reportedly outperforms Gemini-Pro. Method: Dynamic High-Resolution (it feels quite brute-force). Data Mixture: High-quality User Instru...
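A rough sketch of the dynamic high-resolution idea under the resolutions listed above: pick the supported grid closest to the input aspect ratio, resize, and cut the image into 336x336 tiles that are encoded alongside a downscaled global view. The grid-selection rule and helper names are assumptions for illustration, not the exact LLaVA-NeXT code.

```python
from PIL import Image

TILE = 336
GRIDS = [(2, 2), (4, 1), (1, 4)]                 # (cols, rows): 672x672, 1344x336, 336x1344

def split_tiles(img: Image.Image):
    w, h = img.size
    # choose the supported grid whose aspect ratio is closest to the input's
    cols, rows = min(GRIDS, key=lambda g: abs((g[0] / g[1]) - w / h))
    resized = img.resize((cols * TILE, rows * TILE))
    tiles = [resized.crop((c * TILE, r * TILE, (c + 1) * TILE, (r + 1) * TILE))
             for r in range(rows) for c in range(cols)]
    global_view = img.resize((TILE, TILE))       # low-res overview of the whole image
    return [global_view] + tiles                 # each view is encoded separately
```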

Accelerating LLM Serving with Tree-Based Speculative Decoding and Verification

Accelerating Large Language Model Serving with Tree-based Speculative Inference and Verification

Abs: Two things are done on top of speculative decoding: 1) the predictions are organized into a tree, where every node represents a candidate token sequence; 2) the tree-based predictions can be verified in parallel. Intro insight: speculate in parallel, organize as a tree, verify in parallel. But there are two challenges: 1...
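A simplified sketch of the tree-verification control flow summarized in the abstract: speculative candidates form a token tree, and we accept the longest root-to-leaf path that agrees with the target model, plus one bonus token from the target. The `Node`/`verify` helpers and the `target_next_token` callback are hypothetical stand-ins; the paper's tree-attention kernel, batching, and sampling details are omitted.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    token: int
    children: list = field(default_factory=list)

def verify(prefix, root, target_next_token):
    accepted = []
    node = root
    while True:
        # token the target model would emit after prefix + accepted tokens
        expected = target_next_token(prefix + accepted)
        match = next((c for c in node.children if c.token == expected), None)
        if match is None:
            return accepted + [expected]          # bonus token from the target model
        accepted.append(match.token)              # speculative token confirmed
        node = match

# toy usage: a "target model" that always continues with token 7
root = Node(-1, [Node(7, [Node(7), Node(3)]), Node(5)])
print(verify([1, 2], root, lambda seq: 7))        # -> [7, 7, 7]
```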

Llama 3 Distillation in Practice

knowledge distillation from an ensemble of teachers trained on a small dataset with no performance penalty

Theory. Baby Llama: knowledge distillation from an ensemble of teachers trained on a small dataset with no performance penalty. Abstract: first, two teachers are SFT'd on the 10M-word BabyLM dataset: GPT-2 and small...
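A minimal sketch of the distillation objective this implies: the student matches a soft target built from the ensemble (here, the average) of the teachers' logits, mixed with the ordinary cross-entropy on the labels. The temperature and weighting values are placeholders, not the Baby Llama paper's exact settings.

```python
import torch
import torch.nn.functional as F

# Sketch of distilling from an ensemble of teachers into one student.
def ensemble_kd_loss(student_logits, teacher_logits_list, labels, T=2.0, alpha=0.5):
    teacher_probs = torch.stack(
        [F.softmax(t / T, dim=-1) for t in teacher_logits_list]).mean(dim=0)
    kd = F.kl_div(F.log_softmax(student_logits / T, dim=-1),
                  teacher_probs, reduction="batchmean") * T * T   # soft-target term
    ce = F.cross_entropy(student_logits, labels)                  # hard-label term
    return alpha * kd + (1 - alpha) * ce

loss = ensemble_kd_loss(torch.randn(4, 100),
                        [torch.randn(4, 100), torch.randn(4, 100)],
                        torch.randint(0, 100, (4,)))
```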

Coding on Permutation & Knapsack Problems

Example LeetCode Permutation & Knapsack Problems

Example: 377. Combination Sum IV asks us to count the sequences of numbers taken from a list that sum to target (repetition allowed, so a permutation problem). class Solution: def combinationSum4(self, nums: List[int], target: int) -> int: dp = [0]*(target+1) dp[0] = 1 ...
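The excerpt's code is cut off; below is a standard complete version of the permutation-style DP it starts, where dp[i] counts the sequences summing to i and the capacity loop is outermost so different orderings are counted separately.

```python
from typing import List

class Solution:
    def combinationSum4(self, nums: List[int], target: int) -> int:
        dp = [0] * (target + 1)
        dp[0] = 1                          # one way to reach sum 0: pick nothing
        for i in range(1, target + 1):     # capacity outermost -> orderings counted
            for num in nums:
                if num <= i:
                    dp[i] += dp[i - num]
        return dp[target]

print(Solution().combinationSum4([1, 2, 3], 4))   # 7
```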

DeepSpeed from 0 to 1

DeepSpeed Cookbook

Parameter Server · Reference

Are Label Words Anchors? A New Take on Compression & ICL

An Information Flow Perspective for Understanding In-Context Learning

EMNLP 23 best paper. Abstract: In-context learning (ICL): a promising capability of large language models (LLMs), enabled by providing them with demonstration examples to perform diverse tasks, i.e., ...

OPERA (CVPR 24): Alleviating Hallucination in Multimodal Large Language Models via Over-Trust Penalty and Retrospection-Allocation

Alleviating Hallucination in Multi-Modal Large Language Models via Over-Trust Penalty and Retrospection-Allocation

Abstract: Existing remedies for hallucination: training on special data, or correction with an external knowledge base (e.g., an Agent). OPERA is a decoding method; since it only touches decoding, it is almost a free lunch. Insight: MLLMs tend to generate new tokens by focusing on a few summary tokens, but not all the previ...
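A rough illustration of that over-trust signal, assuming access to a recent self-attention map: within a local window, a column that keeps receiving high attention marks a "summary token" the model is over-relying on, and its scaled column-wise product can be used to down-weight the candidate's score during decoding. Window size, scaling, and the beam/rollback machinery are simplifications, not OPERA's reference implementation.

```python
import torch

def over_trust_penalty(attn, window=8, scale=50.0):
    # attn: (seq_len, seq_len) lower-triangular self-attention of recent tokens
    local = attn[-window:, -window:].tril()           # recent tokens attending to each other
    col_scores = (scale * local).prod(dim=0)          # high product -> knowledge-aggregation column
    return col_scores.max()                           # used to penalize the candidate's score

attn = torch.rand(32, 32).tril()
penalty = over_trust_penalty(attn)
```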

Optimizers from 0 to 1

Optimizer Cookbook

SGD. Naive SGD: in gradient descent, each step computes the Loss over the entire dataset and then uses backpropagation to obtain the gradients of all parameters. The drawback is that with a large dataset the computation becomes heavy, which makes it hard to scale up the training data. Stochastic Gradient Descent (SGD) instead randomly picks a mini-batch of data at each step to optimize the network's parameters. This can be approximately equivalent to...
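A minimal sketch of the contrast drawn here: plain SGD samples a random mini-batch per step and applies the update w ← w − lr · ∇w. The model, data, and hyperparameters below are toy placeholders.

```python
import torch

model = torch.nn.Linear(10, 1)
X, y = torch.randn(1000, 10), torch.randn(1000, 1)
lr, batch_size = 0.01, 32

for step in range(100):
    idx = torch.randint(0, X.size(0), (batch_size,))   # random mini-batch instead of the full set
    loss = torch.nn.functional.mse_loss(model(X[idx]), y[idx])
    model.zero_grad()
    loss.backward()
    with torch.no_grad():                              # plain SGD update: w <- w - lr * grad
        for p in model.parameters():
            p -= lr * p.grad
```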