KylinChen | Blog

Claude 3 Technical Report

The Claude 3 Model Family

Abstract. The family comprises three models, listed from most to least capable: Claude 3 Opus, our most capable offering; Claude 3 Sonnet, which provides a combination of skills and speed; Claude 3 Haiku, our fastest and least expensive model...

Posted by Kylin on April 16, 2024

Infini Attention: Detailed Explanation and Mathematical Derivation

Efficient Infinite Context Transformers with Infini Attention, Explained

Abstract. Result achieved: bounded memory and computation. Core of the method: a new attention technique dubbed Infini-attention, i.e. a modified attention mechanism. Intro. At a high level, Q attends separately to the previous segments and to the local segment, and only...

Posted by Kylin on April 14, 2024

Mini Gemini

Mining the Potential of Multimodality Vision Language Models

Abstract. The model is any-to-any. We propose to utilize an additional visual encoder for high-resolution refinement without increasing the visual token count. Intro. Input side: the model's visual input is dual-encode...

Posted by Kylin on April 13, 2024

High-Flyer (幻方) Class of 2025 LLM Algorithm Written Test

coding to HF

Multiple-choice questions, fill-in-the-blank questions, OJ

Posted by Kylin on April 11, 2024

Stable Diffusion: A Brief Analysis and Performance Optimization Study

interview to SD&SDSystem

Intro To SD [1]. Opt in SD [2]. Reference: [1] 浅谈Stable Diffusion (A Brief Discussion of Stable Diffusion). https://zhuanlan.zhihu.com/p/637758440 [2] 扩散模型(Diffusion Model)首篇综述-Diffusion Mod...

Posted by Kylin on April 8, 2024

Xiaohongshu (小红书) Class of 2025 Algorithm Interview Notes

interview to XHS

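First round. Project: Kaggle-LLM fine-tuning details (data preparation, training). Live coding: given two version-number strings, compare the two versions (split, then map(int), then iterate). Second round. How many learnable tokens does blip2 have? Traffic after final deployment and internal API usage. Which inference acceleration frameworks do you know? I answered speculative sampling. Do you want to work on infra or on pure algorithms? Live coding ...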
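The version-comparison exercise above (split, map(int), iterate) can be sketched roughly as follows. This is a minimal illustration, not the answer given in the interview: the function name `compare_versions`, the dot separator, and the choice to treat missing components as 0 (so "1.0" equals "1") are all assumptions.

```python
from itertools import zip_longest


def compare_versions(a: str, b: str) -> int:
    """Return -1 if a < b, 0 if a == b, 1 if a > b.

    Assumes dot-separated, purely numeric components (hypothetical format).
    """
    parts_a = map(int, a.split("."))
    parts_b = map(int, b.split("."))
    # Assumption: a missing trailing component counts as 0.
    for x, y in zip_longest(parts_a, parts_b, fillvalue=0):
        if x != y:
            return -1 if x < y else 1
    return 0


print(compare_versions("1.2", "1.10"))
```

Comparing integers rather than raw strings is the point of the `map(int)` step: as strings, "10" sorts before "2".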

Posted by Kylin on April 1, 2024

Taotian (淘天) Class of 2025 Algorithm Interview Notes

interview to TT

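First round. Project: what the CrossAtt Adaptor does; how many vision tokens are used; how many stages MLLM pretraining is split into; which MLLM architectures you know; how mplug handles video: video_attention_mask = torch.ones(video_embeds.size()[:-...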

Posted by Kylin on April 1, 2024

Pinduoduo (拼多多) Class of 2025 Algorithm Interview Notes

interview to PDD

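First round. Fundamentals: methods for addressing overfitting; why replace LN with root mean square norm? Project: why not train a reward model; evaluation and data-filtering methods; deployment efficiency and usage after launch. Live coding: # Find the longest palindromic substring, e.g. "55787" returns "787" s = "557887" n = len(s) dp = [[0]*n for _ in range(n)] fo...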
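The excerpt above cuts off after the `dp` table is allocated. One common way to finish a DP solution of this shape is sketched below. It is an assumption about where the original code was heading, not a reconstruction of it; the function name `longest_palindrome` is hypothetical.

```python
def longest_palindrome(s: str) -> str:
    """Return the first longest palindromic substring of s (O(n^2) DP)."""
    n = len(s)
    if n == 0:
        return ""
    # dp[i][j] is True when s[i..j] (inclusive) is a palindrome.
    dp = [[False] * n for _ in range(n)]
    start, best_len = 0, 1
    # Fill by increasing substring length so dp[i+1][j-1] is already known.
    for length in range(1, n + 1):
        for i in range(n - length + 1):
            j = i + length - 1
            if s[i] == s[j] and (length <= 2 or dp[i + 1][j - 1]):
                dp[i][j] = True
                if length > best_len:
                    start, best_len = i, length
    return s[start:start + best_len]


print(longest_palindrome("55787"))
```

With this sketch, the input "55787" from the comment yields "787", while the string actually assigned to `s` in the excerpt, "557887", yields "7887".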

Posted by Kylin on April 1, 2024

RL in LLM pretrain

Reinforcement Learning in LLM Pretraining

PPO (Proximal Policy Optimization). PG (Policy Gradient): assume the R (reward) function follows a Markov chain, and take the derivative of R with respect to $\theta$ (the policy neural network): \(\begin{gathered}\nabla \bar{R}_\theta=\frac{1}{N} \sum_{n=1}^N \sum_{t=1}^{T_n} R\lef...

Posted by Kylin on March 31, 2024

Tencent (腾讯) Class of 2025 Algorithm Written Test

coding to Tencent

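3.30 algorithm OJ. Q1: On a chessboard, rows are labeled with the digits 1 to 8 and columns with the letters a to h. A queen is placed on the board; return the squares it attacks. Hint: a queen attacks every piece in the same row, the same column, and the same diagonals. Input: a string giving the queen's position. Output: the first line is an integer, the number of squares the queen can attack. The second line outputs 2 strings of length 2, representing squares the queen can attack. ...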
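A rough sketch of Q1 is given below, under several assumptions that the truncated problem statement does not confirm: the position is written column letter first (e.g. "d4"), the board is otherwise empty, and the function returns every attacked square rather than following the exact output format. The name `queen_attacks` is hypothetical.

```python
from typing import List


def queen_attacks(pos: str) -> List[str]:
    """List the squares a queen on `pos` attacks on an empty 8x8 board.

    Assumes `pos` is a column letter a-h followed by a row digit 1-8.
    """
    col, row = ord(pos[0]) - ord("a"), int(pos[1]) - 1
    # The eight directions: horizontal, vertical, and both diagonals.
    directions = [(-1, -1), (-1, 0), (-1, 1), (0, -1),
                  (0, 1), (1, -1), (1, 0), (1, 1)]
    squares = []
    for dc, dr in directions:
        c, r = col + dc, row + dr
        # Walk outward until the edge of the board is reached.
        while 0 <= c < 8 and 0 <= r < 8:
            squares.append(chr(ord("a") + c) + str(r + 1))
            c, r = c + dc, r + dr
    return sorted(squares)


attacked = queen_attacks("a1")
print(len(attacked))
```

From a corner the queen reaches 7 squares along each of its row, column, and single diagonal, which gives a quick sanity check on the count.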

Posted by Kylin on March 30, 2024
