Kylin Page

A fool who dreams.

Coding on Permutation & Knapsack Problems

LeetCode examples of permutation & knapsack problems

[TOC] Example: LeetCode 377. Combination Sum IV asks for the number of ordered sequences (elements may repeat; a permutation problem) drawn from a list that sum to target. class Solution: def combinationSum4(self, nums: List[int], target: int) -> int: dp = [0]*(target+1) dp[0] = 1 ...
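The excerpted DP is cut off; below is a minimal self-contained sketch of the standard permutation-counting DP for this problem. The loop order is the key design choice: iterating targets in the outer loop and candidates in the inner loop counts ordered sequences (permutations), while swapping the two loops would count unordered combinations instead.

```python
from typing import List

class Solution:
    def combinationSum4(self, nums: List[int], target: int) -> int:
        # dp[t] = number of ordered sequences summing to t
        dp = [0] * (target + 1)
        dp[0] = 1  # one way to reach 0: the empty sequence
        # Outer loop over targets, inner loop over nums -> permutations.
        for t in range(1, target + 1):
            for n in nums:
                if n <= t:
                    dp[t] += dp[t - n]
        return dp[target]
```

For nums = [1, 2, 3] and target = 4 this counts 7 sequences, e.g. (1,1,2), (1,2,1) and (2,1,1) are counted separately.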

DeepSpeed from 0 to 1

DeepSpeed Cookbook

[TOC] Parameter Server Reference

Label Words Are Anchors? A New Take on Compression & ICL

An Information Flow Perspective for Understanding In Context Learning

[TOC] EMNLP 23 best paper. Abstract: in-context learning (ICL) is a promising capability of large language models (LLMs): providing them with demonstration examples lets them perform diverse tasks, i.e...

OPERA (CVPR 24): Alleviating Hallucination in Multimodal Large Language Models via Over-Trust Penalty and Retrospection-Allocation

Alleviating Hallucination in Multimodal Large Language Models via Over-Trust Penalty and Retrospection-Allocation

[TOC] Abstract: existing remedies for hallucination: training on special data, or correction via external knowledge bases (e.g. agents). OPERA is a decoding method; since it only touches decoding, it is almost a free lunch. Insight: MLLMs tend to generate new tokens by focusing on a few summary tokens, but not all the previ...

Optimizer from 0 to 1

Optimizer Cookbook

[TOC] SGD. Naive gradient descent: each step computes the loss over the entire dataset, then backpropagates to get gradients for all parameters. The drawback is that with a large dataset each step is computationally heavy, which makes training hard to scale. Stochastic Gradient Descent (SGD) instead picks a random mini-batch at each step to update the network's parameters, which can be approximately equivalent to...
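The mini-batch idea above can be sketched on a toy 1-D least-squares fit; the data, learning rate, and batch size here are illustrative choices, not from the post:

```python
import random

def sgd(data, w, lr=0.1, batch_size=4, steps=500):
    """Mini-batch SGD for 1-D least squares: minimize mean (w*x - y)^2."""
    for _ in range(steps):
        batch = random.sample(data, batch_size)  # random mini-batch
        # gradient of the batch mean squared error w.r.t. w
        grad = sum(2 * (w * x - y) * x for x, y in batch) / batch_size
        w -= lr * grad
    return w
```

On data generated as y = 3x, each mini-batch gradient points toward w = 3, so the iterate converges to the full-batch solution while touching only a few samples per step.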

A February 2024 Survey of Hallucination in Multimodal Large Models

A Survey on Hallucination in Large Vision Language Models

[TOC] Abstract: "hallucination": the misalignment between factual visual content and the corresponding textual generation. Intro: visual hallucinations under a ternary taxonomy: hallucination on object, attribute...

High-Flyer Class-of-2025 Algorithm Interview Notes

Interviewing at DeepSeek

[TOC] Round 1: deep dive into internship experience. LoRA initialization methods. How to compute KV cache size: bs*layer*sequence*hidden_dim*parameter_size*2. Can vLLM do re-parameterization? What needs to be done to use pretrained weights under the vLLM framework? Live coding: how do you write LayerNorm? def layernorm(hidden_states, beta, gamma): b...
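The interview snippet is cut off mid-definition; one plausible NumPy sketch of the LayerNorm question is below (the `eps` term and normalization over the last axis are the standard formulation, not the candidate's actual answer):

```python
import numpy as np

def layernorm(hidden_states, beta, gamma, eps=1e-5):
    # Normalize each vector over the last (hidden) dimension,
    # then apply the learned scale (gamma) and shift (beta).
    mean = hidden_states.mean(axis=-1, keepdims=True)
    var = hidden_states.var(axis=-1, keepdims=True)
    normed = (hidden_states - mean) / np.sqrt(var + eps)
    return gamma * normed + beta
```

With gamma = 1 and beta = 0, each output vector has mean 0 and (up to `eps`) unit variance along the hidden dimension.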

DeepSeekVL Paper Reading

Introduction to DeepSeekVL

[TOC] Abstract. Architecture: a linear layer handles multimodal alignment. Reference

DeepSeekLLM Paper Reading

Introduction to DeepSeekLLM

[TOC] Abstract Architecture The micro design of DeepSeek LLM largely follows the design of LLaMA (Touvron et al., 2023a,b), adopting a Pre-Norm structure with RMSNorm (Zhang and Sennrich, 2019) ...
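As a reference for the RMSNorm component mentioned above, a minimal NumPy sketch following Zhang & Sennrich (2019): RMS rescaling only, with no mean subtraction and no bias, unlike LayerNorm.

```python
import numpy as np

def rmsnorm(x, weight, eps=1e-6):
    # Rescale each vector by its root-mean-square over the last dim,
    # then apply the learned per-channel gain.
    rms = np.sqrt(np.mean(x ** 2, axis=-1, keepdims=True) + eps)
    return weight * (x / rms)
```

Dropping the mean-centering step makes RMSNorm cheaper than LayerNorm while preserving the re-scaling invariance that stabilizes Pre-Norm Transformer training.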

Memory-Augmented Video Understanding: MALLM Paper Reading

Memory-Augmented Large Multimodal Model for Long-Term Video Understanding

[TOC] CVPR 2024. Abstract: the limitation of earlier MLLMs: they can only take in a limited number of frames, enough for short-video understanding. Motivation: instead of trying to process more frames simultaneously like most exi...