Kylin Page

A fool who dreams.

Continuous Batching

A Method for Improving LLM Serving Throughput

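Translation and notes from http://www.anyscale.com/blog/continuous-batching-llm-inference. Because LLMs generate their output iteratively, and because LLM inference is usually memory-bound rather than compute-bound, optimizing system-level batching can change performance by 10x or more in many real-world workloads. Continuous batching = dynamic...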

C3 Intro To Clockwork

Serving DNNs like Clockwork Performance Predictability from the Bottom Up

OSDI 20. Clockwork is an important template for writing papers on serving systems, especially its background section. Abstract. BG: service-level objectives (SLOs), cited from [40]; the usual form is a latency SLO: Site Reliability Engineering: How Google Runs Production...

Redio Optimization Towards Disk I/Os

Accelerating Disk-Based Graph Processing by Reducing Disk I/Os

IEEE Transactions on Computers. Abstract: the target application is graph processing, which typically stores vertex and edge relations on hard disks or SSDs. Massive expensive disk I/Os remain the major performance bottleneck of disk-based graph proces...

Zero Offload

Democratizing Billion-Scale Model Training

SC 21. Abstract: mainly targets offloading for training. This paper presents ZeRO-Infinity, a novel heterogeneous system technology that leverages GPU, CPU, and NVMe memory to allow for unprecedented model...

DeepSpeed Inference

Enabling Efficient Inference of Transformer Models at Unprecedented Scale

SC 22. Abstract: transformer-based models are diverging along several axes: 1) different parameter counts; 2) different degrees of sparsity introduced by MoE; 3) the target application scenarios can be latency-critical or throughput-oriented; 4) the deployment hardware c...

Model Parallel Swapping of Computron

Serving Distributed Deep Learning Models with Model Parallel Swapping

Not accepted. Abstract: Computron, a system that uses memory swapping to serve multiple distributed models on a shared GPU cluster. The main idea is to aggregate GPU-CPU bandwidth, hide swap time behind the time spent on request-oriented scheduling, and in real ser...

Supervised Fine-Tuning Methods

A Summary of SFT Methods

Basics

Statistics In Quant

Statistics in Quant

Coskewness, cokurtosis

C3 Intro To Effective Alpha

Effective Alpha

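Outline. Operators. There are two kinds of data: time-series data and cross-sectional data. ts_regression: y is the response variable, x is the independent variable, d is the time span of data used, and lag is the lag between the independent and response variables. The first on the right is 0, the residual; the second on the right is 3, the mapping (prediction). trade_when. corr. Alpha Example. Multi-factor model ...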

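Coding on Bit Manipulation

Bit Manipulation and Problem List

Bit manipulation. LeetCode problem list: 67. Add Binary; Single Number II; Single Number III; Maximum XOR of Two Numbers in an Array; Repeated DNA Sequences; Maximum Product of Word Lengths; XOR Sum of All Subarrays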