Kylin Page

A fool who dreams.

Several Tricks in Beam Search in Hugging Face

Implementation of Beam Search in Hugging Face

KV Cache

KV Cache: key optimization techniques

An excellent tutorial: https://www.youtube.com/watch?v=80bIUggRJf4 Summary of KV-cache optimization techniques: https://zhuanlan.zhihu.com/p/659770503 Introduction to KV-cache: KV-cache is itself a baseline optimization for model.generate. The attention formula: ...

Introduction to SmartMOE

Efficiently Training Sparsely-Activated Models through Combining Offline and Online Parallelization

Motivation: Previous research executes the system according to a static plan, generally taking into account the model architecture and hardware specification; the aim now is a workload-aware approach. SmartMOE splits the process of automatic parallelization in...

Introduction to FlexGen

High-Throughput Generative Inference of Large Language Models with a Single GPU

ICML 2023. Motivation: high-throughput LLM inference using limited resources, such as a single commodity GPU; throughput-oriented generative inference. Background: existing resource-constrained LLM infe...

Fundamentals of Quantitative Trading

Notebooks for WQ learning

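Key concepts of Quantitative Finance. How the stock market works: The stock market is the public market that exists for issuing, buying, and selling stocks that trade on a stock exchange or over the counter. Stocks, also known as equities, represent partial ownership of a company, and the stock market is a place where investors can buy and sell ownership of such investable assets (Corporate Finance Institute. (2022, October 28). ...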

21 Alpha Examples for WQC

21 Effective Alphas

Notebooks from 21 Alpha Examples for Beginners. Data Field Search & operations. Data: https://platform.worldquantbrain.com/data/search Operation: https://platform.worldquantbrain.co...

C1 Intro To WQ and Factor Fundamentals

Introduction to WQ and Fundamentals of Factor Investing

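Quantitative backtesting: use large amounts of historical data to simulate many portfolios. Backtest metrics: annualized return, Sharpe, maximum drawdown. History of quantitative trading. Early stage: In the 1950s, the US commodity futures market began adopting electronic trading platforms. From the 1960s, with the rapid development of computer and data technology, quantitative trading began to be widely used across different commercial application areas. 1980s: From the 1980s, quantitative trading began to be widely used in finance, and received from major financial...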

Factor Investing Chapter 1

Fundamentals of Factor Investing

Introduction to Mobius

Fine Tuning Large-Scale Models on Commodity GPU Servers

Mobius offload key insights: Note that we focus on extending GPU memory with only DRAM, since publicly available pretrained models can usually fit in DRAM and the limited bandwidth of SSDs...

GSPMD for op partitioning across multiple devices

General and Scalable Parallelization for ML Computation Graphs

Alpa states: Alpa optimizes the intra-operator parallelism plan within a device mesh. Alpa adopts the SPMD-style intra-op parallelism [31,57] which partitions operators evenly across devices and ...