Early-Exiting Framework with Parallel Decoding

Fast and Robust Early-Exiting Framework for Autoregressive Language Models with Synchronized Parallel Decoding

Posted by Kylin on January 14, 2023


9 Oct 2023, arXiv

Abs

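At its core, early exiting adds conditional branches to the computation graph, and autoregressive decoding is itself a special kind of conditional computation graph: a loop.

Why conventional early exiting (EE) degrades: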
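The "conditional branch inside a loop" view can be sketched in a few lines. This is only a toy illustration, not the paper's implementation: `toy_layer`, `decode_token`, and `generate` are hypothetical names, the hidden state doubles as the logits, and the confidence measure is assumed to be the maximum softmax probability.

```python
import math


def softmax(logits):
    # Numerically stable softmax over a plain list of floats.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def toy_layer(hidden):
    # Hypothetical stand-in for a transformer layer: it only sharpens the state.
    return [h * 1.5 for h in hidden]


def decode_token(hidden, num_layers, threshold):
    """Run layers in order; the `if` is the branch that early exiting adds."""
    for layer_idx in range(num_layers):
        hidden = toy_layer(hidden)
        probs = softmax(hidden)  # toy assumption: hidden state == logits
        confidence = max(probs)
        if confidence >= threshold:
            # Conditional branch: skip the remaining layers.
            return probs.index(confidence), layer_idx + 1
    return probs.index(max(probs)), num_layers


def generate(start_hidden, steps, num_layers=6, threshold=0.9):
    """Autoregressive loop: each step's output token becomes the next input."""
    hidden = list(start_hidden)
    tokens, exit_layers = [], []
    for _ in range(steps):
        token, used = decode_token(hidden, num_layers, threshold)
        tokens.append(token)
        exit_layers.append(used)
        # Toy "embedding" of the token that was just produced.
        hidden = [1.0 if i == token else 0.5 for i in range(len(hidden))]
    return tokens, exit_layers
```

Calling `generate([1.0, 0.2, 0.1], steps=3)` returns the generated token ids together with the number of layers each token actually used; raising or lowering `threshold` changes how deep each token goes.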

  • state copying mechanism
  • numerous exit paths
  • sensitivity to exit confidence thresholds

Intro

Challenges of previous works:

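  • The state copying mechanism (Elbayad et al., 2020; Schuster et al., 2022), proposed to avoid recomputing the KV cache, degrades on long sequences (Sec. 4.1).
  • Exiting early can lead to excessively long generated sequences (repeated-token loops).
  • Too many exit points increase system overhead, i.e. the computational overhead from confidence measurement at every layer.
  • Choosing the confidence threshold adds overhead and discontinuity: system performance fluctuates across different thresholds.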
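To make the first challenge concrete, here is a minimal sketch of state copying as I understand it: when a token exits at some layer, the deeper layers never compute a hidden state for it, so the exit-layer state is copied upward so that later tokens still have something to attend to. The function names and the list-of-lists representation are assumptions made for illustration, not code from the cited papers.

```python
def fill_skipped_layers(computed_states, num_layers):
    """Return per-layer states for one token, copying the exit state upward.

    computed_states holds the hidden states for layers 1..exit_layer,
    so the exit layer is simply len(computed_states).
    """
    exit_layer = len(computed_states)
    if not 1 <= exit_layer <= num_layers:
        raise ValueError("exit layer must be in 1..num_layers")
    exit_state = computed_states[-1]
    # The skipped layers receive a copy, which is only an approximation
    # of the state they would have computed themselves.
    copies = [list(exit_state) for _ in range(num_layers - exit_layer)]
    return list(computed_states) + copies


def copied_fraction(exit_layers, num_layers):
    """Fraction of (token, layer) cache slots holding copies, not computed states."""
    total = len(exit_layers) * num_layers
    copied = sum(num_layers - used for used in exit_layers)
    return copied / total
```

For example, `fill_skipped_layers([[0.1], [0.2]], 4)` fills layers 3 and 4 with the layer-2 state. `copied_fraction` gives a rough sense of how much of the cache is approximated; every later token attends to these copied entries, which is one plausible reading of why the degradation shows up on long sequences.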

FREE


Reference