Model Compression Overview

I. INTRODUCTION

1. Designing efficient NN model architectures

present situation:

  1. Manually optimizing the micro-architecture, e.g., kernel types (depth-wise convolution or low-rank factorization)
  2. Manually optimizing the macro-architecture, e.g., modules (residual, inception)
  3. Automated optimization, e.g., Automated machine learning (AutoML) and Neural Architecture Search (NAS)

2. Co-designing NN architecture and hardware together

Co-designing the NN architecture together with the hardware, or adapting the NN architecture for different hardware platforms. The main motivation is that the overhead of different NN components is hardware-dependent.

3. Pruning

  1. unstructured pruning

    • motivation: removes neurons with small sensitivity, wherever they occur
    • positive: little impact on the generalization performance
    • negative: leads to sparse matrix operations, which are known to be hard to accelerate, and which are typically memory-bound
  2. structured pruning

    • motivation: a group of parameters (e.g., entire convolutional filters) is removed.
    • positive: still permitting dense matrix operations.
    • negative: aggressive structured pruning often leads to significant accuracy degradation.
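The contrast between the two pruning styles can be sketched in a few lines of NumPy (the weight-tensor shape, the magnitude threshold, and the "drop the weakest half of the filters" rule are illustrative assumptions, not from the survey):

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical conv weight tensor: (out_channels, in_channels, kH, kW)
W = rng.standard_normal((8, 4, 3, 3))

# Unstructured pruning: zero every weight whose magnitude falls below a
# threshold. The tensor keeps its shape but becomes sparse, which is why
# the resulting matrix ops are hard to accelerate.
threshold = 0.5
W_unstructured = np.where(np.abs(W) < threshold, 0.0, W)

# Structured pruning: rank whole filters by L1 norm and drop the weakest
# half. The result is a smaller but still dense tensor.
filter_norms = np.abs(W).sum(axis=(1, 2, 3))
keep = np.argsort(filter_norms)[len(filter_norms) // 2:]
W_structured = W[keep]

print(W_unstructured.shape)  # same shape as W, but with many zeros
print(W_structured.shape)    # fewer output channels, still dense
```

Note how only the structured variant actually shrinks the tensor; the unstructured one merely introduces zeros that standard dense kernels cannot exploit.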

4. Knowledge distillation

  • motivation: training a large model and then using it as a teacher to train a more compact model.
  • positive: combining knowledge distillation with the previous methods (i.e., quantization and pruning) has been successful
  • negative: a major challenge is achieving a high compression ratio with distillation alone; aggressive compression causes non-negligible accuracy degradation.
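The standard teacher-student objective behind this idea can be sketched as a blend of hard-label cross-entropy and KL divergence to the teacher's temperature-softened outputs (the temperature T=4 and mixing weight alpha=0.5 are illustrative choices; the logits here stand in for real model outputs):

```python
import numpy as np

def softmax(z, T=1.0):
    """Temperature-scaled softmax along the last axis."""
    z = z / T
    z = z - z.max(axis=-1, keepdims=True)  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def distillation_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.5):
    """Blend hard-label cross-entropy with KL to the teacher's soft targets."""
    p_teacher = softmax(teacher_logits, T)
    p_student = softmax(student_logits, T)
    # KL(teacher || student) on the softened distributions, averaged over the batch
    soft = (p_teacher * (np.log(p_teacher + 1e-12)
                         - np.log(p_student + 1e-12))).sum(-1).mean()
    # ordinary cross-entropy against the hard labels (T = 1)
    hard = -np.log(softmax(student_logits)[np.arange(len(labels)), labels]
                   + 1e-12).mean()
    # T**2 rescales the soft-target term, following the usual convention
    return alpha * hard + (1 - alpha) * (T ** 2) * soft
```

A student that already matches the teacher's logits drives the soft term to zero, leaving only the hard-label loss.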

5. Quantization

  • present situation: has shown great and consistent success in both training and inference; this survey focuses on inference.
  • shortcoming: it is very difficult to go below half-precision without significant tuning, and most recent quantization research has focused on inference rather than training.

6. Similarity of Quantization and Neuroscience

  • motivation: work in neuroscience suggests that the human brain stores information in a discrete/quantized form, rather than in a continuous form.

II. GENERAL HISTORY OF QUANTIZATION

III. BASIC CONCEPTS OF QUANTIZATION

  • Problem Setup and Notations
  • Uniform Quantization
  • Symmetric and Asymmetric Quantization
  • Range Calibration Algorithms: Static vs Dynamic Quantization
  • Quantization Granularity
  • Non-Uniform Quantization
  • Fine-tuning Methods
    • Quantization-Aware Training
    • Stochastic Quantization

IV. ADVANCED CONCEPTS: QUANTIZATION BELOW 8 BITS

  • Simulated and Integer-only Quantization
  • Mixed-Precision Quantization
  • Hardware Aware Quantization
  • Distillation-Assisted Quantization
  • Extreme Quantization
    • Quantization Error Minimization
    • Improved Loss function
    • Improved Training Method
  • Vector Quantization

V. QUANTIZATION AND HARDWARE PROCESSORS

VI. FUTURE DIRECTIONS FOR RESEARCH IN QUANTIZATION

  • Quantization Software
  • Hardware and NN Architecture Co-Design
  • Coupled Compression Methods
  • Quantized Training