Model Compression Overview
I. INTRODUCTION
1. Designing efficient NN model architectures
present situation
- Manually optimizing the micro-architecture, e.g., kernel types (depth-wise convolution or low-rank factorization)
- Manually optimizing the macro-architecture, e.g., module types (residual, inception)
- Automated optimization, e.g., Automated Machine Learning (AutoML) and Neural Architecture Search (NAS)
2. Co-designing NN architecture and hardware together
Designing the NN architecture and the hardware jointly, or adapting the NN architecture for a particular hardware platform. The main motivation is that the overhead of different NN components is hardware-dependent.
3. Pruning
unstructured pruning
- motivation: removes individual neurons/weights with small sensitivity, wherever they occur
- positive: little impact on generalization performance
- negative: leads to sparse matrix operations, which are known to be hard to accelerate and are typically memory-bound
structured pruning
- motivation: removes a whole group of parameters at once (e.g., entire convolutional filters)
- positive: still permits dense matrix operations
- negative: aggressive structured pruning often causes significant accuracy degradation
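The two pruning flavors above can be sketched in NumPy. This is an illustrative sketch (not code from the survey), using weight magnitude as a simple stand-in for "sensitivity"; the function names are my own.

```python
import numpy as np

def unstructured_prune(weights, sparsity):
    """Zero out the smallest-magnitude weights individually, wherever
    they occur. The result is a sparse tensor of the same shape."""
    flat = np.sort(np.abs(weights).ravel())
    k = int(flat.size * sparsity)          # number of weights to remove
    if k == 0:
        return weights.copy()
    threshold = flat[k - 1]                # k-th smallest magnitude
    return weights * (np.abs(weights) > threshold)

def structured_prune(conv_weights, keep_ratio):
    """Drop entire output filters with the smallest L1 norm.
    conv_weights shape: (out_channels, in_channels, kH, kW).
    The result is a smaller but still dense tensor."""
    norms = np.abs(conv_weights).sum(axis=(1, 2, 3))   # one L1 norm per filter
    n_keep = max(1, int(len(norms) * keep_ratio))
    keep = np.argsort(norms)[-n_keep:]                 # largest-norm filters survive
    return conv_weights[np.sort(keep)]
```

Note the trade-off from the notes above: the unstructured result keeps its shape but needs sparse kernels to see any speedup, while the structured result is a genuinely smaller dense tensor that any GEMM/conv kernel can run.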
4. Knowledge distillation
- motivation: train a large model and then use it as a teacher to train a more compact student model
- positive: combining knowledge distillation with the previous methods (i.e., quantization and pruning) has been successful
- negative: a major challenge is achieving a high compression ratio with distillation alone; aggressive compression causes non-negligible accuracy degradation
5. Quantization
- present situation: has shown great and consistent success in both training and inference; this survey focuses on inference
- shortcoming: for training, it is very difficult to go below half-precision without significant tuning, which is why most recent quantization research has focused on inference
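For concreteness, a minimal sketch of uniform symmetric int8 quantization (the baseline scheme Section III develops); the max-magnitude calibration used for the scale here is one simple choice among the range-calibration methods listed below, and the function names are my own.

```python
import numpy as np

def quantize_symmetric(x, num_bits=8):
    """Uniform symmetric quantization: q = round(x / S), clipped to the
    signed integer range. The scale S maps the tensor's largest magnitude
    onto qmax, so the zero-point is implicitly zero."""
    qmax = 2 ** (num_bits - 1) - 1               # e.g. 127 for int8
    scale = np.abs(x).max() / qmax               # simple max-calibrated scale
    q = np.clip(np.round(x / scale), -qmax - 1, qmax).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    """Recover an approximation of the original values."""
    return q.astype(np.float32) * scale
```

The round-trip error per element is bounded by half a quantization step (scale / 2), which is the basic reason 8-bit inference usually works with little accuracy loss.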
6. Similarity of Quantization to Neuroscience
- motivation: work in neuroscience suggests that the human brain stores information in a discrete/quantized form rather than a continuous form
II. GENERAL HISTORY OF QUANTIZATION
III. BASIC CONCEPTS OF QUANTIZATION
- Problem Setup and Notations
- Uniform Quantization
- Symmetric and Asymmetric Quantization
- Range Calibration Algorithms: Static vs Dynamic Quantization
- Quantization Granularity
- Non-Uniform Quantization
- Fine-tuning Methods
- Quantization-Aware Training
- Stochastic Quantization
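As a companion to the "Symmetric and Asymmetric Quantization" topic above, a sketch of the asymmetric (affine) variant, which adds a zero-point so that skewed ranges (e.g., post-ReLU activations that are never negative) waste no quantization levels. Again an illustrative sketch, not code from the survey.

```python
import numpy as np

def quantize_asymmetric(x, num_bits=8):
    """Asymmetric uniform quantization: maps [min, max] onto the full
    unsigned range [0, 2^b - 1] via a scale S and zero-point Z, so real
    zero is represented exactly by the integer Z."""
    qmin, qmax = 0, 2 ** num_bits - 1
    scale = (x.max() - x.min()) / (qmax - qmin)
    zero_point = int(round(qmin - x.min() / scale))  # integer representing 0.0
    q = np.clip(np.round(x / scale) + zero_point, qmin, qmax).astype(np.uint8)
    return q, scale, zero_point

def dequantize_asym(q, scale, zero_point):
    """Recover an approximation: x ≈ S * (q - Z)."""
    return (q.astype(np.float32) - zero_point) * scale
```

Compared with the symmetric scheme, the extra zero-point costs a little arithmetic in integer-only kernels but halves the wasted range for one-sided distributions.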
IV. ADVANCED CONCEPTS: QUANTIZATION BELOW 8 BITS
- Simulated and Integer-only Quantization
- Mixed-Precision Quantization
- Hardware Aware Quantization
- Distillation-Assisted Quantization
- Extreme Quantization
- Quantization Error Minimization
- Improved Loss function
- Improved Training Method
- Vector Quantization
V. QUANTIZATION AND HARDWARE PROCESSORS
VI. FUTURE DIRECTIONS FOR RESEARCH IN QUANTIZATION
- Quantization Software
- Hardware and NN Architecture Co-Design
- Coupled Compression Methods
- Quantized Training