Model Compression Overview

I. INTRODUCTION

1. Designing efficient NN model architectures

present situation:

  1. Manually optimizing the micro-architecture, e.g., kernel types (depth-wise convolution or low-rank factorization)
  2. Manually optimizing the macro-architecture, e.g., modules (residual, inception)
  3. Automated optimization, e.g., Automated machine learning (AutoML) and Neural Architecture Search (NAS)

2. Co-designing NN architecture and hardware together

Co-designing the NN architecture together with the hardware, or adapting the NN architecture for different hardware platforms. The main motivation is that the overhead of different NN components is hardware-dependent.

3. Pruning

  1. unstructured pruning

    • motivation: removes neurons with small sensitivity, wherever they occur
    • positive: little impact on the generalization performance
    • negative: leads to sparse matrix operations, which are known to be hard to accelerate, and which are typically memory-bound
  2. structured pruning

    • motivation: a group of parameters (e.g., entire convolutional filters) is removed.
    • positive: still permitting dense matrix operations.
    • negative: aggressive structured pruning often leads to significant accuracy degradation.
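The contrast between the two pruning styles can be sketched in a few lines of NumPy (the weight-tensor shape, the magnitude threshold, and the "drop the weakest half of the filters" rule are illustrative assumptions, not from the survey):

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical conv weight tensor: (out_channels, in_channels, kH, kW)
W = rng.standard_normal((8, 4, 3, 3))

# Unstructured pruning: zero every weight whose magnitude falls below a
# threshold. The tensor keeps its shape but becomes sparse, which is why
# the resulting matrix ops are hard to accelerate.
threshold = 0.5
W_unstructured = np.where(np.abs(W) < threshold, 0.0, W)

# Structured pruning: rank whole filters by L1 norm and drop the weakest
# half. The result is a smaller but still dense tensor.
filter_norms = np.abs(W).sum(axis=(1, 2, 3))
keep = np.argsort(filter_norms)[len(filter_norms) // 2:]
W_structured = W[keep]

print(W_unstructured.shape)  # same shape as W, but with many zeros
print(W_structured.shape)    # fewer output channels, still dense
```

Note how only the structured variant actually shrinks the tensor; the unstructured one merely introduces zeros that standard dense kernels cannot exploit.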

4. Knowledge distillation

  • motivation: training a large model and then using it as a teacher to train a more compact model.
  • positive: combining knowledge distillation with the previous methods (i.e., quantization and pruning) has been successful
  • negative: a major challenge is achieving a high compression ratio with distillation alone; aggressive compression causes non-negligible accuracy degradation.
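The standard teacher-student objective behind this idea can be sketched as a blend of hard-label cross-entropy and KL divergence to the teacher's temperature-softened outputs (the temperature T=4 and mixing weight alpha=0.5 are illustrative choices; the logits here stand in for real model outputs):

```python
import numpy as np

def softmax(z, T=1.0):
    """Temperature-scaled softmax along the last axis."""
    z = z / T
    z = z - z.max(axis=-1, keepdims=True)  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def distillation_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.5):
    """Blend hard-label cross-entropy with KL to the teacher's soft targets."""
    p_teacher = softmax(teacher_logits, T)
    p_student = softmax(student_logits, T)
    # KL(teacher || student) on the softened distributions, averaged over the batch
    soft = (p_teacher * (np.log(p_teacher + 1e-12)
                         - np.log(p_student + 1e-12))).sum(-1).mean()
    # ordinary cross-entropy against the hard labels (T = 1)
    hard = -np.log(softmax(student_logits)[np.arange(len(labels)), labels]
                   + 1e-12).mean()
    # T**2 rescales the soft-target term, following the usual convention
    return alpha * hard + (1 - alpha) * (T ** 2) * soft
```

A student that already matches the teacher's logits drives the soft term to zero, leaving only the hard-label loss.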

5. Quantization

  • present situation: has shown great and consistent success in both training and inference; this survey focuses on inference.
  • shortcoming: it is very difficult to go below half-precision without significant tuning, and most recent quantization research has focused on inference rather than training.

6. Similarity of Quantization and Neuroscience

  • motivation: work in neuroscience suggests that the human brain stores information in a discrete/quantized form, rather than in a continuous form.

II. GENERAL HISTORY OF QUANTIZATION

III. BASIC CONCEPTS OF QUANTIZATION

  • Problem Setup and Notations
  • Uniform Quantization
  • Symmetric and Asymmetric Quantization
  • Range Calibration Algorithms: Static vs Dynamic Quantization
  • Quantization Granularity
  • Non-Uniform Quantization
  • Fine-tuning Methods
    • Quantization-Aware Training
    • Stochastic Quantization

IV. ADVANCED CONCEPTS: QUANTIZATION BELOW 8 BITS

  • Simulated and Integer-only Quantization
  • Mixed-Precision Quantization
  • Hardware Aware Quantization
  • Distillation-Assisted Quantization
  • Extreme Quantization
    • Quantization Error Minimization
    • Improved Loss function
    • Improved Training Method
  • Vector Quantization

V. QUANTIZATION AND HARDWARE PROCESSORS

VI. FUTURE DIRECTIONS FOR RESEARCH IN QUANTIZATION

  • Quantization Software
  • Hardware and NN Architecture Co-Design
  • Coupled Compression Methods
  • Quantized Training