Mobile Video Super-Resolution Work Log

Time: 2023.2.7 - 2023.4.15

Paper Reading

  1. PD-Quant: Post-Training Quantization based on Prediction Difference Metric (2022.12)

    • Analyzes the local metrics (MSE or cosine distance of layer activations before and after quantization) used to optimize the quantization parameters S and Z
    • PD Loss: introduces the prediction difference to determine the activation scaling factors
    • Distribution Correction (DC): adjusts the intermediate activation distributions on the calibration dataset
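
A minimal numpy sketch of the per-layer local metric that PD-Quant analyzes: simulate uniform quantization with scale S and zero point Z, then score it by the MSE between the activation before and after quantization. The activation statistics and candidate scales below are illustrative, not taken from the paper.

```python
import numpy as np

def quantize_dequantize(x, scale, zero_point, qmin=-128, qmax=127):
    """Simulated INT8 uniform quantization: q = clip(round(x / S) + Z, qmin, qmax)."""
    q = np.clip(np.round(x / scale) + zero_point, qmin, qmax)
    return (q - zero_point) * scale

rng = np.random.default_rng(0)
act = rng.normal(0.0, 1.0, size=1024)  # stand-in for one layer's activations

# Local metric: MSE of the activation before vs. after quantization,
# evaluated for a few candidate scales S (zero point Z fixed at 0 here).
for scale in (0.02, 0.03, 0.05):
    mse = np.mean((act - quantize_dequantize(act, scale, 0)) ** 2)
    print(f"S={scale}: MSE={mse:.6f}")
```

PD-Quant's point is that choosing S and Z by such local metrics alone ignores the effect on the network's final prediction, hence the prediction-difference loss above.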
  2. Nonuniform-to-Uniform Quantization: Towards Accurate Quantization via Generalized Straight-Through Estimation

  3. Data-Free Network Compression via Parametric Non-uniform Mixed Precision Quantization (CVPR 2022)

  4. Computer Vision – ECCV 2022 Workshops

    • Learning Multiple Probabilistic Degradation Generators for Unsupervised Real World Image Super Resolution (unsupervised image SR)
    • Evaluating Image Super-Resolution Performance on Mobile Devices: An Online Benchmark (benchmark for directly deployed SR models)
    • Efficient Image Super-Resolution Using Vast-Receptive-Field Attention (large-receptive-field attention for image SR)
    • DSR: Towards Drone Image Super-Resolution (drone image SR)
    • Image Super-Resolution with Deep Variational Autoencoders (variational autoencoders for SISR)
    • Light Field Angular Super-Resolution via Dense Correspondence Field Reconstruction (light-field angular SR)
    • CIDBNet: A Consecutively-Interactive Dual-Branch Network for JPEG Compressed Image Super-Resolution (JPEG compressed-image SR)
    • XCAT - Lightweight Quantized Single Image Super-Resolution Using Heterogeneous Group Convolutions and Cross Concatenation (single-image SR)
    • RCBSR: Re-parameterization Convolution Block for Super-Resolution (structural re-parameterization for video SR)
    • Multi-patch Learning: Looking More Pixels in the Training Phase (multi-patch training strategy for SISR)
    • Fast Nearest Convolution for Real-Time Efficient Image Super-Resolution (nearest convolution replacing the copied input image for the depth_to_space operation)
    • Real-Time Channel Mixing Net for Mobile Image Super-Resolution (single-image SR: channel mixing using 1*1 conv)
    • Sliding Window Recurrent Network for Efficient Video Super-Resolution (video SR)
    • EESRNet: A Network for Energy Efficient Super-Resolution (video SR)
    • HST: Hierarchical Swin Transformer for Compressed Image Super-Resolution (compressed-image SR)
    • Swin2SR: SwinV2 Transformer for Compressed Image Super-Resolution and Restoration (compressed-image SR)
  5. Video Super-Resolution With Convolutional Neural Networks (2016)

    • Simply concatenates the current frame with neighboring frames to improve SR quality
  6. Frame-Recurrent Video Super-Resolution (2017)

    • Uses the HR result predicted for the previous frame to compensate super-resolution of the current frame
  7. Enhanced Deep Residual Networks for Single Image Super-Resolution (2017)

    • ResBlock: uses fewer activations (e.g. ReLU) than earlier work
    • Upsample: conv + pixel shuffle
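
The conv + shuffle upsampler above ends with a pixel-shuffle step; a minimal numpy sketch of that step, written to match `tf.nn.depth_to_space` in NHWC layout:

```python
import numpy as np

def depth_to_space(x, r):
    """NHWC pixel shuffle: (N, H, W, C*r*r) -> (N, H*r, W*r, C), as in tf.nn.depth_to_space."""
    n, h, w, c = x.shape
    c_out = c // (r * r)
    x = x.reshape(n, h, w, r, r, c_out)   # split channels into an (r, r) sub-pixel grid
    x = x.transpose(0, 1, 3, 2, 4, 5)     # interleave sub-pixel rows/cols with H/W
    return x.reshape(n, h * r, w * r, c_out)

x = np.arange(16, dtype=np.float32).reshape(1, 2, 2, 4)  # 4 channels, upscale r=2
y = depth_to_space(x, 2)
print(y.shape)  # (1, 4, 4, 1)
```

In EDSR a convolution first expands the channels to C*r*r, and the shuffle then trades those channels for spatial resolution.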
  8. TDAN: Temporally-Deformable Alignment Network for Video Super-Resolution (CVPR 2020)

    • Temporally-deformable convolution alignment network to alleviate SR artifacts
  9. Efficient Reference-based Video Super-Resolution (ERVSR): Single Reference Image Is All You Need (WACV 2023)

    • Super-resolves the whole LR video sequence with a single reference frame: instead of using the LR frame at every timestep as a reference, only the frame at the center timestep is used
    • Similarity estimation and alignment based on an attention mechanism
    • Motivation: faster inference and lower memory consumption
  10. BasicVSR: The Search for Essential Components in Video Super-Resolution and Beyond (CVPR 2021)

  11. BasicVSR++: Improving Video Super-Resolution with Enhanced Propagation and Alignment (CVPR 2022)

  12. Multi-scale attention network for image super-resolution (ECCV 2018)

    • Multi-scale cross block (MSCB): three parallel convolutions with different dilations extract features and fuse them
    • Multi-path wide-activated attention block (MWAB): three parallel branches (convolution + spatial attention + channel attention), then concatenation
    • Drawback: the global average pooling used by conventional channel attention does not necessarily capture inter-channel correlations correctly
  13. Deep Video Super-Resolution using Hybrid Imaging System (2023)

    • Task: reconstruct an HR high-frame-rate video from an LR high-frame-rate video (main video) and an HR low-frame-rate video (auxiliary video)
    • The model has three parts:
      1. Super-resolve the main video to produce base high-resolution frames
      2. Extract detail features from the auxiliary video and align them
      3. Aggregate and fuse the hybrid video information
  14. STDAN: Deformable Attention Network for Space-Time Video Super-Resolution (2023)

    • Deformable attention network
    • Long short-term feature interpolation (LSTFI)
    • Spatial-temporal deformable feature aggregation (STDFA)
  15. ShuffleMixer: An Efficient ConvNet for Image Super-Resolution (NTIRE 2022)

    • Large convolution kernels paired with a channel split-shuffle operation
    • Adds a Fused-MBConv after every two shuffle-mixer layers to make up for incomplete local feature extraction
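
The channel split-shuffle that ShuffleMixer builds on is the usual reshape-transpose-reshape trick; a numpy sketch (NHWC layout; the group count below is illustrative):

```python
import numpy as np

def channel_shuffle(x, groups):
    """Shuffle channels across groups (NHWC): reshape -> swap group axes -> flatten."""
    n, h, w, c = x.shape
    x = x.reshape(n, h, w, groups, c // groups)
    x = x.transpose(0, 1, 2, 4, 3)        # interleave the groups
    return x.reshape(n, h, w, c)

x = np.arange(8).reshape(1, 1, 1, 8)      # channels 0..7 in two groups of 4
print(channel_shuffle(x, 2)[0, 0, 0])     # [0 4 1 5 2 6 3 7]
```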
  16. An Implicit Alignment for Video Super-Resolution (ArXiv 2023)

    • Static upsample evolution: a dynamic counterpart of static interpolation upsampling such as bilinear or nearest interpolation
    • Implicit attention-based alignment combining local-window key/value position encoding with query (motion estimation / flow) position encoding
  17. Rethinking Alignment in Video Super-Resolution Transformers (NeurIPS 2022)

    • Element-wise product: tf.multiply(A, B), i.e. A * B
    • Matrix multiplication: tf.matmul(A, B), i.e. A @ B
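
The two products behave the same way in numpy, which makes the distinction easy to check (for 2-D arrays the `*` and `@` semantics match tf.multiply and tf.matmul):

```python
import numpy as np

A = np.array([[1, 2], [3, 4]])
B = np.array([[5, 6], [7, 8]])

print(A * B)  # element-wise (Hadamard) product: [[ 5 12] [21 32]]
print(A @ B)  # matrix multiplication:           [[19 22] [43 50]]
```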

Idea

  1. Find problems in earlier papers and try to improve or solve them -> competing on runtime alone is a losing game

  2. Transformer PTQ -> shelved for now; focus on pushing performance for the workshop

  3. From the angle of turning the first work into a paper, consider applying new compression ideas to MAI video super resolution

    • dataset -> train: REDS, test: REDS4(Clips 000, 011, 015, 020 of REDS training set)
    • mobile video super resolution related paper
    • frontier -> Optical Flow
  4. Try blind video super resolution -> abandoned

  5. Compared solutions

    Solution | Model Size [KB] | PSNR | SSIM | Runtime [ms]
    MVideoSR | 17 | 27.34 | 0.7799 | 3.05
    ZX_VIP | 20 | 27.52 | 0.7872 | 3.04
    Fighter | 11 | 27.34 | 0.7816 | 3.41
    XJTU-MIGU SUPER | 50 | 27.77 | 0.7957 | 3.25
    BOE-IOT-AIBD | 40 | 27.71 | 0.7820 | 1.97
    GenMedia Group | 135 | 28.40 | 0.8105 | 3.10
    NCUT VGroup | 35 | 27.46 | 0.7822 | 1.39
    Mortar ICT | 75 | 22.91 | 0.7546 | 1.76
    RedCat AutoX | 62 | 27.71 | 0.7945 | 7.26
    221B | 186 | 28.19 | 0.8093 | 10.1
  6. Survey the latest research progress on the datasets REDS / Vimeo-90K / Vid4 / UDM10 / SPMCS / RealVSR

    Paper | Source | Training Set | Testing Set
    Achieving on-Mobile Real-Time Super-Resolution with Neural Architecture and Pruning Search | ICCV 2021 | DIV2K | Set5, Set14, B100 and Urban100
    LiDeR: Lightweight Dense Residual Network for Video Super-Resolution on Mobile Devices | 2022 IEEE 14th Image, Video, and Multidimensional Signal Processing Workshop (IVMSP) | REDS | REDS
    Cross-Resolution Flow Propagation for Foveated Video Super-Resolution | Winter Conference on Applications of Computer Vision 2023 | REDS | REDS
    Online Video Super-Resolution with Convolutional Kernel Bypass Graft | arXiv 2022.8 | REDS | REDS
    Real-Time Super-Resolution for Real-World Images on Mobile Devices | 2022 IEEE 5th International Conference on Multimedia Information Processing and Retrieval (MIPR) | DIV2K | DIV2K, Set5, Set14, BSD100, Manga109, and Urban100
    Towards High-Quality and Efficient Video Super-Resolution via Spatial-Temporal Data Overfitting | CVPR 2023 | VSD4K | VSD4K
    Rethinking Alignment in Video Super-Resolution Transformers | NeurIPS 2022 | REDS | REDS
  7. SWAT's PSNR should preferably be pushed above 28; complete pruning, weight clustering, and INT8/FP16 quantization

  8. Test the finetuned TensorFlow model and the TFLite model ->

  9. Compared methods must run under the same settings -> build a comparison leaderboard

  10. Experiment: keep the overall SWRN framework unchanged and swap in a VAB backed by Partial Standard Conv -> PSNR 27.76, no clear improvement

  11. Read up to understand: how the attention mechanism is implemented, how it takes effect, and whether cascaded stacking is needed

  12. Apply MobileOne structural re-parameterization

Metrics

Full-Reference

  1. Peak Signal to Noise Ratio (PSNR)
  2. Structural SIMilarity (SSIM)
  3. Gradient Magnitude Similarity Deviation (GMSD)

No-Reference

  1. Naturalness Image Quality Evaluator (NIQE)
  2. Blind/Referenceless Image Spatial QUality Evaluator (BRISQUE)
  3. Distortion Identification-based Image Verity and INtegrity Evaluation (DIIVINE)
  4. BLind Image Integrity Notator using DCT-Statistics (BLIINDS)

Results

Milestone_0

Model | Description | Dataset | Val PSNR | Val SSIM | Params | Runtime on OnePlus 7T [ms] | FLOPs [G]
VapSR_4_1 Functional VapSR_4 with pixel norm realized by layer normalization, VAB activation: ReLU, Attention using Partial conv REDS 27.790268 0.77721727 59,468 654.0 (INT8_CPU) 7.462
SWAT_0 Sliding Window, VAB Attention, Partial Conv, Channel Shuffle(mix_ratio=4) REDS 27.842232 0.77754354 50,624 271.0 (FP16_CPU) 5.803
SWAT_1 Sliding Window, VAB Attention, Partial Conv, Channel Shuffle(mix_ratio=2), replace fc with 1*1 conv REDS 27.759375 0.77492595 33,984 252.0 (FP16_CPU) 3.900
SWAT_2 Sliding Window, VAB Attention, Partial Conv, Channel Shuffle(mix_ratio=1), replace fc with 1*1 conv REDS 27.760305 0.77487457 25,664 - -
SWAT_3 Sliding Window, VAB Attention, Partial Conv, Channel Shuffle(mix_ratio=1), replace fc with 1*1 conv, replace pixel normalization with layer normalization REDS 27.761642 0.7748446 25,664 27.8 (FP16_TFLite GPU Delegate) 2.949
SWAT_3_1 Sliding Window, VAB Attention(large receptive field=17), Partial Conv(point_wise: standard conv, depth_wise: group conv), Channel Shuffle(mix_ratio=1), replace fc with 1*1 conv, replace pixel normalization with layer normalization REDS 27.761642 0.7748446 25,664 27.8 (FP16_TFLite GPU Delegate) 2.949
SWAT_3_2 Sliding Window, VAB Attention(large receptive field=17), Partial Conv(point_wise: standard conv, depth_wise: group conv), Channel Shuffle(mix_ratio=2), replace fc with 1*1 conv, replace pixel normalization with layer normalization REDS 27.74189 0.7742521 26,016 32.4 (FP16_TFLite GPU Delegate) 2.996
SWAT_4 Sliding Window, VAB Attention, Replace partial conv with standard convolution, Remove Channel Shuffle, replace pixel normalization with layer normalization REDS 27.785185 0.77523285 53,696 38.5 (FP16_TFLite GPU Delegate) 6.202
SWAT_5 Sliding Window, VAB Attention, Partial Conv, Channel Shuffle(mix_ratio=1), replace fc with 1*1 conv, replace pixel normalization with layer normalization, enlarge train step numbers to 250,000 REDS 27.811176 0.7763541 25,664 27.6 (FP16_TFLite GPU Delegate) 2.949
SWAT_6 Sliding Window, VAB Attention, Partial Conv, Modified Channel Shuffle (mix_ratio:1), Remove convs of hidden forward/backward REDS 27.738842 0.7743317 21,056 - (FP16_TFLite GPU Delegate) 2.417
SWAT_7 Sliding Window, 3 branches VAB Attention, Partial Conv, Remove Channel Shuffle, Replace pixel normalization with layer normalization REDS 27.645552 0.77121794 18,144 - (FP16_TFLite GPU Delegate) 2.090
SWAT_8 Sliding Window, VAB Attention modified 2 REDS 27.782675 0.77573705 45,424 - (FP16_TFLite GPU Delegate) 5.200
SWAT_9 Sliding Window, Non Activation Block REDS 27.636255 0.7709387 23,648 288.0 (FP16_TFLite GPU Delegate) 2.113

AI benchmark setting for Runtime test:

  • Input Values range(min,max): 0,255
  • Inference Mode: INT8/FP16
  • Acceleration: CPU/TFLite GPU Delegate

Milestone_1

Model | Description | Dataset | Val PSNR | Val SSIM | Params | Runtime on OnePlus 7T [ms] | FLOPs [G]
SWAT_3_3 Sliding Window, VAB Attention(large receptive field=13 with channel shuffle[Dense(units)]), Partial Conv(standard conv), Replace pixel normalization with layer normalization REDS 27.761633 0.7752705 27,472 30.3 (FP16_TFLite GPU Delegate) 3.165
SWAT_3_4 Sliding Window, VAB Attention(large receptive field=13 without channel shuffle[Dense(units)], stack 2 blocks), Partial Conv(standard conv), Replace pixel normalization with layer normalization REDS 27.80347 0.77701694 32,832 40.9 (FP16_TFLite GPU Delegate) 3.798
SWAT_3_5 Sliding Window, VAB Attention(large receptive field=17 without channel shuffle[Dense(units)], stack 2 blocks, pointwise conv for channel fusion without PConv), Partial Conv(standard conv), Replace pixel normalization with layer normalization REDS 27.840628 0.7774375 37,312 39.4 (FP16_TFLite GPU Delegate) 4.302
SWAT_3_6 Sliding Window, VAB Attention(large receptive field=17 without channel shuffle[Dense(units)], stack 2 blocks, pointwise conv for channel fusion without PConv), Partial Conv(standard conv), Replace pixel normalization with layer normalization, Shallow feature extraction using standard conv REDS 27.8165 0.7774126 42,624 40.7 (FP16_TFLite GPU Delegate) 4.916
SWAT_3_7 Sliding Window, VAB Attention(large receptive field=17 without channel shuffle[Dense(units)], stack 2 blocks, pointwise conv for channel fusion without PConv), Partial Conv(standard conv), Replace pixel normalization with layer normalization, Remove concat and unpack of hidden state REDS 27.182861 0.7562948 29,136 29.0 (FP16_TFLite GPU Delegate) 3.357
SWAT_3_8 Sliding Window, VAB Attention(large receptive field=17 without channel shuffle[Dense(units)], stack 2 blocks, pointwise conv for channel fusion without PConv), Partial Conv(standard conv), Replace pixel normalization with layer normalization, Remove concat and unpack of hidden state, Increase channels of fusion attention REDS 27.564552 0.7688081 56,032 39.1 (FP16_TFLite GPU Delegate) 6.456
SWAT_3_9 Sliding Window, VAB Attention and IMDB hybrid REDS 27.95189 0.7806478 53,312 42.8 (FP16_TFLite GPU Delegate) 6.367
SWAT_3_10 Sliding Window, Finetuned VAB Attention REDS 27.846352 0.77762717 53,512 49.0 (FP16_TFLite GPU Delegate) 6.170
ABPN_0 Origin REDS 27.92307 0.779504 62,048 38.1/35.7 (INT8/FP16_TFLite GPU Delegate) 7.137
ABPN_1 GenMedia Group Modified(L1 Charbonnier loss; crop_size:64) REDS 27.858198 0.7780704 58,304 37.1/33.0 (INT8/FP16_TFLite GPU Delegate) 6.699
ABPN_2 GenMedia Group Modified(MAE loss; crop_size:96) REDS 27.875465 0.7783027 58,304 37.1/33.0 (INT8/FP16_TFLite GPU Delegate) 6.699
AFAVSR_0 Multiple frames aggregation attention (num_feat=48, d_atten=64, num_blocks=2) REDS 27.837406 0.77741796 68,368 44.5 (FP16_TFLite GPU Delegate) 7.872
AFAVSR_1 Multiple frames aggregation attention (num_feat=16, d_atten=32, num_blocks=8) REDS 27.829765 0.7763255 44,016 36.6 (FP16_TFLite GPU Delegate) 5.069
AFAVSR_2 Multiple frames aggregation attention (num_feat=16, d_atten=32, num_blocks=2) REDS (FP16_TFLite GPU Delegate)
AFAVSR_3 All batch frames aggregation attention (num_feat=32, d_atten=64, num_blocks=2) REDS - - - - -
SORT_0 Sliding Window, IMDB REDS 27.738451 0.77409536 17,356 20.6 (FP16_TFLite GPU Delegate) 2.084
SORT_1 Sliding Window, IMDB, ConvTail num_out_channel=48 REDS 27.75588 0.7749552 19,660 21.6 (FP16_TFLite GPU Delegate) 2.351
SORT_2 Sliding Window, IMDB, multi-branch distillation channel num hyperparameter tuning REDS 27.93981 0.7808094 45,264 35.6 (FP16_TFLite GPU Delegate) 5.385
SORT_3 Sliding Window, IMDB, multi-branch distillation channel num hyperparameter tuning, Replace SEL with CCA (Contrast-Aware Channel Attention) REDS 27.867216 0.7790734 39,144 35.3 (FP16_TFLite GPU Delegate) 4.414
SORT_4 Sliding Window, Modified IMDB equipped with channel attention mechanism REDS 27.769545 0.7755401 39,120 41.3(FP16_TFLite GPU Delegate) 5.725
SORT_5 Sliding Window, Modified IMDB equipped with larger channel width and channel reduction/aggregation using 1*1 convs REDS 28.13419 0.78656757 166,944 85.9(FP16_TFLite GPU Delegate) 19.566
SORT_6 Sliding Window, Modified IMDB equipped with dynamic channel width REDS 27.944357 0.7809873 48,216 - (FP16_TFLite GPU Delegate) -

AI benchmark setting for Runtime test:

  • Input Values range(min,max): 0,255
  • Inference Mode: INT8/FP16
  • Acceleration: CPU/TFLite GPU Delegate

Milestone_2

Model | Description | Dataset | Val PSNR | Val SSIM | Params | Runtime on OnePlus 7T [ms] | FLOPs [G]
VSR_0 Sliding Window, Non Activation Block REDS 27.673386 0.7725643 26,368 57.9 (FP16_TFLite GPU Delegate) 2.417
VSR_1 Attention Alignment_0, Non Activation Block REDS 27.508242 0.76671493 17,440 42.0 (FP16_TFLite GPU Delegate) 1.677
VSR_2 Attention Alignment_1, Non Activation Block, Rectify BSConvolution REDS 27.53437 0.7678055 17,776 error (FP16_TFLite GPU Delegate) 2.035
VSR_3 VSR_2 Ablation: Attention Alignment_1 REDS 27.414068 0.76361054 17,413 60.3 (FP16_TFLite GPU Delegate) 1.793
VSR_4 VSR_2 -> modify Non Activation Block using partial conv REDS 27.784992 0.7769825 43,120 error (FP16_TFLite GPU Delegate) 4.958
VSR_5 VSR_4 Ablation: RGB out channels sharing upsample result REDS 27.835686 0.7776796 47,728 - (FP16_TFLite GPU Delegate) 5.491
VSR_6 VSR_5 Finetune: Non Activation Block channel numbers modify REDS 27.783165 0.7768693 28,976 error (FP16_TFLite GPU Delegate) 3.699
VSR_7 Light weight hidden states attention alignment; Blueprint convolution for shallow feature extraction; Multi-Stage ExcavatoR(MSER) combined with partial convolution and simplified channel attention REDS 27.470276 0.7664948 81,806 66.1 (FP16_TFLite GPU Delegate) 7.938
VSR_8 Light weight hidden states attention alignment; Blueprint convolution for shallow feature extraction; Nonlinear activation free block REDS 27.91092 0.77971315 66,312 64.7 (FP16_TFLite GPU Delegate) 7.269
VSR_9 vsr_9 ablation: feature alignment REDS 27.91092 0.77971315 39,792 44.5 (FP16_TFLite GPU Delegate) 4.218
VSR_10 motivation: IMDB + PartialConv + VapSR + BSConv REDS 27.963232 0.780958 44,256 48.6 (FP16_TFLite GPU Delegate) 5.103
VSR_11 VSR_10 ablation: hidden state conv using bias REDS 27.948818 0.7809571 44,288 47.2 (FP16_TFLite GPU Delegate) 5.103
VSR_12 VSR_10 ablation: hidden state process using modified IMDB REDS 27.953104 0.7807622 57,696 62.8 (FP16_TFLite GPU Delegate) 6.649

AI benchmark setting for Runtime test:

  • Input Values range(min,max): 0,255
  • Inference Mode: INT8/FP16
  • Acceleration: CPU/TFLite GPU Delegate

Milestone_3

Model | Description | Dataset | Val PSNR | Val SSIM | Params | Runtime on OnePlus 7T [ms] | FLOPs [G]
MVSR_0 modified IMDB (IMDB + PartialConv + VapSR + BSConv); deprecate hidden state forward and backward; light weight feature alignment REDS 27.915539 0.7799377 35,777 38.8 (FP16_TFLite GPU Delegate) 4.068
MVSR_1 modified IMDB (IMDB + PartialConv + VapSR + BSConv); deprecate hidden state forward and backward; light weight frame alignment REDS 27.932716 0.7810435 34,473 44.3 (FP16_TFLite GPU Delegate) 3.976
MVSR_2 MVSR_1 Ablation: light weight frame alignment REDS 27.929586 0.78039753 34,208 35.4 (FP16_TFLite GPU Delegate) 3.944
MVSR_3 MVSR_1 Ablation: large receptive field in SMDB -> reduce: 3x3 + 3x3 dilated REDS 27.892586 0.7790079 32,169 41.1 (FP16_TFLite GPU Delegate) 3.711
MVSR_4 MVSR_2 Ablation: large receptive field in SMDB -> increase: 7x7 + 7x7 dilated REDS 27.958328 0.78145003 37,664 42.4 (FP16_TFLite GPU Delegate) 4.343
MVSR_5 MVSR_1 Ablation: large receptive field in SMDB -> increase: 7x7 + 7x7 dilated REDS 27.936714 0.7809204 37,929 49.8 (FP16_TFLite GPU Delegate) 4.375
MVSR_6 modified IMDB (IMDB + PartialConv based pixel attention version_0 + VapSR + BSConv); light weight frame alignment REDS 27.884369 0.7790964 34,473 44.4 (FP16_TFLite GPU Delegate) 4.246
MVSR_7 modified IMDB (IMDB + PartialConv based pixel attention version_1 + VapSR + BSConv); light weight frame alignment REDS 27.858534 0.77831227 35,769 44.5 (FP16_TFLite GPU Delegate) 4.387
MVSR_8 MVSR_1 Ablation: SEL -> Channel Attention REDS 27.610485 0.7696045 29,145 40.5 (FP16_TFLite GPU Delegate) 3.001
MVSR_9 MVSR_1 Ablation: Channel fuse + SEL -> FlashModule + Channel fuse REDS 28.043566 0.7842476 96,249 72.8 (FP16_TFLite GPU Delegate) 10.684
MVSR_10 Partial conv idea applied to MSDB and Attention(i.e. SEL) REDS 27.86422 0.7783118 27,081 41.2 (FP16_TFLite GPU Delegate) 3.031
MVSR_11 MVSR_10 finetune: deprecate MSDB's channel fuse; add MSDB blocks REDS 27.90566 0.7793118 32,553 48.7 (FP16_TFLite GPU Delegate) 3.634
MVSR_12 MVSR_11 ablation: MSDB’s group convolution -> standard convolution REDS 27.953104 0.7807622 68,169 38.4 (FP16_TFLite GPU Delegate) 7.737
MVSR_13 MVSR_12 AttentionAlign module evolution REDS 27.966156 0.7809557 68,157 39.8 (FP16_TFLite GPU Delegate) 7.735
MVSR_13_1 MVSR_13 evolution: ConvTail used for increasing dimension -> BSConv REDS 27.879667 0.7790071 62,541 40.6 (FP16_TFLite GPU Delegate) 7.080
MVSR_13_2 MVSR_13 ablation: fractional/partial ratio 1/2 -> 1/4 REDS 27.877321 0.7783568 37,517 37.1 (FP16_TFLite GPU Delegate) 4.152
MVSR_13_3 MVSR_13 ablation: fractional/partial ratio 1/2 -> 1/8 REDS 27.79014 0.77567685 29,829 35.5 (FP16_TFLite GPU Delegate) 3.240
MVSR_13_4 MVSR_13 ablation: fractional/partial ratio 1/2 -> 3/4 REDS 27.955465 0.78150684 119,149 84.1 (FP16_TFLite GPU Delegate) 13.663
MVSR_13_4_revalid MVSR_13 ablation: fractional/partial ratio 1/2 -> 3/4 REDS 27.956861 0.7814712 119,149 84.1 (FP16_TFLite GPU Delegate) 13.663
MVSR_13_5 MVSR_13 ablation: fractional/partial ratio 1/2 -> 7/8 REDS 27.993414 0.7823691 152,277 103.0 (FP16_TFLite GPU Delegate) 17.506
MVSR_13_6 MVSR_13 ablation: fractional/partial ratio 1/2 -> 3/8 REDS 27.948065 0.78071 50,293 39.0 (FP16_TFLite GPU Delegate) 5.651
MVSR_13_7 MVSR_13 ablation: fractional/partial ratio 1/2 -> 5/8 REDS 27.983498 0.7823881 91,109 86.4 (FP16_TFLite GPU Delegate) 10.406
MVSR_14 MVSR_13 ablation: - frame attention align -> standard conv 1 x 1 acts as frame information propagation operator REDS 27.930904 0.78043896 68,272 29.7 (FP16_TFLite GPU Delegate) 7.746
MVSR_15 MVSR_13 ablation: MSDB block number 4 -> 3 REDS 27.91523 0.77966064 52,965 33.4 (FP16_TFLite GPU Delegate) 6.016
MVSR_16 MVSR_13 ablation: No partial/fractional; No BSconv (Blueprint Separable conv); No receptive field decomposition REDS 27.928417 0.7801167 920,381 399.0 (FP16_TFLite GPU Delegate) 106.045
MVSR_17 MVSR_13 evolution: MSDB using standard conv 3 x 3, PPA using split large receptive field conv 5 x 5 + 5 x 5 dilated REDS 27.902325 0.7794845 47,101 36.8 (FP16_TFLite GPU Delegate) 5.313
MVSR_18 MVSR_17 ablation: BSconv REDS 27.893446 0.77926654 47,325 34.4 (FP16_TFLite GPU Delegate) 5.340
MVSR_19 MVSR_13 evolution: MSDB blocks 4 -> 3; Enlarge receptive field of PPA 3 -> 17 REDS 27.914143 0.7799854 60,861 35.9 (FP16_TFLite GPU Delegate) 6.924
MVSR_20 MVSR_13 ablation: No receptive field decomposition REDS 27.93524 0.78071207 251,613 119.0 (FP16_TFLite GPU Delegate) 28.875
MVSR_21 MVSR_13 ablation: No frame align; No fractional/partial; No BSconv; No receptive field decomposition REDS 27.941408 0.7807615 920,128 400.0 (FP16_TFLite GPU Delegate) 106.014
MVSR_21_1 MVSR_13 ablation: No frame align (directly extraction from 3 consecutive frames); No fractional/partial; No BSconv; No receptive field decomposition REDS 27.913836 0.779473 920,992 399.0 (FP16_TFLite GPU Delegate) 106.114
MVSR_22 MVSR_13 ablation: No BSconv; No receptive field decomposition REDS 27.936152 0.7799803 251,837 118.0 (FP16_TFLite GPU Delegate) 28.902
MVSR_23 MVSR_13 ablation: PFE PPA Standard conv -> Depthwise conv REDS 27.867388 0.7784562 32,541 49.5 (FP16_TFLite GPU Delegate) 3.632
MVSR_24 MVSR_13 ablation: - Partial/Fractional Extraction REDS 27.952333 0.78090274 186,141 94.3 (FP16_TFLite GPU Delegate) 21.449
MVSR_24_revalid MVSR_13 ablation: - Partial/Fractional Extraction (keep fc) REDS 27.940563 0.7804851 190,493 102.0 (FP16_TFLite GPU Delegate) 21.935
MVSR_25 MVSR_13 ablation: - BSConv REDS 27.929697 0.780183 68,381 38.8 (FP16_TFLite GPU Delegate) 7.762
MVSR_26 MVSR_13 ablation: - Large Receptive Field Decomposition REDS 27.972654 0.7818956 251,613 119.0 (FP16_TFLite GPU Delegate) 28.875
MVSR_27 MVSR_13 ablation: - FC in PFE, PPA REDS 27.945955 0.78067327 63,805 37.8 (FP16_TFLite GPU Delegate) 7.249

AI benchmark setting for Runtime test:

  • Input Values range(min,max): 0,255
  • Inference Mode: INT8/FP16
  • Acceleration: CPU/TFLite GPU Delegate

Benchmark_0

Rank | Model | Source | Dataset | Test PSNR | Test SSIM | Params | Runtime on OnePlus 7T [ms]
1 Diggers Real-Time Video Super-Resolution based on Bidirectional RNNs (2021 SOTA) REDS(train_videos: 240, test_videos: 30) 27.98 - 39,640 -
2 VSR_12 Ours REDS(train_videos: 240, test_videos: 30) 27.981062 0.7824855 57,696 62.8
3 MVSR_4 Ours REDS(train_videos: 240, test_videos: 30) 27.958328 0.78145003 37,664 42.4
4 MVSR_12 Ours REDS(train_videos: 240, test_videos: 30) 27.953104 0.7807622 68,169 38.4
5 SORT_2 Ours REDS(train_videos: 240, test_videos: 30) 27.93981 0.7808094 45,264 35.6
6 SWRN Sliding Window Recurrent Network for Efficient Video Super-Resolution (2022 SOTA) REDS(train_videos: 240, test_videos: 30) 27.92 0.77 43,472 31.0
7 MVSR_11 Ours REDS(train_videos: 240, test_videos: 30) 27.90566 0.7793118 32,553 48.7
8 SWAT_3_5 Ours REDS(train_videos: 240, test_videos: 30) 27.840628 0.7774375 37,312 39.4
9 EESRNet EESRNet: A Network for Energy Efficient Super-Resolution (2022) REDS(train_videos: 240, test_videos: 30) 27.84 - 62,550 -
10 LiDeR LiDeR: Lightweight Dense Residual Network for Video Super-Resolution on Mobile Devices (2022) REDS(train_videos: 240, test_videos: 30) 27.51 0.76 - -
11 EVSRNet EVSRNet: Efficient Video Super-Resolution with Neural Architecture Search (2021) REDS(train_videos: 240, test_videos: 30) 27.42 - - -
12 RCBSR RCBSR: Re-parameterization Convolution Block for Super-Resolution (2022) REDS(train_videos: 240, test_videos: 30) 27.28 0.775 - -

Benchmark_1

Model | Source | Dataset | Test PSNR | Test SSIM | Params
SSL-uni Structured Sparsity Learning for Efficient Video Super-Resolution (CVPR 2023) REDS (train: 266, test: 4) 30.24 0.86 500,000

PaperWriting

No.1

  1. BSConvU for shallow feature extraction
  2. Recurrent neural network so feature information flows freely across frames
  3. Multi-distillation module with dynamic routing and large-ERF attention
  4. Bilinearly upsampled RGB channels share the same upsampling result
  5. Nearest convolution for a shorter residual inference time than a bilinear residual
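
The nearest-convolution residual in item 5 (from Fast Nearest Convolution in the reading list) expresses nearest-neighbor upsampling as channel tiling followed by depth_to_space, which lowers to a fixed 1x1 convolution on mobile accelerators; a numpy sketch checking that it equals plain nearest-neighbor upsampling:

```python
import numpy as np

def depth_to_space(x, r):
    """NHWC pixel shuffle, as in tf.nn.depth_to_space."""
    n, h, w, c = x.shape
    x = x.reshape(n, h, w, r, r, c // (r * r))
    return x.transpose(0, 1, 3, 2, 4, 5).reshape(n, h * r, w * r, c // (r * r))

def nearest_upsample(x, r):
    """Nearest-neighbor x r upsampling as channel tiling + depth_to_space;
    the tiling step is expressible as a fixed 1x1 conv, hence 'nearest convolution'."""
    return depth_to_space(np.tile(x, (1, 1, 1, r * r)), r)

x = np.arange(12, dtype=np.float32).reshape(1, 2, 2, 3)  # tiny RGB frame
up = nearest_upsample(x, 2)
same = np.array_equal(up, x.repeat(2, axis=1).repeat(2, axis=2))
print(up.shape, bool(same))  # (1, 4, 4, 3) True
```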

No.2

  1. Motivation: mobile video super-resolution with inference time ↓, PSNR ↑, SSIM ↑
  2. Use only the HR frame predicted for the previous frame as a reference to compensate the current LR frame -> super-resolve in real time while shooting, instead of being limited to videos that have already finished recording
  3. Hypothesis: intermediate feature maps do not contribute equally to the output, so how do we aggregate the high-contribution ones -> partial convolution to accelerate inference (analysis)
  4. Reduce activations in the model -> use multiplication's ability to produce nonlinear mappings
  5. Share the upsampling compensation across the RGB channels -> analyze whether the per-channel RGB upsampling compensation in conventional models is highly consistent; if so, sharing it cuts computation and speeds up inference (analysis)
  6. Blueprint convolution for shallow feature extraction -> ends up performing better than standard convolution
  7. Attention-based fusion of multi-scale features (downsampled to different scales) <- motivation: in the primate visual cortex, neurons in the same area have different receptive fields; the model analogue is capturing more precise spatial information or richer texture from different scales/receptive fields within the same layer
  8. Fuse short-distance shortcuts -> faster inference
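
The partial-convolution idea in item 3 (convolve only the high-contribution fraction of channels, pass the rest through untouched) can be sketched as follows; a 1x1 kernel stands in for the conv, and the ratio and shapes are illustrative:

```python
import numpy as np

def partial_conv1x1(x, weight, ratio=0.5):
    """Apply a 1x1 conv to the first `ratio` fraction of channels (NHWC);
    the remaining channels are passed through unchanged."""
    k = int(x.shape[-1] * ratio)
    head = x[..., :k] @ weight                 # 1x1 conv == per-pixel matmul
    return np.concatenate([head, x[..., k:]], axis=-1)

rng = np.random.default_rng(0)
x = rng.normal(size=(1, 8, 8, 16))
w = rng.normal(size=(8, 8))                    # kernel for 8 of the 16 channels
y = partial_conv1x1(x, w)
print(y.shape)  # (1, 8, 8, 16)
```

Skipping the conv on half the channels roughly halves the multiply-adds of the layer, which is the inference speedup the note is after.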

No.3

  1. Motivation: mobile video super-resolution with inference time ↓, PSNR ↑, SSIM ↑
  2. Auxiliary forward/backward hidden states for feature alignment -> improves the PSNR of the SR result
  3. Hypothesis: intermediate feature maps do not contribute equally to the output, so how do we aggregate the high-contribution ones -> partial convolution to accelerate inference (analysis)
  4. Reduce activations in the model -> use multiplication's ability to produce nonlinear mappings, accelerating inference
  5. Consider dynamic depth (adaptive early exit) -> faster inference -> deprecated

PaperReference

  1. Rethinking Alignment in Video Super-Resolution Transformers (NeurIPS 2022) -> in ViT-based video super-resolution (VSR), frame/feature alignment is not a necessary operation
  2. An Implicit Alignment for Video Super-Resolution (ArXiv 2023) -> improvement over bilinear interpolation/resampling
  3. Video Super-Resolution Transformer
  4. Efficient Reference-based Video Super-Resolution (ERVSR): Single Reference Image Is All You Need (WACV 2023) -> uses the middle frame of the sequence as the reference frame to assist super-resolving the current frame
  5. MULTI-STAGE FEATURE ALIGNMENT NETWORK FOR VIDEO SUPER-RESOLUTION
  6. ELSR: Extreme Low-Power Super Resolution Network For Mobile Devices
  7. LiDeR: Lightweight Dense Residual Network for Video Super-Resolution on Mobile Devices
  8. Look Back and Forth: Video Super-Resolution with Explicit Temporal Difference Modeling
  9. COLLAPSIBLE LINEAR BLOCKS FOR SUPER-EFFICIENT SUPER RESOLUTION
  10. Revisiting Temporal Alignment for Video Restoration
  11. BasicVSR++: Improving Video Super-Resolution with Enhanced Propagation and Alignment
  12. EVSRNet: Efficient Video Super-Resolution with Neural Architecture Search
  13. BasicVSR: The Search for Essential Components in Video Super-Resolution and Beyond
  14. Revisiting Temporal Modeling for Video Super-resolution -> official baseline of the first MAI VSR challenge
  15. TDAN: Temporally-Deformable Alignment Network for Video Super-Resolution (CVPR 2020)
  16. Video Super-resolution with Temporal Group Attention (CVPR 2020)
  17. 3DSRnet: Video Super-resolution using 3D Convolutional Neural Networks
  18. Frame-Recurrent Video Super-Resolution
  19. Video Super-Resolution With Convolutional Neural Networks