Mobile Video Super-Resolution Work Log

Published 2023-02-07, updated 2024-12-02

Time: 2023.2.7-2023.4.15

Paper Reading

PD-Quant: Post-Training Quantization based on Prediction Difference Metric (2022.12)
- Analyzes the local metrics (MSE or cosine distance between a layer's activations before and after quantization) used to optimize the quantization parameters S (scale) and Z (zero point)
- PD Loss: introduces the prediction difference to determine the activation scaling factors
- Distribution Correction (DC): adjusts the intermediate activation distribution on the calibration dataset
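As a rough illustration of the local metric mentioned above, the sketch below quantizes a list of activations with a scale S and zero point Z, then picks the candidate scale with the lowest local MSE. It is only a toy version of the local-metric baseline, not PD-Quant's prediction-difference loss; the function names and the candidate list are assumptions made for this example.

```python
def fake_quantize(values, scale, zero_point, num_bits=8):
    """Uniform affine quantization followed by dequantization."""
    qmin, qmax = 0, 2 ** num_bits - 1
    out = []
    for v in values:
        q = round(v / scale) + zero_point   # map to the integer grid
        q = max(qmin, min(qmax, q))         # clamp to the representable range
        out.append((q - zero_point) * scale)  # map back to float
    return out


def mse(a, b):
    """Mean squared error between two equally long lists."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def best_scale_by_local_mse(values, candidate_scales, zero_point=0):
    """Pick the scale that minimizes the MSE between values and their quantized copy."""
    return min(
        candidate_scales,
        key=lambda s: mse(values, fake_quantize(values, s, zero_point)),
    )


activations = [0.0, 0.5, 1.0, 1.5, 2.0]   # hypothetical layer output
chosen = best_scale_by_local_mse(activations, [0.5, 1.0, 2.0])
```

A local metric like this only looks at one layer in isolation, which is the limitation the prediction-difference idea is meant to address.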
Nonuniform-to-Uniform Quantization: Towards Accurate Quantization via Generalized Straight-Through Estimation
Data-Free Network Compression via Parametric Non-uniform Mixed Precision Quantization (CVPR 2022)
Computer Vision – ECCV 2022 Workshops
- Learning Multiple Probabilistic Degradation Generators for Unsupervised Real World Image Super Resolution (unsupervised image super-resolution)
- Evaluating Image Super-Resolution Performance on Mobile Devices: An Online Benchmark (benchmark for deploying SR models directly on devices)
- Efficient Image Super-Resolution Using Vast-Receptive-Field Attention (image super-resolution with large-receptive-field attention)
- DSR: Towards Drone Image Super-Resolution (drone image super-resolution)
- Image Super-Resolution with Deep Variational Autoencoders (variational autoencoders for SISR)
- Light Field Angular Super-Resolution via Dense Correspondence Field Reconstruction (light field angular super-resolution)
- CIDBNet: A Consecutively-Interactive Dual-Branch Network for JPEG Compressed Image Super-Resolution (JPEG-compressed image super-resolution)
- XCAT - Lightweight Quantized Single Image Super-Resolution Using Heterogeneous Group Convolutions and Cross Concatenation (single image super-resolution)
- RCBSR: Re-parameterization Convolution Block for Super-Resolution (structural re-parameterization for video super-resolution)
- Multi-patch Learning: Looking More Pixels in the Training Phase (multi-patch training strategy for SISR)
- Fast Nearest Convolution for Real-Time Efficient Image Super-Resolution (nearest convolution replaces copying the original image for the depth_to_space operation)
- Real-Time Channel Mixing Net for Mobile Image Super-Resolution (single image super-resolution: channel mixing using 1*1 conv)
- Sliding Window Recurrent Network for Efficient Video Super-Resolution (video super-resolution)
- EESRNet: A Network for Energy Efficient Super-Resolution (video super-resolution)
- HST: Hierarchical Swin Transformer for Compressed Image Super-Resolution (compressed image super-resolution)
- Swin2SR: SwinV2 Transformer for Compressed Image Super-Resolution and Restoration (compressed image super-resolution)
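Video Super-Resolution With Convolutional Neural Networks (2016)
- Simply concatenates the current frame with its neighboring frames to improve super-resolution quality
Frame-Recurrent Video Super-Resolution (2017)
- Uses the HR result predicted for the previous frame to compensate the super-resolution of the current frame
Enhanced Deep Residual Networks for Single Image Super-Resolution (2017)
- ResBlock: uses fewer activations such as ReLU than earlier work
- Upsample: conv + shuffle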
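The shuffle half of the conv + shuffle upsampler can be sketched in plain Python. This assumes the channel ordering of TensorFlow's depth_to_space on height-width-channel data; the nested-list layout and the function name are illustrative choices, not code from the paper.

```python
def depth_to_space(x, r):
    """Rearrange [H][W][C*r*r] into [H*r][W*r][C] (TensorFlow channel ordering)."""
    h, w, c_in = len(x), len(x[0]), len(x[0][0])
    c_out = c_in // (r * r)
    out = [[[0] * c_out for _ in range(w * r)] for _ in range(h * r)]
    for y in range(h):
        for col in range(w):
            for i in range(r):          # row offset inside the r x r block
                for j in range(r):      # column offset inside the r x r block
                    for c in range(c_out):
                        src = (i * r + j) * c_out + c
                        out[y * r + i][col * r + j][c] = x[y][col][src]
    return out


# One pixel with 4 channels becomes a 2x2 patch with 1 channel.
patch = depth_to_space([[[1, 2, 3, 4]]], 2)
```

The preceding convolution only has to produce C*r*r channels at low resolution; the rearrangement itself has no learned parameters.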
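TDAN: Temporally-Deformable Alignment Network for Video Super-Resolution (CVPR 2020)
- A temporally deformable convolution alignment network that mitigates artifacts in super-resolution
Efficient Reference-based Video Super-Resolution (ERVSR): Single Reference Image Is All You Need (WACV 2023)
- Super-resolves the entire low-resolution video sequence from a single reference frame: instead of using the LR frame at every time step as a reference, only the frame at the center time step is used
- Similarity estimation and alignment are attention-based
- Motivation: faster inference and lower memory consumption
BasicVSR: The Search for Essential Components in Video Super-Resolution and Beyond (CVPR 2021)
BasicVSR++: Improving Video Super-Resolution with Enhanced Propagation and Alignment (CVPR 2022)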
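Multi-scale attention network for image super-resolution (ECCV 2018)
- Multi-scale cross block (MSCB): three parallel convolutions with different dilation rates extract features, which are then fused
- Multi-path wide-activated attention block (MWAB): three parallel branches (convolution, spatial attention, channel attention) whose outputs are concatenated
- Drawback: the global average pooling used by conventional channel attention does not necessarily capture inter-channel correlations correctly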
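The drawback noted for global average pooling can be seen in a toy example. The sketch below is a stripped-down channel attention that omits the learned fully connected layers (an assumption made to keep it short): two channels with very different spatial content but the same mean receive exactly the same gate.

```python
import math


def global_average_pool(feature):
    """feature is [C][H][W]; returns one mean per channel."""
    return [
        sum(sum(row) for row in channel) / (len(channel) * len(channel[0]))
        for channel in feature
    ]


def channel_attention(feature):
    """Scale every channel by sigmoid(mean of that channel)."""
    gates = [1.0 / (1.0 + math.exp(-m)) for m in global_average_pool(feature)]
    scaled = [
        [[v * g for v in row] for row in channel]
        for channel, g in zip(feature, gates)
    ]
    return scaled, gates


checkerboard = [[0.0, 4.0], [4.0, 0.0]]   # strong spatial structure
flat = [[2.0, 2.0], [2.0, 2.0]]           # no spatial structure
_, gates = channel_attention([checkerboard, flat])
```

Both channels collapse to the same descriptor, so the pooled statistic alone cannot tell them apart.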
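Deep Video Super-Resolution using Hybrid Imaging System (2023)
- Task: reconstruct an HR high-frame-rate video from an LR high-frame-rate video (main video) and an HR low-frame-rate video (auxiliary video)
- The model has three parts:
  1. Super-resolve the main video to produce the base high-resolution frames
  2. Extract detail features from the auxiliary video and align them
  3. Aggregate and fuse the information from both videos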
STDAN: Deformable Attention Network for Space-Time Video Super-Resolution (2023)
- Deformable attention network
- Long short-term feature interpolation (LSTFI)
- Spatial-temporal deformable feature aggregation (STDFA)
ShuffleMixer: An Efficient ConvNet for Image Super-Resolution (NTIRE 2022)
- Large convolution kernels combined with a channel split-shuffle operation
- A Fused-MBConv layer is added after every two shuffle mixer layers to compensate for insufficient local feature extraction
An Implicit Alignment for Video Super-Resolution (arXiv 2023)
- Static upsample evolution: makes static interpolation-based upsampling (bilinear, nearest) dynamic
- Implicit attention-based alignment, combined with position encoding of the local-window keys and values and position encoding of the query (motion estimation/flow)
Rethinking Alignment in Video Super-Resolution Transformers (NeurIPS 2022)
- Element-wise (Hadamard) product: tf.multiply(A, B) = A * B
- Matrix multiplication: tf.matmul(A, B) = A @ B
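To make the difference between the two products concrete, here is a small pure-Python sketch on 2x2 matrices. The helper names are made up for this example and stand in for the TensorFlow calls named above.

```python
def hadamard(a, b):
    """Element-wise product: out[i][j] = a[i][j] * b[i][j]."""
    return [[x * y for x, y in zip(row_a, row_b)] for row_a, row_b in zip(a, b)]


def matmul(a, b):
    """Matrix product: out[i][j] = sum over k of a[i][k] * b[k][j]."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


A = [[1, 2], [3, 4]]
B = [[5, 6], [7, 8]]
elementwise = hadamard(A, B)   # [[5, 12], [21, 32]]
product = matmul(A, B)         # [[19, 22], [43, 50]]
```

The element-wise product requires matching shapes, while the matrix product requires the inner dimensions to agree.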

Idea

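Find problems in earlier papers and try to improve on or solve them -> competing on runtime alone is bound to fail
Transformer PTQ -> set aside for now; focus on improving performance for the workshop
To get a paper out of the first piece of work, consider applying new compression ideas to MAI video super-resolution
- dataset -> train: REDS, test: REDS4 (clips 000, 011, 015, 020 of the REDS training set)
- mobile video super-resolution related papers
- frontier -> Optical Flow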
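The REDS / REDS4 split above can be written down as a small helper. This sketch assumes the training clips are named with three-digit IDs from 000 to 239 (240 clips) and that the four REDS4 clips are held out of training; both are assumptions of this example rather than details stated in the log.

```python
REDS4_CLIPS = ("000", "011", "015", "020")


def split_reds(clip_ids):
    """Return (train, test) lists, holding out the REDS4 clips for testing."""
    test = [c for c in clip_ids if c in REDS4_CLIPS]
    train = [c for c in clip_ids if c not in REDS4_CLIPS]
    return train, test


all_clips = [f"{i:03d}" for i in range(240)]   # assumed clip naming
train_clips, test_clips = split_reds(all_clips)
```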
Try blind video super-resolution -> abandoned

Compared Solutions	Model Size, KB	PSNR	SSIM	Runtime, ms
MVideoSR	17	27.34	0.7799	3.05
ZX_VIP	20	27.52	0.7872	3.04
Fighter	11	27.34	0.7816	3.41
XJTU-MIGU SUPER	50	27.77	0.7957	3.25
BOE-IOT-AIBD	40	27.71	0.7820	1.97
GenMedia Group	135	28.40	0.8105	3.10
NCUT VGroup	35	27.46	0.7822	1.39
Mortar ICT	75	22.91	0.7546	1.76
RedCat AutoX	62	27.71	0.7945	7.26
221B	186	28.19	0.8093	10.1

Survey the latest research on the REDS / Vimeo-90K / Vid4 / UDM10 / SPMCS / RealVSR datasets

Paper	Source	Training Set	Testing Set
Achieving on-Mobile Real-Time Super-Resolution with Neural Architecture and Pruning Search	ICCV 2021	DIV2K	Set5, Set14, B100 and Urban100
LiDeR: Lightweight Dense Residual Network for Video Super-Resolution on Mobile Devices	2022 IEEE 14th Image, Video, and Multidimensional Signal Processing Workshop (IVMSP)	REDS	REDS
Cross-Resolution Flow Propagation for Foveated Video Super-Resolution	WACV 2023	REDS	REDS
Online Video Super-Resolution with Convolutional Kernel Bypass Graft	arXiv 2022.8	REDS	REDS
Real-Time Super-Resolution for Real-World Images on Mobile Devices	2022 IEEE 5th International Conference on Multimedia Information Processing and Retrieval (MIPR)	DIV2K	DIV2K, Set5, Set14, BSD100, Manga109, and Urban100
Towards High-Quality and Efficient Video Super-Resolution via Spatial-Temporal Data Overfitting	CVPR 2023	VSD4K	VSD4K
Rethinking Alignment in Video Super-Resolution Transformers	NeurIPS 2022	REDS	REDS

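SWAT's PSNR should ideally be pushed above 28; finish pruning, weight clustering, and INT8/FP16 quantization
Test the fine-tuned TensorFlow model and the TFLite model ->
Compared methods must be evaluated under the same settings -> set up a comparison leaderboard
Experiment: keep the overall SWRN framework and swap in a VAB built on Partial Standard Conv -> PSNR: 27.76, no clear improvement
Read up on: how the attention mechanism is implemented, how it takes effect, and whether it needs to be cascaded/stacked
Apply MobileOne structural re-parameterization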

Metrics

Full-Reference

Peak Signal to Noise Ratio (PSNR)
Structural SIMilarity (SSIM)
Gradient Magnitude Similarity Deviation (GMSD)
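PSNR is the metric reported in every results table of this log, so a minimal sketch of its definition may help. It assumes 8-bit images flattened to lists with a peak value of 255; the function name and the flattened-list input are choices made for this example.

```python
import math


def psnr(reference, test, max_val=255.0):
    """Peak signal-to-noise ratio in dB between two equally long pixel lists."""
    mse = sum((r - t) ** 2 for r, t in zip(reference, test)) / len(reference)
    if mse == 0:
        return math.inf                      # identical images
    return 10.0 * math.log10(max_val ** 2 / mse)


value = psnr([100, 100], [110, 90])          # MSE = 100, about 28.13 dB
```

Because the scale is logarithmic, a gain of 0.1 dB corresponds to roughly a 2.3% reduction in mean squared error.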

No-Reference

Natural Image Quality Evaluator (NIQE)
Blind/Referenceless Image Spatial Quality Evaluator (BRISQUE)
Distortion Identification-based Image Verity and INtegrity Evaluation (DIIVINE)
BLind Image Integrity Notator using DCT-Statistics (BLIINDS)

Results

Milestone_0

Model	Description	Dataset	Val PSNR	Val SSIM	Params	Runtime on OnePlus 7T [ms]	FLOPs [G]
VapSR_4_1	Functional VapSR_4 with pixel norm realized by layer normalization, VAB activation: ReLU, Attention using Partial conv	REDS	27.790268	0.77721727	59,468	654.0 (INT8_CPU)	7.462
SWAT_0	Sliding Window, VAB Attention, Partial Conv, Channel Shuffle(mix_ratio=4)	REDS	27.842232	0.77754354	50,624	271.0 (FP16_CPU)	5.803
SWAT_1	Sliding Window, VAB Attention, Partial Conv, Channel Shuffle(mix_ratio=2), replace fc with 1*1 conv	REDS	27.759375	0.77492595	33,984	252.0 (FP16_CPU)	3.900
SWAT_2	Sliding Window, VAB Attention, Partial Conv, Channel Shuffle(mix_ratio=1), replace fc with 1*1 conv	REDS	27.760305	0.77487457	25,664	-	-
SWAT_3	Sliding Window, VAB Attention, Partial Conv, Channel Shuffle(mix_ratio=1), replace fc with 1*1 conv, replace pixel normalization with layer normalization	REDS	27.761642	0.7748446	25,664	27.8 (FP16_TFLite GPU Delegate)	2.949
SWAT_3_1	Sliding Window, VAB Attention(large receptive field=17), Partial Conv(point_wise: standard conv, depth_wise: group conv), Channel Shuffle(mix_ratio=1), replace fc with 1*1 conv, replace pixel normalization with layer normalization	REDS	27.761642	0.7748446	25,664	27.8 (FP16_TFLite GPU Delegate)	2.949
SWAT_3_2	Sliding Window, VAB Attention(large receptive field=17), Partial Conv(point_wise: standard conv, depth_wise: group conv), Channel Shuffle(mix_ratio=2), replace fc with 1*1 conv, replace pixel normalization with layer normalization	REDS	27.74189	0.7742521	26,016	32.4 (FP16_TFLite GPU Delegate)	2.996
SWAT_4	Sliding Window, VAB Attention, Replace partial conv with standard convolution, Remove Channel Shuffle, replace pixel normalization with layer normalization	REDS	27.785185	0.77523285	53,696	38.5 (FP16_TFLite GPU Delegate)	6.202
SWAT_5	Sliding Window, VAB Attention, Partial Conv, Channel Shuffle(mix_ratio=1), replace fc with 1*1 conv, replace pixel normalization with layer normalization, enlarge train step numbers to 250,000	REDS	27.811176	0.7763541	25,664	27.6 (FP16_TFLite GPU Delegate)	2.949
SWAT_6	Sliding Window, VAB Attention, Partial Conv, Modified Channel Shuffle (mix_ratio:1), Remove convs of hidden forward/backward	REDS	27.738842	0.7743317	21,056	- (FP16_TFLite GPU Delegate)	2.417
SWAT_7	Sliding Window, 3-branch VAB Attention, Partial Conv, Remove Channel Shuffle, Replace pixel normalization with layer normalization	REDS	27.645552	0.77121794	18,144	- (FP16_TFLite GPU Delegate)	2.090
SWAT_8	Sliding Window, VAB Attention modified 2	REDS	27.782675	0.77573705	45,424	- (FP16_TFLite GPU Delegate)	5.200
SWAT_9	Sliding Window, Non Activation Block	REDS	27.636255	0.7709387	23,648	288.0 (FP16_TFLite GPU Delegate)	2.113

AI benchmark setting for Runtime test:

Input Values range(min,max): 0,255
Inference Mode: INT8/FP16
Acceleration: CPU/TFLite GPU Delegate

Milestone_1

Model	Description	Dataset	Val PSNR	Val SSIM	Params	Runtime on OnePlus 7T [ms]	FLOPs [G]
SWAT_3_3	Sliding Window, VAB Attention(large receptive field=13 with channel shuffle[Dense(units)]), Partial Conv(standard conv), Replace pixel normalization with layer normalization	REDS	27.761633	0.7752705	27,472	30.3 (FP16_TFLite GPU Delegate)	3.165
SWAT_3_4	Sliding Window, VAB Attention(large receptive field=13 without channel shuffle[Dense(units)], stack 2 blocks), Partial Conv(standard conv), Replace pixel normalization with layer normalization	REDS	27.80347	0.77701694	32,832	40.9 (FP16_TFLite GPU Delegate)	3.798
SWAT_3_5	Sliding Window, VAB Attention(large receptive field=17 without channel shuffle[Dense(units)], stack 2 blocks, pointwise conv for channel fusion without PConv), Partial Conv(standard conv), Replace pixel normalization with layer normalization	REDS	27.840628	0.7774375	37,312	39.4 (FP16_TFLite GPU Delegate)	4.302
SWAT_3_6	Sliding Window, VAB Attention(large receptive field=17 without channel shuffle[Dense(units)], stack 2 blocks, pointwise conv for channel fusion without PConv), Partial Conv(standard conv), Replace pixel normalization with layer normalization, Shallow feature extraction using standard conv	REDS	27.8165	0.7774126	42,624	40.7 (FP16_TFLite GPU Delegate)	4.916
SWAT_3_7	Sliding Window, VAB Attention(large receptive field=17 without channel shuffle[Dense(units)], stack 2 blocks, pointwise conv for channel fusion without PConv), Partial Conv(standard conv), Replace pixel normalization with layer normalization, Remove concat and unpack of hidden state	REDS	27.182861	0.7562948	29,136	29.0 (FP16_TFLite GPU Delegate)	3.357
SWAT_3_8	Sliding Window, VAB Attention(large receptive field=17 without channel shuffle[Dense(units)], stack 2 blocks, pointwise conv for channel fusion without PConv), Partial Conv(standard conv), Replace pixel normalization with layer normalization, Remove concat and unpack of hidden state, Increase channels of fusion attention	REDS	27.564552	0.7688081	56,032	39.1 (FP16_TFLite GPU Delegate)	6.456
SWAT_3_9	Sliding Window, VAB Attention and IMDB hybrid	REDS	27.95189	0.7806478	53,312	42.8 (FP16_TFLite GPU Delegate)	6.367
SWAT_3_10	Sliding Window, Finetuned VAB Attention	REDS	27.846352	0.77762717	53,512	49.0 (FP16_TFLite GPU Delegate)	6.170
ABPN_0	Origin	REDS	27.92307	0.779504	62,048	38.1/35.7 (INT8/FP16_TFLite GPU Delegate)	7.137
ABPN_1	GenMedia Group Modified(L1 Charbonnier loss; crop_size:64)	REDS	27.858198	0.7780704	58,304	37.1/33.0 (INT8/FP16_TFLite GPU Delegate)	6.699
ABPN_2	GenMedia Group Modified(MAE loss; crop_size:96)	REDS	27.875465	0.7783027	58,304	37.1/33.0 (INT8/FP16_TFLite GPU Delegate)	6.699
AFAVSR_0	Multiple frames aggregation attention (num_feat=48, d_atten=64, num_blocks=2)	REDS	27.837406	0.77741796	68,368	44.5 (FP16_TFLite GPU Delegate)	7.872
AFAVSR_1	Multiple frames aggregation attention (num_feat=16, d_atten=32, num_blocks=8)	REDS	27.829765	0.7763255	44,016	36.6 (FP16_TFLite GPU Delegate)	5.069
AFAVSR_2	Multiple frames aggregation attention (num_feat=16, d_atten=32, num_blocks=2)	REDS	-	-	-	- (FP16_TFLite GPU Delegate)	-
AFAVSR_3	All batch frames aggregation attention (num_feat=32, d_atten=64, num_blocks=2)	REDS	-	-	-	-	-
SORT_0	Sliding Window, IMDB	REDS	27.738451	0.77409536	17,356	20.6 (FP16_TFLite GPU Delegate)	2.084
SORT_1	Sliding Window, IMDB, ConvTail num_out_channel=48	REDS	27.75588	0.7749552	19,660	21.6 (FP16_TFLite GPU Delegate)	2.351
SORT_2	Sliding Window, IMDB, multi-branch distillation channel num hyperparameter tuning	REDS	27.93981	0.7808094	45,264	35.6 (FP16_TFLite GPU Delegate)	5.385
SORT_3	Sliding Window, IMDB, multi-branch distillation channel num hyperparameter tuning, Replace SEL with CCA (Contrast-Aware Channel Attention)	REDS	27.867216	0.7790734	39,144	35.3 (FP16_TFLite GPU Delegate)	4.414
SORT_4	Sliding Window, Modified IMDB equipped with channel attention mechanism	REDS	27.769545	0.7755401	39,120	41.3 (FP16_TFLite GPU Delegate)	5.725
SORT_5	Sliding Window, Modified IMDB equipped with larger channel width and channel reduction/aggregation using 1*1 convs	REDS	28.13419	0.78656757	166,944	85.9 (FP16_TFLite GPU Delegate)	19.566
SORT_6	Sliding Window, Modified IMDB equipped with dynamic channel width	REDS	27.944357	0.7809873	48,216	- (FP16_TFLite GPU Delegate)	-

AI benchmark setting for Runtime test:

Input Values range(min,max): 0,255
Inference Mode: INT8/FP16
Acceleration: CPU/TFLite GPU Delegate

Milestone_2

Model	Description	Dataset	Val PSNR	Val SSIM	Params	Runtime on OnePlus 7T [ms]	FLOPs [G]
VSR_0	Sliding Window, Non Activation Block	REDS	27.673386	0.7725643	26,368	57.9 (FP16_TFLite GPU Delegate)	2.417
VSR_1	Attention Alignment_0, Non Activation Block	REDS	27.508242	0.76671493	17,440	42.0 (FP16_TFLite GPU Delegate)	1.677
VSR_2	Attention Alignment_1, Non Activation Block, Rectify BSConvolution	REDS	27.53437	0.7678055	17,776	error (FP16_TFLite GPU Delegate)	2.035
VSR_3	VSR_2 Ablation: Attention Alignment_1	REDS	27.414068	0.76361054	17,413	60.3 (FP16_TFLite GPU Delegate)	1.793
VSR_4	VSR_2 -> modify Non Activation Block using partial conv	REDS	27.784992	0.7769825	43,120	error (FP16_TFLite GPU Delegate)	4.958
VSR_5	VSR_4 Ablation: RGB out channels sharing upsample result	REDS	27.835686	0.7776796	47,728	- (FP16_TFLite GPU Delegate)	5.491
VSR_6	VSR_5 Finetune: Non Activation Block channel numbers modify	REDS	27.783165	0.7768693	28,976	error (FP16_TFLite GPU Delegate)	3.699
VSR_7	Light weight hidden states attention alignment; Blue Print convolution for shallow feature extraction; Multi-Stage ExcavatoR(MSER) combined with partial convolution and simplified channel attention	REDS	27.470276	0.7664948	81,806	66.1 (FP16_TFLite GPU Delegate)	7.938
VSR_8	Light weight hidden states attention alignment; Blue Print convolution for shallow feature extraction; Nonlinear activation free block	REDS	27.91092	0.77971315	66,312	64.7 (FP16_TFLite GPU Delegate)	7.269
VSR_9	vsr_9 ablation: feature alignment	REDS	27.91092	0.77971315	39,792	44.5 (FP16_TFLite GPU Delegate)	4.218
VSR_10	motivation: IMDB + PartialConv + VapSR + BSConv	REDS	27.963232	0.780958	44,256	48.6 (FP16_TFLite GPU Delegate)	5.103
VSR_11	VSR_10 ablation: hidden state conv using bias	REDS	27.948818	0.7809571	44,288	47.2 (FP16_TFLite GPU Delegate)	5.103
VSR_12	VSR_10 ablation: hidden state process using modified IMDB	REDS	27.953104	0.7807622	57,696	62.8 (FP16_TFLite GPU Delegate)	6.649

AI benchmark setting for Runtime test:

Input Values range(min,max): 0,255
Inference Mode: INT8/FP16
Acceleration: CPU/TFLite GPU Delegate

Milestone_3

Model	Description	Dataset	Val PSNR	Val SSIM	Params	Runtime on OnePlus 7T [ms]	FLOPs [G]
MVSR_0	modified IMDB: IMDB + PartialConv + VapSR + BSConv; deprecate hidden state forward and backward; light weight feature alignment	REDS	27.915539	0.7799377	35,777	38.8 (FP16_TFLite GPU Delegate)	4.068
MVSR_1	modified IMDB: IMDB + PartialConv + VapSR + BSConv; deprecate hidden state forward and backward; light weight frame alignment	REDS	27.932716	0.7810435	34,473	44.3 (FP16_TFLite GPU Delegate)	3.976
MVSR_2	MVSR_1 Ablation: light weight frame alignment	REDS	27.929586	0.78039753	34,208	35.4 (FP16_TFLite GPU Delegate)	3.944
MVSR_3	MVSR_1 Ablation: large receptive field in SMDB -> reduce: 3x3 + 3x3 dilated	REDS	27.892586	0.7790079	32,169	41.1 (FP16_TFLite GPU Delegate)	3.711
MVSR_4	MVSR_2 Ablation: large receptive field in SMDB -> increase: 7x7 + 7x7 dilated	REDS	27.958328	0.78145003	37,664	42.4 (FP16_TFLite GPU Delegate)	4.343
MVSR_5	MVSR_1 Ablation: large receptive field in SMDB -> increase: 7x7 + 7x7 dilated	REDS	27.936714	0.7809204	37,929	49.8 (FP16_TFLite GPU Delegate)	4.375
MVSR_6	modified IMDB: IMDB + PartialConv based pixel attention version_0 + VapSR + BSConv; light weight frame alignment	REDS	27.884369	0.7790964	34,473	44.4 (FP16_TFLite GPU Delegate)	4.246
MVSR_7	modified IMDB: IMDB + PartialConv based pixel attention version_1 + VapSR + BSConv; light weight frame alignment	REDS	27.858534	0.77831227	35,769	44.5 (FP16_TFLite GPU Delegate)	4.387
MVSR_8	MVSR_1 Ablation: SEL -> Channel Attention	REDS	27.610485	0.7696045	29,145	40.5 (FP16_TFLite GPU Delegate)	3.001
MVSR_9	MVSR_1 Ablation: Channel fuse + SEL -> FlashModule + Channel fuse	REDS	28.043566	0.7842476	96,249	72.8 (FP16_TFLite GPU Delegate)	10.684
MVSR_10	Partial conv idea applied to MSDB and Attention(i.e. SEL)	REDS	27.86422	0.7783118	27,081	41.2 (FP16_TFLite GPU Delegate)	3.031
MVSR_11	MVSR_10 fine-tune: deprecate MSDB’s channel fuse; add MSDB blocks	REDS	27.90566	0.7793118	32,553	48.7 (FP16_TFLite GPU Delegate)	3.634
MVSR_12	MVSR_11 ablation: MSDB’s group convolution -> standard convolution	REDS	27.953104	0.7807622	68,169	38.4 (FP16_TFLite GPU Delegate)	7.737
MVSR_13	MVSR_12 AttentionAlign module evolution	REDS	27.966156	0.7809557	68,157	39.8 (FP16_TFLite GPU Delegate)	7.735
MVSR_13_1	MVSR_13 evolution: ConvTail used for increasing dimension -> BSConv	REDS	27.879667	0.7790071	62,541	40.6 (FP16_TFLite GPU Delegate)	7.080
MVSR_13_2	MVSR_13 ablation: fractional/partial ratio 1/2 -> 1/4	REDS	27.877321	0.7783568	37,517	37.1 (FP16_TFLite GPU Delegate)	4.152
MVSR_13_3	MVSR_13 ablation: fractional/partial ratio 1/2 -> 1/8	REDS	27.79014	0.77567685	29,829	35.5 (FP16_TFLite GPU Delegate)	3.240
MVSR_13_4	MVSR_13 ablation: fractional/partial ratio 1/2 -> 3/4	REDS	27.955465	0.78150684	119,149	84.1 (FP16_TFLite GPU Delegate)	13.663
MVSR_13_4_revalid	MVSR_13 ablation: fractional/partial ratio 1/2 -> 3/4	REDS	27.956861	0.7814712	119,149	84.1 (FP16_TFLite GPU Delegate)	13.663
MVSR_13_5	MVSR_13 ablation: fractional/partial ratio 1/2 -> 7/8	REDS	27.993414	0.7823691	152,277	103.0 (FP16_TFLite GPU Delegate)	17.506
MVSR_13_6	MVSR_13 ablation: fractional/partial ratio 1/2 -> 3/8	REDS	27.948065	0.78071	50,293	39.0 (FP16_TFLite GPU Delegate)	5.651
MVSR_13_7	MVSR_13 ablation: fractional/partial ratio 1/2 -> 5/8	REDS	27.983498	0.7823881	91,109	86.4 (FP16_TFLite GPU Delegate)	10.406
MVSR_14	MVSR_13 ablation: - frame attention align -> standard conv 1 x 1 acts as the frame information propagation operator	REDS	27.930904	0.78043896	68,272	29.7 (FP16_TFLite GPU Delegate)	7.746
MVSR_15	MVSR_13 ablation: MSDB block number 4 -> 3	REDS	27.91523	0.77966064	52,965	33.4 (FP16_TFLite GPU Delegate)	6.016
MVSR_16	MVSR_13 ablation: No partial/fractional; No BSconv (Blueprint Separable conv); No receptive field decomposition	REDS	27.928417	0.7801167	920,381	399.0 (FP16_TFLite GPU Delegate)	106.045
MVSR_17	MVSR_13 evolution: MSDB using standard conv 3 x 3, PPA using split large receptive field conv 5 x 5 + 5 x 5 dilated	REDS	27.902325	0.7794845	47,101	36.8 (FP16_TFLite GPU Delegate)	5.313
MVSR_18	MVSR_17 ablation: BSconv	REDS	27.893446	0.77926654	47,325	34.4 (FP16_TFLite GPU Delegate)	5.340
MVSR_19	MVSR_13 evolution: MSDB blocks 4 -> 3; Enlarge receptive field of PPA 3 -> 17	REDS	27.914143	0.7799854	60,861	35.9 (FP16_TFLite GPU Delegate)	6.924
MVSR_20	MVSR_13 ablation: No receptive field decomposition	REDS	27.93524	0.78071207	251,613	119.0 (FP16_TFLite GPU Delegate)	28.875
MVSR_21	MVSR_13 ablation: No frame align; No fractional/partial; No BSconv; No receptive field decomposition	REDS	27.941408	0.7807615	920,128	400.0 (FP16_TFLite GPU Delegate)	106.014
MVSR_21_1	MVSR_13 ablation: No frame align (directly extraction from 3 consecutive frames); No fractional/partial; No BSconv; No receptive field decomposition	REDS	27.913836	0.779473	920,992	399.0 (FP16_TFLite GPU Delegate)	106.114
MVSR_22	MVSR_13 ablation: No BSconv; No receptive field decomposition	REDS	27.936152	0.7799803	251,837	118.0 (FP16_TFLite GPU Delegate)	28.902
MVSR_23	MVSR_13 ablation: PFE PPA Standard conv -> Depthwise conv	REDS	27.867388	0.7784562	32,541	49.5 (FP16_TFLite GPU Delegate)	3.632
MVSR_24	MVSR_13 ablation: - Partial/Fractional Extraction	REDS	27.952333	0.78090274	186,141	94.3 (FP16_TFLite GPU Delegate)	21.449
MVSR_24_revalid	MVSR_13 ablation: - Partial/Fractional Extraction (keep fc)	REDS	27.940563	0.7804851	190,493	102.0 (FP16_TFLite GPU Delegate)	21.935
MVSR_25	MVSR_13 ablation: - BSConv	REDS	27.929697	0.780183	68,381	38.8 (FP16_TFLite GPU Delegate)	7.762
MVSR_26	MVSR_13 ablation: - Large Receptive Field Decomposition	REDS	27.972654	0.7818956	251,613	119.0 (FP16_TFLite GPU Delegate)	28.875
MVSR_27	MVSR_13 ablation: - FC in PFE, PPA	REDS	27.945955	0.78067327	63,805	37.8 (FP16_TFLite GPU Delegate)	7.249

AI benchmark setting for Runtime test:

Input Values range(min,max): 0,255
Inference Mode: INT8/FP16
Acceleration: CPU/TFLite GPU Delegate

Benchmark_0

Rank	Model	Source	Dataset	Test PSNR	Test SSIM	Params	Runtime on OnePlus 7T [ms]
1	Diggers	Real-Time Video Super-Resolution based on Bidirectional RNNs(2021 SOTA)	REDS(train_videos: 240, test_videos: 30)	27.98	-	39,640	-
2	VSR_12	Ours	REDS(train_videos: 240, test_videos: 30)	27.981062	0.7824855	57,696	62.8
3	MVSR_4	Ours	REDS(train_videos: 240, test_videos: 30)	27.958328	0.78145003	37,664	42.4
4	MVSR_12	Ours	REDS(train_videos: 240, test_videos: 30)	27.953104	0.7807622	68,169	38.4
5	SORT_2	Ours	REDS(train_videos: 240, test_videos: 30)	27.93981	0.7808094	45,264	35.6
6	SWRN	Sliding Window Recurrent Network for Efficient Video Super-Resolution (2022 SOTA)	REDS(train_videos: 240, test_videos: 30)	27.92	0.77	43,472	31.0
7	MVSR_11	Ours	REDS(train_videos: 240, test_videos: 30)	27.90566	0.7793118	32,553	48.7
8	SWAT_3_5	Ours	REDS(train_videos: 240, test_videos: 30)	27.840628	0.7774375	37,312	39.4
9	EESRNet	EESRNet: A Network for Energy Efficient Super-Resolution(2022)	REDS(train_videos: 240, test_videos: 30)	27.84	-	62,550	-
10	LiDeR	LiDeR: Lightweight Dense Residual Network for Video Super-Resolution on Mobile Devices (2022)	REDS(train_videos: 240, test_videos: 30)	27.51	0.76	-	-
11	EVSRNet	EVSRNet: Efficient Video Super-Resolution with Neural Architecture Search (2021)	REDS(train_videos: 240, test_videos: 30)	27.42	-	-	-
12	RCBSR	RCBSR: Re-parameterization Convolution Block for Super-Resolution(2022)	REDS(train_videos: 240, test_videos: 30)	27.28	0.775	-	-

Benchmark_1

Model	Source	Dataset	Test PSNR	Test SSIM	Params
SSL-uni	Structured Sparsity Learning for Efficient Video Super-Resolution (CVPR2023)	REDS(train:266 test:4)	30.24	0.86	500,000

PaperWriting

No.1

BSConvU for shallow feature extraction
Recurrent neural network that lets feature information flow freely across frames
Multi-distillation module with dynamic routing and large-ERF attention
The bilinearly upsampled RGB channels share the same upsampling result
Nearest convolution for the residual branch, with shorter inference time than a bilinear residual
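One way to read the nearest convolution item is as a 1x1 convolution with fixed weights that copies each input channel r*r times, so that a following depth_to_space step produces nearest-neighbor upsampling. The sketch below builds such a kernel for a single pixel; the function names and the per-pixel formulation are assumptions of this example, not the implementation used in the experiments.

```python
def nearest_conv_weights(channels, r):
    """Fixed 1x1 kernel [C_in][C_out] with C_out = C_in * r * r.

    Output channel o copies input channel o % channels, so the input is
    tiled r*r times along the channel axis.
    """
    c_out = channels * r * r
    return [
        [1.0 if o % channels == c else 0.0 for o in range(c_out)]
        for c in range(channels)
    ]


def conv1x1(pixel, weights):
    """Apply a 1x1 convolution to the channel vector of one pixel."""
    c_out = len(weights[0])
    return [
        sum(pixel[c] * weights[c][o] for c in range(len(pixel)))
        for o in range(c_out)
    ]


rgb_pixel = [1.0, 2.0, 3.0]
tiled = conv1x1(rgb_pixel, nearest_conv_weights(3, 2))
```

Since the weights are constants, the copy runs as an ordinary convolution on the accelerator instead of as a separate resize or tile operation.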

No.2

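Motivation: mobile video super-resolution with Inference Time ↓, PSNR ↑, SSIM ↑
Use only the HR frame predicted just before the current LR frame as the reference to compensate the current frame -> super-resolve in real time while recording, without being limited to videos that have already been captured
Hypothesis: the intermediate feature maps do not contribute equally to the output; how should the high-contribution feature maps be aggregated? -> use Partial Convolution to accelerate inference (analysis)
Reduce activations in the model -> rely on the ability of Multiply to produce a nonlinear mapping
Share the upsampling compensation across the three RGB channels -> check whether the upsampling compensation of the RGB channels in conventional models is highly consistent; if so, it can be shared to cut computation and speed up inference (analysis)
Blueprint convolution for shallow feature extraction -> the final result is actually better than with standard convolution
Attention-based fusion of multi-scale features (downsampled to different scales) <- motivation: neurons in the same region of the primate visual cortex have different receptive fields; by analogy, a single layer captures more precise spatial information or more texture information from different scales/receptive fields
Fusion over short-range shortcuts -> faster inference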
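The point about multiplication producing a nonlinear mapping can be illustrated with a simple gate that splits the channels into two halves and multiplies them element-wise. The log does not say which form of multiplication was used, so the split-and-multiply gate here is an assumed, commonly used variant.

```python
def simple_gate(pixel):
    """Split the channel vector in half and multiply the halves element-wise."""
    half = len(pixel) // 2
    return [a * b for a, b in zip(pixel[:half], pixel[half:])]


x = [1.0, 2.0, 3.0, 4.0]
y = simple_gate(x)                                # [3.0, 8.0]
y_doubled = simple_gate([2.0 * v for v in x])     # [12.0, 32.0]
```

Doubling the input quadruples the output, so the gate is not a linear map even though it contains no ReLU or other activation function.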

No.3

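Motivation: mobile video super-resolution with Inference Time ↓, PSNR ↑, SSIM ↑
Auxiliary forward/backward hidden states for feature alignment -> higher PSNR of the super-resolved result
Hypothesis: the intermediate feature maps do not contribute equally to the output; how should the high-contribution feature maps be aggregated? -> use Partial Convolution to accelerate inference (analysis)
Reduce activations in the model -> rely on the ability of Multiply to produce a nonlinear mapping, which speeds up inference
Consider dynamic depth (adaptive exiting) -> faster inference -> deprecated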
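A back-of-the-envelope sketch shows why a partial convolution is cheaper. It assumes the variant in which a standard k x k convolution is applied to only a fraction of the channels and the remaining channels pass through unchanged; the helper names and the 32-channel example are hypothetical.

```python
def partial_conv_params(channels, kernel_size, ratio):
    """Weight count of a k x k conv applied to `ratio` of the channels (no bias)."""
    conv_channels = int(channels * ratio)
    return kernel_size * kernel_size * conv_channels * conv_channels


def partial_apply(pixel, ratio, transform):
    """Transform the first `ratio` of the channels and keep the rest untouched."""
    n = int(len(pixel) * ratio)
    return transform(pixel[:n]) + pixel[n:]


full = partial_conv_params(32, 3, 1.0)       # 9216 weights
quarter = partial_conv_params(32, 3, 0.25)   # 576 weights
mixed = partial_apply([1, 2, 3, 4], 0.5, lambda part: [10 * v for v in part])
```

The weight count scales with the square of the ratio, which matches the general trend of the Params column in the fractional/partial ratio ablations.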

PaperReference

Rethinking Alignment in Video Super-Resolution Transformers (NeurIPS 2022) -> ViT: frame/feature alignment is not a necessary operation in video super-resolution (VSR)
An Implicit Alignment for Video Super-Resolution (arXiv 2023) -> improves bilinear interpolation/resampling
Video Super-Resolution Transformer
Efficient Reference-based Video Super-Resolution (ERVSR): Single Reference Image Is All You Need (WACV 2023) -> the middle frame of the sequence serves as the reference frame that assists super-resolution of the current frame
Multi-Stage Feature Alignment Network for Video Super-Resolution
ELSR: Extreme Low-Power Super Resolution Network For Mobile Devices
LiDeR: Lightweight Dense Residual Network for Video Super-Resolution on Mobile Devices
Look Back and Forth: Video Super-Resolution with Explicit Temporal Difference Modeling
Collapsible Linear Blocks for Super-Efficient Super Resolution
Revisiting Temporal Alignment for Video Restoration
BasicVSR++: Improving Video Super-Resolution with Enhanced Propagation and Alignment
EVSRNet: Efficient Video Super-Resolution with Neural Architecture Search
BasicVSR: The Search for Essential Components in Video Super-Resolution and Beyond
Revisiting Temporal Modeling for Video Super-resolution -> official baseline of the first MAI VSR challenge
TDAN: Temporally-Deformable Alignment Network for Video Super-Resolution (CVPR 2020)
Video Super-resolution with Temporal Group Attention (CVPR 2020)
3DSRnet: Video Super-resolution using 3D Convolutional Neural Networks
Frame-Recurrent Video Super-Resolution
Video Super-Resolution With Convolutional Neural Networks