2020-ICLR-BlockSwap: Fisher-guided Block Substitution for Network Compression on a Budget – Paper Reading Notes
Source: ChenBong, 博客园 (cnblogs)
- Institute: University of Edinburgh
- Authors: Jack Turner, Michael O'Boyle
- GitHub: https://github.com/BayesWatch/pytorch-blockswap (20+)
- Citations: 1
Introduction
- Backbone network (composed of standard blocks)
- Cheap-blocks pool ==> (swap the backbone's blocks) candidate block-swap network space
- Sample under the parameter constraint ==> compute Fisher score ==> rank networks
- Distillation (T: backbone network, S: block-swap network)
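The pipeline above can be sketched as a rejection-sampling loop: draw random block assignments, keep those within the parameter budget, and score the survivors. Everything here (block choices, widths, budget) is illustrative, not taken from the released code.

```python
import random

# Parameter counts for two block types (see the Substitute Blocks
# section); k is the conv kernel size, N the channel width.
def standard_params(N, k=3):              # two k x k convs
    return 2 * N**2 * k**2

def grouped_pointwise_params(N, g, k=3):  # G(g): grouped k x k + 1x1
    return 2 * (N**2 * k**2 // g + N**2)

# Hypothetical pool of cheap replacements for each block position.
BLOCK_CHOICES = [
    ("Standard", standard_params),
    ("G(2)", lambda N: grouped_pointwise_params(N, g=2)),
    ("G(4)", lambda N: grouped_pointwise_params(N, g=4)),
]

def sample_candidates(widths, budget, n_samples, seed=0):
    """Rejection-sample block assignments whose total parameter
    count fits within `budget`."""
    rng = random.Random(seed)
    kept = []
    while len(kept) < n_samples:
        assignment = [rng.choice(BLOCK_CHOICES) for _ in widths]
        total = sum(fn(N) for (_, fn), N in zip(assignment, widths))
        if total <= budget:
            kept.append(([name for name, _ in assignment], total))
    return kept

# Three block positions with widths 16, 32, 64 channels.
cands = sample_candidates(widths=[16, 32, 64], budget=100_000, n_samples=5)
```

Each surviving candidate would then be scored with a single-minibatch Fisher score, and the top-ranked network trained from scratch with distillation.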
Contribution
- Block-wise substitution shrinks the search space relative to NAS (a bottom-up method), so the search is faster; relative to methods that change a network's depth/width, such as pruning (top-down methods), it searches a higher-dimensional space (pruning can only change the number of filters per layer, while block swap can also change the type of each layer)
- A fast candidate-network evaluation method based on Fisher information
Method
Fisher Information
- The Taylor-expansion method for estimating filter importance is equivalent to computing the Fisher information
- \(\Delta_{c}=\frac{1}{2 N} \sum_{n}^{N}\left(\sum_{i}^{W} \sum_{j}^{H} a_{n i j} g_{n i j}\right)^{2}\)
- W×H is the feature-map size; the activation-gradient product \(\sum_{ij} a_{ij} g_{ij}\) measures the importance \(\Delta_{c}\) of one channel (a single filter's output)
- \(\Delta_{b}=\sum_{c}^{C} \Delta_{c}\)
- C is the total number of channels in a block; the importance of a block is \(\Delta_{b}\)
- \(\sum_B \Delta_{b}\)
- B is the number of blocks in a block-swap network; the importance of a block-swap network is \(\sum_B \Delta_{b}\)
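A minimal numpy sketch of the Fisher score above, assuming we already have a block's activations `a` and the gradients `g` of the loss with respect to those activations (both of shape batch N × channels C × W × H; all names and data here are illustrative):

```python
import numpy as np

def channel_fisher(a, g):
    """Delta_c per channel: (1/2N) * sum_n (sum_ij a_nij * g_nij)^2."""
    N = a.shape[0]
    inner = (a * g).sum(axis=(2, 3))           # sum over W, H -> (N, C)
    return (inner ** 2).sum(axis=0) / (2 * N)  # sum over n    -> (C,)

def block_fisher(a, g):
    """Delta_b: Delta_c summed over the block's C channels."""
    return channel_fisher(a, g).sum()

# A candidate network's score is the sum of Delta_b over its B blocks;
# random arrays stand in for real activations and gradients here.
rng = np.random.default_rng(0)
a = rng.standard_normal((8, 4, 5, 5))   # batch 8, 4 channels, 5x5 maps
g = rng.standard_normal((8, 4, 5, 5))
score = block_fisher(a, g)
```

The score is nonnegative by construction (a sum of squares), so it ranks candidates without sign ambiguity.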
Substitute Blocks
Standard Block
- Params: \(2N^2k^2\)
Grouped+Pointwise Block – G(g)
- Params: \(2(N^2k^2/g+N^2)\)
Bottleneck Block – B(b)
- Params: \((N/b)^2k^2+2N^2/b\)
Bottleneck Grouped+Pointwise Block – BG(b, g)
- Params: \((N/b)^2k^2/g+2N^2/b\)
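These counts can be sanity-checked in a few lines of Python (k is the kernel size, N the block's channel width; batch-norm and bias parameters are ignored, and the concrete numbers below are just worked examples):

```python
def standard(N, k=3):
    return 2 * N * N * k * k                 # two k x k convs, N -> N

def grouped_pointwise(N, g, k=3):
    return 2 * (N * N * k * k // g + N * N)  # grouped k x k + 1x1, twice

def bottleneck(N, b, k=3):
    m = N // b                               # reduced width
    return m * m * k * k + 2 * N * N // b    # 1x1 down, k x k, 1x1 up

def bottleneck_grouped(N, b, g, k=3):
    m = N // b
    return m * m * k * k // g + 2 * N * N // b  # k x k conv also grouped

for name, p in [("Standard", standard(64)),
                ("G(4)", grouped_pointwise(64, g=4)),
                ("B(2)", bottleneck(64, b=2)),
                ("BG(2, 4)", bottleneck_grouped(64, b=2, g=4))]:
    print(f"{name:9s} {p:6d} params")
```

For N = 64 this gives 73728, 26624, 13312, and 6400 parameters respectively, so for these settings each substitution is strictly cheaper than the last.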
Distillation
\(\mathcal{L}_{A T}=\mathcal{L}_{C E}+\beta \sum_{i=1}^{L}\left\|\frac{\mathbf{f}\left(A_{i}^{t}\right)}{\left\|\mathbf{f}\left(A_{i}^{t}\right)\right\|_{2}}-\frac{\mathbf{f}\left(A_{i}^{s}\right)}{\left\|\mathbf{f}\left(A_{i}^{s}\right)\right\|_{2}}\right\|_{2} \qquad (1)\)
\(\mathbf{f}\left(A_{i}\right)=\left(1 / N_{A_{i}}\right) \sum_{j=1}^{N_{A_{i}}} \mathbf{a}_{i j}^{2}\), where \(i=1,2,...,L\) indexes the layers and \(N_{A_i}\) is the number of channels at layer \(i\).
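Eq. (1) is Zagoruyko & Komodakis' attention-transfer loss. A numpy sketch of one layer's term (array shapes are illustrative): f(A) averages the squared activations over channels, so teacher and student maps stay comparable even when their channel counts differ.

```python
import numpy as np

def attention_map(A):
    """f(A): mean of squared activations over channels. (C, W, H) -> (W, H)."""
    return (A ** 2).mean(axis=0)

def at_loss_term(A_t, A_s):
    """|| f(A_t)/||f(A_t)||_2 - f(A_s)/||f(A_s)||_2 ||_2 for one layer."""
    ft = attention_map(A_t).ravel()
    fs = attention_map(A_s).ravel()
    return np.linalg.norm(ft / np.linalg.norm(ft) - fs / np.linalg.norm(fs))

# The full loss adds beta * (sum of these terms) to the cross-entropy.
rng = np.random.default_rng(0)
teacher_acts = [rng.standard_normal((32, 8, 8)) for _ in range(3)]
student_acts = [rng.standard_normal((16, 8, 8)) for _ in range(3)]  # fewer channels
at_term = sum(at_loss_term(t, s) for t, s in zip(teacher_acts, student_acts))
```

Because the channel axis is collapsed before normalization, the student can be much thinner than the teacher and the loss remains well-defined.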
Experiments
CIFAR-10
Setup
- momentum: 0.9
- lr: 0.1 initial, cosine schedule
- minibatch size: 128
- weight decay: 5e-4
- β: 1000
Teacher Network:
- 3 × WRN-40-2 (depth 40, width multiplier 2, 18 blocks, 2.2M params)
Student Network:
Params constraint: 200K / 400K / 600K / 800K
- WRN-16-2 / WRN-40-1 / WRN-16-1
- WRN-40-2 + mixed swap
- WRN-40-2 + Single swap (MBConv6 / DARTS / DenseNet)
- WRN-40-2 + SNIP pruning
- WRN-40-2 + \(\ell_1\) pruning
ImageNet
Setup
- momentum: 0.9
- lr: 0.1 initial, step decay at epochs 30, 60, 90
- minibatch size: 256
- weight decay: 1e-4
- β: 750
Teacher Network:
- 1 × ResNet34 (16 blocks, 21.8M params)
Student Network:
Params constraint: 8M / 3M
- ResNet18 / ResNet18-0.5 (the channel width in the last 3 sections has been halved)
- ResNet34 + mixed swap
- ResNet34 + Single swap (G(4) / G(N))
Ablation Study
Mixed block vs. single block
For any single-swap network there always exists a better mixed-swap architecture
One minibatch vs. N minibatches, and ranking correlation
Correlation between final error and the following metrics, measured over different numbers of minibatches:
- acc
- weight l2 norm
- grad l1 norm
- fisher score
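Comparing these proxies amounts to asking whose ranking of the candidates best matches the final-error ranking, for which Spearman rank correlation is the natural measure. A self-contained sketch (the scores and errors below are made-up numbers; a good score proxy is strongly *negatively* correlated with final error):

```python
import numpy as np

def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    rx = np.argsort(np.argsort(x)).astype(float)
    ry = np.argsort(np.argsort(y)).astype(float)
    return np.corrcoef(rx, ry)[0, 1]

# Hypothetical Fisher scores and final test errors for five candidates.
fisher_scores = np.array([3.1, 1.2, 4.5, 2.0, 0.7])
final_errors  = np.array([4.6, 5.2, 4.2, 4.9, 5.6])
rho = spearman(fisher_scores, final_errors)  # -1.0: perfect inverse ranking
```

A rank correlation of -1 means the proxy orders candidates exactly as the final errors do, so the single cheapest evaluation already identifies the best network.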
Number of Samples
BlockSwap finds networks with final test errors of 4.85%, 4.54%, and 4.21% after 10, 100, and 1000 samples respectively.
1000 samples were empirically found to be sufficient.
Conclusion
Summary
To Read