【MnasNet】2019-CVPR-MnasNet: Platform-Aware Neural Architecture Search for Mobile - Paper Reading Notes

MnasNet

2019-CVPR-MnasNet: Platform-Aware Neural Architecture Search for Mobile

Source: ChenBong, cnblogs (博客园)

Institute: Google Brain, Google
Author: Mingxing Tan, Quoc V. Le
GitHub: https://github.com/tensorflow/tpu/tree/master/models/official/mnasnet
Citation: 740+

Introduction

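Uses the trade-off between real-world latency and accuracy as the search objective.

Instead of the single search space used by search-a-cell-then-stack-it methods, it introduces a new search space that lets each block use a different op type, which increases layer diversity.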

Motivation

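Network design for mobile has several optimization objectives, such as few parameters, high speed, and high accuracy.
When previous NAS methods take latency into account, they often use a proxy such as FLOPs, but in practice there is a large gap between real-world latency and FLOPs.
Many previous NAS methods search a cell and then stack it. This shrinks the search space, but it sacrifices layer diversity, so some better models can never be found.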

Contribution

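A multi-objective soft reward that considers several optimization objectives at once, namely the latency-accuracy trade-off
Uses real-world latency as an optimization objective
Proposes a layer diversity search space (called the factorized hierarchical search space in the paper) that enables layer diversity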

Method

Pipeline

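Each sampled model is trained for 5 epochs on ImageNet, and about 8K models are sampled: 5 epochs × 8K models = 40K epochs in total.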

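With this many samples, the choice of search algorithm probably makes little difference. Even random search would likely perform reasonably well.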

Multi-objective Soft Reward

Hard Constraint

$\text{maximize}~ACC(m)$ subject to $LAT(m) \le T$: this only maximizes a single metric and does not provide multiple Pareto-optimal solutions.

Soft Constraint

Weighted sum method:

$\text{maximize}~ACC(m) + \lambda \, |LAT(m) - T|$

where $\lambda = \begin{cases} \alpha, & \text{if } LAT(m) \le T \\ \beta, & \text{otherwise} \end{cases}$

We pick the weighted product method because it is easy to customize, but we expect methods like weighted sum should be also fine.

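Weighted product method:

$\text{maximize}~ACC(m) \times \left[\frac{LAT(m)}{T}\right]^{w}$

where $w = \begin{cases} \alpha, & \text{if } LAT(m) \le T \\ \beta, & \text{otherwise} \end{cases}$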

An empirical rule for picking α and β is to ensure Pareto-optimal solutions have similar reward under different accuracy-latency trade-offs.

For instance, we empirically observed doubling the latency usually brings about 5% relative accuracy gain.

Given two models:

(1) M1 has latency $l$ and accuracy $a$;

(2) M2 has latency $2l$ and 5% higher accuracy $a \cdot (1 + 5\%)$,

they should have similar reward:

$Reward(M2) = a \cdot (1 + 5\%) \cdot (2l/T)^{\beta} \approx Reward(M1) = a \cdot (l/T)^{\beta}$.

Solving this gives β ≈ −0.07. Therefore, we use α = β = −0.07 in our experiments unless explicitly stated.
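The reward and the β derivation can be checked numerically. Below is a minimal sketch of the weighted-product reward. The function name `mnas_reward` and the sample numbers are hypothetical; latency and T only need to share a unit.

```python
import math


def mnas_reward(acc: float, lat: float, target: float,
                alpha: float = -0.07, beta: float = -0.07) -> float:
    """Weighted-product reward ACC(m) * (LAT(m) / T) ** w, with w = alpha if LAT(m) <= T else beta."""
    w = alpha if lat <= target else beta
    return acc * (lat / target) ** w


# Equal reward for M1 (latency l, accuracy a) and M2 (latency 2l, accuracy 1.05a):
#   a * 1.05 * (2l/T)^beta = a * (l/T)^beta  =>  1.05 * 2^beta = 1  =>  beta = -ln(1.05) / ln(2)
beta_star = -math.log(1.05) / math.log(2)
print(f"beta = {beta_star:.4f}")

# Check with hypothetical numbers (illustration only): a = 0.75, l = 60 ms, T = 80 ms.
a, l, T = 0.75, 60.0, 80.0
r1 = mnas_reward(a, l, T, alpha=beta_star, beta=beta_star)
r2 = mnas_reward(a * 1.05, 2 * l, T, alpha=beta_star, beta=beta_star)
print(math.isclose(r1, r2))
```

This prints `beta = -0.0704` and `True`, which agrees with the rounded value β ≈ −0.07 above.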

Hard Constraint vs. Soft Constraint
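To compare the two settings, the sketch below evaluates the same weighted-product objective under two parameter choices. The first is α = 0, β = −1, which I take to be the hard-constraint setting used in the paper's ablation. The second is α = β = −0.07 (soft). ACC(m) = 0.5 and T = 80 ms are illustrative assumptions.

```python
def reward(acc, lat, target, alpha, beta):
    # Weighted product: ACC(m) * (LAT(m) / T) ** w, with w = alpha if LAT(m) <= T else beta
    w = alpha if lat <= target else beta
    return acc * (lat / target) ** w


ACC, T = 0.5, 80.0  # illustrative assumptions; latency in ms
for lat in (40.0, 60.0, 80.0, 100.0, 120.0):
    hard = reward(ACC, lat, T, alpha=0.0, beta=-1.0)    # flat below T, steep penalty above T
    soft = reward(ACC, lat, T, alpha=-0.07, beta=-0.07)  # smooth on both sides of T
    print(f"LAT={lat:5.1f} ms  hard={hard:.3f}  soft={soft:.3f}")
```

In the hard setting, every model faster than T gets the same reward and slower models are penalized sharply, so there is no incentive to explore latencies away from the target. In the soft setting the reward changes smoothly on both sides of T.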

Layer Diversity Search Space

For #layers in each block, we search for {0, +1, -1} based on MobileNetV2;

for filter size per layer, we search for its relative size in {0.75, 1.0, 1.25} to MobileNetV2.

The search space is still built on the hand-designed MobileNetV2, so in effect it is still searching for a MobileNetV2-like architecture.
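The sketch below shows how one configuration per block could be sampled from a MobileNetV2-based, layer-diverse search space. Only the layer-count offsets {0, +1, −1} and the filter-size ratios {0.75, 1.0, 1.25} come from the text above. The other option lists (ConvOp, kernel size, SE ratio, skip op), the clipping to at least one layer, and all names are my assumptions for illustration. `BASE_BLOCKS` uses MobileNetV2's bottleneck stages.

```python
import random
from dataclasses import dataclass

# (output filters, number of layers) for each MobileNetV2 bottleneck stage
BASE_BLOCKS = [(16, 1), (24, 2), (32, 3), (64, 4), (96, 3), (160, 3), (320, 1)]

# Per-block choices: LAYER_DELTAS and FILTER_RATIOS follow the notes; the rest are assumptions.
CONV_OPS = ["conv", "dwconv", "mbconv"]
KERNELS = [3, 5]
SE_RATIOS = [0.0, 0.25]
SKIP_OPS = ["pooling", "identity", "none"]
LAYER_DELTAS = [0, 1, -1]
FILTER_RATIOS = [0.75, 1.0, 1.25]


@dataclass
class BlockConfig:
    conv_op: str
    kernel: int
    se_ratio: float
    skip: str
    filters: int
    layers: int


def sample_block(base_filters: int, base_layers: int, rng: random.Random) -> BlockConfig:
    # Each block draws its own op, so different blocks can use different layer types.
    return BlockConfig(
        conv_op=rng.choice(CONV_OPS),
        kernel=rng.choice(KERNELS),
        se_ratio=rng.choice(SE_RATIOS),
        skip=rng.choice(SKIP_OPS),
        filters=round(base_filters * rng.choice(FILTER_RATIOS)),
        layers=max(1, base_layers + rng.choice(LAYER_DELTAS)),  # assumption: keep >= 1 layer
    )


def search_space_size() -> int:
    # Upper bound: per-block choices S raised to the number of blocks B (S ** B),
    # versus just S for a single repeated cell.
    per_block = (len(CONV_OPS) * len(KERNELS) * len(SE_RATIOS) * len(SKIP_OPS)
                 * len(FILTER_RATIOS) * len(LAYER_DELTAS))
    return per_block ** len(BASE_BLOCKS)


rng = random.Random(0)
arch = [sample_block(f, n, rng) for f, n in BASE_BLOCKS]
```

The size function shows why per-block search is much larger than cell search: S choices per block become S ** B for B independent blocks. The exact number depends entirely on the assumed option lists.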

Search Algorithm

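Reinforcement learning approach (the paper trains an RNN controller with Proximal Policy Optimization).

Uses a sample-eval-update loop to train the controller: sample a model, train it and measure its accuracy and its latency on a real phone, compute the reward, then update the controller.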
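The sample-eval-update loop can be sketched with a much simpler controller than the paper's. The paper uses an RNN controller trained with PPO. This sketch instead uses an independent softmax policy for each decision, trained with plain REINFORCE and a moving-average baseline. `toy_train_and_eval` is a hypothetical stand-in for proxy training plus on-device latency measurement, and its formulas are made up.

```python
import math
import random

DECISIONS = [3, 2, 3]  # number of options per architectural decision (hypothetical)


def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]


def toy_train_and_eval(arch):
    # Hypothetical stand-in returning (accuracy, latency_ms): larger choices are more accurate but slower.
    size = sum(arch)
    return 0.70 + 0.01 * size, 50.0 + 10.0 * size


def reward(acc, lat, target=75.0, w=-0.07):
    # Weighted-product reward with alpha = beta = w
    return acc * (lat / target) ** w


def search(steps=500, lr=0.1, seed=0):
    rng = random.Random(seed)
    logits = [[0.0] * n for n in DECISIONS]
    baseline = None
    for _ in range(steps):
        # 1) sample: draw one option per decision from the controller's policy
        probs = [softmax(row) for row in logits]
        arch = [rng.choices(range(len(p)), weights=p)[0] for p in probs]
        # 2) eval: "train" the model and compute the multi-objective reward
        r = reward(*toy_train_and_eval(arch))
        baseline = r if baseline is None else 0.9 * baseline + 0.1 * r
        advantage = r - baseline
        # 3) update: REINFORCE; d log softmax / d logit_j = 1[j == choice] - p_j
        for row, p, choice in zip(logits, probs, arch):
            for j in range(len(row)):
                row[j] += lr * advantage * ((1.0 if j == choice else 0.0) - p[j])
    return logits
```

Calling `search()` returns the learned logits, and `softmax` of each row gives the controller's current preference for every decision.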

Experiments

Setup

Optimizer: RMSProp, decay = 0.9, momentum = 0.9
Batch norm momentum: 0.99
Weight decay: 1e-5
Batch size: 4K
Learning rate:
- warm-up: 0 to 0.256
- then decayed by 0.97 every 2.4 epochs
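The learning-rate schedule above can be sketched as a function of the (possibly fractional) epoch. The warm-up length is not given in these notes, so `WARMUP_EPOCHS = 5` is an assumption. Applying the decay in discrete steps counted from the end of warm-up is also an assumption.

```python
import math

PEAK_LR = 0.256
WARMUP_EPOCHS = 5.0  # assumption: warm-up length is not stated in these notes
DECAY_RATE = 0.97
DECAY_EVERY_EPOCHS = 2.4


def learning_rate(epoch: float) -> float:
    """Linear warm-up from 0 to PEAK_LR, then multiply by DECAY_RATE every 2.4 epochs."""
    if epoch < WARMUP_EPOCHS:
        return PEAK_LR * epoch / WARMUP_EPOCHS
    # Assumption: staircase decay, counted from the end of warm-up.
    num_decays = math.floor((epoch - WARMUP_EPOCHS) / DECAY_EVERY_EPOCHS)
    return PEAK_LR * DECAY_RATE ** num_decays


for e in (0.0, 2.5, 5.0, 8.0, 50.0):
    print(e, round(learning_rate(e), 5))
```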

Model Scaling Performance

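Figure 5 shows that the searched architecture keeps a good accuracy-latency trade-off whether the depth multiplier or the input size is scaled.

Question: is this a coincidence, or does it show that an architecture with a good accuracy-latency trade-off performs consistently well at every scale? In other words, does scaling leave the ranking of different architectures unchanged?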
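Here is some context on the multiplier axis. In MobileNet-family code, a depth multiplier usually scales each layer's channel count and then rounds it to a multiple of 8. The helper below follows the `_make_divisible` rule common in MobileNetV2 reference code. Whether the MnasNet scaling experiments round exactly this way is my assumption, and `scale_filters` is a hypothetical name.

```python
from typing import List, Optional


def make_divisible(value: float, divisor: int = 8, min_value: Optional[int] = None) -> int:
    """Round a channel count to the nearest multiple of `divisor`, never losing more than 10%."""
    if min_value is None:
        min_value = divisor
    new_value = max(min_value, int(value + divisor / 2) // divisor * divisor)
    if new_value < 0.9 * value:  # rounding down removed too many channels, so go up one step
        new_value += divisor
    return new_value


def scale_filters(filters: List[int], multiplier: float) -> List[int]:
    # Hypothetical helper: apply one depth multiplier to every block's channel count.
    return [make_divisible(f * multiplier) for f in filters]


print(scale_filters([16, 24, 32, 64, 96, 160, 320], 0.5))
```

With multiplier 0.5 this prints `[8, 16, 16, 32, 48, 80, 160]`. The 24-channel block rounds up to 16 rather than down to 8 because of the 10% rule.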

Ablation Study

Soft vs. Hard Latency Constraint

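Figure 6 shows that under the hard constraint, the searched models lie mostly below T = 75 ms, whereas the soft constraint is more likely to explore models farther from T = 75 ms. As a result, it recovers the accuracy-latency Pareto-optimal curve better.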

Multi-objective Soft Reward and Layer Diversity Search Space

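Effect of individual factors

The paper makes two main changes: A, the multi-objective soft reward, and B, the layer diversity search space.

To show that both A and B are effective, four experiments are needed:

A+B, A, B, baseline

with results of the following form: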

A+B > A > baseline
- A+B > A
- A > baseline
- A+B > baseline
A+B > B > baseline
- A+B > B
- B > baseline

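A > baseline and A+B > B together show that factor A is effective; B > baseline and A+B > A together show that factor B is effective.

The paper only runs A+B, A, and baseline; B is missing.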

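A+B > A ?> baseline (here ?> means the comparison is not clear-cut)
- A+B > A: given A (multi-objective), adding the larger search space B (layer diversity) helps
- A ?> baseline: does not show that using A (multi-objective) alone is better
- A+B > baseline: using A (multi-objective) and B (layer diversity) together beats the baseline
A+B > B (not run) > baseline
- A+B > B: not run, so it cannot show that A is effective when B (layer diversity) is already in use
- B > baseline: not run, so it cannot show that using B (layer diversity) alone is better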
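The ablation logic above can be written as a small check. This sketch assumes that each setting is summarized by a single higher-is-better score. The paper actually reports accuracy and latency separately, so this simplification is my own. `factor_evidence` is a hypothetical name, and `None` marks an experiment that was not run.

```python
from typing import Dict, Optional


def factor_evidence(scores: Dict[str, Optional[float]]) -> Dict[str, str]:
    """2x2 ablation over settings 'baseline', 'A', 'B', 'A+B'.

    A factor counts as effective only if it helps on its own (factor > baseline)
    and on top of the other factor (A+B > other)."""
    def better(x: str, y: str) -> Optional[bool]:
        if scores.get(x) is None or scores.get(y) is None:
            return None  # an experiment is missing, so no conclusion is possible
        return scores[x] > scores[y]

    result = {}
    for factor, other in (("A", "B"), ("B", "A")):
        checks = [better(factor, "baseline"), better("A+B", other)]
        if None in checks:
            result[factor] = "inconclusive (missing experiment)"
        elif all(checks):
            result[factor] = "effective"
        else:
            result[factor] = "not shown"
    return result
```

Passing `None` for "B", which matches the paper's ablation, makes both A and B come out as inconclusive. This is the point made in the list above.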

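Replacing ops in the searched architecture

The paper replaces the different op types in the searched MnasNet-A1 with a single op type to obtain several variants. The goal is to show that layer diversity matters for the accuracy-latency trade-off.

Question: the variants were not found by re-running a search with the same budget in a single-op search space. A single-op space may also contain models with a good accuracy-latency trade-off, so this experiment also fails to show that the layer diversity search space is better.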

Conclusion

Summary

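The search algorithm is not the main contribution of this paper.
Multi-objective soft constraint
- Combining multiple objectives as a weighted product is relatively uncommon. It explores the region around the constraint better and is worth trying when a Pareto-optimal curve is needed.
Layer diversity Search Space:
- Question: neither the single-factor ablation nor the op-replacement experiment fully demonstrates the effectiveness of the layer diversity search space.
The search costs about 40K epochs of training on ImageNet, which is computationally expensive.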


posted @ 2020-12-14 14:40 ChenBong
