AMP

Autocast Op Reference
Op Eligibility
Ops that run in float64 or non-floating-point dtypes are not eligible, and will run in these types whether or not autocast is enabled.
Only out-of-place ops and Tensor methods are eligible. In-place variants and calls that explicitly supply an out=... Tensor are allowed in autocast-enabled regions, but won’t go through autocasting. For example, in an autocast-enabled region a.addmm(b, c) can autocast, but a.addmm_(b, c) and a.addmm(b, c, out=d) cannot. For best performance and stability, prefer out-of-place ops in autocast-enabled regions.
Ops called with an explicit dtype=... argument are not eligible, and will produce output that respects the dtype argument.
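A minimal sketch of these eligibility rules is shown below. It assumes a CUDA device is available; the tensor names and shapes are only illustrative.

```python
import torch

a = torch.randn(4, 4, device="cuda")   # float32 inputs
b = torch.randn(4, 4, device="cuda")
c = torch.randn(4, 4, device="cuda")
d = torch.empty(4, 4, device="cuda")

with torch.autocast(device_type="cuda", dtype=torch.float16):
    # Out-of-place op: eligible, executes in and returns float16.
    print(torch.addmm(a, b, c).dtype)        # torch.float16

    # In-place variant: allowed here, but does not autocast.
    a.addmm_(b, c)
    print(a.dtype)                           # torch.float32

    # Explicit out= Tensor: allowed, but does not autocast.
    torch.addmm(a, b, c, out=d)
    print(d.dtype)                           # torch.float32

    # Explicit dtype= argument: not eligible, the dtype argument wins.
    print(a.sum(dtype=torch.float16).dtype)  # torch.float16
```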
CUDA Ops that can autocast to float16
__matmul__, addbmm, addmm, addmv, addr, baddbmm, bmm, chain_matmul, multi_dot, conv1d, conv2d, conv3d, conv_transpose1d, conv_transpose2d, conv_transpose3d, GRUCell, linear, LSTMCell, matmul, mm, mv, prelu, RNNCell
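As a quick illustration (again assuming a CUDA device), a matmul-type op from this list takes float32 inputs but executes and returns float16 inside an autocast region:

```python
import torch

x = torch.randn(8, 8, device="cuda")  # float32
w = torch.randn(8, 8, device="cuda")  # float32

with torch.autocast(device_type="cuda", dtype=torch.float16):
    y = torch.mm(x, w)       # mm is on the float16 list above
    print(y.dtype)           # torch.float16

print(torch.mm(x, w).dtype)  # torch.float32 outside the autocast region
```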
CUDA Ops that can autocast to float32
__pow__, __rdiv__, __rpow__, __rtruediv__, acos, asin, binary_cross_entropy_with_logits, cosh, cosine_embedding_loss, cdist, cosine_similarity, cross_entropy, cumprod, cumsum, dist, erfinv, exp, expm1, group_norm, hinge_embedding_loss, kl_div, l1_loss, layer_norm, log, log_softmax, log10, log1p, log2, margin_ranking_loss, mse_loss, multilabel_margin_loss, multi_margin_loss, nll_loss, norm, normalize, pdist, poisson_nll_loss, pow, prod, reciprocal, rsqrt, sinh, smooth_l1_loss, soft_margin_loss, softmax, softmin, softplus, sum, renorm, tan, triplet_margin_loss
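Conversely, ops on this list are cast up for numerical stability. A short sketch (assuming a CUDA device) with a float16 input to softmax:

```python
import torch

logits = torch.randn(4, 10, device="cuda", dtype=torch.float16)

with torch.autocast(device_type="cuda", dtype=torch.float16):
    probs = torch.softmax(logits, dim=-1)  # softmax is on the float32 list above
    print(probs.dtype)                     # torch.float32
```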
References:
https://pytorch.org/docs/master/amp.html#autocast-op-reference
https://on-demand.gputechconf.com/gtc-taiwan/2018/pdf/5-1_Internal%20Speaker_Michael%20Carilli_PDF%20For%20Sharing.pdf
https://nvlabs.github.io/eccv2020-mixed-precision-tutorial/
https://zhuanlan.zhihu.com/p/79887894
