Operator Numerical Check

姚伟峰

Operator Numerical Check
- Basic Formula
- Software Behavior
  - NumPy
  - PyTorch
  - TensorFlow

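Basic Formula

$$|a - b| \le \mathrm{atol} + \mathrm{rtol} \cdot |b|$$

where:
a, b: the values being compared (b acts as the reference value)
atol: absolute tolerance
rtol: relative tolerance
NaN and Inf behavior: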
NaNs are treated as equal if they are in the same place and if equal_nan=True. Infs are treated as equal if they are in the same place and of the same sign in both arrays.
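
The check described in this section can be sketched in a few lines of NumPy. This is only an illustration: the helper name `is_close` is hypothetical, and the defaults `rtol=1e-5`, `atol=1e-8` are assumed here because they are NumPy's defaults. Finite values are compared with |a - b| <= atol + rtol * |b|, Infs match only when they sit in the same place with the same sign, and NaNs match only when `equal_nan=True`.

```python
import numpy as np

def is_close(a, b, rtol=1e-5, atol=1e-8, equal_nan=False):
    """Hypothetical helper: element-wise |a - b| <= atol + rtol * |b|."""
    a, b = np.broadcast_arrays(
        np.asarray(a, dtype=np.float64), np.asarray(b, dtype=np.float64)
    )
    both_finite = np.isfinite(a) & np.isfinite(b)
    with np.errstate(invalid="ignore"):
        # inf - inf yields NaN here; those entries are overridden below.
        close = np.abs(a - b) <= atol + rtol * np.abs(b)
        # Non-finite entries: equal only if identical (same Inf, same sign).
        result = np.where(both_finite, close, a == b)
    if equal_nan:
        # NaN == NaN is False, so NaN pairs are accepted explicitly.
        result = result | (np.isnan(a) & np.isnan(b))
    return result

a = [1.0, 1.0, np.nan, np.inf, np.inf, 1e-9]
b = [1.0 + 1e-6, 1.1, np.nan, np.inf, -np.inf, 0.0]
print(is_close(a, b))                  # NaN pair rejected
print(is_close(a, b, equal_nan=True))  # NaN pair accepted
```

Note that the last pair (1e-9 against 0.0) passes only because of `atol`: with a reference of zero, the relative term contributes nothing.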

Software Behavior

NumPy

NumPy provides two corresponding interfaces:
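
The list of interfaces did not survive in this copy of the post. As an assumption, the sketch below uses `numpy.isclose` (element-wise result) and `numpy.allclose` (single boolean), which both implement the formula above with defaults `rtol=1e-05`, `atol=1e-08`. It also shows that the check is not symmetric, because the relative term is scaled by |b| only.

```python
import numpy as np

reference = np.array([1.0, 100.0, 1e-9])
computed = np.array([1.0 + 5e-6, 100.0 + 5e-4, 0.0])

# Element-wise comparison with the default tolerances.
elementwise = np.isclose(computed, reference)
# Single verdict for the whole array.
all_ok = np.allclose(computed, reference)
# A tighter relative tolerance rejects the same data.
strict_ok = np.allclose(computed, reference, rtol=1e-7)

# Asymmetry: swapping the arguments changes which value scales rtol.
forward = np.isclose(1.0, 1.1, rtol=0.095, atol=0.0)   # 0.1 <= 0.095 * 1.1
backward = np.isclose(1.1, 1.0, rtol=0.095, atol=0.0)  # 0.1 <= 0.095 * 1.0

print(elementwise, all_ok, strict_ok, forward, backward)
```

Because of this asymmetry, it is usually safest to pass the trusted reference result as the second argument.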

PyTorch

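PyTorch uses the same default tolerances as NumPy. The API is:
torch.allclose(input, other, rtol=1e-05, atol=1e-08, equal_nan=False)

Each op may either use the defaults or set its own tolerances. The tolerance of each op can be found in the source code; the tolerances of Caffe2 operators can likewise be found in the source code.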

Examples

op	Data type	absolute tolerance	relative tolerance
MatMul(FWD)	FP32	1e-4	1e-4
MatMul(FWD)	FP16	150 * 1e-4	150 * 1e-4

(values taken from the code)
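
To make the per-dtype tolerances concrete, here is a minimal sketch that applies the MatMul(FWD) values from the table to an emulated low-precision matmul. It uses NumPy only and is not PyTorch's test harness: the helper `check_matmul`, the matrix sizes, the random inputs, and the "low-precision inputs and outputs with float32 accumulation" model are all assumptions made for illustration.

```python
import numpy as np

# Tolerances copied from the MatMul(FWD) rows of the table (atol == rtol).
MATMUL_FWD_TOL = {
    np.dtype(np.float32): 1e-4,
    np.dtype(np.float16): 150 * 1e-4,
}

def check_matmul(dtype, tol=None, m=32, k=64, n=32, seed=0):
    """Hypothetical helper: compare an emulated `dtype` matmul to a float64 reference."""
    rng = np.random.default_rng(seed)
    a = rng.random((m, k))
    b = rng.random((k, n))
    reference = a @ b  # float64 reference result

    # Assumed kernel model: inputs and outputs are rounded to `dtype`,
    # while the accumulation itself runs in float32.
    a_lp = a.astype(dtype).astype(np.float32)
    b_lp = b.astype(dtype).astype(np.float32)
    result = (a_lp @ b_lp).astype(dtype)

    if tol is None:
        tol = MATMUL_FWD_TOL[np.dtype(dtype)]
    return bool(np.allclose(result, reference, rtol=tol, atol=tol))

print(check_matmul(np.float32))            # FP32 result against the FP32 tolerance
print(check_matmul(np.float16))            # FP16 result against the FP16 tolerance
print(check_matmul(np.float16, tol=1e-4))  # FP16 result against the FP32 tolerance
```

The last call illustrates why FP16 needs a much looser tolerance: rounding the output to FP16 alone already exceeds what the FP32 tolerance allows.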

TensorFlow

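TF uses the following test util for the check:
assertAllCloseAccordingToType(a, b, rtol=1e-06, atol=1e-06, float_rtol=1e-06, float_atol=1e-06, half_rtol=0.001, half_atol=0.001, bfloat16_rtol=0.01, bfloat16_atol=0.01, msg=None), or its simplified version assertAllClose.
The default tolerances for the different data types are as follows: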

Data type	absolute tolerance	relative tolerance
FP64	2.22e-15	2.22e-15
FP32	1e-6	1e-6
FP16	1e-3	1e-3
BF16	1e-2	1e-2
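
The idea of picking the tolerance from the data type can be sketched without TensorFlow. The helper below is hypothetical (it is not TF's implementation); it simply looks up the values from the table by the dtype of the tested array and delegates to `numpy.testing.assert_allclose`. NumPy has no native bfloat16, so that row is keyed by name only and is never selected by the lookup here.

```python
import numpy as np

# Default tolerances per data type, copied from the table (atol == rtol).
DEFAULT_TOL = {
    "float64": 2.22e-15,
    "float32": 1e-6,
    "float16": 1e-3,
    "bfloat16": 1e-2,  # listed for completeness; not a native NumPy dtype
}

def assert_all_close_by_type(actual, desired):
    """Hypothetical helper: choose atol/rtol from the dtype of `actual`."""
    actual = np.asarray(actual)
    desired = np.asarray(desired)
    tol = DEFAULT_TOL[actual.dtype.name]
    # Raises AssertionError if |actual - desired| > atol + rtol * |desired|.
    np.testing.assert_allclose(actual, desired, rtol=tol, atol=tol)

# The same deviation of 1.5e-3 is acceptable for FP16 but not for FP32.
desired = np.array([1.0015])
assert_all_close_by_type(np.array([1.0], dtype=np.float16), desired)
try:
    assert_all_close_by_type(np.array([1.0], dtype=np.float32), desired)
    fp32_passed = True
except AssertionError:
    fp32_passed = False
print(fp32_passed)
```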

The criterion TF uses to choose its default tolerances is described as follows:

The default atol and rtol is 10 * eps, where eps is the smallest representable positive number such that 1 + eps != 1. This is about 1.2e-6 in 32bit, 2.22e-15 in 64bit, and 0.00977 in 16bit. See numpy.finfo.
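
The figures in this description can be reproduced with `numpy.finfo`, which the quote itself points to. The helper name `ten_eps` is hypothetical.

```python
import numpy as np

def ten_eps(dtype):
    """Return 10 * machine epsilon of a NumPy float dtype as a Python float."""
    return 10.0 * float(np.finfo(dtype).eps)

for dtype in (np.float16, np.float32, np.float64):
    eps = float(np.finfo(dtype).eps)
    print(f"{np.dtype(dtype).name}: eps={eps:.3e}, 10*eps={ten_eps(dtype):.3e}")
```

The table's FP32, FP16 and BF16 defaults (1e-6, 1e-3, 1e-2) are round values rather than exact multiples of eps, so the 10 * eps rule is best read as the order of magnitude being targeted.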

The tolerance of each op can be found in the test code of that op or in the test code of the corresponding kernel.

Examples

op	Data type	absolute tolerance	relative tolerance
MatMul(FWD)	FP32	3e-5	3e-5
MatMul(FWD)	FP16	0.2	0.2

(values taken from the code)

posted on 2022-01-21 14:32 姚伟峰
