Numerical Differentiation: the Central Difference Method

Numerical differentiation is inherently less stable than integration and gains little from high-order schemes, so simple methods are usually preferred. The most common choice is the central difference method.
For a function \(f(x)\), the derivative at \(x_0\) is approximated by

\[f'(x_0)\approx \frac{f(x_0+h)-f(x_0-h)}{2h}. \]

where \(h\) is a small step size.
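
As a quick illustration, here is a minimal Python sketch of the formula above; the helper name `central_diff` and the test function are just for demonstration.

```python
import math


def central_diff(f, x0, h):
    """Approximate f'(x0) with the central difference (f(x0+h) - f(x0-h)) / (2h)."""
    return (f(x0 + h) - f(x0 - h)) / (2.0 * h)


if __name__ == "__main__":
    # Example: f(x) = sin(x), so f'(x) = cos(x).
    x0 = 1.0
    approx = central_diff(math.sin, x0, h=1e-5)
    print(approx, math.cos(x0))   # the two values agree to many digits
```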

Using the Taylor series, we can show that the error of the central difference is \(O(h^2)\).
Expand \(f(x)\) about \(x_0\):

\[f(x)=f(x_0)+\sum_{i=1}^{\infty}{\frac{f^{(i)}(x_0)\cdot (x-x_0)^i}{i!}}. \]

Then

\[f(x_0+h)=f(x_0)+hf'(x_0)+\frac{h^2}{2}f''(x_0)+\frac{h^3}{3!}f^{(3)}(x_0)+\frac{h^4}{4!}f^{(4)}(x_0)+O(h^5), \]

\[f(x_0-h)=f(x_0)-hf'(x_0)+\frac{h^2}{2}f''(x_0)-\frac{h^3}{3!}f^{(3)}(x_0)+\frac{h^4}{4!}f^{(4)}(x_0)+O(h^5). \]

Thus

\[\frac{f(x_0+h)-f(x_0-h)}{2h}=f'(x_0)+\frac{h^2}{6}f^{(3)}(x_0)+O(h^4). \]

This shows that the error is \(O(h^2)\).
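
The \(O(h^2)\) behaviour can also be checked numerically: halving \(h\) should shrink the error by roughly a factor of four. A small sketch (the test function \(\sin x\) at \(x_0=1\) is arbitrary):

```python
import math


def central_diff(f, x0, h):
    return (f(x0 + h) - f(x0 - h)) / (2.0 * h)


x0 = 1.0
exact = math.cos(x0)              # exact derivative of sin at x0
for h in (1e-1, 5e-2, 2.5e-2, 1.25e-2):
    err = abs(central_diff(math.sin, x0, h) - exact)
    print(f"h = {h:.4e}   error = {err:.3e}")
# Each halving of h reduces the error by about a factor of 4, i.e. O(h^2).
```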

We still need to decide which \(h\) to pick: an \(h\) that is too small leads to round-off error in the subtraction, while an \(h\) that is not small enough fails to make the higher-order terms negligible, causing truncation error. Balancing the truncation error \(\sim h^2|f^{(3)}(x_0)|/6\) against the round-off error \(\sim \epsilon|f(x_0)|/h\) gives an optimal step of order \(\epsilon^{1/3}\), so a conventional choice for the central difference is \(h=\sqrt[3]{\epsilon}\cdot |x_0|\), where \(\epsilon\) is the machine precision (about \(10^{-16}\) in double precision). (The often-quoted \(h=\sqrt{\epsilon}\cdot |x_0|\) is the analogous choice for the one-sided forward difference.) That is the whole of the central difference method.
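
The trade-off is easy to see by sweeping \(h\) over several orders of magnitude: the error first decreases like \(h^2\), then grows again once round-off dominates. A rough sketch (again with \(\sin x\) at \(x_0=1\) as an arbitrary test case):

```python
import math


def central_diff(f, x0, h):
    return (f(x0 + h) - f(x0 - h)) / (2.0 * h)


x0 = 1.0
exact = math.cos(x0)
for k in range(1, 13):
    h = 10.0 ** (-k)
    err = abs(central_diff(math.sin, x0, h) - exact)
    print(f"h = 1e-{k:02d}   error = {err:.3e}")
# The error bottoms out around h ~ 1e-5 to 1e-6 (roughly the cube root of
# machine epsilon) and then grows again as round-off in the subtraction
# f(x0+h) - f(x0-h) takes over.
```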

In deep learning and similar settings, automatic differentiation (autograd) is used instead. Autograd is not a finite-difference scheme at all: it evaluates derivatives exactly via the chain rule (with respect to the network's parameters), accurate up to floating-point precision. It therefore has no truncation error, which makes it the standard tool for training neural networks.
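
For concreteness, here is a minimal autograd sketch, assuming PyTorch is installed (any autograd framework would do; the test function is arbitrary):

```python
import math

import torch

# f(x) = x^2 * sin(x); the exact derivative is 2x*sin(x) + x^2*cos(x).
x = torch.tensor(1.2, requires_grad=True)
y = x ** 2 * torch.sin(x)
y.backward()                      # reverse-mode automatic differentiation

exact = 2 * 1.2 * math.sin(1.2) + 1.2 ** 2 * math.cos(1.2)
print(x.grad.item(), exact)       # agree to floating-point precision: no truncation error
```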
