SciTech-BigDataAIML-Tensorflow-Introduction to Gradients and Automatic Differentiation
In this guide, you will explore ways to compute gradients with TensorFlow, especially in eager execution.
Automatic Differentiation and Gradients
Automatic differentiation is useful and powerful for implementing machine learning algorithms such as backpropagation for training neural networks.
Computing gradients
To differentiate automatically, TensorFlow needs to:
- remember what operations happen, and in what order, during the forward pass;
- then traverse this list of operations in reverse order to compute gradients during the backward pass.
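As a minimal sketch of this record-then-reverse pattern (it uses the tf.GradientTape API introduced in the next section):
import tensorflow as tf

x = tf.Variable(3.0)
with tf.GradientTape() as tape:
    # Forward pass: operations on x are recorded onto the tape.
    y = x**2
# Backward pass: the tape is traversed in reverse to get dy/dx = 2x, i.e. 6.0 at x = 3.0.
dy_dx = tape.gradient(y, x)
print(dy_dx.numpy())  # 6.0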
Gradient tapes
TensorFlow provides the tf.GradientTape API for automatic differentiation; that is:
- computing the gradient of a computation with respect to some inputs, usually tf.Variables.
- TensorFlow "records" relevant operations executed inside the context of a tf.GradientTape onto a "tape".
- TensorFlow then uses that tape to compute the gradients of the "recorded" computation using reverse mode differentiation.
Keep in mind that a gradient is fundamentally an operation on a scalar: the gradient with respect to each source has the shape of the source, and if the target(s) are not scalar, the gradient of the sum of the targets is calculated.
Once you've recorded some operations, use GradientTape.gradient(target, sources) to calculate the gradient of some target (often a loss) relative to some source (often the model's variables):
- To get the gradient with respect to two or more variables, pass a list of those variables as sources to the gradient method.
- The tape is flexible about how sources are passed: it accepts any nested combination of lists or dictionaries and returns the gradient structured the same way (see tf.nest). For example, you can pass a dictionary of variables as sources: grad_w = tape.gradient( cost, {'w': w, 'b': b} )['w'] (see the sketch after the examples below).
- By default, the resources held by a GradientTape are released as soon as the GradientTape.gradient method is called, so a non-persistent GradientTape can only be used to compute one set of gradients (or jacobians).
- To compute multiple gradients over the same computation, create a gradient tape with persistent=True. This allows multiple calls to the gradient method; resources are released when the tape object is garbage collected.
- Only call GradientTape.gradient inside the tape's context if you actually want to trace the gradient computation in order to compute higher-order derivatives: calling GradientTape.gradient on a persistent tape inside its context is significantly less efficient than calling it outside the context (it causes the gradient ops to be recorded on the tape, leading to increased CPU and memory usage).
- Examples:
import numpy as np
import tensorflow as tf

def comp_gradient_dy_dx(x, persistent=False):
    # k has shape (x.shape[-1], 20 / x.shape[-1]); float32 so it matmuls with x cleanly.
    k = np.arange(20, dtype=np.float32).reshape(int(x.shape[-1]), int(20 / x.shape[-1]))
    with tf.GradientTape(persistent=persistent) as tape:  # recording ops
        y = (x**2) @ k
    # y is not scalar, so the gradient of the sum of y is computed:
    # d(sum(y))/dx[m, i] = 2 * x[m, i] * sum(k[i, :])
    dy_dx = tape.gradient(y, x)
    return y, dy_dx

print("y=%r\ndydx=%r\n" % comp_gradient_dy_dx(tf.Variable([[1.0, 2.0], [3.0, 4.0]])))
# dydx = [ [90.0, 580.0], [270.0, 1160.0] ]
print("y=%r\ndydx=%r\n" % comp_gradient_dy_dx(tf.Variable([[1.0, 2.0, 3, 4], [3.0, 4.0, 5, 6]])))
# dydx = [ [20.0, 140.0, 360.0, 680.0], [60.0, 280.0, 600.0, 1020.0] ]
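To confirm the "gradient of the sum" behaviour noted above, you can compare the gradient of the non-scalar y against the gradient of an explicitly summed target (a minimal check, reusing the imports above):
x = tf.Variable([[1.0, 2.0], [3.0, 4.0]])
k = np.arange(20, dtype=np.float32).reshape(2, 10)
with tf.GradientTape(persistent=True) as tape:
    y = (x**2) @ k        # non-scalar target, shape (2, 10)
    s = tf.reduce_sum(y)  # the same target reduced to a scalar
# Both calls return the same gradient, with the shape of the source x.
print(tape.gradient(y, x).numpy())
print(tape.gradient(s, x).numpy())
del tape  # drop the reference so the persistent tape's resources are released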
def linear_model_gradients_DcDw_DcDb(w, x, b):
    with tf.GradientTape(persistent=True) as tape:
        y = x @ w + b
        cost = tf.reduce_mean(y**2)
    # here cost is already a reduced scalar value;
    # passing a list of sources returns a list of gradients.
    [dc_dw, dc_db] = tape.gradient(cost, [w, b])
    return [cost, dc_dw, dc_db]

w = tf.Variable(tf.random.normal((3, 2)), name='w')
b = tf.Variable(tf.zeros(2, dtype=tf.float32), name='b')
x = [[1., 2., 3.]]
cost, dc_dw, dc_db = linear_model_gradients_DcDw_DcDb(w, x, b)
print("LMG:\n COST:%r\n DcDw:%r\n DcDb:%r\n" % (
    cost.numpy(), dc_dw.numpy(), dc_db.numpy()
))
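As noted in the list above, sources can also be passed as a dictionary, and the returned gradients mirror that structure; a minimal sketch reusing the w, x, b defined above:
with tf.GradientTape() as tape:
    y = x @ w + b
    cost = tf.reduce_mean(y**2)
# Passing a dict of sources returns a dict of gradients with the same keys.
grads = tape.gradient(cost, {'w': w, 'b': b})
print(grads['w'].shape)  # (3, 2), the shape of w
print(grads['b'].shape)  # (2,), the shape of b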
Gradients with respect to a model
It's common to collect tf.Variables into a tf.Module or one of its subclasses (layers.Layer, keras.Model) for checkpointing and exporting.
In most cases, you will want to calculate gradients with respect to a model's trainable variables.
Since all subclasses of tf.Module aggregate their variables in the Module.trainable_variables property, you can calculate these gradients in a few lines of code:
layer = tf.keras.layers.Dense(2, activation='relu')
x = tf.constant([[1., 2., 3.]])

with tf.GradientTape() as tape:
    # Forward pass
    y = layer(x)
    loss = tf.reduce_mean(y**2)

# Calculate gradients with respect to every trainable variable
grad = tape.gradient(loss, layer.trainable_variables)

for var, g in zip(layer.trainable_variables, grad):
    print(f'{var.name}, shape: {g.shape}')
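The same pattern works for any tf.Module subclass, since trainable_variables is aggregated automatically; here is a minimal sketch with a hypothetical custom module (the Linear class below is only for illustration):
class Linear(tf.Module):
    # A tiny module computing y = x @ w + b.
    def __init__(self, in_features, out_features):
        super().__init__()
        self.w = tf.Variable(tf.random.normal((in_features, out_features)), name='w')
        self.b = tf.Variable(tf.zeros(out_features), name='b')

    def __call__(self, x):
        return x @ self.w + self.b

model = Linear(3, 2)
x = tf.constant([[1., 2., 3.]])
with tf.GradientTape() as tape:
    loss = tf.reduce_mean(model(x)**2)
# model.trainable_variables collects w and b automatically.
grads = tape.gradient(loss, model.trainable_variables)
for var, g in zip(model.trainable_variables, grads):
    print(f'{var.name}, shape: {g.shape}')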
