SciTech-BigDataAIML-Tensorflow-Introduction to Gradients and Automatic Differentiation

In this guide, you will explore ways to compute gradients with TensorFlow, especially in eager execution.

Automatic Differentiation and Gradients
Automatic differentiation is useful and powerful for implementing machine learning algorithms such as backpropagation for training neural networks.

Computing gradients
To differentiate automatically, TensorFlow needs to:

  1. remember what operations happen, and in what order, during the forward pass;
  2. then traverse this list of operations in reverse order during the backward pass to compute the gradients.
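
As a minimal sketch of this record-then-replay flow (the variable and values below are purely illustrative):

import tensorflow as tf

x = tf.Variable(3.0)
with tf.GradientTape() as tape:
  y = x**2                     # forward pass: the squaring op is recorded on the tape
dy_dx = tape.gradient(y, x)    # backward pass: the tape is replayed in reverse, dy/dx = 2*x
print(dy_dx.numpy())           # 6.0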

Gradient tapes
TensorFlow provides the tf.GradientTape API for automatic differentiation; that is:
  0. A gradient is fundamentally defined with respect to a scalar target.
    The gradient with respect to each source has the shape of that source;
    similarly, if the target(s) are not scalar, the gradient of the sum of the targets is calculated.

  1. computing the gradient of a computation with respect to some inputs, usually tf.Variables.
  2. TensorFlow "records" relevant operations executed inside the context of a tf.GradientTape onto a "tape".
  3. TensorFlow then uses that tape to compute the gradients of a "recorded" computation using reverse mode differentiation.
    Once you've recorded some operations, use
    GradientTape.gradient(target, sources) to calculate the gradient of some target (often a loss) relative to some source (often the model's variables).
  4. To get the gradient with respect to two or more variables, you can pass a list of those variables as sources to the gradient method.
    The tape is flexible about how sources are passed and will accept any nested combination of lists or dictionaries and return the gradient structured the same way (see tf.nest).
  5. You can also pass a dictionary of variables as the source: grad_w = tape.gradient(cost, {'w': w, 'b': b})['w'] (a runnable sketch appears after the examples below).
  6. By default, the resources held by a GradientTape are released as soon as the GradientTape.gradient method is called,
    so a non-persistent GradientTape can only be used to compute one set of gradients (or Jacobians).
    To compute multiple gradients over the same computation, create a gradient tape with persistent=True.
    This allows multiple calls to the gradient method, because the resources are only released when the tape object is garbage collected (see the persistent-tape sketch after the first example).
  7. Only call GradientTape.gradient inside the tape's context if you actually want to trace the gradient computation itself, e.g. in order to compute higher-order derivatives (see the nested-tape sketch after the first example).
    Calling GradientTape.gradient on a persistent tape inside its context is significantly less efficient than calling it outside the context, because it causes the gradient ops to be recorded on the tape as well, leading to increased CPU and memory usage.
  8. examples:
import numpy as np
import tensorflow as tf

def comp_gradient_dy_dx(x, persistent=False):
  # Constant weight matrix of shape (x.shape[-1], 20 // x.shape[-1]), float32 to match x.
  k = np.arange(20, dtype=np.float32).reshape(int(x.shape[-1]), 20 // int(x.shape[-1]))
  with tf.GradientTape(persistent=persistent) as tape:  # recording ops.
    y = (x**2) @ k
  # y is not scalar, so the gradient of its sum is taken:
  # dy_dx[i, j] = 2 * x[i, j] * sum(k[j, :])
  dy_dx = tape.gradient(y, x)
  return y, dy_dx

print("y=%r\ndydx=%r\n" % comp_gradient_dy_dx(tf.Variable([[1.0, 2.0], [3.0, 4.0]])))
# dy_dx = [ [90.0, 580.0], [270.0, 1160.0] ]
print("y=%r\ndydx=%r\n" % comp_gradient_dy_dx(tf.Variable([[1.0, 2.0, 3, 4], [3.0, 4.0, 5, 6]])))
# dy_dx = [ [20.0, 140.0, 360.0, 680.0], [60.0, 280.0, 600.0, 1020.0] ]
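
Two short sketches of items 6 and 7 above, with illustrative values: a persistent tape queried twice, and a pair of nested tapes for a second derivative (nesting one tape inside another is the usual way to obtain higher-order derivatives):

# Item 6: a persistent tape can be queried more than once.
x = tf.Variable([1.0, 3.0])
with tf.GradientTape(persistent=True) as tape:
  y = x * x
  z = y * y
print(tape.gradient(z, x).numpy())  # dz/dx = 4*x**3 -> [  4. 108.]
print(tape.gradient(y, x).numpy())  # dy/dx = 2*x   -> [2. 6.]
del tape  # drop the reference so the tape's resources can be released

# Item 7: nest tapes to compute a higher-order derivative.
x = tf.Variable(2.0)
with tf.GradientTape() as outer:
  with tf.GradientTape() as inner:
    y = x**3
  dy_dx = inner.gradient(y, x)      # 3*x**2 = 12.0 (this gradient is traced by the outer tape)
d2y_dx2 = outer.gradient(dy_dx, x)  # 6*x = 12.0
print(dy_dx.numpy(), d2y_dx2.numpy())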


def linear_model_gradients_DcDw_DcDb(w, x, b):
  with tf.GradientTape(persistent=True) as tape:
    y = x @ w + b
    cost = tf.reduce_mean(y**2)  # cost is reduced to a scalar value.
  # Call gradient outside the context so the gradient ops are not recorded on the tape.
  [dc_dw, dc_db] = tape.gradient(cost, [w, b])
  return [cost, dc_dw, dc_db]

w = tf.Variable(tf.random.normal((3, 2)), name='w')
b = tf.Variable(tf.zeros(2, dtype=tf.float32), name='b')
x = [[1., 2., 3.]]

cost, dc_dw, dc_db = linear_model_gradients_DcDw_DcDb(w, x, b)
print("LMG:\n  COST:%r\n  DcDw:%r\n  DcDb:%r\n" % (
    cost.numpy(), dc_dw.numpy(), dc_db.numpy()
))
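
Reusing the w, b, and x defined just above, here is a minimal sketch of item 5's dictionary-of-sources form; the gradient comes back with the same nested structure as the sources:

with tf.GradientTape() as tape:
  cost = tf.reduce_mean((x @ w + b)**2)

grads = tape.gradient(cost, {'w': w, 'b': b})  # a dict with keys 'w' and 'b'
print(grads['w'].shape, grads['b'].shape)      # (3, 2) (2,)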

Gradients with respect to a model
It's common to collect tf.Variables into a tf.Module or one of its subclasses (layers.Layer, keras.Model) for checkpointing and exporting.

In most cases, you will want to calculate gradients with respect to a model's trainable variables.
Since all subclasses of tf.Module aggregate their variables in the Module.trainable_variables property, you can calculate these gradients in a few lines of code:

layer = tf.keras.layers.Dense(2, activation='relu')
x = tf.constant([[1., 2., 3.]])
with tf.GradientTape() as tape:
    # Forward pass
    y = layer(x)
    loss = tf.reduce_mean(y**2)
# Calculate gradients with respect to every trainable variable
grad = tape.gradient(loss, layer.trainable_variables)

for var, g in zip(layer.trainable_variables, grad):
  print(f'{var.name}, shape: {g.shape}')
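
The same few lines work for any other tf.Module subclass; here is a minimal sketch with a hypothetical hand-rolled module (the ToyLinear class and its shapes are illustrative only):

class ToyLinear(tf.Module):
  def __init__(self):
    super().__init__()
    self.w = tf.Variable(tf.random.normal((3, 2)), name='w')
    self.b = tf.Variable(tf.zeros(2), name='b')

  def __call__(self, x):
    return x @ self.w + self.b

model = ToyLinear()
with tf.GradientTape() as tape:
  loss = tf.reduce_mean(model(tf.constant([[1., 2., 3.]]))**2)

# tf.Module gathers self.w and self.b into model.trainable_variables automatically.
grads = tape.gradient(loss, model.trainable_variables)
for var, g in zip(model.trainable_variables, grads):
  print(f'{var.name}, shape: {g.shape}')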
