# brain of mat kelcey...

## simple tensorboard visualisation for gradient norms

June 27, 2017 at 09:45 PM | categories: Uncategorized

( i've had three people recently ask me about how i was visualising gradient norms in tensorboard so, according to my three strikes rule, i now have to "automate" it by writing a blog post about it )

one really useful visualisation you can do while training a network is to visualise the norms of the variables and gradients.

how are they useful? some random things that immediately come to mind include the fact that...

• diverging norms of variables might mean you haven't got enough regularisation.
• a zero gradient norm means learning has somehow stopped.
• exploding gradient norms mean learning is unstable and you might need to clip (hellloooo deep reinforcement learning).
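as a side note on that last point: clipping by global norm just rescales all the gradients by a common factor whenever their combined norm exceeds a threshold. here's a dependency-free sketch of the arithmetic (this is what tf.clip_by_global_norm computes, not the tf call itself; plain lists stand in for flattened gradient tensors) ...

```python
import math

# sketch of global-norm gradient clipping; each entry of grads stands in
# for one flattened gradient tensor
def global_norm(grads):
    return math.sqrt(sum(x * x for g in grads for x in g))

def clip_by_global_norm(grads, clip_norm):
    norm = global_norm(grads)
    if norm <= clip_norm:
        return grads
    # rescale every gradient by the same factor so the direction is preserved
    scale = clip_norm / norm
    return [[x * scale for x in g] for g in grads]

grads = [[3.0, 4.0], [12.0]]   # global norm = sqrt(9 + 16 + 144) = 13
clipped = clip_by_global_norm(grads, clip_norm=5.0)
# global_norm(clipped) is now the clip_norm, 5.0
```

note that the scaling is applied across all gradients at once, not per tensor; that's the point of the "global" in global norm.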

let's consider a simple bounding box regression conv net (the specifics aren't important, i just grabbed this from another project, just needed something for illustration) ...

```python
# (256, 320, 3)  input image

model = slim.conv2d(images, num_outputs=8, kernel_size=3, stride=2,
                    weights_regularizer=l2(0.01), scope="c0")
# (128, 160, 8)

model = slim.conv2d(model, num_outputs=16, kernel_size=3, stride=2,
                    weights_regularizer=l2(0.01), scope="c1")
# (64, 80, 16)

model = slim.conv2d(model, num_outputs=32, kernel_size=3, stride=2,
                    weights_regularizer=l2(0.01), scope="c2")
# (32, 40, 32)

model = slim.conv2d(model, num_outputs=4, kernel_size=1, stride=1,
                    weights_regularizer=l2(0.01), scope="c3")
# (32, 40, 4)  1x1 bottleneck to get the number of params down between c2 & h0

model = slim.flatten(model)
model = slim.dropout(model, keep_prob=0.5, is_training=is_training)
# (5120,)  the 32x40x4 -> 32 projection is where the majority of the params
# are, so it's going to be the part most prone to overfitting

model = slim.fully_connected(model, num_outputs=32,
                             weights_regularizer=l2(0.01), scope="h0")
# (32,)

model = slim.fully_connected(model, num_outputs=4, activation_fn=None, scope="out")
# (4,) = bounding box (x1, y1, dx, dy)
```
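to see why that 1x1 bottleneck matters, here's a back-of-envelope parameter count for the layers above (hypothetical helper functions; the counts are derived from the shape comments, weights + biases only) ...

```python
# rough per-layer parameter counts for the model above
def conv_params(kernel_size, channels_in, channels_out):
    # kernel weights plus one bias per output channel
    return kernel_size * kernel_size * channels_in * channels_out + channels_out

def fc_params(n_in, n_out):
    return n_in * n_out + n_out

counts = {
    "c0": conv_params(3, 3, 8),
    "c1": conv_params(3, 8, 16),
    "c2": conv_params(3, 16, 32),
    "c3": conv_params(1, 32, 4),       # the 1x1 bottleneck
    "h0": fc_params(32 * 40 * 4, 32),  # flattened (32, 40, 4) -> 32
    "out": fc_params(32, 4),
}
print(counts["h0"])  # 163872; h0 dwarfs every other layer combined
```

without the bottleneck h0 would be fed the full (32, 40, 32) volume, i.e. 8x the inputs and roughly 8x the params, all in the layer already most prone to overfitting.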


a simple training loop using feed_dict would be something along the lines of ...

```python
optimiser = tf.train.AdamOptimizer()
train_op = optimiser.minimize(loss=some_loss)

with tf.Session() as sess:
    while True:
        _ = sess.run(train_op, feed_dict=blah)
```


but if we want to get access to gradients we need to do things a little differently and call compute_gradients and apply_gradients ourselves ...

```python
optimiser = tf.train.AdamOptimizer()
gradients = optimiser.compute_gradients(loss=some_loss)
train_op = optimiser.apply_gradients(gradients)

with tf.Session() as sess:
    while True:
        _ = sess.run(train_op, feed_dict=blah)
```


with the gradients exposed we can attach histogram summaries for the norms (and values) of each variable and its gradient, then merge and write them in the training loop ...

```python
optimiser = tf.train.AdamOptimizer()
gradients = optimiser.compute_gradients(loss=some_loss)

l2_norm = lambda t: tf.sqrt(tf.reduce_sum(tf.pow(t, 2)))
for gradient, variable in gradients:
    tf.summary.histogram("gradients/" + variable.name, l2_norm(gradient))
    tf.summary.histogram("gradient_values/" + variable.name, gradient)
    tf.summary.histogram("variables/" + variable.name, l2_norm(variable))
    tf.summary.histogram("variable_values/" + variable.name, variable)

train_op = optimiser.apply_gradients(gradients)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    summaries_op = tf.summary.merge_all()
    summary_writer = tf.summary.FileWriter("/tmp/tb", sess.graph)
    for step in itertools.count():
        _, summary = sess.run([train_op, summaries_op], feed_dict=blah)
        summary_writer.add_summary(summary, step)
```


( though in practice we may only want to run the expensive summaries_op once in a while, not every training step... )
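for reference, the l2_norm lambda is just the frobenius norm of the (flattened) tensor; in dependency-free python it's ...

```python
import math

# pure python version of the l2_norm lambda, for a 2d "tensor" as nested lists
def l2_norm(t):
    return math.sqrt(sum(x * x for row in t for x in row))

print(l2_norm([[3.0, 0.0], [0.0, 4.0]]))  # 5.0
```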

with logging like this we get 8 histogram summaries per layer; the cross product of

• layer weights vs layer biases
• variables vs gradients
• norms vs values

e.g. for conv layer c3 in the above model we get the summaries shown below. nothing terribly interesting in this example, but a couple of things are worth noting...

• red : very large magnitude of gradient very early in training; this is classic variable rescaling.
• blue: non zero gradients at end of training, so stuff still happening at this layer in terms of the balance of l2 regularisation vs loss. (note: no bias regularisation means it'll continue to drift)

sometimes the histograms aren't enough and you need to do some more serious plotting. in these cases i hackily wrap the gradient calc in a tf.Print and plot with ggplot.

e.g. here's some gradient norms from an old actor / critic model (cartpole++)

## related: explicit simple_value and image summaries

on a related note, you can also construct and write summaries explicitly, which is sometimes easier than generating them through the graph.

i find this especially true for image summaries, where there are many pure python options for post processing with, say, PIL.

e.g. explicit scalar values

```python
summary_writer = tf.summary.FileWriter("/tmp/blah")
summary = tf.Summary(value=[
    tf.Summary.Value(tag="foo", simple_value=1.0),
    tf.Summary.Value(tag="bar", simple_value=2.0),
])
summary_writer.add_summary(summary, step)
```


e.g. explicit image summaries using PIL post processing

```python
from PIL import Image, ImageDraw
import StringIO  # python 2; use io.BytesIO in python 3

summary_values = []  # (note: could already contain simple_values like above)
for i in range(6):
    # wrap np array with PIL image and canvas
    img = Image.fromarray(some_np_array_probably_output_of_network[i])
    canvas = ImageDraw.Draw(img)
    # draw a box in the top left
    canvas.line([0,0, 0,10, 10,10, 10,0, 0,0], fill="white")
    # write some text
    canvas.text(xy=(0,0), text="some string to add to image", fill="black")
    # serialise out to an image summary
    sio = StringIO.StringIO()
    img.save(sio, format="png")
    image = tf.Summary.Image(height=256, width=320, colorspace=3,  # RGB
                             encoded_image_string=sio.getvalue())
    summary_values.append(tf.Summary.Value(tag="img/%d" % i, image=image))
summary_writer.add_summary(tf.Summary(value=summary_values), step)
```