First contact with TensorFlow
Get started with Deep Learning programming
This book is devoted to the open-source community, whose work we consume every day without knowing
I hope this book adds some value to this world of education that I love so much. I think that knowledge is liberation and should be accessible to all. For this reason, the content of this book is available on this website completely free. If the reader finds the content useful and considers it appropriate to compensate the effort of the author in writing it, the reader can purchase a paper copy, PDF version or Kindle version.
http://jorditorres.org/first-contact-with-tensorflow/
CONTENTS:
Forewords
Preface
A practical approach
1. TensorFlow basics
2. Linear Regression in TensorFlow
3. Clustering in TensorFlow
4. Single Layer Neural Network in TensorFlow
5. Multi-layer Neural Networks in TensorFlow
6. Parallelism
Closing
Acknowledgments
About the author, BSC, UPC & GEMLeB
References
About the book
Forewords
The area of Machine Learning has shown a great expansion thanks to the co-development of key areas such as computing, massive data storage and Internet technologies. Many technologies and events in people’s everyday lives are directly or indirectly influenced by automatic learning. Technologies such as speech recognition, image classification on our phones and spam detection have enabled apps that a decade ago would have sounded possible only in science fiction. The use of learning in stock market models or medical models has impacted our society massively. In addition, cars with cruise control, drones and robots of all types will impact society in the not too distant future.
Deep Learning, a subtype of Machine Learning, has undoubtedly been one of the fields which has had an explosive expansion since it was rediscovered in 2006. Indeed, many of the startups in Silicon Valley specialize in it, and big technology companies like Google, Facebook, Microsoft or IBM have both development and research teams. Deep Learning has generated interest even outside the university and research areas: a lot of specialized magazines (like Wired) and even generic ones (such as New York Times, Bloomberg or BBC) have written many articles about this subject.
This interest has led many students, entrepreneurs and investors to join Deep Learning. Thanks to all the interest generated, several packages have been released as “Open Source”. Having been one of the main promoters of the library we developed at Berkeley (Caffe) in 2012 as a PhD student, I can say that TensorFlow, presented in this book and designed by Google (California), where I have been researching since 2013, will be one of the main tools that researchers and SMEs will use to develop their ideas about Deep Learning and Machine Learning. A guarantee of this is the number of engineers and top researchers who have participated in this project, which culminated in its open sourcing.
I hope this introductory book will help the reader interested in starting their adventure in this very interesting field. I would like to thank the author, whom I have the pleasure of knowing, for the effort to disseminate this technology. He wrote this book (first Spanish version) in record time, two months after the open source project release was announced. This is another example of the vitality of Barcelona and its interest in being one of the actors in this technological scenario that undoubtedly will impact our future.
Oriol Vinyals, Research Scientist at Google Brain
Preface
Education is the most powerful weapon which you can use to change the world.
Nelson Mandela
The purpose of this book is to help to spread this knowledge among engineers who want to expand their wisdom in the exciting world of Machine Learning. I believe that anyone with an engineering background may find applications of Deep Learning, and Machine Learning in general, valuable to their work.
Given my background, the reader will probably wonder why I have taken on the challenge of writing about this new Deep Learning technology. My research focus is gradually moving from supercomputing architectures and runtimes to execution middleware for big data workloads, and more recently to platforms for Machine Learning on massive data.
Precisely by being an engineer, not a data scientist, I think I can contribute with this introductory approach to the subject, and that it can be helpful for many engineers in the early stages; then it will be their choice to go deeper into what they need.
I hope this book adds some value to this world of education that I love so much. I think that knowledge is liberation and should be accessible to all. For this reason, the content of this book will be available on the website www.JordiTorres.eu/TensorFlow completely free. If the reader finds the content useful and considers it appropriate to compensate the effort of the author in writing it, there is a tab on the website to make a donation. On the other hand, if the reader prefers a paper copy, the book can be purchased through the Amazon.com portal.
A Spanish version is also available. Indeed, this book is the translation of the Spanish one, which was finished last January and was presented at the GEMLeB Meetup (Grup d’Estudi de Machine Learning de Barcelona), of which I am one of the co-organizers.
Let me thank you for reading this book! It comforts me and justifies my effort for writing it. Those who know me, know that technological diffusion is one of my passions. It energizes and motivates me to keep learning.
Jordi Torres, February 2016
A practical approach
Tell me and I forget. Teach me and I remember. Involve me and I learn.
Benjamin Franklin
One common application of Deep Learning is pattern recognition. Therefore, in the same way that there is a tradition of printing “Hello World” when you start programming, in Deep Learning a model for the recognition of handwritten digits is usually constructed[1]. The first example of a neural network that I will provide will also allow me to introduce this new technology called TensorFlow.
However, I do not intend to write a research book on Machine Learning or Deep Learning; I only want to make this new Machine Learning package, TensorFlow, available to everybody as soon as possible. Therefore I apologise to my fellow data scientists for certain simplifications that I have allowed myself in order to share this knowledge with the general reader.
The reader will find here the regular structure that I use in my classes, which invites you to use your computer’s keyboard while you learn. We call it “learn by doing”, and my experience as a professor at UPC tells me that it is an approach that works very well with engineers who are trying to start a new topic.
For this reason, the book is of a practical nature, and therefore I have reduced the theoretical part as much as possible. However certain mathematical details have been included in the text when they are necessary for the learning process.
I assume that the reader has some basic understanding of Machine Learning, so I will use some popular algorithms to gradually organize the reader’s training in TensorFlow.
In the first chapter, in addition to an introduction to the scenario in which TensorFlow will have an important role, I take the opportunity to explain the basic structure of a TensorFlow program, and explain briefly the data it maintains internally.
In chapter two, through an example of linear regression, I will present some code basics and, at the same time, how to call various important components in the learning process, such as the cost function or the gradient descent optimization algorithm.
In chapter three, where I present a clustering algorithm, I go into detail about the basic data structure of TensorFlow, called tensor, and the different classes and functions that the TensorFlow package offers to create and manage tensors.
In chapter four, how to build a neural network with a single layer to recognize handwritten digits is presented in detail. This will allow us to sort all the concepts presented above, as well as see the entire process of creating and testing a model.
The next chapter begins with an explanation based on the neural network concepts seen in the previous chapter and introduces how to construct a multilayer neural network to get a better result in the recognition of handwritten digits. What is known as a convolutional neural network will be presented in more detail.
In chapter six we look at a more specific issue, probably not of interest to all readers: harnessing the computing power offered by GPUs. As introduced in chapter 1, GPUs play an important role in the training process of neural networks.
The book ends with closing remarks, in which I highlight some conclusions. I would like to emphasize that the code examples in this book can be downloaded from the GitHub repository of the book[2].
1. TensorFlow basics
In this chapter I will present very briefly what TensorFlow code and its programming model look like. By the end of this chapter, the reader is expected to be able to install the TensorFlow package on their personal computer.
An Open Source Package
Machine Learning has been investigated by academia for decades, but it is only in recent years that its penetration has also increased in corporations. This happened thanks to the large volume of data that corporations already had and the unprecedented computing capacity available nowadays.
In this scenario, there is no doubt that Google, under the holding of Alphabet, is one of the largest corporations where Machine Learning technology plays a key role in all of its virtual initiatives and products.
Last October, when Alphabet announced Google’s quarterly results, with considerable increases in sales and profits, CEO Sundar Pichai said clearly: “Machine learning is a core, transformative way by which we’re rethinking everything we’re doing”.
Technologically speaking, we are facing a change of era in which Google is not the only big player. Other technology companies such as Microsoft, Facebook, Amazon and Apple, among many other corporations are also increasing their investment in these areas.
In this context, a few months ago Google released its TensorFlow engine under an open source license (Apache 2.0). TensorFlow can be used by developers and researchers who want to incorporate Machine Learning in their projects and products, in the same way that Google is doing internally with different commercial products like Gmail, Google Photos, Search, voice recognition, etc.
TensorFlow was originally developed by the Google Brain Team, with the purpose of conducting Machine Learning and deep neural networks research, but the system is general enough to be applied in a wide variety of other Machine Learning problems.
Since I am an engineer and I am speaking to engineers, the book will look under the hood to see how the algorithms are represented by a data flow graph. TensorFlow can be seen as a library for numerical computation using data flow graphs. The nodes in the graph represent mathematical operations, while the graph edges represent the multidimensional data arrays (tensors), which interconnect the nodes.
TensorFlow is constructed around the basic idea of building and manipulating a computational graph, representing symbolically the numerical operations to be performed. This allows TensorFlow to take advantage of both CPUs and GPUs right now on 64-bit Linux or Mac OS X platforms, as well as on mobile platforms such as Android or iOS.
Another strength of this new package is its visual TensorBoard module, which allows a lot of information about how the algorithm is running to be monitored and displayed. Being able to measure and display the behavior of algorithms is extremely important in the process of creating better models. I have a feeling that currently many models are refined through a somewhat blind process of trial and error, with the obvious waste of resources and, above all, time.
TensorFlow Serving
Recently Google launched TensorFlow Serving[3], which helps developers take their TensorFlow machine learning models into production (and it can also be extended to serve other types of models). TensorFlow Serving is an open source serving system (written in C++) now available on GitHub under the Apache 2.0 license.
What is the difference between TensorFlow and TensorFlow Serving? While in TensorFlow it is easier for the developers to build machine learning algorithms and train them for certain types of data inputs, TensorFlow Serving specializes in making these models usable in production environments. The idea is that developers train their models using TensorFlow and then they use TensorFlow Serving’s APIs to react to input from a client.
This allows developers to experiment at large scale with different models that change over time, based on real-world data, while maintaining a stable architecture and API.
The typical pipeline is that training data is fed to the learner, which outputs a model, which after being validated is ready to be deployed to the TensorFlow Serving system. It is quite common to launch and iterate on a model over time, as new data becomes available or as you improve the model. In fact, in the Google post [4] they mention that at Google many pipelines run continuously, producing new model versions as new data becomes available.
To communicate with TensorFlow Serving, developers use a front-end implementation based on gRPC, a high performance, open source RPC framework from Google.
If you are interested in learning more about TensorFlow Serving, I suggest you start by reading the Serving architecture overview [5] section, set up your environment and work through a basic tutorial[6].
TensorFlow Installation
It is time to get your hands dirty. From now on, I recommend that you interleave the reading with the practice on your computer.
TensorFlow has a Python API (plus a C/C++ one) that requires the installation of Python 2.7 (I assume that any engineer who reads this book knows how to do it).
In general, when you are working in Python, you should use the virtual environment tool virtualenv. Virtualenv keeps the Python dependencies required by different projects in separate places on the same computer. If we use virtualenv to install TensorFlow, the installation will not overwrite existing versions of Python packages that other projects require.
First, you should install pip and virtualenv if they are not already installed, as the following script shows:
# Ubuntu/Linux 64-bit
$ sudo apt-get install python-pip python-dev python-virtualenv
# Mac OS X
$ sudo easy_install pip
$ sudo pip install --upgrade virtualenv
Next, create a virtualenv environment in the ~/tensorflow directory:
$ virtualenv --system-site-packages ~/tensorflow
The next step is to activate the virtualenv. This can be done as follows:
$ source ~/tensorflow/bin/activate # with bash
$ source ~/tensorflow/bin/activate.csh # with csh
(tensorflow)$
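As a small optional check, which is not part of the TensorFlow installation itself, the following Python sketch reports whether the interpreter is running inside a virtual environment. The helper name in_virtualenv is hypothetical; the sketch assumes that older virtualenv releases set sys.real_prefix and that Python 3 and newer virtualenv releases expose sys.base_prefix.

```python
import sys


def in_virtualenv():
    """Return True when the interpreter appears to run inside a virtual environment."""
    # Older virtualenv releases mark the interpreter with sys.real_prefix.
    if hasattr(sys, "real_prefix"):
        return True
    # Otherwise compare the active prefix with the base installation prefix.
    # When base_prefix is missing we fall back to sys.prefix, which yields False.
    return getattr(sys, "base_prefix", sys.prefix) != sys.prefix


print("inside a virtual environment: {}".format(in_virtualenv()))
print("interpreter prefix: {}".format(sys.prefix))
```

If the first line prints False right after activation, the shell is most likely still picking up the system interpreter.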
The name of the virtual environment in which we are working will appear at the beginning of each command line from now on. Once the virtualenv is activated, you can use pip to install TensorFlow inside it:
# Ubuntu/Linux 64-bit, CPU only:
(tensorflow)$ pip install --upgrade https://storage.googleapis.com/tensorflow/linux/cpu/tensorflow-0.7.1-cp27-none-linux_x86_64.whl
# Mac OS X, CPU only:
(tensorflow)$ sudo easy_install --upgrade six
(tensorflow)$ pip install --upgrade https://storage.googleapis.com/tensorflow/mac/tensorflow-0.7.1-cp27-none-any.whl
I recommend that you visit the official documentation to be sure that you are installing the latest available version.
If the platform where you are running your code has a GPU, the package to use will be different. I recommend that you visit the official documentation to see if your GPU meets the specifications required to support TensorFlow. Installing additional software is required to run TensorFlow on a GPU, and all the information can be found on the Download and Setup TensorFlow[7] web page. For more information on the use of GPUs, I suggest reading chapter 6.
Finally, when you’ve finished, you should disable the virtual environment as follows:
(tensorflow)$ deactivate
Given the introductory nature of this book, we suggest that the reader visit the mentioned official documentation page to find more information about other ways to install TensorFlow.
My first code in TensorFlow
As I mentioned at the beginning, we will move in this exploration of the planet TensorFlow with little theory and lots of practice. Let’s start!
From now on, it is best to use any text editor to write Python code and save it with the extension “.py” (e.g. test.py). To run the code, the command python test.py is enough.
To get a first impression of what a TensorFlow program looks like, I suggest writing a simple multiplication program; the code looks like this:
import tensorflow as tf
a = tf.placeholder("float")
b = tf.placeholder("float")
y = tf.mul(a, b)
sess = tf.Session()
print sess.run(y, feed_dict={a: 3, b: 3})
In this code, after importing the Python module tensorflow, we define “symbolic” variables, called placeholders, in order to manipulate them during the program execution. Then, we pass these variables as parameters in the call to the multiplication function that TensorFlow offers. tf.mul is one of the many mathematical operations that TensorFlow offers to manipulate the tensors. For now, tensors can be considered dynamically-sized, multidimensional data arrays.
The main ones are shown in the following table:
| Operation | Description |
| tf.add | sum |
| tf.sub | subtraction |
| tf.mul | multiplication |
| tf.div | division |
| tf.mod | modulo |
| tf.abs | return the absolute value |
| tf.neg | return negative value |
| tf.sign | return the sign |
| tf.inv | returns the inverse |
| tf.square | calculates the square |
| tf.round | returns the nearest integer |
| tf.sqrt | calculates the square root |
| tf.pow | calculates the power |
| tf.exp | calculates the exponential |
| tf.log | calculates the logarithm |
| tf.maximum | returns the maximum |
| tf.minimum | returns the minimum |
| tf.cos | calculates the cosine |
| tf.sin | calculates the sine |
TensorFlow also offers the programmer a number of functions to perform mathematical operations on matrices. Some are listed below:
| Operation | Description |
| tf.diag | returns a diagonal tensor with the given diagonal values |
| tf.transpose | returns the transpose of the argument |
| tf.matmul | returns the matrix product of the two tensors listed as arguments |
| tf.matrix_determinant | returns the determinant of the square matrix specified as an argument |
| tf.matrix_inverse | returns the inverse of the square matrix specified as an argument |
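The difference between the element-wise tf.mul of the first table and the matrix product tf.matmul is easy to overlook. The following plain-Python sketch does not use TensorFlow; it only mimics what a few of these operations compute on small 2x2 matrices stored as nested lists. All helper names are illustrative, and the determinant and inverse helpers are assumed to handle the 2x2 case only.

```python
def transpose(m):
    """Swap rows and columns."""
    return [list(row) for row in zip(*m)]


def elementwise_mul(x, y):
    """Multiply matching entries (the behaviour of an element-wise product)."""
    return [[p * q for p, q in zip(rx, ry)] for rx, ry in zip(x, y)]


def matmul(x, y):
    """Row-by-column matrix product."""
    columns = transpose(y)
    return [[sum(p * q for p, q in zip(row, col)) for col in columns]
            for row in x]


def det2(m):
    """Determinant of a 2x2 matrix."""
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]


def inverse2(m):
    """Inverse of a 2x2 matrix; raises if the matrix is singular."""
    d = det2(m)
    if d == 0:
        raise ValueError("matrix is singular")
    return [[m[1][1] / d, -m[0][1] / d],
            [-m[1][0] / d, m[0][0] / d]]


A = [[1.0, 2.0], [3.0, 4.0]]
print("element-wise A * A: {}".format(elementwise_mul(A, A)))
print("matrix product A A: {}".format(matmul(A, A)))
print("transpose of A:     {}".format(transpose(A)))
print("determinant of A:   {}".format(det2(A)))
print("A times inverse(A): {}".format(matmul(A, inverse2(A))))
```

The first two printed results differ, which is the point of the sketch: the element-wise product squares each entry, while the matrix product combines rows with columns.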
The next step, one of the most important, is to create a session to evaluate the specified symbolic expression. Indeed, until now nothing has yet been executed in this TensorFlow code. Let me emphasize that TensorFlow is both an interface to express Machine Learning algorithms and an implementation to run them, and this is a good example.
Programs interact with TensorFlow libraries by creating a session with Session(); it is only after the creation of this session that we can call the run() method, and that is when the specified code really starts to run. In this particular example, the values of the variables are introduced into the run() method with a feed_dict argument. That’s when the associated code solves the expression and displays 9 as the result of the multiplication.
With this simple example, I tried to introduce the idea that the normal way to program in TensorFlow is to specify the whole problem first, and eventually create a session to allow the running of the associated computation.
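To make the "specify first, run later" idea concrete without requiring TensorFlow, here is a minimal sketch in plain Python. The classes Placeholder and Mul and the function run are hypothetical stand-ins that I introduce only for illustration; they imitate the roles of tf.placeholder, tf.mul and the session's run() method, not their actual implementation.

```python
class Placeholder(object):
    """A named input whose value is supplied only at run time."""

    def __init__(self, name):
        self.name = name


class Mul(object):
    """A graph node that multiplies the results of two other nodes."""

    def __init__(self, left, right):
        self.left = left
        self.right = right


def run(node, feed_dict):
    """Evaluate `node`, reading placeholder values from `feed_dict`."""
    if isinstance(node, Placeholder):
        return feed_dict[node]
    if isinstance(node, Mul):
        return run(node.left, feed_dict) * run(node.right, feed_dict)
    raise TypeError("unknown node type: {}".format(type(node).__name__))


# Building the graph performs no arithmetic; y is only a description.
a = Placeholder("a")
b = Placeholder("b")
y = Mul(a, b)

# The same description can be evaluated several times with different inputs.
result_first = run(y, {a: 3.0, b: 3.0})
result_second = run(y, {a: 2.0, b: 5.0})
print("first run:  {}".format(result_first))
print("second run: {}".format(result_second))
```

The two calls to run reuse the same graph object y, which mirrors how a single TensorFlow graph can be executed repeatedly with different feed_dict values.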
Sometimes, however, we are interested in having more flexibility in structuring the code, interleaving operations that build the graph with operations that run part of it. This happens when we are, for example, using interactive Python environments such as IPython [8]. For this purpose, TensorFlow offers the tf.InteractiveSession() class.
The motivation for this programming model is beyond the scope of this book. However, to continue with the next chapter, we only need to know that all information is saved internally in a graph structure that contains all the information about operations and data.
This graph describes mathematical computations. The nodes typically implement mathematical operations, but they can also represent points of data entry, output results, or read/write persistent variables. The edges describe the relationships between nodes with their inputs and outputs and at the same time carry tensors, the basic data structure of TensorFlow.
The representation of the information as a graph allows TensorFlow to know the dependencies between operations and to assign operations to devices asynchronously, and in parallel, as soon as these operations have their associated tensors (indicated by the input edges) available.
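The following plain-Python sketch illustrates how knowing the dependencies is enough to decide which operations could run at the same time. The graph, the operation names and the function parallel_stages are hypothetical; this is a simple level-by-level ordering and makes no claim about how TensorFlow's own scheduler is implemented.

```python
# Each operation lists the operations whose outputs it consumes.
# Hypothetical graph for y = (a * b) + (c * d).
dependencies = {
    "a": [], "b": [], "c": [], "d": [],
    "mul_ab": ["a", "b"],
    "mul_cd": ["c", "d"],
    "add": ["mul_ab", "mul_cd"],
}


def parallel_stages(deps):
    """Group operations into stages; all operations in a stage are independent."""
    done = set()
    stages = []
    remaining = dict(deps)
    while remaining:
        # An operation is ready once every one of its inputs has been produced.
        ready = sorted(op for op, inputs in remaining.items()
                       if all(i in done for i in inputs))
        if not ready:
            raise ValueError("cycle detected in the graph")
        stages.append(ready)
        done.update(ready)
        for op in ready:
            del remaining[op]
    return stages


stages = parallel_stages(dependencies)
for number, ops in enumerate(stages):
    print("stage {}: {}".format(number, ", ".join(ops)))
```

In this toy graph the two multiplications land in the same stage, so they could be assigned to different devices and executed in parallel before the addition.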
Parallelism is therefore one of the factors that enable us to speed up the execution of some computationally expensive algorithms; another is that TensorFlow has already efficiently implemented a set of complex operations. In addition, most of these operations have associated kernels, which are implementations of the operations designed for specific devices such as GPUs. The following table summarizes the most important operations/kernels[9]:
| Operations groups | Operations |
| Maths | Add, Sub, Mul, Div, Exp, Log, Greater, Less, Equal |
| Array | Concat, Slice, Split, Constant, Rank, Shape, Shuffle |
| Matrix | MatMul, MatrixInverse, MatrixDeterminant |
| Neural Network | SoftMax, Sigmoid, ReLU, Convolution2D, MaxPool |
| Checkpointing | Save, Restore |
| Queues and synchronization | Enqueue, Dequeue, MutexAcquire, MutexRelease |
| Flow control | Merge, Switch, Enter, Leave, NextIteration |
Display panel TensorBoard
To make it more comprehensive, TensorFlow includes functions to debug and optimize programs in a visualization tool called TensorBoard. TensorBoard can graphically display different types of statistics about the parameters and details of any part of the computation graph.
The data displayed with the TensorBoard module is generated during the execution of TensorFlow and stored in trace files whose data is obtained from the summary operations. In the documentation page[10] of TensorFlow, you can find a detailed explanation of the Python API.
Invoking it is very simple: start the TensorBoard service from the command line, passing as an argument the location of the trace files.
(tensorflow)$ tensorboard --logdir=<trace file>
You simply need to access the local port 6006 from the browser[11] with http://localhost:6006/ .
A detailed description of TensorBoard is beyond the scope of this book. For more details about how TensorBoard works, the reader can visit the section TensorBoard Graph Visualization[12] from the TensorFlow tutorial page.
2. Linear Regression in TensorFlow
In this chapter, I will begin exploring TensorFlow’s coding with a simple model: Linear Regression. Based on this example, I will present some code basics and, at the same time, how to call various important components in the learning process, such as the cost function or the gradient descent algorithm.
Model of relationship between variables
Linear regression is a statistical technique used to measure the relationship between variables. Its interest is that the algorithm that implements it is not conceptually complex, and can also be adapted to a wide variety of situations. For these reasons, I have found it interesting to start delving into TensorFlow with an example of linear regression.
Remember that, both in the case of two variables (simple regression) and in the case of more than two variables (multiple regression), linear regression models the relationship between a dependent variable y, independent variables xi and a random term b.
In this section I will create a simple example to explain how TensorFlow works assuming that our data model corresponds to a simple linear regression as y = W * x + b. For this, I use a simple Python program that creates data in a two-dimensional space, and then I will ask TensorFlow to look for the line that fits the best in these points.
The first thing to do is to import the NumPy package that we will use to generate points. The code we have created is as follows:
import numpy as np
num_points = 1000
vectors_set = []
for i in xrange(num_points):
    x1 = np.random.normal(0.0, 0.55)
    y1 = x1 * 0.1 + 0.3 + np.random.normal(0.0, 0.03)
    vectors_set.append([x1, y1])
x_data = [v[0] for v in vectors_set]
y_data = [v[1] for v in vectors_set]
As you can see from the code, we have generated points following the relationship y = 0.1 * x + 0.3, albeit with some variation, using a normal distribution, so the points do not fully correspond to a line, allowing us to make a more interesting example.
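Because the data are synthetic, we can compute in advance the line that any fitting procedure should find. The following sketch regenerates similar points and applies the closed-form least-squares formulas for simple regression. It uses the standard random module instead of NumPy, an assumption made only to keep it self-contained; the fixed seed and the names W_exact and b_exact are illustrative.

```python
import random

random.seed(0)  # fixed seed so that repeated runs give the same points
num_points = 1000

# Same recipe as above: x ~ N(0, 0.55), y = 0.1 * x + 0.3 plus N(0, 0.03) noise.
x_data = [random.gauss(0.0, 0.55) for _ in range(num_points)]
y_data = [x * 0.1 + 0.3 + random.gauss(0.0, 0.03) for x in x_data]

mean_x = sum(x_data) / num_points
mean_y = sum(y_data) / num_points

# Closed-form least squares: slope = cov(x, y) / var(x).
cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(x_data, y_data))
var_x = sum((x - mean_x) ** 2 for x in x_data)

W_exact = cov_xy / var_x
b_exact = mean_y - W_exact * mean_x

print("least-squares slope W:     {:.4f}".format(W_exact))
print("least-squares intercept b: {:.4f}".format(b_exact))
```

The two values should land close to 0.1 and 0.3; they give a reference against which an iterative result can be compared.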
In our case, a display of the resulting cloud of points is:

The reader can view them with the following code (in this case, we need to import some of the functions of matplotlibpackage, running pip install matplotlib[13]):
import matplotlib.pyplot as plt
plt.plot(x_data, y_data, 'ro', label='Original data')
plt.legend()
plt.show()
These points are the data that we will consider the training dataset for our model.
Cost function and gradient descent algorithm
The next step is to train our learning algorithm to be able to obtain output values y, estimated from the input data x_data. In this case, as we know in advance that it is a linear regression, we can represent our model with only two parameters: W and b.
The objective is to write TensorFlow code that finds the best parameters W and b, those that fit the input data x_data to the output data y_data; in our case the fit will be a straight line defined by y_data = W * x_data + b. The reader knows that W should be close to 0.1 and b to 0.3, but TensorFlow does not know this and must discover it for itself.
A standard way to solve such problems is to iterate through each value of the data set and modify the parameters W and b in order to get a more precise answer every time. To find out if we are improving in these iterations, we will define a cost function (also called “error function”) that measures how “good” (actually, how “bad”) a certain line is.
This function receives the pair W and b as parameters and returns an error value based on how well the line fits the data. In our example we can use as a cost function the mean squared error[14]. With the mean squared error we get the average of the “errors” based on the distance between the real values and the estimated ones on each iteration of the algorithm.
Later, I will go into more detail with the cost function and its alternatives, but for this introductory example the mean squared error helps us to move forward step by step.
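As a minimal sketch of the mean squared error, written in plain Python rather than TensorFlow, the function below scores a candidate line on three hand-picked points. The function name mean_squared_error and the tiny data set are illustrative assumptions.

```python
def mean_squared_error(W, b, xs, ys):
    """Average of the squared differences between W * x + b and the known y."""
    errors = [(W * x + b - y) ** 2 for x, y in zip(xs, ys)]
    return sum(errors) / len(errors)


xs = [0.0, 1.0, 2.0]
ys = [0.3, 0.4, 0.5]  # points chosen to lie on the line y = 0.1 * x + 0.3

cost_true_line = mean_squared_error(0.1, 0.3, xs, ys)  # the generating line
cost_flat_line = mean_squared_error(0.0, 0.3, xs, ys)  # a worse, horizontal line

print("cost of y = 0.1 * x + 0.3: {:.6f}".format(cost_true_line))
print("cost of y = 0.3:           {:.6f}".format(cost_flat_line))
```

The horizontal line misses the second and third points by 0.1 and 0.2, so its cost is (0.01 + 0.04) / 3, roughly 0.0167, while the generating line has a cost of essentially zero.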
Now it is time to program everything that I have explained with TensorFlow. To do this, first we will create three variables with the following statements:
W = tf.Variable(tf.random_uniform([1], -1.0, 1.0))
b = tf.Variable(tf.zeros([1]))
y = W * x_data + b
For now, we can move forward knowing only that the call to the method Variable is defining a variable that resides in the internal graph data structure of TensorFlow, of which I have spoken above. We will return with more information about the method parameters later, but for now I think that it’s better to move forward to facilitate this first approach.
Now, with these variables defined, we can express the cost function that we discussed earlier, based on the distance between each point and the calculated point with the function y= W * x + b. After that, we can calculate its square, and average the sum. In TensorFlow this cost function is expressed as follows:
loss = tf.reduce_mean(tf.square(y - y_data))
As we see, this expression calculates the average of the squared distances between the y_data point that we know, and the point y calculated from the input x_data.
At this point, the reader might already suspect that the line that best fits our data is the one that obtains the lowest error value. Therefore, if we minimize the error function, we will find the best model for our data.
Without going into too much detail at the moment, this is what the optimization algorithm known as gradient descent[15] achieves: it minimizes functions. At a theoretical level, gradient descent is an algorithm that, given a function defined by a set of parameters, starts with an initial set of parameter values and iteratively moves toward a set of values that minimize the function. This iterative minimization is achieved by taking steps in the negative direction of the function’s gradient[16]. It’s conventional to square the distance to ensure that it is positive and to make the error function differentiable in order to compute the gradient.
The algorithm begins with the initial values of a set of parameters (in our case W and b), and then iteratively adjusts the value of those variables so that, at the end of the process, the values of the variables minimize the cost function.
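To show the mechanics by hand, here is a sketch of gradient descent for the mean squared error of a line, written in plain Python with hand-derived gradients. It is an illustration under my own assumptions (the seed, the starting values W = -0.5 and b = 0.0, and 50 steps are arbitrary choices) and is not the code that TensorFlow runs internally.

```python
import random

random.seed(1)
xs = [random.gauss(0.0, 0.55) for _ in range(1000)]
ys = [x * 0.1 + 0.3 + random.gauss(0.0, 0.03) for x in xs]
n = float(len(xs))


def cost(W, b):
    """Mean squared error of the line y = W * x + b."""
    return sum((W * x + b - y) ** 2 for x, y in zip(xs, ys)) / n


def gradients(W, b):
    """Partial derivatives of the mean squared error with respect to W and b."""
    grad_W = sum(2.0 * (W * x + b - y) * x for x, y in zip(xs, ys)) / n
    grad_b = sum(2.0 * (W * x + b - y) for x, y in zip(xs, ys)) / n
    return grad_W, grad_b


W, b = -0.5, 0.0       # arbitrary starting point
learning_rate = 0.5    # same value as used in this chapter
history = [cost(W, b)]

for step in range(50):
    grad_W, grad_b = gradients(W, b)
    # Step against the gradient, scaled by the learning rate.
    W -= learning_rate * grad_W
    b -= learning_rate * grad_b
    history.append(cost(W, b))

print("W = {:.4f}, b = {:.4f}".format(W, b))
print("cost went from {:.6f} to {:.6f}".format(history[0], history[-1]))
```

Each pass through the loop is one iteration of the algorithm described above: evaluate the gradient at the current parameters, then move a small distance in the opposite direction.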
To use this algorithm in TensorFlow, we just have to execute the following two statements:
optimizer = tf.train.GradientDescentOptimizer(0.5)
train = optimizer.minimize(loss)
Right now, this is enough to have the idea that TensorFlow has created the relevant data in its internal data structure, and it has also implemented in this structure an optimizer that may be invoked by train, which applies the gradient descent algorithm to the cost function defined. Later on, we will discuss the function parameter called learning rate (in our example with value 0.5).
Running the algorithm
As we have seen before, at this point in the code the calls made to the TensorFlow library have only added information to its internal graph, and the TensorFlow runtime has not yet run any of the algorithms. Therefore, as in the example of the previous chapter, we must create a session, call the run method and pass train as a parameter. Also, because we have specified variables in the code, we must initialize them beforehand with the following calls:
init = tf.initialize_all_variables()
sess = tf.Session()
sess.run(init)
Now we can start the iterative process that will allow us to find the values of W and b, defining the model line that best fits the input points. The training process continues until the model achieves a desired level of accuracy on the training data. In our particular example, if we assume that only 8 iterations are sufficient, the code could be:
for step in xrange(8):
    sess.run(train)
# Print the final values once the loop has finished
print(sess.run(W), sess.run(b))
The result of running this code shows that the values of W and b are close to the values that we know beforehand. In my case, the result of the print is:
(array([ 0.09150752], dtype=float32), array([ 0.30007562], dtype=float32))
And, if we graphically display the result with the following code:
plt.plot(x_data, y_data, 'ro')
plt.plot(x_data, sess.run(W) * x_data + sess.run(b))
plt.legend()
plt.show()
We can see graphically the line defined by parameters W = 0.0854 and b = 0.299 achieved with only 8 iterations:

Note that we have only executed eight iterations to simplify the explanation, but if we run more, the value of parameters get closer to the expected values. We can use the following sentence to print the values of W and b:
print(step, sess.run(W), sess.run(b))
In our case the print outputs are:
(0, array([-0.04841119], dtype=float32), array([ 0.29720169], dtype=float32))
(1, array([-0.00449257], dtype=float32), array([ 0.29804006], dtype=float32))
(2, array([ 0.02618564], dtype=float32), array([ 0.29869056], dtype=float32))
(3, array([ 0.04761609], dtype=float32), array([ 0.29914495], dtype=float32))
(4, array([ 0.06258646], dtype=float32), array([ 0.29946238], dtype=float32))
(5, array([ 0.07304412], dtype=float32), array([ 0.29968411], dtype=float32))
(6, array([ 0.08034936], dtype=float32), array([ 0.29983902], dtype=float32))
(7, array([ 0.08545248], dtype=float32), array([ 0.29994723], dtype=float32))
You can observe that the algorithm begins with the initial values of W= -0.0484 and b=0.2972 (in our case) and then the algorithm is iteratively adjusting in a way that the values of the variables minimize the cost function.
You can also check that the cost function is decreasing with
print(step, sess.run(loss))
In this case the print output is:
(0, 0.015878126)
(1, 0.0079048825)
(2, 0.0041520335)
(3, 0.0023856456)
(4, 0.0015542418)
(5, 0.001162916)
(6, 0.00097872759)
(7, 0.00089203351)
I suggest that the reader visualize the plot at each iteration, which allows us to observe how the algorithm is adjusting the parameter values. In our case the 8 snapshots are:
[Figure: the fitted line at each of the 8 iterations]
As the reader can see, at each iteration of the algorithm the line fits better to the data. How does the gradient descent algorithm get closer to the values of the parameters that minimize the cost function?
Since our error function depends on two parameters (W and b), we can visualize it as a two-dimensional surface. Each point in this two-dimensional space represents a line. The height of the function at each point is the error value for that line. On this surface some lines yield smaller error values than others. When TensorFlow runs a gradient descent search, it will start from some location on this surface (in our example the point W= -0.04841119 and b=0.29720169) and move downhill to find the line with the lowest error.
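The idea of a surface can be made tangible by evaluating the cost on a coarse grid of (W, b) pairs. The sketch below uses plain Python and a small noise-free data set of my own choosing, so the grid spacing, the five points and the names surface, best_W and best_b are all assumptions for illustration.

```python
xs = [-1.0, -0.5, 0.0, 0.5, 1.0]
ys = [0.1 * x + 0.3 for x in xs]  # points exactly on y = 0.1 * x + 0.3


def cost(W, b):
    """Mean squared error of the line y = W * x + b on the points above."""
    return sum((W * x + b - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


# Coarse grid: W from -0.2 to 0.4 and b from 0.0 to 0.6 in steps of 0.1.
W_grid = [round(-0.2 + 0.1 * i, 1) for i in range(7)]
b_grid = [round(0.1 * j, 1) for j in range(7)]

# The "height" of the surface at every grid point.
surface = {(W, b): cost(W, b) for W in W_grid for b in b_grid}

# The lowest point of the sampled surface.
best_W, best_b = min(surface, key=surface.get)

print("slice of the surface at b = 0.3:")
for W in W_grid:
    print("  W = {:+.1f}  cost = {:.4f}".format(W, surface[(W, 0.3)]))
print("lowest grid point: W = {}, b = {}".format(best_W, best_b))
```

Gradient descent reaches the same low region without evaluating the whole grid, which matters when there are many more than two parameters.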
To run gradient descent on this error function, TensorFlow computes its gradient. The gradient acts like a compass: its negative always points us downhill. To compute it, TensorFlow will differentiate the error function, which in our case means that it will need to compute partial derivatives with respect to W and b that indicate the direction to move in at each iteration.
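For the mean squared error used in this chapter, the partial derivatives can be written out by hand. The derivation below is only a sketch; the notation (N for the number of points, η for the learning rate) is an assumption of this example, and TensorFlow obtains the same quantities automatically from the graph rather than from these formulas.

```latex
\begin{aligned}
L(W, b) &= \frac{1}{N} \sum_{i=1}^{N} \left( W x_i + b - y_i \right)^2 \\
\frac{\partial L}{\partial W} &= \frac{2}{N} \sum_{i=1}^{N} \left( W x_i + b - y_i \right) x_i \\
\frac{\partial L}{\partial b} &= \frac{2}{N} \sum_{i=1}^{N} \left( W x_i + b - y_i \right) \\
W &\leftarrow W - \eta \, \frac{\partial L}{\partial W}, \qquad
b \leftarrow b - \eta \, \frac{\partial L}{\partial b}
\end{aligned}
```

The last line is one iteration of gradient descent: both parameters move against their partial derivatives, scaled by the learning rate.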
The learning rate parameter mentioned before controls how large a step TensorFlow will take downhill during each iteration. If we choose too large a step, we may step over the minimum. However, if we tell TensorFlow to take small steps, it will require many iterations to arrive at the minimum. So using a good learning rate is crucial. There are different techniques to adapt the value of the learning rate parameter; however, they are beyond the scope of this introductory book. A good way to ensure that the gradient descent algorithm is working fine is to make sure that the error decreases at each iteration.
Remember that, to make it easier for the reader to test the code described in this chapter, you can download it from the GitHub[17] repository of the book under the name regression.py. Here you will find it all together for easy reference:
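The effect of the step size is easiest to see on a one-parameter toy problem. The sketch below minimizes f(w) = w**2, a function I chose only because its gradient is simply 2 * w; the three learning rates and the function name descend are likewise illustrative assumptions.

```python
def descend(learning_rate, steps=20, start=1.0):
    """Minimize f(w) = w**2 by gradient descent and return the visited points."""
    w = start
    path = [w]
    for _ in range(steps):
        gradient = 2.0 * w               # derivative of w**2
        w = w - learning_rate * gradient
        path.append(w)
    return path


small = descend(0.01)  # tiny steps: progress towards 0 is slow
good = descend(0.4)    # moderate steps: reaches the minimum quickly
large = descend(1.1)   # oversized steps: jumps across the minimum and diverges

for name, path in [("0.01", small), ("0.4", good), ("1.1", large)]:
    print("learning rate {:>4}: w after 20 steps = {:.6g}".format(name, path[-1]))
```

With the largest rate the distance to the minimum grows at every step, which is exactly the situation in which the error fails to decrease from one iteration to the next.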
import numpy as np
num_points = 1000
vectors_set = []
for i in xrange(num_points):
    x1 = np.random.normal(0.0, 0.55)
    y1 = x1 * 0.1 + 0.3 + np.random.normal(0.0, 0.03)
    vectors_set.append([x1, y1])
x_data = [v[0] for v in vectors_set]
y_data = [v[1] for v in vectors_set]
import matplotlib.pyplot as plt
#Graphic display
plt.plot(x_data, y_data, 'ro')
plt.legend()
plt.show()
import tensorflow as tf
W = tf.Variable(tf.random_uniform([1], -1.0, 1.0))
b = tf.Variable(tf.zeros([1]))
y = W * x_data + b
loss = tf.reduce_mean(tf.square(y - y_data))
optimizer = tf.train.GradientDescentOptimizer(0.5)
train = optimizer.minimize(loss)
init = tf.initialize_all_variables()
sess = tf.Session()
sess.run(init)
for step in xrange(8):
    sess.run(train)
    print(step, sess.run(W), sess.run(b))