Before we start with the Tensorflow tutorial, let’s cover the basics of convolutional neural networks. If you are already familiar with conv-nets (and call them conv-nets), you can move to part-2. Neural networks are essentially mathematical models to solve an optimization problem.

We slide the convolutional filter over the whole input image to calculate this output across the image, as shown in the accompanying schematic. In this case, we slide our window by 1 pixel at a time. In some cases, people slide the window by more than 1 pixel.

For an input of size w1*h1*d1 and a filter of size f*f with stride S, the output size w2*h2*d2 will be:

w2 = (w1-f)/S + 1
h2 = (h1-f)/S + 1
d2 = d1

The most common pooling is done with a filter of size 2*2 with a stride of 2. As you can calculate using the above formula, it essentially reduces the size of the input by half.

Typically, all the neurons in one layer do a similar kind of mathematical operation, and that’s how a layer gets its name (except for the input and output layers, as they do little mathematical operations).
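The pooling output-size relation (w2 = (w1-f)/S + 1, h2 = (h1-f)/S + 1, d2 = d1) can be checked with a few lines of Python. This is only a sketch: the function name `pool_output_size` is made up for illustration, and integer division assumes the filter tiles the input evenly.

```python
def pool_output_size(w1, h1, d1, f, stride):
    """Return (w2, h2, d2) for a pooling layer with an f*f filter.

    Hypothetical helper for illustration; assumes (w1 - f) and (h1 - f)
    are divisible by the stride.
    """
    w2 = (w1 - f) // stride + 1
    h2 = (h1 - f) // stride + 1
    d2 = d1  # pooling works on each depth slice separately, so depth is unchanged
    return w2, h2, d2


# A 2*2 filter with a stride of 2 halves width and height and keeps depth.
print(pool_output_size(28, 28, 6, f=2, stride=2))  # (14, 14, 6)
```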

As you can see, after each convolution the output reduces in size (in this case, we go from 32*32 to 28*28).
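To see the shrinkage from 32*32 to 28*28 concretely, here is a minimal pure-Python sketch of sliding a filter over an image. It assumes a single-channel square image, a square filter, no padding, and no bias term; `convolve2d` is a hypothetical name, not a library function. Like most deep learning libraries, it does not flip the kernel.

```python
def convolve2d(image, kernel, stride=1):
    """Slide `kernel` over `image` (both square lists of lists), no padding."""
    n, f = len(image), len(kernel)
    out_size = (n - f) // stride + 1
    output = []
    for i in range(out_size):
        row = []
        for j in range(out_size):
            # Weighted sum over the f*f window whose top-left corner
            # sits at (i * stride, j * stride).
            total = 0.0
            for a in range(f):
                for b in range(f):
                    total += image[i * stride + a][j * stride + b] * kernel[a][b]
            row.append(total)
        output.append(row)
    return output


image = [[1.0] * 32 for _ in range(32)]  # dummy 32*32 input
kernel = [[1.0] * 5 for _ in range(5)]   # dummy 5*5 filter
result = convolve2d(image, kernel)
print(len(result), len(result[0]))  # 28 28
```

The filter can only start at 28 distinct positions along each axis before its right or bottom edge would leave the image, which is where the 28 comes from.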

The pooling layer is mostly used immediately after the convolutional layer to reduce the spatial size (only width and height, not depth).
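As a rough illustration of pooling shrinking width and height while leaving depth alone, the sketch below applies max pooling to a small volume. Max pooling is an assumption here (the text above only says "pooling"), and `max_pool` and the `volume[row][col][channel]` layout are choices made for this example.

```python
def max_pool(volume, f=2, stride=2):
    """Max-pool a volume indexed as volume[row][col][channel]."""
    h, w, d = len(volume), len(volume[0]), len(volume[0][0])
    h2 = (h - f) // stride + 1
    w2 = (w - f) // stride + 1
    pooled = []
    for i in range(h2):
        row = []
        for j in range(w2):
            # Take the maximum of each f*f window, one channel at a time,
            # so the number of channels (depth) never changes.
            cell = [
                max(volume[i * stride + a][j * stride + b][c]
                    for a in range(f) for b in range(f))
                for c in range(d)
            ]
            row.append(cell)
        pooled.append(row)
    return pooled


# 4*4 input with 2 channels; channel 1 is the negative of channel 0.
volume = [[[r * 4 + c, -(r * 4 + c)] for c in range(4)] for r in range(4)]
pooled = max_pool(volume)
print(len(pooled), len(pooled[0]), len(pooled[0][0]))  # 2 2 2
print(pooled[0][0])  # [5, 0]
```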

This reduces the number of parameters, hence computation is reduced.

So, in this example, if we add a padding of size 2 on both sides of the input layer, the size of the output layer will be 32*32*6, which is also convenient from an implementation standpoint.
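The effect of the padding can be sketched in plain Python: padding a 32*32 input with 2 zeros on every side gives 36*36, and a 5*5 filter with a stride of 1 then fits at 32 positions along each axis. The helper `zero_pad` is hypothetical, and the depth of 6 mentioned above is not modeled here; only width and height are.

```python
def zero_pad(image, p):
    """Surround a 2-D list-of-lists image with p rows/columns of zeros."""
    width = len(image[0]) + 2 * p
    padded = [[0] * width for _ in range(p)]          # top border
    for row in image:
        padded.append([0] * p + list(row) + [0] * p)  # left/right borders
    padded.extend([0] * width for _ in range(p))      # bottom border
    return padded


image = [[1] * 32 for _ in range(32)]
padded = zero_pad(image, 2)
n = len(padded)
out = (n - 5) // 1 + 1  # 5*5 filter, stride of 1
print(n, out)  # 36 32
```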

Let’s say you have an input of size N*N, the filter size is F, you are using S as the stride, and the input is zero-padded with a pad of size P. Then the output size will be (N-F+2P)/S + 1.
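With N, F, S and P defined this way, the usual output-size relation (N-F+2P)/S + 1 can be wrapped in a small helper to check the numbers used in this article. The name `conv_output_size` is made up for this sketch, and raising an error when the filter does not fit evenly is a design choice of the example (some frameworks round down instead).

```python
def conv_output_size(n, f, stride=1, pad=0):
    """Output width/height for an n*n input and an f*f filter."""
    span = n - f + 2 * pad
    if span % stride != 0:
        # Choice made for this sketch: refuse uneven fits rather than round.
        raise ValueError("filter does not fit the padded input evenly")
    return span // stride + 1


print(conv_output_size(32, 5))             # 28: no padding, stride of 1
print(conv_output_size(32, 5, pad=2))      # 32: padding of 2 keeps the size
print(conv_output_size(28, 2, stride=2))   # 14: 2*2 window, stride of 2
```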

The number of pixels by which the window moves at each step is called the stride. Here the output is of size 28*28 (can you think of why 28*28 from 32*32 with a filter of 5*5 and a stride of 1?).

