Comprehensive Guide to Convolutional Neural Networks (CNNs)
An interviewee’s guide to Convolutional Neural Networks (CNNs)
Drawbacks of MLPs for processing images
- Spatial loss in MLPs - Since we flatten the image before sending it to the MLP, spatial information is lost. Pixels that are close to each other in the image can end up far apart in the flattened vector, losing their spatial relationship.
- Explosion in parameters - In an MLP, as the size of the image increases, the number of parameters grows rapidly. Example - take a 28x28 image. We first flatten it into a 1x784 vector, so the input layer needs 784 nodes. If the first hidden layer has 512 nodes, that layer alone has 784 x 512 + 512 (biases) = 401,920 parameters (a short sketch follows this list).
- Fully connected layers (deep) - An MLP consists of fully connected layers, meaning each node in a layer is connected to every node in the previous layer as well as the next one. This becomes problematic for large images. Suppose the image is 1000 x 1000: each node in the first hidden layer then needs 10^6 weights. If that hidden layer contains 1000 nodes, the first layer alone has about 1 billion parameters, which is infeasible to train.
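A quick sketch of these parameter counts, assuming PyTorch for the hidden layer (the layer sizes are the ones used in the examples above):

```python
import torch.nn as nn

# 28x28 image flattened to a 784-dim vector, one hidden layer of 512 nodes
hidden = nn.Linear(784, 512)
print(sum(p.numel() for p in hidden.parameters()))  # 784*512 + 512 = 401,920

# 1000x1000 image: 10^6 inputs, 1000 hidden nodes (computed arithmetically
# to avoid actually allocating a billion weights)
print(1_000_000 * 1000 + 1000)  # 1,000,001,000 -> about a billion parameters
```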
How CNNs solve these problems
- CNNs contain locally connected layers: each unit is connected to only a small local region of the previous layer. This uses far fewer parameters.
- The number of parameters in the convolutional layers does NOT depend on the size of the input image. Whether the input is 28x28 or 1000x1000, the convolutional layers need the same number of parameters to train (see the sketch after this list).
- Since the layers are locally connected and the kernel slides over the image, spatial relationships between neighbouring pixels are preserved, so there is no spatial loss.
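A minimal sketch (PyTorch assumed) showing that a convolutional layer's parameter count does not change with the input size:

```python
import torch
import torch.nn as nn

conv = nn.Conv2d(in_channels=1, out_channels=4, kernel_size=3, padding=1)
print(sum(p.numel() for p in conv.parameters()))  # 3*3*1*4 weights + 4 biases = 40

# The same 40 parameters handle both a 28x28 and a 1000x1000 image
small = conv(torch.randn(1, 1, 28, 28))
large = conv(torch.randn(1, 1, 1000, 1000))
print(small.shape, large.shape)  # (1, 4, 28, 28) and (1, 4, 1000, 1000)
```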
How do CNNs learn?
In CNNs, the first convolutional layer learns basic features such as lines and edges. The next layer learns slightly more complex features such as circles, squares, and other shapes, and subsequent layers build up progressively more complex features in this way.
Architecture of CNN
A basic high-level architecture of a CNN is as follows:
- Input Layer
- Convolutional Layer (for feature extraction)
- Fully connected layer for classification
- Output Prediction
Steps
- The raw image is passed as an input
- The image passes through the convolutional layers, which detect patterns and extract features known as feature maps. The output of this stage is flattened into a vector of the learned features of the image. Important - the spatial dimensions of the image shrink after each layer while the number of feature maps increases, until we are left with a long list of small, abstract features. A minimal sketch of this pipeline follows these steps.
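A minimal sketch of this pipeline, assuming PyTorch; the specific channel counts and layer sizes below are illustrative choices, not fixed by the architecture:

```python
import torch
import torch.nn as nn

model = nn.Sequential(
    nn.Conv2d(1, 8, kernel_size=3, padding=1),   # 28x28x1 -> 28x28x8
    nn.ReLU(),
    nn.MaxPool2d(2),                             # 28x28x8 -> 14x14x8 (spatial dims shrink)
    nn.Conv2d(8, 16, kernel_size=3, padding=1),  # 14x14x8 -> 14x14x16 (feature maps increase)
    nn.ReLU(),
    nn.MaxPool2d(2),                             # 14x14x16 -> 7x7x16
    nn.Flatten(),                                # flatten to a 7*7*16 = 784-dim feature vector
    nn.Linear(7 * 7 * 16, 10),                   # fully connected layer for classification
)

logits = model(torch.randn(1, 1, 28, 28))        # raw image in
print(logits.shape)                              # torch.Size([1, 10]) -> output prediction
```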
How kernel depth works
In a convolution layer, each kernel always has the same depth as the input.
So the kernel size is really: (kernel height) × (kernel width) × (input depth).
Example:
- Input image: 28 × 28 × 1 → Conv1 kernels: 3 × 3 × 1 (if kernel size = 3×3)
- After Conv1: 28 × 28 × 4 → Conv2 kernels: 3 × 3 × 4 (depth adjusts to match input depth = 4)
- After Conv2: 28 × 28 × 12 → Conv3 kernels (if you had one) would be 3 × 3 × 12
Each kernel “slides” over the height and width of the feature map.
At each spatial location, it takes into account all input channels at once, using its full depth.
The result is one number per location → one feature map.
That’s why the number of kernels equals the number of output channels, and why the kernel depth always adjusts automatically to match the input depth.
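A short sketch (PyTorch assumed) of the kernel shapes from the example above:

```python
import torch.nn as nn

conv1 = nn.Conv2d(in_channels=1, out_channels=4, kernel_size=3)   # input depth 1, 4 kernels
conv2 = nn.Conv2d(in_channels=4, out_channels=12, kernel_size=3)  # input depth 4, 12 kernels

print(conv1.weight.shape)  # torch.Size([4, 1, 3, 3])  -> 4 kernels, each 3x3x1
print(conv2.weight.shape)  # torch.Size([12, 4, 3, 3]) -> 12 kernels, each 3x3x4
```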
Where does the dropout layer go in the CNN architecture? Ans - The dropout layer goes between the fully connected layers at the end of the architecture.
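A minimal sketch of this placement, assuming PyTorch (the layer sizes are illustrative):

```python
import torch.nn as nn

classifier = nn.Sequential(
    nn.Flatten(),
    nn.Linear(7 * 7 * 16, 128),  # first fully connected layer
    nn.ReLU(),
    nn.Dropout(p=0.5),           # dropout placed between the fully connected layers
    nn.Linear(128, 10),          # final fully connected layer
)
```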
What happens to computational complexity with color images?
- If we pass a 3x3 filter over a greyscale image, each filter has a total of 9 parameters. In color images, every filter is itself a 3D filter, so every filter has 3x3x3 = 27 parameters. Therefore complexity increases when processing color images.
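A quick check of those counts with PyTorch (bias disabled so only the filter weights are counted):

```python
import torch.nn as nn

grey = nn.Conv2d(in_channels=1, out_channels=1, kernel_size=3, bias=False)
rgb  = nn.Conv2d(in_channels=3, out_channels=1, kernel_size=3, bias=False)

print(grey.weight.numel())  # 3*3*1 = 9 parameters per filter
print(rgb.weight.numel())   # 3*3*3 = 27 parameters per filter
```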
Important facts about Inception Net - it has bottleneck layers, and it has a block named the Inception block.
For the ResNet architecture - it has a block named the Residual block.
A residual block is a combination of a skip connection and convolutional layers.
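A simplified residual block sketch in PyTorch; real ResNet blocks also include batch normalization, which is omitted here for brevity:

```python
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    """Two convolutional layers plus a skip connection that adds the input back."""
    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        self.conv2 = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        self.relu = nn.ReLU()

    def forward(self, x):
        out = self.relu(self.conv1(x))
        out = self.conv2(out)
        return self.relu(out + x)  # skip connection

block = ResidualBlock(16)
print(block(torch.randn(1, 16, 28, 28)).shape)  # torch.Size([1, 16, 28, 28])
```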