Classification Tutorial Using TensorFlow Keras and Deep Learning
A step-by-step tutorial for beginners to understand deep learning with TensorFlow.
Introduction
A machine learning algorithm is a function that tunes internal variables in order to map inputs to outputs. Training iterates over the input and output data thousands or even millions of times. In general, an ML problem has known inputs and outputs, and what must be learnt is the mapping between them. Neural networks sit between these inputs and outputs.
A neural network can be defined as a stack of layers, where each layer applies predefined math and holds internal variables. Each layer is made up of units, also called neurons.
As data passes through a neural network, each layer applies its math and internal variables to produce an output. To produce useful results, a neural network must be trained repeatedly to map the inputs to the outputs; training tunes the internal variables in the layers until the network produces the correct outputs for new inputs.
Dense layer
A layer is called a dense layer when every neuron in it is fully connected to every neuron in the previous layer.
x1, x2 and x3 are the inputs, a1 and a2 are the neurons in the hidden layer, and a3 is the neuron in the output layer. As mentioned before, each layer has predefined math; here w and b are the weights and biases. These are the variables that get adjusted while the model is being trained.
The math computed in the dense layers is as follows:
Output at neuron a1: a1 = x1*w11 + x2*w12 + x3*w13 + b1
Output at neuron a2: a2 = x1*w21 + x2*w22 + x3*w23 + b2
Output at the output layer: a3 = a1*w31 + a2*w32 + b3
Example:
Following is the syntax for creating a model with a single hidden dense layer using Keras:
hidden = keras.layers.Dense(units=2, input_shape=[3])
output = keras.layers.Dense(units=1)
model = tf.keras.Sequential([hidden, output])
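To check the formulas above numerically, here is a minimal sketch; the weight and input values are arbitrary, chosen only for illustration:

import numpy as np
import tensorflow as tf
from tensorflow import keras

hidden = keras.layers.Dense(units=2, input_shape=[3])
output = keras.layers.Dense(units=1)
model = tf.keras.Sequential([hidden, output])

# The hidden layer's kernel has shape (3 inputs, 2 neurons); its bias has shape (2,).
hidden.set_weights([np.array([[1., 4.], [2., 5.], [3., 6.]]), np.array([0.1, 0.2])])
output.set_weights([np.array([[1.], [2.]]), np.array([0.5])])

x = np.array([[1., 2., 3.]])        # x1 = 1, x2 = 2, x3 = 3
# a1 = 1*1 + 2*2 + 3*3 + 0.1 = 14.1; a2 = 1*4 + 2*5 + 3*6 + 0.2 = 32.2
# a3 = a1*1 + a2*2 + 0.5 = 79.0
print(model.predict(x, verbose=0))  # [[79.]]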
Basic Activation functions in Dense layer
An activation function determines the output of a neuron. The function is present in each neuron and decides whether the neuron should be activated or turned off, based on whether the neuron's input is relevant for the model's prediction.
TanH / Hyperbolic Tangent
The output of tanh is zero-centered, which makes it easier to model inputs that have strongly negative and strongly positive values. The output is S-shaped like the sigmoid, but it ranges from -1 to 1 rather than 0 to 1. Its disadvantage is the same as the sigmoid's: gradients vanish for large positive or negative inputs.
SoftMax Activation Function
The softmax function takes a vector as input and normalizes it into a probability distribution; the probability obtained for each class is proportional to the exponential of the corresponding input value.
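A quick demo of both activation functions, as a minimal sketch using standard TensorFlow ops:

import tensorflow as tf

x = tf.constant([-2.0, 0.0, 1.0, 3.0])

print(tf.math.tanh(x).numpy())      # zero-centered, values in (-1, 1)

probs = tf.nn.softmax(x)            # proportional to exp(x), then normalized
print(probs.numpy(), float(tf.reduce_sum(probs)))  # the probabilities sum to 1.0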
Fashion MNIST
Fashion MNIST is a dataset of Zalando's article images containing 60,000 training examples and 10,000 test examples.
Each image in this dataset is a 28 x 28 grayscale image associated with a label from 10 classes, and the dataset can be used as a drop-in replacement for MNIST. The class labels are:
0. T-shirt/top
1. Trouser
2. Pullover
3. Dress
4. Coat
5. Sandal
6. Shirt
7. Sneaker
8. Bag
9. Ankle boot
Applying Neural Networks to Fashion MNIST Dataset
Input images (28 x 28 = 784 pixels): the input is a flattened layer that treats each image as one long string of 784 pixels.
Each of the 784 pixels in the input layer is connected to each of the 128 neurons in the dense layer. The dense layer adjusts its weights and biases during the training phase.
The output layer consists of 10 neurons corresponding to the 10 classes. Each output neuron gives a probability score for its class, and the final prediction is the class whose neuron has the highest probability. A sketch of this architecture follows.
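A minimal Keras sketch of the architecture just described, with the layer sizes taken from the text above:

import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.Flatten(input_shape=(28, 28)),    # 28 * 28 = 784 pixels
    tf.keras.layers.Dense(128, activation='relu'),    # 128-neuron dense layer
    tf.keras.layers.Dense(10, activation='softmax'),  # one probability per class
])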
To make this more robust, we introduce CNNs (convolutional neural networks).
CNNs work in two steps:
1. Convolution
2. Pooling
Convolution
Let's assume a grayscale image 6 pixels high and 6 pixels wide, where every pixel value ranges between 0 and 255, with 0 being black and 255 being white.
Convolution creates another grid of numbers called a kernel or filter; for instance, take it as a 3 x 3 grid.
For example, consider one highlighted pixel. The first step is to center our kernel over that pixel, so the focus is on a 3 x 3 neighborhood, matching our 3 x 3 kernel size. The second step is to multiply each image value by the corresponding kernel value and sum the whole thing. The third step is to assign the result to the convolved image at the same position, i.e. the pixel's position in the original 6 x 6 matrix.
Example: for the first highlighted pixel,
2 x 1 = 2
5 x 2 = 10
4 x 1 = 4
and so on for all nine positions; the summation 2 + 10 + 4 + ... = 198.
So 198 is the corresponding value in the convolved image. This step is repeated for every pixel to build the new convolved image.
For the pixels at the edges, there are multiple ways to handle the missing neighbors. The simple way is to ignore these pixels and act as if they did not exist; the downside of this method is that a lot of image information is lost. The better way is to perform zero padding, i.e. to add a border of zeros around the image. A NumPy sketch of convolution with zero padding follows.
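A minimal NumPy sketch of this procedure; the image and kernel values are arbitrary, and (as in most deep learning libraries) the kernel is applied without flipping:

import numpy as np

def convolve2d(image, kernel):
    # Zero padding keeps the output the same size as the input.
    k = kernel.shape[0]                 # assume a square kernel, e.g. 3 x 3
    pad = k // 2
    padded = np.pad(image, pad, mode='constant', constant_values=0)
    out = np.zeros(image.shape)
    for i in range(image.shape[0]):
        for j in range(image.shape[1]):
            # Center the kernel over pixel (i, j), multiply element-wise, sum.
            region = padded[i:i + k, j:j + k]
            out[i, j] = np.sum(region * kernel)
    return out

image = np.random.randint(0, 256, size=(6, 6))          # toy 6 x 6 grayscale image
kernel = np.array([[1, 2, 1], [2, 4, 2], [1, 2, 1]])    # example 3 x 3 kernel
print(convolve2d(image, kernel))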
Pooling
This tutorial uses the max pooling methodology.
Max Pooling is a process of reducing the size of an input image by summarizing regions.
To perform max pooling, we need to choose two things: 1. the grid (the pool size) and 2. the stride.
For the grayscale image above, consider a highlighted 2 x 2 pixel grid. The next step is to select the highest value in that 2 x 2 grid; in this case, among 22, 27, 91 and 110, the greatest value is 110, so the new pixel value is 110 in the pooled 3 x 3 image.
The stride parameter determines the number of pixels the window slides across the image. A short pooling sketch follows.
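A minimal sketch of 2 x 2 max pooling in Keras; the input values are arbitrary:

import numpy as np
import tensorflow as tf

image = np.random.randint(0, 256, size=(1, 6, 6, 1)).astype('float32')

# A 2 x 2 pool with stride 2 turns the 6 x 6 image into a 3 x 3 image.
pool = tf.keras.layers.MaxPooling2D(pool_size=(2, 2), strides=2)
print(pool(image).shape)    # (1, 3, 3, 1)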
Techniques to avoid overfitting in Neural Networks
1. Dropout:
Dropout involves turning off some of the neurons during the training process; which neurons are turned off is chosen at random in each epoch, for both the feed-forward and back-propagation passes. (A combined code sketch of all four techniques follows this list.)
2. Image Augmentation:
Image augmentation creates new training images by applying many random image transformations to the originals. Some of these transformations are random rotation, flipping and random zoom.
3. Early Stopping:
In this method, the validation loss is tracked during training and used to determine when to stop, so that the model stays accurate without over-fitting.
4. Cross validation:
Using validation data to evaluate the loss and model metrics at the end of each epoch. This validates the model on every epoch and indicates how well it is doing.
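A combined sketch of the four techniques; parameter values such as the dropout rate, rotation range and patience are assumptions, not from the original:

import tensorflow as tf

# 1. Dropout: randomly disable 20% of a layer's neurons each training step.
dropout = tf.keras.layers.Dropout(0.2)

# 2. Image augmentation: create new training images with random transformations.
augmenter = tf.keras.preprocessing.image.ImageDataGenerator(
    rotation_range=15, horizontal_flip=True, zoom_range=0.1)

# 3. Early stopping: stop training once the validation loss stops improving.
early_stop = tf.keras.callbacks.EarlyStopping(
    monitor='val_loss', patience=5, restore_best_weights=True)

# 4. Cross validation: pass validation data to fit() so the loss and metrics are
#    evaluated at the end of every epoch, e.g.
# model.fit(x_train, y_train, epochs=100,
#           validation_data=(x_val, y_val), callbacks=[early_stop])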
Now, applying this knowledge in practice.
Import TensorFlow Modules
Import TensorFlow and Keras, the implementation of neural networks.
Import keras.datasets: the dataset used in this tutorial is available in the Keras datasets module.
Import other necessary modules as shown below.
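A minimal import cell, assuming NumPy, Matplotlib and scikit-learn's KFold are the other modules used later:

import tensorflow as tf
from tensorflow import keras
from tensorflow.keras.datasets import fashion_mnist

import numpy as np
import matplotlib.pyplot as plt
from sklearn.model_selection import KFold    # used later for cross validation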
Downloading the data
fashion_mnist.load_data() will download the data; it is already split into train and test sets.
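A sketch of the call; the array names are my own choice:

(train_images, train_labels), (test_images, test_labels) = fashion_mnist.load_data()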
Observing the Dimensions of our dataset
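Printing the shapes of the four arrays loaded above produces the output shown below:

train_images.shape, train_labels.shape, test_images.shape, test_labels.shape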
((60000, 28, 28), (60000,), (10000, 28, 28), (10000,))
Scaling and Reshaping
Normalize the data by dividing all the pixel values by the maximum pixel value, i.e. 255.
Reshape the data into 4 dimensions, i.e. (samples, x dimension, y dimension, number of channels), the format expected by TensorFlow.
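A sketch of both steps:

# Scale the pixel values from [0, 255] to [0, 1].
train_images = train_images.astype('float32') / 255.0
test_images = test_images.astype('float32') / 255.0

# Reshape to (samples, height, width, channels); grayscale images have 1 channel.
train_images = train_images.reshape((-1, 28, 28, 1))
test_images = test_images.reshape((-1, 28, 28, 1))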
Function to display the performance summary of the Deep Learning Models.
The function plots accuracy and loss for both training and validation data against the number of epochs.
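A minimal sketch of such a function; the name summarize_performance and the TF 2.x history keys 'accuracy' / 'val_accuracy' are assumptions:

def summarize_performance(history):
    # Plot training/validation accuracy and loss against the number of epochs.
    fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(12, 4))
    ax1.plot(history.history['accuracy'], label='train')
    ax1.plot(history.history['val_accuracy'], label='validation')
    ax1.set_title('Model accuracy')
    ax1.set_xlabel('epoch')
    ax1.legend()
    ax2.plot(history.history['loss'], label='train')
    ax2.plot(history.history['val_loss'], label='validation')
    ax2.set_title('Model loss')
    ax2.set_xlabel('epoch')
    ax2.legend()
    plt.show()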
The following function creates a sequential Keras model.
A Sequential model allows us to build the model layer by layer.
The activation used here is ReLU (rectified linear unit).
For initializing the weights, the model uses the he_uniform kernel initializer: it draws samples from a uniform distribution within [-limit, limit], where limit = sqrt(6 / fan_in) and fan_in is the number of input units in the weight tensor.
Max pooling of 2 x 2 is added to the model; by default, the stride takes the value of the pool size if not specified explicitly.
The resulting matrix is then flattened into a vector.
The model contains a dense layer with 100 neurons and ReLU activation.
The output layer consists of 10 neurons representing the 10 output classes, with the softmax activation function.
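A sketch of such a function; the filter count of 32 and the name define_model are assumptions:

def define_model():
    model = keras.Sequential([
        keras.layers.Conv2D(32, (3, 3), activation='relu',
                            kernel_initializer='he_uniform',
                            input_shape=(28, 28, 1)),
        keras.layers.MaxPooling2D((2, 2)),     # stride defaults to the pool size
        keras.layers.Flatten(),
        keras.layers.Dense(100, activation='relu',
                           kernel_initializer='he_uniform'),
        keras.layers.Dense(10, activation='softmax'),
    ])
    return model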
Compiling the model
The model uses SGD (stochastic gradient descent), which is computationally less expensive than plain gradient descent: it updates the weights on each step of training instead of scanning through the whole training set before every update. SGD is also called online gradient descent.
Sparse categorical cross-entropy is used in this model since there are more than 2 output categories. Categorical cross-entropy measures the dissimilarity between the distribution of observed class labels and the predicted probabilities of class membership.
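A sketch of the compile step; the learning rate and momentum values are assumptions:

model = define_model()
opt = keras.optimizers.SGD(learning_rate=0.01, momentum=0.9)
model.compile(optimizer=opt,
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])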
Model with a Feature to Avoid Overfitting
In addition to the previous model, this one contains a second convolutional layer with a higher number of filters.
It also contains a dropout layer, which randomly turns off a fraction of the neurons during training.
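A sketch of this variant; the second layer's 64 filters and the dropout rate of 0.2 are assumptions:

def define_model_dropout():
    model = keras.Sequential([
        keras.layers.Conv2D(32, (3, 3), activation='relu',
                            kernel_initializer='he_uniform',
                            input_shape=(28, 28, 1)),
        keras.layers.MaxPooling2D((2, 2)),
        keras.layers.Conv2D(64, (3, 3), activation='relu',   # more filters
                            kernel_initializer='he_uniform'),
        keras.layers.MaxPooling2D((2, 2)),
        keras.layers.Flatten(),
        keras.layers.Dense(100, activation='relu',
                           kernel_initializer='he_uniform'),
        keras.layers.Dropout(0.2),    # randomly turn off 20% of the neurons
        keras.layers.Dense(10, activation='softmax'),
    ])
    return model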
Model Evaluation Methodology
There are 4 different variations of the model in this tutorial:
First Model
Description: This model uses 5-fold cross validation, with 100 epochs and validation data.
model1.summary()
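A sketch of the 5-fold training loop, reusing the helpers defined above; the batch size and SGD settings are assumptions:

kfold = KFold(n_splits=5, shuffle=True, random_state=1)
for train_ix, val_ix in kfold.split(train_images):
    model1 = define_model()
    model1.compile(optimizer=keras.optimizers.SGD(learning_rate=0.01, momentum=0.9),
                   loss='sparse_categorical_crossentropy',
                   metrics=['accuracy'])
    history = model1.fit(train_images[train_ix], train_labels[train_ix],
                         epochs=100, batch_size=32,
                         validation_data=(train_images[val_ix], train_labels[val_ix]),
                         verbose=0)
    summarize_performance(history)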
From the graph above, the model looks overfitted: the training data has been memorized by the model, and hence we observe higher training accuracy and lower test accuracy. Also, the training loss keeps going down while the test loss is rising.
Second Model
Description: This model uses 5-fold cross validation, with 100 epochs, validation data and the dropout method.
model2.summary()
From the above graph we can observe that the training and validation accuracies are almost the same, indicating a good fit. The model loss for both the train and validation sets also decreases with the number of epochs, and around epoch 18 the accuracy is the same for both sets.
As of now this model looks to be a good option. Let's see how the other models work out.
Early stopping
To prevent overfitting, the early stop function is defined and used below:
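A sketch of the callback; the patience value is an assumption:

early_stop = tf.keras.callbacks.EarlyStopping(
    monitor='val_loss',            # track the validation loss
    patience=10,                   # stop after 10 epochs without improvement
    restore_best_weights=True)

# Passed to fit() via callbacks=[early_stop], with epochs=1000.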
Third Model
Description: This model uses cross validation, with 1000 epochs, validation data and an early stopping criterion.
model3.summary()
From the above graph, this model works well up to about 10 epochs but fails to perform well afterwards: the training accuracy keeps increasing while the test accuracy decreases, indicating the model is getting overfitted. The same holds for the model loss.
Fourth Model
Description: This model uses cross validation, with 1000 epochs, validation data, an early stopping criterion and also the dropout method.
From the above graph the model looks like a good fit, since both the loss and the accuracy move hand in hand for the training and validation data.
Comparing models 2 and 4, model 2 comes out to be the best among all four models. Moving forward, the final phase is to train model 2 on the complete training set and evaluate it on the testing data:
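A sketch of the final run, reusing the model 2 architecture defined above; the batch size and SGD settings are assumptions:

final_model = define_model_dropout()
final_model.compile(optimizer=keras.optimizers.SGD(learning_rate=0.01, momentum=0.9),
                    loss='sparse_categorical_crossentropy',
                    metrics=['accuracy'])
final_model.fit(train_images, train_labels, epochs=100, batch_size=32, verbose=0)

loss, acc = final_model.evaluate(test_images, test_labels, verbose=0)
print('Test accuracy:', acc * 100)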
The test accuracy of the model is found to be ~90.87%.
Conclusion:
1. The input shape of the data plays a critical role in the selection of layers and their depth. Selecting oversized or undersized layers may lead to overfitting or underfitting; this judgment comes with experience, from working on a number of datasets regularly.
2. Model accuracy depends on the number of convolutional layers selected and a proper number of filters; choosing the number of dense layers and their activation functions is important for building a better model.
3. Cross-validation plays a critical role in assessing model performance; various models should be tried before committing to the entire train and test datasets.
References:
1. https://www.tensorflow.org/tutorials (basic syntax).
2. https://classroom.udacity.com/courses/ud187/lessons/1771027d-8685-496f-8891-d7786efb71e1/concepts/8b8c3d93-4117-4134-b678-77d54634b656 (understanding concepts).
3. Dr. Timothy Havens, https://mtu.instructure.com/courses/1304186/modules (understanding concepts).
4. https://machinelearningmastery.com/how-to-develop-a-cnn-from-scratch-for-fashion-mnist-clothing-classification/ (basic structure and idea of the tutorial).