The MNIST Dataset

This web application demonstrates how to train an artificial neural network to classify image data. The example chosen here is the MNIST dataset, comprising 60,000 training and 10,000 test scans of handwritten digits (0-9) together with their "ground truth" labels. The technology used is TensorFlow.js together with custom JavaScript.
 
Proceed as follows:

  1. In the MNIST tab: Wait until the MNIST dataset is loaded. (This may take seconds or minutes; a loading sketch follows this list.)
  2. In the model tab: Create one or more models. Modify the model architecture to your liking.
  3. In the training tab: Train the model until the performance is good enough (e.g. until a validation accuracy of 98% is reached).
  4. In the inference tab: Draw a number with the mouse onto the screen and let the model(s) predict the intended digit.
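
For step 1, the data loading might look roughly like the following. This is a sketch modeled on the official tfjs-examples MNIST loader, which fetches a sprite image and a label file; this app's loader may differ in its details (the real loader also processes the sprite in chunks rather than all at once).

    // Sketch: load the MNIST sprite and labels as in the tfjs-examples demo.
    // The URLs and the 65,000-image sprite layout are those of that example.
    async function loadMnist() {
      const img = new Image();
      img.crossOrigin = 'anonymous';
      img.src = 'https://storage.googleapis.com/learnjs-data/model-builder/mnist_images.png';
      await img.decode();  // the sprite holds one flattened 28x28 scan per row

      const resp = await fetch(
          'https://storage.googleapis.com/learnjs-data/model-builder/mnist_labels_uint8');
      const labels = new Uint8Array(await resp.arrayBuffer());  // one-hot labels

      const xs = tf.browser.fromPixels(img, 1)   // grayscale: [65000, 784, 1]
          .toFloat().div(255)                    // scale pixel values to [0, 1]
          .reshape([65000, 28, 28, 1]);          // one 28x28x1 tensor per scan
      const ys = tf.tensor2d(labels, [65000, 10]);
      return {xs, ys};
    }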

The model

Create one or more models. A sketch then appears that visualizes the model architecture: vertical lines symbolize the intermediate states as the data flows through the neural net. The leftmost vertical line symbolizes the input (a batch of pictures); the rightmost vertical line symbolizes the output (the digits 0-9, to which probabilities are assigned). Horizontal lines symbolize "layers", i.e. transformations that are applied to the intermediate states.

The input (a tensor of shape [batch]x28x28x1) and the output are fixed. The output consists of a "flatten" layer that reduces the tensor to shape [batch]x[number of activations], followed by a fully connected layer with a softmax activation function that connects it to the 10 output nodes. Between these fixed parts, the model architecture can be modified.
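
In TensorFlow.js terms, the fixed parts could look roughly like this (a sketch; the app's actual builder code is not shown here):

    // Sketch of the fixed skeleton: input shape plus flatten/softmax head.
    const model = tf.sequential();
    model.add(tf.layers.inputLayer({inputShape: [28, 28, 1]}));  // fixed input
    // ... user-defined layers are inserted here ...
    model.add(tf.layers.flatten());  // fixed: [batch] x [number of activations]
    model.add(tf.layers.dense({units: 10, activation: 'softmax'}));  // fixed output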

Click here to create and customize an empty model.
Click here to create a simple model with two convolutional layers. (This is the model used by the Google TensorFlow.js team in their MNIST demo; a sketch of it follows below.)
A minor modification of the simple CNN model above; it removes max pooling in favor of stride-2 convolutions and introduces batch normalization. Apparently, this variant trains a bit quicker but is more prone to overfitting.
This variant carries the modifications a bit further, in order to mimic ensembling (where the predictions of multiple models are evaluated simultaneously) in an attempt to make the predictions more stable. Three versions of the two convolutional layers are addressed in parallel. Dropout (with p=10%) is used to add noise to the respective inputs to decouple these paths as far as possible.
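
For reference, the simple two-convolution model looks roughly as follows. This sketch is based on the published tfjs-examples MNIST model; this app's preset may differ in details such as initializers.

    // Sketch of the simple CNN (after the Google TensorFlow.js MNIST demo).
    const model = tf.sequential();
    model.add(tf.layers.conv2d({
      inputShape: [28, 28, 1], kernelSize: 5, filters: 8, activation: 'relu',
    }));
    model.add(tf.layers.maxPooling2d({poolSize: 2, strides: 2}));
    model.add(tf.layers.conv2d({kernelSize: 5, filters: 16, activation: 'relu'}));
    model.add(tf.layers.maxPooling2d({poolSize: 2, strides: 2}));
    model.add(tf.layers.flatten());
    model.add(tf.layers.dense({units: 10, activation: 'softmax'}));

The stride-2 variant described above would drop the maxPooling2d layers, set strides: 2 in the conv2d configs, and add tf.layers.batchNormalization() after each convolution.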

Train the model(s)

For performance reasons, only 10% of the data is used for training in each epoch. Each time the training button is hit, the models are trained for 10 epochs. Data is processed in batches of size 512 divided by the number of models. There are two sets of plots:

The "loss" shown in the uppermost plots is "categoricalCrossentropy". This is the function that is used by the numerical optimizer in order to adjust the weights of the model. The lower plots show the metric "accuracy", i.e. the fraction of correctly predicted digits, comparing the digit with the highest predicted probability with the ground truth label. (These visualizations are provided by the tfjs-vis library.)

Inference: Apply the model(s) to new data

Draw a digit into the canvas and press "Start inference" to see the predictions of the models. Intermediate activations can be seen by hovering the mouse over intermediate states in the schematic drawings of the model architectures.
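
Under the hood, turning the canvas drawing into a prediction could look roughly like this (a sketch; the app's preprocessing, e.g. normalization or centering of the drawing, may differ):

    // Sketch: read the canvas, shape it like an MNIST scan, and predict.
    function predictDigit(model, canvas) {
      return tf.tidy(() => {
        const input = tf.image.resizeBilinear(
            tf.browser.fromPixels(canvas, 1),  // grayscale [height, width, 1]
            [28, 28])                          // match the MNIST resolution
            .div(255)                          // scale pixel values to [0, 1]
            .expandDims(0);                    // add batch dim: [1, 28, 28, 1]
        const probs = model.predict(input);    // [1, 10] softmax probabilities
        return probs.argMax(-1).dataSync()[0]; // digit with the highest probability
      });
    }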

Obviously, the input used during inference is not "drawn from the same distribution" as the input used during training, so a somewhat worse performance is to be expected. If the results are not consistently of high quality, please consider training the model(s) for more epochs.

About

Created in 2020 by Dr. F. Schirmeier.
 
Please use a modern browser (IE is not supported).
 
Based on the demo app provided by the Google TensorFlow team. This already contains:

The contribution of the present app is