A step-by-step tutorial
- Environment Setup
- Building the neural network
  - Define the layers
- Test set performance
Both R and Python are useful and popular tools for Data Science. However, when it comes to Deep Learning, it is most common to find tutorials and guides for Python rather than R.
This post provides a simple Deep Learning example in the R language. It aims at sharing a practical introduction to the subject for R practitioners, using Keras.
In this example, we share code snippets that can be easily copied and pasted on Google Colab¹.
Colab allows anyone to create notebooks in Python or R by writing code through the browser, entirely for free.
We can create a new R notebook in Colab through this link². From there, we install Keras as follows:
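The original installation snippet is not shown here; a minimal sketch of the step it describes would be:

```r
# Install the keras R package; the tensorflow R package is
# pulled in automatically as a dependency
install.packages("keras")

# Outside Colab, the Python backend may also need to be installed:
# keras::install_keras()
```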
Although we leverage Colab for simplicity, the local installation process is equally straightforward³. We can now import the needed libraries:
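The import step can be sketched as follows (loading keras is the minimum required for the rest of the tutorial; loading tensorflow as well gives direct access to the backend):

```r
# keras is the main modelling interface;
# tensorflow exposes the underlying backend
library(keras)
library(tensorflow)
```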
install.packages("keras") also installs TensorFlow. It is possible to check the available versions of Keras and TensorFlow from the installed packages list:
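One way this check might look (packageVersion() is base R; tf_version() comes from the tensorflow package):

```r
# Versions of the installed R packages
packageVersion("keras")
packageVersion("tensorflow")

# Version of the TensorFlow backend itself
tensorflow::tf_version()
```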
Our purpose is to classify images of handwritten digits. For this example, we use the MNIST⁴ dataset, a classic of the Machine Learning community.
The dataset has the following characteristics:
- 60,000 training images and 10,000 test images.
- Images of size 28 x 28 pixels.
- 10 categories (digits from 0 to 9).
- Grayscale images: pixel values range between 0 (black) and 255 (white).
Neural Networks require data in the shape of tensors. Tensors are algebraic objects with an arbitrary number of dimensions (D). For example, we can see vectors as 1D tensors and matrices as 2D tensors.
In the case of images, we need a vector space able to convey:
- Number of images (N)
- Image height (H)
- Image width (W)
- Color channels (C), also known as color depth.
Therefore, in Deep Learning tasks images are generally represented as 4D tensors with shape: N x H x W x C.
In the case of grayscale images, the color channel is a single number (from 0 to 255) for each pixel. Hence, it is possible to either omit the channel axis or set it equal to one.
Let us import the MNIST dataset from Keras (under the Apache 2.0 License⁵) and verify the shape of the training and test images:
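The loading step can be sketched as follows (the variable names are our choice, not taken from the post):

```r
# dataset_mnist() downloads MNIST on first use and
# returns the train/test splits as R arrays
mnist <- dataset_mnist()
train_images <- mnist$train$x
train_labels <- mnist$train$y
test_images  <- mnist$test$x
test_labels  <- mnist$test$y

dim(train_images)  # 60000 28 28
dim(test_images)   # 10000 28 28
```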
The representation of input observations as tensors applies to any data type. For example, tabular data in the form of a csv file with 300 rows (samples) and 8 columns (features) can be seen as 2D tensors of shape 300 x 8.
We can have a look at some samples by their corresponding label:
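A minimal way to do this with base R graphics, assuming the training data was loaded into train_images and train_labels as above:

```r
# Plot the first nine training digits with their labels
par(mfrow = c(3, 3), mar = c(0, 0, 2, 0))
for (i in 1:9) {
  img <- train_images[i, , ]
  # image() draws matrices rotated; flip the rows so the digit appears upright
  image(t(img[nrow(img):1, ]), col = gray.colors(255, start = 0, end = 1),
        axes = FALSE, main = train_labels[i])
}
```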
As color values are in the [0, 255] interval, we can scale them to be in the [0, 1] interval. Moreover, we can reshape the input by flattening images from a 2D 28 x 28 to 1D 784 (28*28) without information loss:
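A sketch of both preprocessing steps, assuming the variable names introduced above:

```r
# Flatten each 28 x 28 image into a 784-dimensional vector (row-major)
train_images <- array_reshape(train_images, c(60000, 28 * 28))
test_images  <- array_reshape(test_images, c(10000, 28 * 28))

# Scale pixel values from [0, 255] to [0, 1]
train_images <- train_images / 255
test_images  <- test_images / 255
```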
Note: Keras (like other common libraries) reshapes arrays by filling new axes in row-major order (C-style). This is the behaviour of array_reshape(). R practitioners may be more familiar with dim<-() for manipulating array shapes. Nevertheless, dim<-() fills new axes in column-major order (Fortran-style):
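The difference is easy to see on a small vector. Here matrix(..., byrow = TRUE) reproduces the row-major fill performed by array_reshape(x, c(2, 3)), so the whole comparison runs in base R:

```r
x <- 1:6

# Row-major fill, as done by array_reshape(x, c(2, 3)):
# new axes are filled row by row
row_major <- matrix(x, nrow = 2, byrow = TRUE)
row_major
#      [,1] [,2] [,3]
# [1,]    1    2    3
# [2,]    4    5    6

# Column-major fill, as done by dim<-():
# new axes are filled column by column
col_major <- x
dim(col_major) <- c(2, 3)
col_major
#      [,1] [,2] [,3]
# [1,]    1    3    5
# [2,]    2    4    6
```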
The labels must be converted from a vector with integers (each integer representing a category) into a matrix with binary values and columns equal to the number of categories:
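In Keras this conversion is done with to_categorical() (variable names as above):

```r
# One-hot encode the labels:
# integer label k becomes a row of zeros with a 1 in column k + 1
train_labels <- to_categorical(train_labels, num_classes = 10)
test_labels  <- to_categorical(test_labels, num_classes = 10)

dim(train_labels)  # 60000 10
```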
5.1 Define the layers
The cornerstone of a neural network is the layer. We can imagine the layer as a module that extracts a representation of the input data that is useful to the final goal.
We can build a neural network by stacking layers sequentially. Keras allows us to do this with keras_model_sequential(). In this example, we create a network composed of three layers:
- A fully connected (or dense) layer that produces an output space of 512 units.
- A dropout layer that randomly “drops out” 20% of the neurons during training. In brief, this technique aims to improve the generalization capabilities of the model.
- A final dense layer with an output of 10 units and a softmax activation function. This layer returns an array of 10 probability scores, each being the probability that the current image represents a 0, 1, 2, … up to 9:
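The three layers described above can be sketched as follows. The layer sizes and the dropout rate follow the text; the relu activation of the hidden layer is our assumption, as the post does not state it:

```r
model <- keras_model_sequential() %>%
  layer_dense(units = 512, activation = "relu", input_shape = c(28 * 28)) %>%
  layer_dropout(rate = 0.2) %>%
  layer_dense(units = 10, activation = "softmax")
```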
We can inspect the model’s structure as follows:
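With the Keras R interface, this is a single call (assuming the model object defined above):

```r
# Prints each layer with its output shape and parameter count
summary(model)
```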
In the compilation step, we need to define:
- A loss function:
  - It must provide a reasonable estimate of the model error.
  - The network tries to minimize this function during training.
- An optimizer:
  - It specifies how the weights of the model get updated during training.
  - It makes use of the gradient of the loss function.
- An array of metrics to monitor during the training procedure.
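A typical compilation call for this setup might look like the following. Categorical cross-entropy and rmsprop are standard choices for this task, not taken from the post:

```r
model %>% compile(
  loss = "categorical_crossentropy",  # estimate of the model error to minimize
  optimizer = "rmsprop",              # weight-update rule using loss gradients
  metrics = c("accuracy")             # monitored during training
)
```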
Fitting the model means finding a set of parameters that minimizes the loss function during training.
Input data is not processed as a whole: the model iterates over the training data in batches, each of size batch_size. A full pass over all the training data is called an epoch, and we must declare the number of epochs when fitting the model. Batch after batch, the network updates its weights to minimize the loss.
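A fitting call consistent with the description above (the epoch count and batch size are illustrative values, not taken from the post; the data objects are those prepared earlier):

```r
history <- model %>% fit(
  train_images, train_labels,  # preprocessed inputs and one-hot labels
  epochs = 5,                  # full passes over the training data
  batch_size = 128             # samples processed per weight update
)
```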
After training an estimator, it is good practice to assess its performance on out-of-sample data. We can measure the accuracy (fraction of handwritten digits correctly classified) on the test set:
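In Keras this is done with evaluate(), here assuming the test objects prepared earlier:

```r
# Returns the test loss and the metrics declared at compile time
metrics <- model %>% evaluate(test_images, test_labels)
metrics
```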
Let us have a look at some misclassified digits:
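One way to find and display them, assuming the mnist, model, and test_images objects from the previous steps (since test_labels was one-hot encoded, we reuse the raw integer labels in mnist$test$y):

```r
# Predicted class for each test image (columns 1..10 map to digits 0..9)
pred_probs <- model %>% predict(test_images)
predicted  <- apply(pred_probs, 1, which.max) - 1

actual <- mnist$test$y
wrong  <- which(predicted != actual)

# Display the first nine misclassified digits
par(mfrow = c(3, 3), mar = c(0, 0, 2, 0))
for (i in wrong[1:9]) {
  img <- mnist$test$x[i, , ]
  image(t(img[nrow(img):1, ]), col = gray.colors(255, start = 0, end = 1),
        axes = FALSE,
        main = paste0("pred ", predicted[i], " / true ", actual[i]))
}
```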
In this post, we created a simple neural network in Keras, sharing along the way an introduction to Deep Learning concepts. In particular, we used the R language, which is generally less common than Python in Deep Learning tutorials and guides.
Notably, two of the most popular Deep Learning frameworks, Torch and TensorFlow, support R as well.
It is possible to find more information and examples here:
- TensorFlow for R⁶
- Torch for R⁷
- Francois Chollet, J.J. Allaire, “Deep Learning with R”, Manning, 2018⁸.