
## Can we use PyTorch in Rust? What are Rust bindings? What’s tch-rs? A look at neural networks in Rust

It’s been a while since we last had a look at Rust and its application to Machine Learning (please scroll down to the bottom for the previous tutorials on ML and Rust). Today I would like to take a step forward and introduce neural networks in Rust. There is a Rust Torch, which allows us to create any kind of neural network we want. Bindings are the key to getting a Rust Torch. Bindings allow the creation of *foreign function interfaces*, or FFIs, which build a bridge between Rust and functions written in another language. Good examples can be found in the Rustonomicon.

To create bindings with C and C++ we can use bindgen, a library that automatically generates Rust FFI bindings. Starting from bindings to the C++ API of PyTorch, Laurent Mazare has given the Rust community a Rustacean version of PyTorch. As the GitHub page says, tch provides thin wrappers around the C++ libtorch. The big advantage is that the library stays very close to the original one, so there are no learning barriers to overcome. The core code is quite easy to read.

First of all, let’s have a look at the code. This is the best starting point for a deeper understanding of the Rust infrastructure.

To get an idea of Rust FFI we can peek at these files. Most of them are automatically generated, while Laurent and coworkers have put together magnificent pieces of code to connect the C++ Torch APIs with Rust.

Next, we can start reading the core code in `src`; in particular, let’s have a look at `init.rs`. After the definition of `enum Init` there is a public function, `pub fn f_init`, which matches on the requested initialisation method and returns the initialised tensor, used for both weights and biases. Here we can learn the use of `match`, which corresponds to `switch` in C and `match` in Python 3.10. Weight and bias tensors are initialised through random, uniform, Kaiming, or orthogonal methods (fig.1).
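
To make the `match` idea concrete, here is a minimal sketch that runs without tch. The enum variants, the `init` function and the evenly spaced fill (a deterministic placeholder for a random draw) are all assumptions made for illustration; they are not the actual definitions in `init.rs`.

```rust
// Simplified stand-in for the idea behind `Init` and `f_init`; names and
// variants here are illustrative, not tch's actual definitions.
#[derive(Debug, Clone, Copy)]
enum Init {
    /// Fill every element with the same constant.
    Const(f64),
    /// Fill with evenly spaced values in [lo, up); a deterministic
    /// placeholder for a random uniform draw.
    Spaced { lo: f64, up: f64 },
}

/// Build a flat "tensor" of `len` elements according to the chosen method.
fn init(method: Init, len: usize) -> Vec<f64> {
    match method {
        Init::Const(value) => vec![value; len],
        Init::Spaced { lo, up } => {
            let step = (up - lo) / len as f64;
            (0..len).map(|i| lo + step * i as f64).collect()
        }
    }
}

fn main() {
    let bias = init(Init::Const(0.0), 3);
    let weights = init(Init::Spaced { lo: -1.0, up: 1.0 }, 4);
    println!("bias = {:?}", bias);
    println!("weights = {:?}", weights);
}
```

Unlike a C `switch`, the compiler checks that every variant is handled, so adding a new initialisation method without a matching arm is a compile-time error.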

Then, for the type `enum Init` we have the method implementations in `impl Init`. The implemented method is a setter, `pub fn set(self, tensor: &mut Tensor)`, which is a great example to further appreciate the concepts of ownership and borrowing in Rust.

We talked about borrowing in our very first tutorial. It’s the right time to understand this concept better. Suppose we had a similar `set` function:

`pub fn set(self, tensor: Tensor){}`

In the main code, we could call this function, passing a `Tensor`. The `Tensor` would be set and we would be happy. However, what if we called `set` on the same `Tensor` again? Well, we would run into the error `value used here after move`. What does this mean? This error is telling you that you moved the `Tensor` into `set`. *A **move** means that you have transferred ownership*, here to the `tensor` parameter of `set`. When you call `set(self, tensor: Tensor)` again, you would need ownership of the `Tensor` back in order to set it again. Luckily, in Rust this is not possible, unlike in C++. In Rust, *once a **move** has been done the original variable can no longer be used, and the value is dropped as soon as its new owner goes out of scope*, in this case at the end of `set`. Thus, what we want to do here is to *lend* the `Tensor` to `set`, so that we keep ownership. To do that we need to pass the `Tensor` by reference, so `tensor: &Tensor`. Since we are expecting the `Tensor` to mutate, we’ll have to add `mut`, so: `tensor: &mut Tensor`.
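
The difference between moving and borrowing can be tried out with a small sketch. `Tensor` and `Init` below are tiny stand-ins invented for this example (they are not the tch types), and `set_by_move` is a hypothetical name for the by-value variant discussed above.

```rust
// `Tensor` and `Init` are tiny stand-ins so the example runs without tch.
#[derive(Debug)]
struct Tensor {
    data: Vec<f32>,
}

#[derive(Clone, Copy)]
struct Init {
    value: f32,
}

impl Init {
    /// Takes the tensor by value: the caller gives up ownership.
    fn set_by_move(self, mut tensor: Tensor) {
        tensor.data.iter_mut().for_each(|x| *x = self.value);
        // `tensor` is dropped here, so the caller never sees the result.
    }

    /// Borrows the tensor mutably: the caller keeps ownership.
    fn set(self, tensor: &mut Tensor) {
        tensor.data.iter_mut().for_each(|x| *x = self.value);
    }
}

fn main() {
    let init = Init { value: 0.5 };

    let moved = Tensor { data: vec![0.0; 3] };
    init.set_by_move(moved);
    // init.set_by_move(moved); // error[E0382]: use of moved value: `moved`

    let mut kept = Tensor { data: vec![0.0; 3] };
    init.set(&mut kept);
    init.set(&mut kept); // fine: `kept` was only borrowed
    println!("{:?}", kept);
}
```

Uncommenting the second `set_by_move` call should reproduce the `value used here after move` diagnostic.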

Moving forward, we can see another important element, which is simple and makes use of the `Init` enum: `Linear`, namely a fully connected neural network layer.

Fig. 3 shows how easy it is to set up a fully connected layer, which is made of a weight matrix and a bias, initialised through `ws_init` and `bs_init`. The default initialisation for the weights is `super::Init::KaimingUniform`, one of the methods we saw above.

The main fully connected layer can then be created with the function `linear`. As you can see in the function signature, namely what’s between the `<...>`, there are a few interesting things (fig.4). Firstly, the *lifetime annotation* `'a`. As we said above, Rust automatically recognises when a variable has gone out of scope and can be freed. *We can annotate references with a named lifetime*, which describes how long they must stay valid rather than changing how long anything lives. The standard annotation is `'a`, where `'` denotes a lifetime parameter. One important thing to remember is that this annotation doesn’t modify anything within the function; it tells the borrow checker to accept only those arguments whose lifetimes satisfy the constraints we are imposing.

The second generic parameter is `T: Borrow<super::Path<'a>>`. This bound means: `T` can be any type that can be borrowed as the `nn::Path` defined in `var_store.rs`, for example a `Path` itself or a reference to one. Any type in Rust is free to borrow as several different types. This value carries the information about where the variables live, including the hardware (e.g. GPU), as you can see with `vs: T`. Finally, the input and output dimensions of the layer are specified as integers, `in_dim: i64, out_dim: i64`, along with the `LinearConfig` for the initialisation of weight and bias, `c: LinearConfig`.
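
The shape of that signature can be reproduced in a few lines of plain Rust. `Path`, `Linear` and their fields below are simplified stand-ins assumed for the example, and the `LinearConfig` argument is left out to keep it short; only the `<'a, T: Borrow<Path<'a>>>` part mirrors what is described above.

```rust
use std::borrow::Borrow;

// Stand-in for `nn::Path`: it borrows its name, hence the lifetime `'a`.
struct Path<'a> {
    name: &'a str,
}

// Stand-in for the layer returned by `linear`.
struct Linear {
    label: String,
    in_dim: i64,
    out_dim: i64,
}

/// Same shape of signature as described above: `T` may be a `Path<'a>` or
/// anything that can be borrowed as one, such as `&Path<'a>`.
fn linear<'a, T: Borrow<Path<'a>>>(vs: T, in_dim: i64, out_dim: i64) -> Linear {
    let path: &Path<'a> = vs.borrow();
    Linear { label: format!("{}/linear", path.name), in_dim, out_dim }
}

fn main() {
    let root = Path { name: "root" };
    let by_ref = linear(&root, 784, 10); // `&Path` implements Borrow<Path>
    let by_value = linear(root, 784, 10); // `Path` implements Borrow<Path>
    println!("{} {}x{}", by_ref.label, by_ref.in_dim, by_ref.out_dim);
    println!("{} {}x{}", by_value.label, by_value.in_dim, by_value.out_dim);
}
```

The `Borrow` bound is what lets the caller pass either an owned path or a reference without two separate functions.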

It’s time to get our hands dirty and play with Torch Rust. Let’s set up a simple linear neural network, then a sequential network, and finally a convolutional neural network using the MNIST dataset. As always you can find all the materials on my ML ❤ Rust repo. Yann LeCun and Corinna Cortes hold the copyright of the MNIST dataset, which has been made available under the terms of the Creative Commons Attribution-Share Alike 3.0 license.

## A simple neural network in Rust

As always, the first step for a new Rust project is `cargo new NAME_OF_THE_PROJECT`, in this case `simple_neural_networks`. Then, we can start setting up `Cargo.toml` with all the packages we need: we’ll be using `mnist`, `ndarray` and obviously `tch` (fig.5). I decided to use `mnist` to extract the original MNIST data, so we can see how to transform and deal with arrays and tensors. Feel free to use the `vision` resource already present in `tch`.

We’ll be using `mnist` to download the MNIST dataset, and `ndarray` to perform some transforms on the image vectors and convert them into `tch::Tensor`.

Let’s jump to the `main.rs` code. In a nutshell, we need:

- to download and extract the MNIST images and return a vector for training, validation, and test data.
- From these vectors, we’ll have to perform some conversion to `Tensor` so we’ll be able to use `tch`.
- Finally, we’ll implement a series of epochs: in each epoch we’ll multiply the input data with the neural network weight matrix and we’ll perform backpropagation to update the weight values.

`mnist` automatically downloads the input files from here. We need to add `features = ['download']` in `Cargo.toml` to activate the download functionality. After the files have been downloaded, the raw data is extracted with `download_and_extract()` and subdivided into training, validation and test sets. Note that the `main` function returns no value on success but can fail, so you need to specify `-> Result<(), Box<dyn Error>>` as its return type and put `Ok(())` at the end of the code (fig.6).
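
As a minimal illustration of that return type, the sketch below uses a hypothetical helper, `parse_side`, in place of the download step; it only assumes the standard library.

```rust
use std::error::Error;

/// Hypothetical helper that can fail: parse an image side length.
fn parse_side(text: &str) -> Result<usize, Box<dyn Error>> {
    let side: usize = text.trim().parse()?; // `?` returns early on a parse error
    Ok(side)
}

fn main() -> Result<(), Box<dyn Error>> {
    let height = parse_side("28")?;
    let width = parse_side(" 28 ")?;
    println!("pixels per image: {}", height * width);
    Ok(()) // success carries no value, hence the unit type `()`
}
```

Because the error type is a boxed trait object, any error implementing `std::error::Error` can be propagated with `?` from inside `main`.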

Now, the very first Torch thing of the code: convert an array to `Tensor`. The output data from `mnist` is `Vec<u8>`. The training vector holds `TRAIN_SIZE` images, whose dimensions are `HEIGHT` times `WIDTH`. These three parameters can be specified as `usize` and, together with the input data vector, they can be passed to the `image_to_tensor` function, as shown in fig.7, returning a `Tensor`.

The input `Vec<u8>` data can be reshaped to `Array3` with `from_shape_vec`, and values are normalised and converted to `f32`, namely `.map(|x| *x as f32/256.0)`. From an array it is easy to build a torch Tensor, as shown on line 14: `Tensor::of_slice(inp_data.as_slice().unwrap());`. The output tensor size will be `dim1 x (dim2*dim3)`. For our training data, setting `TRAIN_SIZE=50'000`, `HEIGHT=28` and `WIDTH=28`, the output training tensor size will be `50'000 x 784`.
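
The reshaping and normalisation can be sketched without `ndarray` or `tch`. In the example below plain vectors stand in for `Array3` and `Tensor`, and `images_to_rows` is a name made up for this illustration, not the tutorial’s `image_to_tensor`.

```rust
/// Turn a flat buffer of `n_images * height * width` bytes into one row of
/// normalised `f32` pixels per image. Plain vectors stand in for the
/// `Array3` and `Tensor` types used in the tutorial.
fn images_to_rows(data: &[u8], n_images: usize, height: usize, width: usize) -> Vec<Vec<f32>> {
    let pixels = height * width;
    assert_eq!(data.len(), n_images * pixels, "buffer size does not match the shape");
    data.chunks(pixels)
        .map(|image| image.iter().map(|&x| x as f32 / 256.0).collect())
        .collect()
}

fn main() {
    // Two fake 2x2 "images" instead of 50'000 images of 28x28 pixels.
    let raw: Vec<u8> = vec![0, 64, 128, 255, 255, 128, 64, 0];
    let rows = images_to_rows(&raw, 2, 2, 2);
    println!("{} rows of {} values", rows.len(), rows[0].len());
    println!("{:?}", rows[0]);
}
```

With the real data, the same call shape would give 50'000 rows of 784 values each.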

Similarly, we’ll convert the labels to a tensor, whose size will be `dim1`, so for the training labels we’ll have a `50'000` long tensor: https://github.com/Steboss/ML_and_Rust/blob/aa7d495c4a2c7a416d0b03fe62e522b6225180ab/tutorial_3/simple_neural_networks/src/main.rs#L42

We’re now ready to start tackling the linear neural network. After a zero-initialization of the weight and bias matrices:

    use tch::{kind, Tensor};

    // weights: (HEIGHT*WIDTH) x LABELS, bias: LABELS, both tracked for gradients
    let mut ws = Tensor::zeros(&[(HEIGHT*WIDTH) as i64, LABELS], kind::FLOAT_CPU).set_requires_grad(true);
    let mut bs = Tensor::zeros(&[LABELS], kind::FLOAT_CPU).set_requires_grad(true);

which resembles the PyTorch implementation, we can start computing the neural network weights.

Fig.8 shows the main routine to run the training of a linear neural network. Firstly, we can give a name to the outermost for loop with `'train`. The apostrophe, in this case, is not an indicator of a lifetime but of a loop label. We are monitoring the loss for each epoch. If the difference between two consecutive losses is less than `THRES`, we can stop the outermost loop as we have reached convergence (you can disagree, but for the moment let’s keep it 🙂). The entire implementation is super simple to read, with just a little caveat in extracting the accuracy from the computed `logits`, and the job is done 🙂
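
The labelled loop and the convergence test can be tried on a toy problem. The sketch below fits a straight line with hand-written gradients instead of `tch` autograd; the constants, the data and the extra divergence check are assumptions made for the example, not the tutorial’s values.

```rust
const THRES: f64 = 1e-9;
const MAX_EPOCHS: usize = 10_000;
const LEARNING_RATE: f64 = 0.05;

/// Fit y = w * x + b by gradient descent; returns (w, b, epochs run).
fn train(xs: &[f64], ys: &[f64]) -> (f64, f64, usize) {
    let n = xs.len() as f64;
    let (mut w, mut b) = (0.0_f64, 0.0_f64);
    let mut prev_loss = f64::INFINITY;
    let mut epochs_run = 0;

    'train: for epoch in 1..=MAX_EPOCHS {
        let (mut loss, mut grad_w, mut grad_b) = (0.0_f64, 0.0_f64, 0.0_f64);
        for (&x, &y) in xs.iter().zip(ys) {
            let err = w * x + b - y;
            if !err.is_finite() {
                // The label lets us leave the outer loop from the inner one.
                break 'train;
            }
            loss += err * err / n;
            grad_w += 2.0 * err * x / n;
            grad_b += 2.0 * err / n;
        }
        epochs_run = epoch;
        // Convergence test on two consecutive losses.
        if (prev_loss - loss).abs() < THRES {
            break 'train;
        }
        prev_loss = loss;
        w -= LEARNING_RATE * grad_w;
        b -= LEARNING_RATE * grad_b;
    }
    (w, b, epochs_run)
}

fn main() {
    // Points on the line y = 2x + 1.
    let (w, b, epochs) = train(&[0.0, 1.0, 2.0, 3.0], &[1.0, 3.0, 5.0, 7.0]);
    println!("w = {:.4}, b = {:.4} after {} epochs", w, b, epochs);
}
```

A small loss difference only says that training has stalled, not that the model is good, which is why the stopping rule above is open to debate.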

When you are ready you can directly run the entire `main.rs` code with `cargo run`. On my 2019 MacBook Pro (2.6 GHz 6-core Intel Core i7, 16 GB RAM), the computation takes less than a minute, achieving a test accuracy of 90.45% after 65 epochs.

## Sequential neural network

Let’s now see the sequential neural network implementation https://github.com/Steboss/ML_and_Rust/tree/master/tutorial_3/custom_nnet

Fig.9 explains how the sequential network is created. Firstly, we need to import `tch::nn::Module`. Then we can create a function for the neural network, `fn net(vs: &nn::Path) -> impl Module`. This function returns an implementation of `Module` and receives as input an `nn::Path`, which carries structural info about where the variables are stored, including the hardware used for running the network (e.g. CPU or GPU). Then, the sequential network is implemented as a combination of a linear layer with `IMAGE_DIM` inputs and `HIDDEN_NODES` outputs, a `relu`, and a final linear layer with `HIDDEN_NODES` inputs and `LABELS` outputs.
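
What a linear, `relu`, linear stack computes can be shown with a plain-Rust forward pass. The `Linear`, `Layer` and `Sequential` types and the `tiny_net` weights below are made up for illustration; they mimic the idea of chaining layers and are not tch’s `nn` types.

```rust
// Plain-Rust stand-in for a `linear -> relu -> linear` stack; the types
// below are illustrative and are not tch's `nn` types.
struct Linear {
    weights: Vec<Vec<f32>>, // one row of input weights per output node
    bias: Vec<f32>,
}

enum Layer {
    Linear(Linear),
    Relu,
}

struct Sequential {
    layers: Vec<Layer>,
}

impl Sequential {
    /// Feed the input through every layer in order.
    fn forward(&self, input: &[f32]) -> Vec<f32> {
        let mut xs = input.to_vec();
        for layer in &self.layers {
            xs = match layer {
                Layer::Linear(l) => l
                    .weights
                    .iter()
                    .zip(&l.bias)
                    .map(|(row, b)| row.iter().zip(&xs).map(|(w, x)| w * x).sum::<f32>() + b)
                    .collect(),
                Layer::Relu => xs.iter().map(|x| x.max(0.0)).collect(),
            };
        }
        xs
    }
}

/// A 2 -> 2 -> 1 network with hand-picked weights.
fn tiny_net() -> Sequential {
    Sequential {
        layers: vec![
            Layer::Linear(Linear {
                weights: vec![vec![1.0, -1.0], vec![-1.0, 1.0]],
                bias: vec![0.0, 0.0],
            }),
            Layer::Relu,
            Layer::Linear(Linear { weights: vec![vec![1.0, 1.0]], bias: vec![0.5] }),
        ],
    }
}

fn main() {
    // hidden = [2, -2] -> relu -> [2, 0] -> output = 2 + 0 + 0.5
    println!("{:?}", tiny_net().forward(&[3.0, 1.0]));
}
```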

Thus, in the main code we’ll call the neural network creation as:

    use tch::{nn, nn::OptimizerConfig, Device};

    // set up the variable store, on the GPU if cuda is available
    let vs = nn::VarStore::new(Device::cuda_if_available());
    // set up the seq net
    let net = net(&vs.root());
    // set up optimizer
    let mut opt = nn::Adam::default().build(&vs, 1e-4)?;

along with an Adam optimizer. Remember the `?` at the end of the `opt` line, otherwise you’ll be left with a `Result<>` type, which doesn’t have the functionality we need. At this point we can simply follow the same procedure as in PyTorch: we’ll set up a number of epochs and perform the backpropagation with the optimizer’s `backward_step` method on a given `loss`.
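
To see what a single `backward_step`-style call bundles, here is a hand-written sketch on one parameter. It uses plain gradient descent rather than Adam, and an analytic gradient rather than autograd; the `Sgd` struct and its methods are stand-ins written for this example, not the tch optimizer.

```rust
// Hand-written stand-in for an optimiser on a single parameter, minimising
// loss(w) = (w - 3)^2. Gradients are computed analytically, not by autograd.
struct Sgd {
    weight: f64,
    grad: f64,
    learning_rate: f64,
}

impl Sgd {
    fn loss(&self) -> f64 {
        (self.weight - 3.0).powi(2)
    }
    /// Reset the accumulated gradient.
    fn zero_grad(&mut self) {
        self.grad = 0.0;
    }
    /// Accumulate d(loss)/d(weight) = 2 * (w - 3).
    fn backward(&mut self) {
        self.grad += 2.0 * (self.weight - 3.0);
    }
    /// Move the weight against the gradient.
    fn step(&mut self) {
        self.weight -= self.learning_rate * self.grad;
    }
    /// The three calls above bundled into one.
    fn backward_step(&mut self) {
        self.zero_grad();
        self.backward();
        self.step();
    }
}

fn main() {
    let mut explicit = Sgd { weight: 0.0, grad: 0.0, learning_rate: 0.1 };
    let mut bundled = Sgd { weight: 0.0, grad: 0.0, learning_rate: 0.1 };
    for _epoch in 0..100 {
        explicit.zero_grad();
        explicit.backward();
        explicit.step();
        bundled.backward_step();
    }
    println!("explicit: w = {:.6}, loss = {:.3e}", explicit.weight, explicit.loss());
    println!("bundled:  w = {:.6}, loss = {:.3e}", bundled.weight, bundled.loss());
}
```

Forgetting `zero_grad` in the explicit route would make gradients pile up across epochs, which is the mistake the bundled call protects against.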

## Convolutional neural network

Our final step for today is dealing with a convolutional neural network: https://github.com/Steboss/ML_and_Rust/tree/master/tutorial_3/conv_nnet/src

At first, you can notice we are now using `nn::ModuleT`. This trait is a variant of `Module` whose forward step takes an additional `train` parameter, which is commonly used to differentiate the behaviour of the network between training and evaluation. Then, we can start defining the structure of the network, `Net`, which is made of two conv2d layers and two linear ones. The implementation of `Net` states how the network is built: the two convolutional layers take 1 and 32 input channels, produce 32 and 64 output channels, and both use a kernel size of 5. The linear layers receive an input of 1024 and the final layer returns an output of 10 elements. Finally, we need to define the `ModuleT` implementation for `Net`. Here, the forward step `forward_t` receives an additional boolean argument, `train`, and returns a `Tensor`. The forward step applies the convolutional layers, along with `max_pool_2d` and `dropout`. The dropout step is just for training purposes, so it’s bound to the boolean `train`.
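
The 1024 inputs of the first linear layer can be checked with a little arithmetic. The sketch below reads the layer arguments as input channels, output channels and a kernel size of 5 (the argument order of tch’s `nn::conv2d`), and assumes a stride of 1, no padding, no dilation, and a 2x2 max-pool with stride 2 after each convolution; those last values are assumptions, since the text does not state them.

```rust
/// Output length of one spatial dimension after a convolution or pooling
/// window, using the usual formula
/// floor((n + 2p - d(k - 1) - 1) / s) + 1.
fn out_size(n: i64, kernel: i64, stride: i64, padding: i64, dilation: i64) -> i64 {
    (n + 2 * padding - dilation * (kernel - 1) - 1) / stride + 1
}

/// Flattened feature count after two (conv k=5 -> max-pool 2) stages.
fn flattened_features(side: i64, out_channels: i64) -> i64 {
    let conv1 = out_size(side, 5, 1, 0, 1);
    let pool1 = out_size(conv1, 2, 2, 0, 1);
    let conv2 = out_size(pool1, 5, 1, 0, 1);
    let pool2 = out_size(conv2, 2, 2, 0, 1);
    out_channels * pool2 * pool2
}

fn main() {
    // 28 -> 24 -> 12 -> 8 -> 4, and 64 channels * 4 * 4 = 1024
    println!("features into the first linear layer: {}", flattened_features(28, 64));
}
```

Under these assumptions the numbers line up with the 1024 inputs quoted above, which is a useful sanity check whenever the layer sizes are changed.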

To increase the training performance, we’ll train the convolutional network with batches from the input tensor. For this reason you need to implement a function that splits the input tensors into random batches.

`generate_random_index` takes the input image array and the batch size we want to split it into. It creates an output tensor of random integers with `::randint`.

Fig.13 shows the training step. The input dataset is split into `n_it` batches, where `let n_it = (TRAIN_SIZE as i64)/BATCH_SIZE;`. For each batch we compute the loss from the network and backpropagate the error with `backward_step`.
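
The batching logic can be sketched without any extra crate. The small xorshift generator below only replaces the random draw done with `randint` in the tutorial, and `BATCH_SIZE = 256`, `XorShift` and `random_batch_index` are assumptions chosen for the example.

```rust
/// Tiny xorshift generator, used only so the example needs no extra crate.
struct XorShift(u64);

impl XorShift {
    fn next_u64(&mut self) -> u64 {
        self.0 ^= self.0 << 13;
        self.0 ^= self.0 >> 7;
        self.0 ^= self.0 << 17;
        self.0
    }
}

/// Draw `batch_size` indices in `0..n_items`, with replacement.
fn random_batch_index(rng: &mut XorShift, n_items: usize, batch_size: usize) -> Vec<usize> {
    (0..batch_size)
        .map(|_| (rng.next_u64() % n_items as u64) as usize)
        .collect()
}

fn main() {
    const TRAIN_SIZE: usize = 50_000;
    const BATCH_SIZE: usize = 256; // assumed value for this sketch
    let n_it = TRAIN_SIZE / BATCH_SIZE; // integer division drops the remainder
    let mut rng = XorShift(0x2545_F491_4F6C_DD1D);
    for _ in 0..n_it {
        let batch = random_batch_index(&mut rng, TRAIN_SIZE, BATCH_SIZE);
        // Here the selected rows would go through the network, followed by
        // the loss computation and the optimizer step.
        assert_eq!(batch.len(), BATCH_SIZE);
    }
    println!("{} batches per epoch", n_it);
}
```

Since the indices are drawn independently, an image can appear more than once in an epoch and some may not appear at all; shuffling and slicing would be the alternative if every image must be seen exactly once.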

Running the convolutional network on my local laptop required a few minutes, achieving a validation accuracy of 97.60%.

You made it! I am proud of you! Today we had a little peek at `tch` and how to set up a few computer vision experiments. We saw the inner structure of the code for the initialization and the linear layer. We reviewed some important concepts about borrowing in Rust and we learned what a lifetime annotation is. Then, we jumped into the implementation of a simple linear neural network, a sequential neural network, and a convolutional one. Here we learned how to process input images and convert them to `tch::Tensor`. We saw how to use the trait `nn::Module` for a simple neural network to implement a forward step, and we also saw its extension `nn::ModuleT`. For all these experiments we saw two methods to perform backpropagation, either with `zero_grad` and `backward` or with `backward_step` applied directly to the optimizer.

I hope you enjoyed my tutorial 🙂 Stay tuned for the next episode.
