Model creation, deployment, and scoring
In this article, we will cover all the steps required to create, deploy, and consume a model in Azure Machine Learning Studio (AML Studio). I covered this topic in a previous series, but AML Studio has received many updates over the last year, including a new version of the Python SDK, so an update was due. The outline of this article is as follows: first, we give a general introduction to AML Studio; second, we create a compute and an environment; third, we create a datastore and upload the data to it. For the remainder of the article, we create a model, deploy it, and test it.
Azure Machine Learning Studio is the machine learning suite in Microsoft's Azure platform. It can be used to create, deploy, and consume models, includes versioning of both data and models, and offers both low-code and SDK options. In this article, we will go over the Python SDK (V2), but be sure to also check out the low-code and AutoML functionality of AML Studio.
AML Studio contains notebooks in which the SDK can be used to create a compute, an environment, and a model. To run a notebook, we first need a compute, which you can create using the Compute tab. We will also show how to create a new compute using the SDK, but you need an existing compute to run those commands in the first place. Notebooks can be created and edited in AML Studio directly, or you can connect to the workspace with Visual Studio Code if you prefer. Details on how to do this can be found here.
In this article, we will create notebooks in AML Studio itself. Before we start, there is a very convenient shortcut to run all cells in a notebook: Alt+R. Before we can create anything within the workspace, we will first need to authenticate to the workspace to get a workspace handle. This can be done by running the following script:
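The script itself is not reproduced in this text. A minimal sketch of what such an authentication cell might look like with SDK v2 follows, assuming the azure-ai-ml and azure-identity packages are installed. The workspace values are placeholders, and `get_workspace_handle` is a hypothetical helper name, not something from the original article.

```python
# Placeholder workspace details (assumptions): replace with your own values.
WORKSPACE = {
    "subscription_id": "<SUBSCRIPTION_ID>",
    "resource_group_name": "<RESOURCE_GROUP>",
    "workspace_name": "<AML_WORKSPACE_NAME>",
}


def get_workspace_handle(workspace=WORKSPACE):
    """Return an MLClient handle for the given workspace."""
    # Imported inside the function so that defining it does not require the SDK.
    from azure.ai.ml import MLClient
    from azure.identity import DefaultAzureCredential

    # DefaultAzureCredential tries several authentication methods in turn,
    # for example a managed identity or an Azure CLI login.
    credential = DefaultAzureCredential()
    return MLClient(credential, **workspace)


# In a notebook cell:
# ml_client = get_workspace_handle()
```

The resulting `ml_client` object is the workspace handle that the later steps pass around.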
This is required for all other notebooks we will create in this article, so always run this command first. Now let’s get started by creating a compute.
Create a Compute
A compute can be created with the UI; be sure to create a compute instance with the specifications you require. This compute can then be used to run notebooks, for instance. The following code snippet can be used to create a new compute instance using the Python SDK:
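A minimal sketch of creating a compute instance with SDK v2, assuming an authenticated `ml_client` workspace handle. The function name, instance name, and VM size are illustrative assumptions; choose a size that is available in your region.

```python
def create_compute_instance(ml_client, name="ci-example", size="Standard_DS3_v2"):
    """Create (or update) a single-node compute instance and wait for it."""
    from azure.ai.ml.entities import ComputeInstance

    instance = ComputeInstance(name=name, size=size)
    # begin_create_or_update returns a poller; result() blocks until
    # provisioning has finished.
    return ml_client.compute.begin_create_or_update(instance).result()


# In a notebook cell:
# create_compute_instance(ml_client)
```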
You can also create a compute cluster if you require more than one node. The code to do this is slightly different and can be found below:
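A comparable sketch for a compute cluster, again assuming an authenticated `ml_client`. The cluster name, VM size, node counts, and idle time are assumptions chosen for illustration.

```python
def create_compute_cluster(ml_client, name="cpu-cluster", size="Standard_DS3_v2",
                           min_instances=0, max_instances=4):
    """Create (or update) an autoscaling compute cluster and wait for it."""
    from azure.ai.ml.entities import AmlCompute

    cluster = AmlCompute(
        name=name,
        size=size,
        # With min_instances=0 the cluster scales down to zero nodes when idle.
        min_instances=min_instances,
        max_instances=max_instances,
        # Seconds a node may sit idle before it is released.
        idle_time_before_scale_down=180,
        tier="Dedicated",
    )
    return ml_client.compute.begin_create_or_update(cluster).result()
```

The main difference from a compute instance is the node range: the cluster scales between `min_instances` and `max_instances` depending on the submitted jobs.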
Create an Environment
In order to train and deploy a model, we also need an environment that specifies the packages to install. You can pin package versions to guarantee that your code runs as expected, and update packages only after you have tested the new versions. Environments in AML Studio are very similar to Conda environments, for instance. To create one, we first create a subdirectory ‘dependencies’. Then we create the .yml file that we need to train and deploy our model. Once this is done, we trigger the creation of the actual environment. Environments in AML Studio can be based on predefined images that are maintained by Microsoft. A detailed explanation is provided here.
The following code creates and registers an environment in AML Studio:
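A minimal sketch of these steps, assuming an authenticated `ml_client`. The package list, the environment name, and the base image tag are assumptions for illustration; pin the versions you have actually tested.

```python
from pathlib import Path

# Illustrative conda specification (assumption): add version pins as needed.
CONDA_YAML = """\
name: model-env
channels:
  - conda-forge
dependencies:
  - python=3.10
  - pip
  - pip:
      - mlflow
      - azureml-mlflow
      - scikit-learn
      - pandas
"""


def write_conda_file(directory="dependencies", filename="conda.yml"):
    """Create the dependencies folder and write the conda file into it."""
    folder = Path(directory)
    folder.mkdir(exist_ok=True)
    path = folder / filename
    path.write_text(CONDA_YAML)
    return path


def register_environment(ml_client, conda_path, name="model-env"):
    """Register an environment built from the conda file on a base image."""
    from azure.ai.ml.entities import Environment

    environment = Environment(
        name=name,
        description="Environment for training and deploying the model",
        conda_file=str(conda_path),
        # Base image maintained by Microsoft (assumed tag; check the current list).
        image="mcr.microsoft.com/azureml/openmpi4.1.0-ubuntu20.04:latest",
    )
    return ml_client.environments.create_or_update(environment)


# In a notebook cell:
# register_environment(ml_client, write_conda_file())
```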
In this article, we will use data from Kaggle to train a model. The dataset we use can be found here, provided by Ahsan Raza, and is available under a Creative Commons 4.0 License.
The task is to predict whether a customer will cancel their reservation. In order to use this dataset, we first upload it to a container in the blob storage connected to the AML workspace. Once this is done, we can create a datastore, if one does not exist already, by selecting the Data tab in AML Studio and registering the datastore using the UI. Now we can access the file by using the new AzureMachineLearningFileSystem class. For this, the full URI of the dataset is required, which can be obtained by selecting the datastore, selecting the file, and copying the URI.
The azureml-fsspec package is not installed by default, so before using it we need to install it with the %pip magic command. For opening the file, this yields the following code:
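A hedged sketch of reading the file, assuming the azureml-fsspec and pandas packages are installed. It follows the pattern of creating the filesystem from the datastore part of the URI and opening the file by its relative path; both helper names are hypothetical, and the URI is whatever you copied from the UI.

```python
def split_datastore_uri(full_uri):
    """Split a copied file URI into (datastore URI, path inside the datastore)."""
    datastore_uri, separator, relative_path = full_uri.partition("/paths/")
    if not separator:
        raise ValueError("expected a URI containing '/paths/'")
    return datastore_uri, relative_path


def read_csv_from_datastore(full_uri):
    """Load a CSV file from an AML datastore into a pandas DataFrame."""
    # In a notebook, install the package first: %pip install azureml-fsspec
    import pandas as pd
    from azureml.fsspec import AzureMachineLearningFileSystem

    datastore_uri, relative_path = split_datastore_uri(full_uri)
    fs = AzureMachineLearningFileSystem(datastore_uri)
    # Open the file relative to the datastore root.
    with fs.open("./" + relative_path) as f:
        return pd.read_csv(f)


# In a notebook cell, with the URI copied from the Data tab:
# df = read_csv_from_datastore("azureml://subscriptions/<...>/paths/<file>.csv")
```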
Create a Model
In order to create a model, we need a Main.py file that contains the training logic. For that, we first create a directory to store the file. Then we write the main file, similar to how we wrote the .yml file for the environment. Be sure to test all components before submitting the script as a job. That will save you a lot of time, because starting a job takes quite long, around 10 minutes.
First, we need a way to pass parameters to the training script; we will do this with the argparse module. For logging, we will use the mlflow module. Because autolog logs the trained model in MLflow format, we do not need to write a scoring script for deployment; it is generated automatically. With that out of the way, let’s build the actual model!
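A minimal sketch of the start of such a training script. The argument names (`--data`, `--test_train_ratio`, `--registered_model_name`) and the default ratio are assumptions for illustration, not necessarily the ones used in the original Main.py.

```python
import argparse


def parse_args(argv=None):
    """Parse the parameters that the job passes to the training script."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--data", type=str, help="path or URI of the input CSV file")
    parser.add_argument("--test_train_ratio", type=float, default=0.25)
    parser.add_argument("--registered_model_name", type=str,
                        help="name under which the model is registered")
    return parser.parse_args(argv)


def start_logging():
    """Start an MLflow run and log scikit-learn training automatically."""
    import mlflow.sklearn  # imported here so parse_args works without mlflow

    mlflow.start_run()
    # autolog records parameters, metrics, and the fitted model.
    mlflow.sklearn.autolog()
```

Passing `argv=None` makes argparse read the real command line, while a list of strings can be passed in when testing the script from a notebook.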
We will use a very basic sklearn setup to create the model. Be sure to install scikit-learn (the package behind the sklearn module) if you have not done so yet. First, we split the dataset into a training set and a test set. Because this is merely an example to show AML Studio, we will not use a validation set or a complicated model. The dataset contains both numerical and categorical features. Therefore, we will use the OneHotEncoder and StandardScaler from sklearn. We combine these in a pipeline with a Logistic Regression model. Creating a pipeline this way ensures that exactly the same transformations are applied to the training data and to the data that we send to the model later. After fitting the model and transformer, we save and register the model. This results in the following code:
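A minimal sketch of the modeling part, assuming pandas and scikit-learn are installed. The function names, the way columns are split into numerical and categorical, and the `max_iter` setting are assumptions; the target column is passed in rather than hard-coded.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler


def build_pipeline(numeric_features, categorical_features):
    """Scale numerical columns, one-hot encode the rest, then classify."""
    preprocessor = ColumnTransformer(
        transformers=[
            ("num", StandardScaler(), numeric_features),
            # Ignore categories at scoring time that were unseen in training.
            ("cat", OneHotEncoder(handle_unknown="ignore"), categorical_features),
        ]
    )
    return Pipeline(
        steps=[
            ("preprocess", preprocessor),
            ("classifier", LogisticRegression(max_iter=1000)),
        ]
    )


def train(df, target, test_size=0.25):
    """Fit the pipeline on a train split and return it with its test accuracy."""
    X = df.drop(columns=[target])
    y = df[target]
    numeric = X.select_dtypes(include="number").columns.tolist()
    categorical = [column for column in X.columns if column not in numeric]
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=test_size, random_state=42
    )
    model = build_pipeline(numeric, categorical)
    model.fit(X_train, y_train)
    return model, model.score(X_test, y_test)
```

In the training script, the fitted pipeline would then be saved and registered, for instance through `mlflow.sklearn.log_model` with its `registered_model_name` argument.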
Now that we have a Main.py file, we need to run a command to run the file and register the model. A command contains several elements and combines all the steps that we have taken so far. First, we provide inputs in a dictionary for all the variables of the Main.py file. Next, we specify the location of the code and the actual command. We need an environment in which this code can run, so we specify the environment we have created previously. The same applies to the compute. The last two parts of the command are the experiment and display name, used in AML Studio. After triggering this job, a link with the details page will be provided, in which you can follow the progress of the model. Most importantly, you can also see detailed logs for debugging any errors that you encounter.
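A hedged sketch of such a command, assuming the training script accepts the arguments shown. The folder `./src/`, the input values, and the environment, compute, experiment, and display names are all placeholders chosen for illustration.

```python
# The ${{inputs.<name>}} placeholders are filled in from the inputs dictionary.
TRAIN_COMMAND = (
    "python Main.py "
    "--data ${{inputs.data}} "
    "--test_train_ratio ${{inputs.test_train_ratio}} "
    "--registered_model_name ${{inputs.registered_model_name}}"
)


def build_training_job(data_uri, environment="model-env@latest", compute="cpu-cluster"):
    """Combine inputs, code, environment, and compute into one command job."""
    from azure.ai.ml import Input, command

    return command(
        inputs={
            "data": Input(type="uri_file", path=data_uri),
            "test_train_ratio": 0.25,
            "registered_model_name": "reservation_model",  # assumed name
        },
        code="./src/",  # folder that contains Main.py (assumed location)
        command=TRAIN_COMMAND,
        environment=environment,
        compute=compute,
        experiment_name="train_reservation_model",
        display_name="reservation_cancellation_prediction",
    )


# In a notebook cell:
# job = ml_client.jobs.create_or_update(build_training_job("<DATA_URI>"))
# print(job.studio_url)  # link to the job details page
```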
Now that we have created a model, we need to deploy it to an endpoint in order to use it. This takes two steps: first we create an endpoint, and then we deploy the model to it. In this article, we will create an online endpoint. If you need to score large volumes of data, it might be better to create a batch endpoint. The process is quite similar; more details on batch endpoints can be found here.
Creating the online endpoint is quite easy. You need to specify a name and select how clients authenticate to the endpoint. The two provided options are key and aml_token. The main difference is that keys do not expire, whereas tokens do. In this example, we will use a key. It is possible to add tags to the endpoint to provide extra information, for instance on the data used and the type of model. Tags need to be provided in a dictionary. Below you can find the code for creating an online endpoint.
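A minimal sketch of creating the endpoint, assuming an authenticated `ml_client`. The function name, description, and example tags are assumptions; the endpoint name is passed in by the caller.

```python
def create_online_endpoint(ml_client, name, tags=None):
    """Create (or update) a managed online endpoint with key authentication."""
    from azure.ai.ml.entities import ManagedOnlineEndpoint

    endpoint = ManagedOnlineEndpoint(
        name=name,
        description="Online endpoint for the reservation model",
        auth_mode="key",  # the alternative is "aml_token"
        tags=tags or {},
    )
    # Creation takes several minutes; result() waits for it to finish.
    return ml_client.online_endpoints.begin_create_or_update(endpoint).result()


# In a notebook cell (the name and tags are placeholders):
# create_online_endpoint(ml_client, "<ENDPOINT_NAME>",
#                        tags={"data": "hotel reservations", "model": "sklearn"})
```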
After the endpoint is created (this will take several minutes), we need to deploy the model to the endpoint. First, we retrieve the latest version of the model by name. Then we specify the deployment configuration. Note that instance_type sets the type of compute the endpoint runs on, and it must be selected from the list of available instances. After specifying the configuration, we run the deployment. This again will take several minutes to complete.
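A hedged sketch of these steps, assuming an authenticated `ml_client` and a registered model. The deployment name, instance type, and instance count are illustrative assumptions.

```python
def latest_model_version(ml_client, model_name):
    """Return the highest registered version of a model as a string."""
    # Versions are strings, so compare them numerically ("10" > "2").
    versions = [int(model.version) for model in ml_client.models.list(name=model_name)]
    return str(max(versions))


def deploy_latest_model(ml_client, model_name, endpoint_name,
                        deployment_name="blue", instance_type="Standard_DS3_v2"):
    """Deploy the latest model version to an existing online endpoint."""
    from azure.ai.ml.entities import ManagedOnlineDeployment

    model = ml_client.models.get(
        name=model_name, version=latest_model_version(ml_client, model_name)
    )
    deployment = ManagedOnlineDeployment(
        name=deployment_name,
        endpoint_name=endpoint_name,
        model=model,
        instance_type=instance_type,
        instance_count=1,
    )
    return ml_client.online_deployments.begin_create_or_update(deployment).result()
```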
Having created and deployed our model, it is time to test it! We can send request files to the endpoint and receive predictions. The code and the key to do this from outside AML Studio can be found in the Endpoint tab, where code to consume the model is provided for C#, Python, and R. However, let’s first test the model directly from our AML Studio notebook. To do this, we create a JSON file with input_data, which requires three values. First, we specify the columns in a list. It is important to use exactly the same column names as during the training of the model. Next, we specify a list of indices, one for each prediction we need. Last, we create a list that contains one list per data input. It is important to use the same order for the data inputs as in the columns list. Below you can find the example for the model we created and the code required to send this file to the endpoint. Be sure to delete the endpoint and terminate the compute instance if you no longer need them. The code to do this is provided in the same example, commented out on the last line.
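A minimal sketch of building and sending such a request, assuming an authenticated `ml_client`. The helper names and the file name are hypothetical, and the columns and values must come from your own training data.

```python
import json


def build_request(columns, rows):
    """Build the input_data payload: columns, one index per row, and the rows."""
    columns = list(columns)
    for row in rows:
        if len(row) != len(columns):
            raise ValueError("each row needs exactly one value per column")
    return {
        "input_data": {
            "columns": columns,
            "index": list(range(len(rows))),
            "data": [list(row) for row in rows],
        }
    }


def write_request_file(request, path="sample-request.json"):
    """Save the payload as a JSON file and return its path."""
    with open(path, "w") as f:
        json.dump(request, f)
    return path


def score(ml_client, endpoint_name, deployment_name, request_file):
    """Send the request file to the endpoint and return the raw response."""
    return ml_client.online_endpoints.invoke(
        endpoint_name=endpoint_name,
        deployment_name=deployment_name,
        request_file=request_file,
    )


# Clean up when the endpoint is no longer needed:
# ml_client.online_endpoints.begin_delete(name="<ENDPOINT_NAME>")
```

Checking the row lengths up front catches a mismatch between columns and data before the request ever reaches the endpoint.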
In this article, we have created a compute, environment, and model using Azure Machine Learning Studio. Then, we deployed the model and tested it using sample data. Once again, do not forget to delete the endpoint and terminate the compute instance if you no longer need them. That was it for now; have fun using AML Studio for your data science projects!