Model Management

The data science lifecycle begins with data exploration and modeling in the Data Lab environment, but machine learning models must ultimately be deployed into a production environment. Although it is possible in principle to use the Data Lab container in which the model was developed as a production environment, it is much more efficient to extract the model and deploy it in a lightweight container, or in multiple container instances.

Aunsight's Model service provides a mechanism for storing machine learning models in a versioned environment. Aunsight model objects can track up to thirty versions. Having multiple versions of a model can be useful in providing quality assurance as models go through new training data; after each training session, the results of the latest model can be compared against the results from previous versions on the same dataset. Moreover, each version can have more than one data format: Pickle serialization format for Python, .rdata for the R language, and the language-independent PMML XML model description format. Such flexibility allows a single modeling notebook to produce versions that can be used in any number of production environments, making models theoretically "exportable" to new application domains.
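The quality-assurance comparison described above can be sketched in a few lines of Python. This is a minimal illustration, not part of the Aunsight SDK: the toy "model" is a plain dictionary holding a threshold, and the names stored_versions, predict, and accuracy are hypothetical stand-ins for serialized model versions and an evaluation routine.

```python
import pickle

def predict(model, xs):
    # Toy classifier: label 1 when the input reaches the model's threshold.
    return [1 if x >= model["threshold"] else 0 for x in xs]

def accuracy(model, xs, ys):
    # Fraction of predictions that match the expected labels.
    preds = predict(model, xs)
    return sum(p == y for p, y in zip(preds, ys)) / len(ys)

# Hypothetical serialized payloads, one per model version.
stored_versions = {
    1: pickle.dumps({"threshold": 0.8}),
    2: pickle.dumps({"threshold": 0.5}),
}

# The same held-out dataset is used for every version.
holdout_x = [0.1, 0.4, 0.6, 0.9]
holdout_y = [0, 0, 1, 1]

scores = {
    version: accuracy(pickle.loads(payload), holdout_x, holdout_y)
    for version, payload in stored_versions.items()
}
best_version = max(scores, key=scores.get)
```

Because every version is scored against the same data, a drop in the score of the newest version is attributable to the retraining rather than to a change in the evaluation set.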

Aunsight Data Lab notebooks can upload a serialized version of a trained model into the service with just a few lines of code, or users can manually upload models via the web interface. This article describes how to use the Model service and its accompanying dashboard in the web interface to store and retrieve models. Additionally, since most data scientists will upload their models directly from their modeling code, this document describes the model classes from the Aunsight platform SDK for Python, lib-aunsight-py. After reading this document, data scientists with knowledge of Python should be able to upload models directly from a notebook.

The Models Dashboard

The Aunsight web interface provides a models dashboard to users who have the AU-MODEL:view-model permission available to them through a role or group they have in a given context.

To access this dashboard, log in to the web interface and select the context you wish to work in through the context selector. From that context, click the "Models" icon (models icon) in the palette on the right.

Models dashboard

The Models dashboard is a standard list-based view of the models available in the present context. You can search (search icon) and sort (sort icon) the list to find a model by clicking the appropriate icon at the top of the list. You can also create a new model by clicking the plus icon (new icon).

Creating a Model

To create a new model, click the "plus" icon (plus icon) in the models list. This will bring up a model creation dialog.

Models creation dialog

To create a new model, specify a name for the model and click the "create" button (create). In addition to a name, you can specify values for two optional fields: Tags and description.

  • Tags: Aunsight models can be labeled and grouped by tags (32 character limit) that allow users to easily identify groups of related objects (dataflows, workflows, datasets, etc.). Tags are free-form: enter keywords as a comma-separated list.

  • Description: Though the field accepts only plain text, it will render markdown formatted content as rich text. The description field can be edited at any time after the model is created.
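As an illustration of the tag rules above, the following sketch splits a comma-separated tag string and checks the 32 character limit before the value is entered. It is not an Aunsight function: parse_tags is a hypothetical helper, and the assumptions that the limit applies to each tag individually and that surrounding whitespace is ignored are not confirmed by the platform documentation.

```python
MAX_TAG_LENGTH = 32  # limit stated above; assumed here to apply per tag

def parse_tags(raw):
    """Split a comma-separated string into a list of unique, trimmed tags."""
    tags = []
    for part in raw.split(","):
        tag = part.strip()
        if not tag:
            continue  # skip empty entries such as a trailing comma
        if len(tag) > MAX_TAG_LENGTH:
            raise ValueError(f"tag {tag!r} exceeds {MAX_TAG_LENGTH} characters")
        if tag not in tags:
            tags.append(tag)  # keep first occurrence, preserve order
    return tags

tags = parse_tags("churn, xgboost, q3-retrain,")
```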

Managing a Model

Once a model is created or a user has selected one from the models list, the web interface displays that model record in the main view. Information and tools are grouped logically into several tabs. Additionally, the model record has a group of action buttons that perform tasks relating to the model record as a whole.

The Details Tab

By default, the web interface displays the details tab where three sections display different metadata:

General Section

The General section displays the ID, name, version numbers, and details on the creation and modification of the resource. To edit the name, description, and tags, click the edit icon (edit icon) in the upper right corner of this section. The other information is managed by Aunsight and cannot be edited.

model general info section

Version Details Section

The Version Details section allows users to view and edit information about the currently selected version. To select a different version, use the versions action button. Version settings specified when creating a model version can be edited on unpublished versions by clicking the edit icon (edit icon) in the upper right corner of this section. To lock in the version settings, click the "Publish" button (publish button). Publishing a model version prevents any future changes to that version.

model version details section

Version Formats Section

The Version Formats section displays model formats for the specified version. Model version formats are descriptions of the format of an uploaded file for that version. Versions can have more than one format (e.g. Pickle and PMML or RDA and PMML) because the same model may be stored for deployment in more than one process. Choosing a format for storing a model depends upon a combination of the model development language (Python or R) and the intended deployment platform.

model version formats section

To add a model version format, click the "Add format" button in the row for the format you would like to upload (e.g. pkl for Python Pickle files, rda for .rdata files, etc.). Doing so will bring up a tool to specify a model version and upload a model file into it via the web interface.

Create model version format

If a model version format already exists, users can edit it or upload a file to it (edit format), download the model file in that format (download format), or delete the format and its associated model file (delete format) by clicking the appropriate button in the rightmost field.

models formats list

New Version Tab

Every model can have up to thirty versions, and each version can store up to three different formats. The previous section described how to create formats for a version from the details tab. By contrast, the New Version tab of the web interface allows users to create a new version of the model. To create a new version, users specify settings for the version and upload a model file as a version format. Model version settings include:

  • Description: Though the field accepts only plain text, it will render markdown formatted content as rich text. The description field can be edited at any time after the model is created.
  • Tags: Aunsight models can be labeled and grouped by tags (32 character limit) that allow users to easily identify groups of related objects (dataflows, workflows, datasets, etc.). Tags are free-form: enter keywords as a comma-separated list.
  • Hyperparameters: Hyperparameters can be stored in this field to aid in evaluating model version results.
  • Config: Configuration settings are passed as a JSON object that can be read by the deployment container.
  • Records: Records are a reference to the training dataset(s) used to generate the model version. Users can add more than one dataset since more than one dataset may have been used in training. Though not required, specifying this field can help ensure reproducibility of machine learning experiments.
  • Formats: Version formats allow users to specify the format (e.g. Python Pickle or .rdata from the R programming language) in which the machine learning model is serialized.

When all the settings have been specified, a model version can be created by clicking the "Submit" button (submit button) at the bottom of the tab.
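To make the settings above concrete, here is a sketch of what a set of version settings might look like when assembled in Python before being entered in the tab. The dictionary keys, the hyperparameter names, and the dataset placeholder are all illustrative assumptions, not the platform's actual schema. The only point drawn from the text is that Config is passed as a JSON object.

```python
import json

# Hypothetical version settings; key names are illustrative only.
version_settings = {
    "description": "Retrained on Q3 data with a **lower** learning rate.",
    "tags": ["churn", "q3-retrain"],
    "hyperparameters": {"learning_rate": 0.05, "max_depth": 6},
    "config": {"batch_size": 256, "threshold": 0.5},
    "records": ["<training-dataset-guid>"],  # placeholder, not a real GUID
}

# Config must be valid JSON so the deployment container can read it.
config_json = json.dumps(version_settings["config"], sort_keys=True)

# Round-trip check: the container would recover the same settings.
restored = json.loads(config_json)
```

Validating the JSON locally before pasting it into the Config field avoids submitting a version whose configuration the deployment container cannot parse.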

Action Buttons

An action button group visible from within all tabs provides tools that affect the entire model record:

Versions

The versions action button (versions action button) allows users to select a different version of the model from a dropdown list of versions. Since each version of a model contains different metadata, this button will change the data displayed on the details tab of the record view to reflect the currently selected model version.

versions action button list

Delete

The delete action button allows users to delete a model version or the entire model record itself along with its associated model version formats. If just one version of a model exists, clicking delete will bring up a message asking the user to confirm if they wish to delete the entire model record.

Warning

Model deletion is irreversible!

If more than one version of the model exists, however, clicking the delete action button will allow the user to delete either the record as a whole or just the currently selected model version.

delete

Selecting "Delete version X" will only delete the current model version and its associated formats. The model itself and all other versions will remain.

Warning

Model version deletion is irreversible!

Note

When a version is deleted, it is lost permanently, but the remaining versions are not renumbered. Users will therefore see a gap in the version number sequence, which indicates that a deleted version once existed.
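As a small illustration of this note, the sketch below finds gaps in a list of version numbers. It is plain Python, independent of the Aunsight SDK, and missing_versions is a hypothetical helper. It can only detect versions deleted between the lowest and highest surviving numbers, not deletions at either end of the sequence.

```python
def missing_versions(version_numbers):
    """Return version numbers absent between the lowest and highest present."""
    if not version_numbers:
        return []
    present = set(version_numbers)
    return [
        n for n in range(min(present), max(present) + 1)
        if n not in present
    ]

# Versions 3 and 6 were deleted at some point in this example.
gaps = missing_versions([1, 2, 4, 5, 7])
```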

Uploading Models from Notebooks

While it is possible to upload models from the web interface, it is much more common that models will be uploaded into the model service from the notebook environment where they are created. In essence, machine learning models developed using toolkits like TensorFlow are programming objects and as such can be serialized for persistent storage. Since many popular machine learning toolkits are written for the Python language, Python's pickle format is a common serialization method. Other formats include the XML-based PMML format and the .rdata format generated by the save() function in the R language.
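As a minimal sketch of the serialization step described above, the following uses Python's standard pickle module to turn an object into bytes and back. A plain dictionary of parameters stands in for a trained model here. Real model objects from a machine learning toolkit would be passed to pickle.dumps in the same way, provided the toolkit supports pickling.

```python
import pickle

# Stand-in for a trained model object (illustrative values only).
trained_parameters = {"weights": [0.4, -1.2, 3.0], "bias": 0.1}

# Serialize to bytes suitable for persistent storage or upload.
payload = pickle.dumps(trained_parameters)

# Deserialize, as a deployment environment would after downloading the file.
restored = pickle.loads(payload)
```

Note that unpickling executes code embedded in the payload, so pickle files should only be loaded from sources you trust.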

Once a model object has been serialized, it can easily be uploaded using the Aunsight SDKs for Python, R, and JavaScript. Because most data scientists work with Python-based machine learning toolkits, this tutorial gives examples using the lib-aunsight-py SDK, but other tools such as the lib-aunsight-r SDK exist.

Regardless of the tool, the model upload process will generally take the following steps to complete:

  1. Use a token to initiate a context (session) for communicating with the Aunsight APIs
  2. Select a model by GUID (or perhaps create a new model)
  3. Create a new version for the model
  4. Create an upload format
  5. Upload a file

The following code performs these steps using the Python lib-aunsight-py SDK to upload the model object my_model_object into a new model version.

from lib_aunsight.context import AunsightContext
import pickle
import io

# Set variables holding the GUIDs of the Aunsight token and model
TOKEN_ID  = 'cc75705d-2f50-420e-9ab1-0e6d5d6d5517'
MODEL_ID = '0fc32fcc-3c97-4e17-81e9-975213d0e586'

# Step 1
# Create an Aunsight context using the token
c = AunsightContext(token=TOKEN_ID)

# Step 2
# Select the model from the context by its ID
m = c.model(MODEL_ID)

# Note: the previous line assumes you want to work with an existing model.
# To create a new model, use this instead:
# m = c.model()
# m.create()

# Step 3
# Create a new model version
v = m.version() # Create a local Python object for the new version
v.create() # POST the API request to create this version on the platform

# Step 4
# Create a new format for this model version
# Note: You can create more than one format for each model version
#       as long as the id keyword has a different format specified.
mvf = c.model_version_format(model=m.id, version=v.id, id='pkl')
mvf.create() # POST the API request to create this format on the platform

# Step 5
# Make the API upload request, passing in the pickled object.
mvf.upload(pickle.dumps(my_model_object))

In the preceding example, it is assumed that your trained model object is named my_model_object and has already been defined in the notebook. The serialization format may also differ depending on your use case. When creating a model_version_format object, the id keyword argument selects the format: 'pkl' as shown above, or 'pmml' or 'rda' for the other supported formats.
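When one version needs more than one format, steps 4 and 5 can be repeated in a loop. The sketch below wraps them in a hypothetical helper, upload_formats, which uses only the calls shown in the example above (model_version_format, create, and upload). The _FakeContext and _FakeFormat classes are stand-ins that let the sketch run without the SDK or a platform connection. In a notebook you would pass the real AunsightContext instead.

```python
import pickle

def upload_formats(context, model_id, version_id, payloads):
    """Create one format per entry in `payloads` and upload its bytes.

    `payloads` maps a format id ('pkl', 'pmml', 'rda') to serialized bytes.
    """
    uploaded = []
    for format_id, data in payloads.items():
        mvf = context.model_version_format(
            model=model_id, version=version_id, id=format_id
        )
        mvf.create()      # create the format on the platform
        mvf.upload(data)  # upload the serialized model file
        uploaded.append(format_id)
    return uploaded

# --- Offline stand-ins (NOT part of lib-aunsight-py) ---
class _FakeFormat:
    def __init__(self, log, **kwargs):
        self.log, self.kwargs = log, kwargs
    def create(self):
        self.log.append(("create", self.kwargs["id"]))
    def upload(self, data):
        self.log.append(("upload", self.kwargs["id"], len(data)))

class _FakeContext:
    def __init__(self):
        self.log = []
    def model_version_format(self, **kwargs):
        return _FakeFormat(self.log, **kwargs)

fake = _FakeContext()
payloads = {
    "pkl": pickle.dumps({"weights": [0.2, 0.8]}),
    "pmml": b"<PMML/>",  # placeholder bytes, not a valid PMML document
}
result = upload_formats(fake, "model-guid", "version-guid", payloads)
```

Keeping the format id next to its payload in one dictionary makes it harder to upload, for example, pickled bytes under the 'pmml' id by mistake.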