The data science lifecycle begins with data exploration and modeling in the Data Labs environment, but ultimately, machine learning models need to be deployed into a production environment. Although it is theoretically possible to use the Data Lab container that developed the model as a production environment, it is much more efficient to extract the model from this environment and deploy it in a lightweight container, or even multiple container instances.
Aunsight's Model service provides a mechanism for storing machine learning models in a versioned environment. Aunsight model objects can track up to thirty versions. Keeping multiple versions is useful for quality assurance as a model is retrained on new data: after each training session, the results of the latest version can be compared against the results of previous versions on the same dataset. Moreover, each version can be stored in more than one data format: the Pickle serialization format for Python, .rdata for the R language, and the language-independent, XML-based PMML model description format. Such flexibility allows a single modeling notebook to produce versions that can be used in any number of production environments, making models theoretically "exportable" to new application domains.
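As a rough illustration of the version comparison described above, the following sketch scores two serialized model versions against the same evaluation dataset. The parameter dictionaries, the `predict` and `accuracy` helpers, and the sample data are hypothetical stand-ins and are not part of Aunsight; in practice each payload would be a model file retrieved from a stored version.

```python
import pickle


def predict(params, xs):
    """Toy classifier: predict 1 when the input reaches the stored threshold."""
    return [1 if x >= params["threshold"] else 0 for x in xs]


def accuracy(params, xs, ys):
    """Fraction of predictions that match the expected labels."""
    predictions = predict(params, xs)
    return sum(p == y for p, y in zip(predictions, ys)) / len(ys)


# Pretend these bytes are two stored versions of the same model.
serialized_versions = {
    1: pickle.dumps({"threshold": 0.8}),
    2: pickle.dumps({"threshold": 0.5}),
}

# Every version is scored on the same evaluation dataset.
eval_xs = [0.1, 0.4, 0.6, 0.9]
eval_ys = [0, 0, 1, 1]

scores = {
    version: accuracy(pickle.loads(blob), eval_xs, eval_ys)
    for version, blob in serialized_versions.items()
}
best_version = max(scores, key=scores.get)
```

Holding the evaluation data fixed is what makes the scores comparable from one version to the next.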
Aunsight Data Lab notebooks can upload a serialized version of a trained model into the service with just a few lines of code, or users can manually upload models via the web interface. This article describes how to use the Model service and its accompanying dashboard in the web interface to store and retrieve models. Additionally, since most data scientists will upload their models directly from their modeling code, this document describes the model classes from the Aunsight platform SDK for Python, lib-aunsight-py. After reading this document, data scientists with knowledge of Python can upload models directly from a notebook.
The Models Dashboard¶
The Aunsight web interface provides a models dashboard to users who have the AU-MODEL:view-model permission available to them through a role or group they have in a given context.
To access this dashboard, log in to the web interface and select the context you wish to work in through the context selector. From that context, click the "Models" icon in the palette on the right.
The Models dashboard is a standard list-based view of the models available in the present context. You can search and sort the list to find a model by clicking the appropriate icon at the top of the list. You can also create a new model by clicking the plus icon.
Creating a Model¶
To create a new model, click the "plus" icon in the models list. This will bring up a model creation dialog.
In the dialog, specify a name for the model and click the "create" button. In addition to a name, you can specify values for two optional fields: tags and description.
Tags: Aunsight models can be labeled and grouped by tags (32 character limit) that allow users to easily identify groups of related objects (dataflows, workflows, datasets, etc.). Tags are free-form: add keywords as a comma-separated list.
Description: Though the field accepts only plain text, it will render markdown formatted content as rich text. The description field can be edited at any time after the model is created.
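When tags are generated in a notebook before being entered in the dialog, a small helper can tidy them first. The sketch below is illustrative only: `parse_tags` is a hypothetical function, and it assumes the 32 character limit applies to each individual tag, which the text above does not state explicitly.

```python
MAX_TAG_LENGTH = 32  # limit quoted above; assumed here to apply per tag


def parse_tags(raw):
    """Split a comma-separated string into trimmed, de-duplicated tags."""
    tags = []
    for piece in raw.split(","):
        tag = piece.strip()
        if not tag or tag in tags:
            continue  # skip empty entries and repeats
        if len(tag) > MAX_TAG_LENGTH:
            raise ValueError(f"tag {tag!r} exceeds {MAX_TAG_LENGTH} characters")
        tags.append(tag)
    return tags


tags = parse_tags("churn, xgboost, q3-retrain, churn")
tag_field = ", ".join(tags)  # text to paste into the Tags field
```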
Managing a Model¶
Once a model is created or a user has selected one from the models list, the Web interface will display that model record in the main view of the interface. The web interface displays information and tools in several tabs to group similar information and functions logically. Additionally, the model record has a group of action buttons that perform tasks relating to the model record as a whole.
The Details Tab¶
By default, the web interface displays the details tab where three sections display different metadata:
The General section displays the ID, name, version numbers, and details on the creation and modification of the resource. To edit the name, description, and tags, click the edit icon in the upper right corner of this section. The other information is managed by Aunsight and cannot be edited.
Version Details Section¶
The Version Details section allows users to view and edit information about the currently selected version. To select a different version, use the versions action button. Version settings specified when creating a model version can be edited on unpublished versions by clicking the edit icon in the upper right corner of this section. To lock in version settings, click the "Publish" button. Publishing a model version prevents future changes to that version.
Version Formats Section¶
The Version Formats section displays model formats for the specified version. A model version format describes the format of a file uploaded for that version. A version can have more than one format (e.g. PMML alongside a language-specific format) because the same model may be stored for deployment in more than one process. Choosing a format for storing a model depends upon a combination of the model development language (Python or R) and the intended deployment platform.
To add a model version format, click the "Add format" button in the row for the format you would like to upload (e.g. pkl for Python Pickle files, .rdata files, etc.). Doing so will bring up a tool to specify a model version and upload a model file into it via the web interface.
If a model version format already exists, users can edit it or upload a file to it, download the model file in that format, or delete the format and its associated model file by clicking the appropriate button in the rightmost field.
New Version Tab¶
Every model can have up to thirty versions, and each version can be stored in up to three different formats. The previous section described how to add formats to a version from the details tab. By contrast, the New Version tab of the web interface allows users to create a new version of the model. To create a new version, users can specify a name and settings for their model and upload a model file as a version format. Model version settings include:
- Description: Though the field accepts only plain text, it will render markdown formatted content as rich text. The description field can be edited at any time after the model is created.
- Tags: Aunsight models can be labeled and grouped by tags (32 character limit) that allow users to easily identify groups of related objects (dataflows, workflows, datasets, etc.). Tags are free-form: add keywords as a comma-separated list.
- Hyperparameters: Hyperparameters can be stored in this field to aid in evaluating model version results.
- Config: Configuration settings are passed as a JSON object that can be read by the deployment container.
- Records: Records are a reference to the training dataset(s) used to generate the model version. Users can add more than one dataset since more than one dataset may have been used in training. Though not required, specifying this field can help ensure reproducibility of machine learning experiments.
- Formats: Version formats allow users to specify a different format (e.g. Python Pickle or Rdata from the R programming language) to serialize a machine learning model.
When all the settings have been specified, a model version can be created by clicking the "Submit" button at the bottom of the tab.
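The Hyperparameters and Config settings can be prepared in a notebook before they are entered on this tab. The sketch below builds both as JSON text. Every key name is hypothetical, since no schema for these fields is defined here; only Config is described as a JSON object, and the hyperparameters are serialized the same way purely for convenience.

```python
import json

# Hypothetical values recorded from a training run.
hyperparameters = {"learning_rate": 0.05, "max_depth": 6, "n_estimators": 200}

# Hypothetical settings a deployment container might need at load time.
config = {"input_columns": ["age", "tenure_months"], "score_threshold": 0.5}

# Stable, human-readable JSON text for the form fields.
hyperparameters_json = json.dumps(hyperparameters, indent=2, sort_keys=True)
config_json = json.dumps(config, indent=2, sort_keys=True)

# A consumer of the config can parse the same text back into a dictionary.
loaded_config = json.loads(config_json)
```

Sorting the keys keeps the stored text stable between runs, which makes differences between versions easier to spot.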
An action button group visible from within all tabs provides tools that affect the entire model record:
The versions action button allows users to select a different version of the model from a dropdown list of versions. Since each version of a model contains different metadata, this button will change the data displayed on the details tab of the record view to reflect the currently specified model version.
The delete action button allows users to delete a model version or the entire model record itself along with its associated model version formats. If just one version of a model exists, clicking delete will bring up a message asking the user to confirm if they wish to delete the entire model record.
Model deletion is irreversible!
If more than one version of the model exists, however, clicking the delete action button will allow the user to delete either the record as a whole or just the currently selected model version.
Selecting "Delete version X" will only delete the current model version and its associated formats. The model itself and all other versions will remain.
Model version deletion is irreversible!
When deleting versions, the version will be lost forever, but the version numbers of all subsequent versions will not be updated. This means users will see an interruption in the version number sequence which indicates that a deleted version once existed.
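Because deleted version numbers are never reused, gaps in the sequence can be detected programmatically. The following is a minimal sketch: `missing_versions` is a hypothetical helper, the version list is made-up sample data, and it assumes version numbering starts at 1.

```python
def missing_versions(version_numbers):
    """Return the numbers absent from the sequence 1..max(version_numbers)."""
    if not version_numbers:
        return []
    present = set(version_numbers)
    return [n for n in range(1, max(present) + 1) if n not in present]


# Version numbers still present on a hypothetical model record.
remaining = [1, 2, 4, 5, 7]
deleted = missing_versions(remaining)
```

A deleted version that came after the highest remaining number leaves no visible gap, so this check cannot detect it.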
Uploading Models from Notebooks¶
While it is possible to upload models from the web interface, it is much more common that models will be uploaded into the model service from the notebook environment where they are created. In essence, machine learning models developed using toolkits like TensorFlow are programming objects and as such can be serialized for persistent storage. Since many popular machine learning toolkits are written for the Python language, Python's pickle format is a common serialization method. Other formats include the XML-based PMML format and the .rdata format generated by the save() function in the R language.
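As a brief illustration of pickle serialization, the sketch below converts an object to bytes and restores it. The dictionary is a hypothetical stand-in for a trained model; a real model object from a machine learning toolkit would be passed to `pickle.dumps` in the same way, provided the toolkit supports pickling.

```python
import pickle

# Hypothetical stand-in for a trained model: a dictionary of learned coefficients.
my_model_object = {"intercept": 0.25, "weights": [1.5, -0.75]}

# Serialize to bytes, suitable for uploading or writing to a file.
payload = pickle.dumps(my_model_object)

# Deserialize the bytes back into an equivalent Python object.
restored = pickle.loads(payload)
```

Unpickling can execute arbitrary code, so only load pickle files from sources you trust.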
This section uses the Python lib-aunsight-py SDK, but other tools such as the lib-aunsight-r SDK exist.
Regardless of the tool, the model upload process will generally take the following steps to complete:
- Use a token to initiate a context (session) for communicating with the Aunsight APIs
- Select a model by GUID (or perhaps create a new model)
- Create a new version for the model
- Create an upload format
- Upload a file
The following code performs these steps using the Python lib-aunsight-py SDK to upload the model object my_model_object into a new model version.
```python
from lib_aunsight.context import AunsightContext
import pickle

# Set some variables for the Aunsight object GUIDs
TOKEN_ID = 'cc75705d-2f50-420e-9ab1-0e6d5d6d5517'
MODEL_ID = '0fc32fcc-3c97-4e17-81e9-975213d0e586'

# Step 1
# Create an Aunsight context using the token
c = AunsightContext(token=TOKEN_ID)

# Step 2
# Select the model from the context by its ID
m = c.model(MODEL_ID)
# Note: the previous line assumes you want to work with an existing model.
# To create a new model, use this instead:
# m = c.model()
# m.create()

# Step 3
# Create a new model version
v = m.version()  # Create a Python class for the new version
v.create()       # POST the API request to create this version on the platform

# Step 4
# Create a new format for this model version
# Note: You can create more than one format for each model version
# as long as the id keyword has a different format specified.
mvf = c.model_version_format(model=m.id, version=v.id, id='pkl')
mvf.create()  # POST the API request to create this format on the platform

# Step 5
# Now make the API upload request, passing in the pickled object.
mvf.upload(pickle.dumps(my_model_object))
```
In the preceding example, it is assumed that your trained model object is named my_model_object. The serialization format may also differ depending on your use case. When creating a model_version_format object, set the id keyword argument to the identifier of the desired format, for example 'pkl' (as in the example) or 'rda'.
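Steps 4 and 5 of the example can be wrapped in a small helper so that several formats can be attached to the same version. This is a sketch only: `upload_format` is a hypothetical function, it relies solely on the `model_version_format`, `create`, and `upload` calls shown in the example, and it has not been checked against the SDK itself.

```python
def upload_format(context, model_id, version_id, format_id, payload):
    """Create one format on an existing model version and upload its file.

    context    -- an object offering model_version_format(), such as the
                  AunsightContext instance `c` from the example above
    format_id  -- format identifier, for example 'pkl' or 'rda'
    payload    -- the serialized model as bytes
    """
    mvf = context.model_version_format(
        model=model_id, version=version_id, id=format_id
    )
    mvf.create()         # register the format on the platform
    mvf.upload(payload)  # send the serialized model
    return mvf
```

With the objects from the example, a call might look like `upload_format(c, m.id, v.id, 'pkl', pickle.dumps(my_model_object))`. For an R model, the payload would be the bytes of the .rdata file and the format identifier would be 'rda'.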