Creating Processes

A process is a type of Aunsight object used to manage user-supplied code that can be executed within a container running in the Aunsight compute infrastructure. In this sense, a process is no different from the container-based tasks executed behind the scenes by the Aunsight platform itself. However, unlike system tasks, user tasks involve user-supplied code, and therefore must be maintained and versioned by Aunsight users. A process record therefore provides a wrapper for process versions, where the code or actual container images are stored.

Aunsight has two major types of process versions: image and packaged process versions. An image process version is a record that serves as a metadata wrapper for a Docker image. Each record contains basic metadata but ultimately leaves the building of the Docker image up to the user. Image processes thus offer a great deal of flexibility in terms of the base images that can be used: an image can extend any one of the thousands of base images hosted on Docker Hub or some other Docker registry.

A packaged process is a record that serves as a metadata wrapper for a code repository intended for one of a few officially supported Aunsight base images, each called a runtime. For example, a user who simply wants to run some Python code using standard libraries like TensorFlow or NumPy can do so without worrying about Dockerfiles or even running a Docker client on their local machine to build large Docker image files. Instead, a packaged process record must specify a runtime (a base image) within which the user-supplied code will be executed.

The present article details both ways of creating a process. To create a packaged process, readers need to know some programming or shell scripting. Image process creators also need to understand how to build Docker images and how to use the Docker command line client to save an image to a tarball so that it can be uploaded to Aunsight via the Toolbelt. After following either method, users will be able to create custom processes in an Aunsight context.

Creating a packaged process

Packaged processes are a convenient wrapper for running user-supplied code in Aunsight without the complexity of managing Dockerfiles and dependencies from Docker base images. Instead, users can simply initialize a repository with packaged code in a directory on their local machine, edit the files generated in that package, and then upload their code into Aunsight where it will be built into a complete process image. Each time changes to the repository are uploaded, Aunsight will generate a new process version.

Create a Process repository

To create a new process version, users should first create a directory on their local machine where they would like to store the code. For example, to create a directory on a macOS or Linux host, open a terminal and enter something like the following:

mkdir -p ~/dev/my-first-aunsight-process

And then navigate to that directory:

cd ~/dev/my-first-aunsight-process

Once inside the working directory, the user will need to log in to Toolbelt (au2 context set managed-token) and set the context for the packaged process:

$ au2 context set 20e7f5e9-b4a9-4972-bcc6-f43610a0d7e0 8283de3e-2cce-4bc8-a523-0c7f84399530

Note

au2 context set takes one or two GUIDs as arguments: the first is the GUID of the organization, and the optional second argument is the GUID of the project context.
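
As a rough illustration of the argument rule in the note above, the following Python sketch validates one or two GUIDs and assembles the corresponding au2 context set argument list. The helper name context_set_command is hypothetical and not part of Toolbelt; it only builds the command and does not run it.

```python
import uuid
from typing import List, Optional


def context_set_command(organization: str, project: Optional[str] = None) -> List[str]:
    """Build the argument list for `au2 context set` (hypothetical helper)."""
    args = ["au2", "context", "set"]
    for guid in (organization, project):
        if guid is None:
            continue  # the project GUID is optional
        # uuid.UUID raises ValueError if the string is not a well-formed GUID
        args.append(str(uuid.UUID(guid)))
    return args


# Organization and project GUIDs taken from the example command above
print(context_set_command(
    "20e7f5e9-b4a9-4972-bcc6-f43610a0d7e0",
    "8283de3e-2cce-4bc8-a523-0c7f84399530",
))
```

Validating the GUIDs before calling Toolbelt is a design choice for catching copy-and-paste mistakes early; Toolbelt itself may report such errors differently.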

Once a context has been set, initialize the present working directory as a packaged process repository by issuing the au2 process repo init command.

Process repository initialization is an interactive process. Users will be asked to set values for:

  • Process Name - A short text string to be used as a non-unique process identifier
  • Description - A longer, markdown-compatible description of the process
  • Runtime - The GUID of a process runtime

Process runtimes are the base environment within which user-supplied code will be run. Runtimes are provided and maintained by Aunsight and each has different capabilities available out of the box. To see a list of available runtimes, issue the au2 runtime list command from the context in which you wish to run your process.

Exploring the process repository

After inputting the required information, Toolbelt will initialize the present working directory as an Aunsight process package. Aunsight process packages consist of the following files:

args.schema.yml - A YAML configuration file for specifying the structured arguments. (Used for structured processes)

out.schema.yml - Similar to args.schema.yml, this is a YAML configuration file for specifying the structured output (values returned) by a process. (Used for structured processes)

env.yml - A YAML configuration file for specifying default environment variables. (These can be overridden at the time of job submission)

aunsight.yml - A YAML configuration file containing information about this process. Normally, this should not need to be edited.

main.py - The main entry point for a Python package. For packages using runtimes for other languages, this file may be named main.sh, main.js, or main.r instead.

Note

Aunsight also initializes a hidden file, .auignore, which excludes certain files from upload, much as a .gitignore file excludes files from a commit to a git-based repository. Users do not generally need to edit this file.

In addition to the files created at package initialization, a package may contain one or both of the following shell scripts, which perform setup and cleanup operations within the process container:

before.sh - A shell script run right before main.py is run.

after.sh - A shell script run right after the process’s main.py exits.
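
For orientation, here is a minimal sketch of what a main.py entry point might look like. It assumes that the defaults in env.yml reach the script as ordinary environment variables; the variable name GREETING_TARGET and the function build_message are hypothetical, and how structured arguments are delivered to the script is not covered here.

```python
import os


def build_message() -> str:
    """Return a greeting based on an environment variable (hypothetical name)."""
    # Assumption: a default set in env.yml, or an override supplied at job
    # submission, is visible to the script as a normal environment variable.
    target = os.environ.get("GREETING_TARGET", "world")
    return f"Hello, {target}!"


if __name__ == "__main__":
    print(build_message())
```

Keeping the logic in a function rather than at module level makes it possible to test the code locally before uploading the package.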

Uploading a process repository

When a user has edited a process package to the desired state, they can upload it with the au2 process repo upload command. Once uploaded to Aunsight, the package can be run and tested.

If a process with the specified name does not exist in Aunsight, Toolbelt will ask you to confirm that you would like to create that process and the repository will be uploaded as version one.

If the process already exists, Toolbelt will ask to confirm creation of a new process version.

Upon confirmation of the creation of a new process or process version, Toolbelt will display the new process version and the ID for the metro-dispatcher:process-upload job.

Version created: 1
Uploading package
Job submitted: 7644571

Tip

The job ID displayed during process upload can be useful in case there is a problem uploading the process repository into Aunsight. If the process package is not uploaded, debugging information will be recorded in the upload job record.

To confirm that a process has been uploaded successfully, list that process’s versions:

$ au2 process version list eadf98c2-8e85-46ae-b81a-640d9d326109
┌────┬─────────────┬──────────────┬───────────┐
│ id │ description │ image_status │ published │
├────┼─────────────┼──────────────┼───────────┤
│ 1  │             │ AVAILABLE    │           │
└────┴─────────────┴──────────────┴───────────┘
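
If this check needs to be scripted, the table above can be parsed with a few lines of Python. The sketch below assumes the box-drawing layout shown in the sample output; the real layout may vary between Toolbelt releases, and the function name parse_box_table is hypothetical.

```python
from typing import Dict, List


def parse_box_table(text: str) -> List[Dict[str, str]]:
    """Parse a box-drawing table into a list of row dictionaries."""
    rows = []
    for line in text.splitlines():
        line = line.strip()
        if not (line.startswith("│") and line.endswith("│")):
            continue  # skip border lines such as ┌───┬───┐ and blank lines
        # Drop the outer borders, then split on the inner column separators
        rows.append([cell.strip() for cell in line[1:-1].split("│")])
    if not rows:
        return []
    header, body = rows[0], rows[1:]
    return [dict(zip(header, row)) for row in body]


# Sample text copied from the output shown above
sample = """
┌────┬─────────────┬──────────────┬───────────┐
│ id │ description │ image_status │ published │
├────┼─────────────┼──────────────┼───────────┤
│ 1  │             │ AVAILABLE    │           │
└────┴─────────────┴──────────────┴───────────┘
"""
versions = parse_box_table(sample)
print(versions[0]["image_status"])  # prints AVAILABLE
```

A version whose image_status is AVAILABLE, as in the sample, has been built and stored; other status values are not documented here, so a script should treat anything else as "not ready yet" rather than guess at its meaning.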

Acceptable Runtimes

Users should NOT use a runtime that doesn't appear in the list below.

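┌──────────────────────────────────────┬───────────────────────┬──────────────────────────────────────┬──────────────────────────────────────┬─────────┐
│ runtime                              │ runtime_name          │ organization                         │ project                              │ version │
├──────────────────────────────────────┼───────────────────────┼──────────────────────────────────────┼──────────────────────────────────────┼─────────┤
│ cc76d137-acf6-488f-80a7-a1db6c9d9cd7 │ Python3.11 Torch      │                                      │                                      │ 4       │
│ 4cd24262-8061-40be-bbb0-9367690240ce │ Python3.8 Torch       │                                      │                                      │ 4       │
│ 1bff1454-88a0-460f-8a98-a13293b23238 │ Python3.11 Tensorflow │                                      │                                      │ 4       │
│ bd23e8c0-ef00-40fb-ab49-0a4ffd855a04 │ SF Runtime            │ f9acae51-9c4d-4080-a300-03a161a93496 │ 09985c71-a4de-4eb7-9e58-8574f108459f │ 8       │
│ 1f7e1782-519e-43c9-88f0-7dc02af05630 │ SF-XGB Runtime        │ f9acae51-9c4d-4080-a300-03a161a93496 │ 09985c71-a4de-4eb7-9e58-8574f108459f │ 12      │
└──────────────────────────────────────┴───────────────────────┴──────────────────────────────────────┴──────────────────────────────────────┴─────────┘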
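
One way to guard against choosing an unlisted runtime is to check the GUID in a script before initializing a repository. The following Python sketch hard-codes the GUIDs from the list above; the runtime names are read from that list as best they can be, and the function name check_runtime is hypothetical.

```python
# Runtime GUIDs and names copied from the list above
ACCEPTABLE_RUNTIMES = {
    "cc76d137-acf6-488f-80a7-a1db6c9d9cd7": "Python3.11 Torch",
    "4cd24262-8061-40be-bbb0-9367690240ce": "Python3.8 Torch",
    "1bff1454-88a0-460f-8a98-a13293b23238": "Python3.11 Tensorflow",
    "bd23e8c0-ef00-40fb-ab49-0a4ffd855a04": "SF Runtime",
    "1f7e1782-519e-43c9-88f0-7dc02af05630": "SF-XGB Runtime",
}


def check_runtime(guid: str) -> str:
    """Return the runtime name, or raise ValueError if the GUID is not listed."""
    key = guid.strip().lower()
    if key not in ACCEPTABLE_RUNTIMES:
        raise ValueError(f"Runtime {guid!r} is not in the list of acceptable runtimes")
    return ACCEPTABLE_RUNTIMES[key]


print(check_runtime("cc76d137-acf6-488f-80a7-a1db6c9d9cd7"))  # prints Python3.11 Torch
```

A hard-coded mapping like this needs to be updated by hand whenever the list of acceptable runtimes changes.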

Creating an Image Process

Creating an image process involves two parts: writing a Dockerfile, and building and uploading the resulting image. Because Dockerfiles are a widely used standard well beyond Aunsight, this article does not discuss techniques for writing one that will run custom code; it is presumed that users who want to create an Aunsight image process already know how to design a Dockerfile to package and containerize their code.

Hint

If you wish to continue with this tutorial but do not have a Dockerfile at hand, you may use the following simple example, which creates a small (~5 MB) image that waits one second before terminating successfully:

FROM alpine:latest
CMD sleep 1 && echo SUCCEEDED && exit 0

Step 1: Create a Process Version

Before you can upload a process image, you must create a process version using the au2 process version create command. This takes the GUID of the process for which you would like to create a version as its sole argument:

$ au2 process version create <PROCESS GUID>
┌──────┬───────┐
│ type │ image │
├──────┼───────┤
│ id   │ 4     │
└──────┴───────┘

Note

Process versions are numbered sequentially. If a process currently has 3 versions, issuing the au2 process version create command will create version 4. Issuing the command again for the same process will create versions 5, 6, and so on.

Step 2: Build the Docker Image

First, ensure that Docker has been installed on your local machine and that your user account has the appropriate permissions to run Docker as a non-root user.

Note

Permissions are usually configured automatically for users of the Docker Desktop client for macOS and Windows.

Second, open a terminal in the directory that contains the Dockerfile you would like to build and issue a docker build command:

docker build -t my-name/my-process .

This will build a Docker image tagged my-name/my-process by using the Dockerfile in the current directory.

Step 3: Upload the Docker Image to Aunsight

If the Dockerfile builds successfully, you can upload the resulting image from your local machine to Aunsight.

docker save my-name/my-process | au2 process version upload --process <PROCESS GUID> --ver <VERSION NUMBER> --stdin --wait

Attention

Docker images may be quite large, so the upload process can take some time.
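
The same pipeline can also be driven from a script. The Python sketch below streams the output of docker save directly into the au2 upload command shown above, so that no intermediate tarball is written to disk. The function names are hypothetical, and only the commands and flags already shown in this article are used; running upload_image requires both docker and au2 to be installed and on the PATH.

```python
import subprocess
from typing import List


def docker_save_command(image_tag: str) -> List[str]:
    """Argument list that writes the image tarball to standard output."""
    return ["docker", "save", image_tag]


def upload_command(process_guid: str, version: int) -> List[str]:
    """Argument list for the upload command, reading the image from stdin."""
    return [
        "au2", "process", "version", "upload",
        "--process", process_guid,
        "--ver", str(version),
        "--stdin", "--wait",
    ]


def upload_image(image_tag: str, process_guid: str, version: int) -> None:
    """Pipe `docker save` into `au2 process version upload`."""
    save = subprocess.Popen(docker_save_command(image_tag), stdout=subprocess.PIPE)
    try:
        # The upload command reads the image tarball from the pipe
        upload = subprocess.run(upload_command(process_guid, version), stdin=save.stdout)
    finally:
        save.stdout.close()
        save_status = save.wait()
    if save_status != 0 or upload.returncode != 0:
        raise RuntimeError(
            f"upload failed (docker save: {save_status}, au2: {upload.returncode})"
        )
```

For example, upload_image("my-name/my-process", "<PROCESS GUID>", 4) would correspond to the shell pipeline above, with the placeholder replaced by a real process GUID.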

Upon completion of an upload, you can check the status of the image upload by issuing the au2 process version list <PROCESS GUID> command to see if your process version upload has succeeded.

$ au2 process version list eadf98c2-8e85-46ae-b81a-640d9d326109
┌────┬─────────────┬──────────────┬───────────┐
│ id │ description │ image_status │ published │
├────┼─────────────┼──────────────┼───────────┤
│ 1  │             │ AVAILABLE    │           │
└────┴─────────────┴──────────────┴───────────┘

Also note that if you issue the process upload command with a version that already has an associated image, the upload will fail.

Tip

If a process image upload fails, check to see if there is already a process image associated with the process version you specified in the upload command.