Working with Jobs

The Jobs workspace provides a window onto the tasks that the Aunsight platform runs on the compute resources in a specific context. Because compute operations are an important part of dataflows, workflows, and processes, users need to be able to monitor the status and results of these tasks. This article explains how to manage tasks with the Jobs workspace: how to view, search, and understand the job records that the Aunsight platform uses to track tasks. It also discusses how to stop jobs using the Aunsight Toolbelt command line interface, should that be necessary.

Viewing Jobs

The Aunsight web interface provides a Jobs workspace to users who have the AU-TRACKER:view-job permission available to them through a role or group they have in a given context.

To access this workspace, log in to the web interface and select the context you wish to work in through the context selector. From that context's dashboard, click the "Jobs" icon in the palette on the right.

Jobs Workspace

The jobs workspace is a standard list-based view of jobs in the present context. You can search the list for a job by clicking the search icon at the top of the list. If a job does not appear, or its status seems out of date, click the refresh icon to update the data in the list.

Searching for Jobs

Clicking the search icon brings up a search box with three filter buttons.

Simply typing text into the search box will dynamically search the list based on the name or ID of the task. For example, searching with a process name will show jobs related to the process.

Users can also filter the job list by clicking one of the filter buttons:

job filters

  • Filter by type
    Displays jobs matching a specific job type.
  • Filter by state
    Displays jobs in a specific state, such as:
    • Pending (submitted, but not yet run)
    • Running
    • Succeeded
    • Failed
    • Killed (stopped)
  • Submitted by me only
    Displays only jobs submitted by the current user.
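
The search box and filter buttons above behave like predicates applied to the job list. The following Python sketch illustrates that idea only. The record layout (`id`, `name`, `type`, `state`, `submitted_by`), the sample values, and the assumption that active filters combine with a logical AND are all hypothetical and are not taken from the Aunsight API.

```python
# Hypothetical job records; the keys and values are illustrative assumptions,
# not the platform's actual schema.
JOBS = [
    {"id": "job-001", "name": "nightly-load", "type": "workflow",
     "state": "succeeded", "submitted_by": "alice"},
    {"id": "job-002", "name": "score-model", "type": "process",
     "state": "running", "submitted_by": "bob"},
    {"id": "job-003", "name": "nightly-load", "type": "workflow",
     "state": "failed", "submitted_by": "bob"},
]


def filter_jobs(jobs, text="", job_type=None, state=None, submitted_by=None):
    """Return the jobs that satisfy every filter that is set."""
    needle = text.lower()
    matches = []
    for job in jobs:
        # Free-text search matches on the job name or the job ID.
        in_name = needle in job["name"].lower()
        in_id = needle in job["id"].lower()
        if needle and not (in_name or in_id):
            continue
        if job_type is not None and job["type"] != job_type:
            continue
        if state is not None and job["state"] != state:
            continue
        if submitted_by is not None and job["submitted_by"] != submitted_by:
            continue
        matches.append(job)
    return matches


# Example: failed jobs whose name or ID contains "nightly".
failed_nightly = filter_jobs(JOBS, text="nightly", state="failed")
print([job["id"] for job in failed_nightly])
```

Passing `submitted_by` with the current user's name corresponds to the "Submitted by me only" button in this sketch.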

Viewing Details of a Job

Clicking a job in the list panel brings up that record in the main view of the user interface. Each job record displays data in two tabs: details and logs.

job details

Details tab

The details tab of each job record displays metadata related to that job record. All job records will contain the following fields:

  • ID - The job ID number which can be used to stop the job
  • Name - A descriptive name for the job
  • Type - The type of job (e.g. dataflow, workflow, process, query)
  • State - The current status of the job (e.g. running, pending, failed)
  • Created At - Timestamp of the job's submission
  • State Updated At - Timestamp of the last change to the job's state
  • Updated At - Timestamp of the last update about the job from the tracker service
  • Duration - Length of time from start to final completion (or the present, if the job is still active)

In addition to these basic fields, which are common to all jobs, a number of other fields may be present depending on the type of job selected. For file-related jobs, these fields usually refer to locks placed on files. For compute-focused jobs, the fields can contain details of the execution environment (RAM and CPU core allocations, etc.).
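
As an illustration of how the Duration field described above relates to the timestamp fields, here is a minimal Python sketch. It rests on assumptions not stated in this article: that the duration starts at Created At, that a finished job ends at State Updated At, and that "pending" and "running" are the states that count as still active. The function name is hypothetical.

```python
from datetime import datetime, timedelta, timezone

# Assumption: these states count as "still active" for the Duration field.
ACTIVE_STATES = {"pending", "running"}


def job_duration(created_at, state_updated_at, state, now=None) -> timedelta:
    """Return the elapsed time of a job.

    Finished jobs are measured up to their last state change; active jobs
    are measured up to the present moment.
    """
    if state in ACTIVE_STATES:
        end = now if now is not None else datetime.now(timezone.utc)
    else:
        end = state_updated_at
    return end - created_at


created = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
finished = datetime(2024, 1, 1, 12, 45, tzinfo=timezone.utc)
print(job_duration(created, finished, "succeeded"))  # 0:45:00
```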

Logs tab

The logs tab displays the contents of the Aunsight Loggerstream object associated with this job. Loggerstreams are specific Aunsight platform objects that allow tasks to log information about their performance. As the following example shows, loggerstreams are JSON objects containing a series of messages regarding the status of the job.

loggerstream view

Note

If you cannot see loggerstream data for a job, make sure you have the AU-LOGGER:view-stream permission.

The loggerstream view can be controlled in a variety of ways by the action buttons provided via the interface:

  • Sort by Timestamp
    Log data can be sorted by timestamp using the newest-first and oldest-first buttons.

  • View Mode
    Log data can be viewed as date-sorted JSON, date-sorted messages, or raw JSON.

  • Loggerstream ID
    Copies the Loggerstream ID to the clipboard (useful for access from the Toolbelt and SDK interfaces).
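
The sort and view-mode controls above can be pictured as simple transformations of the logged messages. The sketch below is illustrative only: the entry keys (`timestamp`, `level`, `message`) and the sample log are hypothetical, and an actual loggerstream may be structured differently.

```python
import json

# Hypothetical loggerstream content; real entries may use different keys.
RAW_LOG = """
[
  {"timestamp": "2024-01-01T10:00:05Z", "level": "info", "message": "Job finished"},
  {"timestamp": "2024-01-01T10:00:00Z", "level": "info", "message": "Job started"},
  {"timestamp": "2024-01-01T10:00:03Z", "level": "warn", "message": "Retrying read"}
]
"""


def sort_entries(entries, newest_first=False):
    """Return a new list of log entries sorted by timestamp.

    ISO 8601 timestamps in one uniform format sort correctly as strings.
    """
    return sorted(entries, key=lambda e: e["timestamp"], reverse=newest_first)


def as_messages(entries):
    """Render entries as 'timestamp message' lines (a message-style view)."""
    return [f'{e["timestamp"]} {e["message"]}' for e in entries]


entries = json.loads(RAW_LOG)

# Message-style view, newest first.
for line in as_messages(sort_entries(entries, newest_first=True)):
    print(line)

# JSON-style view, oldest first.
print(json.dumps(sort_entries(entries), indent=2))
```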

Raw JSON tab

The raw JSON (JavaScript Object Notation) tab is a viewing option on the Aunsight Jobs page that shows the log data exactly as the system generated it. The raw data keeps every detail in its original structure, with no additional formatting, interpretation, or visualization applied.

Raw JSON

Recover Button

For users with the ability to run Workflows, a "Recover" button appears on failed or killed Workflow jobs that have no previous successful recovery attempt. The "Recover" button is disabled if the job is more than 7 days old.

Recover Button

Once the workflow job has run successfully the "Recover" button will no longer be available.
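
The conditions that govern the "Recover" button can be summarized as a small decision function. This Python sketch is a hypothetical model of the rules stated in this section, not the web interface's code. The `JobRecord` class and its fields are illustrative, and measuring the job's age from its Created At timestamp is an assumption.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone

MAX_RECOVER_AGE = timedelta(days=7)


@dataclass
class JobRecord:
    # Hypothetical, simplified job record.
    job_type: str
    state: str
    created_at: datetime
    recovered_successfully: bool = False


def recover_button(job, can_run_workflows, now):
    """Return (visible, enabled) for the Recover button."""
    visible = (
        can_run_workflows
        and job.job_type == "workflow"
        and job.state in {"failed", "killed"}
        and not job.recovered_successfully
    )
    # Assumption: job age is measured from the Created At timestamp.
    enabled = visible and (now - job.created_at) <= MAX_RECOVER_AGE
    return visible, enabled


now = datetime(2024, 1, 10, tzinfo=timezone.utc)
recent = JobRecord("workflow", "failed", datetime(2024, 1, 8, tzinfo=timezone.utc))
old = JobRecord("workflow", "killed", datetime(2024, 1, 1, tzinfo=timezone.utc))
print(recover_button(recent, True, now))  # (True, True)
print(recover_button(old, True, now))     # (True, False)
```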

Recover Button Advanced Options

When using the Recover Workflow feature, advanced options are available. Successful Component IDs can be selected to rerun, Failed Component IDs can be selected to skip in the rerun, and Failed Component IDs can be selected to force success, allowing users to customize the Workflow recovery.

Recover Advanced Options

During a recovery run, users can force success on some components of the Workflow by entering their IDs under ‘Failed Component IDs to: Force Success’ in the advanced options modal. This allows the rest of the downstream workflow to execute.

Supported components:

  • Workflows
  • Dataflows
  • Workflow_multi
  • Dataflow_multi
  • Datamart Load
  • Run Process
  • Run Process Multi
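
One way to picture how the three advanced recovery options interact is as a per-component plan. The sketch below is hypothetical. It assumes that, by default, a recovery run reruns failed components and does not rerun successful ones, which this article implies but does not state. The function name, parameter names, and action labels are illustrative and do not come from the Aunsight interface.

```python
def plan_recovery(statuses, rerun_successful=(), skip_failed=(), force_success=()):
    """Map each component ID to an action for the recovery run.

    statuses maps a component ID to its result in the original run. Any
    status other than "succeeded" is treated as failed in this sketch.
    """
    overlap = set(skip_failed) & set(force_success)
    if overlap:
        raise ValueError(f"components both skipped and forced: {sorted(overlap)}")

    plan = {}
    for component_id, status in statuses.items():
        if status == "succeeded":
            # Assumed default: successful components are not run again.
            plan[component_id] = "rerun" if component_id in rerun_successful else "keep"
        elif component_id in force_success:
            # Marked as successful so downstream components can proceed.
            plan[component_id] = "force_success"
        elif component_id in skip_failed:
            plan[component_id] = "skip"
        else:
            # Assumed default: failed components are run again.
            plan[component_id] = "rerun"
    return plan


original_run = {"extract": "succeeded", "load": "failed", "report": "failed"}
print(plan_recovery(original_run, force_success=["load"]))
```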

Resubmit Button

The "Resubmit" button is available for Workflows, Dataflows, and Process jobs. The button will produce a form pre-filled with arguments from the selected job which the user can edit and submit. This is a convenient way to resubmit a job with the same or modified arguments as a previous job. The "Resubmit" button will be disabled if the job is over 7 days old.

Resubmit Button

Stopping Jobs

Occasionally, users may find that a job is taking much longer than it should due to unexpected issues. This is common with dataset ingestion, because network difficulties between the client interface and the platform services can interrupt data transfer operations. Processes are another common example, since user-supplied code can be subject to unpredictable errors.

In these cases, it is possible to stop a job. Because this ability should not be overused, the web app does not support it. Users who wish to kill jobs can do so from the Toolbelt command line interface if they have the AU-DISPATCHER:kill-any-job or AU-DISPATCHER:kill-any-project-job permission, depending on the desired context.

A job can be killed with Toolbelt using either the ID of the object submitted for compute (e.g. a dataflow, process, or workflow) or the job ID itself.

To kill a job by referring to the object that is being run, type au2 dataflow job kill <object ID> (substitute the process or workflow commands for dataflow to kill those types of objects by their IDs).

To kill by referring to the job ID itself, type au2 dataflow job kill --job <job ID> (substitute the process or workflow commands for dataflow to kill those types of objects).

Killing a job instructs Aunsight to stop any containers running that job and to collect and clean up the job's environment. The job record itself will remain, but its state should update to killed within a few moments.
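
The two command forms described above differ only in their final arguments, so a script can assemble either one from the same pieces. The following Python helper is a sketch: the function name is hypothetical, and it only builds and prints the command text rather than invoking Toolbelt. The command shapes are the ones given in this section, and the IDs are placeholders.

```python
import shlex

# Toolbelt sub-commands named in this section.
OBJECT_COMMANDS = ("dataflow", "process", "workflow")


def kill_command(kind, object_id=None, job_id=None):
    """Build the argument list for an au2 kill command.

    Exactly one of object_id or job_id must be given.
    """
    if kind not in OBJECT_COMMANDS:
        raise ValueError(f"unsupported object type: {kind!r}")
    if (object_id is None) == (job_id is None):
        raise ValueError("pass exactly one of object_id or job_id")
    argv = ["au2", kind, "job", "kill"]
    if job_id is not None:
        argv += ["--job", job_id]
    else:
        argv.append(object_id)
    return argv


# Placeholder IDs; substitute real ones before running the printed command.
print(shlex.join(kill_command("dataflow", object_id="OBJECT_ID")))
print(shlex.join(kill_command("process", job_id="JOB_ID")))
```

Returning a list rather than a single string keeps the arguments separate, which avoids quoting problems if the list is later passed to a process launcher.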

Understanding Job Types

Both user and system tasks are tracked by means of "Job" records. The Aunsight web interface provides a Jobs workspace that allows users to view a record of information about compute tasks run in an organization context. Job records contain different information depending on the type of task.

Tokamak Dataflows

Tokamak dataflow jobs are specialized Docker containers that perform ETL functions using Pig Latin operations on a Hadoop filesystem.

Workflows

Workflow jobs are containers that manage the execution of a workflow. As such, workflow containers will frequently start up other jobs (dataflows, queries, processes, etc.) or even start other workflows.

Processes

Process jobs run Docker images containing custom code that users upload to perform some specific function. Processes usually interact with the Aunsight platform through one of the SDKs.

Queries

Aunsight queries are containers that execute queries on an Apache Drill query resource and return data. These queries and the containers that run them are created and managed by the Aunsight Query service.

Dispatcher Tasks

The Metrodispatcher (Dispatcher) service is an Aunsight platform service that facilitates large data transfer within the Aunsight platform. Because many data transfers can take hours or even days and involve terabytes of data moving slowly over encrypted connections through the public Internet, the dispatcher service listens for incoming requests and immediately assigns a worker node to serve as a liaison for the remainder of the data transfer. The service itself merely delegates and monitors these tasks so that it can remain responsive to further incoming requests.

Dispatcher handles a variety of tasks involving the transfer of big data. Metrodispatcher tasks fall into one of the following categories:

  • Download
  • Describe
  • Upload
  • Copy
  • Delete
  • Hash
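
The delegate-and-monitor pattern described in this section, in which the service hands each request to a worker and stays free to accept the next one, can be illustrated with a thread pool. This is a generic Python sketch of the pattern and not the Metrodispatcher implementation. The class, function, and target names are hypothetical, and the transfer is simulated with a short sleep.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from enum import Enum


class TaskKind(Enum):
    # The six task categories listed above.
    DOWNLOAD = "download"
    DESCRIBE = "describe"
    UPLOAD = "upload"
    COPY = "copy"
    DELETE = "delete"
    HASH = "hash"


def run_transfer(kind, target):
    """Stand-in for a long-running transfer performed by a worker."""
    time.sleep(0.05)  # simulate slow I/O
    return f"{kind.value}:{target} done"


class Dispatcher:
    """Hands each request to a worker and returns immediately."""

    def __init__(self, workers=2):
        self._pool = ThreadPoolExecutor(max_workers=workers)
        self._tasks = {}

    def submit(self, kind, target):
        # Delegation: a worker does the transfer; the caller is not blocked.
        future = self._pool.submit(run_transfer, kind, target)
        self._tasks[target] = future
        return future

    def finished(self, target):
        # Monitoring: report whether the delegated task has completed.
        return self._tasks[target].done()

    def shutdown(self):
        self._pool.shutdown(wait=True)


dispatcher = Dispatcher()
future = dispatcher.submit(TaskKind.COPY, "dataset-a")
print(future.result())  # blocks only here, while waiting for the worker
dispatcher.shutdown()
```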

Scripts

AuQL scripts are managed from the AuQL service and its associated workspace in the Web interface. The scripts themselves are run within containers and tracked as AuQL script jobs.

Sightglass Source Publisher

Sightglass data sources are pushed to a public cloud so that data can be distributed efficiently across the public internet and mobile data networks. Sightglass source publisher jobs are containers responsible for performing the data transformations and uploads needed to push a new version of Sightglass data to the public cloud.

Peeper Reports

Peeper Reports are statistical analyses of Aunsight datasets. Generating these reports requires intensive computational work on the entire dataset. Peeper report jobs are containers responsible for reading through an entire dataset to generate the statistics included in these reports.