Understanding Aunsight Compute

Aunsight is not just a data storage platform; it is a data exploration platform that provides a powerful infrastructure for scalable and elastic computation on big data. By harnessing industry-standard, open-source distributed computing technologies such as Docker and Kubernetes running within our cloud infrastructure, Aunsight offers robust capabilities for data exploration.

Hardware alone accounts for only part of the efficiency of this platform. The software architecture of the Aunsight platform itself accelerates distributed computing. Instead of moving large datasets, Aunsight moves the small programs that operate on data to where the data lives. Within the cluster that backs the Aunsight data lake, individual storage hosts are designed to provide both storage and compute, allowing compute operations to be distributed across the cluster so that each host can process its part of the dataset in parallel.
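The idea of moving the program to the data can be illustrated with a minimal Python sketch. It is not Aunsight code: the host names, the PARTITIONS layout, and the functions local_sum and distributed_sum are hypothetical, and a thread pool stands in for the cluster. Each "host" runs a small program against its own partition and returns only a small partial result, which is then combined.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical layout: each "host" stores one partition of a dataset.
PARTITIONS = {
    "host-a": [3, 1, 4],
    "host-b": [1, 5, 9],
    "host-c": [2, 6],
}

def local_sum(host):
    """Small program that runs where the partition lives and returns a small result."""
    return sum(PARTITIONS[host])

def distributed_sum(hosts):
    """Run the program on every host in parallel, then combine the partial results."""
    with ThreadPoolExecutor(max_workers=len(hosts)) as pool:
        partials = list(pool.map(local_sum, hosts))
    return sum(partials)

total = distributed_sum(sorted(PARTITIONS))
print(total)
```

Only the partial sums travel between hosts in this sketch; the partitions themselves never move, which is the property the paragraph above describes.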

In order to distribute computation across a number of hosts, the programs that operate on data need to be packaged into standardized containers. Aunsight relies on the Docker framework to provide containerization and manages its cluster of Docker hosts via a Kubernetes cluster. Docker and Kubernetes are industry standards for distributed computing and cluster management in the cloud. Together, these technologies allow Aunsight to provide a scalable, elastic compute infrastructure based on widely supported industry standards. And because this technology stack allows compute tasks to run in parallel on the hosts closest to where the data lives, performance scales seamlessly as the size of datasets increases.

Tasks: The Basic Unit of Work

From the user's perspective, this compute platform executes programs that perform some kind of asynchronous work on data in a finite period of time. Aunsight calls these programs, which run in the cluster, "tasks." For example, when Aunsight users query a dataset using Apache Drill, a program that executes the query is packaged as a Docker image and distributed across the hosts where the data is stored. When all the containers return the results of their work, Aunsight considers the task to have succeeded; otherwise, it provides feedback in the form of an error message and logging. Tasks differ from other parts of the Aunsight system, such as system services, in that services run all the time and are generally not distributed across a number of hosts.
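The success rule for a task can be sketched in a few lines of Python. This is an illustrative model only, not how Aunsight is implemented: the ContainerResult class, its fields, and the task_outcome function are assumptions made for the example. A task counts as succeeded only when every container finishes cleanly; otherwise the error messages are collected as feedback.

```python
from dataclasses import dataclass

@dataclass
class ContainerResult:
    """Hypothetical result reported by one container of a task."""
    host: str
    exit_code: int
    message: str = ""

def task_outcome(results):
    """Return (status, errors): "succeeded" only if every container exited with 0."""
    errors = []
    for r in results:
        if r.exit_code != 0:
            detail = r.message or f"exit code {r.exit_code}"
            errors.append(f"{r.host}: {detail}")
    status = "failed" if errors else "succeeded"
    return status, errors

ok = task_outcome([ContainerResult("host-a", 0), ContainerResult("host-b", 0)])
bad = task_outcome([
    ContainerResult("host-a", 0),
    ContainerResult("host-b", 1, "query timed out"),
])
print(ok)
print(bad)
```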

Aunsight tasks can be divided into system tasks and user tasks. System tasks run trusted code in Aunsight-provided Docker images. User tasks, by contrast, run user-provided code, such as workflows or dataflows. Because both types of tasks represent work with data, users can monitor and interact with tasks through job records, which track task status, log data, and output (if any). Aunsight provides a jobs workspace for viewing these records.

Jobs: Tracking the Work

Tasks within Aunsight are managed from the jobs workspace. The jobs workspace provides a list-based view of all tasks run within an organization's context. This list of job records represents both system and user tasks. Each job record contains different information depending on the type of job run. When jobs produce output messages, this data is also included in the job record and can be retrieved from the jobs workspace.
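As a rough mental model, a job record can be pictured as a small data structure holding a task's status, logs, and optional output. The Python sketch below is an assumption made for illustration: the JobRecord fields, the status values, and the filter_jobs helper are hypothetical and do not reflect Aunsight's actual record schema.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class JobRecord:
    """Hypothetical job record: one entry in a list-based jobs view."""
    job_id: str
    kind: str                      # "system" or "user"
    status: str = "running"        # e.g. "running", "succeeded", "failed"
    logs: List[str] = field(default_factory=list)
    output: Optional[str] = None   # present only when the job produces output

def filter_jobs(records, kind=None, status=None):
    """Return the records matching the given kind and/or status."""
    return [
        r for r in records
        if (kind is None or r.kind == kind)
        and (status is None or r.status == status)
    ]

records = [
    JobRecord("job-1", "system", "succeeded", ["query finished"], "42 rows"),
    JobRecord("job-2", "user", "failed", ["process exited with an error"]),
    JobRecord("job-3", "user"),
]
failed_user_jobs = filter_jobs(records, kind="user", status="failed")
print([r.job_id for r in failed_user_jobs])
```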

The job ID numbers assigned to job records also provide a mechanism for users to terminate tasks. Because "killing" (ending) a task is not considered a normal part of Aunsight administration, the web interface provides no mechanism for doing so, but it can be done easily using the Toolbelt command line interface (CLI). Toolbelt users who have the AU-DISPATCHER:kill-any-job and AU-TRACKER:delete-job permissions can issue kill commands for jobs managed by each of these services respectively.
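The pairing of permissions and services can be pictured as a simple lookup. The sketch below is not Toolbelt code and does not show any Toolbelt command: the permission strings come from the text above, while the service keys "dispatcher" and "tracker" and the can_kill function are hypothetical names chosen for the example.

```python
# Permission strings are taken from the documentation; the service keys
# "dispatcher" and "tracker" are hypothetical labels for this sketch.
REQUIRED_PERMISSION = {
    "dispatcher": "AU-DISPATCHER:kill-any-job",
    "tracker": "AU-TRACKER:delete-job",
}

def can_kill(user_permissions, service):
    """Return True if the user holds the permission required by the managing service."""
    required = REQUIRED_PERMISSION.get(service)
    return required is not None and required in user_permissions

perms = {"AU-DISPATCHER:kill-any-job"}
print(can_kill(perms, "dispatcher"))
print(can_kill(perms, "tracker"))
```

A user holding only one of the two permissions can end jobs managed by the matching service but not by the other.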

Processes: Extending Aunsight

In addition to tasks generated from the Aunsight tools, processes uploaded into Aunsight allow users to run custom Docker images as tasks within Aunsight. Processes provide the ultimate in flexibility, because users can write their own code using the Aunsight SDKs for Python and JavaScript to interact with the platform. Because code for processes is user-provided, Aunsight treats user tasks as untrusted code and places some restrictions on how processes can interact with the platform. In general, however, Aunsight processes are able to interact with the platform fully, which makes processes a powerful tool for automation and customization.

Resources

Because everything that runs in Aunsight depends on physical hardware resources, Aunsight provides an abstraction for dealing with compute and storage resources in an organization or project context. Resource records help Aunsight determine how to run a process or store data by abstracting details about the physical or virtual hosts involved. For example, an organization may have an HDFS storage cluster available to it, and that organization may later request an increase in the number of storage hosts beyond its current limits. Aunalytics can provision additional resources and modify the details of the resource record to reflect the new hardware. All of these upgrades can be done behind the scenes, meaning users can simply name the resource they need and leave provisioning and infrastructure engineering to the Aunalytics team.
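The value of this abstraction is that user code refers to a resource by name while the details behind the name can change. The following Python sketch illustrates the idea under stated assumptions: the ResourceRecord class, the registry dictionary, the resource name "lake-storage", the host names, and both helper functions are hypothetical and are not part of any Aunsight API.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class ResourceRecord:
    """Hypothetical resource record: a stable name plus changeable host details."""
    name: str
    kind: str
    hosts: List[str]

# Hypothetical registry maintained by the platform operator.
registry = {
    "lake-storage": ResourceRecord("lake-storage", "hdfs", ["node-1", "node-2"]),
}

def resolve(name):
    """What user code does: look up a resource by name only."""
    return registry[name]

def provision_hosts(name, new_hosts):
    """What the operator does: update the record's details behind the scenes."""
    registry[name].hosts.extend(new_hosts)

before = len(resolve("lake-storage").hosts)
provision_hosts("lake-storage", ["node-3", "node-4"])
after = len(resolve("lake-storage").hosts)
print(before, after)
```

The caller's lookup by name is identical before and after the upgrade; only the record's contents differ.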

Because hardware resources represent resources available by service contract, the web interface does not allow users to change the resources available to them. If you need to make changes to your resources, we'd be glad to make changes to your service contract. Simply contact us to discuss your needs and receive a quote.