What is a Workflow?
Heraclitus (535-475 BCE)
The Aunsight platform creates new value from data by bringing it into motion. When businesses collect data, they usually keep it in a static, stored form: the database. But if data is going to be more than a cost, it needs to flow through a process that can generate insights. The Aunsight platform offers a comprehensive suite of tools for many kinds of transformation, from simple ETL data cleaning to sophisticated machine learning. While many vendors in the data science software industry offer tools that can do one or more of these tasks, Aunsight's integration as a platform allows all of the pieces to be deployed together as parts of a single flow of data.
Workflows are the tool that conducts the parts of this flow. In Aunsight, workflows tie everything together into an automated, and potentially reactive and intelligent, data processing pipeline. The tasks a workflow can connect include:
- Monitoring datasets and performing basic ETL data cleaning and preparation on newly-arrived data.
- Scheduling and evaluating the training, pruning, and selection of machine learning models based on new data.
- Evaluating scripts that control the flow of an analytics pipeline so that data processing can react to changes in data inputs automatically.
- Managing data endpoints and dynamically refreshing the reports and dashboards data consumers view.
- Launching other workflows as a way to link different projects or complex parts of a large project.
Each of these components of a data analytics process represents a different tool in the Aunsight platform, but workflows orchestrate and manage all of them so that they can be directed in a planned way. This article explains how workflows conduct this flow of data and computing power to build an automated data analytics pipeline.
Making Work Flow in Aunsight
Aunsight workflows connect the components of an analytics process by routing the inputs and outputs of each component through a graph of connections. If the data lake holds data at rest, like water in a lake, then workflows are like a hydraulic system that pumps, channels, and diverts that data towards some useful purpose.
In technical terms, Aunsight workflows are a graph of connections between two or more components. Visually, Aunsight displays workflows as a map of connected parts going from left to right.
This graph is defined by a JSON object that describes the individual components and how the inputs and outputs of each piece are connected. Building this graph allows a data engineer to, for example, 1) use the output of a dataflow as the source for 2) a machine learning model training process, and then 3) evaluate a script to see whether the training produced a model version that is better at classifying a sample dataset.
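To make the idea of a JSON-defined graph more concrete, the following Python sketch models the three-step example above as a set of components and connections, then derives an order in which the components could run. The field names (`components`, `connections`, `from`, `to`), the component ids, and the `execution_order` helper are all illustrative assumptions; they are not Aunsight's actual workflow schema or API.

```python
import json
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical workflow definition. The field names and component ids are
# assumptions made for illustration, not Aunsight's real workflow schema.
workflow = {
    "name": "retrain-classifier",
    "components": [
        {"id": "clean_data", "type": "dataflow"},
        {"id": "train_model", "type": "model_training"},
        {"id": "compare_versions", "type": "script"},
    ],
    "connections": [
        {"from": "clean_data", "to": "train_model"},
        {"from": "train_model", "to": "compare_versions"},
    ],
}


def execution_order(definition):
    """Return component ids in an order that respects every connection."""
    sorter = TopologicalSorter()
    for component in definition["components"]:
        sorter.add(component["id"])
    for edge in definition["connections"]:
        # add(node, *predecessors): the "to" component waits for "from".
        sorter.add(edge["to"], edge["from"])
    return list(sorter.static_order())


if __name__ == "__main__":
    # The same structure serialized as a JSON object.
    print(json.dumps(workflow, indent=2))
    print(execution_order(workflow))
```

Treating the connections as edges in a directed graph is what lets each component start only after the components feeding it have finished. A real workflow engine would also need to handle failures, branching, and scheduling, which this sketch leaves out.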
Workflows are an indispensable part of any analytics process, so the following suite of articles discusses the tools that Aunsight provides for creating, managing, and designing workflows. This article suite presumes that users will want to use the Aunsight web interface (AWI) to interact with the platform, so learning the workflows workspace is an important starting point for what comes next. The next article focuses on how to create a workflow from scratch and explains the settings that users may want to know about when setting up their first workflows. Because designing a large graph of components in JSON is tedious, Aunsight provides the Workflow Builder, where users can design and run workflows using a graphical interface. Since the best learning always comes from doing, this suite ends with a capstone article where you can learn to build a simple workflow step-by-step in under ten minutes.