Managing Schemas

Dataset schemas provide a means for interpreting the structure of data in a dataset by declaratively specifying what fields exist in the dataset and what data type each field contains. In Aunsight, a dataset schema is represented as a JSON object that defines how to interpret field names and data types for various kinds of data structures. Aunsight relies on schemas to ensure that data read from a dataset is formatted correctly for its intended use. For example, a field that contains strings like "2018~07~09" needs to be understood as a date field and not as arbitrary text.
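To make the date example concrete, the following Python sketch shows how a schema-like mapping of field names to types can turn raw strings into typed values. The layout used here (a "fields" array of objects with "name", "type", and "format" keys) and the type names are hypothetical stand-ins chosen for illustration; they are not Aunsight's actual schema ontology.

```python
from datetime import datetime

# Hypothetical schema layout for illustration only; Aunsight's real
# schema ontology may use different keys and type names.
SCHEMA = {
    "fields": [
        {"name": "order_id", "type": "int"},
        {"name": "order_date", "type": "date", "format": "%Y~%m~%d"},
        {"name": "note", "type": "string"},
    ]
}


def convert(value: str, field: dict):
    """Convert one raw string according to its (hypothetical) field definition."""
    kind = field["type"]
    if kind == "int":
        return int(value)
    if kind == "date":
        # strptime matches the literal "~" separators and builds a real date
        return datetime.strptime(value, field["format"]).date()
    return value  # "string": keep the text as-is


def apply_schema(row: list[str], schema: dict) -> dict:
    """Pair each raw value in a row with its field name and typed value."""
    fields = schema["fields"]
    if len(row) != len(fields):
        raise ValueError(f"expected {len(fields)} values, got {len(row)}")
    return {f["name"]: convert(v, f) for v, f in zip(row, fields)}


record = apply_schema(["42", "2018~07~09", "rush order"], SCHEMA)
print(record)
# {'order_id': 42, 'order_date': datetime.date(2018, 7, 9), 'note': 'rush order'}
```

Without a schema, "2018~07~09" stays an opaque string; with one, the value can be compared, sorted, or filtered as a date.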

This article explains the tools available for editing dataset schemas via the Aunsight Web interface. Familiarity with the underlying JSON ontology of schema objects is not strictly necessary, but some understanding of data types and structured databases in general is helpful for understanding the purpose of Aunsight schemas.

Finding the Tools

After logging in to the Web interface and selecting the context you wish to work in, click the "Datasets" icon in the palette on the right. From this view, select the dataset you would like to explore to open its record:

Datasets Workspace

The Schema tab of the dataset record displays a number of action buttons:

  • Reset - Reverts the schema to the version currently stored in the Atlas record
  • Save - Saves changes to the schema to the Atlas record for this dataset
  • Copy Header - Copies the dataset header row to the clipboard
  • Detect Schema - Opens the schema autodetector tool
  • View Dataset - Displays a dialog for browsing the dataset
  • Raw - Displays the schema using the raw JSON schema editor tool
  • Guided - Displays the schema using the guided schema builder tool

The Detect Schema, Raw, and Guided buttons all open tools for editing schemas. The remainder of this article discusses each of these tools.

Schema Auto-detector

The schema auto-detector automates the process of generating a schema for most datasets. It opens a dialog that displays a provisional schema, generated by examining the first 10 KB of data in the dataset, and lets you edit that schema.

Schema autodetector

If your data has a header row, check the "First Row Header" box so that the first row of data is interpreted as field labels.

Users can edit the suggested schema by renaming fields in the "Field" column and changing the data type specified in the "Type" column.
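The detection step can be approximated as sampling the start of the data and testing each column against progressively looser types. The Python sketch below is not Aunsight's detector; it assumes the dataset is UTF-8 delimited text, reads "10 KB" as 10,240 bytes, recognizes only ISO dates, and uses hypothetical type names and placeholder field names ("field0", "field1", ...) when there is no header row.

```python
import csv
from datetime import datetime

# Assumptions for this sketch: UTF-8 delimited text, "10 KB" = 10 * 1024 bytes.
SAMPLE_BYTES = 10 * 1024


def infer_type(values: list[str]) -> str:
    """Return the narrowest (hypothetical) type name that fits every value."""
    def all_parse(parser) -> bool:
        try:
            for v in values:
                parser(v)
        except ValueError:
            return False
        return True

    if not values:
        return "string"  # nothing to go on, so fall back to text
    if all_parse(int):
        return "int"
    if all_parse(float):
        return "float"
    if all_parse(lambda v: datetime.strptime(v, "%Y-%m-%d")):
        return "date"
    return "string"


def detect_schema(raw: bytes, first_row_header: bool) -> list[dict]:
    """Propose a field list from the first SAMPLE_BYTES bytes of raw data."""
    sample = raw[:SAMPLE_BYTES]
    lines = sample.decode("utf-8", errors="ignore").splitlines()
    if len(raw) > SAMPLE_BYTES:
        lines = lines[:-1]  # the cut may have split the last row, so drop it
    rows = list(csv.reader(lines))  # simplification: no multi-line quoted values
    if not rows:
        return []
    if first_row_header:
        header, rows = rows[0], rows[1:]  # "First Row Header" behaviour
    else:
        header = [f"field{i}" for i in range(len(rows[0]))]  # placeholder names
    schema = []
    for i, name in enumerate(header):
        column = [row[i] for row in rows if i < len(row)]
        schema.append({"name": name, "type": infer_type(column)})
    return schema


data = b"id,price,shipped\n1,9.99,2018-07-09\n2,12.50,2018-07-10\n"
print(detect_schema(data, first_row_header=True))
# [{'name': 'id', 'type': 'int'}, {'name': 'price', 'type': 'float'}, {'name': 'shipped', 'type': 'date'}]
```

Trying the strictest type first means a column of whole numbers is proposed as "int" rather than "float", which mirrors why a detected schema is only provisional and worth reviewing.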

To help users confirm that the schema fits the data, the auto-detector displays sample values in the columns to the right of the vertical rule. If the "Consolidate Values" checkbox is checked, the value columns display only the unique values found in the data sample. If "Consolidate Values" is unchecked, all values are displayed in the order they appear in the sample.
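The difference between the two display modes can be sketched in a few lines of Python. The function name is hypothetical, and the order in which Aunsight lists consolidated values is not documented; this sketch assumes order of first appearance.

```python
def sample_values(column: list[str], consolidate: bool) -> list[str]:
    """Values to show beside a field in the detector dialog (hypothetical sketch)."""
    if not consolidate:
        return list(column)  # every value, in the order it appears
    # dict.fromkeys keeps only the first occurrence of each value, in order
    return list(dict.fromkeys(column))


col = ["NY", "CA", "NY", "TX", "CA"]
print(sample_values(col, consolidate=True))   # ['NY', 'CA', 'TX']
print(sample_values(col, consolidate=False))  # ['NY', 'CA', 'NY', 'TX', 'CA']
```

Consolidated output makes it easy to spot a stray value that does not fit the chosen type, while the full listing shows how often each value occurs.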

If the results of the auto-detector are correct, click "Apply" to apply the changes and continue making edits, or click "Apply and Save" to save the output to the Atlas record immediately. Alternatively, click "Copy Schema" to copy the schema JSON to the clipboard for use elsewhere. If none of these options is appropriate, click "Cancel" to close the auto-detector and discard your changes.

Guided Schema-builder

For manual edits, the guided schema builder provides an easy-to-use tool for editing schemas directly. This tool graphically displays the current schema and guides the user in constructing the JSON object ontology that describes it.

Guided Schema builder

Users can view and edit the schema options for each field by clicking the field name and entering details in the form displayed to the right of the divider.

Guided Schema builder details

Users can also add new fields by clicking the blue plus icon and entering a field name.
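Conceptually, adding a field appends one more field definition to the schema's JSON object. The Python sketch below illustrates this under stated assumptions: the "fields"/"name"/"type" layout, the default "string" type, and the duplicate-name rule are hypothetical and not taken from Aunsight's actual builder.

```python
import json


def add_field(schema: dict, name: str, field_type: str = "string") -> dict:
    """Append a new field definition, refusing blank or duplicate names."""
    name = name.strip()
    if not name:
        raise ValueError("field name must not be empty")
    fields = schema.setdefault("fields", [])
    if any(f["name"] == name for f in fields):
        raise ValueError(f"field {name!r} already exists")
    fields.append({"name": name, "type": field_type})
    return schema


schema = {"fields": [{"name": "id", "type": "int"}]}
add_field(schema, "created_at", "date")
print(json.dumps(schema, indent=2))
```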

Manually Editing Raw Schema JSON

Users can also edit the raw schema JSON manually as plain text. The raw schema editor provides basic JSON validation to help keep the JSON correctly formatted, but users need to be familiar with the JSON ontology for dataset schemas.
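Validation of hand-edited schema JSON usually happens in two layers: checking that the text is syntactically valid JSON, then checking that it has the expected structure. The Python sketch below shows both layers using the standard `json` module. The structural rules (a top-level "fields" array whose entries have a unique "name" and a known "type") and the type names are hypothetical, since Aunsight's real ontology defines its own requirements.

```python
import json

# Hypothetical type names; the real schema ontology defines its own.
KNOWN_TYPES = {"string", "int", "float", "date", "boolean"}


def validate_schema_text(text: str) -> list[str]:
    """Return a list of problems found in raw schema JSON (empty if none)."""
    try:
        schema = json.loads(text)
    except json.JSONDecodeError as err:
        # syntax errors report the line and column of the first problem
        return [f"line {err.lineno}, column {err.colno}: {err.msg}"]

    if not isinstance(schema, dict) or not isinstance(schema.get("fields"), list):
        return ['top level must be an object with a "fields" array']

    problems = []
    seen = set()
    for i, field in enumerate(schema["fields"]):
        if not isinstance(field, dict):
            problems.append(f"fields[{i}]: expected an object")
            continue
        name = field.get("name")
        if not isinstance(name, str) or not name:
            problems.append(f"fields[{i}]: missing field name")
        elif name in seen:
            problems.append(f"fields[{i}]: duplicate field name {name!r}")
        else:
            seen.add(name)
        if field.get("type") not in KNOWN_TYPES:
            problems.append(f"fields[{i}]: unknown type {field.get('type')!r}")
    return problems


print(validate_schema_text('{"fields": [{"name": "id", "type": "integer"}]}'))
# ["fields[0]: unknown type 'integer'"]
```

Reporting the position of a syntax error first, before any structural checks, matches how a plain-text editor can point the user at the exact spot to fix.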
