Dataset schemas provide a means for interpreting the structure of data in a dataset by declaratively specifying which fields exist in the dataset and what data type each field contains. Aunsight represents a dataset schema as a JSON object that defines how to interpret field names and data types for various kinds of data structures. Aunsight relies on schemas to ensure that data read from a dataset is properly formatted for its intended use. For example, a field that contains strings like "2018-07-09" needs to be understood as a date field and not as arbitrary text.
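To illustrate why a declared type matters, here is a minimal Python sketch. The schema layout (a `fields` list with `name` and `type` keys), the type names, and the `coerce` helper are all hypothetical and are not the Aunsight schema format; they only show how a declared type changes the interpretation of the same raw string.

```python
from datetime import date, datetime

# Hypothetical schema layout for illustration only; this is NOT the
# Aunsight schema format.
schema = {
    "fields": [
        {"name": "order_id", "type": "integer"},
        {"name": "order_date", "type": "date"},
        {"name": "note", "type": "string"},
    ]
}

# Converters keyed by the assumed type names used above.
CONVERTERS = {
    "integer": int,
    "date": lambda s: datetime.strptime(s, "%Y-%m-%d").date(),
    "string": str,
}

def coerce(row, schema):
    """Convert a row of raw strings into typed values according to the schema."""
    typed = {}
    for field, raw in zip(schema["fields"], row):
        typed[field["name"]] = CONVERTERS[field["type"]](raw)
    return typed

# The same text is read as a date in one field and as plain text in another.
record = coerce(["42", "2018-07-09", "2018-07-09"], schema)
print(record)
```

In this sketch the second value becomes a `date` object while the third stays a string, purely because of what the schema declares for each field.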
This article explains the tools available for editing dataset schemas via the Aunsight Web interface. Familiarity with the underlying JSON ontology of schema objects is not strictly necessary, but some understanding of data types and structured databases in general is helpful for understanding the purpose of Aunsight schemas.
Finding the Tools
After logging in to the Web interface and selecting the context you wish to work in, click the "Datasets" icon in the palette on the right. From this view, select the dataset you would like to explore to bring up its record.
The Schema tab of the dataset record displays a number of action buttons:
- Reset - Reverts the schema to the version currently stored in the Atlas record
- Save - Saves changes to the schema to the Atlas record for this dataset
- Copy Header - Copies the dataset header row to the clipboard
- Detect Schema - Opens the schema autodetector tool
- View Dataset - Displays a dialog for browsing the dataset
- Raw - Displays the schema using the raw JSON schema editor tool
- Guided - Displays the schema using the guided schema builder tool
The Detect Schema, Raw, and Guided buttons all open tools for editing schemas. The remainder of this article discusses each of these tools.
The schema auto-detector automates the process of generating a schema for most datasets. It opens a dialog for displaying and editing a provisional schema generated by examining the first 10 KB of data from the dataset.
If your data has a header row, check the "First Row Header" box to ensure that the first row of data is interpreted as field labels.
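The general idea of detecting a provisional schema from a small sample can be sketched as follows. This is not Aunsight's detection algorithm: the CSV input, the type names, the `field_N` fallback labels, and the `guess_type` and `detect_schema` helpers are assumptions made for illustration, and the sample is sliced by characters rather than bytes for simplicity.

```python
import csv
import io
from datetime import datetime

SAMPLE_SIZE = 10 * 1024  # roughly the 10 KB sample size described above

def guess_type(values):
    """Return the narrowest assumed type name that fits every non-empty value."""
    present = [v for v in values if v != ""]
    if not present:
        return "string"

    def all_match(parse):
        try:
            for v in present:
                parse(v)
            return True
        except ValueError:
            return False

    if all_match(int):
        return "integer"
    if all_match(float):
        return "float"
    if all_match(lambda v: datetime.strptime(v, "%Y-%m-%d")):
        return "date"
    return "string"

def detect_schema(text, first_row_header=True):
    """Build a provisional list of (field name, type) pairs from a CSV sample."""
    rows = list(csv.reader(io.StringIO(text[:SAMPLE_SIZE])))
    if len(text) > SAMPLE_SIZE and rows:
        rows.pop()  # the last row may have been cut off mid-line
    if not rows:
        return []
    if first_row_header:
        names, data = rows[0], rows[1:]
    else:
        # No header: invent placeholder labels and treat every row as data.
        names = [f"field_{i + 1}" for i in range(len(rows[0]))]
        data = rows
    columns = [[row[i] for row in data if i < len(row)] for i in range(len(names))]
    return [(name, guess_type(col)) for name, col in zip(names, columns)]

csv_text = "id,price,created,city\n1,9.50,2018-07-09,Austin\n2,12,2018-07-10,Boise\n"
with_header = detect_schema(csv_text, first_row_header=True)
without_header = detect_schema(csv_text, first_row_header=False)
print(with_header)
print(without_header)
```

In this sketch, leaving the header option off causes the label row to be treated as data, so every column falls back to a string type. That is the kind of mismatch the "First Row Header" box is meant to prevent.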
Users can edit the suggested schema by renaming fields in the "Field" column and changing the data type specified in the "Type" column.
To help users validate that the schema is appropriate for the data, the auto-detector displays sample values in the columns to the right of the vertical rule. If the "Consolidate Values" checkbox is checked, the values column displays only the unique values found in the data sample. If "Consolidate Values" is unchecked, all values are displayed in sequential order.
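The difference between the two display modes can be sketched in a few lines of Python. The `sample_values` helper is hypothetical, and keeping unique values in first-seen order is an assumption; the text above only states that unique values are shown.

```python
def sample_values(values, consolidate=True):
    """Return the values to display for one field.

    With consolidate=True, duplicates are dropped while keeping first-seen
    order (an assumed ordering); otherwise every value is returned in its
    original sequence.
    """
    if not consolidate:
        return list(values)
    # dict preserves insertion order, so this de-duplicates without sorting.
    return list(dict.fromkeys(values))

column = ["TX", "ID", "TX", "TX", "CA", "ID"]
unique_view = sample_values(column)
full_view = sample_values(column, consolidate=False)
print(unique_view)
print(full_view)
```

The consolidated view is usually the quicker way to spot a stray value of the wrong type, while the sequential view shows each value in the position where it occurs in the sample.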
If the results of the schema detector are correct, click "Apply" to apply the changes and continue editing, or click "Apply and Save" to immediately save the output to the Atlas record. Alternatively, you can copy the schema JSON to the clipboard for use elsewhere by clicking "Copy Schema." If none of these options is appropriate, you can close the auto-detector and abandon the changes by clicking "Cancel."
For manual edits to the schema, the guided schema builder provides an easy-to-use tool for editing schemas directly. This tool graphically displays the current schema and guides the user in constructing a JSON object ontology for describing the schema.
Users can view and edit the options for a field by clicking its name and filling in the form displayed for that field to the right of the horizontal divider.
Users can also add new fields by clicking the plus icon and entering a field name.
Manually Editing Raw Schema JSON
Users may also manually edit the raw JSON schema code in plain text. The raw schema editor tool provides simple JSON validation tools to make correctly formatting the JSON easier, but users need to be familiar with the JSON ontology for dataset schemas.
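When preparing schema JSON outside the editor, a basic syntax check along the same lines can be done with Python's standard `json` module. This is a minimal sketch, not the editor's own validation: the `validate_schema_text` helper and the example field layout are hypothetical, and the check covers only JSON syntax and the top-level object shape, not the Aunsight schema ontology.

```python
import json

def validate_schema_text(text):
    """Return (ok, message) for a raw schema string.

    Checks JSON syntax and that the top level is an object; it does not
    check any Aunsight-specific schema rules.
    """
    try:
        parsed = json.loads(text)
    except json.JSONDecodeError as err:
        return False, f"line {err.lineno}, column {err.colno}: {err.msg}"
    if not isinstance(parsed, dict):
        return False, "schema must be a JSON object"
    return True, "valid JSON object"

# Hypothetical schema text; the key names are placeholders.
ok, message = validate_schema_text('{"fields": [{"name": "id", "type": "integer"}]}')
bad, bad_message = validate_schema_text('{"fields": [}')
print(ok, message)
print(bad, bad_message)
```

Reporting the line and column of the first syntax error makes it easier to locate a missing bracket or comma before pasting the text into the raw editor.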