Overwriting a Dataset Using Store
This tutorial demonstrates how to use the Store operation in a dataflow to overwrite an existing dataset. To take full advantage of repeatable Dataflows and Workflows, it is often more useful to overwrite the contents of an existing dataset than to create a new one on every run. Overwriting preserves the id and atlas information associated with the dataset, so a workflow that references a specific dataset does not need its dataset id updated manually after each run. Dataflows update the data in the datasets, and workflows keep using the same dataflows and datasets without further changes.
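The difference between storing to a new dataset and overwriting an existing one can be sketched in a few lines of Python. This is a toy in-memory model, not the platform's API: the `Catalog` class and its `store_new` and `overwrite` methods are hypothetical names introduced only for illustration.

```python
from itertools import count


class Catalog:
    """Hypothetical in-memory stand-in for a dataset catalog."""

    def __init__(self):
        self.datasets = {}      # dataset id -> rows
        self._ids = count(1)    # simple id generator

    def store_new(self, rows):
        # Storing to a "New Dataset" allocates a fresh id on every run.
        dataset_id = next(self._ids)
        self.datasets[dataset_id] = list(rows)
        return dataset_id

    def overwrite(self, dataset_id, rows):
        # Overwriting replaces the contents but keeps the existing id.
        if dataset_id not in self.datasets:
            raise KeyError(f"unknown dataset id: {dataset_id}")
        self.datasets[dataset_id] = list(rows)
        return dataset_id


catalog = Catalog()
target_id = catalog.store_new(["run 1"])            # first run creates the target
workflow_reference = target_id                      # a workflow points at this id

new_id = catalog.store_new(["run 2"])               # new dataset: a different id
same_id = catalog.overwrite(target_id, ["run 3"])   # overwrite: same id, new rows
```

In this sketch the workflow's reference stays valid after the overwrite and sees the latest rows, whereas the "new dataset" run produces an id the workflow would have to be pointed at by hand.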
This tutorial will begin after the final step of Use Case 1. If you would like to review the process we have already completed, please visit this tutorial.
Summary of Steps Already Completed:
Before beginning this tutorial, we created a new dataflow, imported and loaded the source datasets, and added the desired operations, including the Store operation. As configured, the Store operation writes the dataflow output to a new dataset each time the dataflow is run. The dataflow has been run once to create the target dataset for the overwriting procedure.
How to Overwrite a Dataset Using the Store Operation
1. Import the Target Dataset
Before you can modify the Store operation and change the output dataset, you need to import the target dataset. The Store operation will only allow you to select datasets that have been imported to the dataflow as destinations for the store command.
You do not need to load the dataset into the dataflow for it to be available in the Store operation, so be sure to uncheck the Create load operation(s) automatically checkbox during the import process.
2. Modify the Store Operation
To change the output settings, select the Store operation block on the canvas and switch the output dataset from "New Dataset" to the dataset you imported. Click Apply.
When you select the target dataset, an alert will warn you that you risk overwriting data. Double-check the dataset you have selected before you run your dataflow, or you risk losing data.
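The behavior described in steps 1 and 2 can be summarized with a small Python sketch: only imported datasets are accepted as destinations, and choosing one triggers an overwrite warning. The `StoreOperation` class, its `set_output` method, and the dataset ids 101 and 202 are hypothetical and exist only for illustration; they are not part of the product.

```python
import warnings

NEW_DATASET = "New Dataset"


class StoreOperation:
    """Hypothetical model of a Store operation block."""

    def __init__(self, imported_dataset_ids):
        self.imported = set(imported_dataset_ids)
        self.output = NEW_DATASET   # default: create a new dataset each run

    def set_output(self, target):
        if target == NEW_DATASET:
            self.output = NEW_DATASET
            return
        # Only datasets imported into the dataflow can be destinations.
        if target not in self.imported:
            raise ValueError(f"dataset {target!r} has not been imported")
        # Mirror the alert shown when an existing dataset is selected.
        warnings.warn(f"running the dataflow will overwrite dataset {target!r}")
        self.output = target


store = StoreOperation(imported_dataset_ids={101})

# Selecting an imported dataset succeeds but raises a warning.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    store.set_output(101)

# Selecting a dataset that was never imported is rejected.
try:
    store.set_output(202)
    rejected = False
except ValueError:
    rejected = True
```

The warning is deliberately not an error: as in the tutorial, overwriting is allowed, but the choice of target deserves a second look before the dataflow runs.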
Don't forget to save your changes as you modify your dataflow.
Your dataflow has now been modified to overwrite a dataset, in this case a single one. If needed, you could extend this modification so that a single dataflow overwrites several related datasets through a more complex branching structure.