Datasets
Introduction
The Ambience Datasets module is responsible for providing data to other modules, for example Dashboard. A dataset is a named collection of records, where each record consists of named values which may be not only text strings, numbers and booleans, but also arrays, binary data and even nested records (see the sketch after this list). The data can be derived from three sources:
- MongoDB dataset, which reads data from MongoDB databases.
- ETL dataset, which reads data from ETL chainsets.
- Cached dataset, which reads data from another dataset, whether MongoDB, ETL or dataset-based.
A dataset can also be created by copying from an existing dataset.
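To illustrate the record structure, a single record might look like the sketch below. The field names and values are invented for illustration; they are not part of any shipped dataset.

```python
# A hypothetical dataset record: values can be strings, numbers,
# booleans, arrays, binary data, or nested records.
record = {
    "name": "Alice",                 # text string
    "age": 32,                       # number
    "active": True,                  # boolean
    "tags": ["admin", "analyst"],    # array
    "photo": b"\x89PNG...",          # binary data (truncated for brevity)
    "address": {                     # nested record
        "city": "Singapore",
        "postcode": "049315",
    },
}
```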
Module Interface
The interface of the Datasets module.
Add Dataset
To create a new dataset, click the “Add” button located at the upper right corner of the page.
The “Add Dataset” dialog box will appear.
In the “Copy From” field, select the source to be used for the dataset from the drop-down list.
To copy from an existing dataset, select that dataset from the drop-down list. For a new dataset, select one of the first three options in the list. The screenshots below show the Properties settings for the MongoDB dataset, ETL dataset and cached dataset respectively.
Refer to Properties for details on the different tabs for the dataset.
ETag
The ETag reflects when the dataset has changed, based on its last modified datetime. The ETag works with the ETL Bump Etag step, which writes the new datetime into the Datasets module’s ETag column. Alternatively, the ETag can be updated manually by clicking the “Update Etag” icon under the “Actions” column corresponding to the desired dataset.
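As a rough illustration of how the ETag supports change detection, the sketch below polls a dataset’s ETag and compares it with a previously seen value. The endpoint URL and response header are assumptions made for illustration, not a documented Ambience API.

```python
import requests  # third-party HTTP client

# Hypothetical URL; the real Ambience endpoint and authentication
# are not covered by this documentation and are assumed here.
DATASET_URL = "https://ambience.example.com/datasets/DayTest"

def dataset_changed(previous_etag: str) -> bool:
    """Fetch the dataset's current ETag and compare it with a cached value."""
    response = requests.head(DATASET_URL)
    current_etag = response.headers.get("ETag", "")
    return current_etag != previous_etag
```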
Upload Dataset
Upload allows importing a dataset file (for example, DayTest.dataset.json) into the Datasets module on another machine, and provides the option to create a new dataset or overwrite an existing one. To upload a dataset, click on the “Upload” button located at the upper right corner of the page.
The “Upload Dataset” dialog box will appear.
Field | Description |
---|---|
File | Browse to the location of the file. (Required) |
Name | A unique dataset name. (Required) |
Overwrite | Select to overwrite an existing dataset. By default, this field is deselected. |
Browse to the location of the file to be uploaded in the “File” field. In the “Name” field, key in a unique name if it is a new dataset. If the uploaded file is to overwrite an existing dataset, select the “Overwrite” field and key in a name that matches the dataset to be overwritten. Click on the “OK” button to upload the dataset or click on the “Cancel” button to abort the action.
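For illustration only, the dialog fields map naturally onto a multipart upload. The endpoint and form-field names below are hypothetical, not a documented Ambience API; the dialog described above is the supported way to upload.

```python
import requests

# Hypothetical endpoint and form-field names, mirroring the dialog fields.
UPLOAD_URL = "https://ambience.example.com/datasets/upload"

with open("DayTest.dataset.json", "rb") as fh:
    response = requests.post(
        UPLOAD_URL,
        files={"file": fh},                             # "File" field
        data={"name": "DayTest", "overwrite": "true"},  # "Name" and "Overwrite"
    )
response.raise_for_status()  # fail loudly if the upload was rejected
```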
Bulk Actions
This function allows you to perform an action on several datasets at the same time.
When the “Bulk Actions” button located at the upper right corner of the page is clicked, a list of available actions is displayed. If no datasets are selected prior to clicking the button, there will be fewer actions available. To select a dataset, select the checkbox next to the name of the dataset.
Option | Description |
---|---|
Select All | Selects all datasets in the list. |
Select None | Unselects all datasets. |
Invert Selection | Inverts the current selection. That is, any datasets selected will be unselected and vice versa. |
Set Roles | Selects roles for the datasets. |
Set Enabled True | Enables the selected datasets. An enabled dataset is represented by a green tick under the “Enabled” column. |
Set Enabled False | Disables the selected datasets. A disabled dataset is represented by a red cross under the “Enabled” column. |
When the “Set Roles” option is selected, the “Set Roles” dialog box will appear.
Select the desired roles for the datasets by checking the checkboxes next to the options. Alternatively, you can use the search function at the top of the dialog box to search for the desired role by keying in the keyword. You can also choose to select all, select none or invert selection by clicking on their respective icons on the right of the search function.
Edit Dataset
In edit mode, you can:
- enable or disable the dataset.
- edit the properties, filter, aggregation pipeline and schema (see the pipeline sketch at the end of this section).
To edit a dataset, click on the “Edit” icon under the “Actions” column corresponding to the desired dataset.
The “Edit” panel will appear.
Edit as required and click on the “Save” button to save the changes. Click on the “Cancel” button to abort the action.
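For MongoDB datasets, the filter and aggregation pipeline follow standard MongoDB syntax. The sketch below shows the kind of pipeline you might enter, run here via pymongo; the connection string, database, collection and field names are invented for illustration.

```python
from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")  # assumed connection string
collection = client["sales_db"]["orders"]          # hypothetical database/collection

# A pipeline of the kind that could be entered in the dataset's
# aggregation pipeline tab: filter, then group and sort.
pipeline = [
    {"$match": {"status": "shipped"}},                             # filter records
    {"$group": {"_id": "$region", "total": {"$sum": "$amount"}}},  # aggregate per region
    {"$sort": {"total": -1}},                                      # largest first
]

for record in collection.aggregate(pipeline):
    print(record)
```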
Download Dataset
The download function exports a dataset’s settings from the Datasets module into a .json file, for example, DayTest.dataset.json. To download a dataset, click on the “Download” icon under the “Actions” column corresponding to the desired dataset.
Delete Dataset
The delete function deletes the selected dataset from the Datasets module; it does not delete the data.
To delete a dataset, click on the “Delete” icon under the “Actions” column corresponding to the desired dataset.
There is an option to undo the deletion. A notification with an “Undo” button appears right after clicking on the “Delete” icon.
Upon clicking on the “Undo” button, the deleted dataset is restored and added back to the list of datasets.
Build Cache
This function allows you to cache any dataset in the Dataset Management Page. The output of the chosen dataset will be stored within the cached dataset, the schema will be set accordingly and the ETag will be bumped.
To cache data, click on the “Build Cache” icon under the “Actions” column corresponding to the desired dataset.
The “Build Cache” dialog box will appear.
Select the desired dataset to build the cache from the drop-down list and click on the “OK” button to build the cache. Upon successfully caching the data, the number of records cached will be displayed in the management panel under the “Source” column.
As the cached records are stored within one dataset record (as a nested array), they are limited by the MongoDB record size. One record can only contain 16 MB of data (currently).
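Since the cached records are stored as a nested array inside a single MongoDB document, a quick size check before caching can help you stay under the limit. A minimal sketch using the bson package that ships with pymongo; the records themselves are invented for illustration.

```python
import bson  # ships with pymongo

MAX_DOC_SIZE = 16 * 1024 * 1024  # MongoDB's current per-document limit (16 MB)

records = [
    {"day": "Mon", "count": 42},  # stand-in records for illustration
    {"day": "Tue", "count": 57},
]

# Approximate the cache document's size by encoding the nested array.
size = len(bson.encode({"records": records}))
print(f"{size} bytes ({size / MAX_DOC_SIZE:.1%} of the 16 MB limit)")
```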
All modules which use datasets can use the Cached Datasets, except Record Editor module.
Refresh
After performing actions in the browser tab, the list is reloaded to display the datasets in the page. A manual “Refresh” button is also available; it is particularly useful if you or others have opened multiple pages and are making changes.
The “Refresh” button is found at the upper right corner of the page. Clicking on it reloads the list.
Search Dataset
The search function at the top left of the Datasets module page filters the dataset list, retaining those with matching text.
This provides an easy way to search through the list of datasets. It is case-insensitive and displays the datasets that have the entered search value in any of the fields below:
- Name
- Source
- Last Modified
- ETag
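Conceptually, the search behaves like the sketch below: a case-insensitive substring match against each of the fields above. The record structure and field keys are invented for illustration; the actual implementation is internal to Ambience.

```python
def matches(dataset: dict, query: str) -> bool:
    """Case-insensitive substring match across the searchable fields."""
    fields = ("name", "source", "last_modified", "etag")  # assumed field keys
    q = query.lower()
    return any(q in str(dataset.get(f, "")).lower() for f in fields)

datasets = [
    {"name": "DayTest", "source": "mongodb", "last_modified": "2023-01-05", "etag": "a1b2"},
    {"name": "Sales", "source": "etl", "last_modified": "2023-02-10", "etag": "c3d4"},
]
print([d["name"] for d in datasets if matches(d, "day")])  # ['DayTest']
```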
Dataset Data
When adding or uploading a dataset, the dataset does not include the data, only the dataset “Definition”, including the schema. The data needs to be imported into the dataset. There are two methods to import the data.
- Import module - imports the data, as well as creating the dataset (see Import module).
- Repository module - generates data using an ETL chainset with a datasource already deposited in the module.
Below is a simple method of using an ETL chainset to generate and load the data into the dataset.
Create an ETL chainset with Generate DataSource and MongoDB Writer ETL steps.
In the first ETL step, key in the location of the datasource in the Repository module.
In the second step, key in the dataset to add the data into.
Click on the “Run Steps” icon at the upper right corner of the Steps column.
Check the data in the dataset using the Developer module (a conceptual check is sketched below).
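If you prefer to verify the result outside the Developer module, a quick check against MongoDB looks like the sketch below. It assumes the MongoDB Writer step wrote into a collection backing the dataset; the connection string, database and collection names are illustrative only.

```python
from pymongo import MongoClient

# Connection string and names are assumptions for illustration.
client = MongoClient("mongodb://localhost:27017")
collection = client["ambience"]["DayTest"]

print(collection.count_documents({}))  # how many records landed
print(collection.find_one())           # peek at one record
```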
You can combine Generate DataSource with a Dataset Endpoint ETL step if you want to wrap a datasource as a dataset. Hence, you can use the datasource as a dataset as well.
Composite DS
Composite datasources (DS) can be read from datasets, instead of just from the Repository DS.
You can integrate the composites at any point using ETL, reading from datasets and producing records for further ETL steps to process.
Some composites have multiple flows inside, usually each one writing to a different DataStore. These flows can be intercepted using ETL.
For example, a composite has a DataStore (called DataStore2) and a derived datasource InComposite2. To intercept the datastore and obtain the records, create an ETL chainset with the ETL step Intercept DataStore.
This step will run the flow that leads up to the named DataStore. It will, however, not perform the data store step, and will instead return the records at that point to the ETL for subsequent processing.
The output of this step will be as shown below.
It shows that the second flow has run, but the data store did not occur; the results are returned to the ETL instead.