Table of Contents
Using a transform process, you can manipulate data by performing a sequence of operations according to your specific requirements. You receive exactly what you need from the database by creating processors in Repertoire > Data Designer > Composite DataSource or adding transforms in Ambience > Designer > Universe Designer. You can view the records after transformation to verify the output. The result of each transformation may be used to overwrite the original field value or add an additional field. The result can be saved as a new data source or forwarded to reports and dashboards for presentation.
The data operations impact only the results; the original data source remains unaffected, unless you choose to overwrite it. Transforms provide fast, convenient data manipulations using an intuitive user interface, without the need of writing any scripts.
When you create or edit transforms, the order of transforms is important as some operations may depend on fields created by previous steps. Where possible, the tool will prevent you from reordering items to produce an impossible sequence. For example, if Step 5 creates a field called X and Step 7 uses field X, you may be able to move Step 7 up one place to Step 6, but cannot move it above Step 5. Similarly, you cannot move Step 5 below Step 7. In addition, you cannot delete Step 5 because Step 7 depends on it. If you find that an ordering operation (move up or down) or delete operation is disabled, it means that making the change would yield a transform that could never succeed.
To create a transform using the Repository Designer, add a new composite data source as follows:
Composite
datasource, as shown. Then click Next.
The transforms editor is displayed as follows:
Static transforms apply an operation to one single field at a time. Dynamic transforms
work with several fields of a record simultaneously. In static transforms, you specify a constant value for data manipulation. In
dynamic transforms, you use ${@Field_Name}
to call values from another field. To illustrate dynamic transforms, the following example
calls values from Field 2 and adds them to values from Field 1. The operation is Transform > Decimal > Add:
Field 1 (Decimal) | Field 2 (Decimal) | Value | Output (Decimal) |
---|---|---|---|
50.6249 | 1.1 | ${@Field 2} | 51.7249 |
0.721 | 5.2 | ${@Field 2} | 5.921 |
Many transforms include the merge or retain operation. Before using the merge or retain operation, make sure the fields have been sorted first, which minimizes memory use. Comparing each record with many other unsorted records requires a significantly large memory if there are huge volumes of data. However, if we only have to compare each record with the previous record, we can run through massive amounts of data without needing huge amounts of memory.
The merge operation ensures that no information is lost. When one field is being operated on, the merge will be between only these records, while all the other fields remain the same. This ensures that the record that is the result of the merge operation does not discard any information. If you need to merge across variations in non-essential fields, those fields will need to be discarded first.
The retain operation works similarly with merge. The only difference is that the retain operation does not discard duplicate records after the manipulation.
Using transforms, how does a company calculate how much it receives from sales each month? In the following example, we will take sales values that occur throughout the month from a legacy system which for historical reasons may include spaces around the date values. Also, some date values are not available and are imported into the system as nulls. This is meant to illustrate real-world situations where the data needs to be cleansed before we can begin properly processing it.
Step 1: Trim off the extra spaces, tabs and new lines in the string values. The operation is Transform > String > Trim. Select In Place
.By doing this, we keep only the trimmed string values.
The input data is read from a server which may include extra spaces at the beginning and end. The date strings may look like the following. Therefore, we need to trim these spaces before parsing the date:
" 2011-07-31 "
" 2011-09-30 "
In the following table, the When
field includes string representations of dates when a product was sold
and the Value
field shows the sale price. Later, we will introduce a Commission
field, which describes how much of the sale price
goes directly to the sales person as commission.
The When field displays the following output:
When (String) | Value (Decimal) |
---|---|
2011-03-28 | 2000.00 |
2011-03-07 | 3000.00 |
2011-3-19 | 1000.00 |
2000.00 | |
1000.00 | |
3000.00 | |
2011-2-16 | 2000.00 |
2011-2-27 | 1000.00 |
2011-02-03 | 2000.00 |
4000.00 | |
1000.00 | |
2011-05-12 | 2000.00 |
2011-05-16 | 1000.00 |
2011-05-07 | 1000.00 |
2011-05-24 | 3000.00 |
2011-6-07 | 2000.00 |
2011-6-15 | 1000.00 |
2011-6-21 | 1000.00 |
2011-07-09 | 2000.00 |
2011-07-14 | 3000.00 |
2011-07-26 | 1000.00 |
2011-07-29 | 1000.00 |
1000.00 | |
2000.00 | |
2011-09-17 | 1000.00 |
2011-09-05 | 2000.00 |
2011-09-11 | 1000.00 |
2011-09-23 | 3000.00 |
2011-10-16 | 2000.00 |
2011-10-04 | 3000.00 |
2011-10-27 | 1000.00 |
Step 2: Convert the strings from the When
field into dates. The operation is
Transform > String > To date. Select In Place.
The new values may look the same as the trimmed strings, but these are now Date objects, which means we can manipulate
them as dates rather than just as sequences of characters.
In the following table, the When field shows the output:
When (Date) | Value (Decimal) |
---|---|
2011-03-28 | 2000.00 |
2011-03-07 | 3000.00 |
2011-03-19 | 1000.00 |
2000.00 | |
1000.00 | |
3000.00 | |
2011-02-16 | 2000.00 |
2011-02-27 | 1000.00 |
2011-02-03 | 2000.00 |
4000.00 | |
1000.00 | |
2011-05-12 | 2000.00 |
2011-05-16 | 1000.00 |
2011-05-07 | 1000.00 |
2011-05-24 | 3000.00 |
2011-06-07 | 2000.00 |
2011-06-15 | 1000.00 |
2011-06-21 | 1000.00 |
2011-07-09 | 2000.00 |
2011-07-14 | 3000.00 |
2011-07-26 | 1000.00 |
2011-07-29 | 1000.00 |
1000.00 | |
2000.00 | |
2011-09-17 | 1000.00 |
2011-09-05 | 2000.00 |
2011-09-11 | 1000.00 |
2011-09-23 | 3000.00 |
2011-10-16 | 2000.00 |
2011-10-04 | 3000.00 |
2011-10-27 | 1000.00 |
Step 3: Extract the month values from the Date
field. The
operation is Transform > Date > Month.
In the following table, a new field named "Index" is created to show the output:
When (String) | Value (Decimal) | Index (Integer) |
---|---|---|
2011-03-28 | 2000.00 | 3 |
2011-03-07 | 3000.00 | 3 |
2011-03-19 | 1000.00 | 3 |
2000.00 | ||
1000.00 | ||
3000.00 | ||
2011-02-16 | 2000.00 | 2 |
2011-02-27 | 1000.00 | 2 |
2011-02-03 | 2000.00 | 2 |
4000.00 | ||
1000.00 | ||
2011-05-12 | 2000.00 | 5 |
2011-05-16 | 1000.00 | 5 |
2011-05-07 | 1000.00 | 5 |
2011-05-24 | 3000.00 | 5 |
2011-06-07 | 2000.00 | 6 |
2011-06-15 | 1000.00 | 6 |
2011-06-21 | 1000.00 | 6 |
2011-07-09 | 2000.00 | 7 |
2011-07-14 | 3000.00 | 7 |
2011-07-26 | 1000.00 | 7 |
2011-07-29 | 1000.00 | 7 |
1000.00 | ||
2000.00 | ||
2011-09-17 | 1000.00 | 9 |
2011-09-05 | 2000.00 | 9 |
2011-09-11 | 1000.00 | 9 |
2011-09-23 | 3000.00 | 9 |
2011-10-16 | 2000.00 | 10 |
2011-10-04 | 3000.00 | 10 |
2011-10-27 | 1000.00 | 10 |
Step 4: Show the months in long names from the When
field. The operation is
Transform > Date > Month name (long). Select In Place.
In the following table, the When field shows the output:
When (String) | Value (Decimal) | Index (Integer) |
---|---|---|
March | 2000.00 | 3 |
March | 3000.00 | 3 |
March | 1000.00 | 3 |
2000.00 | ||
1000.00 | ||
3000.00 | ||
February | 2000.00 | 2 |
February | 1000.00 | 2 |
February | 2000.00 | 2 |
4000.00 | ||
1000.00 | ||
May | 2000.00 | 5 |
May | 1000.00 | 5 |
May | 1000.00 | 5 |
May | 3000.00 | 5 |
June | 2000.00 | 6 |
June | 1000.00 | 6 |
June | 1000.00 | 6 |
July | 2000.00 | 7 |
July | 3000.00 | 7 |
July | 1000.00 | 7 |
July | 1000.00 | 7 |
1000.00 | ||
2000.00 | ||
September | 1000.00 | 9 |
September | 2000.00 | 9 |
September | 1000.00 | 9 |
September | 3000.00 | 9 |
October | 2000.00 | 10 |
October | 3000.00 | 10 |
October | 1000.00 | 10 |
You will notice that the nulls in the When
field and the Index
field are still nulls (represented by empty cells).
Unless the transform specifically processes nulls, all transforms will just pass nulls through unchanged.
In the next step, we will change those nulls into values we can work with.
Step 5: Set the Null values from the When
field to "Unknown". The
operation is Transform > String > Null to string. Select
In Place.
In the following table, the When field shows the output:
When (String) | Value (Decimal) | Index (Integer) |
---|---|---|
March | 2000.00 | 3 |
March | 3000.00 | 3 |
March | 1000.00 | 3 |
Unknown | 2000.00 | |
Unknown | 1000.00 | |
Unknown | 3000.00 | |
February | 2000.00 | 2 |
February | 1000.00 | 2 |
February | 2000.00 | 2 |
Unknown | 4000.00 | |
Unknown | 1000.00 | |
May | 2000.00 | 5 |
May | 1000.00 | 5 |
May | 1000.00 | 5 |
May | 3000.00 | 5 |
June | 2000.00 | 6 |
June | 1000.00 | 6 |
June | 1000.00 | 6 |
July | 2000.00 | 7 |
July | 3000.00 | 7 |
July | 1000.00 | 7 |
July | 1000.00 | 7 |
Unknown | 1000.00 | |
Unknown | 2000.00 | |
September | 1000.00 | 9 |
September | 2000.00 | 9 |
September | 1000.00 | 9 |
September | 3000.00 | 9 |
October | 2000.00 | 10 |
October | 3000.00 | 10 |
October | 1000.00 | 10 |
Step 6: Sort the table by month index. The operation is Transform > Sort. Choose the Index field to sort by. Choose the Ascending order. Leave the Max Memory (MB) field blank. By doing this, this operation will sort the sales values without a maximum memory limit.
The following table shows the output:
When (String) | Value (Decimal) | Index (Integer) |
---|---|---|
Unknown | 2000.00 | |
Unknown | 1000.00 | |
Unknown | 3000.00 | |
Unknown | 4000.00 | |
Unknown | 1000.00 | |
Unknown | 1000.00 | |
Unknown | 2000.00 | |
February | 2000.00 | 2 |
February | 1000.00 | 2 |
February | 2000.00 | 2 |
March | 2000.00 | 3 |
March | 3000.00 | 3 |
March | 1000.00 | 3 |
May | 2000.00 | 5 |
May | 1000.00 | 5 |
May | 1000.00 | 5 |
May | 3000.00 | 5 |
June | 2000.00 | 6 |
June | 1000.00 | 6 |
June | 1000.00 | 6 |
July | 2000.00 | 7 |
July | 3000.00 | 7 |
July | 1000.00 | 7 |
July | 1000.00 | 7 |
September | 1000.00 | 9 |
September | 2000.00 | 9 |
September | 1000.00 | 9 |
September | 3000.00 | 9 |
October | 2000.00 | 10 |
October | 3000.00 | 10 |
October | 1000.00 | 10 |
Step 7: Create a new field named Commission
in the original data source.
Sum merge the sales values and commissions, and display a single total for each month.
The operation is Transform > Reduce. Select the Sum
action on the Value field and the Commission field. Leave the action blank on the other fields. By doing this,
you can add up the sales values and commissions and view the total for each month. The rows have been merged
down to one row per group, where the month names and indexes are the groups.
Alternatively, you can also sum retain on the sales values and commissions, and then perform a Discard Duplicates Transform
operation
to remove the redundant rows. You will receive the same output.
The following table shows the sorted table with the Commission field added:
When (String) | Value (Decimal) | Commission (Decimal) | Index (Integer) |
---|---|---|---|
Unknown | 2000.00 | 100.20 | |
Unknown | 1000.00 | 50.10 | |
Unknown | 3000.00 | 150.30 | |
Unknown | 4000.00 | 200.40 | |
Unknown | 1000.00 | 50.10 | |
Unknown | 1000.00 | 50.10 | |
Unknown | 2000.00 | 100.20 | |
February | 2000.00 | 100.20 | 2 |
February | 1000.00 | 50.10 | 2 |
February | 2000.00 | 100.20 | 2 |
March | 2000.00 | 100.20 | 3 |
March | 3000.00 | 150.30 | 3 |
March | 1000.00 | 50.10 | 3 |
May | 2000.00 | 100.20 | 5 |
May | 1000.00 | 50.10 | 5 |
May | 1000.00 | 50.10 | 5 |
May | 3000.00 | 150.30 | 5 |
June | 2000.00 | 100.20 | 6 |
June | 1000.00 | 50.10 | 6 |
June | 1000.00 | 50.10 | 6 |
July | 2000.00 | 100.20 | 7 |
July | 3000.00 | 150.30 | 7 |
July | 1000.00 | 50.10 | 7 |
July | 1000.00 | 50.10 | 7 |
September | 1000.00 | 50.10 | 9 |
September | 2000.00 | 100.20 | 9 |
September | 1000.00 | 50.10 | 9 |
September | 3000.00 | 150.30 | 9 |
October | 2000.00 | 100.20 | 10 |
October | 3000.00 | 150.30 | 10 |
October | 1000.00 | 50.10 | 10 |
The following table shows the output of the Reduce Transform:
When (String) | Value (Decimal) | Commission (Decimal) | Index (Integer) |
---|---|---|---|
Unknown | 14000.00 | 701.40 | |
February | 5000.00 | 250.50 | 2 |
March | 6000.00 | 300.60 | 3 |
May | 7000.00 | 350.70 | 5 |
June | 4000.00 | 200.40 | 4 |
July | 7000.00 | 350.70 | 7 |
September | 7000.00 | 350.70 | 9 |
October | 6000.00 | 300.60 | 10 |
Step 8:
Subtract the commission from the sales value in each month. The operation is Transform > Decimal > Subtract.
Specify ${@Commission}
as the value to be subtracted. In this case we are not subtracting a fixed amount (a static transform) but an amount
read from another field - the Commission field. This is termed a dynamic transform, because the result requires input from more than one field.
In the following table, a new field named "Received" is created to show the output:
When (String) | Value (Decimal) | Commission (Decimal) | Index (Integer) | Received (Decimal) |
---|---|---|---|---|
Unknown | 14000.00 | 701.40 | 13298.60 | |
February | 5000.00 | 250.50 | 2 | 4749.50 |
March | 6000.00 | 300.60 | 3 | 5699.40 |
May | 7000.00 | 350.70 | 5 | 6649.30 |
June | 4000.00 | 200.40 | 4 | 3799.60 |
July | 7000.00 | 350.70 | 7 | 6649.30 |
September | 7000.00 | 350.70 | 9 | 6649.30 |
October | 6000.00 | 300.60 | 10 | 5699.40 |
Now we can save the result of the transformations as a new data source for later processing. The transforms are repeatable, which means you can re-run the transforms, each month, to receive the most updated set of numbers, rather than create a sequence of transforms but run them only once.
The following is a summary of the transformations we used in the overview:
String trim: remove whitespace at the beginning and end of date strings.
String to date: build date objects.
Date to month: extract month index from date objects.
Date to month name (long): replace date objects with long month names.
Null to string: set the null month names to “Unknown”.
Sort by month index: view the sales values throughout each month in the ascending order.
Reduce: sum merge the sales values and commissions, reduce the redundant rows and display a single total for each month.
Subtract: subtract the commissions from the sales values to calculate how much the company receives from sales each month.
You can find more details on these and all other supported transformations in the following chapters.