Chapter 1. Transform

Table of Contents

Overview
Launching the Transforms Editor
Static and Dynamic Transforms
Merge and Retain Operations
Example
Summary

Overview

Using a transform process, you can manipulate data by performing a sequence of operations according to your specific requirements. You receive exactly what you need from the database by creating processors in Repertoire > Data Designer > Composite DataSource or adding transforms in Ambience > Designer > Universe Designer. You can view the records after transformation to verify the output. The result of each transformation may be used to overwrite the original field value or add an additional field. The result can be saved as a new data source or forwarded to reports and dashboards for presentation.

The data operations impact only the results; the original data source remains unaffected, unless you choose to overwrite it. Transforms provide fast, convenient data manipulations using an intuitive user interface, without the need of writing any scripts.

When you create or edit transforms, the order of transforms is important as some operations may depend on fields created by previous steps. Where possible, the tool will prevent you from reordering items to produce an impossible sequence. For example, if Step 5 creates a field called X and Step 7 uses field X, you may be able to move Step 7 up one place to Step 6, but cannot move it above Step 5. Similarly, you cannot move Step 5 below Step 7. In addition, you cannot delete Step 5 because Step 7 depends on it. If you find that an ordering operation (move up or down) or delete operation is disabled, it means that making the change would yield a transform that could never succeed.

Launching the Transforms Editor

To create a transform using the Repository Designer, add a new composite data source as follows:

  1. From the directory tree, right click the place where you want the data source to be created.
  2. Select Add - Datasource as shown:

    Figure 1.1. Add Datasource Option

    Add Datasource Option
  3. Select the Composite datasource, as shown. Then click Next.

    Figure 1.2. Composite Datasource

    Composite Datasource
  4. Enter a name and description for the datasource and click Next.

    Figure 1.3. Naming the Datasource

    Naming the Datasource
  5. Optionally, set the security options as desired. The security options allow you to make the datasource read only, hide its details and encrypt it.

    Figure 1.4. Securing the Datasource

    Securing the Datasource
  6. Click Finish to create the datasource.

The transforms editor is displayed as follows:

Figure 1.5. Transforms Editor

Transforms Editor

Static and Dynamic Transforms

Static transforms apply an operation to one single field at a time. Dynamic transforms work with several fields of a record simultaneously. In static transforms, you specify a constant value for data manipulation. In dynamic transforms, you use ${@Field_Name} to call values from another field. To illustrate dynamic transforms, the following example calls values from Field 2 and adds them to values from Field 1. The operation is Transform > Decimal > Add:

Field 1 (Decimal)Field 2 (Decimal)ValueOutput (Decimal)
50.62491.1${@Field 2}51.7249
0.7215.2${@Field 2}5.921

Merge and Retain Operations

Many transforms include the merge or retain operation. Before using the merge or retain operation, make sure the fields have been sorted first, which minimizes memory use. Comparing each record with many other unsorted records requires a significantly large memory if there are huge volumes of data. However, if we only have to compare each record with the previous record, we can run through massive amounts of data without needing huge amounts of memory.

The merge operation ensures that no information is lost. When one field is being operated on, the merge will be between only these records, while all the other fields remain the same. This ensures that the record that is the result of the merge operation does not discard any information. If you need to merge across variations in non-essential fields, those fields will need to be discarded first.

The retain operation works similarly with merge. The only difference is that the retain operation does not discard duplicate records after the manipulation.

Example

Using transforms, how does a company calculate how much it receives from sales each month? In the following example, we will take sales values that occur throughout the month from a legacy system which for historical reasons may include spaces around the date values. Also, some date values are not available and are imported into the system as nulls. This is meant to illustrate real-world situations where the data needs to be cleansed before we can begin properly processing it.

  • Step 1: Trim off the extra spaces, tabs and new lines in the string values. The operation is Transform > String > Trim. Select In Place

    .

    Figure 1.6. Trim Option

    Trim Option

    By doing this, we keep only the trimmed string values.

    The input data is read from a server which may include extra spaces at the beginning and end. The date strings may look like the following. Therefore, we need to trim these spaces before parsing the date:

    • " 2011-07-31 "

    • " 2011-09-30 "

    In the following table, the When field includes string representations of dates when a product was sold and the Value field shows the sale price. Later, we will introduce a Commission field, which describes how much of the sale price goes directly to the sales person as commission.

    The When field displays the following output:

    When (String)Value (Decimal)
    2011-03-282000.00
    2011-03-073000.00
    2011-3-191000.00
    2000.00
    1000.00
    3000.00
    2011-2-162000.00
    2011-2-271000.00
    2011-02-032000.00
    4000.00
    1000.00
    2011-05-122000.00
    2011-05-161000.00
    2011-05-071000.00
    2011-05-243000.00
    2011-6-072000.00
    2011-6-151000.00
    2011-6-211000.00
    2011-07-092000.00
    2011-07-143000.00
    2011-07-261000.00
    2011-07-291000.00
    1000.00
    2000.00
    2011-09-171000.00
    2011-09-052000.00
    2011-09-111000.00
    2011-09-233000.00
    2011-10-162000.00
    2011-10-043000.00
    2011-10-271000.00
  • Step 2: Convert the strings from the When field into dates. The operation is Transform > String > To date. Select In Place. The new values may look the same as the trimmed strings, but these are now Date objects, which means we can manipulate them as dates rather than just as sequences of characters.

    In the following table, the When field shows the output:

    When (Date)Value (Decimal)
    2011-03-282000.00
    2011-03-073000.00
    2011-03-191000.00
    2000.00
    1000.00
    3000.00
    2011-02-162000.00
    2011-02-271000.00
    2011-02-032000.00
    4000.00
    1000.00
    2011-05-122000.00
    2011-05-161000.00
    2011-05-071000.00
    2011-05-243000.00
    2011-06-072000.00
    2011-06-151000.00
    2011-06-211000.00
    2011-07-092000.00
    2011-07-143000.00
    2011-07-261000.00
    2011-07-291000.00
    1000.00
    2000.00
    2011-09-171000.00
    2011-09-052000.00
    2011-09-111000.00
    2011-09-233000.00
    2011-10-162000.00
    2011-10-043000.00
    2011-10-271000.00
  • Step 3: Extract the month values from the Date field. The operation is Transform > Date > Month.

    In the following table, a new field named "Index" is created to show the output:

    When (String)Value (Decimal)Index (Integer)
    2011-03-282000.003
    2011-03-073000.003
    2011-03-191000.003
     2000.00
     1000.00
     3000.00
    2011-02-162000.002
    2011-02-271000.002
    2011-02-032000.002
     4000.00
     1000.00
    2011-05-122000.005
    2011-05-161000.005
    2011-05-071000.005
    2011-05-243000.005
    2011-06-072000.006
    2011-06-151000.006
    2011-06-211000.006
    2011-07-092000.007
    2011-07-143000.007
    2011-07-261000.007
    2011-07-291000.007
     1000.00
     2000.00
    2011-09-171000.009
    2011-09-052000.009
    2011-09-111000.009
    2011-09-233000.009
    2011-10-162000.0010
    2011-10-043000.0010
    2011-10-271000.0010
  • Step 4: Show the months in long names from the When field. The operation is Transform > Date > Month name (long). Select In Place.

    In the following table, the When field shows the output:

    When (String)Value (Decimal)Index (Integer)
    March2000.003
    March3000.003
    March1000.003
    2000.00 
    1000.00 
    3000.00 
    February2000.002
    February1000.002
    February2000.002
    4000.00 
    1000.00 
    May2000.005
    May1000.005
    May1000.005
    May3000.005
    June2000.006
    June1000.006
    June1000.006
    July2000.007
    July3000.007
    July1000.007
    July1000.007
    1000.00 
    2000.00 
    September1000.009
    September2000.009
    September1000.009
    September3000.009
    October2000.0010
    October3000.0010
    October1000.0010

    You will notice that the nulls in the When field and the Index field are still nulls (represented by empty cells). Unless the transform specifically processes nulls, all transforms will just pass nulls through unchanged. In the next step, we will change those nulls into values we can work with.

  • Step 5: Set the Null values from the When field to "Unknown". The operation is Transform > String > Null to string. Select In Place.

    In the following table, the When field shows the output:

    When (String)Value (Decimal)Index (Integer)
    March2000.003
    March3000.003
    March1000.003
    Unknown2000.00 
    Unknown1000.00 
    Unknown3000.00 
    February2000.002
    February1000.002
    February2000.002
    Unknown4000.00 
    Unknown1000.00 
    May2000.005
    May1000.005
    May1000.005
    May3000.005
    June2000.006
    June1000.006
    June1000.006
    July2000.007
    July3000.007
    July1000.007
    July1000.007
    Unknown1000.00 
    Unknown2000.00 
    September1000.009
    September2000.009
    September1000.009
    September3000.009
    October2000.0010
    October3000.0010
    October1000.0010
  • Step 6: Sort the table by month index. The operation is Transform > Sort. Choose the Index field to sort by. Choose the Ascending order. Leave the Max Memory (MB) field blank. By doing this, this operation will sort the sales values without a maximum memory limit.

    The following table shows the output:

    When (String)Value (Decimal)Index (Integer)
    Unknown2000.00
    Unknown1000.00
    Unknown3000.00
    Unknown4000.00
    Unknown1000.00
    Unknown1000.00
    Unknown2000.00
    February2000.002
    February1000.002
    February2000.002
    March2000.003
    March3000.003
    March1000.003
    May2000.005
    May1000.005
    May1000.005
    May3000.005
    June2000.006
    June1000.006
    June1000.006
    July2000.007
    July3000.007
    July1000.007
    July1000.007
    September1000.009
    September2000.009
    September1000.009
    September3000.009
    October2000.0010
    October3000.0010
    October1000.0010
  • Step 7: Create a new field named Commission in the original data source. Sum merge the sales values and commissions, and display a single total for each month. The operation is Transform > Reduce. Select the Sum action on the Value field and the Commission field. Leave the action blank on the other fields. By doing this, you can add up the sales values and commissions and view the total for each month. The rows have been merged down to one row per group, where the month names and indexes are the groups.

    Alternatively, you can also sum retain on the sales values and commissions, and then perform a Discard Duplicates Transform operation to remove the redundant rows. You will receive the same output.

    The following table shows the sorted table with the Commission field added:

    When (String)Value (Decimal)Commission (Decimal)Index (Integer)
    Unknown2000.00100.20 
    Unknown1000.0050.10 
    Unknown3000.00150.30 
    Unknown4000.00200.40 
    Unknown1000.0050.10 
    Unknown1000.0050.10 
    Unknown2000.00100.20 
    February2000.00100.202
    February1000.0050.102
    February2000.00100.202
    March2000.00100.203
    March3000.00150.303
    March1000.0050.103
    May2000.00100.205
    May1000.0050.105
    May1000.0050.105
    May3000.00150.305
    June2000.00100.206
    June1000.0050.106
    June1000.0050.106
    July2000.00100.207
    July3000.00150.307
    July1000.0050.107
    July1000.0050.107
    September1000.0050.109
    September2000.00100.209
    September1000.0050.109
    September3000.00150.309
    October2000.00100.2010
    October3000.00150.3010
    October1000.0050.1010

    The following table shows the output of the Reduce Transform:

    When (String)Value (Decimal)Commission (Decimal)Index (Integer)
    Unknown14000.00701.40 
    February5000.00250.502
    March6000.00300.603
    May7000.00350.705
    June4000.00200.404
    July7000.00350.707
    September7000.00350.709
    October6000.00300.6010
  • Step 8: Subtract the commission from the sales value in each month. The operation is Transform > Decimal > Subtract. Specify ${@Commission} as the value to be subtracted. In this case we are not subtracting a fixed amount (a static transform) but an amount read from another field - the Commission field. This is termed a dynamic transform, because the result requires input from more than one field.

    In the following table, a new field named "Received" is created to show the output:

    When (String)Value (Decimal)Commission (Decimal)Index (Integer)Received (Decimal)
    Unknown14000.00701.40 13298.60
    February5000.00250.5024749.50
    March6000.00300.6035699.40
    May7000.00350.7056649.30
    June4000.00200.4043799.60
    July7000.00350.7076649.30
    September7000.00350.7096649.30
    October6000.00300.60105699.40

    Now we can save the result of the transformations as a new data source for later processing. The transforms are repeatable, which means you can re-run the transforms, each month, to receive the most updated set of numbers, rather than create a sequence of transforms but run them only once.

Summary

The following is a summary of the transformations we used in the overview:

  • String trim: remove whitespace at the beginning and end of date strings.

  • String to date: build date objects.

  • Date to month: extract month index from date objects.

  • Date to month name (long): replace date objects with long month names.

  • Null to string: set the null month names to “Unknown”.

  • Sort by month index: view the sales values throughout each month in the ascending order.

  • Reduce: sum merge the sales values and commissions, reduce the redundant rows and display a single total for each month.

  • Subtract: subtract the commissions from the sales values to calculate how much the company receives from sales each month.

You can find more details on these and all other supported transformations in the following chapters.