Discard Duplicates Transform

The Discard Duplicates transform lets you select one or more fields to compare and discards rows that duplicate the values in those fields. Before running this transform, make sure the input has been sorted on those fields, to minimize memory use. Comparing each record against many other unsorted records requires a large amount of memory when the data volume is high. When each record only has to be compared with the previous record, the transform can run through massive amounts of data without needing much memory.

The following table shows an example of the input:

Field 1    Field 2
A          5
A          9
A          9
D          1
D          3

If you select Field 1 to compare, the following table shows the output:

Field 1    Field 2
A          5
D          1

If you select Field 2 to compare, the following table shows the output:

Field 1    Field 2
A          5
A          9
D          1
D          3
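
The compare-with-previous-record behavior described above can be sketched in a few lines of Python. This is only an illustration, not the transform's actual implementation: it assumes a row is discarded when its values in the selected fields equal those of the row immediately before it. The function name discard_duplicates and the use of column indexes to identify fields are hypothetical.

```python
from typing import Iterable, Iterator, Sequence, Tuple

Row = Tuple[str, int]


def discard_duplicates(rows: Iterable[Row], compare_fields: Sequence[int]) -> Iterator[Row]:
    """Yield rows whose compare-field values differ from the previous row's.

    Hypothetical helper: compare_fields holds column indexes (0 = Field 1,
    1 = Field 2). Only the previous key is kept in memory, so the input can
    be streamed regardless of its size.
    """
    previous_key = None
    is_first_row = True
    for row in rows:
        key = tuple(row[i] for i in compare_fields)
        if is_first_row or key != previous_key:
            yield row
        previous_key = key
        is_first_row = False


# The example input from the tables above.
rows = [("A", 5), ("A", 9), ("A", 9), ("D", 1), ("D", 3)]

by_field_1 = list(discard_duplicates(rows, [0]))
by_field_2 = list(discard_duplicates(rows, [1]))

print(by_field_1)  # [('A', 5), ('D', 1)]
print(by_field_2)  # [('A', 5), ('A', 9), ('D', 1), ('D', 3)]
```

Because the sketch remembers only one key at a time, duplicates that are not adjacent would pass through unchanged, which is why the input needs to be sorted on the compare fields first.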