Subtree Optimization

The XML DataSource provides an option for processing very large XML files. By default, the entire XML document is loaded into memory, because XPath expressions can be fully evaluated only when the complete document tree is available. Loading the whole tree is impractical for large source files, so a special Subtree Optimization mode can be used instead.

When Subtree Optimization is enabled, the XML file is read and processed sequentially. This mode works only when the Root XPath expression is a simple, absolute path to the element at the root of the subtree that represents a record. For example, /data/customer/record is a simple absolute path, so Subtree Optimization works with it. However, //record, although it is also a valid XPath expression, does not work in this mode.

Note

You must ensure that the Root XPath is a simple, absolute XPath for this mode to work.
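One way to check the rule above before enabling the mode is sketched below. The helper `is_simple_absolute_xpath` is hypothetical and is not part of the data source; it assumes that "simple, absolute" means a sequence of `/` followed by a plain element name. Under that assumption it rejects `//`, relative paths, predicates such as `[1]`, wildcards, and namespace prefixes.

```python
import re

# Hypothetical check (not part of the data source): one or more steps, each a
# "/" followed by a plain element name, e.g. /data/customer/record.
# "//", relative paths, predicates ([...]), wildcards (*), axes (::) and
# namespace prefixes (ns:record) do not match.
_SIMPLE_ABSOLUTE = re.compile(r"(?:/[A-Za-z_][\w.-]*)+")


def is_simple_absolute_xpath(path: str) -> bool:
    """Return True if path is a simple, absolute element path."""
    return _SIMPLE_ABSOLUTE.fullmatch(path) is not None


for p in ["/data/customer/record", "//record", "record",
          "/data/customer/record[1]", "/data/*/record"]:
    print(f"{p!r}: {is_simple_absolute_xpath(p)}")
```

Only the first path in the loop is accepted; the others are valid XPath but fall outside the simple, absolute form that this mode requires.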

When the data source encounters an element in the XML source at the designated root path, it constructs a subtree from that element and its descendants. Full XPath expressions can therefore still be used to extract fields, but they can access only that subtree. In this mode, only the subtree needed to process one record is held in memory at a time, which greatly reduces memory requirements.
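The sequential, one-subtree-at-a-time behavior described above can be illustrated with Python's standard `xml.etree.ElementTree.iterparse`. This is a minimal sketch of the general technique under stated assumptions, not the data source's implementation: `iter_records` is a hypothetical helper, namespaces are ignored, and ElementTree supports only a limited XPath subset, so field paths such as `name` are written relative to the record element.

```python
import io
import xml.etree.ElementTree as ET


def iter_records(source, root_path):
    """Yield one subtree per element whose absolute path equals root_path.

    Hypothetical helper. source is a filename or binary file object;
    root_path is a simple absolute path such as "/data/customer/record".
    Namespaces are not handled.
    """
    target = root_path.strip("/").split("/")
    path = []   # tag names from the document root down to the current element
    elems = []  # the corresponding Element objects, parallel to path
    for event, elem in ET.iterparse(source, events=("start", "end")):
        if event == "start":
            path.append(elem.tag)
            elems.append(elem)
            continue
        # "end": elem and all of its descendants have now been parsed.
        if path == target:
            yield elem  # the complete subtree for one record
            if len(elems) > 1:
                # Detach the finished record from its parent so it can be
                # garbage-collected instead of accumulating in memory.
                elems[-2].remove(elem)
        path.pop()
        elems.pop()


xml_data = b"""<data>
  <customer>
    <record><name>Alice</name><city>Oslo</city></record>
    <record><name>Bob</name><city>Lima</city></record>
  </customer>
</data>"""

for record in iter_records(io.BytesIO(xml_data), "/data/customer/record"):
    # Field paths are evaluated against the record subtree only.
    print(record.findtext("name"), record.findtext("city"))
```

Each record is yielded only after its end tag has been read, so its whole subtree is available for field extraction, and it is detached from the parent afterwards. Elements outside the record path are not pruned in this sketch, so a production version would also need to discard them.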