Core Concepts
Chainset, Chain, and Step Hierarchy
Chainset
A chainset is a container that holds related chains. Think of it as a module or package that groups together chains serving a common purpose.
Chainset Properties:
- Name - Unique identifier
- Database - Default database for chains (optional)
- Enabled/Disabled - Controls availability
- Roles - Who can access the chainset
- Workgroups - Organizational grouping
- Imports - Other chainsets to import chains from
Example: A “CustomerData” chainset might contain chains for importing customers, validating customer records, and generating customer reports.
Chain
A chain is a sequence of steps that transforms data. Chains are the executable units within a chainset.
Chain Properties:
- Name - Unique within the chainset
- Description - Documentation of purpose
- Steps - Ordered sequence of transformation operations
- Group - Optional organizational grouping within chainset
- Test Inputs - Sample data for testing
Example: A “ValidateCustomer” chain might have steps to check required fields, validate email format, verify phone number, and flag duplicates.
Step
A step is an individual transformation operation. Steps are the building blocks of chains.
Step Properties:
- Type - The operation to perform (e.g., MongoDB Reader, String Upper, Array Filter)
- Name - Custom label for this step instance
- Parameters - Configuration values specific to the step type
- Enabled/Disabled - Controls whether step executes
- Debug Logging - Enhanced logging for troubleshooting
Example: A “MongoDB Reader” step with parameters specifying the database, collection, and query.
Records and Data Flow
What is a Record?
A record is the unit of data flowing through an ETL chain. Records are typically JSON documents with fields and values.
Example Record:
{
"customerId": "C12345",
"name": "Jane Doe",
"email": "jane.doe@example.com",
"orderTotal": 150.00,
"orderDate": "2026-03-01"
}
Streaming Execution Model
ETL chains use a reactive streaming model:
- Source step generates records (e.g., reading from database)
- Records stream through subsequent steps one at a time
- Each step transforms the record and passes it forward
- Backpressure prevents memory overflow with large datasets
- Terminal step consumes records (e.g., writing to database)
This model allows processing millions of records efficiently without loading everything into memory.
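As a rough analogy for the streaming model above (not the ETL engine's actual implementation), Python generators behave in a similar pull-based way: the terminal step pulls one record at a time, so the source only produces records as fast as they are consumed. The step names `read_orders`, `add_tax`, and `write_out` are hypothetical.

```python
from typing import Iterable, Iterator

Record = dict

def read_orders(count: int) -> Iterator[Record]:
    """Source step: yields records lazily instead of building a full list."""
    for i in range(count):
        yield {"orderId": f"ORD-{i:03d}", "orderTotal": 100.0 + i}

def add_tax(records: Iterable[Record], rate: float) -> Iterator[Record]:
    """Transform step: handles one record at a time and passes it forward."""
    for record in records:
        yield {**record, "tax": round(record["orderTotal"] * rate, 2)}

def write_out(records: Iterable[Record], sink: list) -> int:
    """Terminal step: consuming the stream here drives the upstream steps."""
    written = 0
    for record in records:
        sink.append(record)
        written += 1
    return written

sink: list = []
written = write_out(add_tax(read_orders(3), rate=0.1), sink)
```

Because nothing upstream runs until `write_out` asks for the next record, only one record is in flight at a time, which is the property that keeps memory use flat for large inputs.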
Step Structure Patterns
Steps have different structure patterns that define how they transform the record stream:
| Pattern | Description | Example |
|---|---|---|
| 1 => 1 | One record in, one record out | String Upper, integer.Add, Date Format |
| 1 => N | One record in, multiple records out | array.Unwind, String Split, MongoDB Reader |
| N => 1 | Multiple records in, one record out | Array Collect, String Merge, Aggregation |
| N => M | Multiple records in, multiple records out | Filter, Distinct, Sort |
| 1 => 0 | One record in, no records out (terminal) | MongoDB Writer, Send Mail, File Write |
Understanding these patterns is crucial for designing efficient chains.
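The first three patterns in the table can be sketched with plain Python generators. This is an illustration under assumed semantics, not the real step implementations; `upper_name`, `unwind_items`, and `collect_items` are hypothetical stand-ins.

```python
from typing import Iterable, Iterator

Record = dict

def upper_name(records: Iterable[Record]) -> Iterator[Record]:
    """1 => 1: every input record yields exactly one output record."""
    for r in records:
        yield {**r, "name": r["name"].upper()}

def unwind_items(records: Iterable[Record]) -> Iterator[Record]:
    """1 => N: one output record per element of the 'items' array."""
    for r in records:
        for item in r["items"]:
            yield {**r, "items": item}

def collect_items(records: Iterable[Record]) -> Iterator[Record]:
    """N => 1: consumes the whole stream, then emits a single record."""
    yield {"items": [r["items"] for r in records]}

source = [{"name": "jane", "items": ["a", "b"]}, {"name": "raj", "items": ["c"]}]
unwound = list(unwind_items(upper_name(source)))
collected = list(collect_items(unwound))
```

Note that the N => 1 step cannot emit anything until it has seen every input record, so it is the one place in this sketch where the stream is buffered.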
Chain Calls and Composition
Chain Call Step
The Chain Call step allows one chain to call another chain, enabling composition and reusability.
How it works:
1. The calling chain passes a record to the Chain Call step
2. The Chain Call step invokes the named chain with that record as input
3. The called chain executes all its steps
4. Results from the called chain continue in the calling chain
Example:
Main Chain:
Step 1: MongoDB Reader (reads orders)
Step 2: Chain Call "EnrichOrder" (adds customer details)
Step 3: Chain Call "CalculateTotals" (computes totals)
Step 4: MongoDB Writer (saves enriched orders)
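The composition in the example above can be modelled as follows. This is a minimal sketch that assumes a chain is an ordered list of steps held in a registry; `CHAINS`, `run_chain`, `chain_call`, and `add_field` are hypothetical names, and the reader and writer steps are left out.

```python
from typing import Callable, Dict, Iterable, Iterator, List

Record = dict
Step = Callable[[Iterable[Record]], Iterator[Record]]

CHAINS: Dict[str, List[Step]] = {}  # hypothetical registry: chain name -> steps

def run_chain(name: str, records: Iterable[Record]) -> Iterator[Record]:
    """Feed the records through every step of the named chain, in order."""
    stream = iter(records)
    for step in CHAINS[name]:
        stream = step(stream)
    return stream

def chain_call(name: str) -> Step:
    """Build a step that invokes another chain once per incoming record."""
    def step(records: Iterable[Record]) -> Iterator[Record]:
        for record in records:
            # Whatever the called chain emits continues in the calling chain.
            yield from run_chain(name, [record])
    return step

def add_field(key: str, value) -> Step:
    """Trivial 1 => 1 step used here as a placeholder for real logic."""
    def step(records: Iterable[Record]) -> Iterator[Record]:
        for record in records:
            yield {**record, key: value}
    return step

CHAINS["EnrichOrder"] = [add_field("customerName", "Jane Doe")]
CHAINS["CalculateTotals"] = [add_field("total", 150.0)]
CHAINS["Main"] = [chain_call("EnrichOrder"), chain_call("CalculateTotals")]

result = list(run_chain("Main", [{"orderId": "ORD-001"}]))
```

Each sub-chain can also be run on its own with `run_chain("EnrichOrder", ...)`, which is what makes independent testing of sub-chains straightforward.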
Benefits of Chain Calls
- Reusability - Write logic once, use in multiple chains
- Maintainability - Update logic in one place
- Testability - Test sub-chains independently
- Readability - Break complex logic into named, understandable units
- Composition - Build complex workflows from simple building blocks
Typed Functions (FunctionSignature / FunctionCall)
Typed functions are the recommended pattern for reusable chains. They give a chain an explicit contract — named input parameters and named output fields — so callers don’t need to know the chain’s internal field names.
Two steps work together:
- structure.FunctionSignature - placed as the first step of the function chain, declares the parameter and return names (up to five each). It is a marker step that does nothing at runtime; it exists to document the contract and to populate the call dialog.
- structure.FunctionCall - calls a function chain with explicit field mapping. Specified source fields are passed in as in0…in4; result fields are written back as out0…out4. The original record is preserved with the return values merged in.
Example:
Function chain: "Fn_lookupEmployee"
Step 1: structure.FunctionSignature
Parameters: employeeId
Returns: employeeName, department
Step 2: mongodb.AggregationDefinition / string.Substitute / mongodb.Reader
(look up employee by in0 = employeeId)
Step 3: structure.AddFields
{ "employeeName": "${name}", "department": "${dept}" }
Calling chain:
Step 1: MongoDB Reader (reads orders)
Step 2: structure.FunctionCall
Function Name: Fn_lookupEmployee
In: employeeId → in0
Out: out0 → employeeName, out1 → department
Step 3: MongoDB Writer (saves enriched orders)
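The field mapping performed by the call can be sketched like this. It is a simplified model: the function chain is represented as a Python callable that receives in0…in4 and returns out0…out4 positionally, which is an assumption about how the declared names are bound. `function_call`, `fn_lookup_employee`, and the `EMPLOYEES` data are hypothetical.

```python
from typing import Callable, List

Record = dict

def function_call(record: Record,
                  function_chain: Callable[[Record], Record],
                  inputs: List[str],
                  outputs: List[str]) -> Record:
    """Map caller fields to in0..in4, run the function, map out0..out4 back."""
    if len(inputs) > 5 or len(outputs) > 5:
        raise ValueError("at most five inputs and five outputs are supported")
    args = {f"in{i}": record.get(field) for i, field in enumerate(inputs)}
    returned = function_chain(args)
    merged = dict(record)  # the original record is preserved
    for i, field in enumerate(outputs):
        merged[field] = returned.get(f"out{i}")
    return merged

# Hypothetical lookup data standing in for a database query.
EMPLOYEES = {"E7": {"name": "Jane Doe", "dept": "Finance"}}

def fn_lookup_employee(args: Record) -> Record:
    """Stand-in for the Fn_lookupEmployee chain: in0 is the employee id."""
    employee = EMPLOYEES.get(args["in0"], {})
    return {"out0": employee.get("name"), "out1": employee.get("dept")}

order = {"orderId": "ORD-001", "employeeId": "E7"}
enriched = function_call(order, fn_lookup_employee,
                         inputs=["employeeId"],
                         outputs=["employeeName", "department"])
```

The caller never sees the function's internal field names (`name`, `dept`); it only chooses which of its own fields go in and what the results are called when they come back.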
When to use a typed function vs a plain Chain Call:
| Use structure.FunctionCall | Use structure.ChainCall |
|---|---|
| Chain is reused from multiple call sites | Chain is used exactly once |
| Caller needs to pass specific fields | Full record context is fine |
| Output fields need renaming at the call site | Caller consumes output fields directly |
| You want an explicit, documented contract | Internal implementation detail |
Naming convention: typed function chains are conventionally prefixed with Fn_ (e.g. Fn_validateEmail, Fn_lookupEmployee).
Calling imported functions: prefix the chain name with the imported chainset name in brackets:
Function Name: [CommonUtilities]Fn_formatAddress
Conditional Execution (ChainIf / ChainIfElse)
Two steps handle conditional branching based on a boolean field in the current record:
- structure.ChainIf - calls a named chain only when a boolean field is true (or false). If the condition is not met, the original record passes through unchanged.
- structure.ChainIfElse - calls one of two named chains depending on whether a boolean field is true or false. The original record is always routed to exactly one of the two chains.
Both steps are 1 => 0..N: for each input record, the output is whatever the called chain emits (zero or more records) or, for structure.ChainIf when the condition is not met, the unchanged input record.
Typical pattern:
1. Compare / filter step that sets a boolean field
e.g. filter.Compare: "amount > 1000" → field: isHighValue
2. structure.ChainIf
Boolean Field: isHighValue
Call If: true
Chain Name: ProcessHighValueOrder
(records where isHighValue=false pass through unchanged)
Or with two branches:
1. filter.Compare: "customerType == 'premium'" → field: isPremium
2. structure.ChainIfElse
Boolean Field: isPremium
True Chain: ProcessPremiumOrder
False Chain: ProcessStandardOrder
Prefer structure.ChainIf / structure.ChainIfElse over duplicated filter chains when you need to route records to different processing logic. The boolean field is typically set by a preceding compare or calculation step.
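The routing behaviour of the two steps can be sketched as below. This assumes a called chain is a function from one record to zero or more records, and that a missing boolean field counts as false; both are assumptions made for illustration, and the chain functions are hypothetical.

```python
from typing import Callable, Iterable, Iterator

Record = dict
Chain = Callable[[Record], Iterable[Record]]

def chain_if(records: Iterable[Record], boolean_field: str,
             call_if: bool, chain: Chain) -> Iterator[Record]:
    """Call `chain` only when the boolean field equals `call_if`."""
    for record in records:
        if bool(record.get(boolean_field)) == call_if:
            yield from chain(record)
        else:
            yield record  # condition not met: pass through unchanged

def chain_if_else(records: Iterable[Record], boolean_field: str,
                  true_chain: Chain, false_chain: Chain) -> Iterator[Record]:
    """Route every record to exactly one of the two chains."""
    for record in records:
        chain = true_chain if record.get(boolean_field) else false_chain
        yield from chain(record)

def process_high_value(record: Record) -> Iterable[Record]:
    return [{**record, "needsReview": True}]

def process_premium(record: Record) -> Iterable[Record]:
    return [{**record, "tier": "premium"}]

def process_standard(record: Record) -> Iterable[Record]:
    return [{**record, "tier": "standard"}]

orders = [
    {"amount": 1500, "isHighValue": True, "isPremium": False},
    {"amount": 20, "isHighValue": False, "isPremium": True},
]
flagged = list(chain_if(orders, "isHighValue", True, process_high_value))
tiered = list(chain_if_else(orders, "isPremium", process_premium, process_standard))
```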
Chain Imports
Chainsets can import other chainsets to access their chains:
Example:
- “CommonUtilities” chainset contains “FormatAddress”, “ValidateEmail”, “CalculateTax”
- “OrderProcessing” chainset imports “CommonUtilities”
- Chains in “OrderProcessing” can call “FormatAddress”, “ValidateEmail”, “CalculateTax”
Important: Avoid circular dependencies (A imports B, B imports C, C imports A).
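If you keep a list of which chainsets import which, a circular dependency can be found with a depth-first search. The sketch below is a standalone helper, not a feature of the ETL Designer; `find_import_cycle` and the import maps are hypothetical.

```python
from typing import Dict, List, Optional

def find_import_cycle(imports: Dict[str, List[str]]) -> Optional[List[str]]:
    """Return one import cycle as a list of chainset names, or None."""
    path: List[str] = []  # chainsets on the current depth-first path
    done = set()          # chainsets already proven cycle-free

    def visit(name: str) -> Optional[List[str]]:
        if name in path:
            # Reached a chainset that is already on the path: cycle found.
            return path[path.index(name):] + [name]
        if name in done:
            return None
        path.append(name)
        for dependency in imports.get(name, []):
            cycle = visit(dependency)
            if cycle:
                return cycle
        path.pop()
        done.add(name)
        return None

    for name in imports:
        cycle = visit(name)
        if cycle:
            return cycle
    return None

cyclic = {"A": ["B"], "B": ["C"], "C": ["A"]}
acyclic = {"OrderProcessing": ["CommonUtilities"], "CommonUtilities": []}

cycle = find_import_cycle(cyclic)
no_cycle = find_import_cycle(acyclic)
```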
Test Inputs
Test inputs are sample data stored with a chain for testing purposes.
Why Use Test Inputs?
- Documentation - Show expected input format
- Testing - Verify chain behavior with known data
- Debugging - Reproduce issues with specific inputs
- Demonstration - Show how the chain works to others
- Unit Testing - Create multiple test cases for different scenarios
Creating Test Inputs
In the ETL Designer:
1. Select a chain
2. Click “Test Input” dropdown
3. Add test with name, description, and JSON data
4. Run chain with test input to verify behavior
Example Test Input:
{
"name": "Valid Order Test",
"description": "Tests order processing with valid data",
"testJSON": {
"orderId": "ORD-001",
"customerId": "C12345",
"items": [
{"sku": "WIDGET-A", "quantity": 2, "price": 25.00}
]
}
}
Job Queue and Execution
Job Queue
When an ETL chain is triggered (manually, by scheduler, or by another module), it enters a job queue.
Queue Behavior:
- Multiple jobs can be queued for the same chainset
- Jobs execute sequentially (one at a time per chainset)
- Jobs from different chainsets can run concurrently
- Long-running jobs don’t block other chainsets
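The queue behaviour described above amounts to one first-in, first-out queue per chainset with at most one running job each. The following is a simplified single-threaded model of that rule, not the scheduler's real code; the `JobQueues` class and the job ids are hypothetical.

```python
from collections import defaultdict, deque
from typing import Deque, Dict, List

class JobQueues:
    """One FIFO queue per chainset, with at most one running job each."""

    def __init__(self) -> None:
        self.waiting: Dict[str, Deque[str]] = defaultdict(deque)
        self.running: Dict[str, str] = {}  # chainset -> running job id

    def submit(self, chainset: str, job_id: str) -> None:
        self.waiting[chainset].append(job_id)

    def start_ready_jobs(self) -> List[str]:
        """Start the next queued job of every chainset that is idle."""
        started = []
        for chainset, queue in self.waiting.items():
            if queue and chainset not in self.running:
                job_id = queue.popleft()
                self.running[chainset] = job_id
                started.append(job_id)
        return started

    def finish(self, chainset: str) -> None:
        self.running.pop(chainset, None)

queues = JobQueues()
queues.submit("CustomerData", "job-1")
queues.submit("CustomerData", "job-2")
queues.submit("OrderProcessing", "job-3")

first = queues.start_ready_jobs()   # one job per chainset starts
second = queues.start_ready_jobs()  # job-2 must wait for job-1
queues.finish("CustomerData")
third = queues.start_ready_jobs()   # now job-2 can start
```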
Execution Lifecycle
- Queued - Job waiting to execute
- Running - Chain executing steps
- Completed - Successful completion
- Failed - Error occurred during execution
- Cancelled - User cancelled the job
Monitoring Execution
In the ETL Designer Results panel:
- View running and completed jobs
- See execution time and record counts
- Review step-by-step results
- Access error messages and stack traces
- Cancel running jobs if needed
Error Handling
How ETL Chains Signal Errors
ETL chains do not throw exceptions. Instead, errors are communicated through the record stream in one of two ways:
- Error record: A step emits a record with an error field (e.g. { "error": "something went wrong" }). The record flows downstream and can be filtered, logged, or handled by subsequent steps.
- No records emitted: When a step produces no output (e.g. a mongodb.Reader that matches nothing), downstream steps receive no input and the chain simply terminates with no records out. This is not an error; it is the normal result of a query that finds nothing.
Preventing Chain Termination on Empty Results
When a sub-chain may legitimately produce no records and you want the main flow to continue, wrap the call in a structure.ChainArray (Chain to Array) step. This calls another chain for each incoming record and writes its results into a named array field — even if the called chain produces no records, the array is empty and the outer record still flows on:
1. structure.ChainArray
Chain Name: "LookupCustomer"
Array Field: customerResults
(customerResults will be [] if no match — main flow continues either way)
2. Check customerResults length / process array
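The difference between calling the sub-chain directly and wrapping it can be sketched as follows. This is a simplified model in which a chain is a generator over one record; `lookup_customer`, `chain_call`, `chain_array`, and the `CUSTOMERS` data are hypothetical.

```python
from typing import Callable, Iterable, Iterator

Record = dict
Chain = Callable[[Record], Iterable[Record]]

CUSTOMERS = {"C12345": {"name": "Jane Doe"}}  # hypothetical lookup data

def lookup_customer(record: Record) -> Iterator[Record]:
    """Sub-chain that emits nothing when the customer is unknown."""
    customer = CUSTOMERS.get(record["customerId"])
    if customer is not None:
        yield customer

def chain_call(records: Iterable[Record], chain: Chain) -> Iterator[Record]:
    """Direct call: a record is dropped if the sub-chain emits nothing."""
    for record in records:
        yield from chain(record)

def chain_array(records: Iterable[Record], chain: Chain,
                array_field: str) -> Iterator[Record]:
    """Wrapped call: results go into an array, so the record always flows on."""
    for record in records:
        yield {**record, array_field: list(chain(record))}

orders = [{"customerId": "C12345"}, {"customerId": "C99999"}]
direct = list(chain_call(orders, lookup_customer))
wrapped = list(chain_array(orders, lookup_customer, "customerResults"))
```

With the direct call only one record survives; with the wrapped call both orders continue, and the second simply carries an empty `customerResults` array.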
Validation Pattern
ETL provides dedicated validation steps that accumulate issues into a validationIssues array, which can be inspected after all checks have run.
Key Steps
- validation.MandatoryFields - checks that a list of fields are all present on the record. Any missing fields are added as a validationIssue automatically.
- validation.AddValidationIssue - conditionally appends a message to the validationIssues array when a boolean field is false. The record passes through unchanged when the field is true.
- validation.IsValid - writes true to a result field if validationIssues is empty, false otherwise. Use this after all checks to branch on the overall result.
How the Pattern Works
1. validation.MandatoryFields
Fields: employeeId, leaveType, startDate, endDate
(adds issues for any missing fields)
2. filter.Compare: startDate < endDate → field: datesValid
3. validation.AddValidationIssue
IsValid Field: datesValid
Message: "End date must be after start date"
4. filter.Compare: days > 0 → field: daysValid
5. validation.AddValidationIssue
IsValid Field: daysValid
Message: "Leave days must be greater than zero"
6. validation.IsValid
Result Field: isValid
7. structure.ChainIfElse
Boolean Field: isValid
True Chain: ProcessValidRecord
False Chain: ReturnValidationErrors
After all validation.AddValidationIssue steps have run, validationIssues contains one entry per failed check. validation.IsValid then summarises whether any issues were found.
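The accumulation logic of the pattern above can be sketched with three small functions. These are stand-ins for the validation steps under assumed semantics; in particular, the wording of the missing-field message is invented for the example.

```python
from typing import List

Record = dict

def mandatory_fields(record: Record, fields: List[str]) -> Record:
    """Add one issue per field that is absent from the record."""
    issues = list(record.get("validationIssues", []))
    for field in fields:
        if field not in record:
            # Message format is hypothetical, not the product's wording.
            issues.append(f"Missing mandatory field: {field}")
    return {**record, "validationIssues": issues}

def add_validation_issue(record: Record, is_valid_field: str,
                         message: str) -> Record:
    """Append `message` only when the boolean field is false."""
    if record.get(is_valid_field):
        return record
    issues = list(record.get("validationIssues", [])) + [message]
    return {**record, "validationIssues": issues}

def is_valid(record: Record, result_field: str) -> Record:
    """Write True to `result_field` when no issues were recorded."""
    return {**record, result_field: not record.get("validationIssues")}

# leaveType is missing and the dates are reversed; days is fine.
request = {"employeeId": "E7", "startDate": "2026-03-10",
           "endDate": "2026-03-01", "days": 3}

record = mandatory_fields(request, ["employeeId", "leaveType", "startDate", "endDate"])
# ISO dates compare correctly as strings.
record = {**record, "datesValid": record["startDate"] < record["endDate"]}
record = add_validation_issue(record, "datesValid", "End date must be after start date")
record = {**record, "daysValid": record["days"] > 0}
record = add_validation_issue(record, "daysValid", "Leave days must be greater than zero")
record = is_valid(record, "isValid")
```

Every check runs regardless of earlier failures, so the caller gets the full list of problems in one pass instead of stopping at the first one.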
Validation as Typed Functions
For reusable validation logic, validation checks are typically placed in dedicated typed-function chains (prefixed Fn_validate*). Each function takes the fields to check as inputs and returns a validationIssues array. The calling chain merges the results and checks the final issue count.
Designing for Errors
Best practices for robust chains:
- Validate early - Check data quality at the start
- Use validation steps - Explicit validation with clear error messages
- Filter invalid records - Route bad records to error handling
- Log errors - Write error records to error collection
- Use ChainArray for optional lookups - Wrap sub-chain calls where no results is a valid outcome
- Graceful degradation - Continue processing valid records when possible
Job State
Each job has a key/value store — the job state — that persists for the duration of that job. Two steps access it:
- config.WriteState - writes a field value (or the entire record if no field is named) to a named key. The input record passes through unchanged.
- config.ReadState - reads a value back from the named key and places it into a field on the current record (or merges it over the record if no field is named).
State is job-scoped: two users running the same chain at the same time each have independent state.
Save-Branch-Restore Pattern
The primary use case is preserving the input record across a 1 => N step so it can be recovered after the N => 1 aggregation:
1. config.WriteState
Field: (empty — saves entire record)
Key: originalInput
2. array.Unwind
Field: items ← 1 => N: one record becomes many
3. (process each item record)
4. array.Collect
Field: processedItems ← N => 1: many records collapse to one
5. config.ReadState
Key: originalInput
Field: (empty — merges saved record back)
6. (final record has both processedItems and the original input fields)
Without the state steps, the original input fields would be lost after the 1 => N / N => 1 round-trip.
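The save-branch-restore sequence can be modelled end to end with generators. This is a sketch under assumed semantics: the collect step is assumed to emit only the collected array, and `JobState` and the step functions are hypothetical stand-ins for the real steps.

```python
from typing import Any, Dict, Iterable, Iterator, Optional

Record = dict

class JobState:
    """Hypothetical job-scoped key/value store."""
    def __init__(self) -> None:
        self._values: Dict[str, Any] = {}
    def write(self, key: str, value: Any) -> None:
        self._values[key] = value
    def read(self, key: str) -> Any:
        return self._values[key]

def write_state(records: Iterable[Record], state: JobState, key: str,
                field: Optional[str] = None) -> Iterator[Record]:
    for record in records:
        state.write(key, record if field is None else record[field])
        yield record  # input passes through unchanged

def unwind(records: Iterable[Record], field: str) -> Iterator[Record]:
    for record in records:  # 1 => N
        for item in record[field]:
            yield {**record, field: item}

def price_item(records: Iterable[Record]) -> Iterator[Record]:
    for record in records:  # per-item processing
        item = record["items"]
        yield {"sku": item["sku"], "lineTotal": item["quantity"] * item["price"]}

def collect(records: Iterable[Record], field: str) -> Iterator[Record]:
    yield {field: list(records)}  # N => 1: original fields are gone here

def read_state(records: Iterable[Record], state: JobState, key: str,
               field: Optional[str] = None) -> Iterator[Record]:
    for record in records:
        saved = state.read(key)
        # No field named: merge the saved record over the current one.
        yield {**record, **saved} if field is None else {**record, field: saved}

order = {"orderId": "ORD-001", "customerId": "C12345",
         "items": [{"sku": "WIDGET-A", "quantity": 2, "price": 25.0},
                   {"sku": "WIDGET-B", "quantity": 1, "price": 10.0}]}

state = JobState()
stream = write_state([order], state, "originalInput")
stream = collect(price_item(unwind(stream, "items")), "processedItems")
final = list(read_state(stream, state, "originalInput"))
```

The sketch writes state once, before the stream fans out, which matches the advice in the notes that follow about not writing state inside a flow of many records.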
Notes
- Use state sparingly — it is effectively a global variable for the job.
- Avoid writing state inside a flow of multiple records; the value changes with each record and other steps running in parallel may observe an inconsistent value.
- State values can be any type: string, integer, array, or object.
Disabling Steps
Individual steps can be disabled: they are retained in the chain with their configuration intact but are skipped during execution. The step is shown greyed out in the designer.
How to Disable / Enable a Step
- Select the step in the ETL Designer
- Click “More Actions” → “Disable Step” (or “Enable Step” to re-enable)
When to Use Step Disable
- Diagnostic steps in production - Add a mongodb.TeeWriter to capture intermediate records while debugging, then disable it when done. The step stays in the chain ready to be re-enabled next time without reconfiguring it.
- Temporary bypass during debugging - Disable a slow or problematic step to test the rest of the chain without removing its configuration.
- Alternative implementations - Keep two alternative versions of a step in the chain with one disabled, making it easy to switch between them.
Disabling a step is always preferable to deleting it when you may need it again.
Debug Logging
Debug logging provides detailed information about step execution.
Enabling Debug Logging
- Select a step in the ETL Designer
- Click “More Actions” → “Debug Logging On”
- Step icon changes to indicate debug mode
- Run the chain
- View detailed logs in Results panel
What Debug Logging Shows
- Input record to the step
- Step parameters and configuration
- Processing details
- Output record from the step
- Timing information
- Any warnings or errors
Use debug logging to:
- Understand how data transforms through steps
- Diagnose unexpected behavior
- Verify step configuration
- Optimize performance
Groups
Groups organize chains within a chainset for better management.
Using Groups
- Organize by purpose: “Import”, “Export”, “Validation”, “Reports”
- Organize by domain: “Customers”, “Orders”, “Products”, “Inventory”
- Organize by frequency: “Hourly”, “Daily”, “Weekly”, “On-Demand”
Group Features
- Filter chain list to show only one group
- Assign chains to groups via “Set Group” action
- Each chain belongs to at most one group
- Groups are purely organizational (no execution impact)
Next Steps
- Quick Start Guide - Create your first ETL chain
- ETL Designer - Learn the designer interface
- Best Practices - Design effective ETL chains
- Examples - See complete working examples