Core Concepts

Chainset, Chain, and Step Hierarchy

Chainset

A chainset is a container that holds related chains. Think of it as a module or package that groups together chains serving a common purpose.

Chainset Properties:

  • Name - Unique identifier
  • Database - Default database for chains (optional)
  • Enabled/Disabled - Controls availability
  • Roles - Who can access the chainset
  • Workgroups - Organizational grouping
  • Imports - Other chainsets to import chains from

Example: A “CustomerData” chainset might contain chains for importing customers, validating customer records, and generating customer reports.

Chain

A chain is a sequence of steps that transforms data. Chains are the executable units within a chainset.

Chain Properties:

  • Name - Unique within the chainset
  • Description - Documentation of purpose
  • Steps - Ordered sequence of transformation operations
  • Group - Optional organizational grouping within chainset
  • Test Inputs - Sample data for testing

Example: A “ValidateCustomer” chain might have steps to check required fields, validate email format, verify phone number, and flag duplicates.

Step

A step is an individual transformation operation. Steps are the building blocks of chains.

Step Properties:

  • Type - The operation to perform (e.g., MongoDB Reader, String Upper, Array Filter)
  • Name - Custom label for this step instance
  • Parameters - Configuration values specific to the step type
  • Enabled/Disabled - Controls whether step executes
  • Debug Logging - Enhanced logging for troubleshooting

Example: A “MongoDB Reader” step with parameters specifying the database, collection, and query.
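The three-level hierarchy can be pictured as nested data structures. The sketch below is illustrative Python only, not the product's object model; the class and attribute names are assumptions derived from the property lists above.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


@dataclass
class Step:
    type: str                                   # e.g. "MongoDB Reader"
    name: str = ""                              # custom label for this instance
    parameters: Dict[str, Any] = field(default_factory=dict)
    enabled: bool = True
    debug_logging: bool = False


@dataclass
class Chain:
    name: str                                   # unique within the chainset
    description: str = ""
    steps: List[Step] = field(default_factory=list)
    group: Optional[str] = None


@dataclass
class Chainset:
    name: str
    database: Optional[str] = None              # optional default database
    enabled: bool = True
    imports: List[str] = field(default_factory=list)
    chains: Dict[str, Chain] = field(default_factory=dict)

    def add_chain(self, chain: Chain) -> None:
        # Chain names must be unique within a chainset.
        if chain.name in self.chains:
            raise ValueError(f"duplicate chain name: {chain.name}")
        self.chains[chain.name] = chain


customer_data = Chainset(name="CustomerData")
customer_data.add_chain(
    Chain(
        name="ValidateCustomer",
        steps=[Step(type="MongoDB Reader", parameters={"collection": "customers"})],
    )
)
```

The chainset owns its chains by name, and each chain owns an ordered list of steps, which mirrors the containment described in this section.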

Records and Data Flow

What is a Record?

A record is the unit of data flowing through an ETL chain. Records are typically JSON documents with fields and values.

Example Record:

{
  "customerId": "C12345",
  "name": "Jane Doe",
  "email": "jane.doe@example.com",
  "orderTotal": 150.00,
  "orderDate": "2026-03-01"
}

Streaming Execution Model

ETL chains use a reactive streaming model:

  1. Source step generates records (e.g., reading from database)
  2. Records stream through subsequent steps one at a time
  3. Each step transforms the record and passes it forward
  4. Backpressure prevents memory overflow with large datasets
  5. Terminal step consumes records (e.g., writing to database)

This model allows processing millions of records efficiently without loading everything into memory.
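As a rough analogy, the streaming model behaves like a pipeline of Python generators: the terminal step pulls one record at a time, so no step ever holds the full dataset. This is a conceptual sketch rather than the engine's implementation, and the function names are hypothetical.

```python
from typing import Any, Dict, Iterable, Iterator, List

Record = Dict[str, Any]


def read_orders(count: int) -> Iterator[Record]:
    """Source step: yields records lazily instead of building a list."""
    for i in range(count):
        yield {"orderId": f"ORD-{i:03d}", "orderTotal": 10.0 * (i + 1)}


def add_tax(records: Iterable[Record], rate: float) -> Iterator[Record]:
    """Transform step: handles one record at a time and passes it forward."""
    for record in records:
        yield {**record, "tax": round(record["orderTotal"] * rate, 2)}


def write_orders(records: Iterable[Record], sink: List[Record]) -> int:
    """Terminal step: consumes records; its pace drives the upstream steps."""
    written = 0
    for record in records:
        sink.append(record)
        written += 1
    return written


sink: List[Record] = []
written = write_orders(add_tax(read_orders(3), rate=0.1), sink)
```

Because each generator produces a record only when the next step asks for one, a slow consumer naturally slows the source. That is the effect backpressure achieves in a reactive system, although the real mechanism may differ.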

Step Structure Patterns

Steps have different structure patterns that define how they transform the record stream:

Pattern | Description                                | Example
1 => 1  | One record in, one record out              | String Upper, integer.Add, Date Format
1 => N  | One record in, multiple records out        | array.Unwind, String Split, MongoDB Reader
N => 1  | Multiple records in, one record out        | Array Collect, String Merge, Aggregation
N => M  | Multiple records in, multiple records out  | Filter, Distinct, Sort
1 => 0  | One record in, no records out (terminal)   | MongoDB Writer, Send Mail, File Write

Understanding these patterns is crucial for designing efficient chains.
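One way to internalise the patterns is to write each as a small generator. The functions below are simplified stand-ins named after the example steps; they are assumptions for illustration and do not reproduce the real steps' parameters or behavior.

```python
from typing import Any, Dict, Iterable, Iterator, List

Record = Dict[str, Any]


def string_upper(records: Iterable[Record], field: str) -> Iterator[Record]:
    """1 => 1: each input record yields exactly one output record."""
    for record in records:
        yield {**record, field: record[field].upper()}


def unwind(records: Iterable[Record], field: str) -> Iterator[Record]:
    """1 => N: one output record per element of the array field."""
    for record in records:
        for element in record[field]:
            yield {**record, field: element}


def collect(records: Iterable[Record], field: str) -> Iterator[Record]:
    """N => 1: gathers every input record into a single array field."""
    yield {field: list(records)}


def distinct(records: Iterable[Record], field: str) -> Iterator[Record]:
    """N => M: keeps the first record seen for each value of the field."""
    seen = set()
    for record in records:
        if record[field] not in seen:
            seen.add(record[field])
            yield record


def write_all(records: Iterable[Record], sink: List[Record]) -> None:
    """1 => 0: terminal step that consumes records and emits nothing."""
    sink.extend(records)


source = [{"name": "ann", "tags": ["x", "y"]}, {"name": "bob", "tags": ["x"]}]
unwound = list(unwind(source, "tags"))
```

Note that N => 1 and N => M steps such as collect or a sort must see the whole stream before they can finish, so they are the places where memory use grows with input size.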

Chain Calls and Composition

Chain Call Step

The Chain Call step allows one chain to call another chain, enabling composition and reusability.

How it works:

  1. The calling chain passes a record to the Chain Call step
  2. The Chain Call step invokes the named chain with that record as input
  3. The called chain executes all its steps
  4. Results from the called chain continue in the calling chain

Example:

Main Chain:
  Step 1: MongoDB Reader (reads orders)
  Step 2: Chain Call "EnrichOrder" (adds customer details)
  Step 3: Chain Call "CalculateTotals" (computes totals)
  Step 4: MongoDB Writer (saves enriched orders)
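The Main Chain example above can be modelled as functions looked up by name. This is a minimal sketch under the assumption that a chain maps a stream of records to a stream of records; the registry and the two placeholder chains are hypothetical.

```python
from typing import Any, Callable, Dict, Iterable, Iterator

Record = Dict[str, Any]
Chain = Callable[[Iterable[Record]], Iterator[Record]]

# Hypothetical name-to-chain registry standing in for a chainset.
REGISTRY: Dict[str, Chain] = {}


def chain_call(records: Iterable[Record], chain_name: str) -> Iterator[Record]:
    """Invoke the named chain once per record and forward whatever it emits."""
    chain = REGISTRY[chain_name]
    for record in records:
        yield from chain([record])


def enrich_order(records: Iterable[Record]) -> Iterator[Record]:
    for record in records:
        # Placeholder for a real customer lookup.
        yield {**record, "customerName": "Jane Doe"}


def calculate_totals(records: Iterable[Record]) -> Iterator[Record]:
    for record in records:
        total = sum(item["quantity"] * item["price"] for item in record["items"])
        yield {**record, "orderTotal": total}


REGISTRY["EnrichOrder"] = enrich_order
REGISTRY["CalculateTotals"] = calculate_totals

orders = [{"orderId": "ORD-001", "items": [{"quantity": 2, "price": 25.0}]}]
results = list(chain_call(chain_call(orders, "EnrichOrder"), "CalculateTotals"))
```

Each sub-chain can be exercised on its own with a one-record list, which is the testability benefit in practice.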

Benefits of Chain Calls

  • Reusability - Write logic once, use in multiple chains
  • Maintainability - Update logic in one place
  • Testability - Test sub-chains independently
  • Readability - Break complex logic into named, understandable units
  • Composition - Build complex workflows from simple building blocks

Typed Functions (FunctionSignature / FunctionCall)

Typed functions are the recommended pattern for reusable chains. They give a chain an explicit contract — named input parameters and named output fields — so callers don’t need to know the chain’s internal field names.

Two steps work together:

  • structure.FunctionSignature: placed as the first step of the function chain, it declares the parameter and return names (up to five each). It is a marker step that does nothing at runtime; it exists to document the contract and to populate the call dialog.
  • structure.FunctionCall: calls a function chain with explicit field mapping. Specified source fields are passed in as in0 to in4; result fields are written back as out0 to out4. The original record is preserved with the return values merged in.

Example:

Function chain: "Fn_lookupEmployee"
  Step 1: structure.FunctionSignature
    Parameters: employeeId
    Returns: employeeName, department

  Step 2: mongodb.AggregationDefinition / string.Substitute / mongodb.Reader
    (look up employee by in0 = employeeId)

  Step 3: structure.AddFields
    { "employeeName": "${name}", "department": "${dept}" }

Calling chain:
  Step 1: MongoDB Reader (reads orders)
  Step 2: structure.FunctionCall
    Function Name: Fn_lookupEmployee
    In: employeeId → in0
    Out: out0 → employeeName, out1 → department
  Step 3: MongoDB Writer (saves enriched orders)
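The field mapping can be sketched in Python as follows. This reflects one plausible reading of the in0 to in4 / out0 to out4 convention; how the real steps translate between declared names and positional slots is not specified here, so the helper, its signature, and the stand-in function chain are all assumptions.

```python
from typing import Any, Callable, Dict, List

Record = Dict[str, Any]
MAX_SLOTS = 5  # "up to five each", per the description above


def fn_lookup_employee(args: Record) -> Record:
    """Hypothetical stand-in for the Fn_lookupEmployee function chain."""
    employees = {"E1": {"name": "Jane Doe", "dept": "Finance"}}
    match = employees.get(args["in0"], {})
    return {"out0": match.get("name"), "out1": match.get("dept")}


def function_call(
    record: Record,
    function: Callable[[Record], Record],
    in_fields: List[str],
    out_fields: List[str],
) -> Record:
    """Map caller fields to in0..in4, call, then map out0..out4 back."""
    if len(in_fields) > MAX_SLOTS or len(out_fields) > MAX_SLOTS:
        raise ValueError("a typed function takes at most five inputs and outputs")
    args = {f"in{i}": record.get(name) for i, name in enumerate(in_fields)}
    result = function(args)
    merged = dict(record)  # the original record is preserved
    for i, name in enumerate(out_fields):
        merged[name] = result.get(f"out{i}")
    return merged


order = {"orderId": "ORD-001", "employeeId": "E1"}
enriched = function_call(
    order, fn_lookup_employee, ["employeeId"], ["employeeName", "department"]
)
```

The caller sees only its own field names, and the function sees only positional slots, so either side can rename fields without breaking the other.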

When to use a typed function vs a plain Chain Call:

Use structure.FunctionCall                    | Use structure.ChainCall
Chain is reused from multiple call sites      | Chain is used exactly once
Caller needs to pass specific fields          | Full record context is fine
Output fields need renaming at the call site  | Caller consumes output fields directly
You want an explicit, documented contract     | Internal implementation detail

Naming convention: typed function chains are conventionally prefixed with Fn_ (e.g. Fn_validateEmail, Fn_lookupEmployee).

Calling imported functions: prefix the chain name with the imported chainset name in brackets:

Function Name: [CommonUtilities]Fn_formatAddress

Conditional Execution (ChainIf / ChainIfElse)

Two steps handle conditional branching based on a boolean field in the current record:

  • structure.ChainIf — calls a named chain only when a boolean field is true (or false). If the condition is not met, the original record passes through unchanged.
  • structure.ChainIfElse — calls one of two named chains depending on whether a boolean field is true or false. The original record is always routed to exactly one of the two chains.

Both steps are 1 => 0..N: for each input record, output is whatever the called chain emits (zero or more records).

Typical pattern:

1. Compare / filter step that sets a boolean field
   e.g. filter.Compare: "amount > 1000" → field: isHighValue

2. structure.ChainIf
   Boolean Field: isHighValue
   Call If: true
   Chain Name: ProcessHighValueOrder
   (records where isHighValue=false pass through unchanged)

Or with two branches:

1. filter.Compare: "customerType == 'premium'" → field: isPremium

2. structure.ChainIfElse
   Boolean Field: isPremium
   True Chain: ProcessPremiumOrder
   False Chain: ProcessStandardOrder

Use structure.ChainIf / structure.ChainIfElse in preference to duplicate filter chains when you need to route records to different processing logic. The boolean field is typically set by a preceding compare or calculation step.
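The routing behavior of the two steps can be sketched as follows. The function names and arguments are hypothetical and only loosely mirror the step parameters; a missing boolean field is treated as false here, which is an assumption.

```python
from typing import Any, Callable, Dict, Iterable, Iterator

Record = Dict[str, Any]
Chain = Callable[[Iterable[Record]], Iterator[Record]]


def chain_if(
    records: Iterable[Record], boolean_field: str, chain: Chain, call_if: bool = True
) -> Iterator[Record]:
    """Call the chain when the field matches call_if; otherwise pass through."""
    for record in records:
        if bool(record.get(boolean_field)) == call_if:
            yield from chain([record])
        else:
            yield record


def chain_if_else(
    records: Iterable[Record], boolean_field: str, true_chain: Chain, false_chain: Chain
) -> Iterator[Record]:
    """Route every record to exactly one of the two chains."""
    for record in records:
        branch = true_chain if record.get(boolean_field) else false_chain
        yield from branch([record])


def tag_with(label: str) -> Chain:
    """Build a trivial chain that records which branch handled the record."""
    def chain(records: Iterable[Record]) -> Iterator[Record]:
        for record in records:
            yield {**record, "handledBy": label}
    return chain


orders = [
    {"orderId": "A", "isHighValue": True},
    {"orderId": "B", "isHighValue": False},
]
routed = list(chain_if(orders, "isHighValue", tag_with("ProcessHighValueOrder")))
split = list(
    chain_if_else(orders, "isHighValue", tag_with("TrueChain"), tag_with("FalseChain"))
)
```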

Chain Imports

Chainsets can import other chainsets to access their chains:

Example:

  • “CommonUtilities” chainset contains “FormatAddress”, “ValidateEmail”, “CalculateTax”
  • “OrderProcessing” chainset imports “CommonUtilities”
  • Chains in “OrderProcessing” can call “FormatAddress”, “ValidateEmail”, “CalculateTax”

Important: Avoid circular dependencies (A imports B, B imports C, C imports A).
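If the import relationships are available as a simple mapping, a circular dependency can be found with a depth-first search. The sketch below assumes a hypothetical dictionary of chainset name to imported chainset names; it is not a built-in check of the product.

```python
from typing import Dict, List, Optional

UNVISITED, IN_PROGRESS, DONE = 0, 1, 2


def find_import_cycle(imports: Dict[str, List[str]]) -> Optional[List[str]]:
    """Return the chainset names forming a cycle, or None if there is none."""
    state: Dict[str, int] = {}
    path: List[str] = []

    def visit(name: str) -> Optional[List[str]]:
        state[name] = IN_PROGRESS
        path.append(name)
        for dependency in imports.get(name, []):
            status = state.get(dependency, UNVISITED)
            if status == IN_PROGRESS:
                # Reached a chainset that is still on the current path.
                return path[path.index(dependency):] + [dependency]
            if status == UNVISITED:
                cycle = visit(dependency)
                if cycle:
                    return cycle
        path.pop()
        state[name] = DONE
        return None

    for name in imports:
        if state.get(name, UNVISITED) == UNVISITED:
            cycle = visit(name)
            if cycle:
                return cycle
    return None


cycle = find_import_cycle({"A": ["B"], "B": ["C"], "C": ["A"]})
no_cycle = find_import_cycle(
    {"OrderProcessing": ["CommonUtilities"], "CommonUtilities": []}
)
```

For the A, B, C example above, the function returns the path A, B, C, A; for the OrderProcessing example it returns None.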

Test Inputs

Test inputs are sample data stored with a chain for testing purposes.

Why Use Test Inputs?

  • Documentation - Show expected input format
  • Testing - Verify chain behavior with known data
  • Debugging - Reproduce issues with specific inputs
  • Demonstration - Show how the chain works to others
  • Unit Testing - Create multiple test cases for different scenarios

Creating Test Inputs

In the ETL Designer:

  1. Select a chain
  2. Click the “Test Input” dropdown
  3. Add a test with a name, description, and JSON data
  4. Run the chain with the test input to verify behavior

Example Test Input:

{
  "name": "Valid Order Test",
  "description": "Tests order processing with valid data",
  "testJSON": {
    "orderId": "ORD-001",
    "customerId": "C12345",
    "items": [
      {"sku": "WIDGET-A", "quantity": 2, "price": 25.00}
    ]
  }
}

Job Queue and Execution

Job Queue

When an ETL chain is triggered (manually, by scheduler, or by another module), it enters a job queue.

Queue Behavior:

  • Multiple jobs can be queued for the same chainset
  • Jobs execute sequentially (one at a time per chainset)
  • Jobs from different chainsets can run concurrently
  • Long-running jobs don’t block other chainsets
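The queueing rules (sequential within a chainset, concurrent across chainsets) can be approximated with one single-worker executor per chainset. This is a conceptual sketch using Python's standard library, not a description of the actual scheduler; the JobQueue class is hypothetical.

```python
from concurrent.futures import Future, ThreadPoolExecutor
from typing import Callable, Dict, List, Tuple


class JobQueue:
    """One single-worker executor per chainset: FIFO within, parallel across."""

    def __init__(self) -> None:
        self._executors: Dict[str, ThreadPoolExecutor] = {}

    def submit(self, chainset: str, job: Callable[[], None]) -> Future:
        executor = self._executors.get(chainset)
        if executor is None:
            # A single worker guarantees one job at a time for this chainset.
            executor = ThreadPoolExecutor(max_workers=1)
            self._executors[chainset] = executor
        return executor.submit(job)

    def shutdown(self) -> None:
        """Wait for every queued job to finish."""
        for executor in self._executors.values():
            executor.shutdown(wait=True)


log: List[Tuple[str, int]] = []
queue = JobQueue()
for n in range(3):
    # Bind n as a default argument so each job logs its own number.
    queue.submit("CustomerData", lambda n=n: log.append(("CustomerData", n)))
queue.submit("OrderProcessing", lambda: log.append(("OrderProcessing", 0)))
queue.shutdown()

customer_jobs = [n for name, n in log if name == "CustomerData"]
```

The three CustomerData jobs always complete in submission order, while the OrderProcessing job may land anywhere in the log because it runs on its own worker.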

Execution Lifecycle

  1. Queued - Job waiting to execute
  2. Running - Chain executing steps
  3. Completed - Successful completion
  4. Failed - Error occurred during execution
  5. Cancelled - User cancelled the job

Monitoring Execution

In the ETL Designer Results panel:

  • View running and completed jobs
  • See execution time and record counts
  • Review step-by-step results
  • Access error messages and stack traces
  • Cancel running jobs if needed

Error Handling

How ETL Chains Signal Errors

ETL chains do not throw exceptions. Instead, errors are communicated through the record stream in one of two ways:

  • Error record: A step emits a record with an error field (e.g. { "error": "something went wrong" }). The record flows downstream and can be filtered, logged, or handled by subsequent steps.
  • No records emitted: When a step produces no output (e.g. a mongodb.Reader that matches nothing), downstream steps receive no input and the chain simply terminates with no records out. This is not an error — it is the normal result of a query that finds nothing.
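The two signalling styles can be illustrated with a small sketch. The step and helper names are hypothetical; the point is that a failure becomes a record carrying an error field instead of an exception, and that an empty input simply yields an empty output.

```python
from typing import Any, Dict, Iterable, Iterator, List, Tuple

Record = Dict[str, Any]


def parse_amount(records: Iterable[Record]) -> Iterator[Record]:
    """Convert the amount field to a float, emitting an error record on failure."""
    for record in records:
        try:
            yield {**record, "amount": float(record["amount"])}
        except (KeyError, TypeError, ValueError) as exc:
            # The failure travels downstream as data, not as an exception.
            yield {**record, "error": f"invalid amount: {exc}"}


def split_errors(records: Iterable[Record]) -> Tuple[List[Record], List[Record]]:
    """Separate clean records from error records for downstream handling."""
    good: List[Record] = []
    bad: List[Record] = []
    for record in records:
        (bad if "error" in record else good).append(record)
    return good, bad


inputs = [{"id": 1, "amount": "12.5"}, {"id": 2, "amount": "abc"}, {"id": 3}]
good, bad = split_errors(parse_amount(inputs))
empty = list(parse_amount([]))  # no input records, so no output records
```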

Preventing Chain Termination on Empty Results

When a sub-chain may legitimately produce no records and you want the main flow to continue, wrap the call in a structure.ChainArray (Chain to Array) step. This calls another chain for each incoming record and writes its results into a named array field — even if the called chain produces no records, the array is empty and the outer record still flows on:

1. structure.ChainArray
   Chain Name: "LookupCustomer"
   Array Field: customerResults
   (customerResults will be [] if no match — main flow continues either way)

2. Check customerResults length / process array
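A minimal sketch of the Chain to Array behavior, assuming a chain is a function from a stream of records to a stream of records. The lookup chain and its data are hypothetical.

```python
from typing import Any, Callable, Dict, Iterable, Iterator

Record = Dict[str, Any]
Chain = Callable[[Iterable[Record]], Iterator[Record]]


def chain_array(
    records: Iterable[Record], chain: Chain, array_field: str
) -> Iterator[Record]:
    """Run the chain per record and store its output in an array field."""
    for record in records:
        # list() of an empty stream is [], so the outer record is always emitted.
        yield {**record, array_field: list(chain([record]))}


def lookup_customer(records: Iterable[Record]) -> Iterator[Record]:
    """Hypothetical lookup that emits nothing when there is no match."""
    customers = {"C12345": {"name": "Jane Doe"}}
    for record in records:
        match = customers.get(record["customerId"])
        if match is not None:
            yield match


orders = [{"customerId": "C12345"}, {"customerId": "C99999"}]
results = list(chain_array(orders, lookup_customer, "customerResults"))
```

Both orders survive the lookup: the first carries one customer in customerResults and the second carries an empty array.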

Validation Pattern

ETL provides dedicated validation steps that accumulate issues into a validationIssues array, which can be inspected after all checks have run.

Key Steps

  • validation.MandatoryFields — checks that a list of fields are all present on the record. Any missing fields are added as a validationIssue automatically.
  • validation.AddValidationIssue — conditionally appends a message to the validationIssues array when a boolean field is false. The record passes through unchanged when the field is true.
  • validation.IsValid — writes true to a result field if validationIssues is empty, false otherwise. Use this after all checks to branch on the overall result.

How the Pattern Works

1. validation.MandatoryFields
   Fields: employeeId, leaveType, startDate, endDate
   (adds issues for any missing fields)

2. filter.Compare: startDate < endDate → field: datesValid
3. validation.AddValidationIssue
   IsValid Field: datesValid
   Message: "End date must be after start date"

4. filter.Compare: days > 0 → field: daysValid
5. validation.AddValidationIssue
   IsValid Field: daysValid
   Message: "Leave days must be greater than zero"

6. validation.IsValid
   Result Field: isValid

7. structure.ChainIfElse
   Boolean Field: isValid
   True Chain: ProcessValidRecord
   False Chain: ReturnValidationErrors

After all validation.AddValidationIssue steps have run, validationIssues contains one entry per failed check. validation.IsValid then summarises whether any issues were found.
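The accumulation logic can be sketched in a few lines of Python. The helper names mirror the steps above but are hypothetical, and the wording of the missing-field message is an assumption; the real steps may format issues differently.

```python
from typing import Any, Dict, List

Record = Dict[str, Any]


def mandatory_fields(record: Record, fields: List[str]) -> Record:
    """Add one issue for every listed field that is absent from the record."""
    issues = list(record.get("validationIssues", []))
    issues += [f"Missing mandatory field: {name}" for name in fields if name not in record]
    return {**record, "validationIssues": issues}


def add_validation_issue(record: Record, is_valid_field: str, message: str) -> Record:
    """Append the message when the boolean field is false (or, here, missing)."""
    if record.get(is_valid_field):
        return record
    return {**record, "validationIssues": record.get("validationIssues", []) + [message]}


def is_valid(record: Record, result_field: str) -> Record:
    """Write True to the result field only if no issues were collected."""
    return {**record, result_field: not record.get("validationIssues")}


request = {
    "employeeId": "E1", "leaveType": "annual",
    "startDate": "2026-03-10", "endDate": "2026-03-05", "days": 0,
}
r = mandatory_fields(request, ["employeeId", "leaveType", "startDate", "endDate"])
r = {**r, "datesValid": r["startDate"] < r["endDate"]}  # ISO dates sort as strings
r = add_validation_issue(r, "datesValid", "End date must be after start date")
r = {**r, "daysValid": r["days"] > 0}
r = add_validation_issue(r, "daysValid", "Leave days must be greater than zero")
r = is_valid(r, "isValid")
```

Because every check runs before the result is inspected, the caller receives all problems at once instead of only the first.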

Validation as Typed Functions

For reusable validation logic, validation checks are typically placed in dedicated typed-function chains (prefixed Fn_validate*). Each function takes the fields to check as inputs and returns a validationIssues array. The calling chain merges the results and checks the final issue count.

Designing for Errors

Best practices for robust chains:

  • Validate early - Check data quality at the start
  • Use validation steps - Explicit validation with clear error messages
  • Filter invalid records - Route bad records to error handling
  • Log errors - Write error records to error collection
  • Use ChainArray for optional lookups - Wrap sub-chain calls where no results is a valid outcome
  • Graceful degradation - Continue processing valid records when possible

Job State

Each job has a key/value store — the job state — that persists for the duration of that job. Two steps access it:

  • config.WriteState — writes a field value (or the entire record if no field is named) to a named key. The input record passes through unchanged.
  • config.ReadState — reads a value back from the named key and places it into a field on the current record (or merges it over the record if no field is named).

State is job-scoped: two users running the same chain at the same time each have independent state.

Save-Branch-Restore Pattern

The primary use case is preserving the input record across a 1 => N step so it can be recovered after the N => 1 aggregation:

1. config.WriteState
   Field: (empty — saves entire record)
   Key: originalInput

2. array.Unwind
   Field: items           ← 1 => N: one record becomes many

3. (process each item record)

4. array.Collect
   Field: processedItems  ← N => 1: many records collapse to one

5. config.ReadState
   Key: originalInput
   Field: (empty — merges saved record back)

6. (final record has both processedItems and the original input fields)

Without the state steps, the original input fields would be lost after the 1 => N / N => 1 round-trip.
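The six steps can be traced in a compact Python sketch. A plain dictionary stands in for the job state, and the unwind and collect stages are simplified; none of this is the engine's actual code.

```python
from typing import Any, Dict

Record = Dict[str, Any]


def process_order(order: Record) -> Record:
    state: Dict[str, Any] = {}                  # job-scoped key/value store

    # 1. config.WriteState with no field named: save the entire record.
    state["originalInput"] = dict(order)

    # 2. array.Unwind (simplified): one record per element of items.
    item_records = [dict(item) for item in order["items"]]

    # 3. Process each item record.
    processed = [
        {**item, "lineTotal": item["quantity"] * item["price"]}
        for item in item_records
    ]

    # 4. array.Collect: many records collapse into one array field.
    current: Record = {"processedItems": processed}

    # 5. config.ReadState with no field named: merge the saved record over
    #    the current one, restoring the original input fields.
    return {**current, **state["originalInput"]}


order = {
    "orderId": "ORD-001",
    "customerId": "C12345",
    "items": [{"sku": "WIDGET-A", "quantity": 2, "price": 25.0}],
}
result = process_order(order)
```

After step 4 the record holds only processedItems; the merge in step 5 is what brings orderId and customerId back.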

Notes

  • Use state sparingly — it is effectively a global variable for the job.
  • Avoid writing state inside a flow of multiple records; the value changes with each record and other steps running in parallel may observe an inconsistent value.
  • State values can be any type: string, integer, array, or object.

Disabling Steps

Individual steps can be disabled: they are retained in the chain with their configuration intact but are skipped during execution. The step is shown greyed out in the designer.

How to Disable / Enable a Step

  1. Select the step in the ETL Designer
  2. Click “More Actions” → “Disable Step” (or “Enable Step” to re-enable)

When to Use Step Disable

  • Diagnostic steps in production — Add a mongodb.TeeWriter to capture intermediate records while debugging, then disable it when done. The step stays in the chain ready to be re-enabled next time without reconfiguring it.
  • Temporary bypass during debugging — Disable a slow or problematic step to test the rest of the chain without removing its configuration.
  • Alternative implementations — Keep two alternative versions of a step in the chain with one disabled, making it easy to switch between them.

Disabling a step is always preferable to deleting it when you may need it again.

Debug Logging

Debug logging provides detailed information about step execution.

Enabling Debug Logging

  1. Select a step in the ETL Designer
  2. Click “More Actions” → “Debug Logging On”
  3. Step icon changes to indicate debug mode
  4. Run the chain
  5. View detailed logs in Results panel

What Debug Logging Shows

  • Input record to the step
  • Step parameters and configuration
  • Processing details
  • Output record from the step
  • Timing information
  • Any warnings or errors

Use debug logging to:

  • Understand how data transforms through steps
  • Diagnose unexpected behavior
  • Verify step configuration
  • Optimize performance

Groups

Groups organize chains within a chainset for better management.

Using Groups

  • Organize by purpose: “Import”, “Export”, “Validation”, “Reports”
  • Organize by domain: “Customers”, “Orders”, “Products”, “Inventory”
  • Organize by frequency: “Hourly”, “Daily”, “Weekly”, “On-Demand”

Group Features

  • Filter chain list to show only one group
  • Assign chains to groups via “Set Group” action
  • Each chain belongs to at most one group
  • Groups are purely organizational (no execution impact)

Next Steps