Core Concepts

Chainset, Chain, and Step Hierarchy

Chainset

A chainset is a container that holds related chains. Think of it as a module or package that groups together chains serving a common purpose.

Chainset Properties:

Name - Unique identifier
Database - Default database for chains (optional)
Enabled/Disabled - Controls availability
Roles - Who can access the chainset
Workgroups - Organizational grouping
Imports - Other chainsets to import chains from

Example: A “CustomerData” chainset might contain chains for importing customers, validating customer records, and generating customer reports.

Chain

A chain is a sequence of steps that transforms data. Chains are the executable units within a chainset.

Chain Properties:

Name - Unique within the chainset
Description - Documentation of purpose
Steps - Ordered sequence of transformation operations
Group - Optional organizational grouping within chainset
Test Inputs - Sample data for testing

Example: A “ValidateCustomer” chain might have steps to check required fields, validate email format, verify phone number, and flag duplicates.

Step

A step is an individual transformation operation. Steps are the building blocks of chains.

Step Properties:

Type - The operation to perform (e.g., MongoDB Reader, String Upper, Array Filter)
Name - Custom label for this step instance
Parameters - Configuration values specific to the step type
Enabled/Disabled - Controls whether step executes
Debug Logging - Enhanced logging for troubleshooting

Example: A “MongoDB Reader” step with parameters specifying the database, collection, and query.
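The containment described in this section can be pictured as a small data model. The following Python sketch is illustrative only: the class and attribute names are assumptions chosen to mirror the property lists above, not the product's actual schema or API.

```python
from dataclasses import dataclass, field
from typing import Any, Optional


@dataclass
class Step:
    type: str                                   # operation to perform, e.g. "MongoDB Reader"
    name: str = ""                              # custom label for this step instance
    parameters: dict[str, Any] = field(default_factory=dict)
    enabled: bool = True
    debug_logging: bool = False


@dataclass
class Chain:
    name: str                                   # unique within its chainset
    description: str = ""
    steps: list[Step] = field(default_factory=list)   # ordered sequence
    group: Optional[str] = None
    test_inputs: list[dict[str, Any]] = field(default_factory=list)


@dataclass
class Chainset:
    name: str
    database: Optional[str] = None              # optional default database
    enabled: bool = True
    roles: list[str] = field(default_factory=list)
    workgroups: list[str] = field(default_factory=list)
    imports: list[str] = field(default_factory=list)
    chains: dict[str, Chain] = field(default_factory=dict)

    def add_chain(self, chain: Chain) -> None:
        # Chain names must be unique within the chainset.
        if chain.name in self.chains:
            raise ValueError(f"duplicate chain name: {chain.name}")
        self.chains[chain.name] = chain


customer_data = Chainset(name="CustomerData")
customer_data.add_chain(
    Chain(
        name="ValidateCustomer",
        steps=[Step(type="MongoDB Reader", parameters={"collection": "customers"})],
    )
)
```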

Records and Data Flow

What is a Record?

A record is the unit of data flowing through an ETL chain. Records are typically JSON documents with fields and values.

Example Record:

{
  "customerId": "C12345",
  "name": "Jane Doe",
  "email": "jane.doe@example.com",
  "orderTotal": 150.00,
  "orderDate": "2026-03-01"
}

Streaming Execution Model

ETL chains use a reactive streaming model:

Source step generates records (e.g., reading from database)
Records stream through subsequent steps one at a time
Each step transforms the record and passes it forward
Backpressure prevents memory overflow with large datasets
Terminal step consumes records (e.g., writing to database)

This model allows processing millions of records efficiently without loading everything into memory.
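As a rough analogy for the streaming model, Python generators pass one record at a time through a pipeline without materialising the whole dataset. This is a minimal sketch, not the engine's implementation; the function names and record fields are hypothetical.

```python
from typing import Any, Iterable, Iterator

Record = dict[str, Any]


def source(count: int) -> Iterator[Record]:
    """Source step: yields records lazily instead of building a list."""
    for i in range(count):
        yield {"customerId": f"C{i}", "orderTotal": 10.0 * i}


def add_tax(records: Iterable[Record], rate: float) -> Iterator[Record]:
    """Transform step: handles one record, then passes it forward."""
    for record in records:
        yield {**record, "tax": record["orderTotal"] * rate}


def sink(records: Iterable[Record]) -> int:
    """Terminal step: consumes the stream and reports how many records it saw."""
    count = 0
    for _ in records:
        count += 1
    return count


# Only one record is in flight at any moment, regardless of the total count.
processed = sink(add_tax(source(100_000), rate=0.5))
```

Generators are pull-based, so the source only produces the next record when the sink asks for it. That loosely mirrors the effect of backpressure, although a real reactive engine is likely to implement it differently.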

Step Structure Patterns

Steps have different structure patterns that define how they transform the record stream:

Pattern	Description	Example
1 => 1	One record in, one record out	String Upper, integer.Add, Date Format
1 => N	One record in, multiple records out	array.Unwind, String Split, MongoDB Reader
N => 1	Multiple records in, one record out	Array Collect, String Merge, Aggregation
N => M	Multiple records in, multiple records out	Filter, Distinct, Sort
1 => 0	One record in, no records out (terminal)	MongoDB Writer, Send Mail, File Write

Understanding these patterns is crucial for designing efficient chains.
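To make the patterns in the table concrete, the sketch below gives one hypothetical Python function per pattern. The functions are stand-ins written for illustration; they are not the real step implementations.

```python
from typing import Any, Callable, Iterable, Iterator

Record = dict[str, Any]


def upper_name(records: Iterable[Record]) -> Iterator[Record]:
    """1 => 1: each input record yields exactly one output record."""
    for r in records:
        yield {**r, "name": r["name"].upper()}


def unwind(records: Iterable[Record], field: str) -> Iterator[Record]:
    """1 => N: one output record per element of the array field."""
    for r in records:
        for item in r[field]:
            yield {**r, field: item}


def keep_if(records: Iterable[Record], predicate: Callable[[Record], bool]) -> Iterator[Record]:
    """N => M: some records pass, others are dropped."""
    for r in records:
        if predicate(r):
            yield r


def collect(records: Iterable[Record], field: str) -> Iterator[Record]:
    """N => 1: the whole stream collapses into a single record."""
    yield {field: list(records)}


def write(records: Iterable[Record], target: list) -> None:
    """1 => 0 (terminal): consumes records and emits nothing."""
    for r in records:
        target.append(r)


orders = [{"name": "jane", "items": ["a", "b"]}, {"name": "sam", "items": ["c"]}]
saved: list = []

stream = upper_name(orders)                            # 2 records
stream = unwind(stream, "items")                       # 3 records
stream = keep_if(stream, lambda r: r["items"] != "b")  # 2 records
stream = collect(stream, "rows")                       # 1 record
write(stream, saved)                                   # nothing flows further
```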

Chain Calls and Composition

Chain Call Step

The Chain Call step allows one chain to call another chain, enabling composition and reusability.

How it works:

1. The calling chain passes a record to the Chain Call step
2. The Chain Call step invokes the named chain with that record as input
3. The called chain executes all its steps
4. Results from the called chain continue in the calling chain

Example:

Main Chain:
  Step 1: MongoDB Reader (reads orders)
  Step 2: Chain Call "EnrichOrder" (adds customer details)
  Step 3: Chain Call "CalculateTotals" (computes totals)
  Step 4: MongoDB Writer (saves enriched orders)

Benefits of Chain Calls

Reusability - Write logic once, use in multiple chains
Maintainability - Update logic in one place
Testability - Test sub-chains independently
Readability - Break complex logic into named, understandable units
Composition - Build complex workflows from simple building blocks
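The Chain Call mechanism can be sketched in Python by treating a chain as a list of step functions and a chain call as a step that looks up another chain by name. The registry, the helper names, and the two sub-chains below are assumptions made for illustration.

```python
from typing import Any, Callable, Iterable, Iterator

Record = dict[str, Any]
Step = Callable[[Iterable[Record]], Iterator[Record]]


def run_chain(steps: list[Step], records: Iterable[Record]) -> Iterator[Record]:
    """Feed the record stream through each step in order."""
    stream: Iterable[Record] = records
    for step in steps:
        stream = step(stream)
    return iter(stream)


def chain_call(registry: dict[str, list[Step]], name: str) -> Step:
    """Build a step that invokes the named chain once per incoming record."""
    def step(records: Iterable[Record]) -> Iterator[Record]:
        for record in records:
            # Results of the called chain continue in the calling chain.
            yield from run_chain(registry[name], [record])
    return step


def add_customer(records: Iterable[Record]) -> Iterator[Record]:
    for r in records:
        yield {**r, "customerName": "Jane Doe"}  # hypothetical lookup result


def add_total(records: Iterable[Record]) -> Iterator[Record]:
    for r in records:
        yield {**r, "total": r["quantity"] * r["price"]}


registry = {"EnrichOrder": [add_customer], "CalculateTotals": [add_total]}
main_chain = [chain_call(registry, "EnrichOrder"), chain_call(registry, "CalculateTotals")]

result = list(run_chain(main_chain, [{"orderId": "ORD-001", "quantity": 2, "price": 25.0}]))
```

Because each sub-chain is an ordinary entry in the registry, it can also be run on its own with a sample record, which is what makes independent testing straightforward.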

Typed Functions (FunctionSignature / FunctionCall)

Typed functions are the recommended pattern for reusable chains. They give a chain an explicit contract — named input parameters and named output fields — so callers don’t need to know the chain’s internal field names.

Two steps work together:

structure.FunctionSignature — placed as the first step of the function chain, declares the parameter and return names (up to five each). It is a marker step that does nothing at runtime; it exists to document the contract and to populate the call dialog.
structure.FunctionCall — calls a function chain with explicit field mapping. Specified source fields are passed in as in0…in4; result fields are written back as out0…out4. The original record is preserved with the return values merged in.

Example:

Function chain: "Fn_lookupEmployee"
  Step 1: structure.FunctionSignature
    Parameters: employeeId
    Returns: employeeName, department

  Step 2: mongodb.AggregationDefinition / string.Substitute / mongodb.Reader
    (look up employee by in0 = employeeId)

  Step 3: structure.AddFields
    { "employeeName": "${name}", "department": "${dept}" }

Calling chain:
  Step 1: MongoDB Reader (reads orders)
  Step 2: structure.FunctionCall
    Function Name: Fn_lookupEmployee
    In: employeeId → in0
    Out: out0 → employeeName, out1 → department
  Step 3: MongoDB Writer (saves enriched orders)
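The field mapping performed by structure.FunctionCall can be approximated in Python as follows. This is a simplified sketch: the function chain is modelled as a plain callable that receives in0, in1, ... and returns out0, out1, ..., and the employee data is invented for the example.

```python
from typing import Any, Callable

Record = dict[str, Any]

MAX_SLOTS = 5  # in0..in4 and out0..out4


def function_call(
    record: Record,
    function: Callable[[Record], Record],
    inputs: list[str],
    outputs: list[str],
) -> Record:
    """Pass the named source fields as in0.., then merge out0.. back under the given names."""
    if len(inputs) > MAX_SLOTS or len(outputs) > MAX_SLOTS:
        raise ValueError("at most five parameters and five return values")
    args = {f"in{i}": record.get(name) for i, name in enumerate(inputs)}
    returned = function(args)
    merged = dict(record)  # the original record is preserved
    for i, name in enumerate(outputs):
        merged[name] = returned.get(f"out{i}")
    return merged


EMPLOYEES = {"E1": {"name": "Jane Doe", "dept": "Finance"}}  # hypothetical data


def fn_lookup_employee(args: Record) -> Record:
    """Stand-in for the Fn_lookupEmployee chain."""
    employee = EMPLOYEES.get(args["in0"], {})
    return {"out0": employee.get("name"), "out1": employee.get("dept")}


order = {"orderId": "ORD-001", "employeeId": "E1"}
enriched = function_call(
    order,
    fn_lookup_employee,
    inputs=["employeeId"],
    outputs=["employeeName", "department"],
)
```

The caller never sees the function's internal field names (name, dept); it only chooses which of its own fields go in and what the returned values are called.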

When to use a typed function vs a plain Chain Call:

Use `structure.FunctionCall`	Use `structure.ChainCall`
Chain is reused from multiple call sites	Chain is used exactly once
Caller needs to pass specific fields	Full record context is fine
Output fields need renaming at the call site	Caller consumes output fields directly
You want an explicit, documented contract	Internal implementation detail

Naming convention: typed function chains are prefixed with Fn_ (e.g. Fn_validateEmail, Fn_lookupEmployee).

Calling imported functions: prefix the chain name with the imported chainset name in brackets:

Function Name: [CommonUtilities]Fn_formatAddress

Conditional Execution (ChainIf / ChainIfElse)

Two steps handle conditional branching based on a boolean field in the current record:

structure.ChainIf - calls a named chain only when a boolean field matches the configured Call If value (true or false). If the condition is not met, the original record passes through unchanged.
structure.ChainIfElse - calls one of two named chains depending on whether a boolean field is true or false. The original record is always routed to exactly one of the two chains.

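Both steps are 1 => 0..N: for each input record that is routed to a chain, the output is whatever the called chain emits (zero or more records). The one exception is the ChainIf pass-through case described above, which emits the input record unchanged.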

Typical pattern:

1. Compare / filter step that sets a boolean field
   e.g. filter.Compare: "amount > 1000" → field: isHighValue

2. structure.ChainIf
   Boolean Field: isHighValue
   Call If: true
   Chain Name: ProcessHighValueOrder
   (records where isHighValue=false pass through unchanged)

Or with two branches:

1. filter.Compare: "customerType == 'premium'" → field: isPremium

2. structure.ChainIfElse
   Boolean Field: isPremium
   True Chain: ProcessPremiumOrder
   False Chain: ProcessStandardOrder

Use structure.ChainIf / structure.ChainIfElse in preference to duplicate filter chains when you need to route records to different processing logic. The boolean field is typically set by a preceding compare or calculation step.
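The routing behaviour of the two conditional steps can be sketched in Python as below. The helper names and sample chains are hypothetical, and the sketch assumes a missing boolean field is treated as false.

```python
from typing import Any, Callable, Iterable, Iterator

Record = dict[str, Any]
Chain = Callable[[Iterable[Record]], Iterator[Record]]


def chain_if(records: Iterable[Record], field: str, call_if: bool, chain: Chain) -> Iterator[Record]:
    """Call the chain when the field equals call_if; otherwise pass the record through."""
    for record in records:
        if bool(record.get(field)) == call_if:
            yield from chain([record])
        else:
            yield record


def chain_if_else(records: Iterable[Record], field: str, true_chain: Chain, false_chain: Chain) -> Iterator[Record]:
    """Route every record to exactly one of the two chains."""
    for record in records:
        branch = true_chain if record.get(field) else false_chain
        yield from branch([record])


def compare_amount(records: Iterable[Record]) -> Iterator[Record]:
    """Stand-in for a compare step that sets the boolean field."""
    for r in records:
        yield {**r, "isHighValue": r["amount"] > 1000}


def process_high_value(records: Iterable[Record]) -> Iterator[Record]:
    for r in records:
        yield {**r, "review": "manual"}


def process_standard(records: Iterable[Record]) -> Iterator[Record]:
    for r in records:
        yield {**r, "review": "auto"}


orders = [{"amount": 50}, {"amount": 5000}]
routed = list(chain_if(compare_amount(orders), "isHighValue", True, process_high_value))
branched = list(chain_if_else(compare_amount(orders), "isHighValue", process_high_value, process_standard))
```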

Chain Imports

Chainsets can import other chainsets to access their chains:

Example:

“CommonUtilities” chainset contains “FormatAddress”, “ValidateEmail”, “CalculateTax”
“OrderProcessing” chainset imports “CommonUtilities”
Chains in “OrderProcessing” can call “FormatAddress”, “ValidateEmail”, “CalculateTax”

Important: Avoid circular dependencies (A imports B, B imports C, C imports A).
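A circular import chain can be detected with a depth-first search over the import graph. The sketch below is a generic cycle check written for illustration; whether the product performs such a check itself is not stated here, and the function name is hypothetical.

```python
from typing import Optional


def find_import_cycle(imports: dict[str, list[str]]) -> Optional[list[str]]:
    """Return one cycle as a list of chainset names, or None if there is none."""
    UNSEEN, IN_PROGRESS, DONE = 0, 1, 2
    state: dict[str, int] = {}
    path: list[str] = []

    def visit(name: str) -> Optional[list[str]]:
        state[name] = IN_PROGRESS
        path.append(name)
        for dep in imports.get(name, []):
            dep_state = state.get(dep, UNSEEN)
            if dep_state == IN_PROGRESS:
                # dep is already on the current path, so we have looped back to it.
                return path[path.index(dep):] + [dep]
            if dep_state == UNSEEN:
                cycle = visit(dep)
                if cycle:
                    return cycle
        path.pop()
        state[name] = DONE
        return None

    for name in imports:
        if state.get(name, UNSEEN) == UNSEEN:
            cycle = visit(name)
            if cycle:
                return cycle
    return None


circular = find_import_cycle({"A": ["B"], "B": ["C"], "C": ["A"]})
acyclic = find_import_cycle({"OrderProcessing": ["CommonUtilities"], "CommonUtilities": []})
```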

Test Inputs

Test inputs are sample data stored with a chain for testing purposes.

Why Use Test Inputs?

Documentation - Show expected input format
Testing - Verify chain behavior with known data
Debugging - Reproduce issues with specific inputs
Demonstration - Show how the chain works to others
Unit Testing - Create multiple test cases for different scenarios

Creating Test Inputs

In the ETL Designer:

1. Select a chain
2. Click the “Test Input” dropdown
3. Add a test with a name, description, and JSON data
4. Run the chain with the test input to verify behavior

Example Test Input:

{
  "name": "Valid Order Test",
  "description": "Tests order processing with valid data",
  "testJSON": {
    "orderId": "ORD-001",
    "customerId": "C12345",
    "items": [
      {"sku": "WIDGET-A", "quantity": 2, "price": 25.00}
    ]
  }
}

Job Queue and Execution

Job Queue

When an ETL chain is triggered (manually, by scheduler, or by another module), it enters a job queue.

Queue Behavior:

Multiple jobs can be queued for the same chainset
Jobs execute sequentially (one at a time per chainset)
Jobs from different chainsets can run concurrently
Long-running jobs don’t block other chainsets
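One way to obtain this queue behaviour is to give each chainset its own FIFO queue and a single worker thread. The Python sketch below illustrates that idea under stated assumptions; the class and method names are hypothetical and the real scheduler may be built quite differently.

```python
import queue
import threading
from typing import Callable


class JobQueues:
    """One FIFO queue and one worker thread per chainset."""

    def __init__(self) -> None:
        self._queues: dict[str, queue.Queue] = {}
        self._lock = threading.Lock()

    def submit(self, chainset: str, job: Callable[[], None]) -> None:
        with self._lock:
            if chainset not in self._queues:
                q: queue.Queue = queue.Queue()
                self._queues[chainset] = q
                threading.Thread(target=self._drain, args=(q,), daemon=True).start()
        self._queues[chainset].put(job)

    @staticmethod
    def _drain(q: queue.Queue) -> None:
        # A single worker per queue means jobs for one chainset never overlap.
        while True:
            job = q.get()
            try:
                job()
            except Exception:
                pass  # a real engine would mark the job as Failed here
            finally:
                q.task_done()

    def wait(self) -> None:
        """Block until every submitted job has finished."""
        for q in list(self._queues.values()):
            q.join()


log: list[str] = []
log_lock = threading.Lock()


def make_job(label: str) -> Callable[[], None]:
    def job() -> None:
        with log_lock:
            log.append(label)
    return job


queues = JobQueues()
for label in ["orders-1", "orders-2", "orders-3"]:
    queues.submit("OrderProcessing", make_job(label))
queues.submit("CustomerData", make_job("customers-1"))
queues.wait()
```

Jobs for OrderProcessing always finish in submission order, while the CustomerData job may be interleaved anywhere among them because it runs on a separate worker.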

Execution Lifecycle

Queued - Job waiting to execute
Running - Chain executing steps
Completed - Successful completion
Failed - Error occurred during execution
Cancelled - User cancelled the job

Monitoring Execution

In the ETL Designer Results panel:

View running and completed jobs
See execution time and record counts
Review step-by-step results
Access error messages and stack traces
Cancel running jobs if needed

Error Handling

How ETL Chains Signal Errors

ETL chains do not throw exceptions. Instead, errors are communicated through the record stream in one of two ways:

Error record: A step emits a record with an error field (e.g. { "error": "something went wrong" }). The record flows downstream and can be filtered, logged, or handled by subsequent steps.
No records emitted: When a step produces no output (e.g. a mongodb.Reader that matches nothing), downstream steps receive no input and the chain simply terminates with no records out. This is not an error — it is the normal result of a query that finds nothing.
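Both signalling styles can be illustrated in Python. In this hypothetical sketch a step reports a problem by emitting a record with an error field rather than raising, and a later step separates those records from the good ones; an empty input simply produces an empty output.

```python
from typing import Any, Iterable, Iterator

Record = dict[str, Any]


def parse_total(records: Iterable[Record]) -> Iterator[Record]:
    """Convert orderTotal to a number; on failure emit an error record instead of raising."""
    for r in records:
        try:
            yield {**r, "orderTotal": float(r["orderTotal"])}
        except (KeyError, TypeError, ValueError):
            yield {**r, "error": "orderTotal is missing or not numeric"}


def split_errors(records: Iterable[Record]) -> tuple[list[Record], list[Record]]:
    """Downstream handling: separate error records from good ones."""
    good: list[Record] = []
    bad: list[Record] = []
    for r in records:
        (bad if "error" in r else good).append(r)
    return good, bad


good, bad = split_errors(parse_total([{"orderTotal": "150.00"}, {"orderTotal": "abc"}, {}]))

# No records in means no records out; nothing fails.
empty_good, empty_bad = split_errors(parse_total([]))
```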

Preventing Chain Termination on Empty Results

When a sub-chain may legitimately produce no records and you want the main flow to continue, wrap the call in a structure.ChainArray (Chain to Array) step. This calls another chain for each incoming record and writes its results into a named array field — even if the called chain produces no records, the array is empty and the outer record still flows on:

1. structure.ChainArray
   Chain Name: "LookupCustomer"
   Array Field: customerResults
   (customerResults will be [] if no match — main flow continues either way)

2. Check customerResults length / process array
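The difference between calling a lookup chain directly and wrapping it in a Chain to Array step can be sketched in Python. The helper and the customer data are hypothetical; the point is only that the outer record survives an empty result.

```python
from typing import Any, Callable, Iterable, Iterator

Record = dict[str, Any]
Chain = Callable[[Iterable[Record]], Iterator[Record]]

CUSTOMERS = {"C12345": {"name": "Jane Doe"}}  # hypothetical lookup data


def lookup_customer(records: Iterable[Record]) -> Iterator[Record]:
    """Emits the matching customer, or nothing at all when there is no match."""
    for r in records:
        match = CUSTOMERS.get(r["customerId"])
        if match is not None:
            yield match


def chain_array(records: Iterable[Record], chain: Chain, array_field: str) -> Iterator[Record]:
    """Run the chain per record and store its results in an array field."""
    for record in records:
        yield {**record, array_field: list(chain([record]))}


orders = [{"customerId": "C12345"}, {"customerId": "C99999"}]

direct = list(lookup_customer(orders))  # the unmatched order disappears
wrapped = list(chain_array(orders, lookup_customer, "customerResults"))
```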

Validation Pattern

ETL provides dedicated validation steps that accumulate issues into a validationIssues array, which can be inspected after all checks have run.

Key Steps

validation.MandatoryFields — checks that a list of fields are all present on the record. Any missing fields are added as a validationIssue automatically.
validation.AddValidationIssue — conditionally appends a message to the validationIssues array when a boolean field is false. The record passes through unchanged when the field is true.
validation.IsValid — writes true to a result field if validationIssues is empty, false otherwise. Use this after all checks to branch on the overall result.

How the Pattern Works

1. validation.MandatoryFields
   Fields: employeeId, leaveType, startDate, endDate
   (adds issues for any missing fields)

2. filter.Compare: startDate < endDate → field: datesValid
3. validation.AddValidationIssue
   IsValid Field: datesValid
   Message: "End date must be after start date"

4. filter.Compare: days > 0 → field: daysValid
5. validation.AddValidationIssue
   IsValid Field: daysValid
   Message: "Leave days must be greater than zero"

6. validation.IsValid
   Result Field: isValid

7. structure.ChainIfElse
   Boolean Field: isValid
   True Chain: ProcessValidRecord
   False Chain: ReturnValidationErrors

After all validation.AddValidationIssue steps have run, validationIssues contains one entry per failed check. validation.IsValid then summarises whether any issues were found.
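The accumulation of issues can be traced with a small Python sketch. The three helper functions are hypothetical stand-ins for the validation steps, and the wording of the missing-field message is an assumption.

```python
from typing import Any

Record = dict[str, Any]


def mandatory_fields(record: Record, fields: list[str]) -> Record:
    """Add one issue per field that is absent from the record."""
    issues = list(record.get("validationIssues", []))
    for name in fields:
        if name not in record:
            issues.append(f"Missing mandatory field: {name}")  # message text is assumed
    return {**record, "validationIssues": issues}


def add_validation_issue(record: Record, is_valid_field: str, message: str) -> Record:
    """Append the message when the boolean field is false; otherwise pass through."""
    if record.get(is_valid_field):
        return record
    issues = list(record.get("validationIssues", [])) + [message]
    return {**record, "validationIssues": issues}


def is_valid(record: Record, result_field: str) -> Record:
    """Write true to the result field only when no issues were collected."""
    return {**record, result_field: not record.get("validationIssues")}


request = {
    "employeeId": "E1",
    "leaveType": "annual",
    "startDate": "2026-03-10",
    "endDate": "2026-03-01",
    "days": 0,
}

r = mandatory_fields(request, ["employeeId", "leaveType", "startDate", "endDate"])
r = {**r, "datesValid": r["startDate"] < r["endDate"]}  # ISO dates compare correctly as strings
r = add_validation_issue(r, "datesValid", "End date must be after start date")
r = {**r, "daysValid": r["days"] > 0}
r = add_validation_issue(r, "daysValid", "Leave days must be greater than zero")
r = is_valid(r, "isValid")
```

Every check runs regardless of earlier failures, so the caller receives the complete list of problems in one pass rather than only the first one.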

Validation as Typed Functions

For reusable validation logic, validation checks are typically placed in dedicated typed-function chains (prefixed Fn_validate*). Each function takes the fields to check as inputs and returns a validationIssues array. The calling chain merges the results and checks the final issue count.

Designing for Errors

Best practices for robust chains:

Validate early - Check data quality at the start
Use validation steps - Explicit validation with clear error messages
Filter invalid records - Route bad records to error handling
Log errors - Write error records to error collection
Use ChainArray for optional lookups - Wrap sub-chain calls where no results is a valid outcome
Graceful degradation - Continue processing valid records when possible

Job State

Each job has a key/value store — the job state — that persists for the duration of that job. Two steps access it:

config.WriteState — writes a field value (or the entire record if no field is named) to a named key. The input record passes through unchanged.
config.ReadState — reads a value back from the named key and places it into a field on the current record (or merges it over the record if no field is named).

State is job-scoped: two users running the same chain at the same time each have independent state.

Save-Branch-Restore Pattern

The primary use case is preserving the input record across a 1 => N step so it can be recovered after the N => 1 aggregation:

1. config.WriteState
   Field: (empty — saves entire record)
   Key: originalInput

2. array.Unwind
   Field: items           ← 1 => N: one record becomes many

3. (process each item record)

4. array.Collect
   Field: processedItems  ← N => 1: many records collapse to one

5. config.ReadState
   Key: originalInput
   Field: (empty — merges saved record back)

6. (final record has both processedItems and the original input fields)

Without the state steps, the original input fields would be lost after the 1 => N / N => 1 round-trip.
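The round-trip can be followed in a few lines of Python. This sketch uses a plain dictionary as the job state and hypothetical helpers for the unwind and collect steps; it assumes that a read with no field named lets the saved values overwrite same-named fields on the current record.

```python
from typing import Any

Record = dict[str, Any]


def unwind(record: Record, field: str) -> list[Record]:
    """1 => N: one record per array element."""
    return [{**record, field: item} for item in record[field]]


def collect(records: list[Record], field: str) -> Record:
    """N => 1: the result holds only the collected array."""
    return {field: list(records)}


order = {
    "orderId": "ORD-001",
    "customerId": "C12345",
    "items": [{"sku": "WIDGET-A", "quantity": 2, "price": 25.0}],
}

job_state: dict[str, Any] = {}

# 1. WriteState with no field named: save the entire record.
job_state["originalInput"] = dict(order)

# 2-3. Unwind, then process each item record.
processed = [
    {"sku": r["items"]["sku"], "lineTotal": r["items"]["quantity"] * r["items"]["price"]}
    for r in unwind(order, "items")
]

# 4. Collect: the original order fields are gone at this point.
collected = collect(processed, "processedItems")

# 5. ReadState with no field named: merge the saved record over the current one.
restored = {**collected, **job_state["originalInput"]}
```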

Notes

Use state sparingly — it is effectively a global variable for the job.
Avoid writing state inside a flow of multiple records; the value changes with each record and other steps running in parallel may observe an inconsistent value.
State values can be any type: string, integer, array, or object.

Disabling Steps

Individual steps can be disabled: they are retained in the chain with their configuration intact but are skipped during execution. The step is shown greyed out in the designer.

How to Disable / Enable a Step

Select the step in the ETL Designer
Click “More Actions” → “Disable Step” (or “Enable Step” to re-enable)

When to Use Step Disable

Diagnostic steps in production — Add a mongodb.TeeWriter to capture intermediate records while debugging, then disable it when done. The step stays in the chain ready to be re-enabled next time without reconfiguring it.
Temporary bypass during debugging — Disable a slow or problematic step to test the rest of the chain without removing its configuration.
Alternative implementations — Keep two alternative versions of a step in the chain with one disabled, making it easy to switch between them.

Disabling a step is always preferable to deleting it when you may need it again.

Debug Logging

Debug logging provides detailed information about step execution.

Enabling Debug Logging

Select a step in the ETL Designer
Click “More Actions” → “Debug Logging On”
Step icon changes to indicate debug mode
Run the chain
View detailed logs in Results panel

What Debug Logging Shows

Input record to the step
Step parameters and configuration
Processing details
Output record from the step
Timing information
Any warnings or errors

Use debug logging to:

Understand how data transforms through steps
Diagnose unexpected behavior
Verify step configuration
Optimize performance

Groups

Groups organize chains within a chainset for better management.

Using Groups

Organize by purpose: “Import”, “Export”, “Validation”, “Reports”
Organize by domain: “Customers”, “Orders”, “Products”, “Inventory”
Organize by frequency: “Hourly”, “Daily”, “Weekly”, “On-Demand”

Group Features

Filter chain list to show only one group
Assign chains to groups via “Set Group” action
Each chain belongs to at most one group
Groups are purely organizational (no execution impact)

Next Steps

Quick Start Guide - Create your first ETL chain
ETL Designer - Learn the designer interface
Best Practices - Design effective ETL chains
Examples - See complete working examples
