Frequently Asked Questions

General Questions

What is ETL?

ETL stands for Extract, Transform, Load. It’s a data processing pattern for: - Extracting data from various sources (databases, files, APIs) - Transforming data (cleaning, enriching, calculating, formatting) - Loading data to destinations (databases, files, systems)

The Ambience ETL module provides a visual, composable framework for building ETL pipelines.

When should I use ETL vs Workflows?

Use ETL when: - Multi-step data transformation required - Integrating multiple data sources - Processing large datasets - Reusable transformation logic needed - Scheduled or automated data operations

Use Workflows when: - State-dependent business processes - Human interaction required - Long-running processes with approvals - Audit trail is critical - Process may pause and resume

Use Both Together: - Workflows for process orchestration - ETL chains for data transformation within workflows

See Introduction for detailed comparison.

How many steps are available?

Over 800 steps across 70+ categories including: - Array, String, Number operations - MongoDB, JDBC database operations - File, JSON, XML processing - REST API calls, Email - Date/Time, Validation - And many more

Browse the Dictionary in ETL Designer to explore all available steps.

What is a chainset vs a chain vs a step?

  • Chainset - Container for related chains (like a module or package)
  • Chain - Sequence of steps that transforms data (like a function)
  • Step - Individual transformation operation (like a statement)

See Core Concepts for detailed explanations.

Design Questions

How do I choose which steps to use?

  1. Identify your goal - What transformation do you need?
  2. Browse categories - Open Dictionary, find relevant category
  3. Read step help - Understand what each step does
  4. Test incrementally - Add one step, test, add next

Common starting points: - Read data: MongoDB Reader, JDBC Reader, File Reader, JSON Record - Transform: String operations, Number operations, structure.AddFields - Write data: MongoDB Writer, JDBC Writer, File Writer, Send Mail

How do I handle errors in ETL chains?

Validation Pattern:

1. Read data
2. validation.MandatoryFields — check required fields exist
3. filter.Compare → validation.AddValidationIssue (repeat per check)
4. validation.IsValid → result field: isValid
5. structure.ChainIfElse
     True Chain: ProcessValidRecord
     False Chain: HandleErrors (surface validationIssues array)

validation.AddValidationIssue appends a message to the validationIssues array when its boolean field is false. Running all checks before branching surfaces every issue at once.

Error Logging Pattern:

1. Try operation
2. On error:
   - Add error message to record
   - Write to error collection
   - Optionally send alert

See Best Practices for detailed strategies.

How do I test ETL chains?

  1. Create test inputs - Sample data for different scenarios
  2. Test incrementally - Add one step, run, verify, add next
  3. Use debug logging - See exact input/output for each step
  4. Test edge cases - Empty data, invalid data, large datasets
  5. Test in context - With actual databases, APIs, integrations

See Best Practices for comprehensive testing approaches.

How do I make my chains reusable?

Use Typed Functions (recommended for shared logic):

For chains called from multiple places, use structure.FunctionSignature + structure.FunctionCall. This gives your function an explicit contract with named inputs and outputs, so callers don’t depend on implicit field names.

Function chain: "Fn_validateEmail"
  Step 1: structure.FunctionSignature
    Parameters: emailAddress
    Returns: isValid, validationMessage

  Step 2: (validation logic)

  Step 3: structure.AddFields
    { "isValid": true/false, "validationMessage": "..." }

Calling chain:
  Step N: structure.FunctionCall
    Function Name: Fn_validateEmail
    In: customerEmail → in0
    Out: out0 → emailIsValid, out1 → emailMessage

Name function chains with an Fn prefix by convention.

Use plain Chain Calls for one-off sub-chains:

Main Chain:
  1. Read data
  2. structure.ChainCall "ValidateOrder"
  3. structure.ChainCall "EnrichOrderData"
  4. Process data

Use Chain Imports for cross-chainset reuse:

Create CommonUtilities chainset:
  - Fn_formatAddress
  - Fn_validateEmail
  - Fn_calculateTax

Import in other chainsets:
  - Call imported functions as [CommonUtilities]Fn_formatAddress

See Best Practices.

How do I optimize slow chains?

Common optimizations: 1. Filter early - Reduce data volume as soon as possible 2. Optimize queries - Use specific database queries, not “read all then filter” 3. Create indexes - On frequently queried database fields 4. Minimize expansion - Avoid 1 => N steps early in chain 5. Remove unnecessary steps - Each step adds overhead

Debugging performance: - Enable debug logging to see step execution times - Check Results panel for total execution time - Identify bottleneck steps - Optimize or replace slow steps

See Best Practices.

Execution Questions

What happens when a step fails?

Default behavior: - Chain execution stops immediately - Job marked as failed - Error message captured in Results panel - Records processed before failure are not rolled back

Handling failures: - Design chains to validate early - Use filter steps to route invalid records - Log errors to error collection - Consider retry logic for transient failures

Can I run multiple chains concurrently?

Within same chainset: No - Jobs for same chainset execute sequentially - Jobs queue if multiple triggered

Different chainsets: Yes - Different chainsets can run concurrently - No interference between chainsets

Workaround for parallelism: - Split work across multiple chainsets - Each can run concurrently

How does the job queue work?

When a chain is triggered: 1. Job enters queue for that chainset 2. If no job running, starts immediately 3. If job running, waits in queue 4. Jobs execute in order received 5. Next job starts when previous completes

Implications: - Multiple users triggering same chain → jobs queue - Scheduler jobs queue with manual executions - Long-running jobs block subsequent jobs for that chainset

Can I cancel a running job?

Yes, in the ETL Designer Results panel: 1. Find the running job (status: “In Progress”) 2. Click “Cancel Job” button 3. Job stops, status changes to “Cancelled”

Note: Cancellation may not be immediate if step is in middle of operation.

Integration Questions

How do I schedule ETL chains?

Use the Scheduler module: 1. Create and test your ETL chain 2. In Scheduler, create new job 3. Select “ETL” type 4. Choose chainset and chain 5. Configure schedule (daily, hourly, cron expression) 6. Enable job

See Scheduler Integration for detailed guide.

How do I use ETL with Workflows?

ETL chains can be called from workflows at: - OnEntry - When entering a state - OnExit - When leaving a state - OnTransition - During state transition - Guard - To evaluate conditions

Workflow-specific ETL steps available: - workflow.CreateInstance - workflow.SendEvent - workflow.AddToElxPublic/Private - And more

See Workflow Integration for detailed guide.

How do I use ETL with Datasets/Dashboards?

Configure dataset to use ETL chain: 1. Create ETL chain that accepts parameters 2. Chain queries and transforms data 3. Returns results 4. In Dataset module, select “ETL” as data source 5. Choose chainset and chain 6. Define parameters

See Dataset Integration for detailed guide.

Can ETL chains call external APIs?

Yes, using REST API steps: - REST GET - Retrieve data from API - REST POST - Send data to API - REST PUT - Update data via API - REST DELETE - Delete via API

Example:

1. Prepare request data
2. REST POST
   URL: https://api.example.com/endpoint
   Headers: { "Authorization": "Bearer ${token}" }
   Body: ${requestData}
3. Parse response
4. Process results

Data Questions

What format is data in ETL chains?

Data flows as JSON records:

{
  "field1": "value1",
  "field2": 123,
  "field3": ["array", "values"],
  "field4": {
    "nested": "object"
  }
}

Each step receives a record, transforms it, and passes it forward.

How do I access field values in steps?

Use field references in step parameters: - fieldName - Direct field reference - ${fieldName} - Variable substitution in strings (via string.Substitute) - nested.field - Nested field access - array[0] - Array element access

Example: direct field reference

string.Upper
  Field: "customerName"
  To Field: "customerName"

Example: ${fieldName} substitution in a MongoDB aggregation pipeline

MongoDB reads use a three-step pattern. Define the aggregation pipeline as a string with ${placeholders}, substitute runtime values, then execute:

1. mongodb.AggregationDefinition
   Pool: your-pool-name
   Collection: customers
   Aggregation: [{"$match": {"customerId": "${customerId}", "date": "${reportDate}"}}]

2. string.Substitute
   Field: aggregation

3. mongodb.Reader

How do I handle large datasets?

ETL uses streaming execution: - Records processed one at a time - Low memory usage - Can handle millions of records - Automatic backpressure

Best practices: - Filter early to reduce volume - Use database queries to limit at source - Avoid collecting all records into arrays - Process incrementally when possible

Can I process multiple records at once?

Most steps process one record at a time (streaming).

Some steps aggregate multiple records: - Aggregation steps (sum, count, average) - Sort, Distinct - Array Collect - Merge

These steps buffer records in memory, so use carefully with large datasets.

Troubleshooting Questions

Why can’t I see my chainset?

Check: - Chainset is enabled (green checkmark) - You have appropriate role OR are owner - Workgroup filter not hiding it - You have mod-etl privilege

Why can’t I modify my chainset?

Only owners can modify chainsets.

Check: - You are the owner (created it) - Chainset not locked by another user in ETL Designer

Why does my chain fail with “field not found”?

Common causes: - Field name typo in step parameter - Field doesn’t exist in incoming record - Case sensitivity (field names are case-sensitive)

Debug: - Enable debug logging on previous step - Check exact field names in record - Verify spelling and case

Why is my chain slow?

Common causes: - Reading entire database then filtering - Multiple 1 => N expansions early - Inefficient database queries - Missing database indexes - Unnecessary transformations

Solutions: - Filter at database query level - Create appropriate indexes - Move filters earlier in chain - Remove unnecessary steps - See Performance Optimization

How do I debug a chain?

Steps: 1. Enable debug logging on suspected step 2. Run with test input - Use minimal test data 3. Check Results panel - View step-by-step output 4. Verify parameters - Check all step parameters correct 5. Test incrementally - Run to each step, verify output 6. Check logs - Look for warnings or errors

See Best Practices.

Advanced Questions

Can I use custom code in ETL chains?

ETL chains use predefined steps, not custom code. However: - Over 800 steps available - Steps cover most common operations - Complex logic can be built by composing steps - For truly custom needs, new steps can be developed (requires Scala development)

How do I version control my chainsets?

Recommended approach: 1. Export chainsets as JSON files 2. Store in version control system (Git, SVN, etc.) 3. Commit with meaningful messages 4. Tag releases 5. Document changes

Workflow: - Development: Create/modify in ETL Designer - Export: Download as JSON - Commit: Add to version control - Deploy: Import to other environments

Can I migrate chainsets between environments?

Yes: 1. Export from source environment (Download) 2. Review JSON file 3. Update database references if needed 4. Import to target environment (Upload) 5. Update roles/workgroups if needed 6. Test before enabling

See Configuration.

What happens to running jobs when I modify a chain?

Jobs use the chain definition at the time they started.

  • Running jobs continue with old definition
  • New jobs use new definition
  • No impact on running jobs

Best practice: Test changes before modifying production chains.

Can I call one chainset from another?

Yes, using chain imports: 1. Chainset A imports Chainset B 2. Chains in A can call chains in B using Chain Call steps 3. Changes to B affect all importers

Important: Avoid circular dependencies.

See Configuration.

Getting Help

Where can I find step documentation?

In ETL Designer: - Click “Dictionary” button (upper right) - Browse categories or search - Click step to see detailed help

In Add Step dialog: - Select step - View help panel on right - Shows parameters, input/output, examples

Where can I find more examples?

How do I report issues or request features?

Contact your Ambience administrator or support team.

Next Steps