ETL

Overview

The Ambience ETL module is a data processing system for building data transformation pipelines. Unlike simple scripts or one-off queries, ETL provides a visual, composable, and reusable framework for creating data workflows that integrate with other Ambience modules.

Key Capabilities:

  • Visual chain design with drag-and-drop interface
  • Streaming execution model for efficient processing
  • Over 800 steps across 70+ categories
  • Deep integration with Scheduler, Workflows, Datasets, and Dashboards
  • Reusable transformation logic via chain calls and imports
  • Complete testing framework with test inputs and debug logging

Why Use ETL?

ETL chains excel at data transformation and integration tasks such as:

  • Data migration and synchronization between systems
  • Automated report generation and distribution
  • Data validation and cleansing pipelines
  • API integration and orchestration
  • Log processing and monitoring
  • Scheduled maintenance and cleanup operations
  • Real-time data enrichment
  • Multi-source data aggregation

Benefits:

  • Visual Design - Build pipelines by composing steps and see the flow at a glance
  • Reusability - Write once, use in multiple contexts (scheduler, workflows, datasets)
  • Streaming Architecture - Process large datasets efficiently without memory issues
  • Integration - Connect databases, APIs, files, and cloud services
  • Testing - Use test inputs and debug logging to build reliable chains
  • Scalability - One chain definition can serve any number of executions
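The streaming idea behind these benefits can be illustrated outside Ambience with plain Python generators. The sketch below is not the ETL module's API: the step functions (`read_records`, `keep_active`, `add_total`) and the record fields are hypothetical names chosen for illustration. Each step pulls one record at a time from the previous step, so the whole dataset never has to be held in memory at once.

```python
from typing import Any, Dict, Iterable, Iterator

Record = Dict[str, Any]


def read_records(n: int) -> Iterator[Record]:
    """Hypothetical source step: yields records one at a time."""
    for i in range(n):
        yield {"id": i, "active": i % 2 == 0, "price": 10, "qty": i}


def keep_active(records: Iterable[Record]) -> Iterator[Record]:
    """Hypothetical filter step: passes through only active records."""
    for rec in records:
        if rec["active"]:
            yield rec


def add_total(records: Iterable[Record]) -> Iterator[Record]:
    """Hypothetical transform step: adds a computed field to each record."""
    for rec in records:
        yield {**rec, "total": rec["price"] * rec["qty"]}


def run_chain(n: int) -> Iterator[Record]:
    """Compose the steps; nothing executes until the result is iterated."""
    return add_total(keep_active(read_records(n)))
```

Iterating over `run_chain(6)` drives all three steps lazily, one record at a time, which is the property a streaming execution model relies on.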

Getting Started

New to ETL? Start here:

  1. Introduction - Understand what ETL is and when to use it
  2. Core Concepts - Learn about chainsets, chains, steps, and records
  3. Quick Start Guide - Create your first ETL chain in 10 minutes

Documentation Sections

For End Users

For Developers

Reference

  • Configuration - Chainset setup, roles, imports, deployment
  • Glossary - Definitions of key terms
  • FAQ - Common questions and answers

Module Interfaces

Interface | Description | Required Privilege
ETL Management | Manage chainsets, configure access control, import/export | Mandatory: mod-etl; for modifying: ownership of chainset
ETL Designer | Design chains, add steps, test execution, view results | Mandatory: mod-etl-designer; for read access: role access OR ownership; for write access: ownership
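
The privilege rules above can be read as a small decision function. The following Python sketch only restates those rules. The `User` and `Chainset` classes, their fields, the single-owner assumption, and the function names are all hypothetical and do not reflect how Ambience evaluates access internally.

```python
from dataclasses import dataclass, field
from typing import Set


@dataclass
class User:
    name: str
    privileges: Set[str] = field(default_factory=set)
    roles: Set[str] = field(default_factory=set)


@dataclass
class Chainset:
    owner: str  # assumption: a chainset has a single owner
    allowed_roles: Set[str] = field(default_factory=set)


def can_read_in_designer(user: User, cs: Chainset) -> bool:
    """Read: mod-etl-designer plus (role access OR ownership)."""
    if "mod-etl-designer" not in user.privileges:
        return False
    return user.name == cs.owner or bool(user.roles & cs.allowed_roles)


def can_write_in_designer(user: User, cs: Chainset) -> bool:
    """Write: mod-etl-designer plus ownership."""
    return "mod-etl-designer" in user.privileges and user.name == cs.owner


def can_modify_in_management(user: User, cs: Chainset) -> bool:
    """Modify in ETL Management: mod-etl plus ownership of the chainset."""
    return "mod-etl" in user.privileges and user.name == cs.owner
```

Under these assumptions, a user who holds `mod-etl-designer` and a role granted on the chainset can read it in ETL Designer but cannot write to it unless they also own it.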

ETL Step Categories

The ETL module provides over 800 steps organized into 70+ categories:

  • Array (38 steps) - Array manipulation and operations
  • String (47 steps) - Text processing and formatting
  • MongoDB (33 steps) - Database operations
  • Structure (49 steps) - Control flow and chain composition
  • Number (157 steps) - Mathematical operations and calculations
  • Date/Time (39 steps) - Date and timestamp operations
  • File (19 steps) - File system operations
  • JSON (10 steps) - JSON parsing and manipulation
  • Mail (5 steps) - Email operations
  • REST (5 steps) - HTTP API calls
  • Validation (17 steps) - Data validation
  • And 60+ more categories…

Browse the Dictionary in ETL Designer to explore all available steps. A comprehensive ETL Cookbook with recipes and examples for common tasks is planned as a separate reference guide.

Common Use Cases

Data Migration & Synchronization

Migrate data from legacy systems, transform formats, validate quality, and load to modern databases.

Report Generation & Distribution

Extract data from operational databases, aggregate metrics, generate Excel/PDF reports, and email to recipients on a schedule.

Data Validation & Cleansing

Validate incoming data against business rules, standardize formats, remove duplicates, and enrich with reference data.
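
As a rough illustration of this use case, the four stages can be sketched in plain Python. This is not ETL step syntax. The validation rule (an email containing "@" and a non-empty country), the `COUNTRY_NAMES` reference table, and all function names are assumptions made for the example.

```python
from typing import Any, Dict, Iterable, Iterator, Set

Record = Dict[str, Any]

# Hypothetical reference data used for enrichment.
COUNTRY_NAMES = {"SG": "Singapore", "MY": "Malaysia"}


def validate(records: Iterable[Record]) -> Iterator[Record]:
    """Drop records failing the assumed rules: email has '@', country is set."""
    for rec in records:
        email, country = rec.get("email"), rec.get("country")
        if isinstance(email, str) and "@" in email and isinstance(country, str) and country:
            yield rec


def standardize(records: Iterable[Record]) -> Iterator[Record]:
    """Normalize formats: trimmed lowercase email, trimmed uppercase country."""
    for rec in records:
        yield {
            **rec,
            "email": rec["email"].strip().lower(),
            "country": rec["country"].strip().upper(),
        }


def dedupe(records: Iterable[Record]) -> Iterator[Record]:
    """Keep only the first record seen for each email address."""
    seen: Set[str] = set()
    for rec in records:
        if rec["email"] not in seen:
            seen.add(rec["email"])
            yield rec


def enrich(records: Iterable[Record]) -> Iterator[Record]:
    """Add a country name from the reference table (None if unknown)."""
    for rec in records:
        yield {**rec, "country_name": COUNTRY_NAMES.get(rec["country"])}


def cleanse(records: Iterable[Record]) -> Iterator[Record]:
    """Run the stages in the order described above."""
    return enrich(dedupe(standardize(validate(records))))
```

Standardizing before deduplicating matters here: two records whose emails differ only in case or surrounding whitespace collapse into one.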

API Integration & Orchestration

Call external REST APIs, transform responses, aggregate from multiple sources, and handle authentication and retries.
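
Retry handling is the part of this use case that most often goes wrong, so a generic sketch may help. The Python below is not an ETL REST step. `call_with_retries` is a hypothetical helper, and exponential backoff is one common retry policy chosen for illustration rather than something the ETL module is documented to use. The request itself is passed in as a callable so the example runs without a network.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_retries(
    fetch: Callable[[], T],
    attempts: int = 3,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fetch(), retrying on ConnectionError with exponential backoff."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return fetch()
        except ConnectionError:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            # Wait base_delay, then 2x, 4x, ... before the next try.
            sleep(base_delay * (2 ** attempt))
    raise AssertionError("unreachable")
```

Passing `sleep` as a parameter keeps the helper testable: a test can record the requested delays instead of actually waiting.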

Log Processing & Monitoring

Parse log files, extract metrics, aggregate statistics, detect anomalies, and alert on thresholds.
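
The parse, aggregate, and alert stages can be sketched in a few lines of Python. This is an illustration of the pattern, not ETL step configuration. The log line layout (`<date> <time> <LEVEL> <message>`), the level names, and the `max_errors` threshold are assumptions made for the example.

```python
import re
from collections import Counter
from typing import Iterable, List

# Assumed line layout: "<date> <time> <LEVEL> <message>"
LINE_RE = re.compile(r"^(\S+) (\S+) (DEBUG|INFO|WARN|ERROR) (.*)$")


def count_levels(lines: Iterable[str]) -> Counter:
    """Parse each line and count occurrences per log level."""
    counts: Counter = Counter()
    for line in lines:
        match = LINE_RE.match(line.strip())
        if match:  # lines that do not fit the layout are skipped
            counts[match.group(3)] += 1
    return counts


def alerts(counts: Counter, max_errors: int) -> List[str]:
    """Return an alert message when the ERROR count exceeds the threshold."""
    if counts["ERROR"] > max_errors:
        return [f"ERROR count {counts['ERROR']} exceeds threshold {max_errors}"]
    return []
```

Because `count_levels` accepts any iterable of lines, it can be fed an open file object directly and will read it line by line.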

Scheduled Maintenance & Cleanup

Archive old records, delete expired data, rebuild indexes, and maintain system health automatically.

Integration with Other Modules

ETL chains integrate seamlessly with other Ambience modules:

Module | Integration | Use Case
Scheduler | Run chains on schedule | Daily reports, hourly sync, maintenance tasks
Workflows | Call chains from states/transitions | Data transformation in business processes
Datasets | Provide on-demand data | Interactive dashboards and visualizations
Dashboards | Real-time data retrieval | Live metrics and analytics

See Integration for detailed guides.

Quick Links

Getting Started:

  • Quick Start Guide - Create your first chain in 10 minutes
  • Core Concepts - Understand the fundamentals
  • Examples - See complete working examples

Using ETL:

  • ETL Management - Manage chainsets
  • ETL Designer - Design and test chains
  • Best Practices - Design effective chains

Integration:

  • Scheduler Integration - Automate with scheduling
  • Workflow Integration - Use in business processes
  • Dataset Integration - Provide data for dashboards

Reference:

  • Configuration - Setup and deployment
  • Glossary - Key terms and definitions
  • FAQ - Common questions answered