ETL
Overview
The Ambience ETL module is a data processing system for building data transformation pipelines. Unlike standalone scripts or one-off queries, ETL provides a visual, composable, and reusable framework for building data workflows that integrate with other Ambience modules such as Scheduler, Workflows, Datasets, and Dashboards.
Key Capabilities:
- Visual chain design with a drag-and-drop interface
- Streaming execution model for efficient processing
- Over 800 steps across 70+ categories
- Deep integration with Scheduler, Workflows, Datasets, and Dashboards
- Reusable transformation logic via chain calls and imports
- Complete testing framework with test inputs and debug logging
Why Use ETL?
ETL chains excel at data transformation and integration tasks such as:
- Data migration and synchronization between systems
- Automated report generation and distribution
- Data validation and cleansing pipelines
- API integration and orchestration
- Log processing and monitoring
- Scheduled maintenance and cleanup operations
- Real-time data enrichment
- Multi-source data aggregation
Benefits:
- Visual Design - Build pipelines by composing steps and see the flow at a glance
- Reusability - Write a chain once and use it in multiple contexts (Scheduler, Workflows, Datasets)
- Streaming Architecture - Process large datasets as a stream of records, without holding the whole dataset in memory
- Integration - Connect databases, APIs, files, and cloud services
- Testing - Use test inputs and debug logging to build reliable chains
- Scalability - One chain definition can be executed any number of times
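To make the Streaming Architecture and composition benefits concrete, here is a minimal Python sketch of the general idea: steps are composed into a chain, and records flow through one at a time. It is an analogy only, not the Ambience API. The step functions (`parse_amount`, `keep_positive`), the `run_chain` helper, and the record fields are all hypothetical names chosen for illustration.

```python
from typing import Callable, Iterable, Iterator, Sequence

Record = dict
# A step consumes a stream of records and produces a new stream.
Step = Callable[[Iterable[Record]], Iterator[Record]]


def parse_amount(records: Iterable[Record]) -> Iterator[Record]:
    """Hypothetical step: convert the 'amount' field from text to float."""
    for record in records:
        yield {**record, "amount": float(record["amount"])}


def keep_positive(records: Iterable[Record]) -> Iterator[Record]:
    """Hypothetical step: drop records whose amount is not positive."""
    for record in records:
        if record["amount"] > 0:
            yield record


def run_chain(source: Iterable[Record], steps: Sequence[Step]) -> Iterator[Record]:
    """Compose steps lazily so each record passes through the chain on demand."""
    stream: Iterable[Record] = source
    for step in steps:
        stream = step(stream)
    return iter(stream)


# The source is a generator, so no full list of records is ever built.
source = ({"id": i, "amount": str(i - 1)} for i in range(4))
for record in run_chain(source, [parse_amount, keep_positive]):
    print(record)
# {'id': 2, 'amount': 1.0}
# {'id': 3, 'amount': 2.0}
```

Because every step is a generator, memory use in this sketch depends on the size of one record rather than the size of the dataset. The same list of steps can also be reused with any other record source.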
Getting Started
New to ETL? Start here:
- Introduction - Understand what ETL is and when to use it
- Core Concepts - Learn about chainsets, chains, steps, and records
- Quick Start Guide - Create your first ETL chain in 10 minutes
Documentation Sections
For End Users
- ETL Management - Create, edit, and manage chainsets
- ETL Designer - Design and test ETL chains visually
For Developers
- Integration - Use ETL with Scheduler, Workflows, Datasets
- Examples - Complete real-world examples
- Best Practices - Design patterns, optimization, error handling
Reference
- Configuration - Chainset setup, roles, imports, deployment
- Glossary - Definitions of key terms
- FAQ - Common questions and answers
Module Interfaces
| Interface | Description | Required Privilege |
|---|---|---|
| ETL Management | Manage chainsets, configure access control, import/export | Mandatory: mod-etl; for modifying: ownership of chainset |
| ETL Designer | Design chains, add steps, test execution, view results | Mandatory: mod-etl-designer; for read access: role access OR ownership; for write access: ownership |
ETL Step Categories
The ETL module provides over 800 steps organized into 70+ categories:
- Array (38 steps) - Array manipulation and operations
- String (47 steps) - Text processing and formatting
- MongoDB (33 steps) - Database operations
- Structure (49 steps) - Control flow and chain composition
- Number (157 steps) - Mathematical operations and calculations
- Date/Time (39 steps) - Date and timestamp operations
- File (19 steps) - File system operations
- JSON (10 steps) - JSON parsing and manipulation
- Mail (5 steps) - Email operations
- REST (5 steps) - HTTP API calls
- Validation (17 steps) - Data validation
- And 60+ more categories…
Browse the Dictionary in ETL Designer to explore all available steps. A comprehensive ETL Cookbook with recipes and examples for common tasks is planned as a separate reference guide.
Common Use Cases
Data Migration & Synchronization
Migrate data from legacy systems, transform formats, validate quality, and load the results into modern databases.
Report Generation & Distribution
Extract data from operational databases, aggregate metrics, generate Excel/PDF reports, and email them to recipients on a schedule.
Data Validation & Cleansing
Validate incoming data against business rules, standardize formats, remove duplicates, and enrich with reference data.
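The validation and cleansing flow described above can be sketched in plain Python. This is an illustration under stated assumptions rather than Ambience step configuration: the business rule (an email must contain `@`), the `COUNTRY_NAMES` reference table, the `cleanse` function, and the field names are all hypothetical.

```python
from typing import Iterable, Iterator

# Hypothetical reference data used for enrichment.
COUNTRY_NAMES = {"SG": "Singapore", "MY": "Malaysia"}


def cleanse(records: Iterable[dict]) -> Iterator[dict]:
    """Validate, standardize, de-duplicate, and enrich records in one pass."""
    seen_emails = set()
    for record in records:
        # Standardize: trim whitespace and lower-case the email.
        email = str(record.get("email", "")).strip().lower()
        # Validate: assumed business rule, an email must contain '@'.
        if "@" not in email:
            continue
        # Remove duplicates: keep the first record seen for each email.
        if email in seen_emails:
            continue
        seen_emails.add(email)
        # Enrich: look up the country name from the reference data.
        country = COUNTRY_NAMES.get(record.get("country_code"), "Unknown")
        yield {**record, "email": email, "country": country}


rows = [
    {"email": " A@x.com ", "country_code": "SG"},
    {"email": "a@x.com", "country_code": "SG"},   # duplicate after standardizing
    {"email": "not-an-email", "country_code": "MY"},  # fails validation
]
print(list(cleanse(rows)))
# [{'email': 'a@x.com', 'country_code': 'SG', 'country': 'Singapore'}]
```

Standardizing before the duplicate check matters here: the first two rows only match once whitespace and letter case have been normalized.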
API Integration & Orchestration
Call external REST APIs, transform responses, aggregate from multiple sources, and handle authentication and retries.
Log Processing & Monitoring
Parse log files, extract metrics, aggregate statistics, detect anomalies, and alert on thresholds.
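As a rough illustration of the parse, aggregate, and alert sequence described above, the following Python sketch counts log lines per level and reports any level whose count exceeds a threshold. The log line layout (`timestamp LEVEL message`), the function names, and the thresholds are assumptions made for this example and do not describe Ambience steps.

```python
import re
from collections import Counter
from typing import Dict, Iterable, List

# Assumed log layout: "<timestamp> <LEVEL> <message>"
LOG_PATTERN = re.compile(r"^(?P<timestamp>\S+) (?P<level>[A-Z]+) (?P<message>.*)$")


def count_levels(lines: Iterable[str]) -> Counter:
    """Parse each line and aggregate a count per log level."""
    counts: Counter = Counter()
    for line in lines:
        match = LOG_PATTERN.match(line)
        if match:  # lines that do not fit the assumed layout are skipped
            counts[match.group("level")] += 1
    return counts


def find_alerts(counts: Counter, thresholds: Dict[str, int]) -> List[str]:
    """Return one message for each level whose count exceeds its threshold."""
    return [
        f"{level} count {counts[level]} exceeds {limit}"
        for level, limit in thresholds.items()
        if counts[level] > limit
    ]


lines = [
    "2024-01-01T00:00:00Z ERROR disk full",
    "2024-01-01T00:00:01Z INFO heartbeat ok",
    "2024-01-01T00:00:02Z ERROR disk full",
    "unparseable",
]
counts = count_levels(lines)
print(find_alerts(counts, {"ERROR": 1}))
# ['ERROR count 2 exceeds 1']
```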
Scheduled Maintenance & Cleanup
Archive old records, delete expired data, rebuild indexes, and maintain system health automatically.
Integration with Other Modules
ETL chains integrate seamlessly with other Ambience modules:
| Module | Integration | Use Case |
|---|---|---|
| Scheduler | Run chains on schedule | Daily reports, hourly sync, maintenance tasks |
| Workflows | Call chains from states/transitions | Data transformation in business processes |
| Datasets | Provide on-demand data | Interactive dashboards and visualizations |
| Dashboards | Real-time data retrieval | Live metrics and analytics |
See Integration for detailed guides.
Quick Links
Getting Started:
- Quick Start Guide - Create your first chain in 10 minutes
- Core Concepts - Understand the fundamentals
- Examples - See complete working examples

Using ETL:
- ETL Management - Manage chainsets
- ETL Designer - Design and test chains
- Best Practices - Design effective chains

Integration:
- Scheduler Integration - Automate with scheduling
- Workflow Integration - Use in business processes
- Dataset Integration - Provide data for dashboards

Reference:
- Configuration - Setup and deployment
- Glossary - Key terms and definitions
- FAQ - Common questions answered