Integration with Other Modules
The ETL module integrates with other Ambience modules, so an ETL chain can run on a schedule, be called from a workflow, or serve as the data source for datasets and dashboards.
Overview
ETL chains can be used in multiple contexts:
| Integration | Use Case | Trigger Method |
|---|---|---|
| Scheduler | Automated recurring tasks | Schedule jobs to run ETL chains |
| Workflows | Business process automation | Call ETL chains from workflow states/transitions |
| Datasets | On-demand data retrieval | Use ETL chains as dataset data sources |
| Dashboards | Real-time visualizations | ETL chains provide data for dashboard widgets |
| External Systems | API integration | ETL chains process webhook data or API calls |
Integration Patterns
Scheduled Automation
Use the Scheduler module to run ETL chains automatically:
- Daily reports - Generate and email reports every morning
- Data synchronization - Sync data between systems hourly
- Maintenance tasks - Clean up old records weekly
- Monitoring - Check system health every 5 minutes
See Scheduler Integration for details.
Workflow Actions
Use ETL chains within workflows for data transformation:
- OnEntry actions - Process data when entering a workflow state
- OnExit actions - Clean up or validate when leaving a state
- OnTransition actions - Transform data during state transitions
- Guard conditions - Evaluate complex conditions with ETL logic
See Workflow Integration for details.
On-Demand Data Retrieval
Use ETL chains to provide data for interactive applications:
- Dataset queries - Transform and aggregate data for datasets
- Dashboard widgets - Provide real-time metrics and visualizations
- Parameterized queries - Accept parameters for dynamic data retrieval
- Cached results - Optimize performance with caching strategies
See Dataset & Dashboard Integration for details.
Common Integration Scenarios
Scenario 1: Scheduled Report with Email
Scheduler Job (runs daily at 8 AM)
↓
ETL Chain: "GenerateDailySalesReport"
1. MongoDB Reader (query yesterday's sales)
2. Aggregate by product category
3. Calculate totals and percentages
4. Generate Excel file
5. Compose email with report attached
6. Send email to management team
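Steps 2 and 3 of this chain (aggregate by category, then calculate totals and percentages) amount to a group-and-share computation. The following is a minimal sketch of that logic in plain Python, not Ambience ETL configuration; the function name `summarize_sales` and the record fields `category` and `amount` are assumptions made for illustration.

```python
from collections import defaultdict


def summarize_sales(records):
    """Aggregate sale amounts by category and compute each category's share of the total."""
    totals = defaultdict(float)
    for record in records:
        totals[record["category"]] += record["amount"]

    grand_total = sum(totals.values())
    summary = []
    for category in sorted(totals):
        total = totals[category]
        # Guard against an empty or all-zero day to avoid dividing by zero.
        percent = round(total * 100 / grand_total, 1) if grand_total else 0.0
        summary.append({"category": category, "total": total, "percent": percent})
    return summary


# Hypothetical sample records standing in for yesterday's sales.
sales = [
    {"category": "Hardware", "amount": 120.0},
    {"category": "Software", "amount": 60.0},
    {"category": "Hardware", "amount": 20.0},
]
print(summarize_sales(sales))
```

Sorting the categories keeps the report rows in a stable order from one run to the next, which makes daily reports easier to compare.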
Scenario 2: Workflow with Data Enrichment
Workflow: "Leave Request Approval"
State: "Pending Approval"
OnEntry ETL Chain: "EnrichLeaveRequest"
1. Get employee details from HR database
2. Calculate remaining leave balance
3. Check manager information
4. Add enriched data to workflow instance
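The enrichment steps above can be sketched as a lookup followed by a balance calculation. This is an illustration in plain Python rather than the Ambience API; `enrich_leave_request`, the in-memory `employees` mapping (standing in for the HR database), and all field names are hypothetical.

```python
def enrich_leave_request(request, employees):
    """Return a copy of the request with employee, manager, and balance details added."""
    employee = employees.get(request["employee_id"])
    if employee is None:
        # Surface the problem in the record so the workflow can decide what to do.
        return {**request, "enrichment_error": "unknown employee"}

    remaining = employee["annual_entitlement"] - employee["days_taken"]
    return {
        **request,
        "employee_name": employee["name"],
        "manager_id": employee["manager_id"],
        "remaining_balance": remaining,
        "sufficient_balance": request["days_requested"] <= remaining,
    }


# Hypothetical HR data keyed by employee ID.
employees = {
    "E1": {"name": "Ana", "manager_id": "M9", "annual_entitlement": 20, "days_taken": 15},
}
print(enrich_leave_request({"employee_id": "E1", "days_requested": 6}, employees))
```

Returning a new record instead of mutating the input keeps the original request available if the workflow needs to retry the enrichment.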
Scenario 3: Real-time Dashboard
Dashboard Widget: "Current Inventory Levels"
↓
Dataset: "InventoryStatus"
↓
ETL Chain: "GetInventoryStatus"
1. MongoDB Reader (read current inventory)
2. Join with product information
3. Calculate stock levels and reorder points
4. Flag low stock items
5. Return formatted results
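Steps 2 through 4 of this chain join inventory rows with product information and flag items at or below their reorder point. A rough sketch of that logic in plain Python follows; the function `inventory_status` and the fields `sku`, `on_hand`, `name`, and `reorder_point` are assumed names, not part of Ambience.

```python
def inventory_status(inventory, products):
    """Join inventory rows with product info and flag items at or below the reorder point."""
    status = []
    for item in inventory:
        # Products missing from the lookup fall back to neutral defaults.
        product = products.get(item["sku"], {})
        reorder_point = product.get("reorder_point", 0)
        status.append({
            "sku": item["sku"],
            "name": product.get("name", "unknown"),
            "on_hand": item["on_hand"],
            "reorder_point": reorder_point,
            "low_stock": item["on_hand"] <= reorder_point,
        })
    return status


# Hypothetical sample data.
products = {"A-1": {"name": "Widget", "reorder_point": 10}}
inventory = [{"sku": "A-1", "on_hand": 4}, {"sku": "B-2", "on_hand": 7}]
print(inventory_status(inventory, products))
```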
Scenario 4: API Webhook Processing
External System → Webhook → Workflow → ETL Chain
Workflow: "ProcessWebhook"
OnEntry ETL Chain: "ValidateAndStoreWebhookData"
1. Validate webhook payload
2. Transform to internal format
3. Enrich with reference data
4. Store in database
5. Trigger downstream processes
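The first two steps (validate the payload, then transform it to an internal format) might look like the sketch below. It is plain Python written for illustration; the required field names, `validate_webhook`, and `to_internal` are assumptions, and the real payload shape depends on the external system.

```python
# Hypothetical set of fields the external system is expected to send.
REQUIRED_FIELDS = ("event_id", "event_type", "payload")


def validate_webhook(body):
    """Return a list of validation errors; an empty list means the body is acceptable."""
    errors = [f"missing field: {name}" for name in REQUIRED_FIELDS if name not in body]
    if "payload" in body and not isinstance(body["payload"], dict):
        errors.append("payload must be an object")
    return errors


def to_internal(body):
    """Map a validated webhook body onto a hypothetical internal record format."""
    return {
        "id": str(body["event_id"]),
        "type": body["event_type"].lower(),
        "data": body["payload"],
    }


body = {"event_id": 42, "event_type": "ORDER_CREATED", "payload": {"total": 99}}
if not validate_webhook(body):
    print(to_internal(body))
```

Collecting every error before rejecting the payload gives the sender one complete response instead of a series of single-error round trips.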
Best Practices for Integration
Design for Reusability
Create ETL chains that can be used in multiple contexts:
- Accept input parameters via incoming records
- Return results in a standard format
- Avoid context-specific assumptions
- Document expected inputs and outputs
Handle Errors Gracefully
Integration points should handle errors appropriately:
- Scheduler: Log errors, send alerts, don’t fail silently
- Workflows: Return error status, allow workflow to handle
- Datasets: Return empty results or error messages
- APIs: Return appropriate HTTP status codes
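One way to meet these differing expectations is to wrap every chain run in a uniform result and let each integration point translate it. The sketch below assumes a chain can be modelled as a Python callable; `run_chain`, `http_status`, `dataset_rows`, and the three status names are hypothetical and not part of Ambience.

```python
def run_chain(chain, records):
    """Run a chain (any callable) and wrap the outcome in a uniform result."""
    try:
        return {"status": "ok", "records": chain(records), "error": None}
    except ValueError as exc:
        # Treated here as a problem with the input, not with the system.
        return {"status": "invalid_input", "records": [], "error": str(exc)}
    except Exception as exc:
        return {"status": "failed", "records": [], "error": str(exc)}


def http_status(result):
    """API context: map the result onto an HTTP status code."""
    return {"ok": 200, "invalid_input": 400, "failed": 500}[result["status"]]


def dataset_rows(result):
    """Dataset context: return the rows on success and an empty result otherwise."""
    return result["records"] if result["status"] == "ok" else []


def reject_empty(records):
    """Example chain that refuses empty input."""
    if not records:
        raise ValueError("no records supplied")
    return records


result = run_chain(reject_empty, [])
print(http_status(result), dataset_rows(result), result["error"])
```

Because the error message travels with the result, a scheduler or workflow caller can log it or store it without inspecting exceptions directly.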
Optimize for Context
Different integration contexts have different requirements:
- Scheduler: Can run longer, process large datasets
- Workflows: Should be fast, process single records
- Datasets: Must be fast, support parameters
- Dashboards: Must be very fast, consider caching
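For the dashboard case, a short-lived cache in front of an expensive chain is one common approach. The following is a generic time-to-live cache sketched in plain Python, not a description of how Ambience caches results; the class `TTLCache` and its parameters are assumptions.

```python
import time


class TTLCache:
    """Cache computed values per key for a fixed number of seconds."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock  # injectable so the expiry logic can be tested
        self._entries = {}   # key -> (stored_at, value)

    def get_or_compute(self, key, compute):
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and now - entry[0] < self._ttl:
            return entry[1]
        # Missing or expired: recompute and remember when we stored it.
        value = compute()
        self._entries[key] = (now, value)
        return value


cache = TTLCache(ttl_seconds=30)
# The key includes the query parameters so different requests do not collide.
rows = cache.get_or_compute(("inventory", "warehouse=1"), lambda: [{"sku": "A-1", "on_hand": 4}])
print(rows)
```

A short TTL trades some freshness for response time, so the right value depends on how stale a widget is allowed to be.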
Test in Context
Test ETL chains in their actual integration context:
- Test scheduled jobs with realistic schedules
- Test workflow chains with actual workflow instances
- Test dataset chains with expected parameters
- Test with realistic data volumes
Monitor and Log
Implement appropriate monitoring for each integration:
- Scheduler: Job completion status, execution time, errors
- Workflows: Workflow instance updates, state transitions
- Datasets: Query performance, cache hit rates
- APIs: Response times, error rates, throughput
Security Considerations
Access Control
Different integration contexts may have different security requirements:
- Scheduler: Runs with system privileges
- Workflows: Runs with workflow instance user context
- Datasets: Runs with requesting user’s privileges
- APIs: Requires authentication and authorization
Data Sensitivity
Be aware of data sensitivity in different contexts:
- Don’t expose sensitive data in dataset results
- Log appropriately for audit requirements
- Use encryption for sensitive data in transit
- Follow data retention policies
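To keep sensitive values out of dataset results, a chain can mask them in a final step before returning records. This is a minimal sketch in plain Python; the function `redact`, the field names in `SENSITIVE_FIELDS`, and the mask string are hypothetical, and which fields count as sensitive depends on your own data policies.

```python
# Hypothetical list of fields that must never reach a dataset or dashboard.
SENSITIVE_FIELDS = frozenset({"salary", "national_id"})


def redact(records, sensitive=SENSITIVE_FIELDS, mask="***"):
    """Return copies of the records with sensitive field values replaced by a mask."""
    return [
        {key: (mask if key in sensitive else value) for key, value in record.items()}
        for record in records
    ]


employees = [{"name": "Ana", "salary": 5000, "department": "Sales"}]
print(redact(employees))
```

The input records are left unchanged, so the unmasked values remain available to earlier steps that legitimately need them.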
Next Steps
- Scheduler Integration - Automate ETL chains with scheduling
- Workflow Integration - Use ETL in business processes
- Dataset & Dashboard Integration - Provide data for visualizations
- Examples - See complete integration examples