Integration with Other Modules

The ETL module integrates with other Ambience modules, so the same ETL chains can drive scheduled jobs, workflow actions, datasets, dashboards, and external API processing.

Overview

ETL chains can be used in multiple contexts:

| Integration | Use Case | Trigger Method |
|---|---|---|
| Scheduler | Automated recurring tasks | Schedule jobs to run ETL chains |
| Workflows | Business process automation | Call ETL chains from workflow states/transitions |
| Datasets | On-demand data retrieval | Use ETL chains as dataset data sources |
| Dashboards | Real-time visualizations | ETL chains provide data for dashboard widgets |
| External Systems | API integration | ETL chains process webhook data or API calls |

Integration Patterns

Scheduled Automation

Use the Scheduler module to run ETL chains automatically:
- Daily reports: Generate and email reports every morning
- Data synchronization: Sync data between systems hourly
- Maintenance tasks: Clean up old records weekly
- Monitoring: Check system health every 5 minutes

See Scheduler Integration for details.

Workflow Actions

Use ETL chains within workflows for data transformation:
- OnEntry actions: Process data when entering a workflow state
- OnExit actions: Cleanup or validation when leaving a state
- OnTransition actions: Transform data during state transitions
- Guard conditions: Evaluate complex conditions with ETL logic

See Workflow Integration for details.

On-Demand Data Retrieval

Use ETL chains to provide data for interactive applications:
- Dataset queries: Transform and aggregate data for datasets
- Dashboard widgets: Provide real-time metrics and visualizations
- Parameterized queries: Accept parameters for dynamic data retrieval
- Cached results: Optimize performance with caching strategies

See Dataset & Dashboard Integration for details.

Common Integration Scenarios

Scenario 1: Scheduled Report with Email

Scheduler Job (runs daily at 8 AM)
  ↓
ETL Chain: "GenerateDailySalesReport"
  1. MongoDB Reader (query yesterday's sales)
  2. Aggregate by product category
  3. Calculate totals and percentages
  4. Generate Excel file
  5. Compose email with report attached
  6. Send email to management team
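
Steps 2 and 3 of this scenario (aggregate by category, then calculate totals and percentages) can be sketched in plain Python. This is only an illustration of the logic, not Ambience ETL step configuration; the function name `summarize_sales` and the record fields `category` and `amount` are assumptions made for the example.

```python
from collections import defaultdict


def summarize_sales(records):
    """Sum sale amounts per category and add each category's share of the total."""
    totals = defaultdict(float)
    for record in records:
        totals[record["category"]] += record["amount"]

    grand_total = sum(totals.values())
    summary = []
    for category, amount in sorted(totals.items()):
        # Guard against division by zero when there were no sales.
        share = (amount / grand_total * 100) if grand_total else 0.0
        summary.append(
            {"category": category, "total": amount, "percent": round(share, 1)}
        )
    return summary


# Hypothetical sample of yesterday's sales records.
sales = [
    {"category": "Books", "amount": 120.0},
    {"category": "Games", "amount": 80.0},
    {"category": "Books", "amount": 200.0},
]
report = summarize_sales(sales)
```

The resulting rows are what a later step would write into the Excel file; sorting by category keeps the report order stable from day to day.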

Scenario 2: Workflow with Data Enrichment

Workflow: "Leave Request Approval"
  State: "Pending Approval"
    OnEntry ETL Chain: "EnrichLeaveRequest"
      1. Get employee details from HR database
      2. Calculate remaining leave balance
      3. Check manager information
      4. Add enriched data to workflow instance
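
The enrichment steps above might look like the following in plain Python. It is a sketch under assumptions: the in-memory `employees` dictionary stands in for the HR database, and the field names and the default entitlement of 20 days are hypothetical, not values taken from Ambience.

```python
def enrich_leave_request(request, employees, annual_entitlement_days=20):
    """Return a copy of the request with employee, manager, and balance fields added."""
    employee = employees.get(request["employee_id"])
    if employee is None:
        raise KeyError(f"unknown employee: {request['employee_id']}")

    remaining = annual_entitlement_days - employee["days_taken"]
    enriched = dict(request)  # copy, so the incoming record is left untouched
    enriched.update(
        {
            "employee_name": employee["name"],
            "manager_id": employee["manager_id"],
            "remaining_days": remaining,
            "within_balance": request["days_requested"] <= remaining,
        }
    )
    return enriched


# Hypothetical stand-in for the HR database lookup.
employees = {"E100": {"name": "Dana Lee", "manager_id": "E007", "days_taken": 12}}
request = {"employee_id": "E100", "days_requested": 5}
enriched = enrich_leave_request(request, employees)
```

Returning a new record rather than mutating the input makes the step easier to reuse outside the workflow context.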

Scenario 3: Real-time Dashboard

Dashboard Widget: "Current Inventory Levels"
  ↓
Dataset: "InventoryStatus"
  ↓
ETL Chain: "GetInventoryStatus"
  1. MongoDB Reader (read current inventory)
  2. Join with product information
  3. Calculate stock levels and reorder points
  4. Flag low stock items
  5. Return formatted results
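
As a rough illustration of steps 2 to 4 (join, compare against reorder points, flag low stock), here is a minimal Python sketch. The function `inventory_status` and the fields `sku`, `on_hand`, and `reorder_point` are assumptions for the example, not part of any Ambience schema.

```python
def inventory_status(inventory, products):
    """Join stock rows with product data and flag items at or below the reorder point."""
    rows = []
    for item in inventory:
        # Products missing from the lookup fall back to neutral defaults.
        product = products.get(item["sku"], {})
        reorder_point = product.get("reorder_point", 0)
        rows.append(
            {
                "sku": item["sku"],
                "name": product.get("name", "unknown"),
                "on_hand": item["on_hand"],
                "reorder_point": reorder_point,
                "low_stock": item["on_hand"] <= reorder_point,
            }
        )
    return rows


# Hypothetical inventory and product reference data.
inventory = [{"sku": "A1", "on_hand": 3}, {"sku": "B2", "on_hand": 50}]
products = {
    "A1": {"name": "Widget", "reorder_point": 5},
    "B2": {"name": "Gadget", "reorder_point": 10},
}
status = inventory_status(inventory, products)
low_stock_skus = [row["sku"] for row in status if row["low_stock"]]
```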

Scenario 4: API Webhook Processing

External System → Webhook → Workflow → ETL Chain
  
Workflow: "ProcessWebhook"
  OnEntry ETL Chain: "ValidateAndStoreWebhookData"
    1. Validate webhook payload
    2. Transform to internal format
    3. Enrich with reference data
    4. Store in database
    5. Trigger downstream processes
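
The first two steps (validate the payload, then transform it to an internal format) can be sketched as below. The required fields, the internal record shape, and all function names are hypothetical; a real webhook contract would define its own.

```python
# Hypothetical contract: field name -> accepted Python type(s).
REQUIRED_FIELDS = {"event_id": str, "type": str, "amount": (int, float)}


def validate_payload(payload):
    """Return a list of validation problems; an empty list means the payload is valid."""
    errors = []
    for field, expected in REQUIRED_FIELDS.items():
        if field not in payload:
            errors.append(f"missing field: {field}")
        elif not isinstance(payload[field], expected):
            errors.append(f"wrong type for field: {field}")
    return errors


def to_internal(payload):
    """Map the external payload onto the (assumed) internal record shape."""
    return {
        "id": payload["event_id"],
        "kind": payload["type"].lower(),
        "amount_cents": round(payload["amount"] * 100),
    }


def process_webhook(payload):
    """Validate first; only transform payloads that passed validation."""
    errors = validate_payload(payload)
    if errors:
        return {"accepted": False, "errors": errors}
    return {"accepted": True, "record": to_internal(payload)}


good = process_webhook({"event_id": "evt-1", "type": "PAYMENT", "amount": 12.5})
bad = process_webhook({"event_id": "evt-2", "amount": "12.5"})
```

Collecting every validation error, instead of stopping at the first, gives the calling system a complete list to fix in one round trip.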

Best Practices for Integration

Design for Reusability

Create ETL chains that can be used in multiple contexts:
- Accept input parameters via incoming records
- Return results in a standard format
- Avoid context-specific assumptions
- Document expected inputs and outputs
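
One way to picture a reusable chain is a function whose inputs all arrive in the incoming record and whose output always has the same shape. The sketch below is plain Python rather than Ambience configuration; `top_customers`, the `limit` and `orders` parameters, and the result envelope are assumptions chosen for illustration.

```python
def top_customers(record):
    """Rank customers by order total.

    Expected input:  {"orders": [{"customer": str, "amount": number}], "limit": int}
    Returned output: {"status": "ok", "count": int, "rows": [{"customer", "total"}]}
    """
    limit = record.get("limit", 3)
    orders = record.get("orders", [])

    totals = {}
    for order in orders:
        totals[order["customer"]] = totals.get(order["customer"], 0) + order["amount"]

    # Highest total first, truncated to the requested number of rows.
    ranked = sorted(totals.items(), key=lambda pair: pair[1], reverse=True)[:limit]
    return {
        "status": "ok",
        "count": len(ranked),
        "rows": [{"customer": name, "total": total} for name, total in ranked],
    }


result = top_customers(
    {
        "limit": 1,
        "orders": [
            {"customer": "acme", "amount": 50},
            {"customer": "zen", "amount": 70},
            {"customer": "acme", "amount": 40},
        ],
    }
)
```

Because nothing in the function refers to a scheduler, workflow, or dashboard, the same logic could serve any of them unchanged.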

Handle Errors Gracefully

Integration points should handle errors appropriately:
- Scheduler: Log errors, send alerts, don’t fail silently
- Workflows: Return error status, allow workflow to handle
- Datasets: Return empty results or error messages
- APIs: Return appropriate HTTP status codes
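
The per-context behavior described here could be expressed as a small dispatch function. This is a hedged sketch: `ChainError`, `handle_failure`, the context names, and the response shapes are all hypothetical, and the 400/500 split is just one reasonable convention for API callers.

```python
import logging

logger = logging.getLogger("etl.integration")


class ChainError(Exception):
    """Hypothetical error raised when a chain step fails."""


def handle_failure(context, error):
    """Translate a chain failure into the response each calling context expects."""
    # Every context logs the failure, so nothing fails silently.
    logger.error("chain failed in %s context: %s", context, error)

    if context == "scheduler":
        return {"alert": True, "message": str(error)}
    if context == "workflow":
        return {"status": "error", "message": str(error)}
    if context == "dataset":
        return {"rows": [], "message": str(error)}
    if context == "api":
        # Treat bad input as a client error, anything else as a server error.
        status = 400 if isinstance(error, ValueError) else 500
        return {"http_status": status, "message": str(error)}
    raise ValueError(f"unknown context: {context}")


dataset_response = handle_failure("dataset", ChainError("timeout"))
api_response = handle_failure("api", ValueError("bad input"))
```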

Optimize for Context

Different integration contexts have different requirements:
- Scheduler: Can run longer, process large datasets
- Workflows: Should be fast, process single records
- Datasets: Must be fast, support parameters
- Dashboards: Must be very fast, consider caching
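
For the dashboard case, a time-to-live cache is one common caching strategy. The class below is a minimal, generic Python sketch and does not describe any caching feature built into Ambience; `TTLCache` and `get_or_compute` are names invented for the example.

```python
import time


class TTLCache:
    """Cache values per key and recompute them once they are older than ttl_seconds."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock  # injectable so the cache can be tested without sleeping
        self._entries = {}  # key -> (stored_at, value)

    def get_or_compute(self, key, compute):
        now = self.clock()
        entry = self._entries.get(key)
        if entry is not None and now - entry[0] < self.ttl:
            return entry[1]  # still fresh: skip the expensive call
        value = compute()
        self._entries[key] = (now, value)
        return value


# Usage: the second call within 30 seconds reuses the first result.
cache = TTLCache(ttl_seconds=30)
first = cache.get_or_compute("inventory", lambda: {"low_stock": 4})
second = cache.get_or_compute("inventory", lambda: {"low_stock": 99})
```

The TTL trades freshness for speed: a widget may show data up to `ttl_seconds` old, so the value should match how quickly the underlying data changes.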

Test in Context

Test ETL chains in their actual integration context:
- Test scheduled jobs with realistic schedules
- Test workflow chains with actual workflow instances
- Test dataset chains with expected parameters
- Test with realistic data volumes

Monitor and Log

Implement appropriate monitoring for each integration:
- Scheduler: Job completion status, execution time, errors
- Workflows: Workflow instance updates, state transitions
- Datasets: Query performance, cache hit rates
- APIs: Response times, error rates, throughput
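
Completion status, execution time, and errors can be captured with a thin wrapper around whatever runs the chain. The sketch below uses an ordinary Python callable as a stand-in for a chain; `run_with_metrics` and the metric field names are assumptions, not an Ambience interface.

```python
import time


def run_with_metrics(name, chain, records):
    """Run chain(records) and return (result, metrics) with status, error, and timing."""
    started = time.perf_counter()
    metrics = {"chain": name, "status": "success", "error": None}
    result = None
    try:
        result = chain(records)
    except Exception as exc:
        # Record the failure instead of losing it; the caller decides what to do next.
        metrics["status"] = "failed"
        metrics["error"] = str(exc)
    metrics["duration_seconds"] = time.perf_counter() - started
    metrics["records_in"] = len(records)
    return result, metrics


def failing_chain(records):
    raise RuntimeError("db down")


ok_result, ok_metrics = run_with_metrics("double", lambda rs: [r * 2 for r in rs], [1, 2, 3])
bad_result, bad_metrics = run_with_metrics("broken", failing_chain, [1])
```

The metrics dictionary can then be written to a log or forwarded to whichever monitoring system is in use.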

Security Considerations

Access Control

Different integration contexts may have different security requirements:
- Scheduler: Runs with system privileges
- Workflows: Runs with workflow instance user context
- Datasets: Runs with requesting user’s privileges
- APIs: Requires authentication and authorization

Data Sensitivity

Be aware of data sensitivity in different contexts:
- Don’t expose sensitive data in dataset results
- Log appropriately for audit requirements
- Use encryption for sensitive data in transit
- Follow data retention policies
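
To keep sensitive fields out of dataset results, a final masking step is one option. The following Python sketch assumes a hypothetical set of sensitive field names; which fields count as sensitive depends on the data and the applicable policies.

```python
# Hypothetical list of fields that must never reach a dataset or dashboard.
SENSITIVE_FIELDS = frozenset({"ssn", "salary", "password"})


def redact(record, sensitive=SENSITIVE_FIELDS, mask="***"):
    """Return a copy of the record with sensitive field values replaced by a mask."""
    return {
        key: (mask if key in sensitive else value)
        for key, value in record.items()
    }


row = {"name": "Dana Lee", "salary": 5000, "department": "Finance"}
safe_row = redact(row)
```

Masking on a copy leaves the original record available to earlier steps that legitimately need the full data.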

Next Steps