Scheduler Integration
Overview
The Scheduler module runs ETL chains automatically at specified times or intervals. This is ideal for:
- Recurring data processing tasks
- Automated report generation
- Data synchronization between systems
- Maintenance and cleanup operations
- Monitoring and alerting
Creating a Scheduled Job
Step 1: Create the ETL Chain
First, create and test your ETL chain in the ETL Designer:
Example Chain: "DailySalesReport"
1. mongodb.AggregationDefinition / string.Substitute / mongodb.Reader
(read yesterday's sales)
2. Aggregate by product category
3. Calculate totals and percentages
4. Generate Excel report
5. Send email with report
Test the chain thoroughly with test inputs before scheduling.
Step 2: Create a Scheduler Job
- Navigate to the Scheduler module
- Click Add Job
- Configure the job:
  - Name: “Daily Sales Report Job”
  - Type: Select “ETL”
  - Chainset: Select your chainset
  - Chain: Select your chain
  - Schedule: Configure when to run (e.g., “Every day at 8:00 AM”)
  - Enabled: Check to activate
Step 3: Configure Job Parameters
If your ETL chain expects input parameters, configure them in the job:
Example: Pass date range to the chain
{
"startDate": "${yesterday}",
"endDate": "${today}",
"reportType": "daily"
}
The scheduler resolves the ${...} placeholders and passes the resulting JSON as the initial record to your chain.
Step 4: Monitor Execution
After the job runs:
1. Check job history in the Scheduler module
2. View execution status (success/failure)
3. See execution time and any error messages
4. Review ETL chain results in ETL Designer
Schedule Patterns
Fixed Time Schedules
Run at specific times:
- Daily at 8:00 AM: morning reports
- Every Monday at 9:00 AM: weekly summaries
- First day of month at 6:00 AM: monthly processing
Interval Schedules
Run at regular intervals:
- Every 5 minutes: near-real-time monitoring
- Every hour: hourly data synchronization
- Every 6 hours: periodic updates
Cron Expressions
Use cron syntax for complex schedules:
- 0 8 * * 1-5: weekdays at 8:00 AM
- 0 0 1 * *: first day of each month at midnight
- */15 * * * *: every 15 minutes
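To check what a cron expression selects before saving a job, a small matcher can help. The following is a minimal sketch that assumes the classic five-field layout (minute, hour, day of month, month, day of week, with Sunday as 0). It is not the Scheduler's own parser, which may accept a different dialect; cron_matches and _field_matches are hypothetical helper names.

```python
from datetime import datetime

def _field_matches(spec: str, value: int, lo: int, hi: int) -> bool:
    """Check one cron field; supports *, */n, a-b, a-b/n and comma lists."""
    for part in spec.split(","):
        rng, _, step = part.partition("/")
        step_n = int(step) if step else 1
        if rng == "*":
            start, end = lo, hi
        elif "-" in rng:
            a, b = rng.split("-")
            start, end = int(a), int(b)
        else:
            start = end = int(rng)
        if start <= value <= end and (value - start) % step_n == 0:
            return True
    return False

def cron_matches(expr: str, when: datetime) -> bool:
    """Return True if `when` falls on a minute selected by `expr`."""
    minute, hour, dom, month, dow = expr.split()
    cron_dow = (when.weekday() + 1) % 7  # Python: Monday=0; cron: Sunday=0
    time_ok = (_field_matches(minute, when.minute, 0, 59)
               and _field_matches(hour, when.hour, 0, 23)
               and _field_matches(month, when.month, 1, 12))
    dom_ok = _field_matches(dom, when.day, 1, 31)
    dow_ok = _field_matches(dow, cron_dow, 0, 6)
    # Classic cron: if both day fields are restricted, either one may match.
    if dom != "*" and dow != "*":
        day_ok = dom_ok or dow_ok
    else:
        day_ok = dom_ok and dow_ok
    return time_ok and day_ok

# 2026-03-03 is a Tuesday, so the weekday pattern matches at 8:00 AM.
print(cron_matches("0 8 * * 1-5", datetime(2026, 3, 3, 8, 0)))  # True
```

The sketch ignores month and weekday names and the alternative 7 for Sunday, so treat it as a reading aid only.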
Common Patterns
Note: The chain examples below use ${yesterday}, ${lastHour}, etc. in aggregation strings. These are resolved by the string.Substitute step at runtime, meaning the record must already contain fields named yesterday, lastHour, etc. The scheduler job configuration should pass those values as parameters (e.g. "yesterday": "${yesterday}") so the scheduler resolves them before the chain starts. Any date values not in the scheduler variables table (such as 30DaysAgo) must be computed inside the chain using date.Today and date.Add-Subtract before the string.Substitute step.
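The substitution described in the note can be pictured with a short Python sketch. It only imitates what a step like string.Substitute does conceptually; the function name substitute and the choice to leave unknown placeholders untouched are assumptions, not documented behavior of the step.

```python
import re

# Matches ${name}; the name may start with a digit (e.g. 30DaysAgo).
PLACEHOLDER = re.compile(r"\$\{([A-Za-z0-9_]+)\}")

def substitute(template: str, record: dict) -> str:
    """Replace each ${field} in the template with the matching record value.

    Placeholders without a matching field are left unchanged in this sketch.
    """
    def repl(match):
        name = match.group(1)
        return str(record[name]) if name in record else match.group(0)
    return PLACEHOLDER.sub(repl, template)

# The record already holds the value resolved by the scheduler.
record = {"yesterday": "2026-03-02"}
aggregation = '[{"$match": {"date": "${yesterday}"}}]'
print(substitute(aggregation, record))
# [{"$match": {"date": "2026-03-02"}}]
```

MongoDB operators such as $match are not affected because the pattern only matches a dollar sign followed directly by an opening brace.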
Daily Report Generation
Schedule: Every day at 8:00 AM
ETL Chain: "GenerateDailyReport"
1. mongodb.AggregationDefinition
Pool: your-pool-name
Collection: orders
Aggregation: [{"$match": {"date": "${yesterday}"}}]
2. string.Substitute
Field: aggregation
3. mongodb.Reader (reads previous day's data)
4. Aggregate Sales by Category
- Groups by product category
- Calculates totals
5. Generate Excel Report
- Creates formatted spreadsheet
- Includes charts and summaries
6. Compose Email
- Subject: "Daily Sales Report - ${yesterday}"
- Body: Summary text
- Attachment: Excel file
7. Send Mail
- Recipients: management team
- Send report
Hourly Data Synchronization
Schedule: Every hour
ETL Chain: "SyncCustomerData"
1. mongodb.AggregationDefinition
Pool: your-pool-name
Collection: customers
Aggregation: [{"$match": {"lastModified": {"$gte": "${lastHour}"}}}]
2. string.Substitute
Field: aggregation
3. mongodb.Reader (reads recently modified customers)
4. Transform to External Format
- Map fields to external system schema
- Format dates and values
5. REST API Call
- POST to external system
- Send customer updates
6. Log Results
- Write sync status to log collection
- Record success/failure
Weekly Cleanup
Schedule: Every Sunday at 2:00 AM
ETL Chain: "CleanupOldRecords"
1. mongodb.AggregationDefinition
Pool: your-pool-name
Collection: your-collection
Aggregation: [{"$match": {"createdDate": {"$lt": "${30DaysAgo}"}}}]
2. string.Substitute
Field: aggregation
3. mongodb.Reader (reads records older than 30 days)
4. Archive to File
- Export to JSON file
- Store in archive location
5. MongoDB Delete
- Remove archived records from database
- Free up space
6. Send Summary Email
- Report records archived
- Report space freed
Monitoring and Alerting
Schedule: Every 5 minutes
ETL Chain: "MonitorSystemHealth"
1. Check Database Connections
- Test connectivity
- Measure response time
2. Check Disk Space
- Query system metrics
- Calculate usage percentages
3. Check Error Logs
- Read recent error entries
- Count errors by type
4. Filter Alerts
- Identify issues requiring attention
- Filter by severity
5. Send Alert Email (if issues found)
- Notify operations team
- Include issue details
Passing Parameters to Scheduled Chains
Using Scheduler Variables
The scheduler resolves built-in variables when constructing the job parameters JSON. Use them in your scheduler job configuration:
| Variable | Description | Example |
|---|---|---|
| ${today} | Current date | "2026-03-03" |
| ${yesterday} | Previous day | "2026-03-02" |
| ${lastHour} | Timestamp one hour ago | "2026-03-03T12:00:00Z" |
| ${lastWeek} | Seven days ago | "2026-02-24" |
| ${lastMonth} | One month ago | "2026-02-03" |
These are scheduler-only placeholders — they are resolved before the initial record is passed to the ETL chain. Inside the chain, access the resolved values as ordinary record fields using string.Substitute. For example, if the job parameters include "reportDate": "${yesterday}", the chain receives reportDate = "2026-03-02" and can substitute it in an aggregation string as ${reportDate}.
For ETL chains that are not scheduler-triggered, compute dates using date.Today and date.Add-Subtract, or timestamp.Now.
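The two-stage resolution (scheduler first, chain second) can be illustrated in Python. This is a sketch under assumptions: scheduler_variables and resolve_parameters are hypothetical names, the date formats are inferred from the examples in the table, and the real scheduler may round or clamp values differently.

```python
import calendar
from datetime import date, datetime, timedelta, timezone

def scheduler_variables(now: datetime) -> dict:
    """Compute illustrative values for the built-in variables."""
    today = now.date()
    # One month ago: same day in the previous month, clamped to its length.
    if today.month > 1:
        year, month = today.year, today.month - 1
    else:
        year, month = today.year - 1, 12
    day = min(today.day, calendar.monthrange(year, month)[1])
    return {
        "today": today.isoformat(),
        "yesterday": (today - timedelta(days=1)).isoformat(),
        "lastHour": (now - timedelta(hours=1)).strftime("%Y-%m-%dT%H:%M:%SZ"),
        "lastWeek": (today - timedelta(days=7)).isoformat(),
        "lastMonth": date(year, month, day).isoformat(),
    }

def resolve_parameters(params: dict, variables: dict) -> dict:
    """Replace string values of the exact form ${name} with the variable value."""
    resolved = {}
    for key, value in params.items():
        if isinstance(value, str) and value.startswith("${") and value.endswith("}"):
            resolved[key] = variables.get(value[2:-1], value)
        else:
            resolved[key] = value
    return resolved

now = datetime(2026, 3, 3, 13, 0, 0, tzinfo=timezone.utc)
params = {"reportDate": "${yesterday}", "reportType": "daily"}
record = resolve_parameters(params, scheduler_variables(now))
print(record)  # {'reportDate': '2026-03-02', 'reportType': 'daily'}
```

The chain then sees reportDate as an ordinary field and never encounters the ${yesterday} placeholder itself.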
Custom Parameters
Define custom parameters in the job configuration:
{
"reportType": "sales",
"region": "EMEA",
"includeDetails": true,
"emailRecipients": ["manager@example.com", "analyst@example.com"]
}
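Each key becomes a field of the initial record. Access these in your ETL chain using field references, for example ${region} in a string handled by string.Substitute.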
Dynamic Date Ranges
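Date ranges can be expressed with placeholders instead of fixed dates. Note that ${firstDayOfMonth} and ${lastDayOfMonth} are not listed in the scheduler variables table above; if the scheduler does not resolve them, compute the values inside the chain using date.Today and date.Add-Subtract.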
{
"startDate": "${firstDayOfMonth}",
"endDate": "${lastDayOfMonth}",
"reportPeriod": "monthly"
}
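For reference, the date arithmetic behind such ranges is small. The sketch below shows it in Python using only the standard library; month_range and days_ago are hypothetical helper names, and inside a chain the equivalent work would be done with the date steps rather than Python.

```python
import calendar
from datetime import date, timedelta

def month_range(day: date) -> tuple[str, str]:
    """Return ISO strings for the first and last day of the month containing `day`."""
    last = calendar.monthrange(day.year, day.month)[1]
    return day.replace(day=1).isoformat(), day.replace(day=last).isoformat()

def days_ago(day: date, n: int) -> str:
    """Return the ISO date `n` days before `day` (e.g. a 30-day cleanup cutoff)."""
    return (day - timedelta(days=n)).isoformat()

start, end = month_range(date(2026, 3, 3))
print(start, end)                      # 2026-03-01 2026-03-31
print(days_ago(date(2026, 3, 3), 30))  # 2026-02-01
```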
Error Handling in Scheduled Jobs
Chain-Level Error Handling
Design ETL chains to handle errors gracefully:
1. Try to process records
2. On error:
- Log error details
- Write failed records to error collection
- Continue processing remaining records
3. Send summary email
- Report success count
- Report error count
- Include error details if any
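The pattern above (log, set the failed record aside, keep going, then summarize) can be sketched in Python. This only illustrates the control flow; process_all, to_cents, and the in-memory error list standing in for an error collection are assumptions made for the example.

```python
import logging

logger = logging.getLogger("etl.chain")

def process_all(records, process, error_sink):
    """Process each record; on failure, log, store the record, and continue."""
    succeeded = 0
    failed = 0
    for record in records:
        try:
            process(record)
            succeeded += 1
        except Exception as exc:
            failed += 1
            logger.error("Record failed: %s", exc)
            # Stand-in for writing the failed record to an error collection.
            error_sink.append({"record": record, "error": str(exc)})
    # Summary that a final email step could report.
    return {"success": succeeded, "errors": failed}

def to_cents(record):
    """Example processing step: fails on non-numeric amounts."""
    record["cents"] = round(float(record["amount"]) * 100)

errors = []
records = [{"amount": "1.50"}, {"amount": "n/a"}, {"amount": "2"}]
summary = process_all(records, to_cents, errors)
print(summary)  # {'success': 2, 'errors': 1}
```

One bad record does not abort the run, and the summary still makes the failure visible.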
Job-Level Error Handling
When a scheduled job fails, it is not automatically restarted. Instead, the job owner and any users with the mod-production-manager privilege receive a notification.
Automatic restart is intentionally not supported: by the time a failure is detected, the job may be in an indeterminate state. Rollback is often not feasible — for example, emails that have been sent or MQTT messages that have been published cannot be recalled. Restarting from an unknown midpoint could cause duplicate side-effects or data corruption.
If a failed job needs to be re-run, it should be triggered manually after the root cause has been investigated and resolved.
Monitoring Job Health
Regularly review:
- Job execution history
- Success/failure rates
- Execution duration trends
- Error patterns

Set up alerts for:
- Job failures
- Unusually long execution times
- Repeated errors
- Jobs not running on schedule
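If job history is exported for analysis, the review criteria above reduce to a few simple calculations. The following Python sketch assumes a hypothetical history format (a list of runs with status and seconds fields) and an arbitrary rule that flags runs taking more than twice the median duration; neither comes from the Scheduler itself.

```python
from statistics import median

def job_health(history, slow_factor=2.0):
    """Summarize failure rate and unusually slow runs from a job history."""
    ok_runs = [run for run in history if run["status"] == "success"]
    failures = len(history) - len(ok_runs)
    typical = median(run["seconds"] for run in ok_runs) if ok_runs else 0.0
    slow = [run for run in ok_runs if run["seconds"] > slow_factor * typical]
    return {
        "failure_rate": failures / len(history) if history else 0.0,
        "typical_seconds": typical,
        "slow_runs": slow,
    }

history = [
    {"status": "success", "seconds": 60},
    {"status": "success", "seconds": 62},
    {"status": "failed", "seconds": 5},
    {"status": "success", "seconds": 200},
]
report = job_health(history)
print(report["failure_rate"])  # 0.25
print(report["slow_runs"])     # [{'status': 'success', 'seconds': 200}]
```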
Performance Considerations
Avoid Peak Times
Schedule resource-intensive jobs during off-peak hours:
- Run large data processing at night
- Avoid scheduling during business hours
- Stagger multiple jobs to avoid conflicts
Optimize Chain Performance
For scheduled chains:
- Filter data at the source (database queries)
- Process only changed data (incremental processing)
- Use efficient step patterns
- Avoid unnecessary transformations
Incremental Processing
Instead of processing all data every time:
1. Read last processed timestamp from control collection
2. Query only records modified since last run
3. Process new/changed records
4. Update control collection with current timestamp
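Because each run touches only the records changed since the previous run, the work scales with the volume of changes rather than with the total size of the collection.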
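The four steps can be sketched in Python with in-memory stand-ins. The control dictionary plays the role of the control collection, and run_incremental, the lastProcessed key, and the lastModified field are names assumed for the example.

```python
from datetime import datetime, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)

def run_incremental(control, records, process, now):
    """One incremental run: handle only records modified since the last run."""
    # 1. Read the last processed timestamp (first run: process everything).
    last_run = control.get("lastProcessed", EPOCH)
    # 2. Select only records modified after the last run, up to `now`.
    changed = [r for r in records if last_run < r["lastModified"] <= now]
    # 3. Process the new or changed records.
    for record in changed:
        process(record)
    # 4. Advance the watermark only after processing succeeded.
    control["lastProcessed"] = now
    return len(changed)

def utc(hour):
    return datetime(2026, 3, 3, hour, tzinfo=timezone.utc)

control = {}
records = [{"id": 1, "lastModified": utc(9)}, {"id": 2, "lastModified": utc(10)}]
seen = []
first = run_incremental(control, records, seen.append, now=utc(11))
records.append({"id": 3, "lastModified": utc(12)})
second = run_incremental(control, records, seen.append, now=utc(13))
print(first, second)  # 2 1
```

Bounding the selection by `now` and storing that same value as the new watermark keeps records modified during a run from being skipped by the next one.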
Parallel Processing
For very large datasets, consider:
- Breaking into multiple smaller jobs
- Processing different data segments in parallel
- Using separate chainsets to enable concurrent execution
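The idea of splitting a dataset into segments that are handled concurrently can be shown in plain Python. This does not describe how the Scheduler runs jobs; process_segment and split are hypothetical stand-ins for separate smaller jobs working on disjoint slices of the data.

```python
from concurrent.futures import ThreadPoolExecutor

def process_segment(segment):
    """Stand-in for one smaller job; returns the number of records handled."""
    return len(segment)

def split(records, size):
    """Cut the record list into consecutive segments of at most `size` items."""
    return [records[i:i + size] for i in range(0, len(records), size)]

records = list(range(10))
segments = split(records, 4)

# Each segment is processed by its own worker; results keep segment order.
with ThreadPoolExecutor(max_workers=3) as pool:
    counts = list(pool.map(process_segment, segments))

print(counts, sum(counts))  # [4, 4, 2] 10
```

The segments must not overlap, otherwise records would be processed twice.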
Best Practices
Test Before Scheduling
- Create and test the ETL chain manually
- Test with realistic data volumes
- Test error scenarios
- Verify email notifications work
- Only then schedule the job
Start with Disabled Jobs
When creating a new scheduled job:
1. Create it in disabled state
2. Verify the schedule is correct
3. Test manually first
4. Enable only when confident
Monitor Initially
When first scheduling a job:
- Monitor the first few executions closely
- Check results are as expected
- Verify timing is appropriate
- Adjust schedule if needed
Document Job Purpose
For each scheduled job, document:
- What it does
- Why it runs at that schedule
- Who depends on it
- What to do if it fails
- Contact person for issues
Use Meaningful Names
Name jobs clearly:
- ✅ “Daily Sales Report - 8AM”
- ✅ “Hourly Customer Sync to CRM”
- ❌ “Job1”, “Test”, “ETL”
Set Up Alerts
Configure alerts for:
- Job failures
- Jobs taking too long
- Jobs not producing expected results
- Critical jobs not running
Review Regularly
Periodically review scheduled jobs:
- Are they still needed?
- Is the schedule still appropriate?
- Are they performing well?
- Can they be optimized?
- Should they be consolidated?
Troubleshooting
Job Doesn’t Run
Check:
- Job is enabled
- Schedule is correct (check timezone)
- Chainset is enabled
- Chain exists and is valid
- User has appropriate privileges
Job Fails
Check:
- ETL chain results for error messages
- Database connections are valid
- External APIs are accessible
- Required data exists
- Parameters are correct
Job Runs Too Long
Optimize:
- Add filters to reduce data volume
- Use incremental processing
- Optimize database queries
- Remove unnecessary steps
- Consider breaking into smaller jobs
Unexpected Results
Verify:
- Input parameters are correct
- Date variables are as expected
- Chain logic is correct

Then test the chain manually with the same parameters and check whether the source data has changed since the last expected result.
Example: Complete Scheduled Report
Here’s a complete example of a scheduled daily report:
Scheduler Job Configuration:
Name: Daily Sales Report
Type: ETL
Chainset: ReportGeneration
Chain: DailySalesReport
Schedule: Every day at 8:00 AM
Enabled: Yes
Parameters:
{
"reportDate": "${yesterday}",
"recipients": ["sales@example.com", "management@example.com"]
}
ETL Chain: “DailySalesReport”
1. JSON Record (get parameters)
- Receives: { reportDate: "2026-03-02", recipients: [...] }
2. mongodb.AggregationDefinition
Pool: your-pool-name
Collection: orders
Aggregation: [{"$match": {"orderDate": "${reportDate}", "status": "completed"}}]
3. string.Substitute
Field: aggregation
4. mongodb.Reader
5. Aggregate by Product
- Group by: productCategory
- Sum: orderTotal
- Count: orderCount
6. Sort by Total Descending
- Sort field: orderTotal
- Order: descending
7. Generate Excel
- Template: sales_report_template.xlsx
- Sheet: Daily Sales
- Include charts
8. Compose Email
- To: ${recipients}
- Subject: "Daily Sales Report - ${reportDate}"
- Body: "Please find attached the daily sales report."
- Attachment: generated Excel file
9. Send Mail
- Send the composed email
10. timestamp.Now
- To Field: now
11. MongoDB Writer (log completion)
- Pool: your-pool-name
- Collection: report_log
- Document: { reportType: "daily_sales", date: "${reportDate}",
status: "sent", timestamp: "${now}" }
This chain runs automatically every morning, generates the report, and emails it to the sales and management teams.
Next Steps
- Workflow Integration - Use ETL in business processes
- Dataset Integration - Provide data for visualizations
- Examples - More complete examples
- Best Practices - Design effective ETL chains