Dataset & Dashboard Integration
Overview
ETL chains can provide data for Datasets and Dashboards, enabling interactive data visualization and exploration. This integration allows you to: - Create parameterized data queries - Transform and aggregate data on-demand - Provide real-time data for dashboards - Support interactive filtering and drill-down
How It Works
When a Dataset or Dashboard requests data: 1. User interacts with dataset/dashboard (applies filters, selects parameters) 2. Dataset configuration specifies an ETL chain to execute 3. Parameters are passed to the ETL chain as input record 4. ETL chain executes, transforming and retrieving data 5. Results are returned to the dataset/dashboard 6. Data is displayed to the user
Creating a Dataset with ETL
Step 1: Create the ETL Chain
Design an ETL chain that: - Accepts parameters via input record - Queries and transforms data - Returns results in expected format
Example Chain: “GetSalesByRegion”
1. JSON Record (receives parameters)
Input: { "region": "EMEA", "startDate": "2026-01-01", "endDate": "2026-03-31" }
2. mongodb.AggregationDefinition
Pool: your-pool-name
Collection: orders
Aggregation: [{"$match": {"region": "${region}",
"orderDate": {"$gte": "${startDate}", "$lte": "${endDate}"}}}]
3. string.Substitute
Field: aggregation
4. mongodb.Reader
5. Aggregate by Product
Group by: productCategory
Sum: orderTotal
Count: orderCount
6. Sort by Total Descending
Sort field: orderTotal
7. Return results
(Results automatically returned to dataset)
Step 2: Configure the Dataset
In the Dataset module: 1. Create a new dataset 2. Select “ETL” as the data source type 3. Choose the chainset and chain 4. Define parameters that users can set 5. Configure display options
Step 3: Test the Dataset
Test with different parameter values to ensure: - Data is retrieved correctly - Transformations work as expected - Performance is acceptable - Results display properly
Parameterized Queries
Datasets can pass parameters to ETL chains, enabling dynamic queries.
Defining Parameters
In the dataset configuration, define parameters: - Parameter name - Used in ETL chain - Parameter type - String, number, date, boolean - Default value - Used if user doesn’t specify - Required - Whether parameter must be provided
Using Parameters in ETL Chains
Parameters are available in the input record:
ETL Chain: "GetCustomerOrders"
1. JSON Record (receives parameters)
Input: {
"customerId": "C12345",
"startDate": "2026-01-01",
"endDate": "2026-03-31",
"status": "completed"
}
2. mongodb.AggregationDefinition
Pool: your-pool-name
Collection: orders
Aggregation: [{"$match": {"customerId": "${customerId}",
"orderDate": {"$gte": "${startDate}", "$lte": "${endDate}"},
"status": "${status}"}}]
3. string.Substitute
Field: aggregation
4. mongodb.Reader
5. Transform and return results
Parameter Validation
Validate parameters in the ETL chain:
1. JSON Record (receives parameters)
2. Validate Required Parameters
- Check customerId exists
- Check dates are valid
- If invalid: return error message
3. Validate Parameter Values
- Check date range is reasonable
- Check status is valid value
- If invalid: return error message
4. Continue with query if valid
Common Patterns
Pattern 1: Simple Data Retrieval
Dataset: "Active Customers"
ETL Chain: "GetActiveCustomers"
1. mongodb.AggregationDefinition
Pool: your-pool-name
Collection: customers
Aggregation: [{"$match": {"status": "active"}}]
2. mongodb.Reader
3. Sort by name
4. Return results
Pattern 2: Aggregated Metrics
Dataset: "Sales Summary"
ETL Chain: "GetSalesSummary"
1. MongoDB Reader (read orders)
2. Aggregate
- Group by: month
- Sum: orderTotal
- Count: orderCount
- Average: orderTotal
3. Calculate growth percentages
4. Return summary data
Pattern 3: Multi-Source Join
Dataset: "Customer Orders with Details"
ETL Chain: "GetCustomerOrderDetails"
1. MongoDB Reader (read orders)
2. For each order:
- Chain Call "GetCustomerInfo"
- Chain Call "GetProductDetails"
- Merge information
3. Return enriched results
Pattern 4: Filtered and Sorted
Dataset: "Top Products by Sales"
Parameters: region, timeframe
ETL Chain: "GetTopProducts"
1. mongodb.AggregationDefinition
Pool: your-pool-name
Collection: orders
Aggregation: [{"$match": {"region": "${region}", "date": "${timeframe}"}}]
2. string.Substitute
Field: aggregation
3. mongodb.Reader
4. Aggregate by product
Sum: sales
5. Sort descending
6. Limit to top 10
7. Return results
Performance Optimization
Dataset ETL chains must be fast since users are waiting for results.
Optimize Database Queries
- Use specific queries to limit results at source
- Create indexes on frequently queried fields
- Use projections to retrieve only needed fields
- Avoid reading unnecessary data
Filter Early
Apply filters as early as possible:
Good:
1. MongoDB Reader with query filter
2. Additional filtering if needed
3. Transform
4. Return
Avoid:
1. MongoDB Reader (all data)
2. Multiple transformation steps
3. Filter at the end
4. Return
Limit Result Size
Datasets should return reasonable amounts of data: - Use LIMIT clauses - Implement pagination if needed - Aggregate when possible - Return summaries rather than details
Cache When Appropriate
For data that doesn’t change frequently: - Consider caching results - Set appropriate cache expiration - Invalidate cache when data changes - Balance freshness vs performance
Dashboard Integration
Dashboards use datasets to display visualizations.
Dashboard Widgets
Different widget types work with ETL-based datasets: - Tables - Display tabular data - Charts - Bar, line, pie charts - Metrics - Single value displays - Maps - Geographic visualizations
Real-Time Dashboards
For real-time dashboards: - Keep ETL chains very fast (< 1 second) - Use efficient queries - Consider pre-aggregated data - Implement caching where appropriate
Interactive Dashboards
Support user interaction: - Accept filter parameters - Support drill-down (pass parameters to detail views) - Enable date range selection - Allow sorting and grouping
Example: Sales Dashboard
Here’s a complete example of a dashboard with multiple datasets.
Dashboard: “Sales Overview”
Widget 1: Total Sales (Metric)
Dataset: "TotalSales"
Parameters: startDate, endDate
ETL Chain: "GetTotalSales"
1. mongodb.AggregationDefinition
Pool: your-pool-name
Collection: orders
Aggregation: [{"$match": {"orderDate": {"$gte": "${startDate}", "$lte": "${endDate}"}}}]
2. string.Substitute
Field: aggregation
3. mongodb.Reader
4. Aggregate
Sum: orderTotal
5. Return: { totalSales: <sum> }
Widget 2: Sales by Region (Bar Chart)
Dataset: "SalesByRegion"
Parameters: startDate, endDate
ETL Chain: "GetSalesByRegion"
1. mongodb.AggregationDefinition
Pool: your-pool-name
Collection: orders
Aggregation: [{"$match": {"orderDate": {"$gte": "${startDate}", "$lte": "${endDate}"}}}]
2. string.Substitute
Field: aggregation
3. mongodb.Reader
4. Aggregate by region
Sum: orderTotal
5. Sort by total descending
6. Return: [
{ region: "EMEA", sales: 150000 },
{ region: "AMER", sales: 120000 },
...
]
Widget 3: Top Products (Table)
Dataset: "TopProducts"
Parameters: startDate, endDate, limit
ETL Chain: "GetTopProducts"
1. mongodb.AggregationDefinition
Pool: your-pool-name
Collection: orders
Aggregation: [{"$match": {"orderDate": {"$gte": "${startDate}", "$lte": "${endDate}"}}}]
2. string.Substitute
Field: aggregation
3. mongodb.Reader
4. Aggregate by product
Sum: orderTotal
Count: orderCount
5. Sort by total descending
6. Limit to ${limit} results
7. Return product list with sales data
Widget 4: Sales Trend (Line Chart)
Dataset: "SalesTrend"
Parameters: startDate, endDate, groupBy
ETL Chain: "GetSalesTrend"
1. mongodb.AggregationDefinition
Pool: your-pool-name
Collection: orders
Aggregation: [{"$match": {"orderDate": {"$gte": "${startDate}", "$lte": "${endDate}"}}}]
2. string.Substitute
Field: aggregation
3. mongodb.Reader
4. Aggregate by time period (${groupBy}: day/week/month)
Sum: orderTotal
5. Sort by date
6. Return: [
{ date: "2026-01-01", sales: 5000 },
{ date: "2026-01-02", sales: 5500 },
...
]
Best Practices
Design for Speed
Dataset ETL chains should execute quickly: - Target < 1 second for interactive use - Target < 5 seconds for complex queries - Optimize database queries - Use indexes appropriately - Limit result sizes
Handle Parameters Carefully
- Validate all parameters
- Provide sensible defaults
- Handle missing parameters gracefully
- Sanitize inputs to prevent injection
Return Consistent Formats
Return data in consistent, expected formats: - Use standard field names - Return arrays for lists - Return objects for single values - Include metadata when helpful
Test with Realistic Data
Test dataset chains with: - Realistic data volumes - Various parameter combinations - Edge cases (empty results, large results) - Performance under load
Document Expected Format
Document the expected output format: - Field names and types - Data structure - Units and formatting - Example output
Monitor Performance
Track dataset performance: - Execution time - Result sizes - Error rates - Usage patterns
Error Handling
Handle Missing Data
When data doesn’t exist: - Return empty array (not error) - Include helpful message - Log for monitoring
1. mongodb.AggregationDefinition / string.Substitute / mongodb.Reader
(emits zero records if no match โ downstream steps receive no input for that record)
2. Check if results empty
If empty: return { data: [], message: "No data found for period" }
3. If data exists: continue processing
Handle Invalid Parameters
When parameters are invalid: - Return error with helpful message - Don’t execute expensive queries - Log invalid requests
1. Validate parameters
If invalid: return { error: "Invalid date range" }
2. If valid: continue
Handle Query Failures
When database queries fail: - Return error message - Log error details - Don’t expose sensitive information
Caching Strategies
When to Cache
Cache dataset results when: - Data changes infrequently - Queries are expensive - Same parameters requested often - Real-time data not required
Cache Invalidation
Invalidate cache when: - Underlying data changes - Cache expires (time-based) - User requests refresh - Data quality issues detected
Cache Configuration
Configure caching appropriately: - Set reasonable expiration times - Consider parameter variations - Monitor cache hit rates - Balance freshness vs performance
Example: Customer Analytics Dataset
Complete example of a dataset for customer analytics:
Dataset Configuration:
Name: Customer Analytics
Description: Customer purchase behavior and metrics
Data Source: ETL Chain
Chainset: Analytics
Chain: GetCustomerAnalytics
Parameters:
- customerId (required)
- startDate (default: 90 days ago)
- endDate (default: today)
Caching: 1 hour
ETL Chain: “GetCustomerAnalytics”
1. JSON Record (receive parameters)
Input: { customerId: "C12345", startDate: "...", endDate: "..." }
2. Validate Parameters
- Check customerId exists
- Check date range valid
- If invalid: return error
3. Get Customer Info
- mongodb.AggregationDefinition: Pool = your-pool-name, Collection = customers,
Aggregation = [{"$match": {"customerId": "${customerId}"}}]
- string.Substitute: Field = aggregation
- mongodb.Reader
4. Get Customer Orders
- mongodb.AggregationDefinition: Pool = your-pool-name, Collection = orders,
Aggregation = [{"$match": {"customerId": "${customerId}",
"orderDate": {"$gte": "${startDate}", "$lte": "${endDate}"}}}]
- string.Substitute: Field = aggregation
- mongodb.Reader
5. Calculate Metrics
- Total orders
- Total spent
- Average order value
- Favorite products
- Order frequency
6. Get Customer Segment
- Chain Call "DetermineCustomerSegment"
- Based on purchase behavior
7. Format Results
- Combine customer info, metrics, segment
- Return structured data:
{
customer: { id, name, email, ... },
metrics: { totalOrders, totalSpent, avgOrderValue, ... },
segment: "High Value",
favoriteProducts: [...],
orderHistory: [...]
}
This dataset can be used in: - Customer detail pages - Analytics dashboards - Reporting tools - Customer service interfaces
Next Steps
- Scheduler Integration - Automate ETL chains with scheduling
- Workflow Integration - Use ETL in business processes
- Examples - More complete examples
- Best Practices - Design effective ETL chains