Dataset & Dashboard Integration

Overview

ETL chains can provide data for Datasets and Dashboards, enabling interactive data visualization and exploration. This integration allows you to: - Create parameterized data queries - Transform and aggregate data on-demand - Provide real-time data for dashboards - Support interactive filtering and drill-down

How It Works

When a Dataset or Dashboard requests data: 1. User interacts with dataset/dashboard (applies filters, selects parameters) 2. Dataset configuration specifies an ETL chain to execute 3. Parameters are passed to the ETL chain as input record 4. ETL chain executes, transforming and retrieving data 5. Results are returned to the dataset/dashboard 6. Data is displayed to the user

Creating a Dataset with ETL

Step 1: Create the ETL Chain

Design an ETL chain that: - Accepts parameters via input record - Queries and transforms data - Returns results in expected format

Example Chain: “GetSalesByRegion”

1. JSON Record (receives parameters)
   Input: { "region": "EMEA", "startDate": "2026-01-01", "endDate": "2026-03-31" }

2. mongodb.AggregationDefinition
   Pool: your-pool-name
   Collection: orders
   Aggregation: [{"$match": {"region": "${region}",
                             "orderDate": {"$gte": "${startDate}", "$lte": "${endDate}"}}}]

3. string.Substitute
   Field: aggregation

4. mongodb.Reader

5. Aggregate by Product
   Group by: productCategory
   Sum: orderTotal
   Count: orderCount

6. Sort by Total Descending
   Sort field: orderTotal

7. Return results
   (Results automatically returned to dataset)

Step 2: Configure the Dataset

In the Dataset module: 1. Create a new dataset 2. Select “ETL” as the data source type 3. Choose the chainset and chain 4. Define parameters that users can set 5. Configure display options

Step 3: Test the Dataset

Test with different parameter values to ensure: - Data is retrieved correctly - Transformations work as expected - Performance is acceptable - Results display properly

Parameterized Queries

Datasets can pass parameters to ETL chains, enabling dynamic queries.

Defining Parameters

In the dataset configuration, define parameters: - Parameter name - Used in ETL chain - Parameter type - String, number, date, boolean - Default value - Used if user doesn’t specify - Required - Whether parameter must be provided

Using Parameters in ETL Chains

Parameters are available in the input record:

ETL Chain: "GetCustomerOrders"

1. JSON Record (receives parameters)
   Input: {
     "customerId": "C12345",
     "startDate": "2026-01-01",
     "endDate": "2026-03-31",
     "status": "completed"
   }

2. mongodb.AggregationDefinition
   Pool: your-pool-name
   Collection: orders
   Aggregation: [{"$match": {"customerId": "${customerId}",
                             "orderDate": {"$gte": "${startDate}", "$lte": "${endDate}"},
                             "status": "${status}"}}]

3. string.Substitute
   Field: aggregation

4. mongodb.Reader

5. Transform and return results

Parameter Validation

Validate parameters in the ETL chain:

1. JSON Record (receives parameters)

2. Validate Required Parameters
   - Check customerId exists
   - Check dates are valid
   - If invalid: return error message

3. Validate Parameter Values
   - Check date range is reasonable
   - Check status is valid value
   - If invalid: return error message

4. Continue with query if valid

Common Patterns

Pattern 1: Simple Data Retrieval

Dataset: "Active Customers"
ETL Chain: "GetActiveCustomers"

1. mongodb.AggregationDefinition
   Pool: your-pool-name
   Collection: customers
   Aggregation: [{"$match": {"status": "active"}}]

2. mongodb.Reader

3. Sort by name

4. Return results

Pattern 2: Aggregated Metrics

Dataset: "Sales Summary"
ETL Chain: "GetSalesSummary"

1. MongoDB Reader (read orders)

2. Aggregate
   - Group by: month
   - Sum: orderTotal
   - Count: orderCount
   - Average: orderTotal

3. Calculate growth percentages

4. Return summary data

Pattern 3: Multi-Source Join

Dataset: "Customer Orders with Details"
ETL Chain: "GetCustomerOrderDetails"

1. MongoDB Reader (read orders)

2. For each order:
   - Chain Call "GetCustomerInfo"
   - Chain Call "GetProductDetails"
   - Merge information

3. Return enriched results

Pattern 4: Filtered and Sorted

Dataset: "Top Products by Sales"
Parameters: region, timeframe

ETL Chain: "GetTopProducts"

1. mongodb.AggregationDefinition
   Pool: your-pool-name
   Collection: orders
   Aggregation: [{"$match": {"region": "${region}", "date": "${timeframe}"}}]

2. string.Substitute
   Field: aggregation

3. mongodb.Reader

4. Aggregate by product
   Sum: sales

5. Sort descending

6. Limit to top 10

7. Return results

Performance Optimization

Dataset ETL chains must be fast since users are waiting for results.

Optimize Database Queries

  • Use specific queries to limit results at source
  • Create indexes on frequently queried fields
  • Use projections to retrieve only needed fields
  • Avoid reading unnecessary data

Filter Early

Apply filters as early as possible:

Good:
1. MongoDB Reader with query filter
2. Additional filtering if needed
3. Transform
4. Return

Avoid:
1. MongoDB Reader (all data)
2. Multiple transformation steps
3. Filter at the end
4. Return

Limit Result Size

Datasets should return reasonable amounts of data: - Use LIMIT clauses - Implement pagination if needed - Aggregate when possible - Return summaries rather than details

Cache When Appropriate

For data that doesn’t change frequently: - Consider caching results - Set appropriate cache expiration - Invalidate cache when data changes - Balance freshness vs performance

Dashboard Integration

Dashboards use datasets to display visualizations.

Dashboard Widgets

Different widget types work with ETL-based datasets: - Tables - Display tabular data - Charts - Bar, line, pie charts - Metrics - Single value displays - Maps - Geographic visualizations

Real-Time Dashboards

For real-time dashboards: - Keep ETL chains very fast (< 1 second) - Use efficient queries - Consider pre-aggregated data - Implement caching where appropriate

Interactive Dashboards

Support user interaction: - Accept filter parameters - Support drill-down (pass parameters to detail views) - Enable date range selection - Allow sorting and grouping

Example: Sales Dashboard

Here’s a complete example of a dashboard with multiple datasets.

Dashboard: “Sales Overview”

Widget 1: Total Sales (Metric)

Dataset: "TotalSales"
Parameters: startDate, endDate

ETL Chain: "GetTotalSales"
  1. mongodb.AggregationDefinition
     Pool: your-pool-name
     Collection: orders
     Aggregation: [{"$match": {"orderDate": {"$gte": "${startDate}", "$lte": "${endDate}"}}}]
  
  2. string.Substitute
     Field: aggregation
  
  3. mongodb.Reader
  
  4. Aggregate
     Sum: orderTotal
  
  5. Return: { totalSales: <sum> }

Widget 2: Sales by Region (Bar Chart)

Dataset: "SalesByRegion"
Parameters: startDate, endDate

ETL Chain: "GetSalesByRegion"
  1. mongodb.AggregationDefinition
     Pool: your-pool-name
     Collection: orders
     Aggregation: [{"$match": {"orderDate": {"$gte": "${startDate}", "$lte": "${endDate}"}}}]
  
  2. string.Substitute
     Field: aggregation
  
  3. mongodb.Reader
  
  4. Aggregate by region
     Sum: orderTotal
  
  5. Sort by total descending
  
  6. Return: [
       { region: "EMEA", sales: 150000 },
       { region: "AMER", sales: 120000 },
       ...
     ]

Widget 3: Top Products (Table)

Dataset: "TopProducts"
Parameters: startDate, endDate, limit

ETL Chain: "GetTopProducts"
  1. mongodb.AggregationDefinition
     Pool: your-pool-name
     Collection: orders
     Aggregation: [{"$match": {"orderDate": {"$gte": "${startDate}", "$lte": "${endDate}"}}}]
  
  2. string.Substitute
     Field: aggregation
  
  3. mongodb.Reader
  
  4. Aggregate by product
     Sum: orderTotal
     Count: orderCount
  
  5. Sort by total descending
  
  6. Limit to ${limit} results
  
  7. Return product list with sales data

Widget 4: Sales Trend (Line Chart)

Dataset: "SalesTrend"
Parameters: startDate, endDate, groupBy

ETL Chain: "GetSalesTrend"
  1. mongodb.AggregationDefinition
     Pool: your-pool-name
     Collection: orders
     Aggregation: [{"$match": {"orderDate": {"$gte": "${startDate}", "$lte": "${endDate}"}}}]
  
  2. string.Substitute
     Field: aggregation
  
  3. mongodb.Reader
  
  4. Aggregate by time period (${groupBy}: day/week/month)
     Sum: orderTotal
  
  5. Sort by date
  
  6. Return: [
       { date: "2026-01-01", sales: 5000 },
       { date: "2026-01-02", sales: 5500 },
       ...
     ]

Best Practices

Design for Speed

Dataset ETL chains should execute quickly: - Target < 1 second for interactive use - Target < 5 seconds for complex queries - Optimize database queries - Use indexes appropriately - Limit result sizes

Handle Parameters Carefully

  • Validate all parameters
  • Provide sensible defaults
  • Handle missing parameters gracefully
  • Sanitize inputs to prevent injection

Return Consistent Formats

Return data in consistent, expected formats: - Use standard field names - Return arrays for lists - Return objects for single values - Include metadata when helpful

Test with Realistic Data

Test dataset chains with: - Realistic data volumes - Various parameter combinations - Edge cases (empty results, large results) - Performance under load

Document Expected Format

Document the expected output format: - Field names and types - Data structure - Units and formatting - Example output

Monitor Performance

Track dataset performance: - Execution time - Result sizes - Error rates - Usage patterns

Error Handling

Handle Missing Data

When data doesn’t exist: - Return empty array (not error) - Include helpful message - Log for monitoring

1. mongodb.AggregationDefinition / string.Substitute / mongodb.Reader
   (emits zero records if no match โ€” downstream steps receive no input for that record)

2. Check if results empty
   If empty: return { data: [], message: "No data found for period" }

3. If data exists: continue processing

Handle Invalid Parameters

When parameters are invalid: - Return error with helpful message - Don’t execute expensive queries - Log invalid requests

1. Validate parameters
   If invalid: return { error: "Invalid date range" }

2. If valid: continue

Handle Query Failures

When database queries fail: - Return error message - Log error details - Don’t expose sensitive information

Caching Strategies

When to Cache

Cache dataset results when: - Data changes infrequently - Queries are expensive - Same parameters requested often - Real-time data not required

Cache Invalidation

Invalidate cache when: - Underlying data changes - Cache expires (time-based) - User requests refresh - Data quality issues detected

Cache Configuration

Configure caching appropriately: - Set reasonable expiration times - Consider parameter variations - Monitor cache hit rates - Balance freshness vs performance

Example: Customer Analytics Dataset

Complete example of a dataset for customer analytics:

Dataset Configuration:

Name: Customer Analytics
Description: Customer purchase behavior and metrics
Data Source: ETL Chain
Chainset: Analytics
Chain: GetCustomerAnalytics
Parameters:
  - customerId (required)
  - startDate (default: 90 days ago)
  - endDate (default: today)
Caching: 1 hour

ETL Chain: “GetCustomerAnalytics”

1. JSON Record (receive parameters)
   Input: { customerId: "C12345", startDate: "...", endDate: "..." }

2. Validate Parameters
   - Check customerId exists
   - Check date range valid
   - If invalid: return error

3. Get Customer Info
   - mongodb.AggregationDefinition: Pool = your-pool-name, Collection = customers,
     Aggregation = [{"$match": {"customerId": "${customerId}"}}]
   - string.Substitute: Field = aggregation
   - mongodb.Reader

4. Get Customer Orders
   - mongodb.AggregationDefinition: Pool = your-pool-name, Collection = orders,
     Aggregation = [{"$match": {"customerId": "${customerId}",
                                "orderDate": {"$gte": "${startDate}", "$lte": "${endDate}"}}}]
   - string.Substitute: Field = aggregation
   - mongodb.Reader

5. Calculate Metrics
   - Total orders
   - Total spent
   - Average order value
   - Favorite products
   - Order frequency

6. Get Customer Segment
   - Chain Call "DetermineCustomerSegment"
   - Based on purchase behavior

7. Format Results
   - Combine customer info, metrics, segment
   - Return structured data:
     {
       customer: { id, name, email, ... },
       metrics: { totalOrders, totalSpent, avgOrderValue, ... },
       segment: "High Value",
       favoriteProducts: [...],
       orderHistory: [...]
     }

This dataset can be used in: - Customer detail pages - Analytics dashboards - Reporting tools - Customer service interfaces

Next Steps