Case Study

Case Study
Prev	Chapter 4. Composite DataSource	Next

In this case study, we will use the Sales, Customer and Stores tables of the Mondrian Database in conjunction with a variety of processors and generate the output into different file formats. The data manipulations we will illustrate are extraction, merging, filtering, derivation, caching and transformation.

Before we begin, you should ensure the Mondrian datasource is configured as described in the section called “Using the JDBC/ODBC bridge driver”.

Adding the DataSources

Launch Elixir Repertoire. Choose or create a new file system or folder for this case study and from the popup menu choose Add->Datasource. In the DataSource Wizard that appears select the JDBC DataSource. Click Next.

In the Define JDBC DataSource screen, enter the DataSource name as Store. Select the JDBC/ODBC bridge(Sun JVM) as the driver specification. Enter the URL as jdbc:odbc:MondrianFoodMart. Click Next.

In the SQL window, add the following query:

Select * from Store

Click Next and then click Infer Schema. Similarly, add the Sales data source with SQL:

Select * from Sales_Fact_1997

Again, infer the schema. Similarly, add a Customer data source. In the SQL window enter:

Select * from Customer

and again infer the schema.

Add a Composite DataSource named Case Study.

Creating a Composite DataSource

After adding the Composite DataSource, it will open automatically. We are going to create the diagram as shown in Figure 4.72, “Case Study Composite Diagram”. Select the Customer DataSource and drag and place it on the diagram. Repeat the process for the Sales DataSource and then the Store DataSource.

Figure 4.72. Case Study Composite Diagram

Add the additional processors and connections as shown in Figure 4.72, “Case Study Composite Diagram”. In your version, you will notice the link from Cube to DataStore is a dashed line because the tool cannot identify the schema for this flow until the cache has a schema inferred. With the diagram created, we can walk through the flow and set the various processor properties.

Join 1

In the Options tab window enter If no matching secondary: Discard record. If multiple matching secondary: Repeat primary for every secondary.
Change to the secondary tab and select the customer_id field in the "primary" column against the customer_id field of the secondary data source.

This configuration corresponds to an inner join, so records from the primary are only retained if a matching secondary record exists. In this case, only customers that have made purchases (have sales records) will be retained.

Filter

In Click Filter#1 select Equals from the When condition column corresponding to the store_country field. Enter condition as USA.

The output from this part of the flow will only contain records where the store_country field has the value USA.

Derivative

Select the Derived tab and click Add Column to invoke the Add Column dialog box.
Enter name as grocery_meat_sum. Select the Data Type as Integer. Enter value as given below:
```
grocery+meat
```
Click OK to add the column to the wizard.

We have defined a new column grocery_meat_sum which contains the sum of grocery and meat fields.

Join 2

In the Options tab window enter If no matching secondary: Discard record and If multiple matching secondary: Repeat primary for each secondary.
Choose the Secondary tab and select the store_id field in the "primary" column against the store_id field of the secondary data source. Click Finish.

Again we have used an inner join so records are only fetched based on a match with the secondary input. As the secondary input filters out all countries except USA, the Join will discard all non-US customer sales from the primary datasource.

Cube

In the Cube Hierarchies screen, click Add.
The Add Hierarchy dialog box pops up. Enter name as Location. Select store_country from the Schema column and click the > button. store_country is added to the Hierarchy Elements list. Similarly, add the store_state and store_city to the Hierarchy Elements list. This creates a three-level hierarchy.
Click Ok to add the hierarchy column to the Wizard. Click Next.
On the Cube Axes screen, select gender and click the > button to add the field to the Column Dimensions list box. Select Location and click the > button to add the field to the Row Dimensions list box. Click Next.
On the Cube Measures screen, click Add. The Add Measure dialog box pops up. Select store_sales and the function Sum, and leave the Pattern blank. Click Ok to add the measure to the Cube.
Similarly, add a measure for Average(store_sales). Click Next.
The next screen allows cube options to be configured. We will leave the options at their default values, so click next again and infer the schema. This operation may take a while as it analyzes hundreds of thousands of records. If you have previously executed the flow, then the cache will have already been created and the inference will be faster.

Notice that after you inferred the schema, the flow connector from the Cube to the DataStore has changed from a dashed line to a solid line, indicating that a schema is now defined for this link. You can test the process by selecting the Cube processor and choosing View Cube from the popup menu. The output is displayed as shown in Figure 4.73, “View Cube Output”.

The sum and average of sales for the male and female customers for the different cities belonging to specific States of USA are displayed.

Figure 4.73. View Cube Output

DataStore

You can choose various kinds of output for the records that have been processed. For this walkthrough, we have chosen XML, MySQL and Oracle.

XML: Invoke the DataStore Wizard and set the following properties:

Select XML file from the Type combo box. Click Next.
Enter the URL as file:/C:/Output.xml (choose an appropriate location for your operating system) and click Finish.

Select the DataStore, and select Generate from the popup menu. The XML file will be generated and saved in the specified location.

MySQL: Before generating the MySQL JDBC DataStore, the MySQL driver file must be copied to the /Public/lib folder in the Repository.

Invoke the DataStore Wizard and set the following properties:

Select JDBC from the Type combo box. Click Next.
Select MySQL in the suggestions combo box.
On selecting the Driver Suggestion, the Driver class name and the URL are automatically displayed in the corresponding text boxes.
Enter the URL as jdbc:mysql://localhost:3306/test. Here, test is the dbname, localhost can be replaced by the IP address of MySQL server if it is not running on the same machine and 3306 is the port number.
Enter DataOutput as the table name.

Select the DataStore and choose Generate from the popup menu. The MySQL JDBC DataStore is generated and saved in the specified location.

Oracle: Before generating the Oracle JDBC DataStore, the Oracle driver file must be copied to the /Public/lib/ folder in the Repository. This ensures that the Oracle driver is loaded into the class path when the tool is launched.

Invoke the DataStore Wizard and set the following properties:

Select JDBC from the Type combo box. Click Next.
Select Oracle in the suggestions combo box.
On selecting the Driver Suggestion, the Driver class name and the URL are automatically displayed in the corresponding text boxes.
The URL is jdbc:oracle:thin:@localhost:1521:ELX. Here, localhost should be replaced by the IP address of the Oracle server if it is not running on the same machine. The number 1521 is the port number and ELX is the database name.
Enter DataOutput as the table name.

Select the DataStore and choose Generate from the popup menu. The Oracle JDBC DataStore is generated and saved in the specific location.

Prev	Up	Next
Composite JavaScript	Home	Chapter 5. Excel DataSource