In this case study, we will use the Sales, Customer and Stores tables of the Mondrian Database in conjunction with a variety of processors and generate the output into different file formats. The data manipulations we will illustrate are extraction, merging, filtering, derivation, caching and transformation.
Before we begin, you should ensure the Mondrian datasource is configured as described in the section called “Using the JDBC/ODBC bridge driver”.
Adding the DataSources
Launch Elixir Repertoire. Choose or create a new file system or folder for this case study and from the popup menu choose Add->Datasource. In the DataSource Wizard that appears select the JDBC DataSource. Click Next.
In the Define JDBC DataSource screen, enter the DataSource name as
Store. Select the JDBC/ODBC bridge(Sun JVM) as the driver
specification. Enter the URL as jdbc:odbc:MondrianFoodMart
. Click Next.
In the SQL window, add the following query:
Select *
from Store
Click Next and then click Infer Schema. Similarly, add the Sales data source with SQL:
Select * from Sales_Fact_1997
Again, infer the schema. Similarly, add a Customer data source. In the SQL window enter:
Select * from Customer
and again infer the schema.
Add a Composite DataSource named Case Study
.
Creating a Composite DataSource
After adding the Composite DataSource, it will open automatically. We are going to create the diagram as shown in Figure 4.70, “Case Study Composite Diagram”. Select the Customer DataSource and drag and place it on the diagram. Repeat the process for the Sales DataSource and then the Store DataSource.
Add the additional processors and connections as shown in Figure 4.70, “Case Study Composite Diagram”. In your version, you will notice the link from Cube to DataStore is a dashed line because the tool cannot identify the schema for this flow until the cache has a schema inferred. With the diagram created, we can walk through the flow and set the various processor properties.
Join 1
In the Options tab window enter If no matching secondary: Discard record.
If multiple matching secondary: Repeat primary for every secondary
.
Change to the secondary tab and select the customer_id
field in the "primary" column
against the customer_id
field of the secondary data source.
This configuration corresponds to an inner join, so records from the primary are only retained if a matching secondary record exists. In this case, only customers that have made purchases (have sales records) will be retained.
Filter
In Click Filter#1
select Equals
from the When
condition
column corresponding to the store_country
field. Enter condition
as USA
.
The output from this part of the flow will only contain records where the
store_country
field has the value USA
.
Derivative
Select the Derived tab and click Add Column
to invoke the Add Column
dialog box.
Enter name as grocery_meat_sum
. Select the Data Type as Integer
.
Enter value as given below:
grocery+meat
Click OK to add the column to the wizard.
We have defined a new column grocery_meat_sum
which contains the sum
of grocery and meat fields.
Join 2
In the Options tab window enter If no matching secondary: Discard record
and If multiple matching secondary: Repeat primary for each secondary
.
Choose the Secondary tab and select the store_id
field in
the "primary" column against the store_id
field of the
secondary data source. Click Finish.
Again we have used an inner join so records are only fetched based on a match with the secondary input. As the secondary input filters out all countries except USA, the Join will discard all non-US customer sales from the primary datasource.
Cube
In the Cube Hierarchies screen, click Add.
The Add Hierarchy
dialog box pops up. Enter name as
Location. Select store_country
from the Schema column and click
the > button. store_country
is added to the Hierarchy
Elements
list. Similarly, add the store_state
and store_city
to the
Hierarchy Elements
list. This creates a three-level hierarchy.
Click Ok to add the hierarchy column to the Wizard. Click Next.
On the Cube Axes screen, select gender
and click the
> button to add the field to the Column Dimensions
list
box. Select Location and click the > button to add the field
to the Row Dimensions
list box. Click Next.
On the Cube Measures screen, click Add.
The Add Measure
dialog box pops up. Select store_sales
and the function Sum
, and leave the Pattern blank. Click Ok to
add the measure to the Cube.
Similarly, add a measure for Average(store_sales)
.
Click Next.
The next screen allows cube options to be configured. We will leave the options at their default values, so click next again and infer the schema. This operation may take a while as it analyzes hundreds of thousands of records. If you have previously executed the flow, then the cache will have already been created and the inference will be faster.
Notice that after you inferred the schema, the flow connector from the Cube to the DataStore has changed from a dashed line to a solid line, indicating that a schema is now defined for this link. You can test the process by selecting the Cube processor and choosing View Cube from the popup menu. The output is displayed as shown in Figure 4.71, “View Cube Output”.
The sum and average of sales for the male and female customers for the different cities belonging to specific States of USA are displayed.
DataStore
You can choose various kinds of output for the records that have been processed. For this walkthrough, we have chosen XML, MySQL and Oracle.
XML: Invoke the DataStore Wizard and set the following properties:
Select XML file
from the Type combo box. Click Next.
Enter the URL as file:/C:/Output.xml
(choose an appropriate location for your
operating system) and click Finish.
Select the DataStore, and select Generate from the popup menu. The XML file will be generated and saved in the specified location.
MySQL: Before generating the MySQL JDBC DataStore, the MySQL driver file
must be copied to the /Public/lib
folder in the Repository.
Invoke the DataStore Wizard and set the following properties:
Select JDBC
from the Type combo box. Click Next.
Select MySQL
in the suggestions combo box.
On selecting the Driver Suggestion, the Driver class name and the URL are automatically displayed in the corresponding text boxes.
Enter the URL as
jdbc:mysql://localhost:3306/test
. Here, test
is the
dbname, localhost
can be replaced by the IP address of MySQL server if it
is not running on the same machine and
3306
is the port number.
Enter DataOutput
as the table name.
Select the DataStore and choose Generate from the popup menu. The MySQL JDBC DataStore is generated and saved in the specified location.
Oracle: Before generating the
Oracle JDBC DataStore, the Oracle driver file must be copied to
the /Public/lib/
folder in the Repository. This ensures that the Oracle driver is loaded into
the class path when the tool is launched.
Invoke the DataStore Wizard and set the following properties:
Select JDBC from the Type combo box. Click Next.
Select Oracle in the suggestions combo box.
On selecting the Driver Suggestion, the Driver class name and the URL are automatically displayed in the corresponding text boxes.
The URL is jdbc:oracle:thin:@localhost:1521:ELX
. Here,
localhost
should be replaced by the IP address of the Oracle server
if it is not running on the same machine. The number 1521
is
the port number and ELX
is the database name.
Enter DataOutput
as the table name.
Select the DataStore and choose Generate from the popup menu. The Oracle JDBC DataStore is generated and saved in the specific location.