Building Your Own Data Mining Interface

<< Click to Display Table of Contents >>

Navigation:  Graphs, Pivots and Charts >

Building Your Own Data Mining Interface

by Garry Robinson

 

This article will introduce some basic concepts for exploring data stored in data tables.  This will cover the SQL aggregate query, the where clause and the MS Graph form object.  

Quick Overview

I will demonstrate  how the Orders Table in the Northwind database can be setup in a form for undertaking a drilldown.  This will include adding the fieldnames for the aggregate functions (sum, count, avg)  and the drilldown group by fields.  Also demonstrated will be a simple way to setup a where clause for the query that copes with the vagaries of Date, String and Numerical data types.

Once the components for the aggregate query are established, we then have to make a SQL statement that will aggregate our data.  The processes here must allow for the user selecting ANY allowable field and ANY suitable "where clause".    Eg the following SQL shows the total freight costs by Region once the user has selected USA as a country to drilldown on.

 

SELECT Orders.ShipRegion, Sum(Orders.Freight) AS SumFreight
FROM Orders
WHERE (((Orders.ShipCountry)='USA'))
GROUP BY Orders.ShipRegion;

 

The fun bit.  

 

Once you have established the SQL, you have to feed it to the Graph object.  This actually is a very simple end to the process because whenever you change the SQL property for the graph, the graph object will change to reflect the changed data.

 

 

Background

 

Tucked away in the bowels of Access is a powerful graphing tool called MS Graph.  A lot of Office users will be familiar with the graphing capabilities of Excel but very few people seem to have successfully implemented graphs of much use inside Access.  Both products use similar core elements from the Office charting engine.  The difference between the two is that with Excel, you derive your graph data by pointing the graph to cells in a spreadsheet.  This allows people to manually move the cells around until they end up with a graph that they are happy with.  In Access, you have to manipulate queries until you end up the graph that you want.  This is a lot more difficult but the end result is far more versatile as the Access graph will work irrespective of the number of rows of data that you introduce via the query.  The other important feature of Access that you will deploy for data mining is the direct support of Structured Query Language in Visual Basic code. This allows you to easily change the exploration queries on the fly.  

 

 

 

Customer        Order Date        Shipped Date        Freight        City                Postal Code        Country

 

VINET        04-Aug-94        16-Aug-94        $32.38                Reims                51100                France

TOMSP        05-Aug-94        10-Aug-94        $11.61                Münster        44087                Germany

HANAR        08-Aug-94        12-Aug-94        $65.83                Rio                05454                Brazil

 

Table 1: NORTHWIND ORDERS TABLE

 

 

 

Applying The SQL To MS Graph

 

To make a graph in Access, you need to build a particular SQL statement called a consolidation query and then apply that to an MS Graph object in an Access form.    The best way to read about the different filtering techniques in Access is to start up the Office Assistant (97) or Answer Wizard (95) and enter "Group By".   Then choose "Group By Clause" and read the information plus review See Also.

If we were to setup a consolidation query on the nwOrders table, i.e. to show the sum of freight costs by country after a certain date,  we would have the following  SQL query and sample result as shown in Figure 2.  This consolidation query is broken up into the following parts.

 

SELECT shipCountry,                         Group By Columns

Sum(Freight) AS totalFreight                 Aggregated values

FROM nwOrders                                Table/Query source

WHERE orderDate > #01-Jan-1996#                Where Filter

GROUP BY shipCountry                        Group By Columns

ORDER BY shipCountry;                        Order By Clause

 

 

Country        TotalFreight

 

UK               $1,517.31

USA              $6,214.44

Venezuela        $832.45

 

Table 2: CONSOLIDATION QUERY  -- A Consolidation Query shown into its constituent parts plus a sample listing derived from this query.  Refer to this figure for a reference on terms used in building these types of queries.

 

 

Introduction To Data Mining

 

Following are some exploratory data mining on the nwOrders table to illustrate the technique.   In this exercise, I was interested in finding out where the top Freight costs were in the Orders table.  So I started off with the query in Figure 2 which produced the following graph in an Access form

 

robig01c

Figure 3: TOTAL FREIGHT COSTS BY COUNTRY  -- A Microsoft Access 3D graph showing
  the Sum of Freight Costs by the country where those cost occurred

 

Now to review the freight costs in USA (which were the highest), I needed to drilldown on the country USA.  This produces the following graph

 

robig01d

Figure 4:  USA FREIGHT COSTS BY CITY  -  Drilldown on freight orders for the USA  and show
 the total freight costs by the city where those costs occurred

 

Now as the largest freight costs are in Boise, Idaho, we will look at the freight costs by month to see when the main costs occurred.  This produces figure 5 which gives a very clear indication that the highest costs occurred during May in 1995.  We have now pinpointed the time and location of the problem and Accounts can now look into the detail.

 

robig01e

Figure 5: MONTHLY COSTS FOR BOISE, IDAHO -- Total Freight Costs For Boise, Idaho
 shown by consolidating the data by month.

 

 

Your Data Mining Software  

 

Now we have seen how we can use consolidation queries and graphs, how can you explore your data with the tools in YourDrilldown+FX.mdb. When you open the YourDrilldown+FX.mdb database, you will see the form in Figure 6 to which I have added some colorful numbers to reference different controls on the  form and the software and properties behind those controls.

 

robig01f

Figure 6:  THE DATA MINING FORM -- The data mining form in quickGraf.mdb setup to produce the graf of freight costs by cities in the USA (see figure 4).

 

Whenever you add a new table or attached table to the database, you will need to update the data mining system tables.  The onclick event under this button runs a module called  makeDataMiningTables which refreshes the z_sysTables and z_sysColumns tables (see figure 7) that you will need in your database.  The source for this routine can be found module FX_DrillDownCode.

 

 

tabname        colpos        colname        coltype        groupby        sumby

 

NWOrders        1        CustomerID        Text                Yes                No

NWOrders        2        OrderDate        Date                Yes                No

NWOrders        3        ShippedDate        Date                Yes                No

NWOrders        4        ShipVia        Number                Yes                Yes

NWOrders        5        Freight        Number                Yes                Yes

 

Table 7 Shows the z_sysColumns table that is used in the data mining form to provide information on columns in the currently selected table.

 

 This tableReq combo box will show all the tables in the database that you can drilldown on.  The important properties for this control are

 

rowSourceType = Table/Query

RowSource = z_SysTables

 

The one exit event event will reset the other combo box controls to look at a newly selected table as follows

 

Me!field1.RowSource = Me!field1.RowSource

 

The Field1 controls that allow you to setup the aggregate equation in your consolidation query do not have any VB code but have the following important properties that draw on the z_SysColumns table show in figure 7.  As aggregate equations generally apply to numerical fields, only those fields in z_SysColumns where the SumBy flag is set to true are selected as rows in this combo box.

 

Me!Field1.rowsource = "SELECT DISTINCTROW z_SysColumns.colname, z_SysColumns.coltype FROM z_SysColumns 

WHERE (((z_SysColumns.sumby)=True) AND ((z_SysColumns.tabname)=[Forms]![FX_YourDrilldown]![tableReq]));"

 

The second field1 combo box called Aggreg1 displays a list of the valid aggregate equations such as Sum, Average, Count that can be used in the consolidation query.  The Name1 text box allows you to give the aggregate equation a useful name when it is displayed in a graph. You will change this manually.

The GroupBy field is used to select which field or function in the table that you wish to consolidate on.   In figure 6, we are showing the total results by the city where the freight was shipped from.  The important property in this field is the RowSource and in this case we limit the fields in the table to those that have the groupBy flag set to true.  This will basically preclude any fields such as memo fields and insert objects.

 

GroupBy.RowSource = SELECT DISTINCTROW z_SysColumns.colname, z_SysColumns.coltype FROM z_SysColumns 

WHERE (((z_SysColumns.groupby)=True) AND ((z_SysColumns.tabname)=[Forms]![FX_YourDrilldown]![tableReq])); 

 

A very powerful feature of Access queries (that may not be supported by SQL backend databases) is the ability to consolidate data using a function.  In figure 5, I showed data by month using the format function   Format([ShippedDate],"yyyy-mm").  Try inserting this and other Access Functions such as Year([shippedDate]) to group your data into segments not actually supported in the data itself.

To allow you to drilldown on your data, the DrillFld combo box will display a list of fields like the GroupBy combo box.   Choose one of these and then in the DrillVal combo choose a unique value that will be displayed for that field.     There are 2 important visual basic events under this control to firstly build the unique values list for the combo box record source and then when you exit the combo box to generate and display a where filter that will be used for the drilldowns.

 

Private Sub drillVal_Enter()

 

' Build a unique list of values from the drill

' field that has been selected

 

Dim drillsql As String

 

If Not IsNull(Me!tableReq) And Not  IsNull(Me!groupBy) Then

 

' Make a unique values RowSource SQL statement

  

  drillsql = "SELECT " & Me!drillFld  & " from " & Me!tableReq _

   & " where " & Me!drillFld & " is not null "   & " group by " & Me!drillFld & ";"

 

' Now set the rowSource to this new SQL

  Me!DrillVal.RowSource = drillsql

 

  

End If

 

End Sub

 

Private Sub drillVal_Exit(Cancel As Integer)

  

' Prepare a suitable where filter by

' enclosing the data within suitable characters

 

  Dim txtEnclose As String

  

  Select Case Me!drillFld.Column(1)

  

    Case "Text"

      txtEnclose = "'"

    

    Case "Date"

      txtEnclose = "#"

       

    Case Else    ' Number fields

      txtEnclose = ""

  End Select

  

  Me![txtFilter] = "(" & Me!drillFld & " = " _

  & txtEnclose & Me!DrillVal & txtEnclose & ")"

 

  

End Sub

 

  The txtFilter field serves as a place where the valid drilldown where filter is placed.  If you have any understanding of SQL where statements, you can manual add any valid filter into this field and it will be immediately applied to the graph.  An example of this would be to look at the results from both USA and Canada using the following manually entered where filter

(ShipCountry = 'USA') and (ShipCountry = 'Canada')

 

The cmdOpenGraf button is where all these different fields are brought together to make a valid SQL consolidation query.  This is then applied to the record source of the Form with the graph object to display the data.  The code that sits behind this button is

 

Private Sub cmdOpenGraf_Click()

 

'  Build a consolidation query based on the

'  selections in the quikGraf form

'  and then pass that query to the graph

'  object in the Access form so that we

 

'  can view the results.

 

Dim grafsql As String, sqlOK As Integer

Dim grafName As String

 

If Not IsNull(Me!tableReq) And _

 Not IsNull(Me!field1) And _

 Not IsNull(Me!groupBy) Then

 

' Consolidate By and

' Aggregrate equation for field 1

 

  grafsql = "SELECT " & Me!groupBy & ", " _

   & Me!aggreg1 & "(" & field1 & ") as " & name1

  

  If Not IsNull(Field2) And Not IsNull(aggreg2) Then

    

'   Aggregrate equation for 2

    grafsql = grafsql & ", " & Me!aggreg2 & _

 

     "(" & Me!Field2 & ") as " & Me!Name2

  End If

  

' Table To Be Viewed

  grafsql = grafsql & " from " & Me!tableReq

  

' Where filter for drilldown

  If Not IsNull(Me!txtFilter) Then

    grafsql = grafsql & " where " & Me!txtFilter

  End If

  

' Consolidate By

  grafsql = grafsql & " group by " & Me!groupBy & ";"

  

  sqlOK = MsgBox(grafsql, vbYesNo, _

   "Send This SQL to the Graph")

  

  If sqlOK = vbYes Then

  

'   Send it the sql to both the graph object

 

'   and the header field in the form

    grafName = "frmGraf3D"

    

    DoCmd.OpenForm grafName

    Forms(grafName)!Graph1.RowSource = grafsql

    Forms(grafName)!Main_Title.Caption = grafsql

  

  End If

Else

   

  MsgBox "Fill In Table, First, Consolidate Fields", _

   vbInformation, "Problem with The Graphing"

End If

 

 

End Sub

 

This event code will produce a valid sql statement that is displayed in a message box before you apply it to the graph object.   The SQL that made the graph in figure 5 is as follows

 

SELECT format([shippedDate], "yyyy-mm"), Sum(Freight) as totalFreight from NWOrders where (ShipCity = 'Boise') group by format([shippedDate], "yyyy-mm");

 

 

Summary

 

Microsoft Access has a very powerful query engine that can be used to analyse data through consolidation queries.   If you add simple where filters to these queries and then switch the fields that you consolidate the data on,  you can start to gain a good understanding of your valuable data.  All of this can be done manually using consolidation queries or even by driving the Microsoft graph wizard.  Expand on this approach by automating the creation of the consolidation queries using visual basic and then apply this SQL to your graphs so that you can quickly discover trends in your data that you did not know exist.   This really only scratches the surface of data mining but if it offers you an insight into your information that can make or save your organization money, then you may be encouraged to invest more resources into this rapidly expanding field of computing.

 

The Software Database

 

The supporting Access accdb database called quikGraf.accdb includes the following objects

 

NwOrders                The orders table from the Access 97 demonstration database  (see Fig 1)

ZWorldDemo                A trial table of 300 sales and budget results for practicing

z_sysTables                A table that will hold the data mining definitions of all tables in your database

z_sysColumns        A table that will hold the data mining definitions of all the field columns in the database.

FX_YourDrilldown        A form that allows you to data mine your data using filters.  This form includes code behind the form illustrated in this help file

FrmGraf3D                A form that displays the data mining results in graphical form

FX_DrillDownCode        A module that contains functions to fill the data mining system tables

 

The file name in the downloads from SWReg is called   quikGraf.accdb

Click here to find out how to purchase all the downloads