Building Your Own Data Miner

<< Click to Display Table of Contents >>

Navigation:  Reports and Graphs >

Building Your Own Data Miner

by Garry Robinson


This article will introduce some basic concepts for exploring data stored in data tables.  This will cover the SQL aggregate query, the where clause and the MS Graph form object.  

Quick Overview

I will demonstrate  how the Orders Table in the Northwind database can be setup in a form for undertaking a drilldown.  This will include adding the fieldnames for the aggregate functions (sum, count, avg)  and the drilldown group by fields.  Also demonstrated will be a simple way to setup a where clause for the query that copes with the vagaries of Date, String and Numerical data types.

Once the components for the aggregate query are established, we then have to make a SQL statement that will aggregate our data.  The processes here must allow for the user selecting ANY allowable field and ANY suitable "where clause".    Eg the following SQL shows the total freight costs by Region once the user has selected USA as a country to drilldown on.


SELECT Orders.ShipRegion, Sum(Orders.Freight) AS SumFreight
FROM Orders
WHERE (((Orders.ShipCountry)='USA'))
GROUP BY Orders.ShipRegion;


The fun bit.  


Once you have established the SQL, you have to feed it to the Graph object.  This actually is a very simple end to the process because whenever you change the SQL property for the graph, the graph object will change to reflect the changed data.





Tucked away in the bowels of Access is a powerful graphing tool called MS Graph.  A lot of Office users will be familiar with the graphing capabilities of Excel but very few people seem to have successfully implemented graphs of much use inside Access.  Both products use similar core elements from the Office charting engine.  The difference between the two is that with Excel, you derive your graph data by pointing the graph to cells in a spreadsheet.  This allows people to manually move the cells around until they end up with a graph that they are happy with.  In Access, you have to manipulate queries until you end up the graph that you want.  This is a lot more difficult but the end result is far more versatile as the Access graph will work irrespective of the number of rows of data that you introduce via the query.  The other important feature of Access that you will deploy for data mining is the direct support of Structured Query Language in Visual Basic code. This allows you to easily change the exploration queries on the fly.  




Customer        Order Date        Shipped Date        Freight        City                Postal Code        Country


VINET        04-Aug-94        16-Aug-94        $32.38                Reims                51100                France

TOMSP        05-Aug-94        10-Aug-94        $11.61                Münster        44087                Germany

HANAR        08-Aug-94        12-Aug-94        $65.83                Rio                05454                Brazil






Applying The SQL To MS Graph


To make a graph in Access, you need to build a particular SQL statement called a consolidation query and then apply that to an MS Graph object in an Access form.    The best way to read about the different filtering techniques in Access is to start up the Office Assistant (97) or Answer Wizard (95) and enter "Group By".   Then choose "Group By Clause" and read the information plus review See Also.

If we were to setup a consolidation query on the nwOrders table, i.e. to show the sum of freight costs by country after a certain date,  we would have the following  SQL query and sample result as shown in Figure 2.  This consolidation query is broken up into the following parts.


SELECT shipCountry,                        Group By Columns

Sum(Freight) AS totalFreight                Aggregated values

FROM nwOrders                                Table/Query source

WHERE orderDate > #01-Jan-1996#                Where Filter

GROUP BY shipCountry                        Group By Columns

ORDER BY shipCountry;                        Order By Clause



Country        TotalFreight


UK              $1,517.31

USA            $6,214.44

Venezuela        $832.45


Table 2: CONSOLIDATION QUERY  -- A Consolidation Query shown into its constituent parts plus a sample listing derived from this query.  Refer to this figure for a reference on terms used in building these types of queries.



Introduction To Data Mining


Following are some exploratory data mining on the nwOrders table to illustrate the technique.   In this exercise, I was interested in finding out where the top Freight costs were in the Orders table.  So I started off with the query in Figure 2 which produced the following graph in an Access form



Figure 3: TOTAL FREIGHT COSTS BY COUNTRY  -- A Microsoft Access 3D graph showing
  the Sum of Freight Costs by the country where those cost occurred


Now to review the freight costs in USA (which were the highest), I needed to drilldown on the country USA.  This produces the following graph



Figure 4:  USA FREIGHT COSTS BY CITY  -  Drilldown on freight orders for the USA  and show
 the total freight costs by the city where those costs occurred


Now as the largest freight costs are in Boise, Idaho, we will look at the freight costs by month to see when the main costs occurred.  This produces figure 5 which gives a very clear indication that the highest costs occurred during May in 1995.  We have now pinpointed the time and location of the problem and Accounts can now look into the detail.



Figure 5: MONTHLY COSTS FOR BOISE, IDAHO -- Total Freight Costs For Boise, Idaho
 shown by consolidating the data by month.



Your Data Mining Software  


Now we have seen how we can use consolidation queries and graphs, how can you explore your data with the tools in YourDrilldown+FX.mdb. When you open the YourDrilldown+FX.mdb database, you will see the form in Figure 6 to which I have added some colorful numbers to reference different controls on the  form and the software and properties behind those controls.



Figure 6:  THE DATA MINING FORM -- The data mining form in quickGraf.mdb setup to produce the graf of freight costs by cities in the USA (see figure 4).


Whenever you add a new table or attached table to the database, you will need to update the data mining system tables.  The onclick event under this button runs a module called  makeDataMiningTables which refreshes the z_sysTables and z_sysColumns tables (see figure 7) that you will need in your database.  The source for this routine can be found module FX_DrillDownCode.



tabname        colpos        colname        coltype        groupby        sumby


NWOrders        1        CustomerID        Text                Yes                No

NWOrders        2        OrderDate        Date                Yes                No

NWOrders        3        ShippedDate        Date                Yes                No

NWOrders        4        ShipVia        Number                Yes                Yes

NWOrders        5        Freight        Number                Yes                Yes


Table 7 Shows the z_sysColumns table that is used in the data mining form to provide information on columns in the currently selected table.


 This tableReq combo box will show all the tables in the database that you can drilldown on.  The important properties for this control are


rowSourceType = Table/Query

RowSource = z_SysTables


The one exit event event will reset the other combo box controls to look at a newly selected table as follows


Me!field1.RowSource = Me!field1.RowSource


The Field1 controls that allow you to setup the aggregate equation in your consolidation query do not have any VB code but have the following important properties that draw on the z_SysColumns table show in figure 7.  As aggregate equations generally apply to numerical fields, only those fields in z_SysColumns where the SumBy flag is set to true are selected as rows in this combo box.


Me!Field1.rowsource = "SELECT DISTINCTROW z_SysColumns.colname, z_SysColumns.coltype FROM z_SysColumns

WHERE (((z_SysColumns.sumby)=True) AND ((z_SysColumns.tabname)=[Forms]![FX_YourDrilldown]![tableReq]));"


The second field1 combo box called Aggreg1 displays a list of the valid aggregate equations such as Sum, Average, Count that can be used in the consolidation query.  The Name1 text box allows you to give the aggregate equation a useful name when it is displayed in a graph. You will change this manually.

The GroupBy field is used to select which field or function in the table that you wish to consolidate on.   In figure 6, we are showing the total results by the city where the freight was shipped from.  The important property in this field is the RowSource and in this case we limit the fields in the table to those that have the groupBy flag set to true.  This will basically preclude any fields such as memo fields and insert objects.


GroupBy.RowSource = SELECT DISTINCTROW z_SysColumns.colname, z_SysColumns.coltype FROM z_SysColumns

WHERE (((z_SysColumns.groupby)=True) AND ((z_SysColumns.tabname)=[Forms]![FX_YourDrilldown]![tableReq]));


A very powerful feature of Access queries (that may not be supported by SQL backend databases) is the ability to consolidate data using a function.  In figure 5, I showed data by month using the format function   Format([ShippedDate],"yyyy-mm").  Try inserting this and other Access Functions such as Year([shippedDate]) to group your data into segments not actually supported in the data itself.

To allow you to drilldown on your data, the DrillFld combo box will display a list of fields like the GroupBy combo box.   Choose one of these and then in the DrillVal combo choose a unique value that will be displayed for that field.     There are 2 important visual basic events under this control to firstly build the unique values list for the combo box record source and then when you exit the combo box to generate and display a where filter that will be used for the drilldowns.


Private Sub drillVal_Enter()


' Build a unique list of values from the drill

' field that has been selected


Dim drillsql As String


If Not IsNull(Me!tableReq) And Not  IsNull(Me!groupBy) Then


' Make a unique values RowSource SQL statement


 drillsql = "SELECT " & Me!drillFld  & " from " & Me!tableReq _

  & " where " & Me!drillFld & " is not null "   & " group by " & Me!drillFld & ";"


' Now set the rowSource to this new SQL

 Me!DrillVal.RowSource = drillsql



End If


End Sub


Private Sub drillVal_Exit(Cancel As Integer)


' Prepare a suitable where filter by

' enclosing the data within suitable characters


 Dim txtEnclose As String


 Select Case Me!drillFld.Column(1)


   Case "Text"

     txtEnclose = "'"


   Case "Date"

     txtEnclose = "#"


   Case Else    ' Number fields

     txtEnclose = ""

 End Select


 Me![txtFilter] = "(" & Me!drillFld & " = " _

 & txtEnclose & Me!DrillVal & txtEnclose & ")"



End Sub


  The txtFilter field serves as a place where the valid drilldown where filter is placed.  If you have any understanding of SQL where statements, you can manual add any valid filter into this field and it will be immediately applied to the graph.  An example of this would be to look at the results from both USA and Canada using the following manually entered where filter

(ShipCountry = 'USA') and (ShipCountry = 'Canada')


The cmdOpenGraf button is where all these different fields are brought together to make a valid SQL consolidation query.  This is then applied to the record source of the Form with the graph object to display the data.  The code that sits behind this button is


Private Sub cmdOpenGraf_Click()


'  Build a consolidation query based on the

'  selections in the quikGraf form

'  and then pass that query to the graph

'  object in the Access form so that we


'  can view the results.


Dim grafsql As String, sqlOK As Integer

Dim grafName As String


If Not IsNull(Me!tableReq) And _

Not IsNull(Me!field1) And _

Not IsNull(Me!groupBy) Then


' Consolidate By and

' Aggregrate equation for field 1


 grafsql = "SELECT " & Me!groupBy & ", " _

  & Me!aggreg1 & "(" & field1 & ") as " & name1


 If Not IsNull(Field2) And Not IsNull(aggreg2) Then


'   Aggregrate equation for 2

   grafsql = grafsql & ", " & Me!aggreg2 & _


    "(" & Me!Field2 & ") as " & Me!Name2

 End If


' Table To Be Viewed

 grafsql = grafsql & " from " & Me!tableReq


' Where filter for drilldown

 If Not IsNull(Me!txtFilter) Then

   grafsql = grafsql & " where " & Me!txtFilter

 End If


' Consolidate By

 grafsql = grafsql & " group by " & Me!groupBy & ";"


 sqlOK = MsgBox(grafsql, vbYesNo, _

  "Send This SQL to the Graph")


 If sqlOK = vbYes Then


'   Send it the sql to both the graph object


'   and the header field in the form

   grafName = "frmGraf3D"


   DoCmd.OpenForm grafName

   Forms(grafName)!Graph1.RowSource = grafsql

   Forms(grafName)!Main_Title.Caption = grafsql


 End If



 MsgBox "Fill In Table, First, Consolidate Fields", _

  vbInformation, "Problem with The Graphing"

End If



End Sub


This event code will produce a valid sql statement that is displayed in a message box before you apply it to the graph object.   The SQL that made the graph in figure 5 is as follows


SELECT format([shippedDate], "yyyy-mm"), Sum(Freight) as totalFreight from NWOrders where (ShipCity = 'Boise') group by format([shippedDate], "yyyy-mm");





Microsoft Access has a very powerful query engine that can be used to analyse data through consolidation queries.   If you add simple where filters to these queries and then switch the fields that you consolidate the data on,  you can start to gain a good understanding of your valuable data.  All of this can be done manually using consolidation queries or even by driving the Microsoft graph wizard.  Expand on this approach by automating the creation of the consolidation queries using visual basic and then apply this SQL to your graphs so that you can quickly discover trends in your data that you did not know exist.   This really only scratches the surface of data mining but if it offers you an insight into your information that can make or save your organization money, then you may be encouraged to invest more resources into this rapidly expanding field of computing.


The Software Database


The supporting Access accdb database includes the following objects


NwOrders        ..  The orders table from the Access 97 demonstration database  (see Fig 1)

ZWorldDemo  ..  A trial table of 300 sales and budget results for practicing

z_sysTables ..  A table that will hold the data mining definitions of all tables in your database

z_sysColumns  ..  A table that will hold the data mining definitions of all the field columns in the database.

FX_YourDrilldown  ..   A form that allows you to data mine your data using filters.  This form includes code behind the form illustrated in this help file

FrmGraf3D  ..  A form that displays the data mining results in graphical form

FX_DrillDownCode  ..  A module that contains functions to fill the data mining system tables



download Your download file is Robinson_QuikGraf.accdb  and you can find it here Reports and Graphs>>