Creating Histograms Using Access rather than Excel

<< Click to Display Table of Contents >>

Navigation:  Smart Access 1996-2006 > Mar-1998 >

Creating Histograms Using Access rather than Excel

199803_aw2 Art Walls            
 
 
Excel hits a wall when the number of rows gets high. Here's how one Access developer used Access and the Access Query Designer to extend and enhance one of Excel's data analysis tools.

 

When, in the course of human events, it becomes necessary to analyze more than 16,384 rows using Excel 6.0 or Excel 95 (or 65,536 rows in Excel 97), one must turn to alternative methods of analysis. While both versions of Excel provide excellent ways of slicing and dicing your data, they have hard limits on how many rows you can work with. One good alternative is Access. Although Access isn't typically used for statistical analysis, it does get around some of the limitations of Excel, most notably the row limit. Here's the technique I used to handle an enormous amount of data in Access to give Excel-like results.
 
I was recently on a project examining stock items in several warehouses. The data resided on multiple AS/400s and, due to the number of calculations and examinations required, I originally decided to pull it off the mainframe and into a spreadsheet to do calculations. This meant breaking the data up into 16K rows (in Excel 95) to fit in the sheets, and then doing the calculations on the various sheets. The multiple-sheet method worked well for a while, though it did require several copy commands to get the calculations across all the sheets.

199803_aw1 Figure 1
 
One of the calculations for the analysis consisted of counting the number of rows that fell within various ranges. I started out using Excel's Analysis Toolpak Add-in feature called Histogram. With this tool, you enter the values for the ranges in a set of cells (bins), start the Histogram tool, select the input range, identify the cells where you put the ranges, and then indicate where the output should go. The dialog box you use appears in Figure 1, while Table 1 shows some sample output.
 
Table 1. Excel output from the Histogram tool.

Bin

Frequency

0

10334

100

7930

200

738

500

618

1000

290

More

202

 
 
This example says that there were 10,334 occurrences of zero values, 7,930 between 0 and 100, and so on. While this works well for a couple of repetitions, I would have been required to go through the steps about 40 times to process all the data I'd downloaded. I figured there had to be a better way.
 
Having worked with SQL on many occasions, I decided to give Access a try. I imported all of the sheets into Access and appended them into one table (tblItems). The table layout is shown in Table 2. I then created a "bin range table" (shown in Table 3) to hold the data that would define my ranges. With the bin range table created, I loaded it with the data to define the ranges I wanted to divide my data into. Table 4 shows a typical set of entries for the bin range table.
 
Table 2. The data input table.

Field name

Data type

Description

Warehouse

Text

Warehouse Id

Item No

Text

Stock Item Number

Description

Text

Item Description

Consignment

Text

Consignment Item?

AverageUsage

Number

Monthly Average Quantity

AverageUsageAmount

Number

Monthly Average Amount

LeadTime

Number

Lead Time in Days

UnitCost

Number

ItemUnit Cost

BOH

Number

Balance on Hand

LastIssueDate

Date/Time

Date Last Issued

Criticality

Text

Item Criticality

 
Table 3. The bin range table.

Field name

Data type

Description

txtRangeDesc

Text

Text Description of Range

dblLowerLimit

Number

Lower Limit of Range

dblUpperLimit

Number

Upper Limit of Range

 
Table 4. Sample data for the bin range table.
 

TxtRangeDesc

DblLowerLimit

DblUpperLimit

<=0

-999999

0

>0 and <= 100

0

100

>100 and <=200

100

200

>200 and <=500

500

1000

>1000

1000

9999999

 
Now comes the interesting part: getting Access to create the histogram. The end result had to check that the value in AverageUsageAmt was greater than dblLowerLimit and less than or equal to dblUpperLimit. I began by creating the query using Access's Query Designer. Of course, Query Designer is restricted to using equal conditions in joins, so you can't create the kind of nonequi-joins that I required simply using drag and drop. However, query designer does provide a good starting point. I began by creating the query shown in Figure 2.

199803_aw2 Figure 2
 
Had I actually run this query, the result set would have been empty, because the query is trying to get an equal comparison against two different values. You'll notice that I also sorted the table on the dblLowerLimit, which puts the data in order by ranges (that is, the range beginning at 0 first, the range beginning at 1000 last). You could also sort by the values in txtRangeDesc, but text string representations of numbers generally don't sort well, and I wanted the end result to look pretty.
 
The next step was to view the resulting SQL:
 

SELECT tblRange.dblLowerLimit, tblRange.txtRangeDesc,

Count(tblItems.ItemNo) AS CountOfItemNo,

Sum(tblItems.AverageUsageAmt) AS SumOfAverageUsageAmt

FROM tblItems INNER JOIN tblRange

ON tblItems.AverageUsageAmt = tblRange.dblUpperLimit

AND tblItems.AverageUsageAmt = tblRange.dblLowerLimit

GROUP BY tblRange.dblLowerLimit, tblRange.txtRangeDesc

ORDER BY tblRange.dblLowerLimit;

 
 
The query I wanted was almost an exact duplicate of the query generated by the Query Designer. The changes are the "<=" comparison against the dblUpperLimit and the ">" against the dblLowerLimit:
 

SELECT tblRange.dblLowerLimit, tblRange.txtRangeDesc,

Count(tblItems.ItemNo) AS CountOfItemNo,

Sum(tblItems.AverageUsageAmt) AS SumOfAverageUsageAmt

FROM tblItems INNER JOIN tblRange

ON tblItems.AverageUsageAmt <= tblRange.dblUpperLimit

AND tblItems.AverageUsageAmt > tblRange.dblLowerLimit

GROUP BY tblRange.dblLowerLimit, tblRange.txtRangeDesc

ORDER BY tblRange.dblLowerLimit;

 
 
When I ran the query, out popped the histogram data shown in Figure 3.

199803_aw3 Figure 3
 
As a bonus, I included a sum of the AverageUsageAmt, something that can't be done with an Excel Histogram. I then exported the data back to Excel to take advantage of its graphing capabilities.
 
Conclusion

Using Query Designer is an excellent way to start creating a complicated SQL statement, even if you're comfortable with SQL. It sure made creating my Histogram query an easy task. And, while Excel might be an excellent data analysis tool, when the data starts piling up, you can count on Access.
 
Read about the download HIST.ZIP on this page

 

ourProducts