Shuffling Lists of Numbers, Ranking, and Control-Level Validation

<< Click to Display Table of Contents >>

Navigation:  Smart Access 1996-2006 > May-1997 >

Shuffling Lists of Numbers, Ranking, and Control-Level Validation

Ken Getz            
 
This month Ken takes on three issues: shuffling values to provide a random list of numbers without repeats, ranking rows in a query, and providing control-level validation on a form. Look for code in Access 2.0 format (all of this month's solutions work in Access 2; 32-bit users can convert the 16-bit version for use in their own environment). The demonstration database is QA970516.MDB in download file GETZ43.EXE.
 
I need to create a list of random numbers, with no repetitions. For example, I might need a list of the numbers between 1 and 100, in random order. But I can't have any duplicates. I've tried various techniques, but I always end up with a lot of duplicate numbers! I thought using the Randomize statement would cause Access to not repeat numbers: obviously I was wrong, because I used Randomize and it doesn't affect the fact that I get duplicates. Is there some easy way to get a list of numbers, randomly shuffled, with no duplicates?
 
Wow, did this one cause a series of heated messages on Microsoft's newsgroups recently! Your question started a flurry of messages, one after the other, with various participants swearing that Randomize should provide a non-repeating series of random numbers. Let me promise you, it just isn't true!
 
Let's look at the options people tried in attempting to solve this problem. First of all, to retrieve a random integer between intLow and intHigh (the two bounding values), you must use an expression such as this:
 

intRandom = Int((intHigh - intLow + 1) * Rnd) + intLow

 
 
Because the Rnd() function returns a floating point value between 0 and 1 (not including 1), you must scale it, using the previous formula, which also converts the output value to an integer. A first attempt at solving this problem might look like this:
 

Sub ShuffleBad1 (intTop As Integer)

 ' Attempt to retrieve a list of values

 ' between 1 and intTop, shuffled.

 ' This won't work.

 

 Dim intRandom As Integer

 Dim intI As Integer

 

 For intI = 1 To intTop

   intRandom = Int(intTop * Rnd) + 1

   Debug.Print intRandom;

 Next intI

 Debug.Print

End Sub

 
 
If you run ShuffleBad1, passing in a top value, you'll find that it always repeats values (of course, if you pass it some tiny number as its top value, it might not repeat values, but you get the idea). The problem, of course, is that nothing is keeping the random value, once converted to an integer, from being the same value multiple times. At this point, several participants in the message thread suggested using Randomize to force Access to never repeat a value. Therefore, you might have tried a solution like this:
 

Sub ShuffleBad2 (intTop As Integer)

 ' Attempt to retrieve a list of values

 ' between 1 and intTop, shuffled.

 ' This won't work.

 

 Dim intRandom As Integer

 Dim intI As Integer

 

 Randomize

 

 For intI = 1 To intTop

   intRandom = Int(intTop * Rnd) + 1

   Debug.Print intRandom;

 Next intI

 Debug.Print

End Sub

 
 
If you tried this, you found that the output was no better. Why not? Randomize doesn't do anything similar to what people suggested it does. To understand what Randomize does, you have to dig into how Access creates random numbers. To make it as simple as possible, think of it this way: Randomize only ensures that the sequence of random numbers Access creates is different each time you run your code. That is, if you can imagine that Access generates a fixed series of random numbers, Randomize causes Access to start at a different location in its list each time it's executed. Without using Randomize, you run the risk that Access will generate the same series of numbers next time you start. But it certainly doesn't ensure that your random list of integers doesn't include duplicates -- I'm not sure where anyone got such an idea.
 
So, the question is: "How do you solve the problem?" The answer is easy, but not obvious. There are many shuffling algorithms out there, and I'll present one I learned back in high school. The basic steps are as follows (using integers between 1 and 5, as an example):
 
1. Fill an array with the series of numbers you'd like to shuffle. For this example, you'll need an array with five elements, with each element containing its index value.
 
2. Loop through the integers starting at the top of your interval, working down to the next-to-lowest value. For numbers between 1 and 5, you'd loop from 5 down to 2.
 
3. At each step, choose a random number between your lowest value and the current number. That is, for numbers between 1 and 5, first choose a value between 1 and 5, then 1 and 4, then 1 and 3, and so on. The value in the array, at the position you've selected, becomes the current item.
 
4. (This is the important step!) Take the value at the highest location (5, then 4, then 3, and so on) and place it in the location you just selected. This way, as the range of values you're working with gets smaller and smaller, you remove a selected item and replace it with the highest possible value.
 
5. Continue looping down to 2, until there's just a single value left. This value is the final item in your list.
 
Figure 1 illustrates this technique for the numbers between 1 and 5.

199705_kg1
Figure 1

 
 
The Shuffle procedure implements this technique. This procedure allows you to specify the upper limit, and it prints a shuffled list of numbers to the Debug window. You'll probably want to do something else with the shuffled values -- you might want to copy them into a separate array or store them in some way -- but this simple routine demonstrates the basic technique.
 

Sub Shuffle (intTop As Integer)

 ' Shuffles 10 random numbers

 

 ReDim aintValues(1 To intTop) As Integer

 Dim intI As Integer

 Dim intPos As Integer

 

 ' Fill in the original values.

 For intI = 1 To intTop

   aintValues(intI) = intI

 Next intI

 

 ' Loop through all the items except the last one.

 ' Once you get to the last item, there's no point

 ' using Rnd, just get it.

 For intI = intTop To 2 Step -1

   ' Get a random number between 1 and intI

   intPos = Int(Rnd * intI) + 1

   Debug.Print aintValues(intPos);

   aintValues(intPos) = aintValues(intI)

 Next intI

 ' Get the last one.

 Debug.Print aintValues(1)

End Sub

 
 
I'm sure there are other, perhaps even more efficient, alternatives to this solution. This algorithm made enough sense to stick with me for 20 years, so it must have made sense at the time!
 
I have a table containing student names and a semester average, from 1 to 100, for each student. I need to be able to print out a list of students, along with their class rank. That is, given the items in Table 1, I need to be able to print a list like Table 2. Is there an easy way to do this?
 
Table 1. Student names and semester averages.

ID

Name

Score

1

Tom

99

2

Mary

46

3

Sam

79

4

Kenny

86

5

Penny

80

6

Juan

80

7

Akita

92

8

Jennifer

98

9

Michael

96

10

Suzanne

91

11

Peter

100

12

Paul

90

13

Mike

85

14

Debra

60

 
Table 2. Students listed by class rank.

Name

Score

Rank

Peter

100

1

Tom

99

2

Jennifer

98

3

Michael

96

4

Akita

92

5

Suzanne

91

6

Paul

90

7

Kenny

86

8

Mike

85

9

Juan

80

10

Penny

80

10

Sam

79

12

Debra

60

13

Mary

46

14

 
Again, this question came to me by way of the Microsoft support newsgroups, and the solutions provided by participants were interesting. One message suggested creating an array of 100 elements, walking the table of students, and for each row, placing the student name in the corresponding row of the array, then walking the array from 100 down, printing out the name and the index (the grade). That's an interesting, but certainly not optimal, approach. Another person suggested sorting the table by the score value, then writing DAO code that walked the table, incrementing the value in the Rank field every time the code came across a new value in the Score field. Again, this is an interesting strategy, but it's too much work.
 
It turns out that there's a SQL-based solution to this problem. I had to do a bit of research to find it, but I found a Microsoft Knowledge Base article (Q120608) that provides a simple query-based solution. This solution depends on subqueries to do its work. A subquery (often called a subselect) allows you to base the data in the output of your query on the data returned from another query. In this case, for each row of data, you want to know the number of rows that have scores higher than the current row. If there are no rows, this row is ranked 1. If there are two rows with higher scores, the current row is ranked number 3. (Therefore, you also need to add 1 to the count returned from the subquery.) To create a query with a subselect, you must type the SQL expression directly into the query builder's SQL window -- there's no builder to help you out.
 
Because you're going to need to refer to the same table's data from two different perspectives in the same query, you'll need to alias the table in the main portion of the query. For example, the following simple query retrieves two fields from tblScores, but refers to it as tblScores1 whenever the query needs information from the table. This way, the subselect portion can refer to the same table as tblScores:
 

SELECT tblScores1.Name, tblScores1.Score

FROM tblScores AS tblScores1

ORDER BY tblScores1.Score DESC;

 
 
Now that you've created a new, internal name for the input table, you can imagine that you're working with two separate tables. One table (tblScores1) provides each name and score, and another table (tblScores) provides the rest of the values against which you're comparing the current row in tblScores1. To get the count of all the values from tblScores that have a score higher than the current score in tblScores1, you might use a SQL expression like this:
 

Select Count(*) From tblScores

WHERE tblScores.Score > tblScores1.Score;

 
 
This expression should return the number of scores in tblScores that are higher than the current score in tblScores1.
 
To combine the two and create a query that includes a column with each student's ranking, you can create a query like this:
 

SELECT tblScores1.Name, tblScores1.Score,

(Select Count(*) From tblScores

WHERE tblScores.Score > tblScores1.Score)+1 AS Rank

FROM tblScores AS tblScores1

ORDER BY tblScores1.Score DESC;

 
 
This query produces the requested output by using a subselect to find the number of rows in tblScores that have a score higher than the current row in tblScores1. It then adds 1 to that value and places it into a new field named Rank. You should be able to extend this concept to any situation in which you need to number the rows of your query. (Try out qryRank in download file GETZ43.EXE.)
 
What's the down side of using this technique? First of all, subselects in Access can be quite slow. In addition, ties are handled in a manner different from what you might like. As you can see in the original question, Juan and Penny had the same score (80), and the same rank (10). The next lower score, Sam's 79, has a ranking of 12. That is, there's no score with a rank of 11. In many industries, this is the standard ranking technique. In others, however, you want to have a unique, monotonically increasing rank for each row. Microsoft's KB article provides a workaround for this problem, but it does require an extra query or two. If you're interested, check it out at www.microsoft.com/kb.
 
I have an unbound form on which I need to gather four pieces of information: starting and ending date, a student name, and a Social Security number. This information will be used to filter a query that feeds data into a report. In addition, the form contains Cancel and OK buttons, and a Label control for displaying error messages. For historic reasons, I want to validate each control's data before entering the next control -- that is, if the current control's data is invalid, I don't want to allow users to move to the next control. I just want them to see an error message in the Label control, and be forced to stay in the current control until they've entered a reasonable value. Of course, if they press the Cancel button, the form should unload with no complaints. I've tried all sorts of solutions, but none work correctly. Can you suggest a solution I can use on this form, and on other similar forms?
 

Carl Imthurm (cimthurm@ccmail.llu.edu)

 
Funny you should ask this question! I recently battled a similar problem in a VB application and finally found a solution in that environment. Unfortunately, the VB solution won't work in Access. Why not? You'll see, as I explain the strategy I tried in Access.
 
My first attempt at a solution to this problem was to use the Cancel parameter that Access passes to the Exit event procedure for each control. For example, I wrote code such as the following to verify that the user had entered a name:
 

Sub txtName_Exit(Cancel As Integer)

   If Len(Me!txtName.Text) = 0 Then

       Me!lblErrorText.Caption = _

        "Please supply a name."

       Cancel = True

   End If

End Sub

 
 
This worked great, until I attempted to leave txtName empty and pressed the Cancel button. I couldn't close the form!
 
The next attempt was to find some way to not perform the validation if the user presses the Cancel button. That is, set a flag when the user presses Cancel, and if that flag is set, bypass the validation in the controls' Exit events. In VB, the Cancel button's MouseDown event occurs before the LostFocus event of the text box, and you can use that sequence of events to make sure you don't validate the text if you're going to close the form anyway. In Access, however, all the text box's events fire before the button's, so there's no way to avoid the problem.
 
The solution, then? One rather ungainly solution (and the one I devised for my application) was to use the form's Timer event. Rather than validating the text in each control's Exit or LostFocus event procedure, all those event procedures do is set a module variable, mCtl, to refer to the control that you've just left, and set the form's TimerInterval property to some small value (five milliseconds works for me). It's in the form's Timer event procedure that you'll perform your validation.
 
As you leave the control, one of two things can happen:

You click the Cancel button, which calls code to close the form, before the Timer event procedure has a chance to run.

You attempt to move to another text box, which allows the five milliseconds to pass, and the form attempts to run the validation code. If the test fails, the validation code displays an error message and sets the focus back to the original control.

The sample form, frmErrors, demonstrates this technique. The form contains two module-level variables: mCtl, which tracks the control that you're attempting to validate, and fCancel, a flag that allows the form to close. In each control's Exit event procedure, there's code like this:
 

Sub txtEndDate_Exit (Cancel As Integer)

   fCancel = False

   Set mCtl = Me!txtEndDate

   Me.TimerInterval = 5

End Sub

 
 
As you leave each control, the Exit event procedure stores a reference to the current control and turns on the form's timer. The Cancel button sets the fCancel flag to True and closes the form. If the timer interval expires before the form has been closed, the Timer event procedure checks the fCancel flag. If it isn't set, it then calls the Validate procedure (where all validation takes place). This procedure also resets the form's TimerInterval property, so it's ready for the next control:
 

Private Sub Form_Timer ()

   If Not fCancel Then

       Call Validate

   End If

   ' Reset the timer

   Me.TimerInterval = 0

End Sub

 
 
The center of this technique, then, is the Validate procedure. This procedure includes a Select Case statement, which switches the case based on the Name property of the control referred to by mCtl. For each control on the form, the procedure must perform the necessary validation, display an error message if necessary, and set the focus back to the original control if an error occurred. The following code shows the Validate procedure for the sample form:
 

Private Sub Validate ()

 Dim datTemp As Variant

 Dim fError As Integer

 Dim strErr As String

 

 ' Don't care to handle any errors in this routine.

 On Error Resume Next

 Select Case mCtl.Name

   Case "txtStartDate"

     ' Start date must be between 1/1/96 and now.

     strErr = _

      "Please supply a date between 1/1/96 and now."

     If IsDate(mCtl) Then

       datTemp = CVDate(mCtl)

       If datTemp < #1/1/96# Or datTemp > Date Then

         fError = True

       End If

     Else

       fError = True

     End If

 

   Case "txtEndDate"

     ' Start date must be between 1/1/96 and now.

     strErr = _

      "Please supply a date between 1/1/96 and now."

     If IsDate(mCtl) Then

       datTemp = CVDate(mCtl)

       If datTemp < #1/1/96# Or datTemp > Date Then

         fError = True

       End If

     Else

       fError = True

     End If

 

   Case "txtName"

     ' Name must not be empty.

     strErr = "Please supply a name."

     If Len(mCtl) = 0 Then

       fError = True

     End If

 

   Case "txtSSN"

     ' SSN must not be empty.

     strErr = _

      "Please supply a Social Security number."

     If Len(mCtl) = 0 Then

       fError = True

     End If

 End Select

 If fError Then

   lblErrorText.Caption = strErr

   mCtl.SetFocus

 Else

   lblErrorText.Caption = ""

 End If

End Sub

 
 
It's up to you to decide what you want the OK button to do when you click it. Most likely, you'll want it to validate each of the fields on the form before allowing the user to close the form. The sample form skips this step, because each application will handle this differently.
 
To set up a form like this, follow these steps:
 
1. Add the two module-level variables to your form's module (fCancel and mCtl).
 
2. Add code to the Exit event of each control you want validated, as shown in this article.
 
3. Write code for the form's Timer event, as shown in this article.
 
4. Write a Validate procedure that will handle all of the controls that need validation.
 
5. Add code that will close the form to the form's Cancel button.
 
This is a lot of work just to incorporate control-level validation to a form. I was surprised to find that the order of events in Access doesn't match the order in VB, and that this problem is much easier to solve, in general, in VB than it is in Access. I've now used this technique in a client application, but it takes effort to get the pieces in place.
 
Read about how to get the download GETZ43.EXE on this page