By Garry Robinson
A number of the Microsoft Access projects that I have been involved in are in essence rescue missions for spreadsheet systems that have outgrown themselves. When I examine these systems they tend to have characteristics such a 12 individual monthly spreadsheets, one yearly spreadsheet that draws the results from the monthly sheets and maybe even another set of weekly report sheets jumbled into the system as well. Once these interlinked spreadsheets hit their second or third year, all the monthly spreadsheets have changed in design and the old information is only accessible by by a lot of manual cutting and pasting.
With good design and a few compromises, you can turn these into database systems that have easier navigation for input for users, report equivalent output and they do not require spreadsheet skills to run. This article tries to capture the techniques that I have used over the years to get these conversion projects up and running smoothly.
The Initial Analysis - Its All In The Paperwork
Step 1. Print out the spreadsheets if they are at all complex. You only need a representative sample of every different part of the spreadsheet rather than a printout of the whole lot. If the spreadsheet consists of numerous spreadsheets that do the same thing then you only need to print out one of those spreadsheets. If there are obvious different sections inside a worksheet, split them into different printouts. When doing the printouts, make sure to Print Row and Column headings at the top of the sheet as these will be really handy when dissecting the formula's. You will find this on the Print Page Setup tab under Sheet.
Step 2. Find yourself a large notice board (the story board) or a clean piece of the wall close to your computer and stick all the parts of the spreadsheet on to it. Write the name of the spreadsheet and the worksheet on the top of every page so that you can reference it.
Step 3. Go through the spreadsheets and identify all cells in the sheets that have data in them. A highlighting pen is really useful here. Ignore any formula or references to other cells. Where you have cells with the same data repeated (like months 1-12) these will more than likely form the tables in the database. Cells with only a single entry may need to be stored in a control table in the database. At this stage, we are aiming to identify the tables and fields that will make up the database. Each one of these different data collections will need to be replicated in the database in a table. If you have repeated worksheets of the same data, the worksheet names themselves will be a field in your database i.e. Monthly worksheets would need a month identifier field.
Step 4. Whilst you are going through spreadsheets, you will now to identify where each formula cell retrieves its data from. Do not waste too much time with repeated offset formula every cell but be on the lookout for formulae or references that might change the data model. Draw lines or use grid references on the story board to show where the data is coming from and write brief explanations of the different formulae. These formulae will need to be replicated in queries or modules in the database.
Step 5. Highlight the portions of the spreadsheets that are used for reports. Testing of your database solution will include generating these results with queries.
Preparing The Excel Database
Before transferring data from your spreadsheet to an Access database, you need to make sure that the data is prepared for a transfer as follows.
1) You should have rows of cells with only data in them organized in consistent manner so that results for each potential field in your database line up under each other in the spreadsheet. If like information is stored in different worksheets then you are either going to have to get them together inside the spreadsheet or by appending tables in the database. See the highlighted data in figure 1.
2) The rows of data should have a heading row at the top. Here you can save yourself a lot of trouble by giving each column a name which you will later use in the table in the database. Follow good naming conventions and especially avoid spaces or special characters in the field name.
3) Once you have the data setup, set a database range that covers all the fields and all the rows that you want to import for a particular collection of data. This will form a table in your database. See figure 1 to see how this is done. The alternative to importing ranges is to import worksheets and but this method does not always work.
Figure 1 - Defining a range name for the data that will make the companies table.
Are The Cells Around The Wrong Way?
Sometimes the first column of data actually holds the fieldnames that you want to import and the data goes from left to right rather than down the page. This data will not import cleanly so you will have to use the transpose option as follows.
Select the cells that you want to transpose
Click Copy (Ctrl C).
Select the upper-left cell of the paste area. The paste area must be outside the copy area.
On the Edit menu, click Paste Special.
Select the Transpose check box.
This will rotate the data and make it suitable for importing.
Preparing for the inevitable changes along the way
During the project you will encounter issues in the importing and your users are likely to change the spreadsheet(s) whilst they are waiting for you to complete the software. So rather than doing all the transfers using manual procedures, start building a loading sequence of procedures that will transfer the data in total. Another reason to start building transfer and loading procedures is when you get to the end of the project, you will have a much higher quality transfer when it matters most.
Setting up a sequence to do the imports.
There are three ways to import the data
1) Use the Import command button from the File menu in Access. This is the most versatile but it is a manual process that you can get wrong once you start importing a number of data collections.
2) Use Linked Tables. Access can actually link to a name range in a spreadsheet and will allow you to manipulate the data as if it were a table (almost). This is a very good procedure to follow if you can because you can skip having to import into a temporary table.
3) Use the Transfer Spreadsheet method in visual basic or in macros. When I am setting up an automated loading procedure, I would use visual basic similar to the following code snippet.
Const transferDir = "c:\my documents\"
rangeName = "Companies"
tableName = "Companies"
DoCmd.RunSQL "delete from " & tableName
DoCmd.TransferSpreadsheet acImport, 8, tableName, _
transferDir & "OriginalData.xls", True, rangeName
Things to note in this transfer process are that the Transfer spreadsheet method will make a new table if none exists and append to an existing table if one exists. Therefore it is important to clean up previous transfers first. I do this with a delete SQL query that I also store in the same procedure. Also when writing import code, try to use constants for items like file locations.
I sometimes use Access macros for loading as they are simple to setup. If you end up with complicated macros, you can convert a macro into visual basic by right clicking on the macro and using the save as visual basic option.
Keeping Track Of Your Database Modeling
Now that you have your preliminary Excel data imported, it is time to use your design skills to implement a relationship diagram. When I am working on Access projects, I am always committed to producing and user testing the data model prior to development of the forms and reports that will complete the application. At this stage, you will be refering back to the storey board to help you with your design.
After importing the three spreadsheet ranges, I open the relationship diagram and add the tables Companies, Contacts and Calls. The final relationship diagram is shown in Figure 2. Building your relationship diagram is important as any changes will have an impact on your import processes. In the case of this spreadsheet, there is a clear one to many relationship between Companies and Contacts and a further one to many relationship between Contacts and Calls. To make this relationship, you first of all need to add a primary or unique key to the CompanyName field in the Companies table and ContactID in the contact table. It will also help to add an index to the CompanyName field in the Contacts table for better performance.
Now that the one to many relationships are setup, the sequence for cleaning up the data prior to loading and the loading itself will be effected by the relationship model. To clean up previously imported data, first delete all the data from the lowest tables in the relationship model upwards. I.e. Calls, Contacts and then Companies. Conversely when importing, load from the top down i.e. Companies first, then Contacts and then Calls.
Figure 2. The relationship diagram for the new access application.
Keeping Track Of All The Data Modifications.
As you build your data model, you are more than likely to run into problems with data integrity. i.e. you might decide a field should be numeric but in the spreadsheet, some entries are erroneous string entries. If you then decide to edit these incorrect items, it is better if you can work out how to do it in a query. That way you can run the query as part of your loading process. More than likely you are going to run into the situation where the data that you are loading does not fit the data model that you have designed for the application. If so the process will go something like this
•Delete all records or drop the temporary table that you are going to hold the data in.
•Import the data from the spreadsheet into the temporary table
•Run one or more update queries on the temporary table to clean the data.
•Append all the appropriate fields in the temporary table to the actual table.
In this sort of process, if you wish to just use normal action queries rather than visual basic i.e. delete, update, append, make table, you will need to make sure that you store the queries in the order that you will run them in the loading. The naming convention that I suggest is to give every action query a 3 or 4 digit prefix so that they sort into sequence When you are naming the queries, leave gaps in the sequence so that you can add additional queries that you might miss out on. In figure 3, you will see a sequence of queries that I used when building a metallurgical database from spreadsheets.
Figure 3. Setting up your action queries in numerical order.
Running 5 Queries In A Row - A Really Useful Tip
Have you ever run a number of action queries in a row and wondered which of them was being processed when you get a message saying
"You are about to delete 17 rows from the specified table"
Well a simple solution to this lies in the "AppTitle" property. This is the property that is displayed on the top window of the Access Application. Well it is also displayed on the top of the message boxes that are displayed by the action query messages. In the sample database, I have a included a routine called ChangeTitle that will change the Application title by passing the new title as an argument. e.g.
call ChangeTitle_FX ("Deleting from Contacts table")
DoCmd.RunSQL "delete from Contacts"
Going For The Pure Load Build
Sometimes it is necessary to build and populate the database structure in code and for this I use some (almost) hidden Access queries and methods. These commands are all part of the SQL set and help can be found by typing "SQL data-definition query" into the Access help. I know you can do mimic this commands using DAO and ADO methods but I prefer to use SQL commands whenever I can. The commands that are available are as follows
Create Table; Create Index; Alter Table; Constraint; Drop table
To illustrate how to add a table and a relationship to the database in software, I am going to add a lookup table for the ContactsType field using visual basic (see relationships in figure 2) . This lookup table will guarantee the data entered into the ContactsType field is allowed.
Dim sqlstr As String
sqlstr = "CREATE TABLE tlkp_ContactType " & _
"(ContactType text, Description text)"
' Now add the unique data from the imported contacts table
' This is the SQL equivalent of an Append action query
sqlstr = "INSERT INTO tlkp_ContactType " & _
"( ContactType, Description ) " & _
"SELECT ContactType, ContactType AS Description " & _
"FROM Contacts " & _
"GROUP BY ContactType, ContactType;"
Call ChangeTitle_FX("Appending To tlkp_ContactType")
Another of these SQL commands creates a primary index on the Lookup table.
sqlstr = "CREATE UNIQUE INDEX " & _
" MyIndex ON tlkp_ContactType (ContactType) With Primary"
Now I use a DAO library method to build a one to many cascading relationship between the tables. This has no equivalent command in SQL or in ADO. To find help on this topic, use Access 97 if you can because the DAO help in Access 2000 is very elusive. A good practice illustrated here is to add the "DAO." definition to all DAO objects. When I was developing this routine, I spent a long time debugging this code in Access 2000 because ADO took precedence and the errors that were returned on the append method lead me on a wild goose chase.
Dim dbsBuild As DAO.DATABASE
Dim relatNew As DAO.Relation
Set dbsBuild = CurrentDb
Set relatNew = dbsBuild.CreateRelation("MyRelationship", _
"tlkp_ContactType", "contacts", _
' Need to create a field in relation object prior to
' defining the name of the external field.
.Fields!ContactType.ForeignName = "ContactType"
And to complete the process, the building routine initially has to delete the relationship and the the lookup table in case either of them exist in the database. This deletion of relationships using code is important as the SQL drop table command will not work whilst a relationship is in place. Note that I turn off the error tracking for this section of the code so that the software will just continue on if the tables and relationships do not exist.
On Error Resume Next
' Delete the relationship if it exists
Set dbsBuild = CurrentDb
' Delete the table in case it exists.
DoCmd.RunSQL "Drop table tlkp_ContactType"
On Error GoTo 0
Generally I will only deploy create table and build relationships when I am involved in a large database transfer and conversion project. For most occasions, I load into the database design that I have built directly by using either direct loads or by using temporary tables where necessary. In big transfer projects, it is also useful to put together a set of queries that tell you how many rows have been transferred to each table. These should be compared against the expected results when you actually transfer the live data.
Make Your Solution Look Like A Spreadsheet
Interestingly I have had great success in these conversions projects by delivering an application that consists of only tables for data entry and queries for reports. As a result, the first version of the software that I deliver to the client looks and acts a lot like a spreadsheet. The characteristics of this solution are
1) A well refined data model with all the trimmings such as indexes, lookup tables, referential integrity, formats, data entry rules and field descriptions.
2) A form that basically resembles the relationship diagram with buttons to open each table in the database.
3) A form that fires off all the queries that are representative of the reports that the software is to produce.
That is all I usually deliver in version 1 of the software and surprisingly most spreadsheet users are happy with the outcome. Usually the time saved not doing the fancy forms and reports can be spent training the users about filters, building queries, transferring to a spreadsheet and maybe even putting together their own reports.
Excel Automation For Reports - A giant step to user satisfaction.
Microsoft Access reports are highly regarded as reporting tools amongst developers but they have one major flaw. They do not look like spreadsheets. To keep the spreadsheet users happy, I have been doing a number of reports by passing the data back to an Excel template using Office Automation. The advantage of this approach is that the final answer is the database data transferred back to the original spreadsheet designed by the user. The report cannot look any better than a report designed by the owner who is generally the client.
Lessons Learnt From These Projects
The most important things that I have learnt from transferring from a spreadsheet systems to a one centered on a database is
• The data is managed better
• The data is multi-user
• Less skills are needed to run the application.
• The information can be dissected in new ways far more easily than in a spreadsheet.
• You can sometimes replace many separate spreadsheet reports with just one database report
• Formulae can sometimes be hard to replicate in a database
• Spreadsheet graphs and reports are hard to replicate to the users satisfaction.
• You should understand Automation of Excel in case you want to transfer data back into a spreadsheet report.
• The most important thing in the conversion from another system such as a spreadsheet is to correctly transfer the data.
• You can always refine the database model in phase 2 of your solution because remodeling inside a database is far easier than importing and making a new model in the one step.
• Finally build a simple database interface without too many bells and whistles because the spreadsheet users are generally smart enough to work out a well designed database application. After all they are comfortable entering data into cells and navigating around spreadsheets of great complexity.
Your Sample Database Is Called "Robinson_SStoAccess.accdb" and can be downloaded from this page Design and Tables >>