Decomposing a Badly Designed Table

<< Click to Display Table of Contents >>

Navigation:  Tables >

Decomposing a Badly Designed Table

Frank Kegley

Many users have decided that they don’t need the full power of an RDBMS and that Excel is the right choice for storing data. It’s not a bad choice, until the application grows to the point that they do need a "real database." Frank Kegley shows you how to handle the typical problems of converting from Excel to Access.

Recently, a client told me that she was tired of using Excel’s "database" capabilities. She asked me to take a look at her list with a view to converting her Excel spreadsheet to one or more tables in an Access database. As you can see in Figure 1, there were duplicate values in many of the columns. There were also typographical errors and inconsistencies, such as different OrderDate values for the same OrderID value. Ensuring the correctness of her data was taking up more and more of her time. Reporting had also become more of a chore than she liked.

200412_fk1
Figure 1

Access was definitely the solution to her problems. In this article, I’ll demonstrate a low-cost, rapid application development (RAD) approach for building the database for moving the Excel "database" to an Access database. However, before we could begin the move to Access, we needed to design the database and "scrub" the data to clean up any problems.

Getting ready

After examining the list, I told my client that her single Excel spreadsheet would decompose into four Access database tables:

• Customers, containing the CustomerID and CustomerName fields.

• Products, containing the ProductID and ProductName fields.

• OrderDetails, containing the OrderDetailsID, OrderID, ProductID, Quantity, and UnitPrice fields.

• Orders, containing the OrderID, OrderDate, ShippedDate, and CustomerID fields.

The field definitions were clear-cut:

• The CustomerID, ProductID, and OrderDetailsID fields will be of type AutoNumber beginning at 1.

• The OrderID field will be of type AutoNumber beginning at the lowest value in the OrderID column.

Since I strongly believe that the primary key to every Access database table should be of type AutoNumber, each AutoNumber field would be the primary key of its respective table. Here are the reasons I believe every primary key should be a field of type AutoNumber:

• It’s simpler to use AutoNumbers than to decide which field or combination of fields should be the primary key. This makes sense for a RAD approach.

• An AutoNumber field is always a single numeric field, which maximizes performance during Joins and in retrieving records by the primary key (multiple-field primary keys and string primary keys run more slowly than AutoNumber fields).

• Each value is guaranteed to be different for each row in a given table, so no code is required to make that happen.

• The key is generated automatically, so you don’t need a process to generate primary keys.

• The primary key value cannot be changed accidentally.

• The primary key doesn’t have meaning as data, so there’s no temptation to change it anyway.

Before importing the Excel list into an Access database, we corrected all errors and inconsistencies in the data. It might have been possible to fix these problems in Access, but trying to import data before correcting these problems creates new problems. The first is that the rows with problems may not import at all, forcing you to reconcile two sets of data (Access and Excel). In addition, since many of the problems relate to primary key errors, you’ll almost certainly have to import the data into tables without a primary key. Without primary keys on your tables, Access is limited in what processing it can do.

Because the OrderID values in the list were used in a variety of other places, the missing OrderID values needed to be inserted in the list or else some records in the Orders table would have a different OrderID than the corresponding records in the Excel list. I built an Excel macro that inserted rows containing the missing OrderID values. That macro is included as part of the Download for this article.

With the data now ready to import, I used Make-Table queries to decompose this "roll your own" database into several smaller tables. My goal was to use SQL statements to generate the tables from the actual data in order to ensure that the table design matched my friend’s real-world data. Once the tables were initially set up, I could use the power of Access to tailor the tables to generate a professional database design.

Creating the Customers table

Now that I’d cleaned up the data, I could start taking advantage of the power of Access. So, to begin with, I imported the whole Excel list into a single table, called Imported. From here I would distribute the data to the tables that made up our design.

My next step was to build the Make-Table queries to create my new tables. While I could have created the table first and then loaded it with data, using a Make-Table query has two advantages:

• It ensures that the datatypes in the table match the data in the Imported table.

• Going into the table after it’s created to make any modifications ensures that I review the final product.

The following SQL statement pulled the customer name from the Imported table and used it to both populate and create the Customers table. When the query was run, it was just a matter of clicking Yes when asked to paste the rows into a new table to add all my data:

SELECT DISTINCT Imported.CustomerName

INTO Customers

FROM Imported

WHERE Imported.CustomerName Is Not Null;

When you close this query, you’ll need to decide whether you want to save it. This decision will be based on how confident you feel about your work. I’d recommend saving the query in case you need to re-run the process.

My query created a table with all of the columns from the imported table–but that was all it did. To add the primary key, I still needed to open the Customers table in Design view, add an AutoNumber field (which I called CustomerID), and click the Primary Key button to make my new field the primary key. Depending on what queries you think you’ll be running against this table, you’ll need to add any indexes that you require.

Creating the Orders table

The process for creating other tables was only slightly more complicated than the one for the Customers table. For instance, for the Orders table I needed to include a reference back to the Customers table, using the CustomerID field (which doesn’t exist in my Excel spreadsheet). In addition, I needed to convert my friend’s OrderID field into an AutoNumber field (fortunately, all of her OrderIDs were already numeric and in order–the only problem was that the first OrderID value wasn’t 1).

My SQL for the initial creation of the Orders table looked like this (you’ll notice that I’ve included the CustomerName field even though I really want the table to use the CustomerID field):

SELECT DISTINCT Imported.OrderID,

Imported.OrderDate, Imported.ShippedDate,

Imported.CustomerName

INTO Orders

FROM Imported;

I can’t change the type of the OrderID field to AutoNumber if it already contains data, so I switched to the Access Database window, copied the Orders table, and pasted it using the STRUCTURE ONLY option into a table I called NewOrders. After opening the NewOrders table in Design view, I changed the type of the field OrderID to AutoNumber and clicked the Primary Key button. While in Design view, I also used the Lookup Wizard to add a new field called CustomerID that would tie to the Customers table. I still needed to update this field with the Customer data, but I’ll come back to that.

I now had to determine the minimum OrderID value using this SQL statement to make it the starting point for my OrderID AutoNumber field. This query does that:

SELECT Min(Orders.OrderID) AS MinOfOrderID

FROM Orders;

I saved this query so that I could use it in a later step. However, I also wanted to make sure that I’d be able to recognize this query and delete it when the database was finally loaded, so I used an obviously temporary name: Query1. Now, to set the initial AutoNumber I used this Append query to add the lowest numbered Order to the table:

INSERT INTO NewOrders (OrderID, OrderDate,

ShippedDate, CustomerName)

SELECT DISTINCT Orders.OrderID, Orders.OrderDate,

Orders.ShippedDate, Orders.CustomerName

  FROM Orders INNER JOIN Query1

    ON Orders.OrderID = Query1.MinOfOrderID;

To append the rest of the rows from Orders to NewOrders, I just removed Query1 from the previous query and ran the query again. I got a message that not all rows could be appended because I had already appended one row, so I just clicked Yes to that message.

I now switched back to my NewOrders table. I sorted the orders on CustomerName and deleted the records that had no value in the CustomerName field. I also sorted on OrderID in order to verify that the lowest OrderID matched the number that I used to start my AutoNumber field. Then I deleted the original Orders table and renamed the NewOrders table to Orders.

To update the CustomerID field of the Orders table, I used the following Update query. This query joined the Customers table to the Orders table using the CustomerName field. With the tables joined, I updated the CustomerID field in the Orders table:

UPDATE Orders INNER JOIN Customers

ON Orders.CustomerName = Customers.CustomerName

SET Orders.CustomerID = [Customers]![CustomerID];

Back in Design view for the Orders table, I deleted the now redundant CustomerName field. I also set up the CustomerID field to do a lookup into the Customers table. I prefer combo boxes on forms to lookup fields in tables, so, while in Design view, I changed the Display Type property on the Lookup tab of the CustomerID field to Text Box.

Creating the OrderDetails table

The OrderDetails table used a similar process to the Orders table. First, I generated the table and loaded it with data:

SELECT DISTINCT Imported.OrderID,

Imported.ProductName, Imported.Quantity,

 Imported.UnitPrice

INTO OrderDetails

FROM Imported

WHERE Imported.ProductName Is Not Null;

I then created my AutoNumber OrderDetailsID field. Unlike the work I had to do with replacing the CustomerName field with my new CustomerID field, this table was already connected to the Orders table through the OrderID field. However, I did have to add a ProductID field, join Orders to my Products field using ProductName (once the Products table was created), and populate the ProductID field. Then I deleted the ProductName field. I also created a Lookup relationship between the ProductID field and the Products table.

Relationships

Finally, with all of my tables created, I added relationships between the tables. I opened the Relationships window and found most of my relationships already created. When I put the Lookup field in the Orders table earlier, Access built a relationship between the Customers and Orders tables. However, by default the relationships aren’t enforced. So I edited each relationship and set the options so that Referential Integrity is enforced.

Of course, there would be more work to do before arriving at a useful application, but my client’s data had now been successfully transferred to a data-centric environment. Along the way she had acquired a robust database design with real relationship integrity. We were now, for instance, in a position to further enhance her data integrity by adding validation rules to fields and tables, or to document her data using Access’ description fields.

 

 

Your download file is called 412KEGLEY.ZIP in the file SA2004-12down.zip

This can be purchased with all the other downloads on this page

 

See Also