My sister got carded around the time she turned 40, and they were not carding everyone. She might not want you to know she is over 40, but heigh-ho, it’s pretty cool that someone might think she was underage at this point. With two adorable little boys, she still looks young and hip (too hip to use that term, I’m guessing). She definitely looks younger than I, and, well, she is. She says that she used to think that she was the cute one and I was the smart one, but now with her living in sunny Pasadena and me living in Iowa, she realizes that she is also the smarter of the two of us. Of course, she would have said “smartest.”
I have almost never been carded. I guess I drank even less when I was younger than I do now, and now I have a drink perhaps 6 times a year, give or take 4. With different allergies to at least beer, wine, gin, and tequila, count me in for the vodka or rum drinks, with an occasional white wine. You might think I have no point here other than to chitchat before talking about primary keys, but there really is a good reason to bring this up. Because the audience for this blog is small, I have met or will meet many of you who are reading this. I mentioned a martini earlier, but now that you know my order, there might be a better chance you will buy me a vodka martini with 3 olives. In my role as a preacher’s wife, that is a very rare occurrence.
The ID I show when flying, or would show were I ever carded, is my driver’s license. The unique identifier on it has 3 digits, then 2 alpha characters, then 4 more digits, although that pattern varies across the planet, and even from state to state within the U.S. Let’s store information about a bunch of these driver’s licenses. We can do that by creating a database table or, in our case, a (logical) MultiValue file.
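As a small illustration of the identifier format just described, here is a hypothetical Python check for the 3-digit, 2-letter, 4-digit pattern. The pattern covers only the license format described here (other states and countries differ), and the function name is made up for this sketch.

```python
import re

# Hypothetical check for the format described above: 3 digits, 2 letters, 4 digits.
# Assumption: this is one jurisdiction's format only; other states and countries differ.
LICENSE_ID_PATTERN = re.compile(r"[0-9]{3}[A-Z]{2}[0-9]{4}")


def looks_like_license_id(candidate: str) -> bool:
    """Return True only if the whole string matches the pattern."""
    return LICENSE_ID_PATTERN.fullmatch(candidate) is not None


print(looks_like_license_id("123AB4567"))  # True
print(looks_like_license_id("12AB34567"))  # False
```

A format check says nothing about uniqueness; that guarantee comes from the issuing state, not from the pattern.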
In the relational model, there are candidate keys: minimal combinations of attributes that uniquely identify each tuple in a relation. These are implemented as one or more columns in a table. In many DBMS implementations, one of these candidate keys is selected as the primary key for the table.
In a MultiValue database, each File has a primary key that is a single field, typically referred to as the @ID. Not only can an MV file be modeled as a mathematical relation, just as tables are in a relational database, it can also be modeled as a function. Functions, in mathematics, are relations with a special characteristic: each input maps to exactly one output. I won’t bore you with more details than that. A File named Drivers.Licenses is a function in that Drivers.Licenses(@ID) = @RECORD, where @RECORD is the rest of the fields, other than the primary key. With Drivers.Licenses as our function, Drivers.Licenses(“123AB4567”) = (“Mary Smith”, “F”, “01/01/1960”,…), a tuple of information related to the driver’s license with license id 123AB4567.
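To make the “file as a function” idea concrete, here is a minimal Python sketch in which a dictionary stands in for the Drivers.Licenses file. Only the three fields shown above are modeled (the rest are elided), and the field names `name`, `gender`, and `birth_date` are hypothetical.

```python
from typing import NamedTuple


class LicenseRecord(NamedTuple):
    """The non-key fields (@RECORD); these field names are hypothetical."""
    name: str
    gender: str
    birth_date: str


# The file as a function: each @ID maps to exactly one @RECORD.
drivers_licenses: dict[str, LicenseRecord] = {
    "123AB4567": LicenseRecord("Mary Smith", "F", "01/01/1960"),
}


def read_record(file: dict[str, LicenseRecord], record_id: str) -> LicenseRecord:
    """Evaluate Drivers.Licenses(@ID) and return @RECORD."""
    return file[record_id]


print(read_record(drivers_licenses, "123AB4567"))
# LicenseRecord(name='Mary Smith', gender='F', birth_date='01/01/1960')

# Writing again under the same @ID replaces the record rather than adding a second
# one, because a function cannot map one input to two outputs.
drivers_licenses["123AB4567"] = LicenseRecord("Mary Smith", "F", "01/01/1960")
print(len(drivers_licenses))  # 1
```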
The @ID is a single field, but it might be composed of multiple parts. So where a relational database table might have a composite key, a MultiValue file would typically have a multi-part key, with the parts separated by a delimiter such as an asterisk. We don’t need that for this example. We have a single field, LicenseId, that can be our primary key, aka our @ID.
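Suppose, hypothetically, that one file held licenses from several states and a LicenseId were guaranteed unique only within its state. The @ID could then be a two-part key. Here is a minimal Python sketch of building and splitting such a key with an asterisk delimiter; the state code “IA” and the function names are illustrative only.

```python
DELIMITER = "*"  # asterisk, as mentioned above; other delimiters are possible


def make_key(*parts: str) -> str:
    """Join key parts into one @ID, refusing parts that contain the delimiter."""
    for part in parts:
        if DELIMITER in part:
            raise ValueError(f"key part {part!r} contains the delimiter {DELIMITER!r}")
    return DELIMITER.join(parts)


def split_key(record_id: str) -> list[str]:
    """Recover the individual parts of a multi-part @ID."""
    return record_id.split(DELIMITER)


key = make_key("IA", "123AB4567")
print(key)             # IA*123AB4567
print(split_key(key))  # ['IA', '123AB4567']
```

The check in `make_key` matters because a part containing the delimiter would make the key impossible to split unambiguously.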
In this example, I have used what is called a natural key, in that the driver’s license identifier, LicenseId, is created outside of our system. We could, alternatively, have selected a surrogate key, one we generate within our system. What might prompt us to model our data with a surrogate key for our Drivers.Licenses file?
1. Ease of implementation I
Cache’ gives an easy means of creating an auto-increment key, one where the first one written is 1, the next one is 2, and the 537th one is 537. In other words, using a surrogate key in Cache’ is a piece of cake.
To get this feature, we need to write records using object syntax, such as status = myrecord->%Save(); the newly assigned key is then available from myrecord->%Id(). Since we are using the class definition as the source for our metadata, if we specify no primary key index on any property in the class, then the primary key is assumed to be an auto-increment key.
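To show the behavior of an auto-increment key (first record 1, next 2, and so on), here is a hypothetical in-memory Python sketch. It only mimics the behavior described above and is not Cache’ code. The class and method names are made up, and persisting the counter or handling concurrent writers is left out.

```python
import itertools


class AutoIncrementFile:
    """Hypothetical in-memory file that assigns sequential integer keys from 1.

    Mimics the behavior described in the text only; not Cache' code. Persisting
    the counter and handling concurrent writers are ignored here.
    """

    def __init__(self) -> None:
        self._records: dict[int, dict] = {}
        self._next_key = itertools.count(1)

    def save(self, record: dict) -> int:
        """Store a new record under the next key and return that key."""
        new_key = next(self._next_key)
        self._records[new_key] = record
        return new_key

    def delete(self, key: int) -> None:
        del self._records[key]

    def read(self, key: int) -> dict:
        return self._records[key]


# Made-up sample records.
licenses = AutoIncrementFile()
first = licenses.save({"LicenseId": "123AB4567", "Name": "Mary Smith"})
second = licenses.save({"LicenseId": "987CD6543", "Name": "John Doe"})
licenses.delete(second)
third = licenses.save({"LicenseId": "555EF1212", "Name": "Ann Lee"})
print(first, second, third)  # 1 2 3
```

In this sketch, a deleted key is never handed out again, so a surrogate key keeps pointing at the same logical record for its whole life.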
As best I can tell, use of an auto-increment key in Cache’ is almost assumed, so if you use a natural key for an MV file in Cache’, someone might tell you that you are overriding the id. Don’t punch them out if they say that, even if that is your inclination.
2. Ease of implementation II
In the Zen AJAX framework, which uses Model-View-Controller, when you refer to %id in a page with a dataController, it refers to the primary key if and only if the key is an auto-increment key. In your XML specification for a form field, for example, you can specify dataBinding="%id". Otherwise, you are on your own for figuring out the name of the primary key; there is no general name for the unique identifier of a model class.
If I am struggling with some of the basic Create-Read-Update-Delete (CRUD) features in a Zen web page, it is likely because the Model I am using is keyed with a natural key. Few examples in the Cache’ documentation or the sample namespace use anything other than an auto-increment primary key.
3. Bitmap indices
We can create way-cool bitmap indices for Cache’. While I have not yet defined any of these (who optimizes for performance before a system is live, eh?), I anticipate these could be very helpful in due time. As the Cache’ documentation indicates, “bitmap operations are optimized for transaction processing: you can use bitmap indices within tables with no performance penalty as compared with using regular indices”.
One requirement for these is stated as “you can only define bitmap indices in tables (classes) that either use system-assigned numeric ID values, or use an IDKEY to define custom ID values when the IDKEY is based on a single property with type %Integer and MINVAL is greater than 0”. There you have it: this added feature is available only for files whose keys are positive integers.
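Why would bitmap indices need positive integer IDs? The conceptual Python sketch below shows one plausible reason. It is not how Cache’ stores its bitmaps. Each indexed value gets a bitmap in which bit N is set when record N has that value, so the record ID has to work as a bit position. The class, helper, and sample data are hypothetical.

```python
def set_bits(bits: int) -> list[int]:
    """Return the record IDs whose bits are set in a bitmap."""
    return [i for i in range(bits.bit_length()) if (bits >> i) & 1]


class BitmapIndex:
    """Hypothetical bitmap index: one bitmap per value, bit N set for record ID N.

    Using the record ID as a bit position only works for positive integer IDs.
    Conceptual sketch only; not how Cache' stores bitmaps.
    """

    def __init__(self) -> None:
        self._bitmaps: dict[str, int] = {}

    def add(self, record_id: int, value: str) -> None:
        if not isinstance(record_id, int) or record_id <= 0:
            raise ValueError("bitmap positions require positive integer IDs")
        self._bitmaps[value] = self._bitmaps.get(value, 0) | (1 << record_id)

    def bitmap(self, value: str) -> int:
        return self._bitmaps.get(value, 0)


gender = BitmapIndex()
birth_decade = BitmapIndex()
for record_id, g, decade in [(1, "F", "1960s"), (2, "M", "1960s"), (3, "F", "1970s")]:
    gender.add(record_id, g)
    birth_decade.add(record_id, decade)

# Combining two conditions is a single bitwise AND of the two bitmaps.
print(set_bits(gender.bitmap("F") & birth_decade.bitmap("1960s")))  # [1]
```

A key such as 123AB4567 has no natural bit position, which fits with the requirement quoted above.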
4. The Usual Reasons for Choosing a Surrogate Key
Of course, there are the usual reasons for using an auto-increment or other system-generated key rather than the number on a physical object, such as a driver’s license. The debate over natural versus surrogate keys is well-worn. I am more likely to argue for a natural key when the decision seems to be a toss-up.
What do we do if we find that everything about the driver’s license was entered properly except for the LicenseId? Unless we write tools to do otherwise, that would be a delete and re-enter scenario if the key is the LicenseId, but would be a piece of cake if the LicenseId were just another field like the Gender.
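The contrast between those two repair jobs can be sketched in a few lines of Python, with dictionaries standing in for the file. The function names and the mistyped license number are hypothetical.

```python
def correct_natural_key(file: dict[str, dict], old_id: str, new_id: str) -> None:
    """LicenseId is the @ID: fixing it means deleting and re-writing the record.

    This sketch does not update other files that might store old_id as a reference.
    """
    if new_id in file:
        raise KeyError(f"{new_id} already exists")
    file[new_id] = file.pop(old_id)


def correct_license_field(file: dict[int, dict], key: int, new_license_id: str) -> None:
    """With a surrogate @ID, LicenseId is just another field, like Gender."""
    file[key]["LicenseId"] = new_license_id


natural = {"123AB4567": {"Name": "Mary Smith", "Gender": "F"}}
correct_natural_key(natural, "123AB4567", "123AB4576")
print(natural)  # {'123AB4576': {'Name': 'Mary Smith', 'Gender': 'F'}}

surrogate = {1: {"LicenseId": "123AB4567", "Name": "Mary Smith", "Gender": "F"}}
correct_license_field(surrogate, 1, "123AB4576")
print(surrogate[1]["LicenseId"])  # 123AB4576
```

In the natural-key version, the record's identity itself changes. In the surrogate-key version, key 1 still points at the same record.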
Of course, if this is not really a Drivers.Licenses file for tracking driver’s license information, but one for tracking a person whose driver’s license number might change, and we want to reflect that change here, then we also would not want LicenseId to be the primary key.
Some of those are better reasons than others for choosing a surrogate key. Although I look first for natural keys, our SnupNow software has more files keyed with an auto-increment key than it would have if I had not adjusted the MultiValue modeling habits I acquired over my prior years of experience. Of the files we have started using in our builds, 9 out of 16 [16 files translates to ~50 relational tables] have an auto-increment key, and some of the others are co-files of these, in that their key values are the same as the auto-increment key in a sister file. I can think of only two files I specified with alpha characters in the key. Obviously, the Email primary key includes alpha characters.
I am not typically a fan of creating unique identifiers for unique identifiers (surrogate keys where there are obvious natural keys), but there were not many natural keys for this application anyway, Email being an exception. Had I implemented this in another MV environment, however, I would have had more multi-part keys. I steered away from some of those simply because multi-part keys are typically not numeric, having a non-numeric delimiter between the parts. I created two multi-part keys with no delimiter; instead, the second part of each key is a fixed length, so we can find the parts without a delimiter.
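A delimiter-free, two-part numeric key might work roughly like the Python sketch below. It assumes, hypothetically, that the first part is an auto-increment key from a sister file and the second part is a sequence number padded to a fixed width of 3 digits. The actual widths and names in our files are not given here.

```python
SECOND_PART_WIDTH = 3  # hypothetical fixed width, so the second part runs 1 to 999
SCALE = 10 ** SECOND_PART_WIDTH


def make_numeric_key(first: int, second: int) -> int:
    """Combine two positive integers into one numeric key with no delimiter.

    Same as writing `first` followed by `second` zero-padded to
    SECOND_PART_WIDTH digits, e.g. 42 and 7 -> 42007.
    """
    if first <= 0 or not 0 < second < SCALE:
        raise ValueError("key part out of range")
    return first * SCALE + second


def split_numeric_key(key: int) -> tuple[int, int]:
    """Recover both parts: the fixed-width second part is the last digits."""
    return divmod(key, SCALE)


key = make_numeric_key(42, 7)
print(key)                     # 42007
print(split_numeric_key(key))  # (42, 7)
```

The trade-off is the range check. A fixed width caps how large the second part can grow, which is the price of keeping the whole key numeric.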
Since this is the first Cache’ implementation for which I have done any data modeling, time will tell if I made good primary key choices. A more accurate way to state that might be to say that time will tell which were the poorest choices I made regarding primary keys. I don’t recall discussions among developers where we talk about how delighted we are with a particular choice for a primary key, but I certainly can recall times when what previously seemed a good choice has turned out to be a lousy one, typically due to new requirements.
If I were a purist, or maybe even an excellent software engineer, I would state right now that when we see that an @ID should be redefined, we will refactor for the change right then and there, so we do not perpetuate poor design choices as we add to the software. Truth be told, however, I would be ID’d as a pragmatist and a businesswoman, so any such decisions will be made based on the cost and benefit information we have at the time.
I will be continuing this topic, with the next article documenting the various names for the @ID within our development environment. If you thought this one was getting a little too detailed or technical, you should definitely avoid the next one. They can’t all be winners to all types of people. In fact, before I let you read that one, may I see your @ID?