The Fallacy of Meaningful Codes
By Moneo

In the early 1960s, I was working on a programming contract for Schering Corporation to implement a Production Planning and Control System. My mentor, Tony Penta, was the project manager. In addition to programming, I was assigned the task of creating the codes for the Product Master file. I analysed all the possible classifications, and designed a code composed of fixed-length sub-sections such as group, class, department, etc. I proudly showed the code design to Tony Penta. He took a look at it and said words that I have never forgotten: "Ed, the more meaning you put into a code, the quicker it will corrupt."

Tony went on to explain that a sub-section of a code, the department for example, fits fine into a one-digit number today, because today there are only 6 departments. But what happens to this one-digit code tomorrow, when the company expands to 40 departments? Even converting that one character position to alphanumeric would not help, because 10 digits plus 26 letters give only 36 combinations. The structure of the code would have to be modified to add extra digits, and the result would be a corruption of the original code design, requiring a costly project to convert all the related files, input formats, and reports.
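Tony's arithmetic can be checked with a minimal sketch. The character sets and the helper names `capacity` and `fits` are assumptions made for illustration; they do not come from the original Schering design.

```python
import string

# Symbols available in a single code position (assumed character sets).
DIGITS = string.digits                                 # 10 symbols: 0-9
ALPHANUMERIC = string.digits + string.ascii_uppercase  # 36 symbols: 0-9, A-Z


def capacity(alphabet: str, positions: int) -> int:
    """Number of distinct values a fixed-length field can hold."""
    return len(alphabet) ** positions


def fits(required: int, alphabet: str, positions: int) -> bool:
    """True if `required` distinct values fit in the field."""
    return required <= capacity(alphabet, positions)


print(fits(6, DIGITS, 1))         # 6 departments, one digit
print(fits(40, DIGITS, 1))        # 40 departments, one digit
print(fits(40, ALPHANUMERIC, 1))  # 40 departments, one alphanumeric position
print(fits(40, DIGITS, 2))        # 40 departments, field widened to two digits
```

Only the first and last cases fit. Widening the department field is exactly the change that shifts every other sub-section of the code and forces the conversion project.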

He explained that the safest and simplest approach to code design is to use completely random or consecutive numbers. How many products do you have today? Let's say you have 5000. That's a 4-digit number. Then, allowing for expansion, make the code 6 digits, which will allow the company to grow to 1 million products.
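The sizing rule above might be sketched as follows. The function names `digits_needed` and `next_code` are hypothetical, and numbering from zero with zero-padding is an assumption; Tony's rule only says the code should be consecutive (or random) and carry no meaning.

```python
def digits_needed(count: int) -> int:
    """Fewest decimal digits that give `count` distinct codes (0 .. count-1)."""
    if count < 1:
        raise ValueError("count must be at least 1")
    return len(str(count - 1))


def next_code(last_issued: int, width: int) -> str:
    """Issue the next consecutive code, zero-padded to `width` digits."""
    candidate = last_issued + 1
    if candidate >= 10 ** width:
        raise OverflowError("code space exhausted; the field must be widened")
    return f"{candidate:0{width}d}"


products_today = 5000
minimum_width = digits_needed(products_today)  # 4 digits cover today's products
chosen_width = minimum_width + 2               # headroom for growth: 6 digits

print(minimum_width, chosen_width, 10 ** chosen_width)
print(next_code(products_today, chosen_width))
```

Because no position in the code means anything, growth never changes the format. The only limit is the total capacity, which was chosen with room to spare.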

It took me a while to grasp the magnitude of Tony's warning, but then I kept it fresh in my mind for the rest of my life.

The biggest argument that I ever got in support of meaningful codes is that you can look at a code and tell what the item is from the digits of the code. This is only true if the code has not suffered any changes in format. Hey, if you want to know exactly what an item is, look up the item's code in the Item or Product Master File. Another argument is that meaningful codes are easy to remember. The truth is that people who work with the items or products every day can just as well recognize or memorize codes that were assigned without meaning.

Here are a few examples.

In 1983, at a computer assembly company in Mexico, I was developing a complete manufacturing system. The industrial engineers, who were from Ericsson, had begun designing a part number that curiously looked very similar in style to the original product code I had designed back at Schering. It took me quite some time to convince these engineers of Tony's rule. We finally decided on a 6-digit numeric part number with no embedded meaning.

In 1990, while at Citibank-Mexico, I was developing a Manpower Planning system, for which every job or post had to have a unique code. My boss, the IT director, already had a design in mind consisting of 10 digits, of course with all kinds of meaning in the digits. So I let him have Tony's rule with both barrels. Amazingly, he thought about it for a few minutes, asked a few questions, understood it, and said "Do it." Since we had fewer than 1000 employees/jobs, we went with a 5-digit numeric job code.

A classic example of code corruption is the Mexican TaxID, which is the equivalent of the USA Social Security Number. When I arrived in Mexico in 1983, the code consisted of 4 leading alpha characters plus 6 numeric digits. The 4 alpha characters were taken from your last name, mother's maiden name, and your first name. The 6 numeric digits were your date of birth as YYMMDD. When I first saw this code, I immediately remembered Tony's rule.

Well, sure as heck, in 1991 the government announced a "new" 13-character TaxID, in which they had appended 3 additional alphanumeric characters in order to solve the problem of collisions between codes. The conversion of existing computer systems was chaotic. Some companies attempted to generate the extra 3 characters using algorithms of dubious origin. Actually, the government never published the algorithm because it really wasn't an algorithm per se. The logic was contained in one continually modified program at the central bank. I happened to see this program, and was amazed at the number of hard-coded exceptions that it contained, especially to inhibit generating codes that contained "dirty" words.
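The collision problem is easy to reproduce with any code built from names and a birth date. The sketch below is not the official TaxID rule: `meaningful_id` and its choice of letters are hypothetical, and the two people are invented. It only mimics the shape of the old code, 4 letters followed by YYMMDD.

```python
from datetime import date


def meaningful_id(last: str, maiden: str, first: str, born: date) -> str:
    """Hypothetical derivation, NOT the official rule.

    Takes two letters from the last name, one from the mother's maiden
    name and one from the first name, then appends the birth date as YYMMDD.
    """
    letters = (last[:2] + maiden[:1] + first[:1]).upper()
    return letters + born.strftime("%y%m%d")


# Two invented people who share a birth date and similar names.
person_a = meaningful_id("Garcia", "Lopez", "Maria", date(1960, 5, 14))
person_b = meaningful_id("Gamez", "Luna", "Miguel", date(1960, 5, 14))

print(person_a)
print(person_b)
print(person_a == person_b)  # True: two different people, one code
```

Once two people can receive the same code, the only fix is to bolt on extra characters, which is the kind of format change that corrupts every system that stored the original code.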


To make matters worse, in 1999 they announced another code called "CURP: the unique code for population registration". This beauty did not replace the TaxID, but had to be used in conjunction with the TaxID for payrolls, invoices, taxes, etc. The CURP code is 18 characters long. It uses the first 10 characters of the TaxID and adds further "meaningful" codes: sex, place of birth, the first consonant after the initial letter of the last name, of the mother's maiden name, and of the first name, one numeric digit (normally zero) that is modified to resolve duplicate codes, and a check digit whose algorithm is a government secret. If my dear friend Tony were to see this code, he would turn over in his grave.

*****