How to Write Bad Code

Hans Spiller

There are a lot of ways to write bad code, and I've done most of them. But before we can really go about deciding what is bad code and what's not, we need to establish some metrics and goals.

Why would you want to write bad code? Well, there are lots of reasons:

Fun

The most important is that it's fun. In the same way as throwing rocks at passing cars and tearing the wings off of bugs are both a lot of fun, it's fun to really screw up a program.

Job security

If you're the only one who can work on a piece of code that they need, they can't fire you. The flip side of this is that they can't let you go work on anything else. Being able to really screw up a piece of code and then duck out to go screw up another project too is a truly special talent that only a few have demonstrated.

Egoism

Knowing that no one else can work on your code makes you feel smarter.

What are the tools of our trade?

The most important tool is confusion. It helps to a degree if we are confused ourselves, but if we're sufficiently confused we might not be able to get the piece of code to appear to work, and then where would we be? Code that doesn't work at all doesn't satisfy any of the reasons for writing bad code--no one will want it, so no one will be screwed up by it. So the key is to keep our personal confusion to a safe level, and maliciously do things to confuse the unfortunate who tries to read our work.

Most of the standard tactical maneuvers can be applied to achieve confusion: misdirection, obscuration, overwhelming, stealth, surprise, indirection, etc. If it works in football or warfare, it can probably be used to write bad code.

The second tool is suffering. This includes a lot of things: waiting for compiles, unhelpful error messages, long hours, the cumulative effects of confusion--anything that makes working on a piece of code unpleasant. I generally refer to this as pain. Bad code is painful to work on.

Let's look at a few software engineering concepts and see how they can make our code good or bad.

Locality

Locality can be a very easy thing to measure. In order to figure out how something works, how often do you have to search around? If you have to follow several pointers for every single line, the bad code artist has done a good job.

Dijkstra demonstrates one example of this in his "Goto Considered Harmful" letter. Each time you have to follow a goto, you have to change contexts. Of course, in most cases it's difficult to do more than a single indirection per line using goto, so it's a comparatively weak method of writing bad code. Worse yet, there are a lot of cases where gotos can actually improve readability, so goto is a fairly limited weapon.

We can expand the power of the goto by turning it into a function call, and doing several tiny ones on a line. Accessor functions are an excellent way to achieve this, especially inherited accessor functions. Each function does a tiny thing, and then calls some other function, requiring the would-be maintainer to change contexts each time, not just elsewhere in the same procedure, but in another source file, or better yet, in another source file that's in another directory. We can use this to our advantage, though, by hiding a side effect inside the accessor. It's most effective if he's not sure whether the accessor he's looking at is the right one. Accessors, inheritance and overloading are among the best things yet invented for bad code writers. Of course, accessors are a double-edged sword. It's quite possible to write an accessor that does something helpful, like internal consistency checking, or making it easy to change the layout of a structure, instead of just generating pointless and irritating indirection. When most accessors do almost nothing, while others, apparently very similar, do something quite radical, the bad coder has achieved the goal.
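To see the contrast the paragraph draws, here is a minimal C++ sketch built around a hypothetical `Buffer` class (all names are invented for illustration). `size()` is the helpful kind of accessor, doing an internal consistency check; `pending()` looks just as innocent but quietly resets the object.

```cpp
#include <cassert>

// Hypothetical class: two accessors that look alike but behave very differently.
class Buffer {
public:
    explicit Buffer(int capacity) : capacity_(capacity) {}

    // Helpful accessor: checks an internal invariant and changes nothing.
    int size() const {
        assert(size_ >= 0 && size_ <= capacity_);
        return size_;
    }

    // Trap accessor: reads like a getter, but silently empties the buffer.
    int pending() {
        int p = size_;
        size_ = 0;
        return p;
    }

    void push() {
        if (size_ < capacity_) ++size_;
    }

private:
    int capacity_;
    int size_ = 0;
};
```

Calling `pending()` once just to "look" at the buffer empties it, which is exactly the kind of surprise the paragraph describes.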

Dijkstra's complaint about goto is that it leads to spaghetti code. Of course, with a little work, spaghetti code can be followed. Spaghetti data, on the other hand, requires a much larger amount of work to follow. Spaghetti objects are clearly the highest form so far devised.

Source code browsers can give the illusion of greater locality than there actually is. This can work tremendously in favor of the bad coder. For example, many browsers don't work if the application failed to build. So by wiring in hidden dependencies on the specific directory structure of your machine, you can make life sheer hell for the person who would try to work on your code. And you can say "I dunno, it compiled for me" and show them how easy it is to find stuff with the browser.

Coupling/Encapsulation

Coupling is a measure of how interrelated two separate things are. If there's a lot of coupling, then it's difficult to follow one thing without a deep understanding of the other. If there's minimal coupling, then you can understand the one thing with only a minimal understanding of what the other one does. Of course, as bad code artists, we're trying to maximize coupling.

The object orientation people have done bad code artists a huge favor here. What those misguided fools think they're doing is minimizing the amount that the user of a component has to understand about the workings of the component. One of the ideas they use, they call "Hiding". Wow, this sounds just like a bad code technique! By making it difficult to get into the components of an "Object", they think they're promoting neatly encapsulated packages. Wrong! They've played right into our hands. We can use those very mechanisms to hide our little pitfalls. And by forcing a complicated interface to fit an irrelevant language structure, we can convolute it dramatically!

If the object oriented people really wanted to promote encapsulation, they would have stressed encapsulation more, and never even used the word "hiding". Hiding is a very powerful bad code technique, because it makes things difficult to follow. Encapsulation is just the opposite--it makes it unnecessary to follow. You can't force encapsulation--you have to come up with an interface that's easy to describe--anathema to a bad coder. You can easily enforce hiding, of course, and that plays perfectly into the hands of bad coders.

Here's a simple bad code technique based on coupling that can be used in any language to really screw things up: It is frequently useful to build a table which is indexed with an enumeration. Now I hear you say that this sounds like a good code technique. To make it a bad code technique, you merely need to separate the definition of the enumeration and the definition of the table itself. With most languages, this is easily done--in fact, with many languages, it takes work to do anything else. Now, for the coup de grâce: Put some complicated conditional compilation in the table, so that not all builds use all of the elements--and make sure the enumeration has all the same conditionals. Putting all the variation at the end is cheating, and so is using a tool to generate it (unless you're abusing a tool...see below). For truly bad code, you have to be able to mix and match from the middle.
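The remedy a good coder would reach for is to keep the enumeration and its table side by side and let the compiler check that they agree. A minimal C++ sketch, assuming a hypothetical `Color` enumeration with a trailing count entry:

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical enumeration with a trailing count sentinel.
enum Color { kRed, kGreen, kBlue, kColorCount };

// Table indexed by Color, defined right next to the enumeration.
static const char* const kColorNames[] = {
    "red",    // kRed
    "green",  // kGreen
    "blue",   // kBlue
};

// Fails to compile if the enumeration and the table drift apart in length.
static_assert(sizeof(kColorNames) / sizeof(kColorNames[0]) == kColorCount,
              "kColorNames must have one entry per Color");

const char* ColorName(Color c) {
    assert(c >= 0 && c < kColorCount);
    return kColorNames[c];
}
```

The `static_assert` only catches a mismatch in the number of entries, not a reordering, so the comment beside each entry still earns its keep.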

Redundancy

Duplication of code is a tremendously effective way to set pitfalls. For example, if you have a whole bunch of code that does very similar things, possibly with very minor variations, it's important to duplicate it if you want to write bad code. The most visible effect is code bloat, which is obviously the sort of badness we're looking for. But there's a much more insidious effect. Suppose that the would-be understander of your bad code needs to add something to this duplicated sequence, such as a bug fix. He has to find each instance and separately add the change to each one. Similarly, there's a chance some subtle difference will sneak into some of the cases, and will get by testing. Serendipitous bad code! This of course works best if the similar sequences look quite different, e.g. with different comments or variable names, but are in fact very similar functionally.

Dijkstra's "Goto Considered Harmful" letter is one of the bad code artist's best friends here. A lot of people who think they're writing good code by eliminating gotos are in fact setting just the kind of pitfalls we're talking about. One kind of redundancy that's often used to eliminate gotos is to add a flag. Here we have an extra piece of data whose sole purpose is to obfuscate a simple piece of control flow! After 25 years, I think it's fair to say that Dijkstra's letter has promoted far more bad code, through duplication and spurious flags, than the "spaghetti" code it was intended to prevent. Hooray for Dijkstra! Of course, there is nothing preventing bad code artists from doing both.
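A small C++ sketch of the flag pattern described above, next to a version that simply leaves the loop once the answer is known. Both functions are hypothetical and return the index of the first match, or -1 if there is none:

```cpp
#include <cassert>
#include <cstddef>

// Flag version: an extra variable whose only job is to emulate a jump out of the loop.
int FindWithFlag(const int* a, std::size_t n, int key) {
    assert(a != nullptr || n == 0);
    int result = -1;
    bool found = false;
    for (std::size_t i = 0; i < n && !found; ++i) {
        if (a[i] == key) {
            result = static_cast<int>(i);
            found = true;
        }
    }
    return result;
}

// Direct version: return the moment the match is found.
int FindDirect(const int* a, std::size_t n, int key) {
    assert(a != nullptr || n == 0);
    for (std::size_t i = 0; i < n; ++i) {
        if (a[i] == key) return static_cast<int>(i);
    }
    return -1;
}
```

Both give the same answers; the first just makes the reader track one more variable to learn that.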

Obscurity

It's quite possible to write a piece of bad code which is concise, self-contained, and non-redundant. It generally is quite difficult, though, and achieving it is a sort of Nirvana for the bad coder. It is in the area of obscurity that all of the truly bad coders rise to their supreme creative achievements. The most effective way is through unnecessarily complicated algorithms. An example would be using a quicksort or Shell sort (uncommented, of course) when there can never be more than a dozen or so items to be sorted. A good coder would have used a bubble sort. It's simpler, and for the short list, just as fast. Another example is using a complicated data structure, especially one that requires a lot of finicky special-case code, to handle something that can easily be traversed linearly, and which is only used in a non-time-critical portion, such as user interaction.
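For reference, the plain bubble sort the paragraph recommends for a dozen or so items fits in a few lines. This is a minimal C++ sketch, not tuned for larger inputs:

```cpp
#include <cassert>
#include <cstddef>
#include <utility>

// Plain bubble sort: easy to read at a glance, and adequate when the list
// can never hold more than a dozen or so items.
void BubbleSort(int* a, std::size_t n) {
    assert(a != nullptr || n == 0);
    for (std::size_t pass = 0; pass + 1 < n; ++pass) {
        // Each pass moves the largest remaining item to the end of the unsorted part.
        for (std::size_t i = 0; i + 1 < n - pass; ++i) {
            if (a[i] > a[i + 1]) std::swap(a[i], a[i + 1]);
        }
    }
}
```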

One of my favorite comments is "Abandon hope all ye who enter here" heading a particularly obscure passage in a famous operating system. Many of the worst pieces of obscurity, such as the one this comment heads, are done in the name of performance. When this is done with full knowledge of the real performance of the program (either late in the development cycle, or by someone with a deep understanding of the behavior of the system), and in ways that leave the code portable and maintainable, this is good code. That's a lot of "with"s. Leave any of them out, and you've got bad code. Bit twiddlers are often the baddest of the bad coders, and they frequently take a justifiable pride in this.

One of the most effective ways to obscure things is to hide them in constructors and destructors. Many languages have constructs which have hidden or obscure side effects (e.g. garbage collection, heap allocation, implicit indirection, mass allocation or copying), but the Object Oriented people have provided the most effective tool yet. They've also generated a bunch of new terminology and syntax for some old ideas, which always contributes mightily to our great goal.
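A minimal C++ sketch of how much can happen out of sight. The hypothetical `Handle` class below changes a global counter in its constructor, copy constructor, and destructor, and a call that passes a `Handle` by value triggers two of them without either being visible at the call site.

```cpp
#include <cassert>

// Hypothetical global state that constructors and destructors quietly modify.
int g_open_handles = 0;

class Handle {
public:
    Handle() { ++g_open_handles; }               // hidden side effect on construction
    Handle(const Handle&) { ++g_open_handles; }  // ...and on every copy
    ~Handle() { --g_open_handles; }              // ...undone on destruction
};

// Nothing at a call site mentions g_open_handles, yet passing by value
// copy-constructs a Handle on the way in and destroys it afterwards.
int CountDuringCall(Handle h) {
    (void)h;
    return g_open_handles;
}
```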

Inheritance

Inheritance is a wonderful weapon that the object oriented people have added to the bad coder's arsenal. It comes with a risk, because it's possible to use it to add symmetry and simplification to a program. Almost nobody does, though. Of the several thousand objects that the author has examined, exactly four of them used inheritance to any real improvement in readability or performance. Most of them didn't use it at all, because it's pretty hard to understand. Virtually all attempts to use inheritance succeeded only in making the program harder to understand. Clearly a natural for bad coders!

Inheritance is a mechanism by which components can be selectively reused verbatim by another component of a program. This can be used to achieve incredible levels of nonlocality, obscuration, and misdirection if suitably applied. The most effective use the author has seen is in a fairly simple (approx. 5000 lines of code) program which does some would-be simple analysis, using a well understood technique which has been universal since the computer industry first began. (The previous implementations of the algorithm the author has seen are in the general range of 2000 LOC, including one that was entirely in assembly, b.t.w.) This program used identically named functions which were in fact not related in any way, while frequently calling inherited functions across the same interface, to achieve a remarkable level of confusion. The programmer also found extensive opportunities to duplicate large passages of nearly identical code in several components. So you can see, inheritance is a wonderful opportunity for bringing out our other techniques.
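One C++ mechanism that produces the "identically named but unrelated functions" effect is name hiding: a derived class declares a non-virtual function with the same name as one in its base. In the hypothetical sketch below, which `Size()` runs depends on the static type used to call it, not on the object itself.

```cpp
#include <cassert>

// Hypothetical classes. Both define Size(), but the two functions mean
// unrelated things, and because Size() is not virtual, the one that runs
// is chosen by the static type of the expression.
struct Shape {
    int Size() const { return 1; }       // "size" means: number of shapes
};

struct Polygon : Shape {
    int sides = 6;
    int Size() const { return sides; }   // "size" means: number of sides; hides Shape::Size
};

// Called through the base class, a Polygon answers with Shape::Size.
int SizeViaBase(const Shape& s) { return s.Size(); }
```

The same object reports 6 when asked directly and 1 when asked through a `Shape&`, which is about as much misdirection as two one-line functions can deliver.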

Comments

Comments are one of the best ways to obfuscate and misdirect. At the very least, they can be used to overwhelm. Here are a couple of examples:

	les	bx, [pFoo]	; move pointer to a foo into es:bx

Note that the comment provides NO information at all. If we can read the code, we don't need the comment. Even without using Hungarian, we'd need to be pretty thick not to figure out that something being loaded into es:bx was a pointer. (For those not fluent in the 80x86, es:bx is a register pair which can *only* be used as a pointer; it gives a protection fault if you load it with something that is not.) All the comment does is take up space on the disk, and more importantly, take up the time of the would-be understander.

	/************
	 * IT FOOTYPE::ItOfThat(THAT * pThat)
	 * Purpose:
	 *	find the IT of a THAT
	 * Parameters:
	 *	pThat:	A pointer to a THAT
	 * Returns:
	 *	an IT
	 * Notes:
	 *****************************/
	IT FOOTYPE::ItOfThat(THAT * pThat) {

This is even better. In a sense, it's triple redundant. Hungarian, the comment, and the declaration all say exactly the same thing, while providing no useful information about interactions, algorithms, side effects, and so forth. If it uses the this pointer in an obscure way, that just adds to the fun. Be wary: if the input and output are registers (i.e., if this is about assembly language), this is useful information, and as bad coders, we wouldn't want that. (Incidentally, this example, with the names changed, is taken from actual code from an actual program which is being marketed commercially by a major software developer.) A lot of bad coders blindly put such a header on all procedures, to pad their lines-of-code statistics.

Perhaps the single most important thing that can be done with comments is misdirection, and sometimes it happens entirely by accident by people who have no intention of writing bad code. The situation usually comes about like this: Coder A writes a tricky bit of code, and carefully comments the algorithm used. Coder B (perhaps even the same person as coder A, but some time later) comes along and makes a big change to the algorithm, or even moves it entirely into a different place, leaving the comment behind.

Coder C, trying to make some subsequent change, reads the comment, and then spends a bunch of time reading the code and trying to match it up with the comments. Note that had Coder A actually been a malicious bad coder, he could have achieved the same effect by writing some obscure but irrelevant (but relevant-sounding) comment instead of waiting for the transformation to occur by accident. Doug Klunder says that he'd rather have no comment at all than a misleading one. Obviously an enemy of bad coding.

Logistics

More pain can be caused by logistics than through any other element of code design. By logistics, I mean the actual mechanism involved in putting a change into a piece of executing code. This can involve editors, compilers, linkers, downloaders, debuggers, automatic build tools (e.g. make) and any other tool you might think of to involve. To a certain extent, the more tools involved, the more painful it is to work on a piece of code. But some tools actually do help. If the objective is to write bad code, the help they give should be minimized and undermined at every turn.

The makefile is one of the most effective tools. For example, using the same dependency list for every object file, whether it uses a particular include file or structure or not, will cause you to recompile everything, whether what you changed affected it or not. The people who use a "master include file" which includes everything all the time have added their support to this little element of bad coding. The monolithic precompiled header is of course exactly the same thing. Notice that this produces much of the effect of coupling without the bad code artist actually having to write any coupled code.

Object Oriented Languages, particularly C++, contribute greatly to this effect. In order to add a private method (or member) to a class, you need to modify the include file. In order to use any method of the class, you need to include the include file. So even with a well-written makefile, you need to recompile a lot of stuff when all you've actually changed is completely encapsulated! For the bad code writer, this is a big improvement over the predecessor language C, where if you write a private function, nothing needs to be recompiled except the module that contains the private function. There are efforts afoot to build compilers which are able to figure out the true dependencies and avoid this spurious overhead. The bad coders of the world must unite to put down this sacrilege!
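One well-known C++ workaround, not mentioned in the essay, is the pointer-to-implementation ("pimpl") idiom: the header exposes only a pointer to an incomplete type, so private members can change without touching the include file. A minimal single-file sketch; the comments mark what would live in a hypothetical widget.h and widget.cpp, and all names are illustrative.

```cpp
#include <cassert>
#include <memory>
#include <string>

// ---- widget.h (hypothetical): clients see only a pointer to Impl ----
class Widget {
public:
    Widget();
    ~Widget();  // defined below, where Impl is a complete type
    void SetName(const std::string& name);
    std::string Name() const;

private:
    struct Impl;                  // private members live in the .cpp file
    std::unique_ptr<Impl> impl_;
};

// ---- widget.cpp (hypothetical): changes here recompile only this file ----
struct Widget::Impl {
    std::string name;
    int rename_count = 0;         // new private state; widget.h is unchanged
};

Widget::Widget() : impl_(std::make_unique<Impl>()) {}
Widget::~Widget() = default;

void Widget::SetName(const std::string& name) {
    assert(impl_ != nullptr);
    impl_->name = name;
    ++impl_->rename_count;
}

std::string Widget::Name() const { return impl_->name; }
```

The price is an extra heap allocation and one more indirection per call, which is itself a small dose of the nonlocality discussed earlier.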

(A non-sarcastic aside: the single best feature of C++ (and apart from inlining and // comments, perhaps the only good feature of C++) is that because all procedures are defined with CLASSNAME::ProcName, it's very easy to find the definition with a text editor. By including the colons in the search, you get only a few of the function's uses.)

One of my personal favorites is the spurious use of tools advertised as labor saving. One of the best examples is in the area of writing compiler front ends. For the last dozen or so years, one of the popular bad coding fads has been to use so-called compiler compilers. These tools mostly use a technology called LALR(1) (see the section on obscurity) and, when given a language description with embedded actions written in a host language, produce a parser for the language with the actions invoked at the appropriate places. This sounds pretty simple, and many of those cretins who think that the goal is to write good code jumped on the bandwagon. Of course, none of the academics who invented LALR had ever written a compiler for a language with more than a few dozen productions in it, or for a user community that they couldn't count on the fingers of one hand. When people started using it to write compilers for real programmers to use, they found that the underlying tables grew exponentially. But then some bad coders noticed that they were sparse, and they added a compression algorithm that made it run in exponential time instead of exponential space. Then some other bad coders discovered that the error messages were terrible, so instead of leaving bad enough alone, they added some extremely obscure code to make error messages that were negligibly better. Then they discovered that the output file (a machine-generated program which implements the parser described) had become too big to compile, and they split it up. Now any good coder would know that to minimize coupling, you'd split things by function: the syntax and semantics for a given construct together, and the syntax and semantics for a different construct in a different place. But because they were using a tool, the bad coders cut the syntax from the semantics, almost perfectly perpendicular to what a good coder would have done, and maximized the coupling across the dividing line.
A masterpiece of bad code, and many of the most popular compilers used today do it! Unmaintainable, slow, terrible error handling, complicated logistics, obscure... LALR's got it all! Perhaps the best part is that using the traditional recursive descent technology, parsers are simple, fast, handle errors well, are easy to write and maintain, and are a pretty small part of the typical compiler. LALR not only was done by bad coders, it can make bad coders out of good ones! These guys are stars of the bad code art!
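To make the recursive descent claim concrete, here is a minimal C++ sketch of a recursive descent parser and evaluator for a small, hypothetical arithmetic grammar. Each grammar rule becomes one function, with no tables and no generator, and errors report the position where parsing stopped.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical grammar:
//   expr   := term (('+' | '-') term)*
//   term   := factor (('*' | '/') factor)*
//   factor := number | '(' expr ')'
class Parser {
public:
    explicit Parser(const std::string& s) : s_(s) {}

    long Parse() {
        long v = Expr();
        SkipSpace();
        if (pos_ != s_.size()) Fail("unexpected input");
        return v;
    }

private:
    long Expr() {
        long v = Term();
        for (;;) {
            if (Accept('+')) v += Term();
            else if (Accept('-')) v -= Term();
            else return v;
        }
    }
    long Term() {
        long v = Factor();
        for (;;) {
            if (Accept('*')) {
                v *= Factor();
            } else if (Accept('/')) {
                long d = Factor();
                if (d == 0) Fail("division by zero");
                v /= d;
            } else {
                return v;
            }
        }
    }
    long Factor() {
        if (Accept('(')) {
            long v = Expr();
            if (!Accept(')')) Fail("expected ')'");
            return v;
        }
        SkipSpace();
        if (pos_ >= s_.size() || !std::isdigit(static_cast<unsigned char>(s_[pos_])))
            Fail("expected number");
        long v = 0;
        while (pos_ < s_.size() && std::isdigit(static_cast<unsigned char>(s_[pos_])))
            v = v * 10 + (s_[pos_++] - '0');
        return v;
    }
    void SkipSpace() {
        while (pos_ < s_.size() && std::isspace(static_cast<unsigned char>(s_[pos_]))) ++pos_;
    }
    bool Accept(char c) {
        SkipSpace();
        if (pos_ < s_.size() && s_[pos_] == c) { ++pos_; return true; }
        return false;
    }
    [[noreturn]] void Fail(const std::string& what) {
        throw std::runtime_error(what + " at position " + std::to_string(pos_));
    }

    std::string s_;
    std::size_t pos_ = 0;
};
```

For example, `Parser("2 + 3 * (4 - 1)").Parse()` returns 11, and a truncated input such as `"2 +"` throws with the position of the missing operand.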

Portability

Portability can describe a number of things. If a source tree can be used to build the same program on a number of different machines, but using the same processor and operating system, one level of portability has been achieved. A higher level is to be able to vary the operating system or the processor. There are many aspects to this: byte and word order, the machine's native word size, arcane memory structures (such as segmentation) and, finally, operating system incompatibilities.

Byte and word order and size are pretty hard to make much trouble with. Most of the fun is to be had in file formats, which are discussed in a separate section. But there are still some tricks to be played. For example, use the same data object for several different things...sometimes as an array of bytes, sometimes as an array of larger integers (or even as floating point!) and change the current representation without changing the data. This is a wonderful trick for the bad coder, because not only does it make the code precarious and unportable, but it also screws up the compiler's optimizations, because it introduces what is known as "aliasing".
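In C++ (and C), reading an object's bytes through a pointer of an unrelated type breaks the strict aliasing rules, which is exactly what lets the optimizer assume the two views never overlap. A minimal sketch of the well-defined alternative, copying the bytes with `std::memcpy`; it assumes a 32-bit IEEE 754 `float`, as on mainstream platforms, and `FloatBits` is a hypothetical helper.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Undefined behavior under strict aliasing (do not do this):
//   std::uint32_t bits = *reinterpret_cast<std::uint32_t*>(&f);
// Copying the bytes makes the reinterpretation explicit and well defined.
std::uint32_t FloatBits(float f) {
    static_assert(sizeof(float) == sizeof(std::uint32_t), "expects a 32-bit float");
    std::uint32_t bits;
    std::memcpy(&bits, &f, sizeof bits);
    return bits;
}
```

Compilers typically turn the `memcpy` into a single register move, so the well-defined version costs nothing.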

Another trick to screw up portability is amazingly common. The language C has two logical sorts of integers: ones which have a specific size (signed and unsigned char, short and long), and ones which are based on the word size of the machine (int and unsigned int). The language guarantees that int will be at least as long as a short, but no more. So the bad coder working on a 32-bit platform should naturally use "int" for all that stuff that needs 32 bits to work right, and "long" for all those things which tend to be limited by the machine, such as array indices and other sorts of counters. This is just the opposite of what a good coder would do.
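One way to do the opposite of the bad coder, sketched in C++: say "exactly 32 bits" with `std::uint32_t` from `<cstdint>` (the C equivalents are in `<stdint.h>`), and use `std::size_t` for indices and counts that are limited by the machine. The checksum function is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical checksum: its arithmetic is meant to wrap at exactly 32 bits,
// so the type says so instead of hoping that int happens to be 32 bits.
std::uint32_t Checksum(const std::vector<std::uint8_t>& data) {
    std::uint32_t sum = 0;
    // Indices and counts follow the machine: std::size_t, not long.
    for (std::size_t i = 0; i < data.size(); ++i) {
        sum = sum * 31u + data[i];
    }
    return sum;
}
```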

The higher level things described, such as memory architecture and operating system compatibilities, are much more powerful, because they can affect the program at a much higher (and therefore more abstract) level. Design something into the architecture of the program which is fundamentally based on some operating system structure which is present in no other operating system. If you're on a segmented machine, build the availability of cheap movement of segment-sized chunks of memory into your design. When you get to a machine with a flat memory model, your program will run slooooowly.

File Formats

The worst damage that can be done to portability is in the area of file formats. Dirty tricks in file formats have so much potential that they deserve a section of their own.

All of the platform-dependent stuff is a natural. You can build a dependency on a particular platform's byte order, and then when you try to read the file on a platform with a different byte order, weird stuff happens. Packing issues are effective too. Most machines need to align data words on machine word boundaries. But these often vary: 16-bit machines need 16-bit alignment, and 32-bit machines need 32-bit alignment. Some even require more, and a few require none at all. So by defining a particular packing into a file, we can make it tricky to port.
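The usual way around both the byte-order and the packing problem is to read and write each field byte by byte, in an order the file format fixes, instead of dumping a struct to disk. A minimal C++ sketch, assuming a hypothetical format that stores 32-bit integers little-endian:

```cpp
#include <cassert>
#include <cstdint>

// Write a 32-bit value in a fixed (little-endian) byte order, one byte at a
// time, so the bytes on disk do not depend on the host's byte order or on
// how the compiler would pad or align a struct.
void PutU32LE(std::uint8_t* out, std::uint32_t v) {
    assert(out != nullptr);
    out[0] = static_cast<std::uint8_t>(v);
    out[1] = static_cast<std::uint8_t>(v >> 8);
    out[2] = static_cast<std::uint8_t>(v >> 16);
    out[3] = static_cast<std::uint8_t>(v >> 24);
}

// Read it back the same way; this gives the same answer on any host.
std::uint32_t GetU32LE(const std::uint8_t* in) {
    assert(in != nullptr);
    return static_cast<std::uint32_t>(in[0])
         | static_cast<std::uint32_t>(in[1]) << 8
         | static_cast<std::uint32_t>(in[2]) << 16
         | static_cast<std::uint32_t>(in[3]) << 24;
}
```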

Of course, by the simple expedient of file accessors, the file format can be isolated in such a way that a good coder can undermine our hard work. But remember that accessor functions can be a double-edged sword. Build up a towering, precarious, and, most of all, obscure dependency structure among these accessors and you're right back into bad code. There are a number of ways to achieve this, such as complicated interrelations between parts of the file that are not adjacent in any apparent way (see the section on Locality--the most effective approach for this is to actually put the pieces in different physical files). But the most effective approach is versioning.

Versioning is simply the necessary fact that file formats change a little bit (and sometimes a lot) from version to version of the program. The new version is generally required to read the old versions, so you can bury lots of magic in the code to recognize these distinctions. A good coder will make a careful decision and implement low-level version issues in the low-level accessors, and high-level issues at the higher level. So a bad coder should do just the opposite. Generally it's difficult (but by no means impossible) to get low-level stuff to a very high level, but pushing high-level issues towards the bottom is really powerful, and it generally results in tremendous duplication of similar, but obscurely different, pieces of code.
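A minimal C++ sketch of the good coder's choice: a low-level version difference handled once, in the low-level accessor. The format is hypothetical: version 1 stores a record length in 16 bits and version 2 in 32 bits, both little-endian.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Result of reading a length field: the value, plus how many bytes it occupied.
struct LengthField {
    std::uint32_t length;
    std::size_t bytes_used;
};

// The only place that knows the field changed width between versions;
// callers just receive a length and an advance count.
LengthField ReadRecordLength(const std::uint8_t* p, int version) {
    assert(p != nullptr);
    assert(version == 1 || version == 2);
    if (version == 1) {
        std::uint32_t len = static_cast<std::uint32_t>(p[0])
                          | static_cast<std::uint32_t>(p[1]) << 8;
        return {len, 2};
    }
    std::uint32_t len = static_cast<std::uint32_t>(p[0])
                      | static_cast<std::uint32_t>(p[1]) << 8
                      | static_cast<std::uint32_t>(p[2]) << 16
                      | static_cast<std::uint32_t>(p[3]) << 24;
    return {len, 4};
}
```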

There are other file format issues that bear on bad code too. For example, you can make a format which is extremely wasteful. The famous WINMAIL.DAT format encodes 256 color bitmaps and 18 color icons with 16 bits per pixel; 8 bits per pixel would have been ample. You can also simply be redundant. The popular DBF format maintains the length of each field at least twice, as well as keeping a separate position within the record. Since the fields are all consecutive, the accessor function could have simply added up the lengths and figured out where the fields were. As it is, a database that wants to change the size of a field has to update all of these pieces separately. Excellent bad code.
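The alternative the paragraph has in mind, storing only each field's length and deriving positions from a running sum, is short. A minimal C++ sketch; the function name and field lengths are hypothetical, and this is not the actual DBF header layout.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Given the length of each consecutive field in a fixed-length record,
// compute where each field starts. With positions derived rather than
// stored, there is one number per field to keep consistent.
std::vector<std::size_t> FieldOffsets(const std::vector<std::size_t>& lengths) {
    std::vector<std::size_t> offsets;
    offsets.reserve(lengths.size());
    std::size_t pos = 0;
    for (std::size_t len : lengths) {
        offsets.push_back(pos);
        pos += len;
    }
    assert(offsets.size() == lengths.size());
    return offsets;
}
```

Changing a field's size then means changing one length; every later offset follows automatically.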

Of course, the darned good coders keep finding ways to minimize this stuff. For example, they've been putting compression schemes into the file system, so even though these wasteful formats are still pointlessly difficult to manipulate, they don't actually waste much of the disk. But the bad coders still have a way to fight back. The compression schemes rely on highly regular patterns in the data to achieve their ends. By doing something to obscure these regularities, we can undermine them. The best way known to man is called "Data Encryption". The people designing data encryption schemes are trying to do precisely what we want, which is to make data difficult to read by the uninitiated. Of course, all you need to become initiated is a password, but the file system compression scheme certainly doesn't have that. Encryption destroys the regularities in a file and makes it look like random noise. The more effective the encryption scheme, the more random it appears.

One of my personal favorite tricks with file formats is to get so good at reading the bits that the original coder just looks at them with a hex dumper. Of course the next guy along is going to need a tool to make heads or tails of it. Gotcha!

There is one file format that bad coders need to be warned against, and that is human-readable formats. Since text files are represented in a way which is extremely easy to move from one platform to another, they undermine our goal of being nonportable. Since the "language" described by a human-readable file format is almost always extensible in obvious, regular, easily understood ways, it undermines our versioning tricks. As long as they're not encrypted, human-readable files are easily compressed, often quite a lot. Of course the need for a special dumper is zero. Uncompressed, they are a relatively inefficient way to store stuff, but unless the designer of the text format is very bad (one of us bad coders!), it's rarely worse than a factor of two or three.

The one place that the bad coder can screw up a human-readable format is in the reading and writing of it. Since it's built up of more abstract pieces, a parser is required. A simple recursive descent parser is very easy to write and, with a well-written lexical phase, can be extremely fast, so even here, bad coders have their work cut out for them. But by the simple expedient of writing it badly, the simplest parser can be made an obscure knot of indecipherable gobbledygook, and the lexer can be cripplingly slow. Of course, it takes some skill to do these simple things badly enough to achieve our ends while getting them to work at all, but it's been achieved many times.
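As a sense of scale for how little a simple text-format reader needs, here is a minimal C++ sketch for a hypothetical "key = value" format with one pair per line and `#` starting a comment. It is line-oriented rather than a full recursive descent parser, which is enough for a format this flat; the function name and format are assumptions for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <sstream>
#include <string>

// Reads "key = value" lines; blank lines, comments and lines without '=' are skipped.
// Unknown keys are simply kept, which is what makes such formats easy to extend.
std::map<std::string, std::string> ReadConfig(const std::string& text) {
    auto trim = [](const std::string& s) {
        const char* ws = " \t\r";
        std::size_t b = s.find_first_not_of(ws);
        if (b == std::string::npos) return std::string();
        std::size_t e = s.find_last_not_of(ws);
        return s.substr(b, e - b + 1);
    };
    std::map<std::string, std::string> result;
    std::istringstream in(text);
    std::string line;
    while (std::getline(in, line)) {
        line = trim(line.substr(0, line.find('#')));  // drop comment, then whitespace
        if (line.empty()) continue;
        std::size_t eq = line.find('=');
        if (eq == std::string::npos) continue;        // skip malformed lines
        result[trim(line.substr(0, eq))] = trim(line.substr(eq + 1));
    }
    assert(result.count("") <= 1);
    return result;
}
```

For input like `"width = 80\nname=demo\n"` it yields the two pairs `width -> 80` and `name -> demo`.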