Log in

No account? Create an account
color cycle (slow)

Kistaro Windrider, Reptillian Situation Assessor

Unfortunately, I Really Am That Nerdy

Previous Entry Share Next Entry
Self-misdocumenting code (a 4:47 AM rant, so you get what you get)
The biggest argument for "self-documenting code" I've heard tends to run that "all documentation becomes stale": not only are project changes not represented in the specifications, code is changed out from under comments that refer to that code, and then the comments themselves are wrong. An incorrect comment is, of course, worse than no comment at all, since a future programmer will read it, assume it is correct, and fail to audit the code the comment refers to, drawing dangerously incorrect assumptions about the code. Better to simply code clearly, and avoid relying on comments to make your code make sense.

After two years mired in a pile of crap that was obviously intended to be self-documenting and just as obviously has failed in doing so, I think I have some fairly strong arguments against this philosophy. Self-documenting code is no less prone to documentation decay than any other form, but it is even harder to sort out what the code does than does bad documentation. I assert that inaccurate documentation is as easy to determine as decayed code.

Lack of architecture documentation leads to a mess of an architecture. An object that was supposed to represent connection settings for one machine became God- it's part static, part instance, with some singleton-related behavior invoked when particular instance methods are called. 93% of the 180+ codefiles call directly into it, and every variable or interface it exposes is consumed somewhere, if only in one place. It no longer has any coherent function; I can identify about three or four classes it should be split into, but it's going to be a hell of a task to disentangle them. And disentangle them I must, because two of those theoretical classes need a completely different implementation strategy, for reasons mostly related to what happened to the Windows security model. Some of these decisions were justifiable when the project was designed ten years ago, but they should have been refactored out when they became dangerously wrong.

Would documentation have helped this? Possibly. If the class had a defined scope, than maybe the cross-machine driver installation and connection functions would have been identified as not within it, and this wouldn't be nearly so nightmarish to untangle. If there was real API documentation, and there was any intent to maintain it, maybe implementors wouldn't have been so cavalier about adding new functionality exposing data that really needed to remain private to the object for a variety of consistency reasons.

But that's an assertion that documentation itself would have reined in the project; I have little hope for that. But if documentation had at least been in the code, maybe it wouldn't have taken me several months to trace- the intent of some of these truly mysterously-named classes would have made a little bit of sense. And at least when documentation is wrong, it's usually wrong about something immediately below it; when the name of a function is wrong, I have to trace into the function to discover that it is horrifyingly wrong.

My best example: In a class that can be described as a Collection of Collection of Collection of DataTypeNamedPairEvenThoughItContainsNineProperties, there is an instance method named GetVarPairGroupCollection. One would infer that this is an accessor that returns part of the contents of the object.

Despite this function returning a VarPairGroupCollection, and the one you would expect, it was also an initialization function that called a context-sensitive mutator on every object enclosed by the class.

I, for one, do not think a function named "Get" should be an O(n2) operation that alters all the data within the class, for which the program will not work if it is not called at a specific time, and which can corrupt program state if it is called during another operation that can occur on a parallel thread and only hasn't because of the grace of the scheduler. (And, to be fair, because the scheduler would have to make some truly ludicrous decisions for it to ever show up.)

At least incorrect comments are honest. They are a red flag that the code is untrustworthy. The same is true for incorrect function names- the documentation in self-documenting code- but they are much, much harder to find, and even understanding the variable names in the first place requires some significant degree of the programmer's mindset, which can be impossible to attain without some other documentation to help.

  • 1
I have just one word for this:


I, however, will be less succinct.

The documentation problems you're having sound like death by iteration. My mental narrative runs something like this:*

1) Project X started out simple enough. Specs were drafted, meetings were called. Heads nodded, and the star programmer bolted from the room to write a brief skeleton design. This skeleton would soon become the first official revision.

Perhaps a few releases would ship against this functionality. After all, X is just so simple!

2) As time progressed, new requirements and assumptions crept in that eventually broke X. Phones rang, orders hurried out. Peering into the code for the first time in years,** it became clear that X needed to be refactored.

3) Instead of slowing the process down and updating each spec, someone caved and decided the changes would also be trivial. The next coder was then tasked with grafting in the required functionality.

4) Repeat 2) and 3) n times. For each run, increase complexity by one order of magnitude.

... thus leading us to today.

There are many red flags I could address, especially the lack of API documentation, propagation of broken windows, and fact comments should describe why, not how.

But at the end of the day, these are symptoms of iteration and deferral. There should be constant dialog to hammer out a functional spec, translate it to technical requirements, then blast the offending code. On and on until the documents and code actually agree with one another.

And it needs to be done as soon as possible.

If this is in any way related to previous discussion, my sympathies.


* It was tempting to write a narrative involving greek gods and stone tablets. I think I need more caffeine. :p
** I'd substitute months, but that seems overly optimistic.


Moving off the high level rant and back on topic, self-documenting code is still backwards. I'd argue that we want to derive the code from the documentation and that repetition only worsens the decay.

But with that said, I'd like to separate self-documenting code from the idea of being self-documenting. It's great to have a snapshot of current discussion and assumptions in time, be they in VCS, email, or otherwise. It's just not information you want living in the code or doc trees.

As for "getters", I agree that an accessor should always be O(1) and never alter object state. So infrequently is this the case, though.

Anyway, best of luck.

You almost convinced me for real to not work for any big programmer company. o'===='O;


Whenever I have a getter which does not simply return a variable, I name it something like queryXYZ or calcXYZ to indicate that if the result is used multiple times, it must be buffered.

Same strategy here. When I get a chance to rewrite that portion of the code (sadly, there are actually other higher priorities for repair, that's not the worst of it), I'll be splitting it into two functions- InitializeModeSettings and AggregateCollection. (Well, approximately; variable names will be more appropriate to the code itself.) I don't like any form of get to be anything other than O(1).

Reads like one of the better recommendation against self-documenting code I've read.

Typically, self-documenting is little different from undocumented. Of course, classes and methods should preferably be understandable from the names. For instance: any getter should have no observable side effects (initializing a field the first time it is called and then returning the value of that field on subsequent calls doesn't count providing no other part of the class uses that field) and should be fast (in some cases, returning a clone of an object is required so o(n) might be needed. If there is any reason to believe that o(n) will actually be big, a 'copy'-suffix will do the job, but I digress). If there is anything else that could be interesting to know about the function, it should be put in documentation. You don't want to call your method getQueryResultSetProvidedThatTheConnectionStringWasSetIfNotThrowAnIllegalStateException. Again, of course, you should try to avoid creating such methods in the first place, but sometimes maybe you want to. Then you need to document it. Someone who fanatically adheres to self-documenting code will not do so. So we have a method which does something that it is impossible to determine without reading/invoking the code. This might work except for that the function is documented entirely by its implementation, it can't be changed in any (observable) manner. If it does something that doesn't make sense at all, it still can't be changed because it is impossible to determine if it is intended behaviour or not, apart from hunting down the people who have worked on it before and getting either "I don't remember.. it shouldn't do something.. you could try to change it and see if it breaks something" or "yeah... I think there was some reason we did it like that. Don't remember why or if it's still a problem".

Documentation serves as a golden guideline to determine what is being done. As I said, ideally, not much should be needed. But sometimes it does and when it's there documentation should be assumed to never be wrong. If a method doesn't do what the documentation states, unless there is an honest mistake in the documentation, the implementation should be changed. If someone uses a block of code as it's implemented and not as it is documented, they are coding against it wrong.

Anyway, I'm ranting on and I think I have said my main points now.

  • 1