After two years mired in a pile of crap that was obviously intended to be self-documenting and just as obviously has failed in doing so, I think I have some fairly strong arguments against this philosophy. Self-documenting code is no less prone to documentation decay than any other form, but it is even harder to sort out what the code does than does bad documentation. I assert that inaccurate documentation is as easy to determine as decayed code.
Lack of architecture documentation leads to a mess of an architecture. An object that was supposed to represent connection settings for one machine became God- it's part static, part instance, with some singleton-related behavior invoked when particular instance methods are called. 93% of the 180+ codefiles call directly into it, and every variable or interface it exposes is consumed somewhere, if only in one place. It no longer has any coherent function; I can identify about three or four classes it should be split into, but it's going to be a hell of a task to disentangle them. And disentangle them I must, because two of those theoretical classes need a completely different implementation strategy, for reasons mostly related to what happened to the Windows security model. Some of these decisions were justifiable when the project was designed ten years ago, but they should have been refactored out when they became dangerously wrong.
Would documentation have helped this? Possibly. If the class had a defined scope, than maybe the cross-machine driver installation and connection functions would have been identified as not within it, and this wouldn't be nearly so nightmarish to untangle. If there was real API documentation, and there was any intent to maintain it, maybe implementors wouldn't have been so cavalier about adding new functionality exposing data that really needed to remain private to the object for a variety of consistency reasons.
But that's an assertion that documentation itself would have reined in the project; I have little hope for that. But if documentation had at least been in the code, maybe it wouldn't have taken me several months to trace- the intent of some of these truly mysterously-named classes would have made a little bit of sense. And at least when documentation is wrong, it's usually wrong about something immediately below it; when the name of a function is wrong, I have to trace into the function to discover that it is horrifyingly wrong.
My best example: In a class that can be described as a Collection of Collection of Collection of DataTypeNamedPairEvenThoughItContainsNin
Despite this function returning a VarPairGroupCollection, and the one you would expect, it was also an initialization function that called a context-sensitive mutator on every object enclosed by the class.
I, for one, do not think a function named "Get" should be an O(n2) operation that alters all the data within the class, for which the program will not work if it is not called at a specific time, and which can corrupt program state if it is called during another operation that can occur on a parallel thread and only hasn't because of the grace of the scheduler. (And, to be fair, because the scheduler would have to make some truly ludicrous decisions for it to ever show up.)
At least incorrect comments are honest. They are a red flag that the code is untrustworthy. The same is true for incorrect function names- the documentation in self-documenting code- but they are much, much harder to find, and even understanding the variable names in the first place requires some significant degree of the programmer's mindset, which can be impossible to attain without some other documentation to help.
sleepy
2009-12-27 04:15 pm (UTC)
Ow.
2009-12-27 08:46 pm (UTC)
The documentation problems you're having sound like death by iteration. My mental narrative runs something like this:*
1) Project X started out simple enough. Specs were drafted, meetings were called. Heads nodded, and the star programmer bolted from the room to write a brief skeleton design. This skeleton would soon become the first official revision.
Perhaps a few releases would ship against this functionality. After all, X is just so simple!
2) As time progressed, new requirements and assumptions crept in that eventually broke X. Phones rang, orders hurried out. Peering into the code for the first time in years,** it became clear that X needed to be refactored.
3) Instead of slowing the process down and updating each spec, someone caved and decided the changes would also be trivial. The next coder was then tasked with grafting in the required functionality.
4) Repeat 2) and 3) n times. For each run, increase complexity by one order of magnitude.
... thus leading us to today.
There are many red flags I could address, especially the lack of API documentation, propagation of broken windows, and fact comments should describe why, not how.
But at the end of the day, these are symptoms of iteration and deferral. There should be constant dialog to hammer out a functional spec, translate it to technical requirements, then blast the offending code. On and on until the documents and code actually agree with one another.
And it needs to be done as soon as possible.
If this is in any way related to previous discussion, my sympathies.
---
* It was tempting to write a narrative involving greek gods and stone tablets. I think I need more caffeine. :p
** I'd substitute months, but that seems overly optimistic.
2009-12-27 10:59 pm (UTC)
2009-12-28 04:44 am (UTC)
Moving off the high level rant and back on topic, self-documenting code is still backwards. I'd argue that we want to derive the code from the documentation and that repetition only worsens the decay.
But with that said, I'd like to separate self-documenting code from the idea of being self-documenting. It's great to have a snapshot of current discussion and assumptions in time, be they in VCS, email, or otherwise. It's just not information you want living in the code or doc trees.
As for "getters", I agree that an accessor should always be O(1) and never alter object state. So infrequently is this the case, though.
Anyway, best of luck.
2009-12-27 09:20 pm (UTC)
*squeaks*
Getters
2009-12-27 09:24 pm (UTC)
Re: Getters
2009-12-27 10:16 pm (UTC)
2009-12-28 01:13 am (UTC)
Typically, self-documenting is little different from undocumented. Of course, classes and methods should preferably be understandable from the names. For instance: any getter should have no observable side effects (initializing a field the first time it is called and then returning the value of that field on subsequent calls doesn't count providing no other part of the class uses that field) and should be fast (in some cases, returning a clone of an object is required so o(n) might be needed. If there is any reason to believe that o(n) will actually be big, a 'copy'-suffix will do the job, but I digress). If there is anything else that could be interesting to know about the function, it should be put in documentation. You don't want to call your method getQueryResultSetProvidedThatTheConnecti
Documentation serves as a golden guideline to determine what is being done. As I said, ideally, not much should be needed. But sometimes it does and when it's there documentation should be assumed to never be wrong. If a method doesn't do what the documentation states, unless there is an honest mistake in the documentation, the implementation should be changed. If someone uses a block of code as it's implemented and not as it is documented, they are coding against it wrong.
Anyway, I'm ranting on and I think I have said my main points now.