I recently joined a very interesting discussion with some of my peers on the thorny subject of naming variables in programs. The core question was whether it’s OK to give a variable a temporary name while you’re still working out what it’s for, or whether you should pause and work out what it should be called before you move on.
The question raises other questions, and those are much more interesting to consider. For example, there’s an aphorism in computing that naming things is one of the hardest problems we have. That isn’t true. We’ve been naming things for 60,000-100,000 years, and writing down names for things for 5,000 years.
If you know about the thing you’re naming, and you know what the name should convey, then naming things is easy. For example, this blog post is on the topic of naming things, and communicating the topic is an important part of the title, so calling the post “On naming things” was very easy. Then, because I’m a comedian, I decided to go back and use a different name.
If naming something is hard, either we don’t know something about it, or we don’t know something about communicating that knowledge. The second of those is usually much simpler than the first, when it comes to variable names. The variable name is only used by other programmers: either inspecting the program text, or understanding dynamic behaviour in a debugger. The variable represents a snapshot of a part of the program state, and its name should communicate how that snapshot contributes to the valuable computation the program encapsulates.
It’s therefore likely that when we struggle to name a variable, it’s not because we haven’t identified the audience. It’s because we haven’t identified what the variable is, or what it’s for.
In most software design methodologies, we derive the existence of variables from aspects of the design. In the incremental refinement approaches described by people like Tony Hoare and Niklaus Wirth, as we refine a specification we identify the invariants that hold at each level, and the variables we need to preserve those invariants.
In DeMarco’s structured analysis, we design our systems by mapping the data flow: our variables hold those data and enable their transformation.
In Object-Oriented analysis and design, we design objects that take on particular roles in an interaction that models the domain problem. Our variables represent the responsibilities and collaborators known to each object.
In Test-Driven Development, we identify a desirable change in the system’s behaviour, and then enact that change. Our variables represent contributions to that behaviour.
It’s likely that when we can’t name a variable, it’s because we haven’t designed enough to justify introducing the variable yet.
As a specific example, if we’re thinking about an algorithm in a process-centric manner, we might have a detailed view of the first few steps, and decide to write a variable that stores the outcome of those steps and is used as input in the subsequent steps. In such a case, the variable doesn’t represent anything in the solution model, and is going to be hard to name. It represents “where I got to in the design before I started typing”, which isn’t a useful variable name. The solution in this case is neither to come up with a good name, nor to drop in a temporary name and move on. The solution is to remove the variable, and go back to designing the rest of the algorithm.
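As a rough illustration of that situation, here is a hypothetical Python sketch. The order-total domain, the function names, and the variable names are all invented for the example, not taken from any real codebase. The first function stops after the steps that were clear up front and parks their outcome in a variable that has no role in the solution. The second shows what might emerge once the rest of the computation has been designed: the intermediate values now mean something, so naming them takes no effort.

```python
# Hypothetical example: the domain and every name here are assumptions
# made purely for illustration.

def order_total_half_designed(prices, quantities):
    # The first two steps were clear, so they were typed in straight away.
    pairs = zip(prices, quantities)
    stuff = [p * q for p, q in pairs]  # "where I got to before I started typing"
    # The rest of the algorithm was never designed, so `stuff` has no role yet.
    return stuff


def order_total(prices, quantities, tax_rate):
    # With the whole computation designed, each value has a role in the solution.
    line_totals = [price * qty for price, qty in zip(prices, quantities)]
    subtotal = sum(line_totals)
    tax = subtotal * tax_rate
    return subtotal + tax
```

Note that the second version was not reached by finding a better name for `stuff`. The name arrived as a side effect of deciding what the computation actually needed.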
Naming is the process of creating semantic associations in the developer’s mind. This is not too hard; we do it all the time.
Next, these associations need to be mapped to a character sequence. This is the hard bit.
Developers want short names (typing those characters is treated as being oh so much work). The semantic associations need to be ordered sequentially (e.g., num_widget or widget_num, plus plurals). Then there needs to be some degree of consistency across names that have similar associations, uses in the code, application semantics, etc.
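To make the ordering problem concrete, here is a minimal Python sketch of a check that flags names built from the same words in a different order. The function name `find_order_conflicts` and the assumption that names are snake_case are mine, for illustration only. It deliberately ignores plurals and abbreviations, which would need their own handling.

```python
from collections import defaultdict


def find_order_conflicts(names):
    """Group snake_case names that use the same words in a different order.

    Hypothetical helper for illustration: it treats a name as a bag of
    underscore-separated words and reports bags that appear more than once.
    """
    groups = defaultdict(set)
    for name in names:
        # Sorting the words gives a key that ignores their order.
        key = tuple(sorted(name.lower().split("_")))
        groups[key].add(name)
    # Keep only the groups where at least two distinct spellings collide.
    return sorted(sorted(group) for group in groups.values() if len(group) > 1)
```

For example, given `num_widget`, `widget_num`, `widget_count` and `num_gadget`, only the first two are reported as a conflicting pair, because they are the only ones that share exactly the same words.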
Probably more than you wanted to know about naming here: http://www.coding-guidelines.com/cbook/sent792.pdf