Naming things is often complained about - usually glibly - as one of the “hard” problems of software engineering. I know I certainly encounter this problem most days coding, if only with simple variable and function names. But the questions that arise from these small issues can reveal larger uncertainties about the entire system.
“How can we encode the purpose of this path parameter in its name? Or put another way, how can we communicate to future consumers of our function what is going to happen to the path? But wait… is this a file or can it be a directory? Or can it in fact be a URI? Would this name change alone allow this tool to process web as well as file system resources?”
More philosophically (if you will indulge me once again), I think naming is hard because it can force us to reckon with one of the fundamental activities of software engineering: mapping the real world to a model. When we struggle to decide between two names, it’s like an aliasing issue: the full concept is too complex to label, so we must simplify, and in doing so some resolution is lost. But choosing one simplification over another can be awkward - which concepts are more important to retain in the label? It’s hard to write a heuristic for this because ideas and concepts are hard to quantify and compare.
It can feel frustrating to have to pause to come up with a complex variable or function name. When writing code this can feel like a blockage in the creative process. But if the naming is genuinely a problem, then it is almost always an indication of a larger issue. Perhaps a function that should actually be two. Or state that should not exist. Or even some unrecognised or unresolved conflict in our system design.
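As a minimal sketch of the "function that should actually be two" case: the function names below are hypothetical, invented purely for illustration. The first function is awkward to name because it does two jobs; once split, each half more or less names itself.

```python
from pathlib import Path


# Hard to name: this hypothetical function both reads a file and filters
# its content, so any honest name ends up containing "and".
def read_lines_and_remove_blank_lines(text_file_path: Path) -> list[str]:
    lines = text_file_path.read_text().splitlines()
    return [line for line in lines if line.strip()]


# Split in two, each function has one job and an obvious name.
def read_lines(text_file_path: Path) -> list[str]:
    return text_file_path.read_text().splitlines()


def remove_blank_lines(lines: list[str]) -> list[str]:
    return [line for line in lines if line.strip()]
```

The split version also makes the filtering usable on lines that never came from a file, which is often the design question the naming struggle was hinting at.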
This is not a novel idea, but as I’ve become more experienced, I find it really noticeable. And indeed I find I’m quicker now at choosing good names, and conversely at identifying when a naming issue points to some larger, underlying problem; and there are some rules of thumb I’ve found that help.
1 / Stitches in time and naming spades
The most important rule is to try and name things what they are, when it is possible to do so. It sounds obvious, but it takes some practice to get more fluent at doing this, and recognising where names are deficient and can be improved in some simple way.
from pathlib import Path

# Why "dir" when "directory" is clearer and more universal?
test_dir = Path("test-directory")
# What is a file? Is a path to a file a file, or a string?
test_file = test_dir / "test-file.txt"

# Clearer: each name says what the thing is
test_directory_path = Path("test-directory")
test_file_path = test_directory_path / "test-file.txt"
# Avoid the ambiguity between paths and file "handles"
with test_file_path.open("w") as test_file:
    test_file.write(...)
Many naming issues are not hard, and indeed the correct name can be an arbitrarily simple choice. Choosing an unequivocal, unabbreviated, culturally neutral name over an ambiguous, abbreviated or culturally specific shibboleth may be just a case of typing some extra words. It’s frustrating that sometimes this is the excuse given for poor naming: the number of characters to type.
I’m a big fan of expanding out abbreviations - and boy do you get push-back on this one. I will admit that some abbreviations are so universal that they can pass for words and have no need for expansion. But it can be surprising how few are truly “universal”, and how readily arguments arise over what language is common-use and what is jargon. Try asking whether the abbreviations are the same in other languages. And consider different levels of expertise - readers of the code with more and less experience in the domain. The old maxim of explicit over implicit seems safer to my mind.
2 / Identifying interfaces
It’s probably needless to say that the higher the name is in the hierarchy of concepts, the more important it is to try and get it right. A variable can be easily renamed, a function slightly less so. A module is awkward to rename, and a system as a whole can be near impossible to rename without far-reaching consequences. Identifying interfaces is an important part of recognising this, and therefore attaching the appropriate level of attention to names.
I think parochialism is a really common problem here, whereby an engineer, knee-deep in a specific problem, doesn’t recognise that they are missing useful generalisations. For example, I was working recently (under pressure, in my defence!) on a server that runs a specific AI on a provided input. Slowly but surely, language from that specific AI started to leak into naming in the service, before an aptly-timed code review brought to light that this was an over-specialisation. It was the name of a lowly configuration parameter that prompted the moment of clarity for the system as a whole.
Perhaps worse, a lack of pause can lead to what one might term “namespace overconsumption”. That is, when a label takes up a whole category of which it really only represents part. A ConfigurationSetting struct may be better represented by something more specific, e.g. UserConfigurationSetting, leaving space for other future specialisations: DefaultConfigurationSetting, LocalConfigurationSetting, CustomConfigurationSetting. But once our configuration is serialised to a file, we now have difficulty renaming.
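Here is a rough sketch of what the more specific names buy you, in Python rather than a struct-based language. The key and value fields and the resolve_setting_value function are my own assumptions for illustration; only the two class names come from the discussion above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class UserConfigurationSetting:
    """A setting chosen explicitly by the user (fields are hypothetical)."""
    key: str
    value: str


@dataclass(frozen=True)
class DefaultConfigurationSetting:
    """A fallback used when the user has not chosen a value."""
    key: str
    value: str


def resolve_setting_value(
    key: str,
    user_settings: list[UserConfigurationSetting],
    default_settings: list[DefaultConfigurationSetting],
) -> str:
    """Prefer the user's value, falling back to the default."""
    for setting in (*user_settings, *default_settings):
        if setting.key == key:
            return setting.value
    raise KeyError(key)


default_settings = [DefaultConfigurationSetting("theme", "light")]
user_settings = [UserConfigurationSetting("theme", "dark")]
resolved_theme = resolve_setting_value("theme", user_settings, default_settings)
```

Had the first type claimed the bare name ConfigurationSetting, the second would have had nowhere natural to go.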
Explicitly separated abstract interfaces can really help with this. Defining the interface for a class, for example, in a header or interface definition, may give one pause for thought when adding a new function. It also draws the immediate attention of someone reviewing the code.
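In Python, one way to get a similar effect is an abstract base class. This is only a sketch under my own assumptions: DocumentStore, InMemoryDocumentStore and their methods are hypothetical names, not taken from any real project.

```python
from abc import ABC, abstractmethod


class DocumentStore(ABC):
    """Hypothetical interface: every public name is declared here, in one place."""

    @abstractmethod
    def read_document(self, document_identifier: str) -> str:
        ...

    @abstractmethod
    def write_document(self, document_identifier: str, content: str) -> None:
        ...


class InMemoryDocumentStore(DocumentStore):
    """One implementation; the names it must honour are fixed by the interface."""

    def __init__(self) -> None:
        self._content_by_identifier: dict[str, str] = {}

    def read_document(self, document_identifier: str) -> str:
        return self._content_by_identifier[document_identifier]

    def write_document(self, document_identifier: str, content: str) -> None:
        self._content_by_identifier[document_identifier] = content
```

Adding a method now means touching DocumentStore itself, which is exactly the small speed bump that prompts a second look at the name.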
Don’t forget that interfaces can take many forms. We wouldn’t hesitate to consider carefully (or call in a domain expert) when adding text to an end-user interface. But when adding a field to a database? What is a database schema if not the primary interface of the database? I worked on a project where for technical reasons a database interface was separated into a different repository from the rest of the database code. This meant that any changes to the schema were subject to much greater scrutiny than other changes, and we had a much cleaner data model as a result. As the database was the primary store of domain-centric data, this also gave our team real pause to check we understood the concepts being stored and that the schema was as future-proof as possible.
3 / Consistency is king
Another useful principle is to use the same name for a concept whenever it occurs. For example, when passing a variable into a function and then unmodified into a child function, there’s a good chance the name should be the same. If not, then your brain has to do an implicit translation as you read the child function.
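A tiny sketch of the pass-through case, with hypothetical function names of my own choosing. The value is handed to the child unmodified, so it keeps the same name on both sides of the call.

```python
def count_words(document_text: str) -> int:
    """Child function: receives the value unmodified, so it keeps the name."""
    return len(document_text.split())


def describe_document(document_text: str) -> str:
    """Parent function: passes document_text straight through."""
    word_count = count_words(document_text)
    return f"{word_count} words"
```

If count_words had instead called its parameter s or content, a reader (or a text search for document_text) would lose the thread at the function boundary.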
This consistency makes reading and searching the code easier. By following the simple rules above, consistency in naming becomes second nature, and you find that the same concepts get the same names without conscious effort.
Furthermore, by separating interfaces out, we can ensure that the naming and language are consistent across the interface. By using tools like Swagger and interface definition languages for our APIs, we stand a much better chance of maintaining naming rules across the API.