I’ve had an interesting conversation on the topic of null
over the last few days, spurred by the logical disaster of null. I disagreed with the statement in the post that:
Logically-speaking, there is no such thing as Null
This is true in some logics, but not in all logics. Boolean logic as described in An Investigation of the Laws of Thought, admits only two values:
instead of determining the measure of formal agreement of the symbols of Logic with those of Number generally, it is more immediately suggested to us to compare them with symbols of quantity admitting only of the values 0 and 1.
…and in a later chapter he goes on to introduce probabilities. Anyway. A statement either does hold (it has value 1, with some probability p) or it does not (it has value 0, with some probability 1-p; the Law of the Excluded Middle means that there is no probability that the statement has any other value).
Let’s look at an example. Let x be the lemma “All people are mortal”, and y be the conclusion “Socrates is mortal”. What is the value of y? It isn’t true, because we only know that people are mortal and do not know that Socrates is a person. On the other hand, we don’t know that Socrates is not a person, so it isn’t false either. We need another value, that means “this cannot be decided given the current state of our knowledge”.
In SQL, we find a logic that encodes true, false, and “unknown/undecided” as three different outcomes of a predicate, with the third state being given the name null
. If we had a table linking Identity to Class and a table listing the Mortality of different Classes of things, then we could join those two tables on their Class and ask “what is the Mortality of the Class of which Socrates is a member”, and find the answer null
.
But there’s a different mathematics behind relational databases, the Relational Calculus, of which SQL is an imperfect imitation. In the relational calculus predicates can only be true or false, there is no “undecided” state. Now that doesn’t mean that the answer to the above question is either true or false, it means that that question cannot be asked. We must ask a different question.
“What is the set of all Mortality values m in the set of tuples (m, c) where c is any of the values of Class that appear in the set of tuples (x, c) where x is Socrates?”
Whew! It’s long-winded, but we can ask it, and the answer has a value: the empty set. By extension, we could always change any question we don’t yet know the answer to into a question of the form “what is the set of known answers to this question”. If we know that the set has a maximum cardinality of 1, then we have reinvented the Optional/Maybe type: it either contains a value or it does not. You get its possible value to do something by sending it a foreach
message.
And so we ask whether we would rather model our problem using a binary logic, where we have to consider each question asked in the problem to decide whether it needs to be rewritten as a set membership test, or a ternary logic, where we have to consider that the answer to any question may be the “I don’t know” value.
Implementationally-speaking, there are too many damn nulls
We’ve chosen a design, and now we get to implement it. In an implementation language like Java, Objective-C or Ruby, a null value is supplied as a bottom type, which is to say that there is a magic null
or nil
keyword whose value acts as a subtype of all other types in the system. Good: we get “I don’t know” behaviour for free anywhere we might want it. Bad: we get that behaviour anywhere else too, so we need to think to be sure that in all places where “I don’t know” is not an answer, that invariant holds in our implementation, or for those of us who don’t like thinking we have to pepper our programs with defensive checks.
I picked those three languages as examples, by the way, because their implementations of null are totally different so ruin the “you should never use a language with null because X” trope.
- Java nulls are terminal: if you see a null, you blew it.
- Objective-C nulls are viral: if you see a null, you say a null.[*]
- Ruby nulls are whatever you want: monkey-patch
NilClass
until it does your thing.
[*] Objective-C really secretly has _objc_setNilReceiver(id)
, but I didn’t tell you that.
Languages like Haskell don’t have an empty bottom, so anywhere you might want a null you are going to need to build a thing that represents a null. On the other hand, anywhere you do not want a null you are not going to get one, because you didn’t tell your program how to build one.
Either approach will work. There may be others.