The title is a riff on “The Metaphysics of Nothing” (Finn, n.d.).

In the real world there are many reasons why a data point would be absent from a dataset:

  • Not collected
  • Not collected for reasons
  • Not observed
  • Thrown out for reasons
  • Anonymized for reasons

Suppose I have some values associated with time, like:

Year  Data A
2000  123
2003  456
2004  789

I want to combine this with yearly data from another dataset to do some analysis:

Year  Data A  Data B
2000  123     ab
2001          cd
2002          ef
2003  456     gh
2004  789     ij

Is Data A missing years 2001 and 2002? Is it missing 1995? How about 1980 or 2008?
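A minimal Python sketch of the combination step, assuming the two tables are held as plain dicts keyed by year (`outer_join` is a hypothetical helper, not a library function). An outer join is what produces the blank cells: 2001 and 2002 come out as `None`, while 1995, 1980 and 2008 do not appear at all, because neither table mentions them.

```python
# Hypothetical in-memory copies of the two tables above.
data_a = {2000: 123, 2003: 456, 2004: 789}
data_b = {2000: "ab", 2001: "cd", 2002: "ef", 2003: "gh", 2004: "ij"}


def outer_join(left, right):
    """Join two year-keyed dicts; a year absent from one side gets None there."""
    years = sorted(left.keys() | right.keys())
    # dict.get returns None for a missing key, so "no row" silently becomes None.
    return [(year, left.get(year), right.get(year)) for year in years]


for row in outer_join(data_a, data_b):
    print(row)
# (2000, 123, 'ab')
# (2001, None, 'cd')
# (2002, None, 'ef')
# (2003, 456, 'gh')
# (2004, 789, 'ij')
```

The `None` only records that the join found no row; it says nothing about which of the reasons listed earlier applies.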

How is this missingness encoded in data methodologies, datasets, software, and programming languages?

Languages and their Nothings

Language         Syntax    Implementation                                  Meaning
IEEE 754 [1]     NaN       Value(s) of a floating-point number             Not a Number
Python           None      Object, singleton of NoneType                   Absence of a value [2]
Python           nan       Float value                                     IEEE NaN
Python - Pandas  <NA>      Singleton of NAType (nullable integer dtypes)   Missing value, used instead of NaN [3]
Julia            NaN       Float value                                     IEEE NaN
Julia            missing   Value, singleton of Missing                     Missing value in the statistical sense [4]
R                NA        Value, instances for multiple types             Missing value in the statistical sense [5]
SQL              NULL      Marker for absent value                         Absence of a value; missing or inapplicable information
C/C++            NULL      Preprocessor macro (implementation-defined)     Pointer that does not point to a valid object
C/C++            nullptr   Singleton of nullptr_t                          Pointer that does not point to a valid object
Haskell          Nothing   Value of Maybe a                                Optional value, used for errors or exceptional cases [6]
Rust             None      Value of Option<T>                              Optional value, used for default values, errors, nullable pointers [7]
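The two plain-Python rows of the table behave quite differently. A short sketch using only the standard library (pandas' `<NA>` is left out to keep it dependency-free):

```python
import math

missing = None           # absence of a value: the singleton of NoneType
not_a_number = math.nan  # an IEEE 754 quiet NaN stored in an ordinary float

# None is a singleton, so an identity check is the idiomatic test.
print(missing is None)               # True
print(type(missing).__name__)        # NoneType

# NaN is a float, and it compares unequal to everything, itself included.
print(type(not_a_number).__name__)   # float
print(not_a_number == not_a_number)  # False
print(math.isnan(not_a_number))      # True

# Arithmetic propagates NaN but rejects None outright.
print(math.isnan(not_a_number + 1.0))  # True
try:
    missing + 1.0
except TypeError as exc:
    print("TypeError:", exc)
```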

There’s a saying that programming is just manipulating data. But does that really apply to the statistical and experimental interpretation of “data”?

Bonus: default function arguments, when the caller does not supply anything.
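In Python, one common way to tell "the caller supplied nothing" apart from "the caller explicitly passed `None`" is a private sentinel object as the default. A minimal sketch; `describe` and `_NOT_GIVEN` are hypothetical names chosen for illustration:

```python
# A private sentinel: no caller can pass it by accident, so
# "argument omitted" stays distinguishable from "argument was None".
_NOT_GIVEN = object()


def describe(value=_NOT_GIVEN):
    if value is _NOT_GIVEN:
        return "caller supplied nothing"
    if value is None:
        return "caller supplied None"
    return f"caller supplied {value!r}"


print(describe())      # caller supplied nothing
print(describe(None))  # caller supplied None
print(describe(0))     # caller supplied 0
```

Using `None` as the default would merge the first two cases, which is the same conflation of "absent" and "explicitly empty" seen in the table.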

IEEE 754

Section 6.2

Quiet NaNs should, by means left to the implementer’s discretion, afford retrospective diagnostic information inherited from invalid or unavailable data and results. To facilitate propagation of diagnostic information contained in NaNs, as much of that information as possible should be preserved in NaN results of operations.
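The diagnostic information the standard refers to can be carried in a NaN's payload, the significand bits left over after the quiet bit. A sketch of writing and reading the payload of a binary64 quiet NaN with the standard `struct` module, assuming Python floats are IEEE 754 binary64 (true on mainstream CPython builds); `make_nan`, `nan_payload` and the payload value 2001 are hypothetical:

```python
import math
import struct

QUIET_NAN_BITS = 0x7FF8_0000_0000_0000  # exponent all ones, quiet bit (bit 51) set
PAYLOAD_MASK = (1 << 51) - 1            # the remaining low-order significand bits


def make_nan(payload: int) -> float:
    """Build a quiet binary64 NaN carrying `payload` in its spare significand bits."""
    if not 0 <= payload <= PAYLOAD_MASK:
        raise ValueError("payload must fit in 51 bits")
    (value,) = struct.unpack("<d", struct.pack("<Q", QUIET_NAN_BITS | payload))
    return value


def nan_payload(value: float) -> int:
    """Read the low 51 significand bits back out of a float."""
    (bits,) = struct.unpack("<Q", struct.pack("<d", value))
    return bits & PAYLOAD_MASK


tagged = make_nan(2001)      # hypothetical code, e.g. "year not collected"
print(math.isnan(tagged))    # True
print(nan_payload(tagged))   # 2001

# Whether arithmetic preserves the payload is left to the hardware and
# implementation (the standard only says "should"), so no output is claimed here.
print(nan_payload(tagged + 1.0))
```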

“IEEE 754 Error Handling and Programming Languages”

(Maclaren, 2000)

(Maclaren, 2000, “Percolation”):

The definition “max(1.0, NaN) = 1.0” is correct when a NaN is a missing value and what is wanted is the maximum non-missing value of a vector (as in one expression mode in many statistical packages), but is mathematically incorrect when it is an error state (as generally in IEEE 754).
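The two readings of `max` can be written side by side: under the missing-value reading the answer for `[1.0, NaN, 0.5]` is 1.0, and under the error-state reading the NaN should propagate. A minimal Python sketch; both function names are hypothetical, and returning NaN when nothing is left after skipping is just one possible convention:

```python
import math


def max_nan_is_error(values):
    """Error-state reading: any NaN poisons the result."""
    values = list(values)
    if any(math.isnan(v) for v in values):
        return math.nan
    return max(values)


def max_nan_is_missing(values):
    """Missing-value reading: skip NaNs and return the largest remaining value."""
    present = [v for v in values if not math.isnan(v)]
    return max(present) if present else math.nan


data = [1.0, math.nan, 0.5]
print(max_nan_is_error(data))    # nan
print(max_nan_is_missing(data))  # 1.0

# Python's built-in max follows neither reading: it keeps the current item
# unless a later one compares greater, and every comparison with NaN is False.
print(max(1.0, math.nan))  # 1.0
print(max(math.nan, 1.0))  # nan
```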

Appendix A of the same paper considers several possible interpretations of NaN:

  • A. A missing value (i.e. unknown but valid)
  • B. Not numeric at all (e.g. ‘purple’)
  • C. Inapplicable (i.e. not a datum)
  • D. Numerically indefinite (e.g. ≈0/≈0)
  • E. The result of an invalid operation
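A bare NaN cannot say which of these five readings it stands for, so one alternative is to carry the reason explicitly. A Python sketch; the `Why` and `Absent` names, and labelling the 2001 and 2002 gaps in Data A as reading A, are hypothetical choices made only for illustration. It also shows that Python produces a NaN for `inf - inf` (an invalid operation, reading E) but raises an exception for `0.0 / 0.0` instead of returning NaN.

```python
import math
from dataclasses import dataclass
from enum import Enum


class Why(Enum):
    """Hypothetical labels for interpretations A to E."""
    MISSING = "A"       # unknown but valid
    NOT_NUMERIC = "B"   # e.g. 'purple'
    INAPPLICABLE = "C"  # not a datum
    INDEFINITE = "D"    # e.g. ~0/~0
    INVALID_OP = "E"    # result of an invalid operation


@dataclass(frozen=True)
class Absent:
    """A gap that carries its interpretation instead of a bare NaN."""
    why: Why


# A NaN produced by an invalid operation looks like any other NaN.
from_invalid_op = math.inf - math.inf
print(math.isnan(from_invalid_op))  # True

# Python's float division raises here rather than returning NaN.
try:
    0.0 / 0.0
except ZeroDivisionError as exc:
    print("ZeroDivisionError:", exc)

# Hypothetical tagging of the Data A gaps from the earlier table.
data_a = {2000: 123, 2001: Absent(Why.MISSING), 2002: Absent(Why.MISSING),
          2003: 456, 2004: 789}
gaps = {year: v.why.value for year, v in data_a.items() if isinstance(v, Absent)}
print(gaps)  # {2001: 'A', 2002: 'A'}
```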

Footnotes

  1. Okay, not a language, but still significant.

  2. https://docs.python.org/3/library/constants.html#None

  3. https://pandas.pydata.org/docs/user_guide/integer_na.html

  4. https://docs.julialang.org/en/v1/manual/missing/

  5. https://cran.r-project.org/doc/manuals/r-release/R-lang.html#NA-handling

  6. https://www.haskell.org/onlinereport/haskell2010/haskellch21.html#x29-25500021

  7. https://doc.rust-lang.org/std/option/

Finn, S. (n.d.). The Metaphysics of Nothing. Internet Encyclopedia of Philosophy. Retrieved January 25, 2025, from https://iep.utm.edu/metaphysics-of-nothing/
Maclaren, N. (2000). IEEE 754 Error Handling and Programming Languages. IEEE Interval Standard Working Group - P1788 Mailing List. https://grouper.ieee.org/groups/1788/email/pdfmPSi1DgZZf.pdf