The title is a riff on (Finn, n.d.).

In the real world there are many reasons why a data point would be absent from a dataset:

Not collected
Not collected for reasons
Not observed
Thrown out for reasons
Anonymized for reasons

If I have some values associated with time, like

Year	Data A
2000	123
2003	456
2004	789

I want to combine this with yearly data of another data set to do some analysis:

Year	Data A	Data B
2000	123	ab
2001		cd
2002		ef
2003	456	gh
2004	789	ij

Is Data A missing years 2001 and 2002? Is it missing 1995? How about 1980 or 2008?
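One way to see why the question is hard: below is a minimal sketch in plain Python of an outer join on year. The dict names `data_a` and `data_b` are made up for illustration. The join has to put something in the gaps, and whatever it puts there says nothing about why the value is absent.

```python
# Hypothetical data mirroring the two tables above.
data_a = {2000: 123, 2003: 456, 2004: 789}
data_b = {2000: "ab", 2001: "cd", 2002: "ef", 2003: "gh", 2004: "ij"}

# Outer join on year: every year seen in either data set gets a row.
years = sorted(data_a.keys() | data_b.keys())
combined = [(year, data_a.get(year), data_b.get(year)) for year in years]

for row in combined:
    print(row)

# Years for which Data A has no value *within the joined range*.
missing_in_a = [year for year, a, _ in combined if a is None]
```

Here `missing_in_a` contains 2001 and 2002, but only because Data B happened to cover those years. 1995 never shows up as missing; the join simply has no row for it.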

How is this missing-ness encoded in the data methodology, data sets, software, and programming languages?

Languages and their Nothings

Language	Syntax	Implementation	Meaning
IEEE 754¹	`NaN`	Value(s) of a floating-point number. Two kinds of NaN: quiet and signaling	Not a Number
Python	`None`	Object, Singleton of `NoneType`	Absence of a value²
Python	`nan`	Float value	IEEE NaN
Python - Pandas	`<NA>`	Singleton `pd.NA`, used by nullable dtypes such as `Int64`	Missing value; stands in where IEEE NaN would force a float³
Julia	`NaN`	Float value	IEEE NaN
Julia	`missing`	Value, Singleton of `Missing`	Missing value in statistical sense⁴
R	`NA`	Value, Instances for multiple types	Missing value in statistical sense⁵
SQL	`NULL`	Marker for absent value	Absence of a value, Missing or Inapplicable information
C/C++	`NULL`	Preprocessor macro (implementation-defined)	Pointer that does not point to a valid object
C++11 / C23	`nullptr`	Keyword; literal of type `nullptr_t`	Pointer that does not point to a valid object
Haskell	`Nothing`	Value of `Maybe a`	Optional value, used for errors or exceptional cases.⁶
Rust	`None`	Value of `Option<T>`	Optional value, used for default values, errors, nullable pointers⁷
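As a rough illustration of how different two of these Nothings are even inside one language, here is a sketch using only Python's standard library. `None` is a singleton object compared by identity, while `nan` is an ordinary float that is not equal to itself. The variable names are mine.

```python
import math

nothing = None
not_a_number = float("nan")

# None is a singleton: identity comparison is the idiomatic test.
none_is_none = nothing is None

# NaN is a float value, and IEEE 754 comparison makes it unequal to itself.
nan_equals_itself = not_a_number == not_a_number
nan_detected = math.isnan(not_a_number)

# None does not take part in arithmetic; NaN propagates through it.
try:
    nothing + 1
    none_arithmetic = "ok"
except TypeError:
    none_arithmetic = "TypeError"

nan_arithmetic = not_a_number + 1
```

After running this, `none_is_none` and `nan_detected` are true, `nan_equals_itself` is false, adding to `None` raises `TypeError`, and `nan_arithmetic` is still NaN.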

There’s a saying that programming is just manipulating data. But does that really apply to the statistical and experimental interpretation of “data”?

Bonus: default function arguments, where the caller does not supply anything.
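The bonus case can be sketched in Python with a private sentinel object. This is a common idiom rather than anything specific to this page, and the names `_NOT_GIVEN` and `lookup` are hypothetical. The point is that `None` may be a legitimate argument, so "the caller passed nothing" needs its own Nothing.

```python
_NOT_GIVEN = object()  # unique sentinel; no caller can pass it by accident


def lookup(table, key, default=_NOT_GIVEN):
    """Return table[key]; fall back to default only if one was supplied."""
    if key in table:
        return table[key]
    if default is _NOT_GIVEN:
        raise KeyError(key)
    return default  # may legitimately be None


readings = {2000: 123, 2003: 456}
explicit_none = lookup(readings, 2001, default=None)  # caller chose None
```

Calling `lookup(readings, 2001)` with no default raises `KeyError`, while passing `default=None` returns `None`: two different kinds of absence at the same call site.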

IEEE 754

Section 6.2

Quiet NaNs should, by means left to the implementer’s discretion, afford retrospective diagnostic information inherited from invalid or unavailable data and results. To facilitate propagation of diagnostic information contained in NaNs, as much of that information as possible should be preserved in NaN results of operations.
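The "diagnostic information" lives in the payload bits of a NaN. As a hedged sketch, assuming an IEEE 754 binary64 `float` (true on mainstream platforms), the following packs an integer into a quiet NaN and reads it back. The helper names `nan_with_payload` and `payload_of` are made up here. Whether arithmetic preserves the payload depends on hardware and language runtime, so the sketch only round-trips it.

```python
import math
import struct

QUIET_NAN_BITS = 0x7FF8_0000_0000_0000   # exponent all ones, quiet bit set
PAYLOAD_MASK = 0x0007_FFFF_FFFF_FFFF     # the 51 bits below the quiet bit


def nan_with_payload(payload: int) -> float:
    """Build a binary64 quiet NaN carrying `payload` in its fraction bits."""
    if not 0 <= payload <= PAYLOAD_MASK:
        raise ValueError("payload does not fit in 51 bits")
    bits = QUIET_NAN_BITS | payload
    return struct.unpack("<d", struct.pack("<Q", bits))[0]


def payload_of(value: float) -> int:
    """Read the payload bits back out of a NaN."""
    bits = struct.unpack("<Q", struct.pack("<d", value))[0]
    return bits & PAYLOAD_MASK


tagged = nan_with_payload(42)
```

`tagged` is still just a NaN to `math.isnan`, but `payload_of(tagged)` recovers the tag, which is the kind of retrospective information the standard has in mind.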

“IEEE 754 Error Handling and Programming Languages”

(Maclaren, 2000)

(Maclaren, 2000, Percolation):

The definition “max(1.0, NaN) = 1.0” is correct when a NaN is a missing value and what is wanted is the maximum non-missing value of a vector (as in one expression mode in many statistical packages) but is mathematically incorrect when it is an error state (as generally in IEEE 754).
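The two readings of NaN lead to two different max functions. Below is a sketch of both policies, skipping NaN as a missing value and propagating NaN as an error state, with hypothetical names `max_skip_missing` and `max_propagate`. It also shows that Python's built-in `max` commits to neither.

```python
import math


def max_skip_missing(values):
    """NaN as missing value: return the largest non-NaN value."""
    present = [v for v in values if not math.isnan(v)]
    return max(present) if present else math.nan


def max_propagate(values):
    """NaN as error state: any NaN poisons the result."""
    values = list(values)
    if any(math.isnan(v) for v in values):
        return math.nan
    return max(values)


nan = math.nan
skipped = max_skip_missing([1.0, nan])
propagated = max_propagate([1.0, nan])

# Python's built-in max picks neither policy: the result depends on order,
# because every comparison against NaN is False.
builtin_a = max(1.0, nan)
builtin_b = max(nan, 1.0)
```

`skipped` is 1.0 and `propagated` is NaN. The built-in returns 1.0 for `max(1.0, nan)` and NaN for `max(nan, 1.0)`, so the caller has to decide which meaning of NaN is in play.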

Appendix A considers some potential interpretations of NaN:

A. A missing value (i.e. unknown but valid)

B. Not numeric at all (e.g. ‘purple’)

C. Inapplicable (i.e. not a datum)

D. Numerically indefinite (e.g. ≈0 / ≈0)

E. The result of an invalid operation
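A single NaN collapses all five interpretations into one value. As a sketch of the alternative, and not something Maclaren proposes, an explicit tag can keep them apart. The `Absence` and `Cell` names are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Absence(Enum):
    MISSING = "A: unknown but valid"
    NOT_NUMERIC = "B: not numeric at all"
    INAPPLICABLE = "C: not a datum"
    INDEFINITE = "D: numerically indefinite"
    INVALID_OPERATION = "E: result of an invalid operation"


@dataclass(frozen=True)
class Cell:
    """Either a number, or a reason why there is no number."""
    value: Optional[float] = None
    absence: Optional[Absence] = None

    def __post_init__(self):
        # Exactly one of the two fields must be set.
        if (self.value is None) == (self.absence is None):
            raise ValueError("exactly one of value or absence must be set")


data_a_2000 = Cell(value=123.0)
data_a_2001 = Cell(absence=Absence.MISSING)
```

The cost is that every operation on a `Cell` now has to say what it does for each kind of absence, which is exactly the decision a bare NaN lets you avoid making.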

Finn, S. (n.d.). The Metaphysics of Nothing. Internet Encyclopedia of Philosophy. Retrieved January 25, 2025, from https://iep.utm.edu/metaphysics-of-nothing/

Maclaren, N. (2000). IEEE 754 Error Handling and Programming Languages. IEEE Interval Standard Working Group - P1788 Mailing List. https://grouper.ieee.org/groups/1788/email/pdfmPSi1DgZZf.pdf

Jonathan Fung

The Metaphysics of Nothing - in Programming Languages
