You’re not computing with numbers

(since no such thing exists)

These are some examples I’ve been showing to many people, up to the point where I started feeling like my grandma, telling the same joke every Sunday. So I decided to write it all down and just keep handing out the link, which saves us all some time.

Nothing more unreal than the real line

If you’ve studied any mathematics you might have already developed some justified worries about the real line. For example, nobody can quite settle its basic structure, like which subsets are actually there; the science trying to clean up this mess has turned into a massive endeavor, and apparently they use forcing, which should be a red flag to any sane person.

Nevertheless, we’re interested in digital computers and computer programs, so who cares about Vitali sets and other Borelian nonsense? The first-order theory of (the field of) real numbers is decidable, so as long as we’re doing computations, one number at a time, we’re good. Right?

Well, it’s not that great. For example:

$ python3
Python 3.10.4 (main, May 14 2022, 05:40:22) [GCC 11.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> 1.0 - 0.9 - 0.1
-2.7755575615628914e-17

Hmm. Probably that’s why I didn’t use Pandas or whatever is fancy now… In my day (grandma mode) we used R.

$ R

R version 4.0.4 (2021-02-15) -- "Lost Library Book"
Copyright (C) 2021 The R Foundation for Statistical Computing
Platform: x86_64-pc-linux-gnu (64-bit)

> 1.0 - 0.9 - 0.1
[1] -2.775558e-17

Oww… And it’s like that almost everywhere, and of course it has to be. Any kid knows how floating-point representation works, and any reasonable teenager has already figured out that you can’t fit an arbitrary Cauchy sequence into 64 bits. If you could, that would mean there are at most 2^64 = 18 446 744 073 709 551 616 real numbers. So we’re cutting some corners, but how could that harm anyone?
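
If you want to see exactly which corner got cut, Python can show the precise rational value that the double written as 0.1 actually stores (a quick sketch):

from fractions import Fraction

# the double nearest to 1/10 is not 1/10; these print its exact value
print(Fraction(0.1))   # 3602879701896397/36028797018963968
print((0.1).hex())     # 0x1.999999999999ap-4, a finite sum of powers of 2

No finite sum of powers of two equals 1/10 (or 9/10), hence the leftover -2.7755575615628914e-17 above.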

I’ll tell you how it harmed me, or rather a simulation we were doing at the University.

We were experimenting with a simple 1-dimensional discrete-time dynamical system ([0,1],φ), given by the continuous piecewise-linear map

       ⎧    3x, x∈[0,1/3]
φ(x) = ⎨ -3x+2, x∈(1/3,2/3]
       ⎩  3x-2, x∈(2/3,1]

In other words, we were taking points (“real numbers”) between 0 and 1 and checking what happens when you apply φ over and over, like this (in R):

phi <- function(x) {
  # the piecewise-linear map φ on [0,1] (ifelse keeps it vectorized)
  ifelse(x < 1/3, 3*x,
         ifelse(x <= 2/3, -3*x + 2,
                3*x - 2))
}

trajectory <- function(f, x0, n) {
  # the n+1 points x0, f(x0), ..., f^n(x0), preallocated for clarity
  t <- numeric(n + 1)
  for (i in 0:n) {
    t[i + 1] <- x0
    x0 <- f(x0)
  }
  t
}

for example φ(0.7) = 0.1, then φ(φ(0.7)) = φ(0.1) = 0.3, φ(φ(φ(0.7))) = φ(0.3) = 0.9, and so on:

> phi(0.7)
[1] 0.1
> phi(0.1)
[1] 0.3
> phi(0.3)
[1] 0.9
> phi(0.9)
[1] 0.7

So far so good. From the above example it’s obvious (and paper and pencil confirm it beyond any doubt: 3·7/10 − 2 = 1/10, then 3·1/10 = 3/10, 3·3/10 = 9/10, and 3·9/10 − 2 = 7/10 again) that 7/10 is a point of period 4, meaning its trajectory just oscillates through these four values: 1/10, 3/10, 9/10, 7/10. Now, since there are hardly any numbers larger than 13 (maybe 23 and 42), this is how we carelessly confirmed the correctness of the above “implementation”:

> trajectory(phi,0.7,13)
 [1] 0.7 0.1 0.3 0.9 0.7 0.1 0.3 0.9 0.7 0.1 0.3 0.9 0.7 0.1

Great. So what’s the problem? Things got weird when we started plotting accumulated statistics from pushing thousands of points through the system… It took a while to realize that the above program does not work as expected. Look:

> trajectory(phi,0.7,100)
  [1] 0.70000000 0.10000000 0.30000000 0.90000000 0.70000000 0.10000000
  [7] 0.30000000 0.90000000 0.70000000 0.10000000 0.30000000 0.90000000
 [13] 0.70000000 0.10000000 0.30000000 0.90000000 0.69999999 0.09999998
 [19] 0.29999995 0.89999986 0.69999959 0.09999876 0.29999628 0.89998885
 [25] 0.69996655 0.09989966 0.29969898 0.89909695 0.69729084 0.09187253
 [31] 0.27561759 0.82685278 0.48055833 0.55832502 0.32502495 0.97507485
 [37] 0.92522454 0.77567361 0.32702082 0.98106246 0.94318738 0.82956213
 [43] 0.48868640 0.53394080 0.39817759 0.80546724 0.41640173 0.75079482
 [49] 0.25238445 0.75715335 0.27146006 0.81438017 0.44314050 0.67057851
 [55] 0.01173554 0.03520662 0.10561985 0.31685954 0.95057862 0.85173587
 [61] 0.55520761 0.33437718 0.99686847 0.99060541 0.97181624 0.91544872
 [67] 0.74634617 0.23903850 0.71711550 0.15134650 0.45403950 0.63788151
 [73] 0.08635548 0.25906645 0.77719934 0.33159801 0.99479403 0.98438209
 [79] 0.95314628 0.85943885 0.57831655 0.26505034 0.79515101 0.38545303
 [85] 0.84364090 0.53092271 0.40723186 0.77830443 0.33491328 0.99526015
 [91] 0.98578046 0.95734138 0.87202413 0.61607240 0.15178279 0.45534837
 [97] 0.63395490 0.09813530 0.29440589 0.88321767 0.64965302

Turns out we had been fooled by R’s truncated display of the “numbers”: after 15 steps the tiny fluctuation becomes visible… The analytically periodic point has literally left its orbit, and the further you go, the more it’s all over the place. In IEEE 754’s defense, this system is chaotic, so it amplifies any representation error by a factor of 3 (the slope of φ) at every step (hint: the Li–Yorke theorem; look at the plot of the 3-fold composition of φ, forget its 3 fixed points, and count the remaining 24 points of period 3, which not only implies chaos but, for a Discordian who insists on counting 23, apparently also Chaos. Hail Eris!)
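
You can watch the error explode by running the map in parallel on a double and on an exact fraction; here is a sketch in Python, mirroring the R code above, with Fraction as the ground truth:

from fractions import Fraction

def phi(x):
    # the same piecewise-linear map; works for floats and Fractions alike
    if x < Fraction(1, 3):
        return 3 * x
    elif x <= Fraction(2, 3):
        return -3 * x + 2
    else:
        return 3 * x - 2

x_float, x_exact = 0.7, Fraction(7, 10)
for step in range(36):
    if step % 5 == 0:
        # rounding errors of order 1e-16 creep in, and since |slope| = 3
        # everywhere, each of them gets multiplied by 3 at every step
        print(step, abs(x_float - float(x_exact)))
    x_float, x_exact = phi(x_float), phi(x_exact)

After roughly 35 steps the factor 3^35 ≈ 5·10^16 has eaten all 52 bits of the mantissa, which is exactly where the printout above dissolves into noise.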

Eventually, in order to get correct data for our experiments, we switched from floating-point “real numbers” to bignum-based “rational numbers”, which are one of the built-in numerical types in Scheme (and since Scheme’s syntax is so clean, we could use a Unicode φ, how cool is that?!)

(define (φ x)
  ;; exact rational version of the map -- no rounding anywhere
  (cond ((< x 1/3) (* x 3))
        ((< x 2/3) (+ (* -3 x) 2))
        (else (- (* 3 x) 2))))

(define (trajectory φ x0 n)
  ;; the n points x0, φ(x0), ..., φ^(n-1)(x0)
  (if (> n 0)
      `(,x0 . ,(trajectory φ (φ x0) (- n 1)))
      '()))

This runs with no problems, and the exact results can be cast to doubles afterwards:

scheme@(guile-user)> (trajectory φ 7/10 100)
$1 = (7/10 1/10 3/10 9/10 7/10 1/10 3/10 9/10 7/10 1/10 3/10 9/10 7/10
      1/10 3/10 9/10 7/10 1/10 3/10 9/10 7/10 1/10 3/10 9/10 7/10 1/10
      3/10 9/10 7/10 1/10 3/10 9/10 7/10 1/10 3/10 9/10 7/10 1/10 3/10
      9/10 7/10 1/10 3/10 9/10 7/10 1/10 3/10 9/10 7/10 1/10 3/10 9/10
      7/10 1/10 3/10 9/10 7/10 1/10 3/10 9/10 7/10 1/10 3/10 9/10 7/10
      1/10 3/10 9/10 7/10 1/10 3/10 9/10 7/10 1/10 3/10 9/10 7/10 1/10
      3/10 9/10 7/10 1/10 3/10 9/10 7/10 1/10 3/10 9/10 7/10 1/10 3/10
      9/10 7/10 1/10 3/10 9/10 7/10 1/10 3/10 9/10)
scheme@(guile-user)> (map (lambda (x) (* 1.0 x))
                          (trajectory φ 7/10 100))
$2 = (0.7 0.1 0.3 0.9 0.7 0.1 0.3 0.9 0.7 0.1 0.3 0.9 0.7 0.1 0.3 0.9 0.7
      0.1 0.3 0.9 0.7 0.1 0.3 0.9 0.7 0.1 0.3 0.9 0.7 0.1 0.3 0.9 0.7 0.1
      0.3 0.9 0.7 0.1 0.3 0.9 0.7 0.1 0.3 0.9 0.7 0.1 0.3 0.9 0.7 0.1 0.3
      0.9 0.7 0.1 0.3 0.9 0.7 0.1 0.3 0.9 0.7 0.1 0.3 0.9 0.7 0.1 0.3 0.9
      0.7 0.1 0.3 0.9 0.7 0.1 0.3 0.9 0.7 0.1 0.3 0.9 0.7 0.1 0.3 0.9 0.7
      0.1 0.3 0.9 0.7 0.1 0.3 0.9 0.7 0.1 0.3 0.9 0.7 0.1 0.3 0.9)

These issues with floats and doubles were pretty obvious to our grandparents, who wrote most of the currently used numerical code in Fortran. Since then nobody knows how it works, and we just have to preserve it.

No integers, no naturals, no Dave

Now obviously bignum doesn’t solve all our problems, but it does solve one, and that one is easy to miss:

scala> 65536*65536
res0: Int = 0
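
What happened is not mysterious: Scala’s Int is 32 bits wide, so multiplication is carried out modulo 2^32, and 65536 · 65536 = 2^32 ≡ 0. A one-line sanity check, sketched in Python:

# products of 32-bit Ints are reduced modulo 2**32,
# and 65536 * 65536 is exactly 2**32
print((65536 * 65536) % 2**32)   # 0, just like the Scala REPL said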

Apparently 65536 is a zero divisor. You might think it’s obvious, but then it’s not immediately obvious that the following C procedure does NOT compute the factorial function:

int factorial(int n) {
  int res = 1;                      /* a 32-bit int on most platforms */
  while (n > 0) { res *= n; n--; }  /* res overflows already for n = 13;
                                       signed overflow is UB in C */
  return res;
}

But it takes no time to see that it doesn’t, and maybe a moment to realize that it can’t.

#include <stdio.h>
int main(void) { int i; for(i=0; i<42; i++) printf("factorial(%d)=%d\n", i, factorial(i)); }

produces:

factorial(0)=1
factorial(1)=1
factorial(2)=2
factorial(3)=6
factorial(4)=24
factorial(5)=120
factorial(6)=720
factorial(7)=5040
factorial(8)=40320
factorial(9)=362880
factorial(10)=3628800
factorial(11)=39916800
factorial(12)=479001600
factorial(13)=1932053504
factorial(14)=1278945280
factorial(15)=2004310016
factorial(16)=2004189184
factorial(17)=-288522240
factorial(18)=-898433024
factorial(19)=109641728
factorial(20)=-2102132736
factorial(21)=-1195114496
factorial(22)=-522715136
factorial(23)=862453760
factorial(24)=-775946240
factorial(25)=2076180480
factorial(26)=-1853882368
factorial(27)=1484783616
factorial(28)=-1375731712
factorial(29)=-1241513984
factorial(30)=1409286144
factorial(31)=738197504
factorial(32)=-2147483648
factorial(33)=-2147483648
factorial(34)=0
factorial(35)=0
factorial(36)=0
factorial(37)=0
factorial(38)=0
factorial(39)=0
factorial(40)=0
factorial(41)=0

Sweet, isn’t it? And it’s no better with unsigned int, while relying on bignum instead only pushes these problems a bit further. At first glance it’s a lot further, but still within reach of a good old combinatorial explosion (go on, feed trajectory with factorial and some moderately large number, and enjoy the flight).
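
To be fair to bignum: the arithmetic really is exact; it’s only the sizes, and hence the costs, that blow up. A small illustration in Python (whose integers are bignums by default):

import math

# exact this time: no wrap-around at 13!, where the 32-bit version broke
print(math.factorial(13))                 # 6227020800

# but exactness isn't free: 100000! has 456574 digits, and every further
# multiplication has to drag all of them along
print(len(str(math.factorial(10**5))))    # 456574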

This can get pretty dangerous. For example, I do believe programmers in the aviation industry are nothing less than brilliant and careful, stick to the “best practices” and testing and all that. Yet, look at this (TL;DR: Boeing 787 planes need to be “reset” every 200 days or so, “because a software bug shuts down the plane’s electricity generators every 248 days”, and apparently said bug comes down to a 32-bit counter hitting the limit of what it can represent).
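
If, as widely reported, the culprit was a signed 32-bit counter ticking every hundredth of a second (the tick size is my assumption here), the 248 days falls out of 2^31 exactly; a back-of-the-envelope check in Python:

# 2**31 ticks of 10 ms each, expressed in days
print(2**31 / (100 * 60 * 60 * 24))   # 248.55134814814815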

Similar problems hold for practically any computable arithmetic function (unless it’s defined only on a small finite subset of the natural numbers). We never compute them: we have some artifacts which clearly do something and seem to fit the definitions, but they’re nothing like “the real thing”; they can’t be, because “the real thing” doesn’t even exist. It’s just an opinion, but a strong one. Even the very large finite things can’t exist, but for this consult e.g. the Wikipedia article on ultrafinitism. I recall there was a neat anecdote about Esenin-Volpin in some early chapters of “Naming Infinity: A True Story of Religious Mysticism and Mathematical Creativity” by Loren Graham and Jean-Michel Kantor (Harvard University Press, 2009), but I had only borrowed the book, so I can’t reproduce it faithfully now; and I’ve drifted into a digression, haven’t I?

My point is that although the above factorial procedure lends itself (trivially) to a proof by induction that it “captures the factorial function”, quite obviously it doesn’t capture it, since no physical storage is anything like the set of natural numbers… Inductive proofs about computer programs inside a physical computer are never correct! There is only a certain correspondence between an initial segment of the induction/recursion and the program’s behaviour (perhaps, maybe).
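
To make that concrete, here is the C procedure from above with its 32-bit arithmetic made explicit, as a Python sketch (the wrap-around modulo 2^32 is the usual two’s-complement behaviour; the C standard doesn’t even promise that much for signed int):

M = 2**32

def to_signed(u):
    # reinterpret a residue modulo 2**32 as a two's-complement signed value
    return u - M if u >= M // 2 else u

def factorial32(n):
    res = 1
    for k in range(1, n + 1):
        res = (res * k) % M   # what the 32-bit multiplier actually computes
    return to_signed(res)

# the induction step f(n+1) = (n+1)*f(n) is provable about the algorithm,
# yet the program collapses: 34! already contains 2**32 as a factor
print(factorial32(13))   # 1932053504, not 6227020800
print(factorial32(34))   # 0, and every later "factorial" stays 0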

This makes formal methods in program verification a little less attractive than they seemed, doesn’t it? You can, for example, establish semantic equivalence of two algorithms, implement them on a digital computer, and end up with two semantically different programs. I did. This happens. Sorry.
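
For a classic, well-documented instance, take the midpoint computation in binary search: (lo + hi) / 2 and lo + (hi - lo) / 2 are provably equal over the integers, yet differ as 32-bit programs. A sketch, again emulating the wrap-around in Python:

M = 2**32

def to_signed(u):
    return u - M if u >= M // 2 else u

def mid_textbook(lo, hi):
    s = to_signed((lo + hi) % M)   # lo + hi silently wraps past 2**31 - 1
    return s // 2                  # s is even here, so // agrees with C division

def mid_safe(lo, hi):
    return lo + (hi - lo) // 2     # the difference always fits in 32 bits

lo, hi = 1, 2**31 - 1              # two perfectly valid 32-bit ints
print(mid_textbook(lo, hi))        # -1073741824, a "midpoint" below lo
print(mid_safe(lo, hi))            # 1073741824, the actual midpoint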

Conclusions

In this way-too-long rant I tried to make three points: that floating-point “reals” are not real numbers, that machine integers are not integers (let alone naturals), and that even a correct inductive proof about an algorithm says frustratingly little about the physical program running it.

Although I really meant only this: next time you ask me “So, does it work now?”, don’t be puzzled when I reply “We may never know”. I do mean it!


Last rev. 2023/10/13