<h1>Implementing natural numbers in OCaml</h1>
<p><em>Jeremie Coullon, 2020-04-06</em></p>
<p>In this post we’re going to implement natural numbers (non-negative integers) in <a href="https://ocaml.org/">OCaml</a> to see how we can define numbers from first principles, namely without using OCaml’s built-in <code class="highlighter-rouge">int</code> type. We’ll then write a simple UI so that we have a basic (but inefficient) calculator. You can find all the code for this post on <a href="https://github.com/jeremiecoullon/natural_numbers_post">Github</a>.</p>
<h2 id="definition">Definition</h2>
<p>We’ll start with a recursive definition of natural numbers:</p>
<script type="math/tex; mode=display">n \in \mathcal{N} \iff n = \begin{cases}0 \\ S(m) \hspace{5mm} \text{for }m \in \mathcal{N}
\end{cases}</script>
<p>We used the function <script type="math/tex">S(m)</script> which is called the <a href="https://en.wikipedia.org/wiki/Successor_function">successor function</a>. This simply returns the next natural number (for example <script type="math/tex">S(0)=1</script>, and <script type="math/tex">S(4)=5</script>).</p>
<p>This definition means that a natural number is either <script type="math/tex">0</script> or the successor of another natural number. For example <script type="math/tex">0</script> is a natural number (the first case in the definition), but <script type="math/tex">1</script> is also a natural number, as it’s the successor of <script type="math/tex">0</script> (you would write <script type="math/tex">1=S(0)</script>). Similarly, 2 can be written as <script type="math/tex">2 = S(S(0))</script>, and so on. By using recursion (the definition of a natural number includes another natural number) we can “bootstrap” building numbers without using many other definitions.</p>
<p>We now write this definition as a type in OCaml, which looks a lot like the mathematical definition above:</p>
<div class="language-ocaml highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">type</span> <span class="n">nat</span> <span class="o">=</span>
<span class="o">|</span> <span class="nc">Zero</span>
<span class="o">|</span> <span class="nc">Succ</span> <span class="k">of</span> <span class="n">nat</span>
</code></pre></div></div>
<p>The vertical lines denote the two cases. Here you would write 1 as <code class="highlighter-rouge">Succ Zero</code>, 2 as <code class="highlighter-rouge">Succ (Succ Zero)</code>, and so on. The parentheses are needed because <code class="highlighter-rouge">Succ</code> takes exactly one argument.</p>
<p>However we haven’t said what these numbers are (what <em>is</em> zero? What <em>are</em> numbers?). To do that we need to define how they act.</p>
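<p>As a quick illustration of the type definition above, here is a minimal sketch that builds a few values by hand. The names <code class="highlighter-rouge">one</code>, <code class="highlighter-rouge">two</code>, <code class="highlighter-rouge">three</code> and the helper <code class="highlighter-rouge">int_of_nat</code> are not part of the original code; the helper uses the built-in <code class="highlighter-rouge">int</code> type and is only meant for inspecting values.</p>

```ocaml
type nat =
  | Zero
  | Succ of nat

(* A few small numbers, built only from Zero and Succ *)
let one = Succ Zero
let two = Succ one
let three = Succ two

(* Hypothetical debugging helper: counts the Succ constructors.
   It relies on the built-in int type, so it is for inspection only. *)
let rec int_of_nat n =
  match n with
  | Zero -> 0
  | Succ m -> 1 + int_of_nat m

let () =
  assert (three = Succ (Succ (Succ Zero)));
  Printf.printf "%d\n" (int_of_nat three)
```

<p>Structural equality (<code class="highlighter-rouge">=</code>) works on <code class="highlighter-rouge">nat</code> out of the box, which makes small checks like the one above easy to write.</p>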
<h2 id="some-operators">Some operators</h2>
<p>We’ll start off by defining how we can increment and decrement them.</p>
<div class="language-ocaml highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">let</span> <span class="n">incr</span> <span class="n">n</span> <span class="o">=</span>
<span class="nc">Succ</span> <span class="n">n</span>
<span class="k">let</span> <span class="n">decr</span> <span class="n">n</span> <span class="o">=</span>
<span class="k">match</span> <span class="n">n</span> <span class="k">with</span>
<span class="o">|</span> <span class="nc">Zero</span> <span class="o">-></span> <span class="nc">Zero</span>
<span class="o">|</span> <span class="nc">Succ</span> <span class="n">nn</span> <span class="o">-></span> <span class="n">nn</span>
</code></pre></div></div>
<p>The increment function simply adds a <code class="highlighter-rouge">Succ</code> before the number, which corresponds to adding 1. So <code class="highlighter-rouge">incr (Succ Zero)</code> returns <code class="highlighter-rouge">Succ (Succ Zero)</code>. The decrement function checks whether the number <code class="highlighter-rouge">n</code> is <code class="highlighter-rouge">Zero</code> or the successor of a number. In the first case it simply returns <code class="highlighter-rouge">Zero</code>, so <code class="highlighter-rouge">decr Zero</code> is <code class="highlighter-rouge">Zero</code> (this could be extended to include negative numbers). In the second case the function returns the number that precedes it. So <code class="highlighter-rouge">decr (Succ (Succ (Succ Zero)))</code> returns <code class="highlighter-rouge">Succ (Succ Zero)</code>.</p>
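<p>The behaviour described above can be checked with a few assertions. This is a small sketch that repeats the definitions so that it runs on its own.</p>

```ocaml
type nat = Zero | Succ of nat

let incr n = Succ n

let decr n =
  match n with
  | Zero -> Zero
  | Succ nn -> nn

let () =
  (* 1 + 1 = 2 *)
  assert (incr (Succ Zero) = Succ (Succ Zero));
  (* 3 - 1 = 2 *)
  assert (decr (Succ (Succ (Succ Zero))) = Succ (Succ Zero));
  (* decrementing Zero stays at Zero *)
  assert (decr Zero = Zero);
  (* decr undoes incr, but incr does not undo decr at Zero *)
  assert (decr (incr Zero) = Zero);
  assert (incr (decr Zero) = Succ Zero)
```

<p>Note that these definitions shadow the standard library’s <code class="highlighter-rouge">incr</code> and <code class="highlighter-rouge">decr</code>, which operate on integer references.</p>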
<h3 id="addition">Addition</h3>
<p>We can now define addition as a recursive function which we denote by <code class="highlighter-rouge">++</code> (in OCaml we define <a href="https://en.wikipedia.org/wiki/Infix_notation">infix operators</a> using parentheses). So the addition function takes two elements <code class="highlighter-rouge">n</code> and <code class="highlighter-rouge">m</code> of type <code class="highlighter-rouge">nat</code> and returns an element of type <code class="highlighter-rouge">nat</code>. Note the <code class="highlighter-rouge">rec</code> added before the function name which means that it’s recursive.</p>
<div class="language-ocaml highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">let</span> <span class="k">rec</span> <span class="p">(</span><span class="o">++</span><span class="p">)</span> <span class="n">n</span> <span class="n">m</span> <span class="o">=</span>
<span class="k">match</span> <span class="n">m</span> <span class="k">with</span>
<span class="o">|</span> <span class="nc">Zero</span> <span class="o">-></span> <span class="n">n</span>
<span class="o">|</span> <span class="nc">Succ</span> <span class="n">mm</span> <span class="o">-></span> <span class="p">(</span><span class="nc">Succ</span> <span class="n">n</span><span class="p">)</span> <span class="o">++</span> <span class="n">mm</span>
</code></pre></div></div>
<p>Because we defined the function to be an infix operator we put it in between the arguments (ex: <code class="highlighter-rouge">Zero ++ (Succ Zero)</code>). This function checks whether <code class="highlighter-rouge">m</code> is <code class="highlighter-rouge">Zero</code> or the successor of a number. If it’s <code class="highlighter-rouge">Zero</code> it returns <code class="highlighter-rouge">n</code> unchanged. If it’s the successor of <code class="highlighter-rouge">mm</code> it returns the sum of <code class="highlighter-rouge">Succ n</code> and <code class="highlighter-rouge">mm</code>: each step moves one <code class="highlighter-rouge">Succ</code> from the second argument onto the first.</p>
<p>Let’s check that this definition behaves correctly by calculating 1+1, which we write as <code class="highlighter-rouge">(Succ Zero) ++ (Succ Zero)</code>. The first call to the function finds that the second argument is the successor of <code class="highlighter-rouge">Zero</code>, so it returns the sum <code class="highlighter-rouge">(Succ (Succ Zero)) ++ Zero</code>. This calls the function a second time, which finds that the second argument is <code class="highlighter-rouge">Zero</code>. As a result the function returns <code class="highlighter-rouge">Succ (Succ Zero)</code>, which is 2!</p>
<p>So in summary 1+1 is written as <code class="highlighter-rouge">(Succ Zero) ++ (Succ Zero)</code> = <code class="highlighter-rouge">(Succ (Succ Zero)) ++ Zero</code> = <code class="highlighter-rouge">Succ (Succ Zero)</code>. Math still works!</p>
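<p>The walkthrough above can be turned into a runnable check. In this sketch the constants <code class="highlighter-rouge">one</code>, <code class="highlighter-rouge">two</code>, <code class="highlighter-rouge">three</code> and <code class="highlighter-rouge">five</code> are helper names introduced here for readability.</p>

```ocaml
type nat = Zero | Succ of nat

let rec ( ++ ) n m =
  match m with
  | Zero -> n
  | Succ mm -> (Succ n) ++ mm

let one = Succ Zero
let two = Succ one
let three = Succ two
let five = Succ (Succ three)

let () =
  (* 1 + 1 = 2, the example traced in the text *)
  assert (one ++ one = two);
  (* 2 + 3 and 3 + 2 both give 5 *)
  assert (two ++ three = five);
  assert (three ++ two = five);
  (* Zero is neutral on either side *)
  assert (Zero ++ three = three);
  assert (three ++ Zero = three)
```

<p>Since the recursive call is the last thing the function does, this definition is tail-recursive, so the number of steps grows with the second argument but the call stack does not.</p>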
<h3 id="subtraction">Subtraction</h3>
<p>We now define subtraction:</p>
<div class="language-ocaml highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">let</span> <span class="k">rec</span> <span class="p">(</span><span class="o">--</span><span class="p">)</span> <span class="n">n</span> <span class="n">m</span> <span class="o">=</span>
<span class="k">match</span> <span class="n">m</span> <span class="k">with</span>
<span class="o">|</span> <span class="nc">Zero</span> <span class="o">-></span> <span class="n">n</span>
<span class="o">|</span> <span class="nc">Succ</span> <span class="n">mm</span> <span class="o">-></span> <span class="p">(</span><span class="n">decr</span> <span class="n">n</span><span class="p">)</span> <span class="o">--</span> <span class="n">mm</span>
</code></pre></div></div>
<p>This decrements both arguments until the second one is <code class="highlighter-rouge">Zero</code>. Note that if <code class="highlighter-rouge">m</code> is bigger than <code class="highlighter-rouge">n</code> then <code class="highlighter-rouge">n -- m</code> equals <code class="highlighter-rouge">Zero</code>, because <code class="highlighter-rouge">decr Zero</code> is <code class="highlighter-rouge">Zero</code>.</p>
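<p>Here is a short sketch of this truncated subtraction in action, including the case where the second argument is larger than the first. The named constants are helpers introduced for this example.</p>

```ocaml
type nat = Zero | Succ of nat

let decr n =
  match n with
  | Zero -> Zero
  | Succ nn -> nn

let rec ( -- ) n m =
  match m with
  | Zero -> n
  | Succ mm -> (decr n) -- mm

let two = Succ (Succ Zero)
let three = Succ two
let five = Succ (Succ three)

let () =
  (* 5 - 2 = 3 *)
  assert (five -- two = three);
  (* subtracting Zero changes nothing *)
  assert (five -- Zero = five);
  (* 2 - 5 is clamped at Zero rather than going negative *)
  assert (two -- five = Zero)
```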
<h3 id="multiplication">Multiplication</h3>
<p>Moving on, we define multiplication:</p>
<div class="language-ocaml highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">let</span> <span class="p">(</span><span class="o">+*</span><span class="p">)</span> <span class="n">n</span> <span class="n">m</span> <span class="o">=</span>
<span class="k">let</span> <span class="k">rec</span> <span class="n">aux</span> <span class="n">n</span> <span class="n">m</span> <span class="n">acc</span> <span class="o">=</span>
<span class="k">match</span> <span class="n">m</span> <span class="k">with</span>
<span class="o">|</span> <span class="nc">Zero</span> <span class="o">-></span> <span class="n">acc</span>
<span class="o">|</span> <span class="nc">Succ</span> <span class="n">mm</span> <span class="o">-></span> <span class="n">aux</span> <span class="n">n</span> <span class="n">mm</span> <span class="p">(</span><span class="n">n</span> <span class="o">++</span> <span class="n">acc</span><span class="p">)</span>
<span class="k">in</span>
<span class="n">aux</span> <span class="n">n</span> <span class="n">m</span> <span class="nc">Zero</span>
</code></pre></div></div>
<p>Here we use an auxiliary function (<code class="highlighter-rouge">aux</code>) which builds up the result in the accumulator <code class="highlighter-rouge">acc</code> by adding <code class="highlighter-rouge">n</code> to it <code class="highlighter-rouge">m</code> times. So applying this function to <script type="math/tex">3</script> and <script type="math/tex">2</script> gives: <script type="math/tex">3*2 = 3*1 + 3 = 3*0 + 6 = 6</script>. And in code this is:</p>
<ul>
<li><code class="highlighter-rouge">(Succ (Succ (Succ Zero))) +* (Succ (Succ Zero))</code></li>
<li>which is equivalent to <code class="highlighter-rouge">((Succ (Succ (Succ Zero))) +* (Succ Zero)) ++ (Succ (Succ (Succ Zero)))</code></li>
<li>which is equivalent to <code class="highlighter-rouge">((Succ (Succ (Succ Zero))) +* Zero) ++ (Succ (Succ (Succ (Succ (Succ (Succ Zero))))))</code></li>
<li>which returns <code class="highlighter-rouge">(Succ (Succ (Succ (Succ (Succ (Succ Zero))))))</code> (namely <script type="math/tex">6</script>)</li>
</ul>
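<p>The steps above describe the computation in terms of <code class="highlighter-rouge">+*</code> itself; the code actually carries the running total in <code class="highlighter-rouge">acc</code>. The sketch below repeats the definitions and writes out the calls to <code class="highlighter-rouge">aux</code> in a comment, using ordinary numerals for brevity. The constants <code class="highlighter-rouge">two</code>, <code class="highlighter-rouge">three</code> and <code class="highlighter-rouge">six</code> are helper names introduced here.</p>

```ocaml
type nat = Zero | Succ of nat

let rec ( ++ ) n m =
  match m with
  | Zero -> n
  | Succ mm -> (Succ n) ++ mm

let ( +* ) n m =
  let rec aux n m acc =
    match m with
    | Zero -> acc
    | Succ mm -> aux n mm (n ++ acc)
  in
  aux n m Zero

let two = Succ (Succ Zero)
let three = Succ two
let six = three ++ three

(* Calls made when evaluating three +* two, in ordinary numerals:
     aux 3 2 0
     aux 3 1 3      since 3 ++ 0 is 3
     aux 3 0 6      since 3 ++ 3 is 6
     6              the second argument reached Zero *)
let () =
  assert (three +* two = six);
  assert (two +* three = six);
  (* multiplying by Zero on either side gives Zero *)
  assert (three +* Zero = Zero);
  assert (Zero +* three = Zero)
```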
<h3 id="division">Division</h3>
<p>We also define the ‘strictly less than’ operator which we then use to define integer division.</p>
<div class="language-ocaml highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">let</span> <span class="k">rec</span> <span class="p">(</span><span class="o"><<</span><span class="p">)</span> <span class="n">n</span> <span class="n">m</span> <span class="o">=</span>
<span class="k">match</span> <span class="p">(</span><span class="n">n</span><span class="o">,</span> <span class="n">m</span><span class="p">)</span> <span class="k">with</span>
<span class="o">|</span> <span class="p">(</span><span class="n">p</span><span class="o">,</span> <span class="nc">Zero</span><span class="p">)</span> <span class="o">-></span> <span class="bp">false</span>
<span class="o">|</span> <span class="p">(</span><span class="nc">Zero</span><span class="o">,</span> <span class="n">q</span><span class="p">)</span> <span class="o">-></span> <span class="bp">true</span>
<span class="o">|</span> <span class="p">(</span><span class="n">p</span><span class="o">,</span> <span class="n">q</span><span class="p">)</span> <span class="o">-></span> <span class="p">(</span><span class="n">decr</span> <span class="n">n</span><span class="p">)</span> <span class="o"><<</span> <span class="p">(</span><span class="n">decr</span> <span class="n">m</span><span class="p">)</span>
<span class="k">let</span> <span class="p">(</span><span class="o">//</span><span class="p">)</span> <span class="n">n</span> <span class="n">m</span> <span class="o">=</span>
<span class="k">let</span> <span class="k">rec</span> <span class="n">aux</span> <span class="n">p</span> <span class="n">acc</span> <span class="o">=</span>
<span class="k">let</span> <span class="n">lt</span> <span class="o">=</span> <span class="n">p</span> <span class="o"><<</span> <span class="n">m</span> <span class="k">in</span>
<span class="k">match</span> <span class="n">lt</span> <span class="k">with</span>
<span class="o">|</span> <span class="bp">true</span> <span class="o">-></span> <span class="n">acc</span>
<span class="o">|</span> <span class="bp">false</span> <span class="o">-></span> <span class="n">aux</span> <span class="p">(</span><span class="n">p</span> <span class="o">--</span> <span class="n">m</span><span class="p">)</span> <span class="p">(</span><span class="nc">Succ</span> <span class="n">acc</span><span class="p">)</span>
<span class="k">in</span>
<span class="n">aux</span> <span class="n">n</span> <span class="nc">Zero</span>
</code></pre></div></div>
<p>Like in the case of multiplication, the division function defines an auxiliary function that builds up the result in the accumulator <code class="highlighter-rouge">acc</code>. This function checks whether the first argument <code class="highlighter-rouge">p</code> is less than <code class="highlighter-rouge">m</code>. If it isn’t, it increments the accumulator by 1 and calls <code class="highlighter-rouge">aux</code> again with <code class="highlighter-rouge">p -- m</code> as the first argument. Once <code class="highlighter-rouge">p</code> is less than <code class="highlighter-rouge">m</code> it returns the accumulator. So this auxiliary function counts the number of times that <code class="highlighter-rouge">m</code> fits into <code class="highlighter-rouge">p</code>, which is exactly what integer division is. We run this function with <code class="highlighter-rouge">n</code> as first argument and with the accumulator as <code class="highlighter-rouge">Zero</code>. Note that <code class="highlighter-rouge">m</code> must not be <code class="highlighter-rouge">Zero</code>: <code class="highlighter-rouge">p &lt;&lt; Zero</code> is always false, so the recursion would never terminate.</p>
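<p>To make the comparison and the division loop concrete, here is a sketch that computes 7 divided by 2. The definitions follow the ones above, except that unused pattern variables are replaced by wildcards and the boolean match is written as an <code class="highlighter-rouge">if</code>; the constants are helper names introduced for this example.</p>

```ocaml
type nat = Zero | Succ of nat

let decr n =
  match n with
  | Zero -> Zero
  | Succ nn -> nn

let rec ( -- ) n m =
  match m with
  | Zero -> n
  | Succ mm -> (decr n) -- mm

(* Strictly less than: peel one Succ off both sides until one hits Zero *)
let rec ( << ) n m =
  match (n, m) with
  | (_, Zero) -> false
  | (Zero, _) -> true
  | _ -> (decr n) << (decr m)

(* Integer division; the divisor m is assumed to be non-zero,
   otherwise the loop below never terminates *)
let ( // ) n m =
  let rec aux p acc =
    if p << m then acc else aux (p -- m) (Succ acc)
  in
  aux n Zero

let one = Succ Zero
let two = Succ one
let three = Succ two
let seven = Succ (Succ (Succ (Succ three)))

let () =
  assert (one << two);
  assert (not (two << two));
  (* 7 / 2: subtract 2 three times (7, 5, 3, 1), then 1 is less than 2 *)
  assert (seven // two = three);
  assert (two // seven = Zero)
```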
<p>Finally we can define the modulo operator. As we use previous definitions of division, multiplication, and subtraction, this definition is abstracted away from our implementation of natural numbers. This function gives the remainder when dividing <code class="highlighter-rouge">n</code> by <code class="highlighter-rouge">m</code>.</p>
<div class="language-ocaml highlighter-rouge"><div class="highlight"><pre class="highlight"><code>
<span class="k">let</span> <span class="p">(</span><span class="o">%</span><span class="p">)</span> <span class="n">n</span> <span class="n">m</span> <span class="o">=</span>
<span class="k">let</span> <span class="n">p</span> <span class="o">=</span> <span class="n">n</span> <span class="o">//</span> <span class="n">m</span> <span class="k">in</span>
<span class="n">n</span> <span class="o">--</span> <span class="p">(</span><span class="n">p</span> <span class="o">+*</span> <span class="n">m</span><span class="p">)</span>
</code></pre></div></div>
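<p>A useful sanity check for the modulo operator is the identity <code class="highlighter-rouge">n = (n // m) * m + (n % m)</code>. The sketch below verifies it for 7 and 2; the operators are repeated in compact form so that it runs on its own, and the constants are helper names introduced here.</p>

```ocaml
type nat = Zero | Succ of nat

let decr = function Zero -> Zero | Succ n -> n
let rec ( ++ ) n m = match m with Zero -> n | Succ mm -> (Succ n) ++ mm
let rec ( -- ) n m = match m with Zero -> n | Succ mm -> (decr n) -- mm

let ( +* ) n m =
  let rec aux m acc =
    match m with
    | Zero -> acc
    | Succ mm -> aux mm (n ++ acc)
  in
  aux m Zero

let rec ( << ) n m =
  match (n, m) with
  | (_, Zero) -> false
  | (Zero, _) -> true
  | _ -> (decr n) << (decr m)

let ( // ) n m =
  let rec aux p acc = if p << m then acc else aux (p -- m) (Succ acc) in
  aux n Zero

let ( % ) n m =
  let p = n // m in
  n -- (p +* m)

let one = Succ Zero
let two = Succ one
let seven = Succ (Succ (Succ (Succ (Succ two))))

let () =
  (* 7 mod 2 = 1 *)
  assert (seven % two = one);
  (* quotient times divisor plus remainder gives back the original number *)
  assert (((seven // two) +* two) ++ (seven % two) = seven)
```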
<h2 id="a-basic-ui">A basic UI</h2>
<p>We’ve defined the natural numbers and the basic operators, but it’s a bit unwieldy to use them in their current form. So we’ll write some code to convert them to the usual number system (represented as strings) and back.</p>
<h3 id="from-type-nat-to-string-representation">From type <code class="highlighter-rouge">nat</code> to string representation</h3>
<p>We’ll write some code to convert numbers to base 10 and then represent them in the usual Arabic numerals.</p>
<div class="language-ocaml highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">let</span> <span class="n">ten</span> <span class="o">=</span> <span class="nc">Succ</span> <span class="p">(</span><span class="nc">Succ</span> <span class="p">(</span><span class="nc">Succ</span> <span class="p">(</span><span class="nc">Succ</span> <span class="p">(</span><span class="nc">Succ</span> <span class="p">(</span><span class="nc">Succ</span> <span class="p">(</span><span class="nc">Succ</span> <span class="p">(</span><span class="nc">Succ</span> <span class="p">(</span><span class="nc">Succ</span> <span class="p">(</span><span class="nc">Succ</span> <span class="nc">Zero</span><span class="p">)))))))))</span>
<span class="k">let</span> <span class="n">base10</span> <span class="n">n</span> <span class="o">=</span>
<span class="k">let</span> <span class="k">rec</span> <span class="n">aux</span> <span class="n">q</span> <span class="n">acc</span> <span class="o">=</span>
<span class="k">let</span> <span class="n">r</span> <span class="o">=</span> <span class="n">q</span> <span class="o">%</span> <span class="n">ten</span> <span class="k">in</span>
<span class="k">let</span> <span class="n">p</span> <span class="o">=</span> <span class="n">q</span> <span class="o">//</span> <span class="n">ten</span> <span class="k">in</span>
<span class="k">match</span> <span class="n">p</span> <span class="k">with</span>
<span class="o">|</span> <span class="nc">Zero</span> <span class="o">-></span> <span class="n">r</span><span class="o">::</span><span class="n">acc</span>
<span class="o">|</span> <span class="n">pp</span> <span class="o">-></span> <span class="n">aux</span> <span class="n">p</span> <span class="p">(</span><span class="n">r</span><span class="o">::</span><span class="n">acc</span><span class="p">)</span>
<span class="k">in</span>
<span class="n">aux</span> <span class="n">n</span> <span class="bp">[]</span>
</code></pre></div></div>
<p>This function returns the list of base-10 digits of the number, most significant digit first (…, 100s, 10s, 1s). So if <code class="highlighter-rouge">n</code> is <code class="highlighter-rouge">Succ (Succ (Succ (Succ (Succ (Succ (Succ (Succ (Succ (Succ (Succ (Succ Zero)))))))))))</code> (ie: 12), then <code class="highlighter-rouge">base10 n</code> returns <code class="highlighter-rouge">[Succ Zero; Succ (Succ Zero)]</code>.</p>
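<p>The example with 12 can be reproduced as follows. This sketch repeats the operators in compact form and builds <code class="highlighter-rouge">ten</code> as <code class="highlighter-rouge">five ++ five</code> rather than writing out ten constructors; that shortcut and the other named constants are choices made for this example.</p>

```ocaml
type nat = Zero | Succ of nat

let decr = function Zero -> Zero | Succ n -> n
let rec ( ++ ) n m = match m with Zero -> n | Succ mm -> (Succ n) ++ mm
let rec ( -- ) n m = match m with Zero -> n | Succ mm -> (decr n) -- mm

let ( +* ) n m =
  let rec aux m acc =
    match m with
    | Zero -> acc
    | Succ mm -> aux mm (n ++ acc)
  in
  aux m Zero

let rec ( << ) n m =
  match (n, m) with
  | (_, Zero) -> false
  | (Zero, _) -> true
  | _ -> (decr n) << (decr m)

let ( // ) n m =
  let rec aux p acc = if p << m then acc else aux (p -- m) (Succ acc) in
  aux n Zero

let ( % ) n m = n -- ((n // m) +* m)

let one = Succ Zero
let two = Succ one
let five = Succ (Succ (Succ two))
let ten = five ++ five

(* Peel off the last digit with % and shift right with // until nothing is left *)
let base10 n =
  let rec aux q acc =
    let r = q % ten in
    let p = q // ten in
    match p with
    | Zero -> r :: acc
    | _ -> aux p (r :: acc)
  in
  aux n []

let () =
  (* 12 has digits 1 and 2, most significant first *)
  assert (base10 (ten ++ two) = [one; two]);
  (* a single-digit number gives a one-element list *)
  assert (base10 five = [five]);
  assert (base10 Zero = [Zero])
```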
<p>We then define the 10 digits (with a hack for the cases bigger than 9) and put it all together in the function <code class="highlighter-rouge">string_of_nat</code>.</p>
<div class="language-ocaml highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">let</span> <span class="n">print_nat_digits</span> <span class="o">=</span> <span class="k">function</span>
<span class="o">|</span> <span class="nc">Zero</span> <span class="o">-></span> <span class="s2">"0"</span>
<span class="o">|</span> <span class="nc">Succ</span> <span class="nc">Zero</span> <span class="o">-></span> <span class="s2">"1"</span>
<span class="o">|</span> <span class="nc">Succ</span> <span class="nc">Succ</span> <span class="nc">Zero</span> <span class="o">-></span> <span class="s2">"2"</span>
<span class="o">|</span> <span class="nc">Succ</span> <span class="nc">Succ</span> <span class="nc">Succ</span> <span class="nc">Zero</span> <span class="o">-></span> <span class="s2">"3"</span>
<span class="o">|</span> <span class="nc">Succ</span> <span class="nc">Succ</span> <span class="nc">Succ</span> <span class="nc">Succ</span> <span class="nc">Zero</span> <span class="o">-></span> <span class="s2">"4"</span>
<span class="o">|</span> <span class="nc">Succ</span> <span class="nc">Succ</span> <span class="nc">Succ</span> <span class="nc">Succ</span> <span class="nc">Succ</span> <span class="nc">Zero</span> <span class="o">-></span> <span class="s2">"5"</span>
<span class="o">|</span> <span class="nc">Succ</span> <span class="nc">Succ</span> <span class="nc">Succ</span> <span class="nc">Succ</span> <span class="nc">Succ</span> <span class="nc">Succ</span> <span class="nc">Zero</span> <span class="o">-></span> <span class="s2">"6"</span>
<span class="o">|</span> <span class="nc">Succ</span> <span class="nc">Succ</span> <span class="nc">Succ</span> <span class="nc">Succ</span> <span class="nc">Succ</span> <span class="nc">Succ</span> <span class="nc">Succ</span> <span class="nc">Zero</span> <span class="o">-></span> <span class="s2">"7"</span>
<span class="o">|</span> <span class="nc">Succ</span> <span class="nc">Succ</span> <span class="nc">Succ</span> <span class="nc">Succ</span> <span class="nc">Succ</span> <span class="nc">Succ</span> <span class="nc">Succ</span> <span class="nc">Succ</span> <span class="nc">Zero</span> <span class="o">-></span> <span class="s2">"8"</span>
<span class="o">|</span> <span class="nc">Succ</span> <span class="nc">Succ</span> <span class="nc">Succ</span> <span class="nc">Succ</span> <span class="nc">Succ</span> <span class="nc">Succ</span> <span class="nc">Succ</span> <span class="nc">Succ</span> <span class="nc">Succ</span> <span class="nc">Zero</span> <span class="o">-></span> <span class="s2">"9"</span>
<span class="o">|</span> <span class="n">_</span> <span class="o">-></span> <span class="s2">"bigger than 9"</span>
<span class="k">let</span> <span class="n">string_of_nat</span> <span class="n">n</span> <span class="o">=</span>
<span class="k">let</span> <span class="n">base_10_rep</span> <span class="o">=</span> <span class="n">base10</span> <span class="n">n</span> <span class="k">in</span>
<span class="k">let</span> <span class="n">list_strings</span> <span class="o">=</span> <span class="nn">List</span><span class="p">.</span><span class="n">map</span> <span class="n">print_nat_digits</span> <span class="n">base_10_rep</span> <span class="k">in</span>
<span class="nn">String</span><span class="p">.</span><span class="n">concat</span> <span class="s2">""</span> <span class="n">list_strings</span>
</code></pre></div></div>
<p><code class="highlighter-rouge">string_of_nat</code> converts the number of type <code class="highlighter-rouge">nat</code> to base 10, then maps each element of the resulting list to a string and concatenates those strings.</p>
<p>So <code class="highlighter-rouge">string_of_nat (Succ (Succ (Succ (Succ (Succ (Succ (Succ (Succ (Succ (Succ (Succ (Succ Zero))))))))))))</code> returns <code class="highlighter-rouge">"12"</code> which is easier to read!</p>
<h3 id="from-string-representation-to-type-nat">From string representation to type <code class="highlighter-rouge">nat</code></h3>
<p>We then define some code to go the other way around: from string representation to natural numbers.</p>
<div class="language-ocaml highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">let</span> <span class="n">string_to_list</span> <span class="n">s</span> <span class="o">=</span>
<span class="k">let</span> <span class="k">rec</span> <span class="n">loop</span> <span class="n">acc</span> <span class="n">i</span> <span class="o">=</span>
<span class="k">if</span> <span class="n">i</span> <span class="o">=</span> <span class="o">-</span><span class="mi">1</span> <span class="k">then</span> <span class="n">acc</span>
<span class="k">else</span>
<span class="n">loop</span> <span class="p">((</span><span class="nn">String</span><span class="p">.</span><span class="n">make</span> <span class="mi">1</span> <span class="n">s</span><span class="o">.</span><span class="p">[</span><span class="n">i</span><span class="p">])</span> <span class="o">::</span> <span class="n">acc</span><span class="p">)</span> <span class="p">(</span><span class="n">pred</span> <span class="n">i</span><span class="p">)</span>
<span class="k">in</span> <span class="n">loop</span> <span class="bp">[]</span> <span class="p">(</span><span class="nn">String</span><span class="p">.</span><span class="n">length</span> <span class="n">s</span> <span class="o">-</span> <span class="mi">1</span><span class="p">)</span>
<span class="k">let</span> <span class="n">nat_of_listnat</span> <span class="n">l</span> <span class="o">=</span>
<span class="k">let</span> <span class="n">lr</span> <span class="o">=</span> <span class="nn">List</span><span class="p">.</span><span class="n">rev</span> <span class="n">l</span> <span class="k">in</span>
<span class="k">let</span> <span class="k">rec</span> <span class="n">aux</span> <span class="n">n</span> <span class="n">b</span> <span class="n">lr</span> <span class="o">=</span>
<span class="k">match</span> <span class="n">lr</span> <span class="k">with</span>
<span class="o">|</span> <span class="bp">[]</span> <span class="o">-></span> <span class="n">n</span>
<span class="o">|</span> <span class="n">h</span><span class="o">::</span><span class="n">t</span> <span class="o">-></span> <span class="n">aux</span> <span class="p">(</span><span class="n">n</span> <span class="o">++</span> <span class="p">(</span><span class="n">b</span><span class="o">+*</span><span class="n">h</span><span class="p">))</span> <span class="p">(</span><span class="n">b</span><span class="o">+*</span><span class="n">ten</span><span class="p">)</span> <span class="n">t</span>
<span class="k">in</span>
<span class="n">aux</span> <span class="nc">Zero</span> <span class="p">(</span><span class="nc">Succ</span> <span class="nc">Zero</span><span class="p">)</span> <span class="n">lr</span>
<span class="k">let</span> <span class="n">nat_of_string_digits</span> <span class="o">=</span> <span class="k">function</span>
<span class="o">|</span> <span class="s2">"0"</span> <span class="o">-></span> <span class="nc">Zero</span>
<span class="o">|</span> <span class="s2">"1"</span> <span class="o">-></span> <span class="nc">Succ</span> <span class="nc">Zero</span>
<span class="o">|</span> <span class="s2">"2"</span> <span class="o">-></span> <span class="nc">Succ</span> <span class="p">(</span><span class="nc">Succ</span> <span class="nc">Zero</span><span class="p">)</span>
<span class="o">|</span> <span class="s2">"3"</span> <span class="o">-></span> <span class="nc">Succ</span> <span class="p">(</span><span class="nc">Succ</span> <span class="p">(</span><span class="nc">Succ</span> <span class="nc">Zero</span><span class="p">))</span>
<span class="o">|</span> <span class="s2">"4"</span> <span class="o">-></span> <span class="nc">Succ</span> <span class="p">(</span><span class="nc">Succ</span> <span class="p">(</span><span class="nc">Succ</span> <span class="p">(</span><span class="nc">Succ</span> <span class="nc">Zero</span><span class="p">)))</span>
<span class="o">|</span> <span class="s2">"5"</span> <span class="o">-></span> <span class="nc">Succ</span> <span class="p">(</span><span class="nc">Succ</span> <span class="p">(</span><span class="nc">Succ</span> <span class="p">(</span><span class="nc">Succ</span> <span class="p">(</span><span class="nc">Succ</span> <span class="nc">Zero</span><span class="p">))))</span>
<span class="o">|</span> <span class="s2">"6"</span> <span class="o">-></span> <span class="nc">Succ</span> <span class="p">(</span><span class="nc">Succ</span> <span class="p">(</span><span class="nc">Succ</span> <span class="p">(</span><span class="nc">Succ</span> <span class="p">(</span><span class="nc">Succ</span> <span class="p">(</span><span class="nc">Succ</span> <span class="nc">Zero</span><span class="p">)))))</span>
<span class="o">|</span> <span class="s2">"7"</span> <span class="o">-></span> <span class="nc">Succ</span> <span class="p">(</span><span class="nc">Succ</span> <span class="p">(</span><span class="nc">Succ</span> <span class="p">(</span><span class="nc">Succ</span> <span class="p">(</span><span class="nc">Succ</span> <span class="p">(</span><span class="nc">Succ</span> <span class="p">(</span><span class="nc">Succ</span> <span class="nc">Zero</span><span class="p">))))))</span>
<span class="o">|</span> <span class="s2">"8"</span> <span class="o">-></span> <span class="nc">Succ</span> <span class="p">(</span><span class="nc">Succ</span> <span class="p">(</span><span class="nc">Succ</span> <span class="p">(</span><span class="nc">Succ</span> <span class="p">(</span><span class="nc">Succ</span> <span class="p">(</span><span class="nc">Succ</span> <span class="p">(</span><span class="nc">Succ</span> <span class="p">(</span><span class="nc">Succ</span> <span class="nc">Zero</span><span class="p">)))))))</span>
<span class="o">|</span> <span class="s2">"9"</span> <span class="o">-></span> <span class="nc">Succ</span> <span class="p">(</span><span class="nc">Succ</span> <span class="p">(</span><span class="nc">Succ</span> <span class="p">(</span><span class="nc">Succ</span> <span class="p">(</span><span class="nc">Succ</span> <span class="p">(</span><span class="nc">Succ</span> <span class="p">(</span><span class="nc">Succ</span> <span class="p">(</span><span class="nc">Succ</span> <span class="p">(</span><span class="nc">Succ</span> <span class="nc">Zero</span><span class="p">))))))))</span>
<span class="o">|</span> <span class="n">_</span> <span class="o">-></span> <span class="k">raise</span> <span class="p">(</span><span class="nc">Failure</span> <span class="s2">"string must be less than 10"</span><span class="p">)</span>
<span class="c">(* Converts string to nat *)</span>
<span class="k">let</span> <span class="n">nat_of_string</span> <span class="n">s</span> <span class="o">=</span>
<span class="k">let</span> <span class="n">liststring</span> <span class="o">=</span> <span class="n">string_to_list</span> <span class="n">s</span> <span class="k">in</span>
<span class="k">let</span> <span class="n">listNatbase</span> <span class="o">=</span> <span class="nn">List</span><span class="p">.</span><span class="n">map</span> <span class="n">nat_of_string_digits</span> <span class="n">liststring</span> <span class="k">in</span>
<span class="n">nat_of_listnat</span> <span class="n">listNatbase</span>
<span class="c">(*
final (infix) functions for adding, subtracting, multiplying, and dividing
which take strings as arguments and return a string
*)</span>
<span class="k">let</span> <span class="p">(</span><span class="o">+++</span><span class="p">)</span> <span class="n">n</span> <span class="n">m</span> <span class="o">=</span>
<span class="n">string_of_nat</span> <span class="p">((</span><span class="n">nat_of_string</span> <span class="n">n</span><span class="p">)</span> <span class="o">++</span> <span class="p">(</span><span class="n">nat_of_string</span> <span class="n">m</span><span class="p">))</span>
<span class="k">let</span> <span class="p">(</span><span class="o">---</span><span class="p">)</span> <span class="n">n</span> <span class="n">m</span> <span class="o">=</span>
<span class="n">string_of_nat</span> <span class="p">((</span><span class="n">nat_of_string</span> <span class="n">n</span><span class="p">)</span> <span class="o">--</span> <span class="p">(</span><span class="n">nat_of_string</span> <span class="n">m</span><span class="p">))</span>
<span class="k">let</span> <span class="p">(</span><span class="o">+**</span><span class="p">)</span> <span class="n">n</span> <span class="n">m</span> <span class="o">=</span>
<span class="n">string_of_nat</span> <span class="p">((</span><span class="n">nat_of_string</span> <span class="n">n</span><span class="p">)</span> <span class="o">+*</span> <span class="p">(</span><span class="n">nat_of_string</span> <span class="n">m</span><span class="p">))</span>
<span class="k">let</span> <span class="p">(</span><span class="o">///</span><span class="p">)</span> <span class="n">n</span> <span class="n">m</span> <span class="o">=</span>
<span class="n">string_of_nat</span> <span class="p">((</span><span class="n">nat_of_string</span> <span class="n">n</span><span class="p">)</span> <span class="o">//</span> <span class="p">(</span><span class="n">nat_of_string</span> <span class="n">m</span><span class="p">))</span>
<span class="k">let</span> <span class="p">(</span><span class="o">%%</span><span class="p">)</span> <span class="n">n</span> <span class="n">m</span> <span class="o">=</span>
<span class="n">string_of_nat</span> <span class="p">((</span><span class="n">nat_of_string</span> <span class="n">n</span><span class="p">)</span> <span class="o">%</span> <span class="p">(</span><span class="n">nat_of_string</span> <span class="n">m</span><span class="p">))</span>
</code></pre></div></div>
<p>So putting it all together, we have a working calculator for natural numbers!</p>
<p>Let’s try it out:</p>
<ul>
<li><code class="highlighter-rouge">"3" +++ "17"</code> returns <code class="highlighter-rouge">"20"</code></li>
<li><code class="highlighter-rouge">"182" --- "93"</code> returns <code class="highlighter-rouge">"89"</code></li>
<li><code class="highlighter-rouge">"12" +** "3"</code> returns <code class="highlighter-rouge">"36"</code></li>
<li><code class="highlighter-rouge">"41" /// "3"</code> returns <code class="highlighter-rouge">"13"</code></li>
<li><code class="highlighter-rouge">"41" %% "3"</code> returns <code class="highlighter-rouge">"2"</code></li>
</ul>
<h2 id="conclusion">Conclusion</h2>
<p>We have built up natural numbers from first principles and now have a working calculator. However, these operators start getting very slow for numbers of around 7 digits or more, so sticking with built-in integers sounds preferable.</p>
<p><em>All the code for this post is on <a href="https://github.com/jeremiecoullon/natural_numbers_post">Github</a></em></p>
<p><em>Thanks to <a href="https://www.linkedin.com/in/james-jobanputra-62582669">James Jobanputra</a> for useful feedback on this post</em></p>
Testing MCMC code: the prior reproduction test2020-02-04T08:00:00+00:002020-02-04T08:00:00+00:00/2020/02/04/PriorReproductionTest<p><a href="https://darrenjw.wordpress.com/2010/08/15/metropolis-hastings-mcmc-algorithms/">Markov Chain Monte Carlo</a> (MCMC) is a class of algorithms for sampling from probability distributions. These are very useful algorithms, but it’s easy to go wrong and obtain samples from the wrong probability distribution. What’s more, it won’t be obvious if the sampler fails, so we need ways to check whether it’s working correctly.</p>
<p>This post is mainly aimed at MCMC practitioners and describes a powerful MCMC test called the Prior Reproduction Test (PRT). I’ll go over the context of the test, then explain how it works (and give some code). I’ll then explain how to tune it and discuss some limitations.</p>
<h2 id="why-should-we-test-mcmc-code-">Why should we test MCMC code?</h2>
<p>There are two main ways MCMC can fail: either the chain doesn’t mix or the sampler targets the wrong distribution. We say that a chain mixes if it explores the target distribution in its entirety without getting stuck or avoiding a certain subset of the space. To check that a chain mixes, we use diagnostics such as running the chain for a long time and examining the trace plots, calculating the <script type="math/tex">\hat{R}</script> (or <a href="https://mc-stan.org/docs/2_21/reference-manual/notation-for-samples-chains-and-draws.html">potential scale reduction factor</a>), and using the multistart heuristic. See the <a href="https://www.mcmchandbook.net/">Handbook of MCMC</a> for a good overview of these diagnostics. These help check that the chain converges to a distribution.</p>
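<p>As a rough illustration of one of these diagnostics, here is a minimal sketch of the classic (non-split) potential scale reduction factor for several chains of equal length. The function name <code class="highlighter-rouge">potential_scale_reduction</code> and the toy chains are made up for this example; in practice you would rely on the implementation in Stan or a similar library.</p>

```python
import random
import statistics


def potential_scale_reduction(chains):
    """Classic Gelman-Rubin R-hat for equal-length chains of scalar draws."""
    n = len(chains[0])  # number of draws per chain
    chain_means = [statistics.fmean(chain) for chain in chains]
    # W: average of the within-chain sample variances
    within = statistics.fmean(statistics.variance(chain) for chain in chains)
    # B: between-chain variance (sample variance of the chain means times n)
    between = n * statistics.variance(chain_means)
    # pooled estimate of the target variance
    pooled = (n - 1) / n * within + between / n
    return (pooled / within) ** 0.5


rng = random.Random(0)
# four toy "chains" that all explore the same distribution
mixed = [[rng.gauss(0, 1) for _ in range(1000)] for _ in range(4)]
# four toy "chains" where two are stuck around a different mode
stuck = [[rng.gauss(shift, 1) for _ in range(1000)] for shift in (0, 0, 3, 3)]

rhat_mixed = potential_scale_reduction(mixed)
rhat_stuck = potential_scale_reduction(stuck)
print(rhat_mixed, rhat_stuck)
```

<p>Values close to 1 suggest that the chains agree with each other, while values well above 1 (as for the <code class="highlighter-rouge">stuck</code> chains here) indicate that they have not mixed.</p>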
<p>However, the target distribution of the sampler may not be the correct one. This could be due to a bug in the code or an error in the maths (for example the Hastings correction in the Metropolis-Hastings algorithm could be wrong). To test the software, we can write unit tests, which check that individual functions behave as they should. We can also do integration tests (testing the entire software rather than just a component). One such test is to try to recover simulated values (as recommended by the <a href="https://github.com/stan-dev/stan/wiki/Stan-Best-Practices#recover-simulated-values">Stan documentation</a>): generate data given some “true” parameters (using your data model) and then fit the model using the sampler. The true parameter should then fall within the credible interval (loosely, within 2 standard deviations of the posterior mean). This checks that the sampler can indeed recover the true parameter.</p>
<p>However this test is only a “sanity check” and doesn’t check whether samples are truly from the target distribution. What’s needed here is a goodness of fit (GoF) test. As doing a GoF test for arbitrarily complex posterior distributions is hard, the PRT reduces the problem to testing that some samples are from the prior rather than the posterior. I had trouble finding books or articles written about this (a similar version of this test is described by Cook, Gelman, and Rubin <a href="http://www.stat.columbia.edu/~gelman/research/published/Cook_Software_Validation.pdf">here</a>, but they don’t call it PRT); if you know of any references let me know! I know of this test from my PhD supervisor <a href="https://www.ucl.ac.uk/statistics/people/yvopokern">Yvo Pokern</a> who learnt it from another researcher during his postdoc. From talking to other researchers, it seems that this method has often been transmitted by word of mouth rather than from textbooks.</p>
<h2 id="the-prior-reproduction-test">The Prior Reproduction Test</h2>
<p>The prior reproduction test runs as follows: sample from the prior <script type="math/tex">\theta_0 \sim \pi_0</script>, generate data using this prior sample <script type="math/tex">X \sim p(X|\theta_0)</script>, and run the to-be-tested sampler long enough to get an independent sample from the posterior <script type="math/tex">\theta_p \sim \pi(\theta|X)</script>. If the code is correct, the samples from the posterior should be distributed according to the prior.
One can repeat this procedure to obtain many samples <script type="math/tex">\theta_p</script> and test whether they are distributed according to the prior.</p>
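<p>To see why this works, one can write out the marginal distribution of <script type="math/tex">\theta_p</script>. This is a short sketch which assumes the sampler returns an exact posterior draw, and which writes <script type="math/tex">p(X)</script> for the marginal density of the data (a symbol not used elsewhere in this post):</p>
<script type="math/tex; mode=display">\begin{aligned}
p(\theta_p) &= \int\!\!\int \pi(\theta_p|X)\, p(X|\theta_0)\, \pi_0(\theta_0)\, d\theta_0\, dX \\
&= \int \pi(\theta_p|X)\, p(X)\, dX \\
&= \int p(X|\theta_p)\, \pi_0(\theta_p)\, dX = \pi_0(\theta_p)
\end{aligned}</script>
<p>The second line integrates out <script type="math/tex">\theta_0</script>, the third uses Bayes’ rule <script type="math/tex">\pi(\theta|X)\,p(X) = p(X|\theta)\,\pi_0(\theta)</script>, and the last equality holds because the likelihood integrates to 1 over <script type="math/tex">X</script>.</p>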
<p>Here is the test in Python (code available on <a href="https://github.com/jeremiecoullon/PRT_post">Github</a>). First we define the observation operator <script type="math/tex">\mathcal{G}</script> (the mapping from parameter to data, in this case simply the identity) along with the log-likelihood, log-prior, and log-posterior. So here our data is simply sampled from a Gaussian whose mean is the parameter <script type="math/tex">\theta</script> and whose standard deviation is 3.</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kn">import</span> <span class="nn">numpy</span> <span class="k">as</span> <span class="n">np</span>

<span class="k">def</span> <span class="nf">G</span><span class="p">(</span><span class="n">theta</span><span class="p">):</span>
    <span class="s">"""
    G(theta): observation operator. Here it's just the identity function, but it could
    be a more complicated model.
    """</span>
    <span class="k">return</span> <span class="n">theta</span>

<span class="c1"># data noise:
</span><span class="n">sigma_data</span> <span class="o">=</span> <span class="mi">3</span>

<span class="k">def</span> <span class="nf">build_log_likelihood</span><span class="p">(</span><span class="n">data_array</span><span class="p">):</span>
    <span class="s">"Builds the log_likelihood function given some data"</span>
    <span class="k">def</span> <span class="nf">log_likelihood</span><span class="p">(</span><span class="n">theta</span><span class="p">):</span>
        <span class="s">"Data model: y = G(theta) + eps"</span>
        <span class="k">return</span> <span class="o">-</span> <span class="p">(</span><span class="mf">0.5</span><span class="p">)</span><span class="o">/</span><span class="p">(</span><span class="n">sigma_data</span><span class="o">**</span><span class="mi">2</span><span class="p">)</span> <span class="o">*</span> <span class="n">np</span><span class="o">.</span><span class="nb">sum</span><span class="p">([(</span><span class="n">elem</span> <span class="o">-</span> <span class="n">G</span><span class="p">(</span><span class="n">theta</span><span class="p">))</span><span class="o">**</span><span class="mi">2</span> <span class="k">for</span> <span class="n">elem</span> <span class="ow">in</span> <span class="n">data_array</span><span class="p">])</span>
    <span class="k">return</span> <span class="n">log_likelihood</span>

<span class="k">def</span> <span class="nf">log_prior</span><span class="p">(</span><span class="n">theta</span><span class="p">):</span>
    <span class="s">"uniform prior on [0, 10]"</span>
    <span class="k">if</span> <span class="ow">not</span> <span class="p">(</span><span class="mi">0</span> <span class="o">&lt;</span> <span class="n">theta</span> <span class="o">&lt;</span> <span class="mi">10</span><span class="p">):</span>
        <span class="k">return</span> <span class="o">-</span><span class="mi">9999999</span>
    <span class="k">else</span><span class="p">:</span>
        <span class="k">return</span> <span class="n">np</span><span class="o">.</span><span class="n">log</span><span class="p">(</span><span class="mf">0.1</span><span class="p">)</span>

<span class="k">def</span> <span class="nf">build_log_posterior</span><span class="p">(</span><span class="n">log_likelihood</span><span class="p">):</span>
    <span class="s">"Builds the log_posterior function given a log_likelihood"</span>
    <span class="k">def</span> <span class="nf">log_posterior</span><span class="p">(</span><span class="n">theta</span><span class="p">):</span>
        <span class="k">return</span> <span class="n">log_prior</span><span class="p">(</span><span class="n">theta</span><span class="p">)</span> <span class="o">+</span> <span class="n">log_likelihood</span><span class="p">(</span><span class="n">theta</span><span class="p">)</span>
    <span class="k">return</span> <span class="n">log_posterior</span>
</code></pre></div></div>
<p>We want to test the code for a Metropolis sampler with Gaussian proposal (given in the <a href="https://github.com/jeremiecoullon/PRT_post/tree/master/MCMC"><code class="highlighter-rouge">MCMC</code> module</a>), so we run the PRT for it (the following code is in the <code class="highlighter-rouge">run_PRT()</code> function in <a href="https://github.com/jeremiecoullon/PRT_post/blob/master/PRT.py"><code class="highlighter-rouge">PRT.py</code></a>):</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">results</span> <span class="o">=</span> <span class="p">[]</span>
<span class="n">B</span> <span class="o">=</span> <span class="mi">200</span>
<span class="k">for</span> <span class="n">elem</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="n">B</span><span class="p">):</span>
    <span class="c1"># sample from prior
</span>    <span class="n">sam_prior</span> <span class="o">=</span> <span class="n">np</span><span class="o">.</span><span class="n">random</span><span class="o">.</span><span class="n">uniform</span><span class="p">(</span><span class="mi">0</span><span class="p">,</span><span class="mi">10</span><span class="p">)</span>
    <span class="c1"># generate data points using the sampled prior
</span>    <span class="n">data_array</span> <span class="o">=</span> <span class="n">G</span><span class="p">(</span><span class="n">sam_prior</span><span class="p">)</span> <span class="o">+</span> <span class="n">np</span><span class="o">.</span><span class="n">random</span><span class="o">.</span><span class="n">normal</span><span class="p">(</span><span class="n">loc</span><span class="o">=</span><span class="mi">0</span><span class="p">,</span> <span class="n">scale</span><span class="o">=</span><span class="n">sigma_data</span><span class="p">,</span> <span class="n">size</span><span class="o">=</span><span class="mi">10</span><span class="p">)</span>
    <span class="c1"># build the posterior function
</span>    <span class="n">log_likelihood</span> <span class="o">=</span> <span class="n">build_log_likelihood</span><span class="p">(</span><span class="n">data_array</span><span class="o">=</span><span class="n">data_array</span><span class="p">)</span>
    <span class="n">log_posterior</span> <span class="o">=</span> <span class="n">build_log_posterior</span><span class="p">(</span><span class="n">log_likelihood</span><span class="p">)</span>
    <span class="c1"># define the sampler
</span>    <span class="n">ICs</span> <span class="o">=</span> <span class="p">{</span><span class="s">'theta'</span><span class="p">:</span> <span class="mi">1</span><span class="p">}</span>
    <span class="n">sd_proposal</span> <span class="o">=</span> <span class="mi">20</span>
    <span class="n">mcmc_sampler</span> <span class="o">=</span> <span class="n">MHSampler</span><span class="p">(</span><span class="n">log_post</span><span class="o">=</span><span class="n">log_posterior</span><span class="p">,</span> <span class="n">ICs</span><span class="o">=</span><span class="n">ICs</span><span class="p">,</span> <span class="n">verbose</span><span class="o">=</span><span class="mi">0</span><span class="p">)</span>
    <span class="c1"># add a Gaussian proposal
</span>    <span class="n">mcmc_sampler</span><span class="o">.</span><span class="n">move</span> <span class="o">=</span> <span class="n">GaussianMove</span><span class="p">(</span><span class="n">ICs</span><span class="p">,</span> <span class="n">cov</span><span class="o">=</span><span class="n">sd_proposal</span><span class="p">)</span>
    <span class="c1"># Get a posterior sample.
</span>    <span class="c1"># Let the sampler run for 200 iterations to make sure it's independent from the initial condition
</span>    <span class="n">mcmc_sampler</span><span class="o">.</span><span class="n">run</span><span class="p">(</span><span class="n">n_iter</span><span class="o">=</span><span class="mi">200</span><span class="p">,</span> <span class="n">print_rate</span><span class="o">=</span><span class="mi">300</span><span class="p">)</span>
    <span class="n">last_sample</span> <span class="o">=</span> <span class="n">mcmc_sampler</span><span class="o">.</span><span class="n">all_samples</span><span class="o">.</span><span class="n">iloc</span><span class="p">[</span><span class="o">-</span><span class="mi">1</span><span class="p">]</span><span class="o">.</span><span class="n">theta</span>
    <span class="c1"># store the results. Keep the posterior sample as well as the prior that generated the data
</span>    <span class="n">results</span><span class="o">.</span><span class="n">append</span><span class="p">({</span><span class="s">'posterior'</span><span class="p">:</span> <span class="n">last_sample</span><span class="p">,</span> <span class="s">'prior'</span><span class="p">:</span> <span class="n">sam_prior</span><span class="p">})</span>
</code></pre></div></div>
<p>We then check that the posterior samples are uniformly distributed (i.e. the same as the prior) (see figure 1). Here we do this by eye, but we could have done this more formally (for example using the <a href="https://en.wikipedia.org/wiki/Kolmogorov%E2%80%93Smirnov_test">Kolmogorov-Smirnov test</a>).</p>
<figure class="post_figure">
<img src="/assets/PRT_post/empirical_CDF_data10.png" />
<figcaption>Figure 1: Empirical CDF of the output of PRT: these seem to be uniformly distributed</figcaption>
</figure>
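<p>For a more formal check than looking at the empirical CDF, here is a minimal sketch of a one-sample Kolmogorov-Smirnov test against the Uniform(0, 10) prior, written with the standard library only. The helper names <code class="highlighter-rouge">ks_statistic_uniform</code> and <code class="highlighter-rouge">looks_uniform</code> are made up for this example, and the threshold is the usual large-sample 5% critical value rather than an exact p-value.</p>

```python
import math
import random


def ks_statistic_uniform(samples, low=0.0, high=10.0):
    """Largest gap between the empirical CDF and the Uniform(low, high) CDF."""
    xs = sorted(samples)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs):
        # CDF of the uniform prior, clipped to [0, 1]
        cdf = min(max((x - low) / (high - low), 0.0), 1.0)
        # the empirical CDF jumps from i/n to (i+1)/n at x
        d = max(d, cdf - i / n, (i + 1) / n - cdf)
    return d


def looks_uniform(samples, low=0.0, high=10.0):
    """Asymptotic 5% level test: reject uniformity when D > 1.358 / sqrt(n)."""
    threshold = 1.358 / math.sqrt(len(samples))
    return ks_statistic_uniform(samples, low, high) <= threshold


rng = random.Random(1)
# stand-in for the 'posterior' column of the PRT results
prt_output = [rng.uniform(0, 10) for _ in range(200)]
print(ks_statistic_uniform(prt_output), looks_uniform(prt_output))
```

<p>In the PRT you would pass the 200 stored posterior samples instead of <code class="highlighter-rouge">prt_output</code>. Keep in mind that a test at the 5% level will reject a correct sampler about one time in twenty.</p>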
<h2 id="tuning-the-prt">Tuning the PRT</h2>
<p>Notice how we let the sampler run for 200 iterations to make sure that the posterior sample we get is independent of the initial condition (<code class="highlighter-rouge">mcmc_sampler.run(n_iter=200, print_rate=300)</code>). The number of iterations used needs to be tuned to the sampler; if it mixes slowly then you’ll need more iterations. This means that a slowly mixing sampler will cause the PRT to become more computationally expensive. We also needed to tune the proposal variance in the Gaussian proposal (called <code class="highlighter-rouge">sd_proposal</code>); ideally this will be a good tuning for any dataset generated in the PRT, but this may not always be the case. Sometimes the sampler needs hand tuning for each generated dataset; in this case it may also be too expensive to run the entire test. We’ll see later what other tests we can do in this case.</p>
<p>Finally, how do we choose the amount of data to generate (here we chose <code class="highlighter-rouge">10</code> data points)? Consider two extremes: if we choose too much data then the posterior will have a very low variance and will be centred around the true parameter. So almost any posterior sample we obtain will be close to the true parameter (which we sampled from the prior), and so the PRT will (trivially) produce samples from the prior. This doesn’t test the statistical properties of the sampler, but rather tests that the posterior is centred around the true parameter. In the other extreme case, if we have too little data the likelihood will have a weak effect on the posterior, which will then essentially be the prior. The MCMC sampler will then sample from a distribution that is very close to the prior, and again the PRT becomes weaker. We therefore need to choose somewhere in the middle.</p>
<p>To tune the amount of data to generate we can plot the posterior vs the prior samples from the PRT, as we can see in figure 2 below. Ideally there is a nice amount of variation around the line <code class="highlighter-rouge">y=x</code> as in the middle plot (for <code class="highlighter-rouge">N=10</code> data points). In the other two cases the PRT will trivially recover prior samples and not test the software properly.</p>
<figure class="post_figure">
<img src="/assets/PRT_post/3_data_comparison.png" />
<figcaption>Figure 2: We need to tune the amount of data to generate in PRT</figcaption>
</figure>
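<p>A quick back-of-the-envelope calculation can also help pick a starting value for the amount of data. The sketch below compares the approximate posterior standard deviation for this Gaussian model (<code class="highlighter-rouge">sigma_data / sqrt(N)</code>, ignoring the truncation to [0, 10]) with the standard deviation of the uniform prior. The values of <code class="highlighter-rouge">N</code> other than 10 are hypothetical and only there to illustrate the two extremes.</p>

```python
import math

sigma_data = 3
# standard deviation of a Uniform(0, 10) prior
prior_sd = 10 / math.sqrt(12)


def approx_posterior_sd(n_data):
    """Posterior sd of a Gaussian mean under a flat prior (truncation ignored)."""
    return sigma_data / math.sqrt(n_data)


# ratio of posterior spread to prior spread for a few dataset sizes
ratios = {n: approx_posterior_sd(n) / prior_sd for n in (1, 10, 1000)}
for n, ratio in ratios.items():
    print(f"N={n}: posterior sd / prior sd = {ratio:.2f}")
```

<p>A ratio near (or above) 1 means the data barely constrain the parameter, a ratio near 0 means the posterior collapses onto the true parameter, and a ratio of roughly a third (as for <code class="highlighter-rouge">N=10</code>) sits in between.</p>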
<h2 id="limitations-and-alternatives">Limitations and alternatives</h2>
<p>In some cases, however, it’s not possible to run the PRT. The likelihood may be too computationally expensive; for example, it might require numerically solving a differential equation. It’s also possible that the proposal distribution needs to be tuned for each dataset.
In this case you have to tune the proposal manually at each iteration of the PRT.</p>
<p>A way to deal with these problems is to only test conditionals of the posterior (in the case of higher dimensional posteriors).
For example if the posterior is <script type="math/tex">\pi(\theta_1, \theta_2)</script>, then run the test on <script type="math/tex">\pi(\theta_1 | \theta_2)</script>. In some cases this can solve the problem of needing to retune the proposal distribution for every dataset. This also helps with the problem of expensive likelihoods, as the dimension of the conditional posterior is lower than the original one. Fewer samples are then needed to run the test.</p>
<p>Another very simple alternative is to use the sampler to sample from the prior (so simply commenting out the likelihood function in the posterior). This completely bypasses the problem of expensive likelihoods and the need to retune the proposal at every step. This test checks that the MCMC proposal is correct (the Hastings correction for example), so is good for testing complicated proposals. However if the proposal needed to sample from the prior is qualitatively different from the proposal needed to sample from the posterior, then it’s not a useful test.</p>
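<p>Here is a minimal, self-contained sketch of this prior-only test. It uses a hand-written random-walk Metropolis sampler (the <code class="highlighter-rouge">metropolis</code> function below is a stand-in written for this example, not the <code class="highlighter-rouge">MHSampler</code> class from the repository), targets the uniform prior on [0, 10] directly, and checks a couple of simple properties of the output. The proposal scale, chain length, and thinning interval are arbitrary choices.</p>

```python
import math
import random
import statistics


def log_prior(theta):
    """Uniform prior on (0, 10), up to an additive constant."""
    return 0.0 if 0 < theta < 10 else -math.inf


def metropolis(log_target, theta0, n_iter, proposal_sd, rng):
    """Random-walk Metropolis with a symmetric Gaussian proposal."""
    theta, log_p = theta0, log_target(theta0)
    samples = []
    for _ in range(n_iter):
        candidate = theta + rng.gauss(0, proposal_sd)
        log_p_candidate = log_target(candidate)
        # accept with probability min(1, target(candidate) / target(theta));
        # 1 - random() lies in (0, 1], so the log is always defined
        if math.log(1.0 - rng.random()) < log_p_candidate - log_p:
            theta, log_p = candidate, log_p_candidate
        samples.append(theta)
    return samples


rng = random.Random(42)
# the log-posterior with the likelihood "commented out" is just the log-prior
chain = metropolis(log_prior, theta0=1.0, n_iter=20000, proposal_sd=5.0, rng=rng)
thinned = chain[::50]  # keep every 50th draw to reduce autocorrelation

chain_mean = statistics.fmean(chain)
fraction_below_5 = sum(s < 5 for s in chain) / len(chain)
print(chain_mean, fraction_below_5, len(thinned))
```

<p>If the sampler is correct, the mean should be close to 5 and about half of the samples should fall below 5. The thinned samples can then be compared to the prior more formally, for example with a Kolmogorov-Smirnov test.</p>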
<p>As mentioned in the introduction, the PRT reduces to testing goodness of fit of prior samples, the idea being that this is easier to test as prior distributions are often chosen for their simplicity. One can of course test goodness of fit on the MCMC samples directly (without the PRT) using a method such as the <a href="http://proceedings.mlr.press/v48/chwialkowski16.html">Kernel Goodness-of-fit test</a>. This avoids the problems discussed above, but it requires gradients of the log target density, whereas the PRT makes no assumptions about the target distribution.</p>
<h2 id="conclusions">Conclusions</h2>
<p>The Prior Reproduction Test is a powerful way to test MCMC code but can be expensive computationally. This test - along with its simplified versions described above - can be included in an arsenal of diagnostics to check that MCMC samples are from the correct distribution.</p>
<p><em>Code to reproduce the figures is on <a href="https://github.com/jeremiecoullon/PRT_post">Github</a></em></p>
<p><em>Thanks to <a href="http://herrstrathmann.de/">Heiko Strathmann</a> and <a href="https://uk.linkedin.com/in/lea-goetz-neuroscience">Lea Goetz</a> for useful feedback on this post</em></p>
The DjangoVerse2019-11-27T11:01:52+00:002019-11-27T11:01:52+00:00/2019/11/27/DjangoVerse<p>The <a href="https://www.londondjangocollective.com/djangoverse/">DjangoVerse</a> is a 3D graph of gypsy jazz players around the world. I designed this with <a href="https://www.mattholborn.com">Matt Holborn</a> (he got the idea from <a href="https://www.coreymwamba.co.uk/resources/rhizome/">the Rhizome</a>) and built it using React and Django.</p>
<h2 id="how-does-it-work-">How does it work?</h2>
<p>As anyone can modify it, people can <a href="https://www.londondjangocollective.com/djangoverse/forms/player/list">add themselves or players</a> they know to it. If you click on a player you get information about them: what instrument they play, a picture of them, a short bio, and a link to a YouTube video of them. As the names are coloured by country, you can immediately see how many players there are in the different countries around the world. You can try out the DjangoVerse in the figure below:</p>
<figure style="text-align:center">
<iframe src="https://djangoversereact.s3.eu-west-2.amazonaws.com/index.html" style="width:96%; margin-left:2%; height:400px;"></iframe>
<figcaption><a href="https://www.londondjangocollective.com/djangoverse/">The DjangoVerse</a></figcaption>
</figure>
<p>The players have a link between them if they have gigged together, and if you click on a player you get those links highlighted in red. This allows you to see at a glance who they’ve played with and whether they’ve played with people from different countries. You can also filter the graph to only display players from chosen countries, based on the instruments they play, or whether or not they’re active. We started out by adding around 60 players ourselves, and then shared it on Facebook and Instagram; the gypsy jazz community added the rest (there are 220 players across 21 countries at the time of writing).</p>
<h2 id="tech-stack">Tech stack</h2>
<p>I built the graph with React and <a href="https://github.com/vasturiano/react-force-graph">D3 force directed graph</a> and hosted it on S3 (<a href="https://github.com/jeremiecoullon/DjangoVerse-react">see code</a>). The API is built using Django and Postgres and is hosted on Heroku (with S3 for static files). As the DjangoVerse is part of the <a href="https://www.londondjangocollective.com/">London Django Collective</a>, I used the same <a href="https://github.com/jeremiecoullon/ldc">Django application</a> to serve the pages for the Collective as well as the API. As the React app with the graph is hosted on S3, the <a href="https://www.londondjangocollective.com/djangoverse/">page</a> in the Collective website simply has an iframe that points to it.</p>
<h1 id="the-design-process">The design process</h1>
<h2 id="a-first-attempt">A first attempt</h2>
<p>The main motivation was that I’ve wanted for a long time to create a 3D graph mapping links between related things (and had ideas about doing this for academic disciplines, jazz standards, and more). So this project was a way to scratch that itch. The objective more specifically was to be able to visualise the gypsy jazz scene in one place, discover new players and bands, and let people be able to promote their music/bands.</p>
<p>As a result we started off with many different types of nodes: players, bands, festivals, albums, and venues. So each of these would be added to the graph along with links between them. A link between a player and a band would mean that the player is in the band, a link between a band and a festival would mean that it’s played at the festival, and so on. Each node would be a sphere of different size (the size would depend on the type) and the name would appear on hover; this was inspired by <a href="https://steemverse.com/">Steemverse</a> (a visualisation of a social network).</p>
<p>Furthermore, the links between two nodes would also have information about it, such as the year a band has played in a festival, or the years a player was active in a band. You would then be able to filter the graph to only show what happened in a given year, which would give a “snapshot” of the gypsy jazz scene at that moment in time.</p>
<h2 id="too-much-stuff">Too much stuff</h2>
<p>However, it quickly became clear that it was too much information: having all these types of nodes and information about the links would be too overwhelming to have in the graph. So we removed the venue and album types, along with the information about each link. We kept only the active/inactive tags, which let us differentiate between the gypsy jazz scene in the past and in the present.</p>
<p>We then tested a prototype (with players, bands, and venues all represented as spheres of different sizes) with some friends (see the classic <a href="https://www.amazon.co.uk/Dont-Make-Me-Think-Usability/dp/0321344758">Don’t Make Me Think</a> for an overview of user testing), and it turned out that it wasn’t very clear what the DjangoVerse was. For example one reaction was <em>“I’m guessing it’s a simulation of a molecule or something”</em>, which makes sense given that it essentially looked like <a href="https://vasturiano.github.io/3d-force-graph/example/async-load/">this</a>. This could maybe be fixed by adding names next to the nodes, but if you do this then D3 starts lagging quite quickly as you add many players.</p>
<p>Another problem was that festivals naturally ended up being at the centre of the graph, as they were the nodes with the most connections. The players and bands themselves then ended up seeming less important, even though we think a style of music is mainly about the players themselves rather than the festivals. As a visualisation is supposed to bring out the aspects of the data that the designer thinks are most important, we needed to have the players be more prominent.</p>
<h2 id="simplifying-the-design">Simplifying the design</h2>
<p>A fix to both of these problems was to simplify the graph again: we removed festivals and albums and kept just the players. We also just showed the names of the players rather than the spheres. As the names are immediately visible, a user can then recognise some of the players and guess immediately what this is about (this was confirmed with testing). However, a downside of this is that having all the names rather than just spheres causes the graph to lag when there are more than 100 or so players. <a href="https://steemverse.com/">Steemverse</a> gets around this problem by only having names for the “category” types of nodes (which are rare); all other spheres only have names on hover.</p>
<p>As for users adding players, there is no authentication, so anyone can add or modify a player without needing to log in. The benefit is that there is less of a barrier for people to add to the graph, but with the risk of people posting spam (or deleting all the players!). To mitigate this, I set up daily backups (easy to do with Heroku), which would allow me to restore the graph to its state before a problem occurred. If the problem persisted, I would simply have added authentication (for example OAuth with Google, Facebook, etc.).</p>
<h1 id="outcomes-and-comparison-to-other-graphs">Outcomes and comparison to other graphs</h1>
<p>Players on the gypsy jazz scene around the world added lots of players to the graph: there are 220 players spanning 21 countries and with 9 instruments represented. A feature that was used a lot was the possibility of adding a YouTube video: this allows each player to showcase their music. The short bio for each player was also interesting; when we added the bio we didn’t think much of it nor consider too much how it would be used. However, some of the users added information such as which players were related to each other (father, cousin, etc.), which was really interesting!</p>
<h2 id="lessons">Lessons</h2>
<p>In terms of design, an important lesson from graph visualisations such as this one is how much information to include. Although a main aspect of these visualisations is just “eye-candy” (i.e. it looks fun), it would be good if they were also informative or insightful. At one end of the spectrum, if there is too little information then there is not much to learn from the visualisation. At the other extreme, if there is too much information (and the design isn’t done carefully) then it’s easy to get overwhelmed. For me, some examples of this are <a href="https://www.wikiverse.io/">Wikiverse</a> (it has a huge amount of information, being a subset of Wikipedia, and I find the interface very confusing), <a href="https://steemverse.com/">Steemverse</a> (it looks great, but there’s not much information in it) or the <a href="https://www.coreymwamba.co.uk/resources/rhizome/">Rhizome</a> (as it’s in only 2 dimensions, it’s hard to see what’s going on in the graph).</p>
<p>In contrast, an example of a simple graph that I think works well is this <a href="https://www.quantamagazine.org/frontier-of-physics-interactive-map-20150803/">map of “theories of everything”</a>. I don’t understand what these theories are (these are disciplines in theoretical physics), but the design is done very well and classifies them in a clear way.</p>
<p>Other examples of very well designed graphs are the ones built by <a href="http://concept.space/">concept.space</a>, such as this <a href="http://map.philosophies.space/">map of philosophy</a>. It has a huge amount of information, but most of it is hidden if you are zoomed out. As you zoom into a specific area of philosophy you get more and more detail about that area of philosophy until you have individual papers. When you click on a paper you then get the abstract and a link to it.</p>
<p>Notice also the minimap in the lower right hand corner that reminds you of where you currently are in the map. Finally, it seems that they have automated the process of adding and clustering the papers (from looking at the software <a href="http://philosophies.space/credits/">credited</a> on their website). They seem to have scraped <a href="https://philpapers.org/">PhilPapers</a>, used <a href="https://code.google.com/archive/p/word2vec/">Word2Vec</a> to get word embeddings for each paper, <a href="https://github.com/lmcinnes/umap">reduced the dimension</a> of the space, and finally <a href="https://hdbscan.readthedocs.io/en/latest/">clustered</a> the result to find the location of each paper in the 2-dimensional map. As a result they could then use this workflow to create a similar map for <a href="http://map.climate.space/">climate science</a> and <a href="http://concept.space/projects/biomap/">biomedicine</a>.</p>
<p>In conclusion, the idea of a visual map showing the links between different things in a discipline (players in gypsy jazz, papers in philosophy, etc.) is a very appealing one. However, getting it right is surprisingly difficult; for me the best example is the map of philosophy described above.</p>
<p><em>Thanks to <a href="https://www.lukas.derungs.de/">Lukas DeRungs</a> for reading a draft of this post</em></p>