OCaml Planet

## August 30, 2016

### Andrej Bauer

#### Formal proofs are not just deduction steps

I have participated in a couple of lengthy discussions about formal proofs. I realized that an old misconception is creeping in. Let me expose it.

In traditional mathematical logic (by which I mean first-order logic, as established by Hilbert, Ackermann, Skolem, Gödel and others) the concepts of logical formula and formal proof are the central notions. This is so because the main goal of traditional logic is the meta-mathematical study of provability, i.e., what can be proved in principle. Other concerns, such as what can be computed in principle, are relegated to other disciplines, such as computability theory.

It is too easy to forget that mathematical logic is only an idealization of what mathematicians actually do. Indeed, a bizarre reversal has occurred in which mathematicians have adopted the practice of dressing up their activity as a series of theorems with proofs, even when a different kind of presentation is called for. Definitions are allowed but seen as just convenient abbreviations, and logicians enforce this view with the Conservativity theorem. Some even feel embarrassed about placing too much motivation and explanatory text in between the theorems, and others are annoyed by a speaker who spends a moment on motivation instead of plunging right into a series of unexplained technical moves.

To show what I am talking about let us consider a typical situation when the theorem-proof form is inappropriate. Often we see a statement and a proof of the form

Theorem: There exists a gadget $x$ such that $\phi(x)$.

Proof. We construct $x$ as follows. (An explicit construction $C$ is given). QED

but then the rest of the the text clearly refers to the particular construction $C$ from the proof. At a formal level this is wrong because the theorem states $\exists x . \phi(x)$ and it therefore abstracts away the construction in the proof (this is not about excluded middle at all, in case you are wondering). Whatever is done inside the proof is inaccessible because proofs are irrelevant.

Lately Vladimir Voevodsky has been advocating a different style of writing in which we state Problems which are then solved by giving constructions (see for instance page 3 here). This is a strict generalization of traditional logic because a theorem with a proof can be seen as the problem/construction “construct the proof of the given statement”. Vladimir Voevodsky may have been motivated by Martin-Löf’s type theory, where this is the common view, but let us also note that Euclid did it as well. Remembering Euclid and paying attention to Martin-Löf’s teaching is a very positive development, but is not the one I would like to talk about.

Another crucial component of mathematical activity which is obscured by traditional logic is computation. Traditional logic, and to some extent also type theory, hides computation behind equality. Would you like to compute $2 + 2$? Just make a series of deduction steps whose conclusion is $2 + 2 = 4$. But how do we know what we are supposed to prove if we have not calculated the result yet? Computation is not about proving equalities, it is a process which leads from inputs to outputs. Moreover, I claim that computation is a fundamental process which requires no expression in terms of another activity, nor does it need an independent justification.

Another word for computation is manipulation of objects. Even in traditional logic we must admit that before logic itself comes manipulation of syntax. One has to be able not only to build and recognize syntactic objects, but also manipulate them in non-trivial ways by performing substitution. Once substitution is on the table we’re only a step away from $\lambda$-calculus.

The over-emphasis on formal derivations is making difficult certain discussions and design decisions about computer-verified mathematics. Some insist that formal derivations must be accessible, either explicitly as objects stored in memory or implicitly through applications of structural recursion, for independent proof-checking or proof transformations. I think this is fine as far as derivations and constructions go, but let us not forget computation. It is a design error to encode computations as chains of equations glued together by applications of transitivity. An independent verification of a computation involves independently re-running the computation – not verifying someone else’s trace of it encoded as a derivation. A transformation of a computation is not a transformation of a chain of equations – it is something else, but what? I am not sure.

Once computation is recognized as essential, irreducible and fundamental, we can start asking the right questions:

1. What is computation in general?
2. What form of computation should be allowed in proof checkers?
3. How do we specify computation in proof objects so that it can be independently verified by proof checkers?

We have a pretty good idea about the answer to the first question.

A good answer to the second question seems difficult to accept. Several modern proof assistants encode computation in terms of normalization of terms, which shows that they have not quite freed themselves from the traditional view that computation is about proving equalities. If we really do believe that computation is basic then proof checkers should allow general and explicit computation inside the trusted core. After all, if you do not trust your own computer to compute correctly, why would you trust it to verify proofs?

The third question is about design. Coq has Mtac, HOL and Andromeda essentially are metalevel programming languages, and Agda has rewrite rules. I suppose I do not have to explain my view here: there is little reason to make the user jump through hoops by having them encode computation as normalization, or application of tactics, or whatnot. Just give them a programming language!

Lest someone misunderstands me, let me conclude by a couple of disclaimers.

First, I am not saying that anything was wrong with the 20th century logic. It was amazing, it was a revolution, a pinnacle of human achievement. It’s just that the current century (and possible all the subsequent ones) belongs to computers. The 20th century logicians thought about what can be formally proved in principle, while we need to think about how to formally prove in practice.

Second, I am not advocating untrusted or insecure proof checkers. I am advocating flexible trusted proof checkers that allow users a direct expression of their mathematical activity, which is not possible as long as we stick to the traditional notion of formal derivation.

Supplemental: I think I should explain a bit more precisely how I imagine basic computations would be performed in a trusted kernel. A traditional kernel checks that the given certificate is valid evidence of derivability of some judgment. (Note: I did not say that kernels check formal derivations because they do not do that in practice. Not a single one I know.) For instance, in Martin-Löf type theory a typing judgment $\Gamma \vdash e : A$ contains enough information to decide whether it is derivable, so it can be used as a certificate for its own derivability. Now, sometimes it makes sense to compute parts of the judgment on the fly (typically $e$) instead of giving it explicitly, for various reasons (efficiency, modularity, automation). In such cases it should be possible to provide a program $p$ which computes those parts, and the kernel should know how to run $p$. (It is irrelevant whether $p$ is total,  but that is a separate discussion.) There is of course the  question of how we can trust computations. There are in fact several such questions:

1. Can we trust the kernel to faithfully execute programs? For instance, if the kernel uses the CPU to compute sums of 64-bit integers, can that be trusted? And what if the language interpreter has a bug? This is the same sort of trust as general trust in the kernel, so it is not really new: in order to know that the kernel works correctly we need to certify all components that it depends on (the CPU, the operating system, the compiler used to compile the kernel, the source code of the kernel, etc.)
2. Can the programs executed by the kernel perform illegal instructions that corrupt the it or trick it into doing something bad? This is a standard question about programming languages that is addressed by safety theorems.
3. Can we trust that the given program $p$ actually computes the intended object? In some situations this question is irrelevant because the evidence will be checked later on anyway. An example of this would be a program which computes (parts of) a witness $(a,b)$ for a statement $\sum_{x : A} B(x)$. We do not care where $(a,b)$ came from because the kernel is going to use them as certificates of $\sum_{x : A} B(x)$ and discover potential problems anyhow. In other situations we are very much interested in knowing that the program does the right thing, but this is a standard situation as well: if you need to know that your program works correctly, state and prove the correctness criterion.

So I think there’s nothing new or fishy about trust and correctness in what I am proposing. The important thing is that we let the kernel run arbitrary programs that the user can express directly the way programs are normally written in an general-purpose programming language. Insisting that computation take on a particular form (a chain of equations tied together by transitivity, prolog-like proof search, a confluent and terminating normalization procedure) is ultimately limiting.

## August 29, 2016

### Github OCaml jobs

#### Full Time: Software Developer (Functional Programming) at Jane Street in New York, NY; London, UK; Hong Kong

Software Developer

Jane Street is a proprietary quantitative trading firm, focusing primarily on trading equities and equity derivatives. We use innovative technology, a scientific approach, and a deep understanding of markets to stay successful in our highly competitive field. We operate around the clock and around the globe, employing over 400 people in offices in New York, London and Hong Kong.

The markets in which we trade change rapidly, but our intellectual approach changes faster still. Every day, we have new problems to solve and new theories to test. Our entrepreneurial culture is driven by our talented team of traders and programmers. At Jane Street, we don't come to work wanting to leave. We come to work excited to test new theories, have thought-provoking discussions, and maybe sneak in a game of ping-pong or two. Keeping our culture casual and our employees happy is of paramount importance to us.

We are looking to hire great software developers with an interest in functional programming. OCaml, a statically typed functional programming language with similarities to Haskell, Scheme, Erlang, F# and SML, is our language of choice. We've got the largest team of OCaml developers in any industrial setting, and probably the world's largest OCaml codebase. We use OCaml for running our entire business, supporting everything from research to systems administration to trading systems. If you're interested in seeing how functional programming plays out in the real world, there's no better place.

The atmosphere is informal and intellectual. There is a focus on education, and people learn about software and trading, both through formal classes and on the job. The work is challenging, and you get to see the practical impact of your efforts in quick and dramatic terms. Jane Street is also small enough that people have the freedom to get involved in many different areas of the business. Compensation is highly competitive, and there's a lot of room for growth.

You can learn more about Jane Street and our technology from our main site, janestreet.com. You can also look at a a talk given at CMU about why Jane Street uses functional programming (http://ocaml.janestreet.com/?q=node/61), and our programming blog (http://ocaml.janestreet.com).

We also have extensive benefits, including:

• 90% book reimbursement for work-related books
• 90% tuition reimbursement for continuing education
• Excellent, zero-premium medical and dental insurance
• Free lunch delivered daily from a selection of restaurants
• Catered breakfasts and fresh brewed Peet's coffee
• An on-site, private gym in New York with towel service
• Kitchens fully stocked with a variety of snack choices
• Full company 401(k) match up to 6% of salary, vests immediately
• Three weeks of paid vacation for new hires in the US
• 16 weeks fully paid maternity/paternity leave for primary caregivers, plus additional unpaid leave

## August 27, 2016

### Shayne Fletcher

#### Perfectly balanced binary search trees

The type of "association tables" (binary search trees).

type (α, β) t =| Empty| Node of (α , β) t * α * β * (α, β) t * int
There are two cases : a tree that is empty or, a node consisting of a left sub-tree, a key, the value associated with that key, a right sub-tree and, an integer representing the "height" of the tree (the number of nodes to traverse before reaching the most distant leaf).

The binary search tree invariant will be made to apply in that for any non empty tree $n$, every node in the left sub-tree is ordered less than $n$ and every node in the right sub-tree of $n$ is ordered greater than $n$ (in this program, ordering of keys is performed using the Pervasives.compare function).

This function, height, given a tree, extracts its height.

let height : (α, β) t -> int = function  | Empty -> 0  | Node (_, _, _, _, h) -> h

The value empty, is a constant, the empty tree.

let empty : (α, β) t = Empty

create l x d r creates a new non-empty tree with left sub-tree l, right sub-tree r and the binding of key x to the data d. The height of the tree created is computed from the heights of the two sub-trees.

let create (l : (α, β) t) (x : α) (d : β) (r : (α, β) t) : (α, β) t =  let hl = height l and hr = height r in  Node (l, x, d, r, (max hl hr) + 1)

This next function, balance is where all the action is at. Like the preceding function create, it is a factory function for interior nodes and so takes the same argument list as create. It has an additional duty though in that the tree that it produces takes balancing into consideration.

let balance (l : (α, β) t) (x : α) (d : β) (r : (α, β) t) : (α, β) t =  let hl = height l and hr = height r in  if hl > hr + 1 then    match l with
In this branch of the program, it has determined that production of a node with the given left and right sub-trees (denoted $l$ and $r$ respectively) would be unbalanced because $h(l) > hr(1) + 1$ (where $h$ denotes the height function).

There are two possible reasons to account for this. They are considered in turn.

    (*Case 1*)    | Node (ll, lv, ld, lr, _) when height ll >= height lr ->      create ll lv ld (create lr x d r)
So here, we find that $h(l) > h(r) + 1$, because of the height of the left sub-tree of $l$.
    (*Case 2*)    | Node (ll, lv, ld, Node (lrl, lrv, lrd, lrr, _), _) ->      create (create ll lv ld lrl) lrv lrd (create lrr x d r)
In this case, $h(l) > h(r) + 1$ because of the height of the right sub-tree of $l$.
    | _ -> assert false
We assert false for all other patterns as we aim to admit by construction no further possibilities.

We now consider the case $h(r) > h(l) + 1$, that is the right sub-tree being "too long".

  else if hr > hl + 1 then    match r with

There are two possible reasons.

    (*Case 3*)    | Node (rl, rv, rd, rr, _) when height rr >= height rl ->      create (create l x d rl) rv rd rr
Here $h(r) > h(l) + 1$ because of the right sub-tree of $r$.
    (*Case 4*)    | Node (Node (rll, rlv, rld, rlr, _), rv, rd, rr, _) ->      create (create l x d rll) rlv rld (create rlr rv rd rr)
Lastly, $h(r) > h(l) + 1$ because of the left sub-tree of $r$.
    | _ -> assert false
Again, all other patterns are (if we write this program correctly according to our intentions,) impossible and so, assert false as there are no further possibilities.

In the last case, neither $h(l) > h(r) + 1$ or $h(r) > h(l) + 1$ so no rotation is required.

  else    create l x d r

add x data t computes a new tree from t containing a binding of x to data. It resembles standard insertion into a binary search tree except that it propagates rotations through the tree to maintain balance after the insertion.

let rec add (x : α) (data : β) : (α, β) t -> (α, β) t = function    | Empty -> Node (Empty, x, data, Empty, 1)    | Node (l, v, d, r, h) ->      let c = compare x v in      if c = 0 then        Node (l, x, data, r, h)      else if c < 0 then        balance (add x data l) v d r      else         balance l v d (add x data r)

To implement removal of nodes from a tree, we'll find ourselves needing a function to "merge" two binary searchtrees $l$ and $r$ say where we can assume that all the elements of $l$ are ordered before the elements of $r$.

let rec merge (l : (α, β) t) (r : (α, β) t) : (α, β) t =   match (l, r) with  | Empty, t -> t  | t, Empty -> t  | Node (l1, v1, d1, r1, h1), Node (l2, v2, d2, r2, h2) ->    balance l1 v1 d1 (balance (merge r1 l2) v2 d2 r2)
Again, rotations are propagated through the tree to ensure the result of the merge results in a perfectly balanced tree.

With merge available, implementing remove becomes tractable.

let remove (id : α) (t : (α, β) t) : (α, β) t =   let rec remove_rec = function    | Empty -> Empty    | Node (l, k, d, r, _) ->      let c = compare id k in      if c = 0 then merge l r else        if c < 0 then balance (remove_rec l) k d r        else balance l k d (remove_rec r) in  remove_rec t

The remaining algorithms below are "stock" algorithms for binary search trees with no particular consideration of balancing necessary and so we won't dwell on them here.

let rec find (x : α) : (α, β) t -> β = function  | Empty ->  raise Not_found  | Node (l, v, d, r, _) ->    let c = compare x v in    if c = 0 then d    else find x (if c < 0 then l else r)let rec mem (x : α) : (α, β) t -> bool = function  | Empty -> false  | Node (l, v, d, r, _) ->    let c = compare x v in    c = 0 || mem x (if c < 0 then l else r)    let rec iter (f : α -> β -> unit) : (α, β) t -> unit = function  | Empty -> ()  | Node (l, v, d, r, _) ->    iter f l; f v d; iter f rlet rec map (f : α -> β -> γ) : (α, β) t -> (α, γ) t = function  | Empty -> Empty  | Node (l, k, d, r, h) ->     Node (map f l, k, f k d, map f r, h)let rec fold (f : α -> β -> γ -> γ) (m : (α, β) t) (acc : γ) : γ =  match m with  | Empty -> acc  | Node (l, k, d, r, _) -> fold f r (f k d (fold f l acc))open Formatlet print     (print_key : formatter -> α -> unit)    (print_data : formatter -> β -> unit)    (ppf : formatter)    (tbl : (α, β) t) : unit =  let print_tbl ppf tbl =    iter (fun k d ->            fprintf ppf "@[<2>%a ->@ %a;@]@ " print_key k print_data d)      tbl in  fprintf ppf "@[<hv>[[%a]]@]" print_tbl tbl

The source code for this post can be found in the file 'ocaml/misc/tbl.ml' in the OCaml source distribution. More information on balanced binary search trees including similar but different implementation techniques and complexity analyses can be found in this Cornell lecture and this one.

## August 26, 2016

### Overview

CollegeVine is looking for a product-focused full-stack developer to help engineer the future of mentorship and higher education attainment.

There aren't many industries left that haven't been significantly disrupted by technology in some way, but you're reading about one right here! You will find many opportunities to apply high-leverage computer science (think machine learning, probabilistic reasoning, etc.) as well as plenty of opportunities for the more human side of the problem. As it stands, the current admissions process is a huge source of stress and confusion for students and parents alike. If we execute correctly, your work will impact the entire next generation of college graduates-to-be.

You will join a fast-moving company whose culture centers around authenticity, excellence, and balance. You'll find that everyone likes to keep things simple and transparent. We prefer to be goal-oriented and hands-off as long as you are a self-starter.

Our modern perspective on developer potential means we celebrate and optimize for real output. And that's probably the reason why we're a polyglot functional programming shop, with emphasis on Haskell and functional paradigms. Our infrastructure and non-mission-critical tooling tends to be in whatever works best for the task at hand: sometimes that's Haskell with advanced GHC extensions a-blazin', other times it's minimalist Ruby or bash—basically, it's a team decision based on whatever sits at the intersection of appropriateness, developer joy, quality, and velocity.

As an early-stage company headquartered in Cambridge, MA, we have a strong preference for key members of our team to be located in the Boston metro area; however, given that our company has its roots in remote work (and that it's 2016), we are open to remote arrangements after one year of continuous employment and/or executive approval.

### Requirements

You know you are right for this position if:

• You have at least five years of professional software engineering experience, and at least two years of preference for a high-level programming language that's used in industry, like Haskell, Clojure, OCaml, Erlang, F#, or similar.
• You have some front-end experience with JS or a functional language that compiles to JS, like PureScript, Elm, Clojurescript, or similar. We use PureScript, React, and ES6 on the front-end. It's pretty awesome.
• You are a self-starter and internally motivated, with a strong desire to be part of a successful team that shares your high standards.
• You have great written communication skills and are comfortable with making big decisions over digital presence (e.g. video chat).
• You have polyglot experience along several axes (dynamic/static, imperative/functional, lazy/strict, weird/not-weird).
• You are comfortable with modern infrastructure essentials like AWS, Heroku, Docker, CI, etc. You have basic but passable sysadmin skills.
• You are fluent with git.
• You instrument before you optimize. You test before you ship. You listen before you conclude. You measure before you cut. Twice.

### Benefits

We offer a competitive salary and a full suite of benefits, some of them unconventional, but awesome for the right person:

• Medical, dental, vision insurance and 401k come standard.
• Flexible hours with a 4-hour core - plan the rest of your workday as you wish, just give us the majority of your most productive hours. Productivity ideas: avoid traffic, never wait in line at the grocery store, wake up without an alarm clock.
• Goal-based environment (as opposed to grind-based or decree-based environment; work smarter, not harder; intelligently, not mindlessly). We collaborate on setting goals, but you set your own process for accomplishing those goals. You will be entrusted with a lot of responsibility and you might even experience fulfillment and self-actualization as a result.
• Daily physical activity/mindfulness break + stipend: invest a non-core hour to make yourself more awesome by using it for yoga, tap-dance lessons, a new bike, massage, a surfboard - use your imagination! Just don’t sit at a computer all day! Come back to work more relaxed and productive and share your joy with the rest of the team. Note: You must present and share proof of your newly enriched life with the team in order to receive the stipend.
• Equipment/setup budget so you can tool up the way you want. A brand new 15" MBP is standard issue if you have no strong opinions.

Remember: We’re a startup. You’re an early employee. We face challenges. We have to ship. Your ideas matter. You will make a difference.

Get information on how to apply for this position.

## August 22, 2016

### Sylvain Le Gall

#### Release of OASIS 0.4.7

I am happy to announce the release of OASIS v0.4.7.

OASIS is a tool to help OCaml developers to integrate configure, build and install systems in their projects. It should help to create standard entry points in the source code build system, allowing external tools to analyse projects easily.

This tool is freely inspired by Cabal which is the same kind of tool for Haskell.

You can find the new release here and the changelog here. More information about OASIS in general on the OASIS website.

Pull request for inclusion in OPAM is pending.

Here is a quick summary of the important changes:

• Drop support for OASISFormat 0.2 and 0.1.
• New plugin "omake" to support build, doc and install actions.
• Improve automatic tests (Travis CI and AppVeyor)
• Trim down the dependencies (removed ocaml-gettext, camlp4, ocaml-data-notation)

Features:

• findlib_directory (beta): to install libraries in sub-directories of findlib.
• findlib_extra_files (beta): to install extra files with ocamlfind.
• source_patterns (alpha): to provide module to source file mapping.

This version contains a lot of changes and is the achievement of a huge amount of work. The addition of OMake as a plugin is a huge progress. The overall work has been targeted at making OASIS more library like. This is still a work in progress but we made some clear improvement by getting rid of various side effect (like the requirement of using "chdir" to handle the "-C", which leads to propage ~ctxt everywhere and design OASISFileSystem).

I would like to thanks again the contributor for this release: Spiros Eliopoulos, Paul Snively, Jeremie Dimino, Christopher Zimmermann, Christophe Troestler, Max Mouratov, Jacques-Pascal Deplaix, Geoff Shannon, Simon Cruanes, Vladimir Brankov, Gabriel Radanne, Evgenii Lepikhin, Petter Urkedal, Gerd Stolpmann and Anton Bachin.

### OCamlCore Forge News

#### OASIS v0.4.7 release

Read the full blog post here: http://le-gall.net/sylvain+violaine/blog/index.php?post/2016/08/22/Release-of-OASIS-0.4.6

## August 12, 2016

### Functional Jobs

#### Software Engineer (Haskell/Clojure) at Capital Match (Full-time)

Overview

Capital Match is a leading marketplace lending and invoice financing platform in Singapore. Our in-house platform, mostly developed in Haskell, has in the last year seen more than USD 10 million business loans processed with a strong monthly growth (current rate of USD 1.5-2.5 million monthly). We are also eyeing expansion into new geographies and product categories. Very exciting times!

We have just secured another funding round to build a world-class technology as the key business differentiator. The key components include credit risk engine, seamless banking integration and end-to-end product automation from loan origination to debt collection.

Responsibilities

We are looking to hire a software engineer with a minimum of 2-3 years coding experience. The current tech team includes a product manager and 3 software engineers. We are currently also in the process of hiring CTO.

The candidate should have been involved in a development of multiple web-based products from scratch. He should be interested in all aspects of the creation, growth and operations of a secure web-based platform: front-to-back features development, distributed deployment and automation in the cloud, build and test automation etc.

Background in fintech and especially lending / invoice financing space would be a great advantage.

Requirements

Our platform is primarily developed in Haskell with an Om/ClojureScript frontend. We are expecting our candidate to have experience working with a functional programming language e.g. Haskell/Scala/OCaml/F#/Clojure/Lisp/Erlang.

Deployment and production is managed with Docker containers using standard cloud infrastructure so familiarity with Linux systems, command-line environment and cloud-based deployment is mandatory. Minimum exposure to and understanding of XP practices (TDD, CI, Emergent Design, Refactoring, Peer review and programming, Continuous improvement) is expected.

We are looking for candidates that are living in or are willing to relocate to Singapore.

Offer

We offer a combination of salary and equity depending on experience and skills of the candidate.

Most expats who relocate to Singapore do not have to pay their home country taxes and the local tax rate in Singapore is more or less 5% (effective on the proposed salary range).