OCaml Planet

April 18, 2014

GaGallium

The 6 parameters of ('a, 'b, 'c, 'd, 'e, 'f) format6

The infamous format6 type is the basis of the hackish but damn convenient, type-safe way in which OCaml handles format strings:

Printf.printf "%d) %s -> %f\n"
  3 "foo" 5.12

The first argument of printf in this example is not a string, but a format string (they share the same literal syntax, but have different type, and there is a small hack in the type-inference engine to make this possible). It's type, which you can get by asking ("%d) %s -> %f\n" : (_,_,_,_,_,_) format6) in a toplevel, is a mouthful (and I renamed the variables for better readability):

(int -> string -> float -> 'f, 'b, 'c, 'd, 'd, 'f) format6

What do those six arcane parameters mean? In the process of reviewing Benoît Vaugon's work on using GADTs to re-implement format functions, I've finally felt the need (after years of delighted oblivion) to understand each of those parameters. And that came after several days of hacking on easier-to-understand ten-parameters GADT functions; hopefully most of our readers will never need this information, but the probability that I need to refresh my memory in the future is strong enough for me to write this blog post.

The main take-away is that this type slowly evolved, from a more reasonable ('a, 'b, 'c) format type, to the monster that lurks in the documentation today. Advanced features of format strings needed more and more parameters to carry their type information. It is possible to implement format strings with less features and simpler types, and it may be possible to implement the same format strings with types that are easier to understand, but I need to understand the interface as it exists.

I built my own understanding, not by reading the documentation (though I tried that first), but by imagining layers of features, from the simple to the advanced. This is this post's approach. It does necessarily correspond to the actual chronological evolution of format types -- Pierre Weis would know about that.

At the beginning were printf and sprintf

# Printf.printf "%d) %s -> %f\n";;
- : int -> string -> float -> unit = <fun>
# Printf.sprintf "%d) %s -> %f\n";;
- : int -> string -> float -> string = <fun>

Those two functions differ by their return type: one prints and returns nothing (unit), the other returns a string. This parameter is stored in the sixth-parameter, 'f, and is called "result type for the printf-like functions" in the documentation.

Of course, the format string does not need to impose any particular value to this parameter: a given format string can be used by a function with any possible result type, hence the undetermined variable 'f in the format type above. It is the printf functions that force a precise instantiation of the parameter: Printf.printf has a type of the form ('a, ..., unit) format6 -> 'a.

The type 'a mentioned in the type of printf is a large function type, with one parameter for each conversion specification (%d, %s, etc.), and whose return type is enforced by the format to be equal to the last parameter of the format. In the case of "%d) %s -> %f\n", this type is int -> string -> float -> 'a, where 'a is the same variable present in the last parameter of the format -- the result type. This type depends only on the format string, not at all on the printing (or scanning) function used.

To summarize, in ('a, 'b, 'c, 'd, 'e, 'f) format6, we've seen that:

  • 'f is what you get after you pass all the parameters to a printf-like function
  • 'a is the big function type that expects all those parameters and returns 'f

'b and 'c for user-defined printers:

You may know about %a and the lesser-known (but actually simpler) %t, used as follow:

Printf.sprintf "%d) %a -> %f"
  3
  print_foo foo
  6.23

The idea is that instead of converting the value foo to a string first, you can, at printf time, pass a "user-defined printer" print_foo as an extra argument that tells how to convert foo into a string. This is easy enough to handle typing-wise: the format %a adds two parameters to the big function type, instead of just one.

In a non-existing ideal world, print_foo used in sprintf would have the convenient type foo -> string. However, a single %a must work with many printing function: for sprintf, the user-defined should return a string, for printf it should just print and return unit, with fprintf it needs to be passed the channel in which we're printing, or with bprintf the target buffer. So this determines two type parameters in ('a, 'b, 'c, 'd, 'e, 'f) format6:

  • 'b is the type of the extra information that user-defined printers may need (Buffer.t for bprintf, output_channel for fprintf, etc.)

  • 'c is the return type of the user-defined printers accepted by the function (string for sprintf, unit for most others).

There are two remarks. First, it happens that printf use out_channel as second parameter, while you would probably expect unit. This is strange API design, but the rationale is to let you reuse your user-defined printers for both printf and fprintf.

Second, you may remark that the 'c parameter looks suspiciously like "the result type of printf-lie functions", 'f. Indeed, all current printf-like functions use, to my knowledge, the same type for 'f and 'c. But it would be very reasonable, for performance reasons, to design a different version of sprintf that returns a string, but whose user-defined sub-printers put result in a Buffer.t parameter and thus just return unit.

Scanf is just a continuation away

Consider the difference between printing and scanning:

Printf.printf "%d) %s %f"
  3 "foo" 7.23;;

Scanf.scanf "%d) %s %f"
  (fun num name value -> ....)

When printing with "%d) %s %f", the user has to provide an int parameter, a string parameter, and a float parameter, hence the type of the form int -> string -> float -> .... With scanning, you have to give them to the user, which looks like a fairly different activity. How can you reuse the same typing of the format string?

Fortunately, expressing scanning with a continuation (the function (fun num name value -> ...)) allows precisely to reuse a type of the form int -> string -> float -> ... to describe scanning: just say that for any type 'f, if the user gives a int -> string -> float -> 'f, scanf will return a 'f.

In a simple world, the type of scanf would thus be of the following form:

('a, ..., 'f) format6 -> 'a -> 'f

(where it is understood that for any format string, 'a will actually be of the form t1 -> t2 -> ... -> 'f)

(Alternate design idea: have a new type parameter that stores, instead of a big arrow type, the product of all parameter types (int * string * float in our example), and make scanf return this directly instead of using a continuation-passing style. The present style has the property of being easier to implement with GADTs, but that probably wasn't a design consideration.)

In fact, the typing of scanf is a bit more complicated, as explained in the next section.

'd and 'e for user-defined readers

Just as user-defined printers, to write formats in a more high-level style it is possible to provide user-defined scanners with %r:

let read_foo scanbuf = ...

Scanf.scanf "%d) %r -> %f"
  read_foo
  (fun n foo x -> ...)

Now the typing implications of this for format strings are a bit tricky. As we read three values of type int, foo and float, the naive typing of scanf proposed earlier

('a, ..., 'f) -> ('a -> 'f)

would specialize to, in this instance:

(int -> foo -> float -> 'f, ..., 'f) format6
 -> ((int -> foo -> float -> 'f) -> 'f)

which is not what we want, as there is no room for the extra read_foo parameter.

The argument expected by scanf after the format string is not just the continuation anymore, we need an extra argument for each user-defined reader. Yet those extra arguments should not appear in the type of the continuation.

What we need to do is to express the relation between the simple type 'a -> 'f (from continuation to result), which is the return type of scanf fmt in absence of user-defined readers, and the real return type of scanf. This is what our last two parameters do:

  • the second-last parameter, 'e, is forced by scanf functions to be equal to 'a -> 'f

  • the parameter 'd is the actual type of scanf fmt

  • the typing of format strings will generate constraints on the relation between 'd and 'e, depending on the number of %r present in the format; without any %r they are equal.

Consider our running example without any %r:

# ( "%d) %s -> %f" : (_, _, _, _, _, _) format6 );;
- : (int -> string -> float -> 'f,
     'b, 'c, 'd, 'd, 'f) format6

The two parameters I'm talking about are constrained to be equal to the same variables 'd. On the contrary, if we use %r instead of %s:

# ( "%d) %r -> %f" : (_,_,_,_,_,_) format6 );;
- : (int -> 'x -> float -> 'f,
     'b, 'c, ('b -> 'x) -> 'e, 'e, 'f) format6

Suddenly they are distinct, with one equal to the unknown 'e (that will be forced to equal (... -> 'f) -> 'f when applying scanf), and the other is ('b -> 'x) -> 'e: it adds on top of 'e an extra parameter (the user-defined format) which must return a 'x (if I pass the reader read_foo, this will be equated to foo) and takes as input a 'b, which is the type of the extra parameter of user-defined printers or readers (scanf will set it to Scanning.in_channel).

Benoît's GADT declarations are surprisingly efficient at showing you precisely what happens to the type parameters when you add a new conversion description to an existing format string. In particular, the constructor for user-defined readers reads as follow:

| Reader :                                                 (* %r *)
    ('a, 'b, 'c, 'd, 'e, 'f) fmt ->
      ('x -> 'a, 'b, 'c, ('b -> 'x) -> 'd, 'e, 'f) fmt

You can very well see that an argument of type 'x (the result type of the user-defined result) is added to the parameter 'a (the type of the continuation finally passed to scanf), while an argument of type ('b -> 'x) is added to 'd (the type of the arguments following the format string).

.

PS: Extra credit for the bonus question: why do I use (fmt : (_, _, _, _, _, _) format6) instead of the more convenient to write format_of_string fmt?

by Gabriel Scherer at April 18, 2014 08:00 AM

April 15, 2014

Matthias Puech

Disjunctive normal forms in big steps

This is probably a second-semester functional programming exercise, but I found it surprisingly hard, and could not find a solution online. So at the risk of depriving a TA from a problem for its mid-term exam, here is my take on it, that I painfully put together yesterday.

Given a formula built out of conjunction, disjunction and atoms, return its disjunctive normal form, in big step or natural semantics, that is, not applying repetitively the distributivity and associativity rules, but in a single function run. Before you go any further, please give it a try and send me your solution!

Formulas are described by the type form:

type atom = int

type form =
  | X of atom
  | And of form * form
  | Or of form * form

To ensure the correctness of the result, I represent disjunctive normal form by a restriction of this type, disj, by stratifying it into conjunctions and disjunctions:

type conj = X of atom | And of atom * conj
type disj = Conj of conj | Or of conj * disj

There are actually two restrictions at stake here: first, conjunctions cannot contain disjunctions, and second, all operators are necessarily right-associative. Constructor Conj is just a logically silent coercion. If you look carefully enough, you will notice that conj (resp. disj) is isomorphic to a non-empty list of atoms (resp. conj).

The first step is to lift the second restriction (associativity), and prove that we can always build a conjunction of conj, resp. a disjunction of disj. Easy enough if you think about lists: these functions look like concatenation.

let rec conj : conj -> conj -> conj = fun xs ys -> match xs with
  | X x -> And (x, ys)
  | And (x, xs) -> And (x, conj xs ys)

let rec disj : disj -> disj -> disj = fun xs ys -> match xs with
  | Conj c -> Or (c, ys)
  | Or (x, xs) -> Or (x, disj xs ys)

Then, we will lift the second restriction, using distributivity. We must show that it is always possible to form a conjunction. First, we show how to build the conjunction of a conj and a disj:

let rec map : conj -> disj -> disj = fun x -> function
  | Conj y -> Conj (conj x y)
  | Or (y, ys) -> Or (conj x y, map x ys)

The first case is trivial, the second reads as the distributivity of conjunction over disjunction. By analogy to lists again, I called this function map because it maps function conj x to all cells of the list.

Next, we show how to build the conjunction of two disj:

let rec cart : disj -> disj -> disj = fun xs ys -> match xs with
  | Conj c -> map c ys
  | Or (x, xs) -> disj (map x ys) (cart xs ys)

Considering the meaning of the previously defined functions, the first case is trivial, and the second, again, reads as distributivity, only in the other direction. I called this function cart because it computes the cartesian product of the two “lists” passed as arguments (only on non-empty lists).

Now we can easily put together the final function computing the DNF:

let rec dnf : form -> disj = function
  | X x -> Conj (X x)
  | Or (a, b) -> disj (dnf a) (dnf b)
  | And (a, b) -> cart (dnf a) (dnf b)

It is easy to see that all function terminate: they are primitive recursive.

Wait, let’s not forget to test our contraption:

let () =
  assert (dnf (Or (And (X 1, X 2), X 3)) =
          Or (And (1, X 2), Conj (X 3)));
  assert (dnf (And (Or (X 1, X 2), X 3)) =
          Or (And (1, X 3), Conj (And (2, X 3))));
  assert (dnf (And (Or (And (X 0, X 1), X 2), And (X 3, X 4))) =
          Or (And (0, And (1, And (3, X 4))), Conj (And (2, And (3, X 4)))))

That’s my solution. Reader, is there another one? Is there a better explanation, for instance based on Danvy’s small-step to big-step derivation? I believe there is…

Supplementary exercise

Technically, there still could be bugs in this code. Prove that it is correct wrt. the small-step rewrite rules, using only OCaml and GADTs. Here is the beginning of an idea: annotate form, conj and disj with their meaning in terms of OCaml types:

type ('a, 'b) union = Inl of 'a | Inr of 'b

type 'a atom = int

type 'a form =
  | X : 'a atom -> 'a form
  | And : 'a form * 'b form -> ('a * 'b) form
  | Or : 'a form * 'b form -> ('a, 'b) union form

type 'a conj =
  | X : 'a atom -> 'a conj
  | And : 'a atom * 'b conj -> ('a * 'b) conj

type 'a disj =
  | Conj : 'a conj -> 'a disj
  | Or : 'a conj * 'b disj -> ('a, 'b) union disj

(the definition of union is irrelevant here), state the relation between equivalent types as a type:

  type ('a, 'b) equiv =
    | Refl : ('a, 'a) equiv
    | Trans : ('a, 'b) equiv * ('b, 'c) equiv -> ('a, 'c) equiv
    | AssocA : (('a * 'b) * 'c, 'a * ('b * 'c)) equiv
    | AssocO : ((('a, 'b) union, 'c) union, ('a, ('b, 'c) union) union) equiv
    | DistribL : ('a * ('b, 'c) union, ('a, 'b) union * ('a, 'c) union) equiv
    | DistribR : (('b, 'c) union * 'a, ('b, 'a) union * ('c, 'a) union) equiv

pack up a solution as an existential: an equivalence proof together with a DNF:

type 'a dnf = Dnf : ('a, 'b) equiv * 'b disj -> 'a dnf

and code a function:

let dnf : type a. a form -> a dnf =
  function _ -> (assert false)   (* TODO *)

Ok fair enough, it’s not an exercise, it’s something I haven’t been able to put together yet ;)


by Matthias Puech at April 15, 2014 08:14 PM

OCamlPro

OCamlPro Highlights: March 2014

Here is a short report of some of our activities in March 2014 !

Welcome Thomas

First, we would like to welcome Thomas Blanc on board ! Thomas is starting a PhD at OCamlPro, with Michel Mauny from ENSTA as his PhD advisor. So, he will stay with us for three years, working on static analysis of whole OCaml programs, with two main objectives:

  • To detect uncaught exceptions, using a different approach than ocamlexc, the tool that François Pessaux developed during his PhD thesis, a method that will make Thomas' tool easier to upgrade at each new OCaml version;

  • To optimize generated code, with the benefits of whole program knowledge;

OPAM Improvements

OPAM has seen its share of code quality improvements during this month, with a rewritten parser for OPAM description files, now reporting error locations, better handling of some corner cases, more expressive install requests from the command-line, and a more friendly way of handling pinned packages. We are now stabilizing these features and you should expect soon a Beta release of OPAM 1.2.0.

Binary Release of OCaml in OPAM

We also spent some time this month working on using OPAM to distribute binary releases of OCaml. OPAM is great to use, but it sometimes takes a lot of time to compile OCaml when a new switch is needed for some small experimentation. We decided it would be interesting to experiment with a binary package of OCaml (4.01.0 for now).

The results are quite interesting, in most of our tests, downloading the compiler binary archive and uncompressing it takes an order of magnitude less time than compiling from sources, even with a slow connection:

OCaml 4.01.0 installation time on OPAM
From Sources Binary DownloadBinary Install
Intel i7 2.10GHz, Linux 64 bits, 16 GB RAM, SSD 238s 220s (DSL) 8s
Intel i7 2.60GHz, Linux 64 bits, 8 GB, SSD 253s 22s (Cable) 9s
Intel i5 2.67GHz, Linux 64 bits, 2 GB, SATA 289s 65s (Wifi) 26s
Intel Core2 2.53 GHz, Linux 32 bits, 4 GB, SSD 402s 68s (Wifi) 40s
Intel Xeon 2.67GHz, Linux 64 bits, 8 GB, SATA 289s
Intel core2 3.00 Ghz, Linux 32 bits, 4 GB, SATA 258s 15s (Cable) 7s

You can try our compiler on your system (on Intel Linux only for now, but we plan to add binary versions for other architectures):

opam switch 4.01.0+bin-ocp

So now, the fastest way to get OCaml running on a computer is downloading OPAM and calling:

opam init --comp 4.01.0+bin-ocp

Another great thing with this compiler is that the binary archive is kept in a cache (in ~/.opam/ocp-compiler-cache/), so that creating more switches for this compiler is almost cost free. For example, you can try:

opam switch 4.01.0+test+my+soft --switch 4.01.0+bin-ocp

If you have an Intel Linux computer, and you get any problem with this distribution, you should send us the output of OPAM and we will try to fix the problem ! Unfortunately for CentOS users, we had to choose a recent libc, so that it will probably not run on this system.

Solvers in the Cloud for OPAM

In some cases, OPAM's internal heuristic is not enough -- upgrades are a np-complete problem after all. For this reason, OPAM has been thought from the beginning to be able to use an external solver, using the CUDF format to interoperate with solvers. Unfortunately, depending on your system, such a solver, like aspcud, may not be available or easy to install.

So last month, we worked with the Mancoosi team to set up on one of their servers a CUDF solver farm, to work around this issue. Following the instructions on the site:

http://cudf-solvers.irill.org/index.html

you can setup your OPAM configuration to use a remote CUDF solver, that will usually propose better solutions than the one provided internally by OPAM.

A bit "hackish" way of using it immediatly is to copy the following script in a file aspcud in your path:

#!/bin/bash
SERVER=cudf-solvers.irill.org
if bzip2 -c $1 | curl -f --data-binary @- http://$SERVER/cudf.bz2?criteria="$3" | bunzip2 -c > $2;
then exit 0
else echo FAIL > $2
fi

The Alt-Ergo SMT Solver

In the context of the Bware project, where we are working on improving automatic solvers for Atelier B, we submitted a short paper to the ABZ conference describing the improvements we made in Alt-Ergo to increase its success rate on formulas translated from B proof obligations. Our paper has been accepted for publication! We will present it at ABZ 2014, which will be held in Toulouse (France) in the beginning of June.

In the Scilab Corner

We are pursuing our quest for a JIT for Scilab. Scilab really is a dynamic scripting language, in terms of typing, scoping, constructs and of how its users consider it. For most of them, it is basically an advanced, scriptable calculator. A common usecase is to pause in the middle of a function, open a toplevel at this point to edit some local variable or open a graphical visualisation, and then resume the execution.

In this context, we decided to adopt a really pragmatic approach, which is to detect statement-level code patterns that we know to be JITable, instead of trying to JIT whole functions. This way, we'll be able to leave the intro and outro of functions (which may do whatever they want to control and scope) to the interpreter, and concentrate on the inner, performance critical loop.

So, this month, we worked on several tasks required to achieve this goal. First, we worked on the detection of code snipplets, and as a side effect of this work, we plan to provide Scilab users with a new syntax-aware grep-like tool. We also worked on how to customize Scilab's loading function to replace some parts of the code by foreign calls to the future JIT. We also worked on the introspection of Scilab's context to build the environment of the JIT's type system.

by Louis Gesbert (louis.gesbert@ocamlpro.com) at April 15, 2014 12:00 AM

April 11, 2014

Matthias Puech

Representing pattern-matching with GADTs

Here is a little programming pearl. I’ve been wanting to work on pattern-matching for a while now, and it seems like I will finally have this opportunity here at my new (academic) home, McGill.

Encoding some simply-typed languages with GADTs is now routine for a lot of OCaml programmers. You can even take (kind of) advantage of (some form of) convenient binding representation, like (weak) HOAS; you then use OCaml variables to denote your language’s variables. But what about pattern-matching? Patterns are possibly “deep”, i.e. they bind several variables at a time, and they don’t respect the usual discipline that a variable is bound for exactly its subterm in the AST.

It turns out that there is an adequate encoding, that relies on two simple ideas. The first is to treat variables in patterns as nameless placeholders bound by λ-abstractions on the right side of the arrow (this is how e.g. Coq encodes matches: match E₁ with (y, z) -> E₂ actually is sugar for match E₁ with (_, _) -> fun x y -> E₂); the second is to thread and accumulate type arguments in a GADT, much like we demonstrated in our printf example recently.

The ideas probably extends seamlessly to De Bruijn indices, by threading an explicit environment throughout the term. It stemmed from a discussion on LF encodings of pattern-matching with Francisco over lunch yesterday: what I will show enables also to represent adequately pattern-matching in LF, which I do not think was ever done this way before.

Let’s start with two basic type definitions:

type ('a, 'b) union = Inl of 'a | Inr of 'b
type ('a, 'b) pair = Pair of 'a * 'b

The encoding

First, I encode simply-typed λ-expressions with sums and products, in the very standard way with GADTs: I annotate every constructor by the (OCaml) type of its denotation.

type 'a exp =
  | Lam : ('a exp -> 'b exp) -> ('a -> 'b) exp
  | App : ('a -> 'b) exp * 'a exp -> 'b exp
  | Var : 'a -> 'a exp
  | Pair : 'a exp * 'b exp -> ('a, 'b) pair exp
  | Inl : 'a exp -> (('a, 'b) union) exp
  | Inr : 'b exp -> ('a, 'b) union exp
  | Unit : unit exp

At this point, I only included data type constructors, not their destructors. These are replaced by a pattern-matching construct: it takes a scrutinee of type 's, and a list of branches, each returning a value of the same type 'c.

  | Match : 's exp * ('s, 'c) branch list -> 'c exp

Now, each branch is the pair of a pattern, possibly deep, possibly containing variables, and an expression where all these variables are bound.

(* 's = type of scrutinee; 'c = type of return *)
and ('s, 'c) branch =
  | Branch : ('s, 'a, 'c exp) patt * 'a -> ('s, 'c) branch

To account for these bindings, I use a trick when defining patterns that is similar to the one used for printf with GADTs. In the type of the Branch constructor, the type 'a is an “accumulator” for all variables appearing in the pattern, eventually returning a 'c exp. For instance, annotation 'a for a pattern that binds two variables of type 'a -> 'b and 'a would be ('a -> 'b) exp -> 'a exp -> 'c exp.

Let’s define type patt. Note that it also carries and checks the annotation 's for the type of the scrutinee. The first three cases are quite easy:

(* 's = type of scrutinee;
   'a = accumulator for to bind variables;
   'c = type of return *)
and ('s, 'a, 'c) patt =
  | PUnit : (unit, 'c, 'c) patt
  | PInl : ('s, 'a, 'c) patt -> (('s, 't) union, 'a, 'c) patt
  | PInr : ('t, 'a, 'c) patt -> (('s, 't) union, 'a, 'c) patt

Now, the variable case is just a nameless dummy that “stacks up” one more argument in the “accumulator”, i.e. what will be the type of the right-hand side of the branch:

  | X : ('s, 's exp -> 'c, 'c) patt

Finally, the pair case is the only binary node. It need to thread the accumulator, to the left node, then to the right.

  | PPair : ('s, 'a, 'b) patt * ('t, 'b, 'c) patt 
    -> (('s, 't) pair, 'a, 'c) patt

Note that it is possible to swap the two sides of the pair; we would then bind variables in the opposite order on the right-hand side.

That’s the encoding. Note that it ensures only well-typing of terms, not exhaustiveness of patterns (which is another story that I would like to tell in a future post).

Examples

Here are a couple of example encoded terms: first the shallow, OCaml value, then its representation:

let ex1 : = fun x -> match x with
  | Inl x -> Inr x
  | Inr x -> Inl x

let ex1_encoded : 'a 'b. (('a, 'b) union -> ('b, 'a) union) exp =
  Lam (fun x -> Match (x, [
      Branch (PInl X, fun x -> Inr x);
      Branch (PInr X, fun x -> Inl x);
    ]))

let ex2 : 'a 'b. ((('a, 'b) union, ('a, 'b) union) pair
    -> ('a, 'b) union) =
  fun x -> match x with
    | Pair (x, Inl _) -> x
    | Pair (Inr _, x) -> x
    | Pair (_, Inr x) -> Inr x

let ex2_encoded : 'a 'b. ((('a, 'b) union, ('a, 'b) union) pair 
    -> ('a, 'b) union) exp =
  Lam (fun x -> Match (x, [
      Branch (PPair (X, PInl X), (fun x _ -> x));
      Branch (PPair (PInr X, X), (fun _ x -> x));
      Branch (PPair (X, PInr X), (fun _ x -> Inr x));
    ]))

An interpreter

Finally, we can code an evaluator for this language. It takes an expression to its (well-typed) denotation. The first few lines are standard:

let rec eval : type a. a exp -> a = function
  | Lam f -> fun x -> eval (f (Var x))
  | App (m, n) -> eval m (eval n)
  | Var x -> x
  | Pair (m, n) -> Pair (eval m, eval n)
  | Inl m -> Inl (eval m)
  | Inr m -> Inr (eval m)
  | Unit -> ()
  | Match (m, bs) -> branches (eval m) bs

Now for pattern-matching, we call an auxilliary function branches that will try each branch sequentially:

and branches : type s a. s -> (s, a) branch list -> a = fun v -> function
  | [] -> failwith "pattern-matching failure"
  | Branch (p, e) :: bs -> 
    try eval (branch e (p, v)) with Not_found -> branches v bs

A branch is tested by function branch, which is where the magic happens: it matches the pattern and the value of the scrutinee, and returns a (potentially only) partially applied resulting expression. The first cases are self-explanatory:

and branch : type s a c. a -> (s, a, c) patt * s -> c = fun e -> function
  | PUnit, () -> e
  | PInl p, Inl v -> branch e (p, v)
  | PInr p, Inr v -> branch e (p, v)
  | PInl _, Inr _ -> raise Not_found
  | PInr _, Inl _ -> raise Not_found

In the variable case, we know that e is a function that expects an argument: the value v of the scrutinee.

  | X, v -> e (Var v)

The pair case is simple and beautiful: we just compose the application of branch on both sub-patterns.

  | PPair (p, q), Pair (v, w) -> branch (branch e (p, v)) (q, w)

That’s it. Nice eh? There are two obvious questions that I leave for future posts: can we compile this encoding down to simple case statement, with the guarantee of type preservation? and could we enhance the encoding such as to guarantee statically exhaustiveness?

See you soon!


by Matthias Puech at April 11, 2014 10:38 PM

Ocsigen project

Ocsigen Js_of_ocaml 2.0

We are happy to announce release 2.0 of Js_of_ocaml, the compiler from OCaml bytecode to JavaScript.

A lot of efforts has been put in reducing the size of the generated JavaScript code. Much shorter variable names are used; unnecessary whitespaces and semicolons are omitted; multiple occurrences of a same constant are shared... The runtime is minified as well. You can expect a space reduction of 15% to 20%.

Recursive modules are now supported. Tail calls between mutually recursive functions are optimized (using trampolines). In particular, lexers produced by ocamllex are now properly optimized.

The runtime now simulates a small filesystem (in memory), which makes it possible to use the OCaml I/O functions. The standard outputs are by default redirected to the JavaScript console, which is convenient for debugging.

A larger part of the OCaml libraries are supported: bigarrays, the time-related functions of the Unix library.

A number of incompatible changes have been made. In particular:

  • JavaScript numbers are simply given type 'float' rather than type 'float Js.t';
  • the compiler generates "strict mode" JavaScript; therefore, 'Js.Unsafe.variable "this"' does not refer to the JavaScript global object ("window" in browsers) anymore; you can use 'Js.Unsafe.global' instead;
  • runtime primitives are now wrapped together with the generated code in a huge function so as not to pollute the global scope.

LINKS

Regards,

Jérôme Vouillon and Hugo Heuzard

by Ocsigen team at April 11, 2014 02:13 PM

April 10, 2014

LexiFi

Inlined records in constructors

I'd like to introduce a new language feature, inlined record arguments on constructors, which I propose for inclusion in OCaml. In a nutshell, it allows you to define a record directly within a sum type constructor:

1
2
3
  type t =
     | A of { x : int; y: string }
     | ...

The argument of the constructor is a full-fledged record type. All features of records are available: dot notation, mutable fields, polymorphic fields, punning syntax, record override. You can write:

read more

by Alain Frisch at April 10, 2014 11:52 AM

April 07, 2014

Anil Madhavapeddy

Grepping the source of every OCaml package in OPAM

A regular question that comes up from OCaml developers is how to use OPAM as a hypothesis testing tool against the known corpus of OCaml source code. In other words: can we quickly and simply run grep over every source archive in OPAM? So that’s the topic of today’s 5 minute blog post:

git clone git://github.com/ocaml/opam-repository
cd opam-repository
opam-admin make
cd archives
for i in *.tar.gz; \
  do tar -zxOf $i | grep caml_stat_alloc_string; \
done

In this particular example we’re looking for instances of caml_stat_alloc_string, so just replace that with the regular expression of your choice. The opam-admin tool repacks upstream archives into a straightforward tarball, so you don’t need to worry about all the different archival formats that OPAM supports (such as git or Darcs). It just adds an archive directory to a normal opam-repository checkout, so you can reuse an existing checkout if you have one already.

$ cd opam-repository/archives
$ du -h
669M	.
$ ls | wc -l
   2092

April 07, 2014 11:00 PM

Jane Street

Generic mapping and folding in OCaml

Haskell has a function fmap which can map over a number of different datatypes. For example, fmap can map a function over both a List and a Maybe (the equivalent of an option in OCaml):

Prelude> fmap (+ 1) [1,2]
[2,3]
Prelude> fmap (+ 1) (Just 3)
Just 4

Unfortunately, the equivalent is impossible in OCaml. That is, there's no way to define an OCaml value fmap so that the two expressions:

# fmap [1;2]    ~f:((+) 1)
# fmap (Some 3) ~f:((+) 1)

both typecheck and evaluate to the right value.

Even if we eliminate the complexity of type inference by specifying the type explicitly, we can't define fmap so that the two expressions:

# fmap ([1;2]  : _ list)   ~f:((+) 1)
# fmap (Some 3 : _ option) ~f:((+) 1)

typecheck and evaluate to the right value.

However, the Generic module in Jane Street's Core_extended library will let us do exactly that with just a trivial syntactic change. But before continuing, I'll warn you that the Generic module is not necessarily something you'd want to use in real world code; it falls much more in the "cute trick" category. But with that caveat, let's look at our example using Generic:

# open Core.Std;;
# open Core_extended.Generic;;

# map ([1;2] >: __ list) ~f:((+) 1);;
- : int list = [2; 3]
# map (Some 3 >: __ option) ~f:((+) 1);;
- : int option = Some 4    

Note that, after opening the Generic module, all we did to the previous example was change : to >: and _ to __. (Also, the Generic module calls the mapping function map instead of fmap, but that's inconsequential.)

Of course, the trick is that >:, __, list, and option are actually values defined by the Generic module in such a way that their intended usage looks like a type annotation.

Note that these "types" are nestable as you would expect real types to be:

# map ([None; Some 3] >: __ option list) ~f:((+) 1);;
- : int option list = [None; Some 4]        

This means that you can change what map does just by changing the "type" you assign to its argument:

# map ([None; Some 3] >: __ option list) ~f:(fun _ -> ());;
- : unit option list = [None; Some ()]
# map ([None; Some 3] >: __ list) ~f:(fun _ -> ());;
- : unit list = [(); ()]

The Generic module also defines a generic fold function so that you can accumulate values at any "depth" in your value:

# fold ([[Some 3; None]; [Some 5; Some 2]] >: __ option list list) ~init:0 ~f:(+);;
- : int = 10

Not every "type" formable is __ followed by some sequence of options and lists: for example, Generic also provides string (considered as a container of characters):

# map ([Some "foo"; None; Some "bar"] >: string option list) ~f:Char.uppercase;;
- : string option list = [Some "FOO"; None; Some "BAR"]

Note that the fact that the "types" are nestable means that these values must have unusual definitions: in particular, __ (and string) are functions which must be able to take a variable number of arguments. Indeed, these values are defined using a technique sweeks wrote about in a blog post on variable argument functions: the f and z in sweeks's post are analogous here to __ and >: respectively.

Here's the definition of the primitive values we've used so far (Generic actually defines a few more):

let __ k = k (fun f x -> f x)

let ( >: ) x t y = t (fun x -> x) y x

let map x ~f = x f

let string k = k (fun f -> String.map ~f)

let list map k = k (fun f -> List.map ~f:(map f))

let option map k = k (fun f -> Option.map ~f:(map f))

The types of these turn out to be extremely unhelpful, and you can't really use them to figure out how to use these values. For example, here is the type of >: (and this isn't just the inferred type of the above definition, this is the type which must actually be exposed to use >:):

val ( >: ) : 'a -> (('b -> 'b) -> 'c -> 'a -> 'd) -> 'c -> 'd

Finally, is this module actually used? The answer is no. As far as I know, it's used nowhere in Jane Street's codebase. But it's still kind of cute.

by Michael O'Connor at April 07, 2014 11:10 AM