OCaml Planet

April 20, 2014

April 18, 2014

Matthias Puech

Typeful disjunctive normal form

This is the answer to last post’s puzzle. I gave an algorithm to put a formula in disjunctive normal form, and suggested to prove it correct in OCaml, thanks to GADTs. My solution happens to include a wealth of little exercises that could be reused I think, so here it is.

I put the code snippets in the order that I think is more pedagogical, and leave to the reader to reorganize them in the right one.

First, as I hinted previously, we are annotating formulas, conjunctions and disjunctions with their corresponding OCaml type, in order to reason on these types:

type 'a atom = int

type 'a form =
  | X : 'a atom -> 'a form
  | And : 'a form * 'b form -> ('a * 'b) form
  | Or : 'a form * 'b form -> ('a, 'b) union form

type 'a conj =
  | X : 'a atom -> 'a conj
  | And : 'a atom * 'b conj -> ('a * 'b) conj

type 'a disj =
  | Conj : 'a conj -> 'a disj
  | Or : 'a conj * 'b disj -> ('a, 'b) union disj

What we are eventually looking for is a function dnf mapping an 'a form to a 'b disj, but now these two must be related: they must represent two equivalent formulae. So, correcting what I just said: dnf must return the pair of a 'b disj and a proof that 'a and 'b are equivalent. This pair is an existential type, which is easily coded with a GADT (we do similarly for conjunctions):

type 'a cnj = Cnj : 'b conj * ('a, 'b) equiv -> 'a cnj
type 'a dsj = Dsj : 'b disj * ('a, 'b) equiv -> 'a dsj

Let’s leave out the definition of equiv for a while. Now the code from the previous post is fairly easily adapted:

let rec conj : type a b. a conj -> b conj -> (a * b) cnj =
  fun xs ys -> match xs with
  | X x -> Cnj (And (x, ys), refl)
  | And (x, xs) ->
    let Cnj (zs, e) = conj xs ys in
    Cnj (And (x, zs), lemma0 e)

let rec disj : type a b. a disj -> b disj -> (a, b) union dsj =
  fun xs ys -> match xs with
  | Conj c -> Dsj (Or (c, ys), refl)
  | Or (x, xs) ->
    let Dsj (zs, e) = disj xs ys in
    Dsj (Or (x, zs), lemma1 e)

let rec map : type a b. a conj -> b disj -> (a * b) dsj =
  fun x -> function
  | Conj y ->
    let Cnj (z, e) = conj x y in
    Dsj (Conj z, e)
  | Or (y, ys) ->
    let Cnj (z, e1) = conj x y in
    let Dsj (t, e2) = map x ys in
    Dsj (Or (z, t), lemma2 e1 e2)

let rec cart : type a b. a disj -> b disj -> (a * b) dsj =
  fun xs ys -> match xs with
  | Conj c -> map c ys
  | Or (x, xs) ->
    let Dsj (z, e1) = map x ys in
    let Dsj (t, e2) = cart xs ys in
    let Dsj (u, e3) = disj z t in
    Dsj (u, lemma3 e1 e2 e3)

let rec dnf : type a. a form -> a dsj = function
  | X x -> Dsj (Conj (X x), refl)
  | Or (a, b) ->
    let Dsj (c, e1) = dnf a in
    let Dsj (d, e2) = dnf b in
    let Dsj (e, e3) = disj c d in
    Dsj (e, lemma4 e1 e2 e3)
  | And (a, b) ->
    let Dsj (c, e1) = dnf a in
    let Dsj (d, e2) = dnf b in
    let Dsj (e, e3) = cart c d in
    Dsj (e, lemma5 e1 e2 e3)

It seems more verbose, but since the functions now return existentials, we need to deconstruct them and pass them around. I abstracted over the combinators that compose the proofs of equivalence lemma1lemma5, we’ll deal with them in a moment. For now, you can replace them by Obj.magic and read off their types with C-c C-t to see if they make sense logically. Look at the last function’s type. It states, as expected: for any formula A, there exists a disjuctive normal form B such that A \Leftrightarrow B.

Now on this subject, what is it for two types to be equivalent? Well, that’s the “trick”: let’s just use our dear old Curry-Howard correspondence! 'a and 'b are equivalent if there are two functions 'a -> 'b and 'b -> 'a (provided of course that we swear to use only the purely functional core of OCaml when giving them):

type ('a, 'b) equiv = ('a -> 'b) * ('b -> 'a)

Now we can state and prove a number of small results on equivalence with respect to the type constructors we’re using (pairs and unions). Just help yourself into these if you’re preparing an exercise sheet on Curry-Howard :)

(* a = a *)
let refl : type a. (a, a) equiv =
  (fun a -> a), (fun a -> a)

(* a = b -> b = a *)
let sym : type a b. (a, b) equiv -> (b, a) equiv =
  fun (ab, ba) -> (fun b -> ba b), (fun a -> ab a)

(* a = b -> b = c -> a = c *)
let trans : type a b c. (b, c) equiv -> (a, b) equiv -> (a, c) equiv =
  fun (bc, cb) (ab, ba) -> (fun a -> bc (ab a)), (fun c -> ba (cb c))

(* a = b -> c = d -> c * a = d * b *)
let conj_morphism : type a b c d. (a, b) equiv -> (c, d) equiv ->
  (c * a, d * b) equiv = fun (ab, ba) (cd, dc) ->
    (fun (c, a) -> cd c, ab a),
    (fun (c, b) -> dc c, ba b)

let conj_comm : type a b. (a * b, b * a) equiv =
  (fun (x, y) -> y, x), (fun (x, y) -> y, x)

(* (a * b) * c = a * (b * c) *)
let conj_assoc : type a b c. ((a * b) * c, a * (b * c)) equiv =
  (fun ((x, y), z) -> x, (y, z)),
  (fun (x, (y, z)) -> (x, y), z)

(* a = b -> c + a = c + b *)
let disj_morphism : type a b c d. (a, b) equiv -> (c, d) equiv ->
  ((c, a) union, (d, b) union) equiv =
  fun (ab, ba) (cd, dc) ->
    (function Inl c -> Inl (cd c) | Inr a -> Inr (ab a)),
    (function Inl d -> Inl (dc d) | Inr b -> Inr (ba b))

(* (a + b) + c = a + (b + c) *)
let disj_assoc : type a b c. (((a, b) union, c) union,
                              (a, (b, c) union) union) equiv =
  (function Inl (Inl a) -> Inl a
          | Inl (Inr b) -> Inr (Inl b)
          | Inr c -> Inr (Inr c)),
  (function Inl a -> Inl (Inl a)
          | Inr (Inl b) -> Inl (Inr b)
          | Inr (Inr c) -> Inr c)

(* a * (b + c) = (a * b) + (a * c) *)
let conj_distrib : type a b c. (a * (b, c) union,
                               (a * b, a * c) union) equiv =
  (function a, Inl b -> Inl (a, b)
          | a, Inr c -> Inr (a, c)),
  (function Inl (a, b) -> a, Inl b
          | Inr (a, c) -> a, Inr c)

Finally, thanks to these primitive combinators, we can prove the lemmas we needed. Again, these are amusing little exercises.

let lemma0 : type a b c d. (a * b, c) equiv -> ((d * a) * b, d * c) equiv =
  fun e -> trans (conj_morphism e refl) conj_assoc

let lemma1 : type a b c d. ((a, b) union, d) equiv ->
  (((c, a) union, b) union, (c, d) union) equiv =
  fun e -> trans (disj_morphism e refl) disj_assoc

let lemma2 : type a c d u v. (a * c, u) equiv -> (a * d, v) equiv ->
  (a * (c, d) union, (u, v) union) equiv =
  fun e1 e2 -> trans (disj_morphism e2 e1) conj_distrib

let lemma3 : type a b c d e f. (a * b, c) equiv -> (d * b, e) equiv ->
((c, e) union, f) equiv -> ((a, d) union * b, f) equiv =
  fun e1 e2 e3 ->
    trans e3
      (trans (disj_morphism e2 e1)
         (trans (disj_morphism conj_comm conj_comm)
            (trans conj_distrib
               conj_comm)))

let lemma4 : type a b c d e. (a, c) equiv -> (b, d) equiv ->
  ((c, d) union, e) equiv -> ((a, b) union, e) equiv =
  fun e1 e2 e3 -> trans e3 (disj_morphism e2 e1)

let lemma5 : type a b c d e. (a, c) equiv ->
(b, d) equiv -> (c * d, e) equiv -> ((a * b), e) equiv =
  fun e1 e2 e3 -> trans e3 (conj_morphism e2 e1)

Note that I only needed the previous primitives to prove these lemmas (and as such to define my functions), so we can even make the type equiv abstract, provided that we are giving a complete set of primitives (which is not the case here). Although I’m not sure what it would buy us…

Anyway. That’s my solution! What’s yours?


by Matthias Puech at April 18, 2014 04:13 PM

GaGallium

The 6 parameters of (’a, ’b, ’c, ’d, ’e, ’f) format6

The infamous format6 type is the basis of the hackish but damn convenient, type-safe way in which OCaml handles format strings:

Printf.printf "%d) %s -> %f\n"
  3 "foo" 5.12

The first argument of printf in this example is not a string, but a format string (they share the same literal syntax, but have different type, and there is a small hack in the type-inference engine to make this possible). It's type, which you can get by asking ("%d) %s -> %f\n" : (_,_,_,_,_,_) format6) in a toplevel, is a mouthful (and I renamed the variables for better readability):

(int -> string -> float -> 'f, 'b, 'c, 'd, 'd, 'f) format6

What do those six arcane parameters mean? In the process of reviewing Benoît Vaugon's work on using GADTs to re-implement format functions, I've finally felt the need (after years of delighted oblivion) to understand each of those parameters. And that came after several days of hacking on easier-to-understand ten-parameters GADT functions; hopefully most of our readers will never need this information, but the probability that I need to refresh my memory in the future is strong enough for me to write this blog post.

The main take-away is that this type slowly evolved, from a more reasonable ('a, 'b, 'c) format type, to the monster that lurks in the documentation today. Advanced features of format strings needed more and more parameters to carry their type information. It is possible to implement format strings with less features and simpler types, and it may be possible to implement the same format strings with types that are easier to understand, but I need to understand the interface as it exists.

I built my own understanding, not by reading the documentation (though I tried that first), but by imagining layers of features, from the simple to the advanced. This is this post's approach. It does necessarily correspond to the actual chronological evolution of format types -- Pierre Weis would know about that.

At the beginning were printf and sprintf

# Printf.printf "%d) %s -> %f\n";;
- : int -> string -> float -> unit = <fun>
# Printf.sprintf "%d) %s -> %f\n";;
- : int -> string -> float -> string = <fun>

Those two functions differ by their return type: one prints and returns nothing (unit), the other returns a string. This parameter is stored in the sixth-parameter, 'f, and is called "result type for the printf-like functions" in the documentation.

Of course, the format string does not need to impose any particular value to this parameter: a given format string can be used by a function with any possible result type, hence the undetermined variable 'f in the format type above. It is the printf functions that force a precise instantiation of the parameter: Printf.printf has a type of the form ('a, ..., unit) format6 -> 'a.

The type 'a mentioned in the type of printf is a large function type, with one parameter for each conversion specification (%d, %s, etc.), and whose return type is enforced by the format to be equal to the last parameter of the format. In the case of "%d) %s -> %f\n", this type is int -> string -> float -> 'a, where 'a is the same variable present in the last parameter of the format -- the result type. This type depends only on the format string, not at all on the printing (or scanning) function used.

To summarize, in ('a, 'b, 'c, 'd, 'e, 'f) format6, we've seen that:

  • 'f is what you get after you pass all the parameters to a printf-like function
  • 'a is the big function type that expects all those parameters and returns 'f

'b and 'c for user-defined printers:

You may know about %a and the lesser-known (but actually simpler) %t, used as follow:

Printf.sprintf "%d) %a -> %f"
  3
  print_foo foo
  6.23

The idea is that instead of converting the value foo to a string first, you can, at printf time, pass a "user-defined printer" print_foo as an extra argument that tells how to convert foo into a string. This is easy enough to handle typing-wise: the format %a adds two parameters to the big function type, instead of just one.

In a non-existing ideal world, print_foo used in sprintf would have the convenient type foo -> string. However, a single %a must work with many printing function: for sprintf, the user-defined should return a string, for printf it should just print and return unit, with fprintf it needs to be passed the channel in which we're printing, or with bprintf the target buffer. So this determines two type parameters in ('a, 'b, 'c, 'd, 'e, 'f) format6:

  • 'b is the type of the extra information that user-defined printers may need (Buffer.t for bprintf, output_channel for fprintf, etc.)

  • 'c is the return type of the user-defined printers accepted by the function (string for sprintf, unit for most others).

There are two remarks. First, it happens that printf use out_channel as second parameter, while you would probably expect unit. This is strange API design, but the rationale is to let you reuse your user-defined printers for both printf and fprintf.

Second, you may remark that the 'c parameter looks suspiciously like "the result type of printf-lie functions", 'f. Indeed, all current printf-like functions use, to my knowledge, the same type for 'f and 'c. But it would be very reasonable, for performance reasons, to design a different version of sprintf that returns a string, but whose user-defined sub-printers put result in a Buffer.t parameter and thus just return unit.

Scanf is just a continuation away

Consider the difference between printing and scanning:

Printf.printf "%d) %s %f"
  3 "foo" 7.23;;

Scanf.scanf "%d) %s %f"
  (fun num name value -> ....)

When printing with "%d) %s %f", the user has to provide an int parameter, a string parameter, and a float parameter, hence the type of the form int -> string -> float -> .... With scanning, you have to give them to the user, which looks like a fairly different activity. How can you reuse the same typing of the format string?

Fortunately, expressing scanning with a continuation (the function (fun num name value -> ...)) allows precisely to reuse a type of the form int -> string -> float -> ... to describe scanning: just say that for any type 'f, if the user gives a int -> string -> float -> 'f, scanf will return a 'f.

In a simple world, the type of scanf would thus be of the following form:

('a, ..., 'f) format6 -> 'a -> 'f

(where it is understood that for any format string, 'a will actually be of the form t1 -> t2 -> ... -> 'f)

(Alternate design idea: have a new type parameter that stores, instead of a big arrow type, the product of all parameter types (int * string * float in our example), and make scanf return this directly instead of using a continuation-passing style. The present style has the property of being easier to implement with GADTs, but that probably wasn't a design consideration.)

In fact, the typing of scanf is a bit more complicated, as explained in the next section.

'd and 'e for user-defined readers

Just as user-defined printers, to write formats in a more high-level style it is possible to provide user-defined scanners with %r:

let read_foo scanbuf = ...

Scanf.scanf "%d) %r -> %f"
  read_foo
  (fun n foo x -> ...)

Now the typing implications of this for format strings are a bit tricky. As we read three values of type int, foo and float, the naive typing of scanf proposed earlier

('a, ..., 'f) -> ('a -> 'f)

would specialize to, in this instance:

(int -> foo -> float -> 'f, ..., 'f) format6
 -> ((int -> foo -> float -> 'f) -> 'f)

which is not what we want, as there is no room for the extra read_foo parameter.

The argument expected by scanf after the format string is not just the continuation anymore, we need an extra argument for each user-defined reader. Yet those extra arguments should not appear in the type of the continuation.

What we need to do is to express the relation between the simple type 'a -> 'f (from continuation to result), which is the return type of scanf fmt in absence of user-defined readers, and the real return type of scanf. This is what our last two parameters do:

  • the second-last parameter, 'e, is forced by scanf functions to be equal to 'a -> 'f

  • the parameter 'd is the actual type of scanf fmt

  • the typing of format strings will generate constraints on the relation between 'd and 'e, depending on the number of %r present in the format; without any %r they are equal.

Consider our running example without any %r:

# ( "%d) %s -> %f" : (_, _, _, _, _, _) format6 );;
- : (int -> string -> float -> 'f,
     'b, 'c, 'd, 'd, 'f) format6

The two parameters I'm talking about are constrained to be equal to the same variables 'd. On the contrary, if we use %r instead of %s:

# ( "%d) %r -> %f" : (_,_,_,_,_,_) format6 );;
- : (int -> 'x -> float -> 'f,
     'b, 'c, ('b -> 'x) -> 'e, 'e, 'f) format6

Suddenly they are distinct, with one equal to the unknown 'e (that will be forced to equal (... -> 'f) -> 'f when applying scanf), and the other is ('b -> 'x) -> 'e: it adds on top of 'e an extra parameter (the user-defined format) which must return a 'x (if I pass the reader read_foo, this will be equated to foo) and takes as input a 'b, which is the type of the extra parameter of user-defined printers or readers (scanf will set it to Scanning.in_channel).

Benoît's GADT declarations are surprisingly efficient at showing you precisely what happens to the type parameters when you add a new conversion description to an existing format string. In particular, the constructor for user-defined readers reads as follow:

| Reader :                                                 (* %r *)
    ('a, 'b, 'c, 'd, 'e, 'f) fmt ->
      ('x -> 'a, 'b, 'c, ('b -> 'x) -> 'd, 'e, 'f) fmt

You can very well see that an argument of type 'x (the result type of the user-defined result) is added to the parameter 'a (the type of the continuation finally passed to scanf), while an argument of type ('b -> 'x) is added to 'd (the type of the arguments following the format string).

.

PS: Extra credit for the bonus question: why do I use (fmt : (_, _, _, _, _, _) format6) instead of the more convenient to write format_of_string fmt?

by Gabriel Scherer at April 18, 2014 08:00 AM

April 15, 2014

Matthias Puech

Disjunctive normal forms in big steps

This is probably a second-semester functional programming exercise, but I found it surprisingly hard, and could not find a solution online. So at the risk of depriving a TA from a problem for its mid-term exam, here is my take on it, that I painfully put together yesterday.

Given a formula built out of conjunction, disjunction and atoms, return its disjunctive normal form, in big step or natural semantics, that is, not applying repetitively the distributivity and associativity rules, but in a single function run. Before you go any further, please give it a try and send me your solution!

Formulas are described by the type form:

type atom = int

type form =
  | X of atom
  | And of form * form
  | Or of form * form

To ensure the correctness of the result, I represent disjunctive normal form by a restriction of this type, disj, by stratifying it into conjunctions and disjunctions:

type conj = X of atom | And of atom * conj
type disj = Conj of conj | Or of conj * disj

There are actually two restrictions at stake here: first, conjunctions cannot contain disjunctions, and second, all operators are necessarily right-associative. Constructor Conj is just a logically silent coercion. If you look carefully enough, you will notice that conj (resp. disj) is isomorphic to a non-empty list of atoms (resp. conj).

The first step is to lift the second restriction (associativity), and prove that we can always build a conjunction of conj, resp. a disjunction of disj. Easy enough if you think about lists: these functions look like concatenation.

let rec conj : conj -> conj -> conj = fun xs ys -> match xs with
  | X x -> And (x, ys)
  | And (x, xs) -> And (x, conj xs ys)

let rec disj : disj -> disj -> disj = fun xs ys -> match xs with
  | Conj c -> Or (c, ys)
  | Or (x, xs) -> Or (x, disj xs ys)

Then, we will lift the second restriction, using distributivity. We must show that it is always possible to form a conjunction. First, we show how to build the conjunction of a conj and a disj:

let rec map : conj -> disj -> disj = fun x -> function
  | Conj y -> Conj (conj x y)
  | Or (y, ys) -> Or (conj x y, map x ys)

The first case is trivial, the second reads as the distributivity of conjunction over disjunction. By analogy to lists again, I called this function map because it maps function conj x to all cells of the list.

Next, we show how to build the conjunction of two disj:

let rec cart : disj -> disj -> disj = fun xs ys -> match xs with
  | Conj c -> map c ys
  | Or (x, xs) -> disj (map x ys) (cart xs ys)

Considering the meaning of the previously defined functions, the first case is trivial, and the second, again, reads as distributivity, only in the other direction. I called this function cart because it computes the cartesian product of the two “lists” passed as arguments (only on non-empty lists).

Now we can easily put together the final function computing the DNF:

let rec dnf : form -> disj = function
  | X x -> Conj (X x)
  | Or (a, b) -> disj (dnf a) (dnf b)
  | And (a, b) -> cart (dnf a) (dnf b)

It is easy to see that all function terminate: they are primitive recursive.

Wait, let’s not forget to test our contraption:

let () =
  assert (dnf (Or (And (X 1, X 2), X 3)) =
          Or (And (1, X 2), Conj (X 3)));
  assert (dnf (And (Or (X 1, X 2), X 3)) =
          Or (And (1, X 3), Conj (And (2, X 3))));
  assert (dnf (And (Or (And (X 0, X 1), X 2), And (X 3, X 4))) =
          Or (And (0, And (1, And (3, X 4))), Conj (And (2, And (3, X 4)))))

That’s my solution. Reader, is there another one? Is there a better explanation, for instance based on Danvy’s small-step to big-step derivation? I believe there is…

Supplementary exercise

Technically, there still could be bugs in this code. Prove that it is correct wrt. the small-step rewrite rules, using only OCaml and GADTs. Here is the beginning of an idea: annotate form, conj and disj with their meaning in terms of OCaml types:

type ('a, 'b) union = Inl of 'a | Inr of 'b

type 'a atom = int

type 'a form =
  | X : 'a atom -> 'a form
  | And : 'a form * 'b form -> ('a * 'b) form
  | Or : 'a form * 'b form -> ('a, 'b) union form

type 'a conj =
  | X : 'a atom -> 'a conj
  | And : 'a atom * 'b conj -> ('a * 'b) conj

type 'a disj =
  | Conj : 'a conj -> 'a disj
  | Or : 'a conj * 'b disj -> ('a, 'b) union disj

(the definition of union is irrelevant here), state the relation between equivalent types as a type:

  type ('a, 'b) equiv =
    | Refl : ('a, 'a) equiv
    | Trans : ('a, 'b) equiv * ('b, 'c) equiv -> ('a, 'c) equiv
    | AssocA : (('a * 'b) * 'c, 'a * ('b * 'c)) equiv
    | AssocO : ((('a, 'b) union, 'c) union, ('a, ('b, 'c) union) union) equiv
    | DistribL : ('a * ('b, 'c) union, ('a, 'b) union * ('a, 'c) union) equiv
    | DistribR : (('b, 'c) union * 'a, ('b, 'a) union * ('c, 'a) union) equiv

pack up a solution as an existential: an equivalence proof together with a DNF:

type 'a dnf = Dnf : ('a, 'b) equiv * 'b disj -> 'a dnf

and code a function:

let dnf : type a. a form -> a dnf =
  function _ -> (assert false)   (* TODO *)

Ok fair enough, it’s not an exercise, it’s something I haven’t been able to put together yet ;)


by Matthias Puech at April 15, 2014 08:14 PM

OCamlPro

OCamlPro Highlights: March 2014

Here is a short report of some of our activities in March 2014 !

Welcome Thomas

First, we would like to welcome Thomas Blanc on board ! Thomas is starting a PhD at OCamlPro, with Michel Mauny from ENSTA as his PhD advisor. So, he will stay with us for three years, working on static analysis of whole OCaml programs, with two main objectives:

  • To detect uncaught exceptions, using a different approach than ocamlexc, the tool that François Pessaux developed during his PhD thesis, a method that will make Thomas' tool easier to upgrade at each new OCaml version;

  • To optimize generated code, with the benefits of whole program knowledge;

OPAM Improvements

OPAM has seen its share of code quality improvements during this month, with a rewritten parser for OPAM description files, now reporting error locations, better handling of some corner cases, more expressive install requests from the command-line, and a more friendly way of handling pinned packages. We are now stabilizing these features and you should expect soon a Beta release of OPAM 1.2.0.

Binary Release of OCaml in OPAM

We also spent some time this month working on using OPAM to distribute binary releases of OCaml. OPAM is great to use, but it sometimes takes a lot of time to compile OCaml when a new switch is needed for some small experimentation. We decided it would be interesting to experiment with a binary package of OCaml (4.01.0 for now).

The results are quite interesting, in most of our tests, downloading the compiler binary archive and uncompressing it takes an order of magnitude less time than compiling from sources, even with a slow connection:

OCaml 4.01.0 installation time on OPAM
From Sources Binary DownloadBinary Install
Intel i7 2.10GHz, Linux 64 bits, 16 GB RAM, SSD 238s 220s (DSL) 8s
Intel i7 2.60GHz, Linux 64 bits, 8 GB, SSD 253s 22s (Cable) 9s
Intel i5 2.67GHz, Linux 64 bits, 2 GB, SATA 289s 65s (Wifi) 26s
Intel Core2 2.53 GHz, Linux 32 bits, 4 GB, SSD 402s 68s (Wifi) 40s
Intel Xeon 2.67GHz, Linux 64 bits, 8 GB, SATA 289s
Intel core2 3.00 Ghz, Linux 32 bits, 4 GB, SATA 258s 15s (Cable) 7s

You can try our compiler on your system (on Intel Linux only for now, but we plan to add binary versions for other architectures):

opam switch 4.01.0+bin-ocp

So now, the fastest way to get OCaml running on a computer is downloading OPAM and calling:

opam init --comp 4.01.0+bin-ocp

Another great thing with this compiler is that the binary archive is kept in a cache (in ~/.opam/ocp-compiler-cache/), so that creating more switches for this compiler is almost cost free. For example, you can try:

opam switch 4.01.0+test+my+soft --switch 4.01.0+bin-ocp

If you have an Intel Linux computer, and you get any problem with this distribution, you should send us the output of OPAM and we will try to fix the problem ! Unfortunately for CentOS users, we had to choose a recent libc, so that it will probably not run on this system.

Solvers in the Cloud for OPAM

In some cases, OPAM's internal heuristic is not enough -- upgrades are a np-complete problem after all. For this reason, OPAM has been thought from the beginning to be able to use an external solver, using the CUDF format to interoperate with solvers. Unfortunately, depending on your system, such a solver, like aspcud, may not be available or easy to install.

So last month, we worked with the Mancoosi team to set up on one of their servers a CUDF solver farm, to work around this issue. Following the instructions on the site:

http://cudf-solvers.irill.org/index.html

you can setup your OPAM configuration to use a remote CUDF solver, that will usually propose better solutions than the one provided internally by OPAM.

A bit "hackish" way of using it immediatly is to copy the following script in a file aspcud in your path:

#!/bin/bash
SERVER=cudf-solvers.irill.org
if bzip2 -c $1 | curl -f --data-binary @- http://$SERVER/cudf.bz2?criteria="$3" | bunzip2 -c > $2;
then exit 0
else echo FAIL > $2
fi

The Alt-Ergo SMT Solver

In the context of the Bware project, where we are working on improving automatic solvers for Atelier B, we submitted a short paper to the ABZ conference describing the improvements we made in Alt-Ergo to increase its success rate on formulas translated from B proof obligations. Our paper has been accepted for publication! We will present it at ABZ 2014, which will be held in Toulouse (France) in the beginning of June.

In the Scilab Corner

We are pursuing our quest for a JIT for Scilab. Scilab really is a dynamic scripting language, in terms of typing, scoping, constructs and of how its users consider it. For most of them, it is basically an advanced, scriptable calculator. A common usecase is to pause in the middle of a function, open a toplevel at this point to edit some local variable or open a graphical visualisation, and then resume the execution.

In this context, we decided to adopt a really pragmatic approach, which is to detect statement-level code patterns that we know to be JITable, instead of trying to JIT whole functions. This way, we'll be able to leave the intro and outro of functions (which may do whatever they want to control and scope) to the interpreter, and concentrate on the inner, performance critical loop.

So, this month, we worked on several tasks required to achieve this goal. First, we worked on the detection of code snipplets, and as a side effect of this work, we plan to provide Scilab users with a new syntax-aware grep-like tool. We also worked on how to customize Scilab's loading function to replace some parts of the code by foreign calls to the future JIT. We also worked on the introspection of Scilab's context to build the environment of the JIT's type system.

by Louis Gesbert (louis.gesbert@ocamlpro.com) at April 15, 2014 12:00 AM

April 11, 2014

Ocsigen project

Ocsigen Js_of_ocaml 2.0

We are happy to announce release 2.0 of Js_of_ocaml, the compiler from OCaml bytecode to JavaScript.

A lot of efforts has been put in reducing the size of the generated JavaScript code. Much shorter variable names are used; unnecessary whitespaces and semicolons are omitted; multiple occurrences of a same constant are shared... The runtime is minified as well. You can expect a space reduction of 15% to 20%.

Recursive modules are now supported. Tail calls between mutually recursive functions are optimized (using trampolines). In particular, lexers produced by ocamllex are now properly optimized.

The runtime now simulates a small filesystem (in memory), which makes it possible to use the OCaml I/O functions. The standard outputs are by default redirected to the JavaScript console, which is convenient for debugging.

A larger part of the OCaml libraries are supported: bigarrays, the time-related functions of the Unix library.

A number of incompatible changes have been made. In particular:

  • JavaScript numbers are simply given type 'float' rather than type 'float Js.t';
  • the compiler generates "strict mode" JavaScript; therefore, 'Js.Unsafe.variable "this"' does not refer to the JavaScript global object ("window" in browsers) anymore; you can use 'Js.Unsafe.global' instead;
  • runtime primitives are now wrapped together with the generated code in a huge function so as not to pollute the global scope.

LINKS

Regards,

Jérôme Vouillon and Hugo Heuzard

by Ocsigen team at April 11, 2014 02:13 PM

April 10, 2014

LexiFi

Inlined records in constructors

I'd like to introduce a new language feature, inlined record arguments on constructors, which I propose for inclusion in OCaml. In a nutshell, it allows you to define a record directly within a sum type constructor:

1
2
3
  type t =
     | A of { x : int; y: string }
     | ...

The argument of the constructor is a full-fledged record type. All features of records are available: dot notation, mutable fields, polymorphic fields, punning syntax, record override. You can write:

read more

by Alain Frisch at April 10, 2014 11:52 AM