OCaml Planet

June 22, 2016

OCamlCore Forge News

Camlzip 1.06 released

Camlzip is a library to create and access compressed files in Zip and Gzip formats. This release fixes incompatibilities with OCaml 4.03 and adheres to "safe string" mode.

by Xavier Leroy at June 22, 2016 04:06 PM

June 17, 2016

Shayne Fletcher

Generic mappings over pairs

Browsing around on Oleg Kiselyov's excellent site, I came across a very interesting paper about "Advanced Polymorphism in Simpler-Typed Languages". One of the neat examples I'm about to present is concerned with expressing mappings over pairs that are generic not only in the datatypes involved but also over the number of arguments. The idea is to produce a family of functions $pair\_map_{i}$ such that


pair_map_1 f g (x, y) (x&apos, y&apos) → (f x, g y)
pair_map_2 f g (x, y) (x&apos, y&apos) → (f x x&apos, g y y&apos)
pair_map_3 f g (x, y) (x&apos, y&apos) (x&apos&apos, y&apos&apos, z&apos&apos) → (f x x&apos x&apos&apos, g y y&apos y&apos&apos)
.
.
.
The technique used to achieve this brings a whole bunch of functional programming ideas together : higher order functions, combinators and continuation passing style (and also leads into topics like the "value restriction" typing rules in the Hindley-Milner system).

let ( ** ) app k = fun x y -> k (app x y)
let pc k a b = k (a, b)
let papp (f1, f2) (x1, x2) = (f1 x1, f2 x2)
let pu x = x
With the above definitions, $pair\_map_{i}$ is generated like so.

(*The argument [f] in the below is for the sake of value restriction*)
let pair_map_1 f = pc (papp ** pu) (f : α -> β)
let pair_map_2 f = pc (papp ** papp ** pu) (f : α -> β -> γ)
let pair_map_3 f = pc (papp ** papp ** papp ** pu) (f : α -> β -> γ -> δ)
For example,

# pair_map_2 ( + ) ( - ) (1, 2) (3, 4) ;;
- : int * int = (4, -2)

Reverse engineering how this works requires a bit of algebra.

Let's tackle $pair\_map_{1}$. First


pc (papp ** pu) = (λk f g. k (f, g)) (papp ** pu) = λf g. (papp ** pu) (f, g)
and,

papp ** pu = λx y. pu (papp x y) = λx y. papp x y
so,

λf g. (papp ** pu) (f, g) =
λf g. (λ(a, b) (x, y). (a x, b y)) (f, g) =
λf g (x, y). (f x, g y)
that is, pair_map_1 = pc (papp ** pu) = λf g (x, y). (f x, g y) and, we can read the type off from that last equation as (α → β) → (γ → δ) → α * γ → β * δ.

Now for $pair\_map_{2}$. We have


pc (papp ** papp ** pu) =
(λk f g. k (f, g)) (papp ** papp ** pu) =
λf g. (papp ** papp ** pu) (f, g)
where,

papp ** papp ** pu = papp ** (papp ** pu) =
papp ** (λa' b'. pu (papp a' b')) =
papp ** (λa' b'. papp a' b') =
λa b. (λa' b'. papp a' b') (papp a b)
which means,

pc (papp ** papp ** pu) =
λf g. (papp ** papp ** pu) (f, g) =
λf g. (λa b.(λa' b'. papp a' b') (papp a b)) (f, g) =
λf g. (λb. (λa' b'. papp a' b') (papp (f, g) b)) =
λf g. λ(x, y). λa' b'. (papp a' b') (papp (f, g) (x, y)) =
λf g. λ(x, y). λa' b'. (papp a' b') (f x, g y) =
λf g. λ(x, y). λb'. papp (f x, g y) b' =
λf g. λ(x, y). λ(x', y'). papp (f x, g y) (x', y') =
λf g (x, y) (x', y'). (f x x', g y y')
that is, a function in two binary functions and two pairs as we expect. Phew! The type in this instance is (α → β → γ) → (δ → ε → ζ) → α * δ → β * ε → γ * ζ.

To finish off, here's the program transliterated into C++(14).


#include <utility>
#include <iostream>

//let pu x = x
auto pu = [](auto x) { return x; };

//let ( ** ) app k = fun x y -> k (app x y)
template <class F, class K>
auto operator ^ (F app, K k) {
return [=](auto x) {
return [=] (auto y) {
return k ((app (x)) (y));
};
};
}

//let pc k a b = k (a, b)
auto pc = [](auto k) {
return [=](auto a) {
return [=](auto b) {
return k (std::make_pair (a, b)); };
};
};

//let papp (f, g) (x, y) = (f x, g y)
auto papp = [](auto f) {
return [=](auto x) {
return std::make_pair (f.first (x.first), f.second (x.second)); };
};

int main () {

auto pair = &std::make_pair<int, int>;

{
auto succ= [](int x){ return x + 1; };
auto pred= [](int x){ return x - 1; };
auto p = (pc (papp ^ pu)) (succ) (pred) (pair (1, 2));
std::cout << p.first << ", " << p.second << std::endl;
}

{
auto add = [](int x) { return [=](int y) { return x + y; }; };
auto sub = [](int x) { return [=](int y) { return x - y; }; };
auto p = pc (papp ^ papp ^ pu) (add) (sub) (pair(1, 2)) (pair (3, 4));
std::cout << p.first << ", " << p.second << std::endl;
}

return 0;
}

by Shayne Fletcher (noreply@blogger.com) at June 17, 2016 07:39 PM

June 15, 2016

Andrej Bauer

The real numbers in homotopy type theory (CCA 2016 slides)

I am about to give an invited talk at the Computability and Complexity in Analysis 2016 conference (yes, I am in the south of Portugal, surrounded by loud English tourists, but we are working here, in a basement no less). Here are the slides, with extensive speaker notes, comment and questions are welcome.

Slides: hott-reals-cca2016.pdf

by Andrej Bauer at June 15, 2016 10:03 AM

June 01, 2016

Functional Jobs

CTO (Haskell/Clojure) at Capital Match (Full-time)

Overview

Capital Match (www.capital-match.com) is a leading peer-to-peer lending platform in Singapore. Our in-house platform, mostly developed in Haskell, has in the last year seen more than USD 6 million business loans processed with a strong monthly growth (current rate of USD 1-2 million monthly).

We have just secured another funding round to build a world-class technology as the key business differentiator. The key components include credit risk engine, seamless banking integration and end-to-end product automation from loan origination to debt collection.

Responsibilities

We are looking to hire an experienced software engineer (min. 5-8 years coding experience) with a team lead and product management experience to lead the efforts. The current tech team includes a product manager and 3 software engineers. CTO would directly report to CEO and the Board of Directors.

The candidate should have been involved in a development of multiple web-based products from scratch. He should be interested in all aspects of the creation, growth and operations of a secure web-based platform: front-to-back features development, distributed deployment and automation in the cloud, build and test automation etc.

Background in fintech and especially lending space would be a great advantage.

Requirements

Our platform is primarily developed in Haskell with an Om/ClojureScript frontend. We are expecting our candidate to have experience working with a functional programming language e.g. Haskell/Scala/OCaml/F#/Clojure/Lisp/Erlang.

Deployment and production is managed with Docker containers using standard cloud infrastructure so familiarity with Linux systems, command-line environment and cloud-based deployment is mandatory. Minimum exposure to and understanding of XP practices (TDD, CI, Emergent Design, Refactoring, Peer review and programming, Continuous improvement) is expected.

We are looking for candidates that are living in or are willing to relocate to Singapore.

Offer

We offer a combination of salary and equity depending on experience and skills:

Salary: USD 6,000-8,000 / month

Equity: ca. 1% (subject to vesting)

European citizens who relocate to Singapore do not have to pay their home country taxes and the local tax rate in Singapore is more or less 5% (effective on the proposed salary range). Visa sponsorship will be provided.

Singapore is a great place to live, a vibrant city rich with diverse cultures, a very strong financial sector and a central location in Southeast Asia.

Get information on how to apply for this position.

June 01, 2016 10:04 AM

Software Engineer (Haskell/Clojure) at Capital Match (Full-time)

Overview

Capital Match (www.capital-match.com) is a leading peer-to-peer lending platform in Singapore. Our in-house platform, mostly developed in Haskell, has in the last year seen more than USD 6 million business loans processed with a strong monthly growth (current rate of USD 1-2 million monthly).

We have just secured another funding round to build a world-class technology as the key business differentiator. The key components include credit risk engine, seamless banking integration and end-to-end product automation from loan origination to debt collection.

Responsibilities

We are looking to hire a software engineer with a minimum of 2-3 years coding experience. The current tech team includes a product manager and 3 software engineers. We are currently also in the process of hiring CTO.

The candidate should have been involved in a development of multiple web-based products from scratch. He should be interested in all aspects of the creation, growth and operations of a secure web-based platform: front-to-back features development, distributed deployment and automation in the cloud, build and test automation etc.

Background in fintech and especially lending space would be a great advantage.

Requirements

Our platform is primarily developed in Haskell with an Om/ClojureScript frontend. We are expecting our candidate to have experience working with a functional programming language e.g. Haskell/Scala/OCaml/F#/Clojure/Lisp/Erlang.

Deployment and production is managed with Docker containers using standard cloud infrastructure so familiarity with Linux systems, command-line environment and cloud-based deployment is mandatory. Minimum exposure to and understanding of XP practices (TDD, CI, Emergent Design, Refactoring, Peer review and programming, Continuous improvement) is expected.

We are looking for candidates that are living in or are willing to relocate to Singapore.

Offer

We offer a combination of salary and equity depending on experience and skills:

Salary: USD 3,500-5,000 / month

Equity: ca. 0.5% (subject to vesting)

European citizens who relocate to Singapore do not have to pay their home country taxes and the local tax rate in Singapore is more or less 5% (effective on the proposed salary range). Visa sponsorship will be provided.

Singapore is a great place to live, a vibrant city rich with diverse cultures, a very strong financial sector and a central location in Southeast Asia.

Get information on how to apply for this position.

June 01, 2016 10:04 AM

May 20, 2016

OCaml Labs compiler hacking

Thirteenth OCaml compiler hacking evening at Pembroke College

It's Spring, and time for the thirteenth Cambridge OCaml compiler-hacking evening! We'll be back in Pembroke College again, in the centre of Cambridge.

Pembroke

If you're planning to come along, it'd be helpful if you could indicate interest via Doodle and sign up to the mailing list to receive updates.

Where: N7, Pembroke College, Cambridge CB2 1RF. Head through the entrance on Trumpington Street, and turn left straight after the Porters Lodge.

When: 7pm, Tuesday 31st May.

Who: anyone interested in improving OCaml. Knowledge of OCaml programming will obviously be helpful, but prior experience of working on OCaml internals isn't necessary.

What: fixing bugs, implementing new features, learning about OCaml internals

Wiki: https://github.com/ocamllabs/compiler-hacking/wiki

We're defining "compiler" pretty broadly, to include anything that's part of the standard distribution, which means at least the standard library, runtime, tools (ocamldep, ocamllex, ocamlyacc, etc.), the debugger, the documentation, and the compiler itself. We'll have suggestions for mini-projects for various levels of experience, but feel free to come along and work on whatever you fancy.

We've arranged a self-service buttery dinner at 6:30pm - please head to the buttery when you arrive and help yourself to the food choices available. You can then bring your dinner to the Old Library (just next door) to sit down and eat.

by OCaml Labs (cl-ocamllabs@lists.cam.ac.uk) at May 20, 2016 10:30 PM

May 13, 2016

Marc Simpson

Tail call optimisation in (g)awk

Tail call optimisation in (g)awk

# May 13, 2016

or “Wait, what? Tail call optimisation in awk?”

Overview

This post covers tail call optimisation (TCO) behaviour in three common awk implementations1: gawk, mawk and nawk (AKA the one true awk).

None of the three implement full TCO, while gawk alone provides self-TCO. The bulk of this post will therefore be devoted to gawk’s implementation and related pitfalls.

Initial observations

Let’s begin with a simple awk script that defines a single function, recur, called from the BEGIN block:

$ nawk 'function recur() {return recur()} BEGIN {recur()}'
Segmentation fault: 11
$ mawk 'function recur() {return recur()} BEGIN {recur()}'
Segmentation fault: 11
$ gawk 'function recur() {return recur()} BEGIN {recur()}'
# ...runs indefinitely [?]...

Note the difference in behaviour here: nawk and mawk blow the stack and segfault while gawk cheerily continues running. Thanks gawk.

But wait! Gawk is actually dynamically allocating additional stack frames—so long as there’s memory (and swap) to consume, gawk will gobble it up and our script will plod on. Below, the first 30 seconds of (virtual) memory consumption are charted2:

Whoops!

The gawk optimiser

In order to obtain (self-)TCO and spare your poor swap partition, gawk provides the -O switch,

$ gawk -O 'function foo() {return recur()} BEGIN {recur()}'
# ...runs indefinitely; air conditioning no longer required...

and lo and behold,

Doubling down

What about full TCO? Let’s expand our one liner a little to include a trampoline call:

$ gawk -O 'function go() {return to()} function to() {return go()} BEGIN {go()}'

and chart memory consumption again,

Bugger. So, it looks like gawk isn’t keen on full blown TCO. Time to find out why.

The secret sauce

We’ve just seen that gawk seems to optimise self-calls in tail position when the -O flag is specified. To better understand this functionality we can dump opcodes from the trampoline case and take a look under the hood:

$ echo 'function go() {return to()} function to() {return go()} BEGIN {go()}' > /tmp/trampoline.awk
$ gawk --debug -O -f /tmp/trampoline.awk
gawk> dump

	# BEGIN

[     1:0x7fc00bd022e0] Op_rule             : [in_rule = BEGIN] [source_file = /tmp/trampoline.awk]
[     1:0x7fc00bd02360] Op_func_call        : [func_name = go] [arg_count = 0]
[      :0x7fc00c800f60] Op_pop              :
[      :0x7fc00c800e20] Op_no_op            :
[      :0x7fc00c800ea0] Op_atexit           :
[      :0x7fc00c800f80] Op_stop             :
[      :0x7fc00c800e60] Op_no_op            :
[      :0x7fc00bd01e00] Op_after_beginfile  :
[      :0x7fc00c800e40] Op_no_op            :
[      :0x7fc00c800e80] Op_after_endfile    :

	# Function: go ()

[     1:0x7fc00bd01f20] Op_func             : [param_cnt = 0] [source_file = /tmp/trampoline.awk]
[     1:0x7fc00bd020a0] Op_func_call        : [func_name = to] [arg_count = 0]
[     1:0x7fc00bd01fb0] Op_K_return         :
[      :0x7fc00c800ee0] Op_push_i           : Nnull_string [MALLOC|STRING|STRCUR|NUMCUR|NUMBER]
[      :0x7fc00c800f00] Op_K_return         :

	# Function: to ()

[     1:0x7fc00bd02130] Op_func             : [param_cnt = 0] [source_file = /tmp/trampoline.awk]
[     1:0x7fc00bd02270] Op_func_call        : [func_name = go] [arg_count = 0]
[     1:0x7fc00bd021f0] Op_K_return         :
[      :0x7fc00c800f20] Op_push_i           : Nnull_string [MALLOC|STRING|STRCUR|NUMCUR|NUMBER]
[      :0x7fc00c800f40] Op_K_return         :

Note the lack of a distinct jump or tailcall opcode; instead, even with the optimiser turned on, go and to are performing Op_func_calls. Hmm, okay; we’ll see a different opcode in our original recur case, though, right? Wrong:

$ echo 'function recur() {return recur()} BEGIN {recur()}' > /tmp/recur.awk
$ gawk --debug -O -f /tmp/recur.awk
gawk> dump

	# BEGIN

[     1:0x7fc1d0408ef0] Op_rule             : [in_rule = BEGIN] [source_file = /tmp/recur.awk]
[     1:0x7fc1d0408f80] Op_func_call        : [func_name = recur] [arg_count = 0]
[      :0x7fc1d0802120] Op_pop              :
[      :0x7fc1d0802020] Op_no_op            :
[      :0x7fc1d08020a0] Op_atexit           :
[      :0x7fc1d0802140] Op_stop             :
[      :0x7fc1d0802060] Op_no_op            :
[      :0x7fc1d0408bc0] Op_after_beginfile  :
[      :0x7fc1d0802040] Op_no_op            :
[      :0x7fc1d0802080] Op_after_endfile    :

	# Function: recur ()

[     1:0x7fc1d0408ce0] Op_func             : [param_cnt = 0] [source_file = /tmp/recur.awk]
[     1:0x7fc1d0408e60] Op_func_call        : [func_name = recur] [arg_count = 0]
[     1:0x7fc1d0408d70] Op_K_return         :
[      :0x7fc1d08020e0] Op_push_i           : Nnull_string [MALLOC|STRING|STRCUR|NUMCUR|NUMBER]
[      :0x7fc1d0802100] Op_K_return         :

¯\_(ツ)_/¯

Time to dig around gawk’s grammar definition. Here’s return, defined in awkgram.y:

| LEX_RETURN
  {
    if (! in_function)
        yyerror(_("`return' used outside function context"));
  } opt_exp statement_term {
    if ($3 == NULL) {
        $$ = list_create($1);
        (void) list_prepend($$, instruction(Op_push_i));
        $$->nexti->memory = dupnode(Nnull_string);
    } else {
        if (do_optimize
            && $3->lasti->opcode == Op_func_call
            && strcmp($3->lasti->func_name, in_function) == 0
        ) {
            /* Do tail recursion optimization. Tail
             * call without a return value is recognized
             * in mk_function().
             */
            ($3->lasti + 1)->tail_call = true;
        }

        $$ = list_append($3, $1);
    }
    $$ = add_pending_comment($$);
  }

Take a closer look at the code following that comment:

if (do_optimize
  && $3->lasti->opcode == Op_func_call
  && strcmp($3->lasti->func_name, in_function) == 0
) { /* ... */
  ($3->lasti + 1)->tail_call = true; /* <--- */
}

In other words, during a return gawk:

  1. Checks whether the do_optimize flag (-O) is specified.
  2. If so, it checks whether the previous instruction is an Op_func_call.
  3. If that call is to a function with the same name as the current one,
  4. …the tail_call flag is set.

So it goes.

At last, a conclusion

Here’re a few takeaways from the above3:

  • Don’t rely on TCO if you’re writing awk.
  • Just don’t.
  • If you do need TCO, make sure you’re using gawk
    • Be sure to specify the -O flag otherwise you’ll need to buy a new fan,
    • and make sure you’re not trampolining as the optimiser won’t be of any help.

Personally, I’ll be sticking with nawk.


  1. Probably the most common.

  2. Output drawn from ps

  3. YMMV


<script type="text/javascript"> var disqus_shortname = '0branch'; // required: replace example with your forum shortname /* * * DON'T EDIT BELOW THIS LINE * * */ (function() { var dsq = document.createElement('script'); dsq.type = 'text/javascript'; dsq.async = true; dsq.src = 'http://' + disqus_shortname + '.disqus.com/embed.js'; (document.getElementsByTagName('head')[0] || document.getElementsByTagName('body')[0]).appendChild(dsq); })(); </script> <noscript>Please enable JavaScript to view the comments powered by Disqus.</noscript> comments powered by Disqus

by Marc Simpson at May 13, 2016 11:20 AM