OCaml Planet

July 31, 2014

Jane Street

The ML Workshop looks fantastic

I'm a little biased, by being on the steering committee, but this year's ML workshop looks really interesting. Here's a link to the program:

http://okmij.org/ftp/ML/ML14.html

It has a bunch of interesting papers, including work on cleaning up and simplifying first-class modules, type-level module aliases, new approaches in type-directed test generation, a couple of papers on implicits (how they've worked out in Scala, and a proposed approach in OCaml), and more.

The OCaml users-and-developers meeting is also looking good, though for me there's a clear favorite among the presentations: OCaml Labs' announcement of version 1 of the OCaml Platform. I think that's a pretty important milestone for the language.

All told, it's shaping up to be a good ICFP.

by Yaron Minsky at July 31, 2014 02:59 PM

Functional Jobs

Software Engineer (OCaml) at Cryptosense (Full-time)

About Cryptosense

Cryptosense is a Paris-based start-up producing vulnerability analysis software for cryptographic interfaces. We program in OCaml with occasional resort to other languages in order to communicate with various crypto platforms. Our technology is based on research at the French institute for computer science (INRIA) and uses techniques from formal analysis (model-checking) as well as smart fuzzing and symbolic machine learning.

About the Post

We're looking for experienced OCaml programmers to join our development team. Your experience need not be industry - a masters or PhD thesis project with a solid implementation, preferably as part of a larger project, could be fine. You should have a “security mentality” and a keen eye for vulnerabilities as well as a passion for quality in software products. Your skills should include several of:

  • strong knowledge of data structures, algorithms, and designing for performance, scalability and security;
  • cryptography, including attacks on cryptographic algorithms by e.g. padding oracles;
  • formal logic;
  • compilers and language design;
  • cryptographic hardware (e.g. smartcards, HSMs).

You should enjoy working in a culture of ownership, mutual respect and collaboration. We just signed our first investment round, and can offer a competitive salary as well as stock options.

Get information on how to apply for this position.

July 31, 2014 01:36 PM

July 30, 2014

Functional Jobs

Generalist Engineer at Esper (Full-time)

What You’ll Do:

  • Work in a small team to brainstorm, design, prototype, plan, execute, and ship out our latest and greatest features.
  • Own a feature within Esper.
  • Own release and QA cycles of your code.
  • Be able to quickly identify a problem, and debug someone else’s code, if need be.

What You’ll Bring:

  • BS in computer science or equivalent work experience.
  • Strong functional and objective-oriented programming and debugging skills.
  • Experience developing native mobile apps is preferred (Java or Obj-C), but not required

Working With Us:

  • We’re a startup. Startups, by their nature, are based on speed and effort. You will not have a “typical” job - no 9 to 5, no stupid meetings, no layers, and no TPS reports. Our team works hard. But if you enjoy your work, team trips, and great food, it’s not really work.
  • We’re a proud functional programming (i.e. OCaml) company, but we also use javascript, java, and Obj-C when necessary.
  • We are inquisitive. Outwardly, we are engineers, designers, and product people. Inwardly, we could be airplane designers, mountain climbers, renowned singers, power cyclists, or any number of extraordinary beings.
  • We believe healthy people work better. Esper has standing desks, team hikes, healthy snacks (nuts will make you live longer), and discussions about circadian rhythm, to name a few. We are always updating our workplace to create a healthy environment. Like other startups, we work long hours, cater lunch/dinner, but we make sure to have flexibility for employees and provide full healthcare, vision and dental coverage.

Get information on how to apply for this position.

July 30, 2014 01:30 AM

July 29, 2014

GaGallium

Lawvere theories and monads

In this Agda file, I give an introduction to Lawvere theories and Monads. It's not exactly a "gentle" introduction because:

  1. I'm still trying to grok these things myself,
  2. I've to live up to my reputation of writing unfathomable gagaposts.

We'll start with the State monad and, briefly, a Tick monad. The game is the following: I'll present them syntactically -- through a signature -- and semantically -- through a collection of equations. I'll then show how the usual monad is related to this presentation.

<style type="text/css"> /* Aspects. */ .Comment { color: #B22222 } .Keyword { color: #CD6600 } .String { color: #B22222 } .Number { color: #A020F0 } .Symbol { color: #404040 } .PrimitiveType { color: #0000CD } .Operator {} /* NameKinds. */ .Bound { color: black } .InductiveConstructor { color: #008B00 } .CoinductiveConstructor { color: #8B7500 } .Datatype { color: #0000CD } .Field { color: #EE1289 } .Function { color: #0000CD } .Module { color: #A020F0 } .Postulate { color: #0000CD } .Primitive { color: #0000CD } .Record { color: #0000CD } /* OtherAspects. */ .DottedPattern {} .UnsolvedMeta { color: black; background: yellow } .UnsolvedConstraint { color: black; background: yellow } .TerminationProblem { color: black; background: #FFA07A } .IncompletePattern { color: black; background: #F5DEB3 } .Error { color: red; text-decoration: underline } .TypeChecks { color: black; background: #ADD8E6 } /* Standard attributes. */ a { text-decoration: none } a[href]:hover { background-color: #B4EEB4 } </style>
module Monad where

open import Level

open import Data.Unit hiding (setoid)
open import Data.Product

open import Relation.Binary
open import Relation.Binary.PropositionalEquality 
  hiding ( setoid ; isEquivalence )
  renaming ( trans to trans≡ ; sym to sym≡ ; cong to cong≡ ; cong₂ to cong₂≡ ) 

-- Ok, so let's set the scene with some historical context about
-- "computational effects". Back in the days, the only way to describe
-- the semantics of an effectful language was through an "operational
-- semantics": you more or less mimic what the machine does. The lack
-- of compositionality of such model is an annoyance: we would like a
-- "denotational model", in which the denotation of a blurb of code is
-- the composition of the denotation of its pieces.

-- Enters Moggi and his "Notions of Computation and Monads"
-- [http://www.disi.unige.it/person/MoggiE/ftp/ic91.pdf]: by picking
-- a suitable monad, we can interpret our effectful program into an
-- adequate Kleisli category (hence, we get compositionality).

-- From there, Phil Wadler
-- [http://homepages.inf.ed.ac.uk/wadler/papers/essence/essence.ps]
-- brought monads to mainstream programming through Haskell (this is
-- an Agda file: in my frame of reference, Haskell is some sort of
-- COBOL). However, these darned programmers also want to *combine*
-- monads: what if my program is stateful *and* exceptionful: can we
-- automagically build its supporting monad from the State monad and
-- the Exception monad? It's tricky because there can be some funny
-- interaction between State and Exception: upon an exception, do we
-- roll-back the state, or not?

-- In Edinburgh, John Power and Gordon Plotkin realized that (some)
-- monads could be described through a signature -- the effectful
-- operations -- and an equational theory -- describing how these
-- operations interact. As the slogan goes, "notions of computation
-- determine monads"
-- [https://www.era.lib.ed.ac.uk/bitstream/1842/196/1/Comp_Eff_Monads.pdf]:
-- we get the monad from a more basic (and more algebraic, yay!)
-- presentation. Categorically, this combination of a signature and an
-- equational theory is part of the toolbox of "Lawvere theories"
-- [http://ncatlab.org/nlab/show/Lawvere+theory]. I won't go into the
-- categorical details here: Jacques-Henri is holding a gun to my head


-- Before delving into the details, I should hasten to add that these
-- ideas are already percolating into "real-world programming" (again,
-- frame of reference, etc.). Conor McBride and his student Stevan
-- Andjelkovic haven been working on 'Frank'
-- [https://hackage.haskell.org/package/Frank] and its
-- dependently-typed version. Andrej Bauer, Matija Pretnar and their
-- team work on Eff [http://math.andrej.com/eff/], which includes a
-- notion of "effect handler": I'll leave that out completely in this
-- file (but the relationship with delimited continuations is truly
-- fascinating).


-- This file stems from a remark made by Tarmo Uustalu, who told me
-- this "one weird trick" to computing monads. Rumor has it that a
-- similar story is told between the lines in Paul-André Melliès'
-- "Segal condition meets computational effects"
-- [http://www.pps.univ-paris-diderot.fr/~mellies/papers/segal-lics-2010.pdf]
-- but, boy oh boy, there is a lot of space between these lines.


-- * Stateful operations

module StateMonad (S : Set) where

  -- ** Syntax: signature for state

  -- A stateful program is built from two state-manipulating
  -- operations:
  --   * get, which returns the current state
  --   * set, which updates the current state

  -- To formalize this intuition, we write the following signature:

  data ΣState (X : Set) : Set where
    `get : (S  X)  ΣState X
    `set : S × X  ΣState X


  -- Remark: if we were *that* kind of person, we would probably write
  -- ΣState from the following, more elementary blocks:

  data ΣGet (X : Set) : Set where
    `get : (S  X)  ΣGet X
  data ΣSet (X : Set) : Set where
    `set : S × X  ΣSet X
  data _⊕_ (F G : Set  Set)(X : Set) : Set where
    inl : F X  (F  G) X
    inr : G X  (F  G) X

  -- which gives: ΣState ≡ ΣGet ⊜ ΣSet

  -- But, for many reasons, this wouldn't be a good idea to follow
  -- that path just now.

  -- ** Free term algebra for State

  -- From a signature, we can build a *syntax* for writing stateful
  -- programs, that is, combining 'get's and 'set's and pure
  -- computations ('return'). Syntax is easy, it's a good ol'
  -- inductive type:

  data StateF (V : Set) : Set where
    return : V  StateF V
    op : ΣState (StateF V)  StateF V

  -- Think of 'V' as a set of variables, then 'StateF V' denotes
  -- stateful computations with variables in 'V'. Unsurprisingly,
  -- 'StateF' is a monad (a rather boring one, but still):

  mutual
    _>>=_ : ∀{V W}  StateF V  (V  StateF W)  StateF W
    return x >>= mf = mf x
    op fa >>= mf = op (ΣStatemap mf fa)
  
    -- Ignore this helper function, it's just here to please the
    -- Termination Checker:
    ΣStatemap : ∀{V W}  (V  StateF W)  ΣState (StateF V)  ΣState (StateF W)
    ΣStatemap mf (`get k) = `get  s  (k s) >>= mf)
    ΣStatemap mf (`set (s , k)) = `set (s , k >>= mf)

  -- If one thinks of 'V' and 'W' as sets of variables, then '>>=' can
  -- be thought as implementing a simultaneous substitution. One can
  -- also think of these things as trees (ie. syntax trees) terminated
  -- by 'V' leaves, to which one grafts trees terminated by 'W'
  -- leaves. Whichever you feel comfortable with.

  -- Reassuringly, our 'StateF' monad comes equipped with the
  -- so-called "generic" 'get' and 'set' operations, with the expect
  -- types:

  get : StateF S
  get = op (`get λ s  return s)
  
  set : S  StateF 
  set s = op (`set (s , return tt))

  -- Remark: Yes, yes, there is nothing special about 'StateF': given
  -- any (well-behaved) endofunctor 'F : Set → Set', we could build
  -- another functor 'Free F : Set → Set' which happens to be a monad:
  -- this is the 'free monad' construction
  -- [http://ncatlab.org/nlab/show/free+monad]. We would get, for
  -- free, the substitution '>>='. The free monad construction is a
  -- cottage industry, here are some pointers to the big names on the
  -- market:
  --   * [https://www.fpcomplete.com/user/dolio/many-roads-to-free-monads]
  --   * [http://blog.sigfpe.com/2014/04/the-monad-called-free.html]
  --   * [http://hackage.haskell.org/package/free-operational].

  -- Remark: it is a bit improper to call this thing the 'free monad':
  -- as we shall see, the category terrorist expects some form of quotienting
  -- over that free monad. Here, it is just a lump of syntax: I tend
  -- to call it the 'free term algebra', as in 'it's just syntax'.


  -- At this stage, we can write stateful programs, such as:
       
  test0 : StateF S
  test0 = get >>= λ s  
          set s >>= λ _  
          get >>= λ s'  
          return s'
  
  test1 : StateF S
  test1 = get >>= λ s'  
          return  s'
  
  test2 : StateF S
  test2 = get >>= λ s  
          set s >>= λ _  
          return s

  -- ** Equational theory of State
  
  -- Intuitively, the above examples denote the same program. Can we
  -- make this formal?

  -- To do so, we need to equip our syntax with an equational
  -- theory. That is, we need to specify which kind of identities
  -- should hold on stateful programs. Let's see what we want:

  data _≃_ {V : Set} : StateF V  StateF V  Set where

    -- 1. Getting the current state twice is equivalent to getting it
    --    only once
    get-get : ∀{k : S  S  StateF V}  
            (get >>=  s  get >>= λ s'  k s s' ))  (get >>= λ s  k s s )

    -- 2. Setting the state twice is equivalent to performing only the
    --    last 'set'
    set-set : ∀{k s₁ s₂}  
            (set s₁ >>=  _  set s₂ >>= λ _  k))  (set s₂ >>= λ _  k)

    -- 3. Getting the current state and setting it back in is
    --    equivalent to doing nothing
    get-set : ∀{k}  
            (get >>= λ s  set s >>= λ _  k)  k

    -- 4. Setting the state then getting its value is equivalent to
    --    setting the state and directly moving on with that value
    set-get : ∀{k s}  
            (set s >>=  _  get >>= k))  (set s >>= λ _  k s)


  -- Question: Where do these equations come from? Quite frankly, I
  -- took them from Matija Pretnar's thesis
  -- [http://matija.pretnar.info/pdf/the-logic-and-handling-of-algebraic-effects.pdf]. I
  -- hear that Paul-André came up with a minimal set of equations
  -- through some clever trick. There seems to be something 'rewrity'
  -- in the air: how much milliPlotkin would it take to generate these
  -- kinds of "critical pairs"?

  -- Remark: If you're lost and born a mathematician (a dreadful state
  -- of affair), you might want to connect my mumbo-jumbo with the way
  -- one introduces algebraic structures such as monoids, groups,
  -- etc.: 
  --   * One starts by giving a signature of operations, such as
  --     "there is a unary symbol '1' and a binary symbol '.'". That's
  --     a signature, as 'ΣState'.
  --   * Then, one gives a bunch of axioms by equating terms with some
  --     free variables, such as "(a . b) . c = a . (b . c), "e . a =
  --     a". That's an equational theory, as '_≃_' above.


  -- From the theory, we easily build its congruence closure:

  data _∼_ {V : Set} : StateF V  StateF V  Set₁ where
    -- 1. It includes '≃'
    inc : ∀{p q}  p  q  p  q

    -- 2. It is transitive, reflexive, and symmetric
    trans : ∀{p q r}  p  q  q  r  p  r
    refl : ∀{p}  p  p
    sym : ∀{p q}  p  q  q  p

    -- 3. It is a congruence, ie. we can lift it from subterms to
    --    whole terms:
    cong : ∀{W}(tm : StateF W){ps qs : W  StateF V}   
           (∀ w  ps w  qs w)  
           (tm >>= ps)  (tm >>= qs)



  setoid : Set  Setoid _ _
  setoid V = record
    { Carrier       = StateF V
    ; _≈_           = _∼_
    ; isEquivalence = isEquivalence
    }
    where  isEquivalence :  {V : Set}  IsEquivalence (_∼_ {V = V})
           isEquivalence = record
             { refl  = refl
             ; sym   = sym
             ; trans = trans
             }

  --  * Quotient: StateF/∼ = State

  -- What the Lawvere theory tells us is that the Haskellian's state
  -- monad can be recovered from the above, algebraic
  -- presentation. What your local category terrorists would say is
  -- that the Haskellian's state monad is obtained by taking our term
  -- algebra 'StateF' and quotienting it by the congruence
  -- '∼'. However, in type theory (and in programming in general)
  -- quotienting is not an option. Ideally, we would like to find a
  -- definition that is quotiented from the get-go.
  
  -- After thinking very hard, one realizes that every term of
  -- 'StateF' quotiented by '∼' will start with a 'get', followed by a
  -- 'set', followed by a 'return'. We thus have the following normal
  -- form:

  State : Set  Set 
  State X = ΣGet (ΣSet X)

  -- Holy crap, that's the usual State monad! Now, it's time to call
  -- your coworkers: there is some dude on the interwebs that has
  -- found the most convoluted way to build the State monad.


  -- But perhaps you don't trust me when I claim that this is the
  -- normal form. I'm going to (constructively!) prove it to you,
  -- using good ol' normalization by evaluation.

  -- First step, we interpret our stateful terms into a semantic
  -- domain (which is -- extensionally -- quotiented by the theory of
  -- State):

  eval : ∀{A}  StateF A  (S  S × A)
  eval (return a) = λ s  (s , a)
  eval (op (`get k)) = λ s  eval (k s) s
  eval (op (`set (s' , k))) = λ s  eval k s'
  
  -- Second step, we reify the semantic objects into normal forms,
  -- which is trivial:

  reify : ∀{A}  (S  S × A)  State A
  reify {A} f = `get λ s  `set (f s)

  -- The normalization procedure *really* computes the normal form
  norm : ∀{A}  StateF A  State A
  norm p = reify (eval p)

  -- and these normal forms are a subset of terms:
  ⌈_⌉ : ∀{A}  State A  StateF A 
   `get k  = get >>= λ s  help (k s) 
    where help :  {A}  ΣSet A  StateF A
          help (`set (s , v)) = set s >>= λ _  return v
  

  -- When we read the statement "there exists a normal form"
  -- constructively, it means that we have a procedure that computes
  -- this normal form. That's exactly the 'norm' function.

  -- ** Soundness & Completeness

  -- Now, we want to check that term thus computed is indeed a normal
  -- form. This is captured by two statement, a 'soundness' result and
  -- a 'completeness' result.


  -- (Some auxiliary lemmas, which we prove later:
  pf-sound : ∀{A}  (p : StateF A)  p   norm p 
  pf-complete :  {A} {p q : StateF A}  p  q  ∀{s}  eval p s  eval q s
  -- )

  -- First, 'norm' is sound: if two terms have the same normal form,
  -- they belong to the same congruence class:

  sound :  {V} (p q : StateF V)   norm p    norm q   p  q
  sound {V} p q r =
    begin
      p
    ≈⟨ pf-sound p  
       norm p 
    ≡⟨ r 
       norm q 
    ≈⟨ sym (pf-sound q) 
      q
    
      where open  import Relation.Binary.EqReasoning (setoid V)
    
  -- Second, 'norm' is complete: if two terms belong to the same
  -- congruence class, they have the same normal form.

  complete :  {A} {p q : StateF A}  p  q   norm p    norm q 
  complete {p = p} {q} r = 
    begin
       norm p 
    ≡⟨ refl 
       reify (eval p) 
    ≡⟨ cong≡  x   reify x ) (ext  z  pf-complete r)) 
       reify (eval q) 
    ≡⟨ refl 
       norm q 
    
    where open ≡-Reasoning

          -- We neeed extensionality, but it's not a big deal: this is a
          -- proof, not a program.
          postulate ext : Extensionality zero zero 


  -- Soundness and completeness rely on these technical lemmas, which
  -- are not worth our attention:
  
  pf-sound (return x) = sym (inc get-set)
  pf-sound {V} (op (`get k)) =
    begin 
      op (`get k)
    ≡⟨ refl 
      get >>= k
    ≈⟨ cong (op (`get return)) 
              s'  pf-sound (k s')) 
       get >>=  s'   norm (k s') )
    ≡⟨ refl 
      op (`get λ s'  
      op (`get  s  
      op (`set (proj₁ (eval (k s') s) , 
      return (proj₂ (eval (k s') s)))))))
    ≈⟨ inc get-get 
      op (`get λ s  
      op (`set (proj₁ (eval (k s) s) , 
      return (proj₂ (eval (k s) s)))))
    ≡⟨ refl 
       norm (op (`get k)) 
    
      where open  import Relation.Binary.EqReasoning (setoid V)
  pf-sound {V} (op (`set (s' , k))) =
    begin 
      op (`set (s' , k ))
    ≈⟨ cong (op (`set (s' , return tt)))  _  pf-sound k) 
      op (`set (s' ,  norm k ) )
    ≡⟨ refl 
      op (`set (s' , 
      op (`get  s  
      op (`set (proj₁ (eval k s),
      return (proj₂ (eval k s))))))))
    ≈⟨ inc set-get 
      op (`set (s' ,
      op (`set (proj₁ (eval k s'),
      return (proj₂ (eval k s'))))))
    ≈⟨ inc set-set 
      op (`set (proj₁ (eval k s'),
      return (proj₂ (eval k s'))))
    ≈⟨ sym (inc get-set) 
      op (`get λ s 
      op (`set (s , 
      op (`set (proj₁ (eval k s'),
      return (proj₂ (eval k s')))))))
    ≈⟨ cong (get >>= return)  s  inc set-set) 
      op (`get λ s 
      op (`set (proj₁ (eval k s'),
      return (proj₂ (eval k s')))))
    ≡⟨ refl 
       norm (op (`set (s' , k))) 
    
      where open import Relation.Binary.EqReasoning (setoid V)
    
  eval-compose : ∀{A B}(tm : StateF A)(k : A  StateF B){s}  
               eval (tm >>= k) s
              (let p : S × A 
                    p = eval tm s in
                 eval (k (proj₂ p)) (proj₁ p))
  eval-compose (return x) k = λ {s}  refl
  eval-compose (op (`get x)) k = λ {s}  eval-compose (x s) k
  eval-compose (op (`set (s' , x))) k = λ {s}  eval-compose x k
    
  pf-complete (inc get-get) = refl
  pf-complete (inc set-set) = refl
  pf-complete (inc set-get) = refl
  pf-complete (inc get-set) = refl
  pf-complete {p = p}{q} (trans {q = r} r₁ r₂) = λ {s} 
    begin 
      eval p s
    ≡⟨ pf-complete r₁ 
      eval r s
    ≡⟨ pf-complete r₂ 
      eval q s
    
    where open ≡-Reasoning
  pf-complete refl = refl
  pf-complete (sym r) = sym≡ (pf-complete r)
  pf-complete (cong tm {ps}{qs} x) = λ {s}  
    begin
      eval (tm >>= ps) s
    ≡⟨ eval-compose tm ps 
      eval (ps (proj₂ (eval tm s))) (proj₁ (eval tm s))
    ≡⟨ pf-complete (x (proj₂ (eval tm s))) 
      eval (qs (proj₂ (eval tm s))) (proj₁ (eval tm s))
    ≡⟨ sym≡ (eval-compose tm qs) 
      eval (tm >>= qs) s
    
    where open ≡-Reasoning

  -- ** Examples

  -- Right. Now we have a reflexive decision procedure for equality of
  -- stateful programs. Let's use it!

  -- For instance we can "prove" (by a trivial reasoning) that our
  -- earlier programs 'test0', 'test1' and 'test2' are all equivalent:

  test01 : test0  test1
  test01 = sound test0 test1 refl
  
  test12 : test1  test2
  test12 = sound test1 test2 refl

  -- The trick here is to rely on the soundness of normalization and
  -- compare the norm forms for equality.


  -- We can also do some funky reasoning. For instance, Gabriel was
  -- wondering why 'cong' is right-leaning: we can only substitute for
  -- subterms 'ps' and 'qs' under a common 'tm', while one might want
  -- to have a more general version:

  cong₂ : ∀{V W}(tm tm' : StateF W){ps qs : W  StateF V}   
         (tm  tm') 
         (∀ w  ps w  qs w)  
         (tm >>= ps)  (tm' >>= qs)

  -- We prove this more general statement by working over the normal
  -- forms. First, a boring, technical lemma that generalizes
  -- 'eval-compose' to the normalization procedure:

  norm-compose : ∀{V W}(tm : StateF W)(ps : W  StateF V) 
     norm (tm >>= ps)    norm ( norm tm  >>= λ w   norm (ps  w) ) 
  norm-compose tm ps = 
    begin
       norm (tm >>= ps) 
    ≡⟨ refl 
       op (`get  s 
       op (`set (let p : S × _
                     p = eval (tm >>= ps) s in
       proj₁ p , return (proj₂ p)))))
    ≡⟨ cong≡  k  op (`get k)) (ext help) 
       op (`get  s 
       (op (`set (let p₁ : S × _
                      p₁ = eval tm s 
                      p₂ : S × _
                      p₂ = eval (ps (proj₂ p₁)) (proj₁ p₁) in
           proj₁ p₂ , return  (proj₂ p₂))))))
    ≡⟨ refl 
       norm ( norm tm  >>= λ w   norm (ps  w) ) 
    
      where postulate ext : Extensionality zero zero 
            open ≡-Reasoning
            help : (s : S)  (op (`set (let p : S × _
                                            p = eval (tm >>= ps) s in
                                 proj₁ p , return (proj₂ p))))
                            (op (`set (let p₁ : S × _
                                            p₁ = eval tm s 
                                            p₂ : S × _
                                            p₂ = eval (ps (proj₂ p₁)) (proj₁ p₁) in
                                 proj₁ p₂ , return  (proj₂ p₂))))
            help s = cong≡  { (s , k)   op (`set (s , return k)) }) (eval-compose tm ps) 


  -- From which follows the generalized congruence:

  cong₂ {V} tm tm' {ps}{qs} q qp = sound (tm >>= ps) (tm' >>= qs)
    (begin
       norm (tm >>= ps) 
    ≡⟨ norm-compose tm ps 
       norm ( norm tm  >>= λ w   norm (ps  w) ) 
    -- Remark: we are using the completeness here!
    ≡⟨ cong₂≡  t k   norm (t >>= k) ) 
              (complete q) 
              (ext  w  complete (qp w))) 
       norm ( norm tm'  >>= λ w   norm (qs  w) ) 
    ≡⟨ sym≡ (norm-compose tm' qs) 
       norm (tm' >>= qs) 
    )
      where postulate ext : Extensionality zero zero 
            open ≡-Reasoning

  
-- * Tick monad

open import Algebra
import Level

-- I've hinted at the fact that:
--   1. We could generalize much of the above machinery (free monad,
--      congruence, etc.), and
--   2. There is a general principle at play when going from signature
--      & equations to some normal form representation

-- To give another hint, let me breeze through another monad, namely
-- the 'tick' monad.

module Tick (M : Monoid Level.zero Level.zero) where

  open Monoid M

  -- Let 'M' be a monoid. We call 'R' its carrier.

  R : Set
  R = Carrier

  -- ** Signature for counter

  -- The 'Tick' monad has a single operation, 'tick' which lets us add
  -- some amount 'r : R' to a global counter.

  data ΣTick (X : Set) : Set where
    `tick : R × X  ΣTick X

  -- ** Free term algebra

  -- As usual, we get the syntax for tickful programs:

  data TickF (V : Set) : Set where
    return : V  TickF V
    op : ΣTick (TickF V)  TickF V

  tick : R  TickF 
  tick r = op (`tick (r , return tt))

  mutual
    _>>=_ : ∀{A B}  TickF A  (A  TickF B)  TickF B
    return x >>= mf = mf x
    op fa >>= mf = op (ΣTickmap mf fa)

    ΣTickmap : ∀{A B}  (A  TickF B)  ΣTick (TickF A)  ΣTick (TickF B)
    ΣTickmap mf (`tick (r , k)) = `tick (r , k >>= mf)

  -- ** Equational theory

  -- The equational theory, once again stolen from Matija's thesis, is
  -- as follows:

  data _≃_ {V : Set} : TickF V  TickF V  Set where
    -- 1. Counting ε ticks amounts to nothing:
    tick-eps : ∀{k : TickF V}  
      (tick ε >>= λ _  k)  k

    -- 2. Counting r₁ ticks followed by r₂ ticks amounts to counting 
    --    r₁ ∙ r₂ ticks:
    tick-com : ∀{k : TickF V}{r₁ r₂} 
      (tick r₁ >>= λ _  
       tick r₂ >>= λ _  k)  (tick (r₁  r₂) >>= λ _  k)

  -- ** Normal form

  -- Again, we think very hard and realize that every 'TickF' program
  -- amounts to a single tick (counting the sum of all sub-ticks):
  Tick : Set  Set 
  Tick X = ΣTick X

  -- We "prove" this a posteriori by a NbE procedure:
  eval : ∀{A}  TickF A  R × A
  eval (return a) = ε , a
  eval {A} (op (`tick (r , k))) = 
    let p : R × A 
        p = eval k in
     r  (proj₁ p) , proj₂ p

  reify : ∀{A}  R × A  Tick A
  reify {A} (r , a) = `tick (r , a)

  norm : ∀{A}  TickF A  Tick A
  norm p = reify (eval p)

  ⌈_⌉ : ∀{A}  Tick A  TickF A 
   `tick (r , a)  = tick r >>= λ _  return a

  -- Gabriel has allowed me to skip the proof: let's hope that it's
  -- indeed true!


-- * Conclusion: 

-- We have recovered the usual State (and Tick) monad from an
-- algebraic presentation based on an equational theory. The key idea
-- was to consider the equational theory as a rewriting system and
-- look for its normal forms. We have justified this equivalence
-- through a normalization-by-evaluation procedure, which we then
-- abused to get proofs by reflection.

-- I wonder how much of all that is already included in Danel Ahman
-- and Sam Staton's "Normalization by Evaluation and Algebraic
-- Effects"
-- [http://homepages.inf.ed.ac.uk/s1225336/papers/mfps13.pdf]: let's
-- push that on my toread list.


-- Exercises:

--   1. Implement a generic "free monad construction", equipped with
--      its operators (return, map, and bind). 

--   2. Recast the State and Tick monads in that mold. 

--   3. Implement another monad in that framework. Careful, you're
--      probably thinking about the Exception monad: handling
--      exceptions is not an algebraic effect, so it won't work. If
--      you restrict yourself to 'throw' (ignoring 'catch'), that
--      should work tho.

-- Open questions: 

--   * I have used intuitions and terminology from rewriting theory to
--     justify my moves: could we further iron out this connection?

--   * Stevan also wonders whether one could relate the duality
--     adjunction/monad to something rewrity? The categorical aspects
--     seem to be nicely presented in "Handling Algebraic Effects"
--     [http://arxiv.org/abs/1312.1399]

--   * I have left aside the question of *combining* theories: what
--     about combining state and tick, etc.? Again, categorically,
--     Plotkin, Power and Hyland have covered a lot of ground in
--     "Combining effects: sum and tensor"
--     [http://homepages.inf.ed.ac.uk/gdp/publications/Comb_Effects_Jour.pdf]. However,
--     Tarmo's talk at IHP seem to suggest that there is more to it
--     than tensor and sums (sorry, I can't find the slides online).

--   * Algebraic effects do not capture all monads: the Exception
--     monad (the one with a 'throw' *and* a 'catch') is such a
--     monad. Does the notion of 'effect handling'/'delimited
--     continuation' fit into the rewriting picture?

by Pierre Dagand at July 29, 2014 08:00 AM

July 28, 2014

Thomas Leonard

My first unikernel

I wanted to make a simple REST service for queuing file uploads, deployable as a virtual machine. The traditional way to do this is to download a Linux cloud image, install the software inside it, and deploy that. Instead I decided to try a unikernel.

Unikernels promise some interesting benefits. The Ubuntu 14.04 amd64-disk1.img cloud image is 243 MB unconfigured, while the unikernel ended up at just 5.2 MB (running the queue service). Ubuntu runs a large amount of C code in security-critical places, while the unikernel is almost entirely type-safe OCaml. And besides, trying new things is fun.

( this post also appeared on Reddit and Hacker News )

Table of Contents

Regular readers will know that a few months ago I began a new job at Cambridge University. Working for an author of Real World OCaml and leader of OCaml Labs, on a project building pure-OCaml distributed systems, who found me through my blog posts about learning OCaml, I thought they might want me to write some OCaml.

But no. They’ve actually had me porting the tiny Mini-OS kernel to ARM, using a mixture of C and assembler, to let the Mirage unikernel run on ARM devices. Of course, I got curious and wanted to write a Mirage application for myself…

Introduction

Linux, like many popular operating systems, is a multi-user system. This design dates back to the early days of computing, when a single expensive computer, running a single OS, would be shared between many users. The goal of the kernel is to protect itself from its users, and to protect the users from each other.

Today, computers are cheap and many people own several. Even when a physical computer is shared (e.g. in cloud computing), this is typically done by running multiple virtual machines, each serving a single user. Here, protecting the OS from its (usually single) application is pointless.

Removing the security barrier between the kernel and the application greatly simplifies things; we can run the whole system (kernel + application) as a single, privileged, executable - a unikernel.

And while we’re rewriting everything anyway, we might as well replace C with a modern memory safe language, eliminating whole classes of bugs and security vulnerabilities, allowing decent error reporting, and providing structured data types throughout.

In the past, two things have made writing a completely new OS impractical:

  • Legacy applications won’t run on it.
  • It probably won’t support your hardware.

Virtualisation removes both obstacles: legacy applications can run in their own legacy VMs, and drivers are only needed for the virtual devices - e.g. a single network driver and a single block driver will cover all real network cards and hard drives.

A hello world kernel

The mirage tutorial starts by showing the easy, fully-automated way to build a unikernel. If you want to get started quickly you may prefer to read that and skip this section, but since one of the advantages of unikernels is their relative simplicity, let’s do things the “hard” way first to understand how it works behind the scenes.

Here’s the normal “hello world” program in OCaml:

<notextile><figure class="code"><figcaption>hw.ml </figcaption>
1
2
let () =
  print_endline "Hello, world!"
</figure></notextile>

To compile and run as a normal application, we’d do:

$ ocamlopt hw.ml -o hw
$ ./hw 
Hello, world!

How can we make a unikernel that does the equivalent? As it turns out, the above code works unmodified (though the Mirage people might frown at you for doing it this way). We compile hw.ml to a hw.native.o file and then link with the unikernel libraries instead of the standard C library:

$ export OPAM_DIR=$(opam config var prefix)
$ export PKG_CONFIG_PATH=$OPAM_DIR/lib/pkgconfig
$ ocamlopt -output-obj -o hw.native.o hw.ml
$ ld -d -static -nostdlib --start-group \
    $(pkg-config --static --libs openlibm libminios-xen) \
    hw.native.o \
    $OPAM_DIR/lib/mirage-xen/libocaml.a \
    $OPAM_DIR/lib/mirage-xen/libxencaml.a \
    --end-group \
    $(gcc -print-libgcc-file-name) \
    -o hw.xen

We now have a kernel image, hw.xen, which can be booted as a VM under the Xen hypervisor (as used by Amazon, Rackspace, etc to host VMs). But first, let’s look at the libraries we added:

openlibm
This is a standard maths library. It provides functions such as sin, cos, etc.
libminios-xen
This provides the architecture-specific boot code, a printk function for debugging, malloc for allocating memory and some low-level functions for talking to Xen.
libocaml.a
The OCaml runtime (the garbage collector, etc).
libxencaml.a
OCaml bindings for libminios and some boot code.
libgcc.a
Support functions for code that gcc generates (actually, not needed on x86).

To deploy the new unikernel, we create a Xen configuration file for it (here, I’m giving it 16 MB of RAM):

<notextile><figure class="code"><figcaption>hw.xl </figcaption>
1
2
3
4
5
name = 'hw'
kernel = 'hw.xen'
memory = 16
on_crash = 'preserve'
on_poweroff = 'preserve'
</figure></notextile>

Setting on_crash and on_poweroff to preserve lets us see any output or errors, which would otherwise be missed if the VM exits too quickly.

We can now boot our new VM:

$ xl create -c hw.xl
Xen Minimal OS!
  start_info: 000000000009b000(VA)
    nr_pages: 0x800
  shared_inf: 0x6ee97000(MA)
     pt_base: 000000000009e000(VA)
nr_pt_frames: 0x5
    mfn_list: 0000000000097000(VA)
   mod_start: 0x0(VA)
     mod_len: 0
       flags: 0x0
    cmd_line: 
       stack: 0000000000055e00-0000000000075e00
Mirage: start_kernel
MM: Init
      _text: 0000000000000000(VA)
     _etext: 000000000003452d(VA)
   _erodata: 000000000003c000(VA)
     _edata: 000000000003e4d0(VA)
stack start: 0000000000055e00(VA)
       _end: 0000000000096d64(VA)
  start_pfn: a6
    max_pfn: 800
Mapping memory range 0x400000 - 0x800000
setting 0000000000000000-000000000003c000 readonly
skipped 0000000000001000
MM: Initialise page allocator for a8000(a8000)-800000(800000)
MM: done
Demand map pfns at 801000-2000801000.
Initialising timer interface
Initialising console ... done.
gnttab_table mapped at 0000000000801000.
xencaml: app_main_thread
getenv(OCAMLRUNPARAM) -> null
getenv(CAMLRUNPARAM) -> null
Unsupported function lseek called in Mini-OS kernel
Unsupported function lseek called in Mini-OS kernel
Unsupported function lseek called in Mini-OS kernel
Hello, world!
main returned 0

( Note: I’m testing locally by running Xen under VirtualBox. Not all of Xen’s features can be used in this mode, but it works for testing unikernels. I’m also using my Git version of mirage-xen; the official one will display an error after printing the greeting because it expects you to provide a mainloop too. The warnings about lseek are just OCaml trying to find the current file offsets for stdin, stdout and stderr.)

As you can see, the boot process is quite short. Execution begins at _start. Using objdump -d hw.xen, you can see that this just sets up the stack pointer register and calls the C function arch_init:

0000000000000000 <_start>:
       0:   fc                      cld    
       1:   48 8b 25 0f 00 00 00    mov    0xf(%rip),%rsp        # 17 <stack_start>
       8:   48 81 e4 00 00 ff ff    and    $0xffffffffffff0000,%rsp
       f:   48 89 f7                mov    %rsi,%rdi
      12:   e8 e2 bb 00 00          callq  bbf9 <arch_init>

arch_init (in libminios) initialises the traps and FPU and then prints Xen Minimal OS! and information about various addresses. It then calls start_kernel.

start_kernel (in libxencaml) sets up a few more features (events, interrupts, malloc, time-keeping and grant tables), then calls caml_startup.

caml_startup (in libocaml) initialises the garbage collector and calls caml_program, which is our hw.native.o.

We call print_endline, which libxencaml, as a convenience for debugging, forwards to libminios’s console_print.

Using Mirage libraries

The above was a bit of a hack, which ended up just using the C console driver in libminios (one of the few things it provides, as it’s needed for printk). We can instead use the mirage-console-xen OCaml library, like this:

<notextile><figure class="code"><figcaption>hw.ml </figcaption>
1
2
3
4
5
6
7
8
9
10
open Lwt

let main =
  Console.connect "0" >>= function
  | `Error _ -> failwith "Failed to connect to console"
  | `Ok default_console ->
      Console.log_s default_console "Hello, world!"

let () =
  OS.Main.run main
</figure></notextile>

Mirage uses the usual Lwt library for cooperative threading, which I wrote about at last year in Asynchronous Python vs OCaml - >>= means to wait for the result, allowing other code to run. Everything in Mirage is non-blocking, even looking up the console. OS.Main.run runs the main event loop.

Since we’re using libraries, let’s switch to ocamlbuild and give the dependencies in the _tags file, as usual for OCaml projects:

<notextile><figure class="code"><figcaption>_tags </figcaption>
1
true: warn(A), strict_sequence, package(mirage-console-xen)
</figure></notextile>

The only unusual thing we have to do here is tell ocamlbuild not to link in the Unix module when we build hw.native.o:

$ ocamlbuild -lflags -linkpkg,-dontlink,unix -use-ocamlfind hw.native.o

In the same way, we can use other libraries to access raw block devices (mirage-block-xen), timers (mirage-clock-xen) and network interfaces (mirage-net-xen). Other (non-Xen-specific) OCaml libraries can then be used on top of these low-level drivers. For example, fat-filesystem can provide a filesystem on a block device, while tcpip provides an OCaml TCP/IP stack on a network interface.

The mirage-unix libraries

You may have noticed that the Xen driver libraries we used above ended in -xen. In fact, each of these is just an implementation of some generic interface provided by Mirage. For example, mirage/types defines the abstract CONSOLE interface as:

<notextile><figure class="code">
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
module type CONSOLE = sig
  (** Text console input/output operations. *)

  type error = [
    | `Invalid_console of string
  ]
  (** The type representing possible errors when attaching a console. *)

  include DEVICE with
    type error := error

  val write : t -> string -> int -> int -> int
  (** [write t buf off len] writes up to [len] chars of [String.sub buf
      off len] to the console [t] and returns the number of bytes
      written. Raises {!Invalid_argument} if [len > buf - off]. *)

  val write_all : t -> string -> int -> int -> unit io
  (** [write_all t buf off len] is a thread that writes [String.sub buf
      off len] to the console [t] and returns when done. Raises
      {!Invalid_argument} if [len > buf - off]. *)

  val log : t -> string -> unit
  (** [log str] writes as much characters of [str] that can be written
      in one write operation to the console [t], then writes
      "\r\n" to it. *)

  val log_s : t -> string -> unit io
  (** [log_s str] is a thread that writes [str ^ "\r\n"] in the
      console [t]. *)

end
</figure></notextile>

By linking against the -unix versions of libraries rather than the -xen ones, we can compile our code as an ordinary Unix program and run it directly. This makes testing and debugging very easy.

To make sure our code is generic enough to do this, we can wrap it in a functor that takes any console module as an input:

<notextile><figure class="code"><figcaption>unikernel.ml </figcaption>
1
2
3
4
module Main (C : V1_LWT.CONSOLE) = struct
  let start c =
    C.log_s c "Hello, world!"
end
</figure></notextile>

The code that provides a Xen or Unix console and calls this goes in main.ml:

<notextile><figure class="code"><figcaption>main.ml </figcaption>
1
2
3
4
5
6
7
8
9
10
11
open Lwt

let console =
  Console.connect "0" >>= function
  | `Error _ -> failwith "Failed to connect to console"
  | `Ok c -> return c

module U = Unikernel.Main(Console)

let () =
  OS.Main.run (console >>= U.start)
</figure></notextile>

The mirage tool

With the platform-specific code isolated in main.ml, we can now use the mirage command-line tool to generate it automatically for the target platform. mirage takes a config.ml configuration file and generates Makefile and main.ml based on the current platform and the arguments passed.

<notextile><figure class="code"><figcaption>config.ml </figcaption>
1
2
3
4
5
6
7
8
open Mirage

let main = foreign "Unikernel.Main" (console @-> job)

let () =
  register "hw" [
    main $ default_console
  ]
</figure></notextile>
$ mirage configure --unix
$ make
$ ./mir-hw 
Hello, world!

I won’t describe this in detail because at this point we’ve reached the start of the official tutorial, and you can read that instead.

Test case

Because 0install is decentralised, it doesn’t need a single centrally-managed repository (or several incompatible repositories, each trying to package every program, as is common with Linux distributions). In 0install, it’s possible for every developer to run their own repository, containing just their software, with cross-repository dependencies handled automatically. But just because it’s possible doesn’t mean we have to go to that extreme: having medium sized repositories each managed by a team of people can be very convenient, especially where package maintainers come and go.

The general pattern for a group repository is to have a public server that accepts new package uploads from developers, and a private (firewalled) server with the repository’s GPG key, which downloads from it:

Debian uses an anonymous FTP server for its incoming queue, polling it with a cron job. This turns out to be surprisingly complicated. You need to handle incomplete uploads (not processing them until they’re done, or deleting them eventually if they never complete), allow contributors to overwrite or delete their own partial uploads (Debian allows you to upload a GPG-signed command file, which provides some control), etc, as well as keep the service fully patched. Also, the cron system can be annoying: if the package contains a mistake then it will be several minutes before it discovers this and emails the packager.

Perhaps there are some decent systems out there to handle all this, but it seemed like a good opportunity to try making a unikernel.

A particularly nice feature of this test-case is that it doesn’t matter too much if it fails: the repository itself will check the developer’s signature on the files, so an attacker can’t compromise the repository by breaking into the queue; everything in the queue is intended to become public, so we need not worry much about confidentiality; lost uploads can be easily resubmitted; and if it goes down for a bit, it just means that new software can’t be added to the repository. So, there’s nothing critical about this service, which is reassuring.

Storage

The merge-queues library builds a queue abstraction on top of Irmin, a Git-inspired storage system for Mirage. But my needs are simple, and I wanted to test the more primitive libraries first, so I decided to build my queue directly on a plain filesystem. This was the first interface I came up with:

<notextile><figure class="code"><figcaption>upload_queue.mli </figcaption>
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
module type FS = V1_LWT.FS with
  type page_aligned_buffer = Cstruct.t and
  type block_device_error = Fat.Fs.block_error

(** An upload.
 * To avoid loading complete uploads into RAM, we stream them
 * between the network and the disk. *)
type item = {
  size : int64;
  data : string Lwt_stream.t;
}

type add_error = [`Wrong_size of int64 | `Unknown of exn]

module Make : functor (F : FS) -> sig
  (** An upload queue. *)
  type t

  (** Create a new queue, backed by a filesystem. *)
  val create : F.t -> t Lwt.t

  module Upload : sig
    (** Add an upload to the queue.
     * The upload is added only once the end of the stream is
     * reached, and only if the total size matches the size
     * in the record.
     * To cancel an add, just terminate the stream. *)
    val add :
      t -> item -> [ `Ok of unit | `Error of add_error ] Lwt.t
  end

  module Download : sig
    (** Interface for the repository software to fetch items
     * from the queue. Only one client may use this interface
     * at a time, or things will go wrong. *)

    (** Return a fresh stream for the item at the head of the
     * queue, without removing it. After downloading it
     * successfully, the client should call [delete]. If the
     * queue is empty, this blocks until an item is available. *)
    val peek : t -> item Lwt.t

    (** Delete the item previously retrieved by [peek].
     * If the previous item has already been deleted, this does
     * nothing, even if there are more items in the queue. *)
    val delete : t -> unit Lwt.t
  end
end
</figure></notextile>

Our unikernel.ml will use this to make a queue, backed by a filesystem. Uploaders’ HTTP POSTs will be routed to Upload.add, while the repository’s GET and DELETE invocations go to the Download submodule. delete is a separate operation because we want the repository to confirm that it got the item successfully before we delete it, in case of network errors.

Ideally, we might require that the DELETE comes over the same HTTP connection as the GET just in case we accidentally run two instances of the repository software, but that’s unlikely and it’s convenient to test using separate curl invocations.

We’re using another functor here, Upload_queue.Make, so that our queue will work over any filesystem. In theory, we can configure our unikernel with a FAT filesystem on a block device when running under Xen, while using a regular directory when running under Linux (e.g. for testing).

But it doesn’t work. You can see at the top that I had to restrict Mirage’s abstract FS type in two ways:

  • The read and write functions in FS pass the data using the abstract page_aligned_buffer type. Since we need to do something with the data, this isn’t good enough. I therefore declare that this must be a Cstruct.t (basically, an array of bytes). This is actually OK; mirage-fs-unix also uses this type.

  • One of the possible error codes from FS is the abstract type FS.block_device_error, and I can’t see any way to turn one of these into a string using the FS interface. I therefore require a filesystem implementation that defines it to be Fat.Fs.block_error. Obviously, this means we now only support the FAT filesystem.

This doesn’t prevent us from running as a normal process, because we can ask for a Unix “block” device (actually, just a plain disk.img file) and pass that to the Fat module, but it would be nice to have the option of using a real directory.

I asked about this on the mailing list - Mirage questions from writing a REST service - and it looks like the FS type will change soon.

Implementation

For the curious, this initial implementation is in upload_queue.ml.

Internally, the module creates an in-memory queue to keep track of successful uploads. Uploads are streamed to the disk and when an upload completes with the declared size, the filename is added to the queue. If the upload ends with the wrong size (probably because the connection was lost), the file is deleted.

But what if our VM gets rebooted? We need to scan the file system at start up and work out which uploads are complete and which should be deleted. My first thought was to name the files NUMBER.part during the upload and rename on success. However, the FS interface currently lacks a rename method. Instead, I write an N byte to the start of each file and set it to Y on success. That works, but renaming would be nicer!

For downloading, the peek function returns the item at the head of the queue. If the queue is empty, it waits until something arrives. The repository just makes a GET request - if something is available then it returns immediately, otherwise the connection stays open until some data is ready, allowing the repository to respond immediately to new uploads.

Unit-testing the storage system

Because our unikernel can run as a process, testing is easy even if you don’t have a local Xen deployment. A set of unit-tests test the upload queue module just as for any other program, and the service can be run as a normal process, listening on a normal TCP socket. A slight annoyance here is that the generated Makefile doesn’t include any rules to build the tests so you have to add them manually, and if you regenerate the Makefile then it loses the new rule.

As you might expect from such a new system, testing uncovered several problems. The first (minor) problem is that when the disk becomes full, the unhelpful error reported by the filesystem is Failure("Unknown error: Failure(\"fault\")").

( I asked about this on the mailing list - Error handling in Mirage - and there seems to be agreement that error handling should change. )

A more serious problem was that deleting files corrupted the FAT directory index. I downloaded the FAT library and added a unit-test for delete, which made it easy to track the problem down (despite my lack of knowledge of FAT). Here’s the code for marking a directory entry as deleted in the FAT library:

<notextile><figure class="code">
1
2
3
4
5
6
7
8
9
10
11
12
    let b = Cstruct.sub block offset sizeof in
    let delta = Cstruct.create sizeof in
    begin match unmarshal b with
      | Lfn lfn ->
	let lfn' = { lfn with lfn_deleted = true } in
	marshal delta (Lfn lfn')
      | Dos dos ->
	let dos' = { dos with deleted = true } in
	marshal b (Dos dos')
      | End -> assert false
    end;
    Update.from_cstruct (Int64.of_int offset) delta :: acc
</figure></notextile>

It’s supposed to take an entry, unmarshal it into an OCaml structure, set the deleted flag, and marshal the result into a new delta structure. These deltas are returned and applied to the device. The bug is a simple typo: Lfn (long filename) entries update correctly, but for old Dos ones it writes the new block to the input, not to delta. The fix was simple enough (I also refactored it slightly to encourage the correct behaviour in future):

<notextile><figure class="code">
1
2
3
4
5
6
7
8
    let b = Cstruct.sub block offset sizeof in
    let delta = Cstruct.create sizeof in
    marshal delta begin match unmarshal b with
      | Lfn lfn -> Lfn { lfn with lfn_deleted = true }
      | Dos dos -> Dos { dos with deleted = true }
      | End -> assert false
    end;
    Update.from_cstruct (Int64.of_int offset) delta :: acc
</figure></notextile>

This demonstrates both the good and the bad of Mirage: the bug was easy to find and fix, using regular debugging tools. I’m sure fixing a filesystem corruption bug in the Linux kernel would have been vastly more difficult. On the other hard, Linux is rather well tested, whereas I appear to be the first person ever to try deleting a file in Mirage!

The HTTP server

This turned out to be quite simple. Here’s the unikernel’s start function:

<notextile><figure class="code"><figcaption>unikernel.ml </figcaption>
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
module Main (C : V1_LWT.CONSOLE)
            (F : Upload_queue.FS)
            (H : Cohttp_lwt.Server) = struct
  module Q = Upload_queue.Make(F)
  [...]
  let start c fs http =
    Log.write := C.log_s c;
    Log.info "starting queue service" >>= fun () ->

    Q.create fs >>= fun q ->

    let callback _conn_id request body =
      match Uri.path request.H.Request.uri with
      | "/uploader" -> handle_uploader q request body
      | "/downloader" -> handle_downloader q request
      | path ->
          H.respond_error
	    ~status:`Bad_request
      	    ~body:(Printf.sprintf "Bad path '%s'\n" path)
	    () in

    let conn_closed _conn_id () =
      Log.info "connection closed" |> ignore in

    http { H.
      callback;
      conn_closed
    }
end
</figure></notextile>

Here, our functor is extended to take a filesystem (using the restricted type required by our Upload_queue, as noted above) and an HTTP server module as arguments.

The HTTP server calls our callback each time it receives a request, and this dispatches /uploader requests to handle_uploader and /downloader ones to handle_downloader. These are also very simple, e.g.

<notextile><figure class="code">
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
  let get q =
    Q.Download.peek q >>= fun {Upload_queue.size; data} ->
    let body = Cohttp_lwt_body.of_stream data in
    let headers = Cohttp.Header.init_with
      "Content-Length" (Int64.to_string size) in
    (* Adding a content-length loses the transfer-encoding
     * for some reason, so add it back: *)
    let headers = Cohttp.Header.add headers
      "transfer-encoding" "chunked" in
    H.respond ~headers ~status:`OK ~body ()

  let handle_downloader q request =
    match H.Request.meth request with
    | `GET -> get q
    | `DELETE -> delete q
    | `HEAD | `PUT | `POST
    | `OPTIONS | `PATCH -> unsupported_method
</figure></notextile>

The other methods (put and delete) are similar.

Buffered reads

Running as a --unix process, I initially got a download speed of 17.2 KB/s, which was rather disappointing. Especially as Apache on the same machine gets 615 MB/s!

Increasing the size of the chunks I was reading from the Fat filesystem (a disk.img file) from 512 bytes to 1MB, I was able to increase this to 2.83 MB/s, and removing the O_DIRECT flag from mirage-block-unix, download speed increased to 15 MB/s (so this is with Linux caching the data in RAM).

To check the filesystem was the problem, I removed the F.read call (so it would return uninitialised data instead of the actual file contents). It then managed a very respectable 514 MB/s. Nothing wrong with the HTTP code then.

Streaming uploads

It all worked nicely running as a Unix process, so the next step was to deploy on Xen. I was hoping that most of the bugs would already have been found during the Unix testing, but in fact there were more lurking.

It worked for very small files, but when uploading larger files it quickly ran out of memory on my 64-bit x86 test system. I also tried it on my 32-bit CubieTruck ARM board, but that failed even sooner, with Invalid_argument("String.create") (on 32-bit platforms, OCaml strings are limited to 16 MB).

In both cases, the problem was that the cohttp library tried to read the entire upload in one go. I found the read function in Transfer_io:

<notextile><figure class="code">
1
2
3
4
5
6
7
8
let read ~len ic =
  (* TODO functorise string to a bigbuffer *)
  match len with
  |0 -> return Done
  |len ->
    read_exactly ic len >>= function
    |None -> return Done
    |Some buf -> return (Final_chunk buf)
</figure></notextile>

I changed it to use read rather than read_exactly (read returns whatever data is available, waiting only if there isn’t any at all):

<notextile><figure class="code">
1
2
3
4
5
6
7
8
9
let read ~remaining ic =
  (* TODO functorise string to a bigbuffer *)
  match !remaining with
  |0 -> return Done
  |len ->
    read ic len >>= fun buf ->
    remaining := !remaining - String.length buf;
    if !remaining = 0 then return (Final_chunk buf)
    else return (Chunk buf)
</figure></notextile>

I also had to change the signature to take a mutable reference (remaining) for the remaining data, otherwise it has no way to know when it’s done (patch).

Buffered writes

With the uploads now split into chunks, upload speed with --unix was 178 KB/s. Batching up the chunks (which were generally 4 KB each) into a 64 KB buffer increased the speed to 2083 KB/s. With a 1 MB buffer, I got 6386 KB/s.

Here’s the code I used:

<notextile><figure class="code">
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
let page_buffer = Io_page.get 256 |> Io_page.to_cstruct in

(* Set the first byte to N to indicate that we're not done yet.
 * If we reboot while this flag is set, the partial upload will
 * be deleted. *)
let page_buffer_used = ref 1 in
Cstruct.set_char page_buffer 0 'N';
let file_offset = ref 1 in

let flush_page_buffer () =
  Log.info "Flushing %d bytes to disk" !page_buffer_used >>= fun () ->
  let buffered_data = Cstruct.sub page_buffer 0 !page_buffer_used in
  F.write q.fs name !file_offset buffered_data >>|= fun () ->
  file_offset := !file_offset + !page_buffer_used;
  page_buffer_used := 0;
  return () in

let rec add_data src i =
  let src_remaining = String.length src - i in
  if src_remaining = 0 then return ()
  else (
    let page_buffer_free = Cstruct.len page_buffer - !page_buffer_used in
    let chunk_size = min page_buffer_free src_remaining in
    Cstruct.blit_from_string src i page_buffer !page_buffer_used chunk_size;
    page_buffer_used := !page_buffer_used + chunk_size;
    lwt () =
      if page_buffer_free = chunk_size then flush_page_buffer ()
      else return () in
    add_data src (i + chunk_size)
  ) in

data |> Lwt_stream.iter_s (fun data -> add_data data 0) >>=
flush_page_buffer
</figure></notextile>

Asking on the mailing list confirmed that Fat is not well optimised. This isn’t actually a problem for my service, since it’s still faster than my Internet connection, but there’s clearly more work needed here.

Upload speed on Xen

Testing on my little CubieTruck board, I then got:

Upload speed 74 KB/s
Download speed 1.6 KB/s

Hmm. To get a feel for what the board is capable of, I ran nc -l -p 8080 < /dev/zero on the board and nc cubietruck 8080 | pv > /dev/null on my laptop, getting 29 MB/s.

Still, my unikernel is running as a guest, meaning it has the overhead of using the virtual network interface (it has to pass the data to dom0, which then sends it over the real interface). So I installed a Linux guest and tried from there. 47.2 MB/s. Interesting. I have no idea why it’s faster than dom0!

I loaded up Wireshark to see what was happening with the unikernel transfers. The upload transfer mostly went fast, but stalled in the middle for 15 seconds and then for 12 seconds at the end. Wireshark showed that the unikernel was ack’ing the packets but reducing the TCP window size, indicating that the packets weren’t being processed by the application code. The delays corresponded to the times when we were flushing the data to the SD card, which makes sense. So, this looks like another filesystem problem (we should be able to write to the SD card much faster than this).

TCP retransmissions

For the download, Wireshark showed that many of the packets had incorrect TCP checksums and were having to be retransmitted. I was already familiar with this bug from a previous mailing list discussion: wireshark capture of failed download from mirage-www on ARM. That turned out be a Linux bug - the privileged dom0 code responsible for sending our virtual network packets to the real network becomes confused if two packets occupy the same physical page in memory.

Here’s what happens:

  1. We read 1 MB of data from the disk and send it to the HTTP layer as the next chunk.
  2. Chunked.write does the HTTP chunking and sends it to the TCP/IP channel.
  3. Channel.write_string writes the HTTP output into pages (aligned 4K blocks of memory).
  4. Pcb.writefn then determines that each page is too big for a TCP packet and splits each one into smaller chunks, sharing the single underlying page:
<notextile><figure class="code">
1
2
3
4
5
6
7
8
9
10
11
12
13
  let rec writefn pcb wfn data =
    let len = Cstruct.len data in
    match write_available pcb with
    | 0 ->
      write_wait_for pcb 1 >>
      writefn pcb wfn data
    | av_len when av_len < len ->
      let first_bit = Cstruct.sub data 0 av_len in
      let remaing_bit = Cstruct.sub data av_len (len - av_len) in
      writefn pcb wfn first_bit  >>
      writefn pcb wfn remaing_bit
    | av_len ->
      wfn [data]
</figure></notextile>

My original fix changed mirage-net-xen to wait until the first buffer had been read before sending the second one. That fixed the retransmissions, but all the waiting meant I still only got 56 KB/s. Instead, I changed writefn to copy remaining_bit into a new IO page, and with that I got 495 KB/s.

Replacing the filesystem read with a simple String.create of the same length, I got 3.9 MB/s, showing that once again the FAT filesystem was now the limiting factor.

Adding a block cache

I tried adding a block cache layer between mirage-block-xen and fat-filesystem, like this:

<notextile><figure class="code">
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
module Main (C : V1_LWT.CONSOLE)
            (B : V1_LWT.BLOCK)
	    (H : Cohttp_lwt.Server) = struct
  module BC = Block_cache.Make(B)
  module F = Fat.Fs.Make(BC)(Io_page)
  module Q = Upload_queue.Make(F)

  let mem_cache_size = 1024 * 1024	(* 1 MB *)

  let start c b http =
    Log.write := C.log_s c;
    Log.info "start in queue service" >>= fun () ->

    BC.connect (b, mem_cache_size) >>= function
    | `Error _ -> failwith "BC.connect"
    | `Ok bc ->
    F.connect bc >>= function
    | `Error _ -> failwith "F.connect"
    | `Ok fs ->
    Q.create fs >>= fun q ->
...
</figure></notextile>

With this in place, upload speed remains at 76 KB/s, but the download speed increases to 1 MB/s (for a 20 MB file, which therefore doesn’t fit in the cache). This suggests that the FAT filesystem is reading the same disk sectors many times. Enlarging the memory cache to cover the whole file, the download speed only increases to 1.3 MB/s, so the FAT code must be doing some inefficient calculations too.

Replacing FAT

Since most of my problems seemed to be coming from using FAT, I decided to try a new approach. I removed all the FAT code and the block cache and changed upload_queue.ml to write directly to the block device. With that (no caching), I get:

Upload speed 2.27 MB/s
Download speed 2.46 MB/s

That’s not too bad. It’s faster than my Internet connection, which means that the unikernel is no longer the limiting factor.

Here’s the new version: upload_queue.ml. The big simplification comes from knowing that the queue will spend most of its time empty (another good reason to use a small VM for it). The code has a next_free_sector which it advances every time an upload starts. When the queue becomes empty and there are no uploads in progress this variable is reset back to sector 1 (sector 0 holds the index). This does mean that we may report disk full errors to uploaders even when there is free space on the disk, but this won’t happen in typical usage because the repository downloads things as soon as they’re uploaded (if it does happen, it just means uploaders have to wait a couple of minutes until the repository empties the queue).

Managing the block device manually brought a few more advantages over FAT:

  1. No need to generate random file names for the uploads.
  2. No need to delete incomplete uploads (we only write the file’s index entry to disk on success).
  3. The system should recover automatically from filesystem corruption because invalid entries can be detected reliably at boot time and discarded.
  4. Disk full errors are reported correctly.
  5. The queue ordering isn’t lost on reboot.

Conclusions

Modern operating systems are often extremely complex, but much of this is historical baggage which isn’t needed on a modern system where you’re running a single application as a VM under a hypervisor. Mirage allows you to create very small VMs which contain almost no C code. These VMs should be easier to write, more reliable and more secure.

Creating a bootable OCaml kernel is surprisingly easy, and from there adding support for extra devices is just a matter of pulling in the appropriate libraries. By programming against generic interfaces, you can create code that runs under Linux/Unix/OS X or as a virtual machine under Xen, and switch between configurations using the mirage tool.

Mirage is still very young, and I found many rough edges while writing my queuing service for 0install:

  • While Linux provides fast, reliable filesystems as standard, Mirage currently only provides a basic FAT implementation.
  • Linux provides caching as standard, while you have to implement this yourself on Mirage.
  • Error reporting should be a big improvement over C’s error codes, but getting friendly error messages from Mirage is currently difficult.
  • The system has clearly been designed for high performance (the APIs generally write to user-provided buffers to avoid copying, much like C libraries do), but many areas have not yet been optimised.
  • Buffers often have extra requirements (e.g. must be page-aligned, a single page, immutable, etc) which are not currently captured in the type system, and this can lead to run-time errors which would ideally have been detected at compile time.

However, there is a huge amount of work happening on Mirage right now and it looks like all of these problems are being worked on. If you’re interested in low-level OS programming and don’t want to mess about with C, Mirage is a lot of fun, and it can be useful for practical tasks already with a bit of effort.

There are still many areas I need to find out more about. In particular, using the new pure-OCaml TLS stack to secure the system and trying the Irmin Git-like distributed, branchable storage to provide the queue instead of writing it myself. I hope to try those soon…

July 28, 2014 04:21 PM

Github OCaml jobs

Full Time: Software Developer (Functional Programming) at Jane Street in New York, NY; London, UK; Hong Kong

Software Developer (Functional Programming)

Jane Street is looking to hire great software developers with an interest in functional programming. OCaml, a statically typed functional programming with similarities to Haskell, Scheme, Erlang, F# and SML, is our language of choice. We've got the largest team of OCaml developers in any industrial setting, and probably the world's largest OCaml codebase. We use OCaml for running our entire business, supporting everything from research to systems administration to trading systems. If you're interested in seeing how functional programming plays out in the real world, there's no better place.

The atmosphere is informal and intellectual. There is a focus on education, and people learn about software and trading, both through formal classes and on the job. The work is challenging, and you get to see the practical impact of your efforts in quick and dramatic terms. Jane Street is also small enough that people have the freedom to get involved in many different areas of the business. Compensation is highly competitive, and there's a lot of room for growth.

You can learn more about Jane Street and our technology from our main site, janestreet.com. You can also look at a a talk given at CMU about why Jane Street uses functional programming (http://ocaml.janestreet.com/?q=node/61), and our programming blog (http://ocaml.janestreet.com).

We also have extensive benefits, including:

  • 90% book reimbursement for work-related books
  • 90% tuition reimbursement for continuing education
  • Excellent, zero-premium medical and dental insurance
  • Free lunch delivered daily from a selection of restaurants
  • Catered breakfasts and fresh brewed Peet's coffee
  • An on-site, private gym in New York with towel service
  • Kitchens fully stocked with a variety of snack choices
  • Full company 401(k) match up to 6% of salary, vests immediately
  • Three weeks of paid vacation for new hires in the US
  • 16 weeks fully paid maternity/paternity leave for primary caregivers, plus additional unpaid leave

More information at http://janestreet.com/culture/benefits/

July 28, 2014 12:43 PM

July 27, 2014

OCamlCore Forge News

Release of OCaml-safepass 1.3

This version features a number of patches submitted by Jonathan Curran, bringing OCaml-safepass up-to-date with the latest Bcrypt sources and enabling GCC optimisations for compiling the C portions.

by Dario Teixeira at July 27, 2014 09:32 AM

July 26, 2014

Shayne Fletcher

Cartesian product (redux)

Cartesian product (redux)

In this blog post we looked at programs to compute Cartesian Products. One algorithm (given here in OCaml) if you recall is

let prod l r =
let g acc a =
let f acc x =
(a, x) :: acc
in List.fold_left f acc r
in List.fold_left g [] l |> List.rev

In that earlier post I translated the above program into C++. This time I'm doing the same straightforward translation but using the Boost.MPL library to compute such products at compile time. It looks like this:


#include <boost/mpl/pair.hpp>
#include <boost/mpl/vector.hpp>
#include <boost/mpl/push_front.hpp>
#include <boost/mpl/fold.hpp>
#include <boost/mpl/reverse.hpp>
#include <boost/mpl/placeholders.hpp>

using namespace boost::mpl::placeholders;

template <class L, class R>
struct product
{
struct g {
template <class AccT, class A>
struct apply {
typedef typename boost::mpl::fold <
R, AccT
, boost::mpl::push_front<_1, boost::mpl::pair<A, _2> >
>::type type;
};
};

typedef typename boost::mpl::reverse <
typename boost::mpl::fold <
L, boost::mpl::vector<>, g>::type>::type type;
};
The translation proceeds almost mechanically! Does it work though? You betcha! Here's a test we can convince ourselves with:

#include <boost/mpl/equal.hpp>
#include <boost/mpl/for_each.hpp>
#include <boost/mpl/int.hpp>

#include <iostream>

using namespace boost::mpl;

typedef vector<int_<1>, int_<2> > A;
typedef vector<int_<3>, int_<4>, int_<5> > B;
typedef product <A, B>::type result;

BOOST_MPL_ASSERT((
equal<
result
, vector<
pair<int_<1>, int_<3> >
, pair<int_<1>, int_<4> >
, pair<int_<1>, int_<5> >
, pair<int_<2>, int_<3> >
, pair<int_<2>, int_<4> >
, pair<int_<2>, int_<5> >
>
> ));

struct print_class_name {
template <typename T>
void operator()( T t ) const {
std::cout << typeid(t).name() << " ";
}
};

int main ()
{
boost::mpl::for_each<result>(print_class_name());

return 0;
}
The takeaway is, learning yourself some functional programming is a great way to improve your template meta-programming fu! (That of course should come as no surprise... )

by Shayne Fletcher (noreply@blogger.com) at July 26, 2014 02:50 PM

July 25, 2014

Mindy Preston

Doing Nothing in Mirage

It’s Northern Hemisphere summer right now, and in Wisconsin we’re having one of the loveliest ones I can remember. Today the temperature is hovering right at pleasant, there are high clouds blowing across the sky, the breeze is soothing, and birds are singing all over the place. It is not, in short, programming weather. It is sit-outside, read-a-novel, do-nothing weather.

We don’t often let our programs slack off, even when we let ourselves take a peaceful day. I got to wondering (staring off into space, watching the shadows cast by sun-dappled leaves) what the most trivial, do-nothing Mirage project would look like, and how it could be constructed with a minimum of activity and a maximum of understanding.

[] dothraki@iBook:~$ mkdir trivial [] dothraki@iBook:~$ cd trivial/ [] dothraki@iBook:~/trivial$ ls -alh total 16K drwxrwxr-x 2 dothraki dothraki 4.0K Jul 23 13:17 . drwxr-xr-x 161 dothraki dothraki 12K Jul 23 13:17 .. [] dothraki@iBook:~/trivial$ mirage configure --xen [ERROR] No configuration file config.ml found. You'll need to create one to let Mirage know what to do.

Okay, we’ll have to do at least one thing to make this work. Mirage uses config.ml to programmatically generate a Makefile and main.ml when you invoke mirage --configure. main.ml uses instructions from config.ml to satisfy module types representing driver requirements for your application, then begins running the threads you requested that it run. That all sounds an awful lot like work; maybe we can get away with not asking for anything.

[] dothraki@iBook:~/trivial$ touch config.ml [] dothraki@iBook:~/trivial$ mirage configure --xen Mirage Using scanned config file: config.ml Mirage Processing: /home/dothraki/trivial/config.ml Mirage => rm -rf /home/dothraki/trivial/_build/config.* Mirage => cd /home/dothraki/trivial && ocamlbuild -use-ocamlfind -tags annot,bin_annot -pkg mirage config.cmxs empty Using configuration: /home/dothraki/trivial/config.ml empty 0 jobs [] empty => ocamlfind printconf path empty Generating: main.ml empty Now run 'make depend' to install the package dependencies for this unikernel. [] dothraki@iBook:~/trivial$ ls _build config.ml empty.xl log main.ml Makefile

That seems like a great start! Maybe we can trivially achieve our dream of doing nothing.

[] dothraki@iBook:~/trivial$ make depend opam install mirage-xen --verbose [NOTE] Package mirage-xen is already installed (current version is 1.1.1).

Resting on our laurels. Excellent. (In keeping with the lazy theme of this post, I’ll elide the make depend step from future examples, but if you’re playing along at home you may discover that you need to run it when you introduce new complexity in pursuit of perfect non-action.)

[] dothraki@iBook:~/trivial$ make ocamlbuild -classic-display -use-ocamlfind -pkgs lwt.syntax,mirage-types.lwt -tags "syntax(camlp4o),annot,bin_annot,strict_sequence,principal" -cflag -g -lflags -g,-linkpkg,-dontlink,unix main.native.o ocamlfind ocamldep -package mirage-types.lwt -package lwt.syntax -syntax camlp4o -modules main.ml > main.ml.depends ocamlfind ocamlc -c -g -annot -bin-annot -principal -strict-sequence -package mirage-types.lwt -package lwt.syntax -syntax camlp4o -o main.cmo main.ml + ocamlfind ocamlc -c -g -annot -bin-annot -principal -strict-sequence -package mirage-types.lwt -package lwt.syntax -syntax camlp4o -o main.cmo main.ml File "main.ml", line 8, characters 2-13: Error: Unbound module OS Command exited with code 2. make: *** [main.native.o] Error 10 [] dothraki@iBook:~/trivial$

Oh, bother.

Let’s have a look at the main.ml generated by mirage configure --xen with our empty config.ml.

``` ( Generated by Mirage (Wed, 23 Jul 2014 18:21:24 GMT). )

open Lwt

let _ = Printexc.record_backtrace true

let () = OS.Main.run (join []) ```

This looks like the right general idea – OS.Main.run to invoke a thread, join [] (from Lwt) to operate on an empty list of work to do. We have no work to do. It’s a nice day.

Unfortunately, we can’t accomplish our goal of doing nothing if we can’t do anything (in other words, we can’t run our zero threads if we have no idea how to run anything), so we’ll have to do a little more work first.

If we can figure out how to get OS.Main.run included, we’ll be off to the races. Let’s have a look at the Makefile, which is also programmatically generated by mirage configure and shows which libraries the invocation of make will link against.

[] dothraki@iBook:~/trivial$ grep ^LIBS Makefile LIBS = -pkgs lwt.syntax,mirage-types.lwt

Mirage applications will always get mirage-types.lwt and lwt.syntax, and might get more libraries included if they’re requested in config.ml or required by a driver requested there. Unfortunately, neither of these gets us OS.

Some snooping on utop with strace reveals that an implementation for OS lives in the mirage-xen and mirage-unix libraries. If we add mirage-xen to LIBS manually in the Makefile (or mirage-unix, if we generated the Makefile with mirage configure --unix) and then make, we do get something runnable. But that’s not good enough! We don’t want to manually edit files that a computer could generate for us! That sounds like work, and that’s not what today is all about.

One possible solution is to do is write a few lines into our config.ml to request that an appropriate library be included in LIBS. We could ask for mirage-unix directly, but that wouldn’t work when we want to do nothing in Xen, and mirage-xen has the same problem when we want to do nothing in Unix. Every time we wanted to do nothing on a different platform, we’d have to change our config.ml! No way.

Instead, let’s request a driver that imports the OS interface. Numerous drivers are available, but the simplest one is CONSOLE, which provides basic text output on a screen. Let’s pick that one.

In order to use CONSOLE, we’ll have to add some code to our config.ml, disturbing its state of complete relaxation. Minimally, we need a call to register, which is defined in mirage.ml:

ocaml val register: string -> job impl list -> unit (** [register name jobs] registers the application named by [name] which will executes the given [jobs]. *)

It appears we’ll have to define at least one job impl, where we’ll do the hard work of defining doing nothing, then register it with some name. Say, out_to_lunch. Mirage expects us to define the job in another file, then tell config.ml where to find it with foreign (source):

ocaml val foreign: string -> ?libraries:string list -> ?packages:string list -> 'a typ -> 'a impl (** [foreign name libs packs constr typ] states that the module named by [name] has the module type [typ]. If [libs] is set, add the given set of ocamlfind libraries to the ones loaded by default. If [packages] is set, add the given set of OPAM packages to the ones loaded by default. *)

So we’ll have to make another file, say maybe loaf.ml, with our top-secret and highly-valuable instructions for slacking off. (Before we get too high and mighty about all the instructions our project needs just to not do anything, remember that sometimes humans need help with this too.) loaf.ml needs to have at least one module specified that can take a CONSOLE module argument; traditionally unikernels call this module Main, but we can call it anything. Say, for example, Relax.

Modules in OCaml can’t be completely empty, and we’ll need a start function for the program to link, so we’ll just get all that hard work out of the way and define it right now. Since Mirage will want to pass information on how to use the specific instance of a console to the program, start will have to accept an argument of type console_impl. Luckily, there’s no requirement that we actually do anything of consequence in there, so we can finally define what it means to do nothing – Lwt.return (). A full explanation of Lwt is outside the scope of our laziness today, but there is great documentation both through the Mirage website and at the Ocsigen site for Lwt.

```ocaml module Relax (C: V1_LWT.CONSOLE) = struct let start console =

  Lwt.return ()

end ```

We define Relax as a module parameterized by a V1_LWT.CONSOLE module (which we have no plan to use, but config.ml won’t be able to make a sensible program for us unless we’re ready to accept it). start takes a console argument representing a particular implementation of a console. If we had any plans to output anything, we’d do it by calling a function with console as an argument, but we’re not going to do that. We’re just going to relax today.

Now that we know how to relax, we can point config.ml at this code with foreign, so that we’ll eventually be able to register it. The code we want to run is in the file loaf.ml, in a module called Relax, so our first argument to foreign will be "Loaf.Relax". (This works so simply because config.ml and loaf.ml are in the same directory – if you structure things differently, you’ll have to work harder, so I recommend against it.)

Our second argument to foreign, according to the type signature, should be a 'a typ, and if we do so, the function will return a 'a impl. In order to call register, we need at least one job impl to put in the list, so we need to somehow make 'a be job in our call to foreign. Mirage provides a job impl called, simply, job.

We can’t just pass the results of foreign "Loaf.Relax" job off to register, though, because if we do that, mirage configure won’t know that we wanted a console, and we’ll still have no libraries that know about OS.Main.run in our Makefile. What we really need is a (console -> job) typ, so we can get a (console -> job) impl, and then pass it a console impl to get a job impl out.

Mirage provides two combinators for making this work – @-> for composing a 'a typ and 'b typ into a ('a -> 'b) typ, and a combinator $ for applying a ('a -> 'b) impl to a 'a impl and getting a 'b impl back. We can use foreign "Loaf.Relax" (console @-> job) to get a (console -> job) typ, let-bind this value to a variable (say, relax), and use this along with a Mirage-provided console impl as an argument to register.

```ocaml open Mirage

let () =
let relax = foreign “Loaf.Relax” (console @–> job) in
register “out_to_lunch” [ relax $ default_console ] ```

A mirage configure --xen and make get us a small, very lazy unikernel named mir-out_to_lunch.xen and an automatically generated configuration file for Xen named out_to_lunch.xl. (Both names are taken from the string we pass as the first argument to register.) If we start it up this unikernel with sudo xm create -c out_to_lunch.xl, we get the following (somewhat disappointing) output:

[] dothraki@iBook:~/trivial$ sudo xm create -c out_to_lunch.xl [sudo] password for dothraki: Using config file "./out_to_lunch.xl". Started domain out_to_lunch (id=2) Domain has already finished Could not start console

Our unikernel finishes so quickly that Xen can’t attach a console to it. If we start it paused and then unpause it after our console attaches, we can watch it boot:

[] dothraki@iBook:~/trivial$ sudo xm create -c out_to_lunch.xl -p Using config file "./out_to_lunch.xl". Started domain out_to_lunch (id=3)

and, in another window, sudo xm unpause out_to_lunch, to let out_to_lunch start booting. We then see the output sent to the console by Mirage as it’s booting, before control is handed over to our user program – in this case, return (), which executes immediately, returns from the main program, and informs Xen that the virtual machine is shutting down.

Started domain out_to_lunch (id=3) kernel.c: Mirage OS! kernel.c: start_info: 0x11c1000(VA) kernel.c: nr_pages: 0x10000 ... x86_mm.c: Demand map pfns at 10001000-2010001000. Initialising timer interface main returned 0 [] dothraki@iBook:~/trivial$

Doing Nothing with Different Drivers

We’ve succesfully done nothing with this simple config.ml:

```ocaml open Mirage

let () = let relax = foreign “Loaf.Relax” (console @–> job) in register “out_to_lunch” [ relax $ default_console ] ```

What if we want to do nothing over the network? Or do nothing with a filesystem? Or do nothing with all of these things?

Including support for a driver requires us to have two things: a typ representation to include in the arguments to foreign, and an impl to include in the list of arguments to register. The available typs and impls are defined in mirage.mli; some typs have more than one corresponding impl, or have parameterized constructors for impls.

For example, there are two ways to get a network impl, the impl representing raw device-level access to a network card:

  • val tap0: network impl
  • val netif: string -> network impl

tap0 just returns the first available network interface; netif takes a string argument and attempts to find a matching network interface, then makes that interface available in the returned value. The same network typ applies for both:

``` open Mirage

let () = let relax = foreign “Loaf.Relax” (network @–> network @–> job) in register “out_to_lunch” [ relax $ tap0 $ netif “1” ] ```

This config.ml will build against a loaf.ml that has a Relax module parameterized by two V1_LWT.NETWORK modules, and a start function that expects two network impl arguments:

``` module Relax (IGNORED: V1_LWT.NETWORK) (ALSO_IGNORED: V1_LWT.NETWORK) = struct let start default string_parameterized =

Lwt.return ()

end ```

(To actually run this in Xen, we need to make a couple alterations to the autogenerated out_to_lunch.xl so that two network interfaces are actually provided, but this can be done once and saved off somewhere so as not to be overwritten by subsequent rebuilds.)

We can do nothing with multiple kinds of drivers, too:

``` open Mirage

let () =
let relax = foreign “Loaf.Relax” (network @–> console @–> job) in
register “out_to_lunch” [ relax $ tap0 $ default_console ]
```

If we modify loaf.ml have a Relax module parameterized by a V1_LWT.NETWORK and V1_LWT.CONSOLE, and provide a start with network impl and console impl parameters, we’re off to the races.

For The Ambitious

The start function in Loaf.Relax can do more than just immediately return, of course. Programmers with ambition, gumption, and Tasks To Accomplish can define programs that Get Things Done, and launch them from start. Programs that plan to use, say, a network interface, can call functions provided by V1_LWT.NETWORK on the network impl provided to start to generate and receive network traffic; console programs can write to their provided console impl. The “Hello Mirage World” tutorial gives great examples and instructions for making things happen – perfect for a rainy day.

Today’s more of a return () type of day, though.

July 25, 2014 06:30 PM