Dependent types are a very useful feature - the gold standard of enforcing invariants at compile time. However, they are still very much not practical, especially considering inference for unrestricted dependent types is equivalent to higher-order unification, which was proven to be undecidable.

Fortunately, many of the benefits that dependent types bring aren’t because of dependent products themselves, but instead because of associated features commonly present in those programming languages. One of these, which also happens to be especially easy to mimic, are *inductive families*, a generalisation of inductive data types: instead of defining a single type inductively, one defines an entire *family* of related types.

Many use cases for inductive families are actually instances of a rather less general concept, that of generalised algebraic data types, or GADTs: Contrary to the indexed data types of full dependently typed languages, these can and are implemented in several languages with extensive inference, such as Haskell, OCaml and, now, Amulet.

Before I can talk about their implementation, I am legally obligated to present the example of *length indexed vectors*, linked structures whose size is known at compile time—instead of carrying around an integer representing the number of elements, it is represented in the type-level by a Peano^{1} natural number, as an *index* to the vector type. By universally quantifying over the index, we can guarantee by parametricity^{2} that functions operating on these don’t do inappropriate things to the sizes of vectors.

```
type z ;;
type s 'k ;;
type vect 'n 'a =
| Nil : vect z 'a
| Cons : 'a * vect 'k 'a -> vect (s 'k) 'a
```

Since the argument `'n`

to `vect`

(its length) varies with the constructor one chooses, we call it an *index*; On the other hand, `'a`

, being uniform over all constructors, is called a *parameter* (because the type is *parametric* over the choice of `'a`

). These definitions bake the measure of length into the type of vectors: an empty vector has length 0, and adding an element to the front of some other vector increases the length by 1.

Matching on a vector reveals its index: in the `Nil`

case, it’s possible to (locally) conclude that it had length `z`

. Meanwhile, the `Cons`

case lets us infer that the length was the successor of some other natural number, `s 'k`

, and that the tail itself has length `'k`

.

If one were to write a function to `map`

a function over a `vect`

or, they would be bound by the type system to write a correct implementation - well, either that or going out of their way to make a bogus one. It would be possible to enforce total correctness of a function such as this one, by adding linear types and making the vector parameter linear.

```
let map (f : 'a -> 'b) (xs : vect 'n 'a) : vect 'n 'b =
match xs with
| Nil -> Nil
| Cons (x, xs) -> Cons (f x, map f xs) ;;
```

If we were to, say, duplicate every element in the list, an error would be reported. Unlike some others, this one is not very clear, and it definitely could be improved.

```
Occurs check: The type variable jx
occurs in the type s 'jx
· Arising from use of the expression
Cons (f x, Cons (f x, map f xs))
│
33 │ | Cons (x, xs) -> Cons (f x, Cons (f x, map f xs)) ;;
│ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
```

This highlights the essence of GADTs: pattern matching on them reveals equalities about types that the solver can later exploit. This is what allows the programmer to write functions that vary their return types based on their inputs - a very limited form of type-term dependency, which brings us ever closer to the Calculus of Constructions corner of Barendregt’s lambda cube^{3}.

The addition of generalised algebraic data types has been in planning for over two years—it was in the original design document. In a mission that not even my collaborator noticed, all of the recently-added type system and IR features were directed towards enabling the GADT work: bidirectional type checking, rank-N polymorphism and coercions.

All of these features had cover stories: higher-ranked polymorphism was motivated by monadic regions; bidirectional type checking was motivated by the aforementioned polymorphism; and coercions were motivated by newtype optimisation. But, in reality, it was a conspiracy to make GADTs possible: having support for these features simplified implementing our most recent form of fancy types, and while adding all of these in one go would be possible, doing it incrementally was a lot saner.

While neither higher-ranked types nor GADTs technically demand a bidirectional type system, implementing them with such a specification is considerably easier, removing the need for workarounds such as boxy types and a distinction between rigid/wobbly type variables. Our algorithm for GADT inference rather resembles Richard Eisenberg’s Bake^{4}, in that it only uses local equalities in *checking* mode.

Adding GADTs also lead directly to a rewrite of the solver, which now has to work with *implication constraints*, of the form `(Q₁, ..., Qₙ) => Q`

, which should be read as “Assuming `Q₁`

through `Qₙ`

, conclude `Q`

.” Pattern matching on generalised constructors, in checking mode, captures every constraint generated by checking the right-hand side of a clause and captures that as an implication constraint, with all the constructor-bound equalities as assumptions. As an example, this lets us write a type-safe cast function:

```
type eq 'a 'b = Refl : eq 'a 'a
(* an inhabitant of eq 'a 'b is a proof that 'a and 'b are equal *)
let subst (Refl : eq 'a 'b) (x : 'a) : 'b = x ;;
```

Unfortunately, to keep inference decidable, many functions that depend on generalised pattern matching need explicit type annotations, to guide the type checker.

When *checking* the body of the function, namely the variable reference `x`

, the solver is working under an assumption `'a ~ 'b`

(i.e., `'a`

and `'b`

stand for the same type), which lets us unify the stated type of `x`

, namely `'a`

, with the return type of the function, `'b`

.

If we remove the local assumption, say, by not matching on `Refl`

, the solver will not be allowed to unify the two type variables `'a`

and `'b`

, and an error message will be reported^{5}:

```
examples/gadt/equality.ml[11:43 ..11:43]: error
Can not unify rigid type variable b with the rigid type variable a
· Note: the variable b was rigidified because of a type ascription
against the type forall 'a 'b. t 'a 'b -> 'a -> 'b
and is represented by the constant bq
· Note: the rigid type variable a, in turn,
was rigidified because of a type ascription
against the type forall 'a 'b. t 'a 'b -> 'a -> 'b
· Arising from use of the expression
x
│
11 │ let subst (_ : t 'a 'b) (x : 'a) : 'b = x ;;
│ ~
```

Our intermediate language was also extended, from a straightforward System F-like lambda calculus with type abstractions and applications, to a System F_{C}-like system with *coercions*, *casts*, and *coercion abstraction*. Coercions are the evidence, produced by the solver, that an expression is usable as a given type—GADT patterns bind coercions like these, which are the “reification” of an implication constraint. This lets us make type-checking on the intermediate language fast and decidable^{6}, as a useful sanity check.

The two new judgements for GADT inference correspond directly to new cases in the `infer`

and `check`

functions, the latter of which I present here for completeness. The simplicity of this change serves as concrete evidence of the claim that bidirectional systems extend readily to new, complex features, producing maintainable and readable code.

```
check (Match t ps a) ty = do
(t, tt) <- infer t
ps <- for ps $ \(p, e) -> do
(p', ms, cs) <- checkPattern p tt
let tvs = Set.map unTvName (boundTvs p' ms)
(p',) <$> implies (Arm p e) tt cs
(local (typeVars %~ Set.union tvs)
(extendMany ms (check e ty)))
pure (Match t ps (a, ty))
```

This corresponds to the checking judgement for matches, presented below. Note that in my (rather informal) theoretical presentation of Amulet typing judgements, we present implication constraints as a lexical scope of equalities conjoined with the scope of variables; Inference judgements (with double right arrows, \(\Rightarrow\)) correspond to uses of `infer`

, pattern checking judgements (\(\Leftarrow_\text{pat}\)) correspond to `checkPattern`

, which also doubles as \(\mathtt{binds}\) and \(\mathtt{cons}\), and the main checking judgement \(\Leftarrow\) is the function `check`

.

\[ \frac{\Gamma; \mathscr{Q} \vdash e \Rightarrow \tau \quad \Gamma \vdash p_i \Leftarrow_\text{pat} \tau \quad \Gamma, \mathtt{binds}(p_i); \mathscr{Q}, \mathtt{cons}(p_i) \vdash e_i \Leftarrow \sigma} {\Gamma; \mathscr{Q} \vdash \mathtt{match}\ e\ \mathtt{with}\ \{p_i \to e_i\} \Leftarrow \sigma} \]

Our implementation of the type checker is a bit more complex, because it also does (some) elaboration and bookkeeping: tagging terms with types, blaming type errors correctly, etc.

This new, complicated feature was a lot harder to implement than originally expected, but in the end it worked out. GADTs let us make the type system *stronger*, while maintaining the decidable inference that the non-fancy subset of the language enjoys.

The example presented here was the most boring one possible, mostly because two weeks ago I wrote about their impact on the language’s ability to make things safer.

Peano naturals are one particular formulation of the natural numbers, which postulates that zero (denoted

`z`

above) is a natural number, and any natural number’s successor (denoted`s 'k`

above) is itself natural.↩This is one application of Philip Wadler’s Theorems for Free technique: given a (polymorphic) type of some function, we can derive much of its behaviour.↩

Amulet is currently somewhere on the edge between λ2 - the second order lambda calculus, System F, and λP2, a system that allows quantification over types and terms using the dependent product form, which subsumes both the ∀ binder and the → arrow. Our lack of type functions currently leaves us very far from the CoC.↩

See his thesis. Our algorithm, of course, has the huge simplification of not having to deal with full dependent types.↩

And quite a good one, if I do say so! The compiler syntax highlights and pretty-prints both terms and types relevant to the error, as you can see here.↩

Even if we don’t do it yet—work is still ongoing to make the type checker and solver sane.↩

Ever since its inception, Amulet has strived to be a language that *guarantees* safety, to some extent, with its strong, static, inferred type system. Through polymorphism we gain the concept of *parametricity*, as explained in Philip Wadler’s Theorems for Free: a function’s behaviour does not depend on the instantiations you perform.

However, the power-to-weight ratio of these features quickly plummets, as every complicated type system extension makes inference rather undecidable, which in turn mandates more and more type annotations. Of the complex extensions I have read about, three struck me as particularly elegant, and I have chosen to implement them all in Amulet:

- Generalised Algebraic Data Types, which this post is about;
- Row Polymorphism, which allows being precise about which structure fields a function uses; and
- Rank-N types, which enables the implementation of many concepts including monadic regions.

Both GADTs and rank-N types are in the “high weight” category: inference in the presence of both is undecidable. Adding support for the latter (which laid the foundations for the former) is what drove me to re-write the type checker, a crusade detailed in my last post.

Of course, in the grand scheme of things, some languages provide way more guarantees than Amulet: For instance, Rust, with its lifetime system, can prove that code is memory-safe at compile time; Dependently-typed languages such as Agda and Idris can express a lot of invariants in their type system, but inference is completely destroyed. Picking which features you’d like to support is a game of tradeoffs—all of them have benefits, but some have exceedingly high costs.

Amulet was originally based on a very traditional, HM-like type system with support for row polymorphism. The addition of rank-N polymorphism and GADTs instigated the move to a bidirectional system, which in turn provided us with the ability to experiment with a lot more type system extensions (for instance, linear types)—in pursuit of more guarantees like parametricity.

In a sense, generalised ADTs are a “miniature” version of the inductive families one would find in dependently-typed programming (and, indeed, Amulet can type-check *some* uses of length-indexed vectors, although the lack of type-level computation is a showstopper). They allow non-uniformity in the return types of constructors, by packaging “coercions” along with the values; pattern matching, then, allows these coercions to influence the solving of particular branches.

Since this is an introduction to indexed types, I am legally obligated to present the following three examples: the type of equality witnesses between two other types; higher-order abstract syntax, the type of well-formed terms in some language; and *vectors*, the type of linked lists with statically-known lengths.

As is tradition in intuitionistic type theory, we define equality by postulating (that is, introducing a *constructor* witnessing) reflexivity: anything is equal to itself. Symmetry and transitivity can be defined as ordinary pattern-matching functions. However, this demonstrates the first (and main) shortcoming of our implementation: Functions which perform pattern matching on generalised constructors *must* have explicitly stated types.^{1}

```
type eq 'a 'b =
| Refl : eq 'a 'a ;;
let sym (Refl : eq 'a 'b) : eq 'b 'a = Refl ;;
let trans (Refl : eq 'a 'b) (Refl : eq 'b 'c) : eq 'a 'c = Refl ;;
```

Equality, when implemented like this, is conventionally used to implement substitution: If there exists a proof that `a`

and `b`

are equal, any `a`

may be treated as a `b`

.

Despite `a`

and `b`

being distinct, *rigid* type variables, matching on `Refl`

allows the constraint solver to treat them as equal.

```
type z ;; (* the natural zero *)
type s 'k ;; (* the successor of a number *)
type vect 'n 'a = (* vectors of length n *)
| Nil : vect z 'a
| Cons : 'a * vect 'k 'a -> vect (s 'k) 'a
```

Parametricity can tell us many useful things about functions. For instance, all closed, non-looping inhabitants of the type `forall 'a. 'a -> 'a`

are operationally the identity function. However, expanding the type grammar tends to *weaken* parametricity before making it stronger. Consider the type `forall 'a. list 'a -> list 'a`

—it has several possible implementations: One could return the list unchanged, return the empty list, duplicate every element in the list, drop some elements around the middle, among *many* other possible behaviours.

Indexed types are beyond the point of weakening parametricity, and start to make it strong again. Consider a function of type `forall 'a 'n. ('a -> 'a -> ordering) -> vect 'n 'a -> vect 'n 'a`

—by making the length of the vector explicit in the type, and requiring it to be kept the same, we have ruled out any implementations that drop or duplicate elements. A win, for sure, but at what cost? An implementation of insertion sort for traditional lists looks like this, when implemented in Amulet:

```
let insert_sort cmp l =
let insert e tl =
match tl with
| Nil -> Cons (e, Nil)
| Cons (h, t) -> match cmp e h with
| Lt -> Cons (e, Cons (h, t))
| Gt -> Cons (h, insert e t)
| Eq -> Cons (e, Cons (h, t))
and go l = match l with
| Nil -> Nil
| Cons (x, xs) -> insert x (go xs)
in go l ;;
```

The implementation for vectors, on the other hand, is full of *noise*: type signatures which we would rather not write, but are forced to by the nature of type systems.

```
let insert_sort (cmp : 'a -> 'a -> ordering) (v : vect 'n 'a) : vect 'n 'a =
let insert (e : 'a) (tl : vect 'k 'a) : vect (s 'k) 'a =
match tl with
| Nil -> Cons (e, Nil)
| Cons (h, t) -> match cmp e h with
| Lt -> Cons (e, Cons (h, t))
| Gt -> Cons (h, insert e t)
| Eq -> Cons (e, Cons (h, t))
and go (v : vect 'k 'a) : vect 'k 'a = match v with
| Nil -> Nil
| Cons (x, xs) -> insert x (go xs)
in go v ;;
```

These are not quite theorems for free, but they are theorems for quite cheap.

```
type term 'a =
| Lit : int -> term int
| Fun : ('a -> 'b) -> term ('a -> 'b)
| App : term ('a -> 'b) * term 'a -> term 'b
```

In much the same way as the vector example, which forced us to be correct with our functions, GADTs can also be applied in making us be correct with our *data*. The type `term 'a`

represents well typed terms: the interpretation of such a value need not be concerned with runtime errors at all, by leveraging the Amulet type system to make sure its inputs are correct.

```
let eval (x : term 'a) : 'a =
match x with
| Lit l -> l
| Fun f -> f
| App (f, x) -> (eval f) (eval x)
```

While equalities let us bend the type system to our will, vectors and terms let *the type system* help us, in making incorrect implementations compile errors.

Rank-N types are quite useful, I’m sure. To be quite honest, they were mostly implemented in preparation for GADTs, as the features have some overlap.

A use case one might imagine if Amulet had notation for monads would be an implementation of the ST monad^{2}, which prevents mutable state from escaping by use of rank-N types. `St.run action`

is a well-typed program, since `action`

has type `forall 's. st 's int`

, but `St.run action'`

is not, since that has type `forall 's. st 's (ref 's int)`

.

```
let action =
St.bind (alloc_ref 123) (fun var ->
St.bind (update_ref var (fun x -> x * 2)) (fun () ->
read_ref var))
and action' =
St.bind (alloc_ref 123) (fun var ->
St.bind (update_ref var (fun x -> x * 2)) (fun () ->
St.pure var))
```

Types are very powerful things. A powerful type system helps guide the programmer by allowing the compiler to infer more and more of the *program*—type class dictionaries in Haskell, and as a more extreme example, proof search in Agda and Idris.

However, since the industry has long been dominated by painfully first-order, very verbose type systems like those of Java and C#, it’s no surprise that many programmers have fled to dynamically typed languages like ~~Go~~ Python—a type system needs to be fairly complex before it gets to being expressive, and it needs to be *very* complex to get to the point of being useful.

Complexity and difficulty, while often present together, are not nescessarily interdependent: Take, for instance, Standard ML. The first-order parametric types might seem restrictive when used to a system with like Haskell’s (or, to some extent, Amulet’s^{3}), but they actually allow a lot of flexibility, and do not need many annotations at all! They are a sweet spot in the design space.

If I knew more about statistics, I’d have some charts here correlating programmer effort with implementor effort, and also programmer effort with the extent of properties one can state as types. Of course, these are all fuzzy metrics, and no amount of statistics would make those charts accurate, so have my feelings in prose instead:

Implementing a dynamic type system is

*literally*no effort. No effort needs to be spent writing an inference engine, or a constraint solver, or a renamer, or any other of the very complex moving parts of a type checker.However, the freedom they allow the implementor they take away from the programmer, by forcing them to keep track of the types of everything mentally. Even those that swear by dynamic types can not refute the claim that data has shape, and having a compiler that can make sure your shapes line up so you can focus on programming is a definite advantage.

On the opposite end of the spectrum, implementing a dependent type system is a

*lot*of effort. Things quickly diverge into undecidability before you even get to writing a solver—and higher order unification, which has a tendency to pop up, is undecidable too.While the implementor is subject to an endless stream of suffering, the programmer is in some ways free and some ways constrained. They can now express lots of invariants in the type system, from correctness of

`sort`

to correctness of an entire compiler or an operating system kernel, but they must also state very precise types for everything.In the middle lies a land of convenient programming without an endlessly suffering compiler author, a land first explored by the ML family with its polymorphic, inferred type system.

This is clearly the sweet spot. Amulet leans slightly to the dependently type end of the spectrum, but can still infer the types for many simple and complex programs without any annotations-the programs that do not use generalised algebraic data types or rank-N polymorphism.

In reality, the details are fuzzier. To be precise, pattern matching on GADTs only introduces an implication constraint when the type checker is applying a checking judgement. In practice, this means that at least the return type must be explicitly annotated.↩

Be warned that the example does not compile unless you remove the modules, since our renamer is currently a bit daft.↩

This is

*my*blog, and I’m allowed to brag about my projects, damn it.↩

In the last post about Amulet I wrote about rewriting the type checking code. And, to everybody’s surprise (including myself), I actually did it.

Like all good programming languages, Amulet has a strong, static type system. What most other languages do not have, however, is (mostly) *full type inference*: programs are still type-checked despite (mostly) having no type annotations.

Unfortunately, no pratical type system has truly “full type inference”: features like data-type declarations, integral to actually writing software, mandate some type annotations (in this case, constructor arguments). However, that doesn’t mean we can’t try.

The new type checker, based on a constraint-generating but *bidirectional* approach, can type a lot more programs than the older, Algorithm W-derived, quite buggy checker. As an example, consider the following definition. For this to check under the old type system, one would need to annotate both arguments to `map`

*and* its return type - clearly undesirable!

```
let map f =
let go cont xs =
match xs with
| Nil -> cont Nil
| Cons (h, t) -> go (compose cont (fun x -> Cons (f h, x))) t
in go id ;;
```

Even more egregious is that the η-reduction of `map`

would lead to an ill-typed program.

```
let map f xs =
let go cont xs = (* elided *)
in go id xs ;;
(* map : forall 'a 'b. ('a -> 'b) -> list 'a -> list 'b *)
let map' f =
let go cont xs = (* elided *)
in go id ;;
(* map' : forall 'a 'b 'c. ('a -> 'b) -> list 'a -> list 'c *)
```

Having declared this unacceptable, I set out to rewrite the type checker, after months of procrastination. As is the case, of course, with such things, it only took some two hours, and I really shouldn’t have procrastinated it for so long.

Perhaps more importantly, the new type checker also supports rank-N polymorphism directly, with all appropriate checks in place: expressions checked against a polymorphic type are, in reality, checked against a *deeply skolemised* version of that poly-type - this lets us enforce two key properties:

- the expression being checked
*is*actually parametric over the type arguments, i.e., it can’t unify the skolem constants with any type constructors, and - no rank-N arguments escape.

As an example, consider the following function:

Well-typed uses of this function are limited to applying it to the identity function, as parametricity tells us; and, indeed, trying to apply it to e.g. `fun x -> x + 1`

is a type error.

As before, type checking is done by a traversal of the syntax tree which, by use of a `Writer`

monad, produces a list of constraints to be solved. Note that a *list* really is needed: a set, or similar data structure with unspecified order, will not do. The order in which the solver processes constraints is important!

The support for rank-N types has lead to the solver needing to know about a new kind of constraint: *subsumption* constraints, in addition to *unification* constraints. Subsumption is perhaps too fancy a term, used to obscure what’s really going on: subtyping. However, whilst languages like Java and Scala introduce subtyping by means of inheritance, our subtyping boils down to eliminating ∀s.

∀s are eliminated from the right-hand-side of subsumption constraints by *deep skolemisation*: replacing the quantified variables in the type with fresh type constants. The “depth” of skolemisation refers to the fact that ∀s to the right of arrows are eliminated along with the ones at top-level.

```
subsumes k t1 t2@TyForall{} = do
t2' <- skolemise t2
subsumes k t1 t2'
subsumes k t1@TyForall{} t2 = do
(_, _, t1') <- instantiate t1
subsumes k t1' t2
subsumes k a b = k a b
```

The function for computing subtyping is parametric over what to do in the case of two monomorphic types: when this function is actually used by the solving algorithm, it’s applied to `unify`

.

The unifier has the job of traversing two types in tandem to find the *most general unifier*: a substitution that, when applied to one type, will make it syntatically equal to the other. In most of the type checker, when two types need to be “equal”, they’re equal up to unification.

Most of the cases are an entirely boring traversal, so here are the interesting ones.

- Skolem type constants only unify with other skolem type constants:

```
unify TySkol{} TySkol{} = pure ()
unify t@TySkol{} b = throwError $ SkolBinding t b
unify b t@TySkol{} = throwError $ SkolBinding t b
```

- Type variables extend the substitution:

- Polymorphic types unify up to α-renaming:

```
unify t@(TyForall vs ty) t'@(TyForall vs' ty')
| length vs /= length vs' = throwError (NotEqual t t')
| otherwise = do
fvs <- replicateM (length vs) freshTV
let subst = Map.fromList . flip zip fvs
unify (apply (subst vs) ty) (apply (subst vs') ty')
```

When binding a variable to a concrete type, an *occurs check* is performed to make sure the substitution isn’t going to end up containing an infinite type. Consider binding `'a := list 'a`

: If `'a`

is substituted for `list 'a`

everywhere, the result would be `list (list 'a)`

- but wait, `'a`

appears there, so it’d be substituted again, ad infinitum.

Extra care is also needed when binding a variable to itself, as is the case with `'a ~ 'a`

. These constraints are trivially discharged, but adding them to the substitution would mean an infinite loop!

```
occurs :: Var Typed -> Type Typed -> Bool
occurs _ (TyVar _) = False
occurs x e = x `Set.member` ftv e
```

If the variable has already been bound, the new type is unified with the one present in the substitution being accumulated. Otherwise, it is added to the substitution.

```
bind :: Var Typed -> Type Typed -> SolveM ()
bind var ty
| occurs var ty = throwError (Occurs var ty)
| TyVar var == ty = pure ()
| otherwise = do
env <- get
-- Attempt to extend the environment, otherwise
-- unify with existing type
case Map.lookup var env of
Nothing -> put (Map.singleton var (normType ty) `compose` env)
Just ty'
| ty' == ty -> pure ()
| otherwise -> unify (normType ty) (normType ty')
```

Running the solver, then, amounts to folding through the constraints in order, applying the substitution created at each step to the remaining constraints while also accumulating it to end up at the most general unifier.

```
solve :: Int -> Subst Typed
-> [Constraint Typed]
-> Either TypeError (Subst Typed)
solve _ s [] = pure s
solve i s (ConUnify e a t:xs) = do
case runSolve i s (unify (normType a) (normType t)) of
Left err -> Left (ArisingFrom err e)
Right (i', s') -> solve i' (s' `compose` s) (apply s' xs)
solve i s (ConSubsume e a b:xs) =
case runSolve i s (subsumes unify (normType a) (normType b)) of
Left err -> Left (ArisingFrom err e)
Right (i', s') -> solve i' (s' `compose` s) (apply s' xs)
```

Amulet, being a member of the ML family, does most data processing through *pattern matching*, and so, the patterns also need to be type checked.

The pattern grammar is simple: it’s made up of 6 constructors, while expressions are described by over twenty constructors.

Here, the bidirectional approach to inference starts to shine. It is possible to have different behaviours for when the type of the pattern (or, at least, some skeleton describing that type) is known and for when it is not, and such a type must be produced from the pattern alone.

In an unification-based system like ours, the inference judgement can be recovered from the checking judgement by checking against a fresh type variable.

Inferring patterns produces three things: an annotated pattern, since syntax trees after type checking carry their types; the type of values that pattern matches; and a list of variables the pattern binds. Checking omits returning the type, and yields only the annotated syntax tree and the list of bindings.

As a special case, inferring patterns with type signatures overrides the checking behaviour. The stated type is kind-checked (to verify its integrity and to produce an annotated tree), then verified to be a subtype of the inferred type for that pattern.

```
inferPattern pat@(PType p t ann) = do
(p', pt, vs) <- inferPattern p
(t', _) <- resolveKind t
_ <- subsumes pat t' pt -- t' ≤ pt
case p' of
Capture v _ -> pure (PType p' t' (ann, t'), t', [(v, t')])
_ -> pure (PType p' t' (ann, t'), t', vs)
```

Checking patterns is where the fun actually happens. Checking `Wildcard`

s and `Capture`

s is pretty much identical, except the latter actually expands the capture list.

```
checkPattern (Wildcard ann) ty = pure (Wildcard (ann, ty), [])
checkPattern (Capture v ann) ty =
pure (Capture (TvName v) (ann, ty), [(TvName v, ty)])
```

Checking a `Destructure`

looks up the type of the constructor in the environment, possibly instancing it, and does one of two things, depending on whether or not the destructuring did not have an inner pattern.

- If there was no inner pattern, then the looked-up type is unified with the “goal” type - the one being checked against.

```
Nothing -> do
pty <- lookupTy con
_ <- unify ex pty ty
pure (Destructure (TvName con) Nothing (ann, pty), [])
```

- If there
*was*an inner pattern, we proceed by decomposing the type looked up from the environment. The inner pattern is checked against the*domain*of the constructor’s type, while the “goal” gets unified with the*co-domain*.

```
Just p -> do
(c, d) <- decompose ex _TyArr =<< lookupTy con
(ps', b) <- checkPattern p c
_ <- unify ex ty d
```

Checking tuple patterns is a bit of a mess. This is because of a mismatch between how they’re written and how they’re typed: a 3-tuple pattern (and expression!) is written like `(a, b, c)`

, but it’s *typed* like `a * (b * c)`

. There is a local helper that incrementally converts between the representations by repeatedly decomposing the goal type.

```
checkPattern pt@(PTuple elems ann) ty =
let go [x] t = (:[]) <$> checkPattern x t
go (x:xs) t = do
(left, right) <- decompose pt _TyTuple t
(:) <$> checkPattern x left <*> go xs right
go [] _ = error "malformed tuple in checkPattern"
```

Even more fun is the `PTuple`

constructor is woefully overloaded: One with an empty list of children represents matching against `unit`

. One with a single child is equivalent to the contained pattern; Only one with more than two contained patterns makes a proper tuple.

```
in case elems of
[] -> do
_ <- unify pt ty tyUnit
pure (PTuple [] (ann, tyUnit), [])
[x] -> checkPattern x ty
xs -> do
(ps, concat -> binds) <- unzip <$> go xs ty
pure (PTuple ps (ann, ty), binds)
```

Expressions are incredibly awful and the bane of my existence. There are 18 distinct cases of expression to consider, a number which only seems to be going up with modules and the like in the pipeline; this translates to 24 distinct cases in the type checker to account for all of the possibilities.

As with patterns, expression checking is bidirectional; and, again, there are a lot more checking cases then there are inference cases. So, let’s start with the latter.

Inferring variable references makes use of instantiation to generate fresh type variables for each top-level universal quantifier in the type. These fresh variables will then be either bound to something by the solver or universally quantified over in case they escape.

Since Amulet is desugared into a core language resembling predicative System F, variable uses also lead to the generation of corresponding type applications - one for each eliminated quantified variable.

```
infer expr@(VarRef k a) = do
(inst, old, new) <- lookupTy' k
if Map.null inst
then pure (VarRef (TvName k) (a, new), new)
else mkTyApps expr inst old new
```

Functions, strangely enough, have both checking *and* inference judgements: which is used impacts what constraints will be generated, and that may end up making type inference more efficient (by allocating less, or correspondingly spending less time in the solver).

The pattern inference judgement is used to compute the type and bindings of the function’s formal parameter, and the body is inferred in the context extended with those bindings; Then, a function type is assembled.

```
infer (Fun p e an) = do
(p', dom, ms) <- inferPattern p
(e', cod) <- extendMany ms $ infer e
pure (Fun p' e' (an, TyArr dom cod), TyArr dom cod)
```

Literals are pretty self-explanatory: Figuring their types boils down to pattern matching.

```
infer (Literal l an) = pure (Literal l (an, ty), ty) where
ty = case l of
LiInt{} -> tyInt
LiStr{} -> tyString
LiBool{} -> tyBool
LiUnit{} -> tyUnit
```

The inference judgement for *expressions* with type signatures is very similar to the one for patterns with type signatures: The type is kind-checked, then compared against the inferred type for that expression. Since expression syntax trees also need to be annotated, they are `correct`

ed here.

```
infer expr@(Ascription e ty an) = do
(ty', _) <- resolveKind ty
(e', et) <- infer e
_ <- subsumes expr ty' et
pure (Ascription (correct ty' e') ty' (an, ty'), ty')
```

There is also a judgement for turning checking into inference, again by making a fresh type variable.

Our rule for eliminating ∀s was adapted from the paper Complete and Easy Bidirectional Typechecking for Higher-Rank Polymorphism. Unlike in that paper, however, we do not have explicit *existential variables* in contexts, and so must check expressions against deeply-skolemised types to eliminate the universal quantifiers.

If the expression is checked against a deeply skolemised version of the type, however, it will be tagged with that, while it needs to be tagged with the universally-quantified type. So, it is `correct`

ed.

Amulet has rudimentary support for *typed holes*, as in dependently typed languages and, more recently, GHC. Since printing the type of holes during type checking would be entirely uninformative due to half-solved types, reporting them is deferred to after checking.

Of course, holes must still have checking behaviour: They take whatever type they’re checked against.

Checking functions is as easy as inferring them: The goal type is split between domain and codomain; the pattern is checked against the domain, while the body is checked against the codomain, with the pattern’s bindings in scope.

```
check ex@(Fun p b a) ty = do
(dom, cod) <- decompose ex _TyArr ty
(p', ms) <- checkPattern p dom
Fun p' <$> extendMany ms (check b cod) <*> pure (a, ty)
```

Empty `begin end`

blocks are an error.

`check ex@(Begin [] _) _ = throwError (EmptyBegin ex)`

`begin ... end`

blocks with at least one expression are checked by inferring the types of every expression but the last, and then checking the last expression in the block against the goal type.

```
check (Begin xs a) t = do
let start = init xs
end = last xs
start' <- traverse (fmap fst . infer) start
end' <- check end t
pure (Begin (start' ++ [end']) (a, t))
```

`let`

s are pain. Since all our `let`

s are recursive by nature, they must be checked, including all the bound variables, in a context where the types of every variable bound there are already available; To figure this out, however, we first need to infer the type of every variable bound there.

If that strikes you as “painfully recursive”, you’re right. This is where the unification-based nature of our type system saved our butts: Each bound variable in the `let`

gets a fresh type variable, the context is extended and the body checked against the goal.

The function responsible for inferring and solving the types of variables is `inferLetTy`

. It keeps an accumulating association list to check the types of further bindings as they are figured out, one by one, then uses the continuation to generalise (or not) the type.

```
check (Let ns b an) t = do
ks <- for ns $ \(a, _, _) -> do
tv <- freshTV
pure (TvName a, tv)
extendMany ks $ do
(ns', ts) <- inferLetTy id ks (reverse ns)
extendMany ts $ do
b' <- check b t
pure (Let ns' b' (an, t))
```

We have decided to take the advice of Vytiniotis, Peyton Jones, and Schrijvers, and refrain from generalising lets, except at top-level. This is why `inferLetTy`

gets given `id`

when checking terms.

The judgement for checking `if`

expressions is what made me stick to bidirectional type checking instead of fixing out variant of Algorithm W. The condition is checked against the boolean type, while both branches are checked against the goal.

it is not possible, in general, to recover the type of a function at an application site, we infer it; The argument given is checked against that function’s domain and the codomain is unified with the goal type.

```
check ex@(App f x a) ty = do
(f', (d, c)) <- secondA (decompose ex _TyArr) =<< infer f
App f' <$> check x d <*> fmap (a,) (unify ex ty c)
```

To check `match`

, the type of what’s being matched against is first inferred because, unlike application where *some* recovery is possible, we can not recover the type of matchees from the type of branches *at all*.

Once we have the type of the matchee in hands, patterns can be checked against that. The branches are then each checked against the goal type.

```
ps' <- for ps $ \(p, e) -> do
(p', ms) <- checkPattern p tt
(,) <$> pure p' <*> extendMany ms (check e ty)
```

Checking binary operators is like checking function application twice. Very boring.

```
check ex@(BinOp l o r a) ty = do
(o', to) <- infer o
(el, to') <- decompose ex _TyArr to
(er, d) <- decompose ex _TyArr to'
BinOp <$> check l el <*> pure o'
<*> check r er <*> fmap (a,) (unify ex d ty)
```

Checking records and record extension is a hack, so I’m not going to talk about them until I’ve cleaned them up reasonably in the codebase. Record access, however, is very clean: we make up a type for the row-polymorphic bit, and check against a record type built from the goal and the key.

```
check (Access rc key a) ty = do
rho <- freshTV
Access <$> check rc (TyRows rho [(key, ty)])
<*> pure key <*> pure (a, ty)
```

Checking tuple expressions involves a local helper much like checking tuple patterns. The goal type is recursively decomposed and made to line with the expression being checked.

```
check ex@(Tuple es an) ty = Tuple <$> go es ty <*> pure (an, ty) where
go [] _ = error "not a tuple"
go [x] t = (:[]) <$> check x t
go (x:xs) t = do
(left, right) <- decompose ex _TyTuple t
(:) <$> check x left <*> go xs right
```

And, to finish, we have a judgement for turning inference into checking.

I like the new type checker: it has many things you’d expect from a typed lambda calculus, such as η-contraction preserving typability, and substitution of `let`

-bound variables being generally admissable.

Our type system is fairly complex, what with rank-N types and higher kinded polymorphism, so inferring programs under it is a bit of a challenge. However, I am fairly sure the only place that demands type annotations are higher-ranked *parameters*: uses of higher-rank functions are checked without the need for annotations.

Check out Amulet the next time you’re looking for a typed functional programming language that still can’t compile to actual executables.

]]>As you might have noticed, I like designing and implementing programming languages. This is another of these projects. Amulet is a strictly-evaluated, statically typed impure roughly functional programming language with support for parametric data types and rank-1 polymorphism *à la* Hindley-Milner (but no let-generalization), along with row-polymorphic records. While syntactically inspired by the ML family, it’s a disservice to those languages to group Amulet with them, mostly because of the (present) lack of modules.

Planned features (that I haven’t even started working on, as of writing this post) include generalized algebraic data types, modules and modular implicits, a reworked type inference engine based on *OutsideIn(X)*^{1} to support the other features, and, perhaps most importantly, a back-end that’s not a placeholder (i.e. something that generates either C or LLVM and can be compiled to a standalone executable).

The compiler is still very much a work in progress, and is actively being improved in several ways: Rewriting the parser for efficiency concerns (see Lexing and Parsing), improving the quality of generated code by introducing more intermediate representations, and introducing several optimisations on the one intermediate language we *do* have.

In this section, I’m going to describe the implementation of the compiler as it exists at the time of writing - warts and all. Unfortunately, we have a bit too much code for all of it to fit in this blag post, so I’m only going to include the horribly broken bits here, and leave the rest out. Of course, the compiler is open source, and is available on my GitHub.

To call what we have a *lexer* is a bit of an overstatement: The `Parser.Lexer`

module, which underpins the actual parser, contains only a handful of imports and some definitions for use with Parsec’s `Text.Parsec.Token`

module; Everything else is boilerplate, namely, declaring, at top-level, the functions generated by `makeTokenParser`

.

Our parser is then built on top of this infrastructure (and the other combinators provided by Parsec) in a monadic style. Despite having chosen to use strict `Text`

s, many of the Parsec combinators return `Char`

s, and using the Alternative type class’ ability to repeat actions makes linked lists of these - the dreaded `String`

type. Due to this, and other inefficiencies, the parser is ridiculously bad at memory management.

However, it does have some cute hacks. For example, the pattern parser has to account for being used in the parsing of both `match`

and `fun`

- in the former, destructuring patterns may appear without parenthesis, but in the latter, they *must* be properly parenthesised: since `fun`

may have multiple patterns, it would be ambiguous if `fun Foo x -> ...`

is destructuring a `Foo`

or takes two arguments.

Instead of duplicating the pattern parser, one for `match`

es and one for function arguments, we instead *parametrised* the parser over needing parenthesis or not by adding a rank-2 polymorphic continuation argument.

```
patternP :: (forall a. Parser a -> Parser a) -> Parser Pattern'
patternP cont = wildcard <|> {- some bits omitted -} try destructure where
destructure = withPos . cont $ do
ps <- constrName
Destructure ps <$> optionMaybe (patternP id)
```

When we’re parsing a pattern `match`

-style, the continuation given is `id`

, and when we’re parsing an argument, the continuation is `parens`

.

For the aforementioned efficiency concerns, however, we’ve decided to scrap the Parsec-based parser and move to an Alex/Happy based solution, which is not only going to be more maintainable and more easily hackable in the future, but will also be more efficient overall. Of course, for a toy compiler such as this one, efficiency doesn’t matter that much, but using *one and a half gigabytes* to compile a 20-line file is really bad.

To simplify scope handling in both the type checker and optimiser, after parsing, each variable is tagged with a globally unique integer that is enough to compare variables. This also lets us use more efficient data structures later in the compiler, such as `VarSet`

, which stores only the integer identifier of a variable in a big-endian Patricia tree^{2}.

Our approach, described in *Secrets of the Glasgow Haskell Compiler inliner* as “the Sledgehammer”, consists of duplicating *every* bound variable to avoid name capture problems. However, while the first of the listed disadvantages surely does apply, by doing all of the *renaming* in one go, we mostly avoid the latter. Of course, since then, the Haskell ecosystem has evolved significantly, and the plumbing required is a lot less intrusive.

In our compiler, we use MTL-style classes instead of concrete monad transformer stacks. We also run every phase after parsing in a single `GenT`

monad, which provides a fresh supply of integers for names. “Plumbing” the fresh name supply, then, only involves adding a `MonadGen Int m`

constraint to the context of functions that need it.

Since the string component of parsed names is not thrown away, we also have to make up strings themselves. This is where another cute hack comes in: We generate, lazily, an infinite stream of names that goes `["a" .. "z", "aa" .. "az", "ba" .. "bz", ..]`

, then use the `MonadGen`

counter as an index into that stream.

The desugarer is a very simple piece of code which, through use of *Scrap Your Boilerplate*-style generic programming, traverses the syntax tree and rewrites nodes representing syntax sugar to their more explicit versions.

Currently, the desugarer only expands *sections*: That is, expressions of the form `(+ e)`

become `fun x -> x + e`

(where `e`

is a fresh name), expressions like `(e +)`

become `fun x -> e + x`

, and expressions like `.foo`

becomes `fun x -> x.foo`

.

This is the only component of the compiler that I can reasonably include, in its entirety, in this post.

```
desugarProgram = everywhereM (mkM defaults) where
defaults :: Expr Parsed -> m (Expr Parsed)
defaults (BothSection op an) = do
(ap, ar) <- fresh an
(bp, br) <- fresh an
pure (Fun ap (Fun bp (BinOp ar op br an) an) an)
defaults (LeftSection op vl an) = do
(cap, ref) <- fresh an
pure (Fun cap (BinOp ref op vl an) an)
defaults (RightSection op vl an) = do
(cap, ref) <- fresh an
pure (Fun cap (BinOp vl op ref an) an)
defaults (AccessSection key an) = do
(cap, ref) <- fresh an
pure (Fun cap (Access ref key an) an)
defaults x = pure x
```

By far the most complicated stage of the compiler pipeline, our inference algorithm is modelled after Algorithm W (extended with kinds and kind inference), with constraint generation and solving being two separate steps.

We first traverse the syntax tree, in order, making up constraints and fresh type variables as needed, then invoke a unification algorithm to produce a substitution, then apply that over both the generated type (a skeleton of the actual result) and the syntax tree (which is explicitly annotated with types everywhere).

The type inference code also generates and inserts explicit type applications when instancing polymorphic types, since we internally lower Amulet into a System F core language with explicit type abstraction and application. We have `TypeApp`

nodes in the syntax tree that never get parsed or renamed, and are generated by the type checker before lowering happens.

Our constraint solver is quite rudimentary, but it does the job nicely. We operate with a State monad with the current substitution. When we unify a variable with another type, it is added to the current substitution. Everything else is just zipping the types together. When we try to unify, say, a function type with a constructor, that’s an error. If a variable has already been added to the current substitution and encounter it again, the new type is unified with the previously recorded one.

```
unify :: Type Typed -> Type Typed -> SolveM ()
unify (TyVar a) b = bind a b
unify a (TyVar b) = bind b a
unify (TyArr a b) (TyArr a' b') = unify a a' *> unify b b'
unify (TyApp a b) (TyApp a' b') = unify a a' *> unify b b'
unify ta@(TyCon a) tb@(TyCon b)
| a == b = pure ()
| otherwise = throwError (NotEqual ta tb)
```

This is only an excerpt, because we have very complicated types.

One of Amulet’s selling points (if one could call it that) is its support for row-polymorphic records. We have two types of first-class record types: *closed* record types (the type of literals) and *open* record types (the type inferred by record patterns and field getters.). Open record types have the shape `{ 'p | x_n : t_n ... x_n : t_n }`

, while closed records lack the type variable `'p`

.

Unification of records has 3 cases, but in all 3 cases it is checked that fields present in both records have unifiable types.

- When unifying an open record with a closed one, present in both records have unifiable types, and instance the type variable to contain the extra fields.
- When unifying two closed records, they must have exactly the same shape and unifiable types for common fields.
- When unifying two open record types, a new fresh type variable is created to use as the “hole” and tack the fields together.

As an example, `{ x = 1 }`

has type `{ x : int }`

, the function `fun x -> x.foo`

has type `{ 'p | foo : 'a } -> 'a`

, and `(fun r -> r.x) { y = 2 }`

is a type error^{3}.

Vytiniotis, Peyton Jones and Schrijvers argue^{4} that HM-style `let`

generalisation interacts badly with complex type system extensions such as GADTs and type families, and should therefore be omitted from such systems. In a deviation from the paper, GHC 7.2 reintroduces `let`

generalisation for local definitions that meet some criteria^{5}.

Here’s the rule. With

`-XMonoLocalBinds`

(the default), a binding without a type signature isgeneralised only if all its free variables are closed.A binding is

closedif and only if

- It has a type signature, and the type signature has no free variables; or

- It has no type signature, and all its free variables are closed, and it is unaffected by the monomorphism restriction. And hence it is fully generalised.

We, however, have chosen to follow that paper to a tee. Despite not (yet!) having any of those fancy type system features that interact poorly with let generalisation, we do not generalise *any* local bindings.

After type checking is done (and, conveniently, type applications have been left in the correct places for us by the type checker), Amulet code is converted into an explicitly-typed intermediate representation, in direct style, which is used for (local) program optimisation. The AST is simplified considerably: from 19 constructors to 9.

Type inference is no longer needed: the representation of core is packed with all the information we need to check that programs are type-correct. This includes types in every binder (lambda abstractions, `let`

s, pattern bindings in `match`

), big-lambda abstractions around polymorphic values (a \(\lambda\) binds a value, while a \(\Lambda\) binds a type), along with the already mentioned type applications.

Here, code also gets the error branches for non-exhaustive `match`

expressions, and, as a general rule, gets a lot uglier.

```
let main _ = (fun r -> r.x) { x = 2 }
(* Is elaborated into *)
let main : ∀ 'e. 'e -> int =
Λe : *. λk : 'e. match k {
(p : 'e) : 'e -> (λl : { 'g | x : int }. match l {
(r : { 'g | x : int }) : { 'g | x : int } -> match r {
{ (n : { 'g | x : int }) | x = (m : int) } : { 'g | x : int } -> m
};
(o : { 'g | x : int }) : { 'g | x : int } ->
error @int "<test>[1:15 .. 1:27]"
}) ({ {} | x : int = 2 });
(q : 'e) : 'e -> error @int "<test>[1:14 .. 1:38]"
}
```

As the code we initially get from lowering is ugly and inefficient - along with being full of the abstractions functional programs have by nature, it is full of redundant matches created by e.g. the fact that functions can not do pattern matching directly, and that field access gets reduced to pattern matching - the optimiser’s job is to make it prettier, and more efficient.

The optimiser works by applying, in order, a series of local transformations operating on individual sub-terms to produce an efficient program, 25 times. The idea of applying them several times is that, when a simplification pass kicks in, more simplification opportunities might arise.

`dropBranches`

, `foldExpr`

, `dropUselessLets`

These trivial passes remove similarly trivial pieces of code that only add noise to the program. `dropBranches`

will do its best to remove redundant arms from a `match`

expression, such as those that appear after an irrefutable pattern. `foldExpr`

reduces uses of operators where both sides are known, e.g. `2 + 2`

(replaced by the literal `5`

) or `"foo " ^ "bar"`

(replaced by the literal `"foo bar"`

). `dropUselessLets`

removes `let`

s that bind unused variables whose right-hand sides are pure expressions.

`trivialPropag`

, `constrPropag`

The Amulet optimiser does inlining decisions in two (well, three) separate phases: One is called *propagation*, in which a `let`

decides to propagate its bound values into the expression, and the other is the more traditional `inlining`

, where variables get their values from the context.

Propagation is by far the easiest of the two: The compiler can see both the definitions and all of the use sites, and could in theory decide if propagating is beneficial or not. Right now, we propagate all literals (and records made up solely of other trivial expressions), and do a round of propagation that is best described as a rule.

This *constructor propagation* allows the `match`

optimisations to kick in more often, and is semantics preserving.

`match`

-of-known-constructorThis pass identifies `match`

expressions where we can statically determine the expression being analysed and, therefore, decide which branch is going to be taken.

`match`

-of-bottomIt is always safe to turn a `match`

where the term being matched is a diverging expression into only that diverging expression, thus reducing code size several times.

As a special case, when one of the arms is itself a diverging expression, we use the type mentioned in that application to `error`

to fix up the type of the value being scrutinized.

```
match (error @foo "message") with
| _ -> error @bar "message 2"
...
(* becomes *)
error @bar "message"
```

`match`

-of-`match`

This transformation turns `match`

expressions where the expression being dissected is itself another `match`

“inside-out”: we push the branches of the *outer* `match`

“into” the *inner* `match`

(what used to be the expression being scrutinized). In doing so, sometimes, new opportunities for match-of-known-constructor arise, and the code ends up simpler.

```
match (match x with
| A -> B
| C -> D) with
| B -> e
| D -> f
(* becomes *)
match x with
| A -> match B with
| B -> e
| D -> f
| C -> match D with
| B -> e
| D -> f
```

A clear area of improvement here is extracting the outer branches into local `let`

-bound lambda abstractions to avoid an explosion in code size.

`inlineVariable`

, `betaReduce`

In this pass, use of a variable is replaced with the definition of that variable, if it meets the following conditions:

- The variable is a lambda abstraction; and
- The lambda abstraction’s body is not too
*expensive*. Computing the cost of a term boils down to computing the depth of the tree representing that term, with some extra cost added to some specific types of expression.

In doing this, however, we end up with pathological terms of the form `(fun x -> e) y`

. The `betaReduce`

pass turns this into `let x = y in e`

. We generate `let`

bindings instead of substituting the variable with the parameter to maintain the same evaluation order and observable effects of the original code. This does mean that, often, propagation kicks in and gives rise to new simplification opportunities.

I was planning to write a section with a formalisation of the language’s semantics and type system, but it turns out I’m no mathematician, no matter how hard I pretend. Maybe in the future.

Our code generator is wholly uninteresting, and, most of all, a placeholder: This is why it is not described in detail (that is, at all) in this post. I plan to write a follow-up when we actually finish the native code generator.

As previously mentioned, the compiler *is* open source: the code is here. I recommend using the Nix package manager to acquire the Haskell dependencies, but Cabal should work too. Current work in rewriting the parser is happening in the `feature/alex-happy`

branch.

Dimitrios Vytiniotis, Simon Peyton Jones, Tom Schrijvers, and Martin Sulzmann. 2011. OutsideIn(X): Modular Type Inference With Local Assumptions.

*Note that, although the paper has been published in the Journal of Functional Programming, the version linked to here is a preprint.*↩This sounds fancy, but in practice, it boils down to using

`Data.IntSet`

instead of`Data.Set`

.↩Dimitrios Vytiniotis, Simon Peyton Jones, Tom Schrijvers. 2010. Let Should not be Generalised.↩

As explained in this blog post.↩

Dependent types are pretty cool, yo. This post is a semi-structured ramble about dtt, a small dependently-typed “programming language” inspired by Thierry Coquand’s Calculus of (inductive) Constructions (though, note that the *induction* part is still lacking: There is support for defining inductive data types, and destructuring them by pattern matching, but since there’s no totality checker, recursion is disallowed).

`dtt`

is written in Haskell, and served as a learning experience both in type theory and in writing programs using extensible effects. I *do* partly regret the implementation of effects I chose (the more popular `extensible-effects`

did not build on the Nixpkgs channel I had, so I went with `freer`

; Refactoring between these should be easy enough, but I still haven’t gotten around to it, yet)

I originally intended for this post to be a Literate Haskell file, interleaving explanation with code. However, for a pet project, `dtt`

’s code base quickly spiralled out of control, and is now over a thousand lines long: It’s safe to say I did not expect this one bit.

`dtt`

is a very standard *λ*_{∏} calculus. We have all 4 axes of Barendgret’s lambda cube, in virtue of having types be first class values: Values depending on values (functions), values depending on types (polymorphism), types depending on types (type operators), and types depending on values (dependent types). This places dtt squarely at the top, along with other type theories such as the Calculus of Constructions (the theoretical basis for the Coq proof assistant) and TT (the type theory behind the Idris programming language).

The syntax is very simple. We have the standard lambda calculus constructs - *λ*-abstraction, application and variables - along with `let`

-bindings, pattern matching `case`

expression, and the dependent type goodies: ∏-abstraction and `Set`

.

*As an aside*, pi types are called as so because the dependent function space may (if you follow the “types are sets of values” line of thinking) be viewed as the cartesian product of types. Consider a type `A`

with inhabitants `Foo`

, `Bar`

and a type `B`

with inhabitant `Quux`

. A dependent product ∏_{(x : A)}`B`

, then, has inhabitants `(Foo, Quux)`

and `(Bar, Quux)`

.

You’ll notice that dtt does not have a dedicated arrow type. Indeed, the dependent product subsumes both the ∀ quantifier of System *F*, and the arrow type → of the simply-typed lambda calculus. Keep this in mind: It’ll be important later.

Since dtt’s syntax is unified (i.e., there’s no stratification of terms and types), the language can be - and is - entirely contained in a single algebraic data type. All binders are *explicitly typed*, seeing as inference for dependent types is undecidable (and, therefore, bad).^{1}

```
type Type = Term
data Term
= Variable Var
| Set Int
| TypeHint Term Type
| Pi Var Type Type
| Lam Var Type Term
| Let Var Term Term
| App Term Term
| Match Term [(Pattern, Term)]
deriving (Eq, Show, Ord)
```

The `TypeHint`

term constructor, not mentioned before, is merely a convenience: It allows the programmer to check their assumptions and help the type checker by supplying a type (Note that we don’t assume this type is correct, as you’ll see later; It merely helps guide inference.)

Variables aren’t merely strings because of the large amount of substitutions we have to perform: For this, instead of generating a new name, we increment a counter attached to the variable - the pretty printer uses the original name to great effect, when unambiguous.

The `Irrelevant`

variable constructor is used to support *a* → *b* as sugar for ∏_{(x : a)}*b* when *x* does not appear free in *b*. As soon as the type checker encounters an `Irrelevant`

variable, it is refreshed with a new name.

`dtt`

does not have implicit support (as in Idris), so all parameters, including type parameters, must be bound explicitly. For this, we support several kinds of syntatic sugar. First, all abstractions support multiple variables in a *binding group*. This allows the programmer to write `(a, b, c : α) -> β`

instead of `(a : α) -> (b : α) -> (c : α) -> β`

. Furthermore, there is special syntax `/\a`

for single-parameter abstraction with type `Set 0`

, and lambda abstractions support multiple binding groups.

As mentioned before, the language does not support recursion (either general or well-founded). Though I would like to, writing a totality checker is hard - way harder than type checking *λ*_{∏}, in fact. However, an alternative way of inspecting inductive values *does* exist: eliminators. These are dependent versions of catamorphisms, and basically encode a proof by induction. An inductive data type as Nat gives rise to an eliminator much like it gives rise to a natural catamorphism.

```
inductive Nat : Type of {
Z : Nat;
S : Nat -> Nat
}
natElim : (P : Nat -> Type)
-> P Z
-> ((k : Nat) -> P k -> P (S k))
-> (n : Nat)
-> P n
```

If you squint, you’ll see that the eliminator models a proof by induction (of the proposition *P*) on the natural number *n*: The type signature basically states “Given a proposition *P* on ℕ, a proof of *P*_{0}, a proof that *P*_{(k + 1)} follows from *P*_{k} and a natural number *n*, I’ll give you a proof of *P*_{n}.”

This understanding of computations as proofs and types as propositions, by the way, is called the Curry-Howard Isomorphism. The regular, simply-typed lambda calculus corresponds to natural deduction, while *λ*_{∏} corresponds to predicate logic.

Should this be called the term system?

Our type inference algorithm, contrary to what you might expect for such a complicated system, is actually quite simple. Unfortunately, the code isn’t, and thus isn’t reproduced in its entirety below.

The simplest case in any type system. The typing judgement that gives rise to this case is pretty much the identity: *Γ* ⊢ *α* : *τ*∴ *Γ* ⊢ *α* : *τ*. If, from the current typing context we know that *α* has type *τ*, then we know that *α* has type *τ*.

```
Variable x -> do
ty <- lookupType x -- (I)
case ty of
Just t -> pure t -- (II)
Nothing -> throwError (NotFound x) -- (III)
```

- Look up the type of the variable in the current context.
- If we found a type for it, then return that (this is the happy path)
- If we didn’t find a type for it, we raise a type error.

`Set`

sSince dtt has a cummulative hierarchy of universes, `S`

`e`

`t`

_{k} : `S`

`e`

`t`

_{(k + 1)}. This helps us avoid the logical inconsistency introduced by having *type-in-type*^{2}, i.e. `T`

`y`

`p`

`e`

: `T`

`y`

`p`

`e`

. We say that `S`

`e`

`t`

_{0} is the type of *small types*: in fact, `S`

`e`

`t`

_{0} is where most computation actually happens, seeing as `S`

`e`

`t`

_{k} for *k* ≥ 1 is reserved for ∏-abstractions quantifying over such types.

Type hints are the first appearance of the unification engine, by far the most complex part of dtt’s type checker. But for now, suffices to know that `t1 `assertEquality` t2`

errors if the types t1 and t2 can’t be made to *line up*, i.e., unify.

For type hints, we infer the type of given expression, and compare it against the user-provided type, raising an error if they don’t match. Because of how the unification engine works, the given type may be more general (or specific) than the inferred one.

This is where it starts to get interesting. First, we mandate that the parameter type is inhabited (basically, that it *is*, in fact, a type). The dependent product ∏_{(x : 0)}*α*, while allowed by the language’s grammar, is entirely meaningless: There’s no way to construct an inhabitant of 0, and thus this function may never be applied.

Then, in the context extended with (*α* : *τ*), we require that the consequent is also a type itself: The function ∏_{(x : ℕ)}0, while again a valid parse, is also meaningless.

The type of the overall abstraction is, then, the maximum value of the indices of the universes of the parameter and the consequent.

```
Pi x p c -> do
k1 <- inferSet tx
k2 <- local (insertType (x, p)) $
inferSet c
pure $ Set (k1 `max` k2)
```

Much like in the simply-typed lambda calculus, the type of a *λ*-abstraction is an arrow between the type of its parameter and the type of its body. Of course, *λ*_{∏} incurs the additional constraint that the type of the parameter is inhabited.

Alas, we don’t have arrows. So, we “lift” the lambda’s parameter to the type level, and bind it in a ∏-abstraction.

Note that, much like in the `Pi`

case, we type-check the body in a context extended with the parameter’s type.

Application is the most interesting rule, as it has to not only handle inference, it also has to handle instantiation of ∏-abstractions.

Instantation is, much like application, handled by *β*-reduction, with the difference being that instantiation happens during type checking (applying a ∏-abstraction is meaningless) and application happens during normalisation (instancing a *λ*-abstraction is meaningless).

The type of the function being applied needs to be a ∏-abstraction, while the type of the operand needs to be inhabited. Note that the second constraint is not written out explicitly: It’s handled by the `Pi`

case above, and furthermore by the unification engine.

```
App e1 e2 -> do
t1 <- infer e1
case t1 of
Pi vr i o -> do
t2 <- infer e2
t `assertEquality` i
N.normalise =<< subst [(vr, e2)] o -- (I)
e -> throwError (ExpectedPi e) -- (II)
```

Notice that, here, we don’t substitute the ∏-bound variable by the type of

*e*_{2}: That’d make us equivalent to System*F*. The whole*deal*with dependent types is that types depend on values, and that entirely stems from this one line. By instancing a type variable with a value, we allow*types*to depend on*values*.Oh, and if we didn’t get a ∏-abstraction, error.

You’ll notice that two typing rules are missing here: One for handling `let`

s, which was not included because it is entirely uninteresting, and one for `case ... of`

expressions, which was redacted because it is entirely a mess.

Hopefully, in the future, the typing of `case`

expressions is simpler - if not, they’ll probably be replaced by eliminators.

The unification engine is the man behind the curtain in type checking: We often don’t pay attention to it, but it’s the driving force behind it all. Fortunately, in our case, unification is entirely trivial: Solving is the hard bit.

The job of the unification engine is to produce a set of constraints that have to be satisfied in order for two types to be equal. Then, the solver is run on these constraints to assert that they are logically consistent, and potentially produce substitutions that *reify* those constraints.

Our solver isn’t that cool, though, so it just verifies consitency.

The kinds of constraints we can generate are as in the data type below.

```
data Constraint
= Instance Var Term -- (1)
| Equal Term Term -- (2)
| EqualTypes Type Type -- (3)
| IsSet Type -- (4)
deriving (Eq, Show, Ord)
```

- The constraint
`Instance v t`

corresponds to a substitution between`v`

and the term`t`

. - A constraint
`Equal a b`

states that the two terms`a`

and`b`

are equal under normalisation. - Ditto, but with their
*types*(We normalise, infer, and check for equality) - A constraint
`IsSet t`

asserts that the provided type has inhabitants.

Unification of most terms is entirely uninteresting. Simply line up the structures and produce the appropriate equality (or instance) constraints.

```
unify (Variable a) b = instanceC a b
unify b (Variable a) = instanceC a b
unify (Set a) (Set b) | a == b = pure []
unify (App x y) (App x' y') =
(++) <$> unify x x' <*> unify y y'
unify (TypeHint a b) (TypeHint c d) =
(++) <$> unify a c <*> unify b d
unify a b = throwError (NotEqual a b)
```

Those are all the boring cases, and I’m not going to comment on them. Similarly boring are binders, which were abstracted out because hlint told me to.

```
unify (Lam v1 t1 b1) (Lam v2 t2 b2) = unifyBinder (v1, v2) (t1, t2) (b1, b2)
unify (Pi v1 t1 b1) (Pi v2 t2 b2) = unifyBinder (v1, v2) (t1, t2) (b1, b2)
unify (Let v1 t1 b1) (Let v2 t2 b2) = unifyBinder (v1, v2) (t1, t2) (b1, b2)
unifyBinder (v1, v2) (t1, t2) (b1, b2) = do
(a, b) <- (,) <$> unify (Variable v1) (Variable v2) <*> unify t1 t2
((a ++ b) ++) <$> unify b1 b2
```

There are two interesting cases: Unification between some term and a pi abstraction, and unification between two variables.

```
unify ta@(Variable a) tb@(Variable b)
| a == b = pure []
| otherwise = do
(x, y) <- (,) <$> lookupType a <*> lookupType b
case (x, y) of
(Just _, Just _) -> do
ca <- equalTypesC ta tb
cb <- equalC ta tb
pure (ca ++ cb)
(Just x', Nothing) -> instanceC b x'
(Nothing, Just x') -> instanceC a x'
(Nothing, Nothing) -> instanceC a (Variable b)
```

If the variables are syntactically the same, then we’re done, and no constraints have to be generated (Technically you could generate an entirely trivial equality constraint, but this puts unnecessary pressure on the solver).

If either variable has a known type, then we generate an instance constraint between the unknown variable and the known one.

If both variables have a value, we equate their types’ types and their types. This is done mostly for error messages’ sakes, seeing as if two values are propositionally equal, so are their types.

Unification between a term and a ∏-abstraction is the most interesting case: We check that the ∏ type abstracts over a type (i.e., it corresponds to a System F ∀ instead of a System F →), and *instance* the ∏ with a fresh type variable.

```
unifyPi v1 t1 b1 a = do
id <- refresh Irrelevant
ss <- isSetC t1
pi' <- subst [(v1, Variable id)] b1
(++ ss) <$> unify a pi'
unify a (Pi v1 t1 b1) = unifyPi v1 t1 b1 a
unify (Pi v1 t1 b1) a = unifyPi v1 t1 b1 a
```

Solving is a recursive function of the list of constraints (a catamorphism!) with some additional state: Namely, a strict map of already-performed substitutions. Let’s work through the cases in reverse order of complexity (and, interestingly, reverse order of how they’re in the source code).

Solving an empty list of constraints is entirely trivial.

`IsSet`

We infer the index of the universe of the given type, much like in the inferrence case for ∏-abstractions, and check the remaining constraints.

`EqualTypes`

We infer the types of both provided values, and generate an equality constraint.

`Equal`

We merely have to check for syntactic equality of the (normal forms of) terms, because the hard lifting of destructuring and lining up was done by the unification engine.

```
solveInner map (Equal a b:xs) = do
a' <- N.normalise a
b' <- N.normalise b
eq <- equal a' b'
if eq
then solveInner map xs
else throwError (NotEqual a b)
```

`Instance`

If the variable we’re instancing is already in the map, and the thing we’re instancing it to *now* is not the same as before, we have an inconsistent set of substitutions and must error.

```
solveInner map (Instance a b:xs)
| a `M.member` map
, b /= map M.! a
, Irrelevant /= a
= throwError $ InconsistentSubsts (a, b) (map M.! a)
```

Otherwise, if we have a coherent set of instances, we add the instance both to scope and to our local state map and continue checking.

Now that we have both `unify`

and `solve`

, we can write `assertEquality`

: We unify the two types, and then try to solve the set of constraints.

The real implementation will catch and re-throw any errors raised by `solve`

to add appropriate context, and that’s not the only case where “real implementation” and “blag implementation” differ.

Wow, that was a lot of writing. This conclusion begins on exactly the 500th line of the Markdown source of this article, and this is the longest article on this blag (by far). However, that’s not to say it’s bad: It was amazing to write, and writing `dtt`

was also amazing. I am not good at conclusions.

`dtt`

is available under the BSD 3-clause licence, though I must warn you that the source code hasn’t many comments.

I hope you learned nearly as much as I did writing this by reading it.

See System U, also Girard’s paradox - the type theory equivalent of Russell’s paradox.↩

`multimethod`

, noun. A procedure which decides runtime behaviour based on the types of its arguments.

At some point, most programming language designers realise that they’ve outgrown the language’s original feature set and must somehow expand it. Sometimes, this expansion is painless for example, if the language had already features in place to facilitate this, such as type classes or message passing.

In our case, however, we had to decide on and implement a performant system for extensibility in the standard library, from scratch. For a while, Urn was using Lua’s scheme for modifying the behaviour of standard library functions: metamethods in metatables. For the uninitiated, Lua tables can have *meta*-tables attached to modify their behaviour with respect to several language features. As an example, the metamethod `__add`

controls how Lua will add two tables.

However, this was not satisfactory, the most important reason as to why being the fact that metamethods are associated with particular object *instances*, instead of being associated with the *types* themselves. This meant that all the operations you’d like to modify had to be modified in one big go - inside the constructor. Consider the constructor for hash-sets as it was implemented before the addition of multimethods.

```
(defun make-set (hash-function)
(let* [(hash (or hash-function id))]
(setmetatable
{ :tag "set"
:hash hash
:data {} }
{ :--pretty-print
(lambda (x)
(.. "«hash-set: " (concat (map pretty (set->list x)) " ") "»"))
:--compare #| elided for brevity |# })))
```

That second table, the meta table, is entirely noise. The fact that constructors also had to specify behaviour, instead of just data, was annoying from a code style point of view and *terrible* from a reuse point of view. Behaviour is closely tied to the implementation - remember that metamethods are tied to the *instance*. To extend the behaviour of standard library functions (which you can’t redefine) for a type you do not control (whose constructor you also can not override), you suddenly need to wrap the constructor and add your own metamethods.

Displeased with the situation as it stood, I set out to discover what other Lisps did, and it seemed like the consensus solution was to implement open multimethods. And so we did.

Multimethods - or multiple dispatch in general - is one of the best solutions to the expression problem. We can easily add new types, and new operations to work on existing types - and most importantly, this means touching *no* existing code.

Our implementation is, like almost everything in Urn, a combination of clever (ab)use of macros, tables and functions. A method is represented as a table - more specifically, a n-ary tree of possible cases, with a metamethod, `__call`

, which means multimethods can be called and passed around like regular functions - they are first-order.

Upon calling a multimethod, it’ll look up the correct method body to call for the given arguments - or the default method, or throw an error, if no default method is provided - and tail-call that, with all the arguments.

Before diving into the ridiculously simple implementation, let’s look at a handful of examples.

Pretty printing is, quite possibly, the simplest application of multiple dispatch to extensibility. As of `ba289d2d`

, the standard library implementation of `pretty`

is a multimethod.

Before, the implementation^{1} would perform a series of type tests and decide on the behaviour, including testing if the given object had a metatable which overrides the pretty-printing behaviour.

The new implementation is *significantly* shorter, so much so that I’m comfortable pasting it here.

That’s it! All of the logic that used to exist is now provided by the `defgeneric`

macro, and adding support for your types is as simple as using `defmethod`

.^{2}

As another example, let’s define - and assume the following are separate modules - a new type, and add pretty printing support for that.

The Urn function `type`

will look for a `tag`

element in tables and report that as the type if it is present, and that function is what the multimethod infrastructure uses to determine the correct body to call. This means that all we need to do if we want to add support for pretty-printing boxes is use defmethod again!

A more complicated application of multiple dispatch for extensibility is the implementation of the `eq?`

method in the standard library. Before^{3}, based on a series of conditionals, the equality test was chosen at runtime.

Anyone with experience optimising code is wincing at the mere thought of this code.

The new implementation of `eq?`

is also comically short - a mere 2 lines for the definition, and only a handful of lines for all the previously existing cases.

```
(defgeneric eq? (x y)
"Compare values for equality deeply.")
(defmethod (eq? symbol symbol) (x y)
(= (get-idx x :contents) (get-idx y :contents)))
(defmethod (eq? string symbol) (x y) (= x (get-idx y :contents)))
(defmethod (eq? symbol string) (x y) (= (get-idx x :contents) y))
```

If we would, as an example, add support for comparing boxes, the implementation would similarly be short.

`defgeneric`

and `defmethod`

are, quite clearly, macros. However, contrary to what one would expect, both their implementations are *quite* simple.

```
(defmacro defgeneric (name ll &attrs)
(let* [(this (gensym 'this))
(method (gensym 'method))]
`(define ,name
,@attrs
(setmetatable
{ :lookup {} }
{ :__call (lambda (,this ,@ll)
(let* [(,method (deep-get ,this :lookup ,@(map (lambda (x)
`(type ,x)) ll)))]
(unless ,method
(if (get-idx ,this :default)
(set! ,method (get-idx ,this :default))
(error "elided for brevity")))
(,method ,@ll))) }))))
```

Everything `defgeneric`

has to do is define a top-level symbol to hold the multimethod table, and generate, at compile time, a lookup function specialised for the correct number of arguments. In a language without macros, multimethod calls would have to - at runtime - loop over the provided arguments, take their types, and access the correct elements in the table.

As an example of how generating the lookup function at compile time is better for performance, consider the (cleaned up^{4}) lookup function generated for the `(eq?)`

method defined above.

```
function(this, x, y)
local method
if this.lookup then
local temp1 = this.lookup[type(x)]
if temp1 then
method = temp1[type(y)] or nil
else
method = nil
end
elseif this.default then
method = this.default
end
if not method then
error("No matching method to call for...")
end
return method(x, y)
end
```

`defmethod`

and `defdefault`

are very simple and uninteresting macros: All they do is wrap the provided body in a lambda expression along with the proper argument list and associate them to the correct element in the tree.

```
(defmacro defmethod (name ll &body)
`(put! ,(car name) (list :lookup ,@(map s->s (cdr name)))
(let* [(,'myself nil)]
(set! ,'myself (lambda ,ll ,@body))
,'myself)))
```

Switching to methods instead of a big if-else chain improved compiler performance by 12% under LuaJIT, and 2% under PUC Lua. The performace increase under LuaJIT can be attributed to the use of polymorphic inline caches to speed up dispatch, which is now just a handful of table accesses - Doing it with the if-else chain is *much* harder.

Defining complex multiple-dispatch methods used to be an unthinkable hassle what with keeping straight which cases have been defined yet and which cases haven’t, but they’re now very simple to define: Just state out the number of arguments and list all possible cases.

The fact that multimethods are *open* means that new cases can be added on the fly, at runtime (though this is not officially supported, and we don’t claim responsibility if you shoot your own foot), and that modules loaded later may improve upon the behaviour of modules loaded earlier. This means less coupling between the standard library, which has been growing to be quite large.

This change has, in my opinion, made Urn a lot more expressive as a language, and I’d like to take a minute to point out the power of the Lisp family in adding complicated features such as these as merely library code: no changes were made to the compiler, apart from a tiny one regarding environments in the REPL - previously, it’d use the compiler’s version of `(pretty)`

even if the user had overridden it, which wasn’t a problem with the metatable approach, but definitely is with the multimethod approach.

Of course, no solution is all *good*. Compiled code size has increased a fair bit, and for the Urn compiler to inline across multimethod boundaries would be incredibly difficult - These functions are essentially opaque boxes to the compiler.

Dead code elimination is harder, what with defining functions now being a side-effect to be performed at runtime - Telling which method cases are or aren’t used is incredibly difficult with the extent of the dynamicity.

Here. Do keep in mind that the implementation is

*quite*hairy, and grew to be like that because of our lack of a standard way of making functions extensible.↩`%q`

is the format specifier for quoted strings.↩Here. Do keep in mind that that the above warnings apply to this one, too.↩

The original generated code is quite similar, except the generated variable names make it a tad harder to read.↩

Constraint propagation is a new optimisation proposed for implementation in the Urn compiler^{1}. It is a variation on the idea of flow-sensitive typing in that it is not applied to increasing program safety, rather being used to improve *speed*.

The Urn compiler is decently fast for being implemented in Lua. Currently, it manages to compile itself (and a decent chunk of the standard library) in about 4.5 seconds (when using LuaJIT; When using the lua.org interpreter, this time roughly doubles). Looking at a call-stack profile of the compiler, we notice a very interesting data point: about 11% of compiler runtime is spent in the `(type)`

function.

There are two ways to fix this: Either we introduce a type system (which is insanely hard to do for a language as dynamic as Urn - or Lisp in general) or we reduce the number of calls to `(type)`

by means of optimisation. Our current plan is to do the latter.

The proposed solution is to collect all the branches that the program has taken to end up in the state it currently is. Thus, every branch grows the set of “constraints” - the predicates which have been invoked to get the program here.

Most useful predicates involve a variable: Checking if it is or isn’t nil, if is positive or negative, even or odd, a list or a string, and etc. However, when talking about a single variable, this test only has to be performed *once* (in general - mutating the variable invalidates the set of collected constraints), and their truthiness can be kept, by the compiler, for later use.

As an example, consider the following code. It has three branches, all of which imply something different about the type of the variable `x`

.

If, in the first case, the program then evaluated `(car x)`

, it’d end up doing a redundant type check. `(car)`

, is, in the standard library, implemented like so:

`assert-type!`

is merely a macro to make checking the types of arguments more convenient. Let’s make the example of branching code a bit more complicated by making it take and print the `car`

of the list.

To see how constraint propagation would aid the runtime performance of this code, let’s play optimiser for a bit, and see what this code would end up looking like at each step.

First, `(car x)`

is inlined.

`assert-type!`

is expanded, and the problem becomes apparent: the type of `x`

is being computed *twice*!

```
(cond
[(list? x)
(print! (progn (if (! (list? x))
(error! "the argument x is not a list"))
(.> x 0)))])
```

If the compiler had constraint propagation (and the associated code motions), this code could be simplified further.

Seeing as we already know that `(list? x)`

is true, we don’t need to test anymore, and the conditional can be entirely eliminated. Figuring out `(! (list? x))`

from `(list? x)`

is entirely trivial constant folding (the compiler already does it)

This code is optimal. The `(list? x)`

test can’t be eliminated because nothing else is known about `x`

. If its value were statically known, the compiler could eliminate the branch and invocation of `(car x)`

completely by constant propagation and folding (`(car)`

is, type assertion notwithstanding, a pure function - it returns the same results for the same inputs. Thus, it is safe to execute at compile time)

In this section I’m going to outline a very simple implementation of the constraint propagation algorithm to be employed in the Urn compiler. It’ll work on a simple Lisp with no quoting or macros (thus, basically the lambda calculus).

```
(lambda (var1 var2) exp) ; λ-abstraction
(foo bar baz) ; procedure application
var ; variable reference
(list x y z) ; list
t, nil ; boolean
(cond [t1 b1] [t2 b2]) ; conditional
```

The language has very simple semantics. It has three kinds of values (closures, lists and booleans), and only a couple reduction rules. The evaluation rules are presented as an interpretation function (in Urn, not the language itself).

```
(defun interpret (x env)
(case x
[(lambda ?params . ?body)
`(:closure ,params ,body ,(copy env))] ; 1
[(list . ?xs)
(map (cut interpret <> env) xs)] ; 2
[t true] [nil false] ; 3
[(cond . ?alts) ; 4
(interpret
(block (map (lambda (alt)
(when (interpret (car alt) env)
(break (cdr alt))))))
env)]
[(?fn . ?args)
(case (eval fn env)
[(:closure ?params ?body ?cl-env) ; 5
(map (lambda (a k)
(.<! cl-env (symbol->string a) (interpret k env)))
params args)
(last (map (cut interpret <> env) body))]
[_ (error! $"not a procedure: ${fn}")])]
[else (.> env (symbol->string x))]))
```

- In the case the expression currently being evaluated is a lambda, we make a copy of the current environment and store it in a
*closure*. - If a list is being evaluated, we recursively evaluate each sub-expression and store all of them in a list.
- If a boolean is being interpreted, they’re mapped to the respective values in the host language.
- If a conditional is being evaluated, each test is performed in order, and we abort to interpret with the corresponding body.
- When evaluating a procedure application, the procedure to apply is inspected: If it is a closure, we evaluate all the arguments, bind them along with the closure environment, and interpret the body. If not, an error is thrown.

Collecting constraints in a language as simple as this is fairly easy, so here’s an implementation.

```
(defun collect-constraints (expr (constrs '()))
(case expr
[(lambda ?params . ?body)
`(:constraints (lambda ,params
,@(map (cut collect-constraints <> constrs) body))
,constrs)]
```

Lambda expressions incur no additional constraints, so the inner expressions (namely, the body) receive the old set. The same is true for lists:

Booleans are simpler:

Since there are no sub-expressions to go through, we only associate the constraints with the boolean values.

Conditionals are where the real work happens. For each case, we add that case’s test as a constraint in its body.

```
[(cond . ?alts)
`(:constraints
(cond
,@(map (lambda (x)
`(,(collect-constraints (car x) constrs)
,(collect-constraints (cadr x) (cons (car x) constrs))))
alts))
,constrs)]
```

Applications are as simple as lists. Note that we make no distinction between valid applications and invalid ones, and just tag both.

```
[(?fn . ?args)
`(:constraints
(,(collect-constraints fn constrs)
,@(map (cut collect-constraints <> constrs)
args))
,constrs)]
```

References are also straightforward:

That’s it! Now, this information can be exploited to select a case branch at compile time, and eliminate the overhead of performing the test again.

This is *really* easy to do in a compiler that already has constant folding of alternatives. All we have to do is associate constraints to truthy values. For instance:

That’s it! We check if the expression is in the set of known constraints, and if so, reduce it to true. Then, the constant folding code will take care of eliminating the redundant branches.

This is a really complicated question. The Urn core language, unfortunately, is a tad more complicated, as is the existing optimiser. Collecting constraints and eliminating tests would be in completely different parts of the compiler.

There is also a series of code motions that need to be in place for constraints to be propagated optimally, especially when panic edges are involved. Fortunately, these are all simple to implement, but it’s still a whole lot of work.

I don’t feel confident setting a specific timeframe for this, but I *will* post more blags on this topic. It’s fascinating (for me, at least) and will hopefully make the compiler faster!

Efficient compilation of pattern matching is not exactly an open problem in computer science in the same way that implementing say, type systems, might be, but it’s still definitely possible to see a lot of mysticism surrounding it.

In this post I hope to clear up some misconceptions regarding the implementation of pattern matching by demonstrating one such implementation. Do note that our pattern matching engine is strictly *linear*, in that pattern variables may only appear once in the match head. This is unlike other languages, such as Prolog, in which variables appearing more than once in the pattern are unified together.

Pattern matching always involves a pattern (the *match head*, as we call it) and a value to be compared against that pattern, the *matchee*. Sometimes, however, a pattern match will also include a body, to be evaluated in case the pattern does match.

As a side note, keep in mind that `case`

has linear lookup of match bodies. Though logarithmic or constant-time lookup might be possible, it is left as an exercise for the reader.

To simplify the task of compiling patterns to an intermade form without them we divide their compilation into two big steps: compiling the pattern’s test and compiling the pattern’s bindings. We do so *inductively* - there are a few elementary pattern forms on which the more complicated ones are built upon.

Most of these elementary forms are very simple, but two are the simplest: *atomic forms* and *pattern variables*. An atomic form is the pattern correspondent of a self-evaluating form in Lisp: a string, an integer, a symbol. We compare these for pointer equality. Pattern variables represent unknowns in the structure of the data, and a way to capture these unknowns.

Pattern | Test | Bindings |
---|---|---|

Atomic form | Equality | Nothing |

Pattern variable | Nothing | The matchee |

All compilation forms take as input the pattern to compile along with a symbol representing the matchee. Patterns which involve other patterns (for instance, lists, conses) will call the appropriate compilation forms with the symbol modified to refer to the appropriate component of the matchee.

Let’s quickly have a look at compiling these elementary patterns before looking at the more interesting ones.

Atomic forms are the simplest to compile - we merely test that the symbol’s value is equal (with `=`

, which compares identities, instead of with `eq?`

which checks for equivalence - more complicated checks, such as handling list equality, need not be handled by the equality function as we handle them in the pattern matching library itself) and emit no bindings.

```
(defun variable-pattern-test (pat sym)
`true)
(defun variable-pattern-bindings (pat sym)
(list `(,pat ,sym)))
```

The converse is true for pattern variables, which have no test and bind themselves. The returned bindings are in association list format, and the top-level macro that users invoke will collect these and them bind them with `let*`

.

Composite forms are a bit more interesting: These include list patterns and cons patterns, for instance, and we’ll look at implementing both. Let’s start with list patterns.

To determine if a list matches a pattern we need to test for several things:

- First, we need to test if it actually is a list at all!
- The length of the list is also tested, to see if it matches the length of the elements stated in the pattern
- We check every element of the list against the corresponding elements of the pattern

With the requirements down, here’s the implementation.

```
(defun list-pattern-test (pat sym)
`(and (list? ,sym) ; 1
(= (n ,sym) ,(n pat)) ; 2
,@(map (lambda (index) ; 3
(pattern-test (nth pat index) `(nth ,sym ,index)))
(range :from 1 :to (n pat)))))
```

To test for the third requirement, we call a generic dispatch function (which is trivial, and thus has been inlined) to compile the *n*th pattern in the list against the *n*th element of the actual list.

List pattern bindings are similarly easy:

```
(defun list-pattern-bindings (pat sym)
(flat-map (lambda (index)
(pattern-bindings (nth pat index) `(nth ,sym ,index)))
(range :from 1 :to (n pat))))
```

Compiling cons patterns is similarly easy if your Lisp is proper: We only need to check for `cons`

-ness (or `list`

-ness, less generally), then match the given patterns against the car and the cdr.

```
(defun cons-pattern-test (pat sym)
`(and (list? ,sym)
,(pattern-test (cadr pat) `(car ,sym))
,(pattern-test (caddr pat) `(cdr ,sym))))
(defun cons-pattern-bindings (pat sym)
(append (pattern-bindings (cadr pat) `(car ,sym))
(pattern-bindings (caddr pat) `(cdr ,sym))))
```

Note that, in Urn, `cons`

patterns have the more general form `(pats* . pat)`

(using the asterisk with the usual meaning of asterisk), and can match any number of elements in the head. It is also less efficient than expected, due to the nature of `cdr`

copying the list’s tail. (Our lists are not linked - rather, they are implemented over Lua arrays, and as such, removing the first element is rather inefficient.)

Now that we can compile a wide assortment of patterns, we need a way to actually use them to scrutinize data. For this, we implement two forms: an improved version of `destructuring-bind`

and `case`

.

Implementing `destructuring-bind`

is simple: We only have a single pattern to test against, and thus no search is nescessary. We simply generate the pattern test and the appropriate bindings, and generate an error if the pattern does not mind. Generating a friendly error message is similarly left as an exercise for the reader.

Note that as a well-behaving macro, destructuring bind will not evaluate the given variable more than once. It does this by binding it to a temporary name and scrutinizing that name instead.

```
(defmacro destructuring-bind (pat var &body)
(let* [(variable (gensym 'var))
(test (pattern-test pat variable))
(bindings (pattern-bindings pat variable))]
`(with (,variable ,var)
(if ,test
(progn ,@body)
(error! "pattern matching failure")))))
```

Implementing case is a bit more difficult in a language without `cond`

, since the linear structure of a pattern-matching case statement would have to be transformed into a tree of `if`

-`else`

combinations. Fortunately, this is not our case (pun intended, definitely.)

```
(defmacro case (var &cases)
(let* [(variable (gensym 'variable))]
`(with (,variable ,var)
(cond ,@(map (lambda (c)
`(,(pattern-test (car c) variable)
(let* ,(pattern-bindings (car c) variable)
,@(cdr c))))
cases)))))
```

Again, we prevent reevaluation of the matchee by binding it to a temporary symbol. This is especially important in an impure, expression-oriented language as evaluating the matchee might have side effects! Consider the following contrived example:

```
(case (progn (print! "foo")
123)
[1 (print! "it is one")]
[2 (print! "it is two")]
[_ (print! "it is neither")]) ; _ represents a wild card pattern.
```

If the matchee wasn’t bound to a temporary value, `"foo"`

would be printed thrice in this example. Both the toy implementation presented here and the implementation in the Urn standard library will only evaluate matchees once, thus preventing effect duplication.

Unlike previous blog posts, this one isn’t runnable Urn. If you’re interested, I recommend checking out the actual implementation. It gets a bit hairy at times, particularly with handling of structure patterns (which match Lua tables), but it’s similar enough to the above that this post should serve as a vague map of how to read it.

In a bit of a meta-statement I want to point out that this is the first (second, technically!) of a series of posts detailing the interesting internals of the Urn standard library: It fixes two things in the sorely lacking category: content in this blag, and standard library documentation.

Hopefully this series is as nice to read as it is for me to write, and here’s hoping I don’t forget about this blag for a year again.

]]>As some of you might know, Urn is my current pet project. This means that any potential upcoming blag posts are going to involve it in some way or another, and that includes this one. For the uninitiated, Urn is a programming language which compiles to Lua^{1}, in the Lisp tradition, with no clear ascendance: We take inspiration from several Lisps, most notably Common Lisp and Scheme.

As functional programmers at heart, we claim to be minimalists: Urn is reduced to 12 core forms before compilation, with some of those being redundant (e.g. having separate `define`

and `define-macro`

builtins instead of indicating that the definition is a macro through a parameter). On top of these primitives we build all our abstraction, and one such abstraction is what this post is about: *Delimited continuations*.

Delimited continuations are a powerful control abstraction first introduced by Matthias Felleisein^{2}, initially meant as a generalisation of several other control primitives such as `call-with-current-continuation`

from Scheme among others. However, whereas `call-with-current-continuation`

captures a continuation representing the state of the entire program after that point, delimited continuations only reify a slice of program state. In this, they are cheaper to build and invoke, and as such may be used to implement e.g. lightweight threading primitives.

While this may sound rather limiting, there are very few constructs that can simultaneously be implemented with `call-with-current-continuation`

without also being expressible in terms of delimited continuations. The converse, however, is untrue. While `call/cc`

be used to implement any control abstraction, it can’t implement any *two* control abstractions: the continuations it reifies are uncomposable^{3}.

Our implementation of delimited continuations follows the Guile Scheme tradition of two functions `call-with-prompt`

and `abort-to-prompt`

, which are semantically equivalent to the more traditional `shift`

/`reset`

. This is, however, merely an implementation detail, as both schemes are available.

We have decided to base our implementation on Lua’s existing coroutine machinery instead of implementing an ad-hoc solution especially for Urn. This lets us reuse and integrate with existing Lua code, which is one of the goals for the language.

`call-with-prompt`

is used to introduce a *prompt* into scope, which delimits a frame execution and sets up an abort handler with the specified tag. Later on, calls to `abort-to-prompt`

reify the rest of the program slice’s state and jump into the handler set up.

One limitation of the current implementation is that the continuation, when invoked, will no longer have the prompt in scope. A simple way to get around this is to store the prompt tag and handler in values and use `call/p`

^{4} again instead of directly calling the continuation.

Unfortunately, being implemented on top of Lua coroutines does bring one significant disadvantage: The reified continuations are single-use. After a continuation has reached the end of its control frame, there’s no way to make it go back, and there’s no way to copy continuations either (while we have a wrapper around coroutines, the coroutines themselves are opaque objects, and there’s no equivalent of `string.dump`

to, for instance, decompile and recompile them)

In my opinion (which, like it or not, is the opinion of the Urn team), Guile-style delimited continuations provide a much better abstraction than operating with Lua coroutines directly, which may be error prone and feels out of place in a functional-first programming language.

As a final motivating example, below is an in-depth explanation of a tiny cooperative task scheduler.

```
(defun run-tasks (&tasks) ; 1
(loop [(queue tasks)] ; 2
[(empty? queue)] ; 2
(call/p 'task (car queue)
(lambda (k)
(when (alive? k)
(push-cdr! queue k)))) ; 3
(recur (cdr queue)))) ; 4
```

We begin, of course, by defining our function. As inputs, we take a list of tasks to run, which are generally functions, but may be Lua coroutines (

`threads`

) or existing continuations, too. As a sidenote, in Urn, variadic arguments have`&`

prepended to them, instead of having symbols beginning of`&`

acting as modifiers in a lambda-list. For clarity, that is wholly equivalent to`(defun run-tasks (&rest tasks)`

.Then, we take the first element of the queue as the current task to run, and set up a prompt of execution. The task will run until it hits an

`abort-to-prompt`

, at which point it will be interrupted and the handler will be invoked.The handler inspects the reified continuation to see if it is suitable for being scheduled again, and if so, pushes it to the end of the queue. This means it’ll be the first task to execute again when the scheduler is done with the current set of working tasks.

We loop back to the start with the first element (the task we just executed) removed.

Believe it or not, the above is a fully functioning cooperative scheduler that can execute any number of tasks.^{5}

I think that the addition of delimited continuations to Urn brings a much needer change in the direction of the project: Moving away from ad-hoc abstraction to structured, proven abstraction. Hopefully this is the first of many to come.

Though this might come off as a weird decision to some, there is a logical reason behind it: Urn was initially meant to be used in the ComputerCraft mod for Minecraft, which uses the Lua programming language, though the language has outgrown it by now. For example, the experimental

`readline`

support is being implemented with the LuaJIT foreign function interface.↩Oleg Kiselyov demonstrates here that abstractions built on

`call/cc`

do not compose.↩`call-with-prompt`

is a bit of a mouthful, so the alias`call/p`

is blessed.↩There’s a working example here (with syntax highlighting) as runnable Urn. Clone the compiler then execute

`lua bin/urn.lua --run tasks.lisp`

.↩

```
{-# LANGUAGE FlexibleInstances, MultiParamTypeClasses #-}
module StatefulParsing where
import Control.Monad.State.Class
import Control.Applicative
```

In this post I propose an extension to the monadic parser framework introduced in a previous post, *You could have invented Parsec*, that extends the parser to also support embedded user state in your parsing.

This could be used, for example, for parsing a language with user-extensible operators: The precedences and fixidities of operators would be kept in a hashmap threaded along the bind chain.

Instead of posing these changes as diffs, we will rewrite the parser framework from scratch with the updated type.

`newtype`

Our new parser is polymorphic in both the return type and the user state that, so we have to update the `newtype`

declaration to match.

```
newtype Parser state result
= Parser { runParser :: String
-> state
-> Either String (result, state, String) }
```

Our tuple now contains the result of the parsing operation and the new user state, along with the stream. We still need to supply a stream to parse, and now also supply the initial state. This will be reflected in our functions.

For convenience, we also make a `Parser' a`

type alias for parsers with no user state.

Seeing as type constructors are also curried, we can apply η-reduction to get the following, which is what we’ll go with.

`Functor`

instanceThe functor instance remains mostly the same, except now we have to thread the user state around, too.

The instance head also changes to fit the kind signature of the `Functor`

typeclass. Since user state can not change from fmapping, this is fine.

```
fn `fmap` (Parser p) = Parser go where
go st us = case p st us of
Left e -> Left e
Right (r, us', st') -> Right (fn r, us', st')
```

As you can see, the new user state (`us'`

) is just returned as is.

`Applicative`

instanceThe new implementations of `pure`

and `<*>`

need to correctly manipulate the user state. In the case of `pure`

, it’s just passed as-is to the `Right`

constructor.

Since `(<*>)`

needs to evaluate both sides before applying the function, we need to pass the right-hand side’s generated user state to the right-hand side for evaluation.

```
(Parser f) <*> (Parser v) = Parser go where
go st us = case f st us of
Left e -> Left e
Right (fn, us', st') -> case v st' us' of
Left e -> Left e
Right (vl, us'', st'') -> Right (fn vl, us'', st'')
```

`Monad`

instanceSince we already have an implementation of `pure`

from the Applicative instance, we don’t need to worry about an implementation of `return`

.

The monad instance is much like the existing monad instance, except now we have to give the updated parser state to the new computation.

```
(Parser p) >>= f = Parser go where
go s u = case p s u of
Left e -> Left e
Right (x, u', s') -> runParser (f x) s' u'
```

`MonadState`

instanceSince we now have a state transformer in the parser, we can make it an instance of the MTL’s `MonadState`

class.

The implementation of `put`

must return `()`

(the unit value), and needs to replace the existing state with the supplied one. This operation can not fail.

Since this is a parsing framework, we also need to define how the stream is going to be affected: In this case, it isn’t.

The `get`

function returns the current user state, and leaves it untouched. This operation also does not fail.

Since we’re an instance of `MonadState`

, we needn’t an implementation of `modify`

and friends - They’re given by the MTL.

`Alternative`

instanceThe `Alternative`

instance uses the same state as it was given for trying the next parse.

The `empty`

parser just fails unconditionally.

`(<|>)`

will try both parsers in order, reusing both the state and the stream.

```
(Parser p) <|> (Parser q) = Parser go where
go st us = case p st us of
Left e -> q st us
Right v -> Right v
```

This was a relatively short post. This is because many of the convenience functions defined in the previous post also work with this parser framework, if you replace `Parser`

with `Parser'`

. You can now use `get`

, `put`

and `modify`

to work on the parser’s user state. As a closing note, a convenience function for running parsers with no state is given.