Abstract Syntax Graphs for Domain Specific Languages

In 2014, Bruno Oliveira and Andres Loeh published their paper Abstract Syntax Graphs for Domain Specific Languages. In this paper, they look at a way to model embedded domain specific languages as a deep embedding, but crucially they leverage techniques from PHOAS in order to model recursive bindings. This addition enables us to perform observable sharing, and retain sharing when they manipulate the DSL.

Introduction

  • Embedded Domain Specific Languages (EDSLs) are specific programming languages built to solve a particular problem, but are embedded into the host language. This allows us to get a lot of functionality “for free” by inheriting it from the host. In our case, the host is Haskell.

  • EDSLs are often split into two extremes:

    Shallow embeddings
    • A very thin layer over the host language
    • Implement DSL constructs directly by their semantics
    • Lose the ability to inspect the syntax of the program
    Deep embeddings
    • Represent programs by their abstract syntax trees.
    • One or several semantics can be given by defining interpretation functions over the AST.
    • Transformations of the program are possible - just transform the AST.
  • In reality, we often need sharing that is observable and preservable, but we don’t get with ASTs.

    • We need to augment the AST with an explicit environment
    • Now we have to keep track of names and bindings, and the problem is now more difficult.
    • Need to deal with avoiding name capture in substitutions or having dangling references.
  • Rather than use abstract syntax trees, this paper advocates the use of abstract syntax graphs.

    • Realized using Oliveira and Cook’s work on structured graphs (see “Functional Programming with Structured Graphs”).
    • Enforce well-scopedness
    • Regain sharing
    • This paper extends work on structured graphs by considering typed syntax graphs - this is new research, previously only researched untyped abstract syntax.

Sharing in EDSLs

This section uses a small example language to demonstrate the differences between shallow and deep embeddings as well as the issues with representing sharing.

  • The language is really simple - expressions are either 1 or the addition of two expressions.
Shallow embedding

Directly encode the semantics:

type Expr = Int
one = 1
plus = (+)
eval = id
  • Simple to implement
  • We can use sharing from Haskell to get sharing in our EDSL:

    tree 0 = one
    tree n = let shared = tree (pred n) in shared `plus` shared`
  • Simplicity came at a cost of comitting to a single semantics.

Deep embedding

Directly encode the syntax

data Expr = One | Plus One One
one = One
plus = Plus
eval One = 1
eval (Plus l r) = eval l + eval r
  • Permits multiple interpretations - we can also pretty-print the same Expr.
  • No ability for sharing - eval (and friends) are now exponential, even though we’d like to have sharing as in tree.
Sharing via HOAS

In order to recover sharing, we have to explicitly mention it as syntax. As an initial attempt, we try and use HOAS and add a new constructor:

data Expr = ... | Let Expr (Expr -> Expr)

Problems arise here when we try and write an evaluator. In the case where we evaluate Let we only have a way to evualate to Int - but we have to feed an Expr into the body of the let binding. Thus we need some way to reflect data from the host back into the EDSL.

In this case, rather than battling with that we go straight to PHOAS. Now Expr takes a type parameter to indicate the type of variables, and has two new constructors - variable reference through Var and variable introduction through Let:

data Expr a = One | Add (Expr a) (Expr a) | Var a | Let (Expr a) (a -> Expr a )

And we universally quantify over a to reach ClosedExpr, as is customary in PHOAS

type ClosedExpr = forall a. Expr a

Now, provided we use Let, we will have sharing in our evaluator.

Due to the universal quantification of a, we can still write different evaluators that instantiate a however they please.

(Mutual) Recursion

In this section, we will extend the solution to sharing to recursive and mutually recursive bindings.

  • They begin by extending the language with IfZero, Lam and App.

  • Lam and App are the standard encoding of lambda abstraction and application in PHOAS.

  • Recursive functions can be encoded exactly like Lam, but we will need to interpret them differently. They introduce a new constructor, Mu, and an interpretation:

    data Expr a = ... | Mu (a -> Expr a)
    
    eval (Mu f) = fix (eval . f)
  • Mu is sufficient for expressing simple (general?) recursion, but doesn’t let us express mutually recursive bindings.

    • We look at the example of defining even and odd predicates, as mutually recursive predicates:

      even x = if x == 0 then True else odd (pred x)
      odd  x = if x == 0 then False else even (pred x)
    • Instead, we need the ability to introduce multiple names at once.
    • Add a new constructor - LetRec.

      data Expr a = ... | LetRec ([a] -> [Expr a]) ([a] -> Expr a)

    There is an inmportant invariant here - all three lists must be the same length. The author’s solve this problem in section 5.

  • There is a brief discussion on the inconvenience of forcing EDSL users into a different form of let binding. It would be good if we could simply allow them to use Haskell’s let construct - but as we saw, that give the wrong form of sharing.

    • We can use data-reify to inspect the sharing let builds in an impure-manner.

    • data-reify turns a structure using sharing into an explicit Graph with numbered nodes. This is fairly messy, but we can transform this graph back into ClosedExpr with an ordinary function.

  • The section closes by considering some situations where the work so far is not satisfactory:

    • If the target language has a type system, then we would like to work with a data type of well typed terms.

    • LetRec requires the programmer maintain both the invariant of list lengths, but also be careful with laziness when defining bindings.

Typed Lists

This section presents typed lists. Typed lists are a generalization of both homogeneous and heterogeneous lists of statically known length.

  • Section options we a standard definition of heterogeneous lists, with elements under some functor:

    data TList :: (k -> *) -> [k] -> *
      TNil :: TList f '[]
      (:::) :: f t -> TList f ts -> TList f (t ': ts)

    (I have taken the liberty of updating this to use DataKinds).

  • The opening paragraph mentions that typed lists can be heterogeneous or homogeneous - we see this variation is influenced by the choice of functor:

    • TList Identity models heterogeneous lists.
    • TList (Const a) is a representation of [a].
  • The section proceeds to demonstrate how one can write thead (totally), ttail, tlength, tmap and tzipWith. These are fairly clear concepts to me by now - we covered all of these in Andres’ Skills Matter course, but they are also prevalent in generics-sop and tilt.

  • One function that is somewhat novel to me is an extension of iterate to TList.

    • While iterate normally produces lists of infinite length, in our case the length is determined by the list of types to TList. Therefore, we need a way to indicate the desired amount of iterations. We do so by providing a “shape” argument to iterate, where the shape is given by values under.

      data RList :: [k] -> * where
        RNil :: RList '[]
        RCons :: RList ts -> RList (t ': ts)
    • It’s annoying to have to actually construct RLists, when the result already specifies what the shape is. To work around this, the authors define a type class to implicitly generate RLists on behalf of the user.

Typed ASGs and DSLs