# Abstract Syntax Graphs for Domain Specific Languages

In 2014, Bruno Oliveira and Andres Loeh published their paper Abstract Syntax Graphs for Domain Specific Languages. In this paper, they look at a way to model embedded domain specific languages as a deep embedding, but crucially they leverage techniques from PHOAS in order to model recursive bindings. This addition enables us to perform observable sharing, and retain sharing when they manipulate the DSL.

## Introduction

*Embedded Domain Specific Languages*(*EDSLs*) are specific programming languages built to solve a particular problem, but are*embedded*into the host language. This allows us to get a lot of functionality “for free” by inheriting it from the host. In our case, the host is Haskell.EDSLs are often split into two extremes:

- Shallow embeddings
- A very thin layer over the host language
- Implement DSL constructs directly by their semantics
- Lose the ability to inspect the
*syntax*of the program

- Deep embeddings
- Represent programs by their
*abstract syntax tree*s. - One or several semantics can be given by defining
*interpretation functions*over the AST. - Transformations of the program are possible - just transform the AST.

- Represent programs by their

In reality, we often need sharing that is

*observable*and*preservable*, but we don’t get with ASTs.- We need to augment the AST with an explicit
*environment* - Now we have to keep track of names and bindings, and the problem is now more difficult.
- Need to deal with avoiding
*name capture in substitutions*or having dangling references.

- We need to augment the AST with an explicit
Rather than use abstract syntax

*trees*, this paper advocates the use of abstract syntax*graphs*.- Realized using Oliveira and Cook’s work on
*structured graphs*(see “Functional Programming with Structured Graphs”). - Enforce well-scopedness
- Regain sharing
- This paper extends work on
*structured graphs*by considering*typed*syntax graphs - this is new research, previously only researched*untyped*abstract syntax.

- Realized using Oliveira and Cook’s work on

## Sharing in EDSLs

This section uses a small example language to demonstrate the differences between shallow and deep embeddings as well as the issues with representing sharing.

- The language is really simple - expressions are either
`1`

or the addition of two expressions.

- Shallow embedding
Directly encode the semantics:

`type Expr = Int one = 1 plus = (+) eval = id`

- Simple to implement
We can use sharing from Haskell to get sharing in our EDSL:

`tree 0 = one tree n = let shared = tree (pred n) in shared `plus` shared``

Simplicity came at a cost of comitting to a

*single*semantics.

- Deep embedding
Directly encode the syntax

`data Expr = One | Plus One One one = One plus = Plus eval One = 1 eval (Plus l r) = eval l + eval r`

- Permits multiple interpretations - we can also pretty-print the same
`Expr`

. - No ability for sharing -
`eval`

(and friends) are now exponential, even though we’d like to have sharing as in`tree`

.

- Permits multiple interpretations - we can also pretty-print the same
- Sharing via HOAS
In order to recover sharing, we have to explicitly mention it as syntax. As an initial attempt, we try and use HOAS and add a new constructor:

`data Expr = ... | Let Expr (Expr -> Expr)`

Problems arise here when we try and write an evaluator. In the case where we evaluate

`Let`

we only have a way to evualate to- but we have to feed an`Int`

`Expr`

into the body of the let binding. Thus we need some way to reflect data from the host back into the EDSL.In this case, rather than battling with that we go straight to PHOAS. Now

`Expr`

takes a type parameter to indicate the type of variables, and has two new constructors - variable reference through`Var`

and variable introduction through`Let`

:`data Expr a = One | Add (Expr a) (Expr a) | Var a | Let (Expr a) (a -> Expr a )`

And we universally quantify over

`a`

to reach`ClosedExpr`

, as is customary in PHOAS`type ClosedExpr = forall a. Expr a`

Now, provided we use

`Let`

, we will have sharing in our evaluator.Due to the universal quantification of

`a`

, we can still write different evaluators that instantiate`a`

however they please.

## (Mutual) Recursion

In this section, we will extend the solution to sharing to recursive and mutually recursive bindings.

They begin by extending the language with

`IfZero`

,`Lam`

and`App`

.`Lam`

and`App`

are the standard encoding of lambda abstraction and application in PHOAS.Recursive functions can be encoded exactly like

`Lam`

, but we will need to interpret them differently. They introduce a new constructor,`Mu`

, and an interpretation:`data Expr a = ... | Mu (a -> Expr a) eval (Mu f) = fix (eval . f)`

`Mu`

is sufficient for expressing simple (general?) recursion, but doesn’t let us express*mutually recursive*bindings.We look at the example of defining

`even`

and`odd`

predicates, as mutually recursive predicates:`even x = if x == 0 then True else odd (pred x) odd x = if x == 0 then False else even (pred x)`

- Instead, we need the ability to introduce multiple names at once.
Add a new constructor -

`LetRec`

.`data Expr a = ... | LetRec ([a] -> [Expr a]) ([a] -> Expr a)`

There is an inmportant invariant here - all three lists must be

*the same length*. The author’s solve this problem in section 5.There is a brief discussion on the inconvenience of forcing EDSL users into a different form of

`let`

binding. It would be good if we could simply allow them to use Haskell’s`let`

construct - but as we saw, that give the wrong form of sharing.We can use

`data-reify`

to inspect the sharing`let`

builds in an impure-manner.`data-reify`

turns a structure using sharing into an explicit`Graph`

with numbered nodes. This is fairly messy, but we can transform this graph back into`ClosedExpr`

with an ordinary function.

The section closes by considering some situations where the work so far is not satisfactory:

If the target language has a type system, then we would like to work with a data type of

*well typed*terms.`LetRec`

requires the programmer maintain both the invariant of list lengths, but also be careful with laziness when defining bindings.

## Typed Lists

This section presents

typed lists. Typed lists are a generalization of both homogeneous and heterogeneous lists of statically known length.

Section options we a standard definition of heterogeneous lists, with elements under some functor:

`data TList :: (k -> *) -> [k] -> * TNil :: TList f '[] (:::) :: f t -> TList f ts -> TList f (t ': ts)`

(I have taken the liberty of updating this to use

`DataKinds`

).The opening paragraph mentions that typed lists can be heterogeneous

*or*homogeneous - we see this variation is influenced by the choice of functor:`TList Identity`

models heterogeneous lists.`TList (Const a)`

is a representation of`[a]`

.

The section proceeds to demonstrate how one can write

`thead`

(totally),`ttail`

,`tlength`

,`tmap`

and`tzipWith`

. These are fairly clear concepts to me by now - we covered all of these in Andres’ Skills Matter course, but they are also prevalent in`generics-sop`

and`tilt`

.One function that is somewhat novel to me is an extension of

`iterate`

to`TList`

.While

`iterate`

normally produces lists of infinite length, in our case the length is determined by the list of types to`TList`

. Therefore, we need a way to indicate the desired amount of iterations. We do so by providing a “shape” argument to iterate, where the shape is given by values under.`data RList :: [k] -> * where RNil :: RList '[] RCons :: RList ts -> RList (t ': ts)`

It’s annoying to have to actually construct

`RList`

s, when the result already specifies what the shape is. To work around this, the authors define a type class to implicitly generate`RList`

s on behalf of the user.