Recently I gave a short intro course on functional programming languages to people at work, introducing some basic comments. At the end, I added a very very tiny compiler to show how easy it is to create a compiler in Haskell.

I thought it might be interesting for the people out there to see it as well. As mentioned, it is very minimalistic. Keeping with the trend of the previous post, I will ensure this blogpost is proper literal haskell code.

So first we create our module. We also import the Control.Monad for liftM and liftM2. Finally, we import QuickCheck so we can create some easy tests at the end.

> module Alg where

> import Control.Monad

> import Test.QuickCheck

Next, we define the domain of our compiler. Namely, our compiler will compile simple arithmetic expressions, that can be nested arbitrarily deep, to a stack machine. An expression consists of either a simple number, the addition of two expressions, the multiplication of two expressions, or the substraction of two expressions. We add some standard typeclasses that allow us to easily work with them in the GHC interpreter (for instance Show to show them).

> data Exp = Simple Integer

> | Mul Exp Exp

> | Add Exp Exp

> | Sub Exp Exp

> deriving (Eq, Ord, Show)

Without compiling, we can already write a mini interpreter that interprets an expression straight away. One option of making this more generic is abstracting away the specific binary operator instead of creating a specific data-constructor for each, but I will leave that as an excercise.

> interpretExp (Simple i) = i

> interpretExp (Mul a b) = interpretExp a * interpretExp b

> interpretExp (Add a b) = interpretExp a + interpretExp b

> interpretExp (Sub a b) = interpretExp a - interpretExp b

Next we define the codomain, or the target, of our compiler. For this I have opted for a very simple stack machine with only four instructions. Either one pushes a number, or one applies an operator to the top two numbers on the stack. As for the stack, it is simply a list of numbers.

> data Op = OpPush Integer | OpAdd | OpMul |OpSub

> deriving (Eq, Ord, Show)

> type Stk = [Integer]

We can also write an ‘interpreter’ for this stack-based language. First, we write the function that interprets a single stack operation with a given stack and returns a new stack. For clarity sake, I have not included error code for when the stack is empty but numbers are required.

> interpret :: Stk -> Op -> Stk

> interpret s (OpPush i) = i:s

> interpret (a:b:s) OpAdd = (a+b):s

> interpret (a:b:s) OpSub = (a-b):s

> interpret (a:b:s) OpMul = (a*b):s

To run a set operations, one can simply fold over the list of operations, starting with an empty stack:

> run :: [Op] -> Stk

> run = foldl interpret []

Next, we define the compiler function. This compiles algebraic expressions to a list of stack operations. Notice to do this, first we calculate the two sub expressions, and then compile the operation in question:

> compile :: Exp -> [Op]

> compile (Simple i) = [OpPush i]

> compile (Mul a b) = compile b ++ compile a ++ [OpMul]

> compile (Add a b) = compile b ++ compile a ++ [OpAdd]

> compile (Sub a b) = compile b ++ compile a ++ [OpSub]

The code is now done, and in fact, one can simply type ‘ghci Alg.lhs” and try it out. However, we will add a quickcheck instance so we can test the correctness of the compiler. Simply, we require that interpreting an expression yields the same result as the top of the stack after compiling and interpreting the stack operations. To enable this, we first need to define a quickcheck instance for the domain, namely algebraic expressions. The code is a bit more complicated as it makes sure that it never generates infinite expression trees, so I will not explain it in detail. I suggest for those interested to check The quickcheck manual, or the haskell documentation.

> instance Arbitrary Exp where

> arbitrary = sized tree'

> where tree' 0 = liftM Simple arbitrary

> tree' n | n > 0 =

> oneof[liftM Simple arbitrary,

> liftM2 Mul subtree subtree,

> liftM2 Add subtree subtree,

> liftM2 Sub subtree subtree]

> where subtree = tree' (n `div` 2)

> coarbitrary (Simple i) =

> variant 0 . coarbitrary i

> coarbitrary (Mul a b) =

> variant 1 . coarbitrary a . coarbitrary b

> coarbitrary (Add a b) =

> variant 2 . coarbitrary a . coarbitrary b

> coarbitrary (Sub a b) =

> variant 3 . coarbitrary a . coarbitrary b

Now that we have an implementation that generates arbitrary algebraic expressions, it’s time to write our test case. Namely we always require (True ==>), that the result of interpretation of the algebraic expression is the same as the top of the stack after compiling and interpreting the stack operations. We could add an additional requirement that the stack only has 1 element remaining in it.

> prop_compile tree = True ==> (head $ run $ compile tree) == interpretExp tree

Well, I hope that was useful

A friend of mine suggested I add some examples. Once you have saved the above in a file named Alg.lhs, load it up in the interpreter with ‘ghci Alg.lhs’. Then you try it out:

> interpretExp (Add (Mul (Simple 5) (Simple 4)) (Simple 3))

23

> compile (Add (Mul (Simple 5) (Simple 4)) (Simple 3))

[OpPush 3,OpPush 4,OpPush 5,OpMul,OpAdd]

> let x = [OpPush 3,OpPush 4,OpPush 5,OpMul,OpAdd] in run x

[23]

Cheers!

As final note, please feel free to leave comments or questions.