blog.poucet.org Rotating Header Image

Simplistic compiler in Haskell

Recently I gave a short intro course on functional programming languages to people at work, introducing some basic comments. At the end, I added a very very tiny compiler to show how easy it is to create a compiler in Haskell.

I thought it might be interesting for the people out there to see it as well. As mentioned, it is very minimalistic. Keeping with the trend of the previous post, I will ensure this blogpost is proper literal haskell code.


So first we create our module. We also import the Control.Monad for liftM and liftM2. Finally, we import QuickCheck so we can create some easy tests at the end.


> module Alg where
> import Control.Monad
> import Test.QuickCheck

Next, we define the domain of our compiler. Namely, our compiler will compile simple arithmetic expressions, that can be nested arbitrarily deep, to a stack machine. An expression consists of either a simple number, the addition of two expressions, the multiplication of two expressions, or the substraction of two expressions. We add some standard typeclasses that allow us to easily work with them in the GHC interpreter (for instance Show to show them).


> data Exp = Simple Integer
> | Mul Exp Exp
> | Add Exp Exp
> | Sub Exp Exp
> deriving (Eq, Ord, Show)

Without compiling, we can already write a mini interpreter that interprets an expression straight away. One option of making this more generic is abstracting away the specific binary operator instead of creating a specific data-constructor for each, but I will leave that as an excercise.


> interpretExp (Simple i) = i
> interpretExp (Mul a b) = interpretExp a * interpretExp b
> interpretExp (Add a b) = interpretExp a + interpretExp b
> interpretExp (Sub a b) = interpretExp a - interpretExp b

Next we define the codomain, or the target, of our compiler. For this I have opted for a very simple stack machine with only four instructions. Either one pushes a number, or one applies an operator to the top two numbers on the stack. As for the stack, it is simply a list of numbers.


> data Op = OpPush Integer | OpAdd | OpMul |OpSub
> deriving (Eq, Ord, Show)
> type Stk = [Integer]

We can also write an ‘interpreter’ for this stack-based language. First, we write the function that interprets a single stack operation with a given stack and returns a new stack. For clarity sake, I have not included error code for when the stack is empty but numbers are required.


> interpret :: Stk -> Op -> Stk
> interpret s (OpPush i) = i:s
> interpret (a:b:s) OpAdd = (a+b):s
> interpret (a:b:s) OpSub = (a-b):s
> interpret (a:b:s) OpMul = (a*b):s

To run a set operations, one can simply fold over the list of operations, starting with an empty stack:


> run :: [Op] -> Stk
> run = foldl interpret []

Next, we define the compiler function. This compiles algebraic expressions to a list of stack operations. Notice to do this, first we calculate the two sub expressions, and then compile the operation in question:


> compile :: Exp -> [Op]
> compile (Simple i) = [OpPush i]
> compile (Mul a b) = compile b ++ compile a ++ [OpMul]
> compile (Add a b) = compile b ++ compile a ++ [OpAdd]
> compile (Sub a b) = compile b ++ compile a ++ [OpSub]

The code is now done, and in fact, one can simply type ‘ghci Alg.lhs” and try it out. However, we will add a quickcheck instance so we can test the correctness of the compiler. Simply, we require that interpreting an expression yields the same result as the top of the stack after compiling and interpreting the stack operations. To enable this, we first need to define a quickcheck instance for the domain, namely algebraic expressions. The code is a bit more complicated as it makes sure that it never generates infinite expression trees, so I will not explain it in detail. I suggest for those interested to check The quickcheck manual, or the haskell documentation.


> instance Arbitrary Exp where
> arbitrary = sized tree'
> where tree' 0 = liftM Simple arbitrary
> tree' n | n > 0 =
> oneof[liftM Simple arbitrary,
> liftM2 Mul subtree subtree,
> liftM2 Add subtree subtree,
> liftM2 Sub subtree subtree]
> where subtree = tree' (n `div` 2)
> coarbitrary (Simple i) =
> variant 0 . coarbitrary i
> coarbitrary (Mul a b) =
> variant 1 . coarbitrary a . coarbitrary b
> coarbitrary (Add a b) =
> variant 2 . coarbitrary a . coarbitrary b
> coarbitrary (Sub a b) =
> variant 3 . coarbitrary a . coarbitrary b

Now that we have an implementation that generates arbitrary algebraic expressions, it’s time to write our test case. Namely we always require (True ==>), that the result of interpretation of the algebraic expression is the same as the top of the stack after compiling and interpreting the stack operations. We could add an additional requirement that the stack only has 1 element remaining in it.


> prop_compile tree = True ==> (head $ run $ compile tree) == interpretExp tree

Well, I hope that was useful :)

A friend of mine suggested I add some examples. Once you have saved the above in a file named Alg.lhs, load it up in the interpreter with ‘ghci Alg.lhs’. Then you try it out:


> interpretExp (Add (Mul (Simple 5) (Simple 4)) (Simple 3))
23
> compile (Add (Mul (Simple 5) (Simple 4)) (Simple 3))
[OpPush 3,OpPush 4,OpPush 5,OpMul,OpAdd]
> let x = [OpPush 3,OpPush 4,OpPush 5,OpMul,OpAdd] in run x
[23]

Cheers!


As final note, please feel free to leave comments or questions.

Be Sociable, Share!

4 Comments

  1. Conal says:

    Cool!

    For your QuickCheck property, I think you’d also want to know that the stack has only one element. Also, what’s the “True ==>” for? How about this replacement:

    prop_compile tree = run $ compile tree == [interpretExp tree]

  2. Christophe (vincenz) says:

    Hello,

    Thanks for your suggestion regarding the (run $) bit. Not being that familiar with quickcheck yet, I presupposed it was required to start them with a condition and therefore “True” was such a simple condition.

    Regarding the stack, indeed it should only have one element, though at the time I was more concerned with the right value, not so much the right stack behaviour.

  3. Kyle Smith says:

    Sweet code.

    This is Kyle of `Scheme from the Florida Keys’. I just finished pulling off your compiler in Haskell, and it worked like a charm; but only after I realized that .lhs files are written as a form of literate programming. I’m just starting my adventures with Haskell, and the fact that it has this facility makes it a winner in my book already. The only part I have any issue with is that white space matters in Haskell, and the last time I worked in a language like that was RPG on an IBM 360. Clearly, there is no comparison; I used punched cards to write RPG programs, which made it simpler to have everything in the correct column, but RPG makes Basic seem like a sophisticated language. Except for the twenty some odd years I spent as an independent consultant, where I wrote in C/C++, I’ve been spending all my time hacking up Scheme code. Which means I’m not used to worrying about types. I’ll have to switch gears to work with Haskell’s strict typing.

    I stumbled on to your site after noticing we share an almost identical name for an article on delimited continuations, and when I got here, I noticed that you had a link to my blog. So I figured I would return the favor, after I read several of your articles. Very well done.

    So, I noticed that you use vim. Do you have any tips on how to optimize vim for coding Haskell; i.e., an auto-indent mode that will make my life easier, by keeping everything aligned for me? Whenever I find myself on a *nix box I just naturally use vi/vim, since I learned it back in the early 90′s while in grad school. Everyone else was religious about Emacs, and it is an excellent program, but vi is somehow hardwired into my fingers, and I really dislike chorded key strokes.

    Keep the Haskell articles coming, they make for excellent reading, coming from someone just trying to get the hang of the language.

    Well, I just thought I’d be neighborly, and leave a comment. Always nice to see other people who appreciate the fine art of coding up a program. I only wish Scheme had such a convenient method of incorporating pros into executable code. I may become a Haskell convert.

    Take care,

    –kyle

  4. Christophe (vincenz) says:

    Hello Kyle,

    Indeed, I think I had your article on delimited continuations in the back of my mind while I wrote mine. I didn’t realize that it was an identical title :)

    For Haskell, you do not need to use whitespace if you do not want to. They implicitly desugar to ‘{}’ and ‘;’; for marking the scope and for separating parts in a scope, respectively. I’d show an example here, but I fear the comment-formatting will mess it up.

    I also came from the C++ world, went through various ‘scripting languages’ (Python, Ruby), then onto O’Caml before finally making the switch to Haskell. I feared that Haskell would be slow (lazyness and all that), which is what led me to stick to O’Caml for so long, but in the end I could not resist out on all the great features (proper first class dataconstructors, type classes, etc…) and my fear of performance was ill-placed.

    As for indentation, I do not use any special scripts, I just tend to align on tabs (my tab-spacing = 2). Fortunately, most keywords (let/where) are 5 spaces, so I get the freespace at the end. Perhaps I could wrote a short article to show how I style my typical Haskell code. In general I do not require a lot of nesting either. I’ll have:
    foo x = …. [newline][indent]where bar y = ..[newline][alignwithbar]dum z = …

    As for monadic code, it’s just

    foo y = do[newline][indent]step1[newline][indent]s step2…

    I hope it is clear, for I can not use the ‘pre’ marker in this comment field. I might just show some code snippets in a following post.

    As for Scheme, I learned it in my first year of university (I was one of the lucky ones before they made the switch to Java). From a theoretical standpoint I very much admire the language, for it seems perfect to explore semantics in. However, I have not yet been able to code anything significant in it. I quickly get lost in parentheses and lack of types.

    Thanks for the comment :)

Leave a Reply

Your email address will not be published. Required fields are marked *

*

You may use these HTML tags and attributes: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <strike> <strong>