Rotating Header Image

Flatten Benchmark

In this blog article, we look at the benchmarks as proposed in Kyle’s Smith’s blog regarding the flattening of lists in Scheme. However, we perform the same experiment in Haskell. Obviously, due to typing reasons, this will not be done on lists but N-ary trees. It should be interesting to see how these algorithms perform in Haskell, especially due to the very different semantics caused by laziness.

In fact, a very common idiom in Scheme, or even SML or O’Caml, namely ensuring tail-call recursion, is not always a good idea in Haskell when working with data-structures. They are too strict in a sense, and as such they do not work well on infinite data-structures.

The first thing to do in Haskell, of course, is to define a module. We also define the data-type that will hold a list to be flattened. In this case, it is an N-ary tree where the options for the data-type is sumtype of a child of a single element, or a parent of a list of this sum-type. To keep it concise, I will give the data-constructors short names.

> module Flatten where
> import Control.Monad(liftM, liftM2)
> import CPUTime(getCPUTime)
> import Data.List(foldl')
> import System(getArgs)
> data Tree a = C a | P [Tree a]

As example, we show the encoding of a one of the ‘lists’ from Kyle’s blog. The notation is a bit verbose, but one could easily add a read-instance to fix this. Since this is not crucial, I will not do so at the moment (let’s just say, yay for VIM and some regexp-fu to turn the data into Haskelly data).

(define abc ‘(a (b (c (d (e (f (g (h (i (j (k (l (m n) o) p) q) r) s) t) u) v) w) x) y) z)) ; would become:

> abc = P [C 'a', P [C 'b', P [C 'c', P [C 'd', P [C 'e', P [C 'f', P [C 'g', 
> P [C 'h', P [C 'i', P [C 'j', P [C 'k', P [C 'l', P [C 'm', C 'n'], C 'o'], C 'p'], C 'q'], C 'r'], C 's'],
> C 't'], C 'u'], C 'v'], C 'w'], C 'x'], C 'y'], C 'z']
> flat = P $ map C $ [1..26]
> flat_abc = P $ map C $ ['a'..'z']
> tree = P [C 'a', P [C 'b', P [C 'd', P [C 'i', C 'q', C 'r'], P [C 'j', C 's', C 't']],
> P [C 'f', P [C 'k', C 'u', C 'v'], P [C 'l', C 'w', C 'x']]], P [C 'c', P [C 'g', P [C 'm', C 'y', C 'z'],
> P [C 'n', C 'a', C 'b']], P [C 'h', P [C 'o', C 'c', C 'd' ], P [C 'p', C 'e', C 'f']]]]
> treec = P [C 'a', P [C 'b', P [C 'd', P [C 'i', C 'q', C 'r'], P [C 'j', C 's', C 't']], P [C 'f', P [C 'k', C 'u', C 'v'],
> P [C 'l', C 'w', C 'x']]], P [C 'c', P [C 'g', P [C 'm', C 'y', C 'z'], P [C 'n', C 'a', C 'b']], P [C 'h', P [C 'o', C 'c', C 'd'], P [C 'p', C 'e', C 'f']]]]

All the algorithms will by definition be slightly different as we have one more case to check due to the design of our data-type. First of all, instead of checking on whether the input is null, we will instead check when we have a child element. And secondly, because each level of the tree is a pure list, we can use standard list operations like map and fold.

Looking through the algorithms, a lot use side-effects or reversing at the end. Obviously, side-effects are not a standard idiom in Haskell. Reversing is not a good idiom for Haskell programming, because reversing tends to destroy laziness. As such, the version that is most straightforward to code in Haskell is the one that uses ‘cons’ with an accumulator. The version in scheme required reversing at teh end. We do not need to reverse the list at the end because we use ‘foldr’ while the algorithms in scheme used a fold-left mechanism. The reason why we want foldr is because it respects laziness, while the reason why schemers and mlers want foldl is because it is tail-call recursive.

I will add type-declarations to the function but these are purely for documentation purposes and are not actually required for Haskell. The scheme version uses a mutation on the result variable. Instead of doing this, we will take a more Haskelly approach (I was going to say pure, in the purely functional, but that could be misconstrued to mean pure as in purity of algorithm) and thread this result through the calls.

> flatten :: Tree a -> [a]
> flatten t = f t [] -- Call helper with the tree and an empty list
> -- The case we have list of nodes
> where f (P l) r = foldr f r l
> -- The case where we have a single element
> f (C a) r = a:r

To test we simply run ‘ghci Flatten.lhs’ and then when we run ‘flatten abc’ we get “abcdefghijklmnopqrstuvwxyz. We also see that this definition respects lazyness in the horizontal (but therefore not perse the vertical) direction. Namely, we construct an infinite flat tree of elements and see whether it succeeds in flattening. We will also construct an infinite vertical tree (obviously with some elements at eachc level, or it will take an infinite amount of time to reach the first element.

> infinite_flat_tree = P $ map C [0..]
> infinite_vert_tree = build [0..]
> where build (x:xs) = P [C x, build xs]

Indeed, if we load Flatten.lhs in ghci and run flatten, everything works out fine (make sure to do something like: ‘take 10 $ flatten infinite_flat_tree’, or your terminal will spam a lot of numbers). Similarly, it also respects the infinitely vertical tree (which the reader should note, has been built up in a way to ensure the creation of the tree itself is also lazy).

Originally I had intended to try out all the different algorithms that were written in Kyle’s blog, but it seems most of the trade-off is between using set! as well as reversing and other techniques to circumvent this. However, when using foldr and lazy structures, as long as you make sure to respect laziness, this is not required and there’s only one way of coding this function, which is probably the way anyone would write it in Haskell. If any of the readers have feedback on other versions that might be applicable for Haskell, I am definitely interested in seeing their version.

Therefore, to give this article any meaningful content, we will add some benchmarks. My computer is not so powerful, it is a simple Dell Latitude D610 laptop with 1GB of memory and a Centrino of 1.6Ghz.

First we need to generate some data-sets besides the one above. The first one was far too small and infinite data-sets will have obv
ious results if computed completely. To ensure the calculation actually occurs, I will get the last element of the computed list to force the entire list to be generated. To measure how complicated an N-ary tree is, Kyle introduces the concept of complexity. I will translate this to Haskell (If anyone spots algorithmic differences, feel free to point them out).

> complexity :: Tree a -> Integer
> complexity (C x) = 1
> complexity (P l) = foldl' (\c e -> 1 + c + complexity e) 0 l
> buildTestTree :: [Tree a] -> Int -> Tree a
> buildTestTree ts n = P $ take n $ cycle ts -- Cycle through the different
> -- trees, take only 'n' in total
> -- and create a parent node
> -- Thanks apfelmus from #haskell (P.s: getCPUTime returns picoseconds)
> time m = liftM2 subtract getCPUTime (m >> getCPUTime)

Due to laziness, it is important that we force the creation of the input tree before actually timing the flattening of the tree. But since we need to know the complexity of the tree before we try to flatten it, we can let the complexity function force the creation of the tree. Therefore, to get a test,case, we simply get the following code:

> runTest :: (Show a) => Tree a -> IO ()
> runTest x = do
> putStr "Complexity: "
> t1 <- time $ print $ complexity $ x -- Force the creation of the tree
> putStr "Last element: "
> t2 <- time $ print $ last $ flatten x -- Time the flatten function
> putStrLn $ "Times: " ++ show t1 ++ ", " ++ show t2

Finally, we can tie it all together with a little main that will allow us to run this program to test different complexities. While the scheme version allowed the user to choose which subtrees to use to build up the big tree, I will not give this option as in the end, the only metric used was complexity. I would like to briefly point out, however, that it might be interesting to test deeper trees, as the trees seem to have been limited to the depth of trees defined at the beginning.

> main :: IO ()
> main = do
> n <- liftM (read . head) getArgs
> let t = buildTestTree [abc, flat_abc, treec] n
> sequence_ $ replicate 10 $ runTest t -- Run timing test 10 times

The results can be seen below in the figure. I hope this was useful.

Be Sociable, Share!


  1. Neil Mitchell says:

    GHC -O3 is worse than GHC -O0 in most cases. Use GHC -O2 and rerun the benchmarks :)

  2. Antoine says:

    A lot of your code appears truncated, perhaps by your blog layout (in Firefox).

  3. Christophe (vincenz) says:

    @neil: I will try to do it soon, (it was a lot of manual hackery and a lot of waiting for the benchmarks, so I’ll have to wait until I have a window of time to do it).

    @antoine: Indeed I use firefox and just noticed I have the same problem. I’ve switched back to my old template which should at least give you more screen-estate, hope that helps.

  4. Kyle Smith says:


    I getting the following error:

    kylesmith:~/ghc$ ghc –make flatten-benchmark.lhs -o flatten-benchmark
    [1 of 1] Compiling Flatten ( flatten-benchmark.lhs, flatten-benchmark.o )

    The last statement in a ‘do’ construct must be an expression

    In my source line 83 is the last line of this:
    > runTest :: (Show a) => Tree a -> IO ()
    > runTest x = do
    > putStr “Complexity: “
    > t1 < - time $ print $ complexity $ x -- Force the creation of the tree
    > t2 < - time $ print $ last $ flatten x -- Time the flatten function

    Does any thing look out of order to you?



  5. Christophe (vincenz) says:

    Hello Kyle,

    Indeed, you must remember to use proper indentation. Namely, the statements in the do-block should be indented more than the do itself, so the lexer can transform it to do {stm1; stm2; stm3}.

    I doublechecked the article and it seems that the indentation is correct, perhaps an issue while copy-pasting the text?

    Additionally, I forgot to mention that you should use as well -main-is Flatten (Yes, that’s a small quirk of ghc, that command only starts with one -). This tells the compiler that it should add a main to the runtime that calls the ‘main’ function in that module. (If you already compiled the code and have a .o file you’ll have to remove the .o file or it’ll complain about the lack of this low-level main function).

    Hope that helps! If you have more questions/comments, don’t hesitate to ask.

  6. Peter Berry says:

    Here is a more general fold with flattening as a special case. No idea if this is faster, but you asked for alternative implementations.

    > import Data.Foldable
    > import Data.Monoid
    > instance Foldable Tree where
    > foldMap f (C a) = f a
    > foldMap _ (P []) = mempty
    > foldMap f (P (x:xs)) = foldMap f x `mappend` foldMap f (P xs)

    Then we have:

    > flatten :: Foldable f => f a -> [a]
    > flatten = foldMap (:[])

    or even, given a suitable instance MonadPlus m => Monoid (m a):

    > mflatten :: (Foldable f, MonadPlus m) => f a -> m a
    > mflatten = foldMap return

    (flatten is just a special case of mflatten, btw.)

  7. Christophe (vincenz) says:


    Thank you for this other proposition. I think that your case is a more cleanly designed generalisation of the rather trivial algorithm that I placed in the blog. I admit I wasn’t aware of Data.Foldable, but indeed your implementation is vastly more useful and generic.

    Of course other type-classes can be defined as well, for instance the Functor one.

  8. Kyle Smith says:

    Sorry about the long delay, but I kept expecting to get an email reply from you, and it appears it somehow got directed to my seldom used gmail account, which coincidentally I was having trouble logging into, so my home page never showed any email on that account.

    Well, anyway, the codes indentation was mangled in the `Leave your comment’ box, because you can’t specify <pre> tags in comments on blogger. So it wasn’t the indentation, it turned out to be my misinterpreting how .lhs files work. You have a couple of lines that have a `–’ after some code. So, me being new to Haskell assumed that everything after that was a comment. But, in fact, that is precisely where you placed additional code; e.g., `t1 < - time $ ... $ x -- ... > putStrLn …’. I naturally assumed everything after the — was a comment, regardless of what looked like code, since thats the way it works with every other language I’ve ever used. I’ve looked at your blog with IE7, Safari, and FireFox, and they all show the same thing. You might need a couple of <br />’s in a few places to get the code setup properly to be copied directly off the screen. Of course, I’m sure any Haskell programmer would have spotted the formatting issue straight away, and simply added a LF where appropriate, but being new I took everything literally.

    Well anyway, I ended up striping out all your commentary and made a plain .hs file out of it and it compiled, after I removed some tabs which had crept in, because `expandtab’ was off in vim. I hate tabs, and evidently GHC has the same opinion. Once I knew what the real issue was, I simply corrected the original .lhs, so that I have your comments preserved.

    So I ran it, but since its just showing the one function I don’t know how to interpret it in any meaningful way, other than to say it looks to be fairly similar, in that, with just the overview graph, the small complexities appear to have nearly constant execution time, and then they reach a critical point (2.5E^8 as best I can make out), at which point the execution time begins to rise rather sharply, but looking linear non-the-less. I’ve come to believe that the critical point is when the virtual memory manager starts thrashing.

    Clearly Haskell has a more efficient representation of homogenous lists, than the heterogeneous lists of Scheme, as your able to reach complexities of something like 1.5xE^9, if I’m reading your graph correctly. And you were running with a mere 1GB of installed memory as I recall. That’s quite impressive, given that I was only able to get out to 3.86E^8 on a 4GB Mac Pro using `MzScheme’ or `Petite Chez Scheme’.

    I think their is some discrepancy in how we interpret n, your argument to Flatten. My interpretation of n is the number of list-segments cons’d together, each list segment being composed of the three (3) sub-lists `abc’, `flat-abc’ and `tree’.

    I checked your function for calculating complexity against a single instance of the `abc’ list, and it correctly computes 65. However, with my interpretation of n, a value of n equal to 1E^6 should give you a complexity of 1.93E^8. Where as, you calculate a complexity of only 6.5E^7.

    NOTE: Commas added by me for clarity.
    kylesmith:~/ghc$ flatten-benchmark 1,000,000
    Complexity: 65,000,000

    Where as with n = 3.0E^6:

    kylesmith:~/ghc$ flatten-benchmark 3,000,000
    Complexity: 195,000,000

    Which is generally in the ballpark of of what I calculate with an n one third (1/3) that size.

    I’m guessing your using n to simply grab that many elements from the Tree you build. That would mean that your complexities are over reported (using my interpretation of n) by a factor of three (3). Roughly speaking, that would place your highest reported complexity of 1.5E^9 at 5.0E^8. This is still an astounding result for a computer with one-fourth the installed memory. Very Impressive indeed!

    So, I tried to roughly find the max that I could produce with a 4GB box. First I tried an n equal to 2E^7.

    kylesmith:~/ghc$ flatten-benchmark 20,000,000
    Complexity: 1,299,999,988 = 1.29×10^9
    flatten-benchmark: interrupted

    That got me to roughly your max reported complexity. Then I went for n=5E^8.

    kylesmith:~/ghc$ flatten-benchmark 50,000,000
    Complexity: -1,044,967,308
    flatten-benchmark: interrupted

    Unfortunately, the number overflowed on me. I’m guessing GHC is using a single 32 bit word for it’s default real number precision, which would overflow somewhere around 4.2E^9. Is there a magical incantation you must do to get double precision real values?

    I guess with homogenous lists it would seem natural that one would be able to construct a more efficient ADT in memory for them. This result really shows off one of the strong suites of working in Haskell, as compared to Scheme. Well done article. I appreciate how much work goes into making even a single graph, so I thank you for taking the time to compare and contrast Haskell and Scheme.

    When you get exciting results, benchmarking is fun, but the reality of the job, is that its a lot of book keeping and waiting on runs to finish, and when you have your results you still have to make nice looking graphs, or nobody will understand the rows of numbers.

    Take care,


  9. Christophe Poucet says:

    Hello Kyle,

    Indeed you are correct, it seems I have a bug in my blog :) Thank you for the interest in the article and for evaluating the complexity function.

    Regarding the reason you are getting negative numbers, it is because I used Int and not Integer. The former is 32 bit, the later is any-sized numbers. Thanks for picking up that other bug :)

    I’ll fix the blog asap.

Leave a Reply

Your email address will not be published. Required fields are marked *


You may use these HTML tags and attributes: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <strike> <strong>