blog.poucet.org Rotating Header Image

April, 2009:

Flattening Data.Map

While at the haskell hackathon, I decided to work on adaptive containers, which is an initiative that had been started by Don Stewart, to get Data.Map and Data.IntMap more memory compact. Together with Nicolas Pouillard(ertai) we tackled this project. He worked on IntMap while I worked on Map. You can find the code at: http://patch-tag.com/repo/adaptive-containers.

The main motivation for these adaptive containers it that it is impossible to flatten (or unpack) small data-types directly into the memory used by the data-constructor, so you end up with extra pointers for each value inside a container (specifically 2 pointers, 1 from the container to the data-type, and then one from the data-type to the unboxed one).

Update: A video was posted where Don Stewart presents the basic concepts and where I briefly present the associative adaptive containers mentioned in this article.

It is very easy to create flattened maps now, all you have to do is instantiate the following typeclass:

class AdaptMap k a where
  data Map k a

  tip :: Map k a
  unsafeBin :: Size -> k -> a -> Map k a -> Map k a -> Map k a
  mapMap :: c -> (Size -> k -> a -> Map k a -> Map k a -> c) -> Map k a -> c
  mapMap2 :: (AdaptMap k b)
          => c
          -> (Size -> k -> a -> Map k a -> Map k a -> c) -- Right is Tip
          -> (Size -> k -> b -> Map k b -> Map k b -> c) -- Left is Tip
          -> (Size -> k -> a -> Map k a -> Map k a ->
              Size -> k -> b -> Map k b -> Map k b -> c)
          -> Map k a
          -> Map k b
          -> c
  mapMap2 t b1 b2 c t1 t2 = mapMap tip1 bin1 t1
    where tip1                = mapMap t b2 t2
          bin1 s1 k1 i1 l1 r1 = mapMap tip2 bin2 t2
            where tip2                = b1 s1 k1 i1 l1 r1
                  bin2 s2 k2 i2 l2 r2 = c s1 k1 i1 l1 r1 s2 k2 i2 l2 r2

Notice that the case for two tree deconstruction is completely defined based on single tree deconstruction, so you only need to worry about tip, unsafeBin and mapMap.  Also note that this representation assumes you will have an implementation that is similar to the original one.

An example would be:

instance AdaptMap Int Int where
  data Map Int Int  = TipIntInt
                    | BinIntInt
                      {-# UNPACK #-} !Size
                      {-# UNPACK #-} !Int
                      {-# UNPACK #-} !Int
                      !(Map Int Int)
                      !(Map Int Int)

  tip                              = TipIntInt
  unsafeBin                        = BinIntInt
  mapMap t b TipIntInt             = t
  mapMap t b (BinIntInt s k i l r) = b s k i l r

After we got the basic code working, I thought it would be interesting to flatten nodes inside of Data.Adaptive.Map further, so I created a different instance for Int32 (to get as similar as possible to the original keys and values).  The definition is much hairier:

instance AdaptMap Int32 Int32 where
  data Map Int32 Int32  = TipInt32Int32
                        | BinInt32Int32
                          {-# UNPACK #-} !Size
                          {-# UNPACK #-} !Int32
                          {-# UNPACK #-} !Int32
                          !(Map Int32 Int32)
                          !(Map Int32 Int32)
                        | BinTipTipInt32Int32
                          {-# UNPACK #-} !Size
                          {-# UNPACK #-} !Int32
                          {-# UNPACK #-} !Int32
                        | BinBinTipInt32Int32
                          {-# UNPACK #-} !Size
                          {-# UNPACK #-} !Int32
                          {-# UNPACK #-} !Int32
                          {-# UNPACK #-} !Size
                          {-# UNPACK #-} !Int32
                          {-# UNPACK #-} !Int32
                          !(Map Int32 Int32)
                          !(Map Int32 Int32)
                        | BinTipBinInt32Int32
                          {-# UNPACK #-} !Size
                          {-# UNPACK #-} !Int32
                          {-# UNPACK #-} !Int32
                          {-# UNPACK #-} !Size
                          {-# UNPACK #-} !Int32
                          {-# UNPACK #-} !Int32
                          !(Map Int32 Int32)
                          !(Map Int32 Int32)
                        | BinBinBinInt32Int32
                          {-# UNPACK #-} !Size
                          {-# UNPACK #-} !Int32
                          {-# UNPACK #-} !Int32
                          {-# UNPACK #-} !Size
                          {-# UNPACK #-} !Int32
                          {-# UNPACK #-} !Int32
                          !(Map Int32 Int32)
                          !(Map Int32 Int32)
                          {-# UNPACK #-} !Size
                          {-# UNPACK #-} !Int32
                          {-# UNPACK #-} !Int32
                          !(Map Int32 Int32)
                          !(Map Int32 Int32)

  tip                              = TipInt32Int32
  unsafeBin s k v TipInt32Int32 TipInt32Int32 = BinTipTipInt32Int32 s k v
  unsafeBin s k v (BinInt32Int32 s1 k1 v1 l1 r1) TipInt32Int32
    = BinBinTipInt32Int32 s k v s1 k1 v1 l1 r1
  unsafeBin s k v TipInt32Int32 (BinInt32Int32 s2 k2 v2 l2 r2)
    = BinTipBinInt32Int32 s k v s2 k2 v2 l2 r2
  unsafeBin s k v (BinInt32Int32 s1 k1 v1 l1 r1) (BinInt32Int32 s2 k2 v2 l2 r2)
    = BinBinBinInt32Int32 s k v s1 k1 v1 l1 r1 s2 k2 v2 l2 r2
  unsafeBin s k v l r
    = BinInt32Int32 s k v l r

  mapMap t b TipInt32Int32                              = t
  mapMap t b (BinInt32Int32 s k v l r)                  = b s k v l r
  mapMap t b (BinTipTipInt32Int32 s k v)                =
    b s k v TipInt32Int32 TipInt32Int32
  mapMap t b (BinBinTipInt32Int32 s k v s1 k1 v1 l1 r1) =
    b s k v (BinInt32Int32 s1 k1 v1 l1 r1) TipInt32Int32
  mapMap t b (BinTipBinInt32Int32 s k v s2 k2 v2 l2 r2) =
    b s k v TipInt32Int32 (BinInt32Int32 s2 k2 v2 l2 r2)
  mapMap t b (BinBinBinInt32Int32 s k v s1 k1 v1 l1 r1 s2 k2 v2 l2 r2) =
    b s k v (BinInt32Int32 s1 k1 v1 l1 r1) (BinInt32Int32 s2 k2 v2 l2 r2)

The reason for this is that I’ve read in the past that flattening 3 tree nodes into one can give you very good cache locality.  While you are recreating nodes on deconstruction, these will be cache local anyways, so it should not be that expensive to access the data in these.

To see this in action (apologies about the long data-constructors), the following image is a representation of fromList $ zip [1..100] [1..100].  While the image is not fully clear, it is clear that the number of nodes used to represent this is much smaller.

100-nodes

In contrast, the simpler Int version of the same data-structure has many more nodes:

100-nodes2