blog.poucet.org Rotating Header Image

Lazy memoization

It seems the concept of an “infinite array” in my previous blog post prompted some interest. Two very good suggestions came from Chung-chieh Shan. As usual, his code was minimal but powerful.

  • Use an infinite list of exponentially growing arrays.
  • Use the following data-structure: data Tree a = Tree a (Tree a) (Tree a)

I have to admit that I am not fully clear yet on how to encode it in the above tree.  Conceptually it should be trivial to encode, for instance, [0..] to such a binary structure, after all it’s just bits.  However, momentarily it escapes me.

Another solution came up through a discussion on #haskell with dcoutts and sjanssen(Spencer Janssen). At first the idea was to deal with lazy arrays, but quickly the discussion turned to the concept of a generic caching policy, brought up by dcoutts. One can imagine a combinator library of caching policies. And then one would memoize a function as follows:

memo :: CachingPolicy -> (i -> e) -> (i :->: e)

With this structure, one would generate a memoized function that uses the caching policy to evict old entries. Obviously this would require the use of unsafePerformIO under the hood to mutate the data-structure. It would definitely be an interesting paper, according to dcoutts, all that is required is an application as a strong showcase.

Potentially, a thread could even be used to do time-based eviction policies. However, here either a lock would have to be used, or the current STM system would have to be modified to allow for STM transactions to run in unsafePerformIO, something that is currently unsupported.

With the new concept of Type Families, it would even be possible to choose an appropriate data-structure based on the key-type, to guarantee optimal performance.

Be Sociable, Share!

5 Comments

  1. Jessie says:

    Chung-chieh Shan <—- I can’t believe it!!!! I know this guy!!!! I knew him since high school!

    I didn’t know him that well though, but it’s still surprising to discover the name of an old acquaintance!!

  2. For infinite arrays, check out Patricia trees (big- and little-endian) and their implementation by Daan Leijen in Data.IntMap. See also Chris Okasaki’s publications on “random-access lists”.

  3. Ryan Ingram says:

    for data Tree a = Node a (Tree a) (Tree a), the encoding is easy; and very obvious if you start from 1 instead of 0:

    1
    2 3
    4567

    in each case, a node’s two children are 2*n and 2*n+1. From that, you can trivially determine the path to a given node n; represent it in binary and remove the leading “1″ bit. Its depth is the length of the resulting string, and the binary string itself encodes the path directly: “0″ means to go left, “1″ to go right.

  4. cpoucet says:

    Hello Ryan,

    Indeed you are right. It’s also possible to store it hte other way, that way lookup is easier by doing even/odd and then shifting to the right. The key insight is basically that the bits are reversed when storing and when looking it up.

  5. cpoucet says:

    Conal,

    Thanks for the tip, I’ll definitely take a look at those.

Leave a Reply

Your email address will not be published. Required fields are marked *

*

You may use these HTML tags and attributes: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <strike> <strong>