Higher Order Zippers

In this blog article we explore the meaning of zippers as well as higher order zippers. The idea for this blog article came while discussing with fasta on Freenode today about zippers for graphs. Out of the blue, the question popped to me, what does it mean to have higher order zippers. Not having thought it through completely, I hope this blog will offer some clarity, and perhaps some feedback.

[EDIT: This is a new version as of 2007-07-11 that takes some comments from reddit and the comments here into account, cleaning up a bit and adding some more content]

For those new to the concept of Zippers, I shall briefly explain what they are and how one can reason with them. The idea of a zipper is to provide an iterator that one can move through an arbitrary functor.

A functor is basically a container of elements. For instance, [] in Haskell is a functor. If you apply to it to a specific type, you get a container-type (in this case a list) of that type. Other typical examples include Maybe, Binary trees and the N-ary tree introduced in Flatten Benchmark.

Due to the fact that Haskell does not have mutability, at first it can seem inefficient to mutate an element inside such a Functor or container. That is why Huet invented the concept of a Zipper. A Zipper allows one to walk through such a container such that one specific element has O(1) access, namely the element that is currently in the ‘hole’. Then, typically, there will be functions that allow one to move the element we’re currently looking at.

First Order Zippers

To illustrate, I will demonstrate the Zipper for a list first.

> {-# OPTIONS_GHC -fglasgow-exts #-}> module Zipper where> import Data.Maybe(fromJust)> > -- A zipper is the current element and a context that stores where this> --  element is.> data Zipper d a = Zipper a (d a)  deriving (Eq, Ord, Show)>> -- The context for a list is a combination of the elements before and the> --  elements after the current element.> data DList a = DList [a] [a]   >   deriving (Eq, Ord, Show)> type ZList a = Zipper DList a

The advantage of this zipper is that one can easily (with O(1) complexity) move the current hole we’re looking at back and forth. It also provides an O(1) complexity way of changing the current element. As last requirement is that it should be possible to reconstruct the entire data-type that has been ‘zipped’ from just the Zipper. This process is often referred to as ‘unzipping’ the data structure. The methods to perform these operations, as well as to create a zipper from a list are given here:

> enterList :: (Monad m) => [a] -> m (ZList a)> enterList []     = fail "Can not zip an empty list"> enterList (x:xs) = return $Zipper x (DList [] xs)>> unzipList :: ZList a -> [a]> unzipList (Zipper a (DList [] suf)) = a:suf> unzipList z = unzipList . fromJust . prev$ z>> prev :: (Monad m) => ZList a -> m (ZList a)> prev (Zipper a (DList [] tl))     = fail "Already at the first element"> prev (Zipper a (DList (x:xs) tl)) = return $Zipper x (DList xs (a:tl))>> next :: (Monad m) => ZList a -> m (ZList a)> next (Zipper a (DList hd [])) = fail "Already at the last element"> next (Zipper a (DList hd (x:xs))) = return$ Zipper x (DList (a:hd) xs)>> update :: (a -> a) -> Zipper d a -> Zipper d a> update f (Zipper a d) = Zipper (f a) d

In The derivative of a Regular Type is a Type of One-Hole Contests Conor McBride shows us how it is possible to determine the context for any data-type using algebra. Basically, the context for a data-type is it’s derivative. How can one derive a data-type, however? Well this is done by using the more algebraic notation of a data type. If one takes a list, then the actual data type is: ‘List a = Nil | Cons a (List a)’. In an algebraic notation, this would be: L = 1 + a·L. Namely, a list is a sumtype of a constant (1) and a tuple of an element ‘a’ and the list again. Well, this is not fully correct, a list is actually the least-fixed-type that satisfies this equation (namely: L = 1 + a·L μ L.1 + a·L). However I will gloss over this. The context of a list is then the derivative of this equation using standard derivation as taught in algebra/calculus:

a L = ∂a a · L + a·∂a L = 1·L + a·∂a L = L + a·∂a L

 => data DList a = [a] | DCons a (DList a)

By factoring out the [a] (because any DList will contain exactly one), one gets “[a]*(mu dL.1 + a*dL)” or “[a]*[a]“. In general, I believe that in an equation “(mu x.S + Tx)” one can factor out the S (as long as it doesn’t contain a reference to x of course) to get “(S*(mu x.1 + Tx))”. For exercise’s sake, we will derive the context for a binary tree:

> data Tree a = Leaf | Node a (Tree a) (Tree a)>   deriving (Eq, Ord, Show)> -- Tree a (or t for short)

a t = t2 + a·t·∂_ t = t2 × (μ x. 1 + 2·a·t·x) = t2·p where p = 1 + 2·a·t·p or p = 2·a·t

Looking at this from an intuitive point of view, this makes sense. The context of an element in a tree, are the left and right child-trees of the node this element is in, as well as the path that we took to get there by saving the trees that we skipped over along the way to the element.

> data DTree a = DTree {leftChild   :: (Tree a),>                       rightChild  :: (Tree a),>                       path        :: [SEither (Step a)]}>   deriving (Eq, Ord, Show)> type SEither a = Either a a> -- The path through a btree to the current node, saving those trees we did not enter.> data Step a = Step a (Tree a)>   deriving (Eq, Ord, Show)> type ZTree a = Zipper DTree a> > enterTree :: (Monad m) => Tree a -> m (ZTree a)> enterTree Leaf         = fail "Can not zip an empty btree"> enterTree (Node x l r) = return $Zipper x$ DTree l r []>> unzipTree :: (ZTree a) -> Tree a> unzipTree (Zipper x (DTree l r [])) = Node x l r> unzipTree dt                         = unzipTree . fromJust . up $dt>> up :: (Monad m) => ZTree a -> m (ZTree a)> up (Zipper x (DTree l r [])) = fail "Already at the top of a tree"> up (Zipper x (DTree l r (p:pp))) => case p of > Right (Step px pr) -> return$ Zipper px (DTree (Node x l r) pr pp)>     Left  (Step px pl) -> return $Zipper px (DTree pl (Node x l r) pp)>> left :: (Monad m) => ZTree a -> m (ZTree a)> left (Zipper x (DTree Leaf pr pp)) = fail "Can not go further left"> left (Zipper x (DTree (Node v l r) pr pp)) = return$ Zipper v $DTree l r$ ((Right $Step x pr):pp)> right :: (Monad m) => ZTree a -> m (ZTree a)> right (Zipper x (DTree pl Leaf pp)) = fail "Can not go further right"> right (Zipper x (DTree pl (Node v l r) pp)) = return$ Zipper v $DTree l r$ ((Left $Step x pl):pp) I will leave the implementation for an N-ary tree as an exercise for the reader. Second Order Zippers Now nothing introduced so far is really new. The question that arose to me is: what would a higher-order zipper be? What does it mean to have a double derivative of a data-type. I will first show what the structure for the double derivative of a list looks like by working it out and then give an intuition of what the meaning of this data-type might mean. To make sure that this intuition makes any sense, we will then work out the double derivative for a binary tree, to ascertain that the intuition we had for the list is valid for other data-types. So first we define the generic meaning of a second-order zipper: > type Zipper2 d2 a = Zipper (Zipper d2) a>> update2 :: (a -> a) -> Zipper2 d2 a -> Zipper2 d2 a> update2 f (Zipper x (Zipper y d)) = Zipper x (Zipper (f y) d) A second-order zipper is simply the zipper of the zipper of a datatype parametrized with the second-order derivative of that data-type and a the element-type. So we can already see that a second-order zipper will be containing two ‘a’s as well as this second-order derivative. So what is the second-order derivative of a list? Well we had: “[a]‘ = [a]_h*[a]_t”. Applying simple derivation in the algebraic sense we get: “[a]” = [a]_h*[a]_t’ + [a]_h’*[a]_t”. Second Order Zipper for a List If we remember the meaning we gave to the two lists in the case of “[a]‘”, then we had the head part and the tail part of the list around the current element we’re looking at. Since we’re now either deriving one or the other, my intuition tells me that we’ll have either the prefix and suffix of the head part or the prefix and suffix of the tail part. If we take the definition of “[a]‘” and expand it in “[a]”” we get: “[a]*[a]*[a] + [a]*[a]*[a]“. Putting this in a Haskell data-type with names based on my intuition, we get: > -- DDList a = Prev (reverse of prefix of prefix) > -- (suffix of prefix) > -- suffix > -- | Next (rever se prefix)> -- (rever se of prefix of suffix) > -- (suffix of suffix)> data DDList a = Prev [a] [a] [a] | Next [a] [a] [a]> deriving (Eq, Ord, Show)> type ZZList a = Zipper2 DDList a>> openLeft :: (Monad m) => ZList a -> m (ZZList a)> openLeft (Zipper r (DList [] tl)) = fail "Can not enter an empty list"> openLeft (Zipper r (DList (l:hd) tl)) = return$ Zipper l (Zipper r (Prev hd [] tl))>> openRight :: (Monad m) => ZList a -> m (ZZList a)> openRight (Zipper l (DList hd []))     = fail "Can not enter an empty list"> openRight (Zipper l (DList hd (r:tl))) = return $Zipper r (Zipper l (Next hd [] tl))>> unzipZList :: ZZList a -> ZList a> unzipZList (Zipper l (Zipper r (Prev a [] c))) = Zipper r (DList (l:a) c)> unzipZList z@(Zipper l (Zipper r (Prev a b c))) = unzipZList . fromJust . shrink$ z> unzipZList (Zipper r (Zipper l (Next a [] c)))  = Zipper l (DList a (r:c))> unzipZList z@(Zipper r (Zipper l (Next a b c))) = unzipZList . fromJust . shrink $z>> enlarge :: (Monad m) => ZZList a -> m (ZZList a)> enlarge (Zipper l (Zipper r (Prev [] bs cs))) = fail "Can not move further to the left"> enlarge (Zipper l (Zipper r (Prev (a:as) bs cs))) = return$ Zipper a (Zipper r (Prev as (l:bs) cs))> enlarge (Zipper r (Zipper l (Next as bs [])))      = fail "Can not move further to the right"> enlarge (Zipper r (Zipper l (Next as bs (c:cs))))  = return $Zipper c (Zipper l (Next as (r:bs) cs))>> shrink :: (Monad m) => ZZList a -> m (ZZList a)> shrink (Zipper l (Zipper r (Prev as [] cs))) = fail "Can not further shrink"> shrink (Zipper l (Zipper r (Prev as (b:bs) cs))) = return$ Zipper b (Zipper r (Prev (l:as) bs cs))> shrink (Zipper r (Zipper l (Next as [] cs)))        = fail "Can not further shrink"> shrink (Zipper r (Zipper l (Next as (b:bs) cs)))    = return $Zipper b (Zipper l (Next as bs (r:cs)))>> switch :: ZZList a -> ZZList a> switch (Zipper l (Zipper r (Prev as bs cs))) = Zipper r (Zipper l (Next as (reverse bs) cs))> switch (Zipper r (Zipper l (Next as bs cs))) = Zipper l (Zipper r (Prev as (reverse bs) cs))>> splice :: ZZList a -> [a] -> ZZList a> splice (Zipper l (Zipper r (Prev as _ cs))) bs = Zipper l (Zipper r (Prev as bs cs))> splice (Z ipper r (Zipper l (Next as _ cs))) bs = Zipper r (Zipper l (Next as bs cs))>> -- A more generic function (splice can be implemented in terms of this)> modify :: ([a] -> [a]) -> ZZList a -> ZZList a> modify f (Zipper l (Zipper r (Prev as bs cs))) = Zipper l (Zipper r (Prev as (f bs) cs))> modify f (Zipper r (Zipper l (Next as bs cs))) = Zipper r (Zipper l (Next as (f bs) cs))> So what my intuitition tells me is that where a Zipper gives O(1) mutation of an element, a Zipper2 gives O(1) splicing. After all, we can directly access the part that is between the two singular elements stored in the Zipper2. Of course, to ascertain that this statement is valid for more than just lists, we should ascertain this is true for other data-types. In this case, we shall do the experiment for binary trees. Again, the exercise for N-ary trees is left as an exercise for the reader (feel free to put in a comment if you want to). Deriving the rules for a tree Note: I redid the math with a cleaner technique, and it seems I had made some mistakes in the original math, which is why my data-type was not as intuitive as I had hoped it to be. The correct version makes much more sense Namely: I was missing a “p” in the last term. <!–$t = 1 + a * t^2 \part_a t = t^2 + a*t*\part_ t = t^2 \x (\mu x. 1 + 2*a*t*x) = t^2*p$where$p = 1 + 2*a*t*p$or$p = [2*a*t]$So we derive:$\part_a^2 t = 2 * t * \part_a t * p + t^2 * \part_a = D + R$where$D = 2 * t * \part_a t * p = 2 * t * (t^2) * p * p$and$R = t^2 * \part_a p = t^2 * \part_x[x] |_{x=2*a*t} * \part_a(2*a*t) =…… = t^2 * [2*a*t]*[2*a*t]*\part_a(2*a*t) = t^2 * p^2 * (2*t + 2*a*t’) = t^2 * p^2 *(2*t +2*a*t^2*p) = …… = 2*t*t^2*p^2 + 2*a*t^2*t^2*p*p^2 = U + UD$So finally we get (where we label the different parts to refer to them):$\part_a^2 t = D + U + UD$where$D = 2 * t_1 * (t^2)_2 * p_3 * p_4U = 2 * t_1 * (t^2)_2 * p_3 * p_4UD = 2 * a_1 * (t^2)_2L * (t^2)_2R * p_3 * (p^2)_4\$
–>

Returning to the definitions, we had:
t = 1 + a · t2

a t = t2 × (μ x. 1 + 2·a·t·x) = t2·p
where p = 1 + 2·a·t·p or p = 2·a·t

So we derive:
2a t = 2 · t · ∂a t · p + t2 · ∂a = D + R
where D = 2 · t · ∂a t · p = 2 · t · (t2) · p · p
and R = t2 · ∂a p = t2 · ∂x[x] |‎x=2·a·t‏ · ∂a(2·a·t) =…
… = t2 · 2·a·t·2·a·t·∂a(2·a·t) = t2 · p2 · (2·t + 2·a·t’) = t2 · p2 ·(2·t +2·a·t2·p) = …
… = 2·t·t2·p2 + 2·a·t2·t2·p·p2 = U + UD

So finally we get (where we label the different parts to refer to them):
2a t = D + U + UD where
D = 2 · t1 · (t2)2 · p3 · p4
U = 2 · t1 · (t2)2 · p3 · p4
UD = 2 · a1 · (t2)2L · (t2)2R · p3 · (p2)4

The first term, D, is basically a derivative of one or the left or right sub-trees of the tree we are currently looking at it with our first-order zipper. More specifically, if we go down the left-subtree, we save the right subtree in t1. Then the second finger will be pointing at a tree further down, so we need to save it’s children, namely: (t2)2. And obviously we have the two paths, one from the root of the tree to the first finger, p3, and one from this finger to the second finger, p4.

The second term, U, is analogous, except that the first finger will now be at the bottom and the second finger more at the top. This means that the path that goes from the first finger to the second finger, p4, will be reversed. Additionally (at least I think this should be so for efficiency reasons), the path from the root, p3, will only extend down to the second finger. Moving the second-finger around is then O(1) as we can just pop a path-element from one of the paths and put it on the other path or vice-versa.

The final term, UD, should then obviously be a path that walks up along the tree and then down from another node. In this case, we still have the path from the root of the tree, p3, but instead of going down to the first finger or second finger, it will go down to the join-point of these two fingers (namely the place where the path stops going up and starts going down again. Because neith
er subtree of this node is being stored in this path (as we’re deriving both sides), we need to save the element at this point, namely in a1. The two paths, (p2)4, then simply consist of the path up from the first finger and then down to the second finger. We have two options for this one, as we might be going up from the left side of the join point or the right side of the join point. Finally, the two pairs of trees, (t2)2L and (t2)2R obviously are the subtrees of the two nodes where our fingers are pointing at.

> data DDTree a = Down (Either (Mono a)) | Up (Either (Mono a)) | UpDown (Either (Duo a))>   deriving (Eq, Ord, Show)> data Mono a   = Mono {fromRoot    :: [Path a],>                       toSecond    :: [Path a],>                       otherChild  :: Tree a,>                       leftChild2  :: Tree a,>                       rightChild2 :: Tree a,>                       toBottom  :: DTree a}>   deriving (Eq, Ord, Show)>> data Duo a    = Duo {fromRoot         :: [Path a],>                      joinValue        :: a,>                      leftleftChild    :: Tree a,>                      leftrightChild   :: Tree a,>                      rightleftChild   :: Tree a,>                      rightrightChild  :: Tree a>                      firstToRoot      :: [Path a],>                      rootToSecond     :: [Path a]}>   deriving (Eq, Ord, Show)

The operations for this tree are left as an exercise for the reader (mostly for conciseness purposes and because I have other things to do, I might add them later). Suffice to say that intuitively (due to the way the lists are ordered (reverse path or not), moving the second finger will be easier than moving the first finger). Obviously, splicing will depend on whether we have a stricly unidirectional path, or an updown path.

I hope the above is clear, otherwise I can add figures, though if possible I’d prefer not to to say on time (and I don’t have a nice diagram editor :/).

Generating higher-order zippers

By now the pattern for creating higher-order zippers should be clear. Given the definition of a data-type as an algebraic entity. The first-order zipper would be:

type FirstOrderZipper d a = Zipper d a

where ‘d’ is the first derivative of the data-type in question. To then go to higher-order zippers, we simply apply the meta-rule (note that this is a meta-rule as there are no haskell facilities to derive data-types that I am aware of).

Zipper d a -> Zipper (Zipper d') a

Where d’ obviously stands for the derivative of d. As a reader of reddit pointed out, this is “simply” having several pointers that point into a functional data-structure. However, due to the way that these ‘pointers’ work (for instance, you can not just get a second one from a data-type, you need to go higher-order to get more pointers) we get extra functionality, for instance the O(1) splicing which I mentioned previously. For completeness’ sake, I here show the rules that were introduced in McBride’s paper.

x x = 1
x y = 0 where x ≠ y
x 0 = 0
x(S + T) = ∂x S + ∂x T
x 1 = 1
x(S × T) = ∂x S × T + S × ∂x T
x(μ y.F) = μ z.∂x F|‎y=μ y.F‏ + ∂y F|‎y=μ y.F‏ × z where z is fresh
x(F|‎y = S‏) = ∂x F|‎y=S‏ + ∂y F|‎y=S‏ × ∂x S
x T = 0 when T does not contain x
μ z.T + S × z = T × μ z. 1 + S × z when T and s do not contain z

Be Sociable, Share!

1. ph says:

I think there is a typo in the first derivation ; I think you meant

d(1)/da + d(a*l)/da

d(1)/da + d(a*a)/da

(or did I miss something ?)

2. Christophe (vincenz) says:

Correct,

I apologize about any possible mistakes. I had this epiphany last night and worked it out until 4am. Thank you for the feedback

Btw, I changed the colouring, I hope this is more legible

3. Jake says:

What about Tertiary or even higher derivatives? Hmm, this is terribly interesting, I wonder what it would mean to do integrals over types? That is– what happens if you “integrate” a list? I would bet that if derivation gave you context, the integration probably removes it. So you would get the type for which Lists are the one-hole context… hmm

Further, what about solving differential equations with types. I know Diff-Eq’s are everyones _favorite_ bit of mathematics. What would it mean to have an exponential type? God, this stuff makes my brain spin. Interesting stuff,

~~Joe

PS, you mind if I link to this in some future post at disparatemathematician.blogspot.com?

This is the kind of stuff I wish I’d thought of.

4. Christophe (vincenz) says:

@jake

Indeed, quite interesting. I have this hunch, but it’s completely unfounded, just purely intuitition nagging at me that. A fixpoint type is actually exponential. After all, if you differentiate a fixpoint, you get another fixpoint, with some part worked out.

I should try to cast that into math to see what’s going on

5. Christophe (vincenz) says:

@jake: No feel free to link

6. sigfpe says:

vincenz,

Awesome!

Jake,

> what about solving differential equations with types

Makes perfect sense – sometimes. The equation defining a list of X’s is

L(X) = 1+X.L(X)

Differentiate both sides and we get the zipper for lists,

L’(X)=X.L’(X)+L(X)

This is a differential equation. Try playing with other types and you’ll get some other less trivial differential equations.

Taylor series also make perfect sense here – and that leads to some crossover with generating functions. It’s also fun to try pulling in stuff from quantum mechanics where we also find differential operators that make holes in things.

7. Christophe (vincenz) says:

One question which I’m left with is:

How do you define the derivative (in terms of ‘a’) of a type such as:

a -> b

or

b -> a

8. Jake says:

@sigfpe and cristophe

I was actually thinking more in terms of systems of differential equations, they’d probably have some nifty relations to mutually recursive datatypes. Heck, Systems of Equations period would be handy for specifying MR Datatypes. I wonder how Population-Dynamic systems would translate to data- mutually limiting each other dynamically.

Also– is subtraction and division defined over data? It seems that something like the mutual recursion

F = 1 + SB
B = 1 + PF

implies that F = (B-1)/S
which further implies that
B = 1 + (PB-P)/S

But that only makes sense if we can “Divide” by a type S and “Subtract” a type P. (I have no Idea whether that particular Datatype would be useful, but it sure is interesting.)

So many things to think about, I really like the idea of dynamicly altering datatypes. I could see a potential use for them in dynamically keeping Data tagged in proportion to one another, kind of like the Salt Tank problem or Rabbit/Fox problem. So then, at any point, we can define a function “partition” which returns a list of lists (or tuple, if it’s a finite set of partitions) of the partitioned data- could potentially be useful in distributed programs. When a given client program sends its completed work to a server, it’s tagged with his particular “chemical” and put into the “tank” The server can then use its differential datatype to determine who is the most efficient client, and give them the “harder” work, whereas the less effecient ones can have “easier” work. so that the “tank” remains balanced.

Certainly would be interesting.

~~Joe/Jake

9. Christophe (vincenz) says:

@jake:

Actually, the derivation would be as follows:

F = 1 + SB
B = 1 + PF

=>

F = 1 + S(1+PF) or:
F = mu x . 1 + S + SPx

I’m afraid I am not familiar with the salt tank problem. As for the rabbit/fox problem, are you referring to loop-detection in graphs?

10. Emil Hedevang says:

Regarding the derivative of a -> b and b -> a with respect to a:

I would start by considering a -> b as b ^ a and then make the expansion of b ^ a into its Taylor series which is “easy” to calculate the derivative of.

As for b -> a, that is a ^ b, the derivative might be b * a ^ (b – 1). How would one interpret b – 1? Perhaps one should make an expansion into a series as well.

I don’t now if this makes any sense, I’m just starting to learn Haskell.

– Emil Hedevang

11. Anonymous says:

Isn’t the type a->b or b->a impossible? So it doesn’t really mean anything to define a derivative for it?

12. Jake says:

I was actually just using some algebra to come up with those derivations, but the question of (-) and (/) over types is still interesting.

The Salt Tank problems is generally stated as follows:

Assume you have a tank filled with a solution of chemicals. The tank has inputs from some tanks filled with a certain chemical, these chemicals flow into the take at a given rate.

Further, assume there is an output from the tank which allows material in the tank to flow out. The problem is then to model the change in chemicals in the tank over time. In the simple cases, the tank tends toward a steady state where all the chemicals are in some proportion to one another.

The Rabbit/Fox problem is a Population Dynamics problem. Which is stated as:

If you have a starting population of rabbits who reproduce at a given rate, and who die at a given rate. Then if you are given a population of foxes who eat rabbits at a given rate, and also have given rates of death and reproduction, model the population over time.

Typically this situation results in a periodic shifting between high populations of foxes vs high populations of rabbits.

Anyway. My idea was to translate useful solutions to analogous problems from diffeqs to types. which would be terribly useful to a programmer. Because we could then model a datatype around the math involved in the problem. Like I mentioned with distrubuted load balancing. The problem of finding a good algorithm gets moved to the math of the structure, which allows for an easier time writing the actual code.

Like I said, my brains just spinning about it, so who knows how use(ful|less) this will be.

~~Joe

13. Cale Gibbard says:

Regarding the integral of a (strict) list type, it would be the type of directed cycles, which can’t really be easily expressed in Haskell.

Putting a hole in a cycle defines a list consisting of the remaining elements in order.

14. Anonymous says:

Thanks for the post! Two quick things:

1) I think the link to your “Flatten Benchmark” actually points to Huet’s paper (as the next link does, but rightly in that case)

2) the use of “l” for list makes the first derivative equation a bit hard to read, since “l” (lowercase ell) and “1″ (digit one) look very similar, at least with my browser settings

— Fritz (lazily anonymous)

15. Christophe (vincenz) says:

Alright,

In case you’re wondering, the article has been revamped and cleaned up a bit. I might add figures and the operations for DDTree when I’m inspired, but I have other stuff to do as well