Rotating Header Image

Levenshtein Distance in Haskell

Seeing how lately the levenschtein distance is so popular, I thought I’d add a haskell version to the mix.

The code below is pretty self-explanatory, using the typical memoization pattern. I do wonder, however, what would happen if somehow GHC supported lazily-resizable arrays. Perhaps then you could do this on infinite lists and only ask the levenschtein distance up to a certain index.

module Lev whereimport Data.Array

levenshtein :: (Eq a) => [a] -> [a] -> Integer
levenshtein xs ys = arr ! (lenx, leny)
  where arr :: (Array (Int, Int) Integer)
        arr                   = array ((0,0), (lenx, leny))  [((ix, iy), worker ix iy) |
                                                              ix <- [0..lenx], iy <- [0..leny] ]
        lenx                  = length xs
        leny                  = length ys
        arrx                  = listArray (1, lenx) xs
        arry                  = listArray (1, leny) ys
        worker 0      iy      = fromIntegral iy
        worker ix     0       = fromIntegral ix
        worker ix     iy      = minimum [1 + arr ! (ix-1, iy), 1 + arr ! (ix, iy-1),
                                         cost ix iy + arr ! (ix-1, iy-1)]
        cost ix iy            = if (arrx ! ix) == (arry ! iy) then 0 else 1

Be Sociable, Share!


  1. jorpic says:

    this is a little bit shorter and faster:

    levenshtein sa sb = last $ foldl (transform sa) [0..length sa] sb
    transform str xs@(x:xs’) c = res
    res = x+1 : zipWith4 compute str xs xs’ res
    compute c’ x y z = minimum [y+1, z+1, x + if c' == c then 0 else 1]

  2. Interesting code :) Rather cryptic, I must admit.

  3. lf says:

    if c’ == c then 0 else 1

    can also be written

    fromEnum (c /= c’)

    if you don’t like inline ifs :-)

  4. Anonymous says:

    Small improvements on jorpic’s version:

    levenshtein sa sb = last $ foldl transform [0..length sa] sb where
    transform xs@(x:xs’) c = scanl compute (x+1) (zip3 sa xs xs’) where
    compute z (c’, x, y) = minimum [y+1, z+1, x + fromEnum (c' /= c)]

  5. gasche says:

    What is the complexity of your implementation, given (li ! n) is in O(n) ? I would expect O(n^3) instead of the usual O(n^2).

  6. gasche says:

    Sorry, I just realized you were using Data.Array, not list, so you have constant-time random access.

  7. Grazer says:

    It is also possible to have a much faster algorithm for similar sequences. See Lloyd Allison’s haskell implementation of a O(N*d) solution.

Leave a Reply

Your email address will not be published. Required fields are marked *


You may use these HTML tags and attributes: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <strike> <strong>