Stemming with Haskell
Last week we worked on building a small search engine with Haskell. As you might know, a search engine needs an index to search against, and usually stemming as well, so that people can search for variants of a word and still get accurate results.
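To make the idea concrete, here is a minimal sketch of an index keyed by stems. Everything in it is an assumption made for illustration: toyStem is a hypothetical suffix stripper (far cruder than a real stemming algorithm), and buildIndex and search are made-up names, not part of any library.

```haskell
module Main where

import Data.Char (toLower)
import Data.List (isSuffixOf)
import qualified Data.Map.Strict as Map
import qualified Data.Set as Set

-- Toy stemmer (hypothetical, NOT a real stemming algorithm): lower-case
-- the word and strip the first matching suffix, as long as at least
-- three letters remain.
toyStem :: String -> String
toyStem word = go ["ing", "es", "ed", "s"]
  where
    lower = map toLower word
    go [] = lower
    go (s : rest)
      | s `isSuffixOf` lower && length lower - length s >= 3 =
          take (length lower - length s) lower
      | otherwise = go rest

-- Inverted index: each stem maps to the ids of the documents containing it.
type Index = Map.Map String (Set.Set Int)

buildIndex :: [(Int, String)] -> Index
buildIndex docs =
  Map.fromListWith Set.union
    [ (toyStem w, Set.singleton docId)
    | (docId, text) <- docs
    , w <- words text
    ]

-- Stem the query the same way the documents were stemmed, then look it up.
search :: Index -> String -> [Int]
search index query =
  maybe [] Set.toList (Map.lookup (toyStem query) index)

main :: IO ()
main = do
  let index = buildIndex [(1, "fishes with fins"), (2, "he worked")]
  print (search index "fish")    -- prints [1]
  print (search index "working") -- prints [2]
```

Because both the documents and the query pass through the same stemmer, "working" finds the document that only contains "worked". A real stemmer handles far more cases than this suffix list does.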
Fortunately for us, there are already good libraries and tools out there to help us. So instead of trying to write everything from scratch, we made a small library based on Snowball’s libstemmer_c and a very (very!) rough start of a Sphinx client (more about that in a later post).
We’ve released the library on Hackage, so check out stemmer 0.1.
A small code example to give you a taste…
module Main where

import qualified NLP.Stemmer as Stemming
import Control.Monad (unless)
import System.IO (hSetBuffering, stdout, BufferMode(NoBuffering))

main :: IO ()
main = do
    stemmer <- Stemming.new Stemming.English
    putStrLn "Enter a sentence to stem, an empty line to stop."
    hSetBuffering stdout NoBuffering -- to print a prompt
    stemUserInput stemmer
    Stemming.delete stemmer

stemUserInput :: Stemming.Stemmer -> IO ()
stemUserInput stemmer = do
    putStr "> "
    string <- getLine
    unless (string == "") $ do
        string' <- mapM (Stemming.stem stemmer) $ words string
        putStrLn $ "< " ++ unwords string'
        stemUserInput stemmer
Save this to Main.hs, then build and run it with something like:
$ ghc --make Main.hs -o stemmer
[1 of 1] Compiling Main ( Main.hs, Main.o )
Linking stemmer …
$ ./stemmer
Enter a sentence to stem, an empty line to stop.
> The fishes worked forever with their fins
< The fish work forev with their fin
> Stemming with Haskell
< Stem with Haskel
Implementing this library was pretty easy, and it was also a nice exercise in using Haskell’s Foreign Function Interface.
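For readers who have not used the Foreign Function Interface before, here is a minimal sketch of the mechanism. It is not the stemmer library's actual binding code: to stay self-contained it binds the C standard library's strlen, and the names c_strlen and cLength are made up for this example. Wrapping a C stemmer presumably follows the same pattern of declaring a foreign import and marshalling strings across the boundary.

```haskell
{-# LANGUAGE ForeignFunctionInterface #-}
module Main where

import Foreign.C.String (CString, withCString)
import Foreign.C.Types (CSize(..))

-- Declare the C function: size_t strlen(const char *s);
-- c_strlen is our own (hypothetical) name for the Haskell-side binding.
foreign import ccall unsafe "string.h strlen"
  c_strlen :: CString -> IO CSize

-- Copy the Haskell String into a temporary NUL-terminated C string,
-- call into C, and convert the result back to a Haskell Int.
cLength :: String -> IO Int
cLength s = withCString s $ \cstr -> fromIntegral <$> c_strlen cstr

main :: IO ()
main = cLength "stemming" >>= print -- prints 8
```

withCString frees the temporary buffer once the callback returns, so the C side never holds a pointer into Haskell-managed memory after the call.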