<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Tupil Code Blog &#187; Snowball</title>
	<atom:link href="http://blog.tupil.com/tag/snowball/feed/" rel="self" type="application/rss+xml" />
	<link>http://blog.tupil.com</link>
	<description>(Get up early, code often)</description>
	<lastBuildDate>Fri, 27 Aug 2010 10:50:09 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.0.1</generator>
		<item>
		<title>Stemming with Haskell</title>
		<link>http://blog.tupil.com/stemming-with-haskell/</link>
		<comments>http://blog.tupil.com/stemming-with-haskell/#comments</comments>
		<pubDate>Mon, 14 Jul 2008 14:55:41 +0000</pubDate>
		<dc:creator>Eelco Lempsink</dc:creator>
				<category><![CDATA[Code]]></category>
		<category><![CDATA[Haskell]]></category>
		<category><![CDATA[library]]></category>
		<category><![CDATA[Snowball]]></category>
		<category><![CDATA[stemmer]]></category>

		<guid isPermaLink="false">http://blog.tupil.com/?p=22</guid>
		<description><![CDATA[Last week we worked on building a small search engine with Haskell. As you might know, a search engine needs an index to search in and, usually, stemming, so that people can search for variants of a word and still get accurate results. Fortunately for us, there are already good libraries and tools out [...]]]></description>
			<content:encoded><![CDATA[<p>Last week we worked on building a small search engine with Haskell. As you might know, a search engine needs an <em>index</em> to search in and, usually, <a href='http://en.wikipedia.org/wiki/Stemming'>stemming</a>, so that people can search for variants of a word (<em>fish</em>, <em>fishes</em>, <em>fishing</em>) and still get accurate results.</p>
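<p>To make the index-plus-stemming idea concrete, here is a minimal sketch of an inverted index that runs both documents and queries through the same stemming function before looking terms up. All names here (<code>buildIndex</code>, <code>search</code>, <code>toyStem</code>) are hypothetical and not from any library. <code>toyStem</code> is a deliberately crude suffix stripper standing in for a real stemmer such as Snowball&#8217;s; it is not the Porter algorithm.</p>

```haskell
import qualified Data.Map.Strict as Map
import qualified Data.Set as Set
import Data.Char (isAlpha, toLower)
import Data.List (isSuffixOf)

type DocId = Int

-- An inverted index maps each (stemmed) term to the set of documents containing it.
type Index = Map.Map String (Set.Set DocId)

-- Hypothetical toy stemmer: lowercases, then strips the first matching suffix
-- from a short list, keeping at least three characters.
-- A crude stand-in for illustration only, NOT Snowball/Porter.
toyStem :: String -> String
toyStem w = stripSuffix (map toLower w)
  where
    suffixes = ["ing", "ed", "es", "s"]
    stripSuffix s =
      case [ take (length s - length suf) s
           | suf <- suffixes
           , suf `isSuffixOf` s
           , length s - length suf >= 3 ] of
        (s' : _) -> s'
        []       -> s

-- Replace non-letters with spaces, split into words and stem each word.
terms :: (String -> String) -> String -> [String]
terms stemFn = map stemFn . words . map (\c -> if isAlpha c then c else ' ')

-- Build the index from (document id, text) pairs.
buildIndex :: (String -> String) -> [(DocId, String)] -> Index
buildIndex stemFn docs =
  Map.fromListWith Set.union
    [ (t, Set.singleton d) | (d, txt) <- docs, t <- terms stemFn txt ]

-- A document matches when it contains every stemmed query term (AND semantics).
search :: (String -> String) -> Index -> String -> Set.Set DocId
search stemFn idx query =
  case map lookupTerm (terms stemFn query) of
    []       -> Set.empty
    (s : ss) -> foldr Set.intersection s ss
  where
    lookupTerm t = Map.findWithDefault Set.empty t idx
```

<p>Loaded in GHCi, <code>search toyStem (buildIndex toyStem [(1, "The fishes worked"), (2, "A fish with fins")]) "fish"</code> evaluates to <code>fromList [1,2]</code>, because &#8220;fishes&#8221; and &#8220;fish&#8221; reduce to the same index term. Taking the stemmer as a plain <code>String -&gt; String</code> argument keeps the index independent of which stemmer is plugged in; with an <code>IO</code>-based stemmer you would stem the words first and then build the index.</p>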
<p>Fortunately for us, there are already good libraries and tools out there to help us. So instead of trying to write everything from scratch, we made a small library based on <a href='http://snowball.tartarus.org/'>Snowball&#8217;s libstemmer_c</a> and a very (very!) rough start of a <a href='http://www.sphinxsearch.com/'>Sphinx</a> client (more about that in a later post).</p>
<p>We&#8217;ve released the library on <a href='http://hackage.haskell.org/'>Hackage</a>, so check out <a href='http://hackage.haskell.org/cgi-bin/hackage-scripts/package/stemmer'>stemmer 0.1</a>.</p>
<p>A small code example to give you a taste&#8230;</p>

<div class="wp_syntax"><div class="code"><pre class="haskell" style="font-family:monospace;"><span style="color: #06c; font-weight: bold;">module</span> Main <span style="color: #06c; font-weight: bold;">where</span>
&nbsp;
<span style="color: #06c; font-weight: bold;">import</span> <span style="color: #06c; font-weight: bold;">qualified</span> NLP<span style="color: #339933; font-weight: bold;">.</span>Stemmer <span style="color: #06c; font-weight: bold;">as</span> Stemming
<span style="color: #06c; font-weight: bold;">import</span> Control<span style="color: #339933; font-weight: bold;">.</span><span style="color: #cccc00; font-weight: bold;">Monad</span> <span style="color: green;">&#40;</span>unless<span style="color: green;">&#41;</span>
<span style="color: #06c; font-weight: bold;">import</span> System<span style="color: #339933; font-weight: bold;">.</span><span style="color: #cccc00; font-weight: bold;">IO</span> <span style="color: green;">&#40;</span>hSetBuffering<span style="color: #339933; font-weight: bold;">,</span> stdout<span style="color: #339933; font-weight: bold;">,</span> BufferMode<span style="color: green;">&#40;</span>NoBuffering<span style="color: green;">&#41;</span><span style="color: green;">&#41;</span>
&nbsp;
main <span style="color: #339933; font-weight: bold;">::</span> <span style="color: #cccc00; font-weight: bold;">IO</span> <span style="color: green;">&#40;</span><span style="color: green;">&#41;</span>
main <span style="color: #339933; font-weight: bold;">=</span> <span style="color: #06c; font-weight: bold;">do</span>
    stemmer <span style="color: #339933; font-weight: bold;">&lt;-</span> Stemming<span style="color: #339933; font-weight: bold;">.</span>new Stemming<span style="color: #339933; font-weight: bold;">.</span>English
    <span style="font-weight: bold;">putStrLn</span> <span style="">&quot;Enter a sentence to stem, an empty line to stop.&quot;</span>
    hSetBuffering stdout NoBuffering <span style="color: #5d478b; font-style: italic;">-- unbuffered, so the &quot;&gt; &quot; prompt appears before getLine waits</span>
    stemUserInput stemmer
    Stemming<span style="color: #339933; font-weight: bold;">.</span>delete stemmer
&nbsp;
stemUserInput <span style="color: #339933; font-weight: bold;">::</span> Stemming<span style="color: #339933; font-weight: bold;">.</span>Stemmer <span style="color: #339933; font-weight: bold;">-&gt;</span> <span style="color: #cccc00; font-weight: bold;">IO</span> <span style="color: green;">&#40;</span><span style="color: green;">&#41;</span>
stemUserInput stemmer <span style="color: #339933; font-weight: bold;">=</span> <span style="color: #06c; font-weight: bold;">do</span>
    <span style="font-weight: bold;">putStr</span> <span style="">&quot;&gt; &quot;</span>
    string <span style="color: #339933; font-weight: bold;">&lt;-</span> <span style="font-weight: bold;">getLine</span>
    unless <span style="color: green;">&#40;</span>string <span style="color: #339933; font-weight: bold;">==</span> <span style="">&quot;&quot;</span><span style="color: green;">&#41;</span> <span style="color: #339933; font-weight: bold;">$</span> <span style="color: #06c; font-weight: bold;">do</span> 
        string' <span style="color: #339933; font-weight: bold;">&lt;-</span> <span style="font-weight: bold;">mapM</span> <span style="color: green;">&#40;</span>Stemming<span style="color: #339933; font-weight: bold;">.</span>stem stemmer<span style="color: green;">&#41;</span> <span style="color: #339933; font-weight: bold;">$</span> <span style="font-weight: bold;">words</span> string
        <span style="font-weight: bold;">putStrLn</span> <span style="color: #339933; font-weight: bold;">$</span> <span style="">&quot;&lt; &quot;</span> <span style="color: #339933; font-weight: bold;">++</span> <span style="font-weight: bold;">unwords</span> string'
        stemUserInput stemmer</pre></div></div>

<p>Save this to Main.hs, then compile and run it with something like:<br />
<code><br />
$ ghc --make Main.hs -o stemmer<br />
[1 of 1] Compiling Main             ( Main.hs, Main.o )<br />
Linking stemmer ...<br />
$ ./stemmer<br />
Enter a sentence to stem, an empty line to stop.<br />
&gt; The fishes worked forever with their fins<br />
&lt; The fish work forev with their fin<br />
&gt; Stemming with Haskell<br />
&lt; Stem with Haskel<br />
</code></p>
<p>It was pretty easy to implement this library and also a nice exercise in using <a href='http://www.cse.unsw.edu.au/~chak/haskell/ffi/'>Haskell's Foreign Function Interface</a>.</p>
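<p>For readers new to the FFI, the core pattern behind a binding like this fits in a few lines: declare a C function with <code>foreign import ccall</code> and marshal Haskell strings with <code>withCString</code>. The sketch below binds C&#8217;s <code>strlen</code> rather than libstemmer_c, purely so it builds without extra libraries; <code>cLength</code> is a hypothetical helper name. Functions from libstemmer_c such as <code>sb_stemmer_new</code> and <code>sb_stemmer_stem</code> can be declared the same way, though the stemmer package&#8217;s actual bindings may differ.</p>

```haskell
-- Foreign imports are part of Haskell 2010; the pragma only matters for older compilers.
{-# LANGUAGE ForeignFunctionInterface #-}

import Foreign.C.String (CString, withCString)
import Foreign.C.Types (CSize (..))

-- Bind strlen from <string.h>. CSize's constructor must be in scope for GHC
-- to marshal it. "unsafe" is acceptable because strlen is short-running and
-- never calls back into Haskell.
foreign import ccall unsafe "string.h strlen"
  c_strlen :: CString -> IO CSize

-- Hypothetical helper: copy the String into a temporary NUL-terminated C string,
-- call C on it, and convert the C result back into a Haskell Int.
cLength :: String -> IO Int
cLength s = withCString s $ \cs -> fromIntegral <$> c_strlen cs
```

<p>In GHCi, <code>cLength "stemming"</code> returns <code>8</code>. Note that <code>withCString</code> encodes the text using the current locale, so for non-ASCII input the byte count seen by C can differ from the number of Haskell characters; a stemmer binding has to hand the C side text in the encoding it expects.</p>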
]]></content:encoded>
			<wfw:commentRss>http://blog.tupil.com/stemming-with-haskell/feed/</wfw:commentRss>
		<slash:comments>12</slash:comments>
		</item>
	</channel>
</rss>
