<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
		>
<channel>
	<title>Comments on: Stemming with Haskell</title>
	<atom:link href="http://blog.tupil.com/stemming-with-haskell/feed/" rel="self" type="application/rss+xml" />
	<link>http://blog.tupil.com/stemming-with-haskell/</link>
	<description>(Get up early, code often)</description>
	<lastBuildDate>Mon, 08 Feb 2010 22:03:01 +0100</lastBuildDate>
	<generator>http://wordpress.org/?v=2.8.4</generator>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
		<item>
		<title>By: Tupil Code Blog &#187; Blog Archive &#187; Stemming with Haskell reloaded</title>
		<link>http://blog.tupil.com/stemming-with-haskell/comment-page-1/#comment-104</link>
		<dc:creator>Tupil Code Blog &#187; Blog Archive &#187; Stemming with Haskell reloaded</dc:creator>
		<pubDate>Sat, 19 Jul 2008 14:46:54 +0000</pubDate>
		<guid isPermaLink="false">http://blog.tupil.com/?p=22#comment-104</guid>
		<description>[...] to the nice discussion with Reinier Lamers of the previous post, I&#8217;ve updated and released the stemmer library with a more Haskell-like interface. As a point [...]</description>
		<content:encoded><![CDATA[<p>[...] to the nice discussion with Reinier Lamers of the previous post, I&#8217;ve updated and released the stemmer library with a more Haskell-like interface. As a point [...]</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Reinier Lamers</title>
		<link>http://blog.tupil.com/stemming-with-haskell/comment-page-1/#comment-99</link>
		<dc:creator>Reinier Lamers</dc:creator>
		<pubDate>Fri, 18 Jul 2008 22:11:51 +0000</pubDate>
		<guid isPermaLink="false">http://blog.tupil.com/?p=22#comment-99</guid>
		<description>Haskelling on a vacation... watch out for becoming the first person diagnosed with Haskell addiction.

That API looks great! But I think you should ask dons for an authoritative opinion :-)</description>
		<content:encoded><![CDATA[<p>Haskelling on a vacation&#8230; watch out for becoming the first person diagnosed with Haskell addiction.</p>
<p>That API looks great! But I think you should ask dons for an authoritative opinion :-)</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Eelco Lempsink</title>
		<link>http://blog.tupil.com/stemming-with-haskell/comment-page-1/#comment-98</link>
		<dc:creator>Eelco Lempsink</dc:creator>
		<pubDate>Fri, 18 Jul 2008 20:12:17 +0000</pubDate>
		<guid isPermaLink="false">http://blog.tupil.com/?p=22#comment-98</guid>
		<description>It took me a couple of attempts and a number of segfaults (brrr), but now I&#039;ve solved it in a way that I (and hopefully you) am happy with. Thank you very much for your feedback!

I&#039;ll release it tomorrow, together with a new blog post. (I don&#039;t have time now; it&#039;s a Friday night and I&#039;m on vacation ;)

To give you a preview, I&#039;ve removed &#039;unsafeStem&#039;, moved the low-level stuff to a new module (NLP.Stemmer.C) and added (to the exported functions) 
&lt;pre lang=&#039;haskell&#039;&gt;
withStemmer :: Algorithm -&gt; (Stemmer -&gt; IO a) -&gt; IO a
stem :: Algorithm -&gt; String -&gt; String
stemWords :: Algorithm -&gt; [String] -&gt; [String]
&lt;/pre&gt;</description>
		<content:encoded><![CDATA[<p>It took me a couple of attempts and a number of segfaults (brrr), but now I&#8217;ve solved it in a way that I (and hopefully you) am happy with. Thank you very much for your feedback!</p>
<p>I&#8217;ll release it tomorrow, together with a new blog post. (I don&#8217;t have time now; it&#8217;s a Friday night and I&#8217;m on vacation ;)</p>
<p>To give you a preview, I&#8217;ve removed &#8216;unsafeStem&#8217;, moved the low-level stuff to a new module (NLP.Stemmer.C) and added (to the exported functions)</p>

<div class="wp_syntax"><div class="code"><pre class="haskell">withStemmer <span style="color: #339933; font-weight: bold;">::</span> Algorithm <span style="color: #339933; font-weight: bold;">-&gt;</span> <span style="color: green;">&#40;</span>Stemmer <span style="color: #339933; font-weight: bold;">-&gt;</span> <span style="color: #cccc00; font-weight: bold;">IO</span> a<span style="color: green;">&#41;</span> <span style="color: #339933; font-weight: bold;">-&gt;</span> <span style="color: #cccc00; font-weight: bold;">IO</span> a
stem <span style="color: #339933; font-weight: bold;">::</span> Algorithm <span style="color: #339933; font-weight: bold;">-&gt;</span> <span style="color: #cccc00; font-weight: bold;">String</span> <span style="color: #339933; font-weight: bold;">-&gt;</span> <span style="color: #cccc00; font-weight: bold;">String</span>
stemWords <span style="color: #339933; font-weight: bold;">::</span> Algorithm <span style="color: #339933; font-weight: bold;">-&gt;</span> <span style="color: green;">&#91;</span><span style="color: #cccc00; font-weight: bold;">String</span><span style="color: green;">&#93;</span> <span style="color: #339933; font-weight: bold;">-&gt;</span> <span style="color: green;">&#91;</span><span style="color: #cccc00; font-weight: bold;">String</span><span style="color: green;">&#93;</span></pre></div></div>

]]></content:encoded>
	</item>
	<item>
		<title>By: Reinier Lamers</title>
		<link>http://blog.tupil.com/stemming-with-haskell/comment-page-1/#comment-96</link>
		<dc:creator>Reinier Lamers</dc:creator>
		<pubDate>Fri, 18 Jul 2008 19:57:43 +0000</pubDate>
		<guid isPermaLink="false">http://blog.tupil.com/?p=22#comment-96</guid>
		<description>You don&#039;t delete, you just make sure you never create a second stemmer for the same language. So you never create more stemmers than there are languages and you hopefully never run out of memory.

If your memory were so precious that that would be a problem, you&#039;d not be writing your app in Haskell anyway.

I&#039;m pretty comfortable with using unsafePerformIO in a library (but I&#039;m no more seasoned in Haskell than you of course!). You have something that is essentially a pure computation (namely a stemming algorithm) and it should be exposed as such to Haskell application writers.

Using unsafePerformIO in a pure Haskell application usually means that the author made a design mistake, but when you&#039;re talking to C it is OK with me to write wrapper code that deals with the statefulness of the underlying C API, and then run the wrapper code inside unsafePerformIO.

That doesn&#039;t mean you shouldn&#039;t be very cautious when using unsafePerformIO. Looking at my example code, and what you say about the dangers of unsafePerformIO, I see that I should probably have only one unsafePerformIO call in the &quot;stem&quot; function, because now it might call &quot;Stemming.stem&quot; too early.</description>
		<content:encoded><![CDATA[<p>You don&#8217;t delete, you just make sure you never create a second stemmer for the same language. So you never create more stemmers than there are languages and you hopefully never run out of memory.</p>
<p>If your memory were so precious that that would be a problem, you&#8217;d not be writing your app in Haskell anyway.</p>
<p>I&#8217;m pretty comfortable with using unsafePerformIO in a library (but I&#8217;m no more seasoned in Haskell than you of course!). You have something that is essentially a pure computation (namely a stemming algorithm) and it should be exposed as such to Haskell application writers.</p>
<p>Using unsafePerformIO in a pure Haskell application usually means that the author made a design mistake, but when you&#8217;re talking to C it is OK with me to write wrapper code that deals with the statefulness of the underlying C API, and then run the wrapper code inside unsafePerformIO.</p>
<p>That doesn&#8217;t mean you shouldn&#8217;t be very cautious when using unsafePerformIO. Looking at my example code, and what you say about the dangers of unsafePerformIO, I see that I should probably have only one unsafePerformIO call in the &#8220;stem&#8221; function, because now it might call &#8220;Stemming.stem&#8221; too early.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Eelco Lempsink</title>
		<link>http://blog.tupil.com/stemming-with-haskell/comment-page-1/#comment-95</link>
		<dc:creator>Eelco Lempsink</dc:creator>
		<pubDate>Fri, 18 Jul 2008 18:25:46 +0000</pubDate>
		<guid isPermaLink="false">http://blog.tupil.com/?p=22#comment-95</guid>
		<description>Two things come to mind: I don&#039;t really like unsafePerformIO, especially not as &#039;default&#039;.  Also, at what point (in your code) do you delete the reference (to free the memory)? Using a cache implies that the user still has to do the deletion by hand.

That said, I&#039;m not a seasoned library developer, so I&#039;m not sure if unsafePerformIO might be fine in these cases. From the documentation: &quot;If the I/O computation wrapped in unsafePerformIO performs side effects, then the relative order in which those side effects take place (relative to the main I/O trunk, or other calls to unsafePerformIO) is indeterminate.&quot;  If I understand that correctly, deleting the pointer could happen before a stemmer has been assigned. Since the &#039;delete&#039; function will not fail when passed a nullPtr, this should probably work fine, except that the memory occasionally won&#039;t be freed. It&#039;s probably possible to implement it in a way that&#039;s still perfectly safe, I&#039;ll investigate it a bit ;)</description>
		<content:encoded><![CDATA[<p>Two things come to mind: I don&#8217;t really like unsafePerformIO, especially not as &#8216;default&#8217;.  Also, at what point (in your code) do you delete the reference (to free the memory)? Using a cache implies that the user still has to do the deletion by hand.</p>
<p>That said, I&#8217;m not a seasoned library developer, so I&#8217;m not sure if unsafePerformIO might be fine in these cases. From the documentation: &#8220;If the I/O computation wrapped in unsafePerformIO performs side effects, then the relative order in which those side effects take place (relative to the main I/O trunk, or other calls to unsafePerformIO) is indeterminate.&#8221;  If I understand that correctly, deleting the pointer could happen before a stemmer has been assigned. Since the &#8216;delete&#8217; function will not fail when passed a nullPtr, this should probably work fine, except that the memory occasionally won&#8217;t be freed. It&#8217;s probably possible to implement it in a way that&#8217;s still perfectly safe, I&#8217;ll investigate it a bit ;)</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Reinier Lamers</title>
		<link>http://blog.tupil.com/stemming-with-haskell/comment-page-1/#comment-93</link>
		<dc:creator>Reinier Lamers</dc:creator>
		<pubDate>Fri, 18 Jul 2008 12:22:39 +0000</pubDate>
		<guid isPermaLink="false">http://blog.tupil.com/?p=22#comment-93</guid>
		<description>The blog needs a preview feature... I hope the code is rendered readably this time.

&lt;pre lang=&quot;haskell&quot;&gt;
data Language = English &#124; Dutch &#124; Piraha &#124; ...
                            deriving (Eq, Ord)

stem :: Language -&gt; String -&gt; String
stem lang word = unsafePerformIO $ Stemming.stem stemmer word
    where stemmer = unsafePerformIO $ do
                                    cache &lt;- readIORef stemmerCache
                                    case Map.lookup lang cache of
                                        Just s -&gt; return s
                                        Nothing -&gt; do
                                            s &lt;- Stemming.new lang
                                            writeIORef stemmerCache (Map.insert lang s cache)
                                            return s 

stemmerCache :: IORef (Map.Map Language Stemming.Stemmer)
stemmerCache = -- insert dirty top-level unsafePerformIO here
&lt;/pre&gt;</description>
		<content:encoded><![CDATA[<p>The blog needs a preview feature&#8230; I hope the code is rendered readably this time.</p>

<div class="wp_syntax"><div class="code"><pre class="haskell"><span style="color: #06c; font-weight: bold;">data</span> Language <span style="color: #339933; font-weight: bold;">=</span> English <span style="color: #339933; font-weight: bold;">|</span> Dutch <span style="color: #339933; font-weight: bold;">|</span> Piraha <span style="color: #339933; font-weight: bold;">|</span> <span style="color: #339933; font-weight: bold;">...</span>
                            <span style="color: #06c; font-weight: bold;">deriving</span> <span style="color: green;">&#40;</span><span style="color: #cccc00; font-weight: bold;">Eq</span><span style="color: #339933; font-weight: bold;">,</span> <span style="color: #cccc00; font-weight: bold;">Ord</span><span style="color: green;">&#41;</span>
&nbsp;
stem <span style="color: #339933; font-weight: bold;">::</span> Language <span style="color: #339933; font-weight: bold;">-&gt;</span> <span style="color: #cccc00; font-weight: bold;">String</span> <span style="color: #339933; font-weight: bold;">-&gt;</span> <span style="color: #cccc00; font-weight: bold;">String</span>
stem lang word <span style="color: #339933; font-weight: bold;">=</span> unsafePerformIO <span style="color: #339933; font-weight: bold;">$</span> Stemming<span style="color: #339933; font-weight: bold;">.</span>stem stemmer word
    <span style="color: #06c; font-weight: bold;">where</span> stemmer <span style="color: #339933; font-weight: bold;">=</span> unsafePerformIO <span style="color: #339933; font-weight: bold;">$</span> <span style="color: #06c; font-weight: bold;">do</span>
                                    cache <span style="color: #339933; font-weight: bold;">&lt;-</span> readIORef stemmerCache
                                    <span style="color: #06c; font-weight: bold;">case</span> Map<span style="color: #339933; font-weight: bold;">.</span><span style="font-weight: bold;">lookup</span> lang cache <span style="color: #06c; font-weight: bold;">of</span>
                                        Just s <span style="color: #339933; font-weight: bold;">-&gt;</span> <span style="font-weight: bold;">return</span> s
                                        Nothing <span style="color: #339933; font-weight: bold;">-&gt;</span> <span style="color: #06c; font-weight: bold;">do</span>
                                            s <span style="color: #339933; font-weight: bold;">&lt;-</span> Stemming<span style="color: #339933; font-weight: bold;">.</span>new lang
                                            writeIORef stemmerCache <span style="color: green;">&#40;</span>Map<span style="color: #339933; font-weight: bold;">.</span>insert lang s cache<span style="color: green;">&#41;</span>
                                            <span style="font-weight: bold;">return</span> s 
&nbsp;
stemmerCache <span style="color: #339933; font-weight: bold;">::</span> IORef <span style="color: green;">&#40;</span>Map<span style="color: #339933; font-weight: bold;">.</span>Map Language Stemming<span style="color: #339933; font-weight: bold;">.</span>Stemmer<span style="color: green;">&#41;</span>
stemmerCache <span style="color: #339933; font-weight: bold;">=</span> <span style="color: #5d478b; font-style: italic;">-- insert dirty top-level unsafePerformIO here</span></pre></div></div>

]]></content:encoded>
	</item>
	<item>
		<title>By: Reinier Lamers</title>
		<link>http://blog.tupil.com/stemming-with-haskell/comment-page-1/#comment-92</link>
		<dc:creator>Reinier Lamers</dc:creator>
		<pubDate>Fri, 18 Jul 2008 12:18:53 +0000</pubDate>
		<guid isPermaLink="false">http://blog.tupil.com/?p=22#comment-92</guid>
		<description>That&#039;s much better :)

I was thinking of an API like this myself:

&lt;code&gt;
data Language = English &#124; Dutch &#124; Piraha &#124; ...
                            deriving (Eq, Ord)

stem :: Language -&gt; String -&gt; String
stem lang word = unsafePerformIO $ Stemming.stem stemmer word
    where stemmer = unsafePerformIO $ do
                                    cache &lt;- readIORef stemmerCache
                                    case Map.lookup lang cache of
                                        Just s -&gt; return s
                                        Nothing -&gt; do
                                            s &lt;- Stemming.new lang
                                            writeIORef stemmerCache (Map.insert lang s cache)
                                            return s 

stemmerCache :: IORef (Map.Map Language Stemming.Stemmer)
stemmerCache = -- insert dirty top-level unsafePerformIO here
&lt;/code&gt;

This gives you a nice functional API at the price of making the implementation unsafe*-ridden. What do you think about such a thing?</description>
		<content:encoded><![CDATA[<p>That&#8217;s much better :)</p>
<p>I was thinking of an API like this myself:</p>
<p><code><br />
data Language = English | Dutch | Piraha | ...<br />
                            deriving (Eq, Ord)</p>
<p>stem :: Language -&gt; String -&gt; String<br />
stem lang word = unsafePerformIO $ Stemming.stem stemmer word<br />
    where stemmer = unsafePerformIO $ do<br />
                                    cache &lt;- readIORef stemmerCache<br />
                                    case Map.lookup lang cache of<br />
                                        Just s -&gt; return s<br />
                                        Nothing -&gt; do<br />
                                            s &lt;- Stemming.new lang<br />
                                            writeIORef stemmerCache (Map.insert lang s cache)<br />
                                            return s </p>
<p>stemmerCache :: IORef (Map.Map Language Stemming.Stemmer)<br />
stemmerCache = -- insert dirty top-level unsafePerformIO here<br />
</code></p>
<p>This gives you a nice functional API at the price of making the implementation unsafe*-ridden. What do you think about such a thing?</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Eelco Lempsink</title>
		<link>http://blog.tupil.com/stemming-with-haskell/comment-page-1/#comment-90</link>
		<dc:creator>Eelco Lempsink</dc:creator>
		<pubDate>Fri, 18 Jul 2008 09:53:42 +0000</pubDate>
		<guid isPermaLink="false">http://blog.tupil.com/?p=22#comment-90</guid>
		<description>Reinier, you&#039;re absolutely right. That&#039;s definitely something that&#039;s planned for the next release ;)  It will probably look something like:

&lt;pre lang=&#039;haskell&#039;&gt;
withStemmer stemmerAlgorithm action = do
  stemmer &lt;- new stemmerAlgorithm
  result &lt;- action stemmer
  delete stemmer
  return result
&lt;/pre&gt;

What do you think?</description>
		<content:encoded><![CDATA[<p>Reinier, you&#8217;re absolutely right. That&#8217;s definitely something that&#8217;s planned for the next release ;)  It will probably look something like:</p>

<div class="wp_syntax"><div class="code"><pre class="haskell">withStemmer stemmerAlgorithm action <span style="color: #339933; font-weight: bold;">=</span> <span style="color: #06c; font-weight: bold;">do</span>
  stemmer <span style="color: #339933; font-weight: bold;">&lt;-</span> new stemmerAlgorithm
  result <span style="color: #339933; font-weight: bold;">&lt;-</span> action stemmer
  delete stemmer
  <span style="font-weight: bold;">return</span> result</pre></div></div>

<p>What do you think?</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Reinier Lamers</title>
		<link>http://blog.tupil.com/stemming-with-haskell/comment-page-1/#comment-87</link>
		<dc:creator>Reinier Lamers</dc:creator>
		<pubDate>Fri, 18 Jul 2008 09:06:32 +0000</pubDate>
		<guid isPermaLink="false">http://blog.tupil.com/?p=22#comment-87</guid>
		<description>It looks like you&#039;ve done little to hide the object-oriented interface. Is that a destructor that I see there? :)

Have you thought about making the API more Haskell-like? Perhaps you could create and destroy Stemmer objects behind the scenes in global variables (unsafePerformIO FTW).</description>
		<content:encoded><![CDATA[<p>It looks like you&#8217;ve done little to hide the object-oriented interface. Is that a destructor that I see there? :)</p>
<p>Have you thought about making the API more Haskell-like? Perhaps you could create and destroy Stemmer objects behind the scenes in global variables (unsafePerformIO FTW).</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Eelco Lempsink</title>
		<link>http://blog.tupil.com/stemming-with-haskell/comment-page-1/#comment-75</link>
		<dc:creator>Eelco Lempsink</dc:creator>
		<pubDate>Wed, 16 Jul 2008 17:19:34 +0000</pubDate>
		<guid isPermaLink="false">http://blog.tupil.com/?p=22#comment-75</guid>
		<description>Hi Duane, we will blog about it soon(ish), but we also released a &lt;a href=&#039;http://hackage.haskell.org/cgi-bin/hackage-scripts/package/sphinx&#039; rel=&quot;nofollow&quot;&gt;library&lt;/a&gt; and &lt;a href=&#039;http://hackage.haskell.org/cgi-bin/hackage-scripts/package/sphinx-cli&#039; rel=&quot;nofollow&quot;&gt;demo-app / cli&lt;/a&gt; to use &lt;a href=&#039;http://www.sphinxsearch.com/&#039; rel=&quot;nofollow&quot;&gt;Sphinx&lt;/a&gt; as a search backend. The library only deals with the searching part; for installing Sphinx, building indexes and running the search daemon, you&#039;ll have to consult their docs.</description>
		<content:encoded><![CDATA[<p>Hi Duane, we will blog about it soon(ish), but we also released a <a href='http://hackage.haskell.org/cgi-bin/hackage-scripts/package/sphinx' rel="nofollow">library</a> and <a href='http://hackage.haskell.org/cgi-bin/hackage-scripts/package/sphinx-cli' rel="nofollow">demo-app / cli</a> to use <a href='http://www.sphinxsearch.com/' rel="nofollow">Sphinx</a> as a search backend. The library only deals with the searching part; for installing Sphinx, building indexes and running the search daemon, you&#8217;ll have to consult their docs.</p>
]]></content:encoded>
	</item>
</channel>
</rss>
