Adding Hyphenation to NSString

Khoi Vinh recently showed that the typesetting in Apple’s iBooks is quite horrendous. One obvious problem is that the text is laid out with justification (which is probably an appropriate decision when typesetting books) but lacks hyphenation. John Gruber does not approve.

The fact is, there are pretty good algorithms for hyphenation. The Hunspell project maintains a hyphenation library that powers OpenOffice.org, among other projects, and extends the algorithm that was implemented for TeX long ago. Time to bring some of that goodness to Cocoa!

I implemented a simple category on NSString that adds Unicode soft hyphens (U+00AD) to a string. In this post I show how to use it in your project, with some examples.

Setup

First, gather all required files:

Second, after unzipping, put the code in place:

  • Add the NSString+Hyphenate.h and NSString+Hyphenate.m files to your project.
  • To link the hyphen library statically, also add the hyphen.h, hyphen.c, hnjalloc.h and hnjalloc.c files.

Third, and finally, add the .dic files to the Hyphenate.bundle and add the bundle to your project. Your project source tree should now contain all the necessary files and look something like this.

Usage

The Hyphenate category gives you one method: -stringByHyphenatingWithLocale:. Its usage is straightforward:

    NSString* text = @"It was in the fourth year of my apprenticeship to Joe, and it was a Saturday night.";
    NSLocale* en = [[[NSLocale alloc] initWithLocaleIdentifier:@"en_US"] autorelease];
    NSString* hyphenated = [text stringByHyphenatingWithLocale:en];
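
To make it clearer what "adding soft hyphens" means at the byte level, here is a minimal sketch in plain C (which also compiles as Objective-C). It is not the category's actual implementation. The helper name insert_soft_hyphens is hypothetical, the word is assumed to be ASCII, and the break mask is assumed to follow the convention used by the hyphen library's output, where an odd digit at position i allows a break after character i.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* U+00AD SOFT HYPHEN encoded as UTF-8 is the two bytes 0xC2 0xAD. */
static const char kSoftHyphen[] = "\xC2\xAD";

/*
 * Hypothetical helper, not part of the Hyphenate category.
 * `word` is an ASCII word; `breaks` has the same length as `word`.
 * An odd digit in breaks[i] means "a break is allowed after word[i]"
 * (assumed to match the hyphen library's output convention).
 * Returns a newly malloc'ed string that the caller must free.
 */
char *insert_soft_hyphens(const char *word, const char *breaks)
{
    assert(word != NULL && breaks != NULL);
    size_t len = strlen(word);
    assert(strlen(breaks) == len);

    /* Worst case: a two-byte soft hyphen after every character. */
    char *out = malloc(len * 3 + 1);
    if (out == NULL) return NULL;

    size_t j = 0;
    for (size_t i = 0; i < len; i++) {
        out[j++] = word[i];
        /* Never add a soft hyphen after the last character. */
        if (i + 1 < len && ((breaks[i] - '0') & 1)) {
            memcpy(out + j, kSoftHyphen, 2);
            j += 2;
        }
    }
    out[j] = '\0';
    return out;
}
```

Calling insert_soft_hyphens("example", "0101000") yields "ex", "am" and "ple" joined by soft hyphens. Real text needs more care, because in a multi-byte UTF-8 string character positions and byte positions no longer coincide.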

UIKit has limited support for the soft hyphen. This is the result of setting the string above on a UILabel, a UITextView and a UIWebView, respectively.

As you can see, UILabel will simply display all soft hyphens. The behavior of UITextView and UIWebView is more useful: the soft hyphen is shown only when needed and it allows word wrapping.
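
If the same hyphenated string also has to be shown in a UILabel, one possible workaround is to strip the soft hyphens again before display. The following is a sketch in plain C under that assumption. The helper name strip_soft_hyphens is hypothetical and the input is assumed to be valid UTF-8.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper: returns a malloc'ed copy of the UTF-8 string `text`
 * with every soft hyphen (bytes 0xC2 0xAD) removed. The caller frees it.
 */
char *strip_soft_hyphens(const char *text)
{
    assert(text != NULL);
    size_t len = strlen(text);

    /* The result is never longer than the input. */
    char *out = malloc(len + 1);
    if (out == NULL) return NULL;

    size_t j = 0;
    for (size_t i = 0; i < len; i++) {
        /* In valid UTF-8, 0xC2 followed by 0xAD can only be U+00AD. */
        if ((unsigned char)text[i] == 0xC2 && i + 1 < len &&
            (unsigned char)text[i + 1] == 0xAD) {
            i++; /* skip the second byte of the soft hyphen */
            continue;
        }
        out[j++] = text[i];
    }
    out[j] = '\0';
    return out;
}
```

In Cocoa itself, calling -stringByReplacingOccurrencesOfString:withString: with a soft hyphen as the target and an empty replacement should achieve the same thing without dropping down to C.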

Using (X)HTML

Since UITextView is pretty limited in how much you can style and typeset text, a UIWebView will usually be the way to go for displaying nice-looking, hyphenated text.

Obviously, running -stringByHyphenatingWithLocale: on a whole HTML document will not give the required result, since the markup would be hyphenated along with the text. Unfortunately, unless you are willing to use libxml2 directly, your options for working with XML documents on the iPhone are limited.

The best option (as far as I know) is to use TouchXML, a friendly wrapper for libxml2 with an API that mimics Cocoa’s NSXML* classes. However, TouchXML only supports reading XML documents, not creating them. To apply hyphenation, we would need at least a way to modify text nodes. Luckily that turned out to only require a small change to TouchXML, which you can find as a patch in the hyphenate repository.

Next, after patching and setting up TouchXML, we use a simple XPath expression to fetch all the text nodes and modify each.

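    // Parse the (X)HTML document; the initializer is elided here.
    CXMLDocument* document = [[[CXMLDocument alloc] init...] autorelease];

    // Fetch every text node below <body>. `en` is the NSLocale from the earlier example.
    NSArray* textNodes = [document nodesForXPath:@"//body//text()" error:NULL];
    for (CXMLNode* node in textNodes) {
        // Modifying the node relies on the TouchXML patch mentioned above.
        [node setStringValue:[[node stringValue] stringByHyphenatingWithLocale:en]];
    }

    // Serialize the modified document back into a UTF-8 string.
    NSString* hyphenatedDocument = [[[NSString alloc]
                                     initWithData:[document XMLData]
                                     encoding:NSUTF8StringEncoding] autorelease];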

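Note that this code is a bit of an oversimplification. Whether this simple XPath expression is appropriate for you depends entirely on your actual documents. For example, it also matches text inside elements such as pre or script, which you will probably want to leave untouched.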

Compare the results with and without hyphenation:

For more information and documentation, check out the hyphenate repository on GitHub.

(Texts from Charles Dickens’s Great Expectations.)

Update 1: I have now found KissXML to be a better option than TouchXML for my purposes. Also, it supports setting the text nodes out of the box, no patching necessary!

Update 2: Frank Zheng has figured out a simple solution for using hyphenation with Core Text. See his blog post for more information. Thanks, Frank!

