Sunday, December 4, 2011

Hyphenation on the web

Hyphenation is the splitting of words with a hyphen at the end of a line of text. With it, the text displayed on the screen can be gracefully justified or line-wrapped, and shown in a neat column.
Until now, there were two ways to get it: either inserting soft hyphens into the HTML on the server side (the &shy; character tells the browser where it may break a word) or using a JavaScript library to hyphenate your text on the client side. You can now do it directly with CSS3 styling.

CSS3 hyphenation


Adding these three lines to a rule in your stylesheet lets the browser hyphenate your justified text instead of leaving huge blank spaces in the middle of the lines.
-webkit-hyphens: auto;
-moz-hyphens: auto;
hyphens: auto;

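The three possible values are: none (words are never hyphenated), manual (words break only at the soft hyphens or hyphens already present in the text) and auto (the browser chooses the break points itself).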
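
If you would rather switch hyphenation on from a script, here is a minimal sketch of how that could look. The helper names (findHyphensProperty, enableAutoHyphens) are made up for this post, and the msHyphens entry is my own addition for Internet Explorer 10; the stylesheet above only lists the -webkit-, -moz- and unprefixed forms. The functions take any style-like object, so they can be tried outside a browser too.

```javascript
// Property names under which a style object may expose `hyphens`.
// Assumption: 'msHyphens' is added here for Internet Explorer 10.
const HYPHENS_PROPERTIES = ['hyphens', 'webkitHyphens', 'MozHyphens', 'msHyphens'];

// Returns the first property name the given style object knows about,
// or null when none is present (hypothetical helper, not a browser API).
function findHyphensProperty(style) {
    for (const name of HYPHENS_PROPERTIES) {
        if (name in style) {
            return name;
        }
    }
    return null;
}

// Sets the value 'auto' through whichever property exists.
// Returns true when a property was found, false otherwise.
function enableAutoHyphens(style) {
    const name = findHyphensProperty(style);
    if (name === null) {
        return false;
    }
    style[name] = 'auto';
    return true;
}
```

In a page, the call would be something like enableAutoHyphens(document.body.style). Keep in mind that finding the property only tells you the browser knows about it; whether it really hyphenates still depends on the language of the text.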

You also need to tell the browser which language the text is written in if you want it to apply the correct hyphenation rules. For instance, German, English and French do not use the same rules.
To do so, you can use the HTML lang or the XML xml:lang attributes.
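
The lang attribute applies to an element and everything inside it, so setting it once on the html tag is usually enough, and a quote in another language can override it locally. To check which language ends up applying to a given element, a small lookup like the one below can help. It is only a sketch: effectiveLang is a hypothetical helper, it treats an empty lang value as unset, and it ignores any language declared outside the markup.

```javascript
// Returns the language that applies to `node`: its own lang (or xml:lang)
// attribute if set, otherwise the nearest ancestor's, otherwise ''.
// `node` only needs getAttribute() and parentNode, as DOM elements have.
function effectiveLang(node) {
    for (let current = node; current; current = current.parentNode) {
        if (typeof current.getAttribute !== 'function') {
            continue; // e.g. the document node, which has no attributes
        }
        // Simplification: an empty attribute value is treated as unset.
        const lang = current.getAttribute('lang') || current.getAttribute('xml:lang');
        if (lang) {
            return lang;
        }
    }
    return '';
}
```

In a page, effectiveLang(document.querySelector('p')) returns the language code the paragraph inherits, which you can then compare with the lists of supported languages below.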

Firefox currently supports the following languages (depending on its version): Afrikaans (af), Bulgarian (bg), Catalan (ca), Swiss German (Traditional Orthography, de-CH), Danish (da), Dutch (nl), English (United States, en-US), Estonian (et), Finnish (fi), French (fr), Galician (gl), German (Traditional Orthography, de-1901 and Reformed Orthography, de-1996), Hungarian (hu), Icelandic (is), Italian (it), Kurmanji (kmr), Latin (la), Lithuanian (lt), Mongolian (mn), Norwegian Bokmål (nb), Norwegian Nynorsk (nn), Portuguese (pt), Russian (ru), Serbo-Croatian (sh), Slovenian (sl), Spanish (es), Swedish (sv), Turkish (tr), Ukrainian (uk), Upper Sorbian (hsb), and Welsh (cy).
Two artificial languages are already supported: Esperanto (eo) and Interlingua (ia).
I can't wait for Sindarin (sjn), Quenya (qya) or even Klingon (tlh) to surface here (has anyone ever defined Klingon hyphenation rules?).

Internet Explorer 10 supports Catalan (ca), Czech (cs), Danish (da), Dutch (nl), English (en, en-US), French (fr), Italian (it), Norwegian Bokmål (nb), Norwegian Nynorsk (nn), Polish, Portuguese (pt), Brazilian Portuguese, Russian (ru), Spanish (es), Swedish (sv) and Turkish (tr).

Javascript hyphenation


Hyphenator is a JavaScript library that does it all on the client side. The basic usage is as straightforward as linking to the script, setting up the Hyphenator parameters and calling it on the text string to hyphenate. Here's an example with some neat parameters: useCSS3hyphenation hands over to CSS3 hyphenation whenever possible, and safecopy lets the text be copied and pasted without the soft hyphens. Another parameter, hyphenchar, lets you choose the separating character (the HTML-encoded soft hyphen by default).
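Hyphenator.config({
    minwordlength: 2,          // shortest word length that gets hyphenated
    remoteloading: false,      // do not fetch the language pattern files automatically
    useCSS3hyphenation: true,  // hand over to CSS3 hyphenation whenever possible
    safecopy: true             // keep soft hyphens out of copied text
});
Hyphenator.hyphenate('Please, hyphenate me, I am too long: pseudopseudohypoparathyroidism!');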

There are other ways to use Hyphenator, such as running it as a bookmarklet or hyphenating different languages on the same page.

At the time of writing, version 4.0.0 of the library already supports the following languages: Armenian, Belarusian, Bengali, Catalan, Czech, Danish, Dutch, English (US and GB), Finnish, French, German, Greek (monoton, polyton and ancient), Gujarati, Hindi, Hungarian, Italian, Kannada, Latin, Latvian, Lithuanian, Malayalam, Norwegian, Oriya, Panjabi, Polish, Portuguese, Russian, Slovak, Slovenian, Spanish, Swedish, Tamil, Telugu, Turkish, and Ukrainian.

Server-side hyphenation


The same hyphenation algorithms can be applied on the server side to insert soft hyphen HTML codes (&shy;) at every possible break point in the words. However, as the display is handled by the browser, it is better to let the browser take care of hyphenation too (besides, this avoids the copy/paste side effect where all the potential hyphens show up in the copied text).
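
To make the soft hyphen idea concrete, here is a small sketch that could run on the server (with Node.js, for instance). It is not a hyphenation algorithm: the break points are handed in by the caller, whereas a real hyphenator derives them from language-specific patterns. The function names are made up for this post. The second function shows one way to clean the soft hyphens out of a string again, which is what the copy/paste problem comes down to.

```javascript
const SOFT_HYPHEN = '\u00AD'; // the character that &shy; stands for

// Inserts a soft hyphen before each index listed in `breakPoints`.
// Assumption: the caller already knows where the word may be broken.
function insertSoftHyphens(word, breakPoints) {
    const sorted = [...breakPoints].sort((a, b) => a - b);
    let result = '';
    let previous = 0;
    for (const index of sorted) {
        if (index <= previous || index >= word.length) {
            continue; // ignore duplicates and positions outside the word
        }
        result += word.slice(previous, index) + SOFT_HYPHEN;
        previous = index;
    }
    return result + word.slice(previous);
}

// Removes every soft hyphen, e.g. before the text is copied elsewhere.
function stripSoftHyphens(text) {
    return text.split(SOFT_HYPHEN).join('');
}
```

For example, insertSoftHyphens('hyphenation', [2, 6]) marks the word as hy, phen, ation, and stripSoftHyphens gives the plain word back.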

Conclusion


With these three different ways to handle automatic hyphenation, there is no longer any excuse for text that does not behave beautifully, even in a small space. My favorite method is the Hyphenator JavaScript library, as it gracefully hands over to CSS3 hyphenation whenever the browser supports it.


Les césures automatiques (in French)
A hifenização automática (in Portuguese)
Los guiones de corte automático (in Spanish)
