When I first took up the subject of Unicode implementation and support in 2001, I was dismayed to report that Unicode had made almost no difference for typical multilingual end-users because very few consumer applications supported Unicode in any meaningful way. When I picked up the topic again last year, the situation had improved dramatically, and I am happy to report that this improvement has accelerated over the last year, with Unicode support and implementation now increasingly robust.
For a long time, the problem with Unicode had been that the hard work of back-end engineering had to take place before really useful user-facing implementations of Unicode could appear. It didn’t matter from the user perspective if the OS (operating system) was entirely Unicode “under the hood” if that support was not accessible to the end user. For a long time, even such staples as Adobe Illustrator, Quark XPress and Microsoft Office (on the Mac at least) did not support Unicode in any meaningful way.
Fortunately, that has all begun to change (well, almost all: Quark XPress still does not support Unicode, but more on that later), and recent months have seen really impressive gains in support for Unicode and OpenType, which is at present the only really commercially viable implementation of advanced Unicode type technology.
So let’s look at the progress that has been made…
Most major type foundries are fully engaged with OpenType, although there are some strange exceptions (such as the release of the beautiful Optima Nova in PostScript format). It really makes sense for foundries to supply OpenType fonts since it frees them from having to deliver platform-specific files (although Mac OS X can read Windows fonts, so this is less of an issue than it was a few years ago). It also lets them include lots of the “goodies” that users are coming to expect. Most of the fonts released do not support the “extra” characters needed to support non-Western European locales, but the move to OpenType is a good start, and for those who do need extended character sets, Adobe’s library of “Pro” fonts is growing and provides a good variety.
Although I have mentioned it before, it bears repeating that FontLab 4.6 is a good consumer-oriented font editing program that supports 99% of the OpenType features most consumers will ever want to deal with. And for those who want a beautiful typeface with an extensive (and growing) set of Roman, Greek and Cyrillic characters, check out Gentium. Gentium does not yet support any advanced OpenType features, but its designer, Victor Gaultney, is working on adding these capabilities to the font, and, best of all, it’s entirely free for most uses!
There is little to report in terms of improvements in Unicode fonts bundled with the Microsoft and Apple operating systems since last year: both had excellent support then. The only major change is that Apple’s rendering for some “exotic” scripts such as Devanagari has improved dramatically from sub-par to very good, and Apple has added support for some scripts that have regional importance such as Cherokee and Canadian Aboriginal characters (which have official standing in the Canadian province of Nunavut, and thus may be important to companies doing business in Canada).
One improvement is that Mac users (a stubborn lot as I can attest, having been a Mac user since 1984) are finally getting the message that OS 9 is really dead and that they need to move to OS X. Many of the limitations with regard to Unicode seen in Mac applications have had to do with providing backward compatibility with OS 9, which had very poor support for Unicode. As developers have stopped worrying about OS 9 compatibility, they have been able to take full advantage of OS X’s multilingual support, and the results have been very positive. The release of Quark 6 (which is OS X native) removed the most substantial obstacle to OS X adoption in the broader Mac community, and uptake of OS X has been quite strong since then.
There is also little new to report with regard to Unicode support in web browsers. Microsoft is no longer developing Internet Explorer for Macintosh, and Safari has really taken over the Macintosh browser user base. Since Safari has excellent Unicode support (which Internet Explorer for Macintosh did not), as does Internet Explorer for Windows, Unicode web browsing is now a reality for most users, unless their companies have typical IT policies that force them to use software long after it is outdated.
Microsoft’s press for Microsoft Office 2004 for Macintosh curiously neglected the single most significant upgrade to the software from older versions (from a multilingual user’s standpoint, anyway). Office 2004 now has excellent support for Unicode, and supports all of OS X’s input methods.
Microsoft has put up pictures and demos of a number of new features in Office 2004, and Unicode support does make it onto the highlights page, but it is relegated to the status of such mundane features as automatic error reporting. I’m sure that adding Unicode support required much more engineering effort than many of the higher-billed features, and the fact that it is listed so far down should tell us that multilingual support is not as important to the public at large as we might like, but I am nonetheless very impressed by the scope and capabilities of this upgrade.
Unicode output support in Word 2004 (Mac) and Word X (Mac). Note that most characters would be unavailable in Word X.
The only limitations I have found are that some complex scripts (notably Devanagari) are not supported and Office won’t display them at all, replacing the characters with boxes (a much better behavior than trying to interpret the characters as belonging to another code space, which is what would have happened only a few years ago). Office 2004 also includes right-to-left support for Arabic, but it does not seem to support contextual rendering of Arabic glyphs, which limits the usability of that support.
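For readers who want to see the difference between these behaviors concretely, here is a minimal Python sketch. It is not how Office works internally. The sample word, the `no_devanagari` capability test and the choice of Latin-1 as the "wrong" code page are all assumptions made purely for illustration. The first function simulates the old failure, in which bytes are read as though they belonged to a single-byte code page. The second shows the box-character fallback. The last lists the four positional shapes of one Arabic letter, which is the kind of choice a text engine with contextual rendering has to make for each letter.

```python
import unicodedata

# Hypothetical sample text: the word "Hindi" written in Devanagari.
SAMPLE = "\u0939\u093f\u0928\u094d\u0926\u0940"

BOX = "\u25a1"  # WHITE SQUARE, used here as the stand-in "box character"


def misread_as_single_byte(text, code_page="latin-1"):
    """Simulate the old failure: UTF-8 bytes decoded as a one-byte code page."""
    return text.encode("utf-8").decode(code_page)


def box_fallback(text, is_supported):
    """Replace every character the (hypothetical) engine cannot draw with a box."""
    return "".join(ch if is_supported(ch) else BOX for ch in text)


def no_devanagari(ch):
    """Assumed capability test: anything outside the Devanagari block is drawable."""
    return not ("\u0900" <= ch <= "\u097f")


def contextual_forms_of_beh():
    """Names of the four positional shapes of ARABIC LETTER BEH (U+0628)."""
    return [unicodedata.name(chr(cp)) for cp in range(0xFE8F, 0xFE93)]


if __name__ == "__main__":
    # Garbage: each Devanagari character turns into three unrelated characters.
    print(repr(misread_as_single_byte(SAMPLE)))
    # Honest fallback: the Roman text survives, the unsupported script becomes boxes.
    print(box_fallback("Hindi: " + SAMPLE, no_devanagari))
    # Isolated, final, initial and medial shapes of a single letter.
    for name in contextual_forms_of_beh():
        print(name)
```

The box fallback loses the text visually but keeps the underlying characters intact, whereas the misreading silently produces plausible-looking nonsense. That is why the box behavior, unhelpful as it looks, is the safer of the two.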
While some of the changes from Office X were widely derided in the press as being mere “eye candy,” the sorely-needed overhaul of the text engine in Office 2004 means that users in cross-platform multilingual shops should run, not walk, to upgrade to Office 2004. (Of course Windows users can be smug and point out that Office for Windows has had good support for Unicode for a number of years now.)
What can be said about Quark XPress? Quark just doesn’t seem to get Unicode or OpenType. It still charges more than twice the price of an English copy of Quark XPress for the “Passport” version that adds support for additional Western European languages (a capability that comes by default in InDesign). There is still no Unicode support at all, and the non-Roman versions now run a full two versions behind the English versions. (I understand that sim-ship might not be possible, but to have the non-Roman versions stuck at a version that is now nine years old is astounding.) I find it mystifying that Quark hasn’t jumped on Unicode since it would help cement contracts with some of its largest customers who require extensive multilingual support. Whatever happens, Quark still retains a large legacy base, and localization companies will be forced to deal with Quark’s lack of international support for some time to come.
Adobe InDesign is Quark’s primary competitor for the high-end DTP market, and it has been growing very aggressively. The newest release, called CS (Creative Suite), which came out in October 2003, does not add significantly to the already-excellent Unicode support, except that Mac users will now find that almost all of the input methods (keyboard layouts) work. (With InDesign 2, the only ones that worked were those that were essentially non-Unicode OS 9 input methods; Windows users did not have a similar limitation.) For users dealing with Japanese, the Japanese version adds a number of features specific to Japanese publishing (vertical layout grids, Japanese binding edges, Japanese-specific text features, etc.), but both the Japanese and Roman versions share a single file format, meaning that files can be freely moved back and forth between them, unlike Quark with its one-way trip into Asian versions.
Adobe PhotoShop and Illustrator
While Adobe’s release of the Creative Suite in October did not significantly impact InDesign with regard to Unicode support, it made a huge difference in PhotoShop and Illustrator. Prior to the CS versions, both PhotoShop and Illustrator had very limited multilingual and Unicode support, treating most OpenType fonts as though they were old-fashioned, single-byte fonts. One of Adobe’s major accomplishments in the CS versions was the harmonization of the text engines among InDesign, Illustrator and PhotoShop. The three programs previously had very different text engines that had arisen over the course of each product’s history. Although the text engines are not entirely the same (and there are some minor interface inconsistencies between the programs), they all now support Unicode and OpenType, including many advanced OpenType features, as shown below:
Support for OpenType/Unicode in Adobe Illustrator CS
Users of Illustrator working with CJK (Chinese, Japanese, Korean) text may initially wonder where CJK support has gone since the CJK features visible in earlier versions of Illustrator appear to have vanished. The support has, in fact, greatly improved and now includes the ability to create virtual composite fonts. However, the display of Asian-specific text options is turned off by default. To turn it on, go to the Type & Auto-Tracing preference panel where Show Asian Options can be turned on.
After my last article, we received a few letters about limitations in Unicode support for Adobe Acrobat. The most substantial complaint was that the copy/paste operations out of Acrobat did not support Unicode, and there was no convenient way to extract non-Roman text from PDF files. Fortunately this has changed with Acrobat 6, which fully supports Unicode in copy and paste.
In addition, previous versions of Acrobat did not support search using Unicode characters, but version 6 does support Unicode in the search function, including full support for Arabic and Devanagari. Unfortunately, the designers of the user interface (UI) did not really take international needs into account, and there is no way to enlarge the box in which search strings are entered or to increase the font size of search strings. While the box is (barely) adequate for Roman text, many non-Roman scripts are essentially illegible at such a small size. This, of course, points out the need to consider international concerns in UI design in places developers may not consider. (As a workaround, I suggest typing search strings in another application, and then copying and pasting them into the Acrobat search dialog. Not an elegant solution, to be sure, but it gets the job done.)
Acrobat 6.0 search dialog showing Devanagari text. Note that the small size of the text renders it essentially illegible.
The reality promised by Unicode is becoming more and more apparent as companies move to adopt Unicode and make use of it to solve their multilingual needs. Although we will continue to see holdouts and will need to support legacy file formats for many years, Unicode is already impacting the tasks we do on a daily basis. It will continue to simplify processes and eliminate the troublesome character-conversion roadblocks that we used to deal with every day.
By Arle Lommel