Loquendo
Interview with Richard Ishida by Paolo Baggia (Loquendo) - April 2006
[Paolo Baggia] I'm looking at your business card and I see you are the W3C Lead of Internationalization Activity. What does that mean exactly?

[Richard Ishida] The W3C Internationalization Activity has the mission of ensuring universal access to the Web, regardless of language, script or culture. It does this by proposing & coordinating any techniques, conventions, guidelines and activities within the W3C that help to make and keep the Web international. We tend to refer to ourselves as the i18n Activity, using the abbreviation for 'internationalization' that is used widely in the industry and means 'i' + 18 letters + 'n'.

There are two W3C staff working on internationalization at the W3C, myself and Felix Sasaki, but we are ably assisted by numerous participants from W3C member organizations and Invited Experts.
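
The abbreviation rule described above ('i' + 18 letters + 'n') can be sketched in a few lines of Python. This is only an illustration: the function name numeronym is an assumption made for this example and does not come from the interview or from any W3C material.

```python
def numeronym(word: str) -> str:
    """Abbreviate a word as first letter + number of inner letters + last letter."""
    if len(word) < 3:
        # No inner letters to count, so leave very short words unchanged.
        return word
    return f"{word[0]}{len(word) - 2}{word[-1]}"


print(numeronym("internationalization"))  # i18n
print(numeronym("localization"))          # l10n
```

The same rule gives 'l10n' for 'localization', another abbreviation in common use in the industry.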

[P. B.] On the reverse of your business card I see 'W3C' followed by words in a great many languages. How many languages? Why?

[Richard Ishida] Well, there are 12 languages (Arabic, Hebrew, Inuktitut, Chinese, Kazakh, Hindi, Greek, Japanese, Korean, Panjabi, Telugu, Thai) on the back of the card, each written in a different script. It says, "Richard Ishida, Internationalization Activity, W3C" in each language. I actually have 18 languages at http://people.w3.org/rishida/articles/phrases, but I couldn't fit more than 12 on the card.

[Image: Richard Ishida's business card]

The idea is to remind people that English, despite its widespread use, is still just one of many languages used on the Web. It usually provides a great talking point, and people often try straight away to see how many languages or scripts they can recognize.

By the way, you'll also see that on the front, in addition to my Japanese name, I have a French snail mail address, a US-based main telephone number and a UK mobile phone number. So I try to live the internationalization experience ;-)

[P. B.] When did you start to focus on the Internationalization of the Web?

[Richard Ishida] My initial focus on internationalization per se began in the early 90s while working for the localization group at Xerox. We had to struggle to translate user interfaces for the large Xerox printers and copiers, but many of the problems we faced were caused by incorrect assumptions or a lack of flexibility built into the products by the developers and designers. Unfortunately, they were unaware that they were even causing these problems, since they never got involved in the localization process.

So I began feeding back information about how they could change their approach to design. That eventually became a full-time occupation, since internationalization wasn't something on many people's radar in those days.

By the late 90s, the Internationalization Working Group, and then Activity, had been established at the W3C, with Misha Wolf as the Working Group Chair, and Martin Dürst as the team contact. As the work at the W3C expanded, Martin needed help, so I joined the W3C in 2002.

[P. B.] What are the goals of this work?

[Richard Ishida] Our goal is to make the World Wide Web worldwide!

We review the technologies coming out of the W3C for internationalization related issues and feed those back to the Working Groups. We also develop our own specifications in i18n-related areas, such as the Character Model for the World Wide Web (guidelines for specification writers and implementers), the Ruby Annotation Specification, etc. We also try, at an early stage, to help Working Groups understand international requirements, and we develop material on the W3C Internationalization subsite (http://www.w3.org/International/) to help content authors and users of W3C technologies to better understand and use the international aspects of W3C technology.

[P. B.] What were the major successes? Were there any failures or missed opportunities?

[Richard Ishida] One major success was to establish Unicode as the base character set for all W3C specifications. This was largely down to the hard work of Martin Dürst and his colleagues, before I arrived on the scene. Having Unicode as the document character set in HTML and XML really simplifies things tremendously.

I can't think of many failures, but there is certainly still much to do. I often feel like the platoon leader who is completely surrounded and says "Right, chaps. We now have the enemy right where we want them. We can fire in any direction!".

[P. B.] What projects are under development these days?

[Richard Ishida] The Core Working Group is busy finishing off Character Model parts dealing with normalization on the Web and resource identifiers, as well as looking at locale identification in Web Services. They have recently commented on Working Drafts for the Pronunciation Lexicon Specification, CSS3 Selectors, Arabic mathematical notation, Extensible MultiModal Annotation markup language, and Web Content Accessibility Guidelines 2.0, but we already have plenty more on our radar (http://www.w3.org/International/core/reviews).

The GEO (Guidelines, Education & Outreach) Working Group continues to produce articles, tutorials and best practices documents that are available from http://www.w3.org/International/, and has been working recently on improving access to the now pretty large number of available articles. The new specification for declaring language values has just been approved by the IETF, and we will be busy very soon creating material to explain how that should be used, as well as looking at security issues surrounding the use of multiple scripts and languages in Web addresses.

And the ITS (Internationalization Tag Set) Working Group should hopefully publish a Last Call Working Draft that proposes standardised tags for inclusion in DTDs, XML Schemas and RelaxNG. These markup conventions should ensure that XML formats can be used by people around the world (for example, Arabic and Hebrew documents will need direction-related markup), but they are also targeting efficient localization. This strong focus on localization is relatively new for the Activity, and we are excited to have a number of excellent contributors from the localization field involved in this work.

[P. B.] Are there resources to learn more about this work?

[Richard Ishida] We hope that the Internationalization subsite http://www.w3.org/International/ contains a lot of useful information for people, and we encourage people to let us know if they don't find what they are looking for there.

We also have a public mailing list (http://lists.w3.org/Archives/Public/www-international/) which people can auto-subscribe to, and where they can ask questions and receive notification of new materials available.
Alternatively you could subscribe to the RSS feed for the Activity home page (recently converted to blog format) to receive notification of new materials, translations, etc.

By the way, we are always happy to receive translations of the material on our site. There are instructions at http://www.w3.org/International/2004/06/translation if you are interested.

And, particularly if you work for an organization that is a member of the W3C, if you feel you can help out, we'd always like to hear from you. For example, we always want help with reviewing W3C Working Drafts.

[P. B.] I'd like to move the discussion on to speech technology, the area I know better. Do you think this area is affected by the problems of internationalization?

[Richard Ishida] Well, it's great to see that Loquendo is concerned with such issues. Certainly it is important to ensure that you establish from the outset what language you are dealing with in speech technology. We were very happy that Loquendo and other SSML folks took the incorporation of xml:lang attributes into their markup so seriously.
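
To make the xml:lang point concrete, here is a minimal sketch of how a consumer of SSML might read the declared language of each sentence using Python's standard library. The sample document, the constant names and the function sentence_languages are assumptions made for this illustration; they are not taken from the interview or from any Loquendo product. The sketch only falls back to the language declared on the root element, which is a simplification of the full xml:lang inheritance rules.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes xml:lang under the reserved XML namespace.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"
SSML_NS = "{http://www.w3.org/2001/10/synthesis}"

# Hypothetical sample: an English document containing one Italian sentence.
SAMPLE_SSML = """<speak version="1.0"
       xmlns="http://www.w3.org/2001/10/synthesis"
       xml:lang="en-US">
  <s>Good morning.</s>
  <s xml:lang="it-IT">Buongiorno.</s>
</speak>"""


def sentence_languages(ssml_text):
    """Return (text, language) pairs for each <s> element.

    Simplification: a sentence without its own xml:lang takes the value
    declared on the root element; intermediate ancestors are not checked.
    """
    root = ET.fromstring(ssml_text)
    default_lang = root.get(XML_LANG)
    return [(s.text, s.get(XML_LANG, default_lang))
            for s in root.iter(SSML_NS + "s")]


for text, lang in sentence_languages(SAMPLE_SSML):
    print(lang, text)
```

Knowing the language of each span up front lets a synthesizer choose the appropriate voice and pronunciation rules before it starts speaking.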

I was at a workshop in Beijing last November where we invited people to comment on language-specific improvements that could be incorporated into the specification, and we had a number of very interesting proposals. One related to dealing with Chinese, which has no spaces or other ways of marking word boundaries. There are some circumstances where the actual locations of word boundaries are ambiguous - so the same sentence could be read in more than one way. So the Chinese participants were asking for markup that could be used, specifically in cases of ambiguity, to identify the word boundaries.
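
The word-boundary ambiguity can be illustrated with a small sketch that lists every way of splitting an unspaced string into words from a lexicon. The sample sentence is a commonly cited textbook example and the tiny lexicon is hypothetical; neither comes from the workshop, and real segmenters use far larger dictionaries and statistical models.

```python
def segmentations(text, lexicon):
    """Return every way to split text into words that all appear in lexicon."""
    if not text:
        return [[]]  # one way to segment the empty string: no words
    results = []
    for end in range(1, len(text) + 1):
        word = text[:end]
        if word in lexicon:
            # Keep this word and segment whatever follows it.
            for rest in segmentations(text[end:], lexicon):
                results.append([word] + rest)
    return results


# Hypothetical toy lexicon, just large enough to show the ambiguity.
TOY_LEXICON = {"研究", "研究生", "生命", "命", "起源"}

for reading in segmentations("研究生命起源", TOY_LEXICON):
    print(" | ".join(reading))
```

With this toy lexicon the same six characters split two ways, roughly "research | life | origin" and "graduate student | fate | origin". Markup that states the intended boundaries would let an author settle such cases explicitly.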

There were other points raised about prosody and local phonetic alphabets, as well as an interesting contribution from some Polish folks who were talking about the difficulties of dealing with de-accented text in things like email and chat programs.
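
The de-accented text problem can be sketched as follows: stripping the diacritics from Polish words makes distinct words collapse into the same spelling, which a speech system then has to disambiguate. The function name deaccent and the sample words are assumptions chosen for this illustration, not material from the workshop. Note that the letter 'ł' has no Unicode decomposition, so the sketch maps it by hand.

```python
import unicodedata

# 'ł' and 'Ł' do not decompose under NFD, so they are mapped explicitly.
_NO_DECOMPOSITION = str.maketrans({"ł": "l", "Ł": "L"})


def deaccent(text):
    """Strip diacritics, imitating text typed without Polish letters."""
    decomposed = unicodedata.normalize("NFD", text.translate(_NO_DECOMPOSITION))
    # Drop the combining marks that NFD separated from their base letters.
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


print(deaccent("sąd"), deaccent("sad"))      # 'court' and 'orchard'
print(deaccent("łaska"), deaccent("laska"))  # 'grace' and 'cane'
```

Once the accents are gone, both members of each pair are spelled identically, so the original word can only be recovered from context.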

I'm looking forward to the next workshop, scheduled for May in Crete, and hoping that we'll get further insights into local needs, particularly from Middle Eastern and South and South-East Asian perspectives.

[P. B.] Do you have any advice to offer me and other people in the W3C that are interested in voice and multimodal interaction?

[Richard Ishida] I think you need to always keep your mind open to the idea that people with very different cultures and languages will want to use your technology or your formats. We all know that this is called the 'World Wide' Web, but it's easy to forget that sometimes and get wrapped up in your local issues.

Try, if you can, to develop on a foundation of internationalised technologies, such as Unicode, and ensure that you get requirements from people in multiple countries and test with people in multiple countries wherever possible. And try to find out about how other cultures and languages work. That's just great fun, as well as useful.

Even if you think your solution or approach is only going to be used by the local population, sit back from time to time and think whether you can make things more flexible. Many times I've seen people produce great solutions for needs that they later recognise in other countries and languages too, but they developed themselves into a situation where the reengineering costs to adapt their approach for other cultures were too high.

I'd also say, as I always do, that the W3C doesn't own the World Wide Web. If there are things you'd like to see done better, please get involved and help us.

[P. B.] Many thanks. I know you are also a good photographer. Would you like our readers to take a browse through your pictures?

[Richard Ishida] Thanks. I don't know that they're that good, but I like to share the nice things I've been lucky enough to see around the world. It's really great that digital cameras and the Web now make that possible in a way I always dreamed of since I was little. I think of my photos on http://www.flickr.com/photos/ishida/ as a way of expressing thoughts about life just as others do through diaries and the like, but through a newly empowered, visual medium. I'm very much a visual person, myself.


Richard Ishida

Richard Ishida joined the W3C (World Wide Web Consortium) in July 2002 to help expand the work of the Internationalization Activity, particularly in the area of guidelines, education and outreach. His role at the W3C is to help make the World Wide Web worldwide! He is the Internationalization Activity Lead, and chair and staff contact for the GEO Working Group (Internationalization Guidelines, Education and Outreach). He participates in the ITS (Internationalization Tag Set) and the Internationalization Core Working Groups and is on the committee of several conferences on internationalization.

Prior to joining the W3C, he was an internationalization consultant attached to Xerox, evangelizing and educating people with regard to the international design and localizability of user interfaces and documents. He also worked on translation tool design.

You can find out more about his work on: http://www.w3.org/People/Ishida/