MULTILINGUALWEB

Standards and best practices for the Multilingual Web

Takeaways from the first two W3C Workshops “Multilingual Web” related to W3C ITS

The first two events related to the Thematic Network “Multilingual Web” provided a couple of opportunities to share information on the W3C Internationalization Tag Set (ITS). Presentations in which ITS was mentioned included:

a. Best Practices and Standards for Improving Globalization-related Processes

b. W3C Internationalization Tag Set (ITS)

c. The Bricks to Build Tomorrow's Translation Technologies and Processes

d. Using ITS in the Common Content Formats

The workshop in Pisa in particular prompted a couple of interesting ITS-related thoughts:

1. Several speakers mentioned that it would be good if content could be categorized in a standard way as "Generated by Machine Translation (MT)". I guess there are various ways of looking at this from an ITS point of view:

  • a. an additional data category with semantics such as "generatedBy"
  • b. a special, BCP47-compliant value for the existing ITS data category "Language Information"; that special value may actually be a composite one, since there may be a need to capture things like the following:
    1. Name of the MT system that generated the content
    2. Quality of the input
    3. (Semi-)official quality rating of the system (BLEU score or the like)
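To make option b a bit more concrete, here is a minimal Python sketch of how such a composite value might be built and read back using BCP 47 private-use subtags (the part after "x"). The "mt" marker, the "bleu" prefix, the system name "acme" and both function names are assumptions made up for illustration; none of them is defined by ITS or BCP 47. The sketch also assumes the incoming language tag has no private-use part of its own.

```python
import re

# BCP 47 limits each private-use subtag to 1-8 ASCII letters or digits.
_SUBTAG = re.compile(r"[A-Za-z0-9]{1,8}")


def tag_as_mt(language, system, bleu=None):
    """Append hypothetical MT provenance to a language tag as private-use subtags."""
    extra = ["x", "mt", system]
    if bleu is not None:
        extra.append(f"bleu{bleu}")
    for subtag in extra:
        if not _SUBTAG.fullmatch(subtag):
            raise ValueError(f"not a valid BCP 47 subtag: {subtag!r}")
    return "-".join([language] + extra)


def read_mt_info(tag):
    """Return the MT provenance found in a tag built by tag_as_mt, or None."""
    base, sep, private = tag.partition("-x-")
    parts = private.split("-") if sep else []
    if len(parts) < 2 or parts[0].lower() != "mt":
        return None
    info = {"language": base, "system": parts[1], "bleu": None}
    for part in parts[2:]:
        if part.lower().startswith("bleu") and part[4:].isdigit():
            info["bleu"] = int(part[4:])
    return info


if __name__ == "__main__":
    tag = tag_as_mt("de", "acme", bleu=32)
    print(tag)
    print(read_mt_info(tag))
```

The eight-character limit per subtag means that things like a long system name would need an agreed abbreviation, which is one argument for a dedicated data category (option a) instead.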

2. Several speakers explained that it would be good if content could be categorized in a standard way as "OK to be submitted to Natural Language Processing (NLP)". Example: In order to build models for statistical Machine Translation the Web is deemed to be an invaluable resource. However, some uncertainty seems to exist whether this use of Web-based content would be permitted or not. A standardized categorization could help. I guess there are various ways of looking at this from an ITS point of view:

  • a. an additional data category with semantics such as "nlpOK"
  • b. something similar to the existing ITS data category "Localization Note" (namely one that captures information for machine processing, not for human consumption; see the discussion).
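As a rough illustration of option a, the following Python sketch shows how a crawler might honor such a flag before adding text to a training corpus. The attribute name its:nlpOK, its "yes"/"no" values, and the rule that the value is inherited by child elements are all assumptions; no such data category exists in ITS. Only the namespace URI is the real ITS one.

```python
import xml.etree.ElementTree as ET

ITS_NS = "http://www.w3.org/2005/11/its"  # ITS namespace
NLP_OK = f"{{{ITS_NS}}}nlpOK"             # hypothetical attribute, not part of ITS


def nlp_permitted_text(xml_source, default=False):
    """Collect text whose nearest its:nlpOK value (self or ancestor) is "yes"."""
    collected = []

    def walk(element, allowed):
        value = element.get(NLP_OK)
        if value is not None:
            # A local value overrides whatever was inherited.
            allowed = (value == "yes")
        if allowed and element.text and element.text.strip():
            collected.append(element.text.strip())
        for child in element:
            walk(child, allowed)
            # Tail text follows the child but belongs to this element.
            if allowed and child.tail and child.tail.strip():
                collected.append(child.tail.strip())

    walk(ET.fromstring(xml_source), default)
    return collected


if __name__ == "__main__":
    doc = (
        '<doc xmlns:its="http://www.w3.org/2005/11/its" its:nlpOK="yes">'
        "<p>Free to use.</p>"
        '<p its:nlpOK="no">Keep out.</p>'
        "</doc>"
    )
    print(nlp_permitted_text(doc))
```

Defaulting to "not permitted" when nothing is declared is a design choice of this sketch; a real data category would have to state its default explicitly.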

3. Charles McCathieNevile mentioned the addition of the notion of a default locale to the Widget Packaging and Configuration specification (see http://www.w3.org/TR/widgets/#widget-package ). This made me wonder if "defaultLocale" might not be something that could be useful in quite a number of contexts - and thus would be a candidate for an additional ITS data category. The Widget document actually prompted another localization-related thought (namely that the Widget document should be required reading for anyone who works on standardized packaging for translation-related processes).
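To show why a declared default locale is handy, here is a minimal Python sketch of a locale lookup that falls back to it. This is not the algorithm from the Widget document; it is a simplified truncation-based lookup, and the function name pick_locale and its parameters are made up for illustration.

```python
def pick_locale(preferred, available, default_locale):
    """Choose a locale by progressively truncating each preferred tag."""
    # Compare case-insensitively, but return the tag as the content spells it.
    lookup = {tag.lower(): tag for tag in available}
    for wanted in preferred:
        candidate = wanted.lower()
        while candidate:
            if candidate in lookup:
                return lookup[candidate]
            # Drop the last subtag: "de-at" becomes "de", "de" becomes "".
            candidate = candidate.rpartition("-")[0]
    # Nothing matched: the declared default locale decides.
    return default_locale


if __name__ == "__main__":
    print(pick_locale(["de-AT", "fr"], ["en", "de", "fr"], "en"))
    print(pick_locale(["ja"], ["en", "de"], "en"))
```

Without a declared default, the second call would have no principled answer, which is the gap a "defaultLocale" data category could close.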

P.S.: The above is similar to a post to the mailing list for the W3C ITS Interest Group.