Standards and best practices for the Multilingual Web
Today, the World Wide Web is fundamental to communication in all walks of life. As the share of English web pages decreases and that of other languages increases, it is vitally important to ensure the multilingual success of the World Wide Web.
The MultilingualWeb initiative is looking at best practices and standards related to all aspects of creating, localizing and deploying the Web multilingually. The project aims to raise the visibility of existing best practices and standards and to identify gaps. The core vehicle for this is a series of four events planned over a two-year period.
On 15-16 March 2012 the W3C ran the fourth workshop in the series, in Luxembourg, entitled "The Multilingual Web – The Way Ahead". The workshop was hosted by the Directorate-General for Translation (DGT) of the European Commission. Piet Verleysen, European Commission Resources Director responsible for IT at the Directorate-General for Translation, gave a brief welcome address.
As with the previous workshops, the aim was to survey, and introduce people to, currently available best practices and standards aimed at helping content creators, localizers, tools developers, and others meet the challenges of the multilingual Web. The key objective was to share information about existing initiatives and begin to identify gaps.
Like the workshop in Limerick, this event ran for one and a half days, with the final half day dedicated to an Open Space discussion forum in breakout sessions. Participants pooled ideas for discussion groups at the beginning of the morning, split into seven breakout areas, and reported back in a plenary session at the end of the morning. Participants could join whichever group they found interesting, and could switch groups at any point. During the reporting session, participants in other groups could ask questions about or comment on each group's findings. This once more proved to be a popular part of the workshop. The final attendance count for the event was a little over 130.
As for previous workshops, we video-recorded the presenters and, with the assistance of VideoLectures, made the videos available on the Web. We were unable to stream the content live over the Web. We also once more provided live IRC scribing to help people follow the workshop remotely and to assist participants in the workshop itself. As before, people tweeted about the conference and the speakers during the event, and you can see these tweets linked from the program page.
The program and attendees continued to reflect the same wide range of interests and subject areas as in previous workshops, and we once again had good representation from industry (content- and localization-related) as well as research.
In what follows, after a summary of key highlights and recommendations, you will find a short summary of each talk accompanied by a selection of key messages in bulleted form. Links are also provided to the IRC transcript (taken by scribes during the meeting), video recordings of the talk (where available), and the talk slides. All talks lasted 15 minutes. Finally, there are summaries of the breakout session findings, most of which were provided by the participants themselves. We strongly recommend watching the videos, where available; they are short but carry much more detail.
What follows is an analysis and synthesis of ideas brought out during the workshop. It is very high level, and you should watch or follow the individual speakers' talks to get a better understanding of the points made.
Our keynote speaker, Ivan Herman, described in a very accessible way the various technologies involved in the Semantic Web and the kinds of use cases they address. Although easy to follow, the talk was comprehensive and gave a good idea of the current status of the various technologies. He then posited some areas where the Semantic Web and the Multilingual Web communities could benefit each other. The Semantic Web has powerful technologies for categorizing knowledge, describing concepts, and interlinking information in different languages; these may help with tasks such as linking related information across languages and supporting translation. The Multilingual Web community can offer advice on matters such as how to describe the language of a literal or resource, how to refer to concepts across multilingual instantiations, and how to conceptualise the world across different cultures. It can also help to address the problem of Internationalized Resource Identifier (IRI) equivalence.
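To make the first of those advice points concrete, here is a minimal, hypothetical sketch (not taken from the talk) of how the language of a literal can be conveyed in RDFa-annotated HTML; the example.org URI and the labels are invented for illustration.

```html
<!-- In RDFa, the language of a plain literal is taken from the ordinary
     HTML lang attribute, so the two labels below become the literals
     "Luxembourg"@en and "Lëtzebuerg"@lb on the same resource. -->
<div vocab="http://www.w3.org/2000/01/rdf-schema#"
     resource="http://example.org/city/luxembourg">
  <span property="label" lang="en">Luxembourg</span>
  <span property="label" lang="lb">Lëtzebuerg</span>
</div>
```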
In the Developers session we heard from Jan Anders Nelson about the importance of apps on Windows 8, and saw a demo of tools that Microsoft is making available to app developers to support localization into an impressive number of languages. We also heard that Microsoft is paying greater attention these days to linguistic sub-markets, such as Spanish in the USA. Tony Graham introduced attendees to XSL-FO in a very entertaining way, and went on to make the case that community support is needed to uncover the requirements for text layout and typographic support on the Web if XSL-FO is to serve users around the world. He raised the idea of setting up a W3C Community Group to address this, and called on people around the world to take action.
From Richard Ishida and Jirka Kosek we heard about various useful new features for multilingual text support being implemented or discussed in HTML5. These features are still being developed, and participants were encouraged to take part in reviewing and discussing them. They include better support for bidirectional text in right-to-left languages, inline annotation for East Asian scripts (ruby), and the recent addition of a translate attribute, which allows you to specify which text should or should not be translated.
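As illustration, here is a minimal, hypothetical sketch of the kinds of markup involved; the precise models, particularly for ruby, were still under discussion at the time of the workshop.

```html
<!-- Bidirectional text: isolate an embedded name whose direction is
     unknown so that it does not disturb the right-to-left paragraph. -->
<p dir="rtl">مرحبا <bdi>user1234</bdi></p>

<!-- Ruby annotation: a pronunciation aid rendered above the base text. -->
<p><ruby>漢字<rt>かんじ</rt></ruby></p>

<!-- Translate flag: keep the command untouched during translation. -->
<p>Type <code translate="no">git status</code> to check your work.</p>
```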
During the Creators session, Brian Teeman showed how the Joomla content management system has been improved to better support either translation or content adaptation. Joomla is a widely used authoring tool, and is largely supported by volunteer effort.
Loïc Dufresne de Virel talked about some of the problems Intel faced and how they addressed them, and Gerard Meijssen discussed some of the issues involved in supporting the huge world of Wikipedia content in over 300 languages. There are significant problems related to the availability of fonts and input mechanisms for many languages, but Gerard made a special point of the improvements they need in CLDR's locale-specific data. Some issues relate to improving the CLDR interface, but like earlier speakers Gerard also called for wider participation from the public to supply the needed information.
The Localizers session began with an overview of the new MT@EC translation system that the European Commission is working on, from Spyridon Pilos. In terms of standards gaps, he called for standard data models and structures for data storage and publication, so that less time is wasted on conversions. Matjaž Horvat then demonstrated a tool called Pontoon that is under development at Mozilla and that allows translators to translate Web pages in situ.
The session ended with a talk from Dave Lewis about the MultilingualWeb-LT project, which started recently under the aegis of a W3C Working Group. He showed examples of the metadata that the group will be addressing, which ranges across content creation, localization, content consumption, and language technologies and resources. The group aims to produce a new version of the ITS standard that is relevant not only to XML, but also to HTML5 and content management systems. He called for public participation in refining the requirements for the project.
In the Machines session we had overviews of best practices in the European Publications Office, and the Monnet and Enrycher projects.
Peter Schmitz showed how and why the Publications Office, which deals with a huge quantity of content for the European Union, has based its approach on standards, and has also been involved in developing standard approaches to various aspects of the work, such as core metadata (a restricted shared set of metadata for each resource, based on Dublin Core, to enable global search), common authority tables (to harmonize metadata), an exchange protocol for EU legislative procedures (for interoperability), and the European Legislative Identifier (ELI) (also for interoperability).
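As a purely illustrative sketch (not the Publications Office's actual schema), core metadata of this kind might describe a single multilingual resource along the following lines; the identifier, titles and element structure are invented.

```xml
<record xmlns:dc="http://purl.org/dc/elements/1.1/">
  <!-- One resource, discoverable in a global search across languages. -->
  <dc:identifier>http://example.org/eu/act/2012/0123</dc:identifier>
  <dc:type>legal-act</dc:type>
  <dc:language>en</dc:language>
  <dc:language>fr</dc:language>
  <dc:title xml:lang="en">Regulation on example matters</dc:title>
  <dc:title xml:lang="fr">Règlement sur des questions d'exemple</dc:title>
</record>
```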
Paul Buitelaar packed into his presentation an impressive amount of information about the Monnet project and some of the issues they are grappling with related to multilingual use of the Semantic Web. They are exploring various research topics related to ontologies and domain semantics.
Tadej Štajner discussed the approaches used by the Enrycher project to identify named entities in content (such as place names), so that they can be handled appropriately during translation. Future work will need to address re-use of language and semantic resources to improve performance on NLP tasks across different languages, and lower the barriers for using this technology for enriching content within a CMS.
The Users session began with a talk from Annette Marino and Ad Vermijs from the European Commission's translation organization about how seriously the Commission takes its commitment to a multilingual web presence.
Murhaf Hossari then followed with a description of a number of issues for people writing in right-to-left scripts where the Unicode Bidirectional Algorithm needs help. He called for changes to the algorithm itself to support these needs, rather than additional control characters or markup.
Nicoletta Calzolari introduced the new Multilingual Language Library, an LREC initiative to gather together all the linguistic knowledge the field is able to produce. The aim is to assemble massive amounts of multi-dimensional data, which is key to fostering advances in our knowledge about language and its mechanisms, in the way other sciences work, such as the Genome project. The information needs to be contributed and used by the community, which requires a change not only to technical tools but, more importantly, to organizational thinking.
Fernando Serván brought the session and the first day to a close with a look into the recent experiences of the Food and Agriculture Organization of the United Nations. They have struggled with tools that require plain text, and with questions about how to address localization issues as their content goes out beyond the reach of their tools onto social media. He called for increased interoperability: each part of the process and each language resource has its own standard (TBX, TMX, XLIFF), but it is difficult to make them work together.
Over the course of the day we heard of many interesting initiatives where standards play a key role. There was a general concern for increasing and maintaining interoperability, and we heard repeated calls for greater public participation in initiatives to move forward the multilingual Web.
The second day was devoted to Open Space breakout sessions. The session topics were: MultilingualWeb-LT Conclusions; Semantic Resources and Machine Learning for Quality, Efficiency and Personalisation of Accessing Relevant Information over Language Borders; Speech Technologies for the Multilingual Web; MultilingualWeb, Linked Open Data & EC "Connecting Europe Facility"; Tools: issues, needs, trends; and Multilingual Web Sites.
The discussions produced a good number of diverse ideas and opinions, and these are summarized at the bottom of this report. There are also links to slides used by the group leaders and video to accompany them, which give more details about the findings of each of the breakout groups.
Piet Verleysen, European Commission Resources Director, responsible for IT at the Directorate-General for Translation, welcomed the participants to Luxembourg and gave a brief speech encouraging people to come up with ideas that will make it easier to work with the Multilingual Web.
This was followed by a brief welcome from Kimmo Rossi, Project Officer for the MultilingualWeb project at the European Commission's DG for Information Society and Media, Digital Content Directorate.
Ivan Herman, Semantic Web Activity Lead at the W3C, gave the keynote speech, entitled "What is Happening in the W3C Semantic Web Activity?". In this talk he gave, in a short time, an information-rich overview of the current work done at the W3C related to the Semantic Web, Linked Data, and related technical issues. The goal was not to give a detailed technical account but, rather, a general overview that could serve as a basis for further discussion of how that technology can be applied to the Multilingual Web. Most of the presentation described the aims and goals of the Semantic Web work and the various aspects of the technology, along with the status of each of these areas. Towards the end of the presentation, Ivan drew links between the Semantic Web and the Multilingual Web, proposing that each could help the other. The bullet points below include some of the areas where that might be the case.
The Developers session was chaired by Reinhard Schäler of the University of Limerick's Localisation Research Centre.
Jan Anders Nelson, Senior Program Manager Lead at Microsoft, started off the Developers session with a talk entitled "Support for Multilingual Windows Web Apps (WWA) and HTML 5 in Windows 8". The talk looked at support for web apps running on Windows 8, stepped through the workflow by which a developer can optimize the creation of multilingual applications, organizing their app projects to support translation, and introduced the related tools available, which make creating multilingual apps easy for anyone considering shipping in more than one market, perhaps for the first time. Other significant remarks:
Tony Graham, Consultant at Mentea, gave a talk entitled "XSL-FO meets the Tower of Babel". XSL Formatting Objects (XSL-FO) from the W3C is an XML vocabulary for specifying formatting semantics. While it shares many properties with CSS, it is most frequently used with paged media, such as formatting XML documents as PDF. XSL-FO 2.0 is currently under development, and one of its top-level requirements is further improved non-Western language support. However, the requirement for improved support in XSL-FO 2.0 is actually less specific than the 1998 requirements for XSL 1.0, since the W3C recognized that they didn't have the knowledge and expertise to match their ambitions. For that, they would need more help, either from individual experts or from the W3C forming more task forces along the lines of the Japanese Layout Task Force, to capture and distill expertise for use by all of the W3C and beyond. Other significant remarks:
Richard Ishida, Internationalization Lead at the W3C, and Jirka Kosek, XML Guru at the University of Economics, Prague, co-presented "HTML5 i18n: A report from the front line". The talk briefed attendees on developments related to some key markup changes in HTML5. In addition to new markup to support bidirectional text (e.g. in scripts such as Arabic, Hebrew and Thaana), the Internationalization Working Group at the W3C has been proposing changes to the current HTML5 ruby markup model. Ruby annotations are used in Japanese and Chinese to help readers recognize and understand ideographic characters; they are commonly used in educational material and manga, and can help with accessibility. The Working Group has also proposed the addition of a flag to indicate when text in a page should not be translated. This talk delivered an up-to-the-minute status report on the progress made in these areas. Other significant remarks:
The Developers session on the first day ended with a Q&A period with questions about Indic layout, use of CSS and translate flags, applicability of new HTML5 features to other formats, fine-grained identification of language variants for localization with the MAT tool, extensions to the translate flag, and formats supported by the MAT tool. For more details, see the related links.
This session was chaired by Jan Nelson of Microsoft.
Brian Teeman, Director, School of Joomla!, JoomlaShack University, gave the first talk of the Creators session, "Building Multi-Lingual Web Sites with Joomla! the leading open source CMS". Joomla is the leading open source CMS, used by over 2.8% of the web and by over 3000 government web sites. With the release in January 2012 of the latest version (2.5), building truly multi-lingual web sites has never been easier. The presentation showed how easy it is to build real multi-lingual web sites without relying on automated translation tools. Other significant remarks:
Loïc Dufresne de Virel, Localization Strategist at Intel Corp., spoke on "How standards (could) support a more efficient web localization process by making CMS - TMS integrations less complicated". As Intel had just deployed a new web content management system, integrated with their TMS, they had to deal with multiple challenges and faced a great deal of complexity and customization. In this 15-minute talk, Loïc looked into what Intel did well, what they did wrong, and what they could have done better, and attempted to put a dollar figure on the cost of ignoring a few standards. Other significant remarks:
Gerard Meijssen, Internationalization/Localization Outreach Consultant at the Wikimedia Foundation, presented "Translation and localisation in 300+ languages ... with volunteers". There are over 270 Wikipedias, and over 30 more languages are requesting one. These languages represent most scripts, and both small and large populations. The Wikimedia Foundation ensures that text can be displayed by providing web fonts, and it supports input methods. There is a big multi-application localization platform at translatewiki.net, and they are implementing translation tools for the "pages" used for documentation and communication with their communities. To do this, they rely on standards. Standards become more relevant as they are implemented in more and more places in their software. Some standards don't support all the languages that the Wikimedia Foundation supports. Other significant remarks:
The Q&A part of the Creators session included questions about support for users of Wikipedia in multiple languages, which standard Wikipedia uses for language tagging, how to address the need for a more standard approach to content development in general, how to motivate people to localize for free, and whether Joomla supports styling changes at the point of translation. For more details, see the related links.
This session was chaired by Arle Lommel of GALA and DFKI.
Spyridon Pilos, Head of the Language Applications Sector in the Directorate-General for Translation at the European Commission, talked about "The Machine Translation Service of the European Commission". The Directorate-General for Translation (DGT) has been developing, since October 2010, a new data-driven machine translation service for the European Commission. MT@EC should be operational in the second half of 2013. One of the key requirements is for the service to be flexible and open: it should enable, on the one hand, the use of any type of language resource and any type of MT technology and, on the other, facilitate easy access by any client (individual or service). Spyridon presented the approach taken and highlighted problems identified, as pointers to broader needs that should be addressed. Other significant remarks:
Matjaž Horvat, L10n driver at Mozilla, talked about "Live website localization". He reported on Pontoon, a live website localization tool developed at Mozilla. Instead of extracting website strings and then merging translated strings back, Pontoon can turn any website into editable mode. This enables localizers to translate websites in-place and also provides context and spatial limitations. At the end of the presentation, Matjaž ran through a demo. You can see a limited version of the demo page.
David Lewis, Research Lecturer at the Centre for Next Generation Localisation at Trinity College Dublin, spoke about "Meta-data interoperability between CMS, localization and machine translation: Use Cases and Technical Challenges". January 2012 saw the kick-off of the MLW-LT ("MultilingualWeb - Language Technologies") Working Group at the W3C as part of the Internationalization Activity. This WG will define metadata for web content (mainly HTML5) and "deep Web" content (CMS or XML files from which HTML pages are generated) that facilitates interaction with multilingual language technologies, such as machine translation, and with localization processes. The Working Group brings together localization and content management companies with content and language metadata research expertise, including strong representation from the Centre for Next Generation Localisation. The talk presented three concrete business use cases that span CMS, localization and machine translation functions. It discussed the challenges in addressing these cases with existing metadata (e.g. ITS tags) and the technical requirements for additional standardised metadata. (This talk was complemented by a breakout session that allowed attendees to voice their comments and requirements in more detail, in order to better inform the working group. See below.) Other significant remarks:
During the Q&A session questions were raised about crowd-sourcing issues for the Pontoon approach, and how extensible the Pontoon approach is for wider usage. There was some discussion of open data policies related to the MT@EC system and opportunities for collaborative work (eg. with Adobe), as well as whether it makes sense to control source text for machine translation. For details, see the related links.
This session was chaired by Felix Sasaki of DFKI.
Peter Schmitz, Head of the "Enterprise Architecture" Unit at the European Commission, talked about "Common Access to EU Information based on semantic technology". The Publications Office is setting up a common repository to make available in a single place all metadata and digital content related to public official EU information (law and publications), in a harmonised and standardised way, in order to guarantee citizens better access to the law and publications of the European Union, and to encourage and facilitate reuse of content and metadata by professionals and experts. The common repository is based on semantic technology. At least all official languages of the EU are supported, so the system is a practical example of a multilingual system accessible through the Web. Other significant remarks:
Paul Buitelaar, Senior Research Fellow at the National University of Ireland, Galway, spoke about "Ontology Lexicalisation and Localisation for the Multilingual Semantic Web". Although knowledge processing on the Semantic Web is inherently language-independent, human interaction with semantically structured and linked data will be text or speech based, in multiple languages. Semantic Web development is therefore increasingly concerned with issues in multilingual querying and rendering of web knowledge and linked data. The Monnet project on 'Multilingual Ontologies for Networked Knowledge' provides solutions for this by offering methods for lexicalizing and translating knowledge structures, such as ontologies and linked data vocabularies. The talk discussed challenges and solutions in ontology lexicalization and translation (localization) by way of several use cases under development in the context of the Monnet project. Other significant remarks:
Tadej Štajner, Researcher at the Jožef Stefan Institute, gave a talk about "Cross-lingual named entity disambiguation for concept translation". The talk focused on experience at the Jožef Stefan Institute in developing an integrated natural language processing pipeline, consisting of several distinct components and operating across multiple languages. He demonstrated a cross-language information retrieval method that enables reuse of the same language resources across languages, by using a knowledge base in one language to disambiguate named entities in text written in another language, as developed in the Enrycher system (enrycher.ijs.si). He discussed the architectural implications of this ability for development practices, and its prospects as a tool for automated translation of specific concepts and phrases in a content management system. Other significant remarks:
Topics discussed during the Q&A session included whether the CELLAR project shares data with other data sets, such as government data; how domain lexicon generation is done in Monnet; how Enrycher's name disambiguation deals with city names that occur many times in one country; and how machine translation relates to the identification of language-neutral entities in Monnet and Enrycher. For the details, follow the related links.
This session was chaired by Tadej Štajner of the Jožef Stefan Institute.
Annette Marino, Head of the Web Translation Unit at the European Commission, and Ad Vermijs, DGT Translator, started the Users session with a talk entitled "Web translation, public service & participation". For most Europeans, the internet provides the only chance they have for direct contact with the EU. But how can the Commission possibly inform, communicate and interact with the public if it doesn't speak their language on the web? With the recent launch of the European Citizens' Initiative website in 23 languages, there's no doubting the role of web translation in participatory democracy, or the Commission's commitment to a multilingual web presence. But as well as enthusiasm, they need understanding – of how people use websites and social media, and what they want – so that they can make the best use of translation resources to serve real needs. As the internet evolves, the Commission is on a steep learning curve, working to keep up with the possibilities – and pitfalls – of web communication in a wide range of languages. Other significant remarks:
Murhaf Hossari, Localization Engineer at University College Dublin, gave a talk entitled "Localizing to right-to-left languages: main issues and best practices". Internationalization and localization efforts need to take extra care when dealing with right-to-left languages because of specific features of those languages, and many localization issues are specific to them. The talk categorized the issues that face localizers when dealing with right-to-left languages, with a special focus on text direction and the handling of bidirectionality. It also mentioned best practices and areas for improvement. Other significant remarks:
Nicoletta Calzolari, Director of Research at CNR-ILC, gave a talk entitled "The Multilingual Language Library". The Language Library is a new initiative – started with LREC 2012 – conceived as a facility for gathering and making available, through simple functionalities, all the linguistic knowledge the field is able to produce, putting in place new ways of collaboration within the Language Technology community. Its main characteristic is that it is collaboratively built, with the entire community providing and enriching language resources by annotating and processing language data, and freely using them. The aim is to exploit today's trend towards sharing in order to initiate a collective movement that also works towards creating synergies and harmonization among different annotation efforts that are now dispersed. The Language Library could be considered the beginning of a big Genome project for languages, in which the community will collectively deposit and create increasingly rich and multi-layered linguistic resources, enabling a deeper understanding of the complex relations between different annotation layers. Other significant remarks:
Fernando Serván, Senior Programme Officer at the Food and Agriculture Organization of the United Nations, gave a talk entitled "Repurposing language resources for multilingual websites". The presentation addressed lessons learned regarding the re-use of language resources (translation memories in TMX, terminology databases in TBX) to improve content and language versions of web pages. It addressed the need for better integration between existing standards, the need for interoperability, and the areas where standards and best practices could help organizations with a multilingual mandate. Other significant remarks:
The Q&A session dealt with questions about whether the data in the Language Library is freely licensed, and what that means, and how to assess the quality of the data being gathered. Other questions covered whether the DGT is translating social media content, how the FAO addresses quality issues with MT, and whether they use SKOS. For the details, follow the related links.
This session was chaired by Jaap van der Meer of TAUS.
Workshop participants were asked to suggest topics for discussion on small pieces of paper that were then stuck on a set of whiteboards. Jaap then led the group in organizing the ideas and selecting a number of topics for breakout sessions. People voted for the discussion they wanted to participate in, and a group chair was chosen to facilitate each one. The participants then separated into breakout areas for the discussion, and near the end of the workshop met together again in plenary to discuss the findings of each group. Participants were able to move between breakout groups.
At Luxembourg we split into the following groups:
Summaries of the findings of these groups are provided below, some of which have been contributed by the breakout group chair.
This session was facilitated by David Lewis, one of the co-chairs of the new MultilingualWeb-LT (Language Technology) W3C Working Group. He introduced MultilingualWeb-LT as the successor to the ITS standard, but with the aim of better integrating localization, machine translation and text analytics technologies with content management. The standardization of metadata for interoperability between processes, from content creation through localization/translation to publication, was therefore now in scope. MultilingualWeb-LT would retain successful features of ITS, including support for the current ITS data categories, the separation of data category definitions from their mapping to specific implementations, and the independence of data categories from one another, so that conformance can be claimed by implementing any one of them. It would, however, address implementation in HTML5, using existing metadata annotation approaches such as RDFa and microdata, and round-trip interoperability with XLIFF.
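For readers unfamiliar with ITS, the following is a minimal, hypothetical sketch of the ITS 1.0 style of markup that MultilingualWeb-LT builds on, showing the translate data category expressed both globally (a rule selecting nodes with XPath) and locally (an attribute on the content itself); the host element names are invented for illustration.

```xml
<article xmlns:its="http://www.w3.org/2005/11/its">
  <its:rules version="1.0">
    <!-- Global rule: the content of any <code> element is not translated. -->
    <its:translateRule selector="//code" translate="no"/>
  </its:rules>
  <p>Press <key its:translate="no">Ctrl</key> to continue,
     then run <code>make install</code>.</p>
</article>
```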
The session focused largely on the general characteristics of data categories rather than on specific ones, e.g. whether a category is generated automatically or manually (or both), and whether it affects the structure of an HTML document or the interaction with other metadata processing, e.g. for style sheets or accessibility. One specific data category to emerge was the priority assigned to content for translation. A point that repeatedly emerged was the importance of relating metadata definitions to specific processes. This would influence how metadata for processing provenance and processing instructions would be represented, and was also important in defining processing expectations. The challenge identified here was that process flow definitions are intimately linked to localization contracts and service level agreements. Trying to standardize process models could therefore meet resistance, due to the homogenizing effect on business models and service offerings. CMS-based workflows that are contained within a single content-creating organization and assembled from multiple LT components, rather than outsourced to LSPs, may be more accepting of a common process definition. It was recognized, however, that language technologies, including machine translation and crowd-sourcing, may mean that many process boundaries will change, quickly dating any standardised process model.
No short summary of this session was provided. Please see the slides and video links.
The web is becoming more vocal, and spoken content is increasingly being brought onto the Internet in a variety of forms: e-learning classes, motivational talks, and broadcast news captioning, among others. This content strengthens information access and makes it more appealing and dynamic. The discussion group on Speech Technologies for the Multilingual Web concluded that speech technology needs to be discussed, and that best practices and standards are needed to address current gaps in the web's speech content, pointing out the socioeconomic value of spoken language in enabling more efficient human communication, helping businesses advertise, market and sell their products and services, addressing educational inclusion and special communication needs, and creating new opportunities to spread and share knowledge on a global scale.
Kimmo Rossi (European Commission; EC) and Christian Lieske (SAP AG) proposed to look into the status quo and possible actions concerning the intersection of the MultilingualWeb Thematic Network, Linked Open Data and the EC's "Connecting Europe Facility". The breakout attracted attendees from constituencies such as users/service requesters (e.g. users and implementers of machine translation systems), facilitators (e.g. the EC), service providers (e.g. for translation-related services), and enablers (the World Wide Web Consortium, W3C). Outcomes of the breakout included a first step towards a mutual understanding of the topic and subtopics, information on actions that have already been started, and suggestions for follow-ups.
The general question was: how should work started in the MultilingualWeb (MLW) Network continue with a view towards Linked Open Data (LOD) and the Connecting Europe Facility (CEF)? Currently, it is still hard to find good, freely (re)usable NLP resources (MT systems, rules for them, dictionaries, etc.) as LOD on the Web. In addition, a change of paradigm for data creation is needed; it is currently hard to obtain a unified "picture" across languages (for example, the "population of Amsterdam" differs across Wikipedia language versions). A new, language-technology-supported paradigm, pursued e.g. in the Monnet project, could be to create resources in one language, or even in a language-agnostic/neutral form, and use (automatic) approaches to generate resources across languages.
The attendees committed to some immediate action items, such as making the relationship between MLW, Linked Open Data and the CEF easy for everyone to understand; gathering more use cases for language resources and language-related services; and continuing the discussion at dedicated events, such as the upcoming MultilingualWeb-LT workshop (11 June, Dublin). Further information about the session is available in an extended session report.
Tool development for language services plays an important role in bringing innovation and simplifying everyday work. It therefore requires attention from all stakeholders: tool developers and tool users at different levels, including customers and vendors. However, the process is not smooth. The issues detected during the discussion were:
The main conclusions were to get organized and to produce standards.
One approach is to create a Multilingual Web Sites Community Group at the W3C; in the meantime, this has been done.
The starting point is the document Open architecture for multilingual web sites. The final document might be quite different; this document also serves as a primer.
Following a preliminary exchange, there was a round-robin where each participant explained his/her point of view. By the end, it seemed that the participants were describing the same object from different angles and that this break-out group could only scratch the surface of the problem.
Tomas Carrasco filled in some background information, in particular the points of view of the user and the webmaster, and techniques such as Transparent Content Negotiation (TCN). Multilingual web sites (MWS) have two main stakeholders:
One thing that was generally under-represented in the talks at the 4th MultilingualWeb workshop, "The Way Ahead", was the debate about multilingual content itself and how to handle it: what does one present, in which language, and for whom? How does one avoid a localization that the user relies on but that undermines their confidence in the content translated into their language? The Open Space discussions that took place on 16 March provided an opportunity to discuss these questions.
The group, composed of participants from other European institutions, the FAO and Lionbridge, arrived at the following conclusions:
Author: Richard Ishida. Contributors: scribes for the workshop sessions, Jirka Kosek, Arle Lommel, Felix Sasaki, Reinhard Schäler, and Charles McCathieNevile. Photos in the collage at the top, and various other photos, courtesy of Richard Ishida. Video recording by the European Commission, and hosting of the video content by VideoLectures.