Limerick Workshop

2011 Limerick Workshop Program

W3C Workshop Program:
A Local Focus for the Multilingual Web
21-22 September 2011, Limerick, Ireland

The MultilingualWeb project is looking at best practices and standards related to all aspects of creating, localizing and deploying the Web multilingually. Coordinated by the W3C, the project aims to raise the visibility of existing best practices and standards and identify gaps. This third workshop in Limerick, Ireland, was hosted by the University of Limerick. The workshop was co-located with the 16th Annual LRC Conference.

Each main session on the first day contained a series of 15-minute talks followed by some time for questions and answers. On the second day, the workshop lasted for the morning only, and was dedicated to an Open Space discussion forum, where participants could discuss the themes of the workshop in breakout sessions. This was organized by TAUS. All attendees participated in all sessions.

The IRC log is the raw scribe log, which has not undergone careful post-editing and may contain errors or omissions. It should be read with that in mind. It constitutes the best efforts of the scribes to capture the gist of the talks and discussions that followed, in real time. IRC was used not only to capture notes on the talks, but also so that remote participants, or participants with accessibility problems, could follow the workshop in real time. People following on IRC could also add contributions to the flow of text themselves.

Where no link is provided to slides, we are still waiting to receive them. Some video links are unavailable because the speaker requested it. In two cases the speaker was unable to attend the workshop, but their slides are available. You can also find links to all videos on the VideoLectures workshop page. Thanks to VideoLectures for hosting the videos.

Related links: Workshop report • About W3C

21 September

0900

Welcome

Kieran Hodnett

Dean of the Faculty of Science and Engineering, University of Limerick

Brief welcome address

Richard Ishida

W3C Internationalization Activity Lead & MultilingualWeb Project Coordinator

Workshop logistics

0920

Keynote

Daniel Glazman

Disruptive Innovations / W3C CSS Working Group Co-Chair

Babel 2012 on the Web

abstract

If Open Web and Internet Standards were mostly western-centric in the early years, things have drastically changed. English is no longer the most common language on the net and the various standards bodies have improved the support for the languages and scripts of the world. The new cool kids on the block in 2012 will be HTML5, CSS3 and EPUB3, and this talk will show you how Standards are paving the way for the Multilingual Web.

Slides

IRC

Video

1000

Break

1030

Developers

Dr. David Filip

LRC/CNGL/LT-Web

MultilingualWeb-LT: Meta-data interoperability between Web CMS, Localization tools and Language Technologies at the W3C

abstract

MLW-LT, an FP7 funded coordination action, is going to set up a W3C Working Group (WG) for standardizing metadata exchange between Web CMS, Localization Tools and Language Technologies. This session will open the public discussion of the WG Charter and encourage participation in the WG from outside of the initial EC funded consortium. The WG aims to address three major interoperability gaps in the multilingual web content lifecycle, namely between Deep Web meta-data and localization (L10n); Surface Web meta-data and Real time Machine Translation; and Deep Web meta-data and meta-data driven MT training. Addressing these gaps will include alignment with other existing and ongoing LT and L10n standardization activities; prominently W3C ITS and OASIS XLIFF TC effort, as XLIFF will be used for prototyping MLW-LT metadata round-trips in the three main scenarios outlined above.

Slides

IRC

Video

Christian Lieske

SAP

The journey of the W3C Internationalization Tag Set - current location and possible itinerary

abstract

The W3C Internationalization Tag Set (ITS) is an enabler for the internationalization and localization of content. Although ITS is a rather young standard, its uptake has been impressive. One reason for this is the work of the ITS Interest Group (ITS IG), which promotes its adoption and gathers feedback. This presentation will sketch insights of the ITS IG. The following will be covered: 1. Brief Introduction to W3C ITS; 2. Review of ITS use in commercial and open source tools; 3. Existing Rule Sets; 4. Overview of suggested enhancements; 5. Relationships; 6. Outlook (Contributors: Yves Savourel, Jirka Kosek, Felix Sasaki, Richard Ishida, Christian Lieske)

Slides

IRC

No Video

Gunnar Bittersmann

brands4friends

CSS & i18n: dos and don'ts when styling multilingual Web sites

abstract

The talk covers best practices and pitfalls when dealing with languages that create large compound words (like German), languages with special capitalization rules (again, like German), or languages written in right-to-left scripts. This includes things like box sizes, box shadows and corners, image replacement etc. It also covers benefits that new CSS 3 properties and values offer in terms of internationalization, a discussion of whether the :lang pseudo-class selector meets all needs or if there's more to wish for, and how to implement style sheets for various languages and scripts (all rules in a single file or spread over multiple files?). The talk will be practical rather than theoretical in nature.

Slides

IRC

Video

[Chair, Tadej Štajner • Scribe, Jirka Kosek]

1115

Q&A

IRC

No Video

1130

Creators

Moritz Hellwig

Cocomore

CMS and Localisation – Challenges in Multilingual Web Content Management

abstract

Content Management Systems (CMS) have come to be widely used to provide and manage content on the Web. As such, CMS are increasingly used for multilingual content, which presents new challenges to developers and content providers. This presentation will explore these challenges and show how and why a closer alignment of CMS developers and LSPs can improve translation management, workflows and quality.

Slides

IRC

No Video

Danielle Boßlet

Translator

Multilinguality on Health Care Websites – Local Multi-Cultural Challenges

abstract

Globally acting health care organisations like the World Health Organization have to present their websites in a variety of languages to make sure that as many people as possible can benefit from their online offer. The same applies to the European Union, which publishes its official documents in 23 languages and therefore has to guarantee that its websites are equally multilingual. Due to the fact that Germany is a country with a large number of immigrants, the government and other official institutions would do well to present their websites not only in German or English, but also in other languages, like Turkish or Russian. The websites of the WHO, the EU and some German institutions were checked for their multilingual offer and possible shortcomings of the different language versions. The severest and most frequent shortcomings and their consequences for users will be highlighted in this talk.

Slides

IRC

Video

Lise Bissonnette Janody

Dot-Connection

Balance and Compromise: Issues in Content Localization

abstract

Web content managers need to make choices with respect to the content they translate and localize on their websites. What guides these decisions? When in the process should they be made? What are their impacts? This talk provides a high-level overview of these choices, and how they fit into the overall content strategy cycle.

Slides

IRC

Video

[Chair, Charles McCathieNevile • Scribe, Christian Lieske]

1215

Q&A

IRC

Video

1230

Lunch

1345

Localizers

Matthias Heyn

SDL

Efficient translation production for the Multilingual Web

abstract

The translation editor has seen major technological advances over the last years. Compared to classic translation memory applications, current systems allow expert users to double, if not triple, the number of words translated. Whereas the key technology advances are in the area of sub-segment reuse and statistical machine translation (SMT), the actual productivity gains relate to the ergonomics of how systems allow users to interact, control and automate the various data sources. This presentation will review key capabilities on the various document, segment and sub-segment levels like: Document level SMT, TrustScore, dynamic routing, dynamic preview; Match type differentiation, Auto-propagation, SMT integration and SMT configurations, segment-level SMT trust scores and feedback cycles (segment level); Auto-suggest dictionary and phrase completions (sub-segment level). The discussed capabilities will be put into perspective in terms of how the vast amount of multilingual online content is affected by such innovation.

Slides

IRC

Video

Asanka Wasala

LRC/CNGL

A Micro Crowdsourcing Architecture to Localize Web Content for Less-Resourced Languages

abstract

We will report on a novel browser extension-based client-server architecture using open standards that allows localization of web content using the power of the crowd. We address issues related to MT-based solutions and propose an alternative approach based on translation memories (TMs). The approach is inspired by Exton et al. (2009) on real-time localization of desktop software using the crowd and Wasala and Weerasinghe (2008) on browser based pop-up dictionary extensions. The architectural approach chosen enables in-context real-time localization of web content supported by the crowd. To the best of our knowledge, this is the only practical web content localization methodology currently being proposed that incorporates Translation Memories. The approach also supports the building of resources such as parallel corpora – resources that are still not available for many languages, especially under-served ones.

Slides

IRC

Video

Sukumar Munshi

Across Systems

Interoperability standards in the localization industry – Status today and opportunities for the future

abstract

Interoperability and related standards are topics still frequently and controversially discussed. While standards such as TMX and TBX are established within the industry, others, such as XLIFF, are rated differently and not that widely implemented. This presentation covers the current status of interoperability in the localization and translation industry, historical development, understanding of interoperability, related business requirements, effects on delivery models, interoperability between tools, open standards, current challenges and opportunities for the future.

Slides

IRC

No Video

[Chair, Christian Lieske • Scribe, Felix Sasaki]

1430

Q&A

IRC

No Video

1445

Machines

Thomas Dohmen

SemLab

The use of SMT in financial news sentiment analysis

abstract

Statistical Machine Translation systems are a welcome development for news analytics. They enable topic-specific translation services, but are not without problems. The SMT system that is developed for the Let'sMT (FP7) project is trained and used to translate financial news for SemLab's news sentiment analysis platform. This talk will give an example of the benefits and problems of integrating such systems.

Slides

IRC

Video

Sebastian Hellmann

University of Leipzig

NLP Interchange Format (NIF)

abstract

NIF is an RDF/OWL-based format that allows several NLP tools to be combined and chained in a flexible, light-weight way. The core of NIF consists of a vocabulary, which can represent Strings as RDF resources. A special URI design is used to pinpoint annotations to a part of a document. These URIs can then be used to attach arbitrary annotations to the respective character sequence. Based on these URIs, annotations can be interchanged between different NLP tools. Although NLP Tools are abundantly available on all linguistic levels for the English language, this is often not the case for languages with fewer speakers. Thus, it becomes especially necessary to create a format that allows the integration and interoperability of NLP tools. Web site: http://aksw.org/Projects/NIF . With respect to multilinguality, two use cases come to mind: 1. An existing English software system that uses an English NLP tool needs to be ported to another language. The NLP tool for the other language is not compatible with the system, because there is no common interface (Example: A CMS with keyword extraction). 2. Paragraphs in different kinds of documents can be annotated in RDF with multilingual translations that can potentially remain stable over the life-time of a document. In particular, the introduced URI recipe (Context-Hash) possesses advantageous properties, which withstand comparison to other URI naming approaches.

Slides

IRC

Video

Yoshihiko Hayashi

Osaka University

LMF-aware Web services for accessing lexical resources

abstract

This talk will demonstrate that the Lexical Markup Framework (LMF), the ISO standard for modeling and representing lexicons, can be nicely applied to the design and implementation of lexicon access Web services, in particular when the service is designed in the so-called RESTful style. As the implemented prototype service provides access to bilingual/multilingual semantic resources, in addition to standard WordNets, slight revisions to the LMF specification will also be proposed.

Slides

IRC

Video

[Chair, Felix Sasaki • Scribe, Dag Schmidtke]

1530

Q&A

IRC

Video

1545

Break

1615

Users

Alexander O'Connor

CNGL/Trinity College Dublin

Digital Content Management Standards for the Personalised Multilingual Web

abstract

The World Wide Web is at a critical phase in its evolution right now. The user experience is no longer limited to a single offering in a single language. Localisation has offered a web of many languages to users, and this is now becoming a hyper-focused tailoring that makes each web experience different for each user. The need to address the key requirements of a web which is real-time, personal and in the right language is paramount to the future of how information is consumed. This talk will discuss the key trends in personalisation, with particular focus on work being undertaken in the Digital Content Management track of the CNGL, and will provide an insight into current and future trends, both in research and in the living web.

Slides

IRC

Video

Olaf-Michael Stefanov

JIAMCATT

An Open Source tool helps a global community of professionals shift from traditional contacts and annual meetings to continuous interaction on the web

abstract

This talk covers the challenges of maintaining and developing a multilingual web site with open source software tools and crowd-sourced translations, for a community of professional translators and terminologists working for international organizations and multilateral bodies. That "community" has no budget and depends on members' contributions in kind, but it continues to grow, and has been growing since 1987. The talk describes using an Open Source tool which supports multilingualism to provide a complex support site for an international working group on language issues. It shows how use of the Tiki CMS Wiki Groupware software made it possible to provide an ongoing interactive support site for JIAMCATT, helping convert the "International Annual Meeting on Computer-Assisted Translation and Terminology" into an ongoing year-round affair. The site, which is run without a budget and in the spare time of members, is nevertheless fully bilingual English-French, with parts in Arabic, Chinese, Russian and Spanish (all official languages of the United Nations) as well as some German.

Slides

IRC

Video

[Chair, Reza Keschawarz • Scribe, Jirka Kosek]

1645

Q&A

IRC

Video

1700

Policy

Gerhard Budin

University of Vienna

Terminologies for the Multilingual Semantic Web - Evaluation of Standards in the light of current and emerging needs

abstract

In recent years several standards have emerged or have come of age in the field of terminology management (such as ISO 30042 (TBX), ISO 26162, ISO 12620, etc.). Different user communities in the language industry (incl. translation and localization), language technology research, industrial engineering and other domain communities are increasingly interested in using such standards in their local application contexts. This is exactly where problems more often than not arise in the natural need to adapt global and sometimes abstract, heavy-weight standards specifications to local situations that differ from each other. Thus the way standards are prepared needs to be adapted in such a way that different requirements from user groups and from local situations can be processed and taken into account appropriately and efficiently. The paper discusses innovative (web-service-oriented) approaches to standards creation in the field of terminology management in relation to different web-based user groups and semantic web-application contexts, integrating vocabulary-oriented W3C recommendations such as SKOS. The speaker will integrate his experiences in the strategic contexts of FlareNet, CLARIN, ISO/TC 37 and in concrete user communities, e.g. in legal and administrative terminologies (the "LISE" project) and in risk terminologies (the "MGRM" project).

Slides

IRC

Video

Georg Rehm

META-NET/DFKI

META-NET: Towards a Strategic Research Agenda for Multilingual Europe

abstract

META-NET is a Network of Excellence, consisting of 47 research centres in 31 countries, dedicated to fostering the technological foundations of a multilingual European information society. A continent-wide effort in Language Technology (LT) research and engineering is needed for realising applications that enable automatic translation, multilingual information and knowledge management and content production across all European languages. The META-NET Language White Paper series "Languages in the European Information Society" reports on the state of each European language with respect to LT and explains the most urgent risks and chances. The series covers all official and several unofficial as well as regional European languages. After a brief introduction of META-NET we will present key results of the 30 Language White Papers which provide valuable insights concerning the technological, research, and also standards-related gaps of a multilingual Europe realised with the help of LT. These insights are an important piece of input for the Strategic Research Agenda for Multilingual Europe which will be finalised by the beginning of 2012.

Slides

IRC

Video

Arle Lommel

GALA

Beyond Specifications: Looking at the Big Picture of Standards

abstract

In the localization industry standardization has been seen primarily as a technical activity: the development of technical specifications. As a result there are many technical standards that have failed to achieve widespread adoption. The GALA Standards Initiative, an open, non-profit effort, is attempting to address areas that surround standards development—education, promotion, coordination of development activities, and development of useful guidelines and business cases, and non-technical, business-oriented standards—to help achieve an environment in which the needs of various user groups will help drive greater adoption of standards.

Slides

IRC

Video

[Chair, Jörg Schütz • Scribe, Charles McCathieNevile]

1745

Q&A

IRC

No Video

1800

End

2000

Evening reception

At the Carlton Castletroy Park Hotel

details

To further promote networking among attendees, there will be a reception in the restaurant of the Carlton Castletroy Park Hotel, starting at 8pm. (This is the same location as the workshop venue.)

22 September

0900

Set up

Jaap van der Meer

TAUS

Explanation of the format for the morning, and selection of discussion topics. Topics are suggested by participants, and the most popular are allocated to breakout groups. A chair is chosen for each group from volunteers.

0930

Open space

Break-out discussions

Various locations are available for breakout groups. Participants can join whichever group they find interesting, and can switch groups at any point. Group chairs facilitate the discussion and ensure that notes are taken to support the summary to be given to the plenary.

1045

Break

1115

Open space

Group reports and discussion

Everyone meets again in the main conference area and each breakout group presents their findings. Other participants can comment and ask questions.