Standards and best practices for the Multilingual Web
Today, the World Wide Web is fundamental to communication in all walks of life. As the share of English web pages decreases and that of other languages increases, it is vitally important to ensure the multilingual success of the World Wide Web.
The MultilingualWeb project is looking at best practices and standards related to all aspects of creating, localizing and deploying the Web multilingually. The project aims to raise the visibility of existing best practices and standards and identify gaps. The core vehicle for this is a series of four events which are planned over a two-year period.
On 4-5 April 2011 the W3C ran the second workshop in the series, in Pisa, entitled "Content on the Multilingual Web". The Pisa workshop was hosted jointly by the Istituto di Informatica e Telematica and Istituto di Linguistica Computazionale, Consiglio Nazionale delle Ricerche.
As for the previous workshop, the aim of this workshop was to survey, and introduce people to, currently available best practices and standards that are aimed at helping content creators, localizers, tools developers, and others meet the challenges of the multilingual Web. The key objective was to share information about existing initiatives and begin to identify gaps.
The workshop was originally planned to be a small discussion-based workshop for around 40 people, but after the format of the Madrid workshop proved to be such a success, it was decided to run a similar type of event with a similar number of people. The final attendance count was 95, and in addition to repeating the wide range of sessions of the Madrid workshop we added another Policy session.
Another innovation of this event was that we not only video recorded the presenters, but streamed that content live over the Web. A number of people who were unable to attend the workshop, including someone as far away as New Zealand, followed the streamed video. We also made available live IRC minuting, and some people used that to follow the conference and contribute to discussion. As in Madrid, there were numerous people tweeting about the conference and the speakers during the event, and a number of people afterwards wrote blog posts about their experience. The tweets and blog posts are linked to from the Social Media Links page for the workshop.
The program and attendees reflected the same unusually wide range of topics as in Madrid, and attendee feedback indicated once again that the participants appreciated not only the unusual breadth of insights, but also the interesting and useful networking opportunities. We had a good representation from industry (content and localization related) as well as research.
What follows will describe the topics introduced by speakers, followed by a selection of key messages raised during their talk in bulleted list form. Links are also provided to the IRC transcript (taken by scribes during the meeting), video recordings of the talk (where available), and the talk slides. Most talks lasted 15 minutes, though some sessions started with a half-hour 'anchor' slot.
As in the previous workshop, a wide range of topics was covered, and several themes seem to span more than one session. What follows is an analysis and synthesis of ideas brought out during the workshop. It is very high level, and you should watch the individual speakers' talks to get a better understanding of the points made. Alongside some of the points below you will find examples of speakers who mentioned a particular point (this is not an exhaustive list).
During the workshop we heard about work on a number of new or under-used technologies that should have an impact on the development of the multilingual Web as we go forward. These included content negotiation, XForms, Widgets, HTML5, and IDN (Pemberton, Caceres, Ishida, Bittersmann, Laforenza). This is still work in progress, and the community needs to participate in the ongoing discussions to ensure that these developments meet its needs and come to fruition. The time to participate is now.
We also heard that the MultilingualWeb project inspired a new widget extension for Opera to help users choose a language variant of a page (McCathieNevile).
We were given an overview of the Internationalisation Tag Set (ITS) and how it is implemented in various formats (Kosek). A key obstacle to its use, however, is the inability to work with ITS customizations in various authoring tools. Several speakers voiced a desire for better training of authoring tool implementers in internationalisation needs, in order to make it easier for content authors to produce well internationalised content (Leidner, Pastore, Serván).
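As a rough illustration of what ITS markup lets a tool do, the sketch below reads the local its:translate attribute (in the ITS 1.0 namespace) and collects only the text that should go to translation. The sample document, its element names, and the function translatable_text are made up for this example; global ITS rules and attribute handling are deliberately left out.

```python
import xml.etree.ElementTree as ET

ITS_NS = "http://www.w3.org/2005/11/its"
TRANSLATE = "{%s}translate" % ITS_NS

# Hypothetical sample document; element names are illustrative only.
SAMPLE = """<doc xmlns:its="http://www.w3.org/2005/11/its">
  <p>Press the <code its:translate="no">Enter</code> key.</p>
  <p its:translate="no">build-4711</p>
  <p>Thank you.</p>
</doc>"""


def translatable_text(elem, inherited=True):
    """Collect text chunks that are translatable under local its:translate.

    The translate setting is inherited by child elements unless a child
    overrides it. Only local markup is handled here, not global rules.
    """
    flag = elem.get(TRANSLATE)
    translate = inherited if flag is None else (flag == "yes")
    chunks = []
    if translate and elem.text and elem.text.strip():
        chunks.append(elem.text.strip())
    for child in elem:
        chunks.extend(translatable_text(child, translate))
        # Tail text sits after the child but belongs to this element.
        if translate and child.tail and child.tail.strip():
            chunks.append(child.tail.strip())
    return chunks


root = ET.fromstring(SAMPLE)
segments = translatable_text(root)
```

A real extraction filter would also keep inline elements together with their surrounding sentence rather than splitting the text into separate chunks as this sketch does.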
Also there was a call for universities to add training in internationalisation to their curricula for software engineers and developers in general (Leidner, Pastore, Nedas, Serván). More best practice guides should also be produced, complemented by more automation of support tools for authors (Pastore, Schmidtke, Carrasco, Serván).
In the Localizers session, several speakers stressed the need for and benefits of more work on standards as a means to enable interoperability of data and tools (Lieske). Lack of interoperability is seen as an important failing in the industry by many speakers (van der Meer, etc.). Existing standards need to be improved upon with more granular and flexible approaches, and a view to standardising more than just the import and export of files. It was proposed, however, that standards development should be sped up, and use a more 'agile' approach (Andrä) - proving viability with implementation experience, and quickly discarding things that don't work (an idea revisited later in the workshop). They should not impede innovation (van der Meer, Andrä).
There was a pronouncement that TMX is dead (Filip), but that was modified slightly afterwards by several people who felt that the size of its legacy base would keep it around for several more years, just like CD-ROMs (Herranz). There were a lot of hopes and expectations surrounding the upcoming version of XLIFF (Filip, van der Meer).
There was also a call for more work on the elaboration and use of metadata to support localization processes, building on the foundation provided by ITS (Leidner, Filip) but also using Semantic Web technologies such as RDF (Lewis). One particular suggestion was the introduction of a way to distinguish the original content from its translations. This was also picked up in a discussion session.
We saw how crowd sourcing was implemented at Opera, and some of the lessons learned (Nes). Crowd sourcing would reappear several times during the workshop, in speakers' talks, but also in the discussion sessions. The industry is still trying to understand how and where this is best applied, and where it is most useful. Facebook shared with us how their system works (Pacella).
The Social Web is leading to an explosion of content that is nowadays directly relevant to corporate and organizational strategies. This is leading to a change, where immediacy trumps quality in many situations (Shannon, Truscott). This and other factors are placing increased emphasis on automated approaches to handling data and producing multilingual solutions (Herranz, Lewis, Truscott), but in order to cope with this there is a need for increased interoperability between initiatives via standard approaches.
While many speakers are looking to improvements in language automation, there appears to be a strong expectation that machine translation can now, or will soon be able to, provide usable results (Schmidtke, Grunwald, Herranz, Vasiljevs), whether for speeding up translation (using post-editors rather than full translation), providing gist translations for social media content, or extracting data to feed language technology. We heard about various projects that are aiming to produce data to support machine translation development, specialising in the 'Hidden Web' (Pajntar) or smaller languages (Vasiljevs), and the META-SHARE project that aims to assist in sharing data with those who need it (Piperidis). One thing that such tools need to address is how to deal with comparable texts (i.e. texts that are not completely parallel, since one page has slightly different content from another) (de Rijke).
On the other hand, machine translation is unlikely to translate poetry well any time soon. We saw a demonstration of a tool that helps human translators align data within poems (Brelstaff). This project benefited greatly from the use of standardised, open technologies, although there were still some issues with browser support in some cases.
Changes in the way content is generated and technology developments are also expected to shift emphasis further onto the long tail of translation (Lewis, Lucardi). There is also a shift to greater adaptation of content and personalisation of content for local users. Speakers described their experiences in assuring a web presence that addresses local relevance (Schmidtke, Hurst, Truscott). The ability to componentise content is a key enabler for this, as is some means of helping the user find relevant content, such as geolocation or content negotiation.
Inconsistencies in user interfaces for users wanting to switch between localized variants of sites need investigation and standardisation. In some cases these are down to differences in browser support (Bittersmann, Carrasco).
We also saw how one project used mood-related information in social media in various ways to track events or interests (de Rijke), and received advice on how to do search engine optimisation in a world that includes the social Web (Lucardi). Following W3C best practices was cited as important for the latter. Facebook also described how they manage controlled and uncontrolled text when localizing composite messages for languages that modify words in substantially different ways in different contexts according to declension, gender and number (Pacella).
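To make the composite-message problem concrete, here is a minimal sketch of choosing a message variant by gender and number. It is not Facebook's system: the French templates, the TEMPLATES table, and the functions french_plural_category and render are assumptions made up for this illustration. The point it shows is that the controlled text (the template) has to vary with properties of the uncontrolled text (the user's name and a count).

```python
def french_plural_category(n):
    """Return the plural category for a non-negative integer in French.

    In French, both 0 and 1 take the singular form.
    """
    return "one" if n in (0, 1) else "other"


# Hypothetical templates: the participle agrees with the person's gender,
# and the noun agrees with the count. The "unknown" rows rephrase the
# sentence so that no gender agreement is needed.
TEMPLATES = {
    ("male", "one"): "{name} a été identifié dans {n} photo",
    ("male", "other"): "{name} a été identifié dans {n} photos",
    ("female", "one"): "{name} a été identifiée dans {n} photo",
    ("female", "other"): "{name} a été identifiée dans {n} photos",
    ("unknown", "one"): "{name} apparaît dans {n} photo",
    ("unknown", "other"): "{name} apparaît dans {n} photos",
}


def render(name, gender, n):
    """Pick the template variant for this gender and count, then fill it."""
    category = french_plural_category(n)
    template = TEMPLATES.get((gender, category))
    if template is None:
        # Fall back to the gender-neutral phrasing for unlisted values.
        template = TEMPLATES[("unknown", category)]
    return template.format(name=name, n=n)
```

For example, render("Marie", "female", 3) selects the feminine plural variant. Languages with case declension or more plural categories would need more dimensions in the lookup key, which is where a hand-written table like this one stops scaling.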
In the Policy session (an addition since the Madrid conference) we heard how the industry is at the beginning of radical change, such as it hasn't seen for 25 years (van der Meer), and interoperability and standards will be key to moving forward into the new era.
The next workshop will take place in Limerick, on 21-22 September, 2011.
Domenico Laforenza, Director of the Institute for Informatics and Telematics (IIT), Italian National Research Council (CNR), opened the workshop with a welcome and a talk about "The Italian approach to Internationalised Domain Names (IDNs)". Essentially, this is a system through which you can use URLs on the Internet in, for example, Danish or Chinese, using accented letters or non-Latin characters. Until recently, the choice of domain names was limited to the twenty-six Latin characters used in English (in addition to the ten digits and the hyphen "-"). IDN, introduced by ICANN (Internet Corporation for Assigned Names and Numbers), represents a breakthrough for hundreds of millions of Internet users who until now were forced to use an alphabet that was not their own. With regard to Italy, the impact of accents will certainly be less marked, but it will give everyone the opportunity to register domains which completely match the name of the person, company or brand name chosen. Domenico described the Italian registry, and the basic concepts of how IDNs work.
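The basic mechanism behind IDNs can be sketched in a few lines: each non-ASCII label of a domain name is converted to an ASCII-compatible form beginning with "xn--", which is what the DNS actually carries. The sketch below uses Python's built-in "idna" codec, which implements the older IDNA 2003 rules and may differ from what registries apply today. The helper names to_ascii and to_unicode and the domain names are made up for illustration.

```python
def to_ascii(domain):
    """Convert a Unicode domain name to its ASCII-compatible form."""
    return domain.encode("idna").decode("ascii")


def to_unicode(domain):
    """Convert an ASCII-compatible domain name back to Unicode."""
    return domain.encode("ascii").decode("idna")


# Hypothetical domain with an accented Italian label.
unicode_name = "città.example"
ascii_name = to_ascii(unicode_name)
round_trip = to_unicode(ascii_name)
```

Labels that are already plain ASCII pass through unchanged, so existing domain names are unaffected by the conversion.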
Following this talk, Richard Ishida gave a brief overview of the MultilingualWeb project, and introduced the format of the workshop.
Oreste Signore, employee of CNR and Head of the W3C Italian Office, also welcomed delegates with a talk entitled "Is the Web really a "Web for All"?". This talk was a brief reminder of the basic issues of the Web: multicultural, multilingual, for all. It also took a look into the relevant W3C activities to pursue the ultimate goal of One Web, which include accessibility as well as multilinguality.
Kimmo Rossi, Project Officer for the MultilingualWeb project, and working at the European Commission, DG for Information Society and Media, Digital Content Directorate, praised the enthusiasm and voluntary contributions of the project partners. Kimmo has found this to be a wonderful forum for networking and finding out about the various aspects of the multilingual Web. Now we need to start putting these ideas into practice, so he is looking for good recommendations for industry and stakeholders about what needs to be done, and preferably who could do it. Kimmo described some key findings of a EuroBarometer survey that is soon to be published: about 90% of those interviewed prefer to use their own language for non-passive use, and around 45% believe that they are missing out on what the Web has to offer due to lack of content in their language.
The keynote speaker was Ralf Steinberger, of the European Commission's Joint Research Centre (JRC). In his talk he said that there is ample evidence that information published in the media in different countries is largely complementary and that only the biggest stories are being discussed internationally. This applies to facts (e.g. on disease outbreaks or violent events) and to opinions (e.g. the same subject may be discussed with very different emotions across countries), but there is also a more subtle bias of the media: national media prefer to talk about local issues and about the actions of their politicians, giving their readers an inflated impression of the importance of their own country. Monitoring the media from many countries and aggregating the information found there would allow readers a less biased and more balanced view, but how can this aggregation be achieved? The talk gave evidence of such information complementarity from the Europe Media Monitor family of applications (accessible at http://emm.newsbrief.eu/overview.html) and showed first steps towards the aggregation of information from highly multilingual news collections.
The Developers session was chaired by Adriane Rinsche (LTC).
Steven Pemberton of CWI/W3C gave the anchor talk for the Developers session, "Multilingual forms and applications". After an introduction to content management and to XForms, the talk described the use of XForms to simplify the administration of multilingual forms and applications. A number of approaches are possible, using generic features of XForms, that allow there to be one form, with all the text centralised, separate from the application itself. This can be compared to how style sheets allow styling to be centralised away from a page, and allow one page to have several stylings; the XForms techniques can provide a sort of Language-Sheet facility. Key points:
Marcos Caceres, Platform Architect at Opera Software, prepared a talk entitled "Lessons from standardizing i18n aspects of packaged web applications". Since Marcos was unable to make it to the workshop, the talk was delivered by Charles McCathieNevile, Chief Standards Officer at Opera. The W3C's Widget specifications have seen a great deal of support and uptake within industry. Widget-based products are now numerous in the market and play a central role in delivering packaged web applications to consumers. Despite this, the W3C's Widget specifications, and their proponents, have faced significant challenges in both specifying and achieving adoption of i18n capabilities. This talk described how the W3C's Web Apps and i18n Working Groups collaborated to create an i18n model, the challenges they faced in the market and within the W3C Consortium, and how some of those challenges were overcome. This talk also proposed some rethinking of best practices and relayed some hard lessons learned from the trenches. Key points:
Richard Ishida, Internationalisation Activity Lead at the W3C, presented "HTML5 proposed markup changes related to internationalisation". HTML5 is proposing changes to the markup used for internationalisation of web pages. They include character encoding declarations, language declarations, ruby, and the new elements and attributes for bidirectional text support. HTML5 is still very much work in progress, and these topics are still under discussion. The talk aimed to spread awareness of proposed changes so that people can participate in the discussions. Key points:
Gunnar Bittersmann, a developer at VZ Netzwerke, talked about "Internationalisation (or the lack of it) in current browsers". The talk addressed two common i18n problems that users of current mainstream browsers face. Users should get content from multilingual Web sites automatically in a language they understand, hence they need a way to tell their preferences. Some browsers give users this option, but others don't. Gunnar demonstrated live if and how languages can be set in various browsers and discussed the usability issue that browser vendors have to deal with: the trade-off between functionality and a simple user interface. Users should also be able to enter email addresses with international domain names into forms. That might not be possible in modern browsers that already support HTML5's new email input type. Gunnar showed how to validate email addresses without being too restrictive and raised the question: Does the HTML5 specification have to be changed to reflect the users' needs? Key points:
Jochen Leidner, Senior Research Scientist with Thomson Reuters, gave a talk "What's Next in Multilinguality, Web News & Social Media Standardization?" The talk reviewed the state of the art in multilingual technology for the Web and its adoption by companies like Thomson Reuters. According to Jochen, the Web is no longer just a protocol (HTTP) and a mark-up language (XHTML); rather, it has become an ecosystem of different content mark-up standards, conventions, proprietary technologies, and multimedia (audio, video, 3D). The static Web page is no longer the sole inhabitant of that ecosystem: there are Web applications (from CGI to AJAX), Web services, and social media hubs with huge transaction volumes that exhibit some properties of IT systems and social fabric. In this talk, he discussed some of the challenges that this diversity implies for the technology and stack, assessed the standardization situation, and speculated what the future may (and perhaps should?) bring. He concluded that for the most part, the internationalisation and localization technologies are working, and have been adopted in computer software, programming languages, and Web sites. Key points:
The Developers session on the first day ended with a Q&A that included a question about language negotiation, and a suggestion that it should be possible to identify the original language in which a page was written.
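For readers unfamiliar with language negotiation, the following sketch shows one simple way a server could pick a language variant from a browser's Accept-Language header. It is loosely modelled on the lookup scheme in RFC 4647 (try the full tag, then progressively shorter prefixes) and is not taken from any of the talks. The function names and sample headers are illustrative assumptions.

```python
def parse_accept_language(header):
    """Parse an Accept-Language header into (tag, q) pairs, best first."""
    prefs = []
    for item in header.split(","):
        item = item.strip()
        if not item:
            continue
        tag, _, params = item.partition(";")
        quality = 1.0
        params = params.strip()
        if params.startswith("q="):
            try:
                quality = float(params[2:])
            except ValueError:
                quality = 0.0  # treat a malformed weight as "not wanted"
        prefs.append((tag.strip().lower(), quality))
    # Stable sort: equal weights keep the order the user listed them in.
    prefs.sort(key=lambda pref: pref[1], reverse=True)
    return prefs


def choose_language(header, available, default):
    """Return the best available language for the header, else default."""
    by_lower = {lang.lower(): lang for lang in available}
    for tag, quality in parse_accept_language(header):
        if quality <= 0:
            continue
        if tag == "*":
            return default
        parts = tag.split("-")
        while parts:
            candidate = "-".join(parts)
            if candidate in by_lower:
                return by_lower[candidate]
            parts.pop()  # drop the last subtag and try again
    return default
```

For example, choose_language("da, en-GB;q=0.8, en;q=0.7", ["en", "it"], "it") falls through Danish, truncates en-GB to en, and returns "en". This only works if users can actually set their preferences in the browser, which is the usability issue raised in the session.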
This session was chaired by Felix Sasaki of DFKI.
Dag Schmidtke, Senior International Project Engineer at the Microsoft European Development Centre, gave the anchor talk for the Creators session with "Office.com 2010: Re-engineering for Global reach and local touch". Office.com is one of the largest multilingual content driven web-sites in the world. With more than 1 billion visits per year, it reaches 40 languages. For the Office 2010 release, authoring and publishing for Office.com was changed to make use of Microsoft Word and SharePoint. A large migration effort was undertaken to move 5 million+ assets for 40 markets to new file formats and management systems. This talk presented lessons learnt from this major re-engineering exercise for designing and managing multilingual web-sites. Key points:
Jirka Kosek, XML Guru from the University of Economics, Prague, presented "Using ITS in the common content formats". The Internationalisation Tag Set (ITS) is a set of generic elements and attributes which can be used in any XML content format to support easier internationalisation and localization of documents. In this talk, examples and advantages of using ITS in formats like XHTML, DITA and DocBook were shown. Problems of integration with HTML5 were also briefly discussed. Key points:
Serena Pastore of the National Institute of Astrophysics (INAF) was due to present the talk "Obstacles for following i18n best practices when developing content at INAF", but was unable to attend the workshop. Her slides and a summary of her talk are included here. INAF is an Italian research institute whose goals are scientific and technological research in astronomy and astrophysics. Its researchers and technologists make extensive use of the web platform to deploy content ranging from web pages (including text, images, forms, sounds, etc.) to multimedia formats (e.g. audio and video) and social objects (e.g. tweets). Moreover, the people who need to be reached are heterogeneous (from the general public to science users) and come from different countries. To reach all these potential stakeholders, web content should be multilingual, but INAF has encountered great obstacles in achieving that goal. The wide availability of authoring tools makes web publishing very easy for everyone, but authors often pay little attention to the final web product, which frequently gets in the way of developing content that is actually accessible, usable and international. INAF is trying to educate and persuade its content authors that following web standards and best practices, including those in the i18n area, adds value to content, since this is the only way to disseminate information that can reach every stakeholder. It is therefore trying to promote and disseminate knowledge of these subjects: one example is a document, derived from the W3C i18n best practices, that sets out for INAF authors and web managers the main steps needed to lay the foundations for multilingual content. Meanwhile INAF hopes that a new generation of authoring tools will be able to automate these mechanisms. The context and some issues related to achieving at least internationalisation are the following:
Manuel Tomas Carrasco Benitez, of the European Commission Directorate-General for Translation prepared a presentation about "Standards for Multilingual Web Sites". Because he was unable to attend the workshop, Charles McCathieNevile gave the talk for him. The talk argued that additional standards are required to facilitate the use and construction of multilingual web sites. The user interface standards should be a best practices guide combining existing mechanisms such as transparent content negotiation (TCN) and new techniques such as a language button in the browser. Servers should expect the same API to the content, though eventually one should address the whole cycle of Authorship, Translation and Publishing Chain (ATP-chain). Key points:
Sophie Hurst, Director of Global Corporate Communications at SDL, presented "Local is Global: Effective Multilingual Web Strategies". The talk proposed that Web on-the-go is now an everyday reality. It touches all of our lives from the moment we wake, to our commute, from work to an evening out on the town. This reality presents both an opportunity and an incredible challenge as Web content managers attempt to optimise customer engagement. Because visitors do not see themselves as part of a global audience but as individuals, the talk examined the Web content management software requirements that enable organizations to maintain central control while providing their audiences with locally relevant and translated content. From a Global Brand Management perspective, the talk examined how organizations can manage, build and sustain a global brand identity by reusing brand assets across all channels (multiple, multilingual web sites, email and mobile web sites). It also took a fresh look at automated personalization and profiling, and how Web content can be targeted for specific language requirements as well as the local interests of local audiences. Key points:
The Q&A part of the Creators session began with questions about why we need standard approaches to multilingual web navigation if companies have already figured out how to do it, whether companies use locally-adapted CSS, and how accurate geolocation is. A large part of the session was dedicated to a discussion about the value or opportunities for sub-locale personalisation. This brought in other topics such as how many people are multilingual, aspects of dealing with the social web, and approaches to crowdsourcing. For more details, see the related links.
This session was chaired by Jörg Schütz of bioloom group.
Christian Lieske, Knowledge Architect at SAP AG, talked about "The Bricks to Build Tomorrow's Translation Technologies and Processes". His co-authors were Felix Sasaki, of DFKI, and Yves Savourel, of Enlaso. Two questions were addressed: Why talk about tomorrow’s Translation Technologies and Processes?, and What are the most essential Ingredients for building the Tomorrow? Although support for standards such as XLIFF and TMX has increased interoperability among tools, today's translation-related processes are facing challenges beyond the ability to import and export files. They require standards that are granular and more flexible. Using concrete examples of the ways that various tools can interoperate beyond the exchange of files, this session walked through some of the issues encountered and outlined the use of a new approach to standardization in which modular standards, similar to Lego® blocks, could serve as core components for tomorrow's agile, interoperable, and innovative translation technologies.
Answers to the Why? included remarks related to the growing demand for language services (in particular translations), and the lack of interoperability between language-related tools. Additionally, Christian mentioned shortcomings in existing standards such as the XML Localization Interchange File Format (XLIFF) and insufficient adoption of Web-based technologies as challenges for the status quo.
The What? was summarized by the observation that static entities (such as data models) should not be the starting point for the evolution of translation technologies and processes. Rather, the right mindset and an overall architecture/methodology should be put in focus first. Detailed measures that were mentioned included the following:
The Core Components Technical Specification (CCTS) developed within UN/CEFACT, UBL and ebXML were mentioned as examples from a non-language business domain that exemplified the measures.
Dr. David Filip, Senior Researcher from the Centre for Next Generation Localisation (CNGL), the University of Limerick and the LRC, talked about "Multilingual transformations on the web via XLIFF current and via XLIFF next". David argued that content metadata must survive language transformations to be of use in the multilingual web. In order to achieve that goal, metadata related to content creation and content language transformation must be congruent, i.e. designed up front with the transformation processes in mind. To make the point for XLIFF as the principal vehicle for critical metadata throughout multilingual transformations, it was necessary to give a high level overview of XLIFF structure and functions, both in the current version and the next generation standard that is currently a major and exciting work in progress in the OASIS XLIFF TC. Key points:
Sven C. Andrä, CEO of Andrä AG, spoke about "Interoperability Now! A pragmatic approach to interoperability in language technology". Existing language technology standards give the false impression of interoperability between tools. There's a gap to bridge that is mostly about mindsets, technology and mutual consent on the interpretation of standards. A couple of players agreed to search for this mutual consent based on existing standards to bridge this gap. The talk gave some background on the issues with the use of existing standards and how Interoperability Now! is approaching this. Key points:
Eliott Nedas, Business Development Manager at XTM International, spoke about "Flexibility and robustness: The cloud, standards, web services and the hybrid future of translation technology". After introducing the current state of affairs, describing leading innovations, and also lamenting the demise of LISA, the talk moved to describing the possible future and who will be the winners and who will be the losers. The last part of the talk looked at what we can do to get standards moving internally in medium and large organisations. Key points:
Pål Nes, Localization Coordinator at Opera Software, gave a talk about "Challenges in Crowd-sourcing". Opera Software has a large community, with members from all over the world. The talk presented various obstacles encountered and lessons learned from using a community of external volunteer resources for localization in a closed-source environment. Topics included training and organization of volunteers and managing terminology and branding, as well as other issues that come with the territory. The talk also described the tools and formats used by Opera. Key points:
Manuel Herranz, CEO of PangeaMT, talked about "Open Standards in Machine Translation". The web is an open space and the standards by which it is "governed" must be open. However, according to the talk, one barrier clearly remains to making the web even more transnational and truly global. This has been called "the language barrier". The translation business model of Language Service Providers is clearly antiquated, and it is increasingly being questioned in the face of the real translation needs of web users. Here, immediacy is paramount. The talk was about open standards in machine translation technologies and workflows, supporting a truly multilingual web. Key points:
David Grunwald, CEO of GTS Translation, spoke about "Website translation using post-edited machine translation and crowdsourcing". In his talk he described a plugin for web sites that GTS has developed using the open-source WordPress CMS. According to the talk, it is the only solution that supports post-editing MT and allows content publishers to create their own translation community. The talk presented the GTS system and described some of the challenges in translation of dynamic web content and the potential rewards that their concept holds. Key points:
The Q&A dwelt briefly on crowdsourcing considerations. A comment was also made that initiatives, such as Interoperability Now, should be sure to talk with standards bodies at some point. It was mentioned that the W3C has started up a Community Group program to enable people to discuss and develop ideas easily, and then easily take them on to standardisation if it is felt that it is appropriate. For details, see the related links.
This session was chaired by Tadej Štajner of the Jožef Stefan Institute.
Dave Lewis, Research Lecturer at the Centre for Next Generation Localisation (CNGL) and Trinity College Dublin gave the anchor talk for the Machines session: "Semantic Model for end-to-end multilingual web content processing". This talk presented a Semantic Model for end-to-end multilingual web content processing flows that encompass content generation, its localisation and its adaptive presentation to users. The Semantic Model is captured in the RDF language in order to both provide semantic annotation of web services and to explore the benefits of using federated triple stores, which form the Linked Open Data cloud that is powering a new range of real world applications. Key applications include the provenance-based Quality Assurance of content localisation and the harvesting and data cleaning of translated web content and terminology needed to train data-driven components such as statistical machine translation and text classifiers. Key points:
Alexandra Weissgerber, Senior Software Engineer at Software AG, spoke next about "Developing multilingual Web services in agile software teams". Developing multilingual Web services in agile software teams is a multifaceted enterprise which comprises various areas, including methodology, governance and localization. The talk reported on Software AG's use of standards and best practices, particularly where and how they did or did not fit, the gaps the team encountered, and their strategies for bridging them effectively, as well as some of their workarounds.
Andrejs Vasiljevs of Tilde spoke about "Bridging technological gap between smaller and larger languages". Small markets, limited language resources, tiny research communities: these are some of the obstacles to the development of technologies for smaller languages, according to this talk. The presentation shared experiences and best practices from EU collaborative projects, with a particular focus on acquiring resources and developing machine translation technologies for smaller languages. Novel methods helped to collect more training data for statistical MT, involve users in data sharing and MT customisation, collect multilingual terminology, and adapt MT to the terminology and stylistic requirements of particular applications. Key points:
Boštjan Pajntar, Researcher at the Jožef Stefan Institute, gave a talk about "Collecting aligned textual corpora from the Hidden Web". With the constant growth of web-based content, large collections of text are becoming available. Many if not most professional non-English web sites offer pages translated into English and the other languages of their clients and partners. These are usually professional translations and are abundant. The talk referred to this as the Hidden Web, and presented possibilities, problems and best practices for harnessing such aligned textual corpora. Such data can then be used efficiently as a translation memory, for example as an aid for human translators, or as training data for machine translation algorithms. Key points:
Gavin Brelstaff, Senior Researcher at CRS4 Sardinia, provided the final talk in the Machines session, entitled "Interactive alignment of Parallel Texts – a cross browser experience". His co-author was Francesca Chessa, of the University of Sassari. The talk reported their experience test-driving current standards and best practices related to multilingual Web applications. Following an overview of their pilot demonstrator for the interactive alignment of parallel texts (e.g. poetic translations in/out of Sardinian), they indicated pros and cons of the practical deployment of key standards, including TEI-p5, XML, XSL, UTF-8, CSS2, RESTful-HTTP, XQuery, W3C-range. Key points:
Topics discussed during the Q&A session included the following: whether semantic tagging can assist machine translation; what are the implications of copyright when harvesting resources from the hidden Web; how does localization apply within the Scrum model; the effectiveness of matching on the hidden Web when the content of two comparable pages has gaps; can one ever expect to translate poetry, and what is the actual purpose of Gavin's tool; and will RDF semantic tagging lead to new approaches for natural language generation. For the details, follow the related links.
This session was chaired by Christian Lieske of SAP.
Paula Shannon, Chief Sales Officer at Lionbridge, presented the anchor talk for the Users session, entitled "Social Media is Global. Now What?". Paula began the session with a short video entitled "Social Media Revolution 2", which can be seen on YouTube. According to Paula, there is no question about it: companies are embracing social media and working it on a global scale. But the expansion is not without its challenges. Chief among them is how to communicate effectively on multiple platforms, in multiple languages, with a variety of cultural audiences. So this talk looked at how companies are making it happen, in what ways they are using social media globally, and what the emerging best practices are for dealing with language and culture on blogs, Twitter, community forums and other platforms. Key points:
Maarten de Rijke, from the University of Amsterdam, spoke about "Emotions, experiences and the social media". There is little doubt, said Maarten, that the web is being fundamentally transformed by social media. The realization that we now live a significant part of our lives online is giving rise to new perspectives on text analytics and to new interaction paradigms. The talk proposed that emotions and experiences are key to communication in social media: recognizing and tracking them in highly dynamic multilingual text streams produced by users around Europe, or even around the globe, is an emerging area for research and innovation. In his talk, Maarten illustrated this with a few examples derived from online reputation management and large scale mood tracking. Key points:
Gustavo Lucardi, COO of Trusted Translations, spoke about "Nascent Best Practices of Multilingual SEO". Speaking from the perspective of a Language Service Provider (LSP), he described how Multilingual Search Engine Optimisation (MSEO) is already an essential part of the language localization process. The presentation provided an in-depth look at the nascent best practices and explained the concepts behind MSEO. Key points:
Chiara Pacella, of Facebook Ireland, gave a talk about "Controlled and uncontrolled environments in social networking web sites and linguistic rules for multilingual web sites". She argued that in social networking web sites, a "controlled" component, generated by content creators, must coexist with an "uncontrolled" component that is generated by the users. Even if the latter is more difficult to control, it is the former that creates more challenges in terms of l10n/i18n. The use of a crowdsourcing approach has proven successful for Facebook, but this was achieved thanks to the implementation of standard linguistic rules that are complex and detailed but, at the same time, easily understandable by the actors involved in the translation process. Key points:
Ian Truscott, VP Products at SDL Tridion, finished off the Users session with a talk entitled "Customizing the multilingual customer experience – deliver targeted online information based on geography, user preferences, channel and visitor demographics". The talk posited that users are increasingly using social media and different devices alongside the 'traditional' web and offline media, and therefore information that was previously unavailable or inaccessible is today shaping their opinions and buying behaviour. As a result, users' expectations have changed and have raised the bar for any organization that interacts with them. They expect information to be always targeted and relevant to their needs, available in their language and on the device of their choice. The presentation sought to highlight some of the specific challenges that are emerging, as well as demonstrate the technology available to solve them. Key points:
The Q&A began with a question about what progress Lionbridge and SDL have made with regard to managing social media translations. There was a comment that ICU is working on library support for handling gender and plural variations for complex language display. And there was a question about the sources of the theories that underlie Maarten's work. For details, see the related links.
This session was chaired by Charles McCathieNevile of Opera Software.
Jaap van der Meer of TAUS presented the anchor talk for the Policy session with a talk entitled "Perspectives on interoperability and open translation platforms". This presentation gave a summary of the joint TAUS-LISA survey on translation industry interoperability and a report from the recent Standards Summit in Boston (February 28-March 1) as well as perspectives on open translation platforms from TAUS Executive Forums. Key points:
Fernando Serván, Senior Programme Officer in the Meeting Planning and Language Support Group of the FAO of the UN, presented "From multilingual documents to multilingual web sites: challenges for international organizations with a global mandate". International organizations face many challenges when trying to reach their global audience in as many languages as possible. The Food and Agriculture Organization of the United Nations (FAO) works in six languages (Arabic, Chinese, English, French, Russian and Spanish) to try to have an impact in the agricultural sector of its member countries. The presentation focused on the need for multilingual support on the Web and referred to the standards and best practices needed. It covered aspects such as the creation and deployment of multilingual content, translation needs and the possible integration of TM and MT, the availability of CAT tools, etc. Key points:
Stelios Piperidis, Senior Researcher at ILSP-"Athena" RC, gave a talk entitled "On the way to sharing Language Resources: principles, challenges, solutions". This talk presented the basic features of the META-SHARE architecture, the repositories network, and the metadata schema. It then discussed the principles that META-SHARE uses regarding language resource sharing and the instruments that support them, the membership types along with the privileges and obligations they entail, as well as the legal infrastructure that META-SHARE will employ to achieve its goals. The talk concluded by elaborating on potential synergies with neighbouring initiatives and future plans at large. Key points: