CNGL Annual Report 2009 [pdf - 6.5 MB]


Preface

The Centre for Next Generation Localisation (CNGL) is a Centre for Science, Engineering and Technology (CSET) funded by Science Foundation Ireland (SFI) and Industry Partners.

Centres for Science, Engineering & Technology (CSETs) help link scientists and engineers in partnerships across academia and industry to address crucial research questions, foster the development of new and existing Irish-based technology companies, attract industry that could make an important contribution to Ireland and its economy, and expand educational and career opportunities in Ireland in science and engineering. CSETs are expected to exhibit outstanding research quality, intellectual breadth, active collaboration, flexibility in responding to new research opportunities, and integration of research and education in the fields that SFI supports.

Science Foundation Ireland (SFI) is a key organisation in the implementation of Ireland’s National Development Plan (NDP 2007-2013) and the Strategy for Science, Technology and Innovation (SSTI) 2006-2013. A sum of €8.2 billion has been allocated for scientific research under the NDP and SSTI, of which SFI has responsibility to invest €1.4 billion. SFI will continue to invest in academic researchers and research teams who are most likely to generate new knowledge, leading-edge technologies and competitive enterprises in the fields of science and engineering.

SFI Vision

Ireland will be a global knowledge leader that places scientific and engineering research at the core of its society to power economic development and social progress.


Contents

Executive Summary
CNGL Leadership
Management Team Biosketches
Research Overview
Three Global Challenges to Localisation
Addressing the Challenges: the CNGL Research Strategy
CNGL Demonstrator Systems
Research Achievements in 2009
Strand Name: Integrated Language Technologies (ILT)
Area Co-ordinator: Prof. Andy Way
Research Overview: Integrated Language Technologies (ILT)
Strand Name: Digital Content Management (DCM)
Area Co-ordinator: Prof. Vincent Wade, TCD
Research Overview: Digital Content Management (DCM)
Strand Name: Systems Framework (SF)
Area Co-ordinator: Saturnino Luz
Research Overview: Systems Framework (SF)
Strand Name: Next Generation Localisation (LOC)
Area Co-ordinator: Reinhard Schäler
Research Overview: Next Generation Localisation (LOC)
Year 2 Demonstrator
Demonstrator Goals
Demonstrator Teams
Methodology
Bulk Localisation Workflow (BLW+) Demo Scenarios
Personalised Multilingual Customer Care (PMCC) Demo Scenarios
The Demonstration Systems Framework
Future Plans
Impacts/Industry Partners/Technology Transfer
Overview
Current Industrial Partners
Potential New Industrial Partnerships
Intellectual Property Management
Commercialisation
The Rosetta Foundation


Management and Governance
Management Overview
2009 Significant Accomplishments
Education and Outreach
Objectives
2009 Accomplishments – Internal Activities
2009 Accomplishments – External Activities
Appendix 1: People and Partnerships
CSET Research Teams
Industry Partners and Contact Names
Governance Committee and Scientific Advisory Board members
Appendix 2: Outputs
All CSET publications
All conference presentations
Workshops and conferences hosted
Invention Disclosures submitted
Patent Applications submitted or granted, and license agreements signed
Spin-out companies created
All awards and honours received
Media coverage
Other funding obtained




Executive Summary

Localisation is the industrial process of adapting digital content to culture, locale and linguistic environment.

Localisation is facing massive challenges: the amount of content to be localised is growing rapidly beyond today’s most advanced localisation and translation capacities. Small hand-held devices supporting instant multi-modal access to digital content anytime and anywhere now outnumber traditional desktop devices worldwide. Information needs to be personalised to user and task to ensure relevance and avoid overflow. Sophisticated technologies need to be integrated into complex localisation workflows supported by standards and metadata to ensure interoperability.

The combined effect of these challenges is that only a small fraction of relevant content is currently localised into a restricted set of languages in a coarse-grained fashion. Vital information is not available in many languages in large parts of the world. This results in missed business opportunities and contributes to the global “digital divide”.

CNGL is a young research centre: at the end of 2009 CNGL is two years old. Despite this, CNGL is already becoming a household name in localisation, language and content management technologies and research across the world, with a strong flow of world-class research publications, invention disclosures, patent applications and industry-academia collaborations, well beyond the original CNGL targets. CNGL has diversified its funding base with a pipeline of over €3.1m in new funding awards from a range of sources, including six new projects funded under the European Commission’s FP7 programme. CNGL is strongly developing its commercialisation model: CNGL is developing targeted applications with CNGL industry partners, the Rosetta Foundation is the first CNGL spin-out activity, and CNGL is increasingly providing services and expertise to companies outside CNGL on a project and contract basis.

The CNGL research programme is founded on the vision of enabling people to interact with content, products and services in their own language, according to their own culture, and according to their own personal needs.

This vision can only be achieved with a strong industry-academia partnership. CNGL combines four academic and nine industry partners. To achieve the vision, CNGL research strategically focuses on:

i. machine translation technologies supporting automation;
ii. speech technologies supporting access to multilingual digital content anytime and anywhere;
iii. personalisation technologies maximising the relevance of multilingual content to user and task;
iv. standards and workflows to optimally combine technological advances and human translation; and
v. demonstrator systems articulating and instantiating the CNGL vision.


CNGL Leadership

CNGL Contact Information

Centre for Next Generation Localisation
School of Computing, Dublin City University, Dublin 9
Phone: +353 1 700 6700
Fax: +353 1 700 6702
Email: info@cngl.ie

Management Team

Centre Director
Prof. Josef van Genabith
School of Computing, Dublin City University, Dublin 9
Phone: +353 1 700 6700
Fax: +353 1 700 6702
Email: josef@computing.dcu.ie

Track Leader: Systems Framework
Dr. Saturnino Luz
School of Computer Science and Statistics, Trinity College Dublin, Dublin 2
Phone: +353 1 896 3686
Fax: +353 1 677 2204
Email: luzs@cs.tcd.ie

Deputy Centre Director, Track Leader: Digital Content Management
Prof. Vincent Wade
Department of Computer Science and Statistics, Trinity College Dublin, Dublin 2
Phone: +353 1 896 1765
Fax: +353 1 677 2204
Email: vincent.wade@cs.tcd.ie

Track Leader: Next Generation Localisation
Mr. Reinhard Schäler
Department of Computer Science and Information Systems, University of Limerick, Limerick
Phone: +353 61 202 881
Fax: +353 61 202 734
Email: reinhard.schaler@ul.ie

Operations Director
Dr. Páraic Sheridan
School of Computing, Dublin City University, Dublin 9
Phone: +353 1 700 6706
Fax: +353 1 700 6702
Email: psheridan@computing.dcu.ie

Track Leader: Integrated Language Technologies
Prof. Andy Way
School of Computing, Dublin City University, Dublin 9
Phone: +353 1 700 5644
Fax: +353 1 700 6702
Email: away@computing.dcu.ie

Operations Team

Centre Administrator
Ms. Ríona Finn
School of Computing, Dublin City University, Dublin 9
Phone: +353 1 700 6707
Fax: +353 1 700 6702
Email: rfinn@computing.dcu.ie

Project Manager
Ms. Hilary McDonald
School of Computer Science and Statistics, O’Reilly Institute, Trinity College Dublin, Dublin 2
Phone: +353 1 896 4244
Fax: +353 1 677 2204
Email: mcdonah@scss.tcd.ie


IP Manager
Mr. Steve Gotz
School of Computing, Dublin City University, Dublin 9
Phone: +353 1 700 6710
Fax: +353 1 700 6702
Email: sgotz@computing.dcu.ie

Centre Secretary
Ms. Eithne McCann
School of Computing, Dublin City University, Dublin 9
Phone: +353 1 700 6700
Fax: +353 1 700 6702
Email: emccann@computing.dcu.ie

LRC Administrator
Ms. Geraldine Harrahill
Department of Computer Science and Information Systems, University of Limerick, Limerick
Phone: +353 61 202 881
Fax: +353 61 202 734
Email: geraldine.harrahill@ul.ie

Systems Administrator
Mr. Joachim Wagner
School of Computing, Dublin City University, Dublin 9
Phone: +353 1 700 6915
Fax: +353 1 700 6702
Email: jwagner@computing.dcu.ie

Education and Outreach Team

Education & Outreach Director
Prof. Harold Somers
School of Computing, Dublin City University, Dublin 9
Phone: +353 1 700 6703
Fax: +353 1 700 6702
Email: hsomers@computing.dcu.ie

Education & Outreach Manager
Ms. Cara Greene
School of Computing, Dublin City University, Dublin 9
Phone: +353 1 700 6704
Fax: +353 1 700 6702
Email: cgreene@computing.dcu.ie

LRC Manager
Mr. Karl Kelly
Department of Computer Science and Information Systems, University of Limerick, Limerick
Phone: +353 61 202 748
Fax: +353 61 202 734
Email: karl.kelly@ul.ie


Management Team Biosketches

Centre Director
Prof. Josef van Genabith
School of Computing, Dublin City University

Brief Biography

Prof. Josef van Genabith is the Director of the Centre for Next Generation Localisation (CNGL) and an Associate Professor in the School of Computing, DCU. He graduated in Electronic Engineering and English at RWTH Aachen (Germany) in 1988 and received his PhD in Linguistics from the University of Essex (U.K.) in 1993. He worked as a researcher at the University of Essex (1991–1992) and at the Institut für Maschinelle Sprachverarbeitung IMS, Universität Stuttgart (Germany) (1992–1996). He joined the School of Computing at DCU as Lecturer in 1996, became Senior Lecturer in 1999 and Associate Professor in 2002. He was Chair of the Programme Board for the B.Sc. in Applied Computational Linguistics (DCU) 1997–2001. In 2001 he became Director of the National Centre for Language Technology (NCLT) and developed the NCLT to its current 45 members, with a research grant income of over €3.5 million since 2001 (excluding CNGL). He has led Science Foundation Ireland (SFI-), Enterprise Ireland (EI-) and European Union (EU-) funded research projects and was awarded an SFI Principal Investigator grant in 2004. He became a Visiting Researcher at IBM’s Dublin Center for Advanced Studies (CAS) in 2003 and a Faculty Fellow in 2004. He has graduated 13 PhD and 6 M.Sc. by Research students. He is currently supervising 7 PhD students. He is (joint) author of more than 90 peer-reviewed international research publications (including the journals Computational Linguistics, Machine Translation, and Research on Language and Computation, and the COLING, ACL, EACL and EMNLP conferences).

Research Interests

Prof. van Genabith works on multi-lingual treebank-based deep grammar acquisition, statistical parsing and generation, machine translation and localisation.

Career Highlights

• 2007–now: Advisory Board, European Association for Computational Linguistics (EACL)
• 2007–now: Lead PI and Director of SFI CNGL CSET Award, €16.8M
• 2005–now: Faculty Fellow, IBM Center for Advanced Studies (CAS), Dublin
• 2004–2005: Visiting Scientist, IBM Center for Advanced Studies (CAS), Dublin
• 2004–2009: SFI Principal Investigator, Science Foundation Ireland, GramLab, €839K
• 2001–2008: Director, National Centre for Language Technology (NCLT), DCU
• 1997–2001: Chair of Programme Board, B.Sc. in Applied Computational Linguistics (ACL), School of Computing, DCU


Deputy Centre Director, Track Leader: Digital Content Management
Prof. Vincent P. Wade
School of Computer Science and Statistics, Trinity College Dublin

Brief Biography

Prof. Vincent Wade holds the position of Professor in the School of Computer Science and Statistics and is Research Director for Intelligent Systems. The Intelligent Systems Discipline in TCD comprises the following research groups: the Knowledge and Data Engineering Group, the Graphic Vision and Visualisation Group, the Computational Linguistics Group and the Artificial Intelligence Group, as well as the Centre for Health Informatics. The Discipline consists of over 21 academic staff and over 80 researchers and PhD students. Vincent graduated from UCD with a BSc (Hons) in Computer Science (1987) and was awarded his MSc and PhD degrees in Computer Science from TCD. In 2002 he was awarded Fellowship of Trinity College for his contribution to research in the areas of knowledge management and adaptive web technologies. Successful industry research collaborations include such companies as INTEL, IBM, Microsoft, Google, Symantec and Houghton Mifflin Harcourt.

Research Interests

Vincent’s research interests focus on Knowledge Engineering research, in particular adaptive hypermedia systems, dynamic personalisation, and the adaptive web. His research has been applied in several technology application areas such as eLearning and Management systems for next generation networks and distributed services.

Career Highlights

• Published in excess of 120 Scientific Papers in peer reviewed international conferences and journals of repute. Awarded 5 Outstanding/Best Paper Awards at International Peer Reviewed Conferences. Guest editor of 3 international journals including IEEE Internet Computing. Reviewer of more than 20 ACM, IEEE and AACE international Journals and Conferences in the last 5 years.
• Founder and Former Director of Centre for Learning Technology (1999), Director of Centre for Academic Practice and Student Learning (2004-2005)
• Conference Co-Chair for International Conference on Adaptive Hypermedia and Adaptive Web Systems Conference in Dublin 2006 (ACM), Management of Ubiquitous Communications and Services MUCS 2008 (IEEE) as part of NOMS 2008
• SFI PI award (2009-2013): AMAS, Adaptive Multimedia and Services in eLearning
• Coordinator of National Digital Learning Repository Service (2005 to 2010)
• TCD’s principal investigator for over 14 EU research projects under the EU RACE, Telematics, ESPRIT, ACTS and IST programmes. These included ADVANCE (1988–93), Guideline (1988–1993), Dessert (1993–1996), OpenLabs (1993–1996), PROSPECT (1995–2000), FlowThru (1998–2000), Virtues (1997–2000), Gestalt (1998–2000), FORM (2000–2002), Easel (2000–2002), iClass (2004–2008), Elektra (2006–2008), 80 Days (2008-2010), GRAPPLE (2008-2011). Enterprise Ireland Technology Innovation projects have included ADAPT (2005–2007) and Pudecas (2005–2007).


Operations Director
Dr. Páraic Sheridan
School of Computing, Dublin City University

Brief Biography

Dr. Páraic Sheridan is the Operations Director at CNGL. He received his B.Sc. degree (1st class honours) in Computer Applications from Dublin City University (DCU) in 1989. He then completed an M.Sc. degree in Computer Applications at DCU by research in 1991, studying the use of Natural Language Processing in Information Retrieval. This was followed in 1994 by an M.S. degree in Computational Linguistics at Carnegie Mellon University (CMU) in Pittsburgh, PA. His study at CMU was funded by Claris Corporation (Dublin), for whom he researched the use of Translation Memories in the software localisation process. He completed his doctoral work in 1998 at the Swiss Federal Institute of Technology (ETH) Zürich with a dissertation on the topic of Cross-Language Information Retrieval. While at ETH he also helped develop the SPIDER information retrieval system, which was commercialised and spun out from ETH into the EuroSpider company.

Dr. Sheridan then joined TextWise LLC, a start-up company in Syracuse, NY, which was a spin-out from Syracuse University based on research by Prof. Elizabeth Liddy in the area of Natural Language Processing and Information Retrieval. Over the course of a 10-year career at TextWise, Dr. Sheridan held a variety of positions in research management, programme management and product management, ultimately achieving the position of Chief Scientist at the company. This reflected his work on the CINDOR cross-language search system, initially as a government-funded research project which was then commercialised and marketed by TextWise in the enterprise search space. He also led the effort in adapting the CINDOR product to the needs of the U.S. Intelligence Community, developing a cross-language English-Arabic query translation module to integrate with standard enterprise search platforms.

Career Highlights

Dr. Sheridan is a regular member of the Program Committee for the Annual International ACM SIGIR Conference on Research and Development in Information Retrieval. He has also participated in the SIGIR Conference Mentoring Program to mentor PhD students in preparing paper submissions to the conference. He has presented tutorials on the topic of Multilingual Information Retrieval and Multilingual Access to Digital Information at several conferences, workshops and summer schools and is regularly invited to speak on Commercial Applications of Multilingual Retrieval technologies. He served as a member of an international working group, jointly funded by the European Commission and the U.S. National Science Foundation (NSF), to draft a white paper on ‘Multilingual Information Access for Digital Libraries’. He was a member of the European DELOS network of excellence for research in Digital Libraries and organised the DELOS workshop on Cross-Language Information Retrieval for Digital Libraries.


Track Leader: Integrated Language Technologies
Prof. Andy Way
School of Computing, Dublin City University

Brief Biography

Prof. Andy Way is the CNGL PI for the Integrated Language Technologies research track and is Work-Package Leader for ILT.1. Prof. Way obtained his BSc (Hons) in 1986, MSc in 1989, and PhD in 2001 from the University of Essex, Colchester, UK. From 1988 to 1991 he worked at the University of Essex, UK, on the Eurotra MT project. He joined DCU in 1991 and was promoted to Senior Lecturer in 2001 and Associate Professor in 2006. Prof. Way was a DCU Senior Albert College Fellow in 2002–03. He is also an IBM CAS Scientist (2003–date) and an SFI Fellow (2005–ongoing). He has had Research Grants totalling over €4.2 million, including €3.7 million since 2000 and €1.9 million in the recent EU FP7 call. He currently supervises 8 students on PhD programmes of study, all of whom are externally funded, and has in addition graduated 13 PhD and 11 MSc students. He currently holds the position of President of the European Association for Machine Translation (EAMT), and is Vice-President of the International Association for Machine Translation.

Research Interests

All areas of machine translation: rule-based MT, statistical MT, hybrid models of MT, evaluation, teaching MT etc.

Career Highlights

• Grants currently held: €1.2 million for FP7 project PLuTO (jointly with Sheridan); €320K for FP7 project Panacea (jointly with Van Genabith); €400K for FP7 project T4ME (jointly with Van Genabith); €300K for FP7 project EuroMatrix+ (jointly with Van Genabith).
• Over 130 peer-reviewed papers, including publications in the three premier journals in the field of MT and Natural Language Processing, namely Machine Translation, Computational Linguistics and Natural Language Engineering.
• Editor of the ‘Machine Translation’ journal 2007–to date.
• IBM CAS Scientist 2003–to date.
• Reviewer for all the major NLP/MT journals and conferences.
• Grants previously held: €647K grant for ‘Prospect: Probabilistic Solutions to the Problems of Computerised Translation’; €200K for project ‘ATTEMPT’ from SFI under their RFP scheme; €275K from SFI under their Basic Research programme (jointly with Van Genabith); two Enterprise Ireland Basic Research (EIBR) grants jointly worth €305K (with Van Genabith, and Monica Ward); €86K from Enterprise Ireland under their Commercialisation (Proof of Concept) programme (with Dorothy Kenny, and Minako O’Hagan); and €36,750 from SFI/Royal Irish Academy under their China-Ireland scheme.
• Elected member of EAMT Committee (2004–to date; currently President), and ILFGA Committee (2003–05).
• Organised conference: TMI-07; co-organised conference: EAMT-CLAW 2003.
• Co-organised three major EBMT workshops, at MT Summit VIII (2001), MT Summit X (2005), and most recently as a standalone event in Dublin (2009).
• Track Manager for MT at EACL-06 and ACL-07.


Track Leader: Systems Framework
Dr. Saturnino Luz
School of Computer Science and Statistics, Trinity College Dublin

Brief Biography

Dr. Luz has worked on the development of novel technologies for human-computer interfaces in the areas of computer-supported cooperative work, spoken language systems, natural language processing, dialogue management, and design support tools for multi-modal systems. He is currently the principal investigator of a Research Frontiers project aimed at enhancing support for medical team meetings, funded by Science Foundation Ireland, and supervises PhD and MSc students in the areas of natural language processing, computer-supported cooperative work, human-computer interaction and machine learning. He has also been active in the areas of multimedia and health informatics. Dr. Luz participated in a number of Irish- and EU-funded research projects, working on connected communities and dialogue systems engineering. He has served on the programme committees of several international conferences and the editorial boards of international journals. He has been a member of the Association for Computing Machinery (ACM) since 1994 and contributes regularly to the ACM Computing Reviews.

Research Interests

The theoretical bases of computer-supported collaboration, more specifically processes related to information structuring and retrieval, in scenarios encompassing multimedia data and multimodal interaction. Natural language parsing, text classification, and dialogue systems, particularly human-factors research.

Career Highlights

• Acted as Principal Investigator of the ECOMMET project on Enhanced Computing Support for Multidisciplinary Medical Team Meetings, funded by Science Foundation Ireland.
• Principal investigator of a Basic Research project on content indexing for multimedia meeting recordings, funded by Enterprise Ireland.
• Review selected as a Computing Reviews highlight; featured as profiled reviewer in acknowledgement of his contributions to that publication (2004).
• Invited talks at the University of Ulster (2002), at the German Research Centre for Artificial Intelligence (2003), at the Seminar on New Trends in Corpus Linguistics for Language Teaching and Translation Studies (Granada, Spain, 2008), and invited workshop at the University of South Africa (2003).
• Chaired the programme committee of the Irish Human-Computer Interaction Conference (2009) and co-chaired the Special Track on Supporting Collaboration among Healthcare Workers, at the IEEE International Symposium on Computer-Based Medical Systems (2008-2010).
• Served as member of the Editorial Board of Information from 2000 to 2003.
• Lecturer in Computer Science at TCD since 2001.


Track Leader: Next Generation Localisation
Reinhard Schäler
Department of Computer Science and Information Systems, University of Limerick

Brief Biography

Reinhard Schäler has been involved in the localisation industry in a variety of roles since 1987. He is the founder and editor of Localisation Focus – The International Journal of Localisation, a founding editor of the Journal of Specialised Translation (JosTrans), a former member of the editorial board of Multilingual Computing (Oct 97 to Jan 07, covering 70 issues), a founder and CEO of The Institute of Localisation Professionals (TILP), and a member of OASIS. He has attracted more than €5.5M in research funding and has published more than 50 articles, book chapters and conference papers on language technologies and localisation. He has been an invited speaker at EU and international government-organised conferences in Africa, the Middle East, South America and Asia. He is a lecturer at the Department of Computer Science and Information Systems (CSIS), University of Limerick, and the founder and director of the Localisation Research Centre (LRC) at UL, established in 1995.

Research Interests

Schäler’s main research area is the automation of localisation workflows and the application of tools and technologies to the localisation of digital content, including translation, engineering and testing. He has been researching approaches to Machine Translation (MT) and Computer Assisted Translation (CAT) systems since 1990 and has researched different approaches to Example Based Machine Translation (EBMT), which contributed to the development of TransRouter, a decision support system for project managers who need to select appropriate tools and resources for projects.

Career Highlights

• Establishment of the Localisation Resources Centre (LRC), 1995, £250K.
• Establishment of the GradDip/MSc in Software Localisation in 1997.
• EU-funded IGNITE project on Linguistic Infrastructure for Localisation: Language Data, Tools and Standards, together with four European industrial partners, total budget: €3.5M, 2005–2007.
• Invited keynotes: Localisation and Internationalisation of Software for Export, Florianópolis, Brazil (23–24 November 2004); Manufacturers’ Association for Information Technology (MAIT), New Delhi, India (08–10 December 2004); The First International Conference on Persian Script & Language Localisation, Supreme Council of ICT & Iran Telecom Research Centre, Tehran, Iran (15–16 May 2005); The IEEE Professional Communication Society, International Professional Communication Conference, Limerick, Ireland (10–13 July 2005); Schäler, R., The Irish Model – Localisation, LISA Forum Cairo, The Localisation Industry Standards Association, Cairo (05–08 December 2005).
• Establishment of The Rosetta Foundation in the summer of 2009, a not-for-profit organisation (charity) promoting equality through language and cultural diversity through access to digital knowledge and information independent of language.
• Establishment of the Dynamic Coalition for a Global Localisation Platform: Localisation4all, under the umbrella of the United Nations Internet Governance Forum (IGF).


Education & Outreach Director
Prof. Harold Somers
School of Computing, Dublin City University

Brief Biography

Harold Somers joined CNGL as Director of E&O from the University of Manchester, which he left after 30 years in the Centre for Computational Linguistics in UMIST, prior to the amalgamation of UMIST with Manchester University in October 2004. His first degree was in Linguistics from the University of Wales (UCNW Bangor), followed by an MA in General Linguistics from Manchester, and a PhD from UMIST. At UMIST since 1978, he became a full Professor in 1996 and Head of Department in 2000. During this period he had three breaks on leave of absence: at ISSCO in Geneva in 1979–80, as a Toshiba Fellow in Japan in 1986–7, and in 2005 at the Centre for Language Technology, Macquarie University, Sydney. He was editor of the journal Machine Translation, published by Springer, for ten years, and now assists his successor in that role, Andy Way of DCU. He is also on the Editorial Board of the journal Localisation Focus. He was programme chair for the EACL89, TMI99 and EAMT09 conferences and has been local organiser for several conferences, most notably Coling 2008, which took place in Manchester. He is a regular reviewer for all the conferences and journals in the field. He was until last year a member of the Executive Committee of the European Association for MT (EAMT), and has been a member of the Advisory Committee of the European Chapter of the Association for Computational Linguistics, which he helped found in 1982 and of which he was secretary from 1982–86.

Research Interests

Somers is probably best known for his work on Machine Translation (MT). He was a leading member of the British Eurotra group in the 1980s; his research interests have most recently focused on corpus-based methods, especially as applied to under-resourced (minority) languages. His current research includes work on English-to-Irish Sign Language MT. Beyond MT, his research relates to applications of language technology as a branch of Assistive Technologies in areas of healthcare provision, where it can be used to help patients with limited or no English, typically people from ethnic minorities.

Career Highlights

• 1981–1994: Worked on EU’s Eurotra MT project; UK group leader in its final years
• 1989: Local organiser and programme chair of EACL in Manchester
• 1990: Research grant £60.5K: pioneering work on TM forerunner
• 1991–4: UK research grant £140K to develop interactive multilingual avalanche warning messaging system
• 1995–8: EU research grant £127K to develop on-line multilingual employment information tool
• 1996–2006: Editor of Machine Translation
• 2004–6: UK research grant £139K to develop assistive technology for patients with limited English
• 2008: Local organiser, Coling, in Manchester
• 2009: Programme chair, EAMT (in Barcelona)


Research Overview

The CNGL vision is to enable people to interact with content, products and services in their own language, according to their own culture, and according to their own personal needs.


Research Objectives

Localisation is the process of adapting digital content to culture, locale and linguistic environment. Localisation brings products and services to markets that are otherwise inaccessible. Because of this, localisation is a core multiplier and value-adding component of the global software, services, manufacturing and content distribution industries. Currently, there are three massive challenges facing localisation:

Three Global Challenges to Localisation

Volume: the amount of content that needs to be localised into ever more languages is growing steadily and massively outstrips current translation and localisation capacities. As a consequence, only a fraction of the content that needs to be localised is localised, and usually only into a limited set of languages. Many business opportunities are missed and, what is more, lack of localisation contributes to the digital divide, with essential (e.g. health and hygiene) information, products and services unavailable in languages which currently do not promise ROI on localisation costs.

Access: traditionally, localisation assumes print or full-screen- and keyboard-based access to content. More recently, however, new and evolving generations of small devices (smart phones and PDAs) support on-the-move and instant access to digital content. Novel interaction modalities such as speech-enabled access are not supported by current localisation technologies. Traditional localisation workflows assume predictable, stable, corporate content, and localisation is viewed as a well-managed, large-scale, off-line process. Today, however, much digital content is perishable, with frequent updates and rapidly increasing volumes of user-generated content (user forums, blogs etc.). Instant access to on-line content requires a new breed of fully automated on-line localisation technologies.

Personalisation: traditionally, localisation is coarse-grained according to generic notions of locales and linguistic environments. What is localised is information. Information is most valuable if adapted to personal requirements including task at hand, level of expertise, age-group and personal preferences and expectations. Traditional localisation needs to be overlaid and integrated with fine-grained personal information cutting across traditional notions of locale and linguistic environment: the person is the ultimate locale.

Conceptually, we represent the three challenges in terms of a localisation cube (Figure 1):

Figure 1: The Localisation Cube (current localisation technologies), with axes Access, Personalisation and Volume

Current state-of-the-art localisation technologies instantiate large and well-managed localisation workflows, targeting the lower, front-right part of the localisation cube (Figure 1), with large parts of the cube remaining unaddressed.

Photo caption: Prof. Josef van Genabith, CNGL Director, presenting an overview of CNGL research


Figure 2: Organisation of the CNGL Research Programme. Labels: Unified Model; Digital Content Management; Integrated Language Technologies; Enterprise Localisation; Next Generation Localisation; Personalised Localisation; Systems Framework.

Addressing the Challenges: the CNGL Research Strategy

The challenge is to develop next-generation localisation technologies and processes that allow us to address any point in the space defined by the localisation cube (Figure 1), at configurable speed and quality, realising the CNGL vision to enable people to interact with content, products and services in their own language, according to their own culture, and according to their own personal needs. In order to overcome the combined challenges of volume, access and personalisation, the CNGL research programme is structured as shown above (Figure 2).

The programme intertwines four research tracks: to a first approximation, two of them, Integrated Language Technologies (ILT) and Digital Content Management (DCM), are basic research tracks, and the remaining two, Next Generation Localisation (LOC) and Systems Framework (SF), are more applied, integrating research tracks.

Integrated Language Technologies: ILT focuses on Machine Translation (MT), improving upon current MT technologies through integration of syntactic information in both statistical MT (SMT) and example-based MT, the development of novel hybrid MT systems, automatic domain adaptation, novel MT evaluation methods and investigating the impact of controlled language on MT. ILT features a Speech Technology component, closely intertwined with the MT research, to develop Speech Technologies that are less language dependent and can be adapted more easily to multilingual applications, and tightly coupled Speech-MT systems where the Speech system can profitably use information provided by the MT system and vice-versa. ILT also features a Text Analytics component focusing on automatic annotation of localisation-relevant meta-data, text classification (e.g. to support domain tuning of MT) and dependency annotation (e.g. to support syntax-enhanced MT).

Digital Content Management: DCM focuses on combining Adaptive Hypermedia (AH) with Information Retrieval (IR) technologies to support the CNGL personalisation agenda in a multilingual setting. In order to achieve its objectives, DCM concentrates on automatic acquisition of domain information and shallow subject ontologies from raw text, as manual construction is time consuming, expensive and difficult to scale. As information queries are often the starting points of an interaction with digital content, DCM focuses on query expansion and optimisation in multi-lingual contexts. Content needs to be sliced and recomposed to deliver personalised information responses. DCM investigates novel methods based on insights from AH and IR for personalised multi-lingual information access and delivery.

Next Generation Localisation (LOC): the technological advances from ILT and DCM need to be integrated into the workflows of the Next Generation Localisation Factory. In order to achieve optimal integration, LOC researches the whole life-cycle of digital content, including content development and design for internationalisation. Standards are a crucial factor in achieving reusable and modular components in localisation workflows, and ensure that localisation-relevant information can be exploited optimally by those components. Sophisticated language and digital content management technologies need to be evaluated and integrated into workflows and combined with existing localisation technologies (such as Translation Memories (TMs) and Terminology Management Systems) and human pre- and post-processing, including crowdsourcing. Finally, LOC develops the blue-prints for the Next Generation Localisation Factory, which will be able to respond flexibly to localisation requirements addressing different points in the localisation cube (Figure 1) at configurable speed and quality.


Systems Framework: to date, the software engineering aspects of complex systems based on language and digital content management technology are underexplored. The Next Generation Localisation Factory will be highly modular and adaptive, with workflows that can be reconfigured easily and on the fly. SF investigates rapid prototyping systems and designs supporting adaptive workflows, using web-based service architectures. User interfaces are a crucial component in such systems and novel interfaces need to be developed (e.g. to optimally support post-editing MT output). Finally, SF coordinates and implements the evolution of CNGL demonstrator systems.

Figure 3: CNGL Demonstrator Systems and Use Scenarios in the Localisation Cube. Axes: Access, Personalisation, Volume. Scenarios shown: Personalised Multilingual Social Networking; Personalised Multilingual Customer Care; Bulk Localisation.

CNGL Demonstrator Systems

Demonstrator systems are a core part of CNGL research. The demonstrators provide focal points for project cohesion and collaboration, combining technologies and teams from across CNGL.

The demonstrators are essential for cross-track and overall project evaluation and provide platforms for research and experimentation across the CNGL. They showcase CNGL technologies to the outside world. CNGL demonstrator systems instantiate core use scenarios in the space defined by the localisation cube (Figure 3).

The Bulk Localisation Workflow (BLW) scenario targets large volume localisation tasks, with and without human pre- and post-editing, familiar from current large localisation projects. The focus is on predictable corporate content, automation (MT) and the optimal integration of crowdsourcing (where applicable) in an off-line localisation process with modest levels of personalisation and standard print and full screen based access modalities.

The Personalised Multilingual Customer Care (PMCC) scenario focuses on users interacting with on-line and perishable digital corporate and user-based (product blogs) content, providing for frequent updates, speech-based interaction modalities (in addition to the more traditional modalities) and sophisticated levels of personalisation in real time interactions, without human pre- and post-processing interventions.

Finally, the Personalised Multilingual Social Networking (PMSN) scenario focuses on user generated (in contrast to corporate) and highly perishable content prevalent on social networking sites, with high levels of personalisation and full use of all access modalities, to put networking sites in contact across linguistic barriers. CNGL demonstrator systems evolve according to a five year research programme (Figure 4):

Figure 4: Evolution of CNGL Demonstrator Systems (drawing on the ILT, DCM, LOC and SF tracks)
D1 (Y1, 2008): Baseline BLW
D2 (Y2, 2009): BLW+, PMCC
D3 (Y3, 2010): BLW++, PMCC+, PMSN
D4 (Y4, 2011): ULF
D5 (Y5, 2012): ULF+
Key: BLW = Bulk Localisation Workflow; PMCC = Personalised Multilingual Customer Care; PMSN = Personalised Multilingual Social Networking; ULF = Unified Localisation Factory.


with base-lines evolving into more sophisticated systems in subsequent project years. Years four and five will see the emergence of an adaptive Unified Localisation Factory (ULF), which can instantiate any point in the localisation cube (Figure 1) on demand with configurable quality and speed. In order to realise this trajectory, workflows need to be highly flexible, reconfigurable and adaptive: all demonstrator systems (starting with year one) are based on the shared CNGL components framework (including components from the CNGL academic and industry partners) and web services based architectures, where components cluster into sub-scenarios, and sub-scenarios provide important parts of the CNGL demonstrator systems (Figure 5).

Figure 5: CNGL Component and Demonstrator Framework. Labels: CNGL Components; sub-scenarios SS1 to SS9; demonstrators D1, D2, D3; CNGL Vision.

Research Achievements in 2009

Although CNGL is a young research centre (at the end of 2009 CNGL is two years old), it is rapidly becoming a household name in localisation, language and content management technologies and research across the world, with a strong flow of research publications, invention disclosures, patent applications and industry-academia collaborations, substantially outperforming the original CNGL key performance indicator targets for 2009 (Table 1):

CNGL Research Outputs 2009                  Target   Actual
Journal papers, book chapters and books          5        8
Conference publications                         15       42
Invited talks                                    5        8
Conferences / workshops hosted                   4       20
Patent applications                              0        2
Invention disclosures                            4        4
Spin-outs                                        0        1

Table 1: CNGL Research Outputs

Detailed accounts of research highlights are provided in the sections on the CNGL research areas and the CNGL demonstrator systems below.

At the same time, CNGL has become a sought-after research partner and research leader in international research projects in the core CNGL research areas: 2009 has seen CNGL develop a diversified funding stream of over €2M in addition to SFI’s CSET award, including roles as a project partner or project leader in five European Commission FP7 projects with partners across Europe and the US (Table 2). These developments are evidence of the rapid development of the international research standing and recognition of CNGL.

Type                        EC Participation   Funding (€)   Title
FP7 STREP                   Partner                273,210   EuroMatrixPlus: Bringing Machine Translation for European Languages to the User
FP7 STREP                   Partner                299,200   PANACEA: Platform for Automatic, Normalised Annotation and Cost-Effective Acquisition of Language Resources for Human Language Technology
FP7 STREP                   Partner                303,186   CoSyne: Multilingual Content Synchronisation with Wikis
FP7 ICT-PSP                 Lead                   825,271   PLUTO: Patent Language Translation Online
FP7 Network of Excellence   Partner                379,740   T4ME NET: Technologies for the Multilingual European Information Society

Table 2: CNGL International Research Projects
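As a quick arithmetic check on Table 2, the five FP7 awards can be summed. The following is a minimal Python sketch: the figures are copied from the table and taken to be in euro, consistent with the € amounts quoted elsewhere in this report, and the variable names are illustrative only.

```python
# Funding figures copied from Table 2 (assumed to be in euro).
# The dictionary and variable names are illustrative, not part of the report.
fp7_funding = {
    "EuroMatrixPlus": 273_210,
    "PANACEA": 299_200,
    "CoSyne": 303_186,
    "PLUTO": 825_271,
    "T4ME NET": 379_740,
}

# Sum the five awards to compare against the "over €2M" figure in the text.
total = sum(fp7_funding.values())
print(f"Total FP7 funding: EUR {total:,}")  # prints "Total FP7 funding: EUR 2,080,607"
```

The total of €2,080,607 is consistent with the funding stream of over €2M described above.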


Integrated Language Technologies (ILT)

Having successfully completed the ramp-up phase of our efforts in the first year of the CNGL, ILT research is now well underway and significant progress has been made.


Strand Name: Integrated Language Technologies (ILT)
Area Co-ordinator: Prof. Andy Way

Participant Names & Affiliation

Industrial Collaborators
Dr. Declan Groves              Traslán
Dr. Fred Hollowood             Symantec
Mr. Paul McManus               SDL
Mr. Dag Schmidtke              Microsoft
Dr. Alexander Troussov         IBM

International Collaborators
Prof. Rens Bod                 St. Andrew’s, UK
Prof. Walter Daelemans         Antwerp, Belgium
Prof. Bernd Möbius             Stuttgart, Germany
Prof. Hermann Ney              RWTH Aachen, Germany
Prof. Khalil Sima’an           Amsterdam, Netherlands
Prof. Eiichiro Sumita          ATR, Japan
Prof. Antal van den Bosch      Tilburg, Netherlands
Prof. François Yvon            Paris, France

Faculty
Prof. Nick Campbell            Trinity College Dublin       ILT2
Prof. Julie Carson-Berndsen    University College Dublin    ILT2 leader
Dr. Martin Emms                Trinity College Dublin       ILT3
Dr. Christer Gobl              Trinity College Dublin       ILT2
Dr. Dorothy Kenny              Dublin City University       ILT1
Dr. Saturnino Luz              Trinity College Dublin       ILT3
Prof. Ailbhe Ní Chasáide       Trinity College Dublin       ILT2
Dr. Sharon O’Brien             Dublin City University       ILT1
Prof. Harold Somers            Dublin City University       ILT1
Prof. Josef van Genabith       Dublin City University       ILT1, ILT3
Dr. Carl Vogel                 Trinity College Dublin       ILT3 leader
Prof. Andy Way                 Dublin City University       ILT1 leader

Post-Doctoral Researchers
Dr. Anton Bryl                 Dublin City University       ILT1
Dr. Peter Cahill               University College Dublin    ILT2
Dr. Özlem Çetinoğlu            Dublin City University       ILT3
Dr. Jinhua Du                  Dublin City University       ILT1
Dr. Jie Jiang                  Dublin City University       ILT2
Dr. Patrik Lambert             Dublin City University       ILT1
Dr. Baoli Li                   Trinity College Dublin       ILT3
Dr. Yanjun Ma                  Dublin City University       ILT1
Dr. Julie Mauclair             University College Dublin    ILT2
Dr. Sara Morrissey             Dublin City University       ILT1
Dr. Sudip Naskar               Dublin City University       ILT1
Dr. Irena Yanushevskaya        Trinity College Dublin       ILT2


PhD Students
Mr. Mohamed Abou-Zleikha       University College Dublin    ILT2
Mr. Zeeshan Ahmed              University College Dublin    ILT2
Ms. Hala Al-Maghout            Dublin City University       ILT1
Mr. Pratyush Banerjee          Dublin City University       ILT1
Ms. Hanna Béchara              Dublin City University       ILT1
Mr. Sandipan Dandapat          Dublin City University       ILT1
Mr. Stephen Doherty            Dublin City University       ILT1
Mr. Hector Hugo Franco Penya   Trinity College Dublin       ILT3
Mr. Rejwanul Haque             Dublin City University       ILT1
Mr. Yifan He                   Dublin City University       ILT1
Mr. John Kane                  Trinity College Dublin       ILT2
Mr. Mark Kane                  University College Dublin    ILT2
Mr. Gerard Lynch               Trinity College Dublin       ILT3
Ms. Liliana Mamani Sanchez     Trinity College Dublin       ILT3
Mr. Udochukwu Kalu Ogbureke    University College Dublin    ILT2
Mr. Tsuyoshi Okita             Dublin City University       ILT1
Mr. Sergio Penkale             Dublin City University       ILT1
Mr. Robert Smith               Dublin City University       ILT1
Mr. Ankit Srivastava           Dublin City University       ILT1
Ms. Eva Szekely                University College Dublin    ILT2
Ms. Amalia Zahra               University College Dublin    ILT2

MSc Students
Mr. Alfredo Maldonado Guerra   Trinity College Dublin       ILT3

Research Assistants
Mr. Aengus Walton              Trinity College Dublin       ILT3

Funding

2009 funding from SFI:
CNGL (07/CE/I1142): €1,173,026

2010 expected funding from SFI:
CNGL (07/CE/I1142): €1,258,552

2009 funding from other sources:
Way (ILT1): SFI-RFP (06/RF/CMS064): €166K (to Sept. 09)
Way (ILT1): SFI-PI (05/IN/1732): €624K (to Aug. 09)
Vogel (ILT3): SFI-RFP (05/RF/CMS002): €155K (to Oct. 09)
Van Genabith (ILT3): SFI-PI (€840K)
Forcada (Walton): €124K

2010 expected funding from other sources:
Way (ILT1): EU FP7 PSP PLuTO: €825K (to Feb 2013)
Way (ILT1): EU FP7 STREP Panacea: €299K (to Dec 2012)
Van Genabith (ILT1): EU FP7 STREP EuroMatrix+: €273K (to March 2012)
Van Genabith (ILT1): EU FP7 NoE T4ME: €379K (to Feb 2013)
Somers (ILT1): EU FP7 STREP CoSyne: €303K (to Feb 2013)


Research Overview: Integrated Language Technologies (ILT)

The main focus in ILT is to improve and generate new models of machine translation (MT) systems capable of high-quality output and facilitating a range of input and output modalities (ILT1); to produce intelligent speech recognition and synthesis engines for multilingual eyes-busy, hand-busy scenarios (ILT2); and to develop novel methods of automatic annotation of monolingual and multilingual data according to well-defined linguistic and localisation criteria, in order to facilitate improved MT technology (ILT3).

Research Barriers and Methodologies to Address Them

A growing number of researchers believe that state-of-the-art phrase-based approaches to MT have reached a ceiling beyond which significant improvements will not come about unless more linguistic information can be captured by such models. In ILT1, we are enhancing our systems with syntax and semantics at all levels in the MT pipeline. In parallel, we are developing different kinds of MT systems, including hybrid and combined systems, and machine learning-based transfer systems, all with the capability to be tuned to specific domains and different modes of input and output. Similarly, in ILT2, research is addressing the shortcomings of current spoken language technologies in scaling to open domains and other languages by explicitly using fine-grained linguistic knowledge and by automating as much as possible of the data acquisition and structuring. Recognising that there are synergies between the technologies and methodologies used in MT and speech technology which can be utilised for innovations in both areas, ILT is seeking to tightly couple MT engines from ILT1 with speech recognition and synthesis engines from ILT2. In ILT3, research addresses the level of linguistic and localisation metadata required to support MT in the localisation process, as well as focusing on domain and text classification, using a variety of supervised and unsupervised approaches.

Year 2 Progress

Having successfully completed the ramp-up phase of our efforts in the first year of the CNGL, ILT research is now well underway, and significant progress has been made in ILT during year 2. In addition, the ILT1 track has been very successful in attracting external funding in the past year. The EuroMatrix+ project was already underway last year, but since then, four new projects have been approved for funding by the European Commission, with negotiations finished for all four projects, and contracts about to be signed. The total of five FP7 projects will generate around €2.1 million in income over the next three years, and see 11 new staff (10 research, 1 administrative) in place, all in DCU. Other projects are under review: a Marie Curie Initial Training Network proposal, with DCU as a partner, has been submitted to the Commission, and other FP7 proposals are in the process of being drafted.

During year 2, there have been some changes in staff, with two postdoctoral researchers leaving for positions in France, but replacements are in situ, and recently new postdocs and PhD students have arrived. In addition, we were delighted to welcome Prof. Nick Campbell as a fully-fledged Principal Investigator in ILT2 following his move to Trinity College Dublin. Prof. Campbell had previously been involved with CNGL as one of our International Collaborators.

In sum, at the end of 2009, the ILT team consists of 67 staff and students. In ILT1, the team comprises 5 PIs (3 at Professorial level, 1 Senior Lecturer and 1 Lecturer), 5 Post-doctoral researchers, and 11 postgraduate students (10 PhD, 1 MSc). With the other 10 MT-related staff and students affiliated to the CSET from the FP7 projects, in addition to staff and students in the NCLT (National Centre for Language Technology, School of Computing) and the CTTS (Centre for Translation and Textual Studies, School of Applied Languages and Intercultural Studies), we have 42 staff and students in DCU conducting research into a very wide range of topics related to MT and the wider field of translation.

For ILT2, the Speech Technology team comprises 15 researchers across two sites (UCD and TCD), with 4 PIs (3 at Professorial level, and 1 Senior Lecturer), 4 Post-doctoral researchers, and 6 postgraduate students (all at PhD level). Regarding ILT3, the Text Analytics team consists of 10 researchers, 4 of whom are PIs (1 at Professorial level, and 3 College Lecturers), 3 Post-doctoral researchers, and 3 postgraduate students (all at PhD level), all of whom will interface with other non-CNGL researchers to bring the team to around 15 in total.


The ILT group has an impressive publication record; in 2009 alone, we have had over 50 publications accepted in a range of targeted leading journals, conferences, and books, with others under review. ILT researchers have contributed significantly to both the Bulk Localisation Workflow (BLW) and Personalised Multilingual Customer Care (PMCC) Demonstrators, in terms of MT, Text Classification, Controlled Language, and Speech Technology. Others (including Speech Technology researchers) are already members of the Personalised Multilingual Social Network (PMSN) Demonstrator team which is tracking the other two Demonstrator teams.

Following last year’s world-leading performances, research highlights were (i) the world-leading positions achieved by the DCU team for their MaTrEx MT system: for English–French at WMT (EACL 2009), and for Chinese–English at IWSLT 2009; and (ii) the first CNGL patent application, which emerged from joint work between Dr. Peter Cahill (ILT2, UCD) and Dr. Jinhua Du (ILT1, DCU), with contributions from Profs. Berndsen and Way, in the area of MT facilitating Speech Synthesis. The ILT1 team also reported good performance in the NEWS-09 English–Hindi named entity translation task (at ACL 2009), and collaborated with DCM researchers on the CLEF-09 tasks. The ILT3 team achieved good performance in the CoNLL-09 shared task on multilingual semantic role labelling. A significant achievement of the ILT2 team in 2009 was the open-sourcing of the Muse Speech Technology Platform.

In 2009, three research events were organised by members of the ILT team: in ILT2, Post-doctoral researchers Dr. Cahill and Dr. Mauclair put together a speech research workshop at UCD for young researchers on 25 April 2009, attended by 30 delegates; and PhD students Srivastava, He, Okita (all ILT1) and Lynch (ILT3) organised the 12th Annual CLUKI Research Colloquium for PhD students in UK and Ireland (http://www.cngl.ie/cluki) on 23-24 April 2009, with 40 participants. These two events were co-located, and contributed significantly to the progress of the PhD researchers in all three ILT strands. Furthermore, Prof. Mikel Forcada, a visiting SFI-funded Walton scholar undertaking the task of releasing an open-source Example-Based MT toolkit, and Prof. Andy Way jointly organised the 3rd International Workshop on EBMT (http://computing.dcu.ie/~mforcada/ebmt3/), attended by over 50 international researchers, and opened by Dr. Stephen Flinter of SFI.

Other Relevant Work in the Field and How This Compares

CNGL research has extended the state-of-the-art in Machine Translation (MT). While the ILT1 group carry out work on Phrase-Based Statistical models, we have come up with novel ways of enhancing these through the incorporation of different types of syntactic and semantic information at various stages in the MT pipeline. These system improvements have demonstrated real success in large-scale MT evaluation campaigns. In addition, we continue to develop our in-house tree-to-tree MT system in the expectation that the field will eventually gravitate towards using structured models of both source and target languages. Furthermore, we continue developing novel ways of performing MT evaluation, which correlate better with human judgement than mainstream methods. The range of innovative MT applications under continuous development deploys these MT systems in well-defined scenarios where there is a real need to address users not currently supported.

CNGL speech synthesis is in line with the state-of-the-art. Both unit selection and statistical synthesis methods are being pursued and hybrid models will be investigated in the near future. These technologies will be used as platforms for research on multilingual synthesis and synthesis with speaker variation. CNGL speech recognition utilises stochastic modelling methodologies and machine learning techniques while also experimenting with the explicit integration of phonetic and linguistic information that is applicable across languages. Research efforts on CNGL recognition focus on addressing open domains with unrestricted vocabularies, on the inclusion of phonetic similarity measures for modelling of variation and variability, and on the detection of language-independent features.

CNGL Text Analytics research is also in line with the state-of-the-art. Automatic dependency annotation for large multilingual text corpora supports novel methods of probabilistic transfer rule acquisition, syntax-enhanced data-driven models of MT, and query processing in DCM. Automatic metadata annotation to support the localisation process uses a range of semi-supervised and active learning techniques. Multilingual text type and genre classification methods use probabilistic supervised learning models, while unsupervised approaches for the detection of corpus homogeneity are under active development. This research was borne out by a strong performance at CoNLL 2009 on semantic role labelling with structured resources.


Achievements

Significant accomplishments by the ILT1 group include the Best Thesis Award at the LRC Conference in September 2009, won by Dr. Hany Hassan. Two world-leading results were achieved in MT Evaluation competitions for English–French (WMT, April 2009) and Chinese–English (IWSLT, Dec 2009). Prof. Harold Somers was conference programme co-chair for EAMT-09, held in Barcelona, May 2009. Prof. Andy Way was elected President of the European Association for Machine Translation for 2009-11, and becomes the Vice-President for the International Association for Machine Translation for the same period. PhD student Daniel Galron from New York University visited the CNGL between February and May 2009 to work on MT, and Dr. David Farwell from UPC Barcelona spent part of his sabbatical leave in the CNGL during Sept-Dec 2009 working in the MT group. The work of the group has been further strengthened by the award of an SFI-funded Walton Scholarship to Prof. Mikel Forcada (University of Alicante, Spain), to work in the CNGL from June 2009 to May 2010. Dr. Johann Roturier (Symantec) was invited to give the keynote address at the Twelfth Machine Translation Summit in Ottawa, Canada in August 2009. Contract research is being carried out by ILT1 for Snap-On Diagnostics, a leading automotive diagnostics company located in Cork. The sign-language translation subgroup is setting up a cross-university, cross-disciplinary sign language special interest group with the Centre for Deaf Studies in Trinity, the School of Informatics at the Institute of Technology, Blanchardstown, and members of the Irish Deaf community. EU funding from FP7 projects has been secured in connection with ILT1 projects EuroMatrix+ (Van Genabith and Way, €273K), CoSyne (Somers, €303K), PLuTO (Way and Sheridan, €825K), Panacea (Way, €299K), and the T4ME Network of Excellence (Van Genabith, €379K).

Photo: Dr. Sara Morrissey and Mr. Shane Gilchrist discussing their work on Machine Translation for Sign Language.

In ILT2, the most significant accomplishments were twofold: (i) the open-source release of the Muse Speech Technology Platform, shown in Figure 6, which facilitates state-of-the-art research in speech technology by making it easier to compare experimental results, establishing standard and repeatable experiments, integrating support for common file formats, providing reusable components and serving as a platform for new PhD students within the project to access current technologies; and (ii) together with researchers in ILT1, the development of a same-language MT system to create alternative target sequences for the synthesis engine, paraphrasing text input so that the synthesiser can identify and synthesise the version of a sentence it can say best. This integration resulted in a CNGL patent application.

Figure 6: The Muse Speech Technology Platform. Labels: Applications; Speech Synthesisers; Recognisers; Annotations; Other; Lexica; Shell; Muse Platform; Algorithms; Experiments; New Algorithms; Work in Progress; Metadata; Proof of Concept; ESPS; Praat.
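The idea of paraphrasing the input so that the synthesiser can pick the version of a sentence it can say best can be illustrated with a toy sketch. This is not the CNGL or Muse implementation, which the report does not describe in detail: the candidate paraphrases, the word inventory and the cost function below are all hypothetical stand-ins chosen only to show the select-the-best-candidate step.

```python
# Illustrative sketch only: all names, data and the scoring rule are hypothetical
# and are not taken from the CNGL or Muse systems.

def synthesis_cost(sentence, covered_words):
    """Toy cost: the number of words missing from a (hypothetical) voice inventory."""
    return sum(1 for word in sentence.lower().split() if word not in covered_words)


def pick_best_paraphrase(candidates, covered_words):
    """Return the candidate with the lowest toy cost; ties keep the earliest candidate."""
    return min(candidates, key=lambda s: synthesis_cost(s, covered_words))


# Words the hypothetical synthesiser voice is assumed to cover well.
covered = {"please", "restart", "your", "computer", "the", "machine"}

# Alternative wordings, standing in for the output of a same-language MT system.
candidates = [
    "Please reboot your computer",
    "Please restart your computer",
    "Please restart the machine",
]

best = pick_best_paraphrase(candidates, covered)
print(best)  # prints "Please restart your computer"
```

In a real system the cost would presumably come from the synthesis engine itself rather than from word coverage; the sketch only shows how a set of paraphrases can be ranked and one selected.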


Figure 7: The proposed open-source MaTrEx system incorporating Marclator (the shaded area). Labels: Source; Marker Words; Target; Marker Words; Source Chunks; Target Chunks; Word Aligner; Chunk Aligner; ‘Distance’ Metric; Aligned Sentences; Aligned Words; Aligned Chunks; Input Sentence; Decoder; Output Translation.

In ILT3, the team enjoyed success in a shared task at CoNLL 2009 on semantic role labelling with structured resources, taking advantage of latent frequency information implicit in the ordering of senses in electronic lexical resources. In addition, text classification methods were improved by making recourse to sub-class data in positive and negative examples for categories. The ILT3 group has also made a range of classification tools available to web interfaces to support CNGL demonstration activities.

Plans

In ILT1, the push for research excellence will continue, and will be complemented by the new FP7 projects. The Panacea project aims to build a factory for MT and translation resources of use to both the academic and industrial sectors. This will be used as a testbed in the T4ME Network of Excellence, established to bring various sectors together (MT, machine learning, language technology, social computing etc.) in order to support European linguistic and cultural diversity by sharing resources and consolidating diverse efforts being undertaken by different funding agencies, language communities, corporate users, and commercial technology providers. The CoSyne project seeks to automate the synchronisation of dynamic multilingual wikis such as Wikipedia. PLuTO aims to facilitate multilingual retrieval of patent information. In conjunction with these efforts, the pre-CNGL version of the world-leading MaTrEx MT platform will be released as open-source, starting with the release of Marclator, the DCU Example-Based MT system, as seen in Figure 7.

Photo: ILT researchers discussing their work with Dr. Doug Arnold of the International Review Panel.

In ILT2, collaboration on phonotactic modelling for exemplar representations and modelling of the fundamental frequency, F0, for speech applications will be carried out with CNGL international partner Prof. Bernd Möbius. As far as speech synthesis is concerned, additional support for stochastic (HMM) synthesis methods will be added to the UCD synthesiser, along with investigations into hybrid unit selection and stochastic synthesis. This will enable the incorporation of some of the voice source characteristics being investigated in ILT2.1. Building on the same-language MT model for synthesis that was developed in 2009, further opportunities will be sought for a more tightly coupled integration of the CNGL speech synthesis engines with other MT engines of ILT1. As regards speech recognition, the intended work is threefold: (i) accounting for native and non-native speaker variation in closed domains for multiple languages; (ii) the explicit integration of linguistic information into the multi-level recognition process to facilitate extension to open domains via phonetic features and phonotactic models; and (iii) integration of the CNGL speech recognition engines with the MT engines of ILT1 in a principled and motivated way at varying points on a linguistic hierarchy in order to best utilise the knowledge acquired in and required by each engine. By the CNGL Spring Scientific Meeting, in close collaboration with DCM, the CNGL speech synthesis engine will be integrated into the CNGL Personalised Multilingual Customer Care (PMCC) Demonstrator to provide speech output for personalised presentation of information. A command-based speech input modality is also being investigated for this demonstrator. Beyond the PMCC Demonstrator, ILT2 will collaborate with DCM researchers in the context of extending the Language Trap game scenario to additional languages, with closed domain recognition and synthesis.
The integration of the CNGL speech synthesis and recognition engines with MT engines of ILT1 will also be demonstrated.

In the third year of the project ILT3 plans to:
• continue engagement with the SF track on exploiting text classification in industrially relevant scenarios of scientific interest. This includes intra-sentential and document level classification tasks; and
• explore theoretical upper bounds of classification accuracy as a function of data-set properties.

Industry Engagement

The CNGL industrial partners, especially Symantec, IBM, Microsoft, Traslán, Alchemy and VistaTEC, are all well integrated in research track ILT1. Staff from Microsoft, Traslán, Symantec and IBM have all featured in peer-reviewed joint publications from this track. Alchemy and VistaTEC feature in the CNGL EYECON project, which centres on the integration between MT and Translation Memory systems, with eye-tracking used as a predictor of the cognitive load involved in post-editing MT output. This initiative has received extra financial support from Alchemy over and above its initial commitments to the CNGL.

The ILT1 MT group is already heavily engaged in commercialisation activities. ILT1 researchers were part of the group (with ILT2) that developed the first CNGL patent application in 2009. Contract research is already being carried out, which may lead to additional project opportunities and new CNGL industrial partners. In addition, steps have been taken to create spin-off companies centred on the MaTrEx MT system, including, as part of the PLuTO FP7 project featuring Prof. Way and Dr.
Sheridan, a spin-off company to facilitate multilingual patent search.

While limited domain speech recognition and synthesis can provide speech interfaces for specific applications, much of the research in ILT2 aims to underpin open domain recognition and synthesis and to make interfaces extensible, not only to other languages but also to more natural interactions rather than restricted dialogues. This facility is of interest to industry partner SpeechStorm, who provide self-service solutions for managing customer interactions. SpeechStorm are providing speech data and are advising on prioritisation of tasks in the expansion to open domains.

In the past year, ILT3 researchers have continued to work with the datasets supplied by Symantec and VistaTEC, and have supported mutual scientific interests in industrially relevant problems.

Mr. Ankit Srivastava, an ILT PhD student at DCU


Digital Content Management (DCM)


Research Strand Overview: Digital Content Management (DCM)

The key challenge of the DCM research track is to provide a step change in content localisation focusing on three areas: user query enhancement, metadata and model development, and dynamic composition of localised content, customised for the users’ needs and context of use.

The DCM track is divided between these three work areas, called DCM1, DCM2 and DCM3:
• Enhancement of user queries based on user context information and feedback (DCM1)
• Automation and semi-automation of the models and generation of metadata required for localised content composition (DCM2)
• Support for dynamic composition of localised content, customised for the user’s need and context (DCM3)

However, the research is integrated across the work areas via combined prototypes and experiments. Key prototypes are integrated across research tracks (i.e. ILT, LOC, SF) via Demonstrator systems. Principally the DCM prototypes are being used within the Personalised Multilingual Customer Care Management Demonstrator (PMCC) and will be used within the Personalised Multilingual Social Networking Demonstrator (PMSN).

Research Barriers and Methods to Address Them

With the increasing volume of digital content and the diversity of devices upon which localised content needs to be rendered, it is becoming impossible to manually annotate, slice and compose appropriate localised content. In addition, localisation needs to not only adapt to suit specific corporate localisation requirements, but also satisfy individual user needs for localised translations. The three principal areas of DCM research relate to the challenges of locating and retrieving content; of modelling knowledge in a structured, reusable way; and of supporting the user by harnessing adaptivity to give users significantly improved access to the information they need.
A central theme running through all of these challenges is the need to provide the information in a form that is tailored to the user’s requirements and preferences, and which includes not only the direct response to their query, but key supporting information that the user might need to achieve their goal.

The challenge for DCM within Next Generation Localisation is to enhance and combine key aspects of Adaptive Hypermedia (AH) and Information Retrieval (IR) research to provide techniques, technology and prototype systems, to implement advanced content retrieval, slicing and adaptive composition of multilingual digital content. The DCM1 work package addresses the issues of IR research directly. There is a large community of research involved in IR, particularly on web data. DCM1 research includes the application of cross-lingual techniques to permit users to gain access to information not in their native tongues. Personalisation in IR is addressed both in the use of user modelling techniques to alter the behaviour of IR systems, and also through the creation of hybrid Adaptive IR systems, which combine research in Adaptive Hypermedia with traditional IR. The focus of DCM2 is on the metadata required by systems to provide this more intelligent behaviour. DCM2 includes work on generating, managing and linking structured knowledge in the form of ontologies. The main focus of this work is in addressing the shortcomings in current work on creating and sharing metadata between different intelligent systems. Finally, DCM3 focuses directly on the improvement of multimodal Adaptive Hypermedia. This work includes applying AH techniques in tandem with multi-modal approaches, allowing, for example, for speech synthesis to be used where it improves the behaviour of the system.

Prof. Vincent Wade and Dr. Saturnino Luz of TCD with Prof. Mike McTear of the University of Ulster.


Year 2 Progress

The first year of DCM was focused on ramp up of research staff. Year 2 has focused on progress in the three research areas of IR, AH and knowledge management. The focus of Year 2’s activities has been on creating research systems which contribute to the state-of-the-art, and which can be integrated at different stages into the demonstrator systems.

Query Enhancement Personalisation (DCM1)

For query enhancement, the Year 2 effort has progressed to research and development of new techniques and models for the personalisation and contextualisation of queries. This is being achieved by enhancing the original user-generated queries to produce more accurate and informed queries, which should lead to more effective responses for the user. The challenge in this area of query adaptation is to capture appropriate user/context models for the user’s queries. This can provide a genuine empowerment of the user by facilitating better query formation and execution without the user needing to expend effort in query optimisation. Several techniques are being prototyped including relevance feedback (direct and indirect) as well as direct user controls.

Key prototypes and experiments in this area include:

Design and Coordination of a Personalised Multilingual Information Retrieval Demonstrator Scenario: This scenario examines multilingual information access and presentation techniques in a customer support context based on Microsoft Office help documentation.

Automatic Methods of Query Expansion: In order to improve automatic methods for query expansion, a study of user behaviour in query reformulation based on query logs has been conducted. Results of this study will allow for a better personalisation of search and for improving automatic query expansion. Also in this area a machine learning approach has been developed and applied to automatically classify blind feedback terms (e.g. query expansion terms) into “good” and “bad” terms.
Using the classifier to select only “good” terms significantly improves mean average precision (MAP) for German and English IR experiments compared to experiments without query expansion and to experiments using traditional blind relevance feedback.

Investigation of Indexing Methods: Other research to improve IR effectiveness includes investigating different indexing strategies (e.g. indexing of sub-words), comparing state-of-the-art IR models (e.g. language modelling and BM25), and applying document expansion and document reduction.

Latent Document Re-ranking: A novel document re-ranking method was developed based on Latent Dirichlet Allocation (LDA) which exploits the implicit structure of the documents with respect to original queries. Rather than relying on graph-based techniques to identify the internal structure, the approach tries to find the latent structure of “topics” or “concepts” in the initial retrieval set. The distance between the queries and the initial retrieval results is then computed from the inferred latent semantic information. Empirical results demonstrate that the method achieves significant improvements over various baseline systems.

Structured Document Search: Many documents consist of multiple fields describing different facets of the document. Searching collections of such documents can be improved by taking account of these fields as part of the retrieval process. This work has investigated structured document search in the context of patent retrieval. This work is now being extended to search social media characterised by content fields being augmented by social annotations by multiple users of the content.

Initial IR Experiments: Experiments have been conducted for different search scenarios (e.g. image retrieval, patent retrieval, ad-hoc IR) and evaluated in international evaluation competitions/events (e.g. CLEF-IP, WikiMM, GikiCLEF, LADS, TEL) or on standard benchmark data. The search situations can be characterised by different types of questions and/or documents (e.g.
user-generated tags for images; long, structured documents in patent search; short structured metadata for bibliographic data; medium length newspaper articles without additional metadata). Also, the IR experiments have been conducted for different monolingual settings (for English, German, French, Hindi, Bangla, Marathi) as well as cross-lingual IR with different language pairs.

Investigation of Document Expansion: The mismatch between documents and queries can be a significant problem for effective information retrieval. This is particularly the case for short documents or documents described only by a limited set of keywords. This research is exploring methods to use external document resources to expand the set of words used to describe a document to increase the chance of matching between query words and the words in relevant documents. The approach developed in our work has initially been shown to be effective on text-based image search applications where photographs are described only by a limited set of manual keywords. This technique has potential applications for automatically and semi-automatically developing document metadata.
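The document expansion idea described above can be illustrated with a minimal Python sketch. It is not the method used in the DCM1 experiments, whose details the report does not give: the overlap-based similarity, the tiny stopword list, the function names and the example texts are all assumptions chosen for illustration.

```python
from collections import Counter

# Hypothetical, deliberately tiny stopword list (an assumption for this sketch).
STOPWORDS = {"a", "and", "at", "for", "in", "the"}


def tokenize(text):
    """Lower-case, split on whitespace, drop stopwords and non-alphabetic tokens."""
    return [t for t in text.lower().split() if t.isalpha() and t not in STOPWORDS]


def expand_document(keywords, external_docs, top_docs=2, top_terms=3):
    """Return the keywords plus frequent terms from the most similar external texts.

    Similarity is plain term overlap (an assumption); a real system would
    more likely rank the external documents with an IR model.
    """
    doc_terms = set(keywords)
    scored = []
    for text in external_docs:
        tokens = tokenize(text)
        overlap = len(doc_terms & set(tokens))
        if overlap:
            scored.append((overlap, tokens))
    # Highest overlap first; ties keep their original order.
    scored.sort(key=lambda pair: pair[0], reverse=True)

    counts = Counter()
    for _, tokens in scored[:top_docs]:
        counts.update(t for t in tokens if t not in doc_terms)
    new_terms = [term for term, _ in counts.most_common(top_terms)]
    return list(keywords) + new_terms


# A photograph described only by two manual keywords (invented example data).
keywords = ["harbour", "boats"]
external = [
    "fishing boats moored in the harbour at sunset",
    "fishing boats and yachts in the harbour marina",
    "a recipe for apple pie",
]
expanded = expand_document(keywords, external)
print(expanded)
```

A query for "fishing" would not match the original two keywords but does match the expanded description, which is the effect the paragraph above describes.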


Model and Metadata Generation for Personalisation and Localisation (DCM2)

For the second DCM area of research, progress has focused on the research into (semi) automatic generation and management of domain models, metadata and semantic descriptions of digital content. The research has yielded early prototypes in relation to tools for annotation of content, analysis of content, model generation and support for metadata annotation. Additionally research has progressed on the mapping between different (multilingual) content domain models or ontologies, and early prototypes have been developed.

Design and Coordination of Demonstrators Scenario (for Demonstrator 2): DCM has led the definition and initial designs of scenarios for the PMCC Demonstrator for CNGL (described later in the report). This demonstrator consists of several interlinked scenarios illustrating future localisation and personalisation of Customer Care using corporate information. DCM is leading four of these scenarios for the CNGL demonstrator. In addition DCM has, in close cooperation with industry partners, identified and gathered large corpora of ‘Customer Care and Product Information/Documentation’ from multiple sources, over which initial prototypes will be evaluated. One such scenario is the Annotation scenario which is examining semi-automatic and manual, crowd sourced, annotation techniques as part of a pipeline covering content sourcing and preparation to adaptive delivery. This pipeline will function as a service-based architecture utilising linked data techniques. The content for this scenario will be drawn from Symantec product support documentation and user forums.

Development of Index/Browser for Google 1T n-gram Corpus: DCM2 uses the Google 1T database of web n-grams as a large text corpus for developing information extraction tools and techniques.
An initial application, called Idiom Savant (originally called Google Goggles, before Google applied that name to another product), provides a structured index for these n-grams that allows fast retrieval of matches for given patterns. Layered on top of this index, a variety of semantic filters have been constructed that allow users to perform semantic information retrieval. ‘Idiom Savant’ is now a functioning creative thesaurus, one that allows users to retrieve phrases that concisely express a given meaning. In addition, Idiom Savant supports the creation of user-defined semantic categories, and what cognitive psychologists call “ad-hoc categories”. This is a recent development that is intended to be leveraged further, to allow users to organise their own semantic categories into ontologies, and to share their categories with others.

Development of Initial Relationship Extraction/Validation System: Using the above index of the Google n-grams, an online system called Mondrian has been developed that extracts basic relationships between strongly associated ideas. Mondrian uses these relationships to support a degree of analogical reasoning.

Figure 8: Annotation Pipeline
Diagram labels: Harvester, Slicer & Content Store, Annotation Store, Annotation Client, Management Tool, Annotators, Forum Site, Documentation, Knowledgebase.
1. Web content is harvested
2. Content is separated into slices
3. Metadata entries created for slices
4. Slices sent for annotation on demand
5. Add annotations
6. Store annotations
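The kind of structured n-gram index that Idiom Savant is described as providing, with fast retrieval of matches for a given pattern, can be sketched in a few lines of Python. This is an illustration only: the positional posting lists, the `*` wildcard syntax, the class name `NgramIndex` and the counts are assumptions, not details of the actual system or of the Google 1T data.

```python
from collections import defaultdict


class NgramIndex:
    """Toy positional index over n-grams with single-word '*' wildcards."""

    def __init__(self):
        self.ngrams = []                   # list of (tokens, count)
        self.postings = defaultdict(set)   # (length, position, word) -> n-gram ids

    def add(self, text, count):
        tokens = tuple(text.lower().split())
        ngram_id = len(self.ngrams)
        self.ngrams.append((tokens, count))
        for pos, word in enumerate(tokens):
            self.postings[(len(tokens), pos, word)].add(ngram_id)

    def match(self, pattern):
        """Return (ngram, count) pairs matching the pattern, most frequent first."""
        parts = pattern.lower().split()
        fixed = [(pos, w) for pos, w in enumerate(parts) if w != "*"]
        if fixed:
            # Intersect the posting sets of every non-wildcard position.
            sets = [self.postings.get((len(parts), pos, w), set()) for pos, w in fixed]
            ids = set.intersection(*sets)
        else:
            # All-wildcard pattern: every n-gram of the right length matches.
            ids = {i for i, (t, _) in enumerate(self.ngrams) if len(t) == len(parts)}
        hits = sorted((self.ngrams[i] for i in ids), key=lambda h: (-h[1], h[0]))
        return [(" ".join(tokens), count) for tokens, count in hits]


# Invented example entries; the counts are not real corpus frequencies.
index = NgramIndex()
index.add("as cold as ice", 900)
index.add("as cold as stone", 300)
index.add("as hard as stone", 500)
index.add("cold as ice", 1200)

similes = index.match("as cold as *")
print(similes)
```

Semantic filters of the kind mentioned above would sit on top of `match`, keeping only those hits whose wildcard position belongs to a chosen category.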


In addition, a system has been developed to serve as an online editor, called EdMond, to allow users to contribute to the Mondrian knowledge-base. Unlike other volunteer-driven knowledge-acquisition efforts (e.g. ConceptNet, MindPixel), EdMond uses corpus-derived constraints to ensure that users are presented with sensible options for change, rather than allowing users free rein to make their own changes.

Acquisition of Large-Scale Test-Set to Evaluate Extraction System: Using EdMond, we have acquired a large body of relationships (approx. 20,000) that will, in the next step of the project, allow us to test various hypotheses regarding our approach to extraction.

Acquisition of Text Corpora of Product Reviews (Cameras + Others): Three corpora of product reviews have been harvested from the web, for digital cameras, laptops and mobile phones. DCM researchers plan to use these review corpora to develop and test techniques for extracting ontological information about products from online reviews of those products. DCM2 has also constructed an initial OWL ontology for digital cameras, to act as a gold standard for ontology acquisition techniques.

Semantic-Oriented Cross-Lingual Ontology Mapping (SOCOM): This framework was designed and implemented. The framework aims to facilitate ontology mappings that are carried out in multilingual environments. The first experiments have been successfully completed using English, Chinese and French ontologies.
Results from these experiments were accepted for publication and presented at the prestigious Asian Semantic Web Conference (ASWC) in December 2009.

Change Operator Calculus for Customisable Ontology Evolution: Research commenced on an effective ontology change management approach. Progress to date has involved the development of a customisable layered, pattern-based change operator framework for ontology evolution and content management.

Dynamic Composition for Content Personalisation and Localisation (DCM3)

In this third area, progress in DCM has focused on developing a baseline system to support the automated composition of open corpus digital content. This system incorporates TCD’s Adaptive Engine (AE). A second aspect of the research in this area was to put in place a baseline personalisation infrastructure to adaptively inform external systems. The final aspect of research was to provide the initial experimentation in the Multilingual Social Networking infrastructure and services which will become part of the third demonstrator scenario in CNGL. The multilingual adaptive services researched are being utilised within the CNGL Education & Outreach programme to stimulate interest and interaction with CNGL technologies and research.

Key prototypes and experiments in this area include:

Design and coordination of the Adaptive Personalised Multi-Lingual, Multi-Modal Customer Care Demonstrator Scenario (as part of the PMCC Demonstrator).
This scenario examines the use of speech synthesis, machine translation technology and personalised presentation techniques to support a customer in a hands-busy, eyes-busy situation. DCM’s research has focused on the customisation of the AE adaptive engine to integrate the personalised delivery of speech with multimedia content and multilingual text (content).

Language Trap Game: DCM developed a personalised game engine for supporting multilingual game interaction. The first game developed for the personalised game engine is called The Language Trap. It is an adaptive, educational game for teaching German to post-primary school students and has been trialled in several schools and classrooms in Dublin. The game is an adaptive dialogue-based adventure for secondary school students, suitable for study in the Leaving Certificate German Oral course. Evaluation results have been highly successful and this research is being used as part of the CNGL E&O activity.

Content-Driven Change Discovery and Impact Determination in Evolving Ontologies: As content changes impact on the structure and semantics of target and dependent ontologies, this research has focused on the detection of content-driven changes on domain ontologies and the determination and analysis of impacts of the changes on target ontology and content.

Neil Pierce, PhD student at TCD, demonstrating the Language Trap game for learning German


Social Networking (Personalised, Multilingual): MyIsle, part of the Personalised Multilingual Social Networking (PMSN) demonstrator platform, has been designed with three core parts:
• Applications: showcase the CNGL research areas as applied to social networking utilities – impact on demonstrator series and E&O activities;
• Development Zone: using open service APIs to make available certain CNGL functionalities to sanctioned developers under the alt|start incubation programme – impact on E&O and commercialisation activities;
• MyIsle.org: a configurable social networking application for cultural diversity – impact on E&O activities.

Social Networking Based Multilingual Services: Progress was also made in the development of several social networking based multilingual services. In particular an automated Twitter translation service called Twanslate, which will be part of the MyIsle social networking platform, was developed. Twanslate, using multilingual translation and personalisation technologies, allows users of the Twitter social networking site to translate tweet streams dynamically into multiple languages and rate the quality of the translation. Beta testing will commence as part of next year’s DCM3 research.

Social Networking Analysis and Recommendation: Another significant development for the year was in the area of multilingual social networking research. In particular the research was performed via international collaboration between the DCM group in TCD, Dr. Alexander Troussov (CNGL Industry Partner IBM) and Prof. Peter Brusilovsky (CNGL External Advisor from the University of Pittsburgh). The research undertaken focused on the application of spreading-activation theory to collaborative tagging on the social web. The results of this work were published in the proceedings of the ACM Workshop on Recommender Systems and the Social Web.
Denis Parra, a postgraduate student from the PAWS group at the University of Pittsburgh, undertook the implementation work as part of an internship with TCD and was supervised by DCM researchers.

Other Relevant Work in the Field and How This Compares

The CNGL DCM IR research has extended the state-of-the-art in several areas. In the first instance, novel methods for achieving effective query expansion have been trialled. The DCM IR research encompasses a range of systems and research, including recently completed work that will provide the basis for a novel system that can personalise queries and alter how results are displayed based on the cultural attributes of a user, and on their specific personalisation needs. DCM research is at or beyond the state-of-the-art in many of the main areas of IR research, including indexing, query expansion, latent result re-ordering and personalised multilingual retrieval.

Knowledge modelling and management research in DCM addresses the state-of-the-art in the creation, management and interlinking of structured knowledge in new ways. This includes the work on n-gram and corpus-derived constraints for ontology generation, as well as the investigation of semi-automatic and automatic methods for eliciting information from documents for ontologies. Cross-lingual ontology mapping is a novel application of ontology mapping, with a focus on the complex implications for ontologies which differ not only in content and structure, but also in language. Further state-of-the-art work is being undertaken in the area of generating a model which permits change management and propagation in ontological knowledge. This has been identified as a key requirement for large-scale adoption of ontologies in complex knowledge environments.

Building on extensive experience in state-of-the-art research in adaptive hypermedia and personalisation, the DCM3 work package in particular has been improving and expanding the functionality of the TCD Adaptive Engine (AE).
This system is a general adaptive engine, capable of rendering complex adaptive presentations from a wide variety of heterogeneous models. The AE system is being developed to work in collaboration with ILT MT and Speech Synthesis technologies, to push the boundaries of research on multi-modal adaptive presentation. Further work is being undertaken on the use of open corpus techniques to supply Adaptive Hypermedia systems with content sourced from the Internet. This addresses a key deficiency in previous adaptive hypermedia research: the requirement for richly-described, hand-created content for each specific adaptive system. The issue of higher-volume adaptivity has further been addressed in a novel approach to blending DCM1 IR techniques, DCM2 knowledge management and DCM3 adaptivity to create a flexible, high-volume adaptive-IR system that presents users with a rich adaptive experience.
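The latent re-ordering of results mentioned in this section (see Figure 9: Latent Re-ranking IR System) can be sketched as follows. The report states only that a distance between queries and initially retrieved documents is computed from latent topic information; the Hellinger distance, the linear interpolation with the initial score, the `weight` parameter and the pre-computed topic distributions below are all assumptions made for this illustration, not the DCM method.

```python
import math


def hellinger(p, q):
    """Hellinger distance between two discrete distributions (0 = identical, 1 = disjoint)."""
    return math.sqrt(sum((math.sqrt(a) - math.sqrt(b)) ** 2 for a, b in zip(p, q)) / 2)


def rerank(initial, query_topics, doc_topics, weight=0.5):
    """Re-rank an initial result list using topic-space similarity.

    initial      -- list of (doc_id, retrieval_score), scores assumed positive
    query_topics -- topic distribution of the query (e.g. inferred by LDA)
    doc_topics   -- dict mapping doc_id to its topic distribution
    weight       -- share given to the latent similarity (hypothetical parameter)
    """
    top = max(score for _, score in initial)
    rescored = []
    for doc_id, score in initial:
        similarity = 1.0 - hellinger(query_topics, doc_topics[doc_id])
        combined = (1 - weight) * (score / top) + weight * similarity
        rescored.append((doc_id, combined))
    return sorted(rescored, key=lambda pair: pair[1], reverse=True)


# Invented example: three retrieved documents and two latent topics.
initial = [("d1", 10.0), ("d2", 9.0), ("d3", 4.0)]
query_topics = [0.9, 0.1]
doc_topics = {"d1": [0.1, 0.9], "d2": [0.8, 0.2], "d3": [0.9, 0.1]}

reranked = rerank(initial, query_topics, doc_topics)
print([doc_id for doc_id, _ in reranked])
```

In this toy example the document that scored highest initially is demoted because its topic mixture is far from that of the query, which is the intended effect of re-ranking on latent structure.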


Figure 9: Latent Re-ranking IR System
Diagram labels: Documents, Collection Reader, Pre-Processing, Indexing, Index, Queries, Retrieval, IR Engine, Ranked Results, Latent Document Re-Ranker, Re-Ranked Results.

Achievements

DCM has so far completed one invention disclosure and is in preparation for two further invention disclosures in Year 3. The invention disclosure already filed, on ‘latent document re-ranking’, was made on behalf of DCM TCD researchers in the DCM1 area (query expansion).

Individuals working in DCM have also had notable international recognition of their work: Prof. Vincent Wade was invited to give the keynote speech at the User Modelling, Adaptation and Personalisation (UMAP) 2009 conference, held in Trento, Italy. UMAP is the major international conference held annually focused on Personalisation and was formed from merging multiple international conferences into one flagship conference.

DCM had one successful PhD awarded in the year 2009 (Séamus Lawless) and one PhD submission which is awaiting its viva (Alexander O’Connor).

DCM has had significant success in the ad-hoc and log tracks at the Cross Language Evaluation Forum (CLEF) 2009, held in Corfu, Greece. The research presented at this event was the result of collaboration between DCM1 team members in both TCD and DCU. DCM conducted experiments into several monolingual and bilingual tasks using the TCD IR system. The results ranked among the top 5 of all CLEF participants (13 groups, 10 countries, 231 runs) in the following language combinations: English-German bilingual (2nd place), German-English bilingual (3rd place), German-French bilingual (4th place), English monolingual (3rd place) and German monolingual (5th place). The TCD-DCU joint IR system ranked as follows: German monolingual (4th place), English monolingual (5th place), German-English bilingual (4th place).

DCM Post-Doc Dr. Seamus Lawless of TCD in discussion with Prof.
Jaime Carbonell of Carnegie Mellon University.

DCM also published many papers in both international conferences and journals, including ACM Hypertext Conference 2009, UMAP 2009 and SIGIR 2009. DCM researchers are also part of the organising and programme committees on leading international conferences, e.g. SIGIR, UMAP, HyperText, IEEE NOMS, IEEE IM.


Plans

With the second year of the project, the Personalised Multilingual Customer Care Use Scenario (Figure 10) has provided a key framework for advancing the execution of DCM research. The PMCC system has at its heart the challenges of DCM research, and provides a set of industry-driven use cases for the application of CNGL technologies. The development of the four DCM-led PMCC demonstrator scenarios will be supplemented with additional industrial demonstrator scenarios as necessary, as well as a platform for showing the additional contributions to the state-of-the-art which DCM research will create during the year.

Preparations will begin at the beginning of Year 3 for the next iteration of the PMCC Use Scenario, which will include improved and deepened collaboration between CNGL work packages, industrial partners and other research. A key theme which has been added to the PMCC, and which will continue in future work, is the intention to leverage crowd-sourcing and user-generated content for personalised content.

Industry Engagement

CNGL DCM has engaged strongly with the industrial partners. There have been specific collaborations with Symantec and Microsoft under the framework of the Personalised Multilingual Customer Care demonstrator scenarios, with a view to incorporating real-world use cases and customer care content into the vision of next generation digital content management. Further collaboration has been undertaken with IBM, with a specific view towards the IBM LanguageWare technologies, including the LanguageWare workbench product and IBM Galaxy analysis tools. DCM has also continued to present to potential new CNGL industry partners, working with the centre management to attract further industry involvement.
There are extensive plans to expand and deepen the industrial collaborations in DCM, including extending new scenarios for demonstration based on use cases driven from industrial needs, as well as locating new potential industrial partners who have specific digital content needs that can only be addressed by the CNGL DCM research team.

The DCM team is also in the course of planning the development of the Personalised Multilingual Social Network Use Scenario (PMSN) through core research in social networks, as well as through the MyIsle platform.

Figure 10: Personalised Multilingual Customer Care (PMCC) process
Diagram labels: Query Analysis, Retrieval, Translation, Composition, Adaptive Presentation, User Model.
1. The user registers a query with the CNGL-enabled site, through a chat or forum post
2. The user’s query is analysed to determine their information need and preferences from the user model
3. Content relevant to the user’s need and their preference is retrieved
4. Retrieved content is translated and localised as necessary
5. A personalised multi-modal presentation is then generated for the user based on their query
6. The presentation can include speech-based responses corresponding to the user profile


Systems Framework (SF)


Strand Name: Systems Framework (SF)
Area Co-ordinator: Saturnino Luz

Participant Names & Affiliation

Industrial Collaborators
Mr. Takeshi Fukunaga; Dai Nippon Printing
Mr. Donncha Ó Cróinín; Traslán
Mr. Dag Schmidtke; Microsoft
Dr. Alexander Troussov; IBM

International Collaborators
Dr. Alistair Edwards; University of York, UK
Dr. Masood Masoodian; The University of Waikato, New Zealand
Prof. Michael McTear; University of Ulster, UK
Prof. Chris Mellish; University of Aberdeen, UK

Faculty
Prof. Julie Carson-Berndsen; University College Dublin; SF1
Dr. Gavin Doherty; Trinity College Dublin; SF1, SF2
Prof. Josef van Genabith; Dublin City University; SF2
Dr. David Lewis; Trinity College Dublin; SF1, SF2 leader
Dr. Saturnino Luz; Trinity College Dublin; SF1 leader, SF2
Mr. Reinhard Schäler; University of Limerick; SF1, SF2
Prof. Vincent Wade; Trinity College Dublin; SF1, SF2
Prof. Andy Way; Dublin City University; SF2

Postdoctoral Researchers
Dr. Kevin Feeney; Trinity College Dublin; SF2
Dr. Nikiforos Karamanis; Trinity College Dublin; SF1
Dr. John Keeney; Trinity College Dublin; SF2
Dr. Ielka van der Sluis; Trinity College Dublin; SF1
Dr. Dominic Jones; Trinity College Dublin; SF2

PhD Students
Mr. Stephen Curran; Trinity College Dublin; SF2 (part-time)
Mr. Zohar Etzioni; Trinity College Dublin; SF2
Mr. Dominic Jones; Trinity College Dublin; SF2
Mr. John McAuley; Trinity College Dublin; SF2
Mr. John Moran; Trinity College Dublin; SF2
Ms. Anne Schneider; Trinity College Dublin; SF1
Mr. Stephan Schlögl; Trinity College Dublin; SF1
Mr. Christos Tsarouchis; Trinity College Dublin; SF2

Research Assistants
Mr. Stephen Curran; Trinity College Dublin; SF2 (part-time)
Ms.
Ilana Rozanes Trinity College Dublin SF1Funding<strong>2009</strong> funding from SFI:<strong>CNGL</strong> (07/CE/I1142): €454,457SFI SRC, Title: Federated Autonomic Management End-toend(FAME), Awarded Dec 2008, David Lewis supervisingone funded PhD student.2010 expected funding from SFI:<strong>CNGL</strong> (07/CE/I1142): €531,2162010 expected funding from other sources:Awaiting funding review on EU FP7 European ResearchCouncil Starter Grant <strong>2009</strong> application by David Lewis,Title: Self-Management in Online OrGanisations (SMOOG),budget €1.4MAwaiting funding review on EU FP7 IST STREP, Title:Engineering Trustworthy and Privacy Aware Mobile WirelessSensor Network Infrastructures (eTaPAS), budget €250K40 Centre for Next Generation Localisation (<strong>CNGL</strong>)


Research Strand Overview: Systems Framework (SF)

The SF track investigates and develops methods to ensure that Next Generation Localisation systems can be successfully integrated and meet high standards of usability. SF’s goals can be described in terms of two overall objectives: a research objective and an integration objective.

The research objective is to develop novel interaction design techniques and system support for the development of speech and language enabled applications. This will be achieved through the investigation and evaluation of: instrumentation of localisation processes to monitor process quality improvement; human factors assessment in localisation processes and ILT-based systems in general; analysis of contextual and culture-specific factors that affect the design of speech and language enabled systems; assessment of modality combinations and their impact on usability of multi-modal systems; and management of the quality control process by stakeholder communities.

From a systems architecture and integration perspective, the methodologies employed in order to address the needs of CNGL involve enabling rapid, iterative and instrumented integration of industrial software and academic research prototypes and supporting their evaluation through provision of:
• A Software Integration Platform based on open standards, especially for web services (WSDL and BPEL) and localisation data (XLIFF, TMX, TBX);
• Guidelines and tools for developing workflows and applications using this platform.

The integration objective is to produce a system services architecture and a system design methodology that supports the integration of linguistic technologies, localisation workflow and digital content management. This will enable rapid, iterative and instrumented integration of industrial software and academic research prototypes and support their evaluation through provision of: a software integration platform based on open standards, guidelines and tools for developing workflows and applications using this platform, and methods for iterative prototyping and user studies.

The Fundamental Research Barriers and Methodologies to Address Them

Although the interaction design, human factors and software engineering issues investigated in this track arise in many application contexts, there has not been a concerted attempt to investigate them in the context of emerging language and localisation technologies. The scale of the application domains we target and the novelty of the technologies which define our design and integration spaces can therefore be regarded as the fundamental research barriers to be overcome by the Systems Framework track.

As the project develops towards more exploratory research, SF will develop novel service-oriented systems support to progressively manage the quality of Next Generation Localisation applications that are composed of language technology, digital content management and localisation workflow management services. This will be achieved through the investigation and evaluation of system support for:
• Monitoring localisation services to deliver atomic and composite service quality management;
• Integrating human quality assessment by content consumers into localisation and digital content management services;
• Management of the quality control process by stakeholder communities.

From a design and human-factors perspective, SF has adopted methodologies ranging from ethnomethodologically-informed ethnography for the study of the work of translators and post-editors in situ to experimental methods for evaluation and design involving novel modality and technology combinations in exploratory scenarios.


Other Relevant Work in the Field and How This Compares

As natural language processing technologies mature, the areas of software engineering and (perhaps to a lesser extent) interaction design have received increasing attention from the language technologies community. Evidence for the growing importance of software engineering and integration issues in this field is provided by the fact that the latest editions of the conference of the Association for Computational Linguistics (ACL 2008 and ACL 2009) held workshops on Software Engineering, Testing, and Quality Assurance for Natural Language Processing. CNGL was represented at last year’s workshop with a paper which outlined SF’s perspective on web-service integration for localisation applications. In the areas of human computer interaction (HCI) and interaction design, speech-based interfaces have traditionally received the greatest degree of attention from the user-interface research community. In general, however, we have observed, and reported at iHCI 2009, a gap between the research on HCI and language technology. The research being developed in this track aims at bridging this gap.

Achievements

The main accomplishments in the past year reflect the perspectives described above. The Interaction Design work package (SF1) has established close collaboration with user groups and stakeholders, including localisation customers, language services providers, and language learners. It performed observational studies of these groups and established the foundations for empirical investigation of extra-linguistic factors relevant to the success of speech and language enabled applications, including work on user perception of automatically-generated language output and cross-cultural issues in virtual environments.

The efforts of the System Services Architecture work package (SF2) have focussed on system integration and software development in support of the CNGL Demonstrator systems, through actively collaborating with multiple teams of researchers from all tracks, academic partners and industrial partners in order to establish demonstrators across the Bulk Localisation Workflow (BLW) and Personalised Multilingual Customer Care (PMCC) use scenarios. In addition, SF2 conducted research into service management semantics and collective self-management of Next Generation Localisation processes.

Jointly, and in collaboration with Symantec and two language service providers, SF1 and SF2 have studied and modelled mechanisms for providing flexibility and shared awareness in localisation workflows. This work is reported in an article submitted for publication in the International Journal of Localisation. In addition, SF2 organised and led the demonstrator development effort across the CSET and performed the successful integration of multiple demonstrator systems.

In addition to the paper presented at the ACM CHI 2009 conference, SF1 researchers have reported results on the use of language in multimodal systems in virtual environments at IVA 2009 and at the preCogsci’09 workshop, as well as preliminary results of ethnographic studies at the Irish HCI conference. SF researchers also presented papers related to the theme of this strand, extending work done prior to the start of the project, at the ACM Multimedia conference and at the ACM International Conference on Multimodal Interfaces.

Photo: Stephan Schlögl, PhD student at TCD, discussing his work on Human Computer Interaction with J.J. Collins of UL.

SF1 has interacted with members of the ILT and LOC strands, SF2 researchers, and industrial partners in a study of awareness, communication and collaboration in localisation projects. The SF1 group has also interacted with the DCM strand through participation in the demonstrator activities and internal workshops.

Other activities included:
• Observational studies of various user groups in situations where potential uses for language technologies have been identified.
• Continued study of multimodal interaction in virtual environments: evaluating user perception of referring behaviour in scripted dialogue in English and Japanese, in collaboration with the National Informatics Institute (NII) and Dai Nippon Printing (DNP).
• Development of a rapid prototyping environment for language technology systems with support for Wizard-of-Oz studies.
• Wizard-of-Oz study of the effectiveness of different machine translation strategies in interactive information-seeking dialogues.


• Continued work on assessing the affect of language studies to obtain metrics for application in adaptive (e-Learning) systems, in collaboration with Prof. Chris Mellish (University of Aberdeen).
• Continued work on collection and annotation of referring expressions in multimodal dialogues to inform multimodal applications, in collaboration with Paul Piwek, Albert Gatt and Adrian Bangerter.
• Study of affective responses to different text output generation strategies, involving over 45 subjects in trials conducted at TCD.

The specific integration goals for 2009 were first to complete the implementation of the Year 1 Demonstrator System and coordinate its presentation at the March 2009 programme review. The resulting system demonstrated a web-based, service-oriented approach to the integration of machine translation, text analytics and crowd-sourcing services into both custom web-based applications and existing commercial tools, as summarised in Figure 11. This required the development and testing of service software, which was then deployed both for demonstration purposes and to make its functionality available for partners’ research activities. Figure 11 summarises the various software components developed and the components and content provided by both academic and industrial partners, as presented in the 2008 Annual Report and 2009 review.

Figure 11: Demonstrator 1 Workflow Design
Labels recoverable from the figure: WorldServer Translation Auto Action; D1 BPEL Web Service Orchestration; MT WS Wrapper; Text Analytics WS Wrapper; FRAGMA; TextCat; Moses MT Platform; MT MaTrEx; Translation Memory; Apache Tomcat/Axis SDK Web Service Platform; Sun Glassfish SDK BPEL Execution Platform; Community Management CollTran UI; ALEX / Language Exchange (TM Reuse); BabelFish MT; MS Live MT; Drupal Web Community Portal. Intellectual Property Key: Academic Foreground, Academic Background, Industrial Background, 3rd Party Software.

SF2 also led the coordination and integration of the Year 2 Demonstrator systems for both the Bulk Localisation Workflow (BLW) and the Personalised Multilingual Customer Care (PMCC) use scenarios. This required frequent coordination and collaboration with all tracks and partners to define and build a viable set of demonstrator scenarios supported by a flexible but easy-to-integrate systems architecture. This included coordination of scenario teams through regular meetings and video conferences as well as templated documents on the CNGL wiki. In addition, David Lewis coordinated the activities of cross-track working groups formed to develop Meta-Data definitions, assemble and correlate Evaluation techniques, gather user requirements and organise user trials. SF2 collaborated with other tracks in the following individual integration activities:
• Refining the implementation of the web service wrapper for the ILT1 MaTrEx MT component;
• Working closely with engineers at Alchemy to integrate the MT web service with their Catalyst localisation product;
• The development of a generic service for processing, recording and querying crowd-sourced ratings and annotations of digital content for integration with DCM2 content harvesting and slicing components;
• Assisting ILT3 in developing a web service wrapper for their text classification component;
• Defining adaptive customer care trouble-shooting content and narrative for use with the DCM3 adaptive engine;
• Installing a version of the Y1 Demonstrator and providing a training workshop on WSDL and BPEL for the LOC team at UL;
• Supporting E&O activities in rolling out an open service innovation feature for the MyIsle portal upon which innovative applications using CNGL technologies may then be publicly demonstrated. A Twitter translation application was implemented as an initial exemplar of this capability.

In addition, SF2 maintained the CNGL internal wiki, and developed this with additional collaborative functions in response to feedback from the partners. Also, through David Lewis’ seat on the E&O committee in year 2, SF2 developed a joint strategy for how CNGL technologies could be exposed and promoted publicly by employing the above open service innovation portal in an application development competition conducted on a social network such as Facebook.

In addition to the integration of research results arising from the collaborative demonstrator integration work, the following progress towards SF2 research goals has been made:
• In developing an open web integration platform for community-management of membership and structure, rating and reputation management, a generic component for Voting or Options (VOoOP) has been designed, to allow user annotations of digital content to be captured in a generic way using the standard Resource Description Framework form for knowledge triples, which then can be queried as linked data contained in distributed triple stores. This enables voting, rating and annotations to be supported through a general purpose service implementation that can be integrated into a variety of multi-partner workflows. This advances beyond the state-of-the-art by allowing rating and ranking information collected by one organisation or online community to be shared as linked data with another. In addition, the use of Community-Based Policy Management as a basis for managing and linking different linguistic processing, rating or annotating communities on the web has been advanced through evaluating the implications of semantic resource modelling in the authoring of organisational rules by community members. This provides a base for further investigations into how business rules to control linguistic processes that span multiple organisations can be agreed and deployed in a managed way.
• In investigating a mechanism for the flexible integration of quality management semantics for service compositions, a detailed state-of-the-art survey into the management techniques and standards available for composite services has revealed a lack of interoperable mechanisms to model and process service-specific operational monitoring data. A semantic event middleware for collecting heterogeneous monitoring data from distributed components is undergoing evaluation with a view to adapting it to service quality monitoring across CNGL demonstrators that can then trigger policy-based quality management rules. This work, which is conducted in collaboration with TCD researchers in the SFI Strategic Research Cluster on Federated Autonomic Management End-to-End (FAME), supports the monitoring of the operation of advanced language and content processing services operating in workflows that span organisational boundaries.
• In researching generic management support for collective self-management of ongoing consensus forming activities in online communities, the focus has been on developing reflective widgets that can be integrated into community portals to allow participants to easily visualise the impact of their collective behaviour on progress toward collective goals. Appropriate logging and visualisation techniques are being initially trialled through the CNGL collaborative portal prior to deployment in trials with other online communities; one 300-strong voluntary community has already been recruited for future trials. This work provides the platforms to study how appropriate management and visualisation features empower online value networks such as these in co-evolving the integration of their web portal capabilities with their organisational rules and norms.
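The idea behind VOoOP, capturing ratings as knowledge triples that another community can later query, can be illustrated with a small sketch. This is not the CNGL implementation: the namespace `http://example.org/vooop/`, the predicate names and the helper functions are all hypothetical, and plain Python tuples stand in for a real RDF triple store.

```python
from collections import namedtuple

# A knowledge triple in subject-predicate-object form, as in RDF.
Triple = namedtuple("Triple", ["subject", "predicate", "object"])

# Hypothetical namespace and predicates, invented for this illustration.
EX = "http://example.org/vooop/"


def rating_triples(rating_id, user, content_uri, score):
    """Describe one user rating of a piece of content as three triples."""
    node = EX + "rating/" + rating_id
    return [
        Triple(node, EX + "ratedBy", user),
        Triple(node, EX + "rates", content_uri),
        Triple(node, EX + "score", score),
    ]


def match(store, subject=None, predicate=None, obj=None):
    """Return triples matching the given pattern; None acts as a wildcard."""
    return [
        t for t in store
        if (subject is None or t.subject == subject)
        and (predicate is None or t.predicate == predicate)
        and (obj is None or t.object == obj)
    ]


def mean_score(store, content_uri):
    """Average all scores recorded for one content item, or None if unrated."""
    rating_nodes = [
        t.subject for t in match(store, predicate=EX + "rates", obj=content_uri)
    ]
    scores = [
        t.object
        for node in rating_nodes
        for t in match(store, subject=node, predicate=EX + "score")
    ]
    return sum(scores) / len(scores) if scores else None
```

Because every rating is reduced to the same three-part shape, triples collected by one community can simply be concatenated with those of another before querying, which is the property the report describes as sharing ratings as linked data.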


Plans

A Research Travel Supplement grant was sought from SFI in order to enable an SF1 researcher to spend 3 months in Tokyo furthering the above mentioned collaboration with DNP and National Informatics Institute, Tokyo (NII). While the SFI proposal was unsuccessful in qualifying for funding, we have however secured funding from NII for a scaled-down joint project to be conducted in February 2010.

The SF2 integration goals will be pursued, first to ensure the successful execution of the set of Bulk Localisation Workflow and Personalised Multilingual Customer Care use scenarios for review at the Scientific Committee meeting 27-29 April 2010 (see Year 2 Demonstrator section). Subsequently, SF2 will continue to engage with tracks and industry partners to advance these demonstrator scenarios together with new ones from these areas and from the third use scenario of Personalised Multilingual Social Networks.

For the research goals, the VOoOP service, the service quality monitoring service and the community behaviour reflection widgets will be refined and extended both as part of Year 3 Demonstrator activities, and where appropriate in separate user community trials.

Overall, SF2 will continue to develop and document the Systems Services Architecture. This aims to enable rapid, low cost integration of component software services and their associated configuration, monitoring and logging services into novel Next Generation Localisation applications. It will be validated and refined through its repeated application in the design and implementation of the various demonstration scenarios.

Figure 12 gives an indication of how the System Services Architecture will mediate between different applications and component services encompassed by Next Generation Localisation. This includes generic approaches to the configuration, monitoring and logging of service components to facilitate integration of service quality management applications.

Figure 12: System Services Architecture
Labels recoverable from the figure: GMS (e.g. GlobalSight, WorldServer); CAT (e.g. Trados, Catalyst); Web Client, Client Mashup; Web Server, Server Mashup; Application Server (Glassfish, WebSphere), BPEL or YAWL Orchestration; Web Server, Adaptive Engine; Facebook and Twitter Apps; Smart Phone Apps; NGL Workflow Quality Manager; Workflow Configuration Application; Workflow Monitoring Application; Web Service Message Bus; Machine Translation, Text Analytics, Speech, Adaptive Engine and Log Query Service Components; BPEL Service Orchestration; Configuration Interface; Monitoring Logging Interface.
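One way to picture a generic monitoring and logging approach for service components is a wrapper that records every call in the same format, whatever the component does. The sketch below is only an illustration under stated assumptions: the decorator `monitored`, the in-memory `SERVICE_LOG`, and the two placeholder services are hypothetical and do not correspond to any CNGL interface.

```python
import time
from functools import wraps

# In-memory log standing in for a shared logging service (hypothetical).
SERVICE_LOG = []


def monitored(service_name):
    """Wrap a service call so each invocation is logged uniformly."""
    def decorate(func):
        @wraps(func)
        def wrapper(*args, **kwargs):
            start = time.perf_counter()
            ok = False
            try:
                result = func(*args, **kwargs)
                ok = True
                return result
            finally:
                # Runs on success and on failure, so errors are logged too.
                SERVICE_LOG.append({
                    "service": service_name,
                    "ok": ok,
                    "seconds": time.perf_counter() - start,
                })
        return wrapper
    return decorate


@monitored("machine_translation")
def translate(text):
    # Placeholder standing in for a call to a real MT service.
    return text.upper()


@monitored("text_analytics")
def classify(text):
    # Placeholder classifier that rejects empty input.
    if not text:
        raise ValueError("empty input")
    return "short" if len(text) < 20 else "long"


def failure_rate(log, service_name):
    """Share of failed calls for one service, or None if it was never called."""
    calls = [entry for entry in log if entry["service"] == service_name]
    if not calls:
        return None
    return sum(1 for entry in calls if not entry["ok"]) / len(calls)
```

A quality management application could then read the shared log and compare `failure_rate` or call durations across components without knowing anything about their internals.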


Industry Engagement

Collaboration with DNP and NII facilitated the project on perception of language output style (in the case of referential expressions) and other factors by English and Japanese users in virtual environments. SF has maintained frequent contact with DNP with a view to identifying areas for further cooperation. The work reported by Breitfuss et al. (2009) was done in collaboration with CNGL industrial partners DNP and the NII, in Tokyo. DNP was also actively involved in collaborating with Ielka van der Sluis and Saturnino Luz, having allocated a member of their staff, Junko Nagai, to the project. Ms Nagai’s input was essential to this work and she co-authored the article. As regards the area of localisation, input from Symantec and a number of their language service providers gave us a better understanding of current industry practices and processes. In situ observation of practices in these workplaces provided extensive data which formed the basis for SF’s work on awareness mechanisms and workflows.

Photo: SF researchers participating in the CNGL Autumn Scientific Committee Meeting.

From a software engineering perspective, in addition to the close integration with industrial partners required as part of the Demonstrator activities, SF has been strongly engaged with industry in: co-development (with Alchemy) in integrating the ILT1 MaTrEx MT component into the Catalyst 8.0 product; collaboration with VistaTEC, Traslán and ILT3 in identifying processes in their translation review business where the ILT3 style-based text analytics can be applied; and collaboration with Microsoft and Symantec in defining scenarios appropriate to their business in the Bulk Localisation Workflow and Personalised Multilingual Customer Care areas. The integration of MaTrEx into Catalyst 8.0 was publicly demonstrated on the Alchemy corporate stand at the Localisation Innovation Showcase held at DCU on 16 October 2009 and will form the basis of further investigations into the impact of confidence scores on in-context post-editing.


Next Generation Localisation (LOC)

There is no comparable scientific research project under way anywhere in the world that would be conducting highly advanced research into the specific key areas of localisation under investigation within the CNGL.


Strand Name: Next Generation Localisation (LOC)
Area Co-ordinator: Reinhard Schäler

Participant Names & Affiliation

Industrial Collaborators
Dr. Fred Hollowood, Symantec
Mr. Anthony O’Dowd, Alchemy Software Development
Mr. John Papaioannou, Translators Without Borders
Mr. Phil Ritchie, VistaTEC
Mr. Dag Schmidtke, Microsoft
Ms. Lori Thicke, Translators Without Borders

International Collaborators
Dr. Lynne Bowker, University of Ottawa, Canada
Mr. José Eduardo de Lucca, Universidade Federal de Santa Catarina, Brazil
Prof. Patrick Hall, Professor Emeritus, Open University, U.K.
Dr. James Hogan, Queensland University of Technology, Australia
Dr. Kim Wallmach, University of South Africa, South Africa

Faculty
Dr. Jim Buckley, University of Limerick, LOC3
Ms. Yvonne Cleary, University of Limerick, LOC1
Mr. J.J. Collins, University of Limerick, LOC3
Dr. Chris Exton, University of Limerick, LOC1
Dr. Dorothy Kenny, Dublin City University, LOC2
Dr. Liam Murray, University of Limerick, LOC2
Dr. Sharon O’Brien, Dublin City University, LOC2
Mr. Reinhard Schäler, University of Limerick, LOC3 leader, LOC1, LOC2

Postdoctoral Researchers
Dr. Dimitra Anastasiou, University of Limerick, LOC1
Dr. Lamine Aouad, University of Limerick, LOC3
Dr. Ian O’Keeffe, University of Limerick, LOC2

Programmer
Dr. Eoin O’Conchuir, University of Limerick, LOC3

PhD Students
Mr. Solomon Gizaw, University of Limerick, LOC3.3
Mr. Rajat Gupta, University of Limerick, LOC3.5
Ms. Madeleine Lenker, University of Limerick, LOC3.2
Mr. Joss Moorkens, Dublin City University, LOC2.2
Ms. Lucia Morado, University of Limerick, LOC1.2
Mr. Aram Morera, University of Limerick, LOC3.4
Mr. Naoto Nishio, University of Limerick, LOC3.1
Mr. Ali Raza Khan, University of Limerick, LOC2.1
Mr. Lorcan Ryan, University of Limerick, LOC1.1

Funding

2009 funding from SFI:
CNGL (07/CE/I1142): €439,982

2010 expected funding from SFI:
CNGL (07/CE/I1142): €432,515

2009 funding from other sources:
Microsoft Ireland, funding for MSc Student: €28,000


Research Strand Overview: Next Generation Localisation (LOC)

The main goal of the Next Generation Localisation (LOC) area is to build a standards- and web-based Next Generation Localisation Factory by embedding internationalisation and localisation into the full production process of multilingual digital content, largely eliminating the need for human intervention. LOC1 addresses the need for standards and guidelines for a localisation knowledge container; LOC2 is developing performance measurements and evaluation frameworks for the building blocks of an integrated localisation solution; and LOC3 focuses on the discovery, development and implementation of automated workflows in a Next Generation Localisation framework.

Current mainstream localisation scenarios are largely based on static processes characterised by pre-defined workflows. The tools and technologies employed in these processes are based on closed standards and often lack even basic interoperability. Metadata, the building block of universal localisation knowledge, is often locked up in proprietary technology silos from where it cannot be extracted. Standardised localisation knowledge containers that could do the round trip from content creation to localisation and back into the content creation process are not sufficiently supported by localisation tools and technology frameworks available today. Therefore, large multinational digital content publishers have developed their highly proprietary environments relying on internal “standards” and proprietary technologies; for them, a move to join open standard-based technology and process automation development efforts leading to open generic solutions would be the ideal and preferred choice. However, it will require proof that the vision of such an open and generic solution can be realised in order for them to join industry-wide initiatives to standardise and connect localisation technologies in automated workflows. Smaller publishers do not have the resources to develop their own proprietary solutions and rely on the third party technology market for the provision of adequate technologies. While individual tools and technologies are available and affordable, the sophisticated and expensive localisation process and automation technologies available today are out of reach for most small and medium sized content publishers.

The Fundamental Research Barriers and Methodologies to Address Them

In order to convince large multinational content publishers to join open standards based industry-wide initiatives, and small and medium sized publishers to invest in state-of-the-art technologies, what is required is a solution that is scalable, modularised, interoperable and affordable. Also required is a demonstrator system capable of delivering proof that the vision of an open localisation platform can be achieved. The risks involved in building such a system are considerable. Leading global management systems have been developed by companies such as Idiom and GlobalSight (Ambassador). However, while they aimed to be comprehensive they were not; for example, some services such as MT never became part of the core offering of these systems. While they attracted significant investment, they never reached their projected market potential.

Although they demonstrate a good understanding of basic technologies required for a “Localisation Factory”, significant research is still necessary to improve their overall architecture in order to provide a modularised and extensible framework, to enable seamless data flows, and to allow for the automatic configuration and execution of tasks. Given the enormity of this undertaking, LOC research activities will integrate and build on existing systems, technologies and research results wherever possible. Our aim is the development and the deployment of a localisation technology platform similar to that of ‘Moses’ in MT or ‘Festival’ in speech synthesis.


In LOC, research concentrates on the improvement of key areas of localisation automation, such as the construction of a data model to build, process and maintain localisation knowledge (LOC1), the evaluation and selection of suitable tools and technologies (LOC2) and the modelling of intelligent localisation processes (LOC3). The availability of an industry scale demonstrator system is a pre-requisite for advancing this research and for measuring its success.

In early 2009, Welocalize released GlobalSight, one of the industry-standard global management systems, into the open source domain, thus making available and accessible for the first time an industrial-scale, “heavy-lifting” base system for the development of an open localisation platform. GlobalSight combines workflow definition and automatic execution with access to localisation industry standard applications; it represents an investment of more than US$50m and consists of 1.5 million lines of code. This development represents a seismic shift in our ability to remove a main barrier to our research efforts, i.e. the lack of an open, industry scale test bed and framework for the deployment of component technologies developed within LOC and the other CNGL research areas. We can now concentrate on the individual scientific research tasks in LOC and the CNGL as a whole and plan to integrate these in a platform and an environment that meets most of our requirements for a large scale testing, demonstrator and deployment framework. This does not eliminate all integration efforts, but these are quite manageable in comparison to the effort necessary to build a new framework from the ground up.

Other Relevant Work in the Field and How This Compares

There are commercial efforts under way to develop proprietary automated localisation platforms integrating process automation and management functionality with localisation and translation automation, such as terminology management, translation memory systems and machine translation. Large multinational content publishers, among them Oracle, SAP and Microsoft, have demonstrated the commercial viability of such solutions. However, they have also shown the limits of proprietary solutions and have started exploring ways to connect their proprietary systems with third party tools and technologies.

There is no comparable scientific research project under way anywhere in the world that would be conducting highly advanced research into the specific key areas of localisation under investigation within the CNGL, while, at the same time, developing a platform that allows the seamless connection and integration of complementary technologies into a core, functional and industrial-scale platform which itself is highly modular and extensible.

Photo: LOC research on Collaborative Platforms.


Table 3: Classification of most important localisation standards

Name | Organisation | Type
XLIFF: XML Localisation Interchange File Format | OASIS | File format. Document container for localisation data.
TMX: Translation Memory eXchange | LISA OSCAR | File format. Document container for translation memories.
SRX: Segmentation Rules eXchange | LISA OSCAR | XML vocabulary for describing segmentation rules.
GMX: Global Information Management Metrics eXchange | LISA OSCAR | Different metrics to evaluate volume, complexity and quality of G11n and L10n tasks.
TBX: TermBase eXchange | LISA OSCAR | XML-based framework for representing structured terminological data.
ITS: Internationalization Tag Set | W3C | XML vocabulary that “Defines data categories and their implementation as a set of elements and attributes” to make better schemas for L10n.
GlossML: Glossary Markup Language | Rodolfo Raya (MaxPrograms) | File format. Document container for glossaries.
Unicode | Unicode Consortium | Character Encoding Scheme.

Achievements

Work-package LOC1

The overall aim of LOC1 is to embed internationalisation and localisation issues into the design and development cycle of digital content production and to develop a data container, a Localisation Knowledge Repository, based on a localisation taxonomy that allows the storage, maintenance and reuse of localisation-relevant data in this process. This work is closely linked to ILT and SF2 (data access, exchange and integrity issues). The work package is divided into two sections, LOC1.1 Digital Content Production for Localisation and LOC1.2 Localisation Knowledge – Capture, Organisation, Use.

Major achievements:
• Active collaboration with, input to, research and support of the localisation industry’s main standard initiative under the umbrella of OASIS, the XML-based Localisation Interchange File Format (XLIFF) [1]. Table 3 classifies the most important standards which directly affect the localisation process and subsequently the treatment of digital data.
• Collaboration with OASIS Open Architecture for XML Authoring and Localisation Reference Model (OAXAL) Technical Committee (TC) and OASIS Darwin Information Typing Architecture (DITA) TC and OASIS DITA Adoption TC.
• Collaboration with industry partners in Microsoft and Symantec as well as with The Rosetta Foundation on the development of a data container for localisation knowledge processing throughout the digital content production and localisation process.
• First top level specification for a content creation system based on internationalisation and localisation guidelines, best practices and standards, ready to be developed for connection with the demonstrator system. It is designed to assist content developers with the process of authoring, enabling and testing digital content for international audiences (see Figure 13).
• Collaboration with corresponding CNGL-wide component technology working groups.
• First top level metadata and localisation knowledge container specification for the Next Generation Localisation Factory, ready to be executed in a demonstrator scenario (see Table 4). The container will ensure that essential project metadata is retained throughout the entire localisation process, from source language content development to multilingual content publishing.

Photo: Dr. Jim Buckley and Dr. J.J. Collins of UL at the CNGL Autumn Scientific Committee Meeting.

[1] http://www.oasis-open.org/committees/membership.php?wg_abbrev=xliff


[Table 4: XLIFF Container. The table lists the inputs held in the XLIFF container of the Localisation Knowledge Repository under three groups: Linguistic Guidelines, Technological Assistance and Connectivity Enhancements.]

[Figure 13: Localisation Knowledge Repository. The figure shows technical writers, programmers, administrators and contributors as inputs to a web service built on the Localisation Knowledge Repository (LKR), with XLIFF language packs as outputs. Digital content developers (web designers, authors and software developers) interact with the web service to overcome linguistic, technological and connectivity challenges in developing digital content for international audiences.]


Work-package LOC2

The overall aim of LOC2 is to understand to what extent the tools and technologies available today cover key requirements of localisers and to recommend and, where appropriate, pilot alternative approaches. We have currently limited the scope of tools and technologies under investigation to automated translation and workflow control tools and technologies. Research from this work package will feed back into ILT (development of automated translation technologies) and SF2. The work package is divided into two sections, LOC2.1 Technology Evaluation – The Process Perspective, and LOC2.2 Technology Evaluation – The User Perspective.

Major Achievements:
• Analysis of translation memory resources in relation to their perceived quality and consistency, leading to a review of the commonly held view of translation memories as a source for high-quality bi- or multilingual parallel texts as input to machine translation systems.
• Collaboration with CNGL industry partners and non-CNGL localisation companies reviewing their use of translation memory systems, and the in-house production and maintenance of language resources.
• Installation and research use of industrial localisation management systems, among them Idiom WorldServer, GlobalSight and CrowdSight, ontram and ]project-open[.
• Collaboration with corresponding CNGL-wide component technology working groups.
• Development of a blueprint for the evaluation of automated localisation and translation systems.

Ms. Midori Tatsumi, PhD student at DCU, presenting her work on Evaluation of Machine Translation Output

Work-package LOC3

The overall aim of LOC3 is the specification of three localisation scenarios with their associated workflows, ranging from bulk (enterprise) localisation to multilingual customer care and personalised (consumer) localisation offerings that provide localised digital information in response to ad hoc individual localisation requests. These scenarios lead into and contribute to the unified scenario of the Next Generation Localisation Factory. Research conducted within this work package includes the specification and the development of demonstrator crowdsourcing localisation environments and platforms. LOC3 depends on the availability of technologies from ILT and DCM and on SF1 and SF2 for the delivery of adequate test beds and demonstrator prototypes. LOC3 is divided into five sections: LOC3.1 Services Descriptor Development (Web Services); LOC3.2 Localisation Workflow Specifications for Bulk (Enterprise) Localisation; LOC3.3 Localisation Workflow Specifications for the Personalised Production of Digital Multilingual Information in a Customer Care scenario and Ad Hoc Social Networking; LOC3.4 Mining Workflow Patterns (Transrouter); and LOC3.5 Collaborative Localisation Platforms.

Major Achievements:
• First localisation process descriptions.
• Architecture document for the Next Generation Localisation Factory to be deployed for The Rosetta Foundation and to be used as a test bed for LOC researchers and those from other areas.
• Workshop with a world-leading expert in YAWL, one of the leading process description languages, leading to an intensive period of collaboration with our academic partner in Australia.
• First experiments with basic web service implementations.
• Collaboration with our industrial partners, among them VistaTEC, on the discovery of localisation workflows based on the review of real-life execution logs.
• Hire of an additional programming resource for the development of a web-based collaborative platform.
• Outline specification for the integration of the results of the EU-funded Transrouter project into the Next Generation Localisation Factory.
• Collaboration with corresponding CNGL-wide component technology working groups.
• Initiation of a Dynamic Coalition for the Development of an Open Localisation Platform under the aegis of the United Nations’ Fourth Internet Governance Forum (IGF); the coalition currently has eight members from Asia, Africa, Europe and the Americas.


Plans for 2010

The Next Generation Localisation area will work with The Rosetta Foundation as well as with the IGF Dynamic Coalition on the development and the deployment of an Open Localisation Platform, supported by the SF1 and SF2 CNGL research areas. This platform will serve as a test bed for the research carried out in this work package, allowing it to demonstrate the viability of, and to measure the improvements achieved in, the localisation process. These improvements will be demonstrated and measured in relation to particular tasks, e.g. MT and MT post-editing, and in relation to the overall process, e.g. (re-)use of localisation knowledge and flexible workflow specification, supported by the platform.

Each section in each LOC work package is associated with one particular aspect of this demonstrator and each will contribute to an improvement in the performance of the overall platform, with component technologies from LOC sections connected to the localisation platform. This will enable us to measure the impact of these technologies on the performance of the overall localisation workflow.

The LOC research track will support The Rosetta Foundation in the development and the deployment of The Rosetta Foundation platform which, in turn, will provide highly valuable feedback from a concrete implementation scenario into the scientific research carried out within LOC and other CNGL areas.

Once the platform becomes operational (expected for early 2010), additional component technologies from other CNGL research areas will be integrated.

Industry Engagement

LOC has closely collaborated with its main industrial partners, especially with Symantec, VistaTEC, and Microsoft. Additional collaboration with international collaborators from Translators Without Borders also provided valuable input. Following the open sourcing of GlobalSight and the establishment of The Rosetta Foundation as a spin-off from the University of Limerick and CNGL, LOC also collaborated closely with The Rosetta Foundation and Welocalize. The engagement with industrial partners happened through site visits and one-to-one focused meetings between them and LOC researchers.

LOC supports the development of an open localisation platform that will, in addition to serving as a test bed for the research in the different work packages, provide large multinational publishers with a solid case study for the viability of open standards for the negotiation of localisation data and localisation knowledge, thus providing them with the arguments necessary for a migration from an enclosed proprietary localisation scenario to a more open, interconnecting and interoperable framework. This platform will also encourage the uptake of localisation and process automation solutions by small and medium sized enterprises, create new business opportunities and support the upscaling of localisation offerings by smaller firms.

More than 20 companies have so far joined the Dynamic Coalition for a Global Localisation Platform: Localisation4all, initiated by LOC and The Rosetta Foundation. The Coalition will organise a workshop at the fifth annual IGF Meeting in Vilnius, Lithuania, on 14-17 September 2010.

We expect the platform to generate increased activity in sectors of the localisation industry (some first indicators show that growth by a factor of 100, in certain sectors, is not out of reach). Subsequently, we expect employment to rise in these sectors, driven by growth in translation and localisation as well as in the technical support and development area. Among these will be a significant number of positions to be created by The Rosetta Foundation within the next two years.

[Figure 14: The Rosetta Foundation Platform Development Approach. Labels in the figure include Commercial, Rosetta and CNGL; External Technologies, Third Party Technologies, New File Formats (Win 7, Open Office); New Features, User Community Input, New Functionality, Extra Command, Global Sight Edition, Desktop Toolkit; and Strategic Development, New Architecture, New Component Technologies, Modularisation, APIs, Web Services.]


Year 2 Demonstrator


The goals of the CNGL Demonstrator development activities are:
• for the Demonstrator systems to be a vehicle for conducting and demonstrating collaborative scientific work between centre partners;
• to provide a means for demonstrating the relevance of CNGL research to industry and society;
• to provide milestones for assessing the collective progress and output of the programme, in particular pan-project outcomes such as the development of the Unified Localisation Factory.

The Year 2 (Y2) Demonstrator advances beyond the Baseline Demonstrator System implemented for the Bulk Localisation Workflow (BLW) use scenario in Year 1 by:
• exploring the BLW scenario more deeply (BLW+);
• implementing a first iteration of the Personalised Multilingual Customer Care (PMCC) use scenario (renamed from the Personalised Production Content for Informal Learning to better reflect industrial partner interests).

Demonstrator Teams

Entering Year 2 in 2009, nearly the full cohort of CNGL researchers was in place and work on many of the individual PhD topics was entering initial experimental phases. This resulted in an order of magnitude change in the cross-project engagement with the Demonstrator activities and in the management complexity of coordinating the system development activities. This anticipated growth, together with feedback from the first SFI Programme Review, resulted in the following changes being made to the organisation of the demonstrator development:
• A dedicated Project Manager role was created to coordinate the organisation of parallel activities in the development of the Demonstrators and for overall monitoring and reporting of CNGL research activities.
• A Meta-Data group was established to coordinate the collection and modelling of meta-data across the Demonstrator activities and Research Tracks to support smooth data interoperability and file format flexibility within CNGL. Various localisation-related standards are being compared and this group aims to project a coherent data strategy in localisation workflows.
• An Evaluation group was established to gather and collate the evaluation approaches used across Demonstrator activities and Research Tracks so as to enable the mapping of the evaluation of individual components onto the evaluation of full Demo systems.
• A User Requirements and Test group was established to coordinate investigations into the needs of industrial partners and their users and to use this to drive subsequent user-driven evaluations and trials.
• A decentralised, criteria-driven approach was adopted through the formation of multiple parallel Demonstrator Sub-Scenario teams that were able to work in an efficient manner by focussing on different aspects of the two use scenarios active in Year 2.
• A Demonstrator Steering Group (DSG) was established to coordinate these new subgroups together with the System Architecture activities of SF2. This group was chaired by David Lewis, the SF2 activity leader, as acting Project Manager, and had participation from the above groups, the 4 Research Tracks and industrial partners.


Methodology

The population of researchers and industrial partners that makes up CNGL presents a rich mix of research talent, advanced technical expertise, innovative ideas and real-world business experience. As the main medium for driving and promoting the collective activities and joint outcomes of CNGL, the demonstrator work balances the pursuit of the research directions set out in the original work-plan with the agility needed to exploit the opportunities presented by the emergence of new research collaborations, technical innovations and evolving business opportunities.

To achieve this balance, an iterative approach to defining demonstrator systems is followed. In Year 2, this began with the seeding of collaborative teams of researchers, with academic and industrial partners, to address the broad and complex problems set out in the two use scenarios of the Y2 Demonstrator, i.e. BLW+ and PMCC, through a divide-and-conquer approach. This resulted in the formation of 13 initial Sub-scenario teams across CNGL tracks, research institutes and companies. The teams were supported by horizontal business requirements gathering by the User Requirements team, based on frequent consultations with industrial partners and more in-depth observational studies of operational staff in customer care and localisation business units.

Towards the end of Summer 2009, the Demo Steering Group (DSG) consulted with the teams and industrial partners to review the progress and structure of the sub-scenario teams, in order to consolidate their work into a focussed set of Y2 Demo Scenarios according to the following criteria:
• Each Demo Scenario presents a business scenario explicitly supported by one or more industrial partners or a clear commercialisation plan identifying potential markets;
• Each Demo Scenario targets a clear scientific outcome that spans multiple topics across CNGL research areas, highlighting collaborative research capable of yielding joint publications;
• Demo Scenario teams have a realistic plan to complete system integration and evaluation for presentation and system demonstration at the CNGL Spring Scientific Committee Meeting 2010;
• The selected set of consolidated demo scenarios addresses the set of demonstrator system requirements from the first programme review (site visit) response document and the original technical annex of the CSET proposal.

[Figure 15: Evolution of CNGL Demonstrator Systems. The figure plots the ILT, DCM, LOC and SF tracks against five demonstrators, D1 – Y1 ‘08 to D5 – Y5 ‘12, labelled Baseline BLW, BLW+, PMCC, BLW++, PMCC+, PMSN, ULF and ULF+. Key: BLW = Bulk Localisation Workflow; PMSN = Personalised Multilingual Social Networking; PMCC = Personalised Multilingual Customer Care; ULF = Unified Localisation Factory.]


As a result of the review, the following Y2 Demo Scenarios are being pursued:

Bulk Localisation Workflow (BLW+) Demo Scenarios

In-Context Post-Editing of MT Output: a web service built upon the ILT1 MaTrEx SMT system has been integrated into Alchemy’s Catalyst Computer Assisted Translation tool for in-context localisation. This provides a platform for studying how the output of MT impacts on the efficiency of post-editing using this market-leading in-context translation tool. This work aims to show the impact of the representation of MT-generated confidence scores on post-editing efficiency, drawing a direct comparison with fuzzy TM matching scores. The identification of a consistent mapping between these two types of score will yield a grading system for MT confidence that reflects post-editing effort in the same way as fuzzy TM scores, thereby directly informing the costing of the localisation process. To provide a stronger empirical basis for this comparison, eye tracking user evaluation is employed. Successful application of this evaluation technique will lead to more detailed evaluations of techniques to represent, for example, sub-segment translation and tag placement confidence scores based on MT processing of source segment tags and incorporation of term-base translations.

Poster presentation of Demonstrator work on Eye Tracking to study MT Post-Editing

Domain Tuned Processing of Translations: a text analytics web service has been implemented using the text classifier technology developed in ILT3, combined with the SMT Web service from ILT1. These can be combined to support localisation workflows in a number of ways, including: domain classification for specialised domain training of SMT; selecting the best stylistic match from multiple MT outputs as a hint to post-editors; use of stylometry to quickly route new translation jobs to specific translators or reviewers based on domain or even mother tongue identification; and support for reviewing translations in the absence of specific style guides, based on stylistic comparison to sample target language text. Initial evaluations are being conducted with VistaTEC on the latter application, and evaluations on domain training of MT are underway.

Workflow discovery, visualisation and instantiation: this scenario aims to develop a decision support tool for localisation and translation projects by gathering the criteria used by experts from partners such as VistaTEC and Symantec in designing a workflow for a particular project, as well as by learning from previous and similar processes using data mining. This work explores two approaches to supporting translation and localisation managers in deciding on the most suitable route, or on the implications of alternative ones. The first is a top-down approach using hand-crafted models, which will be garnered from industry and academic experts by discussing and defining ‘generic’ workflows for particular localisation processes. The second is a bottom-up approach using data mining on previous workflow instantiations and logs, with initial data provided by VistaTEC. Extra-workflow communications, such as email queries, will also be considered in generating and integrating any associated extra-workflow tasks.

Personalised Multilingual Customer Care (PMCC) Demo Scenarios

Annotation: in the PMCC domain, the existence of ‘community’ forums where users can post queries and problems is a rich source of potential customer care information that can be included in integrated localised customer care services.
However, it is difficult to automatically recognise and adequately filter solutions which are appropriate, or even to distinguish what parts of a solution discussed in a forum are relevant to a particular user query. Many systems have attempted to solve this problem by crowd-sourcing annotations for those solutions, in addition to the solutions themselves. This work addresses how users can be motivated to annotate user-generated customer care content and how the resulting annotations might be stored, retrieved, represented and aggregated for ongoing leverage in matching user queries. The first component of this work is the ‘Vote or Opinion’ (VOoOP) rating and voting system developed by SF2 and DCM1, which uses RDF Schemata and the linked-data methodology to represent a set of ballots, which describe the ‘vote or opinion’ of particular annotators about a particular topic. The Linked Data aspect is intended to allow the VOoOP system to expose large collections of annotation information about a variety of topics in a uniform fashion. This means that the resulting annotation collections can be examined by third party aggregators and filters, as necessary. Users will use this service to annotate a range of user- and professionally generated content, taken initially from the customer care knowledge base operated by Symantec’s enterprise customer care division. This content will be gathered using the DCM2 harvesting and slicing technology, and suggested consolidations of annotations will be explored using the ontology mapping methodologies being investigated in DCM3.

Multilingual Personalised Information Retrieval: this scenario supports content discovery in languages other than the user’s native language. This approach enables queries in one language to be conducted against collections of content authored in other languages. In the PMCC domain, the existence of ‘community’ forums where users in any location can post problems and solutions is an example of a rich source of multilingual customer care information. However, the majority of professionally authored help articles and solutions are still authored in English. This scenario demonstrates how a user can conduct searches in their native language, but receive results collated from content authored in a variety of languages, all tailored for consumption by the searcher. This work combines adaptive information retrieval research from DCM1, content harvesting and slicing from DCM2 and MT services from ILT1, operating on corpora provided by Microsoft for the Office Online support site, where multilingual search support, initially focussed on clipart searches, can be compared to existing multi-lingual content.

Figure 16: Screen shot of MyISLE Twanslator App at http://www.myisle.org/twanslate

Adaptive Multi-modal, Multi-lingual Customer Support: customer care content for support of troubleshooting activities often comes in the form of a branching dialogue. Though web-based renditions of these are becoming increasingly graphical, they do not take into account the user’s preferred modes of working, or more specifically, their current working context. In particular, many self-help troubleshooting activities take the user away from the screen or keyboard to a hands-busy, eyes-busy setting. In this scenario, the content used comes from the Microsoft Xbox support web site and the user scenario centres on the configuration of an Xbox in the home. The scenario explores the possibility of adaptively employing a wider range of modalities and languages in delivering a customer care dialogue, including the user interface design issues that surround such multi-modal adaptivity. The scenario uses the Adaptive Engine technologies from DCM3 to adaptively render troubleshooting dialogues in response to the user’s physical context, replacing text with synthesised speech and receiving simple dialogue navigation commands via speech recognition (both from ILT2), using, for example, a lightweight Bluetooth headset, while away from the PC screen. The adaptive design must account for modal restrictions in efficiently completing a task, for instance by calling the user back to the screen if the user is best supported in that step by a graphic rendering of the relevant content. Equally, considering a user with English as a second language, the adaptive dialogue could opt to render complex technical content in the source English rather than employ an MT-generated example which could impair the user’s understanding compared to less technical instructions that he/she would prefer in their native tongue.

In addition, the following demo scenarios are also being advanced, since, while they do not yet fulfil the complete set of criteria required of the core Y2 demonstrator, they provide substantial potential as platforms for future collaborative research and commercial applications:
• Twanslator: Multi-Lingual Twitter Service: Here, the same web services for text analytics and MT that are composed for localisation workflows in Y1 and the Y2 BLW scenarios are re-integrated to provide Twitter users with a portal for translating individual tweets into a language of their choice. This demonstrator was initially a rapid development by the


Education & Outreach activity to road-test CNGL technologies for the MyIsle portal (see Figure 16). However, if sufficient traffic from Twitter users can be attracted it will provide a basis for ongoing user evaluation of MT output (ILT1), for incremental MT training (ILT1), for domain routing of translation evaluation (ILT1+3) and for adaptive dialogues in harvesting annotations (DCM3). The scenario also feeds directly into the Y3 Personalised Multilingual Social Networking (PMSN) demonstrator and provides an accessible platform for studying MT approaches to the abbreviated textual communication characteristic of instant messaging and mobile phone texts in a social networking setting.
• Adaptive Speech for Game Based Training: this is built on an existing serious game platform, a TCD application currently used for teaching Leaving Certificate German, that combines personalised adaptive behaviour with multilingual speech output (see Figure 17). This is being integrated with the ILT3 speech synthesis engine to provide a strong application for the evaluation of how personalised adaptation, based on game progress, can impact on spoken character output rendered in the game and also how individual game characters can convincingly render synthesised speech in more than one language. This demonstrator is well suited to future re-configuration in customer care training applications.

Figure 17: Serious Game with Multilingual Speech synthesis for teaching German

The Demonstration Systems Framework

Ultimately the demonstrator system work must feed into the formation of the Unified Localisation Factory (ULF), which will allow next generation localisation-focused applications that leverage advanced language, digital content management and localisation workflow management technologies to be rapidly configured and integrated at low cost. Integrating the technologies and lessons learnt from the demonstrator activities as well as individual research tracks requires an overall framework, a Systems Framework, for presenting and assessing individual technologies, applications, evaluation techniques, design patterns, interoperability standards and workflows. In Year 1, an abstract process map was used to present specific workflows and their implementations by placing them across different business stakeholder types and areas of business process activities. With the introduction of the PMCC use scenario in Year 2 this process map (Figure 18) was extended to cover the following stakeholder types and process areas:

Stakeholder Types
• Companies: reflecting corporate processes. Flows may span several companies in a value chain, e.g. content developer, LSP, Translation Agencies etc;
• Communities: reflecting concerns of communities of content/service users or (non-professional) content producers or translators;
• Consumers: reflecting the view of the individual end consumer of content/services, including any service personalisation and gathering of feedback on service quality;
• Service Developers: reflecting issues arising for the developers of software services which are used by the other stakeholders, particularly as localisation technology moves to a Service Oriented Architecture and a Software as a Service business model, including how services integrate with existing and future software systems.


[Figure 18: Y1 and Y2 Demos overlaid onto the Systems Framework Process Map. Process areas shown: Content Developer, Content Localisation, Content Consumption, Asset Management, Process Management. Stakeholder types shown: Corporate, Service Developer, Community, Consumer. Demos overlaid: Y1 Demo; DS2: In-Context Post-Editing of MT Output; Y2-DS1: Annotation; Y2-DS3: Domain Tuned Processing of Translations; Y2-DS3: Workflow Discovery, Visualisation and Instantiation; Y2-DS4: Multilingual Personalised Information Retrieval; Y2-DS5: Adaptive Multimodal, Multilingual Customer Support.]

Process Areas

• Content Production: including authoring, internationalisation, development of terminology, domain models and content guidelines;
• Content Localisation: translating content from a source language to one or more target languages;
• Content Consumption: the user-driven consumption of content including searching, annotating, commenting and rating;
• Asset Management: the collection, storage, refinement and general husbanding of reusable digital assets, e.g. TMs, term-bases, guidelines, workflows, user models, content slices etc;
• Process Management: the processes involved in monitoring, analysing and modifying business processes with a view to improving them.

Based on experience from the first year demonstrator and an appreciation of the state of the art, the following architectural principles were agreed and followed, thereby providing validation for initial input into the ULF Systems Framework:

• A service-oriented architecture using web services and web service orchestration should provide the basis for integrating components and operating workflows between potential partners in next generation localisation value chains. This provides a high degree of flexibility in integrating the different language technologies and localisation products into different workflow configurations for the project, while avoiding reliance on any single proprietary platform.
• Open standards should be used where possible, so WSDL was adopted for web service definition, BPEL for service orchestration, XLIFF for translation and localisation information and TMX for translation memory exchange. The interoperability of additional models used in concert with these standards is supported by the consolidation work being undertaken by the Meta-Data group.
• User interaction should be primarily web based and supported through collaborative web platforms, in particular for the PMCC and localisation crowd-sourcing scenarios. For some BLW+ scenarios, however, integration with existing Computer Aided Translation and Globalisation/Translation Management Systems employed or developed by existing partners is essential to enable evaluation of the value-add of integrated CNGL technologies within existing work processes.
• The Intellectual Property details of all software and models used were tracked to ensure clear mapping of ownership and potential licensing terms and to manage the use of the open source platform exploited to achieve rapid and flexible development and testing.
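To make the open-standards principle above more concrete, the following is a minimal sketch of how a component in such a workflow might read translation units from an XLIFF document and pick out the segments that still need translating. It assumes XLIFF 1.2 (the report does not state a version), uses only the Python standard library, and the sample document and the function names `read_units` and `untranslated` are invented for illustration; it is not taken from the CNGL demonstrator code.

```python
import xml.etree.ElementTree as ET

# Assumption: XLIFF 1.2; the report does not say which version was used.
NS = {"x": "urn:oasis:names:tc:xliff:document:1.2"}

# Made-up sample document, for illustration only.
SAMPLE = """<xliff version="1.2" xmlns="urn:oasis:names:tc:xliff:document:1.2">
  <file original="help.html" source-language="en" target-language="fr" datatype="html">
    <body>
      <trans-unit id="1">
        <source>Welcome</source>
        <target>Bienvenue</target>
      </trans-unit>
      <trans-unit id="2">
        <source>Save your changes</source>
      </trans-unit>
    </body>
  </file>
</xliff>"""


def read_units(xliff_text):
    """Return (id, source, target) triples; target is None if untranslated."""
    root = ET.fromstring(xliff_text)
    units = []
    for unit in root.iterfind(".//x:trans-unit", NS):
        source = unit.findtext("x:source", default="", namespaces=NS)
        target = unit.findtext("x:target", default=None, namespaces=NS)
        units.append((unit.get("id"), source, target))
    return units


def untranslated(units):
    """Ids of units that still lack a target, e.g. candidates for MT."""
    return [uid for uid, _source, target in units if target is None]
```

Because the exchange format is an open standard, a step like this could in principle sit behind any web service in the workflow without depending on a particular vendor's tool, which is the point the principle makes.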


Future Plans

Based on the Year 2 Demonstrator experiences, the following plans will be pursued for the Demonstrator activities in Year 3:

The second iteration of the PMCC use scenario (PMCC+) will examine:

• the multilingual implications of user-driven annotation, with further investigation of the problems around adding structure to annotation sets and mapping annotations from different sources, gathered in different languages;
• integration of user annotation and multilingual information retrieval;
• richer integration of spoken character and multi-lingual support into speech synthesis, and integration of training for speech and translation engines.

Year 3 will also kick-start the Personalised Multilingual Social Network (PMSN) use scenario as a demonstrator. This use scenario will address personalised multi-lingual person-to-person communication and personalised multi-lingual search of user generated content. The work will build on the Year 2 progress in integrating MT and personalisation with the Twitter micro-blogging service. We will leverage the high volume of open-access content, the social network features and the open API of Twitter as the basis for a number of applications accessed through the MyIsle portal. These Twitter applications will enable studies to be conducted with specific CNGL technologies, including: collective content annotation, rating and translation (crowdsourcing); machine translation of perishable content and short text form content; social network informed personalisation of content querying and content translation; and automated semantic annotation based on domain personalisation and text analytics.

The third iteration of the BLW use scenario (BLW++) will examine:

• different patterns of workflow, including those that make major use of machine translation. Tools to support the monitoring and planning of localisation workflows and for supporting the engagement of users in crowd-sourced localisation are being developed by LOC;
• the synergy between text classification of content and the training of specialised MT, with service interfaces being developed for the iterative training of operational MT systems;
• the integration and impact of MT on post-editing, with the potential to factor in domain-training of MT together with the domain specialisation of post-editors.

In addition, a new use scenario is being examined to better capture the large dynamic corpora that can best benefit from the research being conducted into the extraction and leverage of semantics from unstructured content. In particular, applications around extracting semantics from news content to support further scenarios in personalised multi-lingual Information Retrieval, semantic mapping and ontology evolution are being considered.

Finally, work toward the definition of the ULF framework will be advanced. The Meta-Data Group and the Evaluation Group are both providing specific overview frameworks that will provide the structure needed to populate the ULF with patterns identified from and validated by the concrete demo scenario work.


Impacts/Industry Partners/Technology Transfer


The Internet is accelerating globalisation and exposing a range of stakeholders to multilingual global audiences. Businesses can no longer afford to think strictly in national terms; however, the tools and technologies for effectively engaging international customers have not kept pace with the triple challenges of increasing volume, shifting modalities, and the personalisation of content delivery.

The localisation industry is entering a period of rapid technological disruption. Traditional business models are disappearing and new ones are emerging. Aggressive young entrants are challenging formidable industry competitors. Existing stakeholders are being forced to look beyond their own walls in order to maintain and strengthen their competitive advantages.

Scientists in CNGL are working closely with our industry partners to address these challenges and identify emerging opportunities in the rapidly evolving global multilingual information society. As a world leading multidisciplinary localisation research centre, CNGL is well positioned to generate fundamental advances in the state-of-the-art and to help guide our industrial partners through this period of foment.

As CNGL’s second year draws to a close we can report progress on multiple fronts, particularly in our commercialisation and industry outreach efforts. During the past year the CNGL Centre Management team has placed significant emphasis on maturing and deepening relationships with our current industry partners, as well as embarking on an extensive international outreach effort. At the same time our Intellectual Property portfolio and commercialisation pipeline have come together and are demonstrating significant market potential.

Knowledge transfer within the Centre operates under an industry-standard Collaborative Research and IP Agreement. The IP agreement was signed by all parties in May 2008 while the Collaborative Research Agreement was signed in May 2009 at an event held at the IBM campus in Dublin. The collaborative research agreement clearly defines how intellectual property generated by the Centre is managed and ultimately commercialised.

As a commercially focused research centre, CNGL depends upon its industrial partners to provide candid guidance regarding the research agenda and to continually assess our progress towards key project milestones. Industrial partners have representatives on every significant management committee within the CNGL organisation structure; this provides them with formal top-down communication channels with which to influence the research agenda. Furthermore, our corporate engagement strategy emphasises one-on-one reciprocal relationships between academic researchers in CNGL and their corporate equivalents, which provides equally important and effective bottom-up communication channels.

In May of 2009, the centre hosted its first annual CNGL Scenarios Summit. Over forty participants spent two days exploring alternative futures of the localisation industry, technological advances and globalisation in the context of CNGL’s position in this rapidly evolving landscape. The Summit generated a high level of participation and engagement from Principal Investigators, PhDs and Post-Doctoral researchers within CNGL as well as our industrial partners, who represented a third of all participants. Ultimately the exercise resulted in the development of a CNGL Strategic Plan but, more importantly, served as a focal point bringing together a diverse group of industrial stakeholders who were able to step back and think about how the transforming competitive landscape could impact their day-to-day business operations.

Current Industrial Partners

CNGL currently has nine diverse corporate partners who maintain a strong commitment to the long-term success of our research efforts. Our partners include multinational companies such as Dai Nippon Printing, IBM, Microsoft, SDL and Symantec along with indigenous and regional SMEs including Alchemy, SpeechStorm, Traslán, and VistaTEC.

The diversity of our partners is a reflection of the challenges facing CNGL as well as the importance of our research to both the Irish economy and the global marketplace. A successful realisation of CNGL objectives will help not only to drive the development and productisation of novel early stage technologies but also to solidify Ireland as the centre of excellence for multilingual localisation research and development.


Alchemy Software Development

Alchemy Software Development is one of the world’s foremost and recognised localisation technology providers. The company was founded as an Irish SME in 2000 and, as a result of its phenomenal growth and success, completed a merger in 2008 with Translations.com, a leading provider of software, website and enterprise-wide localisation services, as well as localisation-related technology products.

Alchemy’s initial 5-year commitment to CNGL had been valued at €630k, which is a combination of software licenses and consulting expertise. The company has already contributed the full complement of software to our research strands, valued at over €600k. In addition to software licenses, Alchemy personnel have dedicated a significant number of hours working directly with CNGL staff. This includes running training sessions, offering technical expertise and mapping out areas of ongoing collaboration. During 2009 Alchemy Software Development has been particularly active in helping define Bulk Localisation Workflow scenarios for the CNGL Demonstrator Systems, particularly with respect to the use of Machine Translation technology and the post-editing of MT output by professional translators. Alchemy has worked extensively with researchers at DCU and TCD on this area of research and development and has provided additional funding of €5,000 to support specific activities toward the integration of DCU’s MaTrEx MT system with Alchemy’s CATALYST translation workbench and the study of post-editing activities.

Dai Nippon Printing

Founded in Japan in 1876, Dai Nippon Printing (DNP) has grown to become one of the world’s leading comprehensive printing companies. DNP has developed a unique vision of the future of multilingual multi-modal digital media based on its significant expertise in the management of global multilingual content distribution. With the company predicting the coexistence of paper and digital media along with the anticipated creation of new forms of media, DNP’s participation in CNGL is of particular strategic importance to its long-term objectives.

Despite the distance, DNP is actively involved in the strategic direction of the Centre, particularly with respect to commercialisation activities. Following a meeting of the CNGL Commercialisation Committee held at Ichigaya Rotunda, Tokyo in December 2008, DNP is actively engaging in the commercialisation planning of the centre and in the later part of 2009 put forth a proposal for a joint test model to begin unifying some of the offerings of existing industry partners with research outcomes of the CNGL programme, with a view toward the Japanese market. DNP has also collaborated directly with researchers within the Systems Framework (SF) track at Trinity College Dublin on multimodal interaction. A research visit by a researcher from the SF group to Tokyo is planned for 2010.

[Photo: Prof. Josef van Genabith addressing academic and industry partners at the signing of the CNGL Collaborative Research agreement at IBM campus in May.]


IBM

IBM is one of the world’s leading technology and service providers dedicated to helping clients succeed in delivering business value by becoming more efficient and competitive through the use of business insight and information technology. As a multinational firm IBM takes a globally integrated approach to innovation with a network of more than 60 software development and research laboratories that explore, test and support a wide range of emerging technologies. IBM first set up operations in Ireland over 50 years ago and since that time the region has become the hub of world-wide research into linguistic technologies. Furthermore the recently established IBM Dublin Centre for Advanced Studies (CAS) has made Human Language Technologies one of its core research priorities. IBM launched the LanguageWare project in 2001 with the vision of creating a componentised linguistic platform with applications across the company’s entire product portfolio. As a result of their early efforts LanguageWare is now the most broadly used linguistic technology across IBM.

Over the initial five years of the CNGL operation IBM has committed a total of €8.65M in funding to the programme, €7.7M in the form of software licenses and 1.75 FTEs valued at €950k. To date we have integrated €6.7M worth of IBM software licenses with the remaining tools scheduled for inclusion in upcoming research strands.

Microsoft

Founded in 1975, Microsoft is the worldwide leader in software, services and solutions that help people and businesses realise their full potential. The company first set up operations in Ireland in 1985 and has steadily expanded its base of activity, now employing almost 2,000 full-time and contract staff. As a company that localises products and services into 60+ languages the need for integrated enterprise and personalised localisation tools is one of the fundamental challenges stretching across each of Microsoft’s business units. The company’s participation in CNGL provides our researchers with a unique industry perspective on the challenges of international product development.

Microsoft had initially committed to providing CNGL with translation memories estimated to be worth €2M. During 2009, with Microsoft’s active involvement in the formulation and specification of industry-relevant scenarios for CNGL Demonstrator Systems, both in Bulk Localisation Workflows and Personalised Multilingual Customer Care, additional in-kind contributions of content and resources have been committed and, following required legal agreements, are anticipated to be delivered in the first quarter of 2010.

[Photo: The signing of the CNGL Collaborative Research agreement at IBM campus in May.]


SDL

SDL was founded in 1992 and has since grown to become one of the world’s leading localisation providers to businesses maintaining a global market presence. SDL is at the forefront of research and development in the fields of machine translation and global information management technologies. SDL’s industry leading position in the translation supply chain offers CNGL researchers unparalleled access to the tools and expertise that are used to serve over 400 of the world’s leading enterprises.

SDL’s initial commitment to CNGL included a localisation management system (Idiom Worldserver) valued in excess of €300k over the life of the project. This software had already been delivered during the first year of the centre’s operation and formed the backbone of the baseline CNGL Demonstration System. SDL has shown continued commitment to the research programme and to contributing software and expertise when opportunities arise. During 2009 a contribution of 40 licenses for the TRADOS Translator Workbench suite was made to the School of Applied Languages and Intercultural Studies (SALIS) at DCU as a resource to be used in courses on Translation Technology taught there.

SpeechStorm

SpeechStorm is a solutions provider which specialises in integrating market leading voice platforms and speech recognition software with in-house application development expertise. The company is an SME based in Northern Ireland and serves a range of customers including multiple government agencies, utility providers and financial service firms. The company’s expertise in integrating multiple voice platforms and speech recognition systems is particularly relevant to the research work-packages on Speech Technology within the Integrated Language Technologies track.

SpeechStorm’s initial five-year commitment to CNGL was valued at €140k, which includes €80k worth of software services and 0.10 FTEs valued at €60k. The Centre’s research programme timeline calls for the utilisation of SpeechStorm expertise in the ILT and Systems Framework tracks from Year 2 onwards. SpeechStorm has to date interacted primarily through direct research engagements with the Speech Technology groups at UCD and TCD. We anticipate their involvement increasing further from 2010 onwards as the greater collaborative integration of Speech Technology into other research tracks and Demonstrator Systems takes place.

[Photo: Prof. Andy Way of DCU with Paul McManus of SDL and Tony O’Dowd of Alchemy Software Development at the CNGL Scenarios Workshop held in May.]


Symantec

Symantec is the world leader in providing solutions to help individuals and enterprises assure the security, availability, and integrity of their digital information. The Symantec Shared Engineering Services group is responsible for company-wide localisation management along with ongoing research and development efforts. Symantec’s primary areas of localisation-related research focus on machine translation, MT customer satisfaction studies and techniques to enhance Rule-Based MT (RBMT) performance. During 2009 we also witnessed an increased interest in the area of collaborative community-driven translation from Symantec.

Symantec’s initial commitment to CNGL was valued at €2.25M comprised of €2.0M worth of multiple translation memories and 2.15 FTEs valued at €225k. The company has set aside an additional €25k in supplementary funds for sponsorship of future targeted projects within the CNGL. With Symantec’s heavy involvement in the formulation and specification of use scenarios for Demonstrator Systems, we have seen additional commitments of content and translation memory resources from Symantec. We have also benefited from access to additional Symantec business units, including the Customer Support organisation for research visits and assistance in specifying Demonstrator scenarios for Personalised Multilingual Customer Care. Following Symantec’s interest in collaborative community translation, they have committed to fully fund a new PhD student at Trinity College Dublin starting in 2010. Dr. Fred Hollowood, Director for Global Language Services, has served on the CNGL Executive Committee during 2009.

Traslán

Traslán is a wholly Irish-owned SME specialising in English-Irish translation services. Their current customers include indigenous government agencies, City & County Councils as well as numerous private corporations.

Traslán is currently focusing its research efforts on further improving and extending its state-of-the-art (English-Irish) MT system. The research currently being done within the CNGL work-packages in the Integrated Language Technology track is therefore particularly relevant to Traslán’s current efforts. Dr. Declan Groves of Traslán has collaborated closely with ILT researchers at DCU and, with three CNGL PhD students, co-authored a paper presented at the 3rd International Workshop on EBMT, which took place in DCU in November 2009. Dr. Groves has also participated in a number of collaborations with other CNGL industrial partners and has published and presented papers on this work. Most notably, his collaboration with Microsoft led to a paper presented at MT Summit XII held in Ottawa in August 2009. He was a regular attendee at CNGL Integration committee meetings in 2009, and has been a regular on-site presence at the CNGL Industry Lab at DCU. Donncha Ó Cróinín, MD of Traslán, participated as an industry partner representative on the E&O committee during the year.

[Photo: Dr. Fred Hollowood of Symantec in discussion with Prof. Eiichiro Sumita of ATR Japan.]


VistaTEC

Originally founded in 1997, VistaTEC is a leading Irish localisation company providing internationalisation, language services and engineering expertise to a diverse customer base. VistaTEC is particularly interested in several areas that overlap with the CNGL research strands including automated workflow planning, quality assurance systems and crowd-sourcing technologies. As VistaTEC is an LSP with a broad set of commercial clients, they are able to provide CNGL researchers with a practical view of the current workflows employed in the industry.

Mr. Phil Ritchie, CTO of VistaTEC, has been an active participant at CNGL research meetings, particularly related to Demonstrator systems, and has served on the CNGL Executive committee during 2009. VistaTEC has also continued with the funding of a PhD studentship, initiated in 2008, focused on the post-editing of translation memory and machine translation output. Finally, an additional contribution of resources has been committed by VistaTEC in 2009 to assist with the Demonstrator System project related to post-editing machine translation output in collaboration with Alchemy Software Development: VistaTEC has committed to cover the cost of five of its professional translators participating in an eye-tracking experiment while using the Alchemy CATALYST software to post-edit translations.

Potential New Industrial Partnerships

In the past year CNGL has engaged in an extensive programme of industry outreach, following a strategy of targeting specific industry verticals where the CNGL has developed robust, rapidly transferable expertise. This has resulted in one-on-one discussions with over thirty companies and a number of new industrial collaborations which serve to extend the reach of our activities and help diversify funding to complement the initial investment made by Science Foundation Ireland.

The industrial outreach efforts of CNGL emphasise two main pillars:

• Ireland as a centre of excellence for high-value R&D in localisation with a critical mass of industry participants & ancillary activities.
• CNGL as a critical mass of applied academic research expertise in localisation and related industries which is valuable for partners and collaborators.

In conjunction with our industry outreach efforts we have launched the CNGL Collaboration Framework, which provides mechanisms for new partners to engage with the centre. This collaboration framework is designed to foster the flow of information among trusted partners while at the same time respecting the intellectual property obligations set forth by the CNGL Collaborative Research Agreement. There are three broad types of classified collaboration opportunities set out: Full Members, Collaborators and Associates.

[Figure: the three collaboration tiers: Associates, Collaborators, Members.]


Full Members

Full Members are both industrial and academic partners who have agreed to be bound by the terms of the CNGL Collaborative Research and IP Agreements. Full Membership is available on a limited basis to third parties who have a long-term strategic interest in CNGL and the wherewithal to contribute substantial resources to ongoing research activities within the Centre. Full Membership provides preferential IP access, Committee Membership, and direct access to researchers and staff within CNGL.

Collaborators

Collaborators engage directly with CNGL on issues of strategic importance to them. Collaborators can be both industrial and academic entities that are either 1) a CNGL Full Member who has sponsored a specific research project or 2) a legal entity not previously affiliated with CNGL. Collaborator projects are governed under separate and individual Collaborative Research, IP and Confidentiality Agreements, which provide a range of structural options. While collaborators operate under separate agreements, there is a benefit to integrating them under the broader CNGL umbrella, thereby facilitating valuable interactions and sharing of expertise.

Associate Members

Associate Membership provides a springboard for organisations that may be interested in establishing deeper ties with CNGL. Under the envisaged scheme, in exchange for a small membership fee Associates are granted an array of benefits, the most noteworthy being access to the pre-screened CNGL publication stream. While Associates are not granted preferential access to IP generated in CNGL, it is expected that this group will play a critical role in the commercialisation and licensing of emerging technologies.

Intellectual Property Management

CNGL’s Intellectual Property (IP) Management Strategy recognises the natural progression from emerging innovations to freely exploitable foreground IP as well as the preferences of our industrial partners. The CNGL option period is designed to allow our commercial partners the opportunity to capture strategic pieces of IP that confer exceptional competitive advantage. Outside of these strategic acquisitions their interest in, and uptake capacity for, novel foreground IP is diminished. Consequently we categorise emerging foreground IP according to three commercialisation related categories in close collaboration with our industrial partners:

A. Novel but partners not interested
B. Novel and partners interested in final (v1.0) product
C. Novel and partners interested in raw technology

Our research program is designed to ensure that a constant stream of new ideas is entering the commercialisation pipeline, a portion of which are of strategic industrial significance (Category B/C), which caters to the needs of our original founding members, alongside non-core yet novel ideas (Category A/B) that can be fast-tracked via the CNGL alt|start program (see below).

As an academic research body, one of the core missions of the Centre for Next Generation Localisation is expanding the state of the art through dissemination of research results. At the same time CNGL is required to ensure that our research results are industrially relevant and, where appropriate, protected and available for commercial exploitation. In order for these missions to be realised, and reflecting the publication clearance provisions of the CNGL IP Agreement, we rolled out a new Publication Management System at the end of 2009.

In addition to providing a streamlined and semi-automated process for review and approval of research publications by the CNGL IP Committee, the Publication Management System also provides CNGL, its Industry Partners and SFI with visibility into our innovation pipeline, allows us to look across academic institutions and research streams for novel combinations of and applications for existing technologies, and serves as one focal point to measure progress towards our annual targets.
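As a rough illustration of the categorisation described above, the sketch below maps the three categories (A, B, C) onto the two routes the text mentions: uptake by industrial partners and the alt|start program. The mapping is only one reading of the text (B/C as partner-relevant, A/B as alt|start candidates), and the names `Category`, `ROUTES`, `routes_for`, `group_by_route` and the label "partner licensing" are hypothetical, introduced purely for this example.

```python
from enum import Enum


class Category(Enum):
    """Commercialisation categories for novel foreground IP, as listed above."""
    A = "novel but partners not interested"
    B = "novel and partners interested in final (v1.0) product"
    C = "novel and partners interested in raw technology"


# Hypothetical route labels; the report names the alt|start program but
# gives no identifier for the partner-facing route.
PARTNER_PIPELINE = "partner licensing"
ALT_START = "alt|start"

# One reading of the text: B/C are of strategic interest to partners,
# while A/B can be fast-tracked via alt|start, so B appears in both.
ROUTES = {
    Category.A: (ALT_START,),
    Category.B: (PARTNER_PIPELINE, ALT_START),
    Category.C: (PARTNER_PIPELINE,),
}


def routes_for(category):
    """Return the candidate routes for one category of foreground IP."""
    return ROUTES[category]


def group_by_route(items):
    """Group (name, category) pairs under every route they may enter."""
    grouped = {PARTNER_PIPELINE: [], ALT_START: []}
    for name, category in items:
        for route in routes_for(category):
            grouped[route].append(name)
    return grouped
```

For example, `group_by_route([("idea-1", Category.A), ("idea-2", Category.C)])` would place the first idea under alt|start only and the second under the partner route only; a Category B item would appear under both.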


Commercialisation

CNGL is entering its third year with a robust pipeline of business opportunities. In order to support the maturation of our commercial pipeline, the management of CNGL has placed significant emphasis on developing the Centre’s entrepreneurial ecosystem. In January 2009, the CNGL implemented its Researcher Engagement Strategy, which was designed to identify academic entrepreneurs and equip them with the necessary tools to undertake spin-out activities. A series of monthly meetings, casual gatherings and formal programs have laid the foundation for the next phase of commercialisation activities.

To date CNGL has generated one spinout company (the Rosetta Foundation), two active opportunities that are expected to be ready for spinout within the next twelve months, three emerging opportunities and a steadily growing pool of seed ideas.

CNGL is a large research centre that generates a vast number of new opportunities each year. By working closely with our industrial partners and actively involving them in the research programme we are able to segment new opportunities based upon their interest levels. New technologies that are of particular interest to our partners are placed on a development pipeline that is designed to result in the direct licensing of technologies. Technologies which fall outside the remit of our industrial partners are queued in the alt|start development pipeline.

alt|start is a design-led, inventor-centric technology spin-out accelerator program created by CNGL. The program identifies promising technologies early in the innovation pipeline and works one-on-one with the academic entrepreneur to bring solutions to the market faster and with a greater focus on customer development. alt|start serves as a catalyst bringing together the necessary resources and people who are best positioned to execute on a strategic vision prior to and after spinning-out.

The Rosetta Foundation

Access to information is a fundamental right. It can make the difference between prosperity and poverty, freedom and captivity, life and death. Today, access to information is severely restricted for millions of people who live in developing regions of the world, speak so-called minority languages, or do not have the economic means to pay for information they require.

The Rosetta Foundation supports the not-for-profit activities of the localisation and translation communities through the development and deployment of an intelligent translation and localisation platform. It works with those who want to provide equal access to information across languages, independent of economic or market considerations, including localisation and translation companies, technology developers, not-for-profit and non-governmental organisations.

The translation and localisation platform development is based on an open source model making the platform available to the translation and localisation community. It is deployed and supported by The Rosetta Foundation for selected not-for-profit organisations and volunteer translators.

The Rosetta Foundation is a not-for-profit organisation (charity) registered in Ireland. It is a spin-off from the University of Limerick’s Localisation Research Centre and the Centre for Next Generation Localisation (CNGL).

[Figure: The Rosetta Foundation is a not-for-profit spin-off from University of Limerick and CNGL.]




Management and Governance

The Centre for Next Generation Localisation believes that clear and simple Management and Governance structures are essential to ensure the scientific, commercial and operational success of the centre. Our Management and Governance structures are designed to support a world-class research environment based on:

• simple, effective and efficient planning and decision making
• clear responsibility
• open and transparent communication structures
• balanced and comprehensive representation and involvement of all partners and stakeholders
• provision of point of contact and procedures for conflict resolution
• flexibility to respond quickly and appropriately to changing environments
• structures and support for IP management, Technology Transfer and commercial exploitation
• regular appraisal of the scientific programme by international experts
• regular appraisal of management and governance structures
• best practice in the management and governance of large collaborative research centres

The Centre Director, Prof. Josef van Genabith, provides overall scientific leadership and responsibility for the running of the Centre. A number of boards and committees support the Director in the management, integration and oversight of the Centre's research and operations following the principles set out above. In particular, the research efforts of the Centre involve a considerable amount of cross-site collaboration and interdependency between our four academic and nine industrial partners. This requires a strong emphasis on cross-site coordination.

The overall management and governance of the Centre is organised as follows (Figure 19).

[Figure 19: CNGL Governance and Management Structure. Bodies shown: Director, External Scientific Advisory Board, Governance Board, Executive Committee, IP Management Board, Integration Committee, Education and Outreach Board, Commercialisation Committee, Scientific Committee.]


Research Co-Ordination

The CNGL research programme is organised in a hierarchy of research tracks, work-packages and sub-work-packages. The four main research tracks relate to work in Integrated Language Technologies (ILT), Digital Content Management (DCM), Next Generation Localisation (LOC) and Systems Framework (SF). Within these four research tracks, the research programme is organised into 11 main work-packages, with individual research projects then organised in 50 sub-work-packages. Following this structure, co-ordination of the CNGL research activities operates across four interrelated levels:

• CSET Coordination
• Research Track Coordination
• Main Work-package Coordination
• Sub-Work-package Coordination

Overall CSET Coordination is the responsibility of the Centre Director, Prof. Josef van Genabith. Research track coordination is the responsibility of the four Track Coordinators:

• Integrated Language Technologies (ILT): Prof. Andy Way, DCU
• Digital Content Management (DCM): Prof. Vincent Wade, TCD
• Next Generation Localisation (LOC): Reinhard Schäler, UL
• Systems Framework (SF): Dr. Saturnino Luz, TCD

Each of the eleven main work-packages within the four research tracks has a work-package co-ordinator who liaises with the relevant research track leader. The structure of the four research tracks, 11 main work-packages and 50 individual sub-work-packages is shown in Figure 20 below:

[Figure 20: CNGL Research Organisation. The Director (Prof. Josef van Genabith, DCU) sits above the 4 research tracks (ILT, DCM, LOC, SF, each with its Track Coordinator), which divide into 11 main work-packages (ILT 1, ILT 2, ILT 3, DCM 1, DCM 2, DCM 3, LOC 1, LOC 2, LOC 3, SF 1, SF 2) and 50 sub-work-packages.]


Integration Committee

The CNGL research programme is highly collaborative, with two basic (ILT & DCM) and two applied (LOC & SF) research tracks and demonstrator systems centred around shared use scenarios. Given the level of research co-ordination and integration across the four research tracks and main work-packages, and the level of integration involved in building demonstrator systems from research outputs, the CNGL Integration Committee is the main body dealing with the operations of the CNGL, with particular emphasis on scientific and operational matters. The Integration Committee is composed of the Centre Director (who chairs the committee), the Scientific & Operations Manager, all four track leaders, Prof. Julie Berndsen from UCD, and a representative of each industry partner to ensure maximum engagement of industry partners in oversight of the research program. The Integration Committee meets on a bi-monthly schedule, with additional ad-hoc meetings called when necessary.

Scientific Committee

The CNGL Scientific Committee comprises all members of the Centre across all levels and functions. The full Scientific Committee meets twice every year in a two- or three-day plenary session to review and share research progress and outcomes. The meetings of the Scientific Committee also provide the opportunity for engagement with our International Collaborators and External Scientific Advisory Board.

The Spring meeting of the CNGL Scientific Committee was held at DCU on February 19th and 20th and was used as a preparatory meeting for an SFI CSET review held on March 3rd 2009. The meeting was attended by all CNGL members and followed the format of the SFI review meeting, including technical overviews of each research track followed by sessions covering industry participation, operational management, and education and outreach activities.
This Scientific Committee meeting served as a useful segue between our year-1 and year-2 activities in CNGL, providing a forum for all members, including the many new researchers who had been hired over the course of 2008, to share a comprehensive overview of the centre's goals, activities and accomplishments.

The Autumn meeting of the CNGL Scientific Committee was a three-day event held at DCU on October 14th-16th 2009. This meeting reflected the significant progress of the CNGL scientific programme through 2008 and 2009 and consisted of a two-day programme of twenty-one scientific presentations by CNGL researchers at the PhD and Post-Doc level, followed by a 'Localisation Innovation Showcase' event organised in conjunction with the 'Innovation Dublin' festival. The Localisation Innovation Showcase was deemed to be a great success; it included demonstrations and poster presentations by both academic and industry members of CNGL, was open to the public and was attended by a number of localisation industry participants not previously involved with CNGL. The afternoon session of the Showcase event included a keynote address by Dion Wiggins, CEO of AsiaOnline, an introduction to The Rosetta Foundation by Reinhard Schäler of University of Limerick and an overview of CNGL's education and outreach activities.

Our plans for Scientific Committee meetings in 2010 include the objective of bringing the meetings to other host sites across the four academic partners, starting with our Spring 2010 meeting at the University of Limerick.

Centre Management

Centre Operations Team

The day-to-day implementation of the Centre's operational decisions and policies, financial management, activity co-ordination, tracking and reporting is carried out by the Centre Operations Team in close co-operation with the Centre Director. The Centre Operations Team is led by Dr. Páraic Sheridan and meets weekly with the Centre Director and Deputy Director to continually monitor and prioritise activities across all operational functions, including finance, human resources, reporting, system administration and software and IP management. The composition of the Centre Operations team is as follows:

• Dr. Páraic Sheridan, Scientific & Operations Manager
• Ms. Hilary McDonald, Programme Manager
• Ms. Ríona Finn, Centre Administrator
• Ms. Fiona Maguire, Financial Administrator
• Ms. Eithne McCann, Centre Secretary
• Mr. Joachim Wagner, Systems Administrator

Also:

• Ms. Cara Greene, Education & Outreach Manager
• Mr. Steve Gotz, IP Manager

During the course of 2009, in response to emerging needs of workload balancing and co-ordination of activities across the four academic and nine industrial partners of CNGL, two new positions were provided for and filled within the Centre Operations team.

A dedicated full-time position of Programme Manager was created at CNGL to improve overall scheduling, tracking and management of project activities. The programme manager actively coordinates activities across the research tracks and work-packages as well as across the collaborative multi-site academic and industry interactions directed toward the CNGL Demonstrator Systems. Ms. Hilary McDonald has filled this position, bringing significant experience from project and programme management in the localisation industry from her previous employment at Microsoft. Ms. McDonald is based at Trinity College Dublin and works closely with other members of the Centre Operations team based at DCU.

To rebalance the workloads of the Scientific & Operations Director and the Centre Administrator and thereby provide more support for the Centre Director, the funding diversification activities of CNGL and the industry outreach and engagement program, a new position of Financial Administrator was created within CNGL. The creation of this full-time position also recognised the significant effort of tracking, collating and reporting CNGL financial activities across the individual research groups in receipt of funding across the four academic institutions in CNGL. Ms. Fiona Maguire will fill this position at the beginning of 2010, having previously worked within the Finance Office at DCU. She is therefore intimately familiar with the processes and systems used for financial tracking and reporting within the academic setting. In addition to financial administration and reporting, Ms. Maguire is also responsible for Human Resources administration across CNGL.

In addition to the day-to-day work of the Centre Operations team in executing the operational policies and activities of the CNGL, several Management Boards and Committees provide direction and prioritisation of the Centre's various activities.

Executive Committee

The Executive Committee is the CNGL's decision-making body and provides leadership, policy, strategy, resource allocation, performance monitoring and review, management of CSET membership, and conflict resolution. The Executive Committee meets quarterly and is chaired by the Centre Director. Its membership is made up of two academic and two industry representatives each year, plus the Centre Director and Deputy Director. The membership of the Executive Committee for 2009 included:

• Prof. Josef van Genabith, DCU (Director) [Chair]
• Prof. Vincent Wade, TCD (Deputy Director)
• Prof. Andy Way, DCU
• Mr. Reinhard Schäler, UL
• Dr. Fred Hollowood, Symantec
• Mr. Phil Ritchie, VistaTEC

Following updated CSET Management and Oversight guidelines circulated by SFI toward the end of 2009, CNGL is revising the composition of its Executive Committee to meet the new guidelines. The Executive Committee will be newly constituted as the CNGL Management Committee and will include all four research track leaders plus Prof. Julie Berndsen from UCD to ensure representation across all four academic institutions that are members of CNGL. We will continue to invite industry partner representation in the management committee, though industry representatives will not have a formal vote in decisions of the committee.

Education and Outreach Board

The Education and Outreach Board provides leadership, policy and strategy, objectives and resource allocation for the Centre's Education and Outreach Programme. The Education and Outreach Board meets quarterly and reports to the CNGL Executive Committee. The Board is chaired by the Education and Outreach Director, and consists of the Education and Outreach Manager, one nominee from the academic participants and one nominee from the Industrial participants in the Centre. During 2009, when rotating membership of the board from 2008, it was decided to retain a full-time representative from the University of Limerick on the Education and Outreach Board given the extent of UL's participation in education and outreach activities. The membership of the Education and Outreach Board in 2009 included:

• Prof. Harold Somers, DCU [Chair]
• Ms. Cara Greene, DCU
• Mr. Dag Schmidtke, Microsoft
• Mr. Karl Kelly, UL
• Dr. David Lewis, TCD

IP Management Board

The IP Management Board manages the Intellectual Property of the Centre and facilitates Technology Transfer and commercial exploitation of IP generated by the Centre. The IP Management Board advises the Centre on all IP issues and, in particular, evaluates proposed publications and invention disclosures in accordance with the Centre's IP agreement. The IP Management Board meets bi-monthly. The IP Management Board is chaired by the IP Manager, Mr. Steve Gotz, and consists of one nominee from each of the participating University and Industrial partners plus the Centre Director. The IP Management Board reports to the Executive Committee.

In the early part of 2009 the IP Management Board continued to focus heavily on the conclusion of negotiations towards an overall Collaborative Research Agreement between all academic and industry members of the CNGL. The Collaborative Research Agreement was successfully completed and signed by all members at an event hosted by IBM at their Dublin campus in May 2009.

Other significant accomplishments in terms of IP management during 2009 include the filing of four invention disclosures and two patent applications, the development of a new in-house publication clearance management system, and the formulation of a collaboration framework to guide engagement with outside parties, including through new affiliated projects.

Commercialisation Committee

While not originally included as part of the initial CNGL management infrastructure, the CNGL decided during its first year in operation to convene a committee specifically to promote and oversee the agenda of research commercialisation and technology transfer, which is a core part of the centre's strategy. The Commercialisation Committee meets twice a year, usually in conjunction with the twice-yearly Scientific Committee meetings, and includes representatives of all academic and industry partners.

During 2009 the Commercialisation Committee met twice, on April 28th at DCU and on October 12th at Trinity College Dublin. The October meeting included a four-strong delegation from CNGL partners at Dai Nippon Printing in Tokyo.
In addition to the committee meetings, 2009 saw the emergence of specific commercialisation activities. These include:

• the promotion of CNGL within the localisation industry across the U.S., Europe and Asia, both at industry-related events and in one-on-one company meetings,
• the formation of The Rosetta Foundation as a social entrepreneurship activity targeting efforts to deploy translation industry technology and expertise to address information poverty and promote equality through language and cultural diversity, and
• the successful application to Enterprise Ireland for funding to support a Commercial Development Manager under a joint programme of SFI and EI targeted at CSET commercialisation activities.

Looking ahead to 2010, we expect that having a full-time Commercial Development Manager under the SFI/EI programme will substantially accelerate the commercialisation activities of the centre.

External Oversight

Following SFI guidelines and best practice for the oversight and governance of large research centres, CNGL has two external advisory and oversight boards that meet regularly to review the scientific and operational progress of the centre.

External Scientific Advisory Board

The External Scientific Advisory Board provides review of the long-term scientific direction, impact and progress of the Centre. It advises, challenges and provides guidance to the Executive Committee on both the overall scientific goals and objectives of the Centre as well as on the ongoing management of the Centre. The External Scientific Advisory Board aims to meet twice a year and work in close co-operation with the Executive Committee and the Centre Director. The CNGL External Scientific Advisory Board consists of recognised world leaders from both Academia and Industry in the fields of Language Technology, Machine Translation, Speech, Adaptive Hypermedia, Information Retrieval, and Localisation. The External Scientific Advisory Board is chaired by an expert from the area of Localisation.

During 2009 the previous chair of our External Scientific Advisory Board, Mr. Jaap van der Meer, stepped down from the advisory board in order to dedicate himself full-time to his activities promoting translation automation and language resources sharing within the localisation industry through the Translation Automation Users Society (TAUS), of which he is the Director. As a result of Mr. van der Meer's departure from the board and in recognition of the limitations in travel by Prof. Makoto Nagao, two new members were invited to join the board.

Mr. Francis Tsang is Director of Globalisation at Adobe Systems Inc. He is responsible for the strategy and delivery of all localised Adobe product releases and the development of tools and libraries in the internationalisation area. Mr. Tsang has spent the last twenty years building software for various international markets. He holds degrees in computing and business management. As of the end of 2009, Mr. Tsang chairs the CNGL External Scientific Advisory Board.

Dr. Andrew Bredenkamp is co-founder and CEO of acrolinx GmbH, a leading provider of content quality management software. Acrolinx GmbH was a spin-out company from the German Research Centre for Artificial Intelligence (DFKI), where Dr. Bredenkamp worked in the Technology Transfer Office. As such he brings experience in company start-ups from a university spin-out environment. He also has a background in translation and holds a PhD in Computational Linguistics.


The CNGL External Scientific Advisory Board actively participates in the twice-yearly CNGL Scientific Committee meetings and reports back to the Centre Director and Executive Committee. The board is currently composed of the following members:

• Mr. Francis Tsang, Adobe Corporation, USA [Localisation] (Chair)
• Dr. Andrew Bredenkamp, Acrolinx GmbH, Germany [Language Technology]
• Prof. Lauri Karttunen, PARC, USA [Language Technology]
• Prof. Makoto Nagao, President, NIST, Japan [Machine Translation]
• Prof. Carol Espy-Wilson, University of Maryland, USA [Speech Technology]
• Prof. Peter Brusilovsky, University of Pittsburgh, USA [Adaptive Hypermedia]
• Prof. Elizabeth Liddy, Syracuse University, USA [Information Retrieval and NLP]
• Prof. Fred Jelinek, Johns Hopkins University, USA [Machine Translation]

External Oversight Board

In accordance with SFI requirements, the President of DCU as the host institution has appointed an external Oversight Board to help with the oversight and assessment of the Centre's progress. The Oversight Board reports to SFI on a quarterly basis. The Oversight Board currently consists of the following members:

• Mr. Martin Conry (Secretary DCU and Interim Chair)
• Mr. Jim Dowling (Executive Dean, Faculty of Engineering & Computing DCU)
• Prof. Eugene Kennedy (VP Research DCU)
• Dr. David Lloyd (Dean of Research, TCD)
• Prof. Des Fitzgerald (VP Research, UCD)
• Prof. Brian Fitzgerald (VP Research, UL)
• Dr. Gearoid Mooney (Enterprise Ireland)
• Mr. Aidan Sweeney (IBEC)
• Dr. Carol Gibbons (IDA)

At the External Oversight Board meetings, CNGL is represented by:

• Prof. Josef van Genabith, Centre Director
• Prof. Vincent Wade, Deputy Director
• Dr. Páraic Sheridan, Scientific & Operations Director

The Oversight Board met quarterly during 2009 to review CNGL progress against its scientific and operational targets, to review Key Performance Indicators (KPIs) and to report back to SFI.

At its last meeting, held on January 13th 2010, the Oversight Board reviewed the updated SFI Guidelines on CSET Management and Governance, with particular attention to the updated 'CSET Governance Committee' guidelines, and agreed to recommend to the President of DCU, as host institution, that a newly constituted Governance Committee of eight members be convened for the next meeting, following the updated SFI Guidelines.

2009 Significant Accomplishments

In the second year of the Centre for Next Generation Localisation, the following management and governance accomplishments have been recorded:

• The Operations Team has been successfully expanded in response to needs identified at the beginning of the year related to project management and co-ordination, and to provide additional financial and human resources administration to free up resources in support of the Centre's objectives for funding diversification and industry engagement.
• The External Scientific Advisory Board has been strengthened with two new members with significant experience in the Localisation industry from both large and small companies.
• The Commercialisation Committee has provided a solid focus on and foundation for activities related to the commercial exploitation of research results. We have successfully applied for funding for a dedicated Commercial Development Manager and we expect to be able to report several commercial successes in the coming year.
• The 'Localisation Innovation Showcase' event held in conjunction with our Autumn 2009 Scientific Committee Meeting was deemed a huge success, both from the point of view of showcasing demonstrations and posters of CNGL scientific work across our academic and industry partners, and as a venue for highlighting our work to other interested parties, particularly those involved with the localisation industry in Ireland. This will become a regular feature of our Scientific Committee meetings in the future.




Education and Outreach

The CNGL E&O team consists of Prof. Harold Somers (E&O Director), Cara Greene (E&O Manager) and Karl Kelly (LRC Manager). The E&O team aim to raise the profile of CNGL and promote CNGL's research areas to benefit the public as well as our own students, researchers and industrial partners.

Objectives

The E&O programme's main objective is to promote the understanding and appreciation of computer science, language technology, localisation, digital content management and language and cultural issues. We run an internship programme, education programmes, open days, innovation days, seminars and competitions for:

• Primary, second, third and fourth level education
• The general public
• The commercial sector

We run summer schools, tutorials and conferences and publish a journal to foster close ties with the localisation, language engineering and content management industries, as well as to highlight CNGL and Ireland internationally as a leader in localisation.

Key Performance and Management Indicators

The E&O team keep an archive of all CNGL events. This includes seminars, workshops, public relations etc. Our key performance and management indicators are:

• Number of CNGL members involved in E&O activities
• Number of external participants taking part
• Number of activities
• Questionnaire feedback on education programmes
• Evaluation of internship experiences
• Level of integration of research tracks with E&O
• Visibility of CNGL in the public domain, e.g. media
• The number of professional and transferable skills development courses students and staff attend
• The number of tutorials / workshops students attend
• The number of conferences attended by students
• The number of events CNGL exhibits at

The E&O programme also promotes internal communication or "in reach" within the CNGL team to maintain openness, communication and teamwork. Due to the strong collaborative nature of the research, CNGL involves a very high degree of cross-research team interaction.

The "in reach" programme consists of seminars, workshops and tutorials and highlights professional development on offer by the CNGL partner universities. CNGL also provides commercialisation and entrepreneurship support to researchers.

2009 Accomplishments – Internal Activities

DCLRS Seminar Series, CNGL and NCLT Seminar Series

CNGL supports the Dublin Computational Linguistics Research Seminar (DCLRS) and the CNGL/NCLT seminar series. Both seminars have approx. 30 attendees every week. A full list of speakers/presentations can be found in the appendix.

Tutorials and Workshops

CNGL ran a number of research-focused tutorials and workshops in 2009 that were open to researchers in the field. A full list of tutorials/workshops can be found in the appendix.

Joint ICT-CSET Thesis Award

CNGL were involved in a joint ICT-CSET Computer Science thesis award with CLARITY, CTVR, DERI and Lero. DERI are running the competition this year.

Internal and External Professional Development

Both CNGL staff and students have taken part in professional development activities in their host universities. The courses encompass orientation, library skills, research skills, entrepreneurship development, commercialisation and intellectual property management.


2009 Accomplishments – External Activities

Primary Level

Irish Centre for Talented Youth (CTYI) Courses

CNGL ran courses with the Centre for Talented Youth Ireland (CTYI). CTYI runs courses for primary and secondary school students with exceptional academic ability.

CNGL ran two combined modules with CTYI: 'Japanese Language' and 'Culturally Localising Web Pages'. Forty students, aged 8–13 years old, took the two modules. The course aimed to help students become familiar with Japanese language and culture and to introduce them to website localisation.

[Photo: Dr. Dimitra Anastasiou, Post-Doctoral researcher at UL, working with a student on the CTYI course run by CNGL.]

Localisation Education Activity Programme (LEAP)

Dr. Dimitra Anastasiou is leading the Localisation Education Action Project (LEAP) at the University of Limerick. LEAP is a new educational programme which is envisioned to encourage children to appreciate the cultural and linguistic diversity of the world.

The LEAP project encompasses the Primary School Localisation Toolkit and the "Language & Culturally Localising Web Pages" courses. The toolkit is composed of two elements that are designed to be used together: i) an educational games framework and ii) a teaching aid. Localisation awareness topics are taught in a fun and educational way through downloadable interactive games and lessons.

Second Level

All Ireland Linguistics Olympiad (AILO)

CNGL ran the first ever All Ireland Linguistics Olympiad (AILO) in 2009. The Olympiad is aimed at transition year and fifth year students. A linguistics Olympiad involves face-to-face competition where teams of four students or individuals have to use their ingenuity, creativity and skill to solve language-related problems.

Fifteen schools from Ireland and Northern Ireland entered both teams and individuals. All schools received training visits from a CNGL tutor. Over 90 secondary school students competed at the AILO final in DCU on 27 April 2009.

The individual competition was won by Dylan Coburn Gray, Mount Temple School, Dublin and the team competition was won by Newtown School, Waterford. A team with members from Newtown School and two individual winners went on to represent Ireland at the 7th International Linguistics Olympiad (ILO) in July 2009 in Wrocław, Poland. At the international competition, Ruadhan Treacy from Newtown School, Waterford won an 'Honourable Mention' award in the individual round.

[Photo: The Irish team at the ILO in Wrocław, Poland.]

AILO 2010 will have a first round in schools due to increased numbers. We are working closely with our North American, Australian and UK Linguistics Olympiad counterparts with regard to the training material and competition problems. The AILO 2010 final will take place on 24 March 2010 in DCU. The 8th International Linguistics Olympiad will take place in Stockholm, Sweden in Summer 2010.


BT Young Scientist Exhibition

CNGL was represented at the SFI stand during the annual BT Young Scientist Exhibition at the RDS on 8 January 2009. Additionally, two of our industrial partners, Alchemy and SpeechStorm, as well as Dr. Declan Dagger, presented interactive demos.

[Photo: Mr. Enda McDonnell (Alchemy) showing students Alchemy Software's work with iTunes.]

Adaptive Language Learning Game

Prof. Vincent Wade's DCM group have developed an adaptive multilingual game, 'Language Trap'. Teachers are working with Neil Peirce to develop role-plays within his adaptive educational game that can be used for the Leaving Certificate German curriculum.

Internships

CNGL hosted a transition year intern for two weeks in TCD in December 2009. The intern was trained on the Twanslate application and then demonstrated it at the TCD open day to 6th year second level students. He also completed an evaluation task on the 'Language Trap' game.

Third Level

2009 LRC Internationalisation and Localisation Summer School

The LRC Internationalisation and Localisation Summer School took place in June 2009 at UL. The summer school offered attendees the chance to attend hands-on training courses and lectures on industry-standard localisation technologies delivered by the developers themselves. There were 15 participants at this summer school.

Internships

CNGL hosted three internships as a part of the 'Undergraduate Students as Researchers' Programme.

Reinhard Schäler's LOC group in UL hosted an intern who worked on the Primary School Localisation Toolkit, which aims to help children develop an appreciation of cultural and linguistic diversity. Localisation awareness topics are taught in a fun and educational manner through downloadable interactive games and lessons. After this internship, the intern accepted an MSc scholarship to continue his work developing the collaborative learning environment for primary school students.

[Figure: Primary School Toolkit screenshot.]

Prof. Josef van Genabith and Dr. Lamia Tounsi hosted an intern who worked on extending the existing Arabic LFG Gold Standard from 250 annotated sentences to 500. This intern has since accepted a PhD scholarship in the ILT group at CNGL.

CNGL hosted an intern as part of the DCU INTRA programme for third year undergraduate students. The intern worked with both the Information Retrieval group in DCU and the Digital Content Management group in TCD. For his project he developed a search engine toolkit for use in the CNGL CTYI courses as well as a movie recommender with the CNGL group in TCD.


Fourth Level

CLUKI Colloquium
CNGL hosted the 12th Annual CLUK(I) Colloquium in April 2009. The occasion saw the expansion of the former CLUK (Computational Linguistics UK) organisation to include Ireland. Its annual colloquium offers PhD students in Natural Language Processing and related disciplines an opportunity to present and discuss their work with members of the research community in a student-run and student-centred environment. There were 25 participants at this event.

Young Researchers Workshop in Speech Technology
CNGL hosted the first Young Researchers Workshop in Speech Technology in April 2009 to coincide with the CLUK(I) Colloquium. The goal of the workshop was to provide an opportunity for early stage PhD students to present their research as well as share their experiences in a less formal setting than a traditional international conference. There were 30 participants at this event.

Localisation Research Centre (LRC) Best Thesis and Scholar Award
In 2009, CNGL co-sponsored the 13th Best Localisation Thesis Award, sponsored by Symantec (a CNGL industry partner) since its inception. CNGL also supported the LRC Best Scholar Award. The LRC awards are the most valuable and longest established academic awards in localisation.

MyISLE – Personalised multilingual social networking
Dr. Declan Dagger and the MyISLE team at TCD have just finished an application called “Twanslate”, which looks at semi-automatic translation of Twitter tweets and combines an adaptive dialogue-based rating system for quality control. They are scheduled to demo this application at the Young Scientist Exhibition in January 2010.

DCU in the Community
CNGL continued to provide language courses for the ‘DCU in the Community’ programme throughout 2009. DCU in the Community is a DCU outreach centre in Ballymun, a socially disadvantaged area of Dublin. The aim of the centre is to give people in Ballymun the chance to get back into education and to give them a taste of courses students can do in university.

CNGL Spring Meeting
The CNGL Spring 2009 meeting took place in DCU on 19-20 February 2009. All CNGL members attended the event, and many of our international collaborators and external advisory board members were in attendance. There were 30 demos and posters on display from CNGL PhD students and Postdoctoral researchers. A public event was held on the final afternoon which featured Dr. Elizabeth Shriberg from SRI International, who gave an entertaining keynote entitled ‘Ten Reasons for Ignoring Prosody in Spoken Language Processing − And Why They Are Wrong’.

The General Public

Media Engagement
In 2009, we worked with the other SFI CSETs in Ireland and our partner universities to inform the public about CNGL events and successes. A full list of media coverage can be found in the appendix.

[Photo: Prof. Josef van Genabith and Dr. Elizabeth Shriberg (SRI International) at the CNGL public meeting]

CNGL Autumn Meeting
The CNGL Autumn 2009 meeting took place in DCU on 14-16 October 2009. There were 30 demos and posters on display from CNGL PhD students and Postdoctoral researchers. Dion Wiggins from AsiaOnline gave the keynote address.

[Photo: Demos and posters at the CNGL Localisation Innovation Day]

Localisation Innovation Day
In conjunction with the ‘Innovation Dublin’ festival, CNGL hosted a ‘Localisation Innovation Showcase’ event at DCU on 16 October 2009. The CNGL showcase highlighted localisation innovations through exhibitions and demonstrations of products, technologies and projects from CNGL as well as our industrial partners. Other companies interested and involved in localisation attended the CNGL Innovation Showcase, including Product Innovator, AOL, Straker Software and SAP Business Objects.

Design an App Competition
The E&O team is organising a ‘Design an Application’ competition to be held in 2010. This initiative will be led by Dr. David Lewis, a CNGL researcher at TCD, and targets undergraduate computer science and/or business & IT students, but is open to non-students as well. Competitors will draw up a specification for the design of an online social network application that uses language technology and translation techniques from the localisation industry to support multilingual social interactions across social networks.

Commercial and International Outreach

Exhibiting at and Sponsoring International Conferences
Members of CNGL have presented papers and exhibited at all of the major international conferences in our research areas. A full list of conferences attended can be found in the appendix.

Workshop on Example-based Machine Translation
CNGL hosted the Third International Workshop on EBMT, held at DCU on 12-13 November 2009. The workshop featured research papers as well as multiple panel sessions. Prof. Sadao Kurohashi of Kyoto University, Japan, gave the keynote address. A total of 47 delegates attended the event.

[Photo: Prof. Sadao Kurohashi giving the keynote at the EBMT workshop in DCU]

ISO/IEC JTC1/SC2/WG2 Meeting
The international standardisation working group meeting was hosted by CNGL at DCU in April 2009. The scope of the meeting was to develop a universal multiple-octet coded character set that encompasses the world’s scripts. 40 participants from 13 countries worldwide were in attendance.

Action week for Global Information Sharing (AGIS) 2009
AGIS ’09, Promoting Equality through Language and Cultural Diversity, took place at UL on 21-23 September 2009. The Action week for Global Information Sharing brought together hundreds of volunteer translators, localisation specialists and NGOs from all over the world to address global information poverty.

Internationalisation and Localisation Conference
The 14th Internationalisation and Localisation Conference organised by the LRC took place on 24-25 September 2009 at the Clarion Hotel in Limerick City. The theme of this year’s conference was ‘Localisation in the Cloud’, and the conference looked at the application of cloud-based computing and software-as-a-service models to the localisation industry. The conference had 112 attendees and featured 18 research papers.

Localisation Focus, The International Journal of Localisation
Localisation Focus is co-sponsored by CNGL. One issue has been produced by the LRC in 2009 and will be published shortly. A special issue featuring papers from CNGL researchers is in production and will appear in early 2010.

ThinkTank – Localisation in 2014
A Localisation ThinkTank exploring the future of the industry was held on 6 March 2009 in Carlton House, Maynooth, Co. Kildare. This meeting was the first of a series of annual meetings planned between now and 2012. The meeting was chaired by Reinhard Schäler of UL and attended by 24 senior executives representing some of the largest localisation companies and digital publishers in Ireland.


Appendix 1: Outputs & Accomplishments
Appendix 1: People and Partnerships


CSET Research Teams
Team members associated with the CSET during the reporting period

Name | Type | Institution | Research Strand | Highest Degree | Gender | Nationality | CSET Funded | Supervisor
Páraic Sheridan | Administrative | DCU | CM | PhD | M | Irish | Yes | N/A
Ríona Finn | Administrative | DCU | CM | MSc | F | Irish | Yes | N/A
Steve Gotz | Administrative | DCU | CM | MSc | M | American | Yes | N/A
Hilary McDonald | Administrative | TCD | CM | MSc | F | Irish | Yes | N/A
Geraldine Harrahill | Administrative | UL | CM | FETAC | F | Irish | Yes | N/A
Karl Kelly | Administrative | UL | CM | Grad Dip | M | Irish | Yes | N/A
Cara Nicole Greene | Administrative | DCU | E&O | BSc | F | Irish | Yes | N/A
Eithne McCann | Administrative | DCU | E&O | National Cert | F | Irish | Yes | N/A
Harold Somers | Administrative | DCU | E&O | PhD | M | British | Yes | N/A
David Lewis | Funded Investigator | TCD | SF | PhD | M | English | Yes | N/A
Josef van Genabith | Lead Principal Investigator | DCU | CM | PhD | M | German | Yes | N/A
Alfredo Guerra Maldonado | MSc | TCD | ILT | BSc | M | Mexican / Irish | Yes | Dr. Carl Vogel
Enda Quigley | MSc | UL | LOC | BSc | M | Irish | Yes | Mr. Reinhard Schäler
Debasis Ganguly | PhD | DCU | DCM | MTech | M | Indian | Yes | Dr. Gareth Jones
Jinming Min | PhD | DCU | DCM | MSc | M | Chinese | Yes | Dr. Gareth Jones
Muhammad Javed | PhD | DCU | DCM | MSc | M | Pakistani | Yes | Dr. Claus Pahl
Walid Magdy | PhD | DCU | DCM | MSc | M | Egyptian | Yes | Dr. Gareth Jones
Wei Li | PhD | DCU | DCM | MSc | F | Chinese | Yes | Dr. Gareth Jones
Yalemisew Abgaz | PhD | DCU | DCM | MSc | M | Ethiopian | Yes | Dr. Claus Pahl
Ben Steichen | PhD | TCD | DCM | MSc | M | Luxembourgish | Yes | Prof. Vincent Wade
Bo Fu | PhD | TCD | DCM | MSc | F | Chinese | Yes | Prof. Vincent Wade
Catherine Mulwa | PhD | TCD | DCM | MSc | F | Kenyan | Yes | Ms. Mary Sharp / Dr. Christer Gobl
Eamon Hynes | PhD | TCD | DCM | MSc | M | Irish | Yes | Prof. Vincent Wade
Kevin Koidl | PhD | TCD | DCM | MSc | M | Irish | Yes | Prof. Vincent Wade
Killian Levacher | PhD | TCD | DCM | MSc | M | French / Irish | Yes | Prof. Vincent Wade
Mohammed Rami ElHussein Ghorab | PhD | TCD | DCM | MSc | M | Egyptian | Yes | Prof. Vincent Wade
Neil Peirce | PhD | TCD | DCM | MSc | M | Irish | Yes | Prof. Vincent Wade
Alejandra Lopez Fernandez | PhD | UCD | DCM | MSc | F | Mexican | Yes | Dr. Tony Veale
Mourad ElMoueddeb | PhD | UCD | DCM | MSc | M | Tunisian | Yes | Dr. Tony Veale
Ankit Srivastava | PhD | DCU | ILT | MA | M | Indian | Yes | Prof. Andy Way
Hala Maghout | PhD | DCU | ILT | BSc | F | Syrian | Yes | Prof. Andy Way
Hanna Béchara | PhD | DCU | ILT | BA | F | Irish | Yes | Prof. Josef van Genabith
Pratyush Banerjee | PhD | DCU | ILT | MSc | M | Indian | Yes | Prof. Andy Way / Prof. Josef van Genabith
Rejwanul Haque | PhD | DCU | ILT | MTech | M | Indian | Yes | Prof. Andy Way
Robert Smith | PhD | DCU | ILT | BSc | M | Irish | Yes | Prof. Harold Somers
Sandipan Dandapat | PhD | DCU | ILT | MSc | M | Indian | Yes | Prof. Harold Somers
Sergio Penkale | PhD | DCU | ILT | BSc | M | Argentinean | Yes | Prof. Andy Way
Stephen Doherty | PhD | DCU | ILT | BA | M | Irish | Yes | Dr. Dorothy Kenny
Tsuyoshi Okita | PhD | DCU | ILT | MSc | M | Japanese | Yes | Prof. Andy Way
YiFan He | PhD | DCU | ILT | MA | M | Chinese | Yes | Prof. Andy Way / Prof. Josef van Genabith


Name | Type | Institution | Research Strand | Highest Degree | Gender | Nationality | CSET Funded | Supervisor
Gerard Lynch | PhD | TCD | ILT | MSc | M | Irish | Yes | Dr. Carl Vogel
Hector Hugo Franco Penya | PhD | TCD | ILT | BSc | M | Spanish | Yes | Dr. Martin Emms
John Kane | PhD | TCD | ILT | MPhil | M | Irish | Yes | Prof. Ailbhe Ní Chasaide
Liliana Mamani-Sanchez | PhD | TCD | ILT | MSc | F | Peruvian | Yes | Dr. Carl Vogel
Ahmed Zeeshan | PhD | UCD | ILT | MSc | M | Pakistani | Yes | Prof. Julie Carson-Berndsen
Amalia Zahra | PhD | UCD | ILT | BSc | F | Indonesian | Yes | Prof. Julie Carson-Berndsen
Eva Szekely | PhD | UCD | ILT | MA | F | Hungarian | Yes | Prof. Julie Carson-Berndsen
Mark Kane | PhD | UCD | ILT | MSc | M | Irish | Yes | Prof. Julie Carson-Berndsen
Mohamed Abou-Zleikha | PhD | UCD | ILT | MSc | M | Syrian | Yes | Prof. Julie Carson-Berndsen
Udochukwu Kalu Ogbureke | PhD | UCD | ILT | MPhil | M | Nigerian | Yes | Prof. Julie Carson-Berndsen
Joss Moorkens | PhD | DCU | LOC | BA | M | Irish | Yes | Dr. Sharon O’Brien / Mr. Reinhard Schäler
Ali Raza Khan | PhD | UL | LOC | BSc | M | Pakistani | Yes | Mr. Reinhard Schäler
Aram Morera Mesa | PhD | UL | LOC | Grad Dip | M | Spanish | Yes | Mr. Reinhard Schäler
Lorcan Ryan | PhD | UL | LOC | MSc | M | Irish | Yes | Mr. Reinhard Schäler
Lucia Morado Vasquez | PhD | UL | LOC | MSc | F | Spanish | Yes | Mr. Reinhard Schäler
Madeleine Lenker | PhD | UL | LOC | MA | F | German | Yes | Mr. Reinhard Schäler
Naoto Nishio | PhD | UL | LOC | Grad Dip | M | Japanese | Yes | Mr. Reinhard Schäler
Rajat Gupta | PhD | UL | LOC | BSc | M | Indian | Yes | Mr. Reinhard Schäler
Solomon Gizaw | PhD | UL | LOC | MSc | M | Ethiopian | Yes | Mr. Reinhard Schäler
Anne Schneider | PhD | TCD | SF | BSc | F | German | Yes | Dr. Saturnino Luz
Christos Tsarouchis | PhD | TCD | SF | MSc | M | Greek | Yes | Dr. David Lewis
Ilana Rozanes | PhD | TCD | SF | MSc | F | American | Yes | Dr. Saturnino Luz
John Moran | PhD | TCD | SF | BSc | M | Irish | Yes | Dr. David Lewis
John McAuley | PhD | TCD | SF | MPhil | M | Irish | Yes | Dr. David Lewis
Stephan Schlogl | PhD | TCD | SF | MSc | M | Austrian | Yes | Dr. Saturnino Luz
Zohar Etzioni | PhD | TCD | SF | MSc | M | Israeli | Yes | Dr. David Lewis
Stephen Curran | PhD / Technician | TCD | SF | BSc | M | Irish | Yes | Dr. David Lewis
Johannes Leveling | Postdoctoral Researcher | DCU | DCM | PhD | M | German | Yes | Dr. Gareth Jones
Alex O’Connor | Postdoctoral Researcher | TCD | DCM | MSc | M | Irish | Yes | Prof. Vincent Wade
Declan Dagger | Postdoctoral Researcher | TCD | DCM | PhD | M | Irish | Yes | Prof. Vincent Wade
Dong Zhou | Postdoctoral Researcher | TCD | DCM | PhD | M | Chinese | Yes | Prof. Vincent Wade
Ian O’Keeffe | Postdoctoral Researcher | TCD | DCM | PhD | M | Irish | Yes | Prof. Vincent Wade
Melike Sah | Postdoctoral Researcher | TCD | DCM | PhD | F | Cypriot | Yes | Prof. Vincent Wade
Seamus Lawless | Postdoctoral Researcher | TCD | DCM | PhD | M | Irish | Yes | Prof. Vincent Wade
Prasenjit Majumder | Postdoctoral Researcher | UCD | DCM | PhD | M | Indian | Yes | Dr. Tony Veale


Name | Type | Institution | Research Strand | Highest Degree | Gender | Nationality | CSET Funded | Supervisor
Anton Bryl | Postdoctoral Researcher | DCU | ILT | PhD | M | Belarusian | Yes | Prof. Andy Way
Jie Jiang | Postdoctoral Researcher | DCU | ILT | PhD | M | Chinese | Yes | Prof. Andy Way
Jinhua Du | Postdoctoral Researcher | DCU | ILT | PhD | M | Chinese | Yes | Prof. Andy Way
Ozlem Cetinoglu | Postdoctoral Researcher | DCU | ILT | PhD | F | Turkish | Yes | Prof. Josef van Genabith
Patrik Gildas Lambert | Postdoctoral Researcher | DCU | ILT | PhD | M | French | Yes | Prof. Andy Way
Sara Morrissey | Postdoctoral Researcher | DCU | ILT | PhD | F | Irish | Yes | Prof. Andy Way
Sudip Kumar Naskar | Postdoctoral Researcher | DCU | ILT | PhD | M | Indian | Yes | Prof. Andy Way
Yanjun Ma | Postdoctoral Researcher | DCU | ILT | PhD | M | Chinese | Yes | Prof. Andy Way
Baoli Li | Postdoctoral Researcher | TCD | ILT | PhD | M | Chinese | Yes | Dr. Carl Vogel
Irena Yanushevskaya | Postdoctoral Researcher | TCD | ILT | PhD | F | Russian | Yes | Prof. Ailbhe Ní Chasaide
Julie Mauclair | Postdoctoral Researcher | UCD | ILT | PhD | F | French | Yes | Prof. Julie Carson-Berndsen
Peter Cahill | Postdoctoral Researcher | UCD | ILT | PhD | M | Irish | Yes | Prof. Julie Carson-Berndsen
Dimitra Anastasiou | Postdoctoral Researcher | UL | LOC | PhD | F | Greek | Yes | Mr. Reinhard Schäler
Ian O’Keeffe | Postdoctoral Researcher | UL | LOC | PhD | M | Irish | Yes | Mr. Reinhard Schäler
Lamine Aouad | Postdoctoral Researcher | UL | LOC | PhD | M | Algerian | Yes | Mr. Reinhard Schäler
Dominic Jones | Postdoctoral Researcher | TCD | SF | MSc | M | British | Yes | Dr. David Lewis
Ielka van der Sluis | Postdoctoral Researcher | TCD | SF | PhD | F | Dutch | Yes | Dr. Saturnino Luz
John Keeney | Postdoctoral Researcher | TCD | SF | PhD | M | Irish | Yes | Dr. David Lewis
Kevin Feeney | Postdoctoral Researcher | TCD | SF | PhD | M | Irish | Yes | Dr. David Lewis
Nikiforos Karamanis | Postdoctoral Researcher | TCD | SF | PhD | M | Greek | Yes | Dr. Saturnino Luz
Aoife Brady | Research Assistant | TCD | DCM | BA | F | Irish | Yes | Prof. Vincent Wade
Lamia Tounsi | Research Assistant | DCU | E&O | PhD | F | Algerian | Yes | Ms. Cara Greene
Sylwia Ozdowska | Research Assistant | DCU | E&O | PhD | F | Polish | Yes | Ms. Cara Greene
Teresa Nevin | Research Assistant | DCU | E&O | BSc | F | Irish | Yes | Ms. Cara Greene
Annette Hautli | Research Assistant | DCU | ILT | MSc | F | German | Yes | Prof. Josef van Genabith
Yvette Graham | Research Assistant | DCU | ILT | MSc | F | Irish | Yes | Prof. Andy Way
Aengus Walton | Research Assistant | TCD | ILT | BSc | M | Irish | Yes | Dr. Martin Emms
Kevin Feeney | Research Assistant | TCD | SF | PhD | M | Irish | Yes | Dr. David Lewis


Name | Type | Institution | Research Strand | Highest Degree | Gender | Nationality | CSET Funded | Supervisor
Joachim Wagner | Technician | DCU | CM | MA | M | German | Yes | N/A
Eoin Ó’Conchuir | Technician | UL | CM | BSc | M | Irish | Yes | N/A
Hanna Béchara | Undergraduate | DCU | E&O | BA | F | Irish | Yes | Ms. Cara Greene
Jian Zhang | Undergraduate | DCU | E&O | Undergraduate | M | Chinese | Yes | Dr. Gareth Jones / Ms. Cara Greene
Enda Quigley | Undergraduate | UL | E&O | BSc | M | Irish | Yes | Mr. Reinhard Schäler
Dennis Parra | Visiting Researcher | TCD | DCM | BSc | M | Chilean | Yes | Prof. Vincent Wade
Stephen Fitzmaurice | Visiting Researcher | TCD | DCM | BA | M | Irish | Yes | Prof. Vincent Wade
Daniel Galron | Visiting Researcher | DCU | ILT | BSc | M | American | Yes | Prof. Andy Way

Affiliated members and collaborators not receiving funds

Name | Type | Institution | Research Strand | Highest Degree | Gender | Nationality | CSET Funded | Supervisor
Mikel Forcada | Collaborator not receiving funds | DCU | ILT | PhD | M | Spanish | No | N/A
Nick Campbell | Collaborator not receiving funds | TCD | ILT | PhD | M | British | No | N/A
Claus Pahl | Co-Principal Investigator | DCU | DCM | PhD | M | German | No | N/A
Gareth Jones | Co-Principal Investigator | DCU | DCM | PhD | M | British | No | N/A
Christer Gobl | Co-Principal Investigator | TCD | DCM | PhD | M | American | No | N/A
Declan O’Sullivan | Co-Principal Investigator | TCD | DCM | PhD | M | Irish | No | N/A
Mary Sharp | Co-Principal Investigator | TCD | DCM | BSc | F | Irish | No | N/A
Tony Veale | Co-Principal Investigator | UCD | DCM | PhD | M | Irish | No | N/A
Dorothy Kenny | Co-Principal Investigator | DCU | ILT | PhD | F | Irish | No | N/A
Sharon O’Brien | Co-Principal Investigator | DCU | ILT | PhD | F | Irish | No | N/A
Ailbhe Ní Chasaide | Co-Principal Investigator | TCD | ILT | PhD | F | Irish | No | N/A
Carl Vogel | Co-Principal Investigator | TCD | ILT | PhD | M | American | No | N/A
Martin Emms | Co-Principal Investigator | TCD | ILT | PhD | M | Irish | No | N/A
Chris Exton | Co-Principal Investigator | UL | LOC | PhD | M | Australian/Irish | No | N/A
Jim Buckley | Co-Principal Investigator | UL | LOC | PhD | M | Irish | No | N/A
JJ Collins | Co-Principal Investigator | UL | LOC | PhD | M | Irish | No | N/A
Liam Murray | Co-Principal Investigator | UL | LOC | PhD | M | Irish | No | N/A
Yvonne Cleary | Co-Principal Investigator | UL | LOC | PhD | F | Irish | No | N/A
Gavin Doherty | Co-Principal Investigator | TCD | SF | PhD | M | Irish | No | N/A


Name | Type | Institution | Research Strand | Highest Degree | Gender | Nationality | CSET Funded | Supervisor
Vincent Wade | Lead Principal Investigator | TCD | DCM | PhD | M | Irish | No | N/A
Andy Way | Lead Principal Investigator | DCU | ILT | PhD | M | British | No | N/A
Julie Carson-Berndsen | Lead Principal Investigator | UCD | ILT | DPhil | F | Irish | No | N/A
Reinhard Schäler | Lead Principal Investigator | UL | LOC | MSc | M | German | No | N/A
Saturnino Luz | Lead Principal Investigator | TCD | SF | PhD | M | Brazilian | No | N/A
Ruwan Asanka Wasala | MSc | UL | LOC | BSc | M | Sri Lankan | No | Mr. Reinhard Schäler
Marian Flanagan | PhD | DCU | ILT | PhD | F | Irish | No | Dr. Dorothy Kenny
Giselle De Almeida | PhD | DCU | LOC | MPhil | F | Brazilian/Irish | No | Dr. Sharon O’Brien
Midori Tatsumi | PhD | DCU | LOC | MSc | F | Japanese | No | Dr. Sharon O’Brien
Nora Arranberi Monasterio | PhD | DCU | LOC | PhD | F | Spanish | No | Dr. Sharon O’Brien
Yanli Sun | PhD | DCU | LOC | MA | F | Chinese | No | Dr. Sharon O’Brien
Ventsislav Zhechev | Postdoctoral Researcher | DCU | ILT | PhD | M | Bulgarian | No | Prof. Josef van Genabith
John Tinsley | Research Assistant | DCU | ILT | PhD | M | Irish | No | Dr. Páraic Sheridan
Dylan Lawless | Undergraduate | TCD | DCM | Second Level | M | Irish | No | Prof. Vincent Wade
Martha Larson | Visiting Researcher | DCU | DCM | PhD | F | American | No | Dr. Gareth Jones
David Farwell | Visiting Researcher | DCU | ILT | PhD | M | American | No | Prof. Andy Way
Sivaji Bandyopadhayay | Visiting Researcher | DCU | ILT | PhD | M | Indian | No | Prof. Andy Way


Industry Partners and Contact Names

Industry Partners

Organisation Type | Organisation Name | Location | Date joined CSET | Date departed CSET | Contact Name | Position
SME | Alchemy Software Development | Dublin, Ireland | 04/12/2007 | N/A | Tony O’Dowd | Chief Executive Officer and President
MNC | Dai Nippon Printing | Tokyo, Japan | 04/12/2007 | N/A | Takeshi Fukunaga | Advisor of Headquarters
MNC | IBM | Dublin, Ireland | 04/12/2007 | N/A | Marie Wallace | Senior Development Manager
MNC | Microsoft | Dublin, Ireland | 04/12/2007 | N/A | Dag Schmidtke | Program Manager for Language Technology Strategy
MNC | SDL | Wicklow, Ireland | 04/12/2007 | N/A | Paul McManus | Managing Director
SME | SpeechStorm | Belfast, Northern Ireland | 04/12/2007 | N/A | Oliver Lennon | Chief Executive Officer
MNC | Symantec | Dublin, Ireland | 04/12/2007 | N/A | Fred Hollowood | Director for Global Language Services
SME | Traslán | Dublin, Ireland | 04/12/2007 | N/A | Donncha Ó’Cróinín | Chief Executive Officer
SME | VistaTEC | Dublin, Ireland | 04/12/2007 | N/A | Phil Ritchie | Chief Technology Officer

Governance Committee and Scientific Advisory Board members

Governance Committee Members

Name | Organisation | Position
Aidan Sweeney | IBEC | R&D Policy Executive
Carol Gibbons | IDA | Scientific & Technology Advisor
David Lloyd | TCD | Dean of Research
Des Fitzgerald | TCD | VP of Research
Eugene Kennedy | DCU | VP of Research
Gearoid Mooney | Enterprise Ireland | Director Informatics Research and Commercialisation
Jim Dowling | DCU | Executive Dean, Faculty of Engineering & Computing
Josef van Genabith | DCU | CNGL Director
Martin Conry | DCU | University Secretary

Scientific Advisory Board Members

Name | Organisation | Position
Andrew Bredenkamp | Acrolinx | Chief Executive Officer
Carol Espy-Wilson | University of Maryland, Department of Electrical & Computer Engineering | Professor
Francis Tsang | Adobe Systems | Director of Globalisation
Fred Jelinek | Johns Hopkins University, Department of Electrical & Computer Engineering | Director, Center for Language and Speech Processing
Lauri Karttunen | Palo Alto Research Center | Computational Linguist
Elizabeth Liddy | Syracuse University, School of Information Studies | Dean, Trustee Professor
Makoto Nagao | Kyoto University, Department of Electrical Engineering | Professor
Peter Brusilovsky | University of Pittsburgh, School of Information Sciences | Professor


Appendix 2: Outputs


All CSET publications

All CNGL publications are stored in a central document management system. We expect to roll out an open access version of this system in the latter half of 2010.

Conference Publications

1. Anastasiou, D. (2009), “Localisation, Centre for Next Generation Localisation, Localisation and Standards”, in: Proceedings of the Conference on Practical Applications in Language and Computers (PALC), 6th - 8th April, Lodz, Poland.
2. Anastasiou, Dimitra, (2009), “Crowdsourcing and Machine Translation”, in: The Annual Conference Proceedings of the Localisation Research Centre LRC XIV, Localisation in the Cloud, 24th - 25th September, Limerick, Ireland.
3. Apidianaki, Marianna, Yifan He and Andy Way. 2009. Capturing lexical variation in MT evaluation using automatically built sense-cluster inventories. In Proceedings of PACLIC 23: the 23rd Pacific Asia Conference on Language, Information and Computation, Hong Kong.
4. Breitfuss, Werner, Ielka van der Sluis, Saturnino Luz, Helmut Prendinger and Mitsuru Ishizuka (2009) Evaluating an Algorithm for the Generation of Multimodal Referring Expressions in a Virtual World: a Pilot Study. IVA 09: 9th International Conference on Intelligent Virtual Agents, Amsterdam, The Netherlands.
5. Bryl, Anton, Josef van Genabith and Yvette Graham (2009) Guessing the Grammatical Function of a Non-Root F-Structure in LFG. IWPT-09, Proceedings of the 11th International Conference on Parsing Technologies, Paris, France, pp. 146-149.
6. Cahill, Peter, Jinhua Du, Andy Way and Julie Carson-Berndsen (2009) Using Same-Language Machine Translation to Create Alternative Target Sequences for Text-To-Speech Synthesis. INTERSPEECH 2009: 10th Annual Conference of the International Speech Communication Association, Brighton, England.
7. Curran, Stephen, Kevin Feeney, David Lewis and Reinhard Schäler (2009) The Management of Crowdsourcing in Business Processes. BDIM 2009: 4th IFIP/IEEE International Workshop on Business-driven IT Management, New York, USA.
8. Doherty, Stephen and Sharon O’Brien (2009) Can MT output be evaluated through eye tracking? MT Summit XII: Proceedings of the Twelfth Machine Translation Summit, Ottawa, Ontario, Canada, 214-221.
9. Du, Jinhua, Yifan He, Sergio Penkale and Andy Way (2009) MaTrEx: the DCU MT System for WMT 2009. EACL 2009 Fourth Workshop on Statistical Machine Translation, Athens, Greece, 95-99.
10. Du, Jinhua, Yanjun Ma and Andy Way (2009) Source-side context-informed hypothesis alignment for combining outputs from machine translation systems. MT Summit XII: Proceedings of the Twelfth Machine Translation Summit, Ottawa, Ontario, Canada, 230-237.
11. Du, Jinhua and Andy Way. 2009. A Three-pass System Combination Framework by Combining Multiple Hypothesis Alignment Methods. In Proceedings of IALP2009: the International Conference on Asian Language Processing 2009, Singapore.
12. Fu, Bo, Rob Brennan and Declan O’Sullivan (2009) Multilingual Ontology Mapping: Challenges and a Proposed Framework. Artificial Intelligence and Simulation of Behaviour 2009 Convention, Workshop on Matching and Meaning 2009, Edinburgh, Scotland, http://dream.inf.ed.ac.uk/events/wmm-2009/download/FuB.pdf
13. Galron, Daniel, Sergio Penkale, Andy Way and I. Dan Melamed (2009) Accuracy-Based Scoring for DOT: Towards Direct Error Minimisation for Data-Oriented Translation. EMNLP 2009, Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing, Singapore, 371-380.
14. Ghorab, M. Rami, Dong Zhou, Alex O’Connor and Vincent Wade (2009) A Framework for Cross-language Search Personalisation. 4th International Workshop on Semantic Media Adaptation and Personalisation, San Sebastián, Spain.
15. Ghorab, M. Rami, Johannes Leveling, Dong Zhou, Gareth J.F. Jones and Vincent Wade (2009) TCD-DCU at LogCLEF 2009: An Analysis of Queries, Actions, and Interface Languages. Working Notes for the CLEF 2009 Workshop, Corfu, Greece.
16. Haque, Rejwanul, Sandipan Dandapat, Ankit Kumar Srivastava, Sudip Kumar Naskar and Andy Way (2009) English-Hindi Transliteration Using Context-Informed PB-SMT: the DCU System for NEWS 2009. NEWS 2009, 2009 Named Entities Workshop: Shared Task on Transliteration, Singapore, 104-107.
17. Haque, Rejwanul, Sudip Kumar Naskar, Yanjun Ma and Andy Way (2009) Using supertags as source language context in SMT. EAMT-2009: Proceedings of the 13th Annual Conference of the European Association for Machine Translation, Barcelona, Spain, 237-244.
18. Haque, Rejwanul, Sudip Kumar Naskar, Antal van den Bosch and Andy Way. 2009. Dependency Relations as Source Context in Phrase-Based SMT. In Proceedings of PACLIC 23: the 23rd Pacific Asia Conference on Language, Information and Computation, Hong Kong.
19. Haque, Rejwanul, Sudip Kumar Naskar, Josef van Genabith and Andy Way. 2009. Experiments on Domain Adaptation for English-Hindi SMT. In Proceedings of PACLIC 23: the 23rd Pacific Asia Conference on Language, Information and Computation, Hong Kong.
20. He, Yifan and Andy Way (2009) Learning labelled dependencies for Machine Translation evaluation. EAMT-2009: Proceedings of the 13th Annual Conference of the European Association for Machine Translation, Barcelona, Spain, 44-51.
21. He, Yifan and Andy Way (2009) Improving the objective function in minimum error rate training. MT Summit XII: Proceedings of the Twelfth Machine Translation Summit, Ottawa, Ontario, Canada, 238-245.
22. Kane, John and Christer Gobl (2009) Automatic parameterisation of the glottal waveform combining time and frequency domain measures. 6th International Workshop on Models and Analysis of Vocal Emissions for Biomedical Applications, MAVEBA2009, Firenze, Italy, pp. 91-94.
23. Karamanis, Nikiforos and Saturnino Luz (2009) Interaction strategies by a non-English speaker in Dublin and their relation to Machine Translation. Irish HCI Conference, Dublin, Ireland.


24. Karamanis, Nikiforos, Anne Schneider, Ielka van der Sluis, Stephan Schlögl, Gavin Doherty and Saturnino Luz (2009) Investigating the overlap between Human-Computer Interaction and Natural Language Processing. CHI 2009: 27th Annual CHI Conference on Human Factors in Computing Systems, Boston, USA, WIP128.
25. Kevin Koidl, Owen Conlan and Vincent Wade (2009) Non-Invasive Adaptation Service for Web-based Content Management Systems. Proceedings of the First International Workshop on Dynamic and Adaptive Hypertext: Generic Frameworks, Approaches and Techniques (DAH’09), [Torino, Italy], pp. 37-48.
26. Levacher, Killian, Éamonn Hynes, Séamus Lawless, Alexander O’Connor and Vincent Wade (2009) A Framework for Content Preparation to Support Open-Corpus Adaptive Hypermedia. DAH’2009, 1st International Workshop on Dynamic and Adaptive Hypertext: Generic Frameworks, Approaches and Techniques, Eindhoven, Netherlands, pp. 1-11.
27. Leveling, Johannes (2009) A Comparison of Sub-Word Indexing Methods for Information Retrieval. LWA 2009: Lernen Wissen Adaptivität, Workshop “Information Retrieval 2009” der Fachgruppe Information Retrieval, Darmstadt, Germany.
28. Leveling, Johannes, Dong Zhou, Gareth J.F. Jones and Vincent Wade (2009) TCD-DCU at TEL@CLEF 2009: Document Expansion, Query Translation and Language Modeling. Working Notes for the CLEF 2009 Workshop, Corfu, Greece.
29. Lewis, David, Stephen Curran, Gavin Doherty, Kevin Feeney, Nikiforos Karamanis and Saturnino Luz (2009) Supporting Flexibility and Awareness in Localisation Workflows. LRC XIV “Localisation in The Cloud”: The 14th Annual Internationalisation and Localisation Conference, Limerick, Ireland.
30. Lewis, David, Stephen Curran, Kevin Feeney, Zohar Etzioni, John Keeney, Andy Way and Reinhard Schäler (2009) Web Service Integration for Next Generation Localisation. Software Engineering, Testing, and Quality Assurance for Natural Language Processing (SETQA-NLP 2009), Boulder, Colorado, USA, 47-55.
31. Lewis, David, John McAuley and Kevin Feeney (2009) A platform for studying progressive self management in online communities. Proceedings of the WebSci’09: Society On-Line, Athens, Greece, http://journal.webscience.org/233/1/websci09_submission_141.pdf
32. Li, Baoli, Martin Emms, Saturnino Luz and Carl Vogel (2009) Exploring Multilingual Semantic Role Labelling. Proceedings of the Thirteenth Conference on Computational Natural Language Learning (CoNLL): Shared Task, Boulder, Colorado, USA, 73-78.
33. Li, Baoli and Carl Vogel. 2009. Leveraging Sub-class Partition Information for Binary Classification and Its Application. In Proceedings of the Twenty-ninth SGAI International Conference on Artificial Intelligence (AI-2009), England.
34. Ma, Yanjun, Patrik Lambert and Andy Way (2009) Tuning syntactically enhanced word alignment for Statistical Machine Translation. EAMT 09: 13th Annual conference of the European Association for Machine Translation, Barcelona, Spain, 253-261.
35. Ma, Yanjun, Tsuyoshi Okita, Özlem Çetinoğlu, Jinhua Du and Andy Way. 2009. Low-Resource Machine Translation Using MaTrEx: the DCU MT System for IWSLT 2009. In Proceedings of the IWSLT 2009 Workshop (IWSLT 2009), Tokyo, Japan.
36. Magdy, Walid, Johannes Leveling and Gareth J.F. Jones (2009) DCU @ CLEF-IP 2009: Exploring Standard IR Techniques on Patent Retrieval. Working Notes for the CLEF 2009 Workshop, Corfu, Greece.
37. Min, Jinming, Peter Wilkins, Johannes Leveling and Gareth Jones, DCU at WikipediaMM 2009: Document Expansion from Wikipedia Abstracts, In Proceedings of the CLEF 2009: Workshop on Cross-Language Information Retrieval and Evaluation, Corfu, Greece, 2009.
38. Mauclair, Julie, Daniel Aioanei and Julie Berndsen (2009) Exploiting phonetic and phonological similarities as a first step for robust speech recognition. EUSIPCO 2009, 17th European Signal Processing Conference, Glasgow, Scotland.
39. McAuley, John, Kevin Feeney and David Lewis (2009) Ethnomethodology as an influence on community-centred design, Proceedings of AIS SigPrag International Pragmatic Web Conference (ICPW 2009), Graz, Austria.
40. Moorkens, Joss (2009) Total Recall? A Case Study of Consistency in Translation Memory. LRC XIV “Localisation in The Cloud”: The 14th Annual Internationalisation and Localisation Conference, Limerick, Ireland.
41. Morado, Lucia; Anastasiou, D., Exton, C., (2009), Web 2.0: The great opportunity for Galician Language, in: Proceedings of the IX Conference of the International Association of Galician Studies (AIEG), 13th - 17th July, Santiago de Compostela, Vigo and A Coruña, Spain.
42. Morrissey, Sara (2009) An assessment of appropriate sign language representation for machine translation in the healthcare domain. “Sign Language Corpora: Linguistic Issues” Workshop 2009, London, England.
43. Ogbureke, Udochukwu Kalu and Julie Carson-Berndsen (2009) Improving Initial Boundary Estimation for HMM-based Automatic Phonetic Segmentation. INTERSPEECH 2009: 10th Annual Conference of the International Speech Communication Association, Brighton, England.
44. O’Keeffe, Ian R. (2009) An Interactive Music System for Capturing Emotive Data in Music. International Conference on Music and Emotion 2009, Durham, England.
45. O’Keeffe, Ian R. (2009) Active music content for web pages - Intelligent Music Localisation. LRC XIV “Localisation in The Cloud”: The 14th Annual Internationalisation and Localisation Conference, Limerick, Ireland.
46. O’Keeffe, Ian and Vincent Wade (2009) Personalised Web: Adaptability for Web Service Composition and Web Content. In G.-J. Houben, G. McCalla, F. Pianesi and M. Zancanaro (Eds), User Modeling, Adaptation, and Personalisation: 17th International Conference, UMAP 2009, formerly UM and AH, Trento, Italy (LNCS 5535), Springer-Verlag, Berlin, 480-486.
47. Okita, Tsuyoshi (2009) Data Cleaning for Word Alignment. ACL-IJCNLP 2009: Joint Conference of the 47th Annual Meeting of the Association for Computational Linguistics and 4th International Joint Conference on Natural Language Processing of the AFNLP, Proceedings of the Student Research Workshop, Singapore, 72-80.
48. Okita, T., S. Naskar and A. Way. 2009. Noise Reduction Experiments in Machine Translation. In Proceedings of the European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML-PKDD), Project Exhibition, Bled, Slovenia.


49. Harold Somers, Sandipan Dandapat and Sudip Kumar Naskar (2009) A review of EBMT using proportional analogies. Proceedings of the 3rd Workshop on Example-Based Machine Translation, Dublin, pp. 53-60.
50. Steichen, Ben, Séamus Lawless, Alexander O’Connor and Vincent Wade (2009) Dynamic Hypertext Generation for Reusing Open Corpus Content. Hypertext 2009, Proceedings of the Twentieth Conference on Hypertext and Hypermedia, Torino, Italy, pp. 119-128.
51. Srivastava, Ankit and Andy Way (2009) Using Percolated Dependencies for Phrase Extraction in SMT. MT Summit XII: Proceedings of the Twelfth Machine Translation Summit, Ottawa, Ontario, Canada, 316-323.
52. Srivastava, Ankit, Sergio Penkale, Declan Groves and John Tinsley (2009) Evaluating Syntax-Driven Approaches to Phrase Extraction for MT. Proceedings of the 3rd Workshop on Example-Based Machine Translation, Dublin, pp. 19-28.
53. van der Sluis, Ielka, Junko Nagai and Saturnino Luz (2009) Producing Referring Expressions in Dialogue: Insights from a translation exercise. PRE-CogSci 2009, Production of Referring Expressions: Bridging the gap between computational and empirical approaches to reference, Amsterdam, The Netherlands.
54. van der Sluis, Ielka and Chris Mellish (2009) Towards Empirical Evaluation of Affective Tactical NLG. 12th European Workshop on Natural Language Generation, Athens, Greece, pp. 146-153.
55. Veale, Tony, Guofu Li and Yanfen Hao (2009) Growing Finely-Discriminating Taxonomies from Seeds of Varying Quality and Size. EACL 2009: Proceedings of the 12th Conference of the European Chapter of the Association for Computational Linguistics, Athens, Greece, 835-842.
56. Wade, Vincent (2009) Challenges for the Multi-dimensional personalised Web. In G.-J. Houben, G. McCalla, F. Pianesi and M. Zancanaro (Eds), User Modeling, Adaptation, and Personalisation: 17th International Conference, UMAP 2009, formerly UM and AH, Trento, Italy (LNCS 5535), Springer-Verlag, Berlin, 3.
57. Wade, Vincent (2009) Supporting a Locale of One. LRC XIV “Localisation in The Cloud”: The 14th Annual Internationalisation and Localisation Conference, Limerick, Ireland.
58. Zhou, Dong and Vincent Wade (2009) Latent Document Re-Ranking. EMNLP 2009, Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing, Singapore, 1571-1580.
59. Zhou, Dong and Vincent Wade (2009) Language Modeling and Document Re-Ranking: Trinity Experiments at TEL@CLEF-2009. Working Notes for the CLEF 2009 Workshop, Corfu, Greece [NO PAGES]
60. Fu Bo, Brennan Rob, O’Sullivan Declan, Cross-lingual Ontology Mapping - An Investigation of the Impact of Machine Translation. In Proceedings of the 4th Annual Asian Semantic Web Conference, Shanghai, China, December, 2009, to appear.
61. Javed, Muhammad, Yalemisew Abgaz and Claus Pahl (2009) A Pattern-based Framework of Change Operators for Ontology Evolution. To appear in OnToContent 2009, The 4rth [sic] International Workshop on Ontology Content, Algarve, Portugal.
62. van der Sluis, Ielka and Gavin Doherty (2010) Ontology Based Queries – Investigating a Natural Language Interface. 2010 International Conference on Intelligent User Interfaces, Hong Kong, China, February.
63. Zhou, Dong and Vincent Wade (2008) Smoothing Methods and Cross-language Document Re-ranking, Working Notes for the CLEF 2009 Workshop, Corfu, Greece.

Book Chapters

1. Ogbureke, U. and J. Carson-Berndsen: Automatic Speech Segmentation: A balancing act between human judgments, statistics and linguistics. Chapter submitted to: Gemma Bel-Enguix & M. Dolores Jiménez-López (eds): Language as a Complex System: Interdisciplinary Challenge. Cambridge Scholars Publishing.
2. Way, Andy (2010) Chapter on Machine Translation to appear in Alexander Clark, Chris Fox and Shalom Lappin (Eds) Handbook of Computational Linguistics and Natural Language Processing, Wiley-Blackwell, expected June 2010.

Journals

1. Anastasiou, Dimitra; Lenker, M., Schäler, R., (2009), “Lokalisierung. Lokalisierungskonzept, Internationalisierung und Übersetzung, Software-Lokalisierung”, in: Zeitschrift der Gesellschaft für Sprache und Sprachen, Ausgabe 39, 45-52.
2. Anastasiou, Dimitra; Ryan, L., (2009), “Digital Content for Global Audiences”, in: tc world online magazine.
3. Anastasiou, Dimitra and Reinhard Schäler (2009) Introducing the project “Centre for Next Generation Localisation”. tcworld, July 2009. http://www.tcworld.info/index.php?id=62
4. He, Yifan and Andy Way. The Reference and Metric Factors in Minimum Error Rate Training. Machine Translation (in press).
5. O’Connor, Alex and Séamus Lawless. Applying Digital Content Management Techniques to Localisation. Localisation Focus Journal.
6. O’Keeffe, Ian. Music Localisation: Active Music Content for Web Pages. Localisation Focus Journal.
7. Peirce, Neil, Owen Conlan and Vincent Wade (2008) Adaptive Educational Games: Providing Non-invasive Personalised Learning Experiences. The 2nd IEEE International Workshop on Digital Game and Intelligent Toy Enhanced Learning, Banff, Canada.
8. Way, Andy and Mary Hearne (2010) On the Role of Translations in State-of-the-Art Statistical Machine Translation. To appear in Language and Linguistics Compass, vol. 4.
9. Way, Andy (2009) A Critique of Statistical Machine Translation. Journal of translation and interpreting studies: Special Issue on Evaluation of Translation Technology, vol. 5.


All Conference Presentations

1. Anastasiou, D. (2009), “Localisation, Centre for Next Generation Localisation, Localisation and Standards”, in: Proceedings of the Conference on Practical Applications in Language and Computers (PALC), 6th - 8th April, Lodz, Poland.
2. Anastasiou, Dimitra, (2009), “Crowdsourcing and Machine Translation”, in: The Annual Conference Proceedings of the Localisation Research Centre LRC XIV, Localisation in the Cloud, 24th - 25th September, Limerick, Ireland.
3. Apidianaki, Marianna, Yifan He and Andy Way. 2009. Capturing lexical variation in MT evaluation using automatically built sense-cluster inventories. In Proceedings of PACLIC 23: the 23rd Pacific Asia Conference on Language, Information and Computation, Hong Kong.
4. Breitfuss, Werner, Ielka van der Sluis, Saturnino Luz, Helmut Prendinger and Mitsuru Ishizuka (2009) Evaluating an Algorithm for the Generation of Multimodal Referring Expressions in a Virtual World: a Pilot Study. IVA 09: 9th International Conference on Intelligent Virtual Agents, Amsterdam, The Netherlands.
5. Bryl, Anton, Josef van Genabith and Yvette Graham (2009) Guessing the Grammatical Function of a Non-Root F-Structure in LFG. IWPT-09, Proceedings of the 11th International Conference on Parsing Technologies, Paris, France, pp. 146-149.
6. Cahill, Peter, Jinhua Du, Andy Way and Julie Carson-Berndsen (2009) Using Same-Language Machine Translation to Create Alternative Target Sequences for Text-To-Speech Synthesis. INTERSPEECH 2009: 10th Annual Conference of the International Speech Communication Association, Brighton, England.
7. Curran, Stephen, Kevin Feeney, David Lewis and Reinhard Schäler (2009) The Management of Crowdsourcing in Business Processes. BDIM 2009: 4th IFIP/IEEE International Workshop on Business-driven IT Management, New York, USA.
8. Doherty, Stephen and Sharon O’Brien (2009) Can MT output be evaluated through eye tracking? MT Summit XII: Proceedings of the Twelfth Machine Translation Summit, Ottawa, Ontario, Canada, 214-221.
9. Du, Jinhua, Yifan He, Sergio Penkale and Andy Way (2009) MaTrEx: the DCU MT System for WMT 2009. EACL 2009 Fourth Workshop on Statistical Machine Translation, Athens, Greece, 95-99.
10. Du, Jinhua, Yanjun Ma and Andy Way (2009) Source-side context-informed hypothesis alignment for combining outputs from machine translation systems. MT Summit XII: Proceedings of the Twelfth Machine Translation Summit, Ottawa, Ontario, Canada, 230-237.
11. Du, Jinhua and Andy Way. 2009. A Three-pass System Combination Framework by Combining Multiple Hypothesis Alignment Methods. In Proceedings of IALP 2009: the International Conference on Asian Language Processing 2009, Singapore.
12. Fu, Bo, Rob Brennan and Declan O’Sullivan (2009) Multilingual Ontology Mapping: Challenges and a Proposed Framework. Artificial Intelligence and Simulation of Behaviour 2009 Convention, Workshop on Matching and Meaning 2009, Edinburgh, Scotland, http://dream.inf.ed.ac.uk/events/wmm-2009/download/FuB.pdf
13. Galron, Daniel, Sergio Penkale, Andy Way and I. Dan Melamed (2009) Accuracy-Based Scoring for DOT: Towards Direct Error Minimisation for Data-Oriented Translation. EMNLP 2009, Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing, Singapore, 371-380.
14. Ghorab, M. Rami, Dong Zhou, Alex O’Connor and Vincent Wade (2009) A Framework for Cross-language Search Personalisation. 4th International Workshop on Semantic Media Adaptation and Personalisation, San Sebastián, Spain.
15. Ghorab, M. Rami, Johannes Leveling, Dong Zhou, Gareth J.F. Jones and Vincent Wade (2009) TCD-DCU at LogCLEF 2009: An Analysis of Queries, Actions, and Interface Languages. Working Notes for the CLEF 2009 Workshop, Corfu, Greece.
16. Haque, Rejwanul, Sandipan Dandapat, Ankit Kumar Srivastava, Sudip Kumar Naskar and Andy Way (2009) English-Hindi Transliteration Using Context-Informed PB-SMT: the DCU System for NEWS 2009. NEWS 2009, 2009 Named Entities Workshop: Shared Task on Transliteration, Singapore, 104-107.
17. Haque, Rejwanul, Sudip Kumar Naskar, Yanjun Ma and Andy Way (2009) Using supertags as source language context in SMT. EAMT-2009: Proceedings of the 13th Annual Conference of the European Association for Machine Translation, Barcelona, Spain, 237-244.
18. Haque, Rejwanul, Sudip Kumar Naskar, Antal van den Bosch and Andy Way. 2009. Dependency Relations as Source Context in Phrase-Based SMT. In Proceedings of PACLIC 23: the 23rd Pacific Asia Conference on Language, Information and Computation, Hong Kong.
19. Haque, Rejwanul, Sudip Kumar Naskar, Josef van Genabith and Andy Way. 2009. Experiments on Domain Adaptation for English-Hindi SMT. In Proceedings of PACLIC 23: the 23rd Pacific Asia Conference on Language, Information and Computation, Hong Kong.
20. He, Yifan and Andy Way (2009) Learning labelled dependencies for Machine Translation evaluation. EAMT-2009: Proceedings of the 13th Annual Conference of the European Association for Machine Translation, Barcelona, Spain, 44-51.
21. He, Yifan and Andy Way (2009) Improving the objective function in minimum error rate training. MT Summit XII: Proceedings of the Twelfth Machine Translation Summit, Ottawa, Ontario, Canada, 238-245.
22. Kane, John and Christer Gobl (2009) Automatic parameterisation of the glottal waveform combining time and frequency domain measures. 6th International Workshop on Models and Analysis of Vocal Emissions for Biomedical Applications, MAVEBA 2009, Firenze, Italy, pp. 91-94.
23. Karamanis, Nikiforos and Saturnino Luz (2009) Interaction strategies by a non-English speaker in Dublin and their relation to Machine Translation. Irish HCI Conference, Dublin, Ireland.
24. Karamanis, Nikiforos, Anne Schneider, Ielka van der Sluis, Stephan Schlögl, Gavin Doherty and Saturnino Luz (2009) Investigating the overlap between Human-Computer Interaction and Natural Language Processing. CHI 2009: 27th Annual CHI Conference on Human Factors in Computing Systems, Boston, USA, WIP128.
25. Kevin Koidl, Owen Conlan and Vincent Wade (2009) Non-Invasive Adaptation Service for Web-based Content Management Systems. Proceedings of the First International Workshop on Dynamic and Adaptive Hypertext: Generic Frameworks, Approaches and Techniques (DAH’09), [Torino, Italy], pp. 37-48.


26. Levacher, Killian, Éamonn Hynes, Séamus Lawless, Alexander O’Connor and Vincent Wade (2009) A Framework for Content Preparation to Support Open-Corpus Adaptive Hypermedia. DAH’2009, 1st International Workshop on Dynamic and Adaptive Hypertext: Generic Frameworks, Approaches and Techniques, Eindhoven, Netherlands, pp. 1-11.
27. Leveling, Johannes (2009) A Comparison of Sub-Word Indexing Methods for Information Retrieval. LWA 2009: Lernen Wissen Adaptivität, Workshop “Information Retrieval 2009” der Fachgruppe Information Retrieval, Darmstadt, Germany.
28. Leveling, Johannes, Dong Zhou, Gareth J.F. Jones and Vincent Wade (2009) TCD-DCU at TEL@CLEF 2009: Document Expansion, Query Translation and Language Modeling. Working Notes for the CLEF 2009 Workshop, Corfu, Greece [NO PAGES]
29. Lewis, David, Stephen Curran, Gavin Doherty, Kevin Feeney, Nikiforos Karamanis and Saturnino Luz (2009) Supporting Flexibility and Awareness in Localisation Workflows. LRC XIV “Localisation in The Cloud”: The 14th Annual Internationalisation and Localisation Conference, Limerick, Ireland.
30. Lewis, David, Stephen Curran, Kevin Feeney, Zohar Etzioni, John Keeney, Andy Way and Reinhard Schäler (2009) Web Service Integration for Next Generation Localisation. Software Engineering, Testing, and Quality Assurance for Natural Language Processing (SETQA-NLP 2009), Boulder, Colorado, USA, 47-55.
31. Lewis, David, John McAuley and Kevin Feeney (2009) A platform for studying progressive self management in online communities. Proceedings of the WebSci’09: Society On-Line, Athens, Greece, http://journal.webscience.org/233/1/websci09_submission_141.pdf
32. Li, Baoli, Martin Emms, Saturnino Luz and Carl Vogel (2009) Exploring Multilingual Semantic Role Labelling, Proceedings of the Thirteenth Conference on Computational Natural Language Learning (CoNLL): Shared Task, Boulder, Colorado, USA, 73-78.
33. Li, Baoli and Carl Vogel. 2009. Leveraging Sub-class Partition Information for Binary Classification and Its Application. In Proceedings of the Twenty-ninth SGAI International Conference on Artificial Intelligence (AI-2009), England.
34. Ma, Yanjun, Patrik Lambert and Andy Way (2009) Tuning syntactically enhanced word alignment for Statistical Machine Translation. EAMT 09: 13th Annual conference of the European Association for Machine Translation, Barcelona, Spain, 253-261.
35. Ma, Yanjun, Tsuyoshi Okita, Özlem Çetinoğlu, Jinhua Du and Andy Way. 2009. Low-Resource Machine Translation Using MaTrEx: the DCU MT System for IWSLT 2009. In Proceedings of the IWSLT 2009 Workshop (IWSLT 2009), Tokyo, Japan.
36. Magdy, Walid, Johannes Leveling and Gareth J.F. Jones (2009) DCU @ CLEF-IP 2009: Exploring Standard IR Techniques on Patent Retrieval. Working Notes for the CLEF 2009 Workshop, Corfu, Greece [NO PAGES]
37. Min, Jinming, Peter Wilkins, Johannes Leveling and Gareth Jones, DCU at WikipediaMM 2009: Document Expansion from Wikipedia Abstracts, In Proceedings of the CLEF 2009: Workshop on Cross-Language Information Retrieval and Evaluation, Corfu, Greece, 2009.
38. Mauclair, Julie, Daniel Aioanei and Julie Berndsen (2009) Exploiting phonetic and phonological similarities as a first step for robust speech recognition. EUSIPCO 2009, 17th European Signal Processing Conference, Glasgow, Scotland.
39. McAuley, John, Kevin Feeney and David Lewis (2009) Ethnomethodology as an influence on community-centred design, Proceedings of AIS SigPrag International Pragmatic Web Conference (ICPW 2009), Graz, Austria.
40. Moorkens, Joss (2009) Total Recall? A Case Study of Consistency in Translation Memory. LRC XIV “Localisation in The Cloud”: The 14th Annual Internationalisation and Localisation Conference, Limerick, Ireland.
41. Morado, Lucia; Anastasiou, D., Exton, C., (2009), Web 2.0: The great opportunity for Galician Language, in: Proceedings of the IX Conference of the International Association of Galician Studies (AIEG), 13th - 17th July, Santiago de Compostela, Vigo and A Coruña, Spain.
42. Morrissey, Sara (2009) An assessment of appropriate sign language representation for machine translation in the healthcare domain. “Sign Language Corpora: Linguistic Issues” Workshop 2009, London, England.
43. Ogbureke, Udochukwu Kalu and Julie Carson-Berndsen (2009) Improving Initial Boundary Estimation for HMM-based Automatic Phonetic Segmentation. INTERSPEECH 2009: 10th Annual Conference of the International Speech Communication Association, Brighton, England.
44. O’Keeffe, Ian R. (2009) An Interactive Music System for Capturing Emotive Data in Music. International Conference on Music and Emotion 2009, Durham, England.
45. O’Keeffe, Ian R. (2009) Active music content for web pages - Intelligent Music Localisation. LRC XIV “Localisation in The Cloud”: The 14th Annual Internationalisation and Localisation Conference, Limerick, Ireland.
46. O’Keeffe, Ian and Vincent Wade (2009) Personalised Web: Adaptability for Web Service Composition and Web Content. In G.-J. Houben, G. McCalla, F. Pianesi and M. Zancanaro (Eds), User Modeling, Adaptation, and Personalisation: 17th International Conference, UMAP 2009, formerly UM and AH, Trento, Italy (LNCS 5535), Springer-Verlag, Berlin, 480-486.
47. Okita, Tsuyoshi (2009) Data Cleaning for Word Alignment. ACL-IJCNLP 2009: Joint Conference of the 47th Annual Meeting of the Association for Computational Linguistics and 4th International Joint Conference on Natural Language Processing of the AFNLP, Proceedings of the Student Research Workshop, Singapore, 72-80.
48. Okita, T., S. Naskar and A. Way. 2009. Noise Reduction Experiments in Machine Translation. In Proceedings of the European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML-PKDD), Project Exhibition, Bled, Slovenia.
49. Harold Somers, Sandipan Dandapat and Sudip Kumar Naskar (2009) A review of EBMT using proportional analogies. Proceedings of the 3rd Workshop on Example-Based Machine Translation, Dublin, pp. 53-60.
50. Steichen, Ben, Séamus Lawless, Alexander O’Connor and Vincent Wade (2009) Dynamic Hypertext Generation for Reusing Open Corpus Content. Hypertext 2009, Proceedings of the Twentieth Conference on Hypertext and Hypermedia, Torino, Italy, pp. 119-128.
51. Srivastava, Ankit and Andy Way (2009) Using Percolated Dependencies for Phrase Extraction in SMT. MT Summit XII: Proceedings of the Twelfth Machine Translation Summit, Ottawa, Ontario, Canada, 316-323.


52. Srivastava, Ankit, Sergio Penkale, Declan Groves and John Tinsley (2009) Evaluating Syntax-Driven Approaches to Phrase Extraction for MT. Proceedings of the 3rd Workshop on Example-Based Machine Translation, Dublin, pp. 19-28.
53. van der Sluis, Ielka, Junko Nagai and Saturnino Luz (2009) Producing Referring Expressions in Dialogue: Insights from a translation exercise. PRE-CogSci 2009, Production of Referring Expressions: Bridging the gap between computational and empirical approaches to reference, Amsterdam, The Netherlands.
54. van der Sluis, Ielka and Chris Mellish (2009) Towards Empirical Evaluation of Affective Tactical NLG. 12th European Workshop on Natural Language Generation, Athens, Greece, pp. 146-153.
55. Veale, Tony, Guofu Li and Yanfen Hao (2009) Growing Finely-Discriminating Taxonomies from Seeds of Varying Quality and Size. EACL 2009: Proceedings of the 12th Conference of the European Chapter of the Association for Computational Linguistics, Athens, Greece, 835-842.
56. Wade, Vincent (2009) Challenges for the Multi-dimensional personalised Web. In G.-J. Houben, G. McCalla, F. Pianesi and M. Zancanaro (Eds), User Modeling, Adaptation, and Personalisation: 17th International Conference, UMAP 2009, formerly UM and AH, Trento, Italy (LNCS 5535), Springer-Verlag, Berlin, 3.
57. Wade, Vincent (2009) Supporting a Locale of One. LRC XIV “Localisation in The Cloud”: The 14th Annual Internationalisation and Localisation Conference, Limerick, Ireland.
58. Zhou, Dong and Vincent Wade (2009) Latent Document Re-Ranking. EMNLP 2009, Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing, Singapore, 1571-1580.
59. Zhou, Dong and Vincent Wade (2009) Language Modeling and Document Re-Ranking: Trinity Experiments at TEL@CLEF-2009. Working Notes for the CLEF 2009 Workshop, Corfu, Greece [NO PAGES]
60. Fu Bo, Brennan Rob, O’Sullivan Declan, Cross-lingual Ontology Mapping - An Investigation of the Impact of Machine Translation. In Proceedings of the 4th Annual Asian Semantic Web Conference, Shanghai, China, December, 2009, to appear.
61. Javed, Muhammad, Yalemisew Abgaz and Claus Pahl (2009) A Pattern-based Framework of Change Operators for Ontology Evolution. To appear in OnToContent 2009, The 4rth [sic] International Workshop on Ontology Content, Algarve, Portugal.
62. van der Sluis, Ielka and Gavin Doherty (2010) Ontology Based Queries – Investigating a Natural Language Interface. 2010 International Conference on Intelligent User Interfaces, Hong Kong, China, February.
63. Zhou, Dong and Vincent Wade (2008) Smoothing Methods and Cross-language Document Re-ranking, Working Notes for the CLEF 2009 Workshop, Corfu, Greece.

Workshops and Conferences Hosted

| Date | Event | Location |
|---|---|---|
| 19.01.09 – 21.01.09 | Graph-based mining of digital content | DCU |
| 06.03.09 | ThinkTank - Localisation in 2014 | Maynooth |
| 14.04.09 – 16.04.09 | Free/Open-Source Machine Translation (FOSMT) | DCU |
| 20.04.09 – 24.04.09 | ISO/IEC JTC1/SC2/WG2 Meeting | DCU |
| 23.04.09 – 24.04.09 | Computational Linguistics UK and Ireland | DCU |
| 25.04.09 | 1st Young Researchers Workshop in Speech Technology | UCD |
| 02.06.09 – 05.06.09 | LRC Internationalisation and Localisation Summer School | UL |
| 12.07.09 | Open Standards used in localisation | UL |
| 21.09.09 – 23.09.09 | Action week for Global Information Sharing (AGIS) 2009 | UL |
| 24.09.09 – 25.09.09 | 14th Internationalisation and Localisation Conference | UL |
| 16.10.09 | Localisation Innovation Showcase | DCU |
| 12.11.09 – 13.11.09 | 3rd Workshop on Example-Based Machine Translation | DCU |


Invention Disclosures Submitted

| Invention Title | Inventor | Disclosure Date |
|---|---|---|
| Automatic Text Transformation Into a Set of Questions & Answers | Walid Magdy | 17.04.09 |
| Latent Document Reranking using LDA | Dong Zhou | 01.05.09 |
| Using Same Language Machine Translation to Create Alternative Target Sequences for Text-To-Speech Synthesis | Peter Cahill, Jinhua Du, Andy Way, Julie Berndsen | 20.05.09 |
| A Method for Systems Evaluation for Recall-Oriented Information Retrieval Applications | Walid Magdy | 12.08.09 |

Patent Applications Submitted or Granted, and License Agreements Signed

| Title | Priority Date | Inventor | Patent Office | Patent Number |
|---|---|---|---|---|
| Latent Document Reranking using LDA | 08.05.09 | Dong Zhou | Irish | S2009/0361 |
| Using Same Language Machine Translation to Create Alternative Target Sequences for Text-To-Speech Synthesis | 09.09.09 | Peter Cahill, Jinhua Du, Andy Way, Julie Berndsen | Irish, EU, US | Irish-2009/0679; EPO-09394025.2; US-61/272,299 |

Spin-out Companies Created

| Field | Details |
|---|---|
| Company | The Rosetta Foundation Limited |
| Incorporation Date | 26.08.09 |
| Registration Number | 474505 |
| Description | Access to information is a fundamental right. The Rosetta Foundation supports the not-for-profit activities of the localisation and translation communities through the development and deployment of an intelligent translation and localisation platform. |
| Website | http://www.therosettafoundation.org/ |

All Awards and Honours Received

| Field | Details |
|---|---|
| Name | Prof. Mikel Forcada |
| Award Body | Science Foundation Ireland |
| Award Type | E.T.S. Walton Award |
| Dates | 19.06.09 – 18.06.09 |


Media Coverage

| Date | Event | Coverage | Link |
|---|---|---|---|
| February 09 | Article on All Ireland Linguistics Olympiad | Available on giftedkids.ie | http://www.giftedkids.ie/teenprograms.html |
| 26 February 09 | Article on CNGL and CLARITY | Nature, Vol 457, 26 February 2009 | http://www.idaireland.com/news-media/publications/library-publications/externalpublications/nj7233-1170a_(2).pdf |
| April 09 | DCU article on All Ireland Linguistics Olympiad Final | DCU press release | http://www.dcu.ie/news/2009/apr/s0409m.shtml |
| April 09 | DCU president opens All Ireland Linguistics Olympiad Final | School of Computing, DCU | http://www.computing.dcu.ie/news/news08092.html#ailo |
| June 09 | Blog on MT - Google’s play in the translation space | Steve Flinter’s blog | http://flinter.com/2009/06/11/googles-play-inthe-translation-space/ |
| July 09 | Traslán article on 3rd Workshop on Example-Based Machine Translation | Traslán website | http://www.traslan.ie/news.html |
| July 09 | DCU article on CNGL-CTYI courses | DCU web site | https://www.dcu.ie/news/2009/jul/s0709d.shtml |
| 23 September 09 | Irish Times article on All Ireland Linguistics Olympiad | Irish Times | http://www.irishtimes.com/newspaper/features/2009/0923/1224255054023.html |
| 25 September 09 | Article on Joint CSET thesis award | ENN - Ireland IT News Source | http://www.enn.ie/story/show/10125429 |
| 26 September 09 | Limerick Post article on Rosetta Foundation | Limerick Post | http://www.cngl.ie/Press/Limerick_Post_260909.jpg |
| October 09 | Article on CTTS and Industry collaboration | Translation and Textual Studies (CTTS) | http://www.ctts.dcu.ie/blog/?p=129 |
| October 09 | Article on CNGL Adaptive Hypermedia Game | Available on germanteachers.ie | http://www.germanteachers.ie/ |
| 15 October 09 | Article on Innovation Dublin (CNGL Innovation Day) | Silicon Republic | http://www.siliconrepublic.com/news/article/14133/randd/innovation-dublin-bringsfestival-mood-to-irish-randd |
| October 09 | Article on CNGL “Localisation Innovation Day” | Innovation Dublin website | http://www.innovationdublin.ie/index.php/fri/cngl_showcase/ |
| 13 November 09 | “100m cut could close major research centres” | Irish Times | http://www.irishtimes.com/newspaper/finance/2009/1113/1224258724874.html |
| 15 November 09 | Sunday Business Post Feature on the CSETs | Print paper | http://www.sbpost.ie/newsfeatures/keepinginvestment-irish-45640.html |
| 25 November 09 | Article on the benefits of R&D | SFI website | http://www.sfi.ie/content/content.asp?section_id=226&language_id=1&publication_id=1937 |
| 25 November 09 | Mary Coughlan launched the CSET Commercialisation Forum | ENN - Ireland IT News Source | http://www.electricnews.net/story/show/10125535 |
| 26 November 09 | Irish Times article on All Ireland Linguistics Olympiad | Commerce to be Focus for funding | http://www.irishtimes.com/newspaper/finance/2009/1126/1224259483356.html |
| 06 December 09 | Strategic focus pays dividends for DCU | Sunday Times - Smart Ireland supplement | http://www.cngl.ie/Press/smartireland-2009-12-06.pdf |


Other Funding Obtained

Project Name | Principal Investigator | Award Type | Funding Agency | Value €
Modelling Post-Editing Behaviour to Design Specifications for Computer-Aided Tools and Training Programmes | Dr. Sharon O’Brien | PhD | VistaTEC & IRCSET | 68,006
EuroMatrix+: Bringing Machine Translation for European Languages to the User | Prof. Josef van Genabith, Prof. Andy Way | STREP | EU-FP7 | 273,210
Development of Multimodal Interfaces: Active Listening and Synchrony | Dr. Carl Vogel | Summer School | ESF-COST2102 | 5,000
Localisation File Formats, Tools Compatibility and Format Conversion | Mr. Reinhard Schäler | MSc | Industry-funded | 30,000
AMAS: Adaptable Media and Services for Dynamic Personalisation and Contextualisation | Prof. Vincent Wade | PI | SFI | 702,544
TCD Refurbishments | Prof. Vincent Wade | Refurbishment | TCD | 75,000
CLUKI: Computational Linguistics United Kingdom and Ireland | Ms. Cara Greene | Sponsorship | Industry-funded | 500
E.T.S. Walton Fellowship | Prof. Mikel Forcada | E.T.S. Walton | SFI | 124,200
Dynamic Syntax Meeting | Dr. Carl Vogel | Workshop | TCD | 1,000
Panacea: Platform for Automatic, Normalised Annotation and Cost-Effective Acquisition of Language Resources for Human Language Technologies | Prof. Andy Way | STREP | EU-FP7 | 299,200
PLuTO: Patent Language Translations Online | Prof. Andy Way, Dr. Páraic Sheridan | ICT-PSP | EU-FP7 | 825,271
CoSyne: Multilingual Content Synchronisation with Wikis | Prof. Harold Somers | STREP | EU-FP7 | 303,186
T4ME: Technologies for the Multilingual European Information Society | Prof. Josef van Genabith | NoE | EU-FP7 | 379,740
The Rosetta Foundation | Mr. Reinhard Schäler | Charity | Industry-funded | 35,000
Multilingual Web | Mr. Reinhard Schäler | ICT-PSP | EU-FP7 | 10,000
Evaluating TM Interfaces and Matching | Dr. Sharon O’Brien | Short-term project | DCU | 18,000
EYECON | Dr. Sharon O’Brien | Consultancy | Industry-funded | 10,200
Commercialisation Initiative | Prof. Josef van Genabith | Commercialisation | EI | 124,843
Snap-On Diagnostics Machine Translation | Prof. Andy Way | Consultancy | Industry-funded | 25,000
SDL Trados 2009 Suite Licenses | Dr. Sharon O’Brien | Software | Industry-funded | 10,780
Total Awards | | | | €3,320,680

Design: www.designit.ie


Centre for Next Generation Localisation
Dublin City University, Dublin 9, Ireland.
Tel: +353-1-700 6700
Fax: +353-1-700 6702
Email: info@cngl.ie
www.cngl.ie
