CNGL Annual Report 2012



Director: Prof. Josef van Genabith
School of Computing
Dublin City University
Dublin 9

Deputy Director: Prof. Vincent Wade
School of Computer Science and Statistics
Trinity College Dublin
Dublin 2

Associate Director: Dr. Páraic Sheridan
School of Computing
Dublin City University
Dublin 9

info@cngl.ie
www.cngl.ie

Dublin City University, Trinity College Dublin, University College Dublin, University of Limerick


Preface

THE CENTRE FOR NEXT GENERATION LOCALISATION (CNGL) IS A CENTRE FOR SCIENCE, ENGINEERING AND TECHNOLOGY (CSET) FUNDED BY SCIENCE FOUNDATION IRELAND (SFI) AND INDUSTRY PARTNERS.

Centres for Science, Engineering and Technology (CSETs) help link scientists and engineers in partnerships across academia and industry to address crucial research questions, foster the development of new and existing Irish-based technology companies, attract industry that could make an important contribution to Ireland and its economy, and expand educational and career opportunities in Ireland in science and engineering. CSETs are expected to exhibit outstanding research quality, intellectual breadth, active collaboration, flexibility in responding to new research opportunities, and integration of research and education in the fields that SFI supports.

Science Foundation Ireland (SFI) is a key organisation in the implementation of Ireland’s National Development Plan (NDP 2007–2013) and the Strategy for Science, Technology and Innovation 2006–2013 (SSTI). A sum of €8.2 billion has been allocated for scientific research under the NDP and SSTI, of which SFI is responsible for investing €1.4 billion. SFI will continue to invest in academic researchers and research teams who are most likely to generate new knowledge, leading-edge technologies and competitive enterprises in the fields of science and engineering.

SFI Vision

Ireland will be a global knowledge leader that places scientific and engineering research at the core of its society to power economic development and social progress.

This centre is supported by Science Foundation Ireland (grant 07/CE/I1142) and the National Development Plan 2007–2013.



Table of Contents

Executive Summary
CSET Leadership
Management Team Biosketches
CNGL Overview
Integrated Language Technologies
Digital Content Management
Next Generation Localisation
Systems Framework
Year 5 Demonstrator Programme
Industry Partnerships and Technology Transfer
Management and Governance
Education and Outreach
Appendix 1: People and Partnerships
Appendix 2: Outputs


Executive Summary



“Our work is guided by the vision of enabling people to interact with content, products, services and other people in their own language, according to their own culture, and according to their own personal needs.”

Localisation is the process of adapting digital content to culture, locale and linguistic environment. It is a key enabling multiplier technology for the global manufacturing, software, services, and content creation and distribution industries, unlocking markets that would otherwise be unavailable. Localisation also has a social dimension: many communities find themselves on the wrong side of the “digital divide”, with vital information (on health, hygiene, food, education, etc.) not available in their local languages, with potentially disastrous consequences. Localisation technologies and processes can make a significant contribution to bridging this divide.

The CNGL partnership has focused on both the commercial and the societal dimensions of localisation, concentrating on the challenges of volume, access and personalisation. Volume: the amount of content to be localised massively outstrips human translation capacity. Access: mobile devices enable ubiquitous, on-the-go access to perishable and frequently updated information, involving interaction modalities such as speech and image, and both corporate and user-generated content. Personalisation: information is most useful if adapted to the user, device, background information, knowledge and task at hand. In terms of a slogan: “the person is the ultimate locale”.

Over the last five years (2007–2012), CNGL has made great strides in connecting the localisation industry with cutting-edge research in language technologies, content management, workflow, community and human factors, and software engineering. Today the question is no longer whether or not to use machine translation, but how best to use it. Today the question is no longer whether or not to use user-generated content in customer support, but how best to use it. Today the question is no longer whether or not to use collaborative community-based localisation models, but how best to use them. These step-changes are based on scientific progress. Over its first funding period, CNGL has produced more than 400 peer-reviewed research papers, 21 PhD students, 39 innovation and software disclosures and 9 patent applications, and has secured €15.8M in additional research income, growing the CNGL research ecosystem.

Key to the success of CNGL is close collaboration with the CNGL industry partners, which focuses and sharpens the research. Without this, the step-change in localisation would not have been possible. Taking research out of the lab is a core objective of CNGL: to date, four CNGL start-up and spin-out companies (Xcelerator Machine Translations, Digital Linguistics, Scream Technologies and Emizar) and the not-for-profit social localisation organisation The Rosetta Foundation are strong testimony to this. Additionally, spin-out candidate Wripl is preparing for launch in 2013.

CNGL is preparing for the future: in 2012, the successful CNGL II application, coordinated and led by CNGL Deputy Director Prof. Vincent Wade, secured core SFI funding of €10.5M for the next 30 months. CNGL II focuses on Global Intelligent Content, based on the concept of the Global Content Value Chain, in which services interact with content to make it self-describing, self-aware and self-adapting across language barriers, modalities and interaction platforms, tuned to context and user. Prof. Wade will take over as CNGL Director in March 2013. Prof. Wade is an experienced and accomplished international research leader. Please give him all your support.

To conclude, I would like to say to all our research students, postdoctoral researchers, principal investigators, technical, operations, and education and outreach team staff, to all our industry partners, to all our start-up companies, and to the researchers and staff in our extended CNGL research ecosystem: thank you! You make this happen!

Prof. Josef van Genabith
Director, CNGL


CSET Leadership



CSET Contact Information

CNGL
School of Computing
Dublin City University
Dublin 9
Phone: +353 1 700 6700
Fax: +353 1 700 6702
Email: info@cngl.ie

Management Team

Director, Co-Leader: Integrated Language Technologies Track
Prof. Josef van Genabith
School of Computing
Dublin City University
Dublin 9
Phone: +353 1 700 6700
Fax: +353 1 700 6702
Email: josef@computing.dcu.ie

Deputy Director, Track Leader: Digital Content Management
Prof. Vincent Wade
School of Computer Science and Statistics
Trinity College Dublin
Dublin 2
Phone: +353 1 896 1765
Fax: +353 1 677 2204
Email: vincent.wade@cs.tcd.ie

Associate Director
Dr. Páraic Sheridan
School of Computing
Dublin City University
Dublin 9
Phone: +353 1 700 6706
Fax: +353 1 700 6702
Email: psheridan@computing.dcu.ie

Track Leaders

Co-Track Leader: Integrated Language Technologies
Prof. Nick Campbell
Centre for Language and Communication Studies
Trinity College Dublin
Dublin 2
Phone: +353 1 896 1626
Fax: +353 1 896 2941
Email: nick.campbell@tcd.ie

Track Leader: Systems Framework
Dr. Saturnino Luz
School of Computer Science and Statistics
Trinity College Dublin
Dublin 2
Phone: +353 1 896 3686
Fax: +353 1 677 2204
Email: luzs@cs.tcd.ie

Track Leader: Next Generation Localisation
Mr. Reinhard Schäler
Department of Computer Science and Information Systems
University of Limerick
Limerick
Phone: +353 61 202 881
Fax: +353 61 202 734
Email: reinhard.schaler@ul.ie

Operations Team

Commercial Development Manager
Mr. Steve Gotz
School of Computing
Dublin City University
Dublin 9
Phone: +353 1 700 6710
Fax: +353 1 700 6702
Email: sgotz@computing.dcu.ie



LRC Administrator
Ms. Geraldine Harrahill
Department of Computer Science and Information Systems
University of Limerick
Limerick
Phone: +353 61 202 881
Fax: +353 61 202 734
Email: geraldine.harrahill@ul.ie

Financial Administrator
Ms. Fiona Maguire
School of Computing
Dublin City University
Dublin 9
Phone: +353 1 700 6708
Fax: +353 1 700 6702
Email: fmaguire@computing.dcu.ie

Centre Administrator
Ms. Sophie Matabaro
School of Computing
Dublin City University
Dublin 9
Phone: +353 1 700 6707
Fax: +353 1 700 6702
Email: smatabaro@computing.dcu.ie

Centre Secretary
Ms. Eithne McCann
School of Computing
Dublin City University
Dublin 9
Phone: +353 1 700 6700
Fax: +353 1 700 6702
Email: emccann@computing.dcu.ie

Project Manager
Ms. Hilary McDonald
School of Computer Science and Statistics
O’Reilly Institute
Trinity College Dublin
Dublin 2
Phone: +353 1 896 4244
Fax: +353 1 677 2204
Email: mcdonah@scss.tcd.ie

Intellectual Property Manager
Mr. Stephen Roantree
School of Computing
Dublin City University
Dublin 9
Phone: +353 1 700 6720
Fax: +353 1 700 6702
Email: sroantree@computing.dcu.ie

Systems Administrator
Mr. Joachim Wagner
School of Computing
Dublin City University
Dublin 9
Phone: +353 1 700 6915
Fax: +353 1 700 6702
Email: jwagner@computing.dcu.ie

Education and Outreach Team

Education and Outreach Manager
Ms. Cara Greene
School of Computing
Dublin City University
Dublin 9
Phone: +353 1 700 6704
Fax: +353 1 700 6702
Email: cgreene@computing.dcu.ie

Marketing and Communications Officer
Ms. Laura Grehan
School of Computing
Dublin City University
Dublin 9
Phone: +353 1 700 6705
Fax: +353 1 700 6702
Email: lgrehan@computing.dcu.ie

LRC Manager
Mr. Karl Kelly
Department of Computer Science and Information Systems
University of Limerick
Limerick
Phone: +353 61 202 748
Fax: +353 61 202 734
Email: karl.kelly@ul.ie



Management Team Biosketches

Centre Director and Co-Leader, Integrated Language Technologies Track:
Prof. Josef van Genabith
Department: School of Computing
University: Dublin City University

Brief Biography

Prof. Josef van Genabith is the founder and Director of the Centre for Next Generation Localisation (CNGL) and an Associate Professor in the DCU School of Computing. He graduated in Electronic Engineering and English at RWTH Aachen (Germany) in 1988 and received his PhD in Linguistics from the University of Essex (U.K.) in 1993. He worked as a researcher at the University of Essex (1991–1992) and at the Institut für Maschinelle Sprachverarbeitung (IMS), Universität Stuttgart (Germany) (1992–1996). He joined the School of Computing at DCU as Lecturer in 1996, became Senior Lecturer in 1999 and Associate Professor in 2002. He was Chair of the Programme Board for the B.Sc. in Applied Computational Linguistics (DCU) from 1997 to 2001. In 2001 he became Director of the National Centre for Language Technology (NCLT) and grew the NCLT to its current 40+ members, with research grant income of over €5M since 2001 (excluding CNGL). He has led Science Foundation Ireland (SFI), Enterprise Ireland (EI) and European Union (EU) funded research projects and was awarded an SFI Principal Investigator award in 2004. He became a Visiting Researcher at IBM’s Dublin Center for Advanced Studies (CAS) in 2003 and a Faculty Fellow in 2004. He has graduated 18 PhD and 6 M.Sc. by Research students. He is (joint) author of more than 150 peer-reviewed international research publications, including publications in the journals Computational Linguistics, Machine Translation, Artificial Intelligence, Research on Language and Computation and Natural Language Engineering, and at the ACL, EACL, COLING, EMNLP and IJCNLP conferences.

Research Interests

Prof. van Genabith works on localisation, machine translation, multilingual treebank-based deep grammar acquisition, and statistical parsing and generation.

Career Highlights

• 2012: General Chair, COLING 2014, Dublin, Ireland
• 2012: Recipient of the DCU 2012 President’s Research Award for Science and Engineering
• 2010–present: META-NET (Multilingual Europe Technology Alliance, EU Network of Excellence) Executive Board and Technology Council member
• 2007–present: Advisory Board, European Chapter of the Association for Computational Linguistics (EACL)
• 2007–present: Director and Lead PI, SFI CNGL CSET Award (€16.8M)
• 2005–present: Faculty Fellow, IBM Center for Advanced Studies (CAS), Dublin
• 2004–2005: Visiting Scientist, IBM Center for Advanced Studies (CAS), Dublin
• 2004–2009: SFI Principal Investigator, Science Foundation Ireland, GramLab (€839K)
• 2001–2008: Director, National Centre for Language Technology (NCLT), DCU
• 1997–2001: Chair of Programme Board, B.Sc. in Applied Computational Linguistics (ACL), DCU School of Computing



Deputy Centre Director, Track Leader: Digital Content Management:
Prof. Vincent P. Wade
Department: Discipline of Intelligent Systems, School of Computer Science and Statistics
University: Trinity College Dublin

Brief Biography

Prof. Vincent Wade is Deputy Director of the Centre for Next Generation Localisation (CNGL) and Head of the Discipline of Intelligent Systems at the School of Computer Science and Statistics, Trinity College Dublin. The Discipline of Intelligent Systems comprises four research groups: the Knowledge and Data Engineering Group, the Computational Linguistics Group, the Graphics, Vision and Visualisation Group, and the Artificial Intelligence Group. The Discipline has 21 academics and more than 150 full-time postgraduate (PhD) students and research fellows.

Prof. Wade graduated from UCD with a B.Sc. (Hons) in Computer Science (1987) and received his M.Sc. and PhD degrees in Computer Science from TCD. He holds the position of Associate Professor in the School of Computer Science and Statistics, and in 2002 was awarded Fellowship of Trinity College for his contribution to research in the areas of knowledge management and adaptive technologies. In 1999 he founded the Centre for Learning Technology, which has pioneered the innovation and development of eLearning technologies in the University. He was also Visiting Scientist at the IBM Center for Advanced Studies (2005–2008) for his research in adaptive hypermedia and knowledge management. He was Research Director of the Knowledge and Data Engineering Research Group (1995–2007).

Prof. Wade is the author of over 150 scientific papers in peer-reviewed research journals and international conferences and has received eight ‘best paper’ awards for publications in IEEE, IFIP and AACE conferences within the last nine years. He has been guest editor of IEEE Communications as well as a reviewer for many IEEE and ACM journals, including IEEE Communications, IEEE Network, IEEE Intelligent Systems, ACM Transactions on the Web, and IEEE Transactions on Learning Technologies. Prof. Wade is a scientific programme committee member for many prestigious international conferences, including IEEE’s IM and NOMS, and the ACM Hypertext and WWW conference series. He was co-chair of the Adaptive Hypermedia Conference (AH2006), held in Dublin in June 2006, and General Co-chair of IEEE IM 2011, held at TCD in May 2011. He has been responsible for fourteen major EU research projects under the EU ACTS and IST Research Programmes, as well as national research projects funded under the SFI PI Programme, HEA PRTLI and several Science Foundation Ireland/Enterprise Ireland Technology Innovation Development Awards. He has also been responsible for the commercialisation of research and is a co-founder of ‘Empower The User’, an innovative start-up company in the area of personalisation and soft skills training.

Research Interests

Prof. Wade’s research focuses on Knowledge Engineering, in particular adaptive web systems, dynamic personalisation, adaptive management and control systems, and process management. His research has been applied in several technology application areas, including eLearning and management systems for next generation networks and distributed services. Since 1991, he has been TCD’s Principal Investigator for over fifteen EU research projects under the EU RACE, Telematics, ESPRIT, ACTS and IST research programmes. He was also PI for ADAPT (2005–2007) and Pudecas (2005–2007), funded under the Technology Innovation Research Programme (Enterprise Ireland), and PI for the HEA-sponsored MZONES project (2002–2006).



Associate Director:
Dr. Páraic Sheridan
Department: School of Computing
University: Dublin City University

Brief Biography

Dr. Páraic Sheridan is Associate Director at CNGL. He received his B.Sc. degree (1st class honours) in Computer Applications from Dublin City University (DCU) in 1989. He then completed an M.Sc. degree by research in Computer Applications at DCU in 1991, studying the use of Natural Language Processing in Information Retrieval. This was followed in 1994 by an M.S. degree in Computational Linguistics at Carnegie Mellon University (CMU) in Pittsburgh, PA. His study at CMU was funded by Claris Corporation (Dublin), for whom he researched the use of Translation Memories in the software localisation process. He completed his doctoral work in 1998 at the Swiss Federal Institute of Technology (ETH) Zürich with a dissertation on Cross-Language Information Retrieval. While at ETH he also helped develop the SPIDER information retrieval system, which was commercialised and spun out from ETH into the EuroSpider company.

Dr. Sheridan then joined TextWise LLC, a start-up company in Syracuse, NY, which was spun out of Syracuse University based on research by Prof. Elizabeth Liddy in Natural Language Processing and Information Retrieval. Over the course of a 10-year career at TextWise, Dr. Sheridan held a variety of positions in research management, programme management and product management, ultimately becoming Chief Scientist at the company. This reflected his work on the CINDOR cross-language search system, initially a government-funded research project that was then commercialised and marketed by TextWise in the enterprise search space. Dr. Sheridan also led the effort to adapt the CINDOR product to the needs of the U.S. Intelligence Community, developing a cross-language English-Arabic query translation module to integrate with standard enterprise search platforms.



Co-Track Leader: Integrated Language Technologies:
Prof. Nick Campbell
Department: Centre for Language and Communication Studies (CLCS)
University: Trinity College Dublin

Brief Biography

Prof. Nick Campbell is SFI Stokes Professor of Speech & Communication Technology at Trinity College Dublin. He received his Ph.D. degree in Experimental Psychology from the University of Sussex in the U.K. He was previously engaged at the Japanese National Institute of Information and Communications Technology, and as Chief Researcher in the Department of Acoustics and Speech Research, Advanced Telecommunications Research Institute International (ATR), Kyoto, Japan, where he also served as Research Director for the JST/CREST Expressive Speech Processing and the SCOPE “Robot’s Ears” projects. He was first invited as a Research Fellow at the IBM U.K. Scientific Centre, where he developed algorithms for speech synthesis, and later at AT&T Bell Laboratories, where he worked on the synthesis of Japanese. He served as Senior Linguist at the Edinburgh University Centre for Speech Technology Research before joining ATR in 1990. His research interests are based on large speech databases and include nonverbal speech processing, concatenative speech synthesis, and prosodic information modelling. He spends his spare time working with postgraduate students as Visiting Professor at the School of Information Science, Nara Institute of Science and Technology (NAIST), Nara, Japan, and was also Visiting Professor at Kobe University, Kobe, Japan, for 10 years.

Research Interests

Prof. Nick Campbell’s background is in experimental psychology and linguistics, but most of his experience is in speech technology. Prof. Campbell is an advocate of corpus-based approaches, and he has pioneered advanced (and paradigm-shifting) methods of speech synthesis and natural conversational speech collection in a multimodal environment. His principal interest is in speech prosody, extending this research to social interaction to show how the voice is used in discourse to express personal relations as well as propositional content. Most of his previous work has used speech materials collected in Japan; since his move to Ireland, he has been able to confirm the universality of his previous findings for both Irish and Hiberno-English. Ultimately, Prof. Campbell is working to produce a friendlier speech-based human-machine interface for web-based information, customer services, games and robotics, while trying to understand how humans achieve such often perfect communication.

Career Highlights

• 2010–2015: Science Foundation Ireland Principal Investigator, FastNet Summary Focus on Actions in Social Talk; Network Enabling Technology (€1.23M)
• Oct. 2011–present: Member, Spoken Language Technical Committee, IEEE Signal Processing Society
• Feb. 2011: Vice President, European Language Resources Association
• Nov. 2010–present: Board Member, European Language Resources Association (ELRA)
• 2009–present: Board member, International Speech Communication Association
• 2005–present: Board member, Japan British Association of the Kansai
• Member, International Phonetic Association
• Member, Coordinating Committee on Speech I/O Database Assessment
• Member, International Committee of the Acoustical Society of Japan
• Member, International Speech Communication Association; Institute of Acoustics (adherent), U.K.
• Member, Acoustical Society of America
• Member, Acoustical Society of Japan
• Member, IEEE Signal Processing Society



Track Leader: Next Generation Localisation:
Mr. Reinhard Schäler
Department: Department of Computer Science and Information Systems
University: University of Limerick

Brief Biography

Reinhard Schäler has been involved in the localisation industry in a variety of roles since 1987. He is the founder and editor of Localisation Focus: The International Journal of Localisation, a founding editor of the Journal of Specialised Translation (JosTrans), a former member of the editorial board of Multilingual Computing (October 1997 to January 2007, covering 70 issues), a founder and CEO of The Institute of Localisation Professionals (TILP), and a member of OASIS. He has attracted more than €5.5M in research funding and has published more than 50 articles, book chapters and conference papers on language technologies and localisation. He has been an invited speaker at EU and international government-organised conferences in Africa, the Middle East, South America and Asia. In 2009, he founded The Rosetta Foundation, a non-profit organisation and charity aiming to make knowledge available in every language. He is a lecturer at the Department of Computer Science and Information Systems (CSIS), University of Limerick (UL), and the founder and director of the Localisation Research Centre (LRC) at UL, established in 1995.

Research Interests

Schäler’s main research area is the automation of localisation workflows and the application of tools and technologies to the localisation of digital content, including translation, engineering and testing. He has been researching approaches to Machine Translation (MT) and Computer Assisted Translation (CAT) systems since 1990, including Example-Based Machine Translation (EBMT). This research contributed to the work now carried out by The Rosetta Foundation, which uses translation tools and technologies to provide translation and localisation services supported by volunteer translators, project managers and engineers.

Career Highlights

• Establishment of the Localisation Research Centre (LRC), 1995, £250K.
• Establishment of the Grad. Dip./M.Sc. in Software Localisation at University of Limerick in 1997.
• EU-funded IGNITE project on Linguistic Infrastructure for Localisation: Language Data, Tools and Standards, together with four European industrial partners; total budget €3.5M, 2005–2007.
• Invited keynotes: Localisation and Internationalisation of Software for Export, Florianópolis, Brazil (November 2004); Manufacturers’ Association for Information Technology (MAIT), New Delhi, India (December 2004); The First International Conference on Persian Script & Language Localisation, Supreme Council of ICT and Iran Telecom Research Centre, Tehran, Iran (May 2005); The IEEE Professional Communication Society, International Professional Communication Conference, Limerick, Ireland (July 2005); LISA Forum Cairo, The Localisation Industry Standards Association, Cairo, Egypt (December 2005); Multilingual Web, Madrid, Spain (October 2010).
• Establishment of The Rosetta Foundation in the summer of 2009, a not-for-profit organisation (charity) promoting equality via language and cultural diversity through access to digital knowledge and information independent of language.
• Establishment of the Dynamic Coalition for a Global Localisation Platform: Localisation4all, under the umbrella of the United Nations Internet Governance Forum (IGF) in 2009.



Track Leader, Systems Framework:
Dr. Saturnino Luz
Department: School of Computer Science and Statistics
University: Trinity College Dublin

Brief Biography

Dr. Saturnino Luz has worked on the development of novel technologies for human-computer interfaces in the areas of computer-supported cooperative work, spoken language systems, natural language processing, dialogue management, and design support tools for multimodal systems. He has been a Lecturer in Computer Science at Trinity College since 2001, where he supervises PhD and M.Sc. students in the areas of natural language processing, computer-supported cooperative work, human-computer interaction and machine learning. Dr. Luz has participated in a number of Irish- and EU-funded research projects, working on computing support for connected communities, dialogue systems engineering and technology for medical team meetings, as well as on various topics in machine learning. He has served on the programme committees of several international conferences and on the editorial boards of international journals. He has been a member of the Association for Computing Machinery (ACM) since 1994 and contributes regularly to ACM Computing Reviews.

Research Interests

Dr. Luz’s research focuses on the theoretical bases of computer-supported collaboration, more specifically on processes related to information structuring and retrieval in scenarios encompassing multimedia data and multimodal interaction. He is also interested in natural language parsing, text classification and dialogue systems, particularly human-factors research.

Career Highlights
• Principal Investigator of the ECOMMET project on Enhanced Computing Support for Multidisciplinary Medical Team Meetings, funded by Science Foundation Ireland.
• Principal Investigator of a Basic Research project on content indexing for multimedia meeting recordings, funded by Enterprise Ireland.
• Review selected as a Computing Reviews highlight; featured as profiled reviewer in acknowledgement of his contributions to that publication (2004).
• Invited talks at the University of Ulster (2002), the German Research Centre for Artificial Intelligence (2003), the University of South Africa (2004), the Seminar on New Trends in Corpus Linguistics for Language Teaching and Translation Studies (Granada, Spain, 2008), and KTH (Stockholm, Sweden, 2010).
• Chaired the programme committee of the Irish Human-Computer Interaction Conference (2009) and co-chaired the Special Track on Supporting Collaboration among Healthcare Workers at the IEEE International Symposium on Computer-Based Medical Systems (2008-2010).
• Served as a member of the Editorial Board of Information from 2000 to 2003.



Education and Outreach Manager:
Ms. Cara Greene
Department: School of Computing
University: Dublin City University

Brief Biography

Cara Greene is Education and Outreach (E&O) Manager in the Centre for Next Generation Localisation (CNGL). The E&O Programme is split into two areas: Education and Outreach. The Education Programme aims to provide educational and training opportunities at all levels of education in key areas of the localisation industry, ranging from primary school courses to professional localisation training courses. It also provides professional development and research support to CNGL students and staff, as well as to others in the localisation industry. The Outreach Programme encompasses developing public-facing projects, hosting conferences and industry events, and promoting CNGL research in the media.

Research Interests

Greene holds a B.Sc. in Applied Computational Linguistics from Dublin City University. She then worked as a learning support and resource teacher before returning to DCU to undertake a PhD in Information and Communication Technology (ICT). She is currently writing up her PhD thesis part-time; the thesis, on integrating ICT into the secondary school curriculum, investigates whether doing so can produce inclusive curricula that cater to the needs of all students (with and without learning difficulties). After completing her PhD, Greene wants to research the impact of education programmes provided by large research centres on the number of students taking up these subjects at third level.

Career Highlights
• Nominated for the DCU President’s Award for Civic Engagement 2010.
• Member of the Third Level Education and Outreach (TREO) Communications and Evaluation working groups.
• Research paper selected for presentation at the Young Researchers Consortium at ICCHP 2006.
• Awarded the DCU Chancellor’s Medal at Graduation 2002.



CNGL Overview

Localisation: Global Challenges and Opportunities

Localisation is the industrial process of adapting (digital) content to culture, locale and linguistic environment, at high quality and speed and at low cost. It is a key enabling multiplier technology for the global manufacturing, software, services and content distribution industries, unlocking markets that would otherwise be unavailable. Importantly, the true potential of localisation goes well beyond opening up business opportunities across the globe: many communities find themselves on the wrong side of the “digital divide”, with vital information (hygiene, health, food, education etc.) not available in the local languages, with potentially disastrous consequences. Localisation technologies and processes can make a considerable contribution to bridging this divide. The CNGL partnership and research focus on both the commercial and the societal dimensions of localisation.

The Centre for Next Generation Localisation (CNGL, 2007-2012) is an industry-academia partnership funded jointly by Science Foundation Ireland (SFI) and industry partners. The university partners are DCU (Dublin City University, lead institution), TCD (Trinity College Dublin), UCD (University College Dublin) and UL (University of Limerick). Industry partners include Microsoft Ireland, Symantec Ireland, Dai Nippon Printing (Japan), SDL, Translations.com (Alchemy), CAPITA (Applied Language Solutions), Welocalize, VistaTEC and SpeechStorm, bringing some of the world’s leading software, publishing and localisation companies into the CNGL partnership.

The CNGL vision is to enable people to interact with content, products, services and each other in their own language and culture, in their own context, and according to their own personal needs.

To realise this vision, the CNGL research programme concentrates on the challenges of Volume, Access and Personalisation. Volume: the amount of content is growing dramatically and massively outstrips human translation capacity. Access: while traditional localisation is text-, print- and (full) screen/keyboard-based, mobile devices enable ubiquitous access to information on the go, involving additional interaction modalities such as speech and image, as well as corporate and user-generated content. Personalisation: while traditional localisation is coarse-grained (focusing on a geographic locale and language, e.g. the Middle East), information is most useful if adapted to the user, device, background information/knowledge and task at hand. In terms of a slogan, “the person is the ultimate locale”.

The three axes Volume, Access and Personalisation define the “Localisation Cube” (Figure 1). The CNGL mission (derived from its vision) is to develop processes and technologies that can address each point in the cube at configurable quality and speed.

Figure 1. The Localisation Cube (and traditional Enterprise Localisation technologies)

Traditional enterprise localisation technologies tend to focus on large, well-managed localisation workflows with predictable corporate content, targeting the lower, front, right-most part of the Localisation Cube (Figure 1) and leaving large parts of the Cube unaddressed.

Next Generation Localisation, by contrast, is based on a set of flexible and adaptive technologies and processes that allow us to address each point in the Localisation Cube at configurable quality and speed. The CNGL research programme concentrates on three focal points in the Cube (Figure 2):¹

¹ Note that volume here refers to a single localisation request: while traditional bulk or enterprise localisation projects may involve the translation of millions of words into many languages, a single customer care interaction may involve only a few hundred words and one or two languages. However, the total effect is, of course, cumulative: millions of customer care interactions will generate very large total volumes.



Figure 2: CNGL Focal Points in the Localisation Cube

The Bulk Localisation Workflow (BLW) scenario targets large-volume localisation tasks, with and without human pre- and post-editing, familiar from large localisation projects. The focus is on corporate and NGO content, on automation (translation technologies, in particular machine translation), and on the optimal integration of novel social and collaborative localisation models (including crowd-sourcing), supported by open standards and a flexible, open, web-services-based localisation platform that supports a wide range of workflows (standard corporate as well as novel collaborative workflows).

The Personalised Multilingual Customer Care (PMCC) scenario focuses on supporting global customers interacting with online and perishable corporate and user-generated multilingual content (e.g. product blogs), providing for frequent content updates, multi-modal access (speech and image, in addition to the more traditional text-based modalities) and increased levels of personalisation in real-time interactions, without (or with minimal) human pre- and post-processing interventions.

The Personalised Multilingual Social Networking (PMSN) scenario focuses fully on highly perishable user-generated content (UGC, in contrast to corporate content) prevalent on social networking and messaging sites, with high levels of personalisation and full use of all access modalities, developing CNGL technologies to monitor and manage information for customer support and to link social networking activities across linguistic barriers.

This conceptualisation (the Localisation Cube), the factoring of challenges and opportunities into three dimensions (Volume, Access and Personalisation) and the implementation of the Next Generation Localisation technologies in terms of the flexible and adaptive CNGL Components Framework (rather than a single monolithic one-size-fits-all system) have served us well: they have guided the CNGL research programme and allowed us to anticipate and respond flexibly to many of the recent challenges and opportunities in the localisation space, including the:
• massive increase in multilingual user-generated content (UGC) from user forums and social networking sites, in addition to professionally edited corporate content
• growing importance of UGC in localisation and community-based customer support models
• emergence and impact of novel social and community-based localisation in both for-profit and not-for-profit localisation operations
• increasing number of non-governmental organisations (NGOs) world-wide targeting the global “digital divide”, striving to provide access to information in the local language as a basic human right
• increasing number of SMEs (rather than just multinationals) targeting global markets, with localisation needs markedly different from those of the multinationals

Supporting the Global Customer and Promoting the Multilingual Society

In CNGL project Year 5 (2012) we focus in particular on two related themes, representing the key commercial and social dimensions of CNGL research: (i) Supporting the Global Customer and (ii) Promoting the Multilingual Society.



Figure 3: Organisation of the CNGL Research Programme

Addressing the Challenges and Making the Most of the Opportunities: Charting the CNGL Research Map

The CNGL mission is to develop flexible and adaptive next-generation localisation technologies and processes that allow us to address any point in the space defined by the Localisation Cube (Figure 1) at configurable quality and speed, realising the CNGL vision of enabling people to interact with content, products, services and other people in their own language, according to their own culture and their own personal needs. This mission directly determines the structure of the core CNGL research programme (Figure 3).

The CNGL research programme intertwines four major research tracks (as well as a demonstrator programme): two of the tracks, Integrated Language Technologies (ILT) and Digital Content Management (DCM), are basic research tracks, and the remaining two, Next Generation Localisation (LOC) and Systems Framework (SF), are more applied, integrating research tracks.

LOC: technological advances from ILT and DCM need to be integrated into the workflows and blueprints of Next Generation Localisation. LOC researches the life-cycle of digital content, including content design and development, and standards; evaluates sophisticated language and content management technologies for integration into novel collaborative, community-driven and social localisation models; and provides technology support for such models in the form of an open, modular, component- and web-services-based architecture built on the SOLAS technology platform.

SF: SF research focuses on underexplored software engineering aspects of complex multilingual digital content management, including requirements analysis, user interface design, the development of WebWOZ (a web-based Wizard-of-Oz technology platform), rapid prototyping systems, semantic interoperability, adaptive workflows, and web-based service architectures. SF also coordinates the ongoing development of the CNGL demonstrator systems.

ILT: ILT research focuses on Machine Translation (MT), Speech Technology and Text Analytics, providing the technologies that support translation and interaction automation across language and modality (text and speech) barriers, based on the MaTrEx MT and MUSE Speech Technology platforms.

DCM: DCM research focuses on combining Adaptive Hypermedia (AH) with cross-lingual and multimodal (text, image and speech) Information Retrieval (IR) technologies to find, slice and dice, and recompose content, supporting the CNGL information access and personalisation agenda in a multilingual setting, based on the Adaptive Engine technology platform.

CNGL Demonstrator Systems

Demonstrator systems are a core part of CNGL research. The demonstrators provide focal points for project cohesion and collaboration, combining technologies and teams from across the CNGL research tracks and academic and industry partners. They are an essential component of overall project evaluation and contribute platforms for research and experimentation across all of CNGL. They also showcase CNGL technologies to the outside world and ground CNGL research outputs in commercial as well as non-profit societal applications.



During CNGL project Years 1 to 3, the demonstrator systems focused on the three core use scenarios in the space defined by the Localisation Cube (Figure 2): the Bulk Localisation Workflow (BLW) scenario, the Personalised Multilingual Customer Care (PMCC) scenario and the Personalised Multilingual Social Networking (PMSN) scenario. Building on this work, during project Year 4 (2011) the demonstrators showcased a broad industry storyline around the “Supporting the Global Customer” theme, while in CNGL project Year 5 (2012) the focus was on advancing and showcasing the demonstrators with the greatest commercial and industry impact and those showing promising research directions for the future.

CNGL Outreach and Technology-Transfer Activities

Technology transfer is a key CNGL objective, converting research outputs into economic and social impact. CNGL carefully manages IP in close collaboration with the researchers and industry partners and fosters an entrepreneurial spirit within the CNGL researcher community.

Fostering interest in science and technology, in particular information technology, in education (within and outside CNGL) and among the public in general is a further key objective for CNGL. We offer a wide range of activities, including projects for first, second, third and fourth level education; professional development and communication within CNGL; and communication and dissemination in relevant professional research and industry sectors as well as to the general public.

Changes and Developments in the CNGL Consortium

CNGL operates in a dynamic and fast-changing environment in both its research and business sectors, particularly in the localisation space. 2012 saw a strongly increased focus on the commercialisation of CNGL research expertise, in particular through the growth and traction of CNGL spin-out and start-up companies and not-for-profit organisations.

ILT technologies underpin three start-up companies:
• Xcelerator Machine Translations, through its KantanMT product (www.kantanmt.com), operates in the space of cloud-based, scalable provision of personalised and adaptive MT services that are easy to configure, manage and operate.
• Scream Technologies (www.screamtechnologies.com) specialises in creating synthetic voices from human actors, enabling the end user to create human-sounding synthetic speech and control how it sounds. Scream’s product enables enterprise customers to find a voice that represents them, and then to use that voice for all announcements, interactive voice response, telephone or advertising without ever needing to return to a recording studio.
• Digital Linguistics (www.digitallinguistics.com) uses machine-learning-based text classification technologies for quality assurance (QA) in localisation projects.

DCM technologies underpin two start-up companies:
• Emizar (www.emizar.com) focuses on customer care applications based on adaptive and personalised slicing, dicing and recomposing of digital content.
• Wripl (www.wripl.com) offers Personalisation-as-a-Service across websites, improving a user’s experience as they browse across multiple content management systems (CMSs) to solve a particular task. Wripl is currently in spin-out preparation mode.

LOC technologies underpin:


• The Rosetta Foundation (www.therosettafoundation.org), a not-for-profit organisation that provides localisation services to NGOs and social causes based on novel, community-based localisation models (to date involving 2,600+ volunteers), supported by the CNGL SOLAS technology platform.



These developments (including the 2,600 volunteers engaging with The Rosetta Foundation and the €1.25M in venture capital raised by Xcelerator and Scream Technologies) clearly show the social and economic relevance of the CNGL research programme.

Commercialisation activities are strongly underpinned by SFI- and Enterprise Ireland-funded activities (including 6 Technology Innovation Development Award grants, 5 Enterprise Ireland Feasibility Awards, 3 Commercialisation Fund Awards and 2 Innovation Partnerships) that take research all the way from the lab into the market.

On the academic and research side, CNGL extends a warm welcome to Prof. Qun Liu, formerly Director of the Natural Language Processing Lab of the Chinese Academy of Sciences in Beijing, as the new Professor of Machine Translation at DCU. Prof. Liu’s expertise in cutting-edge machine translation and language technology research and his international standing in the research community make a key contribution to CNGL and substantially strengthen CNGL’s expertise in multilingual technologies.

Research Highlights 2012

Space limits unfortunately allow us to offer only a preview of a few selected highlights below. For full details, please consult the subsequent sections of the 2012 CNGL Annual Report.

Research Outputs 2012

Research performance and output in 2012 have been strong: CNGL has again substantially outperformed its research KPI targets (Table 1), with 92 conference and 26 journal, book and book chapter publications, a total of 118 against a cumulative target of 62 for the reporting period. Since 2007, CNGL has published a total of 411 research publications against a target of 291 (Table 2), outperforming overall targets by a factor of 1.5.

Table 1: CNGL 2012 Research KPIs against Targets

CNGL Research Outputs 2012 | Actuals | Targets
Journal papers, book chapters and books | 26 | 12
Conference publications | 92 | 50
Conferences/workshops hosted | 17 | 8

Table 2: CNGL 2007–2012 Cumulative Research KPIs against Targets

CNGL Research Outputs 2007–2012 | Actuals | Targets
Journal papers, book chapters and books | 63 | 43
Conference publications | 348 | 237
Conferences/workshops hosted | 58 | 39
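The KPI comparison in Tables 1 and 2 is simple arithmetic. The short Python sketch below recomputes the actual-to-target ratio for each table row and the 2012 publication total (26 + 92 = 118 against 12 + 50 = 62). The figures are copied from the tables; the dictionary layout and the function names `ratio` and `publication_totals` are illustrative only, and hosted conferences/workshops are assumed to be excluded from publication totals, as in the text.

```python
"""Actual-to-target ratios for the research KPIs listed in Tables 1 and 2."""

# (actual, target) pairs copied from Table 1 (2012) and Table 2 (2007-2012).
KPIS_2012 = {
    "Journal papers, book chapters and books": (26, 12),
    "Conference publications": (92, 50),
    "Conferences/workshops hosted": (17, 8),
}
KPIS_2007_2012 = {
    "Journal papers, book chapters and books": (63, 43),
    "Conference publications": (348, 237),
    "Conferences/workshops hosted": (58, 39),
}

# Rows counted as publications; hosted events are not publications.
PUBLICATION_ROWS = ("Journal papers, book chapters and books", "Conference publications")


def ratio(actual: int, target: int) -> float:
    """Return actual / target, i.e. how many times over the target was met."""
    if target <= 0:
        raise ValueError("target must be positive")
    return actual / target


def publication_totals(kpis: dict[str, tuple[int, int]]) -> tuple[int, int]:
    """Sum actuals and targets over the publication rows only."""
    actual = sum(kpis[row][0] for row in PUBLICATION_ROWS)
    target = sum(kpis[row][1] for row in PUBLICATION_ROWS)
    return actual, target


if __name__ == "__main__":
    for period, kpis in (("2012", KPIS_2012), ("2007-2012", KPIS_2007_2012)):
        for name, (actual, target) in kpis.items():
            print(f"{period} | {name}: {actual}/{target} = {ratio(actual, target):.2f}")
    actual, target = publication_totals(KPIS_2012)
    print(f"2012 | all publications: {actual}/{target} = {ratio(actual, target):.2f}")
```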

ILT: highlights include best paper awards (Vogel and Mamani Sánchez, 2012; Emms and Franco-Penya, 2012); winning the SANCL-2012 Web Parsing challenge organised by Google at NAACL-HLT 2012 (Le Roux, Foster, Wagner, Kaljahi and Bryl, 2012); strong speech technology publications, with 6 journal papers, 2 book chapters and 5 conference papers at ICASSP and Interspeech 2012; the strong presence of CNGL at COLING 2012 in Mumbai, India, with a total of 15 full, short and workshop MT and Text Analytics papers; and the award of COLING 2014 to CNGL partner DCU, to be hosted in Dublin with CNGL Director Prof. Josef van Genabith as General Chair. ILT researchers have worked in close cooperation with CNGL industry partner VistaTEC and start-up company Digital Linguistics on text classification for MT quality assessment, with Symantec on tuning MT to user-generated content, and with Xcelerator and Welocalize on integrating MT and TM technologies. ILT researchers are involved in 2 new EU FP7 MT projects (QTLaunchPad and the EXPERT Marie Curie PhD Graduate School) and lead the Abu-MaTran FP7 Academia-Industry partnership project (Dr. Antonio Toral).



DCM: highlights include the publication of over 30 peer-reviewed papers in international journals (e.g. ACM CSUR, UMUAI and Journal IR) and at major international conferences (ACM Hypertext, SIGIR 2012, CIKM 2012, COLING 2013, AAAI 2012, ACL 2012 and TPDL 2013). DCM research has made significant advances in the personalisation and dynamic aggregation of user-generated content, corporate content, and open content harvested from the open web. This has led to industry trials in the application areas of Personalised Multilingual Customer Care and Personalisation-as-a-Service. Research has progressed on structural content analysis for web slicing, and the MOODfinger framework for affective news retrieval has been further developed. DCM researchers (Prof. Owen Conlan and Prof. Vincent Wade) won two SFI TIDA grants for research in personalisation, which have led to the planning of the spin-out companies Emizar and Wripl. A third TIDA grant, for work on automated slicing of content for reuse and repurposing, has been secured (Prof. Vincent Wade), and work will commence in early 2013. Trials and evaluations of DCM technology, including the Personalised Multilingual Customer Care portal and the Personalised Multilingual Information Retrieval demonstrator, were conducted in collaboration with Microsoft and Symantec. TCD (Prof. Vincent Wade) also established the Enterprise Ireland Technology Centre for Technology Enhanced Learning (Learnovate Centre), which is allied to CNGL.

LOC: highlights include the continued development of a flexible, open-source, open-standards-, components- and web-services-based platform (SOLAS) supporting standard as well as innovative social, collaborative and distributed localisation workflows. SOLAS consists of two main strands: SOLAS Match and SOLAS Productivity. SOLAS Productivity makes use of a standardised data container, open web service APIs, and a common orchestration and process management module, which connect to any number of component technologies developed by academic and industrial partners within CNGL, as well as to third-party technologies and tools. SOLAS Match provides ground-breaking and intuitive technology for the seamless and user-friendly matching of community translation tasks with volunteer translators. The close collaboration between LOC and The Rosetta Foundation makes CNGL technologies directly available to social localisation operations and, in return, allows CNGL technologies to be tested with what are currently 2,600+ volunteers.
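The SOLAS Productivity description above (a standardised data container, open web service APIs, and a common orchestration module connecting pluggable components) follows a familiar pipeline pattern. The minimal Python sketch below illustrates only that general pattern, under explicit assumptions: the container fields, the components `segment_cleaner` and `dummy_mt`, and the `orchestrate` function are all hypothetical; the components run in-process rather than as web services; and nothing here reflects the actual SOLAS interfaces or data format.

```python
"""Hypothetical container-plus-orchestrator pipeline sketch (not the SOLAS API)."""
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class LocalisationContainer:
    """Hypothetical stand-in for a standardised data container."""
    source_lang: str
    target_lang: str
    segments: list[str]
    translations: list[str] = field(default_factory=list)
    log: list[str] = field(default_factory=list)  # names of components that ran


# A component is any callable that takes the container and returns it (possibly modified).
Component = Callable[[LocalisationContainer], LocalisationContainer]


def segment_cleaner(c: LocalisationContainer) -> LocalisationContainer:
    """Example component: normalise whitespace in the source segments."""
    c.segments = [" ".join(s.split()) for s in c.segments]
    c.log.append("segment_cleaner")
    return c


def dummy_mt(c: LocalisationContainer) -> LocalisationContainer:
    """Placeholder for an MT component; it only tags each segment with the target language."""
    c.translations = [f"[{c.target_lang}] {s}" for s in c.segments]
    c.log.append("dummy_mt")
    return c


def orchestrate(container: LocalisationContainer,
                workflow: list[Component]) -> LocalisationContainer:
    """Run the components of a workflow in order, passing the same container along."""
    for component in workflow:
        container = component(container)
    return container


if __name__ == "__main__":
    job = LocalisationContainer("en", "de", ["Hello,   world", " Save  file "])
    result = orchestrate(job, [segment_cleaner, dummy_mt])
    print(result.translations)  # ['[de] Hello, world', '[de] Save file']
    print(result.log)           # ['segment_cleaner', 'dummy_mt']
```

Keeping the workflow as a plain list of components is what makes it reconfigurable in this sketch: a different workflow is simply a different list, and a new component only has to accept and return the shared container.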

SF: highlights include strong progress in human-factors and interaction design research, and substantial contributions to standardisation (ITS (W3C) and XLIFF (OASIS)) and to interoperability research for systems services architectures. Doherty, Karamanis and Luz (2012) investigate the impact of work contexts on the use of MT in localisation operations. The CNGL Wizard-of-Oz platform has been made open source and is available online (www.webwoz.com). A Linked Open Data approach has been used for end-to-end content management and localisation integration (Lewis et al., 2012), involving SOLAS and the MaTrEx CNGL platform technologies as well as provenance tracking and visualisation, in close collaboration with CNGL partners Microsoft and VistaTEC. Substantial progress has been achieved in instrumenting CAT tools to capture post-editing of MT output, as well as in the visualisation of online community analytics, in close collaboration with CNGL partners Welocalize and Symantec.

Commercialisation<br />

Translating research outputs into economic and social<br />

impact is a key objective for <strong>CNGL</strong>: Table 3 shows a<br />

total of 10 invention and software disclosures, 1 patent<br />

application and 1 spin-out company (against targets<br />

of 20, 4 and 2, respectively) for <strong>2012</strong>. <strong>CNGL</strong> engages<br />

strongly in spin-out and start-up companies as well<br />

as in not-for-profit social operations. The Rosetta<br />

Foundation (www.therosettafoundation.org) focuses<br />

on localisation support for NGOs (and other not-for-profit<br />

organisations) using a novel social and collaborative<br />

localisation platform. Emizar (www.emizar.com) focuses<br />

on digital content and personalisation technologies for<br />

customer support. Xcelerator Machine Translations, through its KantanMT product (www.kantanmt.com), provides cloud-based MT technology that automatically builds highly scalable custom MT engines from data resources uploaded by the client, requiring minimal technical expertise on the client’s part. Digital Linguistics<br />

(www.digitallinguistics.com) uses stylometrics and<br />

text classification technologies developed in ILT for<br />

translation quality review. Scream Technologies (www.screamtechnologies.com) offers custom text-to-speech systems based on ILT technologies. Additionally, spin-out candidate Wripl (www.wripl.com) offers personalisation-as-a-service<br />

across websites, drawing on DCM research.


Centre for Next Generation Localisation <strong>Annual</strong> <strong>Report</strong> <strong>2012</strong> 23<br />

Table 3: <strong>2012</strong> IP KPIs against Targets<br />
<strong>CNGL</strong> KPIs | <strong>2012</strong> Actuals | Targets<br />
Patent applications | 1 | 4<br />
Invention and software disclosures | 10 | 20<br />
Spin-outs | 1 | 2<br />

Outreach<br />

The <strong>CNGL</strong> Education and Outreach Programme<br />

concentrates on first, second, third and fourth level<br />

education, and outreach to industry and the general<br />

public.<br />

<strong>CNGL</strong> founded the All Irish Linguistics Olympiad<br />

(AILO) in 2009 and has organised the competition since<br />

then. In <strong>2012</strong>, more than 400 secondary level students<br />

participated in the national competitions and the top<br />

four individual students represented Ireland at the<br />

International Linguistics Olympiad (IOL) <strong>2012</strong> in Slovenia.<br />

To promote African languages in the Information Society, the University of Limerick’s MSc in Multilingual Computing and Localisation will be delivered by distance learning and co-hosted by the United Nations Economic Commission for Africa at its Information Training Centre for Africa in Addis Ababa, Ethiopia.<br />

Growing the <strong>CNGL</strong> Research Eco-System <strong>2012</strong><br />

<strong>CNGL</strong> has been highly successful in attracting<br />

competitive research funding nationally and<br />

internationally, rapidly developing a research eco-system clustered around the <strong>CNGL</strong> core and comprising a large number of affiliated EU projects (under the FP7 programme), SFI-funded programmes, <strong>CNGL</strong> business-development activities funded through Enterprise Ireland programmes, and direct contract research co-operations. The major currently active projects are listed in Table 4. These provide further<br />

evidence of the rapid development of the international<br />

research standing and recognition of <strong>CNGL</strong>, as well as<br />

of the relevance and commercialisation potential of the<br />

<strong>CNGL</strong> research programme.<br />

Planning for the Future<br />

With the end of project Year 5 in <strong>2012</strong>, <strong>CNGL</strong> has<br />

now completed its original funding cycle (2007-<strong>2012</strong>),<br />

and is completing a number of key on-going research,<br />

commercialisation and outreach projects in a non-costed<br />

extension in 2013. <strong>CNGL</strong> has been a resounding success,<br />

generating (to date) more than 400 peer-reviewed<br />

publications, 21 PhD theses, 39 invention and software<br />

disclosures, 9 patent applications, 4 commercial spin-out<br />

and start-up companies, 1 not-for-profit spin-out, strong<br />

industry-academia partnerships and a total of €15.8m<br />

of additional competitive research, development and<br />

commercialisation funding growing the <strong>CNGL</strong> Research<br />

Eco-System.<br />

At the same time, <strong>CNGL</strong> has won further substantial competitive funding from Science Foundation Ireland, initially for 30 months, to continue <strong>CNGL</strong> into the future with a core grant of €10.5M (<strong>CNGL</strong>II: March 2013 – September 2016). “<strong>CNGL</strong>II” is an evolution of <strong>CNGL</strong>, expanding its remit<br />

from localisation to a broader focus on Digital Content<br />

Management in a Global Intelligent Content setting<br />

based on the concept of a Global Content Value<br />

Chain, where services interact with content to make<br />

it self-describing, self-aware and self-adapting across<br />

language barriers, modalities and interaction platforms,<br />

tuned to context and user. The <strong>CNGL</strong>II application was<br />

successfully led by Prof. Vincent Wade (TCD), who will<br />

take over as Director of <strong>CNGL</strong> on 1 March 2013.



<strong>CNGL</strong> OVERVIEW<br />

Table 4: <strong>CNGL</strong> Research Eco-System: Income Received from Active Affiliated Research Projects <strong>2012</strong><br />
Project | Funding Body | € contribution (to <strong>CNGL</strong> partner)<br />
QT Launch Pad | EC – FP7 | 477,960<br />
LT-Web – Language Technology in the Web | EC – FP7 | 396,391<br />
CENDARI (Collaborative European Digital Archive Infrastructure) | EC – FP7 | 120,000<br />
EXPERT (EXPloiting Empirical appRoaches to Translation) | EC – FP7 – Marie Curie | 481,000<br />
Abu-MaTran (Automatic building of Machine Translation) | EC – FP7 – Marie Curie | 365,966<br />
IRCSET Data Mining for Industrial Apps – PhD Sponsorship | Phorest | 24,000<br />
IRCSET Data Mining for Industrial Apps – PhD Sponsorship | Irish Research Council for Science, Engineering and Technology (IRCSET) | 48,000<br />
Learning Technology Centre | Enterprise Ireland (EI) | 3,000,000<br />
EI Commercialisation with Xcelerator Machine Translations | Enterprise Ireland (EI) | 152,000<br />
EI Feasibility Grant – Adaptive Solutions for Patent Translation | Enterprise Ireland (EI) | 15,000<br />
EI Innovation Voucher with Cipherion Translations | Enterprise Ireland (EI) | 5,000<br />
EI Innovation Voucher with IntelImpact | Enterprise Ireland (EI) | 5,000<br />
EI Feasibility Study Critical Data Auditor Feasibility Study | Enterprise Ireland (EI) | 8,827<br />
EI Innovation Voucher with FFiG | Enterprise Ireland (EI) | 5,000<br />
EI Feasibility Grant – Wripl | Enterprise Ireland (EI) | 15,000<br />
EI Innovation Partnership Programme with Pixalert – Critical Data Auditor | Enterprise Ireland (EI) | 40,400<br />
EI Commercialisation Fund Ata-Bot | Enterprise Ireland (EI) | 244,381<br />
PoliMon4Cloud | Technology Innovation Development Award (TIDA) | 76,384<br />
Integrated Software Suite to provide Next Generation Personalised Multilingual Customer Care | Technology Innovation Development Award (TIDA) | 67,748<br />
MT & TM Integration | Technology Innovation Development Award (TIDA) | 86,427<br />
UNITE (Personalised Cross-site Personalisation) | Technology Innovation Development Award (TIDA) | 60,000<br />
Iterative Retraining of Machine Translation with Post-edits to Increase Post-Editing Productivity in Localisation Workflows | Technology Innovation Development Award (TIDA) | 99,218<br />
Linguabox: Automated Open Content Repurposing Service to support Personalized eLearning | Technology Innovation Development Award (TIDA) | 87,768<br />
iOmegaT – An Instrumented Replayable Computer-Aided-Translation Tool | Technology Innovation Development Award (TIDA) | 92,273<br />
Total | | 5,973,743
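Because the table was flattened in extraction, it is easy to misread which amount belongs to which project. As a consistency check (a sketch, not part of the report's own method), the Python below re-adds the Table 4 amounts row by row and groups them by funding body; the project and funder labels are abbreviated for readability. The rows sum to the stated total of 5,973,743.

```python
from collections import defaultdict

# Table 4 rows: (project, funding body, € contribution to CNGL partner).
# Labels are abbreviated; amounts are copied from the table.
rows = [
    ("QT Launch Pad", "EC FP7", 477_960),
    ("LT-Web", "EC FP7", 396_391),
    ("CENDARI", "EC FP7", 120_000),
    ("EXPERT", "EC FP7 Marie Curie", 481_000),
    ("Abu-MaTran", "EC FP7 Marie Curie", 365_966),
    ("IRCSET Data Mining PhD Sponsorship", "Phorest", 24_000),
    ("IRCSET Data Mining PhD Sponsorship", "IRCSET", 48_000),
    ("Learning Technology Centre", "EI", 3_000_000),
    ("Commercialisation with Xcelerator", "EI", 152_000),
    ("Feasibility Grant: Patent Translation", "EI", 15_000),
    ("Innovation Voucher: Cipherion", "EI", 5_000),
    ("Innovation Voucher: IntelImpact", "EI", 5_000),
    ("Feasibility Study: Critical Data Auditor", "EI", 8_827),
    ("Innovation Voucher: FFiG", "EI", 5_000),
    ("Feasibility Grant: Wripl", "EI", 15_000),
    ("IPP with Pixalert", "EI", 40_400),
    ("Commercialisation Fund: Ata-Bot", "EI", 244_381),
    ("PoliMon4Cloud", "TIDA", 76_384),
    ("Multilingual Customer Care Suite", "TIDA", 67_748),
    ("MT & TM Integration", "TIDA", 86_427),
    ("UNITE", "TIDA", 60_000),
    ("Iterative Retraining of MT with Post-edits", "TIDA", 99_218),
    ("Linguabox", "TIDA", 87_768),
    ("iOmegaT", "TIDA", 92_273),
]

# Sum the contributions per funding body, then overall.
subtotals: dict[str, int] = defaultdict(int)
for _project, funder, amount in rows:
    subtotals[funder] += amount
total = sum(subtotals.values())

for funder, amount in sorted(subtotals.items(), key=lambda kv: -kv[1]):
    print(f"{funder:<20} {amount:>10,}")
print(f"{'Total':<20} {total:>10,}")
```

Grouped this way, Enterprise Ireland funding accounts for €3,490,608 of the total and the TIDA awards for €569,818.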


Integrated Language Technologies



Strand Name: Integrated Language Technologies<br />

AREA CO-ORDINATORS:<br />

PROF. JOSEF VAN GENABITH, DUBLIN CITY UNIVERSITY<br />

PROF. NICK CAMPBELL, TRINITY COLLEGE DUBLIN<br />

Participant Names and Affiliation<br />
Industrial Collaborators<br />
Mr. Takeshi Fukunaga | Dai Nippon Printing<br />
Mr. Tom Gray | SpeechStorm<br />
Mr. John Dixon | Applied Language Solutions<br />
Dr. Fred Hollowood | Symantec<br />
Mr. Paul McManus | SDL<br />
Mr. Enda McDonnell | Alchemy Software Development<br />
Mr. Phil Ritchie | VistaTEC<br />
Dr. Johann Roturier | Symantec<br />
Mr. Dag Schmidtke | Microsoft<br />
International Collaborators<br />
Prof. Walter Daelemans | Antwerp, Belgium<br />
Prof. Mikel Forcada | Alicante, Spain<br />
Prof. Bernd Möbius | Stuttgart, Germany<br />
Prof. Khalil Sima’an | Amsterdam, Netherlands<br />
Prof. Eiichiro Sumita | ATR, Japan<br />
Prof. Antal van den Bosch | Tilburg, Netherlands<br />
Prof. François Yvon | Paris, France<br />

Faculty<br />

Prof. Nick Campbell | Trinity College Dublin | ILT Co-Leader, ILT2 Leader<br />
Dr. Peter Cahill | University College Dublin | ILT2 Co-Leader<br />
Prof. Julie Carson-Berndsen | University College Dublin | ILT2 Co-Leader<br />
Dr. Martin Emms | Trinity College Dublin | ILT3<br />
Dr. Christer Gobl | Trinity College Dublin | ILT2<br />
Prof. Qun Liu | Dublin City University | ILT1<br />
Dr. Dorothy Kenny | Dublin City University | ILT1<br />
Dr. Saturnino Luz | Trinity College Dublin | ILT3<br />
Prof. Ailbhe Ní Chasáide | Trinity College Dublin | ILT2<br />
Dr. Sharon O’Brien | Dublin City University | ILT1<br />
Prof. Josef van Genabith | Dublin City University | ILT Co-Leader, ILT1 Leader, ILT3<br />
Dr. Carl Vogel | Trinity College Dublin | ILT3 Leader<br />
Research Integration Officer<br />
Dr. Declan Groves | Dublin City University



INTEGRATED LANGUAGE TECHNOLOGIES<br />

Postdoctoral Researchers<br />

Dr. Ergun Biçici | Dublin City University | ILT1<br />
Dr. Joao Cabral | University College Dublin | ILT2<br />
Dr. Yvette Graham | Dublin City University | Affiliated<br />
Dr. Ingmar Steiner | University College Dublin | ILT2<br />
Dr. Erwan Moreau | Trinity College Dublin | ILT3<br />
Dr. Sara Morrissey | Dublin City University | ILT1<br />
Dr. Sudip Kumar Naskar | Dublin City University | ILT1<br />
Dr. Irena Yanushevskaya | Trinity College Dublin | ILT2<br />
Dr. Xiaofeng Wu | Dublin City University | ILT1<br />
Dr. Junhui Li | Dublin City University | ILT1<br />
PhD Students<br />
Mr. Mohamed Abou-Zleikha | University College Dublin | ILT2<br />
Mr. Zeeshan Ahmed | University College Dublin | ILT2<br />
Ms. Hala Al-Maghout | Dublin City University | ILT1<br />
Mr. Pratyush Banerjee | Dublin City University | ILT1<br />
Ms. Hanna Béchara | Dublin City University | ILT1<br />
Mr. Sandipan Dandapat | Dublin City University | ILT1<br />
Mr. Stephen Doherty | Dublin City University | ILT1<br />
Ms. Amelie Dorn | Trinity College Dublin | ILT2<br />
Mr. Hector Hugo Franco Penya | Trinity College Dublin | ILT3<br />
Mr. John Kane | Trinity College Dublin | ILT2<br />
Mr. Mark Kane | University College Dublin | ILT2<br />
Mr. Gerard Lynch | Trinity College Dublin | ILT3<br />
Mr. Alfredo Maldonado Guerra | Trinity College Dublin | ILT3<br />
Ms. Liliana Mamani Sanchez | Trinity College Dublin | ILT3<br />
Ms. Neasa Ní Chiaráin | Trinity College Dublin | ILT2<br />
Mr. Udochukwu Kalu Ogbureke | University College Dublin | ILT2<br />
Ms. Maria O’Reilly | Trinity College Dublin | ILT2<br />
Mr. Ankit Srivastava | Dublin City University | ILT1<br />
Ms. Eva Szekely | University College Dublin | ILT2<br />
Mr. Christoph Wendler | Trinity College Dublin | ILT2<br />
Ms. Amalia Zahra | University College Dublin | ILT2



Funding<br />

<strong>2012</strong> Funding from SFI<br />

<strong>CNGL</strong> (07/CE/I1142): €1,064,77<br />

SFI TIDA Award Iterative Retraining of Machine Translation with Post-edits to Increase Post-Editing Productivity in Localisation Workflows: €99,218<br />
SFI TIDA Award MT & TM Integration: €86,427<br />

<strong>2012</strong> Funding from Other Sources<br />

van Genabith: EU FP7 QT Launch Pad: €477,960<br />
van Genabith: EU FP7 LT Web: €87,290<br />
van Genabith: EU FP7 EXPERT Marie Curie PhD Training: €481,000<br />
Toral: EU FP7 PEOPLE Abu-MaTran: €365,966



Research Overview: Integrated Language Technologies (ILT)<br />

Goals<br />

Human languages are a core medium for representing,<br />

storing and sharing knowledge and information. The<br />

objective of the ILT track is to perform basic and applied<br />

research in language technologies (LTs) supporting<br />

content processing and management across languages<br />

and modalities (text and speech). ILT1 focuses on<br />

advancing machine translation (MT), ILT2 on speech<br />

input and output as well as speech translation, and<br />

ILT3 on text classification and annotation. The three<br />

groups work closely together on integrated technologies<br />

providing core <strong>CNGL</strong> language-based services.<br />

Research Barriers and Methodologies to Address Them<br />

ILT1: Machine Translation<br />

Statistical Machine Translation (SMT), in particular Phrase-Based SMT (PB-SMT, as implemented in the Moses platform), has been a game-changer in both research and commercial applications of MT. At the same time, SMT is reaching a performance plateau, with disruptive improvements in translation quality requiring massive increases in training data. Traditional PB-SMT uses string-based information. Substantial improvements are expected through the use of richer (linguistically or distributionally motivated) signals, including syntactic and semantic information, in machine learning-based approaches to MT. Mining and translating noisy user-generated content (UGC) is becoming increasingly important in global business intelligence and customer support operations. However, UGC is highly challenging for MT trained on “clean” professionally-edited data. MT is applied to increasing numbers of domains and text types, and novel domain adaptation techniques are required to ensure optimal MT output quality. Improvements in individual MT components (such as alignment) can improve overall<br />

MT performance. Most system combination and hybrid<br />

MT approaches can profit from better machine learning<br />

technologies. Technologies need to be developed to<br />

support fully language-independent quality estimation/<br />

prediction (without access to a reference translation)<br />

that treats the MT system as a black box. Finally, optimal<br />

integration of translation technologies requires full<br />

consideration of the human in the loop.<br />

Almaghout et al. (<strong>2012</strong>a, b) show how sophisticated, linguistically-motivated syntactic information enriching synchronous context-free grammars (SCFGs) can improve state-of-the-art hierarchical phrase-based SMT (HPB-SMT) systems. Graham and van Genabith (<strong>2012</strong>) present a statistical, deep-syntax, LFG-based decoder and MT system. Banerjee et al. (<strong>2012</strong>a) develop a translation-quality-driven supplementary training data selection<br />

model for tuning MT to user-generated content. Banerjee<br />

et al. (<strong>2012</strong>b) compare normalisation and supplementary<br />

training data based approaches to MT of UGC. Pecina<br />

et al. (<strong>2012</strong>) present approaches to adapting log-linear<br />

weight vectors to achieve optimal translation for different<br />

domains given a generic training set without retraining.<br />

Tu et al. (<strong>2012</strong>) show how compact representations of<br />

alignment alternatives can improve MT. Dandapat et al.<br />

(<strong>2012</strong>) develop an efficient system combination approach<br />

integrating EBMT, SMT, TM and IR-based technologies.<br />
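Banerjee et al. (2012a) select supplementary training data with a translation-quality-driven model whose details are not given here. For orientation only, the Python sketch below shows a different and widely used selection criterion, cross-entropy difference in the style of Moore and Lewis: candidate sentences are ranked by how much more probable they are under an in-domain language model than under a general one. The add-one-smoothed unigram models and all names are simplifying assumptions; practical systems use higher-order n-gram models.

```python
import math
from collections import Counter


class UnigramLM:
    """Add-one smoothed unigram language model (a deliberately tiny stand-in
    for the n-gram models normally used for data selection)."""

    def __init__(self, sentences: list[str]):
        self.counts = Counter(w for s in sentences for w in s.split())
        self.total = sum(self.counts.values())
        self.vocab = len(self.counts) + 1  # +1 reserves mass for one unknown-word type

    def cross_entropy(self, sentence: str) -> float:
        """Average negative log2-probability per word."""
        words = sentence.split()
        logp = sum(math.log2((self.counts[w] + 1) / (self.total + self.vocab))
                   for w in words)
        return -logp / max(len(words), 1)


def select(candidates: list[str], in_domain: list[str],
           general: list[str], k: int) -> list[str]:
    """Keep the k candidates with the lowest H_in(s) - H_general(s), i.e. those
    that look most like in-domain text relative to general text."""
    lm_in, lm_gen = UnigramLM(in_domain), UnigramLM(general)
    ranked = sorted(candidates,
                    key=lambda s: lm_in.cross_entropy(s) - lm_gen.cross_entropy(s))
    return ranked[:k]
```

With an in-domain sample of support-forum sentences and a general sample of news-like sentences, sentences such as "laptop driver error" rank ahead of "the resolution was adopted".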

The Second Workshop and Shared Task on Applying<br />

Machine Learning Techniques to Optimise the Division<br />

of Labour in Hybrid MT (ML4HMT-12) was co-organised<br />

by <strong>CNGL</strong> (van Genabith, Badia, Federmann, Melero,<br />

Costa-jussà and Okita, <strong>2012</strong>) and <strong>CNGL</strong> research teams<br />

contributed four submissions (Wu et al., <strong>2012</strong>; Okita et al., <strong>2012</strong>a; Okita et al., <strong>2012</strong>b; Okita, <strong>2012</strong>) to the shared task. Bicici et al. (2013, accepted for publication) show how quality prediction can be performed using language-independent features, treating MT systems as a black box.<br />

Doherty et al. (<strong>2012</strong>), Doherty and O’Brien (<strong>2012</strong>) and<br />

Doherty and Moorkens (2013) investigate human factors<br />

in translation technology integration using eye-tracking<br />

experiments, as well as studies on integrating SMT into professional translator training syllabi.<br />

ILT2: Speech and Machine Translation<br />

The analysis of voice characteristics, synthesis of<br />

expressive voices, linking speech with other modalities<br />

(such as facial expressions) and speech-to-speech<br />

translation are some of the core challenges in speech<br />

research.<br />

Kane et al. (<strong>2012</strong>) develop algorithms for automatically<br />

detecting creaky voice and facilitating its inclusion<br />

in speech synthesis. An Invention Disclosure for a<br />

new method for tracking changes in the voice with<br />

applications in speaker identity tracking and emotion<br />

detection has been filed. Székely et al. (<strong>2012</strong>) detect voice styles in audiobooks and build synthetic voices for those voice styles. Abou-Zleikha et al. (<strong>2012</strong>) present



novel work on pitch and duration modelling. Cabral<br />

et al. (<strong>2012</strong>) improve modelling of vocal cord vibration<br />

for better voice quality in speech synthesis. Székely et<br />

al. (<strong>2012</strong>) link synthetic speech voice style and facial<br />

expression. Ahmed et al. (<strong>2012</strong>) develop state-of-the-art<br />

phone-based hierarchical phrase-based machine<br />

translation (HPB-SMT) models.<br />

ILT3: Analytics<br />

Language data provide “unstructured” representations of<br />

information. Language technologies (LTs) are required to<br />

automatically extract structure from language data and to<br />

record this structure in the form of mark-up, annotation,<br />

metadata and explicit representations of information<br />

content, across languages and domains. In order to<br />

address these challenges, ILT3 develops sophisticated<br />

classification-based language technologies using a<br />

wide variety of features and approaches in document,<br />

sentence and sub-sentential classification problems in<br />

syntax, semantics and pragmatics, often with a focus on<br />

supporting MT. In addition, ILT3 has a strong focus on<br />

domain adaptation, concentrating in particular on user-generated<br />

content.<br />
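The report does not specify ILT3's feature sets or learners. As a minimal illustration of the classification-based approach, the Python sketch below trains a multinomial naive Bayes classifier over bag-of-words features, one of the simplest document classifiers. The class name, the toy sentiment-style labels and the training sentences are hypothetical; real systems would add richer syntactic, stylistic or distributional features.

```python
import math
from collections import Counter, defaultdict


class NaiveBayes:
    """Multinomial naive Bayes over bag-of-words features with add-one smoothing."""

    def fit(self, docs: list[str], labels: list[str]) -> "NaiveBayes":
        self.class_docs = Counter(labels)          # documents per class, for priors
        self.word_counts = defaultdict(Counter)    # word frequencies per class
        for doc, label in zip(docs, labels):
            self.word_counts[label].update(doc.lower().split())
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        return self

    def log_score(self, doc: str, label: str) -> float:
        """log P(label) + sum of log P(word | label): an unnormalised log posterior."""
        counts = self.word_counts[label]
        denom = sum(counts.values()) + len(self.vocab)
        score = math.log(self.class_docs[label] / sum(self.class_docs.values()))
        for w in doc.lower().split():
            if w in self.vocab:  # words unseen in training carry no evidence
                score += math.log((counts[w] + 1) / denom)
        return score

    def predict(self, doc: str) -> str:
        return max(self.class_docs, key=lambda label: self.log_score(doc, label))
```

Trained on a handful of labelled forum posts, for example, `NaiveBayes().fit(docs, labels).predict("works great")` returns whichever label's training posts used those words most.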

Text classification developed by Dr. Carl Vogel’s team has produced two Invention Disclosures and a Patent Application (application no. 11169673.8-1527) with the European Patent Office, as well as a commercial licence for Digital Linguistics, a <strong>CNGL</strong> start-up company. Moreau and Vogel (<strong>2012</strong>) compare supervised and semi-supervised approaches to MT quality estimation. Lynch, Moreau and Vogel (<strong>2012</strong>) develop accurate classifiers to decide whether or not a text is a translation; where it is, Lynch and Vogel (<strong>2012</strong>) predict the source language. Emms (<strong>2012</strong>) and Emms and Franco Penya (<strong>2012</strong>a, b) explore stochastic tree distance similarity measures and employ them for semantic role labelling (Emms and Franco Penya, <strong>2012</strong>c). Maldonado-Guerra and Emms (<strong>2012</strong>) develop methods to investigate the complex translation behaviour of multi-word expressions. Vogel and Mamani Sanchez (<strong>2012</strong>) predict the complex interplay between emoticons and hedges as social signals in user fora. The DCU-Paris 13 parsing team won the Web-Parsing Challenge and Shared Task organised by Google as part of SANCL-<strong>2012</strong> at NAACL-HLT <strong>2012</strong> (Le Roux et al., <strong>2012</strong>), using the DCU LORG parser platform and domain adaptation techniques.<br />
In a collaboration between DCU, Heinrich Heine University in Düsseldorf, Charles University Prague and the British University of Dubai, Attia et al. (<strong>2012</strong>a, <strong>2012</strong>b) show how a combination of finite-state and machine-learning-based technologies can be used to produce wider-coverage lexical resources for Modern Standard Arabic from Arabic Gigaword corpus data, and how spell checking for Arabic can be improved. In collaboration with the Chinese Academy of Sciences and New York University, the DCU ILT3 team investigates the granularity of syntactic information required to improve sentiment analysis (Tu et al., <strong>2012</strong>).<br />

Hector-Hugo Franco-Penya, Dr. Alexandru Ceausu and Dr. Antonio Toral were among the many participants in the Hadoop Hackathon run by <strong>CNGL</strong> in March.<br />

Year 5 Progress<br />

The final year of the initial funding cycle of <strong>CNGL</strong><br />

(2007–<strong>2012</strong>) has been dominated by strong research<br />

and publication outputs, writing-up of PhD theses<br />

(leading to six successful PhD completions), increased<br />

commercialisation activities translating research outputs<br />

into IP (Invention Disclosures, Patent Applications and<br />

Licences) and considerable time and effort spent on the<br />

<strong>CNGL</strong> final review and <strong>CNGL</strong>II application preparations.<br />

Despite the loss of some members of the research<br />

team who have taken up new positions in industry and<br />

academia, all research tracks in ILT continue to run ahead<br />

of schedule in close collaboration with <strong>CNGL</strong> industry<br />

partners and increased engagements with additional<br />

commercial entities.



Progress in ILT1: Machine Translation<br />

The ILT1 group continues to have an impressive<br />

publication record. Conference papers have been<br />

accepted at a number of world-renowned conferences<br />

including the Annual Meeting of the Association for Computational Linguistics (ACL-<strong>2012</strong>, Jeju, Korea), the International Conference on Computational Linguistics (COLING-<strong>2012</strong>, Mumbai, India) and the Annual Conference of the European Association for Machine Translation (EAMT-<strong>2012</strong>, Trento, Italy), as well as the Conference of the Association for Machine Translation in the Americas (AMTA-<strong>2012</strong>, San Diego, CA) and the Workshop on Statistical Machine Translation (WMT-<strong>2012</strong>, Montreal, Canada). COLING-<strong>2012</strong> was particularly<br />

successful with a total of 9 MT papers at the main<br />

conference and COLING workshops.<br />

Data-driven statistical MT technologies are able to<br />

provide translations suitable for use in commercial<br />

settings, as evidenced by the dramatic increase in<br />

adoption and provision of MT services in the localisation<br />

industry. The question is no longer whether or not to<br />

use MT technologies, but how best to integrate MT into<br />

localisation and content management workflows. At<br />

the same time, statistical MT, in particular phrase-based<br />

statistical MT (PB-SMT), is reaching a performance<br />

plateau, with disruptive improvements in translation<br />

quality requiring massive increases in training data.<br />

Traditional PB-SMT uses string-based information.<br />

Substantial improvements are expected through<br />

the use of richer (linguistically or distributionally<br />

motivated) signals, including syntactic and semantic<br />

information, in machine learning-based approaches to<br />

MT. ILT1 has made key contributions to this research<br />

challenge, evidenced by <strong>CNGL</strong> publications at EAMT-<br />

<strong>2012</strong>, ACL-<strong>2012</strong> and WMT-<strong>2012</strong> from the DCU MT<br />

group. Almaghout et al. (<strong>2012</strong>) and Li et al. (<strong>2012</strong>a,<br />

b) show how linguistically-motivated sophisticated<br />

syntactic information enriching synchronous context<br />

free grammars (SCFGs) can improve state-of-the-art<br />

hierarchical phrase-based SMT (HPB-SMT) systems.<br />

Graham and van Genabith (<strong>2012</strong>) develop a deep syntax<br />

(Lexical-Functional Grammar)-based statistical MT<br />

system.<br />

With increasing volumes of content being generated<br />

by users (rather than professional writers), the need for<br />

mining and making this content (user fora, blogs, tweets)<br />

available across multiple languages has significantly<br />

increased. Coping with potentially noisy user-generated<br />

content (UGC) presents a major challenge for MT and<br />

novel training data selection models are crucial for tuning<br />

MT models to UGC. Working in close cooperation with<br />

<strong>CNGL</strong> industry partner Symantec, a key DCU MT group<br />

publication at COLING-<strong>2012</strong> (Banerjee et al., <strong>2012</strong>a)<br />

presents a translation-quality driven supplementary<br />

training data selection model for tuning MT to UGC,<br />

while Banerjee et al. (<strong>2012</strong>b) investigate whether text normalisation techniques are more effective for the automatic translation of UGC than adding suitable supplementary training data. Tuning MT to diverse text types and content domains is crucial to ensuring optimal quality. In many real-world application scenarios, however, complete retraining of the MT system on domain-specific training material is not an option: it may be too costly, or suitable training material may simply not be available. A joint DCU MT group and Charles University Prague COLING-<strong>2012</strong> publication (Pecina et al., <strong>2012</strong>) presents approaches to adapting log-linear weight vectors to achieve improved translation for<br />

different domains given a generic training set without the<br />

need for full retraining.<br />
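In the standard log-linear formulation of SMT, the decoder chooses ê = argmax_e Σ_i λ_i h_i(e, f), where the feature functions h_i (such as translation model, language model and word penalty scores) come from training and the weights λ_i come from tuning. Because the weights enter only at scoring time, swapping in a domain-specific weight vector can change which hypothesis wins without retraining any model. The Python sketch below illustrates only that effect: the feature values and both weight vectors are invented, and it does not show how Pecina et al. (2012) derive the adapted weights.

```python
# Hypothetical feature values h_i(e, f) for three candidate translations of one
# source sentence: (log TM probability, log LM probability, word penalty).
candidates = {
    "hyp_a": (-2.0, -6.0, -5.0),
    "hyp_b": (-3.5, -4.0, -5.0),
    "hyp_c": (-2.5, -5.0, -6.0),
}


def loglinear_score(features: tuple[float, ...], weights: tuple[float, ...]) -> float:
    """Log-linear model score: sum over i of lambda_i * h_i(e, f)."""
    return sum(lam * h for lam, h in zip(weights, features))


def best(weights: tuple[float, ...]) -> str:
    """Highest-scoring candidate under the given weight vector."""
    return max(candidates, key=lambda e: loglinear_score(candidates[e], weights))


generic_weights = (1.0, 1.0, 0.5)   # e.g. tuned on a generic development set
domain_weights = (2.0, 0.5, 0.5)    # e.g. adapted for a new domain
print(best(generic_weights), best(domain_weights))
```

With the generic weights the language-model-friendly hyp_b wins; with the adapted weights, which trust the translation model more, hyp_a wins, although no feature function was retrained.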

System combination and hybrid MT can improve MT<br />

quality: in partnership with DFKI (The German Research<br />

Center for Artificial Intelligence) and Barcelona Media<br />

(BM), the DCU <strong>CNGL</strong> MT group organised the Second<br />

Workshop and Shared Task on Applying Machine<br />

Learning Techniques to Optimise the Division of<br />

Labour in Hybrid MT (ML4HMT-12) in Mumbai, India,<br />

as a COLING-<strong>2012</strong> workshop (van Genabith, Badia,<br />

Federmann, Melero, Costa-jussà and Okita, <strong>2012</strong>).<br />

The DCU <strong>CNGL</strong> MT research teams contributed four<br />

submissions (Wu et al., <strong>2012</strong>; Okita et al., <strong>2012</strong>a; Okita<br />

et al., <strong>2012</strong>b; Okita, <strong>2012</strong>) to the shared task. System<br />

combination is usually most effective when the MT<br />

systems involved are quite diverse. Dandapat et al.<br />

(<strong>2012</strong>) develop an efficient system combination approach<br />

integrating EBMT, SMT, TM and IR-based technologies.
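The combination method of Dandapat et al. (2012) and the ML4HMT-12 submissions are not detailed in this report. One simple form of system combination, sketched below in Python purely as an illustration, selects for each sentence the system output that agrees most with the other systems' outputs. This is a minimum-Bayes-risk-style choice with uniform system weights. The bag-of-words F1 similarity, the function names and the example outputs are assumptions; practical systems typically use BLEU-like similarity and learned system weights.

```python
from collections import Counter


def overlap_f1(a: str, b: str) -> float:
    """Bag-of-words F1 between two hypotheses: a crude similarity measure."""
    ca, cb = Counter(a.split()), Counter(b.split())
    common = sum((ca & cb).values())  # clipped word matches
    if common == 0:
        return 0.0
    precision, recall = common / sum(ca.values()), common / sum(cb.values())
    return 2 * precision * recall / (precision + recall)


def consensus_select(hypotheses: dict[str, str]) -> str:
    """Name of the system whose output has the highest average similarity
    to the outputs of all other systems."""
    def support(name: str) -> float:
        others = [h for n, h in hypotheses.items() if n != name]
        return sum(overlap_f1(hypotheses[name], h) for h in others) / max(len(others), 1)
    return max(hypotheses, key=support)
```

Diverse systems help here because an output that independent systems converge on is more likely to be correct, whereas near-identical systems add little new evidence.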



Improvements in components of statistical MT systems<br />

can lead to better translation outputs. In a collaboration<br />

with the Chinese Academy of Sciences, Tsinghua<br />

University and New York University, Tu et al. (<strong>2012</strong>) show<br />

how compact representations of alignment alternatives<br />

(rather than using a single alignment) can improve MT.<br />
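One way to represent alignment alternatives compactly, in the spirit of weighted alignment matrices, is to record, for each source-target word pair, the probability mass of the n-best alignments that contain that link. The Python sketch below computes such a matrix. It is an illustration under that assumption, not necessarily the representation used by Tu et al. (2012), and the example alignments and probabilities are invented.

```python
from collections import defaultdict

Link = tuple[int, int]  # (source word index, target word index)


def weighted_alignment_matrix(nbest: list[tuple[float, set[Link]]]) -> dict[Link, float]:
    """Given n-best alignments as (probability, links) pairs, return for each
    link the renormalised probability mass of the alignments containing it."""
    z = sum(p for p, _ in nbest)  # renormalise over the n-best list
    weights: dict[Link, float] = defaultdict(float)
    for p, links in nbest:
        for link in links:
            weights[link] += p / z
    return dict(weights)


nbest = [
    (0.6, {(0, 0), (1, 1)}),
    (0.3, {(0, 0), (1, 2)}),
    (0.1, {(0, 1), (1, 1)}),
]
print(weighted_alignment_matrix(nbest))  # link (0, 0) gets about 0.9, (1, 1) about 0.7
```

Links found in every alignment approach a weight of 1, while uncertain links keep fractional weights. Downstream steps such as phrase extraction could then use these weights as soft counts instead of committing to a single 1-best alignment.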

MT and other translation technologies can only deliver their benefits if full consideration is given to the human in the loop: Doherty et al. (<strong>2012</strong>) develop and validate a syllabus to teach translators SMT and related skills. Doherty and O’Brien (<strong>2012</strong>) examine the usability of MT output using eye tracking. They find its quality for some target languages to be as good as that of the source text, but detrimental to the user experience for others. Doherty and Moorkens (2013)<br />

present an evaluation of teaching translation technology<br />

to translators and identify several hurdles and solutions.<br />

Moorkens et al. (2013) use SMT output to remove<br />

inconsistencies in TM data and demonstrate resulting<br />

improvements in both TM and SMT quality.<br />

MT quality estimation is the task of predicting the<br />

quality of MT output without access to a reference<br />

translation. Ideally, this can be done without access to the internals of the MT system involved and in a language-independent way, i.e. without relying on language-specific resources that may require costly supervised training. Bicici et al. (2013, accepted for publication) show how quality prediction can be performed using language-independent features, treating MT systems as a black<br />

box. Parts of this research have been submitted as an<br />

Invention Disclosure.<br />
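To make the black-box setting concrete, the Python sketch below computes a few features that need only the source sentence and the MT output. It uses no reference translation, no access to the MT system's internals and no language-specific resources. It then predicts a quality score by averaging the scores of the most similar training examples. The particular features, the k-nearest-neighbour regressor and the toy quality scores are illustrative assumptions, not the feature set or learner of Bicici et al.

```python
import math
import string


def qe_features(source: str, mt_output: str) -> list[float]:
    """Language-independent features: only whitespace tokens, punctuation and
    digits are inspected, so no parser, dictionary or MT-system access is needed."""
    src, tgt = source.split(), mt_output.split()

    def punct_count(tokens: list[str]) -> int:
        return sum(1 for t in tokens if all(ch in string.punctuation for ch in t))

    def numbers(tokens: list[str]) -> list[str]:
        return sorted(t for t in tokens if any(ch.isdigit() for ch in t))

    return [
        len(tgt) / max(len(src), 1),                      # target/source length ratio
        float(abs(punct_count(src) - punct_count(tgt))),  # punctuation mismatch
        float(numbers(src) != numbers(tgt)),              # numbers not carried over
    ]


def knn_quality(train: list[tuple[str, str, float]], source: str,
                mt_output: str, k: int = 3) -> float:
    """Average the quality scores of the k training pairs whose feature vectors
    are closest (Euclidean distance) to those of the new pair."""
    x = qe_features(source, mt_output)
    nearest = sorted(train, key=lambda row: math.dist(x, qe_features(row[0], row[1])))[:k]
    return sum(score for _, _, score in nearest) / len(nearest)
```

A truncated output that drops a port number, for instance, gets a low length ratio and a numbers-mismatch flag. It is therefore scored like earlier training examples with the same defects.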

Dr. Sharon O’Brien of DCU and <strong>CNGL</strong> alumnus Dr. Sergio Penkale of CAPITA pictured at the AMTA-<strong>2012</strong> Workshop on Post-editing Technology and Practice (WPTP) in San Diego, USA<br />

Progress in ILT2<br />

Although the focus of the PhD students has mainly been<br />

on thesis write-up, the ILT2 Speech Technology research<br />

groups at UCD and TCD have made significant progress<br />

in Year 5. Building on research conducted in previous<br />

years, there was significant further development of<br />

methodologies for analysis of voice characteristics<br />

and for text-to-speech synthesis of expressive voices.<br />

The ILT2 group at TCD has developed algorithms for<br />

automatically detecting creaky voice and provided<br />

mechanisms to facilitate its inclusion in speech synthesis<br />

(Kane et al., <strong>2012</strong>). The progress on this topic is reflected<br />

in two publications at the <strong>2012</strong> Interspeech conference<br />

and in one journal article. John Kane (TCD) has also filed<br />

an Invention Disclosure for a new method for tracking<br />

changes in the voice, which may be deployed in a wide<br />

range of applications from improved speech synthesis to<br />

speaker identity tracking and even emotion detection.<br />

Significant developments on synthesis of expressive<br />

voices were made by researchers at the Speech<br />

Technology Group at UCD. One of the major<br />

contributions explores the variability of voice quality<br />

in audiobook corpora, detecting the voice styles these<br />

corpora contain and building synthetic voices for<br />

those styles (Székely et al., <strong>2012</strong>). Work on pitch<br />

and duration modelling using novel exemplar-based<br />

generation techniques also contributed to improving<br />

the prosody and expressiveness of the synthetic<br />

speech (Abou-Zleikha et al., <strong>2012</strong>).<br />

Research on modelling aspects of the voice source<br />

other than pitch has also continued, using the<br />

Liljencrants-Fant (LF) model to represent the signal<br />

produced by vibration of the vocal cords in human<br />

speech production, in order to permit better control<br />

of voice quality in speech synthesis (Cabral et al.,<br />

<strong>2012</strong>). One of the outcomes<br />

of the research on expressive speech synthesis is the<br />

WinkTalk system developed at UCD as part of the <strong>CNGL</strong><br />

Demonstrator Programme. This system is a multimodal<br />

speech synthesis platform which links facial expression to<br />

expressive voices (Székely et al., <strong>2012</strong>). It allows the user<br />

to control the voice style of the synthetic speech by facial<br />

expression, with the help of a web camera and tools for<br />

facial expression analysis. Another interesting application<br />

of expressive speech synthesis developed at UCD is its<br />

integration into speech-to-speech translation (Székely<br />

et al., <strong>2012</strong>). The resulting prototype system, FEAST



(Facial Expression-based Affective Speech) classifies the<br />

emotional state of the user and uses it to render the<br />

translated output in an appropriate voice style.<br />

The successful collaboration between researchers<br />

from UCD (ILT2) and DCU (ILT3) in previous years on<br />

the integration of speech recognition with machine<br />

translation continued this year with work on phone-based<br />

hierarchical phrase-based machine translation,<br />

which outperforms conventional speech<br />

translation approaches (Ahmed et al., <strong>2012</strong>).<br />

Progress in ILT3<br />

ILT3 continues to provide mark-up, annotation,<br />

metadata, and knowledge through automatic linguistic<br />

analysis for the discovery, transformation and delivery<br />

of unstructured information across languages. ILT3 has<br />

maintained a strong focus on user-generated and possibly<br />

“noisy” content as found on blogs, forums, tweets and<br />

generally on social media, and continues to expand<br />

close collaboration with industry partners, concentrating<br />

on customer care, event detection and sentiment<br />

tracking scenarios. This strand of research focuses on<br />

text classification and annotation, and holds that texts<br />

have infinitely many uses, each of which elicits its own<br />

classification decisions. There is no single form of<br />

annotation that has maximal useful impact across actual<br />

or potential uses. It is frequently useful to annotate<br />

syntax, semantics and pragmatic function; however, it<br />

is not to be expected that each application which needs<br />

syntactic labels, for example, will benefit from<br />

the same class of labels or level of detail within a class<br />

(sometimes LFG c-structure with f-structure annotation<br />

is necessary; sometimes part-of-speech tagging of lexical<br />

stems alone is sufficient). ILT3 research has addressed<br />

document, sentence and sub-sentential classification<br />

problems in syntax, semantics and pragmatics.<br />

Text classification is a core technology in <strong>CNGL</strong>: it is the<br />

subject of basic research on extending classification<br />

methods, and it is applied in various contexts, including<br />

domain tuning and translation quality assessment. The<br />

ILT3 team performs text classification from<br />

the perspective of linguistic theory, testing theories of<br />

language use in conjunction with other strands of ILT<br />

and <strong>CNGL</strong>. Tools developed by Dr. Carl Vogel’s team at<br />

TCD for <strong>CNGL</strong> external purposes and deployed within<br />

our demonstrator activities have formed the basis of two<br />

Invention Disclosures, one collaborative with Mr. Phil<br />

Ritchie of VistaTEC and Dr. David Lewis (<strong>CNGL</strong> SF2).<br />

The IP disclosures have culminated in both a Patent<br />

Application with the European Patent Office (application<br />

no. 11169673.8-1527) “Data processing system and<br />

method for assessing quality of a translation” and a<br />

Commercial Licence of this intellectual property to<br />

Digital Linguistics. This work has been developed further<br />

in several directions. First, we have compared supervised<br />

and less-supervised classification methods for the task of<br />

quality estimation (Moreau and Vogel, <strong>2012</strong>; Moreau and Vogel,<br />

under review), with a view to identifying the parameters<br />

that determine which method is preferable. Secondly, we have<br />

successfully deployed this approach to select<br />

items for training MT engines on the basis of similarity<br />

between potential training items and the intended<br />

material for translation. This work is a collaboration<br />

with the DCU MT team and is being written up<br />

for formal peer review. Thirdly, we have studied<br />

baselines in automated processing of texts produced<br />

by language learners for the identification of particular<br />

error types, such as incorrect preposition use (Lynch et al.,<br />

<strong>2012</strong>). Finally, we have used automatically discoverable<br />

features in texts to analyse potential translations, with<br />

approximately 80% accuracy not only on the binary<br />

classification problem of deciding whether a text is a<br />

translation or was originally written in English, but also<br />

on deciding which of several potential source languages<br />

the text was translated from (Lynch and Vogel, <strong>2012</strong>). In this case,<br />

the texts were not learner texts but professional literary<br />

translations.<br />

Additional basic advances in text classification methods<br />

have been explored in relation to the structural analysis of<br />

the sentences that make up texts, and to follow-on computation<br />

over the trees that model this structural analysis.<br />

Emms (<strong>2012</strong>) explored stochastic tree distances and<br />

their training with expectation-maximisation. Emms and<br />

Franco-Penya (<strong>2012</strong>a, <strong>2012</strong>b) establish empirical and<br />

analytical differences between the tree-distance and<br />

tree-similarity metrics proposed in the literature.<br />

Emms and Franco-Penya (<strong>2012</strong>c) demonstrate how<br />

mappings between trees can be used to identify<br />

the fillers of the semantic roles of predicates.



Multi-word expressions (MWEs) are challenging in<br />

text analytics and automatic translation. In the case of<br />

non-compositional MWEs, the meaning of the MWE<br />

is not simply a combination of the meanings of its<br />

constituent parts, which has a strong impact on translation,<br />

for example. <strong>CNGL</strong> research (Maldonado-Guerra, 2011)<br />

on automatically assessing the compositionality of<br />

meaning in fixed-word expressions (collocations) has<br />

been productive: the system exploits the intuition that<br />

a highly compositional collocation would tend to have<br />

a considerable semantic overlap with its constituents,<br />

whereas a collocation with low compositionality would<br />

share little semantic content with its constituents. This<br />

intuition is operationalised via three configurations that<br />

exploit cosine similarity measures to detect the semantic<br />

overlap between the collocation and its constituents.<br />

The system performed competitively in its evaluation. There<br />

are first-order and second-order approaches to vector<br />

encodings of word meanings. Maldonado-Guerra and<br />

Emms (<strong>2012</strong>) consider these systematically, introducing<br />

a matrix multiplication perspective on the second-order<br />

construction, and exploring both the geometry it induces<br />

and its performance on supervised and unsupervised<br />

word sense disambiguation/discrimination tasks. Motivated<br />

in part by the matrix multiplication perspective, work has<br />

also been carried out on a variety of matrix consolidation<br />

and dimensionality reduction techniques.<br />
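The compositionality intuition and the matrix-multiplication view of second-order vectors can be illustrated with a small sketch. It makes several assumptions:

- First-order vectors are raw co-occurrence counts, using a sentence-sized window.
- Collocations are pre-joined into single tokens such as red_tape.
- The second-order matrix is taken to be C·C. Each word's vector is then the count-weighted sum of its context words' first-order vectors.
- Compositionality is scored as the mean cosine between the collocation's vector and its constituents' vectors.

This is one plausible configuration in the spirit of the work described above. It is not one of the three configurations of the CNGL system, and all names and the toy corpus are hypothetical.

```python
import math
from collections import defaultdict


def first_order(sentences):
    """First-order vectors: for each word, counts of the other words that
    occur in the same sentence (a hypothetical sentence-sized window)."""
    counts = defaultdict(lambda: defaultdict(float))
    for sent in sentences:
        for i, word in enumerate(sent):
            for j, ctx in enumerate(sent):
                if i != j:
                    counts[word][ctx] += 1.0
    return counts


def second_order(counts):
    """Second-order vectors as the matrix product C @ C: the vector of a word
    is the sum of its context words' first-order vectors, weighted by count."""
    result = {}
    for word, row in counts.items():
        acc = defaultdict(float)
        for ctx, n in row.items():
            for feat, m in counts.get(ctx, {}).items():
                acc[feat] += n * m
        result[word] = acc
    return result


def cosine(u, v):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(x * v.get(key, 0.0) for key, x in u.items())
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def compositionality(vectors, collocation, constituents):
    """Mean cosine between the collocation's vector and each constituent's:
    high overlap suggests a compositional collocation, low overlap an idiom."""
    sims = [cosine(vectors[collocation], vectors[c]) for c in constituents]
    return sum(sims) / len(sims)


if __name__ == "__main__":
    # Toy corpus (invented); collocations are pre-joined into single tokens.
    corpus = [
        ["red_tape", "bureaucracy", "paperwork", "delay"],
        ["red_tape", "paperwork", "office"],
        ["red", "apple", "colour"],
        ["red", "colour", "paint"],
        ["tape", "sticky", "roll"],
        ["tape", "roll", "adhesive"],
        ["apple_juice", "apple", "fruit", "drink"],
        ["apple", "fruit", "tree"],
        ["juice", "drink", "glass"],
    ]
    vecs = first_order(corpus)
    print(compositionality(vecs, "red_tape", ["red", "tape"]))
    print(compositionality(vecs, "apple_juice", ["apple", "juice"]))
```

On this toy corpus, the idiomatic red_tape shares no context words with red or tape and scores 0.0. The compositional apple_juice scores about 0.49. Passing `second_order(vecs)` instead of `vecs` compares the denser second-order vectors.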

On-going work in ILT3 includes, for example, assessing<br />

whether information about linguistic hedges can be<br />

constructively used as a feature that predicts whether<br />

postings in online fora provided by industry partners are<br />

from individuals who ultimately will be rated as forum<br />

leaders. This is a natural development of our success in<br />

this area using Combinatory Categorial Grammar (CCG)<br />

representations of syntactic structures in combination<br />

with n-grams of sub-lexical (orthography and<br />

morphology) features, as well as sentence-level linguistic<br />

features. This work has been successful (Mamani<br />

Sanchez and Vogel, 2013; Vogel and Mamani Sanchez,<br />

<strong>2012</strong>). Firstly, we have noted that emoticon use is a kind<br />

of social signal: significant positive correlations exist<br />

between the use of positive emoticons and the propensity of<br />

posts to be rated as useful (and ultimately the within-forum<br />

rank of posters), and between the use of negative emoticons<br />

and un-ranked posters (presumably, individuals posting<br />

queries to expert users). Secondly, we have noted<br />

interacting effects of the use of linguistic hedges such as<br />

epistemic qualifiers (technical forum users who rate posts<br />

appear to prefer hedged responses).<br />

Parsing web data is challenging due to the scale and<br />

variety of data. To ascertain the current state of the art<br />

with respect to domain adaptation, Google organised a<br />

shared task at the SANCL-<strong>2012</strong> workshop at NAACL-HLT<br />

<strong>2012</strong> (Montreal, Canada). The DCU-Paris 13 parsing team<br />

won the Web-Parsing Challenge and Shared Task (Le<br />

Roux et al., <strong>2012</strong>), using the DCU LORG parser platform<br />

and domain adaptation techniques. Lexical resources<br />

are a crucial ingredient of many LT applications and are<br />

challenging to obtain automatically for highly inflecting<br />

languages such as Arabic. Attia et al. (<strong>2012</strong>a, <strong>2012</strong>b)<br />

show how a combination of finite-state and machine-learning-based<br />

technologies can be used to produce<br />

wide-coverage lexical resources for Modern Standard<br />

Arabic (MSA) using the Arabic Gigaword corpus<br />

together with data crawled from the Al Jazeera website,<br />

and how spell checking for MSA can be improved.<br />

Sentiment analysis is a key task in many LT applications.<br />

The DCU ILT3 team investigates the granularity of<br />

syntactic information required to improve sentiment<br />

analysis (Tu et al., <strong>2012</strong>).<br />

Collaborations<br />

Collaboration is at the core of <strong>CNGL</strong>, including<br />

close engagement with <strong>CNGL</strong> industry partners,<br />

university-based <strong>CNGL</strong> researchers, and international<br />

collaborators, as well as <strong>CNGL</strong> participation in international<br />

research projects (including EU FP7 funded projects).<br />

Collaboration is also particularly visible in our<br />

demonstrator systems, which draw on and combine<br />

research from the four <strong>CNGL</strong> research tracks focusing on<br />

industry partner needs and requirements.



Some of the ILT collaboration highlights in <strong>2012</strong> include:<br />

• In partnership with DCU and the National Centre for<br />

Language Technology (NCLT), <strong>CNGL</strong> was successful<br />

in a Marie Curie Mobility grant application for the<br />

EXPERT PhD Graduate School (€481K, DCU PI Prof.<br />

Josef van Genabith) with a total of 15 PhD Marie<br />

Curie fellowships (two of them at DCU) and three<br />

postdoctoral researchers. Led by the University of<br />

Wolverhampton (UK), EXPERT focuses on empirical<br />

approaches to (machine) translation, and as part of<br />

their training PhD students will spend time at DCU’s<br />

EXPERT university and industry partners.<br />

• In partnership with DCU and the National Centre for<br />

Language Technology (NCLT), <strong>CNGL</strong> was successful<br />

in an FP7 SA (Support Action) application called<br />

QTLaunchPad (€426K, DCU PI Prof. Josef van<br />

Genabith). Led by DFKI (German Research Centre<br />

for Artificial Intelligence), QTLaunchPad targets High<br />

Quality MT and funds two postdoctoral researcher<br />

positions at DCU.<br />

• In partnership with DCU, the National Centre for<br />

Language Technology and international collaborators,<br />

<strong>CNGL</strong> was successful in attracting €1M funding<br />

as lead partner (DCU Lead PI Dr. Antonio Toral)<br />

in the EU FP7 Abu-MaTran project, focusing on<br />

enhancing industry-academia cooperation as a key<br />

aspect to tackle one of Europe’s biggest challenges:<br />

multilinguality.<br />

• <strong>CNGL</strong> and ILT1, in partnership with DCU and the<br />

National Centre for Language Technology, are<br />

continuing their strong engagement in the<br />

EU FP7 Machine Translation projects PANACEA,<br />

CoSyne, PLuTO, and MultilingualWeb-LT as well<br />

as the META-NET/T4ME Network of Excellence:<br />

The CoSyne project focuses on multilingual<br />

content synchronisation for wikis. The project<br />

is led by the University of Amsterdam. DCU’s<br />

involvement centres on diagnostic linguistic-based<br />

evaluation of MT systems between multiple<br />

European languages.<br />

The PANACEA project aims to develop a platform<br />

for automatic, normalised annotation and cost-effective<br />

acquisition of language resources<br />

for human language technologies centred on<br />

interoperable web services. The Universitat<br />

Pompeu Fabra (Spain) is co-ordinating the STREP<br />

project. DCU’s role centres on developing word- and<br />

phrase-aligned data resources (including bilingual<br />

dictionaries and transfer grammars) from the<br />

acquired parallel corpora and using this data to<br />

build MT systems.<br />

META-NET aims to mobilise and build a network<br />

between various language technology research<br />

groups within Europe, including commercial<br />

providers of applications and services and other<br />

relevant stakeholders. DCU is heavily involved<br />

in dissemination activities, in organising<br />

workshops and in providing data sets and<br />

annotations for the use of machine learning<br />

techniques for MT system combination. In<br />

this way, the project hopes to bridge the gaps<br />

between the machine learning community and<br />

the MT research community. The network is led<br />

by DFKI (Germany). <strong>CNGL</strong> industry partners<br />

DNP, Microsoft, Symantec and Applied Language<br />

Solutions are members of META-NET.<br />

Pictured at the launch of the META-NET White Paper on The Irish<br />

Language in the Digital Age are its authors including (second from right)<br />

Prof. Ailbhe Ní Chasaide of <strong>CNGL</strong> and (centre) Mr. Dinny McGinley T.D.<br />

Minister of State for the Gaeltacht<br />

The PLuTO (Patent Translations Online) project is<br />

a PSP project focused on delivering a solution for<br />

online patent translation, including the use of MT<br />

and TM technologies tuned to the patent domain.<br />

This project is co-ordinated by DCU, who also look<br />

after research and development of patent-tuned<br />

MT systems for multiple languages.



<strong>CNGL</strong>, represented jointly through ILT1, LOC<br />

and SF2 (TCD, DCU and UL) and together<br />

with <strong>CNGL</strong> industry partners Microsoft and<br />

VistaTEC, was successful in attracting funding<br />

for the EU FP7 Support Action MultilingualWeb-LT,<br />

an international consortium with senior<br />

representatives of the translation industry as well<br />

as standards bodies, coordinated by Prof. Felix<br />

Sasaki, DFKI, Germany, to support research on the<br />

interoperability of language technologies on the<br />

web by defining new metadata standards.<br />

• Funding from DCU’s Ireland-India Fund and the<br />

Government of India’s India-Ireland Cooperative<br />

Science Programme is facilitating collaboration<br />

between <strong>CNGL</strong> and IIIT Hyderabad on English-Indian<br />

translation systems and has enabled DCU to co-organise<br />

a COLING-<strong>2012</strong> workshop on Machine<br />

Translating and Parsing Indian Languages<br />

(MTPIL-<strong>2012</strong>) in Mumbai, India.<br />

• ILT teams have forged extensive international<br />

research collaborations and have published widely<br />

with colleagues from prestigious institutions in many<br />

countries, including China, USA, Czech Republic,<br />

Germany, Spain, UAE, Belgium, Italy and Hungary.<br />

• Dr. Carl Vogel has been increasingly engaged in the<br />

EU COST Action IS1004, WebDataNet, on conducting<br />

iScience that avails of the opportunities emerging<br />

from Internet access to raw data and to research<br />

participants.<br />

• <strong>CNGL</strong> demonstrator systems are combining research<br />

teams across <strong>CNGL</strong> tracks, partner universities and<br />

industry partners:<br />

KantanMT – Moses on the Cloud: involves close<br />

collaboration between ILT and <strong>CNGL</strong> spinout<br />

Xcelerator Machine Translations Ltd.<br />

PLuTO – Facilitating Patent Search with Machine<br />

Translation: involves active collaboration with the<br />

PLuTO FP7 project at DCU<br />

Rapid MT Retraining: involves tight collaboration<br />

between ILT, SF, the FP7 PANACEA project at DCU,<br />

and the MultilingualWeb-LT project<br />

WebWOZ – A Wizard of Oz Platform: involves<br />

close collaboration between ILT and SF<br />

The <strong>CNGL</strong> Demonstrator Programme has<br />

promoted strong collaboration between ILT1 and<br />

ILT2 researchers in the demonstrator Personalising<br />

Speech for Interpersonal Communication<br />

(MySpeech). One highlight of this collaboration<br />

was to use the Wizard-of-Oz framework to conduct<br />

a preliminary evaluation of the MySpeech system<br />

for pronunciation training of foreign languages<br />

(Cabral et al., <strong>2012</strong>).<br />

• ILT3 has collaborated closely with DCM, ILT1, SF and<br />

<strong>CNGL</strong> industry partners (particularly Symantec and<br />

VistaTEC) and affiliates (Digital Linguistics) on text<br />

classification for particular applications.<br />

• ILT1 and ILT3 have been collaborating closely with<br />

researchers at the National Centre for Language<br />

Technology (NCLT) on using the LFG AA output to<br />

improve MT evaluation, extending the German LFG<br />

AA feature set to improve parsing the German side<br />

of the EuroParl data, improving the LFG-inspired<br />

constituency to dependency conversion, integrating<br />

multi-word expressions in the LFG AA, integrating<br />

MWEs into constituency parsing, and tuning a number<br />

of statistical parsing architectures to user-generated<br />

data (including Twitter data and user forum data).<br />

• ILT1 and ILT3, in collaboration with the National<br />

Centre for Language Technology (NCLT), are<br />

continuing their close research cooperation with<br />

<strong>CNGL</strong> industry partner Symantec on tuning MT and<br />

text analytics technologies to analyse user-generated<br />

content: in addition to the existing collaboration<br />

(Pratyush Banerjee, PhD student with ILT1), Symantec<br />

is funding research on tuning language technologies<br />

to user-generated text in partnership with IRCSET<br />

(Irish Research Council for Science, Technology and<br />

Engineering) through a project involving one PhD<br />

student and one postdoctoral researcher in a project<br />

led by Dr. Jennifer Foster.<br />

• ILT3 (Dr. Carl Vogel) is continuing collaborations<br />

with VistaTEC and Digital Linguistics, including<br />

preparations for joint publications. Engagement<br />

with Microsoft has commenced, leading to planned joint<br />

development in 2013 of text classification methods and<br />

tools for detecting offensive content in user fora<br />

(both linguistic and non-linguistic content).<br />

Text Classification for Bulk Localisation Review:<br />

involves active collaboration between ILT3, SF2<br />

and industry partner VistaTEC.



• <strong>CNGL</strong>, through the DCU MT team in ILT1, is<br />

continuing to work with Prof. Mikel Forcada from<br />

Universitat d’Alacant, Spain, following the success of<br />

his Walton Fellowship at <strong>CNGL</strong> in 2010. Prof. Forcada<br />

has continued to work with PhD student Sandipan<br />

Dandapat and visited <strong>CNGL</strong> again in <strong>2012</strong>. Prof. Mikel<br />

Forcada and Prof. Khalil Sima’an from the University<br />

of Amsterdam are partnering DCU in an EU FP7 application,<br />

DELIQAT, in the area of High Quality MT.<br />

Year 5 also saw two new PhD students join<br />

ILT2: Christoph Wendler and Maria O’Reilly joined the<br />

<strong>CNGL</strong> speech group at TCD in April and May <strong>2012</strong><br />

respectively.<br />

People<br />

Year 5 has been a very dynamic year in terms of arrivals<br />

and departures. Following Prof. Andy Way’s departure to an<br />

industry appointment in June 2011, and pending the appointment<br />

of a new Professor of Machine Translation, Prof. Josef<br />

van Genabith assumed the position<br />

of ILT co-track leader (along with Prof. Nick Campbell<br />

of TCD) as an interim arrangement, in addition to his<br />

roles as <strong>CNGL</strong> Director and ILT1 lead.<br />

Prof. Qun Liu has joined DCU, <strong>CNGL</strong> and the NCLT as<br />

Professor of Machine Translation and leader of the MT<br />

group. Prof. Liu was the Director of the Natural Language<br />

Processing Research Group in the Institute of Computing<br />

Technology at the Chinese Academy of Sciences (CAS)<br />

in Beijing. He has over 150 research publications and his<br />

work is widely cited internationally. He has produced<br />

ground-breaking research in many aspects of statistical and<br />

rule-based machine translation as well as in Chinese word<br />

segmentation and NLP. He has successfully led a large<br />

number of research projects at CAS. His research interests<br />

span Chinese Natural Language Processing, Machine<br />

Translation and Information Extraction. Prof. Liu has<br />

quickly become embedded in <strong>CNGL</strong> and made key contributions<br />

to an EU FP7 application currently under review.<br />

Some of the 11 visiting MSc and PhD scholars who worked with ILT<br />

during <strong>2012</strong> under <strong>CNGL</strong>’s postgraduate internship programme<br />

Eleven visiting MSc and PhD interns joined ILT over five<br />

months in <strong>2012</strong>, under <strong>CNGL</strong>’s postgraduate internship<br />

programme. The programme enables students to gain<br />

valuable experience as part of a highly-regarded and<br />

continually-growing research centre. This year’s<br />

programme attracted interns from institutions across the<br />

globe, including Italy, France, China and India. The<br />

internships covered a wide range of topics in Natural<br />

Language Processing and Machine Translation.<br />

Dr. Ergun Biçici joined <strong>CNGL</strong>, NCLT and the DCU MT<br />

team as a postdoctoral researcher from Koç University<br />

(Turkey) and is working on regression-based approaches<br />

for MT and parse quality estimation. Dr. Biçici has a<br />

strong background in machine learning and is<br />

contributing key expertise to the <strong>CNGL</strong> research teams.<br />

Dr. Ingmar Steiner joined the ILT2 group at UCD in June<br />

<strong>2012</strong> and worked jointly with the Speech Communication<br />

group at TCD. In December <strong>2012</strong> he moved to<br />

the Computational Linguistics and Phonetics department<br />

at DFKI, Saarbrücken, Germany, as a senior researcher, to<br />

set up an Independent Research Group.<br />

Prof. Qun Liu joined <strong>CNGL</strong> at DCU as Professor of Machine Translation<br />

in <strong>2012</strong>



A number of ILT researchers moved on to roles at other<br />

academic institutions or transitioned to industry during<br />

<strong>2012</strong>.<br />

Achievements<br />

Postdoctoral researcher Dr. Junhui Li (ILT1 DCU) took up<br />

a position as postdoctoral researcher at the University of<br />

Maryland (USA), where he is continuing his research on<br />

machine translation.<br />

Dr. Pavel Pecina (ILT1 DCU) accepted an appointment as<br />

Associate Professor in Machine Translation at Charles<br />

University in Prague (Czech Republic).<br />

Dr. Yifan He (former <strong>CNGL</strong> ILT1 PhD student and<br />

MT postdoctoral researcher) accepted a postdoctoral<br />

researcher position at New York University (USA).<br />

Hala Almaghout, Sandipan Dandapat, Pratyush<br />

Banerjee, Ankit Srivastava, Stephen Doherty and Irena<br />

Yanushevskaya successfully defended their PhD theses in<br />

<strong>2012</strong>.<br />

Dr. Sandipan Dandapat has taken up a lecturing position<br />

at IIT-Guwahati, Assam, India.<br />

After a dedicated contribution to <strong>CNGL</strong> over the past<br />

five years, Dr. Peter Cahill departed the ILT2 group to<br />

become engaged full-time in his spin-out company,<br />

Scream Technologies. His start-up company develops<br />

speech synthesis technology products which have<br />

valuable applications in areas as diverse as video games,<br />

customer support and advertising.<br />

John Kane from ILT2 submitted his thesis in September<br />

<strong>2012</strong> and is awaiting his defence. Meanwhile, he<br />

left <strong>CNGL</strong> in October to take up a research<br />

position with the Fastnet project at TCD. PhD<br />

fellow Amelie Dorn left TCD in November <strong>2012</strong>.<br />

Stephen Doherty was one of six ILT1 doctoral students to successfully<br />

defend their PhD theses during <strong>2012</strong><br />

Awards and Prizes<br />

• Prof. Josef van Genabith was recipient of the<br />

DCU President’s Research Award for Science and<br />

Engineering <strong>2012</strong>.<br />

• Prof. Carl Vogel and Liliana Mamani Sanchez (TCD)<br />

were awarded a best paper prize for their work<br />

“Epistemic Signals and Emoticons Affect Kudos”<br />

at the 3rd IEEE International Conference on Cognitive<br />

Infocommunications in December <strong>2012</strong>.<br />

• Dr. Martin Emms and Hector Franco-Penya (TCD)<br />

were recipients of a best paper award at the<br />

International Conference on Pattern Recognition<br />

Applications and Methods (ICPRAM <strong>2012</strong>) in February<br />

<strong>2012</strong>.<br />

• The DCU-Paris 13 team won the Web-Parsing<br />

Challenge and Shared Task organised by Google as<br />

part of SANCL-<strong>2012</strong> at NAACL-HLT <strong>2012</strong> (Le Roux,<br />

Foster, Wagner, Kaljahi and Bryl <strong>2012</strong>), using the<br />

DCU LORG parser platform and domain adaptation<br />

techniques.<br />

• Prof. Josef van Genabith (<strong>CNGL</strong>/NCLT/DCU) has<br />

been appointed as general chair of COLING 2014, to<br />

be held in Dublin in August 2014.<br />

UCD hosts Innovation and Applications in Speech Technology (IAST)<br />

Workshop in March



Prof. Josef van Genabith delivers his address before accepting the DCU<br />

President’s Research Award for Science and Engineering in February<br />

International Collaborations<br />

The EU FP7 projects are continuing on track, with<br />

PANACEA, PLuTO, CoSyne and T4ME/META-NET<br />

successfully passing their second-year reviews. Work<br />

carried out within the PLuTO project on patent-language<br />

MT has gained a significant amount of<br />

commercial interest and press coverage. EU FP7 Support<br />

Action MultilingualWeb-LT is running at full strength. The<br />

new EU FP7 support action (SA) project QTLaunchPad<br />

commenced in June <strong>2012</strong> and recruitment for the new<br />

Marie Curie EXPERT PhD programme is under way.<br />

Industry Engagement<br />

During <strong>2012</strong>, considerable effort was devoted to<br />

exploring avenues for commercialisation of ILT research<br />

and on developing industrially-relevant prototype and<br />

proof-of-concept systems. In turn, there has been a<br />

significant increase in commercial interest in the research<br />

we are carrying out in <strong>CNGL</strong>.<br />

Technology Innovation Development Award (TIDA)<br />

projects are funded by Science Foundation Ireland to<br />

support the transition of basic research outputs from the<br />

lab to industrial applications, primarily through industry-strength<br />

implementations and road-testing in commercial<br />

environments.<br />

ILT1 (MT) has been successful in attracting funding for two TIDA projects. TMTPrime (Machine Translation and Translation Memory Integration in a Localisation Workflow, Dr. Declan Groves, DCU) started in mid-2012 and focuses on developing an industry-strength application that optimally combines the outputs of Machine Translation (MT) systems with Translation Memory (TM) fuzzy matches, based on CNGL ILT1 basic research reported in He et al. (2010). The technology uses translation quality prediction to recommend either the MT or the TM output, based on estimated post-editing effort. The project is particularly important because TMs are still the mainstay technology in many localisation operations, and pricing models are based on TM reuse. TMTPrime technology guarantees that the MT/TM combination has TM-based pricing as an upper bound, with potentially substantial savings through the use of MT. Project partners include DCU, Symantec, VistaTEC and Welocalize.

The second ILT1 TIDA (Dr. Antonio Toral, DCU) focuses on Iterative Retraining of an MT System with Post-Edits. This matters because mistakes in MT output that professional translators have corrected should, as far as possible, be made available to the MT system as additional training material, so that similar mistakes are avoided in future. Two challenges need to be overcome: (i) full retraining of a statistical MT system is time-consuming and computationally expensive, and (ii) post-edits generally constitute a small amount of additional data that is unlikely to sway a substantial statistical MT model. Both challenges are addressed in the TIDA, partly based on previous CNGL ILT1 basic research reported in Banerjee et al. (2012). Recruitment for the Retraining TIDA is under way.
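The TMTPrime recommendation step can be pictured with a minimal sketch. It assumes, purely for illustration, that TM post-editing effort is approximated as one minus the fuzzy-match score, and that a separate quality-estimation model (not shown here) supplies a predicted post-editing effort for the MT output on the same 0 to 1 scale. The names `Candidate`, `tm_effort` and `recommend` are hypothetical and are not taken from TMTPrime.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    origin: str              # "TM" or "MT"
    text: str
    estimated_effort: float  # estimated share of the segment to post-edit (0.0 to 1.0)


def tm_effort(fuzzy_match: float) -> float:
    """Hypothetical proxy: a 100% fuzzy match needs no edits, a 72% match about 28%."""
    return 1.0 - fuzzy_match


def recommend(tm_text: str, fuzzy_match: float,
              mt_text: str, mt_predicted_effort: float) -> Candidate:
    """Return whichever suggestion has the lower estimated post-editing effort.

    mt_predicted_effort is assumed to come from a quality-estimation model.
    Ties go to the TM match, so the chosen estimate never exceeds the TM estimate.
    """
    tm = Candidate("TM", tm_text, tm_effort(fuzzy_match))
    mt = Candidate("MT", mt_text, mt_predicted_effort)
    return mt if mt.estimated_effort < tm.estimated_effort else tm


if __name__ == "__main__":
    choice = recommend("Klicken Sie auf Speichern, um die Datei zu sichern.", 0.72,
                       "Klicken Sie auf Speichern, um das Dokument zu sichern.", 0.15)
    print(choice.origin)  # MT: predicted 0.15 is below the TM proxy of about 0.28
```

Because the TM match is kept unless MT is predicted to be cheaper, the selected effort is bounded by the TM estimate. In this toy version that bound holds only with respect to the estimates, so in practice it depends on how reliable the quality predictor is.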

Parts of CNGL’s MT technology have been successfully licensed for evaluation to a new spin-out company, Xcelerator Machine Translations Ltd. Founded by Tony O’Dowd, previously CEO of CNGL industry partner Alchemy, Xcelerator provides cloud-based MT solutions to individual translators and mid-sized localisation service providers through its KantanMT platform. The company’s vision is to make machine translation simple for everyone to use.



Dr. Peter Cahill, formerly a full-time academic with CNGL ILT2 at UCD, leads the CNGL spin-out company Scream Technologies. The company specialises in creating synthetic voices from human actors, enabling companies to create human-sounding synthetic speech and to control how it sounds. Dr. Cahill has been named as one of Ireland’s top technology and start-up leaders.

Continued collaborations between ILT3 and VistaTEC, Digital Linguistics and Symantec are planned. 2013 will also bring direct engagement with Microsoft through the deployment, in the context of CNGLII, of text classification tools developed in CNGLI, particularly for classifying offensive content (covering both linguistic and non-linguistic content).

Plans for 2013

For ILT, 2012 (CNGL Year 5) was dominated by high research and publication output, a large number of ILT PhD students completing, strong industry engagement, and extensive preparations for the second-cycle CNGL (CNGLII) application and site review.

ILT technologies will be spread across three key CNGLII themes supporting the Global Content Value Chain-based architecture of CNGLII: ILT3 (Text Analytics) will move into the CNGLII Curation theme, ILT1 (MT) into the Translation and Localisation theme, and ILT2 (Speech) into the Delivery and Interaction theme.

2013 will see the completion of a number of ILT-affiliated EU FP7 projects, including the CoSyne and PANACEA STREPs, the META-NET/T4ME Network of Excellence, and the PLuTO Public Private Partnership, all with key involvement and successful contributions from project partner DCU.

At the same time, the ILT-affiliated EU FP7 Support Action QTLaunchPad will be at full steam in 2013. QTLaunchPad is charged with developing research and innovation scenarios, including community mobilisation and technology support for shared tasks in high-quality machine translation, focusing on novel quality metrics, quality estimation and targeting specific MT quality barriers. QTLaunchPad partner DCU is contributing key expertise. Likewise, the prestigious EXPERT EU Marie Curie PhD graduate school and mobility programme was launched at the end of 2012, and PhD candidates will start in early 2013. EXPERT partner DCU will host two PhD students working on MT system combination and human-centric aspects of MT technology development. The EU FP7-funded MultilingualWeb-LT support action involves CNGL partners TCD, DCU, UL, Microsoft and VistaTEC, and continues to focus on developing important standards and interoperability for multilingual content management. The EU FP7 Abu-MaTran project (Dr. Antonio Toral) will tackle the multilingualism challenge through an industry-academia partnership.

The first CNGL funding cycle is entering a no-cost extension phase (December 2012 – November 2013), completing a small number of CNGL research and PhD projects and preparing and supporting the transition to CNGLII.

ILT1: Machine Translation

Prof. Qun Liu has fully taken charge of the DCU MT Group and will drive cooperation with research partners, in particular at the Chinese Academy of Sciences, as well as exploring commercial opportunities in localisation with Chinese industry partners.

Walid Aransa (LIUM, France), Luong Ngoc Quang (LIG, France) and Dr. Antonio Toral (DCU) pictured at the MT Marathon 2012 in Edinburgh in September



Commercial activities will continue to be a strong focus for ILT1 in 2013, in particular extended collaborations with CNGL start-up company Xcelerator and CNGL industry partners Welocalize and Symantec. The TMTPrime TIDA (Dr. Declan Groves) has produced mature TM/MT combination technologies based on automatic quality prediction, which will be showcased at global localisation industry events including GALA (Miami, 2013). The second TIDA, on efficient MT retraining technologies (Dr. Antonio Toral), will provide new opportunities to use user feedback (such as post-editing corrections) immediately to improve MT.
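One way to picture the two retraining challenges is a toy phrase table that keeps raw, weighted counts: post-edited phrase pairs can then be folded in without re-estimating from the whole corpus, and can be up-weighted so that a handful of corrections is not swamped by the baseline data. This is a hypothetical illustration only. The class name, the phrase pairs and the weight of 5.0 are our own assumptions, and this is not the approach taken in the Retraining TIDA or in Banerjee et al. (2012).

```python
from collections import Counter, defaultdict


class IncrementalPhraseTable:
    """Toy phrase table storing weighted counts instead of final probabilities,
    so that new phrase pairs can be added without re-estimating from scratch."""

    def __init__(self):
        # source phrase -> Counter mapping target phrase -> weighted count
        self.counts = defaultdict(Counter)

    def add_pairs(self, pairs, weight=1.0):
        """Add (source, target) phrase pairs; weight > 1 boosts scarce post-edit data."""
        for src, tgt in pairs:
            self.counts[src][tgt] += weight

    def prob(self, src, tgt):
        """Relative-frequency estimate p(tgt | src) from the current counts."""
        options = self.counts.get(src)
        if not options:
            return 0.0
        return options[tgt] / sum(options.values())

    def best(self, src):
        """Most probable target phrase for src, or None if src is unseen."""
        options = self.counts.get(src)
        if not options:
            return None
        return max(options, key=options.get)


if __name__ == "__main__":
    table = IncrementalPhraseTable()
    # Baseline counts from the original training corpus (invented example data)
    table.add_pairs([("save the file", "fichier enregistrer")] * 8)
    table.add_pairs([("save the file", "enregistrer le fichier")] * 2)
    # Two translator post-edits, up-weighted so they can override the baseline
    table.add_pairs([("save the file", "enregistrer le fichier")] * 2, weight=5.0)
    print(table.best("save the file"))  # enregistrer le fichier
    print(round(table.prob("save the file", "enregistrer le fichier"), 2))  # 0.6
```

With a weight of 1.0 the two post-edits would bring the corrected phrase to 4 counts against 8, and the baseline preference would survive; the weighting is what lets the small amount of feedback take effect immediately in this sketch.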

ILT2: Speech

ILT2 will see the completion of ongoing PhD theses on prosody, speech-to-speech translation and emotive speech. Following Dr. Peter Cahill’s (UCD) departure to lead the CNGL spin-out company Scream Technologies, the remaining UCD speech group (Dr. Joao Cabral) will transition to Prof. Nick Campbell’s Delivery and Interaction theme at TCD early in 2013.

ILT3: Text Analytics

ILT3 will complete documentation of results from using ILT3 text-classification methods to select appropriate items for training MT systems for data sets that otherwise have little directly appropriate material.

The analysis of epistemic markers and social signals in expert forum contexts has shown promise. ILT3 will continue to develop these analyses and seek additional ways to fund follow-on study, including through exploitation of the methods developed and the conclusions drawn from the study of linguistic and pragmatic behaviours in the Symantec user forum.

Participation in text classification tasks is already planned for an upcoming CLEF shared task, in the areas of spotting predatory contributions in social networks and other authorship attribution exercises.

Work on text classification methods extends into CNGLII, with collaborations planned with VistaTEC, Digital Linguistics, Symantec and Microsoft.


Digital Content Management



Strand Name: Digital Content Management

AREA CO-ORDINATOR: PROF. VINCENT WADE

Industrial Collaborators
Dr. Fred Hollowood, Symantec
Dr. Johann Roturier, Symantec
Mr. Jason Rickard, Symantec
Mr. Dag Schmidtke, Microsoft
Dr. Alexander Troussov, IBM
Mr. Takeshi Fukunaga, Dai Nippon Printing
Mr. Hideyuki Suzuki, Dai Nippon Printing

International Collaborators
Prof. Helen Ashman, University of South Australia
Dr. Prasenjit Majumder, DAIICT, Gandhinagar, India

Participant Names and Affiliation

Faculty
Dr. Owen Conlan, Trinity College Dublin, DCM3
Dr. Gareth Jones, Dublin City University, DCM1 (Work Package Leader)
Prof. Declan O’Sullivan, Trinity College Dublin, DCM2
Dr. Claus Pahl, Dublin City University, DCM2
Ms. Mary Sharp, Trinity College Dublin, DCM3
Dr. Tony Veale, University College Dublin, DCM2 (Work Package Leader)
Prof. Vincent Wade, Trinity College Dublin, DCM3 (Work Package Leader)

Postdoctoral Researchers
Dr. Declan Dagger, Trinity College Dublin, DCM3
Dr. Yanfen Hao, University College Dublin, DCM2
Prof. Séamus Lawless, Trinity College Dublin, DCM3
Dr. Johannes Leveling, Dublin City University, DCM1
Dr. Alexander O’Connor, Trinity College Dublin, DCM2
Mr. Ian O’Keeffe, Trinity College Dublin, DCM3
Dr. Melike Sah, Trinity College Dublin, DCM2
Dr. Dong Zhou, Trinity College Dublin, DCM1



DIGITAL CONTENT MANAGEMENT

PhD Students
Mr. Yalemisew Mintesinot Abgaz, Dublin City University, DCM2
Ms. Yi Chen, Dublin City University, DCM3
Mr. Mourad El Moueddeb, University College Dublin, DCM2
Ms. Bo Fu, Trinity College Dublin, DCM2
Mr. Debasis Ganguly, Dublin City University, DCM3
Mr. Mohammed Rami Ghorab, Trinity College Dublin, DCM1
Mr. Brendan Spillane, Trinity College Dublin, DCM2
Mr. Muhammad Javed, Dublin City University, DCM2
Mr. Kevin Koidl, Trinity College Dublin, DCM3
Mr. Killian Levacher, Trinity College Dublin, DCM2
Mr. Guofu Li, University College Dublin, DCM2
Ms. Wei Li, Dublin City University, DCM2
Ms. Alejandra López Fernández, University College Dublin, DCM2
Mr. Walid Magdy, Dublin City University, DCM1
Mr. Jinming Min, Dublin City University, DCM1
Ms. Catherine Mulwa, Trinity College Dublin, DCM3
Mr. Neil Peirce, Trinity College Dublin, DCM3
Mr. Ben Steichen, Trinity College Dublin, DCM3

Research Assistants
Mr. David Foley, Trinity College Dublin, DCM3
Mr. Brian Gallagher, Trinity College Dublin, DCM3
Ms. Yang Yang, Trinity College Dublin, DCM3

Funding

2012 Funding from SFI
CNGL (07/CE/I1142): €680,141
SFI TIDA Award – UNITE: Personalised Cross-site Personalisation: €60,000

2012 Funding from Other Sources
EC FP7 Cendari (TCD): €120,000
Enterprise Ireland Learning Technology Centre: €3,000,000
SFI TIDA Award – Linguabox: Automated Open Content Repurposing Service to support Personalised eLearning: €87,768
SFI TIDA Award – An Integrated Software Suite to provide Next Generation Personalised Multilingual Customer Care: €67,748



Research Overview: Digital Content Management (DCM)

Goals

The key challenge of the DCM research track is to provide a step change in multilingual digital content management to enable the delivery of next generation localisation². DCM focuses on three areas: (i) user query enhancement; (ii) content metadata and knowledge model development; and (iii) adaptive content retrieval and dynamic composition of localised content (customised for the user’s needs and context of use). Because users need to access information across many content boundaries, DCM research covers not just traditional corporate content but also open-corpus content, user-generated content (discussion fora, blogs, wikis) and social networking interactions (tweets, postings, shared ‘walls’, location check-ins, etc.). The DCM research track is divided across three work areas, called DCM1, DCM2 and DCM3:

• Enhancement of user queries based on user context information and feedback (DCM1)
• Automation and semi-automation of the generation of knowledge models and metadata, and of the identification of sentiment, as required for digital content management and personalised (re)composition (DCM2)
• Support for dynamic composition of personalised digital content, customised for the user’s needs and context, across such diverse content areas as corporate content, open corpora, user-generated content and content generated via social networking (DCM3)

This research is integrated across the CNGL research tracks via combined prototypes, experiments and the CNGL Demonstrators. DCM has demonstrated its ground-breaking technologies in many application domains, such as Personalised Multilingual Customer Care, Personalised Multilingual Social Networking, and Personalised Information and Learning Portals.

Such demonstrator systems allow DCM research to illustrate the impact of its technology as well as to demonstrate the benefits of integration with all other CNGL research tracks. For example, DCM researchers collaborate with ILT’s experts on multilingual translation, speech recognition/synthesis for multimodal operation, and text analysis for enhanced understanding of the content.

² Next Generation Localisation seeks to enable people to interact with digital content, products, services and each other in their own language, according to their own culture, and according to their own personal needs and preferences.

Research Barriers and Methodologies to Address Them

With the increasing volume of digital content and the diversity of sources from which it is created (e.g. corporate content, user-generated content, social networking, community content), it is becoming impossible to discover, manually annotate, slice and compose appropriate digital content, rendered in the language and for the device suited to the intended users. In addition, next generation localisation is not just about corporate localisation but must be adapted to the individual user’s context, languages, preferences and means of access. Therefore, next generation localisation must not only adapt to specific corporate localisation requirements but also satisfy the individual user’s need for information by adapting it to the context, language, preferences and preferred delivery device of the individual. DCM research in Year 5 focused increasingly on addressing the problems of dynamic user-generated (multilingual) content as well as corporate and open web content. This increased integration of global social media into DCM research is a significant development of CNGL research.

The three principal areas of DCM research relate to the challenges of (i) more accurately identifying and selectively retrieving appropriate content; (ii) capturing and modelling knowledge in a structured, reusable way so that multilingual, heterogeneous content can be more easily managed and transformed; and (iii) supporting users by harnessing adaptivity/personalisation (based on the user’s context) to give them significantly improved exploration of the information they need. This research also involves developing new ways to evaluate the impact and performance of adaptive (personalised) systems. A central theme running through all of these challenges is the need to provide information in a form tailored to the user’s requirements, preferences and context, which not only includes the direct response to his/her initial queries but also delivers a unique information presentation tailored to his/her context, preferences and task.



The approach taken in DCM research is to enhance and combine key aspects of Adaptive Hypermedia (AH) and Information Retrieval (IR) research to provide techniques, technology and prototype systems that implement advanced retrieval, slicing and adaptive composition of multilingual digital content. The DCM1 work package addresses personalised and contextualised multilingual IR and, more specifically, query enhancement. DCM1 research includes the application of cross-lingual techniques to give users access to information that is not in their native language. It also focuses on Personalised IR (PIR), which uses user modelling techniques to alter the behaviour of IR systems. The approach employs techniques from IR and AH to produce hybrid Adaptive IR systems.

The focus of DCM2 is on the metadata and knowledge models that systems require to provide this more intelligent behaviour. DCM2 includes work on generating, managing and linking structured knowledge in the form of ontologies, content knowledge models and metadata descriptions. The main focus of this work is on addressing the shortcomings of current approaches to creating and sharing metadata between different intelligent systems, slicing content so that it can be more easily reused and recomposed (for personalisation), and deriving knowledge models to determine aspects of the content and user context, e.g. sentiment.

Finally, DCM3 focuses directly on recomposing and aggregating content and on evaluating the quality and impact of adaptive systems. A key aspect of this challenge is the source of the content. DCM3 investigates the automatic recomposition and aggregation of content from corporate information repositories, open documents, user fora and discussion lists, blogs, shared community content (wikis), social networking interactions and social media (tweets, postings, shared ‘walls’, etc.). DCM3 focuses on aggregating and recomposing these different forms of digital content to provide personalised responses for a user.

Although presented separately above, the three work packages are highly integrated. For example, the metadata and knowledge models produced in DCM2 are used in DCM3 and DCM1. Likewise, the techniques developed in DCM1 for multilingual query enhancement and Personalised IR are integrated with the adaptive hypermedia composition and social media aggregation techniques developed within DCM3.

DCM undergraduate intern Ciarán Porter of Trinity College Dublin (above right) presents his work on ‘Crowd Sourcing for Query Development and Relevance Judgement’ at the CNGL undergraduate intern showcase

Year 5 Progress

DCM research in Year 5 achieved significant impact, both in the quality of its scientific breakthroughs and in the demonstration of industrial potential. DCM published over 30 peer-reviewed journal and international conference papers this year. Journal publications include papers in ACM Computing Surveys, UMUAI, the Journal of IR and the Journal of Web Semantics, while international conference papers include publications at ACM Hypertext, SIGIR, AAAI, UMAP, COLING, CIKM and TPDL.

Progress in DCM1

The research conducted in DCM1 has continued to deliver significant advances in adaptive information retrieval (IR). These advances are achieved by enhancing both the queries a user submits to a search engine and the results that are returned. The techniques developed by DCM1 use contextual information about individual users, together with implicit and explicit feedback, to create more accurate or more appropriate queries, to improve the relevance of search results and to tailor the presentation of results to that individual.



Continued progress has been made in enhancing the existing Personalised Multilingual IR (PMIR) framework, which employs a number of algorithms to perform both query and result adaptation. This work is part of the DCM1 focus on intelligent content discovery and delivery that is both multilingual and personalised. The PMIR framework allows multilingual resource discovery and delivery using on-the-fly machine translation of user queries and content. The Microsoft Bing API is used to perform multilingual searches on the open web, and result lists are then personalised for the individual user before being presented. The framework is designed so that new components and approaches can easily be integrated and tested as part of an overall search process. In 2012 the development of the framework was completed, and the completed framework was thoroughly evaluated in an experiment with real users in an authentic search setting. This evaluation demonstrated improvements in multilingual information retrieval using query and result adaptation based on a multilingual user model. This research has been detailed in the high-impact journal User Modeling and User-Adapted Interaction, UMUAI (Ghorab et al., 2012a). The framework was also successfully showcased at the 20th International Conference on User Modeling, Adaptation and Personalization, UMAP 2012, in Montréal, Canada (Ghorab et al., 2012b). An additional paper has been submitted to the 22nd International World Wide Web Conference and is awaiting review.
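The result-adaptation step, in which an engine's result list is personalised before presentation, can be sketched as re-ranking that mixes the engine's original rank with similarity to a user model. This is a minimal illustration under our own assumptions: the user model is a bag of weighted terms, similarity is cosine over word counts, and the mixing weight `alpha` is arbitrary. It does not call the Bing API and is not the PMIR algorithm described in Ghorab et al. (2012a); `personalise` and the example data are hypothetical.

```python
import math
from collections import Counter


def term_vector(text):
    """Bag-of-words term counts (lower-cased, whitespace-split)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def personalise(results, user_profile, alpha=0.5):
    """Re-rank (title, snippet) results already ordered by a search engine.

    Each result scores alpha * engine_score + (1 - alpha) * profile_similarity,
    where engine_score falls linearly with the engine's original rank.
    """
    n = len(results)
    scored = []
    for rank, (title, snippet) in enumerate(results):
        engine_score = 1.0 - rank / n
        profile_score = cosine(term_vector(title + " " + snippet), user_profile)
        scored.append((alpha * engine_score + (1 - alpha) * profile_score, rank, title))
    scored.sort(key=lambda item: (-item[0], item[1]))  # ties keep the engine order
    return [title for _, _, title in scored]


if __name__ == "__main__":
    profile = Counter({"java": 3, "programming": 2})  # hypothetical user model
    results = [
        ("Java travel guide", "beaches and volcanoes of indonesia"),
        ("Java programming tutorial", "learn java programming step by step"),
        ("Coffee brewing", "how to brew coffee"),
    ]
    print(personalise(results, profile))
    # ['Java programming tutorial', 'Java travel guide', 'Coffee brewing']
```

Setting `alpha` to 1.0 reproduces the engine's order, while lower values let the user model promote results closer to the user's interests.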

DCM1 has continued research in cross-language IR and IR for low-resourced languages such as Bengali and Hindi. A variant of PRES (Patent Retrieval Evaluation Score), the metric for evaluating patent retrieval effectiveness developed previously by DCM1, was developed as an evaluation metric for the speech retrieval domain. PRES has continued to be taken up successfully in official patent retrieval benchmarking tasks at international conferences and competitions, e.g. CLEF 2012 (CLEF-IP).

DCM research has focused to a greater extent on processing user-generated queries and content such as tweets and SMS, as well as on processing noisy domain-specific data. DCM researchers have found that information retrieval tasks on such user-generated content can benefit from error correction (e.g. of OCR and spelling errors) and from handling domain terminology (e.g. abbreviations, acronyms and technical terms). DCM established a simple but strong retrieval baseline (without domain adaptation) which would have ranked among the top five participating groups at the international TREC Medical Records track (TRECMed) in 2011.

Collaborative research has continued with DCM3 to enhance techniques for personalising web search using social tagging data. Personalised query expansion is performed, which helps to solve the vocabulary mismatch problem (Zhou et al., 2012b). A novel query expansion framework has been developed which generates individual user models from data mined from the annotations a user has made and the resources the user has bookmarked on the social bookmarking platform Del.icio.us. This approach has been extensively evaluated using test collections created by crawling authentic social media data from the web. This has resulted in a high-impact publication in the Journal of Information Retrieval, a top IR venue (Zhou et al., 2012a).

Prof. Séamus Lawless of CNGL presents research on Web Search Personalization Using Social Data at TPDL 2012 in Paphos, Cyprus, in September 2012
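The idea behind personalised query expansion from social bookmarking data can be illustrated with a much simpler co-occurrence scheme than the framework of Zhou et al. (2012b): tags that a user has repeatedly applied alongside a query term are appended to the query, so that a search for an ambiguous term is steered towards that user's vocabulary. The function name, the scoring (plain co-occurrence counts) and the example tags are our own assumptions, not the published method.

```python
from collections import Counter


def expand_query(query_terms, user_bookmark_tags, k=2):
    """Expand a query with tags the user has applied alongside the query terms.

    query_terms: list of query words.
    user_bookmark_tags: one set of tags per resource the user bookmarked.
    k: number of expansion terms to add.
    """
    terms = [t.lower() for t in query_terms]
    term_set = set(terms)
    scores = Counter()
    for tags in user_bookmark_tags:
        tags = {t.lower() for t in tags}
        if tags & term_set:  # this bookmark is tagged with at least one query term
            for tag in tags - term_set:
                scores[tag] += 1
    # Highest co-occurrence first; alphabetical order breaks ties deterministically
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return terms + [tag for tag, _ in ranked[:k]]


if __name__ == "__main__":
    bookmarks = [  # hypothetical tags mined from one user's bookmarks
        {"python", "programming", "tutorial"},
        {"python", "snake", "pets"},
        {"python", "programming", "django"},
        {"cooking", "recipes"},
    ]
    print(expand_query(["Python"], bookmarks))  # ['python', 'programming', 'django']
```

A different user whose bookmarks pair "python" with "snake" and "pets" would receive a different expansion for the same query, which is the personalisation effect the paragraph describes.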

DCM1 researchers have also been involved in organising various important IR events and workshops, most notably the 36th Annual ACM SIGIR Conference, which CNGL will host in Dublin in July 2013. SIGIR has significant leadership drawn from CNGL (DCM) academics and staff:
• General Chair – Dr. Gareth Jones
• Workshops Co-Chair – Prof. Vincent Wade
• Tutorials Co-Chair – Prof. Séamus Lawless
• Local Organising Chair – Prof. Séamus Lawless
• Publications Chairs – Dr. Liadh Kelly and Dr. Lorraine Goeuriot



DCM has also cooperated with researchers in other CNGL tracks and with commercial partners. DCM1 continued to integrate with research components produced by DCM2 and DCM3, as well as components from ILT, LOC and SF, as part of the overall CNGL Demonstrator Programme. The ClipArt search demo system was showcased at the SFI review and at the Localisation Innovation Showcase in 2011. It was subsequently adapted to mobile devices such as iPhones and iPads to provide mobile image search, and was showcased at SIGIR 2012.

Dr. Páraic Sheridan and Dr. Gareth Jones of CNGL introduce SIGIR 2013 to attendees at SIGIR 2012 in August 2012 in Portland, Oregon, USA. CNGL will host SIGIR 2013 in Dublin in July

In terms of publications, DCM1 has maintained significant success in top journals and high-impact conferences, including ACM Computing Surveys (#1 ranked journal in computer science in the world), the Journal of Information Retrieval, the Journal of User Modeling and User-Adapted Interaction, UMAP 2012, TPDL 2012 and DocEng 2012. DCM has continued research in cross-language IR and IR for low-resourced languages such as Bengali and Hindi (Ganguly et al., 2012; Ganguly et al., 2012b; Ganguly et al., 2012c; Leveling, 2012).

Topic models are a recent research topic in IR; they can be used to model topical cohesion in digital content and to enhance IR effectiveness in general (Ganguly et al., 2012b; Ganguly et al., 2012c). Ongoing work in DCM aims to improve the user’s search experience, query formulation and navigation in search results through topic model visualisation. DCM research continues to focus on domain adaptation and domain-specific IR. In the medical domain, we conducted retrieval experiments on patient records from the TREC medical record retrieval track.

DCM established a simple but strong retrieval baseline (without domain adaptation) which would have ranked among the top five participating groups on the 2011 data (Leveling et al., 2012). DCM also investigated adaptation of search to mobile devices (Leveling and Jones, 2012; Min et al., 2012). One experiment explored a novel method for rewriting textual content to fit on devices with limited screen size (i.e. mobile devices) while retaining readability. In addition, collaboration with the machine translation research group in the ILT track investigated combining techniques from information retrieval and machine translation to speed up fuzzy matching for machine translation (Leveling et al., 2012b).
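The general idea of using IR to speed up fuzzy matching can be sketched as a two-stage lookup: an inverted index first shortlists translation-memory segments that share words with the input, and the comparatively expensive edit-distance score is computed only for that shortlist rather than for every TM entry. This sketch assumes word-overlap counting as the IR stage and word-level Levenshtein distance as the fuzzy-match score; the class and parameter names are hypothetical, and it is not the specific method of Leveling et al. (2012b).

```python
from collections import defaultdict


def edit_distance(a, b):
    """Word-level Levenshtein distance between two token lists."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        cur = [i]
        for j, y in enumerate(b, 1):
            cur.append(min(prev[j] + 1,             # delete x
                           cur[j - 1] + 1,          # insert y
                           prev[j - 1] + (x != y)))  # substitute (free if equal)
        prev = cur
    return prev[-1]


def fuzzy_score(a, b):
    """Fuzzy-match score in [0, 1]: 1 minus edit distance over the longer length."""
    return 1.0 - edit_distance(a, b) / max(len(a), len(b), 1)


class TranslationMemoryIndex:
    def __init__(self, segments):
        self.segments = list(segments)
        self.tokens = [s.lower().split() for s in self.segments]
        self.postings = defaultdict(set)  # word -> ids of TM segments containing it
        for seg_id, toks in enumerate(self.tokens):
            for tok in set(toks):
                self.postings[tok].add(seg_id)

    def best_match(self, query, shortlist_size=10):
        """Return (segment, score) for the best fuzzy match, or (None, 0.0)."""
        q = query.lower().split()
        # Stage 1 (IR): rank TM segments by the number of words shared with the query
        overlap = defaultdict(int)
        for tok in set(q):
            for seg_id in self.postings.get(tok, ()):
                overlap[seg_id] += 1
        shortlist = sorted(overlap, key=lambda s: (-overlap[s], s))[:shortlist_size]
        if not shortlist:
            return None, 0.0
        # Stage 2: compute the expensive edit distance only for the shortlist
        score, neg_id = max((fuzzy_score(q, self.tokens[s]), -s) for s in shortlist)
        return self.segments[-neg_id], score


if __name__ == "__main__":
    tm = TranslationMemoryIndex(["Click Save to store the file",
                                 "Click Cancel to close the dialog",
                                 "The printer is out of paper"])
    segment, score = tm.best_match("Click Save to store the document")
    print(segment, round(score, 3))  # Click Save to store the file 0.833
```

The speed-up comes from `shortlist_size`: the trade-off is that a true best match sharing few exact words with the input could be missed by the first stage.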

The approaches above have been extensively evaluated on benchmark data provided by the organisers of TREC, CLEF, INEX and FIRE, as well as on collections created by crawling social media data (Leveling, 2012; Ganguly et al., 2012; Leveling et al., 2012; Leveling and Jones, 2012).

Two of our PhD students have completed internships at Microsoft Ireland in the area of multilingual querying and personalisation.

DCM1 researchers also organised various important IR events and workshops. CNGL co-organised the second workshop on Personalised Multilingual Hypertext Retrieval (PMHR 2012) at Web Science 2012, and DCM1 researchers organised an evaluation task on personalised and collaborative information retrieval (PIR) at FIRE 2012.

A significant number of CNGL-supervised Masters dissertations and final-year undergraduate projects were submitted and graded in 2012. A DCM1-specific Masters dissertation investigating “Selecting Appropriate Verticals for Web Search Results” is currently underway in Trinity College Dublin under the supervision of Prof. Séamus Lawless.


Centre for Next Generation Localisation Annual Report 2012

Progress in DCM2

DCM2, the work package concerned with digital content knowledge modelling, extraction and organisation, recorded several key successes during 2012.

Research in structural content analysis for web slicing has progressed significantly. This has included the development and evaluation of a slicing tool that can extract the important textual content from open-corpus web pages. The system was evaluated in a successful user trial, which demonstrated the applicability of the technique to language learning. This work will be developed further in 2013 under the SFI TIDA programme to support language learning through user-relevant resources harvested from the open web. This research is expected to result in a completed PhD in early 2013.

The collaboration between DCM2 and DCM3 has continued in the area of Personalised Multilingual Customer Care. It has resulted in an SFI feasibility project that developed a commercial-strength version of the research software. The resulting Emizar system provides users with a modern, supportive environment for personalised access to federated content across several support repositories.

DCM2 researchers have also continued to work in the Digital Humanities domain, collaborating with CULTURA, an EU FP7 affiliate project coordinated at Trinity College Dublin.

DCM2 researchers have published at several key conferences in areas such as eLearning, Hypertext and Hypermedia, and have had several successful grant applications, including an SFI Technology Innovation Development Award (TIDA).

Research and development in DCM2 on lightweight subject models matured and converged in 2012 through the development of the MOODfinger framework for affective news retrieval. MOODfinger continuously gathers and indexes daily news from major web news sites, and performs affective analysis of each news story to support later affective retrieval. Lightweight stereotypical models of familiar ideas are automatically acquired from the web and are used to identify the most interesting and most affect-rich areas of a news story. These models support affective query expansion and the subsequent affective summarisation of any retrieved news. Publications on MOODfinger were presented at top natural language processing (NLP) and web conferences in 2012, including ACL 2012 and WWW 2012. The MOODfinger prototype, together with related natural language technologies developed within DCM2, was also showcased in public demonstrations at these conferences. MOODfinger represents both a culmination of work in DCM2 and a sound foundation for future work in affective text understanding, and it continues to be actively maintained and developed.

Much of the MOODfinger work in DCM2 has focused on the challenges posed by creative language use, that is, the non-obvious use of familiar words and ideas. Several publications showcase our achievements in this area. These include the monograph Exploding the Creativity Myth: The Computational Foundations of Linguistic Creativity (T. Veale, Bloomsbury Academic) and the collected volume Creativity and the Agile Mind (principal co-editor T. Veale, Mouton de Gruyter). We have helped shape European policy on computational creativity by contributing to expert consultation sessions with the European Commission. We have also influenced the latest EU ICT call, which now explicitly lists Computational Creativity as a fundable objective. Building on work in DCM2, we have secured EU funding for an international coordination action to promote the field of computational creativity (PROSECCO: PROmoting the Scientific Exploration of Computational Creativity). The project will run for three years under the scientific leadership of T. Veale in UCD. Through its organisation of contact forums, summer schools and code camps, it will serve as a force multiplier for disseminating the results of DCM2 research. The leadership role of DCM researchers in the computational creativity community was further emphasised by UCD’s organisation of the 3rd International Conference on Computational Creativity (ICCC 2012) in Dublin in May 2012, which received logistical and financial support from CNGL.



DIGITAL CONTENT MANAGEMENT

Mr. Seán Sherlock, T.D., Minister for Research and Innovation, launches the CNGL-affiliated Learnovate Centre in June 2012

Katrin Drescher of award sponsors Symantec presents the LRC Best Thesis Award to Prof. Vincent Wade, who accepts the award on behalf of Dr. Ben Steichen. Also pictured is Reinhard Schäler of LRC/CNGL.

Progress in DCM3

DCM3 is responsible for the development of systems that provide dynamic aggregation and composition of content, customised for the user’s needs and context. Such content can be drawn from diverse sources, such as corporate knowledge bases, open web content and user-generated content. DCM research in personalised multilingual content has resulted in significant publications as well as industry collaboration. DCM has progressed both the personalisation and the dynamic aggregation of three kinds of content: user-generated content (e.g. blogs, forum postings, messages), corporate content (e.g. product manuals, how-to guides, technical documentation) and content harvested from the open web. This research has seen the development of demonstrators and the international evaluation of the technology across multiple languages and countries. This research and evaluation has resulted in an international prize for DCM researcher Ben Steichen and his supervisor Prof. Vincent Wade (LRC Best Thesis Award 2012).

Likewise, the ‘Personalisation as a Service’ research in DCM3 has reached maturity: invention disclosures have been lodged, and demonstrators are being evaluated across multiple third-party websites.

A key impact of DCM3 research has been industry engagement in the evaluation of the technology, and the resulting planning for two CNGL spinout companies. These spinout companies are in the areas of Multilingual Personalised Customer Care (Emizar, www.emizar.com) and Personalisation-as-a-Service (Wripl, www.wripl.com).

The Wripl cross-site personalisation system has undergone several refinements, and plugins for major content management system platforms, including WordPress, have been released. The Wripl team has concluded its SFI TIDA-funded programme and is collaborating with Enterprise Ireland on further developing the company and its product. From the research perspective, this work has been successfully evaluated in several experiments, and a PhD is expected to be completed in early 2013.

The Emizar project will complete its SFI TIDA feasibility study in 2013 and is collaborating with Enterprise Ireland to develop the product and company. A full launch and technology licence agreement are expected in early 2013.



Neil Peirce presents his PhD work at the national final of the Thesis in 3 competition

Industry Engagement

Two of DCM’s PhD students (Rami Ghorab and Jinming Min) successfully completed internships at Microsoft Ireland in the area of multilingual querying and personalisation. Their Personalised Multilingual Information Retrieval demonstrator showcased enhanced retrieval performance on Microsoft’s Clip Art collection and is fully integrated with Microsoft Bing search and machine translation tools.

Additionally, there is close and ongoing collaboration with Symantec in conducting user trials of the Personalised Multilingual Customer Care portal. This has led to CNGL DCM researchers making multiple presentations to senior vice presidents within Symantec, and to planning for a comprehensive trial of CNGL technology using Symantec customer care content.

Research in DCM has led to invention disclosures and one patent application in 2012. As previously mentioned, two spinout companies, Emizar and Wripl, have been planned for CNGL. These spinouts will involve technologies developed in DCM2 and DCM3.

SFI TIDA funding was sought for a third project (Linguabox) to investigate the potential of DCM2 technology to support the dynamic slicing and right-sizing of multimedia and user-generated content for learning. This application was successful, and work will begin in 2013.

‘Team wripl’ visits Silicon Valley to connect with local entrepreneurs and companies. The visit was hosted by the Irish Technology Leadership Group (ITLG) thanks to wripl’s joint win in the SFI/TIDA Entrepreneurship course.

Achievements

• DCM research was published in over 30 international journals and conferences in 2012. Conference highlights included ACM Hypertext, COLING 2012, AAAI 2012, ACL 2012, CIKM 2012, TPDL 2012 and SIGIR 2012. Journal highlights included ACM CSUR, UMUAI and the Information Retrieval journal.
• DCM researchers were involved in organising various IR and personalisation events and workshops during 2012, including FIRE 2012, NOMS 2012 and UMAP 2012. They were also involved in planning SIGIR 2013, to be hosted in TCD.
• Prof. Vincent Wade was invited to deliver the keynote address at ICWL 2012 on Personalisation across Open Content and Social Media.
• Three PhD students graduated in 2012 in the areas of Multilingual IR, Adaptive Systems and Multilingual Personalisation. A further two students submitted PhD theses, which are currently under examination.
• Two patents are pending from DCM research in the areas of dynamic content slicing and personalisation.
• Significant industry engagement, with joint trials and joint evaluations of multilingual personalisation technology, e.g. with Symantec and Microsoft.



• Two SFI TIDA grants were awarded to DCM Principal Investigators Prof. Vincent Wade and Prof. Owen Conlan for research in personalisation.
• Prof. Vincent Wade established the Enterprise Ireland Technology Centre for Technology Enhanced Learning, called the Learnovate Centre. This centre, which is allied to CNGL, focuses on content technologies and innovative communication tools for informal learning in schools, universities and corporate training. DCM has established close collaboration with the new Centre, which provides a means of exploiting CNGL research results in the vertical sector of Learning and Education.
• Dr. Tony Veale was Local Chair for the 3rd International Conference on Computational Creativity (ICCC 2012) at UCD.
• Dr. Tony Veale delivered a keynote (on creative uses of WordNets) at an event in Oslo hosted by the National Library of Norway, an invited talk on affective stereotype acquisition at the ILIKS event in Toulouse, and a one-week invited course on linguistic creativity at an autumn school on Computational Creativity in Helsinki.
• Two companies were progressed towards spinout in 2013: Emizar (aggregation and personalisation of multilingual open content, user-generated content and corporate content for self-service customer care) and Wripl (Personalisation-as-a-Service).
• A new SFI TIDA award has been won by Prof. Wade for DCM research in automated slicing of content for reuse and repurposing. This award will further the development of the technology for informal and automated e-learning content.
• Two industry internships were successfully completed at Microsoft by PhD students from TCD and DCU.

Plans

CNGL II will be led by Prof. Wade. In the new CNGL II, DCM research will be organised principally into three research themes, namely Personalisation; Delivery and Interaction; and Search and Discovery. CNGL II will progress the research topics from DCM and build on the success of the DCM research.

Prof. Séamus Lawless pitched Emizar’s investor-ready technology to hundreds of potential investors and business partners at Enterprise Ireland’s Big Ideas Showcase 2012. Emizar was subsequently profiled in the Sunday Business Post newspaper.

CNGL I has been granted a no-cost extension to complete the DCM technology and evaluate it rigorously. This work (January to November 2013) will see the completion of a number of DCM PhDs. It will also cover the trialling and evaluation of DCM technology, specifically the Personalised Multilingual Information Retrieval Framework, the Multilingual User Models, the Personalised Multilingual Customer Care trial, and the Evaluation Framework and Tools for Adaptive (Personalised) Systems.

Conclusion

2012 was an extremely productive year for DCM in the two aspects crucial to CNGL, namely scientific excellence and industry impact. DCM researchers have also maintained and strengthened their leadership in their respective research areas, and DCM PhD students have completed their doctorates and progressed to positions in industry and academia.


Next Generation Localisation



Strand Name: Next Generation Localisation

AREA CO-ORDINATOR: MR. REINHARD SCHÄLER

Participant Names and Affiliation

Industrial Collaborators
Dr. Fred Hollowood, Symantec
Mr. Enda McDonnell, Alchemy Software Development
Mr. Phil Ritchie, VistaTEC
Mr. Dag Schmidtke, Microsoft

International Collaborators
Dr. Lynne Bowker, University of Ottawa, Canada
Mr. José Eduardo de Lucca, Universidade Federal de Santa Catarina, Brazil
Prof. Patrick Hall, Professor Emeritus, Open University, UK
Dr. James Hogan, Queensland University of Technology, Australia
Mr. Mahesh Kulkarni, CDAC Pune, India
Ms. Stefanie Scheeder, The Rosetta Foundation, Germany
Mr. Francis Tsang, Adobe, USA

Faculty
Dr. Jim Buckley, University of Limerick (LOC3 Leader)
Ms. Yvonne Cleary, University of Limerick (LOC1)
Mr. J.J. Collins, University of Limerick (LOC2 Leader)
Dr. Chris Exton, University of Limerick (LOC1 Leader)
Dr. Dorothy Kenny, Dublin City University (LOC2)
Dr. Liam Murray, University of Limerick (LOC2)
Dr. Sharon O’Brien, Dublin City University (LOC2)
Mr. Reinhard Schäler, University of Limerick (LOC1, LOC2, LOC3, PI)

Postdoctoral Researchers
Dr. David Filip, University of Limerick (LOC1)
Dr. Eoin Ó Conchúir, University of Limerick (LOC3)
Dr. Ian O’Keeffe, University of Limerick (LOC2)



NEXT GENERATION LOCALISATION

PhD Students
Mr. Solomon Gizaw, University of Limerick (LOC3.2)
Mr. Rajat Gupta, University of Limerick (LOC2.4)
Mr. Joss Moorkens, University of Limerick (LOC2.2)
Ms. Lucía Morado Vázquez, University of Limerick (LOC1.2)
Mr. Aram Morera-Mesa, University of Limerick (LOC3.3)
Mr. Naoto Nishio, University of Limerick (LOC3.1)
Mr. Lorcan Ryan, University of Limerick (LOC1.1)
Mr. Asanka Wasala, University of Limerick (LOC2.1)

Funding
Funding from SFI
CNGL (07/CE/I1142): €306,611



Research Strand Overview: Next Generation Localisation (LOC)

Since its inception in the 1980s, the localisation industry has been strongly anchored to the expertise and quality of both industrial and academic assets in Ireland. Many of the key defining elements of the current industry either originated with, or have strong roots in, the pioneering activities of Irish players in the industry. Today, research continues to take place in Ireland in industry, in academia and in joint industrial-academic ventures such as CNGL. This research is paving the way for the industry to adapt and evolve as it moves further into the 21st century and faces the challenges that await it.

The Next Generation Localisation (LOC) track in CNGL has a mission to produce world-leading research in (i) localisation content analysis, (ii) evaluation of localisation component technologies and (iii) service-oriented localisation architecture solutions. This research is conducted in collaboration with academic and industrial partners in CNGL and beyond, and is validated by user communities in for-profit and not-for-profit enterprise localisation.

Taking a view that reaches beyond traditional avenues for profit and expansion, the LOC track focuses on using this research to ensure that Ireland retains its status in the field of localisation. As the independent international review panel that conducted CNGL’s Mid-Term Review in July 2011 stated, localisation is “a key industry for Ireland”, and Ireland “must remain at the technological forefront in order to retain and grow this highly remunerative activity”.

LOC holds that flexible architectures, such as the Service-Oriented Localisation Architecture Solution (SOLAS) investigated by LOC researchers, are key to innovative technology frameworks that support emerging and future localisation scenarios. This view was also confirmed by the independent international review panel that conducted CNGL’s Final Review in July 2012. The panel commented that “the SOLAS architecture offers a solid reference implementation that addresses integration and workflow issues that companies like Adobe, Dell, and Intel are currently trying to address on their own”.

As recommended by the reviewers in 2011, LOC has recruited four “additional high-end professional programmers” and allocated “more budgets to workflow”, and work on the development of SOLAS has continued apace. The solution has been split into two distinct frameworks, SOLAS Match and SOLAS Productivity. This allows LOC to develop, in parallel, solutions that cover the needs of both the traditional, return-on-investment-based localisation industry and the increasingly important non-profit and non-market localisation communities.

Indeed, it is this approach that led the independent review panel to note in the CNGL Final Review that LOC, and by extension its spinoff The Rosetta Foundation, are pioneering “a novel, comprehensive localization model for organizations seeking to translate content for underserved communities. The panel feels this accomplishment has great societal impact that transcends the boundaries of Ireland and even the EU.”

The views of the international experts on both of these panels reflect those of the industry thought leaders consulted by LOC at conferences such as GALA, Localisation World and, most recently, the LRC’s 17th Annual International Localisation Conference.

The following is a brief summary of the LOC track’s vision and goals, as agreed and realised in 2012.

Vision

We empower the innovative community and social localisation efforts that are driving the most significant growth opportunity for the industry.

Goals

• Provide content authors with feedback on the quality (localisability) and reusability of their content, demonstrating the impact of good- and bad-quality source content on the localisation effort, specifically in the context of user-generated content
• Assess and evaluate component technologies for SOLAS, demonstrating the suitability and adaptability requirements for components, specifically in the context of the community and social localisation enterprise



• Develop the Service-Oriented Localisation Architecture Solution (SOLAS) as
   1. A demonstrator and testbed for innovative localisation solutions, demonstrating the innovative aspects of this framework in relation to existing mainstream paradigms, especially in the context of the emerging collaborative and social localisation enterprise.
   2. A unique suite of open source technologies that will be made available to serve the needs of all clients requiring flexible, efficient and fully standards-compliant localisation solutions.
   3. The de facto localisation and translation technology for not-for-profit and development localisation activities.
• Integrate CNGL, third-party and open source components into SOLAS
• Publish research results in world-leading journals, both related and localisation-specific, according to agreed targets
• Continue to provide a forum for the publication of high-impact, innovative and scientific localisation research through the indexed, peer-reviewed and dedicated localisation journal, Localisation Focus: The International Journal of Localisation
• Report on research activities and solicit feedback at world-leading conferences, both related and localisation-specific, according to agreed targets
• Actively contribute to and provide leadership for international localisation initiatives (industry associations, standards groups, conferences)
• Expand the open source SOLAS code repository
• Build large and significant developer and user communities around the LOC effort within CNGL and beyond, according to agreed targets
• Demonstrate the industrial value and impact of the LOC research through active user engagement and trials, with reference to agreed metrics
• Work with CNGL towards a re-allocation of budgets to support a targeted SOLAS demonstrator development effort

Fundamental Research Barriers and Methodologies to Address Them

Three groups must be convinced: large multinational content publishers to join open, standards-based, industry-wide initiatives; small and medium-sized publishers to invest in state-of-the-art technologies; and non-profit organisations to take advantage of a localisation framework. This requires a solution that is scalable, modular, interoperable and affordable, together with a demonstrator framework capable of proving that the vision of an open localisation framework can be achieved. The risks involved in building such a system are considerable. Leading globalisation management systems have been developed by companies such as Idiom and GlobalSight (Ambassador). However, while these systems aimed to be comprehensive, they were not. Some services, such as machine translation (MT), never became part of their core offering. Additional service modules required by customers generally cannot be integrated, and where they can, only with significant investment. Reconfiguring workflows and adapting to increasingly dynamic localisation environments often incurs prohibitive costs. Although these systems attracted significant development investment (in the region of $50 million in some cases), they never realised their projected market potential and return on investment.

Although existing systems demonstrate a good understanding of the basic technologies required for a stable corporate localisation framework, our research has shown that their overall architecture is not suitable as the backbone for a modular, extensible and dynamic framework that enables seamless data flows and allows for the automatic configuration and execution of tasks.

During Years 3 and 4 of CNGL, the original CNGL Bulk Localisation Workflow (BLW) demonstrator and the work within the Next Generation Localisation research area produced a first version of a service-oriented localisation architecture solution (SOLAS). SOLAS addresses the need for an open, highly configurable, loosely coupled aggregation of heterogeneous services that can meet the varying demands of enterprises, SMEs and the non-profit sector. At the same time, it enables organisations with software engineering competencies to leverage the infrastructure encapsulated in the demonstrator framework and to tailor it to their specific needs through further component development. During Year 5 of CNGL, work on this demonstrator was branched into two parallel streams. This allowed more rapid development, deployment and testing, as well as coverage of additional use-case scenarios. It also allowed closer integration of component technologies from across CNGL research areas, and connections with third-party technologies such as commercial MT and established web technology systems.

This branching allows the two resulting technologies, SOLAS Productivity and SOLAS Match, to move even further away from the family of platforms whose large footprints have often proven cost-prohibitive. It also allows them to refine the Service-Oriented Architecture (SOA) philosophy that enables the development of a component marketplace for the platform. SOLAS Productivity uses a standardised data container, open web service APIs and a common orchestration and process management module. These connect to any number of component technologies developed by academic and industrial partners within CNGL, as well as to third-party technologies and tools. SOLAS Match provides groundbreaking and intuitive technology for the seamless, user-friendly matching of community translation tasks with volunteer translators. This open source technology revolutionises the distribution and management of translation tasks by pairing simplified web interfaces with sophisticated back-end technologies. SOLAS Match speeds up these translation tasks and reduces their overhead, and is therefore perfectly positioned to be adopted by any number of not-for-profit and non-market localisation organisations.
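The paragraph above describes a shared data container, a common orchestration module and any number of loosely coupled components. The Python sketch below illustrates that service-oriented pattern. It shows how a workflow can be reconfigured simply by changing an ordered list of registered component names. Everything in it is hypothetical: `Container`, `Segment`, `Orchestrator` and the toy `tm` and `mt` components are stand-ins invented for illustration, not SOLAS APIs. The report also states that SOLAS components connect through web service APIs, whereas here they are plain in-process Python functions.

```python
# Hypothetical illustration of a container + orchestrator + pluggable components.
# None of these names are SOLAS APIs; real components would sit behind web services.
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Segment:
    source: str
    target: str = ""
    metadata: Dict[str, str] = field(default_factory=dict)


@dataclass
class Container:
    """Hypothetical stand-in for a shared, standards-based localisation data container."""
    source_lang: str
    target_lang: str
    segments: List[Segment]
    history: List[str] = field(default_factory=list)  # components applied so far


# A component receives the container and returns it, possibly updated.
Component = Callable[[Container], Container]


class Orchestrator:
    """Hypothetical orchestration module: components are registered by name,
    and a workflow is just an ordered list of those names."""

    def __init__(self) -> None:
        self._registry: Dict[str, Component] = {}

    def register(self, name: str, component: Component) -> None:
        self._registry[name] = component

    def run(self, workflow: List[str], container: Container) -> Container:
        for name in workflow:
            if name not in self._registry:
                raise KeyError(f"no component registered as {name!r}")
            container = self._registry[name](container)
            container.history.append(name)
        return container


def make_tm_component(memory: Dict[str, str]) -> Component:
    """Toy translation-memory component: fills exact matches from a dict."""
    def tm(container: Container) -> Container:
        for seg in container.segments:
            if not seg.target and seg.source in memory:
                seg.target = memory[seg.source]
                seg.metadata["origin"] = "tm"
        return container
    return tm


def toy_mt(container: Container) -> Container:
    """Toy MT component: labels any still-untranslated segment as machine output."""
    for seg in container.segments:
        if not seg.target:
            seg.target = f"[{container.target_lang}] {seg.source}"
            seg.metadata["origin"] = "mt"
    return container


orchestrator = Orchestrator()
orchestrator.register("tm", make_tm_component({"Save": "Speichern"}))
orchestrator.register("mt", toy_mt)

job = Container("en", "de", [Segment("Save"), Segment("Open file")])
result = orchestrator.run(["tm", "mt"], job)
for seg in result.segments:
    print(seg.source, "->", seg.target, seg.metadata)
# Save -> Speichern {'origin': 'tm'}
# Open file -> [de] Open file {'origin': 'mt'}
```

The sketch keeps workflow order as configuration rather than code, so steps can be added, removed or reordered without touching the components. Running the same job with `["mt", "tm"]`, for example, would leave the memory lookup nothing to do, because every segment would already carry a machine-generated target.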

With SOLAS, researchers have gained access to a common, standards-based and interoperable open source localisation ecosystem for their research. It is similar to the ecosystems available to the MT community with Moses, and to the speech community with platforms such as the Festival Speech Synthesis System or the MuSE speech technology platform. SOLAS is the first working innovation platform developed in its entirety within CNGL.

Research Strand Overview: Next Generation Localisation

In LOC, research concentrates on improving key areas of localisation automation, grouped into three work packages:
• LOC1: the construction of a common, standards-based data model to develop, process and maintain localisation knowledge (Ryan, 2010; Morado Vázquez and Mooney, 2010; Anastasiou and Morado Vázquez, 2010; Anastasiou, 2010)
• LOC2: the interoperability of suitable tools and technologies, the assessment of quality measurement methodologies, and the facilitation of crowdsourcing and collaboration (Nishio et al., 2010; Wasala et al., 2010; Gupta and Aouad, 2010; Exton et al., 2010)
• LOC3: the modelling of intelligent localisation processes, workflows and process management (Filip and O’Conchúir, 2011; Lenker et al., 2010; Lenker, 2010; Lenker and Anastasiou, 2010)

The availability of a demonstrator system has been a prerequisite for advancing this research and for measuring its success.

The Service-Oriented Localisation Architecture Solution<br />

(SOLAS) has become an important focus for research in<br />

LOC for several reasons (Aouad et al., 2011; Ó Conchúir,<br />

2011). It offers a common standards-based (meta-)<br />

data container, a web services API for Next Generation<br />

Localisation communication and connectivity, and an<br />

orchestration and process management module all<br />

shared across the framework (Morado and del Rey, 2011;<br />

Morado et al., 2011). Component technologies from<br />

industrial partners and third parties as well as research<br />

components coming from across <strong>CNGL</strong> (Wasala et<br />

al., 2011) can be integrated into SOLAS with relative<br />

ease, demonstrating in very real terms the benefits of<br />

individual components in an end-to-end localisation<br />

workflow, as well as providing a showcase for cross-<strong>CNGL</strong><br />

industrial and academic collaboration. While the origins<br />

of SOLAS lie in our research around the development of a<br />

demonstrator system for bulk localisation workflows, it<br />

is transcending this narrow field and aims to offer<br />

frameworks for a whole open localisation eco-system,<br />

addressing the needs not just of large and<br />

medium-sized commercial enterprises but also those of non-profit<br />

organisations which require solutions that can easily<br />

adapt to new languages, actors and workflows in a<br />

highly collaborative and dynamic environment.



Centre for Next Generation Localisation <strong>Annual</strong> <strong>Report</strong> <strong>2012</strong><br />

NEXT GENERATION LOCALISATION<br />

Initially, against this background, the main objective<br />

was to develop a heterogeneous, loosely coupled<br />

platform. This was achieved through Component-Based<br />

Development (CBD) techniques, with SOLAS integrating<br />

components that are connected through web services<br />

to realise a Service-Oriented Architecture (SOA). These<br />

components are also capable of operating in a stand-alone<br />

mode, further increasing the flexibility of this approach.<br />

The architecture also permits the easy integration of any<br />

future component developments from across <strong>CNGL</strong>.<br />
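The loose coupling described above can be illustrated with a minimal Python sketch. It assumes a hypothetical contract in which every component exposes a single `process` method that takes and returns a shared (meta-)data container, modelled here as a plain dictionary. In SOLAS the components communicate through web services; plain in-process calls stand in for those here. `Component`, `SegmentCounter`, `PseudoTranslator` and `orchestrate` are illustrative names.

```python
from typing import Protocol


class Component(Protocol):
    """Hypothetical contract that every pluggable component satisfies."""
    name: str

    def process(self, container: dict) -> dict: ...


class SegmentCounter:
    name = "segment-counter"

    def process(self, container: dict) -> dict:
        # Record a piece of job metadata in the shared container.
        container.setdefault("metadata", {})["segment_count"] = len(container["segments"])
        return container


class PseudoTranslator:
    """Stand-in for a real component such as an MT or terminology service."""
    name = "pseudo-translator"

    def process(self, container: dict) -> dict:
        container["targets"] = [segment.upper() for segment in container["segments"]]
        return container


def orchestrate(container: dict, pipeline: list[Component]) -> dict:
    """Pass the container through each component in turn, logging which ran."""
    for component in pipeline:
        container = component.process(container)
        container.setdefault("history", []).append(component.name)
    return container


if __name__ == "__main__":
    job = {"segments": ["hello", "world"]}
    print(orchestrate(job, [SegmentCounter(), PseudoTranslator()]))
    # Each component also works stand-alone, outside the orchestrator:
    print(PseudoTranslator().process({"segments": ["hi"]}))
```

Because the orchestrator depends only on the `process` contract, a new component can be slotted into the pipeline without changing the existing ones.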

In this regard, the initial SOLAS technology was<br />

demonstrated during the <strong>CNGL</strong> SFI Mid-Term Review<br />

of 2011, at the Localisation Innovation Showcase 2011<br />

at Croke Park, and at the Autumn Scientific<br />

Committee Meeting in TCD. It generated significant<br />

interest from industrial collaborators and invited industry<br />

representatives, multinational publishers, SMEs, and the<br />

non-profit and government sectors.<br />

However, as research and development on SOLAS progressed<br />

in <strong>2012</strong>, and as collaboration with The Rosetta<br />

Foundation deepened, leading UL to grant The Rosetta<br />

Foundation an exclusive licence for SOLAS Match (aka<br />

Translation eXchange), it became apparent that SOLAS<br />

had potential beyond this initial offering. The decision was<br />

made to branch SOLAS development into two distinct<br />

yet connectable technologies: SOLAS Productivity, a<br />

continuation of the initial technology path described<br />

above, and SOLAS Match, a new paradigm for enabling<br />

volunteer translation and localisation through intuitive<br />

and user-friendly interfaces backed by dynamic and<br />

powerful back-end technologies.<br />

The collaboration with The Rosetta Foundation and the<br />

move of <strong>CNGL</strong> IP generated by LOC researchers to the<br />

Foundation, with its 2,600+ volunteers, have been very<br />

successful (Wasala et al., 2011). Uptake and trials of<br />

<strong>CNGL</strong> output by the Foundation provide valuable<br />

feedback to <strong>CNGL</strong> researchers and evidence of the value<br />

of this output to potential commercial parties, especially<br />

in the SME sector. As noted by the international<br />

independent review panel in its Final Review of<br />

<strong>CNGL</strong> (July <strong>2012</strong>), further evidence of the value of this<br />

collaboration comes from the increase in “International<br />

reach and exposure to government and industry outside<br />

of the usual Ireland and EU-centric bodies: the creation<br />

of AGIS conferences, growing presence at W3C, OASIS<br />

(XLIFF TC), GALA, and Localization World, including<br />

diversification of funding (Canada Research Council)”. A<br />

Partner Group with The Rosetta Foundation was created<br />

to raise the visibility and develop the involvement of<br />

enterprises (for-profit and not-for-profit) collaborating on and<br />

supporting community translation efforts, to find ways<br />

to connect this effort to economic criteria that resonate<br />

with industrial partners, to look for new enterprise<br />

partners, and to seek alliances with non-profit and other<br />

organisations to promote these efforts.<br />

Other Relevant Work in the Field and How This Compares<br />

There are commercial efforts under way to develop<br />

proprietary automated localisation platforms integrating<br />

process automation and management functionality<br />

with localisation and translation automation, such as<br />

terminology management, translation memory systems<br />

and machine translation. Large multinational content<br />

publishers, among them Oracle, SAP and Microsoft, have<br />

demonstrated the commercial viability of such solutions<br />

with their proprietary in-house solutions. However, they<br />

have also shown the limits of proprietary solutions and<br />

have started exploring ways to connect their proprietary<br />

systems with third-party tools and technologies; one<br />

example is the interoperability between the open XML Localisation Interchange File<br />

Format (XLIFF) and Microsoft’s proprietary Localisation<br />

Exchange Format (LCX), as reported at the LRC XV<br />

conference in 2010 by Microsoft and LOC researchers<br />

(Wasala et al., 2010). Oracle also presented its usage<br />

of XLIFF in its localisation strategies at the LRC XVI<br />

conference in 2011. At FEISGILTT <strong>2012</strong> it became known<br />

that, based in large part upon the research initiated by<br />

Wasala et al. (2010 and <strong>2012</strong>), Microsoft will be adopting<br />

XLIFF as a primary file format going forward.<br />

In this regard, <strong>CNGL</strong> research is at the forefront of many<br />

industry concerns, with the SOLAS platform representing<br />

a head start through its innovative approach to<br />

addressing a wide variety of localisation requirements<br />

that, as noted by the <strong>2012</strong> international independent<br />

review panel, “companies like Adobe, Dell, and Intel are<br />

currently trying to address on their own”.<br />

SOLAS is the first open, standards-based framework of its<br />

kind in the localisation space anywhere in the world. It<br />

already provides an integrated plug-and-play framework<br />

for configurable component technologies to interoperate,<br />

and as it continues to be developed and refined, SOLAS<br />

Productivity will allow the seamless connection and<br />

integration of complementary technologies into a core,<br />

functional and industrial-scale platform which itself is<br />

highly modular and extensible, while SOLAS Match will<br />

redefine the technological landscape of volunteer and<br />

development localisation with its open source translation<br />

space.<br />

Achievements<br />

Work Package LOC1<br />

The overall aim of LOC1 is to embed internationalisation<br />

and localisation issues into the design and development<br />

cycle of digital content production (Ryan, 2010), moving<br />

localisation up the value chain. The Work Package<br />

is divided into two sections, LOC1.1 Digital Content<br />

Production for Localisation and LOC1.2 Localisation<br />

Knowledge – Capture, Organisation, Use.<br />

LOC1 has produced highly innovative research results<br />

on an XLIFF-based (meta-)data container tasked<br />

with identifying, classifying and leveraging localisation<br />

knowledge encapsulated in previous processes<br />

(Anastasiou and Morado Vázquez, 2010). The result is<br />

a localisation memory container (LMC), conceptually<br />

similar to the established translation memory technology,<br />

but focused directly on localisation rather than “just” on<br />

translation requirements (Morado Vázquez and Mooney,<br />

2010). The LMC is intended to improve the quality and consistency<br />

of the localisation process itself and minimise errors in<br />

the final product. This work is closely linked to ILT and<br />

SF2 (data access, exchange and integrity issues).<br />
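The report does not give the LMC's schema. As a rough illustration of the underlying idea, the following Python sketch harvests localisation knowledge stored in an XLIFF 1.2 file (translator notes and `translate` flags) alongside each translation, so that it can be looked up again by source text, much as a translation memory would be. It uses only the standard-library ElementTree parser. `build_memory` and the sample file are hypothetical.

```python
import xml.etree.ElementTree as ET

NS = {"x": "urn:oasis:names:tc:xliff:document:1.2"}  # XLIFF 1.2 namespace


def build_memory(xliff_text: str) -> dict[str, dict]:
    """Map each source segment to its target plus the localisation metadata around it."""
    root = ET.fromstring(xliff_text)
    memory: dict[str, dict] = {}
    for unit in root.iterfind(".//x:trans-unit", NS):
        source = unit.findtext("x:source", default="", namespaces=NS)
        memory[source] = {
            "target": unit.findtext("x:target", default=None, namespaces=NS),
            "translate": unit.get("translate", "yes"),  # XLIFF 1.2 default is "yes"
            "notes": [note.text or "" for note in unit.iterfind("x:note", NS)],
        }
    return memory


SAMPLE = """<xliff version="1.2" xmlns="urn:oasis:names:tc:xliff:document:1.2">
  <file original="ui.properties" source-language="en" target-language="de" datatype="plaintext">
    <body>
      <trans-unit id="1">
        <source>Save</source>
        <target>Speichern</target>
        <note>Button label: keep it short.</note>
      </trans-unit>
      <trans-unit id="2" translate="no">
        <source>SOLAS</source>
      </trans-unit>
    </body>
  </file>
</xliff>"""

if __name__ == "__main__":
    for source, entry in build_memory(SAMPLE).items():
        print(source, entry)
```

Keying on the source text mirrors translation-memory lookup. The difference the LMC aims at is that the stored entry also carries localisation-specific knowledge rather than the translation alone.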

LOC1 researchers have also produced innovative<br />

research into the benefits of developing a<br />

(meta-)data container, the Localisation Knowledge<br />

Repository (LKR) (Ryan, 2010; Ryan, 2011). The<br />

LKR developed as part of this research is<br />

based on a localisation taxonomy that allows the storage,<br />

maintenance and reuse of localisation-relevant data<br />

during content development.<br />

Lucía Morado Vázquez (LOC1.2) successfully passed her<br />

PhD viva in September. Lucía completed her PhD under<br />

the supervision of Reinhard Schäler. Lucía has now taken<br />

up a postdoctoral position at the Multilingual Information<br />

Processing Department at the Faculty of Translation and<br />

Interpretation, University of Geneva.<br />

Pictured at the launch of UL’s MSc in Multilingual Computing and<br />

Localisation co-hosted by the UN in Africa are (L-R) Solomon Gizaw,<br />

<strong>CNGL</strong>, Reinhard Schäler, <strong>CNGL</strong>, Prof. Don Barry, President, University<br />

of Limerick and Ms. Aida Opoku-Mensah, United Nations Economic<br />

Commission for Africa (UNECA)<br />

The research into internationalisation and localisation<br />

knowledge leveraging aims to increase the quality,<br />

consistency and accessibility of content throughout the<br />

localisation process. It addresses the need to provide<br />

content developers with standards and guidelines. In an environment<br />

that is increasingly dealing with (often) low-quality, user-generated<br />

content, this will facilitate the preparation<br />

of content that is more usable and readable for source<br />

language speakers, and more translatable for localisation<br />

professionals and technologies. The guidelines are<br />

sourced from both academic research and industrial<br />

best practices. LOC1 also has two representatives on the<br />

XLIFF Technical Committee.<br />

Three articles by Lorcan Ryan – on ‘Global Authoring<br />

Techniques’, ‘Global Diversity and Localisation Issues’<br />

and ‘Global Authoring Resources’ – were published in<br />

Communicator during <strong>2012</strong>.<br />

Work Package LOC2<br />

The Work Package is divided into four sections, LOC2.1<br />

Addressing the Problem of Interoperability in Localisation<br />

Process Management, LOC2.2 Technology Evaluation<br />

– The User Perspective, LOC2.3 Service Descriptor<br />

Development (Web Services) and LOC2.4 Collaborative<br />

Localisation Platform: Crowdsourcing.



LOC2 addresses quality assessment of translations in<br />

a crowdsourced and distributed localisation context<br />

(Gupta and Aouad, 2010; Exton et al., 2010; Anastasiou<br />

and Gupta, 2011). The specification of evaluation metrics<br />

specifically targets quantitative and qualitative<br />

evaluation of translation memories (TMs) in order<br />

to verify the existence of inconsistency propagation<br />

(Moorkens, 2011a; Moorkens, 2011b). It also addresses<br />

more general metrics in evaluation methodologies<br />

throughout the localisation process. Joss Moorkens<br />

successfully defended his PhD thesis, entitled “Measuring<br />

Consistency in Translation Memories: A Mixed-Methods<br />

Case Study”, in July. Joss was supervised at DCU by<br />

Dr. Dorothy Kenny and Dr. Sharon O’Brien.<br />
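One simple quantitative signal of inconsistency in a TM is a source segment that has been stored with more than one distinct translation. The Python sketch below counts such segments. It is an illustrative assumption, not the metric used in the thesis, which combined quantitative analysis with qualitative research. `inconsistent_sources` and `inconsistency_rate` are hypothetical names, and the German example segments are invented.

```python
from collections import defaultdict


def inconsistent_sources(tm: list[tuple[str, str]]) -> dict[str, set[str]]:
    """Return source segments that appear in the TM with more than one distinct target."""
    targets: dict[str, set[str]] = defaultdict(set)
    for source, target in tm:
        # Ignore leading/trailing whitespace so trivial differences do not count.
        targets[source.strip()].add(target.strip())
    return {source: found for source, found in targets.items() if len(found) > 1}


def inconsistency_rate(tm: list[tuple[str, str]]) -> float:
    """Share of distinct source segments that have inconsistent translations."""
    distinct_sources = {source.strip() for source, _ in tm}
    if not distinct_sources:
        return 0.0
    return len(inconsistent_sources(tm)) / len(distinct_sources)


if __name__ == "__main__":
    tm = [
        ("Click Save.", "Klicken Sie auf Speichern."),
        ("Click Save.", "Klicken Sie auf Sichern."),
        ("Cancel", "Abbrechen"),
        ("Cancel", "Abbrechen "),
    ]
    print(inconsistent_sources(tm))
    print(inconsistency_rate(tm))  # 0.5
```

Detecting inconsistency is only a first step. Tracing how it propagates through later jobs that reuse the TM would need timestamps or job identifiers, which this sketch does not model.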

Finally, research is also being carried out in the area of<br />

cultural adaptation, with a particular focus on multimedia<br />

content, and how this might be supported in interchange<br />

formats such as XLIFF (O’Keeffe, 2011b).<br />

Dr. David Filip played a central role in the organisation<br />

and delivery of the inaugural FEISGILTT event, which<br />

took place on 16th-17th October in Seattle, USA.<br />

FEISGILTT <strong>2012</strong> (Federated Event for Interoperability<br />

Standardization in Globalization, Internationalization,<br />

Localization, and Translation Technologies) brought<br />

together experts from the language services industry,<br />

R&D labs that are exploring new interoperability<br />

solutions, and the various standards bodies instrumental<br />

in making such solutions accessible as conformable<br />

specifications. It offered a neutral venue where these<br />

stakeholders exchanged knowledge and experiences<br />

and discussed future directions for addressing the<br />

interoperability challenges facing the industry. FEISGILTT<br />

incorporated the 3rd International XLIFF Symposium.<br />

Lucía Morado Vázquez, Aram Morera Mesa, Dr. Chris Exton and Karl<br />

Kelly pictured at the LRC Summer School <strong>2012</strong>. The theme of this year’s<br />

Summer School was Mobile Application Development and Localisation<br />

LOC2 is addressing component and data interoperability<br />

in order to allow efficient information exchange,<br />

specifically through the specification and use of<br />

standardised metadata (Wasala et al., 2010). Research<br />

from this work package continues to drive the<br />

development of several components within SOLAS as<br />

well as feeding back into ILT (development of automated<br />

translation technologies) and SF2. LOC2 is also specifying<br />

templates for supporting service descriptions necessary<br />

for Service Level Agreements between localisation-oriented<br />

service providers and consumers. Web Services<br />

contract negotiation and agreement protocols will then<br />

be used to map abstract localisation units into concrete<br />

services and components (Nishio et al., 2010).<br />
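How a service descriptor might let an abstract localisation unit be bound to a concrete service can be sketched as follows. This Python example is a minimal illustration under assumed fields (capability, language pairs, turnaround, per-word price). It is not the LOC2 descriptor template, and a real SLA negotiation protocol would involve an exchange of offers rather than a one-shot filter. `ServiceDescriptor`, `AbstractUnit` and `bind` are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ServiceDescriptor:
    provider: str
    capability: str                            # e.g. "translation" or "review"
    language_pairs: frozenset[tuple[str, str]]
    turnaround_hours: int
    price_per_word: float


@dataclass(frozen=True)
class AbstractUnit:
    capability: str
    language_pair: tuple[str, str]
    words: int
    deadline_hours: int
    budget: float


def bind(unit: AbstractUnit, offers: list[ServiceDescriptor]) -> Optional[ServiceDescriptor]:
    """Pick the cheapest advertised service that meets every constraint, or None."""
    feasible = [
        offer for offer in offers
        if offer.capability == unit.capability
        and unit.language_pair in offer.language_pairs
        and offer.turnaround_hours <= unit.deadline_hours
        and offer.price_per_word * unit.words <= unit.budget
    ]
    return min(feasible, key=lambda offer: offer.price_per_word, default=None)


if __name__ == "__main__":
    offers = [
        ServiceDescriptor("A", "translation", frozenset({("en", "ja")}), 48, 0.12),
        ServiceDescriptor("B", "translation", frozenset({("en", "ja")}), 24, 0.15),
        ServiceDescriptor("C", "review", frozenset({("en", "ja")}), 12, 0.05),
    ]
    unit = AbstractUnit("translation", ("en", "ja"), words=1000, deadline_hours=24, budget=200.0)
    chosen = bind(unit, offers)
    print(chosen.provider if chosen else "no service satisfies the agreement")  # B
```

Provider A is cheaper but misses the 24-hour deadline, so B is bound; returning `None` when nothing qualifies leaves room for a renegotiation step.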

Dr. David Filip has also led work on Internationalization<br />

Tag Set (ITS) Version 2.0 as co-chair of the<br />

MultilingualWeb-LT (Language Technology) Working<br />

Group. The Working Group aims to develop new W3C<br />

(World Wide Web Consortium) standards to support<br />

the translation and adaptation of Web content to local<br />

needs, from its creation through to its delivery to end<br />

users. By so doing, the new standards will help to remove<br />

language barriers to international trade and facilitate the<br />

free flow of information across language borders.<br />

At the <strong>CNGL</strong> Localisation Innovation Showcase in<br />

Limerick in September, Dr. David Filip demonstrated<br />

the <strong>CNGL</strong> demonstrator system CMS-LIONSolas Integration: Full Content Lifecycle Metadata<br />

Interoperability TestBed. Developed in collaboration with<br />

the SF track, this is a unique platform for testing complex<br />

metadata designs spanning process areas over the full<br />

multilingual content life cycle. David showed how an RDF-based<br />

provenance store is used between Web Content<br />

Management System (CMS) and XLIFF-based translation<br />

workflows. This demonstrates use cases for the round-tripping<br />

of Internationalisation Tag Set (ITS) metadata<br />

between content generation and publication in HTML5/<br />

XML and localisation processes in XLIFF. It therefore<br />

provides directly testable input to the current standardisation<br />

working groups developing ITS, XLIFF and HTML5.
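The round-tripping idea can be made concrete for one ITS 2.0 data category, Translate, which HTML5 expresses through its native `translate` attribute and XLIFF 1.2 through the `translate` attribute of `<trans-unit>`. The Python sketch below is a simplified illustration, not the demonstrator's code. It treats every text run as its own segment, whereas real filters keep inline markup inside segments. It also assumes well-formed markup and ignores the RDF provenance store entirely. `TranslateFlagExtractor`, `html_to_xliff` and the `page.html` file name are hypothetical.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

XLIFF_NS = "urn:oasis:names:tc:xliff:document:1.2"
VOID_ELEMENTS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
                 "link", "meta", "source", "track", "wbr"}


class TranslateFlagExtractor(HTMLParser):
    """Collect text runs together with their inherited HTML5 `translate` flag."""

    def __init__(self) -> None:
        super().__init__()
        self.stack = [True]                        # document default: translatable
        self.segments: list[tuple[str, bool]] = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID_ELEMENTS:
            return                                 # void elements have no content to inherit
        attributes = dict(attrs)
        if "translate" in attributes:
            # "no" switches translation off; "yes" or an empty value switches it on.
            flag = (attributes["translate"] or "").strip().lower() != "no"
        else:
            flag = self.stack[-1]                  # inherit from the parent element
        self.stack.append(flag)

    def handle_endtag(self, tag):
        if tag not in VOID_ELEMENTS and len(self.stack) > 1:
            self.stack.pop()

    def handle_data(self, data):
        text = data.strip()
        if text:
            self.segments.append((text, self.stack[-1]))


def q(name: str) -> str:
    """Qualify an element name with the XLIFF 1.2 namespace."""
    return f"{{{XLIFF_NS}}}{name}"


def html_to_xliff(html: str, source_lang: str = "en", target_lang: str = "fr") -> str:
    parser = TranslateFlagExtractor()
    parser.feed(html)
    parser.close()

    ET.register_namespace("", XLIFF_NS)
    root = ET.Element(q("xliff"), version="1.2")
    file_element = ET.SubElement(root, q("file"), {
        "original": "page.html", "datatype": "html",
        "source-language": source_lang, "target-language": target_lang,
    })
    body = ET.SubElement(file_element, q("body"))
    for index, (text, translatable) in enumerate(parser.segments, start=1):
        unit = ET.SubElement(body, q("trans-unit"), id=str(index))
        if not translatable:
            unit.set("translate", "no")            # carry the ITS Translate value across
        ET.SubElement(unit, q("source")).text = text
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(html_to_xliff('<p>Welcome to <span translate="no">SOLAS Match</span>.</p>'))
```

The reverse direction would read the `translate` flags back from the XLIFF units when merging translations into the published HTML. Keeping that mapping lossless in both directions is what the round-tripping tests probe.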


Dr. Ian O’Keeffe, postdoctoral researcher, left the University<br />

of Limerick during Quarter 3 <strong>2012</strong>. He is now Manager<br />

of Software Engineering/Development at Fidelity<br />

Investments. Also departing the University of Limerick<br />

in Quarter 3 was postdoctoral researcher Dr. Eoin Ó<br />

Conchúir. Eoin is now participating in the New Frontiers<br />

entrepreneur development programme.<br />

Joss Moorkens submitted his thesis, “Measuring<br />

Consistency in Translation Memories: A Mixed-Methods<br />

Case Study”, in <strong>2012</strong>. His work questioned the<br />

widely held assumption that human-made translation<br />

memories lead to higher-quality, faster and cheaper<br />

translations because they provide access to a large body<br />

of high-quality bilingual or multilingual language resources<br />

produced by professional human translators. The results<br />

of his research, which combined an examination of large<br />

volumes of authentic translation memories acquired from<br />

<strong>CNGL</strong> partners with qualitative research involving industry<br />

experts, clearly challenge this view and suggest caution. Joss’s<br />

thesis has already attracted enquiries and significant<br />

interest from academia and industry alike.<br />

Dr. Thomas Arend, International Product Lead at Twitter addresses the<br />

LRC Conference <strong>2012</strong> on the theme “Social Localisation at Twitter –<br />

translating the world in 140 Characters”<br />

Work Package LOC3<br />

The LOC3 Work Package is divided into three<br />

sections: LOC3.1 Localisation Workflow Specifications<br />

for Enterprise Localisation; LOC3.2 Taxonomy of<br />

Personalisation for Generating Personalised Content;<br />

and LOC3.3 Localisation Workflow Mining.<br />

The overall aim of LOC3 is to research localisation<br />

workflow re-engineering and recommendation, and to<br />

define empirically the attributes and terms relevant to<br />

generating personalised localised content<br />

(Morera, Aouad et al., 2011a; Morera, Aouad et al.,<br />

2011b). This research has included an empirical<br />

evaluation of proposed localisation workflows against<br />

current industry practice (Lenker et al., 2010; Lenker,<br />

2011a; Lenker, 2011b).<br />

Another focus is the research, design and experimental<br />

implementation of a workflow recommendation system.<br />

This system takes into account a list of the most relevant<br />

tasks in a localisation process, and uses a decision tree<br />

to select those that should be part of the workflow<br />

according to the specific quality requirements, time<br />

constraints, and cost constraints of the project at<br />

hand. Aram Morera has advanced his research on the<br />

identification and description of workflow patterns in<br />

social localisation, leading to a workflow recommender<br />

for specific social localisation scenarios, ranging from<br />

charitable through non-profit to for-profit approaches. The<br />

identification of these patterns has led to the discovery<br />

of serious shortcomings in current technologies which<br />

are being addressed by the SOLAS development team<br />

in LOC. It is expected that Aram will submit his thesis<br />

reporting on his research in the first half of 2013.<br />
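The decision-tree approach to workflow recommendation can be sketched as follows. The tree below is hand-written for illustration, and its branches and task names are assumptions. The LOC3 recommender's actual task list and tree, which could equally be learned from data, are not described in this report. `Project` and `recommend_workflow` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Project:
    quality: str          # required quality: "high", "medium" or "low"
    tight_deadline: bool  # time constraint
    tight_budget: bool    # cost constraint


def recommend_workflow(project: Project) -> list[str]:
    """Walk a small, hand-built decision tree and return the selected tasks in order."""
    if project.quality == "high":
        tasks = ["terminology preparation", "human translation", "review"]
        if not project.tight_deadline:
            tasks.append("in-context QA")      # only when there is time for it
        return tasks
    if project.quality == "medium":
        if project.tight_budget:
            return ["machine translation", "post-editing"]
        return ["translation memory leverage", "human translation", "spot-check review"]
    if project.quality == "low":
        if project.tight_deadline or project.tight_budget:
            return ["machine translation"]
        return ["machine translation", "light post-editing"]
    raise ValueError(f"unknown quality level: {project.quality!r}")


if __name__ == "__main__":
    print(recommend_workflow(Project("medium", tight_deadline=True, tight_budget=True)))
```

Each internal node tests one project requirement and each leaf is a candidate workflow, which keeps the recommendation explainable to a project manager.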

The final area of research concerns personalisation<br />

issues in localisation. This involves considering individual<br />

preferences, gathered explicitly or implicitly, to go<br />

beyond the traditional ‘locale’ or ‘community interest’.<br />

The aim here is the creation of an empirical definition of<br />

personalisation attributes to demonstrate their feasibility<br />

and relevance for generating adequate personalised<br />

content. Research conducted within this work package<br />

includes the specification and the development of<br />

demonstrator crowdsourcing localisation environments<br />

and platforms (Lenker, 2010; Lenker and Anastasiou,<br />

2010). Solomon Gizaw has focused on the identification<br />

of communication patterns in cross-cultural information<br />

exchange and the application of personalisation<br />

techniques to a community-based translation and<br />

localisation environment. Solomon has analysed a large<br />

amount of actual user data from live communication<br />

exchanges and is planning to use the results of this<br />

analysis for the adaptation of SOLAS to the requirements<br />

and needs of specific users, rather than just locales.<br />

Solomon is planning to submit his thesis in the first half<br />

of 2013.
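The difference between locale-based and user-based adaptation can be shown in a few lines. This Python sketch assumes a hypothetical preference model. An explicitly stated preference overrides one inferred from usage data, which in turn overrides the locale default that a traditional locale-only system would apply. `UserProfile`, `LOCALE_DEFAULTS` and `personalise` are illustrative names, and "register" (formal versus informal address) is just one example attribute.

```python
from dataclasses import dataclass, field
from typing import Optional

# Hypothetical locale-level defaults: all that a locale-only system would know.
LOCALE_DEFAULTS = {
    "fr-CA": {"register": "formal"},
    "en-US": {"register": "informal"},
}


@dataclass
class UserProfile:
    locale: str
    explicit: dict = field(default_factory=dict)  # preferences the user stated directly
    implicit: dict = field(default_factory=dict)  # preferences inferred from behaviour

    def preference(self, key: str) -> Optional[str]:
        """Explicit beats implicit, which beats the locale default."""
        if key in self.explicit:
            return self.explicit[key]
        if key in self.implicit:
            return self.implicit[key]
        return LOCALE_DEFAULTS.get(self.locale, {}).get(key)


def personalise(profile: UserProfile, variants: dict[str, str]) -> str:
    """Choose the content variant matching the user's preferred register."""
    register = profile.preference("register") or "formal"
    return variants.get(register, variants["formal"])


if __name__ == "__main__":
    variants = {"formal": "Veuillez enregistrer le fichier.", "informal": "Enregistre le fichier."}
    print(personalise(UserProfile("fr-CA"), variants))
    print(personalise(UserProfile("fr-CA", implicit={"register": "informal"}), variants))
```

Two users who share the locale `fr-CA` can thus receive different content, which a locale-only lookup cannot express.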



Industry Engagement<br />

LOC has closely collaborated with its main industrial<br />

partners, especially with Symantec, VistaTEC and<br />

Microsoft. Additional collaboration with international<br />

collaborators from The Rosetta Foundation also<br />

provided valuable input. Following the open sourcing<br />

of GlobalSight and the establishment of The Rosetta<br />

Foundation as a spin-off from the University of Limerick<br />

and <strong>CNGL</strong>, LOC also collaborated closely with The<br />

Rosetta Foundation and Welocalize. Engagement<br />

with industrial partners took place through site visits and<br />

focused one-to-one meetings between them and LOC<br />

researchers.<br />

Through the SOLAS platform, LOC supports the development<br />

of a <strong>CNGL</strong> open localisation platform that will, in<br />

addition to serving as a test bed for <strong>CNGL</strong> research in<br />

the different work packages, provide large multinational<br />

publishers with a solid case study for the viability of<br />

open standards for the negotiation of localisation<br />

data and localisation knowledge, thus providing them<br />

with the arguments necessary for a migration from an<br />

enclosed proprietary localisation scenario to a more<br />

open, interconnecting and interoperable framework. This<br />

platform will also encourage the uptake of localisation<br />

and process automation solutions by small and medium-sized<br />

enterprises, create new business opportunities and<br />

support the up-scaling of localisation offerings by smaller<br />

firms. More than 40 individuals and companies have so<br />

far joined the Dynamic Coalition for a Global Localisation<br />

Platform: Localisation for All, initiated by LOC and The<br />

Rosetta Foundation. We expect the platform to generate<br />

increased activity in sectors of the localisation industry<br />

(early indicators suggest that, in certain sectors, growth<br />

by a factor of 100 is not out of reach). Subsequently, we<br />

expect employment to rise in these sectors driven by a<br />

growth in translation and localisation as well as in the<br />

technical support and development area.<br />

The opportunities and the requirements for SOLAS,<br />

especially in the non-profit sector, are significant. In 2007,<br />

almost 1.5 million non-profits were registered with the US<br />

Tax Authorities and non-profits reported US$1.9 trillion in<br />

revenue and US$4.3 trillion in assets. From 1998 to 2005,<br />

non-profit employment grew 16.4 per cent, compared to<br />

6.2 per cent for overall employment in the US.<br />

It is in the nature of non-profits to deal with a multilingual<br />

and multicultural constituency. Surprisingly, no adequate<br />

technology is available to support their localisation and<br />

translation activities.<br />

In Ireland, the non-profit sector employs more than<br />

100,000 people with pay costs in the order of €3.5bn, has<br />

revenues of more than €6bn, and holds assets valued at<br />

more than €3.5bn. The sector is, perhaps, the principal<br />

source of social capital in Irish society, with more than<br />

560,000 people engaged as volunteers, and more than<br />

50,000 people engaged in their governance. In scale,<br />

the non-profit sector in Ireland is at least comparable to<br />

if not greater than agriculture or tourism as a source of<br />

employment.<br />

Research into SOLAS by <strong>CNGL</strong>, with subsequent<br />

development of this framework through The Rosetta<br />

Foundation, has the potential to turn Ireland into the hub<br />

for internationally traded localisation and translation<br />

services for the worldwide non-profit sector, whose<br />

revenues exceed US$1.9 trillion in the USA alone.<br />

Indeed, as the international independent review panel<br />

stated in its review of <strong>CNGL</strong> in July <strong>2012</strong>, “<strong>CNGL</strong>’s goal<br />

of making significant societal impact is illustrated by the<br />

potentially ground-breaking social localization concept,<br />

embodied in a spinout (The Rosetta Foundation).”<br />

Achievements (grouped by category)<br />

Operational Management and Governance<br />

} On-going research collaboration with <strong>CNGL</strong> ILT<br />

Track, e.g. in the area of MT; with DCM, e.g. in the<br />

area of personalisation; and SF, e.g. in the area of<br />

interoperability and metadata<br />

} On-going active engagement with LOC’s international<br />

collaborators<br />

} On-going engagement with world-leading standards<br />

associations, including Unicode and the World Wide Web<br />

Consortium (W3C)<br />

} Participation and programme input to the world’s<br />

leading localisation events, including Localization<br />

World and GALA<br />

} Engagement with the non-profit sector, including<br />

the Irish umbrella body for non-profits, The Wheel,<br />

representing close to 2,000 Irish non-profit enterprises,<br />

and Dóchas, representing the Irish-based overseas aid<br />

organisations<br />

} Collaboration with Translate.org.za, a developer of one of<br />

the most widely used open source localisation tools,<br />

and its principal, Dwayne Bailey<br />



Research Programme<br />

LOC1<br />

} Continuing contributions to the development of<br />

the XLIFF standard of OASIS (members of Technical<br />

Committee)<br />

} RDF-XLIFF mapping (contacts from LREC: Thierry<br />

Declerck, Tobias Wunner and John McCrae (DERI),<br />

also Dr. David Lewis SF, Dr. Alex O’Connor SF)<br />

} Successful PhD defence by Lucía Morado Vázquez,<br />

who is now employed as a postdoctoral researcher at<br />

the University of Geneva, Switzerland<br />

} Significant contributions to the knowledge of content<br />

development for global markets<br />

} Filing of invention disclosures<br />

} Integration of LOC1 components into the overall<br />

SOLAS framework<br />

} Successful PhD defence by Lorcan Ryan<br />

LOC2<br />

} Significant contribution to the knowledge of<br />

localisation resource evaluation and interoperability<br />

} Further research and implementation of LocConnect<br />

component<br />

} Integration of LOC2 components into the overall<br />

SOLAS framework<br />

} Further research and assessment of quality and<br />

consistency in Translation Memories, including<br />

successful PhD defence by Joss Moorkens of his thesis<br />

“Measuring Consistency in Translation Memories:<br />

A Mixed-Methods Case Study”. This work involved<br />

significant industry input and has led to substantial<br />

industry interest in its outcomes.<br />

} Further research and implementation of Localisation<br />

Service Descriptor component<br />

} Further research and implementation of Quality<br />

Assessment Engine component<br />

} Cross-strand collaboration with ILT1<br />

} Asanka Wasala writing up PhD thesis<br />

} Filing of invention disclosures for several research<br />

demonstrators<br />

LOC3<br />

} Research and implementation of Workflow<br />

Recommendation Engine component<br />

} Investigation of industrial workflows<br />

} Investigation of data transfer practices for Term Bases<br />

and Glossaries<br />

} PhD students approaching write-up stage, reporting<br />

very significant results on their research into<br />

localisation service descriptors, strategies to surpass<br />

the established concept of locale in localisation, and<br />

community-based social localisation workflows.<br />

} Filing of invention disclosures for several research<br />

demonstrators<br />

} Integration of LOC3 components into the overall<br />

SOLAS framework<br />

LOC Overall<br />

} Collaboration with the United Nations Internet<br />

Governance Forum (IGF)<br />

} Support for the University of Limerick and the United<br />

Nations Economic Commission for Africa’s launch of<br />

the MSc in Multilingual Computing and Localisation<br />

to be delivered through distance learning and co-hosted<br />

by UNECA at its Information Training Centre<br />

for Africa (ITCA) in Addis Ababa, Ethiopia. The aim of<br />

the programme is to promote African languages in the<br />

Information Society.<br />

} Invention Disclosures for Localisation Knowledge<br />

Repository (LKR), Automated Optimal Machine<br />

Translation System Selection supporting XLIFF, XLIFF<br />

Phoenix, LocConnect and Workflow Recommender.<br />

} Further development of SOLAS integrated system and<br />

branching into SOLAS Productivity and SOLAS Match<br />

products.<br />

Industry Partner Engagement<br />

} Alchemy Software Development played an integral<br />

part in the <strong>2012</strong> LRC Summer School, preparing<br />

and presenting materials related to mobile device<br />

localisation.<br />

} LRC <strong>Annual</strong> Conference featured contributions from<br />

industry partners Symantec and Welocalize, as well<br />

as presentations from <strong>CNGL</strong> spinout The Rosetta<br />

Foundation.



• The LRC Best Thesis Award 2012 was sponsored by Symantec Ireland.

Tech Transfer Activities

The following invention disclosures have been filed with the Technology Transfer Office at UL:

2006167 – Deed of Assignment of Intellectual Property Rights (17022012), 2012/February – Localisation Knowledge Repository (LKR)

2006166 – Deed of Assignment of Intellectual Property Rights (17022012), 2012/February – Automated Optimal Machine Translation System Selection Supporting XML Localization Interchange File Format (XLIFF)

2006165 – Deed of Assignment of Intellectual Property Rights (17022012), 2012/February – XLIFF Phoenix

2006164 – Deed of Assignment of Intellectual Property Rights (17022012), 2012/February – LocConnect – Localisation Orchestration Framework

2006163 – Deed of Assignment of Intellectual Property Rights (17022012), 2012/February – Workflow Recommender

• Localization World Paris and Seattle – The Rosetta Foundation was invited to exhibit at both the European and the American events.

• LOC postdoctoral researcher Dr. David Filip launched FEISGILTT 2012, a new federated event dedicated to Interoperability Standardization in Globalization, Internationalization, Localisation, and Translation Technologies.

• Launch of the AGIS Africa initiative, in collaboration with The Rosetta Foundation, the United Nations Economic Commission for Africa, GALA and the University of Limerick.

Plans

The Next Generation Localisation area will work with The Rosetta Foundation and with the United Nations Internet Governance Forum (IGF) working group Dynamic Coalition for a Global Open Localization Platform: Localization for All on the further development of SOLAS, leading to its deployment as an Open Localisation Platform supported by the SF1 and SF2 CNGL research areas.

Education and Outreach

• The 10th Annual LRC Internationalisation and Localisation Summer School took place from 13 to 15 June 2012 in Limerick. It focused on mobile application development and localisation and was presented by a mix of CNGL industrial partners (Alchemy Software Development), PhD students, academic staff and UL students.

• Localisation Focus – The International Journal of Localisation was published and sent to libraries and subscribers, and was also made available online free of charge at www.localisation.ie. Direct download links were sent to all members of the LRC mailing list (approximately 2,500).

• LRC XVII, the LRC annual conference, took place on 20-21 September 2012 in Limerick and also featured the CNGL Innovation Showcase 2012.

• Launch and support of the MSc in Global Computing and Localisation by distance learning.

Reinhard Schäler (second from left) presented on “Opportunities and Growth in Africa” at GALA 2012 in Monaco in March. Pictured with Reinhard are Renée Salzman (GALA Co-Founder), Hans Fenstermacher (GALA CEO) and María José Velasco (GALA founding member and Mondragón Lingua).

The Rosetta Foundation was launched in 2009 by the President of UL and is supported by CNGL through formal decisions of its Integration and Management Committees. It works with more than 2,600 volunteers in over 40 languages and with 50 partner organisations, including Special Olympics Europe Eurasia/International, Trócaire, the London School of Hygiene and Tropical Medicine, and Ruhama. IP developed by CNGL has been transferred to The Rosetta Foundation to support its technology platform, and The Rosetta Foundation has become a UL Campus Company. The Rosetta Foundation has already provided very valuable feedback to CNGL research, which has resulted in joint publications (ASLIB Translation and the Computer, 2010). The platform serves as a test bed for the SOLAS research carried out in LOC, specifically SOLAS Match, allowing the viability of SOLAS Match to be demonstrated and the improvements it achieves in the localisation process to be measured. This work has been documented in at least two MSc theses in 2012 that were not funded by CNGL.

The LOC track publishes ‘Localisation Focus – The International Journal of Localisation’.

In line with this research, the platform is being open-sourced with the aim of making SOLAS Match the de facto platform for non-industrial, non-profit and non-market localisation and translation activities, driving social localisation as defined by LOC researchers and supporting the social agenda of CNGL.

The results of this work were noted in the CNGL Final Review, where the independent international review panel commented that “The most visible success here is without a doubt the Rosetta Foundation spinoff, which pioneers a novel, comprehensive localization model for organizations seeking to translate content for underserved communities. The panel feels this accomplishment has great societal impact that transcends the boundaries of Ireland and even the EU.”

Testing is under way, focusing on demonstrating the viability of the SOLAS Match platform with a subset of projects within The Rosetta Foundation.

On-going priorities in this area are: the publication of specifications and an invitation for “open” contributions (for example from the African Network for Localisation; the Centre for the Development of Advanced Computing (CDAC) in Pune, India; the micro-lending organisation KIVA; and other organisations such as TechSoup Global or Zafen); the creation of the component repository; and the demonstration of “open” interoperability in collaboration with industry associations such as GALA and Interoperability Now.

Improvements will be demonstrated and measured both for particular tasks, e.g. MT and MT post-editing, and for the overall process, e.g. user interaction evaluation, (re-)use of localisation knowledge and flexible workflow specification supported by the platform. Each section in each LOC work package is associated with one particular aspect of this demonstrator, and each will contribute to improving the performance of the overall platform, with component technologies from LOC sections connected to the localisation platform. This will enable us to measure the impact of these technologies on the performance of the overall localisation workflow.

The LOC research track will support The Rosetta Foundation in the development and deployment of SOLAS, which in turn will provide highly valuable feedback from a concrete implementation scenario into the scientific research carried out within LOC and other CNGL areas. Now that the platform can be demonstrated, additional component technologies from other CNGL research areas are being considered for integration.

In CNGLII, the LOC research strand of CNGL will be subsumed under the Translation and Localisation Challenge (T&L) and the Interoperability and Analytics Challenge (I&A). T&L2 will focus on Social Localisation and continue the research and development of the service-oriented localisation architecture solution (SOLAS) initiated under CNGL as the bulk localisation demonstrator. The work will focus on identifying and resolving current problems around the correct identification of resources for localisation (SOLAS Match), as well as on identifying and developing an adequate support infrastructure of language technologies and resources in an ad hoc and dynamic setting.


Systems Framework

Strand Name: Systems Framework

AREA CO-ORDINATOR: DR. SATURNINO LUZ

Participant Names and Affiliation

Industrial Collaborators

Prof. Andy Way – Capita
Mr. Takeshi Fukunaga – Dai Nippon Printing
Mr. Dag Schmidtke – Microsoft
Dr. Alexander Troussov – IBM
Mr. David Clarke – Welocalize
Dr. Olga Beregovaya – Welocalize
Mr. Phil Richie – VistaTEC
Dr. Fred Hollowood – Symantec
Mr. Jason Rickard – Symantec

International Collaborators

Dr. Alistair Edwards – University of York
Dr. Masood Masoodian – The University of Waikato
Prof. Michael McTear – University of Ulster
Prof. Chris Mellish – University of Aberdeen

Faculty

Prof. Julie Carson-Berndsen – University College Dublin – SF1
Dr. Gavin Doherty – Trinity College Dublin – SF1, SF2
Prof. Josef van Genabith – Dublin City University – SF2
Dr. David Lewis – Trinity College Dublin – SF2 Leader
Dr. Saturnino Luz – Trinity College Dublin – SF1 Leader
Mr. Reinhard Schäler – University of Limerick – SF1, SF2
Prof. Vincent Wade – Trinity College Dublin – SF1, SF2

Postdoctoral Researchers

Mr. Dominic Jones – Trinity College Dublin – SF2
Dr. Nikiforos Karamanis* – Trinity College Dublin – SF1
Dr. Anton Gerdelan* – Trinity College Dublin – SF2



PhD Students

Mr. John McAuley – Trinity College Dublin – SF2
Mr. John Moran – Trinity College Dublin – SF2
Ms. Ilana Rozanes – Trinity College Dublin – SF1
Ms. Anne Schneider – Trinity College Dublin – SF1
Mr. Stephan Schlögl – Trinity College Dublin – SF1

Technicians

Mr. Leroy Finn – Trinity College Dublin – SF2

* Affiliated postdoctoral researchers

Funding

2012 Funding from SFI: €342,924

SFI TIDA ‘iOmegaT: Instrumented CAT Tool’ (12/TIDA/I2424): €92,273 over 12 months

2012 Funding from Other Sources

EC FP7 Coordination and Support Action Language Technology Web: €149,280 to TCD over two years (UL, DCU, Microsoft and VistaTEC are also partners)



Research Overview: Systems Framework (SF)

Goals

The Systems Framework track seeks to ensure that basic language technologies can be effectively integrated to form next generation localisation systems that meet high standards of usability, and to facilitate the use of such technologies in advanced research prototypes that creatively explore novel design spaces for interactive systems. SF aims to produce a system services architecture and a system design methodology to support the integration of linguistic technologies, localisation workflow and digital content management. The ultimate goal is to enable rapid, iterative and instrumented integration of industrial software and academic research prototypes, and to support their evaluation by providing: a software integration platform based on open standards; guidelines and tools for developing workflows and applications using this platform; and methods for iterative prototyping and user studies. From a research perspective, SF focuses on the study of users (and potential users) of language technology-enabled systems in real work contexts, on the investigation of novel interaction design techniques, and on system support for the development of speech- and language-enabled applications.

The work packages SF1 and SF2 pursue these objectives from different perspectives. The Interaction Design Work Package (SF1) deals primarily with human-factors research and explores the design of novel systems incorporating language technology. The Systems Service Architecture Work Package (SF2) has a dual role in CNGL: it coordinates and facilitates practical systems integration for the CNGL Demonstrator Programme, and it conducts research into service integration and service management techniques. These two roles are interrelated in that the Demonstrator Programme, owing to its size and variety, offers a unique interoperability and evaluation laboratory that operates over a wide range of linguistic and digital content processing services and applications.

The specific goals for 2012 were to (1) provide continued support for the demonstrator activities and incorporate lessons learned into service and metadata models that are contributing to international standards activities; (2) analyse and build theories based on the workplace studies conducted in various work contexts, with a focus on the work of medical interpreters; (3) report research results in journal and conference publications; (4) further disseminate and evaluate the Wizard-of-Oz system; and (5) conduct further evaluation of language technologies in interactive contexts (e.g. speech-to-speech systems). These goals were satisfactorily met: several papers were published, substantive contributions were made to extensions of the ITS (W3C) and XLIFF (OASIS) standards, and three PhD theses were submitted.

Research Barriers and Methodologies to Address Them

As noted in previous reports, we have identified a gap between language technology and systems development methodologies (covering both systems and interaction design issues) that seems to extend beyond the usual issues in putting together demonstrator systems and research prototypes. The research done by SF has attempted to bridge this gap.

John Moran of TCD presents at the AMTA-2012 Workshop on Post-editing Technology and Practice (WPTP) in San Diego, USA.


From an interaction design perspective, SF has investigated methods for incorporating work contexts into the analysis of requirements for natural language generation systems, with special focus on MT technology in a localisation context (Doherty, Karamanis and Luz, 2012; Karamanis, Luz and Doherty, 2011); ethnographic methods for the study of multilingual situations, with a focus on the work of medical interpreters (Rozanes, Luz and Doherty, 2011); and rapid prototyping and evaluation methods for interactive language technologies, such as systems that combine speech input/output with MT (Schlögl et al., 2011; Schneider and Luz, 2011).

From a software engineering perspective, SF has successfully promoted the adoption of a Service Oriented Architecture (SOA) approach across CNGL, integrating different technologies into a range of applications spanning the use scenarios addressed by CNGL. This has allowed individual components, tools and platforms to retain autonomy in their choice of software technology, provided they adhere to some common interoperability models. The overall strategy was to employ existing standards as much as possible, defining a common model based on standard W3C languages addressing provenance, internationalisation and the semantic web, and to interface with standards used in localisation, such as XLIFF, for integration with work done in the LOC strand. SF also examines specific service management issues in the context of its support for the demonstrator systems, namely support for the management of online communities, the unobtrusive monitoring of post-editing effort across different configurations of SMT and other localisation support technologies, and interoperable linked data formats for content management and localisation integration.

Year 5 Progress

Dominic Jones presents his PhD work at the national final of the Thesis in 3 competition.

SF activities in Year 5 consisted largely of analysing and publishing the results of research conducted over the last 18 months, with a focus on completing PhD theses. Several papers have been written. Two papers appeared in major HCI journals (van der Sluis, Luz et al., 2012; Doherty, Karamanis and Luz, 2012), one will appear in the proceedings of the ACM Computer Supported Cooperative Work conference (Kane, Toussaint and Luz, 2013), and four others are in preparation (for submission to the journals ‘Interacting with Computers’ and ‘Computer Supported Cooperative Work’ and to the ACL 2013 and Interact 2013 conferences). Research on service, content and language resource interoperability was published at WWW 2012 (Filip, Lewis and Sasaki, 2012) and LREC 2012 (Lewis et al., 2012), and a paper and a book chapter on community management were also published. Several presentations and talks were given, including at the CNGL review meeting and the CNGL Scientific Committee Meeting, as well as at industrially focused events such as the Multilingual Web workshops. Two international workshops were organised. The first, on Multilingual Web and Linked Open Data, was held in Dublin in June. The second, FEISGILTT 2012, was organised in collaboration with LOC and co-located with Localization World in Seattle in September; it focused on standardisation and interoperability issues around globalisation, internationalisation, localisation and translation. SF members also contributed significantly to the CNGLII proposal.



Accomplishments, Impact and Plans

We have concluded the study on support for collaborative aspects of translation work and on the impacts of language technology, particularly MT, and reported this work in two major journal papers, in the Machine Translation journal and the Computer Supported Cooperative Work journal (Karamanis, Luz and Doherty, 2011; Doherty, Karamanis and Luz, 2012). The work on language generation in cross-cultural settings has also been concluded and published in the premier HCI journal (van der Sluis et al., 2012). The Wizard-of-Oz platform has been fully deployed online and released as an open source project. Experiments and interviews to assess wizard performance and the usability of the platform have been successfully concluded, and a paper is currently in preparation for submission to the journal ‘Interacting with Computers’. Community management trials with Symantec were successfully completed, as were MT post-editing trials with professional translators at Welocalize and with crowdsourced translators. The Welocalize trials resulted in post-editing machine translation (PEMT) analytics solutions being licensed to Welocalize, while the crowdsourcing trials demonstrated strong MT improvement resulting from selective training based on PEMT logging. Further details of on-going activities and plans for future work are given below.

Fieldwork for Language Technologies in Work Contexts

SF PhD student Ilana Rozanes has completed the elaboration of a grounded theory of the work of medical interpreters. This work spanned two years of extensive observation of medical interpreters at work, interviews, data collection and data coding. The results are currently being written up for publication in journal and HCI conference papers, which will explore the data and theory in the context of designing language-technology applications for use by interpreters in medical settings, drawing on CNGL technology.

Wizard-of-Oz Platform

The CNGL Wizard-of-Oz platform (WebWOZ) was made available online in 2011 (http://www.webwoz.com). Since then it has been used to gather data on patterns of tool usage and on wizard performance. Specific projects (e.g. HCI coursework projects) were designed to assess the tool under controlled conditions. A paper describing the results of these activities is being written for submission to a journal. Stephan Schlögl has submitted his PhD thesis, and his viva is scheduled for January 2013. The WebWOZ software has now been released under an open source licence, and we plan to use and extend it in CNGLII.

Figure 4: CNGL Wizard-of-Oz Homepage

Interaction Design for Speech-to-Speech Translation

Complementing our published work (Schneider and Luz, 2011; Schneider), a further experiment has been conducted on the use of speech recognition in an instructional task. Results are currently being written up for a paper to be submitted to ACL 2013 or SIGdial 2013. The overall aim of this line of research is to assess potential mismatches between intrinsic and extrinsic evaluation methods for component language technologies. In this case we focused on how well (or otherwise) word error rate, an intrinsic metric, correlates with extrinsic measures of task success, and proposed alternative methods for identifying potential communication difficulties in automatic speech recognition (ASR)-mediated communication.
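To make the intrinsic/extrinsic comparison in the speech-to-speech work concrete, the sketch below computes word error rate (WER) as the word-level edit distance between a reference transcript and an ASR hypothesis, divided by the number of reference words. It then correlates per-utterance WER with task success coded as 1 (succeeded) or 0 (failed); with a binary outcome, Pearson's r is the point-biserial correlation. The utterances, outcomes and function names are invented for illustration, and this is a minimal sketch rather than the analysis used in the experiment.

```python
from math import sqrt
from typing import List


def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must not be empty")
    # At the start of row i, prev[j] is the edit distance between ref[:i-1] and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        cur = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            cur[j] = min(prev[j] + 1,         # deletion
                         cur[j - 1] + 1,      # insertion
                         prev[j - 1] + cost)  # substitution or match
        prev = cur
    return prev[-1] / len(ref)


def pearson(xs: List[float], ys: List[float]) -> float:
    """Pearson correlation; with 0/1 ys this is the point-biserial r."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return cov / sqrt(sum((x - mx) ** 2 for x in xs) * sum((y - my) ** 2 for y in ys))


if __name__ == "__main__":
    # Invented instructional-task utterances: (reference, ASR output, task succeeded?)
    trials = [
        ("turn the red valve left", "turn the red valve left", 1),
        ("open the blue door", "open a blue floor", 0),
        ("press the green button now", "press green button now", 1),
    ]
    wers = [word_error_rate(ref, hyp) for ref, hyp, _ in trials]
    success = [ok for _, _, ok in trials]
    print("WER per utterance:", wers)
    print("r(WER, task success) =", round(pearson(wers, success), 3))
```

In real data, a weak correlation would suggest that WER alone misses which recognition errors actually disrupt the task. In this toy sample, for example, the deleted "the" costs as much per word as the substitutions that change the meaning.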



Figure 5: End-to-end localisation workflow monitoring using RDF Provenance, through integration of CMS-LION, SOLAS, Matrex and other components

Linked Data for Content Management and Localisation Integration

Integration of Content Management Systems (CMS) and localisation workflows remains a challenge with no established standard. Content is increasingly generated and revised in a continuous stream that includes user-generated content, yet existing push-based integration between content management and localisation systems constrains both agility in supporting new content processing modes (e.g. crowdsourcing) and upstream feedback from translators. This activity therefore provides a standard linked data-oriented approach to agile multilingual content management. The approach supports both push- and pull-based CMS-Localisation interactions via a common Resource Description Framework (RDF) Provenance Model. It is implemented in a system, CMS-LION, which populates the RDF Provenance Model from exchanges of XLIFF files within a localisation workflow operated by LOC’s SOLAS platform. This model uses the RDF Open Provenance Vocabulary to log all CMS-Localisation interactions and content transformations. This allows standard SPARQL queries to be used for workflow monitoring and for extracting translation corpora from fresh post-edits, enabling immediate retraining of an MT engine based on MaTrex from DCU and the bi-text corpora processing chains developed in the PANACEA project. Over several retraining iterations, this approach showed a strong 25% improvement in BLEU scores within a single crowdsourced translation job.
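The corpus-extraction step can be illustrated with a small sketch. Assume each post-edited target is logged as an entity that `prov:wasGeneratedBy` a post-editing activity and `prov:wasDerivedFrom` the MT output, which was itself derived from the source segment. Extracting retraining pairs then amounts to following the derivation chain back to the source; in SPARQL 1.1 this would be a property path such as `prov:wasDerivedFrom+`. The sketch makes several substitutions, all assumptions: it uses W3C PROV-O terms in place of the Open Provenance Vocabulary named above, the `ex:` vocabulary (`ex:PostEditing`, `ex:text`) and the short entity names are hypothetical, and it works over an in-memory list of triples rather than a triple store.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Tuple

PROV = "http://www.w3.org/ns/prov#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
EX = "http://example.org/loc#"  # hypothetical vocabulary for this sketch

Triple = Tuple[str, str, str]


def extract_retraining_pairs(triples: List[Triple]) -> List[Tuple[str, str]]:
    """Return (source text, post-edited text) pairs for every entity generated
    by an ex:PostEditing activity, tracing prov:wasDerivedFrom back to its root."""
    index: Dict[str, Dict[str, List[str]]] = defaultdict(lambda: defaultdict(list))
    for s, p, o in triples:
        index[s][p].append(o)

    def values(subject: str, predicate: str) -> List[str]:
        # .get() avoids creating empty entries in the defaultdicts
        return index.get(subject, {}).get(predicate, [])

    def first(subject: str, predicate: str) -> Optional[str]:
        found = values(subject, predicate)
        return found[0] if found else None

    def root_of(entity: str) -> str:
        # Follow prov:wasDerivedFrom to the end of the chain; 'seen' guards against cycles.
        seen = {entity}
        parent = first(entity, PROV + "wasDerivedFrom")
        while parent is not None and parent not in seen:
            seen.add(parent)
            entity = parent
            parent = first(entity, PROV + "wasDerivedFrom")
        return entity

    pairs = []
    for entity in list(index):
        for activity in values(entity, PROV + "wasGeneratedBy"):
            if EX + "PostEditing" in values(activity, RDF_TYPE):
                source_text = first(root_of(entity), EX + "text")
                target_text = first(entity, EX + "text")
                if source_text is not None and target_text is not None:
                    pairs.append((source_text, target_text))
    return pairs


if __name__ == "__main__":
    log = [
        ("src:1", EX + "text", "Click Save."),
        ("mt:1", PROV + "wasDerivedFrom", "src:1"),
        ("mt:1", PROV + "wasGeneratedBy", "act:mt"),
        ("mt:1", EX + "text", "Klicken Sichern."),
        ("act:mt", RDF_TYPE, EX + "MachineTranslation"),
        ("pe:1", PROV + "wasDerivedFrom", "mt:1"),
        ("pe:1", PROV + "wasGeneratedBy", "act:pe"),
        ("pe:1", EX + "text", "Klicken Sie auf Speichern."),
        ("act:pe", RDF_TYPE, EX + "PostEditing"),
    ]
    print(extract_retraining_pairs(log))
```

Only the post-edited text is paired with the original source. The raw MT output in the middle of the chain is skipped, which is what makes the extracted pairs usable as fresh training data.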

To demonstrate this approach, a crowdsourced translation application has been implemented with a Drupal front end, through which users can create and contribute to translation jobs in XLIFF. An RDFLogger component converts the XLIFF document into RDF provenance statements and logs these to a triple store. The Sesame triple store used here is an open source Java framework for storing, querying and reasoning with RDF. An RDF Provenance Visualiser has been implemented for exploring the outcomes of process steps. This platform was also used in a prototype integration with translation quality assurance data gathered by translation review processes conducted by VistaTEC.
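The logging step performed by the RDFLogger can be sketched as follows. This is a hypothetical illustration, not the CNGL component. It assumes XLIFF 1.2 input and uses W3C PROV-O terms (`prov:Entity`, `prov:Activity`, `prov:wasGeneratedBy`, `prov:wasDerivedFrom`) rather than the Open Provenance Vocabulary named above. It also invents an `ex:text` property and an `http://example.org/job/` IRI scheme, and it writes N-Triples text instead of posting to a Sesame store.

```python
import xml.etree.ElementTree as ET

XLIFF_NS = {"x": "urn:oasis:names:tc:xliff:document:1.2"}
PROV = "http://www.w3.org/ns/prov#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
# Hypothetical vocabulary and base IRI, used only for this sketch.
EX = "http://example.org/loc#"
BASE = "http://example.org/job/"


def iri(value: str) -> str:
    return f"<{value}>"


def literal(text: str) -> str:
    """Quote a string as an N-Triples literal, escaping special characters."""
    escaped = (text.replace("\\", "\\\\").replace('"', '\\"')
               .replace("\n", "\\n").replace("\r", "\\r"))
    return f'"{escaped}"'


def xliff_to_ntriples(xliff_text: str, job_id: str, activity_id: str) -> str:
    """Describe each XLIFF 1.2 trans-unit with PROV statements: the target is an
    entity generated by the given activity and derived from the source."""
    root = ET.fromstring(xliff_text)
    activity = iri(f"{BASE}{job_id}/activity/{activity_id}")
    lines = [f"{activity} {iri(RDF_TYPE)} {iri(PROV + 'Activity')} ."]
    for unit in root.iterfind(".//x:trans-unit", XLIFF_NS):
        uid = unit.get("id")
        source = unit.find("x:source", XLIFF_NS)
        target = unit.find("x:target", XLIFF_NS)
        src = iri(f"{BASE}{job_id}/unit/{uid}/source")
        # itertext() flattens inline markup (e.g. <g>) into plain text
        lines.append(f"{src} {iri(RDF_TYPE)} {iri(PROV + 'Entity')} .")
        lines.append(f"{src} {iri(EX + 'text')} {literal(''.join(source.itertext()))} .")
        if target is None:
            continue  # untranslated unit: no generated target to log yet
        tgt = iri(f"{BASE}{job_id}/unit/{uid}/target/{activity_id}")
        lines.append(f"{tgt} {iri(RDF_TYPE)} {iri(PROV + 'Entity')} .")
        lines.append(f"{tgt} {iri(EX + 'text')} {literal(''.join(target.itertext()))} .")
        lines.append(f"{tgt} {iri(PROV + 'wasGeneratedBy')} {activity} .")
        lines.append(f"{tgt} {iri(PROV + 'wasDerivedFrom')} {src} .")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    sample = """<xliff version="1.2" xmlns="urn:oasis:names:tc:xliff:document:1.2">
  <file source-language="en" target-language="de" datatype="plaintext" original="demo.txt">
    <body>
      <trans-unit id="1"><source>Hello world</source><target>Hallo Welt</target></trans-unit>
      <trans-unit id="2"><source>Say "hi"</source></trans-unit>
    </body>
  </file>
</xliff>"""
    print(xliff_to_ntriples(sample, "job42", "pe1"), end="")
```

Each target gets its own IRI per activity (the `activity_id` suffix). This keeps successive rounds, such as MT followed by post-editing, as distinct entities that later statements can link with `prov:wasDerivedFrom`.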



Figure 6: Petri – A community analytics tool developed for identifying and tracking community guru behaviour at Symantec

This experience, and the demonstration using CMS-LION and SOLAS, have enabled CNGL to promote a strong vision of end-to-end interoperability and monitoring of localisation workflows. This, in turn, has fed into new metadata definitions related to translation provenance in the new version of the Internationalization Tag Set being developed by the Language Technology-Web project via the W3C’s MultilingualWeb-Language Technology working group. Feedback on this approach has also been provided to the XLIFF Technical Committee and the W3C PROV working group. This capability has also positioned CNGL well for continued international collaboration at the intersection of linked data and language resource technology research, both through collaboration on workshops in the area and through two project proposals submitted for EU funding.

Visual Analytics for Online Communities

Visual analytics can help users to extract knowledge from massive amounts of data, make sound decisions based on evidence and increase their understanding of complex online processes. However, applications are generally developed with a focus on the researcher or the analyst and lack a clear context for the end user. This research investigates the potential of visual analytics for online communities. It has evaluated how to extract knowledge from communication data and represent it visually to support evidence-based decision making and the understanding of complex processes in online communities.

An initial visual analytics tool was developed for the Stack Exchange Super-User meta community. The tool visualises the community’s social and temporal interaction patterns and provides collaboration support in the form of visualisation bookmarking, view sharing and threaded discussion. Based on this experience, a revised tool was tailored to the community management requirements of customer support staff at Symantec, enabling an evaluation of innovative methodologies and tools for developing such a tool.

The tool, Petri, was designed to encourage a more analytical approach to online community management, based on cycles of observation and intervention. We conducted several interviews and design workshops with Symantec’s online community team to help formalise a set of requirements, which then informed the design. Petri enables community managers to analyse their community from multiple perspectives, shifting between phases of explorative and confirmative analysis, and to identify users who could prove valuable to the community over time. An explorative evaluation, conducted with five members of the Symantec community management team, found the visualisation tool to be both useful and usable. This work therefore proposes a new approach to online community management, built upon cycles of analysis and informed intervention. It is supported by the implementation of advanced visual analytics technologies and has established a set of design requirements that can be reused by other researchers interested in online community visualisation.

Instrumenting CAT Tools to evaluate Post-editing of SMT

Machine translation (MT) evaluation metrics based on n-gram co-occurrence statistics are cheap to run, and their value in comparative research is well documented. However, their value as a standalone measure of MT output quality is questionable. Manual methods of MT evaluation, in contrast, are expensive. This work is developing a low-cost means of acquiring MT evaluation data in an operationalised manner in a commercial post-edited MT context. To this end, OmegaT, a popular open source CAT tool, has been augmented to capture post-editing keystrokes and other CAT tool actions and to record them in an open XML log file so that they can be analysed by workflow managers. The resulting tool, termed instrumented OmegaT or iOmegaT, has been deployed at a large CNGL industrial partner, Welocalize. As a result, large quantities of translation process field data have been gathered from production tests in which individual translator speed was measured for segments translated using MT and for segments that were not (human translation). So far, over half a million words in approximately 60,000 sentences have been translated by more than 50 translators using iOmegaT, and this number is growing monthly as more productivity tests are carried out.

This uniquely large data set is currently being analysed to determine: whether automated MT metrics or string distance calculations correlate with post-editing (PE) time data; whether analysis of patterns in keystroke and other translation process (TP) field data provides insight into the MT post-editing process; whether features of source sentences that correlate with increased post-editing time across multiple languages can be identified; and what volume of PEMT data needs to be gathered to form reliable analyses of MT engines.

Industry Engagement and Future Plans

Prof. Felix Sasaki of DFKI and Dag Schmidtke of Microsoft Ireland confer with Dr. Mark Davis, President of the Unicode Consortium, via video link at the W3C Multilingual Web Workshop at TCD.

Strong industry engagement through deployment and<br />

trialling of tools has been conducted with Welocalize<br />

and Symantec, resulting in one technology licence.<br />

Further close collaboration is being undertaken together<br />

with LOC and ILT through the W3C’s MultilingualWeb-<br />

Language Technology working group. These and<br />

earlier engagements are resulting in on-going industry<br />

collaboration at the level of proposal writing, especially<br />

<strong>CNGL</strong>II, FP7 and Science Foundation Ireland/Enterprise<br />

Ireland Technology Innovation Development Award<br />

(TIDA). The WOZ and CMS-LION systems now also<br />

form core platforms for research in interactivity,<br />

interoperability and analytics in <strong>CNGL</strong>II.





Year 5 Demonstrator Programme<br />

Goals<br />

The <strong>CNGL</strong> Demonstrator Programme aims to promote and guide collaborative scientific work between <strong>CNGL</strong> partners and between research tracks; to showcase the relevance of <strong>CNGL</strong> research to industry and to society in general; and to provide regular milestones for assessing the collective progress and impact of <strong>CNGL</strong>.<br />

Research Challenges and Methods<br />

The Demonstrator Programme has achieved these objectives through a rolling programme of engagement by multiple teams of collaborating researchers across <strong>CNGL</strong> tracks, each addressing specific use scenarios in response to industry needs. Each team developed a demonstrator system iteratively and presented it at bi-annual showcases. The Demonstrator Programme balanced the scientific needs of individual researchers and their PhD topics, the diverse and evolving requirements of industry partners, and the Centre’s own goals of advancing collaborative research and IP commercialisation. It has also tracked and assessed the progress visible in the demonstrator systems and communicated it internally, to reviewers and advisers, and to industry and the general public at <strong>CNGL</strong> Localisation Innovation Showcase events. <strong>CNGL</strong> has carefully developed and resourced a flexible coordination structure that enabled the programme to address its challenges effectively and in a timely fashion.<br />

Work on advancing the demonstrator systems was conducted by demonstration teams whose members were drawn from across universities, research tracks and industry partners. Demonstrator systems must exhibit potential industrial impact, but they are also vehicles for scientific collaboration and instances of model-driven interoperability. These three factors therefore form the basis for evaluating demonstrator systems. Evaluations are recorded in order to track the increasing maturity of systems across the Demonstrator Programme, as well as their links to the peer-reviewed publications produced by the Centre.<br />

Achievements in Year 5 (<strong>2012</strong>)<br />

The Demonstrator Programme accomplished four major<br />

milestones in <strong>2012</strong>:<br />

1. In July, a showcase of selected demonstrator<br />

systems was presented to a panel of distinguished<br />

international reviewers as part of <strong>CNGL</strong>’s Year 5<br />

Review site visit.<br />

2. The Programme’s work on Metadata Semantics for Next Generation Localisation, and its instantiation in demonstrator systems, is receiving international recognition and is exerting a coordinated impact on both of the major international standardisation efforts in localisation, namely the W3C Multilingual Web – Language Technology working group and the OASIS XLIFF Technical Committee.<br />

3. A large set of the demonstrator systems was<br />

showcased at a final public event at the Localisation<br />

Research Centre conference in Limerick in<br />

September.<br />

4. Several of the demonstrator systems have successfully<br />

graduated to the <strong>CNGL</strong> Commercialisation<br />

Programme and are now receiving seed funding from<br />

various sources to further develop their commercial<br />

potential.<br />

Another important achievement for the Demonstrator Programme has been demonstrating the real benefits of active resource curation and its role in improving the quality and performance of language technology components.<br />

As Figure 7 shows, the <strong>CNGL</strong> Demonstrator Programme collectively covers a range of content processing scenarios, from community management to multimodal interaction to the personalised discovery and consumption of content. Processes to both translate and slice/recompose content are core to these activities. The role of language technology (e.g. text analytics, machine translation and speech processing) in these scenarios is supported by the active curation of language resources.



Figure 7: Content Processing Scenarios covered by the Demonstrator Programme<br />

Curating language resources as a secondary output of content processing activities promises significant, progressive improvement of language technology components through the systematic targeting and reuse of these resources. Such active curation has already significantly improved Statistical Machine Translation (SMT) performance in a crowd-sourced translation project, where rapid SMT retraining was made possible by the active curation and reuse of human translation corrections. This work has secured SFI TIDA feasibility funding to develop further rapid SMT retraining techniques.<br />

Demonstrator Showcases<br />

As mentioned above, the demonstrator systems were showcased at two events in <strong>2012</strong>: the <strong>CNGL</strong> Year 5 Review (July) and the <strong>CNGL</strong> Localisation Innovation Showcase at the Localisation Research Centre conference (September). The following provides an overview of the key systems showcased at these events.<br />

This first set of demonstrators highlights the <strong>CNGL</strong> commercialisation outputs that have emerged from the Demonstrator Programme:<br />

• Text Classification for Bulk Localisation Review [Digital Linguistics/TCD – ILT/SF]: Phil Ritchie (Digital Linguistics) and Gerard Lynch demonstrated Review Sentinel, a software-as-a-service offering from <strong>CNGL</strong> spinout Digital Linguistics for scalable and consistent language quality management. A direct licensing and commercialisation outcome of <strong>CNGL</strong> academic/industrial collaboration, it reduces linguistic review costs while ensuring the highest levels of style and brand consistency.<br />

• Wripl – Personalisation-as-a-Service across Websites [TCD – DCM]: Kevin Koidl and Brian Gallagher demonstrated non-invasive cross-site personalisation. This work improves a user’s experience as they browse across multiple content management systems (CMSs) to complete a particular task. As the user moves from site to site, the system learns about their task and suggests to each CMS which content to recommend. Wripl has been developed with the support of Science Foundation Ireland (SFI)/Enterprise Ireland (EI) Technology Innovation Development Award (TIDA) funding, and its development is now supported by the Enterprise Ireland Commercialisation Development Fund.



• Emizar – Personalised Retrieval Composition and Presentation [TCD/Symantec – DCM]: Dr. Alex O’Connor presented the Personalised Multilingual Customer Care Adaptive Portal. This system combines formal technical content with social content harvested from user forums to present users with tailored, task-specific solutions to customer support and technical support problems, across languages and levels of expertise. The system has already received SFI/EI TIDA funding and is being trialled with industrial reference customers.<br />

• KantanMT – Moses on the Cloud [DCU/Xcelerator – ILT]: Tony O’Dowd (Xcelerator) and Dr. Declan Groves demonstrated how ILT-based machine translation technology and know-how are being leveraged commercially to provide cloud-based MT services. Tony O’Dowd has formed a DCU spin-out, Xcelerator, to commercialise this technology, which already has more than 400 mid-sized LSP clients. Xcelerator secured US$1.2 million in funding from a syndicate of investors, which will allow the creation of 25 new development jobs.<br />

• iOmegaT – An Instrumented CAT Tool and its use in a Commercial Machine Translation Study [TCD/Welocalize – SF]: John Moran demonstrated how his instrumented version of the CAT tool OmegaT was used to collate post-editing time data during commercial MT evaluation projects conducted by Welocalize. Such data could prove vital in assessing the quality of machine translation and the post-editing effort it requires, and in comparing the performance of different MT offerings in a commercial translation setting. This technology has also now secured SFI/EI TIDA feasibility funding for 2013.<br />

• PLuTO – Facilitating Patent Search with Machine Translation [DCU/FP7 – ILT]: Dr. John Tinsley demonstrated work by the EU-funded PLuTO project, which has developed in-browser software that allows patent search professionals to carry out personalised translations on the fly. The technology uses statistical machine translation adapted to the patent domain and deployed as a web service. It is now supported by the EI Commercialisation Development Fund for further development at DCU.<br />

• SOLAS Match – Leveraging community translation [UL/Rosetta Foundation – LOC]: Dr. Eoin Ó Conchúir demonstrated how SOLAS Match is used as a collaborative localisation platform for community-based volunteer translators. It is being rolled out at the Rosetta Foundation, a non-profit <strong>CNGL</strong> spin-out, where it supports a cohort of 6,000 volunteer translators.<br />

• Rapid SMT Re-training [DCU/TCD/MLW-LT/PANACEA – ILT/SF]: Dr. Antonio Toral (affiliated project PANACEA) and Leroy Finn showed how a statistical machine translation (SMT) engine is retrained using post-edits from non-professional translators. <strong>CNGL</strong> provides CMS-LION, which offers crowd-sourced post-editing integrated with Content Management Systems (CMS). PANACEA provides a web service for machine translation and workflows for retraining the SMT engine. This work has resulted in additional SFI/EI TIDA feasibility funding to develop more rapid SMT retraining techniques.<br />

The following demonstrators showcased a high degree<br />

of industrial engagement and impact:<br />

• Visual Analytics for the Management of Online Communities [TCD/Symantec – SF]: John McAuley showed how visual analytics can make the analysis of online interactions accessible to all members of an online community. This actively supports online communities in discussing and planning the evolution of their policies and processes, thereby increasing member engagement. The approach has been trialled through a tool that enables members of Symantec’s customer support communities to observe and gain insight into the behaviour of key community members, or ‘gurus’.<br />

• Multilingual User Modelling for Personalised Multilingual Information Retrieval [TCD/DCU/Microsoft – DCM]: M. Rami Ghorab demonstrated a framework for multilingual search personalisation. The framework allows different combinations of the functional elements of a personalised multilingual information retrieval system, such as user modelling, query adaptation, results adaptation and translation, to be delivered and evaluated. This demonstrator was advanced through a placement at Microsoft Ireland’s offices.



• CMS-LION/SOLAS Integration: Full Content Lifecycle Metadata Interoperability TestBed [UL/TCD/MLW-LT – SF/LOC]: Dr. David Filip demonstrated a unique platform for testing complex metadata designs that span process areas across the full multilingual content life cycle. He showed how an RDF-based provenance store is used between Web Content Management Systems (CMS) and XLIFF-based translation workflows. This demonstrates use cases for round-tripping Internationalisation Tag Set (ITS) metadata between content generation and publication in HTML5/XML and localisation processes in XLIFF, and therefore provides direct, testable input to the standardisation working groups currently developing ITS, XLIFF and HTML5.
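The round-tripping of ITS metadata described above can be illustrated for a single data category. This is a minimal sketch, not the TestBed's implementation: it assumes only the ITS Translate data category (expressed in HTML5 by the native `translate` attribute) and maps it onto the `translate` attribute of XLIFF 1.2 `trans-unit` elements. For brevity it emits one unit per text run, whereas a fuller mapping would keep a non-translatable span inline as a marked-up element inside a single unit. The class and function names, file name and language codes are hypothetical.

```python
from html.parser import HTMLParser
import xml.etree.ElementTree as ET

XLIFF_NS = "urn:oasis:names:tc:xliff:document:1.2"
VOID_TAGS = {"br", "img", "hr", "input", "meta", "link", "wbr"}


class TranslateAwareExtractor(HTMLParser):
    """Collects (text, translatable) pairs, honouring the HTML5/ITS translate attribute."""

    def __init__(self) -> None:
        super().__init__()
        self.stack = [True]  # translatability inherited from ancestors; default is "yes"
        self.segments: list[tuple[str, bool]] = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return  # void elements hold no text and have no end tag
        value = dict(attrs).get("translate")
        if value is None:
            self.stack.append(self.stack[-1])  # inherit the parent's setting
        else:
            self.stack.append(value.strip().lower() != "no")

    def handle_endtag(self, tag):
        if tag not in VOID_TAGS and len(self.stack) > 1:
            self.stack.pop()

    def handle_data(self, data):
        text = data.strip()
        if text:
            self.segments.append((text, self.stack[-1]))


def to_xliff(segments, source_lang="en", target_lang="de") -> str:
    """Emit one XLIFF 1.2 trans-unit per segment, carrying the translate flag across."""
    root = ET.Element("xliff", {"version": "1.2", "xmlns": XLIFF_NS})
    file_el = ET.SubElement(root, "file", {
        "original": "page.html",  # hypothetical source file name
        "source-language": source_lang,
        "target-language": target_lang,
        "datatype": "html",
    })
    body = ET.SubElement(file_el, "body")
    for i, (text, translatable) in enumerate(segments, start=1):
        unit = ET.SubElement(body, "trans-unit",
                             {"id": str(i), "translate": "yes" if translatable else "no"})
        ET.SubElement(unit, "source").text = text
    return ET.tostring(root, encoding="unicode")


parser = TranslateAwareExtractor()
parser.feed('<p>Press <span translate="no">Ctrl+S</span> to save.</p>')
xliff = to_xliff(parser.segments)
print(parser.segments)
print(xliff)
```

The return trip would read the `translate` flags back from the XLIFF file and re-apply them to the HTML5 content after translation, which is the kind of end-to-end behaviour a metadata testbed is designed to exercise.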

The final set of demonstrators highlights promising<br />

research directions that have influenced the focus of<br />

<strong>CNGL</strong>II:<br />

• MOODfinger – An Affective Search Engine [UCD – DCM]: Alejandra López-Fernández and Yanfen Hao presented their initial prototype of a search engine that retrieves texts expressing a certain mood for a given query and then ranks them according to the degree to which they exhibit this mood. As part of this work, an affective lexicon has been built that can help retrieve, filter and rank web content in the most emotionally useful ways. The affective qualities of content, especially user-generated content, underpin several research activities in <strong>CNGL</strong>II.<br />

• WebWOZ – A Wizard of Oz Platform [SF/ILT – TCD/UCD]: Stephan Schlögl demonstrated a web-based system that lets designers intervene dynamically online while testing user interactions with application prototypes that will later incorporate language processing components. This provides a flexible tool for rapidly iterating low-fidelity application prototypes using Wizard-of-Oz techniques. Stephan also discussed how ILT researchers at UCD used WebWOZ for user evaluations of their MySpeech system. Released as an open-source system, WebWOZ forms a key platform for multimodal interaction and dialogue research in <strong>CNGL</strong>II.<br />

• WinkTalk – Linking Facial Expressions to Expressive Synthetic Voices [UCD – ILT]: Éva Székely and Zeeshan Ahmed presented their work on using facial gestures to automatically select among expressive voice styles for synthetic voices and speech-generating devices. The expressive features of the synthetic voices represent dimensions of emotional intensity rather than distinct emotions. This work shows the potential for supporting affect-driven dialogue systems.<br />
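The MOODfinger search-then-rank behaviour described above can be sketched with a lexicon lookup. The report does not describe how MOODfinger builds its affective lexicon or scores texts. This sketch simply assumes a hand-made lexicon of word-to-mood strengths, keeps documents that contain every query term, and ranks them by the mean mood strength per token. The lexicon entries, documents and function names are all hypothetical.

```python
import re

# Hypothetical affective lexicon: word -> {mood: strength in [0, 1]}.
LEXICON: dict[str, dict[str, float]] = {
    "delighted": {"joy": 0.9},
    "happy": {"joy": 0.7},
    "sunny": {"joy": 0.4},
    "gloomy": {"sadness": 0.8},
    "tears": {"sadness": 0.6},
}

TOKEN = re.compile(r"[a-z']+")


def tokens(text: str) -> list[str]:
    return TOKEN.findall(text.lower())


def mood_score(text: str, mood: str) -> float:
    """Mean per-token strength of `mood`; words missing from the lexicon count as 0."""
    toks = tokens(text)
    if not toks:
        return 0.0
    return sum(LEXICON.get(t, {}).get(mood, 0.0) for t in toks) / len(toks)


def mood_search(query: str, documents: list[str], mood: str) -> list[str]:
    """Keep documents containing every query term, then rank them by mood strength."""
    terms = set(tokens(query))
    hits = [d for d in documents if terms <= set(tokens(d))]
    scored = [(mood_score(d, mood), d) for d in hits]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [d for score, d in scored if score > 0]  # drop texts with no trace of the mood


docs = [
    "A sunny holiday review",
    "Delighted and happy with my holiday",
    "Gloomy holiday, nothing but tears",
    "Holiday booking terms",
]
print(mood_search("holiday", docs, "joy"))
```

Averaging over all tokens favours short, emotionally dense texts. A production system would more likely learn lexicon weights from data and combine the mood score with a conventional relevance score.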

Metadata Semantics for Next Generation Localisation<br />

In addition to developing and showcasing a set of demonstrator systems, the Demonstrator Programme provides a basis for examining and modelling problems of interoperability across end-to-end content processing. The components used and integrated in the demonstrator systems derive from a number of different research and industrial communities, in which metadata typically either was not formally defined or was specified in a fragmented set of standards.<br />

The Metadata Group (MDG) was established to concentrate and integrate metadata knowledge from these different communities, including statistical machine translation and text analytics research, adaptive content and personalisation research, and localisation workflow and interoperability expertise. To address the widespread trend towards web-based content, and to offer a well-supported, community-neutral approach to the semantic modelling of metadata, the MDG used the standardised languages of the W3C Semantic Web initiative, specifically the Resource Description Framework (RDF). This allowed multiple existing metadata standards and component metadata requirements to be incorporated into a single model, which demonstrates the interrelation and utility of such interlinked metadata and provides a focus for building wider consensus on a semantic model that combines content management, localisation, natural language processing and content adaptation/personalisation. Such an approach enables existing service-oriented system integration to be enhanced through semantic annotation of differing interfaces, e.g. those used in SOLAS or for SMT integration. It also supports linked-data provenance annotation for the pull-based interoperability approach used in the CMS-LION system.
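Linked-data provenance annotation of the kind mentioned above can be illustrated with a short sketch. The report does not give the MDG's actual model. This sketch assumes the W3C PROV-O vocabulary (`prov:wasDerivedFrom`, `prov:wasGeneratedBy`, `prov:used`, `prov:wasAssociatedWith`) and a hypothetical `example.org` namespace and segment identifiers. A plain Python list stands in for an RDF store, and a simple filter stands in for the query a downstream component would issue when it "pulls" provenance rather than having it pushed to it.

```python
PROV = "http://www.w3.org/ns/prov#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
EX = "http://example.org/cngl/"  # hypothetical namespace, for illustration only

Triple = tuple[str, str, str]


def provenance_triples(source: str, translation: str, activity: str, agent: str) -> list[Triple]:
    """Record that `translation` was produced from `source` by `activity`, run by `agent`."""
    src, tgt, act, agt = (EX + name for name in (source, translation, activity, agent))
    return [
        (src, RDF_TYPE, PROV + "Entity"),
        (tgt, RDF_TYPE, PROV + "Entity"),
        (tgt, PROV + "wasDerivedFrom", src),
        (tgt, PROV + "wasGeneratedBy", act),
        (act, RDF_TYPE, PROV + "Activity"),
        (act, PROV + "used", src),
        (act, PROV + "wasAssociatedWith", agt),
        (agt, RDF_TYPE, PROV + "Agent"),
    ]


def to_ntriples(triples: list[Triple]) -> str:
    """Serialise IRI-only triples in N-Triples syntax, one statement per line."""
    return "\n".join(f"<{s}> <{p}> <{o}> ." for s, p, o in triples)


def derived_from(triples: list[Triple], entity: str) -> list[str]:
    """A downstream consumer 'pulls' provenance by querying the shared store."""
    return [o for s, p, o in triples if s == entity and p == PROV + "wasDerivedFrom"]


store = provenance_triples("segment/42/en", "segment/42/de", "mt-run/7", "smt-engine")
print(to_ntriples(store))
print(derived_from(store, EX + "segment/42/de"))
```

In practice an RDF library and a SPARQL endpoint would replace the list and the filter function. The point of the sketch is that each tool only needs to agree on the shared vocabulary, not on each other's interfaces.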



Figure 8: WinkTalk Workflow<br />

With strong input from the demonstrator teams, the MDG has developed semantic models of content and process taxonomies that span the content processing scenarios (see Figure 7) supported by the Demonstrator Programme. These provide a broad and validated semantic model intended to guide future interoperability solutions across the Global Intelligent Content space.<br />

This broad view of interoperability, based on semantic metadata and its mapping to existing standards, has allowed <strong>CNGL</strong> to influence international standardisation efforts. In addition to UL’s established participation in the OASIS XLIFF Technical Committee, in <strong>2012</strong> UL, TCD, DCU, Microsoft and VistaTEC, together with several international academic and industrial collaborators and with the support of EU funding, founded a new W3C working group on Multilingual Web – Language Technology. This working group addresses the interoperability challenges involved in integrating content management systems, localisation systems and machine translation services. Interoperability use cases being addressed include CMS-based content translation and quality assurance, CMS-LSP metadata round-tripping, and content metadata for machine translation training and on-demand content translation. The consortium is led by DFKI (Germany) and includes other academic experts, a CMS vendor (Cocomore), and several LSPs and language technology providers (Moravia, Enlaso, LinguaServ, ]Init[, Logrus, Tilde and Lucy Software); it has also attracted participation from large localisation clients including Adobe, SAP, Intel and IBM.<br />

Input from the Demonstrator Programme has taken the form of integration between CMS-LION, SOLAS, MaTrEx, PANACEA MT training services and localisation quality assurance from Digital Linguistics and VistaTEC. UL (LOC) and TCD (SF) have also been instrumental in driving round-trip scenarios between ITS in HTML5/XML files and XLIFF-based workflows, thereby helping to harmonise parallel specification activities in the MLW-LT working group at the W3C and the XLIFF Technical Committee at OASIS, as well as contributing to each group individually as editors and co-chairs.



Figure 9: Overview of the Metadata Group’s semantic modelling and its influence on international standards<br />

The Metadata Group chaired and organised the inaugural FEISGILTT <strong>2012</strong> interoperability and standards harmonisation workshop, co-located with Localization World in Seattle in October, which played a key role in this harmonisation; the workshop will be repeated at Localization World in London in 2013. In addition, the Metadata Group organised, in collaboration with the MLW-LT working group, a workshop in the Multilingual Web series on the role of Linked Open Data in the development of the multilingual web. This, together with committee involvement in the Multilingual Semantic Web workshop in Boston and the Multilingual Linked Open Data for Enterprises workshop in Leipzig, shows that the <strong>CNGL</strong> Metadata Group is playing a significant role in guiding the convergence of language and localisation technologies with the linked data cloud. This role will continue in <strong>CNGL</strong>II through the Interoperability and Analytics theme as well as through proposed new EU projects.



Industry Partnerships and Technology Transfer<br />

Overview<br />

For the last twenty years the localisation industry has focused on delivering valuable solutions that adapt content for specific geographic regions and cultures. The next twenty years will be about inventing real-time solutions that operate across the global content value chain to transform content into small, actionable bits of information personalised to specific individuals, regardless of their current location. To accommodate this industry shift, the <strong>CNGL</strong> research programme has expanded to address key aspects of the end-to-end content value chain.<br />

Knowledge transfer within the Centre operates under industry-standard Collaborative Research and IP Agreements. The IP Agreement was signed by all parties in May 2008, while the Collaborative Research Agreement was signed in May 2009 at an event held at the IBM campus in Dublin. The Collaborative Research Agreement clearly defines how intellectual property generated by the Centre is managed and ultimately commercialised.<br />

<strong>CNGL</strong> is working with its industrial partners to deliver a range of solutions across the global content value chain that provide consistently fine-grained analysis and services to an ever more empowered and demanding group of global consumers.<br />

As <strong>CNGL</strong>’s fifth year draws to a close, we can report<br />

progress on multiple fronts, particularly in our<br />

commercialisation and industry outreach efforts. During<br />

the past year the <strong>CNGL</strong> Centre Management team has<br />

placed significant emphasis on maturing and deepening<br />

relationships with our current industry partners, as<br />

well as engaging with the broader ecosystem. At the<br />

same time, our Intellectual Property portfolio and<br />

commercialisation pipeline have come together and<br />

are demonstrating significant market potential. To date<br />

<strong>CNGL</strong> spinouts have raised in excess of €1.25M in<br />

venture capital funding and are projecting the creation of<br />

25+ private-sector jobs in the coming year.<br />

In <strong>2012</strong> <strong>CNGL</strong> continued its successful Localisation Innovation Showcase series, which has enjoyed consistently strong attendance since its launch in 2009. The event, which attracts upwards of 100 attendees, is an opportunity to showcase emerging <strong>CNGL</strong> innovations.<br />

In addition, the event has become a catalyst for<br />

an expanding array of interactions between <strong>CNGL</strong><br />

researchers and practitioners from the broader industrial<br />

ecosystem.<br />

<strong>CNGL</strong> Spinout Showcase at Symantec’s offices in Ballycoolin, Dublin<br />

As a commercially focused research centre, <strong>CNGL</strong> depends upon its industrial partners to provide candid guidance regarding the research agenda and to continually assess our progress towards key project milestones. Industrial partners have representatives on every significant management committee within the <strong>CNGL</strong> organisational structure; this provides them with formal top-down communication channels through which to influence the research agenda. Furthermore, our corporate engagement strategy emphasises one-on-one reciprocal relationships between academic researchers in <strong>CNGL</strong> and their corporate equivalents, which provide equally important and effective bottom-up communication channels.



Current Industrial Partnerships<br />

<strong>CNGL</strong> currently has 10 diverse corporate partners who maintain a strong commitment to the long-term success of our research efforts. Our partners include multinational companies such as DNP, IBM, Microsoft and Symantec, as well as indigenous and regional SMEs including Alchemy, SDL, SpeechStorm, Applied Language Solutions, Welocalize and VistaTEC.<br />

The diversity of our partners reflects both the challenges facing <strong>CNGL</strong> and the importance of our research to the Irish economy and the global marketplace. Successfully realising <strong>CNGL</strong>’s objectives will not only help drive the development and productisation of novel early-stage technologies but also help solidify Ireland as the centre of excellence for multilingual localisation research and development.<br />

Alchemy<br />

Alchemy Software Development is one of the world’s foremost localisation technology providers. Founded as an Irish SME in 2000, the company grew rapidly and in 2008 completed a merger with Translations.com, a leading provider of software, website and enterprise-wide localisation services as well as localisation-related technology products.<br />

Alchemy’s initial 5-year commitment to <strong>CNGL</strong> was valued at €630K, comprising software licences and consulting expertise. The company has already contributed its full complement of software to our research tracks, valued at over €600K. In addition to software licences, Alchemy personnel have dedicated a significant number of hours to working directly with <strong>CNGL</strong> staff.<br />

During <strong>2012</strong> Alchemy Software Development was particularly active in the use of machine translation technology and was a key supporter of the successful SFI TIDA application that secured additional funding for research prototyping in the area of rapid machine translation retraining.<br />

Capita Translation & Interpreting<br />

(Previously Applied Language Solutions)<br />

Capita Translation & Interpreting became a full member of <strong>CNGL</strong> in January <strong>2012</strong> through its acquisition of Applied Language Solutions, which had itself previously acquired original <strong>CNGL</strong> partner Traslán. The company employs more than 150 staff worldwide and provides language solutions to customers in over 90 countries, in more than 200 languages. Traslán’s initial 5-year commitment to <strong>CNGL</strong> was valued at €958K, comprising software licences and consulting expertise. Applied Language Solutions has taken over this commitment and is already making significant contributions in the form of translation memories. One of the key benefits of <strong>CNGL</strong> membership is talent acquisition: access to highly skilled researchers. Applied Language Solutions has hired three <strong>CNGL</strong> researchers to help the fast-growing translation services provider further improve its industry-leading service through focused development of its machine-assisted translation solution.<br />

TCD winners of EI/SFI Technology Innovation Development Awards (TIDA) <strong>2012</strong> include Prof. Séamus Lawless, Dr. Alex O’Connor and Prof. Vincent Wade of <strong>CNGL</strong>



DNP<br />

Founded in Japan in 1876, Dai Nippon Printing (DNP) has grown to become one of the world’s leading comprehensive printing companies. Drawing on its significant expertise in managing global multilingual content distribution, DNP has developed a unique vision of the future of multilingual, multimodal digital media. Because the company anticipates both the coexistence of paper and digital media and the creation of new forms of media, DNP’s participation in <strong>CNGL</strong> is of particular strategic importance to its long-term objectives.<br />

Despite the distance, DNP is actively involved in the strategic direction of the Centre. During <strong>2012</strong> DNP sent representatives to Dublin to discuss commercialisation strategy and hosted a <strong>CNGL</strong> delegation to discuss <strong>CNGL</strong>’s new research programme.<br />

IBM<br />

IBM is one of the world’s leading technology and service providers, dedicated to helping clients deliver business value by becoming more efficient and competitive through business insight and information technology. As a multinational firm, IBM takes a globally integrated approach to innovation, with a network of more than 60 software development and research laboratories that explore, test and support a wide range of emerging technologies. IBM first set up operations in Ireland over 50 years ago, and since then Ireland has become the hub of IBM’s worldwide research into linguistic technologies. Furthermore, the recently established IBM Dublin Centre for Advanced Studies (CAS) has made Human Language Technologies one of its core research priorities. IBM launched the LanguageWare project in 2001 with the vision of creating a componentised linguistic platform with applications across the company’s entire product portfolio.<br />

LanguageWare is now the most broadly used linguistic technology across IBM.<br />

Over the initial five years of <strong>CNGL</strong>’s operation, IBM committed a total of €8.65M to the programme: €7.7M in software licences and 1.75 FTEs valued at €950K. To date we have integrated €6.9M worth of IBM software licences.<br />

Microsoft<br />

Founded in 1975, Microsoft is the global leader in software, services and solutions that help people and businesses realise their full potential. The company first set up operations in Ireland in 1985 and has steadily expanded its base of activity, now employing almost 2,000 full-time and contract staff. As a company that localises products and services into more than 60 languages, Microsoft regards the need for integrated enterprise and personalised localisation tools as one of the fundamental challenges facing each of its business units. The company’s participation in <strong>CNGL</strong> provides our researchers with a unique industry perspective on the challenges of international product development.<br />

Microsoft has already delivered its full originally proposed contribution to the research tracks: translation memories valued at over €2M, which support researchers in both Bulk Enterprise Localisation and Personalised Multilingual Customer Care. During <strong>2012</strong> Microsoft filled two intern positions with <strong>CNGL</strong> researchers, and it continues to play an active role on the industrial committee.<br />

SDL<br />

SDL was founded in 1992 and has since grown to become one of the world’s foremost localisation providers for businesses with a global market presence. SDL is at the forefront of research and development in machine translation and global information management technologies. SDL’s industry-leading position in the translation supply chain offers <strong>CNGL</strong> researchers unparalleled access to the tools and expertise used to serve over 400 of the world’s leading enterprises.



INDUSTRY PARTNERSHIPS AND TECHNOLOGY TRANSFER<br />

SDL’s initial commitment to <strong>CNGL</strong> included a localisation management system (Idiom Worldserver) valued in excess of €300K over the life of the project. This software was delivered during the first year of the Centre’s operation and formed the backbone of the baseline <strong>CNGL</strong> Demonstrator System.<br />

SpeechStorm<br />

SpeechStorm is a solutions provider that specialises in integrating market-leading voice platforms and speech recognition software with in-house application development expertise. The company is an SME based in Northern Ireland and serves a range of customers, including multiple government agencies, utility providers and financial services firms. Its expertise in integrating multiple voice platforms and speech recognition systems is particularly relevant to the research work-packages on Speech Technology within the Integrated Language Technologies track.<br />

SpeechStorm’s initial five-year commitment to <strong>CNGL</strong> was valued at €140K, comprising €80K worth of software services and 0.10 FTEs valued at €60K. To date, SpeechStorm has interacted primarily through direct research engagements with the Speech Technology groups at UCD and TCD.<br />

Symantec<br />

Symantec is a global leader in providing solutions that help individuals and enterprises assure the security, availability and integrity of their digital information. The Symantec Shared Engineering Services group is responsible for company-wide localisation management along with ongoing research and development efforts.<br />

Symantec’s primary areas of localisation-related research are machine translation, MT customer satisfaction studies, and techniques to enhance Rule-Based MT (RBMT) performance. During <strong>2012</strong> Symantec funded an additional PhD and postdoctoral research in the area of natural language parsing.<br />

Symantec has exceeded its initial commitments to <strong>CNGL</strong>, which were valued at €2.25M and comprised €2.0M worth of multiple translation memories and 2.15 FTEs valued at €225K. During <strong>2012</strong> the company made additional commitments of content and translation memory resources to <strong>CNGL</strong>. Symantec has also helped the researchers specify use scenarios for Demonstrator Systems and provided cash contributions to further research and development in Domain Adaptation and Personalised Multilingual Customer Care.<br />

VistaTEC<br />

VistaTEC is a supplier of premium-quality Translation, Linguistic Review and other language-related business services to leading high-tech companies throughout the world. Its sophisticated service delivery platforms contribute significant value to customers by providing them with enterprise solutions that are scalable, time-efficient, cost-effective, synergistic and innovative.<br />

As a prominent provider of Language Services, VistaTEC has committed to an extensive programme of Research and Development that keeps the firm at the forefront of the localisation industry and allows it to offer its customers the highest added value. VistaTEC is a founding Industrial Partner of the Centre for Next Generation Localisation. VistaTEC’s research activities during <strong>2012</strong> centred on Text Analytics for translation review, and the company contributed to this research by providing large test datasets and access to human translation quality review. This commitment from VistaTEC has resulted in the successful commercialisation of the research.



New Industrial Partnerships<br />

In the past year <strong>CNGL</strong> has continued its extensive programme of industry outreach, following a strategy of targeting specific industry verticals where <strong>CNGL</strong> has developed robust, rapidly transferable expertise. This has resulted in one-on-one discussions with an array of companies and a number of new industrial collaborations that extend the reach of our activities and help diversify funding to complement the initial investment made by Science Foundation Ireland. As a result of these activities, we were pleased to welcome Intel/McAfee to <strong>CNGL</strong>’s research consortium as our newest industrial partner, starting in 2013.<br />

The industrial outreach efforts of <strong>CNGL</strong> emphasise two main pillars:<br />

} Ireland as a centre of excellence for high-value R&D (top-20 globally) with a critical mass of industry participants and ancillary activities<br />

} <strong>CNGL</strong>’s critical mass of applied academic research expertise in localisation and related industries, which is valuable to partners and collaborators.<br />

Mr. Phil Ritchie presents Digital Linguistics at the <strong>CNGL</strong> Spinout Showcase in September <strong>2012</strong><br />

Welocalize<br />

Welocalize became the tenth industrial partner of <strong>CNGL</strong> during 2011. Founded in 1997, Welocalize is a privately held, venture-backed company with more than 500 employees in 11 offices located in the USA, UK, Ireland, Germany, China and Japan. Its clients include eight of the world’s top ten software and hardware companies. Welocalize provides next-generation translation supply chain management that delivers market-ready translated content when and where users demand it, at higher output, at a faster pace and at an affordable price. Welocalize supports organisations throughout the entire global content lifecycle, from authoring and product development, through translation and quality assurance, to complete business process outsourcing and market validation.<br />

In conjunction with our industry outreach efforts, we have launched the <strong>CNGL</strong> Collaboration Framework, which provides mechanisms for new partners to engage with the Centre. The framework is designed to foster the flow of information among trusted partners while respecting the intellectual property obligations set forth by the <strong>CNGL</strong> Collaborative Research Agreement. It sets out three broad classes of collaboration: Full Members, Collaborators and Associates.<br />

Figure 10: <strong>CNGL</strong> Collaboration Framework<br />

Welocalize’s contribution to <strong>CNGL</strong> will take the form of software development resources and researcher access to GlobalSight, a collaborative, flexible and sustainable translation management system. During <strong>2012</strong> Welocalize was a key supporter of the successful TIDA application, which secured €98K in funding for research into the rapid retraining of machine translation systems.



Full Members<br />

Full Members are industrial and academic partners who have agreed to be bound by the terms of the <strong>CNGL</strong> Collaborative Research and IP Agreements. Full Membership is available on a limited basis to third parties who have a long-term strategic interest in <strong>CNGL</strong> and the wherewithal to contribute substantial resources to ongoing research activities within the Centre. Full Membership provides preferential IP access, committee membership, and direct access to researchers and staff within <strong>CNGL</strong>.<br />

Associate Members<br />

Associate Membership provides a springboard for organisations that may be interested in establishing deeper ties with <strong>CNGL</strong>. In exchange for a small membership fee, Associates are granted an array of benefits, the most noteworthy being access to the pre-screened <strong>CNGL</strong> publication stream. While Associates are not granted preferential access to IP generated in <strong>CNGL</strong>, this group is expected to play a critical role in the commercialisation and licensing of emerging technologies.<br />

Commercialisation<br />

<strong>CNGL</strong> is entering its fifth year with a rich pipeline of<br />

business opportunities. Previously, in order to support<br />

the maturation of our commercial pipeline, the<br />

management of <strong>CNGL</strong> placed significant emphasis on<br />

developing the Centre’s entrepreneurial ecosystem. In<br />

<strong>2012</strong>, as part of our Commercialisation Strategy, <strong>CNGL</strong><br />

initiated a comprehensive outbound effort to engage<br />

with the broader entrepreneurial ecosystem. This effort<br />

was made possible through the continued support of the<br />

Enterprise Ireland Commercial Development Manager<br />

(CDM) programme. The CDM programme has provided<br />

<strong>CNGL</strong> with a full-time staff member who focuses<br />

specifically on partnering strategies, open innovation<br />

initiatives, fund-raising and business development<br />

activities within the Centre.<br />

Mr. Tony O’Dowd of <strong>CNGL</strong> spinout Xcelerator Machine Translations<br />

discusses the KantanMT product with Mr. Steve Gotz, <strong>CNGL</strong><br />

Commercial Development Manager<br />

Collaborators<br />

Collaborators engage directly with <strong>CNGL</strong> on issues of strategic importance to them. Collaborators can be industrial or academic entities that are either (1) a <strong>CNGL</strong> Full Member that has sponsored a specific research project or (2) a legal entity not previously affiliated with <strong>CNGL</strong>. Collaborator projects are governed by separate, individual Collaborative Research, IP and Confidentiality Agreements, which provide a range of structural options. Although collaborators operate under separate agreements, integrating them under the broader <strong>CNGL</strong> umbrella is beneficial because it facilitates valuable interactions and the sharing of expertise.<br />

<strong>CNGL</strong> finished <strong>2012</strong> with two actively trading spinout companies: Xcelerator Machine Translation Solutions and Scream Technologies. To date these companies have raised a combined €1.25M in venture capital funding from an array of investors, including Delta Partners, Enterprise Ireland and two private family offices. The companies project the creation of over 25 private-sector jobs in the coming year.<br />

Scream Technologies<br />

Scream Technologies is a <strong>CNGL</strong> spinout company that<br />

specialises in creating synthetic voices from human<br />

actors, enabling companies to create human-sounding<br />

synthetic speech and control how it sounds. The service,<br />

which can run as a standalone installation, embedded<br />

solution or web application, has valuable applications in<br />

areas as diverse as video games, customer support and<br />

advertising.



Scream Technologies is currently located at DogPatch Labs Dublin, a startup incubator funded by Polaris Venture Partners. The company is promoted by Dr. Peter Cahill, a <strong>CNGL</strong>-funded researcher who joined the company full-time in <strong>2012</strong>. During <strong>2012</strong> Dr. Cahill was named one of Ireland’s top technology and startup leaders by well-known entrepreneurs Dylan Collins and Sean Blanchfield.<br />

Xcelerator Machine Translation Solutions<br />

The localisation services market generates over US$20BN in annual revenue and has a robust and resilient annual growth rate of 7.5%. However, while demand for translation services is surging, prices are under downward pressure and margins are shrinking. At the same time, customers demand shorter turnaround times for translation projects. Essentially, clients want more for less, faster.<br />

Professional translators need to explore new ways of improving productivity and reducing project turnaround times whilst maintaining exacting quality standards and linguistic consistency for their clients. Downward pressure on pricing and constraints on client budgets make this a daunting challenge. Xcelerator is a spinout, promoted by Tony O’Dowd, that is developing software solutions to help professional translators address these challenges head-on by improving quality and consistency and reducing project turnaround times and costs.<br />

SME Collaboration Spotlight

Reverbeo is a startup company using technology to help companies of all sizes grow their global audience. The company’s novel technology can harvest monolingual websites, translate them into multiple languages using a range of services including machine translation, crowdsourcing and professional translators, and then republish them with minimal effort.<br />

During <strong>2012</strong>, while the company participated in the NDRC Launchpad Programme, a team of <strong>CNGL</strong> researchers worked with the founders to help refine their minimum viable product and extend their product development roadmap.<br />

During 2013 <strong>CNGL</strong> is expanding the collaboration with Reverbeo, supported by Enterprise Ireland, and applying <strong>CNGL</strong> expertise to the challenge of domain-tuned machine translation systems.<br />

Beyond startups, <strong>CNGL</strong> research and expertise have helped a range of external companies that are launching new products and services. These collaborations have leveraged crucial Enterprise Ireland funding schemes (Innovation Partnerships, Innovation Vouchers, Commercialisation Fund) to bridge the gap between research and the market. During <strong>2012</strong> DigitalLinguistics, a <strong>CNGL</strong> licensee, launched its first product, ReviewSentinel, which leverages core <strong>CNGL</strong> research in text analytics to perform linguistic quality assurance testing automatically, in a scalable and cost-efficient manner.<br />

Intellectual Property Management<br />

Three agreements provide the legal framework within which <strong>CNGL</strong> operates. The Funding Agreement outlines the financial arrangements between SFI and the lead institution. The IP Agreement outlines how IP is managed within <strong>CNGL</strong>, and the Collaborative Research Agreement is the all-encompassing agreement on how the programme is governed and managed.<br />

One of the core missions of <strong>CNGL</strong> is excellence in research, advancing the state of the art through the dissemination of research results. At the same time, <strong>CNGL</strong> is required to protect valuable IP and make it available for commercial exploitation. Balancing these aims requires careful management, so our researchers operate under a publication code of practice. Before a paper is submitted to a conference, it is uploaded to a publication tracking system, which automatically emails it to the <strong>CNGL</strong> IP Committee for review of any valuable IP. <strong>CNGL</strong> is a large research centre that generates over 100 publications each year, and this review is one of the ways in which all partners and PIs can identify IP across all research tracks.



Another way to identify IP is through relationship-driven and event-driven audits. Formal mechanisms such as software disclosures, invention disclosures and publication reviews are in place. Nonetheless, one of the best ways to identify IP is through continuous engagement with the researchers through both informal and formal meetings across the four universities. This engagement enables the IP team to identify patterns of activity across the research streams before material is published. It also promotes awareness of IP at an early stage, before formal disclosures are made, and helps the commercialisation team bridge any gaps between research and industry.<br />

One of the mandates of <strong>CNGL</strong> is to diversify the funding base through affiliate collaborations. This presents certain challenges with regard to IP management. Nevertheless, a framework is in place that allows us to manage these collaborative projects in a way that protects the rights of <strong>CNGL</strong> members as well as our affiliated partners. This year saw the successful application of this framework across multiple EU FP7 projects, IRCSET- and EI-funded projects, and direct industry-funded engagements. The collaboration framework is designed to foster the flow and control of information between each affiliated project and the core <strong>CNGL</strong> programme, while respecting the IP obligations set forth by the original Collaborative Research Agreement (CRA).<br />

Spinouts panel at the <strong>CNGL</strong> Spring Scientific Committee Meeting,<br />

which took place in Dublin in May<br />

To facilitate the successful implementation of our commercialisation strategy, we have been at the forefront of developing internal platforms that allow us to better collect, identify and manage the IP generated by our researchers. One example is the roll-out of LabJam, a new system currently in beta, which is designed to provide a more detailed view into our research streams and activities and to give our industry partners and SFI visibility into our innovation pipeline.



Management and Governance<br />

Management Overview<br />

The Centre for Next Generation Localisation believes<br />

that clear and simple Management and Governance<br />

structures are essential to ensure the scientific,<br />

commercial and operational success of the Centre. Our<br />

Management and Governance structures are designed to<br />

support a world-class research environment based on:<br />

} simple, effective and efficient planning and decision<br />

making<br />

} clear responsibility<br />

} open and transparent communication structures<br />

} balanced and comprehensive representation and<br />

involvement of all partners and stakeholders<br />

} provision of a point of contact and procedures for conflict resolution<br />

} flexibility to respond quickly and appropriately<br />

to changing environments<br />

} structures and support for Intellectual Property<br />

management, Technology Transfer and commercial<br />

exploitation<br />

} regular appraisal of the scientific programme by<br />

international experts<br />

} regular appraisal of management and governance<br />

structures<br />

} reflecting best practice in management and<br />

governance of large collaborative research centres.<br />

The Centre Director, Prof. Josef van Genabith, provides overall scientific leadership and is responsible for the running of the Centre. A number of boards and committees support the Director in the management, integration and oversight of the Centre’s research and operations, following the principles set out above. In particular, the research efforts of the Centre involve a considerable amount of cross-site collaboration and interdependency between our four academic and ten industrial partners, which requires a strong emphasis on cross-site co-ordination.<br />

The overall management and governance of the Centre is organised as follows:<br />

Figure 11: <strong>CNGL</strong> Governance and Management Structure



MANAGEMENT AND GOVERNANCE<br />

Research Co-Ordination<br />

The <strong>CNGL</strong> research programme is organised in a hierarchy of research tracks, work-packages and sub-work-packages. The four main research tracks relate to work in Integrated Language Technologies (ILT), Digital Content Management (DCM), Next Generation Localisation (LOC) and Systems Framework (SF). Within these four research tracks, the research programme is organised into 11 main work-packages, with individual research projects then organised in 50 sub-work-packages. Following this structure, co-ordination of <strong>CNGL</strong> research activities operates across four interrelated levels:<br />

} CSET Coordination<br />

} Research Track Coordination<br />

} Main Work-package Coordination<br />

} Sub-Work-package Coordination<br />

Overall CSET Coordination is the responsibility of the<br />

Centre Director, Prof. Josef van Genabith. Research<br />

track coordination is the responsibility of the four Track<br />

Coordinators:<br />

} Integrated Language Technologies (ILT): Prof. Nick<br />

Campbell, TCD<br />

} Digital Content Management (DCM): Prof. Vincent<br />

Wade, TCD<br />

} Next Generation Localisation (LOC): Mr. Reinhard<br />

Schäler, UL<br />

} Systems Framework (SF): Dr. Saturnino Luz, TCD<br />

Each of the eleven main work-packages within the four research tracks has a work-package co-ordinator who liaises with the relevant research track leader. The structure of the four research tracks, 11 main work-packages and 50 individual sub-work-packages is shown below:<br />

Figure 12: <strong>CNGL</strong> Research Organisation



Integration Committee<br />

The <strong>CNGL</strong> research programme is highly collaborative, with two basic (ILT & DCM) and two applied (LOC & SF) research tracks and a demonstrator systems programme centred on shared use scenarios. Given the level of research co-ordination and integration across the four research tracks and main work-packages, and the level of integration involved in building demonstrator systems from research outputs, the <strong>CNGL</strong> Integration Committee is the main body overseeing the operations of <strong>CNGL</strong>, with particular emphasis on scientific matters. The Integration Committee is composed of the Centre Director (who chairs the committee), the Associate Director, all four track leaders, Prof. Julie Berndsen from UCD, and a representative of each industry partner, to ensure maximum engagement of industry partners in oversight of the research programme. The Integration Committee meets on a bi-monthly schedule, with additional ad hoc meetings called when necessary.<br />

Scientific Committee<br />

The <strong>CNGL</strong> Scientific Committee comprises all members of the Centre across all levels and functions. The full Scientific Committee typically meets twice a year in a two- or three-day plenary session to review and share research progress and outcomes. The meetings of the Scientific Committee also provide an opportunity for engagement with our International Collaborators and External Scientific Advisory Board.<br />

The inaugural <strong>CNGL</strong> Innovation Charette at the Spring Scientific<br />

Committee Meeting<br />

The <strong>CNGL</strong> Spring Scientific Meeting was held over two days (17th–18th May) at Chartered Accountants House near Trinity College Dublin. With participants from across the entire CSET and its Industry Partners, the Meeting focused on the past and future of language and content research as well as ways to further catalyse collaboration with industry. The Meeting included presentations on key scientific areas, including rapid-prototyping tools, personalised search using social media, and open-source localisation frameworks. It also featured demonstrations by <strong>CNGL</strong> spinout companies, a hands-on session on <strong>CNGL</strong>’s LabJam research activity platform, and the inaugural <strong>CNGL</strong> Innovation Charette. A charrette is an intense collaborative session that allows participants to work together in a close setting to discuss real-world challenges and potential solutions. Following a vigorous period of interaction, each team presented a three-minute pitch, and the audience then had the opportunity to “invest” in the best ideas. The charrette encouraged participants to imagine inspirational products that the Centre’s members could create with their knowledge, and it proved an excellent vehicle for fostering imaginative thinking.<br />

<strong>CNGL</strong> researchers and industrial collaborators share their research<br />

highlights at the <strong>CNGL</strong> Spring Scientific Committee Meeting<br />

Due to significant Centre-wide planning for <strong>CNGL</strong> II and preparations for the <strong>CNGL</strong> Localisation Innovation Showcase in September, the Centre did not host an Autumn Scientific Committee Meeting in <strong>2012</strong>.



Operational Management<br />

Centre Operations Team<br />

The day-to-day implementation of the Centre’s operational decisions and policies, as well as financial management, activity co-ordination, tracking and reporting, is carried out by the Centre Operations Team in close co-operation with the Centre Director. The Centre Operations Team is led by Dr. Páraic Sheridan and meets weekly with the Centre Director and Deputy Director to continually monitor and prioritise activities across all operational functions, including finance, human resources, reporting, system administration, and software and IP management.<br />

The composition of the Centre Operations team is as<br />

follows:<br />

• Dr. Páraic Sheridan, Associate Director<br />

• Ms. Hilary McDonald, Project Manager<br />

• Mr. Steve Gotz, Commercial Development Manager<br />

• Mr. Stephen Roantree, IP Manager (departed <strong>CNGL</strong> in Quarter 2 <strong>2012</strong>)<br />

• Ms. Sophie Matabaro, Centre Administrator (on maternity leave from September <strong>2012</strong>)<br />

• Mr. Joachim Wagner, Systems Administrator<br />

• Ms. Fiona Maguire, Financial Administrator<br />

• Ms. Eithne McCann, Centre Secretary<br />

• Ms. Cara Greene, Education and Outreach Manager<br />

• Ms. Laura Grehan, Marketing and Communications Officer<br />

Mr. Stephen Roantree left his position as <strong>CNGL</strong> Intellectual Property Manager during Quarter 2 to take up a Dublin-based senior management role with Lionbridge. Stephen now leads Lionbridge’s Quality and Innovation, Engineering, Testing, DTP and Web Publishing Groups, and he continues to engage with <strong>CNGL</strong>.<br />

In addition to the day-to-day work of the Centre Operations Team in executing <strong>CNGL</strong>’s operational policies and activities, several Management Boards and Committees set direction and priorities for the Centre’s various activities.<br />

Management Committee<br />

The Management Committee is <strong>CNGL</strong>’s decision-making body and is responsible for leadership, policy, strategy, resource allocation, performance monitoring and review, management of CSET membership, and conflict resolution. The Management Committee meets quarterly and is chaired by the Centre Director. Its membership is made up of the Centre’s Co-Principal Investigators; Industry Partner representatives are invited to participate in Management Committee meetings but do not hold a vote. The membership of the Management Committee for <strong>2012</strong> included:<br />

• Prof. Josef van Genabith, DCU (Director) [Chair]<br />

• Prof. Vincent Wade, TCD (Deputy Director)<br />

• Prof. Nick Campbell, TCD<br />

• Mr. Reinhard Schäler, UL<br />

• Dr. Saturnino Luz, TCD<br />

Education and Outreach Board<br />

The Education and Outreach Board provides leadership, policy and strategy, objectives and resource allocation for the Centre’s Education and Outreach Programme. It meets quarterly and reports to the <strong>CNGL</strong> Management Committee. The Board is chaired by the Education and Outreach Manager and consists of representatives of the academic participants with funded E&O Programmes (TCD and UL) and one nominee from the Centre’s industrial participants. The membership of the Education and Outreach Board in <strong>2012</strong> included:<br />

• Ms. Cara Greene, DCU [Chair]<br />

• Dr. Páraic Sheridan, DCU<br />

• Mr. Karl Kelly, UL<br />

• Dr. Seamus Lawless, TCD<br />

• Ms. Laura Grehan, DCU<br />

• Dr. Fred Hollowood, Symantec<br />



IP Management Board<br />

The IP Management Board manages the Intellectual Property of the Centre and facilitates Technology Transfer and commercial exploitation of IP generated by the Centre. It advises the Centre on all IP issues and, in particular, evaluates proposed publications and invention disclosures in accordance with the Centre’s IP agreement. The IP Management Board meets quarterly and consists of nominees from each of the participating university and industrial partners, plus the Centre’s Associate Director. It was chaired by the IP Manager, Mr. Stephen Roantree, until his departure from <strong>CNGL</strong> to Lionbridge in Quarter 2. Its academic membership draws both on the Research Leaders (Co-PIs) and on representatives of the respective University Technology Transfer Offices (TTOs). The IP Management Board reports to the Management Committee.<br />

Mr. Stephen Roantree, previously <strong>CNGL</strong> Intellectual Property Manager, now with Lionbridge Dublin<br />

Commercialisation Committee

The Commercialisation Committee promotes and oversees the Centre’s research commercialisation agenda, which is a core part of the Centre’s strategy. The Committee meets quarterly, and its meetings are co-located with meetings of the IP Management Board and the Industry Advisory Board at <strong>CNGL</strong> Industry Partner sites.<br />

Mr. Steve Gotz, <strong>CNGL</strong> Commercial Development Manager; H.E. Mr. John Neary, Ambassador of Ireland to Japan; Dr. Páraic Sheridan, Associate Director, <strong>CNGL</strong>; and Ms. Diane Foley, IDA Ireland Deputy-Director Japan. <strong>CNGL</strong> delivered a seminar to Japanese businesses in Tokyo in April, which was hosted by the Irish Ambassador to Japan and facilitated by IDA Ireland’s Japan Office.<br />

External Oversight<br />

Following SFI guidelines and best practice for the oversight and governance of large research centres, <strong>CNGL</strong> has two external advisory and oversight boards that meet regularly to review the scientific and operational progress of the Centre.<br />

External Scientific Advisory Board

The External Scientific Advisory Board reviews the long-term scientific direction, impact and progress of the Centre. It advises, challenges and provides guidance to the Management Committee on both the overall scientific goals and objectives of the Centre and the on-going management of the Centre. The External Scientific Advisory Board aims to meet bi-annually and to work in close co-operation with the Executive Committee and the Centre Director. The <strong>CNGL</strong> External Scientific Advisory Board consists of recognised world leaders from both academia and industry in the fields of Language Technology, Machine Translation, Speech, Adaptive Hypermedia, Information Retrieval, and Localisation.<br />

The External Scientific Advisory Board is chaired by an expert from the area of Localisation, Mr. Francis Tsang. Mr. Tsang is Director of Globalisation at Adobe Systems Inc. He is responsible for the strategy and delivery of all localised Adobe product releases and the development of tools and libraries in the internationalisation area. Mr. Tsang has spent the last twenty years building software for various international markets. He holds degrees in computing and business management.



The <strong>CNGL</strong> External Scientific Advisory Board actively<br />

participates in the bi-annual <strong>CNGL</strong> Scientific Committee<br />

meetings and reports back to the Centre Director<br />

and Management Committee. The board is currently<br />

composed of the following members:<br />

• Mr. Francis Tsang, Adobe Systems Inc., USA [Localisation] (Chair)<br />

• Dr. Andrew Bredenkamp, Acrolinx GmbH, Germany [Language Technology]<br />

• Prof. Lauri Karttunen, PARC, USA [Language Technology]<br />

• Prof. Makoto Nagao, President, NICT, Japan [Machine Translation]<br />

• Prof. Carol Espy-Wilson, University of Maryland, USA [Speech Technology]<br />

• Prof. Peter Brusilovsky, University of Pittsburgh, USA [Adaptive Hypermedia]<br />

• Prof. Elizabeth Liddy, Syracuse University, USA [Information Retrieval and NLP]<br />

• Dr. Mike Dillinger, Principal, TOPs Globalization Consulting<br />

External Oversight Board<br />

In accordance with SFI requirements, the President of DCU, the host institution, has appointed an External Oversight Board to help with the oversight and assessment of the Centre’s progress. The Oversight Board reports to SFI on a quarterly basis. It is composed of members drawn from the academic partners, a representative of the <strong>CNGL</strong> Industry Partners, and other independent external members. The board currently consists of the following members:<br />

• Mr. David MacDonald [Chair]<br />

• Prof. Josef van Genabith, Centre Director<br />

• Prof. Alan Harvey (VP Research, DCU)<br />

• Prof. Vinny Cahill (Dean of Research, TCD)<br />

• Mr. Gearóid Mooney (Enterprise Ireland)<br />

• Mr. Aidan Sweeney (IBEC)<br />

In addition to the full members of the External Oversight Board (which include Centre Director Prof. Josef van Genabith), <strong>CNGL</strong> is represented at quarterly meetings by:<br />

• Prof. Vincent Wade, Deputy Director<br />

• Dr. Páraic Sheridan, Associate Director<br />

The Oversight Board met quarterly during <strong>2012</strong> to review <strong>CNGL</strong>’s progress against its scientific and operational targets, to review its Key Performance Indicators (KPIs) and to report back to SFI.<br />

<strong>2012</strong> Significant Accomplishments<br />

In the fifth year of the Centre for Next Generation<br />

Localisation, the following management and governance<br />

accomplishments have been recorded:<br />

• <strong>CNGL</strong> successfully passed its SFI Final Review and<br />

succeeded in its application for a second cycle of<br />

funding from SFI. The Review and funding application<br />

appraisal were conducted over two days in July at<br />

Trinity College Dublin. The review panel, which<br />

comprised senior figures from industry and academia,<br />

assessed the Centre’s performance and future<br />

potential on a range of criteria, including scientific<br />

excellence and social and economic impact. In its<br />

report the panel stated that “<strong>CNGL</strong> successfully built<br />

the infrastructure for a fully functioning, professional<br />

research centre, including strong capabilities in<br />

overall research direction, reporting, professional<br />

administration, outreach, budget allocation, and<br />

more.” The panel also acknowledged the Centre’s<br />

“mature change management approach” and its<br />

“forward-thinking, strong IP management and tech<br />

transfer capability”, and “was impressed by the<br />

educational outreach at all levels”.<br />

• The Centre Operations Team performed excellently in the challenging task of final reporting for <strong>CNGL</strong>I, while also providing significant input into the preparation of the <strong>CNGL</strong>II proposal and coordinating the review panel’s Site Visit in July. The Site Visit included an exhibition of posters and demos of <strong>CNGL</strong> research to date. This substantial additional workload was managed while maintaining quality delivery of the Centre’s day-to-day operations and rolling out a number of new initiatives in the areas of education and outreach, commercialisation, reporting and finance.<br />



Demonstrator showcase at the <strong>CNGL</strong> SFI Site Visit in July at Trinity College Dublin<br />

• The Centre Operations Team has continued to adapt to the evolving needs of the Centre, with changes in <strong>2012</strong> reflecting in particular the greater emphasis on commercial engagement. The Associate Director and Commercial Development Manager spearheaded a coordinated campaign to attract new industrial collaborators. Supported by industry-facing Marketing and Communications resources, the campaign team stepped up the Centre’s presence at, and input into, key industry events in the Intelligent Content area and delivered pitches to high-priority targets. The campaign has led to Intel signing up as an Industry Partner for <strong>CNGL</strong>II and has also generated a number of other promising active leads.<br />

• The ‘Localisation Innovation Showcase’ event, co-located in Limerick with the LRC <strong>Annual</strong> Conference in September, was a huge success, drawing more than 70 industry representatives from companies based in Ireland and abroad. The Showcase event included 10 individual stations of <strong>CNGL</strong> demonstrator systems as well as a multitude of research posters, industry partner booths, and a display of education and outreach activities.<br />

Operational and management plans for the coming year focus on ensuring a smooth transition to <strong>CNGL</strong>’s second cycle of funding. Priorities include rolling out the Centre’s novel research programme centred on the Global Intelligent Content theme, attracting talented new recruits at all levels, establishing the Centre’s new Design and Innovation Lab, finalising and signing off on renewed IP and collaborative research agreements, and securing additional industry partners. The Centre Operations Team will also contribute significantly to running SIGIR 2013 – The 36th <strong>Annual</strong> Conference of the ACM Special Interest Group on Information Retrieval, which <strong>CNGL</strong> will co-host in Dublin in July-August 2013.<br />



Education and Outreach<br />

The <strong>CNGL</strong> Education and Outreach Programme encompasses a broad range of activities, from internal communications, professional development, public relations and marketing to public-facing projects and education programmes that foster the next generation of professionals in content-related industries. We aim to raise the profile of scientific research within Ireland by highlighting education and career opportunities in key areas of the content field. Through <strong>CNGL</strong>’s world-class research and commercialisation activities, we promote Ireland as a global leader in the localisation industry. Below is an overview of activities under each Programme.<br />

Figure: Overview of <strong>CNGL</strong> Education and Outreach (Reach and Impact; Education and Human Capital Development; Strategic Marketing and Communications)<br />

Education and Human Capital Development<br />

The aim of our Education Programme is to provide<br />

education and promote career opportunities in key areas<br />

of content intelligence, computer science and language<br />

technology. We aim to engage young people in these<br />

areas to build a strong Irish base of future computer<br />

scientists in content-related industries.<br />

<strong>CNGL</strong> offers a comprehensive suite of education programmes aimed at all age groups, ranging from courses for primary school students, secondary school programmes, and undergraduate and postgraduate programmes to internal professional development for <strong>CNGL</strong> researchers and staff. The ‘Education Programme Aims and Targets’ overview summarises the education programme’s aims for each target level.<br />

Education and Human Capital Development Highlights from <strong>2012</strong><br />

Fourth Level Education: <strong>CNGL</strong> supports a number of<br />

seminar series across individual component research<br />

disciplines, including a popular series with the National<br />

Centre for Language Technology, seminars hosted by<br />

each of the member research groups and the Dublin<br />

Computational Linguistics Research Seminars series.<br />

<strong>CNGL</strong> operates internal member-focused training programmes on presentation skills, Intellectual Property, commercialisation and entrepreneurship, and project management. <strong>CNGL</strong> also provides “101” sessions for all staff on key <strong>CNGL</strong> topics, and PhD students are given opportunities to undertake internships with industry partners.<br />

Eleven visiting MSc and PhD interns joined ILT<br />

over a period of five months in <strong>2012</strong>, under <strong>CNGL</strong>’s<br />

postgraduate internship programme. The programme<br />

enables students to gain valuable experience as part of a<br />

highly-regarded and continually-growing research centre.<br />

This year’s programme attracted interns from institutions in countries across the globe, including Italy, France, China and India.<br />

The internships covered a wide range of topics in Natural<br />

Language Processing and Machine Translation.<br />

Figure: Education Programme Aims and Targets, relating the programme’s aims (encourage ICT and language awareness; promote study of STEM disciplines; promote focus on <strong>CNGL</strong> research topics; prepare graduates for careers; career opportunities) to its target levels (Primary, Second, Third and Fourth Level)<br />

In partnership with DCU and the National Centre for<br />

Language Technology, <strong>CNGL</strong> was successful in a Marie<br />

Curie Mobility grant application for the EXPERT PhD<br />

Graduate School with a total of 15 PhD Marie Curie<br />

fellowships (two of them at DCU) and three postdoctoral<br />

researchers. EXPERT comprises DCU and five other<br />

university partners and five industry partners. It focuses<br />

on empirical approaches to (machine) translation, and<br />

as part of their training PhD students will spend time at<br />

DCU’s EXPERT university and industry partners.



Table 1<br />

<table>
<tr><th>Project name</th><th>Student</th><th>Supervisor</th><th>Track</th></tr>
<tr><td>Using Biometric Response to Locate Personally Interesting Digital Content</td><td>Robert Lis</td><td>Liadh Kelly</td><td>DCM</td></tr>
<tr><td>Implementing new methods for speech retrieval</td><td>Tom Mason</td><td>Liadh Kelly</td><td>DCM</td></tr>
<tr><td>Exploring Personalised and Collaborative Information Retrieval</td><td>Paul Redmond</td><td>Liadh Kelly</td><td>DCM</td></tr>
<tr><td>Visualisation of Topic Models</td><td>Conor O’Gorman</td><td>Liadh Kelly</td><td>DCM</td></tr>
<tr><td>Crowd-sourcing for query development and relevance judgment</td><td>Ciaran Porter</td><td>Liadh Kelly</td><td>DCM</td></tr>
<tr><td>Communications and Education</td><td>Siobhan O’Mara</td><td>Cara Greene/Laura Grehan</td><td>E&O</td></tr>
<tr><td>Facial recognition for real-time content personalisation (Kinect)</td><td>Thomas Dunne</td><td>Steve Gotz</td><td>CM</td></tr>
<tr><td>Facial recognition for real-time content personalisation (Kinect)</td><td>Emer Hedderman</td><td>Steve Gotz</td><td>CM</td></tr>
<tr><td>Building ontology-based content management (OCM) system</td><td>James Mark Hender</td><td>Yalemisew Abgaz</td><td>DCM</td></tr>
<tr><td>Generation of interactive infographics from Semantic and Open Data</td><td>Erika Duriakova</td><td>Alex O’Connor</td><td>DCM</td></tr>
<tr><td>Yodle – Generating Presentations from Wikipedia</td><td>Alla Kovaleva</td><td>Alex O’Connor</td><td>DCM</td></tr>
<tr><td>Query-biased summarization</td><td>Shane McQuillan</td><td>Gareth Jones</td><td>DCM</td></tr>
<tr><td>Communications and Education</td><td>Siobhan Swords</td><td>Cara Greene/Laura Grehan</td><td>E&O</td></tr>
<tr><td>Real-time Web Annotation</td><td>Kristo Mikkonen</td><td>Dominic Jones</td><td>E&O</td></tr>
</table>

Another exciting development on the fourth level education front was the announcement in November that University of Limerick’s MSc in Multilingual Computing and Localisation is to be delivered through distance learning and co-hosted by the United Nations Economic Commission for Africa at its Information Training Centre for Africa in Addis Ababa, Ethiopia. The aim of the programme is to promote African languages in the Information Society.<br />

Ms. Aida Opoku-Mensah of the United Nations Economic Commission for Africa (UNECA) speaks at the launch of University of Limerick’s MSc in Multilingual Computing and Localisation to be co-hosted by UNECA in Ethiopia.<br />

Finally, the LRC Best Thesis Award <strong>2012</strong> was presented in September to former <strong>CNGL</strong> PhD student Ben Steichen for his thesis “Adaptive Retrieval, Composition & Presentation of Closed-Corpus and Open-Corpus Information”. Katrin Drescher of award sponsor Symantec praised the scientific excellence and industrial relevance of Ben’s work.<br />

Third Level Education: The <strong>CNGL</strong> Undergraduate Internship Programme continued to attract top students in <strong>2012</strong>. The primary aim of the <strong>CNGL</strong> undergraduate internship is to offer exceptional undergraduate students the opportunity to participate in and contribute to exciting research projects at <strong>CNGL</strong>. The programme gives interns access to leading research facilities, and we aim to inspire these students to take the first step on the path to a research career. It is also an important opportunity to promote the taught Masters programmes at <strong>CNGL</strong> universities and to host interns at our industrial partners. The internships consist of six-month INTRA/Co-op placements and eight-week summer internships. Many of these interns go on to do <strong>CNGL</strong>-themed third- and fourth-year projects with <strong>CNGL</strong> supervisors.<br />

<strong>CNGL</strong> hosted ten undergraduate interns across a wide<br />

range of research areas. Table 1 shows the list of <strong>2012</strong><br />

Undergraduate Summer internships.<br />

<strong>CNGL</strong> is currently creating an online graduate brochure<br />

aimed at third level students with information on the<br />

Taught Masters and PhD programmes available in each<br />

of the <strong>CNGL</strong> universities. The brochure also includes<br />

profiles of our graduated PhD students and former<br />

postdoctoral researchers. The profiles trace the education and career paths of <strong>CNGL</strong> alumni since they left <strong>CNGL</strong>.<br />

Some of the 450 second level students who completed the <strong>CNGL</strong>-supported<br />

‘ComputeTY’ programme at DCU in January <strong>2012</strong><br />

<strong>CNGL</strong> continued to support the ComputeTY <strong>2012</strong><br />

Programme at DCU. ComputeTY students select<br />

one of two streams: Web Design or Introduction to<br />

Programming. The overall content offers a broad<br />

range of computing skills from the creative aspect of<br />

website design to the problem-solving challenges of the<br />

programming stream. 450 students attended the course<br />

over 4 weeks in January <strong>2012</strong> with the same number due<br />

to complete the course in January 2013. Since its launch<br />

in 2005, ComputeTY has been completed by almost<br />

3,500 Transition Year students from Dublin schools.<br />

The programme has a strong track record of recruiting<br />

students to study computing at third level.<br />

<strong>CNGL</strong>’s undergraduate interns showcase the outcomes of their work<br />

at a poster and demo display at DCU<br />

Second Level Education: Secondary school students are a key demographic for the education programme, with more than 1,500 secondary school students engaging with <strong>CNGL</strong> education programmes and competitions.<br />

<strong>CNGL</strong> aims to attract students to study fields related to<br />

content intelligence by running programmes that foster<br />

key problem-solving skills that are needed for<br />

this industry.<br />

The outstanding success of the Education Programme is<br />

the <strong>CNGL</strong> All Ireland Linguistics Olympiad (AILO). Over<br />

3,500 secondary school students from 167 schools in<br />

the Republic of Ireland and Northern Ireland have taken<br />

part in AILO since the first competition in 2009. The<br />

competition challenges secondary school students to<br />

apply logic and computational thinking to solve complex<br />

puzzles in unfamiliar languages. Past participants have<br />

gone on to pursue studies in computer science, maths<br />

and linguistics at third level, which suggests that the<br />

competition is meeting its goal of fostering the next<br />

generation of problem solvers.<br />

More than 400 students from 44 schools in 23 counties<br />

competed in the preliminary round of AILO <strong>2012</strong>. The top<br />

100 performers were allocated a <strong>CNGL</strong> researcher, who<br />

acted as a tutor for the national final in March at DCU.



The top four individual students went on to represent<br />

Ireland at the International Linguistics Olympiad (ILO) in<br />

Slovenia in July <strong>2012</strong>.<br />

Also targeting the second level market is <strong>CNGL</strong>’s<br />

Language Trap: An Adaptive Language Learning Video<br />

Game. The game was initially designed to aid students<br />

in preparing for the Leaving Certificate German Oral<br />

Examinations by means of an interactive dialogue<br />

system. The Irish language version of the game has<br />

been evaluated with schools. The German game is now<br />

available at http://seriousgames.cs.tcd.ie/.<br />

<strong>CNGL</strong> promoted its research and education programmes at the SFI<br />

booth at the BT Young Scientist & Technology Exhibition in January

Education Programme plans for 2013 will focus on<br />

the transition to a second cycle of funding for <strong>CNGL</strong>,<br />

in which the Centre will pioneer the concept of “Global<br />

Intelligent Content”. Opportunities in this area include<br />

app competitions for secondary school students, and<br />

establishing a Masters programme in Intelligent Content.<br />

Ms. Mary Mitchell-O’Connor, T.D. attended the national final of the All Ireland Linguistics Olympiad at DCU. Deputy Mitchell-O’Connor urged students to use their aptitude for problem-solving to pursue careers at the intersection of computing, language and linguistics.<br />

The <strong>CNGL</strong> Education Programmes are complemented by Transition Year internships in the <strong>CNGL</strong> labs and by a high-quality Careers Brochure focused on commercial career opportunities at the intersection of Computing, Languages, Culture and Business. The guide was distributed to guidance counsellors in 729 secondary schools. <strong>CNGL</strong> exhibited at the BT Young Scientist & Technology Exhibition at the RDS in January <strong>2012</strong>, where students had the chance to try out demos and test their problem-solving skills with AILO puzzles.<br />

Strategic Marketing and Communications<br />

<strong>CNGL</strong>’s Outreach Programme aims to highlight <strong>CNGL</strong> achievements, to engage with the public and to promote Ireland as a world leader in localisation. The programme spans activities ranging from public relations and marketing to hosting industry and academic events, publishing ‘Localisation Focus – the International Journal of Localisation’, and attending the BT Young Scientist and Technology Exhibition.<br />

Strategic Marketing and Communications Highlights from <strong>2012</strong><br />

<strong>CNGL</strong> has raised its media profile with 84 media<br />

mentions recorded in <strong>2012</strong>. The <strong>CNGL</strong> newsletter<br />

was published on a quarterly basis and has proved an<br />

effective means through which to communicate <strong>CNGL</strong><br />

news, events, success stories and researcher profiles to<br />

government, media, industry and academic stakeholders.


Figure: Front page of <strong>CNGL</strong>News, the quarterly newsletter of the Centre for Next Generation Localisation (Vol. 01, Issue 07, Quarter 2 <strong>2012</strong>), featuring the stories “Irish Ambassador to Japan hosts <strong>CNGL</strong> Seminar in Tokyo” and “<strong>CNGL</strong>: Contributing to a Strategic Research Agenda for Europe”<br />

The Centre’s international reach has been enhanced through closer engagement with international organisations, including the Globalization and Localization Association (GALA). A new industry prospectus is in production, and this will support the Centre’s drive to attract additional industry partners and clients.

The CNGL quarterly newsletter, available in both e-zine and print format

The Marketing and Communications Officer has worked closely with the Centre’s Commercial Development Manager to strengthen industry outreach efforts. Significant progress has been made on the customer relationship management front, including further development of CNGL’s mailing list, which now includes over 2,000 subscribers. CNGL exhibited at a significant number of industry and commercialisation events, including Localization World (in Seattle, USA in October), the DCU Tech Transfer Exhibition in June, and Enterprise Ireland’s ‘Big Ideas’ showcase in November. CNGL also presented a panel on Global Content Intelligence at the Gilbane Conference in Boston in November and, in April, delivered a seminar for Japanese businesses, which was hosted by the Ambassador of Ireland to Japan and supported by IDA Ireland Japan.

CNGL booth at Localization World Seattle in October

CNGL continued to host conferences and workshops for the international research community this year in the computational linguistics, digital content management and localisation areas. The 17th Annual LRC Internationalisation and Localisation Conference took place in Limerick in September with 70 participants from localisation companies and academia. The conference was co-located with the 2012 CNGL Localisation Innovation Showcase, which is now established as a “must attend” event for professionals in Ireland involved in localisation and multilingual customer care. This year’s keynote address was delivered by Dr. Thomas Arend, International Product Lead at Twitter.

Irish Times coverage of CNGL’s work on sign language machine translation (left) and opinion piece by Prof. Josef van Genabith in the Irish Independent (right)



Centre for Next Generation Localisation Annual Report 2012

EDUCATION AND OUTREACH

Feature on localisation careers in ‘Education’ magazine

CNGL’s Localisation Innovation Showcase was co-located with the 17th Annual LRC Internationalisation and Localisation Conference in Limerick

Other significant scientific events organised by CNGL in 2012 included the International Postgraduate Conference in Translating and Interpreting, the Workshop on Innovation and Applications in Speech Technology, and the Workshop on Best Practices in Post-editing (in association with the Translation Automation Users’ Society) at Localization World in Paris. The Centre was successful in its bid to bring COLING 2014, one of the world’s largest and most influential computational linguistics conferences, to Dublin.

CNGL’s strong second-level education programmes were strengthened significantly this year by the production of a guide to ‘Careers in Next Generation Localisation’. This high-quality brochure focuses on commercial career opportunities at the intersection of Computing, Languages, Culture and Business. The guide was distributed to guidance counsellors in 729 secondary schools and has generated over 1,400 unique views of our careers web page to date. The brochure was launched in February by Mr. Seán Sherlock, T.D., Minister for Research and Innovation, and generated substantial media interest, including spreads in ‘Education’ magazine and ‘Guideline’, the official magazine of the Institute of Guidance Counsellors.

The social impact of CNGL’s research programmes is evident in its social spinout activity, The Rosetta Foundation. The Foundation now has more than 2,600 registered volunteer translators, and the number of NGO partners increased fourfold during 2012, allowing it to further its goal of facilitating access to information and knowledge for those who need it most. The Rosetta Foundation’s first NGO partner, Special Olympics – the world’s largest sports organisation for children and adults with intellectual disabilities – remained the most active in 2012, with over fifty translation projects submitted. Other partners benefitting from the work of the Foundation’s volunteers include Community Eye Journal, The World Association of Girl Guides and Girl Scouts, Ruhama and Trócaire.

Mr. Seán Sherlock T.D., Minister for Research and Innovation, and Prof. Josef van Genabith, Director of CNGL, pictured at the launch of CNGL’s Next Generation Localisation careers guide in February



CNGL coordinated Thesis in Three 2012 with the Systems Biology Ireland and CLARITY research centres. The aim of the competition is for PhD students to give an elevator pitch for their PhD thesis: three slides in just three minutes. On the night, centre directors and principal investigators also delivered elevator pitches for their research centres. The event, held in collaboration with Innovation Dublin, attracted an audience of more than two hundred and celebrated the best of Irish science and innovation in bite-sized chunks.

Plans for 2013

Strategic marketing and communication plans for 2013 will focus on the transition to a second cycle of funding for CNGL, in which the Centre will pioneer the concept of “Global Intelligent Content”. Creation of a new brand for CNGL is already underway. This brand will reflect the broadening of the Centre’s research programme and its greater emphasis on industrial engagement. A new website that communicates our vision of global intelligent content is in train, and the new branding will be rolled out across a suite of marketing materials designed to support business development efforts.

Significant events planned for 2013 include SIGIR 2013, the 36th Annual ACM SIGIR Conference, which CNGL will co-host in July/August 2013, and Think Latin America, which will take place at Carton House, Kildare in April 2013.

Jonathan McCrea, host of Newstalk’s ‘Futureproof’ show, is MC for Thesis in 3


Appendices



Appendix 1: People and Partnerships

CSET RESEARCH TEAMS

Team Members Associated with the CSET During the Reporting Period

First Name | Surname | Type | Institution | Research Strand | Highest Degree | Gender | Nationality | CSET Funded | Supervisor
Yalemisew | Abgaz | PhD | DCU | DCM | MSc | M | Ethiopian | Yes | Dr Claus Pahl
Mohamed | Abou-Zleikha | PhD | UCD | ILT | MSc | M | Syrian | Yes | Prof Julie Carson-Berndsen
Zeeshan | Ahmed | PhD | UCD | ILT | MSc | M | Pakistani | Yes | Prof Julie Carson-Berndsen
Dimitra | Anastasiou | Postdoctoral Researcher | UL | LOC | PhD | F | Greek | Yes | Mr Reinhard Schäler
Lamine | Aouad | Postdoctoral Researcher | UL | LOC | PhD | M | Algerian | Yes | Mr Reinhard Schäler
Ruwan | Asanka Wasala | PhD | UL | LOC | MSc | M | Sri Lankan | No | Mr Reinhard Schäler
Akshat | Bakliwal | PhD Intern | DCU | ILT | MSc | M | Indian | Yes | Prof Josef van Genabith
Renu | Balyan | PhD Intern | DCU | ILT | MSc | M | Indian | Yes | Prof Josef van Genabith
Pratyush | Banerjee | PhD | DCU | ILT | MSc | M | Indian | Yes | Prof Josef van Genabith
Jonathan | Barr | Graphics Designer | DCU | E&O | BA | M | Irish | Yes | Prof Josef van Genabith
Hanna | Béchara | PhD | DCU | ILT | BA | F | Irish | Yes | Prof Josef van Genabith
Urvesh | Bhowan | Research Assistant | TCD | DCM | MSc | M | South African | Yes | Prof Vincent Wade
Arianna | Bisazza | PhD Intern | DCU | ILT | MSc | M | Italian | Yes | Prof Josef van Genabith
Anton | Bryl | Postdoctoral Researcher | DCU | ILT | PhD | M | Belarusian | Yes | Prof Josef van Genabith
Jim | Buckley | Co-Supervisor | UL | LOC | PhD | M | Irish | No | N/A
Joao | Cabral | Postdoctoral Researcher | UCD | ILT | PhD | M | Portuguese | Yes | Prof Julie Carson-Berndsen
Nick | Campbell | Co-Principal Investigator | TCD | ILT | PhD | M | British | No | N/A
Julie | Carson-Berndsen | Co-Principal Investigator | UCD | ILT | DPhil | F | Irish | No | N/A
Özlem | Çetinoglu | Postdoctoral Researcher | DCU | ILT | PhD | F | Turkish | Yes | Prof Josef van Genabith
Yi | Chen | PhD | DCU | DCM | MSc | F | Chinese | Yes | Dr Gareth Jones
Yvonne | Cleary | Co-Supervisor | UL | LOC | PhD | F | Irish | No | N/A
JJ | Collins | Co-Supervisor | UL | LOC | PhD | M | Irish | No | N/A
Declan | Dagger | Postdoctoral Researcher | TCD | DCM | PhD | M | Irish | Yes | Prof Vincent Wade
Sandipan | Dandapat | Postdoctoral Researcher | DCU | ILT | PhD | M | Indian | Yes | Prof Josef van Genabith
Domenico | De Feo | Research Assistant | TCD | DCM | MSc | M | Italian | Yes | Prof Vincent Wade
Gavin | Doherty | Co-Principal Investigator | TCD | SF | PhD | M | Irish | No | N/A
Amelie | Dorn | PhD | TCD | ILT | MSc | F | French | Yes | Prof Ailbhe Ní Chasaide
Thomas | Dunne | Intern | DCU | CM | UnderGrad | M | Irish | Yes | Prof Josef van Genabith
Erika | Duriak | Intern | TCD | DCM | UnderGrad | F | Slovakian | Yes | Prof Josef van Genabith
Mohammed Rami | ElHussein Ghorab | PhD | TCD | DCM | MSc | M | Egyptian | Yes | Prof Vincent Wade
Martin | Emms | Co-Principal Investigator | TCD | ILT | PhD | M | Irish | No | N/A
Maria | Eskevich | PhD | DCU | DCM | MSc | F | Russian | Yes | Prof Gareth Jones
Chris | Exton | Co-Supervisor | UL | LOC | PhD | M | Australian/Irish | No | N/A
David | Filip | Postdoctoral Researcher | UL | LOC | PhD | M | Czech | Yes | Mr Reinhard Schäler
Ríona | Finn | Administrative | DCU | CM | MSc | F | Irish | Yes | N/A
Hector Hugo | Franco Penya | PhD | TCD | ILT | BSc | M | Spanish | Yes | Dr Martin Emms
Brian | Gallagher | Technician | TCD | DCM | MSc | M | Irish | Yes | Prof Vincent Wade
Debasis | Ganguly | PhD | DCU | DCM | MTech | M | Indian | Yes | Dr Gareth Jones
Solomon | Gizaw | PhD | UL | LOC | MSc | M | Ethiopian | Yes | Mr Reinhard Schäler
Christer | Gobl | Co-Principal Investigator | TCD | ILT | PhD | M | American | No | N/A
Yvette | Graham | Postdoctoral Researcher | DCU | ILT | PhD | F | Irish | Yes | Prof Josef van Genabith
Cara Nicole | Greene | E&O Manager | DCU | E&O | BSc | F | Irish | Yes | N/A
Laura | Grehan | Marketing and Communications Officer | DCU | E&O | MSc | F | Irish | Yes | N/A
Alfredo | Guerra Maldonado | PhD | TCD | ILT | BSc | M | Mexican/Irish | Yes | Dr Carl Vogel
Rajat | Gupta | PhD | UL | LOC | BSc | M | Indian | Yes | Mr Reinhard Schäler
Yanfen | Hao | Postdoctoral Researcher | UCD | DCM | PhD | M | Chinese | Yes | Dr Tony Veale
Geraldine | Harrahill | Administrative | UL | CM | FETAC | F | Irish | Yes | N/A
Emer | Hedderman | Intern | DCU | CM | UnderGrad | F | Irish | Yes | Prof Josef van Genabith
James Mark | Hender | Intern | DCU | DCM | UnderGrad | M | Irish | Yes | Prof Josef van Genabith
Yu | Hui | PhD Intern | DCU | ILT | MSc | M | Chinese | Yes | Prof Josef van Genabith
Muhammad | Javed | PhD | DCU | DCM | MSc | M | Pakistani | Yes | Dr Claus Pahl
Gareth | Jones | Co-Principal Investigator | DCU | DCM | PhD | M | British | No | N/A
Amir | Kamran | PhD Intern | DCU | ILT | MSc | M | Indian | Yes | Prof Josef van Genabith
John | Kane | PhD | TCD | ILT | MPhil | M | Irish | Yes | Prof Ailbhe Ní Chasaide
Mark | Kane | PhD | UCD | ILT | MSc | M | Irish | Yes | Prof Julie Carson-Berndsen
Bridget | Kane | Postdoctoral Researcher | TCD | SF | PhD | F | Irish | Yes | Dr Saturnino Luz
Karl | Kelly | Administrative | UL | E&O | Grad Dip | M | Irish | Yes | N/A
Dorothy | Kenny | Co-Principal Investigator | DCU | ILT | PhD | F | Irish | No | N/A
Kevin | Koidl | PhD | TCD | DCM | MSc | M | Irish | Yes | Prof Vincent Wade
Alla | Kovaleva | Intern | TCD | DCM | UnderGrad | F | Kazakhstani | Yes | Prof Josef van Genabith
Ru | Kuang | PhD Intern | DCU | ILT | MSc | M | Chinese | Yes | Prof Josef van Genabith
Sudip | Kumar Naskar | Postdoctoral Researcher | DCU | ILT | PhD | M | Indian | Yes | Prof Josef van Genabith
Séamus | Lawless | Assistant Professor | TCD | DCM | PhD | M | Irish | Yes | Prof Vincent Wade
Madeleine | Lenker | PhD | UL | LOC | MA | F | German | Yes | Mr Reinhard Schäler
Killian | Levacher | PhD | TCD | DCM | MSc | M | French/Irish | Yes | Prof Vincent Wade
Johannes | Leveling | Postdoctoral Researcher | DCU | DCM | PhD | M | German | Yes | Dr Gareth Jones
David | Lewis | Funded Investigator | TCD | SF | PhD | M | English | Yes | N/A
Wei | Li | PhD | DCU | DCM | MSc | F | Chinese | Yes | Dr Gareth Jones
Junhui | Li | Postdoctoral Researcher | DCU | ILT | PhD | M | Chinese | Yes | Prof Josef van Genabith
Robert | Lis | Intern | DCU | DCM | UnderGrad | M | Irish | Yes | Prof Josef van Genabith
Qun | Liu | Co-Principal Investigator | DCU | ILT | PhD | M | Chinese | No | N/A
Luca | Longa | Research Assistant | TCD | DCM | MSc | M | Italian | Yes | Prof Vincent Wade
Alejandra | Lopez Fernandez | PhD | UCD | DCM | MSc | F | Mexican | Yes | Dr Tony Veale
Juan | Luo | PhD Intern | DCU | ILT | MSc | M | Chinese | Yes | Prof Josef van Genabith
Saturnino | Luz | Co-Principal Investigator | TCD | SF | PhD | M | Brazilian | No | N/A
Gerard | Lynch | PhD | TCD | ILT | MSc | M | Irish | Yes | Dr Carl Vogel
Walid | Magdy | Postdoctoral Researcher | DCU | DCM | PhD | M | Egyptian | Yes | Dr Gareth Jones
Fiona | Maguire | Finance Administrative | DCU | CM | CIMA | F | Irish | Yes | N/A
Liliana | Mamani-Sanchez | PhD | TCD | ILT | MSc | F | Peruvian | Yes | Dr Carl Vogel
Tom | Mason | Intern | DCU | DCM | UnderGrad | M | Irish | Yes | Prof Josef van Genabith
Sophie | Matabaro | Centre Administrative | DCU | CM | | F | Irish | Yes | N/A
John | McAuley | PhD | TCD | SF | MPhil | M | Irish | Yes | Dr David Lewis
Eithne | McCann | PA to Director | DCU | E&O | National Cert | F | Irish | Yes | N/A
Hilary | McDonald | Project Manager | TCD | CM | MSc | F | Irish | Yes | N/A
Shane | McQuillan | Intern | DCU | DCM | UnderGrad | M | Irish | Yes | Prof Josef van Genabith
Kristos | Mikkonen | Intern | TCD | E&O | UnderGrad | M | Finnish | Yes | Dr David Lewis
Jinming | Min | PhD | DCU | DCM | MSc | M | Chinese | Yes | Dr Gareth Jones
Joss | Moorkens | Postdoctoral Researcher | DCU | LOC | PhD | M | Irish | Yes | Dr Sharon O’Brien
Lucía | Morado Vásquez | PhD | UL | LOC | MSc | F | Spanish | Yes | Mr Reinhard Schäler
John | Moran | PhD | TCD | SF | BSc | M | Irish | Yes | Dr David Lewis
Erwan | Moreau | Postdoctoral Researcher | TCD | ILT | PhD | M | French | Yes | Dr Carl Vogel
Aram | Morera Mesa | PhD | UL | LOC | Grad Dip | M | Spanish | Yes | Mr Reinhard Schäler
Sara | Morrissey | Postdoctoral Researcher | DCU | ILT | PhD | F | Irish | Yes | Prof Josef van Genabith
Catherine | Mulwa | PhD | TCD | DCM | MSc | F | Kenyan | Yes | Prof Vincent Wade
Dat Tien | Nguyen | PhD Intern | DCU | ILT | MSc | M | Vietnamese | Yes | Prof Josef van Genabith
Dat Quoc | Nguyen | PhD Intern | DCU | ILT | MSc | M | Vietnamese | Yes | Prof Josef van Genabith
Ailbhe | Ní Chasaide | Co-Principal Investigator | TCD | ILT | PhD | F | Irish | No | N/A
Neasa | Ní Chiaráin | PhD | TCD | ILT | MSc | F | Irish | Yes | Prof Ailbhe Ní Chasaide
Naoto | Nishio | PhD | UL | LOC | Grad Dip | M | Japanese | Yes | Mr Reinhard Schäler
Conor | O Gorman | Intern | DCU | DCM | UnderGrad | M | Irish | Yes | Prof Josef van Genabith
Siobhan | O Mara | Research Assistant | DCU | E&O | BA | F | Irish | Yes | Prof Josef van Genabith
Sharon | O’Brien | Co-Principal Investigator | DCU | ILT | PhD | F | Irish | No | N/A
Eoin | Ó’Conchuir | Technician | UL | CM | PhD | M | Irish | Yes | Mr Reinhard Schäler
Alexander | O’Connor | Postdoctoral Researcher | TCD | DCM | PhD | M | Irish | Yes | Prof Vincent Wade
Udochukwu Kalu | Ogbureke | PhD | UCD | ILT | MPhil | M | Nigerian | Yes | Prof Julie Carson-Berndsen
Ian | O’Keeffe | Postdoctoral Researcher | UL | LOC | PhD | M | Irish | Yes | Mr Reinhard Schäler
Declan | O’Sullivan | Co-Principal Investigator | TCD | DCM | PhD | M | Irish | No | N/A
Claus | Pahl | Co-Principal Investigator | DCU | DCM | PhD | M | German | No | N/A
Santanu | Pal | PhD Intern | DCU | ILT | MSc | M | Indian | Yes | Prof Josef van Genabith
Ciaran | Porter | Intern | DCU | DCM | UnderGrad | M | Irish | Yes | Prof Josef van Genabith
Enda | Quigley | PhD | UL | LOC | BSc | M | Irish | Yes | Mr Reinhard Schäler
Paul | Redmond | Intern | DCU | DCM | UnderGrad | M | Irish | Yes | Prof Josef van Genabith
Corentin | Ribeyre | PhD Intern | DCU | ILT | MSc | M | French | Yes | Prof Josef van Genabith
Stephen | Roantree | IP Manager | DCU | CM | Grad Dip | M | Irish | Yes | N/A
Ilana | Rozanes | PhD | TCD | SF | MSc | F | American | Yes | Dr Saturnino Luz
Lorcan | Ryan | PhD | UL | LOC | MSc | M | Irish | Yes | Mr Reinhard Schäler
Melike | Sah | Postdoctoral Researcher | TCD | DCM | PhD | F | Cypriot | Yes | Prof Vincent Wade
Reinhard | Schäler | Lead Principal Investigator | UL | LOC | MSc | M | German | No | N/A
Stephan | Schlögl | PhD | TCD | SF | MSc | M | Austrian | Yes | Dr Saturnino Luz
Anne | Schneider | PhD | TCD | SF | BSc | F | German | Yes | Dr Saturnino Luz
Mary | Sharp | Co-Principal Investigator | TCD | DCM | BSc | F | Irish | No | N/A
Páraic | Sheridan | Associate Director | DCU | CM | PhD | M | Irish | Yes | N/A
Harold | Somers | E&O | DCU | E&O | PhD | M | British | Yes | N/A
Brendan | Spillane | PhD | TCD | DCM | MSc | M | Irish | Yes | Prof Vincent Wade
Ben | Steichen | Postdoctoral Researcher | TCD | DCM | PhD | M | Luxembourgish | Yes | Prof Vincent Wade
Siobhan | Swords | Intern | DCU | E&O | UnderGrad | F | Irish | Yes | Prof Josef van Genabith
Eva | Szekely | PhD | UCD | ILT | MA | F | Hungarian | Yes | Prof Julie Carson-Berndsen
Josef | van Genabith | Lead Principal Investigator | DCU | CM | PhD | M | German | Yes | N/A
Tony | Veale | Co-Principal Investigator | UCD | DCM | PhD | M | Irish | No | N/A
Carl | Vogel | Co-Principal Investigator | TCD | ILT | PhD | M | American | No | N/A
Joris | Vreeke | Programmer | DCU | E&O | | M | Dutch | Yes | Prof Josef van Genabith
Vincent | Wade | Lead Principal Investigator | TCD | DCM | PhD | M | Irish | No | N/A
Joachim | Wagner | Systems Administrator | DCU | CM | MA | M | German | Yes | Prof Josef van Genabith
Andy | Way | Lead Principal Investigator | DCU | ILT | PhD | M | British | No | N/A
Xiaofeng | Wu | Postdoctoral Researcher | DCU | ILT | PhD | M | Chinese | Yes | Prof Josef van Genabith
Irena | Yanushevskaya | Postdoctoral Researcher | TCD | ILT | PhD | F | Russian | Yes | Prof Ailbhe Ní Chasaide
Amalia | Zahra | PhD | UCD | ILT | BSc | F | Indonesian | Yes | Prof Julie Carson-Berndsen
Dong | Zhou | Postdoctoral Researcher | TCD | DCM | PhD | M | Chinese | Yes | Prof Vincent Wade

Affiliated Members and Collaborators Not Receiving Funds

First Name | Surname | Type | Institution | Research Strand | Highest Degree | Gender | Nationality | CSET Funded | Supervisor
Hala | Al Maghout | Postdoctoral Researcher | DCU | Affiliated | PhD | F | Syrian | No | Prof Josef van Genabith
Mohammed | Attia | Postdoctoral Researcher | DCU | Affiliated | PhD | M | Egyptian | No | Prof Josef van Genabith
Eoin | Bailey | PhD | TCD | Affiliated | MSc | M | Irish | No | Prof Vincent Wade
Ergun | Bicici | Postdoctoral Researcher | DCU | Affiliated | PhD | M | Cypriot | No | Prof Josef van Genabith
Peter | Cahill | Co-Principal Investigator | UCD | Affiliated | PhD | M | Irish | No | Prof Julie Carson-Berndsen
Oscar | Cassetti | PhD | TCD | Affiliated | MSc | M | Italian | No | Dr Saturnino Luz
Alexandru | Ceausu | Postdoctoral Researcher | DCU | Affiliated | PhD | M | Romanian | No | Dr Páraic Sheridan
Yi | Chen | PhD | DCU | Affiliated | PhD | F | Chinese | No | Prof Gareth Jones
Owen | Conlan | Assistant Professor | TCD | Affiliated | PhD | M | Irish | No | N/A
Seamus | Coogan | Marketing Lead | TCD | Affiliated | BSc | M | Irish | No | Prof Vincent Wade
Stephen | Curran | Research Assistant/Programmer | TCD | Affiliated | MSc | M | Irish | Yes | Dr David Lewis
Aswarth | Dara | Postdoctoral Researcher | DCU | Affiliated | PhD | M | Indian | No | Prof Josef van Genabith
Stephen | Doherty | Postdoctoral Researcher | DCU | Affiliated | PhD | M | Irish | No | Prof Josef van Genabith
David | Faherty | Research Assistant | TCD | Affiliated | BSc | M | Irish | No | Prof Vincent Wade
Leroy | Finn | Programmer | TCD | Affiliated | MSc | M | Irish | No | Dr David Lewis
Frank | Fowley | Research Assistant | DCU | Affiliated | MSc | M | Irish | No | Dr Claus Pahl
Manisha | Ganguly | Programmer | DCU | Affiliated | | F | Indian | No | Prof Gareth Jones
Federico | Gaspari | Postdoctoral Researcher | DCU | Affiliated | PhD | M | Italian | No | Prof Josef van Genabith
Anton | Gerdelan | Postdoctoral Researcher | TCD | Affiliated | PhD | M | New Zealander | No | Dr Dave Lewis
Lorraine | Goeuriot | Postdoctoral Researcher | DCU | Affiliated | MSc | F | French | No | Dr Gareth Jones
Steve | Gotz | Commercialisation Manager | DCU | Affiliated | MSc | M | American | No | N/A
Declan | Groves | Research Integration Officer | DCU | Affiliated | PhD | M | Irish | No | N/A
Cormac | Hampson | Postdoctoral Researcher | TCD | Affiliated | PhD | M | | No | Prof Vincent Wade
Deirdre | Hogan | Postdoctoral Researcher | DCU | Affiliated | PhD | F | Irish | No | N/A
Dominic | Jones | Postdoctoral Researcher | TCD | Affiliated | MSc | M | British | No | Dr David Lewis
John | Judge | Postdoctoral Researcher | DCU | Affiliated | PhD | M | Irish | No | Prof Josef van Genabith
Liadh | Kelly | Postdoctoral Researcher | DCU | Affiliated | MSc | F | Irish | No | Dr Gareth Jones
Alex | Killen | Programmer | DCU | Affiliated | BSc | M | Irish | No | Prof Josef van Genabith
Kris | McGlinn | Research Assistant | TCD | Affiliated | MSc | M | Irish | No | Dr David Lewis
Brenda | McGuirk | Project Co-ordinator | TCD | Affiliated | | F | Irish | No | Prof Vincent Wade
Gavin | Mendel-Gleeson | Postdoctoral Researcher | DCU | Affiliated | PhD | M | Irish | No | Dr Deirdre Hogan
Sebastian | Molines | Research Assistant | TCD | Affiliated | MSc | M | French | Yes | Dr David Lewis
Adam | Moore | Postdoctoral Researcher | TCD | Affiliated | PhD | M | Irish | No | Prof Vincent Wade
Lynda | O Donovan | Pedagogical Lead | TCD | Affiliated | MSc | F | Irish | No | Prof Vincent Wade
Ian | O’Keeffe | Postdoctoral Researcher | TCD | Affiliated | PhD | M | Irish | No | Prof Vincent Wade
Tsuyoshi | Okita | PhD | DCU | Affiliated | MSc | M | Japanese | No | Prof Josef van Genabith
Neil | Peirce | Research Assistant | TCD | Affiliated | PhD | M | Irish | No | Prof Vincent Wade
Raphael | Rubino | Postdoctoral Researcher | DCU | Affiliated | PhD | M | French | No | Dr Jennifer Foster
Rasoul | Samad Zadeh | Postdoctoral Researcher | DCU | Affiliated | PhD | M | Iranian | No | Dr Jennifer Foster
Eduardo | Shanahan | Programmer | DCU | Affiliated | BSc | M | Argentinian | No | Prof Josef van Genabith
Ankit | Srivastava | Postdoctoral Researcher | DCU | Affiliated | PhD | M | Indian | No | Prof Josef van Genabith
Thanos | Staikopolous | Postdoctoral Researcher | TCD | Affiliated | PhD | M | Greek | No | Prof Vincent Wade
John | Tinsley | Project Coordinator | DCU | Affiliated | PhD | M | Irish | No | Dr Páraic Sheridan
Antonio | Toral | Postdoctoral Researcher | DCU | Affiliated | PhD | M | Spanish | No | Prof Andy Way
Lamia | Tounsi | Postdoctoral Researcher | DCU | Affiliated | PhD | F | Algerian | No | Prof Josef van Genabith
Eddie | Walsh | PhD | TCD | Affiliated | MSc | M | Irish | No | Prof Vincent Wade
Rachel | Wrafter | Postdoctoral Researcher | TCD | Affiliated | PhD | F | Irish | No | Prof Vincent Wade
Lei | Xu | Postdoctoral Researcher | DCU | Affiliated | PhD | M | Chinese | No | Dr Claus Pahl
Hong Yi | Wang | Research Assistant | DCU | Affiliated | BSc | F | Chinese | No | Dr Deirdre Hogan
Bilal | Yousuf | PhD | TCD | Affiliated | MSc | M | | No | Prof Vincent Wade
Jian | Zhang | Technician | DCU | Affiliated | BSc | M | Chinese | No | Dr Páraic Sheridan

Industry Partners and Contact Names

Organisation Type | Organisation Name | Location | Date Joined CSET | Date Departed | Contact First Name | Contact Surname | Contact Position
SME | Alchemy Software Development | Dublin, Ireland | 04/12/2007 | N/A | Enda | McDonnell | Director of Engineering
MNC | Dai Nippon Printing | Tokyo, Japan | 04/12/2007 | N/A | Takeshi | Fukunaga | Advisor of Headquarters
MNC | IBM | Dublin, Ireland | 04/12/2007 | N/A | Brian | O’Donovan | Program Director, IBM Dublin Centre for Advanced Studies
MNC | Microsoft | Dublin, Ireland | 04/12/2007 | N/A | Dag | Schmidtke | Program Manager for Language Technology Strategy
MNC | SDL | Wicklow, Ireland | 04/12/2007 | N/A | Paul | McManus | General Manager
SME | SpeechStorm | Belfast, Northern Ireland | 04/12/2007 | N/A | Oliver | Lennon | Chief Executive Officer
MNC | Symantec | Dublin, Ireland | 04/12/2007 | N/A | Fred | Hollowood | Research Director, Shared Engineering Services
MNC | CAPITA (Formerly Applied Language Solutions) | Manchester, U.K. | 04/12/2007 | N/A | Gavin | Wheeldon | Chief Executive Officer
SME | VistaTEC | Dublin, Ireland | 04/12/2007 | N/A | Phil | Ritchie | Chief Technology Officer
MNC | Welocalize | Dublin, Ireland | 23/02/2011 | N/A | Derek | Coffey | Vice President, Technology and Professional Services



Governance Committee Members<br />

Role First Name Surname Organisation Position<br />

Chair David MacDonald IMS Maxims Chairman<br />

Member Alan Harvey DCU Vice President for Research<br />

Member Vinny Cahill TCD Dean of Research<br />

Member Gearóid Mooney Enterprise Ireland Director, Informatics Research and Commercialisation<br />

Member Phil Ritchie VistaTEC Chief Technical Officer<br />

Member Aidan Sweeney IBEC R&D Policy Executive<br />

Member Josef van Genabith DCU <strong>CNGL</strong> Director<br />

In Attendance Páraic Sheridan DCU <strong>CNGL</strong> Associate Director<br />

In Attendance Vincent Wade TCD <strong>CNGL</strong> Deputy Director<br />

Scientific Advisory Board Members<br />

Role First Name Surname Organisation Position<br />

Chair Francis Tsang Adobe Systems Director of Globalisation<br />

Member Andrew Bredenkamp Acrolinx Chief Executive Officer<br />

Member Carol Espy-Wilson University of Maryland, Department of Electrical<br />

and Computer Engineering<br />

Professor<br />

Member Lauri Karttunen Palo Alto Research Center Computational Linguist<br />

Member Makoto Nagao NIST President<br />



Appendix 2: Outputs<br />

PhDs Awarded<br />

Name Nationality Gender Institute<br />

Dominic Jones English M TCD<br />

Joachim Wagner German M DCU<br />

Ben Steichen Luxembourgish M TCD<br />

Lucia Morado Vazquez Spanish F UL<br />

Stephen Doherty Irish M UL<br />

Joss Moorkens Irish M DCU<br />

Ian O’Keeffe Irish M TCD<br />

Zohar Etzioni Israeli M TCD<br />

Walid Magdy Egyptian M DCU<br />

Sandipan Dandapat Indian M DCU<br />

Ankit Srivastava Indian M DCU<br />

Pratyush Banerjee Indian M DCU<br />

Hala Al Maghout Syrian F DCU<br />

All CSET Publications<br />

All CNGL publications are stored in a central document management system and are available through the institutional Open Access repositories.<br />

Refereed Conference and Workshop Papers<br />

Abgaz, Y., Javed, M., Pahl, C. (2012). Dependency Analysis in Ontology-driven Content-based Systems. In 12th International Conference on Artificial Intelligence and Soft Computing (ICAISC 2012), Zakopane, Poland<br />

Abou-Zleikha, M., Cahill, P., Carson-Berndsen, J. (2012). Pitch Recovery of Missing Syllables Using Sparse Representation in Exemplar-based Pitch Generation. In Proceedings of the 11th International Conference on Information Sciences, Signal Processing and their Applications, Montreal, Canada<br />

Abou-Zleikha, M., Cahill, P., Carson-Berndsen, J. (2012). Exemplar-based pitch contour generation using DOP for syntactic tree decomposition. In Proceedings of the 37th IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2012), Kyoto, Japan<br />

Abou-Zleikha, M., Szekely, E., Cahill, P., Carson-Berndsen, J. (2012). Multi-level Exemplar-based Duration Generation for Expressive Speech Synthesis. In Proceedings of the 6th International Conference on Speech Prosody 2012, Shanghai, China<br />

Almaghout, H., Jiang, J., Way, A. (2012). Extending CCG-based Syntactic Constraints in Hierarchical Phrase-Based SMT. In Proceedings of the 16th Annual Conference of the European Association for Machine Translation (EAMT 2012), Trento, Italy<br />

Asanka Wasala, A., Schaler, R., Weerasinghe, R., Exton, C. (2012). Collaboratively Building Language Resources while Localising the Web. In Proceedings of ACL 2012: 3rd Workshop on the People’s Web Meets NLP: Collaboratively Constructed Semantic Resources and their Applications to NLP, Jeju, Republic of Korea<br />

Attia, M., Pecina, P., Samih, Y., Shaalan, K., van Genabith, J. (2012). Improved Spelling Error Detection and Correction for Arabic. COLING 2012, Mumbai, India<br />

Attia, M., Samih, Y., Shaalan, K., van Genabith, J. (2012). The Floating Arabic Dictionary: An Automatic Method for Updating a Lexical Database through the Detection and Lemmatization of the Unknown Word. In the International Conference on Computational Linguistics (COLING), December 2012, Mumbai, India<br />

Banerjee, P., Naskar, S., Way, A., van Genabith, J., Roturier, J. (2012). Supplementary Data Selection by Incremental Update of Translation Models. In the 24th International Conference on Computational Linguistics (COLING 2012), Mumbai, India<br />



Banerjee, P., Naskar, S., Roturier, J., Way, A., van Genabith, J. (2012). Domain Adaptation in SMT of User-Generated Forum Content Guided by OOV Word Reduction: Normalization and/or Supplementary Data. In Proceedings of the 16th Annual Conference of the European Association for Machine Translation (EAMT 2012), Trento, Italy<br />

Cabral, J. P., Kane, M., Ahmed, Z., Abou-Zleikha, M., Szekely, E., Zahra, A., Ogbureke, U. K., Cahill, P., Carson-Berndsen, J., Schlogl, S. (2012). Rapidly Testing the Interaction Model of a Pronunciation Training System via Wizard-of-Oz. In Proceedings of the International Conference on Language Resources and Evaluation (LREC 2012), Istanbul, Turkey<br />

Cabral, J. P. and Carson-Berndsen, J. (2012). Controlling Voice Source Parameters to Transform Characteristics of Synthetic Voices. In Listening Talker (LISTA) Workshop, Edinburgh, UK<br />

Dandapat, S., Morrissey, S., Way, A., van Genabith, J. (2012). Combining EBMT, SMT, TM and IR Technologies for Quality and Scale. In Proceedings of the Joint Workshop on Exploiting Synergies between Information Retrieval and Machine Translation (ESIRMT) and Hybrid Approaches to Machine Translation (HyTra), a workshop at EACL 2012, Avignon, France<br />

Doherty, S., Kenny, D., Way, A. (2012). Taking Statistical Machine Translation to the Student Translator. AMTA 2012, San Diego, USA<br />

Doherty, S. and Moorkens, J. (2012). An Experiential Analysis of Translation Technology Labs. 2nd Annual Conference of Education and Humanities, 30 March 2012, St. Patrick’s College, Ireland<br />

Doherty, S. and O’Brien, S. (2012). A User-Based Usability Assessment of Raw Machine Translated Technical Instructions. Conference of the Association for Machine Translation in the Americas (AMTA 2012), San Diego, USA<br />

Drugman, T., Kane, J., Gobl, C. (2012). Resonator-based creaky voice detection. In Proceedings of Interspeech 2012, Oregon, USA<br />

Emms, M. (2012). On Stochastic Tree Distances and their Training via Expectation-Maximisation. In Proceedings of ICPRAM 2012, International Conference on Pattern Recognition Applications and Methods, Portugal<br />

Eskevich, M., Magdy, W., Jones, G.J.F. (2012). New Metrics for Meaningful Evaluation of Informally Structured Speech Retrieval. ECIR 2012, pages 170-181<br />

Filip, D. (2012). Managing Industry Wisdom as a Portfolio of Technical Standards. In Management Re-Imagined. Presented at the International Federation of Scholarly Associations of Management (IFSAM 2012), Limerick, Ireland.<br />

Filip, D. (2012). Using Business Process Management and Modelling to Analyse the Role of Human Translators and Reviewers in Bitext Management Workflows. Presented at the conference of the International Association for Translation and Intercultural Studies (IATIS 2012), Belfast, UK<br />

Filip, D., Lewis, D., Sasaki, F. (2012). The Multilingual Web. In Proceedings of the 21st World Wide Web Conference (WWW 2012), April 16-20, 2012, Lyon, France, ACM proceedings 978-1-4503-1229-5/11/04<br />

Ganguly, D., Jones, G. (2012). Cross-Lingual Topical Relevance Models. The 24th International Conference on Computational Linguistics (COLING 2012), Mumbai, India<br />

Ganguly, D., Leveling, J., Jones, G. (2012). Topical Relevance Models. CIKM 2012, Hawaii, USA<br />

Ganguly, D., Leveling, J., Jones, G. (2012). Approximate Sentence Retrieval for Scalable and Efficient Example-based Machine Translation. The 24th International Conference on Computational Linguistics (COLING 2012), Mumbai, India<br />

Ganguly, D., Leveling, J., Jones, G.J.F. (2012). DCU@FIRE 2012: Rule-based stemmers for Bengali and Hindi. In FIRE 2012, Fourth Workshop of the Forum for Information Retrieval Evaluation, pages 37-42, Kolkata, India, 2012. ISI.<br />

Ganguly, D., Leveling, J., Jones, G.J.F. (2012). DCU@INEX-2012: Exploring sentence retrieval for tweet contextualization. In Pamela Forner, Jussi Karlgren, and Christa Womser-Hacker, editors, CLEF 2012 Evaluation Labs and Workshop, Online Working Notes, 17-20 September, Rome, Italy<br />

Ganguly, D., Leveling, J., Jones, G.J.F. (2012). Bengali (Bangla) Information Retrieval. Chapter in Technical Challenges and Design Issues in Bangla Language Processing. IGI Global, 2012 (to appear).<br />

Ghorab, M. R., Zhou, D., Lawless, S., Wade, V. (2012). Multilingual User Modeling for Personalized Re-ranking of Multilingual Web Search Results. In Conference on User Modeling, Adaptation, and Personalization (UMAP 2012), Montreal, Canada<br />

Graham, Y. (2012). Deep Syntax in Statistical Machine Translation. Lexical Functional Grammar Conference, Udayana University, Bali, Indonesia<br />

Javed, M., Abgaz, Y., Pahl, C. (2012). Composite Ontology Change Operators and their Customizable Evolution Strategies. 2nd Joint Workshop on Knowledge Evolution and Ontology Dynamics, Boston, USA<br />

Ogbureke, U. K., Cabral, J., Carson-Berndsen, J. (2012). Using Noisy Speech to Study the Robustness of a Continuous F0 Modelling Method in HMM-based Speech Synthesis<br />

Kane, B., Toussaint, P., Luz, S. Shared decision making needs a communication record. To appear in Proceedings of the 16th ACM Conference on Computer Supported Cooperative Work and Social Computing (CSCW 2013), San Antonio, Texas<br />



Scherer, S., Kane, J., Gobl, C., Schwenker, F. (2012). The Effect of Fuzzy Training Targets on Voice Quality Classification. Interspeech 2012, Portland, USA<br />

Kane, J. and Gobl, C. (2012). Identifying regions of non-modal phonation using features of the wavelet transform. In Proceedings of Interspeech 2011, Florence, Italy<br />

Kane, J., Papay, K., Hunyadi, L., Gobl, C. (2012). On the use of creak in Hungarian spontaneous speech. In Proceedings of ICPhS 2011, Hong Kong, China<br />

Kane, J., Scherer, S., Layher, G., Neumann, H. (2012). An audiovisual political speech analysis incorporating eye-tracking and perception data. In Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC 2012), Istanbul, Turkey<br />

Kane, J., Yanushevskaya, I., Ní Chasaide, A., Gobl, C. (2012). Exploiting time and frequency domain measures for precise voice source parameterisation. In Proceedings of Speech Prosody 2012, Shanghai, China<br />

Kane, M., Ahmed, Z., Carson-Berndsen, J. (2012). Underspecification in Pronunciation Variation. In Proceedings of the International Symposium on Automatic Detection of Errors in Pronunciation Training (IS ADEPT), Stockholm, Sweden<br />

Kane, J., Oertel, C. (2012). Conversational involvement and multimodal cues: summary and outlook. Fonetik 2012, Gothenburg, Sweden<br />

Levacher, K., Lawless, S., Wade, V. (2012). Slicepedia: Towards Long Tail Resource Production through Open Corpus Reuse. In Proceedings of the International Conference on Web-based Learning (ICWL 2012), Sinaia, Romania<br />

Levacher, K., Lawless, S., Wade, V. (2012). Slicepedia: Automating the Production of Learning Objects from Open Corpus Content. In Proceedings of the European Conference on Technology Enhanced Learning (EC-TEL), September 2012, Paphos, Cyprus<br />

Levacher, K., Lawless, S., Wade, V. (2012). Slicepedia: Providing Customized Reuse of Open-Web Resources for Adaptive Hypermedia. In Proceedings of the 23rd ACM Conference on Hypertext and Social Media (HT ‘12), Milwaukee, USA<br />

Leveling, J. (2012). DCU@FIRE 2012: Monolingual and crosslingual SMS-based FAQ retrieval. In FIRE 2012, Fourth Workshop of the Forum for Information Retrieval Evaluation, pages 37-42, Kolkata, India, 2012. ISI.<br />

Leveling, J. (2012). On the effect of stopword removal for SMS-based FAQ retrieval. In Gosse Bouma, Ashwin Ittoo, Elisabeth Métais, and Hans Wortmann, editors, Natural Language Processing and Information Systems – 17th International Conference on Applications of Natural Language to Information Systems, NLDB 2012, 26-28 June, Groningen, The Netherlands, Proceedings, volume 7337 of Lecture Notes in Computer Science (LNCS), pages 128-139. Springer, 2012.<br />

Leveling, J., Goeuriot, L., Kelly, L., Jones, G.J.F. (2012). DCU@TRECMed 2012: Using ad-hoc baselines for domain-specific retrieval. In Proceedings of TREC 2012. NIST, 2012.<br />

Leveling, J., Jones, G., Ganguly, D. (2012). Topical Relevance Models. In Proceedings of the Eighth Asia Information Retrieval Societies Conference (AIRS 2012), December 2012, Tianjin, China<br />

Leveling, J., Jones, G.J.F. (2012). Making Results Fit Into 40 Characters: A Study in Document Rewriting. In Proceedings of the Thirty-Fifth Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR 2012), August 2012, Portland, USA<br />

Lewis, D., O’Connor, A., Zydroń, A., Sjögren, G., Choudhury (2012). On Using Linked Data for Language Resource Sharing in the Long Tail of the Localisation Market. In Proceedings of the Language Resources and Evaluation Conference (LREC), May 2012, Istanbul, Turkey<br />

Lewis, D., O’Connor, A., Molines, S., Finn, L., Jones, D., Curran, S. and Lawless, S. (2012). Linking localisation and language resources. Linked Data in Linguistics. Lecture Notes in Computer Science (LNCS), 7-9 March 2012, Frankfurt/Main, Germany, Springer-Verlag<br />

Li, J., Tu, Z., Zhou, G., van Genabith, J. (2012). Head-Driven Hierarchical Phrase-based Translation. In Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics (ACL 2012), Jeju, Korea, Association for Computational Linguistics<br />

Li, J., Tu, Z., Zhou, G., van Genabith, J. (2012). Using Syntactic Head Information in Hierarchical Phrase-based Translation. In Proceedings of the Seventh Workshop on Statistical Machine Translation (WMT 2012), Montreal, Canada<br />

Lynch, G., Moreau, E., Vogel, C. (2012). A Naïve Bayes classifier for automatic correction of preposition and determiner errors in ESL text. In Proceedings of the Seventh Workshop on Innovative Use of NLP for Building Educational Applications, June 2012, Montreal, Canada<br />

Lynch, G., Vogel, C. (2012). Towards the Automatic Detection of the Source Language of a Literary Translation. In Proceedings of the 24th International Conference on Computational Linguistics (COLING 2012), Mumbai, India<br />

Maldonado-Guerra, A., and Emms, M. (2012). First-order and second-order context representations: geometrical considerations and performance in word-sense disambiguation and discrimination. In Proceedings of the 11es Journées internationales d’Analyse statistique des Données Textuelles (JADT 2012), Liège.<br />

Mamani Sanchez, L. and Vogel, C. (2012). Emoticons Signal Expertise in Technical Web Fora. Special Session: Computational Intelligence in Emotional or Affective Systems. In Proceedings of the 22nd Italian Workshop on Neural Networks. Smart Innovation, Systems and Technologies, Salerno, Italy<br />



Mamani Sanchez, L. and Vogel, C. (2012). Epistemic Signals and Emoticons Affect Kudos. In 3rd IEEE International Conference on Cognitive Infocommunications, Kosice, Slovakia<br />

McAuley, J., Lewis, D., O’Connor, A. (2012). Exploring reflection in online communities. In Learning Analytics and Knowledge (LAK12), Vancouver, Canada: ACM<br />

Min, J., Lopes, C., Leveling, J., Schmidtke, D., Jones, G.J.F. (2012). Multi-Platform Image Search using Tag Enrichment. In Proceedings of the Thirty-Fifth Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR 2012), August 2012, Portland, USA<br />

Moreau, E. (2012). Quality Estimation: an experimental study using unsupervised similarity measures. In Proceedings of the Seventh Workshop on Statistical Machine Translation, Montreal, Canada<br />

Mulwa, C., Lawless, S., Sharp, M., Wade, V. (2012). The Evaluation of Adaptive Technology Enhanced Learning Systems. E-LEARN 2012 – World Conference on E-Learning in Corporate, Government, Healthcare and Higher Education, Montreal, Canada<br />

Ogbureke, U. K., Cabral, J., Carson-Berndsen, J. (2012). Explicit Duration Modelling in HMM-based Speech Synthesis Using Continuous Hidden Markov Model. In The 11th International Conference on Information Sciences, Signal Processing and their Applications (ISSPA 2012), 3-5 July 2012, Montreal, Canada<br />

Ogbureke, U. K., Cabral, J., Carson-Berndsen, J. (2012). Using Multilayer Perceptron for Voicing Strength Estimation in HMM-based Speech Synthesis. In The 11th International Conference on Information Sciences, Signal Processing and their Applications, 3-5 July 2012, Montreal, Canada<br />

O’Keeffe, I. (2012). Multimedia Localisation: Cultural Implications for XLIFF. In The 2nd International XLIFF Symposium, Warsaw, Poland<br />

O’Keeffe, I. (2012). Multimedia Localisation: Cultural Implications for the Adaptation of Multimedia Content. In Proceedings of the 4th Conference of the International Association for Translation and Intercultural Studies, Queen’s University Belfast, Northern Ireland, UK<br />

O’Keeffe, I. (2012). A Mechanism for Facilitating Emotional Regulation through Music. 2012 CUES Annual Conference – Regulating Emotions: Contemporary Understandings and Interdisciplinary Perspective, Limerick, Ireland<br />

O’Keeffe, I., O’Connor, A., Lawless, S., Wade, V. (2012). Linked Open Corpus Models, Leveraging the Semantic Web for Adaptive Hypermedia. In Proceedings of the 23rd ACM Conference on Hypertext and Social Media, HT 2012, Milwaukee, USA<br />

Pecina, P., Toral, A., van Genabith, J. (2012). Simple and Effective Parameter Tuning for Domain Adaptation of Statistical Machine Translation. COLING 2012, Mumbai, India<br />

Sah, M. and Wade, V. (2012). A Novel Concept-based Search for the Web of Data using UMBEL and a Fuzzy Retrieval Model. In Proceedings of the 9th Extended Semantic Web Conference (ESWC12), May 2012, Crete, Greece<br />

Sah, M. and Wade, V. (2012). A Novel Concept-based Search for the Web of Data. In Proceedings of the 8th International I-SEMANTICS Conference Posters &amp; Demonstrations Track, Graz, Austria<br />

Schneider, A., Luz, S. (2012). Speaker alignment in synthesised, machine translated communication. In International Workshop on Spoken Language Translation, December 2011, San Francisco, USA<br />

Szekely, E., Ahmed, Z., Steiner, I., Carson-Berndsen, J. (2012). Facial expressions as an input annotation modality for affective speech-to-speech translation. Multimodal Analyses enabling Artificial Agents in Human-Machine Interaction, Santa Cruz, USA<br />

Szekely, E., Cabral, J., Carson-Berndsen, J. (2012). WinkTalk: a demonstration of a multimodal speech synthesis platform linking facial expressions to expressive synthetic voices. In Third Workshop on Speech and Language Processing for Assistive Technologies (SLPAT), Montreal, Canada<br />

Szekely, E. (2012). Detecting a Targeted Voice Style in an Audiobook using Voice Quality Features. In Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP 2012), March 2012, Kyoto, Japan<br />

Szekely, E., Abou-Zleikha, M., Cabral, J., Carson-Berndsen, J. (2012). Evaluating expressive speech synthesis from audiobooks in conversational phrases. LREC 2012, Istanbul, Turkey<br />

Szekely, E., Csapo, T., Toth, B., Mihajlik, P., Carson-Berndsen, J. (2012). Synthesizing expressive speech from amateur audiobook recordings. In Proceedings of IEEE Workshop on Spoken Language Technology, December 2012, Florida, USA<br />

Szekely, E., Kane, J., Scherer, S., Gobl, C., Carson-Berndsen, J. (2012). Detecting a targeted voice style in an audiobook using voice quality features. In Proceedings of ICASSP, Kyoto, Japan<br />

Truran, M., Georg, G., Cavazza, M., Zhou, D. (2012). A Section Title Authoring Tool for Clinical Guidelines. In Proceedings of the 12th ACM Symposium on Document Engineering (DocEng 2012), 4-7 September, Paris, France, 41-44.<br />

Tu, Z., He, Y., Foster, J., van Genabith, J., Liu, Q. and Lin, S. (2012). Identifying High-Impact Sub-Structures for Convolution Kernels in Document-level Sentiment Classification. In Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics, July 2012, Jeju, Republic of Korea<br />



Tu, Z., Liu, Y., He, Y., van Genabith, J., Liu, Q., Lin, S. (2012). Combining Multiple Alignments to Improve Machine Translation. COLING 2012, Mumbai, India<br />

Veale, T. (2012). Detecting and Generating Ironic Comparisons: An Application of Creative Information Retrieval. AAAI Fall Symposium Series 2012<br />

Veale, T. and Hao, Y. (2012). In the Mood for Affective Search. In Proceedings of WWW 2012, the 21st World Wide Web Conference, Lyon, France<br />

Veale, T. and Li, G. (2012). Specifying Viewpoint and Information Need with Affective Metaphors: A System Demonstration of Metaphor Magnet. In Proceedings of ACL 2012, the 50th Annual Meeting of the Association for Computational Linguistics, Jeju, South Korea<br />

Wagner, J., Foster, J., Cetinoglu, O., Nivre, J., Hogan, D., Le Roux, J., van Genabith, J. (2012). From News to Comment: Resources and Benchmarks for Parsing the Language of Web 2.0. In 5th International Joint Conference on Natural Language Processing (IJCNLP), Chiang Mai, Thailand<br />

Wagner, J., Bryl, A., Foster, J., Le Roux, J., Kaljahi, R. (2012). DCU-Paris13 Systems for the SANCL 2012 Shared Task. First Workshop on Syntactic Analysis of Non-Canonical Language (SANCL), Montreal, Canada<br />

Wagner, J., Cetinoglu, O., Foster, J., Hogan, S., Le Roux, J. (2012). #hardtoparse: POS Tagging and Parsing the Twitterverse. In Workshop on Analyzing Microtext at the Twenty-Fifth Conference on Artificial Intelligence (AAAI-11), San Francisco, USA<br />

Wagner, J., Cetinoglu, O., van Genabith, J., Foster, J. (2012). Comparing the use of edited and unedited text in parser self-training. The 12th International Conference on Parsing Technologies (IWPT 2011), Dublin, Ireland<br />

Zahra, A., Carson-Berndsen, J. (2012). English to Indonesian Transliteration to Support English Pronunciation Practice. In Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC), Istanbul, Turkey<br />

Zahra, A., Cabral, J., Carson-Berndsen, J., Kane, M. (2012). Automatic Classification of Pronunciation Errors Using Decision Trees and Speech Recognition Technology. In Proceedings of the International Symposium on Automatic Detection of Errors in Pronunciation Training (IS ADEPT), Stockholm, Sweden<br />

Zeeshan, A., Cahill, P., Carson-Berndsen, J. (2012). Phonetically aided Syntactic Parsing of Spoken Language. In Proceedings of KONVENS, the 11th Conference on Natural Language Processing, Vienna, Austria<br />

Zeeshan, A., Cahill, P., Carson-Berndsen, J., Jiang, J., Way, A. (2012). Hierarchical Phrase-Based MT for Phonetic Representation-Based Speech Translation. In Proceedings of the Tenth Biennial Conference of the Association for Machine Translation in the Americas, San Diego, California<br />

Zhou, D., Lawless, S., Wade, V. (2012). Web Search Personalization Using Social Data. In Proceedings of the 16th International Conference on Theory and Practice of Digital Libraries (TPDL 2012), Pafos, Cyprus<br />

Not Recorded in 2011 Annual Report<br />

Refereed Conference and Workshop Papers<br />

Doherty, S. Exploring the Cognitive Elements of Think-Aloud Protocols. Show and Tell: Proceedings of the 2011 SALIS Postgraduate Showcase<br />

Abgaz, Y., Javed, M., Pahl, C. A Framework for Change Impact Analysis of Ontology-driven Content-based Systems. In Proceedings of On the Move to Meaningful Internet Systems: OTM Workshops. 7th International IFIP Workshop on Semantic Web and Web Semantics (SWWS), October 2011, Crete, Greece<br />

Book Chapters<br />

Asanka Wasala, R., Buckley, J., Exton, C., Schaler, R., Weerasinghe, A. R. (2012). Building Multilingual Language Resources in Web Localisation: A Crowdsourcing Approach. In I. Gurevych and J. Kim (Eds.), The People’s Web Meets NLP: Collaboratively Constructed Language Resources, Springer Verlag Berlin Heidelberg [In Press]<br />

Banerjee, P. (2012). In Alexander Clark, Chris Fox and Shalom Lappin (eds.): Handbook of Computational Linguistics and Natural Language Processing. Machine Translation. DOI: 10.1007/s10590-012-9124-2 (Online First)<br />

Kane, M., Mauclair, J. and Carson-Berndsen, J. (2011). Automatic Identification of Phonetic Similarity based on Underspecification. Human Language Technology: Challenges for Computer Science and Linguistics. Lecture Notes in Computer Science (LNCS 6562), Poznan, pp. 47-58<br />

Morera Mesa, A., Collins, J.J., Aouad, L. (2012). Assessing Support for Community Workflows in Localisation. In Florian Daniel, Kamel Barkaoui and Schahram Dustdar (Eds.), Business Process Management Workshops, BPM 2011 International Workshops, Clermont-Ferrand, France, August 29, 2011, Revised Selected Papers, Part I, Lecture Notes in Business Information Processing (LNBIP) volume 99, part 3, pp. 195-206, Springer Berlin Heidelberg<br />

O’Keeffe, I., Aouad, L., Collins, J.J., Asanka Wasala, R., Nishio, N., Morera Mesa, A., Morado Vázquez, L., Ryan, L., Gupta, R., Schaler, R. (2012). A View of Future Technologies and Challenges for the Automation of Localisation Processes: Visions and Scenarios. ICHIT (2) 2011: 371-382<br />



Refereed Original Articles<br />

Kane, J. and Gobl, C. Wavelet maxima dispersion for breathy to tense voice discrimination. IEEE Transactions on Audio, Speech and Language Processing [In Press]<br />

Doherty, G., Karamanis, N., Luz, S. (2012). Collaboration in translation: The impact of increased reach on cross-organisational work. Computer Supported Cooperative Work (CSCW), August 2012<br />

Ghorab, R.M., Zhou, D., O’Connor, A., Wade, V. (2012). Personalised Information Retrieval: survey and classification. User Modeling and User-Adapted Interaction (UMUAI), 2012.<br />

Kane, J., Drugman, T., Gobl, C. Improved automatic detection of creak, Computer Speech and Language [In Press]<br />

Karamanis, N., Doherty, G., Luz, S. (2012). Collaboration in translation: The impact of increased reach on cross-organisational work. Computer Supported Cooperative Work (CSCW), August 2012<br />

Lambert, P., Petitrenaud, S., Ma, Y., Way, A. (2012). What types of word alignment improve statistical machine translation? Machine Translation, Volume 26(4), Springer, pp. 289-323, 2012<br />

Moorkens, J. (2012). A mixed-methods study of consistency in Translation Memories. Localisation Focus, Volume 11(1)<br />

Morera Mesa, A. (2012). Translation and localization project management: the art of the possible. In Keiran J. Dunne and Elena S. Dunne (eds.). Translation and localization project management: the art of the possible<br />

Mulwa, C., Lawless, S., O’Keeffe, I., Sharp, M., Wade, V. (2012). A Recommender Framework for the Evaluation of End User Experience in Adaptive Technology Enhanced Learning Systems. International Journal of Technology Enhanced Learning (IJTEL), Special Issue on “Datasets and Data Supported Learning in Technology-Enhanced Learning”, Vol. 4, Nos. 1/2, pp. 67-84, 2012<br />

O’Keeffe, I. (2012). Soundtrack Localisation: Culturally Adaptive Music Content for Computer Games. Journal of Internationalisation and Localisation<br />

Rami Ghorab, M., Zhou, D., O’Connor, A., Wade, V. (2012). Personalised Information Retrieval: Survey and Classification. In User Modeling and User-Adapted Interaction (UMUAI) Journal, 1-63 (Published Online First: http://dx.doi.org/10.1007/s11257-012-9124-1), Springer.<br />

Ryan, L. (2012). Global Authoring Resources. Communicator, Spring 2012, ISTC<br />

Ryan, L. (2012). Global Authoring Techniques. Communicator, Autumn 2012, ISTC<br />

Ryan, L. (<strong>2012</strong>). Global Diversity and Localisation Issues. Communicator, Summer <strong>2012</strong>, ISTC XXX<br />

Sah, M. and Wade, V. (<strong>2012</strong>). Automatic Metadata Mining from Multilingual Enterprise Content. In Web Semantics: Science, Services and Agents on the<br />

World Wide Web, Volume 11 (March <strong>2012</strong>), pp. 41-62<br />

Van der Sluis, I., Luz, S., Breitfuß, W., Ishizuka, M., Prendinger, H. (<strong>2012</strong>). Cross-cultural assessment of automatically generated multimodal referring<br />

expressions in a virtual world. International Journal of Human-Computer Studies, 70(9):611-629, <strong>2012</strong>.<br />

Wasala, A., Schmidtke, D., Schäler, R. (<strong>2012</strong>). XLIFF and LCX Format: A Comparison. Localisation Focus, Volume 11(1)<br />

Zhou, D., Lawless, S., Wade, V. (<strong>2012</strong>). Improving search via personalized query expansion using social media, Information Retrieval, 15(3-4), 218-242<br />

Zhou, D., Truran, M., Brailsford, T., Wade, V., Ashman, H. (<strong>2012</strong>). Translation Techniques in Cross-Language Information Retrieval. ACM Computing<br />

Surveys (CSUR), 45(1), Article 1, 1-44. <strong>2012</strong><br />

Conference Presentations<br />

Abgaz, Y., Javed, M., Pahl, C. (<strong>2012</strong>). Dependency Analysis in Ontology-driven Content-based Systems. In 12th International Conference on Artificial<br />

Intelligence and Soft Computing (ICAISC <strong>2012</strong>), Zakopane, Poland<br />

Abou-Zleikha, M., Cahill, P., Carson-Berndsen, J. (<strong>2012</strong>). Pitch Recovery of Missing Syllables Using Sparse Representation in Exemplar-based Pitch<br />

Generation. In Proceedings of the 11th International Conference on Information Sciences, Signal Processing and their Applications, Montreal, Canada<br />

Abou-Zleikha, M., Cahill, P., Carson-Berndsen, J. (<strong>2012</strong>). Exemplar-based pitch contour generation using DOP for syntactic tree decomposition.<br />

In Proceedings of the 37th IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP <strong>2012</strong>, Kyoto, Japan<br />

Abou-Zleikha, M., Szekely, E., Cahill, P., Carson-Berndsen, J. (<strong>2012</strong>). Multi-level Exemplar-based Duration Generation for Expressive Speech Synthesis.<br />

In Proceedings of the 6th International Conference on Speech Prosody <strong>2012</strong>, Shanghai, China<br />

Almaghout, H., Jiang, J., Way, A. (<strong>2012</strong>). Extending CCG-based Syntactic Constraints in Hierarchical Phrase-Based SMT. In Proceedings of the 16th <strong>Annual</strong><br />

Conference of the European Association for Machine Translation (EAMT-<strong>2012</strong>), Trento, Italy



Centre for Next Generation Localisation <strong>Annual</strong> <strong>Report</strong> <strong>2012</strong><br />

APPENDIX 2: OUTPUTS<br />


Wasala, A., Schäler, R., Weerasinghe, R., Exton, C. (<strong>2012</strong>). Collaboratively Building Language Resources while Localising the Web. In Proceedings<br />

of ACL <strong>2012</strong>: 3rd Workshop on the People’s Web Meets NLP: Collaboratively Constructed Semantic Resources and their Applications to NLP, Jeju,<br />

Republic of Korea<br />

Attia, M., Pecina, P., Samih, Y., Shaalan, K., van Genabith, J. (<strong>2012</strong>). Improved Spelling Error Detection and Correction for Arabic, COLING <strong>2012</strong>, Mumbai, India<br />

Attia, M., Samih, Y., Shaalan, K., van Genabith, J. (<strong>2012</strong>). The Floating Arabic Dictionary: An Automatic Method for Updating a Lexical Database through the<br />

detection and lemmatization of the Unknown Word. In The International Conference on Computational Linguistics (COLING), December <strong>2012</strong>, Mumbai,<br />

India<br />

Banerjee, P., Naskar, S., Way, A., van Genabith, J., Roturier, J. (<strong>2012</strong>). Supplementary Data Selection by Incremental Update of Translation Models. In<br />

Proceedings of the 24th International Conference on Computational Linguistics (COLING <strong>2012</strong>), Mumbai, India<br />

Banerjee, P., Naskar, S., Roturier, J., Way, A., van Genabith, J. (<strong>2012</strong>). Domain Adaptation in SMT of User-Generated Forum Content Guided by OOV Word<br />

Reduction: Normalization and/or Supplementary Data. In Proceedings of the 16th <strong>Annual</strong> Conference of the European Association for Machine Translation<br />

(EAMT-<strong>2012</strong>), Trento, Italy<br />

Cabral, J. P., Kane, M., Ahmed, Z., Abou-Zleikha, M., Szekely, E., Zahra, A., Ogbureke, U. K., Cahill, P., Carson-Berndsen, J., Schlögl, S. (<strong>2012</strong>). Rapidly<br />

Testing the Interaction Model of a Pronunciation Training System via Wizard-of-Oz. In Proceedings of the International Conference on Language<br />

Resources and Evaluation (LREC <strong>2012</strong>), Istanbul, Turkey<br />

Cabral, J. P. and Carson-Berndsen, J. (<strong>2012</strong>). Controlling Voice Source Parameters to Transform Characteristics of Synthetic Voices. In Listening Talker<br />

(LISTA) Workshop, Edinburgh, UK<br />

Dandapat, S., Morrissey, S., Way, A., van Genabith, J. (<strong>2012</strong>). Combining EBMT, SMT, TM and IR Technologies for Quality and Scale. In Proceedings of the<br />

Joint Workshop on Exploiting Synergies between Information Retrieval and Machine Translation (ESIRMT) and Hybrid Approaches to Machine Translation<br />

(HyTra), a workshop in EACL <strong>2012</strong>, Avignon, France<br />

Doherty, S., Kenny, D., Way, A. (<strong>2012</strong>). Taking Statistical Machine Translation to the Student Translator, AMTA <strong>2012</strong>, San Diego, USA<br />

Doherty, S. and Moorkens, J. (<strong>2012</strong>). An Experiential Analysis of Translation Technology Labs. 2nd <strong>Annual</strong> Conference of Education and Humanities,<br />

30 March <strong>2012</strong>, St. Patrick’s College, Ireland<br />

Doherty, S. and O’Brien, S. (<strong>2012</strong>). A User-Based Usability Assessment of Raw Machine Translated Technical Instructions. Conference of the Association<br />

for Machine Translation in the Americas (AMTA <strong>2012</strong>), San Diego, USA<br />

Drugman, T., Kane, J., Gobl, C. (<strong>2012</strong>). Resonator-based creaky voice detection. In Proceedings of Interspeech <strong>2012</strong>, Portland, Oregon, USA<br />

Emms, M. (<strong>2012</strong>). On Stochastic Tree Distances and their training via Expectation-Maximisation. In Proceedings of ICPRAM <strong>2012</strong> International Conference<br />

on Pattern Recognition Applications and Methods, Portugal<br />

Filip, D. (<strong>2012</strong>). Managing Industry Wisdom as a Portfolio of Technical Standards, in Management Re-Imagined. Presented at the International Federation<br />

of Scholarly Associations of Management (IFSAM <strong>2012</strong>), Limerick, Ireland.<br />

Filip, D. (<strong>2012</strong>). Using Business Process Management and Modelling to Analyse the Role of Human Translators and Reviewers in Bitext Management<br />

Workflows. Presented at the International Association for Translation and Intercultural Studies Conference (IATIS <strong>2012</strong>), Belfast, UK<br />

Filip, D., Lewis, D., Sasaki, F. (<strong>2012</strong>). The Multilingual Web. In Proceedings of the 21st World Wide Web Conference WWW<strong>2012</strong>, April 16-20, <strong>2012</strong>, Lyon,<br />

France, ACM proceedings 978-1-4503-1229-5/11/04<br />

Filip, D., Lewis, D., Wasala, A., Jones, D., Finn, L. (<strong>2012</strong>). CMSL10n SOLAS Integration as an ITS 2.0 XLIFF test bed. Paper presented at the W3C<br />

MultilingualWeb (ITS 2.0) Track, FEISGILTT <strong>2012</strong> (collocated with Localization World <strong>2012</strong>), Seattle, USA.<br />

Ganguly, D., Leveling, J., Jones, G. (<strong>2012</strong>). Topical Relevance Models, CIKM <strong>2012</strong>, Hawaii, USA<br />

Ganguly, D., Leveling, J., Jones, G. (<strong>2012</strong>). Approximate Sentence Retrieval for Scalable and Efficient Example-based Machine Translation. The 24th<br />

International Conference on Computational Linguistics (COLING <strong>2012</strong>), Mumbai, India<br />

Ganguly, D., Jones, G. (<strong>2012</strong>). Cross-Lingual Topical Relevance Models. The 24th International Conference on Computational Linguistics (COLING <strong>2012</strong>),<br />

Mumbai, India<br />

Ghorab, M. R., Zhou, D., Lawless, S., Wade, V. (<strong>2012</strong>). Multilingual User Modeling for Personalized Re-ranking of Multilingual Web Search Results.<br />

In Conference on User Modeling, Adaptation, and Personalization (UMAP <strong>2012</strong>), Montreal, Canada<br />

Graham, Y. (<strong>2012</strong>). Deep Syntax in Statistical Machine Translation. Lexical Functional Grammar Conference, Udayana University, Bali, Indonesia<br />

Javed, M., Abgaz, Y., Pahl, C. (<strong>2012</strong>). Composite Ontology Change Operators and their Customizable Evolution Strategies, 2nd Joint Workshop on<br />

Knowledge Evolution and Ontology Dynamics, Boston, USA<br />

Ogbureke, U. K., Cabral, J., Carson-Berndsen, J. (<strong>2012</strong>). Using Noisy Speech to Study the Robustness of a Continuous F0 Modelling Method in HMM-based<br />

Speech Synthesis<br />

Scherer, S., Kane, J., Gobl, C., Schwenker, F. (<strong>2012</strong>). The Effect of Fuzzy Training Targets on Voice Quality Classification, Interspeech <strong>2012</strong>, Portland, USA



Kane, B., Toussaint, P., Luz, S. (2013). Shared decision making needs a communication record. To appear in Proceedings of the 16th ACM Conference on Computer<br />

Supported Cooperative Work and Social Computing (CSCW 2013), San Antonio, Texas<br />

Kane, J. and Gobl, C. (<strong>2012</strong>). Identifying regions of non-modal phonation using features of the wavelet transform. In Proceedings of Interspeech <strong>2012</strong>,<br />

Florence, Italy<br />

Kane, J., Papay, K., Hunyadi, L., Gobl, C. (<strong>2012</strong>). On the use of creak in Hungarian spontaneous speech. In Proceedings of ICPhS 2011, Hong Kong, China<br />

Kane, J., Scherer, S., Layher, G., Neumann, H. (<strong>2012</strong>). An audiovisual political speech analysis incorporating eye-tracking and perception data. In Proceedings of the Eighth<br />

International Conference on Language Resources and Evaluation (LREC <strong>2012</strong>), Istanbul, Turkey<br />

Kane, J., Yanushevskaya, I., Ní Chasaide, A., Gobl, C. (<strong>2012</strong>). Exploiting time and frequency domain measures for precise voice source parameterisation.<br />

In Proceedings of Speech Prosody <strong>2012</strong>, Shanghai, China<br />

Kane, M., Ahmed, Z., Carson-Berndsen, J. (<strong>2012</strong>). Underspecification in Pronunciation Variation. In Proceedings of the International Symposium on<br />

Automatic Detection of Errors in Pronunciation Training (IS ADEPT), Stockholm, Sweden<br />

Kane, J., Oertel, C. (<strong>2012</strong>). Conversational involvement and multimodal cues: summary and outlook. Fonetik <strong>2012</strong>, Gothenburg, Sweden<br />

Levacher, K., Lawless, S., Wade, V. (<strong>2012</strong>). Slicepedia: Towards Long Tail Resource Production through Open Corpus Reuse. In Proceedings of International<br />

Conference on Web-based Learning (ICWL <strong>2012</strong>), Sinaia, Romania<br />

Levacher, K., Lawless, S., Wade, V. (<strong>2012</strong>). Slicepedia: Automating the Production of Learning Objects from Open Corpus Content. In Proceedings of The<br />

European Conference on Technology Enhanced Learning (EC-TEL), September <strong>2012</strong>, Paphos, Cyprus<br />

Levacher, K., Lawless, S., Wade, V. (<strong>2012</strong>). Slicepedia: Providing Customized Reuse of Open-Web Resources for Adaptive Hypermedia. In Proceedings<br />

of the 23rd ACM Conference on Hypertext and Social Media (HT ‘12), Milwaukee, USA<br />

Leveling, J., Jones, G., Ganguly, D. (<strong>2012</strong>). Topical Relevance Models. In Proceedings of the Eighth Asia Information Retrieval Societies Conference (AIRS<br />

<strong>2012</strong>), December <strong>2012</strong>, Tianjin, China<br />

Leveling, J., Jones, G.J.F. (<strong>2012</strong>). Making Results Fit Into 40 Characters: A Study in Document Rewriting. In Proceedings of the Thirty-Fifth <strong>Annual</strong><br />

International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR <strong>2012</strong>), August <strong>2012</strong>, Portland, USA<br />

Lewis, D., O’Connor, A., Zydroń, A., Sjögren, G., Choudhury, R. (<strong>2012</strong>). On Using Linked Data for Language Resource Sharing in the Long Tail of the<br />

Localisation Market. In Proceedings of Language Resources and Evaluation Conference (LREC), May <strong>2012</strong>, Istanbul, Turkey<br />

Lewis, D., O’Connor, A., Molines, S., Finn, L., Jones, D., Curran, S., Lawless, S. (<strong>2012</strong>). Linking localisation and language resources. In Linked Data in<br />

Linguistics, Lecture Notes in Computer Science (LNCS), 7-9 March <strong>2012</strong>, Frankfurt/Main, Germany, Springer-Verlag<br />

Li, J., Tu, Z., Zhou, G., van Genabith, J. (<strong>2012</strong>). Head-Driven Hierarchical Phrase-based Translation. In Proceedings of the 50th <strong>Annual</strong> Meeting of the<br />

Association for Computational Linguistics (ACL-<strong>2012</strong>), Jeju, Korea, Association for Computational Linguistics<br />

Li, J., Tu, Z., Zhou, G., van Genabith, J. (<strong>2012</strong>). Using Syntactic Head Information in Hierarchical Phrase-based Translation. In Proceedings of the Seventh<br />

Workshop on Statistical Machine Translation (WMT <strong>2012</strong>), Montreal, Canada<br />

Lynch, G., Moreau, E., Vogel, C. (<strong>2012</strong>). A Naïve Bayes classifier for automatic correction of preposition and determiner errors in ESL text. In Proceedings<br />

of the Seventh Workshop on Innovative Use of NLP for Building Educational Applications, June <strong>2012</strong>, Montreal, Canada<br />

Lynch, G., Vogel, C. (<strong>2012</strong>). Towards the Automatic Detection of the Source Language of a Literary Translation. In Proceedings of the 24th International<br />

Conference on Computational Linguistics (COLING <strong>2012</strong>), Mumbai, India<br />

Maldonado-Guerra, A., and Emms, M. (<strong>2012</strong>). First-order and second-order context representations: geometrical considerations and performance in<br />

word-sense disambiguation and discrimination. In Proceedings of the 11es Journées internationales d’Analyse statistique des Données Textuelles (JADT<br />

<strong>2012</strong>), Liège.<br />

Mamani Sanchez, L. and Vogel, C. (<strong>2012</strong>). Emoticons Signal Expertise in Technical Web Fora. Special Session: Computational Intelligence in Emotional<br />

or Affective Systems. In Proceedings of the 22nd Italian Workshop on Neural Networks. Smart Innovation, Systems and Technologies, Salerno, Italy<br />

Mamani Sanchez, L. and Vogel, C. (<strong>2012</strong>). Epistemic Signals and Emoticons Affect Kudos. In 3rd IEEE International Conference on Cognitive<br />

Infocommunications, Košice, Slovakia<br />

McAuley, J., Lewis, D., O’Connor, A. (<strong>2012</strong>). Exploring reflection in online communities. In Learning Analytics and Knowledge (LAK12), Vancouver,<br />

Canada: ACM<br />

Min, J., Lopes, C., Leveling, J., Schmidtke, D., Jones, G.J.F. (<strong>2012</strong>). Multi-Platform Image Search using Tag Enrichment. In Proceedings of the Thirty-Fifth<br />

<strong>Annual</strong> International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR <strong>2012</strong>), August <strong>2012</strong>, Portland, USA<br />

Moorkens, J., Doherty, S., Kenny, D., O’Brien, S. (2013, forthcoming). A Virtuous Circle: Laundering Translation Memory Data using Statistical Machine<br />

Translation. Tralogy Conference, January 2013, Paris, France<br />

Moreau, E. (<strong>2012</strong>). Quality Estimation: an experimental study using unsupervised similarity measures. In Proceedings of the Seventh Workshop on Statistical<br />

Machine Translation, Montreal, Canada



Mulwa, C., Lawless, S., Sharp, M., Wade, V. (<strong>2012</strong>). The Evaluation of Adaptive Technology Enhanced Learning Systems, E-LEARN <strong>2012</strong> – World Conference<br />

on E-Learning in Corporate, Government, Healthcare and Higher Education, Montreal, Canada<br />

Ogbureke, U. K., Cabral, J., Carson-Berndsen, J. (<strong>2012</strong>). Explicit Duration Modelling in HMM-based Speech Synthesis Using Continuous Hidden Markov<br />

Model. In The 11th International Conference on Information Sciences, Signal Processing and their Applications (ISSPA <strong>2012</strong>), 3-5 July <strong>2012</strong>, Montreal, Canada<br />

Ogbureke, U. K., Cabral, J., Carson-Berndsen, J. (<strong>2012</strong>). Using Multilayer Perceptron for Voicing Strength Estimation in HMM-based Speech Synthesis.<br />

In The 11th International Conference on Information Sciences, Signal Processing and their Applications, 3-5 July <strong>2012</strong>, Montreal, Canada<br />

O’Keeffe, I. (<strong>2012</strong>). Multimedia Localisation: Cultural Implications for XLIFF. In the 2nd International XLIFF Symposium, Warsaw, Poland<br />

O’Keeffe, I. (<strong>2012</strong>). Multimedia Localisation: Cultural Implications for the Adaptation of Multimedia Content. In Proceedings of the 4th Conference of the<br />

International Association for Translation and Intercultural Studies, Queen’s University Belfast, Northern Ireland, UK<br />

O’Keeffe, I. (<strong>2012</strong>). A Mechanism for Facilitating Emotional Regulation through Music. <strong>2012</strong> CUES <strong>Annual</strong> Conference – Regulating Emotions:<br />

Contemporary Understandings and Interdisciplinary Perspective, Limerick, Ireland<br />

O’Keeffe, I., O’Connor, A., Lawless, S., Wade, V. (<strong>2012</strong>). Linked Open Corpus Models, Leveraging the Semantic Web for Adaptive Hypermedia.<br />

In Proceedings of the 23rd ACM Conference on Hypertext and Social Media, HT <strong>2012</strong>, Milwaukee, USA<br />

Pecina, P., Toral, A., van Genabith, J. (<strong>2012</strong>). Simple and Effective Parameter Tuning for Domain Adaptation of Statistical Machine Translation,<br />

COLING <strong>2012</strong>, Mumbai, India<br />

Sah, M. and Wade, V. (<strong>2012</strong>). A Novel Concept-based Search for the Web of Data using UMBEL and a Fuzzy Retrieval Model. In Proceedings of 9th<br />

Extended Semantic Web Conference (ESWC12), May <strong>2012</strong>, Crete, Greece<br />

Sah, M. and Wade, V. (<strong>2012</strong>). A Novel Concept-based Search for the Web of Data. In Proceedings of the 8th International I-SEMANTICS Conference Posters<br />

& Demonstrations Track, Graz, Austria<br />

Schneider, A., Luz, S. (<strong>2012</strong>). Speaker alignment in synthesised, machine translated communication. In International Workshop on Spoken Language<br />

Translation, December 2011, San Francisco, USA<br />

Szekely, E., Ahmed, Z., Steiner, I., Carson-Berndsen, J. (<strong>2012</strong>). Facial expressions as an input annotation modality for affective speech-to-speech<br />

translation, Multimodal Analyses enabling Artificial Agents in Human-Machine Interaction, Santa Cruz, USA<br />

Szekely, E., Cabral, J., Carson-Berndsen, J. (<strong>2012</strong>). WinkTalk: a demonstration of a multimodal speech synthesis platform linking facial expressions to<br />

expressive synthetic voices. In Third Workshop on Speech and Language Processing for Assistive Technologies (SLPAT), Montreal, Canada<br />

Szekely, E., Abou-Zleikha, M., Cabral, J., Carson-Berndsen, J. (<strong>2012</strong>). Evaluating expressive speech synthesis from audiobooks in conversational phrases,<br />

LREC <strong>2012</strong>, Istanbul, Turkey<br />

Szekely, E., Csapó, T., Toth, B., Mihajlik, P., Carson-Berndsen, J. (<strong>2012</strong>). Synthesizing expressive speech from amateur audiobook recordings.<br />

In Proceedings of IEEE Workshop on Spoken Language Technology, December <strong>2012</strong>, Florida, USA<br />

Szekely, E., Kane, J., Scherer, S., Gobl, C., Carson-Berndsen, J. (<strong>2012</strong>). Detecting a targeted voice style in an audiobook using voice quality features.<br />

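In Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP <strong>2012</strong>), March <strong>2012</strong>, Kyoto, Japan<br />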

Tu, Z., He, Y., Foster, J., van Genabith, J., Liu, Q. and Lin, S. (<strong>2012</strong>). Identifying High-Impact Sub-Structures for Convolution Kernels in Document-level<br />

Sentiment Classification. In Proceedings of the 50th <strong>Annual</strong> Meeting of the Association for Computational Linguistics, July <strong>2012</strong>, Jeju, Republic of Korea<br />

Tu, Z., Liu, Y., He, Y., van Genabith, J., Liu, Q., Lin, S. (<strong>2012</strong>). Combining Multiple Alignments to Improve Machine Translation, COLING <strong>2012</strong>, Mumbai, India<br />

Veale, T. and Hao, Y. (<strong>2012</strong>). In the Mood for Affective Search. In Proceedings of WWW’<strong>2012</strong>, the 21st World-Wide-Web conference, Lyon, France<br />

Veale, T. and Li, G. (<strong>2012</strong>). Specifying Viewpoint and Information Need with Affective Metaphors: A System Demonstration of Metaphor Magnet.<br />

In Proceedings of ACL’<strong>2012</strong>, the 50th <strong>Annual</strong> Conference of the Association for Computational Linguistics, Jeju, South Korea<br />

Wagner, J., Foster, J., Cetinoglu, O., Nivre, J., Hogan, D., Le Roux, J., van Genabith, J. (<strong>2012</strong>). From News to Comment: Resources and Benchmarks for<br />

Parsing the Language of Web 2.0. In 5th International Joint Conference on Natural Language Processing (IJCNLP), Chiang Mai, Thailand<br />

Wagner, J., Bryl, A., Foster, J., Le Roux, J., Kaljahi, R. (<strong>2012</strong>). DCU-Paris13 Systems for the SANCL <strong>2012</strong> Shared Task, First Workshop on Syntactic Analysis<br />

of Non-Canonical Language (SANCL), Montreal, Canada<br />

Wagner, J., Cetinoglu, O., Foster, J., Hogan, S., Le Roux, J. (<strong>2012</strong>). #hardtoparse: POS Tagging and Parsing the Twitterverse. In Workshop on Analyzing<br />

Microtext at the Twenty-Fifth Conference on Artificial Intelligence (AAAI-11), San Francisco, USA<br />

Wagner, J., Cetinoglu, O., van Genabith, J., Foster, J. (<strong>2012</strong>). Comparing the use of edited and unedited text in parser self-training. The 12th International<br />

Conference on Parsing Technologies (IWPT 2011), Dublin, Ireland<br />

Wasala, A., Filip, D., Exton, C., Schäler, R. (<strong>2012</strong>). Making Data Mining of XLIFF Artefacts Relevant for the Ongoing Development of the XLIFF Standard.<br />

Paper presented at the 3rd International XLIFF Symposium, FEISGILTT <strong>2012</strong> (collocated with Localization World <strong>2012</strong>), Seattle, USA.



Zahra, A., Carson-Berndsen, J. (<strong>2012</strong>). English to Indonesian Transliteration to Support English Pronunciation Practice. In Proceedings of the Eighth<br />

International Conference on Language Resources and Evaluation (LREC <strong>2012</strong>), Istanbul, Turkey<br />

Zahra, A., Cabral, J., Carson-Berndsen, J., Kane, M. (<strong>2012</strong>). Automatic Classification of Pronunciation Errors Using Decision Trees and Speech Recognition<br />

Technology. In Proceedings of International Symposium on Automatic Detection of Errors in Pronunciation Training (IS ADEPT), Stockholm, Sweden<br />

Ahmed, Z., Cahill, P., Carson-Berndsen, J. (<strong>2012</strong>). Phonetically aided Syntactic Parsing of Spoken Language. In Proceedings of KONVENS <strong>2012</strong>, the 11th<br />

Conference on Natural Language Processing, Vienna, Austria<br />

Ahmed, Z., Cahill, P., Carson-Berndsen, J., Jiang, J., Way, A. (<strong>2012</strong>). Hierarchical Phrase-Based MT for Phonetic Representation-Based Speech<br />

Translation. In Proceedings of the Tenth Biennial Conference of the Association for Machine Translation in the Americas (AMTA <strong>2012</strong>), San Diego, California<br />

Zhou, D., Lawless, S., Wade, V. (<strong>2012</strong>). Web Search Personalization Using Social Data. In Proceedings of the 16th International Conference on Theory and<br />

Practice of Digital Libraries (TPDL <strong>2012</strong>), Pafos, Cyprus<br />

Workshops and Conferences Hosted<br />

Date Event Title Location<br />

09/03/<strong>2012</strong> - 10/03/<strong>2012</strong> Workshop on Innovation and Applications in Speech Technology (IAST) University College Dublin<br />

11/03/<strong>2012</strong> <strong>CNGL</strong> Hadoop Hackathon Dublin City University<br />

16/05/<strong>2012</strong> - 17/05/<strong>2012</strong> <strong>CNGL</strong> Spring Scientific Committee Meeting (incorporating inaugural Innovation Charette) Chartered Accountants House, Dublin<br />

30/05/<strong>2012</strong> - 01/06/<strong>2012</strong> International Conference on Computational Creativity (ICCC) University College Dublin<br />

04/06/<strong>2012</strong> - 06/06/<strong>2012</strong> Workshop on Best Practices in Post-editing (in association with TAUS) at Localization World Paris, France<br />

11/06/<strong>2012</strong> - 13/06/<strong>2012</strong> W3C Multilingual Web Workshop Trinity College Dublin<br />

13/06/<strong>2012</strong> - 15/06/<strong>2012</strong> LRC Summer School <strong>2012</strong> – Mobile App Localisation Carlton Castletroy Park Hotel, Limerick<br />

25/06/<strong>2012</strong> Trinity Access Programme ‘Editing Wikipedia’ Workshop Trinity College Dublin<br />

20/09/<strong>2012</strong> - 21/09/<strong>2012</strong> 17th <strong>Annual</strong> Localisation & Internationalisation Conference (LRC XVII) Carlton Castletroy Park Hotel, Limerick<br />

08/10/<strong>2012</strong> - 09/10/<strong>2012</strong> International Workshop on Intelligent Exploration of Semantic Data (IESD <strong>2012</strong>) at the 18th International Conference on Knowledge Engineering and Knowledge Management (EKAW <strong>2012</strong>) Galway, Ireland<br />

16/10/<strong>2012</strong> - 17/10/<strong>2012</strong> FEISGILTT <strong>2012</strong> (Federated Event for Interoperability Standardization in Globalization, Internationalization, Localization, and Translation Technologies) Seattle, USA<br />

28/10/<strong>2012</strong> Workshop on Post-editing Technology and Practice (WPTP-12) at AMTA <strong>2012</strong> San Diego, USA<br />

01/11/<strong>2012</strong> Workshop on Monolingual Translation at AMTA <strong>2012</strong> San Diego, USA<br />

08/11/<strong>2012</strong> - 11/11/<strong>2012</strong> International Postgraduate Conference in Translating and Interpreting (IPCITI) Dublin City University<br />

24/11/<strong>2012</strong> The Multimodality and Cyberpsychology Conference Dublin City University<br />

08/12/<strong>2012</strong> - 15/12/<strong>2012</strong> Machine Translation and Parsing in Indian Languages (MTPIL-<strong>2012</strong>) [<strong>CNGL</strong> co-organiser] Mumbai, India<br />

09/12/<strong>2012</strong> Second Workshop on Applying Machine Learning Techniques to Optimise the Division of Labour in Hybrid MT (ML4HMT-12 WS and Shared Task) [<strong>CNGL</strong> co-organiser] Mumbai, India<br />



Equipment Valued at Over €50,000 Funded by <strong>CNGL</strong><br />

Item Price Description<br />

No equipment valued over €50,000 was purchased by <strong>CNGL</strong> in <strong>2012</strong><br />

Invention and Software Disclosures Filed<br />

Date Track Title<br />

25/01/<strong>2012</strong> ILT A Measurement method for detecting changes in tone-of-voice (voice quality) from recorded speech signals<br />

25/01/<strong>2012</strong> ILT AlignRank: An Evidence Propagation Algorithm for Word Alignment<br />

12/03/<strong>2012</strong> LOC WorkFlow Recommender<br />

12/03/<strong>2012</strong> LOC LocConnect – Localisation Orchestration Framework<br />

12/03/<strong>2012</strong> LOC Localisation Knowledge Repository<br />

12/03/<strong>2012</strong> LOC XLIFF Phoenix<br />

12/03/<strong>2012</strong> LOC MT Mapper<br />

19/04/<strong>2012</strong> ILT IR Retrieval Model which combines the integrated Recommendation Results with IR retrieval results<br />

30/05/<strong>2012</strong> SF CAT Tool Instrumentation<br />

08/10/<strong>2012</strong> ILT Machine Translation Performance Predictor<br />

Patent Applications Submitted or Granted, and Licence Agreements Signed<br />

Date Title Application number Inventor Track Status<br />

30/05/<strong>2012</strong> Automatic Metadata Extraction from Multilingual Enterprise Content 61/656,499 Melike Sah DCM US Provisional<br />

Licensed Technologies<br />

Licensed To Technology Track<br />

Xcelerator Data Visualisation Dashboard ILT<br />

Xcelerator Data Health Estimator for Machine Translation ILT<br />

Xcelerator Predictive Performance Estimator for Machine Translation ILT<br />

Welocalize A System for Tracking and Analysing Translator Behaviour in an Instrumented Post-editing Environment ILT<br />

Spinout Companies Created<br />

Company Incorporation Date Registration # Website<br />

Emizar Customer Solutions Ltd. 8th November 2011 505776 www.emizar.com



Awards and Honours Received<br />

Name Award Body Award Type Date<br />

Prof. Josef van Genabith Dublin City University DCU President’s Research Award for Science and Engineering February <strong>2012</strong><br />

Dr. Martin Emms, Hector Franco-Penya International Conference on Pattern Recognition Applications and Methods (ICPRAM <strong>2012</strong>) Best Paper Award February <strong>2012</strong><br />

Joachim Wagner (<strong>CNGL</strong>/DCU), Dr. Jennifer Foster (NCLT/DCU), Rasul Samad Zadeh Kaljahi (NCLT/Symantec), Dr. Anton Bryl (Systransis AG, formerly <strong>CNGL</strong>), Dr. Joseph Le Roux (Université Paris 13, formerly NCLT/DCU) First Workshop on Syntactic Analysis of Non-Canonical Language (SANCL) Shared Task Win June <strong>2012</strong><br />

Ben Steichen Localisation Research Centre Best Thesis Award September <strong>2012</strong><br />

Liliana Mamani Sanchez, Dr. Carl Vogel 3rd IEEE International Conference on Cognitive Infocommunications Best Paper Award December <strong>2012</strong><br />

Debasis Ganguly Morpheme Extraction Task (MET) of FIRE <strong>2012</strong> Bengali (Best), Hindi (Second Best) December <strong>2012</strong><br />

Debasis Ganguly, Dr. Johannes Leveling, Dr. Gareth Jones Eighth Asia Information Retrieval Societies Conference (AIRS <strong>2012</strong>) Steering Committee AIRS’12 Best Poster Paper Award December <strong>2012</strong><br />

Media Coverage<br />

Date Media Outlet Event Headline Link<br />

05/01/2012 Techcentral.ie Influence of LRC in attracting Cetra European base to Limerick Cetra to grow Limerick Operation (translation services company attracted by third level research) http://www.techcentral.ie/article.aspx?id=18040<br />

05/01/2012 Siliconrepublic.com Influence of LRC in attracting Cetra European base to Limerick 20 new jobs as Cetra rolls into town http://www.siliconrepublic.com/careers-centre/item/25210-20-new-jobs-for-limerick-as/<br />

05/01/2012 Businessandfinance.ie Influence of LRC in attracting Cetra European base to Limerick 120 new jobs for Dublin and Limerick http://www.businessandfinance.ie/news/120newjobsfordublinandlimerick<br />

05/01/2012 Businessandleadership.com Influence of LRC in attracting Cetra European base to Limerick Cetra to locate European centre in Limerick creating 20 jobs http://www.businessandleadership.com/business/item/33541-cetra-to-locate-european/<br />

14/01/2012 Limerick Leader Influence of LRC in attracting Cetra European base to Limerick New Limerick firm translates into 20 positions Page 6<br />

14/01/2012 Limerick Post Influence of LRC in attracting Cetra European base to Limerick Translation services company to create 20 jobs Page 14<br />

24/01/2012 Siliconrepublic.com Launch of CNGL careers guide Students urged to consider a career in localisation http://www.siliconrepublic.com/careers-centre/item/25466-students-urged-to-consider/<br />

24/01/2012 Education Matters (www.educationmatters.ie) Launch of CNGL careers guide High demand for graduates in localisation http://www.educationmatters.ie/2012/01/24/high-demand-for-graduates-in-localisation/<br />

25/01/2012 Scoop It! Language Blog (www.scoop.it) Launch of CNGL careers guide CNGL Localisation Careers http://www.scoop.it/t/translation-and-localization/p/1050392099/cngl-localisation-careers<br />



APPENDIX 2: OUTPUTS<br />


25/01/2012 www.science.ie Launch of CNGL careers guide A world of opportunities in localisation http://www.science.ie/science-news/opportunities-in-localisation.html<br />

25/01/2012 Waterford Institute of Technology website (www.wit.ie) Launch of CNGL careers guide Graduates in high demand in Ireland’s 16,000-job localisation sector http://www.wit.ie/News/News/MainBody,48355,en.html<br />

26/01/2012 Lingport i18nblog Launch of CNGL careers guide CNGL Launches Localization Careers Guide http://i18nblog.com/2012/01/26/cngl-launches-localization-careers-guide/<br />

26/01/2012 Gradireland.com (Ireland’s official graduate jobs and careers website) Launch of CNGL careers guide Localisation – a growth area and a career opportunity http://gradireland.wordpress.com/2012/01/26/localisation-a-growth-area-and-a-career-opportunity/<br />

26/01/2012 www.mysciencecareer.ie Launch of CNGL careers guide Ireland becoming a global expert in localisation http://www.mysciencecareer.ie/resources/news-and-events/localisation-in-ireland<br />

01/02/2012 Education Matters ezine Launch of CNGL careers guide High demand for graduates in localisation area<br />

01/02/2012 Siliconrepublic.com All Ireland Linguistics Olympiad AILO fosters next generation of Irish computational linguists http://www.siliconrepublic.com/innovation/item/25584-skillsfeb/<br />

01/02/2012 egovmonitor.com Launch of CNGL careers guide “Ireland Is Recognised As A Leader In The Localisation And Global Services Sector But We Need To Do More” – Sherlock http://www.egovmonitor.com/node/46072<br />

06/02/2012 Evening Echo Launch of CNGL careers guide With technology and a second language you will be a professional in demand Page 33<br />

08/02/2012 Irish Independent Fostering foreign language skills Teaching languages at primary level will be a key to our economic future Page 15<br />

08/02/2012 Irish Independent website (www.independent.ie) Fostering foreign language skills Teaching languages at primary level will be a key to our economic future http://www.independent.ie/lifestyle/education/features/in-my-opinion-teaching-languages-at-primary-level-will-be-a-key-to-our-economic-future-3012676.html<br />

09/02/2012 Tipperary Star All Ireland Linguistics Olympiad All Ireland Linguistics Olympiad Page 15<br />

14/02/2012 Roscommon Herald All Ireland Linguistics Olympiad Budding Strokestown linguists seek to decode the languages of the world Page 53<br />

07/03/2012 Dublin City of Science website (www.dublinscience2012.ie) All Ireland Linguistics Olympiad All Ireland Linguistics Olympiad http://www.dublinscience2012.ie/2012/03/all-ireland-linguistics-olympiad/<br />

07/03/2012 Evening Echo All Ireland Linguistics Olympiad Students have strategy to solve problems<br />

09/03/2012 Céist website (www.ceist.ie) All Ireland Linguistics Olympiad All Ireland Linguistics Olympiad (AILO) http://www.ceist.ie/news_events/view_article.cfm?loadref=2&id=595<br />

13/02/2012 Techcentral.ie ComputeTY transition year programme Transition year students decode Web design http://www.techcentral.ie/article.aspx?id=18301&utm_source=TechCentral+newsletter&utm_campaign=4755350324-13_022_13_2012&utm_medium=email#ixzz1mGomGpL0<br />



14/03/2012 Sunday Business Post website (www.businesspost.ie) Xcelerator/DCU licence Startup of the day: Xcelerator http://www.businesspost.ie/#!story/Home/News/Startup+of+the+day%3A+Xcelerator/id/86478410-84f6-07f6-0e90-4e777652148<br />

18/03/2012 Sunday Business Post Xcelerator/DCU licence Startup of the day: Xcelerator<br />

22/03/2012 Guideline Magazine Next Generation Localisation Careers<br />

21/03/2012 Irish Examiner All Ireland Linguistics Olympiad<br />

21/03/2012 Irish Examiner Online All Ireland Linguistics Olympiad<br />

Graduates in high demand in Ireland’s localisation sector<br />

Pupils pit wits against language puzzles<br />

Pupils pit wits against language puzzles<br />

Cover & Page 3<br />

http://www.irishexaminer.com/ireland/pupils-pit-wits-against-language-puzzles-187781.html<br />

27/03/2012 Roscommon Herald All Ireland Linguistics Olympiad AILO Olympiad Page 55<br />

28/03/2012 Irish Independent Language Advocacy Adios Espanol – Quinn dumps languages in primary schools (Cara Greene comment) Page 17<br />

29/03/2012 South Tipp Today All Ireland Linguistics Olympiad Scoil Ruain student in Linguistics Olympiad Page 31<br />

29/03/2012 Tipperary Star All Ireland Linguistics Olympiad Scoil Ruain Student in Linguistics Olympiad Page SS 3<br />

30/03/2012 www.sam-xlation.de SAM Xlation GmbH tests KantanMT product of CNGL spinout Xcelerator Machine Translation Testing http://www.sam-xlation.de/index.php/de/aktuelles#MachineTranslationTesting<br />

02/04/2012 Education Magazine Next Generation Localisation Careers Graduates in high demand in Ireland’s localisation sector Pages 12-13<br />

22/04/2012 LANGTECHNEWS Innovation Voucher collaboration with Cipherion Translations Irish localisation company to add MT, crowd-sourcing and gamification http://langtechnews.hivefire.com/articles/146423/irish-localisation-company-to-add-mt-crowd-sourcin/<br />

24/04/2012 Irish Times Insight supplement Sign Language Machine Translation Lost in translation Page 13<br />

30/04/2012 Department of Jobs, Enterprise & Innovation website (http://www.enterprise.gov.ie) wripl winning pitch at Get Started Technology Venture Programme SFI-funded scientists head to Silicon Valley http://www.enterprise.gov.ie/News/Irish_researchers_secure_coveted_prize_of_trip_to_Silicon_Valley_.html<br />

30/04/2012 Techcentral.ie wripl winning pitch at Get Started Technology Venture Programme Irish researchers secure trip to Silicon Valley http://www.techcentral.ie/article.aspx?id=18832<br />

30/04/2012 TechCentral ezine wripl winning pitch at Get Started Technology Venture Programme Irish researchers secure trip to Silicon Valley<br />

30/04/2012 www.studentnews.ie wripl winning pitch at Get Started Technology Venture Programme Irish science researchers land key trip to Silicon Valley to meet technology chiefs http://www.studentnews.ie/irish-science-researchers-land-key-trip-to-silicon-valley-to-meet-technology-chiefs-5724<br />

April/May 2012 edition Multilingual Magazine Localisation standards The localization standards ecosystem (article by Dr David Filip, CNGL at UL)<br />



03/05/2012 Department of Jobs, Enterprise & Innovation website (http://www.enterprise.gov.ie) All Ireland Linguistics Olympiad Double Gold for Belfast Schools in All Ireland Linguistics Olympiad http://www.enterprise.gov.ie/News/Double_Gold_for_Belfast_Schools_in_All_Ireland_Linguistics_Olympiad.html<br />

03/05/2012 Roscommon Herald All Ireland Linguistics Olympiad All-Ireland Linguistics Final Page SS 6<br />

12/05/2012 South Belfast News All Ireland Linguistics Olympiad Olympiad gold for Wellington team Page 17<br />

17/05/2012 Northern Standard All Ireland Linguistics Olympiad Photo: Mrs Geraldine Kelly making a presentation to Zoe Vance for her success in the Linguistics Olympiad © Rory Geary/Northern Standard Page 24<br />

24/05/2012 Department of Jobs, Enterprise & Innovation website (http://www.enterprise.gov.ie) Google parsing challenge DCU-Paris 13 Team excels in Google parsing challenge http://www.enterprise.gov.ie/News/DCU-Paris_13_Team_excels_in_Google_Parsing_Challenge.html<br />

31/05/2012 Ballincollig Today All Ireland Linguistics Olympiad Photo: Among the winners at the Ballincollig Community School’s Annual Awards Night was Grainne Hutchinson (Ovens), bronze award at All Ireland Linguistics Olympiad Page 8<br />

31/05/2012 Mid Cork Today All Ireland Linguistics Olympiad Photo: Among the winners at the Ballincollig Community School’s Annual Awards Night was Grainne Hutchinson (Ovens), bronze award at All Ireland Linguistics Olympiad Page 8<br />

13/06/2012 Silicon Republic (www.siliconrepublic.com) LRC Summer School Irish mobile app developers urged to localise their apps http://www.siliconrepublic.com/new-media/item/27729-irish-mobile-app-developers/<br />

13/06/2012 Techcentral.ie LRC Summer School Irish mobile app developers must think global, says LRC http://www.techcentral.ie/19122/irish-mobile-app-developers-must-think-global-says-lrc#ixzz1xfU9YK00<br />

13/06/2012 TechCentral ezine LRC Summer School Irish mobile app developers must think global, says LRC<br />

13/06/2012 Department of Jobs, Enterprise & Innovation website (http://www.enterprise.gov.ie) LRC Summer School Irish mobile app developers must think global, says Localisation Research Centre http://www.enterprise.gov.ie/News/Irish_mobile_app_developers_must_think_global_says_Localisation_Research_Centre.html<br />

13/06/2012 Polish Interpreting (www.polish-interpreting.co.uk) LRC Summer School Irish mobile app developers urged to localise … http://polish-interpreting.co.uk/2012/06/13/irish-mobile-app-developers-urged-to-localise/<br />

14/06/2012 Silicon Republic (www.siliconrepublic.com) W3C Multilingual Web Workshop Internet experts in Dublin to talk about multilingual web http://www.siliconrepublic.com/innovation/item/27759-internet-experts-in-dublin/<br />

14/06/2012 Department of Jobs, Enterprise & Innovation website (http://www.enterprise.gov.ie) W3C Multilingual Web Workshop CNGL researchers at heart of efforts to facilitate Internationalisation of Web http://www.enterprise.gov.ie/News/CNGL_researchers_at_heart_of_efforts_to_facilitate_Internationalisation_of_Web.html<br />

17/06/2012 Sunday Business Post Xcelerator/DCU collaboration Translation is finally brought up to speed<br />



30/06/2012 Limerick Leader – County Edition Influence of LRC in attracting Cetra European base to Limerick Cetra Ireland’s new office Page 20<br />

30/06/2012 Limerick Leader Influence of LRC in attracting Cetra European base to Limerick Cetra Ireland’s new office Page 20<br />

30/06/2012 Limerick Leader West Edition Influence of LRC in attracting Cetra European base to Limerick Cetra Ireland’s new office Page 20<br />

11/07/2012 Multilingual E-Zine Xcelerator/DCU collaboration Commercialisation Fund Project http://www.multilingual.com/mlNewsArchiveDetail.php?id=2521<br />

02/08/2012 East Cork Journal International Linguistics Olympiad Cork participating in International Linguistics Olympiad Page 16<br />

03/08/2012 Silicon Republic (www.siliconrepublic.com) International Linguistics Olympiad Four Irish students in Slovenia to battle it out in Linguistics Olympiad http://www.siliconrepublic.com/innovation/item/28660-four-irish-students-in/<br />

03/08/2012 World Irish (www.worldirish.com) International Linguistics Olympiad Four Irish Students Compete in International Linguistics Olympiad in Slovenia http://m.worldirish.com/listening-post/view/four-irish-students-compete-in-international-linguistics-olympiad-in-slovenia-1641<br />

10/09/2012 Silicon Republic (www.siliconrepublic.com) Innovation Showcase CNGL Localisation Innovation Showcase 2012 http://www.siliconrepublic.com/events/event/2859-cngl-localisation-in<br />

11/09/2012 Silicon Republic (www.siliconrepublic.com) LRC Conference Localisation conference in Limerick to focus on social trends http://www.siliconrepublic.com/innovation/item/29199-localisation-conference-in/<br />

13/09/2012 www.newswhip.com LRC Conference Localisation conference in Limerick to focus on social trends http://www.newswhip.com/MoreInfo/Localisation-conference-in-Limerick-to-f/7480567<br />

21/09/2012 Irish Independent Language Advocacy Only one in 25 primary pupils learn a language<br />

21/09/2012 Limerick Post LRC Conference Twitter trends to aid translation<br />

11<br />

Page 86<br />

21/09/2012 Galway City Tribune KantanMT spinout recruitment drive Cloud-based operation seeks people ‘hungry for a challenge’ Page 10<br />

25/09/2012 Tech Central (www.techcentral.ie) META-NET White Paper Most European languages not ready for ‘digital age’ http://www.techcentral.ie/article.aspx?id=19947<br />

25/09/2012 Silicon Republic (www.siliconrepublic.com) Qun Liu joins CNGL Prof Qun Liu, Professor Of Machine Translation http://www.siliconrepublic.com/careers/appointments/984-prof-qun-liu-centre-for<br />

26/09/2012 The Sociable (http://sociable.co) META-NET White Paper Most European languages “unlikely to survive in the digital age” http://sociable.co/technology/most-european-languages-unlikely-to-survive-in-the-digital-age/<br />

26/09/2012 Multilingual E-Zine Qun Liu joins CNGL Centre for Next Generation Localisation appoints Professor of Machine Translation http://www.multilingual.com/mlNewsArchiveDetail.php?id=2526#8441<br />

26/09/2012 Gaelport META-NET White Paper Bagairt don Ghaeilge sa ré dhigiteach (A threat to Irish in the digital age) http://www.gaelport.com/nuacht?NewsItemID=8677<br />



27/09/2012 Radio na Gaeltachta META-NET White Paper Cormac ag a cuig http://www.rte.ie/radio/radioplayer/rteradioweb.html#!rii=17%3A3402740%3A11598%3A27%2D09%2D2012%3A<br />

28/09/2012 Newstalk – Splanc META-NET White Paper Agallamh le Ailbhe Ní Chasaide (Interview with Ailbhe Ní Chasaide) http://www.newstalk.ie/programmes/all/splanc/<br />

30/09/2012 The Sunday Times META-NET White Paper Briefing Digital Irish: Lost for Words Page 16<br />

October/November 2012 Issue Multilingual Magazine Localisation Localization for the long tail: Part 1 (article by Dr David Filip, CNGL at UL)<br />

03/10/2012 Siliconrepublic.com META-NET White Paper Irish language at risk of digital extinction, research shows http://www.siliconrepublic.com/innovation/item/29483-irish-language-at-risk-of/<br />

19/11/2012 South East Radio Cipherion Translations Mark Rodgers of Cipherion Translations on fruits of collaboration with CNGL at DCU (17 mins 45 secs) https://www.youtube.com/watch?v=zEhEPaPzZXU<br />

December 2012 Issue Multilingual Magazine Localisation Localization for the long tail: Part 2 (article by Dr David Filip, CNGL at UL)<br />

02/12/2012 Sunday Business Post Emizar Emizar<br />

19/12/2012 Multilingual E-Zine LORG parser LORG natural language parser http://www.multilingual.com/mlNewsArchiveDetail.php?id=2532#8521<br />

25/12/2012 Antrim Times All Ireland Linguistics Olympiad Successful year for Antrim Grammar Page 6<br />


Centre for Next Generation Localisation<br />

Dublin City University<br />

Dublin 9, Ireland<br />

Tel: +353-1-700 6700<br />

Fax: +353-1-700 6702<br />

Email: info@cngl.ie<br />

www.cngl.ie
