Modeling Letter to Phoneme Conversion as a Phrase Based ...

The best phoneme sequence among the candidate translations can be represented as:

    \hat{f} = \arg\max_{f} \, \Pr(f \mid l)    (1)

We model the problem of letter to phoneme conversion based on the noisy channel model. Reformulating the above equation using Bayes' rule:

    f_{best} = \arg\max_{f} \, p(l \mid f) \, p(f)    (2)

This formulation allows for a phoneme n-gram model p(f) and a transcription model p(l | f). Given a sequence of letters l, the argmax function is a search function that outputs the best phonemic sequence.

During the decoding phase, the letter sequence l is segmented into a sequence of J letter segments \bar{l}_1^J. Each segment \bar{l}_j in \bar{l}_1^J is transcribed into a phoneme segment \bar{f}_j. Thus the best phoneme sequence is generated from left to right in the form of partial translations. Using an n-gram model p_{LM} as the language model, we have:

    f_{best} = \arg\max_{f} \, p(l \mid f) \, p_{LM}(f)    (3)

with p(l | f) written as

    p(\bar{l}_1^J \mid \bar{f}_1^J) = \prod_{j=1}^{J} \Phi(\bar{l}_j \mid \bar{f}_j)    (4)

From the above equations, the best phoneme sequence is obtained from the product of the transcription model probabilities and the language model probabilities, combined according to their respective weights. The method for obtaining the transcription probabilities is described briefly in the next section. Determining the best weights is necessary for obtaining the right phoneme sequence; they can be estimated in the following manner.

The posterior probability Pr(f | l) can be directly modeled using a log-linear model. In this model, we have a set of M feature functions h_m(f, l), m = 1, ..., M. For each feature function there exists a weight or model parameter \lambda_m, m = 1, ..., M. Thus the posterior probability becomes:

    p(f \mid l) = p_{\lambda_1^M}(f \mid l) = \frac{\exp\left(\sum_{m=1}^{M} \lambda_m h_m(f, l)\right)}{\sum_{f'} \exp\left(\sum_{m=1}^{M} \lambda_m h_m(f', l)\right)}    (5)

This modeling entails finding suitable model parameters or weights which reflect the properties of our task. We adopt the minimum error rate training criterion of Och (2003) for optimising the parameters of the model; the details of the solution and the proof of convergence are given in Och (2003). The models' weights used for the L2P task are obtained from this training.

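To make the search in equations (1)-(5) concrete, here is a minimal sketch of log-linear scoring over candidate phoneme sequences. Everything in it is illustrative: the candidate set, feature values, and weights are invented, and a real decoder scores partial hypotheses left to right rather than enumerating whole candidates.

```python
import math

def log_linear_score(features, weights):
    """Sum_m lambda_m * h_m(f, l): the unnormalised log posterior of
    equation (5). The argmax over candidates is unaffected by the
    normalising denominator, so it can be dropped during search."""
    return sum(w * h for w, h in zip(weights, features))

def best_phoneme_sequence(candidates, weights):
    """f_best = argmax over candidate phoneme sequences, equation (1)."""
    return max(candidates, key=lambda c: log_linear_score(c["features"], weights))

# Toy candidates for one input word; the two feature values stand in for
# log p(l | f) (transcription model, equation (4)) and log p_LM(f)
# (phoneme n-gram model). All numbers here are made up.
candidates = [
    {"phonemes": "F EY Z",    "features": [math.log(0.02),  math.log(0.01)]},
    {"phonemes": "P HH EY Z", "features": [math.log(0.001), math.log(0.002)]},
]
weights = [1.0, 1.0]  # lambda_1, lambda_2; tuned by MERT (Och, 2003) in the paper

print(best_phoneme_sequence(candidates, weights)["phonemes"])  # -> F EY Z
```
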
4 Letter to Phoneme Alignment

We used GIZA++ (Och and Ney, 2003), an open source toolkit, for aligning the letters with the phonemes in the training data sets. In the context of SMT, say English-Spanish, the parallel corpus is aligned bidirectionally to obtain two alignments. The IBM models give only one-to-one alignments between words in a sentence pair, so GIZA++ uses some heuristics to refine the alignments (Och and Ney, 2003).

In our input data, the source side consists of grapheme (or letter) sequences and the target side consists of phoneme sequences. Every letter or grapheme is treated as a single 'word' in the GIZA++ input. The transcription probabilities can then be easily learnt from the alignments induced by GIZA++, using a scoring function (Koehn et al., 2003). Figure 1 shows the alignments induced by GIZA++ for the example words mentioned by Jiampojamarn et al. (2007). In this figure, we only show the alignments from graphemes to phonemes.

[Figure 1: Example alignments from GIZA++]

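As an illustration of how such a scoring function can turn alignments into the transcription probabilities \Phi of equation (4), the sketch below uses simple relative-frequency estimation in the style of Koehn et al. (2003); the segment pairs are invented stand-ins for pairs extracted from the GIZA++ alignments.

```python
from collections import Counter, defaultdict

def transcription_probs(segment_pairs):
    """Relative-frequency estimate of Phi(l_seg | f_seg), equation (4),
    in the style of phrase-based SMT scoring (Koehn et al., 2003):
        Phi(l | f) = count(l, f) / count(f)
    """
    pair_counts = Counter(segment_pairs)
    f_totals = Counter(f_seg for _, f_seg in segment_pairs)
    probs = defaultdict(dict)
    for (l_seg, f_seg), n in pair_counts.items():
        probs[f_seg][l_seg] = n / f_totals[f_seg]
    return dict(probs)

# Invented segment pairs, as if extracted from GIZA++ alignments in which
# every letter and every phoneme was a separate "word":
pairs = [("p h", "F"), ("p h", "F"), ("o n e", "OW N"), ("s", "Z")]
probs = transcription_probs(pairs)
print(probs["F"])     # -> {'p h': 1.0}
print(probs["OW N"])  # -> {'o n e': 1.0}
```
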
5 Evaluation

We have evaluated our models on the English CMUDict, French Brulex, German Celex and Dutch Celex speech dictionaries. These dictionaries are available for download on the website of the PRONALSYL Letter to Phoneme Conversion Challenge.[1]

[1] http://www.pascal-network.org/Challenges/PRONALSYL/
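The excerpt does not state the metric, but letter-to-phoneme systems are conventionally scored by word accuracy, the fraction of test words whose entire predicted pronunciation matches the dictionary entry. Below is a minimal sketch under that assumption, with a hypothetical predict function standing in for the trained decoder.

```python
def word_accuracy(test_entries, predict):
    """Fraction of test words whose full predicted phoneme sequence
    exactly matches the reference pronunciation."""
    hits = sum(1 for word, ref in test_entries if predict(word) == ref)
    return hits / len(test_entries)

# Hypothetical usage with a CMUDict-style slice; `predict` would wrap
# the trained phrase-based decoder.
test = [("phase", "F EY Z"), ("phone", "F OW N")]
predict = {"phase": "F EY Z", "phone": "F OW N"}.get
print(word_accuracy(test, predict))  # -> 1.0
```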

References

Susan Bartlett, Grzegorz Kondrak, and Colin Cherry. 2008. Automatic syllabification with structured SVMs for letter-to-phoneme conversion. In Proceedings of ACL-08: HLT, pages 568–576, Columbus, Ohio, June. ACL.

Max Bisani and Hermann Ney. 2002. Investigations on joint-multigram models for grapheme-to-phoneme conversion. In International Conference on Spoken Language Processing, pages 105–108, Denver, CO, USA, September.

A. W. Black, K. Lenzo, and V. Pagel. 1998. Issues in building general letter to sound rules. In The Third ESCA/COCOSDA Workshop (ETRW) on Speech Synthesis. ISCA.

M. Collins. 2002. Discriminative training methods for hidden Markov models: theory and experiments with perceptron algorithms. In Proceedings of the ACL-02 Conference on EMNLP, Volume 10, pages 1–8. ACL, Morristown, NJ, USA.

K. Crammer and Y. Singer. 2003. Ultraconservative online algorithms for multiclass problems. The Journal of Machine Learning Research, 3:951–991.

W. Daelemans and A. van den Bosch. 1997. Language-independent data-oriented grapheme-to-phoneme conversion. Progress in Speech Synthesis.

R. I. Damper, Y. Marchand, J. D. Marseters, and A. Bazin. 2004. Aligning letters and phonemes for speech synthesis. In Fifth ISCA Workshop on Speech Synthesis. ISCA.

Sittichai Jiampojamarn, Grzegorz Kondrak, and Tarek Sherif. 2007. Applying many-to-many alignments and hidden Markov models to letter-to-phoneme conversion. In HLT 2007: The Conference of the NAACL; Proceedings of the Main Conference, pages 372–379, Rochester, New York, April. ACL.

Sittichai Jiampojamarn, Colin Cherry, and Grzegorz Kondrak. 2008. Joint processing and discriminative training for letter-to-phoneme conversion. In Proceedings of ACL-08: HLT, pages 905–913, Columbus, Ohio, June. ACL.

P. Koehn, F. J. Och, and D. Marcu. 2003. Statistical phrase-based translation. In Proceedings of the 2003 Conference of the NAACL: HLT, Volume 1, pages 48–54. ACL, Morristown, NJ, USA.

P. Koehn, H. Hoang, A. Birch, C. Callison-Burch, M. Federico, N. Bertoldi, B. Cowan, W. Shen, C. Moran, R. Zens, et al. 2007. Moses: open source toolkit for statistical machine translation. In ACL, volume 45, page 2.

J. Kominek and A. W. Black. 2006. Learning pronunciation dictionaries: language complexity and word selection strategies. In HLT-NAACL, pages 232–239. ACL, Morristown, NJ, USA.

Y. Marchand and R. I. Damper. 2000. A multistrategy approach to improving pronunciation by analogy. Computational Linguistics, 26(2):195–219.

F. J. Och and H. Ney. 2003. A systematic comparison of various statistical alignment models. Computational Linguistics, 29(1):19–51.

F. J. Och. 2003. Minimum error rate training in statistical machine translation. In Proceedings of the 41st Annual Meeting of the ACL, Volume 1, pages 160–167. ACL, Morristown, NJ, USA.

J. Schroeter, A. Conkie, A. Syrdal, M. Beutnagel, M. Jilka, V. Strom, Y. J. Kim, H. G. Kang, and D. Kapilow. 2002. A perspective on the next challenges for TTS research. In IEEE 2002 Workshop on Speech Synthesis.

Tarek Sherif and Grzegorz Kondrak. 2007. Substring-based transliteration. In Proceedings of the 45th Annual Meeting of the Association for Computational Linguistics, pages 944–951, Prague, Czech Republic, June. ACL.

A. Stolcke. 2002. SRILM – an extensible language modeling toolkit.

P. Taylor. 2005. Hidden Markov models for grapheme to phoneme conversion. In Ninth European Conference on Speech Communication and Technology. ISCA.

K. Toutanova and R. C. Moore. 2002. Pronunciation modeling for improved spelling correction. In Proceedings of the 40th Annual Meeting of ACL, pages 144–151.

A. van den Bosch and S. Canisius. 2006. Improved morpho-phonological sequence processing with constraint satisfaction inference. In Proceedings of the Eighth Meeting of the ACL-SIGPHON at HLT-NAACL, pages 41–49.

A. van den Bosch and W. Daelemans. 1998. Do not forget: full memory in memory-based learning of word pronunciation. In Proceedings of NeMLaP3/CoNLL98, pages 195–204.

R. Zens and H. Ney. 2004. Improvements in phrase-based statistical machine translation. In HLT-NAACL, pages 257–264, Boston, MA, May.
