… predicted. Considering for clarity the bigram case,[6] given G(.) the language model has the terms w_i, w_{i-1}, G(w_i) and G(w_{i-1}) available to it. The probability estimate can be decomposed as follows:

\[
P_{\text{class}'}(w_i \mid w_{i-1}) = P(w_i \mid G(w_i), G(w_{i-1}), w_{i-1}) \times P(G(w_i) \mid G(w_{i-1}), w_{i-1}) \tag{14.7}
\]

It is assumed that P(w_i | G(w_i), G(w_{i-1}), w_{i-1}) is independent of G(w_{i-1}) and w_{i-1}, and that P(G(w_i) | G(w_{i-1}), w_{i-1}) is independent of w_{i-1}, resulting in the model:

\[
P_{\text{class}}(w_i \mid w_{i-1}) = P(w_i \mid G(w_i)) \times P(G(w_i) \mid G(w_{i-1})) \tag{14.8}
\]

Almost all reported class n-gram work using statistically-found classes is based on clustering algorithms which optimise G(.) on the basis of bigram training-set likelihood, even if the class map is to be used with longer-context models. It is interesting to note, however, that this approximation appears to work well, suggesting that the class maps found are in some respects "general" and capture features of natural language which apply irrespective of the context length used when finding them.

14.2 Statistically-derived Class Maps

An obvious question that arises is how to compute or otherwise obtain a class map for use in a language model. This section discusses one strategy which has been used successfully.

Methods of statistical class map construction seek to maximise the likelihood of the training text given the class model by making iterative controlled changes to an initial class map; to make this problem more computationally feasible, they typically use a deterministic map.

14.2.1 Word exchange algorithm

[Kneser and Ney 1993][7] describes an algorithm which builds a class map by starting from some initial guess at a solution and then iteratively searching for changes that improve the existing class map. This is repeated until some minimum change threshold has been reached or a chosen number of iterations have been performed. The initial guess at a class map is typically chosen by a simple method, such as randomly distributing words amongst classes, or placing all words in the first class except for the most frequent words, which are put into singleton classes. Potential moves are then evaluated, and those which most increase the likelihood of the training text are applied to the class map. The algorithm is described in detail below, and is implemented in the HTK tool Cluster.

Let W be the training text, a list of words (w_1, w_2, w_3, ...), and let 𝒲 be the set of all words occurring in W. From equation 14.1 it follows that:

\[
P_{\text{class}}(\mathbf{W}) = \prod_{x,y \in \mathcal{W}} P_{\text{class}}(x \mid y)^{C(x,y)} \tag{14.9}
\]

where (x, y) is some word pair 'x' preceded by 'y', and C(x, y) is the number of times that the word pair 'y x' occurs in the list W.

In general, evaluating equation 14.9 will lead to problematically small values, so logarithms are used:

\[
\log P_{\text{class}}(\mathbf{W}) = \sum_{x,y \in \mathcal{W}} C(x,y) \log P_{\text{class}}(x \mid y) \tag{14.10}
\]

[6] By convention, unigram refers to a 1-gram, bigram indicates a 2-gram and trigram is a 3-gram; there is no standard term for a 4-gram.
[7] R. Kneser and H. Ney, "Improved Clustering Techniques for Class-Based Statistical Language Modelling", Proceedings of the European Conference on Speech Communication and Technology, 1993, pp. 973-976.
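To make the decomposition of equation 14.8 and the log-likelihood sum of equation 14.10 concrete, the following sketch applies them to a toy class map with hand-picked probability tables. It is purely illustrative Python rather than HTK code, and every name, word and number in it is invented for the example.

```python
# Minimal sketch (not HTK code) of the class bigram model of equation 14.8 and the
# log-likelihood of equation 14.10, using a hypothetical toy class map and
# hand-picked probability tables.
import math

# Deterministic class map G(.)
G = {"the": "DET", "a": "DET", "cat": "NOUN", "dog": "NOUN", "sat": "VERB", "ran": "VERB"}

# P(w | G(w)): probability of a word given its own class
p_word_given_class = {
    "the": 0.7, "a": 0.3,      # class DET
    "cat": 0.5, "dog": 0.5,    # class NOUN
    "sat": 0.6, "ran": 0.4,    # class VERB
}

# P(G(w_i) | G(w_{i-1})): class transition probabilities
p_class_given_class = {
    ("DET", "NOUN"): 0.9, ("DET", "VERB"): 0.1,
    ("NOUN", "VERB"): 0.8, ("NOUN", "DET"): 0.2,
    ("VERB", "DET"): 0.7, ("VERB", "NOUN"): 0.3,
}

def p_class_bigram(w_cur, w_prev):
    """Equation 14.8: P_class(w_i | w_{i-1}) = P(w_i | G(w_i)) * P(G(w_i) | G(w_{i-1}))."""
    return p_word_given_class[w_cur] * p_class_given_class[(G[w_prev], G[w_cur])]

def log_likelihood(words):
    """Equation 14.10: sum the log probability of every bigram in the word list W."""
    return sum(math.log(p_class_bigram(cur, prev)) for prev, cur in zip(words, words[1:]))

print(p_class_bigram("cat", "the"))               # 0.5 * 0.9 = 0.45
print(log_likelihood(["the", "cat", "sat"]))      # log 0.45 + log 0.48 = approx. -1.53
```

Summing over token bigrams, as above, is equivalent to the count-weighted sum over word pairs in equation 14.10, and only one P(w | G(w)) entry per word plus one transition per class pair is needed, which illustrates the reduction in parameters that motivates class n-gram models.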

Given the definition of a class n-gram model in equation 14.8, the maximum likelihood bigram probability estimate of a word is:

\[
P_{\text{class}}(w_i \mid w_{i-1}) = \frac{C(w_i)}{C(G(w_i))} \times \frac{C(G(w_i), G(w_{i-1}))}{C(G(w_{i-1}))} \tag{14.11}
\]

where C(w) is the number of times that the word 'w' occurs in the list W, and C(G(w)) is the number of times that the class G(w) occurs in the list resulting from applying G(.) to each entry of W;[8] similarly, C(G(w_x), G(w_y)) is the count of the class pair 'G(w_y) G(w_x)' in that resultant list.

Substituting equation 14.11 into equation 14.10 and then rearranging gives:

\[
\begin{aligned}
\log P_{\text{class}}(\mathbf{W})
  &= \sum_{x,y \in \mathcal{W}} C(x,y) \log \left( \frac{C(x)}{C(G(x))} \times \frac{C(G(x), G(y))}{C(G(y))} \right) \\
  &= \sum_{x,y \in \mathcal{W}} C(x,y) \log \frac{C(x)}{C(G(x))}
   + \sum_{x,y \in \mathcal{W}} C(x,y) \log \frac{C(G(x), G(y))}{C(G(y))} \\
  &= \sum_{x \in \mathcal{W}} C(x) \log \frac{C(x)}{C(G(x))}
   + \sum_{g,h \in \mathcal{G}} C(g,h) \log \frac{C(g,h)}{C(h)} \\
  &= \sum_{x \in \mathcal{W}} C(x) \log C(x) - \sum_{x \in \mathcal{W}} C(x) \log C(G(x))
   + \sum_{g,h \in \mathcal{G}} C(g,h) \log C(g,h) - \sum_{g \in \mathcal{G}} C(g) \log C(g) \\
  &= \sum_{x \in \mathcal{W}} C(x) \log C(x)
   + \sum_{g,h \in \mathcal{G}} C(g,h) \log C(g,h)
   - 2 \sum_{g \in \mathcal{G}} C(g) \log C(g)
\end{aligned} \tag{14.12}
\]

where (g, h) is some class sequence 'h g'.

Note that the first of the three terms in the final stage of equation 14.12, Σ_{x∈𝒲} C(x) log C(x), is independent of the class map function G(.), so it need not be considered when optimising G(.). The value a class map must seek to maximise, F_MC, can now be defined:

\[
F_{\text{MC}} = \sum_{g,h \in \mathcal{G}} C(g,h) \log C(g,h) \;-\; 2 \sum_{g \in \mathcal{G}} C(g) \log C(g) \tag{14.13}
\]

A fixed number of classes must be decided before running the algorithm, which can now be formally defined:

1. Initialise: ∀w ∈ 𝒲 : G(w) = 1
   Set up the class map so that all words are in the first class and all other classes are empty (or initialise using some other scheme).

2. Iterate: ∀i ∈ {1 ... n} ∧ ¬s
   For a given number of iterations 1 ... n, or until some stop criterion s is fulfilled:

   (a) Iterate: ∀w ∈ 𝒲
       For each word w in the vocabulary:

       i. Iterate: ∀c ∈ 𝒢
          For each class c:

          A. Move word w to class c, remembering its previous class.
          B. Calculate the change in F_MC for this move.
          C. Move word w back to its previous class.

       ii. Move word w to the class which increased F_MC the most, or do not move it if no move increased F_MC.

The initialisation scheme given here in step 1 represents a word unigram language model, making no assumptions about which words should belong in which class.[9] The algorithm is greedy, so it is not guaranteed to find the class map which maximises the training text likelihood; it may converge to a local maximum of F_MC.

[8] That is, C(G(w)) = Σ_{x:G(x)=G(w)} C(x).
[9] Given this initialisation, however, the first (|𝒢| − 1) moves will simply place words into the empty classes, since the class map which maximises F_MC is the one which places each word into a singleton class.
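The numbered procedure above can be sketched in a few lines. The following is a deliberately simple Python illustration, not the C implementation in the HTK tool Cluster: it recomputes F_MC in full for every candidate move, whereas step 2(a)iB only requires the change in F_MC, and the function names and toy text are invented for the example.

```python
# Illustrative sketch of the word exchange algorithm driven by F_MC (equation 14.13).
# Recomputing F_MC from scratch for every trial move is slow but keeps the code short.
import math
from collections import Counter

def f_mc(words, class_map):
    """Equation 14.13: sum_{g,h} C(g,h) log C(g,h) - 2 * sum_g C(g) log C(g)."""
    class_seq = [class_map[w] for w in words]                   # apply G(.) to every entry of W
    class_count = Counter(class_seq)                            # C(g)
    class_pair_count = Counter(zip(class_seq, class_seq[1:]))   # C(g, h)
    pair_term = sum(c * math.log(c) for c in class_pair_count.values())
    class_term = sum(c * math.log(c) for c in class_count.values())
    return pair_term - 2.0 * class_term

def exchange_words(words, num_classes, max_iterations=10):
    """Greedy word exchange: move each word to the class that most increases F_MC."""
    vocab = sorted(set(words))
    class_map = {w: 0 for w in vocab}                  # step 1: all words in the first class
    for _ in range(max_iterations):                    # step 2: bounded number of iterations
        any_moved = False
        for w in vocab:                                # step 2(a): each word in turn
            original = class_map[w]
            best_class, best_score = original, f_mc(words, class_map)
            for c in range(num_classes):               # step 2(a)i: try every class
                class_map[w] = c                       # A: tentatively move w to class c
                score = f_mc(words, class_map)         # B: evaluate F_MC after the move
                if score > best_score:
                    best_class, best_score = c, score
                class_map[w] = original                # C: move w back before the next trial
            class_map[w] = best_class                  # step 2(a)ii: keep the best move, if any
            any_moved = any_moved or (best_class != original)
        if not any_moved:                              # stop criterion s: no word moved
            break
    return class_map

text = "the cat sat on the mat the dog ran to the mat".split()
print(exchange_words(text, num_classes=3))
```

Because moving a single word only alters the counts involving its old and new classes, a practical implementation evaluates just the change in F_MC for each candidate move, as step 2(a)iB indicates, rather than recomputing the whole sum; that is what makes the search feasible for large vocabularies.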
