Finite State Transducer mechanisms in speech recognition

Daniel Povey

Spoiler

◮ “New” part of this talk is: a “perfect” lattice generation algorithm.
◮ Informally, when we generate a lattice for an utterance, we want it to:
  ◮ Contain all sufficiently likely word sequences
  ◮ Have the “correct” alignments and likelihoods for such sequences
  ◮ Not be too large due to too-unlikely sequences or duplicate alignments
◮ Current lattice generation algorithms generally fall into two categories:
  ◮ Store one traceback for each state → quick, but approximate.
  ◮ Store N tracebacks → fairly exact, but slow.
◮ Our method stores one traceback, but is exact.


Outline

Introduction to Weighted Finite State Transducers (WFSTs)
  Finite State Acceptors (FSAs)
  Weighted Finite State Acceptors (WFSAs)
  Weighted Finite State Transducers (WFSTs)
WFSTs in speech recognition
  The “standard” recipe
  The Kaldi recipe
  Lattice generation


A less trivial FSA

[Diagram: FSA with arc a from state 0 to state 1, a self-loop a on state 1, and arc b from state 1 to final state 2]

◮ The previous example doesn’t show the power of FSAs because we could represent the set of strings finitely.
◮ This example represents the infinite set {ab, aab, aaab, ...}
◮ Note: a string is “accepted” (included in the set) if:
  ◮ There is a path with that sequence of symbols on it.
  ◮ That path is “successful” (starts at an initial state, ends at a final state).
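To make the acceptance rule concrete, here is a minimal Python sketch (mine, not from the talk): the arc table mirrors the diagram above, and accepts() tracks the set of reachable states, so it also handles nondeterminism.

```python
# Minimal sketch of FSA acceptance; the arc table encodes the FSA above:
# 0 --a--> 1, a self-loop a on 1, and 1 --b--> 2 (final).
arcs = {
    (0, 'a'): {1},
    (1, 'a'): {1},   # self-loop
    (1, 'b'): {2},
}
start, finals = 0, {2}

def accepts(s):
    # Track the set of states reachable by the symbols read so far.
    states = {start}
    for sym in s:
        states = set().union(*(arcs.get((q, sym), set()) for q in states))
    return bool(states & finals)

assert accepts('ab') and accepts('aaab') and not accepts('ba')
```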


The ε (epsilon) symbol

[Diagram: FSA with arc a from state 0 to final state 1, and an ε arc from state 1 back to state 0]

◮ The symbol ε has a special meaning in FSAs (and FSTs)
◮ It means “no symbol is there”.
◮ This example represents the set of strings {a, aa, aaa, ...}
◮ If ε were treated as a normal symbol, this would be {a, aεa, aεaεa, ...}
◮ In text form, ε is sometimes written as <eps>
◮ Toolkits implementing FSAs/FSTs generally assume <eps> is the symbol numbered zero
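A sketch of how acceptance can handle ε (my illustration, with EPS standing in for the symbol numbered zero): expand the current state set by its ε-closure before and after each real symbol.

```python
EPS = 0   # stand-in for "<eps>", conventionally symbol id 0

# The FSA above: a from 0 to final state 1, epsilon from 1 back to 0.
arcs = {
    (0, 'a'): {1},
    (1, EPS): {0},
}
start, finals = 0, {1}

def eps_closure(states):
    # All states reachable from `states` via epsilon arcs alone.
    stack, seen = list(states), set(states)
    while stack:
        q = stack.pop()
        for r in arcs.get((q, EPS), ()):
            if r not in seen:
                seen.add(r)
                stack.append(r)
    return seen

def accepts(s):
    states = eps_closure({start})
    for sym in s:
        nxt = set().union(*(arcs.get((q, sym), set()) for q in states))
        states = eps_closure(nxt)
    return bool(states & finals)

assert accepts('a') and accepts('aaa') and not accepts('')
```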


Deterministic FSAs

[Diagram: left, a nondeterministic FSA in which state 0 has two arcs labeled a (to states 1 and 3), with b from 1 to 2 and c from 3 to 4; right, an equivalent deterministic FSA with a from 0 to 1, then b from 1 to 2 and c from 1 to 3]

◮ The right-hand FSA above is deterministic
◮ Means: no state has two outgoing arcs with the same label
◮ The classical definition of “deterministic” also forbids ε arcs.
◮ There is a definition that allows ε (Mohri/AT&T/OpenFst).
◮ Working out whether a given string (e.g. ab) is accepted is more efficient in deterministic FSAs.


Determinizing FSAs

[Diagram: the same pair of FSAs as on the previous slide]

◮ There is a fairly easy algorithm to determinize FSAs
◮ Each state in the determinized FSA corresponds to a set of states in the original.
◮ The algorithm starts from the start state and uses a queue... (see the sketch below)
◮ In this example, states 1 and 3 get combined as state 1.
◮ The classical approach has to treat ε specially.
◮ (Mohri’s version simplifies things, making ε removal a separate stage).
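Below is a sketch of that subset construction for the unweighted, ε-free case (my code, not a toolkit implementation): each determinized state is a frozenset of original states, and a queue drives the construction from the start state.

```python
from collections import deque

def determinize(arcs, start, finals):
    # arcs: dict mapping (state, symbol) -> set of next states.
    start_set = frozenset({start})
    det_arcs, det_finals = {}, set()
    queue, seen = deque([start_set]), {start_set}
    while queue:
        subset = queue.popleft()
        if subset & finals:
            det_finals.add(subset)
        # Group all outgoing arcs of the subset by symbol.
        by_sym = {}
        for q in subset:
            for (state, sym), dests in arcs.items():
                if state == q:
                    by_sym.setdefault(sym, set()).update(dests)
        for sym, dests in by_sym.items():
            nxt = frozenset(dests)
            det_arcs[(subset, sym)] = nxt
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return det_arcs, start_set, det_finals

# The nondeterministic example above: two 'a' arcs out of state 0.
arcs = {(0, 'a'): {1, 3}, (1, 'b'): {2}, (3, 'c'): {4}}
det_arcs, s0, fin = determinize(arcs, 0, {2, 4})
# States 1 and 3 end up combined into the single subset {1, 3}.
assert det_arcs[(s0, 'a')] == frozenset({1, 3})
```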


Minimal deterministic FSAs

[Diagram: left, an FSA with arcs a (0→1), b (1→2), c (1→3), d (2→4), d (3→5); right, the minimal equivalent, in which the two d arcs share a single final state 4]

◮ Here, the left FSA is not minimal but the right one is.
◮ “Minimal” is normally only applied to deterministic FSAs.
◮ Think of it as suffix sharing, or combining redundant states.
◮ It’s useful to save space (but not as crucial as determinization, for ASR).


Minimization of FSAs

◮ The common minimization algorithm partitions states into sets.
◮ Start out with a partition of just two sets (final/nonfinal).
◮ Keep splitting sets until we can’t split any more.
◮ The sets are the states in the minimal FSA.
◮ Set splitting is based on yes/no criteria like “does a state have a transition with symbol a to a member of set s?”
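Here is a sketch of that partition-refinement idea (Moore-style, my illustration): a block splits when two of its states disagree on which block each symbol leads to.

```python
def minimize(arcs, states, finals, alphabet):
    # arcs: dict (state, symbol) -> next state (deterministic FSA).
    blocks = [set(finals), set(states) - set(finals)]
    changed = True
    while changed:
        changed = False
        def block_of(q):
            for i, b in enumerate(blocks):
                if q in b:
                    return i
            return None  # missing transition
        new_blocks = []
        for b in blocks:
            # Signature: for each symbol, which block does it go to?
            sig = {}
            for q in b:
                key = tuple(block_of(arcs.get((q, a))) for a in alphabet)
                sig.setdefault(key, set()).add(q)
            new_blocks.extend(sig.values())
            if len(sig) > 1:
                changed = True
        blocks = new_blocks
    return blocks  # each surviving block is one state of the minimal FSA

# Left-hand FSA from the previous slide: states 4 and 5 are redundant.
arcs = {(0, 'a'): 1, (1, 'b'): 2, (1, 'c'): 3, (2, 'd'): 4, (3, 'd'): 5}
blocks = minimize(arcs, range(6), {4, 5}, 'abcd')
assert {4, 5} in blocks  # 4 and 5 get merged, as do 2 and 3
```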


Other algorithms for FSAs

◮ Equality and equivalence testing
◮ Reversing (like reversing the arrows)
◮ Trimming (removing unreachable states)
◮ Union, difference (view these in terms of the “set of strings”)
◮ Concatenation (of strings accepted by two FSAs)
◮ Closure (allowing arbitrary repetitions of the accepted strings)
◮ Epsilon removal


FSAs/FSTs and testability

◮ Many useful FSA/FST operations preserve equivalence (e.g. determinization, minimization)
◮ We can test equivalence
◮ Useful to test correctness of algorithms like determinization, minimization.
◮ E.g.: generate a random FSA; determinize; check deterministic; check equivalent.
◮ This is one of the advantages of the FST framework
◮ You could hand-code algorithms to solve your specific problems (e.g. decoding), but could you test them?


Weighted Finite State Acceptors: normal case

[Diagram: WFSA with arc a/1 from state 0 to state 1, arc b/1 from state 1 to state 2, and final state 2/1]

◮ Like a normal FSA but with costs on the arcs and final-states
◮ Note: the cost comes after “/”. For a final-state, “2/1” means final-cost 1 on state 2.
◮ View a WFSA as a function from a string to a cost.
◮ In this view, an unweighted FSA is f : string → {0, ∞}.
◮ If multiple paths have the same string, take the one with the lowest cost.
◮ This example maps ab to 3 (= 1 + 1 + 1), all else to ∞.


Semirings

◮ The semiring concept makes WFSAs more general.
◮ A semiring is:
  ◮ A set of elements (e.g. ℝ)
  ◮ Two special elements 1̄ and 0̄ (the identity element and zero)
  ◮ Two operations, ⊕ (plus) and ⊗ (times), satisfying certain axioms.
◮ Examples:
  ◮ The reals, with ⊕ and ⊗ defined as the normal + and ×, 1̄ = 1, 0̄ = 0.
  ◮ The integers, as above.
  ◮ The nonnegative reals, as above.
  ◮ The nonnegative reals as above, except with x ⊕ y = max(x, y).


Semirings and Weighted Finite State Acceptors (WFSAs)

◮ The normal semirings used in ASR applications are:
  ◮ The tropical semiring, which is the simplest case (⊕ means: take the minimum cost; ⊗ means: add the costs)
  ◮ The log semiring, which is equivalent to the nonnegative reals, except representing them as negative logs.
◮ There are also “fancier” semirings used (e.g.) in determinization, where the weight contains a string component.
◮ In WFSAs, weights are ⊗-multiplied along paths.
◮ Weights are ⊕-summed over paths with identical symbol-sequences.
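As a concrete illustration (mine, not from the talk; weights are stored as costs, i.e. negative logs, following the convention used later), here are the two semirings side by side:

```python
import math

class TropicalWeight:
    ZERO, ONE = float('inf'), 0.0          # 0-bar and 1-bar, as costs
    @staticmethod
    def plus(x, y):
        return min(x, y)                   # keep the best path
    @staticmethod
    def times(x, y):
        return x + y                       # costs add along a path

class LogWeight:
    ZERO, ONE = float('inf'), 0.0
    @staticmethod
    def plus(x, y):
        # -log(exp(-x) + exp(-y)): probabilities add, computed stably.
        if math.isinf(x): return y
        if math.isinf(y): return x
        return min(x, y) - math.log1p(math.exp(-abs(x - y)))
    @staticmethod
    def times(x, y):
        return x + y

# Tropical: two paths with costs 3 and 5 combine to cost 3.
assert TropicalWeight.plus(3.0, 5.0) == 3.0
# Log: the probabilities add, so the combined cost is below both.
assert LogWeight.plus(3.0, 5.0) < 3.0
```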


Costs versus weights: terminology

[Diagram: WFSA with arc a/0 from state 0 to final state 1/0, arc b/2 from state 0 to state 2, and arc c/1 from state 2 to final state 3/1]

◮ Personally I use “cost” to refer to the numeric value, and “weight” when speaking abstractly, e.g.:
  ◮ The acceptor above accepts a with unit weight.
  ◮ It accepts a with zero cost.
  ◮ It accepts bc with cost 4 = 2 + 1 + 1
  ◮ State 1 is final with unit weight.
  ◮ The acceptor assigns zero weight to xyz.
  ◮ It assigns infinite cost to xyz.


Function interpretation of WFSAs

[Diagram: WFSA in which the string ab has two paths: one through states (0, 1, 1) via arc a/0 and a self-loop b/1, with final cost 2; one through states (0, 2, 3) via arcs a/1 and b/2, with final cost 2]

◮ Consider the WFSA above, with the tropical (“Viterbi-like”) semiring. Take the string ab.
◮ We “multiply” (⊗) the weights along paths; this means adding the costs.
◮ Two paths for ab:
  ◮ One goes through states (0, 1, 1); the cost is (0 + 1 + 2) = 3
  ◮ One goes through states (0, 2, 3); the cost is (1 + 2 + 2) = 5
◮ We add weights across different paths; tropical ⊕ is “take the min cost” → this WFSA maps ab to 3
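The computation on this slide can be written as a short dynamic program (a sketch of mine; the arc table mimics the two-path example above): keep the best cost of reaching each state, then add the final cost.

```python
INF = float('inf')
arcs = {   # (state, symbol) -> list of (next_state, cost)
    (0, 'a'): [(1, 0.0), (2, 1.0)],
    (1, 'b'): [(1, 1.0)],      # self-loop
    (2, 'b'): [(3, 2.0)],
}
final = {1: 2.0, 3: 2.0}       # state -> final cost

def string_cost(s, start=0):
    best = {start: 0.0}
    for sym in s:
        nxt = {}
        for q, c in best.items():
            for r, w in arcs.get((q, sym), []):
                nxt[r] = min(nxt.get(r, INF), c + w)   # tropical plus
        best = nxt
    return min((c + final[q] for q, c in best.items() if q in final),
               default=INF)

assert string_cost('ab') == 3.0   # min(0+1+2, 1+2+2)
```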


Equivalence of WFSAs

[Diagram: two WFSAs, each with a self-loop b/1 on its final state: left, arc a/0 from state 0 into final state 1/1; right, arc a/1 from state 0 into final state 1/0]

◮ Two WFSAs are equivalent if they represent the same function from (string) to (weight).
◮ For example, the above two WFSAs are equivalent (but not equal).
◮ Easy to test only if we can determinize.
◮ Otherwise there are randomized algorithms (pick a random string from one; make sure it has the same weight in both).


Determinization of Weighted Finite State Acceptors

◮ Determinization is also applicable to WFSAs.
◮ In the algorithm, (subset of states) → (weighted subset of states).
◮ Not all WFSAs are determinizable
◮ If you try to determinize a nondeterminizable WFSA:
  ◮ There will be an infinite sequence of determinized-states with the same subset of states but different weights
  ◮ Determinization will fail to terminate; eventually memory will be exhausted.


The twins property

[Diagram: WFSA in which states 1 and 2 are both reached from state 0 by the string a (each with cost 1), state 1 has a cycle b/1 and state 2 has a cycle b/2, followed by x/1 to final state 3/1 and y/1 to final state 4/1]

◮ A WFSA is determinizable if it has the “twins property”...
◮ (that “if” is “iff” subject to certain extra conditions).
◮ A WFSA (as above) fails to have the twins property¹ if:
  ◮ There are two states s and t...
  ◮ that are reachable from the start state with the same string...
  ◮ and there are cycles at both s and t...
  ◮ with the same string on them but different weights.

¹ We are glossing over some details here.


Other algorithms on Weighted Finite State Acceptors

◮ Shortest-path and shortest-distance algorithms, e.g.:
  ◮ Total weight of the FSA, summed over all “successful” paths, i.e. from initial to final.
  ◮ Total weight of all paths to a final-state, starting at each state
  ◮ Best path through the FSA
  ◮ N-best paths through the FSA
◮ We can also apply most unweighted FSA algorithms to the weighted case.


Weighted Finite State Transducers (WFSTs)

[Diagram: WFST with a single arc a:x/1 from state 0 to final state 1/0]

◮ Like a WFSA except with two labels on each arc.
◮ View it as a function from a (pair of strings) to a weight
◮ This one maps (a, x) to 1 and all else to ∞
◮ Note: view 1 and ∞ as costs. ∞ is 0̄ in the semiring.
◮ Symbols on the left and right are termed “input” and “output” symbols.


Equivalence of WFSTs

[Diagram: left, a WFST with a single arc a:x/1 from state 0 to final state 1/0; right, a WFST with arcs a:ε/1 (0→1) and ε:x/0 (1→2), with final state 2/0]

◮ WFSTs are equivalent if they represent the same function from (string, string) to weight.
◮ Both these WFSTs map (a, x) to 1 and all else to ∞
◮ Therefore they are equivalent (although not equal)


WFSTs versus matrices

[Diagram: WFST with arcs a:x/2 (state 0 to final state 1/0), b:y/2, and b:ε/3 (to final state 2/0), shown next to a table of the weights it assigns:]

        x    xy
  a     2    ∞
  ab    5    4

◮ WFSTs can be viewed as a function from (input string, output string) to (weight).
◮ Matrices (over the reals) can be viewed as a function from (row index, column index) to (real).
◮ Note: the set of possible input and output strings is infinite.


Functional WFSTs

[Diagram: left, a WFST with arcs a:x/1 (0→1), ε:y/1 (1→2/0) and b:z/1 (0→3/0); right, the same but with a:z/1 (0→3/0) in place of the b arc]

◮ A WFST is functional iff each input string has a non-0̄ weight with at most one output string.
◮ The left transducer is functional: it maps a to xy with cost 2, and b to z with cost 1.
◮ The right transducer is non-functional:
  ◮ The string a has a nonzero weight with two strings.
  ◮ I.e. (a, xy) and (a, z) both appear on successful paths.
◮ A WFST can be interpreted as a function from input to (output, weight) only if it is functional.


Inversion of WFSTs

[Diagram: a WFST with arc a:x/1 from state 0 to final state 1/0, and its inverse, with arc x:a/1]

◮ Inversion just means swapping the input and output symbols.
◮ If A is a WFST, we write its inverse as A⁻¹.
◮ Only vaguely analogous to function inversion ...
◮ In the matrix analogy, it’s transposition.


Projection of WFSTs

[Diagram: left, a WFST with arcs a:x/1 and b:y/1 from state 0 to final state 1/0; right, its input projection, with arcs a:a/1 and b:b/1]

◮ Projection on the input means copying the input symbols to the output, so both are identical.
◮ Projection on the output means copying the output symbol to the input.
◮ In FST toolkits like OpenFst, if the input and output symbols are identical the WFST is termed an acceptor (WFSA) and can be treated as one.


Composition of WFSTs

[Diagram: A has arcs a:x/1 and b:x/1 from state 0 to final state 1/0; B has arc x:y/1 from state 0 to final state 1/0; C = A ◦ B has arcs a:y/2 and b:y/2 from state 0 to final state 1/0]

◮ Notation: C = A ◦ B means: C is A composed with B.
◮ In special cases, composition is similar to function composition
◮ The composition algorithm “matches up” the “inner symbols”
◮ i.e. those on the output (right) of A and the input (left) of B


Composition of WFSTs vs matrix multiplication

◮ Composition of WFSTs corresponds exactly to matrix multiplication.
◮ Let s, t and u be strings, and w be a weight.
◮ Interpret A, B, C as functions from (string, string) to (weight).
◮ Then C(s, u) = ⊕_t A(s, t) ⊗ B(t, u) (sum over, and discard, the central string t).
◮ Matrix multiplication follows the same pattern (sum over the central index).


Composition of WFSTs (algorithm)

◮ Ignoring ε symbols, the algorithm is quite simple (see the sketch below).
◮ States in C correspond to tuples of (state in A, state in B).
  ◮ But some of these may be inaccessible and pruned away.
◮ Maintain a queue of pairs, initially the single pair (0, 0) (the start states).
◮ When processing a pair (s, t):
  ◮ Consider each pair of (arc a from s), (arc b from t).
  ◮ If these have matching symbols (output of a, input of b):
    ◮ Create a transition to the state in C corresponding to (next-state of a, next-state of b)
    ◮ If not seen before, add this pair to the queue.
◮ With ε involved, need to be careful to avoid redundant paths...
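Here is a sketch of that queue-based algorithm for the ε-free case, with tropical weights (my code; real implementations such as OpenFst’s also handle ε, using filters to avoid the redundant paths mentioned above):

```python
from collections import deque

def compose(A, B):
    # Each FST: dict with 'arcs': state -> list of (isym, osym, next, cost),
    # and 'final': state -> final cost; the start state is 0.
    arcs, final = {}, {}
    queue, seen = deque([(0, 0)]), {(0, 0)}
    while queue:
        s, t = queue.popleft()
        if s in A['final'] and t in B['final']:
            final[(s, t)] = A['final'][s] + B['final'][t]
        for ai, ao, s2, aw in A['arcs'].get(s, []):
            for bi, bo, t2, bw in B['arcs'].get(t, []):
                if ao == bi:                      # match the inner symbols
                    arcs.setdefault((s, t), []).append(
                        (ai, bo, (s2, t2), aw + bw))
                    if (s2, t2) not in seen:
                        seen.add((s2, t2))
                        queue.append((s2, t2))
    return {'arcs': arcs, 'final': final}

# The example two slides back: A maps a->x and b->x, B maps x->y;
# A composed with B maps both a and b to y with cost 2.
A = {'arcs': {0: [('a', 'x', 1, 1.0), ('b', 'x', 1, 1.0)]}, 'final': {1: 0.0}}
B = {'arcs': {0: [('x', 'y', 1, 1.0)]}, 'final': {1: 0.0}}
C = compose(A, B)
assert sorted((i, o, w) for i, o, _, w in C['arcs'][(0, 0)]) == \
       [('a', 'y', 2.0), ('b', 'y', 2.0)]
```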


Deterministic WFSTs

[Diagram: WFST with arcs a:x/1 and b:x/1 from state 0 to final state 1/0 — deterministic, since the input symbols differ]

◮ Taken to mean “deterministic on the input symbol”
◮ I.e., no state can have > 1 arc out of it with the same input symbol.
◮ Some interpretations (e.g. Mohri/AT&T/OpenFst) allow ε input symbols (i.e. being ε-free is a separate issue).
◮ I prefer a definition that disallows epsilons, except as necessary to encode a string of output symbols on an arc.
◮ Regardless of the definition, not all WFSTs can be determinized.


Determinizing WFSTs

◮ Easiest to view the output symbol as part of the weight and determinize as a WFSA.
◮ Encode the WFST as a WFSA in an appropriate semiring that contains the weight and the string.
◮ Do WFSA determinization (and hope it succeeds).
◮ Two basic failure modes:
  ◮ It fails because the input FST was non-functional (multiple outputs for one input)
  ◮ It fails because the WFSA fails to have the twins property ... i.e. it has cycles with the same input-symbol sequence but different weights.
◮ It can also fail if there are paths with more output than input symbols and you don’t want to introduce ε-input arcs (I allow these as a special case).


Speech recognition application of WFSTs

◮ HCLG ≡ H ◦ C ◦ L ◦ G is the recognition graph
◮ G is the grammar or LM (an acceptor)
◮ L is the lexicon
◮ C adds phonetic context-dependency
◮ H specifies the HMM structure of context-dependent phones

       Input             Output
  H    p.d.f. labels     ctx-dep. phone
  C    ctx-dep. phone    phone
  L    phone             word
  G    word              word


Decoding graph construction (simple version)

◮ Create H, C, L, G separately
◮ Compose them together
◮ Determinize (like making a tree-structured lexicon)
◮ Minimize (like suffix sharing)
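In outline, the simple recipe is a few operations chained together. The sketch below is illustrative only: compose, determinize and minimize are hypothetical stand-in functions, not the actual Kaldi/OpenFst calls, which also insert disambiguation symbols and sort arcs between steps (the complexities on the next slide).

```python
def build_graph(H, C, L, G, compose, determinize, minimize):
    """Illustrative pipeline for the simple recipe; the three operation
    arguments are hypothetical stand-ins for real FST operations."""
    HCLG = compose(H, compose(C, compose(L, G)))  # compose right-to-left
    HCLG = determinize(HCLG)   # like making a tree-structured lexicon
    HCLG = minimize(HCLG)      # like suffix sharing
    return HCLG
```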


Decoding graph construction (complexities)

◮ Have to do things in a careful order or the algorithms “blow up”
◮ Determinization for WFSTs can fail
◮ Need to insert “disambiguation symbols” into the lexicon.
◮ Need to “propagate these through” H and C.
◮ Need to make sure the final HCLG is stochastic, for optimal pruning performance.
  ◮ I.e. it sums to one, like a properly normalized HMM.
◮ The standard algorithm to do this (weight-pushing) can fail
  ◮ ... because the FST representation of backoff LMs is non-stochastic
◮ Also, we can’t always recover the phone sequence from a path through HCLG (but see later).


Decoding graph construction (our approach)

◮ Mostly we followed the AT&T recipe quite closely.
◮ Added a disambiguation symbol for backoff arcs in the LM FST
  ◮ Partly for reasons of taste; not comfortable with treating ε as a “real symbol”.
  ◮ It’s actually necessary because we use a determinization algorithm that does ε removal.
◮ Different approach to “weight pushing” – see the next slide.
◮ Put special symbols on the input of HCLG that encode more than the p.d.f. – see below.


Our approach to weight pushing

◮ Want to ensure that HCLG can be interpreted as a properly normalized HMM (“stochastic”).
◮ The AT&T recipe does this as a final “weight pushing” step.
  ◮ This can fail for backoff LMs, or lead to weird results.
◮ Our approach is to ensure that if G is stochastic, each step of graph creation keeps it stochastic.
◮ This is done in such a way that where G is “nearly” stochastic, HCLG will be “nearly” stochastic.
◮ It requires doing some algorithms (e.g. determinization) in the log semiring
◮ Some algorithms (e.g. epsilon removal) need to “know about” two semirings at once.


Symbols on the input of HCLG – the problem

◮ Normally, the symbols on the input of H (or HCLG) represent p.d.f.’s
◮ The symbol would actually be the integer identifier of the p.d.f., with range e.g. 5k if there are 5k p.d.f.’s.
◮ The decoder will give you an input-symbol string I, an output-symbol string O, and a weight w.
◮ We want I (the input-symbol sequence) to give enough information to:
  ◮ train transition models
  ◮ work out the phone sequence
  ◮ reconstruct the path through the HMM.
◮ We don’t want to have to assume different phones have distinct p.d.f.’s.


Symbols on the input of HCLG – our solution

◮ Make the symbols on HCLG a slightly finer-grained identifier.
◮ We call this a “transition id”.
◮ From a transition id, we can work out:
  ◮ The p.d.f. index
  ◮ The phone
  ◮ The transition arc that was taken within the prototype HMM topology
◮ There would typically be about twice as many transition-ids as p.d.f. ids.
◮ Doesn’t expand the graph size, in normal configurations.
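A hypothetical sketch of the idea (this is not Kaldi’s actual numbering scheme): intern each (phone, p.d.f. index, HMM-topology arc) triple in a table and use its position as the transition id, so the decoder’s input-symbol string alone is enough to recover alignment information.

```python
records = []   # transition_id - 1 -> (phone, pdf_id, hmm_arc); illustrative
index = {}

def transition_id(phone, pdf_id, hmm_arc):
    # Assign a stable integer id to each distinct triple.
    key = (phone, pdf_id, hmm_arc)
    if key not in index:
        records.append(key)
        index[key] = len(records)   # ids start at 1; 0 stays <eps>
    return index[key]

def decode_tid(tid):
    # Recover (phone, pdf index, HMM arc) from the transition id.
    return records[tid - 1]

t = transition_id(phone=3, pdf_id=41, hmm_arc=0)
assert decode_tid(t) == (3, 41, 0)
```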


Decoding characteristics

◮ For a simple, medium-vocab task (Resource Management), about 20x faster than real time.
◮ We can decode Wall Street Journal read speech in about real time without significant loss in accuracy.
◮ This is with a pruned trigram language model – about 0.8M arcs.
◮ The final HCLG takes about 0.5G (gigabytes) in the default OpenFst format.
◮ Can’t go much larger or memory for graph compilation becomes a problem.
◮ Note: our HCLG currently has self-loops.


Ways to decode with bigger LMs

◮ Lattice rescoring
  ◮ We can already do this (see lattice generation, below).
◮ Store HCLG in a more compact form
  ◮ E.g. remove self-loops (the decoder would add them)
  ◮ Could then “factor” the FST by compactly storing linear sequences of states.
  ◮ Could decrease the final graph size 10-fold, but graph creation would still require a lot of memory.
◮ Online “correction” of LM scores in HCLG
  ◮ The idea is to add the difference in LM score between the (small, big) LMs when we cross a word
  ◮ Already done preliminary work (w/ Gilles Boulianne)
◮ OpenFst’s “fast lookahead matcher” – need to investigate this.


Lattice generation

◮ From now on, I will describe how lattice generation in Kaldi works.
◮ Up to now, we have described standard concepts (with a few minor twists)
◮ Here starts the “novel” part.
◮ We will begin with some preliminaries before describing the algorithm.


What is a lattice?

◮ The word “lattice” is used in the ASR literature to mean:
  ◮ Some kind of compact representation of the alternate word hypotheses for an utterance.
  ◮ Like an N-best list but with less redundancy.
  ◮ Usually has time information, sometimes state or word alignment information.
  ◮ Generally a directed acyclic graph with only one start node and only one end node.


WFST view of a speech recognition decoder

[Diagram: the acceptor U for a 3-frame utterance: a chain of states with, between each consecutive pair, one arc per p.d.f., labeled with that frame’s acoustic cost (arc labels such as 1/4.86, 2/4.94, 3/5.31, ...), ending in a final state]

◮ First – a “WFST definition” of the decoding problem.
◮ Let U be an FST that encodes the acoustic scores of an utterance (as above).
◮ Let S = U ◦ HCLG be called the search graph for an utterance.
◮ Note: if U has N frames (3, above), then:
  ◮ #states in S is ≤ (N + 1) times #states in HCLG.
  ◮ It is like N + 1 copies of HCLG.
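Constructing U is mechanical. Here is a sketch with hypothetical acoustic costs loosely following the figure: state f is the boundary before frame f, and arc (f, pdf) carries that frame’s cost for that p.d.f.

```python
acoustic_costs = [              # acoustic_costs[frame][pdf_id - 1]; made up
    [4.86, 4.94, 5.31, 5.91],
    [4.16, 5.44, 6.31, 5.02],
    [6.02, 6.47, 5.16, 8.53],
]

def make_U(costs):
    # Build a chain acceptor: one state per frame boundary, one arc per
    # p.d.f. between consecutive states, weighted by that frame's cost.
    arcs = {}   # (state, pdf_id) -> (next_state, cost)
    for f, frame in enumerate(costs):
        for pdf, c in enumerate(frame, start=1):
            arcs[(f, pdf)] = (f + 1, c)
    final = {len(costs): 0.0}   # the last boundary is final, cost 0
    return arcs, final

arcs, final = make_U(acoustic_costs)
assert arcs[(0, 1)] == (1, 4.86) and final == {3: 0.0}
```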


Pruning in a speech recognition decoder

◮ With beam pruning, we search a subgraph of S.
◮ The set of “active states” on all frames, with arcs linking them as appropriate, is a subgraph of S.
◮ Let this be called the beam-pruned subgraph of S; call it B.
◮ A standard speech recognition decoder finds the best path through B.
◮ In our case, the output of the decoder is a linear WFST that consists of this best path.
◮ This contains the following useful information:
  ◮ The word sequence, as output symbols.
  ◮ The state alignment, as input symbols.
  ◮ The cost of the path, as the total weight.


A lattice – our format

◮ We let a lattice be an FST, similar to HCLG.
◮ The input symbols will be the p.d.f.’s (actually, transition-ids).
◮ The output symbols will be words.
◮ The weights will contain the acoustic costs plus the costs in HCLG
  ◮ ... well, actually we use a semiring that contains two floats:
  ◮ it keeps track of the (graph, acoustic) costs separately while behaving as if they were added together.
◮ We also have an “acceptor” form of the lattice. Here:
  ◮ Words appear on both the input and output sides.
  ◮ The p.d.f.’s/transition-ids are encoded as part of the weight.
  ◮ This format is more compact, and sometimes more convenient.
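A minimal sketch of such a two-float weight (my illustration, not Kaldi’s actual LatticeWeight class, whose tie-breaking details differ): it stores the graph and acoustic costs separately, but ⊗ and ⊕ act on the total.

```python
class LatticeWeight:
    def __init__(self, graph, acoustic):
        self.graph, self.acoustic = graph, acoustic

    def total(self):
        return self.graph + self.acoustic

    def times(self, other):
        # Costs add along a path, component by component.
        return LatticeWeight(self.graph + other.graph,
                             self.acoustic + other.acoustic)

    def plus(self, other):
        # Tropical-style: keep the weight with the better total cost.
        return self if self.total() <= other.total() else other

a = LatticeWeight(1.0, 4.5).times(LatticeWeight(0.5, 3.0))
b = LatticeWeight(2.0, 9.0)
assert a.plus(b) is a   # total 9.0 beats total 11.0
```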


Our statement of the lattice generation problem

◮ Let the lattice beam be α (e.g. α = 8).
◮ No missing paths:
  ◮ Every word sequence whose best path in B is within α of the overall best path should be present in L
◮ No extra paths:
  ◮ No path should be in L if there is not an equivalent path in B.
◮ No duplicate paths:
  ◮ No two paths in L should have the same word sequence.
◮ Accurate paths:
  ◮ Each path through L should be equivalent to the best path through B with the same word sequence.


Our solution to the lattice generation problem

◮ During decoding, generate B (the beam-pruned subgraph of S).
◮ Prune B with beam α.
  ◮ I.e. prune all arcs not on a path within α of the best path.
  ◮ Actually we do this in an online way to avoid memory bloat.
  ◮ This is not the same as the beam pruning used in search.
◮ Let this pruned version of B be called P
◮ Do a special form of determinization on P (next slide)
◮ Prune the result with beam α


Lattice determinization

◮ A special determinization algorithm we call “lattice determinization”
◮ This is not really a determinization algorithm in a WFST sense, as it does not preserve equivalence.
◮ It keeps only the lowest-cost output-symbol sequence for each input-symbol sequence
◮ Note: we’d apply it after inversion, so words are on the input.
◮ Conceptual view (not how we really do it):
  ◮ Convert the WFST to an acceptor in a special semiring (see below)
  ◮ Remove epsilons from that acceptor
  ◮ Determinize
  ◮ Convert back


Converting an FST to an acceptor

◮ Suppose we had an arc in B with “1520:HI/3.67” on it.
◮ I.e. the input symbol is 1520, the output is “HI”, and the cost is 3.67
◮ First, invert → “HI:1520/3.67”
◮ Next, convert to an acceptor → “HI:HI/(3.67,1520)”
◮ Remember – acceptors have the same input/output symbols on arcs.
◮ The weight (after “/”) is a tuple of (old weight, string)
◮ Other valid weights: “(12.20, 1520 1520 1629)”, “(-2.64, )” (the empty string)


Acceptor determinization – conventional semiring

◮ Multiplication (⊗) is: ⊗-multiply the weights, concatenate the strings.
◮ Addition (⊕) is only defined if the string parts are the same
  ◮ It just corresponds to ⊕-adding the weights in their semiring.
  ◮ I.e. addition will crash if the strings are different²
  ◮ ... this is not really a proper semiring
  ◮ ... it is designed to detect non-functional FSTs and to fail then.
◮ The common divisor of (w, s) and (x, t) is (w ⊕ x, u)
  ◮ where u is the longest common prefix of s and t.
  ◮ ... this is an operation needed for determinization (to normalize subsets); see the sketch below.

² This is the way it’s done in OpenFst, anyway.
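A sketch of these two operations on (weight, string) pairs, with a tropical weight part (strings held as tuples of symbol ids):

```python
def times(a, b):
    # Multiply the weights (add the costs), concatenate the strings.
    (wa, sa), (wb, sb) = a, b
    return (wa + wb, sa + sb)

def common_divisor(a, b):
    # Tropical plus on the weights; longest common prefix of the strings.
    (wa, sa), (wb, sb) = a, b
    prefix = []
    for x, y in zip(sa, sb):
        if x != y:
            break
        prefix.append(x)
    return (min(wa, wb), tuple(prefix))

assert times((3.67, (1520,)), (2.0, (1629,))) == (5.67, (1520, 1629))
assert common_divisor((12.2, (1520, 1520, 1629)),
                      (11.0, (1520, 1407))) == (11.0, (1520,))
```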


Semiring for lattice determinization

◮ Only defined if there is a total order on the weight part, and ⊕ on the weights corresponds to taking the max.
◮ Multiplication is as before.
◮ Addition means taking the pair with the larger weight part.
◮ We need to be a bit careful when the weights are the same.
  ◮ This is for mathematical purity – it wouldn't make a practical difference.
  ◮ It's a question of satisfying the semiring axioms.
  ◮ We take the shorter string (if the lengths differ), then use lexicographic order.
  ◮ Simple lexicographic order would have violated one of the axioms (distributivity of ⊗ over ⊕).
◮ The common divisor is as before.
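The corresponding ⊕ for the lattice semiring, again sketched over the tropical base: "larger weight" in the semiring's natural order is assumed here to mean lower cost, and ties are broken by string length and then lexicographically, as the slide describes.

def oplus(a, b):
    (w, s), (x, t) = a, b
    if w != x:
        return a if w < x else b             # lower cost wins
    if len(s) != len(t):
        return a if len(s) < len(t) else b   # tie on cost: shorter string
    return a if s <= t else b                # final tie-break: lexicographic

print(oplus((2.0, (5, 7)), (3.0, (5,))))     # (2.0, (5, 7)): better cost
print(oplus((2.0, (5, 7)), (2.0, (5,))))     # (2.0, (5,)):   shorter string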


Lattice determinization vs. conventional determinization

◮ Let's compare them where they are both defined (the tropical semiring, which is like Viterbi).
◮ When creating a weighted subset of states...
◮ ... suppose a particular state appears twice in the subset (i.e. we process more than one link to that state).
◮ If the "weights" on that state have the same string part, both semirings behave the same.
◮ Otherwise (different string parts):
  ◮ Lattice determinization will continue happily, choosing the string with the better weight.
  ◮ Conventional determinization will say "I detected a non-functional FST" and fail.
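A toy demonstration of the contrast, with both ⊕ variants restated from the earlier sketches so this snippet is self-contained:

def conventional_oplus(a, b):
    (w, s), (x, t) = a, b
    if s != t:
        raise ValueError("detected a non-functional FST")
    return (min(w, x), s)

def lattice_oplus(a, b):
    (w, s), (x, t) = a, b
    return a if (w, len(s), s) < (x, len(t), t) else b

a, b = (2.0, (1520,)), (3.0, (1729,))    # same state, different string parts
print(lattice_oplus(a, b))                # (2.0, (1520,)): keeps the better path
try:
    conventional_oplus(a, b)
except ValueError as e:
    print("conventional determinization fails:", e)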


Lattice determinization in practice

◮ The OpenFst/AT&T way to do it would be:
  ◮ convert to our special semiring; remove epsilons; determinize; convert back.
◮ We don't do it this way, because the lattices would have a lot of input epsilons:
  ◮ the remove-epsilons step would "blow up".
◮ Instead, we wrote a determinization algorithm that removes epsilons itself.
◮ This is a bit more complex and harder to prove correct formally (probably why Mohri doesn't like it).
◮ But it is easy to verify using random examples (we can check equivalence and check whether the result is deterministic – see the sketch below).
◮ We optimized it by using efficient data structures to store the strings.
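The "check whether the result is deterministic" half of that verification is simple to state. Here is an illustrative checker for a toy acceptor stored as (src, label, dst) arcs, with None standing in for epsilon; this is not Kaldi's test code.

def is_deterministic(arcs):
    seen = set()
    for src, label, dst in arcs:
        if label is None:                    # an input epsilon arc
            return False
        if (src, label) in seen:             # two arcs from src share a label
            return False
        seen.add((src, label))
    return True

print(is_deterministic([(0, "a", 1), (0, "b", 2)]))   # True
print(is_deterministic([(0, "a", 1), (0, "a", 2)]))   # False
print(is_deterministic([(0, None, 1)]))               # False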


Lattice generation – simple version

◮ If we had unlimited memory, the basic process would be:
  ◮ Generate the beam-pruned subgraph B of the search graph S.
    ◮ The states in B correspond to the active states on particular frames.
  ◮ Prune B with beam α to get the pruned version P.
  ◮ Convert P to an acceptor and lattice-determinize it to get A (a deterministic acceptor).
  ◮ Prune A with beam α to get the final lattice L (in acceptor form).


Lattice generation – real version

◮ B would be too large to fit easily in memory, so we prune as we go, not at the end.
  ◮ We have a special algorithm for this that is efficient and exact:
  ◮ i.e. it won't prune anything that we wouldn't have pruned away at the end.
◮ Determinization of P can use a lot of memory for some utterances (if α is large, e.g. ≥ 10).
  ◮ This is caused by links being generated that would have been pruned away later anyway.
  ◮ We could create a determinization algorithm that avoids this bloat...
    ◮ ... but it would probably be quite slow.
  ◮ For now we just detect this, reduce the beam a bit and try again (sketched below).
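A runnable toy sketch of that detect-and-retry fallback. Here determinize_pruned is a stand-in that simply fails above an arc budget (imitating a memory check); all names and numbers are invented for illustration, not taken from Kaldi.

class BlowupError(Exception):
    pass

def determinize_pruned(num_arcs, max_arcs=1000):
    if num_arcs > max_arcs:                  # stand-in for a memory check
        raise BlowupError("too many arcs: %d" % num_arcs)
    return "lattice(%d arcs)" % num_arcs

def determinize_with_backoff(arcs_for_beam, alpha, min_alpha=2.0):
    while alpha >= min_alpha:
        try:
            return determinize_pruned(arcs_for_beam(alpha)), alpha
        except BlowupError:
            alpha *= 0.75                    # reduce the beam a bit, retry
    raise RuntimeError("failed even at the smallest beam")

# Toy model in which the pruned graph grows quickly with the beam:
print(determinize_with_backoff(lambda a: int(10 * a ** 3), alpha=10.0))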


Lattice generation – why our algorithm is good

◮ We can prove that lattices generated in this way satisfy the properties we stated.³
◮ We get "exact" lattices without approximation (e.g. Ney's "word-pair approximation") while storing only a single traceback.
◮ There is no extra time cost for generating lattices (for small-ish α).
◮ This is unlike the traditional "more exact" approaches, which have to store multiple tokens in each state...
  ◮ ... and those approaches are not even as exact as ours.
◮ We have verified that LM rescoring of these lattices gives the same results as decoding directly with the original LM.
◮ We will also verify, using an N-best algorithm, that the lattices have the properties we stated.

³ We have to slightly weaken one of the properties.


Lattice generation – experimental aspects

◮ We have verified that LM rescoring of these lattices gives the same results as decoding directly with the original LM (as the beams get large enough).
◮ We will also verify, using an N-best algorithm, that the lattices have the properties we stated.
◮ We are also measuring oracle WERs of the lattices:
  ◮ mainly because people expect it;
  ◮ I don't accept that oracle WER is a relevant metric (consider a single word loop).
◮ We are not presenting detailed results here (not ready yet / not that useful anyway).
◮ It is hard to compare experimentally with previous approaches:
  ◮ they are often either not published, not completely described, or too tied to a particular type of decoder.


Lattice generation – summary

◮ I believe this is the "right" way to do lattice generation.
◮ There is no compromise between speed and exactness.
◮ The lattices have clearly defined properties that can be verified.
◮ It is an interesting algorithm:
  ◮ A common objection to WFSTs is: "I could code this more easily without WFSTs."
  ◮ Many WFST algorithms have simple analogs, e.g. determinization = tree-structured lexicon; minimization = suffix sharing.
  ◮ This algorithm is hard to explain without WFSTs, and shows the power of the framework.


The end

◮ Questions?
