Doctoral thesis (PDF) - Universitatea Tehnică

15.06.2013

WSEAS TRANSACTIONS ON COMMUNICATIONS
Ovidiu Buza, Gavril Toderean

Fig.5. Hierarchical structure of high-level analysis
[Figure: tree with the levels Text, Sentence, Word and Syllable; other node labels include Separator, Number (Integer, Real) and Alphabetic.]

6 Wave Signal Segmentation

Constructing the vocal database requires an appropriate method for wave signal segmentation. In our approach, segmentation was done through an automated procedure that detects silence/speech and voiced/unvoiced signal. The procedure uses time-domain analysis of the signal. After low-pass filtering of the signal, the zero-crossing samples ($Z_i$) were detected. The minimum ($m_i$) and maximum ($M_i$) points between two consecutive zeros were also computed.

Silence is separated from speech using an amplitude threshold $T_s$. In silence segments, all minimum and maximum points have to be smaller in absolute value than $T_s$:

$|M_i| < T_s \;\wedge\; |m_i| < T_s, \quad i = s, \ldots, s+n$   (1)

In (1), $s$ is the starting sample index of the segment and $n$ is the number of samples in that segment.

For speech segments, the distance between two adjacent zero-crossing points, $D_i = d(Z_i, Z_{i+1})$, is computed. A segment is assumed voiced if this distance is greater than a threshold distance $V$:

$D_i > V, \quad i = s, \ldots, s+n$   (2)

Fig.6. A voiced segment of speech

For the zero points between A and B in figure 6 to be included in the voiced segment, a look-ahead technique has been applied. At most $N_k$ zero points between $Z_i$ and $Z_{i+k}$ can be inserted in the voiced region if $D_{i-1} > V$ and $D_{i+k} > V$:

$D_{i-1} > V \;\wedge\; D_{i+k} > V$, for the zero points $Z_j$, $j = i..k$, where the number of inserted zero points is at most $N_k$   (3)
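The threshold tests in (1) and (2) and the look-ahead step can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the label strings and the choice to measure distances in samples are assumptions made for illustration, the input is assumed to be already low-pass filtered, and the look-ahead limit `nk` is counted here in intervals between zero crossings.

```python
from typing import List, Sequence

SILENCE, VOICED, OTHER = "silence", "voiced", "other"  # hypothetical label names


def zero_crossings(x: Sequence[float]) -> List[int]:
    """Indices i at which the signal changes sign between x[i-1] and x[i]."""
    return [
        i for i in range(1, len(x))
        if (x[i - 1] < 0 <= x[i]) or (x[i - 1] >= 0 > x[i])
    ]


def classify_intervals(x: Sequence[float], ts: float, v: int) -> List[str]:
    """Label each interval between adjacent zero crossings.

    silence: every sample in the interval stays below ts in magnitude, as in (1)
    voiced:  distance between the two zero crossings exceeds v, as in (2)
    other:   anything else (left for later rules)
    """
    z = zero_crossings(x)
    labels = []
    for a, b in zip(z, z[1:]):
        peak = max(abs(sample) for sample in x[a:b])
        distance = b - a  # assumption: d(Z_i, Z_i+1) measured in samples
        if peak < ts:
            labels.append(SILENCE)
        elif distance > v:
            labels.append(VOICED)
        else:
            labels.append(OTHER)
    return labels


def absorb_short_runs(labels: Sequence[str], nk: int) -> List[str]:
    """Look-ahead: relabel a run of at most nk 'other' intervals as voiced
    when the intervals just before and just after the run are voiced."""
    out = list(labels)
    i = 0
    while i < len(out):
        if out[i] != OTHER:
            i += 1
            continue
        j = i
        while j < len(out) and out[j] == OTHER:
            j += 1
        flanked = i > 0 and j < len(out) and out[i - 1] == VOICED and out[j] == VOICED
        if flanked and (j - i) <= nk:
            out[i:j] = [VOICED] * (j - i)
        i = j
    return out
```

Running `classify_intervals` first and `absorb_short_runs` on its result mirrors the order in the text: the thresholds produce an initial labelling, and the look-ahead then closes short gaps inside a voiced region.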

A segment is assumed unvoiced if the distance $D_i$ between two adjacent zeros is smaller than a threshold $U$:

$D_i < U, \quad i = s, \ldots, s+n$   (4)

Transient segments are also defined; they consist of regions for which conditions (2), (3) and (4) are not fulfilled.

After the first application of the above algorithm, a large set of regions is created. While voiced regions are well determined, the unvoiced ones are broken up by interleaved silence regions. This happens because unvoiced consonants have low amplitude, so they can break into many silence/unvoiced subregions. Transient segments can also appear inside an unvoiced segment because of the signal bouncing above the zero line. Figure 7 shows such an example, in which numbered regions are unvoiced, simple-line and unnumbered regions are silence, and double-line regions are transient. All these regions are packed together in the second pass of the algorithm, so the result is a single unvoiced region, as one can see in figure 8.

Fig.7. Determining regions for an unvoiced segment of speech

Fig.8. Compacting regions of above segment

After segmentation, voiced and unvoiced segments are coupled according to the syllable chain used in the vocal database construction process. Acoustic units are labelled and stored in the database. Each region boundary can be viewed with a special application and, if necessary, adjusted.

7 Vocal Database Construction

The vocal database includes a subset of Romanian language syllables. Acoustic units were separated from male speech and normalized in pitch and amplitude. The vocal database of recorded syllables has a tree data structure. Each node in the tree corresponds to a syllable characteristic, and each leaf represents the corresponding syllable.

Units have been inserted in the database following this classification:
- by length of syllable: two-, three- or four-character syllables (denoted S2, S3 and S4) and also single phonemes;
- by position inside the word: initial or median (Med) and final (Fin) syllables;
- by accentuation: stressed or accentuated (A) and normal (N) syllables.

This classification reduces the time needed for matching phonetic and acoustic units. The organization of the vocal database is shown in figure 9. Level-one nodes indicate the length of syllables, level-two nodes indicate median or final syllables, and level-three nodes indicate accentuated or normal syllables.
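The three-level organization described above can be sketched as a nested dictionary in Python. This is only an illustration of the lookup path (length, then position, then accentuation), not the authors' storage format: the function names, the `PH` key for single phonemes, the example syllables and the unit file names are all hypothetical.

```python
from typing import Dict, Optional

# length class -> position -> accentuation -> {syllable text: acoustic unit id}
SyllableTree = Dict[str, Dict[str, Dict[str, Dict[str, str]]]]

POSITIONS = ("Med", "Fin")   # initial/median or final, as in the text
STRESS = ("A", "N")          # accentuated or normal, as in the text


def length_class(syllable: str) -> str:
    """Level-one key: S2, S3 or S4; 'PH' (hypothetical name) for single phonemes."""
    n = len(syllable)
    if n == 1:
        return "PH"
    if 2 <= n <= 4:
        return f"S{n}"
    raise ValueError(f"unsupported syllable length: {n}")


def insert_unit(tree: SyllableTree, syllable: str, position: str,
                stress: str, unit: str) -> None:
    """Store an acoustic unit id at the leaf reached by the three keys."""
    if position not in POSITIONS or stress not in STRESS:
        raise ValueError("unknown position or accentuation key")
    level1 = tree.setdefault(length_class(syllable), {})
    level2 = level1.setdefault(position, {})
    level2.setdefault(stress, {})[syllable] = unit


def find_unit(tree: SyllableTree, syllable: str, position: str,
              stress: str) -> Optional[str]:
    """Follow length -> position -> accentuation; None if no unit is stored."""
    return (tree.get(length_class(syllable), {})
                .get(position, {})
                .get(stress, {})
                .get(syllable))


# Usage with placeholder data (unit names are made up for the example).
tree: SyllableTree = {}
insert_unit(tree, "ca", "Med", "A", "unit_ca_med_a")
insert_unit(tree, "sa", "Fin", "N", "unit_sa_fin_n")
```

Each lookup touches only one branch per level, which is the kind of search-space reduction the classification is meant to provide during matching.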

