Teza doctorat (pdf) - Universitatea Tehnică
Teza doctorat (pdf) - Universitatea Tehnică Teza doctorat (pdf) - Universitatea Tehnică
O. Buza, G. Toderean, J. Domokos, A. Zs. Bodo, Construction of a Syllable-Based Text-To-Speech System for Romanian Figure 1. Main stages in building LIGHTVOX text-to-speech system In the first flow, in voice signal analysis stage, we have extracted main parameters of pre- recorded speech, working in time domain of analysis. These parameters, such as: amplitude, energy, zero-crosses, were used in the second stage for speech segmentation. In this second stage, we have designed an algorithm for automated speech segmentation in ten different classes of regions, that we have put into correspondence with main phonetic categories of Romanian language. After phonetic segmentation of speech signal, we have used a semi-automated algorithm for detecting and storing waveform syllables from uttered words into vocal database. Processing has been applied onto a paralel text/voice corpus: voice corpus provides speech signal on which apply signal analysis and segmentation methods, and text corpus provides phonetical transcription of voice corpus. SIGNAL ANALYSIS TEXT PROCESSING In the second flow, in text processing stage, special phonetic rules have been developed for text pre-processing, syllables units detection and intrasegmental prosody (i.e. stress) prediction. Next, unit matching was done by selecting acoustic units from vocal database according to the linguistic units detected from the input text. And finally, acoustic units are concatenated to form the output speech signal that is synthesized by the mean of a digital audio card. 2. VOICE SIGNAL ANALYSIS SPEECH SEGMENTATION A. VOCAL DATABASE CONSTRUCTION FLOW UNIT DETECTION AND PROSODY PREDICTION B. TEXT TO VOICE CONVERTION FLOW Voice signal analysis is the first stage in vocal database construction flow. Voice signal analysis means the detection of signal parameters from speech samples recorded by the human speaker. These parameteres will be further used in signal segmentation stage. Analysis can be done in time or frequency domain. Time domain analysis, as our approach is, leads to the detection of signal characteristics directly from waveform samples. We have extracted following parameters: maximum and median amplitude, signal energy, number of zero-crosses and fundamental frequency. 302 VOCAL DATABASE CONSTRUCTION UNIT MATCHING, CONCATENATION AND SYNTHESIS
O. Buza, G. Toderean, J. Domokos, A. Zs. Bodo, Construction of a Syllable-Based Text-To-Speech System for Romanian Signal Amplitude gives information about presence or absence of speech, about voiced and unvoiced features of the signal on analyzed segment. In the case of a voiced segment of speech, as a vowel utterance, the amplitude is higher, beside the case of an unvoiced speech segment, where amplitude is lower. Mean amplitude for N samples has the following form ([7]): 1 M ( n) x( m) | w( n m ) (1) | N m where: x(m) represents the current sample of speech signal, and w(n-m) is the considered windowing function. Signal Energy is used for getting the characteristics of transported power of speech signal. For a null-mean signal, short term mean energy is defined as [8]: 1 2 E( n) x( n) w ( n m )] (2) [ N m Voiced segments (like vowels) have a higher mean energy, while the unvoiced segments (like fricative consonants) have a lower mean energy. For the majority speech segments, energy is concentrated in 300-3000 Hz band. Number of zero-crosses is used for determining frequency characteristics inside a segment. The number of zero-crossings is calculated as follows ([7]): 1 [ N n0 NTZ 1sgn( s( n 1 ) T ) sgn( s( nT ))] 2 (3) where sgn(n) represents the sign function: 1, n 0 sgn( n ) . (4) 1, n 0 303
- Page 270 and 271: Cap. 7. Proiectarea sistemului de s
- Page 272 and 273: Cap. 7. Proiectarea sistemului de s
- Page 274 and 275: Baza de date vocală Cap. 7. Proiec
- Page 276 and 277: 1 Procesare Separator Procesare Cuv
- Page 278 and 279: Cap. 7. Proiectarea sistemului de s
- Page 280 and 281: Cap. 7. Proiectarea sistemului de s
- Page 282 and 283: Cap. 7. Proiectarea sistemului de s
- Page 284 and 285: 8. Concluzii finale Cercetările ef
- Page 286 and 287: 268 Cap. 8. Concluzii finale 11. A
- Page 288 and 289: 270 Cap. 8. Concluzii finale percep
- Page 290 and 291: 272 Cap. 8. Concluzii finale b) pen
- Page 292 and 293: 274 Cap. 8. Concluzii finale - nive
- Page 294 and 295: Bibliografie [And88] André-Obrecht
- Page 296 and 297: 278 Bibliografie Quality and Testin
- Page 298 and 299: 280 Bibliografie [Giu06] Giurgiu M.
- Page 300 and 301: 282 Bibliografie [Nag05] Nageshwara
- Page 302 and 303: 284 Bibliografie Research Institute
- Page 304 and 305: Anexa 2. Silabele din setul S2 dup
- Page 306 and 307: Anexa 2. Silabele din setul S2 dup
- Page 308 and 309: Anexa 3. Silabe din setul S3 după
- Page 310 and 311: Anexa 4. Silabe din setul S4 după
- Page 312 and 313: Anexa 4. Silabe din setul S4 după
- Page 314 and 315: 296 Anexa 5. Activitatea ştiinţif
- Page 316 and 317: 298 Anexa 5. Activitatea ştiinţif
- Page 318 and 319: Anexa 6. Lucrări ştiinţifice ale
- Page 322 and 323: O. Buza, G. Toderean, J. Domokos, A
- Page 324 and 325: O. Buza, G. Toderean, J. Domokos, A
- Page 326 and 327: O. Buza, G. Toderean, J. Domokos, A
- Page 328 and 329: O. Buza, G. Toderean, J. Domokos, A
- Page 330 and 331: Level 3 O. Buza, G. Toderean, J. Do
- Page 332 and 333: O. Buza, G. Toderean, J. Domokos, A
- Page 334 and 335: O. Buza, G. Toderean, J. Domokos, A
- Page 336 and 337: O. Buza, G. Toderean, J. Domokos, A
- Page 338 and 339: About Construction of a Syllable-Ba
- Page 340 and 341: WSEAS TRANSACTIONS ON COMMUNICATION
- Page 342 and 343: WSEAS TRANSACTIONS ON COMMUNICATION
- Page 344 and 345: WSEAS TRANSACTIONS ON COMMUNICATION
O. Buza, G. Toderean, J. Domokos, A. Zs. Bodo, Construction of a Syllable-Based Text-To-Speech System for Romanian<br />
Figure 1. Main stages in building LIGHTVOX text-to-speech system<br />
In the first flow, in voice signal analysis stage, we have extracted main parameters of pre-<br />
recorded speech, working in time domain of analysis. These parameters, such as: amplitude,<br />
energy, zero-crosses, were used in the second stage for speech segmentation. In this second stage,<br />
we have designed an algorithm for automated speech segmentation in ten different classes of<br />
regions, that we have put into correspondence with main phonetic categories of Romanian<br />
language. After phonetic segmentation of speech signal, we have used a semi-automated algorithm<br />
for detecting and storing waveform syllables from uttered words into vocal database. Processing<br />
has been applied onto a paralel text/voice corpus: voice corpus provides speech signal on which<br />
apply signal analysis and segmentation methods, and text corpus provides phonetical transcription<br />
of voice corpus.<br />
SIGNAL<br />
ANALYSIS<br />
TEXT<br />
PROCESSING<br />
In the second flow, in text processing stage, special phonetic rules have been developed for<br />
text pre-processing, syllables units detection and intrasegmental prosody (i.e. stress) prediction.<br />
Next, unit matching was done by selecting acoustic units from vocal database according to the<br />
linguistic units detected from the input text. And finally, acoustic units are concatenated to form<br />
the output speech signal that is synthesized by the mean of a digital audio card.<br />
2. VOICE SIGNAL ANALYSIS<br />
SPEECH<br />
SEGMENTATION<br />
A. VOCAL DATABASE CONSTRUCTION FLOW<br />
UNIT DETECTION<br />
AND PROSODY<br />
PREDICTION<br />
B. TEXT TO VOICE CONVERTION FLOW<br />
Voice signal analysis is the first stage in vocal database construction flow. Voice signal<br />
analysis means the detection of signal parameters from speech samples recorded by the human<br />
speaker. These parameteres will be further used in signal segmentation stage. Analysis can be done<br />
in time or frequency domain. Time domain analysis, as our approach is, leads to the detection of<br />
signal characteristics directly from waveform samples.<br />
We have extracted following parameters: maximum and median amplitude, signal energy,<br />
number of zero-crosses and fundamental frequency.<br />
302<br />
VOCAL<br />
DATABASE<br />
CONSTRUCTION<br />
UNIT MATCHING,<br />
CONCATENATION<br />
AND SYNTHESIS