Journal of Software - Academy Publisher
JOURNAL OF SOFTWARE, VOL. 6, NO. 5, MAY 2011

The algorithm trained the Ki*(Ki-1)/2 FSVMs used for subdividing rough category Ci on the training sample set Si (Ki is the number of consonant classes in Ci), and organized them into the fine classifier group FDGSVMi (i = 1, ..., 7).

Step 6 The algorithm extracted the feature parameters F and DWTMFC-CT of each test sample, fed F to the rough classifier FDGSVM0 so that the most appropriate fine classifier was selected from the fine classifier group, and then assigned the test sample to its category according to the DWTMFC-CT parameters.

B. Feature parameter DWTMFC-CT extraction

Speech features based on the wavelet transform have been extracted by other researchers, such as MWBC [13] and DWT-MFC [14]. In this paper the DWT-MFC method was adapted to the characteristics of consonant recognition. The new consonant feature vector DWTMFC-CT was extracted as follows:

Step 1 Framing: According to its length, each consonant signal was split into a number of equal frames, as shown in TABLE I.

TABLE I. THE NUMBER OF FRAMES

Rough category    C1  C2  C3  C4  C5  C6  C7
Number of frames   1   2   2   3   3   4   4

Step 2 Wavelet transformation: Using db4 as the mother wavelet, each consonant signal was decomposed into 3 layers, yielding three groups of detail coefficients, D1, D2 and D3, and one group of approximation coefficients, A3.

Step 3 Spectrum combination: The four groups of wavelet coefficients were each transformed by the Fast Fourier Transform (FFT), translating them from the time domain into the frequency domain, and all of the coefficient spectra were then combined into one full spectrum.
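Steps 1-3 can be sketched as follows. This is a minimal illustrative sketch, not the authors' implementation: for self-containment it uses Haar filters in place of db4 (a real implementation would use e.g. `pywt.wavedec(signal, 'db4', level=3)` from PyWavelets), and it combines the magnitude spectra by simple concatenation in a fixed group order.

```python
import numpy as np

def dwt_step(x, h, g):
    """One DWT level: filter with low-pass h / high-pass g, downsample by 2."""
    return np.convolve(x, h)[::2], np.convolve(x, g)[::2]

def dwt3_full_spectrum(frame, n_fft=512):
    """Steps 2-3: 3-layer DWT -> details D1, D2, D3 and approximation A3,
    FFT each coefficient group, and concatenate into one 'full spectrum'."""
    h = np.array([1.0, 1.0]) / np.sqrt(2.0)   # Haar low-pass (stand-in for db4)
    g = np.array([1.0, -1.0]) / np.sqrt(2.0)  # Haar high-pass
    a, details = np.asarray(frame, dtype=float), []
    for _ in range(3):
        a, d = dwt_step(a, h, g)
        details.append(d)                     # D1, D2, D3
    groups = details + [a]                    # ... plus A3
    # Step 3: move each group to the frequency domain and combine.
    spectra = [np.abs(np.fft.rfft(grp, n_fft)) for grp in groups]
    return np.concatenate(spectra)

# Step 1 (framing) would give e.g. 3 frames for a C4 consonant; one toy frame here:
frame = np.sin(2 * np.pi * 440 * np.arange(1600) / 16000)  # 0.1 s at 16 kHz
full_spectrum = dwt3_full_spectrum(frame)
print(full_spectrum.shape)  # 4 groups x 257 rfft bins -> (1028,)
```

The downsample-by-2 at each level is what gives the multi-scale view: D1 covers the finest scale, D3 and A3 the coarsest, which is the property the paper relies on for short, unstable consonants.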
Step 4 Cepstrum coefficient calculation: The full spectrum was passed through an M-order Mel filter bank to obtain the Mel spectrum, which was then transformed by the discrete cosine transform to obtain the cepstral coefficient vector dh = (dh,1, dh,2, ..., dh,M) of the hth frame, where h = 1, ..., NCi and NCi is the number of frames.

Step 5 Cepstrum coefficient splicing: Every frame was processed as above, and the cepstrum coefficients of all frames were spliced to obtain the NCi*M-dimensional feature vector DWTMFC-CT = (d1, d2, ..., dNCi).

V. EXPERIMENTS

Speech samples were collected in a quiet environment and recorded with Cool Edit software, using a single channel, a 16000 Hz sampling frequency and 16-bit precision. The distance from the microphone to the speaker's mouth was from 10 cm to 20 cm. 10 females and 10 males (all healthy Mandarin speakers)

Figure 3. The result of FSVM (C1-C2).
Figure 4. The result of FSVM (C3-C4).
Figure 5. The result of FSVM (C5-C6).
took part in the experiments. Each consonant was read 5 times; all of the samples were then front-end processed, and the vowel and consonant parts were segmented in the usual way. Finally, each consonant had 100 samples.

The aim of the first experiment was to determine whether the feature parameter vector F (comprising length (L), periodicity (P), relative energy (E) and zero-crossing rate (Z)) was useful in the first stage of consonant recognition. 100 samples in each rough category set were selected randomly, and a two-class fuzzy support vector machine was used as the classifier. As examples, the results for three rough category pairs are shown: C1-C2 (Fig. 3), C3-C4 (Fig. 4) and C5-C6 (Fig. 5). They show significant differences among the rough categories under the feature parameter vector F.

In the second stage of consonant recognition, 10 experiments were carried out to test the effectiveness of the 7 fine classifiers. In each experiment, 60 samples in each consonant sample set were selected randomly as training samples and the remaining 40 as test samples; that is, the ith fine classifier had 60*Ki training samples and 40*Ki test samples. The Mel cepstrum coefficients (MFCC) and the DWTMFC-CT of the consonant samples were extracted (the number of FFT points was min{512, number of sample points}; the Mel filter order was 24). SVM and FSVM were also used for comparative experiments. The recognition results of the 7 classes are shown in TABLE II and TABLE III; the results in both tables are averages over the 10 experiments.
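Steps 4 and 5 of Section B, with the settings above (Mel filter order M = 24, 512-point FFT, i.e. 257 rfft bins), can be sketched as follows. This is a minimal numpy sketch under stated assumptions, not the authors' code: the triangular Mel filter bank and the (unnormalized) DCT-II are written out by hand, the combined wavelet spectrum is treated as a single linear-frequency spectrum for filtering, and the input spectra are random stand-ins for the full spectra of Step 3.

```python
import numpy as np

def mel_filterbank(n_filters, n_bins, sr=16000):
    """Triangular Mel filter bank over n_bins linear-frequency bins."""
    mel = lambda f: 2595.0 * np.log10(1.0 + f / 700.0)
    imel = lambda v: 700.0 * (10.0 ** (v / 2595.0) - 1.0)
    edges = imel(np.linspace(mel(0.0), mel(sr / 2.0), n_filters + 2))
    bins = np.floor(edges / (sr / 2.0) * (n_bins - 1)).astype(int)
    fb = np.zeros((n_filters, n_bins))
    for i in range(n_filters):
        l, c, r = bins[i], bins[i + 1], bins[i + 2]
        if c > l:  # rising edge of the triangle
            fb[i, l:c] = np.linspace(0.0, 1.0, c - l, endpoint=False)
        if r > c:  # falling edge
            fb[i, c:r] = np.linspace(1.0, 0.0, r - c, endpoint=False)
    return fb

def mel_cepstrum(full_spectrum, m=24, sr=16000):
    """Step 4 for one frame: Mel spectrum, log, then DCT-II -> d_h (M coeffs)."""
    fb = mel_filterbank(m, len(full_spectrum), sr)
    log_mel = np.log(fb @ full_spectrum + 1e-10)  # M-order log Mel spectrum
    n = np.arange(m)
    dct2 = np.cos(np.pi / m * (n[:, None] + 0.5) * n[None, :])  # DCT-II basis
    return dct2.T @ log_mel

# Step 5: splice the per-frame vectors d_1..d_NCi (here NCi = 3 toy frames).
rng = np.random.default_rng(0)
frames = [np.abs(np.fft.rfft(rng.standard_normal(400), 512)) for _ in range(3)]
dwtmfc_ct = np.concatenate([mel_cepstrum(s) for s in frames])
print(dwtmfc_ct.shape)  # NCi * M = 3 * 24 -> (72,)
```

The spliced vector has a fixed NCi*M dimension per rough category (since TABLE I fixes NCi per category), which is what lets each fine classifier work on fixed-length inputs.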
We can see that, compared with classification using the MFCC directly, the wavelet-based DWTMFC-CT greatly reduced the number of support vectors and improved the recognition rate, particularly in the rough categories with similar pronunciations: C3 (z similar to zh), C5 (c similar to ch) and C7 (s similar to sh); the improvement on C1 (the most unstable and shortest consonants) was also significant. This shows that unstable consonants can be described more accurately by multi-scale wavelet analysis. With the same DWTMFC-CT feature vectors, the recognition rates of FSVM were better than those of SVM, while the difference in the number of support vectors was small. It showed that FSVM

TABLE III. THE RECOGNITION RESULTS OF THE 7 CLASSES
(Acc = accuracy in %; #SV = average number of support vectors)

                  SVM(MFCC)        SVM(DWTMFC-CT)   FSVM(DWTMFC-CT)
Category          Acc      #SV     Acc      #SV     Acc      #SV
C1 [b,d,g]        84.33    63.4    89.83    47.2    91.58    46.8
C2 [l,m,n,r]      93.75    88.8    97.13    47.5    98.38    50.6
C3 [z,zh,j]       85.75    61.0    90.92    49.5    92.17    54.4
C4 [p,t,k]        85.67    68.7    89.33    51.5    91.67    50.2
C5 [c,ch,q]       82.58    68.7    88.17    41.8    90.83    44.2
C6 [f,h]          95.50    10.1    98.25     7.3    99.25     8.4
C7 [s,sh,x]       86.42    75.9    95.50    23.5    96.58    24.8
Average           87.71    61.34   92.73    38.33   94.35    39.91
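For reading TABLE II: most of its entries are consistent with relative percentage differences computed from the TABLE III values, e.g. for C1 under SVM, moving from MFCC to DWTMFC-CT gives (89.83 - 84.33) / 84.33 * 100 ≈ 6.52. A quick check of the C1 row (values copied from TABLE III):

```python
def rel_diff(a, b):
    """Relative change from a to b, in percent."""
    return (b - a) / a * 100.0

# C1 [b,d,g] from TABLE III: (accuracy %, number of support vectors)
svm_mfcc = (84.33, 63.4)   # SVM(MFCC)
svm_dwt = (89.83, 47.2)    # SVM(DWTMFC-CT)

d_acc = round(rel_diff(svm_mfcc[0], svm_dwt[0]), 2)
d_sv = round(rel_diff(svm_mfcc[1], svm_dwt[1]), 2)
print(d_acc, d_sv)  # 6.52 -25.55, matching the first row of TABLE II
```

A few small support-vector entries in TABLE II (e.g. 0.85 and 2.52 in the middle comparison) differ in sign from what this formula gives, so the correspondence should be taken as approximate.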
TABLE II. COMPARISON AMONG SVM(MFCC), SVM(DWTMFC-CT) AND FSVM(DWTMFC-CT)
(Acc = difference of accuracy in %; #SV = difference of the number of support vectors in %)

                  SVM(MFCC) vs        SVM(DWTMFC-CT) vs    SVM(MFCC) vs
                  SVM(DWTMFC-CT)      FSVM(DWTMFC-CT)      FSVM(DWTMFC-CT)
Category          Acc      #SV        Acc      #SV         Acc      #SV
C1 [b,d,g]         6.52    -25.55      1.95      0.85       8.60    -26.18
C2 [l,m,n,r]       3.60    -46.51      1.29     -6.53       4.93    -43.02
C3 [z,zh,j]        6.03    -18.85      1.37     -9.90       7.48    -10.82
C4 [p,t,k]         4.28    -25.04      2.61      2.52       7.01    -26.93
C5 [c,ch,q]        6.76    -32.03      3.02     -5.74       9.99    -28.13
C6 [f,h]           2.88    -27.72      1.02    -15.07       3.93    -16.83
C7 [s,sh,x]       10.51    -69.04      1.13     -5.53      11.76    -67.33
Average            5.80    -60.04      1.77     -3.97       7.67    -53.69

© 2011 ACADEMY PUBLISHER