
machine with Gaussian noise on a triangular fuzzy number space to forecast fuzzy nonlinear systems [11].

Rough set theory [12] is a powerful preprocessing tool for extracting knowledge from uncertain and incomplete data. It has been applied to support vector machines to reduce the features of the data to be processed and to eliminate redundancy; at the same time, it also improves the performance of the classical support vector machine. To deal with the overfitting problem of the traditional support vector machine, Zhang and Wang proposed a rough margin based support vector machine [13]. In this paper, we propose a double margin based fuzzy support vector machine that combines rough set theory with the fuzzy support vector machine, namely a double margin (rough margin) based fuzzy support vector machine (RFSVM). The proposed method not only inherits the characteristics of the FSVM method, but also accounts for the effect of training samples on the decision hyperplane according to their position within the rough margin. The presented method therefore further reduces overfitting due to noise or outliers.

This paper is organized as follows. In Section 2, a brief review of the support vector machine is given. In Section 3, we describe the proposed RFSVM in detail, covering both the binary classification and the multi-class classification versions of RFSVM. In the following section, we evaluate our method on benchmark data sets and compare it with existing support vector machines. Some conclusions are given in the final section.

II. SUPPORT VECTOR MACHINES ALGORITHM

In this section, we briefly describe support vector machines for binary classification problems.

Given a dataset of labeled training points $(x_1, y_1), (x_2, y_2), \ldots, (x_l, y_l)$, where $(x_i, y_i) \in R^N \times \{+1, -1\}$, $i = 1, 2, \ldots, l$, suppose the training data are linearly separable. That is to say, there is some hyperplane which correctly separates the positive examples from the negative examples. A point $x$ lying on the hyperplane satisfies $\langle w, x \rangle + b = 0$, where $w$ is normal to the hyperplane. In this case, the support vector machine algorithm finds the optimal separating hyperplane with the maximal margin. When the training data are linearly non-separable or only approximately separable, a trade-off parameter needs to be introduced. When the training data are not linearly separable, the support vector machine learning algorithm introduces the kernel strategy, which maps the input data into a higher-dimensional feature space $z$ by a nonlinear mapping function $\varphi(x)$ so that the data in the feature space $z$ are linearly or approximately linearly separable. All training data satisfy the following decision function:

$$f(x_i) = \operatorname{sign}(\langle w, x_i \rangle + b) = \begin{cases} +1, & \text{if } y_i = +1 \\ -1, & \text{if } y_i = -1 \end{cases} \quad (1)$$
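As a minimal illustration of the decision rule (1) (not part of the original paper), the following sketch evaluates $\operatorname{sign}(\langle w, x \rangle + b)$ for an arbitrary, hand-picked hyperplane; the values of `w` and `b` are placeholders.

```python
import numpy as np

def decision(w, b, x):
    """Linear SVM decision rule f(x) = sign(<w, x> + b), as in Eq. (1)."""
    return np.sign(np.dot(w, x) + b)

# Arbitrary example hyperplane and test points (illustration only).
w = np.array([1.0, -1.0])
b = -0.5
print(decision(w, b, np.array([2.0, 0.0])))   # lies on the +1 side
print(decision(w, b, np.array([0.0, 2.0])))   # lies on the -1 side
```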

All training points satisfy the following inequalities:

$$\begin{cases} \langle w, x_i \rangle + b \ge +1, & \text{if } y_i = +1 \\ \langle w, x_i \rangle + b \le -1, & \text{if } y_i = -1 \end{cases} \quad (2)$$

In fact, the above inequalities can be written as $y_i(\langle w, x_i \rangle + b) \ge 1$, $i = 1, 2, \ldots, l$. It is seen that finding the optimal hyperplane is equivalent to maximizing the margin by minimizing $\|w\|^2$ subject to constraints (2). So the primal optimization problem is given as

$$\min_{w, b} \; \frac{1}{2}\|w\|^2 \quad \text{s.t.} \quad y_i(\langle w, x_i \rangle + b) \ge 1, \; i = 1, 2, \ldots, l. \quad (3)$$
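For concreteness, the primal problem (3) can be handed to a general-purpose constrained optimizer. The sketch below uses SciPy's SLSQP solver on a hand-made linearly separable toy set; it is only an illustration of (3), not the solver used in this paper.

```python
import numpy as np
from scipy.optimize import minimize

# Toy linearly separable data (illustration only).
X = np.array([[2.0, 2.0], [3.0, 3.0], [-2.0, -2.0], [-3.0, -1.0]])
y = np.array([1.0, 1.0, -1.0, -1.0])

def objective(v):
    w = v[:2]                      # v = (w1, w2, b)
    return 0.5 * np.dot(w, w)      # (1/2)||w||^2

# One inequality constraint y_i(<w, x_i> + b) - 1 >= 0 per training point.
cons = [{'type': 'ineq',
         'fun': lambda v, i=i: y[i] * (np.dot(v[:2], X[i]) + v[2]) - 1.0}
        for i in range(len(y))]

res = minimize(objective, x0=np.zeros(3), constraints=cons, method='SLSQP')
w, b = res.x[:2], res.x[2]
print('w =', w, 'b =', b, 'margin =', 2.0 / np.linalg.norm(w))
```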

To solve this optimization problem, we introduce Lagrange multipliers to transform the primal problem (3) into its dual, which becomes the following quadratic programming (QP) problem:

$$\min_{\alpha} \; \frac{1}{2}\sum_{i=1}^{l}\sum_{j=1}^{l} \alpha_i \alpha_j y_i y_j (x_i \cdot x_j) - \sum_{i=1}^{l} \alpha_i \quad \text{s.t.} \quad \sum_{i=1}^{l} \alpha_i y_i = 0, \; 0 \le \alpha_i, \; i = 1, 2, \ldots, l. \quad (4)$$
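The dual problem (4) can likewise be solved numerically. The following sketch (again an illustration, not the paper's implementation) solves (4) with SciPy and then recovers $w = \sum_i \alpha_i y_i x_i$ and $b$ from a support vector.

```python
import numpy as np
from scipy.optimize import minimize

X = np.array([[2.0, 2.0], [3.0, 3.0], [-2.0, -2.0], [-3.0, -1.0]])
y = np.array([1.0, 1.0, -1.0, -1.0])
l = len(y)
G = (y[:, None] * X) @ (y[:, None] * X).T   # G_ij = y_i y_j (x_i . x_j)

def dual_objective(a):
    # (1/2) sum_ij a_i a_j y_i y_j (x_i . x_j) - sum_i a_i, as in Eq. (4)
    return 0.5 * a @ G @ a - a.sum()

cons = {'type': 'eq', 'fun': lambda a: np.dot(a, y)}    # sum_i a_i y_i = 0
bounds = [(0.0, None)] * l                              # 0 <= a_i

res = minimize(dual_objective, x0=np.zeros(l), bounds=bounds,
               constraints=[cons], method='SLSQP')
alpha = res.x
w = ((alpha * y)[:, None] * X).sum(axis=0)              # w = sum_i a_i y_i x_i
sv = np.argmax(alpha)                                   # index of a support vector
b = y[sv] - np.dot(w, X[sv])                            # from y_sv(<w, x_sv> + b) = 1
print('alpha =', alpha.round(3), 'w =', w, 'b =', b)
```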

In the classifier, the solution in feature space obtained with the nonlinear mapping function $\varphi(x)$ simply replaces the dot product $x \cdot x_j$ by the inner product $\varphi(x) \cdot \varphi(x_j)$. The mapping functions $\varphi(x)$ and $\varphi(x_i)$ satisfy $\langle \varphi(x), \varphi(x_i) \rangle = K(x, x_i)$, where $K(x, x_i)$ is called the kernel function. In real-world applications, we never need to know $\varphi$ explicitly. The SVM decision function is obtained by computing dot products with a given test point $x$, or more specifically by computing the following sign:
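As an illustration of this kernel trick, a common choice of $K$ is the Gaussian (RBF) kernel $K(x, z) = \exp(-\gamma \|x - z\|^2)$; the sketch below builds the kernel (Gram) matrix for a toy data set without ever forming $\varphi$ explicitly. The value of $\gamma$ is arbitrary.

```python
import numpy as np

def rbf_kernel(x, z, gamma=0.5):
    """Gaussian kernel K(x, z) = exp(-gamma * ||x - z||^2),
    which equals <phi(x), phi(z)> for an implicit feature map phi."""
    return np.exp(-gamma * np.sum((x - z) ** 2))

# Kernel (Gram) matrix over a toy data set: only K is ever evaluated,
# the mapping phi itself is never computed.
X = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 2.0]])
K = np.array([[rbf_kernel(xi, xj) for xj in X] for xi in X])
print(K.round(3))
```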

$$f(x) = \sum_{i=1}^{N_s} \alpha_i^* y_i (s_i \cdot x) + b = \sum_{i=1}^{N_s} \alpha_i^* y_i \,\varphi(s_i) \cdot \varphi(x) + b = \sum_{i=1}^{N_s} \alpha_i^* y_i K(s_i, x) + b, \quad (5)$$

where the coefficients $\alpha_i^*$ are positive, the $s_i$ are the support vectors, and $N_s$ is the number of support vectors.
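Once the multipliers $\alpha_i^*$, the support vectors $s_i$, their labels $y_i$, and $b$ are known, (5) is evaluated as a simple kernel sum. The sketch below uses placeholder support vectors and multipliers purely for illustration; they are not taken from any trained model.

```python
import numpy as np

def rbf_kernel(x, z, gamma=0.5):
    return np.exp(-gamma * np.sum((x - z) ** 2))

def svm_decision(x, support_vectors, sv_labels, sv_alphas, b, kernel=rbf_kernel):
    """Evaluate Eq. (5): sign(sum_i alpha_i* y_i K(s_i, x) + b)."""
    s = sum(a * yi * kernel(si, x)
            for a, yi, si in zip(sv_alphas, sv_labels, support_vectors))
    return np.sign(s + b)

# Placeholder support set (illustration only).
S = np.array([[1.0, 1.0], [-1.0, -1.0]])
y_s = np.array([1.0, -1.0])
alpha_s = np.array([0.7, 0.7])
b = 0.0
print(svm_decision(np.array([0.8, 1.2]), S, y_s, alpha_s, b))
```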

In most cases, requiring a perfectly separating hyperplane is too restrictive to be of practical use, since the classes may overlap considerably and no separating hyperplane exists. To deal with linearly non-separable data, some points are allowed to be misclassified by introducing nonnegative slack variables $\xi_i \ge 0$, which measure the degree of misclassification, and a penalty parameter $C$, which controls the trade-off between maximizing the margin and minimizing the classification error on the training data. The sum of the slacks $\sum_i \xi_i$ is an upper bound on the number of training errors. The original constraints (2) are relaxed to

$$y_i(\langle w, x_i \rangle + b) \ge 1 - \xi_i, \quad i = 1, 2, \ldots, l. \quad (6)$$
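For a fixed hyperplane $(w, b)$, the smallest slack satisfying (6) is $\xi_i = \max(0, 1 - y_i(\langle w, x_i \rangle + b))$, i.e. the hinge loss. A minimal sketch with an arbitrary hyperplane chosen only for illustration:

```python
import numpy as np

def slacks(w, b, X, y):
    """Smallest slack values satisfying Eq. (6) for a fixed (w, b)."""
    margins = y * (X @ w + b)
    return np.maximum(0.0, 1.0 - margins)   # xi_i = max(0, 1 - y_i(<w, x_i> + b))

X = np.array([[2.0, 0.0], [0.5, 0.0], [-1.0, 0.0]])
y = np.array([1.0, 1.0, -1.0])
w, b = np.array([1.0, 0.0]), 0.0            # arbitrary hyperplane
xi = slacks(w, b, X, y)
print('slacks:', xi, 'slack sum:', xi.sum())
```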

Thus, constructing the optimal hyperplane is equivalent to solving the following optimization problem:

$$\min_{w, b, \xi} \; \frac{1}{2}\|w\|^2 + C\sum_{i=1}^{l} \xi_i \quad \text{s.t.} \quad y_i(\langle w, \varphi(x_i) \rangle + b) \ge 1 - \xi_i, \; \xi_i \ge 0, \; i = 1, 2, \ldots, l. \quad (7)$$
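Problem (7) is the form optimized by off-the-shelf soft-margin SVM solvers. As a hedged sketch (not the experimental setup of this paper), scikit-learn's SVC fits a model of this kind; the choices of C, kernel, and gamma below are arbitrary.

```python
import numpy as np
from sklearn.svm import SVC

# Toy, slightly overlapping two-class data (illustration only).
rng = np.random.RandomState(0)
X = np.vstack([rng.randn(20, 2) + [2, 2], rng.randn(20, 2) - [2, 2]])
y = np.hstack([np.ones(20), -np.ones(20)])

# C is the trade-off between a large margin and the total slack sum in (7).
clf = SVC(C=1.0, kernel='rbf', gamma=0.5).fit(X, y)
print('number of support vectors:', clf.support_vectors_.shape[0])
print('prediction for [0, 0]:', clf.predict([[0.0, 0.0]]))
```

Larger values of C penalize the slacks more heavily and yield a narrower margin; smaller values tolerate more misclassified training points in exchange for a wider margin.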
