
The feasibility of this approach is examined on a test function and on nonlinear under-actuated systems.

This paper is organized as follows. The LS-SVM regression algorithm is briefly reviewed in Section 2. The parameter optimization algorithm based on the CAS algorithm is addressed in Section 3. The results of testing and simulation are presented in Section 4 to demonstrate the effectiveness of the proposed method. The application of the LS-SVM based on the CAS algorithm is given in Section 5. Finally, the paper is concluded in Section 6.

II. LS-SVM REGRESSION

The LS-SVM, which evolved from the SVM, replaces the inequality constraints of the SVM with equality constraints and adopts the sum of squared errors (SSE) as the empirical loss function over the training set. The problem then reduces to solving a set of linear equations. This can be described more specifically as follows [4]:

Given the following training sample set $D$:
$$D = \{(\mathbf{x}_k, y_k) \mid k = 1, 2, \ldots, N\}$$
where $N$ is the total number of training data pairs, $\mathbf{x}_k \in \mathbb{R}^n$ is the regression vector and $y_k \in \mathbb{R}$ is the output. According to SVM theory, the input space $\mathbb{R}^n$ is mapped into a feature space, and the linear equation in the feature space can be defined as
$$f(\mathbf{x}) = \mathbf{w}^T \varphi(\mathbf{x}) + b \qquad (1)$$

where the nonlinear mapping $\varphi : \mathbb{R}^n \rightarrow \mathbb{R}^h$ maps the input data into a so-called high-dimensional feature space (which can be of infinite dimension). The regularized cost function of the LS-SVM is given as
$$\min_{\mathbf{w}, e} J(\mathbf{w}, e) = \frac{1}{2}\mathbf{w}^T\mathbf{w} + \frac{\gamma}{2}\sum_{k=1}^{N} e_k^2$$
$$\text{s.t.} \quad y_k = \mathbf{w}^T \varphi(\mathbf{x}_k) + b + e_k, \quad k = 1, 2, \ldots, N \qquad (2)$$
where $\mathbf{w} \in \mathbb{R}^h$ is the weight vector, $e_k \in \mathbb{R}$ is a slack variable, $b \in \mathbb{R}$ is a bias term and $\gamma \in \mathbb{R}$ is the regularization parameter. The Lagrangian corresponding to Eq. (2) can be defined as follows:

$$L(\mathbf{w}, b, e; \alpha) = J(\mathbf{w}, e) - \sum_{k=1}^{N} \alpha_k \left\{ \mathbf{w}^T \varphi(\mathbf{x}_k) + b + e_k - y_k \right\} \qquad (3)$$
where $\alpha_k \in \mathbb{R}$ $(k = 1, 2, \ldots, N)$ are the Lagrange multipliers.

The KKT conditions can be expressed by
$$\mathbf{w} = \sum_{k=1}^{N} \alpha_k \varphi(\mathbf{x}_k) \qquad (4)$$
$$\alpha_k = \gamma e_k \qquad (5)$$
$$\sum_{k=1}^{N} \alpha_k = 0 \qquad (6)$$
$$\mathbf{w}^T \varphi(\mathbf{x}_k) + b + e_k - y_k = 0 \qquad (7)$$

After elimination of $\mathbf{w}$ and $e_k$, the solution of the optimization problem can be obtained by solving the following set of linear equations:
$$\begin{bmatrix} 0 & \mathbf{1}_v^T \\ \mathbf{1}_v & \Omega + \gamma^{-1} I \end{bmatrix} \begin{bmatrix} b \\ \alpha \end{bmatrix} = \begin{bmatrix} 0 \\ \mathbf{y} \end{bmatrix} \qquad (8)$$
with $\mathbf{y} = [y_1, \ldots, y_N]^T \in \mathbb{R}^N$, $\mathbf{1}_v = [1, \ldots, 1]^T \in \mathbb{R}^N$, $\alpha = [\alpha_1, \ldots, \alpha_N]^T$, and $\Omega$ the $N \times N$ kernel matrix.

By using the kernel trick [2], one obtains
$$\Omega_{kl} = \varphi(\mathbf{x}_k)^T \varphi(\mathbf{x}_l) = K(\mathbf{x}_k, \mathbf{x}_l), \quad \forall k, l = 1, 2, \ldots, N$$
and the resulting LS-SVM regression model becomes
$$f(\mathbf{x}) = \sum_{k=1}^{N} \alpha_k K(\mathbf{x}, \mathbf{x}_k) + b \qquad (9)$$
where $\alpha_k$ and $b$ are the solution of Eq. (8).

Note that the dot product $\varphi(\cdot)^T \varphi(\cdot)$ in the feature space is replaced by a prechosen kernel function $K(\cdot, \cdot)$ through the kernel trick. Thus, there is no need to construct the feature vector $\mathbf{w}$ or to know the nonlinear mapping $\varphi(\cdot)$ explicitly. Given a training set, training an LS-SVM amounts to solving the set of linear equations in Eq. (8), which greatly simplifies the regression problem.
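As an illustration of this reduction, the following Python/NumPy sketch (not taken from the paper; the function names are illustrative assumptions and a Gaussian RBF kernel is assumed) assembles the system of Eq. (8), solves it for $\alpha$ and $b$, and then evaluates the regression model of Eq. (9):

```python
import numpy as np

def rbf_kernel(X1, X2, sigma):
    """Gaussian RBF kernel: K(x_k, x_l) = exp(-||x_k - x_l||^2 / (2 sigma^2))."""
    sq_dists = (np.sum(X1 ** 2, axis=1)[:, None]
                + np.sum(X2 ** 2, axis=1)[None, :]
                - 2.0 * X1 @ X2.T)
    return np.exp(-sq_dists / (2.0 * sigma ** 2))

def lssvm_train(X, y, gamma, sigma):
    """Build and solve the (N+1) x (N+1) linear system of Eq. (8) for b and alpha."""
    N = X.shape[0]
    Omega = rbf_kernel(X, X, sigma)                # N x N kernel matrix
    A = np.zeros((N + 1, N + 1))
    A[0, 1:] = 1.0                                 # 1_v^T
    A[1:, 0] = 1.0                                 # 1_v
    A[1:, 1:] = Omega + np.eye(N) / gamma          # Omega + gamma^{-1} I
    rhs = np.concatenate(([0.0], y))               # [0; y]
    sol = np.linalg.solve(A, rhs)
    return sol[1:], sol[0]                         # alpha, b

def lssvm_predict(X_train, alpha, b, sigma, X_test):
    """Evaluate Eq. (9): f(x) = sum_k alpha_k K(x, x_k) + b."""
    return rbf_kernel(X_test, X_train, sigma) @ alpha + b
```

For moderate $N$ the dense solve is inexpensive; the point is that training requires a single linear solve rather than the quadratic programming step of a standard SVM.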

The chosen kernel function must satisfy Mercer's condition [2]. Possible kernel functions include:

Linear kernel: $K(\mathbf{x}_k, \mathbf{x}_l) = \mathbf{x}_k \cdot \mathbf{x}_l$.

Polynomial kernel: $K(\mathbf{x}_k, \mathbf{x}_l) = (\mathbf{x}_k \cdot \mathbf{x}_l + 1)^d$.

Gaussian RBF kernel: $K(\mathbf{x}_k, \mathbf{x}_l) = \exp\left(-\left\|\mathbf{x}_k - \mathbf{x}_l\right\|^2 / 2\sigma^2\right)$.

where $d$ denotes the polynomial degree and $\sigma$ is the kernel (bandwidth) parameter.
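Written out for single sample vectors, these three kernels look as follows (a minimal sketch; the default values d=3 and sigma=1.0 are arbitrary placeholders, not values used in the paper):

```python
import numpy as np

def linear_kernel(xk, xl):
    """Linear kernel: K(x_k, x_l) = x_k . x_l."""
    return np.dot(xk, xl)

def polynomial_kernel(xk, xl, d=3):
    """Polynomial kernel: K(x_k, x_l) = (x_k . x_l + 1)^d."""
    return (np.dot(xk, xl) + 1.0) ** d

def gaussian_rbf_kernel(xk, xl, sigma=1.0):
    """Gaussian RBF kernel: K(x_k, x_l) = exp(-||x_k - x_l||^2 / (2 sigma^2))."""
    return np.exp(-np.sum((xk - xl) ** 2) / (2.0 * sigma ** 2))
```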

It is well known that the generalization performance of an LS-SVM depends on a good setting of the regularization parameter and the kernel parameter. To achieve better generalization performance, it is necessary to select and optimize these parameters.
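As a crude illustration of this sensitivity (and only as a baseline, not the CAS-based method introduced in Section 3), the two parameters could be tuned by a hold-out grid search, reusing the hypothetical lssvm_train / lssvm_predict helpers from the sketch above; the candidate grids are arbitrary:

```python
import numpy as np

def grid_search_lssvm(X_tr, y_tr, X_val, y_val,
                      gammas=(0.1, 1.0, 10.0, 100.0),
                      sigmas=(0.1, 0.5, 1.0, 2.0)):
    """Try each (gamma, sigma) pair and keep the one with the lowest validation MSE."""
    best = None
    for gamma in gammas:
        for sigma in sigmas:
            alpha, b = lssvm_train(X_tr, y_tr, gamma, sigma)
            y_hat = lssvm_predict(X_tr, alpha, b, sigma, X_val)
            mse = np.mean((y_hat - y_val) ** 2)
            if best is None or mse < best[0]:
                best = (mse, gamma, sigma)
    return best  # (validation MSE, best gamma, best sigma)
```

Such an exhaustive search scales poorly as the candidate grids grow, which motivates the CAS-based optimization described next.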

III. PARAMETER OPTIMIZATION OF LS-SVM BASED ON THE CAS ALGORITHM
