
Synopsis
of the Ph.D. thesis on
Rule Extraction from Support Vector Machine

Submitted by
Mohammad Abdul Haque Farquad
Reg. No: 04MCPC03
for the degree of
Doctor of Philosophy

Under the Guidance of
Prof. S. Bapi Raju, University of Hyderabad, Hyderabad
Dr. V. Ravi, IDRBT, Hyderabad

Submitted to the
Department of Computer and Information Sciences
University of Hyderabad
Hyderabad, Andhra Pradesh, India


Abstract

Although Support Vector Machines (SVMs) have been used to develop highly accurate classification and regression models in various real-world problem domains, their most significant limitation is that they generate models that are difficult to understand. The procedure of converting these opaque models into transparent ones is called rule extraction. This thesis investigates the task of extracting comprehensible models from trained SVMs, thereby alleviating this limitation. The primary contribution of the thesis is a set of hybrid algorithms that address this limitation of SVMs by taking a novel approach to extracting comprehensible models, and it investigates several ways of extracting the knowledge learnt by an SVM during training. The basic contribution of the thesis is to extract rules both using SVM and from SVM. In rule extraction using SVM, the SVM serves only as a pre-processor: the support vectors are extracted, resulting in the Case-SA dataset. In rule extraction from SVM, the trained SVM is used to predict the targets of the support vector instances and of the training instances, giving two further variants, Case-SP and Case-P, respectively. The modified data is thus a replica of the knowledge learnt by the SVM during training.

This thesis also investigates the efficiency of the proposed rule extraction approach in solving the problem of bankruptcy prediction in banks. Bankruptcy is a legally declared inability, or impaired ability, of a debtor to pay its creditors. Bankruptcy prediction in banks and corporate firms is one of the most researched areas in statistics and machine learning, and bank management is naturally interested in the comprehensibility of the algorithms used for prediction. We extracted fuzzy rules for bankruptcy prediction problems using fuzzy rule based systems, and the efficiency of the fuzzy rules is compared with that of rules extracted using a Decision Tree. Further, this thesis investigates the efficiency of rules extracted by the proposed approaches in solving real-world data mining problems. In such applications, almost all, or more than 90%, of the instances belong to one class, while very few instances belong to the other class, which is usually the more important one; in that sense the datasets are termed unbalanced. The class imbalance problem has been an evolving topic of research in data mining, and it is observed from the literature that machine learning techniques tend to be biased towards the majority class, producing poor prediction accuracy on the minority class. We proposed a rule extraction approach to extract rules for solving these problems as well. Furthermore, this thesis presents, for the first time, a rule extraction approach for solving regression problems, in which the Adaptive Network based Fuzzy Inference System, the Dynamic Evolving Fuzzy Inference System and the Classification and Regression Tree are employed for rule generation. Finally, modifications to the Active Learning Based Approach of Martens et al. (2009) are proposed, in which extra instances are generated using various distributions such as the Uniform, Normal and Logistic distributions. Data mining problems such as churn prediction among bank credit card customers and fraud detection in insurance are solved using the resulting mALBA.

1. Introduction<br />

Artificial neural networks (ANNs) and SVMs are amongst the most successful machine learning techniques used in the area of data mining. However, they produce black-box models that are difficult for the end user to understand: these models do not explicitly reveal the knowledge learnt by them during the training phase. Predictive accuracy and comprehensibility are the two main factors used to evaluate any learning system, and the learning method that constructs the model with the best predictive accuracy is not necessarily the one that produces the most comprehensible model. This thesis explores the following question: can we take the incomprehensible model produced by an SVM and closely approximate it in a language that better facilitates comprehensibility?

1.1 Motivation

The process of converting opaque models (the SVM in our research) into transparent models is often called rule extraction. Using the extracted rules, one can understand much better how a prediction is made. Rule extraction from SVMs follows in the footsteps of earlier efforts to obtain human-comprehensible rules from ANNs in order to explain the knowledge learnt by an ANN during training. Much attention has been paid over the last decades to finding effective ways of extracting rules from ANNs, whereas far less work has been reported on representing the knowledge learnt by an SVM during training.

1.2 Significance of rule extraction

Andrews et al. (1995) presented the motivation behind rule extraction from neural networks. A brief overview of their study helps establish the aim and significance of rule extraction from SVMs.

Extracted rules provide the user with an explanation capability for the opaque model from which they are extracted. Gallant (1988) reported that rule extraction enabled a novice user to gain more insight into the problem at hand. Davis et al. (1977) and Gilbert (1989) argue that even limited explanation can positively influence the acceptance of a system by its users.

Rule extraction procedures make the internal states of a system transparent. Transparency means that the internal states of the machine learning system are both accessible and interpretable without ambiguity. Such a capability is mandatory for safety-critical applications such as air traffic control, the operation of power plants and medical applications.

Rule extraction improves the generalisation ability of the model. It is difficult to determine if and when generalisation fails for specific cases, even with evaluation methods such as cross-validation. By expressing the learned knowledge as a set of rules, an experienced user can anticipate or predict a generalisation failure.

A rule extraction system might discover salient features in the input data whose importance was not previously recognised, and new scientific theories can be induced (Craven and Shavlik, 1994).

1.3 Rule Quality

The quality of the extracted rules is a key measure of the success of a rule extraction algorithm. Four rule quality criteria have been suggested for rule extraction algorithms (Andrews et al., 1995; Tickle et al., 1998): accuracy, fidelity, consistency and comprehensibility. In this context, a rule set is considered accurate if it can correctly classify previously unseen examples.

Accuracy = (Number of test patterns correctly classified by the rules / Total number of patterns in the test data) × 100


Similarly, a rule set is considered to display a high level of fidelity if it can mimic the behaviour of the machine learning technique from which it was extracted.

Fidelity = Number of patterns where the classification by the rules agrees with the classification by the SVM / Total number of patterns in the data

An extracted rule set is deemed consistent if, under different training sessions, the machine learning technique generates rule sets that produce the same classifications of unseen examples. Finally, the comprehensibility of a rule set is determined by measuring the size of the rule set (in terms of the number of rules) and the number of antecedents per rule.
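The following minimal sketch (in Python, with illustrative function names and NumPy assumed available) shows how these quality measures can be computed once the rule predictions, SVM predictions and true labels are at hand.

```python
import numpy as np

def rule_accuracy(rule_preds, true_labels):
    """Percentage of test patterns correctly classified by the extracted rules."""
    rule_preds, true_labels = np.asarray(rule_preds), np.asarray(true_labels)
    return 100.0 * np.mean(rule_preds == true_labels)

def rule_fidelity(rule_preds, svm_preds):
    """Fraction of patterns where the rules agree with the SVM's classification."""
    rule_preds, svm_preds = np.asarray(rule_preds), np.asarray(svm_preds)
    return np.mean(rule_preds == svm_preds)

def rule_comprehensibility(rule_set):
    """Number of rules and average number of antecedents per rule."""
    n_rules = len(rule_set)
    avg_antecedents = sum(len(rule) for rule in rule_set) / max(n_rules, 1)
    return n_rules, avg_antecedents
```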

1.4 Experimental Setup

The empirical analysis in this thesis is carried out in a slightly different fashion. We first divided each dataset in an 80:20 ratio; the 20% portion is named the validation set and is set aside for later use. Ten-fold cross-validation is then performed on the remaining 80% of the data for training and rule extraction, and the efficiency of the rules is later evaluated against the validation set. Figure 1 presents the experimental setup followed throughout the research work presented in this thesis. In this class of research this experimental setup is unique and is itself one of my contributions.

Figure 1: Experimental Setup Followed in this Thesis. (The full dataset is split 80:20; 10-fold cross-validation is run on the 80% portion, and the rules extracted during cross-validation are later tested against the 20% validation set.)
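A minimal sketch of this setup in Python with scikit-learn is given below; the Iris data and the random seed are illustrative stand-ins, and the rule extraction step inside the loop is left as a placeholder.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import StratifiedKFold, train_test_split

X, y = load_iris(return_X_y=True)  # stand-in for any dataset analysed in the thesis

# Hold out 20% as the validation set; the remaining 80% feeds 10-fold cross-validation.
X_cv, X_val, y_cv, y_val = train_test_split(X, y, test_size=0.20,
                                            stratify=y, random_state=42)

skf = StratifiedKFold(n_splits=10, shuffle=True, random_state=42)
for train_idx, test_idx in skf.split(X_cv, y_cv):
    X_train, y_train = X_cv[train_idx], y_cv[train_idx]
    X_test, y_test = X_cv[test_idx], y_cv[test_idx]
    # ... train the SVM, extract rules, evaluate on this fold's test split ...

# The rules extracted during cross-validation are finally evaluated on (X_val, y_val).
```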

2. Research Objective

In this thesis, I present and evaluate novel algorithms for the task of extracting comprehensible descriptions from SVMs. The hypothesis advanced by this research is that it is possible to develop algorithms for extracting symbolic descriptions from trained SVMs that:

(i) produce more comprehensible, high-fidelity descriptions of trained SVMs using fuzzy logic approaches;

(ii) employ various intelligent techniques with explanation capability, viz. FRBS, CART, ANFIS, DENFIS and NBTree, for rule generation;

(iii) extend rule extraction to regression problems as well; and

(iv) scale to medium-scale and unbalanced datasets.

Applications tested for solving classification problems include benchmark datasets (Iris, Wine and WBC); bankruptcy prediction in banks using Spanish, Turkish, US and UK banks data; and analytical CRM applications, viz. churn prediction in bank credit card customers and insurance fraud detection.

The regression datasets analysed during the research study in this thesis include Auto MPG, Body Fat, Boston Housing, Forest Fires and Pollution.

3. Rule Extraction from SVM

Translucency refers to the extent to which the internal structure of the underlying model is utilized by the rule extraction algorithm. Based on the translucency criterion, rule extraction techniques are classified into two major categories, decompositional and pedagogical; a third category, eclectic (i.e. hybrid), incorporates elements of both the decompositional and pedagogical approaches.

3.1 Decompositional Approach

A decompositional approach is closely intertwined with the internal workings of the SVM and its constructed hyperplane. Nunez et al. (2002) proposed a decompositional rule extraction approach wherein prototypes obtained from the k-means clustering algorithm are combined with the support vectors of the SVM and rules are then extracted. The k-means clustering algorithm is used to determine prototype vectors for each input class; an ellipsoid is defined in the input space by combining these prototypes with the support vectors and is then mapped to an if-then rule. The main drawback of this algorithm is that the extracted rules are neither exclusive nor exhaustive, which results in conflicting or missing rules for the classification of new data instances. RulExtSVM (Fu et al., 2004) extracts if-then rules using intervals defined by hyperrectangles, which are generated from the intersection of the support vectors with the decision boundary. The disadvantage of this algorithm is that the construction of hyperrectangles depends on the number of support vectors.

Hyperrectangle Rules Extraction (HRE) (Zhang et al., 2005) first constructs hyperrectangles according to the prototypes and the support vectors; these hyperrectangles are then projected onto the coordinate axes and if-then rules are formed. Fung et al. (2005) proposed a rule extraction technique similar to SVM+Prototype but without the computationally expensive clustering: the algorithm transforms the problem into a simpler, equivalent variant and constructs hypercubes by solving linear programming problems, each hypercube then being transformed into a rule. Chaves et al. (2005) proposed a decompositional Fuzzy Rule Extraction (FREx) approach, which applies a triangular fuzzy membership function and determines the projection of the support vectors onto the coordinate axes; each support vector is then transformed into a fuzzy if-then rule.

Barakat and Bradley (2007) proposed a modified sequential covering algorithm, termed SQRex-SVM, to extract rules directly from the support vectors. Rule set performance is then evaluated using true positives (TPs), false positives (FPs) and the AUC. A Multiple Kernel Support Vector Machine (MK-SVM) scheme (Chen et al., 2007) was proposed for feature selection, rule extraction and prediction modelling, and the extracted rules were tested for predicting cancerous tissue in gene expression data; it is observed that rules extracted using MK-SVM improve the explanation capability of the SVM. More recently, Martens et al. (2009) proposed an active learning-based approach (ALBA) to extract rules from SVM models. ALBA makes use of the support vectors, which are typically close to the decision boundary, to generate additional samples, and extracts rules from all the samples labelled by the trained SVM.



3.1.1 Gaps Observed

It is observed from the literature that researchers have focused on extracting rules for solving benchmark classification problems only, and that only the Decision Tree algorithm was employed for generating rules.

The potential of fuzzy logic for extracting fuzzy rules from SVM was ignored in earlier research.

The importance of the support vectors extracted by the SVM was totally ignored.

Generating extra instances for small-scale problems may improve accuracy, but the additional instances may also lead to a larger number of rules, which in turn affects the comprehensibility of the extracted rules. Moreover, the efficiency of ALBA (Martens et al., 2009) was analysed using benchmark and small datasets only.

3.1.2 Proposed approaches and contributions

First, we proposed a novel hybrid fuzzy rule extraction approach that uses an SVM and a Fuzzy Rule Based System in tandem. The proposed hybrid rule extraction approach consists of two major steps. In the first step, the SVM is trained and the support vectors are extracted, resulting in the Case-SA dataset (i.e. the support vector set with the corresponding actual target values). In the second step, FRBS and DT are employed separately to generate rules. The proposed approach is first applied to the benchmark Iris and Wine datasets and later tested on bankruptcy prediction in banks. The Spanish, Turkish and US banks datasets are analysed, and it is observed that the proposed hybrid fuzzy rule extraction approach not only generates fuzzy rules but also improves generalisation without compromising the accuracy of the system. The proposed approach yielded the best accuracy of 92.31% on the Spanish banks data, and stood second among the classifiers tested with 87.5% accuracy on the Turkish banks data and 96.15% accuracy on the US banks data. Figure 2 presents the overall architecture of the approaches proposed in this thesis; the feature selection step is invoked only where it is explicitly mentioned in a proposed approach. In this particular approach, only the full-feature data is analysed.
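A minimal sketch of this two-step pipeline, assuming scikit-learn and using a decision tree as a stand-in for the FRBS rule generator, might look as follows; the kernel, tree depth and the Iris data are illustrative choices only.

```python
from sklearn.datasets import load_iris
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier, export_text

data = load_iris()
X, y = data.data, data.target

# Step 1: train the SVM and keep only the support vectors together with their
# actual target values (the Case-SA dataset).
svm = SVC(kernel="rbf", C=1.0, gamma="scale").fit(X, y)
X_sa, y_sa = X[svm.support_], y[svm.support_]

# Step 2: generate rules from Case-SA with a transparent learner
# (a decision tree here, standing in for the FRBS/DT used in the thesis).
tree = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X_sa, y_sa)
print(export_text(tree, feature_names=list(data.feature_names)))
```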

3.2 Pedagogical Approach

A pedagogical algorithm considers the trained model as a black box. Instead of looking at the internal structure, these algorithms directly extract rules that relate the inputs and outputs of the SVM. They typically use the trained SVM model as an oracle to label or classify artificially generated training examples, which are later used by a symbolic learning algorithm. The idea behind these techniques is the assumption that the trained model can represent the data better than the original dataset. Trepan (Craven and Shavlik, 1996) and REX (Markowska-Kaczmar and Trelak, 2003) are among the pedagogical approaches used for rule extraction from ANNs.
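The oracle idea can be sketched as follows in Python with scikit-learn; the uniform sampling range, the number of artificial examples and the decision tree surrogate are assumptions for illustration, not the method of any specific paper cited above.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
rng = np.random.default_rng(0)

# The trained SVM acts as the oracle.
svm = SVC(kernel="rbf", gamma="scale").fit(X, y)

# Generate artificial examples uniformly within the observed feature ranges
# and let the oracle label them.
X_art = rng.uniform(X.min(axis=0), X.max(axis=0), size=(500, X.shape[1]))
y_art = svm.predict(X_art)

# A symbolic learner then mimics the input-output behaviour of the SVM.
surrogate = DecisionTreeClassifier(max_depth=3, random_state=0)
surrogate.fit(np.vstack([X, X_art]), np.concatenate([svm.predict(X), y_art]))
```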

3.2.1 Gaps Observed

Researchers argue that when the dataset is modified with the predictions of the SVM, the resulting modified data represents the knowledge of the SVM. In this category, researchers have mostly analysed the SVM's efficiency for prediction, whereas its efficiency for feature selection was totally ignored.

Further, rule extraction from SVM for solving regression problems had never been reported in the literature, nor had the efficiency of SVM-based feature selection for regression problems been studied in earlier research.

3.2.2 Proposed approaches and contributions

Next, we proposed a hybrid rule extraction algorithm for solving both classification and regression problems. Feature selection using SVM-RFE is employed first, the actual target values of the training instances are replaced by the predictions of the SVM/SVR model, and Case-P datasets (i.e. training instances with the corresponding predicted target values) are generated. Rules are then extracted from the Case-P dataset with the reduced features. For classification, benchmark problems (Iris, Wine and WBC) and bankruptcy prediction problems (Spanish, Turkish, US and UK banks) are analysed. It is observed that the reduced features lower the complexity of the system and increase the comprehensibility of the rules. For regression, the efficiency of the rules is evaluated on benchmark regression problems. Empirical results show that the accuracy yielded by the proposed approach, i.e. with fewer features, is better than that obtained using the full-feature data. The number of rules extracted using the reduced features is considerably smaller, resulting in better comprehensibility of the black-box model from which they are extracted. The architecture of the proposed approaches is shown in Figure 2 below.

Figure 2: Overall Architecture of the Proposed Rule Extraction Approaches. (Phase 1: optional feature selection with SVM-RFE reduces the full-attribute dataset to a reduced-attribute dataset. Phase 2: an SVM/SVR model supplies the support vectors and the modified datasets Case-A, Case-SA, Case-SP and Case-P. Phase 3: DT/FRBS/CART/ANFIS/DENFIS/NBTree generate rules, whose predictions are evaluated on the test set.)

Note: Case-A represents the training set with the corresponding actual target values.
Case-P represents the training set with the corresponding predicted target values.
Case-SA represents the support vector set with the corresponding actual target values.
Case-SP represents the support vector set with the corresponding predicted target values.
The blue-coloured processes in Figure 2 represent our contributions.



We employed feature selection using SVM-RFE and carried out the empirical analysis on the reduced-feature data as well. It is observed that the rules extracted from the reduced-feature data are fewer in number and shorter in length, resulting in improved comprehensibility of the extracted rules.
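A minimal sketch of the SVM-RFE plus Case-P pipeline with scikit-learn is shown below; the breast cancer data stands in for the WBC dataset, and the number of selected features, kernels and tree depth are illustrative assumptions.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.feature_selection import RFE
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)  # stand-in for the WBC data

# Phase 1: feature selection with SVM-RFE (a linear SVM's weights rank the features).
rfe = RFE(estimator=SVC(kernel="linear"), n_features_to_select=10).fit(X, y)
X_red = rfe.transform(X)

# Phase 2: train an SVM on the reduced features and replace the actual targets
# with its predictions, giving the Case-P dataset.
svm = SVC(kernel="rbf", gamma="scale").fit(X_red, y)
y_case_p = svm.predict(X_red)

# Phase 3: extract rules from Case-P with a transparent learner.
tree = DecisionTreeClassifier(max_depth=4, random_state=0).fit(X_red, y_case_p)
```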

3.3 Eclectic Approach

Eclectic rule extraction techniques incorporate elements of both the decompositional and pedagogical approaches (Andrews et al., 1995; Barakat and Diederich, 2004, 2005). A hybrid rule extraction technique was proposed by Barakat and Diederich (2004, 2005): after developing the SVM model using the training set, they used the developed model to predict the output class labels for the training instances and the support vectors, and then used a decision tree to generate rules. The quality of the extracted rules is measured using the AUC (area under the Receiver Operating Characteristic curve) (Barakat and Bradley, 2006). They extracted crisp rules from the data.

3.3.1 Gaps Observed

The efficiency of regression rules extracted using SVM had not been analysed before, and no rule extraction procedure had been proposed for solving regression problems.

The SVM's efficiency for feature selection was also ignored in this category of rule extraction approaches. The number of rules extracted using all the features of a dataset is huge, resulting in a less comprehensible system. Only benchmark problems were solved and analysed in previous research.

The efficiency of rules extracted from SVM for solving unbalanced, medium-scale data mining problems had never been studied or reported earlier.

3.3.2 Proposed approaches and contributions

In this category, we proposed a hybrid rule extraction algorithm for solving regression problems. For extracting rules we employed the CART, ANFIS and DENFIS algorithms separately. The proposed regression rule extraction approach consists of three major steps (a brief code sketch follows the list):

(i) The SVR model is trained and the support vectors are extracted.

(ii) The actual target values of the extracted support vectors are replaced by the corresponding predictions of the developed SVR model, resulting in the Case-SP dataset (i.e. the support vector set with the corresponding predicted target values).

(iii) This modified data is then used to generate rules using CART, ANFIS and DENFIS.
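The sketch below illustrates these three steps with scikit-learn; the diabetes dataset stands in for the benchmark regression data used in the thesis, CART is realised as a depth-limited regression tree, and all hyperparameters are illustrative.

```python
from sklearn.datasets import load_diabetes
from sklearn.svm import SVR
from sklearn.tree import DecisionTreeRegressor

X, y = load_diabetes(return_X_y=True)  # stand-in for Auto MPG, Body Fat, etc.

# (i) Train the SVR model and extract its support vectors.
svr = SVR(kernel="rbf", C=1.0, epsilon=0.1).fit(X, y)
X_sv = svr.support_vectors_

# (ii) Replace the actual targets of the support vectors with the SVR's
#      predictions, producing the Case-SP dataset.
y_sp = svr.predict(X_sv)

# (iii) Generate regression rules from Case-SP (CART here, standing in for the
#       CART/ANFIS/DENFIS learners used in the thesis).
cart = DecisionTreeRegressor(max_depth=4, random_state=0).fit(X_sv, y_sp)
```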

Various benchmark regression problems were solved to evaluate the efficiency of the proposed approach. The empirical study shows that the proposed approach increases the efficiency of the rules in the form of lower error (i.e. higher accuracy). It is also observed that the hybrid SVR+CART, SVR+ANFIS and SVR+DENFIS yielded better results than stand-alone CART, ANFIS and DENFIS.

Further, we proposed a novel hybrid rule extraction approach to solve unbalanced, medium-scale problems in data mining. The proposed approach is carried out in three steps. Feature selection using SVM-RFE (Guyon et al., 2002) is performed in the first step. In the second step, the support vectors are extracted, their predictions are obtained from the developed SVM model, and the corresponding actual target values are replaced by these predictions, resulting in the Case-SP dataset (i.e. the support vector set with the corresponding predicted target values). In the final step, the Case-SP dataset is used to train an NBTree and rules are generated. The proposed approach is then applied to churn prediction in bank credit card customers. As the problem at hand is unbalanced, we employed various balancing techniques, viz. undersampling, oversampling, SMOTE and a combination of undersampling and oversampling, and rules were then extracted from the balanced data using NBTree. It is observed that NBTree yields more generalised rules and that the rules extracted using NBTree are efficient for solving unbalanced and medium-scale problems. The dataset with the reduced features selected by SVM-RFE was also used to generate rules, and once again feature selection using SVM outperformed the case where feature selection was not used. Figure 2 shows the overall architecture of the proposed approaches.
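A condensed sketch of this pipeline is given below; it assumes the imbalanced-learn package for SMOTE, generates a synthetic unbalanced dataset in place of the proprietary churn data, and substitutes a decision tree for the NBTree, so every name and parameter here is illustrative.

```python
import numpy as np
from imblearn.over_sampling import SMOTE            # assumes imbalanced-learn is installed
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for an unbalanced churn dataset (~5% minority class).
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 8))
y = (rng.random(1000) < 0.05).astype(int)

# Balance the training data (SMOTE shown; the thesis also tests under/oversampling).
X_bal, y_bal = SMOTE(random_state=0).fit_resample(X, y)

# Extract the support vectors and relabel them with the SVM's own predictions (Case-SP).
svm = SVC(kernel="rbf", gamma="scale").fit(X_bal, y_bal)
X_sp = X_bal[svm.support_]
y_sp = svm.predict(X_sp)

# Generate rules from Case-SP (a decision tree stands in for the NBTree used in the thesis).
tree = DecisionTreeClassifier(max_depth=4, random_state=0).fit(X_sp, y_sp)
```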

Furthermore, we proposed an extension to ALBA (Martens et al., 2009), called mALBA (modified ALBA). The proposed mALBA comprises three phases: a feature selection phase, an active learning phase and a rule generation phase. In the feature selection phase, SVM-RFE is employed. The active learning phase of mALBA consists of four steps: (i) an SVM model is trained and the SVs are obtained; (ii) the distance between the SVs and the training instances is calculated (the remaining steps continue after Figure 3).

Figure 3: Architecture of the proposed rule extraction approach. (Feature selection phase: SVM-RFE reduces the full-feature training set to a reduced-feature training set. Active learning phase: an SVM yields the support vectors, extra data are generated around them in steps 1-4, and the SVs plus generated data form the modified training set. Rule generation phase: an NBTree trained on the modified data produces the tree/rules, whose predictions are evaluated on the test/validation set.)

The blue-coloured processes in Figure 3 represent our contributions.



(iii) Extra instances are artificially generated using the Uniform, Normal and Logistic distributions separately. (iv) The predictions for these generated instances are obtained using the trained SVM model, and the Case-P and Case-SP datasets are obtained. Later, during the rule generation phase, this modified data is used to train an NBTree (Naive Bayes Tree) and rules are generated. The application of the proposed mALBA is extended to data mining problems in finance, viz. churn prediction in bank credit card customers and fraud detection in insurance. The datasets analysed during this research study are medium scale in size and highly unbalanced in nature. It is observed that the proposed mALBA extracts more generalised rules on unbalanced datasets, and that feature selection preceding mALBA results in a smaller number of rules, thereby improving comprehensibility. Figure 3 presents the overall architecture of the mALBA approach.
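The active learning phase can be sketched as follows; the noise scale, the choice of a Normal perturbation and the breast cancer stand-in dataset are assumptions for illustration, not the exact mALBA procedure.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.svm import SVC

X, y = load_breast_cancer(return_X_y=True)
rng = np.random.default_rng(0)

# Step 1: train the SVM and obtain the support vectors.
svm = SVC(kernel="rbf", gamma="scale").fit(X, y)
sv = svm.support_vectors_

# Steps 2-3: perturb each support vector with noise from a chosen distribution
# (Normal shown; mALBA also uses Uniform and Logistic), scaled by a fraction of
# each feature's spread so the new points stay near the decision boundary.
scale = 0.1 * X.std(axis=0)
extra = sv + rng.normal(0.0, scale, size=sv.shape)

# Step 4: label the generated instances with the trained SVM (the oracle) and
# pool them with the support vectors before rule generation.
X_malba = np.vstack([sv, extra])
y_malba = svm.predict(X_malba)
```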

4. Organisation of the Thesis

In this thesis, I present and evaluate novel algorithms for the task of extracting comprehensible descriptions from a hard-to-understand learning system, the SVM. The hypothesis advanced by this research is that it is possible to develop algorithms for extracting symbolic descriptions from trained SVMs.

Chapter 1: Introduction. This chapter provides details about the issues involved in rule extraction. Rule extraction from SVM follows in the footsteps of rule extraction from ANNs. The taxonomy for rule extraction techniques proposed by Andrews et al. (1995) is presented; it is also followed in the research presented in this thesis. The chapter also discusses the quality measures for rules extracted from black-box techniques in general.

Chapter 2: Rule Extraction from SVM: an Introduction. This chapter provides background material for the rest of the thesis. Support Vector Machines and Support Vector Regression are first presented in detail. A literature survey of rule extraction from SVM and the gaps/shortcomings identified during the survey are then presented. The chapter also provides overviews of the various machine learning (intelligent) techniques used for rule generation: Fuzzy Rule Based Systems (FRBS), Decision Tree (DT), Classification and Regression Tree (CART), Adaptive Network based Fuzzy Inference System (ANFIS), Dynamic Evolving Neuro-Fuzzy Inference System (DENFIS) and Naive Bayes Tree (NBTree).

Chapter 3: Fuzzy Rule Extraction using SVM for Solving Classification Problems. This chapter presents the proposed decompositional rule extraction approach using SVM and analyses the advantage of fuzzy rule based classification systems over crisp systems. The details of the proposed approach are first described, followed by the empirical analysis. In the research presented in this chapter, fuzzy rules are extracted using the Case-SA dataset, i.e. the support vector set with the corresponding actual target values.

The proposed hybrid rule extraction approach is first tested on the benchmark Iris and Wine datasets and then extended to solve bankruptcy prediction in banks, using the Spanish, Turkish and US banks datasets. It is observed that the fuzzy rules provide better understanding and also outperform the other techniques tested.



Chapter 4: Rule Extraction from SVR for Solving Regression Problems. This chapter presents the first rule extraction approach from SVM for solving regression problems. The proposed approach is a decompositional one and constitutes one of the main contributions of this thesis. Intelligent techniques such as ANFIS, DENFIS and CART are employed to extract rules. Various benchmark regression datasets, viz. Auto MPG, Body Fat, Boston Housing, Forest Fires and Pollution, are analysed, and the efficiency of the rules is evaluated in terms of the root mean squared error (RMSE). It is observed that the proposed hybrid rule extraction approach yielded better results and outperformed stand-alone ANFIS, DENFIS and CART.

Chapter 5: Rule Extraction from SVM using Feature Selection. This chapter presents a pedagogical rule extraction technique from SVM which also uses the SVM as a feature selection algorithm; the actual target values of the training set are replaced by the predictions of the SVM, resulting in the Case-P dataset. Using the Case-P dataset with the reduced features, rules are extracted using CART, DT, ANFIS and DENFIS. Researchers have argued that the knowledge of a trained SVM can be represented in the form of support vectors or the predictions of the developed SVM; this chapter presents a hybrid rule extraction approach in which we argue that feature selection using SVM also represents the knowledge learnt by the SVM during training.

Using the proposed hybrid rule extraction approach, both classification and regression problems are solved and the empirical study is presented. The datasets analysed for classification are the benchmark Iris, Wine and WBC datasets and the Spanish, Turkish, US and UK banks bankruptcy prediction datasets. The datasets analysed for regression are Auto MPG, Body Fat, Boston Housing, Forest Fires and Pollution. It is observed that datasets with reduced features tend to yield smaller and fewer rules, resulting in improved comprehensibility.

Chapter 6: Rule Extraction from SVM for Data Mining on Unbalanced Datasets. This chapter presents the proposed eclectic rule extraction technique, which is used to analyse medium-scale unbalanced datasets. In the proposed hybrid rule extraction approach, feature selection using SVM is employed first; the support vectors are then obtained and their actual target values are replaced by the corresponding predictions of the SVM, resulting in the Case-SP dataset. The Case-SP dataset is then used to train an NBTree classifier and rules are extracted. The proposed rule extraction approach simplifies the problem by reducing the number of features (i.e. horizontally) and by reducing the sample size to the support vectors (i.e. vertically). Dealing with such unbalanced datasets is an emerging area of research in the computer science and statistics communities. The chapter also presents an overview of the problems posed by unbalanced datasets and the approaches proposed in the literature to deal with them.

One of the most important financial problems analysed during this research study is related to customer relationship management (CRM): the dataset analysed concerns churn prediction in bank credit card customers. Empirical results show that the proposed hybrid rule extraction approach reduces the complexity of the system and extracts highly comprehensible rules without compromising the accuracy of the classifier.

Chapter 7: Modified Active Learning Based Approach for Rule Extraction from SVM. This chapter proposes a new modified active learning based approach for rule extraction from SVM, which is a decompositional approach. In the proposed approach, the support vectors are extracted, the distance between the support vector set and the training set is calculated, and artificial data, intended to lie near the support vector instances, is generated using various distributions, viz. Uniform, Normal and Logistic. mALBA is also preceded by feature selection using SVM. The chapter presents the applications analysed during this research study.

Two important problems in finance were solved using the proposed approach, viz. churn prediction in bank credit card customers and fraud detection in insurance. The datasets analysed are medium scale in size and unbalanced in nature. The chapter presents the benefits of the proposed approach for dealing with unbalanced problems arising in banking and finance.

Chapter 8: Overall Conclusions. This chapter presents the overall conclusions drawn from the various proposed hybrid rule extraction approaches. Conclusions for the proposed approaches applied to classification problems, regression problems and data mining problems are presented separately.

Research Publications out of the Thesis

1. M.A.H. Farquad, V. Ravi and S.B. Raju, "Support vector regression based hybrid rule extraction methods for forecasting", Expert Systems with Applications, 37(8), 5577-5589, 2010.

2. M.A.H. Farquad, V. Ravi and S.B. Raju, "Rule Extraction from Support Vector Machines: A Hybrid Approach for classification and regression problems", International Journal of Information and Decision Sciences (IJIDS), 2010. (In Press)

3. M.A.H. Farquad, V. Ravi and S.B. Raju, "Rule Extraction from Support Vector Machine using modified Active Learning Based Approach: An application to CRM", Setchi et al. (Eds.): 14th International Conference on Knowledge-Based and Intelligent Information & Engineering Systems, KES 2010, Part I, LNAI 6276, pp. 461-470, September 8-10, 2010, Cardiff, Wales, UK.

4. M.A.H. Farquad, V. Ravi and S.B. Raju, "Support Vector Machine based Hybrid Classifiers and Rule Extraction Thereof: Application to Bankruptcy Prediction in Banks", In Soria, E., Martín, J.D., Magdalena, R., Martínez, M., Serrano, A.J., editors, Handbook of Research on Machine Learning Applications and Trends: Algorithms, Methods and Techniques, Vol. II, pp. 404-426, 2010, IGI Global, USA.

5. M.A.H. Farquad, V. Ravi and S.B. Raju, "Data Mining using Rules Extracted from SVM: an Application to Churn Prediction in Bank Credit Cards", Presented at the 12th International Conference on Rough Sets, Fuzzy Sets, Data Mining & Granular Computing (RSFDGrC'09), December 16-18, 2009, LNAI 5908, pp. 390-397, New Delhi, India.

6. M.A.H. Farquad, V. Ravi and S.B. Raju, "Rule Extraction using Support Vector Machine Based Hybrid Classifier", Presented at TENCON 2008, IEEE Region 10 Conference, 19-21 November 2008, Hyderabad, India.

7. M.A.H. Farquad, V. Ravi and S. Bapi Raju, "Rule Extraction from SVM for Analytical CRM: an Application to Predict Churn in Bank Credit Cards", Decision Support Systems. (Under Review)

8. M.A.H. Farquad, V. Ravi and S. Bapi Raju, "Analytical CRM using SVM: a Modified Active Learning Based Rule Extraction approach", Information Sciences. (Under Review)

References

Andrews, R., Diederich, J. and Tickle, A., "Survey and Critique of Techniques for Extracting Rules from Trained Artificial Neural Networks", Knowledge-Based Systems, vol. 8, no. 6, pp. 373-389, 1995.

Barakat, N.H. and Bradley, A.P., "Rule Extraction from Support Vector Machines: Measuring the Explanation Capability Using the Area under the ROC Curve", Proceedings of the 18th International Conference on Pattern Recognition (ICPR'06), Hong Kong, 2006.

Barakat, N.H. and Bradley, A.P., "Rule Extraction from Support Vector Machines: A Sequential Covering Approach", IEEE Transactions on Knowledge and Data Engineering, vol. 19, no. 6, pp. 729-741, June 2007.

Barakat, N.H. and Diederich, J., "Learning-based Rule-Extraction from Support Vector Machines", In Proceedings of the 14th International Conference on Computer Theory and Applications (ICCTA'2004), Alexandria, Egypt, 2004.

Barakat, N.H. and Diederich, J., "Eclectic Rule-Extraction from Support Vector Machines", International Journal of Computational Intelligence, vol. 2, no. 1, pp. 59-62, 2005.

Breiman, L., Friedman, J., Olshen, R. and Stone, C., "Classification and Regression Trees", Wadsworth and Brooks, 1984.

Chaves, A.d.C.F., Vellasco, M.M.B.R. and Tanscheit, R., "Fuzzy Rule Extraction from Support Vector Machines", Fifth International Conference on Hybrid Intelligent Systems, Rio de Janeiro, Brazil, November 6-9, 2005.

Craven, M.W., "Extracting Comprehensible Models from Trained Neural Networks", PhD thesis, Department of Computer Science, University of Wisconsin-Madison, 1996.

Clark, P. and Niblett, T., "The CN2 Induction Algorithm", Machine Learning, vol. 3, no. 4, pp. 261-283, 1989.

Craven, M. and Shavlik, J., "Extracting Tree-Structured Representations of Trained Networks", Advances in Neural Information Processing Systems, vol. 8, D. Touretzky, M. Mozer and M. Hasselmo, eds., pp. 24-30, The MIT Press, citeseer.ist.psu.edu/craven96extracting.html, 1996.

Fu, X., Ong, C.J., Keerthi, S., Hung, G.G. and Goh, L., "Extracting the Knowledge Embedded in Support Vector Machines", In International Joint Conference on Neural Networks (IJCNN'04), Budapest, Hungary, 2004.

Fung, G., Sandilya, S. and Rao, R., "Rule Extraction from Linear Support Vector Machines", Proceedings of the 11th ACM SIGKDD International Conference on Knowledge Discovery in Data Mining (KDD'05), pp. 32-40, 2005.

Markowska-Kaczmar, U. and Trelak, W., "Extraction of Fuzzy Rules from Trained Neural Network Using Evolutionary Algorithm", Proceedings of the European Symposium on Artificial Neural Networks (ESANN'03), pp. 149-154, 2003.

Martens, D., Baesens, B. and Gestel, T.V., "Decompositional Rule Extraction from Support Vector Machines by Active Learning", IEEE Transactions on Knowledge and Data Engineering, 21(2), 178-191, 2009.

Martens, D., Baesens, B., Gestel, T.V. and Vanthienen, J., "Comprehensible credit scoring models using rule extraction from support vector machines", European Journal of Operational Research, 183 (2007), 1466-1476.

Martens, D., De Backer, M., Haesen, R., Snoeck, M., Vanthienen, J. and Baesens, B., "Classification with Ant Colony Optimization", IEEE Transactions on Evolutionary Computation, vol. 11, no. 5, pp. 651-665, 2007.

Nunez, H., Angulo, C. and Catala, A., "Rule Extraction from Support Vector Machines", Proceedings of the European Symposium on Artificial Neural Networks (ESANN'02), pp. 107-112, 2002.

Nunez-Castro, H., Angulo-Bahon, C. and Catala-Mallofre, A., "Rule Based Learning Systems from SVM and RBFNN", Tendencias de la Mineria de Datos en España, Red Española de Minería de Datos, 1st ed., pp. 13-24, 2004. (Available online in English at http://www.lsi.us.es/redmidas/Capitulos/LMD02.pdf.)

Quinlan, J., "C4.5: Programs for Machine Learning", Morgan Kaufmann, 1993.

Zhang, Y., Su, H., Jia, T. and Chu, J., "Rule Extraction from Trained Support Vector Machines", Lecture Notes in Computer Science, vol. 3518, pp. 61-70, Springer, Berlin/Heidelberg, 2005.
