Rule Extraction from Support Vector Machine - Department of ...
Synopsis<br />
of the Ph.D. thesis on<br />
Rule Extraction from Support<br />
Vector Machine<br />
Submitted by<br />
Mohammad Abdul Haque Farquad<br />
Reg. No: 04MCPC03<br />
for the degree of<br />
Doctor of Philosophy<br />
Under the Guidance of<br />
Prof. S. Bapi Raju,<br />
University of Hyderabad, Hyderabad<br />
Dr. V. Ravi,<br />
IDRBT, Hyderabad<br />
Submitted to the<br />
Department of Computer and Information Sciences<br />
University of Hyderabad<br />
Hyderabad, Andhra Pradesh, India<br />
Abstract<br />
Although Support Vector Machines (SVMs) have been used to develop highly accurate<br />
classification and regression models in various real-world problem domains, their most<br />
significant limitation is that they generate models that are difficult to understand. The procedure<br />
that converts these opaque models into transparent models is called rule extraction. This thesis<br />
investigates the task of extracting comprehensible models from trained SVMs, thereby<br />
alleviating this limitation. The primary contribution of the thesis is a set of<br />
hybrid algorithms that extract the knowledge learnt by an SVM during training. Rules are<br />
extracted in two ways: using SVM and from SVM. During rule extraction using SVM, the SVM is used<br />
as a pre-processor only: the support vectors are extracted together with their actual target<br />
values, resulting in the Case-SA dataset. During rule extraction from SVM, the trained SVM is used<br />
to predict the support vector instances and the training instances, which gives two variants,<br />
Case-SP and Case-P, respectively. Hence, the modified data is a replica of the knowledge learnt by<br />
the SVM during training.<br />
This thesis also investigates the efficiency of the proposed rule extraction approaches in<br />
solving the problem of Bankruptcy Prediction in Banks. Bankruptcy is a legally declared inability,<br />
or impairment of ability, of an organisation to pay its creditors. Bankruptcy prediction in banks and corporate<br />
firms is among the most researched areas in statistics and machine learning, and bank<br />
management would be interested in the comprehensibility of the algorithms used for<br />
predictions. We extracted fuzzy rules for bankruptcy prediction problems using fuzzy rule<br />
based systems and compared their efficiency with that of the rules extracted<br />
using Decision Tree. Further, this thesis investigates the efficiency of rules extracted using<br />
the proposed approaches in solving real time data mining problems. In such<br />
applications, often more than 90% of the instances belong to one class, while<br />
very few instances belong to the other class, which is usually the more important one. In that<br />
sense, the datasets are termed unbalanced. The class imbalance problem has been an<br />
evolving topic of research in data mining. It is observed from the literature that machine<br />
learning techniques tend to be biased towards the majority class, thus producing poor prediction<br />
accuracy on the minority class. We proposed a rule extraction approach for<br />
solving these problems. Furthermore, for the first time, this thesis presents a rule extraction approach<br />
for solving regression problems. Adaptive Network based Fuzzy Inference System (ANFIS), Dynamic<br />
Evolving Neuro-Fuzzy Inference System (DENFIS) and Classification and Regression Tree (CART) are employed for<br />
rule generation. Finally, we proposed modifications to the Active Learning Based Approach, ALBA (Martens<br />
et al., 2009), in which extra instances are generated using various<br />
distributions such as Uniform, Normal and Logistic. Data mining problems such as churn<br />
prediction in bank credit card customers and fraud detection in insurance are solved using<br />
the resulting approach, mALBA (modified ALBA).<br />
1. Introduction<br />
Artificial neural networks (ANNs) and SVMs are amongst the most successful machine<br />
learning techniques used in the area of data mining. However, they produce black box models that<br />
are difficult for the end user to understand. These models do not explicitly tell the end user<br />
the knowledge learnt by them during the training phase. Predictive accuracy and<br />
comprehensibility are the two main criteria for evaluating any learning system. It is observed<br />
that the learning method which constructs the model with the best predictive accuracy is not<br />
necessarily the method that produces the most comprehensible model. This thesis explores<br />
the following question: can we take the incomprehensible model produced by SVM, and<br />
closely approximate it in a language that better facilitates comprehensibility?<br />
1.1 Motivation<br />
The process of converting opaque models (SVM in our research) into transparent<br />
models is often called Rule Extraction. Using the extracted rules, one can better understand<br />
how a prediction is made. Rule extraction from SVMs follows in the footsteps<br />
of the earlier effort to obtain human-comprehensible rules from ANNs in order to explain the<br />
knowledge learnt by an ANN during training. Much attention has been paid during the last decades<br />
to finding effective ways of extracting rules from ANNs, whereas far less work has been reported<br />
on representing the knowledge learnt by an SVM during training.<br />
1.2 Significance of rule extraction<br />
Andrews et al. (1995) presented the motivation behind rule extraction from neural<br />
networks. A brief overview of their study helps establish the aim and significance of rule<br />
extraction from SVMs.<br />
• Extracted rules give an explanation capability to the opaque model from<br />
which they are extracted. Gallant (1988) reported that rule extraction enabled a novice<br />
user to gain more insight into the problem at hand. Davis et al. (1977) and Gilbert<br />
(1989) argue that even limited explanation can positively influence acceptance of the<br />
system by the user.<br />
• Rule extraction procedures make the internal states of a system transparent.<br />
Transparency means that the internal states of the machine learning system are both<br />
accessible and can be interpreted unambiguously. Such capability is mandatory for<br />
safety critical applications such as air traffic control, operation of power plants and<br />
medical applications.<br />
• Rule extraction improves the generalisation ability of the model. It is difficult to<br />
determine if and when generalisation fails for specific cases, even with evaluation<br />
methods such as cross validation. When learned knowledge is expressed as a set of rules, an<br />
experienced user can anticipate or predict a generalisation failure.<br />
• A learning system (i.e. rule extraction) might discover salient features in the input<br />
data whose importance was not previously recognised, and new scientific theories can<br />
be induced (Craven and Shavlik, 1994).<br />
1.3 Rule Quality<br />
The quality of the extracted rules is a key measure of the success of a rule extraction<br />
algorithm. Four rule quality criteria have been suggested for rule extraction algorithms (Andrews et<br />
al. 1995; Tickle et al. 1998): rule accuracy, fidelity, consistency and<br />
comprehensibility. In this context, a rule set is considered to be accurate if it can correctly<br />
classify previously unseen examples.<br />
$$\text{Accuracy} = \frac{\text{number of test patterns correctly classified by rules}}{\text{total number of patterns in the test data}} \times 100$$<br />
Similarly, a rule set is considered to display a high level of fidelity if it can mimic the<br />
behaviour of the machine learning technique from which it was extracted.<br />
$$\text{Fidelity} = \frac{\text{number of patterns where the classification of the rules agrees with the classification of the SVM}}{\text{total number of patterns in the data}}$$<br />
An extracted rule set is deemed to be consistent if, under different training sessions, the<br />
machine learning technique generates rule sets that produce the same classifications of<br />
unseen examples. Finally, the comprehensibility of a rule set is determined by measuring the<br />
size of the rule set (in terms of the number of rules) and the number of antecedents per rule.<br />
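The accuracy, fidelity and comprehensibility measures described above can be sketched in a few lines of Python. This is only an illustration: the function names and the representation of a rule as a list of antecedent strings are assumptions made here, and fidelity is reported as a percentage for comparability with accuracy, although the synopsis shows the factor of 100 only for accuracy.

```python
from typing import List, Sequence, Tuple


def _percent_agreement(a: Sequence, b: Sequence) -> float:
    """Percentage of positions at which two label sequences agree."""
    if len(a) != len(b) or len(a) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    return 100.0 * sum(1 for u, v in zip(a, b) if u == v) / len(a)


def accuracy(rule_preds: Sequence, true_labels: Sequence) -> float:
    """Share of test patterns that the rules classify correctly."""
    return _percent_agreement(rule_preds, true_labels)


def fidelity(rule_preds: Sequence, svm_preds: Sequence) -> float:
    """Share of patterns on which the rules agree with the SVM."""
    return _percent_agreement(rule_preds, svm_preds)


def comprehensibility(rules: List[List[str]]) -> Tuple[int, float]:
    """Return (number of rules, mean number of antecedents per rule)."""
    n_rules = len(rules)
    mean_len = sum(len(r) for r in rules) / n_rules if n_rules else 0.0
    return n_rules, mean_len


if __name__ == "__main__":
    y_true = ["a", "a", "b", "b", "b"]      # hypothetical actual labels
    svm_out = ["a", "b", "b", "b", "b"]     # hypothetical SVM predictions
    rule_out = ["a", "b", "b", "b", "a"]    # hypothetical rule predictions
    print(accuracy(rule_out, y_true))       # 60.0
    print(fidelity(rule_out, svm_out))      # 80.0
    print(comprehensibility([["x1 <= 2"], ["x1 > 2", "x2 <= 5"]]))  # (2, 1.5)
```

Note that a rule set can have higher fidelity than accuracy, as in this toy case: it reproduces the SVM, including one of the SVM's own mistakes.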
1.4 Experimental Setup<br />
Empirical analysis in this thesis is carried out as follows. We first<br />
divided each dataset in an 80:20 ratio. The 20% portion is named the validation set and set aside<br />
for later use. Then 10-fold cross validation is performed on the remaining 80% of the data for training<br />
and extracting rules. Later, the efficiency of the rules is evaluated against the validation set.<br />
Figure 1 presents the experimental setup followed throughout the research work presented in<br />
this thesis. In this class of research, this experimental setup is unique and is also<br />
my contribution.<br />
[Figure 1: Experimental Setup Followed in this Thesis. The data set (100%) is split into 80% of the data for 10-fold cross validation (folds 1 to 10) and a 20% validation set; rules extracted during 10-fold cross validation are later tested against the validation set.]<br />
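The setup in Figure 1 can be sketched as index bookkeeping in plain Python. This is a minimal sketch under stated assumptions: the function name `holdout_then_kfold` is hypothetical, and the shuffling, the fixed seed and the absence of stratification are choices made here, since the synopsis does not say how the instances were assigned to the two portions or to the folds.

```python
import random
from typing import List, Tuple

Split = Tuple[List[int], List[int]]


def holdout_then_kfold(n_samples: int, holdout_frac: float = 0.2,
                       k: int = 10, seed: int = 0) -> Tuple[List[int], List[Split]]:
    """Set aside a validation set, then build k (train, test) folds on the rest."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)           # assumption: random assignment
    n_val = int(round(n_samples * holdout_frac))
    validation = idx[:n_val]                   # the 20% kept for later use
    work = idx[n_val:]                         # the 80% used for cross validation
    folds = [work[i::k] for i in range(k)]     # k near-equal, disjoint folds
    splits = []
    for i in range(k):
        test = folds[i]
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        splits.append((train, test))
    return validation, splits


if __name__ == "__main__":
    validation, splits = holdout_then_kfold(150)
    print(len(validation), len(splits))        # 30 10
```

Rules would be extracted on each of the ten training portions and then scored on the same untouched validation indices, so that all ten rule sets are compared on identical data.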
2. Research Objective<br />
In this thesis, I present and evaluate novel algorithms for the task of extracting<br />
comprehensible descriptions from SVM. The hypothesis advanced by this research is that it is<br />
possible to develop algorithms for extracting symbolic descriptions from trained SVMs that:<br />
(i) Produce more comprehensible, high-fidelity descriptions of trained SVMs using<br />
fuzzy logic approaches.<br />
(ii) Apply various intelligent techniques with explanation capability viz.,<br />
FRBS, CART, ANFIS, DENFIS and NBTree for rule generation.<br />
(iii) Extract rules for solving regression problems as well.<br />
(iv) Scale to medium scale and unbalanced datasets.<br />
The classification problems tested include benchmark datasets viz.,<br />
Iris, Wine and WBC; Bankruptcy Prediction in Banks using Spanish, Turkish, US<br />
and UK banks data; and Analytical CRM applications viz., Churn Prediction in Bank Credit<br />
Card Customers and Insurance Fraud Detection.<br />
The regression datasets analysed in this thesis are Auto<br />
MPG, Body Fat, Boston Housing, Forest Fires and Pollution.<br />
3. Rule Extraction from SVM<br />
Translucency refers to the extent to which the details of the ANN internal model<br />
structure are utilized by the rule extraction algorithm. Based on the translucency criterion, rule<br />
extraction techniques are classified into two major categories: Decompositional and<br />
Pedagogical. A third category is Eclectic (i.e. Hybrid), which incorporates elements of both<br />
the decompositional and pedagogical approaches.<br />
3.1 Decompositional Approach<br />
A decompositional approach is closely intertwined with the internal workings of the SVM and<br />
its constructed hyperplane. Nunez et al. (2002) proposed a decompositional rule extraction<br />
approach wherein prototypes obtained from the k-means clustering algorithm are combined with<br />
support vectors from the SVM and rules are then extracted. The k-means clustering algorithm is used<br />
to determine prototype vectors for each input class. An ellipsoid is defined in the input space<br />
by combining these prototypes with the support vectors, and is mapped to if-then rules. The main<br />
drawback of this algorithm is that the extracted rules are neither exclusive nor exhaustive,<br />
which results in conflicting or missing rules for the classification of new data instances.<br />
RulExtSVM (Fu et al. 2004) extracts if-then rules using intervals defined<br />
by hyperrectangular forms, which are generated using the intersection of the support vectors<br />
with the decision boundary. The disadvantage of this algorithm is that the hyperrectangles<br />
are constructed based on the number of support vectors.<br />
Hyper rectangle Rules Extraction (HRE) (Zhang et al. 2005) first constructs hyper<br />
rectangles according to the prototypes and the support vectors; these hyper rectangles are then<br />
projected onto the coordinate axes and if-then rules are formed. Fung et al. (2005) proposed a<br />
rule extraction technique similar to the SVM+Prototype method of Nunez et al. (2002) but without the computationally<br />
expensive clustering. Instead, the algorithm transforms the problem to a simpler, equivalent<br />
variant and constructs hypercubes by solving linear programming problems. Each hypercube<br />
is then transformed into a rule. Chaves et al. (2005) proposed a decompositional Fuzzy Rule<br />
Extraction (FREx) approach, which applies triangular fuzzy membership functions and<br />
determines the projection of the support vectors on the coordinate axes. Each support<br />
vector is then transformed into a fuzzy if-then rule.<br />
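To make the idea of projecting a hyperrectangle onto the coordinate axes concrete, the sketch below turns an axis-aligned box into one if-then rule with one interval per feature. It is a simplified illustration and not an implementation of HRE, RulExtSVM or the method of Fung et al.: the box here is simply the bounding box of a hypothetical group of points, and all function and feature names are assumptions.

```python
from typing import List, Sequence, Tuple

Box = List[Tuple[float, float]]


def bounding_box(points: Sequence[Sequence[float]]) -> Box:
    """Per-feature (min, max) interval of the axis-aligned box around the points."""
    return [(min(col), max(col)) for col in zip(*points)]


def box_to_rule(box: Box, feature_names: Sequence[str], label: str) -> str:
    """Project the box onto each axis and join the intervals into one rule."""
    conds = [f"{lo} <= {name} <= {hi}"
             for name, (lo, hi) in zip(feature_names, box)]
    return "IF " + " AND ".join(conds) + f" THEN class = {label}"


def rule_covers(box: Box, x: Sequence[float]) -> bool:
    """True when the instance x satisfies every antecedent of the rule."""
    return all(lo <= v <= hi for v, (lo, hi) in zip(x, box))


if __name__ == "__main__":
    # Hypothetical points (e.g. a prototype plus nearby support vectors).
    group = [(1.0, 0.2), (1.9, 0.4), (1.4, 0.1)]
    box = bounding_box(group)
    print(box_to_rule(box, ["petal_length", "petal_width"], "setosa"))
    print(rule_covers(box, (1.5, 0.3)), rule_covers(box, (4.0, 1.3)))
```

Each feature contributes one antecedent, which is why the number of features bears directly on rule length and hence on comprehensibility.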
Barakat and Bradley (2007) proposed a modified sequential covering algorithm termed<br />
SQRex-SVM to extract rules directly from the support vectors. Rule set performance is then<br />
evaluated using the true positives (TPs), the false positives (FPs) and the AUC. A Multiple<br />
Kernel-Support Vector Machine (MK-SVM) scheme (Chen et al. 2007) is proposed for<br />
feature selection, rule extraction and prediction modelling, and the extracted rules are tested<br />
for predicting cancer tissue in gene expression data. It is observed that rules extracted<br />
using MK-SVM improve the explanation capacity of SVM. Recently, Martens et al. (2009)<br />
proposed a new active learning-based approach (ALBA) to extract rules from SVM models.<br />
ALBA makes use of the support vectors, which are typically close to the decision boundary, to<br />
generate additional samples, and extracts rules from all samples labelled by the trained SVM.<br />
3.1.1 Gaps Observed<br />
It is observed from the literature that researchers have focused on extracting rules for<br />
solving benchmark classification problems only. Also, only the Decision Tree algorithm was<br />
employed for generating rules.<br />
The efficiency of fuzzy logic for extracting<br />
fuzzy rules from SVM was ignored in the earlier research.<br />
The importance of the support vectors extracted using SVM was ignored totally.<br />
Generating extra instances for small scale problems may improve the accuracy,<br />
but the larger number of instances may lead to a larger number of<br />
rules, which in turn reduces the comprehensibility of the rules extracted. The efficiency of ALBA<br />
(Martens et al., 2009) was analysed using benchmark and small datasets only.<br />
3.1.2 Proposed approaches and contributions<br />
First, we proposed a novel hybrid fuzzy rule extraction approach that uses SVM and<br />
a Fuzzy Rule Based System (FRBS) in tandem. The proposed hybrid rule extraction approach consists<br />
of two major steps. During the first step, the SVM is trained and the support vectors are extracted,<br />
resulting in the Case-SA dataset (i.e. SVs set with corresponding actual target values).<br />
During the second step, FRBS and DT are employed separately to generate rules. The proposed<br />
approach is first applied to the benchmark datasets Iris and Wine and is then tested on<br />
bankruptcy prediction in banks. Spanish, Turkish and US banks datasets are analysed,<br />
and it is observed that the proposed hybrid fuzzy rule extraction approach not only generates<br />
fuzzy rules but also improves generalisation without compromising the accuracy of the<br />
system. The proposed approach yielded the best accuracy, 92.31%, on the Spanish<br />
banks data. It ranks second among the classifiers compared, with 87.5%<br />
accuracy on the Turkish banks data and 96.15% accuracy on the US banks data. Figure 2<br />
presents the overall architecture of the approaches proposed in this thesis; the feature<br />
selection step is invoked only in the approaches that mention it. The<br />
approach proposed here analyses the full feature data only.<br />
3.2 Pedagogical Approach<br />
A pedagogical algorithm considers the trained model as a black box. Instead of looking<br />
at the internal structure, these algorithms directly extract rules which relate the inputs and<br />
outputs of the SVM. These techniques typically use the trained SVM model as an oracle to<br />
label or classify artificially generated training examples that are later used by a symbolic<br />
learning algorithm. The idea behind these techniques is the assumption that the trained model<br />
can better represent the data than the original data set. Trepan (Craven and Shavlik, 1996)<br />
and REX (Markowska-Kaczmar and Trelak, 2003) are some of the pedagogical approaches<br />
used for rule extraction from ANNs.<br />
3.2.1 Gaps Observed<br />
Researchers argue that when the dataset is modified with the predictions of the SVM,<br />
the resulting modified data represents the knowledge of the SVM. In this category, the SVM's<br />
efficiency for prediction is what researchers have mostly analysed, whereas the efficiency of<br />
SVM for feature selection was totally ignored.<br />
Further, rule extraction from SVM for solving regression problems was never<br />
reported in the literature. SVM's efficiency in feature selection for regression problems was<br />
also not studied in the earlier research.<br />
3.2.2 Proposed approaches and contributions<br />
Further, we proposed a hybrid rule extraction algorithm for solving both classification and<br />
regression problems. Feature selection using SVM-RFE is first employed; then<br />
the actual target values of the training instances are replaced by the predictions of the SVM/SVR<br />
models, generating the Case-P datasets (i.e. training instances with corresponding predicted target values).<br />
Rules are then extracted from the Case-P dataset with reduced features.<br />
For classification, the benchmark problems Iris, Wine and WBC and the bankruptcy prediction<br />
problems on Spanish, Turkish, US and UK banks are analysed. It is observed that reduced<br />
features reduce the complexity of the system and increase the comprehensibility of the rules.<br />
For regression, the efficiency of the rules is evaluated on benchmark regression<br />
problems. Empirical results show that the accuracy obtained with the proposed approach, i.e.<br />
with fewer features, is better than the accuracy obtained using the full feature data. It is also<br />
observed that far fewer rules are extracted using reduced features, which<br />
results in better comprehensibility of the black box model from which they are extracted. The<br />
architecture of the proposed approaches is shown in Figure 2.<br />
[Figure 2: Overall Architecture of the Proposed Rule Extraction Approaches. Phase 1: data set (full attributes), feature selection (SVM-RFE), data set (reduced attributes), SVM/SVR, support vectors. Phase 2: modified data (Case-SA, Case-SP, Case-P, Case-A). Phase 3: DT/FRBS/CART/ANFIS/DENFIS/NBTree, rules, test set, predictions.]<br />
Note: Case-A represents the Training set with corresponding Actual target values.<br />
Case-P represents the Training set with corresponding Predicted target values.<br />
Case-SA represents the Support vector set with corresponding Actual target values.<br />
Case-SP represents the Support vector set with corresponding Predicted target values.<br />
The blue coloured process in Figure 2 represents our contributions.<br />
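The four modified datasets defined in the note above differ only in which instances are kept and which target is attached. The sketch below shows that bookkeeping. The function name `build_case_datasets` and the toy oracle are hypothetical; in practice `predict` would be the trained SVM or SVR and `sv_indices` the positions of its support vectors in the training set (for example, scikit-learn exposes these as the `support_` attribute of a fitted `SVC`), neither of which is trained here.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Row = Sequence[float]
Dataset = List[Tuple[Row, object]]


def build_case_datasets(X: Sequence[Row], y: Sequence,
                        sv_indices: Sequence[int],
                        predict: Callable[[Row], object]) -> Dict[str, Dataset]:
    """Pair training rows or support vectors with actual or predicted targets."""
    preds = [predict(x) for x in X]            # model output for every training row
    return {
        "Case-A": list(zip(X, y)),                          # training set, actual
        "Case-P": list(zip(X, preds)),                      # training set, predicted
        "Case-SA": [(X[i], y[i]) for i in sv_indices],      # SVs, actual
        "Case-SP": [(X[i], preds[i]) for i in sv_indices],  # SVs, predicted
    }


if __name__ == "__main__":
    # Stand-in oracle used instead of a trained SVM (assumption for the demo).
    oracle = lambda x: 1 if x[0] + x[1] > 1.0 else 0
    X = [(0.2, 0.1), (0.6, 0.5), (0.9, 0.8), (0.4, 0.5)]
    y = [0, 0, 1, 1]
    cases = build_case_datasets(X, y, sv_indices=[1, 3], predict=oracle)
    print(cases["Case-SA"])   # [((0.6, 0.5), 0), ((0.4, 0.5), 1)]
    print(cases["Case-SP"])   # [((0.6, 0.5), 1), ((0.4, 0.5), 0)]
```

Any of the four datasets can then be passed to a rule learner; rules learnt from the predicted-target variants describe the model rather than the raw data, which is what fidelity measures.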
We employed feature selection using SVM-RFE and also carried out the empirical analysis<br />
on the reduced feature data. It is observed that the reduced feature data<br />
yields fewer rules and shorter rules, resulting in<br />
improved comprehensibility of the rules extracted.<br />
3.3 Eclectic Approach<br />
Eclectic rule extraction techniques incorporate elements of both the<br />
decompositional and pedagogical approaches (Andrews et al., 1995; Barakat and Diederich,<br />
2004 and 2005). A hybrid rule extraction technique is proposed by Barakat and Diederich<br />
(2004 and 2005). After developing the SVM model using the training set, they used the<br />
developed model to predict the output class labels for the training instances and support vectors.<br />
They then used a decision tree for generating rules. The quality of the extracted rules is<br />
measured using AUC (Area Under the Receiver Operating Characteristic Curve) (Barakat and<br />
Bradley, 2006). They extracted crisp rules from the data.<br />
3.3.1 Gaps Observed<br />
The efficiency of regression rules obtained using SVM was not analysed before, and no rule<br />
extraction procedure was proposed for solving regression problems.<br />
SVM's efficiency for feature selection was also ignored in this category of rule<br />
extraction approaches. The number of rules extracted using all the features of a dataset is<br />
large, resulting in a less comprehensible system. Only benchmark problems were solved and<br />
analysed in the previous research.<br />
The efficiency of rules extracted from SVM for solving unbalanced, medium scale data<br />
mining problems was never studied or reported earlier.<br />
3.3.2 Proposed approaches and contributions<br />
In this category, we proposed a hybrid rule extraction algorithm for solving regression<br />
problems. For extracting rules we employed the CART, ANFIS and DENFIS algorithms<br />
separately. The proposed regression rule extraction approach consists of three major steps.<br />
(i) An SVR model is trained and the support vectors are extracted.<br />
(ii) The actual target values of the extracted support vectors are then replaced by the<br />
corresponding predictions of the developed SVR model, resulting in the Case-SP<br />
dataset (i.e. SVs set with corresponding predicted target values).<br />
(iii) This modified data is then used to generate rules using CART, ANFIS and<br />
DENFIS.<br />
Various benchmark regression problems were solved to evaluate the efficiency of the<br />
proposed approach. The empirical study shows that the rules extracted with the proposed<br />
approach yield lower error (i.e. higher accuracy). It is also observed that the<br />
hybrids SVR+CART, SVR+ANFIS and SVR+DENFIS yielded better results than the<br />
stand-alone CART, ANFIS and DENFIS.<br />
Further, we proposed a novel hybrid rule extraction approach to solve unbalanced,<br />
medium scale problems in data mining. The proposed approach is carried out in three steps.<br />
Feature selection using SVM-RFE (Guyon, 2002) is carried out during the first step. During the second step,<br />
support vectors are extracted. Predictions for these SVs are then obtained using the developed<br />
SVM model and the corresponding actual target values are replaced by the predictions,<br />
resulting in Case-SP datasets (i.e. SVs set with<br />
corresponding predicted target values). During the final step, the Case-SP dataset is used to train NBTree and rules are<br />
generated. The proposed approach is then applied to Churn Prediction in Bank Credit<br />
Card customers. As the problem at hand is unbalanced, we employed various balancing<br />
techniques viz., Undersampling, Oversampling, SMOTE and a combination of Undersampling<br />
and Oversampling. Rules were then extracted from this modified data using NBTree. It is<br />
observed that NBTree yields more generalised rules and that<br />
these rules are efficient for solving unbalanced and medium scale<br />
problems. It is also observed that the dataset with reduced features, obtained through<br />
feature selection using SVM-RFE, once again outperformed the<br />
case where feature selection was not used. Figure 2 shows the overall architecture of the<br />
approaches proposed.<br />
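Two of the balancing techniques named above, random undersampling and random oversampling, can be sketched in plain Python as follows. This is an illustration under assumptions: the function names and the 1:1 target ratio are choices made here, SMOTE is not shown, and the synopsis does not state the ratios used or the exact point in the pipeline at which balancing was applied.

```python
import random
from typing import List, Tuple

Instance = Tuple[tuple, int]


def undersample(data: List[Instance], minority_label: int,
                seed: int = 0) -> List[Instance]:
    """Keep every minority instance and an equally sized random majority subset."""
    rng = random.Random(seed)
    minority = [d for d in data if d[1] == minority_label]
    majority = [d for d in data if d[1] != minority_label]
    kept = rng.sample(majority, min(len(majority), len(minority)))
    return minority + kept


def oversample(data: List[Instance], minority_label: int,
               seed: int = 0) -> List[Instance]:
    """Replicate random minority instances until both classes are equal in size."""
    rng = random.Random(seed)
    minority = [d for d in data if d[1] == minority_label]
    majority = [d for d in data if d[1] != minority_label]
    if not minority:
        raise ValueError("no minority instances to replicate")
    n_extra = max(0, len(majority) - len(minority))
    extra = [rng.choice(minority) for _ in range(n_extra)]
    return majority + minority + extra


if __name__ == "__main__":
    # Hypothetical data: 93 majority (label 0) and 7 minority (label 1) rows.
    data = [((float(i),), 0) for i in range(93)] + [((float(i),), 1) for i in range(7)]
    print(len(undersample(data, 1)), len(oversample(data, 1)))  # 14 186
```

Undersampling discards majority information while oversampling repeats minority rows, which is one reason a combination of the two is often tried alongside each on its own.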
Furthermore, we proposed an extension to ALBA (Martens et al., 2009) and called it<br />
mALBA (modified ALBA). The proposed mALBA comprises three phases: a feature selection<br />
phase, an active learning phase and a rule generation phase. During the feature selection phase,<br />
SVM-RFE is employed for feature selection. The active learning phase of mALBA consists of<br />
four steps. (i) An SVM model is trained and the SVs are obtained. (ii) The distance between<br />
the SVs and the training instances is calculated.<br />
[Figure 3: Architecture of the proposed rule extraction approach. Feature Selection Phase: training set (full features), SVM-RFE, training set (reduced features). Active Learning Phase: SVM, Steps 1 to 4, support vectors, data generated using mALBA, SVs + generated data, modified training set. Rule Generation Phase: NBTree, tree/rules, test/validation, predictions.]<br />
The blue coloured process in figure 3 represents our contributions.<br />
(iii) Extra instances are artificially generated using the Uniform, Normal and Logistic distributions<br />
separately. (iv) The predictions for these generated instances are then obtained using the<br />
trained SVM model, and the Case-P and Case-SP datasets are obtained. Later, during the rule<br />
generation phase, this modified data is used to train NBTree (Naive Bayes Tree) and rules are<br />
generated. The proposed mALBA is also applied to data mining problems<br />
in finance viz., churn prediction in bank credit card customers and fraud detection in<br />
insurance. The datasets analysed during this research study are medium scale in size and<br />
highly unbalanced in nature. It is observed that the proposed mALBA extracted more<br />
generalised rules on unbalanced datasets. Feature selection preceding mALBA resulted in<br />
fewer rules, thereby improving comprehensibility. Figure 3 presents the overall<br />
architecture of the mALBA approach.<br />
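Steps (ii) to (iv) of the active learning phase can be sketched as follows. This is a hedged illustration and not the thesis implementation: the synopsis does not say how the distance of step (ii) enters the generation step, so using the per-feature mean absolute distance between training instances and SVs as the spread of the perturbation is an assumption made here, as are all function names and the stand-in oracle used in place of a trained SVM.

```python
import math
import random
from typing import Callable, List, Sequence, Tuple

Row = Sequence[float]


def mean_abs_distance(X: Sequence[Row], svs: Sequence[Row]) -> List[float]:
    """Step (ii): per-feature mean absolute distance, training rows to SVs."""
    n_feat = len(svs[0])
    totals = [0.0] * n_feat
    for x in X:
        for s in svs:
            for j in range(n_feat):
                totals[j] += abs(x[j] - s[j])
    return [t / (len(X) * len(svs)) for t in totals]


def sample_offset(dist: str, scale: float, rng: random.Random) -> float:
    """Draw one zero-centred offset from the chosen distribution."""
    if dist == "uniform":
        return rng.uniform(-scale, scale)
    if dist == "normal":
        return rng.gauss(0.0, scale)
    if dist == "logistic":
        # Inverse-CDF sampling; u is clamped away from 0 and 1 to keep log finite.
        u = min(max(rng.random(), 1e-12), 1.0 - 1e-12)
        return scale * math.log(u / (1.0 - u))
    raise ValueError(f"unknown distribution: {dist}")


def generate_near_svs(X: Sequence[Row], svs: Sequence[Row],
                      predict: Callable[[Row], object], n_extra: int,
                      dist: str = "uniform", seed: int = 0
                      ) -> List[Tuple[tuple, object]]:
    """Steps (iii) and (iv): perturb random SVs, then label with the model."""
    rng = random.Random(seed)
    scales = mean_abs_distance(X, svs)
    generated = []
    for _ in range(n_extra):
        sv = rng.choice(list(svs))
        x_new = tuple(v + sample_offset(dist, s, rng) for v, s in zip(sv, scales))
        generated.append((x_new, predict(x_new)))   # oracle supplies the target
    return generated


if __name__ == "__main__":
    oracle = lambda x: 1 if x[0] + x[1] > 2.0 else 0   # stand-in for the SVM
    X = [(0.0, 0.0), (1.0, 1.0), (2.0, 0.5), (0.5, 2.0)]
    svs = [(1.0, 1.0), (0.5, 2.0)]
    for name in ("uniform", "normal", "logistic"):
        print(name, len(generate_near_svs(X, svs, oracle, 5, name)))
```

The generated instances would be appended to the SVs or to the training set with their predicted targets before the rule learner is trained, so the rule learner sees more examples near the decision boundary.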
4. Organisation of the Thesis<br />
In this thesis, I present and evaluate novel algorithms for the task of extracting<br />
comprehensible descriptions from hard-to-understand learning systems, i.e. SVM. The<br />
hypothesis advanced by this research is that it is possible to develop algorithms for extracting<br />
symbolic descriptions from trained SVMs.<br />
Chapter 1: Introduction. This chapter details the issues involved in<br />
rule extraction. Rule extraction from SVM follows in the footsteps of rule extraction from<br />
ANNs. The taxonomy proposed by Andrews et al. (1995) for rule extraction techniques in<br />
general is presented; it is also followed in the research presented in this thesis. This<br />
chapter also describes the quality measures for rules extracted from black<br />
box techniques in general.<br />
Chapter 2: Rule Extraction from SVM: an Introduction. This chapter provides<br />
background material for the rest of the thesis. Support Vector Machine and Support Vector<br />
Regression are first presented in detail. A literature survey of rule extraction from SVM<br />
and the gaps/shortcomings identified during the survey are then presented. This chapter also<br />
provides overviews of the machine learning (intelligent) techniques used for rule<br />
generation: Fuzzy Rule Based Systems (FRBS), Decision Tree (DT),<br />
Classification and Regression Tree (CART), Adaptive Network based Fuzzy Inference<br />
Systems (ANFIS), Dynamic Evolving Neuro-Fuzzy Inference System (DENFIS) and Naive<br />
Bayes Tree (NBTree).<br />
Chapter 3: Fuzzy Rule Extraction using SVM for Solving Classification Problems.<br />
This chapter presents the proposed decompositional rule extraction approach using SVM and<br />
analyses the advantage of fuzzy rule based classification systems over crisp systems.<br />
The details of the proposed approach are described first, followed by the<br />
empirical analysis. In the study presented in this chapter, fuzzy rules are<br />
extracted using the Case-SA dataset, i.e. the set of support vectors with their actual target<br />
values.<br />
The proposed hybrid rule extraction approach is first tested on the benchmark datasets<br />
Iris and Wine and is then applied to<br />
bankruptcy prediction in banks, using Spanish, Turkish and US banks datasets.<br />
It is observed that fuzzy rules provide better understanding and also outperform the other<br />
techniques tested.<br />
Chapter 4: Rule Extraction from SVR for Solving Regression Problems. This<br />
chapter presents the first-ever rule extraction approach from SVM for solving regression<br />
problems. The proposed approach is decompositional and is one<br />
of the main contributions of this thesis. Intelligent techniques such as ANFIS, DENFIS and<br />
CART are employed to extract rules. Various benchmark<br />
regression datasets, viz., Auto MPG, Body Fat, Boston Housing, Forest Fires and Pollution,<br />
are analysed, and the quality of the rules is evaluated in terms of root mean<br />
squared error (RMSE). It is observed that the proposed hybrid rule extraction approach yielded better<br />
results than the stand-alone ANFIS, DENFIS and CART.<br />
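As an illustration of the evaluation measure used in this chapter, the following minimal Python<br />
sketch computes the RMSE between actual target values and the predictions of an extracted rule<br />
set. The function name rmse and the sample values are hypothetical and are not results from<br />
the thesis.<br />

```python
import math


def rmse(actual, predicted):
    """Root mean squared error between two equal-length sequences."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Hypothetical target values and rule-based predictions (illustrative only).
y_true = [3.0, 5.0, 7.0, 9.0]
y_rules = [2.0, 5.0, 8.0, 9.0]

print(rmse(y_true, y_rules))
```

A lower RMSE indicates that the rule predictions lie closer to the actual targets.<br />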
Chapter 5: Rule Extraction from SVM using Feature Selection. This chapter presents<br />
a pedagogical rule extraction technique from SVM that also uses SVM as a feature selection<br />
algorithm. The actual target values of the training set are then replaced by the predictions<br />
of the SVM, resulting in the Case-P dataset. Using the Case-P dataset with reduced features,<br />
rules are extracted using CART, DT, ANFIS and DENFIS. Researchers have argued that the<br />
knowledge of a trained SVM can be represented in the form of its support vectors or its<br />
predictions. This chapter presents a hybrid rule extraction approach<br />
in which we argue that feature selection using SVM also represents the knowledge learnt by<br />
the SVM during training.<br />
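The construction of the Case-P dataset can be sketched as follows. This is a minimal illustration<br />
under stated assumptions, not the implementation used in the thesis: black_box_predict is a<br />
hypothetical fixed linear rule standing in for a trained SVM, toy_rule stands in for an extracted<br />
rule, the data are invented, and the feature selection step is omitted. The agreement measure<br />
computed at the end is commonly called fidelity in the rule extraction literature.<br />

```python
def black_box_predict(x):
    """Hypothetical stand-in for a trained SVM: a fixed linear decision rule."""
    weights, bias = (1.0, -1.0), 0.0
    score = sum(w * v for w, v in zip(weights, x)) + bias
    return 1 if score >= 0 else -1


def make_case_p(features):
    """Pair every training instance with the black-box prediction
    instead of its actual target value."""
    return [(x, black_box_predict(x)) for x in features]


def fidelity(rule_predict, features):
    """Fraction of instances on which the rules agree with the black box."""
    agree = sum(rule_predict(x) == black_box_predict(x) for x in features)
    return agree / len(features)


def toy_rule(x):
    """Hypothetical extracted rule: IF x[0] >= 1.0 THEN class 1 ELSE class -1."""
    return 1 if x[0] >= 1.0 else -1


# Invented training data; the actual targets may disagree with the black box.
X_train = [(2.0, 1.0), (0.5, 1.5), (3.0, 0.0), (1.0, 1.0)]
y_actual = [1, 1, 1, -1]

case_p = make_case_p(X_train)
print([label for _, label in case_p])
print(fidelity(toy_rule, X_train))
```

Any rule learner could then be trained on case_p in place of the pairs (X_train, y_actual), so that<br />
the rules describe what the black box predicts rather than the original labels.<br />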
Using the proposed hybrid approach, both<br />
classification and regression problems are solved, and the empirical study is presented in this<br />
chapter. The datasets analysed for classification are the benchmark datasets Iris, Wine and<br />
WBC and the bankruptcy prediction datasets for Spanish, Turkish, US and UK banks. The datasets<br />
analysed for regression are Auto MPG, Body Fat, Boston Housing, Forest Fires and<br />
Pollution. It is observed that datasets with reduced features tend to yield shorter rules and<br />
fewer of them, resulting in improved comprehensibility.<br />
Chapter 6: Rule Extraction from SVM for Data Mining on Unbalanced Datasets.<br />
This chapter presents the proposed eclectic rule extraction technique, which is used to<br />
analyse a medium-scale unbalanced dataset. In the proposed hybrid rule extraction<br />
approach, feature selection using SVM is employed first. Support vectors are then obtained,<br />
and the actual target values of the support vectors are replaced by the corresponding<br />
predictions of the SVM, resulting in the Case-SP dataset. The Case-SP dataset is then used to<br />
train an NBTree classifier, from which rules are extracted. The proposed approach<br />
simplifies the problem by reducing the number of features (i.e. horizontal reduction) and the sample size,<br />
in the form of support vectors (i.e. vertical reduction). Dealing with such unbalanced datasets is an<br />
emerging area of research in the computer science and statistics communities. This chapter also<br />
gives an overview of the problems posed by unbalanced datasets and of the approaches<br />
proposed in the literature to deal with them.<br />
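The horizontal and vertical reductions that lead to the Case-SP dataset can be sketched as below.<br />
Everything in this sketch is an assumption made for illustration: WEIGHTS is a hypothetical linear<br />
weight vector, features are kept when their absolute weight exceeds a threshold, the restricted<br />
weights stand in for an SVM retrained on the reduced data, and instances whose score lies within<br />
a margin band stand in for support vectors. The thesis does not prescribe these particular choices.<br />

```python
WEIGHTS = (2.0, 0.05, -1.5)  # hypothetical linear weights, one per feature
BIAS = 0.0


def select_features(weights, threshold):
    """Horizontal reduction: keep indices of features with a large absolute weight."""
    return [i for i, w in enumerate(weights) if abs(w) >= threshold]


def reduced_score(x, kept):
    """Decision value computed on the kept features only."""
    return sum(WEIGHTS[i] * x[i] for i in kept) + BIAS


def make_case_sp(features, threshold=0.5, margin=1.0):
    """Vertical reduction: keep instances near the decision boundary
    (a stand-in for support vectors) and label them with the prediction."""
    kept = select_features(WEIGHTS, threshold)
    case_sp = []
    for x in features:
        score = reduced_score(x, kept)
        if abs(score) <= margin:
            reduced_x = tuple(x[i] for i in kept)
            case_sp.append((reduced_x, 1 if score >= 0 else -1))
    return case_sp


# Invented training instances with three features each.
X_train = [(1.0, 3.0, 1.0), (3.0, 0.0, 0.5), (0.2, 1.0, 0.6), (0.0, 2.0, 4.0)]

case_sp = make_case_sp(X_train)
print(case_sp)
```

The resulting list has fewer columns and fewer rows than the original training set, and it would<br />
be the input to the rule learner (NBTree in this chapter).<br />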
One of the most important financial problems analysed during this research study is<br />
related to customer relationship management (CRM): the dataset analysed concerns<br />
churn prediction among bank credit card customers. Empirical results show that the<br />
proposed hybrid rule extraction approach reduces the complexity of the system and<br />
yields highly comprehensible rules without compromising the accuracy of<br />
the classifier.<br />
Chapter 7: Modified Active Learning Based Approach for Rule Extraction from<br />
SVM. This chapter proposes a new modified active learning based approach (mALBA) for rule extraction from<br />
SVM, which is a decompositional approach. In the proposed approach,<br />
support vectors are extracted, the distance between the support vector set and the training set is<br />
calculated, and artificial data intended to lie near the support vector instances<br />
is generated using various distributions, viz., Normal, Gaussian and Logistic. mALBA is also preceded<br />
by feature selection using SVM. This chapter presents the applications analysed during this<br />
research study.<br />
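One plausible reading of the data generation step is sketched below for the Gaussian case only.<br />
The sketch assumes that the spread of the generated points around each support vector is a fixed<br />
fraction (scale) of its mean Euclidean distance to the training instances, and that generated<br />
points are labelled by the black box; both choices, the function names and the data are<br />
hypothetical, and black_box_predict is a stand-in for a trained SVM.<br />

```python
import math
import random


def euclidean(a, b):
    """Euclidean distance between two points of equal dimension."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))


def mean_distance_to_training(sv, training):
    """Average distance from one support vector to all training instances."""
    return sum(euclidean(sv, x) for x in training) / len(training)


def black_box_predict(x):
    """Hypothetical stand-in for a trained SVM."""
    return 1 if x[0] - x[1] >= 0 else -1


def generate_near_support_vectors(support_vectors, training,
                                  per_sv=3, scale=0.1, seed=0):
    """Draw Gaussian points around each support vector and label them
    with the black-box prediction."""
    rng = random.Random(seed)  # fixed seed keeps the sketch reproducible
    generated = []
    for sv in support_vectors:
        spread = scale * mean_distance_to_training(sv, training)
        for _ in range(per_sv):
            point = tuple(rng.gauss(value, spread) for value in sv)
            generated.append((point, black_box_predict(point)))
    return generated


# Invented training set and support vectors (illustrative only).
X_train = [(2.0, 1.0), (0.5, 1.5), (3.0, 0.0), (1.0, 1.0)]
support_vectors = [(1.0, 1.0), (0.5, 1.5)]

artificial = generate_near_support_vectors(support_vectors, X_train)
print(len(artificial))
```

The generated instances would be added to the data from which rules are learnt, so that the rule<br />
learner sees more examples in the region close to the decision boundary.<br />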
Two of the most important problems in finance were solved using the proposed approach,<br />
viz., Churn Prediction in Bank Credit Card Customers and Fraud Detection in Insurance. The<br />
datasets analysed are medium scale in size and unbalanced in nature. This chapter<br />
presents the benefits of the proposed approach in dealing with the unbalanced problems<br />
that occur in banking and finance.<br />
Chapter 8: Overall Conclusions. This chapter presents the overall conclusions drawn<br />
from the various proposed hybrid rule extraction approaches. The conclusions<br />
for the approaches applied to classification<br />
problems, regression problems and data mining problems are presented separately.<br />
Research Publications out of the Thesis<br />
1. M.A.H. Farquad, V. Ravi and S.B. Raju, “Support vector regression based hybrid rule<br />
extraction methods for forecasting”, Expert Systems with Applications, 37(8),<br />
5577-5589, 2010.<br />
2. M.A.H. Farquad, V. Ravi and S.B. Raju, “Rule Extraction from Support Vector<br />
Machines: A Hybrid Approach for classification and regression problems”,<br />
International Journal of Information and Decision Sciences (IJIDS), 2010. (In Press)<br />
3. M.A.H. Farquad, V. Ravi and S.B. Raju, “Rule Extraction from Support Vector<br />
Machine using modified Active Learning Based Approach: An application to CRM”,<br />
Setchi et al. (Eds.): 14th International Conference on Knowledge-Based and<br />
Intelligent Information & Engineering Systems, KES 2010, Part I, LNAI 6276, pp.<br />
461–470, September 8-10, 2010, Cardiff, Wales, UK.<br />
4. M.A.H. Farquad, V. Ravi and S.B. Raju, “Support Vector Machine based Hybrid<br />
Classifiers and Rule Extraction Thereof: Application to Bankruptcy Prediction in<br />
Banks”, In Soria, E., Martín, J.D., Magdalena, R., Martínez, M., Serrano, A.J.,<br />
editors, Handbook of Research on Machine Learning Applications and Trends:<br />
Algorithms, Methods and Techniques, Vol. II, pp. 404-426, 2010, IGI Global, USA.<br />
5. M.A.H. Farquad, V. Ravi and S.B. Raju, “Data Mining using Rules Extracted from<br />
SVM: an Application to Churn Prediction in Bank Credit Cards”, Presented in 12th<br />
International Conference on Rough Sets, Fuzzy Sets, Data Mining & Granular<br />
Computing (RSFDGrC’09), December 16-18, 2009, LNAI 5908, pp. 390-397, New<br />
Delhi, India.<br />
6. M.A.H. Farquad, V. Ravi and S.B. Raju, “Rule Extraction using Support Vector<br />
Machine Based Hybrid Classifier”, Presented in TENCON-2008, IEEE Region 10<br />
Conference, 19-21 November, Hyderabad, India, 2008.<br />
7. M.A.H. Farquad, V. Ravi and S. Bapi Raju, “Rule Extraction from SVM for Analytical<br />
CRM: an Application to Predict Churn in Bank Credit Cards”, Decision Support<br />
Systems. (Under Review)<br />
8. M.A.H. Farquad, V. Ravi and S. Bapi Raju, “Analytical CRM using SVM: a Modified<br />
Active Learning Based Rule Extraction approach”, Information Sciences. (Under Review)<br />
References<br />
Andrews, R., Diederich, J. and Tickle, A., “Survey and Critique of Techniques for Extracting<br />
Rules from Trained Artificial Neural Networks”, Knowledge Based Systems, vol. 8, no.<br />
6, pp. 373-389, 1995.<br />
Barakat, N.H. and Bradley, A.P., “Rule Extraction from Support Vector Machines: Measuring<br />
the Explanation Capability Using the Area under the ROC Curve”, Proceedings of the<br />
18th International Conference on Pattern Recognition (ICPR'06), Hong Kong, 2006.<br />
Barakat, N.H. and Bradley, A.P., “Rule Extraction from Support Vector Machines: A<br />
Sequential Covering Approach”, IEEE Trans. Knowledge and Data Eng., vol. 19, no. 6,<br />
pp. 729-741, June 2007.<br />
Barakat, N.H. and Diederich, J., “Learning-based Rule-Extraction from Support Vector<br />
Machines”, Proceedings of the 14th International Conference on Computer Theory<br />
and applications ICCTA'2004, Alexandria, Egypt, 2004.<br />
Barakat, N.H. and Diederich, J., “Eclectic Rule-Extraction from Support Vector Machines”,<br />
Int’l J. Computational Intelligence, vol. 2, no. 1, pp. 59-62, 2005.<br />
Breiman, L., Friedman, J., Olshen, R. and Stone, C., “Classification and Regression Trees”,<br />
Wadsworth and Brooks, 1984.<br />
Chaves, Ad.C.F., Vellasco, M.M.B.R. and Tanscheit, R., “Fuzzy Rule Extraction from<br />
Support Vector Machines”, Fifth International Conference on Hybrid Intelligent<br />
Systems, Rio de Janeiro, Brazil, November 06-09, 2005.<br />
Clark, P. and Niblett, T., “The CN2 Induction Algorithm”, Machine Learning, vol. 3, no. 4,<br />
pp. 261-283, 1989.<br />
Craven, M.W., “Extracting Comprehensible Models from Trained Neural Networks”, PhD<br />
thesis, Department of Computer Science, University of Wisconsin-Madison, 1996.<br />
Craven, M. and Shavlik, J., “Extracting Tree-Structured Representations of Trained<br />
Networks”, Advances in Neural Information Processing Systems, vol. 8, D. Touretzky,<br />
M. Mozer, and M. Hasselmo, eds., pp. 24-30, The MIT Press,<br />
citeseer.ist.psu.edu/craven96extracting.html, 1996.<br />
Fu, X., Ong, C.J., Keerthi, S., Hung, G.G. and Goh, L., “Extracting the Knowledge<br />
Embedded in Support Vector Machines”, International Joint Conference on Neural<br />
Networks (IJCNN’04), Budapest, Hungary, 2004.<br />
Fung, G., Sandilya, S. and Rao, R., “Rule Extraction from Linear Support Vector Machines”,<br />
Proc. 11th ACM SIGKDD International Conference on Knowledge Discovery in Data<br />
Mining (KDD ’05), pp. 32-40, 2005.<br />
Markowska-Kaczmar, U. and Trelak, W., “Extraction of Fuzzy Rules from Trained Neural<br />
Network Using Evolutionary Algorithm”, Proc. European Symp. Artificial Neural<br />
Networks (ESANN ’03), pp. 149-154, 2003.<br />
Martens, D., Baesens, B. and Gestel, T.V., “Decompositional Rule Extraction from Support<br />
Vector Machines by Active Learning”, IEEE Transactions on Knowledge and Data<br />
Engineering, vol. 21, no. 2, pp. 178-191, 2009.<br />
Martens, D., Baesens, B., Gestel, T.V. and Vanthienen, J., “Comprehensible credit scoring<br />
models using rule extraction from support vector machines”, European Journal of<br />
Operational Research, vol. 183, pp. 1466-1476, 2007.<br />
Martens, D., De Backer, M., Haesen, R., Snoeck, M., Vanthienen, J. and Baesens, B.,<br />
“Classification with Ant Colony Optimization”, IEEE Trans. Evolutionary Computation,<br />
vol. 11, no. 5, pp. 651-665, 2007.<br />
Nunez, H., Angulo, C. and Català, A., “Rule Extraction from Support Vector Machines”,<br />
Proc. European Symp. Artificial Neural Networks (ESANN ’02), pp. 107-112, 2002.<br />
Nunez-Castro, H., Angulo-Bahon, C. and Catala-Mallofre, A., “Rule Based Learning Systems<br />
from SVM and RBFNN”, TENDENCIAS DE LA MINERIA DE DATOS EN ESPAÑA,<br />
Red Española de Minería de Datos, 1 ed., pp. 13-24, 2004. (Available online in English at<br />
http://www.lsi.us.es/redmidas/Capitulos/LMD02.pdf)<br />
Quinlan, J., “C4.5: Programs for Machine Learning”, Morgan Kaufmann, 1993.<br />
Zhang, Y., Su, H., Jia, T. and Chu, J., “Rule Extraction from Trained Support Vector<br />
Machines”, Lecture Notes in Computer Science, Springer Berlin / Heidelberg, vol. 3518,<br />
pp. 61-70, 2005.<br />