01.03.2013 Views

JST Vol. 21 (1) Jan. 2013 - Pertanika Journal - Universiti Putra ...

The Problem

Association Rule Mining

For the definition of the CuA problem, we follow Thabtah (2005). Let $T$ be the input training dataset with $k$ different attributes $A_1, A_2, \dots, A_k$, and let $L$ be a set of class labels. A specific value of attribute $A_i$ is denoted by $a_i$, and the class labels in $L$ are denoted by $l_j$.

Definition 1: An AttributeValue $(A_i, a_i)$ is a combination of between 1 and $k$ different attribute values, e.g. $\langle (A_1, a_1) \rangle$, $\langle (A_1, a_1), (A_2, a_2) \rangle$, $\langle (A_1, a_1), (A_2, a_2), (A_3, a_3) \rangle$, etc.

Definition 2: A class association rule (CAR) is given in the following format:

$(A_{i1}, a_{i1}) \wedge (A_{i2}, a_{i2}) \wedge \dots \wedge (A_{ik}, a_{ik}) \rightarrow l_i$,

where the antecedent is a conjunction of AttributeValues and the consequent is a class.

Definition 3: The frequency (freq) of a CAR $r$ in $T$ is the number of cases in $T$ that match $r$'s antecedent.

Definition 4: The support count (suppcount) of a CAR $r$ is the number of cases in $T$ that match $r$'s antecedent and belong to the class $l_i$ of $r$.

Definition 5: A CAR $r$ passes minsupp if suppcount($r$) / $|T|$ ≥ minsupp, where $|T|$ is the number of cases in $T$.

Definition 6: A CAR $r$ passes minconf if suppcount($r$) / freq($r$) ≥ minconf.
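Definitions 3 to 6 can be made concrete with a small Python sketch. The representation is an assumption made only for illustration: each case in T is a pair of an attribute-to-value dictionary and a class label, a rule antecedent is a dictionary of required attribute values, and the toy attributes (`outlook`, `windy`) and labels (`play`, `stay`) are hypothetical and do not come from the paper.

```python
from typing import Dict, List, Tuple

# Assumed representation: a case is (attribute -> value, class label).
Case = Tuple[Dict[str, str], str]


def matches(antecedent: Dict[str, str], attrs: Dict[str, str]) -> bool:
    """True if the case carries every attribute value in the antecedent."""
    return all(attrs.get(a) == v for a, v in antecedent.items())


def freq(antecedent: Dict[str, str], T: List[Case]) -> int:
    """Definition 3: number of cases in T matching the antecedent."""
    return sum(1 for attrs, _ in T if matches(antecedent, attrs))


def suppcount(antecedent: Dict[str, str], label: str, T: List[Case]) -> int:
    """Definition 4: cases matching the antecedent that also carry the rule's class."""
    return sum(1 for attrs, cls in T if cls == label and matches(antecedent, attrs))


def passes_minsupp(antecedent: Dict[str, str], label: str,
                   T: List[Case], minsupp: float) -> bool:
    """Definition 5: suppcount(r) / |T| >= minsupp."""
    return suppcount(antecedent, label, T) / len(T) >= minsupp


def passes_minconf(antecedent: Dict[str, str], label: str,
                   T: List[Case], minconf: float) -> bool:
    """Definition 6: suppcount(r) / freq(r) >= minconf (False if nothing matches)."""
    f = freq(antecedent, T)
    return f > 0 and suppcount(antecedent, label, T) / f >= minconf


# Hypothetical toy training set with two attributes and two class labels.
T: List[Case] = [
    ({"outlook": "sunny", "windy": "no"}, "play"),
    ({"outlook": "sunny", "windy": "yes"}, "stay"),
    ({"outlook": "sunny", "windy": "no"}, "play"),
    ({"outlook": "rainy", "windy": "no"}, "stay"),
]

# The rule (outlook, sunny) -> play.
antecedent = {"outlook": "sunny"}
```

For the rule (outlook, sunny) → play on this toy data, three cases match the antecedent and two of them belong to the class play, so freq = 3, suppcount = 2, the support is 2/4 and the confidence is 2/3.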

CuA Main Steps

Fig.1 depicts the main steps used in CuA. The first step is the discovery of frequent item sets. This requires methods that find the complete set of frequent items by separating out those that are potentially frequent and determining their frequencies in the training dataset (Step 1). A rule is produced if an item set exceeds the minconf threshold value. The rule has the form $X \rightarrow l$, where $l$ is the class that occurs most frequently with $X$ in the training dataset (Step 2). In Step 3, an effective subset of the rules is selected and ordered using various procedures, and in Step 4 the quality of the selected subset is measured on an independent (test) data set.

Fig.1: Main Steps in CuA (adopted from Thabtah, 2005)<br />
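The four steps can be sketched end to end in Python. This is a minimal illustration under stated assumptions, not the procedure of any particular CuA algorithm: item sets are enumerated by brute force, rules are ordered by confidence, then support, then antecedent length (one common convention, assumed here), prediction uses the first matching rule with a majority-class default, and the toy data and all function names are hypothetical.

```python
from collections import Counter
from itertools import combinations


def generate_rules(train, minsupp, minconf):
    """Steps 1 and 2: count item sets per class, then emit X -> l rules."""
    counts = {}
    for attrs, label in train:
        items = sorted(attrs.items())
        # Brute-force enumeration of every item set contained in this case.
        for size in range(1, len(items) + 1):
            for itemset in combinations(items, size):
                counts.setdefault(itemset, Counter())[label] += 1
    rules = []
    for itemset, class_counts in counts.items():
        # l is the class occurring most often together with X.
        label, supp = class_counts.most_common(1)[0]
        frequency = sum(class_counts.values())
        support = supp / len(train)
        confidence = supp / frequency
        if support >= minsupp and confidence >= minconf:
            rules.append((itemset, label, support, confidence))
    return rules


def rank_rules(rules):
    """Step 3 (assumed ordering): confidence, then support, then shorter antecedent."""
    return sorted(rules, key=lambda r: (-r[3], -r[2], len(r[0])))


def predict(ranked, attrs, default):
    """Classify with the first ranked rule whose antecedent matches the case."""
    for itemset, label, _, _ in ranked:
        if all(attrs.get(a) == v for a, v in itemset):
            return label
    return default


def accuracy(ranked, test, default):
    """Step 4: fraction of independent test cases classified correctly."""
    correct = sum(1 for attrs, label in test if predict(ranked, attrs, default) == label)
    return correct / len(test)


# Hypothetical toy data, for illustration only.
train = [
    ({"outlook": "sunny", "windy": "no"}, "play"),
    ({"outlook": "sunny", "windy": "no"}, "play"),
    ({"outlook": "sunny", "windy": "yes"}, "stay"),
    ({"outlook": "rainy", "windy": "yes"}, "stay"),
    ({"outlook": "rainy", "windy": "no"}, "stay"),
]
test = [
    ({"outlook": "sunny", "windy": "no"}, "play"),
    ({"outlook": "rainy", "windy": "yes"}, "stay"),
]

rules = generate_rules(train, minsupp=0.3, minconf=0.6)
ranked = rank_rules(rules)
default_class = Counter(label for _, label in train).most_common(1)[0][0]
test_accuracy = accuracy(ranked, test, default_class)
```

Brute-force enumeration grows exponentially with the number of attributes, so it is only suitable for a toy example; practical algorithms prune candidates that cannot be frequent before counting them.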

Pertanika J. Sci. & Technol. 21 (1): 283 - 298 (2013)
