
An Empirical Study on Classification of Non-Functional Requirements

Wen Zhang, Ye Yang, Qing Wang, Fengdi Shu
Laboratory for Internet Software Technologies
Institute of Software, Chinese Academy of Sciences
Beijing 100190, P.R. China
{zhangwen,ye,wq,fdshu}@itechs.iscas.ac.cn

Abstract—The classification of NFRs brings the benefit that NFRs of the same type in a system can be considered and implemented aggregately by developers and, as a result, verified by the quality assurers assigned to that type. This paper conducts an empirical study on using text mining techniques to classify NFRs automatically. Three kinds of index terms at different levels of linguistic semantics, namely N-grams, individual words, and multi-word expressions (MWEs), are used to represent NFRs. Then, an SVM (Support Vector Machine) with a linear kernel is used as the classifier. We collected a data set from the PROMISE web site for experimentation in this empirical study. The experiments show that individual words with Boolean weighting outperform the other two kinds of index terms. When MWEs are used to enhance the representation based on individual words, there is no significant improvement in classification performance. Automatic classification performs better on categories of large sizes than on categories of small sizes. We draw from the experimental results that, for automatic classification of NFRs, individual words are the best index terms for representing short NFR descriptions, and that as many NFRs of a software system as possible should be collected.

Keywords—Non-Functional Requirements, Automatic Classification, Support Vector Machine.

I. INTRODUCTION

Non-functional requirements (NFRs) describe the expected qualities of a software system, such as usability, security, and the look and feel of the user interface. These qualities impact the architectural design of the software system [1] and the satisfaction of the stakeholders relevant to it [2] [3] [4]. NFRs are put forward by stakeholders with no expertise in software engineering, and their number is often very large. Moreover, NFRs are scattered over the functional requirements, and the types of the NFRs are unknown because they are written in natural language. These characteristics of NFRs make their classification a labor-intensive and time-consuming job without the adoption of automatic techniques. Nevertheless, software developers cannot ignore NFRs because they are decisive for the success of the software system.

Compared with functional requirements, NFRs tend to be properties of a system as a whole and hence cannot be verified for individual components or modules [19] [23]. This makes the classification of NFRs necessary in development because, on the one hand, we need to handle NFRs in a different way from functional requirements, and different types of NFRs should be handled by developers with different expertise.

For instance, in security-critical, mission-critical, or economically vital software systems, formal methods with model checking of the correctness of the specification of system security are used to determine whether or not the ongoing behavior holds for the system's structure. However, a serious (and obvious) drawback of model checking is the state-explosion problem, because the size of the global state graph is (at least) exponential in the size of the program text [21]. Thus, an exigent demand from system developers is to identify the security-related NFRs in the requirements document, express them using formal specifications, and then verify the satisfaction of the system's properties against those security specifications.

On the other hand, the classification of NFRs supports an overall decision on the system's satisfaction of each category of NFRs and on the progress developers have made in system development. Functional requirements can be measured as either satisfied or not satisfied, but NFRs cannot be measured merely on a linear scale as a degree of satisfaction [25]. In system testing, for example, we can customize our test strategy based on the classification of NFRs, so that the system behaviors related to NFRs are reported directly to the project manager.

Most, if not all, projects have resource limitations and time constraints, and different requirements concern different aspects of the software system. Even after the requirements are elicited and collected, they are still just an unorganized set of data. As such, it is necessary to categorize the requirements into different types so as to match problems and solutions in separate domains of discourse, especially for large-scale software projects with hundreds or even thousands of requirements.

In human resource allocation and optimization [22], different developers possess different expertise in handling various aspects of software development. Different tasks in development may need different expertise and capabilities from the developers. Thus, matching developers to tasks is at the core of the success of software development. The identification of different types of NFRs results in the formation of different types of development tasks. Consequently, those tasks are assigned to developers aggregately according to their expertise and capability level. For instance, the NFRs of type "performance" and the NFRs of type "maintainability" should be dealt with by developers with different expertise. We forward NFRs of type "look and feel" to the UI (User Interface) design experts of the system so that the satisfaction of these NFRs can be ensured throughout the whole system.

Methods for the elicitation of NFRs include questionnaires, checklists, and templates for inquiring of stakeholders concerning quality issues [24]. Basically, NFRs are textual sentences whose contents concern the expected qualities of a software system. For instance, "The application shall match the color of the schema set forth by Department of Homeland Security. LF" is a typical NFR record fetched from the data set; it comprises two parts: a textual description and the NFR type "LF". The description of an NFR is very simple and easy to understand, and it does not require professional knowledge of engineering aspects. Thus, text mining, which combines natural language processing (NLP) and statistical machine learning, can be used for the task of automatically classifying NFRs.

Although there is substantial literature on NFR classification [2] [4] [5] [17], we find two problems. First, most methods are purely intuitive and derived without theoretical support or a mathematical model. Also, some methods are extremely labor-intensive and time-consuming, and others are qualitative, without quantitative analysis. Second, the index terms used in existing automatic classification of NFRs are keywords extracted directly from requirements without feature selection. This causes huge dimensionality of the NFR vectors and consequently brings about huge computation when quantitative methods are used. Most importantly, we are uncertain whether index terms other than keywords are more appropriate for automatic classification of NFRs.

The questions devised for this empirical study are: 1) among existing indexing methods, which one performs best for automatic classification of NFRs, and 2) is it possible to produce better performance with SVM than that reported in previous work?

The remainder of this paper is organized as follows. Section 2 describes the research approach employed in this paper. Section 3 presents experiments using SVM and different index terms to classify NFRs. Section 4 presents the related work, and Section 5 concludes this paper.

II. RESEARCH APPROACH

The research approach adopted in this empirical study to automate the classification of NFRs is shown in Figure 1. First, the NFRs' textual descriptions are represented using different types of index terms, namely N-grams, individual words, and MWEs, respectively; we thereby transform each NFR's textual description into numerical vectors in the different feature spaces determined by the different types of index terms. Second, a support vector machine (SVM), a popular classifier in machine learning [6] [9], is used to classify the NFR vectors. Finally, the performance of each classification is evaluated.

A. Index Terms

We collected the requirements data set from the PROMISE web site (http://promisedata.org/repository); it contains 625 records of both functional and non-functional requirements.

Fig. 1. Procedures of automatic classification of NFRs.

N-gram Representation. N-grams were proposed in text mining to categorize documents with textual errors, such as spelling and grammatical errors, and have proved to be an effective technique for handling these kinds of errors [7]. An N-gram is an N-character contiguous fragment of a long string. Since every string is decomposed into small fragments, any errors present in words only affect a limited number of those fragments. If we measure the similarity of two strings based on their N-grams, we find that their similarity is immune to most textual errors. We used bi-grams (344 2-grams) and tri-grams (1,295 3-grams) for the representation of NFRs, respectively.
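As a minimal sketch of this scheme (our own illustration, not the authors' code), the following extracts overlapping character N-grams and compares strings by the overlap of their N-gram sets; the error tolerance described above follows because a single misspelled character disturbs only a few fragments:

```python
def char_ngrams(text, n):
    """Extract the set of overlapping character n-grams of a string."""
    s = text.lower()
    return {s[i:i + n] for i in range(len(s) - n + 1)}

def ngram_similarity(a, b, n=3):
    """Jaccard similarity over n-gram sets; small spelling errors
    only affect a few fragments, so similar words stay close."""
    ga, gb = char_ngrams(a, n), char_ngrams(b, n)
    return len(ga & gb) / len(ga | gb) if ga | gb else 0.0
```

For example, the misspelling "requirment" remains far closer to "requirement" under this measure than an unrelated word such as "security".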

Word Representation. The method of using the individual words of a text's content to represent the text can be traced to Salton et al. [10]. We follow this idea and use the words in NFRs for representation. First, we eliminated stop words from the NFRs' descriptions 1. Second, word stemming 2 was conducted to map word variants to the same stem. We set the minimum length of a stem to 2; that is, only stems with more than 2 characters are accepted as stems of words. Thus, 1,127 individual word stems were produced from the data set.
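The pipeline can be sketched as follows. This is an illustration under simplifying assumptions: the stop-word list is a tiny placeholder for the USPTO list the paper uses, and the suffix stripper is a crude stand-in for the Porter stemmer:

```python
import re

# Tiny illustrative stop-word list; the paper uses the USPTO stop-word list.
STOP_WORDS = {"the", "shall", "of", "a", "an", "to", "be", "and", "in", "by"}

def naive_stem(word):
    """Crude suffix stripper standing in for the Porter stemmer."""
    for suffix in ("ization", "ation", "ing", "ies", "ly", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) > 2:
            return word[:-len(suffix)]
    return word

def index_terms(sentence, min_stem_len=2):
    """Tokenize, drop stop words, stem, and keep stems of more than 2 characters."""
    tokens = re.findall(r"[a-z]+", sentence.lower())
    stems = (naive_stem(t) for t in tokens if t not in STOP_WORDS)
    return [s for s in stems if len(s) > min_stem_len]
```

Applied to the example NFR above, the description reduces to a short list of stems such as "applic", "match", "color", "schema".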

MWE Representation. We used the method proposed by Justeson and Katz [12] for MWE extraction from NFRs. The basic idea behind this method is that an MWE should comprise 2 to 6 individual words and should occur more than twice in a collection of documents. Moreover, the parts of speech of the individual words of an MWE should match the regular expression described in Formula 1.

((A | N)+ | (A | N)* (NP)? (A | N)*) N   (1)

Here, A denotes an adjective, N denotes a noun, and P denotes a preposition. Using the method adopted from [11], we extracted 93 MWEs from the data set.

1 We obtained the stop words from http://ftp.uspto.gov/patft/help/stopword.htm
2 We used the Porter stemming algorithm, available at http://tartarus.org/martin/PorterStemmer/
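The Justeson-Katz filter can be sketched as below. The tiny part-of-speech lexicon is purely illustrative (a real implementation would use a trained POS tagger), and the pattern is applied to the concatenated tag string of each candidate span:

```python
import re
from collections import Counter

# Toy POS lexicon for illustration only; in practice a trained tagger
# supplies the A (adjective), N (noun), P (preposition) tags.
POS = {"response": "N", "time": "N", "secure": "A", "login": "N",
       "of": "P", "system": "N", "color": "N", "schema": "N"}

# Formula 1 applied to the tag string of a candidate span.
TAG_PATTERN = re.compile(r"([AN]+|[AN]*(NP)?[AN]*)N")

def candidate_mwes(tokens, min_len=2, max_len=6):
    """Spans of 2 to 6 words whose POS tags match the Justeson-Katz pattern."""
    out = []
    for i in range(len(tokens)):
        for j in range(i + min_len, min(i + max_len, len(tokens)) + 1):
            tags = "".join(POS.get(t, "X") for t in tokens[i:j])
            if TAG_PATTERN.fullmatch(tags):
                out.append(" ".join(tokens[i:j]))
    return out

def extract_mwes(documents):
    """Keep candidates occurring more than twice across the collection."""
    counts = Counter(m for doc in documents for m in candidate_mwes(doc))
    return sorted(m for m, c in counts.items() if c > 2)
```

A noun-noun span such as "response time" matches the pattern (tag string "NN") and survives as an MWE once it has been seen more than twice in the collection.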


B. Feature Selection

In this paper, we adopt information gain (IG) [8], a classic method for feature selection in machine learning, to select informative index terms for the automatic classification of NFRs. IG is defined as the expected reduction in entropy caused by partitioning the NFRs according to a given term. The formula for IG is presented in Equation 2 and the formula for entropy in Equation 3.

Gain(S, A) = Entropy(S) - \sum_{v \in Value(A)} (|S_v| / |S|) Entropy(S_v)   (2)

Entropy(S) = \sum_{i=1}^{c} -p_i \log p_i   (3)

Here, S is the collection of all NFRs labeled with their types, such as PE (performance) and US (usability). Value(A) is the set of all possible values of index term A, S_v is the subset of S for which A has value v, c is the number of categories of all NFRs, and p_i is the proportion of the NFRs that belong to category i. To observe the performance of the textual features on automatic classification dynamically, different feature sets of textual features are constructed at different removal percentages of low-IG terms (see also the removal ratio in Section 3.2).
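For a Boolean index term, Equations 2 and 3 reduce to the following computation (a sketch of the standard IG definition, not the authors' implementation):

```python
import math
from collections import Counter

def entropy(labels):
    """Equation 3: entropy of a collection of category labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(labels, term_present):
    """Equation 2 for a Boolean index term: the expected entropy
    reduction when NFRs are partitioned by presence/absence of the term."""
    n = len(labels)
    gain = entropy(labels)
    for v in (True, False):
        subset = [lab for lab, p in zip(labels, term_present) if p == v]
        if subset:
            gain -= len(subset) / n * entropy(subset)
    return gain
```

A term that perfectly separates two categories has IG equal to the full entropy of the label set, while a term distributed independently of the labels has IG near zero; ranking terms by this value is what the removal ratio below operates on.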

C. SVM Classifier

The classifier used for automatic classification is the support vector machine (SVM) [6] [9], introduced in statistical machine learning. We selected this method because Gokyer et al. [9] used it to map NFRs to architectural concerns and achieved the desired effectiveness. In this paper, the linear kernel (u*v) is used for SVM training because prior research has validated that it is superior to non-linear kernels for classifying textual contents [8] [13].
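A minimal sketch of this setup using scikit-learn's SVC (toy Boolean vectors, not the PROMISE data or the authors' tooling):

```python
from sklearn.svm import SVC

# Toy Boolean NFR vectors (presence/absence of three index terms);
# labels mark membership in one positive category versus the rest.
X = [[1, 0, 1], [1, 1, 0], [0, 0, 1], [0, 1, 1]]
y = [1, 1, 0, 0]

clf = SVC(kernel="linear")  # linear kernel u*v, as used in the paper
clf.fit(X, y)
print(clf.predict([[1, 0, 0]])[0])
```

Here the first index term perfectly separates the two classes, so an unseen vector containing only that term is assigned to the positive class.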

III. EXPERIMENT

A. Categories of the data set

Table 1 lists the categories of both functional and non-functional requirements in the data set and their numbers. Functional requirements occupy the largest proportion of the data set, amounting to 40.8% (255/625). Moreover, there is a skewed distribution among the NFR categories in the data set. For instance, the largest category, "usability", has 67 cases, while the smallest category, "Palm Operational", has only one case. We adopt binary classification in this paper because a skewed distribution often deteriorates the performance of multi-class classification [28].

In binary classification, if the number of positive (negative) cases is much larger than the number of negative (positive) cases, the performance of automatic classification may not be convincing because of the biased distribution of data points. For example, if 90% of the cases in a data set are positive, then the accuracy of automatic classification reaches 90% even if the classifier simply predicts a positive label for every data point. For this reason, the categories "usability", "security", and "Look And Feel", which have relatively medium sizes in the data set, are used as the positive classes for binary classification one by one, and meanwhile the NFRs in the remaining categories are used as the negative classes.

TABLE I
CATEGORIES OF REQUIREMENTS AND THEIR NUMBERS IN THE DATA SET

Abbr.  Full Name         Num
F      Functional        255
US     Usability          67
SE     Security           66
O      Operational        62
PE     Performance        54
LF     Look And Feel      38
A      Availability       21
SC     Scalability        21
MN     Maintainability    17
L      Legal              13
FT     Fitness            10
PO     Palm Operational    1

B. Classification Process

There are actually two kinds of work involved in representing textual contents: indexing and term weighting. Indexing is the job of assigning index terms to textual contents, and term weighting is the job of assigning weights to terms, to measure the importance of index terms in textual documents. We employ Boolean values as term weights because most NFRs in the data set are very short, containing fewer than 20 index terms, so they do not need complex weighting schemes such as those mentioned for document representation [15].
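Boolean weighting amounts to a simple presence/absence vector over the vocabulary, as in this sketch (the vocabulary shown is a toy example, not the paper's 1,127-stem set):

```python
def boolean_vector(nfr_terms, vocabulary):
    """Boolean term weighting: 1 if the index term occurs in the NFR, else 0."""
    present = set(nfr_terms)
    return [1 if term in present else 0 for term in vocabulary]

vocab = ["color", "match", "secur", "respons"]  # toy stemmed vocabulary
vec = boolean_vector(["match", "color", "schema"], vocab)  # presence flags per term
```

Terms that occur in the NFR but not in the vocabulary (here "schema") simply contribute nothing to the vector.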

IG is employed to change the percentage of index terms used for representation. A removal ratio is predefined to remove the index terms with small entropy. For instance, if we set the removal ratio to 0.1 for the representation with individual words, then the 90% of individual words with smaller entropy are eliminated from the feature set, and we use only the remaining 10% of individual words for the representation of NFRs. The purpose of varying the percentage of index terms used for representation is to observe the robustness of the classification performance as the set of index terms becomes smaller and smaller. This is especially important for deciding which type of index terms should be used to represent NFRs when the computational capacity is not enough to support high-dimensional vectors in classifier training. In the representation with MWEs, we use all 1,127 individual words plus a proportion of the MWEs, defined by the removal ratio, as the index terms.

The experiments in this paper are carried out with the 10-fold cross-validation technique. In each experiment, we divide the whole data set (both positive and negative classes) into 10 subsets. Nine of the 10 subsets are used for training, and the remaining subset is used for testing. We repeat the experiment 10 times, and the performance of the classification is measured as the average precision and recall [2] over the 10 repetitions.
repetiti<strong>on</strong>s.


C. Experimental Results

The outcome of our experiments shows that the precisions of all the automatic classification tasks are significantly higher than those of the classification approach proposed by Cleland-Huang et al. [2]. Yet the recalls of the automatic classifications in this paper are not comparable to those of the classification proposed by Cleland-Huang et al. We do not list the precisions and recalls produced in our experiments due to space limitations (readers interested in the precisions and recalls are welcome to ask the authors for more details).

We conjecture that the high recalls of the results of Cleland-Huang et al. [2] can be attributed to the small number of index terms (between 10 and 20) they employed in their experiments, which is much smaller than the number of index terms (between 100 and 1,000) used in our experiments. When a small number of index terms is used, a larger number of NFRs is classified as relevant to the category. Consequently, more irrelevant NFRs are misclassified as relevant. Consider the extreme case of classifying NFRs as relevant to "Usability": if all NFRs are regarded as relevant to "Usability", then the recall of the automatic classification will be 100%. However, such a classification result is of little help for automating the task of classification. Thus, we argue that for the automatic classification of NFRs, precision should be given more importance as long as recall is acceptable.

The F-measure [16], described in Equation 4, combines both precision and recall for performance evaluation, and we used it as the indicator for performance evaluation. In general, the larger the F-measure, the better the classification result. Here, for the purpose of comparison, we use the F-measures of Cleland-Huang et al. [2] as the baseline performance in the experiments.

F-measure = (2 × precision × recall) / (precision + recall)   (4)

Figures 2-4 show the experimental results of classifying the three assigned categories, "Usability", "Security", and "Look And Feel", at different removal ratios, using four types of index terms (2-grams, 3-grams, words, and MWEs) to represent NFRs. First, it can be seen that the representation with individual words has the best performance, measured by F-measure, on nearly all three categories. In some cases, the representation with terminologies improves the precision of automatic classification over the representation with individual words (when the removal ratio reaches 0.9). That is, even if most index terms are removed from the feature set, the performance does not decline drastically. However, in most cases, the representation with terminologies is unable to improve the performance of automatic NFR classification compared with the representation with individual words, and it is even worse than the 3-gram representation. Both the representation with individual words and that with terminologies produce very robust classifications. This outcome implies that for automatic classification of NFRs, the representation with individual words is sufficient to produce a desirable performance. This outcome is very different from the experimental results produced by Zhang et al

Fig. 2. Classification performances on category "Usability".

Fig. 3. Classification performances on category "Security".

Fig. 4. Classification performances on category "Look And Feel".

(Each figure plots F-measure against removal ratios from 0.9 down to 0 for the 2-gram, 3-gram, individual-word, and terminology representations, together with the baseline.)


[8]. They argued that MWEs are superior to individual words for classifying news documents automatically. Our explanation is that the NFRs in our data set are much shorter (20 individual words on average) than the Reuters-21578 news documents (80 individual words on average), so the semantics inherent in the textual contents of NFRs are not as important as those inherent in news documents.

Second, for the representation with N-grams, the 3-gram representation outperforms the 2-gram representation. This outcome can be attributed to the fact that the number of 3-grams is much larger than the number of 2-grams (see Section 2.1). Moreover, the robustness of the 3-gram representation is better than that of the 2-gram representation, because the magnitudes of the variations in F-measure of the 3-gram representation are smaller than those produced by the 2-gram representation.

Third, regarding classification on the different categories, the performance on "Security" is almost the same as that on "Usability", and both outperform that on "Look And Feel". The number of NFRs in category "Security" (66) is approximately equal to that in category "Usability" (67), and both are much larger than the number in "Look And Feel" (38). Our explanation is that automatic classification produces more favorable performance on categories of large sizes than on those of small sizes.

Based on the experimental results, the answer to question 1 posed for this case study in Section 1 is that, to date, keywords (individual words) are the most effective index terms for automatic classification of NFRs; for question 2, our answer is that machine learning techniques, at least SVM, can improve the classification significantly.

IV. RELATED WORK

NFRs have attracted much interest from researchers in the software engineering field, and much work has been invested in managing them. Tran and Chung [20] proposed a prototype tool for explicit representation of NFRs. They aim to consider NFRs from a more goal-oriented perspective and argue that NFRs strongly influence the design of the solution as well as reinforce the engineering process. To formulate the implicit relationships among NFRs, their tool uses an ontology and graphic visualization to express softgoals and their interdependencies explicitly.

Cleland-Huang et al. [4] introduce a goal-centric approach to managing the impact of change upon the NFRs of a software system. They partition NFRs into different softgoals of a system and construct a softgoal interdependency graph (SIG) to trace both direct and indirect impacts of software changes on NFRs. A probabilistic network model is employed to enable the traceability of impacts. However, constructing a SIG is labor-intensive and time-consuming because of the lack of automatic approaches. If we can classify NFRs into different softgoals automatically, it becomes much easier to identify the relations between subgoals and softgoals; that is, the human workload involved in constructing a SIG will be reduced to a great extent.

In another work, Cleland-Huang et al. [2] proposed an information retrieval method to discover and identify NFRs from system specifications. Their basic assumption is that different types of NFRs are characterized by distinct keywords (index terms) that can be learned from documents of that type. However, their method of selecting index terms is ad hoc and does not consider any linguistic properties of those index terms. Moreover, the classifier used in their method, which is based on the additive weights of index terms for a given NFR type, is quite simple and lacks theoretical grounding.
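An additive-weight classifier of the kind described above can be sketched as follows. This is only a scheme in the spirit of [2], not their actual method: the keywords and weights here are invented for illustration, whereas in [2] they are mined from previously classified requirements.

```python
# Hypothetical per-type keyword weights (invented for illustration;
# in practice they would be learned from classified NFRs).
WEIGHTS = {
    "security":  {"authenticate": 2.0, "encrypt": 1.5, "access": 1.0},
    "usability": {"intuitive": 2.0, "easy": 1.5, "learn": 1.0},
}

def classify(nfr):
    """Sum the weights of matched index terms per NFR type and
    return the highest-scoring type."""
    tokens = nfr.lower().split()
    scores = {t: sum(w for term, w in kw.items() if term in tokens)
              for t, kw in WEIGHTS.items()}
    return max(scores, key=scores.get)

print(classify("Only authorized users shall authenticate before access"))
```

Such a scorer has no probabilistic interpretation and no principled way to break ties or calibrate weights, which is the theoretical weakness noted above and a motivation for using SVM instead.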

Rosenhainer [14] proposed aspect mining to identify crosscutting concerns in requirements specifications. Two techniques are suggested for aspect mining: identification through inspection and identification supported by information retrieval (IR). The former is manual and the latter is semi-automatic. Rosenhainer argued that the IR-based technique is more promising than the manual method, and their experimental results on interactions between functional and non-functional requirements validated this argument. This work encourages our use of IR techniques in this paper for automatic classification of NFRs.

Casamayor et al. [24] employed the naïve Bayes and EM (Expectation Maximization) algorithms as a semi-supervised learning approach to classify non-functional requirements in textual specifications. They used the same data set as this paper and reported that their algorithm produced an average accuracy above 70%. In fact, the performance of SVM in our experiments is better than theirs because we exclude the categories with small numbers of data points; that is, the unbalanced distribution of data points is purposely alleviated in our experiments. Moreover, our conjecture is also validated by their experimental results: the categories with larger numbers of data points, such as "Usability", "Security", and "Look And Feel", were classified with more favorable performance.
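The supervised core of the approach in [24], multinomial naïve Bayes over bag-of-words vectors, can be sketched as below. This is a minimal illustration with an invented toy corpus; the EM step of [24], which re-estimates the parameters from unlabeled requirements, is omitted here.

```python
import math
from collections import Counter

def train_nb(docs, labels):
    """Multinomial naive Bayes with Laplace smoothing over
    whitespace tokens. Returns (class priors, per-class term probs)."""
    vocab = {w for d in docs for w in d.split()}
    priors, cond = {}, {}
    for c in set(labels):
        cdocs = [d for d, l in zip(docs, labels) if l == c]
        priors[c] = len(cdocs) / len(docs)
        counts = Counter(w for d in cdocs for w in d.split())
        total = sum(counts.values())
        cond[c] = {w: (counts[w] + 1) / (total + len(vocab)) for w in vocab}
    return priors, cond

def predict(model, doc):
    """Pick the class maximizing the log-posterior of the tokens."""
    priors, cond = model
    scores = {c: math.log(priors[c]) +
                 sum(math.log(cond[c].get(w, 1e-9)) for w in doc.split())
              for c in priors}
    return max(scores, key=scores.get)

# Toy labeled NFRs, invented for illustration.
model = train_nb(
    ["encrypt user passwords", "authenticate every request",
     "interface must be intuitive", "screens must be easy to use"],
    ["security", "security", "usability", "usability"])
print(predict(model, "encrypt every request"))
```

The unseen-word fallback (`1e-9`) is a crude choice made for brevity; a fuller implementation would smooth out-of-vocabulary terms consistently.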

V. CONCLUDING REMARKS

NFRs are crucial to the success of a software system because they describe the qualities a system must have to avoid devastating effects and system failure [17]. In this empirical study, the vector space model and machine learning techniques are employed to classify NFRs automatically. We used different index terms to transform NFRs into numeric vectors and examined their performance in automatic classification of NFRs. The machine learning classifier we adopted in this paper is SVM with a linear kernel, which is widely recommended as a promising classifier for text mining. Information gain is introduced for feature selection and SVM for automatic classification. We conducted experiments using the data set collected from the PROMISE repository.
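The information-gain criterion used above for feature selection can be sketched in pure Python as IG(t) = H(C) - P(t)H(C|t) - P(not t)H(C|not t). The toy corpus below is invented for illustration.

```python
import math

def entropy(probs):
    """Shannon entropy in bits of a probability distribution."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def info_gain(docs, labels, term):
    """Information gain of a term over a labeled document collection."""
    n = len(docs)
    def class_entropy(idx):
        idx = list(idx)
        if not idx:
            return 0.0
        counts = {}
        for i in idx:
            counts[labels[i]] = counts.get(labels[i], 0) + 1
        return entropy([c / len(idx) for c in counts.values()])
    with_t = [i for i, d in enumerate(docs) if term in d.split()]
    without_t = [i for i in range(n) if i not in with_t]
    return (class_entropy(range(n))
            - len(with_t) / n * class_entropy(with_t)
            - len(without_t) / n * class_entropy(without_t))

# Toy labeled NFRs, invented for illustration.
docs = ["encrypt all traffic", "encrypt stored passwords",
        "interface shall be intuitive", "menus shall be easy to learn"]
labels = ["security", "security", "usability", "usability"]

# "encrypt" perfectly separates the two classes here, so its
# information gain equals the full class entropy H(C) = 1 bit.
print(info_gain(docs, labels, "encrypt"))
```

Ranking terms by this score and keeping the top ones is the usual way information gain is applied before training the SVM.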

The experimental results show that individual words, when used as index terms, yield the best performance in classifying NFRs automatically. We noticed that, in most cases, the more samples a category has in the data set, the better the performance the automatic classifier produces on that category. This outcome illustrates that the number of NFRs in the data set is an important factor in automating the classification of NFRs, and it suggests that we need to collect as many NFRs as possible if automatic classification is desired.

This work can currently be applied in at least two aspects of software engineering. First, it can be used to identify NFRs in requirements specifications. Usually, customers, testers, and stakeholders of a software system mix their desired qualities in a specification. Whereas functional requirements describe what the system needs to do, NFRs describe constraints on the solution space and capture a broad spectrum of properties, such as usability and security [18]. Because the solutions for functional requirements and NFRs differ, and the times at which these two kinds of requirement are considered in system design differ, we must differentiate them. Second, because different NFRs are often handled by different designers and developers with different background knowledge of the architecture, it is crucial to classify NFRs into categories so that the NFRs in the same category can be processed as one comprehensive requirement in system design.

ACKNOWLEDGMENTS

This work is supported by the National Natural Science Foundation of China under Grant Nos. 90718042, 60873072, 61073044, and 60903050; the National Science and Technology Major Project; the National Basic Research Program under Grant No. 2007CB310802; and the Scientific Research Foundation for the Returned Overseas Chinese Scholars, State Education Ministry.

REFERENCES

[1] Y. Yang, J. Bhuta, D. Port, and B. Boehm, Value-Based Processes for COTS-Based Applications, IEEE Software, 22(4), 54-62, July/August 2005.
[2] J. Cleland-Huang, R. Settimi, X. Zou, and P. Solc, The Detection and Classification of Non-Functional Requirements with Application to Early Aspects, In Proceedings of the 14th IEEE International Requirements Engineering Conference, 36-45, 2006.
[3] B. Nuseibeh, Weaving Together Requirements and Architecture, IEEE Computer, 34(3), 115-117, 2001.
[4] J. Cleland-Huang et al., Goal-Centric Traceability for Managing Non-Functional Requirements, In Proceedings of the 27th International Conference on Software Engineering, 362-371, 2005.
[5] D. Zhang and J. J. P. Tsai, Machine Learning and Software Engineering, Software Quality Journal, 11, 87-119, 2003.
[6] V. Vapnik, The Nature of Statistical Learning Theory, Springer, New York, 1995.
[7] W. B. Cavnar and J. M. Trenkle, N-Gram-Based Text Categorization, In Proceedings of SDAIR-94, 3rd Annual Symposium on Document Analysis and Information Retrieval, 161-175, 1994.
[8] W. Zhang, T. Yoshida, and X. J. Tang, Text classification based on multi-word with support vector machine, Knowledge-Based Systems, 21(8), 879-886, 2008.
[9] G. Gokyer et al., Non-functional Requirements to Architectural Concerns: ML and NLP at Crossroads, In Proceedings of the International Conference on Software Engineering Advances, 2008.
[10] G. Salton, A. Wong, and C. S. Yang, A Vector Space Model for Automatic Indexing, Communications of the ACM, 18(11), 613-620, 1975.
[11] W. Zhang, T. Yoshida, and X. J. Tang, Improving effectiveness of mutual information for substantival multiword expression extraction, Expert Systems with Applications, 36(8), 10919-10930, 2009.
[12] J. S. Justeson and S. M. Katz, Technical terminology: some linguistic properties and an algorithm for identification in text, Natural Language Engineering, 1(1), 9-27, 1995.
[13] Y. Yang and X. Liu, A re-examination of text categorization methods, In Proceedings of the 22nd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, 42-49, 1999.
[14] L. Rosenhainer, Identifying Crosscutting Concerns in Requirements Specifications, In Proceedings of the OOPSLA Early Aspects 2004: Aspect-Oriented Requirements Engineering and Architecture Design Workshop, 2004.
[15] G. Salton and C. Buckley, Term-weighting approaches in automatic text retrieval, Information Processing and Management, 24(5), 513-523, 1988.
[16] J. Han and M. Kamber, Data Mining: Concepts and Techniques, Morgan Kaufmann Publishers, New York, 2006.
[17] F. Brooks, No Silver Bullet: Essence and Accidents of Software Engineering, IEEE Computer, 20(4), 10-19, 1987.
[18] I. Sommerville and P. Sawyer, Viewpoints: Principles, Problems, and a Practical Approach to Requirements Engineering, Annals of Software Engineering, 3, 101-130, 1997.
[19] B. Nuseibeh and S. Easterbrook, Requirements Engineering: A Roadmap, In Proceedings of the Conference on The Future of Software Engineering (ICSE '00), 35-46, 2000.
[20] Q. Tran and L. Chung, NFR-Assistant: Tool Support for Achieving Quality, In Proceedings of the IEEE Symposium on Application-Specific Systems and Software Engineering and Technology (ASSET '99), 1999.
[21] E. A. Emerson, The beginning of model checking: A personal perspective, 25 Years of Model Checking, 27-45, 2008.
[22] J. Xiao, L. J. Osterweil, Q. Wang, and M. Li, Disruption-Driven Resource Rescheduling in Software Development Processes, In Proceedings of the 4th International Conference on Software Process, 234-247, 2010.
[23] L. Chung and J. Prado Leite, On Non-Functional Requirements in Software Engineering, in A. T. Borgida et al. (Eds.): Mylopoulos Festschrift, LNCS 5600, 363-379, 2009.
[24] A. Casamayor, D. Godoy, and M. Campo, Identification of non-functional requirements in textual specifications: A semi-supervised learning approach, Information and Software Technology, 52, 436-445, 2010.
[25] F. Tsui and O. Karam, Essentials of Software Engineering (Second Edition), Jones and Bartlett Publishers, Sudbury, Massachusetts, 2011.
[26] G. Salton and M. J. McGill, Introduction to Modern Information Retrieval, McGraw-Hill Book Company, New York, NY, USA, 1986.
[27] J. Davis and M. Goadrich, The relationship between precision-recall and ROC curves, In Proceedings of the 23rd International Conference on Machine Learning, 233-240, 2006.
[28] J. Weston and C. Watkins, Multi-class support vector machines, In Proceedings of the 7th European Symposium on Artificial Neural Networks, 219-224, 1999.
