An Empirical Study on Classification of Non-Functional Requirements
Wen Zhang, Ye Yang, Qing Wang, Fengdi Shu
Laboratory for Internet Software Technologies
Institute of Software, Chinese Academy of Sciences
Beijing 100190, P.R. China
{zhangwen,ye,wq,fdshu}@itechs.iscas.ac.cn
Abstract—Classifying NFRs brings the benefit that NFRs of the same type in a system can be considered and implemented together by developers and, as a result, verified by the quality assurers assigned to that type. This paper conducts an empirical study on using text mining techniques to classify NFRs automatically. Three kinds of index terms at different levels of linguistic semantics, namely N-grams, individual words, and multi-word expressions (MWEs), are used to represent NFRs. Then, an SVM (Support Vector Machine) with a linear kernel is used as the classifier. We collected a data set from the PROMISE web site for experimentation in this empirical study. The experiments show that individual words with Boolean weighting outperform the other two kinds of index terms. When MWEs are used to enhance the representation based on individual words, there is no significant improvement in classification performance. Automatic classification produces better performance on categories of large sizes than on categories of small sizes. We conclude from the experimental results that, for automatic classification of NFRs, individual words are the best index terms for representing short NFR descriptions, and that as many NFRs of a software system as possible should be collected.
Keywords—Non-Functional Requirements, Automatic Classification, Support Vector Machine.
I. INTRODUCTION
Non-functional requirements (NFRs) describe the expected qualities of a software system, such as usability, security, and the look and feel of the user interface. These qualities impact the architectural design of the software system [1] and the satisfaction of the stakeholders relevant to it [2] [3] [4]. NFRs are put forward by stakeholders with no expertise in software engineering, and their number is often very large. Moreover, NFRs are scattered over the functional requirements, and their types are unknown because they are written in natural language. These characteristics make the classification of NFRs a labor-intensive and time-consuming job without the adoption of automatic techniques. Nevertheless, software developers cannot ignore NFRs, because they are decisive for the success of the software system.
Compared with functional requirements, NFRs tend to be properties of a system as a whole, and hence cannot be verified for individual components or modules [19] [23]. This makes the classification of NFRs necessary in development because, on the one hand, we need to handle NFRs differently from functional requirements, and different types of NFRs should be handled by developers with different expertise.
For instance, in a security-critical, mission-critical, or economically vital software system, formal methods with model checking of the correctness of the specification of system security are used to determine whether the intended behavior holds for the system's structure. However, a serious (and obvious) drawback of model checking is the state explosion problem, because the size of the global state graph is (at least) exponential in the size of the program text [21]. Thus, an exigent demand from system developers is to identify the security-related NFRs in the requirements document, express them using formal specifications, and then verify that the system's properties satisfy those security specifications.
On the other hand, the classification of NFRs benefits an overall assessment of the system's satisfaction of each category of NFRs and of the progress developers have made in system development. Functional requirements can be measured as either satisfied or not satisfied, but NFRs cannot merely be measured on a linear scale as a degree of satisfaction [25]. In system testing, for example, we can customize our test strategy based on the classification of NFRs, so that the system's behavior with respect to NFRs is reported directly to the project manager.
Most, if not all, projects have resource limitations and time constraints, and different requirements address different concerns of a software system. Even after the requirements are elicited and collected, they are still just an unorganized set of data. As such, it is necessary to categorize the requirements into different types so as to match problems and solutions in separate domains of discourse, especially for large-scale software projects with hundreds or even thousands of requirements.
In human resource allocation and optimization [22], different developers possess different expertise in handling various aspects of software development. Different development tasks may need different expertise and capabilities from the developers. Thus, matching developers to tasks is at the core of successful software development. The identification of different types of NFRs results in the formation of different types of development tasks. Consequently, those tasks are assigned to developers in aggregate according to their expertise and capability level. For instance, NFRs of the "performance" type and NFRs of the "maintainability" type should be dealt with by developers with different expertise. We forward NFRs of the "look and feel" type to the UI (User Interface) design experts of the system so that the satisfaction of these NFRs can be ensured throughout the whole system.
Methods for the elicitation of NFRs include questionnaires, checklists, and templates for inquiring of stakeholders concerning quality issues [24]. Basically, NFRs consist of textual sentences whose contents concern the expected qualities of a software system. For instance, "The application shall match the color of the schema set forth by Department of Homeland Security. LF" is a typical NFR record fetched from the data set; it comprises two parts: a textual description and the NFR type "LF". The description of an NFR is very simple and easy to understand, and it requires no professional knowledge of engineering. Thus, text mining, which combines natural language processing (NLP) and statistical machine learning, can be used for the task of automatically classifying NFRs.
Although there is substantial literature on NFR classification [2] [4] [5] [17], we find two problems. First, most methods are purely intuitive and derived without theoretical support or a mathematical model. Also, some methods are extremely labor-intensive and time-consuming, and others are qualitative, without quantitative analysis. Second, the index terms used in existing automatic classification of NFRs are keywords extracted directly from requirements without feature selection. This causes huge dimensionality of NFR vectors and, consequently, heavy computation when quantitative methods are used. Most importantly, it is uncertain whether index terms other than keywords are more appropriate for automatic classification of NFRs.
The questions devised for this empirical study are: 1) among existing indexing methods, which one yields the best performance for automatic classification of NFRs, and 2) is it possible to produce better performance with SVM than that reported in previous work?
The remainder of this paper is organized as follows. Section 2 describes the research approach employed in this paper. Section 3 presents experiments using SVM and different index terms to classify NFRs. Section 4 presents the related work, and Section 5 concludes this paper.
II. RESEARCH APPROACH
The research approach adopted in this empirical study to automate the classification of NFRs is shown in Figure 1. First, the NFRs' textual descriptions are represented using different types of index terms (N-grams, individual words, and MWEs, respectively), transforming the textual descriptions into numerical vectors in the different feature spaces determined by the different types of index terms. Second, a support vector machine (SVM), a popular classifier in machine learning [6] [9], is used to classify the NFR vectors. Finally, the performance of each classification is evaluated.
A. Index Terms
The requirements data set was collected from the PROMISE web site (http://promisedata.org/repository) and contains 625 records of both functional and non-functional requirements.

Fig. 1. Procedures of automatic classification of NFRs.
N-gram Representation. N-grams were proposed in text mining to categorize documents with textual errors, such as spelling and grammatical errors, and have proved to be an effective technique for handling these kinds of errors [7]. An N-gram is an N-character contiguous fragment of a longer string. Since every string is decomposed into small fragments, any errors present in words affect only a limited number of those fragments. If we measure the similarity of two strings based on their N-grams, their similarity is immune to most textual errors. We used bi-grams (344 2-grams) and tri-grams (1,295 3-grams) for the representation of NFRs, respectively.
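The fragment decomposition described above can be sketched in a few lines of Python (an illustrative sketch, not the authors' implementation):

```python
def char_ngrams(text, n):
    """Return the contiguous n-character fragments of a string."""
    text = text.lower()
    return [text[i:i + n] for i in range(len(text) - n + 1)]

# A single spelling error perturbs only the few fragments overlapping it,
# so N-gram overlap between two strings is robust to such errors.
print(char_ngrams("usability", 3))  # → ['usa', 'sab', 'abi', 'bil', 'ili', 'lit', 'ity']
```

The same function with n = 2 yields the bi-grams used in the paper's other representation.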
Word Representation. The method of using the individual words of a text to represent it can be traced to Salton et al. [10]. We follow this idea and use the words in NFRs for representation. First, we eliminated stop words from the NFR descriptions¹. Second, word stemming² was conducted to map word variants to the same stem. We set the minimum length of a stem to 2; that is, only stems with more than 2 characters are accepted as stems of words. Thus, 1,127 individual word stems were produced from the data set.
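The two preprocessing steps can be sketched as follows. This is a simplified stand-in: the paper uses the USPTO stop-word list and the Porter stemmer, whereas the tiny stop list and crude suffix stripper below are purely illustrative.

```python
STOP_WORDS = {"the", "shall", "of", "a", "be", "to"}  # illustrative subset, not the USPTO list

def crude_stem(word):
    """Very rough suffix stripping, standing in for the Porter stemmer."""
    for suffix in ("ing", "ity", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) > 2:
            return word[: len(word) - len(suffix)]
    return word

def index_terms(sentence):
    """Tokenize, drop stop words, stem, and keep stems longer than 2 characters."""
    tokens = [w.strip(".,").lower() for w in sentence.split()]
    stems = [crude_stem(w) for w in tokens if w not in STOP_WORDS]
    return [s for s in stems if len(s) > 2]

print(index_terms("The application shall match the color of the schema."))
# → ['application', 'match', 'color', 'schema']
```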
MWE Representation. We used the method proposed by Justeson and Katz [12] for MWE extraction from NFRs. The basic idea behind this method is that an MWE should include 2 to 6 individual words and should occur in a collection of documents more than twice. Moreover, the parts of speech of the individual words of an MWE should match the regular expression described in Formula 1.

((A | N)+ | (A | N)* (N P)? (A | N)*) N   (1)

Here, A denotes an adjective, N denotes a noun, and P denotes a preposition. Using the method adopted from [11], we extracted 93 MWEs from the data set.
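Formula 1 can be applied directly as a regular expression over one-letter part-of-speech codes. The sketch below assumes tokens have already been POS-tagged and mapped to the codes A, N, and P; the frequency threshold (occurring more than twice) would be checked separately.

```python
import re

# Formula 1 from Justeson and Katz: the phrase must end in a noun, preceded by
# adjectives/nouns, optionally with a noun-preposition pair in the middle.
PATTERN = re.compile(r"((A|N)+|(A|N)*(NP)?(A|N)*)N")

def is_candidate_mwe(tags):
    """tags: one-letter POS codes (A/N/P) for the words of a candidate phrase."""
    seq = "".join(tags)
    return 2 <= len(tags) <= 6 and PATTERN.fullmatch(seq) is not None

print(is_candidate_mwe(["A", "N"]))       # e.g. "functional requirement" → True
print(is_candidate_mwe(["N", "P", "N"]))  # e.g. "degree of satisfaction" → True
print(is_candidate_mwe(["P", "N"]))       # preposition-initial phrase → False
```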
¹ We obtained the stop words from http://ftp.uspto.gov/patft/help/stopword.htm
² We used the Porter stemming algorithm, available at http://tartarus.org/martin/PorterStemmer/
B. Feature Selection
In this paper, we adopt information gain (IG) [8], a classic method for feature selection in machine learning, to select informative index terms for automatic classification of NFRs. IG is defined as the expected reduction in entropy caused by partitioning the NFRs according to a given term. The formula for IG is presented in Equation 2 and the formula for entropy in Equation 3.

Gain(S, A) = Entropy(S) − Σ_{v ∈ Value(A)} (|S_v| / |S|) Entropy(S_v)   (2)

Entropy(S) = Σ_{i=1}^{c} −p_i log p_i   (3)

Here, S is the collection of all NFRs labeled with their types, such as PE (performance) and US (usability). Value(A) is the set of all possible values of index term A, S_v is the subset of S for which A has value v, c is the number of categories of NFRs, and p_i is the proportion of NFRs that belong to category i. To observe the performance of textual features on automatic classification dynamically, different feature sets are constructed at different removal percentages of low-IG terms (see the removal ratio in Section 3.2).
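Equations 2 and 3 can be computed directly for a Boolean index term (present/absent in an NFR). The toy data below is invented for illustration only:

```python
from math import log2
from collections import Counter

def entropy(labels):
    """Entropy of a list of category labels: -sum p_i * log2(p_i) (Equation 3)."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def info_gain(docs, labels, term):
    """IG of a Boolean term (Equation 2): entropy reduction from splitting
    the NFRs on whether their description contains the term."""
    gain = entropy(labels)
    for present in (True, False):
        subset = [lab for doc, lab in zip(docs, labels) if (term in doc) == present]
        if subset:
            gain -= len(subset) / len(labels) * entropy(subset)
    return gain

docs = [{"secure", "login"}, {"secure", "encrypt"}, {"color", "layout"}, {"font", "color"}]
labels = ["SE", "SE", "LF", "LF"]
print(info_gain(docs, labels, "secure"))  # "secure" perfectly separates SE/LF → 1.0
```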
C. SVM Classifier
The classifier used for automatic classification is the support vector machine (SVM) [6] [9], which was introduced in statistical machine learning. We selected this method because Gokyer et al. [9] used it to map NFRs to architectural concerns and achieved promising effectiveness. In this paper, the linear kernel (u·v) is used for SVM training because prior research has validated it as superior to non-linear kernels for classifying textual contents [8] [13].
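The linear kernel itself is just the dot product of two term vectors; in practice a library such as LIBSVM or scikit-learn would do the training, so the sketch below only illustrates the kernel computation on toy Boolean NFR vectors:

```python
def linear_kernel(u, v):
    """Linear kernel u·v, as used for SVM training on Boolean NFR vectors."""
    return sum(a * b for a, b in zip(u, v))

# Gram matrix over three toy Boolean NFR vectors: entry (i, j) counts the
# index terms shared by NFR i and NFR j.
X = [[1, 0, 1, 1], [1, 1, 0, 0], [0, 0, 1, 1]]
gram = [[linear_kernel(u, v) for v in X] for u in X]
print(gram)  # → [[3, 1, 2], [1, 2, 0], [2, 0, 2]]
```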
III. EXPERIMENT
A. Categories of the data set
Table 1 lists the categories of both functional and non-functional requirements in the data set and their numbers. Functional requirements occupy the largest proportion of the data set, amounting to 40.8% (255/625). Moreover, there is a skewed distribution among the NFR categories. For instance, the largest category, "usability", has 67 cases, while the smallest category, "Palm Operational", has only one. We adopt binary classification in this paper because a skewed distribution often deteriorates the performance of multi-class classification [28].
In binary classification, if the number of positive (negative) cases is much larger than the number of negative (positive) cases, the performance of automatic classification may not be convincing because of the biased distribution of data points. For example, if 90% of the cases in a data set are positive, then the accuracy of automatic classification will be greater than 90% if the classifier predicts positive labels for all data points. For this reason, the categories "usability", "security", and "Look And Feel", which have relatively medium sizes in the data set, are used one by one as the positive class for binary classification, while the NFRs in the remaining categories are used as the negative class.

TABLE I
CATEGORIES OF REQUIREMENTS AND THEIR NUMBERS IN THE DATA SET

Abbr.  Full Name          Num
F      Functional         255
US     Usability           67
SE     Security            66
O      Operational         62
PE     Performance         54
LF     Look And Feel       38
A      Availability        21
SC     Scalability         21
MN     Maintainability     17
L      Legal               13
FT     Fitness             10
PO     Palm Operational     1
B. Classification Process
Representing textual contents actually involves two kinds of work: indexing and term weighting. Indexing is the job of assigning index terms to textual contents, and term weighting is the job of assigning weights to terms to measure the importance of the index terms in textual documents. We employ Boolean values as term weights for the index terms because most NFRs in the data set are very short, containing fewer than 20 index terms, and so do not need complex weighting schemes such as those mentioned for document representation [15].
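Boolean weighting reduces each NFR to a 0/1 vector over the vocabulary of index terms, which can be sketched as (toy vocabulary invented for illustration):

```python
def boolean_vector(doc_terms, vocabulary):
    """Boolean term weighting: 1 if the index term occurs in the NFR, else 0."""
    return [1 if term in doc_terms else 0 for term in vocabulary]

vocab = ["secure", "color", "respons", "layout"]
nfr = {"color", "layout"}
print(boolean_vector(nfr, vocab))  # → [0, 1, 0, 1]
```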
IG is employed to vary the percentage of index terms used for representation. A removal ratio is predefined to remove the index terms with small entropy. For instance, if we set the removal ratio to 0.1 for representation with individual words, then the 90% of individual words with smaller entropy are eliminated from the feature set, and only the remaining 10% of individual words are used for the representation of NFRs. The purpose of varying the percentage of index terms used for representation is to observe the robustness of classification performance as the set of index terms becomes smaller and smaller. This is especially important for deciding which type of index terms should be used to represent NFRs when the computational capacity is not sufficient to support high-dimensional vectors in classifier training. In the representation with MWEs, we use all 1,127 individual words plus a proportion of the MWEs, defined by the removal ratio, as the index terms.
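The filtering step can be sketched as follows, assuming (per the example above) that a ratio of 0.1 means the top 10% of terms by IG score are kept; the scores below are invented for illustration:

```python
def keep_by_removal_ratio(ig_scores, ratio):
    """Keep the given fraction of index terms, taking those with the highest
    IG scores; the remaining low-IG terms are removed from the feature set."""
    ranked = sorted(ig_scores, key=ig_scores.get, reverse=True)
    k = max(1, int(len(ranked) * ratio))
    return ranked[:k]

scores = {"secure": 0.9, "color": 0.7, "respons": 0.4, "layout": 0.2, "the": 0.0}
print(keep_by_removal_ratio(scores, 0.4))  # → ['secure', 'color']
```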
The experiments in this paper are carried out with 10-fold cross-validation. In each experiment, we divide the whole data set (both positive and negative classes) into 10 subsets. Nine of the 10 subsets are used for training and the remaining one for testing. We repeat the experiment 10 times, and the performance of the classification is measured as the average precision and recall [2] over the 10 repetitions.
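The 10-fold split can be sketched as follows (a plain round-robin partition; stratification by class, if the authors used it, is not shown):

```python
def ten_fold_splits(n_items, k=10):
    """Yield (train_idx, test_idx) pairs: each fold serves as the test set
    once, while the remaining nine folds form the training set."""
    indices = list(range(n_items))
    folds = [indices[i::k] for i in range(k)]  # round-robin assignment
    for i in range(k):
        test = folds[i]
        train = [j for fold in folds[:i] + folds[i + 1:] for j in fold]
        yield train, test

splits = list(ten_fold_splits(625))
print(len(splits))  # → 10 train/test partitions over the 625 records
```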
C. Experimental Results
The outcome of our experiments shows that the precisions of all the automatic classification tasks are significantly higher than those of the classification approach proposed by Cleland-Huang et al. [2]. However, the recalls of the automatic classifications in this paper do not match those of Cleland-Huang et al.'s classification. We do not list the precisions and recalls produced in our experiments due to space limitations (readers interested in them are welcome to ask the authors for details).
We conjecture that the high recalls of Cleland-Huang et al. [2] can be attributed to the small number of index terms (between 10 and 20) they employed in their experiments, which is much smaller than the number of index terms (between 100 and 1,000) used in ours. When a small number of index terms is used, a larger number of NFRs is classified as relevant to the category. Consequently, more irrelevant NFRs are misclassified as relevant. Consider the extreme case of classifying NFRs as relevant to "Usability": if all NFRs are regarded as relevant, then the recall of automatic classification will be 100%. However, such a classification result is of little help in automating the classification task. Thus, we argue that for automatic classification of NFRs, precision should be given more importance as long as recall is acceptable.
The F-measure [16], described in Equation 4, combines precision and recall for performance evaluation; we used it as the indicator for performance evaluation. In general, the larger the F-measure, the better the classification result. For the purpose of comparison, we take the F-measures of Cleland-Huang et al. [2] as the baseline performance in the experiments.

F-measure = (2 × precision × recall) / (precision + recall)   (4)
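Equation 4 is the harmonic mean of precision and recall:

```python
def f_measure(precision, recall):
    """Harmonic mean of precision and recall (Equation 4)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# High precision cannot compensate for low recall, or vice versa:
print(f_measure(0.8, 0.5))  # → 0.6153846...
```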
Figures 2-4 show the experimental results of classifying the three assigned categories, "Usability", "Security", and "Look And Feel", at different removal ratios, using four types of index terms (2-gram, 3-gram, word, and MWE) to represent NFRs. First, it can be seen that representation with individual words has the best performance, measured by F-measure, on nearly all three categories. In some cases, representation with terminologies improves the precision of automatic classification over representation with individual words (when the removal ratio reaches 0.9). That is, even if most index terms are removed from the feature set, their performance does not decline drastically. However, in most cases, representation with terminologies cannot improve the performance of automatic NFR classification compared with representation with individual words, and is even worse than the 3-gram representation. Both representation with individual words and with terminologies produce very robust classification.
This outcome implies that for automatic classification of NFRs, representation with individual words is sufficient to produce a desirable performance. This outcome is very different from the experimental results produced by Zhang et al.
Fig. 2. Classification performances on category "Usability" (F-measure vs. removal ratio for 2-gram, 3-gram, individual words, terminology, and baseline).
Fig. 3. Classification performances on category "Security" (F-measure vs. removal ratio for 2-gram, 3-gram, individual words, terminology, and baseline).
Fig. 4. Classification performances on category "Look And Feel" (F-measure vs. removal ratio for 2-gram, 3-gram, individual words, terminology, and baseline).
[8]. They argued that MWEs are superior to individual words for automatically classifying news documents. Our explanation is that the NFRs in the data set are much shorter (20 individual words on average) than the Reuters-21578 news documents (80 individual words on average), so the semantics inherent in the textual contents of NFRs is not as important as that inherent in news documents.
Second, among the N-gram representations, the 3-gram representation outperforms the 2-gram representation. This outcome can be attributed to the number of 3-grams being much larger than the number of 2-grams (see Section 2.1). Moreover, the 3-gram representation is more robust than the 2-gram representation, because the magnitudes of the variations in the F-measures of the 3-gram representation are smaller than those produced by the 2-gram representation.
Third, for classification on the different categories, the performance on "Security" is almost the same as that on "Usability", and both outperform that on "Look And Feel". The number of NFRs in category "Security" (66) is approximately equal to that in category "Usability" (67), and both are much larger than the number in "Look And Feel" (38). Our explanation is that automatic classification produces more favorable performance on categories having large sizes than on those having small sizes.
Based on the experimental results, the answer to question 1 devised for this study in Section 1 is that, to date, keywords are the most effective index terms for automatic classification of NFRs; for question 2, our answer is that machine learning techniques, at least SVM, can improve the classification significantly.
IV. RELATED WORK
NFRs have attracted much interest from researchers in the software engineering field, and much work has been invested in managing them. Tran and Chung [20] proposed a prototype tool for the explicit representation of NFRs. They aim to consider NFRs from a more goal-oriented perspective and argue that NFRs strongly influence the design of the solution as well as reinforce the engineering process. In order to formulate implicit relationships among NFRs, their tool uses an ontology and graphic visualization to express softgoals and their interdependencies explicitly.
Cleland-Huang et al. [4] introduce a goal-centric approach to managing the impact of change upon the NFRs of a software system. They partition NFRs into different softgoals of a system and construct a softgoal interdependency graph (SIG) to trace both direct and indirect impacts of software changes on NFRs. A probabilistic network model is employed to enable the traceability of impacts. However, constructing a SIG is labor-intensive and time-consuming because of the lack of automatic approaches. If NFRs can be classified into different softgoals automatically, it becomes much easier to identify the relations between subgoals and softgoals; that is, the human workload involved in constructing a SIG is reduced to a great extent.
In another work, Cleland-Huang et al. [2] proposed an information retrieval method to discover and identify NFRs from system specifications. Their basic assumption is that different types of NFRs are characterized by distinct keywords (index terms) that can be learned from documents of each type. However, their method of selecting index terms is ad hoc and does not consider any linguistic properties of those index terms. Moreover, the classifier used in their method, which is based on the additive weights of index terms for a given NFR type, is quite simple and lacks theoretical grounding.
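To make the additive-weight scheme concrete, the sketch below is our own toy reconstruction of such a classifier; the keyword lists and weights are entirely hypothetical (in the original method they are learned from documents of each NFR type), so this illustrates only the scoring mechanism:

```python
# Hypothetical indicator-term weights per NFR type (made up for
# illustration; not the weights learned in Cleland-Huang et al. [2]).
WEIGHTS = {
    "security": {"encrypt": 0.9, "password": 0.8, "access": 0.5},
    "usability": {"user": 0.6, "interface": 0.7, "easy": 0.8},
}

def classify(requirement):
    """Assign the NFR type whose indicator terms sum to the highest weight."""
    tokens = requirement.lower().split()
    scores = {
        nfr_type: sum(w for term, w in terms.items() if term in tokens)
        for nfr_type, terms in WEIGHTS.items()
    }
    return max(scores, key=scores.get)

print(classify("Users shall log in with a password"))  # prints "security"
```

The simplicity is apparent: the score is a plain sum of matched-term weights, with no probabilistic or margin-based justification, which is the theoretical weakness noted above.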
Rosenhainer [14] proposed aspect mining to identify crosscutting concerns in requirements specifications. Two techniques are suggested for aspect mining: identification through inspection and identification supported by information retrieval (IR). The former is manual and the latter is semi-automatic. Rosenhainer argued that the IR-based technique is more promising than the manual method, and the experimental results on interactions between functional and non-functional requirements validate this argument. This work encourages our use of IR techniques in this paper for the automatic classification of NFRs.
Casamayor et al. [24] employed naïve Bayes with the EM (Expectation Maximization) algorithm as a semi-supervised learning approach to classify non-functional requirements in textual specifications. They used the same data set as this paper and reported that their algorithm produced an average accuracy above 70%. In fact, the performance of SVM in our experiments is better than theirs because we exclude categories with small numbers of data points; that is, the unbalanced distribution of data points is purposely alleviated in our experiments. Moreover, our conjecture is also consistent with their experimental results on categories with large numbers of data points, such as "Usability", "Security", and "Look And Feel".
V. CONCLUDING REMARKS
NFRs are crucial to the success of a software system because they describe the qualities a system must have to avoid devastating effects and system failure [17]. In this empirical study, the vector space model and a machine learning technique are employed to classify NFRs automatically. We used different index terms to transform NFRs into numeric vectors and examined their performance on automatic classification of NFRs. The machine learning classifier we adopted in this paper is SVM with a linear kernel, which is widely recommended as a promising classifier for text mining. Information gain is used for feature selection, and SVM is used for automatic classification. We conducted experiments using the data set collected from the PROMISE web site.
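As a sketch of the feature-selection step, information gain scores a Boolean term-presence feature by how much knowing it reduces class entropy. The toy example below (our illustration with made-up requirements, not the paper's data) shows that a term specific to one class scores higher than a term spread evenly across classes:

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy of a class-label distribution."""
    n = len(labels)
    if n == 0:
        return 0.0
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def information_gain(term, docs, labels):
    """IG of the Boolean feature 'document contains term'."""
    has = [y for d, y in zip(docs, labels) if term in d.lower().split()]
    lacks = [y for d, y in zip(docs, labels) if term not in d.lower().split()]
    n = len(labels)
    return (entropy(labels)
            - (len(has) / n) * entropy(has)
            - (len(lacks) / n) * entropy(lacks))

docs = [
    "the system shall encrypt user passwords",
    "login requires a valid password",
    "the interface shall be easy to use",
    "menus must be usable without training",
]
labels = ["Security", "Security", "Usability", "Usability"]
print(information_gain("shall", docs, labels))     # spread across classes
print(information_gain("password", docs, labels))  # specific to one class
```

Terms ranked highest by this score are kept as index terms; the resulting Boolean vectors are then fed to the linear-kernel SVM.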
The experimental results show that individual words, when used as the index terms, have the best performance in classifying NFRs automatically. We noticed that, in most cases, the more samples a category has in the data set, the better the performance the automatic classifier produces on that category. This outcome illustrates that the number of NFRs in the data set is an important factor in automating the classification of NFRs, and it suggests that we need to collect as many NFRs as possible if automatic classification is desired.
This work can currently be applied in at least two aspects of software engineering. First, it can be used to identify NFRs in a requirements specification. Usually, customers, testers, and stakeholders of a software system mix their desired qualities in a specification. Whereas functional requirements describe what the system needs to do, NFRs describe constraints on the solution space and capture a broad spectrum of properties, such as usability and security [18]. Because the solutions for functional requirements and NFRs differ, as do the times at which these two kinds of requirements are considered in system design, we must differentiate between them. Second, because different NFRs are often handled by different designers and developers with different background knowledge of the architecture, it is crucial to classify NFRs into categories so that the NFRs in the same category can be processed as one comprehensive requirement in system design.
ACKNOWLEDGMENTS
This work is supported by the National Natural Science Foundation of China under Grant Nos. 90718042, 60873072, 61073044, and 60903050; the National Science and Technology Major Project; the National Basic Research Program under Grant No. 2007CB310802; and the Scientific Research Foundation for the Returned Overseas Chinese Scholars, State Education Ministry.
REFERENCES
[1] Y. Yang, J. Bhuta, D. Port and B. Boehm, Value-Based Processes for COTS-Based Applications, IEEE Software, 22(4), 54-62, July/August, 2005.
[2] J. Cleland-Huang, R. Settimi, X. Zou and P. Solc, The Detection and Classification of Non-Functional Requirements with Application to Early Aspects, In Proceedings of the 14th IEEE International Requirements Engineering Conference, 36-45, 2006.
[3] B. Nuseibeh, Weaving Together Requirements and Architecture, IEEE Computer, 34(3), 115-117, 2001.
[4] J. Cleland-Huang, et al., Goal-Centric Traceability for Managing Non-Functional Requirements, In Proceedings of the 27th International Conference on Software Engineering, 362-371, 2005.
[5] D. Zhang and J. J. P. Tsai, Machine Learning and Software Engineering, Software Quality Journal, 11, 87-119, 2003.
[6] V. Vapnik, The Nature of Statistical Learning Theory, Springer, New York, 1995.
[7] W. B. Cavnar and J. M. Trenkle, N-Gram-Based Text Categorization, In Proceedings of SDAIR-94, 3rd Annual Symposium on Document Analysis and Information Retrieval, 161-175, 1994.
[8] W. Zhang, T. Yoshida and X. J. Tang, Text classification based on multi-word with support vector machine, Knowledge-Based Systems, 21(8), 879-886, 2008.
[9] G. Gokyer, et al., Non-functional Requirements to Architectural Concerns: ML and NLP at Crossroads, In Proceedings of the International Conference on Software Engineering Advances, 2008.
[10] G. Salton, A. Wong and C. S. Yang, A Vector Space Model for Automatic Indexing, Communications of the ACM, 18(11), 613-620, 1975.
[11] W. Zhang, T. Yoshida and X. J. Tang, Improving effectiveness of mutual information for substantival multiword expression extraction, Expert Systems with Applications, 36(8), 10919-10930, 2009.
[12] J. S. Justeson and S. M. Katz, Technical terminology: some linguistic properties and an algorithm for identification in text, Natural Language Engineering, 1(1), 9-27, 1995.
[13] Y. Yang and X. Liu, A re-examination of text categorization methods, In Proceedings of the 22nd Annual International ACM Conference on Research and Development in Information Retrieval, 42-49, 1999.
[14] L. Rosenhainer, Identifying Crosscutting Concerns in Requirements Specifications, In Proceedings of OOPSLA Early Aspects 2004: Aspect-Oriented Requirements Engineering and Architecture Design Workshop, 2004.
[15] G. Salton and C. Buckley, Term-weighting approaches in automatic text retrieval, Information Processing and Management, 24, 513-523, 1988.
[16] J. Han and M. Kamber, Data Mining: Concepts and Techniques, Morgan Kaufmann Publishers, New York, 2006.
[17] F. Brooks, No silver bullet: essence and accidents of software engineering, IEEE Computer, 20(4), 10-19, 1987.
[18] I. Sommerville and P. Sawyer, Viewpoints: Principles, Problems, and a Practical Approach to Requirements Engineering, Annals of Software Engineering, 3, 101-130, 1997.
[19] B. Nuseibeh and S. Easterbrook, Requirements Engineering: A Roadmap, In Proceedings of the Conference on The Future of Software Engineering (ICSE '00), 35-46, 2000.
[20] Q. Tran and L. Chung, NFR-Assistant: Tool Support for Achieving Quality, In Proceedings of the IEEE Symposium on Application-Specific Systems and Software Engineering and Technology (ASSET '99), 1999.
[21] E. A. Emerson, The beginning of model checking: A personal perspective, 25 Years of Model Checking, 27-45, 2008.
[22] J. Xiao, L. J. Osterweil, Q. Wang and M. Li, Disruption-Driven Resource Rescheduling in Software Development Processes, In Proceedings of the 4th International Conference on Software Process, 234-247, 2010.
[23] L. Chung and J. Prado Leite, On Non-Functional Requirements in Software Engineering, A. T. Borgida et al. (Eds.): Mylopoulos Festschrift, LNCS 5600, 363-379, 2009.
[24] A. Casamayor, D. Godoy and M. Campo, Identification of non-functional requirements in textual specifications: A semi-supervised learning approach, Information and Software Technology, 52, 436-445, 2010.
[25] F. Tsui and O. Karam, Essentials of Software Engineering (Second Edition), Jones and Bartlett Publishers, Sudbury, Massachusetts, 2011.
[26] G. Salton and M. J. McGill, Introduction to Modern Information Retrieval, McGraw-Hill Book Company, New York, NY, USA, 1986.
[27] J. Davis and M. Goadrich, The relationship between precision-recall and ROC curves, In Proceedings of the 23rd International Conference on Machine Learning, 233-240, 2006.
[28] J. Weston and C. Watkins, Multi-class support vector machines, In Proceedings of the 7th European Symposium on Artificial Neural Networks, 219-224, 1999.