A Hybrid Approach to Detecting Security Defects in Programs

Lian Yu 1, Jun Zhou, Yue Yi, Jianchu Fan, † Qianxiang Wang
School of Software and Microelectronics, Peking University, Beijing, 102600, P.R. China
† Key Laboratory of High Confidence Software Technologies (Peking University), Ministry of Education, P.R. China

Abstract

Static analysis works well at checking defects that clearly map to source code constructs. Model checking can find defects such as deadlocks and routing loops that are not easily detected by static analysis, but it faces the problem of state explosion. This paper proposes a hybrid approach to detecting security defects in programs. A fuzzy inference system is used to select between the two detection approaches. A clustering algorithm is developed to divide a large system into several clusters so that model checking can be applied. Ontology-based static analysis employs logic reasoning to detect the defects intelligently. We also put forward strategies to improve the performance of the static analysis. Finally, we perform experiments to evaluate the accuracy and performance of the hybrid approach.

Keywords: security defects, static analysis, model checking, ontology model, fuzzy inference, feature extraction.

(Footnote 1: The research is supported by the National High-Tech Research and Development Plan of China (No. 2006AA01Z175) and by the IBM Joint Research Program (No. JS200705002).)

1. Introduction

Software security vulnerability is a common problem in existing applications, and it tends to bring serious consequences. Roughly half of all security defects are introduced at the source code level [1], so coding errors are a critical problem. It is not reasonable to expect developers to be security experts, and we must arm them with tools that make their jobs easier. Among many methods, static analysis and model checking have been used to automatically detect potential security problems.

Static techniques explore abstractions of all possible program behaviors, and thus are not limited by the quality of test cases in achieving high-quality programs. Many static analysis tools can be used to uncover security defects. ITS4 [29], FlawFinder [26], and RATS [38] all preprocess and tokenize source files and then match the resulting token stream against a library of vulnerable constructs. FindBugs [25] is a lightweight checker for unearthing common errors in Java programs.

Recent research shows that static analysis works well at checking defects that clearly map to source code constructs [2]. A static analysis checker can process a large amount of source code quickly compared to model checking. It can also analyze code that is incomplete or has compilation errors. Such tools look for a fixed set of defect patterns, or rules, in the code; however, if no rule has been written for a particular problem, the tools will never find that problem. Because static analysis problems are undecidable, the tools are forced to make approximations, and these approximations lead to less-than-perfect output [3][4].

Model checking is capable of handling complex systems and is widely used for the verification of all kinds of systems, especially critical systems.
Well-known model checking tools include SPIN [41], SMV [42], JPF [33], Bandera [18], and Microsoft SLAM [40]. By executing code, model checking can find defects that are not easily detected by static analysis, such as deadlocks and routing loops, whose detection requires analyzing the whole execution of the source code. Model checking checks all executing paths while static analysis checks all possible paths, so program segments containing defects found by static analysis may be unreachable. The defects found by model checking are true ones, while static analysis may report false positives.

Model checking has the problem of state explosion: the size of a finite state model increases exponentially as the number of system classes grows. When checking large models, the traces (especially thread-related and protocol-related ones) produced by the checker can be hundreds or even thousands of steps long. Using the trace log to locate the defects in the source code is extremely time-consuming and tedious, so the outputs are difficult to interpret. Table 1 summarizes the comparison of the two approaches from different aspects.

Table 1: Comparison of the Two Approaches

  Aspect           Static Analysis      Model Checking
  Time to detect   hours                weeks
  Path coverage    all possible paths   all executing paths
  Time per bug     minute               day
  Bugs found       100~1000             0~10
  Code size        10 Megabytes         10 Kilobytes

After classifying the categories of security defects, we find that some security defects (D1) are suitable to be detected by static analysis; some (D2) are suitable to be detected by model checking; and some (D3) can be detected by both approaches.


For types of defects from D3, if we apply static analysis, we might miss some defects due to the limitations of the approach; if we apply model checking, it may take a long, even unbounded, time to get the results.

This paper proposes to introduce a fuzzy inference system (FIS) to help automatically select a detection approach. Based on the characteristics of the source code, the FIS decides the detection approach. The chi-square method is used to extract security defect features from defect descriptions, and we map them to reserved words or class names in the standard libraries of a programming language. To improve the performance of ontology-based static analysis, we introduce parallel computing and ontology model partitioning. Figure 1 shows the whole process of the proposed hybrid approach to detecting security defects in programs; each step is detailed in the following sections.

Figure 1. Process of the proposed hybrid approach to detecting security defects (feature collection/classification and mapping, clustering, fuzzy inference, model checking, ontology-based defect reasoning, and report generation)

1) Collect security defects manually from security research organizations and related books. Extract the features from the descriptions of the security defects and map the features to the reserved words or class names of the C++/Java programming languages.
2) Cluster the whole system under test according to the system dependence graph extracted from the source code using a clustering algorithm, dividing the whole system into several clusters.
3) Build a Mamdani fuzzy inference system [16] for each detection approach, with the reserved words and classes of a programming language as the input variables and the possibility of a detection approach as the output variable. Pick a cluster from step 2, and make a fuzzy inference that attributes the inputs to a proper approach to detecting security defects.
4) Extract the finite state model from the target source code and perform model checking on each cluster.
5) Perform static analysis based on logic reasoning. Source code is parsed into an AST, and the C++/Java AST parser maps it to individuals in an OWL (Web Ontology Language) [36] model; the reasoning engine (Jess [31]) takes security rules written in SWRL (Semantic Web Rule Language) [43], the programming language ontology model, and the individuals created by the tree parser as inputs to perform defect reasoning.
6) Iterate these steps. If a cluster is large and takes a long time to run, go back to step 2 to divide the cluster into smaller clusters. If the current cluster is not the last cluster, pick another cluster and go back to step 3 to make a fuzzy inference.
7) Generate the final report on security defect detection.

The rest of the paper is organized as follows. Section 2 describes classification, feature extraction for security defects, and algorithms to cluster programs under detection. Section 3 explains the mechanism of fuzzy inference based selection of a detection approach. Section 4 demonstrates security defect detection using static analysis. Section 5 applies model checking to detect security defects.
Section 6 puts forward strategies to improve the performance of the static analysis. Section 7 carries out experiments to evaluate the proposed approach. Section 8 gives a survey of related work. Section 9 concludes the paper and sketches our future work.

2. Classification, Extraction, and Clustering

2.1. Classifying Security Defects

First, we collect descriptions of security defects from popular websites, related books [4][5], and papers [3][6].


CWE (Common Weakness Enumeration) [22] is a well-known and authoritative website related to software weaknesses, and it covers various defect domains. Because we focus on code-level defects, we only abstract source code related security defects. CVE (Common Vulnerability Enumeration) [21] is another website, which provides a dictionary of publicly known information security defects. In total we obtain about 110 security defects; Table 2 shows two examples of defects with their descriptions.

Table 2: Security Defect Examples

  Security Defect                       Description
  Unchecked Error Condition             Ignoring exceptions and other error conditions may allow an attacker to induce unexpected behavior unnoticed.
  Use of Insufficiently Random Values   The software may use insufficiently random numbers or values in a security context that depends on unpredictable numbers.

There exist many different classifications of security defects, such as the 19 sins [14], the OWASP Top 10 [35], and 7PK [15]. We employ 7PK (Seven Pernicious Kingdoms), which divides all software security defects into eight categories (note: seven plus one [7]): input validation and representation, API abuse, security features, time and state, errors, code quality, encapsulation, and environment. According to this classification and the description of each defect, we manually tag each defect with a category label and associate it with a proper detection approach, model checking or static analysis, as shown in Figure 2. When this step is done, the classification result is used to extract features from these security defects.

Figure 2. Classification Result (7PK categories mapped to the model checking and/or static analysis detection approaches)

2.2. Security Defect Feature Extraction

Feature extraction of security defects helps us recognize whether a kind of defect is suitable to be detected by static analysis or by model checking. There are several algorithms to extract common features in the text classification domain: Document Frequency (DF), Information Gain (IG), Mutual Information (MI), and the Chi-square statistic [8]. These algorithms aim at selecting the terms/features that are most representative of text contents. Experiments show that chi-square is more effective at removing aggressive terms (whose chi-score is lower than a predetermined threshold) without losing categorization accuracy [8]. The terms in this paper are words in the descriptions of the security defects, and the categories are the detection approaches that cover the eight 7PK categories described in the previous subsection. The chi-square statistic measures the relationship between a term t and a category c:

  χ²(t, c) = N(AD − CB)² / ((A + C)(B + D)(A + B)(C + D))

where A is the number of times that t and c co-occur; B is the number of times that t occurs without c; C is the number of times that c occurs without t; D is the number of times that neither c nor t occurs; and N is the total number of documents. We can calculate the average score and the maximum score by combining the category scores of each term. The maximum score is used to determine which category a feature belongs to, and the average score can distinguish whether a word is an invalid feature.
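As a concrete illustration, the following is a minimal sketch of the chi-square score computed from the four co-occurrence counts defined above; the class name and the counts in main are invented example values, not data from the paper.

  // Minimal sketch of the chi-square score used for feature selection.
  // A, B, C, D follow the definitions above; N = A + B + C + D.
  public final class ChiSquare {

      /** chi2(t, c) = N * (A*D - C*B)^2 / ((A+C)(B+D)(A+B)(C+D)) */
      public static double score(long a, long b, long c, long d) {
          double n = a + b + c + d;                 // total number of documents
          double numerator = n * Math.pow(a * d - c * b, 2);
          double denominator = (double) (a + c) * (b + d) * (a + b) * (c + d);
          return denominator == 0 ? 0.0 : numerator / denominator;
      }

      public static void main(String[] args) {
          // Example: term "thread" vs. category "model checking".
          long a = 12;  // descriptions with the term, in the category
          long b = 3;   // descriptions with the term, outside the category
          long c = 8;   // descriptions without the term, in the category
          long d = 87;  // descriptions without the term, outside the category
          System.out.printf("chi-square = %.2f%n", score(a, b, c, d));
      }
  }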
As shown in Figure 2, each detection approach can handle some defects from the 7PK categories, and each 7PK category contains defects, some of which can be detected by static analysis, by model checking, or by both. For each detection approach, we apply chi-square testing to extract security defect features from the defect descriptions associated with the 7PK categories, as shown in Table 3. These features are used in the fuzzy inference in the next section.

Table 3: Security Defect Features

  Detection Approach   Features/Terms
  Model Checking       buffer overflow, username, input, character, char, string, pointer, stream, exception, network, locked, integer, asynchronous, threads, read, process
  Static Analysis      path, random, form, Struts, private, public, filenames, printf, Action, exceptions, inject, returns, J2EE, array, dirname, session, EJB, connection, cloneable

2.3. Clustering Programs under Detection

Clustering serves two purposes. By building a system dependence graph (SDG) from source code and cutting the SDG into smaller parts, the size of the system model can be reduced to one suitable for running model checking. Static analysis can also take advantage of the SDG to increase its accuracy. The clustering algorithm consists of the following four steps; a code sketch follows the list.

1) Build a dependence matrix. Its smallest unit is a class. Set the number of classes as the size of the matrix. The entries in the matrix are counts of the dependence relationships among the classes. The count is calculated as follows, where a1, a2, and a3 are constants, inv is the invocation count, impl is the number of interface implementations, and ext is the number of extending classes:

  count = a1 · inv + a2 · impl + a3 · ext

2) Randomly select a class_i, and compute d_value(cluster_k, class_i) for each cluster cluster_k as shown in formula (E-1), where pow_dep and pow_cc are constants. Sort the d_values and put the unit into the cluster with the largest d_value.

  d_value(cluster_k, class_i) = ( Σ_{j ∈ cluster_k} (Matrix(i, j) + Matrix(j, i)) )^pow_dep / cluster_size(k)^pow_cc    (E-1)

3) Calculate the matrix cost, Cost(class_i), which describes the current clustering result; the smaller the value, the better the clustering result. When the matrix is changed, its cost is recalculated as shown in formula (E-2). Compare the cost with the previous one; if the new cost is smaller than the old one, accept the result and update the matrix.

  Cost(class_i) = Σ_{j=1}^{size} c(i, j)    (E-2)

  where c(i, j) = (Matrix(i, j) + Matrix(j, i))^pow_dep · cluster_size^pow_cc  if i and j are in the same cluster, and
        c(i, j) = (Matrix(i, j) + Matrix(j, i))^pow_dep · size^pow_cc          if i and j are not in the same cluster.

4) Iterate steps 2 and 3 until the matrix is stable; the result consists of several clusters and is used by model checking to detect security defects.
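The sketch below illustrates steps 1 and 2 under our reading of (E-1); the weights a1 to a3, the exponents pow_dep and pow_cc, and the method names are assumed values for illustration, not the authors' code.

  // Illustrative sketch of the dependence count (step 1) and d_value (E-1).
  import java.util.List;

  public final class ClusterSketch {
      static final double A1 = 1, A2 = 2, A3 = 3;     // a1, a2, a3 (assumed)
      static final double POW_DEP = 2, POW_CC = 1;    // pow_dep, pow_cc (assumed)

      // Dependence matrix entry between two classes (step 1).
      static double count(int inv, int impl, int ext) {
          return A1 * inv + A2 * impl + A3 * ext;
      }

      // d_value(cluster_k, class_i) of (E-1): dependence mass between class i
      // and the cluster's members, penalized by the cluster's size.
      static double dValue(double[][] m, List<Integer> clusterK, int i) {
          double dep = 0;
          for (int j : clusterK) dep += m[i][j] + m[j][i];
          return Math.pow(dep, POW_DEP) / Math.pow(clusterK.size(), POW_CC);
      }

      public static void main(String[] args) {
          double[][] m = {
              {0, count(4, 1, 0), 0},
              {count(2, 0, 0), 0, 0},
              {0, 0, 0}
          };
          System.out.println(dValue(m, List.of(0), 1));  // class 1 vs {0}: 64.0
      }
  }

A class is then assigned to the cluster that maximizes dValue, and the assignment is kept only if the overall cost of (E-2) decreases.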


3. Fuzzy Inference-based Selection

This section employs fuzzy inference techniques to dynamically determine which detection approach should be used to identify security defects for each cluster produced in the previous section.

3.1. Mapping Features

Because the extracted features cannot be used directly by the detection approaches, a step mapping the extracted features to the reserved words and classes of a programming language is needed. Table 4 shows these mapping results for the Java language; MC indicates model checking and SA static analysis. Currently, the mapping is performed manually; our future work will address developing automatic mapping algorithms and proving their completeness and correctness.

Table 4: Mapping Features to Java Characteristics

  MC-related Features             Reserved Words or Classes
  character char string pointer   String/Char/StringBuffer/StringBuilder (all method invocations)
  buffer overflow                 N/A
  stream read                     all classes in the package java.io
  exception                       all exceptions
  network                         all classes in the package java.net
  locked asynchronous             synchronized / all classes in the package java.util.concurrent
  integer                         Integer, int
  threads process                 Thread

  SA-related Features             Reserved Words or Classes
  random                          Random
  form Struts Action              ActionForm and the classes related to the Struts framework
  private public                  private, public, protected
  path filenames dirname          java.io.File
  exceptions                      all exceptions
  returns                         return
  J2EE EJB                        all classes related to the EJB framework
  connection                      all classes in the package java.net
  cloneable                       java.lang.Object
  inject                          JDBC and XML operation classes

3.2. Building a Fuzzy Inference System

We create two FISs, one for model checking and one for static analysis. Building an FIS can be divided into five steps: 1) fuzzify inputs; 2) apply fuzzy operators; 3) apply the implication method; 4) aggregate all outputs; 5) defuzzify. A code sketch of the pipeline follows the step descriptions. The Mamdani fuzzy inference method is the most commonly used fuzzy methodology [17], and we use the Mamdani method for each FIS.

Step 1 Fuzzify inputs: Use the Gaussian membership function (E-3) to fuzzify each input parameter based on Table 4:

  μ(x) = exp(−(x − c)² / (2σ²))    (E-3)

where c is the center and σ the width of the membership function. Each parameter has three membership functions named low, median, and high, which give the possibility (from 0 to 1) of the reserved words or classes being contained in a program. Each FIS has nine input parameters, shown in the "Reserved Words or Classes" column of Table 4, and one output parameter.
Figure 3 shows the model checking FIS as an example. The input variables are "string", "pointer", "buffer", "stream", "exception", "network", "lock", "integer", and "thread".

Figure 3. Model Checking Fuzzy Inference System

Step 2 Apply fuzzy operators: We use the min function for the AND method and the max function for the OR method.

Step 3 Apply the implication method: The min function is used as the implication method; we choose the min value over all the input parameters.

Step 4 Aggregate all outputs: Because decisions are based on testing all rules in an FIS, we have to combine the rules in some manner in order to make a decision. This paper selects the max function as the aggregation function.

Step 5 Defuzzify: Defuzzification applies the most popular centroid approach, which returns the center of the area under the output curve.
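The sketch below walks a single rule of the model checking FIS through this pipeline; the membership centers, the width sigma, the 0.01 output grid, and the input possibilities are illustrative assumptions, and a real FIS aggregates many rules.

  // Minimal Mamdani sketch: Gaussian fuzzification (E-3), min implication,
  // centroid defuzzification. All constants are illustrative assumptions.
  public final class MamdaniSketch {

      static double gauss(double x, double center, double sigma) {   // (E-3)
          return Math.exp(-Math.pow(x - center, 2) / (2 * sigma * sigma));
      }

      public static void main(String[] args) {
          // Observed possibilities of keywords/classes in one cluster.
          double thread = 0.8, stream = 0.7, synchronizedKw = 0.1;

          // Rule: if thread is high AND synchronized is low AND stream is
          // high, then model_checking is high.
          double high = 1.0, low = 0.0, sigma = 0.25;
          double firing = Math.min(gauss(thread, high, sigma),
                          Math.min(gauss(synchronizedKw, low, sigma),
                                   gauss(stream, high, sigma)));     // min = AND

          // Implication clips the "model_checking is high" output set at the
          // firing strength; the centroid over a coarse grid defuzzifies it.
          double num = 0, den = 0;
          for (double y = 0; y <= 1.0; y += 0.01) {
              double clipped = Math.min(firing, gauss(y, high, sigma));
              num += y * clipped;
              den += clipped;
          }
          System.out.printf("model_checking possibility = %.2f%n", num / den);
      }
  }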


The fuzzy rules are written manually from security defect descriptions such as those shown in Table 2. There are eight rules for the model checking FIS and eleven rules for the static analysis FIS. For example, the following rule says that the model checking approach is more suitable if the Thread class has a high appearance possibility, the synchronized keyword has a low appearance possibility, and the Stream class has a high appearance possibility:

  If (thread is high) and (synchronized is low) and (stream is high) then (model_checking is high)

For each cluster, we obtain an output from the model checking FIS and/or the static analysis FIS, ranging from 0 to 1. If the output value from the model checking FIS is high, we conclude that the cluster should be analyzed by model checking; if the value from the static analysis FIS is high, it should be analyzed by static analysis. If both are high, we check the cluster with both methods.

4. Detecting Security Defects with Static Analysis

If static analysis is selected by the FISs to find the security defects in source code or byte code, we use an ontology-based approach to accomplish the static analysis. The process is similar to our previous paper [9], but we add more SWRL rules into the rule base to detect vulnerability related bugs.

4.1. Ontology-based Detection Process

We develop a parser to translate source code into an Abstract Syntax Tree (AST), and a module to map the AST instance to the individuals of a Java ontology model, which is constructed according to the programming language specification. The SWRL Bridge feeds the individuals, the defect rules, and the programming language ontology model into the reasoning engine to infer defects. Finally, a report is generated.

4.2. Building Security Defect Rules

In Section 2.1, we explored the security databases of CVE and CWE and abstracted the security defects. This section transforms these defects into rules using SWRL and puts the security defect rules into a database for dynamic loading and reasoning. We collect about 60 security defects that are suitable to be detected by static analysis. The following shows two examples of transforming security defects into SWRL rules.

The rule for "Use of Dynamic Class Loading" is:

  InvokeClause(?c) ∧ invokeClause_invokedMethod(?c, ?m)
  ∧ invokeClause_invokedMethodParameter(?c, ?ob)
  ∧ name(?m, "forName") ∧ name(?t, "java.lang.String")
  ∧ Object_hasType(?ob, ?t)
  ∧ UseDynamicClassLoadingWarning(?warning)
  → errorReport(?c, ?warning)

The rule for "Use of Insufficiently Random Values" is:

  Object(?c) ∧ Name(?m, "java.lang.Random")
  ∧ UseOfInsufficentlyRandomWarning(?warning)
  → errorReport(?c, ?warning)

The proposed rules take the form of an implication between an antecedent (body) and a consequent (head): whenever the conditions specified in the antecedent hold, the conditions specified in the consequent must also hold. A detailed description of the transformation can be found in the paper [9].
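For illustration, the following hypothetical Java fragments are the kind of code these two rules flag; the class, method, and variable names are invented, not taken from the paper.

  // Hypothetical fragments that the two SWRL rules above would report.
  import java.util.Random;

  public class VulnerableExamples {

      // "Use of Dynamic Class Loading": Class.forName on a String argument.
      Object loadPlugin(String className) throws Exception {
          return Class.forName(className).getDeclaredConstructor().newInstance();
      }

      // "Use of Insufficiently Random Values": java.util.Random is
      // predictable; security contexts should use java.security.SecureRandom.
      String sessionToken() {
          return Long.toHexString(new Random().nextLong());
      }
  }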
5. Detecting Security Defects using Model Checking

Once model checking is selected by the FISs to find the security defects in programs, we utilize the integrated model checking approach to fulfill the goal.

5.1. Detection Process using Model Checking

The integrated model checking follows four steps:

1) Cluster the classes: The program model is cut into smaller pieces using the clustering algorithm described in Section 2.3.
2) Generate driver and stub classes: Automatically create an entry class and stub classes for each cluster as described in Section 5.2.
3) Abstract the model: Extract the finite state model from the programs under detection using appropriate techniques.
4) Perform the model checking: Select a model checking tool to find the defects in the program model and report the results.

5.2. Automatically Generating Drivers and Stubs

Although we have already clustered the classes into smaller pieces, we cannot perform model checking without an entry point. Each cluster needs main functions to start the source code, and it also needs stubs to simulate the actions of the classes that are not contained in the current cluster.

In order to emulate the outer actions on a cluster, we generate a driver class for each public method of a class in the cluster that is invoked by classes in other clusters. The responsibility of each entry class is to create a proper instance and invoke the target public method. Because current model checking tools do not have the ability to create stubs, we also generate default actions as stubs for each class that is contained in another cluster and is invoked by the classes in the current cluster.
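As an illustration of the shape of such generated code, the sketch below shows a hypothetical driver and stub; all class and method names are invented for the example, not produced by the tool.

  // Hypothetical generated driver and stub. PeerManager stands for a class
  // inside the current cluster; TrackerClient for a class in another cluster.
  class PeerManager {                               // inside the cluster
      private final TrackerClient tracker = new TrackerClient();
      public void connect(String host, int port) {
          tracker.announce(host + ":" + port);      // call leaving the cluster
      }
  }

  class TrackerClient {                             // generated stub: default,
      byte[] announce(String address) {             // side-effect-free action
          return new byte[0];
      }
  }

  public class Driver_PeerManager_connect {         // generated driver (entry)
      public static void main(String[] args) {
          PeerManager target = new PeerManager();   // create a proper instance
          target.connect("127.0.0.1", 6881);        // invoke the public method
      }
  }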


5.3. Applying Model Checking

Compared to Bandera, Java PathFinder (JPF) showed higher performance in our experiments, so we employ JPF to perform model checking in this step. JPF is open source software, an explicit state model checking tool for Java code developed by the NASA Ames Research Center. It simulates the JVM, automatically abstracts the finite state model from the code, and then checks whether there are vulnerabilities in the model.

6. Strategies to Improve Performance

We performed several experiments on our ontology-based static analysis tool and found a low performance issue [9]. The proportions of time consumed are shown in Figure 4: up to 70% of the time is used for reasoning, 20% for loading, and 10% for parsing Java source code and other processing.

Figure 4. Time Allocation among the Three Parts

Let T be the total time, R the reasoning time, L the loading time, P the parsing time, and O the other time; then T = R + L + P + O. To improve performance, we design strategies to cut down the reasoning time, loading time, and parsing time, respectively.

6.1. Parallel Computing

The previous version of the ontology-based static analysis tool is single-threaded, processing source code or bytecode files one by one as shown in Figure 5.

Figure 5. Single-Thread Architecture (code parser → AST → tree parser → OWL model → Jess rule engine → bug report)

Workstations and PCs nowadays generally have more than one CPU. We propose to change the single-threaded architecture to a multi-threaded architecture to take advantage of multiple CPUs. We use a thread pool to limit the maximum number of running threads. The thread pool maintains a queue containing the IDs of running threads, which concurrently parse multiple source code or bytecode files and produce a set of ASTs, as shown in Figure 6. The ASTs are sorted using the sorting algorithm introduced in Section 6.2, and then the parallel computing continues.

The size of the queue is the maximum number of running threads. The queue is checked several times per second, and if there is a finished thread, a new thread is started and put into the queue. In this way, we maintain a fixed number of running threads, so that we not only get good performance but also do not take up too many system resources.

Figure 6. Multi-Threaded Architecture (multiple code parser threads produce ASTs, which are sorted and then handled by tree parser and Jess rule engine threads)
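The sketch below illustrates the thread-pool idea with a fixed pool sized to the CPU count; the file list and the parsing step are stand-ins for the tool's code parser, which the paper does not publish.

  // Illustrative sketch of the parallel parsing stage with a fixed thread
  // pool; the "AST" string stands in for the real parse result.
  import java.nio.file.Path;
  import java.util.List;
  import java.util.concurrent.*;

  public final class ParallelParsing {
      public static void main(String[] args) throws Exception {
          List<Path> files = List.of(Path.of("A.java"), Path.of("B.java"));
          int cpus = Runtime.getRuntime().availableProcessors();
          ExecutorService pool = Executors.newFixedThreadPool(cpus);

          // Each task parses one file into an AST, concurrently.
          List<Future<String>> asts = pool.invokeAll(
                  files.stream()
                       .map(f -> (Callable<String>) () -> "AST(" + f + ")")
                       .toList());

          for (Future<String> ast : asts) {
              System.out.println(ast.get());  // sorted and reasoned over next
          }
          pool.shutdown();
      }
  }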
6.2. Sorting ASTs according to their Similarity

If the reasoning engine processes ASTs in an order such that consecutive ASTs need to be checked against almost the same rules, we reduce the time the reasoning engine spends swapping those rules in and out, thereby improving reasoning performance. To reach this goal, we sort the ASTs before loading them into the reasoning engine, as follows (a code sketch follows the list).

1) Get the detection rule list for every source code file, and create an AST-rule relationship matrix as shown in Figure 7, where the X axis represents AST IDs and the Y axis represents detection rule IDs. An entry of 1 means that the corresponding AST should be reasoned against the corresponding detection rule.

Figure 7. AST - Detection Rule Relationship Matrix

2) Create an AST association matrix as shown in Figure 8, where the X and Y axes both represent ASTs, and each entry is the degree of association of the corresponding two ASTs in terms of the detection rules they share. Let B(i, j) be the intersection of the rule sets of AST i and AST j, and C(i, j) their union; the entry value is calculated by the formula AA[i][j] = B(i, j) ÷ C(i, j).

Figure 8. AST Association Matrix

3) Sort the ASTs using the following procedure:
a) Find the two ASTs with the maximum degree of association, and add them to an array.
b) Let H1 be the first element in the array and H2 the second element. For each AST that has not been added to the array, let D1 be its degree of association with H1 and D2 its degree of association with H2, and calculate the value α·D1 + β·D2 (α + β = 1). By experiment, we find that performance is best when α equals 0.6 and β equals 0.4. Find the maximum value, and insert the corresponding AST at the head of the array.
c) Let T1 be the last element in the array and T2 the last but one. For each AST that has not been added to the array, let D1 be its degree of association with T1 and D2 its degree of association with T2; calculate the value using the same formula, find the maximum value, and insert the corresponding AST at the end of the array.
d) Repeat steps b and c until all ASTs have been added to the array, which is then the sorted AST list. The final order of the sorted ASTs is shown in Figure 9.

Figure 9. Sorted AST Results

It is observed in Figure 9 that after sorting, adjacent ASTs share the maximum number of rules. This implies that when the reasoning engine shifts from processing one AST to another, it reduces rule swapping to the minimum, reaching the goal of improved reasoning performance.
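The following sketch shows our reading of the association computation and the two-ended greedy ordering; the set representation, the seed scan, and the strict alternation of head and tail growth are assumptions for illustration.

  // Sketch of the AST association (Jaccard over shared rule sets) and the
  // greedy two-ended ordering of step 3.
  import java.util.*;

  public final class AstSorter {
      static final double ALPHA = 0.6, BETA = 0.4;   // alpha + beta = 1

      /** AA[i][j] = |rules_i ∩ rules_j| ÷ |rules_i ∪ rules_j| */
      static double assoc(Set<Integer> a, Set<Integer> b) {
          Set<Integer> inter = new HashSet<>(a); inter.retainAll(b);
          Set<Integer> union = new HashSet<>(a); union.addAll(b);
          return union.isEmpty() ? 0.0 : (double) inter.size() / union.size();
      }

      static List<Integer> sort(List<Set<Integer>> rules) {
          int n = rules.size();
          double[][] aa = new double[n][n];
          for (int i = 0; i < n; i++)
              for (int j = 0; j < n; j++)
                  aa[i][j] = assoc(rules.get(i), rules.get(j));

          // a) seed the list with the most associated pair.
          LinkedList<Integer> order = new LinkedList<>();
          boolean[] used = new boolean[n];
          int bi = 0, bj = 1;
          for (int i = 0; i < n; i++)
              for (int j = i + 1; j < n; j++)
                  if (aa[i][j] > aa[bi][bj]) { bi = i; bj = j; }
          order.add(bi); order.add(bj); used[bi] = used[bj] = true;

          // b)/c) alternately grow the head and the tail with the AST that
          // maximizes ALPHA*D1 + BETA*D2 against the two end elements.
          boolean head = true;
          while (order.size() < n) {
              int e1 = head ? order.get(0) : order.get(order.size() - 1);
              int e2 = head ? order.get(1) : order.get(order.size() - 2);
              int best = -1;
              double bestV = -1;
              for (int k = 0; k < n; k++) {
                  if (used[k]) continue;
                  double v = ALPHA * aa[k][e1] + BETA * aa[k][e2];
                  if (v > bestV) { bestV = v; best = k; }
              }
              if (head) order.addFirst(best); else order.addLast(best);
              used[best] = true;
              head = !head;                          // d) repeat b and c
          }
          return order;
      }

      public static void main(String[] args) {
          List<Set<Integer>> rules = List.of(
                  Set.of(1, 2, 3), Set.of(2, 3), Set.of(7, 8), Set.of(3, 7));
          System.out.println(sort(rules));           // prints [3, 0, 1, 2]
      }
  }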


6.3. Selective AST Extraction

In the ontology-based static analysis, the loading process consists of the following activities: 1) instantiate an OWL model with the Java specification OWL model and the SWRL rules for defect detection; 2) parse Java source code to construct an AST; and 3) extract OWL individuals and OWL properties from the AST to exemplify the OWL model. In the reasoning process, the SWRL Bridge takes this OWL model as input, transforms it from OWL syntax to Jess syntax, and passes the results on to the Jess reasoning engine for logic reasoning; finally the engine returns the asserted results back to the SWRL Bridge.

Before reasoning, we examine the security detection rules and extract only the relevant and necessary OWL individuals and OWL properties to instantiate the OWL model. Instead of loading all security detection rules into the reasoning engine at once, as we did in our previous paper [9], we change the design to load one rule at a time; the overall process is as follows:

1) Instantiate an OWL model only with the Java specification.
2) Extract all detection rules from another OWL file with SWRL rules, which are created by Protégé in the phase of bug pattern modeling [9].
3) Load one rule and create the obligatory OWL individuals and properties from an AST for that rule.
4) Perform the SWRL-to-Jess syntax transformation using the SWRL Bridge, and run the Jess reasoning engine.
5) Clear the loaded rule, OWL individuals, and properties, and prepare for the next rule.

With this design, we performed a simple evaluation on a Java source file with 1,800 lines of code: the reasoning time is reduced from 63,141 ms to 16,485 ms, but the loading time increases from 20,406 ms to 176,144 ms. The new reasoning time is only about one fourth of the old time, so we achieve the goal of shortening the reasoning time.
However, the loading time increases. After analysis, we conclude that the increased loading time is mostly spent on repeatedly creating the same OWL individuals and OWL properties: every time a new rule is loaded, the necessary OWL individuals and OWL properties associated with that rule are extracted and created, so as the loop runs over different rules, the same OWL individuals and OWL properties are created over and over again. To solve this problem, we propose a solution: after the previous rule is reasoned and before the next rule is loaded, analyze the similarity between the two rules and find out which OWL individuals and OWL properties can be reused, avoiding the repeated creation.

With this design, we modify the implementation and carry out an experiment on the same Java source code: the reasoning time is still reduced from 63,141 ms to 16,485 ms, and the loading time is now reduced from 20,406 ms to 10,144 ms.
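A minimal sketch of this rule-at-a-time loop with reuse of already-created individuals across consecutive rules is shown below; the rule list, the needed-individuals lookup, and the cache type are stand-ins, since the tool's Jess/SWRL Bridge interfaces are not published in the paper.

  // Sketch of the per-rule loop with a reuse cache for OWL individuals;
  // engine and extraction calls are stand-ins for the SWRL Bridge.
  import java.util.*;

  public final class SelectiveLoading {

      /** Names of OWL individuals/properties a rule needs (stand-in). */
      static Set<String> individualsNeededBy(String rule) {
          return rule.contains("equals")
                  ? Set.of("MethodInvocation:equals", "Class:Object")
                  : Set.of("MethodInvocation:forName");
      }

      public static void main(String[] args) {
          List<String> rules = List.of("rule-forName", "rule-equals");
          Map<String, Object> created = new HashMap<>();   // reuse cache

          for (String rule : rules) {
              for (String name : individualsNeededBy(rule)) {
                  // Create an individual only if no earlier rule created it.
                  created.computeIfAbsent(name, n -> new Object());
              }
              System.out.println("reasoning " + rule + " over " + created.keySet());
              // Individuals stay cached; only the rule is swapped out here.
          }
      }
  }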


6.4. Selective Rule Loading

We can analyze the characteristics of the target source code and decide which rules are necessary to reason against the target, thereby avoiding loading and reasoning over unnecessary rules. For example, consider the equals method: when we check a Java source file that does not invoke equals, it is unnecessary to load and run the rules related to the equals method. We carried out an experiment to verify this assumption. The experiment target is the package org.apache.struts2.components of the Struts project, containing 54 Java source files. The total number of defined rules is 62; therefore, the number of rule executions would be 54 × 62 = 3,348. However, when we manually analyze these Java source files, we find that the actually required number is only 328. Selective loading of rules is thus clearly worthwhile. Based on this consideration, we redesign the overall process as follows:

1) Categorize all defined rules into several groups. Rules in one group may be related to the same method invocation (e.g., equals, hashCode), related to the same statement (e.g., if, switch), describe inheritance errors, etc. Examples of rule categorization are shown in Table 5. Record the categorization results in an XML file, with one keyword referring to several rules.
2) Analyze each Java source file and extract characteristics from the corresponding AST; an AST is a structured object and easier to manipulate than raw source code. As a result, we obtain a list of keywords describing the characteristics of the target.
3) Match the keywords of the source code against the rule categories.
4) Load and run the matched rules to check the source code.

Table 5: Categorization of Rules

  Keyword    Rules                                         Meaning
  hashCode   DEF_HASHCODE_USE_OBJECT_EQUALS,               hashCode invocation
             OverrideEqualsOrHashcodeMethodErrorChecker2
  toString   ArrayToStringChecker, ToStringChecker         toString invocation

7. Evaluation of the Proposed Approach

Based on the research work described above, we developed a tool to find security defects in programs. The tool is an Eclipse plug-in, and it runs on a Dell PC with a P8600 dual-core CPU, 3 GB of memory, and a 160 GB hard disk. We also install JDK 1.4, Eclipse 3.2, and MySQL 5.1 on the PC. In the experiments, we demonstrate programs under detection only in the Java language.

7.1. Detection Accuracy

The first experiment evaluates the detection accuracy of the proposed hybrid approach. We developed a P2P client based on the BitTorrent protocol [19] as the detection target; it contains 43 classes and 3,129 lines of code. We injected about 74 security vulnerabilities into the target, falling into seven types of security defects as shown in Table 6: 1) singleton pattern that is not thread safe; 2) inner class containing sensitive data; 3) passing mutable objects to an untrusted method; 4) public static field not marked final; 5) null pointer related; 6) array index overflow; and 7) deadlock. The target is checked using three approaches: model checking only, static analysis only, and the proposed hybrid approach.

Table 6 shows that the accuracy of the hybrid approach is 98.6%, higher than pure model checking with 36.5% and pure static analysis with 71.6%. Note that because there is some overlap in the null pointer related defects, the totals of model checking and static analysis together exceed 100%. The time cost of the hybrid approach sits in between, higher than pure static analysis and lower than pure model checking, as expected. We also notice that our tool cannot detect all the null pointer related security defects. The reason is that some driver classes cannot be generated: some parameters needed to invoke the cluster are constructed by the Factory design pattern, which hides information about the created class, and our current tool cannot properly generate the driver class in that case. We will continue this research in future work.

Table 6: Comparison of Detection Accuracy

  Vulnerability Type   Injected Defects   Model Checking   Static Analysis   Hybrid Approach
  Type 1               7                  N/A              7                 7
  Type 2               3                  N/A              3                 3
  Type 3               34                 N/A              34                34
  Type 4               3                  N/A              3                 3
  Type 5               18                 18               6                 17
  Type 6               5                  5                N/A               5
  Type 7               4                  4                N/A               4
  Total detected       74                 27               53                73
  Time (s)                                2,137            224               1,792
  Accuracy                                36.5%            71.6%             98.6%

7.2. Time Cost

The second experiment observes the trend of time consumption as the size of the source code increases. The numbers of classes in the projects under detection are 8, 14, 30, 62, 84, 195, and 274, respectively, and their lines of code increase from 1,402 to 41,656.
We use three approaches to perform the experiments, as shown in Figure 10: 1) static analysis only; 2) model checking only; and 3) the fuzzy inference based hybrid approach.

Figure 10. Time consumed as the source code size increases

Figure 10 shows that the static analysis approach consumes the least time of the three, while the model checking approach without clustering runs out of memory when processing the program with 84 classes and cannot produce results beyond that point. The proposed hybrid approach, on the other hand, gracefully handles these situations thanks to the introduction of clustering and the fuzzy inference based selection between static analysis and model checking. Applying the hybrid approach to other large programs, we find that it can handle programs of up to 400 classes.


7.3. Performance of Static Analysis

This experiment examines the performance improvement of the ontology-based static analysis. The projects under detection are dom4j-1.6.1, Struts-2.1.6, and PMD-4.2.4. We compare the detection times before and after applying the strategies described in Section 6. On average, about half the time is saved, as shown in Table 7, in which code size refers to the number of source files.

Table 7: Performance Improvement

  Project        Code Size   Before (min)   After (min)   Improvement
  dom4j-1.6.1    163         8              3.5           56.3%
  Struts-2.1.6   641         28             15            46.4%
  PMD-4.2.4      556         17             9             47.1%

Compared with the results shown in Figure 4, the time allocation among the three parts is more balanced after applying the strategies, as shown in Figure 11: reasoning time is 45%, loading time 28%, and parsing and other time 27%.

Figure 11. Time Proportions of Static Analysis

7.4. Time Distribution and Comparison

This experiment reports the time consumed by the four computations in the hybrid approach: clustering, fuzzy inference, static analysis, and model checking. We examine three projects: dom4j-1.6.1 (163 classes), Java PathFinder r1580 (329 classes), and Struts 1.2.7 (only the "src\share" folder, which contains 272 classes). Table 8 shows the time distributions among the four computations.

Table 8: Time Distribution of the Hybrid Approach

  Project   Clustering   Fuzzy inference   Static analysis   Model checking
  JPF       26%          1%                13%               60%
  Dom4j     15%          1%                16%               67%
  Struts    28%          1%                13%               59%
  Average   23%          1%                14%               62%

More than half the time is spent on model checking, which takes 62% of the total time on average. Clustering, which aims at reducing the complexity of model checking, consumes 23% of the total time. Table 9 shows the defects spotted by the two detection approaches as dynamically selected by the FISs. The experiment shows that static analysis discovers more defects in less time, while model checking identifies relatively fewer defects. Furthermore, they detect different types of defects and cannot substitute for each other.

Table 9: Detection Comparison

  7PK Category                    JPF        Dom4j      Struts
                                  SA   MC    SA   MC    SA   MC
  validation and representation   12   0     3    0     23   0
  API abuse                       7    2     0    2     3    0
  security features               3    0     0    0     0    0
  time and state                  9    19    0    0     2    6
  errors                          2    0     3    0     0    0
  code quality                    0    6     11   1     3    1
  encapsulation                   8    0     19   0     42   0
  environment                     0    0     0    0     0    0
  Total                           41   27    36   3     73   7
  Total (SA + MC)                 68         39         80

8. Related Work

Many organizations provide security defect lists, including CVE, CWE, CAPEC [20], and SCAP [39]. CAPEC (Common Attack Pattern Enumeration and Classification) provides a publicly available catalog of attack patterns along with a comprehensive schema and classification taxonomy. SCAP (Security Content Automation Protocol) is a method that uses specific standards to enable automated vulnerability management, measurement, and policy compliance evaluation.

Tools for hunting security vulnerabilities include ITS4 [29], Lint [34], and IBM Rational AppScan [27]. Cigital developed ITS4 to help automate source code review for security. ITS4 statically scans C and C++ source code looking for function calls that are potentially dangerous. For some calls, ITS4 tries to perform some code analysis to determine how risky the call is.
In each case, ITS4 provides a problem report, including a short description of the potential problem and suggestions on how to fix the code. It is a command-line tool that works on both Unix and Windows platforms.

Lint flags suspicious and non-portable constructs (likely to be defects) in C source code. It is used to check C code for errors that may cause a compilation failure or unexpected results at runtime. The Lint program issues every error and warning message produced by the C compiler, and many of its messages can help improve a program's effectiveness, including reducing its size and required memory.

IBM Rational AppScan is based on model checking technology. It has in-depth knowledge of security and can exploit security defects to prove their impact. It is an Eclipse plug-in for Rational Quality Manager and allows QA teams to manage security testing just as they manage quality and performance testing.

Model checking has emerged as a practical tool for automated debugging of embedded software. There are surveys [10] and sample model checkers [11], as well as applications to software analysis such as Java


PathFinder [30], Bandera [18], and SLAM [40].

Our method combines static analysis and model checking to take advantage of both approaches. By introducing strategies to improve the performance of ontology-based static analysis and developing a clustering algorithm to reduce the size of target programs, the fuzzy inference guided hybrid approach detects more defects within a limited time frame.

9. Conclusion and Future Work

This paper proposes a hybrid approach to automatically detecting security defects in programs. We employ the chi-square method to extract features from security defect descriptions and map the features to programming language characteristics. Fuzzy inference is used to intelligently select model checking or static analysis according to the features of the programs under detection. We develop a clustering algorithm to divide a whole system into small pieces so that the model checking approach can be applied when it would otherwise run out of computing resources. We use an ontology rule language, SWRL, to build the security defect detection rule base. Several strategies are proposed to improve the performance of the ontology-based static analysis. Finally, we develop a prototype tool and carry out experiments to evaluate the proposed hybrid approach. The results show that our tool achieves the highest detection accuracy on injected defects compared with the two individual approaches, and significantly cuts down the reasoning and loading time when performing static analysis. In addition, the hybrid approach can discover more types of security defects than either individual approach. Although clustering helps reduce the size of the detection targets, model checking related computation still takes up to 80% of the total time; improving the performance of model checking is our future work.

References

[1] G. McGraw. Software Security: Building Security In. Addison-Wesley, 2006.
[2] D. Engler and M. Musuvathi. Static analysis versus software model checking for bug finding. Invited paper, Fifth International Conference on Verification, Model Checking, and Abstract Interpretation (VMCAI'04), pp. 191-210, Jan. 2004.
[3] B. Chess and G. McGraw. Static analysis for security. IEEE Security and Privacy, 2(6), pp. 76-79, 2004.
[4] B. Chess and J. West. Secure Programming with Static Analysis. Addison-Wesley Professional, 2007.
[5] G. McGraw. Software Security: Building Security In. Addison-Wesley Professional, 2006.
[6] O. H. Alhazmi, Y. K. Malaiya, and I. Ray. Security vulnerabilities in software systems: a quantitative perspective. IFIP WG 11.3 Working Conference on Data and Applications Security, Aug. 2005.
[7] G. Miller. The magic number seven, plus or minus two. The Psychological Review, vol. 63, 1956, pp. 81-97.
[8] Y. Yang and J. O. Pedersen. A comparative study on feature selection in text categorization. Proceedings of the Fourteenth International Conference on Machine Learning (ICML'97), pp. 412-420, 1997.
[9] L. Yu, J. Zhou, Y. Yi, P. Li, and Q. Wang. Ontology model-based static analysis on Java programs. The 32nd Annual IEEE International Computer Software and Applications Conference, 2008, pp. 92-99.
[10] E. M. Clarke and R. P. Kurshan. Computer-aided verification. IEEE Spectrum, 33(6), pp. 61-67, 1996.
[11] G. J. Holzmann. The model checker SPIN. IEEE Transactions on Software Engineering, 23(5), pp. 279-295, 1997.
[12] N. Rutar, C. B. Almazan, and J. S. Foster. A comparison of bug finding tools for Java. 15th IEEE International Symposium on Software Reliability Engineering (ISSRE'04), France, Nov. 2004.
[13] A. Ozment and S. E. Schechter. Milk or wine: does software security improve with age? 2006 USENIX Security Symposium, Vancouver, B.C., Canada, 31 July-4 Aug. 2006.
[14] M. Howard, D. LeBlanc, and J. Viega. 19 Deadly Sins of Software Security. McGraw-Hill, 2005.
[15] G. McGraw, B. Chess, and K. Tsipenyuk. Seven pernicious kingdoms: a taxonomy of software security errors. NIST Workshop on Software Security Assurance Tools, Techniques, and Metrics, Long Beach, CA, November 2005.
[16] G. Castellano, A. M. Fanelli, and C. Mencar. Design of transparent Mamdani fuzzy inference systems. Design and Application of Hybrid Intelligent Systems, 2003.
[17] O. Kaynak. Computational Intelligence: Soft Computing and Fuzzy-Neuro Integration with Applications. North Atlantic Treaty Organization, 1998.
[18] Bandera. http://bandera.projects.cis.ksu.edu/
[19] BitTorrent. http://www.bittorrent.com/
[20] CAPEC. http://capec.mitre.org/
[21] CVE. http://cve.mitre.org/
[22] CWE. http://cwe.mitre.org/
[23] ESC/Java. http://kind.ucd.ie/products/opensource/ESCJava2/
[24] FeaVer. http://cm.bell-labs.com/cm/cs/what/feaver/
[25] FindBugs. http://findbugs.sourceforge.net/
[26] FlawFinder. http://www.dwheeler.com/flawfinder/
[27] IBM Rational AppScan. http://www-01.ibm.com/software/awdtools/appscan/
[29] ITS4. http://www.cigital.com/its4/
[30] Java PathFinder. http://javapathfinder.sourceforge.net/
[31] Jess. http://www.jessrules.com/
[32] JLint. http://artho.com/jlint
[33] JPF. http://jpf.sourceforge.net/
[34] Lint. http://en.wikipedia.org/wiki/Lint_programming_tool
[35] OWASP Top 10. http://www.owasp.org/index.php/OWASP_Top_Ten_Project
[36] OWL. http://www.w3.org/TR/owl-features/
[37] PMD/Java. http://pmd.sourceforge.net/
[38] RATS. http://www.fortify.com/security-resources/rats.jsp
[39] SCAP. http://nvd.nist.gov/scap.cfm
[40] SLAM. http://research.microsoft.com/en-us/projects/slam/
[41] SPIN. http://spinroot.com/
[42] SMV. http://www.cs.cmu.edu/~modelcheck/smv.html
[43] SWRL. http://www.w3.org/TR/
