12.07.2015 Views

NETWORK TRAFFIC FLOW ANALYSIS - NM Lab at Korea Univ.


4.2. Flow Attributes

Flow attributes are used to describe a flow. In the relevant research literature, flow attributes are often called features or characteristics. They can be values from the fields in packet headers. They can be counters (total bytes, total packets, etc.) or summary attributes such as average, median, and variance. They can also be discrete distribution attributes, which estimate the probability distribution of certain variables. Discrete distribution attributes are often useful for observing patterns otherwise missed by simple statistics.

5. Related Work

The pressing need for alternatives to correctly identify network activities has attracted attention in the research community. Some novel approaches are being proposed to recognize the traffic based on its behaviour [1][2][3][4][5]. However, the classification methods proposed take as input basic flow features (e.g. average packet size, flow duration, recurring use of addresses/ports). While such approaches have the advantage of using information that current flow collectors provide, we argue that it is necessary to continue to search for other flow features that better characterize traffic.

Foundational research work in identifying discriminative flow features has been documented in [6] [7] [8] [9], and more recently [10]. Lee and Stolfo [6] analyzed the DARPA data [11], and identified 41 attributes of interest to Network Intrusion Detection System (NIDS) technologies. Dunigan et al. [7] [8] proposed a multidimensional “binning” process to sort the packets, and applied multivariate analysis to reduce the flow attributes to the three “bins” that showed the greatest variation among all known flow types. Paxson and Zhang [9] developed a set of heuristics to identify keystroke-interactive connections by testing packet size, timing, and directionality against preset criteria.

Lastly, in parallel to our work, Hernández-Campos et al. [10] proposed an approach to cluster traffic flows based on a set of statistical attributes. The novelty in their approach is not in the flow attributes themselves but rather in the use of a unit of data which is different from a “packet”. The unit, called the Application-Data Unit (ADU), may contain several packets. Instead of modeling the patterns of packet exchanges, they model the patterns of ADU exchanges.

6. Our Technical Approach

In this work we explored discriminative flow features that portray essential communication dynamics, based solely on information that can be gathered from monitoring packet headers. We developed a proof-of-concept tool based on these indicators to help identify the network activities and, if traffic is not recognized, to provide useful insight into the traffic behaviour. This emphasis on insight into traffic behaviour distinguishes this work from related work in detecting known attacks and variations using intrusion detection techniques.

The approach focuses on lightweight characterization metrics.
The analysis is confined to headers at the network and transport layers (IP and TCP/UDP). The methodology can be viewed as a three-step process, as outlined:

1) Packets are grouped into flows. Each flow is identified by a 5-tuple defined by the IP protocol, the IP addresses of the Originator and the Responder, and the two TCP/UDP port numbers involved.
2) Characteristics (features) are measured on each flow. The output is a set of flow records in which all flow records are summarized with the same set of flow attributes.
3) Flows are Recognized and Described. Based on the characteristics obtained during step 2, we tag each flow with two properties: the application recognized (if any) and a Flow Description based on the traffic behaviour.

Our main contribution to the research is in step 2. In total we use about 40 flow attributes by which different types of applications can be distinguished. A technical report [12] was prepared in which we describe all flow features in detail, along with the metrics to quantify the values.

The process of developing our flow features was greatly inspired by the work of Paxson and Zhang [9] for detecting interactivity (human control). With the exception of a few minor differences, the interactive indicators we use are essentially those of Paxson and Zhang [9]. We, however, derive two distinct classes of human-driven packet transmission: keystroke transmission and command-line transmission. Command-line transmissions are larger in size and are separated by longer delays than keystrokes. The distinction between command-line and keystroke interactivity helps refine the classification process a step further.
FTP commands, for instance, can be distinguished from interactive SSH and TELNET sessions; and it is foreseen that chat sessions will be classed differently depending on the “flavour”. We contribute further by developing heuristics that capture other distinctive characteristics such as conversation, transaction, and data transfer, as well as by deriving signatures from observable patterns.

The goal in defining flow attributes is to identify not only the relevant characteristics but also the proper way to measure them. In particular, our findings indicate that while average packet size offers little discriminative power when distinguishing among network applications, characterizing packet size using a discrete distribution allows us to observe distinctive patterns such as the existence of a minimum payload size per packet due to application header length; a high frequency of packets of special sizes due to application negotiation mechanisms; and gaps in the distribution range due to application preferential packet sizes.

Another observation we made is that patterns in the packet direction dynamics stand out more clearly when we remove, from the sequence of packets, those that contain no payload. Over an established TCP connection, these TCP packets are transmitted simply to acknowledge having received data.
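
The two observations on packet size and packet direction can be illustrated with a minimal Python sketch. It is not the tool described in this work: the `Packet` record, the 100-byte bin width, and the `O`/`R` direction labels are assumptions made purely for illustration. The sketch drops packets that carry no payload, builds a discrete distribution of payload sizes for one flow, and derives the direction sequence of the remaining packets.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Packet:
    """Hypothetical per-packet record derived from IP and TCP/UDP headers."""
    direction: str    # "O" = sent by the Originator, "R" = sent by the Responder
    payload_len: int  # transport payload length in bytes (0 for a bare ACK)


def payload_packets(packets):
    """Drop packets that carry no payload (e.g. pure TCP acknowledgements)."""
    return [p for p in packets if p.payload_len > 0]


def size_distribution(packets, bin_width=100):
    """Discrete distribution of payload sizes: bin start -> relative frequency.

    bin_width is an illustrative choice, not a value taken from the paper.
    """
    data = payload_packets(packets)
    if not data:
        return {}
    counts = Counter((p.payload_len // bin_width) * bin_width for p in data)
    total = len(data)
    return {start: counts[start] / total for start in sorted(counts)}


def direction_sequence(packets):
    """Direction of each payload-carrying packet, in arrival order."""
    return "".join(p.direction for p in payload_packets(packets))


# A made-up flow: small requests, bare ACKs, and large responses.
flow = [
    Packet("O", 20), Packet("R", 0), Packet("R", 120), Packet("O", 0),
    Packet("O", 20), Packet("R", 0), Packet("R", 1460), Packet("R", 1460),
    Packet("O", 0),
]

print(direction_sequence(flow))
print(size_distribution(flow))
print(min(p.payload_len for p in payload_packets(flow)))
```

On this made-up flow, the filtered direction sequence is `ORORR`, the minimum payload is 20 bytes, and the distribution has mass only in the 0, 100, and 1400 bins. The empty bins in between are the kind of gap that an average packet size would hide.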

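Steps 1 and 2 of the three-step process can be sketched in a similar way. The code below is an illustrative assumption, not the authors' implementation: the `PacketHeader` fields are hypothetical, the sender of the first packet seen is assumed to be the Originator, flow timeouts are ignored, and only a handful of counter and summary attributes are computed rather than the roughly 40 attributes used in this work.

```python
from dataclasses import dataclass
from statistics import mean, median, pvariance


@dataclass(frozen=True)
class PacketHeader:
    """Hypothetical header summary; the field names are illustrative."""
    proto: str     # IP protocol, e.g. "TCP" or "UDP"
    src_ip: str
    src_port: int
    dst_ip: str
    dst_port: int
    length: int    # packet length in bytes


def group_into_flows(packets):
    """Step 1: group packets into bidirectional flows keyed by a 5-tuple.

    Assumption: the sender of the first packet seen is the Originator, so
    the key is (protocol, Originator IP, Originator port, Responder IP,
    Responder port).
    """
    flows = {}
    for p in packets:
        forward = (p.proto, p.src_ip, p.src_port, p.dst_ip, p.dst_port)
        reverse = (p.proto, p.dst_ip, p.dst_port, p.src_ip, p.src_port)
        # A reply travels in the reverse direction of an existing flow.
        key = reverse if reverse in flows else forward
        flows.setdefault(key, []).append(p)
    return flows


def flow_record(key, packets):
    """Step 2: summarize one flow with the same fixed set of attributes."""
    sizes = [p.length for p in packets]
    return {
        "flow": key,
        "total_packets": len(sizes),      # counter attribute
        "total_bytes": sum(sizes),        # counter attribute
        "mean_size": mean(sizes),         # summary attributes
        "median_size": median(sizes),
        "size_variance": pvariance(sizes),
    }


# Made-up packets using documentation-range addresses.
packets = [
    PacketHeader("TCP", "192.0.2.1", 40000, "192.0.2.2", 22, 60),
    PacketHeader("TCP", "192.0.2.2", 22, "192.0.2.1", 40000, 100),
    PacketHeader("TCP", "192.0.2.1", 40000, "192.0.2.2", 22, 80),
    PacketHeader("UDP", "192.0.2.3", 5353, "192.0.2.4", 53, 70),
]

records = [flow_record(k, v) for k, v in group_into_flows(packets).items()]
for record in records:
    print(record)
```

Keying the dictionary on the Originator's view of the 5-tuple keeps both directions of a conversation in one flow record, which is what direction-based attributes need.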