NETWORK TRAFFIC FLOW ANALYSIS - NM Lab at Korea Univ.

4.2. Flow Attributes

Flow attributes are used to describe a flow. In the relevant research literature, flow attributes are often called features or characteristics. They can be values from the fields in packet headers. They can be counters (total bytes, total packets, etc.) or summary attributes such as average, median, and variance. They can also be discrete distribution attributes, which estimate the probability distribution of certain variables. Discrete distribution attributes are often useful for observing patterns otherwise missed by simple statistics.

5. Related Work

The pressing need for alternatives to correctly identify network activities has attracted attention in the research community. Some novel approaches are being proposed to recognize the traffic based on its behaviour [1][2][3][4][5]. However, the classification methods proposed take as input basic flow features (e.g. average packet size, flow duration, recurring use of addresses/ports). While such approaches have the advantage of using information that current flow collectors provide, we argue that it is necessary to continue to search for other flow features that better characterize traffic.

Foundation research work in identifying discriminative flow features has been documented in [6] [7] [8] [9], and more recently [10]. Lee and Stolfo [6] analyzed the DARPA data [11], and identified 41 attributes of interest to Network Intrusion Detection System (NIDS) technologies. Dunigan et al. [7] [8] proposed a multidimensional “binning” process to sort the packets, and applied multivariate analysis to reduce the flow attributes to the three “bins” that showed the greatest variation among all known flow types. Paxson and Zhang [9] developed a set of heuristics to identify keystroke-interactive connections by testing packet size, timing, and directionality against preset criteria.

Lastly, in parallel to our work, Hernández-Campos et al. [10] proposed an approach to cluster traffic flows based on a set of statistical attributes.
The novelty in their approach is not in the flow attributes themselves but rather in the use of a unit of data which is different from a “packet”. The unit, called the Application-Data Unit (ADU), may contain several packets. Instead of modeling the patterns of packet exchanges, they model the patterns of ADU exchanges.

6. Our Technical Approach

In this work we explored discriminative flow features that portray essential communication dynamics, based solely on information that can be gathered from monitoring packet headers. We developed a proof of concept tool based on these indicators to help identify the network activities and, if traffic is not recognized, to provide useful insight into the traffic behaviour. This emphasis on insight into traffic behaviour distinguishes this work from related work in detecting known attacks and variations using intrusion detection techniques.

The approach focuses on lightweight characterization metrics. The analysis is confined to headers at the network and transport layers (IP and TCP/UDP). The methodology can be viewed as a three step process as outlined:

1) Packets are grouped into flows. Each flow is identified by a 5-tuple defined by the IP protocol, IP addresses of the Originator and the Responder, and the two TCP/UDP port numbers involved.
2) Characteristics (features) are measured on each flow. The output is a set of flow records in which all flow records are summarized with the same set of flow attributes.
3) Flows are Recognized and Described. Based on the characteristics obtained during step 2, we tag each flow with two properties: the application recognized (if any) and a Flow Description based on the traffic behaviour.

Our main contribution to the research is in step 2. In total we use about 40 flow attributes by which different types of applications can be distinguished.
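
Steps 1 and 2 above can be illustrated with a minimal Python sketch. It is not the authors' tool: the packet tuple layout, the function names `group_into_flows` and `summarize`, and the 64-byte bin width are all assumptions made for illustration. The sketch treats the sender of the first packet seen on a 5-tuple as the Originator and computes only a handful of attributes (counters, summary statistics, and a discrete packet-size distribution), not the roughly 40 attributes described in the text.

```python
from collections import Counter
from statistics import mean, median, pvariance

# Assumed packet layout (hypothetical):
# (timestamp, protocol, src_ip, src_port, dst_ip, dst_port, size_in_bytes)


def group_into_flows(packets):
    """Step 1: group packets into flows keyed by a 5-tuple.

    The sender of the first packet seen is taken to be the Originator;
    packets travelling the other way are tagged as Responder packets.
    """
    flows = {}
    for ts, proto, src, sport, dst, dport, size in packets:
        forward = (proto, src, sport, dst, dport)
        reverse = (proto, dst, dport, src, sport)
        if forward in flows:
            flows[forward].append((ts, "orig", size))
        elif reverse in flows:
            flows[reverse].append((ts, "resp", size))
        else:
            flows[forward] = [(ts, "orig", size)]
    return flows


def summarize(flow_packets, bin_width=64):
    """Step 2: summarize one flow with a fixed set of attributes."""
    sizes = [size for _, _, size in flow_packets]
    total = len(sizes)
    # Discrete distribution: fraction of packets falling in each size bin.
    bins = Counter((s // bin_width) * bin_width for s in sizes)
    return {
        "total_packets": total,
        "total_bytes": sum(sizes),
        "mean_size": mean(sizes),
        "median_size": median(sizes),
        "size_variance": pvariance(sizes),
        "size_distribution": {b: c / total for b, c in sorted(bins.items())},
    }


if __name__ == "__main__":
    sample = [
        (0.00, "TCP", "10.0.0.1", 40000, "10.0.0.2", 80, 120),
        (0.05, "TCP", "10.0.0.2", 80, "10.0.0.1", 40000, 1460),
        (0.06, "TCP", "10.0.0.2", 80, "10.0.0.1", 40000, 1460),
    ]
    for key, pkts in group_into_flows(sample).items():
        print(key, summarize(pkts))
```

Every flow record produced this way carries the same keys, which matches the requirement in step 2 that all flows be summarized with the same set of attributes.
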
A technical report [12] was prepared in which we describe all flow features in detail, along with the metrics to quantify the values.

The process of developing our flow features was greatly inspired by the work of Paxson and Zhang [9] for detecting interactivity (human control). With the exception of a few minor differences, the interactive indicators we use are essentially those of Paxson and Zhang [9]. We do, however, derive two distinct classes of human-driven packet transmission: keystroke transmission and command-line transmission. Command-line transmissions are larger in size and are separated by longer delays than keystrokes. The distinction between command-line and keystroke interactivity helps refine the classification process a step further. FTP commands, for instance, can be distinguished from interactive SSH and TELNET sessions; and it is foreseen that chat sessions will be classed differently depending on the “flavour”. We contribute further by developing heuristics that capture other distinctive characteristics such as conversation, transaction, and data transfer, as well as by deriving signatures from observable patterns.

The goal in defining flow attributes is to identify not only the relevant characteristics but also the proper way to measure them.
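
The keystroke versus command-line distinction described earlier in this section can be sketched as follows. The only property taken from the text is that command-line transmissions are larger and separated by longer delays than keystrokes; every threshold and every name below (`classify_transmission`, `interactivity_profile`, the byte and delay limits) is a hypothetical value chosen for illustration, not a figure from the technical report [12].

```python
from collections import Counter

# Hypothetical thresholds, for illustration only.
KEYSTROKE_MAX_BYTES = 16       # keystrokes carry very small payloads
COMMAND_MAX_BYTES = 256        # command lines are larger, but still short
KEYSTROKE_GAP_S = (0.01, 1.0)  # assumed delay range between keystrokes
COMMAND_GAP_S = (1.0, 60.0)    # assumed delay range between command lines


def classify_transmission(payload_len, gap_s):
    """Label one Originator transmission by payload size and preceding delay."""
    if (payload_len <= KEYSTROKE_MAX_BYTES
            and KEYSTROKE_GAP_S[0] <= gap_s < KEYSTROKE_GAP_S[1]):
        return "keystroke"
    if (KEYSTROKE_MAX_BYTES < payload_len <= COMMAND_MAX_BYTES
            and COMMAND_GAP_S[0] <= gap_s <= COMMAND_GAP_S[1]):
        return "command-line"
    return "other"


def interactivity_profile(transmissions):
    """Fraction of transmissions in each class.

    `transmissions` is a time-ordered list of (timestamp, payload_len)
    pairs for the non-empty packets sent by the Originator.
    """
    labels = Counter()
    for (t_prev, _), (t_cur, size) in zip(transmissions, transmissions[1:]):
        labels[classify_transmission(size, t_cur - t_prev)] += 1
    total = sum(labels.values())
    if total == 0:
        return {}
    return {k: labels[k] / total for k in ("keystroke", "command-line", "other")}
```

A flow whose profile is dominated by one class could then be tagged as keystroke interactive or command-line interactive; where to draw that line is a further assumption left out of the sketch.
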
In particular, our findings indicate that while average packet size offers little discriminative power when distinguishing among network applications, characterizing packet size using a discrete distribution allows us to observe distinctive patterns such as the existence of a minimum payload size per packet due to application header length; high frequency of packets of special sizes due to application negotiation mechanisms; and gaps in the distribution range due to application preferential packet sizes.

Another observation we made is that patterns in the packet direction dynamics stand out more clearly when we remove, from the sequence of packets, those that contain no payload. Over an established TCP connection, these TCP packets are transmitted simply to acknowledge having received data. This observation was used when deriving heuristics to quantify transaction and conversation episodes and when deriving signatures of directionality in the beginning of communications.

We showed with step 3 of our approach that when flow attributes are meaningful and discriminative, lightweight rule sets can be defined to classify flows. While broad classes of traffic can be defined based on our flow attributes, we find that the discriminative power of those features is strong enough to distinguish among similar traffic flows (e.g. distinguish among e-mail protocols). To demonstrate this we have developed simple rule sets to distinguish among commonly used protocols such as FTPdata, HTTP, HTTPS, IMAP, POP, SMTP, FTPcontrol, RLOGIN, SSH, and TELNET. In this initial study, the flow features selected to derive the profiles were chosen “manually”. That is, the selection was done based on knowledge of the protocols, and the thresholds were chosen by analyzing samples of flows collected from our test environment.

Because we started with discriminative features, only minimal effort was required to derive the profiles. The recognition in step 3 is typically based on “give-away” features (e.g. which of the Originator and the Responder sends the first non-empty packet) while the description, which provides insight about the nature of the communication, is based on “behavioural” features (e.g. indicators of interactivity, conversation, transaction). A given application may receive different descriptions depending on its use. In particular, ssh may in some cases be used as an interactive control application, and in other cases may be called from a script to perform a file transfer. The outcome of step 3 in the ssh example would be to mark the flow as the SSH protocol and provide a description that would help an analyst determine how ssh was used.
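
The recognition and description step can be sketched in Python as a small rule set over a few features. This is an illustrative assumption, not the authors' profiles: the profile labels, the predicates, and the thresholds (four direction changes, a 90% byte share) are invented here, and a real rule set would draw on many more attributes. The sketch does follow two points from the text: payload-less packets are dropped before direction is examined, and the sender of the first non-empty packet serves as a “give-away” feature.

```python
# Each packet is a (direction, payload_len) pair, direction being "orig" or "resp".


def drop_empty(packets):
    """Remove payload-less packets, such as bare TCP acknowledgements."""
    return [(d, n) for d, n in packets if n > 0]


def extract_features(packets):
    """Compute a few direction-based features on the non-empty packets."""
    data = drop_empty(packets)
    dirs = [d for d, _ in data]
    return {
        "first_talker": dirs[0] if dirs else None,
        "direction_changes": sum(a != b for a, b in zip(dirs, dirs[1:])),
        "orig_bytes": sum(n for d, n in data if d == "orig"),
        "resp_bytes": sum(n for d, n in data if d == "resp"),
    }


# Hypothetical recognition rules: (label, predicate), tried in order.
RECOGNITION_RULES = [
    ("responder-speaks-first profile",
     lambda f: f["first_talker"] == "resp"),
    ("originator-request profile",
     lambda f: f["first_talker"] == "orig" and f["resp_bytes"] > f["orig_bytes"]),
]


def recognize(features):
    """Return the first matching profile label, or None if unrecognized."""
    for label, predicate in RECOGNITION_RULES:
        if predicate(features):
            return label
    return None


def describe(features):
    """Build a behavioural description; thresholds are illustrative."""
    tags = []
    total = features["orig_bytes"] + features["resp_bytes"]
    if features["orig_bytes"] and features["resp_bytes"]:
        tags.append("bidirectional")
    if features["direction_changes"] >= 4:
        tags.append("conversational")
    if total and max(features["orig_bytes"], features["resp_bytes"]) / total >= 0.9:
        tags.append("data transfer")
    return tags
```

Because `describe` runs independently of `recognize`, a flow that matches no profile still receives a description, which is the property the text relies on for unrecognized applications.
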
The flow descriptor is particularly useful in providing insights about unrecognized networked applications. For instance, a flow described as “persistent, bidirectional, command-line interactive, conversational” could indicate a chat session.

7. Current State

A prototype system is in an early stage of development, where the focus is on developing the metrics to achieve reliable and meaningful results from examination of network flow parameters.

To evaluate the reliability of our approach early in the research process, we have tested the flow recognition capability of the tool on traffic traces collected from a campus research network and found the results quite encouraging. The evaluation experiment provided some guidelines for developing profiles of other types of network applications and refining those previously developed.

We have also tested the prototype against a number of subverting tools that deliberately disguise the traffic to appear as HTTP [13][14][15]. Experiments with these tools in different scenarios such as chat sessions, remote control sessions, file transfers, and e-mail showed that the HTTP disguise failed in almost all cases.

8. Challenges to Research Progress

Challenges for this research work are not insignificant. Recognizing malicious traffic in high-speed, high-volume networks and within protocols that obscure the details of the information carried are the primary challenges. While offline tools are appropriate for examining captured traffic in forensic or similar contexts, management of operational networks requires tools and techniques that cannot be overwhelmed by the volume of traffic seen by the tool in real-time.

It is often difficult to obtain research and operational data that can be used for analysis and testing algorithms. Good “clean” data, as well as data containing malicious traffic, is essential to further this and related work.

9. Future Work

Our focus so far has been placed on identifying flow attributes that are useful in characterizing network traffic. Much remains to be done. In particular we plan to examine a number of applications not yet considered (e.g. VoIP, peer to peer, gaming traffic, networked applications using web services); refine the profiles for the protocols studied so far; identify broader classes of traffic in order to characterize a wide range of network services currently used on the Internet; identify a small subset of the most important flow features in classifying traffic without jeopardizing the accuracy; and examine the possibility of estimating the flow attributes in near real-time as opposed to measuring them over a connection’s total lifetime. We will undertake these tasks in future work and have recently initiated a related project with Virtual Private Networks (VPN) for which one of the goals is to determine how the encryption layer alters the characteristics measured by the flow features. The study will focus on traffic produced by commonly available business grade VPN equipment.

10. Conclusions

A majority of the currently available tools that provide information on network usage rely on well-known port numbers to identify the network services. Some of these monitoring tools also use protocol-aware mechanisms based on payload decoding. The authors believe that the practice of cloaking attacks to appear as innocuous types of applications will greatly accelerate. Mechanisms that infer the true nature of information flows based on traffic behaviour can be used to increase the level of confidence in the monitoring tools for security practitioners and researchers.

The flow attributes developed by our research work have discriminative power and provide insight into the network activity. When flow attributes are well defined, lightweight rule sets based on these attributes can be defined to classify
