NETWORK TRAFFIC FLOW ANALYSIS - NM Lab at Korea Univ.

NETWORK TRAFFIC FLOW ANALYSIS

Annie De Montigny-Leboeuf
Communications Research Centre (CRC)
An Agency of Industry Canada
email: annie.demontigny@crc.ca

Tim Symchych
Communications Research Centre (CRC)
An Agency of Industry Canada
email: tim.symchych@crc.ca

Abstract

Thousands of diverse applications and services flow daily over networks used by governments, industry, and private users. Attacks can be hidden within these information flows by disguising malicious network traffic to appear legitimate. Generally, TCP or UDP based protocols can be mapped to specific network services. However, intruders do hide unauthorized activity by using non-standard protocols, or standard protocols in non-standard ways, to avoid detection.

This paper describes current work and future directions that the Network Security Research Group at the Communications Research Centre (CRC) will take to identify flows of information that disguise attacks. Research challenges include uncovering unauthorized activities in high-speed, high-volume network links and within protocols that are intended to obscure the details of the information carried.

Keywords: traffic flow analysis; network security; traffic classification.

1. Introduction

The Network Security Research Group at the Communications Research Centre (CRC) Canada conducts research in technologies and techniques to advance the current state of network security. We are proposing an approach for traffic characterization through the use of meaningful flow attributes that neither rely on access to payload nor depend on port numbers.

The approach taken does not require a priori knowledge of the protocols in use on a network, nor does it require that the payload of packets be visible to the monitor. To identify signature-like features of common applications and services, algorithms were developed based on lightweight characterization metrics. These metrics have discriminative power and also provide insight into the traffic behaviour to help an analyst investigate suspicious flows.
The analysis is confined to headers at the network and transport layers; thus it does not depend on access to application data.

2. Intruder and Insider

Intrusion detection systems (IDS) and the more recent intrusion prevention systems (IPS) are still considered the first line of defence used to identify attacks on networked applications, network services and infrastructure. However, if an attack is successful, it is imperative that its aftermath, including ongoing malicious activity such as disguised “tunnels”, be detected.

A second important type of malicious activity that must be detected within the protective layers, even if there is no successful attack, is the activity of the inside user who may be authorized to use the network, but chooses to use unauthorized applications and services, thus contravening organizational security policy.

Although the motivation and methods of the malicious intruder and authorized insider may differ, the result of these activities is that the organization is at risk. The research work described herein is an attempt to identify and mitigate that risk.

3. Classification of Network Traffic

Classifying network traffic according to the applications producing the traffic is an important aspect of network monitoring. However, developing a dependable method for classifying flows is a difficult problem that requires extensive research. We can no longer reliably identify applications based on port numbers now that a growing number of applications can disguise their activity through the use of arbitrary ports. While payload analysis is a possible approach, it can be resource intensive if exhaustive payload examination is performed, or easily defeated if only minimal decoding is done. Given the shortcomings of using known ports or payload analysis to identify flows, it is clear that another approach is needed.

4. The Notion of Flow

4.1. Flow Specification

As part of this research work, we examined bidirectional TCP and UDP flows.
A number of the attributes we derived are measured in each direction separately. Flows were identified using a 5-tuple key, defined by the IP protocol, IP addresses of the Originator and the Responder, and the two TCP/UDP port numbers involved. The Originator is the sender of the first packet captured from a flow. For simplicity, we chose a fixed inactivity timeout of 60 seconds to terminate flows. Thus, a flow terminates if 60 seconds have elapsed since the last packet belonging to that flow was captured.

1-4244-0038-4 2006 IEEE CCECE/CCGEI, Ottawa, May 2006
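
The flow specification of Section 4.1 can be illustrated with a minimal Python sketch. It is not the authors' tool: the `Packet` record, the `Flow` class, and the `group_flows` function are hypothetical names introduced here, and the sketch assumes that packet headers have already been parsed into simple records.

```python
from dataclasses import dataclass, field
from typing import NamedTuple

FLOW_TIMEOUT = 60.0  # fixed inactivity timeout in seconds, as stated in the text


class Packet(NamedTuple):
    """Hypothetical header-only packet record (no payload content is kept)."""
    ts: float          # capture time in seconds
    proto: str         # IP protocol, e.g. "TCP" or "UDP"
    src: str
    sport: int
    dst: str
    dport: int
    payload_len: int   # number of application bytes carried


@dataclass
class Flow:
    # (proto, originator IP, originator port, responder IP, responder port)
    key: tuple
    # one (ts, from_originator, payload_len) entry per packet
    packets: list = field(default_factory=list)

    @property
    def last_seen(self):
        return self.packets[-1][0]


def group_flows(packets, timeout=FLOW_TIMEOUT):
    """Group packets into bidirectional flows keyed by the 5-tuple."""
    active = {}    # direction-independent lookup key -> Flow
    finished = []
    for p in sorted(packets, key=lambda pkt: pkt.ts):
        # Both directions of a conversation must map to the same flow.
        ends = tuple(sorted([(p.src, p.sport), (p.dst, p.dport)]))
        lookup = (p.proto, ends)
        flow = active.get(lookup)
        if flow is not None and p.ts - flow.last_seen >= timeout:
            finished.append(active.pop(lookup))  # inactivity timeout reached
            flow = None
        if flow is None:
            # The sender of the first captured packet is the Originator.
            flow = Flow(key=(p.proto, p.src, p.sport, p.dst, p.dport))
            active[lookup] = flow
        from_originator = (p.src, p.sport) == (flow.key[1], flow.key[2])
        flow.packets.append((p.ts, from_originator, p.payload_len))
    finished.extend(active.values())
    return finished
```

Recording the direction of every packet at grouping time is a design choice of this sketch; it makes it straightforward to measure attributes in each direction separately afterwards.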

4.2. Flow Attributes

Flow attributes are used to describe a flow. In the relevant research literature, flow attributes are often called features or characteristics. They can be values from the fields in headers of packets. They can be counters (total bytes, total packets, etc.) or summary attributes such as average, median, and variance. They can also be discrete distribution attributes, which estimate the probability distribution of certain variables. Discrete distribution attributes are often useful to observe patterns otherwise missed by simple statistics.

5. Related Work

The pressing need for alternatives to correctly identify network activities has attracted attention in the research community. Some novel approaches are being proposed to recognize the traffic based on its behaviour [1][2][3][4][5]. However, the classification methods proposed take as input basic flow features (e.g. average packet size, flow duration, recurring use of addresses/ports). While such approaches have the advantage of using information that current flow collectors provide, we argue that it is necessary to continue to search for other flow features that better characterize traffic.

Foundation research work in identifying discriminative flow features has been documented in [6] [7] [8] [9], and more recently [10]. Lee and Stolfo [6] analyzed the DARPA data [11], and identified 41 attributes of interest to Network Intrusion Detection System (NIDS) technologies. Dunigan et al. [7] [8] proposed a multidimensional “binning” process to sort the packets, and applied multivariate analysis to reduce the flow attributes to the three “bins” that showed the greatest variation among all known flow types. Paxson and Zhang [9] developed a set of heuristics to identify keystroke-interactive connections by testing packet size, timing, and directionality against preset criteria.

Lastly, in parallel to our work, Hernández-Campos et al. [10] proposed an approach to cluster traffic flows based on a set of statistical attributes.
The novelty in their approach is not in the flow attributes themselves but rather in the use of a unit of data which is different from a “packet”. The unit, called the Application-Data Unit (ADU), may contain several packets. Instead of modeling the patterns of packet exchanges, they model the patterns of ADU exchanges.

6. Our Technical Approach

In this work we explored discriminative flow features that portray essential communication dynamics, based solely on information that can be gathered from monitoring packet headers. We developed a proof-of-concept tool based on these indicators to help identify the network activities and, if traffic is not recognized, to provide useful insight into the traffic behaviour. This emphasis on insight into traffic behaviour distinguishes this work from related work in detecting known attacks and variations using intrusion detection techniques.

The approach focuses on lightweight characterization metrics. The analysis is confined to headers at the network and transport layers (IP and TCP/UDP). The methodology can be viewed as a three-step process, as outlined:

1) Packets are grouped into flows. Each flow is identified by a 5-tuple defined by the IP protocol, IP addresses of the Originator and the Responder, and the two TCP/UDP port numbers involved.
2) Characteristics (features) are measured on each flow. The output is a set of flow records in which all flow records are summarized with the same set of flow attributes.
3) Flows are recognized and described. Based on the characteristics obtained during step 2, we tag each flow with two properties: the application recognized (if any) and a Flow Description based on the traffic behaviour.

Our main contribution to the research is in step 2. In total we use about 40 flow attributes by which different types of applications can be distinguished.
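
As an illustration of step 2, the following Python sketch summarizes one flow with the same small set of attributes in each direction: counters, a summary statistic, and a discrete distribution of payload sizes. It is only a sketch under stated assumptions. The attribute names, the `BIN_EDGES` values, and the input format (a list of `(timestamp, from_originator, payload_len)` tuples) are hypothetical and are not taken from the roughly 40 attributes defined in [12].

```python
from collections import Counter
from statistics import mean

# Hypothetical payload-size bin edges in bytes; the bins actually used by the
# authors are not given in this paper.
BIN_EDGES = (0, 1, 64, 256, 1024)


def size_bin(payload_len):
    """Return the lower edge of the bin that contains payload_len."""
    label = BIN_EDGES[0]
    for edge in BIN_EDGES:
        if payload_len >= edge:
            label = edge
    return label


def direction_attributes(payload_lens):
    """Counters, a summary statistic and a discrete size distribution."""
    n = len(payload_lens)
    counts = Counter(size_bin(size) for size in payload_lens)
    return {
        "packets": n,
        "payload_bytes": sum(payload_lens),
        "mean_payload": mean(payload_lens) if n else 0.0,
        # Fraction of packets falling in each size bin.
        "size_distribution": {b: counts[b] / n for b in sorted(counts)},
    }


def flow_record(packets):
    """Summarize a flow given (timestamp, from_originator, payload_len) tuples."""
    packets = list(packets)
    times = [ts for ts, _, _ in packets]
    orig = [size for _, from_orig, size in packets if from_orig]
    resp = [size for _, from_orig, size in packets if not from_orig]
    return {
        "duration": max(times) - min(times) if times else 0.0,
        "originator": direction_attributes(orig),
        "responder": direction_attributes(resp),
    }
```

Every flow record produced this way has the same keys, so records from different flows can be compared attribute by attribute.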
A technical report [12] was prepared in which we describe all flow features in detail along with the metrics to quantify the values.

The process of developing our flow features was greatly inspired by the work of Paxson and Zhang [9] for detecting interactivity (human control). With the exception of a few minor differences, the interactive indicators we use are essentially those of Paxson and Zhang [9]. We, however, derive two distinct classes of human-driven packet transmission: keystroke transmission and command-line transmission. Command-line transmissions are larger in size and are separated by longer delays than keystrokes. The distinction between command-line and keystroke interactivity helps refine the classification process a step further. FTP commands, for instance, can be distinguished from interactive SSH and TELNET sessions; and it is foreseen that chat sessions will be classed differently depending on the “flavour”. We contribute further by developing heuristics that capture other distinctive characteristics such as conversation, transaction, and data transfer, as well as by deriving signatures from observable patterns.

The goal in defining flow attributes is to identify not only the relevant characteristics but also the proper way to measure them.
In particular, our findings indicate that while average packet size offers little discriminative power when distinguishing among network applications, characterizing packet size using a discrete distribution allows us to observe distinctive patterns such as the existence of a minimum payload size per packet due to application header length; a high frequency of packets of special sizes due to application negotiation mechanisms; and gaps in the distribution range due to application preferential packet sizes.

Another observation we made is that patterns in the packet direction dynamics stand out more clearly when we remove, from the sequence of packets, those that contain no payload. Over an established TCP connection, these TCP packets are transmitted simply to acknowledge having received data. This observation was used when deriving heuristics to quantify transaction and conversation episodes and when deriving signatures of directionality in the beginning of communications.

We showed with step 3 of our approach that when flow attributes are meaningful and discriminative, lightweight rule sets can be defined to classify flows. While broad classes of traffic can be defined based on our flow attributes, we find that the discriminative power of those features is strong enough to distinguish among similar traffic flows (e.g. to distinguish among e-mail protocols). To demonstrate this we have developed simple rule sets to distinguish among commonly used protocols such as FTPdata, HTTP, HTTPS, IMAP, POP, SMTP, FTPcontrol, RLOGIN, SSH, and TELNET. In this initial study, the flow features selected to derive the profiles were chosen “manually”. That is, the selection was done based on knowledge of the protocols, and the thresholds were chosen by analyzing samples of flows collected from our test environment.

Because we started with discriminative features, only minimal effort was required to derive the profiles. The recognition in step 3 is typically based on “give-away” features (e.g. which of the Originator and the Responder sends the first non-empty packet) while the description, which provides insight about the nature of the communication, is based on “behavioural” features (e.g. indicators of interactivity, conversation, transaction). A given application may receive different descriptions depending on its use. In particular, ssh may in some cases be used as an interactive control application, and in other cases may be called from a script to perform a file transfer. The outcome of step 3 in the ssh example would be to mark the flow as the SSH protocol and provide a description that would help an analyst determine how ssh was used.
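
The “give-away” feature mentioned above, together with the removal of empty packets, can be sketched in a few lines of Python. This is a toy illustration, not the authors' rule set: the function names and the `CANDIDATES` table are hypothetical, and the table only encodes well-known protocol behaviour (an HTTP or HTTPS client speaks first, whereas SMTP, POP, IMAP and FTP control servers open with a greeting). Packets are assumed to be `(timestamp, from_originator, payload_len)` tuples in capture order.

```python
def payload_packets(packets):
    """Drop packets that carry no payload (e.g. bare TCP acknowledgements)."""
    return [p for p in packets if p[2] > 0]


def first_payload_sender(packets):
    """Return which side sends the first non-empty packet, or None."""
    data = payload_packets(packets)
    if not data:
        return None
    return "originator" if data[0][1] else "responder"


def direction_sequence(packets):
    """Collapse the non-empty packets into alternating turns, e.g. 'ROR'."""
    turns = []
    for _, from_originator, _ in payload_packets(packets):
        side = "O" if from_originator else "R"
        if not turns or turns[-1] != side:
            turns.append(side)
    return "".join(turns)


# Hypothetical candidate table: it narrows the set of protocols to consider
# and would be combined with further attributes in a real rule set.
CANDIDATES = {
    "originator": ["HTTP", "HTTPS"],
    "responder": ["SMTP", "POP", "IMAP", "FTPcontrol"],
}


def candidate_protocols(packets):
    """Protocols consistent with who sent the first non-empty packet."""
    return CANDIDATES.get(first_payload_sender(packets), [])
```

A single feature such as this one cannot identify an application on its own; the sketch only shows how one cheap header-level observation can already rule out a large part of the candidate set.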
The flow descriptor is particularly useful in providing insights about unrecognized networked applications. For instance, a flow described as “persistent, bidirectional, command-line interactive, conversational” could indicate a chat session.

7. Current State

A prototype system is in an early stage of development, where the focus is on developing the metrics to achieve reliable and meaningful results from examination of network flow parameters.

To evaluate the reliability of our approach early in the research process, we have tested the flow recognition capability of the tool on traffic traces collected from a campus research network and found the results quite encouraging. The evaluation experiment provided some guidelines for developing profiles of other types of network applications and refining those previously developed.

We have also tested the prototype against a number of subverting tools that deliberately masquerade the traffic to appear as HTTP [13][14][15]. Experiments with these tools in different scenarios such as chat sessions, remote control sessions, file transfers, and e-mail, showed that the HTTP disguise failed in almost all cases.

8. Challenges to Research Progress

Challenges for this research work are not insignificant. Recognizing malicious traffic in high-speed, high-volume networks and within protocols that obscure the details of the information carried are the primary challenges. While offline tools are appropriate for examining captured traffic in forensic or similar contexts, management of operational networks requires tools and techniques that are not overwhelmed by the volume of traffic seen in real time.

It is often difficult to obtain research and operational data that can be used for analysis and for testing algorithms. Good “clean” data, as well as data containing malicious traffic, is essential to further this and related work.

9. Future Work

Our focus so far has been placed on identifying flow attributes that are useful in characterizing network traffic. Much remains to be done. In particular we plan to examine a number of applications not yet considered (e.g. VoIP, peer-to-peer, gaming traffic, networked applications using web services); refine the profiles for the protocols studied so far; identify broader classes of traffic in order to characterize a wide range of network services currently used on the Internet; identify a small subset of the most important flow features for classifying traffic without jeopardizing accuracy; and examine the possibility of estimating the flow attributes in near real-time as opposed to measuring them over a connection’s total lifetime. We will undertake these tasks in future work and have recently initiated a related project with Virtual Private Networks (VPN), for which one of the goals is to determine how the encryption layer alters the characteristics measured by the flow features. The study will focus on traffic produced by commonly available business-grade VPN equipment.

10. Conclusions

A majority of the currently available tools that provide information on network usage rely on well-known port numbers to identify the network services. Some of these monitoring tools also use protocol-aware mechanisms based on payload decoding. The authors believe that the practice of cloaking attacks to appear as innocuous types of applications will greatly accelerate. Mechanisms that infer the true nature of information flows based on traffic behaviour can be used to increase the level of confidence in the monitoring tools for security practitioners and researchers.

The flow attributes developed by our research work have discriminative power and provide insight into the network activity. When flow attributes are well defined, lightweight rule sets based on these attributes can be defined to classify flows. In this work, much of the effort so far has been concentrated on identifying meaningful flow attributes.

Preliminary assessment indicates the proof-of-concept tool is useful as is, and may lead, with further research, to a number of applications.

Mechanisms that allow us to infer the true nature of information flows based on traffic behaviour can be used to increase the level of confidence in the monitoring tools and in our networks. We believe that the flow features derived during this study will prove to be useful to other researchers in the field.

References

[1] J. P. Early, C. E. Brodley, C. Rosenberg, “Behavioral Authentication of Server Flows”, Proc. of the Annual Computer Security Applications Conference (ACSAC 2003), Las Vegas, NV, USA, December 2003.
[2] M. Roughan, S. Sen, O. Spatscheck, N. G. Duffield, “Class-of-service mapping for QoS: a statistical signature-based approach to IP traffic classification”, Proc. of the Conference on Internet Measurement (IMC 04), pp. 135-148, Taormina, Sicily, Italy, October 2004.
[3] T. Karagiannis, K. Papagiannaki and M. Faloutsos, “BLINC: Multilevel Traffic Classification in the Dark”, Proc. of ACM SIGCOMM, pp. 229-240, Philadelphia, PA, USA, August 2005.
[4] K. Xu, Z. Zhang, and S. Bhattacharya, “Profiling Internet Backbone Traffic: Behavior Models and Applications”, Proc. of ACM SIGCOMM, pp. 169-180, Philadelphia, PA, USA, August 2005.
[5] A. W. Moore and D. Zuev, “Internet Traffic Classification Using Bayesian Analysis Techniques”, Proc. of ACM SIGMETRICS, pp. 50-60, Banff, Alberta, Canada, June 2005.
[6] W. Lee, S. J. Stolfo, “A Framework for Constructing Features and Models for Intrusion Detection Systems”, ACM Transactions on Information and System Security, Vol. 3, No. 4, November 2000.
[7] S. Abdulrahman, “Network Intrusion Detection Using Flow Characterization”, project description, http://www.cs.utk.edu/~abdulrah/project/paper.html
[8] T. Dunigan, G. Ostrouchov, “Flow Characterization for Intrusion Detection”, Oak Ridge National Laboratory report, ORNL/TM-2001/115, November 2000, available at http://www.csm.ornl.gov/~dunigan/pubs.html
[9] Y. Zhang and V. Paxson, “Detecting Backdoors”, Proc. of USENIX Security Symposium, Denver, CO, USA, August 2000.
[10] F. Hernández-Campos, A. B. Nobel, F. Donelson Smith, K. Jeffay, “Understanding Patterns of TCP Connection Usage with Statistical Clustering”, Proc. of the Symposium on Modeling, Analysis, and Simulation of Computer and Telecommunication Systems (MASCOTS), pp. 35-44, Atlanta, GA, USA, September 2005.
[11] DARPA Intrusion Detection Evaluation, Lincoln Laboratory, http://www.ll.mit.edu/IST/ideval/
[12] Annie De Montigny-Leboeuf, “Flow Attributes For Use In Traffic Characterization”, CRC Technical Note CRC-TN-2005-003, December 2005.
[13] C. Daicos, G. S. Knight, “Concerning Enterprise Network Vulnerability To Http Tunnelling”, Proc. of IFIP TC11 18th International Conference on Information Security (IFIP SEC 2003), Athens, Greece, May 2003.
[14] Httptunnel, a HTTP tunnel tool, available at http://www.nocrew.org/software/httptunnel.html
[15] Httport, a HTTP tunnel tool, available at http://www.htthost.com/
