
NETWORK TRAFFIC FLOW ANALYSIS - NM Lab at Korea Univ.

NETWORK TRAFFIC FLOW ANALYSIS

Annie De Montigny-Leboeuf
Communications Research Centre (CRC)
An Agency of Industry Canada
email: annie.demontigny@crc.ca

Tim Symchych
Communications Research Centre (CRC)
An Agency of Industry Canada
email: tim.symchych@crc.ca

Abstract

Thousands of diverse applications and services flow daily over networks used by governments, industry, and private users. Attacks can be hidden within these information flows by disguising malicious network traffic to appear to be legitimate. Generally, TCP or UDP based protocols can be mapped to specific network services. However, intruders do hide unauthorized activity by using non-standard protocols or standard protocols in non-standard ways to avoid detection.

This paper describes current work and future directions that the Network Security Research Group at the Communications Research Centre (CRC) will take to identify flows of information that disguise attacks. Research challenges include uncovering unauthorized activities in high-speed, high-volume network links and within protocols that are intended to obscure the details of the information carried.

Keywords: traffic flow analysis; network security; traffic classification.

1. Introduction

The Network Security Research Group at the Communications Research Centre (CRC) Canada conducts research in technologies and techniques to advance the current state of network security.
We are proposing an approach for traffic characterization through the use of meaningful flow attributes that do not rely on access to payload nor depend on TCP/UDP port numbers.

The approach taken does not require a priori knowledge of the protocols in use on a network, nor does it require that the payload of packets be visible to the monitor. To identify signature-like features of common applications and services, algorithms were developed based on lightweight characterization metrics. These metrics have discriminative power and also provide insight into the traffic behaviour to help an analyst investigate suspicious flows. The analysis is confined to headers at the network and transport layers; thus it does not depend on access to application data.

2. Intruder and Insider

Intrusion detection systems (IDS) and the more recent intrusion prevention systems (IPS) are still considered the first line of defence used to identify attacks on networked applications, network services and infrastructure.
However, if an attack were successful, it is imperative that the aftermath of the attack, including ongoing malicious activity such as disguised “tunnels”, be detected.

A second important type of malicious activity that must be detected within the protective layers, even if there is no successful attack, is the activity of the inside user who may be authorized to use the network, but chooses to use unauthorized applications and services, thus contravening organizational security policy.

Although the motivation and methods of the malicious intruder and authorized insider may differ, the result of these activities is that the organization is at risk. The research work described herein is an attempt to identify and mitigate that risk.

3. Classification of Network Traffic

Classifying network traffic according to the applications producing the traffic is an important aspect of network monitoring. However, developing a dependable method for classifying flows is a difficult problem that requires extensive research. We can no longer reliably identify applications based on port numbers now that a growing number of applications have the ability to disguise their activity through the use of arbitrary ports. While payload analysis is a possible approach, it can be resource intensive if exhaustive payload examination is performed, or easily defeated if only minimal decoding is done. Given the shortcomings of using known ports or payload analysis to identify flows, it is clear that another approach is needed.

4. The Notion of Flow

4.1. Flow Specification

As part of this research work, we examined bidirectional TCP and UDP flows. A number of the attributes we derived are measured in each direction separately. Flows were identified using a 5-tuple key, defined by the IP protocol, IP addresses of the Originator and the Responder, and the two TCP/UDP port numbers involved. The Originator is the sender of the first packet captured from a flow. For simplicity, we chose a fixed inactivity timeout of 60 seconds to terminate flows. Thus, a flow terminates if 60 seconds have elapsed since the last packet belonging to that flow was captured.

1-4244-0038-4 2006 IEEE CCECE/CCGEI, Ottawa, May 2006
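
As an illustration of the flow specification in Section 4.1, the following Python sketch groups packets into bidirectional flows. It is a minimal sketch under stated assumptions, not the prototype's implementation: the names `Flow`, `FlowTable`, and `add_packet`, the `(ip, port)` endpoint tuples, and the `"orig"`/`"resp"` direction labels are all hypothetical. Only the 5-tuple key, the Originator rule, and the 60-second inactivity timeout come from the text.

```python
from dataclasses import dataclass, field

INACTIVITY_TIMEOUT = 60.0  # seconds; the fixed timeout stated in Section 4.1


@dataclass
class Flow:
    protocol: int      # IP protocol number (6 = TCP, 17 = UDP)
    originator: tuple  # (ip, port) of the sender of the first captured packet
    responder: tuple   # (ip, port) of the other endpoint
    first_seen: float
    last_seen: float
    packets: list = field(default_factory=list)  # (timestamp, direction, size)


class FlowTable:
    """Hypothetical helper that assigns captured packets to flows."""

    def __init__(self, timeout=INACTIVITY_TIMEOUT):
        self.timeout = timeout
        self.active = {}    # direction-independent key -> Flow
        self.finished = []  # flows closed by the inactivity timeout

    @staticmethod
    def _key(protocol, src, dst):
        # Sort the two endpoints so both directions map to the same flow.
        return (protocol,) + tuple(sorted((src, dst)))

    def add_packet(self, ts, protocol, src, dst, size):
        key = self._key(protocol, src, dst)
        flow = self.active.get(key)
        # Close the flow if the inactivity timeout has elapsed.
        if flow is not None and ts - flow.last_seen >= self.timeout:
            self.finished.append(self.active.pop(key))
            flow = None
        if flow is None:
            # The sender of the first captured packet becomes the Originator.
            flow = Flow(protocol, src, dst, ts, ts)
            self.active[key] = flow
        direction = "orig" if src == flow.originator else "resp"
        flow.packets.append((ts, direction, size))
        flow.last_seen = ts
        return flow
```

Sorting the endpoints inside the key is one simple way to make both directions of a conversation land in the same record, while the per-packet direction label keeps it possible to measure attributes in each direction separately. In this sketch a flow is only expired when a later packet with the same key arrives; a live monitor would presumably also need to sweep idle flows periodically.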


4.2. Flow Attributes

Flow attributes are used to describe a flow. In the relevant research literature, flow attributes are often called features or characteristics. They can be values from the fields in headers of packets. They can be counters (total bytes, total packets, etc.) or summary attributes such as average, median, and variance. They can also be discrete distribution attributes, which estimate the probability distribution of certain variables. Discrete distribution attributes are often useful to observe patterns otherwise missed by simple statistics.

5. Related Work

The pressing need for alternatives to correctly identify network activities has attracted attention in the research community. Some novel approaches are being proposed to recognize the traffic based on its behaviour [1][2][3][4][5]. However, the classification methods proposed take as input basic flow features (e.g. average packet size, flow duration, recurring use of addresses/ports). While such approaches have the advantage of using information that current flow collectors provide, we argue that it is necessary to continue to search for other flow features that better characterize traffic.

Foundation research work in identifying discriminative flow features has been documented in [6] [7] [8] [9], and more recently [10]. Lee and Stolfo [6] analyzed the DARPA data [11], and identified 41 attributes of interest to Network Intrusion Detection System (NIDS) technologies. Dunigan et al.
[7] [8] proposed a multidimensional “binning” process to sort the packets, and applied multivariate analysis to reduce the flow attributes to the three “bins” that showed the greatest variation among all known flow types. Paxson and Zhang [9] developed a set of heuristics to identify keystroke-interactive connections by testing packet size, timing, and directionality against preset criteria.

Lastly, in parallel to our work, Hernández-Campos et al. [10] proposed an approach to cluster traffic flows based on a set of statistical attributes. The novelty in their approach is not in the flow attributes themselves but rather in the use of a unit of data which is different from a “packet”. The unit, called the Application-Data Unit (ADU), may contain several packets. Instead of modeling the patterns of packet exchanges, they model the patterns of ADU exchanges.

6. Our Technical Approach

In this work we explored discriminative flow features that portray essential communication dynamics, based solely on information that can be gathered from monitoring packet headers. We developed a proof of concept tool based on these indicators to help identify the network activities, and if traffic is not recognized, then to provide useful insight into the traffic behaviour. This emphasis on insight into traffic behaviour distinguishes this work from related work in detecting known attacks and variations using intrusion detection techniques.

The approach focuses on lightweight characterization metrics.
The analysis is confined to headers at the network and transport layers (IP and TCP/UDP). The methodology can be viewed as a three-step process, as outlined below:

1) Packets are grouped into flows. Each flow is identified by a 5-tuple defined by the IP protocol, IP addresses of the Originator and the Responder, and the two TCP/UDP port numbers involved.
2) Characteristics (features) are measured on each flow. The output is a set of flow records in which all flow records are summarized with the same set of flow attributes.
3) Flows are recognized and described. Based on the characteristics obtained during step 2, we tag each flow with two properties: the application recognized (if any) and a Flow Description based on the traffic behaviour.

Our main contribution to the research is in step 2. In total we use about 40 flow attributes by which different types of applications can be distinguished. A technical report [12] was prepared in which we describe all flow features in detail along with the metrics to quantify the values.

The process of developing our flow features was greatly inspired by the work of Paxson and Zhang [9] for detecting interactivity (human control). With the exception of a few minor differences, the interactive indicators we use are essentially those of Paxson and Zhang [9]. We however derive two distinct classes of human-driven packet transmission: keystroke transmission and command-line transmission. Command-line transmissions are larger in size and are separated by longer delays than keystrokes. The distinction between command-line and keystroke interactivity helps refine the classification process a step further.
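
To make the keystroke versus command-line distinction concrete, the following Python sketch labels payload-carrying packets from one direction of a flow by size and by the delay since the previous payload-carrying packet. It is only an illustration of the idea: every threshold below is a placeholder assumed for the example, not a value from this work, and the function names are hypothetical. The actual indicators are those described in [9] and [12].

```python
from collections import Counter

# Placeholder thresholds, assumed for illustration only.
KEYSTROKE_MAX_BYTES = 20     # hypothetical largest payload of a keystroke
COMMAND_MAX_BYTES = 200      # hypothetical largest payload of a command line
KEYSTROKE_GAP = (0.01, 2.0)  # hypothetical delay range between keystrokes (s)
COMMAND_GAP = (2.0, 60.0)    # hypothetical delay range between command lines (s)


def classify_transmission(payload_len, gap):
    """Label one packet from its payload size and the delay preceding it."""
    if payload_len <= 0:
        return "empty"
    small = payload_len <= KEYSTROKE_MAX_BYTES
    if small and KEYSTROKE_GAP[0] <= gap < KEYSTROKE_GAP[1]:
        return "keystroke"
    # Command lines are larger and separated by longer delays than keystrokes.
    if (not small and payload_len <= COMMAND_MAX_BYTES
            and COMMAND_GAP[0] <= gap < COMMAND_GAP[1]):
        return "command-line"
    return "other"


def interactivity_profile(packets):
    """Count labels over (timestamp, payload_len) pairs from one direction."""
    labels = Counter()
    previous_ts = None
    for ts, payload_len in packets:
        if payload_len <= 0:
            continue  # packets without payload are ignored
        if previous_ts is not None:
            # The first payload packet has no preceding delay and is not labelled.
            labels[classify_transmission(payload_len, ts - previous_ts)] += 1
        previous_ts = ts
    return labels
```

A flow whose profile is dominated by `"keystroke"` labels would look like character-at-a-time typing, whereas one dominated by `"command-line"` labels would look like whole lines being sent at a human pace.
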
FTP command sessions, for instance, can be distinguished from interactive SSH and TELNET sessions; and it is foreseen that chat sessions will be classed differently depending on the “flavour”. We contribute further by developing heuristics that capture other distinctive characteristics such as conversation, transaction, data transfer, as well as by deriving signatures from observable patterns.

The goal in defining flow attributes is to identify not only the relevant characteristics but also the proper way to measure them. In particular our findings indicate that while average packet size offers little discriminative power when distinguishing among network applications, characterizing packet size using a discrete distribution allows us to observe distinctive patterns such as the existence of a minimum payload size per packet due to application header length; high frequency of packets of special sizes due to application negotiation mechanisms; and gaps in the distribution range due to application preferential packet sizes.

Another observation we made is that patterns in the packet direction dynamics stand out more clearly when we remove, from the sequence of packets, those that contain no payload. Over an established TCP connection, these TCP packets are transmitted simply to acknowledge having received data. This
observation was used when deriving heuristics to quantify transaction and conversation episodes and when deriving signatures of directionality in the beginning of communications.

We showed with step 3 of our approach that when flow attributes are meaningful and discriminative, lightweight rule sets can be defined to classify flows. While broad classes of traffic can be defined based on our flow attributes, we find that the discriminative power of those features is strong enough to distinguish among similar traffic flows (e.g. distinguish among e-mail protocols). To demonstrate this we have developed simple rule sets to distinguish among commonly used protocols such as FTPdata, HTTP, HTTPS, IMAP, POP, SMTP, FTPcontrol, RLOGIN, SSH, and TELNET. In this initial study, the flow features selected to derive the profiles were chosen “manually”. That is, the selection was done based on knowledge of the protocols, and the thresholds have been chosen from analyzing samples of flows collected from our test environment.

Because we started with discriminative features, only minimal effort was required to derive the profiles. The recognition in step 3 is typically based on “give-away” features (e.g. which of the Originator and the Responder sends the first non-empty packet) while the description, which provides insight about the nature of the communication, is based on “behavioural” features (e.g. indicators of interactivity, conversation, transaction). A given application may receive different descriptions depending on its use.
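
The “give-away” feature named above, which side sends the first non-empty packet, can be computed from packet headers alone once packets without payload are dropped from the sequence. The Python sketch below shows one way to do so. It is a hedged illustration, not the rule sets of this work: the `"orig"`/`"resp"` labels and all function names are hypothetical, and `direction_changes` is only a simple stand-in for the conversation and transaction heuristics defined in [12].

```python
def payload_direction_sequence(packets):
    """Return the directions of the packets that carry payload.

    `packets` is an iterable of (direction, payload_len) pairs in capture
    order, with direction being "orig" or "resp" (hypothetical labels).
    """
    return [direction for direction, payload_len in packets if payload_len > 0]


def first_talker(packets):
    """Return which side sent the first non-empty packet, or None."""
    sequence = payload_direction_sequence(packets)
    return sequence[0] if sequence else None


def direction_changes(packets):
    """Count how often the payload-carrying direction flips."""
    sequence = payload_direction_sequence(packets)
    return sum(1 for a, b in zip(sequence, sequence[1:]) if a != b)
```

For a TCP flow, the three handshake packets and the pure acknowledgements all have a payload length of zero, so they vanish from the sequence and the remaining directions show only who actually sent data and in what order.
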
In particular, ssh may in some cases be used as an interactive control application, and in other cases may be called from a script to perform a file transfer. The outcome of step 3 in the ssh example would be to mark the flow as the SSH protocol and provide a description that would help an analyst determine how ssh was used. The flow descriptor is particularly useful in providing insights about unrecognized networked applications. For instance, a flow described as “persistent, bidirectional, command-line interactive, conversational” could indicate a chat session.

7. Current State

A prototype system is in an early stage of development, where the focus is on developing the metrics to achieve reliable and meaningful results from examination of network flow parameters.

To evaluate the reliability of our approach early in the research process, we have tested the flow recognition capability of the tool on traffic traces collected from a campus research network and found the results quite encouraging. The evaluation experiment provided some guidelines for developing profiles of other types of network applications and refining those previously developed.

We have also tested the prototype against a number of subverting tools that deliberately disguise traffic to appear as HTTP [13][14][15]. Experiments with these tools in different scenarios such as chat sessions, remote control sessions, file transfers, and e-mail, showed that the HTTP disguise failed in almost all cases.

8. Challenges to Research Progress

Challenges for this research work are not insignificant. Recognizing malicious traffic in high-speed, high-volume networks and within protocols that obscure the details of the information carried are the primary challenges. While offline tools are appropriate for examining captured traffic in forensic or similar contexts, management of operational networks requires tools and techniques that cannot be overwhelmed by the volume of traffic seen by the tool in real time.

It is often difficult to obtain research and operational data that can be used for analysis and testing algorithms. Good “clean” data, as well as data containing malicious traffic, is essential to further this and related work.

9. Future Work

Our focus so far has been placed on identifying flow attributes that are useful in characterizing network traffic. Much remains to be done. In particular we plan to examine a number of applications not yet considered (e.g. VoIP, peer-to-peer, gaming traffic, networked applications using web services); refine the profiles for the protocols studied so far; identify broader classes of traffic in order to characterize a wide range of network services currently used on the Internet; identify a small subset of the most important flow features in classifying traffic without jeopardizing the accuracy; and examine the possibility of estimating the flow attributes in near real-time as opposed to measuring them over a connection’s total lifetime.
We will undertake these tasks in future work and have recently initiated a related project with Virtual Private Networks (VPN) for which one of the goals is to determine how the encryption layer alters the characteristics measured by the flow features. The study will focus on traffic produced by commonly available business grade VPN equipment.

10. Conclusions

A majority of the currently available tools that provide information on network usage rely on well-known port numbers to identify the network services. Some of these monitoring tools also use protocol-aware mechanisms based on payload decoding. The authors believe that the practice of cloaking attacks to appear as innocuous types of applications will greatly accelerate. Mechanisms that infer the true nature of information flows based on traffic behaviour can be used to increase the level of confidence in the monitoring tools for security practitioners and researchers.

The flow attributes developed by our research work have discriminative power and provide insight into the network activity. When flow attributes are well defined, lightweight rule sets based on these attributes can be defined to classify
flows. In this work, much of the effort so far has been concentrated on identifying meaningful flow attributes.

Preliminary assessment indicates the proof-of-concept tool is useful as is, and may lead, with further research, to a number of applications.

Mechanisms that allow us to infer the true nature of information flows based on traffic behaviour can be used to increase the level of confidence in the monitoring tools and in our networks. We believe that the flow features derived during this study will prove to be useful to other researchers in the field.

References

[1] J. P. Early, C. E. Brodley, C. Rosenberg, “Behavioral Authentication of Server Flows”, Proc. of the Annual Computer Security Applications Conference (ACSAC 2003), Las Vegas, NV, USA, December 2003.
[2] M. Roughan, S. Sen, O. Spatscheck, N. G. Duffield, “Class-of-service mapping for QoS: a statistical signature-based approach to IP traffic classification”, Proc. of the Conference on Internet Measurement (IMC 04), pp. 135-148, Taormina, Sicily, Italy, October 2004.
[3] T. Karagiannis, K. Papagiannaki, M. Faloutsos, “BLINC: Multilevel Traffic Classification in the Dark”, Proc. of ACM SIGCOMM, pp. 229-240, Philadelphia, PA, USA, August 2005.
[4] K. Xu, Z. Zhang, S. Bhattacharya, “Profiling Internet Backbone Traffic: Behavior Models and Applications”, Proc. of ACM SIGCOMM, pp. 169-180, Philadelphia, PA, USA, August 2005.
[5] A. W. Moore, D. Zuev, “Internet Traffic Classification Using Bayesian Analysis Techniques”, Proc. of ACM SIGMETRICS, pp. 50-60, Banff, Alberta, Canada, June 2005.
[6] W. Lee, S. J. Stolfo, “A Framework for Constructing Features and Models for Intrusion Detection Systems”, ACM Transactions on Information and System Security, Vol. 3, No. 4, November 2000.
[7] S. Abdulrahman, “Network Intrusion Detection Using Flow Characterization”, project description, http://www.cs.utk.edu/~abdulrah/project/paper.html
[8] T. Dunigan, G. Ostrouchov, “Flow Characterization for Intrusion Detection”, Oak Ridge National Laboratory report, ORNL/TM-2001/115, November 2000, available at http://www.csm.ornl.gov/~dunigan/pubs.html
[9] Y. Zhang, V. Paxson, “Detecting Backdoors”, Proc. of USENIX Security Symposium, Denver, CO, USA, August 2000.
[10] F. Hernández-Campos, A. B. Nobel, F. Donelson Smith, K. Jeffay, “Understanding Patterns of TCP Connection Usage with Statistical Clustering”, Proc. of the Symposium on Modeling, Analysis, and Simulation of Computer and Telecommunication Systems (MASCOTS), pp. 35-44, Atlanta, GA, USA, September 2005.
[11] DARPA Intrusion Detection Evaluation, Lincoln Laboratory, http://www.ll.mit.edu/IST/ideval/
[12] Annie De Montigny-Leboeuf, “Flow Attributes For Use In Traffic Characterization”, CRC Technical Note CRC-TN-2005-003, December 2005.
[13] C. Daicos, G. S. Knight, “Concerning Enterprise Network Vulnerability To Http Tunnelling”, Proc. of IFIP TC11 18th International Conference on Information Security (IFIP SEC 2003), Athens, Greece, May 2003.
[14] Httptunnel, an HTTP tunnel tool, available at http://www.nocrew.org/software/httptunnel.html
[15] Httport, an HTTP tunnel tool, available at http://www.htthost.com/
