Military Communications and Information Technology: A Trusted ...

22.01.2015 Views

... it appears to be saturated, indicating a data transfer, or not, implying that data is being generated at a constant rate. We suggest estimating the maximum size of a packet for the respective link, e.g. by observing maximum packet sizes for flows between two systems, and checking whether the mean packet size of a given flow converges towards it. Such convergence would indicate saturated packets, as we expect to observe for a data transfer.

While estimating the maximum capacity available to the flow for each attached system would be another valid metric, mismeasurements may occur for several reasons. First, a system may communicate outside our field of view, i.e. its network connection may be saturated without our being able to observe it. Second, even though the effect may be questioned, many applications introduce rate limiting to improve quality of service; the observed flow could therefore fail to saturate a correctly estimated capacity of each system's network connection even when we are able to observe all flows for both systems.

C. Multiple types of behaviour in a single flow

The behaviour described in the previous section provides a classification into data transfer, fixed rate data and varying rate data for a single direction of a network flow. Obviously, a flow, or even a single direction of a flow, may carry data generated through different kinds of mechanisms. Consider e.g. an HTTPS connection where the first packets are used to establish a shared secret key, i.e. by a computationally expensive mechanism generating data at varying rates, followed by one or even several data transfers when HTTP pipelining is used. Thus, using a single label for the whole flow may misrepresent its nature. To represent such flows correctly, we introduce the notion of a subflow, i.e. a portion of a single direction of a netflow that fits into one of the above categories.
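The packet-size check suggested earlier in this section can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the 5% tolerance and the representation of a flow direction as a plain list of packet sizes in bytes are assumptions made purely for illustration.

```python
from statistics import mean


def estimate_max_packet_size(flows):
    """Estimate the link's maximum packet size as the largest packet
    observed in any flow between the two systems (assumed input: a list
    of flows, each a list of packet sizes in bytes)."""
    return max(size for sizes in flows for size in sizes)


def looks_saturated(packet_sizes, max_size, tolerance=0.05):
    """Return True if the mean packet size of one flow direction lies
    within `tolerance` (hypothetical threshold) of the estimated maximum,
    which the text interprets as a hint at a data transfer."""
    if not packet_sizes:
        return False
    return mean(packet_sizes) >= (1.0 - tolerance) * max_size


if __name__ == "__main__":
    # Illustrative values only, not measurements from the paper.
    flows = [[1500] * 50 + [700], [100, 120, 90, 1500, 110]]
    limit = estimate_max_packet_size(flows)
    for sizes in flows:
        print(looks_saturated(sizes, limit))
```

A fixed tolerance is the simplest possible choice here; a real deployment would probably need to tune it per link, and the rate-limiting caveat raised above applies to this sketch as well.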
We explicitly allow two succeeding subflows to belong to the same category, given that their nature changes in a way suggesting they were generated by a different mechanism or another instantiation of the same mechanism. This would apply, for instance, to the HTTP pipelining case mentioned above, where a time gap between two file transfers occurs, caused by the client-server interaction and the need for additional processing by the server for providing the second file.

VI. Observations for botnet detection

We want to exploit features that can be observed or derived from observations described in section V. Since the described feature space is very limited and reveals only basic properties of the communicating applications, we cannot expect a single observable feature to reveal the presence of a botnet client, nor do we expect that a single observation for a given feature will provide sufficient evidence for such a conclusion.

Chapter 4: Information Assurance & Cyber Defence

To sidestep this issue, we want to employ statistical methods developed in the field of sensor data fusion for analysing measurements from physical sensors. These methods require not only that we can define features that we expect to distinguish botnet traffic from other traffic, but also that we are able to describe and formalise the difference between the two. In this section, we describe three features that both provide evidence of botnet activity and can be described in terms of how the presence of a botnet client will affect the measurement for a given system. We expect that this list is not exhaustive, i.e. additional features exist which have the desired properties and can be used to discriminate between benign and infected systems. Identifying these will be part of our future work.

First, we introduce a feature based on measuring the delay between two consecutive flows supposedly initiated by the same application. Following that, we discuss the failed flow count as an indicator for peer-to-peer based botnets in section VI.B, and finally we describe how to interpret the volume of bytes transferred in a flow for our purposes in section VI.C.

A. Inter-flow-initiation delay

A given type of client application usually interacts with another class of applications, using an appropriate transport layer protocol. Typically, the latter application would be called the server counterpart for the application, but it may be identical in the case of peer-to-peer applications. Internet traffic is dominated by standardised protocols, many of which use a registered TCP or UDP port for their server application. Thus, a given application usually interacts with servers listening on the same port. When adding the assumption that a system usually runs only one application for a given protocol, this becomes “two flow initiations with a given destination port and originating from a given IP address are usually generated by the same application running on the system identified by the address.” While this conclusion may not hold in some cases, particularly when several hosts that run a popular application share an IP address through NAT, the loose association provided may already be enough for our purpose.

Based on the assumption described in the previous paragraph, we can measure the delay between two successive attempts of an application to contact another application. Again, we analyse the distribution function of the measured delays: if an application tries to initiate a flow at regular intervals, the distribution function will exhibit a local maximum at the configured interval value, i.e. the existence of a local maximum can be treated as an indicator for an automated process initiating flows. Note that we may not be able to measure the conspicuous intervals if the malware’s flow initiations are mixed with a legitimate user’s. However, once the user ceases interaction with the application disguising the malware traffic for a sufficiently long time span, the interval can be observed. We would also like to point out that when only considering traditional netflows, this mechanism would
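The delay analysis described in this subsection can be sketched in Python as follows. This is only an illustration under stated assumptions, not the authors' method: flow initiations are assumed to be available as (timestamp in seconds, source IP, destination port) tuples, the distribution is approximated by a fixed-width histogram, and the names `initiation_delays` and `delay_peaks` as well as the `bin_width` and `min_count` parameters are hypothetical.

```python
from collections import Counter, defaultdict


def initiation_delays(flow_starts):
    """Group flow initiations by (source IP, destination port), the loose
    'same application' association used in the text, and return the
    delays between successive initiations for each group."""
    by_key = defaultdict(list)
    for timestamp, src_ip, dst_port in flow_starts:
        by_key[(src_ip, dst_port)].append(timestamp)
    delays = {}
    for key, stamps in by_key.items():
        stamps.sort()
        delays[key] = [b - a for a, b in zip(stamps, stamps[1:])]
    return delays


def delay_peaks(delays, bin_width=1.0, min_count=5):
    """Return the lower edges of histogram bins that form a local maximum
    and contain at least `min_count` delays (both thresholds are
    assumptions chosen for illustration)."""
    hist = Counter(int(d // bin_width) for d in delays)
    peaks = []
    for b, count in sorted(hist.items()):
        higher_than_left = count > hist.get(b - 1, 0)
        higher_than_right = count > hist.get(b + 1, 0)
        if count >= min_count and higher_than_left and higher_than_right:
            peaks.append(b * bin_width)
    return peaks


if __name__ == "__main__":
    # Synthetic example: one host contacting port 6667 every 60 seconds.
    starts = [(60.0 * i, "10.0.0.5", 6667) for i in range(11)]
    for key, values in initiation_delays(starts).items():
        print(key, delay_peaks(values, bin_width=5.0))
```

As the text notes, user-initiated flows sharing the same address and port would blur the histogram, so a peak should be read as an indicator of an automated process rather than as proof.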

