Military Communications and Information Technology: A Trusted ...


22.01.2015 Views

In this paper, we identify cornerstones of the protocol design for future botnets. Besides using peer-to-peer mechanisms to avoid a single point of failure, they will employ cryptographic methods that are also used in many legitimate applications. In particular, their command and control channel will use strong encryption and integrity checks to prevent messages from being read or altered in transit, and authentication for commands and updates. As a side effect, messages will no longer be available to network intrusion detection systems that rely on deep packet inspection, i.e. that analyse packet payloads to detect the presence of malicious applications. Since this is the main mode of operation for most deployed network intrusion detection systems, we also analyse which properties cannot be obscured by these methods and explore how they can be used to achieve botnet detection in the future.

The rest of this paper is organised as follows. In section II, we provide the definitions of netflows and botnets as a basis for the following elaboration. Section III briefly discusses related approaches, followed by our analysis of future botnet designs in section IV. Section V provides the background for the detection of botnets by measuring the features described in section VI. We then briefly summarise the host models required for our projected approach and provide measurement results for the named features. In sections IX and X, we provide an outlook on future work and summarise the conclusions derived in this paper.

II. Background

A. Netflows

Typically, network protocols are developed following the OSI layer model, encapsulating higher-level protocols in the payload section of the next lower layer’s protocol.
In inter-networking, OSI layers 3 and 4 are of particular concern: the former is responsible for transferring data between hosts in different networks, and the latter provides services such as error correction or packet reordering to applications on those hosts. Nowadays, the only widespread implementations of layer 3 are IP versions 4 and 6 (IPv4 and IPv6, respectively), and layer 4 is dominated by TCP and UDP. Applications using the latter protocols are identified by a 16-bit integer (the port), i.e. a tuple (IP address, type, port) identifies an endpoint at which a particular application instance on a particular host may send or receive data. Given two applications A and B communicating through a network, the conversation can be identified by a combined tuple (IP_A, port_A, type, port_B, IP_B). Such a conversation is called a “netflow”, or a flow for short.

B. Botnets

For the purpose of this paper, we define a botnet as malware with access to a command and control (C2) channel that allows a group or an individual to issue commands to an infected system. While such a channel could use a different medium in theory, we further narrow this definition down to botnets whose C2 channel is implemented using the Internet or a similar wide area network. This is the case for all botnets deployed for commercial purposes and, while apparently designed to bridge an air gap, even the Stuxnet malware provided an Internet-based C2 channel [2]. We will use the term bot herder when referring to the group or individual controlling a botnet, without any further implications on how or why the herder acquired control over the botnet.

III. Related literature

Detecting botnets can be considered a special case of network-based intrusion detection. The most prominent examples in that field are Bro, first presented in [3], and Snort [4]. While allowing different levels of complexity for defining signatures, both focus on discovering known malicious packet payloads described by the user. To some extent, an administrator with deep insight into the environment and applications she or he supervises may define signatures that describe abusive behaviour, but generally this technique can only be used when the payload generated by a particular piece of malicious software is known to the user. The system introduced in [5] alleviates this requirement by generating signatures from malware communication patterns learned from repeatedly executing a sample in a secure environment. However, before a signature can be generated, an infection has to be detected and a sample of the malware obtained.

Gu et al. follow a different approach [6], collecting data for each system in two domains, one for netflow data and another one for malicious activities.
They then cluster data in each of these domains individually and treat co-occurrences of hosts in activity and netflow clusters as an indicator that those hosts are part of a botnet. While this eliminates the requirement for obtaining a malware sample, obtaining data on malicious activity requires the ability to detect such activity. That is, while their approach shifts the focus, it will still work only when the attacks a botnet carries out have been analysed and described appropriately beforehand.

The authors of [7] introduce an approach which measures several features for each observed flow. Based on their assumption that these features are normally distributed, they are able to assign an anomaly score to a measurement and visualise the expectation and the actual measurement for a system. In contrast to the approaches described above, this does not require any knowledge of the malware that should be detected, but it requires both that the distribution of an observed feature is Gaussian and that the feature is affected by the malware’s traffic. Thus, feature selection is a critical element, as underlined by the authors’ statement that for features whose distribution is not fit well by a Gaussian curve, the accuracy of their approach was not satisfactory.
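
To make the idea of scoring per-flow features concrete, the following minimal Python sketch groups packets into flows using the tuple from section II.A and scores one feature against a baseline. It is an illustration under stated assumptions, not the method of [7]: the chosen feature (bytes per flow), the use of an absolute z-score as the anomaly score, the direction-independent ordering of endpoints, and all names and sample values (`flow_key`, `bytes_per_flow`, `anomaly_score`, the addresses and the baseline) are hypothetical.

```python
from collections import defaultdict
from statistics import mean, stdev


def flow_key(src_ip, src_port, proto, dst_port, dst_ip):
    """Build the tuple (IP_A, port_A, type, port_B, IP_B).

    Assumption: endpoints are sorted so that both directions of a
    conversation map to the same key.
    """
    a, b = sorted([(src_ip, src_port), (dst_ip, dst_port)])
    return (a[0], a[1], proto, b[1], b[0])


def bytes_per_flow(packets):
    """Sum packet sizes per flow.

    Each packet is (src_ip, src_port, proto, dst_port, dst_ip, size).
    """
    totals = defaultdict(int)
    for src_ip, src_port, proto, dst_port, dst_ip, size in packets:
        totals[flow_key(src_ip, src_port, proto, dst_port, dst_ip)] += size
    return dict(totals)


def anomaly_score(value, baseline):
    """Absolute z-score of a measurement against baseline measurements.

    Only meaningful if the feature is roughly Gaussian, which is the
    assumption discussed in the text above.
    """
    mu = mean(baseline)
    sigma = stdev(baseline)
    if sigma == 0:
        return 0.0 if value == mu else float("inf")
    return abs(value - mu) / sigma


# Hypothetical sample data: two directions of one TCP flow and one UDP flow.
packets = [
    ("10.0.0.5", 40000, "tcp", 443, "192.0.2.10", 1200),
    ("192.0.2.10", 443, "tcp", 40000, "10.0.0.5", 800),
    ("10.0.0.5", 40001, "udp", 53, "192.0.2.53", 90),
]
baseline = [1900, 2100, 2000, 1950, 2050]  # earlier bytes-per-flow values

flows = bytes_per_flow(packets)
for key, total in flows.items():
    print(key, total, round(anomaly_score(total, baseline), 2))
```

A flow whose byte count lies far from the baseline mean receives a high score. As noted above, such a score is only informative when the feature is both approximately Gaussian and actually influenced by the malware’s traffic.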

