Military Communications and Information Technology: A Trusted ...


22.01.2015

the measurements for our features concerning legitimate servers. To distinguish between servers and clients, we can also exploit their sinkhole structure in terms of network flow direction.

C. Infected

The malicious behaviour of an infected host may only be observable mixed with benign user actions. However, we assume that the malware activities have a significant impact on the features described above. As the malicious activities are machine-controlled, they can be assumed to occur in temporally regular patterns [20]. This implies needle-like peaks, i.e. narrow bandwidths, in the frequency domain. The same holds for the flow volume, because the number of actions a malware can perform is very limited [19].

VIII. First results

In the preceding sections, we described the foundations for a system that distinguishes between regular client or server systems and systems that have been infected with a botnet client. Our prototype implementation does not yet cover all of these concepts, but it provides a basis for estimating whether our assumptions were correct. The Waikato traces briefly described in section VIII.A provide our baseline for network traffic predominantly generated by benign systems. In section VIII.B, we describe our method for obtaining a network trace from a live botnet client by executing it in a secure environment. Finally, we discuss the feature distributions observed for these traces in section VIII.C.

A. Waikato

We use network traces provided by the Waikato Internet Traffic Storage (WITS) of the University of Waikato, New Zealand, as a baseline for “normal” traffic. The traces were captured at the university’s Internet exchange between June and September 2007. They do not include layer-4 payloads, and IP addresses have been anonymised by XOR’ing them with a key that was changed once a week.
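Why a weekly key change rules out mixing weeks can be illustrated with a short Python sketch. It assumes a plain 32-bit XOR over the whole IPv4 address. The helper `anonymise` and the two keys are hypothetical, and the actual WITS procedure (see [21]) may differ in detail. Within one week, every host keeps a single pseudonym. Once the key changes, the host that receives a given pseudonym is in general a different machine.

```python
import ipaddress


def anonymise(addr: str, key: int) -> str:
    """Hypothetical helper: XOR an IPv4 address with a 32-bit key.

    XOR is its own inverse, so applying the same key twice restores the input.
    """
    return str(ipaddress.IPv4Address(int(ipaddress.IPv4Address(addr)) ^ key))


# Hypothetical weekly keys, for illustration only.
KEY_WEEK_1 = 0x1F2E3D4C
KEY_WEEK_2 = 0x0A0B0C0D

# Within one week the mapping is consistent: a host always gets the same
# pseudonym, so per-host features can be accumulated over several days.
host_x = "10.0.0.1"
pseudonym = anonymise(host_x, KEY_WEEK_1)
assert anonymise(host_x, KEY_WEEK_1) == pseudonym

# In the next week the key differs. The host that is mapped to the *same*
# pseudonym is x ^ K1 ^ K2, i.e. a different machine, possibly in another role.
host_y = anonymise(pseudonym, KEY_WEEK_2)
assert host_y != host_x
assert anonymise(host_y, KEY_WEEK_2) == pseudonym

print(f"week 1: {host_x} -> {pseudonym}")
print(f"week 2: {host_y} -> {anonymise(host_y, KEY_WEEK_2)}")
```

Both printed lines end in the same anonymised address even though the underlying hosts differ. Merging per-address measurements across weeks would therefore silently combine unrelated systems.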
Please refer to the WITS website [21] for additional details on the trace files.

Since considering the whole dataset would be unlikely to provide any insight beyond that provided by a well-chosen sample, we selected two trace files from consecutive weekdays, the 4th and 5th of September 2007, for this evaluation. We could not use traces from different weeks. Because the anonymisation key changes weekly, the same anonymised address may refer to systems with different roles in different weeks, which could distort the measurements.

Also, the anonymisation prevents the use of conventional methods for detecting malicious traffic. Thus, we cannot ensure that the traffic observed in these traces does not include any traffic generated by malware. To select a subset that should at least be clearly dominated by user interaction, we only consider flows initiated towards a server listening on TCP port 80 (HTTP). While some of these flows may have been initiated by malware, we expect the vast majority to correspond to legitimate use of the HTTP protocol.

When analysing this subset, we noticed that a single IP address (anonymised to 249.5.77.77) contributed almost 57% of the measurements. Our best guess is that this address refers to a proxy server, i.e. a system relaying HTTP requests for an unknown number of end users. Although these measurements neatly fit our expectations for the features described earlier, we excluded them from the results presented in section VIII.C. This avoids the skew such a system would introduce into the distributions, particularly for the interflow-initiation feature.

B. Miner

Evaluating our feature set on actual malware turned out to be difficult. To achieve an acceptable level of significance, we would have to run a malware sample for a prolonged period. Doing so while giving the malware full access to the Internet would be unethical and could result in liability for damages. We thus use a setup in which the malware runs in a virtual machine without Internet access. To be able to observe C2 traffic, we had to provide the malware with one or more peers to interact with. This required reverse-engineering the malware’s C2 protocol, a labour-intensive and thus time-consuming task in its own right. Hence, we rely on our colleagues’ implementation of the reverse-engineered Miner botnet C2 protocol. In our setup, this implementation runs in one virtual machine, providing the interfaces described in [19] to a second virtual machine infected with the Miner botnet client.
The Miner client uses a list of bootstrapping IP addresses and an additional list of peers for its peer-to-peer component. For our setup, we ensured that every address in the first list was reachable and provided data to the botnet client, including a modified version of the second list. We generated the latter such that each entry was drawn from a pool of reachable addresses with probability 1/3 and from a pool of unreachable addresses otherwise. This gives a considerably higher share of reachable peers than the 23% of responding hosts observed in Miner peer lists in the wild. Since the Miner client scans the peer list linearly, we randomised the order of addresses in the list to avoid any bias.

The results we present below were obtained by capturing traffic on the virtual network link between the two virtual machines for 24 hours. We started with an uninfected system and triggered the infection right after starting to listen on the virtual link. Apart from the infection, no user interaction occurred. With just a single malware to verify our observations against, we cannot yet draw conclusions regarding the generality of our approach. However, it allows us to

