
However, this argument overlooks what is practical. Firstly, it is unlikely that all of the participating root server operators could contribute one week or one month of pcap data to OARC; some would not. The data for a longer gathering interval would therefore probably come from a smaller number of RSOs, raising further concerns about the “completeness” of the data set. Even if all of the RSOs were able to supply data over a longer interval, it could take weeks or even months to get those pcaps to DNS-OARC. There would also be obvious capacity problems for DNS-OARC in storing hundreds of terabytes, perhaps petabytes, of what would then become a Month in the Life of the Internet exercise. Finally, it would take many CPU-months to process so much data, which would require a very substantial investment in tooling and hardware.
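To put the scale in concrete terms, a rough back-of-envelope calculation is illustrative. The figures below (aggregate query rate, captured packet size, per-core parsing rate) are assumptions chosen for illustration, not measurements from the study:

```python
# Back-of-envelope estimate of a month-long root server capture.
# All constants are illustrative assumptions, not study data.

AGGREGATE_QPS = 400_000        # assumed combined query rate across all root servers
BYTES_PER_PACKET = 150         # assumed average captured packet size, pcap framing included
SECONDS_PER_MONTH = 30 * 24 * 3600

total_bytes = AGGREGATE_QPS * BYTES_PER_PACKET * SECONDS_PER_MONTH
print(f"storage: ~{total_bytes / 1e12:.0f} TB of raw pcap")           # ~156 TB

PARSE_RATE = 100_000           # assumed packets parsed per second per CPU core
total_packets = AGGREGATE_QPS * SECONDS_PER_MONTH
cpu_months = total_packets / PARSE_RATE / SECONDS_PER_MONTH
print(f"processing: ~{cpu_months:.0f} CPU-months per analysis pass")  # ~4 CPU-months
```

Even under these assumptions, a single pass over a month of captures is a multi-CPU-month job, and a study of this kind typically needs several passes.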

There was also a short discussion about the data-gathering interval between members of the study team and the participating root server operators. There was a consensus that the 2–3 day DITL interval was a reasonable compromise: given a broadly representative set of participating RSOs, it gave every edge device or resolving server a fair opportunity to appear in the root server traffic. It was also noted that a longer data-gathering interval could skew the results because traffic patterns could be counted twice (or more) as devices moved around the Internet: for example, a smartphone that issues the same set of queries each time it connects and disconnects or changes networks.
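This double-counting effect can be made concrete with a minimal sketch using entirely hypothetical observations: a single roaming smartphone that reissues the same query from each network it attaches to appears under a new source prefix each day, so a longer window inflates the apparent spread of its query string:

```python
# Minimal sketch of the double-counting effect over a longer capture window.
# The observations are hypothetical: one phone, one query, three networks.
from collections import defaultdict

# (day, source /24 prefix, queried string) -- the same device roams daily.
observations = [
    (1, "192.0.2.0/24",    "corp"),   # home broadband
    (2, "198.51.100.0/24", "corp"),   # hotel Wi-Fi, same phone
    (3, "203.0.113.0/24",  "corp"),   # mobile carrier, same phone
]

def distinct_prefixes(obs, max_day):
    """Count distinct source prefixes per queried string within the window."""
    seen = defaultdict(set)
    for day, prefix, qname in obs:
        if day <= max_day:
            seen[qname].add(prefix)
    return {q: len(p) for q, p in seen.items()}

print(distinct_prefixes(observations, max_day=2))  # {'corp': 2}
print(distinct_prefixes(observations, max_day=3))  # {'corp': 3} -- one device, three "sources"
```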

4.3.6 Geographical limitation

Another reasonable criticism would be that there has been no geolocation analysis of the observed traffic. The objective of gathering source address prefix information for each new gTLD was to assess how widely spread the sources were, not their actual geographic location. The goal was to find out if traffic for .whatever was localized or spread across the Internet. To that extent, the specific physical locations from which the traffic originated did not matter much. If traffic for a new gTLD was found to be coming mostly from a small number of prefixes, that would have been worthy of deeper analysis. However, very few of the proposed TLD strings found in the root server traffic match that criterion, and none of the most heavily used strings does.
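The spread measurement described above can be sketched as follows, assuming the pcaps have already been reduced to (source address, queried string) pairs; the /24 aggregation, the input layout, and the threshold are illustrative choices, not the study's actual tooling:

```python
# Hedged sketch: count distinct source /24 prefixes per proposed TLD string.
# IPv4 only for brevity; input format and threshold are assumptions.
import ipaddress
from collections import defaultdict

def prefix_spread(records, threshold=10):
    """records: iterable of (source_ip, tld) pairs extracted from root traffic."""
    prefixes = defaultdict(set)
    for src, tld in records:
        # strict=False lets us aggregate a host address into its /24.
        prefixes[tld].add(ipaddress.ip_network(f"{src}/24", strict=False))
    spread = {tld: len(p) for tld, p in prefixes.items()}
    # Strings seen from fewer than `threshold` prefixes would merit deeper analysis.
    localized = [tld for tld, n in spread.items() if n < threshold]
    return spread, localized

records = [("192.0.2.1", "home"), ("198.51.100.7", "home"), ("192.0.2.99", "corp")]
spread, localized = prefix_spread(records)
print(spread)     # {'home': 2, 'corp': 1}
print(localized)  # ['home', 'corp'] -- both fall below the illustrative threshold
```

A string whose queries came mostly from one or two prefixes would look localized; the strings actually observed, and especially the most heavily queried ones, did not fit that pattern.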

In addition, the impact of resolving servers and their caches makes it very difficult to draw meaningful conclusions about where traffic actually originates. For instance, a device on a corporate network at a site in Asia might be making DNS lookups via the company firewall in Australia. Similarly, a device in Africa might be using a global resolver service whose node in Europe queries a root server instance in North America. This might be a topic for further study.
