28.06.2013 Views

Papers in PDF format


The major disadvantage of these schemes is that they need to be explicitly configured by the users. The Netscape plug-in needs to be downloaded and installed; the Mosaic CCI application needs to be downloaded, installed, and then Mosaic needs to be configured and the helper started at the start of each session. It is likely that due to this inconvenience few users will download and install the tracking software, unless some form of incentive is given (free access to a subscription service, etc.). Furthermore, there are then the problems shared by any public software developer: supporting multiple architectures, user support, fixing bugs and notifying users of updates.

Commentary

The loss of real-time server-side tracking accuracy due to the history mechanisms described in Section 2.1 might suggest that a browser-oriented tracking mechanism would be more desirable. However, whilst browser-side tracking is certainly more accurate, it suffers from scalability problems, and potentially low uptake due to the explicit configuration required. It may be possible to increase the accuracy of server-side tracking with certain browsers through several techniques. One such technique is the use of an anchor to an unavailable background image, embedded in the page. The history mechanisms of Netscape and Mosaic cause all unloaded images to be re-requested whenever the page is revisited, so careful analysis of requests for this unavailable image, including the Referer header, will indicate when a particular page has been revisited through the history mechanism.
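The analysis of requests for the unavailable image can be sketched as follows. This is only an illustration under stated assumptions: the image path `TRACKER_PATH`, the function name and the `(client, path, referer)` log format are hypothetical, and the sketch treats every repeat request for the image with the same Referer as a history revisit, although an explicit reload would look the same to the server.

```python
from collections import defaultdict

# Hypothetical path of the deliberately unavailable image embedded in each page.
TRACKER_PATH = "/track/missing.gif"


def revisits_from_log(entries):
    """Count probable history revisits per (client, page).

    `entries` is an iterable of (client, path, referer) tuples in time order.
    The first request for the tracker image with a given Referer is taken to
    be the initial page load; each later one is counted as a revisit.
    """
    seen = set()
    revisits = defaultdict(int)
    for client, path, referer in entries:
        # Only requests for the tracker image that carry a Referer are useful.
        if path != TRACKER_PATH or not referer:
            continue
        key = (client, referer)
        if key in seen:
            revisits[key] += 1
        else:
            seen.add(key)
    return dict(revisits)
```

Keying on the client as well as the Referer keeps one user's first visit from being mistaken for another user's revisit.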

A hybrid approach which combines server-side and browser-side tracking would be for the server to include a reference to a small Java tracking applet with each page. The applet would have the sole responsibility of contacting the server each time the user departs a page or navigates back to a page through the history mechanism. The applet would be stored in the browser's cache, and so wouldn't be loaded across the network for each page. The Java applets may even be able to track the browsing activity levels on the workstation to determine if a user is actively viewing a page. Doing so, however, might be regarded as a breach of privacy.

Both of these approaches could be handled within the WWW server which issues the resources, or they could be served by `tracking servers', in a similar way to the current proliferation of `page counting servers'. Tracking servers could handle page tracking for a number of different sites. They could act standalone for their own benefit only, or make the tracking information available to applications such as HyperVisVR through TCP, UDP or multicast communication.
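As a rough illustration of how a tracking server might pass events to a visualization client over UDP, consider the sketch below. The JSON message layout, the field names and the function names are assumptions made for this example; the paper does not specify a wire format.

```python
import json
import socket


def encode_event(client, page, action):
    """Serialise one tracking event as a UTF-8 JSON datagram (assumed format)."""
    return json.dumps({"client": client, "page": page, "action": action}).encode("utf-8")


def decode_event(datagram):
    """Inverse of encode_event, as a receiving application might use it."""
    return json.loads(datagram.decode("utf-8"))


def publish_event(sock, address, client, page, action):
    """Send one event to `address`, a (host, port) pair, over a UDP socket."""
    sock.sendto(encode_event(client, page, action), address)


def make_udp_socket():
    """Create a plain IPv4 UDP socket for publishing events."""
    return socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
```

UDP delivery is unreliable, so a consumer built this way would have to tolerate lost or reordered events; TCP would avoid that at the cost of a connection per consumer.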

Cache/Firewall problems

The need to accurately track the movement of browsers clashes horribly with the application of proxy cache servers and firewalls to drastically reduce the amount of network bandwidth consumed by redundant requests. The crux of the problem lies in the load-based algorithm proxy caches use to determine how long a particular object should be cached for. This results in the original WWW server `seeing' extremely few requests from proxy servers for its most popular resources (as the caches store these popular objects), and a disproportionately high volume of requests for its least popular resources. As the popularity of proxy caches increases, this could completely invalidate the use of visualizations such as HyperVisVR, which rely heavily on usage and popularity information. Possible solutions to the proxy cache problem can be broadly categorised into `ignoring the cache', `beating the cache', and `working with the cache'. We now discuss each of these in turn.
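The distortion can be illustrated with a toy simulation. The sketch below does not reproduce any real proxy's load-based algorithm: it substitutes a small least-recently-used cache, which has the same qualitative effect of retaining frequently requested objects. The function name, URLs and cache capacity are all hypothetical.

```python
from collections import Counter, OrderedDict


def origin_hits(requests, capacity):
    """Count the requests that reach the origin server through an LRU proxy cache."""
    cache = OrderedDict()
    seen_by_origin = Counter()
    for url in requests:
        if url in cache:
            # Cache hit: the origin server never sees this request.
            cache.move_to_end(url)
            continue
        # Cache miss: the proxy fetches from the origin and stores the object.
        seen_by_origin[url] += 1
        cache[url] = True
        if len(cache) > capacity:
            cache.popitem(last=False)  # evict the least recently used object
    return seen_by_origin


requests = ["/popular", "/rare1", "/popular", "/rare2",
            "/popular", "/rare1", "/popular", "/rare2"]
true_counts = Counter(requests)
observed = origin_hits(requests, capacity=2)
```

In this request stream the popular page is requested four times but reaches the origin only once, while each rare page is requested twice and reaches the origin both times, so the server's log ranks the pages in the wrong order.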

It is possible to completely ignore the WWW population which accesses the server from behind a proxy cache by disregarding all requests which contain a `Via' header or the word `via' in the User-Agent header from the cache. Ignoring these misleading requests would seem like a good, straightforward approach to the problem. However, it is not possible to assume that the `direct access population' will be a representative sample of the entire user population. Many WWW users access through a cache because of organisation rules or country-based bandwidth problems, so eliminating these users could unwittingly exclude whole classes of users from the tracking statistics.
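A minimal sketch of the `ignoring the cache' filter is given below. It assumes each request is available as a dictionary with a `headers` mapping, which is a hypothetical representation chosen for the example; the function names are likewise illustrative.

```python
import re

# Matches the standalone word "via", case-insensitively.
VIA_WORD = re.compile(r"\bvia\b", re.IGNORECASE)


def came_through_proxy(headers):
    """Return True if the headers suggest the request was relayed by a proxy cache."""
    # HTTP header names are case-insensitive, so normalise them first.
    normalised = {name.lower(): value for name, value in headers.items()}
    if "via" in normalised:
        return True
    return bool(VIA_WORD.search(normalised.get("user-agent", "")))


def direct_requests(requests):
    """Keep only requests that show no sign of a proxy cache."""
    return [r for r in requests if not came_through_proxy(r["headers"])]
```

Matching `via' as a whole word avoids discarding a browser whose name merely contains those letters, but a proxy that adds neither marker will still be counted as a direct client.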

Many content providers have resorted to `beating the cache' when attempting to obtain full access statistics and tracking information. HTTP/1.0 specifies a `Pragma: no-cache' header, which is an instruction to the cache
