Big Data for Security and Resilience
Challenges and Opportunities for the Next Generation of Policy-Makers
Proceedings of the Conference ‘Big Data for Security and Resilience: Challenges
and Opportunities for the Next Generation of Policy-Makers’, March 2014
Edited by Jennifer Cole
STFC/RUSI Conference Series No. 4
Conference Report, October 2014
www.stfc.ac.uk
www.rusi.org
A joint publication of RUSI and the STFC, 2014.
Royal United Services Institute for Defence and Security Studies
Whitehall
London
SW1A 2ET
UK
Science and Technology Facilities Council
Polaris House
North Star Avenue
Swindon
SN2 1SZ
Editor: Jennifer Cole
Sub-editor: Susannah Wright
Individual authors retain copyright of their contributions to this publication.
This report may be copied and electronically transmitted freely. It may not
be reproduced in a different form without prior permission of RUSI and the
STFC.
Contents
Foreword
Bryan Edwards
Introduction: Machine Learning for Big Data
Alex Gammerman and Jennifer Cole
I. The National Archives, Big Data and Security: Why Dusty Documents Really Matter
Tim Gollins
II. Trends in Big Data: Key Challenges for Skills
Harvey Lewis
III. Big Data and Financial Transactions: Providing New Means of Analysis
Gregory Mandoli
IV. Characteristics of Terrorist Finance Networks: The Human Element
Neil Bennett
V. Terrorism and Political Risk Modelling
Mark Lynch
VI. Intelligent Use of Electronic Data to Enhance Public Health Surveillance
Edward Velasco
VII. The Raxibacumab Experience: The First Novel Product Approved Under the US
Food and Drug Administration ‘Animal Rule’
Chia-Wei Tsai
Discussion Groups
Rapporteurs: Philippa Morrell, Chris Sheehan, Ed Hawker
Discussion Group 1: The Ethics and Legality of Big Data Sharing
Chair and Rapporteur: Edward Hawker
Discussion Group 2: Policing, Terrorism, Crime and Fraud
Chair: David Smart; Rapporteur: Philippa Morrell
Discussion Group 3: Health Data, Public Health and Public Health Emergencies
Chair: Chris Watkins
Discussion Group 4: Individual Privacy Versus Community Safety
Chair and Rapporteur: Jennifer Cole
Research Themes Identified in the Presentations and Discussion Groups
An additional three presentations were given at the conference by
Professor John Parkinson of the Medicines and Healthcare products
Regulatory Agency (MHRA), Michael Connaughton of Oracle, and Dr
Catriona McLeish of the University of Sussex. For a variety of reasons,
no written papers were produced for these presentations, but we would
still like to acknowledge their contribution to the event. The PowerPoint
presentations given by Michael Connaughton and Professor Parkinson, as
well as those delivered by the speakers who have contributed a written
paper, can be accessed on the RUSI website events page here:
http://goo.gl/9cXC3g.
Foreword<br />
Bryan Edwards<br />
Of all the challenges facing the UK today, few are as demanding as those
affecting its national security. Some threats to the UK and its citizens are
modern variants of those that the country has faced for many years. Others
are entirely new and different to anything that has preceded them; while
some, no doubt, have yet to be recognised or understood.
One feature of this large, complex and constantly evolving array of challenges
is that few, if any, lend themselves to single-discipline solutions.
With this in mind, the Science and Technology Facilities Council (STFC)
operates a Defence, Security and Resilience Futures Programme. Challenge-led
and agnostic with respect to academic discipline, the STFC’s aim is to
identify and facilitate opportunities to engage relevant capabilities within
the UK National Laboratories and university research groups in relation to
some of the highest-priority and most demanding challenges in national
security.
As part of this programme, the STFC is delighted to fund and proud to
collaborate closely with RUSI in delivering a series of conferences on topical
issues within this domain.
Each meeting is designed to explore the interface between academic
research and government policy and operations, in order to stimulate debate
on how a step change, rather than incremental change, in the protection
of the UK could be achieved. The meetings are strategic in character, with
contributions from an atypically broad community drawn from universities,
industry, government and its agencies and partners.
At the forefront of the organisers’ minds is a deceptively simple question:
what can academic research offer, now and in the future, to allow
government to further enhance its capabilities in key areas, enabling it either
to do significantly different things or to do what it does now in significantly
different and better ways?
In this context, Big Data is often identified as being of particular importance.
Certainly, there is little doubt that raw data are being generated at what
appears to be an accelerating rate. This is a trend that seems set to continue
for the foreseeable future. Not only that, but complementary improvements
in data storage technologies and telecommunications infrastructure mean
that more of these data can be archived (potentially indefinitely) and
accessed on a global basis. And yet volume alone is insufficient to fully
appreciate either the nature of the challenge or the opportunities that
exist. Indeed, if Big Data were defined by volume alone,
there would be few grounds for claiming a revolution. For example, during
the 1990s, the strategy of the UK’s Department for Social Security sought
to migrate benefits, such as unemployment benefits and pensions, from
traditional paper-based systems to IT systems. The data volumes associated
with this enterprise were large, even by today’s standards. It is therefore
necessary to look instead at other characteristics of the data to identify
what is qualitatively different, and to establish the source of the challenges
and opportunities we are now presented with. These include features such
as the diversity of the data, in terms of type and reliability. These in turn
create new challenges for the development of the automated data analysis
and interpretation systems required. This raises questions not only over how
one could, in principle, approach the analysis of such data, but equally how
systems based on these new principles could themselves be tested, verified
and validated.
While these technical challenges are significant, there are additional
complexities associated with data residing in different organisations, and
with a population that is becoming increasingly aware of and sensitive to the
possibility that data, whose ownership they question, may be exploited in
ways they consider inappropriate.
In this meeting we look at some of the technical challenges that Big Data
presents, and consider a range of possible uses of and perspectives on
data to tease out new issues. In the course of a one-day event, the scope
for exploring them in detail is extremely limited. However, it is hoped that
identifying relevant questions to be explored elsewhere is, in itself, a useful
contribution to the debate.
I would very much like to acknowledge the generous assistance and support
offered by the US Department of Homeland Security, which contributed to
making the day a success. Similarly, thanks must go to the staff at the STFC
and RUSI, whose extremely hard work made this event possible. However,
the final word of appreciation and gratitude is reserved for all those who
participated so enthusiastically on the day itself, whether as speakers or as
delegates.
Anyone wishing to know more about the STFC’s Defence, Security and
Resilience Futures Programme in general, or about these conferences in
particular, is invited to contact me using the e-mail address below.
Professor Bryan Edwards
Science and Technology Facilities Council
bryan.edwards@stfc.ac.uk
Introduction: Machine Learning for Big Data
Jennifer Cole and Alex Gammerman
This paper discusses the impact of the current high level of interest in Big
Data from academia and industry, and comments on how this is influencing
the approach taken to funding research and developing skills in particular
areas of computer science. It also discusses the relationship between Big
Data and machine learning – systems that have the ability to learn from
data, rather than only following explicitly programmed instructions – and
the influence Big Data has on machine learning.
For Big Data (or, for that matter, Small Data) to have any value, machine learning
needs to be applied in order to extract useful information from the data. The
current approach to Big Data arguably places too much focus on the data as an
end in themselves at the expense of properly considering the techniques and
approaches that will enable the best use to be made of them. For example, in
2012 the International Data Corporation estimated that while the global data
supply had reached about 2.8 zettabytes (1 zettabyte equalling 10^21 bytes),
only an estimated 0.5 per cent of all data collected is used for analysis.[1] There
is little point in Big Data per se; a problem needs to be defined and then the
amount of data needed to solve this problem can be decided.
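To give a sense of scale, the two figures quoted above can be combined in a short calculation. This is only an illustrative sketch: it assumes that the 0.5 per cent share can be applied directly to the 2.8 zettabyte total, which the source does not state, and the variable names are chosen for this example.

```python
# Figures quoted in the text (International Data Corporation, 2012).
# Assumption for illustration only: the 0.5 per cent share is applied
# directly to the 2.8 zettabyte total.
ZETTABYTE = 10**21  # bytes
EXABYTE = 10**18    # bytes

total_bytes = 2.8 * ZETTABYTE
analysed_share = 0.005  # 0.5 per cent

analysed_bytes = total_bytes * analysed_share
analysed_exabytes = analysed_bytes / EXABYTE

print(f"Data analysed: roughly {analysed_exabytes:.0f} exabytes")
print(f"Data left unanalysed: roughly {(total_bytes - analysed_bytes) / ZETTABYTE:.2f} zettabytes")
```

On these assumptions, almost all of the collected data sits unused, which is the point the paragraph makes about data treated as an end in themselves.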
As a way of extracting useful information from data (irrespective of whether
they are Big or Small Data) along with the academic disciplines and research
that have contributed (and continue to contribute) to it, machine learning has
much to offer in determining how the data are collected, analysed and used.
Buzzwords in Computer History
Big Data is a buzzword (or two), and it is not the first time in computer science
that a new concept has been hailed as the answer to everything. In 1982, the
Japanese Ministry of International Trade and Industry (MITI) began the Fifth
Generation Computer Systems (FGCS)[2] project to build a supercomputer
that would advance artificial intelligence. The British response to the
Japanese challenge was the Alvey Programme[3] in information technology. At
that time, the way forward for artificial intelligence was largely considered to
be expert systems: computer systems that could help a human in the
decision-making process by emulating the reasoning abilities of an expert. Such systems
were supposed to solve everything. Gradually, however, as it became clear that
expert systems have narrow and limited areas of application, unsubstantiated
claims died down and the boom was over.
1. John Gantz and David Reinsel, ‘The Digital Universe in 2020: Big Data, Bigger Digital
Shadows and Biggest Growth in the Far East’, International Data Corporation and
EMC, 2012, accessed 2 July 2014.
2. Ehud Shapiro, ‘The Fifth Generation Project – a Trip Report’, Communications of the
ACM (Vol. 26, No. 9, 1983), pp. 637–41, accessed 2 July 2014.
3. The Alvey Programme, accessed 30 July 2014.
The expert systems boom has much in common with the Big Data hullabaloo
being experienced today. There seems to be an assumption that everything
can be resolved by Big Data. It is somewhat naive to assume that theory is no
longer needed to solve problems, just a lot of data and an ability to calculate
a correlation between various items of data. This is nonetheless what some
proponents of Big Data say.[4] The myth persists that Big Data will provide
the answers to all our questions. Big Data will not do this, but combined with
machine learning it may help to provide some of them.
Big Data and Machine Learning
Modern machine learning exists at the intersection between statistics and
computer science.[5] Two main topics – inference (the process of reaching
a conclusion from known facts) and data analysis – have been taken
from statistics. In particular, non-parametric statistics (which makes no
assumptions about probability distributions) has developed many methods
and algorithms that are in use in machine learning. On the other hand, the
development of efficient algorithms and knowledge representation – the tractable,
intractable and non-computable functions – comes from computer science.
Basically, machine learning tries to find regularities within past (or training)
data (or examples) that allow the user to make predictions on future examples.
This is done irrespective of the amount of data – big or small.
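As a minimal illustration of finding a regularity in training data and using it to predict a future example, the sketch below fits a straight line by ordinary least squares. The data values and function names are invented for this example, and least squares merely stands in for the many learning methods the text has in mind.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical training examples: only five points, i.e. 'Small Data'.
train_x = [1.0, 2.0, 3.0, 4.0, 5.0]
train_y = [2.1, 3.9, 6.2, 7.8, 10.1]

slope, intercept = fit_line(train_x, train_y)


def predict(x):
    """Apply the regularity learned from the training data to a new example."""
    return slope * x + intercept


print(f"learned rule: y = {slope:.2f} * x + {intercept:.2f}")
print(f"prediction for x = 6: {predict(6.0):.2f}")
```

The same two steps, learning from past examples and then predicting, apply whether there are five training points or five billion.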
Researchers at Royal Holloway, University of London, have been doing this
for years: in 1998, the Computer Learning Research Centre[6] was established
there and today two prominent Royal Holloway researchers are working in
the field of statistical learning theory (SLT) with Vladimir Vapnik and Alexey
Chervonenkis, the theory’s founders.
4. Chris Anderson, ‘The End of Theory: The Data Deluge Makes the Scientific Method
Obsolete’, Wired, 16 July 2008, accessed 30 July 2014.
5. Many disciplines, such as psychology, mathematics, philosophy, linguistics and biology,
contribute to machine learning, but the main ones at present are statistics and computing.
6. Computer Learning Research Centre, Royal Holloway, University of London, accessed 30 July 2014.
Classical statistics usually deals with small scales and low dimensions of data;
conceptual and computational difficulties may begin to arise when there are
complex, sizable and high-dimensional data (roughly speaking, where the
number of attributes or features is greater than the number of examples).
Several machine learning methods are being developed to deal with these
problems, including online predictions, parallel algorithms and efficient
methods. Some of the new techniques being developed at Royal Holloway
include string kernel techniques, prediction with expert advice and online
conformal predictors (or transductive confidence machines) – new learning
techniques that make valid predictions. These techniques have been applied
in a number of areas, for example in automatic target recognition, statistical
profiling of offenders for the Home Office, material identification and
atmospheric correction for military applications, and anomaly detection to
identify suspicious behaviour of ships and other vehicles. These techniques
have also been applied to several medical fields, for example in detecting
various abdominal diseases and ovarian cancer, and finding the best
treatment for depression.
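The idea behind conformal prediction can be sketched in a few lines. The example below is a simplified inductive (split) variant, not the online predictors developed at Royal Holloway: it assumes one-dimensional data, uses distance from the class mean as the nonconformity score, and all names and values are invented for illustration. Under the usual exchangeability assumption, a prediction set built this way excludes the true label with probability at most epsilon.

```python
def class_means(examples):
    """Mean feature value per label, learned from the proper training set."""
    totals, counts = {}, {}
    for x, label in examples:
        totals[label] = totals.get(label, 0.0) + x
        counts[label] = counts.get(label, 0) + 1
    return {label: totals[label] / counts[label] for label in totals}


def nonconformity(x, label, means):
    """How strange x looks if it is assumed to carry the given label."""
    return abs(x - means[label])


def p_value(calibration_scores, new_score):
    """Share of calibration examples at least as strange as the new one."""
    at_least_as_strange = sum(1 for s in calibration_scores if s >= new_score)
    return (at_least_as_strange + 1) / (len(calibration_scores) + 1)


def prediction_set(x, means, calibration_scores, epsilon):
    """All labels that cannot be rejected at significance level epsilon."""
    return {
        label
        for label in means
        if p_value(calibration_scores, nonconformity(x, label, means)) > epsilon
    }


# Hypothetical data, split into a proper training set and a calibration set.
proper_training = [(1.0, "low"), (2.0, "low"), (8.0, "high"), (9.0, "high")]
calibration = [
    (1.2, "low"), (2.5, "low"), (0.9, "low"), (1.8, "low"), (7.8, "high"),
    (9.4, "high"), (8.6, "high"), (8.1, "high"), (9.0, "high"),
]

means = class_means(proper_training)
scores = [nonconformity(x, label, means) for x, label in calibration]

result = prediction_set(8.3, means, scores, epsilon=0.2)
print(result)
```

The output is a set of labels rather than a single guess, which is what allows the method to attach a confidence level to each prediction.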
One of the central questions in the theory of learning concerns the quantity of
data needed in order to achieve a solution with a desirable degree of accuracy. A
simple pattern recognition system to classify digits (0–9) can learn to recognise
and correctly predict a shown digit after being trained on only a few hundred
digits out of the hundreds of thousands of digits available for training.[7] That
is only a fraction of the data, but enough to solve the problem. Pattern recognition
systems often need surprisingly small amounts of data to obtain an answer.
While intuitively it seems that the more data are used, the more accurate
the prediction will be, the founders of SLT,[8] Vapnik and Chervonenkis, have
shown that it is not just the length of the training data that is important, but a
concept called ‘capacity’ or ‘VC-dimension’ (after Vapnik and Chervonenkis).
Roughly speaking, the VC-dimension is the number of parameters of a decision
rule. The important factor for the quality of learning is the ratio of the length of
the training set to the VC-dimension. A large ratio is ‘good’ from a learning
perspective: it helps to avoid ‘overfitting’, so that the results obtained on the
test set are close to those on the training set – that is, the test set should show
about the same accuracy (number of errors) as the training set.
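One commonly quoted form of the Vapnik–Chervonenkis bound (taken from the statistical learning theory literature, not from this paper) states that, with probability at least 1 − eta, the error on unseen data exceeds the training error by no more than sqrt((h(ln(2l/h) + 1) − ln(eta/4)) / l), where l is the length of the training set and h the VC-dimension. The sketch below evaluates this term for several ratios l/h; the values h = 10 and eta = 0.05 are arbitrary choices made for illustration.

```python
import math


def vc_confidence(n_train, vc_dim, eta=0.05):
    """Upper bound on the gap between test error and training error.

    n_train is the length of the training set (l), vc_dim the VC-dimension (h);
    the bound holds with probability at least 1 - eta.
    """
    h, l = vc_dim, n_train
    return math.sqrt((h * (math.log(2 * l / h) + 1) - math.log(eta / 4)) / l)


vc_dim = 10  # hypothetical capacity of the decision rule
for ratio in (2, 20, 200, 2000):
    n_train = ratio * vc_dim
    gap = vc_confidence(n_train, vc_dim)
    print(f"l/h = {ratio:>4}: test error may exceed training error by up to {gap:.3f}")
```

For small ratios the bound is larger than 1 and therefore says nothing; as the ratio grows the guaranteed gap shrinks, which is the sense in which a large ratio is ‘good’.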
7. Alex Gammerman and Volodya Vovk, ‘Hedging Predictions in Machine Learning’,
The Computer Journal (Vol. 50, No. 2, 2007), pp. 151–63.
8. Olivier Bousquet et al., ‘Introduction to Statistical Learning Theory’, Max Planck Institute
for Biological Cybernetics, 2004, accessed 2 July 2014.
If, however, machine learning algorithms must be applied to Big Data
and the analysis cannot be handled on one machine, parallel
algorithms can be developed and run on parallel machines. This requires more
efficient methods to be developed, which is currently a challenge, though
some progress is being made to resolve this. For example, in addition to
well-known methods such as induction, there are some advances in developing
transductive methods.[9] In induction, particular examples are used to formulate
a general rule and then make predictions using this rule. Transduction
instead goes from one example to another, which should be more efficient as
the model does not have to solve an infinite number of examples, just find one
particular example, which will in turn predict the next one. This could be a way
forward for developing new, efficient algorithms for prediction.
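The contrast between the two approaches can be sketched as follows. The inductive path first builds a general rule (here, class centroids) and then applies it to any new point; the transductive path goes straight from the stored examples to the one point of interest. Nearest-neighbour prediction is used here only as a simple stand-in for transduction, not as the method described by Vapnik, and the data and function names are invented for this example.

```python
from collections import Counter

# Hypothetical labelled examples in two dimensions.
train = [((1.0, 1.0), "a"), ((1.5, 2.0), "a"), ((5.0, 5.0), "b"), ((6.0, 5.5), "b")]


def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((pi - qi) ** 2 for pi, qi in zip(p, q))


def induce_rule(examples):
    """Induction: summarise the examples as a general rule (nearest centroid)."""
    sums, counts = {}, Counter()
    for point, label in examples:
        counts[label] += 1
        acc = sums.setdefault(label, [0.0] * len(point))
        for i, value in enumerate(point):
            acc[i] += value
    centroids = {
        label: tuple(v / counts[label] for v in acc) for label, acc in sums.items()
    }
    return lambda x: min(centroids, key=lambda label: sq_dist(x, centroids[label]))


def transduce(examples, x, k=1):
    """Transduction (sketch): predict for this one point directly from examples."""
    nearest = sorted(examples, key=lambda ex: sq_dist(ex[0], x))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


new_point = (5.5, 5.0)

rule = induce_rule(train)          # general rule, reusable for any point
inductive_label = rule(new_point)

transductive_label = transduce(train, new_point)  # no general rule is built

print(inductive_label, transductive_label)
```

Both routes agree on this toy data; the difference lies in whether a rule covering all possible inputs is constructed along the way.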
Conclusions
There is currently a lot of research into machine learning taking place and
new algorithms are being developed. They are both simple and rigorous, and
give a wide range of statistical learning methods. John Poppelaars[10] compared
the current belief in Big Data with a fictional computer, Deep Thought, in The
Hitchhiker’s Guide to the Galaxy, which took 7.5 million years to compute
the answer to the ultimate question of life, the universe and everything, but
because the beings who had programmed it never really knew what the question was,
nobody knew what to make of the answer. Nowadays, people hope that Big
Data will help to find the ultimate question, but if we slightly paraphrased
The Hitchhiker’s Guide to the Galaxy, we would argue that it is not Big Data
that will define the question: it is machine learning.
Jennifer Cole is a Senior Research Fellow in Resilience and Emergency
Management at the Royal United Services Institute, where her research
programme has included a number of reports and projects on the use of Big
Data and cyber-security for the UK government, including the Foreign Office
and Ministry of Defence. She is also a PhD candidate in the Computer Science
Department at Royal Holloway, University of London.
Professor Alex Gammerman studied in Leningrad (now St Petersburg) and then
worked in several research institutes of the Academy of Science of the USSR. In
1983 he moved to the UK. He was appointed to the established Chair in Computer
Science at the University of London (Royal Holloway and Bedford New College)
in 1993. Currently, he is Founding Director of the Computer Learning Research
Centre at Royal Holloway, University of London, and a Fellow of the Royal
Statistical Society. Professor Gammerman’s research interest lies in the field
of machine learning, particularly the development of inductive–transductive
confidence machines. Areas in which these techniques have been applied include
medical diagnosis, forensic science, genomics, environment and finance.
A version of this paper written by the authors can be found at
http://clrc.rhul.ac.uk/publications/techrep.htm
9. Vladimir Vapnik, The Nature of Statistical Learning Theory (New York, NY: Springer,
1995).
10. John Poppelaars, ‘Will Big Data End Operations Research?’, 2013, accessed 30 July 2014.
I. The National Archives, Big Data and Security:
Why Dusty Documents Really Matter
Tim Gollins
This paper discusses three linked propositions. First, the way in which the National
Archives, as a national institution of the United Kingdom, can be regarded as a
repository of Big Data. The paper will discuss the concept of Big Data and place it
in the historical context of archival collections that have transformed the world,
for example, the King of Assyria’s Library and the Library at Alexandria. Second,
it will consider the way in which the National Archives are central to UK security,
providing a point of reference for society, and supporting citizens’ rights and the
rule of law. It will also discuss the potential threat that emerges from a loss of
trust in the processes that underlie the transfer of records to the Archives. Third,
the paper will cover how the challenges of sensitivity reviews of digital records,
which ensure that sensitive government records are archived appropriately,[1]
could give rise to further threats to the Archives and thus the wider security of
our society. The paper goes on to show that in addressing the challenges of the
sensitivity review of digital records, by using the Big Data nature of archives,
opportunities arise to counter the wider threats to the security of our society.
The Archives and Big Data
The classic definition of Big Data rests on volume, variety and velocity,[2] and
is inherently assumed to be digital. Taking a longer view, there are a number
of points in history where such transformative conditions have existed with
collections of other media, such as:
• The 30,000 clay tablets from the oldest surviving royal library in the
world: that of Ashurbanipal, King of Assyria (around 668–630 BC),
including the story of Gilgamesh[3]
• The iconic Library of Alexandria, alleged to have collected the
knowledge of the ancient world under one roof (including 400,000–
700,000 rolls within the collection).[4]
1. National Archives, ‘Step 3: Sensitivity Reviews of Selected Records’,
accessed 25 July 2014.
2. Anton Chuvakin, ‘Broadening Big Data Definition Leads to Security Idiotics!’, Gartner
blog, 18 September 2013, accessed 18 July 2014.
3. British Museum, ‘The Library of Ashurbanipal, Research Project at the British
Museum’, accessed 19 August 2014.
4. Heather Phillips, ‘The Great Library of Alexandria’, Library Philosophy and Practice,
2010, accessed 18 July 2014.
In comparatively more recent times, as the practice and conventions
of common law developed in Britain, the need to collect the records of
cases and to access legal judgments for precedent gave rise to another
example of Big Data of its day. Drawing on information from the National
Archives Catalogue,[5] we learn that ‘The Dialogus de Scaccario’, describing
Exchequer administration in the 1170s, mentions a clerk who was deputy
to the chancellor and had responsibility for the preparation and custody
of formal Chancery enrolments. Thereafter, the chancellor’s principal clerk
was invariably associated with these duties, although progressively more
and more remote from their direct execution; by 1388, and probably long
before, a staff of subordinate clerks carried out the actual enrolments. From
the mid-thirteenth century, this officer was generally known as the ‘keeper
of the rolls’, and, as the first rank of Chancery clerks gradually came to be
known as ‘masters’, the title ‘Master of the Rolls’ had become the standard
designation by the fifteenth century. The holder of that post now chairs the
Lord Chancellor’s Advisory Council, which assures the transfer of records to
the Archives.[6]
Bringing the picture up to date, the paper holdings of the National Archives
at Kew comprise over 1 billion pages, representing 1,000 years of history.[7]
At the same time, there are now over 2.5 billion archived pages accessible
from the UK Government Web Archive (representing less than 20 years of
contemporary history)[8] that are now being aggregated and mined to answer
novel research questions that would previously have been intractable. The
Archive is, and always has been, Big Data.
The Archive <strong>and</strong> <strong>Security</strong><br />
Discussion of security should not be limited to considerations of criminality<br />
<strong>and</strong> terrorism. The security of UK society relies at its deepest level on the trust<br />
of the citizen in the state. It is all about the rule of law <strong>and</strong> the fact that no<br />
one, not even the executive, is above that rule. 9 The British state is different<br />
from many others in that the citizen expects the state to be subservient to the<br />
citizen, rather than the reverse, which is the more common case. This is the very fabric of UK society; the<br />
rule of law supports <strong>and</strong> empowers the citizen.<br />
5. National Archives Catalogue, ,<br />
accessed 19 August 2014.<br />
6. National Archives Advisory Council In<strong>for</strong>mation, , accessed 19 August 2014.<br />
7. The authors’ own estimate based on approximately 12 million entries in the National<br />
Archives’ catalogue that refer to boxes or folders of records that can reasonably<br />
be expected to hold upwards of 100 sheets of paper.<br />
8. National Archives UK Government Web Archive In<strong>for</strong>mation, , accessed 18 July 2014.<br />
9. The Rule of Law definition, LexisNexis, , accessed 18 July 2014.
The National Archives are fundamental to this aspect of security. The<br />
Archives provide the impartial witness that enables ‘holding to account’<br />
under the rule of law <strong>and</strong> in the court of history. They contain evidence<br />
of the transactions of the state <strong>and</strong> the executive <strong>and</strong> evidence of the<br />
decisions <strong>and</strong> policies enacted. This is central to Lord Bingham’s Fourth<br />
Principle: ‘Ministers <strong>and</strong> public officers at all levels must exercise the<br />
powers conferred on them in good faith, fairly, <strong>for</strong> the purpose <strong>for</strong> which<br />
the powers were conferred, without exceeding the limits of such powers<br />
<strong>and</strong> not unreasonably.’ 10 How can we know what the executive has done if<br />
the records are not kept?<br />
However, it is clearly not sufficient to consider the keeping of the record<br />
without considering how the record is selected <strong>and</strong> transferred to the<br />
Archives. The content of the Archives is clearly dependent on these processes.<br />
It follows there<strong>for</strong>e that the citizen must trust the process by which the<br />
Archives receive their material to sustain their rights.<br />
Transfer to the Archive<br />
The process by which public records are transferred to the National Archives<br />
is not widely understood, even among scholars who regularly use its content<br />
<strong>for</strong> their research. The principles of the appraisal that underlies transfer were<br />
laid down by the great archivist Hilary Jenkinson, who described many of the<br />
fundamentals of the UK system. 11 In setting out his approach, Jenkinson was<br />
trying to ensure that the UK archive (at that time the Public Record Office)<br />
was able to guard its independence under the rule of law, <strong>and</strong> could not fall<br />
foul of the criticism of complicity in wrongdoing that was evident in the case<br />
of the Nazi Archive in Germany with respect to the Holocaust. 12<br />
In summary, the transfer process consists of the following steps:<br />
• Appraisal <strong>and</strong> selection: determining which records meet the<br />
collection policy of the National Archives <strong>and</strong> then choosing which<br />
records should be transferred to the Archives or to a place of deposit<br />
• Sensitivity review: deciding which records should be open on transfer,<br />
which must be closed, <strong>and</strong> which must be retained in departments<br />
(under the ‘Lord Chancellor’s blanket’ – see below)<br />
• Preparation <strong>and</strong> delivery: the cataloguing, preparation <strong>and</strong><br />
10. IAP Annual Conference, ‘The Rule of Law in Prosecuting <strong>Big</strong> Businesses in Application<br />
to Regulatory Frameworks’, 2013, p. 2, , accessed 18 July 2014.<br />
11. Hilary Jenkinson, A Manual of Archive Administration (London: P. Lund, Humphries &<br />
Co Ltd, 1963 [1923]).<br />
12. Eric Westervelt, ‘Probe Details Culpability of Nazi-Era Diplomats’, NPR, 28 October<br />
2010, , accessed<br />
18 July 2014.
organisation of records <strong>for</strong> transfer <strong>and</strong> the actual transportation of<br />
records to the National Archives or to a place of deposit<br />
• Accessioning: the process by which the National Archives makes the<br />
records appropriately available.<br />
A Threat<br />
The principle of independence derived from <strong>and</strong> identified in the Grigg<br />
Report 13 that initiated the Public Records Act 1958 has led, over the years,<br />
to a series of checks <strong>and</strong> balances intended to ensure that the necessary<br />
records of the activities of the executive are deposited. These checks <strong>and</strong><br />
balances include: 14<br />
• The right of access to in<strong>for</strong>mation in departments under freedom of<br />
in<strong>for</strong>mation legislation be<strong>for</strong>e in<strong>for</strong>mation is transferred<br />
• Departments’ responsibility <strong>for</strong> the selection of the records, <strong>and</strong> <strong>for</strong><br />
the identification of any sensitivity in the records that would cause an<br />
exemption under freedom of in<strong>for</strong>mation legislation<br />
• The fact that the exemptions that can be applied to delay transfer are<br />
prescribed in law and their application can be challenged through the<br />
Information Commissioner and thence by appeal to the Information<br />
Tribunal<br />
• The public visibility of the selection criteria that the departments<br />
must apply – as agreed with the National Archives<br />
• The National Archives’ process of oversight during the creation of the<br />
criteria <strong>and</strong> the Archives’ process of monitoring their application<br />
• The publication of in<strong>for</strong>mation regarding transfers<br />
• The <strong>for</strong>mal oversight of the timeliness of the transfer process <strong>and</strong><br />
the application of freedom of in<strong>for</strong>mation exemptions by the Lord<br />
Chancellor’s Advisory Council on Public Records.<br />
Un<strong>for</strong>tunately, in 2012, negative publicity 15 concerning the ‘migrated archives’<br />
of the colonial administrations (papers of the British administrations which<br />
should have been passed to the Public Record Office in a timely fashion<br />
but were wrongly kept at the government’s Hanslope Park facility) <strong>and</strong><br />
subsequent questions concerning other collections of documents at the<br />
Foreign Office raised the issue of the degree of trust in this system.<br />
13. James Grigg, Report of the Committee on Departmental Records, Cmnd 9163 (London:<br />
HMSO, 1954).<br />
14. National Archives, History of the Public Records Act, ,<br />
accessed 18 July 2014.<br />
15. Ian Cobain <strong>and</strong> Richard Norton-Taylor, ‘Sins of Colonialists Lay Concealed <strong>for</strong> Decades<br />
in Secret Archive’, Guardian, 18 April 2012, , accessed 22 July 2014.
While the process of selection, sensitivity review <strong>and</strong> transfer is in principle<br />
an open one, the process is complex <strong>and</strong> there are opaque aspects (not<br />
least, the use of the Lord Chancellor’s <strong>Security</strong> <strong>and</strong> Intelligence Instrument,<br />
known colloquially as ‘the Lord Chancellor’s blanket’, which is used to protect<br />
specific aspects of national security). 16 The very nature of such a situation, in<br />
which the shape of the process is open, <strong>and</strong> yet the detail of the data passing<br />
through the process must be hidden (since to reveal that detail would render<br />
the process moot), creates a situation in which conspiracy theorists can ply<br />
their trade. 17<br />
In essence, it can look as though the establishment has something to hide, and such<br />
appearances are important. When someone of the eminence of Professor Margaret<br />
MacMillan, who is in no sense a conspiracy theorist, feels compelled<br />
to question her own definitive works on the First World War, we should<br />
take note. 18 For trust to be maintained in the Archives, it is clear that any<br />
further barriers to the timely, open <strong>and</strong> transparent transfer of records must<br />
be avoided.<br />
Sensitivity Review of Digital Records<br />
The argument set out in this paper so far applies to all public records<br />
regardless of <strong>for</strong>mat or media. There are, however, particular consequences<br />
of the transition to the use of digital records that need to be considered.<br />
Over the three decades from 1984 to 2014, administrative practices were<br />
transformed by successive waves of technology.<br />
This started with the photocopier and moved on to the personal computer<br />
(PC), the local area network, the internet, a wide range of mobile devices<br />
and, most recently, the ‘cloud’. All of these technologies created the ability<br />
and tendency to duplicate and proliferate information in ever-increasing<br />
volumes. The shift to digital records was piecemeal and began in the early 1990s, but by<br />
the middle of the first decade of this century all UK government records<br />
were digital. The impact of these technologies and the transformation of<br />
16. Notes on the Lord Chancellor’s <strong>Security</strong> <strong>and</strong> Intelligence Instrument, ,<br />
accessed 18 July 2014.<br />
17. National Archives, ‘20 Year Rule: Record Transfer Report’, , accessed 30 September 2014.<br />
18. Quoted in the Guardian: ‘I am one of many historians who has benefited from using<br />
the British archives <strong>and</strong> who had confidence that the documents had not been<br />
weeded to suit particular interests. Now I am wondering whether I will have to go<br />
back <strong>and</strong> rethink my work on such matters as the outbreak of the First World War or<br />
the peace conference at the end. But when are we going to get the complete records?<br />
So far the pace of transferring them is stately, to put it politely.’ Ian Cobain, ‘Academics<br />
Consider Legal Action to Force Foreign Office to Release Public Records’, Guardian,<br />
13 January 2014, ,<br />
accessed 19 August 2014.
administrative practice on the records of the public sector has not been<br />
examined in detail; however, a detailed examination of the format and nature<br />
of the evidence presented to the Hutton Inquiry 19 is not encouraging. 20 In the<br />
evidence, the paper trail <strong>for</strong> a decision was no longer in a single Manila file;<br />
instead, the record was found in a blizzard of e-mails sent from person to<br />
person <strong>and</strong> stored on multiple computing systems. It would appear that the<br />
previously clear <strong>and</strong> unambiguous rules <strong>for</strong> the creation <strong>and</strong> management of<br />
in<strong>for</strong>mation in the public services have been challenged.<br />
In July 2012, the government announced the transition towards releasing<br />
records when they are twenty years old, instead of thirty 21 (as has been the<br />
case since the amendment to the Public Records Act in 1967). 22 From 2013,<br />
two years’ worth of government records will be transferred to the National<br />
Archives each year over a ten-year transition period, until a new ‘20-year rule’ is in<br />
place in 2023. The records covered by this transition are those from 1983 to<br />
2003, 23 coinciding with the time during which the most extreme aspects of<br />
the technical changes mentioned above took place.<br />
When examining the process of transfer described above, <strong>and</strong> considering<br />
the impact of the change to digital records, it is clear that all of the steps in<br />
the process need to be examined. Appraisal <strong>and</strong> selection, preparation <strong>and</strong><br />
delivery, <strong>and</strong> accessioning will all present challenges to departments <strong>and</strong><br />
the Archives but there are a number of mitigations, including the doctrine<br />
of macro appraisal <strong>and</strong> the recent developments in digital preservation at<br />
the National Archives. 24 It is the process of sensitivity review that generates<br />
the most significant challenges <strong>and</strong> where considerable work is needed to<br />
identify mitigations.<br />
Additional Threats<br />
The challenges of digital records to the process of sensitivity review are as<br />
follows:<br />
19. Lord Hutton, Report of the Inquiry into the Circumstances Surrounding the Death of Dr<br />
David Kelly C.M.G. [the Hutton Inquiry], HC 247 (London: The Stationery Office, 2004),<br />
, accessed 18 July 2014.<br />
20. Michael Moss, ‘The Hutton Inquiry, the President of Nigeria <strong>and</strong> What the Butler<br />
Hoped to See’, English Historical Review (Vol. 120, No. 487, June 2005), pp. 577–92,<br />
, accessed 19 August 2014.<br />
21. National Archives, ‘Government Confirms Transition to a 20-Year Rule Will Begin from<br />
2013’, 13 July 2012, , accessed<br />
18 July 2014.<br />
22. Public Records Act 1967, , accessed<br />
22 July 2014.<br />
23. Ibid.<br />
24. Tim Gollins, ‘Putting Parsimonious Preservation into Practice’, The National Archives,<br />
2012, , accessed 25 July 2014.
• Volume <strong>and</strong> resources: Following advances in office technology during<br />
the late twentieth century, the consequent proliferation of in<strong>for</strong>mation,<br />
<strong>and</strong> the broadening of the interest of the scholarly community, a much<br />
greater volume of material is being deemed worthy of preservation<br />
in the digital age. Against a background of budgetary constraint, the<br />
manual review of digitally born records is not practical<br />
• Complex context: Technology has challenged earlier clear <strong>and</strong><br />
unambiguous rules <strong>for</strong> the creation <strong>and</strong> management of in<strong>for</strong>mation.<br />
This situation will significantly complicate the process of digital<br />
sensitivity review, as underst<strong>and</strong>ing a record’s context (including its<br />
distribution) is crucial in assessing its sensitivity<br />
• Risk: These challenges <strong>for</strong> review also occur in a context of significantly<br />
increased risk. Although the consequences of mistaken disclosure<br />
have not changed with the advent of digital records, the probability of<br />
discovering a mistake has. It is hard to discover particular in<strong>for</strong>mation<br />
in the paper world, in marked contrast to the digital environment<br />
where ubiquitous search engines index content rapidly. Risk-averse<br />
depositors may feel obliged to close large swathes of records if they<br />
cannot efficiently <strong>and</strong> effectively determine the sensitivity of each<br />
individual record with some clear degree of certainty.<br />
If sensitivity review of digitally born records remains impractical then, against<br />
a background of budgetary constraint and increasing litigation, large swathes<br />
of records will be closed in their entirety, unless something is done,<br />
for long periods (up to 120 years in the case of some exemptions). Such<br />
precautionary closure (due to the costs or difficulty of review) is permissible<br />
under freedom of in<strong>for</strong>mation legislation, but it will contradict citizens’<br />
expectations of openness in a democratic society <strong>and</strong> will only serve to<br />
exacerbate the threat to trust in the Archives, as described above, <strong>and</strong> the<br />
subsequent threat to our security.<br />
Opportunities<br />
While digital records may challenge sensitivity review, <strong>and</strong> this may give rise<br />
to threats to our wider security, their very nature also offers opportunities to<br />
address those challenges <strong>and</strong> counter the threats. Some of the opportunities<br />
are as follows:<br />
• Some sensitivities are not subtle. They can relate to specific terms<br />
and thus an appropriately configured search system should be able<br />
to highlight them. For example, the records that related to the<br />
Al-Yamamah Contract, 25 although still available on the Campaign Against<br />
Arms Trade (CAAT) website, have been closed officially to prevent<br />
further damage to international relations.<br />
25. David Leigh <strong>and</strong> Rob Evans, ‘Secrets of al-Yamamah’, Guardian, [no date], , accessed 18 July 2014.
• Consistency: by using electronic means, it is possible to drive some<br />
consistency across the review process.<br />
• Accurate estimation of residual risk: unlike in the review of paper<br />
records, it is possible to estimate the risk posed by reviewed records<br />
using the concept of technologically assisted digital review.<br />
• Exploitation of the Big Data aspects of digital records, coupled with<br />
machine learning applied in the context of information<br />
retrieval technology, can result in patterns emerging that can inform<br />
reviewers of where to look.<br />
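The first opportunity above, highlighting records that mention specific terms, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `flag_records`, the watch list and the two sample records are hypothetical, and a real review system would need far more than literal term matching.<br />

```python
import re


def flag_records(records, terms):
    """Return {record_id: [matched terms]} for records mentioning any term."""
    # One case-insensitive pattern per term; re.escape makes each term
    # match as literal text rather than as a regular expression.
    patterns = {term: re.compile(re.escape(term), re.IGNORECASE) for term in terms}
    flagged = {}
    for record_id, text in records.items():
        hits = [term for term, pattern in patterns.items() if pattern.search(text)]
        if hits:
            flagged[record_id] = hits
    return flagged


# Hypothetical watch list and records, invented purely for illustration.
watch_list = ["Al-Yamamah"]
records = {
    "rec-001": "Minutes of a meeting on the al-Yamamah contract.",
    "rec-002": "Canteen refurbishment schedule.",
}
flagged = flag_records(records, watch_list)
```

A flagged record is a prompt for a human reviewer to look, not a decision about closure; the same watch list applied to every record is also one simple way of driving the consistency mentioned above.<br />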
All of the above requires significant research, first to determine what the<br />
digital record looks like, <strong>and</strong> then to demonstrate the opportunities that can<br />
be derived.<br />
Conclusion<br />
Freedom of in<strong>for</strong>mation does not relate solely to openness. There is a<br />
fundamental difference between openness (driven by what the state wants<br />
its citizens to see) and freedom of information, which assigns the right of<br />
access to information to the individual. 26 Freedom of information creates<br />
a balance between the public interest, the state interest <strong>and</strong> the personal<br />
interest based on human rights, all mediated <strong>and</strong> governed by the rule of<br />
law.<br />
Balance is crucial to achieving freedom of in<strong>for</strong>mation alongside openness.<br />
Limits on openness are necessary <strong>for</strong> reasons of national security (<strong>for</strong><br />
example, the location of Britain’s nuclear weapons should not be revealed,<br />
nor should their targeting in<strong>for</strong>mation). Individuals also need to be protected<br />
from harm, <strong>and</strong> this has to be done through some limits on public access<br />
to in<strong>for</strong>mation. However, the ability to hold the executive to account under<br />
the rule of law <strong>and</strong> in the court of history is also central to the security of<br />
a modern democratic society. This can only be achieved through open <strong>and</strong><br />
transparent access to the records of government.<br />
How these challenges play out in the digital age of <strong>Big</strong> <strong>Data</strong> requires significant<br />
research, in order to gain a better underst<strong>and</strong>ing of how public records have<br />
changed <strong>and</strong> thus how they can be sensitivity reviewed <strong>and</strong> appropriately<br />
archived.<br />
Tim Gollins is currently an Honorary Research Fellow in the School of Computing<br />
Science at Glasgow University, working on the technically assisted sensitivity<br />
review of digital public records while on secondment from the National<br />
26. S Curtis, ‘In<strong>for</strong>mation Commissioner, “Open data is no substitute <strong>for</strong> freedom of<br />
in<strong>for</strong>mation”’, Daily Telegraph, 29 October 2013, ,<br />
accessed 29 July 2014.
Archives. Tim started his career in the UK civil service in 1987 <strong>and</strong> joined<br />
the National Archives in April 2008 to lead the delivery <strong>and</strong> procurement<br />
workstream of the Digital Continuity Project. Tim was part of the team that<br />
developed the National Archives business in<strong>for</strong>mation architecture <strong>and</strong><br />
helped to initiate work on the new Discovery system to enable users to find<br />
<strong>and</strong> access the records held at the National Archives. He has recently worked<br />
on the design <strong>and</strong> implementation of a new digital-records infrastructure at<br />
the National Archives, which embodies the new parsimonious preservation<br />
approach he developed. Tim is a Director of the Digital Preservation Coalition<br />
<strong>and</strong> a member of the University of Sheffield I-School’s Advisory Panel.
II. Trends in <strong>Big</strong> <strong>Data</strong>: Key Challenges <strong>for</strong> Skills<br />
Harvey Lewis<br />
This paper will explain how Deloitte, one of the largest professional services<br />
networks in the world, has used <strong>Big</strong> <strong>Data</strong> both internally <strong>and</strong> in the services<br />
it provides to clients. The paper will address three main points: the role of<br />
<strong>Big</strong> <strong>Data</strong> at Deloitte, the challenges that exist surrounding the competency<br />
<strong>and</strong> basic skill sets of staff working with these data, <strong>and</strong> the current trends<br />
in <strong>Big</strong> <strong>Data</strong> <strong>and</strong> how they impact methodologies <strong>and</strong> practices. A particular<br />
area of growing interest is open data 1 – data that can be freely used, reused<br />
<strong>and</strong> distributed by anyone, subject only, at most, to the requirement to<br />
attribute <strong>and</strong> share alike 2 – which is providing a new source of resources <strong>for</strong><br />
organisations in the public <strong>and</strong> private sectors. This paper will examine the<br />
impact this is having on ethics, responsibility <strong>and</strong> business efficiency within<br />
companies <strong>and</strong> governments, <strong>and</strong> amongst civilians.<br />
What is Big Data? The widely cited and accepted 2001 Gartner 3 definition<br />
lists the three Vs: volume, velocity <strong>and</strong> variety. In Deloitte, the term is often<br />
used to describe data that are too rich or complex to analyse well in a<br />
spreadsheet <strong>and</strong> without concepts from university-level statistics. Deloitte<br />
has used <strong>Big</strong> <strong>Data</strong> in a number of ways. For example, as Hurricane Irene was<br />
bearing down on the US in 2011, Deloitte helped one large US retailer to<br />
combine in<strong>for</strong>mation about curfews <strong>and</strong> road closures culled from social<br />
media with storm maps from the National Weather Service <strong>and</strong> GPS data<br />
from its own trucks to prepare <strong>for</strong> the storm’s impact on operations <strong>and</strong><br />
devise a logistics strategy <strong>for</strong> response <strong>and</strong> recovery.<br />
Despite some reservations about the reliability <strong>and</strong> usefulness of social<br />
media, it is nevertheless proving to be a useful additional source of data <strong>for</strong><br />
providing insights. For example, it can be used to identify instances of foodborne<br />
illnesses or other public health incidents, helping officials to work<br />
backwards from the spread portrayed in social media to the retail location<br />
<strong>and</strong> the distributor, <strong>and</strong> so on, as recently illustrated in analysis Deloitte<br />
per<strong>for</strong>med on an outbreak of pet-food poisoning in the US. The social<br />
networks embodied in social media also provide useful clues about influence,<br />
which has also been investigated by mapping physicians understood to be<br />
1. Deloitte, ‘Open <strong>Data</strong>: Driving Growth, Ingenuity <strong>and</strong> Innovation’, June 2012,<br />
, accessed 15 July 2014.<br />
2. Open <strong>Data</strong> H<strong>and</strong>book, ‘What is Open <strong>Data</strong>?’, ,<br />
accessed 15 July 2014.<br />
3. Douglas Laney, ‘3D <strong>Data</strong> Management: Controlling <strong>Data</strong> Volume, Velocity <strong>and</strong> Variety’,<br />
Gartner blog, 2001, , accessed 20<br />
August 2014.
exceptionally influential in pharmaceutical networks. These sorts of projects<br />
have considerable reach across to security <strong>and</strong> resilience areas.<br />
Skills Challenges <strong>for</strong> <strong>Security</strong> <strong>and</strong> <strong>Resilience</strong><br />
From a security <strong>and</strong> resilience perspective, the first challenge that needs<br />
to be addressed is over-reliance on technology, which can be manifested in<br />
three ways. First, the assumption that technology has to come first exemplifies<br />
how the lure of Big Data is driving bad decision-making. Second, upstream<br />
technology choice dominates downstream activities, which is very important<br />
– particularly from a public-sector procurement perspective. The range of<br />
different technologies and the choices made early in programmes<br />
may significantly influence what is able to happen downstream. Furthermore,<br />
the infrastructure for education and skills-based learning<br />
is scarce; there are far too many ‘car drivers’ in Big Data and<br />
not enough ‘mechanics’. There are not enough people who underst<strong>and</strong> the<br />
algorithms, logic <strong>and</strong> computer science behind the plat<strong>for</strong>ms they use that<br />
will allow them to be more innovative <strong>and</strong> creative in devising new solutions.<br />
The third challenge that exists is particularly acute <strong>for</strong> security <strong>and</strong> resilience:<br />
namely, that of ensuring that no stone is left unturned while identifying <strong>and</strong><br />
extracting the useful data from the pile. The problem with <strong>Big</strong> <strong>Data</strong> is that the<br />
availability <strong>and</strong> ease with which these data can be stored inadvertently leads<br />
to a desire to collect everything that can be collected. From a security <strong>and</strong><br />
resilience perspective, it is paramount that analysts are trained to identify<br />
necessary in<strong>for</strong>mation <strong>and</strong> to be very selective about the data they collect,<br />
process <strong>and</strong> analyse. More data does not always equal more in<strong>for</strong>mation.<br />
For example, to build a model that predicts the outcome of a coin toss, one<br />
can store either all the outcomes from individual coin tosses, or simply the<br />
total number of tosses <strong>and</strong> the number of times the coin came up heads. In<br />
the first instance, every piece of data is captured but it provides no further<br />
insight than can be extracted from the second instance.<br />
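The coin-toss point can be made concrete with a minimal Python sketch. It assumes the tosses are independent with a fixed bias, in which case the two counts carry everything needed to estimate the probability of heads; the names `estimate_from_sequence` and `TossSummary` and the sample sequence are hypothetical.<br />

```python
from dataclasses import dataclass


def estimate_from_sequence(outcomes):
    """Estimate P(heads) from the full list of outcomes ('H' or 'T')."""
    return outcomes.count("H") / len(outcomes)


@dataclass
class TossSummary:
    """Keeps only two numbers, however many tosses are observed."""
    tosses: int = 0
    heads: int = 0

    def record(self, outcome):
        self.tosses += 1
        if outcome == "H":
            self.heads += 1

    def estimate(self):
        return self.heads / self.tosses


sequence = list("HTHHTHTT")  # hypothetical observations
summary = TossSummary()
for outcome in sequence:
    summary.record(outcome)

# Both approaches give the same estimate; only the storage cost differs.
same_answer = estimate_from_sequence(sequence) == summary.estimate()
```

If the question were different, for example whether successive tosses are independent, the order of outcomes would matter and the summary would no longer be enough, which is why the selection of data has to follow from the question being asked.<br />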
On the other h<strong>and</strong>, a different problem exists in data selection. For<br />
example, in the Second World War, the UK’s Bomber Comm<strong>and</strong> per<strong>for</strong>med<br />
a comprehensive survey of anti-aircraft weapons damage on its bomber<br />
fleet <strong>and</strong> recommended that armour be placed in those areas most<br />
susceptible to damage. The problem was that the sample of bombers<br />
surveyed was biased. It did not include the bombers that had not returned,<br />
which may well have shown additional areas of damage, which were not<br />
being factored into the analysis. Ultimately, this flaw was detected by<br />
the newly <strong>for</strong>med Operations Research Group, which recommended that<br />
armour be placed in areas showing least signs of damage. This reiterates<br />
the challenges addressed by Alex Gammerman <strong>and</strong> Jennifer Cole in the<br />
introduction: when data are available at very high volumes <strong>and</strong> rates, the<br />
problem is how to pick out the data that are actually needed (or, in the
case of Bomber Comm<strong>and</strong>, to realise that the most valuable data relate to<br />
what is missing, not what is in front of you).<br />
Analysts also need to guard against mistaking correlation <strong>for</strong> causality. <strong>Data</strong><br />
alone are not sufficient to answer any question that might be thrown at them.<br />
This is a particular challenge <strong>for</strong> researchers <strong>and</strong> analysts when it comes to<br />
finding interesting insights that no one has discovered be<strong>for</strong>e. It is important<br />
to underst<strong>and</strong> the root of the correlation, <strong>and</strong> to be able to assess whether<br />
or not it makes sense. For example, between 2006 and 2011, the annual number of<br />
murders in the US dropped from nearly 16,000 to just over 14,000 (a reduction of<br />
13.5 per cent), 4 and during the same period the market share of Microsoft’s<br />
Internet Explorer web browser also fell, from over 60 per cent to 20 per<br />
cent. 5 The two graphs showing these figures can be superimposed on top<br />
of one another, but this does not mean that as people became less likely to<br />
choose Internet Explorer as their preferred browser, they also became less<br />
likely to commit murder. There is correlation, but no causation.<br />
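A short Python sketch shows how easily such a correlation arises. The two series below are synthetic: they are straight lines drawn between rounded versions of the endpoints quoted above (16,000 to 14,000 and 60 to 20), not the actual yearly figures from either cited source, and the helper names are hypothetical.<br />

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def straight_line(start, end, steps):
    """Evenly spaced values from start to end (synthetic placeholder data)."""
    return [start + (end - start) * i / (steps - 1) for i in range(steps)]


# Six yearly points, 2006 to 2011, interpolated between rounded endpoints.
murders = straight_line(16_000, 14_000, 6)
ie_share = straight_line(60.0, 20.0, 6)

r = pearson(murders, ie_share)  # very close to 1 for any two falling straight lines
```

Any two quantities that simply trend in the same direction over the same period will score highly on this measure, so a strong coefficient on its own says nothing about whether one drives the other.<br />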
Finally, another significant challenge comes with mistaking the equations<br />
<strong>and</strong> models that analysts generate <strong>for</strong> insight. For example, the output of<br />
a regression model – the mathematical equation, as illustrated in Figure 1<br />
– is not the same as the insight that might be gleaned when it is applied to a<br />
particular business domain or problem.<br />
Figure 1: Regression equation presented as business insight.<br />
$$y(x) = \frac{e^{b_0 + b_1 x}}{1 + e^{b_0 + b_1 x}}, \qquad b_0 = 2.298057, \quad b_1 = 30.023823$$<br />
The means by which insight is generated is not the insight <strong>and</strong>, in many<br />
cases, means nothing to anyone but the mathematicians who created it. It is<br />
in the interpretation <strong>and</strong> use of these models that value is derived, <strong>and</strong> this<br />
interpretation will depend on how data are visualised <strong>and</strong> the context into<br />
which the data fit.<br />
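To illustrate the gap between a model and an insight, the sketch below simply evaluates the logistic equation from Figure 1 with its two coefficients. The source does not say what x and y represent, so the function names and the sample values of x are assumptions chosen only for illustration.

```python
from math import exp

B0 = 2.298057   # intercept b0 from Figure 1
B1 = 30.023823  # slope b1 from Figure 1

def y(x):
    """Logistic curve y(x) = e^(b0 + b1*x) / (1 + e^(b0 + b1*x))."""
    z = B0 + B1 * x
    return exp(z) / (1 + exp(z))

# Value of x at which the curve crosses 0.5, i.e. where b0 + b1*x = 0.
midpoint = -B0 / B1

# Sample inputs are arbitrary; the source gives no units or domain for x.
for x in (midpoint, 0.0, 0.1):
    print(f"x = {x:+.4f} -> y = {y(x):.4f}")
```

The numbers this prints are only model output. Deciding what a value of y close to 1 means for a particular business problem, and whether it should change a decision, is the interpretive step the text describes.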
Future Trends in <strong>Big</strong> <strong>Data</strong><br />
Deloitte has identified seven current trends in <strong>Big</strong> <strong>Data</strong>: identifying people<br />
with the right talent to do the right things; visualising data appropriately<br />
so that they can be easily understood; recognising the value of machine<br />
learning in interpreting <strong>and</strong> analysing data; developing better data discovery<br />
4. Wall Street Journal, ‘Murder in America’, , accessed 15 July 2014.<br />
5. W3 Schools, Browser Statistics <strong>and</strong> Trends, , accessed 15 July 2014.
plat<strong>for</strong>ms; improving planning <strong>for</strong> how to get the most from data; improving<br />
the techniques <strong>for</strong>, <strong>and</strong> the use of, predictive data; <strong>and</strong> addressing the death<br />
of the data warehouse – an end to collecting <strong>and</strong> storing vast amounts of<br />
data <strong>for</strong> the sake of it.<br />
Those using <strong>Big</strong> <strong>Data</strong> need to know how to recognise <strong>and</strong> underst<strong>and</strong> the<br />
opportunities it offers, based on the trends that can be seen <strong>for</strong> the next year<br />
<strong>and</strong> beyond. Companies need to be able to spot trends that are going to help<br />
reduce the cost <strong>and</strong> ef<strong>for</strong>t associated with processing complex data, or those<br />
that will improve the marginal returns from <strong>Big</strong> <strong>Data</strong> into something more<br />
significant – more signal, <strong>and</strong> less noise.<br />
In conclusion, the starting point with <strong>Big</strong> <strong>Data</strong> should be the objective or<br />
question that needs to be addressed. The data <strong>and</strong> technology are the<br />
means to the end; they are necessary but not sufficient. What do we want<br />
to do with the data? Where do we want them to take us? These are the<br />
questions that will drive innovation <strong>and</strong> creativity. Even when the data<br />
<strong>and</strong> the technology are available (<strong>for</strong> example, from social media), is there<br />
really any benefit from using them, <strong>and</strong> how is this determined? Finally, even<br />
when the data are there, if collecting, processing <strong>and</strong> analysing them is<br />
not going to be cost efficient, there is nothing wrong with looking at other ways<br />
of achieving the same end.<br />
Harvey Lewis is Research Director, <strong>Data</strong> <strong>and</strong> Analytics at Deloitte. Based in<br />
London, he leads a team of researchers <strong>and</strong> data scientists investigating<br />
<strong>Big</strong> <strong>Data</strong>, open data <strong>and</strong> trends in analytics. He also leads focused research<br />
projects <strong>for</strong> clients in both the public <strong>and</strong> private sectors. He has spent twenty<br />
years in the in<strong>for</strong>mation technology industry, <strong>and</strong> specialises in analytics,<br />
cyber-security <strong>and</strong> national security. Harvey has authored numerous reports,<br />
white papers <strong>and</strong> blog posts. He is a frequent media commentator, <strong>and</strong> has<br />
contributed to many articles in the national <strong>and</strong> trade press. Harvey holds a<br />
BEng (Hons) <strong>and</strong> a PhD from Southampton University.
III: <strong>Big</strong> <strong>Data</strong> <strong>and</strong> Financial Transactions: Providing<br />
New Means of Analysis<br />
Gregory M<strong>and</strong>oli<br />
A good government implies two things: first, fidelity to the object<br />
of government, which is the happiness of the people; secondly, a<br />
knowledge of the means by which that object can be best attained. 1<br />
The Importance of Adaptation<br />
During the course of its history, the US has been confronted with, <strong>and</strong> has<br />
responded to, incidents threatening its welfare. Regrettably, it often takes<br />
a crisis to catalyse a critical review of current affairs <strong>and</strong> the creation of<br />
new operational paradigms. The events of 9/11 illustrate this. Typically,<br />
programmes evolve slowly, <strong>and</strong> it is not until numerous injustices are tallied<br />
or a catastrophe hits that a major shift occurs.<br />
Much has been made about the intelligence failures that led to 9/11. After<br />
9/11, the US federal government was <strong>for</strong>ced to recognise <strong>and</strong> seek a remedy<br />
to its lack of operational cohesiveness <strong>and</strong> to the lack of in<strong>for</strong>mation sharing<br />
among federal, state <strong>and</strong> local agencies. Seemingly overnight, the regulatory,<br />
inspection, interdiction <strong>and</strong> investigative focus of government shifted <strong>and</strong><br />
the global War on Terror began.<br />
Clearly, this new conflict impacted on the American psyche in a deep <strong>and</strong> visceral<br />
way. It also stirred cynicism, scrutiny <strong>and</strong> a public appetite <strong>for</strong> redressing the<br />
defects that exist in governmental administration. The surge of interest in<br />
strengthening public agencies prompted the passage of the Homel<strong>and</strong> <strong>Security</strong><br />
Act (HLSA) of 2002 <strong>and</strong> the creation of the Department of Homel<strong>and</strong> <strong>Security</strong><br />
(DHS), both of which aim to enhance the per<strong>for</strong>mance of all strata of government.<br />
In the immediate aftermath of 9/11, the HLSA <strong>and</strong> DHS seemed symbolic<br />
surrogates <strong>for</strong> the Twin Towers, though significant doses of confusion <strong>and</strong><br />
dysfunction accompanied the rapid creation of a new department with its<br />
inaugural group of twenty-two agencies. The creation of DHS was similar<br />
to what the UK is experiencing today with the creation of the National<br />
Crime Agency (NCA) <strong>and</strong> the rebr<strong>and</strong>ing <strong>and</strong> realignment of the UK Border<br />
Agency into three distinct entities: the Border Police Comm<strong>and</strong>, Home Office<br />
Immigration En<strong>for</strong>cement <strong>and</strong> UK Border Force.<br />
Homel<strong>and</strong> <strong>Security</strong> Investigations (HSI) is the principal investigative agency<br />
within DHS. It is unique among the world’s law en<strong>for</strong>cement agencies because<br />
of its capability to investigate persons <strong>and</strong> property across borders <strong>and</strong> to<br />
1. James Madison, The Federalist (No. 62, 27 February 1788).
pursue violators within the US or overseas. HSI also has special en<strong>for</strong>cement<br />
powers that enable it to conduct border searches at ports of entry – functional<br />
equivalents of the border <strong>and</strong> the extended border. It can also prosecute<br />
violators via criminal, civil <strong>and</strong> administrative judicial processes.<br />
As such, HSI brings together assets <strong>and</strong> capabilities that do not exist in any<br />
other agency. Originally identified as Immigration <strong>and</strong> Customs En<strong>for</strong>cement,<br />
but rebr<strong>and</strong>ed as HSI in 2011, it merges elements from its legacy immigration<br />
<strong>and</strong> customs agencies into a globally oriented police <strong>for</strong>ce. HSI employs<br />
a ‘points of genesis’ investigative methodology that focuses on tackling<br />
transnational crime where it begins. The points of genesis approach allows<br />
threats to be attacked at inchoate stages, be<strong>for</strong>e they are fully <strong>for</strong>med <strong>and</strong><br />
more threatening. This is more effective <strong>and</strong> efficient than waiting until<br />
the ‘point of commission’, when a crime is more developed <strong>and</strong> difficult to<br />
counter. This is a corollary of pre-emptive self-defence, a readily recognised<br />
precept in international law.<br />
As a result, HSI deploys agents worldwide to work with <strong>for</strong>eign police services.<br />
HSI works closely with many UK partners, but primarily the National Crime<br />
Agency, City of London Police, Metropolitan Police Service, Police Scotl<strong>and</strong><br />
<strong>and</strong> the Police Service of Northern Irel<strong>and</strong>. These bilateral engagements<br />
significantly augment capacity <strong>and</strong> render a return on investment <strong>for</strong> the<br />
respective agencies as well as the overall US law en<strong>for</strong>cement mission.<br />
The New Paradigm<br />
From an academic perspective, homel<strong>and</strong> security is a new <strong>and</strong> not fully<br />
defined discipline, though after more than a decade the concept has been<br />
widely accepted to encompass more than counter-terrorism. Clearly, other<br />
manmade or natural entities, movements <strong>and</strong> phenomena threaten our<br />
national security, such as illegal immigration, street gangs, illegal drugs <strong>and</strong><br />
natural disasters, to name just a few. Thus, a reasonable interpretation of<br />
the concept m<strong>and</strong>ates that homel<strong>and</strong> security must be comprehensive <strong>and</strong><br />
address all hazards, though the criteria <strong>for</strong> what constitutes a hazard are not<br />
strictly defined, nor should they be. Unnecessary constraint is antithetical to<br />
a ‘light is right’ mentality that promotes strategic <strong>and</strong> operational fluidity in<br />
an environment where metastasis of modus oper<strong>and</strong>i <strong>and</strong> criminal networks<br />
is rapid.<br />
For HSI the threats are clear. Its mission is to conduct criminal investigations<br />
against terrorist <strong>and</strong> other criminal organisations that threaten US national<br />
security <strong>and</strong> seek to exploit America’s legitimate trade, travel <strong>and</strong> financial<br />
systems – further in<strong>for</strong>mation on how this is carried out is shown in Box<br />
1. 2 HSI will also support other DHS agencies in the response <strong>and</strong> recovery<br />
2. Immigration <strong>and</strong> Customs En<strong>for</strong>cement, Homel<strong>and</strong> <strong>Security</strong> Investigations, , accessed 17 July 2014.
phases of natural disasters. The mission is purposefully broad <strong>and</strong> elastic to<br />
ensure focus <strong>and</strong> acuity against transnational criminal organisations that are<br />
globalised <strong>and</strong> poly-plat<strong>for</strong>m.<br />
The paradigm of a one-dimensional criminal enterprise, generically embodied<br />
in a Colombian cartel cocaine trafficking <strong>and</strong> bulk cash smuggling model, is<br />
both oversimplified <strong>and</strong> antiquated. Today, criminal organisations diversify<br />
their illicit plat<strong>for</strong>ms <strong>and</strong> become involved in myriad offences, including<br />
narcotics, intellectual property rights, human smuggling <strong>and</strong> trafficking,<br />
fraud <strong>and</strong> money laundering, to list just a few.<br />
Likewise, modern, internationally focused law en<strong>for</strong>cement agencies must<br />
be able to investigate a broad array of criminality in order to confront the<br />
poly-plat<strong>for</strong>m threats competently. Single or limited mission agencies<br />
have difficulty in this environment. In practice, they creep outside their<br />
missions. This then produces negative characteristics <strong>and</strong> results in deflated<br />
per<strong>for</strong>mance.<br />
Box 1: Value Transfer <strong>and</strong> Criminal Gain.<br />
HSI takes the attitude that, with the exception of criminals who have psychopathic<br />
tendencies, most are interested in making money or achieving some other<br />
commodity that has a measurable value. Within the wide range of crimes that<br />
are (or appear to be) perpetrated <strong>for</strong> money, a particular area of interest to HSI<br />
in the digital age is ‘value transfer’, which is a way to assess <strong>and</strong> identify money<br />
laundering. Value transfer focuses on the relative as well as the absolute value of<br />
transactions, <strong>and</strong> there<strong>for</strong>e often sheds light on money-laundering techniques,<br />
particularly where ‘dirty’ money may be phased through in modular increments<br />
rather than in single transactions.<br />
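The idea of money being phased through in increments can be sketched as a simple aggregation check. This is not HSI's method: the threshold, the look-back window and the deposit records below are all hypothetical, chosen only to show how individually unremarkable transactions can add up to a notable total.

```python
from collections import defaultdict
from datetime import date, timedelta

THRESHOLD = 10_000          # hypothetical single-transaction review threshold
WINDOW = timedelta(days=7)  # hypothetical look-back window

# Hypothetical deposits: (account, date, amount)
deposits = [
    ("A", date(2014, 3, 3), 4_000),
    ("A", date(2014, 3, 4), 3_500),
    ("A", date(2014, 3, 6), 3_000),
    ("B", date(2014, 3, 3), 2_000),
    ("B", date(2014, 3, 20), 2_500),
]

def flag_incremental_transfers(rows, threshold=THRESHOLD, window=WINDOW):
    """Return accounts whose sub-threshold deposits reach the threshold within one window."""
    by_account = defaultdict(list)
    for account, day, amount in rows:
        if amount < threshold:  # only look at deposits that would not stand out alone
            by_account[account].append((day, amount))
    flagged = set()
    for account, items in by_account.items():
        items.sort()
        for i, (start, _) in enumerate(items):
            # Sum every deposit that falls within the window opened by this one.
            total = sum(a for d, a in items[i:] if d - start <= window)
            if total >= threshold:
                flagged.add(account)
                break
    return sorted(flagged)

flagged = flag_incremental_transfers(deposits)
print(flagged)
```

Account A is flagged because three small deposits within a week together exceed the threshold, while account B's deposits are too far apart and too small to do so.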
Value transfer can be physical (carrying bank notes or other <strong>for</strong>ms of currency<br />
from one place to another); virtual (transferring credit through online banking<br />
systems); based on trust (often using Hawala-style transactions, as described<br />
below); or carried out via trade (buying or selling something <strong>for</strong> above or below<br />
its real market value).<br />
The easiest way to launder money is through straight<strong>for</strong>ward cash smuggling,<br />
referred to as ‘bulk cash’: a criminal has illegal proceeds from drug sales in<br />
the US that he wants to take back to Mexico, so he tapes the notes to himself<br />
<strong>and</strong> smuggles them across the border, where he then spends them. This is the<br />
simplest example: he might also swallow his money, or drive it across, but the<br />
value is in physical currency.
Of course, the criminal could deposit the cash into bank accounts in the US <strong>and</strong><br />
withdraw it in Mexico, but doing this through legitimate banking systems will<br />
tend to leave a trail that can be identified <strong>and</strong> followed. For example, money<br />
derived from drug sales in Cali<strong>for</strong>nia may need to be moved to Yemen. The<br />
value could be transferred by putting all of the illegal proceeds into various bank<br />
accounts, but high-value, unusual transactions (such as those paid in cash) are<br />
likely to be picked up <strong>and</strong> questioned by banking systems. When money starts<br />
to pass through conventional money services, the intersections can be seen:<br />
where the money was put into the bank, how its value was transferred <strong>and</strong><br />
where it was transferred to (<strong>for</strong> example, by wire or by sending blank cheques<br />
to a partner in Sana’a), can help law en<strong>for</strong>cement agencies to identify <strong>and</strong> trace<br />
the criminal actors.<br />
For this reason, criminal transactions may be more likely to be moved through<br />
Hawala, an in<strong>for</strong>mal system of money brokers used in the Middle East, North<br />
Africa <strong>and</strong> the Horn of Africa, which does not transfer currency, but receives<br />
money in one country <strong>and</strong> makes a loan to someone in another on the basis that<br />
the loan will eventually be repaid. Hawala tends not to leave a digital footprint,<br />
which can be critical with criminal transactions.<br />
Another option is in<strong>for</strong>mal value transfer, which builds on the techniques used<br />
in bulk cash smuggling. One way around the challenge of the interface with<br />
conventional money services is trade: if a criminal can take value, put it into<br />
a commodity <strong>and</strong> get it to where he needs it to go, this can provide a more<br />
sophisticated way of transferring the value. This is more complex <strong>and</strong> also more<br />
covert. For example, drugs might come into the US from Colombia, with a gross<br />
profit from sale of the drugs in the US of $1 million in cash. The drug baron is<br />
not in the US, however; he is in Bogotá, Colombia, <strong>and</strong> he wants to get the cash<br />
out of the US. One way to do this would be to find a corrupt jeweller in New<br />
York <strong>and</strong> buy $1 million worth of gold from him, melt it down, cast it as nuts <strong>and</strong><br />
bolts, dye or plate them so that they appear to be a much cheaper metal, export<br />
these nuts <strong>and</strong> bolts to Colombia <strong>and</strong>, once they arrive, melt them down <strong>and</strong> resell<br />
the gold at its real value. Or the criminal might be even smarter <strong>and</strong> melt the<br />
nuts <strong>and</strong> bolts down into gold, then ship this back to the same corrupt jeweller<br />
in New York, who can then legitimately wire transfer the value of the gold. HSI,<br />
as a customs as well as a law en<strong>for</strong>cement agency, is able to look at <strong>Big</strong> <strong>Data</strong><br />
from trade transactions to analyse any unusual movements of goods or trends<br />
that might suggest that this kind of activity is going on.<br />
Evolution of Money-Laundering Techniques<br />
As we move into the digital age, value transfer transactions are becoming<br />
more sophisticated, particularly with the evolution of crypto-currency <strong>and</strong><br />
cryptography. Money laundering is an extremely dynamic activity, which law<br />
en<strong>for</strong>cement has to keep on top of. It is exceptionally fast-paced: situations
change very fast. The similarities <strong>and</strong> differences between bitcoin, virtual<br />
currencies <strong>and</strong> crypto-currency, <strong>for</strong> example, are themselves evolving, <strong>and</strong> these<br />
must be taken seriously: they are value transfer mechanisms with real value<br />
being traded. That is not the case in a Ponzi scheme or other type of pyramid<br />
scheme, where the ‘real’ value is completely ethereal.<br />
Crypto-currency has been coupled with the Darknet – ‘hidden’ or private<br />
networks on the Internet that can only be accessed by those who are invited to<br />
make connections, <strong>and</strong> are generally associated with illegal or dissident activity.<br />
The Silk Road in particular has been discussed frequently in recent months; it is<br />
one of many sites operating on a TOR (previously The Onion Router) Network,<br />
which conceals a user’s location from anyone conducting surveillance. This is<br />
attractive to criminals because it offers anonymity <strong>and</strong> decentralised control.<br />
The flow of value through the Darknet – the buyer <strong>and</strong> seller value – on the<br />
Silk Road is something that needs to be looked into, as it underlies how cryptocurrency<br />
can flow in transactions.<br />
There is no reason why a criminal could not take that flow to supplant the transfer<br />
of value seen in the trade transparency route. Why would he need to melt down<br />
gold into nuts <strong>and</strong> bolts if he could just use digital currency to transfer value? It<br />
is really a matter of acceptance: when will the tipping point come at which<br />
this <strong>for</strong>m of value transfer becomes more acceptable? If one looks back at<br />
currency as a symbolic <strong>for</strong>m of value, the same thing is happening with digital<br />
currencies. The US dollar is backed by the US government, which guarantees it<br />
<strong>and</strong> gives people confidence in its value. At present, that confidence does not<br />
exist in the crypto-world but it may come: in a service-based world, if a criminal<br />
could transfer the commodity or the service, using cryptography, it would give<br />
them autonomy <strong>and</strong> anonymity (no government scrutiny or taxes to pay). Would<br />
that be something that would interest the criminal fraternity? Probably, <strong>and</strong> it<br />
is possible to do.<br />
<strong>Big</strong> <strong>Data</strong>: Privacy <strong>and</strong> Consistency<br />
Anomalies in trade data are often the most useful guide to which <strong>Big</strong><br />
<strong>Data</strong> is of value to HSI. If someone is seen to be exporting gold from the US or<br />
Colombia, this can be flagged up as an anomaly. Normally coffee or sugar is<br />
seen along this route; it should be asked why large amounts of gold should be<br />
exported from areas that do not have gold mines. HSI works in conjunction<br />
with other law en<strong>for</strong>cement agencies, customs <strong>and</strong> border protection, <strong>and</strong> with<br />
the State Department, Department of the Treasury <strong>and</strong> so on, to share <strong>and</strong><br />
analyse this sort of in<strong>for</strong>mation.<br />
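The kind of flagging described above can be sketched as a frequency check on a trade route. The shipment records, country codes and the 5 per cent cut-off are all hypothetical; this is an illustration of the idea, not a description of any system HSI uses.

```python
from collections import Counter

# Hypothetical shipment records: (origin, destination, commodity)
history = (
    [("CO", "US", "coffee")] * 60
    + [("CO", "US", "sugar")] * 38
    + [("CO", "US", "gold")] * 2
)

def rare_commodities(records, route, max_share=0.05):
    """Return commodities making up no more than max_share of shipments on a route."""
    counts = Counter(c for o, d, c in records if (o, d) == route)
    total = sum(counts.values())
    return sorted(c for c, n in counts.items() if n / total <= max_share)

unusual = rare_commodities(history, ("CO", "US"))
print(unusual)
```

A commodity that is rare on a route is only a prompt for the question the text poses (why is it being shipped from there at all?), not evidence of wrongdoing in itself.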
Looking to the Future<br />
The future of money laundering throws up some very tricky challenges. Anything<br />
that can be mathematically defined can be transferred into the Darknet. If
criminals come up with a conspiracy to commit malicious acts, what is stopping<br />
others from transferring them the funds that will enable them to carry them out?<br />
Who is going to see it <strong>and</strong> how difficult is that going to be <strong>for</strong> law en<strong>for</strong>cement?<br />
What does it mean to nation states if criminals don’t need to rely on conventional<br />
currency or legitimate banks to guarantee transactions? Furthermore, what<br />
does this mean <strong>for</strong> the rule of law: if a criminal can enter into a contract with<br />
someone that law en<strong>for</strong>cement agencies can mathematically define, will a court<br />
even be needed to resolve the conflict <strong>and</strong>, if not, what effects will this have on<br />
justice in the future? Researching the implications of these possibilities now will<br />
help governments <strong>and</strong> their law en<strong>for</strong>cement agencies to better underst<strong>and</strong> the<br />
challenges when – rather than if – they need to be faced.<br />
Interagency Working<br />
Crime-fighting techniques need to be continually assessed by objective<br />
per<strong>for</strong>mance measures so that best practices can be identified. Relevant<br />
per<strong>for</strong>mance measures include efficiency, effectiveness, capacity,<br />
responsiveness, trust <strong>and</strong> confidence. In the <strong>Big</strong> <strong>Data</strong> context, these<br />
per<strong>for</strong>mance measures need to be weighed against an organisation’s<br />
ability to collect, store, analyse <strong>and</strong> disseminate in<strong>for</strong>mation internally <strong>and</strong><br />
externally. In<strong>for</strong>mation sharing lies at the core of HSI’s ethos.<br />
When HSI was created, the challenge was how to ensure that the previously<br />
disparate law en<strong>for</strong>cement agencies from which it sprang were able to interact<br />
in a way that would not adversely affect per<strong>for</strong>mance. Similar challenges are<br />
being encountered in the UK with the creation of multiple new agencies, as<br />
noted above. Thus, it becomes critical to assess new policies, programmes<br />
<strong>and</strong> strategies against these per<strong>for</strong>mance measures.<br />
In the US, there are a number of different criminal investigative agencies<br />
at the federal level. Quantitatively, HSI, the Federal Bureau of Investigation<br />
(FBI), the Drug En<strong>for</strong>cement Administration (DEA), the US Secret Service<br />
(USSS) <strong>and</strong> the Bureau of Alcohol, Tobacco, Firearms <strong>and</strong> Explosives (ATF)<br />
account <strong>for</strong> most of this space. For many, HSI is less well known, but it is the<br />
second largest agency in this class next to the FBI, which in the post-9/11<br />
environment is really a hybrid en<strong>for</strong>cement <strong>and</strong> domestic intelligence<br />
agency. Despite divergent missions, each of these agencies is empowered to<br />
investigate many of the same laws. Thus interoperability becomes essential,<br />
though parochial attitudes <strong>and</strong> proprietary interests still exist.<br />
The best way of achieving optimal per<strong>for</strong>mance is to ensure that separate<br />
agencies work closely, <strong>and</strong> well, together. A lack of coherent integration will<br />
degrade the ability to share in<strong>for</strong>mation <strong>and</strong> lead to negative outcomes.<br />
Based on professional experience <strong>and</strong> conversations with prosecutors;
defence attorneys; military personnel; local, state <strong>and</strong> federal police officers;<br />
<strong>and</strong> criminal defendants, five characteristics can be identified that are present<br />
in, <strong>and</strong> adversely impact, the investigative capabilities of law en<strong>for</strong>cement.<br />
Certainly, other characteristics may exist, but these appear dominant. These<br />
five characteristics are defined as follows:<br />
1. Interagency conflict: the real or perceived incongruity of agencies’<br />
interests that detrimentally affects the per<strong>for</strong>mance of one or both.<br />
Such conflict can materialise in different <strong>for</strong>ms, such as inter-agency<br />
rivalry, mistrust or malfeasance; one example of this can be seen in<br />
the law en<strong>for</strong>cement context, when, during a joint operation, the<br />
participating agencies vie <strong>for</strong> control or credit <strong>for</strong> an action or case<br />
2. Redundancy: the duplication or repetition of action; one example of<br />
this can be seen in the law en<strong>for</strong>cement context, when two or more<br />
agencies participate in an investigation <strong>and</strong> unnecessarily per<strong>for</strong>m<br />
the same or similar tasks<br />
3. <strong>Data</strong> fragmentation: the collection <strong>and</strong> segregation of in<strong>for</strong>mation<br />
in a way that prevents its sharing; one example of this can be seen<br />
in the law en<strong>for</strong>cement context, when one agency has in<strong>for</strong>mation<br />
regarding a suspect that may be of value to another agency <strong>and</strong> does<br />
not or cannot provide access to the in<strong>for</strong>mation or make the other<br />
party aware of the in<strong>for</strong>mation<br />
4. Jurisdictional <strong>for</strong>eclosure: the inability to en<strong>for</strong>ce a law because of<br />
lack of authority or resources<br />
5. Violation of civil rights: the deprivation of rights belonging to an<br />
individual, including civil liberties, due process, equal protection<br />
of the laws, <strong>and</strong> freedom from discrimination through an act or<br />
omission to act by law en<strong>for</strong>cement.<br />
In relation to <strong>Big</strong> <strong>Data</strong>, the fragmentation of data becomes the universal issue.<br />
The other characteristics, primarily civil rights violations, can be present, but to<br />
a lesser extent. The centralisation of data enhances the work product derived<br />
from the collection, storage, analysis <strong>and</strong> dissemination cycle. A de facto or<br />
de jure centralised comm<strong>and</strong> structure is needed to foster the integration of<br />
the disparate elements. It is not wise to decentralise operations into small<br />
autonomous units because they will become unco-ordinated <strong>and</strong> per<strong>for</strong>m at a<br />
less than optimal or ‘dysfunctional’ level when compared with the centralised<br />
model. Recognition of interdependence becomes paramount.<br />
In the context of this <strong>for</strong>um, centralisation creates an economy of scale<br />
<strong>and</strong> management scheme <strong>for</strong> <strong>Big</strong> <strong>Data</strong>. So what becomes crucial is not the<br />
per<strong>for</strong>mance of entities per se, but the construct employed to collect, store,<br />
analyse <strong>and</strong> disseminate in<strong>for</strong>mation in a manner that will generate synergy.<br />
HSI has recognised this <strong>and</strong> participates actively in multiple ‘data crunching’<br />
<strong>for</strong>a.
Directly on point is HSI’s Trade <strong>and</strong> Transparency Program. Under this<br />
initiative, HSI works jointly with customs agencies worldwide to share <strong>and</strong><br />
analyse trade data <strong>for</strong> anomalies.<br />
HSI established the Trade Transparency Unit to conduct ongoing analysis<br />
of trade data provided through partnerships with other countries’ trade<br />
transparency units. One of the most effective ways to identify instances<br />
<strong>and</strong> patterns of trade-based money laundering is through the exchange<br />
<strong>and</strong> subsequent analysis of trade data <strong>for</strong> anomalies that would only be<br />
apparent by examining both sides of a trade transaction. The unit is <strong>for</strong>med<br />
when the US <strong>and</strong> any of its trading partners agree to exchange trade data<br />
<strong>for</strong> the purpose of comparison <strong>and</strong> analysis. Using state-of-the-art software<br />
<strong>and</strong> proven investigative techniques, the unit can easily identify previously<br />
invisible trade-based alternative remittance systems <strong>and</strong> customs fraud. 3<br />
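Examining both sides of a trade transaction can be sketched as a comparison of two sets of declarations. The shipment references, goods, values and the 10 per cent tolerance below are hypothetical, and the software actually used by the unit is not described in the source.

```python
# Hypothetical declarations keyed by a shared shipment reference.
us_exports = {
    "S-001": {"goods": "steel fasteners", "value_usd": 12_000},
    "S-002": {"goods": "machine parts", "value_usd": 80_000},
}
partner_imports = {
    "S-001": {"goods": "steel fasteners", "value_usd": 11_800},
    "S-002": {"goods": "machine parts", "value_usd": 310_000},
    "S-003": {"goods": "textiles", "value_usd": 45_000},
}

def compare_sides(exports, imports, tolerance=0.10):
    """Flag shipments whose two declarations disagree or where one side is missing."""
    findings = []
    for ref in sorted(set(exports) | set(imports)):
        exp, imp = exports.get(ref), imports.get(ref)
        if exp is None or imp is None:
            findings.append((ref, "declared on one side only"))
            continue
        # Relative gap between the two declared values (guarding against zero).
        largest = max(exp["value_usd"], imp["value_usd"]) or 1
        gap = abs(exp["value_usd"] - imp["value_usd"]) / largest
        if gap > tolerance:
            findings.append((ref, f"declared values differ by {gap:.0%}"))
    return findings

findings = compare_sides(us_exports, partner_imports)
for ref, reason in findings:
    print(ref, reason)
```

Neither country's records look odd in isolation; the mismatch in S-002 and the unmatched S-003 only appear once the two data sets are placed side by side, which is the point of exchanging trade data.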
To facilitate the creation <strong>and</strong> management of <strong>Big</strong> <strong>Data</strong>, agencies need to<br />
integrate in some way. This integration can occur at different levels, consisting<br />
of recognition, co-ordination, collaboration, community, consolidation <strong>and</strong><br />
merger. Any one of these is better than nothing <strong>and</strong>, realistically, community<br />
is as far as most agencies can go without legislative intervention. These<br />
phases are defined as follows:<br />
• Recognition: the confirmation of existence which occurs in the Big
Data context when one agency acknowledges that another agency has
the authority to perform a particular act and has relevant information
that the acknowledging agency may or may not have
• Co-ordination: the act of confirming concurrent jurisdiction and
agreeing to separate areas of enforcement to reduce redundancy, but
agreeing to respond to requests for information
• Collaboration: the act of working together in a joint operation
and sharing information, but not granting open access – agency
participation in a task force or a memorandum of understanding
being examples of this
• Community: the act or process of openly sharing resources or
information among several entities with some restriction
• Consolidation: the act or process of sharing information without
restriction
• Merger: the fusion of disparate entities into a single entity.
It is important to note that this type of integrative scheme, regardless of the
level, elicits claims of privacy invasions and civil-rights violations. Perhaps
ironically, the consolidation of data or the creation of Big Data can actually
minimise incursions into personal privacy. Counterintuitively, this shrinks
government and mitigates violations.
3. ICE, Trade Transparency Unit, , accessed 18
July 2014.
In the US, when the Founding Fathers were looking at how to configure
the new union, the initial scheme was decentralised, or ‘anti-federalist’
in the parlance of the era. This resulted in the drafting of the Articles of
Confederation and the creation of thirteen disparate, EU-style states. This
scheme ultimately failed, mostly because individual state interests trumped
the collective good. More on point was that it led to the institution of thirteen
different policies and programmes. Today, there are fifty states and if the
US were still decentralised, there would be fifty different, and potentially
incongruous, regulatory programmes. A positive derivative of integration is
that it organises the collection, storage, analysis and dissemination of Big
Data, thus making the work product more effective and the process more
efficient, capable, responsive and trustworthy. Therefore, privacy and civil
rights infringements are not inherent to Big Data schemes. Ideally, the other
negative characteristics should be minimised as well.
The need for a shared vision and for unified approaches is even more
apparent in the digital age, where information is exponentially propagated
with each passing day. Couple this with advances in web technology whereby
users can remain anonymous, or at least pseudonymous. Currently, the
Darknet and the use of cryptographic algorithms present emerging threats
and create a new dimension for traditional criminal enterprises. The joint
investigation into the online black market site Silk Road, headed by HSI and
the FBI, provides a good example of this.
HSI recognises this and that there will be future, presently unconceived,
advances in criminal practice. Such spectres are daunting, but not intimidating
or insurmountable when the infrastructure and partnerships to confront
them already exist. This is the message of HSI.
Greg Mandoli is a special agent with the Department of Homeland Security’s
Homeland Security Investigations (HSI) and is currently assigned to the US
Embassy in London. His related professional activities include eight years
as an army reservist in the Judge Advocate General Corps and positions at
the University of Maryland as a Course Developer and Adjunct Associate
Professor. In 2006, Greg became the first HSI agent to graduate from the
Naval Postgraduate School’s Master of Arts programme in Homeland Defense and
Security. In 1994, Greg graduated from Golden Gate University School of
Law with the recognition of a public-interest law scholar. Before becoming a
special agent, Greg practised law as a Deputy Public Defender in California
where he handled felony matters involving homicide, three strikes, gang and
drug offences.
This paper is a summary of topics presented by Homeland Security
Investigations Special Agent and University of Maryland Adjunct Associate
Professor Gregory Mandoli at the RUSI/STFC event ‘Big Data for Security
and Resilience: Challenges and Opportunities for the Next Generation of
Policy-Makers’. The paper represents his personal viewpoints and is partially
based on previously authored materials.
IV. Characteristics of Terrorist Finance Networks: The Human Element
Neil Bennett<br />
In Chapter III, Gregory Mandoli writes about Big Data and financial
transactions, and this paper will return to the subject while taking a slightly
different perspective – focusing on the benefits of linking money flows to
human behaviour and human activity. This will show how analysis of data
by social scientists as well as data analysts can support the identification of
important individuals within a network.
The aim of this conference has been to understand how academics,
researchers and policy-makers can utilise Big Data. As noted, definitions of
Big Data commonly refer to the four Vs: volume, variety, velocity and veracity. 1
This paper will attempt to give a perspective from the operational, end-user
requirement: that a wide variety of data are available in ever-increasing
volumes. This presents challenges as to how those data are stored, by whom
they are analysed, and why. The paper will describe operational challenges
faced by law enforcement and defence, focusing on the operational
opportunities and outputs.
The end goal of data analysis is improved efficiency. Efficiency is effectiveness
driven by an exploitation path towards the operational outcomes and, in turn,
towards end use. Financial intelligence and terrorist finance can be used as the
lens through which this process is viewed. Why finance? Law enforcement,
defence, the UK government as a whole, as well as other governments around
the world, see financial interventions as interventions of first choice in the
fight against international crime. A perfect example of this is the recent case
involving Ukraine, 2 in which the UK, together with the EU, imposed restrictive
measures – financial sanctions – upon eighteen (later extended to twenty-two)
Ukrainian former regime members for misappropriation of Ukrainian
state funds. The sanctions prevented the politicians from accessing assets
or funds held by European financial institutions, a significant move as many
Ukrainian and Russian politicians hold money in accounts in Luxembourg
and the Netherlands in particular.
This paper will focus specifically on the alternative remittance system Hawala.
It is worth bearing in mind here that the system of banking we recognise as
official today was first established in the 1700s, and at its oldest can only
be traced back as far as the financial institutions of Italy in the fourteenth
century. As Hawala arose in the early Medieval period, which system should
be categorised as the ‘alternative’ remittance system is open to debate.
1. IBM, ‘The FOUR V’s of Big Data’, ,
accessed 9 July 2014.
2. HM Treasury, ‘Financial Sanctions, Ukraine (Misappropriation and Human Rights)’,
15 April 2014, ,
accessed 9 July 2014.
There is nothing wrong with Hawala. It is the abuse of Hawala, not the system
itself, that causes issues: those using Hawala systems are not automatically
guilty. Therefore, the question to ask is how the manipulation of data obtained
from Hawala transactions can assist policy-makers and law enforcement in
making the right decisions about the right people and the right entities to
pursue. This requires a number of different issues to be considered: behaviour,
attitude, language, customs, values, beliefs, influence, institutions, power –
political, economic, legal – social structures, clan, tribe and ideology. These
issues are critical to understanding the situation in areas of the developing
world in which law enforcement or defence is expected to operate, often
in support of the Foreign Office, DfID and the Stabilisation Unit. If that
understanding is not in place from the beginning, the decision-making
process may be flawed. Complex human and cultural dimensions play a
large part in any decision-making cycle, and there are both inter- and
intra-dependencies between these and the cultural, institutional, technological
and physical environment.
Understanding Trust in Networks
Individuals who are involved with and control money or value are extremely
highly trusted within a network, but how can these levels of trust be assessed,
identified and quantified? A Hawala transaction may move across South
Asia, through Afghanistan, Pakistan, Iran and the Gulf States. Each stage of
the transaction will include different languages, currencies and methods
of communication, including fax, e-mail, mobile-phone calls and Internet
communications. The human activity taking place within those environments
is incredibly complex even before the Hawala aspect is considered, and
comprises unstructured information and inherent knowledge as well as
data. Understanding this complexity is critical to understanding the
decision-making cycle. What information is in the ledgers and what does this actually
mean – all the while bearing in mind that the information may be on paper,
rather than in electronic format?
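The paper does not prescribe a method for quantifying trust. As one crude, purely illustrative proxy, the Python sketch below takes ledger entries that have already been transcribed into (sender, receiver, amount) tuples and counts, for each participant, the number of distinct counterparties and the total value handled. The participant labels, the amounts and the function names are hypothetical, and value handled is an assumption standing in for trust, not a validated measure of it.

```python
from collections import defaultdict


def participant_profile(transfers):
    """For each participant, return (number of distinct counterparties,
    total value sent or received). `transfers` is an iterable of
    (sender, receiver, amount) tuples."""
    counterparties = defaultdict(set)
    value_handled = defaultdict(float)
    for sender, receiver, amount in transfers:
        counterparties[sender].add(receiver)
        counterparties[receiver].add(sender)
        value_handled[sender] += amount
        value_handled[receiver] += amount
    return {
        node: (len(counterparties[node]), value_handled[node])
        for node in counterparties
    }


def rank_by_value(profile):
    """Order participants by value handled (largest first), then by label."""
    return sorted(profile, key=lambda node: (-profile[node][1], node))


# Illustrative, made-up ledger entries.
transfers = [
    ("remitter_1", "broker_A", 500.0),
    ("remitter_2", "broker_A", 300.0),
    ("broker_A", "broker_B", 800.0),
    ("broker_B", "recipient_1", 800.0),
]

profile = participant_profile(transfers)
ranking = rank_by_value(profile)
```

In this toy ledger the two brokers handle the most value and rank first, but the numbers say nothing about language, kinship or custom, which is why the text argues that the human context has to be read alongside the data.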
Another key question to ask is why we look at money, and in what context
we look at it. It is important here to understand two particular elements.
First, there is the threat environment, which is a combination of interacting
elements. We need to understand the systematic dimension of the threat
from cradle to grave, all along the line of process, during which there will be
different data in different formats: structured, unstructured, paper or digital.
In order to understand the system, all those different inputs need to be
understood and made sense of. Ultimately, this will require an enterprise of
transnational co-operation: one analyst, or even one organisation, cannot do
everything and therefore an approach is needed that allows the generation
of best impact and best effort.
One of the ways in which this is done is by breaking the process out into
functions, understanding what the vulnerabilities are and understanding
what actions are needed to generate effect against the vulnerabilities that
have been identified as critical within the system or enterprise. This is only
possible if a huge variety of data can be understood, which in turn requires
data to be taken in from all of the different environments.
The critical issues here are impact and benefit analysis: whether those data
can be used to predict what may occur, and what particular action should be
taken in response. This requires a huge range of hard and soft factors to be
considered, including:
• Social media<br />
• High-performance analytics
• Sociology<br />
• Link and entity extraction
• Natural language processing<br />
• Anthropology<br />
• Semantic search<br />
• Node disambiguation<br />
• Graph databasing<br />
• Pattern and prediction
• Visualisation<br />
• Fuzzy link analysis<br />
• Machine learning<br />
• Psychology<br />
• Web science<br />
• Predictive modelling<br />
• Linguistic analysis in microblogs.<br />
Industry and academia are approaching the challenges from all of these
angles, and are working on ways to ensure that they work more coherently
together. Sociology, anthropology and psychology are included in the above
list as three disciplines that academics working in machine learning, in
particular, have indicated that they do not always consider. They feel that
they would be well served by a better understanding of how their work could
benefit from or impact on some of these disciplines – in particular psychology.
Certainly in the case of link and entity extraction (which extracts key entities
such as names, locations, terms and dates and links them together), language
processing and semantic search, they would benefit from considering the
issues again from human, cultural and behavioural perspectives. How do
data analysts really understand what is going on in those environments,
so that the right decisions can be made based on the right interpretation
of information? Without this, bad decisions may be made that deliver an
inappropriate and even damaging intervention, be it tactical, operational
or strategic, because those making the decision did not understand human
behaviour.
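As a rough illustration of link and entity extraction as described above, the Python sketch below pulls names, locations and dates out of free text and links entities that appear in the same record. The word lists, the placeholder names, the sample records and the function names are all hypothetical; production systems rely on trained language models rather than fixed lists.

```python
import re
from collections import Counter
from itertools import combinations

# Hypothetical word lists standing in for a trained entity recogniser.
KNOWN_NAMES = {"Alpha", "Bravo"}
KNOWN_LOCATIONS = {"Afghanistan", "Pakistan", "Iran"}
DATE_PATTERN = re.compile(
    r"\b\d{1,2} (?:January|February|March|April|May|June|July|August"
    r"|September|October|November|December) \d{4}\b"
)


def extract_entities(text):
    """Return a set of (kind, value) pairs found in one record."""
    entities = {("date", match) for match in DATE_PATTERN.findall(text)}
    for token in re.findall(r"[A-Za-z]+", text):
        if token in KNOWN_NAMES:
            entities.add(("name", token))
        elif token in KNOWN_LOCATIONS:
            entities.add(("location", token))
    return entities


def link_entities(records):
    """Count how often each pair of entities occurs in the same record."""
    links = Counter()
    for text in records:
        for pair in combinations(sorted(extract_entities(text)), 2):
            links[pair] += 1
    return links


# Illustrative, made-up records.
records = [
    "Alpha sent funds from Afghanistan on 3 March 2014.",
    "Bravo collected in Pakistan; Alpha confirmed from Afghanistan.",
]

links = link_entities(records)
```

Exact matching of this kind treats two transliterations of the same name as two different people and knows nothing about naming customs, which is one concrete reason the human and cultural perspective matters for these techniques.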
There is a huge amount of powerful, fascinating work that can be done with
large, structured data sets, but the real challenge is ensuring that the
high-performance analytics that support it also get faster. The ability to do
this mainly depends on how the unstructured feeds can be integrated, be these
from social media or from the human characteristics identified by psychology,
anthropology and sociology that actually allow the development of a
holistic perspective of predictive modelling. Finance is a good way of trying
to understand certain characteristics of human behaviour, and so, from a
research perspective, would be a good place to start building up a better
understanding of not only the finance networks themselves, but also the
terrorist and criminal networks that sit behind them.
Neil Bennett has moved on from his role since this conference, and RUSI
has not been able to contact him to approve this paper for publication. We
therefore apologise for any errors it may contain, and stress that these are
the responsibility of the editorial process, not the speaker.
V. Terrorism and Political Risk Modelling
Mark Lynch<br />
This paper will discuss the insurance industry’s assessment of, and dealings
with, Big Data, using the specific example of risk modelling around the threat
and likely impacts of political violence. It will provide a brief overview of how the
insurance industry approaches the challenge of political violence, including
how analytics have started to become a far more dominant component of
this, and will show how Big Data is starting to filter in. It will then explore
some of the challenges that are starting to emerge.
The approach of the insurance industry to political violence is a particularly
interesting example to consider as it resembles a microcosm of how business
in general has dealt with Big Data, and also because the lessons learned by the
insurance industry have a lot to offer other sectors with regard to resilience.
Without insurance, the impact of a terrorist attack or widespread political
violence would be greatly amplified. Terrorist violence can damage health,
property and vehicles; result in interruption to businesses; and require
compensation payments to those affected. Insurance is integral to enabling
the reconstruction of buildings after attacks, facilitating payments to the
families of those killed and seriously injured, and ensuring that victims are
able to access disability benefits and other services as quickly as possible.
As a result, the insurance market provides a fundamental component
of resilience in an increasingly interconnected world. 1 Furthermore, the
insurance industry holds a lot of Big Data, as it is very useful to the market
to understand the composition of claims and the spread of insurance, and
to identify indices that would establish whether an individual is more likely
to present a higher risk of losses (not exclusively tied to terrorism insurance).
For example, it can be used to look at which areas were not insured or what
claims were made for post-traumatic stress disorder following a terrorist
attack. These aspects could be extremely useful in helping to make future
resilience assessments, as they can help to highlight where vulnerabilities
are more likely to occur. Such data held by the insurance industry could
provide a rich vein of information to academic, medical and governmental
organisations if greater interaction were prioritised.
What Constitutes Political Violence?<br />
The insurance industry has very specific terminology and definitions of
what constitutes political violence. First, it segregates political violence into
three components that can be insured individually or in conjunction with
others: terrorism or sabotage; strikes, riots and civil commotion; and war
on land. This categorisation becomes very important when writing exemptions
and incorporating them into recovery. Each area offers a distinct challenge to
the insurance industry as a result.
1. Claudia Aradau and Rens van Munster, ‘Insuring Terrorism, Assuring Subjects, Ensuring
Normality: The Politics of Risk after 9/11’, Alternatives: Global, Local, Political (Vol. 33,
No. 2, 2008), pp. 191–210.
Political violence affects multiple business lines that are vital for the insurance
industry. For example, if there was a blast at a theatre in London, the loss of
revenue in the subsequent two years as a result of people not wanting to go
to the theatre out of fear could have a massive effect. The market calls this
contingency insurance and as a result the insurance market can be obliged to
pick up the losses. Indeed, a number of studies have identified the material
effect of terrorism on the tourism industry as international travellers are more
likely to avoid perceived high-risk areas. 2 Similarly, business interruption
has proven to be a key driver of terrorism losses for the insurance
industry and was the main driver of the losses the industry
suffered in the wake of 9/11. 3 The sheer multitude of claims that were paid
by the insurance sector following the 9/11 attacks highlights just how many
disparate areas terrorism can affect within the insurance market.
The overarching factor is that there are many unknowns and therefore
many risks can occur that can have a harmful effect on the market. This is a
particular challenge at present, when the insurance market is trying to expand
into emerging markets such as Asia and Africa, where knowledge levels
on these kinds of risks are very limited. Ironically, as the insurance market
penetrates further into emerging markets, it needs to be able to calculate
its own insurance policies, based on an understanding of the risks likely to
be encountered. Indeed, among the fastest growing insurance markets in
the world, seventeen out of twenty have suffered from either a sustained
terrorism threat or from intensive rioting or civil commotion over the last
ten to fifteen years. 4 Therefore, as insurance markets grow, the emphasis on
understanding this risk will grow significantly.
New Approaches to Risk<br />
The greatest catalyst for the insurance industry in approaching these
challenges was the 1993 Bishopsgate bomb attack by the IRA. That blast,
which killed only one person but caused about £3-billion-worth of damage,
almost crippled the whole sector. 5 Prior to this disaster, the sector had used
very limited analytical capabilities to quantify the threat: companies bought
insurance without a keen understanding of the accumulations of risk that
were developing. This is because terrorism operates in a unique manner
compared with traditional perils such as earthquakes, floods and hurricanes
that the market is used to dealing with regularly. 6 Terrorism is an intensive,
highly localised threat that requires a keen understanding of the proximity of
risks to each other and to the epicentre of a given blast, which is something
that was quite new to the insurance market. As a result, a number of companies
retained large clusters of policies around central London, and following the
Bishopsgate bomb attack many were unable to cover the losses stemming
from it. This was exacerbated by 9/11, which cost the industry £22 billion, a
figure which is only going to grow owing to further claims for debris inhalation
and continuing incapacity and post-traumatic stress disorder claims. 7 This is
a severe issue for the insurance market.
2. Sevil Sonmez and Alan R Graefe, ‘Influence of Terrorism Risk on Foreign Tourism
Decisions’, Annals of Tourism Research (Vol. 25, No. 1, 1998), pp. 112–44.
3. Dixon, Lloyd and Kaganoff.
4. Ernst and Young, Waves of Change: The Shifting Insurance Landscape in Rapid-Growth Markets,
2014, ,
accessed 19 July 2014.
5. Andrew Silke, ‘Understanding Terrorism Target Selection’, in A Richards, P Fussey and
A Silke (eds), Terrorism and the Olympics: Major Event Security and Lessons for the
Future (London: Routledge, 2010), pp. 49–66.
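The accumulation problem described above, where many policies cluster close to one another and to a possible blast epicentre, can be sketched in Python. This is an assumption-laden illustration and not an insurer's actual model: the policy records, coordinates, sums insured (in hypothetical £ millions) and the 400-metre radius are invented, and each insured location is simply treated as a candidate epicentre.

```python
import math


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two latitude/longitude points."""
    earth_radius_m = 6_371_000.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * earth_radius_m * math.asin(math.sqrt(a))


def exposure_within(policies, centre, radius_m):
    """Total sum insured for policies located within `radius_m` of `centre`."""
    lat0, lon0 = centre
    return sum(
        p["sum_insured"]
        for p in policies
        if haversine_m(lat0, lon0, p["lat"], p["lon"]) <= radius_m
    )


def worst_accumulation(policies, radius_m):
    """Largest exposure found when each insured location in turn is
    treated as the epicentre of a blast of the given radius."""
    return max(
        exposure_within(policies, (p["lat"], p["lon"]), radius_m) for p in policies
    )


# Illustrative, made-up portfolio: two sites close together, one far away.
policies = [
    {"id": "P1", "lat": 51.5155, "lon": -0.0810, "sum_insured": 40.0},
    {"id": "P2", "lat": 51.5160, "lon": -0.0815, "sum_insured": 25.0},
    {"id": "P3", "lat": 51.5040, "lon": -0.0190, "sum_insured": 60.0},
]

worst = worst_accumulation(policies, radius_m=400)
isolated = exposure_within(policies, (51.5040, -0.0190), radius_m=400)
```

The largest single policy is `P3`, yet the worst accumulation comes from the two smaller neighbouring policies combined, which is the kind of clustering the text says went unnoticed before Bishopsgate.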
The Impact on Society<br />
Any attack is terrible; however, the misery stemming from an attack can
be compounded greatly if there is no financial restitution for individuals
to cover medical bills or the reconstruction of their businesses or homes. 8
If the insurance market thinks political risk and terrorism are too risky, it
will not offer insurance against them or it will put exemptions into insurance
policies so that those affected will have to pay for damages themselves.
Such exemptions may cover certain types of incidents or certain areas that
are seen as being at high risk, or providing cover for these examples may
push up premiums considerably. Insurance companies see this frequently
with the issue of chemical and biological weapons: because there are various
unknown factors in this field, insurance companies are reluctant to include
this within their coverage as there are too many uncertainties associated
with such an attack. The resilience challenge is significant, as the lack of
available insurance may be more to do with a lack of analytical capabilities
to assess these risks properly than a genuine inability to calculate likely risks
and their impacts.
As a result, it is clear that better interaction with the insurance sector is key
to providing a holistic approach to resilience. Government agencies should
seek to avoid a situation comparable to earthquake cover in California,
where market penetration has historically been extremely low as the cost of
insurance is prohibitively high for most people and insurers are reluctant to
be over-exposed within the California area. In order to avoid this, a number
of governments, including those of the UK, US and Germany, have provided
at least partial state backstops against terrorist attacks. 9 However, it is only
through better analysis and a grasp of Big Data that the insurance sector can
truly feel comfortable with the risk of terrorism.
6. H Kunreuther, E. Michel-Kerjan and B. Porter, Assessing, Managing, and Financing
Extreme Events: Dealing with Terrorism (Cambridge, MA: National Bureau of Economic
Research, 2003).
7. Gail Makinen, Economic Effects of 9/11: A Retrospective Assessment (New York, NY:
DIANE Publishing, 2011).
8. R Roth Jr, Earthquake Insurance Protection in California (Washington, DC: Joseph
Henry Press, 1988).
Incorporating Big Data into Analysis
In previous years, risk analysis could often be a case of underwriters
assessing the risk based on a preconceived understanding of political unrest
following an extremely rudimentary analysis. However, over the last twenty
years this has changed considerably. Insurance companies have started
to hire scientists, statisticians and security experts who are able to better
incorporate data into analysis. The risks from natural hazards, such as
hurricanes or earthquakes, are now well understood by the market, and the
understanding of political risk needs to reach similar levels in the future.
Political risk is not only a less well-understood subject area, but there are<br />
also many more variables involved as it is a more qualitative subject. In order<br />
to incorporate analysis into political risk calculation, it is necessary to delve<br />
into the historical records as well as simply looking at present-day data.<br />
For example, Aon Benfield’s 2014 Interactive Political Risk Map shows how<br />
different countries’ histories of rioting <strong>and</strong> civil commotion over the past ten<br />
to fifteen years can be mapped out <strong>and</strong> analysed. 10 Analysts need to be able<br />
to identify different patterns <strong>for</strong> different regions, <strong>and</strong> subsequently provide<br />
this in<strong>for</strong>mation to the insurance industry in order to flag up certain areas<br />
that are at greater risk than others. It is very easy to look back <strong>and</strong> decipher<br />
the risks at certain points in history, but the key progression would be to look<br />
<strong>for</strong>ward. A very important part of this research is trying to pull data <strong>for</strong> GDP<br />
statistics <strong>and</strong> mortality <strong>for</strong> regions, <strong>and</strong> to see how those variables fit with<br />
incidences of political violence. Indeed, a number of statistical studies have<br />
shown that variables such as unemployment and, in particular, infant<br />
mortality can be strong indicators of political unrest, particularly if there is a<br />
significant statistical shift. 11 Furthermore, historical analysis is extremely<br />
useful for terrorism modelling, another area where there is<br />
much variation and uncertainty, particularly if plots and failed attacks are<br />
included.<br />
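The comparison of socio-economic variables with incidences of political violence can be illustrated with a simple correlation. The following is only a minimal sketch and not the method used by any insurer or by the studies cited: the function name `pearson` is hypothetical, and all figures are invented purely for illustration.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length (>= 2)")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


# Hypothetical regional figures (illustrative only, not real data).
infant_mortality = [12.0, 25.0, 40.0, 55.0, 70.0]  # deaths per 1,000 live births
violent_incidents = [1, 3, 4, 8, 9]                # recorded events per year

r = pearson(infant_mortality, violent_incidents)
print(f"correlation: {r:.2f}")
```

A coefficient close to 1 or -1 would only flag a variable as worth closer study; it says nothing about cause, and a forward-looking model would need far more than a single pairwise comparison.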
9. Alfonso Najera, Terrorism Coverage Schemes: A Comparative Table, 2011, , accessed 12 July 2014.<br />
10. Aon Risk Solutions, ‘Aon’s 2014 Interactive Political Risk Map’, 2014, , accessed 14 July 2014.<br />
11. J A Goldstone et al., ‘A Global Model <strong>for</strong> Forecasting Political Instability’, American<br />
Journal of Political Science (Vol. 54, No. 1, 2010), pp. 190–208.
Extracting Useful <strong>Data</strong> <strong>and</strong> Avoiding Biases in Analysis<br />
Good data do exist if analysts are fully aware of the advantages they offer<br />
<strong>and</strong> how to use them. The Global Terrorism <strong>Data</strong>base 12 <strong>and</strong> the RAND<br />
Corporation 13 are two examples of highly respected organisations that take<br />
data accumulation <strong>and</strong> analysis seriously: both are heavyweight analysts of<br />
terrorism history <strong>and</strong> can provide a wealth of statistical data <strong>for</strong> the security<br />
<strong>and</strong> insurance sector.<br />
However, some of the challenges the insurance industry faces involve more<br />
quantitative data analysis methods. For example, there is a major issue<br />
surrounding the quality of data that the insurance companies themselves<br />
hold; they have vast amounts of data but not all of them are useful,<br />
particularly in emerging markets where geographical data are sparse. This<br />
makes a big difference when attempting to underst<strong>and</strong> the risk; terrorism or<br />
even political violence is often a very localised threat, thus a correspondingly<br />
localised underst<strong>and</strong>ing is needed, from units of measurement to political<br />
parties to currency <strong>and</strong> financial transactions. Similarly, there are a<br />
number of internal constraints holding the industry back from providing a<br />
comprehensive underst<strong>and</strong>ing of the risk. Insurance companies do not share<br />
data with each other <strong>and</strong> as a result it is difficult to get a holistic picture of<br />
the degree of terrorism coverage, or indeed the types of clients taking up<br />
this cover. This is important because what is covered, what sort of<br />
policies are in place and the size of the clients themselves all have a material<br />
impact on the ability of a state to recover following a catastrophic event.<br />
Furthermore, there are many challenges regarding privacy, <strong>and</strong> governments<br />
or companies who are unwilling to provide sufficient data. Even where the<br />
data do exist, it is difficult to know how to sell them to the client to enable<br />
them to be used appropriately.<br />
Conclusions <strong>and</strong> Recommendations<br />
It is a great shame that insurance companies do not tend to have the<br />
means to look at these data analytically; if they did, their analytics would<br />
be substantially better, particularly those on loss history. There are a lot<br />
of data out there on the length of time people are off injured following a<br />
terrorist attack, the effect of post-traumatic stress disorder, the variation per<br />
country or the amount of time it takes <strong>for</strong> a business to recuperate after an<br />
attack. All of this would be extremely useful <strong>for</strong> the academic <strong>and</strong> scientific<br />
communities. The insurance industry would greatly benefit from the opening<br />
up of governments’ empirical data <strong>and</strong> greater involvement of government<br />
on the level of security clearance. The insurance sector is already heavily<br />
regulated on data security, owing to the sensitive financial data that it<br />
holds, so it would not be a giant leap to allow certain key representatives<br />
in the insurance industry clearance to access <strong>and</strong> dissect certain pieces<br />
12. Global Terrorism <strong>Data</strong>base, , accessed 14 July 2014.<br />
13. RAND, , accessed 14 July 2014.
of classified material. While there may be concern among the public that<br />
private companies can access sensitive data <strong>and</strong> that their private insurance<br />
data could be looked at by the security services, these fears are largely<br />
unfounded. Most data that the insurance industry have are aggregated to a<br />
policy level so personal data about individual names or addresses are usually<br />
unavailable. Similarly, as long as the security services can vet individuals<br />
accessing the data <strong>and</strong> keep the numbers down, additional access to secure<br />
material should not be a significant hindrance.<br />
As a result, the interaction between government <strong>and</strong> the academic <strong>and</strong><br />
insurance sectors could be extremely rewarding. Each sector has processed<br />
a significant amount of data points <strong>and</strong> analysis that is simply unavailable<br />
to the others. Providing analytical information about the primary threat levels,<br />
changing dynamics and targeting analysis to the insurance sector would<br />
allow the market to avoid the hyperbole that was witnessed in the insurance<br />
market following the 9/11 attacks. Conversely, information on market<br />
coverage and the quantification of the time taken to recover from an attack,<br />
whether in terms of business interruption, medical rehabilitation or international<br />
comparisons, is held by the insurance sector and would be incredibly<br />
useful for government and academic partners.<br />
<strong>Big</strong> <strong>Data</strong> has begun to play a much more prominent role in the insurance<br />
industry. Whether this will have a positive or negative impact <strong>for</strong> clients is<br />
uncertain, but this has begun to be a more accepted branch of science among<br />
the community. Greater co-operation stimulated between the business <strong>and</strong><br />
academic communities <strong>and</strong> government would enable a greater impact to<br />
be made in this field.<br />
Mark Lynch is Head of Impact Forecasting’s Terrorism and Political Violence<br />
Modelling Team. He has a background in international security analysis <strong>and</strong><br />
counter-terrorism <strong>and</strong> is responsible <strong>for</strong> the composition of <strong>and</strong> academic<br />
input into Impact Forecasting’s human-security catastrophe models, including<br />
terrorism, rioting <strong>and</strong> drug cartel violence. Mark has a Master’s degree in<br />
International <strong>Security</strong> Studies from the Centre <strong>for</strong> the Study of Terrorism<br />
<strong>and</strong> Political Violence <strong>and</strong> he previously worked in the Royal United Services<br />
Institute’s National <strong>Security</strong> <strong>and</strong> <strong>Resilience</strong> Department. He has also worked<br />
with the London School of Economics, analysing violent manifestations of<br />
nationalism <strong>and</strong> has been published on the changing nature of nationalist<br />
<strong>and</strong> Islamic fundamentalist terrorism in the twenty-first century.
VI: Intelligent Use of Electronic <strong>Data</strong> to Enhance<br />
Public Health Surveillance<br />
Edward Velasco<br />
The exchange of health in<strong>for</strong>mation on social media <strong>and</strong> the Internet would<br />
appear to offer obvious opportunities to gain insight into emerging disease<br />
outbreaks. With new initiatives like Google Flu Trends 1 <strong>and</strong> HealthMap, 2 there<br />
are now more ways than ever be<strong>for</strong>e to monitor outbreaks. This paper will<br />
explore the opportunities these initiatives offer to public health practitioners<br />
trying to detect emerging diseases in their regions.<br />
From an epidemiological perspective, there are obvious advantages to<br />
decreasing the time needed to detect an infectious disease health event,<br />
so that appropriate prevention or mitigation measures can be undertaken<br />
as quickly as possible. The existing types of public health surveillance<br />
systems are indicator-based <strong>and</strong> event-based surveillance. Indicator-based<br />
surveillance, the oldest <strong>and</strong> most commonly found system, is widely used by<br />
regional, national <strong>and</strong> international public health agencies. These systems<br />
are designed to collect <strong>and</strong> analyse structured data, based on protocols<br />
tailored to each disease, including calculating the incidence, seasonality <strong>and</strong><br />
burden of disease. Their goal is to find increased numbers or clusters that<br />
might indicate a threat. However, there is generally a time lag between the<br />
occurrence of an event and its detection by indicator-based surveillance, so these<br />
systems cannot detect potential threats quickly. In addition,<br />
they are not equipped to detect new or unexpected disease occurrences<br />
because they only collect predefined epidemiological attributes for each<br />
disease. This is why, for example, the first cases in Asia of Severe Acute<br />
Respiratory Syndrome Coronavirus (SARS-CoV), at the time a new strain of viral infection,<br />
were not detected sooner. 3<br />
Instead of relying on official reports, event-based surveillance in<strong>for</strong>mation<br />
is obtained directly from witnesses of real-time events or indirectly from a<br />
variety of communication channels, including social media <strong>and</strong> established<br />
alert systems, as well as from in<strong>for</strong>mation channels such as the news<br />
media, public health networks <strong>and</strong> NGOs. Because it occurs in ‘real time’,<br />
event-based surveillance can identify events faster than indicator-based<br />
surveillance <strong>and</strong> can identify new events that will not be picked up by<br />
indicator-based surveillance. Health in<strong>for</strong>mation monitored via the Internet<br />
<strong>and</strong> social media is an important part of event-based surveillance, <strong>and</strong> is<br />
1. Google Flu Trends, , accessed 14 July 2014.<br />
2. Healthmap, , accessed 14 July 2014.<br />
3. C Castillo-Salgado, ‘Trends and Directions of Global Public Health Surveillance’,<br />
Epidemiologic Reviews (Vol. 32, No. 1, 2010), pp. 93–109.
most often the focus of existing event-based surveillance systems. Research<br />
has shown that event-based surveillance identifies trends comparable to<br />
those found using established indicator-based surveillance methods. 4 In<br />
practice, however, event-based surveillance systems have not been widely<br />
accepted <strong>and</strong> integrated into mainstream use by national <strong>and</strong> international<br />
health authorities, mainly because they have not yet been systematically<br />
evaluated within a public health agency.<br />
From 2010 to 2012, the Robert Koch Institute, the national public health<br />
agency of Germany, participated in a multidisciplinary scientific consortium<br />
to develop novel methods <strong>for</strong> an event-based surveillance tool to be<br />
integrated into infectious disease monitoring alongside other national<br />
surveillance activities. The multinational team produced a web-based<br />
plat<strong>for</strong>m (the Medical Ecosystem or M-Eco) 5 to develop technologies that<br />
are new to event-based surveillance and, as an earlier literature review<br />
confirmed, have not yet featured in existing systems.<br />
These include content analysis using enhanced data processing<br />
(including stemming, POS 6 tagging and named entity recognition) and data<br />
collection from various user-generated content resources – including social-media<br />
content (such as Twitter) and radio and TV media transmissions<br />
(transcripts provided by a special media service).<br />
Detection mechanisms were developed to scan the Internet continuously<br />
<strong>for</strong> these media types, based on the simple semantic (disease names <strong>and</strong><br />
symptoms) <strong>and</strong> statistically relevant (search algorithms) epidemiological<br />
requirements that were deemed critical <strong>for</strong> the surveillance of different<br />
infectious diseases. Development of these functionalities resulted in a<br />
‘search function’ on a web-based user interface that enabled epidemiologists<br />
to monitor ‘mentions’ of diseases <strong>and</strong> symptoms on Twitter <strong>and</strong> news media<br />
(fed via a news aggregate technology) over time, geo-located where possible<br />
to enable comparison with other sources of epidemiological in<strong>for</strong>mation,<br />
including st<strong>and</strong>ard governmental infectious disease surveillance <strong>and</strong><br />
monitoring.<br />
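The keyword-based monitoring of ‘mentions’ described above can be sketched in a few lines. This is a simplified illustration and not the M-Eco implementation: the keyword list, the function names `find_mentions` and `count_mentions_by_day` and the example posts are all assumptions made for the purpose of the example.

```python
import re
from collections import Counter

# Hypothetical keyword list; the real M-Eco term lists are not given in the text.
KEYWORDS = {"measles", "influenza", "flu", "fever", "salmonellosis"}


def find_mentions(text, keywords=KEYWORDS):
    """Return the set of keywords that occur as whole words in the text."""
    tokens = set(re.findall(r"[a-z]+", text.lower()))
    return tokens & set(keywords)


def count_mentions_by_day(posts, keywords=KEYWORDS):
    """Count posts mentioning each keyword, keyed by (date, keyword)."""
    counts = Counter()
    for day, text in posts:
        for keyword in find_mentions(text, keywords):
            counts[(day, keyword)] += 1
    return counts


# Invented example posts as (date, text) pairs, for illustration only.
posts = [
    ("2012-06-10", "Three measles cases reported at the local school"),
    ("2012-06-10", "Stuck in bed with flu and a fever"),
    ("2012-06-11", "Football fever is everywhere today!"),
]

for (day, keyword), n in sorted(count_mentions_by_day(posts).items()):
    print(day, keyword, n)
```

The third post is counted as a mention of ‘fever’ even though it has nothing to do with illness, which is the kind of false match a plain keyword search cannot exclude on its own.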
Automated technologies provided signals <strong>for</strong> the risk assessment of<br />
infectious disease events to public health epidemiologists in a user-friendly,<br />
rapid <strong>and</strong> easy way. Lastly, policy concerns regarding the integration of<br />
the developed technologies <strong>for</strong> existing public health infectious disease<br />
4. S Doan et al., ‘Global Health Monitor – A Web-Based System <strong>for</strong> Detecting <strong>and</strong><br />
Mapping Infectious Diseases’, 2007; D M Hartley et al., ‘An Overview of Internet<br />
Biosurveillance’, Clinical Microbiology <strong>and</strong> Infection (Vol. 19, No. 11, 2013), pp. 1006–<br />
13; J P Linge et al., ‘Internet Surveillance Systems <strong>for</strong> Early Alerting of Health Threats’,<br />
Euro Surveillance: European Communicable Disease Bulletin (Vol. 14, No. 13, 2009).<br />
5. M-Eco, , accessed 14 July 2014.<br />
6. Part of speech tagging. An explanation of this process can be found at , accessed 14 July 2014.
surveillance infrastructures were explored. Integrating these technologies<br />
into the surveillance software of the German national health institute<br />
was a goal, with the potential to scale up to other countries based on this<br />
experience.<br />
Figure 2: Components <strong>and</strong> processing pipeline of the M-Eco system.<br />
Evaluation of the Prototype System <strong>for</strong> Event-Based Surveillance<br />
The first of a series of three evaluations attempted to illustrate how well<br />
the system generates signals <strong>for</strong> potential events of public health interest.<br />
A simulation with Twitter was conceived, where thirteen scientists created<br />
tweets <strong>for</strong> three mock infectious disease event scenarios within the<br />
simulation:<br />
• An outbreak of measles in a local school<br />
• An outbreak of Salmonellosis among attendees of a European football<br />
championship<br />
• Cases of hepatitis A appearing in travellers returning to Germany<br />
from North Africa.
The mock tweets were fed into the M-Eco technology (Figure 2), which<br />
combined them with real-world tweets that were taken from real users of<br />
Twitter <strong>and</strong> subsequently analysed. There were fewer retrieved signals that<br />
referred to true outbreaks than expected: only around a third (31 per cent)<br />
were relevant, compared with the 75–80 per cent expected by evaluators.<br />
While it is difficult to say whether or not this is because of the lack of actual<br />
tweets matching the three scenarios, the main assumption from these results<br />
was that the chosen keywords <strong>for</strong> identification via search <strong>and</strong> screening<br />
algorithms were not comprehensive enough. This could be because many<br />
tweets are written in a vernacular that does not match the <strong>for</strong>mal medical<br />
terms used in the preset keyword lists used in the automated analyses.<br />
A subsequent evaluation tested the hypothesis that the M-Eco technology<br />
could produce viable signals in real time during a large mass-gathering event.<br />
This assumption was tested during the European Football Championship,<br />
which took place in Pol<strong>and</strong> <strong>and</strong> Ukraine in 2012.<br />
Signals were provided to ‘subscribers’ (epidemiologists at<br />
the Robert Koch Institute and at one state public health agency in Saxony,<br />
Germany), who received daily deliveries of signals that they then monitored<br />
for relevance alongside their regular work. As in the previous evaluation,<br />
fewer signals were produced than expected: only about<br />
twenty signals per day on average. There were 242 signals during<br />
the event and, of these, only thirteen were relevant.<br />
Similar problems with keywords or terms were recorded, such as the use of<br />
vernacular or the non-medical use of terms, for example ‘football fever’, ‘weakness’ of<br />
players’ ability or ‘headache’ from watching poor per<strong>for</strong>mance.<br />
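The proportion of relevant signals follows directly from the figures reported above (thirteen relevant signals out of 242). As a minimal sketch, with the function name `relevant_share` being hypothetical and not part of M-Eco, the calculation is:

```python
def relevant_share(relevant, total):
    """Proportion of generated signals that evaluators judged relevant."""
    if total <= 0:
        raise ValueError("total must be positive")
    return relevant / total


# Figures reported for the 2012 championship evaluation.
share = relevant_share(13, 242)
print(f"{share:.1%} of signals were relevant")  # prints "5.4% of signals were relevant"
```

This measure is often called precision or positive predictive value. At roughly 5 per cent it is well below the 31 per cent found in the simulation.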
An additional evaluation was completed over three weeks in order to measure<br />
the appropriateness of the developed system <strong>for</strong> daily epidemiological<br />
monitoring of infectious diseases <strong>and</strong> related symptoms relevant <strong>for</strong><br />
Germany, using the M-Eco search interface. The evaluation exercise was<br />
based on criteria <strong>for</strong> inclusion <strong>and</strong> exclusion. Diseases that were deemed<br />
to be more prevalent in Germany were chosen (rarely occurring tropical<br />
diseases were not searched, <strong>for</strong> instance). Additionally, priority was given to<br />
those diseases <strong>and</strong> symptoms likely to be discussed in the general population<br />
via social media (because of popularity <strong>and</strong> general ubiquity) or those<br />
less likely to induce social stigma. Diseases that were deemed seasonally<br />
irrelevant <strong>for</strong> the time period, such as Tick-Borne Encephalitis (TBE), which<br />
primarily occurs in the summer months, were excluded. Other diseases were<br />
excluded because they occur so rarely that experts found a high likelihood<br />
of them remaining unmentioned on social media, <strong>for</strong> example Q-Fever, or<br />
because of their uncommon prevalence, or a faster or more severe onset of<br />
disease (<strong>and</strong> there<strong>for</strong>e higher likelihood to be detected by other sources) in<br />
Germany, <strong>for</strong> example Hemorrhagic (West Nile) Fever or tuberculosis.
Search terms were entered into the M-Eco search function. Each search<br />
term was allocated to one of four epidemiologists, <strong>and</strong> evaluators<br />
monitored their terms daily with regards to the number of resulting signals<br />
(matching the search term <strong>and</strong> defined by location – Germany); whether<br />
there was an indication of larger events (an outbreak); whether signals<br />
were relevant to their work; <strong>and</strong> whether the search results were found<br />
in another epidemiological surveillance source. Evaluators also provided<br />
qualitative feedback on their experience during the evaluation, focusing on<br />
the integration of such monitoring into their regular epidemiological work,<br />
general feedback <strong>and</strong> any required improvements.<br />
Signals came in primarily indicating influenza or flu. When graphed over time<br />
by date, it was clear that two large waves of signals came up <strong>for</strong> ‘flu’ <strong>and</strong><br />
coincided with media coverage of ‘flu shots’, namely the fact that Germany had<br />
been experiencing a shortage of vaccination coverage. Another interesting<br />
dip, where no signals at all appeared, coincided with the weekend, indicating<br />
that tweets may have patterns that correspond to days of the week.<br />
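A day-of-week pattern of this kind can be checked by grouping signal dates by weekday. The sketch below is illustrative only: the function name `signals_by_weekday` is hypothetical and the dates are invented, not taken from the evaluation.

```python
from collections import Counter
from datetime import date

WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday"]


def signals_by_weekday(signal_dates):
    """Count signals per weekday name, Monday first, including empty days."""
    counts = Counter(d.weekday() for d in signal_dates)  # Monday == 0
    return {name: counts.get(i, 0) for i, name in enumerate(WEEKDAYS)}


# Invented signal dates for illustration; not data from the evaluation.
signal_dates = [
    date(2012, 11, 5), date(2012, 11, 5), date(2012, 11, 6),
    date(2012, 11, 7), date(2012, 11, 8), date(2012, 11, 9),
    date(2012, 11, 12),
]

for name, n in signals_by_weekday(signal_dates).items():
    print(f"{name:<9} {n}")
```

Days with no signals are reported as zero rather than omitted, so a weekend gap remains visible in the output.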
Figure 3: Monitored signals <strong>for</strong> the search terms in the M-Eco search<br />
function over time.<br />
Note: Black line shows the trend <strong>for</strong> all signals.<br />
The evaluation showed that search terms used primarily by medical<br />
professionals were most prevalent, indicating that more signals might be<br />
derived from the media or reports. A hypothesis made from this finding was<br />
that tweets were mainly from media sources <strong>and</strong> that media tended to break<br />
off at the weekend.
Qualitative evaluation indicated overall acceptance of the concept <strong>for</strong> the<br />
system. Evaluators generally found it impressive that tweets about health can<br />
be monitored. They appreciated that the system provided them with signals<br />
based on aggregated social-media sources, <strong>and</strong> there<strong>for</strong>e allowed <strong>for</strong> easier<br />
<strong>and</strong> faster monitoring of many social-media sources in one place – something<br />
that could not be processed manually within a limited amount of time.<br />
General Discussion Points<br />
The experience with Twitter shows that the total number of signals retrieved<br />
by the prototype was smaller than initially expected throughout evaluation.<br />
This could be due to a smaller overall number of German-language tweets;<br />
social media has been shown to be dominated by English-language users<br />
(many of these in the US), which would result in fewer social-media documents<br />
<strong>and</strong> signals in the German language. 7 Additionally, there could be a perceived<br />
social stigma associated with certain terms <strong>for</strong> diseases or symptoms that<br />
yielded fewer results. 8 This was possibly evident in the retrieved<br />
signals for flu-related versus gastroenteritis-related symptoms (see<br />
Figure 3). When talking about headache, fever or flu, there may not be such<br />
social stigma, but gastrointestinal diseases, although sometimes mentioned<br />
by the media during large outbreaks, are not necessarily those illnesses<br />
fervently discussed in social media. One is more likely to speak publicly<br />
about a ‘headache’ than about ‘bloody diarrhoea’. Not surprisingly, words<br />
indicating gastrointestinal diseases <strong>and</strong> related symptoms were not very<br />
common in the social-media content retrieved in this evaluation. This is in<br />
contrast to the official notification system in Germany, where gastrointestinal<br />
diseases play an important role compared to flu-like illnesses.<br />
Although the results suggest that Twitter is a useful source of additional<br />
in<strong>for</strong>mation, the difference between media reports <strong>and</strong> personal reports<br />
remains a significant issue. Reports that originate in news media are easier<br />
to retrieve as they tend to reflect language <strong>and</strong> keywords that accurately<br />
mirror health <strong>and</strong> medical terminology. They are also more likely to refer to<br />
outbreaks. Personal reports are hard to detect. Two groups of tweets written<br />
by individuals were identified: those in which people refer to media reports<br />
and those in which people refer to a health status (for example, a tweet<br />
about the author’s own health status or that of someone close to them, or perhaps a<br />
joke about health symptoms). The research suggests that people are much<br />
more likely to exchange in<strong>for</strong>mation about less-serious health conditions like<br />
tiredness or nausea than about more serious conditions. A person will share<br />
the fact that they have a headache, <strong>for</strong> example, but not that their recent<br />
cancer diagnosis <strong>and</strong> accompanying antibiotics cause severe diarrhoea.<br />
7. T Webster, Twitter Usage in America: 2010, 2010.<br />
8. T H A Correa <strong>and</strong> H Zuniga, ‘Who Interacts on the Web? The Intersection of Users’<br />
Personality <strong>and</strong> Social Media Use’, Computers in Human Behavior (Vol. 26, No. 7,<br />
2010).
Additionally, it is difficult <strong>for</strong> the system to detect such reports, as they do<br />
not often contain recognisable health or medical terms, but rather content<br />
that includes paraphrases <strong>and</strong> variable language, such as slang (<strong>for</strong> example,<br />
‘the squirts’ versus ‘diarrhoea’) or abbreviations that may include alternative<br />
spellings (e.g. ‘shot 2 l8 4 flu’ versus ‘flu-shot too late <strong>for</strong> flu season’). More<br />
research into language in social-media use is needed be<strong>for</strong>e text-mining <strong>for</strong><br />
infectious disease-relevant in<strong>for</strong>mation can be best technically addressed. 9 It<br />
will be a continuing task to improve the algorithms to better match a constantly<br />
changing media l<strong>and</strong>scape, <strong>and</strong> the language <strong>and</strong> socio-cultural h<strong>and</strong>ling of<br />
social media by the populations whose health needs to be monitored. In other<br />
research, technologies developed to deal with these issues rely on spurious<br />
correlations, leaving keyword-based methods vulnerable to false alarms. 10<br />
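One simple way to narrow the gap between vernacular and formal terms is to normalise text against a lookup table before keyword matching. The sketch below is an assumption-laden illustration, not a technique reported for M-Eco: the two slang examples come from the text, but the table `VERNACULAR` and the function `normalise` are hypothetical.

```python
import re

# Hypothetical lookup table; the slang examples come from the text,
# the mapping itself is an assumption made for illustration.
VERNACULAR = {
    "the squirts": "diarrhoea",
    "l8": "late",
    "2": "too",
    "4": "for",
}


def normalise(text, table=VERNACULAR):
    """Replace known slang or abbreviations with standard terms."""
    result = text.lower()
    # Replace longer entries first so multi-word phrases are handled whole.
    for slang in sorted(table, key=len, reverse=True):
        # Match only when the entry is not embedded in a longer word or number.
        pattern = r"(?<!\w)" + re.escape(slang) + r"(?!\w)"
        result = re.sub(pattern, table[slang], result)
    return result


print(normalise("Got the squirts again"))  # got diarrhoea again
print(normalise("shot 2 l8 4 flu"))        # shot too late for flu
```

A fixed table of this kind would need constant maintenance as usage changes, and it cannot tell when ‘2’ or ‘4’ is meant as a number, which is one reason further research into social-media language is needed.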
Place <strong>and</strong> location are a critical part of outbreak detection, early warning<br />
<strong>and</strong> epidemiological work, <strong>and</strong> will be of increasing utility to health scientists<br />
wanting to monitor diseases using social media. Throughout the development<br />
<strong>and</strong> use of the prototype, geo-location has been a difficult component to<br />
analyse, because of a lack of data. When using Twitter as a data source,<br />
geo-location is not always included in user profiles, <strong>and</strong> users do not always<br />
disclose a location in the content of their tweets. Colleagues working on<br />
geo-location stamping have tried ways to analyse textual information from the<br />
content of social media in order to provide information on location, and such<br />
statistical learning frameworks seem to be successful, but can introduce a<br />
high level of complexity. 11 Chanlekha and Collier examined geo-encoding of<br />
outbreak reports with more detailed granularity, but found the encoding of<br />
health information from reports time-consuming and expensive. Automated<br />
systems tend to leave out too much information. As a solution, the authors<br />
propose a scheme called ‘spatiotemporal zoning’, in which they analyse<br />
events reported in sources with regard to temporal in<strong>for</strong>mation as a means<br />
to mitigate the limitations of current report-based surveillance systems. 12<br />
Sensitivity <strong>and</strong> specificity remain tough factors in the process of signal generation.<br />
It is essential that enough data are captured, so that important in<strong>for</strong>mation is not<br />
9. N Collier et al., ‘A Multilingual Ontology for Infectious Disease Surveillance: Rationale, Design<br />
and Challenges’, Language Resources and Evaluation (Vol. 40, No. 3/4, 2006), pp. 405–13; M<br />
Conway et al., ‘Classifying Disease Outbreak Reports Using N-grams <strong>and</strong> Semantic Features’,<br />
International Journal of Medical In<strong>for</strong>matics (Vol. 78, No. 12, 2009), pp. e47–e58.<br />
10. A Culotta, ‘Detecting Influenza Outbreaks by Analyzing Twitter Messages’, 2010,<br />
, accessed 20 August 2014.<br />
11. V Lampos and N Cristianini, ‘Nowcasting Events from the Social Web with Statistical<br />
Learning’, ACM Transactions on Intelligent Systems <strong>and</strong> Technology, 2011.<br />
12. H Chanlekha <strong>and</strong> N Collier, ‘A Methodology to Enhance Spatial Underst<strong>and</strong>ing of Disease<br />
Outbreak Events Reported in News Articles’, International Journal of Medical In<strong>for</strong>matics<br />
(Vol. 79, No. 4, 2010), pp. 284–96; H Chanlekha, A Kawazoe <strong>and</strong> N Collier, ‘A Framework<br />
<strong>for</strong> Enhancing Spatial <strong>and</strong> Temporal Granularity in Report-Based Health Surveillance<br />
Systems’, BMC Medical In<strong>for</strong>matics <strong>and</strong> Decision Making (Vol. 10, No. 1, 2010).
overlooked, but simultaneously that not too much is presented to the user, so as<br />
not to overwhelm. The work suggests that signals are only relevant if the personal<br />
tweet mentions an actual outbreak, but this could be limited by existing knowledge<br />
of outbreaks. Signals are sometimes generated by cognates of disease names or<br />
symptoms, or words that sound like disease names or symptoms. 13 As mentioned<br />
above, there are various linguistic aspects that must be constantly improved.<br />
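The trade-off between sensitivity and specificity can be made concrete with the standard definitions. The counts below are invented for illustration and are not results from the M-Eco evaluation.

```python
def sensitivity(true_pos, false_neg):
    """Share of real events that produced a signal."""
    return true_pos / (true_pos + false_neg)


def specificity(true_neg, false_pos):
    """Share of non-events that correctly produced no signal."""
    return true_neg / (true_neg + false_pos)


# Invented counts for illustration only.
tp, fn, tn, fp = 8, 2, 85, 5
print(f"sensitivity: {sensitivity(tp, fn):.2f}")  # sensitivity: 0.80
print(f"specificity: {specificity(tn, fp):.2f}")  # specificity: 0.94
```

Broadening the keyword lists would tend to raise sensitivity by reducing missed events, but at the cost of more false signals and therefore lower specificity.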
Despite the a<strong>for</strong>ementioned technical challenges, the results of this prototype<br />
evaluation indicated that social media (Twitter) should not be ruled out <strong>for</strong><br />
infectious disease surveillance. Although it has not yet been possible, it<br />
would be ideal to integrate this work alongside indicator-based surveillance<br />
ef<strong>for</strong>ts over an extended timeframe to give a better sense of the true value<br />
as events arise in real time. Further evaluations are needed to measure the<br />
true epidemiological impact over time <strong>and</strong> in context.<br />
The experience with the M-Eco prototype provided a means to look further<br />
behind the systematic acquisition <strong>and</strong> processing of social-media data in<br />
health monitoring. Depending on the content available in social media, health<br />
officials can receive in<strong>for</strong>mation about potential health threats earlier or they<br />
can receive additional in<strong>for</strong>mation on health threats already detected by<br />
another system. The M-Eco prototype has been designed to offer automated<br />
methods <strong>and</strong> technologies to rapidly provide signals <strong>for</strong> the detection of<br />
infectious disease events. 14 This is challenging, <strong>and</strong> more time is needed to<br />
explore ways to evaluate such a system <strong>and</strong> the resulting signals over a longer<br />
period. Previous evaluations of event-based surveillance systems have been<br />
completed only to a limited extent, <strong>and</strong> there are very few examples to draw<br />
from. 15 In addition to speeding up the detection process through bypassing<br />
traditional indicator-based surveillance structures, event-based surveillance<br />
can also provide innovation in settings where weak or underdeveloped<br />
13. R Steinberger et al., ‘Text Mining from the Web <strong>for</strong> Medical Intelligence’, in F<br />
Fogelman-Soulié et al. (eds), Mining Massive <strong>Data</strong> Sets <strong>for</strong> <strong>Security</strong> (Amsterdam: IOS<br />
Press, 2008), pp. 295–310; R Yangarber, R Steinberger et al., ‘Combining In<strong>for</strong>mation<br />
Retrieval <strong>and</strong> In<strong>for</strong>mation Extraction <strong>for</strong> Medical Intelligence’, Proceedings of Mining<br />
Massive <strong>Data</strong> Sets <strong>for</strong> <strong>Security</strong>, NATO Advanced Study Institute, Gazzada, Italy, 2007.<br />
14. G Eysenbach, ‘Medicine 2.0: Social Networking, Collaboration, Participation,<br />
Apomediation, <strong>and</strong> Openness’, Journal of Medical Internet Research (Vol. 10, No.<br />
3, 2008), p. e22; G Eysenbach, ‘Infodemiology <strong>and</strong> Infoveillance: Framework <strong>for</strong> an<br />
Emerging Set of Public Health In<strong>for</strong>matics Methods to Analyze Search, Communication<br />
<strong>and</strong> Publication Behavior on the Internet’, Journal of Medical Internet Research (Vol.<br />
11, No. 1, 2009), p. e11; T W Grein et al., ‘Rumors of Disease in the Global Village:<br />
Outbreak Verification’, Emerging Infectious Diseases (Vol. 6, No. 2, 2000), pp. 97–102;<br />
M Keller et al., ‘Use of Unstructured Event-based Reports <strong>for</strong> Global Infectious Disease<br />
Surveillance’, Emerging Infectious Diseases (Vol. 15, No. 5, 2009), pp. 689–95.<br />
15. J S Brownstein <strong>and</strong> C C Freifeld, Evaluation of Internet-Based In<strong>for</strong>mal Surveillance<br />
<strong>for</strong> Global Infectious Disease Intelligence, 2008; J S Brownstein, C C Freifeld, B Y Reis<br />
<strong>and</strong> K D M<strong>and</strong>l, Evaluation of Online Media Reports <strong>for</strong> Global Infectious Disease<br />
Intelligence, 2007.
surveillance systems are in place. Currently, several developing countries face<br />
such realities, <strong>and</strong> since socioeconomic disparities <strong>and</strong> poor or insufficient<br />
surveillance infrastructures often have broader consequences in the event of<br />
an outbreak, the potential gain is worth exploring. In these settings, which bear<br />
a larger burden than most, surveillance that can access health in<strong>for</strong>mation<br />
in the absence of traditional surveillance institutions could be critical to<br />
detecting infectious disease early enough to prevent an epidemic or reduce<br />
its impact.<br />
Recent work in this area has begun to seek in<strong>for</strong>mation on health<br />
threats using mobile-phone technology, Internet scanning tools, e-mail<br />
distribution lists or networks that complement the early warning function of<br />
routine surveillance systems. 16 The research has shown that the majority of<br />
existing event-based surveillance systems are situated in North America <strong>and</strong><br />
Europe. Local, event-based systems to monitor epidemic threats in Africa,<br />
Asia, Oceania <strong>and</strong> South America are scarce. Guidance <strong>and</strong> training to create<br />
such systems on the ground should be considered, <strong>and</strong> could lead to faster<br />
assessment of emerging health threats <strong>and</strong> a more rapid response by local<br />
authorities.<br />
Edward Velasco is a senior scientist at the Robert Koch Institute, the<br />
national public health agency of Germany. He provides scientific advice<br />
<strong>and</strong> technical support in the Division of Healthcare-Associated Infections,<br />
Antimicrobial Resistance <strong>and</strong> Consumption, including outbreak management<br />
<strong>and</strong> research on clinical <strong>and</strong> social risk factors <strong>for</strong> antimicrobial resistance.<br />
He has widespread experience in infectious disease epidemiology <strong>and</strong> has<br />
consulted with the European Centre <strong>for</strong> Disease Prevention <strong>and</strong> Control on<br />
quality evaluation <strong>for</strong> surveillance systems in EU member states. He has<br />
held positions in evaluation at the Open Society Foundation, London <strong>and</strong><br />
the Social Science Research Centre, Berlin. He has a doctorate in medical<br />
sciences from Charité University Hospital, the joint medical school of the<br />
Humboldt <strong>and</strong> Free Universities of Berlin, <strong>and</strong> a Master of Science in Social<br />
Epidemiology from the Harvard School of Public Health. He can be reached<br />
on VelascoE@rki.de.<br />
16. J P Chretien <strong>and</strong> S H Lewis, ‘Electronic Public Health Surveillance in Developing<br />
Settings: Meeting Summary’, BMC Proceedings (Vol. 2, Suppl. 3, 2008), p. S1; J P<br />
Chretien et al., ‘Syndromic Surveillance: Adapting Innovations to Developing Settings’,<br />
PLoS Medicine (Vol. 5, No. 3, 2008), p. e72; C Robertson et al., ‘Mobile Phone-Based<br />
Infectious Disease Surveillance System, Sri Lanka’, Emerging Infectious Diseases (Vol.<br />
16, No. 10, 2010), pp. 1524–31.
VII: The Raxibacumab Experience: The First Novel<br />
Product Approved Under the US Food <strong>and</strong> Drug<br />
Administration ‘Animal Rule’<br />
Chia-Wei Tsai<br />
The analysis of data – big <strong>and</strong> small – is central to the US government’s<br />
approach to establishing requirements <strong>and</strong> procurement goals <strong>for</strong> medical<br />
countermeasures <strong>for</strong> chemical, biological, radiological <strong>and</strong> nuclear events 1<br />
<strong>and</strong> their approval or licensure by the Food <strong>and</strong> Drug Administration (FDA). 2<br />
Multiple scenarios with a wide variety of variables, such as location, time of<br />
year <strong>and</strong> weather conditions, are analysed to project the potential impact<br />
on humans, animals <strong>and</strong> commerce. Additional analysis is carried out to<br />
identify gaps in resources that are needed versus those that are available,<br />
<strong>and</strong> on their ability to be used successfully, including the logistics that are<br />
affected by the emergency. It is the policy of the US government to seek<br />
FDA approval or licensure <strong>for</strong> these medical countermeasures while they<br />
are being developed or stockpiled. The efficacy of many of these products<br />
cannot ethically be evaluated in humans <strong>and</strong> there<strong>for</strong>e their regulatory path<br />
relies on the Animal Rule. 3 This requires the demonstration of efficacy in<br />
animal models followed by the demonstration of safety in human trials, <strong>and</strong><br />
the development of a pharmacokinetic bridging study – which establishes the<br />
safe <strong>and</strong> appropriate human dose – between the two. All of this is based<br />
on the statistical analysis of <strong>Big</strong> <strong>Data</strong> from non-clinical <strong>and</strong> clinical studies.<br />
The successful application of these principles has been demonstrated in the<br />
development of the anthrax antitoxin raxibacumab.<br />
Anthrax Antitoxin Requirement<br />
In 2004, anthrax was determined by the US secretary of Homel<strong>and</strong> <strong>Security</strong><br />
to present ‘a material threat against the US population sufficient to affect<br />
national security’. 4 Thus, the US government has established an integrated<br />
anthrax response strategy that includes antitoxins, antibacterials <strong>and</strong><br />
1. National Strategy <strong>for</strong> Countering Biological Threats, , accessed 19<br />
August 2014.<br />
2. Medical Countermeasures Initiative Strategic Plan 2012–2016, ,<br />
accessed 19 August 2014.<br />
3. Food <strong>and</strong> Drug Administration, Animal Rule Summary, ,<br />
accessed 15 July 2014.<br />
4. Taking Measure of Countermeasures (Part 1), ,<br />
accessed 19 August 2014.
vaccines. 5 In 2008, the Enterprise Executive Committee 6 of the Public Health<br />
Emergency Medical Countermeasures Enterprise approved a scenario-based<br />
requirement <strong>for</strong> anthrax antitoxins. 7 That requirement was established<br />
through computer modelling <strong>and</strong> simulation of high-consequence scenarios in<br />
which a single major metropolitan area is exposed to a defined amount of<br />
anthrax spores. The exposure modelling drew on very large data sets covering<br />
spore dispersion <strong>and</strong> fate, transport modelling <strong>and</strong> infection modelling. The<br />
infection modelling analysed human outbreak data from the Sverdlovsk incident<br />
that occurred in Russia in April 1979 (in which spores of anthrax were<br />
accidentally released from a military facility, resulting in an estimated one<br />
hundred deaths) <strong>and</strong> data from non-human primate experimental exposures.<br />
In order to better determine how much antitoxin should be procured,<br />
the Analytical Decision Support Division of the US Biomedical Advanced<br />
Research <strong>and</strong> Development Authority (BARDA) conducted a preparedness<br />
analysis that applied two different approaches to very large data sets<br />
covering a wide variety of parameters. In the first analysis, a fixed-percentage<br />
approach was taken; in the second, a population-density approach was taken.<br />
Both approaches concluded that a<br />
similar level of coverage of all metropolitan statistical areas (cities) in the US<br />
was achievable, allowing a procurement goal to be established. This level of<br />
preparedness represents approximately the maximum amount of product<br />
that can be manufactured with existing capabilities, <strong>and</strong> a reasonable cost–<br />
benefit ratio based on existing funding <strong>and</strong> drug costs.<br />
With in<strong>for</strong>mation from this analysis, a meeting was held in Seattle to discuss<br />
antitoxin use <strong>and</strong> logistical issues with state <strong>and</strong> local end-users. The group<br />
included policy-makers, planners, physicians, nurses, emergency responders<br />
<strong>and</strong> first responders. The objective was to identify the parameters this group<br />
felt were important to the analysis of response capabilities. This qualitative<br />
input allowed weight factors to be established <strong>for</strong> parameters identified<br />
as critical in the subsequent quantitative analysis. The participants were<br />
in<strong>for</strong>med about the antitoxins that are currently available in the Strategic<br />
National Stockpile (SNS) <strong>and</strong> given an opportunity to discuss their use in<br />
mass-casualty scenarios. Several important issues were raised, but the participants<br />
all agreed that antitoxins would be an important component of the response<br />
to anthrax events. The results of this <strong>for</strong>um are currently being used to build<br />
medical countermeasures distribution <strong>and</strong> dispensing models that can be<br />
5. HHS PHEMCE Strategy <strong>and</strong> Implementation Plan, 2012, , accessed 19 August 2014.<br />
6. PHEMCE Governance, , accessed 19 August 2014.<br />
7. PHEMCE Mission Components, , accessed 19 August 2014.
used to predict the medical outcomes in mass-casualty events. These models<br />
consider two approaches to surveillance <strong>and</strong> the initiation of an emergency<br />
response: detection through the BioWatch system <strong>and</strong> index clinical cases. 8<br />
At the meeting, officials of the Office of the Assistant Secretary <strong>for</strong><br />
Preparedness <strong>and</strong> Response (ASPR) described the Department of Health <strong>and</strong><br />
Human Services (HHS) Medical Countermeasure Strategy, 9 the acquisition<br />
process <strong>and</strong> BARDA’s role. An outside consultant described the lessons<br />
learned in the treatment of the victims of the 2001 anthrax attacks. ASPR<br />
provided product background on the two antitoxins available <strong>for</strong> use in<br />
treating anthrax exposure, raxibacumab <strong>and</strong> anthrax immune globulin<br />
intravenous. The role of government organisations <strong>and</strong> the private sector in<br />
the distribution <strong>and</strong> dispensing of medical countermeasures was discussed,<br />
including the need <strong>for</strong> large amounts of ancillary supplies to administer these<br />
medical countermeasures.<br />
The results of the preparedness analysis <strong>and</strong> the meeting with state <strong>and</strong><br />
local end-users were used to in<strong>for</strong>m decisions regarding the prioritisation of<br />
treatment geographically <strong>and</strong> at the patient level. The Centers <strong>for</strong> Disease<br />
Control <strong>and</strong> Prevention organised the first meeting of the Clinical Utilization<br />
Plan <strong>for</strong> Anthrax Countermeasures in a Mass Event Setting (CUPAC). 10 Using<br />
a ‘best practices’ approach, the CUPAC focused on patients with clinical<br />
signs <strong>and</strong> symptoms of anthrax presenting at health-care centres following a<br />
large-scale bioterrorism event associated with wild-type Bacillus anthracis.<br />
The goal of the CUPAC is to create strategies to triage <strong>and</strong> care <strong>for</strong> large<br />
numbers of patients effectively <strong>and</strong> to create a scalable prioritisation scheme<br />
<strong>for</strong> the use of medical countermeasures. Working with a Federal Steering<br />
Committee <strong>and</strong> the National Association of County <strong>and</strong> City Health Officials,<br />
a systematic review was conducted by the Triage <strong>and</strong> Critical Care Working<br />
Group <strong>and</strong> the Medical Countermeasure Working Group. In gathering <strong>and</strong><br />
analysing data <strong>and</strong> drafting preliminary recommendations, the CUPAC<br />
working groups are considering questions about the prioritisation of<br />
antitoxins, <strong>and</strong> the prioritisation <strong>and</strong> duration of antibacterials, triage <strong>and</strong><br />
critical care. The analysis includes large data sets from non-clinical studies<br />
as well as clinical data from the use of antitoxins to treat anthrax cases that<br />
occurred in 2009 <strong>and</strong> 2010 in Scotl<strong>and</strong>, UK. Again, the analysis of <strong>Big</strong> <strong>Data</strong><br />
with diverse parameters is playing a central role in the development of these<br />
guidelines.<br />
8. Department of Homel<strong>and</strong> <strong>Security</strong>, Homel<strong>and</strong> <strong>Security</strong> BioWatch programme,<br />
, accessed 14 July 2014.<br />
9. Public Health Emergency Medical Countermeasures (PHEMCE) Strategy, 2012,<br />
,<br />
accessed 19 August 2014.<br />
10. Conference Report on Public Health <strong>and</strong> Clinical Guidelines <strong>for</strong> Anthrax, , accessed 19 August 2014.
Development of Raxibacumab <strong>for</strong> Anthrax Treatment<br />
Since 2007, an inventory of treatment courses of antitoxins, including<br />
raxibacumab, has been available 11 in the SNS through the Project<br />
BioShield contracts awarded in 2005. 12 Raxibacumab is a monoclonal<br />
antibody antitoxin against the protective antigen of Bacillus anthracis <strong>for</strong><br />
the treatment of inhalational anthrax. Its efficacy has been demonstrated in<br />
multiple animal trials as a monotherapy <strong>and</strong> in combination with antibiotics.<br />
Its safety has been demonstrated in healthy adults through large clinical<br />
trials <strong>and</strong> statistical analysis of results from those trials.<br />
The development of raxibacumab is the result of a co-ordinated response to<br />
a recognised public bioterrorism threat <strong>and</strong> the US government’s request <strong>for</strong><br />
medical countermeasures to treat inhalational anthrax. Following the anthrax<br />
attacks in 2001, over 30,000 people with suspected exposures initiated<br />
antimicrobial prophylaxis. Eleven people developed inhalational anthrax, <strong>and</strong><br />
despite the best available treatment, five of them died. All subjects received<br />
at least two antibiotics <strong>and</strong> some received as many as seven. Antibiotics<br />
alone were insufficient to treat subjects who had developed anthrax. While<br />
antibiotics can overcome blood infections caused by anthrax, they do not<br />
directly address the presence of toxins that drives the development of the<br />
disease. Anthrax toxin is responsible <strong>for</strong> most morbidity (illnesses) <strong>and</strong><br />
mortality (deaths) associated with anthrax.<br />
In humans <strong>and</strong> animals inhalational anthrax occurs following inhalation<br />
of Bacillus anthracis spores, which germinate within macrophages (a type<br />
of white blood cell that ingests <strong>for</strong>eign particles) as they travel to the<br />
lymph nodes of the lung, 13 from where they are drained out of the body.<br />
Multiplication of the bacteria results in a high organism count in the blood, the<br />
production of bacterial toxins, <strong>and</strong> the rapid onset of septicemia. Although<br />
bacterial replication (bacteremia) can be controlled by the administration of<br />
appropriate antibiotics, it is the bacterial toxin that exerts deleterious effects<br />
on the cells within the body, resulting in substantial pathology <strong>and</strong> high<br />
mortality in infected individuals. Because antibiotics have no direct effect on<br />
the toxin, they do not treat the toxemia. After the toxin has reached sufficient<br />
levels in an individual, controlling bacterial replication with an antibiotic may<br />
not alter the clinical course of the patient.<br />
11. US Department of Health <strong>and</strong> Human Services, Project BioShield Annual Report<br />
to Congress, , accessed 30 September 2014.<br />
12. BARDA Strategic Plan 2011–2016, , accessed 19 August 2014.<br />
13. Anthrax, , accessed 19 August 2014.
There is, however, an effective anthrax vaccine that works by inducing the body’s<br />
immune response, primarily to the protective antigen component of anthrax<br />
toxin. Once subjects have the resulting antibody, they are protected against<br />
the effects of anthrax. Of those who became ill in the 2001 attacks, all of<br />
the survivors developed an immune response to the anthrax toxin by day<br />
twenty-eight after exposure.<br />
The challenge with this rapidly progressing <strong>and</strong> often fatal disease is the time<br />
required <strong>for</strong> the infected person’s immune system to generate a response to<br />
the toxins. The anthrax vaccines that have traditionally been available require<br />
more than two months to achieve protective levels of antitoxin antibodies in<br />
the blood. Raxibacumab works by delivering human recombinant antitoxin<br />
antibody to the subject immediately. At the proposed dosage, raxibacumab<br />
persists long enough <strong>for</strong> the development of immunity, helping subjects<br />
survive to develop long-lasting toxin-neutralising antibodies. This immediate<br />
onset of action meets the need of subjects who have not received the anthrax<br />
vaccine. This approach also addresses the need arising from the inability of<br />
antibiotics to address anthrax toxemia directly. As demonstrated in studies<br />
in rabbits <strong>and</strong> non-human primates, raxibacumab improves survival when<br />
administered early, be<strong>for</strong>e symptoms develop, as well as later, when the<br />
disease has progressed to systemic infection. The results of the animal<br />
studies are subjected to statistical analysis <strong>and</strong> computer modelling in order<br />
to estimate how efficacious the antitoxin might be in humans. Moreover,<br />
raxibacumab is effective both as monotherapy <strong>and</strong> in combination with<br />
antibiotics.<br />
While it is possible to achieve 100 per cent cure rates using antibiotics<br />
alone under experimental conditions, the 2001 attacks <strong>and</strong> other real-world<br />
experiences have demonstrated that antibiotics alone are not 100 per cent<br />
effective. In addition, antibiotics would not be effective against<br />
antibiotic-resistant strains of anthrax, which have already been identified. The US<br />
government has recognised the need <strong>for</strong> additional anthrax bioterrorism<br />
countermeasures. Immediately after the anthrax attacks in September<br />
<strong>and</strong> October of 2001, Human Genome Sciences, Inc. (HGS) embarked on<br />
a development programme to produce a monoclonal antibody to treat<br />
inhalational anthrax. HGS was acquired by GlaxoSmithKline (GSK) in 2012,<br />
which continues the development <strong>and</strong> production of raxibacumab. The<br />
goal of the programme was to address the unmet bioterrorism <strong>and</strong> medical<br />
needs posed by inhalational anthrax <strong>and</strong> the limitations of current therapies.<br />
In less than a year, using recombinant DNA technology, a potent <strong>and</strong> specific<br />
antibody had been developed that binds the protective antigen of Bacillus<br />
anthracis with high affinity <strong>and</strong> inhibits protective antigen binding to anthrax<br />
toxin receptors, thus protecting animal <strong>and</strong> human macrophages from<br />
anthrax toxin-mediated cell death. HGS then began non-clinical work to<br />
establish proof of concept of the antibody as a therapeutic in the laboratory
<strong>and</strong> in animals, <strong>and</strong> initiated the process development work to manufacture<br />
<strong>and</strong> characterise the product.<br />
Bacillus anthracis produces three toxins. While antimicrobials cut off the source<br />
of anthrax toxin production, they do nothing to inhibit the adverse effects of<br />
toxins that have already been released. The pathogenic effects of toxemia<br />
can persist after bacteremia has been resolved. However, antitoxin antibodies<br />
directly neutralise the toxin <strong>and</strong> prevent its pathogenic effects. Recombinant<br />
human antitoxin monoclonal antibodies immediately provide the protection that<br />
develops from the immune response in anthrax survivors or that is stimulated by<br />
vaccines over the course of weeks with multiple injections. Antitoxin antibodies<br />
can be used in combination with antibiotics to protect subjects from the toxemia<br />
that antibiotics do not treat <strong>and</strong> would also be an important therapeutic option<br />
when antimicrobials are unavailable or contraindicated, or in the event of<br />
exposure to an antibiotic-resistant anthrax strain.<br />
The Regulatory Path under the FDA’s Animal Rule<br />
Raxibacumab is the first new drug developed since the bioterrorism attacks of<br />
2001 to seek licensure under the US FDA regulation that describes ‘Evidence<br />
Needed to Demonstrate Effectiveness of New Drugs When Human Efficacy<br />
Studies Are Not Ethical or Feasible’, or the Animal Rule (21 CFR 601, Subpart<br />
H, 2002). 14 The animal studies with raxibacumab were designed to meet the<br />
criteria <strong>for</strong> demonstration of efficacy under the Animal Rule <strong>and</strong> the animal<br />
models used <strong>for</strong> evaluation contained the essential elements provided in<br />
FDA guidance, which are recommended to generate data likely to predict<br />
the effectiveness of the product in humans.<br />
A treatment model must trigger administration of the therapeutic on a sign<br />
or observation, not merely on the time that has passed after exposure.<br />
Because large non-clinical studies to establish these treatment triggers are<br />
not ethically acceptable, a meta-analysis of data from studies in the US <strong>and</strong><br />
UK spanning more than ten years was conducted. Analysis of large data sets<br />
from rabbit <strong>and</strong> macaque studies, which included many diverse parameters<br />
such as body temperature <strong>and</strong> biochemical assay results, allowed reproducible<br />
triggers to be established in rabbits (body temperature increase) <strong>and</strong><br />
macaques (the quantitative measurement of protective antigen in the blood).<br />
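A sign-based trigger of the kind described above can be sketched as follows. This is only an illustration: the report does not give the criteria used in the rabbit studies, so the size of the temperature rise, the requirement for consecutive readings and the function name are all assumptions.<br />

```python
def first_trigger_index(temps_c, baseline_c, rise_c=1.0, consecutive=2):
    """Return the index of the reading at which the trigger fires, or None.

    The trigger fires once `consecutive` readings in a row are at least
    `rise_c` degrees above `baseline_c`. All thresholds are hypothetical.
    """
    run = 0
    for i, temp in enumerate(temps_c):
        if temp - baseline_c >= rise_c:
            run += 1
            if run >= consecutive:
                return i
        else:
            run = 0  # an isolated high reading does not count
    return None


# Illustrative readings in degrees Celsius (not study data).
readings = [38.9, 39.0, 40.2, 39.1, 40.1, 40.3, 40.6]
print(first_trigger_index(readings, baseline_c=39.0))  # index 5
```

Requiring consecutive readings is one simple way to make a trigger reproducible rather than sensitive to a single noisy measurement; a time-after-exposure rule would ignore the readings altogether.<br />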
While the efficacy of raxibacumab was demonstrated in two animal models<br />
of inhalational anthrax, safety was evaluated in human clinical studies with<br />
single <strong>and</strong> repeat dosing, alone <strong>and</strong> in combination with antibiotics, in healthy<br />
14. Product Development under the Animal Rule, , accessed<br />
19 August 2014.
adult volunteers. 15 The animal efficacy studies demonstrated that a single<br />
dose of raxibacumab administered intravenously effectively neutralises the<br />
protective antigen <strong>and</strong> significantly improves survival. Its effect is immediate,<br />
<strong>and</strong> maximum raxibacumab serum concentrations are critical <strong>for</strong> survival, as<br />
the goal is to neutralise protective antigen as rapidly as possible. Moreover,<br />
because of its relatively long half-life, raxibacumab is durable, maintaining<br />
antitoxin protection until natural immunity can develop in twenty-eight<br />
days. Importantly, raxibacumab does not prevent the development of<br />
antitoxin immunity in anthrax-infected animals, nor does it interfere with the<br />
pharmacokinetics or safety of concomitantly administered antimicrobials.<br />
Animal studies have demonstrated that raxibacumab does not interfere<br />
with the activity of antibiotics <strong>and</strong> that the combination of raxibacumab <strong>and</strong><br />
antibiotic provides a higher survival outcome than antibiotics alone.<br />
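The durability argument above rests on simple elimination arithmetic. As a minimal sketch, assuming first-order (single-exponential) elimination, which the text does not state, the fraction of a dose remaining after t days with half-life h is 0.5 raised to the power t/h. The 20-day half-life used below is an illustrative placeholder, not a value taken from this report.<br />

```python
def fraction_remaining(days, half_life_days):
    """Fraction of the initial serum concentration left after `days`,
    assuming first-order elimination (an assumption of this sketch)."""
    return 0.5 ** (days / half_life_days)


def concentration(c0, days, half_life_days):
    """Serum concentration after `days`, given initial concentration `c0`."""
    return c0 * fraction_remaining(days, half_life_days)


ASSUMED_HALF_LIFE_DAYS = 20.0  # hypothetical placeholder value

# Fraction of antibody remaining at day 28, when natural immunity is
# expected to have developed according to the text.
print(round(fraction_remaining(28, ASSUMED_HALF_LIFE_DAYS), 3))
```

Under these assumptions a little over a third of the peak concentration would remain at day twenty-eight; a shorter half-life would leave a gap before natural immunity develops.<br />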
Because raxibacumab would likely be used with antimicrobials, the activity<br />
of antimicrobials was evaluated in combination with raxibacumab using<br />
the same study design as the pivotal efficacy studies. At the suggestion<br />
of the FDA, animal studies were per<strong>for</strong>med in which a full human-equivalent<br />
dose of levofloxacin or ciprofloxacin was administered at the same time as<br />
raxibacumab to animals with symptomatic disease. Because antimicrobials<br />
are most effective when all spores have germinated, administering the<br />
antimicrobials after the animals had become septic maximised the efficacy<br />
of the antibiotics. This is reflected in the high survival rates in the<br />
antibiotic-alone <strong>and</strong> raxibacumab-antibiotic combination treatment groups (85–100<br />
per cent). In this study, levofloxacin alone or in combination with raxibacumab<br />
was administered to the 42 per cent of anthrax-infected animals surviving<br />
to 84 hours after spore exposure. The combination of raxibacumab <strong>and</strong><br />
levofloxacin resulted in a higher survival outcome than <strong>for</strong> levofloxacin<br />
treatment alone.<br />
The results of the added benefit study serve to supplement the results of the<br />
original efficacy studies, which demonstrated the efficacy of raxibacumab<br />
administered early in the course of the disease. In contrast to the survival<br />
rates observed late in the course of disease, survival rates are highest with<br />
raxibacumab when it is given as the protective antigen is first being produced,<br />
with 90–100 per cent survival rates in rabbits <strong>and</strong> monkeys when raxibacumab<br />
is administered as monotherapy at the time of spore challenge or at twelve<br />
hours after spore challenge. In the clinical setting, when neither the time<br />
of spore exposure, onset of symptoms, nor individual time course of the<br />
disease is easily identified, administering both antimicrobials to kill bacteria<br />
<strong>and</strong> anti-protective antigen antibody to neutralise toxin is an effective strategy<br />
<strong>for</strong> combating both the source <strong>and</strong> effects of the disease.<br />
15. Clinical Pharmacology <strong>and</strong> Biopharmaceutics Review of Raxibacumab, ,<br />
accessed 19 August 2014.
As agreed with the FDA <strong>for</strong> an indication as a therapeutic treatment, <strong>and</strong><br />
consistent with the Animal Rule, the safety of raxibacumab has been evaluated<br />
in over 400 healthy human volunteers. Adverse events were generally mild to<br />
moderate <strong>and</strong> did not occur at a rate that was different from that observed<br />
among placebo-treated subjects. A low incidence of mild to moderate rash<br />
was observed in some subjects. These rashes were transient <strong>and</strong> resolved<br />
without medication or with oral diphenhydramine (a readily available drug<br />
used to reduce irritation <strong>and</strong> runny noses caused by hayfever or allergies).<br />
Concomitant administration of raxibacumab with ciprofloxacin (a common<br />
antibiotic), did not alter the safety or pharmacokinetics of either antibiotic<br />
or raxibacumab.<br />
Raxibacumab treatment should be initiated when a diagnosis of inhalational<br />
anthrax is suspected or confirmed. Raxibacumab provides a significant<br />
survival benefit in animals symptomatic for systemic anthrax disease.<br />
Raxibacumab treatment is also associated with significant and greater<br />
improvement in survival when given as pre- or post-exposure prophylaxis<br />
(preventative medicine). Raxibacumab is an important treatment option<br />
for inhalational anthrax: an effective antitoxin with a mechanism of action<br />
distinct from that of antimicrobials. Raxibacumab neutralises protective<br />
antigen, improves survival and reduces signs of the disease. When used in<br />
combination with antibiotics, raxibacumab does not interfere with antibiotic<br />
efficacy and results in a higher survival outcome than antimicrobial therapy<br />
alone. Raxibacumab used alone is also expected to provide clinical benefit<br />
for individuals in whom antibiotics are contraindicated or in whom anthrax<br />
disease is due to antibiotic-resistant strains of Bacillus anthracis.<br />
Post-Marketing Requirement after the Licensure<br />
Raxibacumab was approved by the FDA for the treatment of inhalational<br />
anthrax due to Bacillus anthracis in December 2012. Its approval was based<br />
on the analysis of data from non-clinical studies, and the development<br />
of a mathematical pharmacokinetic model bridging efficacious animal<br />
exposures to safe human exposures. Based on these data, the FDA approved<br />
raxibacumab for the treatment of adult and paediatric patients<br />
with inhalational anthrax due to Bacillus anthracis, in combination with<br />
appropriate antibacterial drugs, and for prophylaxis of inhalational anthrax<br />
when alternative therapies are not available or appropriate. However, this<br />
approval requires GSK to conduct post-marketing studies, such as field studies,<br />
to verify and describe raxibacumab’s clinical benefit and to assess its safety<br />
when used as indicated, so the role of Big Data analysis is far from over. GSK<br />
has submitted a field study protocol to evaluate the effectiveness, suitable<br />
human dosage and safety of raxibacumab use for Bacillus anthracis infection<br />
in the US. This phase four, open-label study will be the first human study to<br />
collect data on Bacillus anthracis-infected or exposed patients treated with<br />
raxibacumab. It will also be the first study to gain a better understanding
of the clinical benefit and safety of raxibacumab in human subjects. Data<br />
collected from this study (observation of adverse responses, measurement<br />
of antibody concentrations, white blood-cell counts, and so on) will further<br />
inform patient care and treatment choices for the management of anthrax.<br />
Conclusions<br />
The analysis of Big Data played a central role throughout the experience<br />
with raxibacumab. From the determination of a requirement based on<br />
scenario-based analysis, to the establishment of a procurement objective, to<br />
the development of an animal model based on a treatment trigger and the<br />
eventual approval of raxibacumab, effective data collection, management<br />
and analysis have been essential. This vital role will continue throughout<br />
the lifespan of raxibacumab. Anthrax cases arising from natural exposure<br />
or criminal or terrorist activity may be treated with raxibacumab and new<br />
data will be collected to expand our understanding of safety and efficacy in<br />
humans. If raxibacumab is used in response to mass-casualty events, additional<br />
data will be collected on distribution and dispensing to further refine our<br />
understanding of logistics. Better understanding of the potential use of Big Data across many<br />
research areas and academic disciplines will help resolve issues surrounding<br />
the collection and use of these data in an environment in which privacy<br />
protection and public health needs are at times on opposite sides of the<br />
balance.<br />
Chia-Wei Tsai is a Project Officer in the Division of Chemical, Biological,<br />
Radiological and Nuclear (CBRN) Countermeasures. She is the project lead for<br />
advanced development and acquisition of medical countermeasures in the<br />
Antitoxins and Therapeutic Biologics Branch of the CBRN Program. She is also<br />
the contracting officer representative overseeing two advanced development<br />
contracts and four procurement contracts. She also serves as the Chair for the<br />
technical evaluation panel for the CBRN antitoxin rolling Broad Agency<br />
Announcement (BAA). Dr Tsai recently received the Secretary’s Award for<br />
Distinguished Service 2012 for her contribution in leading CBRN medical<br />
countermeasures through FDA approval. Prior to joining HHS, Dr Tsai served<br />
as a scientist at DynPort Vaccine Company, which supports a Department of<br />
Defense plague vaccine development programme. She also served as project<br />
lead in the Malaria Vaccine Development Branch in the National Institute of<br />
Allergy and Infectious Diseases. Dr Tsai received her PhD from the University<br />
of Maryland, College Park in Cell Biology and Molecular Genetics and<br />
completed her post-doctoral training at Johns Hopkins School of Medicine in<br />
Pharmacology.
Discussion Groups<br />
During the afternoon, the conference broke up into focused discussion<br />
groups, each comprising between ten and twenty delegates. The outcomes<br />
of these discussion fora are presented over the following pages.<br />
Discussions were without attribution. The information presented here<br />
seeks to represent the discussions that took place; there is not always<br />
robust academic referencing to support the views offered, but it has been<br />
assumed that if comments made by individual delegates were not credible,<br />
they would have been rejected by the other members of that group during<br />
the discussions. Views presented are therefore assumed to be broadly<br />
supported by the majority of those present. Where possible, transcripts of<br />
the discussion fora were distributed to the participants during the editing<br />
process for further comment and clarification.<br />
There was, inevitably, some crossover of subject matter and topic discussion<br />
between one group and the next, and where this occurred, comments have<br />
been amalgamated under one heading to avoid repetition.
Discussion Group 1: The Ethics and Legality of Big<br />
Data Sharing<br />
Chair and Rapporteur: Edward Hawker<br />
Key Issues and Challenges<br />
• The nature of what is and is not socially acceptable, regardless of<br />
what is legal, can change over time. This can be situation-dependent<br />
and is not absolute<br />
• Individuals do not always read and consider terms and conditions that<br />
set out privacy and data sharing obligations before accepting them<br />
• There are fears that data may be misused to enable discrimination<br />
against certain individuals and groups<br />
• Who should be able to look at or have access to the data? How is this<br />
determined and how can it be enforced?<br />
This discussion group was asked to consider the ethics and legality of data<br />
sharing, and how Big Data and Big Data projects affect public perceptions.<br />
As government moves forward on a digital agenda that is increasingly<br />
dependent on public participation and acceptance, these issues will become<br />
ever more important.<br />
Ethical Use: A Shifting Concept<br />
The group felt that a major consideration is how ethics and ethical use are<br />
defined. In the context of Big Data, ethics can include notions of privacy,<br />
anonymity, fair use and informed consent, but perceptions of these terms can<br />
change over time. For example, the 11 September 2001 attacks on the US<br />
fundamentally changed the paradigm and ushered in a new age of<br />
security-dominated policy and thinking, but many people now see that reactive<br />
policies such as the USA PATRIOT Act 1 went too far. This view was<br />
independently raised in Discussion Group 4: ‘Individual Privacy versus<br />
Community Safety’, and will be discussed in further detail there. Future<br />
events may again change public perceptions and attitudes.<br />
The group agreed that a strong component of ethics is proportionality, but<br />
determining what is proportionate to any given situation is also difficult.<br />
Collecting all of the available data may ensure that nothing of key importance<br />
is missed, but may be difficult to justify as proportionate and ethical. Targeted<br />
data collection, factoring in proportionality, may be seen as a more ethical<br />
approach but risks missing information that might later turn out to be of<br />
value.<br />
1. The USA PATRIOT Act: Preserving Life and Liberty, accessed 16 June 2014.
Data Collection<br />
The participants then moved on to discussing the ethics of collecting and<br />
sharing data, and how consent is requested and obtained from the public<br />
to enable their data to be shared between organisations. Members of the<br />
public generate large volumes of data every day, via social-media platforms,<br />
online purchases, electronic tickets such as the Oyster cards used on<br />
London’s public transport, and geo-location data on mobile phones, to name<br />
just a few examples. These data may be gathered and stored by the user’s<br />
mobile-phone operator, Internet service provider, the retail sites visited or<br />
bank used, under terms and conditions to which the user has ostensibly agreed<br />
and which act as a legal agreement between the user and the company. Most people<br />
do not read these terms and conditions when they accept them, however,<br />
and so it can be argued that they do not truly understand what they are<br />
signing up for and would not sign away so many rights if they did understand.<br />
Companies then (rightly) claim that they are acting within the law when<br />
they share data with other organisations or even sell customer information<br />
to third parties, but there are questions over how ethical such behaviour<br />
can really be considered to be. The challenge identified by many of the<br />
participants was that there is no negotiation involved – terms and conditions<br />
must either be accepted, or the user will not be able to use the service.<br />
There is not generally an option to accept some of the terms while rejecting<br />
others, or to opt into some aspects of the service without accepting all of<br />
the terms and conditions. Could academia suggest ways in which different<br />
levels of privacy settings and data-sharing agreements might be built into<br />
online systems, so that customers genuinely have a choice in whether or not<br />
to accept the terms they are offered?<br />
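One way to picture the graded choice the participants asked about is a consent<br />
record kept per purpose, rather than a single accept-or-reject flag. The sketch<br />
below is illustrative only: the purpose names, the `ConsentRecord` class and the<br />
`may_share_with_third_party` helper are assumptions made for this example and do<br />
not describe any existing service or any proposal made at the conference.<br />

```python
from dataclasses import dataclass, field
from enum import Enum


class Purpose(Enum):
    # Hypothetical purposes; a real service would define its own list.
    CORE_SERVICE = "core_service"
    ANALYTICS = "analytics"
    THIRD_PARTY_SHARING = "third_party_sharing"


@dataclass
class ConsentRecord:
    """Per-user consent, stored purpose by purpose."""
    user_id: str
    # Only the purpose needed to run the service is granted by default.
    granted: set = field(default_factory=lambda: {Purpose.CORE_SERVICE})

    def grant(self, purpose):
        self.granted.add(purpose)

    def withdraw(self, purpose):
        # Assumption: the core purpose cannot be withdrawn while using the service.
        if purpose is Purpose.CORE_SERVICE:
            raise ValueError("core service consent is required to use the service")
        self.granted.discard(purpose)

    def allows(self, purpose):
        return purpose in self.granted


def may_share_with_third_party(record):
    """Check consent before any customer data leaves the organisation."""
    return record.allows(Purpose.THIRD_PARTY_SHARING)
```

Under this arrangement a user who declines third-party sharing still keeps access<br />
to the core service, which is the kind of partial acceptance that current<br />
all-or-nothing terms and conditions do not offer.<br />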
Data Protection<br />
Next, the group discussed who should be allowed to look at data. There was<br />
general agreement that only authorised personnel should have access, but<br />
this raised the questions of who can be considered authorised and what<br />
protection this really gives. Insiders may be authorised but still may not act<br />
ethically: the actions of Edward Snowden, who was an authorised US National<br />
Security Agency (NSA) contractor when he passed classified information<br />
to the Guardian and Washington Post, were raised here. Snowden stole<br />
information and then released it on the web and to journalists – and yet many<br />
commentators would consider his actions, which were illegal but highlighted<br />
widespread US government surveillance of citizens’ private communications,<br />
to be more ethical than the actions of the NSA and other government<br />
agencies. The group agreed that there is a need to monitor those who handle<br />
the data and to make sure that they do it responsibly, as well as monitoring<br />
the data themselves; this also highlights a need to adopt corporate social<br />
responsibility procedures in relation to data handling in private companies.
Under the UK’s Data Protection Act, 2 members of the public have the<br />
legal right to ask data controllers what information is being held on them.<br />
However, there are many situations in which the information does not have<br />
to be disclosed, particularly if it compromises someone else’s privacy. The<br />
group felt that there is a lack of public knowledge about these rights and<br />
the legal framework to protect user information, particularly where data are<br />
collected anonymously and used to draw conclusions about individuals or<br />
groups more generally.<br />
The Changing Nature of Surveillance<br />
Participants saw the way in which UK society is subjected to surveillance,<br />
and how this has changed over the last decade, as an important ethical<br />
issue. New technologies such as smartphones are able to generate more<br />
data and more accurate information than at any time previously, and generate<br />
some information – such as the user’s current location – automatically. This<br />
suggests there is an ‘almost unconscious’ acceptance of surveillance by those<br />
who buy a smartphone, and also implies that people do not consider this to<br />
be surveillance in the same way they would if the government were tracking<br />
them. There is a role for academia in explaining why people’s perceptions of<br />
what is ‘snooping’ and what is not appear to differ so dramatically depending<br />
on who is collecting the data.<br />
The group felt that the willingness with which the public sign away their<br />
privacy rights online suggests that there is a balance between convenience<br />
and security which may warrant further research. Most people willingly give<br />
up some (if not all) of their security because it is convenient for them to<br />
use the service being offered without questioning what this might enable<br />
others to do with their data. Such information can easily be extracted by<br />
cyber-criminals and used for illegitimate purposes, as well as by legitimate<br />
agencies, but many users do not understand the potential dangers or the<br />
security vulnerabilities. Better education would help to ease some of the<br />
challenges, and research into how this might be delivered and accepted<br />
by the public would be of benefit: the group felt that most people do not<br />
understand where the information they share over social-media platforms<br />
actually goes. A compromised social-media account can give large amounts<br />
of personal information to fraudsters, which can enable them to then target<br />
scams very precisely.<br />
There is a strong perceived correlation between conventional media coverage and<br />
changes to social-media privacy settings: the group thought that people do change<br />
their security settings when privacy breaches are reported in the media. This<br />
highlights the power that the media has to shape perceptions of privacy and<br />
security, but none of the group – at the conference or subsequently – were<br />
2. Data Protection Act 1998, accessed 16 June 2014.
able to provide any proof or reference to studies that show that people do<br />
actually change their behaviour in these circumstances. Further research is<br />
needed on both the perception of how behaviour is affected and how it is in<br />
fact affected.<br />
The group also felt that it was important to consider the fact that much of<br />
the information that can be derived from social-media communication is<br />
in the form of metadata (the context rather than the content of the data,<br />
such as when a message was sent, and where it was sent from). This raises<br />
additional challenges for privacy and ethics: if it is possible to see who<br />
has spoken to whom and when, without looking at precisely what they<br />
said, there are different levels of privacy that may need to be considered<br />
separately. This raised a number of questions, including: where are the<br />
boundaries of consent? Do the ethics of a situation change depending on<br />
whether all information is freely available, or only the metadata? How would<br />
this affect a message posted in the private areas of a message forum, or sent<br />
as a private e-mail which is then forwarded to people beyond the original<br />
intended recipient?<br />
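To illustrate how much can be inferred from metadata alone, the following sketch<br />
counts who has been in contact with whom, and at what hours, from message records<br />
that contain no message content at all. The record fields, the names and the<br />
sample data are invented for this example; they are assumptions, not a description<br />
of any real platform's data.<br />

```python
from collections import Counter
from datetime import datetime

# Hypothetical records: sender, recipient and timestamp only, no message body.
SAMPLE_MESSAGES = [
    {"sender": "alice", "recipient": "bob", "sent_at": datetime(2014, 3, 1, 9, 0)},
    {"sender": "bob", "recipient": "alice", "sent_at": datetime(2014, 3, 1, 9, 5)},
    {"sender": "alice", "recipient": "carol", "sent_at": datetime(2014, 3, 2, 23, 30)},
]


def contact_counts(records):
    """Count how often each unordered pair of people exchanged messages."""
    pairs = Counter()
    for rec in records:
        # Sort the two names so that A->B and B->A count as the same pair.
        pair = tuple(sorted((rec["sender"], rec["recipient"])))
        pairs[pair] += 1
    return pairs


def active_hours(records, person):
    """Return the sorted hours of day at which a person sent messages."""
    return sorted({rec["sent_at"].hour for rec in records if rec["sender"] == person})
```

Even this small amount of context shows the pattern of contact between individuals,<br />
which is why the group suggested that metadata and content may need to be treated<br />
as separate levels of privacy.<br />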
Information Requests<br />
A further ethical issue surrounded the extent to which it is acceptable for the<br />
private sector to pass information to the government when the latter requests<br />
it. Companies such as Facebook and Google have the choice of whether or<br />
not to comply with such requests, and the ethics of this can be complicated<br />
depending on which government is asking for the data. Academics such as<br />
Baker and Tang have explored these issues in more depth. 3 If a company fails<br />
to comply with a government request, it may receive a court order forcing<br />
it to do so. This may make companies more likely to comply with the first<br />
request out of convenience, opening them up to criticism for being too ready<br />
to ‘cosy up’ with government and for not being protective enough of their<br />
customers’ data. There are a number of ethical dilemmas for companies in<br />
this context. Their concern with customer focus and public image may make<br />
them less likely to comply with requests they think will lead to negative public<br />
backlash, for example. It is not their job (nor necessarily in their interests) to<br />
capture or highlight potential terrorists or criminal activities. Nevertheless,<br />
private companies could and should take a more proactive stance against<br />
criminal activity that might be detected by looking for it more closely within<br />
the data they hold. The group acknowledged that banks in particular have<br />
become more proactive recently, especially in relation to money laundering.<br />
Negative sanctions such as those handed out to HSBC in relation to failing<br />
3. Jane Stuart Baker and Lu Tang, ‘Google’s Dilemma in China’, in Steve May (ed.), Case<br />
Studies in Organizational Communication: Ethical Perspectives and Practices, 2nd ed.<br />
(Chapel Hill: Sage, 2012), accessed 17 June 2014.
to maintain effective anti-money-laundering programmes, as much as corporate<br />
social responsibility, have played a large role in this shift in behaviour. 4<br />
Predictive Analytics<br />
A final area the group discussed was predictive analytics – analysing data to<br />
predict how people may behave in future. This has the potential to prevent<br />
crimes and enhance public safety, but is ethically contentious. Concerns were<br />
raised that information gained on individuals may be used to stereotype entire<br />
groups. The application of ideas such as the ‘broken windows theory’, 5 which<br />
states that communities where low-level crime is endemic are predisposed<br />
to more serious crime, is not universally accepted. While proponents of<br />
predictive analytics would claim that identifying and tackling low-level crime<br />
will help to prevent more serious offences (and indeed, when Police<br />
Commissioner William Bratton applied the theory to turnstile jumpers on<br />
the New York subway in the early 1990s, crimes of all kinds on the transport<br />
system decreased), critics express concern that such approaches can lead<br />
to negative categorisations of some members of society. Academia could,<br />
however, help to analyse data and identify potential trends which could<br />
then be explored in more detail through a multidisciplinary approach<br />
involving behavioural psychologists, sociologists and criminologists, as well<br />
as computer scientists.<br />
Suggested Research Topics<br />
• An in-depth examination is needed of public understanding of the<br />
surveillance and privacy debate, to provide recommendations that<br />
will encourage more people to engage in shaping future policy<br />
• Academic research can help to explain why people’s perceptions of<br />
what is ‘snooping’ and what is not appear to differ so dramatically<br />
depending on who is collecting the data<br />
• Research is needed into how to educate people not to willingly give<br />
up data without questioning what this might enable others to do with<br />
those data. Many users do not understand the potential dangers or<br />
the security vulnerabilities<br />
• Academia should suggest ways in which different levels of privacy<br />
settings and data-sharing agreements can be built into online systems,<br />
so that customers genuinely have a choice in whether or not to accept<br />
the terms they are offered.<br />
4. ‘HSBC Holdings Plc. and HSBC Bank USA N.A. Admit to Anti-Money Laundering and<br />
Sanctions Violations, Forfeit $1.256 Billion in Deferred Prosecution Agreement’,<br />
Department of Justice Office of Public Affairs, 11 December 2012, accessed 17 June 2014.<br />
5. James Q Wilson and George L Kelling, ‘The Police and Community Safety: Broken<br />
Windows’, Manhattan Institute, 1982, accessed 19 August 2014.
Discussion Group 2: Policing, Terrorism, Crime<br />
and Fraud<br />
Chair: David Smart<br />
Rapporteur: Philippa Morrell<br />
Key Issues <strong>and</strong> Challenges<br />
• Data analysis needs to begin somewhere and follow the best direction.<br />
How can the most appropriate leads to follow be identified?<br />
• Missing data, where data should be expected, might provide as much<br />
information as the analysis of data themselves. How can such gaps be<br />
identified and analysed?<br />
• Is the apparent lack of skilled security data analysts due to a genuine<br />
skills gap or to better career prospects and higher pay in other sectors<br />
for people with the appropriate skills?<br />
• How can the benefits of Big Data be measured and proven, to avoid<br />
it becoming a sink for valuable resources that would be better used<br />
elsewhere?<br />
Since the 11 September 2001 terrorist attacks on the US, there has been greater<br />
collaboration between police forces, security services and governments at<br />
the international and national levels. Links between terrorism and crime –<br />
and, in particular, acquisitive crimes such as fraud and money laundering<br />
– have been identified, studied and researched extensively. This discussion<br />
group considered how new approaches to the data available might help to<br />
improve the quantity and quality of the linkages that are being made. The<br />
group also discussed whether there are other sources of data, not currently<br />
analysed for this purpose, which might also yield valuable information and<br />
intelligence.<br />
A key point to consider is that effective analysis of data depends on having a<br />
lead (or leads) in the first place, which helps to identify what the analyst – or<br />
the analytical tool – is looking for within the data. These starting points can<br />
then be developed to provide further insights or information. A lack of data<br />
where they might be expected can, however, be just as important a lead, but<br />
is much more difficult to spot. A particularly useful research focus, therefore,<br />
may be to look at developing new methods of data analysis to identify the<br />
‘unknown unknowns’, described by one member of the group as ‘the Holy<br />
Grail of intelligence work’, by spotting anomalies within the more routine data<br />
that could then be subjected to further analysis and investigation. The group<br />
thought that, at present, gaps in the data are not sufficiently recognised as<br />
such, nor effectively interpreted.
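<br />
As a minimal illustration of treating absent data as a lead, the sketch below<br />
assumes a source that is expected to produce at least one record every day and<br />
lists the days on which nothing was recorded, grouped into continuous runs. The<br />
daily-reporting assumption and the function names are hypothetical and chosen<br />
only for this example; real intelligence data would rarely be this regular.<br />

```python
from datetime import date, timedelta


def missing_days(observed, start, end):
    """Return the dates in [start, end] on which no record was observed."""
    seen = set(observed)
    gaps = []
    day = start
    while day <= end:
        if day not in seen:
            gaps.append(day)
        day += timedelta(days=1)
    return gaps


def gap_runs(gaps):
    """Group consecutive missing dates (sorted ascending) into (first, last) runs."""
    runs = []
    for day in gaps:
        if runs and day - runs[-1][1] == timedelta(days=1):
            # Extend the current run by one day.
            runs[-1] = (runs[-1][0], day)
        else:
            # Start a new run.
            runs.append((day, day))
    return runs
```

A long run of missing days is not evidence of anything by itself, but it marks a<br />
place where a human analyst might ask why data that were expected did not arrive.<br />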
The group also felt that while identifying phenomena such as linkages in the<br />
data can be relatively easy to do automatically, it is much harder to attribute<br />
significance (or the quality of the significance) to those links without a human<br />
analyst involved in the process. A good understanding of what the linkages<br />
mean is needed in order for them to be given value and for any consequent<br />
interventions to be effective. The group felt that more research is needed on<br />
the relationship between qualitative (human) and quantitative (automatic)<br />
analysis in making effective interpretations.<br />
Regional Differences<br />
Some of the delegates felt that once technological solutions are available,<br />
there is a tendency to apply the same technology everywhere in order to<br />
promote standardisation and interoperability, but this risks introducing<br />
a ‘one-size-fits-all’ approach that is not equally applicable to all situations<br />
or regions. The group felt that this was important as local initiatives for<br />
countering extremism and radicalisation within at-risk communities often<br />
work because of local conditions at a particular point in time; what works<br />
in one place is not automatically transferable to another location or even<br />
repeatable within the same community. Good policy is often derived from<br />
historic experiences – for instance, if a particular approach has worked in<br />
one area of the country, or on one operation, there may be a push for it to<br />
become standard policy to use it in others, but this has not always proven<br />
successful. The subtleties and reasons that programmes have been successful<br />
may not be easily extractable from data, but may be more apparent to a<br />
human analyst.<br />
There have been a number of short-term solutions in recent times, but<br />
the group felt that there needs to be a longer-term view, with a greater<br />
emphasis on determining what is well understood and what is not. For<br />
example, between two and five years after a specific event, retrospective data<br />
analysis could help to assess impacts and changes that have occurred in the<br />
intervening period, and perhaps identify which have been the most effective<br />
responses, so that they can be focused on more strongly. There is also a need<br />
to ensure that all available data are considered: the most effective responses<br />
may not have been the official ones. For such an approach to work, however,<br />
there needs to be a high level of detail and the quality of the data has to be<br />
assured – quantity alone guarantees neither results nor that the data<br />
gathered will be useful.<br />
Research to evaluate rigorously what has and has not worked well may<br />
help best practice to be identified <strong>and</strong> also prevent mistakes from being<br />
repeated. Better ways to identify the ‘starting state’ are also needed, so that<br />
there is a baseline against which success or failure can be measured. This<br />
will help to determine how success or failure will be evaluated (which the<br />
group felt is often lacking at present) <strong>and</strong> may provide better underst<strong>and</strong>ing
of how multiple concurrent interventions can be evaluated separately <strong>and</strong><br />
individually to test which are most effective.<br />
In order <strong>for</strong> data to be useful, they need to be multidimensional. This requires<br />
robust methodology behind how the dimensions are set, <strong>and</strong> thus what data<br />
should be collected. The most valuable data do not always relate to what is<br />
happening ‘on the ground’; they may be more subtle – such as where stolen<br />
credit cards are being used, rather than where they are being stolen – <strong>and</strong><br />
this in turn may involve comparison at the local, national <strong>and</strong> international<br />
levels. A credit card stolen in one area or country could be used in another to<br />
order goods from a company based in a third, which are delivered to a fourth<br />
location. The data sets analysed need to be relevant to the answers required,<br />
but the ability to integrate many different data sets, <strong>and</strong> to do this much<br />
faster <strong>and</strong> more efficiently, does not by itself provide the cultural context.<br />
Implications of Real-Time Analysis<br />
<strong>Big</strong> data provide new timeframes <strong>for</strong> the collection <strong>and</strong> analysis of<br />
in<strong>for</strong>mation, enabling real-time processing <strong>and</strong> real-time updates, as well as<br />
data collection over long periods of time. Real-time processing, combined<br />
with analysts who have an underst<strong>and</strong>ing of the context in which the data<br />
have been collected, can enable ‘non-normal’ trends to be picked out<br />
more easily. The use of hard and soft intelligence combined with Big Data<br />
might enable the detection of those hiding in plain sight, for example by using<br />
predictive analytics that compare behaviour against the norm and look for<br />
deviation at an individual level rather than at the community level.<br />
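The difference between an individual norm and a community norm can be illustrated with a minimal sketch. It assumes a simple standard-deviation test against each person's own history; the function name, the threshold of three standard deviations and the figures are all hypothetical, and the report does not specify any particular analytic method.

```python
from statistics import mean, stdev

def flag_deviations(history, recent, threshold=3.0):
    """Return the recent values that deviate from one individual's own norm.

    history:   past observations for ONE individual (at least two values).
    recent:    new observations to check against that personal baseline.
    threshold: standard deviations treated as 'non-normal' (an assumption).
    """
    baseline_mean = mean(history)
    baseline_sd = stdev(history)
    if baseline_sd == 0:
        # A perfectly constant history: any change at all counts as a deviation.
        return [x for x in recent if x != baseline_mean]
    return [x for x in recent
            if abs(x - baseline_mean) / baseline_sd > threshold]

# Hypothetical daily activity counts for two individuals.
person_a_history = [2, 3, 2, 4, 3, 2, 3]
person_b_history = [40, 38, 45, 42, 39, 41, 44]

# The same two new observations are checked against each personal baseline.
person_a_flags = flag_deviations(person_a_history, [3, 40])
person_b_flags = flag_deviations(person_b_history, [3, 40])
```

In this made-up example a value of 40 is unremarkable for the second individual but far outside the first individual's usual range, which a single community-wide average would not show. Deciding whether a flagged value actually matters would still depend on an analyst who understands the context.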
More research is needed to determine the circumstances in which <strong>Big</strong> <strong>Data</strong><br />
is likely to be most relevant – at the tactical level, or at a more operational<br />
or strategic level to enable decision-making. What analytical methods are<br />
required to underst<strong>and</strong> the data being created <strong>and</strong> do these differ depending<br />
on what outcome the analysis is aiming for? The group also questioned<br />
whether an overemphasis on the potential benefits of Big Data makes it a<br />
sink for resources, and whether there is proof that it adds real benefits.<br />
Useful <strong>Data</strong> Extraction<br />
When collected <strong>and</strong> applied appropriately, <strong>Big</strong> <strong>Data</strong> can combine <strong>and</strong> share<br />
many sources of data <strong>and</strong> in<strong>for</strong>mation to provide pertinent intelligence. <strong>Big</strong><br />
<strong>Data</strong> enables the use of multiple sources of data <strong>and</strong> provides an ability to<br />
filter these data in new <strong>and</strong> innovative ways. It is also dependent on the<br />
ability to strip out unnecessary and extraneous data. Data weeded out by<br />
the filters can still be analysed, but their removal from the main data sets<br />
raises questions over data quality. This can be a particular issue where data<br />
are collected centrally <strong>for</strong> use by different organisations, as each needs a<br />
different portion of the complete data set: the point(s) at which data are<br />
removed will have a strong influence on what the remainder can be used <strong>for</strong>,
as in<strong>for</strong>mation that is extraneous to one organisation may be highly insightful<br />
to another. In addition, if data are going to be shared across a number of<br />
organisations, underlying definitions of how <strong>and</strong> with whom the data are<br />
intended to be shared will need to be agreed <strong>and</strong> understood by all parties<br />
<strong>for</strong> the process to work seamlessly.<br />
In any data analysis, it is important to know who <strong>and</strong> what is being looked<br />
at to ensure that the cultural aspects are considered <strong>and</strong> understood. For<br />
instance, one participant offered an example of data that treat the British<br />
Kashmiri and Punjabi populations as the same. Both are defined either as<br />
Muslims or, at a more granular level, by the language they speak (Punjabi),<br />
whereas there are very distinct historical <strong>and</strong> cultural differences between<br />
the two groups. If the data cannot distinguish between them, it will not be<br />
possible to highlight a trend within one group that is not present in the other.<br />
A further consideration is that when analysing any data there is a danger<br />
of pre-supposition of what the data may represent (or what the collector<br />
wants them to represent) <strong>and</strong> a misuse of them because of this. The actual<br />
underlying causal relationship is overlooked or ignored. An example here is<br />
a recorded increase in the number of neonatal deaths in Japan following<br />
the Fukushima Dai-ichi nuclear power plant accident: 1 the increase appears<br />
to point to radiation exposure causing the deaths, but was in fact due to<br />
increased maternal stress following the evacuation of homes damaged in<br />
the earthquake and tsunami, leading to an increase in premature births,<br />
combined with damage to hospitals, which meant that neonatal care was<br />
compromised. The neonatal deaths <strong>and</strong> the damaged power station shared<br />
the same root cause – the earthquake <strong>and</strong> tsunami – a different relationship<br />
from what one interpretation of the data might suggest. Comparing<br />
neonatal death rates closer to Fukushima with those further away, in areas<br />
less affected by the nuclear power station, but which had suffered similar<br />
earthquake damage, would help to provide more accurate data analysis.<br />
Human analysis can also dismiss links that are relevant, however – one of<br />
the group gave an example in which one of the men subsequently involved<br />
with plotting the 7 July 2005 bombings on the London Underground had<br />
previously shown up in police analysis of a terrorist network but had been<br />
dismissed from further analysis as he was ‘only’ involved in financial crime,<br />
<strong>and</strong> so was assumed to be an ‘ordinary’ criminal who just coincidentally<br />
overlapped with the terrorist network.<br />
It is important to have ‘checks’ that guard against this: techniques are available<br />
that enable an initial filtering of data which can then be revisited, so that<br />
1. Alfred Korblein, ‘Infant Mortality in Japan after Fukushima’, December 2012, ,<br />
accessed 19 July 2014.
data can be analysed in line with presuppositions <strong>and</strong> then remodelled with<br />
the excluded data reintegrated, in order to see whether <strong>and</strong> how the results<br />
differ, thus testing the validity of initial assumptions. Humans inevitably look<br />
<strong>for</strong> patterns <strong>and</strong> may find them where they do not exist.<br />
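The check described above, analysing the filtered data and then repeating the analysis with the excluded data reintegrated, can be sketched as follows. This is only an illustration: the record layout, the "financial" filter and the score threshold are hypothetical, not a technique named by the group.

```python
def compare_with_reintegration(records, keep, measure):
    """Run `measure` on the filtered records, then again on all records.

    records: the complete data set.
    keep:    the initial filter, encoding the analyst's presupposition.
    measure: any function that turns a list of records into a number.
    """
    included = [r for r in records if keep(r)]
    filtered_result = measure(included)
    full_result = measure(records)
    return {
        "filtered": filtered_result,
        "reintegrated": full_result,
        "difference": full_result - filtered_result,
        "excluded_count": len(records) - len(included),
    }

# Hypothetical records: (type of activity, score from some earlier analysis).
records = [
    ("network", 0.9), ("network", 0.7), ("financial", 0.8),
    ("network", 0.5), ("financial", 0.6),
]

result = compare_with_reintegration(
    records,
    # Presupposition under test: financial activity is not relevant.
    keep=lambda r: r[0] != "financial",
    # Measure: how many records score at or above an assumed threshold.
    measure=lambda rs: sum(1 for _, score in rs if score >= 0.75),
)
```

A non-zero difference does not show that the initial assumption was wrong, only that the excluded data change the picture and are worth a second look.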
Social-Network Analysis<br />
There is much potential <strong>for</strong> <strong>Big</strong> <strong>Data</strong> to help identify linkages when used in<br />
conjunction with social network analysis – the mapping <strong>and</strong> measuring of<br />
relationships <strong>and</strong> flows between people, groups, organisations, computers,<br />
URLs <strong>and</strong> other connected in<strong>for</strong>mation. 2 Identifying the key nodes of a<br />
network through social-network analysis will help to generate further<br />
leads. The group felt that this tends to work better with regard to lower<br />
level operatives, but it can also lead to the very top of the network – the<br />
classic example being the role social network analysis played in the capture<br />
of Saddam Hussein. 3 Individuals found via social-network analysis are often<br />
known to law en<strong>for</strong>cement from other operations that may or may not be<br />
considered to be relevant. Amalgamating data sets from many different<br />
operations may help to identify linkages by providing an overview that<br />
will help to put the social-network analysis in context but, to return to the<br />
issue of automated versus human analysis discussed earlier in this paper,<br />
social network analysis is likely to throw up many low-level links, only some<br />
of which are relevant. It may take strong cultural underst<strong>and</strong>ing <strong>and</strong> human<br />
analysis to decide which of the many options are worth pursuing.<br />
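One simple way to identify candidate key nodes is to count each node's direct contacts (degree centrality). The sketch below assumes this measure purely for illustration; the report does not say which measure was used in the cases mentioned, and the node labels and links are hypothetical.

```python
from collections import defaultdict

def degree_centrality(edges):
    """Count the distinct direct contacts of each node in an undirected network."""
    neighbours = defaultdict(set)
    for a, b in edges:
        neighbours[a].add(b)
        neighbours[b].add(a)
    return {node: len(contacts) for node, contacts in neighbours.items()}

def key_nodes(edges, top=2):
    """Return the `top` best-connected nodes as (node, degree) pairs."""
    scores = degree_centrality(edges)
    # Sort by descending degree, then by name so that ties are ordered predictably.
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:top]

# Hypothetical links between six anonymised nodes.
edges = [("A", "B"), ("A", "C"), ("A", "D"),
         ("B", "C"), ("D", "E"), ("E", "F")]

top_nodes = key_nodes(edges)
```

Here several nodes tie on the same number of links, which reflects the point made above: an automated ranking produces many similar low-level leads, and choosing between them still calls for human judgement.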
<strong>Data</strong> Lessons from the Private Sector<br />
The group discussed whether <strong>and</strong> to what extent the commercial sector’s<br />
experiences with <strong>Big</strong> <strong>Data</strong> can be applied to national security. For example,<br />
commercial approaches enable companies to market particular products to<br />
particular individuals based on their previous buying behaviour <strong>and</strong>, while<br />
it may not be immediately obvious that this could aid counter-terrorism or<br />
serious organised crime operations, there is potential benefit in being able to<br />
analyse someone’s previous behaviour in order to predict <strong>and</strong> influence their<br />
future behaviour. The group expressed concern that <strong>for</strong> such an approach<br />
to work, high numbers of skilled human analysts need to be directed at the<br />
available data sets, <strong>and</strong> need to be able to underst<strong>and</strong> additional background<br />
information and context. There is currently a shortage of appropriately skilled<br />
security analysts, but it is not clear whether this is a real skills<br />
shortage or a funding issue: whether there are too few people with the<br />
appropriate skills, or whether those who have these skills are being attracted<br />
2. Org.net, ‘Social Network Analysis, A Brief Introduction’, , accessed 23 April 2014.<br />
3. ‘Case Study: The Capture of Saddam Hussein, War 2.0: The National <strong>Security</strong> <strong>and</strong> the<br />
Science of Networks’, ,<br />
accessed 19 August 2014.
to commercial sector marketing jobs rather than public-sector security work<br />
because of the salaries offered.<br />
Finding out how national security can benefit from the experiences of the<br />
commercial sector will be a continuous learning experience; a multidisciplinary<br />
approach is needed involving social scientists as well as computer experts.<br />
Summary<br />
The analysis of large amounts of data <strong>and</strong> diverse data streams needs to<br />
be multidisciplinary <strong>and</strong> multidimensional. <strong>Big</strong> data can enable real-time<br />
processing of large amounts of in<strong>for</strong>mation, but <strong>for</strong> this to be of value in the<br />
policing, terrorism, crime <strong>and</strong> fraud arenas, a better underst<strong>and</strong>ing is needed<br />
of the value that can be added by the data being collected <strong>and</strong> analysed, along<br />
with more analysis of precisely what this value is <strong>and</strong> how it is added. The<br />
financial sector, in particular, needs to guard against complete automation<br />
in the detection of anomalies: there is no replacement <strong>for</strong> human analysis.<br />
Suggested Research Topics<br />
• More research is needed into how data analysis (<strong>and</strong> data analysts)<br />
can identify <strong>and</strong> interpret missing data <strong>and</strong> data on deviations from<br />
the expected norm. Real-time processing, combined with analysts<br />
who have an underst<strong>and</strong>ing of the context in which the data have<br />
been collected, will help such trends to be picked out more easily <strong>and</strong><br />
interpreted appropriately. Predictive analytics against the norm are<br />
needed, which can look at deviant behaviour at an individual level, as<br />
well as at the wider community level<br />
• A better underst<strong>and</strong>ing is required of how to link data to underlying<br />
causes, along with methodology that can guard against the negative<br />
influence of supposition. Techniques need to be developed that<br />
remodel data sets with excluded or removed data reintegrated, so<br />
that results can be compared <strong>and</strong> differences analysed in order to test<br />
the validity of the initial assumptions<br />
• Better research is needed into ways to remodel data <strong>and</strong> test<br />
assumptions so that a more detailed picture can be built of how data<br />
reflect assumptions. This may help to identify whether some leads<br />
are currently being missed because of inherent biases in the way the<br />
data are approached.
Discussion Group 3: Health Data, Public Health<br />
and Public Health Emergencies<br />
Chair: Chris Watkins<br />
Key Issues <strong>and</strong> Challenges<br />
• The public requires honesty <strong>and</strong> transparency about why health data<br />
are being collected <strong>and</strong> what they will be used <strong>for</strong>. Good communication<br />
is essential <strong>for</strong> building trust in health databases <strong>and</strong> data sets<br />
• Allowing some degree of personal choice over what data are stored,<br />
who they might be shared with <strong>and</strong> in what situations will help<br />
individuals feel in control of their data. This may be important in<br />
gaining acceptance for new public health data strategies; however,<br />
the number of people who choose to sign up to the NHS Organ Donor<br />
Register (ODR), as an example, has been disappointingly low<br />
• The public’s concern over possible discrimination resulting from<br />
health in<strong>for</strong>mation kept on individuals, <strong>and</strong> of the potential misuse<br />
of data, appears to be creating a negative impression of large health<br />
data projects. This needs to be addressed be<strong>for</strong>e health in<strong>for</strong>mation<br />
possibilities can be fully realised.<br />
This group discussed the ways in which big data might be used to positive<br />
effect in public health, both in general <strong>and</strong> during public health emergencies. 1<br />
The main benefits the group identified include opportunities to improve<br />
surveillance so that outbreaks of disease are detected more quickly <strong>and</strong> more<br />
accurately, particularly by developing opportunities <strong>for</strong> self-reporting through<br />
social-media <strong>and</strong> mobile communications plat<strong>for</strong>ms <strong>and</strong> by improving the<br />
dissemination of in<strong>for</strong>mation during the response to an outbreak.<br />
The group felt that systems that exist and are widely used before a<br />
health emergency, and that also have the capacity to cope during the crisis,<br />
are likely to be more useful than entirely novel systems that only come into<br />
play in extreme situations. One valuable area of research, therefore, would<br />
be to consider how easily existing systems could morph from ‘business as<br />
usual’ to ‘public health emergency’ conditions, <strong>and</strong> what additional features<br />
or functionalities might need to be added to the normal system to enable<br />
this. For instance: how easily could current surveillance systems cope with<br />
significantly increased data traffic, or more frequent updating? How rapidly<br />
can different surveillance systems be aggregated together <strong>and</strong> analysed?<br />
1. A public health emergency of international concern is defined by the World Health<br />
Organization as ‘an extraordinary event [that] constitute[s] a public health risk to other<br />
States through the international spread of disease, or that potentially requires an<br />
international response’, , accessed 19<br />
July 2014.
Once such technological challenges are met, the group felt that there are a<br />
number of ways in which data collected over social-media plat<strong>for</strong>ms in particular<br />
can help to improve public health <strong>and</strong> the response(s) to a health emergency. In<br />
some cases, studies are already available. During the 2009–10 H1N1 influenza<br />
outbreak, 2 <strong>for</strong> example, the NHS set up an online service to enable a more<br />
effective distribution of the antiviral drug Tamiflu, which was in short supply.<br />
Self-diagnosis and self-reporting systems that help to collect data on the<br />
number and location of reported symptoms are relatively easy to set up<br />
from a technological standpoint – such as an app that would list symptoms and<br />
allow someone to check which ones they have, report them to the NHS and<br />
receive advice on whether further consultation with a local pharmacist or<br />
their GP is needed – but their efficiency and accuracy are affected by people’s<br />
willingness to engage. This, in turn, would be influenced by the nature of<br />
the disease. Self-diagnosis <strong>and</strong> reporting systems could collect accurate data<br />
on sexually transmitted infections (STIs) <strong>for</strong> example, helping to locate ‘hotspots’<br />
of new outbreaks, while social-network analysis might help to trace<br />
social contacts from whom the STI might have been caught or who are in<br />
danger of having it passed on to them, but whether or not people would<br />
be willing to engage in such self-reporting is a different matter. Apps that<br />
have proven effective in providing in<strong>for</strong>mation on seasonal influenza may<br />
not have been suitable technology to engage during the early days of the<br />
AIDS epidemic, had such technology been available.<br />
Privacy Challenges<br />
Using <strong>Big</strong> <strong>Data</strong> <strong>for</strong> public health benefit is more complex than just the<br />
technological development. There have been several challenges to public<br />
acceptance of large health-related data projects to date, generally over<br />
concerns around privacy, which seem to be particularly acute with regard to<br />
personal health information. There has been considerable public and media<br />
backlash against projects such as the NHS Data Spine, 3 which was intended to provide<br />
a central repository for information on more than 70 million patients from<br />
27,000 individual organisations, and its successor, the Health and Social Care<br />
Information Centre, 4 which was set up in April 2013 under the Health and<br />
Social Care Act 2012, to ‘collect, analyse and present UK national health<br />
and social care data’. Protest groups such as GeneWatch UK 5 and Privacy<br />
International 6 have raised issues around the mere existence of such a<br />
2. NHS, ‘Swine Flu’, , accessed 19 July 2014.<br />
3. Computer Weekly, ‘NHS <strong>Data</strong> Spine out of Action <strong>for</strong> 28 Hours in a Week’, 10 January<br />
2006, ,<br />
accessed 19 July 2014.<br />
4. See , accessed 2 June 2014.<br />
5. See , accessed 2 June 2014.<br />
6. See ,<br />
accessed 2 June 2014.
database, including objections to the ease with which it can be accessed,<br />
who should be allowed access to it <strong>for</strong> research or surveillance purposes,<br />
<strong>and</strong> what rights patients should have to access <strong>and</strong> amend their in<strong>for</strong>mation.<br />
Health professionals <strong>and</strong> policy-makers see the potential benefits of such<br />
a database, which will enable the best available treatments to be targeted<br />
towards individuals based on their health profile <strong>and</strong> potentially even their<br />
genome, as far outweighing the potential <strong>for</strong> misuse, but the UK government<br />
nonetheless needs to ensure that privacy safeguards are in place <strong>and</strong> that<br />
these safeguards are clearly communicated to (<strong>and</strong> trusted by) the general<br />
public in order to ensure public <strong>and</strong> media acceptance of such programmes.<br />
The group felt that poor communication of the benefits of such systems<br />
is most likely to be at the heart of the public <strong>and</strong> media backlash, but<br />
also acknowledged that public trust in large government data projects is<br />
undermined by perceptions that government does not act responsibly<br />
with public data. Such perceptions have been rein<strong>for</strong>ced by the Snowden<br />
revelations published in the Guardian <strong>and</strong> Washington Post since June<br />
2013, in which <strong>for</strong>mer US National <strong>Security</strong> Agency (NSA) contractor Edward<br />
Snowden revealed that the NSA had collected <strong>and</strong> stored large volumes of<br />
Internet communications by private citizens between 2007 <strong>and</strong> 2013 – in<br />
collaboration with a number of private sector companies such as Google <strong>and</strong><br />
Amazon, <strong>and</strong> other national government agencies including the UK’s GCHQ<br />
– under a strategy of collecting everything, <strong>and</strong> only then analysing it to<br />
reveal criminal or terrorist activities, rather than collecting communications<br />
only where there was reason to suspect those communications may contain<br />
something untoward. The revelations have caused public <strong>and</strong> media outrage,<br />
even though in most cases collection <strong>and</strong> sharing had been possible because<br />
the users of plat<strong>for</strong>ms such as Facebook, Yahoo <strong>and</strong> Amazon did not change<br />
default privacy settings that would have prevented their personal in<strong>for</strong>mation<br />
from being shared in this way. Prior to the Snowden revelations, most users<br />
considered the benefits of data sharing on Facebook, including the amount<br />
of data that can be shared <strong>and</strong> the number of people it can reach, to heavily<br />
outweigh the negatives (<strong>and</strong> in practice they still do – there is little concrete<br />
evidence that a significant proportion of individuals are genuinely changing<br />
their behaviour with regard to privacy settings as a result of Snowden).<br />
Fear of Discrimination<br />
Negative attitudes towards <strong>Big</strong> <strong>Data</strong> health projects are largely driven by<br />
fear that the in<strong>for</strong>mation contained in the data will enable the state, private<br />
companies or other agents to discriminate against individuals or certain groups.<br />
For example, certain data might enable insurance companies to discriminate<br />
against individuals with high-risk profiles <strong>and</strong> thus affect the cost of insurance<br />
premiums. An individual with a genetic predisposition to cancer may find it<br />
difficult to take out life insurance or a long-term loan, such as a mortgage.<br />
This is in spite of counter-arguments that awareness of their genetic make-up
may make them more likely to follow a healthy lifestyle that avoids known<br />
triggers to the genetic condition, such as smoking, <strong>and</strong> to attend regular<br />
medical screenings that will pick up emerging conditions early <strong>and</strong> enable<br />
more effective treatment, which may even increase their life expectancy.<br />
Fears include perceptions that if the health-care sector had access to large data<br />
sets <strong>and</strong> was able to share this in<strong>for</strong>mation with other organisations, this might<br />
enable health insurance companies to profile customers to determine whether<br />
they should be given insurance or not. For example, when a customer signs up to<br />
a supermarket reward card, they often accept terms <strong>and</strong> conditions that enable<br />
their buying habits to be shared with a number of third-party organisations,<br />
mostly <strong>for</strong> marketing purposes. The in<strong>for</strong>mation supermarkets collect about<br />
people’s buying habits can be used to make inferences about their lifestyle <strong>and</strong><br />
this could in turn be analysed to make predictions about their likely long-term<br />
health. If supermarkets were to share this in<strong>for</strong>mation with the health sector, it<br />
may enable positive health interventions to be targeted towards communities<br />
or even individuals, <strong>and</strong> to improve planning on what health-care services might<br />
be needed in future, but a potential downside might be that health insurance<br />
companies use the in<strong>for</strong>mation to determine whether a potential customer is a<br />
heavy consumer of alcohol, or whether their diet is particularly unhealthy, <strong>and</strong><br />
approve or deny insurance based on this personal in<strong>for</strong>mation.<br />
Building Trust in Health <strong>Data</strong> Projects<br />
The group felt that with regard to examples such as the one given above, more<br />
research is needed into what uses people will accept or object to regarding their<br />
personal data, <strong>and</strong> in what circumstances. This would help to develop a better<br />
underst<strong>and</strong>ing of how the way in which people are in<strong>for</strong>med of data collection<br />
(including who is doing the collection, <strong>for</strong> what reason, <strong>and</strong> what the data are<br />
likely to be used <strong>for</strong>) determines how they will react to it. There may be very<br />
different reactions from people depending on whether the data are being<br />
collected by the government or by pharmaceutical companies, <strong>and</strong> this may not<br />
be consistent from one country to the next. In Germany, <strong>for</strong> example, where the<br />
legacy of Nazi rule has made the population cautious of allowing government<br />
collection of personal in<strong>for</strong>mation, most health data are collected by private<br />
health companies rather than the state, <strong>and</strong> there are local, federal <strong>and</strong> state<br />
differences in the legal structures surrounding data protection <strong>and</strong> privacy.<br />
Cultural <strong>and</strong> historical factors can strongly affect public perceptions surrounding<br />
the in<strong>for</strong>mation being collected <strong>and</strong> can affect how successful data collection is.<br />
The platform(s) through which information is collected and how it is stored are<br />
also important to consider. The group felt that it is unclear under current EU<br />
law whether in<strong>for</strong>mation stored on a mobile device such as a smart phone<br />
is classed as broadcast in<strong>for</strong>mation or personal in<strong>for</strong>mation, <strong>and</strong> there<strong>for</strong>e<br />
whether or not it is protected by privacy laws; more research is needed on<br />
how existing legislation relates to rapid technological advances.
The discussion group felt that with regard to health-related big data projects<br />
such as the single NHS database, allowing personal choice through an ‘opt-in’<br />
system may be the best approach to building trust. The NHS ODR 7 could be<br />
used as a model, though it is worth noting that only around half of the UK<br />
population (54 per cent of women <strong>and</strong> 46 per cent of men) have opted in.<br />
While this amounts to many millions of individuals, there are also many<br />
millions who do not opt in. A survey was carried out on behalf of NHS Blood<br />
<strong>and</strong> Transplant in 2013 to find out why people do not sign up to the Register. 8<br />
The results are shown in the box below.<br />
Box 1: Reasons why people choose not to opt in to the NHS’s Organ Donor<br />
Register.<br />
30% are aware of the ODR but do not know, or have a confused understanding of, what it is<br />
16% are not aware of the ODR or are unable to say what it is<br />
16% say they do not want to think about their death<br />
15% say they worry that their family might be upset if they donated their organs<br />
12% say they worry that they could still be alive when the operation is carried out<br />
11% say they do not want to donate to someone who does not deserve it<br />
10% believe they are too old.<br />
(respondents were allowed to give more than one reason)<br />
Where opt-in to data collection is required, offering even minor incentives<br />
(which can be as simple as free access to the service, even if the company<br />
offering the services makes a profit from the individuals signing up)<br />
encourages individuals to sign up. For example, the market research company<br />
YouGov offers points <strong>for</strong> signing up <strong>and</strong> participating in its surveys, which can<br />
be redeemed <strong>for</strong> financial remuneration <strong>and</strong> other rewards. 9 Equally, making<br />
it difficult or inconvenient to opt out can also ‘nudge’ behaviour in a certain<br />
7. NHS, ‘Organ Donation, How to Register’, , accessed 3 June 2014.<br />
8. Figures provided on 10 June 2014 by NHS Blood <strong>and</strong> Transplant, from a commissioned<br />
market research report carried out by Optimisa in 2013.<br />
9. YouGov, ‘Join the YouGov Panel Today!’, ,<br />
accessed 2 June 2014.
direction. Facebook and most social-networking sites set low default privacy<br />
settings, as these companies want to be able to collect as much information<br />
on individuals as possible. A conscious decision has to be made to change<br />
to higher privacy settings (though this is starting to change following the<br />
backlash prompted by the Snowden revelations). As the easiest and least inconvenient course of<br />
action <strong>for</strong> a new user is to accept the low privacy settings, most people tend to<br />
do so. With regard to health data projects, the group felt that opt-in systems<br />
would be more likely to generate trust if they began with very high privacy<br />
settings, and allowed users to opt in to various levels of sharing if they were happy for<br />
their information to be shared with other government departments, research<br />
institutions or private-sector companies.<br />
Targeted Data Collection<br />
The group felt that an important consideration in the collection of health data<br />
– <strong>and</strong>, in fact, any data – is that the collection must be targeted to ensure that<br />
collection is necessary <strong>and</strong> efficient. <strong>Data</strong> should not be collected <strong>for</strong> the sake<br />
of collecting data. There are important distinctions here between targeted<br />
analysis (which knows what is being looked for and actively seeks to find it)<br />
and general pattern analysis (the recognition of patterns and regularities in<br />
data, without necessarily knowing what those patterns mean until<br />
further analysis is carried out), though both are useful in determining dynamics within a<br />
data set. For example, general pattern analysis in public health may pick up a<br />
sudden increase in the quantities of influenza drugs being prescribed, <strong>and</strong> the<br />
location(s) in which the prescriptions are being made, which might indicate<br />
<strong>and</strong> identify the beginning of a new p<strong>and</strong>emic, while targeted analysis might<br />
then aim to track family members <strong>and</strong> colleagues of the individuals showing<br />
symptoms, so that they can be tested <strong>and</strong> offered preventative treatments<br />
be<strong>for</strong>e symptoms develop. <strong>Data</strong> subjected to general pattern analysis<br />
should be anonymised (as the patterns will emerge whether the data are<br />
anonymised or not), whereas it is very difficult to carry out targeted analysis<br />
on anonymised data. This raises questions around how <strong>and</strong> at what stage<br />
data are anonymised – the collection of only anonymised data may make<br />
meaningful interpretation or practical action difficult at a later stage, but if<br />
the public is not convinced that data are being appropriately anonymised,<br />
they may be unwilling to provide data in the first place.<br />
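As a purely illustrative sketch of the general pattern analysis described above (it is not taken from the discussion group), the following Python example flags weeks in which prescriptions in a location rise well above the recent baseline. It works only on counts aggregated by location, so no individual is identified. The function name `flag_spikes`, the four-week baseline, the threshold of three standard deviations and the sample figures are all assumptions made for the example.

```python
from statistics import mean, stdev


def flag_spikes(weekly_counts, baseline_weeks=4, threshold=3.0):
    """Return (location, week_index) pairs where a weekly count exceeds
    the mean of the preceding weeks by more than `threshold` deviations."""
    flagged = []
    for location, counts in weekly_counts.items():
        for i in range(baseline_weeks, len(counts)):
            baseline = counts[i - baseline_weeks:i]
            mu = mean(baseline)
            # Use a floor of 1.0 so a perfectly flat baseline does not
            # make every small fluctuation look like a spike.
            sigma = max(stdev(baseline), 1.0)
            if counts[i] > mu + threshold * sigma:
                flagged.append((location, i))
    return flagged


# Hypothetical weekly counts of influenza prescriptions per location.
counts = {
    "Region A": [20, 22, 19, 21, 20, 58],
    "Region B": [15, 14, 16, 15, 17, 16],
}

spikes = flag_spikes(counts)
print(spikes)
```

A flagged location would then be a starting point for targeted analysis which, as noted above, needs data that can be linked back to individuals.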
Summary and Conclusions<br />
<strong>Big</strong> <strong>Data</strong> offers enormous potential benefits to health care, including<br />
improved surveillance so that outbreaks of disease are detected more<br />
quickly and accurately (particularly by developing opportunities for self-reporting<br />
through social media and mobile communications platforms)<br />
and improved dissemination of information during the response to<br />
an outbreak. While the technology exists to enable this, in most cases the<br />
more pressing challenges surround people’s unwillingness to engage. Public<br />
acceptance of large health-related data projects has met several challenges to
date, generally over concerns around privacy, which seem to be particularly<br />
acute with regard to personal health in<strong>for</strong>mation.<br />
Communication of the benefits of such systems needs to be improved to<br />
ensure public trust in large government data projects is not undermined<br />
by perceptions that government is not capable of acting responsibly with<br />
public data. In particular, there are fears that if the health-care sector has<br />
access to large data sets, <strong>and</strong> is able to share this in<strong>for</strong>mation with other<br />
organisations, this may enable health insurance companies <strong>and</strong> other<br />
private-sector companies to profile customers <strong>and</strong> discriminate against<br />
them. These fears need to be addressed, <strong>and</strong> better underst<strong>and</strong>ing is needed<br />
on how in<strong>for</strong>mation regarding such data collection projects can be best<br />
communicated.<br />
An important consideration in the collection of health data – <strong>and</strong> in fact any<br />
data – is that the collection must be targeted to ensure that collection is<br />
necessary <strong>and</strong> efficient. In order to win (<strong>and</strong> maintain) public confidence <strong>and</strong><br />
trust, data should not be collected <strong>for</strong> the sake of collecting data.<br />
Suggested Research Topics<br />
• Research is needed on how easily existing data collection <strong>and</strong><br />
surveillance systems could morph from ‘business as usual’ to<br />
‘public health emergency’ conditions, <strong>and</strong> what additional features<br />
or functionality are needed to enable this. How easily can current<br />
surveillance systems cope with significantly increased data traffic,<br />
or more frequent updating? How rapidly can different surveillance<br />
systems be aggregated together <strong>and</strong> analysed?<br />
• Further research is needed into what uses people will accept or object<br />
to regarding their personal health data, <strong>and</strong> in what circumstances.<br />
This would help to develop a better underst<strong>and</strong>ing of how the way in<br />
which people are in<strong>for</strong>med of data collection (including who is doing<br />
the collection, for what reason, and what the data are likely to be used<br />
for) will influence how they react to it.<br />
• Research is needed into how best to encourage people to opt in<br />
to data collection schemes, possibly by offering minor incentives<br />
that encourage individuals to sign up.<br />
• The volume of health-care data available makes them ideal for subjection<br />
to general pattern analysis. The data should be anonymised for this purpose (as the patterns will emerge<br />
whether the data are anonymised or not), but the analysis may<br />
reveal where targeted interventions will have maximum impact; this<br />
requires the data to be linked to the individuals who will receive<br />
that intervention. Research is needed on how <strong>and</strong> at what stage<br />
data should be anonymised <strong>and</strong> de-anonymised during the analysis<br />
process.
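One way to picture the question of when data are anonymised and de-anonymised is pseudonymisation with a keyed hash: analysts see only a pseudonym, while a separately held table allows an authorised custodian to link a result back to an individual. The Python sketch below is an illustration under stated assumptions, not a method proposed at the conference. Strictly, it is pseudonymisation rather than full anonymisation, and the key, field names and records are hypothetical.

```python
import hashlib
import hmac


def pseudonymise(identifier: str, secret_key: bytes) -> str:
    """Derive a stable pseudonym from an identifier using HMAC-SHA256."""
    return hmac.new(secret_key, identifier.encode("utf-8"), hashlib.sha256).hexdigest()


# Hypothetical key, assumed to be held only by the data custodian.
SECRET_KEY = b"example-key-held-by-data-custodian"

# Hypothetical source records containing a direct identifier.
records = [
    {"patient_id": "P-001", "region": "Region A", "drug": "antiviral"},
    {"patient_id": "P-002", "region": "Region A", "drug": "antiviral"},
    {"patient_id": "P-001", "region": "Region A", "drug": "analgesic"},
]

relink_table = {}      # pseudonym -> identifier, kept apart from the analysis data
analysis_records = []  # what analysts would see: no direct identifiers

for record in records:
    pseudonym = pseudonymise(record["patient_id"], SECRET_KEY)
    relink_table[pseudonym] = record["patient_id"]
    analysis_records.append(
        {"pseudonym": pseudonym, "region": record["region"], "drug": record["drug"]}
    )


def relink(pseudonym: str, table: dict) -> str:
    """Recover the identifier for a pseudonym (an authorised step only)."""
    return table[pseudonym]
```

Because the same identifier always yields the same pseudonym, patterns across records survive, while re-linking depends on access to the separately held table.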
Box 2: Social media and health emergencies.<br />
As well as being a potential tool <strong>for</strong> surveillance <strong>and</strong> data collection, social<br />
media offers opportunities to influence the affected population to take up (or<br />
refrain from) certain behaviours during health emergencies, including enabling<br />
discussions to take place over whether or not suggested behaviours should be<br />
followed. In Asia, a legacy of the 2002–03 outbreak of Severe Acute Respiratory<br />
Syndrome (SARS) is that many people now wear a protective face mask when<br />
they have a cold, or to prevent them from catching a cold. Such masks are not as<br />
popular in Europe, however, <strong>and</strong> there is considerable disagreement over their<br />
usefulness in restricting the spread of infections. Social-media plat<strong>for</strong>ms could<br />
be used to promote discussion on whether wearing masks is useful or not,<br />
<strong>and</strong> where the medical community agrees that a behaviour is beneficial, social<br />
media could be used to disseminate advice. It could also be used to provide<br />
updates on the spread of infections (such as the number <strong>and</strong> location of cases)<br />
<strong>and</strong> predictions on where the disease is likely to spread next.<br />
Community Engagement<br />
During a p<strong>and</strong>emic or other serious disease outbreak, social media could<br />
also be used to raise <strong>and</strong> support community organisations <strong>and</strong> organise<br />
volunteers to carry out activities beneficial to the community, such as<br />
organising shopping collection <strong>and</strong> delivery <strong>for</strong> those infected, so that they<br />
do not need to leave the house, <strong>and</strong> organising regular cleaning of lifts,<br />
staircases <strong>and</strong> other shared areas in blocks of flats. An example of such a<br />
scheme is ‘FluFriends’, 10 which during the H1N1 p<strong>and</strong>emic encouraged people<br />
to arrange who would be able to collect antiviral drugs <strong>for</strong> them should they<br />
become infected, do their shopping <strong>for</strong> them, <strong>and</strong> who they could phone<br />
regularly to say how they were feeling. More recently, FloodVolunteers 11<br />
has enabled individuals to come together <strong>and</strong> request assistance or offer<br />
expertise <strong>and</strong> skills to help those affected by flooding. The technology<br />
needed to co-ordinate community ef<strong>for</strong>ts during a crisis is simple, but the<br />
behavioural factors that will determine whether or not people sign up to<br />
such community networks <strong>and</strong> actively engage with them are more complex.<br />
10. Margaret Lally, ‘Flu Friends – a Possible Alternative’, British Red Cross, ,<br />
accessed 2 June 2014.<br />
11. FloodVolunteers, , accessed 3 June 2014.
Discussion Group 4: Individual Privacy Versus<br />
Community Safety<br />
Chair and Rapporteur: Jennifer Cole<br />
Key Issues and Challenges<br />
• There are situations in which individual privacy <strong>and</strong> community<br />
safety may be in direct conflict, as ensuring community safety may be<br />
dependent on intruding on personal privacies<br />
• Most of the public appear to expect higher levels of data protection<br />
<strong>and</strong> assurance from the public sector than they do from the private<br />
sector; they accept supermarkets in particular gathering, storing<br />
<strong>and</strong> sharing large amounts of in<strong>for</strong>mation on them, but object when<br />
government wants to do the same<br />
• Most surveillance legislation in the UK came into force before social-media<br />
plat<strong>for</strong>ms were widespread, resulting in confusion over how the<br />
current laws relate to this <strong>for</strong>m of communication, <strong>and</strong> in particular<br />
over what constitutes private communication on social media.<br />
The aim of this discussion group was to consider situations in which<br />
requirements <strong>for</strong> individual privacy <strong>and</strong> community safety might come into<br />
conflict with data collection <strong>and</strong> sharing, <strong>and</strong> how academic research might<br />
help to provide a better underst<strong>and</strong>ing of, or solutions to, this dilemma.<br />
The group was provided with examples to consider, such as the police<br />
<strong>and</strong> security services conducting surveillance that might be considered an<br />
invasion of privacy on individuals who pose (or are suspected of posing) a<br />
danger to the wider community because of extreme views or suspected links<br />
to terrorist networks. To what extent does the need to protect society from<br />
such individuals justify monitoring not only the individuals themselves, but<br />
also their social networks, including friends, family members <strong>and</strong> colleagues?<br />
A further example presented was of health services tracking the movements of an<br />
individual suffering from (or suspected to be suffering from) an infectious<br />
disease that might spread to others, or monitoring their recent social media<br />
communications to actively identify people who might have been in contact<br />
with them so that they can be contacted <strong>for</strong> treatment.<br />
The group was asked to discuss whether legislation around such surveillance,<br />
data collection <strong>and</strong> data sharing should be absolute: equally applicable<br />
to a gang of youths suspected of anti-social activity, such as graffiti, as to<br />
a suspected terrorist network, for example, or equally applicable during a routine disease<br />
outbreak involving mild symptoms as during a pandemic of life-threatening severity.<br />
The group was also asked to discuss how political <strong>and</strong> public opinion shapes<br />
surveillance <strong>and</strong> data legislation, how attitudes change over time, <strong>and</strong> what
the drivers of change are likely to be. Are there specific drivers or qualifiers<br />
that enable new legislation to be introduced <strong>and</strong> accepted?<br />
The 11 September 2001 terrorist attacks on the US created an atmosphere<br />
of fear that has led to greater social acceptance of surveillance when this is<br />
explained as being for security purposes. The USA PATRIOT Act was given as an<br />
example of a measure that has curtailed liberties in favour of security. 1 The<br />
group discussed what impact legislation such as this has on privacy, <strong>and</strong> at<br />
what point the balance between security <strong>and</strong> privacy might be considered<br />
to have tipped too far towards security. In times of great need, rules may<br />
change, but how <strong>and</strong> through what processes this should be allowed <strong>and</strong><br />
accepted are not currently understood.<br />
Determining the Risk Threshold for Collection<br />
The conference was held less than a year after the Snowden releases, at a<br />
time when new revelations about US <strong>and</strong> UK government actions under the<br />
controversial PRISM programme were coming to light each month. 2 Privacy<br />
<strong>and</strong> surveillance issues were there<strong>for</strong>e fresh in participants’ minds, <strong>and</strong> the<br />
relationship between them was still a very controversial issue. This applied<br />
particularly to widespread surveillance programmes that collect large volumes<br />
of data against relatively low risk thresholds – in other words, surveillance<br />
programmes that lean far more towards security than privacy. A key point,<br />
the group felt, was to determine what level of activity, or connection to a<br />
network under surveillance, justified placing that individual or community<br />
under surveillance themselves. Set the bar higher <strong>for</strong> privacy, <strong>and</strong> the risk<br />
is that individuals who are involved will not be identified; set the bar higher<br />
<strong>for</strong> security, <strong>and</strong> the authorities will be accused of ‘snooping’ on innocent<br />
people. The group recognised that this is a dilemma <strong>for</strong> governments to<br />
which there is no easy answer, but that academia could help by researching<br />
attitudes to privacy <strong>and</strong> surveillance <strong>and</strong> identifying what decisions are more<br />
likely to be accepted, <strong>and</strong> why.<br />
The group broadly agreed with US President Barack Obama’s declaration that<br />
it is impossible to have ‘100 percent security <strong>and</strong> then have 100 percent<br />
privacy’, 3 <strong>and</strong> acknowledged that the actions of the UK government with<br />
regard to PRISM have been legal <strong>and</strong> based around existing legislation. There<br />
were concerns, however, that such surveillance was only legal because of the<br />
way in which the public seemed happy to sign away privacy rights in the terms<br />
1. The US PATRIOT Act: Preserving Life <strong>and</strong> Liberty, , accessed 16 June 2014.<br />
2. For a full explanation of PRISM, see Leon Kelion, ‘Q&A: NSA’s Prism Internet<br />
Surveillance Scheme’, BBC News, 1 July 2013, , accessed 16 June 2014.<br />
3. Paul Adams, ‘Barack Obama Defends US Surveillance Tactics’, BBC News, 8 June 2013,<br />
, accessed 16 June 2014.
<strong>and</strong> conditions they accept when signing up to social-media plat<strong>for</strong>ms such<br />
as Facebook, and because most surveillance and data assurance regulation<br />
includes clauses stating, or is worded in such a way, that individuals’ rights<br />
to privacy can often be overridden in situations that are loosely defined<br />
in legislation using terms such as ‘public safety’, <strong>and</strong> ‘immediate danger’,<br />
without expressing what would constitute such a situation. Such wording<br />
can there<strong>for</strong>e be argued to amount to ‘get-out clauses’ that enable privacy<br />
to be overridden whenever the government decides.<br />
Article 8.1 of the UK Human Rights Act 1998, 4 which came into <strong>for</strong>ce in<br />
October 2000, states that ‘everyone has the right to respect <strong>for</strong> his private<br />
<strong>and</strong> family life, his home <strong>and</strong> his correspondence’. This is a qualified right,<br />
however, which can be overruled by Article 8.2, which states: 5<br />
There shall be no interference by a public authority with the exercise of<br />
this right except such as is in accordance with the law <strong>and</strong> is necessary<br />
in a democratic society in the interests of national security, public<br />
safety or the economic well-being of the country, <strong>for</strong> the prevention<br />
of disorder or crime, <strong>for</strong> the protection of health or morals, or <strong>for</strong> the<br />
protection of the rights <strong>and</strong> freedoms of others.<br />
In other words, perceived danger to the wider community or society trumps<br />
the right to privacy of the individual.<br />
A second key piece of UK legislation affecting this debate is the Regulation of<br />
Investigatory Powers Act (RIPA) 2000, 6 which was introduced to modernise<br />
laws relating to the interception of communications in order to protect the<br />
public adequately from terrorism, cyber-crime <strong>and</strong> online paedophilia, <strong>and</strong><br />
has attracted criticism from civil rights <strong>and</strong> privacy campaigners such as<br />
Liberty 7 <strong>and</strong> The Open Rights Group, 8 which refer to it as ‘The Snooper’s<br />
Charter’. Section 26(2)(c) states that the Act ‘allows covert surveillance where<br />
there is “immediate danger”’, including directed surveillance undertaken <strong>for</strong><br />
the purposes of a specific investigation or operation. This would appear to<br />
allow surveillance that would actively seek out <strong>and</strong> identify individuals whose<br />
4. Human Rights Act 1998, Article 8, Right to Respect <strong>for</strong> Private <strong>and</strong> Family Life, , accessed 16 June 2014.<br />
5. Human Rights Act 1998, Schedule 1, Article 8, Right to Respect <strong>for</strong> Private <strong>and</strong> Family Life,<br />
, accessed 19 July 2014.<br />
6. Regulation of Investigatory Powers Act 2000, , accessed 16 June 2014.<br />
7. Liberty, ‘State Surveillance’, , accessed 16 June 2014.<br />
8. Digital Surveillance, ‘Why the Snooper’s Charter is the Wrong Approach: A Call <strong>for</strong><br />
Targeted <strong>and</strong> Accountable Investigatory Powers’, The Open Rights Group, , accessed<br />
16 June 2014.
private correspondence suggested that they may be involved in activities<br />
that pose a threat, or potential threat, to wider society; though, again, the<br />
exact terms of this threat are not specified.<br />
While RIPA dictates what in<strong>for</strong>mation can be sought out by the state during<br />
exceptional circumstances, the <strong>Data</strong> Protection Act 1998 9 (in particular the<br />
emergency powers set out under Schedule 2 a, c <strong>and</strong> d) determines to what<br />
extent in<strong>for</strong>mation that might normally be protected can be shared; clauses<br />
exclude protection where the processing is ‘necessary’ <strong>and</strong> ‘<strong>for</strong> the exercise<br />
of any functions of either House of Parliament’. The Act is primarily concerned<br />
with protecting confidentiality <strong>and</strong> imposes a duty on organisations to<br />
ensure that data are used only <strong>for</strong> authorised purposes <strong>and</strong> are properly<br />
protected (HSC/99/012). In addition, the <strong>Data</strong> Protection Act enables sharing<br />
where ‘the data subject has given his consent to the processing’, which has<br />
been hotly debated with regard to how the terms and conditions on social-media<br />
plat<strong>for</strong>ms are accepted by users when they join, <strong>and</strong> whether this can<br />
genuinely be interpreted as in<strong>for</strong>med consent. Since the conference, at least<br />
one court case has challenged the right of Facebook to pass in<strong>for</strong>mation to<br />
the US security agencies. 10 The discussion group also noted that the laws<br />
relating to surveillance, data collection <strong>and</strong> sharing came into <strong>for</strong>ce be<strong>for</strong>e<br />
social media plat<strong>for</strong>ms such as Twitter <strong>and</strong> Facebook, which communicate<br />
one-to-many, existed. To what extent social-media communications over<br />
such plat<strong>for</strong>ms constitute ‘private communication’, <strong>and</strong> how this can be<br />
interpreted under such legislation, still needs considerable debate, to which<br />
academia is well placed to contribute.<br />
Responsibilities of Government<br />
The discussion group agreed that protection of the many – be this a particular<br />
community or society as a whole – is paramount <strong>for</strong> government. Where<br />
there is conflict between this <strong>and</strong> the right of an individual to privacy, the<br />
group felt that it is right <strong>for</strong> the government to focus on the protection of<br />
the many. Nevertheless, there still need to be clearly set boundaries <strong>and</strong><br />
a response that is scalable to the actual risk posed. Academics can help<br />
to define these boundaries <strong>and</strong> to qualify the risk(s). The group discussed<br />
the degree to which individuals should be monitored at different levels of<br />
suspected or known involvement in activity, <strong>and</strong> agreed that rather than<br />
being absolute, this should probably differ depending on what that activity is.<br />
For example, it may be (more) acceptable to begin surveillance on the entire<br />
9. <strong>Data</strong> Protection Act 1998, ,<br />
accessed 16 June 2014.<br />
10. Mary Carolan, ‘Facebook <strong>Data</strong> Transfer Interfered With Privacy Daily, Court Told’,<br />
Irish Times, 30 April 2014, , accessed<br />
16 June 2014; Europe vs Facebook, ,<br />
accessed 16 June 2014.
social network of a suspected terrorist from the moment that individual is<br />
suspected of involvement in terrorist activity, but less acceptable to monitor<br />
all social contacts of a group of youths involved in anti-social but reasonably<br />
harmless graffitiing from such an early stage.<br />
The group felt that different threats, such as minor crime <strong>and</strong> terrorism,<br />
warrant different approaches, <strong>and</strong> there<strong>for</strong>e more research is needed into<br />
how the harm caused by certain activities is defined <strong>and</strong> measured, with<br />
better understanding needed of the extent to which some relatively low-harm<br />
activities overlap with more serious ones. For example, if a definite<br />
link can be identified between graffiti <strong>and</strong> terrorism – in the same way that<br />
definite links have now been identified between financial crime <strong>and</strong> terrorism<br />
– this would help to justify carrying out surveillance on individuals that are<br />
not directly linked to more serious activities but may well be the link on a<br />
social network, or between two social networks, that will lead to the more<br />
serious threats. Academic research could help to highlight which activities<br />
appear to have definite links <strong>and</strong> which do not.<br />
The Privacy Narrative<br />
Some members of the group felt that a small but disproportionately vocal<br />
privacy lobby negatively influence public opinion against surveillance that<br />
only disadvantages those who are breaking the law or who have something<br />
to hide. On this view, innocent citizens have nothing to fear from government surveillance<br />
of their activities <strong>and</strong> so should not mind if such surveillance goes on. It can<br />
be argued that the advantages to a law-abiding citizen of having all personal<br />
communications <strong>and</strong> personal data collected <strong>and</strong> stored <strong>for</strong> potential analysis<br />
so strongly outweigh any perceived disadvantages that no one should object<br />
to it. Explaining this in such a way that people more easily see the benefits<br />
would act as a strong counter to the images conjured up by privacy lobbyists<br />
of sinister government agents spying on innocent citizens’ private lives <strong>and</strong><br />
somehow causing them harm by doing so. A suggested academic research<br />
project that might help to counter such anti-surveillance narratives was a<br />
retrospective look at IRA terrorism <strong>and</strong> the social networks of IRA terrorists.<br />
Could modern social network analysis 11 be applied retrospectively to<br />
historical case studies of IRA terrorism <strong>and</strong> illustrate how the use of such<br />
technology might have been able to construct social networks, identify<br />
other potential terrorists <strong>and</strong> prevent IRA attacks be<strong>for</strong>e they happened?<br />
The group felt that such historical revisiting of past terrorism networks could<br />
help to illustrate the value of storing data on individuals who may appear to<br />
be (or, in fact, actually be) innocent, but who can lead investigators to more<br />
dangerous individuals.<br />
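In a much simplified form, social network analysis of the kind referred to above can be pictured as a search outwards from one individual through a graph of known contacts. The Python sketch below is an illustration only. The graph, the names and the `max_hops` parameter are hypothetical, with `max_hops` standing in for the group's point that the breadth of monitoring should scale with the seriousness of the suspected activity.

```python
from collections import deque


def contacts_within(graph, start, max_hops):
    """Breadth-first search returning {person: hops} for everyone
    reachable from `start` in at most `max_hops` steps."""
    hops = {start: 0}
    queue = deque([start])
    while queue:
        person = queue.popleft()
        if hops[person] == max_hops:
            continue  # do not expand beyond the permitted radius
        for neighbour in graph.get(person, ()):
            if neighbour not in hops:
                hops[neighbour] = hops[person] + 1
                queue.append(neighbour)
    del hops[start]  # the starting individual is not their own contact
    return hops


# Hypothetical contact graph: each key lists that person's direct contacts.
graph = {
    "A": ["B", "C"],
    "B": ["A", "D"],
    "C": ["A"],
    "D": ["B", "E"],
    "E": ["D"],
}

one_hop = contacts_within(graph, "A", 1)   # direct contacts only
two_hops = contacts_within(graph, "A", 2)  # contacts of contacts as well
print(one_hop, two_hops)
```

Raising `max_hops` by one brings in a further layer of people who have no direct link to the starting individual, which is where the proportionality question discussed by the group arises.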
11. Org.net, ‘Social Network Analysis, A Brief Introduction’, , accessed 23 April 2014.
Differing Perceptions of Public and Private Data Collection<br />
The group noted the very different attitude to privacy with regard to the public<br />
<strong>and</strong> private sectors, <strong>and</strong> also recognised that private-sector companies react<br />
to public opinion just as much as politicians, but are often able to change<br />
their procedures <strong>and</strong> policy more rapidly. Public opinion has a great effect on<br />
supermarkets, <strong>for</strong> example, which invest hugely in underst<strong>and</strong>ing customer<br />
psychology. To maintain customer trust (<strong>and</strong> there<strong>for</strong>e custom) they have<br />
to be seen to be acting responsibly with customers’ data. There is still not<br />
enough underst<strong>and</strong>ing of why customers readily sign up <strong>for</strong> supermarket<br />
reward cards, often signing away most of their data protection rights in the<br />
process, when very little in<strong>for</strong>mation is given on where the in<strong>for</strong>mation they<br />
provide will be going or how it will be used. This raised questions within<br />
the group as to why it appears to be less acceptable to the general public<br />
<strong>for</strong> certain government departments to share in<strong>for</strong>mation as readily as the<br />
private sector does. The group had little doubt that Virgin Health is likely to<br />
share customers’ in<strong>for</strong>mation with Virgin Media, but automatic transfer of<br />
in<strong>for</strong>mation from one government department to another is treated with<br />
suspicion by the public, suggesting that the public <strong>and</strong> private sectors are<br />
perceived very differently. Academic research could help to pinpoint what<br />
these differences are.<br />
One explanation that was offered was that the public readily underst<strong>and</strong><br />
that supermarkets (<strong>and</strong> other commercial entities) are motivated by selling<br />
more goods, <strong>and</strong> are there<strong>for</strong>e using their data to target advertising to them.<br />
In general, the public do not object to this as they may well be pleased to<br />
be alerted to new products they may like. The problem with government data<br />
collection is that the benefits are often less apparent, leaving the<br />
public with the perception that the government is trying to catch them out in<br />
some way – to check if they are paying enough tax, or to make sure they are<br />
not claiming benefits to which they are not entitled, <strong>for</strong> example. There are<br />
there<strong>for</strong>e few easily understood advantages to the individual to government<br />
use of such data, but many disadvantages. Better communication of the<br />
benefits, <strong>and</strong> perhaps even some obvious rewards, would encourage<br />
engagement with data projects. One suggestion was that signing up<br />
to NHS health databases should enable individuals to receive discounts<br />
on prescriptions, or to be prioritised on hospital waiting lists above those<br />
who have not signed up. Other members of the group challenged this as<br />
unethical, and considerable research would be needed to understand the<br />
implications before it could be considered.<br />
The International Community<br />
Finally, the group discussed who constitutes the ‘community’ whose security<br />
might benefit from compromises on personal privacy: the community<br />
as a whole – that is, everyone – or just the ruling elite. A major issue in<br />
compromises to personal privacy was the fear of a fascist state using the data<br />
collected on its citizens to discriminate against them or actively harm them.
This leads to different attitudes towards data collection by the state that are<br />
influenced by local and national histories: countries that have been subjected<br />
to harsh regimes in the past may be more cautious about handing over<br />
their data in future. One delegate gave an example of an Eastern European<br />
nation setting up its equivalent of the National Archives in a building that<br />
had previously been the headquarters of its secret police. Though there was<br />
no suggestion that the new archive had any sinister intention behind it, the<br />
public’s willingness to engage with it was clouded by the past associations<br />
of the building in which it was housed. Different cultures react differently to<br />
the sharing of historical records, and while British citizens expect the state to<br />
be subservient to them, this view is not held to the same degree across the<br />
entire European Union. Taking this into account, the relationship between<br />
the community and the individual may differ between regions and nations,<br />
raising new challenges where the community is international and data<br />
may be shared across borders. Again, the group felt that<br />
academia could help to build understanding of these differences, and map<br />
cultural attitudes that may help to predict acceptance of, or resistance to,<br />
the creation of new databases and international data sharing.<br />
Suggested Research Topics<br />
• Academics could try to identify links between different criminal or<br />
anti-social behaviours to help justify carrying out surveillance on<br />
individuals who are not directly linked to more serious activities but<br />
may well be one or two links away on overlapping social networks and<br />
who could help lead to the more serious threats. Academic research<br />
could help to highlight which activities appear to have definite links<br />
with one another and which do not<br />
• Where there is conflict between the need to protect the community<br />
and the right of an individual to privacy, academia can help to<br />
determine where the boundaries should lie and suggest how to<br />
develop a response that is scalable to the actual risk posed. In<br />
particular, academia can help to determine qualitative and quantitative<br />
measurements of the risks<br />
• Historical revisiting of known terrorist and criminal networks using<br />
modern data analysis techniques, and using these to highlight how<br />
such information may have helped to prevent terrorist attacks or<br />
disrupt activity earlier, might help to communicate the benefits<br />
of large-scale surveillance programmes and explain how and why<br />
sacrificing some privacy for security is beneficial.
Research Themes Identified in the<br />
Discussion Groups
Discussion Group 1: Legality and Ethics of Data Sharing<br />
• An in-depth examination is needed of public understanding of the<br />
surveillance and privacy debate, to provide recommendations that<br />
will encourage more people to engage in shaping future policy<br />
• Academic research can help to explain why people’s perceptions of<br />
what is ‘snooping’ and what is not appear to differ so dramatically<br />
depending on who is collecting the data<br />
• Research is needed into how to educate people not to willingly give<br />
up data without questioning what this might enable others to do with<br />
those data. Many users do not understand the potential dangers or<br />
the security vulnerabilities<br />
• Academia should suggest ways in which different levels of privacy<br />
settings and data sharing agreements can be built into online systems,<br />
so that customers genuinely have a choice in whether or not to accept<br />
the terms they are offered.<br />
Discussion Group 2: Policing, Terrorism, Crime and Fraud<br />
• More research is needed into how data analysis (and data analysts)<br />
can identify and interpret missing data and data on deviations from<br />
the expected norm. Real-time processing, combined with analysts<br />
who have an understanding of the context in which the data have<br />
been collected, will help such trends to be picked out more easily and<br />
interpreted appropriately. Predictive analytics against the norm are<br />
needed, which can look at deviant behaviour at an individual level, as<br />
well as at the wider community level<br />
• A better understanding is required of how to link data to underlying<br />
causes, along with methodology that can guard against the negative<br />
influence of supposition. Techniques need to be developed that<br />
remodel datasets with excluded or removed data reintegrated, so<br />
that results can be compared and differences analysed in order to<br />
test the validity of initial assumptions<br />
• Better research is needed into ways to remodel data and test<br />
assumptions so that a more detailed picture can be built of how data<br />
reflect assumptions. This may help to identify whether some leads<br />
are currently being missed because of inherent biases in the way the<br />
data are approached.<br />
Discussion Group 3: Health Data, Public Health and Public Health<br />
Emergencies<br />
• Research is needed into how easily existing data collection and<br />
surveillance systems could morph from ‘business as usual’ to
‘public health emergency’ conditions, and what additional features<br />
or functionality are needed to enable this. How easily can current<br />
surveillance systems cope with significantly increased data traffic,<br />
or more frequent updating? How rapidly can different surveillance<br />
systems be aggregated and analysed?<br />
• Further research is needed into what uses people will accept or object<br />
to regarding their personal health data, and in what circumstances.<br />
This would help to develop a better understanding of how the way in<br />
which people are informed of data collection (including who is doing<br />
the collection, for what reason, and what the data are likely to be used<br />
for) will influence how they will react to it<br />
• Research is needed into how best to encourage people to opt in to<br />
data collection schemes, possibly by offering minor incentives that<br />
encourage individuals to sign up<br />
• The volume of health-care data available makes them well suited to<br />
general pattern analysis. The data should be anonymised for this (as<br />
the patterns will emerge whether the data are anonymised or not),<br />
but the analysis may reveal where targeted interventions will have<br />
maximum impact; this requires the data to be linked to the individuals<br />
who will receive that intervention. Research is needed on how and at<br />
what stage data should be anonymised and de-anonymised during<br />
the analysis process.<br />
Discussion Group 4: Individual Privacy versus Community Safety<br />
• Academics could try to identify links between different criminal or<br />
antisocial behaviours to help justify carrying out surveillance on<br />
individuals who are not directly linked to more serious activities but<br />
may well be one or two links away on overlapping social networks,<br />
and who can help lead to the more serious threats. Academic research<br />
could help to highlight which activities appear to have definite links<br />
with one another and which do not<br />
• Where there is conflict between the need to protect the community<br />
and the right of an individual to privacy, academia can help to<br />
determine where the boundaries should be and suggest how to<br />
develop a response that is scalable to the actual risk posed. In<br />
particular, academia can help to determine qualitative and quantitative<br />
measurements of the risks<br />
• Historical revisiting of known terrorist and criminal networks using<br />
modern data analysis techniques, and using these to highlight how<br />
such information may have helped to prevent terrorist attacks or<br />
disrupt activity earlier, might help to communicate the benefits<br />
of large-scale surveillance programmes and explain how and why<br />
sacrificing some privacy for security is beneficial.