
Big Data for Security and Resilience<br />

Challenges and Opportunities for the Next Generation of Policy-Makers<br />

Proceedings of the Conference ‘Big Data for Security and Resilience: Challenges and Opportunities for the Next Generation of Policy-Makers’, March 2014<br />

Edited by Jennifer Cole<br />

STFC/RUSI Conference Series No. 4<br />

Conference Report, October 2014<br />

www.stfc.ac.uk<br />

www.rusi.org


A joint publication of RUSI and the STFC, 2014.<br />

Royal United Services Institute for Defence and Security Studies<br />

Whitehall<br />

London<br />

SW1A 2ET<br />

UK<br />

Science and Technology Facilities Council<br />

Polaris House<br />

North Star Avenue<br />

Swindon<br />

SN2 1SZ<br />

Editor: Jennifer Cole<br />

Sub-editor: Susannah Wright<br />

Individual authors retain copyright of their contributions to this publication.<br />

This report may be copied and electronically transmitted freely. It may not be reproduced in a different form without prior permission of RUSI and the STFC.


Contents<br />

Foreword<br />
Bryan Edwards<br />

Introduction: Machine Learning for Big Data<br />
Alex Gammerman and Jennifer Cole<br />

I. The National Archives, Big Data and Security: Why Dusty Documents Really Matter<br />
Tim Gollins<br />

II. Trends in Big Data: Key Challenges for Skills<br />
Harvey Lewis<br />

III. Big Data and Financial Transactions: Providing New Means of Analysis<br />
Gregory Mandoli<br />

IV. Characteristics of Terrorist Finance Networks: The Human Element<br />
Neil Bennett<br />

V. Terrorism and Political Risk Modelling<br />
Mark Lynch<br />

VI. Intelligent Use of Electronic Data to Enhance Public Health Surveillance<br />
Edward Velasco<br />

VII. The Raxibacumab Experience: The First Novel Product Approved Under the US Food and Drug Administration ‘Animal Rule’<br />
Chia-Wei Tsai<br />

Discussion Groups<br />
Rapporteurs: Philippa Morrell, Chris Sheehan, Ed Hawker<br />

Discussion Group 1: The Ethics and Legality of Big Data Sharing<br />
Chair and Rapporteur: Edward Hawker<br />

Discussion Group 2: Policing, Terrorism, Crime and Fraud<br />
Chair: David Smart; Rapporteur: Philippa Morrell<br />

Discussion Group 3: Health Data, Public Health and Public Health Emergencies<br />
Chair: Chris Watkins<br />

Discussion Group 4: Individual Privacy Versus Community Safety<br />
Chair and Rapporteur: Jennifer Cole<br />

Research Themes Identified in the Presentations and Discussion Groups<br />

An additional three presentations were given at the conference by Professor John Parkinson of the Medicines and Healthcare products Regulatory Agency (MHRA), Michael Connaughton of Oracle, and Dr Catriona McLeish of the University of Sussex. For a variety of reasons, no written papers were produced for these presentations, but we would still like to acknowledge their contribution to the event. The PowerPoint presentations given by Michael Connaughton and Professor Parkinson, as well as those delivered by the speakers who have contributed a written paper, can be accessed on the RUSI website events page here: http://goo.gl/9cXC3g.


Foreword<br />

Bryan Edwards<br />

Of all the challenges facing the UK today, few are as demanding as those affecting its national security. Some threats to the UK and its citizens are modern variants of those that the country has faced for many years. Others are entirely new and different to anything that has preceded them; while some, no doubt, have yet to be recognised or understood.<br />

One feature of this large, complex and constantly evolving array of challenges is that few, if any, lend themselves to single-discipline solutions.<br />

With this in mind, the Science and Technology Facilities Council (STFC) operates a Defence, Security and Resilience Futures Programme. Challenge-led and agnostic with respect to academic discipline, the STFC’s aim is to identify and facilitate opportunities to engage relevant capabilities within the UK National Laboratories and university research groups in relation to some of the highest-priority and most demanding challenges in national security.<br />

As part of this programme, the STFC is delighted to fund and proud to collaborate closely with RUSI in delivering a series of conferences on topical issues within this domain.<br />

Each meeting is designed to explore the interface between academic research and government policy and operations, in order to stimulate debate on how a step change, rather than incremental change, in the protection of the UK could be achieved. The meetings are strategic in character, with contributions from an atypically broad community drawn from universities, industry, government and its agencies and partners.<br />

At the forefront of the organisers’ minds is a deceptively simple question: what academic research can offer now, and in the future, to allow government to further enhance its capabilities in key areas, enabling it either to do significantly different things or to do what it does now in significantly different and better ways.<br />

In this context, Big Data is often identified as being of particular importance. Certainly, there is little doubt that raw data are being generated at what appears to be an accelerating rate. This is a trend that seems set to continue for the foreseeable future. Not only that, but complementary improvements in data storage technologies and telecommunications infrastructure mean that more of these data can be archived (potentially indefinitely) and accessed on a global basis. And yet volume alone is insufficient to fully appreciate either the nature of the challenge or the opportunities that exist. Indeed, if Big Data were defined according to volume alone, there would be few grounds for claiming a revolution. For example, during the 1990s, the strategy of the UK’s Department for Social Security sought to migrate benefits, such as unemployment benefits and pensions, from traditional paper-based systems to IT systems. The data volumes associated with this enterprise were large, even by today’s standards. It is therefore necessary to look instead at other characteristics of the data to identify what is qualitatively different, and to establish the source of the challenges and opportunities we are now presented with. These include features such as the diversity of the data, in terms of type and reliability. These in turn create new challenges for the development of the automated data analysis and interpretation systems required. This raises questions not only over how one could, in principle, approach the analysis of such data, but equally how systems based on these new principles could themselves be tested, verified and validated.<br />

While these technical challenges are significant, there are additional complexities associated with data residing in different organisations, and with a population that is becoming increasingly aware of and sensitive to the possibility that data, whose ownership they question, may be exploited in ways they consider inappropriate.<br />

In this meeting we look at some of the technical challenges that Big Data presents, and consider a range of possible uses of and perspectives on data to tease out new issues. In the course of a one-day event, the scope for exploring them in detail is extremely limited. However, it is hoped that identifying relevant questions to be explored elsewhere is, in itself, a useful contribution to the debate.<br />

I would very much like to acknowledge the generous assistance and support offered by the US Department of Homeland Security, which contributed to making the day a success. Similarly, thanks must go to the staff at the STFC and RUSI, whose extremely hard work made this event possible. However, the final word of appreciation and gratitude is reserved for all those who participated so enthusiastically on the day itself, whether as speakers or as delegates.<br />

Anyone wishing to know more about the STFC’s Defence, Security and Resilience Futures Programme in general, or about these conferences in particular, is invited to contact me using the e-mail address below.<br />

Professor Bryan Edwards<br />

Science and Technology Facilities Council<br />

bryan.edwards@stfc.ac.uk


Introduction: Machine Learning for Big Data<br />

Jennifer Cole and Alex Gammerman<br />

This paper discusses the impact of the current high level of interest in Big Data from academia and industry, and comments on how this is influencing the approach taken to funding research and developing skills in particular areas of computer science. It also discusses the relationship between Big Data and machine learning – systems that have the ability to learn from data, rather than only following explicitly programmed instructions – and the influence Big Data has on machine learning.<br />

For Big Data (or, for that matter, Small Data) to have any value, machine learning needs to be applied in order to extract useful information from the data. The current approach to Big Data arguably places too much focus on the data as an end in themselves at the expense of properly considering the techniques and approaches that will enable the best use to be made of them. For example, in 2012 the International Data Corporation estimated that while the global data supply had reached about 2.8 zettabytes (1 zettabyte equalling 10<sup>21</sup> bytes), only an estimated 0.5 per cent of all data collected is used for analysis.<sup>1</sup> There is little point in Big Data per se; a problem needs to be defined first, and only then can the amount of data needed to solve it be decided.<br />

As a way of extracting useful information from data (irrespective of whether they are Big or Small Data) along with the academic disciplines and research that have contributed (and continue to contribute) to it, machine learning has much to offer in determining how the data are collected, analysed and used.<br />

Buzzwords in Computer History<br />

Big Data is a buzzword (or two), and it is not the first time in computer science that a new concept has been hailed as the answer to everything. In 1982, the Japanese Ministry of International Trade and Industry (MITI) began the Fifth Generation Computer Systems (FGCS)<sup>2</sup> project to develop a supercomputer that would further develop artificial intelligence. The British response to the Japanese challenge was the Alvey Programme<sup>3</sup> in information technology. At that time, the way forward for artificial intelligence was largely considered to be expert systems: computer systems that could help a human in the decision-making process by emulating the reasoning abilities of an expert. Such systems were supposed to solve everything. Gradually, however, as it became clear that expert systems have narrow and limited areas of application, unsubstantiated claims died down and the boom was over.<br />

1. John Gantz and David Reinsel, ‘The Digital Universe in 2020: Big Data, Bigger Digital Shadows and Biggest Growth in the Far East’, International Data Corporation and EMC, 2012, , accessed 2 July 2014.<br />

2. Ehud Shapiro, ‘The Fifth Generation Project – a Trip Report’, Communications of the ACM (Vol. 26, No. 9, 1983), pp. 637–41, , accessed 2 July 2014.<br />

3. The Alvey Programme, , accessed 30 July 2014.<br />

The expert systems boom has much in common with the Big Data hullabaloo being experienced today. There seems to be an assumption that everything can be resolved by Big Data. It is somewhat naive to assume that theory is no longer needed to solve problems, just a lot of data and an ability to calculate a correlation between various items of data. This is nonetheless what some proponents of Big Data say.<sup>4</sup> The myth persists that Big Data will provide the answers to all our questions. Big Data will not do this, but combined with machine learning it may help to provide some of them.<br />

Big Data and Machine Learning<br />

Modern machine learning exists at the intersection between statistics and computer science.<sup>5</sup> Two main topics – inference (the process of reaching a conclusion from known facts) and data analysis – have been taken from statistics. In particular, non-parametric statistics (which makes no assumptions about probability distributions) has developed many methods and algorithms that are in use in machine learning. On the other hand, the questions of how to develop efficient algorithms and how to represent knowledge, including the distinction between tractable, intractable and non-computable functions, come from computer science.<br />

Basically, machine learning tries to find regularities within past (or training) data (or examples) that allow the user to make predictions about future examples. This is done irrespective of the amount of data – big or small.<br />

Researchers at Royal Holloway, University of London, have been doing this for years: in 1998, the Computer Learning Research Centre<sup>6</sup> was established there and today two prominent Royal Holloway researchers are working in the field of statistical learning theory (SLT) with Vladimir Vapnik and Alexey Chervonenkis, the theory’s founders.<br />

4. Chris Anderson, ‘The End of Theory: The Data Deluge Makes the Scientific Method Obsolete’, Wired, 16 July 2008, , accessed 30 July 2014.<br />

5. Many disciplines, such as psychology, mathematics, philosophy, linguistics and biology, contribute to machine learning, but the main ones at present are statistics and computing.<br />

6. Computer Learning Research Centre, Royal Holloway, University of London, , accessed 30 July 2014.<br />

Classical statistics usually deals with small scales and low dimensions of data; conceptual and computational difficulties may begin to arise when there are complex, sizable and high-dimensional data (roughly speaking, where the number of attributes or features is greater than the number of examples). Several machine learning methods are being developed to deal with these problems, including online predictions, parallel algorithms and efficient methods. Some of the new techniques being developed at Royal Holloway include string kernel techniques, prediction with expert advice and online conformal predictors (or transductive confidence machines) – new learning techniques that make valid predictions. These techniques have been applied in a number of areas, for example for automatic target recognition, statistical profiling of offenders for the Home Office, material identification and atmospheric correction for military applications, and anomaly detection to identify suspicious behaviour of ships and other vehicles. These techniques have also been applied to several medical fields, for example for detecting various abdominal diseases and ovarian cancer, and finding the best treatment for depression.<br />
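As a rough illustration of what a ‘valid’ prediction means for the conformal predictors mentioned above, the following Python sketch computes a p-value for each candidate label and keeps every label whose p-value exceeds a chosen significance level. The one-dimensional toy data, the nearest-neighbour nonconformity score and all function names are assumptions made for this illustration; they are not taken from the paper or from any Royal Holloway software.

```python
from typing import Dict, List, Set, Tuple

Example = Tuple[float, int]  # (feature value, class label)


def nonconformity(index: int, bag: List[Example]) -> float:
    """How 'strange' example `index` looks within `bag`.

    Assumed score: distance to the nearest example with the same label
    divided by distance to the nearest example with a different label.
    """
    x, y = bag[index]
    same = [abs(x - xj) for j, (xj, yj) in enumerate(bag) if j != index and yj == y]
    other = [abs(x - xj) for j, (xj, yj) in enumerate(bag) if j != index and yj != y]
    if not same:
        return float("inf")  # nothing supports this label
    if not other:
        return 0.0           # nothing contradicts this label
    d_same, d_other = min(same), min(other)
    if d_other == 0:
        return float("inf") if d_same > 0 else 1.0
    return d_same / d_other


def p_values(train: List[Example], x_new: float, labels: List[int]) -> Dict[int, float]:
    """For each candidate label, the fraction of examples at least as strange as the new one."""
    result = {}
    for label in labels:
        bag = train + [(x_new, label)]  # tentatively assign the label
        scores = [nonconformity(i, bag) for i in range(len(bag))]
        new_score = scores[-1]
        result[label] = sum(s >= new_score for s in scores) / len(bag)
    return result


def prediction_set(train: List[Example], x_new: float,
                   labels: List[int], significance: float) -> Set[int]:
    """Keep every label that cannot be rejected at the given significance level."""
    return {lab for lab, p in p_values(train, x_new, labels).items() if p > significance}


train = [(0.0, 0), (0.1, 0), (0.2, 0), (0.3, 0),
         (1.0, 1), (1.1, 1), (1.2, 1), (1.3, 1)]
print(p_values(train, 0.15, [0, 1]))
print(prediction_set(train, 0.15, [0, 1], significance=0.2))
```

Under the usual assumption that examples are exchangeable, a set built this way excludes the true label with probability at most the significance level; that guarantee, rather than the particular score used here, is what ‘valid’ refers to.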

One of the central questions in the theory of learning concerns the quantity of data needed in order to achieve a solution with a desirable degree of accuracy. A simple pattern recognition system to classify digits (0–9) can learn to recognise and correctly predict a shown digit after being trained on only a few hundred digits out of the hundreds of thousands of digits available for training.<sup>7</sup> That is only a fraction of the data, but enough to solve the problem. Pattern recognition systems often need surprisingly small amounts of data to obtain an answer.<br />

While intuitively it seems that the more data are used, the more accurate the prediction will be, the founders of SLT,<sup>8</sup> Vapnik and Chervonenkis, have shown that it is not just the length of the training data that is important, but also a concept called ‘capacity’ or ‘VC-dimension’ (after Vapnik and Chervonenkis). Roughly speaking, the VC-dimension is the number of parameters of a decision rule. The important factor for the quality of learning is the ratio of the length of the training set to the VC-dimension. A large ratio is ‘good’ from a learning perspective, because the results obtained on the test set are then close to those on the training set and ‘overfitting’ is avoided: the test set should show about the same accuracy (number of errors) as the training set.<br />
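The role of this ratio can be made concrete with one commonly quoted form of the Vapnik–Chervonenkis bound, which is an addition here and is not given in the paper: with probability at least 1 − η, the test error exceeds the training error by no more than sqrt((h(ln(2l/h) + 1) − ln(η/4)) / l), where l is the length of the training set and h the VC-dimension. The Python sketch below simply evaluates that term; the function name and the example values are illustrative assumptions.

```python
import math


def vc_confidence(n_examples: int, vc_dim: int, eta: float = 0.05) -> float:
    """Gap term of the VC bound: sqrt((h * (ln(2l/h) + 1) - ln(eta/4)) / l).

    n_examples is the training-set length l, vc_dim is the VC-dimension h,
    and the bound holds with probability at least 1 - eta.
    """
    if n_examples <= 0 or vc_dim <= 0 or not 0 < eta < 1:
        raise ValueError("need l > 0, h > 0 and 0 < eta < 1")
    l, h = n_examples, vc_dim
    return math.sqrt((h * (math.log(2 * l / h) + 1) - math.log(eta / 4)) / l)


# Same capacity (h = 10), increasingly long training sets.
for l in (100, 1_000, 100_000):
    print(f"l/h = {l // 10:>6}: gap term = {vc_confidence(l, 10):.3f}")
```

For a fixed VC-dimension the term shrinks as the ratio l/h grows, which is the sense in which a large ratio is ‘good’: training and test accuracy are then guaranteed to be close.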

7. Alex Gammerman and Volodya Vovk, ‘Hedging Predictions in Machine Learning’, The Computer Journal (Vol. 50, No. 2, 2007), pp. 151–63.<br />

8. Olivier Bousquet et al., ‘Introduction to Statistical Learning Theory’, Max Planck Institute for Biological Cybernetics, 2004, , accessed 2 July 2014.<br />

If, however, machine learning algorithms must be applied to Big Data and the analysis cannot be handled on one machine, parallel algorithms can be developed and run on parallel machines. This requires more efficient methods to be developed, which is currently a challenge, though some progress is being made. For example, in addition to well-known methods such as induction, there are some advances in developing transductive methods.<sup>9</sup> In induction, particular examples are used to formulate a general rule, and predictions are then made using this rule. Transduction instead goes from one example to another, which should be more efficient as the model does not have to solve an infinite number of examples, just find one particular example, which will in turn predict the next one. This could be a way forward for developing new, efficient algorithms for prediction.<br />
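The contrast between induction and transduction can be sketched in a few lines of Python. This is only a toy illustration under stated assumptions: the inductive learner compresses the training examples into a general threshold rule, while a simple nearest-neighbour lookup stands in for transduction by going directly from stored examples to one specific test point. The nearest-neighbour stand-in, the data and the function names are assumptions for illustration and are not Vapnik’s transductive inference procedure.

```python
from statistics import mean
from typing import Callable, List, Tuple

Example = Tuple[float, int]  # (feature value, class label 0 or 1)


def fit_threshold_rule(train: List[Example]) -> Callable[[float], int]:
    """Induction: turn the examples into a general rule usable for any future x."""
    mean0 = mean(x for x, y in train if y == 0)
    mean1 = mean(x for x, y in train if y == 1)
    threshold = (mean0 + mean1) / 2
    high_label = 1 if mean1 > mean0 else 0

    def rule(x: float) -> int:
        return high_label if x > threshold else 1 - high_label

    return rule


def predict_from_nearest(train: List[Example], x_new: float) -> int:
    """Transduction-style step: answer for this one point straight from the examples."""
    _, nearest_label = min(train, key=lambda example: abs(example[0] - x_new))
    return nearest_label


train = [(0.0, 0), (0.1, 0), (0.2, 0), (0.3, 0),
         (1.0, 1), (1.1, 1), (1.2, 1), (1.3, 1)]

rule = fit_threshold_rule(train)                      # general rule, built once
print(rule(0.15), predict_from_nearest(train, 0.15))  # both routes, same test point
```

The inductive route pays once to build a rule that covers every possible input; the transduction-style route does no work until a specific example needs a prediction, and then only solves that one case.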

Conclusions<br />

There is currently a lot of research into machine learning taking place and new algorithms are being developed. They are both simple and rigorous, and give a wide range of statistical learning methods. John Poppelaars<sup>10</sup> compared the current belief in Big Data with a fictional computer, Deep Thought, in The Hitchhiker’s Guide to the Galaxy, which took 7.5 million years to compute the answer to the ultimate question of life, the universe and everything, but because the beings who had programmed it never really knew what the question was, nobody knew what to make of the answer. Nowadays, people hope that Big Data will help to find the ultimate question, but if we slightly paraphrased The Hitchhiker’s Guide to the Galaxy, we would argue that it is not Big Data that will define the question: it is machine learning.<br />

Jennifer Cole is a Senior Research Fellow in Resilience and Emergency Management at the Royal United Services Institute, where her research programme has included a number of reports and projects on the use of Big Data and cyber-security for the UK government, including the Foreign Office and Ministry of Defence. She is also a PhD candidate in the Computer Science Department at Royal Holloway, University of London.<br />

Professor Alex Gammerman studied in Leningrad (now St Petersburg) and then worked in several research institutes of the Academy of Science of the USSR. In 1983 he moved to the UK. He was appointed to the established Chair in Computer Science at the University of London (Royal Holloway and Bedford New College) in 1993. Currently, he is Founding Director of the Computer Learning Research Centre at Royal Holloway, University of London, and a Fellow of the Royal Statistical Society. Professor Gammerman’s research interest lies in the field of machine learning, particularly the development of inductive–transductive confidence machines. Areas in which these techniques have been applied include medical diagnosis, forensic science, genomics, environment and finance.<br />

This is a version of the paper written by the authors and can be found at http://clrc.rhul.ac.uk/publications/techrep.htm<br />

9. Vladimir Vapnik, The Nature of Statistical Learning Theory (New York, NY: Springer, 1995).<br />

10. John Poppelaars, ‘Will Big Data End Operations Research?’, 2013, , accessed 30 July 2014.


I. The National Archives, Big Data and Security: Why Dusty Documents Really Matter<br />

Tim Gollins<br />

This paper discusses three linked propositions. The first concerns the way in which the National Archives, as a national institution of the United Kingdom, can be regarded as a repository of Big Data. The paper will discuss the concept of Big Data and place it in the historical context of archival collections that have transformed the world, for example, the King of Assyria’s Library and the Library at Alexandria. Second, it will consider the way in which the National Archives are central to UK security, providing a point of reference for society, and supporting citizens’ rights and the rule of law. It will also discuss the potential threat that emerges from a loss of trust in the processes that underlie the transfer of records to the Archives. Third, the paper will cover how the challenges of sensitivity reviews of digital records, which ensure that sensitive government records are archived appropriately,<sup>1</sup> could give rise to further threats to the Archives and thus the wider security of our society. The paper goes on to show that addressing the challenges of the sensitivity review of digital records, by using the Big Data nature of archives, creates opportunities to counter the wider threats to the security of our society.<br />

The Archives and Big Data<br />

The classic definition of Big Data rests on volume, variety and velocity,<sup>2</sup> and is inherently assumed to be digital. Taking a longer view, there are a number of points in history where such transformative conditions have existed with collections of other media, such as:<br />

• The 30,000 clay tablets from the oldest surviving royal library in the world: that of Ashurbanipal, King of Assyria (around 668–630 BC), including the story of Gilgamesh<sup>3</sup><br />

• The iconic Library of Alexandria, alleged to have collected the knowledge of the ancient world under one roof (including 400,000–700,000 rolls within the collection).<sup>4</sup><br />

1. National Archives, ‘Step 3: Sensitivity Reviews of Selected Records’, ,<br />

accessed 25 July 2014.<br />

2. Anton Chuvakin, ‘Broadening Big Data Definition Leads to Security Idiotics!’, Gartner blog, 18 September 2013, , accessed 18 July 2014.<br />

3. British Museum, ‘The Library of Ashurbanipal, Research Project at the British Museum’, , accessed 19 August 2014.<br />

4. Heather Phillips, ‘The Great Library of Alexandria’, Library Philosophy and Practice, 2010, , accessed 18 July 2014.



In comparatively more recent times, as the practice and conventions of common law developed in Britain, the need to collect the records of cases and to access legal judgments for precedent gave rise to another example of Big Data of its day. Drawing on information from the National Archives Catalogue,<sup>5</sup> we learn that ‘The Dialogus de Scaccario’, describing Exchequer administration in the 1170s, mentions a clerk who was deputy to the chancellor and had responsibility for the preparation and custody of formal Chancery enrolments. Thereafter, the chancellor’s principal clerk was invariably associated with these duties, although progressively more and more remote from their direct execution; by 1388, and probably long before, a staff of subordinate clerks carried out the actual enrolments. From the mid-thirteenth century, this officer was generally known as the ‘keeper of the rolls’, and, as the first rank of Chancery clerks gradually came to be known as ‘masters’, the title ‘Master of the Rolls’ had become the standard designation by the fifteenth century. The holder of that post now chairs the Lord Chancellor’s Advisory Council, which assures the transfer of records to the Archives.<sup>6</sup><br />

Bringing the picture up to date, the paper holdings of the National Archives at Kew amount to over 1 billion pages, representing 1,000 years of history.<sup>7</sup> At the same time, there are now over 2.5 billion archived pages accessible from the UK Government Web Archive (representing less than 20 years of contemporary history)<sup>8</sup> that are now being aggregated and mined to answer novel research questions that would have previously been intractable. The Archive is, and always has been, Big Data.<br />

The Archive and Security

Discussion of security should not be limited to considerations of criminality and terrorism. The security of UK society relies at its deepest level on the trust of the citizen in the state. It is all about the rule of law and the fact that no one, not even the executive, is above that rule. 9 The British state is different from many others in that the citizen expects the state to be subservient to the citizen, rather than the reverse, which is the more common case. This is the very fabric of UK society; the rule of law supports and empowers the citizen.

5. National Archives Catalogue, ,<br />

accessed 19 August 2014.<br />

6. National Archives Advisory Council Information, , accessed 19 August 2014.

7. The authors’ own estimate based on approximately 12 million entries in the National Archives’ catalogue that refer to boxes or folders of records that can reasonably be expected to hold upwards of 100 sheets of paper.

8. National Archives UK Government Web Archive Information, , accessed 18 July 2014.

9. The Rule of Law definition, LexisNexis, , accessed 18 July 2014.



The National Archives are fundamental to this aspect of security. The Archives provide the impartial witness that enables ‘holding to account’ under the rule of law and in the court of history. They contain evidence of the transactions of the state and the executive and evidence of the decisions and policies enacted. This is central to Lord Bingham’s Fourth Principle: ‘Ministers and public officers at all levels must exercise the powers conferred on them in good faith, fairly, for the purpose for which the powers were conferred, without exceeding the limits of such powers and not unreasonably.’ 10 How can we know what the executive has done if the records are not kept?

However, it is clearly not sufficient to consider the keeping of the record without considering how the record is selected and transferred to the Archives. The content of the Archives is clearly dependent on these processes. It follows therefore that the citizen must trust the process by which the Archives receive their material to sustain their rights.

Transfer to the Archive<br />

The process by which public records are transferred to the National Archives is not widely understood, even among scholars who regularly use its content for their research. The principles of the appraisal that underlies transfer were laid down by the great archivist Hilary Jenkinson, who described many of the fundamentals of the UK system. 11 In setting out his approach, Jenkinson was trying to ensure that the UK archive (at that time the Public Record Office) was able to guard its independence under the rule of law, and could not fall foul of the criticism of complicity in wrongdoing that was evident in the case of the Nazi Archive in Germany with respect to the Holocaust. 12

In summary, the transfer process consists of the following steps:<br />

• Appraisal and selection: determining which records meet the collection policy of the National Archives and then choosing which records should be transferred to the Archives or to a place of deposit
• Sensitivity review: deciding which records should be open on transfer, which must be closed, and which must be retained in departments (under the ‘Lord Chancellor’s blanket’ – see below)
• Preparation and delivery: the cataloguing, preparation and

10. IAP Annual Conference, ‘The Rule of Law in Prosecuting Big Businesses in Application to Regulatory Frameworks’, 2013, p. 2, , accessed 18 July 2014.

11. Hilary Jenkinson, A Manual of Archive Administration (London: P. Lund, Humphries &<br />

Co Ltd, 1963 [1923]).<br />

12. Eric Westervelt, ‘Probe Details Culpability of Nazi-Era Diplomats’, NPR, 28 October<br />

2010, , accessed<br />

18 July 2014.



organisation of records for transfer and the actual transportation of records to the National Archives or to a place of deposit
• Accessioning: the process by which the National Archives makes the records appropriately available.

A Threat<br />

The principle of independence derived from and identified in the Grigg Report 13 that initiated the Public Records Act 1958 has led, over the years, to a series of checks and balances intended to ensure that the necessary records of the activities of the executive are deposited. These checks and balances include: 14

• The right of access to information in departments under freedom of information legislation before information is transferred
• Departments’ responsibility for the selection of the records, and for the identification of any sensitivity in the records that would cause an exemption under freedom of information legislation
• The fact that the exemptions that can be applied to delay transfer are prescribed in law and their application can be challenged through the Information Commissioner and thence by appeal to the Information Tribunal
• The public visibility of the selection criteria that the departments must apply – as agreed with the National Archives
• The National Archives’ process of oversight during the creation of the criteria and the Archives’ process of monitoring their application
• The publication of information regarding transfers
• The formal oversight of the timeliness of the transfer process and the application of freedom of information exemptions by the Lord Chancellor’s Advisory Council on Public Records.

Unfortunately, in 2012, negative publicity 15 concerning the ‘migrated archives’ of the colonial administrations (papers of the British administrations which should have been passed to the Public Record Office in a timely fashion but were wrongly kept at the government’s Hanslope Park facility) and subsequent questions concerning other collections of documents at the Foreign Office raised the issue of the degree of trust in this system.

13. James Grigg, Report of the Committee on Departmental Records, Cmnd 9163 (London:<br />

HMSO, 1954).<br />

14. National Archives, History of the Public Records Act, ,<br />

accessed 18 July 2014.<br />

15. Ian Cobain and Richard Norton-Taylor, ‘Sins of Colonialists Lay Concealed for Decades in Secret Archive’, Guardian, 18 April 2012, , accessed 22 July 2014.



While the process of selection, sensitivity review and transfer is in principle an open one, the process is complex and there are opaque aspects (not least, the use of the Lord Chancellor’s Security and Intelligence Instrument, known colloquially as ‘the Lord Chancellor’s blanket’, which is used to protect specific aspects of national security). 16 Such an arrangement, in which the shape of the process is open and yet the detail of the data passing through the process must be hidden (since to reveal that detail would render the process moot), creates a situation in which conspiracy theorists can ply their trade. 17

In essence, it can look as though the establishment has something to hide, and such appearances are important. When someone of the eminence of Professor Margaret MacMillan, who is in no sense a conspiracy theorist, feels compelled to challenge her own definitive works on the First World War, we should take note. 18 For trust to be maintained in the Archives, it is clear that any further barriers to the timely, open and transparent transfer of records must be avoided.

Sensitivity Review of Digital Records<br />

The argument set out in this paper so far applies to all public records regardless of format or media. There are, however, particular consequences of the transition to the use of digital records that need to be considered.

During the three decades from 1984 to 2014, administrative practices have been transformed by the introduction of a sequence of waves of technology. This started with the photocopier and moved on to the personal computer (PC), the local area network, the internet, a wide range of mobile devices and, most recently, the ‘cloud’. All of these technologies created the ability and tendency to duplicate and proliferate information in ever-increasing volumes. This process was piecemeal and began in the early 1990s, but by the middle of the first decade of this century all UK government records were digital. The impact of these technologies and the transformation of

16. Notes on the Lord Chancellor’s Security and Intelligence Instrument, , accessed 18 July 2014.

17. National Archives, ‘20 Year Rule: Record Transfer Report’, , accessed 30 September 2014.

18. Quoted in the Guardian: ‘I am one of many historians who has benefited from using the British archives and who had confidence that the documents had not been weeded to suit particular interests. Now I am wondering whether I will have to go back and rethink my work on such matters as the outbreak of the First World War or the peace conference at the end. But when are we going to get the complete records? So far the pace of transferring them is stately, to put it politely.’ Ian Cobain, ‘Academics Consider Legal Action to Force Foreign Office to Release Public Records’, Guardian, 13 January 2014, , accessed 19 August 2014.



administrative practice on the records of the public sector has not been examined in detail; however, a detailed examination of the format and nature of the evidence presented to the Hutton Inquiry 19 does not paint a positive picture. 20 In the evidence, the paper trail for a decision was no longer in a single Manila file; instead, the record was found in a blizzard of e-mails sent from person to person and stored on multiple computing systems. It would appear that the previously clear and unambiguous rules for the creation and management of information in the public services have been challenged.

In July 2012, the government announced the transition towards releasing records when they are twenty years old, instead of thirty 21 (as has been the case since the amendment to the Public Records Act in 1967). 22 From 2013, two years’ worth of government records will be transferred to the National Archives each year through a ten-year transition period until a new ‘20-year rule’ is in place in 2023. The records covered by this transition are those from 1983 to 2003, 23 coinciding with the time during which the most extreme aspects of the technical changes mentioned above took place.
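The arithmetic of the transition can be sketched in a few lines. This is only an illustration under a stated assumption: that each transfer year from 2013 takes in two consecutive record years, starting with 1983. The function name `transition_schedule` and its parameters are hypothetical and are not drawn from any National Archives publication.

```python
def transition_schedule(first_transfer_year=2013, first_record_year=1983, years=10):
    """Map each transfer year to the pair of record years assumed to be transferred."""
    schedule = {}
    for offset in range(years):
        transfer_year = first_transfer_year + offset
        # Assumption: two consecutive record years are taken in per transfer year.
        start = first_record_year + 2 * offset
        schedule[transfer_year] = (start, start + 1)
    return schedule


schedule = transition_schedule()
for transfer_year, record_years in schedule.items():
    print(transfer_year, record_years)
```

Under this assumption the ten transition years cover the records of 1983 to 2002, and in 2023 a record that is twenty years old dates from 2003, which is consistent with the range given in the text.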

When examining the process of transfer described above, and considering the impact of the change to digital records, it is clear that all of the steps in the process need to be examined. Appraisal and selection, preparation and delivery, and accessioning will all present challenges to departments and the Archives but there are a number of mitigations, including the doctrine of macro appraisal and the recent developments in digital preservation at the National Archives. 24 It is the process of sensitivity review that generates the most significant challenges and where considerable work is needed to identify mitigations.

Additional Threats<br />

The challenges of digital records to the process of sensitivity review are as<br />

follows:<br />

19. Lord Hutton, Report of the Inquiry into the Circumstances Surrounding the Death of Dr<br />

David Kelly C.M.G. [the Hutton Inquiry], HC 247 (London: The Stationery Office, 2004),<br />

, accessed 18 July 2014.<br />

20. Michael Moss, ‘The Hutton Inquiry, the President of Nigeria and What the Butler Hoped to See’, English Historical Review (Vol. 120, No. 487, June 2005), pp. 577–92, , accessed 19 August 2014.

21. National Archives, ‘Government Confirms Transition to a 20-Year Rule Will Begin from<br />

2013’, 13 July 2012, , accessed<br />

18 July 2014.<br />

22. Public Records Act 1967, , accessed<br />

22 July 2014.<br />

23. Ibid.<br />

24. Tim Gollins, ‘Putting Parsimonious Preservation into Practice’, The National Archives, 2012, , accessed 25 July 2014.



• Volume and resources: Following advances in office technology during the late twentieth century, the consequent proliferation of information, and the broadening of the interest of the scholarly community, a much greater volume of material is being deemed worthy of preservation in the digital age. Against a background of budgetary constraint, the manual review of digitally born records is not practical
• Complex context: Technology has challenged earlier clear and unambiguous rules for the creation and management of information. This situation will significantly complicate the process of digital sensitivity review, as understanding a record’s context (including its distribution) is crucial in assessing its sensitivity
• Risk: These challenges for review also occur in a context of significantly increased risk. Although the consequences of mistaken disclosure have not changed with the advent of digital records, the probability of discovering a mistake has. It is hard to discover particular information in the paper world, in marked contrast to the digital environment where ubiquitous search engines index content rapidly. Risk-averse depositors may feel obliged to close large swathes of records if they cannot efficiently and effectively determine the sensitivity of each individual record with some clear degree of certainty.

If sensitivity review of digitally born records is not practical then, against a background of budgetary constraint and increasing litigation, large swathes of records will be closed in their entirety for long periods (up to 120 years in the case of some exemptions) unless something is done. Such precautionary closure (due to the costs or difficulty of review) is permissible under freedom of information legislation, but it will contradict citizens’ expectations of openness in a democratic society and will only serve to exacerbate the threat to trust in the Archives, as described above, and the subsequent threat to our security.

Opportunities<br />

While digital records may challenge sensitivity review, and this may give rise to threats to our wider security, their very nature also offers opportunities to address those challenges and counter the threats. Some of the opportunities are as follows:

• Some sensitivities are not subtle. They can relate to specific terms and thus an appropriately configured search system should be able to highlight them. For example, the records that related to the Al-Yamamah Contract, 25 although still available on the Campaign Against Arms Trade (CAAT) website, have been closed officially to prevent further damage to international relations.

25. David Leigh and Rob Evans, ‘Secrets of al-Yamamah’, Guardian, [no date], , accessed 18 July 2014.



• Consistency: by using electronic means, it is possible to drive some consistency across the review process.
• Accurate estimation of residual risk: unlike in the review of paper records, it is possible to estimate the risk posed by reviewed records using the concept of technologically assisted digital review.
• Exploitation of the Big Data aspects of digital records, coupled with machine learning applied in the context of information retrieval technology, can result in patterns emerging that can inform reviewers of where to look.
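The first opportunity in the list above, that some sensitivities relate to specific terms which a search system could highlight, can be sketched as follows. This is a minimal illustration and not a description of any system in use at the National Archives: the function `flag_records`, the record identifiers, the sample text and the watch list are all hypothetical.

```python
import re


def flag_records(records, terms):
    """Return {record_id: [matched terms]} for records that mention any watch-list term."""
    if not terms:
        return {}
    # Escape each term so punctuation such as hyphens is matched literally.
    pattern = re.compile("|".join(re.escape(term) for term in terms), re.IGNORECASE)
    flagged = {}
    for record_id, text in records.items():
        hits = sorted({match.group(0).lower() for match in pattern.finditer(text)})
        if hits:
            flagged[record_id] = hits
    return flagged


# Hypothetical records and watch list, for illustration only.
sample_records = {
    "record-1": "Briefing note on the Al-Yamamah contract negotiations.",
    "record-2": "Minutes of the staff canteen committee.",
}
watch_list = ["al-yamamah"]
flagged = flag_records(sample_records, watch_list)
print(flagged)
```

A literal term match of this kind only highlights candidates for a human reviewer; it says nothing about sensitivities that depend on context, which the text identifies as the harder problem.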

All of the above requires significant research, first to determine what the digital record looks like, and then to demonstrate the opportunities that can be derived.

Conclusion<br />

Freedom of information does not relate solely to openness. There is a fundamental difference between openness (driven by what the state wants its citizens to see) and freedom of information, which ascribes the right of accessing information to the individual. 26 Freedom of information creates a balance between the public interest, the state interest and the personal interest based on human rights, all mediated and governed by the rule of law.

Balance is crucial to achieving freedom of information alongside openness. Limits on openness are necessary for reasons of national security (for example, the location of Britain’s nuclear weapons should not be revealed, nor should their targeting information). Individuals also need to be protected from harm, and this has to be done through some limits on public access to information. However, the ability to hold the executive to account under the rule of law and in the court of history is also central to the security of a modern democratic society. This can only be achieved through open and transparent access to the records of government.

How these challenges play out in the digital age of Big Data requires significant research, in order to gain a better understanding of how public records have changed and thus how they can be sensitivity reviewed and appropriately archived.

Tim Gollins is currently an Honorary Research Fellow in the School of Computing Science at Glasgow University, working on the technically assisted sensitivity review of digital public records while on secondment from the National

26. S Curtis, ‘Information Commissioner, “Open data is no substitute for freedom of information”’, Daily Telegraph, 29 October 2013, , accessed 29 July 2014.



Archives. Tim started his career in the UK civil service in 1987 and joined the National Archives in April 2008 to lead the delivery and procurement workstream of the Digital Continuity Project. Tim was part of the team that developed the National Archives business information architecture and helped to initiate work on the new Discovery system to enable users to find and access the records held at the National Archives. He has recently worked on the design and implementation of a new digital-records infrastructure at the National Archives, which embodies the new parsimonious preservation approach he developed. Tim is a Director of the Digital Preservation Coalition and a member of the University of Sheffield I-School’s Advisory Panel.


II. Trends in Big Data: Key Challenges for Skills

Harvey Lewis

This paper will explain how Deloitte, one of the largest professional services networks in the world, has used Big Data both internally and in the services it provides to clients. The paper will address three main points: the role of Big Data at Deloitte, the challenges that exist surrounding the competency and basic skill sets of staff working with these data, and the current trends in Big Data and how they impact methodologies and practices. A particular area of growing interest is open data 1 – data that can be freely used, reused and distributed by anyone, subject only, at most, to the requirement to attribute and share alike 2 – which is providing a new source of resources for organisations in the public and private sectors. This paper will examine the impact this is having on ethics, responsibility and business efficiency within companies and governments, and amongst civilians.

What is Big Data? The widely cited and accepted 2001 Gartner 3 definition lists the three Vs: volume, velocity and variety. In Deloitte, the term is often used to describe data that are too rich or complex to analyse well in a spreadsheet and without concepts from university-level statistics. Deloitte has used Big Data in a number of ways. For example, as Hurricane Irene was bearing down on the US in 2011, Deloitte helped one large US retailer to combine information about curfews and road closures culled from social media with storm maps from the National Weather Service and GPS data from its own trucks to prepare for the storm’s impact on operations and devise a logistics strategy for response and recovery.

Despite some reservations about the reliability and usefulness of social media, it is nevertheless proving to be a useful additional source of data for providing insights. For example, it can be used to identify instances of foodborne illnesses or other public health incidents, helping officials to work backwards from the spread portrayed in social media to the retail location and the distributor, and so on, as recently illustrated in analysis Deloitte performed on an outbreak of pet-food poisoning in the US. The social networks embodied in social media also provide useful clues about influence, which has been investigated by mapping physicians understood to be

1. Deloitte, ‘Open Data: Driving Growth, Ingenuity and Innovation’, June 2012, , accessed 15 July 2014.

2. Open Data Handbook, ‘What is Open Data?’, , accessed 15 July 2014.

3. Douglas Laney, ‘3D Data Management: Controlling Data Volume, Velocity and Variety’, Gartner blog, 2001, , accessed 20 August 2014.



exceptionally influential in pharmaceutical networks. These sorts of projects have considerable reach across to security and resilience areas.

Skills Challenges for Security and Resilience

From a security and resilience perspective, the first challenge that needs to be addressed is over-reliance on technology, which can be manifested in three ways. First, the assumption that technology has to come first exemplifies how the lure of Big Data is driving bad decision-making. Second, upstream technology choice dominates downstream activities, which is very important – particularly from a public-sector procurement perspective. The range of different technologies and the choices that can be made early in programmes may significantly influence what is able to happen downstream. Furthermore, the infrastructure for education and skills-based learning is scarce; there are far too many ‘car drivers’ in Big Data and not enough ‘mechanics’. There are not enough people who understand the algorithms, logic and computer science behind the platforms they use, an understanding that would allow them to be more innovative and creative in devising new solutions.

The third challenge that exists is particularly acute for security and resilience: namely, that of ensuring that no stone is left unturned while identifying and extracting the useful data from the pile. The problem with Big Data is that the availability of these data, and the ease with which they can be stored, inadvertently lead to a desire to collect everything that can be collected. From a security and resilience perspective, it is paramount that analysts are trained to identify necessary information and to be very selective about the data they collect, process and analyse. More data does not always equal more information. For example, to build a model that predicts the outcome of a coin toss, one can store either all the outcomes from individual coin tosses, or simply the total number of tosses and the number of times the coin came up heads. In the first instance, every piece of data is captured but it provides no further insight than can be extracted from the second instance.
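The coin-toss point can be checked with a small sketch. It assumes, as the example implicitly does, that the tosses are independent and that the coin's bias does not change, so the order of outcomes carries no extra information. The function names and the sample sequence are hypothetical.

```python
from fractions import Fraction


def estimate_from_sequence(tosses):
    """Estimate P(heads) from the full record of outcomes ('H' or 'T')."""
    heads = sum(1 for toss in tosses if toss == "H")
    return Fraction(heads, len(tosses))


def estimate_from_counts(total, heads):
    """Estimate P(heads) from two stored numbers only."""
    return Fraction(heads, total)


# Hypothetical record of ten tosses, for illustration only.
tosses = list("HTHHTHTTHH")
full_estimate = estimate_from_sequence(tosses)
summary_estimate = estimate_from_counts(len(tosses), tosses.count("H"))
print(full_estimate, summary_estimate)
```

Both routes give the same estimate, yet the second stores two integers however many tosses are recorded. If the independence assumption did not hold (for example, if streaks mattered), the full sequence would carry information that the counts discard.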

On the other hand, a different problem exists in data selection. For example, in the Second World War, the UK’s Bomber Command performed a comprehensive survey of anti-aircraft weapons damage on its bomber fleet and recommended that armour be placed in those areas most susceptible to damage. The problem was that the sample of bombers surveyed was biased. It did not include the bombers that had not returned, which may well have shown additional areas of damage, which were not being factored into the analysis. Ultimately, this flaw was detected by the newly formed Operations Research Group, which recommended that armour be placed in areas showing least signs of damage. This reiterates the challenges addressed by Alex Gammerman and Jennifer Cole in the introduction: when data are available at very high volumes and rates, the problem is how to pick out the data that are actually needed (or, in the



case of Bomber Command, to realise that the most valuable data relate to what is missing, not what is in front of you).
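The sampling bias in the bomber survey can be reproduced with a toy simulation. Every number here is invented for illustration and has no historical basis: the aircraft sections, the loss probabilities and the assumption of one hit per aircraft are all hypothetical, as is the function name `simulate`.

```python
import random
from collections import Counter

# Hypothetical sections and loss probabilities; not historical data.
SECTIONS = ["engine", "cockpit", "fuselage", "wings"]
LOSS_PROBABILITY = {"engine": 0.8, "cockpit": 0.8, "fuselage": 0.1, "wings": 0.1}


def simulate(n_aircraft, seed=0):
    """Return (hits on all aircraft, hits seen on aircraft that returned)."""
    rng = random.Random(seed)
    all_hits = Counter()
    surveyed_hits = Counter()
    for _ in range(n_aircraft):
        section = rng.choice(SECTIONS)  # assume one hit, equally likely anywhere
        all_hits[section] += 1
        returned = rng.random() >= LOSS_PROBABILITY[section]
        if returned:
            surveyed_hits[section] += 1  # only returning aircraft can be surveyed
    return all_hits, surveyed_hits


all_hits, surveyed_hits = simulate(10_000)
least_damaged = min(SECTIONS, key=lambda s: surveyed_hits[s])
print("hits on all aircraft:     ", dict(all_hits))
print("hits seen in the survey:  ", dict(surveyed_hits))
print("least damage in survey on:", least_damaged)
```

In this sketch hits are spread evenly across all aircraft, but the survey of survivors under-represents the sections where a hit is most often fatal, so the section showing least damage among returning aircraft is one of the most dangerous places to be hit.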

Analysts also need to guard against mistaking correlation <strong>for</strong> causality. <strong>Data</strong><br />

alone are not sufficient to answer any question that might be thrown at them.<br />

This is a particular challenge <strong>for</strong> researchers <strong>and</strong> analysts when it comes to<br />

finding interesting insights that no one has discovered be<strong>for</strong>e. It is important<br />

to underst<strong>and</strong> the root of the correlation, <strong>and</strong> to be able to assess whether<br />

or not it makes sense. For example, between 2006 <strong>and</strong> 2011, the annual<br />

number of murders in the US dropped from nearly 16,000 to just over 14,000 (a reduction of<br />

13.5 per cent), 4 <strong>and</strong> during the same period the market share of Microsoft’s<br />

Internet Explorer web browser also fell, from over 60 per cent to 20 per<br />

cent. 5 The two graphs showing these figures can be superimposed on top<br />

of one another, but this does not mean that as people became less likely to<br />

choose Internet Explorer as their preferred browser, they also became less<br />

likely to commit murder. There is correlation, but no causation.<br />
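A short calculation makes the point concrete. The sketch below assumes, purely for illustration, that both quantities fell in a straight line between the approximate endpoints quoted in the text; the yearly values are interpolated, not the published figures, and the helper `pearson` is written here for the example. Any two series that decline steadily over the same period will produce a correlation coefficient close to 1, whatever they measure.

```python
from math import sqrt


def pearson(xs, ys):
    """Return the Pearson correlation coefficient of two equal-length series."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


years = list(range(2006, 2012))
# Assumed straight-line interpolation between the endpoints in the text.
murders = [16000 - 400 * i for i in range(len(years))]   # 16,000 down to 14,000
ie_share = [60 - 8 * i for i in range(len(years))]       # 60% down to 20%

r = pearson(murders, ie_share)
print(f"Correlation between the two series: {r:.3f}")
```

The coefficient says only that the two lines move together. Deciding whether one could plausibly drive the other requires knowledge of the domain, which the data themselves do not supply.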

Finally, another significant challenge comes with mistaking the equations<br />

<strong>and</strong> models that analysts generate <strong>for</strong> insight. For example, the output of<br />

a regression model – the mathematical equation, as illustrated in Figure 1<br />

– is not the same as the insight that might be gleaned when it is applied to a<br />

particular business domain or problem.<br />

Figure 1: Regression equation presented as business insight.<br />

\[ y(x) = \frac{e^{b_0 + b_1 x}}{1 + e^{b_0 + b_1 x}}, \qquad b_0 = 2.298057, \quad b_1 = 30.023823 \]

The means by which insight is generated is not the insight <strong>and</strong>, in many<br />

cases, means nothing to anyone but the mathematicians who created it. It is<br />

in the interpretation <strong>and</strong> use of these models that value is derived, <strong>and</strong> this<br />

interpretation will depend on how data are visualised <strong>and</strong> the context into<br />

which the data fit.<br />
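To illustrate the gap between the equation and the insight, the sketch below simply evaluates the logistic curve from Figure 1 using its two coefficients. The source does not say what x or y represent, so the function name `logistic` and the sample inputs are assumptions made for the example. The code yields numbers between 0 and 1; turning them into a decision still depends on knowing what is being predicted and for whom.

```python
from math import exp

# Coefficients as printed in Figure 1.
B0 = 2.298057
B1 = 30.023823


def logistic(x, b0=B0, b1=B1):
    """Evaluate y(x) = e^(b0 + b1*x) / (1 + e^(b0 + b1*x))."""
    z = b0 + b1 * x
    if z >= 0:
        # Algebraically identical form that avoids overflow for large z.
        return 1.0 / (1.0 + exp(-z))
    e = exp(z)
    return e / (1.0 + e)


# Hypothetical inputs: the meaning of x is not given in the source.
for x in (-0.2, -0.1, 0.0, 0.1):
    print(f"x = {x:+.2f}  ->  y = {logistic(x):.4f}")

# The curve crosses 0.5 where b0 + b1*x = 0.
midpoint = -B0 / B1
print(f"y = 0.5 at x = {midpoint:.4f}")
```

The output is a table of values and a crossover point. Whether a value of 0.9 means "investigate this transaction" or "approve this loan" is a question of context, which is where the analyst adds value.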

Future Trends in <strong>Big</strong> <strong>Data</strong><br />

Deloitte has identified seven current trends in <strong>Big</strong> <strong>Data</strong>: identifying people<br />

with the right talent to do the right things; visualising data appropriately<br />

so that they can be easily understood; recognising the value of machine<br />

learning in interpreting <strong>and</strong> analysing data; developing better data discovery<br />

4. Wall Street Journal, ‘Murder in America’, , accessed 15 July 2014.<br />

5. W3 Schools, Browser Statistics <strong>and</strong> Trends, , accessed 15 July 2014.



plat<strong>for</strong>ms; improving planning <strong>for</strong> how to get the most from data; improving<br />

the techniques <strong>for</strong>, <strong>and</strong> the use of, predictive data; <strong>and</strong> addressing the death<br />

of the data warehouse – an end to collecting <strong>and</strong> storing vast amounts of<br />

data <strong>for</strong> the sake of it.<br />

Those using <strong>Big</strong> <strong>Data</strong> need to know how to recognise <strong>and</strong> underst<strong>and</strong> the<br />

opportunities it offers, based on the trends that can be seen <strong>for</strong> the next year<br />

<strong>and</strong> beyond. Companies need to be able to spot trends that are going to help<br />

reduce the cost <strong>and</strong> ef<strong>for</strong>t associated with processing complex data, or those<br />

that will improve the marginal returns from <strong>Big</strong> <strong>Data</strong> into something more<br />

significant – more signal, <strong>and</strong> less noise.<br />

In conclusion, the starting point with <strong>Big</strong> <strong>Data</strong> should be the objective or<br />

question that needs to be addressed. The data <strong>and</strong> technology are the<br />

means to the end; they are necessary but not sufficient. What do we want<br />

to do with the data? Where do we want them to take us? These are the<br />

questions that will drive innovation <strong>and</strong> creativity. Even when the data<br />

<strong>and</strong> the technology are available (<strong>for</strong> example, from social media), is there<br />

really any benefit in using them, <strong>and</strong> how is this determined? Finally, even<br />

when the data are there, if collecting, processing <strong>and</strong> analysing them is<br />

not going to be cost-efficient, there is nothing wrong with looking at other ways<br />

of achieving the same end.<br />

Harvey Lewis is Research Director, <strong>Data</strong> <strong>and</strong> Analytics at Deloitte. Based in<br />

London, he leads a team of researchers <strong>and</strong> data scientists investigating<br />

<strong>Big</strong> <strong>Data</strong>, open data <strong>and</strong> trends in analytics. He also leads focused research<br />

projects <strong>for</strong> clients in both the public <strong>and</strong> private sectors. He has spent twenty<br />

years in the in<strong>for</strong>mation technology industry, <strong>and</strong> specialises in analytics,<br />

cyber-security <strong>and</strong> national security. Harvey has authored numerous reports,<br />

white papers <strong>and</strong> blog posts. He is a frequent media commentator, <strong>and</strong> has<br />

contributed to many articles in the national <strong>and</strong> trade press. Harvey holds a<br />

BEng (Hons) <strong>and</strong> a PhD from Southampton University.


III: <strong>Big</strong> <strong>Data</strong> <strong>and</strong> Financial Transactions: Providing<br />

New Means of Analysis<br />

Gregory M<strong>and</strong>oli<br />

A good government implies two things: first, fidelity to the object<br />

of government, which is the happiness of the people; secondly, a<br />

knowledge of the means by which that object can be best attained. 1<br />

The Importance of Adaptation<br />

During the course of its history, the US has been confronted with, <strong>and</strong> has<br />

responded to, incidents threatening its welfare. Regrettably, it often takes<br />

a crisis to catalyse a critical review of current affairs <strong>and</strong> the creation of<br />

new operational paradigms. The events of 9/11 illustrate this. Typically,<br />

programmes evolve slowly, <strong>and</strong> it is not until numerous injustices are tallied<br />

or a catastrophe hits that a major shift occurs.<br />

Much has been made of the intelligence failures that led to 9/11. After<br />

9/11, the US federal government was <strong>for</strong>ced to recognise <strong>and</strong> seek a remedy<br />

to its lack of operational cohesiveness <strong>and</strong> to the lack of in<strong>for</strong>mation sharing<br />

among federal, state <strong>and</strong> local agencies. Seemingly overnight, the regulatory,<br />

inspection, interdiction <strong>and</strong> investigative focus of government shifted <strong>and</strong><br />

the global War on Terror began.<br />

Clearly, this new conflict impacted on the American psyche in a deep <strong>and</strong> visceral<br />

way. It also stirred cynicism, scrutiny <strong>and</strong> a public appetite <strong>for</strong> redressing the<br />

defects that exist in governmental administration. The surge of interest in<br />

strengthening public agencies prompted the passage of the Homel<strong>and</strong> <strong>Security</strong><br />

Act (HLSA) of 2002 <strong>and</strong> the creation of the Department of Homel<strong>and</strong> <strong>Security</strong><br />

(DHS), both of which aim to enhance the per<strong>for</strong>mance of all strata of government.<br />

In the immediate aftermath of 9/11, the HLSA <strong>and</strong> DHS seemed symbolic<br />

surrogates <strong>for</strong> the Twin Towers, though significant doses of confusion <strong>and</strong><br />

dysfunction accompanied the rapid creation of a new department with its<br />

inaugural group of twenty-two agencies. The creation of DHS was similar<br />

to what the UK is experiencing today with the creation of the National<br />

Crime Agency (NCA) <strong>and</strong> the rebr<strong>and</strong>ing <strong>and</strong> realignment of the UK Border<br />

Agency into three distinct entities: the Border Police Comm<strong>and</strong>, Home Office<br />

Immigration En<strong>for</strong>cement <strong>and</strong> UK Border Force.<br />

Homel<strong>and</strong> <strong>Security</strong> Investigations (HSI) is the principal investigative agency<br />

within DHS. It is unique among the world’s law en<strong>for</strong>cement agencies because<br />

of its capability to investigate persons <strong>and</strong> property across borders <strong>and</strong> to<br />

1. James Madison, The Federalist (No. 62, 27 February 1788).



pursue violators within the US or overseas. HSI also has special en<strong>for</strong>cement<br />

powers that enable it to conduct border searches at ports of entry – functional<br />

equivalents of the border <strong>and</strong> the extended border. It can also prosecute<br />

violators via criminal, civil <strong>and</strong> administrative judicial processes.<br />

As such, HSI brings together assets <strong>and</strong> capabilities that do not exist in any<br />

other agency. Originally identified as Immigration <strong>and</strong> Customs En<strong>for</strong>cement,<br />

but rebr<strong>and</strong>ed as HSI in 2011, it merges elements from its legacy immigration<br />

<strong>and</strong> customs agencies into a globally oriented police <strong>for</strong>ce. HSI employs<br />

a ‘points of genesis’ investigative methodology that focuses on tackling<br />

transnational crime where it begins. The points of genesis approach allows<br />

threats to be attacked at inchoate stages, be<strong>for</strong>e they are fully <strong>for</strong>med <strong>and</strong><br />

more threatening. This is more effective <strong>and</strong> efficient than waiting until<br />

the ‘point of commission’, when a crime is more developed <strong>and</strong> difficult to<br />

counter. This is a corollary of pre-emptive self-defence, a readily recognised<br />

precept in international law.<br />

As a result, HSI deploys agents worldwide to work with <strong>for</strong>eign police services.<br />

HSI works closely with many UK partners, but primarily the National Crime<br />

Agency, City of London Police, Metropolitan Police Service, Police Scotl<strong>and</strong><br />

<strong>and</strong> the Police Service of Northern Irel<strong>and</strong>. These bilateral engagements<br />

significantly augment capacity <strong>and</strong> render a return on investment <strong>for</strong> the<br />

respective agencies as well as the overall US law en<strong>for</strong>cement mission.<br />

The New Paradigm<br />

From an academic perspective, homel<strong>and</strong> security is a new <strong>and</strong> not fully<br />

defined discipline, though after more than a decade the concept has been<br />

widely accepted to encompass more than counter-terrorism. Clearly, other<br />

manmade or natural entities, movements <strong>and</strong> phenomena threaten our<br />

national security, such as illegal immigration, street gangs, illegal drugs <strong>and</strong><br />

natural disasters, to name just a few. Thus, a reasonable interpretation of<br />

the concept m<strong>and</strong>ates that homel<strong>and</strong> security must be comprehensive <strong>and</strong><br />

address all hazards, though the criteria <strong>for</strong> what constitutes a hazard are not<br />

strictly defined, nor should they be. Unnecessary constraint is antithetical to<br />

a ‘light is right’ mentality that promotes strategic <strong>and</strong> operational fluidity in<br />

an environment where metastasis of modus oper<strong>and</strong>i <strong>and</strong> criminal networks<br />

is rapid.<br />

For HSI the threats are clear. Its mission is to conduct criminal investigations<br />

against terrorist <strong>and</strong> other criminal organisations that threaten US national<br />

security <strong>and</strong> seek to exploit America’s legitimate trade, travel <strong>and</strong> financial<br />

systems – further in<strong>for</strong>mation on how this is carried out is shown in Box<br />

1. 2 HSI will also support other DHS agencies in the response <strong>and</strong> recovery<br />

2. Immigration <strong>and</strong> Customs En<strong>for</strong>cement, Homel<strong>and</strong> <strong>Security</strong> Investigations, , accessed 17 July 2014.



phases of natural disasters. The mission is purposefully broad <strong>and</strong> elastic to<br />

ensure focus <strong>and</strong> acuity against transnational criminal organisations that are<br />

globalised <strong>and</strong> poly-plat<strong>for</strong>m.<br />

The paradigm of a one-dimensional criminal enterprise, generically embodied<br />

in a Colombian cartel cocaine trafficking <strong>and</strong> bulk cash smuggling model, is<br />

both oversimplified <strong>and</strong> antiquated. Today, criminal organisations diversify<br />

their illicit plat<strong>for</strong>ms <strong>and</strong> become involved in myriad offences, including<br />

narcotics, intellectual property rights, human smuggling <strong>and</strong> trafficking,<br />

fraud <strong>and</strong> money laundering, to list just a few.<br />

Likewise, modern, internationally focused law en<strong>for</strong>cement agencies must<br />

be able to investigate a broad array of criminality in order to confront the<br />

poly-plat<strong>for</strong>m threats competently. Single or limited mission agencies<br />

have difficulty in this environment. In practice, they creep outside their<br />

missions. This then produces negative characteristics <strong>and</strong> results in deflated<br />

per<strong>for</strong>mance.<br />

Box 1: Value Transfer <strong>and</strong> Criminal Gain.<br />

HSI takes the attitude that, with the exception of criminals who have psychopathic<br />

tendencies, most are interested in making money or achieving some other<br />

commodity that has a measurable value. Within the wide range of crimes that<br />

are (or appear to be) perpetrated <strong>for</strong> money, a particular area of interest to HSI<br />

in the digital age is ‘value transfer’, which is a way to assess <strong>and</strong> identify money<br />

laundering. Value transfer focuses on the relative as well as the absolute value of<br />

transactions, <strong>and</strong> there<strong>for</strong>e often sheds light on money-laundering techniques,<br />

particularly where ‘dirty’ money may be phased through in modular increments<br />

rather than in single transactions.<br />

Value transfer can be physical (carrying bank notes or other <strong>for</strong>ms of currency<br />

from one place to another); virtual (transferring credit through online banking<br />

systems); based on trust (often using Hawala-style transactions, as described<br />

below); or carried out via trade (buying or selling something <strong>for</strong> above or below<br />

its real market value).<br />

The easiest way to launder money is through straight<strong>for</strong>ward cash smuggling,<br />

referred to as ‘bulk cash’: a criminal has illegal proceeds from drug sales in<br />

the US that he wants to take back to Mexico, so he tapes the notes to himself<br />

<strong>and</strong> smuggles them across the border, where he then spends them. This is the<br />

simplest example: he might also swallow his money, or drive it across, but the<br />

value is in physical currency.



Of course, the criminal could deposit the cash into bank accounts in the US <strong>and</strong><br />

withdraw it in Mexico, but doing this through legitimate banking systems will<br />

tend to leave a trail that can be identified <strong>and</strong> followed. For example, money<br />

derived from drug sales in Cali<strong>for</strong>nia may need to be moved to Yemen. The<br />

value could be transferred by putting all of the illegal proceeds into various bank<br />

accounts, but high-value, unusual transactions (such as those paid in cash) are<br />

likely to be picked up <strong>and</strong> questioned by banking systems. When money starts<br />

to pass through conventional money services, the intersections can be seen:<br />

where the money was put into the bank, how its value was transferred <strong>and</strong><br />

where it was transferred to (<strong>for</strong> example, by wire or by sending blank cheques<br />

to a partner in Sana’a), can help law en<strong>for</strong>cement agencies to identify <strong>and</strong> trace<br />

the criminal actors.<br />

For this reason, criminal transactions may be more likely to be moved through<br />

Hawala, an in<strong>for</strong>mal system of money brokers used in the Middle East, North<br />

Africa <strong>and</strong> the Horn of Africa, which does not transfer currency, but receives<br />

money in one country <strong>and</strong> makes a loan to someone in another on the basis that<br />

the loan will eventually be repaid. Hawala tends not to leave a digital footprint,<br />

which can be critical with criminal transactions.<br />

Another option is in<strong>for</strong>mal value transfer, which builds on the techniques used<br />

in bulk cash smuggling. One way around the challenge of the interface with<br />

conventional money services is trade: if a criminal can take value, put it into<br />

a commodity <strong>and</strong> get it to where he needs it to go, this can provide a more<br />

sophisticated way of transferring the value. This is more complex <strong>and</strong> also more<br />

covert. For example, drugs might come into the US from Colombia, with a gross<br />

profit from sale of the drugs in the US of $1 million in cash. The drug baron is<br />

not in the US, however; he is in Bogota, Colombia, <strong>and</strong> he wants to get the cash<br />

out of the US. One way to do this would be to find a corrupt jeweller in New<br />

York <strong>and</strong> buy $1 million worth of gold from him, melt it down, cast it as nuts <strong>and</strong><br />

bolts, dye it or plate it to make it appear to be a much cheaper metal, export<br />

these nuts <strong>and</strong> bolts to Colombia <strong>and</strong>, once they arrive, melt them down <strong>and</strong> resell<br />

the metal as gold, at its real value. Or the criminal might be even smarter <strong>and</strong> melt the<br />

nuts <strong>and</strong> bolts down into gold, then ship this back to the same corrupt jeweller<br />

in New York, who can then legitimately wire transfer the value of the gold. HSI,<br />

as a customs as well as a law en<strong>for</strong>cement agency, is able to look at <strong>Big</strong> <strong>Data</strong><br />

from trade transactions to analyse any unusual movements of goods or trends<br />

that might suggest that this kind of activity is going on.<br />

Evolution of Money-Laundering Techniques<br />

As we move into the digital age, value transfer transactions are becoming<br />

more sophisticated, particularly with the evolution of crypto-currency <strong>and</strong><br />

cryptography. Money laundering is an extremely dynamic activity, which law<br />

en<strong>for</strong>cement has to keep on top of. It is exceptionally fast-paced: situations



change very fast. There is a definite evolution of comparisons <strong>and</strong> differentiations<br />

between bitcoin, virtual currencies <strong>and</strong> crypto-currency <strong>for</strong> example, <strong>and</strong> these<br />

must be taken seriously: they are value transfer mechanisms with real value<br />

being traded. Such real value does not exist in a Ponzi scheme or other type of pyramid<br />

scheme, where the ‘real’ value is completely ethereal.<br />

Crypto-currency has been coupled with the Darknet – ‘hidden’ or private<br />

networks on the Internet that can only be accessed by those who are invited to<br />

make connections, <strong>and</strong> are generally associated with illegal or dissident activity.<br />

The Silk Road in particular has been discussed frequently in recent months; it is<br />

one of many sites operating on the Tor (originally ‘The Onion Router’) network,<br />

which conceals a user’s location from anyone conducting surveillance. This is<br />

attractive to criminals because it offers anonymity <strong>and</strong> decentralised control.<br />

The flow of value through the Darknet – the buyer <strong>and</strong> seller value – on the<br />

Silk Road is something that needs to be looked into, as it underlies how crypto-currency<br />

can flow in transactions.<br />

There is no reason why a criminal could not take that flow to supplant the transfer<br />

of value seen in the trade transparency route. Why would he need to melt down<br />

gold into nuts <strong>and</strong> bolts if he could just use digital currency to transfer value? It<br />

is really a matter of acceptance – <strong>and</strong> when is the tipping point going to come<br />

when this <strong>for</strong>m of value transfer becomes more acceptable? If one looks back at<br />

currency as a symbolic <strong>for</strong>m of value, the same thing is happening with digital<br />

currencies. The US dollar is backed by the US government, which guarantees it<br />

<strong>and</strong> gives people confidence in its value. At present, that confidence does not<br />

exist in the crypto-world but it may come: in a service-based world, if a criminal<br />

could transfer the commodity or the service, using cryptography, it would give<br />

them autonomy <strong>and</strong> anonymity (no government scrutiny or taxes to pay). Would<br />

that be something that would interest the criminal fraternity? Probably, <strong>and</strong> it<br />

is possible to do.<br />

<strong>Big</strong> <strong>Data</strong>: Privacy <strong>and</strong> Consistency<br />

Anomalies in trade data can often be the most useful way to find out what <strong>Big</strong><br />

<strong>Data</strong> is useful to HSI. If someone is seen to be exporting gold from the US or<br />

Colombia, this can be flagged up as an anomaly. Normally coffee or sugar is<br />

seen along this route; it should be asked why large amounts of gold should be<br />

exported from areas that do not have gold mines. HSI works in conjunction<br />

with other law en<strong>for</strong>cement agencies, customs <strong>and</strong> border protection, <strong>and</strong> with<br />

the State Department, Department of the Treasury <strong>and</strong> so on, to share <strong>and</strong><br />

analyse this sort of in<strong>for</strong>mation.<br />
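The kind of check described here, noticing a commodity that is not normally seen on a route, can be sketched in a few lines. This is a minimal illustration under stated assumptions, not a description of HSI's systems: the route labels, the shipment history, the threshold `min_share` and the function name `flag_unusual_shipments` are all hypothetical.

```python
from collections import Counter, defaultdict


def flag_unusual_shipments(history, new_shipments, min_share=0.01):
    """Flag new shipments whose commodity is rare or unseen on their route.

    history and new_shipments are lists of (route, commodity) pairs.
    A shipment is flagged when its commodity accounts for less than
    min_share of past shipments on the same route.
    """
    counts = defaultdict(Counter)
    for route, commodity in history:
        counts[route][commodity] += 1

    flagged = []
    for route, commodity in new_shipments:
        total = sum(counts[route].values())
        share = counts[route][commodity] / total if total else 0.0
        if share < min_share:
            flagged.append((route, commodity, share))
    return flagged


# Hypothetical history for one route: mostly coffee and sugar.
route = ("origin A", "destination B")
history = [(route, "coffee")] * 60 + [(route, "sugar")] * 40
new_shipments = [(route, "coffee"), (route, "gold")]

for flagged_route, commodity, share in flag_unusual_shipments(history, new_shipments):
    print(f"Review {commodity} on {flagged_route}: {share:.1%} of past shipments")
```

A flag of this kind is only a prompt for an analyst to ask why the shipment is unusual; there may be a legitimate explanation, and a brand-new route will flag everything until some history has accumulated.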

Looking to the Future<br />

The future of money laundering throws up some very tricky challenges. Anything<br />

that can be mathematically defined can be transferred into the Darknet. If



criminals come up with a conspiracy to commit malicious acts, what is stopping<br />

others from transferring to them the funds that will enable those acts to be carried out?<br />

Who is going to see it <strong>and</strong> how difficult is that going to be <strong>for</strong> law en<strong>for</strong>cement?<br />

What does it mean to nation states if criminals don’t need to rely on conventional<br />

currency or legitimate banks to guarantee transactions? Furthermore, what<br />

does this mean <strong>for</strong> the rule of law: if a criminal can enter into a contract with<br />

someone that law en<strong>for</strong>cement agencies can mathematically define, will a court<br />

even be needed to resolve the conflict <strong>and</strong>, if not, what effects will this have on<br />

justice in the future? Researching the implications of these possibilities now will<br />

help governments <strong>and</strong> their law en<strong>for</strong>cement agencies to better underst<strong>and</strong> the<br />

challenges when – rather than if – they need to be faced.<br />

Interagency Working<br />

Crime-fighting techniques need to be continually assessed by objective<br />

per<strong>for</strong>mance measures so that best practices can be identified. Relevant<br />

per<strong>for</strong>mance measures include efficiency, effectiveness, capacity,<br />

responsiveness, trust <strong>and</strong> confidence. In the <strong>Big</strong> <strong>Data</strong> context, these<br />

per<strong>for</strong>mance measures need to be weighed against an organisation’s<br />

ability to collect, store, analyse <strong>and</strong> disseminate in<strong>for</strong>mation internally <strong>and</strong><br />

externally. In<strong>for</strong>mation sharing lies at the core of HSI’s ethos.<br />

When HSI was created, the challenge was how to ensure that the previously<br />

disparate law en<strong>for</strong>cement agencies from which it sprang were able to interact<br />

in a way that would not adversely affect per<strong>for</strong>mance. Similar challenges are<br />

being encountered in the UK with the creation of multiple new agencies, as<br />

noted above. Thus, it becomes critical to assess new policies, programmes<br />

<strong>and</strong> strategies against these per<strong>for</strong>mance measures.<br />

In the US, there are a number of different criminal investigative agencies<br />

at the federal level. Quantitatively, HSI, the Federal Bureau of Investigation<br />

(FBI), the Drug En<strong>for</strong>cement Administration (DEA), the US Secret Service<br />

(USSS) <strong>and</strong> the Bureau of Alcohol, Tobacco, Firearms <strong>and</strong> Explosives (ATF)<br />

envelop most of this space. For many, HSI is less well-known, but it is the<br />

second largest agency in this class next to the FBI, which in the post-9/11<br />

environment is really a hybrid en<strong>for</strong>cement <strong>and</strong> domestic intelligence<br />

agency. Despite divergent missions, each of these agencies is empowered to<br />

investigate many of the same laws. Thus interoperability becomes essential,<br />

though parochial attitudes <strong>and</strong> proprietary interests still exist.<br />

The best way of achieving optimal per<strong>for</strong>mance is to ensure that separate<br />

agencies work closely, <strong>and</strong> well, together. A lack of coherent integration will<br />

degrade the ability to share in<strong>for</strong>mation <strong>and</strong> lead to negative outcomes.<br />

Based on professional experience <strong>and</strong> conversations with prosecutors;



defence attorneys; military personnel; local, state <strong>and</strong> federal police officers;<br />

<strong>and</strong> criminal defendants, five characteristics can be identified that are present<br />

in, <strong>and</strong> adversely impact, the investigative capabilities of law en<strong>for</strong>cement.<br />

Certainly, other characteristics may exist, but these appear dominant. These<br />

five characteristics are defined as follows:<br />

1. Interagency conflict: the real or perceived incongruity of agencies’<br />

interests that detrimentally affects the per<strong>for</strong>mance of one or both.<br />

Such conflict can materialise in different <strong>for</strong>ms, such as interagency<br />

rivalry, mistrust or malfeasance; one example of this can be seen in<br />

the law en<strong>for</strong>cement context, when, during a joint operation, the<br />

participating agencies vie <strong>for</strong> control or credit <strong>for</strong> an action or case<br />

2. Redundancy: the duplication or repetition of action; one example of<br />

this can be seen in the law en<strong>for</strong>cement context, when two or more<br />

agencies participate in an investigation <strong>and</strong> unnecessarily per<strong>for</strong>m<br />

the same or similar tasks<br />

3. <strong>Data</strong> fragmentation: the collection <strong>and</strong> segregation of in<strong>for</strong>mation<br />

in a way that prevents its sharing; one example of this can be seen<br />

in the law en<strong>for</strong>cement context, when one agency has in<strong>for</strong>mation<br />

regarding a suspect that may be of value to another agency <strong>and</strong> does<br />

not or cannot provide access to the in<strong>for</strong>mation or make the other<br />

party aware of the in<strong>for</strong>mation<br />

4. Jurisdictional <strong>for</strong>eclosure: the inability to en<strong>for</strong>ce a law because of<br />

lack of authority or resources<br />

5. Violation of civil rights: the deprivation of rights belonging to an<br />

individual, including civil liberties, due process, equal protection<br />

of the laws, <strong>and</strong> freedom from discrimination through an act or<br />

omission to act by law en<strong>for</strong>cement.<br />

In relation to <strong>Big</strong> <strong>Data</strong>, the fragmentation of data becomes the universal issue.<br />

The other characteristics, primarily civil rights violations, can be present, but to<br />

a lesser extent. The centralisation of data enhances the work product derived<br />

from the collection, storage, analysis <strong>and</strong> dissemination cycle. A de facto or<br />

de jure centralised comm<strong>and</strong> structure is needed to foster the integration of<br />

the disparate elements. It is not wise to decentralise operations into small<br />

autonomous units because they will become unco-ordinated <strong>and</strong> per<strong>for</strong>m at a<br />

less than optimal or ‘dysfunctional’ level when compared with the centralised<br />

model. Recognition of interdependence becomes paramount.<br />

In the context of this <strong>for</strong>um, centralisation creates an economy of scale<br />

<strong>and</strong> management scheme <strong>for</strong> <strong>Big</strong> <strong>Data</strong>. So what becomes crucial is not the<br />

per<strong>for</strong>mance of entities per se, but the construct employed to collect, store,<br />

analyse <strong>and</strong> disseminate in<strong>for</strong>mation in a manner that will generate synergy.<br />

HSI has recognised this <strong>and</strong> participates actively in multiple ‘data crunching’<br />

<strong>for</strong>a.



Directly on point is HSI’s Trade <strong>and</strong> Transparency Program. Under this<br />

initiative, HSI works jointly with customs agencies worldwide to share <strong>and</strong><br />

analyse trade data <strong>for</strong> anomalies.<br />

HSI established the Trade Transparency Unit to conduct ongoing analysis<br />

of trade data provided through partnerships with other countries’ trade<br />

transparency units. One of the most effective ways to identify instances<br />

<strong>and</strong> patterns of trade-based money laundering is through the exchange<br />

<strong>and</strong> subsequent analysis of trade data <strong>for</strong> anomalies that would only be<br />

apparent by examining both sides of a trade transaction. The unit is <strong>for</strong>med<br />

when the US <strong>and</strong> any of its trading partners agree to exchange trade data<br />

<strong>for</strong> the purpose of comparison <strong>and</strong> analysis. Using state-of-the-art software<br />

<strong>and</strong> proven investigative techniques, the unit can easily identify previously<br />

invisible trade-based alternative remittance systems <strong>and</strong> customs fraud. 3<br />
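The idea of examining both sides of a trade transaction can be illustrated with a minimal sketch. It is not a description of the unit's actual software: the record layout, the shared `shipment_id` key and the 10 per cent tolerance are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TradeRecord:
    shipment_id: str       # hypothetical key shared by both customs agencies
    declared_value: float  # declared value, assumed to be in a common currency


def find_value_mismatches(exports, imports, tolerance=0.10):
    """Return shipments whose export and import declarations disagree.

    A shipment is flagged when it appears on only one side, or when the two
    declared values differ by more than `tolerance`, measured as a fraction
    of the larger value.
    """
    export_by_id = {r.shipment_id: r for r in exports}
    import_by_id = {r.shipment_id: r for r in imports}
    flagged = []
    for shipment_id in sorted(export_by_id.keys() | import_by_id.keys()):
        exp = export_by_id.get(shipment_id)
        imp = import_by_id.get(shipment_id)
        if exp is None or imp is None:
            flagged.append((shipment_id, "missing on one side"))
            continue
        larger = max(exp.declared_value, imp.declared_value)
        difference = abs(exp.declared_value - imp.declared_value)
        if larger > 0 and difference / larger > tolerance:
            flagged.append((shipment_id, "value mismatch"))
    return flagged
```

Neither data set reveals anything unusual on its own; a discrepancy only becomes visible once the two declarations for the same shipment are placed side by side. A flagged record is a prompt for investigation rather than evidence of wrongdoing.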

To facilitate the creation <strong>and</strong> management of <strong>Big</strong> <strong>Data</strong>, agencies need to<br />

integrate in some way. This integration can occur at different levels, consisting<br />

of recognition, co-ordination, collaboration, community, consolidation <strong>and</strong><br />

merger. Any one of these is better than nothing <strong>and</strong>, realistically, community<br />

is as far as most agencies can go without legislative intervention. These<br />

phases are defined as follows:<br />

• Recognition: the confirmation of existence, which occurs in the Big<br />

Data context when one agency acknowledges that another agency has<br />

the authority to perform a particular act and holds relevant information<br />

that the acknowledging agency may or may not have<br />

• Co-ordination: the act of confirming concurrent jurisdiction <strong>and</strong><br />

agreeing to separate areas of en<strong>for</strong>cement to reduce redundancy, but<br />

agreeing to respond to requests <strong>for</strong> in<strong>for</strong>mation<br />

• Collaboration: the act of working together in a joint operation<br />

<strong>and</strong> sharing in<strong>for</strong>mation, but not granting open access – agency<br />

participation in a task <strong>for</strong>ce or a memor<strong>and</strong>um of underst<strong>and</strong>ing<br />

being examples of this<br />

• Community: the act or process of openly sharing resources or<br />

in<strong>for</strong>mation among several entities with some restriction<br />

• Consolidation: the act or process of sharing in<strong>for</strong>mation without<br />

restriction<br />

• Merger: the fusion of disparate entities into a single entity.<br />

It is important to note that this type of integrative scheme, regardless of the<br />

level, elicits claims of privacy invasions <strong>and</strong> civil-rights violations. Perhaps<br />

ironically, the consolidation of data or the creation of <strong>Big</strong> <strong>Data</strong> can actually<br />

3. ICE, Trade Transparency Unit, accessed 18<br />

July 2014.



minimise incursions into personal privacy. Counterintuitively, this shrinks<br />

government <strong>and</strong> mitigates violations.<br />

In the US, when the Founding Fathers were looking at how to configure<br />

the new union, the initial scheme was decentralised, or ‘anti-federalist’<br />

in the parlance of the era. This resulted in the drafting of the Articles of<br />

Confederation <strong>and</strong> the creation of thirteen disparate, EU-style states. This<br />

scheme ultimately failed, mostly because individual state interests trumped<br />

the collective good. More on point was that it led to the institution of thirteen<br />

different policies <strong>and</strong> programmes. Today, there are fifty states <strong>and</strong> if the<br />

US were still decentralised, there would be fifty different, and potentially<br />

incongruous, regulatory programmes. A positive derivative of integration is<br />

that it organises the collection, storage, analysis <strong>and</strong> dissemination of <strong>Big</strong><br />

Data, thus making the work product more effective and the process more<br />

efficient, capable, responsive <strong>and</strong> trustworthy. There<strong>for</strong>e, privacy <strong>and</strong> civil<br />

rights infringements are not inherent to <strong>Big</strong> <strong>Data</strong> schemes. Ideally, the other<br />

negative characteristics should be minimised as well.<br />

The need <strong>for</strong> a shared vision <strong>and</strong> <strong>for</strong> unified approaches is even more<br />

apparent in the digital age, where in<strong>for</strong>mation is exponentially propagated<br />

with each passing day. Couple this with advances in web technology whereby<br />

users can remain anonymous, or at least pseudonymous. Currently, the<br />

Darknet <strong>and</strong> the use of cryptographic algorithms present emerging threats<br />

<strong>and</strong> create a new dimension <strong>for</strong> traditional criminal enterprises. The joint<br />

investigation into the online black market site Silk Road, headed by HSI <strong>and</strong><br />

the FBI, provides a good example of this.<br />

HSI recognises this <strong>and</strong> that there will be future, presently unconceived,<br />

advances in criminal practice. Such spectres are daunting, but not intimidating<br />

or insurmountable when the infrastructure <strong>and</strong> partnerships to confront<br />

them already exist. This is the message of HSI.<br />

Greg M<strong>and</strong>oli is a special agent with the Department of Homel<strong>and</strong> <strong>Security</strong>’s<br />

Homel<strong>and</strong> <strong>Security</strong> Investigations (HSI) <strong>and</strong> is currently assigned to the US<br />

Embassy in London. His related professional activities include eight years<br />

as an army reservist in the Judge Advocate General Corps <strong>and</strong> positions at<br />

the University of Maryl<strong>and</strong> as a Course Developer <strong>and</strong> Adjunct Associate<br />

Professor. In 2006, Greg became the first HSI agent to graduate from the<br />

Naval Postgraduate School’s Master of Arts programme in Homeland Defense and<br />

<strong>Security</strong>. In 1994, Greg graduated from Golden Gate University School of<br />

Law with the recognition of a public-interest law scholar. Be<strong>for</strong>e becoming a<br />

special agent, Greg practised law as a Deputy Public Defender in Cali<strong>for</strong>nia<br />

where he h<strong>and</strong>led felony matters involving homicide, three strikes, gang <strong>and</strong><br />

drug offences.



This paper is a summary of topics presented by Homel<strong>and</strong> <strong>Security</strong><br />

Investigations Special Agent <strong>and</strong> University of Maryl<strong>and</strong> Adjunct Associate<br />

Professor Gregory M<strong>and</strong>oli at the RUSI/STFC event ‘<strong>Big</strong> <strong>Data</strong> <strong>for</strong> <strong>Security</strong><br />

and Resilience: Challenges and Opportunities for the Next Generation of<br />

Policy-Makers’. The paper represents his personal viewpoints and is partially<br />

based on previously authored materials.


IV. Characteristics of Terrorist Finance Networks:<br />

The Human Element<br />

Neil Bennett<br />

In Chapter III, Gregory M<strong>and</strong>oli writes about <strong>Big</strong> <strong>Data</strong> <strong>and</strong> financial<br />

transactions, <strong>and</strong> this paper will return to the subject while taking a slightly<br />

different perspective – focusing on the benefits of linking money flows to<br />

human behaviour <strong>and</strong> human activity. This will show how analysis of data<br />

by social scientists as well as data analysts can support the identification of<br />

important individuals within a network.<br />

The aim of this conference has been to underst<strong>and</strong> how academics,<br />

researchers <strong>and</strong> policy-makers can utilise <strong>Big</strong> <strong>Data</strong>. As noted, definitions of<br />

Big Data commonly refer to the four Vs: volume, variety, velocity and veracity. 1<br />

This paper will attempt to give a perspective from the operational, end-user<br />

requirement: that a wide variety of data are available in ever-increasing<br />

volumes. This presents challenges as to how those data are stored, by whom<br />

they are analysed, <strong>and</strong> why. The paper will describe operational challenges<br />

faced by law en<strong>for</strong>cement <strong>and</strong> defence, focusing on the operational<br />

opportunities <strong>and</strong> outputs.<br />

The end goal of data analysis is improved efficiency. Efficiency is effectiveness<br />

driven by an exploitation path towards the operational outcomes <strong>and</strong>, in turn,<br />

towards end use. Financial intelligence <strong>and</strong> terrorist finance can be used as the<br />

lens through which this process is viewed. Why finance? Law en<strong>for</strong>cement,<br />

defence, the UK government as a whole, as well as other governments around<br />

the world, see financial interventions as interventions of first choice in the<br />

fight against international crime. A perfect example of this is the recent case<br />

involving Ukraine, 2 in which the UK, together with the EU, imposed restrictive<br />

measures – financial sanctions – upon eighteen (later extended to twenty-two)<br />

Ukrainian <strong>for</strong>mer regime members <strong>for</strong> misappropriation of Ukrainian<br />

state funds. The sanctions prevented the politicians from accessing assets<br />

or funds held by European financial institutions, a significant move as many<br />

Ukrainian <strong>and</strong> Russian politicians hold money in accounts in Luxembourg<br />

<strong>and</strong> the Netherl<strong>and</strong>s in particular.<br />

This paper will focus specifically on the alternative remittance system Hawala.<br />

It is worth bearing in mind here that the system of banking we recognise as<br />

1. IBM, ‘The FOUR V’s of Big Data’,<br />

accessed 9 July 2014.<br />

2. HM Treasury, ‘Financial Sanctions, Ukraine (Misappropriation and Human Rights)’,<br />

15 April 2014,<br />

accessed 9 July 2014.



official today was first established in the 1700s, <strong>and</strong> at its oldest can only<br />

be traced back as far as the financial institutions of Italy in the fourteenth<br />

century. As Hawala arose in the early Medieval period, which system should<br />

be categorised as the ‘alternative’ remittance system is open to debate.<br />

There is nothing wrong with Hawala. It is the abuse of Hawala, not the system<br />

itself, that causes issues: those using Hawala systems are not automatically<br />

guilty. There<strong>for</strong>e, the question to ask is how the manipulation of data obtained<br />

from Hawala transactions can assist policy-makers <strong>and</strong> law en<strong>for</strong>cement in<br />

making the right decisions about the right people <strong>and</strong> the right entities to go<br />

<strong>for</strong>. This requires a number of different issues to be considered: behaviour,<br />

attitude, language, customs, values, beliefs, influence, institutions, power –<br />

political, economic, legal – social structures, clan, tribe and ideology. These<br />

issues are critical to underst<strong>and</strong>ing the situation in areas of the developing<br />

world in which law en<strong>for</strong>cement or defence is expected to operate, often<br />

in support of the Foreign Office, DfID <strong>and</strong> the Stabilisation Unit. If that<br />

underst<strong>and</strong>ing is not in place from the beginning, the decision-making<br />

process may be flawed. Complex human <strong>and</strong> cultural dimensions play a<br />

large part in any decision-making cycle, and there are both inter- and intra-dependencies<br />

between these <strong>and</strong> the cultural, institutional, technological<br />

<strong>and</strong> physical environment.<br />

Underst<strong>and</strong>ing Trust in Networks<br />

Individuals who are involved with <strong>and</strong> control money or value are extremely<br />

highly trusted within a network, but how can these levels of trust be assessed,<br />

identified <strong>and</strong> quantified? A Hawala transaction may move across South<br />

Asia, through Afghanistan, Pakistan, Iran <strong>and</strong> the Gulf States. Each stage of<br />

the transaction will include different languages, currencies <strong>and</strong> methods<br />

of communication, including fax, e-mail, mobile-phone calls <strong>and</strong> Internet<br />

communications. The human activity taking place within those environments<br />

is incredibly complex even be<strong>for</strong>e the Hawala aspect is considered, <strong>and</strong><br />

comprises unstructured in<strong>for</strong>mation <strong>and</strong> inherent knowledge as well as<br />

data. Understanding this complexity is critical to understanding the decision-making<br />

cycle. What in<strong>for</strong>mation is in the ledgers <strong>and</strong> what does this actually<br />

mean – all the while bearing in mind that the in<strong>for</strong>mation may be on paper,<br />

rather than in electronic <strong>for</strong>mat?<br />

Another key question to ask is why we look at money, <strong>and</strong> in what context<br />

we look at it. It is important here to underst<strong>and</strong> two particular elements.<br />

First, there is the threat environment, which is a combination of interacting<br />

elements. We need to underst<strong>and</strong> the systematic dimension of the threat<br />

from cradle to grave, all along the line of process, during which there will be<br />

different data in different <strong>for</strong>mats: structured, unstructured, paper or digital.<br />

In order to underst<strong>and</strong> the system, all those different inputs need to be<br />

understood <strong>and</strong> made sense of. Ultimately, this will require an enterprise of



transnational co-operation: one analyst, or even one organisation, cannot do<br />

everything <strong>and</strong> there<strong>for</strong>e an approach is needed that allows the generation<br />

of best impact <strong>and</strong> best ef<strong>for</strong>t.<br />

One of the ways in which this is done is by breaking the process out into<br />

functions, underst<strong>and</strong>ing what the vulnerabilities are <strong>and</strong> underst<strong>and</strong>ing<br />

what actions are needed to generate effect against the vulnerabilities that<br />

have been identified as critical within the system or enterprise. This is only<br />

possible if a huge variety of data can be understood, which in turn requires<br />

data to be taken in from all of the different environments.<br />

The critical issues here are impact <strong>and</strong> benefit analysis: whether those data<br />

can be used to predict what may occur, <strong>and</strong> what particular action should be<br />

taken against them. This requires a huge range of hard <strong>and</strong> soft factors to be<br />

considered, including:<br />

• Social media<br />

• High per<strong>for</strong>mance analytics<br />

• <em>Sociology</em><br />

• Link and entity extraction<br />

• Natural language processing<br />

• <em>Anthropology</em><br />

• Semantic search<br />

• Node disambiguation<br />

• Graph databasing<br />

• Pattern and prediction<br />

• Visualisation<br />

• Fuzzy link analysis<br />

• Machine learning<br />

• <em>Psychology</em><br />

• Web science<br />

• Predictive modelling<br />

• Linguistic analysis in microblogs.<br />

Industry <strong>and</strong> academia are approaching the challenges from all of these<br />

angles, <strong>and</strong> are working on ways to ensure that they work more coherently<br />

together. Sociology, anthropology <strong>and</strong> psychology are in italics in the above<br />

list as three examples that academics working in the area of machine learning<br />

in particular have indicated that they do not always consider. They feel that<br />

they would be well served by a better underst<strong>and</strong>ing of how their work could<br />

benefit from or impact on some of these disciplines – in particular psychology.<br />

Certainly in the case of link and entity extraction (which extracts key entities such<br />

as names, locations, terms <strong>and</strong> dates <strong>and</strong> links them together), language<br />

processing <strong>and</strong> semantic search, they would benefit from considering the<br />

issues again from human, cultural <strong>and</strong> behavioural perspectives. How do



data analysts really underst<strong>and</strong> what is going on in those environments,<br />

so that the right decisions can be made based on the right interpretation<br />

of in<strong>for</strong>mation? Without this, bad decisions may be made that deliver an<br />

inappropriate <strong>and</strong> even damaging intervention, be it tactical, operational<br />

or strategic, because those making the decision did not underst<strong>and</strong> human<br />

behaviour.<br />
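Link and entity extraction, as described above, can be sketched in a few lines. This is a deliberately naive illustration rather than any tool discussed at the conference: the function name, the list of known entities and the simple substring matching are all assumptions, and a real system would need the human and cultural context the paper argues for (aliases, transliterations, honorifics) to avoid false links.

```python
from collections import Counter
from itertools import combinations


def link_entities(records, known_entities):
    """Count how often pairs of known entities appear in the same record.

    `records` is an iterable of free-text strings and `known_entities` an
    iterable of names, places or terms to look for (both hypothetical inputs).
    Matching is a case-insensitive substring test, which is a simplification.
    """
    links = Counter()
    for text in records:
        lowered = text.lower()
        found = sorted(e for e in known_entities if e.lower() in lowered)
        # Every pair of entities found in the same record counts as one link.
        for pair in combinations(found, 2):
            links[pair] += 1
    return links
```

The resulting counts show only that two entities were mentioned together, not why; interpreting a strong link as trust, kinship or a commercial tie is precisely the step that requires social-science input.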

There is a huge amount of powerful, fascinating work that can be done with<br />

large, structured data sets, but the real challenge is ensuring that the<br />

high-performance analytics that support it also get faster. The ability to do this mainly<br />

depends on how the unstructured feeds can be integrated, be these from<br />

social media or from the human characteristics identified by psychology,<br />

anthropology <strong>and</strong> sociology that actually allow the development of a<br />

holistic perspective of predictive modelling. Finance is a good way of trying<br />

to underst<strong>and</strong> certain characteristics of human behaviour, <strong>and</strong> so, from a<br />

research perspective, would be a good place to start building up a better<br />

underst<strong>and</strong>ing of not only the finance networks themselves, but also the<br />

terrorist <strong>and</strong> criminal networks that sit behind them.<br />

Neil Bennett has moved on from his role since this conference, <strong>and</strong> RUSI<br />

has not been able to contact him to approve this paper <strong>for</strong> publication. We<br />

there<strong>for</strong>e apologise <strong>for</strong> any errors it may contain, <strong>and</strong> stress that these are<br />

the responsibility of the editorial process, not the speaker.


V. Terrorism and Political Risk Modelling<br />

Mark Lynch<br />

This paper will discuss the insurance industry’s assessment of and dealings with<br />

<strong>Big</strong> <strong>Data</strong>, using the specific example of risk modelling around the threat <strong>and</strong><br />

likely impacts of political violence. It will provide a brief overview of how the<br />

insurance industry approaches the challenge of political violence, including<br />

how analytics have started to become a far more dominant component of<br />

this, <strong>and</strong> will show how <strong>Big</strong> <strong>Data</strong> is starting to filter in. It will then explore<br />

some of the challenges that are starting to emerge.<br />

The approach of the insurance industry to political violence is a particularly<br />

interesting example to consider as it resembles a microcosm of how business<br />

in general has dealt with <strong>Big</strong> <strong>Data</strong>, <strong>and</strong> also because the lessons learned by the<br />

insurance industry have a lot to offer other sectors with regard to resilience.<br />

Without insurance, the impact of a terrorist attack or widespread political<br />

violence would be greatly amplified. Terrorist violence can damage health,<br />

property <strong>and</strong> vehicles; result in interruption to businesses; <strong>and</strong> require<br />

compensation payments to those affected. Insurance is integral to enabling<br />

the reconstruction of buildings after attacks, facilitating payments to the<br />

families of those killed <strong>and</strong> seriously injured, <strong>and</strong> ensuring that victims are<br />

able to access disability benefits <strong>and</strong> other services as quickly as possible.<br />

As a result, the insurance market provides a fundamental component<br />

of resilience in an increasingly interconnected world. 1 Furthermore, the<br />

insurance industry holds a lot of <strong>Big</strong> <strong>Data</strong>, as it is very useful to the market<br />

to underst<strong>and</strong> the composition of claims <strong>and</strong> the spread of insurance, <strong>and</strong><br />

to identify indices that would establish whether an individual is likely<br />

to be at higher risk of losses (not exclusively tied to terrorism insurance).<br />

For example, it can be used to look at which areas were not insured or what<br />

claims were made <strong>for</strong> post-traumatic stress disorder following a terrorist<br />

attack. These aspects could be extremely useful in helping to make future<br />

resilience assessments, as they can help to highlight where vulnerabilities<br />

are more likely to occur. Such data held by the insurance industry could<br />

provide a rich vein of in<strong>for</strong>mation to academic, medical <strong>and</strong> governmental<br />

organisations if greater interaction was prioritised.<br />

What Constitutes Political Violence?<br />

The insurance industry has very specific terminology <strong>and</strong> definitions of<br />

what constitutes political violence. First, it segregates political violence into<br />

three components that can be insured individually or in conjunction with<br />

1. Claudia Aradau <strong>and</strong> Rens van Munster, ‘Insuring Terrorism, Assuring Subjects, Ensuring<br />

Normality: The Politics of Risk after 9/11’, Alternatives: Global, Local, Political (Vol. 33,<br />

No. 2, 2008), pp. 191–210.



others: terrorism or sabotage; strikes, riots <strong>and</strong> civil commotion; <strong>and</strong> war<br />

on l<strong>and</strong>. This scale becomes very important when writing exemptions <strong>and</strong><br />

incorporating this into recovery. Each area offers a distinct challenge to the<br />

insurance industry as a result.<br />

Political violence affects multiple business lines that are vital <strong>for</strong> the insurance<br />

industry. For example, if there was a blast at a theatre in London, the loss of<br />

revenue in the subsequent two years as a result of people not wanting to go<br />

to the theatre out of fear could have a massive effect. The market calls this<br />

contingency insurance <strong>and</strong> as a result the insurance market can be obliged to<br />

pick up the losses. Indeed, a number of studies have identified the material<br />

effect of terrorism on the tourism industry as international travellers are more<br />

likely to avoid perceived high-risk areas. 2 Similarly, business interruption<br />

has proven to be a key driver of terrorism losses for the insurance<br />

industry and was the main driver of the losses the industry<br />

suffered in the wake of 9/11. 3 The sheer multitude of claims that were paid<br />

by the insurance sector following the 9/11 attacks highlights just how many<br />

disparate areas terrorism can affect within the insurance market.<br />

The overarching factor is that there are many unknowns <strong>and</strong> there<strong>for</strong>e<br />

many risks can occur that can have a harmful effect on the market. This is a<br />

particular challenge at present, when the insurance market is trying to exp<strong>and</strong><br />

into emerging markets such as Asia <strong>and</strong> Africa, where knowledge levels<br />

on these kinds of risks are very limited. Ironically, as the insurance market<br />

penetrates further into emerging markets, it needs to be able to calculate<br />

its own insurance policies, based on an underst<strong>and</strong>ing of the risks likely to<br />

be encountered. Indeed, among the fastest growing insurance markets in<br />

the world, seventeen out of twenty have suffered from either a sustained<br />

terrorism threat or from intensive rioting or civil commotion over the last<br />

ten to fifteen years. 4 There<strong>for</strong>e, as insurance markets grow, the emphasis on<br />

underst<strong>and</strong>ing this risk will grow significantly.<br />

New Approaches to Risk<br />

The greatest driving catalyst <strong>for</strong> the insurance industry in approaching these<br />

challenges was the 1993 Bishopsgate bomb attack by the IRA. That blast,<br />

which killed only one person but caused about £3-billion-worth of damage,<br />

almost crippled the whole sector. 5 Prior to this disaster, the sector had used<br />

2. Sevil Sonmez <strong>and</strong> Alan R Graefe, ‘Influence of Terrorism Risk on Foreign Tourism<br />

Decisions’, Annals of Tourism Research (Vol. 25, No. 1, 1998), pp. 112–44.<br />

3. Dixon, Lloyd <strong>and</strong> Kaganoff.<br />

4. Ernst and Young, Waves of Change: The Shifting Insurance Landscape in Rapid-Growth Markets,<br />

2014,<br />

accessed 19 July 2014.<br />

5. Andrew Silke, ‘Underst<strong>and</strong>ing Terrorism Target Selection’, in A Richards, P Fussey <strong>and</strong><br />

A Silke (eds), Terrorism <strong>and</strong> the Olympics: Major Event <strong>Security</strong> <strong>and</strong> Lessons <strong>for</strong> the<br />

Future (London: Routledge, 2010), pp. 49–66.



very limited analytical capabilities to quantify the threat: companies bought<br />

insurance without a keen underst<strong>and</strong>ing of the accumulations of risk that<br />

were developing. This is because terrorism operates in a unique manner<br />

compared with traditional perils such as earthquakes, floods <strong>and</strong> hurricanes<br />

that the market is used to dealing with regularly. 6 Terrorism is an intensive,<br />

highly localised threat that requires a keen underst<strong>and</strong>ing of the proximity of<br />

risks to each other <strong>and</strong> the epicentre of a given blast, which is something that<br />

was quite new to the insurance market. As a result, a number of companies<br />

retained large clusters of policies around central London, <strong>and</strong> following the<br />

Bishopsgate bomb attack many were unable to cover the losses stemming<br />

from it. This was exacerbated by 9/11, which cost the industry £22 billion, a<br />

figure which is only going to grow owing to further claims <strong>for</strong> debris inhalation<br />

<strong>and</strong> continuing incapacity <strong>and</strong> post-traumatic stress disorder claims. 7 This is<br />

a severe issue <strong>for</strong> the insurance market.<br />
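The accumulation problem described above, in which many policies cluster close to a single possible blast location, can be illustrated with a minimal sketch. It is an assumption-laden example rather than the market's actual method: the policy tuples, the choice of radius and the use of a simple great-circle distance are all hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two latitude/longitude points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def accumulation_within(policies, centre, radius_km):
    """Sum the insured values of all policies within `radius_km` of `centre`.

    `policies` is an iterable of (latitude, longitude, insured_value) tuples
    and `centre` is a (latitude, longitude) pair; both are hypothetical inputs.
    """
    lat0, lon0 = centre
    return sum(
        insured
        for lat, lon, insured in policies
        if haversine_km(lat0, lon0, lat, lon) <= radius_km
    )
```

Running the same calculation for many candidate centres across a city would show an insurer where its exposure is concentrated, which is the kind of proximity analysis that was largely absent before Bishopsgate.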

The Impact on Society<br />

Any attack is terrible; however, the misery stemming from an attack can<br />

be compounded greatly if there is no financial restitution <strong>for</strong> individuals<br />

to cover medical bills or the reconstruction of their businesses or homes. 8<br />

If the insurance market thinks political risk and terrorism are too risky, it<br />

will not offer insurance against it or it will put exemptions into insurance<br />

policies so that those affected will have to pay <strong>for</strong> damages themselves.<br />

Such exemptions may cover certain types of incidents or certain areas that<br />

are seen as being at high risk, or providing cover <strong>for</strong> these examples may<br />

push up premiums considerably. Insurance companies see this frequently<br />

with the issue of chemical <strong>and</strong> biological weapons: because there are various<br />

unknown factors in this field, insurance companies are reluctant to include<br />

this within their coverage as there are too many uncertainties associated<br />

with such an attack. The resilience challenge is significant, as the lack of<br />

available insurance may be more to do with a lack of analytical capabilities<br />

to assess these risks properly than a genuine inability to calculate likely risks<br />

<strong>and</strong> their impacts.<br />

As a result, it is clear that better interaction with the insurance sector is key<br />

to providing a holistic approach to resilience. Government agencies should<br />

seek to avoid a situation comparable to earthquake cover in Cali<strong>for</strong>nia,<br />

where market penetration has historically been extremely low as the cost of<br />

insurance is prohibitively high <strong>for</strong> most people <strong>and</strong> insurers are reluctant to<br />

6. H Kunreuther, E Michel-Kerjan and B Porter, Assessing, Managing, and Financing<br />

Extreme Events: Dealing with Terrorism (Cambridge, MA: National Bureau of Economic<br />

Research, 2003).<br />

7. Gail Makinen, Economic Effects of 9/11: A Retrospective Assessment (New York, NY:<br />

DIANE Publishing, 2011).<br />

8. R Roth Jr, Earthquake Insurance Protection in Cali<strong>for</strong>nia (Washington, DC: Joseph<br />

Henry Press, 1988).



be over-exposed within the Cali<strong>for</strong>nia area. In order to avoid this, a number<br />

of governments, including those of the UK, US <strong>and</strong> Germany, have provided<br />

at least partial state backstops against terrorist attacks. 9 However, it is only<br />

through better analysis <strong>and</strong> a grasp of <strong>Big</strong> <strong>Data</strong> that the insurance sector can<br />

truly feel com<strong>for</strong>table with the risk of terrorism.<br />

Incorporating <strong>Big</strong> <strong>Data</strong> into Analysis<br />

In previous years, risk analysis could often be a case of underwriters<br />

assessing the risk based on a preconceived underst<strong>and</strong>ing of political unrest<br />

following an extremely rudimentary analysis. However, over the last twenty<br />

years this has changed considerably. Insurance companies have started<br />

to hire scientists, statisticians <strong>and</strong> security experts who are able to better<br />

incorporate data into analysis. The risks from natural hazards, such as<br />

hurricanes or earthquakes, are now well understood by the market, <strong>and</strong> the<br />

underst<strong>and</strong>ing of political risk needs to reach similar levels in the future.<br />

Political risk is not only a less well-understood subject area, but there are<br />

also many more variables involved as it is a more qualitative subject. In order<br />

to incorporate analysis into political risk calculation, it is necessary to delve<br />

into the historical records as well as simply looking at present-day data.<br />

For example, Aon Benfield’s 2014 Interactive Political Risk Map shows how<br />

different countries’ histories of rioting <strong>and</strong> civil commotion over the past ten<br />

to fifteen years can be mapped out <strong>and</strong> analysed. 10 Analysts need to be able<br />

to identify different patterns <strong>for</strong> different regions, <strong>and</strong> subsequently provide<br />

this in<strong>for</strong>mation to the insurance industry in order to flag up certain areas<br />

that are at greater risk than others. It is very easy to look back <strong>and</strong> decipher<br />

the risks at certain points in history, but the key progression would be to look<br />

<strong>for</strong>ward. A very important part of this research is trying to pull data <strong>for</strong> GDP<br />

statistics <strong>and</strong> mortality <strong>for</strong> regions, <strong>and</strong> to see how those variables fit with<br />

incidences of political violence. Indeed, a number of statistical studies have<br />

shown that key identifiers such as unemployment <strong>and</strong>, in particular, infant<br />

mortality can be keen indicators of political unrest, particularly if there is a<br />

significant statistical switch. 11 Furthermore, historical analysis is extremely<br />

useful to the analysis of terrorism modelling, another area where there is<br />

much variation <strong>and</strong> uncertainty, particularly if plots <strong>and</strong> failed attacks are<br />

included.<br />

9. Alfonso Najera, Terrorism Coverage Schemes: A Comparative Table, 2011, , accessed 12 July 2014.<br />

10. Aon Risk Solutions, ‘Aon’s 2014 Interactive Political Risk Map’, 2014, , accessed 14 July 2014.<br />

11. J A Goldstone et al., ‘A Global Model <strong>for</strong> Forecasting Political Instability’, American<br />

Journal of Political Science (Vol. 54, No. 1, 2010), pp. 190–208.



Extracting Useful <strong>Data</strong> <strong>and</strong> Avoiding Biases in Analysis<br />

Good data do exist if analysts are fully aware of the advantages they offer<br />

<strong>and</strong> how to use them. The Global Terrorism <strong>Data</strong>base 12 <strong>and</strong> the RAND<br />

Corporation 13 are two examples of highly respected organisations that take<br />

data accumulation <strong>and</strong> analysis seriously: both are heavyweight analysts of<br />

terrorism history <strong>and</strong> can provide a wealth of statistical data <strong>for</strong> the security<br />

<strong>and</strong> insurance sector.<br />

However, some of the challenges the insurance industry faces involve more<br />

quantitative data analysis methods. For example, there is a major issue<br />

surrounding the quality of data that the insurance companies themselves<br />

hold; they have vast amounts of data but not all of them are useful,<br />

particularly in emerging markets where geographical data are sparse. This<br />

makes a big difference when attempting to underst<strong>and</strong> the risk; terrorism or<br />

even political violence is often a very localised threat, thus a correspondingly<br />

localised underst<strong>and</strong>ing is needed, from units of measurement to political<br />

parties to currency <strong>and</strong> financial transactions. Similarly, there are a<br />

number of internal constraints holding the industry back from providing a<br />

comprehensive underst<strong>and</strong>ing of the risk. Insurance companies do not share<br />

data with each other <strong>and</strong> as a result it is difficult to get a holistic picture of<br />

the degree of terrorism coverage, or indeed the types of clients taking up<br />

this cover. This is important as such analysis of what is covered, what sort of<br />

policies are in place <strong>and</strong> the size of the clients themselves have a material<br />

impact on the ability of a state to recover following a catastrophic event.<br />

Furthermore, there are many challenges regarding privacy, <strong>and</strong> governments<br />

or companies who are unwilling to provide sufficient data. Even where the<br />

data do exist, it is difficult to know how to sell them to the client to enable<br />

them to be used appropriately.<br />

Conclusions <strong>and</strong> Recommendations<br />

It is a great shame that insurance companies do not tend to have the<br />

means to look at these data analytically; if they did, their analytics would<br />

be substantially better, particularly those on loss history. There are a lot<br />

of data out there on the length of time people are off injured following a<br />

terrorist attack, the effect of post-traumatic stress disorder, the variation per<br />

country or the amount of time it takes <strong>for</strong> a business to recuperate after an<br />

attack. All of this would be extremely useful <strong>for</strong> the academic <strong>and</strong> scientific<br />

communities. The insurance industry would greatly benefit from the opening<br />

up of governments’ empirical data <strong>and</strong> greater involvement of government<br />

on the level of security clearance. The insurance sector is already heavily<br />

regulated on data security, owing to the sensitive financial data that it<br />

holds, so it would not be a giant leap to allow certain key representatives<br />

in the insurance industry clearance to access <strong>and</strong> dissect certain pieces<br />

12. Global Terrorism <strong>Data</strong>base, , accessed 14 July 2014.<br />

13. RAND, , accessed 14 July 2014.



of classified material. While there may be concern among the public that<br />

private companies can access sensitive data <strong>and</strong> that their private insurance<br />

data could be looked at by the security services, these fears are largely<br />

unfounded. Most data that the insurance industry have are aggregated to a<br />

policy level so personal data about individual names or addresses are usually<br />

unavailable. Similarly, as long as the security services can vet individuals<br />

accessing the data <strong>and</strong> keep the numbers down, additional access to secure<br />

material should not be a significant hindrance.<br />

As a result, the interaction between government <strong>and</strong> the academic <strong>and</strong><br />

insurance sectors could be extremely rewarding. Each sector has processed<br />

a significant amount of data points <strong>and</strong> analysis that is simply unavailable<br />

to the other. Providing analytical in<strong>for</strong>mation about the primary threat levels,<br />

changing dynamics <strong>and</strong> targeting analysis to the insurance sector would<br />

allow the market to avoid the hyperbole that was witnessed in the insurance<br />

market following the 9/11 attacks. On the other h<strong>and</strong>, in<strong>for</strong>mation on<br />

market coverage <strong>and</strong> quantification of the time taken to recover from an<br />

attack, whether in business interruption, medical rehabilitation or international<br />

comparisons, is held by the insurance sector <strong>and</strong> would be incredibly<br />

useful <strong>for</strong> government <strong>and</strong> academic partners.<br />

<strong>Big</strong> <strong>Data</strong> has begun to play a much more prominent role in the insurance<br />

industry. Whether this will have a positive or negative impact <strong>for</strong> clients is<br />

uncertain, but this has begun to be a more accepted branch of science among<br />

the community. Greater co-operation stimulated between the business <strong>and</strong><br />

academic communities <strong>and</strong> government would enable a greater impact to<br />

be made in this field.<br />

Mark Lynch is Head of Impact Forecasting’s Terrorism <strong>and</strong> Political Violence<br />

Modelling Team. He has a background in international security analysis <strong>and</strong><br />

counter-terrorism <strong>and</strong> is responsible <strong>for</strong> the composition of <strong>and</strong> academic<br />

input into Impact Forecasting’s human-security catastrophe models, including<br />

terrorism, rioting <strong>and</strong> drug cartel violence. Mark has a Master’s degree in<br />

International <strong>Security</strong> Studies from the Centre <strong>for</strong> the Study of Terrorism<br />

<strong>and</strong> Political Violence <strong>and</strong> he previously worked in the Royal United Services<br />

Institute’s National <strong>Security</strong> <strong>and</strong> <strong>Resilience</strong> Department. He has also worked<br />

with the London School of Economics, analysing violent manifestations of<br />

nationalism <strong>and</strong> has been published on the changing nature of nationalist<br />

<strong>and</strong> Islamic fundamentalist terrorism in the twenty-first century.


VI: Intelligent Use of Electronic <strong>Data</strong> to Enhance<br />

Public Health Surveillance<br />

Edward Velasco<br />

The exchange of health in<strong>for</strong>mation on social media <strong>and</strong> the Internet would<br />

appear to offer obvious opportunities to gain insight into emerging disease<br />

outbreaks. With new initiatives like Google Flu Trends 1 <strong>and</strong> HealthMap, 2 there<br />

are now more ways than ever be<strong>for</strong>e to monitor outbreaks. This paper will<br />

explore the opportunities these initiatives offer to public health practitioners<br />

trying to detect emerging diseases in their regions.<br />

From an epidemiological perspective, there are obvious advantages to<br />

decreasing the time needed to detect an infectious disease health event,<br />

so that appropriate prevention or mitigation measures can be undertaken<br />

as quickly as possible. The existing types of public health surveillance<br />

systems are indicator-based <strong>and</strong> event-based surveillance. Indicator-based<br />

surveillance, the oldest <strong>and</strong> most commonly found system, is widely used by<br />

regional, national <strong>and</strong> international public health agencies. These systems<br />

are designed to collect <strong>and</strong> analyse structured data, based on protocols<br />

tailored to each disease, including calculating the incidence, seasonality <strong>and</strong><br />

burden of disease. Their goal is to find increased numbers or clusters that<br />

might indicate a threat. However, there is generally a time lag between the<br />

occurrence of an event <strong>and</strong> its detection by indicator-based surveillance; these<br />

systems lack the ability to detect potential threats more quickly. In addition,<br />

they are not equipped to detect new or unexpected disease occurrences<br />

because they only collect predefined epidemiological attributes <strong>for</strong> each<br />

disease. This is why the first cases of Severe Acute Respiratory Syndrome<br />

Coronavirus (SARS-CoV) in Asia, <strong>for</strong> example, a new strain of viral infection,<br />

were not detected sooner. 3<br />

Instead of relying on official reports, event-based surveillance in<strong>for</strong>mation<br />

is obtained directly from witnesses of real-time events or indirectly from a<br />

variety of communication channels, including social media <strong>and</strong> established<br />

alert systems, as well as from in<strong>for</strong>mation channels such as the news<br />

media, public health networks <strong>and</strong> NGOs. Because it occurs in ‘real time’,<br />

event-based surveillance can identify events faster than indicator-based<br />

surveillance <strong>and</strong> can identify new events that will not be picked up by<br />

indicator-based surveillance. Health in<strong>for</strong>mation monitored via the Internet<br />

<strong>and</strong> social media is an important part of event-based surveillance, <strong>and</strong> is<br />

1. Google Flu Trends, , accessed 14 July 2014.<br />

2. Healthmap, , accessed 14 July 2014.<br />

3. C Castillo-Salgado, ‘Trends <strong>and</strong> Directions of Global Public Health Surveillance’,<br />

Epidemiologic Reviews (Vol. 32, No. 1, 2010), pp. 93–109.



most often the focus of existing event-based surveillance systems. Research<br />

has shown that event-based surveillance identifies trends comparable to<br />

those found using established indicator-based surveillance methods. 4 In<br />

practice, however, event-based surveillance systems have not been widely<br />

accepted <strong>and</strong> integrated into mainstream use by national <strong>and</strong> international<br />

health authorities, mainly because they have not yet been systematically<br />

evaluated within a public health agency.<br />

From 2010 to 2012, the Robert Koch Institute, the national public health<br />

agency of Germany, participated in a multidisciplinary scientific consortium<br />

to develop novel methods <strong>for</strong> an event-based surveillance tool to be<br />

integrated into infectious disease monitoring alongside other national<br />

surveillance activities. The multinational team produced a web-based<br />

plat<strong>for</strong>m (the Medical Ecosystem or M-Eco) 5 to develop technologies that<br />

are new to event-based surveillance (<strong>and</strong> have not yet been featured in<br />

existing systems, as also evidenced by a literature review that was previously<br />

conducted). These include content analysis using enhanced data processing<br />

(including stemming, POS 6 tagging <strong>and</strong> named entity recognition) <strong>and</strong> data<br />

collection from various user-generated content resources – including social-media<br />

content (such as Twitter) <strong>and</strong> radio <strong>and</strong> TV media transmissions<br />

(transcripts provided by a special media service).<br />

Detection mechanisms were developed to scan the Internet continuously<br />

<strong>for</strong> these media types, based on the simple semantic (disease names <strong>and</strong><br />

symptoms) <strong>and</strong> statistically relevant (search algorithms) epidemiological<br />

requirements that were deemed critical <strong>for</strong> the surveillance of different<br />

infectious diseases. Development of these functionalities resulted in a<br />

‘search function’ on a web-based user interface that enabled epidemiologists<br />

to monitor ‘mentions’ of diseases <strong>and</strong> symptoms on Twitter <strong>and</strong> news media<br />

(fed via a news aggregate technology) over time, geo-located where possible<br />

to enable comparison with other sources of epidemiological in<strong>for</strong>mation,<br />

including st<strong>and</strong>ard governmental infectious disease surveillance <strong>and</strong><br />

monitoring.<br />

Automated technologies provided signals <strong>for</strong> the risk assessment of<br />

infectious disease events to public health epidemiologists in a user-friendly,<br />

rapid <strong>and</strong> easy way. Lastly, policy concerns regarding the integration of<br />

the developed technologies <strong>for</strong> existing public health infectious disease<br />

4. S Doan et al., ‘Global Health Monitor – A Web-Based System <strong>for</strong> Detecting <strong>and</strong><br />

Mapping Infectious Diseases’, 2007; D M Hartley et al., ‘An Overview of Internet<br />

Biosurveillance’, Clinical Microbiology <strong>and</strong> Infection (Vol. 19, No. 11, 2013), pp. 1006–<br />

13; J P Linge et al., ‘Internet Surveillance Systems <strong>for</strong> Early Alerting of Health Threats’,<br />

Euro Surveillance: European Communicable Disease Bulletin (Vol. 14, No. 13, 2009).<br />

5. M-Eco, , accessed 14 July 2014.<br />

6. Part of speech tagging. An explanation of this process can be found at , accessed 14 July 2014.



surveillance infrastructures were explored. Integrating these technologies<br />

into the surveillance software of the German national health institute<br />

was a goal, with the potential to scale up to other countries based on this<br />

experience.<br />

Figure 2: Components <strong>and</strong> processing pipeline of the M-Eco system.<br />

Evaluation of the Prototype System <strong>for</strong> Event-Based Surveillance<br />

The first of a series of three evaluations attempted to illustrate how well<br />

the system generates signals <strong>for</strong> potential events of public health interest.<br />

A simulation with Twitter was conceived, where thirteen scientists created<br />

tweets <strong>for</strong> three mock infectious disease event scenarios within the<br />

simulation:<br />

• An outbreak of measles in a local school<br />

• An outbreak of Salmonellosis among attendees of a European football<br />

championship<br />

• Cases of hepatitis A appearing in travellers returning to Germany<br />

from North Africa.



The mock tweets were fed into the M-Eco technology (Figure 2), which<br />

combined them with real-world tweets that were taken from real users of<br />

Twitter <strong>and</strong> subsequently analysed. There were fewer retrieved signals that<br />

referred to true outbreaks than expected: only around a third (31 per cent)<br />

were relevant, compared with the 75–80 per cent expected by evaluators.<br />

While it is difficult to say whether or not this is because of the lack of actual<br />

tweets matching the three scenarios, the main assumption from these results<br />

was that the chosen keywords <strong>for</strong> identification via search <strong>and</strong> screening<br />

algorithms were not comprehensive enough. This could be because many<br />

tweets are written in a vernacular that does not match the <strong>for</strong>mal medical<br />

terms used in the preset keyword lists used in the automated analyses.<br />

A subsequent evaluation tested the hypothesis that the M-Eco technology<br />

could produce viable signals in real time during a large mass-gathering event.<br />

This assumption was tested during the European Football Championship,<br />

which took place in Pol<strong>and</strong> <strong>and</strong> Ukraine in 2012.<br />

Signals were provided to ‘subscribers’ (epidemiologists) at the Robert Koch<br />

Institute <strong>and</strong> at one state public health agency in Saxony, Germany, who<br />

received daily deliveries of signals <strong>and</strong> monitored them <strong>for</strong> relevance<br />

alongside their regular work. As in the previous evaluation, fewer signals<br />

were produced than expected: an average of only twenty per day. There were<br />

242 signals during the event <strong>and</strong>, of these, only thirteen were relevant.<br />

Similar problems with keywords or terms were recorded, such as the use of<br />

vernacular or the off-use of terms, <strong>for</strong> example ‘football fever’, ‘weakness’ of<br />

players’ ability or ‘headache’ from watching poor per<strong>for</strong>mance.<br />
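The figures reported for the two evaluations can be read as a precision, that is, the share of retrieved signals that were judged relevant. The following is a minimal Python sketch of that arithmetic, using only the counts given in the text; the function name is an illustrative choice and is not part of M-Eco.

```python
def precision(relevant: int, retrieved: int) -> float:
    """Share of retrieved signals that were judged relevant."""
    if retrieved <= 0:
        raise ValueError("retrieved must be positive")
    return relevant / retrieved


# Counts reported for the European Football Championship evaluation.
euro_2012 = precision(relevant=13, retrieved=242)
print(f"{euro_2012:.1%}")  # 5.4%
```

On these counts, roughly one signal in nineteen was relevant, well below the 31 per cent observed in the Twitter simulation, which itself fell short of the 75–80 per cent that evaluators expected.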

An additional evaluation was completed over three weeks in order to measure<br />

the appropriateness of the developed system <strong>for</strong> daily epidemiological<br />

monitoring of infectious diseases <strong>and</strong> related symptoms relevant <strong>for</strong><br />

Germany, using the M-Eco search interface. The evaluation exercise was<br />

based on criteria <strong>for</strong> inclusion <strong>and</strong> exclusion. Diseases that were deemed<br />

to be more prevalent in Germany were chosen (rarely occurring tropical<br />

diseases were not searched, <strong>for</strong> instance). Additionally, priority was given to<br />

those diseases <strong>and</strong> symptoms likely to be discussed in the general population<br />

via social media (because of popularity <strong>and</strong> general ubiquity) or those<br />

less likely to induce social stigma. Diseases that were deemed seasonally<br />

irrelevant <strong>for</strong> the time period, such as Tick-Borne Encephalitis (TBE), which<br />

primarily occurs in the summer months, were excluded. Other diseases were<br />

excluded because they occur so rarely that experts found a high likelihood<br />

of them remaining unmentioned on social media, <strong>for</strong> example Q-Fever, or<br />

because of their uncommon prevalence, or a faster or more severe onset of<br />

disease (<strong>and</strong> there<strong>for</strong>e higher likelihood to be detected by other sources) in<br />

Germany, <strong>for</strong> example Hemorrhagic (West Nile) Fever or tuberculosis.



Search terms were entered into the M-Eco search function. Each search<br />

term was allocated to one of four epidemiologists, <strong>and</strong> evaluators<br />

monitored their terms daily with regards to the number of resulting signals<br />

(matching the search term <strong>and</strong> defined by location – Germany); whether<br />

there was an indication of larger events (an outbreak); whether signals<br />

were relevant to their work; <strong>and</strong> whether the search results were found<br />

in another epidemiological surveillance source. Evaluators also provided<br />

qualitative feedback on their experience during the evaluation, focusing on<br />

the integration of such monitoring into their regular epidemiological work,<br />

general feedback <strong>and</strong> any required improvements.<br />

Signals came in primarily indicating influenza or flu. When graphed over time<br />

by date, it was clear that two large waves of signals came up <strong>for</strong> ‘flu’ <strong>and</strong><br />

coincided with media coverage of ‘flu shots’, namely the fact that Germany had<br />

been experiencing a shortage of vaccination coverage. An interesting<br />

dip, where no signals at all appeared, coincided with the weekend, indicating<br />

that tweets may have patterns that correspond to days of the week.<br />
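A weekday pattern of this kind can be checked by grouping signal dates by day of the week. The sketch below shows one way to do so in Python; the function name and the example dates are assumptions made purely for illustration and are not data from the evaluation.

```python
from collections import Counter
from datetime import date

WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday"]


def signals_by_weekday(signal_dates):
    """Count signals per day of the week (Monday first)."""
    counts = Counter(d.weekday() for d in signal_dates)
    return {name: counts.get(i, 0) for i, name in enumerate(WEEKDAYS)}


# Made-up dates for illustration only; not data from the evaluation.
example = [date(2012, 11, 5), date(2012, 11, 5), date(2012, 11, 6)]
print(signals_by_weekday(example))
```

A consistently empty Saturday and Sunday across several weeks would support the weekend-dip reading; a single quiet weekend would not.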

Figure 3: Monitored signals <strong>for</strong> the search terms in the M-Eco search<br />

function over time.<br />

Note: Black line shows the trend <strong>for</strong> all signals.<br />

The evaluation showed that search terms used primarily by medical<br />

professionals were most prevalent, indicating that more signals might be<br />

derived from the media or reports. A hypothesis made from this finding was<br />

that tweets were mainly from media sources <strong>and</strong> that media tended to break<br />

off at the weekend.



Qualitative evaluation indicated overall acceptance of the concept <strong>for</strong> the<br />

system. Evaluators generally found it impressive that tweets about health can<br />

be monitored. They appreciated that the system provided them with signals<br />

based on aggregated social-media sources, <strong>and</strong> there<strong>for</strong>e allowed <strong>for</strong> easier<br />

<strong>and</strong> faster monitoring of many social-media sources in one place – something<br />

that could not be processed manually within a limited amount of time.<br />

General Discussion Points<br />

The experience with Twitter shows that the total number of signals retrieved<br />

by the prototype was smaller than initially expected throughout evaluation.<br />

This could be due to a smaller overall number of German-language tweets;<br />

social media has been shown to be dominated by English-language users<br />

(many of these in the US), which would result in fewer social-media documents<br />

<strong>and</strong> signals in the German language. 7 Additionally, there could be a perceived<br />

social stigma associated with certain terms <strong>for</strong> diseases or symptoms that<br />

yielded fewer results. 8 This was possibly evident by looking at the retrieved<br />

signals related to flu-related versus gastroenteritis-related symptoms (see<br />

Figure 3). When talking about headache, fever or flu, there may not be such<br />

social stigma, but gastrointestinal diseases, although sometimes mentioned<br />

by the media during large outbreaks, are not necessarily those illnesses<br />

fervently discussed in social media. One is more likely to speak publicly<br />

about a ‘headache’ than about ‘bloody diarrhoea’. Not surprisingly, words<br />

indicating gastrointestinal diseases <strong>and</strong> related symptoms were not very<br />

common in the social-media content retrieved in this evaluation. This is in<br />

contrast to the official notification system in Germany, where gastrointestinal<br />

diseases play an important role compared to flu-like illnesses.<br />

Although the results suggest that Twitter is a useful source of additional<br />

in<strong>for</strong>mation, the difference between media reports <strong>and</strong> personal reports<br />

remains a significant issue. Reports that originate in news media are easier<br />

to retrieve as they tend to reflect language <strong>and</strong> keywords that accurately<br />

mirror health <strong>and</strong> medical terminology. They are also more likely to refer to<br />

outbreaks. Personal reports are hard to detect. Two groups of tweets written<br />

by individuals were identified: those in which people refer to media reports<br />

<strong>and</strong> those in which people refer to a health status (<strong>for</strong> example, a tweet<br />

with content on ‘own health status’ or someone related to it – perhaps, a<br />

joke about health symptoms). The research suggests that people are much<br />

more likely to exchange in<strong>for</strong>mation about less-serious health conditions like<br />

tiredness or nausea than about more serious conditions. A person will share<br />

the fact that they have a headache, <strong>for</strong> example, but not that their recent<br />

cancer diagnosis <strong>and</strong> accompanying antibiotics cause severe diarrhoea.<br />

7. T Webster, Twitter Usage in America: 2010, 2010.<br />

8. T H A Correa <strong>and</strong> H Zuniga, ‘Who Interacts on the Web? The Intersection of Users’<br />

Personality <strong>and</strong> Social Media Use’, Computers in Human Behavior (Vol. 26, No. 7,<br />

2010).



Additionally, it is difficult <strong>for</strong> the system to detect such reports, as they do<br />

not often contain recognisable health or medical terms, but rather content<br />

that includes paraphrases <strong>and</strong> variable language, such as slang (<strong>for</strong> example,<br />

‘the squirts’ versus ‘diarrhoea’) or abbreviations that may include alternative<br />

spellings (e.g. ‘shot 2 l8 4 flu’ versus ‘flu-shot too late <strong>for</strong> flu season’). More<br />

research into language in social-media use is needed be<strong>for</strong>e text-mining <strong>for</strong><br />

infectious disease-relevant in<strong>for</strong>mation can be best technically addressed. 9 It<br />

will be a continuing task to improve the algorithms to better match a constantly<br />

changing media l<strong>and</strong>scape, <strong>and</strong> the language <strong>and</strong> socio-cultural h<strong>and</strong>ling of<br />

social media by the populations whose health needs to be monitored. In other<br />

research, technologies developed to deal with these issues rely on spurious<br />

correlations, leaving keyword-based methods vulnerable to false alarms. 10<br />
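The mismatch between formal keyword lists and everyday language described above can be illustrated with a small Python sketch. It is not the M-Eco implementation: the term list, the vernacular mapping and the function name are all hypothetical, and only the ‘the squirts’ example is taken from the text.

```python
import re

# Hypothetical keyword list in formal medical terms (not from M-Eco).
FORMAL_TERMS = {"diarrhoea", "influenza", "measles"}

# Hypothetical mapping from vernacular phrases to formal terms.
VERNACULAR = {"the squirts": "diarrhoea", "flu": "influenza"}


def match_terms(text, use_vernacular=False):
    """Return the set of formal terms that the text appears to mention."""
    lowered = text.lower()
    found = {
        term for term in FORMAL_TERMS
        if re.search(rf"\b{re.escape(term)}\b", lowered)
    }
    if use_vernacular:
        # Map each vernacular phrase found in the text to its formal term.
        for phrase, formal in VERNACULAR.items():
            if re.search(rf"\b{re.escape(phrase)}\b", lowered):
                found.add(formal)
    return found


tweet = "Got the squirts after the match"
print(match_terms(tweet))                       # set()
print(match_terms(tweet, use_vernacular=True))  # {'diarrhoea'}
```

Expanding the list in this way retrieves more personal reports, but it also widens the door to false alarms of the ‘football fever’ kind, so any such mapping would need regular review.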

Place <strong>and</strong> location are a critical part of outbreak detection, early warning<br />

<strong>and</strong> epidemiological work, <strong>and</strong> will be of increasing utility to health scientists<br />

wanting to monitor diseases using social media. Throughout the development<br />

<strong>and</strong> use of the prototype, geo-location has been a difficult component to<br />

analyse, because of a lack of data. When using Twitter as a data source,<br />

geo-location is not always included in user profiles, <strong>and</strong> users do not always<br />

disclose a location in the content of their tweets. Colleagues looking into geo-location stamping<br />

have tried ways to analyse textual in<strong>for</strong>mation from the<br />

content of social media in order to provide in<strong>for</strong>mation on location, <strong>and</strong> such<br />

statistical learning frameworks seem to be successful, but can introduce a<br />

high level of complexity. 11 Chanlekha <strong>and</strong> Collier examined geo-encoding of<br />

outbreak reports with more detailed granularity, but found the encoding of<br />

health in<strong>for</strong>mation from reports time-consuming <strong>and</strong> expensive. Automated<br />

systems tend to leave out too much in<strong>for</strong>mation. As a solution, the authors<br />

propose a scheme called ‘spatiotemporal zoning’, in which they analyse<br />

events reported in sources with regard to temporal in<strong>for</strong>mation as a means<br />

to mitigate the limitations of current report-based surveillance systems. 12<br />

Sensitivity <strong>and</strong> specificity remain tough factors in the process of signal generation.<br />

It is essential that enough data are captured, so that important in<strong>for</strong>mation is not<br />

9. N Collier et al., ‘A Multilingual Ontology <strong>for</strong> Infectious Disease Surveillance: Rationale, Design<br />

<strong>and</strong> Challenges’, Language Resources <strong>and</strong> Evaluation (Vol. 40, No. 3/4, 2006), pp. 405–13; M<br />

Conway et al., ‘Classifying Disease Outbreak Reports Using N-grams <strong>and</strong> Semantic Features’,<br />

International Journal of Medical In<strong>for</strong>matics (Vol. 78, No. 12, 2009), pp. e47–e58.<br />

10. A Culotta, ‘Detecting Influenza Outbreaks by Analyzing Twitter Messages’, 2010,<br />

, accessed 20 August 2014.<br />

11. V Lampos <strong>and</strong> N Cristianini, ‘Nowcasting Events from the Social Web with Statistical<br />

Learning’, ACM Transactions on Intelligent Systems <strong>and</strong> Technology, 2011.<br />

12. H Chanlekha <strong>and</strong> N Collier, ‘A Methodology to Enhance Spatial Underst<strong>and</strong>ing of Disease<br />

Outbreak Events Reported in News Articles’, International Journal of Medical In<strong>for</strong>matics<br />

(Vol. 79, No. 4, 2010), pp. 284–96; H Chanlekha, A Kawazoe <strong>and</strong> N Collier, ‘A Framework<br />

<strong>for</strong> Enhancing Spatial <strong>and</strong> Temporal Granularity in Report-Based Health Surveillance<br />

Systems’, BMC Medical In<strong>for</strong>matics <strong>and</strong> Decision Making (Vol. 10, No. 1, 2010).



overlooked, but simultaneously that not too much is presented to the user, so as<br />

not to overwhelm. The work suggests that signals are only relevant if the personal<br />

tweet mentions an actual outbreak, but this could be limited by existing knowledge<br />

of outbreaks. Signals are sometimes generated by cognates of disease names or<br />

symptoms, or words that sound like disease names or symptoms. 13 As mentioned<br />

above, there are various linguistic aspects that must be constantly improved.<br />
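The trade-off between sensitivity and specificity described above can be made concrete with a small calculation. The following is a minimal sketch only: the class, the function names and all counts are hypothetical and are not taken from the M-Eco evaluation. It assumes that each generated signal has been reviewed by hand and labelled as relating to a real event or not.

```python
from dataclasses import dataclass


@dataclass
class SignalCounts:
    """Counts from a manual review of generated signals (all values hypothetical)."""
    true_positives: int   # signals raised for real events
    false_positives: int  # signals raised with no real event behind them
    true_negatives: int   # non-events correctly left unsignalled
    false_negatives: int  # real events the system missed


def sensitivity(c: SignalCounts) -> float:
    """Share of real events that produced a signal."""
    denominator = c.true_positives + c.false_negatives
    return c.true_positives / denominator if denominator else 0.0


def specificity(c: SignalCounts) -> float:
    """Share of non-events that correctly produced no signal."""
    denominator = c.true_negatives + c.false_positives
    return c.true_negatives / denominator if denominator else 0.0


# Invented review results, used only to show the arithmetic.
counts = SignalCounts(true_positives=18, false_positives=40,
                      true_negatives=160, false_negatives=2)
print(f"sensitivity={sensitivity(counts):.2f}, specificity={specificity(counts):.2f}")
```

With these invented counts the system would miss few real events (high sensitivity) but would still present forty irrelevant signals to the user, which is the overload problem the text warns against.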

Despite the a<strong>for</strong>ementioned technical challenges, the results of this prototype<br />

evaluation indicated that social media (Twitter) should not be ruled out <strong>for</strong><br />

infectious disease surveillance. Although it has not yet been possible, it<br />

would be ideal to integrate this work alongside indicator-based surveillance<br />

ef<strong>for</strong>ts over an extended timeframe to give a better sense of the true value<br />

as events arise in real time. Further evaluations in the future are needed in<br />

order to measure a true epidemiological impact over time <strong>and</strong> in context.<br />

The experience with the M-Eco prototype provided a means to look further<br />

behind the systematic acquisition <strong>and</strong> processing of social-media data in<br />

health monitoring. Depending on the content available in social media, health<br />

officials can receive in<strong>for</strong>mation about potential health threats earlier or they<br />

can receive additional in<strong>for</strong>mation on health threats already detected by<br />

another system. The M-Eco prototype has been designed to offer automated<br />

methods <strong>and</strong> technologies to rapidly provide signals <strong>for</strong> the detection of<br />

infectious disease events. 14 This is challenging, <strong>and</strong> more time is needed to<br />

explore ways to evaluate such a system <strong>and</strong> the resulting signals over a longer<br />

period. Previous evaluations of event-based surveillance systems have been<br />

completed only to a limited extent, <strong>and</strong> there are very few examples to draw<br />

from. 15 In addition to speeding up the detection process through bypassing<br />

traditional indicator-based surveillance structures, event-based surveillance<br />

can also provide innovation in settings where weak or underdeveloped<br />

13. R Steinberger et al., ‘Text Mining from the Web <strong>for</strong> Medical Intelligence’, in F<br />

Fogelman-Soulié et al. (eds), Mining Massive <strong>Data</strong> Sets <strong>for</strong> <strong>Security</strong> (Amsterdam: IOS<br />

Press, 2008), pp. 295–310; R Yangarber, R Steinberger et al., ‘Combining In<strong>for</strong>mation<br />

Retrieval <strong>and</strong> In<strong>for</strong>mation Extraction <strong>for</strong> Medical Intelligence’, Proceedings of Mining<br />

Massive Data Sets <strong>for</strong> <strong>Security</strong>, NATO Advanced Study Institute, Gazzada, Italy, 2007.<br />

14. G Eysenbach, ‘Medicine 2.0: Social Networking, Collaboration, Participation,<br />

Apomediation, <strong>and</strong> Openness’, Journal of Medical Internet Research (Vol. 10, No.<br />

3, 2008), p. e22; G Eysenbach, ‘Infodemiology <strong>and</strong> Infoveillance: Framework <strong>for</strong> an<br />

Emerging Set of Public Health In<strong>for</strong>matics Methods to Analyze Search, Communication<br />

<strong>and</strong> Publication Behavior on the Internet’, Journal of Medical Internet Research (Vol.<br />

11, No. 1, 2009), p. e11; T W Grein et al., ‘Rumors of Disease in the Global Village:<br />

Outbreak Verification’, Emerging Infectious Diseases (Vol. 6, No. 2, 2000), pp. 97–102;<br />

M Keller et al., ‘Use of Unstructured Event-based Reports <strong>for</strong> Global Infectious Disease<br />

Surveillance’, Emerging Infectious Diseases (Vol. 15, No. 5, 2009), pp. 689–95.<br />

15. J S Brownstein <strong>and</strong> C C Freifeld, Evaluation of Internet-Based In<strong>for</strong>mal Surveillance<br />

<strong>for</strong> Global Infectious Disease Intelligence, 2008; J S Brownstein, C C Freifeld, B Y Reis<br />

<strong>and</strong> K D Mandl, Evaluation of Online Media Reports <strong>for</strong> Global Infectious Disease<br />

Intelligence, 2007.



surveillance systems are in place. Currently, several developing countries face<br />

such realities, <strong>and</strong> since socioeconomic disparities <strong>and</strong> poor or insufficient<br />

surveillance infrastructures often have broader consequences in the event of<br />

an outbreak, the potential gain is worth exploring. In these contexts, which bear<br />

a larger burden than most, the development of surveillance that can access<br />

health in<strong>for</strong>mation in the absence of traditional surveillance institutions<br />

could be critical to detecting infectious disease at the earliest stage, so that<br />

an epidemic outbreak can be prevented or its impact reduced.<br />

Recent work has begun in this area in order to seek in<strong>for</strong>mation on health<br />

threats using mobile-phone technology, Internet scanning tools, e-mail<br />

distribution lists or networks that complement the early warning function of<br />

routine surveillance systems. 16 The research has shown that the majority of<br />

existing event-based surveillance systems are situated in North America <strong>and</strong><br />

Europe. Local, event-based systems to monitor epidemic threats in Africa,<br />

Asia, Oceania <strong>and</strong> South America are scarce. Guidance <strong>and</strong> training to create<br />

such systems on the ground should be considered, <strong>and</strong> can lead to a faster<br />

assessment of arising health threats <strong>and</strong> improved rapid response by local<br />

authorities.<br />

Edward Velasco is a senior scientist at the Robert Koch Institute, the<br />

national public health agency of Germany. He provides scientific advice<br />

<strong>and</strong> technical support in the Division of Healthcare-Associated Infections,<br />

Antimicrobial Resistance <strong>and</strong> Consumption, including outbreak management<br />

<strong>and</strong> research on clinical <strong>and</strong> social risk factors <strong>for</strong> antimicrobial resistance.<br />

He has widespread experience in infectious disease epidemiology <strong>and</strong> has<br />

consulted with the European Centre <strong>for</strong> Disease Prevention <strong>and</strong> Control on<br />

quality evaluation <strong>for</strong> surveillance systems in EU member states. He has<br />

held positions in evaluation at the Open Society Foundation, London <strong>and</strong><br />

the Social Science Research Centre, Berlin. He has a doctorate in medical<br />

sciences from Charité University Hospital, the joint medical school of the<br />

Humboldt <strong>and</strong> Free Universities of Berlin, <strong>and</strong> a Master of Science in Social<br />

Epidemiology from the Harvard School of Public Health. He can be reached<br />

on VelascoE@rki.de.<br />

16. J P Chretien <strong>and</strong> S H Lewis, ‘Electronic Public Health Surveillance in Developing<br />

Settings: Meeting Summary’, BMC Proceedings (Vol. 2, Suppl. 3, 2008), p. S1; J P<br />

Chretien et al., ‘Syndromic Surveillance: Adapting Innovations to Developing Settings’,<br />

PLoS Medicine (Vol. 5, No. 3, 2008), p. e72; C Robertson et al., ‘Mobile Phone-Based<br />

Infectious Disease Surveillance System, Sri Lanka’, Emerging Infectious Diseases (Vol.<br />

16, No. 10, 2010), pp. 1524–31.


VII: The Raxibacumab Experience: The First Novel<br />

Product Approved Under the US Food <strong>and</strong> Drug<br />

Administration ‘Animal Rule’<br />

Chia-Wei Tsai<br />

The analysis of data – big <strong>and</strong> small – is central to the US government’s<br />

approach to establishing requirements <strong>and</strong> procurement goals <strong>for</strong> medical<br />

countermeasures <strong>for</strong> chemical, biological, radiological <strong>and</strong> nuclear events 1<br />

<strong>and</strong> their approval or licensure by the Food <strong>and</strong> Drug Administration (FDA). 2<br />

Multiple scenarios with a wide variety of variables, such as location, time of<br />

year <strong>and</strong> weather conditions, are analysed to project the potential impact<br />

on humans, animals <strong>and</strong> commerce. Additional analysis is carried out to<br />

identify gaps in resources that are needed versus those that are available,<br />

<strong>and</strong> on their ability to be used successfully, including the logistics that are<br />

affected by the emergency. It is the policy of the US government to seek<br />

FDA approval or licensure <strong>for</strong> these medical countermeasures while they<br />

are being developed or stockpiled. The efficacy of many of these products<br />

cannot ethically be evaluated in humans <strong>and</strong> there<strong>for</strong>e their regulatory path<br />

relies on the Animal Rule. 3 This requires the demonstration of efficacy in<br />

animal models followed by the demonstration of safety in human trials, <strong>and</strong><br />

the development of a pharmacokinetic bridging study – which establishes the<br />

safe <strong>and</strong> appropriate human dose – between the two. All of this is based<br />

on the statistical analysis of <strong>Big</strong> <strong>Data</strong> from non-clinical <strong>and</strong> clinical studies.<br />

The successful application of these principles has been demonstrated in the<br />

development of the anthrax antitoxin raxibacumab.<br />

Anthrax Antitoxin Requirement<br />

In 2004, anthrax was determined by the US secretary of Homel<strong>and</strong> <strong>Security</strong><br />

to present ‘a material threat against the US population sufficient to affect<br />

national security’. 4 Thus, the US government has established an integrated<br />

anthrax response strategy that includes antitoxins, antibacterials <strong>and</strong><br />

1. National Strategy <strong>for</strong> Countering Biological Threats, , accessed 19<br />

August 2014.<br />

2. Medical Countermeasures Initiative Strategic Plan 2012–2016, ,<br />

accessed 19 August 2014.<br />

3. Food <strong>and</strong> Drug Administration, Animal Rule Summary, ,<br />

accessed 15 July 2014.<br />

4. Taking Measure of Countermeasures (Part 1), ,<br />

accessed 19 August 2014.



vaccines. 5 In 2008, the Enterprise Executive Committee 6 of the Public Health<br />

Emergency Medical Countermeasures Enterprise approved a scenario-based<br />

requirement <strong>for</strong> anthrax antitoxins. 7 That requirement was established<br />

from an assessment of high-consequence scenarios involving the exposure<br />

of a single major metropolitan area to a defined amount of anthrax spores<br />

through computer modelling <strong>and</strong> simulation. Exposure modelling involved<br />

very large data sets related to spore dispersion <strong>and</strong> fate, transport modelling,<br />

infection modelling involving analysis of human outbreak data from the<br />

Sverdlovsk incident that occurred in Russia in April 1979 (in which spores<br />

of anthrax were accidentally released from a military facility, resulting in an<br />

estimated one hundred deaths), <strong>and</strong> from non-human primate experimental<br />

exposures.<br />

In order to better determine how much antitoxin should be procured,<br />

the Analytical Decision Support Division of the US Biomedical Advanced<br />

Research <strong>and</strong> Development Authority (BARDA) conducted a preparedness<br />

analysis that included two different approaches to the analysis of very large<br />

data sets, which included a wide variety of parameters. In the first analysis,<br />

a fixed-percentage approach was taken; in the second analysis, a population-density<br />

approach was taken. These two approaches both concluded that a<br />

similar level of coverage of all metropolitan statistical areas (cities) in the US<br />

was achievable, allowing a procurement goal to be established. This level of<br />

preparedness represents approximately the maximum amount of product<br />

that can be manufactured with existing capabilities, <strong>and</strong> a reasonable cost–<br />

benefit ratio based on existing funding <strong>and</strong> drug costs.<br />
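The report does not define the fixed-percentage or the population-density approach, so the following is only one possible reading, offered as a minimal sketch. It assumes that the first approach sizes the need in each metropolitan area as a fixed share of its population, and that the second scales that share by the area's population density relative to a reference value. Every name, figure and parameter below is hypothetical and is not drawn from the BARDA analysis.

```python
from typing import NamedTuple


class MetroArea(NamedTuple):
    """Hypothetical metropolitan statistical area; all figures are invented."""
    name: str
    population: int
    area_km2: float


def fixed_percentage_need(area: MetroArea, exposed_share: float) -> float:
    """Courses needed if a fixed share of residents is assumed to need treatment."""
    return area.population * exposed_share


def density_weighted_need(area: MetroArea, exposed_share: float,
                          reference_density: float) -> float:
    """Courses needed when the assumed share scales with population density."""
    density = area.population / area.area_km2
    return area.population * exposed_share * (density / reference_density)


def coverage(available_courses: float, needed_courses: float) -> float:
    """Fraction of the estimated need the available courses can meet, capped at 1."""
    if needed_courses <= 0:
        return 1.0
    return min(1.0, available_courses / needed_courses)


areas = [
    MetroArea("Area A", 2_000_000, 1_000.0),  # 2,000 people per square km
    MetroArea("Area B", 500_000, 500.0),      # 1,000 people per square km
]
available = 10_000  # hypothetical treatment courses available to each area

for area in areas:
    fixed = fixed_percentage_need(area, exposed_share=0.01)
    weighted = density_weighted_need(area, exposed_share=0.01,
                                     reference_density=2_000.0)
    print(area.name,
          round(coverage(available, fixed), 2),
          round(coverage(available, weighted), 2))
```

Comparing the two coverage figures for each area is the kind of cross-check the text describes: if both approaches point to a similar level of coverage, a single procurement goal can be set with more confidence.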

With in<strong>for</strong>mation from this analysis, a meeting was held in Seattle to discuss<br />

antitoxin use <strong>and</strong> logistical issues with state <strong>and</strong> local end-users. The group<br />

included policy-makers, planners, physicians, nurses, emergency responders<br />

<strong>and</strong> first responders. The objective was to identify the parameters this group<br />

felt were important to the analysis of response capabilities. This qualitative<br />

input allowed weight factors to be established <strong>for</strong> parameters identified<br />

as critical in the subsequent quantitative analysis. The participants were<br />

in<strong>for</strong>med about the antitoxins that are currently available in the Strategic<br />

National Stockpile <strong>and</strong> given an opportunity to discuss their use in mass-casualty<br />

scenarios. Several important issues were raised, but the participants<br />

all agreed that antitoxins would be an important component of the response<br />

to anthrax events. The results of this <strong>for</strong>um are currently being used to build<br />

medical countermeasures distribution <strong>and</strong> dispensing models that can be<br />

5. HHS PHEMCE Strategy <strong>and</strong> Implementation Plan, 2012, , accessed 19 August 2014.<br />

6. PHEMCE Governance, , accessed 19 August 2014.<br />

7. PHEMCE Mission Components, , accessed 19 August 2014.



used to predict the medical outcomes in mass-casualty events. These models<br />

consider two approaches to surveillance <strong>and</strong> the initiation of an emergency<br />

response: detection through the BioWatch system <strong>and</strong> index clinical cases. 8<br />

At the meeting, officials of the Office of the Assistant Secretary <strong>for</strong><br />

Preparedness <strong>and</strong> Response (ASPR) described the Department of Health <strong>and</strong><br />

Human Services (HHS) Medical Countermeasure Strategy, 9 the acquisition<br />

process <strong>and</strong> BARDA’s role. An outside consultant described the lessons<br />

learned in the treatment of the victims of the 2001 anthrax attacks. ASPR<br />

provided product background on the two antitoxins available <strong>for</strong> use in<br />

treating anthrax exposure, raxibacumab <strong>and</strong> anthrax immune globulin<br />

intravenous. The role of government organisations <strong>and</strong> the private sector in<br />

the distribution <strong>and</strong> dispensing of medical countermeasures was discussed,<br />

including the need <strong>for</strong> large amounts of ancillary supplies to administer these<br />

medical countermeasures.<br />

The results of the preparedness analysis <strong>and</strong> the meeting with state <strong>and</strong><br />

local end-users were used to in<strong>for</strong>m decisions regarding the prioritisation of<br />

treatment geographically <strong>and</strong> at the patient level. The Centers <strong>for</strong> Disease<br />

Control <strong>and</strong> Prevention organised the first meeting of the Clinical Utilization<br />

Plan <strong>for</strong> Anthrax Countermeasures in a Mass Event Setting (CUPAC). 10 Using<br />

a ‘best practices’ approach, the CUPAC focused on patients with clinical<br />

signs <strong>and</strong> symptoms of anthrax presenting at health-care centres following a<br />

large-scale bioterrorism event associated with wild-type Bacillus anthracis.<br />

The goal of the CUPAC is to create strategies to triage <strong>and</strong> care <strong>for</strong> large<br />

numbers of patients effectively <strong>and</strong> to create a scalable prioritisation scheme<br />

<strong>for</strong> the use of medical countermeasures. Working with a Federal Steering<br />

Committee <strong>and</strong> the National Association of County <strong>and</strong> City Health Officials,<br />

a systematic review was conducted by the Triage <strong>and</strong> Critical Care Working<br />

Group <strong>and</strong> the Medical Countermeasure Working Group. In gathering <strong>and</strong><br />

analysing data <strong>and</strong> drafting preliminary recommendations, the CUPAC<br />

working groups are considering questions about the prioritisation of<br />

antitoxins, <strong>and</strong> the prioritisation <strong>and</strong> duration of antibacterials, triage <strong>and</strong><br />

critical care. The analysis includes large data sets from non-clinical studies<br />

as well as clinical data from the use of antitoxins to treat anthrax cases that<br />

occurred in 2009 <strong>and</strong> 2010 in Scotl<strong>and</strong>, UK. Again, the analysis of <strong>Big</strong> <strong>Data</strong><br />

with diverse parameters is playing a central role in the development of these<br />

guidelines.<br />

8. Department of Homel<strong>and</strong> <strong>Security</strong>, Homel<strong>and</strong> <strong>Security</strong> BioWatch programme,<br />

, accessed 14 July 2014.<br />

9. Public Health Emergency Medical Countermeasures (PHEMCE) Strategy, 2012,<br />

,<br />

accessed 19 August 2014.<br />

10. Conference Report on Public Health <strong>and</strong> Clinical Guidelines <strong>for</strong> Anthrax, , accessed 19 August 2014.



Development of Raxibacumab <strong>for</strong> Anthrax Treatment<br />

Since 2007, an inventory of treatment courses of antitoxins, including<br />

raxibacumab, has been available 11 in the Strategic National Stockpile (SNS)<br />

through the Project BioShield contracts awarded in 2005. 12 Raxibacumab is a<br />

monoclonal antibody antitoxin against the protective antigen of Bacillus anthracis,<br />

developed <strong>for</strong> the treatment of inhalational anthrax. Its efficacy has been demonstrated in<br />

multiple animal trials as a monotherapy <strong>and</strong> in combination with antibiotics.<br />

Its safety has been demonstrated in healthy adults through large clinical<br />

trials <strong>and</strong> statistical analysis of results from those trials.<br />

The development of raxibacumab is the result of a co-ordinated response to<br />

a recognised public bioterrorism threat <strong>and</strong> the US Government’s request <strong>for</strong><br />

medical countermeasures to treat inhalational anthrax. Following the anthrax<br />

attacks in 2001, over 30,000 people with suspected exposures initiated<br />

antimicrobial prophylaxis. Eleven people developed inhalational anthrax, <strong>and</strong><br />

despite the best available treatment, five of them died. All subjects received<br />

at least two antibiotics <strong>and</strong> some received as many as seven. Antibiotics<br />

alone were insufficient to treat subjects who had developed anthrax. While<br />

antibiotics can overcome blood infections caused by anthrax, they do not<br />

directly address the presence of toxins that drives the development of the<br />

disease. Anthrax toxin is responsible <strong>for</strong> most morbidity (illnesses) <strong>and</strong><br />

mortality (deaths) associated with anthrax.<br />

In humans <strong>and</strong> animals inhalational anthrax occurs following inhalation<br />

of Bacillus anthracis spores, which germinate within macrophages (a type<br />

of white blood cell that ingests <strong>for</strong>eign particles) as they travel to the<br />

lymph nodes of the lung, 13 from where they are drained out of the body.<br />

Multiplication of the bacteria results in a high organism count in the blood, the<br />

production of bacterial toxins, <strong>and</strong> the rapid onset of septicemia. Although<br />

bacterial replication (bacteremia) can be controlled by the administration of<br />

appropriate antibiotics, it is the bacterial toxin that exerts deleterious effects<br />

on the cells within the body, resulting in substantial pathology <strong>and</strong> high<br />

mortality in infected individuals. Because antibiotics have no direct effect on<br />

the toxin, they do not treat the toxemia. After the toxin has reached sufficient<br />

levels in an individual, controlling bacterial replication with an antibiotic may<br />

not alter the clinical course of the patient.<br />

11. US Department of Health <strong>and</strong> Human Services, Project BioShield Annual Report<br />

to Congress, , accessed 30 September 2014.<br />

12. BARDA Strategic Plan 2011–2016, , accessed 19 August 2014.<br />

13. Anthrax, , accessed 19 August 2014.



There is, however, an effective anthrax vaccine that works by inducing the body’s<br />

immune response primarily to the protective antigen component of anthrax<br />

toxin. Once subjects have this antibody, they are protected against<br />

the effects of anthrax. Of those who became ill in the 2001 attacks, all of<br />

the survivors developed an immune response to the anthrax toxin by day<br />

twenty-eight after exposure.<br />

The challenge with this rapidly progressing <strong>and</strong> often fatal disease is the time<br />

required <strong>for</strong> the infected person’s immune system to generate a response to<br />

the toxins. The anthrax vaccines that have traditionally been available require<br />

more than two months to achieve protective levels of antitoxin antibodies in<br />

the blood. Raxibacumab works by delivering human recombinant antitoxin<br />

antibody to the subject immediately. At the proposed dosage, raxibacumab<br />

persists long enough <strong>for</strong> the development of immunity, helping subjects<br />

survive to develop long-lasting toxin-neutralising antibodies. This immediate<br />

onset of action fills the need <strong>for</strong> subjects who have not received the anthrax<br />

vaccine. This approach also addresses the need arising from the inability of<br />

antibiotics to address anthrax toxemia directly. As demonstrated in studies<br />

in rabbits <strong>and</strong> non-human primates, raxibacumab improves survival when<br />

administered early, be<strong>for</strong>e symptoms develop, as well as later, when the<br />

disease has progressed to systemic infection. The results of the animal<br />

studies are subjected to statistical analysis <strong>and</strong> computer modelling in order<br />

to estimate how efficacious the antitoxin might be in humans. Moreover,<br />

raxibacumab is effective both as monotherapy <strong>and</strong> in combination with<br />

antibiotics.<br />

While it is possible to achieve 100 per cent cure rates using antibiotics<br />

alone under experimental conditions, the 2001 attacks <strong>and</strong> other real-world<br />

experiences have demonstrated that antibiotics alone are not 100 per cent<br />

effective. In addition, antibiotics would not be effective against antibiotic-resistant<br />

strains of anthrax, which have already been identified. The US<br />

government has recognised the need <strong>for</strong> additional anthrax bioterrorism<br />

countermeasures. Immediately after the anthrax attacks in September<br />

<strong>and</strong> October of 2001, Human Genome Sciences, Inc. (HGS) embarked on<br />

a development programme to produce a monoclonal antibody to treat<br />

inhalational anthrax. HGS was acquired by GlaxoSmithKline (GSK) in 2012,<br />

which continues the development <strong>and</strong> production of raxibacumab. The<br />

goal of the programme was to address the unmet bioterrorism <strong>and</strong> medical<br />

needs posed by inhalational anthrax <strong>and</strong> the limitations of current therapies.<br />

In less than a year, using recombinant DNA technology, a potent <strong>and</strong> specific<br />

antibody had been developed that binds the protective antigen of Bacillus<br />

anthracis with high affinity <strong>and</strong> inhibits protective antigen binding to anthrax<br />

toxin receptors, thus protecting animal <strong>and</strong> human macrophages from<br />

anthrax toxin-mediated cell death. HGS then began non-clinical work to<br />

establish proof of concept of the antibody as a therapeutic in the laboratory



<strong>and</strong> in animals, <strong>and</strong> initiated the process development work to manufacture<br />

<strong>and</strong> characterise the product.<br />

Bacillus anthracis produces three toxins. While antimicrobials cut off the source<br />

of anthrax toxin production, they do nothing to inhibit the adverse effects of<br />

toxins that have already been released. The pathogenic effects of toxemia<br />

can persist after bacteremia has been resolved. However, antitoxin antibodies<br />

directly neutralise the toxin <strong>and</strong> prevent its pathogenic effects. Recombinant<br />

human antitoxin monoclonal antibodies immediately provide the protection that<br />

develops from the immune response in anthrax survivors or that is stimulated by<br />

vaccines over the course of weeks with multiple injections. Antitoxin antibodies<br />

can be used in combination with antibiotics to protect subjects from the toxemia<br />

that antibiotics do not treat <strong>and</strong> would also be an important therapeutic option<br />

when antimicrobials are unavailable or contraindicated, or in the event of<br />

exposure to an antibiotic-resistant anthrax strain.<br />

The Regulatory Path under the FDA’s Animal Rule<br />

Raxibacumab is the first new drug developed since the bioterrorism attacks of<br />

2001 to seek licensure under the US FDA regulation that describes ‘Evidence<br />

Needed to Demonstrate Effectiveness of New Drugs When Human Efficacy<br />

Studies Are Not Ethical or Feasible’, or the Animal Rule (21 CFR 601, Subpart<br />

H, 2002). 14 The animal studies with raxibacumab were designed to meet the<br />

criteria <strong>for</strong> demonstration of efficacy under the Animal Rule <strong>and</strong> the animal<br />

models used <strong>for</strong> evaluation contained the essential elements provided in<br />

FDA guidance, which are recommended to generate data likely to predict<br />

the effectiveness of the product in humans.<br />

In a treatment model, administration of the therapeutic must be triggered by<br />

a sign or observation, not just by the time that has passed after exposure.<br />

Because large non-clinical studies to establish these treatment triggers are<br />

not ethically acceptable, a meta-analysis of data from studies in the US <strong>and</strong><br />

UK spanning more than ten years was conducted. Analysis<br />

of large data sets from rabbit <strong>and</strong> macaque studies, which included many<br />

diverse parameters such as body temperature <strong>and</strong> biochemical assay results,<br />

allowed reproducible triggers to be established in rabbits (body temperature<br />

increase) <strong>and</strong> macaques (the quantitative measurement of protective antigen<br />

in the blood).<br />
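A sign-based trigger of the kind identified in rabbits (a rise in body temperature) can be illustrated with a short sketch. The report does not give the actual trigger criteria, so the threshold of a one-degree rise, the requirement for two consecutive elevated readings, the function name and the sample readings below are all assumptions made for illustration only.

```python
from typing import Optional, Sequence, Tuple


def first_trigger_time(
    readings: Sequence[Tuple[float, float]],
    baseline_c: float,
    rise_c: float = 1.0,      # hypothetical threshold, not from the studies
    consecutive: int = 2,     # hypothetical confirmation requirement
) -> Optional[float]:
    """Return the time (hours after exposure) at which treatment is triggered.

    `readings` holds (hours_after_exposure, temperature_c) pairs in time order.
    The trigger fires at the reading that completes a run of `consecutive`
    readings, each at least `rise_c` above `baseline_c`. Returns None if the
    trigger never fires.
    """
    run_length = 0
    for hours, temperature_c in readings:
        if temperature_c - baseline_c >= rise_c:
            run_length += 1
            if run_length >= consecutive:
                return hours
        else:
            run_length = 0  # an ordinary reading resets the run
    return None


# Invented temperature series for one animal.
series = [(0, 39.0), (6, 39.2), (12, 40.1), (18, 39.4), (24, 40.2), (30, 40.5)]
print(first_trigger_time(series, baseline_c=39.0))
```

In this invented series the single elevated reading at twelve hours does not fire the trigger because the next reading falls back; treatment would begin at thirty hours, when a second consecutive elevated reading confirms the rise. Tying treatment to the observed sign rather than to a fixed time after exposure is the point the text makes.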

While the efficacy of raxibacumab was demonstrated in two animal models<br />

of inhalational anthrax, safety was evaluated in human clinical studies with<br />

single <strong>and</strong> repeat dosing, alone <strong>and</strong> in combination with antibiotics, in healthy<br />

14. Product Development under the Animal Rule, , accessed<br />

19 August 2014.



adult volunteers. 15 The animal efficacy studies demonstrated that a single<br />

dose of raxibacumab administered intravenously effectively neutralises the<br />

protective antigen <strong>and</strong> significantly improves survival. Its effect is immediate,<br />

<strong>and</strong> maximum raxibacumab serum concentrations are critical <strong>for</strong> survival, as<br />

the goal is to neutralise protective antigen as rapidly as possible. Moreover,<br />

because of its relatively long half-life, raxibacumab is durable, maintaining<br />

antitoxin protection until natural immunity can develop in twenty-eight<br />

days. Importantly, raxibacumab does not prevent the development of<br />

antitoxin immunity in anthrax-infected animals, nor does it interfere with the<br />

pharmacokinetics or safety of concomitantly administered antimicrobials.<br />

Animal studies have demonstrated that raxibacumab does not interfere<br />

with the activity of antibiotics <strong>and</strong> that the combination of raxibacumab <strong>and</strong><br />

antibiotic provides a higher survival outcome than antibiotics alone.<br />
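The link between a long half-life and protection that lasts until natural immunity develops can be sketched with simple first-order (exponential) elimination. This is a deliberate simplification: the report gives no pharmacokinetic parameters, so the half-life and concentrations below are placeholder values chosen to make the arithmetic easy to follow, not measured properties of raxibacumab.

```python
import math


def fraction_remaining(days: float, half_life_days: float) -> float:
    """Fraction of the initial serum concentration left after `days`,
    assuming simple first-order elimination."""
    if half_life_days <= 0:
        raise ValueError("half_life_days must be positive")
    return 0.5 ** (days / half_life_days)


def days_above_threshold(initial_conc: float, threshold_conc: float,
                         half_life_days: float) -> float:
    """Days until the concentration decays from `initial_conc` to `threshold_conc`."""
    if initial_conc <= 0 or threshold_conc <= 0 or half_life_days <= 0:
        raise ValueError("all arguments must be positive")
    if threshold_conc >= initial_conc:
        return 0.0
    return half_life_days * math.log2(initial_conc / threshold_conc)


# Placeholder values: a 14-day half-life and arbitrary concentration units.
print(fraction_remaining(days=28, half_life_days=14))
print(days_above_threshold(initial_conc=100.0, threshold_conc=25.0, half_life_days=14))
```

Under these placeholder values a quarter of the initial concentration would remain at day twenty-eight. The longer the half-life, the larger the fraction still circulating when the subject's own antitoxin antibodies appear.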

Because raxibacumab would likely be used with antimicrobials, the activity<br />

of antimicrobials was evaluated in combination with raxibacumab using<br />

the same study design as the pivotal efficacy studies. At the suggestion<br />

of the FDA, animal studies were per<strong>for</strong>med in which a full human-equivalent<br />

dose of levofloxacin or ciprofloxacin was administered at the same time as<br />

raxibacumab to animals with symptomatic disease. Because antimicrobials<br />

are most effective when all spores have germinated, administering the<br />

antimicrobials after the animals had become septic maximised the efficacy<br />

of the antibiotics. This is reflected in the high survival rates in the antibiotic<br />

alone <strong>and</strong> raxibacumab-antibiotic combination treatment groups (85–100<br />

per cent). In this study, levofloxacin alone or in combination with raxibacumab<br />

was administered to the 42 per cent of anthrax-infected animals surviving<br />

to 84 hours after spore exposure. The combination of raxibacumab <strong>and</strong><br />

levofloxacin resulted in a higher survival outcome than <strong>for</strong> levofloxacin<br />

treatment alone.<br />

The results of the added benefit study serve to supplement the results of the<br />

original efficacy studies, which demonstrated the efficacy of raxibacumab<br />

administered early in the course of the disease. In contrast to the survival<br />

rates observed late in the course of disease, survival rates are highest with<br />

raxibacumab when it is given as the protective antigen is first being produced,<br />

with 90–100 per cent survival rates in rabbits <strong>and</strong> monkeys when raxibacumab<br />

is administered as monotherapy at the time of spore challenge or at twelve<br />

hours after spore challenge. In the clinical setting, when neither the time<br />

of spore exposure, onset of symptoms, nor individual time course of the<br />

disease is easily identified, administering both antimicrobials to kill bacteria<br />

<strong>and</strong> anti-protective antigen antibody to neutralise toxin is an effective strategy<br />

<strong>for</strong> combating both the source <strong>and</strong> effects of the disease.<br />

15. Clinical Pharmacology <strong>and</strong> Biopharmaceutics Review of Raxibacumab, ,<br />

accessed 19 August 2014.



Per agreement with FDA <strong>for</strong> an indication as a therapeutic treatment <strong>and</strong><br />

consistent with the Animal Rule, the safety of raxibacumab has been evaluated<br />

in over 400 healthy human volunteers. Adverse events were generally mild to<br />

moderate <strong>and</strong> did not occur at a rate that was different from that observed<br />

among placebo-treated subjects. A low incidence of mild to moderate rash<br />

was observed in some subjects. These rashes were transient <strong>and</strong> resolved<br />

without medication or with oral diphenhydramine (a readily available drug<br />

used to reduce irritation <strong>and</strong> runny noses caused by hayfever or allergies).<br />

Concomitant administration of raxibacumab with ciprofloxacin (a common<br />

antibiotic) did not alter the safety or pharmacokinetics of either the antibiotic<br />

or raxibacumab.<br />

Raxibacumab treatment should be initiated when a diagnosis of inhalational<br />

anthrax is suspected or confirmed. Raxibacumab provides a significant<br />

survival benefit in animals symptomatic <strong>for</strong> systemic anthrax disease.<br />

Raxibacumab treatment is also associated with significant <strong>and</strong> greater<br />

improvement in survival when given as pre- or post-exposure prophylaxis<br />

(preventative medicine). Raxibacumab is an important treatment option<br />

<strong>for</strong> inhalational anthrax: an effective antitoxin with a mechanism of action<br />

distinct from that of antimicrobials. Raxibacumab neutralises protective<br />

antigen, improves survival <strong>and</strong> reduces signs of the disease. When used in<br />

combination with antibiotics, raxibacumab does not interfere with antibiotic<br />

efficacy <strong>and</strong> results in a higher survival outcome than antimicrobial therapy<br />

alone. Raxibacumab used alone is also expected to provide clinical benefit<br />

<strong>for</strong> individuals in whom antibiotics are contraindicated or in whom anthrax<br />

disease is due to antibiotic-resistant strains of Bacillus anthracis.<br />

Post-Marketing Requirements after Licensure<br />

Raxibacumab was approved by the FDA <strong>for</strong> the treatment of inhalational<br />

anthrax due to Bacillus anthracis in December 2012. Its approval was based<br />

on the analysis of data from non-clinical studies, <strong>and</strong> the development<br />

of a mathematical pharmacokinetic model bridging efficacious animal<br />

exposures to safe human exposures. Based on these data, the FDA approved<br />

raxibacumab <strong>for</strong> the treatment of adult <strong>and</strong> paediatric patients<br />

with inhalational anthrax due to Bacillus anthracis, in combination with<br />

appropriate antibacterial drugs, <strong>and</strong> <strong>for</strong> prophylaxis of inhalational anthrax<br />

when alternative therapies are not available or appropriate. However, this<br />

approval requires GSK to conduct post-marketing studies, such as field studies,<br />

to verify <strong>and</strong> describe raxibacumab’s clinical benefit <strong>and</strong> to assess its safety<br />

when used as indicated, so the role of <strong>Big</strong> <strong>Data</strong> analysis is far from over. GSK<br />

has submitted a field study protocol to evaluate the effectiveness, suitable<br />

human dosage <strong>and</strong> safety of raxibacumab use <strong>for</strong> Bacillus anthracis infection<br />

in the US. This phase four, open-label study will be the first human study to<br />

collect data on Bacillus anthracis-infected or exposed patients treated with<br />

raxibacumab. It will also be the first study to gain a better underst<strong>and</strong>ing



of the clinical benefit <strong>and</strong> safety of raxibacumab in human subjects. <strong>Data</strong><br />

collected from this study (observation of adverse responses, measurement<br />

of antibody concentrations, white blood-cell counts, <strong>and</strong> so on) will further<br />

in<strong>for</strong>m patient care <strong>and</strong> treatment choices <strong>for</strong> the management of anthrax.<br />

Conclusions<br />

The analysis of <strong>Big</strong> <strong>Data</strong> played a central role throughout the experience<br />

with raxibacumab. From the determination of a requirement based on<br />

scenario-based analysis, to the establishment of a procurement objective, to<br />

the development of an animal model based on a treatment trigger <strong>and</strong> the<br />

eventual approval of raxibacumab, effective data collection, management<br />

<strong>and</strong> analysis has been essential. This vital role will continue throughout<br />

the lifespan of raxibacumab. Anthrax cases arising from natural exposure<br />

or criminal or terrorist activity may be treated with raxibacumab <strong>and</strong> new<br />

data will be collected to exp<strong>and</strong> our underst<strong>and</strong>ing of safety <strong>and</strong> efficacy in<br />

humans. If used in response to mass-casualty events, additional data will be<br />

collected on distribution <strong>and</strong> dispensing to further refine our underst<strong>and</strong>ing<br />

of logistics. Better underst<strong>and</strong>ing of the potential use of <strong>Big</strong> <strong>Data</strong> across many<br />

research areas <strong>and</strong> academic disciplines will help resolve issues surrounding<br />

the collection <strong>and</strong> use of these data in an environment in which privacy<br />

protection <strong>and</strong> public health needs are at times on opposite sides of the<br />

balance.<br />

Chia-Wei Tsai is a Project Officer in the Division of Chemical, Biological,<br />

Radiological <strong>and</strong> Nuclear (CBRN) Countermeasures. She is the project lead <strong>for</strong><br />

advanced development <strong>and</strong> acquisition of medical countermeasures in the<br />

Antitoxins <strong>and</strong> Therapeutic Biologics Branch of the CBRN Program. She is also<br />

the contracting officer representative overseeing two advanced development<br />

contracts <strong>and</strong> four procurement contracts. She also serves as the Chair <strong>for</strong> the<br />

technical evaluation panel <strong>for</strong> the CBRN antitoxin rolling Broad Agency<br />

Announcement (BAA). Dr Tsai recently received the Secretary’s Award <strong>for</strong><br />

Distinguished Service 2012 <strong>for</strong> her contribution in leading CBRN medical<br />

countermeasures through FDA approval. Prior to joining HHS, Dr Tsai served<br />

as a scientist at DynPort Vaccine Company, which supports a Department of<br />

Defense plague vaccine development programme. She also served as project<br />

lead in the Malaria Vaccine Development Branch in the National Institute of<br />

Allergy <strong>and</strong> Infectious Disease. Dr Tsai received her PhD from the University<br />

of Maryl<strong>and</strong>, College Park in Cell Biology <strong>and</strong> Molecular Genetics <strong>and</strong><br />

completed her post-doctoral training at Johns Hopkins School of Medicine in<br />

Pharmacology.


Discussion Groups<br />

During the afternoon, the conference broke up into focused discussion<br />

groups, each comprising between ten <strong>and</strong> twenty delegates. The outcomes<br />

of these discussion <strong>for</strong>a are presented over the following pages.<br />

Discussions were without attribution. The in<strong>for</strong>mation presented here<br />

seeks to represent the discussions that took place; there is not always<br />

robust academic referencing to support the views offered, but it has been<br />

assumed that if comments made by individual delegates were not credible,<br />

they would have been rejected by the other members of that group during<br />

the discussions. Views presented are there<strong>for</strong>e assumed to be broadly<br />

supported by the majority of those present. Where possible, transcripts of<br />

the discussion <strong>for</strong>a were distributed to the participants during the editing<br />

process <strong>for</strong> further comment <strong>and</strong> clarification.<br />

There was, inevitably, some crossover of subject matter <strong>and</strong> topic discussion<br />

between one group <strong>and</strong> the next, <strong>and</strong> where this occurred, comments have<br />

been amalgamated under one heading to avoid repetition.


Discussion Group 1: The Ethics <strong>and</strong> Legality of <strong>Big</strong><br />

<strong>Data</strong> Sharing<br />

Chair <strong>and</strong> Rapporteur: Edward Hawker<br />

Key Issues <strong>and</strong> Challenges<br />

• The nature of what is <strong>and</strong> is not socially acceptable, regardless of<br />

what is legal, can change over time. This can be situation-dependent<br />

<strong>and</strong> is not absolute<br />

• Individuals do not always read <strong>and</strong> consider terms <strong>and</strong> conditions that<br />

set out privacy <strong>and</strong> data-sharing obligations be<strong>for</strong>e accepting them<br />

• There are fears that data may be misused to enable discrimination<br />

against certain individuals <strong>and</strong> groups<br />

• Who should be able to look at or have access to the data? How is this<br />

determined <strong>and</strong> how can it be en<strong>for</strong>ced?<br />

This discussion group was asked to consider the ethics <strong>and</strong> legality of data<br />

sharing, <strong>and</strong> how <strong>Big</strong> <strong>Data</strong> <strong>and</strong> <strong>Big</strong> <strong>Data</strong> projects affect public perceptions.<br />

As government moves <strong>for</strong>ward on a digital agenda that is increasingly<br />

dependent on public participation <strong>and</strong> acceptance, these issues will become<br />

ever more important.<br />

Ethical Use: A Shifting Concept<br />

The group felt that a major consideration is how ethics <strong>and</strong> ethical use are<br />

defined. In the context of <strong>Big</strong> <strong>Data</strong>, ethics can include notions of privacy,<br />

anonymity, fair use <strong>and</strong> in<strong>for</strong>med consent, but perceptions of these terms can<br />

change over time. The 11 September 2001 attacks on the US fundamentally<br />

changed the paradigm <strong>and</strong> ushered in a new age of security-dominated<br />

policy <strong>and</strong> thinking, <strong>for</strong> example, but many people now see that reactive<br />

policies such as the USA PATRIOT Act 1 went too far – this is a view that was<br />

independently raised in Discussion Group 4: ‘Individual Privacy versus<br />

Community Safety’, <strong>and</strong> will be discussed in further detail there. Future<br />

events may again change public perceptions <strong>and</strong> attitudes.<br />

The group agreed that a strong component of ethics is proportionality but<br />

determining what is proportionate to any given situation is also difficult.<br />

Collecting all of the available data may ensure that nothing of key importance<br />

is missed, but may be difficult to justify as proportionate <strong>and</strong> ethical. Targeted<br />

data collection, factoring in proportionality, may be seen as a more ethical<br />

approach but risks missing in<strong>for</strong>mation that might later turn out to be of<br />

value.<br />

1. The USA PATRIOT Act: Preserving Life <strong>and</strong> Liberty, , accessed 16 June 2014.



<strong>Data</strong> Collection<br />

The participants then moved on to discussing the ethics of collecting <strong>and</strong><br />

sharing data, <strong>and</strong> how consent is requested <strong>and</strong> obtained from the public<br />

to enable their data to be shared between organisations. Members of the<br />

public generate large volumes of data every day, via social-media plat<strong>for</strong>ms,<br />

online purchases, electronic tickets such as the Oyster cards used on<br />

London’s public transport, <strong>and</strong> geo-location data on mobile phones, to name<br />

just a few examples. These data may be gathered <strong>and</strong> stored by the user’s<br />

mobile-phone operator, Internet service provider, the retail sites visited or<br />

bank used, under terms <strong>and</strong> conditions to which they have ostensibly agreed<br />

<strong>and</strong> which act as a legal agreement between the user <strong>and</strong> the company. Most people<br />

do not read these terms <strong>and</strong> conditions when they accept them, however,<br />

<strong>and</strong> so it can be argued that they do not truly underst<strong>and</strong> what they are<br />

signing up <strong>for</strong> <strong>and</strong> would not sign away so many rights if they did underst<strong>and</strong>.<br />

Companies then (rightly) claim that they are acting within the law when<br />

they share data with other organisations or even sell customer in<strong>for</strong>mation<br />

to third parties, but there are questions over how ethical such behaviour<br />

can really be considered to be. The challenge identified by many of the<br />

participants was that there is no negotiation involved – terms <strong>and</strong> conditions<br />

must either be accepted, or the user will not be able to use the service.<br />

There is not generally an option to accept some of the terms while rejecting<br />

others, or to opt into some aspects of the service without accepting all of<br />

the terms <strong>and</strong> conditions. Could academia suggest ways in which different<br />

levels of privacy settings <strong>and</strong> data-sharing agreements might be built into<br />

online systems, so that customers genuinely have a choice in whether or not<br />

to accept the terms they are offered?<br />
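One way to picture the kind of graduated consent the participants asked about is to record agreement per purpose rather than as a single accept-or-reject flag. The sketch below is illustrative only: the purpose names, the `ConsentRecord` class and the `may_share` helper are hypothetical and do not describe any existing service or any proposal made at the conference.

```python
from dataclasses import dataclass, field

# Hypothetical purposes a service might request consent for (illustrative names only).
PURPOSES = ("core_service", "analytics", "third_party_sharing", "location_tracking")


@dataclass
class ConsentRecord:
    """Consent stored per purpose instead of as one all-or-nothing agreement."""
    user_id: str
    # Assumption: using the service at all requires consent to the core purpose.
    granted: set = field(default_factory=lambda: {"core_service"})

    def grant(self, purpose: str) -> None:
        if purpose not in PURPOSES:
            raise ValueError(f"unknown purpose: {purpose}")
        self.granted.add(purpose)

    def revoke(self, purpose: str) -> None:
        # Optional purposes can be withdrawn without losing access to the service.
        if purpose == "core_service":
            raise ValueError("core_service consent is required to use the service")
        self.granted.discard(purpose)

    def allows(self, purpose: str) -> bool:
        return purpose in self.granted


def may_share(record: ConsentRecord, purpose: str) -> bool:
    """Check a proposed data use against the specific purpose the user agreed to."""
    return record.allows(purpose)


record = ConsentRecord(user_id="user-123")
record.grant("analytics")
print(may_share(record, "analytics"))            # user opted in to analytics
print(may_share(record, "third_party_sharing"))  # user never agreed to this
```

The design choice here is that every data use is checked against a named purpose, so a customer can accept the core service while declining, for example, the sale of their information to third parties.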

<strong>Data</strong> Protection<br />

Next, the group discussed who should be allowed to look at data. There was<br />

general agreement that only authorised personnel should have access, but<br />

this raised the questions of who can be considered authorised <strong>and</strong> what<br />

protection this really gives. Insiders may be authorised but still may not act<br />

ethically: the actions of Edward Snowden, who was an authorised US National<br />

<strong>Security</strong> Agency (NSA) contractor when he passed classified in<strong>for</strong>mation<br />

to the Guardian <strong>and</strong> Washington Post, were raised here. Snowden stole<br />

in<strong>for</strong>mation <strong>and</strong> then released it on the web <strong>and</strong> to journalists – <strong>and</strong> yet many<br />

commentators would consider his actions, which were illegal but highlighted<br />

widespread US government surveillance of citizens’ private communications,<br />

to be more ethical than the actions of the NSA <strong>and</strong> other government<br />

agencies. The group agreed that there is a need to monitor those who h<strong>and</strong>le<br />

the data <strong>and</strong> to make sure that they do it responsibly, as well as monitoring<br />

the data themselves; this also highlights a need to adopt corporate social<br />

responsibility procedures in relation to data h<strong>and</strong>ling in private companies.



Under the UK’s <strong>Data</strong> Protection Act, 2 members of the public have the<br />

legal right to ask data controllers what in<strong>for</strong>mation is being held on them.<br />

However, there are many situations in which the in<strong>for</strong>mation does not have<br />

to be disclosed, particularly if it compromises someone else’s privacy. The<br />

group felt that there is a lack of public knowledge about these rights <strong>and</strong><br />

the legal framework to protect user in<strong>for</strong>mation, particularly where data are<br />

collected anonymously <strong>and</strong> used to draw conclusions about individuals or<br />

groups more generally.<br />

The Changing Nature of Surveillance<br />

Participants saw the way in which UK society is subjected to surveillance,<br />

<strong>and</strong> how this has changed over the last decade, as an important ethical<br />

issue. New technologies such as smartphones are able to generate more<br />

data <strong>and</strong> more accurate in<strong>for</strong>mation than at any time previously, <strong>and</strong> generate<br />

some in<strong>for</strong>mation – such as the user’s current location – automatically. This<br />

suggests there is an ‘almost unconscious’ acceptance of surveillance by those<br />

who buy a smartphone, <strong>and</strong> also implies that people do not consider this to<br />

be surveillance in the same way they would if the government was tracking<br />

them. There is a role <strong>for</strong> academia in explaining why people’s perceptions of<br />

what is ‘snooping’ <strong>and</strong> what is not appear to differ so dramatically depending<br />

on who is collecting the data.<br />

The group felt that the willingness with which the public sign away their<br />

privacy rights online suggests that there is a balance between convenience<br />

<strong>and</strong> security which may warrant further research. Most people willingly give<br />

up some (if not all) of their security because it is convenient <strong>for</strong> them to<br />

use the service being offered without questioning what this might enable<br />

others to do with their data. Such in<strong>for</strong>mation can easily be extracted by<br />

cyber-criminals <strong>and</strong> used <strong>for</strong> illegitimate purposes, as well as by legitimate<br />

agencies, but many users do not underst<strong>and</strong> the potential dangers or the<br />

security vulnerabilities. Better education would help to ease some of the<br />

challenges, <strong>and</strong> research into how this might be delivered <strong>and</strong> accepted<br />

by the public would be of benefit: the group felt that most people do not<br />

underst<strong>and</strong> where the in<strong>for</strong>mation they share over social-media plat<strong>for</strong>ms<br />

actually goes. A compromised social-media account can give large amounts<br />

of personal in<strong>for</strong>mation to fraudsters, which can enable them to then target<br />

scams very precisely.<br />

There is a strong perceived correlation between conventional media coverage <strong>and</strong><br />

changes to social-media privacy settings: the group thought that people do change<br />

their security settings when privacy breaches are reported in the media. This<br />

highlights the power that the media has to shape perceptions of privacy <strong>and</strong><br />

security, but none of the group – at the conference or subsequently – were<br />

2. <strong>Data</strong> Protection Act 1998, ,<br />

accessed 16 June 2014.



able to provide any proof or reference to studies that show that people do<br />

actually change their behaviour in these circumstances. Further research on<br />

both the perception of how behaviour is affected <strong>and</strong> how it is in fact<br />

affected is needed.<br />

The group also felt that it was important to consider the fact that much of<br />

the in<strong>for</strong>mation that can be derived from social-media communication is<br />

in the <strong>for</strong>m of metadata (the context rather than the content of the data,<br />

such as when a message was sent, <strong>and</strong> where it was sent from). This raises<br />

additional challenges <strong>for</strong> privacy <strong>and</strong> ethics: if it is possible to see who<br />

has spoken to whom <strong>and</strong> when, without looking at precisely what they<br />

said, there are different levels of privacy that may need to be considered<br />

separately. This raised a number of questions, including: where are the<br />

boundaries of consent? Does the ethics of a situation change depending on<br />

whether all in<strong>for</strong>mation is freely available, or only the metadata? How would<br />

this affect a message posted in the private areas of a message <strong>for</strong>um, or sent<br />

as a private e-mail which is then <strong>for</strong>warded to people beyond the original<br />

intended recipient?<br />
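The distinction between content and metadata can be made concrete with a small sketch. It assumes a hypothetical message record with `sender`, `recipient`, `sent_at` and `body` fields (none of these names come from the discussion) and shows that who contacted whom, and how often, remains visible even after the content has been discarded.

```python
from collections import Counter
from datetime import datetime

# Hypothetical messages: "body" is the content, every other field is metadata.
messages = [
    {"sender": "alice", "recipient": "bob",
     "sent_at": datetime(2014, 3, 1, 9, 0), "body": "first private text"},
    {"sender": "alice", "recipient": "bob",
     "sent_at": datetime(2014, 3, 1, 23, 30), "body": "second private text"},
    {"sender": "bob", "recipient": "carol",
     "sent_at": datetime(2014, 3, 2, 8, 15), "body": "third private text"},
]


def strip_content(message):
    """Return only the metadata of a message, dropping what was actually said."""
    return {key: value for key, value in message.items() if key != "body"}


def contact_counts(metadata_records):
    """Count how often each sender contacted each recipient, using metadata alone."""
    return Counter((m["sender"], m["recipient"]) for m in metadata_records)


metadata = [strip_content(m) for m in messages]
counts = contact_counts(metadata)
for (sender, recipient), n in counts.items():
    print(f"{sender} -> {recipient}: {n} message(s)")
```

Even this toy example suggests why the group treated metadata as a separate privacy question: the pattern of contact is recoverable without access to a single message body.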

In<strong>for</strong>mation Requests<br />

A further ethical issue surrounded the extent to which it is acceptable <strong>for</strong> the<br />

private sector to pass in<strong>for</strong>mation to the government when the latter requests<br />

it. Companies such as Facebook <strong>and</strong> Google have the choice of whether or<br />

not to comply with such requests, <strong>and</strong> the ethics of this can be complicated<br />

depending on which government is asking <strong>for</strong> the data. Academics such as<br />

Baker <strong>and</strong> Tang have explored these issues in more depth. 3 If a company fails<br />

to comply with a government request they may receive a court order <strong>for</strong>cing<br />

them to do so. This may make them more likely to comply with the first<br />

request out of convenience, opening them up to criticism <strong>for</strong> being too ready<br />

to ‘cosy up’ to government <strong>and</strong> <strong>for</strong> not being protective enough of their<br />

customers’ data. There are a number of ethical dilemmas <strong>for</strong> companies in<br />

this context. Their concern with customer focus <strong>and</strong> public image may make<br />

them less likely to comply with requests they think will lead to negative public<br />

backlash, <strong>for</strong> example. It is not their job (nor necessarily in their interests) to<br />

capture or highlight potential terrorists or criminal activities. Nevertheless,<br />

private companies could <strong>and</strong> should take a more proactive stance against<br />

criminal activity that might be detected by looking <strong>for</strong> it more closely within<br />

the data they hold. The group acknowledged that banks in particular have<br />

become more proactive recently, especially in relation to money laundering.<br />

Negative sanctions such as those h<strong>and</strong>ed out to HSBC in relation to failing<br />

3. Jane Stuart Baker <strong>and</strong> Lu Tang, ‘Google’s Dilemma in China’, in Steve May (ed.), Case<br />

Studies in Organizational Communication: Ethical Perspectives <strong>and</strong> Practices, 2nd ed.<br />

(Chapel Hill: Sage, 2012), , accessed 17 June 2014.



to maintain effective anti-money-laundering programmes, as much as corporate<br />

social responsibility, have played a large role in this shift in behaviour. 4<br />

Predictive Analytics<br />

A final area the group discussed was predictive analytics – analysing data to<br />

predict how people may behave in future. This has the potential to prevent<br />

crimes <strong>and</strong> enhance public safety, but is ethically contentious. Concerns were<br />

raised that in<strong>for</strong>mation gained on individuals may be used to stereotype entire<br />

groups. The application of ideas such as the ‘broken windows theory’, 5 which<br />

states that communities where low-level crime is endemic are predisposed<br />

to more serious crime, is not universally accepted. While proponents of<br />

predictive analytics would claim that identifying <strong>and</strong> tackling low-level crime<br />

will help to prevent more serious misdemeanours (<strong>and</strong> indeed, when Police<br />

Commissioner William Bratton applied the theory to turnstile jumpers on<br />

the New York subway in the early 1990s, crimes of all kinds on the transport<br />

system decreased), critics express concern that such approaches can lead<br />

to negative categorisations of some members of society. Academia could,<br />

however, help to analyse data <strong>and</strong> identify potential trends which could<br />

then be explored in more detail through a multidisciplinary approach<br />

involving behavioural psychologists, sociologists <strong>and</strong> criminologists, as well<br />

as computer scientists.<br />

Suggested Research Topics<br />

• An in-depth examination is needed of public underst<strong>and</strong>ing of the<br />

surveillance <strong>and</strong> privacy debate, to provide recommendations that<br />

will encourage more people to engage in shaping future policy<br />

• Academic research can help to explain why people’s perceptions of<br />

what is ‘snooping’ <strong>and</strong> what is not appear to differ so dramatically<br />

depending on who is collecting the data<br />

• Research is needed into how to educate people not to willingly give<br />

up data without questioning what this might enable others to do with<br />

those data. Many users do not underst<strong>and</strong> the potential dangers or<br />

the security vulnerabilities<br />

• Academia should suggest ways in which different levels of privacy<br />

settings <strong>and</strong> data-sharing agreements can be built into online systems,<br />

so that customers genuinely have a choice in whether or not to accept<br />

the terms they are offered.<br />

4. ‘HSBC Holdings Plc. <strong>and</strong> HSBC Bank USA N.A. Admit to Anti-Money Laundering <strong>and</strong><br />

Sanctions Violations, Forfeit $1.256 Billion in Deferred Prosecution Agreement,<br />

Department of Justice Office of Public Affairs’, 11 December 2012, , accessed 17 June 2014.<br />

5. James Q Wilson <strong>and</strong> George L Kelling, ‘The Police <strong>and</strong> Community Safety: Broken<br />

Windows’, Manhattan Institute, 1982, , accessed 19 August 2014.


Discussion Group 2: Policing, Terrorism, Crime<br />

<strong>and</strong> Fraud<br />

Chair: David Smart<br />

Rapporteur: Philippa Morrell<br />

Key Issues <strong>and</strong> Challenges<br />

• <strong>Data</strong> analysis needs to begin somewhere <strong>and</strong> follow the best direction.<br />

How can the most appropriate leads to follow be identified?<br />

• Missing data, where data should be expected, might provide as much<br />

in<strong>for</strong>mation as the analysis of data themselves. How can such gaps be<br />

identified <strong>and</strong> analysed?<br />

• Is the apparent lack of skilled security data analysts due to a genuine<br />

skills gap or better career prospects <strong>and</strong> higher pay in other sectors<br />

<strong>for</strong> people with the appropriate skills?<br />

• How can the benefits of <strong>Big</strong> <strong>Data</strong> be measured <strong>and</strong> proven, to avoid<br />

it becoming a sink <strong>for</strong> valuable resources that would be better used<br />

elsewhere?<br />

Since the 11 September 2001 terrorist attacks on the US, there has been greater<br />

collaboration between police <strong>for</strong>ces, security services <strong>and</strong> governments at<br />

the international <strong>and</strong> national levels. Links between terrorism <strong>and</strong> crime –<br />

<strong>and</strong>, in particular, acquisitive crimes such as fraud <strong>and</strong> money laundering<br />

– have been identified, studied <strong>and</strong> researched extensively. This discussion<br />

group considered how new approaches to the data available might help to<br />

improve the quantity <strong>and</strong> quality of the linkages that are being made. The<br />

group also discussed whether there are other sources of data, not currently<br />

analysed <strong>for</strong> this purpose, which might also yield valuable in<strong>for</strong>mation <strong>and</strong><br />

intelligence.<br />

A key point to consider is that effective analysis of data depends on having a<br />

lead (or leads) in the first place, which helps to identify what the analyst – or<br />

the analytical tool – is looking <strong>for</strong> within the data. These starting points can<br />

then be developed to provide further insights or in<strong>for</strong>mation. A lack of data<br />

where they might be expected can, however, be just as important a lead, but<br />

is much more difficult to spot. A particularly useful research focus, there<strong>for</strong>e,<br />

may be to look at developing new methods of data analysis to identify the<br />

‘unknown unknowns’, described by one member of the group as ‘the Holy<br />

Grail of intelligence work’, by spotting anomalies within the more routine data<br />

that could then be subjected to further analysis <strong>and</strong> investigation. The group<br />

thought that, at present, gaps in the data are not sufficiently recognised as<br />

such, nor effectively interpreted.
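As a minimal illustration of treating missing data as a lead, the sketch below assumes a hypothetical source that is expected to produce one record per day and lists the days on which nothing was observed. The `find_gaps` function and the sample dates are invented for illustration; real intelligence data would rarely be this regular, and deciding what counts as "expected" is the hard part the group identified.

```python
from datetime import date, timedelta


def find_gaps(observed_dates, start, end):
    """Return the days in [start, end] on which a record was expected but not seen."""
    observed = set(observed_dates)
    total_days = (end - start).days + 1
    expected = (start + timedelta(days=offset) for offset in range(total_days))
    return [day for day in expected if day not in observed]


# Hypothetical activity log for a source assumed to report every day.
observed = [date(2014, 3, 1), date(2014, 3, 2), date(2014, 3, 5)]
gaps = find_gaps(observed, date(2014, 3, 1), date(2014, 3, 5))
for day in gaps:
    print("no record on", day.isoformat())
```

A gap found this way is only a prompt for further investigation; as the group noted, a human analyst is still needed to judge whether the absence is significant.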



The group also felt that while identifying phenomena such as linkages in the<br />

data can be relatively easy to do automatically, it is much harder to attribute<br />

significance (or the quality of the significance) to those links without a human<br />

analyst involved in the process. A good underst<strong>and</strong>ing of what the linkages<br />

mean is needed in order <strong>for</strong> them to be given value <strong>and</strong> <strong>for</strong> any consequent<br />

interventions to be effective. The group felt that more research is needed on<br />

the relationship between qualitative (human) <strong>and</strong> quantitative (automatic)<br />

analysis in making effective interpretations.<br />

Regional Differences<br />

Some of the delegates felt that once technological solutions are available,<br />

there is a tendency to apply the same technology everywhere in order to<br />

promote st<strong>and</strong>ardisation <strong>and</strong> interoperability, but this risks introducing<br />

a ‘one-size-fits-all’ approach that is not equally applicable to all situations<br />

or regions. The group felt that this was important as local initiatives <strong>for</strong><br />

countering extremism <strong>and</strong> radicalisation within at-risk communities often<br />

work because of local conditions at a particular point in time; what works<br />

in one place is not automatically transferable to another location or even<br />

repeatable within the same community. Good policy is often derived from<br />

historic experiences – <strong>for</strong> instance, if a particular approach has worked in<br />

one area of the country, or on one operation, there may be a push <strong>for</strong> it to<br />

become st<strong>and</strong>ard policy to use it in others, but this has not always proven<br />

successful. The subtleties <strong>and</strong> reasons that programmes have been successful<br />

may not be easily extractable from data, but may be more apparent to a<br />

human analyst.<br />

There have been a number of short-term solutions in recent times, but<br />

the group felt that there needs to be a longer-term view, with a greater<br />

emphasis on determining what is well understood <strong>and</strong> what is not. For<br />

example, between two <strong>and</strong> five years after a specific event, retroactive data<br />

analysis could help to assess impacts <strong>and</strong> changes that have occurred in the<br />

intervening period, <strong>and</strong> perhaps identify which have been the most effective<br />

responses, so that they can be focused on more strongly. There is also a need<br />

to ensure that all available data are considered: the most effective responses<br />

may not have been the official ones. For such an approach to work, however,<br />

there needs to be a high level of detail <strong>and</strong> the quality of the data has to be<br />

assured – quantity alone is not a guarantee of the results, nor that the data<br />

gathered will be useful.<br />

Research to evaluate vigorously what has <strong>and</strong> has not worked well may<br />

help best practice to be identified <strong>and</strong> also prevent mistakes from being<br />

repeated. Better ways to identify the ‘starting state’ are also needed, so that<br />

there is a baseline against which success or failure can be measured. This<br />

will help to determine how success or failure will be evaluated (which the<br />

group felt is often lacking at present) <strong>and</strong> may provide better underst<strong>and</strong>ing


of how multiple concurrent interventions can be evaluated separately <strong>and</strong><br />

individually to test which are most effective.<br />

In order <strong>for</strong> data to be useful, they need to be multidimensional. This requires<br />

robust methodology behind how the dimensions are set, <strong>and</strong> thus what data<br />

should be collected. The most valuable data do not always relate to what is<br />

happening ‘on the ground’; they may be more subtle – such as where stolen<br />

credit cards are being used, rather than where they are being stolen – <strong>and</strong><br />

this in turn may involve comparison at the local, national <strong>and</strong> international<br />

levels. A credit card stolen in one area or country could be used in another to<br />

order goods from a company based in a third, which are delivered to a fourth<br />

location. The data sets analysed need to be relevant to the answers required,<br />

but the ability to integrate many different data sets, <strong>and</strong> to do this much<br />

faster <strong>and</strong> more efficiently, does not by itself provide the cultural context.<br />

Implications of Real-Time Analysis<br />

<strong>Big</strong> data provide new timeframes <strong>for</strong> the collection <strong>and</strong> analysis of<br />

in<strong>for</strong>mation, enabling real-time processing <strong>and</strong> real-time updates, as well as<br />

data collection over long periods of time. Real-time processing, combined<br />

with analysts who have an underst<strong>and</strong>ing of the context in which the data<br />

have been collected, can enable ‘non-normal’ trends to be picked out<br />

more easily. The use of hard <strong>and</strong> soft intelligence combined with <strong>Big</strong> <strong>Data</strong><br />

might enable the detection of those hiding in plain sight, <strong>for</strong> example, using<br />

predictive analytics against the norm that looks at deviant behaviour at an<br />

individual level rather than at the community level.<br />
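The difference between measuring deviation at an individual level and at a community level can be sketched as follows. This is a minimal example under stated assumptions: the activity counts are invented, the labels `A`, `B` and `C` are placeholders, and the threshold of three standard deviations is an arbitrary choice rather than anything proposed by the group.

```python
from statistics import mean, stdev

def z_score(value, history):
    """Standard score of value against past values (history needs >= 2 points)."""
    spread = stdev(history)
    if spread == 0:
        return 0.0 if value == mean(history) else float("inf")
    return (value - mean(history)) / spread

# Hypothetical weekly activity counts for three individuals.
histories = {
    "A": [2, 3, 2, 3, 2],
    "B": [20, 22, 19, 21, 20],
    "C": [10, 11, 9, 10, 10],
}
latest = {"A": 12, "B": 21, "C": 10}
THRESHOLD = 3  # assumed cut-off, in standard deviations

# Community norm: every past value pooled together.
community_history = [v for h in histories.values() for v in h]

# Compare each latest value with the individual's own past, then with the pool.
individual_flags = {k: abs(z_score(latest[k], histories[k])) > THRESHOLD
                    for k in histories}
community_flags = {k: abs(z_score(latest[k], community_history)) > THRESHOLD
                   for k in histories}
print(individual_flags, community_flags)
```

In this invented data set the change for `A` stands out only against A's own history; pooled with everyone else it falls well inside the community range, which is the sense in which a deviation can hide in plain sight.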

More research is needed to determine the circumstances in which <strong>Big</strong> <strong>Data</strong><br />

is likely to be most relevant – at the tactical level, or at a more operational<br />

or strategic level to enable decision-making. What analytical methods are<br />

required to underst<strong>and</strong> the data being created <strong>and</strong> do these differ depending<br />

on what outcome the analysis is aiming <strong>for</strong>? The group also questioned<br />

whether an overemphasis on the potential benefits of Big Data makes it a sink for resources – and whether there is proof that it adds real benefits.

Useful <strong>Data</strong> Extraction<br />

When collected <strong>and</strong> applied appropriately, <strong>Big</strong> <strong>Data</strong> can combine <strong>and</strong> share<br />

many sources of data <strong>and</strong> in<strong>for</strong>mation to provide pertinent intelligence. <strong>Big</strong><br />

<strong>Data</strong> enables the use of multiple sources of data <strong>and</strong> provides an ability to<br />

filter these data in new <strong>and</strong> innovative ways. It is also dependent on the<br />

ability to strip out unnecessary and extraneous data. Data weeded out by the filters can still be analysed, but their removal from the main data sets raises questions over data quality. This can be a particular issue where data

are collected centrally <strong>for</strong> use by different organisations, as each needs a<br />

different portion of the complete data set: the point(s) at which data are<br />

removed will have a strong influence on what the remainder can be used <strong>for</strong>,


as in<strong>for</strong>mation that is extraneous to one organisation may be highly insightful<br />

to another. In addition, if data are going to be shared across a number of<br />

organisations, underlying definitions of how <strong>and</strong> with whom the data are<br />

intended to be shared will need to be agreed <strong>and</strong> understood by all parties<br />

<strong>for</strong> the process to work seamlessly.<br />

In any data analysis, it is important to know who <strong>and</strong> what is being looked<br />

at to ensure that the cultural aspects are considered <strong>and</strong> understood. For<br />

instance, one participant offered an example of data that treat the British<br />

Kashmiri and Punjabi populations as the same. Both are defined either as Muslims or, at a more granular level, by the language they speak (Punjabi),

whereas there are very distinct historical <strong>and</strong> cultural differences between<br />

the two groups. If the data cannot distinguish between them, it will not be<br />

possible to highlight a trend within one group that is not present in the other.<br />

A further consideration is that when analysing any data there is a danger<br />

of pre-supposition of what the data may represent (or what the collector<br />

wants them to represent) <strong>and</strong> a misuse of them because of this. The actual<br />

underlying causal relationship is overlooked or ignored. An example here is<br />

a recorded increase in the number of neonatal deaths in Japan following<br />

the Fukushima Dai-ichi nuclear power plant accident: 1 the increase appears<br />

to point to radiation exposure causing the deaths, but was in fact due to<br />

increased maternal stress following the evacuation of homes damaged in<br />

the earthquake and tsunami, leading to an increase in premature births,

combined with damage to hospitals, which meant that neonatal care was<br />

compromised. The neonatal deaths <strong>and</strong> the damaged power station shared<br />

the same root cause – the earthquake <strong>and</strong> tsunami – a different relationship<br />

from what one interpretation of the data might suggest. Comparing<br />

neonatal death rates closer to Fukushima with those further away, in areas<br />

less affected by the nuclear power station, but which had suffered similar<br />

earthquake damage, would help to provide more accurate data analysis.<br />

Human analysis can also dismiss links that are relevant, however – one of<br />

the group gave an example in which one of the men subsequently involved<br />

with plotting the 7 July 2005 bombings on the London Underground had<br />

previously shown up in police analysis of a terrorist network but had been<br />

dismissed from further analysis as he was ‘only’ involved in financial crime,

<strong>and</strong> so was assumed to be an ‘ordinary’ criminal who just coincidentally<br />

overlapped with the terrorist network.<br />

1. Alfred Korblein, ‘Infant Mortality in Japan after Fukushima’, December 2012, , accessed 19 July 2014.

It is important to have ‘checks’ that guard against this: techniques are available that enable an initial filtering of data which can then be revisited, so that data can be analysed in line with presuppositions and then remodelled with the excluded data reintegrated, in order to see whether and how the results differ, thus testing the validity of initial assumptions. Humans inevitably look for patterns and may find them where they do not exist.
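The revisit-and-remodel check described above can be sketched as a simple comparison: compute a result on the filtered data, recompute it with the excluded records reintegrated, and look at the difference. The helper name `compare_with_reintegration`, the record layout and the figures are assumptions made purely for illustration.

```python
from statistics import mean

def compare_with_reintegration(records, keep, metric):
    """Return (result on filtered data, result on all data, difference)."""
    kept = [r for r in records if keep(r)]
    filtered_result = metric(kept)
    full_result = metric(records)
    return filtered_result, full_result, full_result - filtered_result

# Hypothetical records: (category assigned by the initial filter, amount).
records = [("fraud", 900), ("fraud", 1100), ("other", 100), ("other", 300)]

filtered, full, shift = compare_with_reintegration(
    records,
    keep=lambda r: r[0] == "fraud",                       # the presupposition
    metric=lambda rs: mean(amount for _, amount in rs),   # the result of interest
)
print(filtered, full, shift)
```

A large shift suggests that the conclusion depends heavily on the initial filter and that the excluded records deserve a second look; a small one gives some reassurance that the assumption did not drive the result.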

Social-Network Analysis<br />

There is much potential <strong>for</strong> <strong>Big</strong> <strong>Data</strong> to help identify linkages when used in<br />

conjunction with social network analysis – the mapping <strong>and</strong> measuring of<br />

relationships <strong>and</strong> flows between people, groups, organisations, computers,<br />

URLs <strong>and</strong> other connected in<strong>for</strong>mation. 2 Identifying the key nodes of a<br />

network through social-network analysis will help to generate further<br />

leads. The group felt that this tends to work better with regard to lower<br />

level operatives, but it can also lead to the very top of the network – the<br />

classic example being the role social network analysis played in the capture<br />

of Saddam Hussein. 3 Individuals found via social-network analysis are often<br />

known to law en<strong>for</strong>cement from other operations that may or may not be<br />

considered to be relevant. Amalgamating data sets from many different<br />

operations may help to identify linkages by providing an overview that

will help to put the social-network analysis in context but, to return to the<br />

issue of automated versus human analysis discussed earlier in this paper,<br />

social network analysis is likely to throw up many low-level links, only some<br />

of which are relevant. It may take strong cultural underst<strong>and</strong>ing <strong>and</strong> human<br />

analysis to decide which of the many options are worth pursuing.<br />
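One common way of ranking candidate key nodes is degree centrality, the share of other nodes to which a node is directly linked. The sketch below uses it only as an example; the discussion does not specify a measure, and the contact pairs and labels `P1` to `P5` are invented.

```python
from collections import defaultdict

def degree_centrality(edges):
    """Fraction of the other nodes each node is directly linked to."""
    neighbours = defaultdict(set)
    for a, b in edges:
        if a != b:  # ignore self-links
            neighbours[a].add(b)
            neighbours[b].add(a)
    n = len(neighbours)
    if n < 2:
        return {node: 0.0 for node in neighbours}
    return {node: len(adj) / (n - 1) for node, adj in neighbours.items()}

# Hypothetical contact pairs drawn from several amalgamated data sets.
edges = [("P1", "P2"), ("P1", "P3"), ("P1", "P4"), ("P4", "P5")]
scores = degree_centrality(edges)

# Highest score first; ties broken alphabetically for a stable order.
ranked = sorted(scores, key=lambda node: (-scores[node], node))
print(ranked)
```

A ranking of this kind only orders the links for attention. Deciding which of the low-scoring links matter is the step that the group expected to require cultural understanding and human analysis.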

<strong>Data</strong> Lessons from the Private Sector<br />

The group discussed whether <strong>and</strong> to what extent the commercial sector’s<br />

experiences with <strong>Big</strong> <strong>Data</strong> can be applied to national security. For example,<br />

commercial approaches enable companies to market particular products to<br />

particular individuals based on their previous buying behaviour <strong>and</strong>, while<br />

it may not be immediately obvious that this could aid counter-terrorism or<br />

serious organised crime operations, there is potential benefit in being able to<br />

analyse someone’s previous behaviour in order to predict <strong>and</strong> influence their<br />

future behaviour. The group expressed concern that <strong>for</strong> such an approach<br />

to work, high numbers of skilled human analysts need to be directed at the<br />

available data sets, <strong>and</strong> need to be able to underst<strong>and</strong> additional background<br />

information and context. There is currently a shortage of appropriately skilled security analysts, but it is not clear whether this is a real skills shortage or a funding issue: whether there are too few people with the appropriate skills, or whether those who have these skills are being attracted to commercial-sector marketing jobs rather than public-sector security work because of the salaries offered.

2. Org.net, ‘Social Network Analysis, A Brief Introduction’, , accessed 23 April 2014.

3. ‘Case Study: The Capture of Saddam Hussein, War 2.0: The National Security and the Science of Networks’, , accessed 19 August 2014.

Finding out how national security can benefit from the experiences of the<br />

commercial sector will be a continuous learning experience; a multidisciplinary<br />

approach is needed involving social scientists as well as computer experts.<br />

Summary<br />

The analysis of large amounts of data <strong>and</strong> diverse data streams needs to<br />

be multidisciplinary <strong>and</strong> multidimensional. <strong>Big</strong> data can enable real-time<br />

processing of large amounts of in<strong>for</strong>mation, but <strong>for</strong> this to be of value in the<br />

policing, terrorism, crime <strong>and</strong> fraud arenas, a better underst<strong>and</strong>ing is needed<br />

of the value that can be added by the data being collected <strong>and</strong> analysed, along<br />

with more analysis of precisely what this value is <strong>and</strong> how it is added. The<br />

financial sector, in particular, needs to guard against complete automation<br />

in the detection of anomalies: there is no replacement <strong>for</strong> human analysis.<br />

Suggested Research Topics<br />

• More research is needed into how data analysis (<strong>and</strong> data analysts)<br />

can identify <strong>and</strong> interpret missing data <strong>and</strong> data on deviations from<br />

the expected norm. Real-time processing, combined with analysts<br />

who have an underst<strong>and</strong>ing of the context in which the data have<br />

been collected, will help such trends to be picked out more easily <strong>and</strong><br />

interpreted appropriately. Predictive analytics against the norm are<br />

needed, which can look at deviant behaviour at an individual level, as<br />

well as at the wider community level<br />

• A better underst<strong>and</strong>ing is required of how to link data to underlying<br />

causes, along with methodology that can guard against the negative<br />

influence of supposition. Techniques need to be developed that<br />

remodel data sets with excluded or removed data reintegrated, so<br />

that results can be compared <strong>and</strong> differences analysed in order to test<br />

the validity of the initial assumptions<br />

• Better research is needed into ways to remodel data <strong>and</strong> test<br />

assumptions so that a more detailed picture can be built of how data<br />

reflect assumptions. This may help to identify whether some leads<br />

are currently being missed because of inherent biases in the way the<br />

data are approached.


Discussion Group 3: Health <strong>Data</strong>, Public Health<br />

<strong>and</strong> Public Health Emergencies<br />

Chair: Chris Watkins<br />

Key Issues <strong>and</strong> Challenges<br />

• The public requires honesty <strong>and</strong> transparency about why health data<br />

are being collected <strong>and</strong> what they will be used <strong>for</strong>. Good communication<br />

is essential <strong>for</strong> building trust in health databases <strong>and</strong> data sets<br />

• Allowing some degree of personal choice over what data are stored,<br />

who they might be shared with <strong>and</strong> in what situations will help<br />

individuals feel in control of their data. This may be important in<br />

gaining acceptance for new public health data strategies; however,

the number of people who choose to sign up to the NHS Organ Donor<br />

Register (ODR), as an example, has been disappointingly low<br />

• The public’s concern over possible discrimination resulting from<br />

health in<strong>for</strong>mation kept on individuals, <strong>and</strong> of the potential misuse<br />

of data, appears to be creating a negative impression of large health<br />

data projects. This needs to be addressed be<strong>for</strong>e health in<strong>for</strong>mation<br />

possibilities can be fully realised.<br />

This group discussed the ways in which big data might be used to positive<br />

effect in public health, both in general <strong>and</strong> during public health emergencies. 1<br />

The main benefits the group identified include opportunities to improve<br />

surveillance so that outbreaks of disease are detected more quickly <strong>and</strong> more<br />

accurately, particularly by developing opportunities <strong>for</strong> self-reporting through<br />

social-media <strong>and</strong> mobile communications plat<strong>for</strong>ms <strong>and</strong> by improving the<br />

dissemination of in<strong>for</strong>mation during the response to an outbreak.<br />

The group felt that the systems which exist <strong>and</strong> are widely used prior to a<br />

health emergency, but that have the capacity to also cope during the crisis,<br />

are likely to be more useful than entirely novel systems that only come into<br />

play in extreme situations. One valuable area of research, there<strong>for</strong>e, would<br />

be to consider how easily existing systems could morph from ‘business as<br />

usual’ to ‘public health emergency’ conditions, <strong>and</strong> what additional features<br />

or functionalities might need to be added to the normal system to enable<br />

this. For instance: how easily could current surveillance systems cope with<br />

significantly increased data traffic, or more frequent updating? How rapidly<br />

can different surveillance systems be aggregated together <strong>and</strong> analysed?<br />

1. A public health emergency of international concern is defined by the World Health<br />

Organization as ‘an extraordinary event [that] constitute[s] a public health risk to other<br />

States through the international spread of disease, or that potentially requires an<br />

international response’, , accessed 19<br />

July 2014.


Once such technological challenges are met, the group felt that there are a<br />

number of ways in which data collected over social-media plat<strong>for</strong>ms in particular<br />

can help to improve public health <strong>and</strong> the response(s) to a health emergency. In<br />

some cases, studies are already available. During the 2009–10 H1N1 influenza<br />

outbreak, 2 <strong>for</strong> example, the NHS set up an online service to enable a more<br />

effective distribution of the antiviral drug Tamiflu, which was in short supply.

Self-diagnosis and self-reporting systems that help to collect data on the number and location of symptoms are technologically relatively easy to set up – such as an app that would list symptoms and allow someone to check which ones they have, report them to the NHS and receive advice on whether further consultation with a local pharmacist or their GP is needed – but their efficiency and accuracy are affected by people’s

willingness to engage. This, in turn, would be influenced by the nature of<br />

the disease. Self-diagnosis <strong>and</strong> reporting systems could collect accurate data<br />

on sexually transmitted infections (STIs) <strong>for</strong> example, helping to locate ‘hotspots’<br />

of new outbreaks, while social-network analysis might help to trace<br />

social contacts from whom the STI might have been caught or who are in<br />

danger of having it passed on to them, but whether or not people would<br />

be willing to engage in such self-reporting is a different matter. Apps that<br />

have proven effective in providing in<strong>for</strong>mation on seasonal influenza may<br />

not have been suitable technology to engage during the early days of the<br />

AIDS epidemic, had such technology been available.<br />

Privacy Challenges<br />

Using <strong>Big</strong> <strong>Data</strong> <strong>for</strong> public health benefit is more complex than just the<br />

technological development. There have been several challenges to public<br />

acceptance of large health-related data projects to date, generally over<br />

concerns around privacy, which seem to be particularly acute with regard to<br />

personal health in<strong>for</strong>mation. There has been considerable public <strong>and</strong> media<br />

backlash to projects such as the NHS <strong>Data</strong>spine, 3 which intended to provide<br />

a central repository <strong>for</strong> in<strong>for</strong>mation on more than 70 million patients from<br />

27,000 individual organisations, <strong>and</strong> its successor, the Health <strong>and</strong> Social Care<br />

In<strong>for</strong>mation Centre, 4 which was set up in April 2013 under the Health <strong>and</strong><br />

Social Care Act 2012, to ‘collect, analyse <strong>and</strong> present UK national health<br />

<strong>and</strong> social care data’. Protest groups such as GeneWatchUK 5 <strong>and</strong> Privacy<br />

International 6 have raised issues around the mere existence of such a database, including objections to the ease with which it can be accessed, who should be allowed access to it for research or surveillance purposes, and what rights patients should have to access and amend their information.

2. NHS, ‘Swine Flu’, , accessed 19 July 2014.

3. Computer Weekly, ‘NHS Data Spine out of Action for 28 Hours in a Week’, 10 January 2006, , accessed 19 July 2014.

4. See , accessed 2 June 2014.

5. See , accessed 2 June 2014.

6. See , accessed 2 June 2014.

Health professionals <strong>and</strong> policy-makers see the potential benefits of such<br />

a database, which will enable the best available treatments to be targeted<br />

towards individuals based on their health profile <strong>and</strong> potentially even their<br />

genome, as far outweighing the potential <strong>for</strong> misuse, but the UK government<br />

nonetheless needs to ensure that privacy safeguards are in place <strong>and</strong> that<br />

these safeguards are clearly communicated to (<strong>and</strong> trusted by) the general<br />

public in order to ensure public <strong>and</strong> media acceptance of such programmes.<br />

The group felt that poor communication of the benefits of such systems<br />

is most likely to be at the heart of the public <strong>and</strong> media backlash, but<br />

also acknowledged that public trust in large government data projects is<br />

undermined by perceptions that government does not act responsibly<br />

with public data. Such perceptions have been rein<strong>for</strong>ced by the Snowden<br />

revelations published in the Guardian <strong>and</strong> Washington Post since June<br />

2013, in which <strong>for</strong>mer US National <strong>Security</strong> Agency (NSA) contractor Edward<br />

Snowden revealed that the NSA had collected <strong>and</strong> stored large volumes of<br />

Internet communications by private citizens between 2007 <strong>and</strong> 2013 – in<br />

collaboration with a number of private sector companies such as Google <strong>and</strong><br />

Amazon, <strong>and</strong> other national government agencies including the UK’s GCHQ<br />

– under a strategy of collecting everything, <strong>and</strong> only then analysing it to<br />

reveal criminal or terrorist activities, rather than collecting communications<br />

only where there was reason to suspect those communications may contain<br />

something untoward. The revelations have caused public <strong>and</strong> media outrage,<br />

even though in most cases collection <strong>and</strong> sharing had been possible because<br />

the users of plat<strong>for</strong>ms such as Facebook, Yahoo <strong>and</strong> Amazon did not change<br />

default privacy settings that would have prevented their personal in<strong>for</strong>mation<br />

from being shared in this way. Prior to the Snowden revelations, most users<br />

considered the benefits of data sharing on Facebook, including the amount<br />

of data that can be shared <strong>and</strong> the number of people it can reach, to heavily<br />

outweigh the negatives (<strong>and</strong> in practice they still do – there is little concrete<br />

evidence that a significant proportion of individuals are genuinely changing<br />

their behaviour with regard to privacy settings as a result of Snowden).<br />

Fear of Discrimination<br />

Negative attitudes towards <strong>Big</strong> <strong>Data</strong> health projects are largely driven by<br />

fear that the in<strong>for</strong>mation contained in the data will enable the state, private<br />

companies or other agents to discriminate against individuals or certain groups.<br />

For example, certain data might enable insurance companies to discriminate<br />

against individuals with high-risk profiles <strong>and</strong> thus affect the cost of insurance<br />

premiums. An individual with a genetic predisposition to cancer may find it<br />

difficult to take out life insurance or a long-term loan, such as a mortgage.<br />

This is in spite of counter-arguments that awareness of their genetic make-up


may make them more likely to follow a healthy lifestyle that avoids known<br />

triggers to the genetic condition, such as smoking, <strong>and</strong> to attend regular<br />

medical screenings that will pick up emerging conditions early <strong>and</strong> enable<br />

more effective treatment, which may even increase their life expectancy.<br />

Fears include perceptions that if the health-care sector had access to large data<br />

sets <strong>and</strong> was able to share this in<strong>for</strong>mation with other organisations, this might<br />

enable health insurance companies to profile customers to determine whether<br />

they should be given insurance or not. For example, when a customer signs up to<br />

a supermarket reward card, they often accept terms <strong>and</strong> conditions that enable<br />

their buying habits to be shared with a number of third-party organisations,<br />

mostly <strong>for</strong> marketing purposes. The in<strong>for</strong>mation supermarkets collect about<br />

people’s buying habits can be used to make inferences about their lifestyle <strong>and</strong><br />

this could in turn be analysed to make predictions about their likely long-term<br />

health. If supermarkets were to share this in<strong>for</strong>mation with the health sector, it<br />

may enable positive health interventions to be targeted towards communities<br />

or even individuals, <strong>and</strong> to improve planning on what health-care services might<br />

be needed in future, but a potential downside might be that health insurance<br />

companies use the in<strong>for</strong>mation to determine whether a potential customer is a<br />

heavy consumer of alcohol, or whether their diet is particularly unhealthy, <strong>and</strong><br />

approve or deny insurance based on this personal in<strong>for</strong>mation.<br />

Building Trust in Health <strong>Data</strong> Projects<br />

The group felt that with regard to examples such as the one given above, more<br />

research is needed into what uses people will accept or object to regarding their<br />

personal data, <strong>and</strong> in what circumstances. This would help to develop a better<br />

underst<strong>and</strong>ing of how the way in which people are in<strong>for</strong>med of data collection<br />

(including who is doing the collection, <strong>for</strong> what reason, <strong>and</strong> what the data are<br />

likely to be used <strong>for</strong>) determines how they will react to it. There may be very<br />

different reactions from people depending on whether the data are being<br />

collected by the government or by pharmaceutical companies, <strong>and</strong> this may not<br />

be consistent from one country to the next. In Germany, <strong>for</strong> example, where the<br />

legacy of Nazi rule has made the population cautious of allowing government<br />

collection of personal in<strong>for</strong>mation, most health data are collected by private<br />

health companies rather than the state, <strong>and</strong> there are local, federal <strong>and</strong> state<br />

differences in the legal structures surrounding data protection <strong>and</strong> privacy.<br />

Cultural <strong>and</strong> historical factors can strongly affect public perceptions surrounding<br />

the in<strong>for</strong>mation being collected <strong>and</strong> can affect how successful data collection is.<br />

The plat<strong>for</strong>m(s) through which in<strong>for</strong>mation is collected <strong>and</strong> how it is stored are<br />

also important to consider. The group felt that it is unclear under current EU<br />

law whether in<strong>for</strong>mation stored on a mobile device such as a smart phone<br />

is classed as broadcast in<strong>for</strong>mation or personal in<strong>for</strong>mation, <strong>and</strong> there<strong>for</strong>e<br />

whether or not it is protected by privacy laws; more research is needed on<br />

how existing legislation relates to rapid technological advances.



The discussion group felt that with regard to health-related big data projects<br />

such as the single NHS database, allowing personal choice through an ‘opt-in’<br />

system may be the best approach to building trust. The NHS Organ Donor Register (ODR) 7 could be<br />

used as a model, though it is worth noting that only around half of the UK<br />

population (54 per cent of women <strong>and</strong> 46 per cent of men) have opted in.<br />

While this amounts to many millions of individuals, there are also many<br />

millions who do not opt in. A survey was carried out on behalf of NHS Blood<br />

<strong>and</strong> Transplant in 2013 to find out why people do not sign up to the Register. 8<br />

The results are shown in the box below.<br />

Box 1: Reasons why people choose not to opt in to the NHS’s Organ Donor<br />

Register.<br />

30% are aware of the ODR but do not know or have confused underst<strong>and</strong>ing of<br />

what it is<br />

16% are not aware of the ODR or are unable to say what it is<br />

16% say they do not want to think about their death<br />

15% say they worry that their family might be upset if they donated their organs<br />

12% say they worry that they could still be alive when the operation is carried out<br />

11% say they do not want to donate to someone who does not deserve it<br />

10% believe they are too old.<br />

(respondents were allowed to give more than one reason)<br />

Where opt-in to data collection is required, offering even minor incentives<br />

(which can be as simple as free access to the service, even if the company<br />

offering the services makes a profit from the individuals signing up)<br />

encourages individuals to sign up. For example, the market research company<br />

YouGov offers points <strong>for</strong> signing up <strong>and</strong> participating in its surveys, which can<br />

be redeemed <strong>for</strong> financial remuneration <strong>and</strong> other rewards. 9 Equally, making<br />

it difficult or inconvenient to opt out can also ‘nudge’ behaviour in a certain<br />

7. NHS, ‘Organ Donation, How to Register’, , accessed 3 June 2014.<br />

8. Figures provided on 10 June 2014 by NHS Blood <strong>and</strong> Transplant, from a commissioned<br />

market research report carried out by Optimisa in 2013.<br />

9. YouGov, ‘Join the YouGov Panel Today!’, ,<br />

accessed 2 June 2014.



direction. Facebook <strong>and</strong> most other social-networking sites set low default privacy<br />

settings, as the companies want to be able to collect as much in<strong>for</strong>mation<br />

on individuals as possible. A conscious decision has to be made to change<br />

to higher privacy settings (though this is starting to change following the<br />

backlash over the Snowden revelations). As the easiest <strong>and</strong> least inconvenient course of<br />

action <strong>for</strong> a new user is to accept the low privacy settings, most people tend to<br />

do so. With regard to health data projects, the group felt that opt-in systems<br />

would be more likely to generate trust if they began with very high privacy<br />

settings, <strong>and</strong> allowed users various levels of opt-out if they were happy <strong>for</strong><br />

their in<strong>for</strong>mation to be shared with other government departments, research<br />

institutions or private-sector companies.<br />
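The idea of starting from the highest privacy setting can be pictured as a small configuration sketch. The Python below is purely illustrative: the class name, the three recipient categories and the helper function are assumptions made for this example, loosely following the recipients named in the paragraph above, and do not describe any actual NHS system.

```python
from dataclasses import dataclass, asdict


@dataclass
class SharingPreferences:
    """Hypothetical consent record: every sharing option starts switched off."""
    government_departments: bool = False
    research_institutions: bool = False
    private_sector: bool = False


def permitted_recipients(prefs):
    """Return only the recipient categories the person has explicitly enabled."""
    return [name for name, allowed in asdict(prefs).items() if allowed]


# A new user shares with nobody until they make an active choice.
default_prefs = SharingPreferences()
# A user who is happy to support research enables that one category only.
research_only = SharingPreferences(research_institutions=True)
```

Under this assumed design, doing nothing leaves the data unshared, so inertia works in favour of privacy rather than against it.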

Targeted <strong>Data</strong> Collection<br />

The group felt that an important consideration in the collection of health data<br />

– <strong>and</strong>, in fact, any data – is that the collection must be targeted to ensure that<br />

collection is necessary <strong>and</strong> efficient. <strong>Data</strong> should not be collected <strong>for</strong> the sake<br />

of collecting data. There are important distinctions here between targeted<br />

analysis (which knows what is being looked <strong>for</strong> <strong>and</strong> seeks actively to find it)<br />

<strong>and</strong> general pattern analysis (the recognition of patterns <strong>and</strong> regularities in<br />

data, without necessarily being aware of what those patterns mean without<br />

further analysis), though both are useful in determining dynamics within a<br />

data set. For example, general pattern analysis in public health may pick up a<br />

sudden increase in the quantities of influenza drugs being prescribed, <strong>and</strong> the<br />

location(s) in which the prescriptions are being made, which might indicate<br />

<strong>and</strong> identify the beginning of a new p<strong>and</strong>emic, while targeted analysis might<br />

then aim to track family members <strong>and</strong> colleagues of the individuals showing<br />

symptoms, so that they can be tested <strong>and</strong> offered preventative treatments<br />

be<strong>for</strong>e symptoms develop. <strong>Data</strong> subjected to general pattern analysis<br />

should be anonymised (as the patterns will emerge whether the data are<br />

anonymised or not), whereas it is very difficult to carry out targeted analysis<br />

on anonymised data. This raises questions around how <strong>and</strong> at what stage<br />

data are anonymised – the collection of only anonymised data may make<br />

meaningful interpretation or practical action difficult at a later stage, but if<br />

the public is not convinced that data are being appropriately anonymised,<br />

they may be unwilling to provide data in the first place.<br />
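The difference between the two kinds of analysis, and the stage at which identities are needed, can be sketched in a few lines of Python. This is a minimal illustration only: the prescription records, the salt, the doubling threshold and all function names are hypothetical assumptions, and salted hashing is strictly pseudonymisation, which is weaker than full anonymisation.

```python
import hashlib
from collections import Counter

# Hypothetical prescription records: (patient name, location, week number).
RECORDS = [
    ("alice", "Leeds", 1), ("bob", "Leeds", 2), ("carol", "Leeds", 2),
    ("dave", "Leeds", 2), ("erin", "York", 1), ("frank", "York", 2),
]


def pseudonymise(records, salt):
    """Replace names with salted hash tokens; return the records and a key table."""
    key_table = {}
    pseudonymised = []
    for name, location, week in records:
        token = hashlib.sha256((salt + name).encode()).hexdigest()[:12]
        key_table[token] = name  # assumed to be held separately under stricter control
        pseudonymised.append((token, location, week))
    return pseudonymised, key_table


def find_spikes(pseudonymised, factor=2.0):
    """General pattern analysis: flag (location, week) pairs whose count is at
    least `factor` times the previous week's count. No identities are needed."""
    counts = Counter((location, week) for _, location, week in pseudonymised)
    spikes = []
    for (location, week), n in sorted(counts.items()):
        previous = counts.get((location, week - 1), 0)
        if previous and n >= factor * previous:
            spikes.append((location, week))
    return spikes


def reidentify(pseudonymised, key_table, location, week):
    """Targeted analysis: map tokens back to names for one flagged location and week."""
    return sorted(
        key_table[token]
        for token, loc, wk in pseudonymised
        if loc == location and wk == week
    )


anonymous, key_table = pseudonymise(RECORDS, salt="illustrative-salt")
spikes = find_spikes(anonymous)
```

In this sketch the spike in Leeds is found without reading a single name, while following up the individuals concerned requires the key table. Who holds that table, and when it may be used, is exactly the question raised above.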

Summary <strong>and</strong> Conclusions<br />

<strong>Big</strong> <strong>Data</strong> offers enormous potential benefits to health care, including<br />

improved surveillance so that outbreaks of disease are detected more<br />

quickly <strong>and</strong> accurately (particularly by developing opportunities <strong>for</strong> self-reporting<br />

through social media <strong>and</strong> mobile communications plat<strong>for</strong>ms)<br />

<strong>and</strong> improved dissemination of in<strong>for</strong>mation during the response to<br />

an outbreak. While the technology exists to enable this, in most cases the<br />

more pressing challenges surround people’s unwillingness to engage. Public<br />

acceptance of large health-related data projects has met several challenges to



date, generally over concerns around privacy, which seem to be particularly<br />

acute with regard to personal health in<strong>for</strong>mation.<br />

Communication of the benefits of such systems needs to be improved to<br />

ensure public trust in large government data projects is not undermined<br />

by perceptions that government is not capable of acting responsibly with<br />

public data. In particular, there are fears that if the health-care sector has<br />

access to large data sets, <strong>and</strong> is able to share this in<strong>for</strong>mation with other<br />

organisations, this may enable health insurance companies <strong>and</strong> other<br />

private-sector companies to profile customers <strong>and</strong> discriminate against<br />

them. These fears need to be addressed, <strong>and</strong> better underst<strong>and</strong>ing is needed<br />

on how in<strong>for</strong>mation regarding such data collection projects can be best<br />

communicated.<br />

An important consideration in the collection of health data – <strong>and</strong> in fact any<br />

data – is that the collection must be targeted to ensure that collection is<br />

necessary <strong>and</strong> efficient. In order to win (<strong>and</strong> maintain) public confidence <strong>and</strong><br />

trust, data should not be collected <strong>for</strong> the sake of collecting data.<br />

Suggested Research Topics<br />

• Research is needed on how easily existing data collection <strong>and</strong><br />

surveillance systems could morph from ‘business as usual’ to<br />

‘public health emergency’ conditions, <strong>and</strong> what additional features<br />

or functionality are needed to enable this. How easily can current<br />

surveillance systems cope with significantly increased data traffic,<br />

or more frequent updating? How rapidly can different surveillance<br />

systems be aggregated together <strong>and</strong> analysed?<br />

• Further research is needed into what uses people will accept or object<br />

to regarding their personal health data, <strong>and</strong> in what circumstances.<br />

This would help to develop a better underst<strong>and</strong>ing of how the way in<br />

which people are in<strong>for</strong>med of data collection (including who is doing<br />

the collection, <strong>for</strong> what reason, <strong>and</strong> what the data are likely to be used<br />

<strong>for</strong>) will influence how they react to it.<br />

• Research is needed into how best to encourage people to opt in<br />

to data collection schemes, possibly by offering minor incentives<br />

that encourage individuals to sign up. The volume of health-care<br />

data available makes them ideal <strong>for</strong> subjection to general pattern<br />

analysis. Data analysed in this way should be anonymised (as the patterns will emerge<br />

whether the data are anonymised or not), but the analysis may<br />

reveal where targeted interventions will have maximum impact; this<br />

requires the data to be linked to the individuals who will receive<br />

that intervention. Research is needed on how <strong>and</strong> at what stage<br />

data should be anonymised <strong>and</strong> de-anonymised during the analysis<br />

process.



Box 2: Social media <strong>and</strong> health emergencies.<br />

As well as being a potential tool <strong>for</strong> surveillance <strong>and</strong> data collection, social<br />

media offers opportunities to influence the affected population to take up (or<br />

refrain from) certain behaviours during health emergencies, including enabling<br />

discussions to take place over whether or not suggested behaviours should be<br />

followed. In Asia, a legacy of the 2002–03 outbreak of Severe Acute Respiratory<br />

Syndrome (SARS) is that many people now wear a protective face mask when<br />

they have a cold, or to prevent them from catching a cold. Such masks are not as<br />

popular in Europe, however, <strong>and</strong> there is considerable disagreement over their<br />

usefulness in restricting the spread of infections. Social-media plat<strong>for</strong>ms could<br />

be used to promote discussion on whether wearing masks is useful or not,<br />

<strong>and</strong> where the medical community agrees that a behaviour is beneficial, social<br />

media could be used to disseminate advice. It could also be used to provide<br />

updates on the spread of infections (such as the number <strong>and</strong> location of cases)<br />

<strong>and</strong> predictions on where the disease is likely to spread next.<br />

Community Engagement<br />

During a p<strong>and</strong>emic or other serious disease outbreak, social media could<br />

also be used to raise <strong>and</strong> support community organisations <strong>and</strong> organise<br />

volunteers to carry out activities beneficial to the community, such as<br />

organising shopping collection <strong>and</strong> delivery <strong>for</strong> those infected, so that they<br />

do not need to leave the house, <strong>and</strong> organising regular cleaning of lifts,<br />

staircases <strong>and</strong> other shared areas in blocks of flats. An example of such a<br />

scheme is ‘FluFriends’, 10 which during the H1N1 p<strong>and</strong>emic encouraged people<br />

to arrange who would collect antiviral drugs <strong>for</strong> them should they<br />

become infected, who would do their shopping, <strong>and</strong> who they could phone<br />

regularly to say how they were feeling. More recently, FloodVolunteers 11<br />

has enabled individuals to come together <strong>and</strong> request assistance or offer<br />

expertise <strong>and</strong> skills to help those affected by flooding. The technology<br />

needed to co-ordinate community ef<strong>for</strong>ts during a crisis is simple, but the<br />

behavioural factors that will determine whether or not people sign up to<br />

such community networks <strong>and</strong> actively engage with them are more complex.<br />

10. Margaret Lally, ‘Flu Friends – a Possible Alternative’, British Red Cross, ,<br />

accessed 2 June 2014.<br />

11. FloodVolunteers, , accessed 3 June 2014.


Discussion Group 4: Individual Privacy Versus<br />

Community Safety<br />

Chair <strong>and</strong> Rapporteur: Jennifer Cole<br />

Key Issues <strong>and</strong> Challenges<br />

• There are situations in which individual privacy <strong>and</strong> community<br />

safety may be in direct conflict, as ensuring community safety may be<br />

dependent on intruding on personal privacy.<br />

• Most of the public appear to expect higher levels of data protection<br />

<strong>and</strong> assurance from the public sector than they do from the private<br />

sector; they accept supermarkets in particular gathering, storing<br />

<strong>and</strong> sharing large amounts of in<strong>for</strong>mation on them, but object when<br />

government wants to do the same.<br />

• Most surveillance legislation in the UK came into <strong>for</strong>ce be<strong>for</strong>e social-media<br />

plat<strong>for</strong>ms were widespread, resulting in confusion over how the<br />

current laws relate to this <strong>for</strong>m of communication, <strong>and</strong> in particular<br />

over what constitutes private communication on social media.<br />

The aim of this discussion group was to consider situations in which<br />

requirements <strong>for</strong> individual privacy <strong>and</strong> community safety might come into<br />

conflict with data collection <strong>and</strong> sharing, <strong>and</strong> how academic research might<br />

help to provide a better underst<strong>and</strong>ing of, or solutions to, this dilemma.<br />

The group was provided with examples to consider, such as the police<br />

<strong>and</strong> security services conducting surveillance that might be considered an<br />

invasion of privacy on individuals who pose (or are suspected of posing) a<br />

danger to the wider community because of extreme views or suspected links<br />

to terrorist networks. To what extent does the need to protect society from<br />

such individuals justify monitoring not only the individuals themselves, but<br />

also their social networks, including friends, family members <strong>and</strong> colleagues?<br />

Another example presented was of health services tracking the movements of an<br />

individual suffering from (or suspected to be suffering from) an infectious<br />

disease that might spread to others, or monitoring their recent social media<br />

communications to actively identify people who might have been in contact<br />

with them so that they can be contacted <strong>for</strong> treatment.<br />

The group was asked to discuss whether legislation around such surveillance,<br />

data collection <strong>and</strong> data sharing should be absolute: equally applicable<br />

to a gang of youths suspected of anti-social activity, such as graffiti, as to<br />

a suspected terrorist network, <strong>for</strong> example, or to a routine disease<br />

outbreak involving mild symptoms as to a p<strong>and</strong>emic of life-threatening severity.<br />

The group was also asked to discuss how political <strong>and</strong> public opinion shapes<br />

surveillance <strong>and</strong> data legislation, how attitudes change over time, <strong>and</strong> what



the drivers of change are likely to be. Are there specific drivers or qualifiers<br />

that enable new legislation to be introduced <strong>and</strong> accepted?<br />

The 11 September 2001 terrorist attacks on the US created an atmosphere<br />

of fear that has led to greater social acceptance of surveillance when this is<br />

explained as being <strong>for</strong> security purposes. The US PATRIOT Act was given as an<br />

example of a measure that has curtailed liberties in favour of security. 1 The<br />

group discussed what impact legislation such as this has on privacy, <strong>and</strong> at<br />

what point the balance between security <strong>and</strong> privacy might be considered<br />

to have tipped too far towards security. In times of great need, rules may<br />

change, but how <strong>and</strong> through what processes this should be allowed <strong>and</strong><br />

accepted are not currently understood.<br />

Determining the Risk Threshold <strong>for</strong> Collection<br />

The conference was held less than a year after the Snowden releases, at a<br />

time when new revelations about US <strong>and</strong> UK government actions under the<br />

controversial PRISM programme were coming to light each month. 2 Privacy<br />

<strong>and</strong> surveillance issues were there<strong>for</strong>e fresh in participants’ minds, <strong>and</strong> the<br />

relationship between them was still a very controversial issue. This applied<br />

particularly to widespread surveillance programmes that collect large volumes<br />

of data against relatively low-risk thresholds – in other words, surveillance<br />

programmes that lean far more towards security than privacy. A key point,<br />

the group felt, was to determine what level of activity, or connection to a<br />

network under surveillance, justified placing that individual or community<br />

under surveillance themselves. Set the bar higher <strong>for</strong> privacy, <strong>and</strong> the risk<br />

is that individuals who are involved will not be identified; set the bar higher<br />

<strong>for</strong> security, <strong>and</strong> the authorities will be accused of ‘snooping’ on innocent<br />

people. The group recognised that this is a dilemma <strong>for</strong> governments to<br />

which there is no easy answer, but that academia could help by researching<br />

attitudes to privacy <strong>and</strong> surveillance <strong>and</strong> identifying what decisions are more<br />

likely to be accepted, <strong>and</strong> why.<br />

The group broadly agreed with US President Barack Obama’s declaration that<br />

it is impossible to have ‘100 percent security <strong>and</strong> then have 100 percent<br />

privacy’, 3 <strong>and</strong> acknowledged that the actions of the UK government with<br />

regard to PRISM have been legal <strong>and</strong> based around existing legislation. There<br />

were concerns, however, that such surveillance was only legal because of the<br />

way in which the public seemed happy to sign away privacy rights in the terms<br />

1. The US PATRIOT Act: Preserving Life <strong>and</strong> Liberty, , accessed 16 June 2014.<br />

2. For a full explanation of PRISM, see Leon Kelion, ‘Q&A: NSA’s Prism Internet<br />

Surveillance Scheme’, BBC News, 1 July 2013, , accessed 16 June 2014.<br />

3. Paul Adams, ‘Barack Obama Defends US Surveillance Tactics’, BBC News, 8 June 2013,<br />

, accessed 16 June 2014.



<strong>and</strong> conditions they accept when signing up to social-media plat<strong>for</strong>ms such<br />

as Facebook, <strong>and</strong> because most surveillance <strong>and</strong> data assurance regulation<br />

includes clauses stating, or is worded in such a way, that individuals’ rights<br />

to privacy can often be overridden in situations that are loosely defined<br />

in legislation using terms such as ‘public safety’, <strong>and</strong> ‘immediate danger’,<br />

without expressing what would constitute such a situation. Such wording<br />

can there<strong>for</strong>e be argued to amount to ‘get-out clauses’ that enable privacy<br />

to be overridden whenever the government decides.<br />

Article 8.1 of the UK Human Rights Act 1998, 4 which came into <strong>for</strong>ce in<br />

October 2000, states that ‘everyone has the right to respect <strong>for</strong> his private<br />

<strong>and</strong> family life, his home <strong>and</strong> his correspondence’. This is a qualified right,<br />

however, which can be overruled by Article 8.2, which states: 5<br />

There shall be no interference by a public authority with the exercise of<br />

this right except such as is in accordance with the law <strong>and</strong> is necessary<br />

in a democratic society in the interests of national security, public<br />

safety or the economic well-being of the country, <strong>for</strong> the prevention<br />

of disorder or crime, <strong>for</strong> the protection of health or morals, or <strong>for</strong> the<br />

protection of the rights <strong>and</strong> freedoms of others.<br />

In other words, perceived danger to the wider community or society trumps<br />

the right to privacy of the individual.<br />

A second key piece of UK legislation affecting this debate is the Regulation of<br />

Investigatory Powers Act (RIPA) 2000, 6 which was introduced to modernise<br />

laws relating to the interception of communications in order to protect the<br />

public adequately from terrorism, cyber-crime <strong>and</strong> online paedophilia, <strong>and</strong><br />

has attracted criticism from civil rights <strong>and</strong> privacy campaigners such as<br />

Liberty 7 <strong>and</strong> The Open Rights Group, 8 which refer to it as ‘The Snooper’s<br />

Charter’. Section 26(2)(c) states that the Act ‘allows covert surveillance where<br />

there is “immediate danger”’, including directed surveillance undertaken <strong>for</strong><br />

the purposes of a specific investigation or operation. This would appear to<br />

allow surveillance that would actively seek out <strong>and</strong> identify individuals whose<br />

4. Human Rights Act 1998, Article 8, Right to Respect <strong>for</strong> Private <strong>and</strong> Family Life, , accessed 16 June 2014.<br />

5. Human Rights Act 1998, Schedule 1, Article 8, Right to Respect <strong>for</strong> Private <strong>and</strong> Family Life,<br />

, accessed 19 July 2014.<br />

6. Regulation of Investigatory Powers Act 2000, , accessed 16 June 2014.<br />

7. Liberty, ‘State Surveillance’, , accessed 16 June 2014.<br />

8. Digital Surveillance, ‘Why the Snooper’s Charter is the Wrong Approach: A Call <strong>for</strong><br />

Targeted <strong>and</strong> Accountable Investigatory Powers’, The Open Rights Group, , accessed<br />

16 June 2014.



private correspondence suggested that they may be involved in activities<br />

that pose a threat, or potential threat, to wider society; though, again, the<br />

exact terms of this threat are not specified.<br />

While RIPA dictates what in<strong>for</strong>mation can be sought out by the state during<br />

exceptional circumstances, the <strong>Data</strong> Protection Act 1998 9 (in particular the<br />

emergency powers set out under Schedule 2 a, c <strong>and</strong> d) determines to what<br />

extent in<strong>for</strong>mation that might normally be protected can be shared; clauses<br />

exclude protection where the processing is ‘necessary’ <strong>and</strong> ‘<strong>for</strong> the exercise<br />

of any functions of either House of Parliament’. The Act is primarily concerned<br />

with protecting confidentiality <strong>and</strong> imposes a duty on organisations to<br />

ensure that data are used only <strong>for</strong> authorised purposes <strong>and</strong> are properly<br />

protected (HSC/99/012). In addition, the <strong>Data</strong> Protection Act enables sharing<br />

where ‘the data subject has given his consent to the processing’, which has<br />

been hotly debated with regard to how the terms <strong>and</strong> conditions on social-media<br />

plat<strong>for</strong>ms are accepted by users when they join, <strong>and</strong> whether this can<br />

genuinely be interpreted as in<strong>for</strong>med consent. Since the conference, at least<br />

one court case has challenged the right of Facebook to pass in<strong>for</strong>mation to<br />

the US security agencies. 10 The discussion group also noted that the laws<br />

relating to surveillance, data collection <strong>and</strong> sharing came into <strong>for</strong>ce be<strong>for</strong>e<br />

social media plat<strong>for</strong>ms such as Twitter <strong>and</strong> Facebook, which communicate<br />

one-to-many, existed. To what extent social-media communications over<br />

such plat<strong>for</strong>ms constitute ‘private communication’, <strong>and</strong> how this can be<br />

interpreted under such legislation, still needs considerable debate, to which<br />

academia is well placed to contribute.<br />

Responsibilities of Government<br />

The discussion group agreed that protection of the many – be this a particular<br />

community or society as a whole – is paramount <strong>for</strong> government. Where<br />

there is conflict between this <strong>and</strong> the right of an individual to privacy, the<br />

group felt that it is right <strong>for</strong> the government to focus on the protection of<br />

the many. Nevertheless, there still need to be clearly set boundaries <strong>and</strong><br />

a response that is scalable to the actual risk posed. Academics can help<br />

to define these boundaries <strong>and</strong> to qualify the risk(s). The group discussed<br />

the degree to which individuals should be monitored at different levels of<br />

suspected or known involvement in activity, <strong>and</strong> agreed that rather than<br />

being absolute, this should probably differ depending on what that activity is.<br />

For example, it may be (more) acceptable to begin surveillance on the entire<br />

9. <strong>Data</strong> Protection Act 1998, ,<br />

accessed 16 June 2014.<br />

10. Mary Carolan, ‘Facebook <strong>Data</strong> Transfer Interfered With Privacy Daily, Court Told’,<br />

Irish Times, 30 April 2014, , accessed<br />

16 June 2014; Europe vs Facebook, ,<br />

accessed 16 June 2014.



social network of a suspected terrorist from the moment that individual is<br />

suspected of involvement in terrorist activity, but less acceptable to monitor<br />

all social contacts of a group of youths involved in anti-social but reasonably<br />

harmless graffitiing from such an early stage.<br />

The group felt that different threats, such as minor crime <strong>and</strong> terrorism,<br />

warrant different approaches, <strong>and</strong> there<strong>for</strong>e more research is needed into<br />

how the harm caused by certain activities is defined <strong>and</strong> measured, with<br />

better underst<strong>and</strong>ing needed of the extent to which some relatively low-harm<br />

activities overlap with more serious ones. For example, if a definite<br />

link can be identified between graffiti <strong>and</strong> terrorism – in the same way that<br />

definite links have now been identified between financial crime <strong>and</strong> terrorism<br />

– this would help to justify carrying out surveillance on individuals who are<br />

not directly linked to more serious activities but may well be the link on a<br />

social network, or between two social networks, that will lead to the more<br />

serious threats. Academic research could help to highlight which activities<br />

appear to have definite links <strong>and</strong> which do not.<br />

The Privacy Narrative<br />

Some members of the group felt that a small but disproportionately vocal<br />

privacy lobby turns public opinion against surveillance that<br />

only disadvantages those who are breaking the law or who have something<br />

to hide. On this view, innocent citizens have nothing to fear from government surveillance<br />

of their activities <strong>and</strong> so should not mind if such surveillance goes on. It can<br />

be argued that the advantages to a law-abiding citizen of having all personal<br />

communications <strong>and</strong> personal data collected <strong>and</strong> stored <strong>for</strong> potential analysis<br />

so strongly outweigh any perceived disadvantages that no one should object<br />

to it. Explaining this in such a way that people more easily see the benefits<br />

would act as a strong counter to the images conjured up by privacy lobbyists<br />

of sinister government agents spying on innocent citizens’ private lives <strong>and</strong><br />

somehow causing them harm by doing so. A suggested academic research<br />

project that might help to counter such anti-surveillance narratives was a<br />

retrospective look at IRA terrorism and the social networks of IRA terrorists.<br />

Could modern social network analysis 11 be applied retrospectively to<br />

historical case studies of IRA terrorism to illustrate how such technology<br />

might have been used to construct social networks, identify<br />

other potential terrorists and prevent IRA attacks before they happened?<br />

The group felt that such historical revisiting of past terrorism networks could<br />

help to illustrate the value of storing data on individuals who may appear to<br />

be (or, in fact, actually be) innocent, but who can lead investigators to more<br />

dangerous individuals.<br />

11. Org.net, ‘Social Network Analysis, A Brief Introduction’, , accessed 23 April 2014.


Differing Perceptions of Public and Private Data Collection<br />

The group noted the very different attitudes to privacy with regard to the public<br />

and private sectors, and also recognised that private-sector companies react<br />

to public opinion just as much as politicians do, but are often able to change<br />

their procedures and policy more rapidly. Public opinion has a great effect on<br />

supermarkets, for example, which invest hugely in understanding customer<br />

psychology. To maintain customer trust (and therefore custom) they have<br />

to be seen to be acting responsibly with customers’ data. There is still not<br />

enough understanding of why customers readily sign up for supermarket<br />

reward cards, often signing away most of their data protection rights in the<br />

process, when very little information is given on where the information they<br />

provide will be going or how it will be used. This raised questions within<br />

the group as to why it appears to be less acceptable to the general public<br />

for certain government departments to share information as readily as the<br />

private sector does. The group had little doubt that Virgin Health is likely to<br />

share customers’ information with Virgin Media, but automatic transfer of<br />

information from one government department to another is treated with<br />

suspicion by the public, suggesting that the public and private sectors are<br />

perceived very differently. Academic research could help to pinpoint what<br />

these differences are.<br />

One explanation that was offered was that the public readily understand<br />

that supermarkets (and other commercial entities) are motivated by selling<br />

more goods, and are therefore using customers’ data to target advertising at them.<br />

In general, the public do not object to this as they may well be pleased to<br />

be alerted to new products they may like. The problem with government data<br />

collection was that the benefits are often less apparent, leaving the<br />

public with the perception that the government is trying to catch them out in<br />

some way – to check if they are paying enough tax, or to make sure they are<br />

not claiming benefits to which they are not entitled, for example. There are<br />

therefore few easily understood advantages to the individual from government<br />

use of such data, but many disadvantages. Better communication of the<br />

benefits, and perhaps even some obvious rewards, would encourage<br />

engagement with data projects. A suggestion was made that signing up<br />

to NHS health databases should enable individuals to receive discounts<br />

on prescriptions, or to be prioritised on hospital waiting lists above those<br />

who have not signed up, but this was challenged by other members of the<br />

group as unethical, and considerable research into the<br />

implications would be needed before it could be considered.<br />

The International Community<br />

Finally, the group discussed who constitutes the ‘community’ whose security<br />

might benefit from compromises on personal privacy: the community<br />

as a whole – that is, everyone – or just the ruling elite. A major issue in<br />

compromises to personal privacy was the fear of a fascist state using the data<br />

collected on its citizens to discriminate against them or actively harm them.


This leads to different attitudes towards data collection by the state that are<br />

influenced by local and national histories: countries that have been subjected<br />

to harsh regimes in the past may be more cautious about handing over<br />

their data in future. One delegate gave an example of an Eastern European<br />

nation setting up its equivalent of the National Archives in a building that<br />

had previously been the headquarters of its secret police. Though there was<br />

no suggestion that the new archive had any sinister intention behind it, the<br />

public’s willingness to engage with it was clouded by the past associations<br />

of the building in which it was housed. Different cultures react differently to<br />

the sharing of historical records, and while British citizens expect the state to<br />

be subservient to them, this view is not held to the same degree across the<br />

entire European Union. Taking this into account, the relationship between<br />

the community and the individual may differ between regions and nations,<br />

raising new challenges where the community is international, and data<br />

may be shared across borders. Again, the group felt that<br />

academia could help to build understanding of these differences, and map<br />

cultural attitudes that may help to predict acceptance of, or resistance to,<br />

the creation of new databases and international data sharing.<br />

Suggested Research Topics<br />

• Academics could try to identify links between different criminal or<br />

anti-social behaviours to help justify carrying out surveillance on<br />

individuals who are not directly linked to more serious activities but<br />

may well be one or two links away on overlapping social networks and<br />

who could help lead to the more serious threats. Academic research<br />

could help to highlight which activities appear to have definite links<br />

with one another and which do not<br />

• Where there is conflict between the need to protect the community<br />

and the right of an individual to privacy, academia can help to<br />

determine where the boundaries should lie and suggest how to<br />

develop a response that is scalable to the actual risk posed. In<br />

particular, academia can help to determine qualitative and quantitative<br />

measurements of the risks<br />

• Historical revisiting of known terrorist and criminal networks using<br />

modern data analysis techniques, and using these to highlight how<br />

such information may have helped to prevent terrorist attacks or<br />

disrupt activity earlier, might help to communicate the benefits<br />

of large-scale surveillance programmes and explain how and why<br />

sacrificing some privacy for security is beneficial.


Research Themes Identified in the<br />

Discussion Groups


Discussion Group 1: Legality and Ethics of Data Sharing<br />

• An in-depth examination is needed of public understanding of the<br />

surveillance and privacy debate, to provide recommendations that<br />

will encourage more people to engage in shaping future policy<br />

• Academic research can help to explain why people’s perceptions of<br />

what is ‘snooping’ and what is not appear to differ so dramatically<br />

depending on who is collecting the data<br />

• Research is needed into how to educate people not to willingly give<br />

up data without questioning what this might enable others to do with<br />

those data. Many users do not understand the potential dangers or<br />

the security vulnerabilities<br />

• Academia should suggest ways in which different levels of privacy<br />

settings and data sharing agreements can be built into online systems,<br />

so that customers genuinely have a choice in whether or not to accept<br />

the terms they are offered.<br />

Discussion Group 2: Policing, Terrorism, Crime and Fraud<br />

• More research is needed into how data analysis (and data analysts)<br />

can identify and interpret missing data and data on deviations from<br />

the expected norm. Real-time processing, combined with analysts<br />

who have an understanding of the context in which the data have<br />

been collected, will help such trends to be picked out more easily and<br />

interpreted appropriately. Predictive analytics against the norm are<br />

needed, which can look at deviant behaviour at an individual level, as<br />

well as at the wider community level<br />

• A better understanding is required of how to link data to underlying<br />

causes, along with methodology that can guard against the negative<br />

influence of supposition. Techniques need to be developed that<br />

remodel datasets with excluded or removed data reintegrated, so<br />

that results can be compared and differences analysed in order to<br />

test the validity of initial assumptions<br />

• Better research is needed into ways to remodel data and test<br />

assumptions so that a more detailed picture can be built of how data<br />

reflect assumptions. This may help to identify whether some leads<br />

are currently being missed because of inherent biases in the way the<br />

data are approached.<br />

Discussion Group 3: Health Data, Public Health and Public Health<br />

Emergencies<br />

• Research is needed into how easily existing data collection and<br />

surveillance systems could morph from ‘business as usual’ to


‘public health emergency’ conditions, and what additional features<br />

or functionality are needed to enable this. How easily can current<br />

surveillance systems cope with significantly increased data traffic,<br />

or more frequent updating? How rapidly can different surveillance<br />

systems be aggregated and analysed?<br />

• Further research is needed into what uses people will accept or object<br />

to regarding their personal health data, and in what circumstances.<br />

This would help to develop a better understanding of how the way in<br />

which people are informed of data collection (including who is doing<br />

the collection, for what reason, and what the data are likely to be used<br />

for) will influence how they will react to it<br />

• Research is needed into how best to encourage people to opt in to<br />

data collection schemes, possibly by offering minor incentives that<br />

encourage individuals to sign up<br />

• The volume of health-care data available makes them ideal for<br />

general pattern analysis. The data should be anonymised for this purpose (as<br />

the patterns will emerge whether the data are anonymised or not)<br />

but the analysis may reveal where targeted interventions will have<br />

maximum impact; this requires the data to be linked to the individuals<br />

who will receive that intervention. Research is needed on how and at<br />

what stage data should be anonymised and de-anonymised during<br />

the analysis process.<br />

Discussion Group 4: Individual Privacy versus Community Safety<br />

• Academics could try to identify links between different criminal or<br />

antisocial behaviours to help justify carrying out surveillance on<br />

individuals who are not directly linked to more serious activities but<br />

may well be one or two links away on overlapping social networks,<br />

and who can help lead to the more serious threats. Academic research<br />

could help to highlight which activities appear to have definite links<br />

with one another and which do not<br />

• Where there is conflict between the need to protect the community<br />

and the right of an individual to privacy, academia can help to<br />

determine where the boundaries should be and suggest how to<br />

develop a response that is scalable to the actual risk posed. In<br />

particular, academia can help to determine qualitative and quantitative<br />

measurements of the risks<br />

• Historical revisiting of known terrorist and criminal networks using<br />

modern data analysis techniques, and using these to highlight how<br />

such information may have helped to prevent terrorist attacks or<br />

disrupt activity earlier, might help to communicate the benefits<br />

of large-scale surveillance programmes and explain how and why<br />

sacrificing some privacy for security is beneficial.
