Big Data for Security and Resilience

Challenges and Opportunities for the Next 

Generation of Policy-Makers 

Proceedings of the Conference ‘Big Data for Security and Resilience: Challenges 

and Opportunities for the Next Generation of Policy-Makers’ 

Edited by Jennifer Cole 

STFC/RUSI Conference Series No. 4

Conference Report, October 2014 

Proceedings of the Conference ‘Big Data for Security and Resilience: Challenges and Opportunities for the Next Generation of Policy-Makers’, March 2014

www.stfc.ac.uk 

www.rusi.org

A joint publication of RUSI and the STFC, 2014. 

Royal United Services Institute for Defence and Security Studies 

Whitehall 

London 

SW1A 2ET 

UK 

Science and Technology Facilities Council 

Polaris House 

North Star Avenue 

Swindon 

SN2 1SZ 

Editor: Jennifer Cole 

Sub-editor: Susannah Wright 

Individual authors retain copyright of their contributions to this publication. 

This report may be copied and electronically transmitted freely. It may not 

be reproduced in a different form without prior permission of RUSI and the 

STFC.

Contents 

Foreword
Bryan Edwards

Introduction: Machine Learning for Big Data
Alex Gammerman and Jennifer Cole

I. The National Archives, Big Data and Security: Why Dusty Documents Really Matter
Tim Gollins

II. Trends in Big Data: Key Challenges for Skills
Harvey Lewis

III. Big Data and Financial Transactions: Providing New Means of Analysis
Gregory Mandoli

IV. Characteristics of Terrorist Finance Networks: The Human Element
Neil Bennett

V. Terrorism and Political Risk Modelling
Mark Lynch

VI. Intelligent Use of Electronic Data to Enhance Public Health Surveillance
Edward Velasco

VII. The Raxibacumab Experience: The First Novel Product Approved Under the US Food and Drug Administration ‘Animal Rule’
Chia-Wei Tsai

Discussion Groups
Rapporteurs: Philippa Morrell, Chris Sheehan, Ed Hawker

Discussion Group 1: The Ethics and Legality of Big Data Sharing
Chair and Rapporteur: Edward Hawker

Discussion Group 2: Policing, Terrorism, Crime and Fraud
Chair: David Smart; Rapporteur: Philippa Morrell

Discussion Group 3: Health Data, Public Health and Public Health Emergencies
Chair: Chris Watkins

Discussion Group 4: Individual Privacy Versus Community Safety
Chair and Rapporteur: Jennifer Cole

Research Themes Identified in the Presentations and Discussion Groups

An additional three presentations were given at the conference by Professor John Parkinson of the Medicines and Healthcare products Regulatory Agency (MHRA), Michael Connaughton of Oracle, and Dr Catriona McLeish of the University of Sussex. For a variety of reasons, no written papers were produced for these presentations, but we would still like to acknowledge their contribution to the event. The PowerPoint presentations given by Michael Connaughton and Professor Parkinson, as well as those delivered by the speakers who have contributed a written paper, can be accessed on the RUSI website events page here: http://goo.gl/9cXC3g.

Foreword 

Bryan Edwards 

Of all the challenges facing the UK today, few are as demanding as those 

affecting its national security. Some threats to the UK and its citizens are 

modern variants of those that the country has faced for many years. Others 

are entirely new and different to anything that has preceded them; while 

some, no doubt, have yet to be recognised or understood. 

One feature of this large, complex and constantly evolving array of challenges 

is that few, if any, lend themselves to single-discipline solutions. 

With this in mind, the Science and Technology Facilities Council (STFC) 

operates a Defence, Security and Resilience Futures Programme. Challenge-led

and agnostic with respect to academic discipline, the STFC’s aim is to 

identify and facilitate opportunities to engage relevant capabilities within 

the UK National Laboratories and university research groups in relation to 

some of the highest-priority and most demanding challenges in national 

security. 

As part of this programme, the STFC is delighted to fund and proud to 

collaborate closely with RUSI in delivering a series of conferences on topical 

issues within this domain. 

Each meeting is designed to explore the interface between academic 

research and government policy and operations, in order to stimulate debate 

on how a step change, rather than incremental change, in the protection 

of the UK could be achieved. The meetings are strategic in character, with 

contributions from an atypically broad community drawn from universities, 

industry, government and its agencies and partners. 

At the forefront of the organisers’ minds is a deceptively simple question: what can academic research offer now, and in the future, to allow government to further enhance its capabilities in key areas, enabling it either to do significantly different things or to do what it does now in significantly different and better ways?

In this context, Big Data is often identified as being of particular importance. 

Certainly, there is little doubt that raw data are being generated at what 

appears to be an accelerating rate. This is a trend that seems set to continue 

for the foreseeable future. Not only that, but complementary improvements 

in data storage technologies and telecommunications infrastructure mean 

that more of these data can be archived (potentially indefinitely) and 

accessed on a global basis. And yet volume alone is insufficient to fully

appreciate either the nature of the challenge or the opportunities that 

exist. Indeed, if Big Data were defined according to volume alone,

there would be few grounds for claiming a revolution. For example, during

the 1990s, the strategy of the UK’s Department of Social Security sought

to migrate benefits, such as unemployment benefits and pensions, from 

traditional paper-based systems to IT systems. The data volumes associated 

with this enterprise were large, even by today’s standards. It is therefore 

necessary to look instead at other characteristics of the data to identify 

what is qualitatively different, and to establish the source of the challenges 

and opportunities we are now presented with. These include features such 

as the diversity of the data, in terms of type and reliability. These in turn 

create new challenges for the development of the automated data analysis 

and interpretation systems required. This raises questions not only over how 

one could, in principle, approach the analysis of such data, but equally how 

systems based on these new principles could themselves be tested, verified 

and validated. 

While these technical challenges are significant, there are additional complexities associated with data residing in different organisations, and with a population that is becoming increasingly aware of, and sensitive to, the possibility that data whose ownership they question may be exploited in ways they consider inappropriate.

In this meeting we look at some of the technical challenges that Big Data 

presents, and consider a range of possible uses of and perspectives on 

data to tease out new issues. In the course of a one-day event, the scope 

for exploring them in detail is extremely limited. However, it is hoped that 

identifying relevant questions to be explored elsewhere is, in itself, a useful 

contribution to the debate. 

I would very much like to acknowledge the generous assistance and support 

offered by the US Department of Homeland Security, which contributed to 

making the day a success. Similarly, thanks must go to the staff at the STFC 

and RUSI, whose extremely hard work made this event possible. However, 

the final word of appreciation and gratitude is reserved for all those who 

participated so enthusiastically on the day itself, whether as speakers or as 

delegates. 

Anyone wishing to know more about the STFC’s Defence, Security and 

Resilience Futures Programme in general, or about these conferences in 

particular, is invited to contact me using the e-mail address below. 

Professor Bryan Edwards 

Science and Technology Facilities Council 

bryan.edwards@stfc.ac.uk

Introduction: Machine Learning for Big Data 

Jennifer Cole and Alex Gammerman 

This paper discusses the impact of the current high level of interest in Big 

Data from academia and industry, and comments on how this is influencing 

the approach taken to funding research and developing skills in particular 

areas of computer science. It also discusses the relationship between Big 

Data and machine learning – systems that have the ability to learn from 

data, rather than only following explicitly programmed instructions – and 

the influence Big Data has on machine learning. 

For Big Data (or, for that matter, Small Data) to have any value, machine learning 

needs to be applied in order to extract useful information from the data. The 

current approach to Big Data arguably places too much focus on the data as an 

end in themselves at the expense of properly considering the techniques and 

approaches that will enable the best use to be made of them. For example, in 

2012 the International Data Corporation estimated that while the global data 

supply had reached about 2.8 zettabytes (1 zettabyte equalling 10^21 bytes),

only an estimated 0.5 per cent of all data collected is used for analysis. 1 There 

is little point in Big Data per se; a problem needs to be defined and then the 

amount of data needed to solve this problem can be decided. 

Machine learning, as a way of extracting useful information from data (irrespective of whether they are Big or Small Data), together with the academic disciplines and research that have contributed (and continue to contribute) to it, has much to offer in determining how the data are collected, analysed and used.

Buzzwords in Computer History 

Big Data is a buzzword (or two), and it is not the first time in computer science 

that a new concept has been hailed as the answer to everything. In 1982, the 

Japanese Ministry of International Trade and Industry (MITI) began the Fifth 

Generation Computer Systems (FGCS) 2 project to develop a supercomputer 

that would further develop artificial intelligence. The British response to the 

Japanese challenge was the Alvey Programme 3 in information technology. At 

that time, the way forward for artificial intelligence was largely considered to 

be expert systems: computer systems that could help a human in the decision-making process by emulating the reasoning abilities of an expert. Such systems were supposed to solve everything. Gradually, however, as it became clear that expert systems have narrow and limited areas of application, unsubstantiated claims died down and the boom was over.

1. John Gantz and David Reinsel, ‘The Digital Universe in 2020: Big Data, Bigger Digital Shadows and Biggest Growth in the Far East’, International Data Corporation and EMC, 2012, accessed 2 July 2014.

2. Ehud Shapiro, ‘The Fifth Generation Project – a Trip Report’, Communications of the ACM (Vol. 26, No. 9, 1983), pp. 637–41.

3. The Alvey Programme, accessed 30 July 2014.

The expert systems boom has much in common with the Big Data hullabaloo 

being experienced today. There seems to be an assumption that everything 

can be resolved by Big Data. It is somewhat naive to assume that theory is no 

longer needed to solve problems, just a lot of data and an ability to calculate 

a correlation between various items of data. This is nonetheless what some

proponents of Big Data say. 4 The myth persists that Big Data will provide

the answers to all our questions. Big Data will not do this, but combined with 

machine learning it may help to provide some of them. 

Big Data and Machine Learning 

Modern machine learning exists at the intersection between statistics and 

computer science. 5 Two main topics – inference (the process of reaching 

a conclusion from known facts) and data analysis – have been taken 

from statistics. In particular, non-parametric statistics (which makes no 

assumptions about probability distributions) has developed many methods 

and algorithms that are in use in machine learning. Computer science, on the other hand, contributes the questions of how to develop efficient algorithms and how to represent knowledge, including the distinction between tractable, intractable and non-computable functions.

Basically, machine learning tries to find regularities within past (or training) data (or examples) that allow the user to make predictions for future examples.

This is done irrespective of the amount of data – big or small. 

Researchers at Royal Holloway, University of London, have been doing this 

for years: in 1998, the Computer Learning Research Centre 6 was established 

there and today two prominent Royal Holloway researchers are working in 

the field of statistical learning theory (SLT) with Vladimir Vapnik and Alexey 

Chervonenkis, the theory’s founders. 

Classical statistics usually deals with small scales and low dimensions of data; conceptual and computational difficulties may begin to arise when there are complex, sizable and high-dimensional data (roughly speaking, where the number of attributes or features is greater than the number of examples). Several machine learning methods are being developed to deal with these problems, including online predictions, parallel algorithms and efficient methods. Some of the new techniques being developed at Royal Holloway include string kernel techniques, prediction with expert advice and online conformal predictors (or transductive confidence machines) – new learning techniques that make valid predictions. These techniques have been applied in a number of areas, for example for automatic target recognition, statistical profiling of offenders for the Home Office, material identification and atmospheric correction for military applications, and anomaly detection to identify suspicious behaviour of ships and other vehicles. They have also been applied to several medical fields, for example for detecting various abdominal diseases and ovarian cancer, and finding the best treatment for depression.

4. Chris Anderson, ‘The End of Theory: The Data Deluge Makes the Scientific Method Obsolete’, Wired, 16 July 2008, accessed 30 July 2014.

5. Many disciplines, such as psychology, mathematics, philosophy, linguistics and biology, contribute to machine learning, but the main ones at present are statistics and computing.

6. Computer Learning Research Centre, Royal Holloway, University of London, accessed 30 July 2014.

One of the central questions in the theory of learning concerns the quantity of 

data needed in order to achieve a solution with a desirable degree of accuracy. A 

simple pattern recognition system to classify digits (0–9) can learn to recognise 

and correctly predict a shown digit after being trained on only a few hundred 

digits out of the hundreds of thousands of digits available for training. 7 That 

is only a fraction of the data, but enough to solve the problem. Pattern recognition

systems often need surprisingly small amounts of data to obtain an answer. 

While intuitively it seems that the more data are used, the more accurate 

the prediction will be, the founders of SLT, 8 Vapnik and Chervonenkis, have 

shown that it is not just the length of training data that is important, but a 

concept called ‘capacity’ or ‘VC-dimension’ (after Vapnik and Chervonenkis). 

Roughly speaking, the VC-dimension is the number of parameters of a decision rule. The important factor for the quality of learning is the ratio of the length of the training set to the VC-dimension. A large ratio is ‘good’ from a learning perspective because it helps to avoid ‘overfitting’: the results obtained on the test set are then close to those on the training set, so the test set should show about the same accuracy (number of errors) as the training set.

If, however, there is a request to apply machine learning algorithms when Big 

Data is provided but the analysis cannot be handled on one machine, parallel 

algorithms can be developed and run on parallel machines. This requires more 

efficient methods to be developed, which is currently a challenge, though 

some progress is being made to resolve this. For example, in addition to well-known methods such as induction, there are some advances in developing transductive methods. 9 In induction, particular examples are used to formulate a general rule and then make predictions using this rule. The transductive approach instead goes from one example to another, which should be more efficient as the model does not have to solve an infinite number of examples, just find one particular example, which will in turn predict the next one. This could be a way forward for developing new, efficient algorithms for prediction.

7. Alex Gammerman and Volodya Vovk, ‘Hedging Predictions in Machine Learning’, The Computer Journal (Vol. 50, No. 2, 2007), pp. 151–63.

8. Olivier Bousquet et al., ‘Introduction to Statistical Learning Theory’, Max Planck Institute for Biological Cybernetics, 2004, accessed 2 July 2014.
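The contrast between induction and transduction can be sketched in a few lines of Python. This is only a toy illustration under stated assumptions: the data are invented, the inductive learner is a hypothetical one-dimensional threshold rule, and a nearest-neighbour lookup stands in for transduction. It is not the transductive confidence machine developed at Royal Holloway.

```python
from statistics import mean

# Invented one-dimensional training examples: (feature value, class label).
TRAIN = [(1.0, 0), (1.5, 0), (2.0, 0), (6.0, 1), (6.5, 1), (7.0, 1)]


def fit_threshold_rule(train):
    """Induction, step 1: formulate a general rule from particular examples.

    The rule is a single threshold halfway between the two class means
    (this toy rule assumes class 1 lies above class 0).
    """
    mean0 = mean(x for x, label in train if label == 0)
    mean1 = mean(x for x, label in train if label == 1)
    return (mean0 + mean1) / 2


def predict_with_rule(threshold, x):
    """Induction, step 2: apply the general rule to any new example."""
    return 1 if x >= threshold else 0


def predict_transductively(train, x):
    """Transduction (toy stand-in): go from the known examples straight to
    the one example of interest, without building a general rule first.
    """
    _, nearest_label = min(train, key=lambda example: abs(example[0] - x))
    return nearest_label


if __name__ == "__main__":
    threshold = fit_threshold_rule(TRAIN)
    for x in (2.5, 5.0):
        print(x, predict_with_rule(threshold, x), predict_transductively(TRAIN, x))
```

The inductive path pays once to build a rule that covers every possible input, whereas the transductive path does only the work needed for the example at hand. Which is cheaper in practice depends on how many predictions are required and on the method used.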

Conclusions 

There is currently a lot of research into machine learning taking place and 

new algorithms are being developed. They are both simple and rigorous, and 

give a wide range of statistical learning methods. John Poppelaars 10 compared 

the current belief in Big Data with a fictional computer, Deep Thought, in The 

Hitchhiker’s Guide to the Galaxy, which took 7.5 million years to compute the answer to the ultimate question of life, the universe and everything, but because the

beings who had programmed it never really knew what the question was, 

nobody knew what to make of the answer. Nowadays, people hope that Big 

Data will help to find the ultimate question, but if we slightly paraphrased 

The Hitchhiker’s Guide to the Galaxy, we would argue that it is not Big Data 

that will define the question: it is machine learning. 

Jennifer Cole is a Senior Research Fellow in Resilience and Emergency 

Management at the Royal United Services Institute, where her research 

programme has included a number of reports and projects on the use of Big 

Data and cyber-security for the UK government, including the Foreign Office 

and Ministry of Defence. She is also a PhD candidate in the Computer Science 

Department at Royal Holloway, University of London. 

Professor Alex Gammerman studied in Leningrad (now St Petersburg) and then 

worked in several research institutes of the Academy of Science of the USSR. In 

1983 he moved to the UK. He was appointed to the established Chair in Computer 

Science at the University of London (Royal Holloway and Bedford New College) 

in 1993. Currently, he is Founding Director of the Computer Learning Research 

Centre at Royal Holloway, University of London, and a Fellow of the Royal 

Statistical Society. Professor Gammerman’s research interest lies in the field 

of machine learning, particularly the development of inductive–transductive 

confidence machines. Areas in which these techniques have been applied include 

medical diagnosis, forensic science, genomics, environment and finance. 

This is a version of the paper written by the authors and can be found at http://clrc.rhul.ac.uk/publications/techrep.htm

9. Vladimir Vapnik, The Nature of Statistical Learning Theory (New York, NY: Springer, 1995).

10. John Poppelaars, ‘Will Big Data End Operations Research?’, 2013, accessed 30 July 2014.

I. The National Archives, Big Data and Security: 

Why Dusty Documents Really Matter 

Tim Gollins 

This paper discusses three linked propositions. The first concerns the way in which the National Archives, as a national institution of the United Kingdom, can be regarded as a repository of Big Data. The paper will discuss the concept of Big Data and place it

in the historical context of archival collections that have transformed the world, 

for example, the King of Assyria’s Library and the Library at Alexandria. Second, 

it will consider the way in which the National Archives are central to UK security, 

providing a point of reference for society, and supporting citizens’ rights and the 

rule of law. It will also discuss the potential threat that emerges from a loss of 

trust in the processes that underlie the transfer of records to the Archives. Third, 

the paper will cover how the challenges of sensitivity reviews of digital records, 

which ensure that sensitive government records are archived appropriately, 1 

could give rise to further threats to the Archives and thus the wider security of 

our society. The paper goes on to show that in addressing the challenges of the 

sensitivity review of digital records, by using the Big Data nature of archives, 

opportunities arise to counter the wider threats to the security of our society. 

The Archives and Big Data 

The classic definition of Big Data rests on volume, variety and velocity, 2 and 

is inherently assumed to be digital. Taking a longer view, there are a number 

of points in history where such transformative conditions have existed with 

collections of other media, such as: 

• The 30,000 clay tablets from the oldest surviving royal library in the 

world: that of Ashurbanipal, King of Assyria (around 668–630 BC), 

including the story of Gilgamesh 3 

• The iconic Library of Alexandria, alleged to have collected the 

knowledge of the ancient world under one roof (including 400,000– 

700,000 rolls within the collection). 4 

1. National Archives, ‘Step 3: Sensitivity Reviews of Selected Records’.

2. Anton Chuvakin, ‘Broadening Big Data Definition Leads to Security Idiotics!’, Gartner blog, 18 September 2013, accessed 18 July 2014.

3. British Museum, ‘The Library of Ashurbanipal, Research Project at the British Museum’, accessed 19 August 2014.

4. Heather Phillips, ‘The Great Library of Alexandria’, Library Philosophy and Practice 2010, accessed 18 July 2014.


In comparatively more recent times, as the practice and conventions 

of common law developed in Britain, the need to collect the records of 

cases and to access legal judgments for precedent gave rise to another 

example of Big Data of its day. Drawing on information from the National 

Archives Catalogue, 5 we learn that ‘The Dialogus de Scaccario’, describing 

Exchequer administration in the 1170s, mentions a clerk who was deputy 

to the chancellor and had responsibility for the preparation and custody 

of formal Chancery enrolments. Thereafter, the chancellor’s principal clerk 

was invariably associated with these duties, although progressively more 

and more remote from their direct execution; by 1388, and probably long 

before, a staff of subordinate clerks carried out the actual enrolments. From 

the mid-thirteenth century, this officer was generally known as the ‘keeper 

of the rolls’, and, as the first rank of Chancery clerks gradually came to be 

known as ‘masters’, the title ‘Master of the Rolls’ had become the standard 

designation by the fifteenth century. The holder of that post now chairs the 

Lord Chancellor’s Advisory Council, which assures the transfer of records to 

the Archives. 6 

Bringing the picture up to date, the paper holdings of the National Archives 

at Kew are over 1 billion paper pages, representing 1,000 years of history. 7 

At the same time, there are now over 2.5 billion archived pages accessible 

from the UK Government Web Archive (representing less than 20 years of 

contemporary history) 8 that are now being aggregated and mined to answer 

novel research questions that would have previously been intractable. The 

Archive is, and always has been, Big Data. 

The Archive and Security 

Discussion of security should not be limited to considerations of criminality 

and terrorism. The security of UK society relies at its deepest level on the trust 

of the citizen in the state. It is all about the rule of law and the fact that no 

one, not even the executive, is above that rule. 9 The British state is different from many others in that citizens expect the state to be subservient to them, rather than the reverse, which is the more common case. This is the very fabric of UK society; the

rule of law supports and empowers the citizen. 

5. National Archives Catalogue, accessed 19 August 2014.

6. National Archives Advisory Council Information, accessed 19 August 2014.

7. The author’s own estimate based on approximately 12 million entries in the National Archives’ catalogue that refer to boxes or folders of records that can reasonably be expected to hold upwards of 100 sheets of paper.

8. National Archives UK Government Web Archive Information, accessed 18 July 2014.

9. The Rule of Law definition, LexisNexis, accessed 18 July 2014.


The National Archives are fundamental to this aspect of security. The 

Archives provide the impartial witness that enables ‘holding to account’ 

under the rule of law and in the court of history. They contain evidence 

of the transactions of the state and the executive and evidence of the 

decisions and policies enacted. This is central to Lord Bingham’s Fourth 

Principle: ‘Ministers and public officers at all levels must exercise the 

powers conferred on them in good faith, fairly, for the purpose for which 

the powers were conferred, without exceeding the limits of such powers 

and not unreasonably.’ 10 How can we know what the executive has done if 

the records are not kept? 

However, it is clearly not sufficient to consider the keeping of the record 

without considering how the record is selected and transferred to the 

Archives. The content of the Archives is clearly dependent on these processes. 

It follows therefore that the citizen must trust the process by which the 

Archives receive their material to sustain their rights. 

Transfer to the Archive 

The process by which public records are transferred to the National Archives

is not widely understood, even among scholars who regularly use its content 

for their research. The principles of the appraisal that underlies transfer were 

laid down by the great archivist Hilary Jenkinson, who described many of the 

fundamentals of the UK system. 11 In setting out his approach, Jenkinson was 

trying to ensure that the UK archive (at that time the Public Record Office)

was able to guard its independence under the rule of law, and could not fall 

foul of the criticism of complicity in wrongdoing that was evident in the case 

of the Nazi Archive in Germany with respect to the Holocaust. 12 

In summary, the transfer process consists of the following steps: 

• Appraisal and selection: determining which records meet the 

collection policy of the National Archives and then choosing which 

records should be transferred to the Archives or to a place of deposit 

• Sensitivity review: deciding which records should be open on transfer, 

which must be closed, and which must be retained in departments 

(under the ‘Lord Chancellor’s blanket’ – see below) 

• Preparation and delivery: the cataloguing, preparation and organisation of records for transfer and the actual transportation of records to the National Archives or to a place of deposit

• Accessioning: the process by which the National Archives makes the records appropriately available.

10. IAP Annual Conference, ‘The Rule of Law in Prosecuting Big Businesses in Application to Regulatory Frameworks’, 2013, p. 2, accessed 18 July 2014.

11. Hilary Jenkinson, A Manual of Archive Administration (London: P. Lund, Humphries & Co Ltd, 1963 [1923]).

12. Eric Westervelt, ‘Probe Details Culpability of Nazi-Era Diplomats’, NPR, 28 October 2010, accessed 18 July 2014.

A Threat 

The principle of independence derived from and identified in the Grigg 

Report 13 that initiated the Public Records Act 1958 has led, over the years, 

to a series of checks and balances intended to ensure that the necessary 

records of the activities of the executive are deposited. These checks and 

balances include: 14 

• The right of access to information in departments under freedom of 

information legislation before information is transferred 

• Departments’ responsibility for the selection of the records, and for 

the identification of any sensitivity in the records that would cause an 

exemption under freedom of information legislation 

• The fact that the exemptions that can be applied to delay transfer are prescribed in law and their application can be challenged through the Information Commissioner and thence by appeal to the Information Tribunal

• The public visibility of the selection criteria that the departments 

must apply – as agreed with the National Archives 

• The National Archives’ process of oversight during the creation of the 

criteria and the Archives’ process of monitoring their application 

• The publication of information regarding transfers 

• The formal oversight of the timeliness of the transfer process and 

the application of freedom of information exemptions by the Lord 

Chancellor’s Advisory Council on Public Records. 

Unfortunately, in 2012, negative publicity 15 concerning the ‘migrated archives’ 

of the colonial administrations (papers of the British administrations which 

should have been passed to the Public Record Office in a timely fashion

but were wrongly kept at the government’s Hanslope Park facility) and 

subsequent questions concerning other collections of documents at the 

Foreign Office raised the issue of the degree of trust in this system. 

13. James Grigg, Report of the Committee on Departmental Records, Cmnd 9163 (London: HMSO, 1954).

14. National Archives, History of the Public Records Act.

15. Ian Cobain and Richard Norton-Taylor, ‘Sins of Colonialists Lay Concealed for Decades in Secret Archive’, Guardian, 18 April 2012, accessed 22 July 2014.


While the process of selection, sensitivity review and transfer is in principle 

an open one, the process is complex and there are opaque aspects (not 

least, the use of the Lord Chancellor’s Security and Intelligence Instrument, 

known colloquially as ‘the Lord Chancellor’s blanket’, which is used to protect 

specific aspects of national security). 16 The very nature of such an arrangement, in which the shape of the process is open, and yet the detail of the data passing through the process must be hidden (since to reveal that detail would render the process moot), creates conditions in which conspiracy theorists can ply their trade. 17

In essence, it can look like the establishment has something to hide and such 

appearances are important. When someone of the eminence of Professor Margaret MacMillan, who is in no sense a conspiracy theorist, feels compelled to challenge her own definitive works on the First World War, we should take note. 18 For trust to be maintained in the Archives, it is clear that any

further barriers to the timely, open and transparent transfer of records must 

be avoided. 

Sensitivity Review of Digital Records 

The argument set out in this paper so far applies to all public records 

regardless of format or media. There are, however, particular consequences 

of the transition to the use of digital records that need to be considered. 

During the three decades from 1984 to 2014, administrative practices have 

been transformed by the introduction of a sequence of waves of technology. 

This started with the photocopier and moved on to the personal computer 

(PC), the local area network, the internet, a wide range of mobile devices 

and, most recently, the ‘cloud’. All of these technologies created the ability 

and tendency to duplicate and proliferate information in ever-increasing 

volumes. This process was piecemeal and began in the early 1990s, but by 

the middle of the first decade of this century all UK government records 

were digital. The impact of these technologies and the transformation of 

16. Notes on the Lord Chancellor’s Security and Intelligence Instrument.


17. National Archives, ‘20 Year Rule: Record Transfer Report’, accessed 30 September 2014. 

18. Quoted in the Guardian: ‘I am one of many historians who has benefited from using 

the British archives and who had confidence that the documents had not been 

weeded to suit particular interests. Now I am wondering whether I will have to go 

back and rethink my work on such matters as the outbreak of the First World War or 

the peace conference at the end. But when are we going to get the complete records? 

So far the pace of transferring them is stately, to put it politely.’ Ian Cobain, ‘Academics 

Consider Legal Action to Force Foreign Office to Release Public Records’, Guardian, 

13 January 2014, 

accessed 19 August 2014.



administrative practice on the records of the public sector has not been 

examined in detail; however, a detailed examination of the format and nature 

of the evidence presented to the Hutton Inquiry 19 is not encouraging. 20 In the 

evidence, the paper trail for a decision was no longer in a single Manila file; 

instead, the record was found in a blizzard of e-mails sent from person to 

person and stored on multiple computing systems. It would appear that the 

previously clear and unambiguous rules for the creation and management of 

information in the public services have been challenged. 

In July 2012, the government announced the transition towards releasing 

records when they are twenty years old, instead of thirty 21 (as has been the 

case since the amendment to the Public Records Act in 1967). 22 From 2013, 

two years’ worth of government records will be transferred to the National 

Archives through a ten-year transition period until a new ‘20-year rule’ is in 

place in 2023. The records covered by this transition are those from 1983 to 

2003, 23 coinciding with the time during which the most extreme aspects of 

the technical changes mentioned above took place. 
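
The arithmetic of the transition can be laid out as a short sketch. It assumes, for illustration only, that each year from 2013 takes in the next two record years in sequence, starting with 1983; the function name and its parameters are hypothetical and are not drawn from National Archives guidance.

```python
def transition_schedule(start_year=2013, first_record_year=1983, years=10):
    """Map each transfer year to the two record years assumed to be due in it.

    Assumption: two consecutive record years are transferred per calendar
    year, in order, for the whole transition period.
    """
    schedule = {}
    record_year = first_record_year
    for transfer_year in range(start_year, start_year + years):
        schedule[transfer_year] = (record_year, record_year + 1)
        record_year += 2
    return schedule


schedule = transition_schedule()
for transfer_year, record_years in schedule.items():
    # Age of the newest records transferred in that year.
    age = transfer_year - record_years[1]
    print(transfer_year, record_years, "age of newest records:", age)
```

On these assumptions the newest records transferred in 2013 are 29 years old and those transferred in 2022 are 20 years old, so that a single year's records (2003) fall due in 2023.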

When examining the process of transfer described above, and considering 

the impact of the change to digital records, it is clear that all of the steps in 

the process need to be examined. Appraisal and selection, preparation and 

delivery, and accessioning will all present challenges to departments and 

the Archives but there are a number of mitigations, including the doctrine 

of macro appraisal and the recent developments in digital preservation at 

the National Archives. 24 It is the process of sensitivity review that generates 

the most significant challenges and where considerable work is needed to 

identify mitigations. 

Additional Threats 

The challenges of digital records to the process of sensitivity review are as 

follows: 

19. Lord Hutton, Report of the Inquiry into the Circumstances Surrounding the Death of Dr 

David Kelly C.M.G. [the Hutton Inquiry], HC 247 (London: The Stationery Office, 2004), 

accessed 18 July 2014. 

20. Michael Moss, ‘The Hutton Inquiry, the President of Nigeria and What the Butler 

Hoped to See’, English Historical Review (Vol. 120, No. 487, June 2005), pp. 577–92, 

accessed 19 August 2014. 

21. National Archives, ‘Government Confirms Transition to a 20-Year Rule Will Begin from 

2013’, 13 July 2012, accessed 

18 July 2014. 

22. Public Records Act 1967, accessed 

22 July 2014. 

23. Ibid. 

24. Tim Gollins, ‘Putting Parsimonious Preservation into Practice’, The National Archives, 

2012, accessed 25 July 2014.



• Volume and resources: Following advances in office technology during 

the late twentieth century, the consequent proliferation of information, 

and the broadening of the interest of the scholarly community, a much 

greater volume of material is being deemed worthy of preservation 

in the digital age. Against a background of budgetary constraint, the 

manual review of digitally born records is not practical. 

• Complex context: Technology has challenged earlier clear and 

unambiguous rules for the creation and management of information. 

This situation will significantly complicate the process of digital 

sensitivity review, as understanding a record’s context (including its 

distribution) is crucial in assessing its sensitivity. 

• Risk: These challenges for review also occur in a context of significantly 

increased risk. Although the consequences of mistaken disclosure 

have not changed with the advent of digital records, the probability of 

discovering a mistake has. It is hard to discover particular information 

in the paper world, in marked contrast to the digital environment 

where ubiquitous search engines index content rapidly. Risk-averse 

depositors may feel obliged to close large swathes of records if they 

cannot efficiently and effectively determine the sensitivity of each 

individual record with some clear degree of certainty. 

If the sensitivity review of digitally born records is not made practical, then, 

against a background of budgetary constraint and increasing litigation, large 

swathes of records will be closed in their entirety for long periods (up to 

120 years in the case of some exemptions). Such 

precautionary closure (due to the costs or difficulty of review) is permissible 

under freedom of information legislation, but it will contradict citizens’ 

expectations of openness in a democratic society and will only serve to 

exacerbate the threat to trust in the Archives, as described above, and the 

subsequent threat to our security. 

Opportunities 

While digital records may challenge sensitivity review, and this may give rise 

to threats to our wider security, their very nature also offers opportunities to 

address those challenges and counter the threats. Some of the opportunities 

are as follows: 

• Some sensitivities are not subtle. They can relate to specific terms 

and thus an appropriately configured search system should be able 

to highlight them. For example, the records that related to the 

Al-Yamamah Contract, 25 although still available on the Campaign Against 

Arms Trade (CAAT) website, have been closed officially to prevent 

further damage to international relations. 

25. David Leigh and Rob Evans, ‘Secrets of al-Yamamah’, Guardian, [no date], accessed 18 July 2014.



• Consistency: by using electronic means, it is possible to drive some 

consistency across the review process. 

• Accurate estimation of residual risk: unlike in the review of paper 

records, it is possible to estimate the risk posed by reviewed records 

using the concept of technologically assisted digital review. 

• Exploitation of the Big Data aspects of digital records, coupled with the 

application of machine learning applied in the context of information 

retrieval technology, can result in patterns emerging that can inform 

reviewers of where to look. 
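
The first and third opportunities in the list above can be sketched in a few lines. This is a minimal illustration and not a description of any system in use at the National Archives: the watch-list terms, the record identifiers, the sample labels and both function names are hypothetical, and a real review would need far richer criteria than term matching.

```python
import re

# Hypothetical watch-list; real criteria would be set by the reviewing department.
SENSITIVE_TERMS = ["al-yamamah", "targeting"]


def flag_records(records, terms=SENSITIVE_TERMS):
    """Return the ids of records whose text contains any watch-list term."""
    pattern = re.compile("|".join(re.escape(t) for t in terms), re.IGNORECASE)
    return [rec_id for rec_id, text in records.items() if pattern.search(text)]


def residual_rate(sample_labels):
    """Estimate the share of sensitive records remaining among unflagged
    records, from a manually checked random sample (True = sensitive)."""
    return sum(sample_labels) / len(sample_labels)


# Hypothetical records, for illustration only.
records = {
    "r1": "Minutes on the Al-Yamamah contract negotiations",
    "r2": "Canteen refurbishment schedule",
    "r3": "Note on targeting policy",
}
flagged = flag_records(records)
print(flagged)

# Hypothetical outcome of manually checking four unflagged records.
print(residual_rate([False, False, True, False]))
```

The second function is only a point estimate; in practice the sample would need to be large enough for the estimate to carry a usable margin of error.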

All of the above requires significant research, first to determine what the 

digital record looks like, and then to demonstrate the opportunities that can 

be derived. 

Conclusion 

Freedom of information does not relate solely to openness. There is a 

fundamental difference between openness (driven by what the state wants 

its citizens to see) and freedom of information, which ascribes the right of 

access to information to the individual. 26 Freedom of information creates 

a balance between the public interest, the state interest and the personal 

interest based on human rights, all mediated and governed by the rule of 

law. 

Balance is crucial to achieving freedom of information alongside openness. 

Limits on openness are necessary for reasons of national security (for 

example, the location of Britain’s nuclear weapons should not be revealed, 

nor should their targeting information). Individuals also need to be protected 

from harm, and this has to be done through some limits on public access 

to information. However, the ability to hold the executive to account under 

the rule of law and in the court of history is also central to the security of 

a modern democratic society. This can only be achieved through open and 

transparent access to the records of government. 

How these challenges play out in the digital age of Big Data requires significant 

research, in order to gain a better understanding of how public records have 

changed and thus how they can be sensitivity reviewed and appropriately 

archived. 

Tim Gollins is currently an Honorary Research Fellow in the School of Computing 

Science at Glasgow University, working on the technically assisted sensitivity 

review of digital public records while on secondment from the National 

26. S Curtis, ‘Information Commissioner, “Open data is no substitute for freedom of 

information”’, Daily Telegraph, 29 October 2013, 

accessed 29 July 2014.



Archives. Tim started his career in the UK civil service in 1987 and joined 

the National Archives in April 2008 to lead the delivery and procurement 

workstream of the Digital Continuity Project. Tim was part of the team that 

developed the National Archives business information architecture and 

helped to initiate work on the new Discovery system to enable users to find 

and access the records held at the National Archives. He has recently worked 

on the design and implementation of a new digital-records infrastructure at 

the National Archives, which embodies the new parsimonious preservation 

approach he developed. Tim is a Director of the Digital Preservation Coalition 

and a member of the University of Sheffield I-School’s Advisory Panel.

II. Trends in Big Data: Key Challenges for Skills 

Harvey Lewis 

This paper will explain how Deloitte, one of the largest professional services 

networks in the world, has used Big Data both internally and in the services 

it provides to clients. The paper will address three main points: the role of 

Big Data at Deloitte, the challenges that exist surrounding the competency 

and basic skill sets of staff working with these data, and the current trends 

in Big Data and how they impact methodologies and practices. A particular 

area of growing interest is open data 1 – data that can be freely used, reused 

and distributed by anyone, subject only, at most, to the requirement to 

attribute and share alike 2 – which is providing a new source of resources for 

organisations in the public and private sectors. This paper will examine the 

impact this is having on ethics, responsibility and business efficiency within 

companies and governments, and amongst civilians. 

What is Big Data? The widely cited and accepted 2001 Gartner 3 definition 

lists the three Vs: volume, velocity and variety. In Deloitte, the term is often 

used to describe data that are too rich or complex to analyse well in a 

spreadsheet and without concepts from university-level statistics. Deloitte 

has used Big Data in a number of ways. For example, as Hurricane Irene was 

bearing down on the US in 2011, Deloitte helped one large US retailer to 

combine information about curfews and road closures culled from social 

media with storm maps from the National Weather Service and GPS data 

from its own trucks to prepare for the storm’s impact on operations and 

devise a logistics strategy for response and recovery. 

Despite some reservations about the reliability and usefulness of social 

media, it is nevertheless proving to be a useful additional source of data for 

providing insights. For example, it can be used to identify instances of foodborne 

illnesses or other public health incidents, helping officials to work 

backwards from the spread portrayed in social media to the retail location 

and the distributor, and so on, as recently illustrated in analysis Deloitte 

performed on an outbreak of pet-food poisoning in the US. The social 

networks embodied in social media also provide useful clues about influence, 

which has also been investigated by mapping physicians understood to be 

1. Deloitte, ‘Open Data: Driving Growth, Ingenuity and Innovation’, June 2012. 

2. Open Data Handbook, ‘What is Open Data?’. 


3. Douglas Laney, ‘3D Data Management: Controlling Data Volume, Velocity and Variety’, 

Gartner blog, 2001, accessed 20 

August 2014.



exceptionally influential in pharmaceutical networks. These sorts of projects 

have considerable reach across to security and resilience areas. 

Skills Challenges for Security and Resilience 

From a security and resilience perspective, the first challenge that needs 

to be addressed is over-reliance on technology, which can be manifested in 

three ways. First, the assumption that technology has to come first exemplifies 

how the lure of Big Data is driving bad decision-making. Second, upstream 

technology choice dominates downstream activities, which is very important, 

particularly from a public-sector procurement perspective: the range of 

different technologies and the choices made early in programmes 

may significantly constrain what can happen downstream. Third, 

the infrastructure for education and skills-based training 

is scarce; there are far too many ‘car drivers’ in Big Data and 

not enough ‘mechanics’. There are not enough people who understand the 

algorithms, logic and computer science behind the platforms they use that 

will allow them to be more innovative and creative in devising new solutions. 

A further challenge is particularly acute for security and resilience: 

namely, that of ensuring that no stone is left unturned while identifying and 

extracting the useful data from the pile. The problem with Big Data is that the 

availability and ease with which these data can be stored inadvertently leads 

to a desire to collect everything that can be collected. From a security and 

resilience perspective, it is paramount that analysts are trained to identify 

necessary information and to be very selective about the data they collect, 

process and analyse. More data does not always equal more information. 

For example, to build a model that predicts the outcome of a coin toss, one 

can store either all the outcomes from individual coin tosses, or simply the 

total number of tosses and the number of times the coin came up heads. In 

the first instance, every piece of data is captured but it provides no further 

insight than can be extracted from the second instance. 
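
The coin-toss point can be checked with a short sketch. The simulated tosses, the seed and the variable names are illustrative assumptions; the sketch simply shows that the estimate of the probability of heads is identical whether every outcome is stored or only two running counts are kept.

```python
import random

random.seed(0)  # fixed seed so the illustration is repeatable
tosses = [random.random() < 0.5 for _ in range(10_000)]  # True = heads

# Option 1: keep every individual outcome and summarise afterwards.
estimate_from_all = sum(tosses) / len(tosses)

# Option 2: keep only two running counts and discard each outcome.
n, heads = 0, 0
for outcome in tosses:
    n += 1
    heads += outcome  # True counts as 1, False as 0

estimate_from_counts = heads / n
print(estimate_from_all, estimate_from_counts)
```

The second option needs two integers however many tosses are observed, which is the sense in which more data do not always mean more information.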

On the other hand, a different problem exists in data selection. For 

example, in the Second World War, the UK’s Bomber Command performed 

a comprehensive survey of anti-aircraft weapons damage on its bomber 

fleet and recommended that armour be placed in those areas most 

susceptible to damage. The problem was that the sample of bombers 

surveyed was biased. It did not include the bombers that had not returned, 

which may well have shown additional areas of damage, which were not 

being factored into the analysis. Ultimately, this flaw was detected by 

the newly formed Operations Research Group, which recommended that 

armour be placed in areas showing least signs of damage. This reiterates 

the challenges addressed by Alex Gammerman and Jennifer Cole in the 

introduction: when data are available at very high volumes and rates, the 

problem is how to pick out the data that are actually needed (or, in the



case of Bomber Command, to realise that the most valuable data relate to 

what is missing, not what is in front of you). 
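
The sampling bias in the bomber survey can be reproduced with a small simulation. Everything in it is an assumption made for illustration: the three aircraft sections, the probability that a hit in each section brings the aircraft down, and the number of sorties are invented and are not historical figures.

```python
import random

random.seed(1)  # fixed seed so the illustration is repeatable
SECTIONS = ["engine", "fuselage", "wings"]
# Assumed probability that a hit in each section brings the aircraft down.
LOSS_PROB = {"engine": 0.8, "fuselage": 0.1, "wings": 0.1}

all_hits = {s: 0 for s in SECTIONS}
surveyed_hits = {s: 0 for s in SECTIONS}
for _ in range(10_000):
    section = random.choice(SECTIONS)  # hits are spread evenly over sections
    all_hits[section] += 1
    if random.random() >= LOSS_PROB[section]:
        # Only aircraft that return can be surveyed.
        surveyed_hits[section] += 1

print("all hits:     ", all_hits)
print("surveyed hits:", surveyed_hits)
```

Although hits are spread evenly, the surveyed aircraft show the fewest hits in the section where a hit is most likely to be fatal, which is why armour belongs where the returning aircraft show least damage.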

Analysts also need to guard against mistaking correlation for causality. Data 

alone are not sufficient to answer any question that might be thrown at them. 

This is a particular challenge for researchers and analysts when it comes to 

finding interesting insights that no one has discovered before. It is important 

to understand the root of the correlation, and to be able to assess whether 

or not it makes sense. For example, between 2006 and 2011, the murder 

rate in the US dropped from nearly 16,000 to just over 14,000 (a reduction of 

13.5 per cent), 4 and during the same period the market share of Microsoft’s 

Internet Explorer web browser also fell, from over 60 per cent to 20 per 

cent. 5 The two graphs showing these figures can be superimposed on top 

of one another, but this does not mean that as people became less likely to 

choose Internet Explorer as their preferred browser, they also became less 

likely to commit murder. There is correlation, but no causation. 
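
A short sketch shows how easily such a correlation arises. Only the rounded end points (16,000 to 14,000 murders; 60 to 20 per cent market share) are taken from the text above; the intermediate yearly values are linearly interpolated purely as an assumption and are not real data.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


years = list(range(2006, 2012))
# ASSUMPTION: straight-line interpolation between the rounded end points.
murders = [16_000 - 400 * i for i in range(len(years))]  # 16,000 down to 14,000
ie_share = [60 - 8 * i for i in range(len(years))]       # 60% down to 20%

r = pearson(murders, ie_share)
print(r)
```

Any two series that fall steadily over the same period will produce a coefficient at or very close to 1, so the figure says nothing about whether one decline has anything to do with the other.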

Finally, another significant challenge comes with mistaking the equations 

and models that analysts generate for insight. For example, the output of 

a regression model – the mathematical equation, as illustrated in Figure 1 

– is not the same as the insight that might be gleaned when it is applied to a 

particular business domain or problem. 

Figure 1: Regression equation presented as business insight. 

\[ y(x) = \frac{e^{b_0 + b_1 x}}{1 + e^{b_0 + b_1 x}}, \qquad b_0 = 2.298057, \quad b_1 = 30.023823 \]

The means by which insight is generated is not the insight and, in many 

cases, means nothing to anyone but the mathematicians who created it. It is 

in the interpretation and use of these models that value is derived, and this 

interpretation will depend on how data are visualised and the context into 

which the data fit. 

Future Trends in Big Data 

Deloitte has identified seven current trends in Big Data: identifying people 

with the right talent to do the right things; visualising data appropriately 

so that they can be easily understood; recognising the value of machine 

learning in interpreting and analysing data; developing better data discovery 

4. Wall Street Journal, ‘Murder in America’, accessed 15 July 2014. 

5. W3 Schools, Browser Statistics and Trends, accessed 15 July 2014.



platforms; improving planning for how to get the most from data; improving 

the techniques for, and the use of, predictive data; and addressing the death 

of the data warehouse – an end to collecting and storing vast amounts of 

data for the sake of it. 

Those using Big Data need to know how to recognise and understand the 

opportunities it offers, based on the trends that can be seen for the next year 

and beyond. Companies need to be able to spot trends that are going to help 

reduce the cost and effort associated with processing complex data, or those 

that will improve the marginal returns from Big Data into something more 

significant – more signal, and less noise. 

In conclusion, the starting point with Big Data should be the objective or 

question that needs to be addressed. The data and technology are the 

means to the end; they are necessary but not sufficient. What do we want 

to do with the data? Where do we want them to take us? These are the 

questions that will drive innovation and creativity. Even when the data 

and the technology are available (for example, from social media), is there 

really any benefit from using them, and how is this determined? Finally, even 

when the data are there, if collecting, processing and analysing them is 

not going to be cost efficient, there is nothing wrong with looking at other ways 

of achieving the same end. 

Harvey Lewis is Research Director, Data and Analytics at Deloitte. Based in 

London, he leads a team of researchers and data scientists investigating 

Big Data, open data and trends in analytics. He also leads focused research 

projects for clients in both the public and private sectors. He has spent twenty 

years in the information technology industry, and specialises in analytics, 

cyber-security and national security. Harvey has authored numerous reports, 

white papers and blog posts. He is a frequent media commentator, and has 

contributed to many articles in the national and trade press. Harvey holds a 

BEng (Hons) and a PhD from Southampton University.

III: Big Data and Financial Transactions: Providing 

New Means of Analysis 

Gregory Mandoli 

A good government implies two things: first, fidelity to the object 

of government, which is the happiness of the people; secondly, a 

knowledge of the means by which that object can be best attained. 1 

The Importance of Adaptation 

During the course of its history, the US has been confronted with, and has 

responded to, incidents threatening its welfare. Regrettably, it often takes 

a crisis to catalyse a critical review of current affairs and the creation of 

new operational paradigms. The events of 9/11 illustrate this. Typically, 

programmes evolve slowly, and it is not until numerous injustices are tallied 

or a catastrophe hits that a major shift occurs. 

Much has been made about the intelligence failures that led to 9/11. After 

9/11, the US federal government was forced to recognise and seek a remedy 

to its lack of operational cohesiveness and to the lack of information sharing 

among federal, state and local agencies. Seemingly overnight, the regulatory, 

inspection, interdiction and investigative focus of government shifted and 

the global War on Terror began. 

Clearly, this new conflict impacted on the American psyche in a deep and visceral 

way. It also stirred cynicism, scrutiny and a public appetite for redressing the 

defects that exist in governmental administration. The surge of interest in 

strengthening public agencies prompted the passage of the Homeland Security 

Act (HLSA) of 2002 and the creation of the Department of Homeland Security 

(DHS), both of which aim to enhance the performance of all strata of government. 

In the immediate aftermath of 9/11, the HLSA and DHS seemed symbolic 

surrogates for the Twin Towers, though significant doses of confusion and 

dysfunction accompanied the rapid creation of a new department with its 

inaugural group of twenty-two agencies. The creation of DHS was similar 

to what the UK is experiencing today with the creation of the National 

Crime Agency (NCA) and the rebranding and realignment of the UK Border 

Agency into three distinct entities: the Border Police Command, Home Office 

Immigration Enforcement and UK Border Force. 

Homeland Security Investigations (HSI) is the principal investigative agency 

within DHS. It is unique among law enforcement agencies worldwide because 

of its capability to investigate persons and property across borders and to 

1. James Madison, The Federalist (No. 62, 27 February 1788).



pursue violators within the US or overseas. HSI also has special enforcement 

powers that enable it to conduct border searches at ports of entry – functional 

equivalents of the border and the extended border. It can also prosecute 

violators via criminal, civil and administrative judicial processes. 

As such, HSI brings together assets and capabilities that do not exist in any 

other agency. Originally identified as Immigration and Customs Enforcement, 

but rebranded as HSI in 2011, it merges elements from its legacy immigration 

and customs agencies into a globally oriented police force. HSI employs 

a ‘points of genesis’ investigative methodology that focuses on tackling 

transnational crime where it begins. The points of genesis approach allows 

threats to be attacked at inchoate stages, before they are fully formed and 

more threatening. This is more effective and efficient than waiting until 

the ‘point of commission’, when a crime is more developed and difficult to 

counter. This is a corollary of pre-emptive self-defence, a readily recognised 

precept in international law. 

As a result, HSI deploys agents worldwide to work with foreign police services. 

HSI works closely with many UK partners, but primarily the National Crime 

Agency, City of London Police, Metropolitan Police Service, Police Scotland 

and the Police Service of Northern Ireland. These bilateral engagements 

significantly augment capacity and render a return on investment for the 

respective agencies as well as the overall US law enforcement mission. 

The New Paradigm 

From an academic perspective, homeland security is a new and not fully 

defined discipline, though after more than a decade the concept has been 

widely accepted to encompass more than counter-terrorism. Clearly, other 

manmade or natural entities, movements and phenomena threaten our 

national security, such as illegal immigration, street gangs, illegal drugs and 

natural disasters, to name just a few. Thus, a reasonable interpretation of 

the concept mandates that homeland security must be comprehensive and 

address all hazards, though the criteria for what constitutes a hazard are not 

strictly defined, nor should they be. Unnecessary constraint is antithetical to 

a ‘light is right’ mentality that promotes strategic and operational fluidity in 

an environment where metastasis of modus operandi and criminal networks 

is rapid. 

For HSI the threats are clear. Its mission is to conduct criminal investigations 

against terrorist and other criminal organisations that threaten US national 

security and seek to exploit America’s legitimate trade, travel and financial 

systems – further information on how this is carried out is shown in Box 

1. 2 HSI will also support other DHS agencies in the response and recovery 

2. Immigration and Customs Enforcement, Homeland Security Investigations, accessed 17 July 2014.



phases of natural disasters. The mission is purposefully broad and elastic to 

ensure focus and acuity against transnational criminal organisations that are 

globalised and poly-platform. 

The paradigm of a one-dimensional criminal enterprise, generically embodied 

in a Colombian cartel cocaine trafficking and bulk cash smuggling model, is 

both oversimplified and antiquated. Today, criminal organisations diversify 

their illicit platforms and become involved in myriad offences, including 

narcotics, intellectual property rights, human smuggling and trafficking, 

fraud and money laundering, to list just a few. 

Likewise, modern, internationally focused law enforcement agencies must 

be able to investigate a broad array of criminality in order to confront the 

poly-platform threats competently. Single or limited mission agencies 

have difficulty in this environment. In practice, they creep outside their 

missions. This then produces negative characteristics and results in deflated 

performance. 

Box 1: Value Transfer and Criminal Gain. 

HSI takes the attitude that, with the exception of criminals who have psychopathic 

tendencies, most are interested in making money or achieving some other 

commodity that has a measurable value. Within the wide range of crimes that 

are (or appear to be) perpetrated for money, a particular area of interest to HSI 

in the digital age is ‘value transfer’, which is a way to assess and identify money 

laundering. Value transfer focuses on the relative as well as the absolute value of 

transactions, and therefore often sheds light on money-laundering techniques, 

particularly where ‘dirty’ money may be phased through in modular increments 

rather than in single transactions. 

Value transfer can be physical (carrying bank notes or other forms of currency 

from one place to another); virtual (transferring credit through online banking 

systems); based on trust (often using Hawala-style transactions, as described 

below); or carried out via trade (buying or selling something for above or below 

its real market value). 

The easiest way to launder money is through straightforward cash smuggling, 

referred to as ‘bulk cash’: a criminal has illegal proceeds from drug sales in 

the US that he wants to take back to Mexico, so he tapes the notes to himself 

and smuggles them across the border, where he then spends them. This is the 

simplest example: he might also swallow his money, or drive it across, but the 

value is in physical currency.



Of course, the criminal could deposit the cash into bank accounts in the US and 

withdraw it in Mexico, but doing this through legitimate banking systems will 

tend to leave a trail that can be identified and followed. For example, money 

derived from drug sales in California may need to be moved to Yemen. The 

value could be transferred by putting all of the illegal proceeds into various bank 

accounts, but high-value, unusual transactions (such as those paid in cash) are 

likely to be picked up and questioned by banking systems. When money starts 

to pass through conventional money services, the intersections can be seen: 

where the money was put into the bank, how its value was transferred and 

where it was transferred to (for example, by wire or by sending blank cheques 

to a partner in Sana’a), can help law enforcement agencies to identify and trace 

the criminal actors. 

For this reason, criminal transactions may be more likely to be moved through 

Hawala, an informal system of money brokers used in the Middle East, North 

Africa and the Horn of Africa, which does not transfer currency, but receives 

money in one country and makes a loan to someone in another on the basis that 

the loan will eventually be repaid. Hawala tends not to leave a digital footprint, 

which can be critical with criminal transactions. 

Another option is informal value transfer, which builds on the techniques used 

in bulk cash smuggling. One way around the challenge of the interface with 

conventional money services is trade: if a criminal can take value, put it into 

a commodity and get it to where he needs it to go, this can provide a more 

sophisticated way of transferring the value. This is more complex and also more 

covert. For example, drugs might come into the US from Colombia, with a gross 

profit from sale of the drugs in the US of $1 million in cash. The drug baron is 

not in the US, however; he is in Bogotá, Colombia, and he wants to get the cash 

out of the US. One way to do this would be to find a corrupt jeweller in New 

York and buy $1 million worth of gold from him, melt it down, cast it as nuts and 

bolts, dye or plate them to resemble a much cheaper metal, export 

these nuts and bolts to Colombia and, once they arrive, melt them down and resell 

the gold at its real value. Or the criminal might be even smarter and melt the 

nuts and bolts down into gold, then ship this back to the same corrupt jeweller 

in New York, who can then legitimately wire transfer the value of the gold. HSI, 

as a customs as well as a law enforcement agency, is able to look at Big Data 

from trade transactions to analyse any unusual movements of goods or trends 

that might suggest that this kind of activity is going on. 
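
One simple form such an analysis could take is sketched below. It is an illustration under stated assumptions and not a description of HSI's methods: the shipment records, the field names, the commodity code and the threefold threshold are all hypothetical. The idea is to flag consignments whose declared value per kilogram is far out of line with other consignments of the same commodity.

```python
from statistics import median


def flag_unit_value_outliers(shipments, factor=3.0):
    """Flag shipments whose declared value per kg is more than `factor`
    times above or below the median for the same commodity code."""
    by_code = {}
    for s in shipments:
        by_code.setdefault(s["code"], []).append(s["value_usd"] / s["weight_kg"])
    medians = {code: median(values) for code, values in by_code.items()}

    flagged = []
    for s in shipments:
        unit_value = s["value_usd"] / s["weight_kg"]
        typical = medians[s["code"]]
        if unit_value > typical * factor or unit_value < typical / factor:
            flagged.append(s["id"])
    return flagged


# Hypothetical declarations, for illustration only.
shipments = [
    {"id": "A", "code": "fasteners", "value_usd": 500.0, "weight_kg": 100.0},
    {"id": "B", "code": "fasteners", "value_usd": 520.0, "weight_kg": 100.0},
    {"id": "C", "code": "fasteners", "value_usd": 480.0, "weight_kg": 100.0},
    {"id": "D", "code": "fasteners", "value_usd": 50.0, "weight_kg": 100.0},
]
flagged = flag_unit_value_outliers(shipments)
print(flagged)
```

A flag of this kind is only a prompt for further investigation; legitimate trade also produces outliers, and a comparison by weight or volume against the declared material might be more telling in a case such as the gold example above.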

Evolution of Money-Laundering Techniques 

As we move into the digital age, value transfer transactions are becoming 

more sophisticated, particularly with the evolution of crypto-currency and 

cryptography. Money laundering is an extremely dynamic activity, which law 

enforcement has to keep on top of. It is exceptionally fast-paced, and situations



change quickly. The comparisons and distinctions drawn between bitcoin, virtual 

currencies and crypto-currency, for example, are evolving, and these currencies 

must be taken seriously: they are value transfer mechanisms with real value 

being traded. Such value does not exist in a Ponzi scheme or other type of pyramid 

scheme where the ‘real’ value is completely ethereal. 

Crypto-currency has been coupled with the Darknet – ‘hidden’ or private 

networks on the Internet that can only be accessed by those who are invited to 

make connections, and are generally associated with illegal or dissident activity. 

The Silk Road in particular has been discussed frequently in recent months; it is 

one of many sites operating on a TOR (previously The Onion Router) Network, 

which conceals a user’s location from anyone conducting surveillance. This is 

attractive to criminals because it offers anonymity and decentralised control. 

The flow of value through the Darknet – the buyer and seller value – on the 

Silk Road is something that needs to be looked into, as it underlies how crypto-currency 

can flow in transactions. 

There is no reason why a criminal could not take that flow to supplant the transfer 

of value seen in the trade transparency route. Why would he need to melt down 

gold into nuts and bolts if he could just use digital currency to transfer value? It 

is really a matter of acceptance – and when is the tipping point going to come 

when this form of value transfer becomes more acceptable? If one looks back at 

currency as a symbolic form of value, the same thing is happening with digital 

currencies. The US dollar is backed by the US government, which guarantees it 

and gives people confidence in its value. At present, that confidence does not 

exist in the crypto-world but it may come: in a service-based world, if a criminal 

could transfer the commodity or the service, using cryptography, it would give 

them autonomy and anonymity (no government scrutiny or taxes to pay). Would 

that be something that would interest the criminal fraternity? Probably, and it 

is possible to do. 

Big Data: Privacy and Consistency 

Anomalies in trade data are often the most useful indication of which Big 

Data is of value to HSI. If someone is seen to be exporting gold from the US or 

Colombia, this can be flagged up as an anomaly. Normally coffee or sugar is 

seen along this route; it should be asked why large amounts of gold should be 

exported from areas that do not have gold mines. HSI works in conjunction 

with other law enforcement agencies, customs and border protection, and with 

the State Department, Department of the Treasury and so on, to share and 

analyse this sort of information. 
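
The kind of route-level check described above can be sketched in a few lines of Python. This is a minimal illustration only and does not describe HSI's actual tooling: the function name flag_unusual_shipments, the min_share threshold and the shipment records are all hypothetical, and a commodity is treated as anomalous simply when it has rarely or never been recorded on that route before.

```python
from collections import Counter, defaultdict


def flag_unusual_shipments(history, new_shipments, min_share=0.01):
    """Flag shipments whose commodity is rare or unseen on their route.

    history and new_shipments are lists of (origin, destination, commodity)
    tuples. min_share is a hypothetical threshold: a commodity is unusual if
    it accounts for less than this share of past shipments on the route.
    A route with no history at all is also flagged.
    """
    route_counts = defaultdict(Counter)
    for origin, destination, commodity in history:
        route_counts[(origin, destination)][commodity] += 1

    flagged = []
    for origin, destination, commodity in new_shipments:
        counts = route_counts[(origin, destination)]
        total = sum(counts.values())
        share = counts[commodity] / total if total else 0.0
        if share < min_share:
            flagged.append((origin, destination, commodity))
    return flagged


# Made-up records for illustration: the route normally carries coffee and sugar.
history = [("CO", "US", "coffee")] * 60 + [("CO", "US", "sugar")] * 40
new_shipments = [("CO", "US", "coffee"), ("CO", "US", "gold")]

flagged = flag_unusual_shipments(history, new_shipments)
print(flagged)
```

With these made-up records only the gold shipment is flagged, because the route's history contains nothing but coffee and sugar. A real system would also need to weigh value, volume and seasonality, which this sketch ignores.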

Looking to the Future 

The future of money laundering throws up some very tricky challenges. Anything 

that can be mathematically defined can be transferred into the Darknet. If

criminals come up with a conspiracy to commit malicious acts, what is stopping 

others from transferring them the funds that will enable them to carry them out? 

Who is going to see it and how difficult is that going to be for law enforcement? 

What does it mean to nation states if criminals don’t need to rely on conventional 

currency or legitimate banks to guarantee transactions? Furthermore, what 

does this mean for the rule of law: if a criminal can enter into a contract with 

someone that law enforcement agencies can mathematically define, will a court 

even be needed to resolve the conflict and, if not, what effects will this have on 

justice in the future? Researching the implications of these possibilities now will 

help governments and their law enforcement agencies to better understand the 

challenges when – rather than if – they need to be faced. 

Interagency Working 

Crime-fighting techniques need to be continually assessed by objective 

performance measures so that best practices can be identified. Relevant 

performance measures include efficiency, effectiveness, capacity, 

responsiveness, trust and confidence. In the Big Data context, these 

performance measures need to be weighed against an organisation’s 

ability to collect, store, analyse and disseminate information internally and 

externally. Information sharing lies at the core of HSI’s ethos. 

When HSI was created, the challenge was how to ensure that the previously 

disparate law enforcement agencies from which it sprang were able to interact 

in a way that would not adversely affect performance. Similar challenges are 

being encountered in the UK with the creation of multiple new agencies, as 

noted above. Thus, it becomes critical to assess new policies, programmes 

and strategies against these performance measures. 

In the US, there are a number of different criminal investigative agencies 

at the federal level. Quantitatively, HSI, the Federal Bureau of Investigation 

(FBI), the Drug Enforcement Administration (DEA), the US Secret Service 

(USSS) and the Bureau of Alcohol, Tobacco, Firearms and Explosives (ATF) 

occupy most of this space. For many, HSI is less well-known, but it is the 

second largest agency in this class next to the FBI, which in the post-9/11 

environment is really a hybrid enforcement and domestic intelligence 

agency. Despite divergent missions, each of these agencies is empowered to 

investigate many of the same laws. Thus interoperability becomes essential, 

though parochial attitudes and proprietary interests still exist. 

The best way of achieving optimal performance is to ensure that separate 

agencies work closely, and well, together. A lack of coherent integration will 

degrade the ability to share information and lead to negative outcomes. 

Based on professional experience and conversations with prosecutors;

defence attorneys; military personnel; local, state and federal police officers; 

and criminal defendants, five characteristics can be identified that are present 

in, and adversely impact, the investigative capabilities of law enforcement. 

Certainly, other characteristics may exist, but these appear dominant. These 

five characteristics are defined as follows: 

1. Interagency conflict: the real or perceived incongruity of agencies’ 

interests that detrimentally affects the performance of one or both. 

Such conflict can materialise in different forms, such as interagency 

rivalry, mistrust or malfeasance; one example of this can be seen in 

the law enforcement context, when, during a joint operation, the 

participating agencies vie for control or credit for an action or case 

2. Redundancy: the duplication or repetition of action; one example of 

this can be seen in the law enforcement context, when two or more 

agencies participate in an investigation and unnecessarily perform 

the same or similar tasks 

3. Data fragmentation: the collection and segregation of information 

in a way that prevents its sharing; one example of this can be seen 

in the law enforcement context, when one agency has information 

regarding a suspect that may be of value to another agency and does 

not or cannot provide access to the information or make the other 

party aware of the information 

4. Jurisdictional foreclosure: the inability to enforce a law because of 

lack of authority or resources 

5. Violation of civil rights: the deprivation of rights belonging to an 

individual, including civil liberties, due process, equal protection 

of the laws, and freedom from discrimination through an act or 

omission to act by law enforcement. 

In relation to Big Data, the fragmentation of data becomes the universal issue. 

The other characteristics, primarily civil rights violations, can be present, but to 

a lesser extent. The centralisation of data enhances the work product derived 

from the collection, storage, analysis and dissemination cycle. A de facto or 

de jure centralised command structure is needed to foster the integration of 

the disparate elements. It is not wise to decentralise operations into small 

autonomous units because they will become unco-ordinated and perform at a 

less than optimal or ‘dysfunctional’ level when compared with the centralised 

model. Recognition of interdependence becomes paramount. 

In the context of this forum, centralisation creates an economy of scale 

and management scheme for Big Data. So what becomes crucial is not the 

performance of entities per se, but the construct employed to collect, store, 

analyse and disseminate information in a manner that will generate synergy. 

HSI has recognised this and participates actively in multiple ‘data crunching’ 

fora.

Directly on point is HSI’s Trade and Transparency Program. Under this 

initiative, HSI works jointly with customs agencies worldwide to share and 

analyse trade data for anomalies. 

HSI established the Trade Transparency Unit to conduct ongoing analysis 

of trade data provided through partnerships with other countries’ trade 

transparency units. One of the most effective ways to identify instances 

and patterns of trade-based money laundering is through the exchange 

and subsequent analysis of trade data for anomalies that would only be 

apparent by examining both sides of a trade transaction. The unit is formed 

when the US and any of its trading partners agree to exchange trade data 

for the purpose of comparison and analysis. Using state-of-the-art software 

and proven investigative techniques, the unit can easily identify previously 

invisible trade-based alternative remittance systems and customs fraud. 3 
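
Comparing both sides of a trade transaction can be illustrated with a short Python sketch. It assumes, purely for illustration, that each shipment carries a reference shared by both customs authorities and that declared values are in the same currency. The function name compare_mirror_records, the 10 per cent tolerance and the figures are hypothetical and do not describe the software the unit actually uses.

```python
def compare_mirror_records(exports, imports, tolerance=0.10):
    """Compare declared export values with the partner's declared import values.

    exports and imports map a shared shipment reference (an assumption made
    for this sketch) to a declared value in the same currency. tolerance is a
    hypothetical relative threshold for an acceptable difference.
    Returns a list of (reference, export_value, import_value, reason).
    """
    anomalies = []
    for ref in sorted(set(exports) | set(imports)):
        exp_value = exports.get(ref)
        imp_value = imports.get(ref)
        if exp_value is None or imp_value is None:
            # The shipment was declared on one side of the border only.
            anomalies.append((ref, exp_value, imp_value, "missing on one side"))
            continue
        larger = max(exp_value, imp_value)
        if larger > 0 and abs(exp_value - imp_value) / larger > tolerance:
            # The two declarations disagree by more than the tolerance.
            anomalies.append((ref, exp_value, imp_value, "value mismatch"))
    return anomalies


# Made-up declarations for illustration.
us_exports = {"A1": 100_000, "A2": 50_000, "A3": 20_000}
partner_imports = {"A1": 101_000, "A2": 5_000, "A4": 75_000}

anomalies = compare_mirror_records(us_exports, partner_imports)
for row in anomalies:
    print(row)
```

Neither country's data set looks unusual on its own here; the discrepancies only appear once the two are laid side by side, which is the point made in the text about examining both sides of a transaction.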

To facilitate the creation and management of Big Data, agencies need to 

integrate in some way. This integration can occur at different levels, consisting 

of recognition, co-ordination, collaboration, community, consolidation and 

merger. Any one of these is better than nothing and, realistically, community 

is as far as most agencies can go without legislative intervention. These 

phases are defined as follows: 

• Recognition: the confirmation of existence which occurs in the Big 

Data context when one agency acknowledges that another agency has 

the authority to perform a particular act and has relevant information 

that the other agency may or may not have 

• Co-ordination: the act of confirming concurrent jurisdiction and 

agreeing to separate areas of enforcement to reduce redundancy, but 

agreeing to respond to requests for information 

• Collaboration: the act of working together in a joint operation 

and sharing information, but not granting open access – agency 

participation in a task force or a memorandum of understanding 

being examples of this 

• Community: the act or process of openly sharing resources or 

information among several entities with some restriction 

• Consolidation: the act or process of sharing information without 

restriction 

• Merger: the fusion of disparate entities into a single entity. 

It is important to note that this type of integrative scheme, regardless of the 

level, elicits claims of privacy invasions and civil-rights violations. Perhaps 

ironically, the consolidation of data or the creation of Big Data can actually 

minimise incursions into personal privacy. Counterintuitively, this shrinks 

government and mitigates violations. 

3. ICE, Trade Transparency Unit, accessed 18 July 2014. 

In the US, when the Founding Fathers were looking at how to configure 

the new union, the initial scheme was decentralised, or ‘anti-federalist’ 

in the parlance of the era. This resulted in the drafting of the Articles of 

Confederation and the creation of thirteen disparate, EU-style states. This 

scheme ultimately failed, mostly because individual state interests trumped 

the collective good. More on point was that it led to the institution of thirteen 

different policies and programmes. Today, there are fifty states and if the 

US were still decentralised, there would be fifty different, and potentially 

incongruous, regulatory programmes. A positive derivative of integration is 

that it organises the collection, storage, analysis and dissemination of Big 

Data, thus making the work product more effective and the process more 

efficient, capable, responsive and trustworthy. Therefore, privacy and civil 

rights infringements are not inherent to Big Data schemes. Ideally, the other 

negative characteristics should be minimised as well. 

The need for a shared vision and for unified approaches is even more 

apparent in the digital age, where information is exponentially propagated 

with each passing day. Couple this with advances in web technology whereby 

users can remain anonymous, or at least pseudonymous. Currently, the 

Darknet and the use of cryptographic algorithms present emerging threats 

and create a new dimension for traditional criminal enterprises. The joint 

investigation into the online black market site Silk Road, headed by HSI and 

the FBI, provides a good example of this. 

HSI recognises this and that there will be future, presently unconceived, 

advances in criminal practice. Such spectres are daunting, but not intimidating 

or insurmountable when the infrastructure and partnerships to confront 

them already exist. This is the message of HSI. 

Greg Mandoli is a special agent with the Department of Homeland Security’s 

Homeland Security Investigations (HSI) and is currently assigned to the US 

Embassy in London. His related professional activities include eight years 

as an army reservist in the Judge Advocate General Corps and positions at 

the University of Maryland as a Course Developer and Adjunct Associate 

Professor. In 2006, Greg became the first HSI agent to graduate from the 

Naval Postgraduate School’s Master of Arts programme in Homeland Defense and 

Security. In 1994, Greg graduated from Golden Gate University School of 

Law with the recognition of a public-interest law scholar. Before becoming a 

special agent, Greg practised law as a Deputy Public Defender in California 

where he handled felony matters involving homicide, three strikes, gang and 

drug offences.

This paper is a summary of topics presented by Homeland Security 

Investigations Special Agent and University of Maryland Adjunct Associate 

Professor Gregory Mandoli at the RUSI/STFC event ‘Big Data for Security 

and Resilience: Challenges and Opportunities for the Next Generation of 

Policy-Makers’. The paper represents his personal viewpoints and is partially 

based on previously authored materials.

IV. Characteristics of Terrorist Finance Networks: 

The Human Element 

Neil Bennett 

In Chapter III, Gregory Mandoli writes about Big Data and financial 

transactions, and this paper will return to the subject while taking a slightly 

different perspective – focusing on the benefits of linking money flows to 

human behaviour and human activity. This will show how analysis of data 

by social scientists as well as data analysts can support the identification of 

important individuals within a network. 

The aim of this conference has been to understand how academics, 

researchers and policy-makers can utilise Big Data. As noted, definitions of 

big data commonly refer to the four Vs: volume, variety, velocity and veracity. 1 

This paper will attempt to give a perspective from the operational, end-user 

requirement: that a wide variety of data are available in ever-increasing 

volumes. This presents challenges as to how those data are stored, by whom 

they are analysed, and why. The paper will describe operational challenges 

faced by law enforcement and defence, focusing on the operational 

opportunities and outputs. 

The end goal of data analysis is improved efficiency. Efficiency is effectiveness 

driven by an exploitation path towards the operational outcomes and, in turn, 

towards end use. Financial intelligence and terrorist finance can be used as the 

lens through which this process is viewed. Why finance? Law enforcement, 

defence, the UK government as a whole, as well as other governments around 

the world, see financial interventions as interventions of first choice in the 

fight against international crime. A perfect example of this is the recent case 

involving Ukraine, 2 in which the UK, together with the EU, imposed restrictive 

measures – financial sanctions – upon eighteen (later extended to twenty-two) 

Ukrainian former regime members for misappropriation of Ukrainian 

state funds. The sanctions prevented the politicians from accessing assets 

or funds held by European financial institutions, a significant move as many 

Ukrainian and Russian politicians hold money in accounts in Luxembourg 

and the Netherlands in particular. 

This paper will focus specifically on the alternative remittance system Hawala. 

It is worth bearing in mind here that the system of banking we recognise as 

official today was first established in the 1700s, and at its oldest can only 

be traced back as far as the financial institutions of Italy in the fourteenth 

century. As Hawala arose in the early Medieval period, which system should 

be categorised as the ‘alternative’ remittance system is open to debate. 

1. IBM, ‘The FOUR V’s of Big Data’. 

2. HM Treasury, ‘Financial Sanctions, Ukraine (Misappropriation and Human Rights)’, 

15 April 2014. 

There is nothing wrong with Hawala. It is the abuse of Hawala, not the system 

itself, that causes issues: those using Hawala systems are not automatically 

guilty. Therefore, the question to ask is how the manipulation of data obtained 

from Hawala transactions can assist policy-makers and law enforcement in 

making the right decisions about the right people and the right entities to go 

for. This requires a number of different issues to be considered: behaviour, 

attitude, language, customs, values, beliefs, influence, institutions, power – 

political, economic, legal – social structures, clan, tribe and ideology. These 

issues are critical to understanding the situation in areas of the developing 

world in which law enforcement or defence is expected to operate, often 

in support of the Foreign Office, DfID and the Stabilisation Unit. If that 

understanding is not in place from the beginning, the decision-making 

process may be flawed. Complex human and cultural dimensions play a 

large part in any decision-making cycle, and there are both inter- and intra-dependencies 

between these and the cultural, institutional, technological 

and physical environment. 

Understanding Trust in Networks 

Individuals who are involved with and control money or value are extremely 

highly trusted within a network, but how can these levels of trust be assessed, 

identified and quantified? A Hawala transaction may move across South 

Asia, through Afghanistan, Pakistan, Iran and the Gulf States. Each stage of 

the transaction will include different languages, currencies and methods 

of communication, including fax, e-mail, mobile-phone calls and Internet 

communications. The human activity taking place within those environments 

is incredibly complex even before the Hawala aspect is considered, and 

comprises unstructured information and inherent knowledge as well as 

data. Understanding this complexity is critical to understanding the decision-making 

cycle. What information is in the ledgers and what does this actually 

mean – all the while bearing in mind that the information may be on paper, 

rather than in electronic format? 

Another key question to ask is why we look at money, and in what context 

we look at it. It is important here to understand two particular elements. 

First, there is the threat environment, which is a combination of interacting 

elements. We need to understand the systematic dimension of the threat 

from cradle to grave, all along the line of process, during which there will be 

different data in different formats: structured, unstructured, paper or digital. 

In order to understand the system, all those different inputs need to be 

understood and made sense of. Ultimately, this will require an enterprise of

transnational co-operation: one analyst, or even one organisation, cannot do 

everything and therefore an approach is needed that allows the generation 

of best impact and best effort. 

One of the ways in which this is done is by breaking the process out into 

functions, understanding what the vulnerabilities are and understanding 

what actions are needed to generate effect against the vulnerabilities that 

have been identified as critical within the system or enterprise. This is only 

possible if a huge variety of data can be understood, which in turn requires 

data to be taken in from all of the different environments. 

The critical issues here are impact and benefit analysis: whether those data 

can be used to predict what may occur, and what particular action should be 

taken against them. This requires a huge range of hard and soft factors to be 

considered, including: 

• Social media 

• High performance analytics 

• Sociology 

• Link and entity extraction 

• Natural language processing 

• Anthropology 

• Semantic search 

• Node disambiguation 

• Graph databasing 

• Pattern and prediction 

• Visualisation 

• Fuzzy link analysis 

• Machine learning 

• Psychology 

• Web science 

• Predictive modelling 

• Linguistic analysis in microblogs. 

Industry and academia are approaching the challenges from all of these 

angles, and are working on ways to ensure that they work more coherently 

together. Sociology, anthropology and psychology are singled out from the above 

list as three disciplines that academics working in the area of machine learning 

in particular have indicated that they do not always consider. They feel that 

they would be well served by a better understanding of how their work could 

benefit from or impact on some of these disciplines – in particular psychology. 

Certainly in the case of link and entity extraction (which extracts key entities such 

as names, locations, terms and dates and links them together), language 

processing and semantic search, they would benefit from considering the 

issues again from human, cultural and behavioural perspectives. How do

data analysts really understand what is going on in those environments, 

so that the right decisions can be made based on the right interpretation 

of information? Without this, bad decisions may be made that deliver an 

inappropriate and even damaging intervention, be it tactical, operational 

or strategic, because those making the decision did not understand human 

behaviour. 
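
To make the idea of link and entity extraction concrete, the following Python sketch pulls people, places and dates out of short texts and links any two entities that appear in the same text. It is a deliberately crude illustration under stated assumptions: the lookup lists KNOWN_PEOPLE and KNOWN_PLACES, the function names and the sample sentences are all hypothetical, and operational systems rely on trained language models rather than fixed word lists. It also shows why the human and cultural context discussed above matters, since a co-occurrence link says nothing about what the relationship between two entities actually is.

```python
import re
from collections import Counter
from itertools import combinations

# Hypothetical hand-made lookup lists, used only for this illustration.
KNOWN_PEOPLE = {"Alpha", "Bravo"}
KNOWN_PLACES = {"Lisbon", "Oslo"}

MONTHS = ("January|February|March|April|May|June|July|August|"
          "September|October|November|December")
DATE_PATTERN = re.compile(r"\b\d{1,2} (?:" + MONTHS + r") \d{4}\b")


def extract_entities(text):
    """Return the set of (entity_type, value) pairs found in one text."""
    entities = set()
    for token in re.findall(r"[A-Za-z]+", text):
        if token in KNOWN_PEOPLE:
            entities.add(("person", token))
        elif token in KNOWN_PLACES:
            entities.add(("place", token))
    for date in DATE_PATTERN.findall(text):
        entities.add(("date", date))
    return entities


def link_entities(documents):
    """Count how often each pair of entities appears in the same text."""
    links = Counter()
    for text in documents:
        for first, second in combinations(sorted(extract_entities(text)), 2):
            links[(first, second)] += 1
    return links


# Made-up sentences for illustration.
documents = [
    "Alpha sent funds to Bravo in Oslo on 3 March 2014.",
    "Bravo met Alpha in Lisbon.",
]

links = link_entities(documents)
for pair, count in links.most_common(3):
    print(pair, count)
```

The pair that co-occurs most often is the natural starting point for an analyst, but deciding whether that link reflects trust, kinship, commerce or coincidence is a question for the social sciences rather than for the extraction step.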

There is a huge amount of powerful, fascinating work that can be done with 

large, structured data sets, but the real challenge is ensuring that the high-performance 

analytics that support it keep getting faster. The ability to do this mainly 

depends on how the unstructured feeds can be integrated, be these from 

social media or from the human characteristics identified by psychology, 

anthropology and sociology that actually allow the development of a 

holistic perspective of predictive modelling. Finance is a good way of trying 

to understand certain characteristics of human behaviour, and so, from a 

research perspective would be a good place to start building up a better 

understanding of not only the finance networks themselves, but also the 

terrorist and criminal networks that sit behind them. 

Neil Bennett has moved on from his role since this conference, and RUSI 

has not been able to contact him to approve this paper for publication. We 

therefore apologise for any errors it may contain, and stress that these are 

the responsibility of the editorial process, not the speaker.

V: Terrorism and Political Risk Modelling 

Mark Lynch 

This paper will discuss the insurance industry’s assessment of, and dealings with, 

Big Data, using the specific example of risk modelling around the threat and 

likely impacts of political violence. It will provide a brief overview of how the 

insurance industry approaches the challenge of political violence, including 

how analytics have started to become a far more dominant component of 

this, and will show how Big Data is starting to filter in. It will then explore 

some of the challenges that are starting to emerge. 

The approach of the insurance industry to political violence is a particularly 

interesting example to consider as it resembles a microcosm of how business 

in general has dealt with Big Data, and also because the lessons learned by the 

insurance industry have a lot to offer other sectors with regard to resilience. 

Without insurance, the impact of a terrorist attack or widespread political 

violence would be greatly amplified. Terrorist violence can damage health, 

property and vehicles; result in interruption to businesses; and require 

compensation payments to those affected. Insurance is integral to enabling 

the reconstruction of buildings after attacks, facilitating payments to the 

families of those killed and seriously injured, and ensuring that victims are 

able to access disability benefits and other services as quickly as possible. 

As a result, the insurance market provides a fundamental component 

of resilience in an increasingly interconnected world. 1 Furthermore, the 

insurance industry holds a lot of Big Data, as it is very useful to the market 

to understand the composition of claims and the spread of insurance, and 

to identify indices that would establish whether an individual is more likely 

to be at higher risk of losses (not exclusively tied to terrorism insurance). 

For example, it can be used to look at which areas were not insured or what 

claims were made for post-traumatic stress disorder following a terrorist 

attack. These aspects could be extremely useful in helping to make future 

resilience assessments, as they can help to highlight where vulnerabilities 

are more likely to occur. Such data held by the insurance industry could 

provide a rich vein of information to academic, medical and governmental 

organisations if greater interaction was prioritised. 

What Constitutes Political Violence? 

The insurance industry has very specific terminology and definitions of 

what constitutes political violence. First, it segregates political violence into 

three components that can be insured individually or in conjunction with 

others: terrorism or sabotage; strikes, riots and civil commotion; and war 

on land. This scale becomes very important when writing exemptions and 

incorporating this into recovery. Each area offers a distinct challenge to the 

insurance industry as a result. 

1. Claudia Aradau and Rens van Munster, ‘Insuring Terrorism, Assuring Subjects, Ensuring 

Normality: The Politics of Risk after 9/11’, Alternatives: Global, Local, Political (Vol. 33, 

No. 2, 2008), pp. 191–210. 

Political violence affects multiple business lines that are vital for the insurance 

industry. For example, if there was a blast at a theatre in London, the loss of 

revenue in the subsequent two years as a result of people not wanting to go 

to the theatre out of fear could have a massive effect. The market calls this 

contingency insurance and as a result the insurance market can be obliged to 

pick up the losses. Indeed, a number of studies have identified the material 

effect of terrorism on the tourism industry as international travellers are more 

likely to avoid perceived high-risk areas. 2 Similarly, business interruption 

has proven to be a key driving force for terrorism losses for the insurance 

industry and was the driving force behind the losses the insurance industry 

suffered in the wake of 9/11. 3 The sheer multitude of claims that were paid 

by the insurance sector following the 9/11 attacks highlights just how many 

disparate areas terrorism can affect within the insurance market. 

The overarching factor is that there are many unknowns and therefore 

many risks can occur that can have a harmful effect on the market. This is a 

particular challenge at present, when the insurance market is trying to expand 

into emerging markets such as Asia and Africa, where knowledge levels 

on these kinds of risks are very limited. Ironically, as the insurance market 

penetrates further into emerging markets, it needs to be able to calculate 

its own insurance policies, based on an understanding of the risks likely to 

be encountered. Indeed, among the fastest growing insurance markets in 

the world, seventeen out of twenty have suffered from either a sustained 

terrorism threat or from intensive rioting or civil commotion over the last 

ten to fifteen years. 4 Therefore, as insurance markets grow, the emphasis on 

understanding this risk will grow significantly. 

New Approaches to Risk 

The greatest driving catalyst for the insurance industry in approaching these 

challenges was the 1993 Bishopsgate bomb attack by the IRA. That blast, 

which killed only one person but caused about £3-billion-worth of damage, 

almost crippled the whole sector. 5 Prior to this disaster, the sector had used 

very limited analytical capabilities to quantify the threat: companies bought 

insurance without a keen understanding of the accumulations of risk that 

were developing. This is because terrorism operates in a unique manner 

compared with traditional perils such as earthquakes, floods and hurricanes 

that the market is used to dealing with regularly. 6 Terrorism is an intensive, 

highly localised threat that requires a keen understanding of the proximity of 

risks to each other and the epicentre of a given blast, which is something that 

was quite new to the insurance market. As a result, a number of companies 

retained large clusters of policies around central London, and following the 

Bishopsgate bomb attack many were unable to cover the losses stemming 

from it. This was exacerbated by 9/11, which cost the industry £22 billion, a 

figure which is only going to grow owing to further claims for debris inhalation 

and continuing incapacity and post-traumatic stress disorder claims. 7 This is 

a severe issue for the insurance market. 

2. Sevil Sonmez and Alan R Graefe, ‘Influence of Terrorism Risk on Foreign Tourism 

Decisions’, Annals of Tourism Research (Vol. 25, No. 1, 1998), pp. 112–44. 

3. Dixon, Lloyd and Kaganoff. 

4. Ernst & Young, Waves of Change: The Shifting Insurance Landscape in Rapid-Growth Markets, 

2014. 

5. Andrew Silke, ‘Understanding Terrorism Target Selection’, in A Richards, P Fussey and 

A Silke (eds), Terrorism and the Olympics: Major Event Security and Lessons for the 

Future (London: Routledge, 2010), pp. 49–66. 
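
The accumulation problem described above, in which many policies sit close to one another and to a possible blast, can be sketched in Python as follows. This is an illustrative simplification rather than any insurer's actual model: it assumes locations are given as planar (x, y) coordinates in metres, treats every policy inside a fixed radius as fully exposed, and uses hypothetical names (exposure_within_radius, worst_accumulation) and made-up insured values.

```python
import math


def distance_m(a, b):
    """Straight-line distance between two (x, y) points given in metres."""
    return math.hypot(a[0] - b[0], a[1] - b[1])


def exposure_within_radius(policies, centre, radius_m):
    """Total insured value of policies located within radius_m of centre."""
    return sum(
        policy["insured_value"]
        for policy in policies
        if distance_m(policy["location"], centre) <= radius_m
    )


def worst_accumulation(policies, radius_m):
    """Treat each insured location as a candidate blast centre and return
    the (location, total exposure) pair with the highest exposure."""
    candidates = (
        (policy["location"],
         exposure_within_radius(policies, policy["location"], radius_m))
        for policy in policies
    )
    return max(candidates, key=lambda item: item[1])


# Made-up portfolio: three clustered policies and one distant policy.
policies = [
    {"id": "P1", "location": (0, 0), "insured_value": 50_000_000},
    {"id": "P2", "location": (150, 0), "insured_value": 80_000_000},
    {"id": "P3", "location": (300, 0), "insured_value": 40_000_000},
    {"id": "P4", "location": (5000, 5000), "insured_value": 30_000_000},
]

worst = worst_accumulation(policies, radius_m=200)
print(worst)
```

No single policy in this made-up portfolio exceeds 80 million, yet a single event centred on the middle of the cluster would expose 170 million, which is the kind of concentration that went unmeasured before Bishopsgate.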

The Impact on Society 

Any attack is terrible; however, the misery stemming from an attack can 

be compounded greatly if there is no financial restitution for individuals 

to cover medical bills or the reconstruction of their businesses or homes. 8 

If the insurance market thinks political risk and terrorism are too risky, it 

will not offer insurance against it or it will put exemptions into insurance 

policies so that those affected will have to pay for damages themselves. 

Such exemptions may cover certain types of incidents or certain areas that 

are seen as being at high risk, or providing cover for these examples may 

push up premiums considerably. Insurance companies see this frequently 

with the issue of chemical and biological weapons: because there are various 

unknown factors in this field, insurance companies are reluctant to include 

this within their coverage as there are too many uncertainties associated 

with such an attack. The resilience challenge is significant, as the lack of 

available insurance may be more to do with a lack of analytical capabilities 

to assess these risks properly than a genuine inability to calculate likely risks 

and their impacts. 

As a result, it is clear that better interaction with the insurance sector is key 

to providing a holistic approach to resilience. Government agencies should 

seek to avoid a situation comparable to earthquake cover in California, 

where market penetration has historically been extremely low as the cost of 

insurance is prohibitively high for most people and insurers are reluctant to 

6. H Kunreuther, E. Michel-Kerjan and B. Porter, Assessing, Managing, and Financing 

Extreme Events: Dealing with Terrorism (Cambridge, MA: National Bureau of Economic 

Research, 2003). 

7. Gail Makinen, Economic Effects of 9/11: A Retrospective Assessment (New York, NY: 

DIANE Publishing, 2011). 

8. R Roth Jr, Earthquake Insurance Protection in California (Washington, DC: Joseph 

Henry Press, 1988).



be over-exposed within the California area. In order to avoid this, a number 

of governments, including those of the UK, US and Germany, have provided 

at least partial state backstops against terrorist attacks. 9 However, it is only 

through better analysis and a grasp of Big Data that the insurance sector can 

truly feel comfortable with the risk of terrorism. 

Incorporating Big Data into Analysis 

In previous years, risk analysis could often be a case of underwriters 

assessing the risk based on a preconceived understanding of political unrest 

following an extremely rudimentary analysis. However, over the last twenty 

years this has changed considerably. Insurance companies have started 

to hire scientists, statisticians and security experts who are able to better 

incorporate data into analysis. The risks from natural hazards, such as 

hurricanes or earthquakes, are now well understood by the market, and the 

understanding of political risk needs to reach similar levels in the future. 

Political risk is not only a less well-understood subject area, but there are 

also many more variables involved as it is a more qualitative subject. In order 

to incorporate analysis into political risk calculation, it is necessary to delve 

into the historical records as well as simply looking at present-day data. 

For example, Aon Benfield’s 2014 Interactive Political Risk Map shows how 

different countries’ histories of rioting and civil commotion over the past ten 

to fifteen years can be mapped out and analysed. 10 Analysts need to be able 

to identify different patterns for different regions, and subsequently provide 

this information to the insurance industry in order to flag up certain areas 

that are at greater risk than others. It is very easy to look back and decipher 

the risks at certain points in history, but the key progression would be to look 

forward. A very important part of this research is trying to pull data for GDP 

statistics and mortality for regions, and to see how those variables fit with 

incidents of political violence. Indeed, a number of statistical studies have 

shown that key variables such as unemployment and, in particular, infant 

mortality can be strong indicators of political unrest, particularly if there is a 

significant statistical shift. 11 Furthermore, historical analysis is extremely 

useful to the analysis of terrorism modelling, another area where there is 

much variation and uncertainty, particularly if plots and failed attacks are 

included. 

9. Alfonso Najera, Terrorism Coverage Schemes: A Comparative Table, 2011, , accessed 12 July 2014. 

10. Aon Risk Solutions, ‘Aon’s 2014 Interactive Political Risk Map’, 2014, , accessed 14 July 2014. 

11. J A Goldstone et al., ‘A Global Model for Forecasting Political Instability’, American 

Journal of Political Science (Vol. 54, No. 1, 2010), pp. 190–208.



Extracting Useful Data and Avoiding Biases in Analysis 

Good data do exist if analysts are fully aware of the advantages they offer 

and how to use them. The Global Terrorism Database 12 and the RAND 

Corporation 13 are two examples of highly respected organisations that take 

data accumulation and analysis seriously: both are heavyweight analysts of 

terrorism history and can provide a wealth of statistical data for the security 

and insurance sector. 

However, some of the challenges the insurance industry faces involve more 

quantitative data analysis methods. For example, there is a major issue 

surrounding the quality of data that the insurance companies themselves 

hold; they have vast amounts of data but not all of them are useful, 

particularly in emerging markets where geographical data are sparse. This 

makes a big difference when attempting to understand the risk; terrorism or 

even political violence is often a very localised threat, thus a correspondingly 

localised understanding is needed, from units of measurement to political 

parties to currency and financial transactions. Similarly, there are a 

number of internal constraints holding the industry back from providing a 

comprehensive understanding of the risk. Insurance companies do not share 

data with each other and as a result it is difficult to get a holistic picture of 

the degree of terrorism coverage, or indeed the types of clients taking up 

this cover. This is important because what is covered, what sort of 

policies are in place and the size of the clients themselves all have a material 

impact on the ability of a state to recover following a catastrophic event. 

Furthermore, there are many challenges regarding privacy, and governments 

or companies who are unwilling to provide sufficient data. Even where the 

data do exist, it is difficult to know how to sell them to the client to enable 

them to be used appropriately. 

Conclusions and Recommendations 

It is a great shame that insurance companies do not tend to have the 

means to look at these data analytically; if they did, their analytics would 

be substantially better, particularly those on loss history. There are a lot 

of data out there on the length of time people are off injured following a 

terrorist attack, the effect of post-traumatic stress disorder, the variation per 

country or the amount of time it takes for a business to recuperate after an 

attack. All of this would be extremely useful for the academic and scientific 

communities. The insurance industry would greatly benefit from the opening 

up of governments’ empirical data and greater involvement of government 

on the level of security clearance. The insurance sector is already heavily 

regulated on data security, owing to the sensitive financial data that it 

holds, so it would not be a giant leap to allow certain key representatives 

in the insurance industry clearance to access and dissect certain pieces 

12. Global Terrorism Database, , accessed 14 July 2014. 

13. RAND, , accessed 14 July 2014.



of classified material. While there may be concern among the public that 

private companies can access sensitive data and that their private insurance 

data could be looked at by the security services, these fears are largely 

unfounded. Most data that the insurance industry have are aggregated to a 

policy level so personal data about individual names or addresses are usually 

unavailable. Similarly, as long as the security services can vet individuals 

accessing the data and keep the numbers down, additional access to secure 

material should not be a significant hindrance. 

As a result, the interaction between government and the academic and 

insurance sectors could be extremely rewarding. Each sector has processed 

a significant amount of data points and analysis that is simply unavailable 

to the other. Providing analytical information about the primary threat levels, 

changing dynamics and targeting analysis to the insurance sector would 

allow the market to avoid the hyperbole that was witnessed in the insurance 

market following the 9/11 attacks. Conversely, information on 

market coverage and quantification of the time taken to recover from an 

attack, whether in terms of business interruption, medical rehabilitation or international 

comparisons, is held by the insurance sector and would be incredibly 

useful for government and academic partners. 

Big Data has begun to play a much more prominent role in the insurance 

industry. Whether this will have a positive or negative impact for clients is 

uncertain, but this has begun to be a more accepted branch of science among 

the community. Greater co-operation stimulated between the business and 

academic communities and government would enable a greater impact to 

be made in this field. 

Mark Lynch is Head of Impact Forecasting’s Terrorism and Political Violence 

Modelling Team. He has a background in international security analysis and 

counter-terrorism and is responsible for the composition of and academic 

input into Impact Forecasting’s human-security catastrophe models, including 

terrorism, rioting and drug cartel violence. Mark has a Master’s degree in 

International Security Studies from the Centre for the Study of Terrorism 

and Political Violence and he previously worked in the Royal United Services 

Institute’s National Security and Resilience Department. He has also worked 

with the London School of Economics, analysing violent manifestations of 

nationalism and has been published on the changing nature of nationalist 

and Islamic fundamentalist terrorism in the twenty-first century.

VI: Intelligent Use of Electronic Data to Enhance 

Public Health Surveillance 

Edward Velasco 

The exchange of health information on social media and the Internet would 

appear to offer obvious opportunities to gain insight into emerging disease 

outbreaks. With new initiatives like Google Flu Trends 1 and HealthMap, 2 there 

are now more ways than ever before to monitor outbreaks. This paper will 

explore the opportunities these initiatives offer to public health practitioners 

trying to detect emerging diseases in their regions. 

From an epidemiological perspective, there are obvious advantages to 

decreasing the time needed to detect an infectious disease health event, 

so that appropriate prevention or mitigation measures can be undertaken 

as quickly as possible. The existing types of public health surveillance 

systems are indicator-based and event-based surveillance. Indicator-based 

surveillance, the oldest and most commonly found system, is widely used by 

regional, national and international public health agencies. These systems 

are designed to collect and analyse structured data, based on protocols 

tailored to each disease, including calculating the incidence, seasonality and 

burden of disease. Their goal is to find increased numbers or clusters that 

might indicate a threat. There is generally a time lag between the occurrence 

of an event and its detection by indicator-based surveillance, however; these 

systems lack the ability to detect potential threats more quickly. In addition, 

they are not equipped to detect new or unexpected disease occurrences 

because they only collect predefined epidemiological attributes for each 

disease. This is why the first cases of Severe Acute Respiratory Syndrome 

Coronavirus (SARS-CoV) in Asia, for example, a new strain of viral infection, 

were not detected sooner. 3 

Instead of relying on official reports, event-based surveillance information 

is obtained directly from witnesses of real-time events or indirectly from a 

variety of communication channels, including social media and established 

alert systems, as well as from information channels such as the news 

media, public health networks and NGOs. Because it occurs in ‘real time’, 

event-based surveillance can identify events faster than indicator-based 

surveillance and can identify new events that will not be picked up by 

indicator-based surveillance. Health information monitored via the Internet 

and social media is an important part of event-based surveillance, and is 

1. Google Flu Trends, , accessed 14 July 2014. 

2. Healthmap, , accessed 14 July 2014. 

3. C Castillo-Salgado, ‘Trends and Directions of Global Public Health Surveillance’, 

Epidemiologic Reviews (Vol. 32, No. 1, 2010), pp. 93–109.



most often the focus of existing event-based surveillance systems. Research 

has shown that event-based surveillance identifies trends comparable to 

those found using established indicator-based surveillance methods. 4 In 

practice, however, event-based surveillance systems have not been widely 

accepted and integrated into mainstream use by national and international 

health authorities, mainly because they have not yet been systematically 

evaluated within a public health agency. 

From 2010 to 2012, the Robert Koch Institute, the national public health 

agency of Germany, participated in a multidisciplinary scientific consortium 

to develop novel methods for an event-based surveillance tool to be 

integrated into infectious disease monitoring alongside other national 

surveillance activities. The multinational team produced a web-based 

platform (the Medical Ecosystem or M-Eco) 5 to develop technologies that 

are new to event-based surveillance (and have not yet been featured in 

existing systems, as also evidenced by a literature review that was previously 

conducted). These include content analysis using enhanced data processing 

(including stemming, POS 6 tagging and named entity recognition) and data 

collection from various user-generated content resources – including social-media 

content (such as Twitter) and radio and TV media transmissions 

(transcripts provided by a special media service). 

Detection mechanisms were developed to scan the Internet continuously 

for these media types, based on the simple semantic (disease names and 

symptoms) and statistically relevant (search algorithms) epidemiological 

requirements that were deemed critical for the surveillance of different 

infectious diseases. Development of these functionalities resulted in a 

‘search function’ on a web-based user interface that enabled epidemiologists 

to monitor ‘mentions’ of diseases and symptoms on Twitter and news media 

(fed via news aggregation technology) over time, geo-located where possible 

to enable comparison with other sources of epidemiological information, 

including standard governmental infectious disease surveillance and 

monitoring. 

Automated technologies provided signals for the risk assessment of 

infectious disease events to public health epidemiologists in a user-friendly, 

rapid and easy way. Lastly, policy concerns regarding the integration of 

the developed technologies for existing public health infectious disease 

4. S Doan et al., ‘Global Health Monitor – A Web-Based System for Detecting and 

Mapping Infectious Diseases’, 2007; D M Hartley et al., ‘An Overview of Internet 

Biosurveillance’, Clinical Microbiology and Infection (Vol. 19, No. 11, 2013), pp. 1006– 

13; J P Linge et al., ‘Internet Surveillance Systems for Early Alerting of Health Threats’, 

Euro Surveillance: European Communicable Disease Bulletin (Vol. 14, No. 13, 2009). 

5. M-Eco, , accessed 14 July 2014. 

6. Part of speech tagging. An explanation of this process can be found at , accessed 14 July 2014.



surveillance infrastructures were explored. Integrating these technologies 

into the surveillance software of the German national health institute 

was a goal, with the potential to scale up to other countries based on this 

experience. 

Figure 2: Components and processing pipeline of the M-Eco system. 

Evaluation of the Prototype System for Event-Based Surveillance 

The first of a series of three evaluations attempted to illustrate how well 

the system generates signals for potential events of public health interest. 

A simulation with Twitter was conceived, where thirteen scientists created 

tweets for three mock infectious disease event scenarios within the 

simulation: 

• An outbreak of measles in a local school 

• An outbreak of salmonellosis among attendees of a European football 

championship 

• Cases of hepatitis A appearing in travellers returning to Germany 

from North Africa.



The mock tweets were fed into the M-Eco technology (Figure 2), which 

combined them with real-world tweets that were taken from real users of 

Twitter and subsequently analysed. There were fewer retrieved signals that 

referred to true outbreaks than expected: only around a third (31 per cent) 

were relevant, compared with the 75–80 per cent expected by evaluators. 

While it is difficult to say whether or not this is because of the lack of actual 

tweets matching the three scenarios, the main assumption from these results 

was that the chosen keywords for identification via search and screening 

algorithms were not comprehensive enough. This could be because many 

tweets are written in a vernacular that does not match the formal medical 

terms used in the preset keyword lists used in the automated analyses. 

A subsequent evaluation tested the hypothesis that the M-Eco technology 

could produce viable signals in real time during a large mass-gathering event. 

This assumption was tested during the European Football Championship, 

which took place in Poland and Ukraine in 2012. 

Signals were provided to ‘subscribers’ (epidemiologists at 

the Robert Koch Institute and at one state public health agency in Saxony, 

Germany), who received daily deliveries of signals that they then monitored 

for relevance alongside regular work. As in previous evaluation efforts, 

a lower number of signals was produced than was expected: only an 

average of twenty signals per day. There were 242 signals during 

the event and, of these, only thirteen were relevant over the event time. 

Similar problems with keywords or terms were recorded, such as the use of 

vernacular or the figurative use of terms, for example ‘football fever’, ‘weakness’ of 

players’ ability or ‘headache’ from watching poor performance. 
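The figures reported above can be turned into a single share of relevant signals, which makes the result easier to compare with the 31 per cent seen in the simulation. The short sketch below does only that arithmetic; the function name relevant_share is introduced here for illustration and is not part of the M-Eco system.

```python
def relevant_share(relevant, total):
    """Share of delivered signals that evaluators judged relevant."""
    if total <= 0:
        raise ValueError("total must be positive")
    return relevant / total


# Counts taken from the text: 242 signals during the event, 13 of them relevant.
share = relevant_share(13, 242)
print(f"{share:.1%}")  # prints 5.4%
```

On these numbers, roughly one signal in nineteen was relevant to the epidemiologists monitoring the event.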

An additional evaluation was completed over three weeks in order to measure 

the appropriateness of the developed system for daily epidemiological 

monitoring of infectious diseases and related symptoms relevant for 

Germany, using the M-Eco search interface. The evaluation exercise was 

based on criteria for inclusion and exclusion. Diseases that were deemed 

to be more prevalent in Germany were chosen (rarely occurring tropical 

diseases were not searched, for instance). Additionally, priority was given to 

those diseases and symptoms likely to be discussed in the general population 

via social media (because of popularity and general ubiquity) or those 

less likely to induce social stigma. Diseases that were deemed seasonally 

irrelevant for the time period, such as Tick-Borne Encephalitis (TBE), which 

primarily occurs in the summer months, were excluded. Other diseases were 

excluded because they occur so rarely that experts found a high likelihood 

of them remaining unmentioned on social media, for example Q-Fever, or 

because of their uncommon prevalence, or a faster or more severe onset of 

disease (and therefore higher likelihood to be detected by other sources) in 

Germany, for example Hemorrhagic (West Nile) Fever or tuberculosis.



Search terms were entered into the M-Eco search function. Each search 

term was allocated to one of four epidemiologists, and evaluators 

monitored their terms daily with regard to the number of resulting signals 

(matching the search term and defined by location – Germany); whether 

there was an indication of larger events (an outbreak); whether signals 

were relevant to their work; and whether the search results were found 

in another epidemiological surveillance source. Evaluators also provided 

qualitative feedback on their experience during the evaluation, focusing on 

the integration of such monitoring into their regular epidemiological work, 

general feedback and any required improvements. 

Signals came in primarily indicating influenza or flu. When graphed over time 

by date, it was clear that two large waves of signals came up for ‘flu’ and 

coincided with media coverage of ‘flu shots’, namely the fact that Germany had 

been experiencing a shortage of vaccination coverage. Another interesting 

dip, where no signals at all appeared, coincided with the weekend, indicating 

that tweets may have patterns that correspond to days of the week. 
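A day-of-week pattern of this kind can be checked by counting signals per weekday. The following is a minimal Python sketch under stated assumptions: the list of signal dates is invented for illustration (the evaluation dates are not given here), and signals_by_weekday is a hypothetical helper rather than an M-Eco function.

```python
from collections import Counter
from datetime import date

WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday"]


def signals_by_weekday(signal_dates):
    """Count signals per weekday, including weekdays with no signals at all."""
    counts = Counter(d.weekday() for d in signal_dates)  # Monday is 0
    return {name: counts.get(i, 0) for i, name in enumerate(WEEKDAYS)}


# Invented example dates; none falls on a Saturday or Sunday.
signal_dates = [
    date(2012, 11, 5), date(2012, 11, 5), date(2012, 11, 6),
    date(2012, 11, 9), date(2012, 11, 12),
]

for day, n in signals_by_weekday(signal_dates).items():
    print(f"{day:<9} {n}")
```

Reporting zero counts explicitly matters here, because the finding of interest is the absence of signals at the weekend rather than a peak on any one day.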

Figure 3: Monitored signals for the search terms in the M-Eco search 

function over time. 

Note: Black line shows the trend for all signals. 

The evaluation showed that search terms used primarily by medical 

professionals were most prevalent, indicating that more signals might be 

derived from the media or reports. A hypothesis made from this finding was 

that tweets were mainly from media sources and that media tended to break 

off at the weekend.



Qualitative evaluation indicated overall acceptance of the concept for the 

system. Evaluators generally found it impressive that tweets about health can 

be monitored. They appreciated that the system provided them with signals 

based on aggregated social-media sources, and therefore allowed for easier 

and faster monitoring of many social-media sources in one place – something 

that could not be processed manually within a limited amount of time. 

General Discussion Points 

The experience with Twitter shows that the total number of signals retrieved 

by the prototype was smaller than initially expected throughout evaluation. 

This could be due to a smaller overall number of German-language tweets; 

social media has been shown to be dominated by English-language users 

(many of these in the US), which would result in fewer social-media documents 

and signals in the German language. 7 Additionally, there could be a perceived 

social stigma associated with certain terms for diseases or symptoms that 

yielded fewer results. 8 This was possibly evident from the retrieved 

signals for flu-related versus gastroenteritis-related symptoms (see 

Figure 3). When talking about headache, fever or flu, there may not be such 

social stigma, but gastrointestinal diseases, although sometimes mentioned 

by the media during large outbreaks, are not necessarily those illnesses 

fervently discussed in social media. One is more likely to speak publicly 

about a ‘headache’ than about ‘bloody diarrhoea’. Not surprisingly, words 

indicating gastrointestinal diseases and related symptoms were not very 

common in the social-media content retrieved in this evaluation. This is in 

contrast to the official notification system in Germany, where gastrointestinal 

diseases play an important role compared to flu-like illnesses. 

Although the results suggest that Twitter is a useful source of additional 

information, the difference between media reports and personal reports 

remains a significant issue. Reports that originate in news media are easier 

to retrieve as they tend to reflect language and keywords that accurately 

mirror health and medical terminology. They are also more likely to refer to 

outbreaks. Personal reports are hard to detect. Two groups of tweets written 

by individuals were identified: those in which people refer to media reports 

and those in which people refer to a health status (for example, a tweet 

with content on ‘own health status’ or someone related to it – perhaps, a 

joke about health symptoms). The research suggests that people are much 

more likely to exchange information about less-serious health conditions like 

tiredness or nausea than about more serious conditions. A person will share 

the fact that they have a headache, for example, but not that their recent 

cancer diagnosis and accompanying antibiotics cause severe diarrhoea. 

7. T Webster, Twitter Usage in America: 2010, 2010. 

8. T H A Correa and H Zuniga, ‘Who Interacts on the Web? The Intersection of Users’ 

Personality and Social Media Use’, Computers in Human Behavior (Vol. 26, No. 7, 

2010).



Additionally, it is difficult for the system to detect such reports, as they do 

not often contain recognisable health or medical terms, but rather content 

that includes paraphrases and variable language, such as slang (for example, 

‘the squirts’ versus ‘diarrhoea’) or abbreviations that may include alternative 

spellings (e.g. ‘shot 2 l8 4 flu’ versus ‘flu-shot too late for flu season’). More 

research into language in social-media use is needed before text-mining for 

infectious disease-relevant information can be best technically addressed. 9 It 

will be a continuing task to improve the algorithms to better match a constantly 

changing media landscape, and the language and socio-cultural handling of 

social media by the populations whose health needs to be monitored. Other 

research has found that simple keyword matching can pick up spurious 

correlations, leaving keyword-based methods vulnerable to false alarms. 10 
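The mismatch between vernacular and formal keyword lists can be sketched as a normalisation step applied before keyword matching. This is an illustrative Python sketch, not the M-Eco implementation: the two lookup tables and the keyword set are hypothetical, built only from the examples quoted in the text.

```python
import re

# Hypothetical lookup tables, built from the examples quoted in the text.
PHRASES = {"the squirts": "diarrhoea"}          # multi-word slang
TOKENS = {"2": "too", "l8": "late", "4": "for"}  # single-token abbreviations
KEYWORDS = {"diarrhoea", "flu", "fever"}         # formal terms being monitored


def normalise(text):
    """Lower-case the text, expand known slang phrases, then expand abbreviations."""
    text = text.lower()
    for phrase, term in PHRASES.items():
        text = text.replace(phrase, term)
    tokens = re.findall(r"[a-z0-9]+", text)
    return [TOKENS.get(t, t) for t in tokens]


def matched_keywords(text):
    """Return the monitored keywords found in the normalised text."""
    return KEYWORDS.intersection(normalise(text))


print(normalise("shot 2 l8 4 flu"))
print(matched_keywords("Got the squirts again"))
```

A fixed lookup table is a blunt instrument: it would also rewrite '2 days' as 'too days', and it does nothing about figurative uses such as 'football fever', which would still be matched as a symptom.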

Place and location are a critical part of outbreak detection, early warning 

and epidemiological work, and will be of increasing utility to health scientists 

wanting to monitor diseases using social media. Throughout the development 

and use of the prototype, geo-location has been a difficult component to 

analyse, because of a lack of data. When using Twitter as a data source, 

geo-location is not always included in user profiles, and users do not always 

disclose a location in the content of their tweets. Colleagues looking to geolocation-stamping 

have tried ways to analyse textual information from the 

content of social media in order to provide information on location, and such 

statistical learning frameworks seem to be successful, but can introduce a 

high level of complexity. 11 Chanlekha and Collier examined geo-encoding of 

outbreak reports with more detailed granularity, but found the encoding of 

health information from reports time-consuming and expensive. Automated 

systems tend to leave out too much information. As a solution, the authors 

propose a scheme called ‘spatiotemporal zoning’, in which they analyse 

events reported in sources with regard to temporal information as a means 

to mitigate the limitations of current report-based surveillance systems. 12 

Sensitivity and specificity remain tough factors in the process of signal generation. 

It is essential that enough data are captured, so that important information is not 

9. N Collier et al., ‘A Multilingual Ontology for Infectious Disease Surveillance: Rationale, Design 

and Challenges’, Language Resources and Evaluation (Vol. 40, No. 3/4, 2006), pp. 405–13; M 

Conway et al., ‘Classifying Disease Outbreak Reports Using N-grams and Semantic Features’, 

International Journal of Medical Informatics (Vol. 78, No. 12, 2009), pp. e47–e58. 

10. A Culotta, ‘Detecting Influenza Outbreaks by Analyzing Twitter Messages’, 2010, 

, accessed 20 August 2014. 

11. V Lampos and N Cristianini, ‘Nowcasting Events from the Social Web with Statistical 

Learning’, ACM Transactions on Intelligent Systems and Technology, 2011. 

12. H Chanlekha and N Collier, ‘A Methodology to Enhance Spatial Understanding of Disease 

Outbreak Events Reported in News Articles’, International Journal of Medical Informatics 

(Vol. 79, No. 4, 2010), pp. 284–96; H Chanlekha, A Kawazoe and N Collier, ‘A Framework 

for Enhancing Spatial and Temporal Granularity in Report-Based Health Surveillance 

Systems’, BMC Medical Informatics and Decision Making (Vol. 10, No. 1, 2010).



overlooked, but simultaneously that not too much is presented to the user, so as 

not to overwhelm. The work suggests that signals are only relevant if the personal 

tweet mentions an actual outbreak, but this could be limited by existing knowledge 

of outbreaks. Signals are sometimes generated by cognates of disease names or 

symptoms, or words that sound like disease names or symptoms. 13 As mentioned 

above, there are various linguistic aspects that must be constantly improved. 
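The trade-off between sensitivity and specificity can be made concrete with the standard definitions. The sketch below assumes that each candidate message has been labelled by an evaluator as relevant or not; the counts used are invented for illustration and are not results from the M-Eco evaluations.

```python
def sensitivity(true_pos, false_neg):
    """Share of genuinely relevant messages that the system flagged."""
    return true_pos / (true_pos + false_neg)


def specificity(true_neg, false_pos):
    """Share of irrelevant messages that the system correctly left unflagged."""
    return true_neg / (true_neg + false_pos)


# Invented counts: 10 relevant messages (8 flagged, 2 missed) and
# 100 irrelevant messages (90 ignored, 10 flagged in error).
print(f"sensitivity = {sensitivity(8, 2):.2f}")
print(f"specificity = {specificity(90, 10):.2f}")
```

Widening the keyword list tends to raise sensitivity while lowering specificity, which is the balance between missing information and overwhelming the user described above.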

Despite the aforementioned technical challenges, the results of this prototype 

evaluation indicated that social media (Twitter) should not be ruled out for 

infectious disease surveillance. Although it has not yet been possible, it 

would be ideal to integrate this work alongside indicator-based surveillance 

efforts over an extended timeframe to give a better sense of the true value 

as events arise in real time. Further evaluations in the future are needed in 

order to measure a true epidemiological impact over time and in context. 

The experience with the M-Eco prototype provided a means to look further 

behind the systematic acquisition and processing of social-media data in 

health monitoring. Depending on the content available in social media, health 

officials can receive information about potential health threats earlier or they 

can receive additional information on health threats already detected by 

another system. The M-Eco prototype has been designed to offer automated 

methods and technologies to rapidly provide signals for the detection of 

infectious disease events. 14 This is challenging, and more time is needed to 

explore ways to evaluate such a system and the resulting signals over a longer 

period. Previous evaluations of event-based surveillance systems have been 

completed only to a limited extent, and there are very few examples to draw 

from. 15 In addition to speeding up the detection process through bypassing 

traditional indicator-based surveillance structures, event-based surveillance 

can also provide innovation in settings where weak or underdeveloped 

13. R Steinberger et al., ‘Text Mining from the Web for Medical Intelligence’, in F 

Fogelman-Soulié et al. (eds), Mining Massive Data Sets for Security (Amsterdam: IOS 

Press, 2008), pp. 295–310; R Yangarber, R Steinberger et al., ‘Combining Information 

Retrieval and Information Extraction for Medical Intelligence’, Proceedings of Mining 

Massive Data Sets for Security NATO Advanced Study Institute Gazzada, Italy, 2007. 

14. G Eysenbach, ‘Medicine 2.0: Social Networking, Collaboration, Participation, 

Apomediation, and Openness’, Journal of Medical Internet Research (Vol. 10, No. 

3, 2008), p. e22; G Eysenbach, ‘Infodemiology and Infoveillance: Framework for an 

Emerging Set of Public Health Informatics Methods to Analyze Search, Communication 

and Publication Behavior on the Internet’, Journal of Medical Internet Research (Vol. 

11, No. 1, 2009), p. e11; T W Grein et al., ‘Rumors of Disease in the Global Village: 

Outbreak Verification’, Emerging Infectious Diseases (Vol. 6, No. 2, 2000), pp. 97–102; 

M Keller et al., ‘Use of Unstructured Event-based Reports for Global Infectious Disease 

Surveillance’, Emerging Infectious Diseases (Vol. 15, No. 5, 2009), pp. 689–95. 

15. J S Brownstein and C C Freifeld, Evaluation of Internet-Based Informal Surveillance 

for Global Infectious Disease Intelligence, 2008; J S Brownstein, C C Freifeld, B Y Reis 

and K D Mandl, Evaluation of Online Media Reports for Global Infectious Disease 

Intelligence, 2007.



surveillance systems are in place. Currently, several developing countries face 

such realities, and since socioeconomic disparities and poor or insufficient 

surveillance infrastructures often have broader consequences in the event of 

an outbreak, the potential gain is worth exploring. In these contexts that share 

a larger burden than most, the development of surveillance that can access 

health information in the absence of traditional surveillance institutions 

could be critical to detecting infectious disease 

at the earliest stage, helping to prevent an epidemic outbreak or reduce its impact. 

Recent work has begun in this area in order to seek information on health 

threats using mobile-phone technology, Internet scanning tools, e-mail 

distribution lists or networks that complement the early warning function of 

routine surveillance systems. 16 The research has shown that the majority of 

existing event-based surveillance systems are situated in North America and 

Europe. Local, event-based systems to monitor epidemic threats in Africa, 

Asia, Oceania and South America are scarce. Guidance and training to create 

such systems on the ground should be considered, and can lead to a faster 

assessment of arising health threats and improved rapid response by local 

authorities. 

Edward Velasco is a senior scientist at the Robert Koch Institute, the 

national public health agency of Germany. He provides scientific advice 

and technical support in the Division of Healthcare-Associated Infections, 

Antimicrobial Resistance and Consumption, including outbreak management 

and research on clinical and social risk factors for antimicrobial resistance. 

He has widespread experience in infectious disease epidemiology and has 

consulted with the European Centre for Disease Prevention and Control on 

quality evaluation for surveillance systems in EU member states. He has 

held positions in evaluation at the Open Society Foundation, London and 

the Social Science Research Centre, Berlin. He has a doctorate in medical 

sciences from Charité University Hospital, the joint medical school of the 

Humboldt and Free Universities of Berlin, and a Master of Science in Social 

Epidemiology from the Harvard School of Public Health. He can be reached 

on VelascoE@rki.de. 

16. J P Chretien and S H Lewis, ‘Electronic Public Health Surveillance in Developing 

Settings: Meeting Summary’, BMC Proceedings (Vol. 2, Suppl. 3, 2008), p. S1; J P 

Chretien et al., ‘Syndromic Surveillance: Adapting Innovations to Developing Settings’, 

PLoS Medicine (Vol. 5, No. 3, 2008), p. e72; C Robertson et al., ‘Mobile Phone-Based 

Infectious Disease Surveillance System, Sri Lanka’, Emerging Infectious Diseases (Vol. 

16, No. 10, 2010), pp. 1524–31.

VII: The Raxibacumab Experience: The First Novel 

Product Approved Under the US Food and Drug 

Administration ‘Animal Rule’ 

Chia-Wei Tsai 

The analysis of data – big and small – is central to the US government’s 

approach to establishing requirements and procurement goals for medical 

countermeasures for chemical, biological, radiological and nuclear events 1 

and their approval or licensure by the Food and Drug Administration (FDA). 2 

Multiple scenarios with a wide variety of variables, such as location, time of 

year and weather conditions, are analysed to project the potential impact 

on humans, animals and commerce. Additional analysis is carried out to 

identify gaps in resources that are needed versus those that are available, 

and on their ability to be used successfully, including the logistics that are 

affected by the emergency. It is the policy of the US government to seek 

FDA approval or licensure for these medical countermeasures while they 

are being developed or stockpiled. The efficacy of many of these products 

cannot ethically be evaluated in humans and therefore their regulatory path 

relies on the Animal Rule. 3 This requires the demonstration of efficacy in 

animal models followed by the demonstration of safety in human trials, and 

the development of a pharmacokinetic bridging study – which establishes the 

safe and appropriate human dose – between the two. All of this is based 

on the statistical analysis of Big Data from non-clinical and clinical studies. 

The successful application of these principles has been demonstrated in the 

development of the anthrax antitoxin raxibacumab. 

Anthrax Antitoxin Requirement 

In 2004, anthrax was determined by the US secretary of Homeland Security 

to present ‘a material threat against the US population sufficient to affect 

national security’. 4 Thus, the US government has established an integrated 

anthrax response strategy that includes antitoxins, antibacterials and 

1. National Strategy for Countering Biological Threats, , accessed 19 

August 2014. 

2. Medical Countermeasures Initiative Strategic Plan 2012–2016, , 


3. Food and Drug Administration, Animal Rule Summary, , 


4. Taking Measure of Countermeasures (Part 1), , 


vaccines. 5 In 2008, the Enterprise Executive Committee 6 of the Public Health 

Emergency Medical Countermeasures Enterprise approved a scenario-based 

requirement for anthrax antitoxins. 7 That requirement was established 

from an assessment of high-consequence scenarios involving the exposure 

of a single major metropolitan area to a defined amount of anthrax spores 

through computer modelling and simulation. Exposure modelling involved 

very large data sets related to spore dispersion and fate, transport modelling, 

infection modelling involving analysis of human outbreak data from the 

Sverdlovsk incident that occurred in Russia in April 1979 (in which spores 

of anthrax were accidentally released from a military facility, resulting in an 

estimated one hundred deaths), and from non-human primate experimental 

exposures. 

In order to better determine how much antitoxin should be procured, 

the Analytical Decision Support Division of the US Biomedical Advanced 

Research and Development Authority (BARDA), conducted a preparedness 

analysis that included two different approaches to the analysis of very large 

data sets, which included a wide variety of parameters. In the first analysis, 

a fixed percentage approach was taken; in the second analysis, a population-density 

approach was taken. These two approaches both concluded that a 

similar level of coverage of all metropolitan statistical areas (cities) in the US 

was achievable, allowing a procurement goal to be established. This level of 

preparedness represents approximately the maximum amount of product 

that can be manufactured with existing capabilities, and a reasonable cost– 

benefit ratio based on existing funding and drug costs. 
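
The contrast between the two approaches can be sketched in code. The report does not give BARDA's actual parameters or allocation rules, so everything in the following Python sketch is an assumption made for illustration: the `MetroArea` fields, the coverage fraction and the rule that weights each area by population multiplied by population density are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MetroArea:
    """A metropolitan statistical area (all values are hypothetical)."""
    name: str
    population: int   # number of residents
    area_km2: float   # land area in square kilometres


def fixed_percentage_courses(areas, fraction):
    """Courses needed if the same fraction of every area's population is covered."""
    if not 0 <= fraction <= 1:
        raise ValueError("fraction must be between 0 and 1")
    return {a.name: round(a.population * fraction) for a in areas}


def density_weighted_courses(areas, total_courses):
    """Share a fixed supply in proportion to population x population density.

    This weighting rule is an illustrative assumption, not the method
    described in the report.
    """
    weights = {a.name: a.population * (a.population / a.area_km2) for a in areas}
    total_weight = sum(weights.values())
    return {name: round(total_courses * w / total_weight)
            for name, w in weights.items()}
```

Running both functions over the same list of areas gives two allocations that can be compared side by side, in the same spirit as the cross-check between the two analyses described above.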

With information from this analysis, a meeting was held in Seattle to discuss 

antitoxin use and logistical issues with state and local end-users. The group 

included policy-makers, planners, physicians, nurses, emergency responders 

and first responders. The objective was to identify the parameters this group 

felt were important to the analysis of response capabilities. This qualitative 

input allowed weight factors to be established for parameters identified 

as critical in the subsequent quantitative analysis. The participants were 

informed about the antitoxins that are currently available in the Strategic 

National Stockpile and given an opportunity to discuss their use in mass-casualty 

scenarios. Several important issues were raised, but the participants 

all agreed that antitoxins would be an important component of the response 

to anthrax events. The results of this forum are currently being used to build 

medical countermeasures distribution and dispensing models that can be 

5. HHS PHEMCE Strategy and Implementation Plan, 2012, , accessed 19 August 2014. 

6. PHEMCE Governance, , accessed 19 August 2014. 

7. PHEMCE Mission Components, , accessed 19 August 2014.

used to predict the medical outcomes in mass-casualty events. These models 

consider two approaches to surveillance and the initiation of an emergency 

response: detection through the BioWatch system and index clinical cases. 8 
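
The step from qualitative input to weight factors can be illustrated with a simple weighted-sum calculation. The report does not say how the weights were derived or applied, so the following Python sketch assumes one common approach: raw importance ratings are normalised so that they sum to one and are then combined with parameter scores on a 0 to 1 scale. The function names and any parameter names passed to them are hypothetical.

```python
def normalise_weights(ratings):
    """Convert raw importance ratings into weights that sum to 1."""
    total = sum(ratings.values())
    if total <= 0:
        raise ValueError("ratings must sum to a positive number")
    return {name: value / total for name, value in ratings.items()}


def weighted_score(weights, scores):
    """Weighted sum of parameter scores, each score being on a 0-1 scale."""
    return sum(weights[name] * scores[name] for name in weights)
```

For example, if end-users rated one hypothetical parameter three times as important as another, the first would receive a weight of 0.75 and the second 0.25 in the subsequent quantitative analysis.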

At the meeting, officials of the Office of the Assistant Secretary for 

Preparedness and Response (ASPR) described the Department of Health and 

Human Services (HHS) Medical Countermeasure Strategy, 9 the acquisition 

process and BARDA’s role. An outside consultant described the lessons 

learned in the treatment of the victims of the 2001 anthrax attacks. ASPR 

provided product background on the two antitoxins available for use in 

treating anthrax exposure, raxibacumab and anthrax immune globulin 

intravenous. The role of government organisations and the private sector in 

the distribution and dispensing of medical countermeasures was discussed, 

including the need for large amounts of ancillary supplies to administer these 

medical countermeasures. 

The results of the preparedness analysis and the meeting with state and 

local end-users were used to inform decisions regarding the prioritisation of 

treatment geographically and at the patient level. The Centers for Disease 

Control and Prevention organised the first meeting of the Clinical Utilization 

Plan for Anthrax Countermeasures in a Mass Event Setting (CUPAC). 10 Using 

a ‘best practices’ approach, the CUPAC focused on patients with clinical 

signs and symptoms of anthrax presenting at health-care centres following a 

large-scale bioterrorism event associated with wild-type Bacillus anthracis. 

The goal of the CUPAC is to create strategies to triage and care for large 

numbers of patients effectively and to create a scalable prioritisation scheme 

for the use of medical countermeasures. Working with a Federal Steering 

Committee and the National Association of County and City Health Officials, 

a systematic review was conducted by the Triage and Critical Care Working 

Group and the Medical Countermeasure Working Group. In gathering and 

analysing data and drafting preliminary recommendations, the CUPAC 

working groups are considering questions about the prioritisation of 

antitoxins, and the prioritisation and duration of antibacterials, triage and 

critical care. The analysis includes large data sets from non-clinical studies 

as well as clinical data from the use of antitoxins to treat anthrax cases that 

occurred in 2009 and 2010 in Scotland, UK. Again, the analysis of Big Data 

with diverse parameters is playing a central role in the development of these 

guidelines. 

8. Department of Homeland Security, Homeland Security BioWatch programme, 


9. Public Health Emergency Medical Countermeasures (PHEMCE) Strategy, 2012, 

, 


10. Conference Report on Public Health and Clinical Guidelines for Anthrax, , accessed 19 August 2014.

Development of Raxibacumab for Anthrax Treatment 

Since 2007, an inventory of treatment courses of antitoxins, including 

raxibacumab, has been available 11 in the Strategic National Stockpile (SNS) 

through the Project BioShield contracts awarded in 2005. 12 Raxibacumab is 

a monoclonal antibody antitoxin against the protective antigen of 

Bacillus anthracis for 

the treatment of inhalational anthrax. Its efficacy has been demonstrated in 

multiple animal trials as a monotherapy and in combination with antibiotics. 

Its safety has been demonstrated in healthy adults through large clinical 

trials and statistical analysis of results from those trials. 

The development of raxibacumab is the result of a co-ordinated response to 

a recognised public bioterrorism threat and the US Government’s request for 

medical countermeasures to treat inhalational anthrax. Following the anthrax 

attacks in 2001, over 30,000 people with suspected exposures initiated 

antimicrobial prophylaxis. Eleven people developed inhalational anthrax, and 

despite the best available treatment, five of them died. All subjects received 

at least two antibiotics and some received as many as seven. Antibiotics 

alone were insufficient to treat subjects who had developed anthrax. While 

antibiotics can overcome blood infections caused by anthrax, they do not 

directly address the presence of toxins that drives the development of the 

disease. Anthrax toxin is responsible for most morbidity (illnesses) and 

mortality (deaths) associated with anthrax. 

In humans and animals inhalational anthrax occurs following inhalation 

of Bacillus anthracis spores, which germinate within macrophages (a type 

of white blood cell that ingests foreign particles) as they travel to the 

lymph nodes of the lung, 13 from where they are drained out of the body. 

Multiplication of the bacteria results in a high organism count in the blood, the 

production of bacterial toxins, and the rapid onset of septicemia. Although 

bacterial replication (bacteremia) can be controlled by the administration of 

appropriate antibiotics, it is the bacterial toxin that exerts deleterious effects 

on the cells within the body, resulting in substantial pathology and high 

mortality in infected individuals. Because antibiotics have no direct effect on 

the toxin, they do not treat the toxemia. After the toxin has reached sufficient 

levels in an individual, controlling bacterial replication with an antibiotic may 

not alter the clinical course of the patient. 

11. US Department of Health and Human Services, Project BioShield Annual Report 

to Congress, , accessed 30 September 2014. 

12. BARDA Strategic Plan 2011–2016, , accessed 19 August 2014. 

13. Anthrax, , accessed 19 August 2014.

There is, however, an effective anthrax vaccine that works by inducing the body’s 

immune response primarily to the protective antigen component of anthrax 

toxin. Once subjects have this antibody, they are protected against 

the effects of anthrax. Of those who became ill in the 2001 attacks, all of 

the survivors developed an immune response to the anthrax toxin by day 

twenty-eight after exposure. 

The challenge with this rapidly progressing and often fatal disease is the time 

required for the infected person’s immune system to generate a response to 

the toxins. The anthrax vaccines that have traditionally been available require 

more than two months to achieve protective levels of antitoxin antibodies in 

the blood. Raxibacumab works by delivering human recombinant antitoxin 

antibody to the subject immediately. At the proposed dosage, raxibacumab 

persists long enough for the development of immunity, helping subjects 

survive to develop long-lasting toxin-neutralising antibodies. This immediate 

onset of action fills the need for subjects who have not received the anthrax 

vaccine. This approach also addresses the need arising from the inability of 

antibiotics to address anthrax toxemia directly. As demonstrated in studies 

in rabbits and non-human primates, raxibacumab improves survival when 

administered early, before symptoms develop, as well as later, when the 

disease has progressed to systemic infection. The results of the animal 

studies are subjected to statistical analysis and computer modelling in order 

to estimate how efficacious the antitoxin might be in humans. Moreover, 

raxibacumab is effective both as monotherapy and in combination with 

antibiotics. 

While it is possible to achieve 100 per cent cure rates using antibiotics 

alone under experimental conditions, the 2001 attacks and other real-world 

experiences have demonstrated that antibiotics alone are not 100 per cent 

effective. In addition, antibiotics would not be effective against antibiotic-resistant 

strains of anthrax, which have already been identified. The US 

government has recognised the need for additional anthrax bioterrorism 

countermeasures. Immediately after the anthrax attacks in September 

and October of 2001, Human Genome Sciences, Inc. (HGS) embarked on 

a development programme to produce a monoclonal antibody to treat 

inhalational anthrax. HGS was acquired by GlaxoSmithKline (GSK) in 2012, 

which continues the development and production of raxibacumab. The 

goal of the programme was to address the unmet bioterrorism and medical 

needs posed by inhalational anthrax and the limitations of current therapies. 

In less than a year, using recombinant DNA technology, a potent and specific 

antibody had been developed that binds the protective antigen of Bacillus 

anthracis with high affinity and inhibits protective antigen binding to anthrax 

toxin receptors, thus protecting animal and human macrophages from 

anthrax toxin-mediated cell death. HGS then began non-clinical work to 

establish proof of concept of the antibody as a therapeutic in the laboratory

and in animals, and initiated the process development work to manufacture 

and characterise the product. 

Bacillus anthracis produces three toxins. While antimicrobials cut off the source 

of anthrax toxin production, they do nothing to inhibit the adverse effects of 

toxins that have already been released. The pathogenic effects of toxemia 

can persist after bacteremia has been resolved. However, antitoxin antibodies 

directly neutralise the toxin and prevent its pathogenic effects. Recombinant 

human antitoxin monoclonal antibodies immediately provide the protection that 

develops from the immune response in anthrax survivors or that is stimulated by 

vaccines over the course of weeks with multiple injections. Antitoxin antibodies 

can be used in combination with antibiotics to protect subjects from the toxemia 

that antibiotics do not treat and would also be an important therapeutic option 

when antimicrobials are unavailable or contraindicated, or in the event of 

exposure to an antibiotic-resistant anthrax strain. 

The Regulatory Path under the FDA’s Animal Rule 

Raxibacumab is the first new drug developed since the bioterrorism attacks of 

2001 to seek licensure under the US FDA regulation that describes ‘Evidence 

Needed to Demonstrate Effectiveness of New Drugs When Human Efficacy 

Studies Are Not Ethical or Feasible’, or the Animal Rule (21 CFR 601, Subpart 

H, 2002). 14 The animal studies with raxibacumab were designed to meet the 

criteria for demonstration of efficacy under the Animal Rule and the animal 

models used for evaluation contained the essential elements provided in 

FDA guidance, which are recommended to generate data likely to predict 

the effectiveness of the product in humans. 

In a treatment model, administration of the therapeutic must be triggered by 

a sign or observation, not just by the time that has passed after exposure. 

Because large non-clinical studies to establish these treatment triggers 

are not ethically acceptable, a meta-analysis of data from studies in the US 

and UK spanning over ten years was conducted. Analysis 

of large data sets from rabbit and macaque studies, which included many 

diverse parameters such as body temperature and biochemical assay results, 

allowed reproducible triggers to be identified in rabbits (body temperature 

increase) and macaques (the quantitative measurement of protective antigen 

in the blood). 

While the efficacy of raxibacumab was demonstrated in two animal models 

of inhalational anthrax, safety was evaluated in human clinical studies with 

single and repeat dosing, alone and in combination with antibiotics, in healthy 

14. Product Development under the Animal Rule, , accessed 

19 August 2014.

adult volunteers. 15 The animal efficacy studies demonstrated that a single 

dose of raxibacumab administered intravenously effectively neutralises the 

protective antigen and significantly improves survival. Its effect is immediate, 

and maximum raxibacumab serum concentrations are critical for survival, as 

the goal is to neutralise protective antigen as rapidly as possible. Moreover, 

because of its relatively long half-life, raxibacumab is durable, maintaining 

antitoxin protection until natural immunity can develop in twenty-eight 

days. Importantly, raxibacumab does not prevent the development of 

antitoxin immunity in anthrax-infected animals, nor does it interfere with the 

pharmacokinetics or safety of concomitantly administered antimicrobials. 

Animal studies have demonstrated that raxibacumab does not interfere 

with the activity of antibiotics and that the combination of raxibacumab and 

antibiotic provides a higher survival outcome than antibiotics alone. 
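
The link between half-life and duration of protection can be made concrete with a simple decay calculation. The sketch below assumes single-dose, first-order (one-compartment) elimination, which is a simplification and not the pharmacokinetic model used in the studies; the starting concentration, half-life and protective threshold passed to these functions are hypothetical values chosen only to show the arithmetic.

```python
import math


def concentration(c0, half_life_days, t_days):
    """Serum concentration after t_days, assuming first-order elimination."""
    if half_life_days <= 0:
        raise ValueError("half_life_days must be positive")
    return c0 * math.exp(-math.log(2) * t_days / half_life_days)


def days_above_threshold(c0, half_life_days, threshold):
    """Days until the concentration decays to the threshold.

    Returns 0.0 if the starting concentration is already at or below it.
    """
    if half_life_days <= 0 or threshold <= 0:
        raise ValueError("half_life_days and threshold must be positive")
    if c0 <= threshold:
        return 0.0
    return half_life_days * math.log2(c0 / threshold)
```

With a hypothetical half-life of 14 days, a starting concentration four times the protective threshold would remain above that threshold for two half-lives, or 28 days.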

Because raxibacumab would likely be used with antimicrobials, the activity 

of antimicrobials was evaluated in combination with raxibacumab using 

the same study design as the pivotal efficacy studies. Per the suggestion 

of FDA, animal studies were performed in which a full human-equivalent 

dose of levofloxacin or ciprofloxacin was administered at the same time as 

raxibacumab to animals with symptomatic disease. Because antimicrobials 

are most effective when all spores have germinated, administering the 

antimicrobials after the animals had become septic maximised the efficacy 

of the antibiotics. This is reflected in the high survival rates in the antibiotic 

alone and raxibacumab-antibiotic combination treatment groups (85–100 

per cent). In this study, levofloxacin alone or in combination with raxibacumab 

was administered to the 42 per cent of anthrax-infected animals surviving 

to 84 hours after spore exposure. The combination of raxibacumab and 

levofloxacin resulted in a higher survival outcome than for levofloxacin 

treatment alone. 

The results of the added benefit study serve to supplement the results of the 

original efficacy studies, which demonstrated the efficacy of raxibacumab 

administered early in the course of the disease. In contrast to the survival 

rates observed late in the course of disease, survival rates are highest with 

raxibacumab when it is given as the protective antigen is first being produced, 

with 90–100 per cent survival rates in rabbits and monkeys when raxibacumab 

is administered as monotherapy at the time of spore challenge or at twelve 

hours after spore challenge. In the clinical setting, when neither the time 

of spore exposure, onset of symptoms, nor individual time course of the 

disease is easily identified, administering both antimicrobials to kill bacteria 

and anti-protective antigen antibody to neutralise toxin is an effective strategy 

for combating both the source and effects of the disease. 

15. Clinical Pharmacology and Biopharmaceutics Review of Raxibacumab, , 


Per agreement with FDA for an indication as a therapeutic treatment and 

consistent with the Animal Rule, the safety of raxibacumab has been evaluated 

in over 400 healthy human volunteers. Adverse events were generally mild to 

moderate and did not occur at a rate that was different from that observed 

among placebo-treated subjects. A low incidence of mild to moderate rash 

was observed in some subjects. These rashes were transient and resolved 

without medication or with oral diphenhydramine (a readily available drug 

used to reduce irritation and runny noses caused by hayfever or allergies). 

Concomitant administration of raxibacumab with ciprofloxacin (a common 

antibiotic) did not alter the safety or pharmacokinetics of either the antibiotic 

or raxibacumab. 

Raxibacumab treatment should be initiated when a diagnosis of inhalational 

anthrax is suspected or confirmed. Raxibacumab provides a significant 

survival benefit in animals symptomatic for systemic anthrax disease. 

Raxibacumab treatment is also associated with significant and greater 

improvement in survival when given as pre- or post-exposure prophylaxis 

(preventative medicine). Raxibacumab is an important treatment option 

for inhalational anthrax: an effective antitoxin with a mechanism of action 

distinct from that of antimicrobials. Raxibacumab neutralises protective 

antigen, improves survival and reduces signs of the disease. When used in 

combination with antibiotics, raxibacumab does not interfere with antibiotic 

efficacy and results in a higher survival outcome than antimicrobial therapy 

alone. Raxibacumab used alone is also expected to provide clinical benefit 

for individuals in whom antibiotics are contraindicated or in whom anthrax 

disease is due to antibiotic-resistant strains of Bacillus anthracis. 

Post-Marketing Requirements after Licensure 

Raxibacumab was approved by the FDA for the treatment of inhalational 

anthrax due to Bacillus anthracis in December 2012. Its approval was based 

on the analysis of data from non-clinical studies, and the development 

of a mathematical pharmacokinetic model bridging efficacious animal 

exposures to safe human exposures. Based on these data, the FDA approved 

raxibacumab for the treatment of adult and paediatric patients 

with inhalational anthrax due to Bacillus anthracis, in combination with 

appropriate antibacterial drugs, and for prophylaxis of inhalational anthrax 

when alternative therapies are not available or appropriate. However, this 

approval requires GSK to conduct post-marketing studies, such as field studies, 

to verify and describe raxibacumab’s clinical benefit and to assess its safety 

when used as indicated, and the role of Big Data analysis is far from over. GSK 

has submitted a field study protocol to evaluate the effectiveness, suitable 

human dosage and safety of raxibacumab use for Bacillus anthracis infection 

in the US. This phase four, open-label study will be the first human study to 

collect data on Bacillus anthracis-infected or exposed patients treated with 

raxibacumab. It will also be the first study to gain a better understanding

of the clinical benefit and safety of raxibacumab in human subjects. Data 

collected from this study (observation of adverse responses, measurement 

of antibody concentrations, white blood-cell counts, and so on) will further 

inform patient care and treatment choices for the management of anthrax. 

Conclusions 

The analysis of Big Data played a central role throughout the experience 

with raxibacumab. From the determination of a requirement based on 

scenario-based analysis, to the establishment of a procurement objective, to 

the development of an animal model based on a treatment trigger and the 

eventual approval of raxibacumab, effective data collection, management 

and analysis has been essential. This vital role will continue throughout 

the lifespan of raxibacumab. Anthrax cases arising from natural exposure 

or criminal or terrorist activity may be treated with raxibacumab and new 

data will be collected to expand our understanding of safety and efficacy in 

humans. If used in response to mass-casualty events, additional data will be 

collected on distribution and dispensing to further refine our understanding 

of logistics. Better understanding of the potential use of Big Data across many 

research areas and academic disciplines will help resolve issues surrounding 

the collection and use of these data in an environment in which privacy 

protection and public health needs are at times on opposite sides of the 

balance. 

Chia-Wei Tsai is a Project Officer in the Division of Chemical, Biological, 

Radiological and Nuclear (CBRN) Countermeasures. She is the project lead for 

advanced development and acquisition of medical countermeasures in the 

Antitoxins and Therapeutic Biologics Branch of the CBRN Program. She is also 

the contracting officer representative overseeing two advanced development 

contracts and four procurement contracts. She also serves as the Chair for the 

technical evaluation panel for the CBRN antitoxin rolling Broad Agency 

Announcement (BAA). Dr Tsai recently received the Secretary’s Award for 

Distinguished Service 2012 for her contribution in leading CBRN medical 

countermeasures through FDA approval. Prior to joining HHS, Dr Tsai served 

as a scientist at DynPort Vaccine Company, which supports a Department of 

Defense plague vaccine development programme. She also served as project 

lead in the Malaria Vaccine Development Branch in the National Institute of 

Allergy and Infectious Diseases. Dr Tsai received her PhD from the University 

of Maryland, College Park in Cell Biology and Molecular Genetics and 

completed her post-doctoral training at Johns Hopkins School of Medicine in 

Pharmacology.

Discussion Groups 

During the afternoon, the conference broke up into focused discussion 

groups, each comprising between ten and twenty delegates. The outcomes 

of these discussion fora are presented over the following pages. 

Discussions were without attribution. The information presented here 

seeks to represent the discussions that took place; there is not always 

robust academic referencing to support the views offered, but it has been 

assumed that if comments made by individual delegates were not credible, 

they would have been rejected by the other members of that group during 

the discussions. Views presented are therefore assumed to be broadly 

supported by the majority of those present. Where possible, transcripts of 

the discussion fora were distributed to the participants during the editing 

process for further comment and clarification. 

There was, inevitably, some crossover of subject matter and topic discussion 

between one group and the next, and where this occurred, comments have 

been amalgamated under one heading to avoid repetition.

Discussion Group 1: The Ethics and Legality of Big 

Data Sharing 

Chair and Rapporteur: Edward Hawker 

Key Issues and Challenges 

• The nature of what is and is not socially acceptable, regardless of 

what is legal, can change over time. This can be situation-dependent 

and is not absolute 

• Individuals do not always read and consider terms and conditions that 

set out privacy and data sharing obligations before accepting them 

• There are fears that data may be misused to enable discrimination 

against certain individuals and groups 

• Who should be able to look at or have access to the data? How is this 

determined and how can it be enforced? 

This discussion group was asked to consider the ethics and legality of data 

sharing, and how Big Data and Big Data projects affect public perceptions. 

As government moves forward on a digital agenda that is increasingly 

dependent on public participation and acceptance, these issues will become 

ever more important. 

Ethical Use: A Shifting Concept 

The group felt that a major consideration is how ethics and ethical use is 

defined. In the context of Big Data, ethics can include notions of privacy, 

anonymity, fair use and informed consent but perceptions of these terms can 

change over time. The 11 September 2001 attacks on the US fundamentally 

changed the paradigm and ushered in a new age of security-dominated 

policy and thinking, for example, but many people now see that reactive 

policies such as the USA PATRIOT Act 1 went too far – this is a view that was 

independently raised in Discussion Group 4: ‘Individual Privacy versus 

Community Safety’, and will be discussed in further detail there. Future 

events may again change public perceptions and attitudes. 

The group agreed that a strong component of ethics is proportionality but 

determining what is proportionate to any given situation is also difficult. 

Collecting all of the available data may ensure that nothing of key importance 

is missed, but may be difficult to justify as proportionate and ethical. Targeted 

data collection, factoring in proportionality, may be seen as a more ethical 

approach but risks missing information that might later turn out to be of 

value. 

1. The USA PATRIOT Act: Preserving Life and Liberty, , accessed 16 June 2014.

Data Collection 

The participants then moved on to discussing the ethics of collecting and 

sharing data, and how consent is requested and obtained from the public 

to enable their data to be shared between organisations. Members of the 

public generate large volumes of data every day, via social-media platforms, 

online purchases, electronic tickets such as the Oyster cards used on 

London’s public transport, and geo-location data on mobile phones to name 

just a few examples. These data may be gathered and stored by the user’s 

mobile-phone operator, Internet service provider, the retail sites visited or 

bank used, under terms and conditions to which they have ostensibly agreed 

and which act as a legal agreement between the user and the company. Most people 

do not read these terms and conditions when they accept them, however, 

and so it can be argued that they do not truly understand what they are 

signing up for and would not sign away so many rights if they did understand. 

Companies then (rightly) claim that they are acting within the law when 

they share data with other organisations or even sell customer information 

to third parties, but there are questions over how ethical such behaviour 

can really be considered to be. The challenge identified by many of the 

participants was that there is no negotiation involved – terms and conditions 

must either be accepted, or the user will not be able to use the service. 

There is not generally an option to accept some of the terms while rejecting 

others, or to opt into some aspects of the service without accepting all of 

the terms and conditions. Could academia suggest ways in which different 

levels of privacy settings and data-sharing agreements might be built into 

online systems, so that customers genuinely have a choice in whether or not 

to accept the terms they are offered? 
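
One way to picture the kind of graduated choice raised here is a consent record that is held per purpose, rather than as a single accept-all flag. The sketch below is purely illustrative: the purpose names, the `ConsentRecord` class and the `may_share` helper are hypothetical and are not drawn from any existing system or from the group's discussion.

```python
from dataclasses import dataclass, field

# Hypothetical purposes a service might ask about separately.
PURPOSES = ("service_delivery", "marketing", "third_party_sale", "research")


@dataclass
class ConsentRecord:
    """Per-purpose consent for one user; every purpose starts as refused."""
    user_id: str
    granted: set = field(default_factory=set)

    def grant(self, purpose: str) -> None:
        if purpose not in PURPOSES:
            raise ValueError(f"unknown purpose: {purpose}")
        self.granted.add(purpose)

    def revoke(self, purpose: str) -> None:
        self.granted.discard(purpose)


def may_share(record: ConsentRecord, purpose: str) -> bool:
    """Data may be used for a purpose only if the user opted in to it."""
    return purpose in record.granted


record = ConsentRecord(user_id="user-001")
record.grant("service_delivery")  # needed to use the service at all
record.grant("research")          # optional, chosen by the user
print(may_share(record, "research"))          # True
print(may_share(record, "third_party_sale"))  # False
```

Under this assumed design, declining one purpose does not lock the user out of the service as a whole, which is the negotiation that participants felt is currently missing.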

Data Protection 

Next, the group discussed who should be allowed to look at data. There was 

general agreement that only authorised personnel should have access, but 

this raised the questions of who can be considered authorised and what 

protection this really gives. Insiders may be authorised but still may not act 

ethically: the actions of Edward Snowden, who was an authorised US National 

Security Agency (NSA) contractor when he passed classified information 

to the Guardian and Washington Post, were raised here. Snowden stole 

information and then released it on the web and to journalists – and yet many 

commentators would consider his actions, which were illegal but highlighted 

widespread US government surveillance of citizens’ private communications, 

to be more ethical than the actions of the NSA and other government 

agencies. The group agreed that there is a need to monitor those who handle 

the data and to make sure that they do it responsibly, as well as monitoring 

the data themselves; this also highlights a need to adopt corporate social 

responsibility procedures in relation to data handling in private companies.



Under the UK’s Data Protection Act, 2 members of the public have the 

legal right to ask data controllers what information is being held on them. 

However, there are many situations in which the information does not have 

to be disclosed, particularly if it compromises someone else’s privacy. The 

group felt that there is a lack of public knowledge about these rights and 

the legal framework to protect user information, particularly where data are 

collected anonymously and used to draw conclusions about individuals or 

groups more generally. 

The Changing Nature of Surveillance 

Participants saw the way in which UK society is subjected to surveillance, 

and how this has changed over the last decade, as an important ethical 

issue. New technologies such as smartphones are able to generate more 

data and more accurate information than at any time previously, and generate 

some information – such as the user’s current location – automatically. This 

suggests there is an ‘almost unconscious’ acceptance of surveillance by those 

who buy a smartphone, and also implies that people do not consider this to 

be surveillance in the same way they would if the government was tracking 

them. There is a role for academia in explaining why people’s perceptions of 

what is ‘snooping’ and what is not appear to differ so dramatically depending 

on who is collecting the data. 

The group felt that the willingness with which the public sign away their 

privacy rights online suggests that there is a balance between convenience 

and security which may warrant further research. Most people willingly give 

up some (if not all) of their security because it is convenient for them to 

use the service being offered without questioning what this might enable 

others to do with their data. Such information can easily be extracted by 

cyber-criminals and used for illegitimate purposes, as well as by legitimate 

agencies, but many users do not understand the potential dangers or the 

security vulnerabilities. Better education would help to ease some of the 

challenges, and research into how this might be delivered and accepted 

by the public would be of benefit: the group felt that most people do not 

understand where the information they share over social-media platforms 

actually goes. A compromised social-media account can give large amounts 

of personal information to fraudsters, which can enable them to then target 

scams very precisely. 

There is a strong perceived correlation between conventional media coverage and 

social-media privacy settings: the group thought that people do change 

their security settings when privacy breaches are reported in the media. This 

highlights the power that the media has to shape perceptions of privacy and 

security, but none of the group – at the conference or subsequently – were 

2. Data Protection Act 1998, 

accessed 16 June 2014.



able to provide any proof or reference to studies that show that people do 

actually change their behaviour in these circumstances. Further research on 

both the perception of how behaviour is affected, and how in fact it actually 

is affected, is needed. 

The group also felt that it was important to consider the fact that much of 

the information that can be derived from social-media communication is 

in the form of metadata (the context rather than the content of the data, 

such as when a message was sent, and where it was sent from). This raises 

additional challenges for privacy and ethics: if it is possible to see who 

has spoken to whom and when, without looking at precisely what they 

said, there are different levels of privacy that may need to be considered 

separately. This raised a number of questions, including: where are the 

boundaries of consent? Does the ethics of a situation change depending on 

whether all information is freely available, or only the metadata? How would 

this affect a message posted in the private areas of a message forum, or sent 

as a private e-mail which is then forwarded to people beyond the original 

intended recipient? 

Information Requests 

A further ethical issue surrounded the extent to which it is acceptable for the 

private sector to pass information to the government when the latter requests 

it. Companies such as Facebook and Google have the choice of whether or 

not to comply with such requests, and the ethics of this can be complicated 

depending on which government is asking for the data. Academics such as 

Baker and Tang have explored these issues in more depth. 3 If a company fails 

to comply with a government request they may receive a court order forcing 

them to do so. This may make them more likely to comply with the first 

request out of convenience, opening them up to criticism of being too ready 

to ‘cosy up’ with government and for not being protective enough of their 

customers’ data. There are a number of ethical dilemmas for companies in 

this context. Their concern with customer focus and public image may make 

them less likely to comply with requests they think will lead to negative public 

backlash, for example. It is not their job (nor necessarily in their interests) to 

capture or highlight potential terrorists or criminal activities. Nevertheless, 

private companies could and should take a more proactive stance against 

criminal activity that might be detected by looking for it more closely within 

the data they hold. The group acknowledged that banks in particular have 

become more proactive recently, especially in relation to money laundering. 

Negative sanctions such as those handed out to HSBC in relation to failing 

3. Jane Stuart Baker and Lu Tang, ‘Google’s Dilemma in China’, in Steve May (ed.), Case 

Studies in Organizational Communication: Ethical Perspectives and Practices, 2nd ed. 

(Chapel Hill: Sage, 2012), accessed 17 June 2014.



to maintain effective anti-money-laundering programmes, as much as corporate 

social responsibility, have played a large role in this shift in behaviour. 4 

Predictive Analytics 

A final area the group discussed was predictive analytics – analysing data to 

predict how people may behave in future. This has the potential to prevent 

crimes and enhance public safety, but is ethically contentious. Concerns were 

raised that information gained on individuals may be used to stereotype entire 

groups. The application of ideas such as the ‘broken windows theory’, 5 which 

states that communities where low-level crime is endemic are predisposed 

to more serious crime, is not universally accepted. While arguments for 

predictive analytics would claim that identifying and tackling low-level crime 

will help to prevent more serious misdemeanours (and indeed, when Police 

Commissioner William Bratton applied the theory to turnstile jumpers on 

the New York subway in the early 1990s, crimes of all kinds on the transport 

system decreased), critics express concern that such approaches can lead 

to negative categorisations of some members of society. Academia could, 

however, help to analyse data and identify potential trends which could 

then be explored in more detail through a multidisciplinary approach 

involving behavioural psychologists, sociologists and criminologists, as well 

as computer scientists. 

Suggested Research Topics 

• An in-depth examination is needed of public understanding of the 

surveillance and privacy debate, to provide recommendations that 

will encourage more people to engage in shaping future policy 

• Academic research can help to explain why people’s perceptions of 

what is ‘snooping’ and what is not appear to differ so dramatically 

depending on who is collecting the data 

• Research is needed into how to educate people not to willingly give 

up data without questioning what this might enable others to do with 

those data. Many users do not understand the potential dangers or 

the security vulnerabilities 

• Academia should suggest ways in which different levels of privacy 

settings and data-sharing agreements can be built into online systems, 

so that customers genuinely have a choice in whether or not to accept 

the terms they are offered. 

4. ‘HSBC Holdings Plc. and HSBC Bank USA N.A. Admit to Anti-Money Laundering and 

Sanctions Violations, Forfeit $1.256 Billion in Deferred Prosecution Agreement, 

Department of Justice Office of Public Affairs’, 11 December 2012, accessed 17 June 2014. 

5. James Q Wilson and George L Kelling, ‘The Police and Community Safety: Broken 

Windows’, Manhattan Institute, 1982, accessed 19 August 2014.

Discussion Group 2: Policing, Terrorism, Crime 

and Fraud 

Chair: David Smart 

Rapporteur: Philippa Morrell 


• Data analysis needs to begin somewhere and follow the best direction. 

How can the most appropriate leads to follow be identified? 

• Missing data, where data should be expected, might provide as much 

information as the analysis of data themselves. How can such gaps be 

identified and analysed? 

• Is the apparent lack of skilled security data analysts due to a genuine 

skills gap or better career prospects and higher pay in other sectors 

for people with the appropriate skills? 

• How can the benefits of Big Data be measured and proven, to avoid 

it becoming a sink for valuable resources that would be better used 

elsewhere? 

Since the 11 September 2001 terrorist attacks on the US, there has been greater 

collaboration between police forces, security services and governments at 

the international and national levels. Links between terrorism and crime – 

and, in particular, acquisitive crimes such as fraud and money laundering 

– have been identified, studied and researched extensively. This discussion 

group considered how new approaches to the data available might help to 

improve the quantity and quality of the linkages that are being made. The 

group also discussed whether there are other sources of data, not currently 

analysed for this purpose, which might also yield valuable information and 

intelligence. 

A key point to consider is that effective analysis of data depends on having a 

lead (or leads) in the first place, which helps to identify what the analyst – or 

the analytical tool – is looking for within the data. These starting points can 

then be developed to provide further insights or information. A lack of data 

where they might be expected can, however, be just as important a lead, but 

is much more difficult to spot. A particularly useful research focus, therefore, 

may be to look at developing new methods of data analysis to identify the 

‘unknown unknowns’, described by one member of the group as ‘the Holy 

Grail of intelligence work’, by spotting anomalies within the more routine data 

that could then be subjected to further analysis and investigation. The group 

thought that, at present, gaps in the data are not sufficiently recognised as 

such, nor effectively interpreted.
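
As a minimal sketch of what recognising a gap might involve, the example below assumes a source that is expected to produce at least one record per day and lists the days on which nothing was observed. The function name `missing_days` and the dates are invented for illustration; real sources would need a more careful definition of what is 'expected'.

```python
from datetime import date, timedelta


def missing_days(observed, start, end):
    """Return the days in [start, end] on which no record was observed."""
    seen = set(observed)
    day, gaps = start, []
    while day <= end:
        if day not in seen:
            gaps.append(day)
        day += timedelta(days=1)
    return gaps


# Invented example: a source that normally reports daily goes quiet for two days.
observed = [date(2014, 3, d) for d in (1, 2, 3, 6, 7)]
gaps = missing_days(observed, date(2014, 3, 1), date(2014, 3, 7))
print(gaps)  # [datetime.date(2014, 3, 4), datetime.date(2014, 3, 5)]
```

The output is only a list of candidate gaps; deciding whether a silence is meaningful would still fall to an analyst who understands the context.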



The group also felt that while identifying phenomena such as linkages in the 

data can be relatively easy to do automatically, it is much harder to attribute 

significance (or the quality of the significance) to those links without a human 

analyst involved in the process. A good understanding of what the linkages 

mean is needed in order for them to be given value and for any consequent 

interventions to be effective. The group felt that more research is needed on 

the relationship between qualitative (human) and quantitative (automatic) 

analysis in making effective interpretations. 

Regional Differences 

Some of the delegates felt that once technological solutions are available, 

there is a tendency to apply the same technology everywhere in order to 

promote standardisation and interoperability, but this risks introducing 

a ‘one-size-fits-all’ approach that is not equally applicable to all situations 

or regions. The group felt that this was important as local initiatives for 

countering extremism and radicalisation within at-risk communities often 

work because of local conditions at a particular point in time; what works 

in one place is not automatically transferable to another location or even 

repeatable within the same community. Good policy is often derived from 

historic experiences – for instance, if a particular approach has worked in 

one area of the country, or on one operation, there may be a push for it to 

become standard policy to use it in others, but this has not always proven 

successful. The subtleties and reasons that programmes have been successful 

may not be easily extractable from data, but may be more apparent to a 

human analyst. 

There have been a number of short-term solutions in recent times, but 

the group felt that there needs to be a longer-term view, with a greater 

emphasis on determining what is well understood and what is not. For 

example, between two and five years after a specific event, retroactive data 

analysis could help to assess impacts and changes that have occurred in the 

intervening period, and perhaps identify which have been the most effective 

responses, so that they can be focused on more strongly. There is also a need 

to ensure that all available data are considered: the most effective responses 

may not have been the official ones. For such an approach to work, however, 

there needs to be a high level of detail and the quality of the data has to be 

assured – quantity alone is not a guarantee of the results, nor that the data 

gathered will be useful. 

Research to evaluate rigorously what has and has not worked well may 

help best practice to be identified and also prevent mistakes from being 

repeated. Better ways to identify the ‘starting state’ are also needed, so that 

there is a baseline against which success or failure can be measured. This 

will help to determine how success or failure will be evaluated (which the 

group felt is often lacking at present) and may provide better understanding



of how multiple concurrent interventions can be evaluated separately and 

individually to test which are most effective. 

In order for data to be useful, they need to be multidimensional. This requires 

robust methodology behind how the dimensions are set, and thus what data 

should be collected. The most valuable data do not always relate to what is 

happening ‘on the ground’; they may be more subtle – such as where stolen 

credit cards are being used, rather than where they are being stolen – and 

this in turn may involve comparison at the local, national and international 

levels. A credit card stolen in one area or country could be used in another to 

order goods from a company based in a third, which are delivered to a fourth 

location. The data sets analysed need to be relevant to the answers required, 

but the ability to integrate many different data sets, and to do this much 

faster and more efficiently, does not by itself provide the cultural context. 

Implications of Real-Time Analysis 

Big Data provide new timeframes for the collection and analysis of 

information, enabling real-time processing and real-time updates, as well as 

data collection over long periods of time. Real-time processing, combined 

with analysts who have an understanding of the context in which the data 

have been collected, can enable ‘non-normal’ trends to be picked out 

more easily. The use of hard and soft intelligence combined with Big Data 

might enable the detection of those hiding in plain sight, for example, using 

predictive analytics against the norm that looks at deviant behaviour at an 

individual level rather than at the community level. 
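
The idea of measuring behaviour against an individual's own norm, rather than a community-wide one, can be sketched very simply. The example below is an assumption-laden illustration, not a method described by the group: it flags a new value when it lies more than a chosen number of standard deviations from that individual's own history. The function name, the threshold of 3.0 and the figures are all hypothetical.

```python
from statistics import mean, stdev


def deviates_from_own_norm(history, new_value, threshold=3.0):
    """Flag new_value if it lies more than `threshold` standard
    deviations from the mean of this individual's own history."""
    if len(history) < 2:
        return False  # not enough data to define a norm
    mu, sigma = mean(history), stdev(history)
    if sigma == 0:
        return new_value != mu
    return abs(new_value - mu) / sigma > threshold


# Invented daily transaction totals for one account.
history = [100, 110, 95, 105, 90, 100]
print(deviates_from_own_norm(history, 104))  # False
print(deviates_from_own_norm(history, 900))  # True
```

A flag of this kind is a lead to be examined by an analyst with knowledge of the context, not a conclusion in itself.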

More research is needed to determine the circumstances in which Big Data 

is likely to be most relevant – at the tactical level, or at a more operational 

or strategic level to enable decision-making. What analytical methods are 

required to understand the data being created and do these differ depending 

on what outcome the analysis is aiming for? The group also questioned 

whether an overemphasis on the potential benefits of Big Data makes it a 

sink for resources – and whether there is proof that it adds real benefits. 

Useful Data Extraction 

When collected and applied appropriately, Big Data can combine and share 

many sources of data and information to provide pertinent intelligence. Big 

Data enables the use of multiple sources of data and provides an ability to 

filter these data in new and innovative ways. It is also dependent on the 

ability to strip out unnecessary and extraneous data. Data weeded out by 

the filters can still be analysed but their removal from the main data sets 

raises questions over data quality. This can be a particular issue where data 

are collected centrally for use by different organisations, as each needs a 

different portion of the complete data set: the point(s) at which data are 

removed will have a strong influence on what the remainder can be used for,



as information that is extraneous to one organisation may be highly insightful 

to another. In addition, if data are going to be shared across a number of 

organisations, underlying definitions of how and with whom the data are 

intended to be shared will need to be agreed and understood by all parties 

for the process to work seamlessly. 

In any data analysis, it is important to know who and what is being looked 

at to ensure that the cultural aspects are considered and understood. For 

instance, one participant offered an example of data that treat the British 

Kashmiri and Punjabi populations as the same. Both are defined either as 

Muslims, or at a more granular level by the language they speak (Punjabi), 

whereas there are very distinct historical and cultural differences between 

the two groups. If the data cannot distinguish between them, it will not be 

possible to highlight a trend within one group that is not present in the other. 

A further consideration is that when analysing any data there is a danger 

of pre-supposition of what the data may represent (or what the collector 

wants them to represent) and a misuse of them because of this. The actual 

underlying causal relationship is overlooked or ignored. An example here is 

a recorded increase in the number of neonatal deaths in Japan following 

the Fukushima Dai-ichi nuclear power plant accident: 1 the increase appears 

to point to radiation exposure causing the deaths, but was in fact due to 

increased maternal stress following the evacuation of homes damaged in 

the earthquake and tsunami, leading to an increase in premature births, 

combined with damage to hospitals, which meant that neonatal care was 

compromised. The neonatal deaths and the damaged power station shared 

the same root cause – the earthquake and tsunami – a different relationship 

from what one interpretation of the data might suggest. Comparing 

neonatal death rates closer to Fukushima with those further away, in areas 

less affected by the nuclear power station, but which had suffered similar 

earthquake damage, would help to provide more accurate data analysis. 

Human analysis can also dismiss links that are relevant, however – one of 

the group gave an example in which one of the men subsequently involved 

with plotting the 7 July 2005 bombings on the London Underground had 

previously shown up in police analysis of a terrorist network but had been 

dismissed from further analysis as he was ‘only’ involved in financial crime, 

and so was assumed to be an ‘ordinary’ criminal who just coincidentally 

overlapped with the terrorist network. 

It is important to have ‘checks’ that guard against this: techniques are available 

that enable an initial filtering of data which can then be revisited, so that 

1. Alfred Korblein, ‘Infant Mortality in Japan after Fukushima’, December 2012. 




data can be analysed in line with presuppositions and then remodelled with 

the excluded data reintegrated, in order to see whether and how the results 

differ, thus testing the validity of initial assumptions. Humans inevitably look 

for patterns and may find them where they do not exist. 
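
A minimal sketch of such a check is given below, assuming the simplest possible case: a summary statistic is computed once on the filtered records and once with the excluded records reintegrated, so that the two results can be compared. The function `compare_with_excluded`, the record fields and the values are invented for illustration only.

```python
from statistics import mean


def compare_with_excluded(records, keep, value):
    """Compute the mean of `value` over the filtered records and over
    all records, so the effect of the filter can be inspected."""
    kept = [value(r) for r in records if keep(r)]
    everything = [value(r) for r in records]
    return {
        "filtered_mean": mean(kept),
        "full_mean": mean(everything),
        "n_excluded": len(everything) - len(kept),
    }


# Invented records: the initial presupposition is that only "fraud" cases matter.
records = [
    {"type": "fraud", "links": 2},
    {"type": "fraud", "links": 4},
    {"type": "other", "links": 9},
]
result = compare_with_excluded(
    records,
    keep=lambda r: r["type"] == "fraud",
    value=lambda r: r["links"],
)
print(result)
```

A large difference between the filtered and full results suggests that the initial presupposition is shaping the conclusion and that the excluded records deserve a second look.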

Social-Network Analysis 

There is much potential for Big Data to help identify linkages when used in 

conjunction with social-network analysis – the mapping and measuring of 

relationships and flows between people, groups, organisations, computers, 

URLs and other connected information. 2 Identifying the key nodes of a 

network through social-network analysis will help to generate further 

leads. The group felt that this tends to work better with regard to lower-level 

operatives, but it can also lead to the very top of the network – the 

classic example being the role social-network analysis played in the capture 

of Saddam Hussein. 3 Individuals found via social-network analysis are often 

known to law enforcement from other operations that may or may not be 

considered to be relevant. Amalgamating data sets from many different 

operations may help to identify linkages by providing an overview that 

will help to put the social-network analysis in context but, to return to the 

issue of automated versus human analysis discussed earlier in this paper, 

social-network analysis is likely to throw up many low-level links, only some 

of which are relevant. It may take strong cultural understanding and human 

analysis to decide which of the many options are worth pursuing. 
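
To illustrate what identifying a 'key node' can mean in its simplest form, the sketch below counts the distinct contacts of each individual (degree centrality) in a small invented network. This is only one of several possible centrality measures and is not presented as the technique used in any operation mentioned here; the node labels and contact pairs are hypothetical.

```python
from collections import defaultdict


def degree_centrality(edges):
    """Count, for each node, how many distinct neighbours it has."""
    neighbours = defaultdict(set)
    for a, b in edges:
        neighbours[a].add(b)
        neighbours[b].add(a)
    return {node: len(adj) for node, adj in neighbours.items()}


# Invented contact pairs between anonymised individuals.
edges = [("A", "B"), ("A", "C"), ("A", "D"), ("B", "C"), ("D", "E")]
scores = degree_centrality(edges)
key_node = max(scores, key=scores.get)
print(key_node, scores[key_node])  # A 3
```

As the group noted, a high score shows only that a link exists; attributing significance to it still requires human judgement and cultural understanding.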

Data Lessons from the Private Sector 

The group discussed whether and to what extent the commercial sector’s 

experiences with Big Data can be applied to national security. For example, 

commercial approaches enable companies to market particular products to 

particular individuals based on their previous buying behaviour and, while 

it may not be immediately obvious that this could aid counter-terrorism or 

serious organised crime operations, there is potential benefit in being able to 

analyse someone’s previous behaviour in order to predict and influence their 

future behaviour. The group expressed concern that for such an approach 

to work, high numbers of skilled human analysts need to be directed at the 

available data sets, and need to be able to understand additional background 

information and context. There is currently a shortage of appropriately skilled 

security analysts available but it is not clear whether this is a real skills 

shortage or a funding issue: whether there are too few people with the 

appropriate skills, or whether those who have these skills are being attracted 

2. Org.net, ‘Social Network Analysis, A Brief Introduction’, accessed 23 April 2014. 

3. ‘Case Study: The Capture of Saddam Hussein, War 2.0: The National Security and the 

Science of Networks’. 




to commercial-sector marketing jobs rather than public-sector security work 

because of the salaries offered. 

Finding out how national security can benefit from the experiences of the 

commercial sector will be a continuous learning experience; a multidisciplinary 

approach is needed involving social scientists as well as computer experts. 

Summary 

The analysis of large amounts of data and diverse data streams needs to 

be multidisciplinary and multidimensional. Big Data can enable real-time 

processing of large amounts of information, but for this to be of value in the 

policing, terrorism, crime and fraud arenas, a better understanding is needed 

of the value that can be added by the data being collected and analysed, along 

with more analysis of precisely what this value is and how it is added. The 

financial sector, in particular, needs to guard against complete automation 

in the detection of anomalies: there is no replacement for human analysis. 


• More research is needed into how data analysis (and data analysts) 

can identify and interpret missing data and data on deviations from 

the expected norm. Real-time processing, combined with analysts 

who have an understanding of the context in which the data have 

been collected, will help such trends to be picked out more easily and 

interpreted appropriately. Predictive analytics against the norm are 

needed, which can look at deviant behaviour at an individual level, as 

well as at the wider community level 

• A better understanding is required of how to link data to underlying 

causes, along with methodology that can guard against the negative 

influence of supposition. Techniques need to be developed that 

remodel data sets with excluded or removed data reintegrated, so 

that results can be compared and differences analysed in order to test 

the validity of the initial assumptions 

• Better research is needed into ways to remodel data and test 

assumptions so that a more detailed picture can be built of how data 

reflect assumptions. This may help to identify whether some leads 

are currently being missed because of inherent biases in the way the 

data are approached.

Discussion Group 3: Health Data, Public Health 

and Public Health Emergencies 

Chair: Chris Watkins 


• The public requires honesty and transparency about why health data 

are being collected and what they will be used for. Good communication 

is essential for building trust in health databases and data sets 

• Allowing some degree of personal choice over what data are stored, 

who they might be shared with and in what situations will help 

individuals feel in control of their data. This may be important in 

gaining acceptance for new public health data strategies; however, 

the number of people who choose to sign up to the NHS Organ Donor 

Register (ODR), as an example, has been disappointingly low 

• The public’s concern over possible discrimination resulting from 

health information kept on individuals, and of the potential misuse 

of data, appears to be creating a negative impression of large health 

data projects. This needs to be addressed before health information 

possibilities can be fully realised. 

This group discussed the ways in which Big Data might be used to positive 

effect in public health, both in general and during public health emergencies. 1 

The main benefits the group identified include opportunities to improve 

surveillance so that outbreaks of disease are detected more quickly and more 

accurately, particularly by developing opportunities for self-reporting through 

social-media and mobile communications platforms and by improving the 

dissemination of information during the response to an outbreak. 

The group felt that the systems which exist and are widely used prior to a 

health emergency, but that have the capacity to also cope during the crisis, 

are likely to be more useful than entirely novel systems that only come into 

play in extreme situations. One valuable area of research, therefore, would 

be to consider how easily existing systems could morph from ‘business as 

usual’ to ‘public health emergency’ conditions, and what additional features 

or functionalities might need to be added to the normal system to enable 

this. For instance: how easily could current surveillance systems cope with 

significantly increased data traffic, or more frequent updating? How rapidly 

can different surveillance systems be aggregated together and analysed? 

1. A public health emergency of international concern is defined by the World Health 

Organization as ‘an extraordinary event [that] constitute[s] a public health risk to other 

States through the international spread of disease, or that potentially requires an 

international response’, accessed 19 

July 2014.



Once such technological challenges are met, the group felt that there are a 

number of ways in which data collected over social-media platforms in particular 

can help to improve public health and the response(s) to a health emergency. In 

some cases, studies are already available. During the 2009–10 H1N1 influenza 

outbreak, 2 for example, the NHS set up an online service to enable a more 

effective distribution of the antiviral drug Tamiflu, which was in short supply. 

Self-diagnosis and self-reporting systems that will help to collect data on 

the number and location of reported symptoms are technologically relatively 

easy to set up – such as an app that would list symptoms and 

allow someone to check which ones they have, report them to the NHS and 

receive advice on whether further consultation with a local pharmacist or 

their GP is needed – but their efficiency and accuracy are affected by people’s 

willingness to engage. This, in turn, would be influenced by the nature of 

the disease. Self-diagnosis and reporting systems could collect accurate data 

on sexually transmitted infections (STIs) for example, helping to locate ‘hotspots’ 

of new outbreaks, while social-network analysis might help to trace 

social contacts from whom the STI might have been caught or who are in 

danger of having it passed on to them, but whether or not people would 

be willing to engage in such self-reporting is a different matter. Apps that 

have proven effective in providing information on seasonal influenza may 

not have been suitable technology to engage during the early days of the 

AIDS epidemic, had such technology been available. 

Privacy Challenges 

Using Big Data for public health benefit is more complex than just the 

technological development. There have been several challenges to public 

acceptance of large health-related data projects to date, generally over 

concerns around privacy, which seem to be particularly acute with regard to 

personal health information. There has been considerable public and media 

backlash to projects such as the NHS Dataspine, 3 which intended to provide 

a central repository for information on more than 70 million patients from 

27,000 individual organisations, and its successor, the Health and Social Care 

Information Centre, 4 which was set up in April 2013 under the Health and 

Social Care Act 2012, to ‘collect, analyse and present UK national health 

and social care data’. Protest groups such as GeneWatch UK 5 and Privacy 

International 6 have raised issues around the mere existence of such a 

2. NHS, ‘Swine Flu’, , accessed 19 July 2014. 

3. Computer Weekly, ‘NHS Data Spine out of Action for 28 Hours in a Week’, 10 January 

2006, , 


4. See , accessed 2 June 2014. 

5. See , accessed 2 June 2014. 

6. See , 




database, including objections to the ease with which it can be accessed, 

who should be allowed access to it for research or surveillance purposes, 

and what rights patients should have to access and amend their information. 

Health professionals and policy-makers see the potential benefits of such 

a database, which will enable the best available treatments to be targeted 

towards individuals based on their health profile and potentially even their 

genome, as far outweighing the potential for misuse, but the UK government 

nonetheless needs to ensure that privacy safeguards are in place and that 

these safeguards are clearly communicated to (and trusted by) the general 

public in order to ensure public and media acceptance of such programmes. 

The group felt that poor communication of the benefits of such systems 

is most likely to be at the heart of the public and media backlash, but 

also acknowledged that public trust in large government data projects is 

undermined by perceptions that government does not act responsibly 

with public data. Such perceptions have been reinforced by the Snowden 

revelations published in the Guardian and Washington Post since June 

2013, in which former US National Security Agency (NSA) contractor Edward 

Snowden revealed that the NSA had collected and stored large volumes of 

Internet communications by private citizens between 2007 and 2013 – in 

collaboration with a number of private sector companies such as Google and 

Amazon, and other national government agencies including the UK’s GCHQ 

– under a strategy of collecting everything, and only then analysing it to 

reveal criminal or terrorist activities, rather than collecting communications 

only where there was reason to suspect those communications may contain 

something untoward. The revelations have caused public and media outrage, 

even though in most cases collection and sharing had been possible because 

the users of platforms such as Facebook, Yahoo and Amazon did not change 

default privacy settings that would have prevented their personal information 

from being shared in this way. Prior to the Snowden revelations, most users 

considered the benefits of data sharing on Facebook, including the amount 

of data that can be shared and the number of people it can reach, to heavily 

outweigh the negatives (and in practice they still do – there is little concrete 

evidence that a significant proportion of individuals are genuinely changing 

their behaviour with regard to privacy settings as a result of Snowden). 

Fear of Discrimination 

Negative attitudes towards Big Data health projects are largely driven by 

fear that the information contained in the data will enable the state, private 

companies or other agents to discriminate against individuals or certain groups. 

For example, certain data might enable insurance companies to discriminate 

against individuals with high-risk profiles and thus affect the cost of insurance 

premiums. An individual with a genetic predisposition to cancer may find it 

difficult to take out life insurance or a long-term loan, such as a mortgage. 

This is in spite of counter-arguments that awareness of their genetic make-up



may make them more likely to follow a healthy lifestyle that avoids known 

triggers to the genetic condition, such as smoking, and to attend regular 

medical screenings that will pick up emerging conditions early and enable 

more effective treatment, which may even increase their life expectancy. 

Fears include perceptions that if the health-care sector had access to large data 

sets and was able to share this information with other organisations, this might 

enable health insurance companies to profile customers to determine whether 

they should be given insurance or not. For example, when a customer signs up to 

a supermarket reward card, they often accept terms and conditions that enable 

their buying habits to be shared with a number of third-party organisations, 

mostly for marketing purposes. The information supermarkets collect about 

people’s buying habits can be used to make inferences about their lifestyle and 

this could in turn be analysed to make predictions about their likely long-term 

health. If supermarkets were to share this information with the health sector, it 

may enable positive health interventions to be targeted towards communities 

or even individuals, and to improve planning on what health-care services might 

be needed in future, but a potential downside might be that health insurance 

companies use the information to determine whether a potential customer is a 

heavy consumer of alcohol, or whether their diet is particularly unhealthy, and 

approve or deny insurance based on this personal information. 

Building Trust in Health Data Projects 

The group felt that with regard to examples such as the one given above, more 

research is needed into what uses people will accept or object to regarding their 

personal data, and in what circumstances. This would help to develop a better 

understanding of how the way in which people are informed of data collection 

(including who is doing the collection, for what reason, and what the data are 

likely to be used for) determines how they will react to it. There may be very 

different reactions from people depending on whether the data are being 

collected by the government or by pharmaceutical companies, and this may not 

be consistent from one country to the next. In Germany, for example, where the 

legacy of Nazi rule has made the population cautious of allowing government 

collection of personal information, most health data are collected by private 

health companies rather than the state, and there are local, federal and state 

differences in the legal structures surrounding data protection and privacy. 

Cultural and historical factors can strongly affect public perceptions surrounding 

the information being collected and can affect how successful data collection is. 

Platform(s) through which information is collected and how it is stored are 

also important to consider. The group felt that it is unclear under current EU 

law whether information stored on a mobile device such as a smart phone 

is classed as broadcast information or personal information, and therefore 

whether or not it is protected by privacy laws; more research is needed on 

how existing legislation relates to rapid technological advances.



The discussion group felt that with regard to health-related big data projects 

such as the single NHS database, allowing personal choice through an ‘opt-in’ 

system may be the best approach to building trust. The NHS Organ Donor Register (ODR) 7 could be 

used as a model, though it is worth noting that only around half of the UK 

population (54 per cent of women and 46 per cent of men) have opted in. 

While this amounts to many millions of individuals, there are also many 

millions who do not opt in. A survey was carried out on behalf of NHS Blood 

and Transplant in 2013 to find out why people do not sign up to the Register. 8 

The results are shown in the box below. 

Box 1: Reasons why people choose not to opt in to the NHS’s Organ Donor 

Register. 

30% are aware of the ODR but do not know or have confused understanding of 

what it is 

16% are not aware of the ODR or are unable to say what it is 

16% say they do not want to think about their death 

15% say they worry that their family might be upset if they donated their organs 

12% say they worry that they could still be alive when the operation is carried out 

11% say they do not want to donate to someone who does not deserve it 

10% believe they are too old. 

(respondents were allowed to give more than one reason) 

Where opt-in to data collection is required, offering even minor incentives 

(which can be as simple as free access to the service, even if the company 

offering the services makes a profit from the individuals signing up) 

encourages individuals to sign up. For example, the market research company 

YouGov offers points for signing up and participating in its surveys, which can 

be redeemed for financial remuneration and other rewards. 9 Equally, making 

it difficult or inconvenient to opt out can also ‘nudge’ behaviour in a certain 

7. NHS, ‘Organ Donation, How to Register’, , accessed 3 June 2014. 

8. Figures provided on 10 June 2014 by NHS Blood and Transplant, from a commissioned 

market research report carried out by Optimisa in 2013. 

9. YouGov, ‘Join the YouGov Panel Today!’, , 




direction. Facebook and most social-networking sites set low default privacy 

settings, as the companies want to be able to collect as much information 

on individuals as possible. A conscious decision has to be made to change 

to higher privacy settings (though this is starting to change following the 

backlash over the Snowden revelations). As the easiest and least inconvenient course of 

action for a new user is to accept the low privacy settings, most people tend to 

do so. With regard to health data projects, the group felt that opt-in systems 

would be more likely to generate trust if they began with very high privacy 

settings, and allowed users to relax these in stages if they were happy for 

their information to be shared with other government departments, research 

institutions or private-sector companies. 
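
The idea of starting from the highest privacy setting and sharing only what a user has explicitly agreed to can be sketched in a few lines of code. This is a minimal illustration only: the three recipient categories come from the paragraph above, while the class and method names (`ConsentRecord`, `allow`, `withdraw`, `may_share_with`) are hypothetical and do not describe any existing NHS system.

```python
from dataclasses import dataclass, field
from enum import Enum


class Recipient(Enum):
    """Categories of recipient named in the text above."""
    GOVERNMENT_DEPARTMENT = "other government departments"
    RESEARCH_INSTITUTION = "research institutions"
    PRIVATE_SECTOR = "private-sector companies"


@dataclass
class ConsentRecord:
    """Hypothetical per-person consent record: nothing is shared by default."""
    person_id: str
    allowed: set = field(default_factory=set)

    def allow(self, recipient: Recipient) -> None:
        """Record an explicit decision to share with one category."""
        self.allowed.add(recipient)

    def withdraw(self, recipient: Recipient) -> None:
        """Return to the private default for one category."""
        self.allowed.discard(recipient)

    def may_share_with(self, recipient: Recipient) -> bool:
        """Sharing is permitted only where consent was explicitly given."""
        return recipient in self.allowed


# A new record starts fully private; each level of sharing is a separate choice.
record = ConsentRecord(person_id="example-001")
record.allow(Recipient.RESEARCH_INSTITUTION)
print(record.may_share_with(Recipient.RESEARCH_INSTITUTION))  # True
print(record.may_share_with(Recipient.PRIVATE_SECTOR))        # False
```

The design choice worth noting is that the empty set is the default: a user who takes no action shares nothing, which is the reverse of the low default settings described for social-networking sites.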

Targeted Data Collection 

The group felt that an important consideration in the collection of health data 

– and, in fact, any data – is that the collection must be targeted to ensure that 

collection is necessary and efficient. Data should not be collected for the sake 

of collecting data. There are important distinctions here between targeted 

analysis (which knows what is being looked for and seeks actively to find it) 

and general pattern analysis (the recognition of patterns and regularities in 

data, without necessarily being aware of what those patterns mean without 

further analysis), though both are useful in determining dynamics within a 

data set. For example, general pattern analysis in public health may pick up a 

sudden increase in the quantities of influenza drugs being prescribed, and the 

location(s) in which the prescriptions are being made, which might indicate 

and identify the beginning of a new pandemic, while targeted analysis might 

then aim to track family members and colleagues of the individuals showing 

symptoms, so that they can be tested and offered preventative treatments 

before symptoms develop. Data subjected to general pattern analysis 

should be anonymised (as the patterns will emerge whether the data are 

anonymised or not), whereas it is very difficult to carry out targeted analysis 

on anonymised data. This raises questions around how and at what stage 

data are anonymised – the collection of only anonymised data may make 

meaningful interpretation or practical action difficult at a later stage, but if 

the public is not convinced that data are being appropriately anonymised, 

they may be unwilling to provide data in the first place. 
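
As a rough illustration of the distinction drawn above, the sketch below runs a general pattern analysis on prescription records in which patient identifiers have been replaced by keyed hashes, and flags weeks in which prescriptions in a location rise sharply above the recent average. Everything here is an assumption made for illustration: the data are synthetic, and the function names (`pseudonymise`, `weekly_counts`, `flag_spikes`), the key, the four-week window and the threshold are hypothetical rather than taken from any real surveillance system. Keyed hashing is also strictly pseudonymisation rather than full anonymisation, since whoever holds the key can re-link records to individuals.

```python
import hashlib
import hmac
from collections import Counter
from statistics import mean, pstdev

SECRET_KEY = b"held-by-data-controller-only"  # assumption: illustrative key


def pseudonymise(patient_id: str, key: bytes = SECRET_KEY) -> str:
    """Replace an identifier with a keyed hash (HMAC-SHA256)."""
    return hmac.new(key, patient_id.encode(), hashlib.sha256).hexdigest()


def weekly_counts(records):
    """Count prescriptions per (location, week); identities are not needed."""
    return Counter((r["location"], r["week"]) for r in records)


def flag_spikes(counts, location, weeks, window=4, threshold=3.0):
    """Flag weeks whose count exceeds the mean of the previous `window`
    weeks by more than `threshold` standard deviations (minimum spread 1)."""
    series = [counts.get((location, w), 0) for w in weeks]
    flagged = []
    for i in range(window, len(series)):
        history = series[i - window:i]
        baseline, spread = mean(history), pstdev(history)
        if series[i] > baseline + threshold * max(spread, 1.0):
            flagged.append(weeks[i])
    return flagged


# Synthetic data: location A sees a jump in week 6, location B stays flat.
counts_by_week = {"A": [5, 6, 5, 6, 5, 20], "B": [5, 5, 5, 5, 5, 5]}
records = [
    {"patient": pseudonymise(f"{loc}-{week}-{n}"), "location": loc, "week": week}
    for loc, per_week in counts_by_week.items()
    for week, count in enumerate(per_week, start=1)
    for n in range(count)
]
weeks = list(range(1, 7))
counts = weekly_counts(records)
print(flag_spikes(counts, "A", weeks))  # [6]
print(flag_spikes(counts, "B", weeks))  # []

# Targeted follow-up needs the key: only its holder can match a known
# identifier against the pseudonymised records behind the flagged week.
flagged_ids = {r["patient"] for r in records
               if r["location"] == "A" and r["week"] == 6}
print(pseudonymise("A-6-0") in flagged_ids)  # True
```

The pattern emerges from the counts alone, which is why the first stage can run without identities; the second stage works only for whoever controls the key, which is one way of framing the question of when and by whom data are de-anonymised.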

Summary and Conclusions 

Big Data offers enormous potential benefits to health care, including 

improved surveillance, so that outbreaks of disease are detected more 

quickly and accurately (particularly by developing opportunities for self-reporting 

through social media and mobile communications platforms), 

and improved dissemination of information during the response to 

an outbreak. While the technology exists to enable this, in most cases the 

more pressing challenges surround people’s unwillingness to engage. Public 

acceptance of large health-related data projects has met several challenges to



date, generally over concerns around privacy, which seem to be particularly 

acute with regard to personal health information. 

Communication of the benefits of such systems needs to be improved to 

ensure public trust in large government data projects is not undermined 

by perceptions that government is not capable of acting responsibly with 

public data. In particular, there are fears that if the health-care sector has 

access to large data sets, and is able to share this information with other 

organisations, this may enable health insurance companies and other 

private-sector companies to profile customers and discriminate against 

them. These fears need to be addressed, and better understanding is needed 

on how information regarding such data collection projects can be best 

communicated. 

An important consideration in the collection of health data – and in fact any 

data – is that the collection must be targeted to ensure that collection is 

necessary and efficient. In order to win (and maintain) public confidence and 

trust, data should not be collected for the sake of collecting data. 


• Research is needed on how easily existing data collection and 

surveillance systems could morph from ‘business as usual’ to 

‘public health emergency’ conditions, and what additional features 

or functionality are needed to enable this. How easily can current 

surveillance systems cope with significantly increased data traffic, 

or more frequent updating? How rapidly can different surveillance 

systems be aggregated together and analysed? 

• Further research is needed into what uses people will accept or object 

to regarding their personal health data, and in what circumstances. 

This would help to develop a better understanding of how the way in 

which people are informed of data collection (including who is doing 

the collection, for what reason, and what the data are likely to be used 

for) will influence how they react to it. 

• Research is needed into how best to encourage people to opt in 

to data collection schemes, possibly by offering minor incentives 

that encourage individuals to sign up. 

• The volume of health-care data available makes them ideal for 

subjection to general pattern analysis. These data should be anonymised 

(as the patterns will emerge 

whether the data are anonymised or not), but the analysis may 

reveal where targeted interventions will have maximum impact; this 

requires the data to be linked to the individuals who will receive 

that intervention. Research is needed on how and at what stage 

data should be anonymised and de-anonymised during the analysis 

process.



Box 2: Social media and health emergencies. 

As well as being a potential tool for surveillance and data collection, social 

media offers opportunities to influence the affected population to take up (or 

refrain from) certain behaviours during health emergencies, including enabling 

discussions to take place over whether or not suggested behaviours should be 

followed. In Asia, a legacy of the 2002–03 outbreak of Severe Acute Respiratory 

Syndrome (SARS) is that many people now wear a protective face mask when 

they have a cold, or to prevent them from catching a cold. Such masks are not as 

popular in Europe, however, and there is considerable disagreement over their 

usefulness in restricting the spread of infections. Social-media platforms could 

be used to promote discussion on whether wearing masks is useful or not, 

and where the medical community agrees that a behaviour is beneficial, social 

media could be used to disseminate advice. It could also be used to provide 

updates on the spread of infections (such as the number and location of cases) 

and predictions on where the disease is likely to spread next. 

Community Engagement 

During a pandemic or other serious disease outbreak, social media could 

also be used to raise and support community organisations and organise 

volunteers to carry out activities beneficial to the community, such as 

organising shopping collection and delivery for those infected, so that they 

do not need to leave the house, and organising regular cleaning of lifts, 

staircases and other shared areas in blocks of flats. An example of such a 

scheme is ‘FluFriends’, 10 which during the H1N1 pandemic encouraged people 

to arrange who would be able to collect antiviral drugs for them should they 

become infected, do their shopping for them, and who they could phone 

regularly to say how they were feeling. More recently, FloodVolunteers 11 

has enabled individuals to come together and request assistance or offer 

expertise and skills to help those affected by flooding. The technology 

needed to co-ordinate community efforts during a crisis is simple, but the 

behavioural factors that will determine whether or not people sign up to 

such community networks and actively engage with them are more complex. 

10. Margaret Lally, ‘Flu Friends – a Possible Alternative’, British Red Cross, , 

accessed 2 June 2014. 

11. FloodVolunteers, , accessed 3 June 2014.

Discussion Group 4: Individual Privacy Versus 

Community Safety 

Chair and Rapporteur: Jennifer Cole 


• There are situations in which individual privacy and community 

safety may be in direct conflict, as ensuring community safety may be 

dependent on intruding on personal privacies 

• Most of the public appear to expect higher levels of data protection 

and assurance from the public sector than they do from the private 

sector; they accept supermarkets in particular gathering, storing 

and sharing large amounts of information on them, but object when 

government wants to do the same 

• Most surveillance legislation in the UK came into force before social-media 

platforms were widespread, resulting in confusion over how the 

current laws relate to this form of communication, and in particular 

over what constitutes private communication on social media. 

The aim of this discussion group was to consider situations in which 

requirements for individual privacy and community safety might come into 

conflict with data collection and sharing, and how academic research might 

help to provide a better understanding of, or solutions to, this dilemma. 

The group was provided with examples to consider, such as the police 

and security services conducting surveillance that might be considered an 

invasion of privacy on individuals who pose (or are suspected of posing) a 

danger to the wider community because of extreme views or suspected links 

to terrorist networks. To what extent does the need to protect society from 

such individuals justify monitoring not only the individuals themselves, but 

also their social networks, including friends, family members and colleagues? 

Another example presented was of health services tracking the movements of an 

individual suffering from (or suspected to be suffering from) an infectious 

disease that might spread to others, or monitoring their recent social media 

communications to actively identify people who might have been in contact 

with them so that they can be contacted for treatment. 

The group was asked to discuss whether legislation around such surveillance, 

data collection and data sharing should be absolute: equally applicable, 

for example, to a gang of youths suspected of anti-social activity, such as 

graffiti, as to a suspected terrorist network, or equally applicable during a 

routine disease outbreak involving mild symptoms as during a pandemic of 

life-threatening severity. 

The group was also asked to discuss how political and public opinion shapes 

surveillance and data legislation, how attitudes change over time, and what



the drivers of change are likely to be. Are there specific drivers or qualifiers 

that enable new legislation to be introduced and accepted? 

The 11 September 2001 terrorist attacks on the US created an atmosphere 

of fear that has led to greater social acceptance of surveillance when this is 

explained as being for security purposes. The USA PATRIOT Act was given as an 

example of a measure that has curtailed liberties in favour of security. 1 The 

group discussed what impact legislation such as this has on privacy, and at 

what point the balance between security and privacy might be considered 

to have tipped too far towards security. In times of great need, rules may 

change, but how and through what processes this should be allowed and 

accepted are not currently understood. 

Determining the Risk Threshold for Collection 

The conference was held less than a year after the Snowden releases, at a 

time when new revelations about US and UK government actions under the 

controversial PRISM programme were coming to light each month. 2 Privacy 

and surveillance issues were therefore fresh in participants’ minds, and the 

relationship between them was still a very controversial issue. This applied 

particularly to widespread surveillance programmes that collect large volumes 

of data against relatively low-risk thresholds – in other words, surveillance 

programmes that lean far more towards security than privacy. A key point, 

the group felt, was to determine what level of activity, or connection to a 

network under surveillance, justified placing that individual or community 

under surveillance themselves. Set the bar higher for privacy, and the risk 

is that individuals who are involved will not be identified; set the bar higher 

for security, and the authorities will be accused of ‘snooping’ on innocent 

people. The group recognised that this is a dilemma for governments to 

which there is no easy answer, but that academia could help by researching 

attitudes to privacy and surveillance and identifying what decisions are more 

likely to be accepted, and why. 

The group broadly agreed with US President Barack Obama’s declaration that 

it is impossible to have ‘100 percent security and then have 100 percent 

privacy’, 3 and acknowledged that the actions of the UK government with 

regard to PRISM have been legal and based around existing legislation. There 

were concerns, however, that such surveillance was only legal because of the 

way in which the public seemed happy to sign away privacy rights in the terms 

1. The US PATRIOT Act: Preserving Life and Liberty, , accessed 16 June 2014. 

2. For a full explanation of PRISM, see Leon Kelion, ‘Q&A: NSA’s Prism Internet 

Surveillance Scheme’, BBC News, 1 July 2013, , accessed 16 June 2014. 

3. Paul Adams, ‘Barack Obama Defends US Surveillance Tactics’, BBC News, 8 June 2013, 

, accessed 16 June 2014.



and conditions they accept when signing up to social-media platforms such 

as Facebook, and because most surveillance and data assurance regulation 

includes clauses stating, or is worded in such a way, that individuals’ rights 

to privacy can often be overridden in situations that are loosely defined 

in legislation using terms such as ‘public safety’, and ‘immediate danger’, 

without expressing what would constitute such a situation. Such wording 

can therefore be argued to amount to ‘get-out clauses’ that enable privacy 

to be overridden whenever the government decides. 

Article 8.1 of the UK Human Rights Act 1998, 4 which came into force in 

October 2000, states that ‘everyone has the right to respect for his private 

and family life, his home and his correspondence’. This is a qualified right, 

however, which can be overruled by Article 8.2, which states: 5 

There shall be no interference by a public authority with the exercise of 

this right except such as is in accordance with the law and is necessary 

in a democratic society in the interests of national security, public 

safety or the economic well-being of the country, for the prevention 

of disorder or crime, for the protection of health or morals, or for the 

protection of the rights and freedoms of others. 

In other words, perceived danger to the wider community or society trumps 

the right to privacy of the individual. 

A second key piece of UK legislation affecting this debate is the Regulation of 

Investigatory Powers Act (RIPA) 2000, 6 which was introduced to modernise 

laws relating to the interception of communications in order to protect the 

public adequately from terrorism, cyber-crime and online paedophilia, and 

has attracted criticism from civil rights and privacy campaigners such as 

Liberty 7 and The Open Rights Group, 8 which refer to it as ‘The Snooper’s 

Charter’. Section 26(2)(c) states that the Act ‘allows covert surveillance where 

there is “immediate danger”’, including directed surveillance undertaken for 

the purposes of a specific investigation or operation. This would appear to 

allow surveillance that would actively seek out and identify individuals whose 

4. Human Rights Act 1998, Article 8, Right to Respect for Private and Family Life, , accessed 16 June 2014. 

5. Human Rights Act 1998, Schedule 1, Article 8, Right to Respect for Private and Family Life, 


6. Regulation of Investigatory Powers Act 2000, , accessed 16 June 2014. 

7. Liberty, ‘State Surveillance’, , accessed 16 June 2014. 

8. Digital Surveillance, ‘Why the Snooper’s Charter is the Wrong Approach: A Call for 

Targeted and Accountable Investigatory Powers’, The Open Rights Group, , accessed 

16 June 2014.



private correspondence suggested that they may be involved in activities 

that pose a threat, or potential threat, to wider society; though, again, the 

exact terms of this threat are not specified. 

While RIPA dictates what information can be sought out by the state during 

exceptional circumstances, the Data Protection Act 1998 9 (in particular the 

emergency powers set out under Schedule 2 a, c and d) determines to what 

extent information that might normally be protected can be shared; clauses 

exclude protection where the processing is ‘necessary’ and ‘for the exercise 

of any functions of either House of Parliament’. The Act is primarily concerned 

with protecting confidentiality and imposes a duty on organisations to 

ensure that data are used only for authorised purposes and are properly 

protected (HSC/99/012). In addition, the Data Protection Act enables sharing 

where ‘the data subject has given his consent to the processing’, which has 

been hotly debated with regard to how the terms and conditions on social-media 

platforms are accepted by users when they join, and whether this can 

genuinely be interpreted as informed consent. Since the conference, at least 

one court case has challenged the right of Facebook to pass information to 

the US security agencies. 10 The discussion group also noted that the laws 

relating to surveillance, data collection and sharing came into force before 

social media platforms such as Twitter and Facebook, which communicate 

one-to-many, existed. To what extent social-media communications over 

such platforms constitute ‘private communication’, and how this can be 

interpreted under such legislation, still needs considerable debate, to which 

academia is well placed to contribute. 

Responsibilities of Government 

The discussion group agreed that protection of the many – be this a particular 

community or society as a whole – is paramount for government. Where 

there is conflict between this and the right of an individual to privacy, the 

group felt that it is right for the government to focus on the protection of 

the many. Nevertheless, there still need to be clearly set boundaries and 

a response that is scalable to the actual risk posed. Academics can help 

to define these boundaries and to qualify the risk(s). The group discussed 

the degree to which individuals should be monitored at different levels of 

suspected or known involvement in activity, and agreed that rather than 

being absolute, this should probably differ depending on what that activity is. 

For example, it may be (more) acceptable to begin surveillance on the entire 

9. Data Protection Act 1998, , 

accessed 16 June 2014. 

10. Mary Carolan, ‘Facebook Data Transfer Interfered With Privacy Daily, Court Told’, 

Irish Times, 30 April 2014, , accessed 

16 June 2014; Europe vs Facebook, , 




social network of a suspected terrorist from the moment that individual is 

suspected of involvement in terrorist activity, but less acceptable to monitor 

all social contacts of a group of youths involved in anti-social but reasonably 

harmless graffitiing from such an early stage. 

The group felt that different threats, such as minor crime and terrorism, 

warrant different approaches, and therefore more research is needed into 

how the harm caused by certain activities is defined and measured, with 

better understanding needed of the extent to which some relatively low-harm 

activities overlap with more serious ones. For example, if a definite 

link can be identified between graffiti and terrorism – in the same way that 

definite links have now been identified between financial crime and terrorism 

– this would help to justify carrying out surveillance on individuals that are 

not directly linked to more serious activities but may well be the link on a 

social network, or between two social networks, that will lead to the more 

serious threats. Academic research could help to highlight which activities 

appear to have definite links and which do not. 

The Privacy Narrative 

Some members of the group felt that a small but disproportionately vocal 

privacy lobby turns public opinion against surveillance that 

only disadvantages those who are breaking the law or who have something 

to hide. On this view, innocent citizens have nothing to fear from government surveillance 

of their activities and so should not mind if such surveillance goes on. It can 

be argued that the advantages to a law-abiding citizen of having all personal 

communications and personal data collected and stored for potential analysis 

so strongly outweigh any perceived disadvantages that no one should object 

to it. Explaining this in such a way that people more easily see the benefits 

would act as a strong counter to the images conjured up by privacy lobbyists 

of sinister government agents spying on innocent citizens’ private lives and 

somehow causing them harm by doing so. A suggested academic research 

project that might help to counter such anti-surveillance narratives was a 

retrospective look at IRA terrorism and the social networks of IRA terrorists. 

Could modern social network analysis 11 be applied retrospectively to 

historical case studies of IRA terrorism and illustrate how the use of such 

technology might have been able to construct social networks, identify 

other potential terrorists and prevent IRA attacks before they happened? 

The group felt that such historical revisiting of past terrorism networks could 

help to illustrate the value of storing data on individuals who may appear to 

be (or, in fact, actually be) innocent, but who can lead investigators to more 

dangerous individuals. 

11. Org.net, ‘Social Network Analysis, A Brief Introduction’, , accessed 23 April 2014.
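The idea of tracing individuals who sit one or two links away on a social network can be illustrated with a minimal sketch. It assumes contacts are held as a simple adjacency mapping; the function name `within_hops`, the letter labels and the toy network are hypothetical and are not drawn from any case study discussed at the conference.

```python
from collections import deque


def within_hops(graph, start, max_hops):
    """Return {node: hop_count} for every node reachable from `start`
    in at most `max_hops` links (breadth-first search)."""
    distances = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if distances[node] == max_hops:
            continue  # do not expand beyond the hop limit
        for neighbour in graph.get(node, ()):
            if neighbour not in distances:
                distances[neighbour] = distances[node] + 1
                queue.append(neighbour)
    del distances[start]  # the starting individual is not their own contact
    return distances


# Hypothetical toy network: letters stand in for people,
# and an edge means a known contact between two people.
contacts = {
    "A": {"B", "C"},
    "B": {"A", "D"},
    "C": {"A"},
    "D": {"B", "E"},
    "E": {"D"},
}

reachable = within_hops(contacts, "A", 2)
```

In this toy network, `reachable` holds B and C at one link and D at two links from A, while E, three links away, is left out. A retrospective study of the kind the group proposed would need real historical records and far richer measures than hop count.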

Differing Perceptions of Public and Private Data Collection 

The group noted the very different attitude to privacy with regard to the public 

and private sectors, and also recognised that private-sector companies react 

to public opinion just as much as politicians, but are often able to change 

their procedures and policy more rapidly. Public opinion has a great effect on 

supermarkets, for example, which invest hugely in understanding customer 

psychology. To maintain customer trust (and therefore custom) they have 

to be seen to be acting responsibly with customers’ data. There is still not 

enough understanding of why customers readily sign up for supermarket 

reward cards, often signing away most of their data protection rights in the 

process, when very little information is given on where the information they 

provide will be going or how it will be used. This raised questions within 

the group as to why it appears to be less acceptable to the general public 

for certain government departments to share information as readily as the 

private sector does. The group had little doubt that Virgin Health is likely to 

share customers’ information with Virgin Media, but automatic transfer of 

information from one government department to another is treated with 

suspicion by the public, suggesting that the public and private sectors are 

perceived very differently. Academic research could help to pinpoint what 

these differences are. 

One explanation that was offered was that the public readily understand 

that supermarkets (and other commercial entities) are motivated by selling 

more goods, and are therefore using their data to target advertising to them. 

In general, the public do not object to this as they may well be pleased to 

be alerted to new products they may like. The problem with government data 

collection is that the benefits are often less apparent, leaving the 

public with the perception that the government is trying to catch them out in 

some way – to check if they are paying enough tax, or to make sure they are 

not claiming benefits to which they are not entitled, for example. There are 

therefore few easily understood advantages for the individual in government 

use of such data, but many disadvantages. Better communication of the 

benefits, and perhaps even some obvious rewards, would encourage 

engagement with data projects. A suggestion was made that signing up 

to NHS health databases should enable individuals to receive discounts 

on prescriptions, or to be prioritised on hospital waiting lists above those 

who have not signed up, but other members of the 

group challenged this as unethical, and considerable research into the 

implications would be needed before it could be considered. 

The International Community 

Finally, the group discussed who constitutes the ‘community’ whose security 

might benefit from compromises on personal privacy: the community 

as a whole – that is, everyone – or just the ruling elite. A major issue in 

compromises to personal privacy was the fear of a fascist state using the data 

collected on its citizens to discriminate against them or actively harm them.

This leads to different attitudes towards data collection by the state that are 

influenced by local and national histories: countries that have been subjected 

to harsh regimes in the past may be more cautious about handing over 

their data in future. One delegate gave an example of an Eastern European 

nation setting up its equivalent of the National Archives in a building that 

had previously been the headquarters of its secret police. Though there was 

no suggestion that the new archive had any sinister intention behind it, the 

public’s willingness to engage with it was clouded by the past associations 

of the building in which it was housed. Different cultures react differently to 

the sharing of historical records, and while British citizens expect the state to 

be subservient to them, this view is not held to the same degree across the 

entire European Union. Taking this into account, the relationship between 

the community and the individual may differ in different regions and nations, 

raising new challenges where the community is international, and data 

may be shared across borders. Again, the group felt that 

academia could help to build understanding of these differences, and map 

cultural attitudes that may help to predict acceptance of, or resistance to, 

the creation of new databases and international data sharing. 


• Academics could try to identify links between different criminal or 

anti-social behaviours to help justify carrying out surveillance on 

individuals who are not directly linked to more serious activities but 

may well be one or two links away on overlapping social networks and 

who could help lead to the more serious threats. Academic research 

could help to highlight which activities appear to have definite links 

with one another and which do not 

• Where there is conflict between the need to protect the community 

and the right of an individual to privacy, academia can help to 

determine where the boundaries should lie and suggest how to 

develop a response that is scalable to the actual risk posed. In 

particular, academia can help to determine qualitative and quantitative 

measurements of the risks 

• Historical revisiting of known terrorist and criminal networks using 

modern data analysis techniques, and using these to highlight how 

such information may have helped to prevent terrorist attacks or 

disrupt activity earlier, might help to communicate the benefits 

of large-scale surveillance programmes and explain how and why 

sacrificing some privacy for security is beneficial.

Research Themes Identified in the Discussion Groups 

Discussion Group 1: Legality and Ethics of Data Sharing 

• An in-depth examination is needed of public understanding of the 

surveillance and privacy debate, to provide recommendations that 

will encourage more people to engage in shaping future policy 

• Academic research can help to explain why people’s perceptions of 

what is ‘snooping’ and what is not appear to differ so dramatically 

depending on who is collecting the data 

• Research is needed into how to educate people not to willingly give 

up data without questioning what this might enable others to do with 

those data. Many users do not understand the potential dangers or 

the security vulnerabilities 

• Academia should suggest ways in which different levels of privacy 

settings and data sharing agreements can be built into online systems, 

so that customers genuinely have a choice in whether or not to accept 

the terms they are offered. 

Discussion Group 2: Policing, Terrorism, Crime and Fraud 

• More research is needed into how data analysis (and data analysts) 

can identify and interpret missing data and data on deviations from 

the expected norm. Real-time processing, combined with analysts 

who have an understanding of the context in which the data have 

been collected, will help such trends to be picked out more easily and 

interpreted appropriately. Predictive analytics against the norm are 

needed, which can look at deviant behaviour at an individual level, as 

well as at the wider community level 

• A better understanding is required of how to link data to underlying 

causes, along with methodology that can guard against the negative 

influence of supposition. Techniques need to be developed that 

remodel datasets with excluded or removed data reintegrated, so 

that results can be compared and differences analysed in order to 

test the validity of initial assumptions 

• Better research is needed into ways to remodel data and test 

assumptions so that a more detailed picture can be built of how data 

reflect assumptions. This may help to identify whether some leads 

are currently being missed because of inherent biases in the way the 

data are approached. 
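As a rough illustration of what flagging 'deviations from the expected norm' might look like in its simplest form, the sketch below marks values that lie far from the mean of a series. The function name `flag_deviations`, the threshold of two standard deviations and the sample counts are all assumptions made for illustration. It does not address the harder questions raised above, such as missing data, context or bias in the initial assumptions.

```python
from statistics import mean, stdev


def flag_deviations(values, threshold=2.0):
    """Return the positions of values lying more than `threshold`
    sample standard deviations away from the mean of `values`."""
    mu = mean(values)
    sigma = stdev(values)
    if sigma == 0:
        return []  # every value is identical, so nothing deviates
    return [i for i, v in enumerate(values) if abs(v - mu) / sigma > threshold]


# Hypothetical daily counts of some recorded activity (not real data).
daily_counts = [10, 12, 11, 9, 10, 11, 10, 40, 12, 9]

flagged = flag_deviations(daily_counts)
```

Here only the eighth value (position 7, the count of 40) is flagged. An analyst with knowledge of the context would still have to judge whether the deviation is meaningful or simply an artefact of how the data were collected.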

Discussion Group 3: Health Data, Public Health and Public Health 

Emergencies 

• Research is needed into how easily existing data collection and 

surveillance systems could morph from ‘business as usual’ to

‘public health emergency’ conditions, and what additional features 

or functionality are needed to enable this. How easily can current 

surveillance systems cope with significantly increased data traffic, 

or more frequent updating? How rapidly can different surveillance 

systems be aggregated together and analysed? 

• Further research is needed into what uses people will accept or object 

to regarding their personal health data, and in what circumstances. 

This would help to develop a better understanding of how the way in 

which people are informed of data collection (including who is doing 

the collection, for what reason, and what the data are likely to be used 

for) will influence how they will react to it 

• Research is needed into how best to encourage people to opt in to 

data collection schemes, possibly by offering minor incentives that 

encourage individuals to sign up 

• The volume of health-care data available makes them ideal for 

general pattern analysis. The data should be anonymised (as 

the patterns will emerge whether the data are anonymised or not) 

but the analysis may reveal where targeted interventions will have 

maximum impact; this requires the data to be linked to the individuals 

who will receive that intervention. Research is needed on how and at 

what stage data should be anonymised and de-anonymised during 

the analysis process. 
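One way to picture the last point is keyed pseudonymisation, which is a weaker technique than true anonymisation and is used here only as an illustration. Analysts work on pseudonyms, and a separate data custodian holds the key and lookup table needed to link a finding back to an individual. The function name `pseudonymise`, the key, the identifiers and the records below are all hypothetical.

```python
import hashlib
import hmac
from collections import Counter


def pseudonymise(patient_id, secret_key):
    """Return a repeatable keyed pseudonym (HMAC-SHA256, hex) for `patient_id`."""
    return hmac.new(secret_key, patient_id.encode("utf-8"), hashlib.sha256).hexdigest()


# Hypothetical records: (patient identifier, condition). Not real data.
records = [
    ("patient-001", "flu"),
    ("patient-002", "flu"),
    ("patient-001", "asthma"),
]

# Assumption: the key is held by a data custodian, separately from analysts.
key = b"example-key-held-by-data-custodian"

# The custodian keeps the lookup table; analysts see only the pseudonyms.
lookup = {pseudonymise(pid, key): pid for pid, _ in records}
pseudonymised = [(pseudonymise(pid, key), condition) for pid, condition in records]

# Pattern analysis runs on the pseudonymised rows.
condition_counts = Counter(condition for _, condition in pseudonymised)

# Re-identification for a targeted intervention goes through the custodian.
reidentified = lookup[pseudonymised[0][0]]
```

Because the same identifier always maps to the same pseudonym, records for one person can still be linked during analysis. That linkability is also why pseudonymised data can remain re-identifiable, so the question of when and by whom the lookup may be used is a policy decision, not a technical one.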

Discussion Group 4: Individual Privacy versus Community Safety 

• Academics could try to identify links between different criminal or 

antisocial behaviours to help justify carrying out surveillance on 

individuals who are not directly linked to more serious activities but 

may well be one or two links away on overlapping social networks, 

and who can help lead to the more serious threats. Academic research 

could help to highlight which activities appear to have definite links 

with one another and which do not 

• Where there is conflict between the need to protect the community 

and the right of an individual to privacy, academia can help to 

determine where the boundaries should be and suggest how to 

develop a response that is scalable to the actual risk posed. In 

particular, academia can help to determine qualitative and quantitative 

measurements of the risks 

• Historical revisiting of known terrorist and criminal networks using 

modern data analysis techniques, and using these to highlight how 

such information may have helped to prevent terrorist attacks or 

disrupt activity earlier, might help to communicate the benefits 

of large-scale surveillance programmes and explain how and why 

sacrificing some privacy for security is beneficial.
