
Cyber Defense eMagazine December Edition for 2021


Will you stay one step ahead of Cyber Scrooge this year? Learn new ways to protect your family, job, company & data. December Cyber Defense eMagazine: Cyber Deception Month is here...Defeat Cyber Scrooge!

Cyber Defense Magazine December Edition for 2021 in online format #CDM #CYBERDEFENSEMAG @CyberDefenseMag by @Miliefsky, a world-renowned cyber security expert and the Publisher of Cyber Defense Magazine, as part of the Cyber Defense Media Group, as well as Yan Ross, US Editor-in-Chief, Pierluigi Paganini, International Editor-in-Chief, and many more writers, partners and supporters who make this an awesome publication! Thank you all and to our readers! OSINT ROCKS! #CDM #CDMG #OSINT #CYBERSECURITY #INFOSEC #BEST #PRACTICES #TIPS #TECHNIQUES

See you at RSA Conference 2022 - Our 10th Year Anniversary @RSAC #RSACONFERENCE #USA - Thank you so much!!! - Team CDMG

CDMG is a Carbon Negative and Inclusive Media Group.


validation study by Abdalla et al. (2020) [Mohamed Abdalla, Moustafa Abdalla, Graeme Hirst, and Frank Rudzicz. 2020. Exploring the privacy-preserving properties of word embeddings: Algorithmic validation study. J Med Internet Res.].

There are four types of disclosure concerns when it comes to protecting data privacy:

● Identity disclosure: identifying an individual.
● Attribute disclosure: identifying an individual’s ethnicity, religion, physical attributes, etc.
● Group attribute disclosure: e.g., is a particular group more likely to have cancer?
● Membership disclosure: e.g., is this person part of a pharmaceutical trial?

Not all attributes carry the same risk of enabling these disclosures. Within conversations with chatbots, users might reveal direct identifiers (e.g., full names, exact addresses, phone numbers, credit card numbers) and quasi-identifiers (e.g., religion, origin, gender). When quasi-identifiers are combined, the risk of re-identifying an individual grows dramatically: each added attribute narrows the pool of matching people, as the sketch below illustrates.
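To see why combining quasi-identifiers is so risky, it helps to count how many people share each combination of values. The toy records below are invented for illustration; the point is that every added attribute splits the population into smaller groups until someone is unique:

```python
from collections import Counter

# Toy records with invented quasi-identifier values (no real people).
records = [
    {"zip": "94107", "gender": "F", "origin": "Brazil"},
    {"zip": "94107", "gender": "F", "origin": "Korea"},
    {"zip": "94107", "gender": "M", "origin": "Brazil"},
    {"zip": "94107", "gender": "M", "origin": "Brazil"},
]

def smallest_group(quasi_ids):
    """Size of the smallest group of records sharing these quasi-identifiers."""
    groups = Counter(tuple(r[q] for q in quasi_ids) for r in records)
    return min(groups.values())

# Each added quasi-identifier shrinks the smallest matching group;
# a group of size 1 means someone is uniquely re-identifiable.
for qids in (["zip"], ["zip", "gender"], ["zip", "gender", "origin"]):
    print(qids, "-> smallest group:", smallest_group(qids))
# ['zip'] -> 4, ['zip', 'gender'] -> 2, all three -> 1
```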

The ScatterLab incident mentioned above is an example of identity and possibly attribute disclosure, though one major issue was actually membership disclosure through identity disclosure. These disclosures were caused by the leak of direct identifiers, and perhaps of quasi-identifiers as well.

Preventing Identity, Attribute, and Membership Disclosures

There are a few solutions for dealing with training data memorization within chatbots. One is differentially private gradient descent (DPGD), which was used in Carlini et al.’s 2019 paper. DPGD adds noise to the ML model training process. The original idea behind differential privacy is to make generalizations about a population without the risk of disclosing any specific individual’s unique information. The goal of adding differential privacy to an algorithm, like a chatbot model, is that if you run the algorithm on two datasets differing by a single entry, the two output distributions are nearly indistinguishable, so no single person’s data meaningfully changes what the model can reveal. DPGD provides mathematical guarantees that rare information is not being memorized by a machine learning model, though often at the expense of model utility.
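In practice, this is typically implemented as differentially private stochastic gradient descent: clip each training example’s gradient so that no single record can dominate an update, then add Gaussian noise calibrated to that clipping bound. Below is a minimal NumPy sketch of one such update step; the function name dp_sgd_step and all parameter values are illustrative assumptions, not taken from the article or from any particular library.

```python
import numpy as np

def dp_sgd_step(params, per_example_grads, clip_norm=1.0,
                noise_multiplier=1.1, lr=0.05, rng=None):
    """One differentially private gradient update (illustrative sketch)."""
    rng = rng or np.random.default_rng()
    # Clip each example's gradient so a single record has bounded influence.
    clipped = [g * min(1.0, clip_norm / (np.linalg.norm(g) + 1e-12))
               for g in per_example_grads]
    # Add Gaussian noise calibrated to the clipping bound, then average.
    noise = rng.normal(0.0, noise_multiplier * clip_norm, size=params.shape)
    noisy_mean = (np.sum(clipped, axis=0) + noise) / len(per_example_grads)
    return params - lr * noisy_mean

# Toy usage: two "per-user" gradients pulling on a single parameter.
params = np.array([0.0])
grads = [np.array([2.5]), np.array([-0.5])]
params = dp_sgd_step(params, grads)
```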

Another solution is highly accurate redaction or de-identification, which means removing the direct identifiers and quasi-identifiers within your training data (e.g., locations, names, telephone numbers). There’s a lot you can gather from a conversation’s context without the need for identifiable information.
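As a simple illustration, pattern-based redaction can catch well-structured direct identifiers such as phone numbers and email addresses; production systems typically also use named-entity recognition for names and locations. The regular expressions below are illustrative assumptions, not a complete de-identification solution.

```python
import re

# Illustrative patterns for two well-structured direct identifiers.
PATTERNS = {
    "PHONE": re.compile(r"\b(?:\+?\d{1,2}[\s.-]?)?\(?\d{3}\)?[\s.-]?\d{3}[\s.-]?\d{4}\b"),
    "EMAIL": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b"),
}

def redact(text):
    """Replace matched identifiers with typed placeholder tokens."""
    for label, pattern in PATTERNS.items():
        text = pattern.sub(f"[{label}]", text)
    return text

print(redact("Call me at 415-555-0132 or write to jane.doe@example.com"))
# -> "Call me at [PHONE] or write to [EMAIL]"
```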

Finally, another option is synthetic personal data generation. This method allows direct and quasi-identifiers to be replaced in a very natural way, so a chatbot’s training data matches the style of the language model’s pre-training dataset, which prevents downstream model accuracy loss. It also has the additional benefit that, if any personally identifiable information is missed, it’s very difficult to tell the original data apart from the synthetic data. Targeted synthetic data generation changes the paradigm of disclosure risk versus data utility.
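A minimal sketch of the surrogate-replacement idea, assuming identifier spans have already been detected upstream (the find_identifiers helper and its hard-coded spans here are hypothetical) and using the open-source Faker library to generate realistic stand-in values:

```python
from faker import Faker  # pip install faker

fake = Faker()

# Hypothetical detector output: (start, end, type) spans that an upstream
# NER/de-identification step would find; hard-coded here for illustration.
def find_identifiers(text):
    return [(11, 19, "NAME"), (32, 44, "PHONE")]

SURROGATES = {
    "NAME": fake.name,
    "PHONE": fake.phone_number,
}

def synthesize(text):
    """Replace each detected identifier with a realistic synthetic value."""
    out, last = [], 0
    for start, end, kind in find_identifiers(text):
        out.append(text[last:start])
        out.append(SURROGATES[kind]())  # natural-looking stand-in
        last = end
    out.append(text[last:])
    return "".join(out)

print(synthesize("My name is Jane Doe, call me on 415-555-0132."))
```

Because the replacements read like ordinary names and numbers, an attacker cannot easily tell which values are surrogates and which, if any, are real identifiers the detector missed.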

If ScatterLab had used any one of these three methods to protect the privacy of their users, they would have avoided violating training data privacy, as well as input and output privacy. Their story inspires

