Cyber Defense eMagazine December Edition for 2021

Will you stay one step ahead of Cyber Scrooge this year? Learn new ways to protect your family, job, company & data. December Cyber Defense eMagazine: Cyber Deception Month is here...Defeat Cyber Scrooge!

Cyber Defense Magazine December Edition for 2021 in online format #CDM #CYBERDEFENSEMAG @CyberDefenseMag by @Miliefsky, a world-renowned cyber security expert and the Publisher of Cyber Defense Magazine as part of the Cyber Defense Media Group, along with Yan Ross, US Editor-in-Chief, Pierluigi Paganini, International Editor-in-Chief, and many more writers, partners and supporters who make this an awesome publication! Thank you all and to our readers! OSINT ROCKS! #CDM #CDMG #OSINT #CYBERSECURITY #INFOSEC #BEST #PRACTICES #TIPS #TECHNIQUES

See you at RSA Conference 2022 - Our 10th Year Anniversary @RSAC #RSACONFERENCE #USA - Thank you so much!!! - Team CDMG

CDMG is a Carbon Negative and Inclusive Media Group.

4 Pillars of Privacy-Preserving AI

Understanding the privacy challenges that chatbots face requires, first and foremost, a general understanding of the privacy challenges for machine learning systems as a whole. There are four pillars to privacy-preserving AI:

1) Training data privacy: making sure that sensitive or personal information in the training data can't be reconstructed,

2) Input privacy: privacy of the individual whose data you're inferring upon,

3) Model weights privacy: privacy of the model of a particular corporation, institution, or individual who created it. This is about IP protection, but also training data privacy, since it is possible to determine information about the training data from model weight updates,

4) Output privacy: also about protecting the privacy of the individual whose data you're inferring upon.

By collecting private conversations with identifiable individuals and training its models on them, ScatterLab first violated (2) input privacy, then (1) training data privacy, and possibly (4) output privacy.

Training Data Privacy

Much of the research and development these days focuses on training data privacy, in part because of how likely deep learning models are to memorize training data, with the potential of spewing it out in production to unknown parties. The Secret Sharer: Evaluating and Testing Unintended Memorization in Neural Networks [Nicholas Carlini, Chang Liu, Úlfar Erlingsson, Jernej Kos, and Dawn Song. 2019. The secret sharer: Evaluating and testing unintended memorization in neural networks. In 28th USENIX Security Symposium, pages 267–284, Santa Clara, CA. USENIX Association.] by Carlini et al. (2019) is a pivotal paper discussing the problem. They placed a fake social security number into the Penn Treebank dataset as a canary and then trained a character language model on the dataset. They then measured the perplexity of various sequences of numbers and found that the model was less surprised to see the sequences of numbers that made up the canary; i.e., given the training data, the language model had learned that it was more likely to encounter the canary than other random numbers. This is a problem because it shows that the language model memorized the secret.
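The ranking idea behind the canary test can be sketched in a few lines. This is a minimal illustration, not the paper's full pipeline: it assumes we already have negative log-likelihoods for the canary and for a set of decoy sequences from some trained model (all numbers below are made up), and computes the exposure-style score of log2 of the candidate-space size minus log2 of the canary's rank.

```python
import math

def exposure(canary_nll, candidate_nlls):
    # Rank of the canary among all candidate sequences, ordered by model
    # loss (lower negative log-likelihood = less "surprising" to the model).
    rank = 1 + sum(1 for nll in candidate_nlls if nll < canary_nll)
    # Score is maximal when the model finds the inserted canary more
    # likely than every decoy, and 0 when the canary ranks dead last.
    total = len(candidate_nlls) + 1
    return math.log2(total) - math.log2(rank)

# Hypothetical losses: the canary is less surprising than all 9 decoys,
# so it ranks first -- strong evidence the model memorized it.
print(exposure(2.1, [4.0, 3.9, 4.2, 4.5, 3.8, 4.1, 4.4, 4.3, 4.6]))
```

A model that never saw the canary should rank it no better than chance among the decoys, driving the score toward zero.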

Another paper, Extracting Training Data from Large Language Models by Carlini et al. (2020), demonstrates how GPT-2 was actually memorizing data from the pre-training dataset. [Nicholas Carlini, Florian Tramer, Eric Wallace, Matthew Jagielski, Ariel Herbert-Voss, Katherine Lee, Adam Roberts, Tom Brown, Dawn Song, Ulfar Erlingsson, et al. 2020. Extracting training data from large language models. arXiv preprint arXiv:2012.07805.] It had memorized addresses, names, and other information that could be considered sensitive had the data not been publicly available. It is important to keep in mind that these very models will be memorizing that same kind of information from chatbot training data. The paper showed that an extra-large GPT-2 model already started memorizing information after seeing only 33 examples.
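As a rough illustration of what "memorized verbatim" means in practice, one can scan a model's generations for word n-grams that occur verbatim in the training corpus. This is a toy stand-in for the paper's actual extraction and membership-inference pipeline, and the corpus and sample generation below are invented for the example:

```python
def leaked_ngrams(generated: str, training_docs: list[str], n: int = 5) -> list[str]:
    """Return word n-grams of `generated` that appear verbatim in the
    training corpus -- a crude proxy for training-data extraction."""
    corpus = "\n".join(training_docs)
    words = generated.split()
    return [" ".join(words[i:i + n])
            for i in range(len(words) - n + 1)
            if " ".join(words[i:i + n]) in corpus]

# Hypothetical data: the generation regurgitates part of a training line.
docs = ["my social security number is 078 05 1120 thanks"]
sample = "she said my social security number is 078 05 1120 ok"
print(leaked_ngrams(sample, docs))
```

Real extraction attacks are far more subtle (they rank generations by likelihood and filter with a second model), but even this crude substring check surfaces the kind of regurgitated spans the paper reports.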

Privacy issues have also been raised about training non-contextual word embeddings on data containing sensitive information in Exploring the privacy-preserving properties of word embeddings: Algorithmic

Cyber Defense eMagazine – December 2021 Edition 105
Copyright © 2021, Cyber Defense Magazine. All rights reserved worldwide.
