Learning Data Mining with Python
Discovering Accounts to Follow Using Graph Mining

Lots of things can be represented as graphs. This is particularly true in this day of Big Data, online social networks, and the Internet of Things. In particular, online social networks are big business, with sites such as Facebook that have over 500 million active users (50 percent of them log in each day). These sites often monetize themselves through targeted advertising. However, for users to be engaged with a website, they often need to follow interesting people or pages.

In this chapter, we will look at the concept of similarity and how we can create graphs based on it. We will also see how to split this graph up into meaningful subgraphs using connected components. This simple algorithm introduces the concept of cluster analysis: splitting a dataset into subsets based on similarity. We will investigate cluster analysis in more depth in Chapter 10, Clustering News Articles.

The topics covered in this chapter include:
• Creating graphs from social networks
• Loading and saving built classifiers
• The NetworkX package
• Converting graphs to matrices
• Distance and similarity
• Optimizing parameters based on scoring functions
• Loss functions and scoring functions
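To make the idea of splitting a similarity graph into connected components concrete, here is a minimal standard-library sketch. It links two users whenever their similarity score is at least a threshold, then collects connected components with a breadth-first search. The user names, scores, and the `threshold` value are made-up assumptions, and this hand-rolled search is only an illustration; the chapter introduces the NetworkX package for this kind of graph work.

```python
from collections import deque


def connected_components(nodes, weighted_edges, threshold):
    """Group nodes into connected components, keeping only edges whose
    similarity weight is at least `threshold`."""
    # Build an undirected adjacency list from the sufficiently similar pairs
    adjacency = {node: set() for node in nodes}
    for a, b, weight in weighted_edges:
        if weight >= threshold:
            adjacency[a].add(b)
            adjacency[b].add(a)

    components = []
    seen = set()
    for start in nodes:
        if start in seen:
            continue
        # Breadth-first search collects every node reachable from `start`
        component = set()
        queue = deque([start])
        seen.add(start)
        while queue:
            node = queue.popleft()
            component.add(node)
            for neighbour in adjacency[node]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    queue.append(neighbour)
        components.append(component)
    return components


# Hypothetical users and pairwise similarity scores
users = ["alice", "bob", "carol", "dave", "erin"]
similarities = [
    ("alice", "bob", 0.6),
    ("bob", "carol", 0.4),
    ("dave", "erin", 0.5),
    ("carol", "dave", 0.05),
]
# With threshold=0.1 the weak carol-dave link is dropped,
# leaving two groups: alice/bob/carol and dave/erin
print(connected_components(users, similarities, threshold=0.1))
```

Raising the threshold breaks the graph into more, smaller components; lowering it merges them. That sensitivity is why a threshold like this is the kind of parameter worth optimizing against a scoring function.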
Loading the dataset
In this chapter, our task is to recommend users to one another on online social networks, based on shared connections. Our logic is that if two users have the same friends, they are highly similar and worth recommending to each other.
We are going to create a small social graph from Twitter using the API we introduced in the previous chapter. The data we are looking for is a subset of users interested in a similar topic (again, the Python programming language) and a list of all of their friends (people they follow). With this data, we will check how similar two users are, based on how many friends they have in common.
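The similarity measure described above can be sketched as follows, assuming each user's friends are stored as a set of followed account IDs. Counting shared friends is the raw measure the text describes. Dividing by the number of distinct friends across both users (the Jaccard similarity) is one common normalization that stops users who follow thousands of accounts from looking similar to everyone; it is shown here as an option, not as the chapter's final choice. The user names and IDs are invented.

```python
def shared_friends(friends_a, friends_b):
    """Number of accounts that both users follow."""
    return len(friends_a & friends_b)


def jaccard_similarity(friends_a, friends_b):
    """Shared friends divided by the total number of distinct friends."""
    union = friends_a | friends_b
    if not union:
        return 0.0  # neither user follows anyone
    return len(friends_a & friends_b) / len(union)


# Hypothetical friend lists (sets of followed account IDs)
friends = {
    "alice": {1, 2, 3, 4},
    "bob": {3, 4, 5},
    "carol": {6},
}
print(shared_friends(friends["alice"], friends["bob"]))      # → 2
print(jaccard_similarity(friends["alice"], friends["bob"]))  # → 0.4
```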
There are many other online social networks apart from Twitter. We have chosen Twitter for this experiment because its API makes this sort of information quite easy to get. The information is also available from other sites, such as Facebook, LinkedIn, and Instagram, but it is more difficult to obtain there.
To start collecting data, set up a new IPython Notebook and a connection to Twitter, as we did in the previous chapter. You can reuse the app information from the previous chapter or create a new app:
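import twitter  # the twitter package used in the previous chapter
# Fill in these values from your Twitter app's settings
consumer_key = ""
consumer_secret = ""
access_token = ""
access_token_secret = ""
# Note the argument order: the access token pair comes first
authorization = twitter.OAuth(access_token, access_token_secret,
                              consumer_key, consumer_secret)
# retry=True asks the client to retry failed calls (e.g. when rate limited)
t = twitter.Twitter(auth=authorization, retry=True)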
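Pasting keys directly into a notebook makes them easy to leak if the notebook is shared. One alternative, sketched here, is to read the four values from environment variables. The variable names (TWITTER_CONSUMER_KEY and so on) and the helper name are our own hypothetical choices, not something the twitter package looks for; the function only gathers the strings, which you would then pass to twitter.OAuth in the same order as above.

```python
import os

# Hypothetical environment variable names; pick whatever suits your setup
CREDENTIAL_VARS = (
    "TWITTER_ACCESS_TOKEN",
    "TWITTER_ACCESS_TOKEN_SECRET",
    "TWITTER_CONSUMER_KEY",
    "TWITTER_CONSUMER_SECRET",
)


def load_twitter_credentials(environ=os.environ):
    """Return the four credentials in the order twitter.OAuth takes them:
    (access_token, access_token_secret, consumer_key, consumer_secret).
    Raises KeyError naming any variable that is missing or empty."""
    missing = [name for name in CREDENTIAL_VARS if not environ.get(name)]
    if missing:
        raise KeyError("Missing credentials: " + ", ".join(missing))
    return tuple(environ[name] for name in CREDENTIAL_VARS)


# Usage with a stand-in dictionary instead of the real environment
fake_env = {name: "value-for-" + name for name in CREDENTIAL_VARS}
access_token, access_token_secret, consumer_key, consumer_secret = \
    load_twitter_credentials(fake_env)
# authorization = twitter.OAuth(access_token, access_token_secret,
#                               consumer_key, consumer_secret)
```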
Also, create the output filename:
import os
# Data is stored under ~/Data/twitter; change this to suit your setup
data_folder = os.path.join(os.path.expanduser("~"), "Data",
                           "twitter")
output_filename = os.path.join(data_folder, "python_tweets.json")
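The code above only builds a path; it does not create the folder, so opening output_filename for writing fails with FileNotFoundError if ~/Data/twitter does not exist yet. A small sketch that creates the folder first, using os.makedirs with exist_ok=True so repeated runs are harmless. The helper name prepare_output_path is hypothetical, and the demonstration uses a temporary directory rather than your home folder.

```python
import os
import tempfile


def prepare_output_path(base_folder, filename):
    """Create base_folder (and any missing parents) if needed, then return
    the full path of `filename` inside it."""
    os.makedirs(base_folder, exist_ok=True)  # no error if it already exists
    return os.path.join(base_folder, filename)


# Demonstrate inside a throwaway directory instead of the home folder
with tempfile.TemporaryDirectory() as tmp:
    folder = os.path.join(tmp, "Data", "twitter")
    path = prepare_output_path(folder, "python_tweets.json")
    folder_created = os.path.isdir(folder)
    print(path)
```

In the notebook itself, you would call prepare_output_path(data_folder, "python_tweets.json") in place of the last line of the code above.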
We will also need the json library to save our data:
import json
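As a sketch of how json might be used here, assume the collected data is a dictionary mapping each user to the set of IDs they follow. Sets are not JSON-serializable, so this version converts them to sorted lists when saving and back to sets when loading. The file layout (one JSON object with a "friends" key) and the function names are assumptions for illustration, not necessarily the format used later in the chapter.

```python
import json
import os
import tempfile


def save_friends(friends, filename):
    """Write {user: set_of_ids} to disk as JSON (sets become sorted lists)."""
    serializable = {user: sorted(ids) for user, ids in friends.items()}
    with open(filename, "w") as outf:
        json.dump({"friends": serializable}, outf)


def load_friends(filename):
    """Read a file written by save_friends, restoring sets of IDs."""
    with open(filename) as inf:
        data = json.load(inf)
    return {user: set(ids) for user, ids in data["friends"].items()}


# Round-trip some hypothetical data through a temporary file
friends = {"alice": {3, 1, 2}, "bob": {2, 5}}
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "friends.json")
    save_friends(friends, path)
    restored = load_friends(path)
print(restored == friends)  # → True
```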