
Learning Data Mining with Python


Chapter 7

Next, we only add the edge if its weight is at or above a certain threshold. This lets us skip edges we don't care about, for example, edges with weight 0, once the threshold is raised. By default, our threshold is 0 and the comparison is inclusive, so we include all edges for now, even those with weight 0. However, we will use this parameter later in the chapter. The code is as follows:

    if weight >= threshold:

If the weight is at or above the threshold, we add the two users to the graph (they won't be added as a duplicate if they are already in the graph):

        G.add_node(user1)
        G.add_node(user2)

We then add the edge between them, setting the weight to be the computed similarity:

        G.add_edge(user1, user2, weight=weight)

Once the loops have finished, we have a completed graph and we return it from the function:

    return G

We can now create a graph by calling this function. We start with no threshold, which means all links are created. The code is as follows:

    G = create_graph(friends)

The result is a very densely connected graph: all nodes have edges, although many of those will have a weight of 0. We will see the weight of the edges by drawing the graph with line widths relative to the weight of the edge, so thicker lines indicate higher weights.

Due to the number of nodes, it makes sense to make the figure larger to get a clearer sense of the connections:

    plt.figure(figsize=(10,10))

We are going to draw the edges with a weight, so we need to draw the nodes first. NetworkX uses layouts to determine where to put the nodes and edges, based on certain criteria. Visualizing networks is a very difficult problem, especially as the number of nodes grows. Various techniques exist for visualizing networks, but the degree to which they work depends heavily on your dataset, personal preferences, and the aim of the visualization. I found that the spring_layout worked quite well, but other options such as circular_layout (which is a good default if nothing else works), random_layout, shell_layout, and spectral_layout also exist.
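To see how the graph-building fragments from the start of this section fit together, here is a minimal sketch of a complete create_graph function. The excerpt does not show the loops or how the weight is computed, so the jaccard_similarity helper, the use of itertools.combinations to visit each pair once, and the tiny friends dictionary are all assumptions made for illustration and may differ from the chapter's own code.

```python
import itertools

import networkx as nx


def jaccard_similarity(friends1, friends2):
    """Hypothetical similarity measure: shared friends / all friends."""
    union = friends1 | friends2
    if not union:
        return 0.0
    return len(friends1 & friends2) / len(union)


def create_graph(friends, threshold=0):
    """Build a weighted graph of users, keeping edges at or above threshold."""
    G = nx.Graph()
    # Assumption: visit each unordered pair of users exactly once.
    for user1, user2 in itertools.combinations(friends, 2):
        weight = jaccard_similarity(friends[user1], friends[user2])
        if weight >= threshold:
            # add_node is a no-op if the node is already in the graph
            G.add_node(user1)
            G.add_node(user2)
            G.add_edge(user1, user2, weight=weight)
    return G


# Made-up data: each user maps to the set of accounts they follow.
friends = {
    "alice": {"bob", "carol", "dave"},
    "erin": {"bob", "carol"},
    "frank": {"zoe"},
}

G = create_graph(friends)                        # threshold 0: every pair is linked
G_strict = create_graph(friends, threshold=0.5)  # only similar users are linked
print(G.number_of_edges(), G_strict.number_of_edges())
```

With the default threshold, even pairs with no shared friends get an edge of weight 0. Raising the threshold drops those edges, and a user with no sufficiently similar partner never enters the graph at all, because nodes are only added alongside an edge.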

Discovering Accounts to Follow Using Graph Mining

Visit http://networkx.lanl.gov/reference/drawing.html for more details on layouts in NetworkX. Although it adds some complexity, the draw_graphviz option works quite well and is worth investigating for better visualizations, particularly in real-world uses.

Let's use spring_layout for visualization:

    pos = nx.spring_layout(G)

Using our pos layout, we can then position the nodes:

    nx.draw_networkx_nodes(G, pos)

Next, we draw the edges. To get the weights, we iterate over the edges in the graph (in the order that G.edges returns them, which is the order draw_networkx_edges uses by default) and collect the weights:

    edgewidth = [d['weight'] for (u, v, d) in G.edges(data=True)]

We then draw the edges:

    nx.draw_networkx_edges(G, pos, width=edgewidth)

The result will depend on your data, but it will typically show a graph with a large set of nodes connected quite strongly and a few nodes poorly connected to the rest of the network.
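The drawing steps above can be gathered into one short script. This is only a sketch under a few assumptions: the three-node graph and its weights are invented, the Agg backend is selected so the script runs without a display, the seed argument to spring_layout is there purely for repeatable positions, and the scale factor of 5 on the line widths is an illustrative choice that the chapter does not use.

```python
import io

import matplotlib
matplotlib.use("Agg")  # assumption: render off-screen, no display needed
import matplotlib.pyplot as plt
import networkx as nx

# Invented stand-in for the similarity graph built earlier.
G = nx.Graph()
G.add_edge("a", "b", weight=0.8)
G.add_edge("b", "c", weight=0.3)
G.add_edge("a", "c", weight=0.0)

fig = plt.figure(figsize=(10, 10))

# A layout is a dict mapping each node to an (x, y) position.
pos = nx.spring_layout(G, seed=42)

# Draw the nodes first, then the edges on top of the same layout.
nx.draw_networkx_nodes(G, pos)

# One width per edge, in the order G.edges yields them.
# The factor 5 only makes small weights visible; it is not from the text.
edgewidth = [5 * d["weight"] for (u, v, d) in G.edges(data=True)]
nx.draw_networkx_edges(G, pos, width=edgewidth)

plt.axis("off")

# Write the figure to an in-memory buffer; pass a filename to save to disk.
buffer = io.BytesIO()
fig.savefig(buffer, format="png")
```

Edges with weight 0 get a line width of 0 and effectively disappear from the picture, which is one reason a threshold above 0 becomes useful once the graph is dense.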

