            continue
        # compute non-zero elements for current row
        counts = np.array([int(x) for x in line.split(',')[1:]])
        nz_col_ids = np.nonzero(counts)[0]
        nz_data = counts[nz_col_ids]
        nz_row_ids = np.repeat(rid, len(nz_col_ids))
        rid += 1
        # add data to big lists
        row_ids.extend(nz_row_ids.tolist())
        col_ids.extend(nz_col_ids.tolist())
        data.extend(nz_data.tolist())
    f.close()
    TD = csr_matrix((np.array(data), (np.array(row_ids), np.array(col_ids))),
        shape=(rid, counts.shape[0]))
    return TD

# read data and convert to Term-Document matrix
TD = download_and_read(UCI_DATA_URL)
# compute undirected, unweighted edge matrix
E = TD.T * TD
# binarize
E[E > 0] = 1

Once we have our sparse, binarized adjacency matrix E, we can generate random walks from each of the vertices. From each node, we construct 32 random walks, each with a maximum length of 40 nodes. The walks have a restart probability of 0.15, which means that at each step the current walk terminates with 15% probability (once it contains more than five nodes), and a fresh walk begins from the same source vertex. The following code constructs the random walks and writes them out to the file given by RANDOM_WALKS_FILE. Note that this is a very slow process; a copy of the output is provided along with the source code for this chapter in case you prefer to skip the random walk generation step:

NUM_WALKS_PER_VERTEX = 32
MAX_PATH_LENGTH = 40
RESTART_PROB = 0.15
RANDOM_WALKS_FILE = os.path.join(DATA_DIR, "random-walks.txt")
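Before we define the walk generator, here is a quick sanity check of the E = TD.T * TD step above. This is a minimal sketch using an invented 3x4 toy matrix (not part of the book's pipeline); it shows that the product connects two vertices (columns of TD) exactly when they have non-zero entries in the same row:

import numpy as np
from scipy.sparse import csr_matrix

# toy matrix: 3 rows x 4 columns; columns become graph vertices
TD_toy = csr_matrix(np.array([
    [1, 1, 0, 0],    # columns 0 and 1 co-occur in this row
    [0, 1, 1, 0],    # columns 1 and 2 co-occur in this row
    [0, 0, 0, 2]]))  # column 3 occurs alone
E_toy = TD_toy.T * TD_toy  # 4x4 co-occurrence counts between columns
E_toy[E_toy > 0] = 1       # binarize, as in the code above
print(E_toy.todense())

Vertices 0 and 2 never share a row, so they are linked only through vertex 1; this is exactly the kind of beyond-immediate-neighbor similarity the random walks will expose. Note also that the diagonal of E is non-zero, so every vertex carries a self-loop and a walk may stay in place for a step.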
print("random walks generated already, skipping")
return
f = open(ofile, "w")
for i in range(E.shape[0]): # for each vertex
if i % 100 == 0:
print("{:d} random walks generated from {:d} vertices"
.format(n * i, i))
for j in range(n): # construct n random walks
curr = i
walk = [curr]
target_nodes = np.nonzero(E[curr])[1]
for k in range(l): # each of max length l
# should we restart?
if np.random.random() < alpha and len(walk) > 5:
break
# choose one outgoing edge and append to walk
try:
curr = np.random.choice(target_nodes)
walk.append(curr)
target_nodes = np.nonzero(E[curr])[1]
except ValueError:
continue
f.write("{:s}\n".format(" ".join([str(x) for x in walk])))
print("{:d} random walks generated from {:d} vertices, COMPLETE"
.format(n * i, i))
f.close()
# construct random walks (caution: very long process!)
construct_random_walks(E, NUM_WALKS_PER_VERTEX, RESTART_PROB,
    MAX_PATH_LENGTH, RANDOM_WALKS_FILE)
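Most of the time in construct_random_walks goes into calling np.nonzero(E[curr]) once per step. One possible way to speed this up, sketched below rather than taken from the book's code, is to precompute the neighbor list of every vertex once, using the .indices attribute that each CSR row exposes:

# hypothetical helper: precompute neighbor lists once, then walk
def precompute_neighbors(E):
    # E[v].indices holds the column indices of the non-zeros in row v
    return [E[v].indices for v in range(E.shape[0])]

neighbors = precompute_neighbors(E)
# inside the walk loop, replace np.nonzero(E[curr])[1] with:
# target_nodes = neighbors[curr]

This trades a one-time pass over the matrix for removing a sparse-row scan from every step of every walk.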
A few lines from the RANDOM_WALKS_FILE are shown below. You could imagine that these look like sentences in a language whose vocabulary is the set of all node IDs in our graph. We have learned that word embeddings exploit the structure of language to generate a distributional representation for words. Graph embedding schemes such as DeepWalk and node2vec do exactly the same thing with these "sentences" created out of random walks. Such embeddings can capture similarities between nodes of a graph that go beyond immediate neighbors, as we shall see:
0 1405 4845 754 4391 3524 4282 2357 3922 1667
0 1341 456 495 1647 4200 5379 473 2311
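To turn these walks into vertex embeddings, the DeepWalk recipe is to feed them to Word2Vec exactly as if they were sentences. A minimal sketch using gensim follows; the output file name and the hyperparameter values here are illustrative choices, not fixed requirements, and in gensim 4.x the size parameter is named vector_size:

from gensim.models import Word2Vec
from gensim.models.word2vec import LineSentence

# each line of the walks file is one "sentence" of node IDs
walks = LineSentence(RANDOM_WALKS_FILE)
model = Word2Vec(
    walks,
    size=128,     # embedding dimension (vector_size in gensim >= 4.0)
    window=10,    # context window over each walk
    sg=1,         # skip-gram, as in DeepWalk
    min_count=2,
    workers=4)
# hypothetical output path for the trained model
model.save(os.path.join(DATA_DIR, "vertex-embeddings.model"))

Since LineSentence yields string tokens, the learned vocabulary is keyed by node IDs as strings, so a query such as model.wv.most_similar("1405") returns the nodes whose walk contexts most resemble those of node 1405.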