Machine Learning with Graphs: 4 Graph as Matrix: PageRank, Random Walks and Embeddings
Stanford CS224W: Machine Learning with Graphs, Fall 2021
1. Overview
Treating a graph as a matrix allows us to:

- Determine node importance via random walks (PageRank)
- Obtain node embeddings via matrix factorization (MF)
- View other node embeddings (e.g. node2vec) as MF

Random walks, matrix factorization, and node embeddings are closely related!
2. PageRank (aka the Google Algorithm)
2.1 Modelling
2.1.1 Link Analysis Algorithms
Link analysis approaches to compute the importance of nodes in a graph:

- PageRank
- Personalized PageRank (PPR)
- Random Walk with Restarts
2.1.2 Links as Votes
Idea: links as votes. A page is more important if it has more in-links.
Links from important pages count more: a recursive question!
2.1.3 PageRank: The "Flow" Model
A "vote" from an important page is worth more:

- Each link's vote is proportional to the importance of its source page
- If page \(i\) with importance \(r_i\) has \(d_i\) out-links, each link gets \(r_i / d_i\) votes
- Page \(j\)'s own importance \(r_j\) is the sum of the votes on its in-links

A page is important if it is pointed to by other important pages.
Define the "rank" \(r_j\) for node \(j\):
\[r_j = \sum_{i \to j} \frac{r_i}{d_i}\]
where \(d_i\) is the out-degree of node \(i\).
Note : using Gaussian elimination to solve this system of linear equations is a bad idea!
2.1.4 PageRank: Matrix Formulation
Stochastic adjacency matrix \(\boldsymbol{M}\):

- \(d_i\) is the out-degree of node \(i\)
- \(\boldsymbol{M}\) is a column stochastic matrix: every column sums to 1
- If \(i \to j\), then \(m_{ji} = \frac{1}{d_i}\), otherwise \(m_{ji} = 0\):
\[m_{ji} = \begin{cases} 1/d_i, & \text{if } i \to j \\ 0, & \text{otherwise} \end{cases}\]
- \(\boldsymbol{M} = (\boldsymbol{K} \boldsymbol{A})^{\top}\), where \(\boldsymbol{A}\) is the adjacency matrix and \(\boldsymbol{K}\) is the diagonal matrix with diagonal \(\left[\frac{1}{d_1}, \frac{1}{d_2}, \cdots, \frac{1}{d_N} \right]\), taking \(\frac{1}{d_i}=0\) if \(d_i=0\)

Rank vector \(\boldsymbol{r}\): an entry per page:

- \(r_i\) is the importance score of page \(i\)
- \(\sum_i r_i = 1\)
The flow equations can be written in matrix form:
\[\boldsymbol{r} = \boldsymbol{M} \cdot \boldsymbol{r}\]
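A quick NumPy sanity check of the construction \(\boldsymbol{M} = (\boldsymbol{K}\boldsymbol{A})^{\top}\); the 3-node directed graph is an illustrative example, not from the lecture:

```python
import numpy as np

# Toy directed graph on 3 nodes: 0 -> 1, 0 -> 2, 1 -> 2, 2 -> 0
A = np.array([[0, 1, 1],
              [0, 0, 1],
              [1, 0, 0]], dtype=float)   # A[i, j] = 1 if i -> j

d = A.sum(axis=1)                        # out-degrees d_i
K = np.diag(np.where(d > 0, 1.0 / np.maximum(d, 1), 0.0))  # 1/d_i (0 for dead ends)
M = (K @ A).T                            # column stochastic: M[j, i] = 1/d_i if i -> j

print(M.sum(axis=0))                     # each column sums to 1 (no dead ends here)
```

Column \(i\) of \(\boldsymbol{M}\) spreads page \(i\)'s importance evenly over its out-links, which is exactly the flow model above.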
2.1.5 Connection to Random Walks
Imagine a random web surfer:

- At any time \(t\), the surfer is on some page \(i\)
- At time \(t+1\), the surfer follows an out-link from \(i\) uniformly at random
- Ends up on some page \(j\) linked from \(i\)
- Process repeats indefinitely

Let \(\boldsymbol{p}(t)\) be the vector whose \(i\)th coordinate is the probability that the surfer is on page \(i\) at time \(t\). Thus, \(\boldsymbol{p}(t)\) is a probability distribution over pages.
2.1.6 The Stationary Distribution
Where is the surfer at time \(t+1\)? It follows a link uniformly at random:
\[\boldsymbol{p}(t+1) = \boldsymbol{M} \cdot \boldsymbol{p}(t)\]
Suppose the random walk reaches a state
\[\boldsymbol{p}(t+1) = \boldsymbol{M} \cdot \boldsymbol{p}(t) = \boldsymbol{p}(t)\]
then \(\boldsymbol{p}(t)\) is a stationary distribution of the random walk.
The original rank vector \(\boldsymbol{r}\) satisfies \(\boldsymbol{r} = \boldsymbol{M} \cdot \boldsymbol{r}\).
Thus, \(\boldsymbol{r}\) is a stationary distribution for the random walk.
2.1.7 Eigenvector Formulation
The flow equation
\[1 \cdot \boldsymbol{r} = \boldsymbol{M} \cdot \boldsymbol{r}\]
shows that the rank vector \(\boldsymbol{r}\) is an eigenvector of the stochastic adjacency matrix \(\boldsymbol{M}\) with eigenvalue \(1\).

- Starting from any vector \(\boldsymbol{u}\), the limit \(\boldsymbol{M}(\boldsymbol{M}(\cdots \boldsymbol{M}(\boldsymbol{M} \boldsymbol{u})))\) is the long-term distribution of the surfers
- PageRank = limiting distribution = principal eigenvector of \(\boldsymbol{M}\)
- Note: if \(\boldsymbol{r}\) is the limit of the product \(\boldsymbol{M} \boldsymbol{M} \cdots \boldsymbol{M} \boldsymbol{u}\), then \(\boldsymbol{r}\) satisfies the flow equation \(1 \cdot \boldsymbol{r} = \boldsymbol{M} \cdot \boldsymbol{r}\), so \(\boldsymbol{r}\) is the principal eigenvector of \(\boldsymbol{M}\) with eigenvalue \(1\)
- We can now efficiently solve for \(\boldsymbol{r}\): the power iteration method
2.1.8 Summary
PageRank measures the importance of nodes in a graph using the link structure of the web.
It models a random web surfer using the stochastic adjacency matrix \(\boldsymbol{M}\).
PageRank solves \(\boldsymbol{r} = \boldsymbol{M}\boldsymbol{r}\), where \(\boldsymbol{r}\) can be viewed as both the principal eigenvector of \(\boldsymbol{M}\) and the stationary distribution of a random walk over the graph.
2.2 Solution
2.2.1 Power Iteration Method
Given a web graph with \(N\) nodes, where the nodes are pages and edges are hyperlinks
Power iteration: a simple iterative scheme:

- Step 1: Initialize \(\boldsymbol{r}^{(0)} = \left[\frac{1}{N}, \frac{1}{N}, \cdots, \frac{1}{N} \right]^{\top}\)
- Step 2: Iterate \(\boldsymbol{r}^{(t+1)} = \boldsymbol{M} \cdot \boldsymbol{r}^{(t)}\), i.e.
\[r_j^{(t+1)} = \sum_{i \to j} \frac{r_i^{(t)}}{d_i}\]
where \(d_i\) is the out-degree of node \(i\)
- Step 3: Stop when \(|\boldsymbol{r}^{(t+1)} - \boldsymbol{r}^{(t)}|_1 < \varepsilon\), where \(|\boldsymbol{x}|_1 = \sum_{i=1}^{N}|x_i|\) is the L1 norm (other vector norms, e.g. Euclidean, can also be used)

About 50 iterations is sufficient to estimate the limiting solution.
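The three steps above can be sketched directly in NumPy; the 3-node matrix is an illustrative toy example:

```python
import numpy as np

def power_iteration(M, eps=1e-6, max_iter=100):
    """Iterate r <- M r until the L1 change falls below eps."""
    N = M.shape[0]
    r = np.full(N, 1.0 / N)                  # Step 1: uniform initialization
    for _ in range(max_iter):
        r_next = M @ r                       # Step 2: one step of the flow equations
        if np.abs(r_next - r).sum() < eps:   # Step 3: L1 stopping criterion
            return r_next
        r = r_next
    return r

# Toy graph: 0 -> {1, 2}, 1 -> 2, 2 -> 0 (column stochastic M)
M = np.array([[0.0, 0.0, 1.0],
              [0.5, 0.0, 0.0],
              [0.5, 1.0, 0.0]])
r = power_iteration(M)
print(r, r.sum())                            # importance scores, summing to 1
```

Because \(\boldsymbol{M}\) is column stochastic, the entries of \(\boldsymbol{r}\) keep summing to 1 at every iteration.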
2.2.2 Problems of PageRank
Two problems of PageRank
(1) Some pages are dead ends (have no out-links)
- Such pages cause importance to "leak out"
(2) Spider traps (all out-links are within the group)
- Eventually spider traps absorb all importance
2.2.3 Solution to Spider Traps
Solution to the "spider trap" problem: at each time step, the random surfer has two options:

- (1) With probability \(\beta\), follow a link at random
- (2) With probability \(1-\beta\), jump to a random page

Common values for \(\beta\) are in the range 0.8 to 0.9.
The surfer will teleport out of a spider trap within a few time steps.
2.2.4 Solution to Dead Ends
The "Dead end" problem:
Teleports : Follow random teleport links with total probability 1.0 from dead-ends
- Adjust matrix accordingly
2.2.5 Teleports
Why are dead ends and spider traps a problem, and why do teleports solve it?

- Spider traps are not a problem, but with traps the PageRank scores are not what we want
- Solution: never get stuck in a spider trap by teleporting out of it in a finite number of steps
- Dead ends are a problem: the matrix is not column stochastic, so our initial assumptions are not met
- Solution: make the matrix column stochastic by always teleporting when there is nowhere else to go
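The dead-end fix (make the matrix column stochastic by always teleporting from dead ends) can be sketched in NumPy; the 3-node matrix with a dead end at node 2 is an illustrative example:

```python
import numpy as np

def fix_dead_ends(M):
    """Replace all-zero columns (dead ends) with a uniform teleport column."""
    M = M.copy()
    N = M.shape[0]
    dead = M.sum(axis=0) == 0        # columns that distribute no importance
    M[:, dead] = 1.0 / N             # always teleport from a dead end
    return M

# Node 2 is a dead end: its column in M is all zeros
M = np.array([[0.0, 0.0, 0.0],
              [0.5, 0.0, 0.0],
              [0.5, 1.0, 0.0]])
print(fix_dead_ends(M).sum(axis=0))  # every column now sums to 1
```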
2.2.6 Random Teleports
The original formulation assumes that \(\boldsymbol{M}\) has no dead ends. We can either:

- preprocess matrix \(\boldsymbol{M}\) to remove all dead ends
- or explicitly follow random teleport links with probability 1.0 from dead ends

Google's solution that does it all: at each step, the random surfer has two options:

- (1) With probability \(\beta\), follow a link at random
- (2) With probability \(1-\beta\), jump to a random page

PageRank equation (Brin-Page, 98):
\[r_j = \sum_{i \to j} \beta \frac{r_i}{d_i} + (1-\beta) \frac{1}{N}\]
where \(d_i\) is the out-degree of node \(i\).
The Google matrix \(\boldsymbol{G}\):
\[\boldsymbol{G} = \beta \boldsymbol{M} + (1-\beta) \left[\frac{1}{N} \right]_{N \times N}\]
where \(\left[\frac{1}{N} \right]_{N \times N}\) is an \(N\) by \(N\) matrix in which all entries are \(\frac{1}{N}\).
Then we have a recursive problem \(\boldsymbol{r} = \boldsymbol{G} \cdot \boldsymbol{r}\), and the power method still works.
In practice \(\beta = 0.8 \sim 0.9\) (make 5 steps on average, then jump).
2.2.7 Summary: Solving PageRank
PageRank solves \(\boldsymbol{r} = \boldsymbol{G} \boldsymbol{r}\) and can be efficiently computed by power iteration of the Google matrix \(\boldsymbol{G}\).
Adding random uniform teleportation solves issues of dead-ends and spider-traps
2.3 Implementation
Code example: PageRank
import sys
import networkx as nx
import numpy as np

def page_rank(graph, alpha=0.85, weight=None, tol=1e-4, max_iter=1000):
    node_list = list(graph.nodes())
    node_num = len(node_list)
    adj_mat = nx.adjacency_matrix(graph, weight=weight).toarray()
    # reciprocal out-degrees (0 for dead ends)
    out_degree = [d for _, d in graph.out_degree(node_list, weight=weight)]
    out_degree = [1. / v if v > 0. else 0. for v in out_degree]
    # column-stochastic matrix M = (K A)^T
    M_mat = np.dot(np.diag(out_degree), adj_mat).T
    # Google matrix G = alpha * M + (1 - alpha) / N
    Google_mat = alpha * M_mat + ((1 - alpha) / node_num) * np.ones((node_num, node_num))
    # initialize the PR vector uniformly
    pr = (1 / node_num) * np.ones(node_num)
    for n in range(max_iter):
        pr_old = pr.copy()
        # update the PR vector: r <- G r
        pr = np.dot(Google_mat, pr_old)
        # L1 change between successive iterates
        cur_tol = np.sum(np.abs(pr - pr_old))
        if cur_tol <= tol:
            break
    else:
        print('Warning: iteration did not converge; the target tolerance is {0} '
              'while the current tolerance is {1}'.format(tol, cur_tol))
        sys.exit(0)
    pr = {k: v for k, v in zip(node_list, pr.tolist())}
    return pr
# ===========================================================================
Code example: the networkx.pagerank() function
Note: in networkx.pagerank(), the tol parameter is effectively len(G) * tol.
import networkx as nx
import pandas as pd

tol = 1e-7
pr = page_rank(graph, alpha=0.85, tol=len(graph) * tol, max_iter=1000)
pr_1 = pd.Series(pr)
pr = nx.pagerank(graph, alpha=0.85, max_iter=100, tol=tol)
pr_2 = pd.Series(pr)
delta = pr_1 - pr_2
# relative differences between the two implementations
(delta / pr_2).abs().sort_values(ignore_index=True).plot()
3. Random Walk with Restarts and Personalized PageRank
3.1 Example: Recommendation
Given: a bipartite graph representing user and item interactions (e.g. purchases)
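Random Walk with Restarts can be simulated directly: walk over the bipartite graph, restarting at the query node with some probability, and rank items by visit counts. A minimal sketch; the users "A"/"B", items "i1".."i4", and all parameters are hypothetical illustrations:

```python
import random

# Hypothetical bipartite interactions: users "A", "B" and items "i1".."i4"
edges = [("A", "i1"), ("A", "i2"), ("B", "i2"), ("B", "i3"), ("B", "i4")]
neighbors = {}
for u, v in edges:
    neighbors.setdefault(u, []).append(v)
    neighbors.setdefault(v, []).append(u)

def rwr_scores(query, alpha=0.15, steps=50_000, seed=0):
    """Simulate a walk that restarts at `query` with probability alpha.
    Normalized visit counts approximate Personalized PageRank / RWR
    scores with respect to the query node."""
    rng = random.Random(seed)
    counts = {v: 0 for v in neighbors}
    node = query
    for _ in range(steps):
        if rng.random() < alpha:
            node = query                        # restart at the query node
        else:
            node = rng.choice(neighbors[node])  # follow a random edge
        counts[node] += 1
    return {v: c / steps for v, c in counts.items()}

scores = rwr_scores("A")   # items close to user "A" get the highest scores
```

Items directly connected to the query user (here "i1", "i2") accumulate more visits than items reachable only through other users.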
4. Matrix Factorization and Node Embeddings
4.1 Embeddings & Matrix Factorization
Node embedding: the encoder is an embedding lookup.
Embedding matrix \(\boldsymbol{Z}\):

- each column represents the embedding vector for a specific node
- the number of rows equals the dimension / size of the embeddings

Objective: maximize \(\boldsymbol{z}_v^{\top} \boldsymbol{z}_u\) for node pairs \((u, v)\) that are similar.
4.2 Connection to Matrix Factorization
Simplest node similarity: nodes \(u, v\) are similar if they are connected by an edge.

- This means: \(\boldsymbol{z}_v^{\top} \boldsymbol{z}_u = A_{u,v}\), the \((u,v)\) entry of the graph adjacency matrix \(\boldsymbol{A}\)
- Therefore, \(\boldsymbol{Z}^{\top} \boldsymbol{Z} = \boldsymbol{A}\)
4.3 Matrix Factorization
The embedding dimension \(d\) (the number of rows in \(\boldsymbol{Z}\)) is much smaller than the number of nodes \(n\).
Exact factorization \(\boldsymbol{A} = \boldsymbol{Z}^{\top} \boldsymbol{Z}\) is generally not possible.
However, we can learn \(\boldsymbol{Z}\) approximately.
Objective: \(\min \limits_{\boldsymbol{Z}} \| \boldsymbol{A} - \boldsymbol{Z}^{\top} \boldsymbol{Z} \|_2\)

- We optimize \(\boldsymbol{Z}\) so that it minimizes the L2 norm (Frobenius norm) of \(\boldsymbol{A} - \boldsymbol{Z}^{\top} \boldsymbol{Z}\)
- Note that Lecture 3 used the softmax instead of the L2 norm, but the goal of approximating \(\boldsymbol{A}\) with \(\boldsymbol{Z}^{\top} \boldsymbol{Z}\) is the same
Conclusion : Inner product decoder with node similarity defined by edge connectivity is equivalent to matrix factorization of \(\boldsymbol{A}\).
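The objective above can be minimized directly by gradient descent on \(\boldsymbol{Z}\). A minimal NumPy sketch; the toy graph, learning rate, and iteration count are illustrative assumptions:

```python
import numpy as np

# Adjacency matrix of a small undirected graph; similarity = edge connectivity
A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)

d, n = 2, A.shape[0]                    # embedding dimension d << n
rng = np.random.default_rng(0)
Z = rng.normal(scale=0.1, size=(d, n))  # columns are node embeddings

lr = 0.01
for _ in range(2000):
    R = A - Z.T @ Z                     # residual of the approximation
    Z += lr * 2 * (Z @ R)               # descent step on ||A - Z^T Z||_F^2
                                        # (exact gradient is -4 Z R; constants folded into lr)
loss = np.linalg.norm(A - Z.T @ Z)      # cannot reach 0: Z^T Z is PSD, A is not
```

The residual loss stays well above zero, illustrating why exact factorization is generally impossible at small \(d\).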
4.4 Random Walk-based Similarity
DeepWalk and node2vec have a more complex node similarity definition based on random walks.
DeepWalk is equivalent to matrix factorization of the following complex matrix expression (Qiu et al., 2018):
\[\log \left( \mathrm{vol}(G) \left( \frac{1}{T} \sum_{r=1}^{T} \left(\boldsymbol{D}^{-1} \boldsymbol{A}\right)^r \right) \boldsymbol{D}^{-1} \right) - \log b\]
where:

- \(\mathrm{vol}(G) = \sum_i \sum_j A_{i,j}\) is the volume of the graph
- \(T\) is the context window size, \(T = |N_R(u)|\)
- \(\boldsymbol{D}\) is the diagonal degree matrix with \(D_{u,u} = \deg(u)\)
- \(r\) is the power of the normalized adjacency matrix
- \(b\) is the number of negative samples
Node2vec can also be formulated as a matrix factorization (albeit a more complex matrix) (Qiu et al, 2018)
4.5 Limitations
Limitations of node embeddings via matrix factorization and random walks:

(1) Cannot obtain embeddings for nodes not in the training set

- DeepWalk / node2vec cannot compute an embedding for a newly added node; all node embeddings must be recomputed

(2) Cannot capture structural similarity

- DeepWalk and node2vec do not capture structural similarity between nodes that play similar roles in the graph

(3) Cannot utilize node, edge, and graph features

- DeepWalk / node2vec embeddings do not incorporate such features
Solution to these limitations: Deep Representation Learning and Graph Neural Networks
5. Summary
PageRank:

- Measures importance of nodes in a graph
- Can be efficiently computed by power iteration of the (Google) stochastic adjacency matrix

Personalized PageRank (PPR):

- Measures importance of nodes with respect to a particular node or set of nodes
- Can be efficiently computed by random walks

Node embeddings based on random walks can be expressed as matrix factorization.
Viewing graphs as matrices plays a key role in all of the above algorithms!
References
Qiu, Jiezhong, et al. “Network Embedding as Matrix Factorization: Unifying DeepWalk, LINE, PTE, and Node2vec.” Proceedings of the Eleventh ACM International Conference on Web Search and Data Mining, ACM, 2018, pp. 459–67. arxiv
PageRank, Wikipedia, site
