Machine Learning with Graphs: 4 Graph as Matrix: PageRank, Random Walks and Embeddings
Stanford CS224W: Machine Learning with Graphs, Fall 2021
1. Overview
Treating a graph as a matrix allows us to:

- Determine node importance via random walks (PageRank)
- Obtain node embeddings via matrix factorization (MF)
- View other node embeddings (e.g. node2vec) as MF

Random walks, matrix factorization, and node embeddings are closely related!
2. PageRank (aka the Google Algorithm)
2.1 Modelling
2.1.1 Link Analysis Algorithms
Link analysis approaches to compute the importance of nodes in a graph:

- PageRank
- Personalized PageRank (PPR)
- Random Walk with Restarts
2.1.2 Links as Votes
Idea: links as votes. A page is more important if it has more in-links.
Links from important pages count more: a recursive question!
2.1.3 PageRank: The "Flow" Model
A "vote" from an important page is worth more:

- Each link's vote is proportional to the importance of its source page
- If page \(i\) with importance \(r_i\) has \(d_i\) out-links, each link gets \(r_i / d_i\) votes
- Page \(j\)'s own importance \(r_j\) is the sum of the votes on its in-links

A page is important if it is pointed to by other important pages.
Define the "rank" \(r_j\) for node \(j\):
\[r_j = \sum_{i \to j} \frac{r_i}{d_i}\]
where \(d_i\) is the out-degree of node \(i\).
Note : using Gaussian elimination to solve this system of linear equations is a bad idea!
2.1.4 PageRank: Matrix Formulation
Stochastic adjacency matrix \(\boldsymbol{M}\):

- \(d_i\) is the out-degree of node \(i\)
- \(\boldsymbol{M}\) is a column stochastic matrix: every column sums to 1
- If \(i \to j\), then \(m_{ji} = \frac{1}{d_i}\), otherwise \(m_{ji} = 0\):
\[m_{ji} = \begin{cases} 1/d_i, & \text{if } i \to j \\ 0, & \text{otherwise} \end{cases}\]
- \(\boldsymbol{M} = (\boldsymbol{K} \boldsymbol{A})^{\top}\), where \(\boldsymbol{A}\) is the adjacency matrix and \(\boldsymbol{K}\) is the diagonal matrix with diagonal \(\left[\frac{1}{d_1}, \frac{1}{d_2}, \cdots, \frac{1}{d_N} \right]\), taking \(\frac{1}{d_i}=0\) if \(d_i=0\)

Rank vector \(\boldsymbol{r}\): an entry per page:

- \(r_i\) is the importance score of page \(i\)
- \(\sum_i r_i = 1\)
The flow equations can be written in matrix form:
\[\boldsymbol{r} = \boldsymbol{M} \cdot \boldsymbol{r}\]
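A quick NumPy sanity check of the construction \(\boldsymbol{M} = (\boldsymbol{K}\boldsymbol{A})^{\top}\); the 3-node directed graph is an illustrative example, not from the lecture:

```python
import numpy as np

# Toy directed graph on 3 nodes: 0 -> 1, 0 -> 2, 1 -> 2, 2 -> 0
A = np.array([[0, 1, 1],
              [0, 0, 1],
              [1, 0, 0]], dtype=float)   # A[i, j] = 1 if i -> j

d = A.sum(axis=1)                        # out-degrees d_i
K = np.diag(np.where(d > 0, 1.0 / np.maximum(d, 1), 0.0))  # 1/d_i (0 for dead ends)
M = (K @ A).T                            # column stochastic: M[j, i] = 1/d_i if i -> j

print(M.sum(axis=0))                     # each column sums to 1 (no dead ends here)
```

Column \(i\) of \(\boldsymbol{M}\) spreads page \(i\)'s importance evenly over its out-links, which is exactly the flow model above.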
2.1.5 Connection to Random Walks
Imagine a random web surfer:

- At any time \(t\), the surfer is on some page \(i\)
- At time \(t+1\), the surfer follows an out-link from \(i\) uniformly at random
- Ends up on some page \(j\) linked from \(i\)
- Process repeats indefinitely

Let \(\boldsymbol{p}(t)\) be the vector whose \(i\)th coordinate is the probability that the surfer is on page \(i\) at time \(t\). Thus, \(\boldsymbol{p}(t)\) is a probability distribution over pages.
2.1.6 The Stationary Distribution
Where is the surfer at time \(t+1\)? It follows a link uniformly at random:
\[\boldsymbol{p}(t+1) = \boldsymbol{M} \cdot \boldsymbol{p}(t)\]
Suppose the random walk reaches a state
\[\boldsymbol{p}(t+1) = \boldsymbol{M} \cdot \boldsymbol{p}(t) = \boldsymbol{p}(t)\]
then \(\boldsymbol{p}(t)\) is a stationary distribution of the random walk.
The original rank vector \(\boldsymbol{r}\) satisfies \(\boldsymbol{r} = \boldsymbol{M} \cdot \boldsymbol{r}\).
Thus, \(\boldsymbol{r}\) is a stationary distribution for the random walk.
2.1.7 Eigenvector Formulation
The flow equation
\[1 \cdot \boldsymbol{r} = \boldsymbol{M} \cdot \boldsymbol{r}\]
shows that the rank vector \(\boldsymbol{r}\) is an eigenvector of the stochastic adjacency matrix \(\boldsymbol{M}\) with eigenvalue \(1\).

- Starting from any vector \(\boldsymbol{u}\), the limit \(\boldsymbol{M}(\boldsymbol{M}(\cdots \boldsymbol{M}(\boldsymbol{M} \boldsymbol{u})))\) is the long-term distribution of the surfers
- PageRank = limiting distribution = principal eigenvector of \(\boldsymbol{M}\)
- Note: if \(\boldsymbol{r}\) is the limit of the product \(\boldsymbol{M} \boldsymbol{M} \cdots \boldsymbol{M} \boldsymbol{u}\), then \(\boldsymbol{r}\) satisfies the flow equation \(1 \cdot \boldsymbol{r} = \boldsymbol{M} \cdot \boldsymbol{r}\), so \(\boldsymbol{r}\) is the principal eigenvector of \(\boldsymbol{M}\) with eigenvalue \(1\)
- We can now efficiently solve for \(\boldsymbol{r}\): the power iteration method
2.1.8 Summary
PageRank measures the importance of nodes in a graph using the link structure of the web.
It models a random web surfer using the stochastic adjacency matrix \(\boldsymbol{M}\).
PageRank solves \(\boldsymbol{r} = \boldsymbol{M}\boldsymbol{r}\), where \(\boldsymbol{r}\) can be viewed as both the principal eigenvector of \(\boldsymbol{M}\) and the stationary distribution of a random walk over the graph.
2.2 Solution
2.2.1 Power Iteration Method
Given a web graph with \(N\) nodes, where the nodes are pages and edges are hyperlinks
Power iteration: a simple iterative scheme:

- Step 1: Initialize \(\boldsymbol{r}^{(0)} = \left[\frac{1}{N}, \frac{1}{N}, \cdots, \frac{1}{N} \right]^{\top}\)
- Step 2: Iterate \(\boldsymbol{r}^{(t+1)} = \boldsymbol{M} \cdot \boldsymbol{r}^{(t)}\), i.e.
\[r_j^{(t+1)} = \sum_{i \to j} \frac{r_i^{(t)}}{d_i}\]
where \(d_i\) is the out-degree of node \(i\)
- Step 3: Stop when \(|\boldsymbol{r}^{(t+1)} - \boldsymbol{r}^{(t)}|_1 < \varepsilon\), where \(|\boldsymbol{x}|_1 = \sum_{i=1}^{N}|x_i|\) is the L1 norm (other vector norms, e.g. Euclidean, can also be used)

About 50 iterations is sufficient to estimate the limiting solution.
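The three steps above can be sketched directly in NumPy; the 3-node matrix is an illustrative toy example:

```python
import numpy as np

def power_iteration(M, eps=1e-6, max_iter=100):
    """Iterate r <- M r until the L1 change falls below eps."""
    N = M.shape[0]
    r = np.full(N, 1.0 / N)                  # Step 1: uniform initialization
    for _ in range(max_iter):
        r_next = M @ r                       # Step 2: one step of the flow equations
        if np.abs(r_next - r).sum() < eps:   # Step 3: L1 stopping criterion
            return r_next
        r = r_next
    return r

# Toy graph: 0 -> {1, 2}, 1 -> 2, 2 -> 0 (column stochastic M)
M = np.array([[0.0, 0.0, 1.0],
              [0.5, 0.0, 0.0],
              [0.5, 1.0, 0.0]])
r = power_iteration(M)
print(r, r.sum())                            # importance scores, summing to 1
```

Because \(\boldsymbol{M}\) is column stochastic, the entries of \(\boldsymbol{r}\) keep summing to 1 at every iteration.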
2.2.2 Problems of PageRank
Two problems of PageRank
(1) Some pages are dead ends (have no out-links)
- Such pages cause importance to "leak out"
(2) Spider traps (all out-links are within the group)
- Eventually spider traps absorb all importance
2.2.3 Solution to Spider Traps
Solution to the "spider trap" problem: at each time step, the random surfer has two options:

- (1) With probability \(\beta\), follow a link at random
- (2) With probability \(1-\beta\), jump to a random page

Common values for \(\beta\) are in the range 0.8 to 0.9.
The surfer will teleport out of a spider trap within a few time steps.
2.2.4 Solution to Dead Ends
The "Dead end" problem:
Teleports : Follow random teleport links with total probability 1.0 from dead-ends
- Adjust matrix accordingly
2.2.5 Teleports
Why are dead ends and spider traps a problem, and why do teleports solve it?

- Spider traps are not a problem, but with traps the PageRank scores are not what we want
- Solution: never get stuck in a spider trap by teleporting out of it in a finite number of steps
- Dead ends are a problem: the matrix is not column stochastic, so our initial assumptions are not met
- Solution: make the matrix column stochastic by always teleporting when there is nowhere else to go
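The dead-end fix (make the matrix column stochastic by always teleporting from dead ends) can be sketched in NumPy; the 3-node matrix with a dead end at node 2 is an illustrative example:

```python
import numpy as np

def fix_dead_ends(M):
    """Replace all-zero columns (dead ends) with a uniform teleport column."""
    M = M.copy()
    N = M.shape[0]
    dead = M.sum(axis=0) == 0        # columns that distribute no importance
    M[:, dead] = 1.0 / N             # always teleport from a dead end
    return M

# Node 2 is a dead end: its column in M is all zeros
M = np.array([[0.0, 0.0, 0.0],
              [0.5, 0.0, 0.0],
              [0.5, 1.0, 0.0]])
print(fix_dead_ends(M).sum(axis=0))  # every column now sums to 1
```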
2.2.6 Random Teleports
The original formulation assumes that \(\boldsymbol{M}\) has no dead ends. We can either:

- preprocess matrix \(\boldsymbol{M}\) to remove all dead ends
- or explicitly follow random teleport links with probability 1.0 from dead ends

Google's solution that does it all: at each step, the random surfer has two options:

- (1) With probability \(\beta\), follow a link at random
- (2) With probability \(1-\beta\), jump to a random page

PageRank equation (Brin-Page, 98):
\[r_j = \sum_{i \to j} \beta \frac{r_i}{d_i} + (1-\beta) \frac{1}{N}\]
where \(d_i\) is the out-degree of node \(i\).
The Google matrix \(\boldsymbol{G}\):
\[\boldsymbol{G} = \beta \boldsymbol{M} + (1-\beta) \left[\frac{1}{N} \right]_{N \times N}\]
where \(\left[\frac{1}{N} \right]_{N \times N}\) is an \(N\) by \(N\) matrix in which all entries are \(\frac{1}{N}\).
Then we have a recursive problem \(\boldsymbol{r} = \boldsymbol{G} \cdot \boldsymbol{r}\), and the power method still works.
In practice \(\beta = 0.8 \sim 0.9\) (make 5 steps on average, then jump).
2.2.7 Summary: Solving PageRank
PageRank solves \(\boldsymbol{r} = \boldsymbol{G} \boldsymbol{r}\) and can be efficiently computed by power iteration of the Google matrix \(\boldsymbol{G}\).
Adding random uniform teleportation solves issues of dead-ends and spider-traps
2.3 Implementation
Code example: PageRank
import sys
import networkx as nx
import numpy as np

def page_rank(graph, alpha=0.85, weight=None, tol=1e-4, max_iter=1000):
    node_list = list(graph.nodes())
    node_num = len(node_list)
    adj_mat = nx.adjacency_matrix(graph, weight=weight).toarray()
    # reciprocal out-degrees (0 for dead ends)
    out_degree = [d for _, d in graph.out_degree(node_list, weight=weight)]
    out_degree = [1. / v if v > 0. else 0. for v in out_degree]
    # column-stochastic matrix M = (K A)^T
    M_mat = np.dot(np.diag(out_degree), adj_mat).T
    # Google matrix G = alpha * M + (1 - alpha) / N
    Google_mat = alpha * M_mat + ((1 - alpha) / node_num) * np.ones((node_num, node_num))
    # initialize the PR vector uniformly
    pr = (1 / node_num) * np.ones(node_num)
    for n in range(max_iter):
        pr_old = pr.copy()
        # update the PR vector: r <- G r
        pr = np.dot(Google_mat, pr_old)
        # L1 change between successive iterates
        cur_tol = np.sum(np.abs(pr - pr_old))
        if cur_tol <= tol:
            break
    else:
        print('Warning: iteration did not converge; the target tolerance is {0} '
              'while the current tolerance is {1}'.format(tol, cur_tol))
        sys.exit(0)
    pr = {k: v for k, v in zip(node_list, pr.tolist())}
    return pr
# ===========================================================================
Code example: the networkx.pagerank() function
Note: in networkx.pagerank(), the tol parameter is effectively len(G) * tol.
import networkx as nx
import pandas as pd

tol = 1e-7
pr = page_rank(graph, alpha=0.85, tol=len(graph) * tol, max_iter=1000)
pr_1 = pd.Series(pr)
pr = nx.pagerank(graph, alpha=0.85, max_iter=100, tol=tol)
pr_2 = pd.Series(pr)
delta = pr_1 - pr_2
# relative differences between the two implementations
(delta / pr_2).abs().sort_values(ignore_index=True).plot()
3. Random Walk with Restarts and Personalized PageRank
3.1 Example: Recommendation
Given: a bipartite graph representing user and item interactions (e.g. purchases)
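Random Walk with Restarts can be simulated directly: walk over the bipartite graph, restarting at the query node with some probability, and rank items by visit counts. A minimal sketch; the users "A"/"B", items "i1".."i4", and all parameters are hypothetical illustrations:

```python
import random

# Hypothetical bipartite interactions: users "A", "B" and items "i1".."i4"
edges = [("A", "i1"), ("A", "i2"), ("B", "i2"), ("B", "i3"), ("B", "i4")]
neighbors = {}
for u, v in edges:
    neighbors.setdefault(u, []).append(v)
    neighbors.setdefault(v, []).append(u)

def rwr_scores(query, alpha=0.15, steps=50_000, seed=0):
    """Simulate a walk that restarts at `query` with probability alpha.
    Normalized visit counts approximate Personalized PageRank / RWR
    scores with respect to the query node."""
    rng = random.Random(seed)
    counts = {v: 0 for v in neighbors}
    node = query
    for _ in range(steps):
        if rng.random() < alpha:
            node = query                        # restart at the query node
        else:
            node = rng.choice(neighbors[node])  # follow a random edge
        counts[node] += 1
    return {v: c / steps for v, c in counts.items()}

scores = rwr_scores("A")   # items close to user "A" get the highest scores
```

Items directly connected to the query user (here "i1", "i2") accumulate more visits than items reachable only through other users.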
4. Matrix Factorization and Node Embeddings
4.1 Embeddings & Matrix Factorization
Node embedding: the encoder is an embedding lookup.
Embedding matrix \(\boldsymbol{Z}\):

- each column represents the embedding vector for a specific node
- the number of rows equals the dimension / size of the embeddings

Objective: maximize \(\boldsymbol{z}_v^{\top} \boldsymbol{z}_u\) for node pairs \((u, v)\) that are similar.
4.2 Connection to Matrix Factorization
Simplest node similarity: nodes \(u, v\) are similar if they are connected by an edge.

- This means: \(\boldsymbol{z}_v^{\top} \boldsymbol{z}_u = A_{u,v}\), the \((u,v)\) entry of the graph adjacency matrix \(\boldsymbol{A}\)
- Therefore, \(\boldsymbol{Z}^{\top} \boldsymbol{Z} = \boldsymbol{A}\)
4.3 Matrix Factorization
The embedding dimension \(d\) (the number of rows in \(\boldsymbol{Z}\)) is much smaller than the number of nodes \(n\).
Exact factorization \(\boldsymbol{A} = \boldsymbol{Z}^{\top} \boldsymbol{Z}\) is generally not possible.
However, we can learn \(\boldsymbol{Z}\) approximately.
Objective: \(\min \limits_{\boldsymbol{Z}} \| \boldsymbol{A} - \boldsymbol{Z}^{\top} \boldsymbol{Z} \|_2\)

- We optimize \(\boldsymbol{Z}\) so that it minimizes the L2 norm (Frobenius norm) of \(\boldsymbol{A} - \boldsymbol{Z}^{\top} \boldsymbol{Z}\)
- Note that Lecture 3 used the softmax instead of the L2 norm, but the goal of approximating \(\boldsymbol{A}\) with \(\boldsymbol{Z}^{\top} \boldsymbol{Z}\) is the same
Conclusion : Inner product decoder with node similarity defined by edge connectivity is equivalent to matrix factorization of \(\boldsymbol{A}\).
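The objective above can be minimized directly by gradient descent on \(\boldsymbol{Z}\). A minimal NumPy sketch; the toy graph, learning rate, and iteration count are illustrative assumptions:

```python
import numpy as np

# Adjacency matrix of a small undirected graph; similarity = edge connectivity
A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)

d, n = 2, A.shape[0]                    # embedding dimension d << n
rng = np.random.default_rng(0)
Z = rng.normal(scale=0.1, size=(d, n))  # columns are node embeddings

lr = 0.01
for _ in range(2000):
    R = A - Z.T @ Z                     # residual of the approximation
    Z += lr * 2 * (Z @ R)               # descent step on ||A - Z^T Z||_F^2
                                        # (exact gradient is -4 Z R; constants folded into lr)
loss = np.linalg.norm(A - Z.T @ Z)      # cannot reach 0: Z^T Z is PSD, A is not
```

The residual loss stays well above zero, illustrating why exact factorization is generally impossible at small \(d\).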
4.4 Random Walk-based Similarity
DeepWalk and node2vec have a more complex node similarity definition based on random walks.
DeepWalk is equivalent to matrix factorization of the following complex matrix expression (Qiu et al., 2018):
\[\log \left( \mathrm{vol}(G) \left( \frac{1}{T} \sum_{r=1}^{T} \left(\boldsymbol{D}^{-1} \boldsymbol{A}\right)^r \right) \boldsymbol{D}^{-1} \right) - \log b\]
where:

- \(\mathrm{vol}(G) = \sum_i \sum_j A_{i,j}\) is the volume of the graph
- \(T\) is the context window size, \(T = |N_R(u)|\)
- \(\boldsymbol{D}\) is the diagonal degree matrix with \(D_{u,u} = \deg(u)\)
- \(r\) is the power of the normalized adjacency matrix
- \(b\) is the number of negative samples
Node2vec can also be formulated as a matrix factorization (albeit a more complex matrix) (Qiu et al, 2018)
4.5 Limitations
Limitations of node embeddings via matrix factorization and random walks:

(1) Cannot obtain embeddings for nodes not in the training set

- DeepWalk / node2vec cannot compute an embedding for a newly added node; all node embeddings must be recomputed

(2) Cannot capture structural similarity

- DeepWalk and node2vec do not capture structural similarity between nodes that play similar roles in the graph

(3) Cannot utilize node, edge, and graph features

- DeepWalk / node2vec embeddings do not incorporate such features
Solution to these limitations: Deep Representation Learning and Graph Neural Networks
5. Summary
PageRank:

- Measures importance of nodes in a graph
- Can be efficiently computed by power iteration of the (Google) stochastic adjacency matrix

Personalized PageRank (PPR):

- Measures importance of nodes with respect to a particular node or set of nodes
- Can be efficiently computed by random walks

Node embeddings based on random walks can be expressed as matrix factorization.
Viewing graphs as matrices plays a key role in all of the above algorithms!
References
Qiu, Jiezhong, et al. “Network Embedding as Matrix Factorization: Unifying DeepWalk, LINE, PTE, and Node2vec.” Proceedings of the Eleventh ACM International Conference on Web Search and Data Mining, ACM, 2018, pp. 459–67. arxiv
PageRank, Wikipedia, site
