PageRank Example
The PageRank formula is: $S(V_i)=(1-d)+d \times \sum \limits _{V_j \in In(V_i)} \frac{1}{|Out(V_j)|}S(V_j) $
\(In(V_i)\): the set of nodes that link to \(V_i\)
\(Out(V_j)\): the set of nodes that \(V_j\) links to
\(d\): the damping factor, a value in (0,1)
Example:

Computing the score of node \(e\):
\(\begin{aligned} \operatorname{In}\left(V_{e}\right) &=\{a, b\}, j \in\{a, b\} \\ \sum_{j \in\{a, b\}} \frac{1}{\left|\operatorname{Out}\left(V_{j}\right)\right|} S\left(V_{j}\right) &=\frac{1}{\left|\operatorname{Out}\left(V_{a}\right)\right|} S\left(V_{a}\right)+\frac{1}{\left|\operatorname{Out}\left(V_{b}\right)\right|} S\left(V_{b}\right) \\ &=\frac{1}{|\{e\}|} S\left(V_{a}\right)+\frac{1}{|\{e, f\}|} S\left(V_{b}\right) \\ &=S\left(V_{a}\right)+\frac{1}{2} S\left(V_{b}\right) \end{aligned}\)
\(S\left(V_{e}\right)=(1-d)+d \times\left(S\left(V_{a}\right)+\frac{1}{2} S\left(V_{b}\right)\right)\)
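The hand calculation above can be checked with a short sketch. The names `out_links` and `score` are illustrative and not from the original article; the graph (a→e, b→e, b→f) is read off the derivation above, while the starting scores of 1 and \(d=0.85\) are assumptions made for the check.

```python
# Hypothetical helper: one evaluation of S(V_i) from the PageRank formula.
# Out-links of each node in the example graph: a -> e, b -> e, b -> f.
out_links = {"a": ["e"], "b": ["e", "f"], "e": [], "f": []}

def score(node, scores, d=0.85):
    """Return (1 - d) + d * sum of S(V_j) / |Out(V_j)| over all V_j linking to node."""
    total = 0.0
    for j, targets in out_links.items():
        if node in targets:  # V_j is an in-link of node
            total += scores[j] / len(targets)
    return (1 - d) + d * total

scores = {"a": 1.0, "b": 1.0, "e": 1.0, "f": 1.0}  # assumed starting scores
s_e = score("e", scores)
print(s_e)
```

Under these assumptions the result is about 1.425, that is \(0.15+0.85 \times 1.5\).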
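The links can be written as a table, with source nodes as columns and target nodes as rows:
| | a | b | e | f |
|---|---|---|---|---|
| a | 0 | 0 | 0 | 0 |
| b | 0 | 0 | 0 | 0 |
| e | 1 | 1 | 0 | 0 |
| f | 0 | 1 | 0 | 0 |
The entry in row \(e\), column \(a\), written \(\{a-e\}=1\), means that \(a\) links to \(e\): \(a\) is an in-link of \(e\), and \(e\) is an out-link of \(a\).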
Weighting each link by \(\large \frac{1}{|Out(V_j)|}\) gives the following table:
| | a | b | e | f |
|---|---|---|---|---|
| a | 0 | 0 | 0 | 0 |
| b | 0 | 0 | 0 | 0 |
| e | 1 | 0.5 | 0 | 0 |
| f | 0 | 0.5 | 0 | 0 |
\(a\) has only one out-link, \(\{a-e\}\), while \(b\) has two, \(\{b-e,b-f\}\).
That is, \(|Out(V_a)|=1\) and \(|Out(V_b)|=2\).
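The step from the 0/1 link table to the weighted table can be sketched with NumPy as follows. The variable names (`links`, `out_degree`, `weights`) are illustrative, and the convention that columns are sources and rows are targets is taken from the tables above.

```python
import numpy as np

# Link table: rows are targets, columns are sources (node order: a, b, e, f).
links = np.array([[0, 0, 0, 0],
                  [0, 0, 0, 0],
                  [1, 1, 0, 0],
                  [0, 1, 0, 0]], dtype=float)

# |Out(V_j)| is the number of 1s in column j.
out_degree = links.sum(axis=0)

# Divide each column by its out-degree; columns without out-links stay all zero.
weights = np.divide(links, out_degree,
                    out=np.zeros_like(links),
                    where=out_degree != 0)
print(weights)
```

The `where` argument skips the division for nodes with no out-links (here \(e\) and \(f\)), which would otherwise divide by zero.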
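In matrix form: \(\left[\begin{array}{cccc}0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 1 & 0.5 & 0 & 0 \\ 0 & 0.5 & 0 & 0\end{array}\right] \times\left[\begin{array}{c}1 \\ 1 \\ 1 \\ 1\end{array}\right]=\left[\begin{array}{c}0 \\ 0 \\ 1.5 \\ 0.5\end{array}\right]\), where all four nodes start with an initial weight of 1. This product is the summation term of the formula, before the damping factor is applied. One iteration of the computation is shown in the figure below.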

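Simulating the iteration with Python:
import numpy as np
# Column j holds the weights 1/|Out(V_j)| of node j's out-links (node order: a, b, e, f)
g = np.array([[0, 0, 0, 0],
              [0, 0, 0, 0],
              [1, 0.5, 0, 0],
              [0, 0.5, 0, 0]])
pr = np.array([1.0, 1.0, 1.0, 1.0])  # initial weight of a, b, e, f is 1
d = 0.85  # damping factor
for step in range(10):
    # S = (1 - d) + d * (weighted sum over in-links)
    pr = (1 - d) + d * np.dot(g, pr)
    print(step)
    print(pr)
Output: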
0
[0.15 0.15 1.425 0.575]
1
[0.15 0.15 0.34125 0.21375]
2
[0.15 0.15 0.34125 0.21375]
3
[0.15 0.15 0.34125 0.21375]
4
[0.15 0.15 0.34125 0.21375]
5
[0.15 0.15 0.34125 0.21375]
6
[0.15 0.15 0.34125 0.21375]
7
[0.15 0.15 0.34125 0.21375]
8
[0.15 0.15 0.34125 0.21375]
9
[0.15 0.15 0.34125 0.21375]
The final weight of node \(e\) is 0.34125.
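The values stop changing after the second iteration, and this can be confirmed by hand, assuming \(d=0.85\) as in the script. Nodes \(a\) and \(b\) have no in-links, so their scores equal \(1-d\) from the first iteration on, and \(e\) and \(f\) depend only on those two scores:

\(\begin{aligned} S\left(V_{a}\right) &=S\left(V_{b}\right)=1-d=0.15 \\ S\left(V_{e}\right) &=(1-d)+d \times\left(S\left(V_{a}\right)+\frac{1}{2} S\left(V_{b}\right)\right)=0.15+0.85 \times 0.225=0.34125 \\ S\left(V_{f}\right) &=(1-d)+d \times \frac{1}{2} S\left(V_{b}\right)=0.15+0.85 \times 0.075=0.21375 \end{aligned}\)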
Reference: https://towardsdatascience.com/textrank-for-keyword-extraction-by-python-c0bae21bcec0
