A PageRank example

The PageRank formula is: $S(V_i)=(1-d)+d \times \sum \limits _{V_j \in In(V_i)} \frac{1}{|Out(V_j)|}S(V_j) $

\(In(V_i)\): the set of nodes that link to \(V_i\)

\(Out(V_j)\): the set of nodes that \(V_j\) links to

\(d\): the damping factor, a value in the interval (0,1)
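As a minimal sketch of how this formula could be evaluated for a single node, the function below walks over every node's out-link list and adds up the contributions from the in-links of the target node. The names `score`, `out_links`, and `scores` are hypothetical, and the small graph, the starting scores of 1, and the default `d=0.85` are assumptions chosen for illustration.

```python
def score(node, out_links, scores, d=0.85):
    """One evaluation of S(V_i) = (1 - d) + d * sum(S(V_j) / |Out(V_j)|)."""
    total = 0.0
    for source, targets in out_links.items():
        # V_j belongs to In(V_i) exactly when V_i is among V_j's out-links.
        if node in targets:
            total += scores[source] / len(targets)
    return (1 - d) + d * total


# Hypothetical graph: a -> e, b -> e, b -> f, every score starting at 1.
out_links = {"a": ["e"], "b": ["e", "f"], "e": [], "f": []}
scores = {name: 1.0 for name in out_links}
s_e = score("e", out_links, scores)
print(s_e)
```

A node with no in-links receives only the constant \((1-d)\) term, since the sum is empty.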

Example:

[Figure: directed graph with nodes a, b, e, f and links a→e, b→e, b→f]

Computing the score of node \(e\):

\(\begin{aligned} \operatorname{In}\left(V_{e}\right) &=\{a, b\}, j \in\{a, b\} \\ \sum_{j \in\{a, b\}} \frac{1}{\left|\operatorname{Out}\left(V_{j}\right)\right|} S\left(V_{j}\right) &=\frac{1}{\left|\operatorname{Out}\left(V_{a}\right)\right|} S\left(V_{a}\right)+\frac{1}{\left|\operatorname{Out}\left(V_{b}\right)\right|} S\left(V_{b}\right) \\ &=\frac{1}{|\{e\}|} S\left(V_{a}\right)+\frac{1}{|\{e, f\}|} S\left(V_{b}\right) \\ &=S\left(V_{a}\right)+\frac{1}{2} S\left(V_{b}\right) \end{aligned}\)

\(S\left(V_{e}\right)=(1-d)+d *\left(S\left(V_{a}\right)+\frac{1}{2} S\left(V_{b}\right)\right)\)
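As a quick numerical check, and assuming the values used in the Python simulation further down (every initial score equal to 1 and \(d=0.85\)), the first update of \(e\) works out to:

\(S\left(V_{e}\right)=(1-0.85)+0.85 *\left(1+\frac{1}{2} * 1\right)=0.15+0.85 * 1.5=1.425\)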

The links can be written as a table:

    a    b    e    f
a   0    0    0    0
b   0    0    0    0
e   1    1    0    0
f   0    1    0    0

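Each column is the node a link starts from and each row is the node it points to. \(\{a-e\}=1\) means that \(a\) is an in-link of \(e\), and \(e\) is an out-link of \(a\).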
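One way to build such a table programmatically is from a list of (source, target) pairs. This is only a sketch: the names `nodes`, `edges`, `index`, and `links` are hypothetical, and the edge list is read off the example graph.

```python
import numpy as np

nodes = ["a", "b", "e", "f"]
edges = [("a", "e"), ("b", "e"), ("b", "f")]  # (source, target) pairs

index = {name: i for i, name in enumerate(nodes)}
links = np.zeros((len(nodes), len(nodes)))
for source, target in edges:
    # Row = node receiving the link, column = node the link starts from.
    links[index[target], index[source]] = 1

print(links)
```

Putting the source in the column means a later matrix-vector product sums over in-links row by row, which is the orientation the formula needs.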

\(\large \frac{1}{|Out(V_j)|}\) written as a table, where \(V_j\) is the column node:

    a    b    e    f
a   0    0    0    0
b   0    0    0    0
e   1    0.5  0    0
f   0    0.5  0    0

\(a\) has only one out-link, \(\{a-e\}\), and \(b\) has two out-links, \(\{b-e,b-f\}\), so \(|Out(V_a)|=1\) and \(|Out(V_b)|=2\).
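The weighted table can be derived from the 0/1 link table by dividing each column by its sum. The sketch below assumes the link table is held in a NumPy array named `links` (a hypothetical name). Leaving columns without out-links at zero is a choice made for this illustration, not something the original example discusses.

```python
import numpy as np

links = np.array([[0, 0, 0, 0],
                  [0, 0, 0, 0],
                  [1, 1, 0, 0],
                  [0, 1, 0, 0]], dtype=float)

out_degree = links.sum(axis=0)  # column sums: |Out(V_j)| for a, b, e, f
# Divide each column by its out-degree; columns with no out-links stay zero.
safe_degree = np.where(out_degree > 0, out_degree, 1)
g = links / safe_degree

print(out_degree)
print(g)
```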

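In matrix form: \(\left[\begin{array}{cccc}0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 1 & 0.5 & 0 & 0 \\ 0 & 0.5 & 0 & 0\end{array}\right] *\left[\begin{array}{c}1 \\ 1 \\ 1 \\ 1\end{array}\right]=\left[\begin{array}{c}0 \\ 0 \\ 1.5 \\ 0.5\end{array}\right]\). All four nodes start with an initial weight of 1, and the computation for one iteration is shown in the figure below. Note that this product is only the summation term, before \((1-d)\) and \(d\) are applied.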

[Figure: computation for one iteration]

Simulating this with Python:

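import numpy as np

# g[i][j] = 1/|Out(V_j)| if V_j links to V_i, else 0.
# Rows and columns are ordered a, b, e, f.
g = np.array([[0, 0, 0, 0],
              [0, 0, 0, 0],
              [1, 0.5, 0, 0],
              [0, 0.5, 0, 0]])
pr = np.array([1, 1, 1, 1])  # initial weight of a, b, e, f is 1
d = 0.85  # damping factor
for i in range(10):
    # S = (1 - d) + d * (g . S), applied to all nodes at once
    pr = (1 - d) + d * np.dot(g, pr)
    print(i)
    print(pr)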

Output:

0
[0.15  0.15  1.425 0.575]
1
[0.15    0.15    0.34125 0.21375]
2
[0.15    0.15    0.34125 0.21375]
3
[0.15    0.15    0.34125 0.21375]
4
[0.15    0.15    0.34125 0.21375]
5
[0.15    0.15    0.34125 0.21375]
6
[0.15    0.15    0.34125 0.21375]
7
[0.15    0.15    0.34125 0.21375]
8
[0.15    0.15    0.34125 0.21375]
9
[0.15    0.15    0.34125 0.21375]

The final weight of node \(e\) is 0.34125.
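The scores stop changing after the second iteration because no path in this graph is longer than one link, so multiplying the matrix by itself gives the zero matrix. For larger graphs a fixed iteration count may be too few or too many. One possible sketch is to stop once the largest change between iterations falls below a tolerance. The name `tol`, its value, and the cap of 100 iterations are assumptions for illustration and are not part of the original example.

```python
import numpy as np

g = np.array([[0, 0, 0, 0],
              [0, 0, 0, 0],
              [1, 0.5, 0, 0],
              [0, 0.5, 0, 0]])
d = 0.85
tol = 1e-8  # assumed convergence threshold
pr = np.ones(4)  # initial weight of a, b, e, f is 1

for step in range(100):
    new_pr = (1 - d) + d * g.dot(pr)
    change = np.abs(new_pr - pr).max()  # largest move of any single score
    pr = new_pr
    if change < tol:
        break  # scores no longer move by more than tol

print(step, pr)
```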

Reference: https://towardsdatascience.com/textrank-for-keyword-extraction-by-python-c0bae21bcec0

posted @ 2020-05-22 21:34  yueqiudian