Network Science: Measures and metrics

0 Notation

  • \(\mathcal{U} = \mathcal{V} = \{ v_1, v_2, \cdots, v_N \}\), node set

  • \(\mathcal{E} = \{e_{uv} | u, v \in \mathcal{V} \} = \{ e_1, e_2, \cdots, e_M \}\), edge set

1 Giant Component

  • A graph is said to be connected if every pair of vertices in the graph is connected. This means that there is a path between every pair of vertices.

  • A directed graph is called weakly connected if replacing all of its directed edges with undirected edges produces a connected (undirected) graph.

    A directed graph is unilaterally connected, or unilateral (also called semiconnected), if for every pair of vertices \(u\), \(v\) it contains a directed path from \(u\) to \(v\) or a directed path from \(v\) to \(u\).

  • A directed graph is called strongly connected if there is a path in each direction between each pair of vertices of the graph.
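The three notions of directed connectivity above can be told apart on tiny graphs. The following is a minimal sketch; the three example graphs are illustrative choices, not taken from the original post.

```python
import networkx as nx

# Directed path 0 -> 1 -> 2: each pair is joined by a directed path in at
# least one direction, but node 2 cannot reach node 0.
path = nx.DiGraph([(0, 1), (1, 2)])

# Out-star 0 -> 1, 0 -> 2: nodes 1 and 2 cannot reach each other at all.
star = nx.DiGraph([(0, 1), (0, 2)])

# Directed cycle 0 -> 1 -> 2 -> 0: every node reaches every other node.
cycle = nx.DiGraph([(0, 1), (1, 2), (2, 0)])

connectivity = {}
for name, g in [("path", path), ("star", star), ("cycle", cycle)]:
    connectivity[name] = (
        nx.is_weakly_connected(g),
        nx.is_semiconnected(g),
        nx.is_strongly_connected(g),
    )
    print(name, connectivity[name])
```

Each tuple is (weak, unilateral, strong); strong connectivity implies unilateral connectivity, which in turn implies weak connectivity.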

Code

NetworkX functions related to components

# number of connected components
nx.number_connected_components(G)
nx.number_weakly_connected_components(G)
nx.number_strongly_connected_components(G)

# undirected graph
nx.connected_components(G)
# directed graph
nx.weakly_connected_components(G)
nx.strongly_connected_components(G)
  • Function: the giant component
import networkx as nx

def get_giant_component(graph):
    '''
    Return the giant (largest) component of 'graph' as a subgraph view.
    Directed graphs use weakly connected components.
    '''
    if not nx.is_directed(graph):
        components = nx.connected_components(graph)
    else:
        # use nx.strongly_connected_components(graph) instead
        # to get the largest strongly connected component
        components = nx.weakly_connected_components(graph)
    # keep only the nodes of the largest component
    return graph.subgraph(max(components, key=len))
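As a quick usage sketch of the same idea, the example below extracts the largest component from two small graphs. The graphs and variable names are illustrative assumptions.

```python
import networkx as nx

# Undirected: a triangle {0, 1, 2} and a separate edge {3, 4}.
G = nx.Graph([(0, 1), (1, 2), (2, 0), (3, 4)])
largest_nodes = max(nx.connected_components(G), key=len)
giant = G.subgraph(largest_nodes).copy()  # copy() detaches the result from G

# Directed: a 3-cycle with one extra outgoing edge 2 -> 3.
D = nx.DiGraph([(0, 1), (1, 2), (2, 0), (2, 3)])
largest_weak = max(nx.weakly_connected_components(D), key=len)
largest_strong = max(nx.strongly_connected_components(D), key=len)

print(sorted(giant.nodes()), sorted(largest_weak), sorted(largest_strong))
```

For the directed graph the weakly connected giant component contains node 3, while the strongly connected one does not, because node 3 has no path back to the cycle.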

2 Centrality

NetworkX functions for computing centrality

2.1 Degree Centrality (DC)

For an undirected graph, the degree centrality is computed as:

  • \(\mathbf{A}\) is the adjacency matrix, and \(a_{ij}=1\) if there exists an edge between node \(v_i\) and \(v_j\)

  • \(\mathbf{A}\) is symmetric

Then we have:

\[d_i = \sum_j a_{ij} = (\mathbf{A})_{i :} \mathbf{1} \qquad \boldsymbol{d} = \mathbf{A} \mathbf{1} \]

  • where \((\mathbf{A})_{i :}\) represents the \(i\)th row of the matrix \(\mathbf{A}\)

  • \(\mathbf{1}\) is an all-one column vector

For a directed graph, the in-degree centrality and out-degree centrality are computed as:

  • \(\mathbf{A}\) may not be symmetric

\[\begin{aligned} d_i^{\text{In}} &= \sum_j a_{ji} = (\mathbf{A}^{\top})_{i :} \mathbf{1} \qquad & \boldsymbol{d}^{\text{In}} &= \mathbf{A}^{\top} \mathbf{1} \\ d_i^{\text{Out}} &= \sum_j a_{ij} = (\mathbf{A})_{i :} \mathbf{1} \qquad & \boldsymbol{d}^{\text{Out}} &= \mathbf{A} \mathbf{1} \end{aligned} \]
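The matrix form can be checked numerically. This is a small sketch on an illustrative directed graph, assuming the convention that \(a_{ij}=1\) for an edge \(v_i \to v_j\) (the convention used by nx.to_numpy_array).

```python
import networkx as nx
import numpy as np

G = nx.DiGraph([(0, 1), (0, 2), (1, 2), (2, 0)])
nodes = sorted(G.nodes())
A = nx.to_numpy_array(G, nodelist=nodes)  # A[i, j] = 1 for an edge i -> j
ones = np.ones(len(nodes))

d_out = A @ ones    # row sums: number of outgoing edges
d_in = A.T @ ones   # column sums: number of incoming edges

print(d_out, d_in)
print(dict(G.out_degree()), dict(G.in_degree()))
```

Row sums of \(\mathbf{A}\) count outgoing edges and column sums count incoming edges, which is why the in-degree vector uses \(\mathbf{A}^{\top}\).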

Code

# degree centrality for undirected graph
nx.degree_centrality(G)
# in-degree centrality and out-degree centrality for directed graph
nx.in_degree_centrality(G)
nx.out_degree_centrality(G)

Note: the degree centrality functions in the NetworkX package normalize degree values by dividing by \(N-1\), the maximum possible degree in a simple graph, where \(N\) is the number of nodes in the graph.

  • Function: non-normalized degree centrality
def degree_centrality(graph):
    dc = {node : len(list(graph.neighbors(node))) for node in graph.nodes()}
    return dc

def in_degree_centrality(graph):
    dc_in = {node : len(list(graph.predecessors(node))) for node in graph.nodes()}
    return dc_in 

def out_degree_centrality(graph):
    dc_out = {node : len(list(graph.successors(node))) for node in graph.nodes()}
    return dc_out

def nx_degree_centrality(graph):
    dc = {k : round(v * (graph.number_of_nodes()-1)) for k, v in nx.degree_centrality(graph).items()}
    return dc

def nx_in_degree_centrality(graph):
    dc_in = {k : round(v * (graph.number_of_nodes()-1)) for k, v in nx.in_degree_centrality(graph).items()}
    return dc_in

def nx_out_degree_centrality(graph):
    dc_out = {k : round(v * (graph.number_of_nodes()-1)) for k, v in nx.out_degree_centrality(graph).items()}
    return dc_out
  • Test example
# Directed graph
# graph = graph.to_directed() 
nx_in_dc = pd.Series(nx_in_degree_centrality(graph)).sort_index()
nx_out_dc = pd.Series(nx_out_degree_centrality(graph)).sort_index()

in_dc = pd.Series(in_degree_centrality(graph)).sort_index()
out_dc = pd.Series(out_degree_centrality(graph)).sort_index()

print((nx_in_dc - in_dc).value_counts())
print((nx_out_dc - out_dc).value_counts())

# Undirected graph
# graph = graph.to_undirected() 
nx_dc = pd.Series(nx_degree_centrality(graph)).sort_index()
dc = pd.Series(degree_centrality(graph)).sort_index()

print((nx_dc - dc).value_counts())

2.2 Strength Centrality (SC)

Strength centrality applies only to weighted graphs.

For an undirected graph,

\[s_i = \sum_j w_{ij} = (\mathbf{W})_{i :} \mathbf{1} \]

For a directed graph,

\[\begin{aligned} s_i^{\text{In}} = \sum_j w_{ji} = (\mathbf{W}^{\top})_{i :} \mathbf{1} \\ s_i^{\text{Out}} = \sum_j w_{ij} = (\mathbf{W})_{i :} \mathbf{1} \end{aligned} \]
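In matrix form, strengths are row and column sums of \(\mathbf{W}\). The sketch below checks this against NetworkX's weighted degree on an illustrative graph; the attribute name "weight" and the edge weights are assumptions chosen for the example.

```python
import networkx as nx
import numpy as np

G = nx.DiGraph()
G.add_weighted_edges_from([(0, 1, 2.0), (1, 2, 0.5), (2, 0, 1.5), (0, 2, 3.0)])
nodes = sorted(G.nodes())

# W[i, j] is the weight of the edge i -> j (0 if there is no edge).
W = nx.to_numpy_array(G, nodelist=nodes, weight="weight")
ones = np.ones(len(nodes))

s_out = W @ ones    # out-strength: row sums
s_in = W.T @ ones   # in-strength: column sums

# NetworkX returns the same quantities as weighted degrees.
nx_out = dict(G.out_degree(weight="weight"))
nx_in = dict(G.in_degree(weight="weight"))
print(s_out, s_in)
```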

  • Function: strength centrality
def strength_centrality(graph, weight):
    sc = {node : sum([float(graph[node][neigh][weight]) for neigh in graph.neighbors(node)]) for node in graph.nodes()}
    return sc

def in_strength_centrality(graph, weight):
    in_sc = {node : sum([float(graph[neigh][node][weight]) for neigh in graph.predecessors(node)]) for node in graph.nodes()}
    return in_sc

def out_strength_centrality(graph, weight):
    out_sc = {node : sum([float(graph[node][neigh][weight]) for neigh in graph.successors(node)]) for node in graph.nodes()}
    return out_sc

2.3 Closeness Centrality (CC)

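Closeness centrality of a node \(v_i\) is the reciprocal of the average shortest-path distance between \(v_i\) and the other \(N-1\) nodes, all assumed reachable:

\[CC_i = \dfrac{N - 1}{ \sum_{v_j \in \mathcal{V}, v_j \neq v_i} d_{i,j}^{\text{min}} } \]

  • where \(d_{i,j}^{\text{min}}\) is the shortest-path distance between \(v_i\) and \(v_j\)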
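A minimal sketch of the closeness formula on a connected, undirected path graph, compared with nx.closeness_centrality. The example graph is an illustrative choice, and the direct formula below assumes every node is reachable.

```python
import networkx as nx

G = nx.path_graph(3)  # 0 - 1 - 2
N = G.number_of_nodes()

cc = {}
for i in G.nodes():
    # shortest-path distances from i to every reachable node (including i itself)
    dist = nx.single_source_shortest_path_length(G, i)
    cc[i] = (N - 1) / sum(d for j, d in dist.items() if j != i)

nx_cc = nx.closeness_centrality(G)
print(cc)
print(nx_cc)
```

The middle node is at distance 1 from both others, so its closeness is 2 / 2 = 1; each end node has distances 1 and 2, giving 2 / 3.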

2.4 Betweenness Centrality (BC)

2.5 Eigenvector Centrality (EC)

Code

nx.eigenvector_centrality(G, max_iter=100, tol=1e-06, nstart=None, weight=None)

  • For a directed graph:

    • By default, the left eigenvector is computed, which gives the EC based on incoming edges.

    • To compute the right eigenvector, i.e., the EC based on outgoing edges, first reverse the graph with G.reverse(), i.e., replace \(\mathbf{A}\) with \(\mathbf{A}^{\top}\)

  • Function: eigenvector centrality

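def eigenvector_centrality(graph, weight=None, directed=None):
    '''
    parameters:
    ---
    graph (nx.Graph or nx.DiGraph) :

    weight (str or None, default is None) :
        None : unweighted graph
    directed (str in {'in', 'out'} or None) :
        'in' or None : compute the left eigenvector, i.e., in-eigenvector centrality
        'out' : compute the right eigenvector, i.e., out-eigenvector centrality
    '''
    import networkx as nx
    import numpy as np
    
    # adjacency matrix, adj_mat[i, j] != 0 for an edge i -> j
    node_list = list(graph.nodes())
    adj_mat = nx.to_numpy_array(graph, weight=weight, nodelist=node_list)
    
    # in-eigenvector centrality: a left eigenvector of A is a right eigenvector of A^T
    if (directed is None) or (directed == 'in'):
        eigvalue, eigvector = np.linalg.eig(adj_mat.T)

    # out-eigenvector centrality: right eigenvector of A
    elif (directed == 'out'):
        eigvalue, eigvector = np.linalg.eig(adj_mat)
    else:
        raise ValueError("directed must be 'in', 'out' or None")
    
    # np.linalg.eig does not sort its output, so select the dominant eigenvalue explicitly
    ec = eigvector[:, np.argmax(eigvalue.real)].real
    # fix the arbitrary sign and scale to unit Euclidean norm (the NetworkX convention)
    ec = ec * np.sign(ec.sum()) / np.linalg.norm(ec)
    ec = dict(zip(node_list, ec.tolist()))
    return ec 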
  • Test example

    • Agreement with NetworkX is poor on large networks (e.g., the Singapore bus network)
# Undirected & unweighted graph
graph = graph.to_undirected() 
ec = pd.Series(eigenvector_centrality(graph)).sort_index()
nx_ec = pd.Series(nx.eigenvector_centrality(graph)).sort_index()

# error 
ape = (nx_ec - ec).abs() / nx_ec.abs()
print(ape.sum())
print(ape.sort_values())


# Directed & unweighted graph
graph = graph.to_directed() 
in_ec  = pd.Series(eigenvector_centrality(graph, directed='in')).sort_index()
out_ec = pd.Series(eigenvector_centrality(graph, directed='out')).sort_index()

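nx_in_ec = pd.Series(nx.eigenvector_centrality(graph)).sort_index()
# NetworkX computes the in-centrality by default, so reverse the graph for the out-centrality
nx_out_ec = pd.Series(nx.eigenvector_centrality(graph.reverse())).sort_index()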

print(((nx_in_ec - in_ec).abs() / in_ec.abs()).sum())
print(((nx_out_ec - out_ec).abs() / out_ec.abs()).sum())
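The left/right eigenvector distinction can be verified on a small strongly connected directed graph. This is a sketch under stated assumptions: the graph and the helper dominant_eigenvector are illustrative, and the comparison relies on NetworkX scaling its result to unit Euclidean norm.

```python
import networkx as nx
import numpy as np

def dominant_eigenvector(M):
    """Unit-norm eigenvector of M for the eigenvalue with the largest real part."""
    values, vectors = np.linalg.eig(M)
    v = vectors[:, np.argmax(values.real)].real
    v = v * np.sign(v.sum())  # eig returns an arbitrary sign; make entries positive
    return v / np.linalg.norm(v)

# Strongly connected and aperiodic (cycles of length 2 and 3).
G = nx.DiGraph([(0, 1), (1, 2), (2, 0), (1, 0)])
nodes = sorted(G.nodes())
A = nx.to_numpy_array(G, nodelist=nodes)  # A[i, j] = 1 for an edge i -> j

in_ec = dominant_eigenvector(A.T)  # left eigenvector of A: incoming edges
out_ec = dominant_eigenvector(A)   # right eigenvector of A: outgoing edges

nx_in = nx.eigenvector_centrality(G, max_iter=1000)
nx_out = nx.eigenvector_centrality(G.reverse(), max_iter=1000)

print(in_ec, [nx_in[n] for n in nodes])
print(out_ec, [nx_out[n] for n in nodes])
```

Selecting the eigenvector by the largest eigenvalue matters here, because np.linalg.eig returns eigenvalues in no particular order.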

2.6 Katz Centrality (KC)

Katz centrality of a specific node \(v_i\) is defined as (Katz centrality, Wikipedia; Hamilton, 2020, p. 18):

\[c_i = \sum _{k=1}^{\infty } \sum _{j=1}^{N} \alpha ^{k}(\mathbf{A}^{k})_{ji} \]

  • where the parameter \(\alpha < \dfrac{1}{\lambda_{\max}}\) and \(\lambda_{\max}\) represents the maximum eigenvalue of the matrix \(\mathbf{A}\)

which can be rewritten in matrix form (using the geometric series of matrices; see Hamilton, 2020, p. 18, or this blog):

\[\text{Way I}: \qquad \boldsymbol{c} = ((\mathbf{I} - \alpha \mathbf{A}^{\top})^{-1} - \mathbf{I}){\boldsymbol{1}} \]

  • where \(\mathbf{I}\) is an \(N \times N\) identity matrix and \(\boldsymbol{1}\) is an \(N\)-dimensional all-one vector.
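The step from the infinite sum to the closed form "Way I" can be checked numerically by truncating the series. A minimal sketch follows; the adjacency matrix, the choice \(\alpha = 0.5 / \lambda_{\max}\) and the truncation at 200 terms are illustrative assumptions.

```python
import numpy as np

# Adjacency matrix of the directed graph 0 -> 1, 0 -> 2, 1 -> 2, 2 -> 0.
A = np.array([[0.0, 1.0, 1.0],
              [0.0, 0.0, 1.0],
              [1.0, 0.0, 0.0]])
N = A.shape[0]
ones = np.ones(N)

# Spectral radius of A; alpha is chosen safely below 1 / lambda_max.
lam_max = np.max(np.abs(np.linalg.eigvals(A)))
alpha = 0.5 / lam_max

# Truncated series: sum_{k=1}^{K} alpha^k (A^T)^k 1
series = np.zeros(N)
term = ones.copy()
for _ in range(200):
    term = alpha * (A.T @ term)  # now holds alpha^k (A^T)^k 1
    series += term

# Closed form: ((I - alpha A^T)^{-1} - I) 1
closed_form = (np.linalg.inv(np.eye(N) - alpha * A.T) - np.eye(N)) @ ones

print(series)
print(closed_form)
```

Because the spectral radius of \(\alpha \mathbf{A}^{\top}\) is 0.5 here, each term shrinks geometrically and the truncated sum agrees with the closed form to machine precision.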

There is also another formulation (used in the NetworkX package; see Newman, 2010, p. 172):

\[x_i = \alpha \sum_{j=1}^N A_{ji} x_j + \beta, \quad \forall i = 1,2,\cdots,N \]

and in matrix format:

\[\text{Way II}: \qquad \boldsymbol{x} = \alpha \mathbf{A}^{\top} \boldsymbol{x} + \beta \boldsymbol{1} \]

  • where the parameter \(\alpha < \dfrac{1}{\lambda_{\max}}\) and \(\beta\) controls the initial centrality

  • When \(\alpha = 1 / \lambda_{\max}\) and \(\beta = 0\), Katz centrality is the same as eigenvector centrality.

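Usually \(\beta=1\). Solving Way II for \(\boldsymbol{x}\) then gives the closed form below, which differs from Way I only by a constant shift, \(\boldsymbol{x} = \boldsymbol{c} + \boldsymbol{1}\):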

\[\boldsymbol{x} = (\mathbf{I} - \alpha \mathbf{A}^{\top})^{-1} \boldsymbol{1} \]

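Note: for a directed graph, the formula above computes the KC based on incoming edges (i.e., from other nodes to node \(i\)). To compute the KC based on outgoing edges, replace \(\mathbf{A}\) with \(\mathbf{A}^{\top}\)

Implementation

Katz centrality in the NetworkX library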

nx.katz_centrality(G, alpha=0.1, beta=1.0, ...)

  • parameter:

    • normalized (bool, optional, default=True): if True, will normalize the index as:

      \[x_i = \frac{x_i}{\sqrt{\sum \limits_{j=1}^{N} x_j^2 }} = \frac{x_i}{\| \boldsymbol{x} \|_2}, \qquad i = 1,2,\cdots, N \]

  • Note: for a directed graph, nx.katz_centrality() computes by default the KC based on incoming edges (i.e., from other nodes to node \(i\))

  • To compute the KC based on outgoing edges, use G=G.reverse()
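The effect of the normalized parameter can be seen by rescaling the unnormalized result by its Euclidean norm. This is a small sketch on an illustrative undirected path graph; alpha=0.1 is an assumption that is well below \(1/\lambda_{\max}\) for this graph.

```python
import math
import networkx as nx

G = nx.path_graph(4)  # 0 - 1 - 2 - 3

raw = nx.katz_centrality(G, alpha=0.1, beta=1.0, normalized=False)
unit = nx.katz_centrality(G, alpha=0.1, beta=1.0, normalized=True)

# Divide the unnormalized values by their L2 norm.
norm = math.sqrt(sum(v * v for v in raw.values()))
rescaled = {n: v / norm for n, v in raw.items()}

print(rescaled)
print(unit)
```

With normalized=False the values can be compared directly with the closed-form solution of Way II; with normalized=True only the relative ranking is preserved.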

  • Code example 1: Katz centrality using the Wikipedia formula

import networkx as nx
import numpy as np

def check_alpha(adj_mat, alpha):
    # the series converges only if alpha < 1 / lambda_max
    max_eigenvalue = np.max(np.linalg.eigvals(adj_mat).real)
    if alpha >= 1 / max_eigenvalue:
        raise ValueError('alpha must be less than the reciprocal of the maximum eigenvalue of the adjacency matrix, '
                         'but got alpha = {0}, and 1 / max eigenvalue = {1}'.format(alpha, 1 / max_eigenvalue))

def kate_centrality(graph, alpha, weight=None):
    # Way I: closed form c = ((I - alpha * A^T)^{-1} - I) 1

    node_list = list(graph.nodes())
    node_num = len(node_list)

    adj_mat = nx.adjacency_matrix(graph, weight=weight).toarray()
    check_alpha(adj_mat, alpha)
    
    inv = np.linalg.inv(np.eye(node_num) - alpha * adj_mat.T)
    kc = np.dot(inv, np.ones(node_num)) - np.ones(node_num)
    kc = {k : v for k, v in zip(node_list, kc.tolist())}
    return kc
# ============================================================
def nx_kate_centrality(graph, alpha, weight=None):
    # Way II with beta = 1, computed by NetworkX and shifted by -1 to match Way I
    
    adj_mat = nx.adjacency_matrix(graph, weight=weight).toarray()
    check_alpha(adj_mat, alpha)
    
    kc = nx.katz_centrality(graph, alpha=alpha, 
                            beta=1.0, 
                            weight=weight, 
                            max_iter=100, tol=1.0e-9,
                            normalized=False)
    
    # minus one
    kc = {k : v - 1. for k, v in kc.items()}
    return kc
# ============================================================

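Code example 2: computing Katz centrality

import matplotlib.pyplot as plt
import pandas as pd

alpha = 0.01
weight = None  # None for an unweighted graph, or the name of the edge-weight attribute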

kc_1 = pd.Series(kate_centrality(graph, alpha, weight))
kc_2 = pd.Series(nx_kate_centrality(graph, alpha, weight))

fig, ax = plt.subplots(1, 1)
ax.set_ylim([0, 1])

delta = kc_2 - kc_1
(delta.abs() / kc_2).sort_values(ignore_index=True).plot(ax=ax)

3 Clustering

3.1

3.2 Clustering Coefficient (CluC)

(1) Undirected Graph

For the unweighted graph, the clustering coefficient of node \(v_i\) is computed as:

\[C_i = \frac{ \frac{1}{2} \sum \limits_{j} \sum \limits_{h} a_{i j} \, a_{i h} \, a_{j h} }{ \frac{1}{2} d_i\left(d_i-1\right)} = \frac{(\mathbf{A}^3)_{i i}}{d_i\left(d_i-1\right)} \]

  • where \((\mathbf{A}^3)_{i i}\) is the \(i\)th element of the main diagonal of \(\mathbf{A}^3 = \mathbf{A A A}\)

  • \(d_i\) is the degree of node \(v_i\)

  • \(C_i \in [0, 1]\)
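The matrix expression for the unweighted, undirected case can be checked against nx.clustering. The sketch below uses an illustrative graph (a triangle with one pendant node) and assigns a coefficient of 0 to nodes with degree below 2, where the denominator vanishes.

```python
import networkx as nx
import numpy as np

# Triangle {0, 1, 2} plus a pendant edge 2 - 3.
G = nx.Graph([(0, 1), (1, 2), (2, 0), (2, 3)])
nodes = sorted(G.nodes())
A = nx.to_numpy_array(G, nodelist=nodes)

d = A.sum(axis=1)                                      # degrees
closed_walks = np.diag(np.linalg.matrix_power(A, 3))   # (A^3)_ii = 2 * triangles at i
denom = d * (d - 1)

# Nodes with degree < 2 have a zero denominator; assign them coefficient 0.
C = np.divide(closed_walks, denom, out=np.zeros_like(denom), where=denom > 0)

nx_C = nx.clustering(G)
print(C)
print([nx_C[n] for n in nodes])
```

Node 2 has degree 3 and sits in one triangle, so its coefficient is 2 / (3 · 2) = 1/3, while nodes 0 and 1 have coefficient 1.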

For the weighted graph

\[\widetilde{C}_i = \frac{ \frac{1}{2} \sum \limits_{j} \sum \limits_{h} w_{ij}^{1 / 3} w_{ih}^{1 / 3} w_{j h}^{1 / 3}}{ \frac{1}{2} d_i (d_i-1)} = \frac{\left(\mathbf{W}^{[1 / 3]}\right)_{i i}^3}{d_i\left(d_i-1\right)} \]

  • We define \(\mathbf{W}^{[1/k]} = [w_{ij}^{1/k}]_{ij}\), i.e., the matrix obtained from \(\mathbf{W}\) by taking the \(k\)th root of each entry

(2) Directed Graph (Fagiolo, 2007)

Fagiolo (2007) proposed a method to compute the clustering coefficient of the directed graph and this method is applied in NetworkX package.

For unweighted graph, the clustering coefficient of node \(v_i\) is computed as:

\[C_i^D =\frac{t_i^D}{T_i^D} = \frac{1 / 2 \sum \limits_h \sum \limits_j ( a_{i j}+a_{j i}) (a_{j h} + a_{h j} ) (a_{h i}+a_{i h} )}{ d_i^{\text {Total }} ( d_i^{\text {Total }}-1 ) - 2 d_i^{\leftrightarrow} } \]

where \(t_i^D\) is computed as:

\[t_i^D = \frac{1}{2} \sum_h \sum_j (a_{i j}+a_{j i} ) (a_{j h}+a_{h j} )(a_{h i}+a_{i h} ) = \frac{1}{2} \left(\mathbf{A}+\mathbf{A}^{\top}\right)_{i i}^3 \]

\(d_i^{\text{Total}}\) is the total degree, i.e.,

\[d_i^{\text{Total}} = d_i^{\text{In}} + d_i^{\text{Out}} = (\mathbf{A}^{\top}+\mathbf{A} )_{i :} \mathbf{1} \]

and \(d_i^{\leftrightarrow}\) (named reciprocal degree in the NetworkX package) is the number of bilateral edges between \(i\) and its neighbors (i.e., the number of nodes \(j\) for which both an edge \(i \to j\) and an edge \(j \to i\) exist), computed as:

\[d_i^{\leftrightarrow} = \sum_{j \neq i} a_{i j} a_{j i} =(\mathbf{A}^2 )_{i i} \]
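The pieces of the directed coefficient (\(t_i^D\), \(d_i^{\text{Total}}\), \(d_i^{\leftrightarrow}\)) can be assembled with NumPy and compared with nx.clustering on a DiGraph. This is a sketch on an illustrative graph without self-loops; nodes with a zero denominator are assigned 0.

```python
import networkx as nx
import numpy as np

G = nx.DiGraph([(0, 1), (1, 0), (1, 2), (2, 0), (0, 3)])
nodes = sorted(G.nodes())
A = nx.to_numpy_array(G, nodelist=nodes)  # A[i, j] = 1 for an edge i -> j

S = A + A.T
t = 0.5 * np.diag(np.linalg.matrix_power(S, 3))  # directed triangles t_i^D
d_tot = S.sum(axis=1)                            # in-degree + out-degree
d_rec = np.diag(A @ A)                           # reciprocal degree (A^2)_ii

denom = d_tot * (d_tot - 1) - 2 * d_rec
C = np.divide(t, denom, out=np.zeros_like(denom), where=denom > 0)

nx_C = nx.clustering(G)
print(C)
print([nx_C[n] for n in nodes])
```

For node 0, for instance, \(t_0^D = 2\), \(d_0^{\text{Total}} = 4\) and \(d_0^{\leftrightarrow} = 1\), so \(C_0^D = 2 / (4 \cdot 3 - 2) = 0.2\).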

For weighted graph, \(t_i^D\) is replaced by \(\tilde{t}_i^D\).

  • \(\tilde{t}_i^D\) is computed by substituting the adjacency matrix \(\mathbf{A}\) with the weight matrix \(\mathbf{W}^{[1/3]}\)

Then, we have:

\[\tilde{C}_i^D =\frac{\tilde{t}_i^D}{T_i^D} = \frac{1 / 2 \left[\mathbf{W}^{[1/3]} + (\mathbf{W}^{\top})^{[1/3]} \right]_{i i}^3 }{ d_i^{\text {Total }} ( d_i^{\text {Total }}-1 ) - 2 d_i^{\leftrightarrow} } \]

Note: in the nx.clustering() function of the NetworkX library, the weight matrix \(\mathbf{W}\) is normalized by its largest entry before the clustering coefficient is computed, i.e., \(\mathbf{W} \leftarrow \mathbf{W} / \max_{i,j} w_{ij}\)
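One consequence of this normalization is that the weighted coefficient returned by nx.clustering() does not change when all weights are multiplied by the same constant. A minimal sketch, with an illustrative graph, an assumed attribute name "weight", and an arbitrary scale factor of 10:

```python
import networkx as nx

G = nx.Graph()
G.add_weighted_edges_from([(0, 1, 1.0), (1, 2, 2.0), (2, 0, 4.0), (2, 3, 3.0)])

# Same graph with every weight multiplied by 10.
H = nx.Graph()
H.add_weighted_edges_from((u, v, 10.0 * w) for u, v, w in G.edges(data="weight"))

c_G = nx.clustering(G, weight="weight")
c_H = nx.clustering(H, weight="weight")

print(c_G)
print(c_H)
```

A hand-written implementation that skips the division by the maximum weight will therefore disagree with NetworkX unless the weights already have a maximum of 1.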

Code

nx.clustering(G, nodes=None, weight=None)

nx.clustering(G, 0)
nx.clustering(G)
def clustering_coefficient(graph, weight=None, norm_weight_matrix=False):
    
    import numpy as np
    import networkx as nx
    
    # adjacency matrix
    adj_mat = nx.to_numpy_array(graph, weight=None, nodelist=graph.nodes())
    # weighted matrix
    weight_mat = nx.to_numpy_array(graph, weight=weight, nodelist=graph.nodes())
    
    if norm_weight_matrix:
        weight_mat = weight_mat / np.max(weight_mat)
    
    # Undirected graph
    if not nx.is_directed(graph):
        # Unweighted graph
        if (weight is None):
            t = np.diag(np.linalg.matrix_power(adj_mat, 3))
        else:  # Weighted graph
            weight_mat = np.power(weight_mat, 1/3)
            t = np.diag(np.linalg.matrix_power(weight_mat, 3))
        
        # degree
        d = np.sum(adj_mat, axis=1)
        
        denom = d * (d - 1)
        
    # Directed graph
    else:
        # Unweighted graph
        if (weight is None):
            t = 1 / 2 * np.diag(np.linalg.matrix_power(adj_mat + adj_mat.T, 3))
        
        else: # Weighted graph
            weight_mat = np.power(weight_mat, 1/3)
            t = 1 / 2 * np.diag(np.linalg.matrix_power(weight_mat + weight_mat.T, 3))
        
        # total degree and reciprocal degree
        d_tot = np.sum(adj_mat + adj_mat.T, axis=1)
        d_rep = np.diag(np.linalg.matrix_power(adj_mat, 2))
        
        denom = d_tot * (d_tot - 1) - 2 * d_rep
    
    # a zero denominator (e.g., degree < 2) gives coefficient 0 instead of a division error
    cluc = np.divide(t, denom, out=np.zeros_like(denom, dtype=float), where=(denom > 0))
    
    cluc = dict(zip(list(graph.nodes()), cluc))
    
    return cluc

References

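Fagiolo, G. (2007). Clustering in complex directed networks. Physical Review E, 76(2), 026107.

Katz centrality. Wikipedia.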

Hamilton, W. L. (2020). Graph representation learning. Morgan & Claypool Publishers.

Newman, M. E. J. (2010). Networks: An introduction. Oxford University Press.

posted @ 2023-01-29 22:22  veager