Machine Learning with Graphs : 1. Introduction

Stanford CS224W : Machine Learning with Graphs, Fall 2021:

  • 1 Introduction; Machine Learning for Graphs

  • site

1. Classic Graph ML tasks

Node classification: Predict a property of a node

  • Example: Categorize online users / items

Link prediction: Predict whether there are missing links between two nodes

  • Example: Knowledge graph completion

Graph classification: Categorize different graphs

  • Example: Molecule property prediction

Clustering: Detect if nodes form a community

  • Example: Social circle detection

Other tasks:

  • Graph generation: Drug discovery

  • Graph evolution: Physical simulation

Node-level ML tasks

Edge-level ML tasks

Graph-level ML tasks

2. Choice of Graph Representation

2.1 Components of a Network

Objects: nodes, vertices \(N\)

Interactions: links, edges \(E\)

System: network, graph, \(G\)

2.2 Types of Graph

(1) Directed vs. Undirected Graphs

Undirected Graphs: Links are undirected (symmetrical, reciprocal)

Directed Graphs: Links are directed (arcs)

(2) Unweighted vs. Weighted Graphs

(3) Heterogeneous Graph

A heterogeneous graph is defined as:

\[G = (V, E, R, T) \]

  • Nodes with node types \(v_i \in V\)

  • Edges with relation types \((v_i, r, v_j) \in E\)

  • Node type \(T(v_i)\)

  • Relation type \(r \in R\)
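One way to hold such a graph in NetworkX is sketched below. It is only an illustration: the attribute name `node_type`, the use of the edge key to carry the relation \(r\), and the toy nodes are all assumptions, not something the course prescribes.

```python
import networkx as nx

# Hypothetical toy academic graph; node and relation names are illustrative only.
G = nx.MultiDiGraph()

# T(v_i): store each node's type as a node attribute.
G.add_node("alice", node_type="author")
G.add_node("paper_1", node_type="paper")
G.add_node("venue_1", node_type="venue")

# Each edge is a triple (v_i, r, v_j); the relation r is used as the edge key.
G.add_edge("alice", "paper_1", key="writes")
G.add_edge("paper_1", "venue_1", key="published_in")

node_types = {v: t for v, t in G.nodes(data="node_type")}
relations = {r for _, _, r in G.edges(keys=True)}
triples = sorted((u, r, v) for u, v, r in G.edges(keys=True))

print(node_types)
print(relations)
print(triples)
```

A MultiDiGraph is used here so that two nodes can be joined by several edges with different relation types.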

(4) Bipartite Graph

A bipartite graph is a graph whose nodes can be divided into two disjoint sets \(U\) and \(V\) such that every link connects a node in \(U\) to one in \(V\); that is, \(U\) and \(V\) are independent sets.

Examples:

  • Authors-to-Papers (they authored)

  • Actors-to-Movies (they appeared in)

  • Users-to-Movies (they rated)

  • Recipes-to-Ingredients (they contain)

"Folded" networks:

  • Author collaboration networks

  • Movie co-rating networks
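A minimal sketch of a bipartite graph and its folded (projected) network, using the `networkx.algorithms.bipartite` module. The author and paper names are made up for illustration.

```python
import networkx as nx
from networkx.algorithms import bipartite

# Hypothetical authors-to-papers graph.
authors = ["a1", "a2", "a3"]
papers = ["p1", "p2"]

B = nx.Graph()
B.add_nodes_from(authors, bipartite=0)
B.add_nodes_from(papers, bipartite=1)
B.add_edges_from([("a1", "p1"), ("a2", "p1"), ("a2", "p2"), ("a3", "p2")])

print(bipartite.is_bipartite(B))

# Folding onto the authors: two authors are linked if they share a paper.
collab = bipartite.projected_graph(B, authors)
print(list(collab.edges()))
```

Here `collab` plays the role of the author collaboration network: a1 and a2 are linked through p1, a2 and a3 through p2.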

(5) Connected (undirected) graph

Connected (undirected) graph : Any two vertices can be joined by a path.

Disconnected graph : A disconnected graph is made up of two or more connected components.

Giant component : the largest connected component

Isolated node : a node with no edges (degree 0)

Connectivity : The adjacency matrix of a network with several components can be written in a block-diagonal form, so that nonzero elements are confined to squares, with all other elements being zero.
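The block-diagonal structure can be checked on a small example. The sketch below assumes a made-up graph with two components and one isolated node, and orders the rows and columns of the adjacency matrix component by component.

```python
import networkx as nx
import numpy as np

# Hypothetical graph: components {0, 1, 2} and {3, 4}, plus isolated node 5.
G = nx.Graph()
G.add_edges_from([(0, 1), (1, 2), (3, 4)])
G.add_node(5)

print(nx.is_connected(G))

components = sorted(nx.connected_components(G), key=len, reverse=True)
giant = components[0]               # largest connected component
isolated = list(nx.isolates(G))     # nodes with degree 0

# Ordering rows and columns component by component exposes the blocks.
order = [n for comp in components for n in sorted(comp)]
A = nx.to_numpy_array(G, nodelist=order)
print(A)
```

All nonzero entries of `A` fall inside the square blocks belonging to each component; entries between different components are zero.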

Connectivity of Directed Graphs :

  • Strongly connected directed graph: has a path from each node to every other node and vice versa (e.g., A-B path and B-A path).

  • Weakly connected directed graph: is connected if we disregard the edge directions.

Strongly connected components (SCCs) can be identified, but not every node is part of a nontrivial strongly connected component.

  • In-component: nodes that can reach the SCC

  • Out-component: nodes that can be reached from the SCC.
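These notions can be explored with NetworkX. The sketch below assumes a small made-up directed graph and computes the in- and out-component of its largest SCC with `nx.ancestors` and `nx.descendants`; this is one possible way to do it, not a prescribed one.

```python
import networkx as nx

# Hypothetical directed graph: the cycle 1 -> 2 -> 3 -> 1 is an SCC,
# node 0 points into it, and node 4 is pointed to from it.
D = nx.DiGraph([(0, 1), (1, 2), (2, 3), (3, 1), (3, 4)])

print(nx.is_strongly_connected(D))
print(nx.is_weakly_connected(D))

sccs = list(nx.strongly_connected_components(D))
largest_scc = max(sccs, key=len)

# Any node of the SCC reaches, and is reached by, the same outside nodes.
seed = next(iter(largest_scc))
in_component = nx.ancestors(D, seed) - largest_scc     # nodes that can reach the SCC
out_component = nx.descendants(D, seed) - largest_scc  # nodes reachable from the SCC

print(largest_scc, in_component, out_component)
```

Nodes 0 and 4 each form a trivial SCC of size one, which illustrates that not every node belongs to a nontrivial SCC.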

2.3 Node Degrees

Node degree, \(k_i\) : the number of edges adjacent to node \(i\)

Average degree, \(\bar{k}\):

\[\bar{k} = \frac{1}{N} \sum_{i=1}^{N} k_i = \frac{2E}{N} \]

where \(N\) is the number of nodes and \(E\) is the number of edges.

In directed networks we define an in-degree and out-degree. The (total) degree of a node is the sum of in- and out-degrees.

Implementation in NetworkX

Graph.degree()

DiGraph.degree(): for a directed graph, the degree of a node is in_degree + out_degree

DiGraph.in_degree(): directed graphs only

DiGraph.out_degree(): directed graphs only

Main parameters:

  • nbunch (single node, container, or all nodes (default= all nodes))

  • weight (string or None, optional (default=None)): name of the edge attribute used as the weight

Returns: int or DegreeView

Example: node degree

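import networkx as nx

G = nx.DiGraph([(0, 1), (1, 2), (2, 0)])  # small directed example graph

G.degree(0)      # degree of node 0 (in-degree + out-degree)
G.degree()       # degree view over all nodes
G.in_degree()    # in-degrees (directed graphs only)
G.out_degree()   # out-degrees (directed graphs only)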
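The average-degree formula \(\bar{k} = 2E/N\) can be checked numerically. The sketch below uses a small made-up graph, then reuses the same edges as a directed graph to confirm that the total degree is the sum of in- and out-degree.

```python
import networkx as nx

# Hypothetical undirected graph with N = 4 nodes and E = 4 edges.
G = nx.Graph([(0, 1), (0, 2), (1, 2), (2, 3)])
N, E = G.number_of_nodes(), G.number_of_edges()

avg_degree = sum(k for _, k in G.degree()) / N
print(avg_degree, 2 * E / N)

# The same edges read as directed: total degree = in-degree + out-degree.
D = nx.DiGraph(list(G.edges()))
for node in D.nodes():
    assert D.degree(node) == D.in_degree(node) + D.out_degree(node)

# Every directed edge adds one in-degree and one out-degree,
# so both averages equal E / N.
avg_in = sum(k for _, k in D.in_degree()) / N
avg_out = sum(k for _, k in D.out_degree()) / N
print(avg_in, avg_out)
```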

2.4 Adjacency Matrix

Adjacency Matrix: \(A \in \mathbb{R}^{N \times N}\), \(A_{ij} = 1\) if there is a link from node \(i\) to node \(j\); otherwise, \(A_{ij} = 0\).

Note that the adjacency matrix of an undirected graph is symmetric, while the adjacency matrix of a directed graph is in general not symmetric.

Implementation in NetworkX

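To create a graph object from an adjacency matrix, see site

To convert a graph to an adjacency matrix (nx.to_numpy_array()):

Main parameters:

  • G (graph)

  • nodelist (list, optional):

    • the order of the nodes represented by the rows and columns of the adjacency matrix;

    • if None, the order of G.nodes() is used

  • dtype (NumPy data type, optional):

  • order ({'C', 'F'}, optional):

  • multigraph_weight (callable, optional): how to combine the weights of parallel edges in a MultiGraph; the default is sum.

  • weight (string or None optional, default = 'weight'):

    • the name of the edge attribute that holds the weight; its value becomes the corresponding element of the adjacency matrix.

    • if it is None, or an edge exists but does not have this attribute, the corresponding element of the adjacency matrix is 1.

  • nonedge (array_like, default = 0.0): the value of the adjacency matrix elements where there is no edge

Example: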

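import networkx as nx
import numpy as np

G = nx.Graph()
G.add_edges_from([(0, 1, {'weight' : 10}),
                  (1, 2, {'cost' : 5}),
                  (2, 3, {'weight' : 3, 'cost' : -4.0})])

# Structured dtype: one field per edge attribute
dtype = np.dtype([("weight", int), ("cost", float)])

# weight must be None when dtype is a structured dtype
A = nx.to_numpy_array(G, dtype=dtype, weight=None)

print(type(A))
print(A)
print(A["weight"])   # adjacency matrix built from the 'weight' attribute
print(A["cost"])     # adjacency matrix built from the 'cost' attribute

# nx.adjacency_matrix returns a SciPy sparse object (requires SciPy)
A = nx.adjacency_matrix(G, weight=None)      # every edge counts as 1
print(A.toarray())
A = nx.adjacency_matrix(G, weight="weight")  # edges without 'weight' default to 1
print(A.toarray())
A = nx.adjacency_matrix(G, weight="cost")    # edges without 'cost' default to 1
print(A.toarray())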
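The symmetry property stated in this section can be verified directly. The sketch below builds an undirected and a directed graph from the same made-up edge list and compares each adjacency matrix with its transpose. The row-sum and column-sum remark follows from the definition \(A_{ij} = 1\) for a link from \(i\) to \(j\).

```python
import networkx as nx
import numpy as np

edges = [(0, 1), (1, 2), (2, 0), (2, 3)]  # hypothetical edge list
nodes = [0, 1, 2, 3]

A_und = nx.to_numpy_array(nx.Graph(edges), nodelist=nodes)
A_dir = nx.to_numpy_array(nx.DiGraph(edges), nodelist=nodes)

print(np.array_equal(A_und, A_und.T))  # undirected: symmetric
print(np.array_equal(A_dir, A_dir.T))  # directed: not symmetric for these edges

# Row sums of A count outgoing links, column sums count incoming links;
# for the undirected graph both give the node degree.
out_deg = A_dir.sum(axis=1)
in_deg = A_dir.sum(axis=0)
deg = A_und.sum(axis=1)
print(out_deg, in_deg, deg)
```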

2.5 Edge list

nx.to_pandas_edgelist() converts the edges of a graph into an edge list, returned as a pandas DataFrame.
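A short sketch of the round trip between a graph and its edge list, assuming pandas is installed. The two weighted edges are made up for illustration.

```python
import networkx as nx

# Hypothetical weighted graph with two edges.
G = nx.Graph()
G.add_edge(0, 1, weight=10)
G.add_edge(1, 2, weight=3)

# One row per edge; edge attributes become extra columns.
df = nx.to_pandas_edgelist(G)
print(df)

# Rebuild a graph from the edge list, keeping the 'weight' column.
H = nx.from_pandas_edgelist(df, edge_attr="weight")
print(list(H.edges(data=True)))
```

By default the endpoint columns are named `source` and `target`.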

posted @ 2022-07-08 20:05  veager