Machine Learning with Graphs : 1. Introduction
Stanford CS224W : Machine Learning with Graphs, Fall 2021:
1 Introduction; Machine Learning for Graphs
1. Classic Graph ML tasks
Node classification: Predict a property of a node
- Example: Categorize online users / items
Link prediction: Predict whether there are missing links between two nodes
- Example: Knowledge graph completion
Graph classification: Categorize different graphs
- Example: Molecule property prediction
Clustering: Detect if nodes form a community
- Example: Social circle detection
Other tasks:
- Graph generation: Drug discovery
- Graph evolution: Physical simulation
Node-level ML tasks
Edge-level ML tasks
Graph-level ML tasks
2. Choice of Graph Representation
2.1 Components of a Network
Objects: nodes, vertices \(N\)
Interactions: links, edges \(E\)
System: network, graph, \(G\)
2.2 Types of Graph
(1) Directed vs. Undirected Graphs
Undirected Graphs: Links are undirected (symmetrical, reciprocal)
Directed Graphs: Links are directed (arcs)
(2) Unweighted vs. Weighted Graphs
(3) Heterogeneous Graph
A heterogeneous graph is defined as \(G = (V, E, R, T)\):
- Nodes with node types \(v_i \in V\)
- Edges with relation types \((v_i, r, v_j) \in E\)
- Node type \(T(v_i)\)
- Relation type \(r \in R\)
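To make the definition concrete, here is a minimal sketch of one way to store a heterogeneous graph in plain Python: a node-type map playing the role of \(T\) and a list of \((v_i, r, v_j)\) triples. The node names, node types, and relation names are made up for illustration and do not come from the lecture.

```python
# Hypothetical toy data: all names below are illustrative assumptions.
node_type = {                      # T(v_i): node -> node type
    "alice": "author",
    "bob": "author",
    "paper1": "paper",
    "venue1": "venue",
}
edges = [                          # each edge is a triple (v_i, r, v_j)
    ("alice", "writes", "paper1"),
    ("bob", "writes", "paper1"),
    ("paper1", "published_in", "venue1"),
]

# R: the set of relation types that occur on the edges.
relation_types = {r for _, r, _ in edges}

# Typed signature of every edge: (type of v_i, relation, type of v_j).
edge_signatures = {(node_type[u], r, node_type[v]) for u, r, v in edges}

print(sorted(relation_types))
print(sorted(edge_signatures))
```

Storing edges as triples keeps the relation type attached to each edge, which is what distinguishes this structure from a plain edge list.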
(4) Bipartite Graph
A bipartite graph is a graph whose nodes can be divided into two disjoint sets \(U\) and \(V\) such that every link connects a node in \(U\) to one in \(V\); that is, \(U\) and \(V\) are independent sets.
Examples:
- Authors-to-Papers (they authored)
- Actors-to-Movies (they appeared in)
- Users-to-Movies (they rated)
- Recipes-to-Ingredients (they contain)
"Folded" networks:
- Author collaboration networks
- Movie co-rating networks
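A folded network can be obtained by projecting a bipartite graph onto one of its two node sets. The sketch below folds a small, made-up authors-to-papers graph into an author collaboration network by linking two authors whenever they share at least one paper. The data and variable names are assumptions for illustration.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical bipartite data: author -> papers they authored.
authored = {
    "A": ["p1", "p2"],
    "B": ["p1"],
    "C": ["p2", "p3"],
    "D": ["p3"],
}

# Invert the mapping: paper -> set of its authors.
authors_of = defaultdict(set)
for author, papers in authored.items():
    for paper in papers:
        authors_of[paper].add(author)

# Fold onto the author side: two authors are linked if they co-wrote a paper.
collaboration_edges = set()
for coauthors in authors_of.values():
    for u, v in combinations(sorted(coauthors), 2):
        collaboration_edges.add((u, v))

print(sorted(collaboration_edges))
```

Folding onto the paper side instead would link two papers whenever they share an author.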
(5) Connected (undirected) graph
Connected (undirected) graph : Any two vertices can be joined by a path.
Disconnected graph : A disconnected graph is made up of two or more connected components.
Giant Component : The largest connected component
Isolated node : A node with no edges (degree zero)
Connectivity : The adjacency matrix of a network with several components can be written in block-diagonal form (with nodes ordered by component), so that nonzero elements are confined to square blocks along the diagonal, with all other elements being zero.
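The block-diagonal statement can be checked on a small example. The sketch below finds the connected components of a made-up undirected graph by breadth-first search, then reorders the rows and columns of the adjacency matrix so that nodes of the same component are adjacent. The graph and the helper name `connected_components` are illustrative assumptions, not a library API.

```python
from collections import deque

# Hypothetical undirected graph with two components: {0, 3, 4} and {1, 2}.
edges = [(0, 3), (3, 4), (1, 2)]
num_nodes = 5
neighbors = {i: set() for i in range(num_nodes)}
for u, v in edges:
    neighbors[u].add(v)
    neighbors[v].add(u)

def connected_components(neighbors):
    """Return the components as sorted node lists, found by breadth-first search."""
    seen, components = set(), []
    for start in sorted(neighbors):
        if start in seen:
            continue
        seen.add(start)
        queue, component = deque([start]), []
        while queue:
            node = queue.popleft()
            component.append(node)
            for nxt in sorted(neighbors[node]):
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
        components.append(sorted(component))
    return components

components = connected_components(neighbors)
order = [node for comp in components for node in comp]  # nodes grouped by component

# Adjacency matrix with rows and columns permuted into that order.
A = [[1 if v in neighbors[u] else 0 for v in order] for u in order]
for row in A:
    print(row)
```

With this ordering, the 3x3 block for the first component and the 2x2 block for the second sit on the diagonal, and every entry outside those blocks is zero.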
Connectivity of Directed Graphs :
- Strongly connected directed graph: has a path from each node to every other node and vice versa (e.g., A-B path and B-A path).
- Weakly connected directed graph: is connected if we disregard the edge directions.
Strongly connected components (SCCs) can be identified, but not every node is part of a nontrivial strongly connected component.
- In-component: nodes that can reach the SCC
- Out-component: nodes that can be reached from the SCC.
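These notions can be explored with NetworkX. The sketch below builds a small, made-up directed graph consisting of a 3-cycle, one node that feeds into it, and one node fed by it, then queries strong and weak connectivity and derives the in- and out-component of the largest SCC. The graph and the way the in/out components are computed here (via `nx.ancestors` and `nx.descendants`) are illustrative choices.

```python
import networkx as nx

# Hypothetical directed graph: cycle A -> B -> C -> A, plus "in" -> A and C -> "out".
G = nx.DiGraph([("A", "B"), ("B", "C"), ("C", "A"), ("in", "A"), ("C", "out")])

strongly = nx.is_strongly_connected(G)  # needs a directed path between every ordered pair
weakly = nx.is_weakly_connected(G)      # connectivity when edge directions are ignored

# Largest strongly connected component.
scc = max(nx.strongly_connected_components(G), key=len)

# In-component: nodes outside the SCC that can reach it.
in_component = set().union(*(nx.ancestors(G, n) for n in scc)) - scc
# Out-component: nodes outside the SCC that can be reached from it.
out_component = set().union(*(nx.descendants(G, n) for n in scc)) - scc

print(strongly, weakly)
print(scc, in_component, out_component)
```

The nodes "in" and "out" each form a trivial SCC of size one, which illustrates why not every node belongs to a nontrivial SCC.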
2.3 Node Degrees
Node degree, \(k_i\) : the number of edges adjacent to node \(i\)
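Average degree, \(\bar{k}\): \(\bar{k} = \langle k \rangle = \frac{1}{N} \sum_{i=1}^{N} k_i = \frac{2E}{N}\) (undirected graph),
where \(N\) is the number of nodes and \(E\) is the number of edges.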
In directed networks we define an in-degree and out-degree. The (total) degree of a node is the sum of in- and out-degrees.
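As a quick numerical check of these definitions, the sketch below computes node degrees from a made-up edge list in plain Python. In the undirected case each edge adds one to the degree of both endpoints, so the average degree equals 2E/N. In the directed case the same edges are read as arcs, and each arc adds one out-degree and one in-degree. The edge list is an assumption for illustration.

```python
# Hypothetical edge list on 4 nodes.
edges = [(0, 1), (0, 2), (1, 2), (2, 3)]
num_nodes = 4
num_edges = len(edges)

# Undirected: every edge contributes to the degree of both endpoints.
degree = [0] * num_nodes
for u, v in edges:
    degree[u] += 1
    degree[v] += 1
avg_degree = sum(degree) / num_nodes

# Directed: read each pair as an arc u -> v.
in_degree = [0] * num_nodes
out_degree = [0] * num_nodes
for u, v in edges:
    out_degree[u] += 1
    in_degree[v] += 1
total_degree = [i + o for i, o in zip(in_degree, out_degree)]
avg_in_degree = sum(in_degree) / num_nodes

print(degree, avg_degree)
print(in_degree, out_degree, avg_in_degree)
```

The average in-degree always equals the average out-degree, because every arc has exactly one head and one tail.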
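Implementation with NetworkX
DiGraph.degree(): for a directed graph, a node's degree is in_degree + out_degree
DiGraph.in_degree(): directed graphs only
DiGraph.out_degree(): directed graphs only
Main parameters:
- nbunch (single node, container, or all nodes (default = all nodes))
- weight (string or None, optional (default=None)): name of the edge attribute used as the weight
Returns: int or DegreeView
Example: node degree
import networkx as nx
G = nx.DiGraph([(0, 1), (1, 2), (2, 0)])  # hypothetical example graph
G.degree(0)      # degree of node 0
G.degree()       # degrees of all nodes
G.in_degree()    # in-degrees of all nodes
G.out_degree()   # out-degrees of all nodes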
2.4 Adjacency Matrix
Adjacency Matrix: \(A \in \mathbb{R}^{N \times N}\), \(A_{ij} = 1\) if there is a link from node \(i\) to node \(j\); otherwise, \(A_{ij} = 0\).
Note that the adjacency matrix of an undirected graph is symmetric, while the adjacency matrix of a directed graph is in general not symmetric.
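The symmetry property is easy to verify by hand. The sketch below builds the adjacency matrix of a made-up edge list twice, once treating the edges as undirected and once as directed, and checks symmetry in each case. The helper names `adjacency_matrix` and `is_symmetric` are hypothetical and unrelated to the NetworkX function of a similar name.

```python
def adjacency_matrix(num_nodes, edges, directed=False):
    """Build a dense 0/1 adjacency matrix as a list of lists."""
    A = [[0] * num_nodes for _ in range(num_nodes)]
    for i, j in edges:
        A[i][j] = 1          # link from node i to node j
        if not directed:
            A[j][i] = 1      # an undirected link also runs from j to i
    return A

def is_symmetric(A):
    n = len(A)
    return all(A[i][j] == A[j][i] for i in range(n) for j in range(n))

# Hypothetical edge list on 4 nodes.
edges = [(0, 1), (1, 2), (2, 0), (2, 3)]
A_undirected = adjacency_matrix(4, edges)
A_directed = adjacency_matrix(4, edges, directed=True)

print(is_symmetric(A_undirected), is_symmetric(A_directed))

# Row sums give degrees (undirected) or out-degrees (directed);
# column sums of the directed matrix give in-degrees.
degrees = [sum(row) for row in A_undirected]
out_degrees = [sum(row) for row in A_directed]
in_degrees = [sum(col) for col in zip(*A_directed)]
print(degrees, out_degrees, in_degrees)
```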
Implementation with NetworkX
To create a graph object from an adjacency matrix, see site.
To convert a graph to an adjacency matrix:
- nx.adjacency_matrix(G, nodelist=None, dtype=None, weight='weight')
  - Returns: A (SciPy sparse matrix), which can be converted to a dense matrix with A.toarray() or A.todense()
Main parameters:
- G (graph)
- nodelist (list, optional):
  - the order of the nodes represented by the rows and columns of the adjacency matrix;
  - if None, the order of G.nodes() is used
- dtype (NumPy data type, optional)
- order ({'C', 'F'}, optional) (nx.to_numpy_array only)
- multigraph_weight (callable, optional): how the weights of parallel edges in a MultiGraph are combined; the default is sum (nx.to_numpy_array only)
- weight (string or None, optional, default = 'weight'):
  - the edge attribute that holds the weight; its value on each edge becomes the corresponding entry of the adjacency matrix.
  - if None, or if an edge exists but lacks this attribute, the corresponding entry is 1.
- nonedge (array_like, default = 0.0): the entry used where there is no edge (nx.to_numpy_array only)
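Example:
import networkx as nx
import numpy as np

G = nx.Graph()
G.add_edges_from([(0, 1, {'weight': 10}),
                  (1, 2, {'cost': 5}),
                  (2, 3, {'weight': 3, 'cost': -4.0})])

# Structured dtype: each entry carries a "weight" field and a "cost" field,
# read from the edge attributes of the same names (weight must be None here)
dtype = np.dtype([("weight", int), ("cost", float)])
A = nx.to_numpy_array(G, dtype=dtype, weight=None)
print(type(A))
print(A)
print(A["weight"])
print(A["cost"])

# weight=None: every existing edge becomes 1
A = nx.adjacency_matrix(G, weight=None)
print(A.toarray())
# Entries taken from the 'weight' attribute; edges without it become 1
A = nx.adjacency_matrix(G, weight="weight")
print(A.toarray())
# Entries taken from the 'cost' attribute; edges without it become 1
A = nx.adjacency_matrix(G, weight="cost")
print(A.toarray())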
2.5 Edge list
nx.to_pandas_edgelist() converts the edges of a graph into an edge list, returned as a pandas DataFrame.
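A minimal sketch of the round trip between a graph and its edge list, assuming both NetworkX and pandas are installed. The graph below is made up for illustration. With the default arguments, the endpoint columns are named `source` and `target`, and each edge attribute gets its own column.

```python
import networkx as nx

# Hypothetical weighted graph with two edges.
G = nx.Graph()
G.add_edge(0, 1, weight=10)
G.add_edge(1, 2, weight=3)

# Graph -> edge list: one row per edge.
df = nx.to_pandas_edgelist(G)
print(df)

# Edge list -> graph, keeping the 'weight' column as an edge attribute.
H = nx.from_pandas_edgelist(df, source="source", target="target", edge_attr="weight")
print(sorted(H.edges(data="weight")))
```

An edge list stores only the edges that exist, so for sparse graphs it is far more compact than a dense adjacency matrix.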
