Implementing Graph Neural Network Models
1. Graph Convolutional Networks (GCN)
1.1 Formulation
The propagation function \(f\):
\[
\mathbf{H}^{(l+1)} = f\left(\mathbf{H}^{(l)}, \mathbf{A}\right) = \sigma\left(\widehat{\mathbf{D}}^{-\frac{1}{2}} \widehat{\mathbf{A}} \widehat{\mathbf{D}}^{-\frac{1}{2}} \mathbf{H}^{(l)} \mathbf{W}^{(l)}\right)
\]
where:
- \(\widehat{\mathbf{A}} = \mathbf{A} + \mathbf{I}\)
  - \(\mathbf{A} \in \mathbb{R}^{N \times N}\) is the adjacency matrix
  - \(\mathbf{I}\) is the identity matrix
- \(\widehat{\mathbf{D}}\) is the diagonal node degree matrix of \(\widehat{\mathbf{A}}\)
- \(\mathbf{H}^{(l)} \in \mathbb{R}^{N \times F^{(l)}}\) is the input feature matrix of layer \(l\)
  - \(N\) is the number of nodes
  - \(F^{(l)}\) is the number of input features for each node
- \(\mathbf{W}^{(l)}\) is the trainable weight matrix
- \(\sigma(\cdot)\) is the ReLU activation function
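For concreteness, the propagation rule above can be sketched in plain NumPy (a minimal illustrative sketch, not the post's code; the function and variable names are my own):

```python
# Minimal GCN layer sketch: sigma(D^-1/2 (A + I) D^-1/2 H W)
import numpy as np

def relu(x):
    return np.maximum(0, x)

def gcn_layer(A, H, W):
    """One GCN propagation step on adjacency A, features H, weights W."""
    N = A.shape[0]
    A_hat = A + np.eye(N)                         # add self-loops: A_hat = A + I
    deg = A_hat.sum(axis=1)                       # degrees of A_hat
    D_inv_sqrt = np.diag(1.0 / np.sqrt(deg))      # D_hat^{-1/2}
    return relu(D_inv_sqrt @ A_hat @ D_inv_sqrt @ H @ W)

# toy graph: 3 nodes in a path 0-1-2, one-hot input features (F = 3)
A = np.array([[0., 1., 0.],
              [1., 0., 1.],
              [0., 1., 0.]])
H = np.eye(3)
W = np.ones((3, 2))                               # F x F' weight matrix
H_next = gcn_layer(A, H, W)
print(H_next.shape)                               # (3, 2)
```

Symmetric normalization by \(\widehat{\mathbf{D}}^{-1/2}\) on both sides keeps the aggregated features from blowing up on high-degree nodes.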
2. Graph SAGE (SAmple and aggreGatE)
3. ChebNet
4. Graph Attention Networks (GAT)
4.1 Formulation
(1) Simple linear transformation
- to obtain sufficient expressive power, transform the input features into higher-level features:
\[
\boldsymbol{z}_i^{(l)} = \mathbf{W}^{(l)} \boldsymbol{h}_i^{(l)}
\]
where:
- \(\mathbf{W}^{(l)} \in \mathbb{R}^{F' \times F}\) is the learnable weight matrix
- \(\boldsymbol{h}_i^{(l)} \in \mathbb{R}^{F}\) is the input of layer \(l\)
(2) Attention Coefficients: compute a pair-wise un-normalized attention score between two neighbors:
\[
e_{ij}^{(l)} = \text{LeakyReLU}\left( \boldsymbol{a}^{(l)\,T} \left( \boldsymbol{z}_i^{(l)} \,\|\, \boldsymbol{z}_j^{(l)} \right) \right)
\]
- this is the most important step
- it is additive attention, in contrast with the dot-product attention used in the Transformer model
- first, concatenate (denoted as \(\|\)) the \(\boldsymbol{z}\) embeddings of the two nodes
- then, take a dot product of the result and a learnable weight vector \(\boldsymbol{a}^{(l)}\)
- finally, apply a \(\text{LeakyReLU}\)
Note: The graph structure is injected into the mechanism by performing masked attention
- only compute \(e_{ij}^{(l)}\) for nodes \(j\) in the neighborhood of node \(i\), \(N(i)\)
- these will be exactly the first-order neighbors of \(i\) (including \(i\) itself)
(3) Softmax
- to make coefficients easily comparable across different nodes, we normalize them across all choices of \(j\) using the softmax function:
\[
\alpha_{ij}^{(l)} = \frac{\exp\left(e_{ij}^{(l)}\right)}{\sum_{k \in N(i)} \exp\left(e_{ik}^{(l)}\right)}
\]
(4) Aggregation
- this step is similar to GCN
- the embeddings from neighbors are aggregated together, scaled by the attention scores:
\[
\boldsymbol{h}_i^{(l+1)} = \sigma\left( \sum_{j \in N(i)} \alpha_{ij}^{(l)} \boldsymbol{z}_j^{(l)} \right)
\]
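The four steps can be sketched end-to-end as a single-head GAT layer in NumPy (an illustrative sketch under my own naming, not the post's or any library's code):

```python
# Single-head GAT layer: linear map, masked additive attention, softmax, aggregation
import numpy as np

def leaky_relu(x, slope=0.2):
    return np.where(x > 0, x, slope * x)

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def gat_layer(adj, H, W, a):
    """adj: (N, N) adjacency; H: (N, F) inputs; W: (F', F); a: (2F',)."""
    N = H.shape[0]
    Z = H @ W.T                                       # (1) z_i = W h_i, shape (N, F')
    H_out = np.zeros_like(Z)
    for i in range(N):
        nbrs = [j for j in range(N) if adj[i, j] or j == i]   # N(i), including i
        # (2) e_ij = LeakyReLU(a^T [z_i || z_j]), only over the masked neighborhood
        e = np.array([leaky_relu(a @ np.concatenate([Z[i], Z[j]])) for j in nbrs])
        att = softmax(e)                              # (3) normalize over j in N(i)
        H_out[i] = sum(w * Z[j] for w, j in zip(att, nbrs))   # (4) weighted sum
    return np.maximum(0, H_out)                       # sigma = ReLU, as in GCN

# toy graph: 3 nodes in a path 0-1-2, one-hot features (F = 3), F' = 2
adj = np.array([[0., 1., 0.], [1., 0., 1.], [0., 1., 0.]])
H = np.eye(3)
rng = np.random.default_rng(0)
W = rng.normal(size=(2, 3))                           # F' x F, as in the formulation
a = rng.normal(size=(4,))                             # length 2F' for the concatenation
H_next = gat_layer(adj, H, W, a)
print(H_next.shape)                                   # (3, 2)
```

The explicit per-node loop mirrors the formulation directly; real implementations vectorize the masked attention over all edges at once.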
4.2 Multi-head Attention
Analogous to multiple channels in ConvNet, GAT introduces multi-head attention to enrich the model capacity and to stabilize the learning process.
Suppose \(K\) independent attention mechanisms execute the transformation of (4); their outputs can then be combined in two ways depending on the use:
Concatenation
- in this setting, the final returned output, \(\boldsymbol{h}^{(l+1) \, \prime}_{i}\), consists of \(KF'\) features (rather than \(F'\)) for each node:
\[
\boldsymbol{h}^{(l+1) \, \prime}_{i} = \big\Vert_{k=1}^{K} \sigma\left( \sum_{j \in N(i)} \alpha_{ij}^{k} \mathbf{W}^{k} \boldsymbol{h}_j^{(l)} \right)
\]
Average
- if we perform multi-head attention on the final (prediction) layer of the network, concatenation is no longer sensible
- instead, averaging is employed, and applying the final nonlinearity (usually a softmax or logistic sigmoid for classification problems) is delayed until then:
\[
\boldsymbol{h}^{(l+1) \, \prime}_{i} = \sigma\left( \frac{1}{K} \sum_{k=1}^{K} \sum_{j \in N(i)} \alpha_{ij}^{k} \mathbf{W}^{k} \boldsymbol{h}_j^{(l)} \right)
\]
Thus,
- concatenation for intermediary layers
- and average for the final layer
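The two combination modes amount to a one-line difference in shape handling; a sketch in NumPy (hypothetical helper name, operating on per-head outputs each of shape \((N, F')\)):

```python
# Combining K attention-head outputs: concat for intermediary layers, average for the last
import numpy as np

def combine_heads(heads, mode="concat"):
    """heads: list of K arrays, each of shape (N, F')."""
    if mode == "concat":
        # intermediary layers: each node ends up with K*F' features
        return np.concatenate(heads, axis=1)
    # final layer: average the heads, keeping F' features per node
    return np.mean(np.stack(heads), axis=0)

heads = [np.full((4, 2), float(k)) for k in range(3)]   # K = 3 toy head outputs
concat_out = combine_heads(heads, "concat")
avg_out = combine_heads(heads, "average")
print(concat_out.shape)                                  # (4, 6)
print(avg_out.shape)                                     # (4, 2)
```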
