Implementing Graph Neural Network Models
1. Graph Convolutional Networks (GCN)
1.1 Formulation
The propagation function \(f\):
\[
\mathbf{H}^{(l+1)} = f\left(\mathbf{H}^{(l)}, \mathbf{A}\right) = \sigma\left(\widehat{\mathbf{D}}^{-\frac{1}{2}} \widehat{\mathbf{A}} \widehat{\mathbf{D}}^{-\frac{1}{2}} \mathbf{H}^{(l)} \mathbf{W}^{(l)}\right)
\]
where:
- \(\widehat{\mathbf{A}} = \mathbf{A} + \mathbf{I}\)
  - \(\mathbf{A} \in \mathbb{R}^{N \times N}\) is the adjacency matrix
  - \(\mathbf{I}\) is the identity matrix
- \(\widehat{\mathbf{D}}\) is the diagonal node degree matrix of \(\widehat{\mathbf{A}}\)
- \(\mathbf{H}^{(l)} \in \mathbb{R}^{N \times F^{(l)}}\) is the input feature matrix of layer \(l\)
  - \(N\) is the number of nodes
  - \(F^{(l)}\) is the number of input features for each node
- \(\mathbf{W}^{(l)}\) is the trainable weight matrix
- \(\sigma(\cdot)\) is the ReLU activation function
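For concreteness, the propagation rule above can be sketched in plain NumPy (a minimal illustrative sketch, not the post's code; the function and variable names are my own):

```python
# Minimal GCN layer sketch: sigma(D^-1/2 (A + I) D^-1/2 H W)
import numpy as np

def relu(x):
    return np.maximum(0, x)

def gcn_layer(A, H, W):
    """One GCN propagation step on adjacency A, features H, weights W."""
    N = A.shape[0]
    A_hat = A + np.eye(N)                         # add self-loops: A_hat = A + I
    deg = A_hat.sum(axis=1)                       # degrees of A_hat
    D_inv_sqrt = np.diag(1.0 / np.sqrt(deg))      # D_hat^{-1/2}
    return relu(D_inv_sqrt @ A_hat @ D_inv_sqrt @ H @ W)

# toy graph: 3 nodes in a path 0-1-2, one-hot input features (F = 3)
A = np.array([[0., 1., 0.],
              [1., 0., 1.],
              [0., 1., 0.]])
H = np.eye(3)
W = np.ones((3, 2))                               # F x F' weight matrix
H_next = gcn_layer(A, H, W)
print(H_next.shape)                               # (3, 2)
```

Symmetric normalization by \(\widehat{\mathbf{D}}^{-1/2}\) on both sides keeps the aggregated features from blowing up on high-degree nodes.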
2. Graph SAGE (SAmple and aggreGatE)
3. ChebNet
4. Graph Attention Networks (GAT)
4.1 Formulation
(1) Simple linear transformation
- to obtain sufficient expressive power, transform the input features into higher-level features:
\[
\boldsymbol{z}_i^{(l)} = \mathbf{W}^{(l)} \boldsymbol{h}_i^{(l)}
\]
where:
- \(\mathbf{W}^{(l)} \in \mathbb{R}^{F' \times F}\) is the learnable weight matrix
- \(\boldsymbol{h}_i^{(l)} \in \mathbb{R}^{F}\) is the input of layer \(l\)
(2) Attention Coefficients: compute a pair-wise un-normalized attention score between two neighbors:
\[
e_{ij}^{(l)} = \text{LeakyReLU}\left( \boldsymbol{a}^{(l)\,T} \left( \boldsymbol{z}_i^{(l)} \,\|\, \boldsymbol{z}_j^{(l)} \right) \right)
\]
- this is the most important step
- it is additive attention, in contrast with the dot-product attention used in the Transformer model
- first, concatenate (denoted as \(\|\)) the \(\boldsymbol{z}\) embeddings of the two nodes
- then, take a dot product of the result and a learnable weight vector \(\boldsymbol{a}^{(l)}\)
- finally, apply a \(\text{LeakyReLU}\)
Note: The graph structure is injected into the mechanism by performing masked attention
- only compute \(e_{ij}^{(l)}\) for nodes \(j\) in the neighborhood of node \(i\), \(N(i)\)
- these will be exactly the first-order neighbors of \(i\) (including \(i\) itself)
(3) Softmax
- to make coefficients easily comparable across different nodes, we normalize them across all choices of \(j\) using the softmax function:
\[
\alpha_{ij}^{(l)} = \frac{\exp\left(e_{ij}^{(l)}\right)}{\sum_{k \in N(i)} \exp\left(e_{ik}^{(l)}\right)}
\]
(4) Aggregation
- this step is similar to GCN
- the embeddings from neighbors are aggregated together, scaled by the attention scores:
\[
\boldsymbol{h}_i^{(l+1)} = \sigma\left( \sum_{j \in N(i)} \alpha_{ij}^{(l)} \boldsymbol{z}_j^{(l)} \right)
\]
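The four steps can be sketched end-to-end as a single-head GAT layer in NumPy (an illustrative sketch under my own naming, not the post's or any library's code):

```python
# Single-head GAT layer: linear map, masked additive attention, softmax, aggregation
import numpy as np

def leaky_relu(x, slope=0.2):
    return np.where(x > 0, x, slope * x)

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def gat_layer(adj, H, W, a):
    """adj: (N, N) adjacency; H: (N, F) inputs; W: (F', F); a: (2F',)."""
    N = H.shape[0]
    Z = H @ W.T                                       # (1) z_i = W h_i, shape (N, F')
    H_out = np.zeros_like(Z)
    for i in range(N):
        nbrs = [j for j in range(N) if adj[i, j] or j == i]   # N(i), including i
        # (2) e_ij = LeakyReLU(a^T [z_i || z_j]), only over the masked neighborhood
        e = np.array([leaky_relu(a @ np.concatenate([Z[i], Z[j]])) for j in nbrs])
        att = softmax(e)                              # (3) normalize over j in N(i)
        H_out[i] = sum(w * Z[j] for w, j in zip(att, nbrs))   # (4) weighted sum
    return np.maximum(0, H_out)                       # sigma = ReLU, as in GCN

# toy graph: 3 nodes in a path 0-1-2, one-hot features (F = 3), F' = 2
adj = np.array([[0., 1., 0.], [1., 0., 1.], [0., 1., 0.]])
H = np.eye(3)
rng = np.random.default_rng(0)
W = rng.normal(size=(2, 3))                           # F' x F, as in the formulation
a = rng.normal(size=(4,))                             # length 2F' for the concatenation
H_next = gat_layer(adj, H, W, a)
print(H_next.shape)                                   # (3, 2)
```

The explicit per-node loop mirrors the formulation directly; real implementations vectorize the masked attention over all edges at once.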
4.2 Multi-head Attention
Analogous to multiple channels in ConvNet, GAT introduces multi-head attention to enrich the model capacity and to stabilize the learning process.
Suppose \(K\) independent attention mechanisms execute the transformation of (4); their outputs can then be combined in two ways depending on the use:
Concatenation
- in this setting, the final returned output, \(\boldsymbol{h}^{(l+1) \, \prime}_{i}\), consists of \(KF'\) features (rather than \(F'\)) for each node:
\[
\boldsymbol{h}^{(l+1) \, \prime}_{i} = \big\Vert_{k=1}^{K} \sigma\left( \sum_{j \in N(i)} \alpha_{ij}^{k} \mathbf{W}^{k} \boldsymbol{h}_j^{(l)} \right)
\]
Average
- if we perform multi-head attention on the final (prediction) layer of the network, concatenation is no longer sensible
- instead, averaging is employed, and applying the final nonlinearity (usually a softmax or logistic sigmoid for classification problems) is delayed until then:
\[
\boldsymbol{h}^{(l+1) \, \prime}_{i} = \sigma\left( \frac{1}{K} \sum_{k=1}^{K} \sum_{j \in N(i)} \alpha_{ij}^{k} \mathbf{W}^{k} \boldsymbol{h}_j^{(l)} \right)
\]
Thus,
- concatenation for intermediary layers
- and average for the final layer
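The two combination modes amount to a one-line difference in shape handling; a sketch in NumPy (hypothetical helper name, operating on per-head outputs each of shape \((N, F')\)):

```python
# Combining K attention-head outputs: concat for intermediary layers, average for the last
import numpy as np

def combine_heads(heads, mode="concat"):
    """heads: list of K arrays, each of shape (N, F')."""
    if mode == "concat":
        # intermediary layers: each node ends up with K*F' features
        return np.concatenate(heads, axis=1)
    # final layer: average the heads, keeping F' features per node
    return np.mean(np.stack(heads), axis=0)

heads = [np.full((4, 2), float(k)) for k in range(3)]   # K = 3 toy head outputs
concat_out = combine_heads(heads, "concat")
avg_out = combine_heads(heads, "average")
print(concat_out.shape)                                  # (4, 6)
print(avg_out.shape)                                     # (4, 2)
```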
