2. The Role of GraphConv
The multiplication of the adjacency matrix \(\textbf{A}\) with the feature matrix \(\textbf{X}\) in the GraphConv layer is a crucial operation in Graph Convolutional Networks (GCNs). This operation performs a localized, weighted aggregation of node features from each node's neighbors. Here's a detailed explanation of why this is done and what it accomplishes:
The multiplication of the adjacency matrix with the node feature matrix in the GraphConv layer performs the key neighbor-aggregation step of a GCN.
It allows each node to update its features based on the features of its neighbors, effectively propagating information through the graph and capturing its local structure.
Combined with a weight transformation and optional normalization, this operation enables the network to learn meaningful representations of nodes and their relationships.
Purpose of Adjacency Matrix Multiplication
- Neighbor Aggregation:
  - In a graph, the features of a node should be influenced by the features of its neighboring nodes. The adjacency matrix \(\textbf{A}\) encodes the connections between nodes, where \(\textbf{A}_{ij}\) is non-zero if there is an edge between node \(i\) and node \(j\).
  - When we multiply \(\textbf{A}\) with \(\textbf{X}\), each node's feature vector is updated to be a weighted sum of the feature vectors of its neighbors.
- Information Propagation:
  - This operation allows information to propagate through the graph, enabling each node to gather information from its local neighborhood (see the small example after this list).
  - This is essential for capturing the local structure and feature distribution within the graph.
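As a small illustration of what the product \(\textbf{A} \cdot \textbf{X}\) computes, here is a toy sketch (the graph and the feature values below are made up for illustration):

import torch

# 3-node chain 0-1-2, with self-loops already included in A.
A = torch.tensor([[1., 1., 0.],
                  [1., 1., 1.],
                  [0., 1., 1.]])
X = torch.tensor([[1., 0.],
                  [0., 2.],
                  [3., 1.]])

Y = A @ X
print(Y[1])  # node 1 aggregates nodes 0, 1 and 2: tensor([4., 3.])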
Mathematical Interpretation
Let's break down the operations of the GraphConv layer (a combined numeric sketch follows this list):
- Matrix Multiplication:
  - The first operation is \(\textbf{Y} = \textbf{A} \cdot \textbf{X}\), where \(\textbf{Y}\) is the intermediate result, \(\textbf{A}\) is the adjacency matrix, and \(\textbf{X}\) is the input feature matrix.
  - For node \(i\), the feature vector \(\textbf{Y}_i\) is computed as: $$\textbf{Y}_i = \sum_{j \in \mathcal{N}(i)} \textbf{A}_{ij} \textbf{X}_j$$ where \(\mathcal{N}(i)\) denotes the neighbors of node \(i\), including itself (if self-loops are added).
- Self-Loop Addition:
  - If `add_self` is `True`, \(\textbf{X}\) is added to \(\textbf{Y}\). This ensures that the node's own features are also included in the aggregation: $$\textbf{Y} = \textbf{A} \cdot \textbf{X} + \textbf{X}$$
- Weight Transformation:
  - The intermediate result \(\textbf{Y}\) is then transformed by a weight matrix \(\textbf{W}\): $$\textbf{Z} = \textbf{Y} \cdot \textbf{W}$$
  - This applies a linear transformation to the aggregated features, which is essential for learning an appropriate feature representation.
- Bias Addition:
  - If a bias term is included, it is added to \(\textbf{Z}\): $$\textbf{Z} = \textbf{Z} + \textbf{b}$$
- Normalization:
  - If `normalize_embedding` is `True`, the features are L2-normalized: $$\textbf{Z} = \frac{\textbf{Z}}{\|\textbf{Z}\|_2}$$
  - This ensures that the feature vectors have unit length, which can be useful in certain applications.
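Putting these steps together, here is a minimal numeric sketch (toy adjacency, features, and weights; it mirrors the formulas above rather than any particular library call):

import torch
import torch.nn.functional as F

A = torch.tensor([[0., 1., 0.],
                  [1., 0., 1.],
                  [0., 1., 0.]])   # adjacency matrix (no self-loops)
X = torch.tensor([[1., 0.],
                  [0., 2.],
                  [3., 1.]])       # node features, shape (3, 2)
W = torch.randn(2, 4)              # weight matrix, shape (2, 4)
b = torch.zeros(4)                 # bias

Y = A @ X + X                      # aggregation plus self-loop (add_self=True)
Z = Y @ W + b                      # weight transformation and bias
Z = F.normalize(Z, p=2, dim=1)     # optional per-node L2 normalization
print(Z.shape)                     # torch.Size([3, 4])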
Example Code Walkthrough
Here is a simplified walkthrough of the GraphConv class, followed by a short usage sketch:
import torch
import torch.nn as nn
import torch.nn.functional as F

class GraphConv(nn.Module):
    def __init__(self, input_dim, output_dim, add_self=False,
                 normalize_embedding=False, dropout=0.0, bias=True):
        super(GraphConv, self).__init__()
        self.add_self = add_self
        self.dropout = dropout
        if dropout > 0.001:
            self.dropout_layer = nn.Dropout(p=dropout)
        self.normalize_embedding = normalize_embedding
        self.input_dim = input_dim
        self.output_dim = output_dim
        device = 'cuda' if torch.cuda.is_available() else 'cpu'
        # Move the tensor to the device before wrapping it in nn.Parameter,
        # so the parameter stays a properly registered leaf tensor.
        self.weight = nn.Parameter(torch.FloatTensor(input_dim, output_dim).to(device))
        nn.init.xavier_uniform_(self.weight)  # explicit initialization
        if bias:
            self.bias = nn.Parameter(torch.zeros(output_dim).to(device))
        else:
            self.bias = None

    def forward(self, x, adj):
        if self.dropout > 0.001:
            x = self.dropout_layer(x)
        # Neighbor aggregation: multiply the adjacency matrix with the features
        y = torch.matmul(adj, x)
        # Optionally add a self-loop so each node keeps its own features
        if self.add_self:
            y += x
        # Linear transformation of the aggregated features
        y = torch.matmul(y, self.weight)
        # Add bias if present
        if self.bias is not None:
            y = y + self.bias
        # L2-normalize each node embedding if required
        # (dim=2 assumes batched input of shape (batch, num_nodes, features))
        if self.normalize_embedding:
            y = F.normalize(y, p=2, dim=2)
        return y
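A minimal usage sketch of this class (the adjacency matrix and feature values below are made-up toy data; forward expects batched inputs of shape (batch, num_nodes, ...)):

device = 'cuda' if torch.cuda.is_available() else 'cpu'

# One graph with 3 nodes (edges 0-1 and 1-2) and 2-dimensional node features.
adj = torch.tensor([[[0., 1., 0.],
                     [1., 0., 1.],
                     [0., 1., 0.]]], device=device)   # shape (1, 3, 3)
x = torch.randn(1, 3, 2, device=device)               # shape (1, 3, 2)

conv = GraphConv(input_dim=2, output_dim=4, add_self=True,
                 normalize_embedding=True)
out = conv(x, adj)
print(out.shape)  # torch.Size([1, 3, 4])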
2. GCNConv
2.0 code
# Imports roughly as in torch_geometric/nn/conv/gcn_conv.py
# (gcn_norm is defined in the same file):
from typing import Optional

import torch
from torch import Tensor
from torch.nn import Parameter

from torch_geometric.nn.conv import MessagePassing
from torch_geometric.nn.dense.linear import Linear
from torch_geometric.nn.inits import zeros
from torch_geometric.typing import Adj, OptPairTensor, OptTensor, SparseTensor
from torch_geometric.utils import spmm


class GCNConv(MessagePassing):
r"""The graph convolutional operator from the `"Semi-supervised
Classification with Graph Convolutional Networks"
<https://arxiv.org/abs/1609.02907>`_ paper.
.. math::
\mathbf{X}^{\prime} = \mathbf{\hat{D}}^{-1/2} \mathbf{\hat{A}}
\mathbf{\hat{D}}^{-1/2} \mathbf{X} \mathbf{\Theta},
where :math:`\mathbf{\hat{A}} = \mathbf{A} + \mathbf{I}` denotes the
adjacency matrix with inserted self-loops and
:math:`\hat{D}_{ii} = \sum_{j=0} \hat{A}_{ij}` its diagonal degree matrix.
The adjacency matrix can include other values than :obj:`1` representing
edge weights via the optional :obj:`edge_weight` tensor.
Its node-wise formulation is given by:
.. math::
\mathbf{x}^{\prime}_i = \mathbf{\Theta}^{\top} \sum_{j \in
\mathcal{N}(i) \cup \{ i \}} \frac{e_{j,i}}{\sqrt{\hat{d}_j
\hat{d}_i}} \mathbf{x}_j
with :math:`\hat{d}_i = 1 + \sum_{j \in \mathcal{N}(i)} e_{j,i}`, where
:math:`e_{j,i}` denotes the edge weight from source node :obj:`j` to target
node :obj:`i` (default: :obj:`1.0`)
Args:
in_channels (int): Size of each input sample, or :obj:`-1` to derive
the size from the first input(s) to the forward method.
out_channels (int): Size of each output sample.
improved (bool, optional): If set to :obj:`True`, the layer computes
:math:`\mathbf{\hat{A}}` as :math:`\mathbf{A} + 2\mathbf{I}`.
(default: :obj:`False`)
cached (bool, optional): If set to :obj:`True`, the layer will cache
the computation of :math:`\mathbf{\hat{D}}^{-1/2} \mathbf{\hat{A}}
\mathbf{\hat{D}}^{-1/2}` on first execution, and will use the
cached version for further executions.
This parameter should only be set to :obj:`True` in transductive
learning scenarios. (default: :obj:`False`)
add_self_loops (bool, optional): If set to :obj:`False`, will not add
self-loops to the input graph. By default, self-loops will be added
in case :obj:`normalize` is set to :obj:`True`, and not added
otherwise. (default: :obj:`None`)
normalize (bool, optional): Whether to add self-loops and compute
symmetric normalization coefficients on-the-fly.
(default: :obj:`True`)
bias (bool, optional): If set to :obj:`False`, the layer will not learn
an additive bias. (default: :obj:`True`)
**kwargs (optional): Additional arguments of
:class:`torch_geometric.nn.conv.MessagePassing`.
Shapes:
- **input:**
node features :math:`(|\mathcal{V}|, F_{in})`,
edge indices :math:`(2, |\mathcal{E}|)`
or sparse matrix :math:`(|\mathcal{V}|, |\mathcal{V}|)`,
edge weights :math:`(|\mathcal{E}|)` *(optional)*
- **output:** node features :math:`(|\mathcal{V}|, F_{out})`
"""
_cached_edge_index: Optional[OptPairTensor]
_cached_adj_t: Optional[SparseTensor]
def __init__(
self,
in_channels: int,
out_channels: int,
improved: bool = False,
cached: bool = False,
add_self_loops: Optional[bool] = None,
normalize: bool = True,
bias: bool = True,
**kwargs,
):
kwargs.setdefault('aggr', 'add')
super().__init__(**kwargs)
if add_self_loops is None:
add_self_loops = normalize
if add_self_loops and not normalize:
raise ValueError(f"'{self.__class__.__name__}' does not support "
f"adding self-loops to the graph when no "
f"on-the-fly normalization is applied")
self.in_channels = in_channels
self.out_channels = out_channels
self.improved = improved
self.cached = cached
self.add_self_loops = add_self_loops
self.normalize = normalize
self._cached_edge_index = None
self._cached_adj_t = None
self.lin = Linear(in_channels, out_channels, bias=False,
weight_initializer='glorot')
if bias:
self.bias = Parameter(torch.empty(out_channels))
else:
self.register_parameter('bias', None)
self.reset_parameters()
def reset_parameters(self):
super().reset_parameters()
self.lin.reset_parameters()
zeros(self.bias)
self._cached_edge_index = None
self._cached_adj_t = None
def forward(self, x: Tensor, edge_index: Adj,
edge_weight: OptTensor = None) -> Tensor:
if isinstance(x, (tuple, list)):
raise ValueError(f"'{self.__class__.__name__}' received a tuple "
f"of node features as input while this layer "
f"does not support bipartite message passing. "
f"Please try other layers such as 'SAGEConv' or "
f"'GraphConv' instead")
if self.normalize:
if isinstance(edge_index, Tensor):
cache = self._cached_edge_index
if cache is None:
edge_index, edge_weight = gcn_norm( # yapf: disable
edge_index, edge_weight, x.size(self.node_dim),
self.improved, self.add_self_loops, self.flow, x.dtype)
if self.cached:
self._cached_edge_index = (edge_index, edge_weight)
else:
edge_index, edge_weight = cache[0], cache[1]
elif isinstance(edge_index, SparseTensor):
cache = self._cached_adj_t
if cache is None:
edge_index = gcn_norm( # yapf: disable
edge_index, edge_weight, x.size(self.node_dim),
self.improved, self.add_self_loops, self.flow, x.dtype)
if self.cached:
self._cached_adj_t = edge_index
else:
edge_index = cache
x = self.lin(x)
# propagate_type: (x: Tensor, edge_weight: OptTensor)
out = self.propagate(edge_index, x=x, edge_weight=edge_weight)
if self.bias is not None:
out = out + self.bias
return out
def message(self, x_j: Tensor, edge_weight: OptTensor) -> Tensor:
return x_j if edge_weight is None else edge_weight.view(-1, 1) * x_j
def message_and_aggregate(self, adj_t: Adj, x: Tensor) -> Tensor:
return spmm(adj_t, x, reduce=self.aggr)
2.1
The GCNConv class implements the graph convolutional operator described in the paper "Semi-supervised Classification with Graph Convolutional Networks" by Kipf and Welling. This operator is designed to perform convolution operations on graph-structured data.
Attributes and Their Roles
- `in_channels`:
  - Role: Size of each input feature vector.
  - Purpose: Determines the dimensionality of the input node features.
- `out_channels`:
  - Role: Size of each output feature vector.
  - Purpose: Determines the dimensionality of the output node features after the convolution operation.
- `improved`:
  - Role: Indicates whether to use an improved version of the adjacency matrix.
  - Purpose: If `True`, the adjacency matrix is modified to include double self-loops (A + 2I), which can improve performance in certain scenarios.
- `cached`:
  - Role: Indicates whether to cache the normalized adjacency matrix.
  - Purpose: Caches the normalization of the adjacency matrix for efficiency, particularly in transductive learning scenarios.
- `add_self_loops`:
  - Role: Indicates whether to add self-loops to the graph.
  - Purpose: Ensures that each node's own features are included in the convolution operation. Self-loops are added if normalization is enabled.
- `normalize`:
  - Role: Indicates whether to normalize the adjacency matrix.
  - Purpose: Normalizes the adjacency matrix using symmetric normalization, which is crucial for the GCN operator to perform correctly.
- `bias`:
  - Role: Indicates whether to include a learnable bias in the layer.
  - Purpose: Adds a bias term to the output of the linear transformation.
- `lin`:
  - Role: Linear transformation applied to the input node features.
  - Purpose: Transforms the input node features to the desired output dimensionality.
- `_cached_edge_index` and `_cached_adj_t`:
  - Role: Cache the normalized adjacency matrix and its corresponding edge index.
  - Purpose: Avoid recomputing the normalization in subsequent forward passes, improving efficiency.
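As a quick illustration of how these arguments are passed (the channel sizes below are arbitrary examples):

from torch_geometric.nn import GCNConv

# 16-dimensional inputs mapped to 32-dimensional outputs. Caching the
# normalized adjacency matrix is only safe in transductive settings where
# the graph does not change between forward passes.
conv = GCNConv(in_channels=16, out_channels=32, improved=False,
               cached=True, normalize=True, bias=True)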
Operation Mechanism of forward
The forward method processes the input graph data and applies the GCN convolution operation. Here's a detailed explanation of each step:
- Check for Tuple Input:
  - If the input `x` is a tuple or list, an error is raised because this layer does not support bipartite message passing.
- Normalization:
  - If `normalize` is `True`, the adjacency matrix (represented by `edge_index`) and the edge weights are normalized.
  - If `edge_index` is a tensor, the layer checks the cache. If nothing is cached, it computes the normalized adjacency matrix using `gcn_norm` and caches it if `cached` is `True`.
  - If `edge_index` is a `SparseTensor`, a similar caching mechanism is applied (a standalone `gcn_norm` sketch follows this list).
- Linear Transformation:
  - Applies the linear transformation to the input features `x` using `self.lin(x)`.
- Message Passing:
  - Calls `self.propagate` to perform message passing. This function aggregates messages from neighboring nodes according to the normalized adjacency matrix and edge weights.
  - The `message` method computes the messages to be passed to each node. If edge weights are provided, they are used to scale the messages.
- Bias Addition:
  - If a bias term is included (`self.bias` is not `None`), it is added to the output features.
- Return Output:
  - Returns the final output node features after the convolution operation.
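For reference, the normalization step can also be invoked on its own. A minimal sketch, assuming `gcn_norm` is importable from `torch_geometric.nn.conv.gcn_conv` (where it is defined in recent PyG versions):

import torch
from torch_geometric.nn.conv.gcn_conv import gcn_norm

edge_index = torch.tensor([[0, 1, 1, 2],
                           [1, 0, 2, 1]])   # undirected chain 0-1-2
edge_index, edge_weight = gcn_norm(edge_index, num_nodes=3,
                                   add_self_loops=True)
print(edge_index)   # original edges plus appended self-loops
print(edge_weight)  # coefficients e_{j,i} / sqrt(d_j * d_i)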
Example Walkthrough of forward Method
Here's a step-by-step walkthrough with a hypothetical input:
- Inputs:
  - `x`: tensor of shape `(num_nodes, in_channels)`, representing node features.
  - `edge_index`: tensor of shape `(2, num_edges)`, representing the graph's adjacency list.
  - `edge_weight`: optional tensor of shape `(num_edges,)`, representing edge weights.
- Normalization:
  - If normalization is enabled and not cached, `gcn_norm` computes the normalized adjacency matrix and edge weights.
  - For example, `gcn_norm` converts the adjacency matrix `A` to `D^{-1/2} (A + I) D^{-1/2}`, where `D` is the degree matrix of `A + I`.
- Linear Transformation:
  - Applies a linear transformation to `x`, resulting in a tensor of shape `(num_nodes, out_channels)`.
- Message Passing:
  - Calls `self.propagate` with the normalized adjacency matrix and transformed features.
  - The `message` method computes the weighted sum of neighboring node features for each node.
- Bias Addition:
  - Adds the bias term (if present) to the output features.
- Output:
  - Returns the updated node features, which now incorporate information from neighboring nodes (a runnable sketch follows below).
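Here is a small runnable sketch of this flow (toy graph, random features; the dense reference computation assumes the default settings and that PyG's `Linear` stores its weight with shape `(out_channels, in_channels)`):

import torch
from torch_geometric.nn import GCNConv
from torch_geometric.utils import to_dense_adj

# Toy undirected graph with 4 nodes and random 8-dimensional features.
x = torch.randn(4, 8)
edge_index = torch.tensor([[0, 1, 1, 2, 2, 3],
                           [1, 0, 2, 1, 3, 2]])

conv = GCNConv(in_channels=8, out_channels=16)
out = conv(x, edge_index)                      # shape (4, 16)

# Dense reference: D^{-1/2} (A + I) D^{-1/2} X W^T + b
A_hat = to_dense_adj(edge_index, max_num_nodes=4)[0] + torch.eye(4)
deg_inv_sqrt = A_hat.sum(dim=1).pow(-0.5)
norm_adj = deg_inv_sqrt.view(-1, 1) * A_hat * deg_inv_sqrt.view(1, -1)
ref = norm_adj @ x @ conv.lin.weight.t() + conv.bias

print(torch.allclose(out, ref, atol=1e-5))     # expected: True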
By following these steps, the GCNConv class effectively performs a graph convolution operation, updating each node's features based on its neighbors' features in a normalized manner.