2. The Role of GraphConv
The multiplication of the adjacency matrix \(\textbf{A}\) with the feature matrix \(\textbf{X}\) in the GraphConv layer is a crucial operation in Graph Convolutional Networks (GCNs). This operation performs a localized, weighted aggregation of node features from each node's neighbors. Here's a detailed explanation of why this is done and what it accomplishes:
The multiplication of the adjacency matrix with the node feature matrix in the GraphConv layer performs the key neighbor-aggregation step of a GCN.
It allows each node to update its features based on the features of its neighbors, effectively propagating information through the graph and capturing its local structure.
Combined with a weight transformation and optional normalization, this operation enables the network to learn meaningful representations of nodes and their relationships.
Purpose of Adjacency Matrix Multiplication
- Neighbor Aggregation:
  - In a graph, the features of a node should be influenced by the features of its neighboring nodes. The adjacency matrix \(\textbf{A}\) encodes the connections between nodes, where \(\textbf{A}_{ij}\) is non-zero if there is an edge between node \(i\) and node \(j\).
  - When we multiply \(\textbf{A}\) with \(\textbf{X}\), each node's feature vector is updated to be a weighted sum of the feature vectors of its neighbors.
- Information Propagation:
  - This operation allows information to propagate through the graph, enabling each node to gather information from its local neighborhood (see the small example after this list).
  - This is essential for capturing the local structure and feature distribution within the graph.
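As a small illustration of what the product \(\textbf{A} \cdot \textbf{X}\) computes, here is a toy sketch (the graph and the feature values below are made up for illustration):

import torch

# 3-node chain 0-1-2, with self-loops already included in A.
A = torch.tensor([[1., 1., 0.],
                  [1., 1., 1.],
                  [0., 1., 1.]])
X = torch.tensor([[1., 0.],
                  [0., 2.],
                  [3., 1.]])

Y = A @ X
print(Y[1])  # node 1 aggregates nodes 0, 1 and 2: tensor([4., 3.])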
Mathematical Interpretation
Let's break down the operations of the GraphConv layer (a combined numeric sketch follows this list):
- Matrix Multiplication:
  - The first operation is \(\textbf{Y} = \textbf{A} \cdot \textbf{X}\), where \(\textbf{Y}\) is the intermediate result, \(\textbf{A}\) is the adjacency matrix, and \(\textbf{X}\) is the input feature matrix.
  - For node \(i\), the feature vector \(\textbf{Y}_i\) is computed as: $$\textbf{Y}_i = \sum_{j \in \mathcal{N}(i)} \textbf{A}_{ij} \textbf{X}_j$$ where \(\mathcal{N}(i)\) denotes the neighbors of node \(i\), including itself (if self-loops are added).
- Self-Loop Addition:
  - If `add_self` is `True`, \(\textbf{X}\) is added to \(\textbf{Y}\). This ensures that the node's own features are also included in the aggregation: $$\textbf{Y} = \textbf{A} \cdot \textbf{X} + \textbf{X}$$
- Weight Transformation:
  - The intermediate result \(\textbf{Y}\) is then transformed by a weight matrix \(\textbf{W}\): $$\textbf{Z} = \textbf{Y} \cdot \textbf{W}$$
  - This applies a linear transformation to the aggregated features, which is essential for learning an appropriate feature representation.
- Bias Addition:
  - If a bias term is included, it is added to \(\textbf{Z}\): $$\textbf{Z} = \textbf{Z} + \textbf{b}$$
- Normalization:
  - If `normalize_embedding` is `True`, the features are L2-normalized: $$\textbf{Z} = \frac{\textbf{Z}}{\|\textbf{Z}\|_2}$$
  - This ensures that the feature vectors have unit length, which can be useful in certain applications.
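Putting these steps together, here is a minimal numeric sketch (toy adjacency, features, and weights; it mirrors the formulas above rather than any particular library call):

import torch
import torch.nn.functional as F

A = torch.tensor([[0., 1., 0.],
                  [1., 0., 1.],
                  [0., 1., 0.]])   # adjacency matrix (no self-loops)
X = torch.tensor([[1., 0.],
                  [0., 2.],
                  [3., 1.]])       # node features, shape (3, 2)
W = torch.randn(2, 4)              # weight matrix, shape (2, 4)
b = torch.zeros(4)                 # bias

Y = A @ X + X                      # aggregation plus self-loop (add_self=True)
Z = Y @ W + b                      # weight transformation and bias
Z = F.normalize(Z, p=2, dim=1)     # optional per-node L2 normalization
print(Z.shape)                     # torch.Size([3, 4])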
Example Code Walkthrough
Here is a simplified walkthrough of the GraphConv class, followed by a short usage sketch:
import torch
import torch.nn as nn
import torch.nn.functional as F

class GraphConv(nn.Module):
    def __init__(self, input_dim, output_dim, add_self=False,
                 normalize_embedding=False, dropout=0.0, bias=True):
        super(GraphConv, self).__init__()
        self.add_self = add_self
        self.dropout = dropout
        if dropout > 0.001:
            self.dropout_layer = nn.Dropout(p=dropout)
        self.normalize_embedding = normalize_embedding
        self.input_dim = input_dim
        self.output_dim = output_dim
        device = 'cuda' if torch.cuda.is_available() else 'cpu'
        # Move the tensor to the device before wrapping it in nn.Parameter,
        # so the parameter stays a properly registered leaf tensor.
        self.weight = nn.Parameter(torch.FloatTensor(input_dim, output_dim).to(device))
        nn.init.xavier_uniform_(self.weight)  # explicit initialization
        if bias:
            self.bias = nn.Parameter(torch.zeros(output_dim).to(device))
        else:
            self.bias = None

    def forward(self, x, adj):
        if self.dropout > 0.001:
            x = self.dropout_layer(x)
        # Neighbor aggregation: multiply the adjacency matrix with the features
        y = torch.matmul(adj, x)
        # Optionally add a self-loop so each node keeps its own features
        if self.add_self:
            y += x
        # Linear transformation of the aggregated features
        y = torch.matmul(y, self.weight)
        # Add bias if present
        if self.bias is not None:
            y = y + self.bias
        # L2-normalize each node embedding if required
        # (dim=2 assumes batched input of shape (batch, num_nodes, features))
        if self.normalize_embedding:
            y = F.normalize(y, p=2, dim=2)
        return y
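A minimal usage sketch of this class (the adjacency matrix and feature values below are made-up toy data; forward expects batched inputs of shape (batch, num_nodes, ...)):

device = 'cuda' if torch.cuda.is_available() else 'cpu'

# One graph with 3 nodes (edges 0-1 and 1-2) and 2-dimensional node features.
adj = torch.tensor([[[0., 1., 0.],
                     [1., 0., 1.],
                     [0., 1., 0.]]], device=device)   # shape (1, 3, 3)
x = torch.randn(1, 3, 2, device=device)               # shape (1, 3, 2)

conv = GraphConv(input_dim=2, output_dim=4, add_self=True,
                 normalize_embedding=True)
out = conv(x, adj)
print(out.shape)  # torch.Size([1, 3, 4])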
2. GCNConv
2.0 code
# Imports roughly as in torch_geometric/nn/conv/gcn_conv.py
# (gcn_norm is defined in the same file):
from typing import Optional

import torch
from torch import Tensor
from torch.nn import Parameter

from torch_geometric.nn.conv import MessagePassing
from torch_geometric.nn.dense.linear import Linear
from torch_geometric.nn.inits import zeros
from torch_geometric.typing import Adj, OptPairTensor, OptTensor, SparseTensor
from torch_geometric.utils import spmm


class GCNConv(MessagePassing):
r"""The graph convolutional operator from the `"Semi-supervised
Classification with Graph Convolutional Networks"
<https://arxiv.org/abs/1609.02907>`_ paper.
.. math::
\mathbf{X}^{\prime} = \mathbf{\hat{D}}^{-1/2} \mathbf{\hat{A}}
\mathbf{\hat{D}}^{-1/2} \mathbf{X} \mathbf{\Theta},
where :math:`\mathbf{\hat{A}} = \mathbf{A} + \mathbf{I}` denotes the
adjacency matrix with inserted self-loops and
:math:`\hat{D}_{ii} = \sum_{j=0} \hat{A}_{ij}` its diagonal degree matrix.
The adjacency matrix can include other values than :obj:`1` representing
edge weights via the optional :obj:`edge_weight` tensor.
Its node-wise formulation is given by:
.. math::
\mathbf{x}^{\prime}_i = \mathbf{\Theta}^{\top} \sum_{j \in
\mathcal{N}(i) \cup \{ i \}} \frac{e_{j,i}}{\sqrt{\hat{d}_j
\hat{d}_i}} \mathbf{x}_j
with :math:`\hat{d}_i = 1 + \sum_{j \in \mathcal{N}(i)} e_{j,i}`, where
:math:`e_{j,i}` denotes the edge weight from source node :obj:`j` to target
node :obj:`i` (default: :obj:`1.0`)
Args:
in_channels (int): Size of each input sample, or :obj:`-1` to derive
the size from the first input(s) to the forward method.
out_channels (int): Size of each output sample.
improved (bool, optional): If set to :obj:`True`, the layer computes
:math:`\mathbf{\hat{A}}` as :math:`\mathbf{A} + 2\mathbf{I}`.
(default: :obj:`False`)
cached (bool, optional): If set to :obj:`True`, the layer will cache
the computation of :math:`\mathbf{\hat{D}}^{-1/2} \mathbf{\hat{A}}
\mathbf{\hat{D}}^{-1/2}` on first execution, and will use the
cached version for further executions.
This parameter should only be set to :obj:`True` in transductive
learning scenarios. (default: :obj:`False`)
add_self_loops (bool, optional): If set to :obj:`False`, will not add
self-loops to the input graph. By default, self-loops will be added
in case :obj:`normalize` is set to :obj:`True`, and not added
otherwise. (default: :obj:`None`)
normalize (bool, optional): Whether to add self-loops and compute
symmetric normalization coefficients on-the-fly.
(default: :obj:`True`)
bias (bool, optional): If set to :obj:`False`, the layer will not learn
an additive bias. (default: :obj:`True`)
**kwargs (optional): Additional arguments of
:class:`torch_geometric.nn.conv.MessagePassing`.
Shapes:
- **input:**
node features :math:`(|\mathcal{V}|, F_{in})`,
edge indices :math:`(2, |\mathcal{E}|)`
or sparse matrix :math:`(|\mathcal{V}|, |\mathcal{V}|)`,
edge weights :math:`(|\mathcal{E}|)` *(optional)*
- **output:** node features :math:`(|\mathcal{V}|, F_{out})`
"""
_cached_edge_index: Optional[OptPairTensor]
_cached_adj_t: Optional[SparseTensor]
def __init__(
self,
in_channels: int,
out_channels: int,
improved: bool = False,
cached: bool = False,
add_self_loops: Optional[bool] = None,
normalize: bool = True,
bias: bool = True,
**kwargs,
):
kwargs.setdefault('aggr', 'add')
super().__init__(**kwargs)
if add_self_loops is None:
add_self_loops = normalize
if add_self_loops and not normalize:
raise ValueError(f"'{self.__class__.__name__}' does not support "
f"adding self-loops to the graph when no "
f"on-the-fly normalization is applied")
self.in_channels = in_channels
self.out_channels = out_channels
self.improved = improved
self.cached = cached
self.add_self_loops = add_self_loops
self.normalize = normalize
self._cached_edge_index = None
self._cached_adj_t = None
self.lin = Linear(in_channels, out_channels, bias=False,
weight_initializer='glorot')
if bias:
self.bias = Parameter(torch.empty(out_channels))
else:
self.register_parameter('bias', None)
self.reset_parameters()
def reset_parameters(self):
super().reset_parameters()
self.lin.reset_parameters()
zeros(self.bias)
self._cached_edge_index = None
self._cached_adj_t = None
def forward(self, x: Tensor, edge_index: Adj,
edge_weight: OptTensor = None) -> Tensor:
if isinstance(x, (tuple, list)):
raise ValueError(f"'{self.__class__.__name__}' received a tuple "
f"of node features as input while this layer "
f"does not support bipartite message passing. "
f"Please try other layers such as 'SAGEConv' or "
f"'GraphConv' instead")
if self.normalize:
if isinstance(edge_index, Tensor):
cache = self._cached_edge_index
if cache is None:
edge_index, edge_weight = gcn_norm( # yapf: disable
edge_index, edge_weight, x.size(self.node_dim),
self.improved, self.add_self_loops, self.flow, x.dtype)
if self.cached:
self._cached_edge_index = (edge_index, edge_weight)
else:
edge_index, edge_weight = cache[0], cache[1]
elif isinstance(edge_index, SparseTensor):
cache = self._cached_adj_t
if cache is None:
edge_index = gcn_norm( # yapf: disable
edge_index, edge_weight, x.size(self.node_dim),
self.improved, self.add_self_loops, self.flow, x.dtype)
if self.cached:
self._cached_adj_t = edge_index
else:
edge_index = cache
x = self.lin(x)
# propagate_type: (x: Tensor, edge_weight: OptTensor)
out = self.propagate(edge_index, x=x, edge_weight=edge_weight)
if self.bias is not None:
out = out + self.bias
return out
def message(self, x_j: Tensor, edge_weight: OptTensor) -> Tensor:
return x_j if edge_weight is None else edge_weight.view(-1, 1) * x_j
def message_and_aggregate(self, adj_t: Adj, x: Tensor) -> Tensor:
return spmm(adj_t, x, reduce=self.aggr)
2.1
The GCNConv class implements the graph convolutional operator described in the paper "Semi-supervised Classification with Graph Convolutional Networks" by Kipf and Welling. This operator is designed to perform convolution operations on graph-structured data.
Attributes and Their Roles
- `in_channels`:
  - Role: Size of each input feature vector.
  - Purpose: Determines the dimensionality of the input node features.
- `out_channels`:
  - Role: Size of each output feature vector.
  - Purpose: Determines the dimensionality of the output node features after the convolution operation.
- `improved`:
  - Role: Indicates whether to use an improved version of the adjacency matrix.
  - Purpose: If `True`, the adjacency matrix is modified to include double self-loops (A + 2I), which can improve performance in certain scenarios.
- `cached`:
  - Role: Indicates whether to cache the normalized adjacency matrix.
  - Purpose: Caches the normalization of the adjacency matrix for efficiency, particularly in transductive learning scenarios.
- `add_self_loops`:
  - Role: Indicates whether to add self-loops to the graph.
  - Purpose: Ensures that each node's own features are included in the convolution operation. Self-loops are added if normalization is enabled.
- `normalize`:
  - Role: Indicates whether to normalize the adjacency matrix.
  - Purpose: Normalizes the adjacency matrix using symmetric normalization, which is crucial for the GCN operator to perform correctly.
- `bias`:
  - Role: Indicates whether to include a learnable bias in the layer.
  - Purpose: Adds a bias term to the output of the linear transformation.
- `lin`:
  - Role: Linear transformation applied to the input node features.
  - Purpose: Transforms the input node features to the desired output dimensionality.
- `_cached_edge_index` and `_cached_adj_t`:
  - Role: Cache the normalized adjacency matrix and its corresponding edge index.
  - Purpose: Avoid recomputing the normalization in subsequent forward passes, improving efficiency.
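As a quick illustration of how these arguments are passed (the channel sizes below are arbitrary examples):

from torch_geometric.nn import GCNConv

# 16-dimensional inputs mapped to 32-dimensional outputs. Caching the
# normalized adjacency matrix is only safe in transductive settings where
# the graph does not change between forward passes.
conv = GCNConv(in_channels=16, out_channels=32, improved=False,
               cached=True, normalize=True, bias=True)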
Operation Mechanism of forward
The forward method processes the input graph data and applies the GCN convolution operation. Here's a detailed explanation of each step:
- Check for Tuple Input:
  - If the input `x` is a tuple or list, an error is raised because this layer does not support bipartite message passing.
- Normalization:
  - If `normalize` is `True`, the adjacency matrix (represented by `edge_index`) and the edge weights are normalized.
  - If `edge_index` is a tensor, the layer checks the cache. If nothing is cached, it computes the normalized adjacency matrix using `gcn_norm` and caches it if `cached` is `True`.
  - If `edge_index` is a `SparseTensor`, a similar caching mechanism is applied (a standalone `gcn_norm` sketch follows this list).
- Linear Transformation:
  - Applies the linear transformation to the input features `x` using `self.lin(x)`.
- Message Passing:
  - Calls `self.propagate` to perform message passing. This function aggregates messages from neighboring nodes according to the normalized adjacency matrix and edge weights.
  - The `message` method computes the messages to be passed to each node. If edge weights are provided, they are used to scale the messages.
- Bias Addition:
  - If a bias term is included (`self.bias` is not `None`), it is added to the output features.
- Return Output:
  - Returns the final output node features after the convolution operation.
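For reference, the normalization step can also be invoked on its own. A minimal sketch, assuming `gcn_norm` is importable from `torch_geometric.nn.conv.gcn_conv` (where it is defined in recent PyG versions):

import torch
from torch_geometric.nn.conv.gcn_conv import gcn_norm

edge_index = torch.tensor([[0, 1, 1, 2],
                           [1, 0, 2, 1]])   # undirected chain 0-1-2
edge_index, edge_weight = gcn_norm(edge_index, num_nodes=3,
                                   add_self_loops=True)
print(edge_index)   # original edges plus appended self-loops
print(edge_weight)  # coefficients e_{j,i} / sqrt(d_j * d_i)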
Example Walkthrough of forward Method
Here's a step-by-step walkthrough with a hypothetical input:
- Inputs:
  - `x`: tensor of shape `(num_nodes, in_channels)`, representing node features.
  - `edge_index`: tensor of shape `(2, num_edges)`, representing the graph's adjacency list.
  - `edge_weight`: optional tensor of shape `(num_edges,)`, representing edge weights.
- Normalization:
  - If normalization is enabled and not cached, `gcn_norm` computes the normalized adjacency matrix and edge weights.
  - For example, `gcn_norm` converts the adjacency matrix `A` to `D^{-1/2} (A + I) D^{-1/2}`, where `D` is the degree matrix of `A + I`.
- Linear Transformation:
  - Applies a linear transformation to `x`, resulting in a tensor of shape `(num_nodes, out_channels)`.
- Message Passing:
  - Calls `self.propagate` with the normalized adjacency matrix and transformed features.
  - The `message` method computes the weighted sum of neighboring node features for each node.
- Bias Addition:
  - Adds the bias term (if present) to the output features.
- Output:
  - Returns the updated node features, which now incorporate information from neighboring nodes (a runnable sketch follows below).
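Here is a small runnable sketch of this flow (toy graph, random features; the dense reference computation assumes the default settings and that PyG's `Linear` stores its weight with shape `(out_channels, in_channels)`):

import torch
from torch_geometric.nn import GCNConv
from torch_geometric.utils import to_dense_adj

# Toy undirected graph with 4 nodes and random 8-dimensional features.
x = torch.randn(4, 8)
edge_index = torch.tensor([[0, 1, 1, 2, 2, 3],
                           [1, 0, 2, 1, 3, 2]])

conv = GCNConv(in_channels=8, out_channels=16)
out = conv(x, edge_index)                      # shape (4, 16)

# Dense reference: D^{-1/2} (A + I) D^{-1/2} X W^T + b
A_hat = to_dense_adj(edge_index, max_num_nodes=4)[0] + torch.eye(4)
deg_inv_sqrt = A_hat.sum(dim=1).pow(-0.5)
norm_adj = deg_inv_sqrt.view(-1, 1) * A_hat * deg_inv_sqrt.view(1, -1)
ref = norm_adj @ x @ conv.lin.weight.t() + conv.bias

print(torch.allclose(out, ref, atol=1e-5))     # expected: True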
By following these steps, the GCNConv class effectively performs a graph convolution operation, updating each node's features based on its neighbors' features in a normalized manner.