2. The Role of GraphConv

The multiplication of the adjacency matrix \(\textbf{A}\) with the feature matrix \(\textbf{X}\) in the GraphConv layer is a crucial operation in Graph Convolutional Networks (GCNs). This operation performs a localized, weighted aggregation of node features from each node's neighbors.

It lets each node update its features based on the features of its neighbors, which propagates information through the graph and captures the graph's local structure. Combined with the weight transformation and optional normalization, this enables the network to learn meaningful representations of nodes and their relationships. Here's a detailed explanation of why this is done and what it accomplishes:

Purpose of Adjacency Matrix Multiplication

  1. Neighbor Aggregation:

    • In a graph, the features of a node should be influenced by the features of its neighboring nodes. The adjacency matrix \(\textbf{A}\) encodes the connections between nodes, where \(\textbf{A}_{ij}\) is non-zero if there is an edge between node \(i\) and node \(j\).
    • When we multiply \(\textbf{A}\) with \(\textbf{X}\), each node's feature vector is updated to be a weighted sum of the feature vectors of its neighbors.
  2. Information Propagation:

    • This operation allows information to propagate through the graph, enabling each node to gather information from its local neighborhood.
    • This is essential for capturing the local structure and feature distribution within the graph (a small numeric sketch of this aggregation follows this list).
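
As a concrete illustration, here is a minimal sketch on a toy 3-node graph (the graph and feature values are assumed for this example, not taken from the post); multiplying the adjacency matrix by the feature matrix sums each node's neighbors' features:

import torch

# Toy path graph: node 0 -- node 1 -- node 2
A = torch.tensor([[0., 1., 0.],
                  [1., 0., 1.],
                  [0., 1., 0.]])          # adjacency matrix
X = torch.tensor([[1., 0.],
                  [0., 1.],
                  [2., 2.]])              # one 2-dimensional feature vector per node

Y = A @ X                                 # row i = sum of node i's neighbors' features
print(Y)
# tensor([[0., 1.],   node 0 <- features of node 1
#         [3., 2.],   node 1 <- features of node 0 + node 2
#         [0., 1.]])  node 2 <- features of node 1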

Mathematical Interpretation

Let's break down the operations of the GraphConv layer:

  1. Matrix Multiplication:

    • The first operation is \(\textbf{Y} = \textbf{A} \cdot \textbf{X}\), where \(\textbf{Y}\) is the intermediate result, \(\textbf{A}\) is the adjacency matrix, and \(\textbf{X}\) is the input feature matrix.
    • For node \(i\), the feature vector \(\textbf{Y}_i\) is computed as: $$\textbf{Y}_i = \sum_{j \in \mathcal{N}(i)} \textbf{A}_{ij} \textbf{X}_j$$ where \(\mathcal{N}(i)\) denotes the neighbors of node \(i\), including itself (if self-loops are added).
  2. Self-Loop Addition:

    • If add_self is True, \(\textbf{X}\) is added to \(\textbf{Y}\). This ensures that the node's own features are also included in the aggregation: $$\textbf{Y} = \textbf{A} \cdot \textbf{X} + \textbf{X}$$
  3. Weight Transformation:

    • The intermediate result \(\textbf{Y}\) is then transformed by a weight matrix \(\textbf{W}\): $$\textbf{Z} = \textbf{Y} \cdot \textbf{W}$$
    • This operation applies a linear transformation to the aggregated features, which is essential for learning the appropriate feature representation.
  4. Bias Addition:

    • If a bias term is included, it is added to \(\textbf{Z}\): $$\textbf{Z} = \textbf{Z} + \textbf{b}$$
  5. Normalization:

    • If normalize_embedding is True, the features are normalized: $$\textbf{Z} = \frac{\textbf{Z}}{\|\textbf{Z}\|_2}$$
    • This ensures that the feature vectors have unit length, which can be useful in certain applications. A short numeric continuation of these steps is sketched after this list.
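
Continuing the toy graph from the earlier sketch, the remaining steps (self-loop addition, weight transformation, and optional normalization) can be written as follows; the weight matrix \(\textbf{W}\) here is an arbitrary example value, not a learned parameter:

import torch
import torch.nn.functional as F

A = torch.tensor([[0., 1., 0.], [1., 0., 1.], [0., 1., 0.]])   # same toy adjacency as above
X = torch.tensor([[1., 0.], [0., 1.], [2., 2.]])               # same toy features as above
W = torch.tensor([[1., 0., 1.], [0., 1., 1.]])                 # example weight matrix (2 -> 3 dims)

Y = A @ X + X                        # aggregation plus self-loop (add_self=True)
Z = Y @ W                            # weight transformation
Z = F.normalize(Z, p=2, dim=1)       # row-wise L2 normalization (normalize_embedding=True)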

Example Code Walkthrough

Here is a simplified walkthrough of the GraphConv class:

import torch
import torch.nn as nn
import torch.nn.functional as F


class GraphConv(nn.Module):
    def __init__(self, input_dim, output_dim, add_self=False, normalize_embedding=False,
                 dropout=0.0, bias=True):
        super(GraphConv, self).__init__()
        self.add_self = add_self
        self.dropout = dropout
        if dropout > 0.001:
            self.dropout_layer = nn.Dropout(p=dropout)
        self.normalize_embedding = normalize_embedding
        self.input_dim = input_dim
        self.output_dim = output_dim

        device = 'cuda' if torch.cuda.is_available() else 'cpu'
        # Create the parameters directly on the target device and initialize them
        self.weight = nn.Parameter(torch.FloatTensor(input_dim, output_dim).to(device))
        nn.init.xavier_uniform_(self.weight)

        if bias:
            self.bias = nn.Parameter(torch.FloatTensor(output_dim).to(device))
            nn.init.zeros_(self.bias)
        else:
            self.bias = None

    def forward(self, x, adj):
        if self.dropout > 0.001:
            x = self.dropout_layer(x)

        # Matrix multiplication with adjacency matrix: neighbor aggregation
        y = torch.matmul(adj, x)

        # Optionally add self-loop so each node keeps its own features
        if self.add_self:
            y += x

        # Linear transformation with the learned weight matrix
        y = torch.matmul(y, self.weight)

        # Add bias if present
        if self.bias is not None:
            y = y + self.bias

        # L2-normalize each node embedding if required
        # (expects batched input of shape (batch, num_nodes, output_dim))
        if self.normalize_embedding:
            y = F.normalize(y, p=2, dim=2)

        return y
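
A quick, hypothetical smoke test of the layer above (the sizes and random inputs are made up for illustration; note that the normalization step assumes batched input of shape (batch, num_nodes, features)):

device = 'cuda' if torch.cuda.is_available() else 'cpu'

batch_size, num_nodes, in_dim, out_dim = 2, 4, 8, 16
x = torch.randn(batch_size, num_nodes, in_dim, device=device)                          # node features
adj = torch.randint(0, 2, (batch_size, num_nodes, num_nodes), device=device).float()   # toy adjacency

layer = GraphConv(in_dim, out_dim, add_self=True, normalize_embedding=True)
out = layer(x, adj)
print(out.shape)   # torch.Size([2, 4, 16])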

2. GCNConv

2.0 code
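
The listing below is excerpted from PyTorch Geometric's torch_geometric/nn/conv/gcn_conv.py; the helpers it relies on (MessagePassing, gcn_norm, Linear, zeros, spmm, and the typing aliases Adj, OptTensor, OptPairTensor, SparseTensor) are imported from the torch_geometric package and omitted here.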

class GCNConv(MessagePassing):
    r"""The graph convolutional operator from the `"Semi-supervised
    Classification with Graph Convolutional Networks"
    <https://arxiv.org/abs/1609.02907>`_ paper.

    .. math::
        \mathbf{X}^{\prime} = \mathbf{\hat{D}}^{-1/2} \mathbf{\hat{A}}
        \mathbf{\hat{D}}^{-1/2} \mathbf{X} \mathbf{\Theta},

    where :math:`\mathbf{\hat{A}} = \mathbf{A} + \mathbf{I}` denotes the
    adjacency matrix with inserted self-loops and
    :math:`\hat{D}_{ii} = \sum_{j=0} \hat{A}_{ij}` its diagonal degree matrix.
    The adjacency matrix can include other values than :obj:`1` representing
    edge weights via the optional :obj:`edge_weight` tensor.

    Its node-wise formulation is given by:

    .. math::
        \mathbf{x}^{\prime}_i = \mathbf{\Theta}^{\top} \sum_{j \in
        \mathcal{N}(i) \cup \{ i \}} \frac{e_{j,i}}{\sqrt{\hat{d}_j
        \hat{d}_i}} \mathbf{x}_j

    with :math:`\hat{d}_i = 1 + \sum_{j \in \mathcal{N}(i)} e_{j,i}`, where
    :math:`e_{j,i}` denotes the edge weight from source node :obj:`j` to target
    node :obj:`i` (default: :obj:`1.0`)

    Args:
        in_channels (int): Size of each input sample, or :obj:`-1` to derive
            the size from the first input(s) to the forward method.
        out_channels (int): Size of each output sample.
        improved (bool, optional): If set to :obj:`True`, the layer computes
            :math:`\mathbf{\hat{A}}` as :math:`\mathbf{A} + 2\mathbf{I}`.
            (default: :obj:`False`)
        cached (bool, optional): If set to :obj:`True`, the layer will cache
            the computation of :math:`\mathbf{\hat{D}}^{-1/2} \mathbf{\hat{A}}
            \mathbf{\hat{D}}^{-1/2}` on first execution, and will use the
            cached version for further executions.
            This parameter should only be set to :obj:`True` in transductive
            learning scenarios. (default: :obj:`False`)
        add_self_loops (bool, optional): If set to :obj:`False`, will not add
            self-loops to the input graph. By default, self-loops will be added
            in case :obj:`normalize` is set to :obj:`True`, and not added
            otherwise. (default: :obj:`None`)
        normalize (bool, optional): Whether to add self-loops and compute
            symmetric normalization coefficients on-the-fly.
            (default: :obj:`True`)
        bias (bool, optional): If set to :obj:`False`, the layer will not learn
            an additive bias. (default: :obj:`True`)
        **kwargs (optional): Additional arguments of
            :class:`torch_geometric.nn.conv.MessagePassing`.

    Shapes:
        - **input:**
          node features :math:`(|\mathcal{V}|, F_{in})`,
          edge indices :math:`(2, |\mathcal{E}|)`
          or sparse matrix :math:`(|\mathcal{V}|, |\mathcal{V}|)`,
          edge weights :math:`(|\mathcal{E}|)` *(optional)*
        - **output:** node features :math:`(|\mathcal{V}|, F_{out})`
    """
    _cached_edge_index: Optional[OptPairTensor]
    _cached_adj_t: Optional[SparseTensor]

    def __init__(
        self,
        in_channels: int,
        out_channels: int,
        improved: bool = False,
        cached: bool = False,
        add_self_loops: Optional[bool] = None,
        normalize: bool = True,
        bias: bool = True,
        **kwargs,
    ):
        kwargs.setdefault('aggr', 'add')
        super().__init__(**kwargs)

        if add_self_loops is None:
            add_self_loops = normalize

        if add_self_loops and not normalize:
            raise ValueError(f"'{self.__class__.__name__}' does not support "
                             f"adding self-loops to the graph when no "
                             f"on-the-fly normalization is applied")

        self.in_channels = in_channels
        self.out_channels = out_channels
        self.improved = improved
        self.cached = cached
        self.add_self_loops = add_self_loops
        self.normalize = normalize

        self._cached_edge_index = None
        self._cached_adj_t = None

        self.lin = Linear(in_channels, out_channels, bias=False,
                          weight_initializer='glorot')

        if bias:
            self.bias = Parameter(torch.empty(out_channels))
        else:
            self.register_parameter('bias', None)

        self.reset_parameters()

    def reset_parameters(self):
        super().reset_parameters()
        self.lin.reset_parameters()
        zeros(self.bias)
        self._cached_edge_index = None
        self._cached_adj_t = None

    def forward(self, x: Tensor, edge_index: Adj,
                edge_weight: OptTensor = None) -> Tensor:

        if isinstance(x, (tuple, list)):
            raise ValueError(f"'{self.__class__.__name__}' received a tuple "
                             f"of node features as input while this layer "
                             f"does not support bipartite message passing. "
                             f"Please try other layers such as 'SAGEConv' or "
                             f"'GraphConv' instead")

        if self.normalize:
            if isinstance(edge_index, Tensor):
                cache = self._cached_edge_index
                if cache is None:
                    edge_index, edge_weight = gcn_norm(  # yapf: disable
                        edge_index, edge_weight, x.size(self.node_dim),
                        self.improved, self.add_self_loops, self.flow, x.dtype)
                    if self.cached:
                        self._cached_edge_index = (edge_index, edge_weight)
                else:
                    edge_index, edge_weight = cache[0], cache[1]

            elif isinstance(edge_index, SparseTensor):
                cache = self._cached_adj_t
                if cache is None:
                    edge_index = gcn_norm(  # yapf: disable
                        edge_index, edge_weight, x.size(self.node_dim),
                        self.improved, self.add_self_loops, self.flow, x.dtype)
                    if self.cached:
                        self._cached_adj_t = edge_index
                else:
                    edge_index = cache

        x = self.lin(x)

        # propagate_type: (x: Tensor, edge_weight: OptTensor)
        out = self.propagate(edge_index, x=x, edge_weight=edge_weight)

        if self.bias is not None:
            out = out + self.bias

        return out

    def message(self, x_j: Tensor, edge_weight: OptTensor) -> Tensor:
        return x_j if edge_weight is None else edge_weight.view(-1, 1) * x_j

    def message_and_aggregate(self, adj_t: Adj, x: Tensor) -> Tensor:
        return spmm(adj_t, x, reduce=self.aggr)

2.1

The GCNConv class implements the graph convolutional operator described in the paper "Semi-supervised Classification with Graph Convolutional Networks" by Kipf and Welling. This operator is designed to perform convolution operations on graph-structured data.

Attributes and Their Roles

  1. in_channels:

    • Role: Size of each input feature vector.
    • Purpose: Determines the dimensionality of the input node features.
  2. out_channels:

    • Role: Size of each output feature vector.
    • Purpose: Determines the dimensionality of the output node features after the convolution operation.
  3. improved:

    • Role: Indicates whether to use an improved version of the adjacency matrix.
    • Purpose: If True, the adjacency matrix is modified to include double self-loops (A + 2I), which can improve performance in certain scenarios.
  4. cached:

    • Role: Indicates whether to cache the normalized adjacency matrix.
    • Purpose: Caches the normalization of the adjacency matrix for efficiency, particularly in transductive learning scenarios.
  5. add_self_loops:

    • Role: Indicates whether to add self-loops to the graph.
    • Purpose: Ensures that each node's own features are included in the convolution operation. Self-loops are added if normalization is enabled.
  6. normalize:

    • Role: Indicates whether to normalize the adjacency matrix.
    • Purpose: Normalizes the adjacency matrix using symmetric normalization, which is crucial for the GCN operator to perform correctly.
  7. bias:

    • Role: Indicates whether to include a learnable bias in the layer.
    • Purpose: Adds a bias term to the output of the linear transformation.
  8. lin:

    • Role: Linear transformation applied to the input node features.
    • Purpose: Transforms the input node features to the desired output dimensionality.
  9. _cached_edge_index and _cached_adj_t:

    • Role: Caches the normalized adjacency matrix and its corresponding edge index.
    • Purpose: Avoids recomputing the normalization in subsequent forward passes, improving efficiency.
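
Putting the attributes above together, a minimal construction of the layer could look like this (the argument values below are illustrative, not taken from the post):

from torch_geometric.nn import GCNConv

conv = GCNConv(
    in_channels=16,       # size of each input node feature vector
    out_channels=32,      # size of each output node feature vector
    improved=False,       # True would use A + 2I instead of A + I
    cached=True,          # cache the normalized adjacency (transductive settings only)
    add_self_loops=True,  # insert self-loops before normalization
    normalize=True,       # apply symmetric normalization on the fly
    bias=True,            # learn an additive bias
)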

Operation Mechanism of forward

The forward method processes the input graph data and applies the GCN convolution operation. Here's a detailed explanation of each step:

  1. Check for Tuple Input:

    • If the input x is a tuple or list, an error is raised because this layer does not support bipartite message passing.
  2. Normalization:

    • If normalize is True, the adjacency matrix (represented by edge_index) and the edge weights are normalized (a rough sketch of what this normalization computes follows this list).
    • If edge_index is a tensor, it checks the cache. If not cached, it computes the normalized adjacency matrix using gcn_norm and caches it if cached is True.
    • If edge_index is a SparseTensor, a similar caching mechanism is applied.
  3. Linear Transformation:

    • Applies the linear transformation to the input features x using self.lin(x).
  4. Message Passing:

    • Calls self.propagate to perform message passing. This function aggregates messages from neighboring nodes according to the normalized adjacency matrix and edge weights.
    • The message method computes the messages to be passed to each node. If edge weights are provided, they are used to scale the messages.
  5. Bias Addition:

    • If a bias term is included (self.bias is not None), it is added to the output features.
  6. Return Output:

    • Returns the final output node features after the convolution operation.
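
The symmetric normalization performed by gcn_norm can be sketched with a small dense matrix (the real implementation works on edge_index/edge_weight rather than a dense adjacency matrix; the 3-node graph below is just an assumed example):

import torch

A = torch.tensor([[0., 1., 0.],
                  [1., 0., 1.],
                  [0., 1., 0.]])              # toy adjacency matrix
A_hat = A + torch.eye(A.size(0))              # add self-loops: A + I
deg = A_hat.sum(dim=1)                        # degrees of A + I
D_inv_sqrt = torch.diag(deg.pow(-0.5))        # D^{-1/2}
A_norm = D_inv_sqrt @ A_hat @ D_inv_sqrt      # D^{-1/2} (A + I) D^{-1/2}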

Example Walkthrough of forward Method

Here's a step-by-step walkthrough with a hypothetical input:

  1. Inputs:

    • x: Tensor of shape (num_nodes, in_channels), representing node features.
    • edge_index: Tensor of shape (2, num_edges), representing the graph's adjacency list.
    • edge_weight: Optional tensor of shape (num_edges,), representing edge weights.
  2. Normalization:

    • If normalization is enabled and not cached, gcn_norm computes the normalized adjacency matrix and edge weights.
    • For example, with self-loops added, gcn_norm converts the adjacency matrix A to D^{-1/2} (A + I) D^{-1/2}, where D is the degree matrix of A + I.
  3. Linear Transformation:

    • Applies a linear transformation to x, resulting in a tensor of shape (num_nodes, out_channels).
  4. Message Passing:

    • Calls self.propagate with the normalized adjacency matrix and transformed features.
    • The message method computes the weighted sum of neighboring node features for each node.
  5. Bias Addition:

    • Adds the bias term (if present) to the output features.
  6. Output:

    • Returns the updated node features, which now incorporate information from neighboring nodes.

By following these steps, the GCNConv class effectively performs a graph convolution operation, updating each node's features based on its neighbors' features in a normalized manner.
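
As a small, hypothetical end-to-end run of the steps above (the graph, feature sizes, and weights are arbitrary examples):

import torch
from torch_geometric.nn import GCNConv

x = torch.randn(4, 16)                            # 4 nodes, 16 input features
edge_index = torch.tensor([[0, 1, 1, 2, 2, 3],    # source nodes
                           [1, 0, 2, 1, 3, 2]])   # target nodes
edge_weight = torch.ones(edge_index.size(1))      # optional edge weights (default 1.0)

conv = GCNConv(in_channels=16, out_channels=32)
out = conv(x, edge_index, edge_weight)            # aggregates normalized neighbor features
print(out.shape)                                  # torch.Size([4, 32])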
