Hypergraph_IM Writing Materials
《Hypergraph Convolutional Recurrent Neural Network》
The most significant element of our model is the hypergraph convolution (HGC) layer applied in each cell. The role of HGC is to use the hypergraph \(G\) to update the node-wise feature matrix \(Q\). In this section, we focus on how the hidden feature vector \(q_j\) of a single node \(v_j\) is updated by one application of the HGC layer, given the neighboring node features and their structural relationships represented in the hypergraph \(G\):
\begin{align}
q^\prime_j = \Phi^\text{HGC}(q_j)
\end{align}
where \(q^\prime_j\) is the updated feature vector. The procedure of node feature update is composed of three steps, as shown in Figure 4; the details of each step are described below.
\textbf{Node to edge aggregation.} The first step of hypergraph convolution is the node-to-edge (v-to-e) aggregation to construct the edge feature vector \(e_i\) by aggregating the feature vectors \(\{q_k|v_k \in \mathcal{H}_i\}\) of the nodes that belong to \(\mathcal{H}_i\):
\begin{align}
e_i = \text{AGGREGATE}_{v2e}(\{q_k \mid v_k \in \mathcal{H}_i\})
\end{align}
where \(q_k\) is the feature vector for node \(v_k\). AGGREGATE denotes the aggregation function, which will be discussed in section \hl{4.2.2.} This aggregation is applied concurrently to all hyperedges \(\mathcal{H}_i\) in \(E\).
\textbf{Edge to node aggregation.} The second step is edge-to-node (e-to-v) aggregation in which each node aggregates the influences from the hyperedges containing the node. For each node \(v_j\), feature vectors of hyperedges containing the node \(v_j\) (i.e., \(\{e_k|v_j \in \mathcal{H}_k\}\)) are aggregated to compute the updated node feature vector \(u_j\) as:
\begin{align}
u_j = \text{AGGREGATE}_{e2v}(\{e_k \mid v_j \in \mathcal{H}_k\})
\end{align}
This single-node update procedure is conducted simultaneously for all nodes in \(V\).
\textbf{Node update.} The node update function receives the aggregated node-wise feature vector \(u_j\) and the original node-wise feature vector \(q_j\) as inputs to compute the updated node-wise feature \(q^\prime_j\). For the update function, we have applied the linear model to the inputs as:
\begin{align}
q^\prime_j = W^{\top}(q_j || u_j) + b
\end{align}
where \(W\) and \(b\) are a learnable weight matrix and bias vector, respectively, and \(||\) denotes vector concatenation.
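The three-step HGC update above can be sketched in NumPy. This is a minimal illustration, not the paper's implementation: the function name is hypothetical and mean pooling is used as a stand-in for both AGGREGATE functions.

```python
import numpy as np

def hgc_update(Q, hyperedges, W, b):
    """One hypergraph-convolution (HGC) update of all node features.

    Q          : (N, d) node feature matrix, rows are q_j
    hyperedges : list of node-index lists, one per hyperedge H_i
    W          : (2d, d') weight matrix, b : (d',) bias
    Mean pooling stands in for AGGREGATE in both stages.
    """
    # Step 1: node-to-edge aggregation, e_i = AGGREGATE({q_k | v_k in H_i})
    E = np.stack([Q[idx].mean(axis=0) for idx in hyperedges])

    # Step 2: edge-to-node aggregation, u_j = AGGREGATE({e_k | v_j in H_k})
    U = np.zeros_like(Q)
    for j in range(Q.shape[0]):
        member = [i for i, idx in enumerate(hyperedges) if j in idx]
        if member:
            U[j] = E[member].mean(axis=0)

    # Step 3: linear node update, q'_j = W^T (q_j || u_j) + b
    return np.concatenate([Q, U], axis=1) @ W + b
```

Both aggregation stages are embarrassingly parallel, matching the paper's note that all hyperedges and all nodes are processed concurrently.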
《HGNN+: General Hypergraph Neural Networks》 或 《Hypergraph Neural Networks》
Given a hypergraph \(\mathcal{G}=(V,\mathcal{E}, \Delta)\) with \(N\) vertices, since the hypergraph Laplacian \(\Delta\) is an \(N \times N\) positive semi-definite matrix, the eigendecomposition \(\Delta=\Phi\Lambda\Phi^\top\) can be employed to obtain the orthonormal eigenvectors \(\Phi=(\phi_1, \dots, \phi_N)\) and a diagonal matrix \(\Lambda = \text{diag}(\lambda_1, \dots, \lambda_N)\) of the corresponding non-negative eigenvalues. The Fourier transform of a signal \(x=(x_1, \dots, x_N)\) on the hypergraph is then defined as \(\hat{x}=\Phi^\top x\), where the eigenvectors are regarded as the Fourier bases and the eigenvalues are interpreted as frequencies. The spectral convolution of signal \(x\) and filter \(g\) can be denoted as
\begin{align}
g \star x = \Phi\left((\Phi^\top g) \odot (\Phi^\top x)\right) = \Phi\, g(\Lambda)\, \Phi^\top x,
\end{align}
where \(\odot\) denotes the element-wise Hadamard product and \(g(\Lambda)=\text{diag}(g(\lambda_1), \dots, g(\lambda_N))\) is a function of the Fourier coefficients. However, the computation cost of the forward and inverse Fourier transform is \(O(N^2)\). To solve this problem, we can follow \hl{[49]} to parametrize \(g(\Lambda)\) with an order-\(K\) polynomial, and we use the truncated Chebyshev expansion as one such polynomial. The Chebyshev polynomials \(T_k(x)\) are computed recursively by \(T_k(x)=2x T_{k-1}(x)-T_{k-2}(x)\), with \(T_0(x)=1\) and \(T_1(x)=x\). Thus, \(g(\Lambda)\) can be parametrized as
\begin{align}
\label{eq:fouriercoefficients}
g \star x \approx \sum^K_{k=0} \theta_k T_k(\tilde{\Delta}) x,
\end{align}
where \(T_k(\tilde{\Delta})\) is the Chebyshev polynomial of order \(k\) evaluated at the scaled Laplacian \(\tilde{\Delta}=\frac{2}{\lambda_{\max}}\Delta - I\). In Eq. \ref{eq:fouriercoefficients}, the expensive computation of the Laplacian eigenvectors is avoided; only matrix powers, additions and multiplications remain, which further reduces the computational complexity. We can additionally set \(K=1\) to limit the order of the convolution operation, since the hypergraph Laplacian can already well represent the high-order correlation between nodes. It is also suggested in \hl{[2]} that \(\lambda_{\max} \approx 2\) because of the scale adaptability of neural networks. Then, the convolution operation can be further simplified to
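The Chebyshev recursion above can be sketched directly on a matrix-vector product; the function name below is illustrative, and the scaled Laplacian is assumed to be given.

```python
import numpy as np

def chebyshev_terms(L_tilde, x, K):
    """Return [T_0(L~)x, ..., T_K(L~)x] using the three-term recursion
    T_k(L~)x = 2 L~ T_{k-1}(L~)x - T_{k-2}(L~)x, T_0(L~)x = x, T_1(L~)x = L~ x.

    L_tilde : (N, N) scaled Laplacian, x : (N,) graph signal.
    The filtered signal is then sum_k theta_k * terms[k]."""
    terms = [x, L_tilde @ x]
    for _ in range(2, K + 1):
        terms.append(2 * L_tilde @ terms[-1] - terms[-2])
    return terms[: K + 1]
```

Each new term costs one sparse matrix-vector product, which is where the claimed savings over the \(O(N^2)\) Fourier transform come from.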
\begin{align}
g \star x \approx \theta_0 x - \theta_1 D^{-1/2}_v H W D^{-1}_e H^\top D^{-1/2}_v x,
\end{align}
where \(\theta_0\) and \(\theta_1\) are the parameters of the filters over all nodes. To avoid overfitting, we further use a single parameter \(\theta\), defined by
\begin{align}
\begin{cases}
\theta_1 = -\frac{1}{2} \theta \\
\theta_0 = \frac{1}{2}\theta D^{-1/2}_v H D^{-1}_e H^\top D^{-1/2}_v,
\end{cases}
\end{align}
Then, the convolution operation can be simplified to the following expression
\begin{align}
g \star x \approx \frac{1}{2}\theta D^{-1/2}_v H(W + I)D^{-1}_e H^\top D^{-1/2}_v x,
\end{align}
where \((W+I)\) can be regarded as the weight of the hyperedges. \(W\) is initialized as an identity matrix, which means equal weights for all hyperedges.
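The simplification is just a substitution of the single-parameter definitions of \(\theta_0\) and \(\theta_1\) into the first-order expansion:
\begin{align*}
g \star x &\approx \theta_0 x - \theta_1 D^{-1/2}_v H W D^{-1}_e H^\top D^{-1/2}_v x \\
&= \tfrac{1}{2}\theta D^{-1/2}_v H D^{-1}_e H^\top D^{-1/2}_v x + \tfrac{1}{2}\theta D^{-1/2}_v H W D^{-1}_e H^\top D^{-1/2}_v x \\
&= \tfrac{1}{2}\theta D^{-1/2}_v H (W + I) D^{-1}_e H^\top D^{-1/2}_v x.
\end{align*}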
When we have a hypergraph signal \(X^{(t)}\) at the \(t\)-th layer, our hyperedge convolution layer HGNNConv can be formulated as
\begin{align}
X^{(t+1)} = \sigma(D^{-1/2}_v HWD^{-1}_e H^\top D^{-1/2}_v X^{(t)} \Theta),
\end{align}
where \(\Theta\) is the parameter to be learned during training. The filter \(\Theta\) is applied over the nodes of the hypergraph to extract features. After convolution, we obtain \(X^{(t+1)}\), which can be used for further learning.
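A dense NumPy sketch of one HGNNConv layer, assuming a binary incidence matrix \(H\), hyperedge weights as a vector, and ReLU as the non-linearity \(\sigma\); the function name and degree computations are illustrative.

```python
import numpy as np

def hgnn_conv(X, H, w_edge, Theta):
    """One HGNNConv layer:
    X^{(t+1)} = sigma(Dv^{-1/2} H W De^{-1} H^T Dv^{-1/2} X^{(t)} Theta).

    X : (N, d) node signal, H : (N, M) incidence matrix,
    w_edge : (M,) hyperedge weights, Theta : (d, d') learnable filter.
    """
    W = np.diag(w_edge)
    Dv_inv_sqrt = np.diag(1.0 / np.sqrt(H @ w_edge))  # vertex degrees d(v) = sum_e w(e) h(v,e)
    De_inv = np.diag(1.0 / H.sum(axis=0))             # edge degrees delta(e) = sum_v h(v,e)
    A = Dv_inv_sqrt @ H @ W @ De_inv @ H.T @ Dv_inv_sqrt
    return np.maximum(A @ X @ Theta, 0.0)             # sigma = ReLU
```

In practice \(H\) is sparse and the chained products would be computed with sparse matrix routines rather than dense `diag` matrices.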
《Multi-Behavior Hypergraph-Enhanced Transformer for Sequential Recommendation》
In this module, we introduce our hypergraph message passing paradigm with the convolutional layer, \hl{to capture the global multi-behavior} dependencies over time. The hypergraph convolutional layer generally involves two-stage information passing \hl{[2]}, i.e., node-hyperedge and hyperedge-node embedding propagation along with the hypergraph connection matrix \(M\) for refining \hl{item representations}. Particularly, we design our hypergraph convolutional layer as:
\begin{align}
X^{(l+1)} = D^{-1}_v \cdot M \cdot D^{-1}_e \cdot M^\top \cdot X^{(l)}
\end{align}
where \(X^{(l)}\) represents the \hl{item embeddings} encoded from the \(l\)-th layer of hypergraph convolution. Furthermore, \(D_v\) and \(D_e\) are diagonal matrices for normalization based on vertex and edge degrees, respectively. Note that the two-stage message passing by \(M \cdot M^\top\) takes \(O((|\mathcal{E}^p|+|\mathcal{E}^q|)\times J^2)\) calculations, which is quite time-consuming. Inspired by the design in \hl{[3]}, we calculate a matrix \(M^\prime\) by leveraging pre-calculated \(\beta_{j,j^\prime}\) to obtain a close approximation of \(M \cdot M^\top\) and thus speed up inference. The detailed process can be found in the \hl{supplementary material}. We also remove the non-linear projection following \hl{[6]} to simplify the message passing process. \hl{Each item} embedding \(x^{(0)}\) in \(X^{(0)}\) is initialized with the behavior-aware self-gating operation: \(x^{(0)}=(v_j \oplus b_j) \odot \text{sigmoid}((v_j \oplus b_j) \cdot w +r)\).
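The linear two-stage propagation (without the non-linear projection) can be sketched as follows. This is an assumption-laden illustration: the function name is hypothetical, the connection matrix \(M\) is taken as binary, and row/column sums stand in for the vertex/edge degrees.

```python
import numpy as np

def hypergraph_propagate(X, M, n_layers=2):
    """Iterate X^{(l+1)} = Dv^{-1} M De^{-1} M^T X^{(l)}.

    X : (J, d) item embeddings, M : (J, E) hypergraph connection matrix.
    node -> hyperedge: De^{-1} M^T X ; hyperedge -> node: Dv^{-1} M (...).
    """
    Dv_inv = np.diag(1.0 / M.sum(axis=1))  # vertex degrees
    De_inv = np.diag(1.0 / M.sum(axis=0))  # edge degrees
    P = Dv_inv @ M @ De_inv @ M.T          # pre-computed propagation operator
    for _ in range(n_layers):
        X = P @ X
    return X
```

Pre-computing \(P\) once mirrors the paper's point that approximating \(M \cdot M^\top\) avoids repeating the expensive two-stage product at every layer; for binary \(M\), \(P\) is row-stochastic, so constant signals are preserved.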