Gating filter

Gating Filter: A gate controls the path through which information flows in the network. Gates have proven useful for recurrent neural networks [1].

[1] Hochreiter, Sepp, and Jürgen Schmidhuber. "Long short-term memory." Neural computation 9.8 (1997): 1735-1780.

Residual Gating

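Some of an item's features persist from the period in which the item last appeared into the next period. To keep these features, a residual gate is designed to propagate them forward.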

Initialize the lookup table of static item features \(\mathbf{E}=[\mathbf{e}_i]\).
Compute the retention weight, i.e., how much of the previous information is kept:
\[g=\frac{e^{\mathbf{z}_{R}^{T} \sigma\left(\mathbf{W}_{R} \mathbf{x}_{i}^{t_{<n}, L}\right)}}{e^{\mathbf{z}_{R}^{T} \sigma\left(\mathbf{W}_{R} \mathbf{x}_{i}^{t_{<n}, L}\right)}+e^{\mathbf{z}_{R}^{T} \sigma\left(\mathbf{W}_{R} \mathbf{e}_{i}\right)}} \]
Update rule:
\[\mathbf{x}_{i}^{t_{n}, 0}=g \mathbf{x}_{i}^{t_{<n}, L}+(1-g) \mathbf{e}_{i} \]
\(\mathbf{x}_i^{t_{<n},L}\) is the output of the last HGCN layer (\(t_{<n}\) denotes the period in which the item last appeared). If no such output exists, set \(\mathbf{x}_i^{t_n,0}=\mathbf{e}_i\).
\(\mathbf{z}_R\) is the transformation vector; \(\mathbf{W}_R\) is the transformation matrix; \(\sigma(\cdot)\) is the tanh function.

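Interpretation: both feature vectors are mapped by the same nonlinear transformation (a linear map followed by a nonlinear activation) into a common feature space. \(\mathbf{z}_R\) acts as an indicator vector: whichever transformed feature vector has the larger inner product with \(\mathbf{z}_R\) receives the larger weight, so more of its information is kept.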
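The gate and update rule above can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions, not the implementation from [1]: the function name `residual_gate`, the use of `None` to signal a missing previous output, and the random parameters are all hypothetical. It relies on the fact that a two-way softmax equals a sigmoid of the score difference.

```python
import numpy as np

def residual_gate(x_prev, e, W_R, z_R):
    """Fuse the previous dynamic embedding x_prev with the static embedding e.

    Returns the initial embedding for the current period and the gate value g.
    x_prev is None when the item has no earlier HGCN output (assumed convention).
    """
    if x_prev is None:
        # No earlier output: fall back to the static embedding.
        return e.copy(), 0.0
    s_prev = z_R @ np.tanh(W_R @ x_prev)   # score of the previous embedding
    s_static = z_R @ np.tanh(W_R @ e)      # score of the static embedding
    # exp(s_prev) / (exp(s_prev) + exp(s_static)) rewritten as a sigmoid
    g = 1.0 / (1.0 + np.exp(s_static - s_prev))
    return g * x_prev + (1.0 - g) * e, g

# Hypothetical toy parameters, for illustration only.
rng = np.random.default_rng(0)
d = 4
W_R = rng.normal(size=(d, d))
z_R = rng.normal(size=d)
e_i = rng.normal(size=d)
x_last = rng.normal(size=d)

x_init, g = residual_gate(x_last, e_i, W_R, z_R)
```

When `x_prev` equals `e`, both scores coincide, so `g` is 0.5 and the output is simply `e`.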

[1] Wang, Jianling, et al. "Next-item recommendation with sequential hypergraphs." Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval. 2020.

GCN: Gating Mechanisms

Language model: Gating mechanisms are used to select which words or features are relevant for predicting the next word.

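LSTMs achieve long-term memory through input gates and forget gates. Without these gates, information can easily vanish after many transformation steps. Convolutional networks, in contrast, are far less prone to vanishing gradients, so they do not need forget gates.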

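Interpretation: once the gradient has vanished, the value stored at the corresponding position no longer affects the value of the loss function, whatever that value is. The information at that position therefore has no influence on the prediction, which is why this is described as information loss.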
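A small numerical sketch can make the vanishing effect concrete. It assumes a hypothetical scalar recurrence \(h_{t+1}=\tanh(w h_t)\), which is not taken from [1], and accumulates the derivative of the final state with respect to the initial state by the chain rule. The function name `grad_through_steps` and the values of `w` and `h0` are illustrative choices.

```python
import numpy as np

def grad_through_steps(h0, w, T):
    """Return d h_T / d h_0 for the recurrence h_{t+1} = tanh(w * h_t)."""
    h, grad = h0, 1.0
    for _ in range(T):
        h = np.tanh(w * h)
        # d tanh(w * h_t) / d h_t = w * (1 - tanh(w * h_t)^2)
        grad *= w * (1.0 - h ** 2)
    return grad

grad_short = grad_through_steps(h0=0.5, w=0.9, T=5)
grad_long = grad_through_steps(h0=0.5, w=0.9, T=50)
```

With `w = 0.9`, every factor in the product is at most 0.9, so the derivative after `T` steps is bounded by \(0.9^T\) and shrinks toward zero as the number of steps grows.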

LSTM-style (GTU) [2]: \(\tanh(\mathbf{X}*\mathbf{W}+\mathbf{b}) \otimes \sigma(\mathbf{X}*\mathbf{V}+\mathbf{c})\)

The gradient of GTU:

\[\nabla[\tanh (\mathbf{X}) \otimes \sigma(\mathbf{X})] =\tanh ^{\prime}(\mathbf{X}) \nabla \mathbf{X} \otimes \sigma(\mathbf{X}) +\sigma^{\prime}(\mathbf{X}) \nabla \mathbf{X} \otimes \tanh (\mathbf{X}) \]

GLU: \((\mathbf{X}*\mathbf{W}+\mathbf{b}) \otimes \sigma(\mathbf{X}*\mathbf{V}+\mathbf{c})\)
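The two gated units can be sketched in NumPy as below. This is a rough illustration rather than the code from [1] or [2]. It assumes that `*` denotes a causal 1-D convolution over a sequence, that \(\mathbf{X}\) has shape (sequence length, input channels), and that \(\mathbf{W}\) and \(\mathbf{V}\) have shape (kernel width, input channels, output channels). The helper names `causal_conv1d`, `gtu`, and `glu` are hypothetical.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def causal_conv1d(X, W, b):
    """Causal 1-D convolution: output at step t sees only inputs at steps <= t."""
    k, m, n = W.shape
    N = X.shape[0]
    # Left-pad with k - 1 zero rows so no future position leaks into the output.
    Xp = np.vstack([np.zeros((k - 1, m)), X])
    out = np.empty((N, n))
    for t in range(N):
        out[t] = np.einsum("km,kmn->n", Xp[t:t + k], W) + b
    return out

def gtu(X, W, b, V, c):
    """Gated tanh unit: tanh(X*W + b) elementwise-times sigmoid(X*V + c)."""
    return np.tanh(causal_conv1d(X, W, b)) * sigmoid(causal_conv1d(X, V, c))

def glu(X, W, b, V, c):
    """Gated linear unit: (X*W + b) elementwise-times sigmoid(X*V + c)."""
    return causal_conv1d(X, W, b) * sigmoid(causal_conv1d(X, V, c))

# Hypothetical toy shapes, for illustration only.
rng = np.random.default_rng(0)
N, m, n, k = 6, 3, 5, 2
X = rng.normal(size=(N, m))
W = rng.normal(size=(k, m, n))
V = rng.normal(size=(k, m, n))
b, c = np.zeros(n), np.zeros(n)

H_glu = glu(X, W, b, V, c)
H_gtu = gtu(X, W, b, V, c)
```

The only difference between the two units is the activation on the first branch: GTU squashes it with tanh, while GLU leaves it linear and lets the sigmoid gate alone decide how much passes through.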

The gradient of GLU:

\[\nabla[\mathbf{X} \otimes \sigma(\mathbf{X})]=\nabla \mathbf{X} \otimes \sigma(\mathbf{X})+\mathbf{X} \otimes \sigma^{\prime}(\mathbf{X}) \nabla \mathbf{X} \]
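The two gradient formulas can be checked numerically in the scalar, elementwise case. The sketch below compares the analytic expressions against central finite differences; the function names and the step size `h` are illustrative assumptions.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gtu_grad(x):
    """tanh'(x) * sigmoid(x) + sigmoid'(x) * tanh(x)."""
    s, t = sigmoid(x), np.tanh(x)
    return (1.0 - t ** 2) * s + s * (1.0 - s) * t

def glu_grad(x):
    """sigmoid(x) + x * sigmoid'(x)."""
    s = sigmoid(x)
    return s + x * s * (1.0 - s)

def numeric_grad(f, x, h=1e-5):
    """Central finite-difference approximation of f'(x)."""
    return (f(x + h) - f(x - h)) / (2.0 * h)

def gtu_fn(x):
    return np.tanh(x) * sigmoid(x)

def glu_fn(x):
    return x * sigmoid(x)

x = np.linspace(-4.0, 4.0, 9)
gtu_error = np.max(np.abs(gtu_grad(x) - numeric_grad(gtu_fn, x)))
glu_error = np.max(np.abs(glu_grad(x) - numeric_grad(glu_fn, x)))
```

For large positive inputs, both terms of the GTU gradient are scaled down by a saturating derivative (\(\tanh^{\prime}\) or \(\sigma^{\prime}\)), whereas the GLU gradient keeps the term \(\nabla \mathbf{X} \otimes \sigma(\mathbf{X})\) with no such downscaling. This is the argument given in [1] for preferring the linear path.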

[1] Dauphin, Yann N., et al. "Language modeling with gated convolutional networks." International conference on machine learning. PMLR, 2017.

[2] Oord, Aaron van den, et al. "Conditional image generation with pixelcnn decoders." arXiv preprint arXiv:1606.05328 (2016).

posted @ 2021-06-12 06:52 小肚腩的世界