AFM Model (Attentional Factorization Machine)
Attentional Factorization Machines: Learning the Weight of Feature Interactions via Attention Networks
pair-wise interaction layer (it expands m vectors into m(m − 1)/2 interacted vectors):
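Following the paper's notation, where $\mathcal{E} = \{v_i x_i\}_{i \in \mathcal{X}}$ is the set of embedding vectors of the non-zero features and $\odot$ denotes the element-wise product, the layer can be written as:

$$
f_{PI}(\mathcal{E}) = \{ (v_i \odot v_j)\, x_i x_j \}_{(i,j) \in \mathcal{R}_x}, \qquad \mathcal{R}_x = \{ (i,j) \mid i \in \mathcal{X},\, j \in \mathcal{X},\, j > i \}
$$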
The attention network is defined as:
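As given in the paper, with $W \in \mathbb{R}^{t \times k}$, $b \in \mathbb{R}^{t}$, $h \in \mathbb{R}^{t}$, and attention factor size $t$:

$$
a'_{ij} = h^{T}\, \mathrm{ReLU}\big( W (v_i \odot v_j)\, x_i x_j + b \big), \qquad a_{ij} = \frac{\exp(a'_{ij})}{\sum_{(i,j) \in \mathcal{R}_x} \exp(a'_{ij})}
$$

The normalized weight $a_{ij}$ then scales each interacted vector in the attention-based pooling step.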
We point out that in these methods (e.g., WDL and DCN), feature interactions are captured implicitly by a deep neural network, unlike FM, which explicitly models each interaction as the inner product of two features. As such, these deep methods are not interpretable, since the contribution of each feature interaction is unknown. By directly extending FM with an attention mechanism that learns the importance of each feature interaction, our AFM is more interpretable and empirically demonstrates superior performance over Wide&Deep and DeepCross.
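To make the two components above concrete, here is a minimal PyTorch sketch of the pair-wise interaction layer plus the attention network, assuming one-hot fields (so x_i = x_j = 1) and omitting the global bias, the linear FM part, and the final projection p^T; the class and parameter names are illustrative, not from the paper's code.

```python
# Minimal sketch of the AFM interaction + attention core (illustrative names).
import torch
import torch.nn as nn
import torch.nn.functional as F

class AFMInteraction(nn.Module):
    """Pair-wise interaction layer followed by an attention network.

    Input:  embeddings of the m non-zero features, shape (batch, m, k).
    Output: attention-pooled interaction vector (batch, k) and the
            attention weights, which expose each interaction's importance.
    """
    def __init__(self, embed_dim: int, attn_dim: int):
        super().__init__()
        self.attn = nn.Linear(embed_dim, attn_dim)      # W, b
        self.proj = nn.Linear(attn_dim, 1, bias=False)  # h

    def forward(self, emb: torch.Tensor):
        m = emb.size(1)
        # Indices of all m(m-1)/2 feature pairs with i < j.
        i, j = torch.triu_indices(m, m, offset=1)
        # Pair-wise interaction layer: element-wise products v_i ⊙ v_j.
        inter = emb[:, i, :] * emb[:, j, :]             # (batch, pairs, k)
        # Attention network: a'_ij = h^T ReLU(W(v_i ⊙ v_j) + b),
        # normalized with softmax over all pairs.
        scores = self.proj(F.relu(self.attn(inter)))    # (batch, pairs, 1)
        weights = torch.softmax(scores, dim=1)
        # Attention-weighted sum of the interacted vectors.
        pooled = (weights * inter).sum(dim=1)           # (batch, k)
        return pooled, weights.squeeze(-1)

# Usage: 4 fields with 8-dim embeddings -> 6 interacted vectors.
emb = torch.randn(2, 4, 8)
pooled, weights = AFMInteraction(embed_dim=8, attn_dim=4)(emb)
print(pooled.shape, weights.shape)  # torch.Size([2, 8]) torch.Size([2, 6])
```

Inspecting `weights` per example is exactly what makes AFM interpretable relative to WDL and DCN: each entry is the learned importance of one feature pair.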
RQ1 How do the key hyper-parameters of AFM (i.e., dropout on feature interactions and regularization on the attention network) impact its performance?
RQ2 Can the attention network effectively learn the importance of feature interactions?
RQ3 How does AFM perform as compared to the state-of-the-art methods for sparse data prediction?