Abstract:
Paper: Lite Transformer with Long-Short Range Attention by Wu, Liu et al. [code on GitHub]. Key feature of LSRA: two groups of heads, where one group specializes in local context modeling (via convolution), while the other specializes in long-distance relation modeling (via attention). Read more
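To make the two-branch idea concrete, here is a minimal sketch of an LSRA-style block, assuming PyTorch. The `LSRABlock` name is hypothetical, and a plain depthwise convolution stands in for the lightweight/dynamic convolutions used in the paper; only the overall structure (split channels, conv branch for local context, attention branch for long-range relations, concatenate) follows the description above.

```python
import torch
import torch.nn as nn

class LSRABlock(nn.Module):
    """Sketch of Long-Short Range Attention: input channels are split in
    half; one branch models local context with a depthwise convolution,
    the other models long-range dependencies with self-attention, and the
    two outputs are concatenated back to the full width."""

    def __init__(self, d_model: int, n_heads: int = 4, kernel_size: int = 3):
        super().__init__()
        assert d_model % 2 == 0, "d_model must split evenly into two branches"
        self.d_half = d_model // 2
        # Local branch: depthwise conv over the sequence dimension
        # (a stand-in for the paper's lightweight/dynamic convolutions).
        self.conv = nn.Conv1d(
            self.d_half, self.d_half, kernel_size,
            padding=kernel_size // 2, groups=self.d_half,
        )
        # Global branch: ordinary multi-head self-attention.
        self.attn = nn.MultiheadAttention(self.d_half, n_heads, batch_first=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, d_model)
        local, glob = x.split(self.d_half, dim=-1)
        # Conv1d expects (batch, channels, seq_len), so transpose around it.
        local = self.conv(local.transpose(1, 2)).transpose(1, 2)
        glob, _ = self.attn(glob, glob, glob, need_weights=False)
        return torch.cat([local, glob], dim=-1)


# Usage: a toy batch of 2 sequences, length 16, model width 64.
x = torch.randn(2, 16, 64)
print(LSRABlock(64)(x).shape)  # torch.Size([2, 16, 64])
```

Splitting the channels (rather than running both branches at full width) is what keeps the block lightweight: each branch only handles half the model dimension.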
Abstract:
Continuously updated, 2020-05-28. Transformer: Transformer explained. Long-Short Range Attention: Paper reading | Lite Transformer with Long-Short Range Attention. Reference: ICLR 2020 trend analysis: better & Read more