The material that will give you 90% of what matters today

Technical papers and posts that show you the key under-the-hood technologies in AI - 2024-05-10

1. The Annotated Transformer (Attention is All You Need - https://arxiv.org/pdf/1706.03762)

https://nlp.seas.harvard.edu/annotated-transformer/

The Transformer has been on a lot of people's minds over the last five years. This post presents an annotated version of the paper in the form of a line-by-line implementation. It reorders and deletes some sections from the original paper and adds comments throughout. This document itself is a working notebook, and should be a completely usable implementation. Code is available here (https://github.com/harvardnlp/annotated-transformer/)

 

2. The First Law of Complexodynamics

https://scottaaronson.blog/?p=762

https://scottaaronson.blog/

The blog of Scott Aaronson - "If you take nothing else from this blog: quantum computers won't solve hard problems instantly by just trying all solutions in parallel"

 

3. The Unreasonable Effectiveness of Recurrent Neural Networks

https://karpathy.github.io/2015/05/21/rnn-effectiveness/

"We'll train RNNs to generate text character by character and ponder the question "how is that even possible?"

BTW, together with this post I am also releasing code that allows you to train character-level language models based on multi-layer LSTMs. (https://github.com/karpathy/char-rnn)
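To make the idea concrete, here is a minimal sketch of a character-level language model in PyTorch. It is not Karpathy's char-rnn code (which is written in Lua/Torch); the model, sizes, and toy text are placeholder choices for illustration.

```python
# Minimal character-level language model sketch (PyTorch), illustrating the idea
# from the post; this is not Karpathy's char-rnn code (which is Lua/Torch).
import torch
import torch.nn as nn

class CharLM(nn.Module):
    def __init__(self, vocab_size, embed_dim=64, hidden_dim=128, num_layers=2):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, num_layers, batch_first=True)
        self.head = nn.Linear(hidden_dim, vocab_size)  # next-character logits

    def forward(self, x, state=None):
        h, state = self.lstm(self.embed(x), state)
        return self.head(h), state

# Toy training step: predict each next character from the previous ones.
text = "hello world"
chars = sorted(set(text))
stoi = {c: i for i, c in enumerate(chars)}
ids = torch.tensor([[stoi[c] for c in text]])               # (1, T)
model = CharLM(len(chars))
logits, _ = model(ids[:, :-1])                              # inputs: all but last char
loss = nn.functional.cross_entropy(
    logits.reshape(-1, len(chars)), ids[:, 1:].reshape(-1)) # targets: all but first
loss.backward()
```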

 

4. Understanding LSTM Networks

https://colah.github.io/posts/2015-08-Understanding-LSTMs/
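The post explains the LSTM cell gate by gate: a forget gate decides what to discard from the cell state, an input gate and a candidate decide what to write, and an output gate decides what to expose. A minimal numpy sketch of a single cell step following that standard formulation (an illustration, not code from the post):

```python
# Single LSTM cell step in numpy, following the standard gate equations the post
# explains (forget gate f, input gate i, candidate g, output gate o).
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x, h_prev, c_prev, W, b):
    """x: (input_dim,), h_prev/c_prev: (hidden_dim,), W: (4*hidden, input+hidden)."""
    z = W @ np.concatenate([x, h_prev]) + b
    H = h_prev.shape[0]
    f = sigmoid(z[0:H])          # forget gate: what to keep from c_prev
    i = sigmoid(z[H:2*H])        # input gate: how much new candidate to write
    g = np.tanh(z[2*H:3*H])      # candidate cell values
    o = sigmoid(z[3*H:4*H])      # output gate: what to expose as h
    c = f * c_prev + i * g       # cell state update
    h = o * np.tanh(c)           # hidden state
    return h, c

# Example: one step with random parameters (sizes are arbitrary).
rng = np.random.default_rng(0)
x, h, c = rng.normal(size=8), np.zeros(16), np.zeros(16)
W, b = rng.normal(size=(64, 24)) * 0.1, np.zeros(64)
h, c = lstm_step(x, h, c, W, b)
```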

 

5. Recurrent Neural Network Regularization

https://arxiv.org/pdf/1409.2329.pdf

Presents a simple regularization technique for Recurrent Neural Networks (RNNs) with Long Short-Term Memory (LSTM) units.
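The core idea is to apply dropout only to the non-recurrent (layer-to-layer) connections of a stacked LSTM, so the recurrent hidden-to-hidden path is never corrupted. A rough PyTorch sketch of that idea (the module and hyperparameters are illustrative, not the paper's code):

```python
# Sketch of the paper's idea: dropout on non-recurrent (layer-to-layer) inputs only;
# the recurrent hidden/cell states are passed through untouched.
import torch
import torch.nn as nn

class DropoutLSTMStack(nn.Module):
    def __init__(self, input_dim, hidden_dim, num_layers=2, p=0.5):
        super().__init__()
        self.cells = nn.ModuleList(
            [nn.LSTMCell(input_dim if i == 0 else hidden_dim, hidden_dim)
             for i in range(num_layers)])
        self.drop = nn.Dropout(p)

    def forward(self, x):                      # x: (T, B, input_dim)
        T, B, _ = x.shape
        H = self.cells[0].hidden_size
        states = [(x.new_zeros(B, H), x.new_zeros(B, H)) for _ in self.cells]
        outputs = []
        for t in range(T):
            inp = x[t]
            for i, cell in enumerate(self.cells):
                inp = self.drop(inp)           # dropout on the non-recurrent input
                h, c = cell(inp, states[i])    # recurrent h, c are NOT dropped
                states[i] = (h, c)
                inp = h
            outputs.append(inp)
        return torch.stack(outputs)

out = DropoutLSTMStack(32, 64)(torch.randn(10, 4, 32))   # (10, 4, 64)
```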

 

6. Keeping Neural Networks Simple by Minimizing the Description Length of the Weights

https://www.cs.toronto.edu/~hinton/absps/colt93.pdf

Supervised neural networks generalize well if there is much less information in the weights than there is in the output vectors of the training cases.

 

7. Pointer Networks

https://arxiv.org/pdf/1506.03134.pdf

Introduces a new neural architecture to learn the conditional probability of an output sequence whose elements are discrete tokens corresponding to positions in an input sequence.
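The "pointer" is content-based attention whose softmax over input positions is used directly as the output distribution, so the output vocabulary grows with the input length. A minimal numpy sketch of that mechanism (weights and sizes are arbitrary placeholders):

```python
# Minimal numpy sketch of the pointer mechanism: the "output vocabulary" is the
# set of input positions, scored with additive attention and normalized by softmax.
import numpy as np

def pointer_distribution(decoder_state, encoder_states, W1, W2, v):
    """Returns a probability distribution over input positions (the 'pointer')."""
    # u_j = v^T tanh(W1 e_j + W2 d)  for each encoder state e_j
    scores = np.array([v @ np.tanh(W1 @ e + W2 @ decoder_state)
                       for e in encoder_states])
    exp = np.exp(scores - scores.max())
    return exp / exp.sum()             # softmax over input positions

rng = np.random.default_rng(0)
d_enc, d_dec, d_att, n = 16, 16, 32, 5
enc = rng.normal(size=(n, d_enc))      # n input positions
dec = rng.normal(size=d_dec)
W1 = rng.normal(size=(d_att, d_enc))
W2 = rng.normal(size=(d_att, d_dec))
v = rng.normal(size=d_att)
p = pointer_distribution(dec, enc, W1, W2, v)   # p[j] = prob. of pointing at input j
```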

 

8. ImageNet Classification with Deep Convolutional Neural Networks

https://proceedings.neurips.cc/paper_files/paper/2012/file/c399862d3b9d6b76c8436e924a68c45b-Paper.pdf

AlexNet

code: https://github.com/ulrichstern/cuda-convnet

 

9. Order Matters: Sequence to Sequence for Sets

https://arxiv.org/pdf/1511.06391

The order in which we organize input and/or output data matters significantly when learning an underlying model.

 

10. GPipe: Easy Scaling with Micro-Batch Pipeline Parallelism

https://arxiv.org/pdf/1811.06965

Introduces GPipe, a pipeline parallelism library that allows scaling any network that can be expressed as a sequence of layers.
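Two ingredients do the work: the network is partitioned into stages along its layer sequence, and each mini-batch is split into micro-batches that are pipelined through the stages, with gradients accumulated before the optimizer step. The sketch below only illustrates the micro-batch accumulation on a single device; it is not GPipe itself and does nothing in parallel:

```python
# Conceptual sketch of the micro-batch idea (no real devices or parallel execution):
# a mini-batch is split into micro-batches that flow through a sequence of stages,
# and gradients are accumulated across micro-batches before one optimizer step.
import torch
import torch.nn as nn

stages = nn.ModuleList([nn.Sequential(nn.Linear(32, 64), nn.ReLU()),
                        nn.Sequential(nn.Linear(64, 64), nn.ReLU()),
                        nn.Sequential(nn.Linear(64, 10))])   # "pipeline" of layer groups
opt = torch.optim.SGD([p for s in stages for p in s.parameters()], lr=0.1)

x, y = torch.randn(16, 32), torch.randint(0, 10, (16,))
opt.zero_grad()
for xb, yb in zip(x.chunk(4), y.chunk(4)):          # 4 micro-batches
    h = xb
    for stage in stages:                            # in GPipe each stage lives on its own device
        h = stage(h)
    loss = nn.functional.cross_entropy(h, yb) / 4   # average over micro-batches
    loss.backward()                                 # gradients accumulate across micro-batches
opt.step()
```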

 

11. Deep Residual Learning for Image Recognition

https://arxiv.org/pdf/1512.03385

ResNet
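The key building block is the residual unit: instead of learning a mapping H(x) directly, the layers learn a residual F(x) and the block outputs F(x) + x through an identity shortcut. A small PyTorch sketch of the basic (post-activation) block, with illustrative channel sizes:

```python
# Sketch of a basic residual block: the output is F(x) + x, so the layers learn a
# residual relative to the identity shortcut.
import torch
import torch.nn as nn

class BasicBlock(nn.Module):
    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(channels)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        out = self.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        return self.relu(out + x)      # identity shortcut, then activation

y = BasicBlock(64)(torch.randn(1, 64, 32, 32))   # shape preserved: (1, 64, 32, 32)
```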

 

12. Multi-Scale Context Aggregation by Dilated Convolutions

https://arxiv.org/pdf/1511.07122

A new convolutional network module specifically designed for dense prediction.
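Dilated convolutions insert gaps between kernel taps, so stacking layers with exponentially increasing dilation rates grows the receptive field exponentially while keeping full resolution, which is what dense prediction needs. A small PyTorch sketch with made-up channel counts (PyTorch's Conv2d exposes this via its dilation argument):

```python
# Dilated (atrous) convolutions: spacing the kernel taps by the dilation rate grows
# the receptive field without losing resolution or adding parameters.
import torch
import torch.nn as nn

x = torch.randn(1, 3, 64, 64)
# Rates 1, 2, 4 with matching padding keep the 64x64 spatial resolution while the
# receptive field grows rapidly with depth.
ctx = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=3, padding=1, dilation=1), nn.ReLU(),
    nn.Conv2d(16, 16, kernel_size=3, padding=2, dilation=2), nn.ReLU(),
    nn.Conv2d(16, 16, kernel_size=3, padding=4, dilation=4), nn.ReLU(),
)
print(ctx(x).shape)   # torch.Size([1, 16, 64, 64]) -- resolution preserved
```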

 

13. Neural Message Passing for Quantum Chemistry

https://arxiv.org/pdf/1704.01212

Message Passing Neural Networks (MPNNs)
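An MPNN alternates two learned functions over a graph: a message function that neighbors apply to produce messages, and an update function that combines the aggregated messages with each node's current state. A minimal numpy sketch of one such step on a toy graph (the specific message/update functions here are illustrative, not the ones benchmarked in the paper):

```python
# Minimal numpy sketch of one message-passing step on a small graph: each node
# aggregates messages from its neighbors, then updates its own state.
import numpy as np

def message_passing_step(h, edges, W_msg, W_upd):
    """h: (num_nodes, d) node states; edges: list of (src, dst) pairs."""
    messages = np.zeros_like(h)
    for src, dst in edges:
        messages[dst] += np.tanh(W_msg @ h[src])   # message function M(h_src)
    combined = np.concatenate([h, messages], axis=1)
    return np.tanh(W_upd @ combined.T).T           # update function U(h, m)

rng = np.random.default_rng(0)
h = rng.normal(size=(4, 8))                        # 4 nodes, 8-dim states
edges = [(0, 1), (1, 0), (1, 2), (2, 1), (2, 3), (3, 2)]
W_msg = rng.normal(size=(8, 8)) * 0.1
W_upd = rng.normal(size=(8, 16)) * 0.1
h = message_passing_step(h, edges, W_msg, W_upd)   # new node states, shape (4, 8)
```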

 

14. Attention Is All You Need

https://arxiv.org/pdf/1706.03762

Attention and Transformer
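The operation everything else in the paper is built around is scaled dot-product attention: queries are compared against keys, the similarities are softmax-normalized, and the result weights a sum of values. A minimal numpy sketch (single head, no masking):

```python
# Scaled dot-product attention, the core operation of the Transformer:
# Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                  # similarity of each query to each key
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)   # softmax over keys
    return weights @ V                               # weighted sum of values

rng = np.random.default_rng(0)
Q = rng.normal(size=(4, 8))                          # 4 queries
K = rng.normal(size=(6, 8))                          # 6 keys
V = rng.normal(size=(6, 8))                          # 6 values
out = scaled_dot_product_attention(Q, K, V)          # (4, 8): one output per query
```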

 

15. Neural Machine Translation by Jointly Learning to Align and Translate

https://arxiv.org/pdf/1409.0473

Allows a model to automatically (soft-)search for parts of a source sentence that are relevant to predicting a target word, without having to form these parts as a hard segment explicitly.
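Concretely, each decoder step scores every encoder annotation against the current decoder state with a small additive network, turns the scores into soft alignment weights, and uses the weighted sum of annotations as the context for predicting the next word. A numpy sketch of that soft-alignment step (matrix shapes are illustrative):

```python
# Sketch of additive ("Bahdanau") attention: score each encoder annotation against
# the decoder state, softmax, and take a weighted sum as the context vector.
import numpy as np

def soft_alignment_context(decoder_state, annotations, Wa, Ua, v):
    """annotations: (src_len, d_enc); returns (context, alignment weights)."""
    scores = np.array([v @ np.tanh(Wa @ decoder_state + Ua @ h) for h in annotations])
    alpha = np.exp(scores - scores.max())
    alpha /= alpha.sum()                       # soft alignment over source positions
    context = alpha @ annotations              # expected annotation under alpha
    return context, alpha

rng = np.random.default_rng(0)
d_enc, d_dec, d_att, src_len = 16, 16, 32, 7
H = rng.normal(size=(src_len, d_enc))          # encoder annotations
s = rng.normal(size=d_dec)                     # current decoder state
Wa = rng.normal(size=(d_att, d_dec))
Ua = rng.normal(size=(d_att, d_enc))
v = rng.normal(size=d_att)
context, alpha = soft_alignment_context(s, H, Wa, Ua, v)   # alpha sums to 1
```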

 

16. Identity Mappings in Deep Residual Networks

https://arxiv.org/pdf/1603.05027

Analyzes the propagation formulations behind the residual building blocks, which suggest that the forward and backward signals can be directly propagated from one block to any other block when identity mappings are used as the skip connections and after-addition activation.

code: https://github.com/KaimingHe/resnet-1k-layers
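The recommended unit is the "pre-activation" block: BN and ReLU are moved before each convolution and nothing is applied after the addition, leaving the identity path clean from block to block. A PyTorch sketch of that unit (channel sizes are placeholders):

```python
# Sketch of the pre-activation residual unit: BN and ReLU come *before* each
# convolution, and nothing is applied after the addition, so the identity path
# from block to block stays clean.
import torch
import torch.nn as nn

class PreActBlock(nn.Module):
    def __init__(self, channels):
        super().__init__()
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv1 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)

    def forward(self, x):
        out = self.conv1(torch.relu(self.bn1(x)))   # pre-activation: BN -> ReLU -> conv
        out = self.conv2(torch.relu(self.bn2(out)))
        return out + x                              # plain addition, no activation after

y = PreActBlock(64)(torch.randn(1, 64, 32, 32))     # shape preserved
```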

 

17. A Simple Neural Network Module for Relational Reasoning

https://arxiv.org/pdf/1706.01427

Uses Relation Networks (RNs) as a simple plug-and-play module to solve problems that fundamentally hinge on relational reasoning.
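An RN scores every pair of objects with a shared function g, sums the results, and feeds the sum to a second function f, which makes the module permutation-invariant over objects. A minimal numpy sketch where g and f are single layers rather than the MLPs used in the paper:

```python
# Sketch of a Relation Network: RN(O) = f( sum_{i,j} g(o_i, o_j) ), with a shared
# g applied to every object pair and f applied to the aggregated sum.
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def relation_network(objects, Wg, Wf):
    """objects: (n, d) object features."""
    n = objects.shape[0]
    total = 0.0
    for i in range(n):
        for j in range(n):
            pair = np.concatenate([objects[i], objects[j]])
            total = total + relu(Wg @ pair)      # g_theta applied to each object pair
    return Wf @ relu(total)                      # f_phi applied to the aggregated sum

rng = np.random.default_rng(0)
objs = rng.normal(size=(5, 8))                   # 5 objects with 8-dim features
Wg = rng.normal(size=(32, 16)) * 0.1
Wf = rng.normal(size=(10, 32)) * 0.1
out = relation_network(objs, Wg, Wf)             # (10,) task-specific output
```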

 

18. Variational Lossy Autoencoder

https://arxiv.org/pdf/1611.02731

Presents a simple but principled method to learn global representations by combining a Variational Autoencoder (VAE) with neural autoregressive models such as RNNs, MADE, and PixelRNN/CNN.

 

19. Relational Recurrent Neural Networks

https://arxiv.org/pdf/1806.01822

Introduces the Relational Memory Core (RMC), which employs multi-head dot-product attention to allow memories to interact.

 

20. Quantifying the Rise and Fall of Complexity in Closed Systems: The Coffee Automaton

https://arxiv.org/pdf/1405.6903

 

21. Neural Turing Machines

https://arxiv.org/pdf/1410.5401

 

22. Deep Speech 2: End-to-End Speech Recognition in English and Mandarin

https://arxiv.org/pdf/1512.02595

 

23. Scaling Laws for Neural Language Models

https://arxiv.org/pdf/2001.08361
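The headline result is that language-model test loss falls as a smooth power law in model size, dataset size, and training compute over many orders of magnitude. A numpy sketch of what such a law looks like and how its exponent can be read off a log-log fit; the constants below are placeholders, not values quoted from the paper:

```python
# The paper's central finding: test loss follows smooth power laws, e.g.
# L(N) = (N_c / N) ** alpha_N in model size N. Constants here are illustrative.
import numpy as np

def power_law_loss(N, N_c, alpha):
    """L(N) = (N_c / N) ** alpha: loss falls predictably as parameters grow."""
    return (N_c / N) ** alpha

N = np.array([1e6, 1e7, 1e8, 1e9])               # model sizes (placeholder values)
L = power_law_loss(N, N_c=1e13, alpha=0.08)      # hypothetical constants

# On a log-log plot a power law is a straight line, so the exponent can be
# recovered with a linear fit.
slope, intercept = np.polyfit(np.log(N), np.log(L), 1)
print(-slope)                                     # ~0.08, the assumed exponent
```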

 

24. A Tutorial Introduction to the Minimum Description Length Principle

https://arxiv.org/pdf/math/0406077

 

25. Machine Super Intelligence

https://www.vetta.org/documents/Machine_Super_Intelligence.pdf

 

26. Kolmogorov Complexity and Algorithmic Randomness

https://www.lirmm.fr/~ashen/kolmbook-eng-scan.pdf

 

27. CS231n Convolutional Neural Networks for Visual Recognition

https://cs231n.github.io/
