Transformers and Natural Language Processing

Date:

This talk described the encoder-decoder architecture of the Transformer and explained self-attention, in which each position's output is computed as a weighted combination of all the word embeddings in the sequence. It also presented several advantages of Transformers over RNNs and ConvNets.
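
To make the "weighted combination" idea concrete, the sketch below computes scaled dot-product self-attention for a single head in plain NumPy. The function name, array shapes, and random weights are illustrative assumptions for this note, not the speaker's notation; the softmax weights are exactly the combination coefficients applied to all positions' value vectors.

    import numpy as np

    def self_attention(X, Wq, Wk, Wv):
        """Single-head scaled dot-product self-attention.

        X:          (seq_len, d_model) word embeddings (illustrative shapes).
        Wq, Wk, Wv: (d_model, d_k) learned projection matrices.
        """
        Q, K, V = X @ Wq, X @ Wk, X @ Wv            # project embeddings to queries/keys/values
        scores = Q @ K.T / np.sqrt(K.shape[-1])     # pairwise similarity, scaled by sqrt(d_k)
        weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
        weights /= weights.sum(axis=-1, keepdims=True)  # row-wise softmax over all positions
        return weights @ V                          # each output: weighted sum of all values

    # Toy usage: 4 tokens, d_model = d_k = 8 (hypothetical sizes)
    rng = np.random.default_rng(0)
    X = rng.normal(size=(4, 8))
    Wq, Wk, Wv = (rng.normal(size=(8, 8)) for _ in range(3))
    out = self_attention(X, Wq, Wk, Wv)  # shape (4, 8): one context vector per token

Because every position attends to every other position in a single matrix product, the whole sequence can be processed in parallel, unlike the step-by-step recurrence of an RNN.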