References
Tags: concept
Sources:
Related notes:
Neural Network
Deep Learning
Language Model
BERT
Updates:
Notes {{word-count}}
Summary:
Key points:
Problems in Transformer training and their solutions are introduced in paper Understanding the Difficulty of Training Transformers.
Transformer
BERT is a bidirectional Transformer encoder. It is also a Language Model that has been studied and used widely after the paper was published.