Pre-Finetuning/Domain-Adaptive Pre-training of Language Models
6 min read · Feb 14, 2021
Pretrained language models built on the Transformer architecture are used everywhere across NLP tasks. Their strong downstream results have convinced most people that self-supervised pre-training objectives really do transfer knowledge to downstream tasks. In this post, I am not going to discuss those pre-training techniques; instead, I would like to talk about a process called pre-finetuning/Domain-Adaptive Pretraining, an additional stage inserted after standard pre-training (and before finetuning) that can further improve performance.