Pre-Finetuning/Domain-Adaptive Pre-training of Language Models

Chien-Sheng (Jason) Wu
Process My Language
6 min read · Feb 14, 2021


Pretrained language models built on the Transformer architecture are used everywhere in NLP. Their strong results on downstream tasks are usually taken as evidence that self-supervised objectives really do transfer useful knowledge. In this post, I am not going to discuss those pre-training techniques; instead, I would like to talk about a process called pre-finetuning, also known as domain-adaptive pretraining, an additional training stage inserted after standard pre-training (and before task fine-tuning) to further improve downstream performance.
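
To make the idea concrete, here is a minimal sketch of domain-adaptive pretraining with Hugging Face Transformers: an already pretrained RoBERTa is simply trained further with the masked-language-model objective on unlabeled in-domain text before the usual fine-tuning step. The corpus file, model choice, and hyperparameters are placeholders for illustration, not the exact setup of any particular paper.

```python
# Minimal sketch of domain-adaptive pretraining (continued MLM training),
# assuming an unlabeled in-domain corpus in "domain_corpus.txt" (placeholder).
from datasets import load_dataset
from transformers import (
    AutoModelForMaskedLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

model_name = "roberta-base"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForMaskedLM.from_pretrained(model_name)

# Load and tokenize the raw in-domain text (one document per line).
raw = load_dataset("text", data_files={"train": "domain_corpus.txt"})
tokenized = raw["train"].map(
    lambda batch: tokenizer(batch["text"], truncation=True, max_length=512),
    batched=True,
    remove_columns=["text"],
)

# Dynamic masking with the standard 15% masking probability.
collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm_probability=0.15)

args = TrainingArguments(
    output_dir="dapt-roberta",          # illustrative output directory
    per_device_train_batch_size=16,
    num_train_epochs=1,
    learning_rate=5e-5,
)

Trainer(
    model=model,
    args=args,
    train_dataset=tokenized,
    data_collator=collator,
).train()

# The adapted checkpoint in "dapt-roberta" is then fine-tuned on the
# labeled downstream task exactly as you would fine-tune the original model.
```

The only difference from ordinary pre-training is the data: the objective stays the same, but the text now comes from the target domain, which is what gives the downstream boost discussed below.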
