Pre-Finetuning/Domain-Adaptive Pre-training of Language Models

Chien-Sheng (Jason) Wu
Process My Language
6 min read · Feb 14, 2021


Pretrained language models built on the Transformer architecture are used everywhere in NLP. Their strong results on downstream tasks are usually taken as evidence that self-supervised objectives really do transfer useful knowledge. In this post, I am not going to discuss those pre-training techniques; instead, I would like to talk about a process called pre-finetuning, also known as domain-adaptive pretraining, an additional training stage inserted after standard pre-training (and before task fine-tuning) to further improve downstream performance.
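
To make the idea concrete, here is a minimal sketch of domain-adaptive pretraining with Hugging Face Transformers: an already pretrained RoBERTa is simply trained further with the masked-language-model objective on unlabeled in-domain text before the usual fine-tuning step. The corpus file, model choice, and hyperparameters are placeholders for illustration, not the exact setup of any particular paper.

```python
# Minimal sketch of domain-adaptive pretraining (continued MLM training),
# assuming an unlabeled in-domain corpus in "domain_corpus.txt" (placeholder).
from datasets import load_dataset
from transformers import (
    AutoModelForMaskedLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

model_name = "roberta-base"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForMaskedLM.from_pretrained(model_name)

# Load and tokenize the raw in-domain text (one document per line).
raw = load_dataset("text", data_files={"train": "domain_corpus.txt"})
tokenized = raw["train"].map(
    lambda batch: tokenizer(batch["text"], truncation=True, max_length=512),
    batched=True,
    remove_columns=["text"],
)

# Dynamic masking with the standard 15% masking probability.
collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm_probability=0.15)

args = TrainingArguments(
    output_dir="dapt-roberta",          # illustrative output directory
    per_device_train_batch_size=16,
    num_train_epochs=1,
    learning_rate=5e-5,
)

Trainer(
    model=model,
    args=args,
    train_dataset=tokenized,
    data_collator=collator,
).train()

# The adapted checkpoint in "dapt-roberta" is then fine-tuned on the
# labeled downstream task exactly as you would fine-tune the original model.
```

The only difference from ordinary pre-training is the data: the objective stays the same, but the text now comes from the target domain, which is what gives the downstream boost discussed below.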
