GPT-3: The Ginormous Language Model and Its Beta API

Chien-Sheng (Jason) Wu
Process My Language
6 min read · Jun 12, 2020

In this article, I am not going to cover the training and testing details of GPT-3; please check the paper (74 pages) for more information. Instead, I would like to post some of my thoughts and questions for public discussion, and also introduce the very, very new beta API of OpenAI (released 2020.06.11).

A week ago (2020.06.05), OpenAI released a paper called “Language Models are Few-Shot Learners” on arXiv, which introduces the so-called “GPT-3” model: 175 billion parameters pre-trained on 45TB of text data. Do you know the estimated cost of training such a model? People guess about $12M! This is an ATOMIC BOMB in the NLP field (there is both positive and negative feedback). I still remember that almost every tweet on my (nerd) Twitter feed that day was about GPT-3. Let me quote the tweet (even though it is a bit ironic lol) from Dr. Geoffrey Hinton, one of the godfathers of AI:

Hinton’s Tweet (4.398 trillion = 2⁴² and GPT-3 aggregates 42 tasks)

With such a ginormous language model, what is going to happen next? How will it change the AI field? Positive or negative?

My own opinion is quite positive. First of all, it is amazing to have a model that can achieve promising zero-shot and few-shot results without any task-specific fine-tuning. On some tasks, GPT-3 with only a few training samples can surpass a fully-supervised model trained on the full data. In general, the top three ingredients needed to train a SOTA machine learning model these days are “Data”, “Model”, and “Computation Resource”, and nowadays the third one seems to be the most dominant. I heard a joke that poor people can only do theoretical deep learning research because SOTA is for rich people/companies, which is a bit sad but kind of true. For reference, a single page of results in the T5 paper costs a few million US dollars to produce, and even OpenAI could not retrain GPT-3 after they found a data-overlap bug:

“Unfortunately, a bug in the filtering caused us to ignore some overlaps, and due to the cost of training it was not feasible to retrain the model…” (from GPT-3 paper Page 31)
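
To make the few-shot point above more concrete, here is a minimal sketch of in-context learning: the task demonstrations live directly in the prompt, and the frozen model is only asked to continue the text, with no gradient updates. The complete_fn below is a hypothetical stand-in for whatever text-completion endpoint you can call; it is not the official API.

```python
# Minimal sketch of few-shot "in-context learning": no parameter updates,
# the task is specified purely through examples placed in the prompt.
# `complete_fn` is a hypothetical completion function (a stand-in for any
# text-completion endpoint), NOT the official OpenAI API.

def build_few_shot_prompt(examples, query):
    """Concatenate a task description, k demonstrations, and the new query."""
    blocks = ["Translate English to French:"]
    for en, fr in examples:
        blocks.append(f"English: {en}\nFrench: {fr}")
    blocks.append(f"English: {query}\nFrench:")   # the model continues here
    return "\n\n".join(blocks)

def few_shot_translate(complete_fn, query):
    examples = [("sea otter", "loutre de mer"), ("cheese", "fromage")]
    prompt = build_few_shot_prompt(examples, query)
    # All the "learning" happens inside the prompt; the model stays frozen.
    return complete_fn(prompt, max_tokens=20, stop="\n")
```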

Maybe sooner or later we will see a Giant BERT, a bi-directional model at GPT-3 scale, from one of the big tech companies. Or maybe 5–10 years later, we will have computational power that is 1000 times faster, so everyone can train such a model from scratch. Who knows?

Figure from GPT-3 Paper

Let’s look at some cool APIs from OpenAI. Yesterday, just yesterday, OpenAI opened a waitlist for its GPT-3 beta API, and the features look super cool, including Semantic Search, Chatting, Generation, Productivity Tools, Content Comprehension, and Polyglot. Access to the GPT-3 API is invitation-only, and pricing is unclear. You can join the waitlist to test the APIs (link). This could be the first commercial product of OpenAI: an AI text-generation system that the company warned last year was too dangerous to share.

I have to say that these features are not specific to the GPT-3 model; many existing NLP models can do the same things. But currently (judging from the demos shown), GPT-3 may be one of the best models that can do all of them at the same time, and do them well. I look forward to trying them once they are fully released. (All of these tasks are still ongoing research directions.)

1. Semantic Search: I believe everyone has experience with “CTRL+F” to search for a keyword inside a document or webpage, and this is usually done by strict string matching (A is A and B is B). What OpenAI proposes here is to add semantic understanding to the search, like a QA system but without returning a specific answer. For example, in Figure 1, we can search a question like “why is bread so fluffy?” and it will point to the most relevant paragraph (imagine that a simple keyword search would return nothing); a small sketch of the idea follows Figure 1. Here, I also want to point to the SOCO.ai demo, one of the startup projects from my friend Tiancheng Zhao, which combines QA with search, for Google Scholar search in particular.
Figure 1
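
As a rough illustration of how semantic search differs from keyword matching, here is a minimal sketch: embed the question and every paragraph into vectors, then rank paragraphs by cosine similarity. The embed function is a hypothetical placeholder for any sentence-embedding model; this is just the general idea, not how OpenAI's endpoint actually works.

```python
import numpy as np

# Minimal sketch of semantic search: rank paragraphs by how close their
# embedding is to the query embedding, instead of exact string matching.
# `embed` is a hypothetical placeholder for any sentence-embedding model.

def cosine(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def semantic_search(embed, query, paragraphs, top_k=1):
    q_vec = embed(query)                                 # vector for the question
    scored = [(cosine(q_vec, embed(p)), p) for p in paragraphs]
    scored.sort(key=lambda pair: pair[0], reverse=True)  # most similar first
    return scored[:top_k]

# Usage: semantic_search(embed, "why is bread so fluffy?", doc_paragraphs)
# can surface the paragraph about yeast even if it never contains "fluffy".
```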

2. Productivity Tools: This allows interacting with the terminal using natural language, parsing text into a spreadsheet table, and more. For example, on the left-hand side of Figure 2, if you originally wanted to find the time and date on a terminal, you would need to remember the exact commands. In their demo, we can directly ask “what day is it?” to get “date +%A”, and then follow up with “I mean the full date” to get “date +%F”. Also, instead of typing “git clone XXX” we can say “clone the openai gym repo and install it” (even though I think this is wordier and more ambiguous LOL). This would be super useful for developers (if it works well) to increase their productivity; a hypothetical sketch of such an assistant follows Figure 2. On the right-hand side of Figure 2, they import text information and update a table. For example, given a new company name “AI Benefits Company”, similar to the built-in auto-extension feature of Excel, we can automatically obtain the Ticker (ABC) and Year Founded (2030) values for that company.

Figure 2
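
To show roughly how such a terminal assistant could be wired up, here is a minimal, hypothetical sketch: a few-shot prompt maps an English request to a shell command, and the suggested command only runs after the user confirms it. Again, complete_fn stands in for any text-completion endpoint; none of this is OpenAI's actual implementation.

```python
import subprocess

# Hypothetical sketch of a natural-language terminal assistant.
# A few-shot prompt maps English requests to shell commands; the suggested
# command is executed only after explicit user confirmation.
# `complete_fn` is a placeholder for any text-completion endpoint.

PROMPT_TEMPLATE = """Translate the request into a single shell command.

Request: what day is it?
Command: date +%A

Request: I mean the full date
Command: date +%F

Request: {request}
Command:"""

def suggest_command(complete_fn, request):
    prompt = PROMPT_TEMPLATE.format(request=request)
    return complete_fn(prompt, max_tokens=30, stop="\n").strip()

def run_if_confirmed(complete_fn, request):
    cmd = suggest_command(complete_fn, request)
    if input(f"Run `{cmd}`? [y/N] ").lower() == "y":
        subprocess.run(cmd, shell=True)   # never run model output blindly
```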

3. Generation: The model can of course handle a set of generation tasks, aka controllable text generation, such as chatting, text summarization, and machine translation. A single model that can do multiple tasks is an ideal goal for NLP research. For example, on the left-hand side of Figure 3, the model is chatting with humans (more like chit-chatting with some information provided). I would like to point to the Blender paper from Facebook, which is one of the best chit-chat bots we have these days. In the middle, the model continues a predefined prompt, which is the same demo we all saw with GPT-2. On the right-hand side, the model does translation given the tags “English” to “French”. Here, I would like to point to the DecaNLP multi-task learning challenge and the first large-scale controllable language model, CTRL; both of them are from my research team at Salesforce. A small sketch of expressing different tasks as prompts follows Figure 3.

Figure 3
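
As a rough illustration of how one frozen model can cover several generation tasks, here is a minimal sketch of task-specific prompt templates for chatting and summarization (translation works the same way, as sketched earlier). The templates and complete_fn are illustrative assumptions, not OpenAI's actual interface.

```python
# Minimal sketch: one frozen language model, several generation tasks,
# each task expressed purely as a different prompt pattern.
# `complete_fn` is a hypothetical completion function, not the real API.

def chat_reply(complete_fn, history, user_msg):
    """history is a list of (human, ai) turns; the model writes the next AI turn."""
    turns = "\n".join(f"Human: {h}\nAI: {a}" for h, a in history)
    prompt = f"{turns}\nHuman: {user_msg}\nAI:"
    return complete_fn(prompt, max_tokens=60, stop="\nHuman:")

def summarize(complete_fn, article):
    # Appending "TL;DR:" is the well-known GPT-2-style summarization prompt.
    return complete_fn(article + "\n\nTL;DR:", max_tokens=80)
```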

Lastly, OpenAI was founded as a purely nonprofit research lab in 2015 and shifted its business model in 2019, creating a for-profit entity named OpenAI LP. Even though some people criticized the move, suggesting that it undermined the lab’s original claim to be pursuing open-source AI, I believe it is a good way to attract investment (e.g., $1 billion from Microsoft) and encourage more practical and useful applications. Now that the company has taken a further step in its commercial plan by releasing these APIs, many will be watching carefully to see how it changes and contributes to the AI research community.

I write my NLP blog posts a bit casually, so please let me know if there are any mistakes or typos. Feel free to contact me if you have any questions or ideas. I believe top research always comes out of discussion and cooperation.

Thank you for reading. Cheers.
