How ChatGPT really works, explained for non-technical people
The release of ChatGPT by OpenAI at the end of last year has been phenomenal — even my grandma is asking about it. Its ability to generate human-like language has inspired people to experiment with its potential in various products. Its wildly successful launch even put pressure on tech giants like Google to rush out their own versions of ChatGPT.
But let’s be honest, for non-technical product managers, designers, and entrepreneurs, the inner workings of ChatGPT may seem like a magical black box. Don’t worry! In this blog, I’ll try to explain the technology and the model behind ChatGPT as simply as possible. By the end of this post, you’ll have a good understanding of what ChatGPT can do, and how it performs its magic.
The transformer & GPT timeline
Before we dive into the actual mechanism of ChatGPT, let’s take a quick look at the timeline of the transformer architecture and the successive versions of GPT, so you can get a better sense of how things evolved into the ChatGPT we have today.
- 2015. OpenAI was founded by Sam Altman, Elon Musk, Greg Brockman, Peter Thiel, and others. Beyond GPT, OpenAI develops many other AI models.
- 2017. Google published the paper Attention Is All You Need, which introduced the transformer architecture [2]. The transformer is a neural network architecture that lays the foundation for many state-of-the-art (SOTA) large language models (LLMs), including GPT.
- 2018. GPT is introduced in Improving Language Understanding by Generative Pre-Training [3]. It is based on a modified transformer architecture and pre-trained on a large text corpus.
- 2019. GPT-2 is introduced in Language Models are Unsupervised Multitask Learners [4]; it can perform a range of tasks without explicit supervision during training.
- 2020. GPT-3 is introduced in Language Models are Few-Shot Learners [5]; it performs well given only a few examples in the prompt, without any fine-tuning.
- 2022. InstructGPT is introduced in Training language models to follow instructions with human feedback [6], which can better follow user instructions by fine-tuning with human...