
GPT

Generative Pre-trained Transformer, OpenAI's family of large language models that generate human-like text through autoregressive prediction.


What is GPT?

GPT stands for Generative Pre-trained Transformer, and it's the model family that kicked off the current AI boom. OpenAI released the first version back in 2018, but things really took off with GPT-3 in 2020 and GPT-4 in 2023. The core idea is pretty straightforward: train a massive neural network on huge amounts of text, then fine-tune it to follow instructions and have conversations.

How GPT Works

The model predicts one token at a time (roughly a word or word fragment) based on everything that came before it. It's trained on books, websites, code, and a large slice of the public internet. What makes GPT special is the scale. GPT-4 reportedly has over a trillion parameters, which lets it capture incredibly nuanced patterns in language. The pre-training phase teaches it general knowledge, while later fine-tuning makes it actually useful for specific tasks.
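The token-at-a-time loop described above can be sketched in a few lines of Python. This is a toy: the "model" is a hypothetical bigram lookup table standing in for a real transformer, and the function names are made up for illustration. The structure of the loop, where each new token is appended to the context before the next prediction, is the part that mirrors how GPT actually generates text.

```python
# Toy sketch of autoregressive generation. A real GPT replaces this
# lookup table with a trained neural network, but the loop is the same:
# predict the next token, append it, repeat.

TOY_BIGRAMS = {
    "the": "cat",
    "cat": "sat",
    "sat": "on",
    "on": "the",
}

def predict_next_token(context):
    """Return the most likely next token given the context so far."""
    last = context[-1]
    return TOY_BIGRAMS.get(last, "<eos>")  # "<eos>" = end of sequence

def generate(prompt_tokens, max_new_tokens=5):
    tokens = list(prompt_tokens)
    for _ in range(max_new_tokens):
        nxt = predict_next_token(tokens)
        if nxt == "<eos>":
            break
        tokens.append(nxt)  # the new token becomes part of the context
    return tokens

print(generate(["the"]))
# -> ['the', 'cat', 'sat', 'on', 'the', 'cat']
```

A real model outputs a probability distribution over its whole vocabulary at each step and samples from it, rather than returning a single fixed token; that is why the same prompt can produce different completions.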

When to Use GPT

GPT models excel at writing, summarization, translation, coding assistance, and general Q&A. They're great when you need flexible, conversational AI that can handle a wide range of tasks without special setup. The downside? They can hallucinate facts, and the larger versions get expensive to run. They also have knowledge cutoffs, so they won't know about recent events unless you give them that context.

Strengths and Limitations

The biggest strength is versatility. GPT can write poetry, debug code, explain quantum physics, or help you draft an email. It's a generalist. The weaknesses include occasional confident wrongness, high computational costs for the best versions, and a tendency to be verbose. For production use, you'll want to add guardrails and fact-checking, especially for anything high-stakes.
