How Large Language Models (LLMs) Work
LLMs, like those used in OpenAI systems, learn patterns in language and use them to generate text. They don’t “think” like humans; they predict what text comes next based on training.
1. Training on Massive Text Data
LLMs are trained on huge datasets including books, websites, articles, and code.
They learn patterns such as:
- "peanut butter and ___" → "jelly"
- "The capital of France is ___" → "Paris"
These patterns are learned by a neural network built on the Transformer architecture.
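To make "learning patterns" concrete, here is a minimal sketch that counts which word follows a two-word context in a tiny made-up corpus. This is plain frequency counting, not a Transformer, and the corpus, `count_continuations`, and `predict` are all hypothetical names chosen for illustration. It only shows the idea that frequent continuations become the likely predictions.

```python
from collections import Counter, defaultdict

# Tiny made-up corpus; real training data is many orders of magnitude larger.
CORPUS = [
    "peanut butter and jelly",
    "peanut butter and jelly sandwich",
    "peanut butter and banana",
    "the capital of France is Paris",
]

def count_continuations(sentences, context_size=2):
    """Count which word follows each run of `context_size` words."""
    counts = defaultdict(Counter)
    for sentence in sentences:
        words = sentence.split()
        for i in range(context_size, len(words)):
            context = tuple(words[i - context_size:i])
            counts[context][words[i]] += 1
    return counts

def predict(counts, context):
    """Return the most frequent continuation seen after `context`, or None."""
    followers = counts.get(tuple(context))
    return followers.most_common(1)[0][0] if followers else None

counts = count_continuations(CORPUS)
print(predict(counts, ["butter", "and"]))  # jelly
print(predict(counts, ["France", "is"]))   # Paris
```

A real LLM replaces the lookup table with a neural network, which lets it generalise to contexts it has never seen word for word.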
2. Tokenisation
Text is split into small units called tokens.
Example:
"ChatGPT is cool" →
["Chat", "GPT", " is", " cool"]
Tokens can be words or parts of words.
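The split above can be sketched with a greedy longest-match tokeniser. This is a simplification: production tokenisers typically use learned schemes such as byte-pair encoding, and the `VOCAB` set and `tokenize` function here are hypothetical, hand-written for this one example.

```python
# Hypothetical toy vocabulary; real tokenisers learn tens of thousands of entries from data.
VOCAB = {"Chat", "GPT", " is", " cool", " ",
         "C", "h", "a", "t", "G", "P", "T", "i", "s", "c", "o", "l"}

def tokenize(text, vocab=VOCAB):
    """Greedy longest-match split of text into vocabulary entries."""
    tokens = []
    pos = 0
    while pos < len(text):
        # Try the longest candidate first, shrinking until one is in the vocabulary.
        for end in range(len(text), pos, -1):
            piece = text[pos:end]
            if piece in vocab:
                tokens.append(piece)
                pos = end
                break
        else:
            # Unknown character: emit it as its own token so we always make progress.
            tokens.append(text[pos])
            pos += 1
    return tokens

print(tokenize("ChatGPT is cool"))  # ['Chat', 'GPT', ' is', ' cool']
```

Note that the leading space is part of the token (" is", " cool"), which is why joining the tokens reproduces the original text exactly.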
3. Attention Mechanism
The model uses "attention" to decide which words matter most in context.
Example:
"The animal didn't cross the street because it was tired."
The model learns that "it" refers to "the animal", not "the street".
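As a rough sketch of how such a preference can be computed, the snippet below calculates scaled dot-product attention weights for the token "it" over three other words. The 2-dimensional vectors are invented by hand so that "animal" scores highest; in a real model they are learned during training and have hundreds or thousands of dimensions. Full attention also uses the weights to blend value vectors, which is omitted here.

```python
import math

def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention_weights(query, keys):
    """Scaled dot-product attention weights of one query over a list of keys."""
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    return softmax(scores)

# Made-up key vectors for illustration only; real models learn these.
keys = {"animal": [1.0, 0.2], "street": [0.1, 1.0], "tired": [0.8, 0.3]}
query_it = [1.0, 0.0]  # hypothetical query vector for the token "it"

weights = attention_weights(query_it, list(keys.values()))
for word, weight in zip(keys, weights):
    print(f"{word}: {weight:.2f}")
```

With these vectors the largest weight goes to "animal", mirroring the resolution of "it" described above.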
4. Learning by Prediction
During training, the model predicts the next token:
"The sky is ___" → "blue"
When the prediction is wrong, the error is used to adjust the model's internal parameters through gradient descent. Over many examples, its predictions improve.
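The update rule can be sketched on a deliberately tiny model: one adjustable score (logit) per token, for a single fixed context such as "The sky is". The three-token `VOCAB`, the learning rate, and the step count are assumptions picked for illustration; a real LLM has billions of parameters and computes gradients by backpropagation through many layers.

```python
import math

VOCAB = ["blue", "green", "loud"]  # hypothetical three-token vocabulary

def softmax(logits):
    """Convert logits into a probability distribution."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def train(target, steps=50, lr=0.5):
    """Fit one logit per token so the target token becomes the most likely."""
    logits = [0.0] * len(VOCAB)  # the model's only parameters, starting uniform
    target_idx = VOCAB.index(target)
    losses = []
    for _ in range(steps):
        probs = softmax(logits)
        losses.append(-math.log(probs[target_idx]))  # cross-entropy loss
        # Gradient of cross-entropy w.r.t. each logit is (prob - one_hot_target).
        for i in range(len(logits)):
            grad = probs[i] - (1.0 if i == target_idx else 0.0)
            logits[i] -= lr * grad  # gradient descent step
    return softmax(logits), losses

probs, losses = train("blue")
print(dict(zip(VOCAB, (round(p, 3) for p in probs))))
print(f"loss: {losses[0]:.3f} -> {losses[-1]:.3f}")
```

Each step nudges the logit for "blue" up and the others down, so the loss shrinks and the probability of the correct token grows.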
5. Generating Responses
When you ask a question:
1. Input is tokenised
2. The model predicts the next token
3. It appends that token to the input and repeats the step until it produces an end-of-sequence token or reaches a length limit
So responses are generated one token at a time.
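The loop above can be sketched as follows. The `NEXT_TOKEN_PROBS` table is a hypothetical stand-in for a trained model and looks only at the last token, whereas a real LLM conditions on the whole sequence so far. The sketch always picks the most likely token (greedy decoding); deployed systems often sample from the distribution instead.

```python
# Hypothetical lookup table standing in for a trained model:
# last token -> probabilities for the next token.
NEXT_TOKEN_PROBS = {
    "The": {" sky": 0.7, " capital": 0.3},
    " sky": {" is": 1.0},
    " is": {" blue": 0.9, " cool": 0.1},
    " blue": {"<end>": 1.0},
}

def generate(prompt_tokens, max_new_tokens=10):
    """Extend the prompt one token at a time using greedy decoding."""
    tokens = list(prompt_tokens)
    for _ in range(max_new_tokens):
        # Unknown contexts fall back to ending the sequence.
        probs = NEXT_TOKEN_PROBS.get(tokens[-1], {"<end>": 1.0})
        next_token = max(probs, key=probs.get)  # pick the most likely token
        if next_token == "<end>":
            break
        tokens.append(next_token)  # the new token becomes part of the input
    return tokens

print("".join(generate(["The"])))  # The sky is blue
```

The `max_new_tokens` cap plays the role of the length limit that keeps generation from running indefinitely.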
6. Fine-Tuning and Alignment
After training, models are improved using:
- Human feedback
- Instruction tuning
This helps systems like ChatGPT behave more helpfully and follow instructions.
7. What LLMs Don’t Do
LLMs:
- Do not think or understand like humans
- Do not have beliefs or intentions
- Do not store facts as a database
They are pattern prediction systems.
Simple Analogy
An LLM is like:
"A super-advanced autocomplete system trained on a large portion of the internet."
Generated by ChatGPT