Training vs Inference
This is one of the most misunderstood splits in machine learning.
A model like ChatGPT does NOT learn while you are using it.
During interaction, its weights are typically frozen.
The key distinction:
Training = learning phase (changing the model)
Inference = usage phase (no learning, just prediction)
---
1. Pretraining
Pretraining is the first and largest phase.
The model is trained on massive datasets:
- Books
- Web text
- Code
- Articles
Objective:
Predict the next token
By doing this at scale, the model learns the statistical structure of language.
Over time it builds:
- Grammar understanding
- World knowledge patterns
- Reasoning heuristics
- Embeddings and representations
Pretraining is extremely expensive: it typically runs on large GPU clusters for weeks or months.
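As a rough illustration of the next-token objective, the sketch below scores one prediction with cross-entropy loss, a common choice for this kind of objective. The four-word vocabulary and the logit values are invented for illustration; a real model scores tens of thousands of tokens at every position.

```python
import math

def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def next_token_loss(logits, target_index):
    """Cross-entropy loss: negative log-probability of the true next token."""
    probs = softmax(logits)
    return -math.log(probs[target_index])

# Hypothetical vocabulary and model scores for the context "the cat sat on the"
vocab = ["mat", "dog", "moon", "ran"]
logits = [2.0, 0.5, 0.1, -1.0]

probs = softmax(logits)
loss_if_mat = next_token_loss(logits, vocab.index("mat"))  # true token scored highly
loss_if_ran = next_token_loss(logits, vocab.index("ran"))  # true token scored poorly
```

The loss is small when the model already assigns high probability to the token that actually came next, and large when it does not. Pretraining repeats this measurement across the whole dataset.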
---
2. Weights (The Model Itself)
The model is defined by billions (or trillions) of parameters called weights.
Think of weights as:
Stored numerical knowledge inside the network
They control:
- How attention behaves
- How embeddings form
- How features are detected
- How predictions are made
Once training finishes, weights are usually frozen.
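To give a sense of where the parameter counts come from, here is a minimal sketch that counts the weights in a stack of fully connected layers. The layer widths are hypothetical and far smaller than in a production model, which also contains other layer types.

```python
def dense_layer_params(n_inputs, n_outputs):
    """One weight per input-output pair, plus one bias per output."""
    return n_inputs * n_outputs + n_outputs

# Hypothetical layer widths: 512 -> 2048 -> 512
layer_sizes = [512, 2048, 512]

total_params = sum(
    dense_layer_params(n_in, n_out)
    for n_in, n_out in zip(layer_sizes, layer_sizes[1:])
)
# total_params == 2099712
```

Even this two-layer toy holds about two million numbers. Stacking many wider layers is how real models reach billions.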
---
3. Gradient Descent
This is the core learning algorithm.
Process:
Prediction → Error → Adjustment → Repeat
The model computes how wrong it is, then adjusts weights slightly to reduce error.
Mathematically:
- Compute the loss function (a number measuring the error)
- Compute gradients (the direction in which the loss increases fastest)
- Update weights a small step in the opposite direction
This loop repeats over a very large number of update steps.
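The loop above can be sketched on a one-parameter model. This is a minimal illustration, assuming a mean squared error loss and a toy dataset generated by y = 3x; the learning rate and step count are arbitrary choices, not values from any real training run.

```python
def loss(w, data):
    """Mean squared error of the one-parameter model y = w * x."""
    return sum((w * x - y) ** 2 for x, y in data) / len(data)

def gradient(w, data):
    """Derivative of the mean squared error with respect to w."""
    return sum(2 * (w * x - y) * x for x, y in data) / len(data)

def gradient_descent(w, data, learning_rate=0.05, steps=200):
    """Repeat: measure the gradient, then step in the opposite direction."""
    for _ in range(steps):
        w = w - learning_rate * gradient(w, data)
    return w

data = [(1.0, 3.0), (2.0, 6.0), (3.0, 9.0)]  # toy data generated by y = 3x
w_start = 0.0
w_trained = gradient_descent(w_start, data)
```

Starting from `w_start = 0.0`, the weight moves toward 3, the value that makes the error vanish on this data. A large model does the same thing with billions of weights at once.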
---
4. Backpropagation
Backpropagation is how gradients flow through the network.
It works by:
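- Propagating the error backwards, layer by layer, using the chain rule
- Computing how much each weight contributed to the error (its gradient)
- Passing those gradients to the optimizer, which updates each parameter
Without backprop, training deep networks with gradient descent would be impractical.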
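A minimal sketch of the idea, assuming a hypothetical two-weight network (one tanh hidden unit, one linear output) and a squared error loss. The backward pass applies the chain rule by hand, and a finite-difference estimate is computed alongside it as a sanity check.

```python
import math

def forward(w1, w2, x):
    """Forward pass: input -> tanh hidden unit -> linear output."""
    h = math.tanh(w1 * x)
    y_hat = w2 * h
    return h, y_hat

def loss_fn(w1, w2, x, y):
    _, y_hat = forward(w1, w2, x)
    return (y_hat - y) ** 2

def backward(w1, w2, x, y):
    """Backward pass: chain rule from the loss back to each weight."""
    h, y_hat = forward(w1, w2, x)
    d_y_hat = 2 * (y_hat - y)      # dLoss/dy_hat
    d_w2 = d_y_hat * h             # output layer weight
    d_h = d_y_hat * w2             # error passed back to the hidden unit
    d_w1 = d_h * (1 - h ** 2) * x  # tanh'(z) = 1 - tanh(z)^2
    return d_w1, d_w2

# Arbitrary example values
w1, w2, x, y = 0.5, -1.5, 2.0, 1.0
grad_w1, grad_w2 = backward(w1, w2, x, y)

# Numerical check: nudge each weight and measure the change in loss
eps = 1e-6
numeric_w1 = (loss_fn(w1 + eps, w2, x, y) - loss_fn(w1 - eps, w2, x, y)) / (2 * eps)
numeric_w2 = (loss_fn(w1, w2 + eps, x, y) - loss_fn(w1, w2 - eps, x, y)) / (2 * eps)
```

The analytic and numerical gradients agree closely. Backpropagation gets every gradient from a single backward pass, whereas the numerical approach needs extra forward passes for every weight, which does not scale to billions of parameters.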
---
5. Inference-Only Operation
Once trained, the model is deployed for use.
At this stage:
- Weights are frozen
- No learning occurs
- Only forward passes happen
So when you ask a question:
- Input goes in
- Network computes activations
- An output token is generated (and the process repeats, token by token, until the response is complete)
- Nothing is updated internally
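This is why the model does not "learn from conversations" in real time. It can refer to earlier messages in the same conversation only because they are part of its input, not because its weights changed.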
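A minimal sketch of inference-only operation, assuming a hypothetical three-token vocabulary and a tiny fixed weight table. The function only reads the weights; nothing in it writes to them.

```python
# Hypothetical frozen weights: one row of scores per vocabulary token.
VOCAB = ["yes", "no", "maybe"]
WEIGHTS = ((0.9, -0.2), (-0.4, 0.7), (0.1, 0.1))  # tuples cannot be modified in place

def predict_next_token(features):
    """Forward pass only: score each token and return the highest-scoring one."""
    scores = [sum(w * f for w, f in zip(row, features)) for row in WEIGHTS]
    return VOCAB[scores.index(max(scores))]

weights_before = tuple(tuple(row) for row in WEIGHTS)

first = predict_next_token([1.0, 0.0])   # "yes"
second = predict_next_token([0.0, 1.0])  # "no"
repeat = predict_next_token([1.0, 0.0])  # "yes" again: same input, same weights
```

The weights are identical before and after the calls. Real systems often sample from the output probabilities, so responses can vary between runs, but that randomness comes from sampling rather than from any change to the weights.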
---
6. Fine-Tuning
Fine-tuning is additional training after pretraining.
It adapts the model for specific tasks:
- Chat behaviour
- Instruction following
- Domain specialization
Process:
Pretrained model → new dataset → additional gradient descent
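It reshapes the weights, usually only slightly, and aims to keep most general knowledge intact, although aggressive fine-tuning can degrade abilities learned earlier.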
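A minimal sketch of the process, assuming a one-parameter model y = w * x whose "pretrained" weight is 3.0 and a small hypothetical task dataset generated by y = 3.5x. The low learning rate and short schedule are illustrative choices meant to mimic a gentle adjustment.

```python
def gradient(w, data):
    """Derivative of mean squared error for the model y = w * x."""
    return sum(2 * (w * x - y) * x for x, y in data) / len(data)

def train(w, data, learning_rate, steps):
    """Plain gradient descent, starting from whatever weight is passed in."""
    for _ in range(steps):
        w -= learning_rate * gradient(w, data)
    return w

pretrained_w = 3.0                    # stand-in for weights produced by pretraining
task_data = [(1.0, 3.5), (2.0, 7.0)]  # small task dataset generated by y = 3.5x

# Fine-tuning: same algorithm, new data, starting from the pretrained weight
finetuned_w = train(pretrained_w, task_data, learning_rate=0.01, steps=20)
```

The fine-tuned weight ends up between the pretrained value and the task optimum: it has shifted toward the new data without being retrained from scratch.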
---
7. RLHF (Reinforcement Learning from Human Feedback)
RLHF aligns the model with human preferences.
Steps:
- Model generates multiple responses
- Humans rank them
- Reward model learns preferences
- Policy is optimized toward better-ranked outputs
Goal:
Make outputs more helpful, safe, and human-aligned
This stage mainly shapes behaviour rather than adding new knowledge.
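The reward-model step can be sketched with a pairwise preference loss, a common formulation in which the preferred response should score higher than the rejected one. This is a deliberately simplified toy: the response labels and comparisons are invented, the "reward model" is just one learnable score per response instead of a neural network, and the policy optimization step is omitted entirely.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical human preference data: (preferred response, rejected response)
comparisons = [("polite", "rude"), ("polite", "evasive"), ("evasive", "rude")]

# Toy reward model: one learnable score per response, all starting at zero
rewards = {"polite": 0.0, "evasive": 0.0, "rude": 0.0}

learning_rate = 0.5
for _ in range(100):
    for chosen, rejected in comparisons:
        # Pairwise loss: -log(sigmoid(r_chosen - r_rejected))
        p = sigmoid(rewards[chosen] - rewards[rejected])
        # Gradient step: raise the preferred score, lower the rejected one
        rewards[chosen] += learning_rate * (1.0 - p)
        rewards[rejected] -= learning_rate * (1.0 - p)

ranking = sorted(rewards, key=rewards.get, reverse=True)
```

After training, the learned scores reproduce the human ordering. In full RLHF, a score like this becomes the signal that the policy is optimized against.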
---
8. Why Models Are Static Snapshots
Once deployed, a model is essentially a frozen function:
Input → Fixed parameters → Output
Reasons:
- Stability (avoid unpredictable changes)
- Safety (prevent corruption or poisoning)
- Cost (training is extremely expensive)
- Reproducibility (consistent behaviour)
So the model does NOT evolve during conversation.
It is a snapshot of the weights as they were when training ended.
---
9. Key Insight
Training builds the system.
Inference uses the system.
Training = changing the brain
Inference = using the brain
Confusing the two leads to the incorrect assumption that AI "learns live," when in reality it is executing a fixed learned function.