Author Topic: Pretraining in modern AI / LLMs (Read 351 times)

Chip · « **on:** May 28, 2026, 08:19:02 AM »

https://www.youtube.com/embed/P7emqEtkiSki=giNSDM4CW4b6SjZN

Conversation Transcript with AI

User:
Explain the "pretraining" process

Assistant:
Pretraining (in modern AI / LLMs)

What it is
Pretraining is the first and largest phase of training a language model, where it learns general language patterns by processing massive amounts of text with no task-specific supervision.

It is essentially:
“predict the next token over and over, billions of times”

Core idea
The model is trained on a simple objective:

Given: "The cat sat on the"
Predict: "mat"
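
As a rough illustration of that objective, the sketch below turns one sentence into (prefix, target) training pairs. Whitespace splitting stands in for a real subword tokeniser, and the function name `next_token_pairs` is made up for this example.

```python
def next_token_pairs(tokens):
    """Return (prefix, target) pairs: each prefix is asked to predict the token after it."""
    return [(tokens[:i], tokens[i]) for i in range(1, len(tokens))]

# Assumption: whitespace splitting replaces a real subword tokeniser.
tokens = "The cat sat on the mat".split()
pairs = next_token_pairs(tokens)

for prefix, target in pairs:
    print(" ".join(prefix), "->", target)
```

The last pair is the one shown above: the prefix "The cat sat on the" with the target "mat". A single sentence of six tokens already yields five training examples.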

How it works step-by-step

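1. Tokenisation: text is broken into tokens
2. Input shifting: the model sees a prefix and must predict the token that follows it
3. Forward pass: the transformer processes the token embeddings
4. Prediction: the model outputs a probability distribution over the vocabulary
5. Loss calculation: cross-entropy between the predicted distribution and the actual next token
6. Backpropagation: compute gradients of the loss, which the optimiser uses to update the weights
7. Repeat at scale: a very large number of update steps, covering billions to trillions of tokens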
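
The steps above can be sketched end to end in a few lines of plain Python. This is a toy under loud assumptions, not how a real LLM is built: a table of logits indexed by the previous word replaces the transformer, whitespace splitting replaces the tokeniser, and the gradient is written out by hand instead of using automatic differentiation. All names (`train_bigram`, `predict_next`) are hypothetical.

```python
import math
import random

def softmax(logits):
    """Turn raw scores into a probability distribution."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def train_bigram(text, epochs=200, lr=0.5, seed=0):
    # Tokenisation (assumption: whitespace split, not subwords).
    words = text.split()
    vocab = sorted(set(words))
    ids = [vocab.index(w) for w in words]
    size = len(vocab)
    rng = random.Random(seed)
    # "Model": one row of logits per previous token (stand-in for a transformer).
    W = [[rng.uniform(-0.01, 0.01) for _ in range(size)] for _ in range(size)]
    # Input shifting: each token is paired with the token that follows it.
    pairs = list(zip(ids[:-1], ids[1:]))
    losses = []
    for _ in range(epochs):
        total = 0.0
        for prev, target in pairs:
            probs = softmax(W[prev])            # forward pass + prediction
            total += -math.log(probs[target])   # cross-entropy loss
            for j in range(size):
                # Gradient of cross-entropy w.r.t. logit j is probs[j] - one_hot[j].
                grad = probs[j] - (1.0 if j == target else 0.0)
                W[prev][j] -= lr * grad         # plain SGD weight update
        losses.append(total / len(pairs))       # mean loss for this epoch
    return vocab, W, losses

def predict_next(vocab, W, word):
    """Return the most likely next word under the trained table."""
    row = W[vocab.index(word)]
    return vocab[max(range(len(row)), key=lambda j: row[j])]

vocab, W, losses = train_bigram("the cat sat on the mat")
print(f"first epoch loss {losses[0]:.3f}, last epoch loss {losses[-1]:.3f}")
print("after 'cat':", predict_next(vocab, W, "cat"))
```

The loss falls as the loop repeats, which is the whole mechanism in miniature. Real pretraining differs in scale and architecture, not in the shape of the loop.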

What it learns

Grammar and syntax
Facts and associations
Reasoning patterns (implicitly)
Style imitation
Code structure
Statistical structure of language

Key point
Pretraining does not directly teach instruction-following or task completion. It builds a general predictive model of language.

Output
A base model that can continue text but is not reliably aligned or helpful.

Next steps

Supervised fine-tuning
RLHF / preference training

Summary
Pretraining = learning to predict text from massive datasets.

---

User:
So this all happens before the user enters their query?

Assistant:
Yes.

Pretraining is done before any user interacts with the system.

1. Pretraining (offline)
- Huge datasets
- Months of compute
- No users involved
- Produces base model

2. Post-training (alignment)
- Instruction tuning
- Human feedback training
- Produces chat-capable model

3. Inference (user time)
- User types prompt
- Model generates response
- No learning occurs

Timeline
Pretraining → Post-training → Deployment → User query → Inference

Key point
The model is already fully trained before any user input.

User:
⚙️ How Much Code Is Shared Between Pretraining and Live Inference?

Assistant:

Short answer: Most of the model’s code is identical; a figure like 90–95% is a rough estimate rather than a measurement.
The difference is that all training-only components are removed.

✔ Shared in Both Training and Inference:
• Tokeniser
• Embedding layers
• Transformer blocks (attention, FFN, layernorm, residuals)
• Forward pass
• Output head (logits → softmax)
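• All learned weights

These are the core of the model, the part that actually “is” the intelligence.

Sampling logic (top‑k, top‑p, temperature) is not shared: it runs only at inference. Pretraining does not sample; it scores the logits against the known next token.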

✘ Removed After Training:
• Backpropagation
• Optimiser (AdamW, etc.)
• Loss function (cross‑entropy)
• Weight‑update logic
• Training data pipeline
• Distributed training infrastructure
• Checkpointing and gradient storage

Inference is the forward pass with frozen weights, followed by a step that picks the next token from the output distribution.

Simplest way to say it:
Training builds the model.
Inference runs the model.
Same engine — the build system is gone.
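
The step that turns the output logits into a token (temperature and top‑k, as named in this answer) can be sketched roughly as follows. The function name `sample_next` and the example logits are invented for illustration; the logits are assumed to come from a forward pass with frozen weights, and nothing here updates them.

```python
import math
import random

def sample_next(logits, temperature=1.0, top_k=None, rng=random):
    """Pick a token id from logits using temperature scaling and optional top-k."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    # Rank token ids from highest to lowest logit.
    ids = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)
    if top_k is not None:
        ids = ids[:top_k]                       # keep only the k most likely tokens
    scaled = [logits[i] / temperature for i in ids]
    m = max(scaled)
    weights = [math.exp(s - m) for s in scaled]  # unnormalised softmax weights
    return rng.choices(ids, weights=weights, k=1)[0]

# Hypothetical logits for a 4-token vocabulary.
logits = [0.1, 2.0, -1.0, 0.5]
greedy = sample_next(logits, top_k=1)            # top_k=1 always returns the argmax
varied = sample_next(logits, temperature=1.5, top_k=3, rng=random.Random(0))
print(greedy, varied)
```

Lower temperatures sharpen the distribution towards the top token and higher ones flatten it. The model's weights are only read here, never written.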

User:
How long does it take to train?

Assistant:
⏳ Typical Training Time for Large Language Models

Public sources indicate that training frontier‑scale LLMs (GPT‑4 class and above) requires weeks to months of continuous compute.

Key facts:
• Models at GPT‑4 scale are estimated to require over 10^25 FLOPs of compute to train (the exact figure for GPT‑4 has not been published).
• Training at this scale is estimated to cost tens of millions of dollars or more on current hardware.
• These compute levels imply training runs lasting several weeks to multiple months, depending on cluster size and parallelism (inferred from the compute requirements and cost).

Condensed takeaway:
Frontier LLMs typically train for several weeks to a few months.
Smaller models (7B–70B) train faster, but GPT‑4‑class systems require extremely long, resource‑intensive runs.
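
The inference from compute to wall-clock time is simple arithmetic: total FLOPs divided by the cluster's sustained throughput. The sketch below uses the 10^25 FLOPs figure from the answer; the accelerator count, peak throughput and utilisation are illustrative assumptions, not reported values for any real training run.

```python
SECONDS_PER_DAY = 86_400

def training_days(total_flops, num_accelerators, peak_flops_per_sec, utilisation):
    """Estimate wall-clock days as total FLOPs / sustained cluster throughput."""
    sustained = num_accelerators * peak_flops_per_sec * utilisation
    return total_flops / sustained / SECONDS_PER_DAY

# Assumptions for illustration only:
#   10,000 accelerators, 3e14 FLOP/s peak each, 40% average utilisation.
days = training_days(
    total_flops=1e25,
    num_accelerators=10_000,
    peak_flops_per_sec=3e14,
    utilisation=0.4,
)
print(f"about {days:.0f} days")  # about 96 days
```

Under these assumed numbers the run takes roughly three months. Doubling the cluster size or the utilisation would halve that, which is why the estimate depends so heavily on cluster size and parallelism.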
