dopetalk does not endorse any advertised product nor does it accept any liability for its use or misuse


Our Discord Notification Server invitation link is https://discord.gg/jB2qmRrxyD

Topic: Pretraining in modern AI / LLMs

Chip (OP)

  • Server Admin
  • Hero Member
  • Administrator
  • Join Date: Dec 2014
  • Location: Australia
  • Posts: 7150
  • Gender: Male
  • Deeply Confused Learner
  • Profession: IT engineer, now retired
Pretraining in modern AI / LLMs
« on: Today at 08:19:02 AM »

Conversation Transcript with AI

User:
Explain the "pretraining" process

Assistant:
Pretraining (in modern AI / LLMs)

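What it is
Pretraining is the first and largest phase of training a language model. The model learns general language patterns by processing massive amounts of text with no task-specific supervision: there are no human-written labels, because the text itself supplies the targets (the actual next token at each position). This is why pretraining is often described as self-supervised learning.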

It is essentially: 
“predict the next token over and over, billions of times”

Core idea 
The model is trained on a simple objective:

```
Given:   "The cat sat on the"
Predict: "mat"
```
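The same objective can be written as a formula. Assume a training sequence of tokens x_1, ..., x_T and model parameters θ; pretraining minimises the average negative log-probability (the cross-entropy loss) that the model assigns to each actual next token given everything before it:

```latex
\mathcal{L}(\theta) = -\frac{1}{T-1} \sum_{t=2}^{T} \log p_\theta\left(x_t \mid x_1, \ldots, x_{t-1}\right)
```

With simple word-level tokens (an assumption for illustration; real tokenisers split text into subword pieces), "The cat sat on the mat" gives T = 6 and five terms, the last being -log p_θ(mat | The cat sat on the).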
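How it works step-by-step

1. Tokenisation: text is broken into tokens (words or word pieces), and each token is mapped to an integer ID.
2. Input shifting: the targets are the same sequence shifted by one position, so at every position the model sees the prefix and must predict the next token (a causal mask stops each position from seeing later tokens).
3. Forward pass: token embeddings are processed by the transformer layers.
4. Prediction: the output head produces a probability distribution over the whole vocabulary.
5. Loss calculation: cross-entropy compares the predicted distribution with the actual next token.
6. Backpropagation: gradients of the loss are computed, and the optimiser uses them to update the weights.
7. Repeat at scale: hundreds of billions to trillions of tokens, processed in large batches.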
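The steps above can be sketched as a tiny, dependency-free training loop. To stay short it makes two loudly flagged substitutions: a bigram table (each token's logits depend only on the previous token) stands in for the transformer, and plain SGD stands in for an optimiser such as AdamW. The function names are illustrative, not from any library.

```python
import math
import random


def tokenise(text):
    # Toy word-level tokeniser (assumption): real LLMs use subword tokenisers such as BPE.
    return text.lower().split()


def softmax(logits):
    # Subtract the largest logit first so math.exp cannot overflow.
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def train_bigram(text, epochs=200, lr=0.5, seed=0):
    # 1. Tokenisation: words -> integer IDs.
    tokens = tokenise(text)
    vocab = sorted(set(tokens))
    ids = [vocab.index(t) for t in tokens]
    size = len(vocab)
    rng = random.Random(seed)
    # weights[prev][nxt] is the logit for token `nxt` following token `prev`
    # (a bigram table standing in for a transformer).
    weights = [[rng.uniform(-0.01, 0.01) for _ in range(size)] for _ in range(size)]
    # 2. Input shifting: inputs are ids[:-1], targets are the same IDs shifted by one.
    pairs = list(zip(ids[:-1], ids[1:]))
    losses = []
    for _ in range(epochs):  # 7. Repeat (here: many passes over one tiny text).
        total_loss = 0.0
        for prev, target in pairs:
            probs = softmax(weights[prev])          # 3-4. Forward pass and prediction.
            total_loss += -math.log(probs[target])  # 5. Cross-entropy loss.
            # 6. The gradient of cross-entropy w.r.t. the logits is probs - one_hot(target);
            #    apply a plain SGD update (real runs use optimisers such as AdamW).
            for j in range(size):
                grad = probs[j] - (1.0 if j == target else 0.0)
                weights[prev][j] -= lr * grad
        losses.append(total_loss / len(pairs))
    return vocab, weights, losses


vocab, weights, losses = train_bigram("The cat sat on the mat")
print(f"average loss: first epoch {losses[0]:.3f}, last epoch {losses[-1]:.3f}")
```

On the example sentence the average loss starts near ln 5 ≈ 1.6 (uniform guessing over five distinct tokens) and falls as the table learns, for example, that "cat" is followed by "sat". It cannot reach zero, because "the" is followed once by "cat" and once by "mat".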

What it learns
• Grammar and syntax
• Facts and associations
• Reasoning patterns (implicitly)
• Style imitation
• Code structure
• Statistical structure of language

Key point
Pretraining does not directly teach instruction-following or task completion. It builds a general predictive model of language.

Output
A base model that can continue text but is not reliably aligned or helpful.

Next steps
• Supervised fine-tuning
• RLHF / preference training

Summary
Pretraining = learning to predict text from massive datasets.

---

User:
So this all happens before the user enters their query?

Assistant:
Yes.

Pretraining is done before any user interacts with the system.

1. Pretraining (offline)
- Huge datasets
- Weeks to months of compute
- No users involved
- Produces the base model

2. Post-training (alignment)
- Instruction tuning (supervised fine-tuning)
- Preference training from human feedback (RLHF)
- Produces the chat-capable model

3. Inference (user time)
- User types a prompt
- Model generates a response, one token at a time
- No learning occurs: the weights are not updated

Timeline
Pretraining → Post-training → Deployment → User query → Inference

Key point
The model is already fully trained before any user input.
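A minimal sketch of what happens at inference (user time), assuming a hypothetical toy model: the forward pass turns the current prefix into next-token probabilities, the most likely token is appended, and the loop repeats until an end token. The `FROZEN_LOGITS` table is an illustrative stand-in for trained weights (a real model computes logits with a transformer over the whole prefix), and the final check confirms that generation only reads the weights, never updates them.

```python
import copy
import math

VOCAB = ["<end>", "the", "cat", "sat", "on", "mat"]

# Hypothetical frozen weights: next-token logits keyed by the previous token only.
FROZEN_LOGITS = {
    "the": [0.0, 0.0, 1.0, 0.0, 0.0, 2.0],
    "cat": [0.0, 0.0, 0.0, 3.0, 0.0, 0.0],
    "sat": [0.0, 0.0, 0.0, 0.0, 3.0, 0.0],
    "on":  [0.0, 3.0, 0.0, 0.0, 0.0, 0.0],
    "mat": [3.0, 0.0, 0.0, 0.0, 0.0, 0.0],
}


def forward(prefix):
    """Forward pass only: next-token probabilities for the given prefix."""
    logits = FROZEN_LOGITS[prefix[-1]]
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def generate(prompt, max_new_tokens=10):
    """Greedy autoregressive decoding: append the most likely token, then repeat."""
    tokens = prompt.split()
    for _ in range(max_new_tokens):
        probs = forward(tokens)
        next_token = VOCAB[probs.index(max(probs))]
        if next_token == "<end>":
            break
        tokens.append(next_token)
    return " ".join(tokens)


snapshot = copy.deepcopy(FROZEN_LOGITS)
print(generate("the cat"))  # the cat sat on the mat
assert FROZEN_LOGITS == snapshot  # inference read the weights but never changed them
```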



User:
⚙️ How Much Code Is Shared Between Pretraining and Live Inference?

A:

Short answer: most of it; a rough estimate is 90–95% of the model's code.
The difference is that all training-only components are removed.

✔ Shared in Both Training and Inference:
• Tokeniser
• Embedding layers
• Transformer blocks (attention, FFN, layernorm, residuals)
• Forward pass
• Output head (logits → softmax)
• All learned weights

These are the core of the model: the part that actually “is” the intelligence.

✘ Removed After Training:
• Backpropagation
• Optimiser (AdamW, etc.)
• Loss function (cross-entropy)
• Weight-update logic
• Training data pipeline
• Distributed training infrastructure
• Checkpointing and gradient storage

➕ Added for Inference:
• Sampling logic (top-k, top-p, temperature), which picks each next token from the output distribution. Pretraining does not sample; it scores the known next token instead.

Inference is the forward pass with frozen weights, run once per generated token, plus sampling.
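A minimal sketch of the inference-only sampling step, assuming the raw logits for one position are already available. `sample_next` and its parameters are illustrative names, not a library API, and libraries differ in details such as the order in which filters are applied. This shows one common arrangement: divide logits by the temperature, keep the top-k candidates, then keep the smallest set whose renormalised probability reaches top-p, and draw from what remains.

```python
import math
import random


def softmax(logits):
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sample_next(logits, temperature=1.0, top_k=None, top_p=None, rng=random):
    """Pick one token index from raw logits using temperature, top-k and top-p filtering."""
    if temperature <= 0:
        # Treat a non-positive temperature as greedy decoding.
        return max(range(len(logits)), key=lambda i: logits[i])
    # Lower temperature sharpens the distribution; higher temperature flattens it.
    probs = softmax([z / temperature for z in logits])
    # Candidate indices sorted from most to least likely.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    if top_k is not None:
        order = order[:top_k]
    if top_p is not None:
        # Nucleus filtering over the candidates that survived top-k.
        mass = sum(probs[i] for i in order)
        kept, cumulative = [], 0.0
        for i in order:
            kept.append(i)
            cumulative += probs[i] / mass
            if cumulative >= top_p:
                break
        order = kept
    # random.choices normalises the weights, so survivors are drawn proportionally.
    return rng.choices(order, weights=[probs[i] for i in order], k=1)[0]


logits = [2.0, 1.0, 0.5, -1.0]  # hypothetical logits for a 4-token vocabulary
rng = random.Random(0)
print(sample_next(logits, temperature=0))       # 0 (greedy)
print(sample_next(logits, top_k=1, rng=rng))    # 0 (only one candidate survives)
print(sample_next(logits, temperature=0.7, top_p=0.9, rng=rng))
```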

Simplest way to say it:
Training builds the model.
Inference runs the model.
Same engine; the build system is gone.


User:
How long does it take to train?


A:
⏳ Typical Training Time for Large Language Models

Public sources indicate that training frontier-scale LLMs (GPT-4 class and above) requires weeks to months of continuous compute.

Key facts:
• Models at GPT-4 scale require over 10^25 FLOPs of compute to train.
• Training at this scale costs tens of millions of dollars on current hardware.
• These compute levels imply training runs lasting several weeks to multiple months, depending on cluster size and parallelism (inferred from the compute requirements and cost).
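The step from compute to calendar time can be checked with simple arithmetic. This sketch uses the common rule of thumb that training a dense transformer costs about 6 FLOPs per parameter per training token, and divides total compute by the rate a cluster actually sustains. The cluster size, per-accelerator peak speed, 40% utilisation and the smaller model's size are hypothetical round numbers, not figures for any real model or training run.

```python
SECONDS_PER_DAY = 86_400


def training_flops(n_params, n_tokens):
    """Rule-of-thumb compute for training a dense transformer: ~6 FLOPs per parameter per token."""
    return 6 * n_params * n_tokens


def training_days(total_flops, n_accelerators, peak_flops_per_sec, utilisation):
    """Wall-clock days = total work / (accelerators x peak speed x fraction of peak achieved)."""
    sustained_rate = n_accelerators * peak_flops_per_sec * utilisation
    return total_flops / sustained_rate / SECONDS_PER_DAY


# Hypothetical cluster: 10,000 accelerators, 1e15 FLOP/s peak each, 40% utilisation.
frontier_days = training_days(1e25, 10_000, 1e15, 0.4)
print(f"10^25 FLOPs: {frontier_days:.0f} days")  # 10^25 FLOPs: 29 days

# Hypothetical smaller run: 7e9 parameters, 2e12 tokens, 1,000 of the same accelerators.
small_flops = training_flops(7e9, 2e12)
print(f"{small_flops:.1e} FLOPs, {training_days(small_flops, 1_000, 1e15, 0.4):.1f} days")
```

Under these assumptions 10^25 FLOPs takes about a month; halving the cluster or the utilisation doubles the time, which is why the same compute budget can mean weeks on one cluster and months on another.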

Condensed takeaway:
Frontier LLMs typically train for several weeks to a few months.
Smaller models (7B–70B parameters) train faster, but GPT-4-class systems require extremely long, resource-intensive runs.

« Last Edit: Today at 09:52:02 AM by Chip »
TERMS AND CONDITIONS

In no event will d&u or any person involved in creating, producing, or distributing site information be liable for any direct, indirect, incidental, punitive, special or consequential damages arising out of the use of or inability to use d&u. You agree to indemnify and hold harmless d&u, its domain founders, sponsors, maintainers, server administrators, volunteers and contributors from and against all liability, claims, damages, costs and expenses, including legal fees, that arise directly or indirectly from the use of any part of the d&u site.

TO USE THIS WEBSITE YOU MUST AGREE TO THE TERMS AND CONDITIONS ABOVE

Founded December 2014