This should get you up to speed, in this order:
1. The First AI - Samuel’s Checkers ML Systemhttps://forum.drugs-and-users.org/index.php?topic=7338Historical grounding.
Shows the origins of machine learning and self-improving systems.
2. What the basic components of AI are and how the data flowshttps://forum.drugs-and-users.org/index.php?topic=7347High-level architectural overview before deep diving.
3. How Neural Networks Workhttps://forum.drugs-and-users.org/index.php?topic=7343Core foundation.
Everything modern comes from this.
4. The AI Tokenisation Pipelinehttps://forum.drugs-and-users.org/index.php?topic=7350Now the reader understands WHY text must become vectors and embeddings.
5. Transformershttps://forum.drugs-and-users.org/index.php?topic=7344The real breakthrough architecture behind modern LLMs.
6. A light intro to LLMs, chatbots, pretraining, and transformershttps://forum.drugs-and-users.org/index.php?topic=7342Applies the transformer concept to actual LLM systems and chatbot behaviour.
7. RAG - Retrieval Augmented Generationhttps://forum.drugs-and-users.org/index.php?topic=7348Advanced modern extension layer.
Shows how models interface with external knowledge.
Those were the basics and once you roughly understand them then continue on with the following topics:
8. Embeddings and Vector Spaceshttps://forum.drugs-and-users.org/index.php?topic=7351Right now embeddings are probably buried inside tokenisation or neural networks, but embeddings are absolutely central to modern AI.
Topics:
- What embeddings actually are
- High-dimensional vector spaces
- Semantic proximity
- Why "cat" and "dog" cluster together
- Cosine similarity
- Latent space
- Why RAG works
- Why hallucinations happen
This becomes the bridge between:
Token IDs → Meaning Space
9. Attention Mechanisms and Self-Attentionhttps://forum.drugs-and-users.org/index.php?topic=7352Transformers really deserve to be split and Attention is the actual revolutionary mechanism.
Topics:
- Query / Key / Value vectors
- Attention weighting
- Context windows
- Token relationships
- Parallel processing vs recurrence
- Why transformers replaced RNNs/LSTMs
Without attention, transformers look like magic.
10. Training vs Inferencehttps://forum.drugs-and-users.org/index.php?topic=7353This is one of the most misunderstood things in AI discussions.
Most people think ChatGPT is "learning while talking."
It usually is not.
Topics:
- Pretraining
- Gradient descent
- Backpropagation
- Weights
- Inference-only operation
- Fine tuning
- RLHF
- Why models are static snapshots
This clears up enormous confusion.
11. Context Windows and Memoryhttps://forum.drugs-and-users.org/index.php?topic=7354Critical for chatbot understanding.
Topics:
- What context windows are
- Token limits
- Sliding attention windows
- Conversation truncation
- Why models "forget"
- Persistent memory systems
- RAG vs memory
This directly explains chatbot behaviour.
12. Hallucinations and Failure Modeshttps://forum.drugs-and-users.org/index.php?topic=7355Very important as AIs won't say "I don't know".
Topics:
- Probabilistic generation
- Why confidence ≠ correctness
- Distribution gaps
- Mode collapse
- Confabulation
- Context poisoning
- Prompt injection
Most people fundamentally misunderstand hallucinations.
13. Multi-Modal AIhttps://forum.drugs-and-users.org/index.php?topic=7356Modern systems are no longer just text.
Topics:
- Vision transformers
- Image tokenisation
- Audio embeddings
- Cross-modal embeddings
- Unified latent spaces
- Image generation diffusion models
This connects LLMs to image/video/audio systems.
14. Agents and Tool Usehttps://forum.drugs-and-users.org/index.php?topic=7357Modern frontier AI architecture.
Topics:
- Tool calling
- External APIs
- Planning loops
- Chain-of-thought orchestration
- Autonomous agents
- Memory stores
- Execution environments
This is where systems are heading now.
15. Scaling Lawshttps://forum.drugs-and-users.org/index.php?topic=7358Very important historically.
Topics:
- Why bigger models suddenly worked
- Emergent behaviour
- Parameter scaling
- Data scaling
- Compute scaling
- Why GPT-3 changed everything
16. Quantisation and Model Compressionhttps://forum.drugs-and-users.org/index.php?topic=7360The practical consequence of Scaling Laws.
Topics:
- What model weights actually are at the bit level
- FP32 vs FP16 vs INT8 vs INT4
- How precision reduction affects output quality
- GGUF and GGML formats
- llama.cpp and local inference
- Pruning and knowledge distillation
- Why a 7B quantised model can run on a laptop
Scaling Laws explains why models got enormous.
This explains how ordinary hardware runs them anyway.
Directly relevant to anyone self-hosting or running local models.
This explains why AI progress looked sudden.
17. Diffusion Models and What They Are Forhttps://forum.drugs-and-users.org/index.php?topic=7359When discussing image generation.
Topics:
- Noise schedules
- Denoising
- Latent diffusion
- Classifier guidance
- Why Stable Diffusion works
Completely different architecture family from transformers.
That’s plenty for now !