♟️ 0. The First AI: Samuel’s Checkers Program
https://forum.drugs-and-users.org/index.php?topic=7338
What this section teaches:
The birth of machine learning. Arthur Samuel’s checkers program was among the first programs to improve through experience rather than relying only on hand-written rules.
It introduced:
• Evaluation functions
• Self-play
• Learning from outcomes
• Early reinforcement-learning-like behaviour
This is the conceptual ancestor of modern RL, not LLM pretraining.
------------------------------------------------------------
🧠 1. Pretraining Modern AI / LLMs
https://forum.drugs-and-users.org/index.php?topic=7365
What this section teaches:
How modern AI evolved from early ML ideas. The lineage:
Samuel → TD learning → deep learning → transformers → RLHF.
Clarifies that LLM pretraining is NOT reinforcement learning. Pretraining extracts patterns; RLHF aligns behaviour.
------------------------------------------------------------
🏗️ 2. High‑Level AI Architecture & Data Flow
https://forum.drugs-and-users.org/index.php?topic=7347
What this section teaches:
A systems-engineering overview of the entire AI stack:
• Tokenisation
• Embeddings
• Transformer layers
• Output decoding
This gives the “big picture” before diving into internals.
------------------------------------------------------------
🔧 3. How Neural Networks Work
https://forum.drugs-and-users.org/index.php?topic=7343
What this section teaches:
The foundation of all deep learning:
• Neurons and layers
• Activations
• Backpropagation
• Gradient descent
Everything modern builds on this.
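To make the gradient-descent bullet concrete, here is a minimal sketch that fits a single linear "neuron" y = w*x + b to toy data. The data, the learning rate, and the name train_neuron are invented for illustration; real networks stack many such units with non-linear activations and compute the gradients by backpropagation.

```python
def train_neuron(xs, ys, lr=0.05, steps=2000):
    """Fit y ~ w*x + b by plain gradient descent on mean squared error."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        # Forward pass: prediction error for every sample.
        errors = [(w * x + b) - y for x, y in zip(xs, ys)]
        # Gradients of the mean squared error with respect to w and b.
        grad_w = 2.0 / n * sum(e * x for e, x in zip(errors, xs))
        grad_b = 2.0 / n * sum(errors)
        # Step against the gradient.
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


# Toy data generated from y = 2x + 1 (an assumption for this example).
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]
w, b = train_neuron(xs, ys)
print(round(w, 3), round(b, 3))
```

The loop has the same shape as real training: forward pass, measure error, compute gradients, nudge the weights.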
------------------------------------------------------------
🔤 4. The Tokenisation Pipeline
https://forum.drugs-and-users.org/index.php?topic=7350
What this section teaches:
Why text must become numbers:
• Token IDs
• Byte Pair Encoding (BPE)
• Embedding lookup
• Why models operate on vectors, not characters
This bridges raw text → model-usable representation.
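As a rough illustration of the BPE and token-ID bullets, the sketch below runs two merge steps on a tiny string and then maps the resulting tokens to integer IDs. It is a toy under stated assumptions: the helper names are hypothetical, the "corpus" is one short string, and production tokenisers typically work on bytes with merge tables learned from very large corpora.

```python
from collections import Counter


def most_frequent_pair(tokens):
    """Return the adjacent token pair that occurs most often."""
    pairs = Counter(zip(tokens, tokens[1:]))
    return pairs.most_common(1)[0][0]


def merge_pair(tokens, pair):
    """Replace every occurrence of `pair` with a single merged token."""
    merged, i = [], 0
    while i < len(tokens):
        if i + 1 < len(tokens) and (tokens[i], tokens[i + 1]) == pair:
            merged.append(tokens[i] + tokens[i + 1])
            i += 2
        else:
            merged.append(tokens[i])
            i += 1
    return merged


text = "low lower lowest"
tokens = list(text)          # start from single characters
for _ in range(2):           # two merge steps, chosen for illustration
    tokens = merge_pair(tokens, most_frequent_pair(tokens))

# Assign each distinct token an integer ID, as a vocabulary would.
vocab = {tok: idx for idx, tok in enumerate(sorted(set(tokens)))}
ids = [vocab[tok] for tok in tokens]
print(tokens)
print(ids)
```

After two merges the frequent fragment "low" has become a single token, which is the basic idea behind BPE: common character sequences get their own IDs.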
------------------------------------------------------------
⚡ 5. Transformers
https://forum.drugs-and-users.org/index.php?topic=7344
What this section teaches:
The architecture that changed everything:
• Attention blocks
• Parallel processing
• Residual connections
• LayerNorm
• Feed-forward networks
Transformers largely replaced RNNs/LSTMs because they process whole sequences in parallel and scale well with data and compute.
------------------------------------------------------------
💬 6. LLMs Explained
https://forum.drugs-and-users.org/index.php?topic=7342
What this section teaches:
How transformers become chatbots:
• Pretraining on massive corpora
• Next-token prediction
• Why LLMs “sound intelligent”
• How chat interfaces wrap the model
------------------------------------------------------------
📚 7. RAG: Retrieval Augmented Generation
https://forum.drugs-and-users.org/index.php?topic=7348
What this section teaches:
How LLMs integrate external knowledge:
• Vector databases
• Embedding search
• Document retrieval
• Reducing hallucinations
RAG is a widely used answer to the problem of static model knowledge.
------------------------------------------------------------
🧭 8. Embeddings & Vector Spaces
https://forum.drugs-and-users.org/index.php?topic=7351
What this section teaches:
The heart of modern AI:
• High-dimensional meaning spaces
• Semantic clustering
• Cosine similarity
• Latent space structure
• Why “cat” and “dog” cluster
• Why hallucinations happen
This is the bridge: Token IDs → Meaning Space.
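The cosine-similarity bullet can be sketched in a few lines. The three-dimensional vectors below are hand-picked for illustration and are not real embeddings; actual models learn vectors with hundreds or thousands of dimensions.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Hypothetical toy embeddings: "cat" and "dog" point in similar directions.
embeddings = {
    "cat": [0.9, 0.8, 0.1],
    "dog": [0.8, 0.9, 0.2],
    "car": [0.1, 0.2, 0.9],
}

cat_dog = cosine_similarity(embeddings["cat"], embeddings["dog"])
cat_car = cosine_similarity(embeddings["cat"], embeddings["car"])
print(f"cat~dog: {cat_dog:.3f}  cat~car: {cat_car:.3f}")
```

With these toy values "cat" scores much closer to "dog" than to "car", which is what "clustering" means in a vector space.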
------------------------------------------------------------
🎯 9. Attention Mechanisms & Self‑Attention
https://forum.drugs-and-users.org/index.php?topic=7352
What this section teaches:
The real magic inside transformers:
• Query / Key / Value vectors
• Attention weights
• Token relationships
• Context windows
• Why attention replaced recurrence
Without this, transformers look like magic.
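Here is a minimal sketch of scaled dot-product attention in plain Python, covering the Query / Key / Value and attention-weight bullets. One simplifying assumption: the same toy vectors are reused as queries, keys, and values, whereas a real transformer derives each from the token vectors through separate learned projections.

```python
import math


def softmax(xs):
    """Turn raw scores into weights that sum to 1."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attention(queries, keys, values):
    """Scaled dot-product attention over lists of vectors."""
    d_k = len(keys[0])
    outputs, all_weights = [], []
    for q in queries:
        # Score this query against every key, scaled by sqrt(d_k).
        scores = [
            sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k)
            for k in keys
        ]
        weights = softmax(scores)
        # Output is the attention-weighted average of the value vectors.
        out = [
            sum(w * v[j] for w, v in zip(weights, values))
            for j in range(len(values[0]))
        ]
        outputs.append(out)
        all_weights.append(weights)
    return outputs, all_weights


# Three toy token vectors (illustrative values only).
x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
outputs, weights = attention(x, x, x)
for row in weights:
    print([round(w, 3) for w in row])
```

Each printed row shows how strongly one token attends to every token in the sequence, itself included.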
------------------------------------------------------------
🧪 10. Training vs Inference
https://forum.drugs-and-users.org/index.php?topic=7353
What this section teaches:
One of the most misunderstood topics:
• Pretraining
• Gradient descent
• Static weights
• Fine-tuning
• RLHF
• Why models do NOT learn during conversation
This clears up enormous confusion.
------------------------------------------------------------
🧩 11. Context Windows & Memory
https://forum.drugs-and-users.org/index.php?topic=7354
What this section teaches:
Why chatbots forget:
• Token limits
• Sliding windows
• Context truncation
• Persistent memory systems
• RAG vs memory
Explains real-world chatbot behaviour.
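A rough sketch of context truncation: keep the most recent messages that fit a token budget and drop the rest. The function name truncate_to_window is hypothetical, and the whitespace word count stands in for a real tokeniser; actual chat systems differ in how they trim or summarise history.

```python
def truncate_to_window(messages, max_tokens,
                       count_tokens=lambda text: len(text.split())):
    """Keep the newest messages whose combined token count fits the budget."""
    kept, used = [], 0
    for msg in reversed(messages):      # walk from newest to oldest
        cost = count_tokens(msg)
        if used + cost > max_tokens:
            break                       # everything older is dropped
        kept.append(msg)
        used += cost
    return list(reversed(kept))         # restore chronological order


history = [
    "my name is Ada",
    "hello Ada",
    "what is BPE",
    "a tokenisation scheme",
    "what is my name",
]
window = truncate_to_window(history, max_tokens=10)
print(window)
```

In this toy run the message containing the name falls outside the window, so a model that sees only `window` has no way to answer the final question. That is the "forgetting" the bullets describe.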
------------------------------------------------------------
⚠️ 12. Hallucinations & Failure Modes
https://forum.drugs-and-users.org/index.php?topic=7355
What this section teaches:
Why AIs make things up:
• Probabilistic generation
• Confidence ≠ correctness
• Distribution gaps
• Mode collapse
• Confabulation
• Prompt injection
• Context poisoning
Most people misunderstand hallucinations entirely.
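To illustrate "probabilistic generation", the sketch below turns raw scores (logits) into a probability distribution and samples a token from it. The vocabulary and logits are invented for this example; a real model produces logits over tens of thousands of tokens.

```python
import math
import random


def next_token_probs(logits, temperature=1.0):
    """Softmax over logits; lower temperature sharpens the distribution."""
    scaled = [l / temperature for l in logits]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def sample_token(vocab, logits, temperature=1.0, rng=random):
    """Draw one token at random, weighted by its probability."""
    probs = next_token_probs(logits, temperature)
    return rng.choices(vocab, weights=probs, k=1)[0]


# Hypothetical candidates for the next token and their scores.
vocab = ["Paris", "Lyon", "Berlin"]
logits = [2.0, 1.5, 0.5]

sharp = next_token_probs(logits, temperature=0.5)
flat = next_token_probs(logits, temperature=2.0)
print([round(p, 3) for p in sharp])
print([round(p, 3) for p in flat])
print(sample_token(vocab, logits, temperature=1.0))
```

The model only ever picks a likely continuation. Nothing in this step checks whether the chosen token is true, which is why a fluent, confident answer can still be wrong.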
------------------------------------------------------------
🎥 13. Multi‑Modal AI
https://forum.drugs-and-users.org/index.php?topic=7356
What this section teaches:
Modern AI is no longer text-only:
• Vision transformers
• Image tokenisation
• Audio embeddings
• Cross-modal latent spaces
• Diffusion models
Connects LLMs to image/audio/video systems.
------------------------------------------------------------
🤖 14. Agents & Tool Use
https://forum.drugs-and-users.org/index.php?topic=7357
What this section teaches:
The frontier of modern AI:
• Tool calling
• External APIs
• Planning loops
• Chain-of-thought orchestration
• Autonomous agents
• Memory stores
• Execution environments
This is where AI is heading now.
------------------------------------------------------------
📈 15. Scaling Laws
https://forum.drugs-and-users.org/index.php?topic=7358
What this section teaches:
Why bigger models suddenly worked:
• Parameter scaling
• Data scaling
• Compute scaling
• Emergent behaviour
• Why GPT‑3 changed everything
Explains the “sudden” leap in AI capability.
------------------------------------------------------------
🔀 16. Mixture of Experts (MoE)
https://forum.drugs-and-users.org/index.php?topic=7367
What this section teaches:
How modern frontier models scale without scaling compute proportionally:
• What an "expert" actually is (a sub-network)
• The router / gating mechanism
• Sparse activation — only some experts fire per token
• Why MoE allows enormous parameter counts on modest compute
• MoE vs dense models
• Mixtral, GPT-4 (rumoured), Gemini — real-world examples
• The trade-offs: memory vs compute
Explains why parameter counts stopped being a simple proxy for capability or cost.
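The router and sparse-activation bullets can be sketched as top-k routing. Several assumptions here: the "experts" are trivial scalar functions rather than sub-networks, the router logits are fixed by hand rather than computed from the token by a learned layer, and renormalising the gate weights over the selected experts is only one common choice.

```python
import math


def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def route_top_k(router_logits, k=2):
    """Pick the k highest-scoring experts and normalise their gate weights."""
    order = sorted(range(len(router_logits)),
                   key=lambda i: router_logits[i], reverse=True)
    top = order[:k]
    gates = softmax([router_logits[i] for i in top])
    return list(zip(top, gates))


def moe_layer(x, experts, router_logits, k=2):
    """Run only the selected experts and mix their outputs by gate weight."""
    out = 0.0
    for idx, gate in route_top_k(router_logits, k):
        out += gate * experts[idx](x)
    return out


# Four toy "experts" (stand-ins for feed-forward sub-networks).
experts = [lambda x: x + 1, lambda x: 2 * x, lambda x: x * x, lambda x: -x]
router_logits = [0.1, 2.0, 2.0, -1.0]   # hypothetical router scores for one token

print(route_top_k(router_logits))
print(moe_layer(3.0, experts, router_logits))
```

Only two of the four experts run for this token. All four still have to sit in memory, which is the memory-versus-compute trade-off listed above.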
------------------------------------------------------------
💾 17. Quantisation & Model Compression
https://forum.drugs-and-users.org/index.php?topic=7360
What this section teaches:
How huge models run on ordinary hardware:
• FP32 → FP16 → INT8 → INT4
• GGUF / GGML formats
• Pruning
• Distillation
• Running 7B models on laptops
The practical consequence of scaling laws.
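As a sketch of the FP32 → INT8 step, the code below applies symmetric per-tensor quantisation to a handful of weights and measures the round-trip error. The weights and function names are invented for illustration; formats such as GGUF use more elaborate block-wise schemes that are not shown here.

```python
def quantize_int8(weights):
    """Map floats to integers in [-127, 127] using a single scale factor."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale


def dequantize(q, scale):
    """Recover approximate float values from the stored integers."""
    return [v * scale for v in q]


weights = [0.12, -0.5, 0.33, 1.0, -0.98]   # toy values
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)
max_error = max(abs(w - r) for w, r in zip(weights, restored))

print(q)
print(f"scale={scale:.6f}  max round-trip error={max_error:.6f}")
```

Each weight now needs one byte instead of four, at the price of a small rounding error bounded by half the scale.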
------------------------------------------------------------
🖼️ 18. Diffusion Models
https://forum.drugs-and-users.org/index.php?topic=7359
What this section teaches:
How image generation works:
• Noise schedules
• Denoising
• Latent diffusion
• Classifier and classifier-free guidance
• Why Stable Diffusion works
A different generative approach from autoregressive LLMs, although diffusion models can themselves use transformer backbones.
------------------------------------------------------------
🔮 19. The Future of AI
https://forum.drugs-and-users.org/index.php?topic=7361
What this section teaches:
A forward-looking synthesis:
• Frontier model trends
• Agentic systems
• Multi-modal unification
• Long-context architectures
• Self-improving systems
A realistic view of what’s coming next.