Machine Learning Foundations Map
See also ML Example Code: Samuel’s Checkers in Python and Food Preference in Pseudocode
1) Core Goal
All machine learning is about:
- Approximating a function f(x)
- Or choosing actions that maximize reward over time
2) Three Fundamental Paradigms
Neural Computation (Representation Learning)
Perceptron → Deep Nets → Transformers
- Learns mappings from inputs to outputs
- Optimised via gradient descent
- Builds internal representations of data
Reward-Driven Learning (Decision Making)
Samuel → Temporal Difference Learning → RL → Deep RL → AlphaZero
- Learns via reward signals from environment
- Optimises long-term outcomes
- Includes exploration and trial-and-error learning
Structure Learning (Unsupervised Inference)
Clustering → PCA → Bayesian methods
- Finds hidden structure in data
- No labels or rewards required
- Learns latent variables, similarity, and density structure
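As a small illustration of finding structure without labels or rewards, here is a minimal sketch of k-means clustering on one-dimensional data. The function name `kmeans_1d`, the sample points, and the starting centers are all assumptions made for this example; they do not come from the notes.

```python
def kmeans_1d(points, centers, iterations=10):
    """Alternate between assigning points to the nearest center and
    moving each center to the mean of its assigned points."""
    clusters = [[] for _ in centers]
    for _ in range(iterations):
        clusters = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)), key=lambda i: abs(p - centers[i]))
            clusters[nearest].append(p)
        # An empty cluster keeps its previous center.
        centers = [sum(c) / len(c) if c else centers[i]
                   for i, c in enumerate(clusters)]
    return centers, clusters

# Hypothetical unlabeled data with two visible groups.
points = [1.0, 1.2, 0.8, 5.0, 5.3, 4.7]
centers, clusters = kmeans_1d(points, centers=[0.0, 6.0])
```

No target values are supplied anywhere; the grouping comes only from distances between the points themselves.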
3) Why These Paradigms Exist
- Single-layer perceptrons failed on non-linearly separable problems → needed non-linear models → deep networks
- Rule-based systems failed → needed learning from experience → RL
- Labels are expensive → needed unsupervised structure discovery
4) Modern AI = Hybrid Stack
Most real AI systems combine:
- Neural computation (model capacity)
- Reward learning (decision optimisation)
- Structure learning (data organisation / embeddings)
5) Key Unification Idea
learn representations + optimise objectives + exploit data structure
Machine Learning Evolution Timeline
1) 1950s–1960s: Foundations
Perceptron (Rosenblatt, 1957)
- Early supervised learning model
- Binary linear classifier
- Computes weighted sum of inputs then applies threshold
- Learns by adjusting weights when predictions are wrong
Core idea: decision boundaries can be learned from labeled data
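The weighted sum, threshold, and error-driven weight adjustment described above can be sketched as follows. This is a minimal illustration, not Rosenblatt's original formulation: the function names, the learning rate, and the choice of the AND function as training data are assumptions.

```python
def predict(weights, bias, x):
    """Weighted sum of the inputs followed by a hard threshold at zero."""
    total = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1 if total > 0 else 0

def train_perceptron(samples, epochs=20, lr=1.0):
    """Perceptron rule: weights change only when a prediction is wrong."""
    weights = [0.0] * len(samples[0][0])
    bias = 0.0
    for _ in range(epochs):
        mistakes = 0
        for x, target in samples:
            error = target - predict(weights, bias, x)  # -1, 0, or +1
            if error != 0:
                mistakes += 1
                weights = [w + lr * error * xi for w, xi in zip(weights, x)]
                bias += lr * error
        if mistakes == 0:  # every sample classified correctly
            break
    return weights, bias

# Labeled data for logical AND, which is linearly separable.
AND_DATA = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
weights, bias = train_perceptron(AND_DATA)
and_predictions = [predict(weights, bias, x) for x, _ in AND_DATA]
```

The learned `weights` and `bias` define a straight-line decision boundary, which is why this model works for AND but cannot represent functions that are not linearly separable.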
Samuel’s Checkers Program (1959)
- Early reinforcement learning-style system
- Plays games against itself (self-play)
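- Learns mainly by adjusting position evaluations toward the results of deeper look-ahead search, rather than from an explicit win/loss reward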
- Improves a hand-crafted evaluation function over time
Core idea: systems can improve through experience rather than fixed rules
2) 1960s–1980s: Limits & Theory Gap
- Single-layer perceptrons fail on non-linearly separable problems such as XOR
- Neural network research declines
- Reinforcement learning exists mainly as Markov decision process (MDP) theory
3) 1980s–1990s: Revival
Backpropagation (1986)
- Enables training of multi-layer neural networks
- Solves XOR via hidden layers
- Core mechanism behind modern deep learning
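To make the role of the hidden layer concrete, here is a minimal sketch of backpropagation on XOR with a 2-3-1 network. The layer sizes, tanh and sigmoid activations, cross-entropy loss, learning rate, and starting weights are all assumptions chosen for illustration.

```python
import math

# XOR truth table: inputs and target outputs.
XOR_DATA = [((0.0, 0.0), 0.0), ((0.0, 1.0), 1.0),
            ((1.0, 0.0), 1.0), ((1.0, 1.0), 0.0)]

def forward(params, x):
    """Return (hidden activations, output probability) for one input."""
    w_hidden, b_hidden, w_out, b_out = params
    hidden = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
              for row, b in zip(w_hidden, b_hidden)]
    logit = sum(w * h for w, h in zip(w_out, hidden)) + b_out
    return hidden, 1.0 / (1.0 + math.exp(-logit))

def mean_loss(params):
    """Average binary cross-entropy over the four XOR cases."""
    total = 0.0
    for x, y in XOR_DATA:
        _, p = forward(params, x)
        p = min(max(p, 1e-12), 1.0 - 1e-12)  # guard against log(0)
        total -= y * math.log(p) + (1.0 - y) * math.log(1.0 - p)
    return total / len(XOR_DATA)

def train(params, epochs=2000, lr=0.5):
    """Full-batch gradient descent with gradients from backpropagation."""
    w_hidden, b_hidden, w_out, b_out = params
    n = len(XOR_DATA)
    for _ in range(epochs):
        g_wh = [[0.0] * len(row) for row in w_hidden]
        g_bh = [0.0] * len(b_hidden)
        g_wo = [0.0] * len(w_out)
        g_bo = 0.0
        for x, y in XOR_DATA:
            hidden, p = forward((w_hidden, b_hidden, w_out, b_out), x)
            delta_out = p - y  # loss gradient at the output logit
            g_bo += delta_out
            for j, h in enumerate(hidden):
                g_wo[j] += delta_out * h
                # Chain rule: back through the output weight and the tanh unit.
                delta_h = delta_out * w_out[j] * (1.0 - h * h)
                g_bh[j] += delta_h
                for i, xi in enumerate(x):
                    g_wh[j][i] += delta_h * xi
        w_hidden = [[w - lr * g / n for w, g in zip(row, g_row)]
                    for row, g_row in zip(w_hidden, g_wh)]
        b_hidden = [b - lr * g / n for b, g in zip(b_hidden, g_bh)]
        w_out = [w - lr * g / n for w, g in zip(w_out, g_wo)]
        b_out -= lr * g_bo / n
    return w_hidden, b_hidden, w_out, b_out

# Small starting weights chosen arbitrarily for this illustration.
initial = ([[0.5, -0.4], [-0.3, 0.6], [0.2, 0.1]], [0.0, 0.0, 0.0],
           [0.3, -0.2, 0.1], 0.0)
trained = train(initial)
initial_loss, final_loss = mean_loss(initial), mean_loss(trained)
outputs = [forward(trained, x)[1] for x, _ in XOR_DATA]
```

Comparing `initial_loss` with `final_loss` shows whether training helped, and rounding `outputs` shows whether the network reached the XOR pattern. How closely it does so depends on the starting weights and learning rate.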
Temporal Difference Learning (1988)
- Bridges Samuel-style learning with formal reinforcement learning
- Learns from prediction errors over time
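Learning from prediction errors can be sketched with the TD(0) value update, which moves the estimate for the current state toward the reward plus the discounted estimate for the next state. The state names, step size `alpha`, and discount `gamma` below are hypothetical values for illustration.

```python
def td0_update(values, state, reward, next_state, alpha=0.1, gamma=0.9):
    """Apply one TD(0) update in place and return the prediction error."""
    # Error between the bootstrapped target and the current estimate.
    td_error = reward + gamma * values[next_state] - values[state]
    values[state] += alpha * td_error
    return td_error

# Hypothetical two-state example: moving from A to B yields reward 0.5.
values = {"A": 0.0, "B": 1.0}
error = td0_update(values, "A", reward=0.5, next_state="B")
```

The update needs no final outcome: the estimate for state A is corrected as soon as the next state is observed.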
4) 1990s–2010s: Function Approximation Era
- Q-learning becomes widely used in reinforcement learning
- Neural networks approximate value functions
- TD-Gammon demonstrates strong game-playing performance
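For reference, the tabular form of the Q-learning update can be sketched as below. A table stands in for the neural network that would approximate the values in larger problems. The state and action names, `alpha`, and `gamma` are assumptions for this example.

```python
from collections import defaultdict

def q_learning_update(q, state, action, reward, next_state, actions,
                      alpha=0.5, gamma=0.9):
    """Move Q(state, action) toward reward + gamma * max over next actions."""
    best_next = max(q[(next_state, a)] for a in actions)
    target = reward + gamma * best_next
    q[(state, action)] += alpha * (target - q[(state, action)])
    return q[(state, action)]

ACTIONS = ("left", "right")
q = defaultdict(float)        # unseen (state, action) pairs start at 0.0
q[("s1", "right")] = 2.0      # hypothetical value learned earlier
new_value = q_learning_update(q, "s0", "right", reward=1.0,
                              next_state="s1", actions=ACTIONS)
```

Using the maximum over next actions, rather than the action actually taken, is what makes Q-learning an off-policy method.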
5) 2010s: Deep Learning Era
Pretraining (Core LLM Stage)
Pretraining is large-scale self-supervised learning where a model is trained on massive text corpora to predict the next token.
Pretraining is the foundational training stage that enables all downstream instruction tuning and RLHF. RLHF is a post-pretraining alignment step that uses human preference rankings to fine-tune model behaviour.
- Learns statistical structure of language and world knowledge
- Uses next-token prediction as the training objective
- Requires no human labels or explicit instructions
- Forms the base capability of modern language models
Core idea: learn general representations from raw data via prediction
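The next-token objective can be illustrated at toy scale with a bigram count model. Real language models use a neural network in place of the count table, so this is only a sketch of the objective itself. The corpus and function names are assumptions.

```python
import math
from collections import Counter, defaultdict

def train_bigram(tokens):
    """Count how often each token follows each other token."""
    counts = defaultdict(Counter)
    for current, nxt in zip(tokens, tokens[1:]):
        counts[current][nxt] += 1
    return counts

def next_token_probs(counts, current):
    """Turn the follower counts for one token into probabilities."""
    total = sum(counts[current].values())
    return {tok: c / total for tok, c in counts[current].items()}

def average_nll(counts, tokens):
    """Average negative log-likelihood of each observed next token.
    Assumes every bigram in `tokens` was seen during counting."""
    losses = [-math.log(next_token_probs(counts, current)[nxt])
              for current, nxt in zip(tokens, tokens[1:])]
    return sum(losses) / len(losses)

# Hypothetical toy corpus; the text supplies its own targets, with no labels.
corpus = "the cat sat on the mat the cat ran".split()
counts = train_bigram(corpus)
probs_after_the = next_token_probs(counts, "the")
loss = average_nll(counts, corpus)
```

Here `probs_after_the` assigns "cat" twice the probability of "mat", because that is how often each follows "the" in the corpus. The average negative log-likelihood is the quantity a neural language model would minimise by gradient descent.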
RNNs (Recurrent Neural Networks)
Process sequential data using a hidden memory state across time steps.
CNNs (Convolutional Neural Networks)
Process spatial data using learned filters that detect edges, textures, and patterns.
Transformers
Use attention mechanisms instead of recurrence/convolution, enabling parallel processing and long-range dependency modeling.
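A minimal sketch of scaled dot-product attention, the core operation in Transformers, is shown below. Real models first pass the inputs through learned query, key, and value projections; here the raw token vectors are reused for all three, which is a simplifying assumption, as are the example vectors.

```python
import math

def softmax(scores):
    """Convert scores into positive weights that sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention(queries, keys, values):
    """Each query attends to every key and returns a weighted mix of values."""
    scale = math.sqrt(len(keys[0]))
    outputs, all_weights = [], []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / scale for k in keys]
        weights = softmax(scores)
        mixed = [sum(w * v[d] for w, v in zip(weights, values))
                 for d in range(len(values[0]))]
        outputs.append(mixed)
        all_weights.append(weights)
    return outputs, all_weights

# Three hypothetical 2-dimensional token vectors (self-attention).
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
outputs, weights = attention(tokens, tokens, tokens)
```

Each query's row is computed independently of the others, so all positions can be processed in parallel. Any position can also draw on any other regardless of distance.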
- CNNs, RNNs, and Transformers scale representation learning
- The perceptron's weighted-sum unit, with a differentiable activation in place of the hard threshold, becomes the basic neuron in deep networks
- Supervised learning dominates early deep learning success
6) 2013–2016: Deep Reinforcement Learning
- DQN combines Q-learning with deep neural networks
- Learns directly from raw inputs such as images
7) 2017–Present: Self-Play Systems
AlphaZero / MuZero
- Self-play reinforcement learning
- Deep neural networks combined with search
- No human training data required
2017–Present: Alignment & RLHF Era
- Instruction tuning trains models to follow prompts
- RLHF uses human preference rankings to shape outputs
- Optimises behaviour: helpfulness, safety, tone, refusal style
- Does not primarily increase intelligence, but aligns model behaviour
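One common way to turn human preference rankings into a training signal is a pairwise loss on a reward model's scores: the loss is small when the preferred response scores higher than the rejected one. The notes do not say which formulation is meant, so treat this as an assumed example. The scores are made-up numbers.

```python
import math

def preference_loss(reward_chosen, reward_rejected):
    """Pairwise loss -log(sigmoid(chosen - rejected)),
    computed as log(1 + exp(-margin))."""
    margin = reward_chosen - reward_rejected
    return math.log1p(math.exp(-margin))

# Hypothetical reward-model scores for a preferred and a rejected response.
loss_agree = preference_loss(2.0, 0.0)     # model agrees with the human ranking
loss_tie = preference_loss(1.0, 1.0)       # model cannot tell them apart
loss_disagree = preference_loss(0.0, 2.0)  # model prefers the rejected response
```

Minimising this loss pushes the reward model to score human-preferred responses higher. Those scores can then serve as the reward when fine-tuning the language model.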
Two Parallel Streams
Representation Learning Stream
Perceptron → Backpropagation → Deep Learning → Transformers
Reward Learning Stream
Samuel’s Checkers → Temporal Difference Learning → Reinforcement Learning → Deep RL → AlphaZero
Key Insight
Modern AI is the combination of:
- Neural computation (representation learning)
- Reward-driven learning (decision-making)
- Alignment methods (RLHF shaping behaviour)
Core Machine Learning Paradigms (Summary)
Neural computation learns representations via data-driven optimisation.
Reward-driven learning optimises actions via environmental feedback.
Structure learning discovers latent patterns without labels or rewards.
Modern AI systems are typically hybrids combining multiple paradigms.