Me to AI:
Show me how that same network is written in about 20 lines of Python
AI:
Here’s a minimal “3Blue1Brown-style” neural network in ~20 lines of pure NumPy. It’s a tiny 2-layer network (one hidden layer of 4 units, one output unit) trained on the XOR problem, the classic case where no single straight line can separate the two classes.
Python
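import numpy as np

# data (XOR): four input pairs and their target outputs
X = np.array([[0,0],[0,1],[1,0],[1,1]])
y = np.array([[0],[1],[1],[0]])

# weights: 2 inputs -> 4 hidden units -> 1 output
W1 = np.random.randn(2,4)
b1 = np.zeros((1,4))
W2 = np.random.randn(4,1)
b2 = np.zeros((1,1))

sigmoid = lambda x: 1/(1+np.exp(-x))

for i in range(10000):
    # forward pass
    h = sigmoid(X @ W1 + b1)
    out = sigmoid(h @ W2 + b2)

    # loss gradient (derivative of 0.5 * squared error w.r.t. out)
    err = out - y

    # backprop (manual chain rule)
    d_out = err * out * (1 - out)      # gradient at the output pre-activation
    dW2 = h.T @ d_out
    db2 = np.sum(d_out, axis=0, keepdims=True)
    dh = d_out @ W2.T                  # gradient flowing back into h
    d_hid = dh * h * (1 - h)           # gradient at the hidden pre-activation
    dW1 = X.T @ d_hid
    db1 = np.sum(d_hid, axis=0, keepdims=True)

    # update (learning rate 0.1)
    W2 -= 0.1 * dW2
    b2 -= 0.1 * db2
    W1 -= 0.1 * dW1
    b1 -= 0.1 * db1

# rounded predictions; the weights start random, so an unlucky run
# may need more iterations or a rerun to reach 0, 1, 1, 0
print(np.round(out))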
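One way to sanity-check the hand-written backprop above is a finite-difference gradient check: nudge each weight by a tiny amount, see how the loss changes, and compare that to the chain-rule gradient. The sketch below assumes the loss is 0.5 × the summed squared error, which is the loss whose gradient is `out - y`; the original code never names its loss. The helper names `loss`, `analytic_grads`, and `numeric_grad` and the fixed seed are illustrative additions.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed, only so the check is repeatable
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def loss(W1, b1, W2, b2):
    # Assumed loss: 0.5 * sum of squared errors (its gradient w.r.t. out is out - y)
    h = sigmoid(X @ W1 + b1)
    out = sigmoid(h @ W2 + b2)
    return 0.5 * np.sum((out - y) ** 2)

def analytic_grads(W1, b1, W2, b2):
    # Same chain-rule formulas as the training loop
    h = sigmoid(X @ W1 + b1)
    out = sigmoid(h @ W2 + b2)
    d_out = (out - y) * out * (1 - out)
    d_hid = (d_out @ W2.T) * h * (1 - h)
    return (X.T @ d_hid, d_hid.sum(axis=0, keepdims=True),
            h.T @ d_out, d_out.sum(axis=0, keepdims=True))

def numeric_grad(params, index, eps=1e-6):
    # Central finite difference of the loss w.r.t. params[index]
    p = params[index]
    grad = np.zeros_like(p)
    for idx in np.ndindex(p.shape):
        old = p[idx]
        p[idx] = old + eps
        up = loss(*params)
        p[idx] = old - eps
        down = loss(*params)
        p[idx] = old  # restore the original value
        grad[idx] = (up - down) / (2 * eps)
    return grad

params = [rng.standard_normal((2, 4)), np.zeros((1, 4)),
          rng.standard_normal((4, 1)), np.zeros((1, 1))]
max_diff = max(
    np.max(np.abs(a - numeric_grad(params, i)))
    for i, a in enumerate(analytic_grads(*params))
)
print(f"largest analytic-vs-numeric gap: {max_diff:.2e}")
```

If the two gradients agree to within numerical noise, the manual chain rule is consistent with that loss; a large gap would point to a sign or transpose mistake.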
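What this is doing (in plain terms):
X → inputs (the four 0/1 pairs); y → the XOR answers we want
W1, W2, b1, b2 → “knowledge knobs” (weights and biases) that training adjusts
sigmoid → squashes any value into the range 0 to 1, so the output can be read as a probability
forward pass → the network’s current “guess”
err → the guess minus the target: how wrong it is, and in which direction
backprop → the chain rule, working out how much each weight contributed to the mistake
update → each step moves the weights by 0.1 × their gradient, slowly pushing the system toward correct XOR behavior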
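To make the “knobs” concrete, here is a sketch with weights set by hand instead of learned. It assumes only 2 hidden units (the network above uses 4) and uses large illustrative values so each sigmoid behaves almost like a switch: one hidden unit acts roughly like OR, the other like NAND, and the output combines them like AND, which together give XOR. These particular numbers are one workable choice, not what training would necessarily find.

```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])

# Hand-picked, illustrative weights (an assumption, not learned values)
W1 = np.array([[20.0, -20.0],
               [20.0, -20.0]])
b1 = np.array([[-10.0, 30.0]])   # hidden unit 0 ~ OR, hidden unit 1 ~ NAND
W2 = np.array([[20.0], [20.0]])
b2 = np.array([[-30.0]])         # output ~ AND of the two hidden units

h = sigmoid(X @ W1 + b1)         # forward pass, same shape of computation as above
out = sigmoid(h @ W2 + b2)
xor_pred = np.round(out).astype(int).ravel()
print(xor_pred)  # [0 1 1 0]
```

Training is a search for knob settings that work like these, starting from random values.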
Reality check
This is literally the core idea behind modern deep learning:
multiply → squish → measure error → nudge weights → repeat
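Larger models (GPT, vision models, etc.) are trained with this same loop, scaled up massively, with other layer types and with the gradients computed automatically rather than by hand.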
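As a rough sketch of what “scaled up” means at the code level, the same multiply → squish → measure error → nudge loop can be written for any number of layers by keeping the weights in a list. The function name `train`, its parameters, the fixed seed, and the `[2, 4, 4, 1]` layer sizes are illustrative assumptions; it keeps the sigmoid activation, squared-error gradient, and 0.1 learning rate from the code above, and real frameworks differ in many details.

```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def train(X, y, layer_sizes, steps=5000, lr=0.1, seed=0):
    """Full-batch gradient descent on a sigmoid network of arbitrary depth."""
    rng = np.random.default_rng(seed)
    Ws = [rng.standard_normal((m, n))
          for m, n in zip(layer_sizes[:-1], layer_sizes[1:])]
    bs = [np.zeros((1, n)) for n in layer_sizes[1:]]
    losses = []
    for _ in range(steps):
        # multiply -> squish, keeping every layer's activation for backprop
        acts = [X]
        for W, b in zip(Ws, bs):
            acts.append(sigmoid(acts[-1] @ W + b))
        # measure error
        err = acts[-1] - y
        losses.append(0.5 * float(np.sum(err ** 2)))
        # backprop from the last layer to the first, nudging as we go
        delta = err * acts[-1] * (1 - acts[-1])
        for i in reversed(range(len(Ws))):
            dW = acts[i].T @ delta
            db = delta.sum(axis=0, keepdims=True)
            if i > 0:
                # pass the gradient back using the weights before they change
                delta = (delta @ Ws[i].T) * acts[i] * (1 - acts[i])
            Ws[i] -= lr * dW
            bs[i] -= lr * db
    return Ws, bs, losses

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)
Ws, bs, losses = train(X, y, layer_sizes=[2, 4, 4, 1])
print(f"loss: {losses[0]:.3f} -> {losses[-1]:.3f}")
```

Changing `layer_sizes` is the only thing needed to add or widen layers; the loop itself stays the same.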