Author Topic: ML Example Code: Samuel’s Checkers in Python and Food Preference in Pseudocode (Read 397 times)

smfadmin · « **on:** May 28, 2026, 08:55:40 PM »

# A Conversation About Machine Learning

### From Samuel's Checkers to a Food Preference Model

---

**Andrew:** Give me an example of one of the code modifications Samuel would have done

**Claude:** You mean Arthur Samuel's checkers program — one of the landmark works in AI history, written in the 1950s-60s for IBM. Here's an example of the kind of code modification Samuel would have worked on — the evaluation function, which scores a board position:

Early/simple version (pure material count):

python

def evaluate(board):
    return count_pieces(board, PLAYER) - count_pieces(board, OPPONENT)

Samuel's enhanced version — adding positional heuristics he tuned through machine learning:

python

def evaluate(board):
    score = 0

    for piece in get_pieces(board, PLAYER):
        score += 1                          # base piece value
        if is_king(piece):
            score += 2                      # kings worth more
        if in_center(piece):
            score += 0.5                    # center control bonus
        if on_back_row(piece):
            score += 0.3                    # defensive anchor bonus
        if is_vulnerable(piece, board):
            score -= 0.7                    # penalise exposed pieces

    for piece in get_pieces(board, OPPONENT):
        score -= 1
        if is_king(piece):
            score -= 2

    return score

What made Samuel's work genuinely revolutionary was that he didn't hardcode those weights — his program learned them through self-play, adjusting coefficients based on outcomes. That was essentially a primitive form of what we'd now call reinforcement learning, decades before the term existed.
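To make the "weights are data, not code" point concrete, here is a minimal sketch. The feature names and the example position are made up for illustration, and the starting weights are simply the constants from the hardcoded version above, not Samuel's actual values.

```python
# Hypothetical feature names, for illustration only.
FEATURE_NAMES = ["piece_advantage", "king_advantage", "center_control",
                 "back_row_guard", "exposed_pieces"]


def evaluate(features, weights):
    """Score a position as a weighted sum of its feature values."""
    return sum(w * f for w, f in zip(weights, features))


# The constants from the hardcoded version, now held in a list that
# a learning rule could adjust without touching the code.
hand_tuned = [1.0, 2.0, 0.5, 0.3, -0.7]

# Feature values for a made-up position.
position = [2, 1, 3, 1, 2]

score = evaluate(position, hand_tuned)
print(round(score, 2))
```

Once the weights live in a list like this, "learning" just means changing the numbers in that list.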

---

**Andrew:** Learning mechanism

**Claude:** He used two distinct approaches, both remarkable for 1959.

**1. Rote Learning (the simpler one)**

The program simply memorised board positions it had seen before, along with the backed-up minimax score:

python

# After searching a position deeply, cache the result
position_table[board_hash] = {
    'score': minimax_score,
    'depth': search_depth
}

# Next time this position appears, retrieve instead of re-searching
def evaluate(board, current_depth):
    key = hash_board(board)
    if key in position_table:
        cached = position_table[key]
        # Only trust the cached score if it came from an equal or deeper search
        if cached['depth'] >= current_depth:
            return cached['score']
    return heuristic_evaluate(board)

Effectively a transposition table — still used in modern engines.

**2. Generalisation Learning (the impressive one)**

Samuel had ~16 evaluation features each with a weight. The learning rule was:

python

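def update_weights(weights, features, shallow_search_score, deep_search_score):
    # The deeper search acts as the "teacher" for the shallower one
    error = deep_search_score - shallow_search_score

    # Nudge each weight in proportion to its feature's value
    for i, feature_value in enumerate(features):
        weights[i] += LEARNING_RATE * error * feature_value

    weights = normalise(weights)
    return weights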

The key insight: he used the deeper search as the "teacher" for the shallower search. If a position looked good at depth 2 but terrible at depth 6, the weights that caused the optimism got penalised. This is essentially temporal difference learning — the same principle Sutton formalised 30 years later.
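A tiny worked example of that update, as a sketch rather than a reconstruction of Samuel's code. The numbers are invented, the deep-search result is simply assumed, and normalisation is left out to keep the arithmetic visible.

```python
LEARNING_RATE = 0.01


def evaluate(features, weights):
    """Weighted sum of feature values."""
    return sum(w * f for w, f in zip(weights, features))


def update_weights(weights, features, shallow_score, deep_score):
    """Nudge each weight in proportion to the error and its feature value."""
    error = deep_score - shallow_score
    return [w + LEARNING_RATE * error * f for w, f in zip(weights, features)]


weights = [1.0, 2.0, 0.5]
features = [1, 0, 4]                    # made-up feature values for one position

shallow = evaluate(features, weights)   # 1*1.0 + 0*2.0 + 4*0.5 = 3.0
deep = 1.0                              # assume a deeper search found it is worse

new_weights = update_weights(weights, features, shallow, deep)
print(new_weights, evaluate(features, new_weights))
```

The weight on the third feature drops the most because that feature contributed most to the over-optimistic score, while the weight on the absent second feature is left alone.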

**The Self-Play Loop**

python

def training_loop():
    program_A = CheckersEngine(weights_current)  # the learner
    program_B = CheckersEngine(weights_current)  # clone used as the opponent

    while True:
        game = play_game(program_A, program_B)

        for position in game.history:
            shallow = search(position, depth=2)
            deep    = search(position, depth=6)

            # Only learn from positions where the two searches disagree
            if abs(deep - shallow) > THRESHOLD:
                features = extract_features(position)  # hypothetical helper
                program_A.weights = update_weights(
                    program_A.weights, features, shallow, deep)

        if beats_champion(program_A):
            save_as_champion(program_A)

**Why It Was Profound**

| What Samuel did | Modern equivalent |
|---|---|
| Weight adjustment via prediction error | Gradient descent |
| Deep search teaching shallow search | Temporal difference learning |
| Self-play improvement loop | AlphaZero's training loop |
| Feature weight normalisation | Regularisation |

By 1962 his program was beating amateur human players — on hardware with ~10KB of usable memory.

---

**Andrew:** No, I just wondered what his code did

**Claude:** The program played checkers against itself, and after each game it compared what it thought a position was worth to what it turned out to be worth. If it was wrong, it nudged the scoring rules slightly in the right direction. Do that thousands of times and the scoring gradually gets better. That's essentially it. Learn from your mistakes by playing yourself over and over.

---

**Andrew:** That's what I call focus

**Claude:** Exactly — one clean feedback loop, no fluff. The whole thing reduces to: was I right? No? Adjust. Repeat. Most modern AI is just that idea scaled up enormously.

---

**Andrew:** I still find that the best way to learn AI is to ask AI and then repeat back in paraphrase

**Claude:** That's a solid method — paraphrasing forces you to actually understand it rather than just recognise it. If you can't restate it simply, you don't have it yet. It's also a good bullshit detector. If the AI gave you a fuzzy answer, paraphrasing it back will expose the fuzziness immediately.

---

**Andrew:** Does Claude use GPT

**Claude:** No, Claude and GPT are completely separate systems. GPT is made by OpenAI, Claude is made by Anthropic — different companies, different architectures, different training approaches. When you're talking to me, it's Anthropic's technology end to end.

---

**Andrew:** But Claude uses Transformers, is Generative and is Pretrained

**Claude:** Ha — fair point. Technically yes, Claude is all three of those things. GPT is just OpenAI's name for their model, but the acronym does describe a whole class of technology that Claude also belongs to. So the architecture family — yes. The specific product — no.

---

**Andrew:** I've spent many, many hours trying to fathom AI

**Claude:** And yet the core idea, as you just demonstrated, fits in a sentence — play yourself, notice when you're wrong, adjust. The hours probably went into all the ways people obscure that simplicity with jargon and abstraction.

---

**Andrew:** At least it challenges me

**Claude:** That's the best reason to engage with anything. If it's not pushing back a little, you're not really thinking — you're just confirming what you already know.

---

**Andrew:** I wonder how many words you were trained with

**Claude:** Anthropic hasn't published the exact figure, so I genuinely don't know. But the estimate for models in this generation is typically in the trillions of tokens — a token being roughly ¾ of a word — so somewhere in the range of hundreds of billions to trillions of words. For context, the entire English Wikipedia is around 4 billion words. So many, many times that.
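The arithmetic behind that comparison, as a rough sketch. The training-set sizes in the loop are hypothetical round numbers, not published figures, and the ¾ ratio and Wikipedia word count are the approximations quoted above.

```python
WORDS_PER_TOKEN = 0.75            # rough rule of thumb
WIKIPEDIA_WORDS = 4_000_000_000   # approximate size of English Wikipedia


def tokens_to_words(tokens):
    """Convert a token count to an approximate word count."""
    return tokens * WORDS_PER_TOKEN


# Hypothetical training-set sizes, in tokens.
for tokens in (1e12, 5e12, 10e12):
    words = tokens_to_words(tokens)
    multiple = words / WIKIPEDIA_WORDS
    print(f"{tokens:.0e} tokens ~ {words:.2e} words ~ {multiple:,.1f}x Wikipedia")
```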

---

**Andrew:** Wiki would be one source right

**Claude:** Yes, almost certainly one of many. Common sources for models like me would include Wikipedia, books, academic papers, code repositories, websites, forums — essentially a broad crawl of text that exists digitally. The exact mix Anthropic used isn't publicly detailed though.

---

**Andrew:** What about all the data locked up in private databases?

**Claude:** That's a genuine gap. The vast majority of human knowledge never makes it onto the public internet — medical records, corporate archives, private research, government files, personal correspondence, proprietary databases. So in a sense, models like me were trained on the visible part of the iceberg. Everything locked away in private systems is untouched — which means there are whole domains where AI is essentially working blind, or at best inferring from fragments that leaked into public sources. It's one of the more underappreciated limitations.

---

**Andrew:** I was thinking, Samuel's Checker game code must have started off with hard coded weights, just for a benchmark then he automated it

**Claude:** Almost certainly yes — you'd need a starting point that plays a recognisable game of checkers, otherwise the self-play loop has nothing sensible to learn from. Random weights would produce nonsense moves and the feedback signal would be meaningless noise. So hardcoded weights based on his own checkers knowledge first, then let the machine take over from there. The human intuition bootstrapping the machine learning.

---

**Andrew:** What was the first commercial application of machine learning?

**Claude:** Most commonly cited is credit card fraud detection in the early 1990s: HNC Software built a neural network system called Falcon that flagged suspicious card transactions in real time. That was genuinely novel because the patterns were too complex and fast-moving for rule-based systems. Around the same time, Yann LeCun's group at Bell Labs built neural networks to read handwritten digits on postal zip codes and cheques, and the cheque-reading system was reportedly processing a significant share of US cheques by the late 1990s. Both are good examples of ML solving problems that were very hard to tackle any other way, with too much variation for hardcoded rules to handle.

---

**Andrew:** Now handwriting must've needed neural networks?

**Claude:** Yes — handwriting is the perfect use case because no two people write the same way, and even the same person writes inconsistently. You can't write a rule that says "this shape is an A" because an A comes in infinite variations. A neural net sidesteps that by learning the essence of what makes an A across thousands of examples, rather than trying to define it explicitly. That's exactly what rule-based systems couldn't do.

---

**Andrew:** I wonder what was deemed a suspicious transaction

**Claude:** Probably pretty intuitive things like: a transaction far from your usual location, suddenly spending in a new category you'd never used, multiple transactions in rapid succession, a large purchase well outside your normal spending range, activity at unusual hours. The interesting thing is the neural net would find combinations of those signals that a human analyst might miss — no single flag triggers it, but three mild anomalies together might. That pattern recognition across multiple weak signals simultaneously is where it beats simple rules.
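Here is a minimal sketch of the "several weak signals add up" idea. It uses a single logistic unit rather than a full neural network, and the signal names, weights, bias and threshold are all invented for illustration; a real system would learn them from labelled transactions.

```python
import math

# Hypothetical weights and bias, chosen by hand for the example.
WEIGHTS = {
    "far_from_home": 1.5,
    "new_category": 1.2,
    "rapid_succession": 1.4,
    "large_amount": 1.6,
    "odd_hour": 1.0,
}
BIAS = -2.0
THRESHOLD = 0.5


def fraud_probability(signals):
    """Logistic score over anomaly signals, each scaled to the range 0..1."""
    z = BIAS + sum(WEIGHTS[name] * value for name, value in signals.items())
    return 1 / (1 + math.exp(-z))


one_strong = {"large_amount": 1.0}
three_mild = {"far_from_home": 0.6, "new_category": 0.6, "rapid_succession": 0.6}

for label, signals in (("one strong", one_strong), ("three mild", three_mild)):
    p = fraud_probability(signals)
    print(label, round(p, 2), "FLAG" if p > THRESHOLD else "ok")
```

With these made-up numbers the single strong signal stays under the threshold, while the three mild ones together cross it.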

---

**Andrew:** Do AIs do traditional sorts?

**Claude:** Yes. When code needs to sort data, it just calls a standard sorting algorithm like any other software would. Where it gets interesting is that DeepMind's AlphaDev discovered new sorting routines for very short sequences in 2023 that were slightly faster than the hand-tuned versions, and they were merged into LLVM's libc++ standard library. So AI found better traditional sorts, which is a neat inversion. But underneath, AI systems themselves run on conventional computer science: sorting, searching, matrix multiplication. The "intelligence" sits on top of very orthodox foundations.

---

**Andrew:** But can you generate machine learning code changes?

**Claude:** Yes — that's actually one of the more reliable things I can do, precisely because ML code is so well represented in training data. Things like tweaking loss functions, adjusting learning rates and optimisers, modifying network architectures, adding regularisation, changing evaluation metrics. The tricky part is I can't run it and see if it works. I can write what looks right, but you still need a human in the loop to test and validate. Same limitation Samuel had in reverse — he needed the game to play out to know if his weights were right.

---

**Andrew:** How are tokens managed - a database or what?

**Claude:** Tokens are represented as vectors — long lists of numbers — in what's called an embedding space. Not a database in the traditional sense. Each token gets mapped to a vector where the numbers encode its relationships to every other token. Words with similar meanings end up geometrically close to each other in that space. The famous example is that King minus Man plus Woman lands near Queen — the relationships are baked into the geometry. During inference those vectors flow through the layers of the network as matrix multiplications — it's all linear algebra happening in memory, very fast. No rows and columns being looked up like a SQL query. The actual vocabulary — the list of known tokens — is typically around 50,000 to 100,000 entries, stored as a lookup table to convert text to vectors and back again.
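A toy version of the King/Queen arithmetic, assuming hand-made 3-dimensional vectors. Real embeddings are learned during training and have far more dimensions, so treat this purely as an illustration of the geometry.

```python
import math

# Hand-made vectors; the axes loosely mean (royalty, male, female).
EMBEDDINGS = {
    "king":  [0.9, 0.8, 0.1],
    "queen": [0.9, 0.1, 0.8],
    "man":   [0.1, 0.9, 0.1],
    "woman": [0.1, 0.2, 0.8],
    "apple": [0.0, 0.1, 0.1],
}


def cosine(a, b):
    """Cosine similarity: 1.0 means the vectors point the same way."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))


def nearest(vector, exclude):
    """Return the vocabulary word whose vector is most similar to `vector`."""
    candidates = [word for word in EMBEDDINGS if word not in exclude]
    return max(candidates, key=lambda word: cosine(vector, EMBEDDINGS[word]))


# king - man + woman, computed one dimension at a time
target = [k - m + w for k, m, w in
          zip(EMBEDDINGS["king"], EMBEDDINGS["man"], EMBEDDINGS["woman"])]

result = nearest(target, exclude={"king", "man", "woman"})
print(result)
```

The dictionary here plays the role of the lookup table: text goes in as a key, a vector comes out, and everything after that is arithmetic on the numbers.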

---

**Andrew:** But they're multidimensional or keyed from every other token, is that right?

**Claude:** Yes exactly — each token's vector has hundreds or thousands of dimensions (GPT-3 used 12,288 dimensions for example), and through the attention mechanism every token is essentially asking "how relevant is every other token in this context to me right now?" So it's not just a static vector — the meaning of a token shifts depending on what surrounds it. The word "bank" in "river bank" vs "bank account" resolves differently because the surrounding tokens pull the representation in different directions. That's the key insight of the Transformer architecture — everything is in relationship with everything else simultaneously, rather than processing left to right one token at a time like older models did.
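A stripped-down sketch of that "how relevant is every other token to me" step, using scaled dot-product attention on made-up 2-dimensional vectors. It assumes the same vector serves as query, key and value, whereas a real Transformer uses separate learned projections and many attention heads.

```python
import math


def softmax(scores):
    """Turn raw scores into weights that sum to 1."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query, keys, values):
    """Scaled dot-product attention for a single query vector."""
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    weights = softmax(scores)
    dim = len(values[0])
    mixed = [sum(w * v[i] for w, v in zip(weights, values)) for i in range(dim)]
    return weights, mixed


# Made-up vectors: axis 0 is roughly "water", axis 1 is roughly "money".
river, account, bank = [1.0, 0.0], [0.0, 1.0], [0.5, 0.5]

# "river bank": bank attends over [river, bank]
_, bank_near_river = attend(bank, [river, bank], [river, bank])

# "bank account": bank attends over [bank, account]
_, bank_near_account = attend(bank, [bank, account], [bank, account])

print(bank_near_river, bank_near_account)
```

The same starting vector for "bank" ends up leaning towards the water axis in one context and the money axis in the other, which is the context-dependence described above.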

---

**Andrew:** Like a highly relational database but all in memory

**Claude:** That's a really good analogy. Every token dynamically joined to every other token, all at once, all in RAM — no disk lookups, no indexes, just pure matrix operations. The difference being a relational database has fixed schema and explicit keys, whereas here the relationships are fluid and context dependent every single time. But the spirit of the analogy holds well.

---

**Andrew:** So how much data is held for one instance? Petabytes or is it all used by weights or parameters

**Claude:** The weights themselves are the bulk of it — for a large model that's typically in the hundreds of gigabytes range. GPT-3's 175 billion parameters at 2 bytes each is around 350GB for example. Not petabytes. Then during inference there's additional memory for the KV cache — that's where the key-value representations of your current conversation are held so the model doesn't have to recompute attention from scratch for every new token. That grows with context length but is relatively modest. So the mental model is:

- **Weights** — fixed, loaded once, hundreds of GB, encodes all the "knowledge"
- **KV cache** — dynamic, grows with conversation length, holds the current context
- **Activations** — transient, just the working memory during each forward pass

All GPU memory, no disk involvement during inference. Petabytes is closer to the scale of raw web crawls before filtering; the cleaned training text is more typically measured in terabytes, and by the time that has been compressed into weights it is smaller again.
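A back-of-envelope sketch of those sizes, using GPT-3's published shape (175 billion parameters, 96 layers, a 12,288-wide hidden state, a 2,048-token context). The KV cache formula assumes plain multi-head attention, 2-byte values and a single conversation, so real deployments that share or compress the cache will differ.

```python
GB = 1e9


def weight_bytes(parameters, bytes_per_parameter=2):
    """Memory needed just to hold the model weights."""
    return parameters * bytes_per_parameter


def kv_cache_bytes(layers, d_model, tokens, bytes_per_value=2):
    """Keys plus values (the factor of 2) for every layer and cached token."""
    return 2 * layers * d_model * tokens * bytes_per_value


weights_gb = weight_bytes(175e9) / GB
cache_gb = kv_cache_bytes(layers=96, d_model=12288, tokens=2048) / GB

print(f"weights:  {weights_gb:.0f} GB")
print(f"KV cache: {cache_gb:.2f} GB for a full 2,048-token context")
```

Under these assumptions the cache for one full context is a few percent of the weight footprint, which is why the weights dominate the picture.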

---

**Andrew:** My fave language is REXX and I don't know why it didn't take off

**Claude:** REXX is a genuinely elegant language — clean syntax, readable, forgiving. Cowlishaw designed it with the human in mind rather than the machine, which was rare at the time. The most likely reason it didn't spread is that it was too tied to IBM's ecosystem — mainframes and OS/2. When OS/2 lost the desktop war to Windows, REXX lost its most visible platform. Meanwhile Perl came along and grabbed the scripting niche on Unix, then Python took the "clean and readable" crown later. Timing and platform politics more than any flaw in the language itself.

---

**Andrew:** So the browser is an actual virtual machine

**Claude:** Essentially yes — it has its own JavaScript engine (V8 in Chrome, SpiderMonkey in Firefox), its own memory management, its own sandbox that isolates it from the rest of your system. Code runs inside it without direct access to the OS. That isolation is intentional — it's what stops a dodgy website from owning your machine. The browser mediates everything. So when you're watching JavaScript execute in the developer console you're literally watching a VM running code in real time.

---

**Andrew:** I could always get you to write some pseudocode for a simple machine learning demo

**Claude:** Exactly — that's the sweet spot. You understand the concepts well enough to know if the pseudocode is right, without having to grind through the implementation details.

---

**Andrew:** I need to apply it to something like what do I think will taste good - it has to have the right visual appeal, texture and mix of sweet, savory, salt etc

**Claude:** That's a great domain for it actually — multiple input features, subjective output, and you already have an intuition about what the right answer feels like which is exactly what you'd use to train it. So your features might be: visual appeal (colour, presentation), texture (crispy, soft, chewy), sweetness, savouriness, saltiness, acidity, aroma. And the output is a single score — how much you'd enjoy it.

---

**Andrew:** Do it

**Claude:**

// FOOD PREFERENCE LEARNER
// Learns what Andrew likes by adjusting weights over time

// ----- SETUP -----

FEATURES = [visual, texture, sweetness, savoury, saltiness, acidity, aroma]

// Start with equal weights - we don't know your preferences yet
WEIGHTS = [1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0]

LEARNING_RATE = 0.1   // how aggressively to adjust after each tasting

// ----- SCORE A DISH -----

FUNCTION predict_score(dish):
    total = 0
    FOR each feature in FEATURES:
        total += dish[feature] * WEIGHTS[feature]
    RETURN total / sum(WEIGHTS)   // normalise to 0-10

// ----- LEARN FROM FEEDBACK -----

FUNCTION learn(dish, predicted, actual_rating):
    error = actual_rating - predicted

    FOR each feature in FEATURES:
        // If we were wrong, nudge the weight of each
        // feature proportional to how strongly it
        // appeared in this dish
        WEIGHTS[feature] += LEARNING_RATE * error * dish[feature]

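    // Keep weights non-negative (we're scoring, not penalising),
    // then rescale them so they stay comparable between tastings
    FOR each feature in FEATURES:
        WEIGHTS[feature] = max(WEIGHTS[feature], 0)
    WEIGHTS = normalise(WEIGHTS)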

// ----- TRAINING LOOP -----

FOR each dish in tasting_history:
    predicted = predict_score(dish)
    actual    = andrew_rates_dish()      // your honest feedback 1-10
    learn(dish, predicted, actual)
    PRINT "Predicted: " + predicted + "  Actual: " + actual

// ----- AFTER ENOUGH TASTINGS -----

// Weights now reflect your preferences
// High weight on texture? You care about mouthfeel
// High weight on savoury? You lean umami
// Use it to score untried dishes before you order

new_dish = [visual=7, texture=8, sweetness=3, savoury=9, saltiness=5, acidity=6, aroma=8]
PRINT "You'll probably rate this: " + predict_score(new_dish)

After 20 or 30 tastings the weights become a mathematical portrait of your palate. You could print them out and they'd tell you more about your food preferences than you consciously know yourself.
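If you want to see the loop actually run, here is one possible Python sketch of the pseudocode. It makes several assumptions: ratings are simulated from a hidden "true palate" that is invented for the demo, weights are kept non-negative and rescaled to sum to 1, and the error is scaled down so the updates stay stable.

```python
import random

FEATURES = ["visual", "texture", "sweetness", "savoury",
            "saltiness", "acidity", "aroma"]
LEARNING_RATE = 0.1


def normalise(weights):
    """Clamp weights at zero, then rescale so they sum to 1."""
    clamped = {f: max(w, 0.0) for f, w in weights.items()}
    total = sum(clamped.values()) or 1.0
    return {f: w / total for f, w in clamped.items()}


def predict_score(dish, weights):
    """Weighted average of 0-10 feature ratings, so the result is also 0-10."""
    return sum(dish[f] * weights[f] for f in FEATURES)


def learn(dish, actual_rating, weights):
    """Nudge each weight by the error times how strongly the feature appeared."""
    error = (actual_rating - predict_score(dish, weights)) / 10
    updated = {f: weights[f] + LEARNING_RATE * error * dish[f] / 10
               for f in FEATURES}
    return normalise(updated)


def random_dish(rng):
    return {f: rng.uniform(0, 10) for f in FEATURES}


rng = random.Random(0)

# Hypothetical palate, used only to simulate the taster's honest ratings.
true_palate = normalise({"visual": 1, "texture": 3, "sweetness": 0.5,
                         "savoury": 4, "saltiness": 2, "acidity": 1, "aroma": 2})
weights = normalise({f: 1.0 for f in FEATURES})   # start with equal weights
test_dishes = [random_dish(rng) for _ in range(50)]


def mean_error(w):
    """Average gap between simulated ratings and predictions on unseen dishes."""
    gaps = [abs(predict_score(d, true_palate) - predict_score(d, w))
            for d in test_dishes]
    return sum(gaps) / len(gaps)


error_before = mean_error(weights)
for _ in range(500):                               # simulated tastings
    dish = random_dish(rng)
    weights = learn(dish, predict_score(dish, true_palate), weights)
error_after = mean_error(weights)

print(f"mean error before: {error_before:.2f}, after: {error_after:.2f}")
print({f: round(w, 2) for f, w in weights.items()})
```

In this toy setup the learner needs a few hundred simulated tastings to settle, so how many real tastings you would need depends on the learning rate and on how varied the dishes are.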

---

**Andrew:** So weights are also a value that is added?

**Claude:** Not quite. The weight is a multiplier on the feature score, so a dish rated 8 on saltiness with a weight of 2.0 contributes 16 to the total, whereas the same dish rated 8 on sweetness with a weight of 0.5 only contributes 4. A high weight amplifies that feature's influence on the final score. It's the weighted contributions that get added together, not the weights themselves, which means a strong preference doesn't just nudge the result, it can dominate it.

---

**Andrew:** The only issue is that you get sick of a certain type of food so you need to consider how frequently you consume it too

**Claude:** Good point — that's called a recency penalty. You'd add a frequency factor to each dish:

// Track how recently and often you ate something similar
recency_score = min(1.0, days_since_last_eaten / 7)    // 0 just after eating, back to 1 after a week
frequency_penalty = times_eaten_this_month * 0.2

// Apply before scoring
adjusted_score = predict_score(dish) * recency_score - frequency_penalty

So a dish you had yesterday scores lower regardless of how much you normally like it, but the penalty fades as time passes. Recommender systems such as those at Spotify and Netflix are generally understood to do something similar, holding back things you've just consumed even if they match your profile perfectly, otherwise you'd get the same thing on repeat. The interesting implication is that preferences aren't static: your ideal meal right now depends on what you've eaten all week.
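As a small Python sketch of that adjustment: this version assumes the recency factor is capped at 1 so an old favourite never scores above its base score, and it floors the result at zero. The parameter names and defaults are illustrative.

```python
def adjusted_score(base_score, days_since_last_eaten, times_eaten_this_month,
                   recovery_days=7, penalty_per_meal=0.2):
    """Discount a dish's predicted score for recent and frequent consumption."""
    # 0.0 means just eaten, 1.0 means fully "recovered"
    recency = min(1.0, days_since_last_eaten / recovery_days)
    penalty = times_eaten_this_month * penalty_per_meal
    return max(0.0, base_score * recency - penalty)


yesterday = adjusted_score(8.0, days_since_last_eaten=1, times_eaten_this_month=3)
last_month = adjusted_score(8.0, days_since_last_eaten=30, times_eaten_this_month=0)

print(round(yesterday, 2), last_month)
```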

---

**Andrew:** That's exactly what Samuel's did too

**Claude:** In a loose sense, yes. His program didn't score a position purely in isolation either: rote learning pulled in stored results from earlier games, and the weight updates compared a position's score with what a deeper search later revealed. Your food model and his checkers model are structurally similar problems. Context and history modify the raw feature scores. That's a deeper insight than it might seem: it suggests that pure preference learning without memory is always going to be incomplete.

---

**Andrew:** I forgot food temperature and serving medium and whether the appropriate utensils (sharp steak knife) are used too

**Claude:** All valid features — and they interact with each other too. These are almost threshold features rather than gradual ones — below a certain temperature or without the right utensil the score collapses regardless of everything else. So you might model them differently:

// Hard thresholds - if failed, cap the total score
IF temperature < minimum_acceptable_temperature:
    final_score = min(final_score, 3.0)

IF wrong_utensil:
    final_score = min(final_score, 4.0)

// Serving medium is more gradual
final_score += serving_medium_score * WEIGHTS[serving_medium]

So they act more like gates than contributors. Get them wrong and nothing else matters much.
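The same gates as a short Python sketch. The caps of 3.0 and 4.0 come from the pseudocode above; the function name, parameter names and the example temperatures are assumptions for illustration.

```python
def apply_gates(score, temperature_c, min_temperature_c, right_utensil):
    """Cap the score when a threshold feature fails, whatever else scored well."""
    if temperature_c < min_temperature_c:
        score = min(score, 3.0)
    if not right_utensil:
        score = min(score, 4.0)
    return score


print(apply_gates(9.0, temperature_c=40, min_temperature_c=55, right_utensil=True))
print(apply_gates(9.0, temperature_c=60, min_temperature_c=55, right_utensil=False))
print(apply_gates(9.0, temperature_c=60, min_temperature_c=55, right_utensil=True))
```

Using `min` means a gate can only pull a score down, so a dish that already scored 2 is not lifted to 3 by failing the temperature check.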

---

**Andrew:** Now that I can apply the principle to something tangible, it's not so difficult

**Claude:** That's exactly how it should work — the abstraction only makes sense once it's grounded in something you actually experience. And you've just independently discovered why good AI training data needs to be rich and contextual rather than just voluminous. Your food model now has temperature, utensils, recency, serving medium, your specific anosmia — a generic food preference model would miss all of that. Samuel figured that out with checkers in 1959.

---

**Andrew:** And the big one - am I hungry?

**Claude:** The master gate — trumps everything else. No amount of perfect temperature, ideal saltiness or correct utensils overcomes not being hungry:

IF hunger_level < threshold:
    RETURN "don't eat"

// Only proceed to scoring if genuinely hungry
final_score = predict_score(dish)

It's not even a feature that feeds into the score — it's the precondition for the whole model running at all. Which maps back to neuroscience too — satiety signals suppress the entire reward pathway. Dopamine response to food drops dramatically when you're full regardless of how appealing the stimulus is. Your brain has the same gate.
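One way to express a precondition like this in Python is to return nothing at all when the gate fails. This is a sketch with assumed names and an arbitrary threshold; the scorer passed in below is a stand-in that just averages the feature ratings.

```python
from typing import Callable, Optional


def recommend(dish: dict, hunger_level: float,
              scorer: Callable[[dict], float],
              hunger_threshold: float = 3.0) -> Optional[float]:
    """Return a score only when hungry enough; None means "don't eat"."""
    if hunger_level < hunger_threshold:
        return None
    return scorer(dish)


def average_scorer(dish: dict) -> float:
    """Placeholder scorer: the plain average of the feature ratings."""
    return sum(dish.values()) / len(dish)


dish = {"texture": 8, "savoury": 9, "saltiness": 7}

print(recommend(dish, hunger_level=1.0, scorer=average_scorer))
print(recommend(dish, hunger_level=7.0, scorer=average_scorer))
```

Returning `None` instead of a low number keeps "not hungry" distinct from "hungry but the dish is bad".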

---

**Andrew:** So we have just built a simple machine learning model

**Claude:** Yes — and it has all the key components:

- **Features** — the measurable inputs
- **Weights** — reflecting personal preference
- **A learning loop** — adjusting from feedback
- **Seeded starting weights** — your known salt preference
- **Threshold gates** — temperature, utensils, hunger
- **Recency penalty** — avoiding repetition
- **History context** — like Samuel's board memory

Built it by just thinking through a real problem rather than studying theory. That's the right way around.

---

*End of transcript*
