dopetalk does not endorse any advertised product nor does it accept any liability for its use or misuse


Our Discord Notification Server invitation link is https://discord.gg/jB2qmRrxyD

Author Topic: Transformers  (Read 144 times)

Online Chip (OP)

  • Server Admin
  • Hero Member
  • *****
  • Administrator
  • *****
  • Join Date: Dec 2014
  • Location: Australia
  • Posts: 7148
  • Reputation Power: 0
  • Chip has hidden their reputation power
  • Gender: Male
  • Last Login:Today at 10:52:40 PM
  • Deeply Confused Learner
  • Profession: IT Engineer now retired
Transformers
« on: May 24, 2026, 08:40:42 AM »

Online Chip (OP)

Re: Transformers
« Reply #1 on: Today at 03:53:03 PM »
Bridge: From a basic neural network to a Transformer

1. Take your simple model (feed-forward neural network)

Structure
Code: [Select]
Input → Layer → Layer → Layer → Output

How it works
- Each layer applies a fixed transformation:
Code: [Select]
h = activation(Wx + b)
- Information flows strictly forward
- Each layer sees only the output of the previous layer
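
For anyone who wants to poke at it, here is a minimal sketch of a single layer computing h = activation(Wx + b). It uses plain Python lists, and ReLU is just one common choice of activation. The weights and inputs are made up for illustration.

```python
def relu(v):
    """Element-wise ReLU, one common choice of activation."""
    return [max(0.0, x) for x in v]

def dense_layer(W, x, b):
    """Compute activation(Wx + b), with W stored as a list of rows."""
    pre = [sum(w_ij * x_j for w_ij, x_j in zip(row, x)) + b_i
           for row, b_i in zip(W, b)]
    return relu(pre)

# Hypothetical 2-input, 2-unit layer with made-up weights.
W = [[1.0, -1.0],
     [0.5, 0.5]]
b = [0.0, -1.0]
x = [2.0, 1.0]

h = dense_layer(W, x, b)
print(h)  # [1.0, 0.5]
```

W and b never change once training is done, so every input is mixed in exactly the same way.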

Key limitation
- The weights W are fixed after training, so inputs are mixed the same way whatever their content
- Relationships between tokens/features are implicit and static
- There is no mechanism for directly selecting what matters from other inputs

---

2. What changes in a Transformer

The transformer replaces:
- “stacked feature extraction”
with:
- “explicit interaction between all elements”

---

Core new mechanism: Attention

Instead of only:
Code: [Select]
h = activation(Wx + b)

we compute:

Code: [Select]
Attention(Q, K, V)

Where:
- Q = Query (what this token is looking for)
- K = Key (what each token offers)
- V = Value (the information carried)

---

3. Transformer data flow

Step 1: Input tokens
Code: [Select]
"The cat sat on the mat"

Converted into embeddings:
Code: [Select]
Token → Vector
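
A rough sketch of the Token → Vector step, assuming a naive whitespace tokenizer and a randomly filled lookup table. Real models use subword tokenizers and learned embeddings. The names vocab, d_model and embedding_table are illustrative only.

```python
import random

sentence = "The cat sat on the mat"
tokens = sentence.lower().split()   # naive whitespace tokenizer (assumption)

# Map each distinct token to an integer id.
vocab = {tok: i for i, tok in enumerate(sorted(set(tokens)))}

d_model = 4                         # illustrative embedding width
rng = random.Random(0)
# One row per vocabulary entry; in a trained model these values are learned.
embedding_table = [[rng.uniform(-1, 1) for _ in range(d_model)]
                   for _ in vocab]

token_ids = [vocab[t] for t in tokens]
X = [embedding_table[i] for i in token_ids]   # 6 tokens, 4 numbers each

print(token_ids)  # [4, 0, 3, 2, 4, 1]
```

Both occurrences of "the" get the same id and therefore the same vector.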

---

Step 2: Create Q, K, V

Each token is projected into three roles:
Code: [Select]
X → Q (what I want)
X → K (what I am)
X → V (what I contain)
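
The three projections are just three matrix multiplications of the same input. A small sketch, assuming random stand-in matrices W_q, W_k and W_v (learned in a real model) and illustrative sizes:

```python
import random

rng = random.Random(0)

def random_matrix(rows, cols):
    """Stand-in for learned parameters or embeddings."""
    return [[rng.uniform(-1, 1) for _ in range(cols)] for _ in range(rows)]

def matmul(A, B):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]

n_tokens, d_model, d_k = 6, 4, 3     # illustrative sizes
X = random_matrix(n_tokens, d_model)  # one embedding row per token

W_q = random_matrix(d_model, d_k)
W_k = random_matrix(d_model, d_k)
W_v = random_matrix(d_model, d_k)

Q = matmul(X, W_q)   # what each token is looking for
K = matmul(X, W_k)   # what each token offers
V = matmul(X, W_v)   # the information each token carries
```

Each of Q, K and V has one row per token, so every token gets its own query, key and value.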

---

Step 3: Attention computation

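Each token's query is compared with the key of every token (itself included), and the scores are scaled by sqrt(d_k), where d_k is the key dimension:

Code: [Select]
similarity = (Q · Kᵀ) / sqrt(d_k)
weights = softmax(similarity)
output = Σ (weights × V)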

Result:
- Each token's output is a weighted mixture of the values of all tokens in the sequence
- Information is dynamically shared across the whole sequence
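
Here is a small sketch of the attention computation in plain Python. It includes the 1/sqrt(d_k) scaling used in the standard scaled dot-product formulation. The tiny Q, K and V matrices are made up so the effect is easy to see.

```python
import math

def softmax(row):
    """Numerically stable softmax over one list of scores."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]

def attention(Q, K, V):
    """Scaled dot-product attention; returns (outputs, attention weights)."""
    d_k = len(K[0])
    outputs, all_weights = [], []
    for q in Q:
        # Compare this query with every key.
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k)
                  for k in K]
        w = softmax(scores)
        all_weights.append(w)
        # Weighted sum of the value vectors.
        outputs.append([sum(w_j * v[c] for w_j, v in zip(w, V))
                        for c in range(len(V[0]))])
    return outputs, all_weights

# Made-up example: two tokens whose queries match their own keys best.
Q = [[1.0, 0.0], [0.0, 1.0]]
K = [[1.0, 0.0], [0.0, 1.0]]
V = [[10.0, 0.0], [0.0, 10.0]]

out, weights = attention(Q, K, V)
```

Each row of weights sums to 1, and each token leans towards the value whose key best matches its query.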

---

Step 4: Multi-head attention

Instead of one attention mechanism:
- multiple heads run in parallel

Each head can learn to specialise, for example in:
- syntax relationships
- semantic meaning
- long-range dependencies
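
A simplified sketch of the multi-head idea: split the feature dimension into slices, run attention on each slice independently, then concatenate the results. This leaves out the learned per-head projections and the final output projection that real implementations have. split_heads and multi_head_attention are illustrative names.

```python
import math

def softmax(row):
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]

def attention(Q, K, V):
    """Scaled dot-product attention for one head."""
    d_k = len(K[0])
    out = []
    for q in Q:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d_k) for k in K]
        w = softmax(scores)
        out.append([sum(wj * v[c] for wj, v in zip(w, V))
                    for c in range(len(V[0]))])
    return out

def split_heads(M, n_heads):
    """Cut each row into n_heads equal slices, one matrix per head."""
    d_head = len(M[0]) // n_heads
    return [[row[h * d_head:(h + 1) * d_head] for row in M]
            for h in range(n_heads)]

def multi_head_attention(Q, K, V, n_heads):
    heads = [attention(q, k, v)
             for q, k, v in zip(split_heads(Q, n_heads),
                                split_heads(K, n_heads),
                                split_heads(V, n_heads))]
    # Concatenate the head outputs token by token.
    return [sum((head[t] for head in heads), []) for t in range(len(Q))]

# Made-up input: 3 tokens, 4 features, used as Q, K and V alike.
X = [[1.0, 0.0, 0.5, -0.5],
     [0.0, 1.0, -0.5, 0.5],
     [1.0, 1.0, 0.0, 0.0]]

out = multi_head_attention(X, X, X, n_heads=2)
```

Each head computes its own attention weights from its own slice, which is what lets different heads attend to different relationships.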

---

Step 5: Feed-forward layer

Still present, but now applied to each context-aware vector independently:

Code: [Select]
FFN(x) = W2 · activation(W1x + b1) + b2
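
A minimal sketch of the position-wise feed-forward step in its common form: expand, apply the activation, project back, with no activation after the second layer. ReLU and the weights below are assumptions for illustration.

```python
def relu(v):
    return [max(0.0, x) for x in v]

def affine(W, x, b):
    """Compute Wx + b with W stored as a list of rows."""
    return [sum(w * xi for w, xi in zip(row, x)) + bi
            for row, bi in zip(W, b)]

def ffn(x, W1, b1, W2, b2):
    """Two-layer feed-forward block applied to one token vector."""
    return affine(W2, relu(affine(W1, x, b1)), b2)

# Made-up weights: 2 -> 3 -> 2.
W1 = [[1.0, 0.0],
      [0.0, 1.0],
      [-1.0, -1.0]]
b1 = [0.0, 0.0, 0.0]
W2 = [[1.0, 1.0, 1.0],
      [1.0, -1.0, 0.0]]
b2 = [0.0, 0.0]

y = ffn([2.0, -1.0], W1, b1, W2, b2)
print(y)  # [2.0, 2.0]
```

The same function is applied to every token separately. Any mixing between tokens has already happened in the attention step.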

---

Step 6: Residual connections + normalization

Residual connection
Code: [Select]
x = x + Attention(x)
x = x + FFN(x)

Layer normalization
- rescales each token vector to zero mean and unit variance
- stabilises training by keeping activations in a consistent range
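
A rough sketch of "add, then normalise" for a single token vector. It leaves out the learned scale and shift that layer normalization normally has, and it normalises after the residual add, which is one of two common orderings. The sublayer here is a made-up stand-in for attention or the feed-forward block.

```python
import math

def layer_norm(x, eps=1e-5):
    """Normalise one vector to zero mean and (roughly) unit variance."""
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [(v - mean) / math.sqrt(var + eps) for v in x]

def add_and_norm(x, sublayer):
    """Residual connection followed by layer normalization."""
    y = sublayer(x)
    return layer_norm([xi + yi for xi, yi in zip(x, y)])

x = [1.0, 2.0, 3.0, 4.0]
# Stand-in sublayer that produces large values on purpose.
out = add_and_norm(x, lambda v: [10.0 * t for t in v])

mean_out = sum(out) / len(out)
var_out = sum((v - mean_out) ** 2 for v in out) / len(out)
```

Even though the sublayer blows the values up tenfold, the output comes back with mean close to 0 and variance close to 1.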

---

4. Full Transformer pipeline

Code: [Select]
Tokens
 ↓
Embeddings (+ positional encoding)
 ↓
Self-Attention
 ↓
Feed-Forward Network
 ↓
(repeat N layers)
 ↓
Output logits
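
To tie the pipeline together, here is a toy end-to-end sketch in plain Python. It makes a lot of simplifying assumptions: random untrained weights, a single attention head, no biases, no positional information, no masking. All names and sizes are illustrative, so treat it as a diagram in code rather than a real model.

```python
import math
import random

rng = random.Random(0)

def rand_matrix(rows, cols):
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]

def softmax(row):
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]

def layer_norm(x, eps=1e-5):
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [(v - mean) / math.sqrt(var + eps) for v in x]

def add(A, B):
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

def self_attention(X, Wq, Wk, Wv):
    Q, K, V = matmul(X, Wq), matmul(X, Wk), matmul(X, Wv)
    d_k = len(K[0])
    scores = matmul(Q, [list(col) for col in zip(*K)])   # Q times K transposed
    weights = [softmax([s / math.sqrt(d_k) for s in row]) for row in scores]
    return matmul(weights, V)

def feed_forward(X, W1, W2):
    hidden = [[max(0.0, h) for h in row] for row in matmul(X, W1)]
    return matmul(hidden, W2)

def transformer_layer(X, p):
    # Attention sublayer, then feed-forward sublayer, each with residual + norm.
    X = [layer_norm(r) for r in add(X, self_attention(X, p["Wq"], p["Wk"], p["Wv"]))]
    X = [layer_norm(r) for r in add(X, feed_forward(X, p["W1"], p["W2"]))]
    return X

vocab_size, d_model, d_ff, n_layers = 5, 4, 8, 2
embedding = rand_matrix(vocab_size, d_model)
layers = [{"Wq": rand_matrix(d_model, d_model),
           "Wk": rand_matrix(d_model, d_model),
           "Wv": rand_matrix(d_model, d_model),
           "W1": rand_matrix(d_model, d_ff),
           "W2": rand_matrix(d_ff, d_model)} for _ in range(n_layers)]
W_out = rand_matrix(d_model, vocab_size)

token_ids = [4, 0, 3, 2, 4, 1]          # made-up ids for a six-token sentence
X = [embedding[i] for i in token_ids]   # Tokens -> Embeddings
for p in layers:                        # repeat N layers
    X = transformer_layer(X, p)
logits = matmul(X, W_out)               # one score per vocabulary entry, per token
```

The result is one row of logits per input token, each with vocab_size entries.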

---

5. Key conceptual difference

Feed-forward network
- fixed, layered computation
- features mixed by the same learned weights for every input
- no explicit interaction between inputs

Transformer
- every token interacts with every other token
- relationships computed dynamically per input
- mixing weights depend on the content of the sequence, not only on learned parameters and depth

---

6. One-line summary

Transformers replace:
“stacked transformations of data”

with:
“dynamic, fully connected interaction between all elements at every layer”

Generated by ChatGPT







