dopetalk does not endorse any advertised product nor does it accept any liability for its use or misuse


Our Discord Notification Server invitation link is https://discord.gg/jB2qmRrxyD

Author Topic: Context Windows and Memory  (Read 6 times)

Online Chip (OP)

Context Windows and Memory
« on: Today at 09:49:58 PM »



This is where most confusion about chatbot “memory” comes from.

LLMs don’t have human memory. They operate on a fixed-size input buffer called a context window.

---

1. What context windows are

A context window is the maximum number of tokens the model can “see” at once.

Everything the model uses to generate an answer must fit inside this window:

Code: [Select]
System prompt + conversation + user input + tool outputs

If something is outside the window, it does not exist to the model.
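
As a rough sketch of that rule, the snippet below assembles the four parts and checks them against a budget. The `count_tokens` helper is a hypothetical stand-in that just splits on whitespace (real tokenizers cut text differently), and the 8000 limit is an arbitrary figure picked for illustration.

```python
def count_tokens(text: str) -> int:
    """Hypothetical stand-in: counts whitespace-separated pieces, not real tokens."""
    return len(text.split())


def fits_in_window(parts: list[str], limit: int) -> bool:
    """Return True if all parts together stay within the token limit."""
    return sum(count_tokens(p) for p in parts) <= limit


CONTEXT_LIMIT = 8000  # assumed figure for illustration only

parts = [
    "You are a helpful assistant.",        # system prompt
    "User: hi\nAssistant: hello",          # conversation so far
    "What is a context window?",           # new user input
    "search_result: a fixed-size buffer",  # tool output
]
print(fits_in_window(parts, CONTEXT_LIMIT))
```

Anything that would push the total past the limit has to be trimmed or left out before the request is sent.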

---

2. Token limits

Context windows are measured in tokens, not words.

Typical ranges:

  • Small models: a few thousand tokens
  • Modern LLMs: tens of thousands
  • Frontier systems: 100k+ tokens, with some advertising a million or more

But regardless of size:

Code: [Select]
There is always a hard cutoff

Once full, older tokens must be removed or compressed.
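
To get a feel for tokens versus words, here is a minimal sketch that estimates a token count from character length. It assumes the commonly quoted rule of thumb of roughly 4 characters per token for English text; the real number depends on the model’s tokenizer, so treat `estimate_tokens` and `remaining_budget` as illustrative helpers, not an exact measure.

```python
import math

CHARS_PER_TOKEN = 4  # rough rule of thumb for English; not exact for any model


def estimate_tokens(text: str) -> int:
    """Estimate the token count from the character length."""
    return math.ceil(len(text) / CHARS_PER_TOKEN)


def remaining_budget(used_texts: list[str], limit: int) -> int:
    """Return how many estimated tokens are left before the hard cutoff."""
    used = sum(estimate_tokens(t) for t in used_texts)
    return max(limit - used, 0)


print(estimate_tokens("Context windows are measured in tokens, not words."))
```

When `remaining_budget` reaches zero, the window is full and something older has to go.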

---

3. Sliding attention windows

When the conversation exceeds capacity, the application around the model uses strategies like a sliding window over the conversation.

This means:

  • Keep the most recent tokens
  • Drop or compress older tokens
  • Shift attention focus forward

So the visible frame effectively “moves” along the conversation. The trimming happens before the text ever reaches the model.

Code: [Select]
[OLD CHAT] → dropped
[MID CHAT] → compressed or removed
[NEW CHAT] → fully visible
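
A minimal sketch of the “keep the most recent, drop the oldest” strategy, assuming whole messages are kept or dropped and using a hypothetical whitespace-based `count_tokens` in place of a real tokenizer. Compression (summarising the dropped part) is left out here.

```python
def count_tokens(text: str) -> int:
    """Hypothetical stand-in: counts whitespace-separated pieces, not real tokens."""
    return len(text.split())


def sliding_window(messages: list[str], budget: int) -> list[str]:
    """Keep the newest messages that fit the budget; drop everything older."""
    kept = []
    used = 0
    for msg in reversed(messages):  # walk from newest to oldest
        cost = count_tokens(msg)
        if used + cost > budget:
            break                   # this message and all older ones are dropped
        kept.append(msg)
        used += cost
    kept.reverse()                  # restore chronological order
    return kept


history = [
    "my name is Sam",
    "I like green tea",
    "what should I drink today",
]
print(sliding_window(history, budget=9))
# prints ['I like green tea', 'what should I drink today']
```

Note that the name given in the first message is gone from what the model will see, even though the user typed it only two messages earlier.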

---

4. Conversation truncation

When the window fills up:

  • Old messages are cut off
  • Only recent context remains
  • Important early details can disappear

This is not forgetting in a human sense — it is literal data loss from the input.

If it’s not in the prompt anymore, it cannot be used.

---

5. Why models “forget”

Models do not carry persistent memory from one request to the next: inference does not update their weights.

They “forget” because:

  • They never stored the conversation internally
  • They only operate on the current input tokens
  • Old context falls outside the window

So forgetting is not failure — it is architecture.

Code: [Select]
No context = no access = no memory
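
The point can be shown with a toy example. `fake_model` below is a hypothetical stand-in for a model call (it only looks for a line starting with “My name is”), but it has the property that matters: it sees only what the caller passes in and keeps nothing between calls.

```python
def fake_model(prompt_messages: list[str]) -> str:
    """Hypothetical stand-in for a model call: it sees only what is passed in."""
    for msg in prompt_messages:
        if msg.startswith("My name is "):
            return "Your name is " + msg[len("My name is "):]
    return "I don't know your name."


history = ["My name is Sam"]

# The caller resends the history, so the detail is available.
print(fake_model(history + ["What is my name?"]))  # prints: Your name is Sam

# The caller sends only the new message, so the detail does not exist.
print(fake_model(["What is my name?"]))            # prints: I don't know your name.
```

In a real chat application it is the client or server code, not the model, that stores the history and resends it each turn.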

---

6. Persistent memory systems

Some systems add external memory layers on top of the model.

These may include:

  • Databases of user facts
  • Vector embeddings of past conversations
  • Summarised long-term memory stores

At runtime:

Code: [Select]
User query → memory retrieval → inject into context window

So memory is not inside the model — it is external and re-injected.
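
A minimal sketch of that retrieve-and-inject flow, assuming a plain in-memory list and naive word overlap for lookup. `MemoryStore` and `build_prompt` are hypothetical names; a real system would more likely use a database or embeddings.

```python
class MemoryStore:
    """Hypothetical external memory: a plain list of saved user facts."""

    def __init__(self) -> None:
        self.facts: list[str] = []

    def save(self, fact: str) -> None:
        self.facts.append(fact)

    def retrieve(self, query: str) -> list[str]:
        """Return saved facts that share at least one word with the query."""
        query_words = set(query.lower().split())
        return [f for f in self.facts if query_words & set(f.lower().split())]


def build_prompt(store: MemoryStore, query: str) -> str:
    """Inject the retrieved facts ahead of the user query."""
    lines = ["Known about user: " + fact for fact in store.retrieve(query)]
    lines.append("User: " + query)
    return "\n".join(lines)


store = MemoryStore()
store.save("prefers tea over coffee")
store.save("works night shifts")
print(build_prompt(store, "suggest a tea for me"))
```

Only the fact that matches the query is injected, so the memory costs tokens in the window only when it is likely to be relevant.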

---

7. RAG vs memory

These are often confused but are different systems.

RAG (Retrieval-Augmented Generation):

  • Fetches external documents (web, database, files)
  • Typically uses embeddings (sometimes keyword search) to find relevant passages
  • Injects retrieved content into context window

Purpose:
Code: [Select]
Ground answers in external knowledge
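
A toy sketch of the RAG steps above. The `embed` function here is only a bag-of-words count standing in for a real embedding model, and the three documents are made up for the example; the part to notice is the ranking by similarity followed by injection into the prompt.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    """Toy stand-in for an embedding: a bag-of-words count vector."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two count vectors."""
    dot = sum(a[w] * b[w] for w in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def retrieve(query: str, documents: list[str], k: int = 1) -> list[str]:
    """Return the k documents most similar to the query."""
    q = embed(query)
    ranked = sorted(documents, key=lambda d: cosine(q, embed(d)), reverse=True)
    return ranked[:k]


docs = [
    "a context window is a fixed-size input buffer",
    "tokens are the units a model reads",
    "vector stores hold embeddings of documents",
]
query = "what is a context window"
top = retrieve(query, docs)
prompt = "Context: " + top[0] + "\nQuestion: " + query
print(prompt)
```

The retrieved text still has to fit in the context window, so `k` and the document length are limited by the same token budget as everything else.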

---

Memory systems:

  • Store user-specific or session-specific information
  • Designed for continuity across sessions
  • Often summarised or selectively saved

Purpose:
Code: [Select]
Remember user-specific context over time

---

Key difference

Code: [Select]
RAG = knowledge retrieval (external truth source)
Memory = user/context retention (personal continuity)

---

8. Key Insight

A chatbot does not have a mind that remembers.

It has:

Code: [Select]
A fixed-size working window + optional external retrieval systems

Beyond what training baked into its weights, everything it “knows” during a response must either:

  • Already be inside the context window
  • Be retrieved via RAG and placed there
  • Be injected there as memory

Otherwise, it does not exist for that moment of computation.






