What agents remember.
Memory is the difference between a stateless script and an intelligent agent. We're testing every major memory architecture — in-context buffers, vector stores, structured entity memory, and hybrid routers — to find where each one breaks and what it actually costs to get recall right.
Four architectures under test.
In-Context Window Buffer
Naive approach: keep recent turns in context. Cheap and fast for short workflows. Degrades sharply past 8k tokens — context compression is required.
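As a rough illustration of the buffer idea, here is a minimal sketch that keeps recent turns under a token budget. The class name `ContextBuffer`, the whitespace-based `approx_tokens` counter, and the choice to evict the oldest turns (rather than compress them) are all assumptions made for the example, not the setup used in the tests.

```python
from collections import deque


def approx_tokens(text: str) -> int:
    # Assumption: a whitespace word count stands in for a real tokenizer.
    return len(text.split())


class ContextBuffer:
    """Keeps the most recent turns whose combined size fits a token budget."""

    def __init__(self, max_tokens: int = 8000):
        self.max_tokens = max_tokens
        self.turns = deque()  # entries are (role, text, token_count)
        self.total = 0

    def add(self, role: str, text: str) -> None:
        n = approx_tokens(text)
        self.turns.append((role, text, n))
        self.total += n
        # Evict the oldest turns until the buffer fits; always keep the newest turn.
        while self.total > self.max_tokens and len(self.turns) > 1:
            _, _, dropped = self.turns.popleft()
            self.total -= dropped

    def render(self) -> str:
        # Flatten the buffer into the text that would be placed in the prompt.
        return "\n".join(f"{role}: {text}" for role, text, _ in self.turns)
```

A compression step could replace the eviction loop by summarising the dropped turns instead of discarding them.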
Vector Store Episodic Memory
Embed conversation summaries into a vector DB. Retrieve the top-k matches on each new query. Recall accuracy: 84% at k=3, 91% at k=6. Each retrieval adds ~120ms of latency.
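The embed-then-retrieve loop can be sketched in a few lines. This is a toy under stated assumptions: `embed` uses bag-of-words counts in place of a real embedding model, an in-memory list stands in for the vector DB, and `EpisodicMemory` is a hypothetical name.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    # Assumption: a bag-of-words count vector stands in for a real embedding model.
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class EpisodicMemory:
    def __init__(self):
        self.items = []  # list of (summary, vector) pairs

    def add(self, summary: str) -> None:
        self.items.append((summary, embed(summary)))

    def retrieve(self, query: str, k: int = 3) -> list:
        # Rank every stored summary by similarity to the query and keep the top k.
        q = embed(query)
        ranked = sorted(self.items, key=lambda item: cosine(q, item[1]), reverse=True)
        return [summary for summary, _ in ranked[:k]]
```

Raising `k` returns more candidates per query, which is the knob behind the k=3 versus k=6 comparison.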
Structured Entity Memory
Extract named entities and facts from conversations, store as typed records. Agents query memory as a structured DB. Zero hallucinated facts in 200 test runs.
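To make "typed records queried as a structured DB" concrete, here is a minimal sketch using Python's built-in `sqlite3` module. The `facts` table layout and the `remember` / `recall` helpers are hypothetical, and the entity extraction step itself is out of scope here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE facts (
        entity     TEXT NOT NULL,
        attribute  TEXT NOT NULL,
        value      TEXT NOT NULL,
        session_id INTEGER NOT NULL,
        PRIMARY KEY (entity, attribute)
    )
    """
)


def remember(entity: str, attribute: str, value: str, session_id: int) -> None:
    # A newer fact for the same (entity, attribute) pair replaces the older one.
    conn.execute(
        "INSERT OR REPLACE INTO facts VALUES (?, ?, ?, ?)",
        (entity, attribute, value, session_id),
    )


def recall(entity: str, attribute: str):
    # Returns the stored value, or None when nothing was ever recorded.
    row = conn.execute(
        "SELECT value FROM facts WHERE entity = ? AND attribute = ?",
        (entity, attribute),
    ).fetchone()
    return row[0] if row else None
```

A lookup either returns the stored value or `None`, so the agent can tell "unknown" apart from a retrieved fact.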
Hybrid Memory Router
Route memory reads to the in-context, vector, or structured store based on query type. The classifier adds 18ms of overhead but reduces retrieval cost by 60%.
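A router of this kind might look like the sketch below. The keyword rules in `classify` are a stand-in for whatever classifier is actually used, and the store names and the `route` helper are hypothetical.

```python
import re
from typing import Callable, Dict


def classify(query: str) -> str:
    # Assumption: hand-written keyword rules stand in for a trained classifier.
    q = query.lower()
    if re.search(r"\b(just|earlier|above|you said|last message)\b", q):
        return "in_context"  # refers to the current conversation
    if re.search(r"\b(email|phone|address|name|birthday)\b", q):
        return "structured"  # asks for a specific attribute of an entity
    return "vector"          # open-ended recall across past sessions


def route(query: str, stores: Dict[str, Callable[[str], str]]) -> str:
    # Send the read to exactly one store instead of querying all three.
    return stores[classify(query)](query)
```

For example, `route("What is Ada's email?", stores)` would call only the structured store's lookup function, leaving the other two stores untouched for that query.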
What the data says.
Three findings with enough runs behind them to publish. More land in our weekly newsletter as we validate them.
Storing raw conversation turns verbatim yields 61% recall at k=3. Summarising each turn into a structured fact bundle before embedding raises recall to 84%. The compression step costs ~200ms but pays for itself by the second retrieval.
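The difference between embedding a raw turn and embedding a fact bundle is easiest to see side by side. In this sketch the `FactBundle` shape is an assumption, and the bundle is written by hand to show what a summariser might produce; the summarisation step itself is not implemented.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class FactBundle:
    turn_id: int
    subject: str
    facts: List[str] = field(default_factory=list)

    def to_embedding_text(self) -> str:
        # One compact line per bundle, so the embedded text carries facts, not filler.
        return f"{self.subject}: " + "; ".join(self.facts)


# Raw turn as it would be stored verbatim (illustrative text, not test data).
raw_turn = (
    "Hi, uh, so I think we talked about this, but I'd rather you "
    "email me, and my renewal is in March."
)

# Hand-written example of the bundle a summariser might extract from that turn.
bundle = FactBundle(
    turn_id=7,
    subject="contact preferences",
    facts=["prefers email", "renewal due in March"],
)

text_to_embed = bundle.to_embedding_text()
```

Here `text_to_embed` is the string that would be sent to the embedding model in place of `raw_turn`.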
In CRM-style agents (tracking a contact across 20 sessions), structured entity memory produced zero hallucinated facts, versus a 6.2% hallucination rate for vector-only retrieval. The cost is a Postgres row per fact rather than an embedding API call per fact.
The biggest mistake we see is over-engineering memory for short workflows. If an agent handles tasks under 10 turns and doesn't need cross-session recall, an in-context buffer plus a conversation summary at session end is sufficient, and 80% cheaper than a vector store.
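That rule of thumb fits in a single check. The function name and parameters below are hypothetical; the 10-turn threshold comes from the finding above.

```python
def needs_external_memory(expected_turns: int, cross_session_recall: bool) -> bool:
    # Under 10 turns with no cross-session recall: buffer plus end-of-session summary is enough.
    return cross_session_recall or expected_turns >= 10
```

When it returns `False`, the in-context buffer with a session-end summary is the cheaper choice; when it returns `True`, a vector, structured, or hybrid store is worth evaluating.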