The Agent Mesh: A Passive Solution for Multi-Agent Learning and Collaboration

The fundamental problem with knowledge banks for AI agents isn't the knowledge—it's engagement. You can build the most comprehensive knowledge base in the world, but if agents don't consistently query it, the knowledge sits idle. That was the problem that led us to build the Agent Mesh.

We had a well-populated Knowledge Bank. Agents could query it. Sometimes they did. But "sometimes" isn't good enough when you're running dozens of autonomous agents across multiple codebases. The engagement problem—agents not reliably pulling relevant context from the KB—meant that institutional knowledge was being rediscovered from scratch, session after session.

The deeper issue is that you don't know what you don't know. How does an agent know what to search for if it hasn't encountered the problem yet? Traditional knowledge retrieval assumes the user (or agent) can formulate a good query. But the most valuable knowledge is often the kind you need before you realize you need it—the gotcha that saves you two hours of debugging, the configuration detail that prevents a production incident. By the time the agent encounters the problem and thinks to search, it's already wasted context rediscovering what a previous agent already learned.

When Agent A spends two hours investigating a codebase and learns that CloudSQL requires a proxy on port 15432, or that the auth system has two distinct patterns for web routes versus API routes—that knowledge evaporates when the session ends. Agent B, starting a similar task thirty minutes later, rediscovers the same facts from scratch. The agent was smart. The system was amnesiac.

The initial motivation for the Agent Mesh was simple: push knowledge to agents instead of waiting for them to pull it. Make the Knowledge Bank proactive rather than passive. If we can predict that a problem is coming based on an agent's behavioral trajectory, we can surface the solution before the problem fully manifests. But once we built the infrastructure for observing agent behavior in real time—capturing actions, embedding them, matching trajectories—two additional capabilities fell out almost for free: trajectory-based RAG (matching knowledge not by query similarity but by behavioral patterns) and multi-agent collision detection (spotting when two agents are working on overlapping problems and suggesting they coordinate).

What started as a solution to the engagement problem became a platform for passive multi-agent learning and collaboration. Here's how it works.

The Two Layers of the Mesh

The Agent Mesh operates in two layers, distinguished by how much the agent knows about it.

Layer 1: Active Mesh

Opt-in cooperation via the Knowledge Bank. Agents explicitly call mesh_narrate() to share discoveries with the Knowledge Bank and mesh_send_message() to communicate directly with other agents.

The agent decides what to share, when to share it, and who should receive it. This is straightforward but requires the agent to be aware of the mesh—and relies on it actually engaging with the KB.

Layer 2: Passive Mesh

Ambient awareness. Claude Code hooks fire automatically after every tool call and whenever the agent finishes a turn. The mesh observes actions, embeds them, and surfaces relevant knowledge, all invisibly to the agent.

The agent works naturally. The mesh learns from observation. No cooperation required.

The passive layer is where context engineering gets interesting. It's where the system stops relying on pre-configured retrieval and starts learning what agents need from how they behave.

Passive Observation: What the Hooks Capture

Every Claude Code session with the mesh enabled runs two hooks: a PostToolUse hook that fires after each tool call, and a Stop hook that fires when the agent finishes a turn.

The PostToolUse hook captures something specific: the agent's self-narration. Not the raw tool input ("Reading file: /app/models.py") but the agent's own description of what it's doing and why ("Investigating the ORM models to understand lazy loading behavior in the data access layer").

Why narration over metadata? Tool metadata tells you what happened. Agent narration tells you why. An agent reading a file might be investigating a bug, learning an architecture, or preparing a refactor. The same Read call has completely different context engineering implications depending on intent. Narration captures intent. Metadata doesn't.

Each captured action goes through two paths simultaneously:

Embedding and storage: The action text gets embedded (OpenAI's text-embedding-3-small[5]) and stored in a pgvector-indexed[4] table. This feeds both overlap detection and future pattern matching.
Sequence window: The action is appended to a Redis sliding window of the agent's last 4 actions. This sliding window is the input to trajectory matching.

The hooks are debounced—8 seconds between API calls for PostToolUse, 15 seconds for Stop. This prevents flooding the backend while maintaining responsiveness. The entire observation pipeline has a latency budget of about one second, because context has to be returned before the agent's next action.
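The sliding window and the debounce can be sketched in a few lines. This is a minimal in-memory illustration under stated assumptions, not the mesh's implementation: a `collections.deque` stands in for the Redis list, and the names `ActionWindow` and `Debouncer` are hypothetical. Only the window size of 4 and the 8-second PostToolUse interval come from the text.

```python
from collections import deque


class ActionWindow:
    """In-memory stand-in for the per-agent Redis sliding window (assumption)."""

    def __init__(self, size: int = 4):
        # A deque with maxlen drops the oldest item once it is full.
        self._actions = deque(maxlen=size)

    def append(self, narration: str) -> list:
        self._actions.append(narration)
        return list(self._actions)


class Debouncer:
    """Allows at most one backend call per `interval` seconds."""

    def __init__(self, interval: float):
        self.interval = interval
        self._last = None  # timestamp of the last allowed call

    def allow(self, now: float) -> bool:
        if self._last is not None and now - self._last < self.interval:
            return False
        self._last = now
        return True


window = ActionWindow(size=4)
for step in ["a1", "a2", "a3", "a4", "a5"]:
    recent = window.append(step)

# Timestamps are passed in explicitly so the behaviour is easy to follow.
post_tool_use = Debouncer(interval=8.0)
decisions = [post_tool_use.allow(t) for t in (0.0, 3.0, 8.0, 9.5, 16.0)]
```

After the loop, `recent` holds only the four newest actions, and the hook calls at 3.0s and 9.5s are suppressed. With Redis itself, a capped list is commonly maintained with `RPUSH` followed by `LTRIM`.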

Trajectory-Aligned Pattern Matching

This is the core idea. Traditional RAG[6] retrieves knowledge based on semantic similarity to a query. The Agent Mesh retrieves knowledge based on semantic similarity to a sequence of actions—a trajectory.

The insight is simple: agents working on similar problems follow similar paths. An agent investigating database performance tends to look at query plans, then check connection pool settings, then examine index configurations. If a previous agent followed that same trajectory and discovered something important along the way, the mesh can surface that discovery to the current agent at the right moment—not because someone configured a retrieval rule, but because the behavioral patterns match.

How Patterns Are Stored

A sequence pattern is a group of 2–4 action descriptions linked to a knowledge bank entry. Each action is stored with its own embedding and a position index within the group.

Example Sequence Pattern (4-step)

"Investigating database performance issues in production"

"Examining CloudSQL proxy configuration and connection routing"

"Checking connection pool settings against thread count"

"Analyzing query execution plans for slow endpoints"

Linked KB Entry:

CloudSQL pool sizing must match Gunicorn thread count
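As a rough illustration of that storage shape, the sketch below models a pattern group in plain Python. The class and field names (`PatternAction`, `SequencePattern`, `kb_entry`) are hypothetical and do not reflect the actual `kb_sequence_patterns` schema; only the 2 to 4 action limit, the per-action embedding, and the position index come from the description above.

```python
from dataclasses import dataclass, field


@dataclass
class PatternAction:
    position: int      # 0-based index within the group
    description: str   # intent-based narration of the action
    embedding: list = field(default_factory=list)  # 1536 floats in practice


@dataclass
class SequencePattern:
    kb_entry: str      # the linked Knowledge Bank entry
    actions: list      # list of PatternAction

    def __post_init__(self):
        if not 2 <= len(self.actions) <= 4:
            raise ValueError("a pattern groups 2 to 4 actions")
        # Keep actions ordered by their position index.
        self.actions.sort(key=lambda a: a.position)


steps = [
    "Investigating database performance issues in production",
    "Examining CloudSQL proxy configuration and connection routing",
    "Checking connection pool settings against thread count",
    "Analyzing query execution plans for slow endpoints",
]
pattern = SequencePattern(
    kb_entry="CloudSQL pool sizing must match Gunicorn thread count",
    actions=[PatternAction(i, text) for i, text in enumerate(steps)],
)
```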

How Matching Works

When the mesh needs to find relevant context for a working agent, it runs a four-phase pipeline. The agent's last N actions are embedded using text-embedding-3-small,[5] then scanned against the kb_sequence_patterns table using pgvector[4] to find candidate groups with cosine similarity above 0.40:

Embed (N actions → vectors) → Scan (pgvector ANN search) → Align (per-position similarity) → Deliver (score, dedup, inject)

For each candidate group, the agent's recent actions are aligned against the pattern's positions. Per-position cosine similarity is computed with recency weighting—later positions count more because they represent where the agent is, not where it was. Candidates are scored, filtered by threshold, checked against the delivery deduplication cache, and the top matches are injected as context. The full complexity analysis is in the Computational Complexity section below.
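The Align step can be sketched as follows. This is an illustration only: the text does not say how sequences of different lengths are paired, so the sketch assumes both are trimmed to their most recent positions (tail alignment). The function names are hypothetical, and short toy vectors stand in for the 1536-dimensional embeddings.

```python
import math


def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def align(agent_embs, pattern_embs):
    """Return one cosine similarity per aligned position.

    Assumption: when lengths differ, both sequences are trimmed to
    their last n entries, where n is the shorter length.
    """
    n = min(len(agent_embs), len(pattern_embs))
    pairs = zip(agent_embs[-n:], pattern_embs[-n:])
    return [cosine_similarity(a, p) for a, p in pairs]


# Toy 2-dimensional vectors; the agent has three actions, the pattern two.
agent = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
stored = [[0.0, 1.0], [1.0, 0.0]]
similarities = align(agent, stored)
```

Here the agent's last two actions are compared with the pattern's two positions. The first pair points the same way and the second pair sits at 45 degrees, so the similarities are roughly 1.0 and 0.707.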

Recency Weighting: Why It Matters

The scoring algorithm doesn't treat all positions equally. A weighted area calculation gives later actions in the sequence more influence over the final score:

# Per-position scoring with recency weighting.
# `similarities` holds one cosine similarity per aligned position (2 to 4 values).
weighted_area = 0.0
total_weight = 0.0
for i in range(len(similarities) - 1):
    distance_i = abs(1.0 - similarities[i])
    distance_next = abs(1.0 - similarities[i + 1])
    segment_area = (distance_i + distance_next) / 2.0  # trapezoid between adjacent positions
    weight = i + 1  # Later positions weighted higher
    weighted_area += segment_area * weight
    total_weight += weight

score = 1.0 - (weighted_area / total_weight)

This captures an important intuition: if the agent is currently "Checking connection pool settings" and a stored pattern ends with "Checking connection pool settings", that is a much stronger signal than matching only on the first action in the sequence. Matching where the agent is now matters more than matching where it has been.

Figure 1 below projects the trajectory alignment into 2D space for visual intuition—you can see the shaded area between the agent's path and the stored pattern, with later segments weighted more heavily. Figure 2 shows the formal generalization: the same algorithm operating on 1536-dimensional embedding vectors, with the complete mathematical derivation and a worked numerical example.

2D projection of 4-step trajectory alignment showing weighted area between agent and stored pattern paths, with recency weighting.

Figure 1: A 4-step trajectory alignment projected into 2D for intuition. The shaded area between agent and pattern trajectories represents distance, weighted by recency (w=1, 2, 3 for successive segments). The 2D projection is illustrative; actual computation operates on 1536-dimensional embedding vectors.

Figure 2: Formal Generalization — Trajectory Scoring in ℝ¹⁵³⁶

Definitions

Agent trajectory embedding at position i: a_i ∈ ℝ¹⁵³⁶
Stored pattern embedding at position i: p_i ∈ ℝ¹⁵³⁶
Trajectory length: n positions (typically 2–4)

Per-Position Cosine Similarity

sim(a_i, p_i) = (a_i · p_i) / (||a_i|| · ||p_i||)

Per-Position Distance

d_i = |1 − sim(a_i, p_i)|

Recency-Weighted Trapezoidal Area

w_i = i + 1 (later positions weighted higher)

weighted_area = ∑_{i=0}^{n−2} w_i · (d_i + d_{i+1}) / 2

Final Trajectory Score

score = 1 − weighted_area / ∑_{i=0}^{n−2} w_i

Worked Example

Similarities: [0.82, 0.75, 0.68, 0.71] → Distances: [0.18, 0.25, 0.32, 0.29]

Weights: w₀=1, w₁=2, w₂=3

weighted_area = 1·(0.18+0.25)/2 + 2·(0.25+0.32)/2 + 3·(0.32+0.29)/2 = 1.70

score = 1 − 1.70/(1+2+3) = 1 − 0.283 = 0.717
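The arithmetic can be checked with a small function that follows the definitions above. The function name `trajectory_score` is illustrative. The reversed input is an added comparison, not from the original example, to show the effect of recency weighting.

```python
def trajectory_score(similarities):
    """Recency-weighted trapezoidal score over per-position similarities."""
    if len(similarities) < 2:
        raise ValueError("need at least two positions to form a segment")
    weighted_area = 0.0
    total_weight = 0.0
    for i in range(len(similarities) - 1):
        d_i = abs(1.0 - similarities[i])
        d_next = abs(1.0 - similarities[i + 1])
        weight = i + 1  # w_i = i + 1
        weighted_area += weight * (d_i + d_next) / 2.0
        total_weight += weight
    return 1.0 - weighted_area / total_weight


score = trajectory_score([0.82, 0.75, 0.68, 0.71])
print(round(score, 3))  # 0.717

# Same values in reverse order: the best matches now sit at the end.
reversed_score = trajectory_score([0.71, 0.68, 0.75, 0.82])
print(round(reversed_score, 3))  # 0.747
```

The same four similarities score higher when the strongest matches fall on the most recent positions, which is the behaviour the weighting is meant to produce.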

Scoring Example (4-step)

Agent: "Checking database performance benchmarks" (sim: 0.82)
Agent: "Examining CloudSQL configuration in production" (sim: 0.75)
Agent: "Investigating connection pooling settings" (sim: 0.68)
Agent: "Analyzing query execution plans for slow endpoints" (sim: 0.71)

Trajectory Score: 0.717 (above threshold → deliver)

Computational Complexity: Query Time and Order

Trajectory matching runs on every agent action, so its computational cost directly constrains latency. Here's the breakdown by phase:

Phase 1: Embed

O(N). Embed the agent's last N actions (typically N=4). Each embedding call is a single API request to text-embedding-3-small. In practice, the actions are batched into one call: O(1) API round-trips, constant wall-clock time (~50ms).

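Phase 2: Scan

Sub-linear in P, often approximated as O(N · log P). For each of the N query embeddings, perform a pgvector approximate nearest-neighbor search over P stored pattern actions. The logarithmic figure is the usual approximation for graph-based indexes such as HNSW. With pgvector's IVFFlat index the cost depends on the lists and probes settings, and is closer to O(N · √P) when the number of lists is on the order of √P. Either way, the scan avoids a full pass over the table. With a candidate limit of 10 groups per embedding, at most 10 · N candidates are collected, which is O(N).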

Phase 3: Align

O(G · N · d). For each of the G candidate groups (at most ~40 after deduplication), fetch the group's action embeddings and compute per-position cosine similarity against the agent's N embeddings. Cosine similarity between two d-dimensional vectors is O(d), with d=1536 here. With G ≤ 40 and N ≤ 4, this is effectively constant.

Phase 4: Score & Deliver

O(G · N) for scoring, plus O(G log G) for the sort. Compute the recency-weighted trapezoidal area for each candidate (N−1 segments per group), sort by score, check the Redis deduplication hash, and deliver the top 3. The scoring loop over k = N−1 segments is O(k) per group, which is trivially fast at these sizes.

End-to-end per action: The dominant costs are the embedding API call (~50ms) and the pgvector scan (~10–30ms depending on index size). Alignment and scoring are CPU-bound and sub-millisecond. Total pipeline latency stays under 100ms for pattern tables up to ~100K entries, well within the 1-second budget between agent actions. Because the scan goes through the ANN index, cost grows sub-linearly with the number of stored patterns P, not linearly.
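The final filter, dedup, and top-3 step from Phase 4 might look like the sketch below. It makes several assumptions: a Python `set` stands in for the Redis deduplication hash, the delivery threshold of 0.6 is a placeholder (the text does not state the value), and the function name and `kb-N` identifiers are hypothetical.

```python
def select_for_delivery(scored, delivered, threshold, top_k=3):
    """Pick the best not-yet-delivered KB entries above the threshold.

    scored:    list of (kb_entry_id, trajectory_score) pairs; the same
               entry may appear more than once via different pattern groups.
    delivered: set of entry ids already injected for this agent.
    """
    best = {}
    for entry, score in scored:
        if score >= threshold and entry not in delivered:
            best[entry] = max(score, best.get(entry, 0.0))
    ranked = sorted(best.items(), key=lambda kv: kv[1], reverse=True)
    chosen = ranked[:top_k]
    delivered.update(entry for entry, _ in chosen)  # remember what was sent
    return chosen


already_sent = {"kb-3"}
candidates = [
    ("kb-1", 0.72), ("kb-2", 0.55), ("kb-3", 0.81),
    ("kb-4", 0.66), ("kb-5", 0.90), ("kb-1", 0.75),
]
first = select_for_delivery(candidates, already_sent, threshold=0.6)
second = select_for_delivery(candidates, already_sent, threshold=0.6)
```

The first call returns `kb-5`, `kb-1`, and `kb-4` in that order. The second call returns nothing, because every qualifying entry has already been delivered.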

The Feedback Loop: How the Mesh Learns

Trajectory patterns don't appear from nothing. They're created through a self-reinforcing feedback loop that is, itself, an exercise in automated context engineering.

Schematic diagram of the Mesh Knowledge Loop: Agent Works → Knowledge Surfaces → Session Ends → Analyst Agent Runs → Patterns Created → cycle repeats

Figure 3: The Mesh Knowledge Loop. As agents work, the mesh surfaces relevant knowledge from previous sessions (step 2). When the session ends, an analyst agent reviews the transcript (step 4), extracts permanent knowledge, and creates new trajectory patterns (step 5) that will surface for future agents.

The analyst agent (step 4) is the judge—it reviews each session transcript and decides what constitutes permanent, reusable knowledge versus transient noise. It's instructed to extract intent-based narrations, not mechanical descriptions. "Investigating the ORM models to understand lazy loading" is a useful trajectory action. "Reading file: /app/models.py" is not. The quality of the mesh depends on the quality of its trajectory descriptions—which is why the system prefers agent self-narration over tool metadata at every stage.

Why This Is Different from RAG

The distinction is worth making explicit. Retrieval-Augmented Generation retrieves documents based on query similarity. The Agent Mesh retrieves knowledge based on behavioral trajectory similarity. The differences cascade:

Traditional RAG

Query-driven. Agent (or user) formulates a question. System retrieves relevant documents.

Reactive. Knowledge arrives after the agent knows it needs something.

Single-point matching. One query → one retrieval. No sequence awareness.

Configuration-heavy. Requires chunking strategy, embedding model selection, top-k tuning, reranking.

Trajectory Matching

Behavior-driven. Agent works naturally. System infers what knowledge is relevant from the pattern of actions.

Proactive. Knowledge arrives before the agent knows it needs it, including knowledge it might never have thought to search for.

Sequence-aware. Multiple actions aligned against stored trajectories. Temporal ordering matters.

Self-improving. Each session feeds back patterns that improve future sessions.

To be clear: the mesh also performs traditional KB search as a complement to trajectory matching. Both systems run in parallel on every action. But trajectory matching is what makes the mesh proactive—surfacing knowledge the agent hasn't thought to search for yet.

Multi-Agent Collision Detection

The passive mesh doesn't just surface historical knowledge—it also watches for real-time overlaps. When two agents are working on related parts of the codebase simultaneously, the mesh detects this through embedding similarity on their activity descriptions.

When Agent A is "investigating the auth middleware for API routes" and Agent B starts "examining authentication patterns in the route handlers," their activity embeddings land close together in vector space. The mesh flags the overlap and notifies both agents, suggesting they coordinate using the mesh_send_message() tool to avoid duplicating work or creating merge conflicts.

Flow diagram showing multi-agent collision detection: Agent A and Agent B working on overlapping areas, mesh detects embedding similarity, both agents notified and suggested to coordinate via mesh_send_message.

Figure 4: Multi-agent collision detection. The mesh passively detects when agents are working on overlapping problems and suggests they coordinate directly, preventing duplicate work and merge conflicts.

This is different from traditional orchestration, which assigns work top-down. Collision detection is bottom-up: agents work independently, and the mesh observes when their trajectories converge. No central coordinator needed. The agents choose whether and how to coordinate—the mesh just makes them aware of each other.
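A pairwise overlap check of this kind can be sketched as below. The threshold of 0.8, the agent identifiers, and the function names are assumptions for illustration, and 3-dimensional toy vectors stand in for real activity embeddings.

```python
import math
from itertools import combinations


def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def find_overlaps(activity, threshold):
    """activity maps agent id -> embedding of its current activity.

    Returns (agent_a, agent_b, similarity) for every pair at or above
    the threshold. The threshold value is an assumption, not from the text.
    """
    overlaps = []
    for (id_a, emb_a), (id_b, emb_b) in combinations(sorted(activity.items()), 2):
        sim = cosine_similarity(emb_a, emb_b)
        if sim >= threshold:
            overlaps.append((id_a, id_b, sim))
    return overlaps


activity = {
    "agent-a": [0.9, 0.1, 0.0],  # e.g. auth middleware for API routes
    "agent-b": [0.8, 0.2, 0.1],  # e.g. authentication patterns in handlers
    "agent-c": [0.0, 0.1, 0.9],  # unrelated work
}
overlaps = find_overlaps(activity, threshold=0.8)
```

Only the `agent-a` and `agent-b` pair is flagged. A real deployment would run this lookup through the vector index rather than comparing every pair.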

What We've Learned Building This

Running the Agent Mesh in production across multiple repositories has taught us several things that aren't obvious from the architecture alone.

Narration quality is everything. The entire system depends on agents producing useful self-narrations. When narration is mechanical ("Reading file X"), trajectory matching degrades to glorified file-based search. When narration is intent-rich ("Investigating lazy loading behavior to understand N+1 query patterns"), the trajectories become genuinely predictive. We strip tool prefixes and prefer assistant-generated descriptions over tool metadata for this reason.

Patterns need curation, not just creation. The mesh_kb_build automation generates patterns automatically, but not all generated patterns are useful. Some trajectories are too generic ("reading code, then writing code") and match everything. Low-utilization patterns need to be pruned over time. Context utilization tracking provides the signal for this, but the pruning itself is still a human decision.
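A utilization report can at least shortlist patterns for that human review. The sketch below is hypothetical throughout: the counters (deliveries and times the delivered context was used), the cut-offs, and the pattern ids are all assumptions, since the text does not describe how utilization is recorded.

```python
def pruning_candidates(stats, min_deliveries=20, max_utilization=0.1):
    """Flag patterns that are delivered often but rarely used.

    stats maps pattern id -> (deliveries, times_used). Patterns with too
    few deliveries are skipped because there is not enough evidence yet.
    """
    flagged = []
    for pattern_id, (deliveries, used) in stats.items():
        if deliveries < min_deliveries:
            continue
        rate = used / deliveries
        if rate <= max_utilization:
            flagged.append((pattern_id, rate))
    # Lowest utilization first, so the worst offenders are reviewed first.
    return sorted(flagged, key=lambda item: item[1])


stats = {
    "generic-read-then-write": (200, 4),  # matches everything, rarely useful
    "cloudsql-pool-sizing": (35, 21),
    "brand-new-pattern": (3, 0),          # too new to judge
}
for_review = pruning_candidates(stats)
```

The output is a review list, not a deletion list, which keeps the final decision with a person.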

Latency is the hard constraint. The entire observation-embed-match-deliver pipeline has to complete within the time between one agent action and the next. If context arrives too late, it's irrelevant. We run async processing in a background thread with a 1-second long-poll timeout, and pattern queries use pgvector indexes with window limits (4 recent actions, 10 candidate groups per embedding) to stay fast. Every optimization in the system is a latency trade-off.

Context Engineering as an Emerging Discipline

Anthropic, Google, and others have begun articulating context engineering as a distinct discipline—separate from prompt engineering, separate from fine-tuning.[1][7] The framing is right: what lands in the context window determines agent behavior more reliably than any other factor.[8]

But most current approaches still operate in what we'd call static context engineering: you configure system prompts, design RAG pipelines, set up tool descriptions, and tune retrieval parameters. The context shape is decided at design time and stays fixed during execution.

The Agent Mesh represents a step toward dynamic context engineering—systems that observe agent behavior in real time and adapt what context is available based on what the agent is actually doing. The context shape changes during execution, driven by the agent's trajectory.

The Shift Ahead

Static context engineering asks: "What information might this agent need?" Dynamic context engineering asks: "Given what this agent has done in the last four actions, what will it need next?" The trajectory is the signal. The context is the response.

We don't think this replaces RAG or careful prompt design. Those remain essential. But trajectory-based context delivery adds a layer that's predictive rather than reactive, learned rather than configured, and collaborative rather than isolated. As agents become longer-running and more autonomous, this layer becomes increasingly important.

The feedback loop—sessions generating patterns that improve future sessions—is what makes this compound. Each repository that opts into the mesh becomes a richer knowledge environment over time. The context engineering isn't a one-time design effort. It's a continuously learning system.

We're still early. Pattern quality varies. Latency constraints limit matching depth. Utilization tracking provides signal but not certainty. But the trajectory is clear: context engineering will evolve from something you configure to something that learns. The Agent Mesh is our experiment in making that transition concrete.

References

[1] Anthropic Applied AI Team. "Effective context engineering for AI agents." Anthropic Engineering Blog, September 2025. anthropic.com/engineering/effective-context-engineering-for-ai-agents
[2] Ji, Y. ("Peak"). "Context Engineering for AI Agents: Lessons from Building Manus." Manus AI Blog, 2025. manus.im/blog/Context-Engineering-for-AI-Agents
[3] Anthropic Engineering. "Effective harnesses for long-running agents." Anthropic Engineering Blog, November 2025. anthropic.com/engineering/effective-harnesses-for-long-running-agents
[4] Kane, A. et al. "pgvector: Open-source vector similarity search for Postgres." github.com/pgvector/pgvector
[5] OpenAI. "New embedding models and API updates." OpenAI Blog, January 2024. openai.com/index/new-embedding-models-and-api-updates
[6] Lewis, P. et al. "Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks." NeurIPS 2020. arxiv.org/abs/2005.11401
[7] Google. "Agent Development Kit: Making it easy to build multi-agent applications." Google Developers Blog, 2025. developers.googleblog.com/agent-development-kit
[8] Lutke, T. "I really like the term 'context engineering' over prompt engineering. It describes the core skill better." X (Twitter), June 2025. x.com/tobi/status/1935533422589399127
