AI Engineering

Building a Knowledge Bank: How We Achieved 57% Context Reduction Through Compound Learning

Scott Alter Β· November 13, 2025 Β· 13 min read

What if AI agents could learn from each other's work? We built a Knowledge Bank system to find out, and the results exceeded our expectations: 57% context reduction, 100% file reading elimination for documented topics, and clear compound knowledge accumulation effects.

  β€’ 57% context reduction
  β€’ 100% file reading eliminated
  β€’ 95% KB coverage achieved
  β€’ 30x ROI on gotchas

The Problem: Every Agent Starts from Scratch

When working with AI coding agents, we noticed a frustrating pattern: every agent would explore the same files, discover the same patterns, and learn the same lessons. There was no memory, no accumulation of knowledge, no compound learning effect.

For example, five different agents exploring our Docker architecture would each:

  • Read the same 4 configuration files (~700 lines)
  • Parse the same service definitions
  • Discover the same gotchas (silent container failures, database routing, etc.)
  • Use ~35,000 context tokens each

This was inefficient and expensive. We needed a system where Agent 2 could benefit from Agent 1's discoveries, and Agent 10 could benefit from all nine agents before it.

The Solution: A Semantic Knowledge Bank

We built a Knowledge Bank using PostgreSQL with the pgvector extension and OpenAI embeddings. The architecture is straightforward but powerful:

Core Components

  β€’ πŸ’Ύ Storage Layer: PostgreSQL with the pgvector extension stores knowledge entries with flexible JSONB metadata and a separate embedding vector for each search dimension.
  β€’ πŸ” Search Layer: Multi-dimensional semantic search using OpenAI embeddings with cosine similarity. Queries search across content, use-cases, systems, and tasks simultaneously.
  β€’ πŸ”Œ MCP Interface: Model Context Protocol tools let agents query the KB in natural language and automatically store new discoveries with validation.
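
To make the storage layer concrete, here's a minimal sketch of what the schema might look like. The table and column names are illustrative assumptions on our part, not the production schema; the four embedding columns mirror the four search dimensions described below.

```python
# Illustrative storage schema; all names here are assumptions, not the
# production schema. Requires: pip install "psycopg[binary]", plus a
# PostgreSQL server with the pgvector extension available.
import psycopg

CREATE_TABLE = """
CREATE TABLE IF NOT EXISTS kb_entries (
    id         BIGSERIAL PRIMARY KEY,
    content    TEXT NOT NULL,        -- the knowledge itself
    useful_for TEXT NOT NULL,        -- when an agent would need it
    systems    TEXT,                 -- which components it touches
    tasks      TEXT,                 -- which operations it relates to
    metadata   JSONB DEFAULT '{}',   -- e.g. {"type": "gotcha"}
    -- one embedding per search dimension (1536 dims for OpenAI embeddings)
    content_embedding    vector(1536),
    useful_for_embedding vector(1536),
    systems_embedding    vector(1536),
    tasks_embedding      vector(1536)
);
"""

with psycopg.connect("postgresql://localhost/kb") as conn:  # hypothetical DSN
    conn.execute("CREATE EXTENSION IF NOT EXISTS vector")
    conn.execute(CREATE_TABLE)
```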

Three Types of Knowledge

We categorize knowledge into three types, each stored with metadata for efficient retrieval:

  • Conceptual: Architecture overviews, system explanations, workflows (metadata.type: NOT SET)
  • Examples: Validated code patterns with proof (metadata.type: "example")
  • Gotchas: Common mistakes and edge cases (metadata.type: "gotcha")

How It Works: Query Examples

The Knowledge Bank uses natural language queries across multiple dimensions. Here's what agents actually ask:

πŸ” Example Query 1: Understanding Docker Architecture
content: "Docker Compose services and configuration"
useful_for: "Setting up development environment"
systems: "Docker, containers"
βœ… Returns: 12 entries covering all 16 services, networking setup, volume configuration, and common initialization gotchas
πŸ” Example Query 2: Authentication Implementation
content: "User authentication and session management"
useful_for: "Building login functionality"
num_gotchas: 3
βœ… Returns: Custom decorator patterns, Flask-Login integration details, scope-based auth, plus 3 gotchas about common mistakes
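
A rough sketch of the agent-facing side of such a query. The embedding model choice and helper names are our assumptions; the point is that each query dimension gets its own embedding before the search runs.

```python
# Hypothetical agent-side query construction (names are assumptions).
# Requires: pip install openai; OPENAI_API_KEY set in the environment.
from openai import OpenAI

client = OpenAI()

def embed(text: str) -> list[float]:
    """Embed one query dimension as a 1536-dim vector."""
    resp = client.embeddings.create(model="text-embedding-3-small", input=text)
    return resp.data[0].embedding

query = {
    "content": "User authentication and session management",
    "useful_for": "Building login functionality",
}
# One embedding per dimension; these feed the weighted search shown later.
query_vectors = {dim: embed(text) for dim, text in query.items()}
```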

The Experiments: Four Agents, Two Topics

We ran controlled A/B tests to measure the compound learning effect. The results were dramatic:

Test 1: Docker Architecture (Fresh Topic)

Metric            Agent 1 (Empty KB)    Agent 2 (With KB)    Improvement
Context Tokens    35,000                15,000               -57%
Files Read        4 files               0 files              -100%
KB Coverage       ~30%                  95%                  +217%

πŸ’‘ Key Insight: Agent 2 achieved 95% knowledge coverage from KB queries alone, requiring ZERO file reads. This is the compound effect in action.

Key Findings

1. The Compound Effect is Real

With 100 agents on the same topic:

❌ Without Knowledge Bank: every agent starts from scratch
   100 agents Γ— 35K tokens = 3.5M tokens

βœ… With Knowledge Bank: agents learn from each other
   1 Γ— 35K + 99 Γ— 15K = 1.52M tokens

πŸ’° Savings: 56.6% total reduction (1.98M tokens saved)

2. KB Saturation Happens Fast

It takes just 2 agents per topic to reach 95% KB coverage:

  • Agent 1: Documents 60-70% (discovers fundamentals)
  • Agent 2: Documents 20-30% (fills gaps)
  • Agent 3+: Find KB complete, add minimal new knowledge

3. The Gotcha Feature: Learning from Mistakes

Gotchas have exceptional return on investment:

  • Cost to capture: 2-5 minutes agent time
  • Savings for next agent: 30-120 minutes debugging time
  • ROI: 10-30x return on knowledge investment

Why Multi-Dimensional Search Matters

Most vector databases use single-dimensional semantic searchβ€”one embedding captures all the content. But agents ask questions from different perspectives: what something does, when to use it, which systems it affects. Our multi-dimensional approach creates separate embeddings for four different aspects, then searches across all of them simultaneously:

Multi-Dimensional Search Architecture

  β€’ πŸ“ Content: "What is this about?" (weight 1.0)
  β€’ 🎯 Useful For: "When would I need this?" (weight 1.0)
  β€’ πŸ”§ Systems: "Which components?" (weight 0.3)
  β€’ βš™οΈ Tasks: "What operations?" (weight 0.3)

πŸ” Combined semantic similarity: queries match on meaning, not just keywords, even when phrased differently.

The Math: Vector Similarity in Action

Each knowledge entry and query is converted into embedding vectors (1536-dimensional arrays of numbers). Similarity is measured using cosine similarity, which calculates the angle between vectors:

Cosine Similarity Formula
similarity = (A Β· B) / (||A|| Γ— ||B||)
Mathematical range: -1 to +1 β€’ similarities between OpenAI embeddings typically fall between 0.4 and 1.0 β€’ we use a threshold of β‰₯ 0.5 for matches
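
In code, cosine similarity is nearly a one-liner. A minimal numpy sketch with a toy 3-dimensional example (real embeddings have 1536 dimensions):

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Dot product divided by the product of the vector magnitudes."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

a = np.array([0.2, 0.9, 0.1])
b = np.array([0.25, 0.8, 0.3])
print(round(cosine_similarity(a, b), 3))  # 0.969: nearly parallel vectors
```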

For multi-dimensional search, we combine scores from each dimension using weighted averages:

Final Score = (1.0 Γ— content_similarity + 1.0 Γ— useful_for_similarity + 0.3 Γ— systems_similarity + 0.3 Γ— tasks_similarity) / 2.6

where 2.6 is the total weight (1.0 + 1.0 + 0.3 + 0.3).
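
Using pgvector's cosine-distance operator `<=>` (similarity = 1 - distance), this weighted combination can run as a single SQL query. Here's a sketch against the illustrative kb_entries schema from earlier; it assumes all four embedding columns are populated, so NULL handling is omitted:

```python
# Weighted multi-dimensional search in one query. The %(...)s parameters
# are the four query embeddings (e.g. passed via psycopg after calling
# pgvector.psycopg.register_vector on the connection).
SEARCH_SQL = """
SELECT id, content,
       (1.0 * (1 - (content_embedding    <=> %(q_content)s)) +
        1.0 * (1 - (useful_for_embedding <=> %(q_useful)s)) +
        0.3 * (1 - (systems_embedding    <=> %(q_systems)s)) +
        0.3 * (1 - (tasks_embedding      <=> %(q_tasks)s))) / 2.6 AS score
FROM kb_entries
ORDER BY score DESC   -- highest combined similarity first
LIMIT 10;             -- apply the 0.5 threshold to the returned scores
"""
```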

Real Example: Why Dimensions Matter

Let's see how multi-dimensional search finds the right knowledge even when queries don't match exactly:

πŸ“š Stored Knowledge
  β€’ Content: "Docker container initialization with entrypoint scripts and health checks"
  β€’ Useful For: "Debugging why containers exit silently on startup"
  β€’ Systems: "Docker, Docker Compose, containers"

πŸ” Agent Query
  β€’ Content: "How to troubleshoot application startup failures"
  β€’ Useful For: "Fixing services that crash immediately"
  β€’ Systems: "Docker"
βœ… Similarity Breakdown:

  β€’ Content similarity: 0.68 (different words, similar meaning)
  β€’ Useful For similarity: 0.85 (strong match on the use-case!)
  β€’ Systems similarity: 0.92 (both mention Docker)
  β€’ Tasks similarity: 0.00 (neither side specifies tasks, so this dimension contributes nothing)

Final Weighted Score: (1.0 Γ— 0.68 + 1.0 Γ— 0.85 + 0.3 Γ— 0.92 + 0.3 Γ— 0.00) / 2.6 β‰ˆ 0.69

69% is a STRONG MATCH βœ“, comfortably above our 0.5 threshold.

This entry ranks highly even though the query uses completely different terminology!
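
A quick sanity check of that arithmetic:

```python
# Recomputing the worked example's weighted score.
weights = {"content": 1.0, "useful_for": 1.0, "systems": 0.3, "tasks": 0.3}
sims    = {"content": 0.68, "useful_for": 0.85, "systems": 0.92, "tasks": 0.0}

score = sum(weights[d] * sims[d] for d in weights) / sum(weights.values())
print(round(score, 2))  # 0.69
```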

πŸ’‘ Why This Matters: The "useful_for" dimension scored 0.85 despite completely different wording ("debugging silent exits" vs "fixing crashes"). This is the power of semantic embeddings: they understand meaning, not just keywords. A single-dimension search over content alone would have had only the 0.68 content similarity to go on and ranked this entry far lower; the use-case dimension is what pushes it to the top.

Conclusion

Building a Knowledge Bank for AI agents taught us that compound learning is not just possibleβ€”it's incredibly effective. With the right architecture and prompt engineering, we achieved:

  • 57% context reduction for fresh topics
  • 100% file reading elimination for documented topics
  • 95%+ KB coverage after just 2 agents per topic
  • 10-30x ROI on gotcha documentation

The future of AI-assisted development isn't just smarter agentsβ€”it's agents that learn from each other's work. The compound effect is real, and it's powerful.

Thanks for Reading This Far! πŸ€”

You probably thought of some challenges with this system: What about outdated knowledge? How do you prevent noise from low-quality entries? What about redundant or conflicting information? And the big one: how do you efficiently query across multiple vector spaces simultaneously with scalable response times as the KB grows to thousands of entries?

These are exactly the challenges we solved in production. Part 2 will cover:

  β€’ πŸ”„ Knowledge Lifecycle: versioning, deprecation, and automatic staleness detection
  β€’ ✨ Quality Control: validation gates, confidence scoring, and deduplication strategies
  β€’ ⚑ Performance at Scale: index optimization, caching strategies, and query batching

Want to be notified when Part 2 drops? Reach out or join our mailing list. And if you have questions or ideas about these challenges, we'd love to hear from you!

About the Author

Scott Alter is an expert in AI development and cloud solutions at Guru Cloud & AI.
