April 11, 2026

RAG vs MCP: How to Build an AI Knowledge Base That Actually Stays Current

ai rag mcp openai chatbot knowledge-base

Every company has the same problem. Someone joins a project and asks “who’s worked on this before?” Someone evaluates a new tool and asks “does anyone have experience with it?” Someone inherits a codebase and asks “what tech stack did we use here?” These questions have answers. Those answers live somewhere. They just don’t live anywhere anyone can find them without pinging three people on Slack.

I built an internal AI assistant to fix this. It plugs into our existing tools - documentation, project management, Slack archives, meeting notes - and gives employees a single place to ask. No ticket, no DM, no waiting. Ask a question, get an answer with sources. This post is about what I learned building it: specifically when RAG works well, when it doesn’t, how MCP picks up where RAG leaves off, and what it actually takes to make the whole thing produce answers worth trusting.

[Figure: RAG vs MCP architecture diagram]

RAG: right tool for knowledge that doesn’t change often

RAG - Retrieval-Augmented Generation - converts your documents into vector embeddings stored in a vector database. When a user asks a question, it gets embedded too, and the database returns the most semantically similar chunks. Those chunks get passed to the AI model as context. The model answers based on what it retrieved, not just what it was trained on.

The key word is semantically. RAG doesn’t do keyword matching. “Who handled the checkout integration?” will surface a Slack thread that says “payments module, let me know if you have questions” - because the embeddings understand they’re about the same thing. This is what makes it powerful for company knowledge.
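To make "semantically similar" concrete, here is a minimal sketch of the ranking a vector database performs, using toy three-dimensional vectors in place of real embedding model output (which typically has hundreds or thousands of dimensions):

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Similarity of two embedding vectors, in [-1, 1]."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

def top_k(query_vec: list[float], chunks: list[tuple[str, list[float]]], k: int = 3) -> list[str]:
    """Rank pre-embedded chunks by semantic closeness to the query."""
    scored = [(cosine_similarity(query_vec, vec), text) for text, vec in chunks]
    return [text for _, text in sorted(scored, reverse=True)[:k]]

# Toy "embeddings" standing in for real model output.
chunks = [
    ("payments module, let me know if you have questions", [0.9, 0.1, 0.0]),
    ("lunch options near the office", [0.0, 0.1, 0.9]),
]
query = [0.8, 0.2, 0.1]  # "Who handled the checkout integration?"
print(top_k(query, chunks, k=1))  # the payments thread wins despite sharing no keywords
```

A production vector store adds indexing, filtering, and approximate search at scale, but the ranking idea is the same: closeness in embedding space, not keyword overlap.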

RAG is the right choice when your data is relatively stable. Company wikis, onboarding documentation, project histories, archived Slack conversations, technical post-mortems, past client work - this content changes slowly. Embed it once, refresh it nightly, and it stays useful for weeks or months. The retrieval is fast, cheap, and works across all your sources simultaneously: one query hits everything.

For this project, the knowledge base pulls from a company wiki, Slack channel archives, project management tasks and docs, and meeting transcripts. A question like “who worked on the Shopify integration?” returns names from task history and Slack threads that would take 20 minutes to find manually. “What framework did we use for X?” pulls the answer from a documentation page or project thread. Questions that previously went to a manager now get answered in seconds.

We built this on top of OpenAI’s Responses API with their native vector store for embeddings and retrieval. The vector store handles incremental file syncing and chunking; the Responses API lets us attach both the vector store and MCP tools to a single model call, with the model deciding at inference time which to use. The setup is relatively lean - no separate orchestration layer needed.
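As an illustration, a single Responses API call can carry both retrieval mechanisms. The tool shapes below match the API as documented at the time of writing, but the model name, vector store ID, and MCP server URL are placeholders - check the current API reference before relying on them:

```python
# Sketch of one Responses API request that attaches a vector store (RAG)
# and an MCP server (live lookups); all IDs and URLs are placeholders.
def build_request(question: str, vector_store_id: str, drive_mcp_url: str) -> dict:
    return {
        "model": "gpt-4.1",  # placeholder model name
        "input": question,
        "tools": [
            # RAG: semantic search over the pre-embedded knowledge base.
            {"type": "file_search", "vector_store_ids": [vector_store_id]},
            # MCP: live lookups against Google Drive at inference time.
            {
                "type": "mcp",
                "server_label": "google_drive",
                "server_url": drive_mcp_url,
                "require_approval": "never",
            },
        ],
    }

request = build_request(
    "Is the Q1 proposal document ready to send?",
    vector_store_id="vs_abc123",              # placeholder ID
    drive_mcp_url="https://example.com/mcp",  # placeholder server
)
# resp = client.responses.create(**request)  # with an openai.OpenAI() client
```

The model sees both tools and decides per question whether to search the vector store, call the MCP server, both, or neither.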

The other thing RAG gets right: it forces answers to be grounded. The model can’t just make things up - it has to work from the retrieved context. When sources are shown alongside the answer, users can verify. That transparency builds trust faster than anything else.

Where RAG breaks down

RAG has one fundamental assumption baked in: the data was current when you embedded it.

That assumption holds fine for a company wiki that gets updated occasionally. It breaks badly for anything that changes frequently. Every time a document changes, you need to re-embed it and push the new version to the vector store. That’s manageable for dozens of files. It becomes a problem when you’re dealing with content that changes daily - or multiple times a day.

The more practical issue isn’t cost, it’s staleness. A nightly sync means your knowledge base is always up to 24 hours behind. For stable documentation, that’s fine. But for an active proposal, a contract in negotiation, or a Google Drive folder that three people are editing in real time - 24 hours is a long time. The bot will confidently answer from yesterday’s version.

Old information is the silent killer for these systems. A Slack thread from two years ago saying “we use Framework X for this” will surface even if you migrated away from it last quarter. The model retrieves it because it’s semantically relevant, not because it’s current. You can add timestamps to your chunks and tell the model to prefer recent sources, but it’s a patch, not a fix.

RAG also has no mechanism to know what it doesn’t know. If a question has no good sources in the vector store, most models will attempt an answer anyway using general training knowledge. That answer may sound reasonable and be completely wrong for your company’s context. A well-crafted system prompt can push back on this, but it remains a weak point.

MCP: for data that can’t wait for a nightly sync

MCP - Model Context Protocol - is an open standard that lets AI models call external tools at inference time. Instead of searching pre-embedded snapshots, the model queries live systems directly when it needs to.

The architectural difference matters. With RAG, you pre-process data into embeddings before any question is asked. With MCP, there’s no pre-processing at all - the model decides during a conversation that it needs to check a live source, calls the tool, gets the result, and incorporates it into the response. The data is always current because it’s fetched on demand.

MCP also doesn’t require your data to be unstructured text. RAG needs text to embed. MCP can work with structured APIs, spreadsheets, databases, or file systems - anything with a queryable interface. You don’t transform the data to fit a retrieval model; you access it as-is.

In this system, Google Drive is connected via MCP. When someone asks about an active document - a proposal still being drafted, a contract under review, a brief from this week - the model searches Drive directly rather than working from whatever was last indexed. The answer reflects what’s actually in the file right now.

The tradeoff: MCP calls add latency, and they’re only as broad as the tool allows. You can’t use MCP to semantically search across three years of Slack history efficiently - the API calls would be too slow and too expensive. MCP shines for targeted lookups against live systems. RAG shines for broad semantic search over large historical corpora. They’re not competing; they’re complementary.

RAG vs MCP: side by side

| | RAG | MCP |
| --- | --- | --- |
| Data freshness | Snapshot (nightly or on-demand sync) | Live - fetched at query time |
| Best for | Large historical corpora, archived content | Active documents, structured APIs, live data |
| Semantic search | Yes - finds conceptually related content | Depends on the tool's API capabilities |
| Setup effort | Higher - embedding pipeline, chunking strategy, sync logic | Lower - integrate an existing API or file system |
| Query latency | Fast - pre-indexed | Slower - live API call per query |
| Cost per query | Low | Variable - depends on API pricing |
| Data types | Unstructured text | Text, structured data, spreadsheets, databases |
| Scales to large corpora | Yes | Can get expensive at scale |
| Handles "I don't know" | Weak - may hallucinate if no good source | Depends on tool - empty results are explicit |
| When to use | Docs, Slack archives, meeting notes, wikis | Google Drive, live databases, current proposals |

The decision rule in practice: does a 24-hour-old answer give the user wrong information? If yes, that source belongs in MCP. If a nightly sync is good enough, it belongs in RAG.

The hybrid approach: let the model decide

In practice, you almost always want both. The question is: which retrieval mechanism should handle which type of question?

The good news is you don’t have to hard-code routing logic. Modern AI models - especially when given well-described tools - are surprisingly good at deciding when to use each. “Who worked on the e-commerce project two years ago?” will naturally pull from the vector store. “Is the Q1 proposal document ready to send?” will trigger a live Drive search. Many questions benefit from both: the model might pull historical context from the vector store and a current document from MCP, then synthesize an answer that draws on both.

When in doubt, RAG is cheaper, faster to set up, and easier to debug. Start there. Add MCP for the specific sources where freshness actually matters.

The system prompt: the hidden layer that shapes everything

Most writing about RAG and MCP focuses on retrieval mechanics. The system prompt gets less attention, but it has an outsized effect on answer quality. A few things we learned the hard way:

Require source citations explicitly. Without this instruction, the model will often blend sources without attributing them. Tell it to always cite the source name, file type, and date. Tell it what format citations should take. Users who can verify answers trust the system; users who can’t, don’t.

Instruct the model to prioritize recency. Left to its own judgment, the model doesn’t consistently weight recent documents over older ones. You have to say it explicitly: when multiple sources are relevant, prefer the most recent based on publication or modification date. This alone reduces stale-answer incidents significantly.

Tell it when to say “I don’t know.” The default behavior of most models is to produce something even when no good source exists. You need to explicitly permit - and encourage - the response “I don’t have reliable information on this.” Without that instruction, the system will hallucinate confidently. With it, users learn to trust the bot’s admissions as much as its answers.

Define tool-use transparency. When the model calls an MCP tool, instruct it to say so: which tool it used, which file it found, what date the file was last modified. This turns invisible behavior into visible behavior. Users understand why they’re getting a particular answer.

Describe your data sources clearly. The model needs to understand what each source contains to route questions correctly. A single paragraph describing each source type - what it holds, what format it’s in, how current it is - gives the model the context it needs to decide whether to hit the vector store, call an MCP tool, or say it doesn’t know.

Add hard security constraints. Documentation and knowledge bases should never contain credentials, tokens, or other sensitive configuration data. But as an additional safeguard in case such information slips in, include an explicit prohibition: never return passwords, API keys, secrets, or tokens, even if they appear in source documents. This is easy to overlook and costly to get wrong.
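Pulling those rules together, here is an illustrative system prompt fragment - the wording and source descriptions are examples, not the exact prompt we run:

```python
# Example system prompt encoding the rules above; source names,
# sync cadence, and exact wording are illustrative.
SYSTEM_PROMPT = """\
You are the internal knowledge assistant.

Sources:
- Vector store: wiki pages, Slack archives, project tasks, meeting
  transcripts. Synced nightly; content may be up to 24 hours behind.
- Google Drive (MCP tool): live documents; always current.

Rules:
1. Always cite every source you use: name, file type, and date.
2. When multiple sources are relevant, prefer the most recent by
   publication or modification date.
3. If no reliable source covers the question, answer exactly:
   "I don't have reliable information on this."
4. When you call a tool, say which tool you used, which file you
   found, and its last-modified date.
5. Never return passwords, API keys, secrets, or tokens, even if
   they appear in source documents.
"""
```

Keeping the prompt in version control alongside the rest of the code makes the "treat it like code" advice below concrete: every wording change gets a diff and a review.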

The system prompt is a first-class part of the system, not an afterthought. Treat it like code: version it, test it, iterate on it.

Data quality is the actual bottleneck

This is the part nobody tells you upfront, so I’ll say it clearly: the quality of the answers you get is almost entirely determined by the quality of the data you put in.

A well-tuned retrieval pipeline and a capable model won’t rescue bad source material. If your documentation is incomplete, outdated, or written ambiguously, the bot will answer confidently from that bad material. The AI doesn’t know the documentation is bad - it just retrieves what’s most semantically similar.

What this means in practice:

Expect iteration, not a one-time setup. Getting good answers out of a system like this requires repeated test cycles. You ask a batch of realistic questions, review the answers and their sources, identify where retrieval is failing or misleading, adjust the data or the system prompt, and repeat. It’s closer to a feedback loop than a build-and-deploy.

Chunking strategy matters more than it sounds. How you split documents into chunks for embedding affects which content gets retrieved. Chunks that are too large dilute the signal; chunks that are too small lose context. The right chunking strategy depends on your document types - meeting transcripts chunk differently than technical documentation. There’s no universal answer.
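As a starting point, a fixed-size window with overlap is the simplest strategy - the sizes below are illustrative defaults, not tuned values:

```python
def chunk_text(text: str, chunk_size: int = 800, overlap: int = 200) -> list[str]:
    """Split text into overlapping character windows.

    The overlap keeps a sentence that straddles a boundary retrievable
    from both neighboring chunks. Sizes are starting points to tune
    per document type, not recommendations.
    """
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    step = chunk_size - overlap
    return [text[i:i + chunk_size] for i in range(0, max(len(text) - overlap, 1), step)]
```

Real pipelines usually split on semantic boundaries first (headings, paragraphs, speaker turns in a transcript) and fall back to windows like this only inside oversized sections.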

Format affects retrieval quality. A well-structured document with clear headings retrieves better than a wall of prose. Metadata - dates, authors, project names - embedded in the content helps the model apply its recency and relevance rules. Cleaning and reformatting source documents before indexing is tedious and pays off significantly.
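One low-effort way to embed that metadata is to prefix each chunk with an inline header before indexing, so both the embedding and the model's recency rules can see it. The field names here are illustrative:

```python
from datetime import date

def with_metadata(chunk: str, source: str, author: str,
                  modified: date, project: str) -> str:
    """Prefix a chunk with an inline metadata header; field names
    and format are an example, not a standard."""
    header = (
        f"[source: {source} | project: {project} | "
        f"author: {author} | modified: {modified.isoformat()}]"
    )
    return f"{header}\n{chunk}"

print(with_metadata("We chose Next.js for the storefront.",
                    source="wiki/storefront.md", author="J. Smith",
                    modified=date(2025, 11, 3), project="Shopify integration"))
```

Because the header is part of the embedded text, a question mentioning the project name also pulls the chunk closer in embedding space - a useful side effect of inlining rather than storing metadata separately.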

Each data source needs its own assessment. Slack archives behave differently from ClickUp task history, which behaves differently from meeting transcripts. The noise-to-signal ratio is different. The temporal distribution is different. What counts as a “good chunk” is different. Don’t treat your entire knowledge base as a monolith - assess each source on its own terms and tune accordingly.

LLM parameters affect output style, not just quality. Temperature, top-p, and similar settings change how the model expresses uncertainty, how it synthesizes multiple sources, and how verbose it is. Lower temperature gives more consistent, predictable answers; higher temperature allows more synthesis across sources but introduces more variation. There’s no correct setting - it depends on your use case and your users’ tolerance for uncertainty in the responses.

The practical implication: plan to spend more time on data curation and iterative testing than on the retrieval infrastructure itself. The infrastructure is a weekend’s work. The data is ongoing.

What worked well

Slack archives are more valuable than you’d expect. People write a lot of institutional knowledge in Slack that never makes it into documentation. “How did we handle X for client Y?”, “Who has experience with Z?” - the answers exist in threads from two years ago that nobody remembers. Once it’s indexed, this knowledge becomes searchable. We were surprised how often the bot surfaced relevant Slack threads that would have been impossible to find by manual search.

Source citations build trust immediately. Every response shows which sources were used and their relevance scores. Users can click through and verify. This matters more than the accuracy of the answers - people will tolerate an occasional wrong answer much better if they can see where it came from and catch it themselves.

Incremental sync keeps costs low. Vectorizing your entire knowledge base from scratch daily is wasteful. Track content hashes per file, detect what actually changed, and only process the delta. In practice, a typical nightly run touches a small fraction of the total corpus. This makes the whole thing sustainable to run continuously.
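A sketch of the delta detection using SHA-256 content hashes (persisting `seen_hashes` between runs - a file, a database row - is left out):

```python
import hashlib

def changed_files(files: dict[str, bytes], seen_hashes: dict[str, str]) -> list[str]:
    """Return paths whose content hash differs from the previous run.

    `files` maps path -> current bytes; `seen_hashes` holds the stored
    path -> sha256 digests from the last sync. Only the returned delta
    gets re-chunked and re-embedded.
    """
    delta = []
    for path, content in files.items():
        digest = hashlib.sha256(content).hexdigest()
        if seen_hashes.get(path) != digest:
            delta.append(path)
            seen_hashes[path] = digest  # record for the next run
    return delta
```

Deletions need a second pass (any path in `seen_hashes` but not in `files` should be removed from the vector store), which is easy to forget and leaves ghost chunks behind if skipped.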

Meeting transcripts capture knowledge that would otherwise vanish. Decisions get made in meetings that never make it into any document. If you can get transcripts - from a tool like Bluedot, Fireflies, or similar - indexing them fills a real gap. Just handle the privacy question carefully: not every meeting should go into a shared knowledge base.

What didn’t work as expected

Source quality is everything, and most companies have bad sources. RAG faithfully retrieves whatever you put in. If your documentation is incomplete, outdated, or disorganized, the bot will answer confidently from bad material. We spent more time cleaning up source content than building the retrieval pipeline.

Slack noise is a real problem. Not every channel belongs in a knowledge base. Off-topic conversations, jokes, social channels, channels that were active for a project that no longer exists - all of this gets indexed and pollutes retrieval. You need to be selective. Index channels that carry real work conversations; ignore the rest.

The bot doesn’t know what it doesn’t know. When no good source exists, it will still produce an answer. Getting it to say “I don’t have reliable information on this” consistently requires significant prompt engineering effort. Even then it’s imperfect. Set user expectations accordingly - frame it as a research assistant, not an oracle.

Old information surfaces with full confidence. This one burned us a few times. A two-year-old Slack thread saying “we’re evaluating Tool X” will appear as a valid source even if you evaluated and rejected Tool X eighteen months ago. Adding clear date context to indexed content helps, but doesn’t fully solve the problem. Pruning stale content from the knowledge base is ongoing maintenance, not a one-time task.

What this type of system doesn’t do

It doesn’t learn from corrections. If the bot gives a wrong answer and someone corrects it in chat, that correction doesn’t update the knowledge base. Wrong information has to be fixed at the source document.

It doesn’t have access to per-user context. Everyone sees the same knowledge base. The bot can’t tell you about your specific project assignments, your personal conversations, or data scoped to your role. For that you’d need per-user retrieval with access control applied at the document level - significantly more complex to build.

Access control in source systems carries through. If you connect live systems via MCP, the integration is only as broad as the credentials it runs under. Files the service account can’t read don’t exist to the bot. This is actually a feature - it means your existing permissions model is respected - but it means you have to configure access intentionally.

Knowledge drift requires active management. If a process changes but the documentation doesn’t get updated, the bot will keep citing the old version indefinitely. There’s no automated way to detect when a document has become factually stale. Someone has to own knowledge base hygiene.

The actual hard problem

The AI part of this is not the hard part. Models are good, APIs are mature, vector databases are widely available. You can have a working prototype in a day.

The hard part is everything else: curation, data quality, system prompt tuning, iterative testing, and ongoing maintenance. Deciding which sources belong in the knowledge base and which don’t. Keeping content current. Managing the quality of what gets indexed. Handling edge cases where no good source exists. Getting your team to trust the bot enough to use it, and getting them to maintain the sources it depends on.

The value of a company knowledge bot is almost entirely a function of the quality of the knowledge you put into it. Better sources, better answers. Garbage in, garbage out - but delivered with AI confidence, which makes it worse than just having nothing.

Start narrow. Pick the two or three sources where your team actually stores real, current knowledge. Get those working well. Expand only after you trust what you have.


Need someone to build this for you?

If your team needs a knowledge base or internal AI assistant and you’d rather have someone build it than figure it out from scratch - that’s something I do. Reach out at pawel.dymek@gmail.com and we can talk through what makes sense for your setup.

© 2026 paweldymek.com