Frank DENIS random thoughts.

LLMs don't need your secret tokens (but MCP servers hand them over anyway)

2025-06-16T00:00:00+02:00

A quick security PSA for anyone wiring large-language models to their back-end toys.

What’s an MCP server, again?

An MCP (Model Context Protocol) server is a bundle of local functions. Each function comes with a short plain-English description. When an LLM reads a user request that matches a description, it calls the function, gathers the result, and keeps chatting.

It feels like magic:

“Enable that feature for me.” “Why is Service 42 broken?”

Every major cloud vendor ships an MCP stack, and indie projects are everywhere.

Where things go off the rails: Secrets everywhere

Most MCP servers run locally so they can poke databases, hit internal APIs, and rummage through things you would never expose publicly. That is great until you remember what actually happens:

“List my services” triggers an MCP call to your API.
The API replies with service IDs, auth tokens, region keys, and so on.
MCP forwards the entire blob right back to the LLM.
The LLM hides the secrets from the user, yet the strings now sit inside a third-party context window.

Running the model on your own machine avoids that, but most of us are not lugging 230-billion-parameter beasts around.

The model literally does not need your token

An LLM’s reasoning is unchanged if

sk-f8fad...

is replaced by

[[ENCRYPTED-TOKEN:123]]

The model only cares about making the distinction between tokens, not the tokens themselves.

A dead-simple fix: Encrypt before you send

Below is an outline that any MCP implementation can follow to keep secrets local.

1 Spotting secrets

Decide what “looks like a secret.” The quick route is a deny-list of regular expressions for well-known key formats (AWS, Google, and so on).
Anything that matches is flagged as sensitive; anything else passes through untouched.
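
A minimal sketch of such a deny-list, assuming three illustrative patterns (an AWS access key ID, a Google API key, and a generic "sk-" style token). The pattern list and the `find_secrets` name are my own placeholders, and a real deployment would need a much longer list.

```python
import re

# Illustrative deny-list; extend it with the key formats you actually use.
SECRET_PATTERNS = [
    re.compile(r"\bAKIA[0-9A-Z]{16}\b"),     # AWS access key ID
    re.compile(r"\bAIza[0-9A-Za-z_\-]{35}"),  # Google API key
    re.compile(r"\bsk-[0-9A-Za-z]{20,}"),     # generic "sk-" style API token
]


def find_secrets(text: str) -> list[str]:
    """Return every substring of `text` that matches a deny-list pattern."""
    found = []
    for pattern in SECRET_PATTERNS:
        found.extend(match.group(0) for match in pattern.finditer(text))
    return found


# Dummy API response using AWS's documented example key ID.
response = '{"id": "svc-42", "key": "AKIAIOSFODNN7EXAMPLE", "region": "eu-west-1"}'
print(find_secrets(response))
```

Anything the function returns gets flagged; everything else in the response passes through untouched.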

2 Encrypting outbound values

Generate a random symmetric key when the session starts.
Whenever the MCP prepares a response for the LLM, replace every flagged value with an encrypted placeholder, such as:
```
[[ENCRYPTED:...]]
```
Keep the symmetric key in RAM only. When the session ends, the key disappears.
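AES-CBC with a fixed IV is acceptable in this context: determinism is actually useful here, because the same secret always maps to the same placeholder and the model can still tell values apart. An authenticated deterministic mode such as AES-SIV is a safer choice, since placeholders come back through the model and tampered ones should be rejected. More important than the algorithm itself is ensuring the key never leaves the process.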

3 Decrypting inbound values

Before a local function calls a real API, scan its inputs for placeholders.
Replace each placeholder with the plaintext secret by decrypting it with the in-memory key.
From the tool’s point of view nothing changed; from the model’s point of view the secret never existed.
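
Here is a rough sketch of the outbound/inbound round trip. To stay within Python's standard library, it swaps the encryption step for an in-memory lookup table keyed by random IDs. That is a different technique from the symmetric encryption described above, but it has the same property that nothing recoverable leaves the process. The `SessionVault` class, the secret regex, and the dummy token are all assumptions for illustration.

```python
import re
import secrets

SECRET_RE = re.compile(r"sk-[0-9A-Za-z]{8,}")  # assumed secret format
PLACEHOLDER_RE = re.compile(r"\[\[ENCRYPTED:([0-9a-f]{16})\]\]")


class SessionVault:
    """Maps secrets to opaque placeholders; state lives only in this process."""

    def __init__(self) -> None:
        self._by_secret: dict[str, str] = {}
        self._by_id: dict[str, str] = {}

    def protect(self, text: str) -> str:
        """Outbound: replace each flagged value before it reaches the LLM."""
        def swap(match: re.Match) -> str:
            secret = match.group(0)
            if secret not in self._by_secret:
                ident = secrets.token_hex(8)  # 16 random hex characters
                self._by_secret[secret] = ident
                self._by_id[ident] = secret
            return f"[[ENCRYPTED:{self._by_secret[secret]}]]"
        return SECRET_RE.sub(swap, text)

    def reveal(self, text: str) -> str:
        """Inbound: restore placeholders before a tool calls the real API."""
        def swap(match: re.Match) -> str:
            # Unknown placeholders are left untouched rather than guessed.
            return self._by_id.get(match.group(1), match.group(0))
        return PLACEHOLDER_RE.sub(swap, text)


vault = SessionVault()
original = '{"service": "svc-42", "token": "sk-EXAMPLE0000000000"}'
outbound = vault.protect(original)  # what the LLM sees
inbound = vault.reveal(outbound)    # what the local tool sees
```

The same secret always yields the same placeholder within a session, so the model can still distinguish one token from another.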

4 Handling files

Sometimes the MCP task writes a file, for instance, an invoice PDF, and returns the file path to the model. That still leaks if the model then asks to grep the workspace. Two pragmatic steps:

Encrypt the file contents with a key that only the user can decrypt.
Hand the LLM only the path plus an explicit tag such as “(confidential, do not read).”

Anyone who actually needs the file can decrypt it locally; the LLM should not even try.

5 Version-control hygiene

Because the placeholders are meaningless without the key and the key never hits disk, committing chat logs or workspace files to git does not expose anything valuable. The key that could recover the real secrets stays in RAM the whole time.

“But what about invoices and logs?”

Many workflows dump data into local files, then hand the path to the LLM. Encrypting the file itself plus a short “do not read” note is enough to mitigate casual secret leakage. If multiple teammates need the file, share the key through your normal password manager or KMS; the LLM never sees it.

Why isn’t everyone already doing this?

Developers chase the “works on my laptop” moment first, security later. The official SDK examples focus on functionality, not threat models, and their patterns get copy-pasted across projects. I have poked at several “enterprise-ready” MCP repos and found zero built-in secret-scrubbing.

Takeaways

MCP servers are powerful: they turn five clicks into one chat message.
They are also noisy: every call ferries your private tokens into the model prompt.
Fixing it is cheap: encrypt on the way out, decrypt on the way back, and keep your org’s crown jewels off someone else’s GPU.

Apps shouldn’t let users enter OpenSSL cipher-suite strings

2025-06-06T00:00:00+02:00

Letting people type in raw OpenSSL cipher‐suite strings is a terrible idea.

I’ve seen too many well-meaning admins try to “fix” TLS by pasting in a random string they found on the internet, only to make things worse.

Free-form cipher strings are a bad user interface, modern TLS defaults are usually just fine, and a simple “checkbox” approach where each box adds or removes a property can give you both flexibility and safety.

Why “cipher=foo:bar:baz” Is a Bad User Experience

Think back a few years, when the BEAST and POODLE attacks were all over the news. People panicked and grabbed “RC4” because someone said “RC4 fixes BEAST.” They never stopped to ask “But is RC4 safe?” The answer turned out to be “No, it’s broken, too.” By fiddling with a raw cipher string, lots of admins basically traded one weakness for another.

Here’s the thing: cipher-suite syntax (in OpenSSL or similar libraries) is basically its own tiny programming language. You need to know:

Which key exchanges exist (ECDHE, DHE, RSA, etc.), and which ones are safe today
Which bulk ciphers are strong (AES-GCM, ChaCha20-Poly1305) versus which are broken (RC4, 3DES)
Which hashing algorithms to prefer (SHA-256, SHA-384) versus those to avoid (MD5, SHA-1)
How to combine or exclude pieces with special symbols (think ! or + in OpenSSL’s syntax)

That’s a lot to learn. The minute you let someone write their own string, they usually just Google “best OpenSSL ciphers,” copy a block of text, and paste it into their config. Six months later, they’ve forgotten it’s even there, and by then some of those ciphers are already obsolete or broken.

Today’s TLS libraries, such as OpenSSL 1.1.x and above, Go’s crypto/tls, and BoringSSL, already ship with sensible defaults. They favor forward-secure key exchanges (ECDHE over X25519 or P-256), use AEAD ciphers (AES-GCM or ChaCha20-Poly1305), and reject known-broken options. If you’re using a modern TLS library, 99 percent of the time you should just leave the defaults alone.

When You Might Actually Need to Tweak Ciphers

Now, I’m not saying “never touch your TLS settings.” There are valid reasons to override defaults, mainly compliance requirements. For example:

FIPS 140-3 often requires you to use only algorithms that have passed NIST’s validation. Some curves, like Curve25519, were only approved in early 2023.
Minimum security level. Some policies insist on “at least 256 bits of security” in the handshake. That usually means P-521 + AES-256, or another key exchange that actually delivers ≥ 256 bits of strength. RSA 3072 + AES-256 does not qualify: RSA 3072 is only about 128 bits.
Forward secrecy is sometimes mandated so that session keys can’t be retroactively compromised.
There might even be future rules demanding post-quantum key exchange.

In short, if you must obey a checklist of “NIST-approved curves,” or “only 256 bits or higher,” or “use PQ hybrid KEMs,” you do need to control which algorithms get used. But there is a better way than asking people to type in a giant cipher string themselves, especially if you want them to mix “FIPS” and “Post-quantum” together.

Checkboxes Instead of Cryptic Strings

Imagine if, instead of pasting in raw cipher names like ECDHE-RSA-AES128-GCM-SHA256:…, you simply saw a list of boxes, each of which adds or removes a category of algorithms from the allowed set:

FIPS 140-3 approved
≥ 256 bit security
Forward secrecy
Post-quantum KEM (experimental)
Disable legacy RSA key exchange (TLS 1.2 only)
Disable TLS 1.0 and TLS 1.1
Disable deprecated/broken ciphers (RC4, 3DES, MD5, SHA-1)
Include GOST/KCMVP/GMTLS algorithms

Each box corresponds to a property or a category. Checking it means “add these algorithms to my allowed list,” except for boxes labeled “Disable…,” which remove those algorithms. If you check multiple boxes, you get the union of all the “add” categories, minus any “disable” categories. For example:

If you check FIPS 140-3 approved, you’ll include everything NIST-validated.
If you also check Post-quantum KEM, you’ll add in any experimental PQ KEMs so you can test hybrid PQ+classical designs, even though those PQ KEMs aren’t FIPS-approved yet.
If you check Disable legacy RSA key exchange, you’ll remove any TLS_RSA_* suites under TLS 1.2.
If you check Disable deprecated/broken ciphers, you’ll remove RC4, 3DES, MD5, and SHA-1 from whatever else you selected.

Your users don’t have to memorize every single cipher name or know exactly which suite corresponds to which property. They just say “I want FIPS” and “I want PQ” and “no old broken stuff,” and let the code handle the details.

Advantages of the Checkbox Approach

Plain-English intent. Each checkbox describes a property. You understand “post-quantum KEM” or “256 bits of security” at a glance, without needing to parse a colon-separated list of hyphenated tokens.
Mix categories freely. Want to run a FIPS-validated configuration and experiment with ML-KEM hybrids while also removing legacy RSA? Just check FIPS 140-3 approved, Post-quantum KEM, and Disable legacy RSA key exchange. The allowed set becomes (FIPS algorithms) ∪ (PQ KEMs) minus (TLS_RSA_* suites) and minus any “deprecated” bits if that box is checked.
Future-proofing. When a new algorithm gets approved (for example, a new PQ KEM or a new curve), you tag it internally as “post-quantum,” “FIPS,” or “GOST.” The UI never changes: admins just see that those checkboxes now pull in the new algorithms automatically.
Reduced human error. Nobody accidentally leaves RC4 or 3DES in the mix. Checking Disable deprecated/broken ciphers makes sure none of those can slip in, even if you also checked another “add” category.
Compliance made easy. Instead of sending auditors a page of tiny cipher names, you can show them exactly which boxes are checked. “FIPS 140-3 approved + PQ KEM + Disable legacy RSA + No TLS 1.0/1.1” is far more transparent than a colon-separated string.

Pitfalls You Have to Watch Out For

Of course, “checkboxes” doesn’t magically solve every corner case. You still need to build a correct mapping from checkboxes to real cipher lists. Here are a few snag points:

Defining “≥ 256 bits” properly. A lot of people assume “X25519 + AES-256” is 256 bits, but it isn’t. X25519 has about 128 bits of elliptic-curve security, and the handshake is only as strong as its weakest part. RSA 3072 + AES-256 does not get you there either (RSA 3072 is about 128 bits, RSA 2048 about 112). If you truly want 256 bits, you need a P-521 curve (≈256-bit ECC) with AES-256. Be crystal clear about how you measure “security bits” in each category.
TLS 1.3 vs. TLS 1.2 differences
- In TLS 1.3, cipher suites are just AEAD+hash (e.g. TLS_AES_256_GCM_SHA384); key exchange is negotiated separately via supported_groups (curves or KEMs).
- In TLS 1.2, a “cipher suite” bundles together key exchange + bulk cipher + MAC. Filtering for “forward secrecy” means including only ECDHE_* or DHE_* suites; if you also check “Include GOST,” you add GOST KEX suites, and so on.
Your code has to handle these two worlds separately. If Post-quantum KEM is checked, you only enable TLS 1.3 (since TLS 1.2 can’t do standard PQ KEMs). If Disable legacy RSA key exchange is checked, you remove TLS_RSA_* suites under TLS 1.2 but can still add other categories side by side.
Contradictory or empty selections Since checkboxes combine additions and removals, you’ll still need rules for “disable deprecated” or “disable TLS 1.0/1.1.” If someone checks Disable TLS 1.0/1.1 but leaves every other box unchecked, you should default to “just use the library’s normal TLS 1.2+1.3 defaults.” If they check something that yields no available ciphers (for instance, “≥ 256 bits” on a library that only supports 128-bit curves), you need to detect that at startup and error out:

“No cipher suites match your selection. Please revise your boxes.”
Keeping your taxonomy up to date. Internally, you need a maintained list of “tags” for each algorithm: “FIPS,” “security_256bit,” “forward_secrecy,” “post_quantum,” “deprecated,” “gost,” etc. That list must be refreshed whenever standards bodies approve new stuff (NIST FIPS 140-3 updates, PQ milestones, or new GOST releases). If you tag something incorrectly, admins might inadvertently violate compliance.
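
To make the “≥ 256 bits” pitfall concrete, here is a small sketch that rates a combination by its weakest component. The strength figures follow the comparable-strength table in NIST SP 800-57 Part 1, with X25519 at its commonly quoted value of about 128 bits. The function names are mine, and the sketch deliberately ignores certificate and signature strength, which a real check would also need to cover.

```python
# Approximate security strengths in bits (NIST SP 800-57 Part 1;
# X25519 is commonly rated at about 128 bits).
KEY_EXCHANGE_BITS = {
    "RSA-2048": 112,
    "RSA-3072": 128,
    "RSA-15360": 256,
    "X25519": 128,
    "P-256": 128,
    "P-384": 192,
    "P-521": 256,
}
CIPHER_BITS = {"AES-128-GCM": 128, "AES-256-GCM": 256, "CHACHA20-POLY1305": 256}


def effective_bits(key_exchange: str, cipher: str) -> int:
    """A combination is only as strong as its weakest component."""
    return min(KEY_EXCHANGE_BITS[key_exchange], CIPHER_BITS[cipher])


def meets_minimum(key_exchange: str, cipher: str, minimum: int = 256) -> bool:
    """True if the combination reaches the requested security level."""
    return effective_bits(key_exchange, cipher) >= minimum


print(effective_bits("X25519", "AES-256-GCM"))  # 128
print(meets_minimum("P-521", "AES-256-GCM"))    # True
```

A “security_256bit” tag on a TLS 1.3 suite therefore says nothing until the negotiated group is checked as well.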

How I’d Build This in Practice

Here’s a rough sketch of the steps I’d take if I were writing a TLS library or an admin tool that used a checkbox approach instead of raw cipher strings.

Maintain a Tag Table. Create a simple JSON or YAML file that lists every cipher suite, AEAD, and key-exchange group you support. For example:

tls13_cipher_suites:
  - name: TLS_AES_256_GCM_SHA384
    tags: ["fips", "forward_secrecy", "security_256bit"]
  - name: TLS_CHACHA20_POLY1305_SHA256
    tags: ["experimental", "forward_secrecy", "security_256bit"]
  - name: TLS_AES_128_GCM_SHA256
    tags: ["fips", "forward_secrecy", "security_128bit"]

tls12_cipher_suites:
  - name: TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384
    tags: ["fips", "forward_secrecy", "security_256bit"]
  - name: TLS_ECDHE_ECDSA_WITH_CHACHA20_POLY1305_SHA256
    tags: ["experimental", "forward_secrecy", "security_256bit"]
  - name: TLS_RSA_WITH_AES_256_GCM_SHA384
    tags: ["legacy_no_forward_secrecy", "security_256bit"]
  - name: TLS_RSA_WITH_AES_128_CBC_SHA
    tags: ["deprecated", "security_128bit"]
  # …and so on…

And another list for key exchanges / curves / KEMs:

kex_groups:
  - name: X25519
    tags: ["forward_secrecy", "security_128bit", "fips_as_of:2023-02-01"]
  - name: p256
    tags: ["forward_secrecy", "security_128bit", "fips"]
  - name: mlkem768x25519
    tags: ["post_quantum", "experimental", "not_fips"]
  - name: gost2001
    tags: ["gost", "legacy_no_forward_secrecy"]

Present the Checkboxes. In your config UI or CLI, show something like:
```
[ ] FIPS 140-3 approved
[ ] Minimum 256 bits of security
[ ] Forward secrecy
[ ] Post-quantum KEM (TLS 1.3 only – experimental)
[ ] Disable legacy RSA key exchange (TLS 1.2 only)
[ ] Disable TLS 1.0 and TLS 1.1
[ ] Disable deprecated/broken ciphers (RC4, 3DES, MD5, SHA-1)
[ ] Include GOST/KCMVP/GMTLS algorithms
```
Each box is an “inclusion” category except for the ones prefixed with “Disable,” which are explicit exclusions. When boxes are checked, your final allowed set is:

(Union of all suites/groups matching any checked “include” box) minus (anything matching any checked “disable” box).

If no “include” boxes are checked at all, default to your library’s built-in “sane defaults” (which cover most use cases).
Filter at Startup. When the server or client starts, read the user’s selections. Then:
1. Build an “include_set” by union-filtering your tables:
  - If FIPS 140-3 approved is checked, add everything tagged “fips.”
  - If ≥ 256 bits is checked, add everything tagged “security_256bit.”
  - If Post-quantum KEM is checked, add everything tagged “post_quantum.”
  - If Forward secrecy is checked, add everything tagged “forward_secrecy.”
  - If Include GOST is checked, add everything tagged “gost.”
  (If no include-boxes are checked at all, skip this step and just pull in “defaults.”)
2. Build an “exclude_set”:
  - If Disable legacy RSA key exchange is checked, add everything tagged “legacy_no_forward_secrecy.”
  - If Disable deprecated/broken ciphers is checked, add everything tagged “deprecated.”
  - If Disable TLS 1.0 and 1.1 is checked, exclude any suite or KEX group that only works in TLS 1.0/1.1.
3. Compute allowed = include_set – exclude_set.
  - If that’s empty (and at least one include-box was checked), error out:
    
    “No cipher suites match your selection. Please revise your boxes.”
  - If no include-boxes were checked, use your TLS library’s defaults (after applying any exclude-boxes).
4. Separate by TLS version:
  - For TLS 1.3, parse allowed to pick only AEAD+hash tuples (e.g. TLS_AES_256_GCM_SHA384, TLS_CHACHA20_POLY1305_SHA256, etc.). Also build allowed_groups from PQ or ECC KEMs (X25519, ML-KEM, etc.).
  - For TLS 1.2, parse allowed to pick only the full suites (ECDHE_RSA_WITH_AES_256_GCM_SHA384, RSA_WITH_AES_256_CBC_SHA, etc.).
If either TLS 1.3 or TLS 1.2 ends up with no available ciphers (and the admin expects that version to be enabled), you should warn or error accordingly.
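
The include/exclude steps above can be sketched in a few lines. This is only an outline under stated assumptions: the table is a tiny excerpt mirroring the YAML example, `DEFAULTS` is a hypothetical stand-in for the library’s built-in defaults, and `compute_allowed` is a name I made up.

```python
# Illustrative excerpt of a tag table (names and tags mirror the YAML example).
TAG_TABLE = {
    "TLS_AES_256_GCM_SHA384": {"fips", "forward_secrecy", "security_256bit"},
    "TLS_AES_128_GCM_SHA256": {"fips", "forward_secrecy", "security_128bit"},
    "TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384": {"fips", "forward_secrecy", "security_256bit"},
    "TLS_RSA_WITH_AES_256_GCM_SHA384": {"legacy_no_forward_secrecy", "security_256bit"},
    "TLS_RSA_WITH_AES_128_CBC_SHA": {"deprecated", "security_128bit"},
}
# Hypothetical stand-in for the library's built-in defaults.
DEFAULTS = {"TLS_AES_256_GCM_SHA384", "TLS_AES_128_GCM_SHA256"}


def compute_allowed(include_tags, exclude_tags, table=TAG_TABLE, defaults=DEFAULTS):
    """Union of everything matching an include tag, minus every exclude match."""
    if include_tags:
        include_set = {name for name, tags in table.items() if tags & include_tags}
    else:
        # No include box checked: start from the library defaults.
        include_set = set(defaults)
    exclude_set = {name for name, tags in table.items() if tags & exclude_tags}
    allowed = include_set - exclude_set
    if include_tags and not allowed:
        raise ValueError(
            "No cipher suites match your selection. Please revise your boxes."
        )
    return allowed


allowed = compute_allowed(
    include_tags={"security_256bit"},
    exclude_tags={"legacy_no_forward_secrecy", "deprecated"},
)
print(sorted(allowed))
```

Splitting the result by TLS version (step 4) would be a second pass over `allowed`; it is left out here to keep the sketch short.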

Apply to the Library. For OpenSSL, build strings:

SSL_CTX_set_min_proto_version(ctx, TLS1_2_VERSION);
SSL_CTX_set_max_proto_version(ctx, TLS1_3_VERSION);
// TLS 1.2 and below: OpenSSL-style suite names.
SSL_CTX_set_cipher_list(ctx, "ECDHE-RSA-AES256-GCM-SHA384:...");
// TLS 1.3: suites are configured separately.
SSL_CTX_set_ciphersuites(ctx, "TLS_AES_256_GCM_SHA384:...");
// Key-exchange groups; accepted names depend on the OpenSSL version and providers.
SSL_CTX_set1_groups_list(ctx, "X25519:secp521r1:mlkem768");

The key is that everything in allowed gets wired in, and everything in excluded is dropped.
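
For comparison, here is roughly how the same wiring might look through Python’s `ssl` module, which wraps OpenSSL. Treat it as a sketch: the `build_context` helper is hypothetical, the name mapping covers only two suites, and `set_ciphers` only affects TLS 1.2 and below (Python does not let it disable TLS 1.3 suites).

```python
import ssl

# IANA suite names mapped to the names OpenSSL expects for TLS 1.2.
IANA_TO_OPENSSL = {
    "TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384": "ECDHE-RSA-AES256-GCM-SHA384",
    "TLS_ECDHE_ECDSA_WITH_AES_256_GCM_SHA384": "ECDHE-ECDSA-AES256-GCM-SHA384",
}


def build_context(allowed_tls12: list[str]) -> ssl.SSLContext:
    """Create a server context restricted to the allowed TLS 1.2 suites."""
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
    # Equivalent of the "Disable TLS 1.0 and TLS 1.1" checkbox.
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    ctx.maximum_version = ssl.TLSVersion.TLSv1_3
    # Colon-separated OpenSSL cipher string, generated rather than typed by hand.
    ctx.set_ciphers(":".join(IANA_TO_OPENSSL[name] for name in allowed_tls12))
    return ctx


ctx = build_context(list(IANA_TO_OPENSSL))
enabled = {cipher["name"] for cipher in ctx.get_ciphers()}
```

The cipher string still exists, but only as generated output that no admin ever has to read or edit.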

Offer Presets for Novices. On top of checkboxes, provide named presets such as “Modern” and “Intermediate,” in the spirit of Mozilla’s SSL Configuration Generator, plus compliance-oriented ones like “Strict-FIPS” and “PQ-Experiment.” Each preset simply checks a known combination:
- Modern: TLS 1.3 only, Forward secrecy, Disable TLS 1.0/1.1, Disable deprecated/broken ciphers
- Intermediate: TLS 1.3 + TLS 1.2, Forward secrecy, Disable deprecated/broken ciphers
- Strict-FIPS: FIPS 140-3 approved, ≥ 256 bit security, Disable TLS 1.0/1.1, Disable deprecated/broken ciphers
- PQ-Experiment: Post-quantum KEM, Forward secrecy, Disable deprecated/broken ciphers (might produce TLS 1.3 with ML-KEM hybrids only)
But when someone clicks “Advanced,” they see all the individual checkboxes and can tweak further, adding Post-quantum on top of FIPS, or enabling Disable legacy RSA alongside everything else.

A Few Hard Reality Checks

Checkbox ≠ Magic. You still have to maintain that tagging table carefully. If a new PQ KEM or GOST variant appears, you must tag it correctly. If you mis-tag something “post_quantum” as “fips,” admins will think they’re doing a FIPS-compliant setup when they’re not.
User Education Still Matters. Even with checkboxes, some choices are subtle. “≥ 256 bits” often confuses people, because they think “AES-256 = 256 bits” and “X25519 = 256 bits,” when in reality X25519 is about 128 bits. You need tooltips or documentation explaining:

To get true 256-bit security, you must use a 521-bit ECC curve (P-521) with AES-256. RSA 3072 + AES-256 and X25519 + AES-256 are each only about 128 bits net.
TLS 1.3 Is Different. In TLS 1.3, cipher suites are just AEADs, so your checkboxes pull in AEADs (AES-GCM, ChaCha20-Poly1305) and separate KEM groups (X25519, PQ). Make sure your UI reflects that separation; otherwise people will wonder why checking “≥ 256 bits” didn’t filter out some TLS 1.3 suites, etc.
“Disable TLS 1.0/1.1” Is Still a Bit Odd. If that box is checked, you simply set MinVersion = TLS1_2. But some older implementations might not let you fully remove TLS 1.1 at runtime; you have to ensure the code path really enforces that.

Why This Matters Today

Most people today don’t, and shouldn’t, touch their cipher lists. Go’s crypto/tls does not let you choose TLS 1.3 cipher suites at all, because its defaults are strong enough. OpenSSL’s defaults (since 1.1.x) are quite sensible. Browsers do the right thing. Android, iOS, and Windows ship a solid TLS stack.

But a small minority of operators will do anything they can to “tune performance” or “optimize security” by hand. They go down the rabbit hole of “Is AES-GCM faster than ChaCha?” or “Which curve should I use?” Most of the time, they end up with a brittle, outdated configuration that nobody ever revisits.

A checkbox approach still lets them satisfy real compliance requirements: FIPS, government regulations, PCI, or “we need at least 256 bits of security.” It also lets them opt into new things like PQ KEMs without having to learn every single cipher name. And when an algorithm gets deprecated or broken, your code removes that tag so nobody picks it again.

In short, ditch free-form cipher strings. Let your users pick what properties they need: FIPS, security bits, forward secrecy, PQ, rather than forcing them to pick exactly which ciphers. You’ll save yourself (and future you) countless headaches down the road.

TL;DR

Stop letting people type raw cipher-suite strings. It’s too easy to misconfigure and forget later
Modern TLS defaults are already strong. Only tweak ciphers if you have a compliance reason
Use a checkbox approach for high-level properties (“FIPS 140-3 approved,” “≥ 256 bits,” “forward secrecy,” “post-quantum,” etc.) and compute the union, minus any “disable” categories
Maintain a simple “tags” table for every cipher suite/AEAD/group. Update it when standards bodies approve or deprecate algorithms
Provide presets for “Modern,” “Intermediate,” “Strict-FIPS,” and “PQ-Experiment” so novices aren’t overwhelmed

Trust the defaults, and let checkboxes handle the rest. Your users, and future you, will thank you.

Building a smarter chatbot - Why you need FAQ-links + RAG (and why everyone else gets it wrong)

2025-06-04T00:00:00+02:00

Let’s cut the fluff: most “AI chatbots” out there either hallucinate like mad or take forever to respond because they’re trying to search through every scrap of documentation on every turn. If you think “just use RAG for everything” or “just fine-tune a model on our docs,” you’re kidding yourself. Expect late-night bug hunts, and the endless “why is the bot saying this?” calls from support.

In this post, I’m going to tell you exactly why you must combine a lightweight FAQ-links layer with a proper RAG fallback. No hand-waving about “best practices” or “industry standards”—I’ll lay out what sucks about pure RAG, why a curated FAQ-links layer saves your sanity and how to glue them together. If you don’t do this, you’re in for a world of pain.

1. Why RAG-Only Chatbots Suck

I’m going to be blunt: if your chatbot runs a RAG cycle (embed the query, retrieve from a massive index, re-rank, generate) for every single question, you’ll end up with one of two realities:

Slow, expensive responses. In a demo, it might take 500 ms to do one retrieval. But in the real world, your vector index has millions of docs, your re-ranker is a cross-encoder that chews through CPU, and your LLM call costs real money. Every single chat turn will feel like molasses—users hate waiting, and your CFO hates seeing that credit card bill.
Hallucinations and irrelevant junk. If your index isn’t clean, you’ll serve up half-baked, slightly related passages, and your LLM will spit out answers that sound plausible but are basically made-up. Users figure that out fast and lose trust. Then they either call support anyway or click away in frustration.

Let me give it to you straight: RAG can only work if you have immaculate document hygiene, a solid re-ranker, and the patience to tune thresholds day in and day out. Most teams skip that step. They train an embedding model, shove everything into Pinecone or whatever, and hope for the best. Spoiler: it doesn’t work. You end up with “Did you mean…?” loops, endless fine-tuning, and still a 30 % chance the answer is “I’m sorry, I don’t know.”

If you want a “pure RAG” system, go ahead—but be ready to endure slow responses, angry customers, and sleepless nights. Or read on and learn how to solve 80 % of your support queries in milliseconds without hallucinations.

2. The Real Reason You Need a Curated FAQ-Links Layer

Here’s the honest truth: most user questions fall into a small, predictable set of topics. Billing, password resets, basic configuration—these are your bread and butter. And nobody wants to see the “RAG pipeline spinning up” just to explain how to reset a password. They want a bullet-pointed answer that’s been checked by a human (or at least verified by AI) and won’t change on them.

That’s where a curated FAQ-links layer comes in. But let’s be crystal clear: I’m not talking about an old-school static FAQ page that’s a 10 MB PDF with five typos. I’m talking about a searchable FAQ index where each entry is basically:

Q: “How do I reset my password?”
Links: A direct pointer to the exact section in your documentation (so the user can click through if they want more context).

By front-loading these high-value Q→document pointers, you get:

100 % Accuracy on Common Questions. Once your system matches a question to an FAQ entry, it retrieves the right docs and extracts the precise snippets. There’s no chance of the LLM inventing an extra step or mis-wording something.
Blazing-Fast Replies. Matching a user’s query against a small FAQ index (hundreds or thousands of entries) is extremely fast—often just a few milliseconds. You don’t need to spin up a full RAG pipeline unless it’s truly necessary.
Reduced Hallucination. By design, you only ever feed the LLM snippets you have retrieved from your own official documentation. That dramatically cuts down on invented or outdated info.

Most teams skip this because “writing FAQs is tedious,” but AI can do the heavy lifting for you—generate candidate questions, find the best doc sections, craft an answer stub, and then verify that answer actually works. You’ll still need a quick human check, but you’ll avoid hours of manual work.

That said, your FAQ can’t (and shouldn’t) cover every possible question. New features appear, community forums have edge cases, and someone will inevitably ask for “unicorn-mode weirdness.” That’s where RAG steps in.

3. RAG Isn’t Going Anywhere—Here’s Why

Okay, I’m not bashing RAG completely. There’s a place for it, but it has to be the backup plan for everything your FAQ-links layer doesn’t handle. Here’s why:

Your docs change all the time. New features, new API endpoints, updated workflows—if you rely on a static LLM fine-tuned on last month’s docs, it’ll be obsolete in a few weeks.
Long-tail questions exist. No matter how comprehensive your FAQ-index is, someone will ask the “edge-case” query about a bizarre configuration nobody documented. RAG gives you the safety net to handle that.
Users want citations. With RAG, you can pull a search result, show the passage, and let the user know “Source: Admin Guide, Sec 4.2.” That builds trust. If your fine-tuned model just spits text without citations, users will ask “Where’d you get that?” and you can’t give a straight answer.

But—and this is a big “but”—you cannot use RAG as your first line of defense. If you do, you’ll suffer from the “slow and hallucinate” problem. Instead, make RAG the fallback for when your FAQ-links layer doesn’t confidently match a question.

3.1 GraphRAG: When You Need to Understand Relationships

While the FAQ-links + RAG hybrid handles most support queries beautifully, there’s an emerging approach called GraphRAG that deserves mention. Microsoft’s GraphRAG takes a different angle: instead of just embedding chunks and doing similarity search, it builds a knowledge graph from your documents.

What GraphRAG Can Do

Entity and Relationship Extraction. GraphRAG uses LLMs to extract entities (people, products, features, concepts) and their relationships from your docs. It then stores these in a graph structure where nodes are entities and edges are relationships.
Community Detection. It groups related entities into communities using graph algorithms. This lets it understand “clusters” of related concepts, for example all the components involved in your authentication system.
Multi-hop Reasoning. When a user asks a complex question that requires connecting multiple pieces of information, GraphRAG can traverse the knowledge graph to find indirect relationships. For instance: “Which services will be affected if I change the authentication method in ServiceX?”
Global Summarization. GraphRAG can generate summaries at different levels, from individual entities to entire communities of related concepts. This helps answer broad questions like “What are all the ways to configure security in our platform?”
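
To give a feel for what “multi-hop” means, here is a toy breadth-first traversal over a hand-written dependency graph. It is not how GraphRAG is implemented, just an illustration of following indirect relationships; the service names and the `affected_by` helper are invented.

```python
from collections import deque

# Toy graph: an edge A -> B means "B depends on A". Names are invented.
DEPENDENTS = {
    "auth": ["billing", "dashboard"],
    "billing": ["invoicing"],
    "dashboard": [],
    "invoicing": [],
    "search": [],
}


def affected_by(start: str, graph: dict[str, list[str]]) -> list[str]:
    """Breadth-first walk returning every node reachable from `start`."""
    seen = {start}
    queue = deque([start])
    order = []
    while queue:
        node = queue.popleft()
        for neighbor in graph.get(node, []):
            if neighbor not in seen:
                seen.add(neighbor)
                order.append(neighbor)
                queue.append(neighbor)
    return order


print(affected_by("auth", DEPENDENTS))  # ['billing', 'dashboard', 'invoicing']
```

Plain similarity search would likely surface “billing” and “dashboard” for an “auth” question; “invoicing” only shows up once you follow the second hop.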

When to Consider GraphRAG

GraphRAG shines for:

Complex troubleshooting where you need to understand dependencies
Impact analysis (“What breaks if I change X?”)
Discovery queries (“Show me all features related to Y”)
Architecture questions that span multiple systems

But here’s the catch: GraphRAG requires significant upfront processing to build and maintain the knowledge graph. For most support chatbots handling “How do I reset my password?” queries, it’s overkill. Stick with FAQ-links + RAG for 99% of use cases, and only layer in GraphRAG if you’re building something like an internal engineering assistant that needs to reason about complex system dependencies.

4. How to Put FAQ-Links + RAG Together—Step by Step

Below is the no-BS blueprint for building a hybrid system that actually works.

4.1 Step 1: Build Your FAQ-Links Index

Pick your top 100 (or 200) support questions.
- Talk to your support team. Check ticket tags. Figure out the most common pain points (billing, resets, basic setup, etc.).
- Don’t guess—use real data. If “How do I link my bank account?” was asked 1 200 times last quarter, that goes into your FAQ.
Craft each FAQ entry as “Q → document links.”
- Write a concise Q that people actually ask (e.g., “How do I configure multi-region failover?”).
- Instead of writing a full answer, list one or two exact URLs (or internal doc IDs) where the canonical answer lives.
- Store a vector embedding of just the question text using a decent sentence-transformer model. (You can still fine-tune this embedding model on your own queries if you want, but start with something off-the-shelf.)
Example FAQ entry:
```
{
  "id": 42,
  "question": "How do I configure multi-region failover for ServiceX in the EU?",
  "document_links": [
    "https://docs.example.com/ServiceX/Admin#MultiRegion",
    "https://api.example.com/ServiceX/v3.1#FailoverParams"
  ],
  "embedding_vector": [0.123, -0.456, …],
  "last_updated": "2025-06-20"
}
```
Store these embeddings in a fast, in-memory or small vector index.
- Keep it cheap: this index only has hundreds or thousands of entries, so you don’t need a huge Pinecone cluster. A single-node FAISS index or even a Redis vector store is fine.
Choose a “similarity threshold” (e.g., 0.90).
- If a user’s question embedding has cosine similarity ≥ 0.90 with a stored FAQ Q embedding, you call that a “hit.”
- Tune it based on your data. If too many wrong FAQs match, raise the threshold to 0.92. If too many true FAQs are missed, drop it to 0.88. It’s not rocket science.
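
One way to tune the threshold is to sweep a few candidate values over a small labeled set and count both kinds of mistakes. The scores below are made up for illustration, and `evaluate` is a hypothetical helper; in practice you would log real queries with their best FAQ match and have someone mark each match right or wrong.

```python
# Hypothetical evaluation set: (cosine score of the best FAQ match,
# whether that FAQ was actually the right one for the query).
LABELED = [
    (0.97, True), (0.93, True), (0.91, False),
    (0.89, True), (0.86, False), (0.80, False),
]


def evaluate(threshold: float, labeled) -> tuple[int, int]:
    """Return (wrong FAQ hits, correct FAQs missed) at a given threshold."""
    wrong_hits = sum(1 for score, ok in labeled if score >= threshold and not ok)
    missed = sum(1 for score, ok in labeled if score < threshold and ok)
    return wrong_hits, missed


for threshold in (0.88, 0.90, 0.92):
    print(threshold, evaluate(threshold, LABELED))
```

With this toy data, 0.92 removes the one wrong hit at the cost of one missed FAQ, which simply falls through to RAG.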

Bonus Step (Optional but Highly Recommended): Automatically Generate and Verify FAQ Entries with AI

Generate candidate FAQ questions: Use an LLM to parse your documentation and produce a list of plausible user questions. For example, prompt the LLM:

“Read our ServiceX Admin Guide and generate 50 realistic user questions someone might ask.”
Link them to relevant docs: Have the LLM or a simple retriever find the best section of the docs for each generated question.
Auto-generate answer stubs: Let the LLM draft a concise answer by summarizing the relevant document sections.
Automatically verify correctness:
- Run each generated Q → answer through an automated test harness or an internal API sandbox. For instance, if the bot says “To reset your password, call resetPassword(userId),” actually execute that in a safe test environment or mock.
- Check that the steps work and the links resolve correctly.
Human review (quick sanity check): A support agent or content owner glances over the AI-generated entries, fixes any minor wording issues, and approves them.

That process can cut your FAQ creation time from days to hours. You still manage the index, but you’re not writing hundreds of Q→links by hand.

4.2 Step 2: When a User Asks a Question

Embed the user’s question using the same sentence-embedding model you used for the FAQ.
Do a nearest-neighbor search vs. your FAQ embeddings.
- Find the top match and its cosine score.
- If score ≥ threshold (e.g., 0.90), trigger Branch A: FAQ-links handling.
- Otherwise, trigger Branch B: RAG fallback.
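The routing decision above can be sketched in a few lines. This is a brute-force version that assumes the FAQ index is a small in-memory list of dicts; the `route` function and the field names are hypothetical.

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def route(query_vec, faq_index, threshold=0.90):
    """Return ("faq", entry, score) on a hit, else ("rag", None, best_score)."""
    best_entry, best_score = None, -1.0
    for entry in faq_index:  # a linear scan is fine for a few thousand entries
        score = cosine(query_vec, entry["embedding"])
        if score > best_score:
            best_entry, best_score = entry, score
    if best_entry is not None and best_score >= threshold:
        return "faq", best_entry, best_score
    return "rag", None, best_score

# Toy index (assumption): embeddings are made-up 3-d vectors.
faq_index = [
    {"question": "How do I configure multi-region failover?",
     "embedding": [0.9, 0.1, 0.4]},
    {"question": "How do I reset my password?",
     "embedding": [0.1, 0.9, 0.2]},
]

branch, entry, score = route([0.8, 0.2, 0.5], faq_index)
print(branch, entry["question"] if entry else None)
```

Returning the best score even on a miss is a deliberate choice: logging it makes the quarterly threshold tuning much easier.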

4.3 Step 3A: FAQ-Links Handling

Fetch the linked documents from the FAQ entry.
- In our example, that’s two URLs:
  - https://docs.example.com/ServiceX/Admin#MultiRegion
  - https://api.example.com/ServiceX/v3.1#FailoverParams
Retrieve the most relevant passages from those docs (not the entire doc). Here’s how:
- Offline: Pre-chunk every doc in your corpus into ~500-token passages, embed them, and store those embeddings in a vector DB with metadata tags (doc_id or URL, section titles, last_updated, etc.).
- Online: Take the user’s question, embed it again (now in the “document embedding” space), and search only within those two doc IDs. You’re effectively telling the index, “Find me the top 5 passages from these exact documents that match this question.”
Optionally re-rank those top 5 passages with a cheap cross-encoder to double-check you have the right snippets.

Build a prompt that includes the question and those 5 passages, each labeled with a “(Source: …)” citation. For example:

You are DocsBot. The user asked:
“How do I configure multi-region failover for ServiceX in the EU region?”

Here are the most relevant excerpts:

[1] “In the ServiceX Admin Console, navigate to Settings → Multi-Region. Under ‘Primary Region,’ select eu-west-1. Then click ‘Add Region’ and choose eu-central-1 as your failover. Finally, click Enable.” (Source: ServiceX Admin Guide, Sec. 4.2)

[2] “If you prefer the API, include this JSON body in your POST to /v3.1/configure:
    {
      "primaryRegion": "eu-west-1",
      "failoverRegion": "eu-central-1"
    }
  ” (Source: ServiceX API Reference, p. 78)

[3] “Note: EU failover requires that your account has cross-region networking enabled. Go to Networking → Regions and toggle ‘Cross-Region Enabled.’” (Source: ServiceX Admin Guide, Sec. 2.1)

Please write a clear, step-by-step answer using these excerpts. Include citations in parentheses after each step.

Call the LLM with that prompt. It will stitch together a crisp answer, quoting the snippets and dropping citations.
Return the generated answer to the user.
- Pros: Instant (just one LLM call), dead-on accurate (you fed it verified content), and trust-worthy (it shows citations).
- Cons: You have to maintain that FAQ index and keep the doc-links up to date—but that’s far less effort than re-training an LLM every week.
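Here is a rough sketch of the two mechanical pieces of this branch: searching only within the linked documents, then assembling the citation-labelled prompt. It assumes chunks are plain dicts held in memory; a real vector DB would apply the `doc_id` restriction as a metadata filter instead. All names and the toy embeddings are illustrative.

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def search_within_docs(query_vec, chunks, allowed_doc_ids, top_k=5):
    """Rank only the chunks whose doc_id is in allowed_doc_ids."""
    allowed = set(allowed_doc_ids)
    scored = [(cosine(query_vec, c["embedding"]), c)
              for c in chunks if c["doc_id"] in allowed]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [c for _, c in scored[:top_k]]

def build_prompt(question, passages, bot_name="DocsBot"):
    """Assemble the prompt from chunk dicts with 'text' and 'source' keys."""
    lines = [f"You are {bot_name}. The user asked:", f'"{question}"', "",
             "Here are the most relevant excerpts:", ""]
    for i, p in enumerate(passages, start=1):
        lines.append(f'[{i}] "{p["text"]}" (Source: {p["source"]})')
        lines.append("")
    lines.append("Please write a clear, step-by-step answer using these "
                 "excerpts. Include citations in parentheses after each step.")
    return "\n".join(lines)

# Toy corpus (assumption): the forum chunk matches best but is not linked.
chunks = [
    {"doc_id": "admin-guide", "source": "ServiceX Admin Guide, Sec. 4.2",
     "text": "Go to Settings, then Multi-Region.", "embedding": [0.9, 0.1, 0.3]},
    {"doc_id": "api-reference", "source": "ServiceX API Reference, p. 78",
     "text": "POST to /v3.1/configure.", "embedding": [0.8, 0.2, 0.5]},
    {"doc_id": "forum", "source": "Community forum thread",
     "text": "Unvetted advice.", "embedding": [0.9, 0.1, 0.4]},
]

top = search_within_docs([0.9, 0.1, 0.4], chunks, ["admin-guide", "api-reference"])
print(build_prompt("How do I configure multi-region failover?", top))
```

Note that the forum chunk is the closest match in embedding space and still never reaches the prompt. That restriction is what makes this branch trustworthy.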

4.4 Step 3B: RAG Fallback (When FAQ Misses)

Embed the user’s question and run a nearest-neighbor search against your entire document-chunk index.
- You might retrieve the top 50 passages from anywhere in your knowledge base: internal wiki, community forum threads, developer docs, etc.
Re-rank those 50 with a cross-encoder to pick the top 5.
Build a prompt with those 5 passages and feed it to the LLM, just like in the FAQ-links branch. Ask it to cite each snippet.
Return the synthesized answer to the user, complete with citations.
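The retrieve-then-re-rank step might look like the sketch below. The `token_overlap` scorer is a deliberately crude stand-in for a cross-encoder (an assumption made so the example runs without any model); in practice you would plug in a real re-ranking model with the same call shape.

```python
def token_overlap(question, passage):
    """Toy re-ranker: fraction of question words that appear in the passage."""
    q = set(question.lower().split())
    p = set(passage.lower().split())
    return len(q & p) / len(q) if q else 0.0

def retrieve_then_rerank(question, candidates, rerank_score, keep=5):
    """candidates: passages already returned by the first-stage vector search."""
    rescored = sorted(candidates,
                      key=lambda passage: rerank_score(question, passage),
                      reverse=True)
    return rescored[:keep]

# Pretend these came back from the wide nearest-neighbor search.
candidates = [
    "Billing is monthly",
    "Set a custom DNS SRV record in dns_config.json",
    "DNS setup basics",
]

best = retrieve_then_rerank("custom dns srv record", candidates, token_overlap, keep=2)
print(best)
```

The first stage is cheap and wide (top 50), the second is expensive and narrow (top 5). Keeping the scorer as a parameter makes it easy to swap re-rankers without touching the pipeline.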

Yes, this means if a question truly has no match in the FAQ, you pay the price of a full-blown RAG cycle—embedding, retrieval, re-ranking, LLM. But that only happens for 10–30% of queries (your more obscure or brand-new questions). The rest of the time, you’re in “FAQ-links land,” and life is good.

5. Why This Hybrid Approach Is a Game-Changer

5.1 You Slash Latency and Cost

FAQ-Links Hit (70–90% of the time): One small vector lookup (hundreds of entries) + one restricted retrieval (just a handful of doc IDs) + one LLM call. That’s usually under 300 ms total.
RAG Fallback (10–30% of the time): Full retrieval across millions of docs, re-rank, and LLM. Yes, it’s slower—maybe 800 ms–1 s—but it only happens when you truly need it.

Compare that to “pure RAG for everything,” which can easily take 800 ms–1 s per turn, every single time. Your users will notice, and your team will hear about it in every stand-up.
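A quick back-of-the-envelope calculation shows what the split buys you. It uses the 300 ms figure above for an FAQ hit and assumes 900 ms for a RAG turn (the midpoint of the 800 ms–1 s range); both numbers are illustrative, not measurements.

```python
def expected_latency_ms(faq_hit_rate, faq_ms=300, rag_ms=900):
    """Average per-turn latency for a given FAQ hit rate."""
    return faq_hit_rate * faq_ms + (1 - faq_hit_rate) * rag_ms

for hit_rate in (0.7, 0.8, 0.9):
    hybrid = expected_latency_ms(hit_rate)
    print(f"hit rate {hit_rate:.0%}: about {hybrid:.0f} ms vs 900 ms for pure RAG")
```

With an 80% hit rate the average turn lands around 420 ms, less than half of the pure-RAG figure under these assumptions.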

5.2 You Keep Hallucinations to Zero (for Common Q’s)

Because your FAQ-links layer only surfaces passages you trust, the LLM has very little room to “mess up” a basic answer. If someone asks “How do I reset my password?” they get the steps you vetted. No missing semicolons, no wrong URLs, no “In some cases, use this other command that doesn’t exist.” You own those top-hit questions.

5.3 You Still Handle the Long Tail

Sure, some support queries are weird. “How do I configure ServiceX to use a custom DNS SRV record in Antarctica?” (Yes, that happens.) If there’s no FAQ match, you let RAG grab relevant passages from your admin guide, community forum, or developer docs. The LLM will piece it together. It’s not perfect—some edge cases might still require a human hand—but it’s far better than refusing to answer or hallucinating “I think you need to call your network guy.”

5.4 You Avoid a Maintenance Nightmare

FAQ Maintenance: Review your top 100 FAQ entries every month or quarter. If you see a pattern of questions hitting RAG that clearly belong in the FAQ, add a new entry. That might be 5–10 new FAQs a quarter—easy.
Doc Corpus Updates: When your documentation team publishes a new version, your ETL pipeline re-chunks and re-embeds changed files. Zero manual steps needed.
No Endless LLM Retraining: You’re not fine-tuning a massive model every time something minor changes in your docs. You’re just updating embeddings and pointers. Much faster, much cheaper.

5.5 The Follow-up Question Problem

Sending FAQ-based responses doesn’t prevent follow-up questions. In fact, it often encourages them. And that’s actually a good thing if you design for it.

Why Follow-ups Happen

Your initial answer was correct but incomplete
- User: “How do I reset my password?”
- Bot: [Provides password reset steps]
- User: “What if I don’t have access to my email?”
The user needs clarification on a specific step
- User: “How do I configure failover?”
- Bot: [Provides configuration steps mentioning ‘cross-region networking’]
- User: “How do I enable cross-region networking?”
The context has shifted
- User: “How do I set up authentication?”
- Bot: [Provides OAuth setup steps]
- User: “Actually, I meant SAML authentication”

Handling Follow-ups in Your Hybrid System

Maintain conversation context
- Store the last 2-3 turns of conversation
- When matching against FAQs, include this context in your embedding
- This helps disambiguate “How do I do that?” type questions
Extract entities from FAQ responses
- When you serve an FAQ answer, extract key entities mentioned (e.g., “cross-region networking”, “OAuth”, “Admin Console”)
- If the follow-up question references these entities, boost relevant FAQ entries or doc sections
Design FAQs with common follow-ups in mind
- If “How do I reset my password?” often leads to “What if I don’t have email access?”, create both FAQ entries
- Link them so your system knows they’re related
- Consider proactively suggesting: “Related: Password reset without email access”
Use AI to predict and pre-generate follow-up FAQs
- After generating an FAQ entry, prompt your LLM: “What follow-up questions might users ask after reading this answer?”
- Generate FAQ entries for the most likely follow-ups
- This creates a web of related FAQs that handle conversation flows, not just isolated questions
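Two of these ideas, keeping the last few turns and linking related FAQs, fit in a short sketch. The class and function names are hypothetical, and joining turns with a separator before embedding is just one simple way to "include context"; your embedding model may call for something different.

```python
from collections import deque

class ConversationContext:
    """Keeps the last few turns so short follow-ups can be disambiguated."""

    def __init__(self, max_turns=3):
        self.turns = deque(maxlen=max_turns)  # old turns fall off automatically

    def add_turn(self, user_text, bot_text):
        self.turns.append((user_text, bot_text))

    def contextual_query(self, new_question):
        """Text to embed: recent user questions plus the new one."""
        previous = [user for user, _ in self.turns]
        return " | ".join(previous + [new_question])

def suggest_related(faq_id, related_links, faq_titles):
    """Turn linked FAQ ids into proactive 'Related:' suggestions."""
    return [f"Related: {faq_titles[r]}" for r in related_links.get(faq_id, [])]

ctx = ConversationContext(max_turns=2)
ctx.add_turn("How do I configure failover?", "[configuration steps]")
print(ctx.contextual_query("How do I enable that?"))

faq_titles = {"reset": "Password reset",
              "reset-no-email": "Password reset without email access"}
related_links = {"reset": ["reset-no-email"]}
print(suggest_related("reset", related_links, faq_titles))
```

Embedding "How do I configure failover? | How do I enable that?" gives the FAQ matcher something to work with, where the bare follow-up would match almost nothing.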

The Bottom Line on Follow-ups

Don’t treat follow-up questions as failures—they’re natural conversation patterns. Your hybrid system should:

Track conversation context
Have FAQ entries for common follow-up patterns
Gracefully fall back to RAG when the conversation goes off-script

Remember: even human support agents get follow-up questions. The goal isn’t to eliminate them but to handle them efficiently while maintaining accuracy.

6. Common Objections (And Why They’re Wrong)

Let’s address a few excuses I hear all the time:

“We don’t have time to build a FAQ-layer or maintain it.”

Then you deserve to have a chatbot that spirals into nonsense. If you can’t allocate an engineer or a support rep to maintain 100–200 FAQ pointers, you’ll be stuck listening to “The bot gave me outdated info” complaints forever.

“Our knowledge base is too small; we don’t need RAG.”

Fine, roll a super-lightweight bot that just returns static answers. Just don’t call it “AI-powered.” If you have fewer than 50 FAQs and minimal docs, you’re not solving a search problem—you’re solving a “stupid page of text” problem. Either write a few more FAQs (AI can help) or slap a search box on your docs.

“We already fine-tuned a model on our docs; we don’t need RAG.”

If your docs changed last month, that fine-tuned model is already outdated. Fine-tuning is a slow, expensive pipeline. Unless you’re on a rigid quarterly release cycle and don’t add new features, a fine-tuned model becomes stale fast. RAG with a hybrid FAQ-links layer gets you “doc freshness” without retraining.

“We worry about data compliance—RAG might surface private data.”

If you’re truly in a regulated space, you should still have a controlled FAQ-links layer (only pointing to approved content) and a filtered RAG index (tagged by compliance status). The hybrid approach actually makes compliance easier: you gate which docs are indexed, and your FAQ-links layer only points to “approved” sections.

And yes, you can use AI to auto-generate and verify those FAQ entries so you’re not manually curating everything. I’ll walk through that in the next section.

7. Automating FAQ Expansion with AI

Alright, let’s talk about how you can stop hand-writing every FAQ entry and lean on AI to do most of the heavy lifting—while still ensuring correctness.

7.1 Why You Need AI to Bootstrap Your FAQ

Manual FAQ creation is painful. Writing 100–200 Q→links pairs by hand takes days or weeks.
Your docs are huge. Scanning them for the most common user pain points is like looking for a needle in a haystack.
Questions evolve. As soon as you publish your FAQ set, users will phrase things differently. You need a process to generate additional paraphrases.

Using AI, you can:

Generate candidate questions by having an LLM crawl your documentation and propose realistic user queries.
Link those questions to the exact doc sections that provide the answer.
Draft a brief answer stub using the same doc passages.
Automatically verify that the answer is correct—either by checking links, running code snippets in a sandbox, or comparing against a known-good test harness.
Human-in-the-loop: someone glances over and thumbs-up or tweaks as needed.

In practice, that can cut FAQ generation time by 90 %.

7.2 Step-by-Step: AI-Generated, Verified FAQs

Prompt the LLM to propose questions.
- Feed it your entire doc set (or a representative sample) and ask:
  
  “Based on our ServiceX Admin Guide and API Reference, list 50 realistic user questions someone might ask.”
- Let it output things like:
  - “How do I configure multi-region failover for ServiceX?”
  - “What parameters do I need to include to enable EU-region replication?”
  - “How can I reset my API token for ServiceX?”
For each generated question, find the best doc sections.
- Take that question, embed it, and run a retrieval against your document-chunk index.
- Fetch the top 3–5 passages that match. These become your candidate doc links (with section anchors, if possible).

Ask the LLM to draft an answer stub.

Give it the question plus the 3–5 passages, and request a concise, step-by-step answer with citations.

Example prompt snippet:

You are DocsBot. The user asked:
“What parameters do I need to include to enable EU-region replication?”

Here are the most relevant excerpts:
[1] “In the API call, set 'region' to 'eu-west-1' and include 'replicaCount' = 2.” (Source: ServiceX API Ref, Sec. 5.3)
[2] “EU-replication requires that your account has cross-region network enabled. Navigate to Networking → Regions → enable cross-region.” (Source: Admin Guide, Sec. 2.1)
[3] “If you’re using Terraform, set 'replication_region' = 'eu-west-1' in your resource.” (Source: Terraform Examples, p. 12)

Write a concise answer listing the needed parameters and steps, with citations.

Automatically verify correctness.
- If the answer includes an API snippet or a CLI command, run it in a sandbox environment or a test cluster to confirm it works exactly as described. For documentation-only steps, verify that the “Source” URLs actually resolve and that the sections contain the quoted text.
- For example, if the LLM says “Use enableFailover('eu-west-1', 'eu-central-1'),” have a test harness call that function in a mocked environment to ensure it doesn’t error out.
- Mark any failed verifications for manual review.
Human review & approval.
- A support or docs engineer skims each AI-generated entry, fixes minor wording issues, and clicks “Approve.” This usually takes a few seconds per FAQ if your verification passed.
Add to your FAQ index.
- Once approved, compute the question embedding, store the document links, and set last_updated to today. You’re done.
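For the documentation-only part of verification, a minimal check is "does every cited source exist, and does it contain the quoted text?" The sketch below assumes you can load each doc section as a string keyed by its citation label; the data shapes and function names are assumptions, and sandbox execution of API calls is out of scope here.

```python
def normalize(text):
    """Lowercase and collapse whitespace so trivial differences don't fail."""
    return " ".join(text.lower().split())

def verify_entry(entry, doc_sections):
    """Return a list of problems; an empty list means the entry passed."""
    problems = []
    for cite in entry["citations"]:
        section_text = doc_sections.get(cite["source"])
        if section_text is None:
            problems.append(f"unknown source: {cite['source']}")
        elif normalize(cite["quote"]) not in normalize(section_text):
            problems.append(f"quote not found in {cite['source']}")
    return problems

doc_sections = {
    "Admin Guide, Sec. 2.1":
        "EU-replication requires that your account has cross-region "
        "network enabled. Navigate to Networking, Regions.",
}

entry = {
    "question": "What do I need to enable EU-region replication?",
    "citations": [
        {"source": "Admin Guide, Sec. 2.1",
         "quote": "requires that your account has cross-region network enabled"},
        {"source": "Terraform Examples, p. 12",
         "quote": "set 'replication_region'"},
    ],
}

problems = verify_entry(entry, doc_sections)
print("needs manual review" if problems else "auto-verified", problems)
```

Anything with a non-empty problem list goes to the manual review queue; everything else only needs the quick human skim.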

Rinse and repeat every week or quarter as your docs grow. With this pipeline, you can generate, verify, and deploy 50–100 new FAQ entries in an afternoon—all largely automated.

8. Putting It All Into Practice: An Example Walkthrough

Let’s run through a real-world scenario. Suppose you’re building a chat assistant for “ServiceX,” a cloud platform. You’ve got all the standard docs: an Admin Guide, API Reference, community Q&A, etc.

8.1 Initial FAQ-Links Setup (Manual + AI)

Audit your support logs and identify your top 100 questions.
Manually craft 20–30 high-priority FAQ entries for the absolute must-have questions (billing, password reset, basic config).
Run your AI‐generation pipeline to propose 50 more candidates, link them to docs, draft answer stubs, and verify them in a sandbox. Now you have 70–80 entries total without spending days writing.

8.2 User Interaction

User asks:

“How do I configure multi-region failover for ServiceX in the EU region?”

You embed that question and compare it to your 80 FAQ vectors.
You see a match with “How do I configure multi-region failover for ServiceX in the EU?” at 0.92. That’s above your 0.90 threshold, so you go down the FAQ-links path.

8.3 FAQ-Links Path

Retrieve doc-links:
- https://docs.example.com/ServiceX/Admin#MultiRegion
- https://api.example.com/ServiceX/v3.1#FailoverParams
Fetch passages from just those two docs:
- “In the Admin Console, navigate to Settings → Multi-Region. Under ‘Primary Region,’ select eu-west-1. Click ‘Add Region,’ choose eu-central-1, then click Enable.”
- “Include this JSON in your POST to /v3.1/configure:
```
{
  "primaryRegion": "eu-west-1",
  "failoverRegion": "eu-central-1"
}
```
  ”
- “Ensure cross-region networking is toggled on (Networking → Regions → Cross-Region Enabled).”
Build prompt and call the LLM. The result:
Answer:
1. In the ServiceX Admin Console, go to Settings → Multi-Region (Source: ServiceX Admin Guide, Sec. 4.2). Under “Primary Region,” select eu-west-1.
2. Click Add Region and choose eu-central-1 as your failover. Then click Enable (Source: ServiceX Admin Guide, Sec. 4.2).
3. If you prefer the API, send a POST to /v3.1/configure with this JSON body:
  { "primaryRegion": "eu-west-1", "failoverRegion": "eu-central-1" }
  (Source: ServiceX API Reference, p. 78)
4. Finally, verify that cross-region networking is enabled by going to Networking → Regions and toggling Cross-Region Enabled (Source: ServiceX Admin Guide, Sec. 2.1).
User sees a concise, authoritative answer in under 300 ms. They know exactly which doc sections to click for more detail.

8.4 RAG Fallback (When FAQ Misses)

User asks:

“How do I configure ServiceX to use a custom DNS SRV record in Antarctica?”

FAQ matching fails (no similar question with similarity ≥ 0.90).
You embed the question and hit your full doc-corpus index. Top 50 chunks might include some “DNS setup” snippets, maybe an internal network forum thread, etc.
You pass those 50 to your cross-encoder, pick the top 5, and build a prompt. The LLM synthesizes an answer like:
Answer:
1. To configure a custom DNS SRV record, edit your ServiceX cluster’s dns_config.json to include the "_srv" key under your zone (Source: ServiceX Networking Guide, Sec. 3.4).
2. Ensure your Antarctica region supports custom DNS by contacting the Network Ops team (Source: ServiceX Network Ops Forum, post ID #3456).
3. Then, add the following snippet in your cluster deployment YAML:
  dnsConfig:
    recordType: _srv
    recordValue: "0 0 443 my-servicex.example.ant"
  (Source: ServiceX Deployment Examples, Sec. 5.2)
4. Finally, verify propagation with dig _srv.my-servicex.example.ant SRV (Source: ServiceX CLI Reference, p. 22).
User sees a detailed answer with exact citations. It’s slower (around 800 ms), but the query was rare and complex—exactly when you want RAG.

9. Maintenance: Keep It Simple, Keep It Current

A hybrid system is only as good as the process you set up. Here’s how to keep it updated and accurate—plus how to let AI pitch in so you’re not drowning in manual work.

9.1 Weekly/Monthly FAQ Review

Review top fallback queries. Pull the top 50 “RAG fallback” queries that are easy to answer—ones that your FAQ should have covered.
Use AI to propose new entries. Feed those fallback queries to an LLM and ask, “Does this match any existing FAQ? If not, generate a concise FAQ entry (question text + document links), then draft an answer stub.”
Auto-verify each AI-proposed entry with your test harness. If it passes, add it to the index. If not, flag it for manual review.
People still get the final say. A support or content engineer spends 1–2 hours per sprint skimming AI suggestions, approving or tweaking as needed.

9.2 Automated Doc Ingestion Pipeline

Trigger on doc updates. Hook your documentation repository (Markdown, HTML, PDF—whatever) to a CI job that, on every merge, re-chunks changed files, re-embeds them, and updates your vector DB.
Metadata hygiene. Tag each chunk with doc_version, last_updated_date, and compliance_status so you can filter retrievals and verify correct versions.
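The core of such a CI job can be sketched as "hash each file, re-chunk only what changed, tag every chunk." This version splits on words as a rough proxy for tokens (an assumption; a real pipeline would use the embedding model's tokenizer) and leaves the actual embedding and vector DB calls out.

```python
import hashlib

def files_to_reembed(current_files, stored_hashes):
    """Return paths whose content hash differs from the last indexed hash."""
    changed = []
    for path, content in current_files.items():
        digest = hashlib.sha256(content.encode("utf-8")).hexdigest()
        if stored_hashes.get(path) != digest:
            changed.append(path)
    return sorted(changed)

def chunk_text(text, max_words=350):
    """Split text into consecutive chunks of at most max_words words."""
    words = text.split()
    return [" ".join(words[i:i + max_words])
            for i in range(0, len(words), max_words)]

def make_chunk_records(path, content, doc_version, last_updated_date,
                       compliance_status, max_words=350):
    """Attach the metadata tags to every chunk of one document."""
    return [{"doc_id": path, "chunk_no": n, "text": chunk,
             "doc_version": doc_version,
             "last_updated_date": last_updated_date,
             "compliance_status": compliance_status}
            for n, chunk in enumerate(chunk_text(content, max_words))]

current = {"admin.md": "new admin text", "api.md": "api text"}
stored = {"api.md": hashlib.sha256(b"api text").hexdigest()}
print(files_to_reembed(current, stored))  # only admin.md needs work
```

Hashing content rather than trusting timestamps means a merge that touches a file without changing it costs nothing.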

9.3 Monitor Hit Rates & Helpfulness

Track “FAQ hit vs. RAG fallback.” If FAQ hit rates dip below 70 %, your FAQ library is stale or missing high-volume questions. AI can help generate the missing entries.
Ask “Was this helpful? 👍/👎” after each chatbot response.
- If an FAQ answer gets too many 👎, rewrite it or re-verify the doc links.
- If a RAG answer gets too many 👎, either refine your RAG index or pull that question into the FAQ (with AI support).
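Both metrics fall out of a simple event log. The sketch below assumes each chatbot response is logged as a dict with the branch taken and the optional thumbs rating; the field names are made up for illustration.

```python
def summarize(events, min_faq_hit_rate=0.70):
    """events: dicts with 'branch' ("faq" or "rag") and 'helpful' (True/False/None)."""
    total = len(events)
    faq_hits = sum(1 for e in events if e["branch"] == "faq")
    rated = [e for e in events if e["helpful"] is not None]
    helpful = sum(1 for e in rated if e["helpful"])
    hit_rate = faq_hits / total if total else 0.0
    return {
        "faq_hit_rate": hit_rate,
        # None means nobody clicked a thumb, which is different from 0%.
        "helpfulness": helpful / len(rated) if rated else None,
        "faq_stale": total > 0 and hit_rate < min_faq_hit_rate,
    }

events = [
    {"branch": "faq", "helpful": True},
    {"branch": "faq", "helpful": True},
    {"branch": "faq", "helpful": None},
    {"branch": "rag", "helpful": False},
    {"branch": "rag", "helpful": None},
]
print(summarize(events))
```

Here three of five queries hit the FAQ, so the 70% alarm fires and it is time to look at what is falling through to RAG.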

9.4 Threshold Tuning Quarterly

Re-evaluate your similarity threshold based on real traffic. If you see too many false FAQ hits (users complaining “That wasn’t my question”), bump it up. If you see too many missed FAQ opportunities (users going to RAG for easy questions), lower it slightly.
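One way to make this tuning less of a gut call: have someone label a sample of real queries ("was the best FAQ match actually the right one?") and count both kinds of error at several candidate thresholds. The labeled sample below is invented for illustration.

```python
def sweep_thresholds(labeled, thresholds):
    """labeled: (best_faq_score, is_correct_match) pairs from reviewed traffic.

    Returns (threshold, false_hits, missed_hits) rows.
    """
    rows = []
    for t in thresholds:
        false_hits = sum(1 for score, ok in labeled if score >= t and not ok)
        missed = sum(1 for score, ok in labeled if score < t and ok)
        rows.append((t, false_hits, missed))
    return rows

# Hypothetical reviewed sample: score of the top FAQ match, and whether
# a human judged it to be the right FAQ for the question.
labeled = [(0.95, True), (0.91, False), (0.93, True),
           (0.89, True), (0.87, False), (0.90, True)]

for threshold, false_hits, missed in sweep_thresholds(labeled, [0.88, 0.90, 0.92]):
    print(f"threshold {threshold:.2f}: {false_hits} false hits, {missed} missed")
```

Raising the threshold trades false hits for misses. Which error hurts more depends on your product, but at least the trade is visible.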

If you skip these steps, your system decays: outdated answers, frustrated users, and a never-ending cascade of “bot not helpful” tickets.

10. Why This Hybrid Approach Feels So Good

Precision for Day-to-Day Questions. Instead of “generating” the answer to “How do I reset my password?”, the chatbot simply returns a vetted, pre-approved response (assembled on-the-fly via the relevant docs). Users see the exact steps every time—no variation, no guesswork.
Broad Knowledge for the Edge Cases. When someone asks “Can I use Feature X with Service Y’s beta API?” (something you haven’t FAQ’ed yet), the RAG system grabs the relevant API docs or forum threads. The LLM then knits them together in a coherent way.
Faster Performance & Lower Cost. 70–90% of queries hit the small FAQ index, where the lookup and the snippet assembly from two or three linked docs take under 50 ms before the LLM call. Only the rarer 10–30% of queries spin up the heavier RAG pipeline, which might take a few hundred milliseconds more.
Keep Content Fresh with Minimal Effort.
- As soon as you publish new documentation, your RAG pipeline picks it up via your CI job.
- As soon as you identify a new high-volume question, AI helps you generate and verify a FAQ entry in minutes.
- No more “hiring a content team” to rewrite everything every quarter.
Better User Trust. When users see a link to a specific doc section or an official FAQ entry, they feel confident. They know the bot isn’t just “making things up.”

11. Common Objections (Revisited)

Let’s re-hash a few excuses—and why they don’t hold water now that AI can help:

“We don’t have time to build or maintain a FAQ-layer.”

Fine—let your chatbot suck. Or use AI to generate and verify hundreds of FAQ entries in a weekend. It’s literally 1 hour of human oversight per 50 entries.

“Our knowledge base is too small; we don’t need RAG.”

If it’s truly tiny, drop RAG. But lean on AI to fill out your FAQ-Links so you actually have something useful to show. Otherwise, you’re just handing users a static PDF.

“We already fine-tuned a model on our docs; we don’t need RAG.”

That fine-tuned model is stale within weeks. You’ll spend more on retraining than on maintaining an FAQ-links + RAG setup. Plus, with RAG you get citations.

“We worry about compliance—RAG might surface private data.”

With a hybrid approach, you can tag docs as “compliance-approved” and only index those. AI can even help you scan new docs for PII before ingestion. The hybrid method makes compliance easier, not harder.

12. Final Thoughts: Don’t Chase Shiny Objects

Look, I get it—everyone wants to hype “AI” like it’s magic. But the second you push a “full RAG” or “mega-fine-tune” solution to production without a solid FAQ-first strategy, you’re setting yourself up for pain. Users will test every edge case, and when the bot inevitably flubs a basic question, they’ll lose faith.

The truth is this:

FAQ-links + snippet retrieval + LLM synthesis = trustworthy, fast, maintainable.
Pure RAG = slow, expensive, hallucination-prone (unless you pour endless resources into perfecting your index).
Pure fine-tuning = stale answers (unless you constantly retrain, which is a massive operational burden).

Now, with AI-generated, automatically verified FAQs, you have zero excuse to skimp on the FAQ layer. Let AI draft those Q→links pairs, validate them in a sandbox, and have a support engineer do a quick skim. You’ll save weeks of manual work and your chatbot will actually help people instead of embarrassing you in front of customers.

So be honest with yourself: do you want a chatbot that actually helps people, or do you want another “AI project” that looks cool until it inevitably breaks? Build the hybrid—keep that FAQ index tight with AI’s help—and only let RAG handle the weird stuff. Your users will thank you. Your engineers will thank you. And when Monday rolls around, you won’t dread “Did the chatbot blow up over the weekend?” Instead, you’ll see “FAQ hit rate: 85%,” “Helpfulness: 92%,” and go grab a coffee, knowing your system is solid.

That’s how you build a smarter chatbot—no myths, no excuses, just a straightforward architecture that solves real problems.

How to enable post-quantum key exchange and AEGIS in Nginx

2025-03-27T00:00:00+01:00

TLS 1.3 is evolving to meet new security challenges.

Two recent developments are post-quantum key exchange and the proposed AEGIS-based TLS cipher suites.

With OpenSSL 3.5 (including specialized branches that enable AEGIS), you can easily compile and run Nginx with these features.

Let’s see how to build Nginx with X25519MLKEM768 and AEGIS.

Prerequisites

Build tools like make, gcc (or clang), and perl (for OpenSSL).
PCRE libraries for Nginx (if building with the recommended modules).
Git (to clone the AEGIS-enabled OpenSSL repository).
Nginx source tarball from nginx.org.

Step 1: Extract the Nginx source code

tar xzf nginx-*.tar.gz
cd nginx-*

Step 2: Clone the AEGIS-enabled OpenSSL source code

git clone --branch=openssl-3.5.0-beta1-aegis --depth=1 \
  https://github.com/aegis-aead/openssl.git /tmp/openssl-src

This branch includes:

The post-quantum key exchange mechanism (X25519MLKEM768)
AEGIS cipher suites such as TLS_AEGIS_128L_SHA256 and TLS_AEGIS_128X2_SHA256

Step 3: Configure Nginx to use the custom OpenSSL

From the extracted Nginx source directory:

./configure \
  --prefix=/opt/nginx \
  --conf-path=/etc/nginx/nginx.conf \
  --user=www-data --group=www-data \
  --with-openssl=/tmp/openssl-src \
  --with-http_ssl_module \
  --with-http_v3_module \
  --with-http_v2_module \
  --with-http_realip_module \
  --with-http_mp4_module \
  --with-http_gzip_static_module \
  --with-pcre-jit \
  --with-http_stub_status_module

You can adjust:

--prefix (where Nginx will be installed)
--conf-path (location of the main Nginx configuration file)
--user and --group (system user/group for Nginx to run under)

Finally, build and install:

make && make install

Note that OpenSSL will only be statically linked into Nginx. It will not be installed, nor will it overwrite any existing OpenSSL installation.

Step 4: Enable AEGIS in Your Nginx Configuration

Open your Nginx configuration file (usually /etc/nginx/nginx.conf) and add the following line inside the http block:

ssl_conf_command Ciphersuites "TLS_AEGIS_128L_SHA256:TLS_AES_128_GCM_SHA256:TLS_AES_256_GCM_SHA384";

This line instructs Nginx (through OpenSSL) to use AEGIS-128L first, falling back to AES-based ciphers if needed. For testing, you can also try TLS_AEGIS_128X2_SHA256, though it may not yet have an officially assigned TLS identifier.

Step 5: Restart and Test

Restart Nginx with your new configuration. Then, test connectivity with a client that supports the new features. For example, using BoringSSL’s bssl client (or any PQ-enabled TLS client):

bssl client -connect libsodium.org -curves X25519MLKEM768

You should see a successful handshake similar to:

Connected.
  Version: TLSv1.3
  Resumed session: no
  Cipher: TLS_AEGIS_128L_SHA256
  ECDHE group: X25519MLKEM768
  Signature algorithm: ecdsa_secp256r1_sha256
  Secure renegotiation: yes
  ...

You have successfully enabled AEGIS encryption and the post-quantum key exchange mechanism.

Your Nginx server can now negotiate forward-looking, secure TLS sessions as soon as clients start adopting these new features.

A simple tweak to make static shared keys suck less

2025-02-27T00:00:00+01:00

Public-key cryptography is a powerful tool, but for performance and simplicity, static shared symmetric keys are still widely used.

For example, an API token can be a simple random byte sequence stored in a database to authenticate clients. Similarly, if two servers are operated by the same company, one server can encrypt data for the other using a shared symmetric key present on both hosts—this is a common practice.

Additionally, applications often use data concatenated with a signature to prevent tampering. A common example is JSON Web Tokens (JWTs). When these signatures are generated and verified by the same organization or between trusted parties, public-key cryptography may not feel necessary. Instead, an HMAC construction (as used in the HS* JWT algorithms) is simpler and faster.

Since TLS is widely used, this approach may not seem overly risky. However, a major issue remains: if a server is compromised, if a secret key appears in a leaked SQL dump, or if it is hardcoded in a repository that becomes public, attackers can forge valid tokens. Even without an explicit leak, more employees than necessary may have access to the secret, leading to security risks.

Public-key cryptography or alternative mechanisms like hash chains would provide stronger security, but existing schemes using static shared secrets can still be improved with a trivial tweak.

Improving Static API Tokens

The simplest implementation of static shared secrets works as follows:

A random, static secret x is generated and stored on both the client and server.
To authenticate, the client sends x to the server.
The server verifies that the received value matches x.

This protocol can be improved with a small change:

A random, static secret x is created.
x is stored only on the client.
The server stores z = HASH(x), not x itself.
To authenticate, the client sends x.
The server computes HASH(x) and verifies that it matches the stored value z.

At first glance, this may seem as weak as the original method, but there is a key difference: the server no longer stores the actual secret. Assuming HASH is a secure hash function, leaking z does not allow an attacker to authenticate. In fact, z can be hardcoded into applications and remain publicly visible, while the actual secret x stays only on the client.

Of course, x can still be leaked if the client is compromised, if the connection is intercepted, or if the server logs received values maliciously. However, this approach is comparable to hashing passwords instead of storing them in plain text. It significantly improves security with minimal effort and negligible performance overhead. Since secrets are not subject to dictionary attacks, a single round of a fast, standard hash function is sufficient.
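A minimal sketch of the improved protocol, using only the Python standard library. SHA-256 stands in for HASH and 32 random bytes for x; both are assumptions, since the scheme works with any secure hash function and a sufficiently long random secret.

```python
import hashlib
import hmac
import secrets

def new_token():
    """Client side: generate x and derive z = HASH(x).

    Only z is handed to the server; x never leaves the client at rest.
    """
    x = secrets.token_bytes(32)
    z = hashlib.sha256(x).digest()
    return x, z

def verify_token(received_x, stored_z):
    """Server side: recompute HASH(x) and compare in constant time."""
    candidate = hashlib.sha256(received_x).digest()
    return hmac.compare_digest(candidate, stored_z)

x, z = new_token()
print(verify_token(x, z))  # True: the real secret authenticates
print(verify_token(z, z))  # False: a leaked z is useless as a credential
```

The second call is the whole point: an attacker who dumps the server's database gets z, and presenting z does not authenticate.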

Improving Authentication Tokens

Authentication tokens use a secret key to sign and verify arbitrary data; this is how HS256 JWTs work, for example. A typical signing and verification process using a static shared secret follows this pattern:

A random, static secret k is created and shared between the signer and verifier.
To ensure data cannot be modified, the signer computes t = HMAC(k, data).
The signer sends data and t to the verifier.
The verifier recomputes HMAC(k, data) and checks that it matches t. Since generating a valid t requires knowing k, unauthorized modifications are prevented.

This process can be enhanced as follows:

A random, static secret k is created and shared between the signer and verifier.
Another secret u is generated, stored only on the signer. The verifier stores z = HASH(u) instead.
To sign data, the signer computes t = u || HMAC(k || u, data).
The signer sends data and t to the verifier.
The verifier extracts u and the MAC from t. It then verifies that HASH(u) matches z and that HMAC(k || u, data) matches the received MAC.

This tweak does not improve security on the signer’s side, but it does enhance security for the verifier. The verifier stores only k and z, never u, so leaking what the verifier stores is insufficient for forging valid tokens: an attacker would also need to obtain u.
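The enhanced scheme can be sketched as follows. HMAC-SHA-256 and SHA-256 are assumed for HMAC and HASH, and both k and u are fixed at 32 bytes so that the concatenation k || u is unambiguous and the verifier knows where u ends inside t. These sizes are choices made for the example, not part of the scheme.

```python
import hashlib
import hmac
import secrets

U_LEN = 32    # fixed length so the verifier can split u from the MAC
MAC_LEN = 32  # HMAC-SHA-256 output size

def sign(k, u, data):
    """Signer: t = u || HMAC(k || u, data)."""
    mac = hmac.new(k + u, data, hashlib.sha256).digest()
    return u + mac

def verify(k, z, data, t):
    """Verifier: holds k and z = HASH(u), never u itself."""
    if len(t) != U_LEN + MAC_LEN:
        return False
    u, mac = t[:U_LEN], t[U_LEN:]
    u_ok = hmac.compare_digest(hashlib.sha256(u).digest(), z)
    expected = hmac.new(k + u, data, hashlib.sha256).digest()
    mac_ok = hmac.compare_digest(expected, mac)
    return u_ok and mac_ok

k = secrets.token_bytes(32)      # shared between signer and verifier
u = secrets.token_bytes(U_LEN)   # stored only on the signer
z = hashlib.sha256(u).digest()   # stored on the verifier

token = sign(k, u, b"user=alice")
print(verify(k, z, b"user=alice", token))    # True
print(verify(k, z, b"user=mallory", token))  # False: data was modified
```

An attacker holding only the verifier's k and z would have to find a u that hashes to z before computing a MAC that passes both checks.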

The same process can be applied to encryption using static, shared keys.

A Simple Yet Effective Trick

Splitting keys has numerous security applications, and the examples above are just some of the simplest implementations. However, despite being easy to implement and deploy in virtually any environment, this technique remains underutilized.

While this trick does not turn basic authentication mechanisms into a silver bullet, it provides a meaningful security improvement with minimal effort. Any opportunity to enhance security with such a low cost is worth considering!