After tons of trial and error (embedding huge datasets, mixing vector and text search, handling concurrency, and dodging hallucinations), I decided to document it all in a book. It'll be live on Manning.com's Early Access soon (March 27th). If you're tackling large-scale RAG or have questions about my approach (the struggles, the successes), feel free to ask. I'm happy to share lessons, config ideas, or gotchas so you can avoid the pitfalls I hit along the way.
I hope all that helps; let me know if you have any other questions!
What tradeoffs? It is fast and accurate, but it does get expensive when you have over 50 million records.
Second, we use a search service, and vectors are treated as supplementary to the text search, so chunking doesn't matter as much. We usually take an entire PDF page and embed it, regardless of how the data on that page is structured, and we keep track of the document name and page number. For SQL records, we just turn each record into a text string and embed that.
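To make that concrete, here's a rough sketch of what page-level ingestion can look like with the OpenAI embeddings API and the Azure AI Search Python SDK. The index name, field names, and embedding model are placeholders I picked for the example, not the exact schema we use.

```python
# Rough ingestion sketch: embed whole PDF pages and SQL rows as-is, then
# upload them to an Azure AI Search index alongside the raw text.
# Index name, field names, and the embedding model are illustrative placeholders.
from pypdf import PdfReader
from openai import OpenAI
from azure.core.credentials import AzureKeyCredential
from azure.search.documents import SearchClient

openai_client = OpenAI()  # reads OPENAI_API_KEY from the environment
search_client = SearchClient(
    endpoint="https://<your-service>.search.windows.net",
    index_name="docs",  # hypothetical index with content/content_vector/doc_name/page_number fields
    credential=AzureKeyCredential("<search-admin-key>"),
)

def embed(text: str) -> list[float]:
    """One embedding per page or record; no sub-page chunking."""
    resp = openai_client.embeddings.create(model="text-embedding-3-small", input=text)
    return resp.data[0].embedding

def index_pdf(path: str, doc_name: str) -> None:
    """Embed each page of a PDF, keeping the document name and page number."""
    reader = PdfReader(path)
    docs = []
    for page_num, page in enumerate(reader.pages, start=1):
        text = page.extract_text() or ""
        docs.append({
            "id": f"{doc_name}-{page_num}",
            "content": text,
            "content_vector": embed(text),
            "doc_name": doc_name,
            "page_number": page_num,
        })
    search_client.upload_documents(documents=docs)

def index_sql_record(record: dict, record_id: str) -> None:
    """Flatten a SQL row into one text string ('col: value | ...') and embed it."""
    text = " | ".join(f"{col}: {val}" for col, val in record.items())
    search_client.upload_documents(documents=[{
        "id": record_id,
        "content": text,
        "content_vector": embed(text),
        "doc_name": "sql",
        "page_number": 0,
    }])
```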
Our stack was just Python, AutoGen for the agents, and, as I mentioned, Azure AI Search. We use Azure Web Apps for the backend and OpenAI models for the generation. Great questions!
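If you want a feel for how the agent side fits together, here's a minimal sketch using the pyautogen 0.2-style API. The agent names, the stubbed search_documents tool, and the model are placeholders for illustration, not our exact setup.

```python
# Minimal AutoGen sketch: one assistant agent that can call a retrieval tool.
# Agent names, the tool body, and the model are illustrative placeholders.
from autogen import AssistantAgent, UserProxyAgent, register_function

llm_config = {"config_list": [{"model": "gpt-4o", "api_key": "<openai-key>"}]}

assistant = AssistantAgent(
    "rag_assistant",
    llm_config=llm_config,
    system_message="Answer questions using the search_documents tool; cite document name and page.",
)
user_proxy = UserProxyAgent(
    "user",
    human_input_mode="NEVER",     # run unattended
    code_execution_config=False,  # tool calls only, no code execution
)

def search_documents(query: str) -> str:
    """Placeholder retrieval tool; in practice this would query the search index."""
    return "doc_name p.N: relevant page text goes here"

register_function(
    search_documents,
    caller=assistant,     # the LLM decides when to call the tool
    executor=user_proxy,  # the proxy actually runs it
    name="search_documents",
    description="Hybrid search over the indexed documents.",
)

user_proxy.initiate_chat(
    assistant,
    message="What does the onboarding guide say about access requests?",
    max_turns=2,
)
```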
The main program is hosted on Azure Web Apps, the search is Azure AI Search, we use AutoGen for the agents, and we use OpenAI for the generation. Azure has a lot of tools that support AI and search, so we use those too.
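For the query side, here is a rough sketch of how a hybrid (text plus vector) search against Azure AI Search can feed an OpenAI completion. Again, the index name, field names, and models are placeholders rather than our production config.

```python
# Rough query-side sketch: hybrid text + vector search, then grounded generation.
# Index name, field names, and models are illustrative placeholders.
from openai import OpenAI
from azure.core.credentials import AzureKeyCredential
from azure.search.documents import SearchClient
from azure.search.documents.models import VectorizedQuery

openai_client = OpenAI()
search_client = SearchClient(
    endpoint="https://<your-service>.search.windows.net",
    index_name="docs",
    credential=AzureKeyCredential("<search-query-key>"),
)

def answer(question: str) -> str:
    # Embed the question and run text + vector search in one request;
    # the text query does most of the work, the vector query supplements it.
    q_vec = openai_client.embeddings.create(
        model="text-embedding-3-small", input=question
    ).data[0].embedding
    results = search_client.search(
        search_text=question,
        vector_queries=[
            VectorizedQuery(vector=q_vec, k_nearest_neighbors=5, fields="content_vector")
        ],
        select=["content", "doc_name", "page_number"],
        top=5,
    )
    context = "\n\n".join(
        f"[{r['doc_name']} p.{r['page_number']}] {r['content']}" for r in results
    )
    # Generate an answer grounded in the retrieved pages.
    chat = openai_client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system", "content": "Answer only from the provided context and cite document and page."},
            {"role": "user", "content": f"Context:\n{context}\n\nQuestion: {question}"},
        ],
    )
    return chat.choices[0].message.content
```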