RAG-based compliance assistant — from zero to production in 4 weeks
A German Series A fintech needed a compliance assistant that could check their product terms against regulatory documents, a task their team spent 3+ hours on per submission. We built a multi-stage RAG pipeline and shipped it to production in 4 weeks.
Challenge
What was broken
Every new financial product at this startup required a compliance review against EU regulatory documents (MiFID II, PSD2) and the company's internal product terms. Each review took a senior compliance officer 3–4 hours. As the product line expanded, the compliance bottleneck was delaying new launches by weeks. Hiring more compliance staff wasn't viable at their stage. They needed software that could do the heavy lifting, but the accuracy bar was non-negotiable.
Our Approach
How we thought about it
The accuracy requirement drove the key design decision. Standard single-pass RAG retrieval plateaued at 85–88% on their test set, which wasn't good enough. We designed a three-stage pipeline: semantic retrieval (broad), cross-encoder reranking (precise), and a final reasoning pass with GPT-4o that cited specific clause numbers. A separate verification layer compared the AI's output against a manually labelled golden dataset before any result was surfaced to the compliance team.
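To make the three stages concrete, here is a minimal sketch of how such a pipeline could be wired together. It is not the production code: every type and function name (`Chunk`, `retrieve`, `rerank`, `review`) is hypothetical, the cross-encoder and the reasoning model are passed in as plain functions rather than real API calls, and the cut-offs `k = 20` and `n = 5` are illustrative assumptions.

```typescript
// Hypothetical types and names for illustration; none come from the real system.
interface Chunk {
  id: string;
  clause: string; // clause reference carried through so the answer can cite it
  text: string;
  embedding: number[];
}

interface Scored {
  chunk: Chunk;
  score: number;
}

// Cosine similarity; missing components are treated as 0, zero vectors score 0.
function cosine(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  const len = Math.max(a.length, b.length);
  for (let i = 0; i < len; i++) {
    const x = a[i] ?? 0;
    const y = b[i] ?? 0;
    dot += x * y;
    na += x * x;
    nb += y * y;
  }
  return na === 0 || nb === 0 ? 0 : dot / Math.sqrt(na * nb);
}

// Stage 1 (broad): top-k chunks by embedding similarity.
function retrieve(queryEmbedding: number[], corpus: Chunk[], k: number): Scored[] {
  return corpus
    .map((chunk) => ({ chunk, score: cosine(queryEmbedding, chunk.embedding) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k);
}

// Stage 2 (precise): rescore each (query, passage) pair.
// `scorePair` stands in for a cross-encoder model.
type PairScorer = (query: string, passage: string) => Promise<number>;

async function rerank(
  query: string, candidates: Scored[], scorePair: PairScorer, n: number
): Promise<Scored[]> {
  const rescored = await Promise.all(
    candidates.map(async ({ chunk }) => ({ chunk, score: await scorePair(query, chunk.text) }))
  );
  return rescored.sort((x, y) => y.score - x.score).slice(0, n);
}

// Stage 3 (reasoning): `reason` stands in for the LLM call that cites clauses.
interface Finding {
  answer: string;
  citedClauses: string[];
}
type Reasoner = (query: string, context: Chunk[]) => Promise<Finding>;

async function review(
  query: string, queryEmbedding: number[], corpus: Chunk[],
  scorePair: PairScorer, reason: Reasoner, k = 20, n = 5
): Promise<Finding> {
  const broad = retrieve(queryEmbedding, corpus, k);
  const precise = await rerank(query, broad, scorePair, n);
  return reason(query, precise.map((s) => s.chunk));
}
```

The point of the split is cost: the cheap similarity search narrows the corpus, so the slower pairwise scorer and the reasoning model only ever see a handful of passages.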
Solution
What we built
We built a Next.js frontend with a Node.js backend, Pinecone for vector storage of chunked regulatory documents, and the OpenAI API for both embedding and reasoning. All document uploads, query logs, and AI decisions are audit-logged to PostgreSQL on AWS RDS — a hard compliance requirement. The system integrates with their existing Slack workflow: compliance officers receive a structured report with cited clauses, a confidence score, and a clear recommendation. Borderline cases are automatically escalated for human review.
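The report and escalation flow described above could look roughly like the sketch below. It is an assumption-laden illustration: the source does not define what counts as "borderline", so this version escalates when confidence falls under a hypothetical threshold of 0.9 or when no clauses are cited. The names `ComplianceReport`, `routeReport`, `formatReport`, and `toAuditRecord` are invented for the example, and the real Slack and PostgreSQL calls are left out.

```typescript
// Hypothetical shapes; the real report schema is not given in the case study.
type Recommendation = "compliant" | "non_compliant";

interface ComplianceReport {
  submissionId: string;
  recommendation: Recommendation;
  confidence: number; // expected in the range 0..1
  citedClauses: string[];
}

interface RoutedReport extends ComplianceReport {
  escalated: boolean;
  escalationReason: string | null;
}

// Decide whether a report goes straight to the officer or is escalated.
// The 0.9 default is an illustrative assumption, not the production value.
function routeReport(report: ComplianceReport, minConfidence = 0.9): RoutedReport {
  const c = report.confidence;
  if (Number.isNaN(c) || c < 0 || c > 1) {
    throw new RangeError(`confidence out of range: ${c}`);
  }
  if (report.citedClauses.length === 0) {
    return { ...report, escalated: true, escalationReason: "no cited clauses" };
  }
  if (c < minConfidence) {
    return { ...report, escalated: true, escalationReason: "confidence below threshold" };
  }
  return { ...report, escalated: false, escalationReason: null };
}

// Plain-text rendering of the structured report (e.g. as a chat message body).
function formatReport(r: RoutedReport): string {
  const lines = [
    `Submission: ${r.submissionId}`,
    `Recommendation: ${r.recommendation}`,
    `Confidence: ${(r.confidence * 100).toFixed(1)}%`,
    `Cited clauses: ${r.citedClauses.join(", ") || "none"}`,
  ];
  if (r.escalated) {
    lines.push(`Escalated for human review: ${r.escalationReason}`);
  }
  return lines.join("\n");
}

// Flat record that could be written to an audit table; the caller supplies the clock.
function toAuditRecord(r: RoutedReport, at: Date) {
  return {
    submissionId: r.submissionId,
    recommendation: r.recommendation,
    confidence: r.confidence,
    escalated: r.escalated,
    loggedAt: at.toISOString(),
  };
}
```

Keeping the routing decision a pure function makes it easy to audit: the same report and threshold always produce the same outcome, and that outcome can be logged alongside the inputs.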
Results
What shipped
The system handled its first real compliance review in week 5. Accuracy on the production corpus came in at 99.2%, well above the 95% threshold set as the go/no-go criterion. Manual review time dropped by 65%. The compliance team now handles twice the volume with the same headcount, and the CTO reported that new product launches have accelerated by an average of 3 weeks.
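A go/no-go check of this kind can be expressed as a small accuracy gate over a labelled set. The sketch below is illustrative only: `GoldenCase`, `accuracy`, and `passesGate` are hypothetical names, labels are simplified to strings, and exact-match scoring is an assumption about how a result is judged correct.

```typescript
// Hypothetical shape for one manually labelled example.
interface GoldenCase {
  id: string;
  expected: string; // the label a human reviewer assigned
}

// Share of golden cases where the prediction matches the label exactly.
// A case with no prediction counts as wrong.
function accuracy(golden: GoldenCase[], predictions: Map<string, string>): number {
  if (golden.length === 0) {
    throw new Error("golden set is empty");
  }
  let correct = 0;
  for (const c of golden) {
    if (predictions.get(c.id) === c.expected) {
      correct++;
    }
  }
  return correct / golden.length;
}

// Go/no-go: 0.95 mirrors the 95% threshold quoted in the text.
function passesGate(acc: number, threshold = 0.95): boolean {
  return acc >= threshold;
}
```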
[Figure: System overview]
“The accuracy was the thing that surprised us most. We had budgeted for 80% and planned to have humans review the rest. Getting 99.2% changes everything — it means we can actually scale compliance without scaling headcount.”