I Built ChatGPT for MongoDB in 5 Days (And Open-Sourced It)
How I turned our database into something our team can actually talk to

"Show me all React developers who interviewed last month with evaluation scores above 85."
At our recruitment SaaS, questions like this were becoming a daily headache. Every time someone needed insights from our MongoDB database, it meant an engineering ticket. We were getting 10+ of these requests daily.
So I spent 5 days building our own "ChatGPT for MongoDB" - a system that lets our team ask database questions in plain English and get instant answers.
It worked so well that I decided to open-source the whole thing.
The Problem That Pushed Me to Build This
I'm building a SaaS for recruitment. Our users manage thousands of job applications, interviews, and candidate profiles in MongoDB.
The constant requests kept coming:
- "Show me candidates with React experience from recent applications"
- "Which interviews had the highest scores?"
- "Find applicants who might be good fits for this role"
Each question meant our engineering team had to:
- Understand what they actually wanted
- Write a custom MongoDB query
- Deploy it and explain the results
- Repeat for the next question
The numbers were brutal:
- 10+ query requests per day
- 1-2 hours of dev time each day
- Product development was slowing down
What I Built: ChatGPT, But for Our Database
Instead of writing complex MongoDB queries like this:
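// lastMonth: cutoff date one month before now
const lastMonth = new Date();
lastMonth.setMonth(lastMonth.getMonth() - 1);
db.applicants.aggregate([
  { $match: { skills: "React", appliedDate: { $gte: lastMonth } } },
  // $lookup needs "as" to name the array field that receives the joined interviews
  { $lookup: { from: "interviews", localField: "_id", foreignField: "applicant", as: "interviews" } },
  { $match: { "interviews.score": { $gt: 85 } } }
])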
Our team can now just ask:
"Show me React developers who interviewed last month with scores above 85"
The system understands the question, plans the right queries, and responds conversationally with the data and insights.
The Technical Challenges I Had to Solve
Building a reliable "ChatGPT for databases" meant solving several problems:
1. The Schema Problem
MongoDB schemas change constantly. Hardcoded mappings break within days.
My solution: Built dynamic schema introspection that reads our Mongoose models at runtime, so the AI always sees our current database structure.
2. Complex Query Planning
Simple questions like "how many users?" are easy. But "show me React developers who aced their interviews" requires understanding relationships across multiple collections.
My solution: Used LangGraph to build an AI agent that can reason through multi-step database operations, just like a human developer would.
3. AI Hallucination Prevention
LLMs love making up field names and assuming data that doesn't exist in your actual database.
My solution: Schema-first approach where the AI always checks the real database structure before building any queries.
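One way to back up the schema-first idea is to validate a generated filter before it ever reaches MongoDB. Here's a minimal sketch, not the code from the repo: `collectFieldNames` and `findUnknownFields` are hypothetical helpers, and the set of known paths is assumed to come from whatever schema introspection you already have.

```typescript
// Hypothetical helpers: collect the field names a MongoDB filter refers to
// and report the ones that are missing from the known schema paths.
type Filter = { [key: string]: unknown };

function collectFieldNames(filter: Filter, out: Set<string> = new Set<string>()): Set<string> {
  for (const [key, value] of Object.entries(filter)) {
    if (key === "$and" || key === "$or" || key === "$nor") {
      // Logical operators hold an array of sub-filters, so recurse into each.
      if (Array.isArray(value)) {
        for (const sub of value) collectFieldNames(sub as Filter, out);
      }
    } else if (!key.startsWith("$")) {
      // Any other key that is not an operator is a field path.
      out.add(key);
    }
  }
  return out;
}

function findUnknownFields(filter: Filter, knownPaths: ReadonlySet<string>): string[] {
  return Array.from(collectFieldNames(filter)).filter((name) => !knownPaths.has(name));
}

// Example: the model invented "yearsOfExperience", which is not in the schema.
const applicantPaths = new Set<string>(["skills", "appliedDate", "compatibilityScore"]);
const unknownFields = findUnknownFields(
  {
    skills: "React",
    $or: [{ yearsOfExperience: { $gte: 3 } }, { compatibilityScore: { $gte: 80 } }],
  },
  applicantPaths,
);
```

If `unknownFields` is non-empty, the agent can be sent back to re-read the schema instead of running a query that silently matches nothing.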
4. Conversation Flow
Real usage isn't one-off questions. It's "show me top candidates" followed by "now show me their interview scores."
My solution: Redis-backed memory so the AI remembers context across the entire conversation.
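To make the memory idea concrete, here's a rough sketch of session-scoped history. Everything in it is an assumption for illustration: the `SessionStore` interface, the `chat:<sessionId>` key format, and the message cap are made up, and the in-memory `Map` is only a stand-in for a Redis client that would implement the same two methods.

```typescript
// Hypothetical storage interface. A thin wrapper around a Redis client could
// implement it with GET/SET; the in-memory version below is only a stand-in.
interface SessionStore {
  get(key: string): Promise<string | null>;
  set(key: string, value: string): Promise<void>;
}

interface ChatMessage {
  role: "user" | "assistant";
  content: string;
}

class InMemoryStore implements SessionStore {
  private data = new Map<string, string>();

  async get(key: string): Promise<string | null> {
    return this.data.get(key) ?? null;
  }

  async set(key: string, value: string): Promise<void> {
    this.data.set(key, value);
  }
}

class ConversationMemory {
  private store: SessionStore;
  private maxMessages: number;

  constructor(store: SessionStore, maxMessages: number = 20) {
    this.store = store;
    this.maxMessages = maxMessages;
  }

  // Returns the stored messages for a session, oldest first.
  async history(sessionId: string): Promise<ChatMessage[]> {
    const raw = await this.store.get(`chat:${sessionId}`);
    return raw ? (JSON.parse(raw) as ChatMessage[]) : [];
  }

  // Appends one message and keeps only the most recent ones,
  // so the prompt built from the history stays bounded.
  async append(sessionId: string, message: ChatMessage): Promise<void> {
    const messages = [...(await this.history(sessionId)), message];
    const trimmed = messages.slice(-this.maxMessages);
    await this.store.set(`chat:${sessionId}`, JSON.stringify(trimmed));
  }
}
```

With something like this, "now show me their interview scores" can be answered by prepending `await memory.history(sessionId)` to the next model call.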
The Architecture: How It Actually Works
Natural Language Question
↓
🤖 AI Agent (Claude 3.5)
↓
📋 Dynamic Schema Reader (checks current Mongoose models)
↓
🔧 MongoDB Query Tools (find/aggregate/count)
↓
💾 Redis Memory (maintains conversation context)
↓
📄 Smart Response (data + insights + explanations)
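The "MongoDB Query Tools" step in the diagram can be pictured as a small dispatcher. This is a sketch under assumptions, not the repo's code: the `ToolCall` shape, the `ReadOnlyCollection` interface, and the 100-document cap are all hypothetical. The one MongoDB-specific fact it leans on is that `$out` and `$merge` are the aggregation stages that write data, so a read-only tool should refuse them.

```typescript
// Hypothetical shape of the three read-only tools named in the diagram.
type ToolCall =
  | { tool: "find"; collection: string; filter: object; limit?: number }
  | { tool: "count"; collection: string; filter: object }
  | { tool: "aggregate"; collection: string; pipeline: object[] };

// Minimal surface the dispatcher needs. An adapter around a real MongoDB
// collection would have to provide these three methods.
interface ReadOnlyCollection {
  find(filter: object, limit: number): Promise<object[]>;
  count(filter: object): Promise<number>;
  aggregate(pipeline: object[]): Promise<object[]>;
}

// Aggregation stages that write to the database.
const WRITE_STAGES = new Set<string>(["$out", "$merge"]);

async function runTool(
  call: ToolCall,
  collections: Map<string, ReadOnlyCollection>,
): Promise<unknown> {
  const target = collections.get(call.collection);
  if (!target) throw new Error(`Unknown collection: ${call.collection}`);

  switch (call.tool) {
    case "find":
      // Cap the result size so a broad question cannot flood the prompt.
      return target.find(call.filter, Math.min(call.limit ?? 10, 100));
    case "count":
      return target.count(call.filter);
    case "aggregate": {
      const writes = call.pipeline.some((stage) =>
        Object.keys(stage).some((key) => WRITE_STAGES.has(key)),
      );
      if (writes) throw new Error("Pipelines that write ($out/$merge) are rejected");
      return target.aggregate(call.pipeline);
    }
  }
}
```

Keeping the tool surface this narrow means the model can only ever trigger reads, whatever query it comes up with.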
The Smart Schema System
Every query starts here:
// Reads our actual Mongoose models at runtime
const schema = extractCompleteMongooseSchema(ApplicantModel);
// Result: "skills: Array<String>, resumeAnalysis.score: Number(0-100)"
The AI knows exactly what fields exist, their types, and their constraints.
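For a feel of what a function like `extractCompleteMongooseSchema` might do, here's a simplified sketch. It is a guess at the approach, not the actual implementation. It relies on Mongoose's `schema.eachPath()`, which visits every path with a schema type exposing `instance` and `options`. The `caster` property for array element types and the plain-number `min`/`max` and array-valued `enum` options are assumptions that may need adjusting for your Mongoose version and schema style. `describeSchema` and the `...Like` interfaces are hypothetical names.

```typescript
// Structural subset of what Mongoose exposes on a schema, declared locally so
// the sketch has no dependencies.
interface SchemaOptionsLike {
  min?: number;
  max?: number;
  enum?: readonly string[];
  ref?: string;
}

interface SchemaTypeLike {
  instance: string;
  options?: SchemaOptionsLike;
  // Element type for arrays (assumption: varies across Mongoose versions).
  caster?: { instance: string };
}

interface SchemaLike {
  eachPath(visit: (path: string, type: SchemaTypeLike) => void): void;
}

// Produces one compact "path: Type" entry per field for use in a prompt.
function describeSchema(schema: SchemaLike): string {
  const entries: string[] = [];
  schema.eachPath((path, type) => {
    if (path === "__v") return; // skip Mongoose's internal version key

    let text = type.instance;
    if (type.instance === "Array" && type.caster) {
      text = `Array<${type.caster.instance}>`;
    }

    const options: SchemaOptionsLike = type.options ?? {};
    if (options.min !== undefined && options.max !== undefined) {
      text += `(${options.min}-${options.max})`;
    }
    if (options.enum && options.enum.length > 0) {
      text += ` enum[${options.enum.join("|")}]`;
    }
    if (options.ref) {
      text += ` -> ${options.ref}`;
    }
    entries.push(`${path}: ${text}`);
  });
  return entries.join(", ");
}
```

With a real model, the intended call would be something like `describeSchema(ApplicantModel.schema)`, which is how a string such as "skills: Array<String>, resumeAnalysis.score: Number(0-100)" could be produced.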
AI Agent Reasoning
Watch how the AI thinks through a complex question:
User: "Show me our top candidates"
AI: 🔍 Let me check the applicants schema first...
AI: 💭 Found 'compatibilityScore' field (0-100), I'll sort by that
AI: ⚡ Running query: find({}, {sort: {compatibilityScore: -1}, limit: 10})
AI: 💬 "Here are your top 10 candidates, mostly senior developers..."
What I Learned Building This
Claude beats GPT for database work:
- Claude 3.5 is much better at complex reasoning
- GPT-4 makes up field names more often
- Switching to Claude cut our error rate significantly
Token optimization matters:
- Started by sending all schemas with every query (expensive!)
- Now the AI only asks for schemas it actually needs
- Cut token costs by 40%
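The "only ask for what you need" idea can be sketched as a lazy registry: the prompt carries just the collection names, and a schema description is built the first time the agent requests it. `SchemaRegistry` and its loader functions are hypothetical names for illustration, and the caching is an assumption about how you might avoid repeating the work.

```typescript
// Hypothetical registry. Each loader builds the text description of one
// collection's schema; nothing is built until the agent asks for it.
class SchemaRegistry {
  private loaders: Map<string, () => string>;
  private cache = new Map<string, string>();

  constructor(loaders: Map<string, () => string>) {
    this.loaders = loaders;
  }

  // Cheap enough to include in every prompt: names only.
  listCollections(): string[] {
    return Array.from(this.loaders.keys());
  }

  // Exposed to the agent as a tool; the description is built once and reused.
  getSchema(collection: string): string {
    const cached = this.cache.get(collection);
    if (cached !== undefined) return cached;

    const load = this.loaders.get(collection);
    if (!load) throw new Error(`Unknown collection: ${collection}`);

    const description = load();
    this.cache.set(collection, description);
    return description;
  }
}
```

A question about applicants then pays only for the applicants schema, while the other collections cost a few tokens each for their names.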
Error handling is crucial:
- Complex aggregations sometimes fail
- Built smart fallbacks: try aggregation → fall back to a simple find()
- If a field doesn't exist, re-check schema and retry
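The fallback chain above can be expressed as a generic "first attempt that succeeds" helper. This is a sketch of the pattern, not the repo's error handling: `firstSuccessful` and the `Attempt` type are hypothetical, and the actual aggregation and `find()` calls are left to the caller.

```typescript
// One step in the fallback chain: a label for logging plus the work to try.
type Attempt<T> = { label: string; run: () => Promise<T> };

// Runs the attempts in order and returns the first result that does not throw.
// If every attempt fails, the error lists why each one failed.
async function firstSuccessful<T>(
  attempts: Attempt<T>[],
): Promise<{ label: string; value: T }> {
  const errors: string[] = [];
  for (const attempt of attempts) {
    try {
      return { label: attempt.label, value: await attempt.run() };
    } catch (err) {
      const reason = err instanceof Error ? err.message : String(err);
      errors.push(`${attempt.label}: ${reason}`);
    }
  }
  throw new Error(`All attempts failed -> ${errors.join("; ")}`);
}
```

In use, the first attempt would run the aggregation and the second a plain `find()`. A "re-check schema and retry" step fits the same shape as one more attempt that refreshes the schema before rebuilding the query. Returning the label lets the response mention when it fell back to the simpler query.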
Conversation memory changes everything:
- Users ask 3-4 follow-up questions on average
- "Now show me their interview scores" should just work
- Redis sessions make it feel like talking to a person
The Results That Matter
After deploying our "ChatGPT for MongoDB":
✅ Engineering requests: 10/day → 0
✅ Query response time: 2 hours → 30 seconds
✅ Team productivity: Significantly improved
✅ New insights: Users ask questions they never asked before
More importantly, our non-technical team members became confident exploring data themselves.
Open Source: Try Your Own ChatGPT for MongoDB
This solved our recruitment SaaS problem, but I realized every company with MongoDB probably faces similar challenges.
So I've open-sourced the complete system:
🔗 GitHub: mongodb-nl-query-demo (https://github.com/salmankhan-prs/mongodb-nl-query-demo)
What's included:
- Full working system with demo e-commerce data
- AI agent setup and prompt engineering
- Dynamic schema introspection code
- Redis conversation memory
- Complete adaptation guide for your database
Built for real use:
- TypeScript throughout for reliability
- Production error handling and recovery
- Rate limiting and security considerations
- Token optimization for cost control
Quick Start: Get Your Own Running
# Clone and set up
git clone https://github.com/salmankhan-prs/mongodb-nl-query-demo
cd mongodb-nl-query-demo
pnpm install
# Add your API keys
cp .env.example .env
# Edit .env with MongoDB URI and Anthropic API key
# Load demo data and start
pnpm seed
pnpm start:dev
# Test it out
curl -X POST http://localhost:3000/api/query \
-H "Content-Type: application/json" \
-d '{"query": "Show me all users from USA"}'
Try questions like:
- "How many products do we have in each category?"
- "Show me customers who spent the most money"
- "Which orders were delivered successfully?"
Adapt It to Your Database
The magic is in the dynamic schema reading. To use your own data:
- Replace the models in src/models/ with your Mongoose schemas
- Update collection names in src/types/index.ts
- Run the schema generator: pnpm generate:schemas
- Start asking questions about your actual data
The system automatically discovers your field types, relationships, constraints, and enum values.
Why This Actually Matters
When anyone on your team can ask database questions directly:
- Decisions happen faster (no engineering bottlenecks)
- More insights get discovered (easier to explore data)
- Engineering focuses on features (not custom queries)
- Data becomes accessible (non-technical users gain confidence)
The Tech Stack That Worked
- AI Model: Claude 3.5 Sonnet (superior reasoning for databases)
- Agent Framework: LangChain + LangGraph
- Memory: Redis for fast session storage
- Backend: Express + TypeScript
- Database: MongoDB + Mongoose (enables dynamic introspection)
What's Next
I'm excited to see what people build with this. Some ideas for extensions:
- Write operations (insert, update, and delete with safety checks)
- Web interface (React app for non-technical users)
- Advanced analytics (trend analysis, predictive insights)
Try It Out
This represents 5 days of focused work solving a real problem we faced every day. If you're dealing with similar database query bottlenecks, maybe it'll help you too.
GitHub: mongodb-nl-query-demo
Questions? Reach out on Twitter or LinkedIn. I'd love to hear what you build with it.
Built this because we needed it. Sharing it because you might too.


