<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0"><channel><title><![CDATA[PATHAN SALMAN KHAN's Blog]]></title><description><![CDATA[PATHAN SALMAN KHAN's Blog]]></description><link>https://blog.salmankhan.pro</link><generator>RSS for Node</generator><lastBuildDate>Wed, 22 Apr 2026 17:39:29 GMT</lastBuildDate><atom:link href="https://blog.salmankhan.pro/rss.xml" rel="self" type="application/rss+xml"/><language><![CDATA[en]]></language><ttl>60</ttl><item><title><![CDATA[From Closet to Cloud: How I Made My Raspberry Pi 5 Accessible from Anywhere]]></title><description><![CDATA[The Weekend Project That Actually Worked
I had a Raspberry Pi 5 sitting on my desk, and I wanted to host some side projects on it. Simple enough, right?
Wrong. The moment you try to make your home device accessible from the internet, you hit a wall o...]]></description><link>https://blog.salmankhan.pro/from-closet-to-cloud-how-i-made-my-raspberry-pi-5-accessible-from-anywhere</link><guid isPermaLink="true">https://blog.salmankhan.pro/from-closet-to-cloud-how-i-made-my-raspberry-pi-5-accessible-from-anywhere</guid><category><![CDATA[Raspberry Pi]]></category><category><![CDATA[cloudflare]]></category><dc:creator><![CDATA[PATHAN SALMAN KHAN]]></dc:creator><pubDate>Fri, 03 Oct 2025 13:41:32 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1759497767964/844aca1d-9cc4-47f1-8637-c76e7dfb5255.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<h2 id="heading-the-weekend-project-that-actually-worked"><strong>The Weekend Project That Actually Worked</strong></h2>
<p>I had a Raspberry Pi 5 sitting on my desk, and I wanted to host some side projects on it. Simple enough, right?</p>
<p>Wrong. The moment you try to make your home device accessible from the internet, you hit a wall of networking complexity.</p>
<p>Your Pi sits behind your home router. Your router sits behind your ISP. Your ISP probably puts you behind something called CGNAT. It's like being in a room, inside a building, inside a walled city.</p>
<p>So I spent a weekend figuring out how to punch through all these barriers. Turns out there are two reliable approaches that actually work.</p>
<h2 id="heading-the-two-paths-port-forwarding-vs-tunneling"><strong>The Two Paths: Port Forwarding vs Tunneling</strong></h2>
<p>After testing different approaches, there are two methods that reliably work:</p>
<p><strong>Approach 1: Port Forwarding</strong> (direct network access)<br />
<strong>Approach 2: Tunneling</strong> (bypass all the networking complexity)</p>
<p>Let me walk you through both, so you can pick what works for your setup.</p>
<h2 id="heading-approach-1-port-forwarding-when-your-isp-cooperates"><strong>Approach 1: Port Forwarding (When Your ISP Cooperates)</strong></h2>
<p>Port forwarding is the traditional way to expose local services to the internet.</p>
<p><strong>How it works:</strong>
Your router creates a direct pathway from the internet to your Pi. When someone visits <code>your-ip:8080</code>, your router forwards that request to your Pi's port 8080.</p>
<p><strong>When this works:</strong></p>
<ul>
<li>Your ISP gives you a real public IP address</li>
<li>Your router supports port forwarding configuration  </li>
<li>You're okay managing dynamic IP changes</li>
</ul>
<h3 id="heading-quick-setup-steps"><strong>Quick Setup Steps:</strong></h3>
<p><strong>Step 1: Check if you have a public IP</strong></p>
<pre><code class="lang-bash"><span class="hljs-comment"># On your Pi, check what the internet sees</span>
curl ifconfig.me

<span class="hljs-comment"># Compare with your router's WAN IP in admin panel</span>
<span class="hljs-comment"># If they match, you have a public IP!</span>
</code></pre>
<p><strong>Step 2: Configure your router</strong></p>
<ul>
<li>Access router admin (usually 192.168.1.1)</li>
<li>Find "Port Forwarding" or "Virtual Server" section</li>
<li>Add rule: External Port 8080 → Your Pi's IP → Internal Port 8080</li>
</ul>
<p><strong>Step 3: Test it</strong></p>
<pre><code class="lang-bash"><span class="hljs-comment"># Start a simple server on your Pi</span>
python3 -m http.server 8080

<span class="hljs-comment"># Test from outside your network</span>
curl http://YOUR-PUBLIC-IP:8080
</code></pre>
<p><strong>The Reality Check:</strong>
This used to work reliably, but many modern ISPs use something called CGNAT (Carrier-Grade NAT). This means multiple customers share the same public IP address, making port forwarding impossible.</p>
<p><strong>Signs you're behind CGNAT:</strong></p>
<ul>
<li>Router shows different IP than <code>curl ifconfig.me</code></li>
<li>Port forwarding rules don't work from external networks</li>
</ul>
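<p>Another telltale sign: RFC 6598 reserves the range <code>100.64.0.0/10</code> for carrier-grade NAT. Here's a small sketch to check whether an address (for example, your router's WAN IP from its admin page) falls in that range:</p>
<pre><code class="lang-bash"># RFC 6598 reserves 100.64.0.0/10 (100.64.x.x through 100.127.x.x)
# for carrier-grade NAT. Succeeds if the given IPv4 address is in it.
is_cgnat() {
  case "$1" in
    100.*) ;;
    *) return 1 ;;
  esac
  second=$(echo "$1" | cut -d. -f2)
  [ "$second" -ge 64 ] || return 1
  [ "$second" -le 127 ]
}

# Example: a typical CGNAT WAN address
if is_cgnat "100.72.15.9"; then
  echo "Behind CGNAT"   # prints "Behind CGNAT"
fi
</code></pre>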
<p>If port forwarding doesn't work for you, tunneling is the solution.</p>
<h2 id="heading-approach-2-tunneling-the-universal-solution"><strong>Approach 2: Tunneling (The Universal Solution)</strong></h2>
<h3 id="heading-what-is-tunneling"><strong>What is Tunneling?</strong></h3>
<p>Think of tunneling like this: instead of trying to punch a hole through all the firewalls to reach your Pi, your Pi creates an outbound connection to a server on the internet. When someone wants to access your Pi, they connect to that server, which forwards the request through the existing connection.</p>
<pre><code>Internet User → Tunnel Server → (Existing Connection) → Your Pi
</code></pre><p>The key insight: outbound connections (Pi → Internet) almost always work, even behind complex firewalls. Inbound connections (Internet → Pi) are what get blocked.</p>
<p><strong>Why tunneling works everywhere:</strong></p>
<ul>
<li>Your Pi initiates the connection (outbound = allowed)</li>
<li>No port forwarding needed</li>
<li>Works behind any firewall/NAT</li>
<li>Tunnel server handles the public internet part</li>
</ul>
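<p>If you already have any server with a public IP, you can see the same principle with a plain reverse SSH tunnel. This is just an illustration of the idea, not part of the setup below; <code>user</code> and <code>vps.example.com</code> are placeholders, and the VPS's sshd must allow <code>GatewayPorts</code> for outside visitors to reach the forwarded port:</p>
<pre><code class="lang-bash"># The Pi opens an OUTBOUND connection to the VPS and asks it to
# forward its port 8080 back through that connection to the Pi's
# local port 3000 (-N means no remote shell, forwarding only)
ssh -N -R 8080:localhost:3000 user@vps.example.com

# Anyone who can reach the VPS now reaches the Pi:
#   curl http://vps.example.com:8080
</code></pre>
<p>Tunnel services like Cloudflare Tunnel automate exactly this pattern, with a managed server on the public side.</p>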
<p>There are several good tunneling solutions:</p>
<p><strong>Cloudflare Tunnel (Free)</strong></p>
<ul>
<li>Best for: Custom domains, reliable service</li>
<li>Pros: Free, HTTPS automatic, very stable</li>
<li>Cons: Requires Cloudflare account</li>
</ul>
<p><strong>Tailscale Funnel (Free)</strong>  </p>
<ul>
<li>Best for: Quick testing, simple setup</li>
<li>Pros: Zero configuration, works instantly</li>
<li>Cons: URLs are auto-generated</li>
</ul>
<p><strong>ngrok (Free tier available)</strong></p>
<ul>
<li>Best for: Development, temporary sharing</li>
<li>Pros: Simple command, great for demos</li>
<li>Cons: Free tier has limitations</li>
</ul>
<p><strong>Pinggy (Free tier available)</strong></p>
<ul>
<li>Best for: Alternative to ngrok</li>
<li>Pros: No account needed for basic use</li>
<li>Cons: Limited free usage</li>
</ul>
<p>For this guide, I'll show you Cloudflare Tunnel because it's free, reliable, and works great for side projects.</p>
<h2 id="heading-setting-up-cloudflare-tunnel"><strong>Setting Up Cloudflare Tunnel</strong></h2>
<p>This works the same whether you're using a <strong>Raspberry Pi</strong> or an <strong>old laptop</strong>.</p>
<h3 id="heading-install-cloudflared"><strong>Install cloudflared</strong></h3>
<p><strong>For Raspberry Pi 5/4 (ARM64):</strong></p>
<pre><code class="lang-bash">wget https://github.com/cloudflare/cloudflared/releases/latest/download/cloudflared-linux-arm64
sudo mv cloudflared-linux-arm64 /usr/<span class="hljs-built_in">local</span>/bin/cloudflared
sudo chmod +x /usr/<span class="hljs-built_in">local</span>/bin/cloudflared
</code></pre>
<p><strong>For old laptops (x86_64):</strong></p>
<pre><code class="lang-bash">wget https://github.com/cloudflare/cloudflared/releases/latest/download/cloudflared-linux-amd64
sudo mv cloudflared-linux-amd64 /usr/<span class="hljs-built_in">local</span>/bin/cloudflared
sudo chmod +x /usr/<span class="hljs-built_in">local</span>/bin/cloudflared
</code></pre>
<p><strong>Verify installation:</strong></p>
<pre><code class="lang-bash">cloudflared --version
</code></pre>
<h2 id="heading-option-1-quick-anonymous-tunnel-no-domain-required"><strong>Option 1: Quick Anonymous Tunnel (No Domain Required)</strong></h2>
<p>The fastest way to get started - no authentication, no domain setup required!</p>
<h3 id="heading-start-your-local-service"><strong>Start Your Local Service</strong></h3>
<pre><code class="lang-bash"><span class="hljs-comment"># Create a test page</span>
<span class="hljs-built_in">echo</span> <span class="hljs-string">"&lt;h1&gt;Hello from my Pi 5!&lt;/h1&gt;&lt;p&gt;This is accessible from anywhere!&lt;/p&gt;"</span> &gt; index.html

<span class="hljs-comment"># Start simple server</span>
python3 -m http.server 3000
</code></pre>
<h3 id="heading-create-instant-tunnel"><strong>Create Instant Tunnel</strong></h3>
<pre><code class="lang-bash"><span class="hljs-comment"># One command to expose your local server</span>
cloudflared tunnel --url http://localhost:3000
</code></pre>
<p><strong>That's it!</strong> You'll see output like:</p>
<pre><code>+--------------------------------------------------------------------------------------------+
|  Your quick Tunnel has been created! Visit it at (it may take some time to be reachable):  |
|  https:<span class="hljs-comment">//random-words-123.trycloudflare.com                                                |</span>
+--------------------------------------------------------------------------------------------+
</code></pre><p>Your local server is now accessible from anywhere with <strong>automatic HTTPS</strong>! Share the URL with anyone - they can access your Pi from anywhere in the world.</p>
<p><strong>Perfect for:</strong></p>
<ul>
<li>Quick demos and testing</li>
<li>Sharing work-in-progress with friends</li>
<li>Development and prototyping</li>
<li>When you don't have a domain</li>
</ul>
<h2 id="heading-option-2-custom-domain-setup-for-your-own-domain"><strong>Option 2: Custom Domain Setup (For Your Own Domain)</strong></h2>
<p>If you have your own domain managed by Cloudflare, you can create permanent tunnels with custom subdomains.</p>
<h3 id="heading-step-1-authenticate-with-cloudflare"><strong>Step 1: Authenticate with Cloudflare</strong></h3>
<pre><code class="lang-bash"><span class="hljs-comment"># This will open a browser for login</span>
cloudflared tunnel login
</code></pre>
<p><strong>If you're using SSH (headless setup):</strong>
The command will show you a URL. Copy it and open it on any device where you're logged into Cloudflare. The authentication will complete automatically on your Pi.</p>
<h3 id="heading-step-2-create-your-tunnel"><strong>Step 2: Create Your Tunnel</strong></h3>
<pre><code class="lang-bash"><span class="hljs-comment"># Create a named tunnel for your projects</span>
cloudflared tunnel create my-home-server

<span class="hljs-comment"># Note the tunnel ID that appears - save it!</span>
</code></pre>
<h3 id="heading-step-3-configure-dns-routing"><strong>Step 3: Configure DNS Routing</strong></h3>
<pre><code class="lang-bash"><span class="hljs-comment"># Connect your domain to the tunnel (replace with your domain)</span>
cloudflared tunnel route dns my-home-server projects.yourdomain.com
</code></pre>
<h3 id="heading-step-4-create-configuration"><strong>Step 4: Create Configuration</strong></h3>
<p>Create the tunnel configuration file:</p>
<pre><code class="lang-bash">nano ~/.cloudflared/config.yml
</code></pre>
<p>Add this configuration:</p>
<pre><code class="lang-yaml"><span class="hljs-attr">tunnel:</span> <span class="hljs-string">YOUR-TUNNEL-ID-HERE</span>
<span class="hljs-attr">credentials-file:</span> <span class="hljs-string">/home/pi/.cloudflared/YOUR-TUNNEL-ID.json</span>

<span class="hljs-attr">ingress:</span>
  <span class="hljs-bullet">-</span> <span class="hljs-attr">hostname:</span> <span class="hljs-string">projects.yourdomain.com</span>
    <span class="hljs-attr">service:</span> <span class="hljs-string">http://localhost:3000</span>
  <span class="hljs-bullet">-</span> <span class="hljs-attr">service:</span> <span class="hljs-string">http_status:404</span>
</code></pre>
<h3 id="heading-step-5-run-your-named-tunnel"><strong>Step 5: Run Your Named Tunnel</strong></h3>
<pre><code class="lang-bash">cloudflared tunnel run my-home-server
</code></pre>
<p>Now visit <code>https://projects.yourdomain.com</code> - you should see your Pi's webpage!</p>
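<p>You can also verify from the command line (<code>projects.yourdomain.com</code> is a placeholder for your own hostname):</p>
<pre><code class="lang-bash"># -I fetches only the response headers; a 200 status plus
# Cloudflare headers such as cf-ray confirm the request is
# being served through the tunnel
curl -I https://projects.yourdomain.com
</code></pre>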
<h3 id="heading-step-6-multiple-services"><strong>Step 6: Multiple Services</strong></h3>
<p>You can host multiple projects on different subdomains:</p>
<pre><code class="lang-yaml"><span class="hljs-attr">tunnel:</span> <span class="hljs-string">YOUR-TUNNEL-ID-HERE</span>
<span class="hljs-attr">credentials-file:</span> <span class="hljs-string">/home/pi/.cloudflared/YOUR-TUNNEL-ID.json</span>

<span class="hljs-attr">ingress:</span>
  <span class="hljs-bullet">-</span> <span class="hljs-attr">hostname:</span> <span class="hljs-string">api.yourdomain.com</span>
    <span class="hljs-attr">service:</span> <span class="hljs-string">http://localhost:3001</span>
  <span class="hljs-bullet">-</span> <span class="hljs-attr">hostname:</span> <span class="hljs-string">blog.yourdomain.com</span>
    <span class="hljs-attr">service:</span> <span class="hljs-string">http://localhost:3002</span>
  <span class="hljs-bullet">-</span> <span class="hljs-attr">hostname:</span> <span class="hljs-string">files.yourdomain.com</span>
    <span class="hljs-attr">service:</span> <span class="hljs-string">http://localhost:3003</span>
  <span class="hljs-bullet">-</span> <span class="hljs-attr">service:</span> <span class="hljs-string">http_status:404</span>
</code></pre>
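<p>Ingress rules are matched top to bottom, which is why the catch-all <code>http_status:404</code> rule must stay last. cloudflared can check the file before you run the tunnel:</p>
<pre><code class="lang-bash"># Validate the ingress section of ~/.cloudflared/config.yml
cloudflared tunnel ingress validate

# Show which ingress rule a given request would match
cloudflared tunnel ingress rule https://api.yourdomain.com
</code></pre>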
<h3 id="heading-step-7-auto-start-on-boot"><strong>Step 7: Auto-Start on Boot</strong></h3>
<p>Make your tunnel start automatically:</p>
<pre><code class="lang-bash"><span class="hljs-comment"># Install as system service</span>
sudo cloudflared --config ~/.cloudflared/config.yml service install

<span class="hljs-comment"># Enable auto-start</span>
sudo systemctl <span class="hljs-built_in">enable</span> cloudflared
sudo systemctl start cloudflared

<span class="hljs-comment"># Check status</span>
sudo systemctl status cloudflared
</code></pre>
<h2 id="heading-quick-start-for-beginners"><strong>Quick Start for Beginners</strong></h2>
<p>Want to try this right now? Here's the fastest way:</p>
<p><strong>1. Get your device ready</strong></p>
<pre><code class="lang-bash"><span class="hljs-comment"># Update everything (works on Pi or laptop)</span>
sudo apt update &amp;&amp; sudo apt upgrade -y

<span class="hljs-comment"># Install basics</span>
sudo apt install curl python3 -y
</code></pre>
<p><strong>2. Install cloudflared</strong></p>
<pre><code class="lang-bash"><span class="hljs-comment"># Choose the right version for your device</span>
<span class="hljs-comment"># ARM64 for Pi 5/4, amd64 for laptops</span>
wget https://github.com/cloudflare/cloudflared/releases/latest/download/cloudflared-linux-arm64
sudo mv cloudflared-linux-arm64 /usr/<span class="hljs-built_in">local</span>/bin/cloudflared
sudo chmod +x /usr/<span class="hljs-built_in">local</span>/bin/cloudflared
</code></pre>
<p><strong>3. Create something to host</strong></p>
<pre><code class="lang-bash">mkdir ~/my-website &amp;&amp; <span class="hljs-built_in">cd</span> ~/my-website
<span class="hljs-built_in">echo</span> <span class="hljs-string">"&lt;h1&gt;My First Pi Website!&lt;/h1&gt;"</span> &gt; index.html
python3 -m http.server 8000
</code></pre>
<p><strong>4. Create instant tunnel</strong></p>
<pre><code class="lang-bash">cloudflared tunnel --url http://localhost:8000
</code></pre>
<p><strong>5. Share your creation</strong><br />
Your website is now accessible at the random URL from anywhere in the world!</p>
<h2 id="heading-the-bottom-line"><strong>The Bottom Line</strong></h2>
<p>A couple months ago, I wanted to host some side projects but didn't want to pay for cloud hosting or deal with complex networking. Today, I'm running multiple web services from my Pi 5 that are accessible from anywhere.</p>
<p>The tunneling approach solved every networking challenge. No complex router configuration, no ISP limitations, no monthly hosting fees. Traditional hosting costs $5-10/month, while this setup costs basically nothing after the initial Pi purchase ($80 one-time).</p>
<p><strong>Key takeaways:</strong></p>
<ul>
<li>Modern networking makes direct connections difficult</li>
<li>Tunneling bypasses all these limitations reliably  </li>
<li>Works on Pi or any old laptop</li>
<li>Start with anonymous tunnels (one command!)</li>
<li>Upgrade to custom domains when you're ready</li>
<li>Perfect for side projects and learning</li>
<li>Costs basically nothing after initial hardware</li>
</ul>
<p><strong>When it makes sense:</strong></p>
<ul>
<li>Personal projects and experiments</li>
<li>Learning web development</li>
<li>Sharing projects with friends</li>
<li>Development and testing environments</li>
</ul>
<p>Give it a try! Worst case, you spend a weekend learning about networking and hosting. Best case, you never pay for basic web hosting again.</p>
<p>Questions? Hit me up on <a target="_blank" href="https://x.com/salmankhanprs">Twitter</a> or <a target="_blank" href="https://www.linkedin.com/in/salman-khan-tech">LinkedIn</a>. I'd love to see what you build!</p>
<hr />
]]></content:encoded></item><item><title><![CDATA[Running Immich with S3 Storage: A Complete Developer Guide]]></title><description><![CDATA[I was thinking for many days about hosting my own Google Photos alternative. I found many options, but Immich is very close to what Google Photos offers. I'm using a $5 Hetzner Cloud machine, but there's a problem.
What is Immich?
Immich is an open-s...]]></description><link>https://blog.salmankhan.pro/running-immich-with-s3-storage-a-complete-developer-guide</link><guid isPermaLink="true">https://blog.salmankhan.pro/running-immich-with-s3-storage-a-complete-developer-guide</guid><category><![CDATA[Immich]]></category><category><![CDATA[Docker]]></category><category><![CDATA[S3]]></category><category><![CDATA[s3fs]]></category><dc:creator><![CDATA[PATHAN SALMAN KHAN]]></dc:creator><pubDate>Fri, 19 Sep 2025 03:34:02 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1758028134957/be868546-5c2d-4079-bebb-5bdd556015dc.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>I was thinking for many days about hosting my own Google Photos alternative. I found many options, but Immich is very close to what Google Photos offers. I'm using a $5 Hetzner Cloud machine, but there's a problem.</p>
<h2 id="heading-what-is-immich">What is Immich?</h2>
<p>Immich is an open-source, self-hosted photo and video management solution that you can think of as your own personal Google Photos. </p>
<p><strong>Key features:</strong></p>
<ul>
<li>Upload photos from mobile and web</li>
<li>Automatic backup from your phone</li>
<li>Face recognition and search</li>
<li>Album creation and sharing</li>
<li>Timeline view of your memories</li>
<li>Machine learning for photo tagging</li>
</ul>
<p>The best part? You own your data completely. No monthly subscriptions, no privacy concerns, and no storage limits imposed by big companies.</p>
<h2 id="heading-what-is-s3fs">What is S3FS?</h2>
<p>S3FS-FUSE is a clever tool that makes cloud storage (like Amazon S3) appear as a regular folder on your server.</p>
<p><strong>How it works:</strong></p>
<ul>
<li>Your server sees a normal folder: <code>/opt/immich/library/upload</code></li>
<li>But this folder is actually connected to your S3 bucket in the cloud</li>
<li>When Immich writes a photo to this "local" folder, it actually gets stored in S3</li>
<li>When Immich reads a photo, s3fs fetches it from S3 transparently</li>
</ul>
<p>Think of it as a bridge between your server and cloud storage. Your applications don't know the difference - they just see a regular folder, but everything is actually stored in the cloud.</p>
<h2 id="heading-the-storage-problem">The Storage Problem</h2>
<p>Why do we need S3 storage? First, let me explain the problem.</p>
<p>On a $5 machine or any fixed-price server, you get limited disk space - maybe 20GB, 40GB, or 80GB. To attach more space, I'd have to pay extra, and that gets costly fast.</p>
<p>So I thought: why not use S3 storage?</p>
<p>Using S3 as storage has a drawback - it adds latency. But I mostly use Immich for backup, with far more writes than reads, so that trade-off was fine for me.</p>
<h2 id="heading-why-use-s3-with-immich">Why Use S3 with Immich?</h2>
<p><strong>Benefits of S3 Storage:</strong></p>
<ul>
<li><strong>Unlimited Capacity</strong>: No more worrying about disk space</li>
<li><strong>Cost Effective</strong>: Pay only for what you store</li>
<li><strong>Portability</strong>: Deploy Immich anywhere while keeping the same storage</li>
</ul>
<p><strong>Trade-offs to Consider:</strong></p>
<ul>
<li><strong>Latency</strong>: Slight delay when accessing photos (typically 100-500ms)</li>
</ul>
<h2 id="heading-architecture-overview">Architecture Overview</h2>
<p>Our setup uses <strong>s3fs-fuse</strong> (<a target="_blank" href="https://github.com/s3fs-fuse/s3fs-fuse">you can read more here</a>) to mount an S3 bucket as a local filesystem. This allows Immich to read/write files as if they were stored locally, while actually storing everything in S3.</p>
<pre><code>┌─────────────┐    ┌─────────────┐    ┌─────────────┐
│   Immich    │◄──►│   s3fs      │◄──►│  S3 Bucket  │
│ Application │    │   Mount     │    │   Storage   │
└─────────────┘    └─────────────┘    └─────────────┘
</code></pre><h2 id="heading-prerequisites">Prerequisites</h2>
<ul>
<li>AWS Account with S3 access</li>
<li>Docker and Docker Compose installed</li>
<li>Basic Linux command line knowledge</li>
<li>Server with internet connectivity</li>
</ul>
<h2 id="heading-step-1-create-s3-bucket">Step 1: Create S3 Bucket</h2>
<p>First, create an S3 bucket for your photos:</p>
<pre><code class="lang-bash"><span class="hljs-comment"># Configure AWS CLI</span>
aws configure

<span class="hljs-comment"># Create your bucket (replace with your preferred name and region)</span>
aws s3 mb s3://my-immich-photos --region us-east-1
</code></pre>
<p><strong>Optional: Enable Transfer Acceleration for faster uploads</strong></p>
<pre><code class="lang-bash">aws s3api put-bucket-accelerate-configuration \
    --bucket my-immich-photos \
    --accelerate-configuration Status=Enabled
</code></pre>
<h2 id="heading-step-2-install-and-configure-s3fs">Step 2: Install and Configure s3fs</h2>
<p>Install s3fs-fuse to mount your S3 bucket as a local filesystem:</p>
<pre><code class="lang-bash"><span class="hljs-comment"># Ubuntu/Debian</span>
sudo apt update
sudo apt install -y s3fs

<span class="hljs-comment"># CentOS/RHEL</span>
sudo yum install -y s3fs-fuse
</code></pre>
<p>Create credentials file for s3fs:</p>
<pre><code class="lang-bash"><span class="hljs-comment"># Create credentials file</span>
<span class="hljs-built_in">echo</span> <span class="hljs-string">"<span class="hljs-subst">$(aws configure get aws_access_key_id)</span>:<span class="hljs-subst">$(aws configure get aws_secret_access_key)</span>"</span> \
  | sudo tee /etc/passwd-s3fs &gt; /dev/null
sudo chmod 600 /etc/passwd-s3fs
</code></pre>
<h2 id="heading-step-3-set-up-immich-directory-structure">Step 3: Set Up Immich Directory Structure</h2>
<p>Create the directory structure for Immich:</p>
<pre><code class="lang-bash"><span class="hljs-comment"># Create Immich directory</span>
sudo mkdir -p /opt/immich
<span class="hljs-built_in">cd</span> /opt/immich

<span class="hljs-comment"># Create mount point for S3 storage</span>
sudo mkdir -p /opt/immich/library/upload
sudo chown -R 1000:1000 /opt/immich/library
</code></pre>
<h2 id="heading-step-4-mount-s3-bucket">Step 4: Mount S3 Bucket</h2>
<p>Mount your S3 bucket to the Immich upload directory:</p>
<pre><code class="lang-bash"><span class="hljs-comment"># Mount S3 bucket</span>
sudo s3fs my-immich-photos /opt/immich/library/upload \
  -o allow_other \
  -o nonempty \
  -o use_cache=/tmp \
  -o passwd_file=/etc/passwd-s3fs \
  -o endpoint=us-east-1 \
  -o url=https://s3.us-east-1.amazonaws.com
</code></pre>
<p><strong>Verify the mount:</strong></p>
<pre><code class="lang-bash">mount | grep s3fs
ls -la /opt/immich/library/upload  <span class="hljs-comment"># Should show your S3 bucket contents</span>
</code></pre>
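<p>For a full round-trip check, write a file through the mount and confirm it landed in the bucket. This assumes the AWS CLI is still configured from Step 1 and the bucket name matches:</p>
<pre><code class="lang-bash"># Write through the s3fs mount...
echo "hello from s3fs" | sudo tee /opt/immich/library/upload/s3fs-test.txt

# ...confirm the object exists in S3 directly...
aws s3 ls s3://my-immich-photos/s3fs-test.txt

# ...and clean up
sudo rm /opt/immich/library/upload/s3fs-test.txt
</code></pre>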
<h2 id="heading-step-5-create-required-subdirectories">Step 5: Create Required Subdirectories</h2>
<p>Immich requires specific subdirectories with marker files:</p>
<pre><code class="lang-bash">BASE=<span class="hljs-string">"/opt/immich/library/upload"</span>

<span class="hljs-comment"># Create required subdirectories</span>
<span class="hljs-keyword">for</span> DIR <span class="hljs-keyword">in</span> upload thumbs library encoded-video profile backups; <span class="hljs-keyword">do</span>
  FULL=<span class="hljs-string">"<span class="hljs-variable">$BASE</span>/<span class="hljs-variable">$DIR</span>"</span>
  sudo mkdir -p <span class="hljs-string">"<span class="hljs-variable">$FULL</span>"</span>
  sudo touch <span class="hljs-string">"<span class="hljs-variable">$FULL</span>/.immich"</span>
  sudo chmod 777 <span class="hljs-string">"<span class="hljs-variable">$FULL</span>/.immich"</span>
<span class="hljs-keyword">done</span>
</code></pre>
<h2 id="heading-step-6-configure-immich-with-docker-compose">Step 6: Configure Immich with Docker Compose</h2>
<p>You can refer to the <a target="_blank" href="https://immich.app/docs/install/docker-compose">official docs</a> for more details.</p>
<p>Create the Docker Compose configuration:</p>
<p><strong>docker-compose.yml:</strong></p>
<pre><code class="lang-yaml"><span class="hljs-attr">name:</span> <span class="hljs-string">immich</span>

<span class="hljs-attr">services:</span>
  <span class="hljs-attr">immich-server:</span>
    <span class="hljs-attr">container_name:</span> <span class="hljs-string">immich_server</span>
    <span class="hljs-attr">image:</span> <span class="hljs-string">ghcr.io/immich-app/immich-server:${IMMICH_VERSION:-release}</span>
    <span class="hljs-attr">volumes:</span>
      <span class="hljs-bullet">-</span> <span class="hljs-string">${UPLOAD_LOCATION}:/usr/src/app/upload</span>
      <span class="hljs-bullet">-</span> <span class="hljs-string">/etc/localtime:/etc/localtime:ro</span>
    <span class="hljs-attr">env_file:</span>
      <span class="hljs-bullet">-</span> <span class="hljs-string">.env</span>
    <span class="hljs-attr">ports:</span>
      <span class="hljs-bullet">-</span> <span class="hljs-number">2283</span><span class="hljs-string">:3001</span>
    <span class="hljs-attr">depends_on:</span>
      <span class="hljs-bullet">-</span> <span class="hljs-string">redis</span>
      <span class="hljs-bullet">-</span> <span class="hljs-string">database</span>
    <span class="hljs-attr">restart:</span> <span class="hljs-string">always</span>

  <span class="hljs-attr">immich-machine-learning:</span>
    <span class="hljs-attr">container_name:</span> <span class="hljs-string">immich_machine_learning</span>
    <span class="hljs-attr">image:</span> <span class="hljs-string">ghcr.io/immich-app/immich-machine-learning:${IMMICH_VERSION:-release}</span>
    <span class="hljs-attr">volumes:</span>
      <span class="hljs-bullet">-</span> <span class="hljs-string">model-cache:/cache</span>
    <span class="hljs-attr">env_file:</span>
      <span class="hljs-bullet">-</span> <span class="hljs-string">.env</span>
    <span class="hljs-attr">restart:</span> <span class="hljs-string">always</span>

  <span class="hljs-attr">redis:</span>
    <span class="hljs-attr">container_name:</span> <span class="hljs-string">immich_redis</span>
    <span class="hljs-attr">image:</span> <span class="hljs-string">docker.io/redis:6.2-alpine</span>
    <span class="hljs-attr">healthcheck:</span>
      <span class="hljs-attr">test:</span> <span class="hljs-string">redis-cli</span> <span class="hljs-string">ping</span> <span class="hljs-string">||</span> <span class="hljs-string">exit</span> <span class="hljs-number">1</span>
    <span class="hljs-attr">restart:</span> <span class="hljs-string">always</span>

  <span class="hljs-attr">database:</span>
    <span class="hljs-attr">container_name:</span> <span class="hljs-string">immich_postgres</span>
    <span class="hljs-attr">image:</span> <span class="hljs-string">docker.io/tensorchord/pgvecto-rs:pg14-v0.2.0</span>
    <span class="hljs-attr">environment:</span>
      <span class="hljs-attr">POSTGRES_PASSWORD:</span> <span class="hljs-string">${DB_PASSWORD}</span>
      <span class="hljs-attr">POSTGRES_USER:</span> <span class="hljs-string">${DB_USERNAME}</span>
      <span class="hljs-attr">POSTGRES_DB:</span> <span class="hljs-string">${DB_DATABASE_NAME}</span>
      <span class="hljs-attr">POSTGRES_INITDB_ARGS:</span> <span class="hljs-string">'--data-checksums'</span>
    <span class="hljs-attr">volumes:</span>
      <span class="hljs-bullet">-</span> <span class="hljs-string">${DB_DATA_LOCATION}:/var/lib/postgresql/data</span>
    <span class="hljs-attr">healthcheck:</span>
      <span class="hljs-attr">test:</span> <span class="hljs-string">pg_isready</span> <span class="hljs-string">--dbname='${DB_DATABASE_NAME}'</span> <span class="hljs-string">--username='${DB_USERNAME}'</span> <span class="hljs-string">||</span> <span class="hljs-string">exit</span> <span class="hljs-number">1</span>
      <span class="hljs-attr">interval:</span> <span class="hljs-string">5m</span>
      <span class="hljs-attr">start_interval:</span> <span class="hljs-string">30s</span>
      <span class="hljs-attr">start_period:</span> <span class="hljs-string">5m</span>
    <span class="hljs-attr">command:</span> [<span class="hljs-string">"postgres"</span>, <span class="hljs-string">"-c"</span>, <span class="hljs-string">"shared_preload_libraries=vectors.so"</span>, <span class="hljs-string">"-c"</span>, <span class="hljs-string">'search_path="$$user", public, vectors'</span>, <span class="hljs-string">"-c"</span>, <span class="hljs-string">"logging_collector=on"</span>, <span class="hljs-string">"-c"</span>, <span class="hljs-string">"max_wal_size=2GB"</span>, <span class="hljs-string">"-c"</span>, <span class="hljs-string">"shared_buffers=512MB"</span>, <span class="hljs-string">"-c"</span>, <span class="hljs-string">"wal_compression=on"</span>]
    <span class="hljs-attr">restart:</span> <span class="hljs-string">always</span>

<span class="hljs-attr">volumes:</span>
  <span class="hljs-attr">model-cache:</span>
</code></pre>
<p><strong>Create .env file:</strong></p>
<pre><code class="lang-bash">cat &gt; .env &lt;&lt; <span class="hljs-string">'EOF'</span>
<span class="hljs-comment"># Point to your S3 mount directory</span>
UPLOAD_LOCATION=/opt/immich/library/upload

<span class="hljs-comment"># Immich version</span>
IMMICH_VERSION=release

<span class="hljs-comment"># Database configuration</span>
DB_PASSWORD=your-secure-password
DB_USERNAME=postgres
DB_DATABASE_NAME=immich
DB_DATA_LOCATION=/opt/immich/postgres

<span class="hljs-comment"># Redis configuration</span>
REDIS_HOSTNAME=immich_redis
EOF
</code></pre>
<h2 id="heading-step-7-start-immich">Step 7: Start Immich</h2>
<p>Start your Immich deployment:</p>
<pre><code class="lang-bash"><span class="hljs-comment"># Start all services</span>
docker compose up -d

<span class="hljs-comment"># Check status</span>
docker compose ps
docker compose logs immich_server
</code></pre>
<h2 id="heading-step-8-configure-auto-mount-on-boot">Step 8: Configure Auto-Mount on Boot</h2>
<p>Create a systemd service to automatically mount S3 on server reboot:</p>
<pre><code class="lang-bash">sudo tee /etc/systemd/system/s3fs-immich.service &gt; /dev/null &lt;&lt; <span class="hljs-string">'EOF'</span>
[Unit]
Description=S3FS Mount <span class="hljs-keyword">for</span> Immich Photos
After=network-online.target
Wants=network-online.target

[Service]
Type=forking
User=root
ExecStart=/usr/bin/s3fs my-immich-photos /opt/immich/library/upload -o passwd_file=/etc/passwd-s3fs,allow_other,use_cache=/tmp,endpoint=us-east-1,url=https://s3.us-east-1.amazonaws.com
ExecStop=/bin/umount /opt/immich/library/upload
Restart=always
RestartSec=10

[Install]
WantedBy=multi-user.target
EOF

<span class="hljs-comment"># Enable and start the service</span>
sudo systemctl <span class="hljs-built_in">enable</span> s3fs-immich.service
sudo systemctl start s3fs-immich.service
</code></pre>
<h2 id="heading-step-9-configure-s3-bucket-policy-optional">Step 9: Configure S3 Bucket Policy (Optional)</h2>
<p>If you want direct access to photos via HTTPS URLs, configure a bucket policy. Be aware that this makes every object under <code>photos/</code> publicly readable, so skip this step if your library should stay private:</p>
<pre><code class="lang-json">{
  <span class="hljs-attr">"Version"</span>: <span class="hljs-string">"2012-10-17"</span>,
  <span class="hljs-attr">"Statement"</span>: [
    {
      <span class="hljs-attr">"Sid"</span>: <span class="hljs-string">"AllowPublicReadPhotos"</span>,
      <span class="hljs-attr">"Effect"</span>: <span class="hljs-string">"Allow"</span>,
      <span class="hljs-attr">"Principal"</span>: <span class="hljs-string">"*"</span>,
      <span class="hljs-attr">"Action"</span>: <span class="hljs-string">"s3:GetObject"</span>,
      <span class="hljs-attr">"Resource"</span>: <span class="hljs-string">"arn:aws:s3:::my-immich-photos/photos/*"</span>
    }
  ]
}
</code></pre>
<h2 id="heading-testing-your-setup">Testing Your Setup</h2>
<ol>
<li><strong>Access Immich</strong>: Navigate to <code>http://your-server-ip:2283</code></li>
<li><strong>Create Account</strong>: Set up your admin account</li>
<li><strong>Upload Photos</strong>: Try uploading photos via web or mobile app</li>
<li><strong>Verify S3</strong>: Check your S3 bucket to confirm photos are being stored</li>
</ol>
<h2 id="heading-performance-optimization-tips">Performance Optimization Tips</h2>
<p><strong>s3fs Mount Options:</strong></p>
<pre><code class="lang-bash"><span class="hljs-comment"># For better performance, use these mount options:</span>
sudo s3fs my-immich-photos /opt/immich/library/upload \
  -o allow_other,nonempty \
  -o use_cache=/var/cache/s3fs \
  -o max_stat_cache_size=100000 \
  -o stat_cache_expire=60 \
  -o multireq_max=5 \
  -o parallel_count=30
</code></pre>
<p><strong>AWS CLI Configuration:</strong></p>
<pre><code class="lang-bash"><span class="hljs-comment"># Optimize AWS CLI for better S3 performance</span>
aws configure <span class="hljs-built_in">set</span> default.s3.max_concurrent_requests 20
aws configure <span class="hljs-built_in">set</span> default.s3.multipart_threshold 64MB
aws configure <span class="hljs-built_in">set</span> default.s3.multipart_chunksize 16MB
</code></pre>
<h2 id="heading-cost-comparison-why-this-setup-makes-sense">Cost Comparison: Why This Setup Makes Sense</h2>
<p>Let me break down the costs to show why this approach is better:</p>
<h3 id="heading-hetzner-cloud-storage-upgrade-costs">Hetzner Cloud Storage Upgrade Costs:</h3>
<ul>
<li><strong>Base $5/month</strong>: 20GB storage</li>
<li><strong>$10/month</strong>: 40GB storage (+$5 for 20GB = $0.25/GB/month)</li>
<li><strong>$20/month</strong>: 80GB storage (+$15 for 60GB = $0.25/GB/month)</li>
</ul>
<h3 id="heading-aws-ec2-ebs-storage">AWS EC2 + EBS Storage:</h3>
<ul>
<li><strong>t3.micro</strong>: $8.5/month + $10/month for 100GB EBS = $18.5/month</li>
<li><strong>t3.small</strong>: $17/month + $10/month for 100GB EBS = $27/month</li>
</ul>
<h3 id="heading-aws-lightsail">AWS Lightsail:</h3>
<ul>
<li><strong>$5/month</strong>: 20GB storage</li>
<li><strong>$10/month</strong>: 40GB storage</li>
<li><strong>$20/month</strong>: 80GB storage</li>
</ul>
<h3 id="heading-s3-storage-eu-north-1">S3 Storage (eu-north-1):</h3>
<ul>
<li><strong>S3 Standard</strong>: $0.023/GB/month</li>
<li><strong>100GB</strong>: ~$2.30/month</li>
<li><strong>500GB</strong>: ~$11.50/month</li>
<li><strong>1TB</strong>: ~$23/month</li>
</ul>
<h3 id="heading-the-winner-hetzner-s3">The Winner: Hetzner + S3</h3>
<p><strong>Recommended Setup:</strong></p>
<ul>
<li><strong>Hetzner Cloud $5/month</strong>: Cheapest compute option</li>
<li><strong>S3 Storage</strong>: Pay only for what you use</li>
<li><strong>Total for 100GB photos</strong>: $5 + $2.30 = $7.30/month</li>
</ul>
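<p>As a quick sanity check on the arithmetic above, here is a tiny cost helper using the listed prices ($0.023/GB/month for S3 Standard plus the $5/month Hetzner instance); the helper function is just for illustration:</p>
<pre><code class="lang-javascript">const S3_PER_GB = 0.023; // S3 Standard price, $/GB/month
const HETZNER = 5;       // base Hetzner Cloud instance, $/month

function monthlyCost(gb) {
  // storage cost plus compute, rounded to whole cents
  return Number((HETZNER + gb * S3_PER_GB).toFixed(2));
}

console.log(monthlyCost(100)); // 7.3  ($5 compute + $2.30 storage)
</code></pre>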
<h2 id="heading-conclusion">Conclusion</h2>
<p>And that's the whole process: Immich is now running with S3 as its storage backend, which is exactly what I wanted.</p>
<p>The other big advantage is portability. With S3 as the single source of truth, changing or migrating the machine becomes easy: point the new server at the same bucket and the library comes along.</p>
<p>And, of course, this setup costs less.</p>
<p>That's it. Thanks for reading!</p>
<h2 id="heading-key-takeaways">Key Takeaways</h2>
<ul>
<li><strong>Problem Solved</strong>: Unlimited photo storage without expensive server upgrades</li>
<li><strong>Cost Effective</strong>: Hetzner ($5) + S3 storage cheaper than alternatives</li>
<li><strong>Portable</strong>: S3 as single source of truth makes migration easy</li>
<li><strong>Reliable</strong>: Enterprise-grade storage with automatic backups</li>
<li><strong>Scalable</strong>: Pay only for what you use, grow as needed</li>
</ul>
<p>Your photos are now safely stored in the cloud while maintaining the familiar Immich experience, all on a budget!</p>
]]></content:encoded></item><item><title><![CDATA[I Built ChatGPT for MongoDB in 5 Days (And Open-Sourced It)]]></title><description><![CDATA["Show me all React developers who interviewed last month with evaluation scores above 85."
At our recruitment SaaS, questions like this were becoming a daily headache. Every time someone needed insights from our MongoDB database, it meant an engineer...]]></description><link>https://blog.salmankhan.pro/i-built-chatgpt-for-mongodb-in-5-days-and-open-sourced-it</link><guid isPermaLink="true">https://blog.salmankhan.pro/i-built-chatgpt-for-mongodb-in-5-days-and-open-sourced-it</guid><category><![CDATA[MongoDB, LangChain, AI, Natural Language Processing, Claude, Open Source, TypeScript, Database Tools, Recruitment SaaS, Dev Productivity]]></category><dc:creator><![CDATA[PATHAN SALMAN KHAN]]></dc:creator><pubDate>Sat, 13 Sep 2025 00:40:33 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1757499670507/749c6acc-727f-4a8d-a816-1280158c0f91.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>"Show me all React developers who interviewed last month with evaluation scores above 85."</p>
<p>At our recruitment SaaS, questions like this were becoming a daily headache. Every time someone needed insights from our MongoDB database, it meant an engineering ticket. We were getting 10+ of these requests daily.</p>
<p>So I spent 5 days building our own "ChatGPT for MongoDB" - a system that lets our team ask database questions in plain English and get instant answers.</p>
<p>It worked so well that I decided to open-source the whole thing.</p>
<h2 id="heading-the-problem-that-pushed-me-to-build-this">The Problem That Pushed Me to Build This</h2>
<p>I'm building a SaaS for recruitment. Our users manage thousands of job applications, interviews, and candidate profiles in MongoDB.</p>
<p>The constant requests kept coming:</p>
<ul>
<li>"Show me candidates with React experience from recent applications"</li>
<li>"Which interviews had the highest scores?"  </li>
<li>"Find applicants who might be good fits for this role"</li>
</ul>
<p>Each question meant our engineering team had to:</p>
<ol>
<li>Understand what they actually wanted</li>
<li>Write a custom MongoDB query</li>
<li>Deploy it and explain the results</li>
<li>Repeat for the next question</li>
</ol>
<p><strong>The numbers were brutal:</strong></p>
<ul>
<li>10+ query requests per day</li>
<li>1-2 hours of dev time each day</li>
<li>Product development was slowing down</li>
</ul>
<h2 id="heading-what-i-built-chatgpt-but-for-our-database">What I Built: ChatGPT, But for Our Database</h2>
<p>Instead of writing complex MongoDB queries like this:</p>
<pre><code class="lang-javascript">db.applicants.aggregate([
  { <span class="hljs-attr">$match</span>: { <span class="hljs-string">"skills"</span>: <span class="hljs-string">"React"</span>, <span class="hljs-string">"appliedDate"</span>: { <span class="hljs-attr">$gte</span>: lastMonth } } },
  { <span class="hljs-attr">$lookup</span>: { <span class="hljs-attr">from</span>: <span class="hljs-string">"interviews"</span>, <span class="hljs-attr">localField</span>: <span class="hljs-string">"_id"</span>, <span class="hljs-attr">foreignField</span>: <span class="hljs-string">"applicant"</span>, <span class="hljs-attr">as</span>: <span class="hljs-string">"interviews"</span> } },
  { <span class="hljs-attr">$match</span>: { <span class="hljs-string">"interviews.score"</span>: { <span class="hljs-attr">$gte</span>: <span class="hljs-number">85</span> } } }
])
</code></pre>
<p>Our team can now just ask:</p>
<pre><code><span class="hljs-string">"Show me React developers who interviewed last month with scores above 85"</span>
</code></pre><p>The system understands the question, plans the right queries, and responds conversationally with the data and insights.</p>
<h2 id="heading-the-technical-challenges-i-had-to-solve">The Technical Challenges I Had to Solve</h2>
<p>Building a reliable "ChatGPT for databases" meant solving several problems:</p>
<h3 id="heading-1-the-schema-problem">1. The Schema Problem</h3>
<p>MongoDB schemas change constantly. Hardcoded mappings break within days.</p>
<p><strong>My solution:</strong> Built dynamic schema introspection that reads our Mongoose models in real-time, so the AI always knows our current database structure.</p>
<h3 id="heading-2-complex-query-planning">2. Complex Query Planning</h3>
<p>Simple questions like "how many users?" are easy. But "show me React developers who aced their interviews" requires understanding relationships across multiple collections.</p>
<p><strong>My solution:</strong> Used LangGraph to build an AI agent that can reason through multi-step database operations, just like a human developer would.</p>
<h3 id="heading-3-ai-hallucination-prevention">3. AI Hallucination Prevention</h3>
<p>LLMs love making up field names and assuming data that doesn't exist in your actual database.</p>
<p><strong>My solution:</strong> Schema-first approach where the AI always checks the real database structure before building any queries.</p>
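<p>To make this concrete, here is a minimal sketch of such a guard; the field names and the <code>assertFieldsExist</code> helper are illustrative, not the actual implementation:</p>
<pre><code class="lang-javascript">// Fields the live schema actually contains (read at runtime in the real system)
const schemaFields = new Set(["skills", "appliedDate", "resumeAnalysis.score"]);

function assertFieldsExist(queryFields) {
  // Reject any field the schema does not contain, instead of
  // letting the model query a hallucinated one.
  const unknown = queryFields.filter(function (f) {
    return !schemaFields.has(f);
  });
  if (unknown.length) {
    throw new Error("Unknown fields: " + unknown.join(", "));
  }
}

assertFieldsExist(["skills", "appliedDate"]); // passes silently
</code></pre>
<p>A failed check can then trigger a schema re-read and a retry rather than surfacing an error to the user.</p>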
<h3 id="heading-4-conversation-flow">4. Conversation Flow</h3>
<p>Real usage isn't one-off questions. It's "show me top candidates" followed by "now show me their interview scores."</p>
<p><strong>My solution:</strong> Redis-backed memory so the AI remembers context across the entire conversation.</p>
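<p>Conceptually, the memory layer just appends each turn to a per-session history. A toy sketch of the idea, with a plain <code>Map</code> standing in for Redis so it runs standalone:</p>
<pre><code class="lang-javascript">const sessions = new Map();

function remember(sessionId, role, text) {
  // append one conversation turn to this session's history
  if (!sessions.has(sessionId)) sessions.set(sessionId, []);
  sessions.get(sessionId).push({ role: role, text: text });
}

function history(sessionId) {
  return sessions.get(sessionId) || [];
}

remember("s1", "user", "show me top candidates");
remember("s1", "user", "now show me their interview scores");
console.log(history("s1").length); // 2
</code></pre>
<p>On each new question, the agent loads this history so follow-ups like "their interview scores" resolve against the earlier answer.</p>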
<h2 id="heading-the-architecture-how-it-actually-works">The Architecture: How It Actually Works</h2>
<pre><code>Natural Language Question
        ↓
🤖 AI Agent (Claude <span class="hljs-number">3.5</span>)
        ↓
📋 Dynamic Schema Reader (checks current Mongoose models)
        ↓
🔧 MongoDB Query Tools (find/aggregate/count)
        ↓
💾 Redis Memory (maintains conversation context)
        ↓
📄 Smart Response (data + insights + explanations)
</code></pre><h3 id="heading-the-smart-schema-system">The Smart Schema System</h3>
<p>Every query starts here:</p>
<pre><code class="lang-javascript"><span class="hljs-comment">// Reads our actual Mongoose models at runtime</span>
<span class="hljs-keyword">const</span> schema = extractCompleteMongooseSchema(ApplicantModel);
<span class="hljs-comment">// Result: "skills: Array&lt;String&gt;, resumeAnalysis.score: Number(0-100)"</span>
</code></pre>
<p>The AI knows exactly what fields exist, their types, and their constraints.</p>
<h3 id="heading-ai-agent-reasoning">AI Agent Reasoning</h3>
<p>Watch how the AI thinks through a complex question:</p>
<pre><code>User: <span class="hljs-string">"Show me our top candidates"</span>
AI: 🔍 Let me check the applicants schema first...
AI: 💭 Found <span class="hljs-string">'compatibilityScore'</span> field (0-100), I'll sort by that
AI: ⚡ Running query: find({}, {sort: {compatibilityScore: -1}, limit: 10})
AI: 💬 <span class="hljs-string">"Here are your top 10 candidates, mostly senior developers..."</span>
</code></pre><h2 id="heading-what-i-learned-building-this">What I Learned Building This</h2>
<p><strong>Claude beats GPT for database work:</strong></p>
<ul>
<li>Claude 3.5 is much better at complex reasoning</li>
<li>GPT-4 makes up field names more often</li>
<li>Switching to Claude cut our error rate significantly</li>
</ul>
<p><strong>Token optimization matters:</strong></p>
<ul>
<li>Started by sending all schemas with every query (expensive!)</li>
<li>Now the AI only asks for schemas it actually needs</li>
<li>Cut token costs by 40%</li>
</ul>
<p><strong>Error handling is crucial:</strong></p>
<ul>
<li>Complex aggregations sometimes fail</li>
<li>Built smart fallbacks: try aggregation → fallback to simple find()</li>
<li>If a field doesn't exist, re-check schema and retry</li>
</ul>
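<p>The fallback chain above can be sketched like this; <code>runAggregate</code> and <code>runFind</code> are hypothetical stand-ins for the real MongoDB tool calls:</p>
<pre><code class="lang-javascript">async function queryWithFallback(runAggregate, runFind) {
  try {
    // complex pipelines fail more often, so try them first...
    return await runAggregate();
  } catch (err) {
    // ...and degrade to a plain find() when they do
    return await runFind();
  }
}

queryWithFallback(
  async function () { throw new Error("bad $lookup"); },
  async function () { return [{ note: "fallback result" }]; }
).then(function (rows) { console.log(rows.length); }); // logs 1
</code></pre>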
<p><strong>Conversation memory changes everything:</strong></p>
<ul>
<li>Users ask 3-4 follow-up questions on average</li>
<li>"Now show me their interview scores" should just work</li>
<li>Redis sessions make it feel like talking to a person</li>
</ul>
<h2 id="heading-the-results-that-matter">The Results That Matter</h2>
<p>After deploying our "ChatGPT for MongoDB":</p>
<p>✅ <strong>Engineering requests: 10/day → 0</strong><br />✅ <strong>Query response time: 2 hours → 30 seconds</strong><br />✅ <strong>Team productivity: Significantly improved</strong><br />✅ <strong>New insights: Users ask questions they never asked before</strong></p>
<p>More importantly, our non-technical team members became confident exploring data themselves.</p>
<h2 id="heading-open-source-try-your-own-chatgpt-for-mongodb">Open Source: Try Your Own ChatGPT for MongoDB</h2>
<p>This solved our recruitment SaaS problem, but I realized every company with MongoDB probably faces similar challenges.</p>
<p>So I've open-sourced the complete system:</p>
<p><strong>🔗 GitHub:</strong>  <a target="_blank" href="https://github.com/salmankhan-prs/mongodb-nl-query-demo">mongodb-nl-query-demo</a></p>
<p><strong>What's included:</strong></p>
<ul>
<li>Full working system with demo e-commerce data</li>
<li>AI agent setup and prompt engineering</li>
<li>Dynamic schema introspection code</li>
<li>Redis conversation memory</li>
<li>Complete adaptation guide for your database</li>
</ul>
<p><strong>Built for real use:</strong></p>
<ul>
<li>TypeScript throughout for reliability</li>
<li>Production error handling and recovery</li>
<li>Rate limiting and security considerations</li>
<li>Token optimization for cost control</li>
</ul>
<h2 id="heading-quick-start-get-your-own-running">Quick Start: Get Your Own Running</h2>
<pre><code class="lang-bash"><span class="hljs-comment"># Clone and set up</span>
git <span class="hljs-built_in">clone</span> https://github.com/salmankhan-prs/mongodb-nl-query-demo
<span class="hljs-built_in">cd</span> mongodb-nl-query-demo
pnpm install

<span class="hljs-comment"># Add your API keys</span>
cp .env.example .env
<span class="hljs-comment"># Edit .env with MongoDB URI and Anthropic API key</span>

<span class="hljs-comment"># Load demo data and start</span>
pnpm seed
pnpm start:dev

<span class="hljs-comment"># Test it out</span>
curl -X POST http://localhost:3000/api/query \
  -H <span class="hljs-string">"Content-Type: application/json"</span> \
  -d <span class="hljs-string">'{"query": "Show me all users from USA"}'</span>
</code></pre>
<p>Try questions like:</p>
<ul>
<li>"How many products do we have in each category?"</li>
<li>"Show me customers who spent the most money"</li>
<li>"Which orders were delivered successfully?"</li>
</ul>
<h2 id="heading-adapt-it-to-your-database">Adapt It to Your Database</h2>
<p>The magic is in the dynamic schema reading. To use your own data:</p>
<ol>
<li><strong>Replace the models</strong> in <code>src/models/</code> with your Mongoose schemas</li>
<li><strong>Update collection names</strong> in <code>src/types/index.ts</code></li>
<li><strong>Run the schema generator</strong>: <code>pnpm generate:schemas</code></li>
<li><strong>Start asking questions</strong> about your actual data</li>
</ol>
<p>The system automatically discovers your field types, relationships, constraints, and enum values.</p>
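<p>The discovery step boils down to walking the schema's paths and emitting a compact description the model can read. A stand-alone sketch of the idea, mimicking the shape of Mongoose's <code>Model.schema.paths</code> (the field names are hypothetical):</p>
<pre><code class="lang-javascript">const paths = {
  "skills": { instance: "Array" },
  "resumeAnalysis.score": { instance: "Number", options: { min: 0, max: 100 } },
};

function describeSchema(paths) {
  // emit e.g. "skills: Array, resumeAnalysis.score: Number(0-100)"
  return Object.keys(paths).map(function (name) {
    const p = paths[name];
    const range =
      p.options && p.options.min !== undefined
        ? "(" + p.options.min + "-" + p.options.max + ")"
        : "";
    return name + ": " + p.instance + range;
  }).join(", ");
}

console.log(describeSchema(paths));
// skills: Array, resumeAnalysis.score: Number(0-100)
</code></pre>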
<h2 id="heading-why-this-actually-matters">Why This Actually Matters</h2>
<p>When anyone on your team can ask database questions directly:</p>
<ul>
<li><strong>Decisions happen faster</strong> (no engineering bottlenecks)</li>
<li><strong>More insights get discovered</strong> (easier to explore data)</li>
<li><strong>Engineering focuses on features</strong> (not custom queries)</li>
<li><strong>Data becomes accessible</strong> (non-technical users gain confidence)</li>
</ul>
<h2 id="heading-the-tech-stack-that-worked">The Tech Stack That Worked</h2>
<ul>
<li><strong>AI Model:</strong> Claude 3.5 Sonnet (superior reasoning for databases)</li>
<li><strong>Agent Framework:</strong> LangChain + LangGraph</li>
<li><strong>Memory:</strong> Redis for fast session storage</li>
<li><strong>Backend:</strong> Express + TypeScript</li>
<li><strong>Database:</strong> MongoDB + Mongoose (enables dynamic introspection)</li>
</ul>
<h2 id="heading-whats-next">What's Next</h2>
<p>I'm excited to see what people build with this. Some ideas for extensions:</p>
<ul>
<li><strong>Write operations</strong> (INSERT, UPDATE, DELETE with safety checks)</li>
<li><strong>Web interface</strong> (React app for non-technical users)</li>
<li><strong>Advanced analytics</strong> (trend analysis, predictive insights)</li>
</ul>
<h2 id="heading-try-it-out">Try It Out</h2>
<p>This represents 5 days of focused work solving a real problem we faced every day. If you're dealing with similar database query bottlenecks, maybe it'll help you too.</p>
<p><strong>GitHub:</strong> <a target="_blank" href="https://github.com/salmankhan-prs/mongodb-nl-query-demo">mongodb-nl-query-demo</a></p>
<p>Questions? Reach out on <a target="_blank" href="https://x.com/salmankhanprs">Twitter</a> or <a target="_blank" href="https://www.linkedin.com/in/salman-khan-tech">LinkedIn</a>. I'd love to hear what you build with it.</p>
<p><em>Built this because we needed it. Sharing it because you might too.</em></p>
]]></content:encoded></item><item><title><![CDATA[How to scrape webpages using JavaScript]]></title><description><![CDATA[In this blog, I am going to use Puppeteer to scrape Wikipedia pages

Puppeteer is a Node library that provides a high-level API to control Chrome or Chromium over the DevTools Protocol, and it can run headless

we are going to achieve this in two steps ...]]></description><link>https://blog.salmankhan.pro/how-to-scrap-webpages-using-javascript</link><guid isPermaLink="true">https://blog.salmankhan.pro/how-to-scrap-webpages-using-javascript</guid><category><![CDATA[JavaScript]]></category><category><![CDATA[Node.js]]></category><dc:creator><![CDATA[PATHAN SALMAN KHAN]]></dc:creator><pubDate>Fri, 21 Jan 2022 09:33:53 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1642757200265/B3Cc9ghhe.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>In this blog, I am going to use a  <a target="_blank" href="https://www.npmjs.com/package/puppeteer">puppeteer</a>  to scrap Wikipedia pages</p>
<blockquote>
<p>Puppeteer is a Node library that provides a high-level API to control Chrome or Chromium over the DevTools Protocol, and it can run headless.</p>
</blockquote>
<p><strong>We are going to achieve this in two steps.</strong></p>
<p>In the first step, we scrape the Wikipedia list page [https://en.wikipedia.org/wiki/List_of_programming_languages] to get all the language names and the URLs of their individual Wikipedia pages [https://en.wikipedia.org/wiki/:programminglanguageName], and store them in an array of objects, each holding a name and a URL.</p>
<pre><code>const puppeteer <span class="hljs-operator">=</span> <span class="hljs-built_in">require</span>(<span class="hljs-string">"puppeteer"</span>);<span class="hljs-comment">//headless chrome </span>

<span class="hljs-keyword">var</span> fs <span class="hljs-operator">=</span> <span class="hljs-built_in">require</span>(<span class="hljs-string">"fs"</span>);
let List_of_programming_languages <span class="hljs-operator">=</span> [];<span class="hljs-comment">// to store the names and URLs of the programming languages </span>
(async () <span class="hljs-operator">=</span><span class="hljs-operator">&gt;</span> {
  const browser <span class="hljs-operator">=</span> await puppeteer.launch();

  const page <span class="hljs-operator">=</span> await browser.newPage();
  await page.goto(
    <span class="hljs-string">"https://en.wikipedia.org/wiki/List_of_programming_languages"</span>
  );
  await page.waitForTimeout(<span class="hljs-number">2000</span>);<span class="hljs-comment">// wait 2 seconds for the page to load </span>

  const getLanguageUrl <span class="hljs-operator">=</span> await page.evaluate(() <span class="hljs-operator">=</span><span class="hljs-operator">&gt;</span> {
    let urls <span class="hljs-operator">=</span> document.querySelectorAll(<span class="hljs-string">".div-col ul li a"</span>);<span class="hljs-comment">//the class name for extracting  anchor tag  </span>

    const urlList <span class="hljs-operator">=</span> [...urls];

    console.log(urlList);
    const extractedUrls <span class="hljs-operator">=</span> urlList.map((u) <span class="hljs-operator">=</span><span class="hljs-operator">&gt;</span> u.getAttribute(<span class="hljs-string">"href"</span>));<span class="hljs-comment">//extracting only links from anchor tags </span>

    <span class="hljs-keyword">return</span> extractedUrls;
  });

  const getLanguageName <span class="hljs-operator">=</span> await page.evaluate(() <span class="hljs-operator">=</span><span class="hljs-operator">&gt;</span> {
    let headingFromWeb <span class="hljs-operator">=</span> document.querySelectorAll(<span class="hljs-string">".div-col"</span>);<span class="hljs-comment">//the class name for extracting  div </span>


    const languageNameList <span class="hljs-operator">=</span> [...headingFromWeb];

    const extractedLanguageNameList <span class="hljs-operator">=</span> languageNameList.map((h) <span class="hljs-operator">=</span><span class="hljs-operator">&gt;</span> h.innerText);<span class="hljs-comment">//extracting only the inner text from each div </span>

    <span class="hljs-keyword">return</span> extractedLanguageNameList;
  });

  const allLanguagesUrls <span class="hljs-operator">=</span> [...getLanguageUrl];
  const allLanguagesNames <span class="hljs-operator">=</span> [...getLanguageName];
  const nameDataSplit <span class="hljs-operator">=</span> allLanguagesNames.join(<span class="hljs-string">""</span>).split(<span class="hljs-string">"\n"</span>);
<span class="hljs-comment">//saving the urls and name as single object</span>
  List_of_programming_languages <span class="hljs-operator">=</span> nameDataSplit.map((name, i) <span class="hljs-operator">=</span><span class="hljs-operator">&gt;</span> {
    <span class="hljs-keyword">return</span> {
      name,
      url: allLanguagesUrls[i],
    };
  });
</code></pre><p>In the second step, using the array of URLs from above, we visit each language's Wikipedia page and pull the full data from the infobox table (the element with class <code>.infobox</code>): fields like Paradigm, Designed by, Developer, First appeared, and Website, plus the image. We also take the first four paragraphs as a description, build an object for each language, push it into an array, and save the array to a file.</p>
<pre><code>const getEachLanguageDetails <span class="hljs-operator">=</span> async () <span class="hljs-operator">=</span><span class="hljs-operator">&gt;</span> {
    const browser <span class="hljs-operator">=</span> await puppeteer.launch();
    const page <span class="hljs-operator">=</span> await browser.newPage();

    let allLanguagesDetails <span class="hljs-operator">=</span> [];
    <span class="hljs-keyword">for</span> (<span class="hljs-keyword">let</span> i <span class="hljs-operator">=</span> <span class="hljs-number">0</span>; i <span class="hljs-operator">&lt;</span> List_of_programming_languages.<span class="hljs-built_in">length</span>; i<span class="hljs-operator">+</span><span class="hljs-operator">+</span>) {
      <span class="hljs-keyword">try</span> {
        await page.goto(
          <span class="hljs-string">"https://en.wikipedia.org"</span> <span class="hljs-operator">+</span> List_of_programming_languages[i].url
        );
        await page.waitForTimeout(<span class="hljs-number">1000</span>);

        const name <span class="hljs-operator">=</span> await page.evaluate(() <span class="hljs-operator">=</span><span class="hljs-operator">&gt;</span> {
          <span class="hljs-keyword">return</span> document.querySelector(<span class="hljs-string">".infobox-title"</span>)
            ? document.querySelector(<span class="hljs-string">".infobox-title"</span>).innerText
            : undefined;
        });

        const img_result <span class="hljs-operator">=</span> await page.evaluate(() <span class="hljs-operator">=</span><span class="hljs-operator">&gt;</span> {
          <span class="hljs-keyword">return</span> document.querySelector(<span class="hljs-string">".infobox-image a img "</span>)
            ? document
                .querySelector(<span class="hljs-string">".infobox-image a img "</span>)
                .getAttribute(<span class="hljs-string">"src"</span>)
            : undefined;
        });

        const getTableLabels <span class="hljs-operator">=</span> await page.evaluate(() <span class="hljs-operator">=</span><span class="hljs-operator">&gt;</span> {
          let labelsdata <span class="hljs-operator">=</span> document.querySelectorAll(
            <span class="hljs-string">".infobox  .infobox-label"</span>
          );

          const labelList <span class="hljs-operator">=</span> [...labelsdata];

          const extractedLabelList <span class="hljs-operator">=</span> labelList.map((h) <span class="hljs-operator">=</span><span class="hljs-operator">&gt;</span> h.innerText);

          <span class="hljs-keyword">return</span> extractedLabelList;
        });
        const getTableData <span class="hljs-operator">=</span> await page.evaluate(() <span class="hljs-operator">=</span><span class="hljs-operator">&gt;</span> {
          let tableData <span class="hljs-operator">=</span> document.querySelectorAll(<span class="hljs-string">".infobox  .infobox-data"</span>);

          const tableDataList <span class="hljs-operator">=</span> [...tableData];

          const extractedTableDataList <span class="hljs-operator">=</span> tableDataList.map((h) <span class="hljs-operator">=</span><span class="hljs-operator">&gt;</span> h.innerText);

          <span class="hljs-keyword">return</span> extractedTableDataList;
        });

        const p1 <span class="hljs-operator">=</span> await page.evaluate(
          () <span class="hljs-operator">=</span><span class="hljs-operator">&gt;</span> document.querySelectorAll(<span class="hljs-string">"p"</span>)[<span class="hljs-number">1</span>].innerText
        );
        const p2 <span class="hljs-operator">=</span> await page.evaluate(
          () <span class="hljs-operator">=</span><span class="hljs-operator">&gt;</span> document.querySelectorAll(<span class="hljs-string">"p"</span>)[<span class="hljs-number">2</span>].innerText
        );
        const p3 <span class="hljs-operator">=</span> await page.evaluate(
          () <span class="hljs-operator">=</span><span class="hljs-operator">&gt;</span> document.querySelectorAll(<span class="hljs-string">"p"</span>)[<span class="hljs-number">3</span>].innerText
        );
        const p4 <span class="hljs-operator">=</span> await page.evaluate(
          () <span class="hljs-operator">=</span><span class="hljs-operator">&gt;</span> document.querySelectorAll(<span class="hljs-string">"p"</span>)[<span class="hljs-number">4</span>].innerText
        );

        let eachLanguageDescription <span class="hljs-operator">=</span> {
          name: name ? name : List_of_programming_languages[i].<span class="hljs-built_in">name</span>,
          image: img_result ? <span class="hljs-string">"https:"</span> <span class="hljs-operator">+</span> img_result : undefined,
          description: p1 <span class="hljs-operator">+</span> <span class="hljs-string">" "</span> <span class="hljs-operator">+</span> p2 <span class="hljs-operator">+</span> <span class="hljs-string">" "</span> <span class="hljs-operator">+</span> p3 <span class="hljs-operator">+</span> <span class="hljs-string">" "</span> <span class="hljs-operator">+</span> p4,
        };
        getTableLabels.forEach((e, i) <span class="hljs-operator">=</span><span class="hljs-operator">&gt;</span> {
          eachLanguageDescription[e] <span class="hljs-operator">=</span> getTableData[i];
        });

        allLanguagesDetails.<span class="hljs-built_in">push</span>(eachLanguageDescription);
      } <span class="hljs-keyword">catch</span> (e) {
        console.log(e);
      }
    }
    fs.writeFile(
      <span class="hljs-string">"languages.txt"</span>,
      JSON.stringify(allLanguagesDetails),
      (err) <span class="hljs-operator">=</span><span class="hljs-operator">&gt;</span> {
        <span class="hljs-keyword">if</span> (err) <span class="hljs-keyword">throw</span> err;
        console.log(<span class="hljs-string">'Saved all language details to languages.txt!'</span>);
      }
    );
    await browser.close();<span class="hljs-comment">// close the browser launched for the detail pages</span>
  };

  await getEachLanguageDetails();
  await browser.close();<span class="hljs-comment">// close the first browser used for the list page</span>
})();
</code></pre><p>The final code is available at <a target="_blank" href="https://github.com/salmankhan-prs/wikipedia-scraping-puppeteer">wikipedia-scraping-puppeteer</a>.</p>
]]></content:encoded></item></channel></rss>