<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0"><channel><title><![CDATA[PATHAN SALMAN KHAN's Blog]]></title><description><![CDATA[PATHAN SALMAN KHAN's Blog]]></description><link>https://blog.salmankhan.pro</link><generator>RSS for Node</generator><lastBuildDate>Wed, 22 Apr 2026 17:39:29 GMT</lastBuildDate><atom:link href="https://blog.salmankhan.pro/rss.xml" rel="self" type="application/rss+xml"/><language><![CDATA[en]]></language><ttl>60</ttl><item><title><![CDATA[From Closet to Cloud: How I Made My Raspberry Pi 5 Accessible from Anywhere]]></title><description><![CDATA[The Weekend Project That Actually Worked
I had a Raspberry Pi 5 sitting on my desk, and I wanted to host some side projects on it. Simple enough, right?
Wrong. The moment you try to make your home device accessible from the internet, you hit a wall o...]]></description><link>https://blog.salmankhan.pro/from-closet-to-cloud-how-i-made-my-raspberry-pi-5-accessible-from-anywhere</link><guid isPermaLink="true">https://blog.salmankhan.pro/from-closet-to-cloud-how-i-made-my-raspberry-pi-5-accessible-from-anywhere</guid><category><![CDATA[Raspberry Pi]]></category><category><![CDATA[cloudflare]]></category><dc:creator><![CDATA[PATHAN SALMAN KHAN]]></dc:creator><pubDate>Fri, 03 Oct 2025 13:41:32 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1759497767964/844aca1d-9cc4-47f1-8637-c76e7dfb5255.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<h2 id="heading-the-weekend-project-that-actually-worked"><strong>The Weekend Project That Actually Worked</strong></h2>
<p>I had a Raspberry Pi 5 sitting on my desk, and I wanted to host some side projects on it. Simple enough, right?</p>
<p>Wrong. The moment you try to make your home device accessible from the internet, you hit a wall of networking complexity.</p>
<p>Your Pi sits behind your home router. Your router sits behind your ISP. Your ISP probably puts you behind something called CGNAT. It's like being in a room, inside a building, inside a walled city.</p>
<p>So I spent a weekend figuring out how to punch through all these barriers. Turns out there are two reliable approaches that actually work.</p>
<h2 id="heading-the-two-paths-port-forwarding-vs-tunneling"><strong>The Two Paths: Port Forwarding vs Tunneling</strong></h2>
<p>After testing different approaches, there are two methods that reliably work:</p>
<p><strong>Approach 1: Port Forwarding</strong> (direct network access)<br />
<strong>Approach 2: Tunneling</strong> (bypass all the networking complexity)</p>
<p>Let me walk you through both, so you can pick what works for your setup.</p>
<h2 id="heading-approach-1-port-forwarding-when-your-isp-cooperates"><strong>Approach 1: Port Forwarding (When Your ISP Cooperates)</strong></h2>
<p>Port forwarding is the traditional way to expose local services to the internet.</p>
<p><strong>How it works:</strong>
Your router creates a direct pathway from the internet to your Pi. When someone visits <code>your-ip:8080</code>, your router forwards that request to your Pi's port 8080.</p>
<p><strong>When this works:</strong></p>
<ul>
<li>Your ISP gives you a real public IP address</li>
<li>Your router supports port forwarding configuration  </li>
<li>You're okay managing dynamic IP changes</li>
</ul>
<h3 id="heading-quick-setup-steps"><strong>Quick Setup Steps:</strong></h3>
<p><strong>Step 1: Check if you have a public IP</strong></p>
<pre><code class="lang-bash"><span class="hljs-comment"># On your Pi, check what the internet sees</span>
curl ifconfig.me

<span class="hljs-comment"># Compare with your router's WAN IP in admin panel</span>
<span class="hljs-comment"># If they match, you have a public IP!</span>
</code></pre>
<p><strong>Step 2: Configure your router</strong></p>
<ul>
<li>Access router admin (usually 192.168.1.1)</li>
<li>Find "Port Forwarding" or "Virtual Server" section</li>
<li>Add rule: External Port 8080 → Your Pi's IP → Internal Port 8080</li>
</ul>
<p><strong>Step 3: Test it</strong></p>
<pre><code class="lang-bash"><span class="hljs-comment"># Start a simple server on your Pi</span>
python3 -m http.server 8080

<span class="hljs-comment"># Test from outside your network</span>
curl http://YOUR-PUBLIC-IP:8080
</code></pre>
<p><strong>The Reality Check:</strong>
This used to work reliably, but many modern ISPs use something called CGNAT (Carrier-Grade NAT). This means multiple customers share the same public IP address, making port forwarding impossible.</p>
<p><strong>Signs you're behind CGNAT:</strong></p>
<ul>
<li>Router shows different IP than <code>curl ifconfig.me</code></li>
<li>Port forwarding rules don't work from external networks</li>
</ul>
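<p>Another telltale sign: RFC 6598 reserves the range <code>100.64.0.0/10</code> for carrier-grade NAT. Here's a small sketch to check whether an address (for example, your router's WAN IP from its admin page) falls in that range:</p>
<pre><code class="lang-bash"># RFC 6598 reserves 100.64.0.0/10 (100.64.x.x through 100.127.x.x)
# for carrier-grade NAT. Succeeds if the given IPv4 address is in it.
is_cgnat() {
  case "$1" in
    100.*) ;;
    *) return 1 ;;
  esac
  second=$(echo "$1" | cut -d. -f2)
  [ "$second" -ge 64 ] || return 1
  [ "$second" -le 127 ]
}

# Example: a typical CGNAT WAN address
if is_cgnat "100.72.15.9"; then
  echo "Behind CGNAT"   # prints "Behind CGNAT"
fi
</code></pre>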
<p>If port forwarding doesn't work for you, tunneling is the solution.</p>
<h2 id="heading-approach-2-tunneling-the-universal-solution"><strong>Approach 2: Tunneling (The Universal Solution)</strong></h2>
<h3 id="heading-what-is-tunneling"><strong>What is Tunneling?</strong></h3>
<p>Think of tunneling like this: instead of trying to punch a hole through all the firewalls to reach your Pi, your Pi creates an outbound connection to a server on the internet. When someone wants to access your Pi, they connect to that server, which forwards the request through the existing connection.</p>
<pre><code>Internet User → Tunnel Server → (Existing Connection) → Your Pi
</code></pre><p>The key insight: outbound connections (Pi → Internet) almost always work, even behind complex firewalls. Inbound connections (Internet → Pi) are what get blocked.</p>
<p><strong>Why tunneling works everywhere:</strong></p>
<ul>
<li>Your Pi initiates the connection (outbound = allowed)</li>
<li>No port forwarding needed</li>
<li>Works behind any firewall/NAT</li>
<li>Tunnel server handles the public internet part</li>
</ul>
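<p>If you already have any server with a public IP, you can see the same principle with a plain reverse SSH tunnel. This is just an illustration of the idea, not part of the setup below; <code>user</code> and <code>vps.example.com</code> are placeholders, and the VPS's sshd must allow <code>GatewayPorts</code> for outside visitors to reach the forwarded port:</p>
<pre><code class="lang-bash"># The Pi opens an OUTBOUND connection to the VPS and asks it to
# forward its port 8080 back through that connection to the Pi's
# local port 3000 (-N means no remote shell, forwarding only)
ssh -N -R 8080:localhost:3000 user@vps.example.com

# Anyone who can reach the VPS now reaches the Pi:
#   curl http://vps.example.com:8080
</code></pre>
<p>Tunnel services like Cloudflare Tunnel automate exactly this pattern, with a managed server on the public side.</p>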
<p>There are several good tunneling solutions:</p>
<p><strong>Cloudflare Tunnel (Free)</strong></p>
<ul>
<li>Best for: Custom domains, reliable service</li>
<li>Pros: Free, HTTPS automatic, very stable</li>
<li>Cons: Requires Cloudflare account</li>
</ul>
<p><strong>Tailscale Funnel (Free)</strong>  </p>
<ul>
<li>Best for: Quick testing, simple setup</li>
<li>Pros: Zero configuration, works instantly</li>
<li>Cons: URLs are auto-generated</li>
</ul>
<p><strong>ngrok (Free tier available)</strong></p>
<ul>
<li>Best for: Development, temporary sharing</li>
<li>Pros: Simple command, great for demos</li>
<li>Cons: Free tier has limitations</li>
</ul>
<p><strong>Pinggy (Free tier available)</strong></p>
<ul>
<li>Best for: Alternative to ngrok</li>
<li>Pros: No account needed for basic use</li>
<li>Cons: Limited free usage</li>
</ul>
<p>For this guide, I'll show you Cloudflare Tunnel because it's free, reliable, and works great for side projects.</p>
<h2 id="heading-setting-up-cloudflare-tunnel"><strong>Setting Up Cloudflare Tunnel</strong></h2>
<p>This works the same whether you're using a <strong>Raspberry Pi</strong> or an <strong>old laptop</strong>.</p>
<h3 id="heading-install-cloudflared"><strong>Install cloudflared</strong></h3>
<p><strong>For Raspberry Pi 5/4 (ARM64):</strong></p>
<pre><code class="lang-bash">wget https://github.com/cloudflare/cloudflared/releases/latest/download/cloudflared-linux-arm64
sudo mv cloudflared-linux-arm64 /usr/<span class="hljs-built_in">local</span>/bin/cloudflared
sudo chmod +x /usr/<span class="hljs-built_in">local</span>/bin/cloudflared
</code></pre>
<p><strong>For old laptops (x86_64):</strong></p>
<pre><code class="lang-bash">wget https://github.com/cloudflare/cloudflared/releases/latest/download/cloudflared-linux-amd64
sudo mv cloudflared-linux-amd64 /usr/<span class="hljs-built_in">local</span>/bin/cloudflared
sudo chmod +x /usr/<span class="hljs-built_in">local</span>/bin/cloudflared
</code></pre>
<p><strong>Verify installation:</strong></p>
<pre><code class="lang-bash">cloudflared --version
</code></pre>
<h2 id="heading-option-1-quick-anonymous-tunnel-no-domain-required"><strong>Option 1: Quick Anonymous Tunnel (No Domain Required)</strong></h2>
<p>The fastest way to get started - no authentication, no domain setup required!</p>
<h3 id="heading-start-your-local-service"><strong>Start Your Local Service</strong></h3>
<pre><code class="lang-bash"><span class="hljs-comment"># Create a test page</span>
<span class="hljs-built_in">echo</span> <span class="hljs-string">"&lt;h1&gt;Hello from my Pi 5!&lt;/h1&gt;&lt;p&gt;This is accessible from anywhere!&lt;/p&gt;"</span> &gt; index.html

<span class="hljs-comment"># Start simple server</span>
python3 -m http.server 3000
</code></pre>
<h3 id="heading-create-instant-tunnel"><strong>Create Instant Tunnel</strong></h3>
<pre><code class="lang-bash"><span class="hljs-comment"># One command to expose your local server</span>
cloudflared tunnel --url http://localhost:3000
</code></pre>
<p><strong>That's it!</strong> You'll see output like:</p>
<pre><code>+--------------------------------------------------------------------------------------------+
|  Your quick Tunnel has been created! Visit it at (it may take some time to be reachable):  |
|  https:<span class="hljs-comment">//random-words-123.trycloudflare.com                                                |</span>
+--------------------------------------------------------------------------------------------+
</code></pre><p>Your local server is now accessible from anywhere with <strong>automatic HTTPS</strong>! Share the URL with anyone - they can access your Pi from anywhere in the world.</p>
<p><strong>Perfect for:</strong></p>
<ul>
<li>Quick demos and testing</li>
<li>Sharing work-in-progress with friends</li>
<li>Development and prototyping</li>
<li>When you don't have a domain</li>
</ul>
<h2 id="heading-option-2-custom-domain-setup-for-your-own-domain"><strong>Option 2: Custom Domain Setup (For Your Own Domain)</strong></h2>
<p>If you have your own domain managed by Cloudflare, you can create permanent tunnels with custom subdomains.</p>
<h3 id="heading-step-1-authenticate-with-cloudflare"><strong>Step 1: Authenticate with Cloudflare</strong></h3>
<pre><code class="lang-bash"><span class="hljs-comment"># This will open a browser for login</span>
cloudflared tunnel login
</code></pre>
<p><strong>If you're using SSH (headless setup):</strong>
The command will show you a URL. Copy it and open it on any device where you're logged into Cloudflare. The authentication will complete automatically on your Pi.</p>
<h3 id="heading-step-2-create-your-tunnel"><strong>Step 2: Create Your Tunnel</strong></h3>
<pre><code class="lang-bash"><span class="hljs-comment"># Create a named tunnel for your projects</span>
cloudflared tunnel create my-home-server

<span class="hljs-comment"># Note the tunnel ID that appears - save it!</span>
</code></pre>
<h3 id="heading-step-3-configure-dns-routing"><strong>Step 3: Configure DNS Routing</strong></h3>
<pre><code class="lang-bash"><span class="hljs-comment"># Connect your domain to the tunnel (replace with your domain)</span>
cloudflared tunnel route dns my-home-server projects.yourdomain.com
</code></pre>
<h3 id="heading-step-4-create-configuration"><strong>Step 4: Create Configuration</strong></h3>
<p>Create the tunnel configuration file:</p>
<pre><code class="lang-bash">nano ~/.cloudflared/config.yml
</code></pre>
<p>Add this configuration:</p>
<pre><code class="lang-yaml"><span class="hljs-attr">tunnel:</span> <span class="hljs-string">YOUR-TUNNEL-ID-HERE</span>
<span class="hljs-attr">credentials-file:</span> <span class="hljs-string">/home/pi/.cloudflared/YOUR-TUNNEL-ID.json</span>

<span class="hljs-attr">ingress:</span>
  <span class="hljs-bullet">-</span> <span class="hljs-attr">hostname:</span> <span class="hljs-string">projects.yourdomain.com</span>
    <span class="hljs-attr">service:</span> <span class="hljs-string">http://localhost:3000</span>
  <span class="hljs-bullet">-</span> <span class="hljs-attr">service:</span> <span class="hljs-string">http_status:404</span>
</code></pre>
<h3 id="heading-step-5-run-your-named-tunnel"><strong>Step 5: Run Your Named Tunnel</strong></h3>
<pre><code class="lang-bash">cloudflared tunnel run my-home-server
</code></pre>
<p>Now visit <code>https://projects.yourdomain.com</code> - you should see your Pi's webpage!</p>
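<p>You can also verify from the command line (<code>projects.yourdomain.com</code> is a placeholder for your own hostname):</p>
<pre><code class="lang-bash"># -I fetches only the response headers; a 200 status plus
# Cloudflare headers such as cf-ray confirm the request is
# being served through the tunnel
curl -I https://projects.yourdomain.com
</code></pre>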
<h3 id="heading-step-6-multiple-services"><strong>Step 6: Multiple Services</strong></h3>
<p>You can host multiple projects on different subdomains:</p>
<pre><code class="lang-yaml"><span class="hljs-attr">tunnel:</span> <span class="hljs-string">YOUR-TUNNEL-ID-HERE</span>
<span class="hljs-attr">credentials-file:</span> <span class="hljs-string">/home/pi/.cloudflared/YOUR-TUNNEL-ID.json</span>

<span class="hljs-attr">ingress:</span>
  <span class="hljs-bullet">-</span> <span class="hljs-attr">hostname:</span> <span class="hljs-string">api.yourdomain.com</span>
    <span class="hljs-attr">service:</span> <span class="hljs-string">http://localhost:3001</span>
  <span class="hljs-bullet">-</span> <span class="hljs-attr">hostname:</span> <span class="hljs-string">blog.yourdomain.com</span>
    <span class="hljs-attr">service:</span> <span class="hljs-string">http://localhost:3002</span>
  <span class="hljs-bullet">-</span> <span class="hljs-attr">hostname:</span> <span class="hljs-string">files.yourdomain.com</span>
    <span class="hljs-attr">service:</span> <span class="hljs-string">http://localhost:3003</span>
  <span class="hljs-bullet">-</span> <span class="hljs-attr">service:</span> <span class="hljs-string">http_status:404</span>
</code></pre>
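<p>Ingress rules are matched top to bottom, which is why the catch-all <code>http_status:404</code> rule must stay last. cloudflared can check the file before you run the tunnel:</p>
<pre><code class="lang-bash"># Validate the ingress section of ~/.cloudflared/config.yml
cloudflared tunnel ingress validate

# Show which ingress rule a given request would match
cloudflared tunnel ingress rule https://api.yourdomain.com
</code></pre>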
<h3 id="heading-step-7-auto-start-on-boot"><strong>Step 7: Auto-Start on Boot</strong></h3>
<p>Make your tunnel start automatically:</p>
<pre><code class="lang-bash"><span class="hljs-comment"># Install as system service</span>
sudo cloudflared --config ~/.cloudflared/config.yml service install

<span class="hljs-comment"># Enable auto-start</span>
sudo systemctl <span class="hljs-built_in">enable</span> cloudflared
sudo systemctl start cloudflared

<span class="hljs-comment"># Check status</span>
sudo systemctl status cloudflared
</code></pre>
<h2 id="heading-quick-start-for-beginners"><strong>Quick Start for Beginners</strong></h2>
<p>Want to try this right now? Here's the fastest way:</p>
<p><strong>1. Get your device ready</strong></p>
<pre><code class="lang-bash"><span class="hljs-comment"># Update everything (works on Pi or laptop)</span>
sudo apt update &amp;&amp; sudo apt upgrade -y

<span class="hljs-comment"># Install basics</span>
sudo apt install curl python3 -y
</code></pre>
<p><strong>2. Install cloudflared</strong></p>
<pre><code class="lang-bash"><span class="hljs-comment"># Choose the right version for your device</span>
<span class="hljs-comment"># ARM64 for Pi 5/4, amd64 for laptops</span>
wget https://github.com/cloudflare/cloudflared/releases/latest/download/cloudflared-linux-arm64
sudo mv cloudflared-linux-arm64 /usr/<span class="hljs-built_in">local</span>/bin/cloudflared
sudo chmod +x /usr/<span class="hljs-built_in">local</span>/bin/cloudflared
</code></pre>
<p><strong>3. Create something to host</strong></p>
<pre><code class="lang-bash">mkdir ~/my-website &amp;&amp; <span class="hljs-built_in">cd</span> ~/my-website
<span class="hljs-built_in">echo</span> <span class="hljs-string">"&lt;h1&gt;My First Pi Website!&lt;/h1&gt;"</span> &gt; index.html
python3 -m http.server 8000
</code></pre>
<p><strong>4. Create instant tunnel</strong></p>
<pre><code class="lang-bash">cloudflared tunnel --url http://localhost:8000
</code></pre>
<p><strong>5. Share your creation</strong><br />
Your website is now accessible at the random URL from anywhere in the world!</p>
<h2 id="heading-the-bottom-line"><strong>The Bottom Line</strong></h2>
<p>A couple months ago, I wanted to host some side projects but didn't want to pay for cloud hosting or deal with complex networking. Today, I'm running multiple web services from my Pi 5 that are accessible from anywhere.</p>
<p>The tunneling approach solved every networking challenge. No complex router configuration, no ISP limitations, no monthly hosting fees. Traditional hosting costs $5-10/month, while this setup costs basically nothing after the initial Pi purchase ($80 one-time).</p>
<p><strong>Key takeaways:</strong></p>
<ul>
<li>Modern networking makes direct connections difficult</li>
<li>Tunneling bypasses all these limitations reliably  </li>
<li>Works on Pi or any old laptop</li>
<li>Start with anonymous tunnels (one command!)</li>
<li>Upgrade to custom domains when you're ready</li>
<li>Perfect for side projects and learning</li>
<li>Costs basically nothing after initial hardware</li>
</ul>
<p><strong>When it makes sense:</strong></p>
<ul>
<li>Personal projects and experiments</li>
<li>Learning web development</li>
<li>Sharing projects with friends</li>
<li>Development and testing environments</li>
</ul>
<p>Give it a try! Worst case, you spend a weekend learning about networking and hosting. Best case, you never pay for basic web hosting again.</p>
<p>Questions? Hit me up on <a target="_blank" href="https://x.com/salmankhanprs">Twitter</a> or <a target="_blank" href="https://www.linkedin.com/in/salman-khan-tech">LinkedIn</a>. I'd love to see what you build!</p>
<hr />
]]></content:encoded></item><item><title><![CDATA[Running Immich with S3 Storage: A Complete Developer Guide]]></title><description><![CDATA[I was thinking for many days about hosting my own Google Photos alternative. I found many options, but Immich is very close to what Google Photos offers. I'm using a $5 Hetzner Cloud machine, but there's a problem.
What is Immich?
Immich is an open-s...]]></description><link>https://blog.salmankhan.pro/running-immich-with-s3-storage-a-complete-developer-guide</link><guid isPermaLink="true">https://blog.salmankhan.pro/running-immich-with-s3-storage-a-complete-developer-guide</guid><category><![CDATA[Immich]]></category><category><![CDATA[Docker]]></category><category><![CDATA[S3]]></category><category><![CDATA[s3fs]]></category><dc:creator><![CDATA[PATHAN SALMAN KHAN]]></dc:creator><pubDate>Fri, 19 Sep 2025 03:34:02 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1758028134957/be868546-5c2d-4079-bebb-5bdd556015dc.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>I was thinking for many days about hosting my own Google Photos alternative. I found many options, but Immich is very close to what Google Photos offers. I'm using a $5 Hetzner Cloud machine, but there's a problem.</p>
<h2 id="heading-what-is-immich">What is Immich?</h2>
<p>Immich is an open-source, self-hosted photo and video management solution that you can think of as your own personal Google Photos. </p>
<p><strong>Key features:</strong></p>
<ul>
<li>Upload photos from mobile and web</li>
<li>Automatic backup from your phone</li>
<li>Face recognition and search</li>
<li>Album creation and sharing</li>
<li>Timeline view of your memories</li>
<li>Machine learning for photo tagging</li>
</ul>
<p>The best part? You own your data completely. No monthly subscriptions, no privacy concerns, and no storage limits imposed by big companies.</p>
<h2 id="heading-what-is-s3fs">What is S3FS?</h2>
<p>S3FS-FUSE is a clever tool that makes cloud storage (like Amazon S3) appear as a regular folder on your server.</p>
<p><strong>How it works:</strong></p>
<ul>
<li>Your server sees a normal folder: <code>/opt/immich/library/upload</code></li>
<li>But this folder is actually connected to your S3 bucket in the cloud</li>
<li>When Immich writes a photo to this "local" folder, it actually gets stored in S3</li>
<li>When Immich reads a photo, s3fs fetches it from S3 transparently</li>
</ul>
<p>Think of it as a bridge between your server and cloud storage. Your applications don't know the difference - they just see a regular folder, but everything is actually stored in the cloud.</p>
<h2 id="heading-the-storage-problem">The Storage Problem</h2>
<p>Why do we need S3 storage? First, let me explain the problem.</p>
<p>On a $5 machine or any fixed-price server, you get limited disk space - maybe 20GB, 40GB, or 80GB. To attach more space, I'd have to pay extra, and that gets costly fast.</p>
<p>So I thought: why not use S3 storage?</p>
<p>Using S3 as storage has a drawback - it adds latency. But I mostly use Immich for backup, with far more writes than reads, so that trade-off was fine for me.</p>
<h2 id="heading-why-use-s3-with-immich">Why Use S3 with Immich?</h2>
<p><strong>Benefits of S3 Storage:</strong></p>
<ul>
<li><strong>Unlimited Capacity</strong>: No more worrying about disk space</li>
<li><strong>Cost Effective</strong>: Pay only for what you store</li>
<li><strong>Portability</strong>: Deploy Immich anywhere while keeping the same storage</li>
</ul>
<p><strong>Trade-offs to Consider:</strong></p>
<ul>
<li><strong>Latency</strong>: Slight delay when accessing photos (typically 100-500ms)</li>
</ul>
<h2 id="heading-architecture-overview">Architecture Overview</h2>
<p>Our setup uses <strong>s3fs-fuse</strong> (<a target="_blank" href="https://github.com/s3fs-fuse/s3fs-fuse">you can read more here</a>) to mount an S3 bucket as a local filesystem. This allows Immich to read/write files as if they were stored locally, while actually storing everything in S3.</p>
<pre><code>┌─────────────┐    ┌─────────────┐    ┌─────────────┐
│   Immich    │◄──►│   s3fs      │◄──►│  S3 Bucket  │
│ Application │    │   Mount     │    │   Storage   │
└─────────────┘    └─────────────┘    └─────────────┘
</code></pre><h2 id="heading-prerequisites">Prerequisites</h2>
<ul>
<li>AWS Account with S3 access</li>
<li>Docker and Docker Compose installed</li>
<li>Basic Linux command line knowledge</li>
<li>Server with internet connectivity</li>
</ul>
<h2 id="heading-step-1-create-s3-bucket">Step 1: Create S3 Bucket</h2>
<p>First, create an S3 bucket for your photos:</p>
<pre><code class="lang-bash"><span class="hljs-comment"># Configure AWS CLI</span>
aws configure

<span class="hljs-comment"># Create your bucket (replace with your preferred name and region)</span>
aws s3 mb s3://my-immich-photos --region us-east-1
</code></pre>
<p><strong>Optional: Enable Transfer Acceleration for faster uploads</strong></p>
<pre><code class="lang-bash">aws s3api put-bucket-accelerate-configuration \
    --bucket my-immich-photos \
    --accelerate-configuration Status=Enabled
</code></pre>
<h2 id="heading-step-2-install-and-configure-s3fs">Step 2: Install and Configure s3fs</h2>
<p>Install s3fs-fuse to mount your S3 bucket as a local filesystem:</p>
<pre><code class="lang-bash"><span class="hljs-comment"># Ubuntu/Debian</span>
sudo apt update
sudo apt install -y s3fs

<span class="hljs-comment"># CentOS/RHEL</span>
sudo yum install -y s3fs-fuse
</code></pre>
<p>Create credentials file for s3fs:</p>
<pre><code class="lang-bash"><span class="hljs-comment"># Create credentials file</span>
<span class="hljs-built_in">echo</span> <span class="hljs-string">"<span class="hljs-subst">$(aws configure get aws_access_key_id)</span>:<span class="hljs-subst">$(aws configure get aws_secret_access_key)</span>"</span> \
  | sudo tee /etc/passwd-s3fs &gt; /dev/null
sudo chmod 600 /etc/passwd-s3fs
</code></pre>
<h2 id="heading-step-3-set-up-immich-directory-structure">Step 3: Set Up Immich Directory Structure</h2>
<p>Create the directory structure for Immich:</p>
<pre><code class="lang-bash"><span class="hljs-comment"># Create Immich directory</span>
sudo mkdir -p /opt/immich
<span class="hljs-built_in">cd</span> /opt/immich

<span class="hljs-comment"># Create mount point for S3 storage</span>
sudo mkdir -p /opt/immich/library/upload
sudo chown -R 1000:1000 /opt/immich/library
</code></pre>
<h2 id="heading-step-4-mount-s3-bucket">Step 4: Mount S3 Bucket</h2>
<p>Mount your S3 bucket to the Immich upload directory:</p>
<pre><code class="lang-bash"><span class="hljs-comment"># Mount S3 bucket</span>
sudo s3fs my-immich-photos /opt/immich/library/upload \
  -o allow_other \
  -o nonempty \
  -o use_cache=/tmp \
  -o passwd_file=/etc/passwd-s3fs \
  -o endpoint=us-east-1 \
  -o url=https://s3.us-east-1.amazonaws.com
</code></pre>
<p><strong>Verify the mount:</strong></p>
<pre><code class="lang-bash">mount | grep s3fs
ls -la /opt/immich/library/upload  <span class="hljs-comment"># Should show your S3 bucket contents</span>
</code></pre>
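<p>For a full round-trip check, write a file through the mount and confirm it landed in the bucket. This assumes the AWS CLI is still configured from Step 1 and the bucket name matches:</p>
<pre><code class="lang-bash"># Write through the s3fs mount...
echo "hello from s3fs" | sudo tee /opt/immich/library/upload/s3fs-test.txt

# ...confirm the object exists in S3 directly...
aws s3 ls s3://my-immich-photos/s3fs-test.txt

# ...and clean up
sudo rm /opt/immich/library/upload/s3fs-test.txt
</code></pre>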
<h2 id="heading-step-5-create-required-subdirectories">Step 5: Create Required Subdirectories</h2>
<p>Immich requires specific subdirectories with marker files:</p>
<pre><code class="lang-bash">BASE=<span class="hljs-string">"/opt/immich/library/upload"</span>

<span class="hljs-comment"># Create required subdirectories</span>
<span class="hljs-keyword">for</span> DIR <span class="hljs-keyword">in</span> upload thumbs library encoded-video profile backups; <span class="hljs-keyword">do</span>
  FULL=<span class="hljs-string">"<span class="hljs-variable">$BASE</span>/<span class="hljs-variable">$DIR</span>"</span>
  sudo mkdir -p <span class="hljs-string">"<span class="hljs-variable">$FULL</span>"</span>
  sudo touch <span class="hljs-string">"<span class="hljs-variable">$FULL</span>/.immich"</span>
  sudo chmod 777 <span class="hljs-string">"<span class="hljs-variable">$FULL</span>/.immich"</span>
<span class="hljs-keyword">done</span>
</code></pre>
<h2 id="heading-step-6-configure-immich-with-docker-compose">Step 6: Configure Immich with Docker Compose</h2>
<p>You can refer to the <a target="_blank" href="https://immich.app/docs/install/docker-compose">official docs</a> for more details.</p>
<p>Create the Docker Compose configuration:</p>
<p><strong>docker-compose.yml:</strong></p>
<pre><code class="lang-yaml"><span class="hljs-attr">name:</span> <span class="hljs-string">immich</span>

<span class="hljs-attr">services:</span>
  <span class="hljs-attr">immich-server:</span>
    <span class="hljs-attr">container_name:</span> <span class="hljs-string">immich_server</span>
    <span class="hljs-attr">image:</span> <span class="hljs-string">ghcr.io/immich-app/immich-server:${IMMICH_VERSION:-release}</span>
    <span class="hljs-attr">volumes:</span>
      <span class="hljs-bullet">-</span> <span class="hljs-string">${UPLOAD_LOCATION}:/usr/src/app/upload</span>
      <span class="hljs-bullet">-</span> <span class="hljs-string">/etc/localtime:/etc/localtime:ro</span>
    <span class="hljs-attr">env_file:</span>
      <span class="hljs-bullet">-</span> <span class="hljs-string">.env</span>
    <span class="hljs-attr">ports:</span>
      <span class="hljs-bullet">-</span> <span class="hljs-number">2283</span><span class="hljs-string">:3001</span>
    <span class="hljs-attr">depends_on:</span>
      <span class="hljs-bullet">-</span> <span class="hljs-string">redis</span>
      <span class="hljs-bullet">-</span> <span class="hljs-string">database</span>
    <span class="hljs-attr">restart:</span> <span class="hljs-string">always</span>

  <span class="hljs-attr">immich-machine-learning:</span>
    <span class="hljs-attr">container_name:</span> <span class="hljs-string">immich_machine_learning</span>
    <span class="hljs-attr">image:</span> <span class="hljs-string">ghcr.io/immich-app/immich-machine-learning:${IMMICH_VERSION:-release}</span>
    <span class="hljs-attr">volumes:</span>
      <span class="hljs-bullet">-</span> <span class="hljs-string">model-cache:/cache</span>
    <span class="hljs-attr">env_file:</span>
      <span class="hljs-bullet">-</span> <span class="hljs-string">.env</span>
    <span class="hljs-attr">restart:</span> <span class="hljs-string">always</span>

  <span class="hljs-attr">redis:</span>
    <span class="hljs-attr">container_name:</span> <span class="hljs-string">immich_redis</span>
    <span class="hljs-attr">image:</span> <span class="hljs-string">docker.io/redis:6.2-alpine</span>
    <span class="hljs-attr">healthcheck:</span>
      <span class="hljs-attr">test:</span> <span class="hljs-string">redis-cli</span> <span class="hljs-string">ping</span> <span class="hljs-string">||</span> <span class="hljs-string">exit</span> <span class="hljs-number">1</span>
    <span class="hljs-attr">restart:</span> <span class="hljs-string">always</span>

  <span class="hljs-attr">database:</span>
    <span class="hljs-attr">container_name:</span> <span class="hljs-string">immich_postgres</span>
    <span class="hljs-attr">image:</span> <span class="hljs-string">docker.io/tensorchord/pgvecto-rs:pg14-v0.2.0</span>
    <span class="hljs-attr">environment:</span>
      <span class="hljs-attr">POSTGRES_PASSWORD:</span> <span class="hljs-string">${DB_PASSWORD}</span>
      <span class="hljs-attr">POSTGRES_USER:</span> <span class="hljs-string">${DB_USERNAME}</span>
      <span class="hljs-attr">POSTGRES_DB:</span> <span class="hljs-string">${DB_DATABASE_NAME}</span>
      <span class="hljs-attr">POSTGRES_INITDB_ARGS:</span> <span class="hljs-string">'--data-checksums'</span>
    <span class="hljs-attr">volumes:</span>
      <span class="hljs-bullet">-</span> <span class="hljs-string">${DB_DATA_LOCATION}:/var/lib/postgresql/data</span>
    <span class="hljs-attr">healthcheck:</span>
      <span class="hljs-attr">test:</span> <span class="hljs-string">pg_isready</span> <span class="hljs-string">--dbname='${DB_DATABASE_NAME}'</span> <span class="hljs-string">--username='${DB_USERNAME}'</span> <span class="hljs-string">||</span> <span class="hljs-string">exit</span> <span class="hljs-number">1</span>
      <span class="hljs-attr">interval:</span> <span class="hljs-string">5m</span>
      <span class="hljs-attr">start_interval:</span> <span class="hljs-string">30s</span>
      <span class="hljs-attr">start_period:</span> <span class="hljs-string">5m</span>
    <span class="hljs-attr">command:</span> [<span class="hljs-string">"postgres"</span>, <span class="hljs-string">"-c"</span>, <span class="hljs-string">"shared_preload_libraries=vectors.so"</span>, <span class="hljs-string">"-c"</span>, <span class="hljs-string">'search_path="$$user", public, vectors'</span>, <span class="hljs-string">"-c"</span>, <span class="hljs-string">"logging_collector=on"</span>, <span class="hljs-string">"-c"</span>, <span class="hljs-string">"max_wal_size=2GB"</span>, <span class="hljs-string">"-c"</span>, <span class="hljs-string">"shared_buffers=512MB"</span>, <span class="hljs-string">"-c"</span>, <span class="hljs-string">"wal_compression=on"</span>]
    <span class="hljs-attr">restart:</span> <span class="hljs-string">always</span>

<span class="hljs-attr">volumes:</span>
  <span class="hljs-attr">model-cache:</span>
</code></pre>
<p><strong>Create .env file:</strong></p>
<pre><code class="lang-bash">cat &gt; .env &lt;&lt; <span class="hljs-string">'EOF'</span>
<span class="hljs-comment"># Point to your S3 mount directory</span>
UPLOAD_LOCATION=/opt/immich/library/upload

<span class="hljs-comment"># Immich version</span>
IMMICH_VERSION=release

<span class="hljs-comment"># Database configuration</span>
DB_PASSWORD=your-secure-password
DB_USERNAME=postgres
DB_DATABASE_NAME=immich
DB_DATA_LOCATION=/opt/immich/postgres

<span class="hljs-comment"># Redis configuration</span>
REDIS_HOSTNAME=immich_redis
EOF
</code></pre>
<h2 id="heading-step-7-start-immich">Step 7: Start Immich</h2>
<p>Start your Immich deployment:</p>
<pre><code class="lang-bash"><span class="hljs-comment"># Start all services</span>
docker compose up -d

<span class="hljs-comment"># Check status</span>
docker compose ps
docker compose logs immich_server
</code></pre>
<h2 id="heading-step-8-configure-auto-mount-on-boot">Step 8: Configure Auto-Mount on Boot</h2>
<p>Create a systemd service to automatically mount S3 on server reboot:</p>
<pre><code class="lang-bash">sudo tee /etc/systemd/system/s3fs-immich.service &gt; /dev/null &lt;&lt; <span class="hljs-string">'EOF'</span>
[Unit]
Description=S3FS Mount <span class="hljs-keyword">for</span> Immich Photos
After=network-online.target
Wants=network-online.target

[Service]
Type=forking
User=root
ExecStart=/usr/bin/s3fs my-immich-photos /opt/immich/library/upload -o passwd_file=/etc/passwd-s3fs,allow_other,use_cache=/tmp,endpoint=us-east-1,url=https://s3.us-east-1.amazonaws.com
ExecStop=/bin/umount /opt/immich/library/upload
Restart=always
RestartSec=10

[Install]
WantedBy=multi-user.target
EOF

<span class="hljs-comment"># Enable and start the service</span>
sudo systemctl <span class="hljs-built_in">enable</span> s3fs-immich.service
sudo systemctl start s3fs-immich.service
</code></pre>
<h2 id="heading-step-9-configure-s3-bucket-policy-optional">Step 9: Configure S3 Bucket Policy (Optional)</h2>
<p>If you want direct access to photos via HTTPS URLs, configure a bucket policy. Be aware that this makes every object under <code>photos/</code> publicly readable, so skip this step if your library should stay private:</p>
<pre><code class="lang-json">{
  <span class="hljs-attr">"Version"</span>: <span class="hljs-string">"2012-10-17"</span>,
  <span class="hljs-attr">"Statement"</span>: [
    {
      <span class="hljs-attr">"Sid"</span>: <span class="hljs-string">"AllowPublicReadPhotos"</span>,
      <span class="hljs-attr">"Effect"</span>: <span class="hljs-string">"Allow"</span>,
      <span class="hljs-attr">"Principal"</span>: <span class="hljs-string">"*"</span>,
      <span class="hljs-attr">"Action"</span>: <span class="hljs-string">"s3:GetObject"</span>,
      <span class="hljs-attr">"Resource"</span>: <span class="hljs-string">"arn:aws:s3:::my-immich-photos/photos/*"</span>
    }
  ]
}
</code></pre>
<h2 id="heading-testing-your-setup">Testing Your Setup</h2>
<ol>
<li><strong>Access Immich</strong>: Navigate to <code>http://your-server-ip:2283</code></li>
<li><strong>Create Account</strong>: Set up your admin account</li>
<li><strong>Upload Photos</strong>: Try uploading photos via web or mobile app</li>
<li><strong>Verify S3</strong>: Check your S3 bucket to confirm photos are being stored</li>
</ol>
<h2 id="heading-performance-optimization-tips">Performance Optimization Tips</h2>
<p><strong>s3fs Mount Options:</strong></p>
<pre><code class="lang-bash"><span class="hljs-comment"># For better performance, use these mount options:</span>
sudo s3fs my-immich-photos /opt/immich/library/upload \
  -o allow_other,nonempty \
  -o use_cache=/var/cache/s3fs \
  -o max_stat_cache_size=100000 \
  -o stat_cache_expire=60 \
  -o multireq_max=5 \
  -o parallel_count=30
</code></pre>
<p><strong>AWS CLI Configuration:</strong></p>
<pre><code class="lang-bash"><span class="hljs-comment"># Optimize AWS CLI for better S3 performance</span>
aws configure <span class="hljs-built_in">set</span> default.s3.max_concurrent_requests 20
aws configure <span class="hljs-built_in">set</span> default.s3.multipart_threshold 64MB
aws configure <span class="hljs-built_in">set</span> default.s3.multipart_chunksize 16MB
</code></pre>
<h2 id="heading-cost-comparison-why-this-setup-makes-sense">Cost Comparison: Why This Setup Makes Sense</h2>
<p>Let me break down the costs to show why this approach is better:</p>
<h3 id="heading-hetzner-cloud-storage-upgrade-costs">Hetzner Cloud Storage Upgrade Costs:</h3>
<ul>
<li><strong>Base $5/month</strong>: 20GB storage</li>
<li><strong>$10/month</strong>: 40GB storage (+$5 for 20GB = $0.25/GB/month)</li>
<li><strong>$20/month</strong>: 80GB storage (+$15 for 60GB = $0.25/GB/month)</li>
</ul>
<h3 id="heading-aws-ec2-ebs-storage">AWS EC2 + EBS Storage:</h3>
<ul>
<li><strong>t3.micro</strong>: $8.5/month + $10/month for 100GB EBS = $18.5/month</li>
<li><strong>t3.small</strong>: $17/month + $10/month for 100GB EBS = $27/month</li>
</ul>
<h3 id="heading-aws-lightsail">AWS Lightsail:</h3>
<ul>
<li><strong>$5/month</strong>: 20GB storage</li>
<li><strong>$10/month</strong>: 40GB storage</li>
<li><strong>$20/month</strong>: 80GB storage</li>
</ul>
<h3 id="heading-s3-storage-eu-north-1">S3 Storage (eu-north-1):</h3>
<ul>
<li><strong>S3 Standard</strong>: $0.023/GB/month</li>
<li><strong>100GB</strong>: ~$2.30/month</li>
<li><strong>500GB</strong>: ~$11.50/month</li>
<li><strong>1TB</strong>: ~$23/month</li>
</ul>
<h3 id="heading-the-winner-hetzner-s3">The Winner: Hetzner + S3</h3>
<p><strong>Recommended Setup:</strong></p>
<ul>
<li><strong>Hetzner Cloud $5/month</strong>: Cheapest compute option</li>
<li><strong>S3 Storage</strong>: Pay only for what you use</li>
<li><strong>Total for 100GB photos</strong>: $5 + $2.30 = $7.30/month</li>
</ul>
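<p>As a quick sanity check on the arithmetic above, here is a tiny cost helper using the listed prices ($0.023/GB/month for S3 Standard plus the $5/month Hetzner instance); the helper function is just for illustration:</p>
<pre><code class="lang-javascript">const S3_PER_GB = 0.023; // S3 Standard price, $/GB/month
const HETZNER = 5;       // base Hetzner Cloud instance, $/month

function monthlyCost(gb) {
  // storage cost plus compute, rounded to whole cents
  return Number((HETZNER + gb * S3_PER_GB).toFixed(2));
}

console.log(monthlyCost(100)); // 7.3  ($5 compute + $2.30 storage)
</code></pre>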
<h2 id="heading-conclusion">Conclusion</h2>
<p>And that's the whole process: Immich is now running with S3 as its storage backend, which is exactly what I wanted.</p>
<p>The other big advantage is portability. With S3 as the single source of truth, changing or migrating the machine becomes easy: point the new server at the same bucket and the library comes along.</p>
<p>And, of course, this setup costs less.</p>
<p>That's it. Thanks for reading!</p>
<h2 id="heading-key-takeaways">Key Takeaways</h2>
<ul>
<li><strong>Problem Solved</strong>: Unlimited photo storage without expensive server upgrades</li>
<li><strong>Cost Effective</strong>: Hetzner ($5) + S3 storage cheaper than alternatives</li>
<li><strong>Portable</strong>: S3 as single source of truth makes migration easy</li>
<li><strong>Reliable</strong>: Enterprise-grade storage with automatic backups</li>
<li><strong>Scalable</strong>: Pay only for what you use, grow as needed</li>
</ul>
<p>Your photos are now safely stored in the cloud while maintaining the familiar Immich experience, all on a budget!</p>
]]></content:encoded></item><item><title><![CDATA[I Built ChatGPT for MongoDB in 5 Days (And Open-Sourced It)]]></title><description><![CDATA["Show me all React developers who interviewed last month with evaluation scores above 85."
At our recruitment SaaS, questions like this were becoming a daily headache. Every time someone needed insights from our MongoDB database, it meant an engineer...]]></description><link>https://blog.salmankhan.pro/i-built-chatgpt-for-mongodb-in-5-days-and-open-sourced-it</link><guid isPermaLink="true">https://blog.salmankhan.pro/i-built-chatgpt-for-mongodb-in-5-days-and-open-sourced-it</guid><category><![CDATA[MongoDB, LangChain, AI, Natural Language Processing, Claude, Open Source, TypeScript, Database Tools, Recruitment SaaS, Dev Productivity]]></category><dc:creator><![CDATA[PATHAN SALMAN KHAN]]></dc:creator><pubDate>Sat, 13 Sep 2025 00:40:33 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1757499670507/749c6acc-727f-4a8d-a816-1280158c0f91.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>"Show me all React developers who interviewed last month with evaluation scores above 85."</p>
<p>At our recruitment SaaS, questions like this were becoming a daily headache. Every time someone needed insights from our MongoDB database, it meant an engineering ticket. We were getting 10+ of these requests daily.</p>
<p>So I spent 5 days building our own "ChatGPT for MongoDB" - a system that lets our team ask database questions in plain English and get instant answers.</p>
<p>It worked so well that I decided to open-source the whole thing.</p>
<h2 id="heading-the-problem-that-pushed-me-to-build-this">The Problem That Pushed Me to Build This</h2>
<p>I'm building a SaaS for recruitment. Our users manage thousands of job applications, interviews, and candidate profiles in MongoDB.</p>
<p>The constant requests kept coming:</p>
<ul>
<li>"Show me candidates with React experience from recent applications"</li>
<li>"Which interviews had the highest scores?"  </li>
<li>"Find applicants who might be good fits for this role"</li>
</ul>
<p>Each question meant our engineering team had to:</p>
<ol>
<li>Understand what they actually wanted</li>
<li>Write a custom MongoDB query</li>
<li>Deploy it and explain the results</li>
<li>Repeat for the next question</li>
</ol>
<p><strong>The numbers were brutal:</strong></p>
<ul>
<li>10+ query requests per day</li>
<li>1-2 hours of dev time each day</li>
<li>Product development was slowing down</li>
</ul>
<h2 id="heading-what-i-built-chatgpt-but-for-our-database">What I Built: ChatGPT, But for Our Database</h2>
<p>Instead of writing complex MongoDB queries like this:</p>
<pre><code class="lang-javascript">db.applicants.aggregate([
  { <span class="hljs-attr">$match</span>: { <span class="hljs-string">"skills"</span>: <span class="hljs-string">"React"</span>, <span class="hljs-string">"appliedDate"</span>: { <span class="hljs-attr">$gte</span>: lastMonth } } },
  { <span class="hljs-attr">$lookup</span>: { <span class="hljs-attr">from</span>: <span class="hljs-string">"interviews"</span>, <span class="hljs-attr">localField</span>: <span class="hljs-string">"_id"</span>, <span class="hljs-attr">foreignField</span>: <span class="hljs-string">"applicant"</span>, <span class="hljs-attr">as</span>: <span class="hljs-string">"interviews"</span> } },
  { <span class="hljs-attr">$match</span>: { <span class="hljs-string">"interviews.score"</span>: { <span class="hljs-attr">$gte</span>: <span class="hljs-number">85</span> } } }
])
</code></pre>
<p>Our team can now just ask:</p>
<pre><code><span class="hljs-string">"Show me React developers who interviewed last month with scores above 85"</span>
</code></pre><p>The system understands the question, plans the right queries, and responds conversationally with the data and insights.</p>
<h2 id="heading-the-technical-challenges-i-had-to-solve">The Technical Challenges I Had to Solve</h2>
<p>Building a reliable "ChatGPT for databases" meant solving several problems:</p>
<h3 id="heading-1-the-schema-problem">1. The Schema Problem</h3>
<p>MongoDB schemas change constantly. Hardcoded mappings break within days.</p>
<p><strong>My solution:</strong> Built dynamic schema introspection that reads our Mongoose models in real-time, so the AI always knows our current database structure.</p>
<h3 id="heading-2-complex-query-planning">2. Complex Query Planning</h3>
<p>Simple questions like "how many users?" are easy. But "show me React developers who aced their interviews" requires understanding relationships across multiple collections.</p>
<p><strong>My solution:</strong> Used LangGraph to build an AI agent that can reason through multi-step database operations, just like a human developer would.</p>
<h3 id="heading-3-ai-hallucination-prevention">3. AI Hallucination Prevention</h3>
<p>LLMs love making up field names and assuming data that doesn't exist in your actual database.</p>
<p><strong>My solution:</strong> Schema-first approach where the AI always checks the real database structure before building any queries.</p>
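<p>To make this concrete, here is a minimal sketch of such a guard; the field names and the <code>assertFieldsExist</code> helper are illustrative, not the actual implementation:</p>
<pre><code class="lang-javascript">// Fields the live schema actually contains (read at runtime in the real system)
const schemaFields = new Set(["skills", "appliedDate", "resumeAnalysis.score"]);

function assertFieldsExist(queryFields) {
  // Reject any field the schema does not contain, instead of
  // letting the model query a hallucinated one.
  const unknown = queryFields.filter(function (f) {
    return !schemaFields.has(f);
  });
  if (unknown.length) {
    throw new Error("Unknown fields: " + unknown.join(", "));
  }
}

assertFieldsExist(["skills", "appliedDate"]); // passes silently
</code></pre>
<p>A failed check can then trigger a schema re-read and a retry rather than surfacing an error to the user.</p>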
<h3 id="heading-4-conversation-flow">4. Conversation Flow</h3>
<p>Real usage isn't one-off questions. It's "show me top candidates" followed by "now show me their interview scores."</p>
<p><strong>My solution:</strong> Redis-backed memory so the AI remembers context across the entire conversation.</p>
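<p>Conceptually, the memory layer just appends each turn to a per-session history. A toy sketch of the idea, with a plain <code>Map</code> standing in for Redis so it runs standalone:</p>
<pre><code class="lang-javascript">const sessions = new Map();

function remember(sessionId, role, text) {
  // append one conversation turn to this session's history
  if (!sessions.has(sessionId)) sessions.set(sessionId, []);
  sessions.get(sessionId).push({ role: role, text: text });
}

function history(sessionId) {
  return sessions.get(sessionId) || [];
}

remember("s1", "user", "show me top candidates");
remember("s1", "user", "now show me their interview scores");
console.log(history("s1").length); // 2
</code></pre>
<p>On each new question, the agent loads this history so follow-ups like "their interview scores" resolve against the earlier answer.</p>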
<h2 id="heading-the-architecture-how-it-actually-works">The Architecture: How It Actually Works</h2>
<pre><code>Natural Language Question
        ↓
🤖 AI Agent (Claude <span class="hljs-number">3.5</span>)
        ↓
📋 Dynamic Schema Reader (checks current Mongoose models)
        ↓
🔧 MongoDB Query Tools (find/aggregate/count)
        ↓
💾 Redis Memory (maintains conversation context)
        ↓
📄 Smart Response (data + insights + explanations)
</code></pre><h3 id="heading-the-smart-schema-system">The Smart Schema System</h3>
<p>Every query starts here:</p>
<pre><code class="lang-javascript"><span class="hljs-comment">// Reads our actual Mongoose models at runtime</span>
<span class="hljs-keyword">const</span> schema = extractCompleteMongooseSchema(ApplicantModel);
<span class="hljs-comment">// Result: "skills: Array&lt;String&gt;, resumeAnalysis.score: Number(0-100)"</span>
</code></pre>
<p>The AI knows exactly what fields exist, their types, and their constraints.</p>
<h3 id="heading-ai-agent-reasoning">AI Agent Reasoning</h3>
<p>Watch how the AI thinks through a complex question:</p>
<pre><code>User: <span class="hljs-string">"Show me our top candidates"</span>
AI: 🔍 Let me check the applicants schema first...
AI: 💭 Found <span class="hljs-string">'compatibilityScore'</span> field (0-100), I'll sort by that
AI: ⚡ Running query: find({}, {sort: {compatibilityScore: -1}, limit: 10})
AI: 💬 <span class="hljs-string">"Here are your top 10 candidates, mostly senior developers..."</span>
</code></pre><h2 id="heading-what-i-learned-building-this">What I Learned Building This</h2>
<p><strong>Claude beats GPT for database work:</strong></p>
<ul>
<li>Claude 3.5 is much better at complex reasoning</li>
<li>GPT-4 makes up field names more often</li>
<li>Switching to Claude cut our error rate significantly</li>
</ul>
<p><strong>Token optimization matters:</strong></p>
<ul>
<li>Started by sending all schemas with every query (expensive!)</li>
<li>Now the AI only asks for schemas it actually needs</li>
<li>Cut token costs by 40%</li>
</ul>
<p><strong>Error handling is crucial:</strong></p>
<ul>
<li>Complex aggregations sometimes fail</li>
<li>Built smart fallbacks: try aggregation → fallback to simple find()</li>
<li>If a field doesn't exist, re-check schema and retry</li>
</ul>
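<p>The fallback chain above can be sketched like this; <code>runAggregate</code> and <code>runFind</code> are hypothetical stand-ins for the real MongoDB tool calls:</p>
<pre><code class="lang-javascript">async function queryWithFallback(runAggregate, runFind) {
  try {
    // complex pipelines fail more often, so try them first...
    return await runAggregate();
  } catch (err) {
    // ...and degrade to a plain find() when they do
    return await runFind();
  }
}

queryWithFallback(
  async function () { throw new Error("bad $lookup"); },
  async function () { return [{ note: "fallback result" }]; }
).then(function (rows) { console.log(rows.length); }); // logs 1
</code></pre>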
<p><strong>Conversation memory changes everything:</strong></p>
<ul>
<li>Users ask 3-4 follow-up questions on average</li>
<li>"Now show me their interview scores" should just work</li>
<li>Redis sessions make it feel like talking to a person</li>
</ul>
<h2 id="heading-the-results-that-matter">The Results That Matter</h2>
<p>After deploying our "ChatGPT for MongoDB":</p>
<p>✅ <strong>Engineering requests: 10/day → 0</strong><br />✅ <strong>Query response time: 2 hours → 30 seconds</strong><br />✅ <strong>Team productivity: Significantly improved</strong><br />✅ <strong>New insights: Users ask questions they never asked before</strong></p>
<p>More importantly, our non-technical team members became confident exploring data themselves.</p>
<h2 id="heading-open-source-try-your-own-chatgpt-for-mongodb">Open Source: Try Your Own ChatGPT for MongoDB</h2>
<p>This solved our recruitment SaaS problem, but I realized every company with MongoDB probably faces similar challenges.</p>
<p>So I've open-sourced the complete system:</p>
<p><strong>🔗 GitHub:</strong>  <a target="_blank" href="https://github.com/salmankhan-prs/mongodb-nl-query-demo">mongodb-nl-query-demo</a></p>
<p><strong>What's included:</strong></p>
<ul>
<li>Full working system with demo e-commerce data</li>
<li>AI agent setup and prompt engineering</li>
<li>Dynamic schema introspection code</li>
<li>Redis conversation memory</li>
<li>Complete adaptation guide for your database</li>
</ul>
<p><strong>Built for real use:</strong></p>
<ul>
<li>TypeScript throughout for reliability</li>
<li>Production error handling and recovery</li>
<li>Rate limiting and security considerations</li>
<li>Token optimization for cost control</li>
</ul>
<h2 id="heading-quick-start-get-your-own-running">Quick Start: Get Your Own Running</h2>
<pre><code class="lang-bash"><span class="hljs-comment"># Clone and set up</span>
git <span class="hljs-built_in">clone</span> https://github.com/salmankhan-prs/mongodb-nl-query-demo
<span class="hljs-built_in">cd</span> mongodb-nl-query-demo
pnpm install

<span class="hljs-comment"># Add your API keys</span>
cp .env.example .env
<span class="hljs-comment"># Edit .env with MongoDB URI and Anthropic API key</span>

<span class="hljs-comment"># Load demo data and start</span>
pnpm seed
pnpm start:dev

<span class="hljs-comment"># Test it out</span>
curl -X POST http://localhost:3000/api/query \
  -H <span class="hljs-string">"Content-Type: application/json"</span> \
  -d <span class="hljs-string">'{"query": "Show me all users from USA"}'</span>
</code></pre>
<p>Try questions like:</p>
<ul>
<li>"How many products do we have in each category?"</li>
<li>"Show me customers who spent the most money"</li>
<li>"Which orders were delivered successfully?"</li>
</ul>
<h2 id="heading-adapt-it-to-your-database">Adapt It to Your Database</h2>
<p>The magic is in the dynamic schema reading. To use your own data:</p>
<ol>
<li><strong>Replace the models</strong> in <code>src/models/</code> with your Mongoose schemas</li>
<li><strong>Update collection names</strong> in <code>src/types/index.ts</code></li>
<li><strong>Run the schema generator</strong>: <code>pnpm generate:schemas</code></li>
<li><strong>Start asking questions</strong> about your actual data</li>
</ol>
<p>The system automatically discovers your field types, relationships, constraints, and enum values.</p>
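<p>The discovery step boils down to walking the schema's paths and emitting a compact description the model can read. A stand-alone sketch of the idea, mimicking the shape of Mongoose's <code>Model.schema.paths</code> (the field names are hypothetical):</p>
<pre><code class="lang-javascript">const paths = {
  "skills": { instance: "Array" },
  "resumeAnalysis.score": { instance: "Number", options: { min: 0, max: 100 } },
};

function describeSchema(paths) {
  // emit e.g. "skills: Array, resumeAnalysis.score: Number(0-100)"
  return Object.keys(paths).map(function (name) {
    const p = paths[name];
    const range =
      p.options && p.options.min !== undefined
        ? "(" + p.options.min + "-" + p.options.max + ")"
        : "";
    return name + ": " + p.instance + range;
  }).join(", ");
}

console.log(describeSchema(paths));
// skills: Array, resumeAnalysis.score: Number(0-100)
</code></pre>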
<h2 id="heading-why-this-actually-matters">Why This Actually Matters</h2>
<p>When anyone on your team can ask database questions directly:</p>
<ul>
<li><strong>Decisions happen faster</strong> (no engineering bottlenecks)</li>
<li><strong>More insights get discovered</strong> (easier to explore data)</li>
<li><strong>Engineering focuses on features</strong> (not custom queries)</li>
<li><strong>Data becomes accessible</strong> (non-technical users gain confidence)</li>
</ul>
<h2 id="heading-the-tech-stack-that-worked">The Tech Stack That Worked</h2>
<ul>
<li><strong>AI Model:</strong> Claude 3.5 Sonnet (superior reasoning for databases)</li>
<li><strong>Agent Framework:</strong> LangChain + LangGraph</li>
<li><strong>Memory:</strong> Redis for fast session storage</li>
<li><strong>Backend:</strong> Express + TypeScript</li>
<li><strong>Database:</strong> MongoDB + Mongoose (enables dynamic introspection)</li>
</ul>
<h2 id="heading-whats-next">What's Next</h2>
<p>I'm excited to see what people build with this. Some ideas for extensions:</p>
<ul>
<li><strong>Write operations</strong> (INSERT, UPDATE, DELETE with safety checks)</li>
<li><strong>Web interface</strong> (React app for non-technical users)</li>
<li><strong>Advanced analytics</strong> (trend analysis, predictive insights)</li>
</ul>
<h2 id="heading-try-it-out">Try It Out</h2>
<p>This represents 5 days of focused work solving a real problem we faced every day. If you're dealing with similar database query bottlenecks, maybe it'll help you too.</p>
<p><strong>GitHub:</strong> <a target="_blank" href="https://github.com/salmankhan-prs/mongodb-nl-query-demo">mongodb-nl-query-demo</a></p>
<p>Questions? Reach out on <a target="_blank" href="https://x.com/salmankhanprs">Twitter</a> or <a target="_blank" href="https://www.linkedin.com/in/salman-khan-tech">LinkedIn</a>. I'd love to hear what you build with it.</p>
<p><em>Built this because we needed it. Sharing it because you might too.</em></p>
]]></content:encoded></item><item><title><![CDATA[How to scrape webpages using JavaScript]]></title><description><![CDATA[In this blog, I am going to use Puppeteer to scrape Wikipedia pages

Puppeteer is a Node library that provides a high-level API to control Chrome or Chromium over the DevTools Protocol, and it can run headless

we are going to achieve this in two steps ...]]></description><link>https://blog.salmankhan.pro/how-to-scrap-webpages-using-javascript</link><guid isPermaLink="true">https://blog.salmankhan.pro/how-to-scrap-webpages-using-javascript</guid><category><![CDATA[JavaScript]]></category><category><![CDATA[Node.js]]></category><dc:creator><![CDATA[PATHAN SALMAN KHAN]]></dc:creator><pubDate>Fri, 21 Jan 2022 09:33:53 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1642757200265/B3Cc9ghhe.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>In this blog, I am going to use a  <a target="_blank" href="https://www.npmjs.com/package/puppeteer">puppeteer</a>  to scrap Wikipedia pages</p>
<blockquote>
<p>Puppeteer is a Node library that provides a high-level API to control Chrome or Chromium over the DevTools Protocol, and it can run headless.</p>
</blockquote>
<p><strong>We are going to achieve this in two steps.</strong></p>
<p>In the first step, we scrape the Wikipedia list page [https://en.wikipedia.org/wiki/List_of_programming_languages] to get all the language names and the URLs of their individual Wikipedia pages [https://en.wikipedia.org/wiki/:programminglanguageName], and store them in an array of objects, each holding a name and a URL.</p>
<pre><code>const puppeteer <span class="hljs-operator">=</span> <span class="hljs-built_in">require</span>(<span class="hljs-string">"puppeteer"</span>);<span class="hljs-comment">//headless chrome </span>

<span class="hljs-keyword">var</span> fs <span class="hljs-operator">=</span> <span class="hljs-built_in">require</span>(<span class="hljs-string">"fs"</span>);
let List_of_programming_languages <span class="hljs-operator">=</span> [];<span class="hljs-comment">// to store the names and URLs of the programming languages </span>
(async () <span class="hljs-operator">=</span><span class="hljs-operator">&gt;</span> {
  const browser <span class="hljs-operator">=</span> await puppeteer.launch();

  const page <span class="hljs-operator">=</span> await browser.newPage();
  await page.goto(
    <span class="hljs-string">"https://en.wikipedia.org/wiki/List_of_programming_languages"</span>
  );
  await page.waitForTimeout(<span class="hljs-number">2000</span>);<span class="hljs-comment">// wait 2 seconds for the page to load </span>

  const getLanguageUrl <span class="hljs-operator">=</span> await page.evaluate(() <span class="hljs-operator">=</span><span class="hljs-operator">&gt;</span> {
    let urls <span class="hljs-operator">=</span> document.querySelectorAll(<span class="hljs-string">".div-col ul li a"</span>);<span class="hljs-comment">//the class name for extracting  anchor tag  </span>

    const urlList <span class="hljs-operator">=</span> [...urls];

    console.log(urlList);
    const extractedUrls <span class="hljs-operator">=</span> urlList.map((u) <span class="hljs-operator">=</span><span class="hljs-operator">&gt;</span> u.getAttribute(<span class="hljs-string">"href"</span>));<span class="hljs-comment">//extracting only links from anchor tags </span>

    <span class="hljs-keyword">return</span> extractedUrls;
  });

  const getLanguageName <span class="hljs-operator">=</span> await page.evaluate(() <span class="hljs-operator">=</span><span class="hljs-operator">&gt;</span> {
    let headingFromWeb <span class="hljs-operator">=</span> document.querySelectorAll(<span class="hljs-string">".div-col"</span>);<span class="hljs-comment">//the class name for extracting  div </span>


    const languageNameList <span class="hljs-operator">=</span> [...headingFromWeb];

    const extractedLanguageNameList <span class="hljs-operator">=</span> languageNameList.map((h) <span class="hljs-operator">=</span><span class="hljs-operator">&gt;</span> h.innerText);<span class="hljs-comment">//extracting only the inner text from each div </span>

    <span class="hljs-keyword">return</span> extractedLanguageNameList;
  });

  const allLanguagesUrls <span class="hljs-operator">=</span> [...getLanguageUrl];
  const allLanguagesNames <span class="hljs-operator">=</span> [...getLanguageName];
  const nameDataSplit <span class="hljs-operator">=</span> allLanguagesNames.join(<span class="hljs-string">""</span>).split(<span class="hljs-string">"\n"</span>);
<span class="hljs-comment">//saving the urls and name as single object</span>
  List_of_programming_languages <span class="hljs-operator">=</span> nameDataSplit.map((name, i) <span class="hljs-operator">=</span><span class="hljs-operator">&gt;</span> {
    <span class="hljs-keyword">return</span> {
      name,
      url: allLanguagesUrls[i],
    };
  });
</code></pre><p>In the second step, using the array of URLs from above, we visit each language's Wikipedia page and pull the full data from the infobox table (the element with class <code>.infobox</code>): fields like Paradigm, Designed by, Developer, First appeared, and Website, plus the image. We also take the first four paragraphs as a description, build an object for each language, push it into an array, and save the array to a file.</p>
<pre><code>const getEachLanguageDetails <span class="hljs-operator">=</span> async () <span class="hljs-operator">=</span><span class="hljs-operator">&gt;</span> {
    const browser <span class="hljs-operator">=</span> await puppeteer.launch();
    const page <span class="hljs-operator">=</span> await browser.newPage();

    let allLanguagesDetails <span class="hljs-operator">=</span> [];
    <span class="hljs-keyword">for</span> (<span class="hljs-keyword">let</span> i <span class="hljs-operator">=</span> <span class="hljs-number">0</span>; i <span class="hljs-operator">&lt;</span> List_of_programming_languages.<span class="hljs-built_in">length</span>; i<span class="hljs-operator">+</span><span class="hljs-operator">+</span>) {
      <span class="hljs-keyword">try</span> {
        await page.goto(
          <span class="hljs-string">"https://en.wikipedia.org"</span> <span class="hljs-operator">+</span> List_of_programming_languages[i].url
        );
        await page.waitForTimeout(<span class="hljs-number">1000</span>);

        const name <span class="hljs-operator">=</span> await page.evaluate(() <span class="hljs-operator">=</span><span class="hljs-operator">&gt;</span> {
          <span class="hljs-keyword">return</span> document.querySelector(<span class="hljs-string">".infobox-title"</span>)
            ? document.querySelector(<span class="hljs-string">".infobox-title"</span>).innerText
            : undefined;
        });

        const img_result <span class="hljs-operator">=</span> await page.evaluate(() <span class="hljs-operator">=</span><span class="hljs-operator">&gt;</span> {
          <span class="hljs-keyword">return</span> document.querySelector(<span class="hljs-string">".infobox-image a img "</span>)
            ? document
                .querySelector(<span class="hljs-string">".infobox-image a img "</span>)
                .getAttribute(<span class="hljs-string">"src"</span>)
            : undefined;
        });

        const getTableLabels <span class="hljs-operator">=</span> await page.evaluate(() <span class="hljs-operator">=</span><span class="hljs-operator">&gt;</span> {
          let labelsdata <span class="hljs-operator">=</span> document.querySelectorAll(
            <span class="hljs-string">".infobox  .infobox-label"</span>
          );

          const labelList <span class="hljs-operator">=</span> [...labelsdata];

          const extractedLabelList <span class="hljs-operator">=</span> labelList.map((h) <span class="hljs-operator">=</span><span class="hljs-operator">&gt;</span> h.innerText);

          <span class="hljs-keyword">return</span> extractedLabelList;
        });
        const getTableData <span class="hljs-operator">=</span> await page.evaluate(() <span class="hljs-operator">=</span><span class="hljs-operator">&gt;</span> {
          let tableData <span class="hljs-operator">=</span> document.querySelectorAll(<span class="hljs-string">".infobox  .infobox-data"</span>);

          const tableDataList <span class="hljs-operator">=</span> [...tableData];

          const extractedTableDataList <span class="hljs-operator">=</span> tableDataList.map((h) <span class="hljs-operator">=</span><span class="hljs-operator">&gt;</span> h.innerText);

          <span class="hljs-keyword">return</span> extractedTableDataList;
        });

        const p1 <span class="hljs-operator">=</span> await page.evaluate(
          () <span class="hljs-operator">=</span><span class="hljs-operator">&gt;</span> document.querySelectorAll(<span class="hljs-string">"p"</span>)[<span class="hljs-number">1</span>].innerText
        );
        const p2 <span class="hljs-operator">=</span> await page.evaluate(
          () <span class="hljs-operator">=</span><span class="hljs-operator">&gt;</span> document.querySelectorAll(<span class="hljs-string">"p"</span>)[<span class="hljs-number">2</span>].innerText
        );
        const p3 <span class="hljs-operator">=</span> await page.evaluate(
          () <span class="hljs-operator">=</span><span class="hljs-operator">&gt;</span> document.querySelectorAll(<span class="hljs-string">"p"</span>)[<span class="hljs-number">3</span>].innerText
        );
        const p4 <span class="hljs-operator">=</span> await page.evaluate(
          () <span class="hljs-operator">=</span><span class="hljs-operator">&gt;</span> document.querySelectorAll(<span class="hljs-string">"p"</span>)[<span class="hljs-number">4</span>].innerText
        );

        let eachLanguageDescription <span class="hljs-operator">=</span> {
          name: name ? name : List_of_programming_languages[i].<span class="hljs-built_in">name</span>,
          image: img_result ? <span class="hljs-string">"https:"</span> <span class="hljs-operator">+</span> img_result : undefined,
          description: p1 <span class="hljs-operator">+</span> <span class="hljs-string">" "</span> <span class="hljs-operator">+</span> p2 <span class="hljs-operator">+</span> <span class="hljs-string">" "</span> <span class="hljs-operator">+</span> p3 <span class="hljs-operator">+</span> <span class="hljs-string">" "</span> <span class="hljs-operator">+</span> p4,
        };
        getTableLabels.forEach((e, i) <span class="hljs-operator">=</span><span class="hljs-operator">&gt;</span> {
          eachLanguageDescription[e] <span class="hljs-operator">=</span> getTableData[i];
        });

        allLanguagesDetails.<span class="hljs-built_in">push</span>(eachLanguageDescription);
      } <span class="hljs-keyword">catch</span> (e) {
        console.log(e);
      }
    }
    fs.writeFile(
      <span class="hljs-string">"languages.txt"</span>,
      JSON.stringify(allLanguagesDetails),
      (err) <span class="hljs-operator">=</span><span class="hljs-operator">&gt;</span> {
        <span class="hljs-keyword">if</span> (err) <span class="hljs-keyword">throw</span> err;
        console.log(<span class="hljs-string">'Saved all language details to languages.txt!'</span>);
      }
    );
    await browser.close();<span class="hljs-comment">// close the browser launched for the detail pages</span>
  };

  await getEachLanguageDetails();
  await browser.close();<span class="hljs-comment">// close the first browser used for the list page</span>
})();
</code></pre><p>The final code is available at <a target="_blank" href="https://github.com/salmankhan-prs/wikipedia-scraping-puppeteer">wikipedia-scraping-puppeteer</a>.</p>
]]></content:encoded></item></channel></rss>