The Night I Chased a Ghost Bug in My Docker Deployment

by Shifa Salheen on May 19, 2025

Teaser: Scaling exposed a hidden build issue that cost me 12 hours of debugging, wrong guesses, and a hard-earned lesson about containers.

My Weekend Plan Was 200 Until It Returned a 404

It all began on a cozy Friday evening—the kind where Slack and Teams go quiet, the lights are dimmed, and you’re already dreaming of a Netflix binge with a warm cup of black coffee in hand.

But just as I was about to hit play…

there came the most annoying Microsoft Teams ringtone — that dreaded chime that hijacks your vibe and ruins all your plans:

“The app has some ghost and is behaving weird when we scale pods.”

My Friday night? Gone. I spent the entire evening trying to trace the issue.

By the time Saturday rolled around, I was still deep in the trenches. We had a release coming up the next week, and there was no way I could leave things broken.

So there I was — still in my pajamas — glued to the screen, gulping down coffee after coffee ☕, diving deeper into a 12-hour rabbit hole of broken pages, dead ends, and the most elusive bug I’ve chased in months.


The First Sign Something Was Off

At first glance, the app looked perfect.

One pod running. All pages loading beautifully.

But the moment we scaled the pods — chaos, chaos and unpredictable chaos.

  • Some pages loaded.
  • Some didn’t.
  • Some threw weird 404 errors.
  • Some just broke mid-loading.
  • At times everything worked; at times nothing did.

At this point, it felt like the app had developed a mind of its own.


The Usual Suspects (and All the Wrong Turns)

The most logical culprit? Caching.

It made perfect sense —

Pages loaded fine sometimes and broke other times.

Could it be that the browser was caching old static files while the server had newer ones?

I went all in on fixing caching:

  • Cleared browser caches aggressively — hard refreshes, cleared local storage, service workers, everything.
  • Tweaked Cache-Control headers inside nginx (see the sketch below).
  • Played with ETag headers.
  • Modified nginx expires settings.
  • Even manually appended version numbers to static file URLs.
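
For the curious, the nginx side of those experiments looked roughly like this (a sketch from memory; the location block, durations and header values are illustrative, not my real config):

# Illustrative nginx caching tweaks (paths and durations are placeholders)
location /static/ {
    etag on;                                  # ETag experiments
    expires 7d;                               # sets Expires and Cache-Control: max-age
    # add_header Cache-Control "no-cache";    # one of many Cache-Control variations I tried
}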

For a while, it felt like I was onto something: some refreshes worked, some pages loaded.

And for a moment? Relief.

Some things loaded. Then… 404s again.

The bug was mocking me. ☕ More coffee. Back to logs.

Though I managed to catch a few hours of sleep Friday night, I swear I was debugging in my dreams too — stuck between stack traces and main.js file hashes.


Maybe It Was Service Workers?

Next thought:

  • Maybe Service Worker caching was interfering?
  • I disabled Service Workers.
  • Cleared everything.
  • Re-registered them.
  • Tested both with and without Service Workers.

No difference.

Load Balancer? Sticky Sessions? Surely… 🧠

By now, my doubts were starting to shift into suspicion.

Could this really be nginx?

If multiple pods were serving requests, maybe one of them just wasn’t behaving.

What if the browser got the HTML from one pod, but static assets from another — and that other one didn’t have them?

So I rolled up my sleeves and went deeper:

  • Checked upstream setup — all seemed fine. Backend servers were listed and load balanced correctly.
  • Enabled sticky sessions using ip_hash (sketched below) — and something changed. Suddenly, things started working.
  • Manually hit each pod — bypassed nginx entirely to compare behavior.
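
The sticky-session change itself was a one-liner in the upstream block (simplified here; the server addresses stand in for the real pod endpoints):

upstream frontend {
    ip_hash;                      # pin each client IP to the same backend pod
    server 10.0.0.11:8080;
    server 10.0.0.12:8080;
}

server {
    listen 80;
    location / {
        proxy_pass http://frontend;
    }
}

Of course, pinning every client to a single pod only hides an inconsistency between pods; it doesn't fix it.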

That’s when suspicion turned into conviction.

If sticky sessions "fixed" it, then the pods clearly weren't serving identical content; at least one of them was off.

It wasn’t nginx. It wasn’t routing. It was something inside the pod.

The Turning Point: When I Peered Inside the Pods

With coffee in one hand and terminal in the other, I dove into the belly of the beast. No more surface-level debugging. I SSH-ed straight into the containers — like stepping directly into the crime scene.

After hours of chasing theories, I decided to stop guessing — and start seeing.

Went straight to the /dist folder — the folder where the built frontend assets live, and which should be identical across every pod.
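
In practice that meant getting a shell inside each replica and eyeballing the bundle file names; with kubectl it looks roughly like this (pod names are placeholders, and the path depends on where your build output lands):

# Compare the built assets across replicas (names and paths are placeholders)
kubectl exec frontend-7c9f6d-abcde -- ls /app/dist
kubectl exec frontend-7c9f6d-xyz12 -- ls /app/dist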

And what I saw hit me instantly:

  • Pod 1 had a file called main-abc123.js.
  • Pod 2 had a completely different main-def456.js.
  • Other static assets were mismatched too — different timestamps, different hashes.

Each pod had its own different build.

No wonder the browser was breaking:

  • Requesting main-abc123.js from Pod 1? All good.
  • Requesting the same file from Pod 2? 404 error. Page crash.


The mystery finally had a face.

It wasn’t nginx.

It wasn’t browser caching.

It wasn’t environment variables.

It was a deeper problem: the build process itself was wrong.


The Real Villain: A Tiny Docker Mistake

Once I saw that each pod had a different main.js hash, the real investigation began:

How were the builds ending up different in the first place?

I dug into the Docker image and the start.sh script — and there it was, plain as day.

The Docker image wasn’t being built with production-ready frontend assets baked in.

Instead, inside start.sh, there was a sneaky line:

npm run build

Which meant:

  • Every time a container started,
  • It would build the frontend at runtime inside the container — based on whatever dependency versions and conditions existed at that moment (roughly the script sketched below).
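
Reconstructed from memory, the start script looked something like this (the exact serve command is illustrative; the npm run build line is the real villain):

#!/bin/sh
# start.sh: runs on every container start
npm install            # resolve dependencies at runtime
npm run build          # fresh build, fresh content hashes, on every single pod
npx serve -s dist      # then serve whatever this particular pod produced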

Result?

  • Pod 1 would generate main-abc123.js.
  • Pod 2 would generate main-def456.js.
  • Pod 3 would generate main-ghi789.js.

Each one was self-building differently. Each one had a different set of static files.

Why This Was a Disaster:

The browser expects that assets like main-abc123.js will exist on any pod it hits.

But if the HTML came from Pod 1 and the asset request went to Pod 2? Boom — 404 error.

Classic distributed system mistake: assuming all nodes are identical — when they’re not.

The Fix: Obvious Now, Painful Then

Once I realized the real villain — builds happening at runtime — the fix was surprisingly simple.

I updated the Dockerfile to build the app during the image creation itself:

# Build stage
FROM node:18 AS builder
WORKDIR /app
COPY package.json package-lock.json ./
RUN npm install
COPY . .
RUN npm run build

# Production stage
FROM nginx:alpine
COPY --from=builder /app/dist /usr/share/nginx/html
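
With the assets baked into the image, the build happens once (ideally in CI) and every pod pulls and runs the exact same artifact. Roughly like this (the registry name and tag are placeholders):

# Build once, push once, run everywhere (registry/tag are placeholders)
docker build -t registry.example.com/myapp:1.4.2 .
docker push registry.example.com/myapp:1.4.2

# Quick sanity check: the bundle names are fixed inside the image
docker run --rm registry.example.com/myapp:1.4.2 ls /usr/share/nginx/html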

Now:

  • ✅ The dist/ folder was already fully built and packaged into the image.
  • ✅ Every pod started with the exact same set of static files.
  • ✅ No more random builds. No more mismatched hashes.

I redeployed the updated image.

✅ Scaled the pods.

✅ Held my breath.

✅ And this time — everything just worked.

✅ No 404s.

✅ No broken pages.

✅ No randomness.

Just a clean, consistent, reliable deployment across every pod.


Lessons That Are Now Burned Into My Mind

That Saturday cost me an entire day, several cups of black coffee, and a lot of sanity — but the lessons I learned are priceless:

  • Never build your app at container startup.
  • Treat scaling as part of your development testing.
  • When bugs feel random, suspect environment and deployment first.
  • Don’t just trust configs and assumptions — go inside and verify.
  • Distributed systems demand consistency.

The Ending (and a Small Victory Dance)

That weekend didn’t end with Netflix and chill,

but it did end with a solid deployment, a fixed app, and a lesson burned deep into my engineering instincts.

Now every time I build a Docker image, a little voice inside reminds me:

“Build once, run everywhere. Not build whenever you feel like it.”

This was Episode 1 of “Journey of Debugging Nightmares.”

Follow me to catch the next real-world debugging disaster! 🚀

— Shifa Salheen


About Author

Shifa Salheen

Shifa is a coder by profession, a singer and a bibliophile by passion. She is a keen observer of nature, and when not coding, she scribbles poems and quotes, adoring the vivid beauty around her. She has published a collection of poems named "Tales from the Cafe" and is soon planning to finish her debut fiction novel. Always surrounded by books, she calls coffee and books her eternal lovers.