Kubernetes is really neither here nor there. It's the crashing of the app that is our focus. An app should not be crashing on expected behaviour.
That's clearly a bug, and it's the bug you need to fix first so that your failsafes can start working again. You asked where to start and that's the answer, unquestionably.
The app doesn't crash, it's deadlocked. It can't do any more work because to do future work it needs to accept TCP connections. It can't do that because it has hit a resource limit. It hit the resource limit because it didn't correctly close files. It can't close files because of a bug in the filesystem. You don't know this because you didn't log the errors.
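To make the "you didn't log the errors" step concrete, here is a rough sketch of what checking the result of a close could look like. It is only an illustration: the helper name `close_logged` and the logger name are made up for this example, and nothing here is taken from the app being discussed.

```python
import logging
import os

# Hypothetical logger name, chosen only for this sketch.
log = logging.getLogger("fd-hygiene")


def close_logged(fd: int) -> bool:
    """Close a raw file descriptor and log any failure instead of ignoring it.

    Returns True if the close succeeded, False if the OS reported an error.
    """
    try:
        os.close(fd)
        return True
    except OSError as exc:
        # A failed close is an early warning that descriptors may be leaking,
        # long before the process runs into its open-file limit.
        log.error("close(%d) failed: %s", fd, exc)
        return False
```

With something like this in place, a failing close shows up in the logs when it first happens, rather than surfacing much later as a process that can no longer accept connections.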
I really don't know how I can make my explanation simpler.
> I really don't know how I can make my explanation simpler.
Not making up some elaborate story that you are now trying to say didn't even happen would be a good start. What you are actually trying to communicate is not complicated at all. It didn't need a story. Not sure what you were thinking when you decided fiction writing was a good idea, but I certainly had fun making fun of you for it! So, at least it was not all for naught.
"You wake up and find out that Heroku's staff is anxiously awaiting your departure from your apartment to tell you that your app is down."