A lot of people get this backwards, believing that a stable program should never "crash". Whi...

yellowapple · on June 1, 2015

This is how Erlang (for example) gets its reputation for being "nine-nines" capable (i.e. capable of 99.9999999% uptime, or roughly 31.5 milliseconds of downtime per year). Erlang (and Elixir and LFE) software following the OTP framework is usually organized into "supervision trees": layer upon layer of Erlang processes supervising other Erlang processes, which in turn supervise still others, all potentially distributed across multiple Erlang VM (nowadays BEAM) instances.
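To make the supervision-tree idea concrete, here is a rough Python sketch of a one-for-one supervisor. It is an analogy only, not Erlang/OTP code: `Supervisor`, `run_child` and `RestartIntensityExceeded` are hypothetical names, and the "at most N restarts per period" rule only loosely mirrors OTP's restart intensity and period. A supervisor that gives up escalates the failure to its own parent. Unlike real Erlang processes, these children run inside one Python process and share memory, so the sketch shows the restart logic, not the isolation.

```python
import time
from typing import Callable, List, TypeVar

T = TypeVar("T")


class RestartIntensityExceeded(Exception):
    """Raised when a child crashes more often than its supervisor tolerates."""


class Supervisor:
    """Hypothetical one-for-one supervisor, loosely modeled on OTP's idea.

    A child is any zero-argument callable. If it raises, the supervisor
    restarts it by calling it again. If more than `max_restarts` restarts
    happen within `period` seconds, the supervisor gives up and raises
    RestartIntensityExceeded, escalating to whoever runs the supervisor.
    """

    def __init__(self, max_restarts: int = 3, period: float = 5.0) -> None:
        self.max_restarts = max_restarts
        self.period = period
        self.restart_times: List[float] = []
        self.log: List[str] = []

    def _record_restart(self, name: str) -> None:
        now = time.monotonic()
        # Keep only the restarts that fall inside the sliding time window.
        self.restart_times = [t for t in self.restart_times if now - t < self.period]
        self.restart_times.append(now)
        if len(self.restart_times) > self.max_restarts:
            raise RestartIntensityExceeded(f"{name} restarted too often")
        self.log.append(f"restart {name}")

    def run_child(self, name: str, child: Callable[[], T]) -> T:
        while True:
            try:
                return child()
            except Exception as exc:
                self.log.append(f"crash {name}: {exc!r}")
                self._record_restart(name)  # may raise and escalate


# Demo 1: a worker that crashes twice, then succeeds.
attempts = {"worker": 0}


def flaky_worker() -> str:
    attempts["worker"] += 1
    if attempts["worker"] < 3:
        raise ValueError("boom")
    return "ok"


top = Supervisor(max_restarts=3, period=5.0)
result = top.run_child("worker", flaky_worker)
print(result, top.log)


# Demo 2: a two-level tree. The leaf always fails, the sub-supervisor
# gives up, and the root restarts the whole subtree until it gives up too.
def always_fails() -> str:
    raise RuntimeError("bad state")


def run_subtree() -> str:
    # Every (re)start builds a brand-new sub-supervisor with an empty
    # restart history, i.e. the subtree is restarted from a clean slate.
    sub = Supervisor(max_restarts=1, period=60.0)
    return sub.run_child("doomed", always_fails)


root = Supervisor(max_restarts=2, period=60.0)
try:
    root.run_child("sub", run_subtree)
    escalated = False
except RestartIntensityExceeded:
    escalated = True  # the failure reached the top of the tree
print(escalated, len(root.log))
```

OTP supervisors also offer other restart strategies (one_for_all, rest_for_one) and run each child as its own lightweight process; this sketch leaves both out.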

tormeh · on June 1, 2015

A watchdog pattern just splits the program into several processes; the program as a whole still never crashes.
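A minimal watchdog sketch in Python, assuming the supervised program can be launched as a command: the child is a separate OS process that may crash, and the watchdog loop restarts it. `watchdog` and the demo child script are hypothetical; the child's exit code stands in for "crashed or not".

```python
import os
import subprocess
import sys
import tempfile
from typing import List


def watchdog(cmd: List[str], max_restarts: int = 3) -> int:
    """Run `cmd` as a child process, restarting it whenever it exits non-zero.

    Returns how many restarts were needed; raises if the child keeps failing.
    """
    restarts = 0
    while True:
        returncode = subprocess.run(cmd).returncode
        if returncode == 0:
            return restarts
        if restarts >= max_restarts:
            raise RuntimeError(f"child kept failing (last exit code {returncode})")
        restarts += 1


# Demo child: exits with code 1 on its first two runs, then exits cleanly.
# It keeps its run counter in a file because every restart is a fresh
# process that remembers nothing from the previous run.
CHILD_SOURCE = """
import sys
path = sys.argv[1]
with open(path) as f:
    runs = int(f.read())
with open(path, "w") as f:
    f.write(str(runs + 1))
sys.exit(1 if runs < 2 else 0)
"""

counter_path = os.path.join(tempfile.mkdtemp(), "runs.txt")
with open(counter_path, "w") as f:
    f.write("0")

restarts = watchdog([sys.executable, "-c", CHILD_SOURCE, counter_path])
with open(counter_path) as f:
    total_runs = int(f.read())
print(f"restarts={restarts}, total runs={total_runs}")
```

Whether this counts as "the program" crashing depends on where you draw the boundary: around the child, which really did exit abnormally twice, or around the watchdog plus child, which kept running throughout.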

peteretep · on June 2, 2015

If an error is caught and handled, calling it a crash seems disingenuous.

z3t4 · on June 2, 2015

One mistake people make is wrapping their code in a try ... catch when it would be better to throw an error and exit. If there's an error in one place, chances are there are errors elsewhere too, so it's better to restart the program than to continue with a bad state.
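A small Python illustration of the difference, under made-up assumptions: `transfer` has a deliberate bug (it debits before checking the balance), `apply_swallowing` is the catch-everything style, and `apply_fail_fast` lets the error escape after working on a copy, so a "restart" here simply means discarding that copy. All names are hypothetical.

```python
import copy
from typing import Dict, List, Tuple

Accounts = Dict[str, int]
Transfer = Tuple[str, str, int]


def transfer(accounts: Accounts, src: str, dst: str, amount: int) -> None:
    """Deliberately buggy: it debits *before* validating, so a failure
    leaves the debit applied and the money half-moved."""
    accounts[src] -= amount
    if accounts[src] < 0:
        raise ValueError(f"overdraft on {src}")
    accounts[dst] += amount


def apply_swallowing(accounts: Accounts, transfers: List[Transfer]) -> Accounts:
    """The try/catch-everything style: errors are hidden and the loop goes on."""
    for t in transfers:
        try:
            transfer(accounts, *t)
        except Exception:
            pass  # the error vanishes, but the half-applied debit stays
    return accounts


def apply_fail_fast(accounts: Accounts, transfers: List[Transfer]) -> Accounts:
    """Let errors propagate; mutate a copy so a crash discards all of it."""
    working = copy.deepcopy(accounts)
    for t in transfers:
        transfer(working, *t)  # any exception escapes to the caller
    return working


batch = [("alice", "bob", 5), ("alice", "bob", 20)]

swallowed = apply_swallowing({"alice": 10, "bob": 0}, batch)
print("swallowed:", swallowed, "total:", sum(swallowed.values()))

known_good = {"alice": 10, "bob": 0}
try:
    known_good = apply_fail_fast(known_good, batch)
    crashed = False
except ValueError as exc:
    crashed = True  # a supervisor would log this and restart from known_good
    print("crashed:", exc, "state kept:", known_good)
```

The swallowing version finishes "successfully" with balances that no longer add up, while the fail-fast version commits nothing. It also throws away the first, perfectly valid transfer, which is exactly the cost of discarding state on a crash.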

When the error gets thrown in your face, there's a higher chance that it gets fixed.

But this also has its drawbacks. Losing the whole state can be really bad.

peteretep · on June 2, 2015

I'm really not sure why you think catching an error in a separate process is somehow superior to catching it in a higher scope.

sacado2 · on June 2, 2015

I think it depends on the kind of error. If it is a "bug-detected" error (null-pointer dereference, out-of-bounds access, divide-by-zero, out-of-memory, etc.), you'd better restart the program, since you're in an unstable state. If it is a "domain-specific" error (connection lost, robot could not reach its destination, battery low, etc.), you'd better deal with it as soon as possible.
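One way to encode that split, sketched in Python: retry only the error types you have declared domain-specific, and let everything else propagate to whatever supervises the caller. Which exceptions belong in `DOMAIN_ERRORS` is an assumption made for this example (here `ConnectionError` and `TimeoutError`); `call_handling_domain_errors` and the demo functions are hypothetical.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")

# Assumption for this sketch: which exception types count as "domain-specific"
# (expected, recoverable, handled right here) is a per-application decision.
# Anything else (ZeroDivisionError, IndexError, MemoryError, AttributeError
# on None, ...) is treated as a bug and allowed to crash the caller.
DOMAIN_ERRORS = (ConnectionError, TimeoutError)


def call_handling_domain_errors(op: Callable[[], T], retries: int = 3, delay: float = 0.0) -> T:
    """Retry `op` on domain errors; let bug-type errors propagate untouched."""
    attempt = 0
    while True:
        try:
            return op()
        except DOMAIN_ERRORS:
            if attempt >= retries:
                raise  # still failing after all retries: escalate
            attempt += 1
            time.sleep(delay)


calls = {"fetch": 0, "bug": 0}


# Demo 1: a flaky connection recovers on the third try.
def flaky_fetch() -> str:
    calls["fetch"] += 1
    if calls["fetch"] < 3:
        raise ConnectionError("link down")
    return "payload"


payload = call_handling_domain_errors(flaky_fetch)


# Demo 2: a divide-by-zero bug is not retried; it surfaces immediately.
def buggy_average() -> float:
    calls["bug"] += 1
    readings: list = []
    return sum(readings) / len(readings)  # ZeroDivisionError on empty input


try:
    call_handling_domain_errors(buggy_average)
    bug_surfaced = False
except ZeroDivisionError:
    bug_surfaced = True

print(payload, calls, bug_surfaced)
```

The domain error is absorbed locally after two retries, while the bug is called exactly once and escapes, leaving the restart decision to a supervisor or watchdog further up.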