Multiprocessing is not a real solution; it's a break-glass procedure for when you just need to throw some cores at something without any hope of reliability. Unless something has changed since I last used Python, it is essentially a wrapper around fork().
This means you need to deal with stuck/dead processes. I've used multiprocessing extensively, and past a certain scale, even in a pool, you just get hangs and unresponsive processes.
I've also written a huge amount of Cython-wrapped C++ code that releases the GIL. It never hangs, and I can multithread there all I want without issue.
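As a rough illustration of that last point (not the commenter's code): the stdlib's hashlib releases the GIL while hashing buffers larger than ~2 KiB, so plain threads get real parallelism there, much like GIL-releasing Cython does:

```python
import hashlib
import threading

def hash_chunk(data: bytes) -> None:
    # hashlib drops the GIL while the C digest code runs on a large
    # buffer, so these threads actually execute in parallel.
    hashlib.sha256(data).hexdigest()

chunks = [bytes(64 * 1024 * 1024) for _ in range(4)]  # 4 x 64 MiB of zeros
threads = [threading.Thread(target=hash_chunk, args=(c,)) for c in chunks]
for t in threads:
    t.start()
for t in threads:
    t.join()
```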
Why would they get stuck/dead, and why wouldn't that happen with threads, which might be even worse since they're more tightly bound? At least with zombies or inactive processes you can detect and kill them externally, if need be.
Haven't played with multiprocessing at scale, so I'm genuinely interested.
If subprocesses die (segfault, maybe), it isn't uncommon for them not to be cleaned up and/or for the parent process to hang while it waits for the zombie to respond. That's one I experienced last week on Python 3.9. A thread that hit the same fault would likely kill the parent process, or maybe even exit with a stack trace. Way easier to debug, and it doesn't require me to search through running tasks and manually kill them after each debug cycle.
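For the curious, that failure mode is easy to reproduce. A minimal sketch, with a ctypes null-pointer read standing in for any native crash; on affected versions (reportedly including 3.9) the pool.map call below blocks forever:

```python
import ctypes
import multiprocessing

def crash(_):
    # Read from address 0: the worker dies with SIGSEGV, and the parent
    # is left waiting on a result that will never arrive.
    ctypes.string_at(0)

if __name__ == "__main__":
    with multiprocessing.Pool(2) as pool:
        pool.map(crash, range(2))  # hangs on affected versions
```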
My impression is that the multiprocessing module is a heroic effort, but unfortunately making the whole system work transparently across multiple OSs and architectures is a nearly insurmountable problem.
It provides a nice interface but uses multiprocessing or multithreading under the hood, depending on which executor you use:
> The ProcessPoolExecutor class is an Executor subclass that uses a pool of processes to execute calls asynchronously. ProcessPoolExecutor uses the multiprocessing module, which allows it to side-step the Global Interpreter Lock but also means that only picklable objects can be executed and returned.
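For reference, a minimal sketch of that interface (the squaring function is just a placeholder for a real CPU-bound task):

```python
import concurrent.futures

def work(n: int) -> int:
    return n * n  # placeholder for a CPU-bound task

if __name__ == "__main__":
    # Each call runs in a separate worker process, so the GIL is no
    # obstacle; work() and its arguments/results must all be picklable.
    with concurrent.futures.ProcessPoolExecutor(max_workers=4) as pool:
        print(list(pool.map(work, range(10))))
```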
Your trouble seems to involve not understanding how to set up signal handlers, which ProcessPoolExecutor handles for you and exposes via a BrokenProcessPool exception.
> Derived from BrokenExecutor (formerly RuntimeError), this exception class is raised when one of the workers of a ProcessPoolExecutor has terminated in a non-clean fashion (for example, if it was killed from the outside).
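In practice that means the parent gets an exception instead of a silent hang. A sketch, with SIGKILL standing in for an external kill (Unix-only):

```python
import concurrent.futures
import os
import signal
from concurrent.futures.process import BrokenProcessPool

def suicide() -> None:
    # Simulate a worker being killed from the outside.
    os.kill(os.getpid(), signal.SIGKILL)

if __name__ == "__main__":
    with concurrent.futures.ProcessPoolExecutor(max_workers=1) as pool:
        try:
            pool.submit(suicide).result()
        except BrokenProcessPool:
            print("a worker died abnormally; the pool is now unusable")
```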
Always setting a timeout on every IPC or network operation helps immensely. IIRC the multiprocessing module allows that everywhere, but it defaults to waiting forever in a couple of places.
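For example (a sketch; the 30-second value is arbitrary):

```python
import multiprocessing

def work(n: int) -> int:
    return n * n  # placeholder task

if __name__ == "__main__":
    with multiprocessing.Pool(4) as pool:
        async_result = pool.apply_async(work, (21,))
        # get() with no argument blocks forever; an explicit timeout
        # turns a hung worker into a catchable TimeoutError.
        try:
            print(async_result.get(timeout=30))
        except multiprocessing.TimeoutError:
            print("worker did not respond in time")
```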
Zombies don't respond; they merely have to be wait()'d for, which should take microseconds at most.
I've seen orphaned processes that were sometimes idle, sometimes busy doing god knows what.
But zombies, OTOH, are rarely a problem and are easy to deal with.
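Reaping one really is trivial. A sketch (Unix-only, since it forks directly):

```python
import os

# Fork a child that exits immediately; until the parent wait()s on it,
# it lingers in the process table as a zombie.
pid = os.fork()
if pid == 0:
    os._exit(0)  # child exits right away

# Reaping the zombie: returns almost instantly once the child is dead.
reaped_pid, status = os.waitpid(pid, 0)
print(reaped_pid, os.WEXITSTATUS(status))
```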
Perhaps Python's desire to be Windows-compatible militates against a design more suitable for Unix.
If processes were a universal substitute for threads, we wouldn't have threads. That reasoning only gets stronger once you factor in Python's heavy limitations, and strongest of all once you've experienced the awkwardness of multiprocessing firsthand.
There isn't much difference on Linux between threads and processes that share memory. Multiprocessing is fine; it's just slightly more isolated threads.
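The stdlib exposes that shared-memory style directly. A minimal sketch using multiprocessing.shared_memory (Python 3.8+):

```python
from multiprocessing import Process, shared_memory

def writer(name: str) -> None:
    # Attach to the existing segment by name and mutate it in place.
    shm = shared_memory.SharedMemory(name=name)
    shm.buf[0] = 42
    shm.close()

if __name__ == "__main__":
    shm = shared_memory.SharedMemory(create=True, size=16)
    p = Process(target=writer, args=(shm.name,))
    p.start()
    p.join()
    print(shm.buf[0])  # 42: the child's write is visible to the parent
    shm.close()
    shm.unlink()  # release the segment
```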
multiprocessing is a very good solution for scatter-and-gather (or map/reduce) type workloads:
for example, ssh to 1000 machines, run some commands, grab the output, analyze it, take some action based on it, etc.
If you are managing a fleet of machines and have some tasks to do on each one, multiprocessing is a lifesaver.
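A sketch of that pattern (the host names and command are placeholders, and subprocess's own timeout guards against one stuck box):

```python
import subprocess
from multiprocessing import Pool

HOSTS = [f"host{i:04d}" for i in range(1000)]  # hypothetical fleet

def check_host(host):
    # Scatter: each worker sshes to one machine and grabs the output.
    result = subprocess.run(
        ["ssh", host, "uptime"],
        capture_output=True, text=True, timeout=30,
    )
    return host, result.stdout.strip()

if __name__ == "__main__":
    with Pool(32) as pool:
        # Gather: collect per-host results and act on them here.
        for host, output in pool.imap_unordered(check_host, HOSTS):
            print(host, output)
```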
There is a "fork" mode and a "spawn" mode. Fork (the default on Unix) tends to result in broken process pools, as you say; spawn seems to work a lot better, but the performance is worse.
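Switching is a one-liner. A sketch:

```python
import multiprocessing

def work(n: int) -> int:
    return n * n  # placeholder task

if __name__ == "__main__":
    # "spawn" starts each worker as a fresh interpreter instead of
    # fork()ing the parent: slower startup, but no inherited locks,
    # threads, or other post-fork surprises.
    ctx = multiprocessing.get_context("spawn")
    with ctx.Pool(4) as pool:
        print(pool.map(work, range(8)))
```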
I'm not a huge fan of Cython and the like. It seems more natural to open a TCP connection to a C/C++ program and let that do the heavy lifting. Anything else doesn't feel like a proper UNIX-style solution.
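The client side of that arrangement is simple enough. A sketch (the port and the newline-delimited protocol are hypothetical, and the C/C++ server is assumed to already be listening):

```python
import socket

# Hypothetical native worker on localhost:9000 that reads one
# newline-terminated request and writes one newline-terminated reply.
with socket.create_connection(("127.0.0.1", 9000), timeout=10) as sock:
    sock.sendall(b"compute 12345\n")
    reply = sock.makefile("rb").readline()
    print(reply.decode().strip())
```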