These don't really apply to the parent commenter's scenario.
1) gunicorn or any solution with multiple processes is going to just multiply the RAM usage. Using 10-100GB of RAM per effective thread makes this sort of problem very RAM bound, to the point that it can be hard to find hardware or VM support.
2) This isn't I/O bound.
3) If your service is fundamentally just looking up data in a huge in-memory data store, adding LRU caching around that is unlikely to make much of a difference, because a) you're still doing a lookup in memory, just in the cache rather than the real data, and b) you're still subject to the GIL for those cache lookups.
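To make point 3 concrete, here's a minimal sketch (the store and key scheme are made up for illustration): the cache hit and the direct dict access are both single in-memory hash lookups, and both run under the GIL.

```python
from functools import lru_cache

# Hypothetical in-memory store standing in for the huge dataset.
STORE = {i: f"record-{i}" for i in range(1_000_000)}

@lru_cache(maxsize=100_000)
def lookup(key):
    return STORE[key]

# A cache hit in lru_cache is itself a dict lookup on the key, held
# under the GIL -- essentially the same cost profile as hitting STORE
# directly, so the cache buys you very little for this workload.
```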
I've also written services like this. We only loaded ~5GB of data, but that was enough to be difficult to manage in ways like these. The GIL-ectomy will probably have a significant impact on these sorts of use cases.
Ha! Yes! Unfortunately I know this for terrible reasons. Python objects are reference counted, so copy-on-write doesn't work for them: merely reading an object from another process's forked pages bumps its refcount, which writes to the page and forces the OS to copy it. (Note: if your Python object is actually just a reference to a native object in a library, all bets are off; it may work or may not.)
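You can see the mechanism directly: just binding another name to an object rewrites its refcount field, which lives inside the object's memory page. A small sketch (the dict here is a stand-in for a big preloaded dataset):

```python
import sys

# Stand-in for a large object loaded before fork in a real service.
data = {"key": "value"}

before = sys.getrefcount(data)
ref = data  # merely taking a reference writes to data's refcount field
after = sys.getrefcount(data)

# That refcount write dirties the page holding the object header, so a
# forked child that only *reads* the data still triggers copy-on-write.
```

CPython 3.7+ added gc.freeze(), which moves existing objects out of the GC generations before forking; it reduces (but doesn't eliminate) this page-dirtying effect.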
We had an issue with the service I mentioned above where VMs with ~6GB RAM weren't working: at the point that gunicorn forked, RAM usage instantaneously exceeded 10GB because everything got copied. We had to make sure the data file was only loaded after the daemon fork, which unfortunately limits the benefit of that fork; part of the idea is that you do all your setup before forking so that you know you've started cleanly.
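That load-after-fork workaround can be expressed in a gunicorn config file; this is a sketch, not our actual config, and load_data() is a hypothetical stand-in for whatever populates the in-memory store:

```python
# gunicorn.conf.py (sketch)

preload_app = False  # keep the master from importing the app (and its data)

def load_data():
    # Placeholder: in a real service this would read the multi-GB data
    # file into memory.
    pass

def post_fork(server, worker):
    # gunicorn calls this hook in each worker just after forking, so
    # every worker loads its own copy here rather than copying (and
    # momentarily doubling) the master's pages at fork time.
    load_data()
```

The trade-off is exactly the one described above: each worker pays the load time, and a broken data file now surfaces per-worker instead of failing fast in the master before the fork.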