I had this misunderstanding for a long time until I saw the Go talk explaining the difference: https://go.dev/blog/waza-talk

The confusion here is parallelism vs. concurrency. Parallelism is executing multiple tasks at the same time; concurrency is the composition of multiple, independently executing tasks.

For example, imagine a woodshop with multiple people but only one hammer. Each person is working on their own project, such as a chair or a table, and everyone needs the hammer at some point to keep their project moving.

When someone needs the hammer, they take it and use it. The other projects are still in progress, but everyone else who needs the hammer has to wait until it is free. This is concurrency but not parallelism.

If there are multiple hammers, then multiple people can hammer at the same time and their projects keep moving. This is both concurrency and parallelism.

The hammer here is a CPU core and the projects are threads (or tasks). With Python concurrency, you are sharing the hammer across different projects, but it's still one hammer. This is useful for dealing with blocking I/O but not for compute bottlenecks.

Let's say one of the projects needs wood from another place. There is no point in that project holding on to the hammer while it waits for the wood. This is what Python's concurrency libraries solve. In real life, you have tasks waiting on other services, such as getting customer info from a database. You don't want a task wasting CPU cycles doing nothing, so the CPU is passed to another task instead.
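To make that concrete, here is a minimal sketch using asyncio. The customer lookup and its name are just for illustration; the database call is faked with asyncio.sleep. The three lookups overlap their waiting, so the whole thing finishes in roughly one second instead of three:

    import asyncio

    async def fetch_customer(customer_id):
        # Stand-in for a database/network call: the task gives up the
        # "hammer" (CPU) while it waits, instead of busy-waiting.
        await asyncio.sleep(1)
        return {"id": customer_id, "name": f"customer-{customer_id}"}

    async def main():
        # All three waits overlap on a single core, so this takes ~1s, not ~3s.
        results = await asyncio.gather(*(fetch_customer(i) for i in range(3)))
        print(results)

    asyncio.run(main())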

But this doesn't mean we are using more of the CPU. We are still stuck with a single core. If we have a compute bottleneck, such as crunching a lot of numbers, the concurrency libraries don't help.
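Here is a rough sketch of that limitation (pure-Python number crunching; exact timings will vary by machine). Running the work on four threads takes about as long as running it four times in a row, because only one thread can hold the interpreter at a time:

    import time
    from concurrent.futures import ThreadPoolExecutor

    def crunch(n):
        # Pure-Python compute work: holds the GIL the whole time.
        total = 0
        for i in range(n):
            total += i * i
        return total

    N = 5_000_000

    start = time.perf_counter()
    for _ in range(4):
        crunch(N)
    print(f"sequential: {time.perf_counter() - start:.2f}s")

    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=4) as pool:
        list(pool.map(crunch, [N] * 4))
    # Roughly the same as (or worse than) the sequential run.
    print(f"4 threads:  {time.perf_counter() - start:.2f}s")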

You might be wondering why Python only allows a single hammer/CPU core. It's because getting parallelism right is very hard; your program can easily stall if you don't do it correctly. Python's underlying data structures were never designed with parallelism in mind, because it was meant to be a scripting language where performance wasn't key. Then Python grew massive and people started applying it to areas where performance was key. It's amazing that Python got this far even with the GIL, IMO.

As an aside, you might read about "multiprocessing" in Python, which does let you use multiple CPU cores. This is true, but it comes with heavy overhead costs. It's like building brand-new workshops, each with its own single hammer, to handle more projects. This post would get even longer if I explained what a "process" is, but to put it shortly, it is how the OS, such as Windows or Linux, manages running programs. There is a lot of overhead because processes have to work with all sorts of different programs written in different languages.
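For completeness, a minimal multiprocessing sketch, reusing the same illustrative crunch function as above. Each worker is its own process with its own interpreter and its own hammer, but starting those processes and shipping arguments and results between them is the overhead mentioned above, so it only pays off for chunky compute tasks:

    from multiprocessing import Pool

    def crunch(n):
        # Same pure-Python work as before, but each worker runs it in a
        # separate process, so the tasks really do run in parallel.
        total = 0
        for i in range(n):
            total += i * i
        return total

    if __name__ == "__main__":
        # Process startup and pickling of arguments/results is the overhead;
        # for tiny tasks it can easily outweigh the parallel speedup.
        with Pool(processes=4) as pool:
            print(pool.map(crunch, [5_000_000] * 4))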


