In practice, though, if you want the popcount of anything larger than a couple of registers, you'll want to use vector instructions anyway. There are a lot of operations that would speed up certain applications if made into their own instruction; not all of them need to be.