I implemented your "free 2x speedup" in asm [1]. Sure enough, solution time improved to 4.6us on my machine [2].
[1] https://github.com/davidad/8queens/commit/61119a7f0019e4a85e... [2] https://github.com/davidad/8queens/tree/free-2x-speedup
I implemented your "free 2x speedup" in asm [1]. Sure enough, solution time improved to 4.6us on my machine [2].
[1] https://github.com/davidad/8queens/commit/61119a7f0019e4a85e... [2] https://github.com/davidad/8queens/tree/free-2x-speedup