The semantics of C map poorly onto this kind of code. The kinds of decisions that need to be made here, like combining certain state values into single registers so they can be operated on in parallel, and the particular structure of the branching, would be difficult to retrofit onto a C-style program.
If you're excited about it, though, I'd encourage you to give it a shot and write a blog post about your experience! However hard it turns out to be, I bet we'd all learn something new. (:
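For anyone who wants a starting point, here is a minimal C sketch of the general idea of packing board state into single machine words. It is not the article's code, and it does not reproduce its register layout or branching. It is the well-known bitmask formulation of N-queens, where the occupied columns and both diagonal directions each live in one integer and a whole row is tested with a few bitwise operations. The names `place` and `count_queens_bits` are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical sketch: each argument is a bitset over the columns of the
 * current row. cols = columns already taken, diag_l / diag_r = squares
 * attacked along the two diagonal directions. */
static uint64_t place(uint32_t all, uint32_t cols, uint32_t diag_l, uint32_t diag_r)
{
    if (cols == all)            /* every column holds a queen: one solution */
        return 1;

    uint64_t count = 0;
    /* One AND/OR/NOT sequence tests every square of the row at once. */
    uint32_t free_bits = all & ~(cols | diag_l | diag_r);

    while (free_bits) {
        uint32_t bit = free_bits & (0u - free_bits);  /* lowest free square */
        free_bits ^= bit;
        /* Moving to the next row shifts the diagonal attacks by one column. */
        count += place(all,
                       cols | bit,
                       ((diag_l | bit) << 1) & all,
                       (diag_r | bit) >> 1);
    }
    return count;
}

/* Number of ways to place n non-attacking queens on an n x n board. */
uint64_t count_queens_bits(int n)
{
    assert(n >= 1 && n <= 31);  /* the state must fit in a 32-bit word */
    uint32_t all = (1u << n) - 1u;
    return place(all, 0u, 0u, 0u);
}
```

Calling `count_queens_bits(8)` returns 92, the number of solutions on the standard board. Whether a compiler turns this into anything close to hand-written code is exactly the open question raised above.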
I wonder how my old C code stacks up to this as well. 15ms seems like an eternity, btw. My naive old code, with no thought put into it (no bit magic, no intrinsics) and only the -Ofast flag, iterates over all the solutions and prints the first one to the screen in 10.1ms on an i7 3770 running Windows 7.
If the output is redirected to a file, it takes 0.232ms, again to iterate over all the solutions and print the first one.
It would be nice if the authors provided some way to compare it to their code. Preferably on, say, 15 queens instead of 8, so pretty-printing time could be ignored :)
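To make that kind of comparison concrete, here is a rough sketch of a naive counting benchmark in C. It is not the code I timed above, just an illustration of the same no-tricks approach. It counts solutions without printing any board, so output cost stays out of the measurement. The names `count_queens_naive` and `time_count_ms` and the `MAX_N` limit are made up, and `clock()` measures CPU time at coarse resolution, so small boards may report 0ms.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <time.h>

#define MAX_N 20                 /* hypothetical upper bound on board size */

static int queen_col[MAX_N];     /* queen_col[r] = column of the queen in row r */

/* Check the candidate square against every queen placed in earlier rows. */
static int is_safe(int row, int col)
{
    for (int r = 0; r < row; r++) {
        int c = queen_col[r];
        if (c == col || abs(c - col) == row - r)
            return 0;            /* same column or same diagonal */
    }
    return 1;
}

/* Plain backtracking: try every column in this row, recurse on the next. */
static uint64_t solve_from(int row, int n)
{
    if (row == n)
        return 1;
    uint64_t count = 0;
    for (int col = 0; col < n; col++) {
        if (is_safe(row, col)) {
            queen_col[row] = col;
            count += solve_from(row + 1, n);
        }
    }
    return count;
}

uint64_t count_queens_naive(int n)
{
    assert(n >= 1 && n <= MAX_N);
    return solve_from(0, n);
}

/* Returns elapsed CPU time in milliseconds; stores the count in *solutions. */
double time_count_ms(int n, uint64_t *solutions)
{
    clock_t start = clock();
    *solutions = count_queens_naive(n);
    clock_t end = clock();
    return 1000.0 * (double)(end - start) / CLOCKS_PER_SEC;
}
```

Calling `time_count_ms(15, &solutions)` from a small `main` and printing both values gives a number to compare against. A correct solver should report 2279184 solutions for 15 queens, which doubles as a sanity check before trusting any timing.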