
Great stuff. I think there are some potential tricks here to reduce the number of comparisons: maybe do a parallel subtract of k to pull 'A' down to -128 (the smallest possible signed byte), then do a single signed comparison against (ord('Z')-k). Or maybe push 'Z' up to +127 instead?

That way you can get a single comparison and can replace a pcmpgtb and pand with a single subtract. Then switch it to SSE2, unroll and you're good to go.
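To make the subtract-then-compare idea concrete, here is a rough sketch with SSE2 intrinsics. It assumes the goal is ASCII lowercasing, which is my guess at the use case; the names `is_upper_mask` and `ascii_to_lower` are made up for illustration, and the loop is not unrolled.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stddef.h>

/* Hypothetical helper: 0xFF in every byte lane holding 'A'..'Z', else 0x00. */
static __m128i is_upper_mask(__m128i v)
{
    /* Subtracting k = 'A' + 128 (mod 256) maps 'A' to -128, the smallest
       signed byte, so 'A'..'Z' land on -128..-103 and everything else
       lands above that range. */
    const __m128i k     = _mm_set1_epi8((char)('A' + 128));
    const __m128i limit = _mm_set1_epi8((char)(-128 + 26));
    __m128i shifted = _mm_sub_epi8(v, k);          /* psubb */
    /* One signed compare (pcmpgtb): limit > shifted <=> byte was 'A'..'Z'. */
    return _mm_cmpgt_epi8(limit, shifted);
}

/* Assumed use case: lowercase an ASCII buffer in place, 16 bytes at a time. */
static void ascii_to_lower(unsigned char *buf, size_t n)
{
    const __m128i bit5 = _mm_set1_epi8(0x20);
    size_t i = 0;
    for (; i + 16 <= n; i += 16) {
        __m128i v = _mm_loadu_si128((const __m128i *)(buf + i));
        __m128i m = is_upper_mask(v);
        /* Add 0x20 only in the lanes flagged as uppercase. */
        v = _mm_add_epi8(v, _mm_and_si128(m, bit5));
        _mm_storeu_si128((__m128i *)(buf + i), v);
    }
    /* Scalar tail for the last n % 16 bytes. */
    for (; i < n; i++)
        if (buf[i] >= 'A' && buf[i] <= 'Z')
            buf[i] += 0x20;
}
```

The range check costs one psubb and one pcmpgtb; the remaining pand here only selects the 0x20 to add. Bytes with the high bit set are left alone, since after the subtract they fall outside -128..-103.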

Alternatively, http://www.azillionmonkeys.com/qed/asmexample.html in section 11 ("Converting Uppercase") contains a brain-smashing version of this done entirely in SWAR ("SIMD Within a Register"), which could be adapted with a certain amount of pain (largely due to the absence of a double-quadword bit shift in SSE2, since PSLLDQ/PSRLDQ only shift by whole bytes, which is a baffling omission).
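For anyone who hasn't seen SWAR before, here is a minimal sketch of the general idea on a plain 64-bit integer. To be clear, this is a common formulation I'm writing from memory, not the code from the linked page, and `swar_to_lower` is a made-up name; it again assumes lowercasing is the goal.

```c
#include <stdint.h>

/* Hypothetical SWAR helper: lowercases the 8 ASCII bytes packed in x. */
static uint64_t swar_to_lower(uint64_t x)
{
    const uint64_t ones = 0x0101010101010101ULL;
    /* Work on the low 7 bits of each byte so the per-byte adds below
       can never carry into the neighbouring byte. */
    uint64_t low7 = x & (0x7F * ones);
    /* Bit 7 of each byte becomes set when that byte's low 7 bits are > 'Z'. */
    uint64_t gt_z = low7 + (0x7F - 'Z') * ones;
    /* Bit 7 of each byte becomes set when that byte's low 7 bits are >= 'A'. */
    uint64_t ge_a = low7 + (0x80 - 'A') * ones;
    /* Only bytes whose original high bit was clear are ASCII. */
    uint64_t ascii = ~x & (0x80 * ones);
    /* In range means >= 'A' but not > 'Z'; keep only bit 7 of each byte. */
    uint64_t upper = ascii & (ge_a ^ gt_z);
    /* Move the flag from bit 7 down to bit 5 (0x20) and set it. */
    return x | (upper >> 2);
}
```

The flag shift by 2 stays inside each byte, so a 64-bit register is painless; the trouble mentioned above starts when the same trick is stretched across a 128-bit SSE2 register.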


