*Disclaimer The author has never used APL* I’d like to know more about the under...

kd0amg · on Jan 25, 2019

Even without vector intrinsics, you can get a lot of mileage out of spending almost all your time in loop bodies that map primitive operations over large arrays. Interpretive overhead tends to grow much more slowly than the user program's input size.
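To make that concrete, here is a minimal sketch in Go of the pattern described: the interpreter pays its dispatch cost once per primitive, and the rest of the time goes to a plain loop over the array. The `apply` function and its string-keyed dispatch are hypothetical, made up for illustration, and are not taken from any real APL, J, or Go implementation.

```go
package main

import "fmt"

// apply is a hypothetical interpreter entry point for dyadic primitives.
// The switch on the operator name runs once per call (the "interpretive
// overhead"); the loop inside each case runs once per element.
func apply(name string, x, y []float64) ([]float64, error) {
	if len(x) != len(y) {
		return nil, fmt.Errorf("length mismatch: %d vs %d", len(x), len(y))
	}
	out := make([]float64, len(x))
	switch name {
	case "+":
		for i := range x {
			out[i] = x[i] + y[i]
		}
	case "*":
		for i := range x {
			out[i] = x[i] * y[i]
		}
	default:
		return nil, fmt.Errorf("unknown primitive %q", name)
	}
	return out, nil
}

func main() {
	x := []float64{1, 2, 3}
	y := []float64{10, 20, 30}
	sum, err := apply("+", x, y)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(sum)
}
```

With a million-element array the dispatch still happens only once, so its share of the total run time shrinks as the input grows.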

pvitz · on Jan 25, 2019

IIRC J didn't use intrinsics until maybe two years ago. It was written mostly in C, and everything was built with for-loops. Some special verbs or phrases were, however, implemented in x86 assembly.

jerf · on Jan 25, 2019

Go assembler can get to vector instructions, for instance as used by Cloudflare to implement crypto speedups: https://blog.cloudflare.com/go-crypto-bridging-the-performan...

grumpydba · on Jan 25, 2019

Indeed, but without intrinsics, functions using those vector instructions cannot be inlined, and the overhead of the Go calling convention is still big.

patrickg_zill · on Jan 26, 2019

I think part of the speed of APL comes from parts of it being small enough to fit in L2 cache (at least one of the APL-descended languages, Q/K, bragged about this). Whether the Go implementation seen here will fit in the caches of today's typical CPUs, I don't know.