I’d like to know more about the underlying motivation! APL is a fascinating language, but I’m not sure how you’d implement it efficiently in a language like Go. You would need intrinsics to get at the vector instructions, no?
Even without vector intrinsics, you can get a lot of mileage out of spending almost all your time in loop bodies that map primitive operations over large arrays. Interpretive overhead is paid per primitive rather than per element, so it tends to grow a lot slower than the user program's input size.
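To make that concrete, here is a rough Go sketch of the idea, not how any real APL implementation is written. The `mapBinary` function and its string-keyed dispatch are hypothetical names made up for illustration; the point is only that the `switch` runs once per primitive while the loop body runs once per element.

```go
package main

import "fmt"

// mapBinary is a hypothetical interpreter primitive: it applies a dyadic
// scalar operation element-wise over two equal-length arrays.
func mapBinary(op string, x, y []float64) []float64 {
	if len(x) != len(y) {
		panic("length mismatch")
	}
	out := make([]float64, len(x))
	// Interpretive dispatch happens here, once per primitive call,
	// no matter how many elements the arrays hold.
	switch op {
	case "+":
		for i := range x { // tight loop: one add per element
			out[i] = x[i] + y[i]
		}
	case "*":
		for i := range x { // tight loop: one multiply per element
			out[i] = x[i] * y[i]
		}
	default:
		panic("unknown primitive: " + op)
	}
	return out
}

func main() {
	x := []float64{1, 2, 3}
	y := []float64{10, 20, 30}
	fmt.Println(mapBinary("+", x, y))
	fmt.Println(mapBinary("*", x, y))
}
```

With arrays of a million elements the dispatch cost is the same as with three, so nearly all the time goes to the plain for-loops, which the compiler and CPU handle reasonably well even without explicit vector instructions.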
IIRC J didn't use intrinsics until maybe two years ago. They used mostly C, and everything was built with for-loops. Some special verbs or phrases were, however, implemented in x86 assembly.
I think part of the speed of APL comes from parts of it being small enough to fit in L2 cache (at least one of the APL-descended languages, Q/K, bragged about this). Whether the Go implementation seen here will fit in the caches of today's typical CPUs, I don't know.
The author has never used APL.