I think some of APL's speed comes from parts of it being small enough to fit in L2 cache (at least one of the APL-descended languages, Q / K, bragged about this). Whether the Go implementation seen here will fit in the caches of today's typical CPUs, I don't know.