Interesting architecture. I like how well documented everything is. Usually, either the low-level ISA for accelerator chips is not documented at all (as with GPUs), or detailed documentation is available only under NDA and the only development tools are proprietary (as with FPGAs).
The topology reminds me of this paper: "The Landscape of Parallel Computing Research: A View from Berkeley" http://www.eecs.berkeley.edu/Pubs/TechRpts/2006/EECS-2006-18...