This article doesn't seem to make a strong case that a hierarchical tree is the best model for language (or motor control, or image recognition). Just because you can model something as a tree doesn't mean that's the most parsimonious or effective model. The relative lack of success so far of symbolic, parse-tree-based techniques in NLP, compared with techniques grounded in other models, should be a strong hint that trees are not the best map of the territory.
I do not work on NLP, but my understanding was that, for purely syntactic work, standard parse-tree-based techniques had been quite successful in NLP; and that it is only for semantic work that symbolic representations begin to show weaknesses. Since we often care about the meanings of words, this is a pretty strong limitation; still, it suggests that standard grammar-and-parse-tree approaches do capture something significant about how human languages work.
I don't work in NLP either, but as I understand it, the boundary between syntax and semantics is never as clean as one might imagine, and each language draws it differently, so how useful it is to look at syntax alone varies a great deal.
Another issue is that many real-life sentences have more than one possible parse, and we use context and semantics to disambiguate. For example, how do you parse 'fruit flies like a banana'? Either 'fruit flies' is the subject of 'like', or 'fruit' is the subject of 'flies'.
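To make the ambiguity concrete, here is a minimal sketch in Python, assuming a hypothetical toy grammar in Chomsky normal form (the rules and lexicon below are made up for illustration, not taken from any real parser). A CKY chart that counts derivations instead of merely recording them finds two parse trees for the sentence, one per reading:

```python
from collections import defaultdict

# Hypothetical toy grammar in Chomsky normal form: (parent, left, right).
BINARY = [
    ("S", "NP", "VP"),
    ("NP", "N", "N"),     # "fruit flies" as a compound noun
    ("NP", "Det", "N"),   # "a banana"
    ("VP", "V", "NP"),    # "like a banana" with "like" as a verb
    ("VP", "V", "PP"),    # "flies like a banana" with "flies" as a verb
    ("PP", "P", "NP"),    # "like a banana" as a prepositional phrase
]

# Hypothetical lexicon: a word may belong to several categories.
LEXICON = {
    "fruit": {"N", "NP"},
    "flies": {"N", "V"},
    "like": {"V", "P"},
    "a": {"Det"},
    "banana": {"N"},
}


def count_parses(words, start="S"):
    """Count distinct parse trees of `words` rooted at `start` (CKY with counts)."""
    n = len(words)
    # chart[i][j][X] = number of ways category X derives words[i:j]
    chart = [[defaultdict(int) for _ in range(n + 1)] for _ in range(n + 1)]
    for i, word in enumerate(words):
        for cat in LEXICON[word]:
            chart[i][i + 1][cat] += 1
    for span in range(2, n + 1):
        for i in range(n - span + 1):
            j = i + span
            for k in range(i + 1, j):  # split point between the two children
                for parent, left, right in BINARY:
                    chart[i][j][parent] += chart[i][k][left] * chart[k][j][right]
    return chart[0][n][start]


print(count_parses("fruit flies like a banana".split()))  # 2
```

The grammar can only report that both readings exist; choosing between them is exactly where context and semantics come in.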
The problem with what you're saying (that grammars are not necessarily the best representations of complex hierarchical structures, with which I agree) is that anything that can represent a complex hierarchical structure as well as a grammar must necessarily be equivalent in computational power to that grammar. Unfortunately, we know of no computational process that cannot, at least in principle, be expressed as a grammar.
"Unfortunately" because this means that any non-grammar representation of such a process that you might prefer will either be reducible to a grammar or fail to capture the expressive power of the process being modelled.
So, sure, a "tree" may not be the best way to model natural language. But we don't have the theoretical tools to figure out what a better representation would be.
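As an illustration of grammars reaching past what context-free parse trees can express, here is a minimal sketch using the textbook monotonic grammar for a^n b^n c^n (n >= 1), a language no context-free grammar can generate. The `generate` helper is hypothetical: it simply searches breadth-first over rewrites of sentential forms and prunes by length, which is safe only because no rule in this grammar shortens a string.

```python
from collections import deque

# Monotonic (context-sensitive) grammar for { a^n b^n c^n : n >= 1 }.
# Uppercase letters are nonterminals; lowercase letters are terminals.
RULES = [
    ("S", "aSBC"),
    ("S", "aBC"),
    ("CB", "BC"),   # reorder nonterminals so all B's precede all C's
    ("aB", "ab"),
    ("bB", "bb"),
    ("bC", "bc"),
    ("cC", "cc"),
]


def generate(max_len):
    """Return all terminal strings of length <= max_len derivable from S."""
    seen = {"S"}
    queue = deque(["S"])
    results = set()
    while queue:
        form = queue.popleft()
        if form.islower():  # no nonterminals left
            results.add(form)
            continue
        for lhs, rhs in RULES:
            pos = form.find(lhs)
            while pos != -1:  # try the rule at every occurrence of lhs
                new = form[:pos] + rhs + form[pos + len(lhs):]
                # Rules never shrink a form, so over-long forms can be pruned.
                if len(new) <= max_len and new not in seen:
                    seen.add(new)
                    queue.append(new)
                pos = form.find(lhs, pos + 1)
    return sorted(results, key=len)


print(generate(9))  # ['abc', 'aabbcc', 'aaabbbccc']
```

Rules such as `CB -> BC` rewrite symbols in context, so derivations no longer map onto an ordinary parse tree; unrestricted (type-0) grammars go further still and are as powerful as Turing machines.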