there’s decent work on computational reasoning power of transformers, SSMs, etc.
some approximate snippets that come to mind are that decoder-only transformers recognize AC^0 and think in TC^0, that encoder-decoders are strictly more powerful than decoder-only, etc.
Person with last name Miller iric if poke around on arXiv, a few others, been a while since was current top of mind so ymmv on exact correctness of above snippets
some approximate snippets that come to mind are that decoder-only transformers recognize AC^0 and think in TC^0, that encoder-decoders are strictly more powerful than decoder-only, etc.
Person with last name Miller iric if poke around on arXiv, a few others, been a while since was current top of mind so ymmv on exact correctness of above snippets