This is the problem with the current generation of "external AI boards" floating around. 8, 16, even 24 GB isn't enough to run much that's useful, and the workarounds (e.g. offloading to disk) are impractically slow.
Forget running a serious foundation model, or anything realtime.
The blunt reality is that the fast, high-memory GPU systems you'd actually need to self-host are really, really expensive.
These devices are more optics and dreams ("it'd be great if…") than practical hacker toys.
Almost nothing useful runs in 8 GB.
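To put rough numbers on the memory claim, here's a back-of-the-envelope sketch. The formula (parameter count × bytes per parameter, plus a fudge factor for KV cache and activations) and the 20% overhead figure are my own assumptions, not anything from the comment above:

```python
def vram_gb(params_billion: float, bytes_per_param: float, overhead: float = 1.2) -> float:
    """Rough VRAM estimate for inference: weights plus ~20% overhead
    for KV cache and activations (overhead factor is an assumption)."""
    return params_billion * bytes_per_param * overhead

# 70B model at 4-bit quantization (0.5 bytes/param): ~42 GB, well past 24 GB
print(vram_gb(70, 0.5))
# 8B model at fp16 (2 bytes/param): ~19 GB, already too big for an 8 GB board
print(vram_gb(8, 2.0))
# Only small quantized models (e.g. 7B at 4-bit, ~4 GB) fit comfortably in 8 GB
print(vram_gb(7, 0.5))
```

Even with aggressive quantization, anything in the "serious foundation model" class blows past the 8–24 GB range these boards offer.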