Hacker News
dist-epoch, 49 days ago, on: Big GPUs don't need big PCs
After you load the weights into the GPU and keep the KV cache there too, you don't need any other significant traffic.
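Here is a rough Python sketch of the per-token host/GPU traffic during single-GPU decoding. It assumes the host sends one int32 token ID per step and reads back either the full fp16 logits row or a single sampled int32 ID. The function name and the 128,256-entry vocabulary are illustrative assumptions, not figures from the thread.

```python
def host_gpu_bytes_per_token(vocab_size: int, bytes_per_logit: int = 2,
                             return_logits: bool = True) -> int:
    """Approximate bytes crossing the host<->GPU link per decoded token.

    Assumes weights and KV cache already live in GPU memory, so the only
    per-step traffic is the input token ID and the model's output.
    """
    token_id_bytes = 4  # one int32 token ID sent to the GPU
    if return_logits:
        # the full logits row is copied back to the host for sampling
        out_bytes = vocab_size * bytes_per_logit
    else:
        # sampling happens on the GPU; only the chosen int32 ID comes back
        out_bytes = token_id_bytes
    return token_id_bytes + out_bytes


if __name__ == "__main__":
    vocab = 128_256  # hypothetical vocabulary size
    print(host_gpu_bytes_per_token(vocab))                       # 256516
    print(host_gpu_bytes_per_token(vocab, return_logits=False))  # 8
```

Under these assumptions, even returning full logits costs about a quarter of a megabyte per token. That is tiny next to what a PCIe link carries per second at interactive generation rates.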
numpad0, 49 days ago
Even in tensor-parallel mode? I thought that could only work if you're fine with stalling all but n GPUs for n users at any given moment.
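This is a rough sketch that puts numbers on the tensor-parallel case raised above. It assumes a Megatron-style split with two all-reduces per transformer layer in the forward pass (one after attention, one after the MLP) and fp16 activations. It also assumes a ring all-reduce, in which each of p GPUs sends about 2(p-1)/p times the message size. The model shape (8,192 hidden size, 80 layers) and the function name are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class TPEstimate:
    bytes_sent_per_gpu: float  # per decoded token
    sync_points: int           # all-reduces every GPU must wait on, per token


def tensor_parallel_traffic(hidden_size: int, num_layers: int, num_gpus: int,
                            bytes_per_elem: int = 2,
                            allreduces_per_layer: int = 2) -> TPEstimate:
    """Per-token inter-GPU traffic for tensor-parallel decoding of one sequence.

    Assumes each all-reduce covers one hidden-size activation vector and
    uses a ring algorithm (each GPU sends 2*(p-1)/p of the message).
    """
    if num_gpus < 2:
        return TPEstimate(0.0, 0)  # no tensor parallelism, no all-reduces
    message = hidden_size * bytes_per_elem
    per_allreduce = 2 * (num_gpus - 1) / num_gpus * message
    sync_points = num_layers * allreduces_per_layer
    return TPEstimate(sync_points * per_allreduce, sync_points)


if __name__ == "__main__":
    est = tensor_parallel_traffic(hidden_size=8192, num_layers=80, num_gpus=2)
    print(est.bytes_sent_per_gpu, est.sync_points)  # 2621440.0 160
```

Under these assumptions, each GPU sends a few megabytes per token. It also waits at 160 all-reduce barriers per token, so the interconnect's latency matters as well as its bandwidth.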