
Yes! The varying precisions and maths feel like just the start!

Look at the next-gen Rubin with its CPX co-processor chip to see things getting much weirder & more specialized. It's there for prefilling long contexts, which is compute-intensive:

> Something has to give, and that something in the Nvidia product line is now called the "Rubin" CPX GPU accelerator, which is aimed specifically at parts of the inference workload that do not require high bandwidth memory but do need lots of compute and, increasingly, the ability to process video formats for both input and output as part of the AI workflow.

https://www.nextplatform.com/2025/09/11/nvidia-disaggregates...
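The prefill/decode split falls out of simple roofline arithmetic. A back-of-the-envelope sketch in Python (all numbers are assumptions for illustration: a 70B fp16 model, an 8K-token prompt): each token costs roughly 2 FLOPs per parameter, but a whole prefill batch shares one read of the weights, while decode re-reads them for every new token.

    # Back-of-the-envelope sketch (assumed numbers, not from the article):
    # why prefill is compute-bound and decode is bandwidth-bound.
    # A transformer forward pass needs roughly 2 FLOPs per parameter per
    # token, and each pass must stream the full weight set from memory.

    params = 70e9          # assumed 70B-parameter model
    bytes_per_param = 2    # fp16/bf16 weights

    def arithmetic_intensity(tokens_per_pass):
        """FLOPs performed per byte of weights read from memory."""
        flops = 2 * params * tokens_per_pass
        bytes_read = params * bytes_per_param  # weights read once per pass
        return flops / bytes_read

    # Prefill: the whole prompt (say 8K tokens) shares one weight read.
    print(f"prefill intensity: {arithmetic_intensity(8192):8.0f} FLOPs/byte")
    # Decode: one token per pass; weights are re-read every step.
    print(f"decode  intensity: {arithmetic_intensity(1):8.0f} FLOPs/byte")

Thousands of FLOPs per byte for prefill versus ~1 for decode is exactly why a compute-heavy part without expensive HBM can make sense for prefill, while decode stays pinned to memory bandwidth.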

To confirm what you are saying: there is no coherent, unifying way to measure what's getting built other than by power consumption. Some of that budget will go to memory, some to compute (some to interconnect, some to storage), and it's too early to say what ratio each will get, or even what compute:memory ratios we're heading towards (and one size won't fit all problems).
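To make that concrete, here's a toy sketch (every constant is an assumption picked only to show the shape of the trade-off, not a measured figure): one fixed rack power envelope buys wildly different compute:memory balances depending on how you split it.

    # Toy sketch of the power-budget framing above. All constants are
    # assumed for illustration: the point is only that a single power
    # envelope yields very different compute:memory ratios.

    RACK_W = 100_000           # assumed 100 kW rack budget
    FLOPS_PER_W = 1e12         # assumed ~1 TFLOP/s per watt of compute
    MEM_BYTES_PER_W = 25e9     # assumed ~25 GB/s of DRAM bandwidth per watt

    for compute_share in (0.9, 0.7, 0.5):
        mem_share = 1.0 - compute_share
        flops = RACK_W * compute_share * FLOPS_PER_W
        bandwidth = RACK_W * mem_share * MEM_BYTES_PER_W
        print(f"{compute_share:.0%} compute / {mem_share:.0%} memory -> "
              f"{flops / 1e15:5.1f} PFLOPS, {bandwidth / 1e12:6.1f} TB/s, "
              f"{flops / bandwidth:5.0f} FLOPs/byte")

Until we know which FLOPs/byte ratio the workloads actually want, watts are the only denominator that stays comparable across all these designs.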

Perhaps we end up abandoning HBM & DRAM! Maybe the future belongs to high-bandwidth flash! Maybe with its own Computational Storage! Trying to use figures like FLOPS or bandwidth is applying today's answers to a future that might get weirder on us. https://www.tomshardware.com/tech-industry/sandisk-and-sk-hy...


