> I think this is solid proof that the bedrock of academia is deeply motivated by money and still defaults to optimizing where it impacts its bottom line.
no shit - could've asked literally anyone that's finished their phd to save yourself the conjecturing/hypothesizing about this fact.
> My conclusion after using Julia for many years is that there are too many correctness and composability bugs throughout the ecosystem to justify using it in just about any context where correctness matters.
i hope you realize this is purely because julia uses LLVM and LLVM has backends for those targets (noticeably absent are GPUs which do not have LLVM backends). any other language which uses LLVM could do the same exact same thing (and would be hampered in the exact same way).
Probably true, but one unique thing about Julia is, that exposes almost all stages of the compilation to the user. From typed IR to native code generation you can customise the compilation in many ways. Together with the power of LISP's metaprogramming features, that's a really fine basis for powerful and performamt DSLs and code transformations.
All those GPU targets are powered by libraries, that are not part of Julia itself (GPUCompiler.jl). The same goes for automatic differentiation. That's remarkable in my opinion.
So you're right, that many programming languages could do it, but it's no wonder, that other languages are lacking in this regard compared to Julia.
> but absolutely no one is going to switch from C to C++ just for dtors
The decision would be easier if the C subset in C++ would be compatible with modern C standards instead of being a non-standard dialect of C stuck in ca. 1995.
For two reasons: First, where C++ features are used, it make the code harder to understand rather than easier. Second, it requires newer and more complex toolchains to build GCC itself. Some people still maintain the last C version of GCC just to keep the bootstrap path open.
This is not my experience at all. In fact, my experience is where C++ is used in GCC it became harder to read. Note that GCC was written in C and then introduced C++ features later so this is not hypothetical.
In general, I think clean C code is easier to read than C++ due to much less complexity and not having language features that make it more difficult to understand by hiding crucial information from the reader (overloading, references, templates, auto, ..).
> list object is constructed once and assigned to both variables
Ummm no the list is constructed once and assigned to b and then b is assigned to a. It would be crazy semantics if `a = b = ...` meant `a` was assigned `...`.
Edit: I'm wrong it's left to right not right to left, which makes the complaint in the article even dumber.
it's astounding to me how many people pop off about "AMD SHOULD SUPPORT CUDA" not knowing that HIP (and hipify) has been around for literally a decade now.
Major players in China don't play like that. MooreThreads, Lisuan, and many other smaller companies all have their own porting kits, which are basically copied from HIP. They just port every piece of software and it just works.
If you want to fight against Nvidia monopoly, then don't just rant, but buy a GPU other than Nvidia and build on it. Check my GitHub and you'll see what I'm doing.
You don't understand what HIP is - HIP is AMD's runtime API. it resembles CUDA runtime APIs but it's not the same thing and it doesn't need to be - the hard part of porting CUDA isn't the runtime APIs. hipify is the thing that translates both runtime and kernels. Now is hipify a drop-in replacement? No of course but because the two vendors have different architectures. So it's absolutely laughable to imagine that some random could come anywhere near "drop-in replacement" when AMD can't (again: because of fundamental architecture differences).
I think you misunderstand what's fundamentally possible with AMD's architecture. They can't wave a magic wand for a CUDA compatibility layer any better than Apple or Qualcomm can, it's not low-hanging fruit like DirectX or Win32 translation. Investing billions into translating CUDA on raster GPUs is a dead end.
AMD's best option is a greenfield GPU architecture that puts CUDA in the crosshairs, which is what they already did for datacenter customers with AMD Instinct.
Let's say you put 50-100 seasoned devs on the problem, and within 2-3 years, probably get ZLUDA to the point where most mainstream CUDA applications — ML training/inference, scientific computing, rendering — run correctly on AMD hardware at 70-80% of the performance you'd get from a native ROCm port. Even if its not optimal due to hardware differences, it would be genuinely transformative and commercially valuable.
This would give them runway for their parallel effort to build native greenfield libraries and toolkits and get adoption, and perhaps make some tweaks to future hardware iterations that make compatibility easier.
And while compatibility layers aren't illegal, they ordinarily have to be a cleanroom design. If AMD knew that the ZLUDA dev was decompiling CUDA drivers to reverse-engineer a translation layer, then legally they would be on very thin ice.
ROCm is supported by the minority of AMD GPUs, and is accelerated inconsistently across GPU models. 70-80% of ROCm's performance is an unclear target, to the point that a native ROCm port would be a more transparent choice for most projects. And even then, you'll still be outperformed by CUDA the moment tensor or convolution ops are called.
Those billions are much better-off being spent on new hardware designs, and ROCm integrations with preexisting projects that make sense. Translating CUDA to AMD hardware would only advertise why Nvidia is worth so much.
> it would be genuinely transformative and commercially valuable.
Bullshit. If I had a dime for every time someone told me "my favorite raster GPU will annihilate CUDA eventually!" then I could fund the next Nvidia competitor out of pocket. Apple didn't do it, Intel didn't do it, and AMD has tried three separate times and failed. This time isn't any different, there's no genuine transformation or commercial value to unlock with outdated raster-focused designs.
This is a big part of AMD still not having a proper foothold in the space: AMD Instinct is quite different from what regular folks can easily put in their workstation. In Nvidia-land I can put anything from mid-range gaming cards, over a 5090 to an RTX 6000 Pro in my machine and be confident that my CUDA code will scale somewhat acceptably to a datacenter GPU.
This is where I feel like Khronos could contribute, making a Compute Capability-equivalent hardware standard for vendors to implement. CUDA's versioning of hardware capabilities plays a huge role in clarifying the support matrix.
...but that requires buy-in from the rest of the industry, and it's doubtful FAANG is willing to thread that needle together. Nvidia's hedged bet against industry-wide cooperation is making Jensen the 21st century Mansa Musa.
No I'm arguing with someone who clearly doesn't understand GPUs
> invest BILLIONS to make this happen
As I have already said twice, they already have, it's called hipify and it works as well as you'd imagine it could (ie poorly because this is a dumb idea).
Wow you're so very smart! You should tell all the llm and stablediffusion developers who had no idea it existed! /s
HIP has been dismissed for years because it was a token effort at best. Linux only until the last year or two, and even now it only supports a small number of their cards.
Meanwhile CUDA runs on damn near anything, and both Linux and Windows.
Also, have you used AMD drivers on Windows? They can't seem to write drivers or Windows software to save their lives. AMD Adrenalin is a slow, buggy mess.
Did I mention that compute performance on AMD cards was dogshit until the last generation or so of GPUs?
https://en.wikipedia.org/wiki/Motivated_reasoning
reply