Hacker News | keldaris's comments

Thankfully, you can still write C++ just fine without the "modern" stuff and have not only readable code, but also sane compile times. The notion, explicitly mentioned in the article, that all this insane verbosity also adds 5 seconds to your build for a single executor invocation is just crazy to me (it is far longer than my entire build for most projects).


I am confused. Out of curiosity WDYM by 5 seconds being far longer than your entire build for most projects? That sounds crazy low.


It's not crazy, it's just what happens if you write mostly C with some conveniences where they actually make sense instead of "modern C++". I generally write very performance sensitive code, so it's naturally fairly low on abstraction, but usually most of my projects take between one and two seconds to build (that's a complete rebuild with a unity build, I don't do incremental builds). Those that involve CUDA take a bit longer because nvcc is very slow, but I generally build kernels separately (and in parallel) with the rest of the code and just link them together at the end.


Sure, C++ is heavy to compile, there's simply more for the compiler to do, but a code repository that builds in under 5 seconds is at the very low end of the tail, so making a point of someone bearing with a 5-second-longer build time is sort of moot.

I've written a lot of plain C and a lot of C++ (cumulatively probably close to a MLoC) and I can't remember any C code that would compile in such a short time unless it was library code or some trivial example.


Same here, I have multiple decades of experience running Linux on desktops and servers alike, and Omarchy just saves me time and manages to be productive and fun at the same time.

Personally, I don't feel any moral obligation to investigate the personal views of people who write the software I use. Using software, especially free software, doesn't constitute an endorsement of the authors' views. Before this thread, I was blissfully unaware of this entire silly controversy, since Omarchy doesn't mention any politics anywhere as far as I can tell. If that ever changes, I'll delete it in a heartbeat (regardless of the kind of politics it happens to be), but so far the only people politicizing the issue seem to be its detractors.


The elapsed time from burning the ISO to a productive development environment is impressive. Also, folks worry so much about customizing it, but you don't have to. And Hyprland and Omarchy are almost entirely driven by text files, so Claude Code and its ilk are super effective at customizations.


I guess I should defend my point! I actually really like Hyprland (despite its controversy) and really have no interest in re-hashing DHH's ragebait. My larger point is that we've seen this happen before, hundreds of times, and these distros always end up breaking and making people blame Linux instead of their maintainer. I don't think DHH is addressing this concern, and he's basically teeing up a catastrophic system update with zero rollbacks by choosing Arch as the base system.

If you search the web for "Manjaro broken update" or "LARBS error" you're just flooded with myriad tech issues that don't exist on normal systems. It's a genuine handicap to rely on someone else's opinionated dotfiles when you don't understand why they made each decision. I think people using Omarchy long-term will end up fighting the distro more than they fight Linux.


Omarchy uses Limine plus snapper to give you (by default, but configurable) five system rollbacks. Each time an update happens, or a package is installed, a bootable btrfs snapshot is created. I've leveraged this myself after an update caused an issue with Nvidia drivers.

I don't mean this to come across as snarky, but before you spread misinformation, you might want to inform yourself.


> and then released Vulkan years later as a response that has had incredibly slow adoption due to the same over complexity that OpenCL died from.

I agree with everything else you said, but as someone who has used both OpenCL and Vulkan, the complexity is not comparable in the slightest. Even for pure compute applications, Vulkan is radically more annoying to use than OpenCL, though of course both are much worse than CUDA (or even Metal). OpenCL is somewhat annoying, but usable if you're okay with a basic C API, whereas Vulkan feels like an insufferable waste of time.


Oh for sure. Sorry, I didn’t mean they were the same level of complexity as each other. Just that they’re significantly more complex than their respective alternatives.


There's nothing wrong with using LTO, but I prefer simply compiling everything as a single translation unit ("unity builds"), which gets you all of the LTO benefits for free (in the sense that you still get fast compile times too).


How are you writing compute shaders that work on all platforms, including Mac? Are you just writing Vulkan and relying on MoltenVK?

AFAIK, the only solution that actually works on all major platforms without additional compatibility layers today is OpenCL 1.2 - which also happens to be officially deprecated on MacOS, but still works for now.


Yes, MoltenVK works fine. Alternatively, you can also use WebGPU (there are C++ and Rust native libs) which is a simpler but more limiting API.


WebGPU has no support for tensor cores (or their Apple Silicon equivalents). Vulkan has an Nvidia extension for it, is there any way to make MoltenVK use simdgroup_matrix instructions in compute shaders?


AFAIK, MoltenVK doesn't. Dawn (Google's C++ WebGPU implementation) does have some experimental support for it [0][1].

[0] https://issues.chromium.org/issues/348702031

[1] https://github.com/gpuweb/gpuweb/issues/4195


And is stuck with C99, versus C++20, Fortran, Julia, Haskell, C#, anything else someone feels like targeting PTX with.


Technically, OpenCL can also include inline PTX assembly in kernels (unlike any compute shader API I've ever seen), which is relevant for targeting things like tensor cores. You're absolutely right about the language limitation, though.
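To make that concrete, NVIDIA's OpenCL implementation accepts CUDA-style asm() statements inside kernels. This is a hedged, untested sketch of a kernel fragment (kernel name and buffers are illustrative, NVIDIA-only, and it cannot run without a host program):

```c
/* Illustrative OpenCL C kernel fragment: inline PTX via the CUDA-style
   asm() syntax NVIDIA's OpenCL compiler accepts. Here, a rounded fused
   multiply-add instruction issued directly. */
__kernel void fma_kernel(__global float *out,
                         __global const float *a,
                         __global const float *b,
                         __global const float *c)
{
    size_t i = get_global_id(0);
    float r;
    asm("fma.rn.f32 %0, %1, %2, %3;"
        : "=f"(r)
        : "f"(a[i]), "f"(b[i]), "f"(c[i]));
    out[i] = r;
}
```

Obviously this ties that kernel to NVIDIA hardware, which is the "small vendor-specific optimization" tradeoff discussed below.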


At which point why bother, PTX is CUDA.


Generally, the reason to bother with this approach is if you have a project that only needs tensor cores in a tiny part of the code and otherwise benefits from the cross platform nature of OpenCL, so you have a mostly shared codebase with a small vendor-specific optimization in a kernel or two. I've been in that situation and do find that approach valuable, but I'll be the first to admit the modern GPGPU landscape is full of unpleasant compromises whichever way you look.


Luckily, little of it matters if you simply write C for your actual target platforms, whatever they may be. C thankfully discourages the very notion of "general purpose" code, so unless you're writing a compiler, I've never really understood why some C programmers actually care about the standard as such.

In reality, if you're writing C in 2025, you have a finite set of specific target platforms and a finite set of compilers you care about. Those are what matter. Whether my code is robust with respect to some 80s hardware that did weird things with integers, I have no idea and really couldn't care less.


> I've never really understood why some C programmers actually care about the standard as such.

Because I want the next version of the compiler to agree with me about what my code means.

The standard is an agreement: If you write code which conforms to it, the compiler will agree with you about what it means and not, say, optimize your important conditionals away because some "Can't Happen" optimization was triggered and the "dead" code got removed. This gets rather important as compilers get better about optimization.


True, we are currently eliminating a lot of UB from the future C standard to avoid compilers breaking more code.

Still, while I acknowledge that this is a real issue, in practice I find that my C code from 30 years ago still works.

It is also a bit the fault of users. Why do so many users favor the most aggressive optimizing compilers? Every user filing bugs or complaining in the bug tracker about aggressive optimization breaking code, every user asking for better warnings, would help us a lot in pushing back on this. But if users prefer compiler A over compiler B when it gives a 1% improvement in some irrelevant benchmark, it is difficult to argue that this is not exactly what they want.


Sadly, at least in the embedded space, some of us still deal with platforms where the proprietary core vendor's compiler routinely beats open-source compilers' cycle counts by a factor of 1.5 to 3.

The big weak region seems to be in-order machines with smaller numbers of general purpose registers.

GCC at least seems to do its basic block planning entirely before register allocation with no feedback between phases.


In practice, you're going to test the next version of the compiler anyway if you want to be sure your code actually works. Agreements or not, compilers have bugs on a regular basis. From the point of view of a programmer, it doesn't matter if your code broke because you missed some fine point in the standard or because the compiler got it wrong, either way you're going to want to fix it or work around it.

In my experience, if you don't try to be excessively clever and just write straightforward C code, these issues almost never arise. Instead of wasting my time on the standard, I'd rather spend it validating the compilers I support and making sure my code works in the real world, not the one inhabited by the abstract machine of ISO C.


> In practice, you're going to test the next version of the compiler anyway

> In my experience, if you don't try to be excessively clever and just write straightforward C code, these issues almost never arise.

I think these two sentiments are what gets missed by many programmers who didn't actually spend the last 25+ years writing software in plain C.

I lose count of the number of times I see in comments (both here and elsewhere) how it should be almost criminal to write anything life-critical in C because it is guaranteed to fail.

The reality is that, for decades now, life-critical software has been written in C - millions and millions of lines of code controlling millions and millions of devices that are sitting in millions and millions of machines that kill people in many failure modes.

The software defect rate resulting in deaths is so low that when it happens it makes the news (See Toyota's unintended acceleration lawsuit).

That's because, regardless of what the programmers think their code does, or what a compiler upgrade does to it, such code undergoes rigorous testing and, IME, is often written to be as straightforward as possible in the large majority of cases (mostly because the direct access to the hardware makes reasoning about the software a little easier).


> I’m convinced there’s a contingent of devs who don’t like/grok abstraction.

I am one of those. I grok abstractions just fine (have commercially written idiomatically obtuse Scala and C#, some Haskell for fun, etc.), but I don't enjoy them.

I use them, of course (writing everything in raw asm is unproductive for most tasks), but rather than getting that warm fuzzy feeling most programmers seem to get when they finish writing a fancy clever abstraction and it works on the first try, I get it when I look at a piece of code I've written and realize there is nothing extraneous to take away, that it is efficient and readable in the sense of being explicit and clear, rather than hiding all the complexity away in order to look pretty or maximize more abstract concerns (reusability, DRY, etc.).

This mindset is a very good fit for writing compute-heavy numerical code, GPU stuff and lots of systems-level code, not so much for being a cog in a large team on enterprise web backends, so I mostly write numerical code for physics simulations. You can write many other things this way and get very fast and bloat-free websites or anything else, but it doesn't work well in large teams or with people using "industry best practices". It also makes me prefer C to Rust.


>I get it when I look at a piece of code I've written and realize there is nothing extraneous to take away,

"Perfection is achieved, not when there is nothing more to add, but when there is nothing left to take away."

- Antoine de Saint-Exupery

https://www.brainyquote.com/quotes/antoine_de_saintexupery_1...


Given how Go's binaries are 500x bigger than other binaries, I'd say it still has something to take away :p


If that's true, how are they so much more reasonable in most developed countries with far greater government involvement still? Is the US government just uniquely bad at healthcare somehow? Why?


By reasonable do you mean 30-50% of your income for your entire life? Regardless of whether you use the services?


… Where on earth are you getting that? As a high earner in a European country, about 8-9% of my income goes on the health service (though that includes some non-healthcare stuff). And I’m an extreme outlier; multinational salary and equity, single, no kids. For a single childless person on the average wage it’s about 2.5%.


What’s your effective tax rate?


No country in the world has you paying 30-50% of your income to health care, it's more like 15-18%


Where on earth are you getting that figure from?


UK tax receipts in 2024 were 342.2 billion.

The nhs budget was 181 billion. Half of all government money appears to be going to healthcare.


You appear to be working on the assumption that 100% of the population's income goes to the UK exchequer


What percentage of people's money is government money?


It is true. Look at graphs of it.

> Why?

I don't know how other countries manage their health care systems, though I know that the British one is facing bankruptcy, and while health care was free in the Soviet Union patients had to pay for anesthetic for root canals, and bribery was the norm.

Here's a link to what's wrong with the American system:

https://www.theatlantic.com/magazine/archive/2009/09/how-ame...


>British one is facing bankruptcy

No it isn't.


This looks like a nice case study for when you're already using Rust for other reasons and just want to make a bit of numerical code go fast. However, as someone mostly writing C++ and Julia, this does not look promising at all - it's clear that the Julia implementation is both more elegant and faster, and it seems much easier to reproduce that result in C++ (which has no issues with compile time float constants, SIMD, GPU support, etc.) than Rust.

I've written very little Rust myself, but when I've tried, I've always come away with a similar impression that it's just not a good fit for performant numerical computing, with seemingly basic things (like proper SIMD support, const generics without weird restrictions, etc.) considered afterthoughts. For those more up to speed on Rust development, is this impression accurate, or have I missed something and should reconsider my view?


This sort of thing is where Julia really shines. https://github.com/miguelraz/StagedFilters.jl/blob/master/sr... is the julia code and it's only 65 lines and uses some fairly clean generated code to get optimal performance for all floating point types.


In terms of speed, Rust is up there with C/C++. See e.g. https://benchmarksgame-team.pages.debian.net/benchmarksgame/... I also ported several algorithms from C, and they matched the performance.

Regarding SIMD support, the only thing missing is stable support for AVX-512, plus some more exotic feature extensions for deep learning, e.g. AVX-VNNI. Those are implemented and waiting to be included in the next stable versions.

GPU support: this is still an issue because not enough people are working on it, but there are projects trying to improve this: see https://github.com/tracel-ai/cubecl .

Const generics: yeah, there are a few annoying issues: they're limited to a small set of types. For instance, you can't use a const enum as a generic. Also, you can't use generic parameters in const operations on stable Rust: see the unstable feature generic_const_exprs.

My main reasons for using Rust in numerical computing:

- type system. Some find it weird. I find it explicit and easier to understand.

- cargo (nicer cross platform defaults, since I tend to develop both from windows and linux)

- unconditional code generation, with #[target_feature(enable = "feature_list")]. This makes it so that I don't have to set different flags for each compilation unit when building. It is enough to put that on top of the function making use of SIMD.

I agree that if you want to be fast/exploratory in developing algo and you can sacrifice a little bit of performance, Julia is a better choice.


TBH, Intel ISAs have never been very stable, the mixed AVX flavors are just the latest examples.


Yeah, it's so fast that AMD is not even able to catch up, and many of the extensions are available only on Intel CPUs.

As far as I could tell, it is only unstable in the sense of being fast and having many features. I don't see any breakage for my code using cpuid to detect AVX-512 features.


No, I mean things like when they removed BCD support, or when they removed simultaneous 16-bit ISA when they added 64-bit. Or the bit string instructions. Or moving the cmpxchg encoding across chip revisions.


While this is completely true, it is also true that OpenCL 1.2 is the one compute API that just works on every major platform and the drivers don't seem that unusably bad (though I'm not claiming experience of every platform here, just Nvidia on Windows/Linux and Apple Silicon on MacOS). Writing a limited dialect of C and sticking to 1.2 limitations is far from ideal, but it does at least work reliably. Sadly, that is more than can be said about most competitors.


Yes, 1.2 is OK.

The problem is that the drivers are merely OK. Presumably if you're using OpenCL you care about the performance (otherwise why would you??) and since that's the case, it's the best on no platforms, and there are alternatives for any set of platforms that do better.

I think OpenCL is sadly on its way out, and it's mostly Apple's fault (and Nvidia a little). Vulkan compute is much more interesting if you're looking to leverage iGPUs/mobile/other random CPUs.

If you're targeting workstation/server workloads only, it makes sense to restrict yourself to a subset of accelerator types and code for that (e.g. Torch or JAX for GPUs, Highway for SIMD, etc.)


It depends on what you're doing. For writing FP32 number-crunching code from scratch (meaning you don't care about something like Torch, or even cuBLAS/cuDNN), I haven't encountered cases where I couldn't match CUDA performance, and if I did, I could always just use a bit of PTX assembly where absolutely necessary (which OpenCL lets you do, whereas Vulkan does not). This also gets me good performance on MacOS without rewriting the whole thing in Metal. There is no native FP16 support, and there are other limitations that may matter to your use case or be completely irrelevant.

I'm definitely not saying OpenCL is any sort of a reasonable default for cross platform GPGPU work. In truth, I don't think there is any reasonable "general" default for that sort of thing. Vulkan has its own issues (only works via a compatibility layer on MacOS, implementation quality varies widely, extension hell, boilerplate hell, some low level things are just impossible, etc.) and everything else is a higher level approach that can't work for everything by definition.

It's a pretty sad situation overall and every solution has severe tradeoffs. Personally, I just write CUDA when I can get away with it and try to stick to OpenCL otherwise, but everyone needs to make that choice for their own set of tradeoffs.


Yeah, TBH I'm kind of sad about where OpenCL ended up, because it "should have" been what CUDA was used for from 2011-2021. AlexNet, TF, Pytorch, etc. "should have" been written with OpenCL backends.

But driver implementation inconsistencies, version support issues, etc. meant people used CUDA instead.

I agree Vulkan has its own issues, and having written some MoltenVK stuff, you clearly know the quality-of-life pains of developing with it. That said, at least from the user side it works and performs well.


It is Intel, AMD and Google's fault for never supporting OpenCL as they should, and Khronos for pissing off Apple with how they took ownership of OpenCL.

