Dynamic linking with a safe ABI, where if you change and recompile one library then the outcome has to obey some definition of safety, and ABI stability is about as good as C or Objective-C or Swift.
Until that happens, it'll be hard to adopt Rust in a lot of C/C++ strongholds where C's ABI and dynamic linking are the thing that enables the software to get huge.
> Until that happens, it'll be hard to adopt Rust in a lot of C/C++ strongholds where C's ABI and dynamic linking are the thing that enables the software to get huge.
Wait, Rust can already communicate using the C ABI. In fact, it offers exactly the same capabilities as C++ in this regard (dynamic linking).
As unsafe as C or C++. In fact, safer, because only the ABI surface is unsafe, the rust code behind it can be as safe or unsafe as you want it to be.
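To make that concrete, here is a minimal sketch of a C ABI entry point with safe Rust behind it. The names `checksum` and `demo_checksum` are made up for illustration, and a real shared library would also need a no-mangle attribute on the export so the symbol name survives.

```rust
use std::slice;

/// Safe Rust core: the compiler checks bounds and lifetimes here.
pub fn checksum(data: &[u8]) -> u32 {
    data.iter().fold(0u32, |acc, &b| acc.wrapping_add(u32::from(b)))
}

/// C ABI shim (hypothetical name). Only this thin layer is unsafe:
/// the caller must pass either null or a pointer valid for `len` bytes.
pub unsafe extern "C" fn demo_checksum(ptr: *const u8, len: usize) -> u32 {
    if ptr.is_null() {
        return 0;
    }
    // SAFETY: the caller promises `ptr` points to `len` readable bytes.
    let data = unsafe { slice::from_raw_parts(ptr, len) };
    checksum(data)
}
```

Everything the caller can get wrong is concentrated in the pointer/length pair, which is exactly the part the C ABI cannot describe.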
I was addressing this portion of your comment: "C's ABI and dynamic linking are the thing that enables the software to get huge". If the C ABI is what enables software to get huge then Rust is already there.
There is a second claim in your comment about a "safe ABI", but that is something that neither C or C++ offers right now.
Here's the problem. If you told me that you rebuilt the Linux userland with Rust but you used C ABI at all of the boundaries, then I would be pretty convinced that you did not create a meaningful improvement to security because of how many dynamic linking boundaries there are. So many of the libraries involved are small, and big or small they expose ABIs that involve pointers to buffers and manual memory management.
> There is a second claim in your comment about a "safe ABI", but that is something that neither C or C++ offers right now.
Of course C and C++ are no safer in this regard. (Well, with Fil-C they are safer, but like whatever.)
But that misses the point, which is that:
- It would be a big deal if Rust did have a safe dynamic linking ABI. Someone should do it. That's the main point I'm making. I don't think deflecting by saying "but C is no safer" is super interesting.
- So long as this problem isn't fixed, the upside of using Rust to replace a lot of the load bearing stuff in an OS is much lower than it should be to justify the effort. This point is debatable for sure, but your arguments don't address it.
> - It would be a big deal if Rust did have a safe dynamic linking ABI. Someone should do it. That's the main point I'm making. I don't think deflecting by saying "but C is no safer" is super interesting.
I think we all agree that it would be a huge deal.
> - So long as this problem isn't fixed, the upside of using Rust to replace a lot of the load bearing stuff in an OS is much lower than it should be to justify the effort. This point is debatable for sure, but your arguments don't address it.
As you point out, this is the debatable part, and I'm not sure I get your justification here.
This might end up being the forcing function (quoting myself from another reply in this discussion):
> It can't be that replacing 20 C/C++ shared objects with 20 Rust shared objects results in 20 copies of the Rust standard library and other dependencies that those Rust libraries pull in. But, today, that is what happens. For some situations, this is too much of a memory usage regression to be tolerable.
If memory was cheap, then maybe you could say, "who cares".
Can you even make the standard library dynamically linked in the C way??
In C, a function definition usually corresponds 1-to-1 to a function in object code. In Rust, plenty of things in the stdlib are generic functions that effectively get a separate implementation for each type you use them with.
If there's a library that defines Foo but doesn't use Vec<Foo>, and there are 3 other libraries in your program that do use that type, where should the Vec functions specialized for Foo reside? How do languages like Swift (which is notoriously dynamically-linked) solve this?
You can have an intermediate dynamic object that just exports Vec<Foo> specialized functions, and the three consumers that need it just link to that object. If the common need for Vec<Foo> is foreseeable by the dynamic object that provides Foo, it can export the Vec<Foo> functions itself.
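A rough sketch of what such an intermediate object could export, assuming a hypothetical `Foo` and made-up function names. The point is that every function below is non-generic, so `Vec<Foo>` is instantiated once here and consumers only ever see an opaque pointer.

```rust
/// Hypothetical type owned by the library that defines Foo.
#[repr(C)]
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct Foo {
    pub id: u32,
}

/// Opaque handle: consumers never depend on Vec's layout.
pub struct FooVec(Vec<Foo>);

/// Allocate an empty vector and hand ownership to the caller.
pub extern "C" fn foo_vec_new() -> *mut FooVec {
    Box::into_raw(Box::new(FooVec(Vec::new())))
}

/// Caller must pass a live pointer obtained from `foo_vec_new`.
pub unsafe extern "C" fn foo_vec_push(v: *mut FooVec, item: Foo) {
    unsafe { (*v).0.push(item) }
}

/// Caller must pass a live pointer obtained from `foo_vec_new`.
pub unsafe extern "C" fn foo_vec_len(v: *const FooVec) -> usize {
    unsafe { (*v).0.len() }
}

/// Give ownership back so the Vec is freed by the code that allocated it.
pub unsafe extern "C" fn foo_vec_free(v: *mut FooVec) {
    if !v.is_null() {
        drop(unsafe { Box::from_raw(v) });
    }
}
```

The cost is that nothing across this boundary gets inlined, and you are back to hand-written wrappers per concrete type.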
Your apt update would still be huge though. When the dependency changes (eg. a security update) you’d be downloading rebuilds of 20 apps. For the update of a key library, you’d be downloading your entire distribution again. Every time.
Oh, well yeah, statically linked binaries have that downside. I guess I don't think that's a big deal, but I could maybe imagine on some devices that are heavily constrained that it could be? IDK. Compression is insanely effective.
You are forgetting about the elephant in the room: if every bug fix requires a rebuild of everything downstream, then it is not only a question of constrained devices, it is also a question of SSD write cycles. You are effectively wearing out someone's drive faster. And btrfs actually worsens this problem, because instead of one copy-on-write of a library you now have 2n copies of the library within 2 copies of different apps. Now an update (or a revert) will cost you even more writes. It is just waste for no apparent reason, when the alternative needs less memory and less disk space.
"Compression is insanely effective": and what about energy? Compression will increase CPU use. It will also make everything slower than plain deduplication. Also, your reason for using tech that is worse for the user is that the user can mitigate it in other ways? This strikes me as the same logic as "we don't need to optimize our program/game, users will just buy better hardware", or just plainly pushing the cost onto the user. This is not a valid solution, just a downplaying of the argument.
If Rust and static linking were to become much more popular, Linux distros could adopt some rsync/zsync like binary diff protocol for updates instead of pulling entire packages from scratch.
Static linking used to be popular, as it was the only way of linking in most computer systems, outside expensive hardware like Xerox workstations, Lisp machines, ETHZ, or what have you.
One of the very first consumer hardware to support dynamic linking was the Amiga, with its Libraries and DataTypes.
We moved away from having a full blown OS done with static linking, with exception of embedded deployments and firmware, for many reasons.
Even then, they would still need to rebuild massive amounts on updates. That is nice in theory, but see the number of bugs reported in Debian because upstream projects fail to rebuild as expected. "I don't have the exact micro version of this dependency I'm expecting" is one common reason, but there are many others. It's a pretty regular thing, and therefore would be burdensome to distro maintainers.
NixOS "suffers" from this. It's really not that bad if you have solid bandwidth. For me it's more than worth the trade off. With a solid connection a major upgrade is still just a couple minutes.
I think you misunderstand my point. Nix basically forces dynamic linking to be more like static linking. So changing a low level library causes ~everything to redownload.
What you are asking for is a library-definition replacement for .h files that contains sufficient information to make rust safe. That is a big, big step and would be fantastic not only for rust but for any other language trying to break out of the C tar pit.
So you're calling for dynamic linking for rust native code? Because rust's safety doesn't come from runtime, it comes from the compiler and the generated code. An object file generated from a bit of rust source isn't some "safe" object file, it's just generated in a safe set of patterns. That safety can cross the C ABI perfectly fine if both things on either side came from rust to begin with. Which means rust dynamic linking.
I don’t think GP is moving the goalposts at all, rather I think a lot of people are willfully misrepresenting GP’s point.
Rust-to-rust code should be able to be dynamically linked with an ABI that has better safety guarantees than the C ABI. That’s the point. You can’t even express an Option<T> via the C ABI, let alone the myriad of other things rust has that are put together to make it a safe language.
It would be very hard to accomplish. Apple was extremely motivated to make Swift have a resilient/stable ABI, because they wanted to author system frameworks in swift and have third parties use them in swift code (including globally updating said frameworks without any apps needing to recompile.) They wanted these frameworks to feel like idiomatic swift code too, not just be a bunch of pointers and manual allocation. There’s a good argument that (1) Rust doesn’t consider this an important enough feature and (2) they don’t have enough resources to accomplish it even if they did. But if you could wave a magic wand and make it “done”, it would be huge for rust adoption.
Since Rust cares very much about zero-overhead abstractions and performance, I would guess if something like this were to be implemented, it would have to be via some optional (crate/module/function?) attributes, and the default would remain the existing monomorphization style of code generation.
Swift’s approach still monomorphizes within a binary, and only has runtime costs when calling code across a dylib boundary. I think rust could do something like this as well.
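As a loose analogy only (this is not how Swift implements it), Rust already has both styles of code generation available. A sketch with a made-up `Shape` trait: the generic function is monomorphized per type, while the `dyn` version is compiled once and pays for an indirect call, which is the kind of cost you would accept only at a library boundary.

```rust
pub trait Shape {
    fn area(&self) -> f64;
}

pub struct Square(pub f64);

impl Shape for Square {
    fn area(&self) -> f64 {
        self.0 * self.0
    }
}

/// Monomorphized: one copy of this function per concrete `T`, calls can be inlined.
pub fn total_area_static<T: Shape>(items: &[T]) -> f64 {
    items.iter().map(|s| s.area()).sum()
}

/// Compiled once: each `area` call goes through a vtable pointer.
pub fn total_area_dynamic(items: &[&dyn Shape]) -> f64 {
    items.iter().map(|s| s.area()).sum()
}
```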
You could maybe say that a pointer can be transmuted to an Option<&T> because there’s an Option-specific optimization that an Option<&T> uses null as the None value, but that’s not always guaranteed. And it doesn’t apply to non-references, for instance Option<bool>’s None value would be indistinguishable from false. You could get lucky if you launder your Option<T> through repr(C) and the compiler versions match and don’t mangle the internal representation, but there’s no guarantees here, since the ABI isn’t stable. (You even get a warning if you try to put a struct in your function signatures that doesn’t have a stable repr(C).)
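For the reference case specifically, you don't need a transmute at all. A small sketch with made-up helper names: as far as I know the `std::option` documentation does describe `Option<&T>` as pointer-sized with `None` as null, but nothing comparable is promised for `Option<bool>`, so the check below only covers references.

```rust
use std::mem::size_of;
use std::ptr;

/// Turn an optional reference into a nullable pointer without `transmute`.
pub fn to_nullable<T>(opt: Option<&T>) -> *const T {
    opt.map_or(ptr::null(), |r| r as *const T)
}

/// Turn a nullable pointer back into an optional reference.
/// The caller must guarantee `p` is null or valid for the lifetime `'a`.
pub unsafe fn from_nullable<'a, T>(p: *const T) -> Option<&'a T> {
    unsafe { p.as_ref() }
}

/// True when `Option<&u32>` occupies exactly one pointer.
pub fn option_ref_is_pointer_sized() -> bool {
    size_of::<Option<&u32>>() == size_of::<*const u32>()
}
```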
You're right that there isn't a single standard convention for representing e.g. Option<bool>, but that's just as true of C. You'd just define a repr(C) compatible object that can be converted to or from Option<Foo>, and pass that through the ABI interface, while the conversion step would happen internally and transparently on both sides. That kind of marshaling is ubiquitous when using FFI.
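A minimal sketch of that marshaling for `Option<bool>`, using a made-up `COptionBool` struct. The layout is whatever you choose to define; there is no standard one.

```rust
/// Hypothetical C-compatible stand-in for `Option<bool>`.
#[repr(C)]
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct COptionBool {
    pub is_some: bool,
    /// Only meaningful when `is_some` is true.
    pub value: bool,
}

impl From<Option<bool>> for COptionBool {
    fn from(opt: Option<bool>) -> Self {
        match opt {
            Some(value) => COptionBool { is_some: true, value },
            None => COptionBool { is_some: false, value: false },
        }
    }
}

impl From<COptionBool> for Option<bool> {
    fn from(c: COptionBool) -> Self {
        if c.is_some { Some(c.value) } else { None }
    }
}
```

Both sides convert at the boundary and use the real `Option<bool>` internally.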
Right, that's the whole point of this thread. The only stable ABI rust has is one where you can only use C's features at the boundaries. It would be really nice if that wasn't the case (ie. if you could express "real" rust types at a stable ABI boundary.)
As OP said, "I don't think deflecting by saying "but C is no safer" is super interesting". People seem intent on steering that conversation that way anyway, I guess.
No fundamental reason, that I know of, why Rust or any other safe language can't also have some kind of story here.
> I think you're moving the goalposts significantly here.
No. I'm describing a problem worth solving.
Also, I think a major chasm for Rust to cross is how defensive the community gets. It's important to talk about problems so that the problems can be solved. That's how stuff gets better.
Swift and fil-c are only pseudo safe. Once you deal with the actual world and need to pass around data from memory things are always unsafe since there is no safe way of sharing memory. At least not in our current operating systems. Swift and fil-c can at least guard to some extent the api.
A safe ABI would be cool, for sure, but in the market (specifically addressing your prediction) I don't know if it's really that big a priority for adoption. The market is obviously fine with an unsafe ABI, seeing how C/C++ is already dominant. Rust with an unsafe ABI might then not be as big an improvement as we would like, but it's still an improvement, and I feel like you're underestimating the benefits of safe Rust code as an application-level frontline of security, even linked to unsafe C code.
> An ABI can't control whether one or both parties either end of the interface are honest.
You are aware that Rust already fails that without dynamic linking? The wrapper around the C getenv functionality was originally considered safe, despite every bit of documentation on getenv calling out thread safety issues.
Yes? That's called a bug? The standard library incorrectly labelled something as safe, and then changed it. The root was an unsafe FFI call which was incorrectly marked as safe.
It's no different than a bug in an unsafe pure Rust function.
I'm choosing to ignore that libc is typically dynamically linked, but linking in foreign code and marking it safe is a choice to trust the code. Under dynamic linking anything could get linked in, unlike static linking. At least a static link only includes the code you (theoretically) audited and decided is safe.
A "safe" ABI is just a C ABI plus a "safe" Rust crate (the moral equivalent to a C/C++ header file) that wraps it to provide safety guarantees. All bare-metal "safe" FFI's are ultimately implemented on top of completely "unsafe" assembly, and Rust is not really any different.
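Roughly what that wrapper layer looks like, as a sketch. `raw_fill` is a made-up stand-in written in Rust so the example is self-contained; in practice it would be a declaration for a function living in another shared object.

```rust
/// Stand-in for a foreign C ABI function: writes `value` into `len` bytes at `buf`.
/// The caller must guarantee `buf` is valid for `len` writable bytes.
pub unsafe extern "C" fn raw_fill(buf: *mut u8, len: usize, value: u8) {
    for i in 0..len {
        unsafe { buf.add(i).write(value) };
    }
}

/// Safe wrapper: the slice type carries the pointer/length invariant,
/// so callers cannot pass a mismatched length.
pub fn fill(buf: &mut [u8], value: u8) {
    // SAFETY: pointer and length come from the same live, exclusive slice.
    unsafe { raw_fill(buf.as_mut_ptr(), buf.len(), value) }
}
```

The wrapper is only as trustworthy as the claim in its SAFETY comment, which is the whole disagreement in this thread.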
C++ ABI stability is the main reason improvements to the language get rejected.
You cannot change anything that would affect the class layout of something in the STL. For templated functions where the implementation is in the header, ODR means you can't add optimizations later on.
Maybe this was OK in the 90s when companies deleted the source code and laid off the programmers once the software was done, but it's not a feature Rust should ever support or guarantee.
The "stable ABI" is C functions and nothing else for a very good reason.
I think if Rust wants to evolve even more aggressively than C++ evolves, then that is a chasm that needs to be crossed.
In lots of domains, having a language that doesn't change very much, or that only changes very carefully with backcompat being taken super seriously, is more important than the memory safety guarantees Rust offers.
As a C++ developer, I regularly deal with people that think creating a compiled object file and throwing away the source code is acceptable, or decide to hide source code for "security" while distributing object files. This makes my life hell.
Rust preventing this makes my life so much better.
Rust does not prevent you from creating a library that exports a C/C++ interface. It's indistinguishable from a C or C++ library, except that it's written in Rust. cbindgen will even generate proper C header files out of the box, that Rust can then consume via bindgen.
> As a C++ developer, I regularly deal with people that think creating a compiled object file and throwing away the source code is acceptable, or decide to hide source code for "security" while distributing object files. This makes my life hell.
I mean yeah that's bad.
> Rust preventing this makes my life so much better.
I'm talking about a different issue, which is: how do you create software that's in the billions of lines of code in scale. That's the scale of desktop OSes. Probably also the scale of some other things too.
At that scale, you can't just give everyone the source and tell them to do a world compile. Stable ABIs fix that. Also, you can't coordinate between all of the people involved other than via stable ABIs. So stable ABIs save both individual build time and reduce cognitive load.
This is true even and especially if everyone has access to everyone else's source code
> At that scale, you can't just give everyone the source and tell them to do a world compile. Stable ABIs fix that. Also, you can't coordinate between all of the people involved other than via stable ABIs. So stable ABIs save both individual build time and reduce cognitive load.
Rust supports ABI compatibility if everyone is on the same compiler version.
That means you can have a distributed caching architecture for your billion line monorepo where everyone can compile world at all times because they share artifacts. Google pioneered this for C++ and doesn't need to care about ABI as a result.
What Rust does not support is a team deciding they don't want to upgrade their toolchains and still interoperate with those that do. Or random copy and pasting of `.so` files you don't know the provenance of. Everyone must be in sync.
In my opinion, this is a reasonable constraint. It allows Rust to swap out HashMap implementations. In contrast, C++ map types are terrible for performance because they cannot be updated for stability reasons.
My understanding: Even if everyone uses the same toolchain, but someone changes the code for a module and recompiles, then you're in UB land unless everyone who depends on that recompiles
If your key is a hash of the code and its dependencies, for a given toolchain and target, then any change to the code, its dependencies, the toolchain or target will result in a new key unique to that configuration. Though I am not familiar with these distributed caching systems so I could be overlooking something.
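A toy sketch of that keying scheme, with made-up field names. I am assuming the inputs can be reduced to strings and dependency keys, and using the standard library hasher only for illustration; a real cache would presumably want a cryptographic hash.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Hypothetical description of everything that can affect a build artifact.
pub struct BuildInputs<'a> {
    pub source: &'a str,
    /// Cache keys of the dependencies, so changes propagate upward.
    pub dependency_keys: &'a [u64],
    pub toolchain: &'a str,
    pub target: &'a str,
}

/// Any change to source, dependencies, toolchain or target yields a new key.
pub fn cache_key(inputs: &BuildInputs<'_>) -> u64 {
    let mut h = DefaultHasher::new();
    inputs.source.hash(&mut h);
    inputs.dependency_keys.hash(&mut h);
    inputs.toolchain.hash(&mut h);
    inputs.target.hash(&mut h);
    h.finish()
}
```

Because each key folds in the dependency keys, a change in a leaf module changes the key of everything above it, so stale artifacts are never picked up.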
> Except as you well know, C might not change as fast, but it does change, including the OS ABI.
I don't know that.
Here's what I know: the most successful OSes have stable OS ABIs. And their market share is positively correlated with the stability of their ABIs.
Most widely used: Windows, which has a famously stable OS ABI. (If you wanted to be contrarian you could say that it doesn't because the kernel ABI is not stable, but that misses the point - on Windows you program against userland ABIs provided by DLLs, which are remarkably stable.)
Second place: macOS, which maintains ABI stability with some sunsetting of old CPU targets. But release to release the ABI provides solid stability at the framework level, and used to also provide stability at the kernel ABI level (not sure if that's still true - but see above, the important thing is userland framework ABI stability at the end of the day).
Third place: Linux, which maintains excellent kernel ABI stability. Linux has the stablest kernel ABI right now AFAIK. And in userland, glibc has been investing heavily in ABI stability; it's stable enough now that in practice you could ship a binary that dynlinks to glibc and expect it to work on many different Linuxes today and in the future.
So it would seem that OS ABIs are stable in those OSes that are successful.
Speaking of Windows alone, there are the various calling conventions (pascal, stdcall, cdecl), 16, 32, 64 bits, x86, ARM, ARM64EC, DLLs, COM in-proc and ext-proc, WinRT within Win32 and UWP.
Leaving aside the platforms it no longer supports.
So there are some changes to account for depending on the deployment scenario.
- the same entity has access to the source of both the library and the main app
- library and main app share the same build tooling
And even if that’s the case, you have the problem of end users accidentally using different versions of the main app and the library and getting unexpected UB.
What's the state of single-compiler version ABI? I mean - if the compiler guaranteed that for the same version of the compiler the ABI can work, we could potentially use dynamic linking for a lot of things (speed up iterative development) without committing to any long term stable API or going through C ABI for everything.
> In their zeal to convert, they are happily replacing pro-user software with pro-business software.
This is one of the two main reasons I'm not using Rust. Second reason is being addressed by gccrs team, so I have no big gripes there, since they are progressing well.
By this same metric, do you refuse to use C because the vast majority of OSS C codebases are permissively licensed? Surely you see that this makes no sense, yes? Neither Rust-the-language nor Rust-the-ecosystem are any more hostile to GPL than any other language and ecosystem.
> By this same metric, do you refuse to use C because the vast majority of OSS C codebases are permissively licensed?
It's not comparable - the Rewrite-it-in-Rust community is aiming to replace the existing pro-user products, with new pro-business products.
The last significant online C community was the one that gave us the pro-user products in the first place.
> Surely you see that this makes no sense, yes? Neither Rust-the-language nor Rust-the-ecosystem are any more hostile to GPL than any other language and ecosystem.
I don't care whether or not they are hostile, that is not relevant. What is relevant to the complaints you are reading is that their primary goal is the spread of Rust, not the interests of the users.
It is totally reasonable to be against a community who are working very hard to replace pro-user software with pro-business software.
> The last significant online C community was the one that gave us the pro-user products in the first place.
You mean the OSI, headed by famous C hacker Eric S. Raymond, the permissive-license rebellion against the GPL? Pretending that the MIT/BSD licenses aren't a legacy of the C ecosystem is revisionist history.
> It's not comparable - the Rewrite-it-in-Rust community is aiming to replace the existing pro-user products, with new pro-business products.
It's clear that you have no idea what you're talking about. There is no "rewrite-it-in-Rust community", there are just people using Rust and writing what they want. That copyleft licenses have lost mindshare to permissive licenses in the decades since the rise of the OSI is a broader movement in OSS that long predates Rust, and has nothing to do with Rust itself.
> You mean the OSI, headed by famous C hacker Eric S. Raymond, the permissive-license rebellion against the GPL? Pretending that the MIT/BSD licenses aren't a legacy of the C ecosystem is revisionist history.
Sure, C played a great part there too, but you are ignoring the present.
What we are seeing now is a concerted effort to replace pro-user products with pro-business products.
Even if you're right that the start of Copyleft, with gcc, is revisionist history, that has no relevance to what is happening now, which is a large effort by a specific community to replace pro-user products with pro-business products.
Well, that's funny. Considering all the comments I have written for this submission.
First of all, most of the arguments I'd make are already addressed by lelanthran. Do I need to write the same things over and over? It's bad etiquette to write the same things said by someone else. This is why we have the voting mechanism here.
So, since you insist, let me reiterate the same thing.
No I don't refuse to use C, because most of the GPL software which is enabling everything we do today is written in C or a C-descendant language. However, as I write everywhere, I refuse to use Rust because of two reasons:
1- LLVM only for now (I don't use any language which doesn't have a compiler in GCC)
2- Rust's apparent "rewrite in rust, in MIT, replace the thing and beat it with a club if it refuses to die" attitude.
For reference, uutils and sister projects use "drop-in-replacement" and "completely replace" leisurely, signaling their clear intentions to forcefully replace GPL code with more permissive, business-friendly bits.
I tend to reluctantly accept Rust in the Kernel since gccrs is in the works and progressing steadily, and Rust guys are somewhat forced to write a proper reference for their language and back it with proper PLT, since it's a hard requirement if you want your programming language to be a long-living, dependable one.
Similarly, you use words like courage and non-sequitur leisurely. I'm not sure it's fitting in this instance.
There is absolutely nothing "pro-business" about permissive licenses. People choose permissive licenses for all kinds of reasons. For example, I personally use them because I believe they are more free and thus more in line with my values. You shouldn't project unsubstantiated statements onto people's motives like this.
With permissive licenses you often run into the following situation:
You buy something physical from a company, say a humanoid unitree robot, a robot actuator or Arm SBC. These pieces of hardware come with their own proprietary SDK that they sell for a significant fee or a proprietary GPU driver without any hope of updates. The SDK heavily uses MIT licensed code and there is no possibility of modifying or inspecting the code for debugging.
From the perspective of the user, the system might as well be 100% proprietary and his freedoms are maximally restricted. You could say that this is fine since it doesn't detract from the original open source project, but you have to remember that these companies would ordinarily have to pay significant development fees to build the same level of functionality and they have no obligation to help or support your project financially. You as the open source developer will then have to beg them to hire you, so you can do paid work that is unrelated to the original project to finally work on your project in your spare time, purely because it is possible to charge for hardware but not the software that the hardware depends on.
What I'm trying to get at here is that this means full vertical integration is the only way. The problem is that most hardware companies are hardware companies first and they don't care about software. They concentrate on making hardware, because each sale brings in money. They don't spend money on software, because it appears to be optional. You can just tell the customer or an open source community to bring their own software. The money that is needed to pay for open source projects flows through the very companies that refuse to spend money on software.
If you want to write open source software, you must be a hardware company so you are customer facing and have access to customer money that can be diverted to the development of the software.
> You shouldn't project unsubstantiated statements onto people's motives like this.
I am not criticising their motives, I am criticising the result!
Also, definitions are hard. It's why we have pro-choice/pro-life and not anti-choice/anti-life - using the positive spin is a good faith characterisation of a position.
In much the same way, I am using pro-user/pro-business; if my intention was to vilify one of those positions I would have used pro-user/anti-user or pro-business/anti-business to label those positions.
No reasonable interpretation of pro-user/pro-business can make the audience think that I am unfairly characterising either of two positions.
I say this to address the use of the word "unsubstantiated" in your assertion about my characterisations.
That would be great, but Rust relies on compile-time monomorphization for efficiency (very much like C++, if you consider templates polymorphic functions/classes).
This means that any Rust ABI would have to cater for link-time specialization. I think this should be doable, but it would require a solution that's better than just to move the code generation into the linker. Instead, one would need to carefully consider the usage of the "shape" of all parameters of a function.
I wonder if we look at it from a too narrow perspective. We use the C ABI because it's the only game in town. We should be aiming for a safe cross language ABI. I'd love to make Rust, C, PHP, Swift, Java and Python easily talk to each other inside 1 process.
It should extend the C ABI with things like strings, arrays, objects with a way to destruct them, and provide some safety guarantees.
As an example, the windows world has COM, which is at the core pretty reasonable for its design constraints, even if gnarly sometimes.
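To show the flavor of what I mean, here is a sketch of a string that carries its own destructor across a C ABI boundary. Everything here (`AbiString`, `export_string`, `consume_string`) is invented for illustration and is not an existing standard.

```rust
use std::{ptr, slice, str};

/// Hypothetical cross-language string: bytes plus the function that frees them.
#[repr(C)]
pub struct AbiString {
    pub ptr: *mut u8,
    pub len: usize,
    /// Must be called exactly once, by whoever ends up owning the string.
    pub destroy: unsafe extern "C" fn(*mut u8, usize),
}

/// Frees bytes that were allocated by `export_string`.
unsafe extern "C" fn destroy_rust_bytes(ptr: *mut u8, len: usize) {
    drop(unsafe { Box::from_raw(ptr::slice_from_raw_parts_mut(ptr, len)) });
}

/// Producer side: give up ownership of a Rust `String`.
pub fn export_string(s: String) -> AbiString {
    let boxed: Box<[u8]> = s.into_bytes().into_boxed_slice();
    let len = boxed.len();
    let ptr = Box::into_raw(boxed) as *mut u8;
    AbiString { ptr, len, destroy: destroy_rust_bytes }
}

/// Consumer side: read the bytes, then release them through the carried destructor.
pub fn consume_string(s: AbiString) -> usize {
    // SAFETY: `ptr`/`len` were produced together by `export_string`.
    let bytes = unsafe { slice::from_raw_parts(s.ptr, s.len) };
    let chars = str::from_utf8(bytes).map_or(0, |t| t.chars().count());
    unsafe { (s.destroy)(s.ptr, s.len) };
    chars
}
```

The consumer never needs to know which allocator or language produced the string, only that it has to call `destroy` once.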
> It should extend the C ABI with things like strings, arrays, objects with a way to destruct them, and provide some safety guarantees.
> As an example, the windows world has COM, which is at the core pretty reasonable for its design constraints, even if gnarly sometimes.
Yeah, and we had CORBA. Gnome was originally not a DE - the acronym stood for Gnu Network Object Model Environment or similar.
I programmed in CORBA in the 90s. Other than being slower than a snail on weed, I liked it just fine. Maybe it's time for a resurgence of something similar, but without requiring that calls work across networks.
You'll find that all of these languages ultimately build FFI on top of C ABI conventions, though Swift's own internally stable ABI uses a lot of alloca() to place dynamically sized objects on the stack, in a way that's somewhat unidiomatic (the Rust folks are trying to back out of their alloca() equivalent). You can even interface to COM from pure C.
Dynamic linking is also great for compile time of debug builds. If a large library or application is split up into smaller shared libraries, ones unaffected by changes don't need to be touched at all. Runtime dynamic linking has a small overhead, but it's several orders of magnitude faster than compile-time linking, so not a problem in debug builds.
For developer turnaround time, it is huge. We explicitly do not statically link Ardour because as developers we are in the edit-compile-debug cycle all day every day, and speeding up the link step (which dynamic linking does dramatically, especially with parallel linkers like lld) is a gigantic improvement to our quality of life and productivity.
1) It can't be that replacing 20 C/C++ shared objects with 20 Rust shared objects results in 20 copies of the Rust standard library and other dependencies that those Rust libraries pull in. But, today, that is what happens. For some situations, this is too much of a memory usage regression to be tolerable.
2) If you really have 20 libraries calling into one another using C ABI, then you end up with manual memory management and manual buffer offset management everywhere even if you rewrite the innards in Rust. So long as Rust doesn't have a safe ABI, the upside of a Rust rewrite might be too low in terms of safety/security gained to be worth doing
Many Rust core/standard library functions are trivial and inlining them is not really a concern. For those that do involve significant amount of code, C ABI-compatible code could be exported from some .so dynamic object, with only a small safe wrapper being statically linked.
I found c ABI a bit too difficult in rust compared to c or zig. Mainly because of destructors. I am guessing c++ would be difficult in a similar way.
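The usual pattern, as far as I understand it, is to box the value and make the C side hand the pointer back so the destructor runs on the Rust side. A sketch with a made-up `Session` type and a counter so you can see when `Drop` actually fires:

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

/// Counts how many sessions have been dropped (for demonstration only).
pub static DROPS: AtomicUsize = AtomicUsize::new(0);

/// Hypothetical resource with a destructor.
pub struct Session {
    pub id: u32,
}

impl Drop for Session {
    fn drop(&mut self) {
        DROPS.fetch_add(1, Ordering::SeqCst);
    }
}

/// Hand ownership to the foreign caller; Drop will NOT run automatically after this.
pub extern "C" fn session_open(id: u32) -> *mut Session {
    Box::into_raw(Box::new(Session { id }))
}

/// The foreign caller must call this exactly once per opened session.
pub unsafe extern "C" fn session_close(s: *mut Session) {
    if !s.is_null() {
        // Rebuilding the Box lets Rust run `Drop` and free the allocation.
        drop(unsafe { Box::from_raw(s) });
    }
}
```

Forgetting `session_close` leaks, and calling it twice is a double free, which is the manual part that makes this feel harder than it does in C or Zig.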
Also, unsafe rust has always-on strict aliasing, which makes writing code difficult unless you do it in certain ways.
Having glue libraries like pyo3 makes it good in rust. But that introduces bloat and other issues. This has been the biggest issue I had with rust, it is too hard to write something so you use a dependency. And before you know it, you are bloating out of control
Not really. The foreign ABI requires a foreign API, which adds friction that you don't have with C exporting a C API / ABI. I've never tried, but I would guess that it adds a lot of friction.
COM is interesting as it implements interfaces using the C++ vtable layout, which can be done in C. Dynamic COM (DCOM) is used to provide interoperability with Visual Basic.
You can also access .NET/C# objects/interfaces via COM. It has an interface to allow you to get the type metadata but that isn't necessary. This makes it possible to e.g. get the C#/.NET exception stack trace from a C/C++ application.
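A heavily simplified sketch of the vtable-pointer-first layout, written in Rust with `repr(C)`. This is not real COM (no IUnknown, no QueryInterface, no reference counting, and the calling convention is plain C); all names are invented.

```rust
/// Table of function pointers shared by every `Counter` instance.
#[repr(C)]
pub struct CounterVtbl {
    pub increment: unsafe extern "C" fn(*mut Counter) -> u32,
    pub release: unsafe extern "C" fn(*mut Counter),
}

/// Object layout: the vtable pointer comes first, as in COM-style interfaces.
#[repr(C)]
pub struct Counter {
    pub vtbl: *const CounterVtbl,
    pub value: u32,
}

unsafe extern "C" fn counter_increment(this: *mut Counter) -> u32 {
    unsafe {
        (*this).value += 1;
        (*this).value
    }
}

unsafe extern "C" fn counter_release(this: *mut Counter) {
    drop(unsafe { Box::from_raw(this) });
}

static COUNTER_VTBL: CounterVtbl = CounterVtbl {
    increment: counter_increment,
    release: counter_release,
};

/// Create an object; the caller owns it until it calls `release` through the vtable.
pub fn counter_new() -> *mut Counter {
    Box::into_raw(Box::new(Counter { vtbl: &COUNTER_VTBL, value: 0 }))
}

/// What a caller in any language does: load the vtable, call through it.
pub unsafe fn call_increment(obj: *mut Counter) -> u32 {
    unsafe { ((*(*obj).vtbl).increment)(obj) }
}

/// Destroy the object through its own vtable.
pub unsafe fn call_release(obj: *mut Counter) {
    unsafe { ((*(*obj).vtbl).release)(obj) }
}
```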
>Dynamic COM (DCOM) is used to provide interoperability with Visual Basic.
DCOM is Distributed COM not Dynamic COM[1].
COM does have an interface for dynamic dispatch called IDispatch[2] which is used for scripting languages like VBScript or JScript. It isn't required for Visual Basic though. VB is compiled and supports early binding to interfaces.
Eh, some people can work on moving to Rust, while others work on adding dynamic linking to Rust.
Or maybe we can somehow get used to living with static linking. (I don't think so, but many seem to think so in spite of my advice to the contrary!)
Another possibility is to use IPC as the dynamic linking boundary of sorts, but this will consume lots more memory, and as is stated elsewhere in this thread, memory ain't cheap no more.
Porting the JS parser to Rust and adopting Rust in other parts of the engine while continuing to use C++ heavily is unlikely to make Ladybird meaningfully more secure.
Attackers are surprisingly resilient to partial security.
Even the specific example in the article, the non-determinism was treated as a bug and was fixed (since by 2004 that was definitely a regression - we put a lot of work in, in the mid to late 1990s, to get bit level reproducibility - and even before that, those little details like timestamps were still deterministic variations, we had binary diff tools that could filter them out.)
Agree with author - it's especially scary that even without getting hacked, openclaw did something harmful
That's not to say that prompt injection isn't also scary. It's just that software getting hacked by bad actors has always been a thing. Software doing something scary when no human did anything malicious is worse.
- the clanker comes back with a shell command that fits your yo command and fills it in as if you had retrieved it from your shell history by pressing the up arrow
- you have to press enter to actually execute the command. Or you could edit the command just like you can edit commands retrieved from your shell history.
I personally find this approval flow to spark more joy than what the other agent TUIs and CLIs do - they usually pop a modal menu dialog with yes/no/something else. And that’s jarring, because modality is a jarring UX. What yosh does feels groovy because it is so much like just retrieving something from history, or like a speedrun of opening a browser, asking Google or a clanker, and copy pasting.
It’s weird how whiny this post is. Like there’s zero intellectual curiosity about why C got this way, and why C gets to be the foundation for how systems software is written.
I could write a whole essay about why, but now isn’t the time. I’m just going to enjoy the fact that TFA and the author don’t get it.
> why C gets to be the foundation for how systems software is written.
Is there an answer here more interesting than "it's what Unix and Windows were written in, so that's how programs talked to the OS, and once you have an interface, it's impossible to change"?
It wasn't a coincidence, or an accident. C was specifically designed to write Unix, by people who had experience with a lot of other computer languages, and had programmed other operating systems including Multics and some earlier versions of Unix. They knew exactly what they were doing, and exactly what they wanted.
I'm not sure what you mean by "coincidence" or "accident" here.
C is a pretty OK language for writing an OS in the 70s. UNIX got popular for reasons I think largely orthogonal to being written in C. UNIX was one of the first operating systems that was widely licensed to universities. Students were obliged to learn C to work with it.
If the Macintosh OS had come out first and taken over the world, we'd probably all be programming in Object Pascal.
When everyone wanted to program for the web, we all learned JavaScript regardless of its merits or lack thereof.
I don't think there's much very interesting about C beyond the fact that it rode a platform's coattails to popularity. If there is something interesting about it that I'm missing, I'd definitely like to know.
> Operating systems have to deal with some very unusual objects and events: interrupts; memory maps; apparent locations in memory that really represent devices, hardware traps and faults; and I/O controllers. It is unlikely that even a low-level model can adequately support all of these notions or new ones that come along in the future. So a key idea in C is that the language model be flexible; with escape hatches to allow the programmer to do the right thing, even if the language designer didn't think of it first.
This. This is the difference between C and Pascal. This is why C won and Pascal lost - because Pascal prohibited everything but what Wirth thought should be allowed, and Wirth had far too limited a vision of what people might need to do. Ritchie, in contrast, knew he wasn't smart enough to play that game, so he didn't try. As a result, in practice C was considerably more usable than Pascal. The closer you were to the metal, the greater C's advantage. And in those days, you were often pretty close to the metal...
Later, on page 60:
> Much of the C model relies on the programmer always being right, so the task of the language is to make it easy what is necessary... The converse model, which is the basis of Pascal and Ada, is that the programmer is often wrong, so the language should make it hard to say anything incorrect... Finally, the large amount of freedom provided in the language means that you can make truly spectacular errors, far exceeding the relatively trivial difficulties you encounter misusing, say, BASIC.
Also true. And it is true that the "Pascal model" of the programmer has quite a bit of truth to it. But programmers collectively chose freedom over restrictions, even restrictions that were intended to be for their own good.
The irony is that all wannabe C and C++ replacements are exactly the "Pascal model" brought back into the 21st century, go figure.
"A consequence of this principle is that every occurrence of every subscript of every subscripted variable was on every occasion checked at run time against both the upper and the lower declared bounds of the array. Many years later we asked our customers whether they wished us to provide an option to switch off these checks in the interests of efficiency on production runs. Unanimously, they urged us not to; they already knew how frequently subscript errors occur on production runs where failure to detect them could be disastrous. I note with fear and horror that even in 1980 language designers and users have not learned this lesson. In any respectable branch of engineering, failure to observe such elementary precautions would have long been against the law."
-- C.A.R. Hoare's "The 1980 ACM Turing Award Lecture"
The thing for me at least is that when I looked at Pascal, MODULA-2, Ada, if you had complex data structures which had to allocate and deallocate memory, then those languages would not help at all. They would allow you to make pointer mistakes. Pascal and MODULA-2 were also very restrictive in various areas (no generics). Ada is better in that respect, but Ada compilers were rare.
In my opinion it is only Rust that offers a language without runtime system requirement and fixes essentially all of the problems of C.
First of all, C did not have any generics, so same playing field.
C has a runtime, even if tiny. That is what calls into main(), handles floating point arithmetic when none is available, functions that run before and after main(), nowadays also does threading.
Heap memory handling in Pascal, Modula-2, and Ada is much safer than in C: there is no need to do math to calculate the right size, arenas are available in the standard library, dynamic allocation can also be managed by the compiler if desired (Ada), and pointers are safe because by default they must be used with existing data. However, if one really wants to do pointer arithmetic, it is available.
The only issue that they have in regards to C is use-after-free, but that apparently isn't an issue for folks moving away from C into Zig, which is basically Modula-2 with some C syntax flavour.
C uses pointer casts all over the place to fake generics. If you don't have that (in Pascal or MODULA-2) then life becomes very unpleasant.
There is quite a bit of C code that makes creative use of the size of allocations. For example, linked lists with a variable-sized payload. Again, one of the things that would prevent a C programmer from switching to Pascal.
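To make the variable-sized payload idea concrete, here is a minimal sketch using a C99 flexible array member. The names `struct node`, `node_push` and `node_free_all` are made up for illustration; the point is only that the header and the payload live in one allocation whose size is computed by hand.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical node type: list header and payload share one allocation. */
struct node {
    struct node *next;
    size_t len;            /* number of payload bytes that follow */
    unsigned char data[];  /* C99 flexible array member */
};

/* Allocate a node sized for exactly `len` payload bytes and push it on the
 * front of the list. Returns the new head, or NULL if malloc fails. */
struct node *node_push(struct node *head, const void *payload, size_t len)
{
    assert(payload != NULL || len == 0);
    struct node *n = malloc(sizeof *n + len);  /* header + payload */
    if (n == NULL)
        return NULL;
    n->next = head;
    n->len = len;
    if (len != 0)
        memcpy(n->data, payload, len);
    return n;
}

/* Free every node in the list. */
void node_free_all(struct node *head)
{
    while (head != NULL) {
        struct node *next = head->next;
        free(head);
        head = next;
    }
}
```

Nothing checks later accesses to `data` against `len`, which is exactly the kind of freedom (and risk) being discussed here.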
I don't expect the Zig user base to become larger than the Rust user base any time soon. But we have to wait and see, Zig is quite young.
> C uses pointer casts all over the place to fake generics.
By "C" do you mean users of C? Because in most of the C code I write I don't use those sorts of techniques; instead I just use the preprocessor to make scuffed generics.[1] Unless you mean in libc itself, where I don't recall any use of pointer casts like that? If I'm missing something, please enlighten me.
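For anyone who hasn't seen the preprocessor approach, a rough sketch of what it can look like is below. `DEFINE_MAX_OF` and the generated function names are hypothetical, not from any particular codebase; the macro stamps out one typed function per element type, so no `void *` casts are involved.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical macro: generates a typed "maximum of an array" function
 * named NAME for element type T. */
#define DEFINE_MAX_OF(T, NAME)                        \
    static T NAME(const T *items, size_t count)       \
    {                                                 \
        assert(items != NULL && count > 0);           \
        T best = items[0];                            \
        for (size_t i = 1; i < count; i++)            \
            if (items[i] > best)                      \
                best = items[i];                      \
        return best;                                  \
    }

/* One instantiation per type, each fully type-checked by the compiler. */
DEFINE_MAX_OF(int, max_of_int)
DEFINE_MAX_OF(double, max_of_double)
```

The trade-off compared with the `void *` style (as in `qsort`) is more generated code in exchange for type checking at each call site.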
Same tricks are possible in Modula-2, Pascal, Ada, if fake generics count.
Creative use of the size of allocations is also possible in those languages; the BIG difference is that it isn't the default way everything gets done.
In Pascal (not the original Pascal standard, but, say, Turbo Pascal), could you allocate a variable-sized array of something, and still have index protection when using it?
(I know quite well that C couldn't. Even a C++ vector may or may not, depending on which access method you use.)
It is often said that C became popular just because Unix was popular, due to being free -- it just "rode its coattails" as you put it.
As if you could separate Unix from C. Without C there wouldn't have been any Unix to become popular, there wouldn't have been any coattails to ride.
C gave Unix some advantages that other operating systems of the 1970s and 80s didn't have:
Unix was ported to many different computers spanning a large range of cost and size, from microcomputers to mainframes.
In Unix both the operating system and the applications were written in the same language.
The original Unix and C developers wrote persuasive books that taught the C language and demonstrated how to do systems programming and application programming in C on Unix.
Unix wasn't the first operating system to be written in a high-level language. The Burroughs OS was written in Algol, Multics was written in PL/I, and much of VMS was written in BLISS. None of those languages became popular.
In the 1970s and 80s, Unix wasn't universal in universities. Other operating systems were also widely used: Tenex, TOPS-10, and TOPS-20 on DEC-10s and 20s, VMS on VAXes. But their systems languages and programming cultures did not catch on in the same way as C and Unix.
The original Macintosh OS of the 1980s was no competitor to Unix. It was a single user system without integrated network support. Apple replaced the original Macintosh OS with a system based on a Unix.
> Unix wasn't the first operating system to be written in a high-level language. The Burroughs OS was written in Algol, Multics was written in PL/I, and much of VMS was written in BLISS. None of those languages became popular.
Of course, they weren't available as free beer with source tapes.
> Apple replaced the original Macintosh OS with a system based on a Unix.
Only because they decided to buy NeXT instead of Be.
Had they bought Be, that would not have been true at all.
> Of course, they weren't available as free beer with source tapes.
I think this was less important then, than people sometimes think.
I recall those days. In the 1980s and 90s I worked as a scientific programmer in a university department. Some of our software was commercialized and sold and supported as a product for a time in the 80s. Pardon the following long memoir, but I think some reporting on what actually happened then, as seen by even one participant, is pertinent.
We used a VAX with DEC's VMS operating system. Our application was developed in DEC Pascal (which didn't have the limitations of Standard Pascal because it used the DEC CLR, Common Language Runtime). Later on we began using Allegro Common Lisp for some things.
Through the 80s and early 90s, we never used Unix and C. And, we were not unusual, even in a university. Most of the VAXes at that university ran VMS (or one of the DEC-10/20 OS in the early 80s), including the computer science department (which began running Unix on some but not all systems later in the 80s). So Unix was not as pervasive in the 80s as some people seem to think.
About "free beer": running Unix on a VAX in the 1980s was definitely not "free", it was a major investment in time, effort, and yes, money (in the form of salaries). First, the OS wasn't a separate line item. You bought a bundled system including both the VAX hardware and the VMS OS. Then the DEC guy came and turned it on and it just worked. I don't even know how buying a bare VAX and installing your own OS worked. How did you handle DEC field service?
They required their own utilities that ran on VMS. If you used Unix, you needed an expert in Unix to install it and maintain it.
And it was no different with the early commercial Unixes. You bought a Sun workstation and it came with their Unix bundled (Solaris or whatever).
In the 1990s we switched from VAX/VMS to HP workstations that bundled HP-UX, their Unix. In all of these Unix platforms, Unix was bundled and you did pay for it, it was just included in the price.
I think there is some confusion about the history. The free, frictionless, install-it-and-run-it-yourself OS was not Unix in the 80s, it was Linux in the 1990s. By then C and Unix-like operating systems were well established.
Also, there was genuine admiration for Unix technical features, notably its simplicity and uniformity, even at sites like ours that didn't use it. There were several projects to give VMS a Unix-like userspace. There was a (yes) free Software Tools project (that was its name), and a commercial product called Eunice. People who had already paid for VMS paid more for Eunice to make VMS look like Unix.
Unix was a better platform for teaching CS than VMS or the other alternatives.
VMS did come with source code. It came on a huge stack of fiche cards, along with several pallet-loads of hardcopy documentation in binders.
There was nothing like the books The C Programming Language by K&R, or The Unix Programming Environment by Kernighan and Pike. Or the many Unix and C books that followed them. And then the college courses that used them.
Instead there were special courses in system programming and OS internals (separate courses) from DEC. The university would pay for them once in a while. A DEC expert would come for a week and programmers from all the VAX sites would get together all day every day in a classroom while they lectured. There was no textbook, but everyone got a huge binder of printed notes.
So systems programming on VMS, and I suppose other non-Unix platforms, remained an esoteric, inaccessible art, totally divorced from application programming, that used a programming language that was not used for anything else.
A few words comparing my experience programming in C in the 1990s to programming in DEC Pascal in the 80s: C wasn't much worse. The greater safety of Pascal did not make much difference in application programming. In Pascal, array-bounds errors etc. produced a crash with a traceback. In C similar errors produced a crash with a cryptic message like "segfault". But often the actual defect was far from the line that crashed, that appeared in the traceback, so the investigation and debugging was similar in both languages. But the more common (and often more difficult) errors that just computed the wrong answer were about the same in both languages.
My recollection of working in a similar environment was very different. The Comp Sci department wanted Unix but not for its own sake. They wanted access to the burgeoning software being produced for it aimed at academics. TeX/LaTeX was the biggest driver because it was the best way at the time to make a readable research paper that was heavy in math.
Then the students needed access to lex/yacc etc for their courses and X Windows too.
That we produced other Unix programs was just an artifact of the original drive to have Unix. The Compaq 386 or Macintosh II were niche products for that job and VMS had been turfed by the late eighties.
First to market is not necessarily the best, case in point: many video sites existed before Youtube, including ones based on Apple Quicktime. But in the end Flash won.
To me it looks like there is a better way to do things and the better one eventually wins.
> I'm not sure what you mean by "coincidence" or "accident" here.
I mean Unix had to be written in C, not in, say, Algol or PL/I or BLISS, high-level languages used to write other operating systems.
I also meant that the features of C were not put there by impulse or whim, they were the outcome of considered decisions guided by the specific needs of Unix.
> Although we entertained occasional thoughts about implementing one of the major languages of the time like Fortran, PL/I, or Algol 68, such a project seemed hopelessly large for our resources: much simpler and smaller tools were called for. All these languages influenced our work, but it was more fun to do things on our own.
They say right there that Fortran, PL/I, and Algol 68 were too big and complicated for Unix. Yes, if you are building a system, it is more productive to use a language that is built for purpose and pleasant to work with ("fun") than one you have to struggle against all the time.
They wanted to play and ignored other languages on purpose, that is all.
> Although we entertained occasional thoughts about implementing one of the major languages of the time like Fortran, PL/I, or Algol 68, such a project seemed hopelessly large for our resources: much simpler and smaller tools were called for. All these languages influenced our work, but it was more fun to do things on our own.
Pity that in regards to secure programming practices in C, the community also ignores the decisions of the authors.
> Although the first edition of K&R described most of the rules that brought C's type structure to its present form, many programs written in the older, more relaxed style persisted, and so did compilers that tolerated it. To encourage people to pay more attention to the official language rules, to detect legal but suspicious constructions, and to help find interface mismatches undetectable with simple mechanisms for separate compilation, Steve Johnson adapted his pcc compiler to produce lint [Johnson 79b], which scanned a set of files and remarked on dubious constructions.
Also to be noted that on Plan 9 they attempted to replace C with Alef for userspace, and while the experiment failed, they went with Limbo on Inferno, and also contributed to Go.
And that C compiler on Plan 9 is its own thing,
> The compiler implements ANSI C with some restrictions and extensions [ANSI90]. Most of the restrictions are due to personal preference, while most of the extensions were to help in the implementation of Plan 9. There are other departures from the standard, particularly in the libraries, that are beyond the scope of this paper.
Yes and no. Clearly what you said is true, but the more profound reason is that C just minimally reflects how computers work. The rest is just convention.
More concretely, I think the magic lies in these two properties:
1. Conservation of mass: the amount of C code you put in will be pretty close to the amount of machine code you get out. Aside from the preprocessor, which is very obviously expanding macros, there are almost no features of C that will take a small amount of code and expand it to a large amount of output. This makes some things annoyingly verbose to code in C (eg. string manipulation), but that annoyance is reflecting a true fact of machine code, which is that it cannot handle strings very easily.
2. Conservation of energy: the only work that will be performed is the code that you put into your program. There is no "supervisor" performing work on the side (garbage collection, stack checking, context switching), on your behalf. From a practical perspective, this means that the machine code produced by a C compiler is standalone, and can be called from any runtime without needing a special environment to be set up. This is what makes C such a good language for implementing garbage collection, stack checking, context switching, etc.
There are some exceptions to both of these principles. Auto-vectorizing compilers can produce large amounts of output from small amounts of input. Some C compilers do support stack checking (eg. `-fstack-check`). Some implementations of C will perform garbage collection (eg. Boehm, Fil-C). For dynamically linked executables, the PLT stubs will perform hash table lookups the first time you call a function. The point is that C makes it very possible to avoid all of these things, which has made it a great technology for programming close to the machine.
Some languages excel at one but not the other. Byte-code oriented languages generally do well at (1): for example, Java .class files are usually pretty lean, as the byte-code semantics are pretty close to the Java language. Go is also pretty good at (1). Languages like C++ or Rust are generally good at (2), but have much larger binaries on average than C thanks to generics, exceptions/panics, and other features. C is one of the few languages I've seen that does both (1) and (2) well.
This is a meme which is repeated often, but not really true. If you disagree, please state specifically what property of PDP-11 you think it different from how modern computers work, and where this affects C but not other languages.
In a nutshell, the useful fiction of computer-as-Von-Neumann-machine doesn’t adequately reflect the reality of modern hardware. Not only does the CPU itself not fit that model (with things like speculative execution, sophisticated power and load management…), but the system as a whole is increasingly an amalgamation of different processors and address spaces.
C compilers can emit SIMD instructions just fine and often have extensions to support writing it explicitly. Also, few other languages have explicit support for them from the start and most have added them as some kind of extension later. So the idea that this is some fundamental computer architecture thing C got wrong seems pretty far-fetched. Support for multi-core processing would be a more plausible thing to look at, but even there it seems that C still does quite well.
The things complained about in the article are not a minimal reflection of how computers work.
Take the "wobbly types" for example. It would have been more "minimal" to have types tied directly to their sizes instead of having short, int, long, etc.
There isn't any reason that compilers on the same platform have to disagree on the layout of the same basic type, but they do.
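A small sketch of the difference, assuming a hosted C11 compiler that provides the optional exact-width types from `<stdint.h>`. The struct and function names are hypothetical. The width of `long` is whatever the platform ABI says (commonly 64 bits on LP64 Unix targets and 32 bits on 64-bit Windows), while `int64_t` is pinned by its name.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical record using the "wobbly" basic types. */
struct sample_wobbly {
    long timestamp;     /* width depends on the platform ABI */
    int value;
};

/* The same record with widths tied to the type names. */
struct sample_fixed {
    int64_t timestamp;  /* exactly 64 bits wherever the type exists */
    int32_t value;      /* exactly 32 bits wherever the type exists */
};

/* Checked at compile time; this holds on any implementation with int64_t. */
static_assert(sizeof(int64_t) * CHAR_BIT == 64, "int64_t is 64 bits wide");

/* Storage bits used by each timestamp field on the current target. */
size_t wobbly_timestamp_bits(void) { return sizeof(long) * CHAR_BIT; }
size_t fixed_timestamp_bits(void) { return sizeof(int64_t) * CHAR_BIT; }
```

Fixed-width members only settle the size question; struct padding and alignment are still up to the ABI.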
The complaints about parsing header files could potentially be solved by an IDL that could compile to C header files and FFI definitions for other languages. It could even be a subset of C that is easier to parse. But nothing like that has ever caught on.
There were many different types of computers back then. Some even had 36 bit word sizes. I don't think there was any clear winner like amd64 back then that they could have prioritized. 16 and 32 bit machines existed in decent amounts and so on.
It seems to be a meme on HN that C doesn't reflect hardware, now you're extending that to assembly. It seems silly to me. It was always an approximation of what happens under the hood, but I think the concepts of pointers, variable sizes and memory layout of structs all represent the machine at some level.
For example, C has pointer provenance, so pointers aren't just addresses. That's why type punning is such a mess. If a language claims to be super close to the hardware, this seems like a very weird thing.
C is super close to the hardware in that it works exactly like the abstract C machine, which is kind of a generalization of the common subset of a lot of machines, invented to make it portable, i.e. viable to be implemented straightforwardly on various architectures. For example pointer provenance makes it work on machines with segmented storage, these can occur anywhere, so there is no guarantee that addresses beyond a single allocation are expressible or meaningful.
What makes C feel free for programming is that instead of prescribing an implementation paradigm, it instead exposes a computing model and then lets the programmer write whatever is possible with that (and also what is not -- UB). And a lot of higher level abstractions are quickly implemented in C, e.g. inheritance and polymorphism, but then they still allow to be used in ways you like, so you can not just do pure class inheritance, but get creative with a vtable, or just use another vtable with the same object. These are things you can't do when the classes are a language construct.
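A minimal sketch of the hand-rolled vtable idea, with made-up names (`struct shape`, `rect_vt`, `tri_vt`). Because the table pointer is an ordinary struct member, the same object can be pointed at a different table at run time, which is the kind of thing a built-in class construct usually won't let you do.

```c
#include <assert.h>
#include <stddef.h>

struct shape;  /* hypothetical object type */

/* The "class" is just a table of function pointers. */
struct shape_vtable {
    double (*area)(const struct shape *self);
    const char *(*name)(const struct shape *self);
};

struct shape {
    const struct shape_vtable *vt;  /* ordinary member, can be reassigned */
    double w, h;
};

static double rect_area(const struct shape *s) { return s->w * s->h; }
static const char *rect_name(const struct shape *s) { (void)s; return "rect"; }
static double tri_area(const struct shape *s) { return 0.5 * s->w * s->h; }
static const char *tri_name(const struct shape *s) { (void)s; return "triangle"; }

const struct shape_vtable rect_vt = { rect_area, rect_name };
const struct shape_vtable tri_vt = { tri_area, tri_name };

/* Dynamic dispatch: look the function up through the object's table. */
double shape_area(const struct shape *s)
{
    assert(s != NULL && s->vt != NULL);
    return s->vt->area(s);
}

const char *shape_name(const struct shape *s)
{
    assert(s != NULL && s->vt != NULL);
    return s->vt->name(s);
}
```

Swapping behaviour is then just `s.vt = &tri_vt;` on an existing object, with no change to its data.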
The C abstract machine is exactly the important part. There is a difference between saying C is close to "the hardware" and C is close to the C abstract machine. The latter like you described has a few concepts that allow for abstraction and thus portability but obviously they lead to situations where the "maps to the hardware" doesn't seem to hold true.
My gripe is only with people acting like the C abstract machine doesn't exist and C is just syntax sugar for a bit of assembly. It's a bit more involved than that.
> The C abstract machine is exactly the important part. ... My gripe is only with people acting like the C abstract machine doesn't exist and C is just syntax sugar for a bit of assembly. It's a bit more involved than that.
Most people have no understanding of an abstract machine though the very idea of a high-level programming language is based on it.
The C Language Standard itself specifies "Program Execution" only on an "Abstract Machine". Mapping that abstract machine to an ISA/Memory on real hardware is the task of the C compiler. It can do this in any manner as long as the observable behaviour of the program is "as-if" it ran on the abstract machine.
Relevant quote:
A conforming implementation executing a well-formed program shall produce the same observable behavior as one of the possible executions of the corresponding instance of the abstract machine with the same program and the same input.
> the concepts of pointers, variable sizes and memory layout of structs all represent the machine at some level.
Exactly.
Everything in assembly is still one-to-one in terms of functional/stateful behavior to actual execution. Runtime hardware optimization (pinhole instruction decomposition and reordering, speculative branching, automated caching, etc.) give a performance boost but do not change the model. Doing so would mean it didn't work!
And C is still very close to the assembly, in terms of basic operations. Even if a compiler is able to map the same C operations to different instructions (i.e. regular, SIMD, etc.)
You keep making these sorts of comments on various threads which tells me that perhaps you are not clear on the idea of an "Abstract machine" which underpins all high-level languages.
The gap between the "C Abstract Machine" and the actual Hardware underneath is smaller than most other high-level languages. This comment by user haberman puts it very nicely - https://news.ycombinator.com/item?id=46910015
Yes, most languages allow C type code, if that’s what you are trying to do.
Java with only primitive values, arrays, and classes only with fields and static methods.
But that wouldn’t be idiomatic Java, so typically non-explicit abstractions such as polymorphism have code generated for them that you don’t have explicit control over.
C is consistently low level because that’s all you get. Down to direct access to addressing and RAM, the stack frame, etc. as with assembly.
I am puzzled by the claim that C and assembly are not relatively close.
Note here “close” being used in the injective, not bijective, sense. (Scratch out “one-to-one” in my earlier comment.)
And “closer” lowers the bar here too. C isn’t simply decorated assembly. But closer to it.
And “close” being used informally. Arguments for closeness are several and strong (I think), but a bit of a hodgepodge.
In terms of non-bijectivity, for systems programming and performance choices C makes it easy to drop into assembly. But the former are uniquely application specific. And the latter doesn’t make the C version less like the assembly it maps onto - whether the compiler uses the more performant instructions for the context or not.
C’s convenient assembly inlining, and the handoff in both directions being smoothed by an assembly friendly model of the C code around it, are both a part of the “closeness”
But C is generally “close” to assembly, because its data types emphasize types handled natively, compound types reflect RAM layout, and pointers are explicit addresses to data and code. And those address values can be constructed and operated on just like any other data.
C is objectively closer to assembly than languages with strongly required abstractions. (E.g., Java classes, Lisp S-exp's/cons cells, etc.)
C is more “strictly closer” to assembly than languages with more optional abstractions, even if they also allow for relatively low level coding.
Functions could be viewed as a preferred abstraction, but they have a clear assembly level model accessible directly with pointer arithmetic. And they don’t get in the way of directly encoding custom argument passing schemes, and using goto’s and zero argument functions and tail calls as atomic assembly calls for function and jumps for continuations.
Types are a significant non-assembly abstraction, but are zero-cost in that they don't separate C from assembly, but C from C, as a code safety mechanism that is easily escaped.
It is often easy to add abstractions, via regular C, or macros, but you have to provide an explicit implementation for them in the source or compiled library.
(However, if macros, with their mixed logical, symbol, text and file “data” model, are viewed as C source instead of as a C source construction language, then C becomes a very wacky abstraction language with behavior and rules that look nothing like simple assembly.)
> I am puzzled by the claim that C and assembly are not relatively close.
Did anyone say that?
I think the point is not that it is not "close", but that C is not equivalent to ASM: C has its own abstractions, and there are things you can do in assembly that you can't express in C.
The other low level languages such as C++, Rust, Zig, ... are equally close since you can express the same things. In some respects they are even closer, since they have built-in support for things modern instruction sets can do that were not part of C's design (SIMD, threading, ...).
Modern languages also have extra abstractions that make programming easier without compromising on cost. There are more abstractions than in C, but they are also optional. (Just like you could use goto instead of a while or for loop, but you're happy these abstractions exist. We could also use function pointers in C++ instead of virtual functions, but why would we if the language provides tools that make programming easier, for the same result?)
> The other low level languages such as C++, Rust, Zig, ... are equally close since you can express the same things.
C is not just low level friendly, but low level out of the box. That is the level that all C must be written in, even when creating higher abstractions.
Some higher level languages are also low level friendly, not low level strict. Which is a kind of dual.
I would argue that what makes C lower level, is that it comes in at, or under, the low levels of other languages, and its high bar comes in much lower than the abstractions built into other languages.
Forth is a good candidate for being even lower level.
But if someone else doesn't see things that way, that is fine. It is just one lens for comparing languages.
> C is not just low level friendly, but low level out of the box. That is the level that all C must be written in
No, it is not:
- People use for/while loop, for example, instead of the "low level" 'goto'
- C compilers reason about pointer aliasing, assume operations don't overflow, etc., in order to optimise your code: what you write doesn't translate directly to assembly.
- Some low level operations cannot even be represented in pure C (without using __asm__ extension escape hatch)
There is no "C's convenient inline assembly": that is a vendor extension, if available, and its convenience could vary considerably.
The manipulation of memory by C programs is close semantically to the manipulation of memory by assembly programs. Memory accessed through pointers is similarly "external" to both assembly language and C programs.
The evaluation of C program code is not close to assembly language. C programs cannot reflect on themselves portably; features like parameter passing, returning, and allocating local storage during procedure activation, are not in the programming model.
C loses access to detailed machine state. Errors that machine language can catch, like overflows, division by zero and whatnot, are "undefined behavior". An assembly language program can easily add two integers together and then two more integers which include the carry out from the previous addition. Not so in C.
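As a rough illustration of the carry point: in portable C the carry flag is not visible, so a multi-word addition has to reconstruct it from unsigned wraparound. This is a sketch, not a claim about what any compiler emits; the `u128` type and `u128_add` function are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical 128-bit value built from two 64-bit limbs. */
typedef struct { uint64_t lo; uint64_t hi; } u128;

/* Adds two 128-bit values limb by limb and reports the final carry out. */
u128 u128_add(u128 a, u128 b, unsigned *carry_out)
{
    assert(carry_out != NULL);
    u128 r;
    r.lo = a.lo + b.lo;                  /* unsigned wraparound is well defined */
    uint64_t carry = (r.lo < a.lo);      /* wrapped iff the sum is below an operand */
    r.hi = a.hi + b.hi;
    unsigned c1 = (r.hi < a.hi);         /* carry out of the high limbs alone */
    r.hi += carry;
    unsigned c2 = (r.hi < carry);        /* carry out caused by the carry in */
    *carry_out = c1 | c2;
    return r;
}
```

An assembly programmer would write an add followed by an add-with-carry; here the same information is recovered with comparisons.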
Assembly language instruction set designs (with some exceptions) tend to bend over backwards to preserve the functioning of existing binary programs, by maintaining the illusion that instructions are executed in sequence as if there were no pipelining or speculative execution, or register renaming, etc.
Meanwhile, C compiler vendors bend over backwards to prove that code you wrote 17 years ago was wrong and make it fail. C is full of unspecified evaluation orders and various kinds of undefined behavior in just the basic evaluation model of its syntactic, built-in constructs; and then some more in the use of libraries.
In assembly language, you would never have doubt about the order of evaluation of arguments for a procedure.
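A small sketch of the argument-order issue, using a made-up `cursor` type. Writing `subtract(next_value(c), next_value(c))` leaves it unspecified which call runs first, so the usual workaround is to name the intermediate values.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical cursor over an int array, used only for illustration. */
typedef struct { const int *data; size_t pos; size_t len; } cursor;

/* Returns the current element and advances the cursor (a side effect). */
int next_value(cursor *c)
{
    assert(c != NULL && c->pos < c->len);
    return c->data[c->pos++];
}

int subtract(int a, int b) { return a - b; }

/* Separate statements fix the order of the two calls;
   passing both calls directly as arguments would not. */
int first_minus_second(cursor *c)
{
    int first = next_value(c);
    int second = next_value(c);
    return subtract(first, second);
}
```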
Even when it comes to memory, where C and assembly language agree on many points, there are some subtle ways C can screw you. In assembly language, you would never wonder whether copying a structure from one memory location to another included the alignment padding bits. In C you also don't have to wonder, if you use memcpy. Oh, but if you use memset to clear some memory which you don't touch afterward and which goes out of scope, the compiler can optimize that away, oops!
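For the memset case, one common workaround is to write through a volatile-qualified pointer so that the stores are treated as observable. This is only a sketch of that idiom; `wipe_bytes` is a made-up name, and where the platform provides them, `memset_s` (C11 Annex K, optional) or `explicit_bzero` (glibc and the BSDs) are the purpose-built alternatives.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: clears len bytes through a volatile pointer,
   an idiom used so the compiler does not drop the stores as dead. */
void wipe_bytes(void *buf, size_t len)
{
    assert(buf != NULL || len == 0);
    volatile unsigned char *p = buf;
    for (size_t i = 0; i < len; i++)
        p[i] = 0;
}
```

Typical use would be `wipe_bytes(key, sizeof key);` just before a buffer holding a secret goes out of scope.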
Maybe because the performance gain is just not there. Adding support for strings with explicit length everywhere is a huge amount of work. And then the question is whether such a string is like a Rust slice or something else.
And then the gain is close to zero because most filenames are short enough that there is almost no gain.
You need to do weird string operations: you certainly have a class somewhere that appends a zero to the end of a buffer, and you use that class exclusively for filenames.
You can't just pass a contiguous run of bytes; you have to convert it first.
Every single piece of software that needs to interact with the file system has to deal with this.
I'm not asking for a new string type. I'm asking to be free from null-terminated strings.
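Here is roughly what that conversion step looks like in C, as a sketch. `to_c_path` and `fopen_len` are hypothetical names, not existing APIs; the only real library calls are `memchr`, `malloc`, `memcpy`, `fopen` and `free`.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: copies len bytes and appends the NUL that fopen needs.
   Returns NULL on allocation failure or if the bytes contain an embedded NUL. */
char *to_c_path(const char *bytes, size_t len)
{
    assert(bytes != NULL);
    if (memchr(bytes, '\0', len) != NULL)
        return NULL;              /* an embedded NUL would silently truncate the name */
    char *path = malloc(len + 1);
    if (path == NULL)
        return NULL;
    memcpy(path, bytes, len);
    path[len] = '\0';
    return path;
}

/* Hypothetical length-based open, layered on the NUL-terminated fopen. */
FILE *fopen_len(const char *bytes, size_t len, const char *mode)
{
    char *path = to_c_path(bytes, len);
    if (path == NULL)
        return NULL;
    FILE *f = fopen(path, mode);
    free(path);
    return f;
}
```

Every call pays for an allocation and a copy just to add one byte, which is the overhead being complained about.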
In practice, with Rust the problem is somewhere else. There is a libc crate that wraps the Unix system calls, so no need to worry about that. What is a lot harder is that Unix filenames are not guaranteed to be UTF-8. So you can't convert to &str or String. At least, not without loss. So you have to keep this around as an OsString.
Even when you don't care about being cross-platform, you still need to rely on specific routines instead of having the most low level `ut8fopen(buffer, len, mode);`
My point is that I wish we had a new "standard OS-related API". I'm not even talking about introducing a span type or anything, just creating something far saner and committing to it going forward.
If I were to create my own OS and decided to eliminate null-terminated strings and keep it as tiny and efficient as possible, I would face so many issues: I could not reuse 99% of the existing code related to the file API, and I would need to work out how to properly parse the arguments to "main" without overhead, etc.
I am not sure what Filip's view on this is. But I'd like to point out the article from Stephen Kell linked below, which explains why C is an incredibly useful tool for systems programming and what distinguishes it from all other languages.
The author is upfront about their goals and motivations and explicitly acknowledges that other concerns exist. Calling it whiny is ungracious -- the author is letting some very human frustration peek through in their narrative.
Not everything has to be written with all the warmth and humanity of a UN subcommittee interim report on widget standardisation.
They tried to do something that probably would have looked like Copilot integration into Windows, and they chose not to do that, because they discovered that it sucked.
So, they failed in an internal sense, which is better than the externalized kind of failure that Microsoft experienced.
I think that the nut that hasn't been cracked is: how do you get LLMs to replace the OS shell and core set of apps that folks use? I think Microsoft is trying by shipping stuff that sucks and pissing off customers, while Apple tried internally and declined to ship it. OpenClaw might be the most interesting stab in that direction, but even that doesn't feel like the last word on the subject.
The CEO just has to have followership: the people who work there have to think that this is a good person to follow. They don't even have to "like" him.
No it isn't, it has a fixed number of yields, which has a very different duration on various CPUs
> Threads wait (instead of spinning) if the lock is not available immediately-ish
They use parking lots, which is one way to do futex (in fact, WaitOnAddress is implemented similarly). And no if you read the code, they do spin. Worse, they actually yield the thread before properly parking.
> No it isn't, it has a fixed number of yields, which has a very different duration on various CPUs
You say this with zero data.
I know that yielding 40 times is optimal for WebKit because I measured it. In fact it was re-measured many times because folks like you would doubt that it could be optimal, suggest something different, and then again the 40 yields would be shown to be optimal.
> And no if you read the code, they do spin. Worse, they actually yield the thread before properly parking.
Threads wait if the lock is not available immediately-ish.
Yes, they spin by yielding. Spinning by pausing or doing anything else results in worse performance. We measured this countless times.
I think the mistake you’re making is that you’re imagining how locks work. Whereas what I am doing is running rigorous experiments that involved putting WebKit through larger scale tests.
>> No it isn't, it has a fixed number of yields, which has a very different duration on various CPUs
> You say this with zero data.
Wouldn't the null hypothesis be that the same program behaves differently on different CPUs?
Is "different people require different amounts of time to run 100m" a statement that requires data?
For reference, golang's mutex also spins up to 4 times before parking the goroutine on a semaphore. A lot less than the 40 times in the WebKit blog post, but I would definitely consider spinning an appropriate amount before sleeping to be common practice for a generic lock. Granted, as they have a userspace scheduler things do differ a bit there, but most concepts still apply.
The guy you replied to wrote the locking code. If you’re so certain they’re doing it wrong, would it not be easier to just prove it? It’s only one file, and they already have benchmarking set up.
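For readers who haven't seen the pattern, here is a bare-bones sketch of "spin by yielding a bounded number of times, then block", written against a plain pthread mutex. This is not how WebKit's lock or Go's mutex is implemented (the thread says WebKit parks waiters in a parking lot); `lock_spin_then_block` is a made-up name and `SPIN_LIMIT` is an assumed budget borrowed from the 40 yields mentioned above.

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stddef.h>

/* Assumed spin budget; the right value is workload-specific and must be measured. */
enum { SPIN_LIMIT = 40 };

/* Tries the lock up to SPIN_LIMIT times, yielding the CPU between attempts,
   then falls back to a blocking lock. Returns the number of yields performed. */
int lock_spin_then_block(pthread_mutex_t *m)
{
    assert(m != NULL);
    for (int i = 0; i < SPIN_LIMIT; i++) {
        if (pthread_mutex_trylock(m) == 0)
            return i;             /* acquired during the spin phase */
        sched_yield();            /* give other threads a chance to release the lock */
    }
    pthread_mutex_lock(m);        /* spin budget exhausted: block until available */
    return SPIN_LIMIT;
}
```

Note that how long a yield actually takes depends on the OS scheduler and the machine, which is exactly what the two sides here are arguing about.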
I mean my "No it isn't, it has a fixed number of yields, which has a very different duration on various CPUs" can be verified directly by having a look at the table in my article showing different timings for pause.
For the yield part, I already linked to the part that shows that. Yes it doesn't call yield if it sees others are parked, but on quick lock/unlock of threads it happens that it sees nobody parked and fails, yielding directly to the OS. This is not frequent, but frequent enough that it can introduce delay issues.