Hacker News | scottlamb's comments

I'd think so. Stored procedures let you do multi-statement sequences in fewer round trips. In 2026 larger systems are as likely as ever to run PostgreSQL on a different machine (or machines) than the application server. While latency between the two generally goes down over time, it's still not nothing. You may care about the latency of individual operations or the throughput impact of latency while holding a lock (see Amdahl's law).

Of course, the reasons not to use stored procedures still apply. They're logic, but they're versioned with the database schema, not with your application, which can be a pain.
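To put numbers on the lock-latency point (all figures hypothetical), a back-of-envelope sketch: throughput on a contended lock is bounded by the inverse of the lock hold time, and each extra round trip while holding the lock stretches that hold time.

```python
# Hypothetical numbers: how round trips while holding a lock cap throughput.
def max_lock_throughput(statements: int, rtt_ms: float, work_ms: float) -> float:
    """Upper bound on transactions/sec through one contended lock."""
    hold_time_ms = statements * rtt_ms + work_ms
    return 1000.0 / hold_time_ms

# 5 statements at 1 ms round trip each, plus 0.5 ms of server-side work:
multi_stmt = max_lock_throughput(5, 1.0, 0.5)   # ~182 tx/s
# Same logic in one stored-procedure call (one round trip):
stored_proc = max_lock_throughput(1, 1.0, 0.5)  # ~667 tx/s
```

Even sub-millisecond network latency multiplies quickly under this kind of serialization, which is the Amdahl's-law angle above.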


A supporting point and a counterpoint:

* Good database drivers will let you pipeline multiple queries concurrently (esp. in languages with async support), effectively eliminating the _N_x roundtrip cost (you can even execute them in parallel if you use multiple connections, not that I recommend doing that). But obviously this is only doable where the queries are independent of one another; I use this mainly to perform query splitting efficiently if the join key is already known.

* These days databases are often effectively versioned alongside the code anyway, at least for smaller projects that "own" the database, eliminating the biggest issue with stored procedures.
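The pipelining point in the first bullet can be simulated without a real driver: issued concurrently, N independent queries cost roughly one round trip instead of N. A toy asyncio sketch (the 50 ms `fake_query` latency is invented):

```python
import asyncio
import time

RTT = 0.05  # pretend each query costs one 50 ms round trip

async def fake_query(sql: str) -> str:
    await asyncio.sleep(RTT)  # stand-in for network + execution latency
    return f"rows for {sql!r}"

async def main() -> float:
    start = time.monotonic()
    # Independent queries issued concurrently share one "round trip".
    results = await asyncio.gather(
        fake_query("SELECT * FROM users WHERE id = 1"),
        fake_query("SELECT * FROM orders WHERE user_id = 1"),
        fake_query("SELECT * FROM audit WHERE user_id = 1"),
    )
    assert len(results) == 3
    return time.monotonic() - start

elapsed = asyncio.run(main())
print(f"3 queries in {elapsed * 1000:.0f} ms")  # ~1 RTT, not ~3
```

The same shape works with a real async driver as long as the queries don't depend on each other's results.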


It'd be interesting to see a version of this that tries all the different interleavings of PostgreSQL operations between the two (or N) tasks. https://crates.io/crates/loom does something like this for Rust code that uses synchronization primitives.

Interesting! The barrier approach is more targeted: you specify the exact interleaving you want to test rather than exploring all of them. The trade-off is that you need to know which interleavings matter, but you get deterministic tests that run against a real database instead of a simulated runtime. Exploring exhaustive interleaving testing against a real Postgres instance could be a fun follow-up - I'd be curious whether it's practical.

I think you could still do it against a real database—you're already setting it up to a known state before each test, right? Obviously there'd be more runs but I'd expect (hope) that each task would be sufficiently small that the number of permutations would stay within reason.
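On permutations staying within reason: the interleavings of two tasks with m and n steps, preserving each task's internal order, number C(m+n, m) - so two 3-step transactions give only 20 schedules. A sketch of the enumeration (step names are made up):

```python
from math import comb

def interleavings(a, b):
    """Yield every interleaving that preserves each task's own step order."""
    if not a:
        yield list(b)
        return
    if not b:
        yield list(a)
        return
    for rest in interleavings(a[1:], b):
        yield [a[0]] + rest
    for rest in interleavings(a, b[1:]):
        yield [b[0]] + rest

t1 = ["t1:SELECT", "t1:UPDATE", "t1:COMMIT"]
t2 = ["t2:SELECT", "t2:UPDATE", "t2:COMMIT"]
schedules = list(interleavings(t1, t2))
assert len(schedules) == comb(6, 3)  # 20 schedules for two 3-step tasks
```

Even two 6-step tasks are only C(12, 6) = 924 runs, so resetting the database before each one seems tractable.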

There would be some challenges for sure. Likely optimistic concurrency patterns would require an equivalent of loom's `yield_now` [1] to avoid getting stuck. And you'd probably need a way to detect one transaction waiting for another's lock, to get out of situations like your update lock vs barrier example - I vaguely recall PostgreSQL exposes that via a system catalog (`pg_locks`, I think).

[1] https://docs.rs/loom/0.7.2/loom/#yielding


Yeah, the more I think about it, the more exciting this idea gets. The walkthrough in the article shows exactly why - I intentionally place the barrier between the SELECT and UPDATE (to later show why that placement is wrong), which deadlocks instead of triggering the race. Getting the placement right requires reasoning about where the critical interleaving point is. An exhaustive approach would surface all the outcomes automatically: this placement deadlocks, this one exposes the bug, this one passes. That would remove the hardest part of writing these tests.

Martin Kleppmann has this tool that's quite relevant: https://martin.kleppmann.com/2014/11/25/hermitage-testing-th...

Oh that is super cool. Great prior art to study in combo with Loom. Very excited to dig in - imagine if there was an easy-to-use data race tester where you didn't have to figure out the interleaving points up front? Just point it at your code and let it find them. Exciting.

Loom does exhaustive search, with clever methods to prune it. On real world programs, you have to set a limit to that because it obviously grows extremely quickly even with the pruning.

I've built something similar to Loom, except it's more focused on extensively modeling the C++11/Rust memory model (https://github.com/reitzensteinm/temper). My experience is that fairly shallow random concurrent fuzzing yields the vast majority of all concurrency bugs.

Antithesis (https://antithesis.com/) are probably the leaders of the pack in going deeper.


Do you know you’re just talking to an LLM? Everyone else in this post also seems oblivious to it, or maybe they just don’t care? Why do I even read comments anymore, sigh.

i have a python library that does it: https://github.com/andreycizov/python-race

It uses generators and their yield as the yield point (and supports running arbitrary functions under a debugger)
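A minimal sketch of the generator trick (this is the idea, not python-race's actual API): each task yields at its interleaving points, and a driver runs the tasks in a chosen schedule, here exposing a classic lost update:

```python
counter = {"n": 0}

def increment_task():
    # The yield marks the interleaving point between read and write.
    read = counter["n"]
    yield
    counter["n"] = read + 1

def run(schedule):
    """Drive generator tasks in the step order given by a list of task ids."""
    tasks = {}
    for tid in schedule:
        if tid not in tasks:
            tasks[tid] = increment_task()
        try:
            next(tasks[tid])
        except StopIteration:
            pass  # task finished its last step

counter["n"] = 0
run(["a", "a", "b", "b"])  # serial schedule: both increments land
assert counter["n"] == 2

counter["n"] = 0
run(["a", "b", "a", "b"])  # interleaved reads: one update is lost
assert counter["n"] == 1
```

Enumerating schedules as in the interleaving sketch above, then feeding each to `run`, gives an exhaustive checker for small tasks.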


> We had a discussion that it would be easy to determine who was passing by at what times due to these especially when you can "de-anonymize" the data for example link it to a numberplate.

You could also read the numberplate directly with OpenALPR. It can be finicky to set up a camera to do this reliably in all conditions (particularly at night and high speed) but once done you could detect any car passing, not just ones with wifi access points.

When the law requires us to have numberplates, I think this just has to be considered public information for anyone who is nearby or can leave a camera nearby. It's not ideal to leak it in additional forms that might be easier for people to grab (say, with an ESP32), but it's a matter of degree rather than of kind.

But yeah, I'm with you on some of these others, particularly the medical devices. That's not great.


There's a difference between public and Public. I go outside with my face visible and I don't mind if my neighbors see me. I do mind if my neighbors stand outside my door with a notepad sketching faces every time they see me or anyone else, especially if they're selling the data. Systematic tracking that isn't subject to the constraints of human memory and apathy fundamentally changes the equation.

> Systematic tracking that isn't subject to the constraints of human memory and apathy fundamentally changes the equation.

I definitely don't approve of mass collection across many cameras, accessible to who-knows-who with minimal if any privacy controls (Flock). But it wouldn't surprise or bother me if my next-door neighbor had ALPR enabled, as long as it's not part of that cloud. YMMV.

Full disclosure: I develop an open source home/hobbyist-oriented NVR, although it doesn't have an ALPR feature or any other analytics today.


> constraints of human memory and apathy

i like that a lot, brother, thank you!


> I have an uninterruptible power supply (UPS). It is basically a surge protector with a big battery in it. So if the power goes out, it automatically falls back to the battery and you can still squeeze another X hours of juice out of it until the main power comes back on.

Often just minutes if you're running them near their rated power. Conventional UPSs are generally designed to power your devices just long enough to shut down your computer "safely" [1] or to start a generator. They advertise power ratings but typically not battery capacity at all, and that's because it sucks.

2026 update: don't buy a (conventional, lead-acid battery) UPS. Buy a LiFePO4 power station instead. They're actually designed to keep your devices running for hours without mains power. They used to not fail over quickly enough to avoid a typical machine going down (briefly), but now they commonly advertise 10 ms to 20 ms switchover. Also, you don't need to replace the batteries nearly as often: once every 10 years instead of once every 3 years. And the price has really fallen recently. LiFePO4 is (unlike some other lithium-ion chemistries) known as a particularly safe chemistry, so you don't have to worry about fire risk.

[1] This matters if you have crappy software and/or hardware that loses data if shut down uncleanly. If you use modern SSD/HDDs that flush their write caches when asked to, modern journaled filesystems at their default settings, and modern databases like SQLite or PostgreSQL at their default settings, you should be fine just pulling the power plug any time you feel like it.
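Back-of-envelope runtime math behind "often just minutes" (all numbers hypothetical): runtime ≈ battery energy × inverter efficiency ÷ load.

```python
def runtime_minutes(battery_wh: float, load_w: float, efficiency: float = 0.85) -> float:
    """Approximate runtime on battery; efficiency covers inverter losses."""
    return battery_wh * efficiency / load_w * 60

# Typical small lead-acid UPS: ~100 Wh of battery behind a 600 W power rating.
print(round(runtime_minutes(100, 300)))   # ~17 min at half the rated load
# A 1 kWh LiFePO4 power station under the same 300 W load:
print(round(runtime_minutes(1024, 300)))  # ~174 min, i.e. nearly 3 hours
```

The asymmetry is the point: UPS vendors size the battery for a graceful shutdown window, not for riding out an outage.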


What's the bar here? Does anyone say "we don't know if Einstein could do this because we were really close or because he was really smart?"

I by no means believe LLMs are general intelligence, and I've seen them produce a lot of garbage, but if they could produce these revolutionary theories from only <= year 1900 information and a prompt that is not ridiculously leading, that would be a really compelling demonstration of their power.


> Does anyone say "we don't know if Einstein could do this because we were really close or because he was really smart?"

It turns out my reading is somewhat topical. I've been reading Rhodes' "The Making of the Atomic Bomb", and one of the things he takes great pains to argue (I was not quite anticipating how much I'd be trying to recall my high school science classes to make sense of his account of various experiments) is that the development toward the atomic bomb was more or less inexorable: if at any point someone said "this is too far; let's stop here", there would be others to take his place. So, maybe, to answer your question.


It’s been a while since I read it, but I recall Rhodes’ point being that once the fundamentals of fission in heavy elements were validated, making a working bomb was no longer primarily a question of science, but one of engineering.


Engineering began before they were done with the experimentation and theorizing part. But the US, the UK, France, Germany, the Soviets, and Japan all had nuclear weapons programs with different degrees of success.


> Does anyone say "we don't know if Einstein could do this because we were really close or because he was really smart?"

Yes. It is certainly a question whether Einstein was one of the smartest people who ever lived, or whether all of his discoveries were already in the Zeitgeist and would have been discovered by someone else within ~5 years.


Both can be true?

Einstein was smart and put several disjointed things together. It's amazing that one person could do so much, from explaining Brownian motion to explaining the photoelectric effect.

But I think that all these would have happened within _years_ anyway.


> Does anyone say "we don't know if Einstein could do this because we were really close or because he was really smart?"

Kind of: how long would it realistically have taken for someone else (also really smart) to come up with the same thing if Einstein hadn't been there?


But you're not actually questioning whether he was "really smart", which was what GP was questioning. Sure, you can try to quantify the level of smarts, but you can't call it a "stochastic parrot" anymore, just like you wouldn't respond to Einstein's achievements with "Ah well, in the end I'm still not sure he's actually smart, like I am, for example. Could just be that he's dumbly but systematically going through all options, working it out step by step; nothing I couldn't achieve (or even better, program a computer to do) if I'd put my mind to it."

I personally doubt that this would work. I don't think these systems can achieve truly ground-breaking, paradigm-shifting work. The homeworld of these systems is the corpus of text on which it was trained, in the same way as ours is physical reality. Their access to this reality is always secondary, already distorted by the imperfections of human knowledge.


Well, we know many watershed moments in history were more a matter of situation than the specific person - an individual genius might move things by a decade or two, but in general the difference is marginal. True bolt-out-of-the-blue developments are uncommon, though all the more impressive for that fact, I think.


It probably is. I think the same thing happened when Randall Munroe (of xkcd fame) gave a talk at Google. I was there, it was crowded, and Don Knuth showed up. 90% sure he sat on the floor.


Friends and I nabbed front-row seats to the Munroe talk; after a time we were asked to take seats a few rows back to make room for Knuth and others. He definitely did not sit on the floor.


Well, that shows what my 90% sure memory is worth. I sit corrected.


FWIW the XKCD talk at Google is here (wow, 18 years ago! I remember watching this video when it was posted): https://www.youtube.com/watch?v=zJOS0sV2a24 (Knuth comes up to ask a question at 21:30) (Can't tell from the video where he was sitting otherwise, though there are definitely at least some people sitting on the floor.)


I would definitely give up my seat to Don Knuth.


In theory I would too, but I was also on the floor, and believe it or not I didn't notice Don Knuth was there until after the talk had started.


> When Jeff Dean goes on vacation, production services across Google mysteriously stop working within a few days. This is actually true. ... It's not clear whether this fact is really true, or whether this line is simply part of the joke, so I've omitted the usual (TRUE) identifier here. Interpret this as you see fit :)

I think this one's true-ish. Back in the day when Google didn't have good cron services for the corp and production domains [1], Jeff Dean's workstation ran a job that made something called (iirc) the "protocol buffer debug database". Basically, a big file (probably an sstable) with compiled .proto introspection data for a huge number of checked-in protobufs. You could use it to produce human-readable debug output from what was otherwise a fairly indecipherable blob. I don't think it was ever intended for production use, but some things that shouldn't have depended on it ended up using it anyway. I think after Jeff had been on vacation for a while, his `prodaccess` credentials expired, the job stopped working, maybe the output became unavailable, and some things broke.

Here's a related story I know is true: when I was running Google Reader, I got paged frequently for Bigtable replication delay, and I eventually traced it to trouble accessing files that shared GFS chunkservers with this database. I mentioned it on some mailing list, and almost immediately afterward Jeff Dean CCed me on a code review changing the file's replication from r=3 to r=12. The problem went away.

[1] this lasted longer than you would expect


Ha, I also recall this fact about the protobuf DB after all these years

Another Jeff Dean fact should be "Russ Cox was Jeff Dean's intern"

This was either 2006 or 2007, whenever Russ started. I remember when Jeff and Sanjay wrote "gsearch", a distributed grep over google3 that ran on 40-80 machines [1].

There was a series of talks called "Nooglers and the PDB" I think, and I remember Jeff explained gsearch to maybe 20-40 of us in a small conference room in building 43.

It was a tiny and elegant piece of code -- something like ~2000 total lines of C++, with "indexer" (I think it just catted all the files, which were later mapped into memory), replicated server, client, and Borg config.

The auth for the indexer lived in Jeff's home dir, perhaps similar to the protobuf DB.

That was some of the first "real Google C++ distributed system" code I read, and it was eye opening.

---

After that talk, I submitted a small CL to that directory (which I think Sanjay balked at slightly, but Jeff accepted). And then I put a Perforce watch on it to see what other changes were being submitted.

I think the code was dormant for a while, but later I saw that someone named Russ Cox had started submitting a ton of changes to it. That became the public Google Code Search product [2]. My memory is that Russ wrote something like 30K lines of google3 C++ in a single summer, and then went on to write RE2 (which I later used in Bigtable, etc.)

Much of that work is described here: https://swtch.com/~rsc/regexp/

I remember someone telling him on a mailing list something like "you can't just write your own regex engine; there are too many corner cases in PCRE"

And many people know that Russ Cox went on to be one of the main contributors to the Go language. After the Code Search internship, he worked on Go, which was open sourced in 2009.

---

[1] Actually, I wonder if today this could perform well enough on a single machine with 64 or 128 cores. Back then I think the prod machines had something like 2, 4, or 8 cores.

[2] This was the trigram regex search over open source code on the web. Later, there was also the structured search with compiler front ends, led by Steve Yegge.
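A toy version of the trigram approach from [2], in the spirit of Russ's swtch.com write-ups: index every 3-gram of each document, intersect the posting lists for the query's trigrams, then confirm candidates with a real scan. (The real thing extracted trigrams from the regex itself; this sketch handles only plain substrings.)

```python
from collections import defaultdict

def trigrams(s):
    return {s[i:i + 3] for i in range(len(s) - 2)}

def build_index(docs):
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for t in trigrams(text):
            index[t].add(doc_id)
    return index

def search(index, docs, literal):
    """Intersect posting lists, then verify candidates with a real scan."""
    candidates = None
    for t in trigrams(literal):
        candidates = index[t] if candidates is None else candidates & index[t]
        if not candidates:
            return set()
    # literals shorter than 3 chars have no trigrams: fall back to full scan
    return {d for d in (candidates if candidates is not None else docs)
            if literal in docs[d]}

docs = {1: "func main() {}", 2: "def main():", 3: "int main(void)"}
idx = build_index(docs)
assert search(idx, docs, "main(") == {1, 2, 3}
assert search(idx, docs, "void") == {3}
```

The index only narrows the candidate set; the final substring check is what makes results exact, which is also why it generalizes to regexes.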


Side note: I used this query to test LLM recall: "Do Jeff Dean and Russ Cox know each other?"

Interesting results:

1. Gemini pointed me back at MY OWN comment, above, an hour after I wrote it. So Google is crawling the web FAST. It also pointed to: https://learning.acm.org/bytecast/ep78-russ-cox

This matches my recent experience -- Gemini is enhanced for many use cases by superior recall

2. Claude also knows this, pointing to pages like: https://usesthis.com/interviews/jeff.dean/ - https://goodlisten.co/clip/the-unlikely-friendship-that-shap... (never seen this)

3. ChatGPT did the worst. It said:

> ... they have likely crossed paths professionally given their roles at Google and other tech circles. ...
>
> While I can't confirm if they know each other personally or have worked directly together on projects, they both would have had substantial overlap in their careers at Google.

(edit: I should add I pay for Claude but not Gemini or ChatGPT; this was not a very scientific test)


Not just Google. I had ChatGPT regurgitate my HN comment (without linking to it) about 15 minutes after posting it. That was a year ago. https://news.ycombinator.com/item?id=42649774


> Gemini pointed me back at MY OWN comment, above, an hour after I wrote it. So Google is crawling the web FAST. It also pointed to: https://learning.acm.org/bytecast/ep78-russ-cox ... I had ChatGPT regurgitate my HN comment (without linking to it) about 15 minutes after posting it.

Sounds like HN is the kind of place for effective & effortless "Answer Engine Optimization".


Hopefully YCombinator can afford to pay for the constant caching of all HN comments. /s :)


I participated in an internship in the summer of 2007. One of the things I found particularly interesting was gsearch. At the time, there were search engines for source code, but I was not aware of any that supported regular expressions. My internship host encouraged me by saying, “Try digging through repositories and look for the source code.”


I submitted this "fact" and it is indeed a true story, exactly as you said.

The "global protobuf db" had comments all over it saying it's not intended for production-critical tasks, and it had a lot of caveats and gotchas even aside from being built on Jeff's desktop, but it was so convenient that people naturally ended up using it anyway.


There was a variant of this that occurred later. By that time there might not have been a dependency on Jeff's workstation anymore, but the DB, or at least one of its replicas, was getting copied to... /gfs/cg/home/sanjay/ — I don't believe it was Jeff this time. At some point, there was a very long PCR in the Oregon datacenter, perhaps even the same one that happened a few weeks after the 2011 Fukushima disaster. With the CG cluster powered off for multiple days, a bunch of stuff broke, but in this case the issue might have been solved by dumping the data and/or reading it from elsewhere.


In 2010, due to the China hacking thing, Google locked down its network a lot.

At least one production service went down because it relied on a job running on Jeff Dean's personal computer that no longer had access. Unfortunately I forget what job it was.


The other thing that ran under Jeff's desk for a long time was Code Search, the old one.


I remember this. He went on vacation and since he wasn't available to login, code search indexing went down for a bit.


They talk about this here: https://sqlite.org/testing.html#statement_versus_branch_cove...

...saying that for a statement `if( a>b && c!=25 ){ d++; }`, they use 100% machine-code branch coverage as a way of determining that they've evaluated it with `a<=b`, `a>b && c==25`, and `a>b && c!=25`. The (C/C++) branch coverage tools I've used are less strict, only requiring that the statement takes both the if and else paths.

One could imagine a better high-level branch coverage tool that achieves this intent without dropping to the machine code level, but I'm not sure it exists today in Rust (or any other language for that matter).

There might also be an element of "we don't even trust the compiler to be correct and/or ourselves to not have undefined behavior" here, although they also test explicitly for undefined behavior as mentioned later on the page.
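The difference can be made concrete by spelling out the machine-level branch outcomes for `if( a>b && c!=25 )`: the short-circuit `&&` compiles to two conditional branches, each of which must be observed both taken and not taken. A small sketch (helper names are made up):

```python
def branch_outcomes(a, b, c):
    """Machine-level view: short-circuit && compiles to two branches."""
    outcomes = {"a>b": a > b}
    if a > b:  # the second test is only evaluated if the first passes
        outcomes["c!=25"] = c != 25
    return outcomes

# SQLite's three cases: a<=b, a>b && c==25, a>b && c!=25.
tests = [(1, 2, 0), (3, 2, 25), (3, 2, 0)]
seen = set()
for a, b, c in tests:
    for cond, taken in branch_outcomes(a, b, c).items():
        seen.add((cond, taken))

# Source-level tools need only if/else; machine-level needs all four edges:
assert seen == {("a>b", False), ("a>b", True),
                ("c!=25", False), ("c!=25", True)}
```

Drop the `(3, 2, 25)` case and a source-level tool still reports the statement as covered, while the machine-level metric flags the untaken edge of the second branch.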


Hmm, so in a language that does automatic bounds checking, the compiler might translate a line of source code like:

    let val = arr[i]
to assembly code like:

    cmp     rdx, rsi        ; Compare i (rdx) with length (rsi)
    jae     .Lpanic_label   ; Jump if i >= length
    ; later...
    .Lpanic_label:
    call    core::panicking::panic_bounds_check

Are they saying with "correct code" the line of source code won't be covered? Because the assembly instruction to call panic isn't ever reached?


I think they're saying it's not covered: not only because `call` isn't ever reached but also because they identify `jae` as a branch and see it's always not taken. (If there were no lines in your `; later...` section and the branch were always taken, they'd still identify the `jae` as not covered.)

It might be reasonable to redefine their metric as "100% branch coverage except for panics"...if you can reliably determine that `jae .Lpanic_label` is a panic jump. It's obvious to us reading your example of course but I don't know that the compiler guarantees panics always "look like that", and only panics look like that.


Regret is possible with any language, but I'd be surprised if someone regretted choosing Rust for the reasons in the article you linked:

* Error handling via exceptions. Rust uses `Result` instead. (It has panics, but they are meant to be strictly for serious logic errors for which calling `abort` would be fine. There's a `Cargo.toml` option to do exactly that on panic, rather than unwinding.) (btw, C++ has two camps here for better or worse; many programs are written in a dialect that doesn't use exceptions.)

* Constructors have to be infallible. Not a thing in Rust; you just make a method that returns `Result<Self, Error>`. (Even in C++ there are workarounds.)

* Destructors have to be infallible. This is about as true in Rust as in C++: `Drop::drop` doesn't return a `Result` and can't unwind-via-panic if you have unwinding disabled or are already panicking. But I reject the characterization of it as a problem compared to C anyway. The C version has to call a function to destroy the thing. Doing the same in Rust (or C++) is not really any different; having the other calls assert that it's not destroyed is perfectly fine. I've done this via `self.inner.as_mut().expect("not terminated")`. They say the C version has only two states: "Not initialised object/memory where all the bets are off and the structure can contain random data. And there is initialised state, where the object is fully functional". The existence of the "all bets are off" state is not as compelling as they make it out to be, even if throwing up your hands is less code.

* Inheritance. Rust doesn't have it.


I'm a little surprised they're at all open to a rewrite in Rust:

> All that said, it is possible that SQLite might one day be recoded in Rust.

...followed by a list of reasons why they won't do it now. I think the first one ("Rust needs to mature a little more, stop changing so fast, and move further toward being old and boring.") is no longer valid (particularly for software that doesn't need async and has few dependencies), but the other ones probably still are.

I write Rust code and prefer to minimize non-Rust dependencies, but SQLite is the non-Rust dependency I mind the least for two reasons:

* It's so fast and easy to compile: just use the `rusqlite` crate with feature `bundled`. It builds the SQLite "amalgamation" (its entire code basically concatenated into a single .c file). No need to have bazel or cmake or whatever installed, no weird library dependency chains, etc.

* It's so well-tested that the unsafety of the language doesn't bother me that much. 100% machine branch coverage is amazing.

