More

benlivengood · 2026-02-10T16:37:58 1770741478

The contrast between this and https://news.ycombinator.com/item?id=46923543 (Software engineering is back) is kind of stark. I am using frontier models to get fun technical projects done that I simply didn't have time for since my late teens. It is still possible to understand an architecture down to the hardware if you want to, but it can happen a lot faster. The specifications are queryable now. Obscure bugs that at least one person has seen in the past are seconds away instead of minutes or hours of searching. Even new bugs have extra eyes on them. I haven't written a new operating system yet but it's now a tractable problem. So is using Lean or Julia or some similar system to formally specify it. So far I've been digging into modern multithreaded cache performance which is just as fascinating as directly programming VGA and sound was in the early PC days. Linux From Scratch is still up to date. You can get FPGAs that fit in your USB port [0]. Technical depth and low-level understanding is wherever you want to look for it.

[0] https://www.crowdsupply.com/sutajio-kosagi/fomu

post-it · 2026-02-10T16:46:29 1770741989

> Obscure bugs that at least one person has seen in the past are seconds away instead of minutes or hours of searching.

This is a huge one for me. Claude is significantly better at Googling than I am.

benlivengood · 2026-02-09T23:02:05 1770678125

The McMurtry Spéirling is under 1000kg. Battery technology will only improve and so I expect to see under-1500kg sport EVs generally available eventually.

Under 1000kg for a reasonable price probably means building your own electrified exocar.

MindSpunk · 2026-02-10T00:00:18 1770681618

The McMurtry is a race car so it's not surprising it's that light. However it still pays a price compared to its contemporaries (go look at the weights of various LMP1 or LMP2 cars, even old cars like the Mazda 787B was ~850kg iirc). The only number I've found so far is "under 1000kg" so I assume it's probably quite close to that 1000kg number.

The weight of the batteries isn't going anywhere anytime soon. I expect car makers will prioritize range. I think the engineering to make an EV truly light like an old Integra Type R (1100kg) will be obscenely expensive and sacrifice so much on practicality it just won't be a viable product as a road car.

The car would be so compromised to be that light nobody will make the car, at least at an affordable price. You'll end up with a limited range, limited power, uncomfortable car for a price way out of line with what you're getting.

I think you could make a ~150kw-180kw EV pretty light, but considering the ongoing power pissing contest in modern cars I'm not sure how well it would market test.

So I expect the market will stick to heavy cars with big power because it's easier to build and easier to sell.

benlivengood · 2026-02-09T20:45:55 1770669955

I'm really curious what their data looks like at the various racetracks and circuits. Fun fact; most raceways have accurate street-level indicators (including that they are one-way, but sadly they are not the best racing line) on most online maps, and my car did complain to me in its weekly report about a lot of hard turns, quick acceleration, and hard braking with helpful pins on e.g. Laguna Seca or Thunderhill corners.

In theory, the most dangerous turns would probably have higher variance on hard braking data.

benlivengood · 2026-01-29T19:40:34 1769715634

Agreed; everyone complained that LLMs have no world model, so here we go. Next logical step is to backfill the weights with encoded video from the real world at some reasonable frame rate to ground the imagination and then branch the inference on possible interventions (actions) in the near future of the simulation, throw the results into a goal evaluator and then send the winning action-predictions to motors. Getting timing right will probably require a bit more work than literally gluing them together, but probably not much more.

patapong · 2026-01-30T12:02:21 1769774541

This is the most convincing take of what might actually get us to AGI I've heard so far :)

benlivengood · 2026-01-23T17:23:40 1769189020

I dunno, GPT-OSS and Llama and QWEN and any half dozen of other large open-weight models?

I really can't imagine OpenAI or Anthropic turning off inference for a model that my workplace is happy to spend >$200*person/month on. Google still has piles of cash and no reason to turn off Gemini.

The thing is, if inference is truly heavily subsidized (I don't think it is, because places like OpenRouter charge less than the big players for proportionally smaller models) then we'd probably happily pay >$500 a month for the current frontier models if everyone gave up on training new models because of some oddball scaling limit.

crimsoneer · 2026-01-23T17:47:02 1769190422

Yeah, this is silly. Plenty of companies are hosting their own now, sometimes on prem. This isn't going away

iLoveOncall · 2026-01-23T17:35:05 1769189705

> we'd probably happily pay >$500 a month for the current frontier models

Try $5,000. OpenAI loses hundreds of billions a year, they need a 100x, not 2x.

gingersnap · 2026-01-23T17:43:38 1769190218

But they are not losing 100x on inference on high paying customers. Their biggest loss is free user + training/development cost

weirdmantis69 · 2026-01-23T20:16:18 1769199378

Why lie on a site where people know things.

filoleg · 2026-01-23T17:47:53 1769190473

OpenAI loses hundreds of billions a year on inference? I strongly doubt it

ndriscoll · 2026-01-23T17:49:48 1769190588

$60k/yr still seems like a good deal for the productivity multiplier you get on an experienced engineer costing several times that. Actually, I'm fairly certain that some optimizations I had codex do this week would already pay for that from being able to scale down pod resource requirements, and that's just from me telling it to profile our code and find high ROI things to fix, taking only part of my focus away from planned work.

Another data point: I gave codex a 2 sentence description (being intentionally vague and actually slightly misleading) of a problem that another engineer spent ~1 week root causing a couple months ago, and it found the bug in 3.5 minutes.

These things were hot garbage right up until the second they weren't. Suddenly, they are immensely useful. That said, I doubt my usage costs anywhere near that much to openai.

Marsymars · 2026-01-23T19:39:48 1769197188

> $60k/yr still seems like a good deal for the productivity multiplier you get on an experienced engineer costing several times that.

Maybe, but that's a hard sell to all the workplaces who won't even spring for >1080p monitors for their experienced engineers.

thot_experiment · 2026-01-23T18:42:39 1769193759

Wildly different experience of frontier models than I have, what's your problem domain? I had both Opus and Gemini Pro outright fail at implementing a dead simple floating point image transformation the other day because neither could keep track of when things were floats and when they were uint8.

ndriscoll · 2026-01-23T19:41:22 1769197282

Low-level networking in some cloud applications. Using gpt-5.2-codex medium. I've cloned like 25 of our repos on my computer for my team + nearby teams and worked with it for a day or so coming up with an architecture diagram annotated with what services/components live in what repos and how things interact from our team's perspective (so our services + services that directly interact with us). It's great because we ended up with a mermaid diagram that's legible to me, but it's also a great format for it to use. Then I've found it does quite well at being able to look across repos to solve issues. It also made reference docs for all available debug endpoints, metrics, etc. I told it where our prometheus server is, and it knows how to do promql queries on its own. When given a problem, it knows how to run debug commands on different servers via ssh or inspect our kubernetes cluster on its own. I also had it make a shell script to go figure out which servers/pods are involved for a particular client and go check all of their debug endpoints for information (which it can then interpret). Huge time saver for debugging.

I'm surprised it can't keep track of float vs uint8. Mine knew to look at things like struct alignment or places where we had slices (Go) on structures that could be arrays (so unnecessary boxing), in addition to things like timer reuse, object pooling/reuse, places where local variables were escaping to heap (and I never even gave it the compiler escape analysis!), etc. After letting it have a go with the profiler for a couple rounds, it eventually concluded that we were dominated by syscalls and crypto related operations, so not much more could be microoptimized.

I've only been using this thing since right before Christmas, and I feel like I'm still at a fraction of what it can do once you start teaching it about the specifics of your workplace's setup. Even that I've started to kind-of automate by just cloning all of our infra teams' repos too. Stuff I have no idea about it can understand just fine. Any time there's something that requires more than a super pedestrian application programmer's knowledge of k8s, I just say "I don't really understand k8s. Go look at our deployment and go look at these guys' terraform repo to see all of what we're doing" and it tells me what I'm trying to figure out.

thot_experiment · 2026-01-23T22:36:51 1769207811

Yeah wild, I don't really know how to bridge the gap here because I've recently been continuously disappointed by AI. Gemni Pro wasn't even able to solve a compiler error the other day, and the solutions it was suggesting were insane (manually migrating the entire codebase) when the solution was like a 0.0.xx compiler version bump. I still like AI a lot for function-scale autocomplete, but I've almost stopped using agents entirely because they're almost universally producing more work for me and making the job less fun, I have to do so much handholding for them to make good architectural decisions and I still feel like I end up on shaky foundations most of the time. I'm mostly working on physics simulation and image processing right now. My suspicion is that there's just so many orders of magnitude more cloud app plumbing code out there that the capability is really unevenly distributed, similarly with my image processing stuff my suspicion is that almost all the code it is trained on works in 8bit and it's just not able to get past it's biases and stop itself from randomly dividing things that are already floats by 255.

benlivengood · 2026-01-22T21:11:53 1769116313

Yep, like ECHELON and friends are. The metadata recorded about your (all of our) traffic is probably enough to perform the timing attack.

shadowgovt · 2026-01-22T21:56:23 1769118983

Hey, if ECHELON snuck a listener into my house, where six devices hang out on a local router... Good for them, they're welcome to my TODO lists and vast collection of public-domain 1950s informational videos.

(I wouldn't recommend switching the option off for anything that could transit the Internet or be on a LAN with untrusted devices. I am one of those old sods who doesn't believe in the max-paranoia setting for things like "my own house," especially since if I dial that knob all the way up the point is moot; they've already compromised every individual device at the max-knob setting, so a timing attack on my SSH packet speed is a waste of effort).

benlivengood · 2026-01-21T23:53:42 1769039622

Deontological, spiritual/religious revelation, or some other form of objective morality?

The incompatibility of essentialist and reductionist moral judgements is the first hurdle; I don't know of any moral realists who are grounded in a physical description of brains and bodies with a formal calculus for determining right and wrong.

I could be convinced of objective morality given such a physically grounded formal system of ethics. My strong suspicion is that some form of moral anti-realism is the case in our universe. All that's necessary to disprove any particular candidate for objective morality is to find an intuitive counterexample where most people agree that the logic is sound for a thing to be right but it still feels wrong, and that those feelings of wrongness are expressions of our actual human morality which is far more complex and nuanced than we've been able to formalize.

staticassertion · 2026-01-22T03:17:12 1769051832

You can be a physicalist and still a moral realist. James Fodor has some videos on this, if you're interested.

benlivengood · 2026-01-22T21:26:11 1769117171

Granted, if humans had utility functions and we could avoid utility monsters (maybe average utilitarianism is enough) and the child in the basement (if we could somehow fairly normalize utility functions across individuals so that it was well-defined to choose the outcome where the minimum of everyone's utility functions is maximized [argmax_s min(U_x(s)) for all people in x over states s]) then I'd be a moral realist.

I think we'll keep having human moral disagreements with formal moral frameworks in several edge cases.

There's also the whole case of anthropics: how much do exact clones and potentially existing people contribute moral weight? I haven't seen a solid solution to those questions under consequentialism yet; we don't have the (meta)philosophy to address them yet; I am 50/50 on whether we'll find a formal solution and that's also required for full moral realism.

benlivengood · 2026-01-21T21:50:03 1769032203

Without at least some filtering a Gateway NAT appliance is vulnerable to:

* LAN IP address spoofing from the WAN

* Potential for misconfigured "internal" daemons to accept WAN traffic (listening on 0.0.0.0 instead of the LAN or localhost)

* Reflection amplification attacks

tsimionescu · 2026-01-22T06:12:40 1769062360

LAN IP address spoofing is indeed a valid attack vector, if the ISP is compromised.

Internal daemons on machines other than the router itself in the LAN network listening on 0.0.0.0 are not insecure (unless you have the problem from point 1, malicious/compromised ISP). The router won't route packets with IPs that are not in its LAN to them. Of course, the router itself could be compromised if it accidentally listens on 0.0.0.0 and accepts malicious packets.

Not sure what you mean by reflection amplification attacks, but unless they are attacking the router itself, or they are arriving on WAN with LAN IPs (again, compromised/malicious ISP), I don't see how they would reach LAN machines.

zajio1am · 2026-01-22T18:26:14 1769106374

You do not need compromised ISP for spoofed LAN IP traffic, the attack could came from other clients on the same WAN segment.

benlivengood · 2026-01-21T16:43:37 1769013817

One could use any number of LLMs on a take-home problem so in-person interviews are a must.

legel · 2026-01-21T17:09:51 1769015391

One could use any number of LLMs on real-world problems.

Why are we still interviewing like its 1999?

game_the0ry · 2026-01-21T17:13:54 1769015634

Old habits die hard. And engineers are pretty lazy when it comes to interviews, so just throwing the same leetcode problem into coder pad in every interview makes interviews easier for the person doing the interview.

selkin · 2026-01-21T17:26:38 1769016398

If you want people to interview better, you have to both allocate resources to it, and make it count on perf. It’s not laziness, it’s ROI.

yodsanklai · 2026-01-21T17:27:42 1769016462

As an interviewer, I ask the same problems because it makes it much easier to compare candidates.

game_the0ry · 2026-01-21T17:29:57 1769016597

How do you know if one candidate happened to see the problem on leetcode and memorized the solution versus one who struggled but figured it out slower?

yodsanklai · 2026-01-21T17:52:25 1769017945

It's very easy to tell, but it doesn't make much difference. The best candidates have seen the problems before and don't even try to hide it, they just propose their solution right away.

I try give positive feedback for candidates who didn't know the problem but could make good use of hints, or had the right approach. But unfortunately, it's difficult to pass a Leetcode interview if you haven't seen a similar problem to what is asked before. Most candidates I interview nowadays seem to know all questions.

That's what the company has decided so we have to go along. The positive side is that if you do your part, you have good chances of being hired, even if you disagree with the process.

bradlys · 2026-01-21T17:35:34 1769016934

It doesn’t matter. It’s about looking for candidates who have put in the time for your stupid hazing ritual. It signals on people who are willing to dedicate a lot of time to meaningless endeavors for the sake of employment.

This type of individual is more likely to follow orders and work hard - and most importantly - be like the other employees you hired.

legel · 2026-01-21T18:42:14 1769020934

Once upon a time, the "stupid hazing ritual" made sense.

Now it means company is stupid.

benlivengood · 2026-01-21T22:05:34 1769033134

Because if you want to hire engineers then you have to ask engineering questions. Claude and GPT and Gemini are super helpful but they're not autonomous coders yet so you need an actual engineer to vet their outcome still.

benlivengood · 2026-01-13T23:20:32 1768346432

I guess if you only need to store one bit you could store either 0 or 11 and on average use less than two bits (for bit flips only), or 111 if you have to also worry about losing/duplicating bits.