Hacker News | interpol_p's comments

I had Opus 4.6 running on a backend bug for hours. It got nowhere. Turned out the problem was AWS X-Ray swizzling the fetch method and not handling the same argument types as the original, which led to cryptic errors.
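A rough sketch of that failure mode, for anyone who hasn't hit it. This is not the X-Ray SDK's code; `FetchLike`, `naiveInstrument`, `carefulInstrument` and the simplified input types are all made up for illustration. The point is only that a wrapper which assumes one argument shape blows up with an unhelpful error when the caller passes another shape the original accepted.

```typescript
// Simplified, hypothetical stand-ins for the input shapes a fetch-style function accepts.
type UrlLike = { href: string };
type RequestLike = { url: string };
type FetchInput = string | UrlLike | RequestLike;
type FetchLike = (input: FetchInput) => string;

// Normalise any accepted input shape to a URL string.
function toUrlString(input: FetchInput): string {
  if (typeof input === "string") return input;
  return "href" in input ? input.href : input.url;
}

// Stand-in for the original function: it handles all three shapes.
const originalFetch: FetchLike = (input) => `GET ${toUrlString(input)}`;

// Careless wrapper: assumes the argument is always a string.
function naiveInstrument(inner: FetchLike, trace: string[]): FetchLike {
  return (input) => {
    // Throws a TypeError ("toLowerCase is not a function") for object inputs.
    trace.push((input as string).toLowerCase());
    return inner(input);
  };
}

// Careful wrapper: reads what it needs, then forwards the argument untouched.
function carefulInstrument(inner: FetchLike, trace: string[]): FetchLike {
  return (input) => {
    trace.push(toUrlString(input).toLowerCase());
    return inner(input);
  };
}
```

The naive version works in every test that passes a string, which is why this kind of bug tends to surface far from its cause.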

I had Opus 4.6 tell me I was "seeing things wrong" when I tried to have it correct some graphical issues. It got stuck in a loop of re-introducing the same bug every hour or so in an attempt to fix the issue.

I'm not disagreeing with your experience, but mine has been largely the same as what I had with Opus 4.5 / Codex / etc.


Haha, reminds me of an unbelievably aggravating exchange with Codex (GPT 5.4 / High). It kept gaslighting me about undesired behavior that was still occurring after a change it had made, adamant that it simply could not be happening.

It started by insisting I was repeatedly making a typo and still would not budge even after I started copy/pasting the full terminal history of what I was entering and the unabridged output, and eventually pivoted to darkly insinuating I was tampering with my shell environment as if I was trying to mislead it or something.

Ultimately it turned out that it forgot it was supposed to be applying the fixes to the actual server instead of the local dev environment, and had earlier in the conversation switched from editing directly over SSH to pushing/pulling the local repo to the remote due to diffs getting mangled.


The example given in the article is acceptance criteria for a login/password entry flow. This is fairly easy to spec out in terms of AC and TDD.

I have been asking these tools to build other types of projects where it is (or seems?) much more difficult to verify without a human in the loop. For example, I asked Codex to build a simulation of the solar system using a Metal renderer. It produced a fun working app quickly.

I asked it to add bloom. It looped for hours, failing. I had to verify manually because, even from images, it couldn't tell what was right and wrong. It only got it right when I pasted a how-to-write-a-bloom-shader-pass-in-Metal blog post into it.

Then I noticed that all of the planet textures were rotating oddly every time I orbited the camera. Codex got stuck in another endless loop of "Oh, the lookAt matrix is in column major, let me fix that <proceeds to break everything>." or focusing (incorrectly) on UV coordinates and shader code. Eventually Codex told me what I was seeing "was expected" and that I just "felt like it was wrong."

When I finally realised the problem was that Codex had drawn the planets with back-facing polygons only, I reported the error, to which Codex replied, "Good hypothesis, but no."

I insisted that it change the culling configuration and then it worked fine.
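For anyone unfamiliar with culling, here is a minimal sketch of the idea, not the app's code. The names `isDrawn`, `CullMode` and `frontIsCCW` are hypothetical, and it assumes screen space with the y axis pointing up. A triangle's facing is decided by its winding order, and the cull mode decides which facing gets thrown away. Cull the wrong side and you only ever see the inside of the sphere.

```typescript
type Vec2 = [number, number];
type Triangle = [Vec2, Vec2, Vec2];
type CullMode = "none" | "front" | "back";

// Signed area of a screen-space triangle (y up): positive means counter-clockwise winding.
function signedArea(a: Vec2, b: Vec2, c: Vec2): number {
  return 0.5 * ((b[0] - a[0]) * (c[1] - a[1]) - (c[0] - a[0]) * (b[1] - a[1]));
}

// Decide whether a triangle survives culling.
// frontIsCCW is an assumed convention; real APIs let you configure it and their defaults differ.
function isDrawn(tri: Triangle, cull: CullMode, frontIsCCW: boolean = true): boolean {
  if (cull === "none") return true;
  const isCCW = signedArea(tri[0], tri[1], tri[2]) > 0;
  const frontFacing = isCCW === frontIsCCW;
  return cull === "back" ? frontFacing : !frontFacing;
}

// The same three vertices in opposite winding orders.
const ccwTriangle: Triangle = [[0, 0], [1, 0], [0, 1]];
const cwTriangle: Triangle = [[0, 0], [0, 1], [1, 0]];

const frontSurvivesBackCull = isDrawn(ccwTriangle, "back");   // true
const frontSurvivesFrontCull = isDrawn(ccwTriangle, "front"); // false
const backSurvivesFrontCull = isDrawn(cwTriangle, "front");   // true
```

The last two lines are the situation I was seeing: with the wrong cull mode or winding convention, only the back faces survive. In Metal the corresponding settings are the cull mode and front-facing winding on the render command encoder.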

These tools are fun, and great time savers (at times), but take them out of their comfort zone and it becomes real hard to steer them without domain knowledge and close human review.


That's a pretty extreme take. I've been using the Mac since about 2001. I like Tahoe and a well designed Tahoe app can look really nice on the platform. There are bugs, inconsistencies and other issues, but it doesn't feel that different from many previous macOS / OS X releases.

You've been a Mac user since the original candy iMacs and you don't see the company design ethos slipping in the last five years?

I believe you can do regular hard edged intersections. You can see in his operator list some are listed as “smoothSubtract” and some are just “subtract”

It’s just easy to do the melding thing with SDFs so a lot of people do it
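To show how small the difference is, here is a sketch using the commonly published formulas for these two operators. I have not seen his implementation, so the function bodies and the blend radius `k` are assumptions, and only the operator names come from his list. Both take the signed distance to the cutter and to the base shape.

```typescript
const clamp = (x: number, lo: number, hi: number): number => Math.min(Math.max(x, lo), hi);
const mix = (a: number, b: number, t: number): number => a + (b - a) * t;

// Hard-edged subtraction: remove the cutter from the base shape.
function subtract(dCut: number, dBase: number): number {
  return Math.max(-dCut, dBase);
}

// Smooth subtraction: same result away from the seam, rounded within roughly k of it.
function smoothSubtract(dCut: number, dBase: number, k: number): number {
  const h = clamp(0.5 - (0.5 * (dBase + dCut)) / k, 0, 1);
  return mix(dBase, -dCut, h) + k * h * (1 - h);
}

// Exactly on the seam between the two surfaces, the smooth version bulges by k / 4.
const hardAtSeam = subtract(0, 0);              // 0
const smoothAtSeam = smoothSubtract(0, 0, 0.5); // 0.125

// Far from the seam the two agree.
const hardFar = subtract(5, -1);                // -1
const smoothFar = smoothSubtract(5, -1, 0.5);   // -1
```

The smooth version is only a couple of extra lines, which is probably why the melded look is so common.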


From his description of the approach I suspect it's also to smooth over sharp edges that the grid optimization doesn't like so much.


The reason this happens is because big companies get their software pen tested. Part of the pen test report will include something like “accessible from jailbroken devices.”

The pen test results get put into the ticket system as immovable entries. Engineers will question them, only to be shot down by the cyber security department who organized the pen test. The engineers will eventually accept that they cannot convince cyber to drop the issue, and implement the jailbreak detection.

Why does cyber mandate it? Because no one in a large company wants to accept the risk, even imaginary risk. They want to be able to say, when security is breached, “we did our due diligence. Look at the report, we implemented everything in it”

Why do firms offering penetration testing keep putting junk like this into their reports? Because their automated tools list them out and they’re getting paid to find issues. The more the better.

It’s insane and entirely about passing off risk.


Depends what you see as “abusing” the system. By working from home, I can take a walk in the garden when I find it hard to think, it energises me. At my office I can (and do) take a walk in the car park, but inevitably I leave the office with a headache caused by constant noise and fluorescent lighting

At home, I can put my family first if needed. When I’m at the office and something comes up at the kids’ school that I need to deal with, it’s a mad dash to get away soon enough that I almost have to drop everything and run

The only times working in the office has been good for me as a software engineer: when we are prototyping on physical hardware I do not have at home. That’s it

It’s great if people love to go to the office. That’s fine. It’s managers that enforce it who are the problem — the people who work for you aren’t children and if you feel like you can’t trust them to make the decision to work from home, why on earth would you trust them in your office?


You seriously think this clown cares about any of this? I don’t know a single person living a comfortable life who would speak like that, only some miserable sod living in a shoebox who hates everyone around them.


> I don’t know a single person living a comfortable life who would speak like that,

Ah, yes. I’m a clown because you live in a very curated bubble?

I notice you offered no refutations, just ad hominem.


It's fast, but it's not that fast.

My son regularly borrows my iPhone 14 Pro for shooting video, and I inevitably have to do a large AirDrop transfer to him of all his footage. We usually see about 10 GB per minute, which is really fast
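For scale, a quick conversion of that figure. It assumes decimal units (1 GB = 1000 MB) and a steady rate, neither of which I measured carefully.

```typescript
// Observed ballpark: about 10 GB per minute over AirDrop.
const gigabytesPerMinute = 10;

// Assumes decimal units: 1 GB = 1000 MB, 1 byte = 8 bits.
const megabytesPerSecond = (gigabytesPerMinute * 1000) / 60; // about 166.7 MB/s
const gigabitsPerSecond = (gigabytesPerMinute * 8) / 60;     // about 1.33 Gbit/s
```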


We have some sort of hybrid policy. Every single time I have shown up at the office, I either end up socialising far too much and get nothing done (I find it extremely hard to work next to people without talking to them).

Or nobody is there and I end up having driven (40 minutes each way) to the office to have Teams meetings with a wonderful view of the car park, under fluorescent lights, using a cheap low-resolution office monitor. When I could have been having those Teams meetings with a view of my garden and a much nicer monitor I have invested in


> socialising far too much and get nothing done

Alternatively, you networked, built useful relationships and shared knowledge.


"Hey, Bob."

"Hey, Chip."

"Catch the game last night?"

"Yeah. What a snoozer."

"Are you ready for that quarterly meeting? Heard Tracy will connect from the conference in Toledo. Hopefully I don't get a crappy seat in the conference room."

"I'll take the meeting from the desk."

"Nice. Lunch later?"

"Yeah. Talk then."

Wow. Another day of building relationships and sharing knowledge.


It depresses me how some engineers don’t realise how important this stuff is.


Sure. I catch up with many of them on weekends anyway — we hike together, our families know each other, some live nearby etc.

Regarding knowledge sharing, that happens equally well via Slack. (Actually, I'd say a screen share works better than over-the-shouldering someone else's screen in person)


I'm in the market for this

I've been hoping for Apple to return to "thin" and it's nice that they're trying. I don't know whether I would buy this, but my current iPhone 14 Pro feels like a brick — thick stainless steel

When I go for a run, it's uncomfortable to have in a pocket depending on what running clothes I am wearing. The heaviness makes it feel far more likely to break all the times I have dropped it (and I have dropped it many times, without a case)


I really like the brevity of text-davinci-001. Attempting to read the other answers felt laborious


That's my beef with some models like Qwen, god do they talk and talk...

