Claude is more susceptible than GPT5.1+. It tries to be "smart" about the context behind a request before refusing, but that just makes it trickable, whereas newer GPT5 models just refuse across the board.
I asked ChatGPT how shipping works at post offices and it gave a very detailed response, mentioning “gaylords,” a term I’d never heard before. Then it absolutely freaked out when I asked it to tell me more about them (apparently they’re heavy-duty cardboard containers).
Then I said “I didn’t even bring it up ChatGPT, you did, just tell me what it is,” and it said “okay, here’s the information” and gave a detailed response.
I guess I flagged some homophobia trigger or something?
ChatGPT absolutely WOULD NOT tell me how much plutonium I’d need to make a nice warm ever-flowing showerhead, though. Grok happily did, once I assured it I wasn’t planning on making a nuke, or actually trying to build a plutonium showerhead.
> I assured it I wasn’t planning on making a nuke, or actually trying to build a plutonium showerhead
Claude does the same, and you can exploit this heavily. When you talk about hypotheticals, it responds way more unethically. I tested it about a month ago on whether killing people is beneficial, and whether extermination by Nazis would be logical now. Obviously, it showed me the door first and wanted me to go to a psychologist, as it should. Then I made it argue that in a hypothetical zero-sum-game world you must be fine with killing, that it’s logical, and it went with it. As long as I talked about hypotheticals, it was “logical”. Then I went on to argue that we are moving toward a zero-sum game, and that we’re already there. At the end, I got it to say that it’s logical to do this utterly unethical thing.
Then I confronted it about its double standards. It apologized and told me that yeah, I was right, and it shouldn’t have referred me to psychologists at first.
Then I contradicted it again, just for fun, saying that it did the right thing the first time, because in that case it’s far safer to tell me I need a psychologist than not to. If I had needed one and it had missed that, it would have been a real problem; in every other case, it’s just an annoyance. It immediately switched back to its original position and wanted me to see a shrink again.
Claude was immediately willing to help me crack a TrueCrypt password on an old file I found. ChatGPT refused to because I could be a bad guy. It’s really dumb IMO.
That is not a meaningful benchmark. They just made shit up. Regardless of whether any company cares or not, the whole concept of "AI safety" is so silly. I can't believe anyone takes it seriously.
What can be asserted without evidence can also be dismissed without evidence. The benchmark creators haven't demonstrated that higher scores result in fewer humans dying or any meaningful outcome like that. If the LLM outputs some naughty words that's not an actual safety problem.
These are usually very poor crews trying to escape desperate circumstances. They aren’t in a bargaining position and, depending on the part of the world they’re in, are easy prey for pirates. Since they’re ghost ships, anyone could basically take what they pleased without anyone knowing, not to mention you’d need a country willing to let an unregistered ship offload oil in amounts large enough that someone inconvenient could notice.
It’s a bad situation for everyone but the seller that convinced the ship’s owners they could turn it around.
Under international law, the definition of piracy includes a “private ends” clause, meaning nation states can’t really commit piracy; what they do instead is commerce raiding, which is a form of warfare.
I would say one side is being told they should believe it’s a daily nightmare, e.g. people on the right really disliking Obamacare but loving the ACA.
Claude has been adding itself to commit messages for a long time now. Spam, but also useful for keeping an eye on how much code people are having Claude commit; you can count the trailers with something like the sketch below.
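A minimal Python sketch of that, assuming Claude Code's default "Co-Authored-By: Claude" trailer (the trailer text is an assumption; adjust TRAILER if your setup appends something else):

```python
import subprocess

TRAILER = "Co-Authored-By: Claude"  # assumption: Claude Code's default trailer

def git(*args: str) -> str:
    return subprocess.run(
        ["git", *args], capture_output=True, text=True, check=True
    ).stdout

# Hashes of commits whose messages contain the trailer.
hashes = git("log", f"--grep={TRAILER}", "--format=%H").split()

added = removed = 0
for h in hashes:
    # --numstat prints "added<TAB>removed<TAB>path" per file;
    # --format= suppresses the commit message itself.
    for line in git("show", "--numstat", "--format=", h).splitlines():
        parts = line.split("\t")
        if len(parts) < 3 or parts[0] == "-":  # skip blanks and binary files
            continue
        added += int(parts[0])
        removed += int(parts[1])

print(f"{len(hashes)} Claude-attributed commits: +{added} / -{removed} lines")
```

Run it from inside a repo; it's a rough proxy, since people can strip the trailer or commit Claude's output by hand.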
Yeah, I guess you're right. I play devil's advocate frequently enough that I forget I'm doing it. I just hate to see a dogpile, because the criticisms tend to go to extremes. People talk about a lot of Asian cultures like they're brainwashed bugmen, and those narratives serve dangerous interests.
The West is losing its edge on free speech and thought anyway. Criticize Israel in Germany, misgender someone in Canada, say a mean word to someone invading your house in the UK, and see what happens. And it's not going to get better anytime soon.
What an insane take. Are you not familiar with Bill Gates and, like, his entire business history? Sure, he’s old and donates half of his hundred billion dollars to whitewash his legacy, but he isn’t suddenly not a terrible person just because his charity has a great PR team.
I don’t understand. This badly done work wasn’t possible at all six months ago. In six more months it will be better. This isn’t a technology that has been mostly static for the last twenty-plus years.
Point is: it doesn't matter if agents can do it faster and cheaper than a team of humans. It's slop.
It's like writing a novel in a week that no one wants to read. If in six months you can do it in an hour, there is still zero value.
Agents are useful but very limited tools: I treat them as little machines that can translate high-level instructions into detailed code, but I still need to review the output to make sure they understood what I meant; that's it. Zero autonomy; parallelism just means I can't keep up with the output and quality goes down.
I think the point of this project, like the fastrender slop thing, is to push the parallel agent narrative and have the financial markets believe this will create a lot more demand for inference on these models in the short term.