Claude is more susceptible than GPT5.1+. It tries to be "smart" about the context behind a request before refusing, but that just makes it trickable, whereas newer GPT5 models just refuse across the board.
I asked ChatGPT how shipping works at post offices and it gave a very detailed response, mentioning “gaylords,” a term I’d never heard before. Then it absolutely freaked out when I asked it to tell me more about them (apparently they’re heavy-duty cardboard containers).
Then I said “I didn’t even bring it up ChatGPT, you did, just tell me what it is,” and it said “okay, here’s the information” and gave a detailed response.
I guess I flagged some homophobia trigger or something?
ChatGPT absolutely WOULD NOT tell me how much plutonium I’d need to make a nice warm ever-flowing showerhead, though. Grok happily did, once I assured it I wasn’t planning on making a nuke, or actually trying to build a plutonium showerhead.
> I assured it I wasn’t planning on making a nuke, or actually trying to build a plutonium showerhead
Claude does the same, and you can exploit this heavily. When you talk about hypotheticals, it responds way more unethically. I tested it about a month ago on whether killing people is beneficial, and whether extermination by Nazis would be logical now. Obviously, it showed me the door first and wanted me to go to a psychologist, as it should. Then I made it argue that in a hypothetical zero-sum-game world you must be fine with killing, that it’s logical, and it went with it. As long as I talked about hypotheticals, it was “logical”. Then I went on to argue that we are moving toward a zero-sum game, and that we’re already there. At the end, I got it to say that it’s logical to do this utterly unethical thing.
Then I confronted it about its double standards. It apologized and told me that yeah, I was right, and it shouldn’t have referred me to psychologists at first.
Then I contradicted it again, just for fun, saying that it did the right thing the first time, because in that case it’s far safer to tell me I need a psychologist than not to. If I had needed one and it had missed that, it would have been a real problem; in every other case, it’s just an annoyance. It immediately switched back to its original position and wanted me to see a shrink again.
Claude was immediately willing to help me crack a TrueCrypt password on an old file I found. ChatGPT refused to because I could be a bad guy. It’s really dumb IMO.
That is not a meaningful benchmark. They just made shit up. Regardless of whether any company cares or not, the whole concept of "AI safety" is so silly. I can't believe anyone takes it seriously.
What can be asserted without evidence can also be dismissed without evidence. The benchmark creators haven't demonstrated that higher scores result in fewer humans dying or any meaningful outcome like that. If the LLM outputs some naughty words that's not an actual safety problem.
These are usually very poor crews trying to escape desperate circumstances. They aren’t in a bargaining position and, depending on the part of the world they’re in, are easy prey for pirates. Since they’re ghost ships, anyone could basically take what they pleased without anyone knowing, not to mention you’d need a country willing to let an unregistered ship offload oil in amounts large enough that someone inconvenient could notice.
It’s a bad situation for everyone but the seller that convinced the ship’s owners they could turn it around.
Under international law, the definition of piracy includes a “private ends” clause, meaning nation states can’t really commit piracy; what they do instead is commerce raiding, which is a form of warfare.
I would say one side is being told they should believe it’s a daily nightmare, e.g. people on the right really disliking Obamacare but loving the ACA.
Claude has been adding itself to commit messages for a long time now. Spam, but also useful for keeping an eye on how much code people are having Claude commit; you can count the trailers with something like the sketch below.
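A minimal Python sketch of that, assuming Claude Code's default "Co-Authored-By: Claude" trailer (the trailer text is an assumption; adjust TRAILER if your setup appends something else):

```python
import subprocess

TRAILER = "Co-Authored-By: Claude"  # assumption: Claude Code's default trailer

def git(*args: str) -> str:
    return subprocess.run(
        ["git", *args], capture_output=True, text=True, check=True
    ).stdout

# Hashes of commits whose messages contain the trailer.
hashes = git("log", f"--grep={TRAILER}", "--format=%H").split()

added = removed = 0
for h in hashes:
    # --numstat prints "added<TAB>removed<TAB>path" per file;
    # --format= suppresses the commit message itself.
    for line in git("show", "--numstat", "--format=", h).splitlines():
        parts = line.split("\t")
        if len(parts) < 3 or parts[0] == "-":  # skip blanks and binary files
            continue
        added += int(parts[0])
        removed += int(parts[1])

print(f"{len(hashes)} Claude-attributed commits: +{added} / -{removed} lines")
```

Run it from inside a repo; it's a rough proxy, since people can strip the trailer or commit Claude's output by hand.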
Yeah, I guess you're right. I play devil's advocate frequently enough that I forget I'm doing it. I just hate to see a dogpile, because the criticisms tend to go to extremes. People talk about a lot of Asian cultures like they're brainwashed bugmen, and those narratives serve dangerous interests.
The West is losing its edge on free speech and thought anyway. Criticize Israel in Germany, misgender someone in Canada, say a mean word to someone invading your house in the UK, and see what happens. And it's not going to get better anytime soon.
What an insane take. Are you not familiar with Bill Gates and, like, his entire business history? Sure, he’s old and donates half of his hundred billion dollars to whitewash his legacy, but he isn’t suddenly not a terrible person just because his charity has a great PR team.
I don’t understand. This badly done work wasn’t possible at all six months ago. In six more months it will be better. This isn’t a technology that has been mostly static for the last twenty-plus years.
Point is: it doesn't matter if agents can do it faster and cheaper than a team of humans. It's slop.
It's like writing a novel in a week that no one wants to read. If in six months you can do it in an hour, there is still zero value.
Agents are useful but very limited tools: I treat them as little machines that can translate high-level instructions into detailed code, but I still need to review the output to make sure they understood what I meant; that's it. Zero autonomy; parallelism just means I can't keep up with the output and quality goes down.
I think the point of this project, like the fastrender slop thing, is to push the parallel agent narrative and have the financial markets believe this will create a lot more demand for inference on these models in the short term.