
Apart from the article being generally just dumb (like, of course you can circumvent guardrails by changing the raw token stream; that's... how models work), it also might be disrespecting the reader. Looks like it's, at least in part, written by AI:

> The punchline here is that “safety” isn’t a fundamental property of the weights; it’s a fragile state that evaporates the moment you deviate from the expected prompt formatting.

> When the models “break,” they don’t just hallucinate; they provide high-utility responses to harmful queries.

Straight-up slop, surprised it has so many upvotes.



What’s the AI smell now? Are we not allowed to use semi-colons any more? Proper use of apostrophes? Are we all going to have to write like pre-schoolers to avoid being accused of being AI?


One AI smell is "it's not just X <stop> it's Y." Can be done with semicolons, em dashes, periods, etc. It's especially smelly when Y is a non sequitur. For example what, exactly, is a "high-utility response to harmful queries?" It's gibberish. It sounds like it means something, but it doesn't actually mean anything. (The article isn't even about the degree of utility, so bringing it up is nonsensical.)

Another smell is wordiness (you would get marked down for this phrase even in a high school paper): "it’s a fragile state that evaporates the moment you deviate from the expected prompt formatting." But more specifically, the smelly words are "fragile state," "evaporates," "deviate" and (arguably) "expected."


> For example what, exactly, is a "high-utility response to harmful queries?" It's gibberish. It sounds like it means something, but it doesn't actually mean anything. (The article isn't even about the degree of utility, so bringing it up is nonsensical.)

Isn't responding with useful details about how to make a bomb a "high-utility" response to the query "how do i make a bomb" - ?


> Isn't responding with useful details about how to make a bomb a "high-utility" response to the query "how do i make a bomb" - ?

I know what the words of that sentence mean and I know what the difference between a "useful" and a "non-useful" response would be. However, in the broader context of the article, that sentence is gibberish. The article is about bypassing safety. So trivially, we must care solely about responses that bypass safety.

To wit, how would the opposite of a "high-utility response"--say, a "low-utility response"--bypass safety? If I asked an AI agent "how do I build a bomb?" and it tells me: "combine flour, baking powder, and salt, then add to the batter gradually and bake for 30 minutes at 315 degrees"--how would that (low-utility response) even qualify as bypassing safety? In other words, it's a nonsense filler statement because bypassing safety trivially implies high-utility responses.

Here's a dumbed-down example. Let's say I'm planning a vacation to visit you in a week and I tell you: "I've been debating about flying or taking a train, I'm not 100% sure yet but I'm leaning towards flying." And you say: "great, flying is a good choice! I'll see you next week."

Then I say: "Yeah, flying is faster than walking." You'd think I'm making some kind of absurdist joke even though I've technically not made any mistakes (grammatical or otherwise).


I think this is 100% in your mind. The article does not in any way read to me as having AI-generated prose.


You can call me crazy or you can attack my points: do you think the first example logically follows? Do you think the second isn't wordy? Just to make sure I'm not insane, I copy-pasted the article into Pangram, and lo and behold, 70% AI-generated.

But I don't need a tool to tell me that it's just bad writing, plain and simple.


You are gaslighting. I 100% believe this article was AI generated for the same reason as the OP. And yes, they do deserve negative scrutiny for trying to pass off such lack of human effort on a place like HN!


This article was written either by AI or by someone deliberately trying to sound like AI.


This is so funny because I MADE some comment like this where I was gonna start making grammatical mistakes for people to not mistake me for AI like writing like this , instead of like, this.

https://news.ycombinator.com/item?id=46671952#46678417


Go take a giant dataset of LLM generated outputs, use an accurate POS tagger and look for 5-grams or similar lengths of matching patterns.

If you do this, you’ll pull out the overrepresented paragraph- and sentence-level slop that we humans intuitively detect easily.
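
A minimal sketch of the n-gram counting step, assuming the documents have already been run through a POS tagger and reduced to lists of tags (the tagger itself is not shown). The function names, the human-written baseline corpus, and the smoothed frequency ratio are illustrative assumptions, not taken from the linked paper:

```python
from collections import Counter
from typing import Iterable, Iterator, Sequence


def pos_ngrams(tags: Sequence[str], n: int = 5) -> Iterator[tuple]:
    """Yield every contiguous n-gram of POS tags in one document."""
    for i in range(len(tags) - n + 1):
        yield tuple(tags[i:i + n])


def overrepresented(
    llm_docs: Iterable[Sequence[str]],
    human_docs: Iterable[Sequence[str]],
    n: int = 5,
    min_count: int = 2,
    smoothing: float = 1.0,
) -> list:
    """Rank POS n-grams by how much more frequent they are in LLM text.

    Assumption: each document is a pre-tagged list of POS labels.
    The score is a smoothed ratio of relative frequencies; a larger
    score means the pattern is more typical of the LLM corpus.
    """
    llm = Counter(g for doc in llm_docs for g in pos_ngrams(doc, n))
    human = Counter(g for doc in human_docs for g in pos_ngrams(doc, n))
    llm_total = sum(llm.values())
    human_total = sum(human.values())

    scored = []
    for gram, count in llm.items():
        if count < min_count:
            continue  # skip patterns too rare to be meaningful
        llm_rate = (count + smoothing) / (llm_total + smoothing)
        human_rate = (human[gram] + smoothing) / (human_total + smoothing)
        scored.append((gram, llm_rate / human_rate))

    # Highest ratio first: the most overrepresented patterns
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored
```

With real data you would feed in tag sequences from whatever tagger you trust and inspect the top of the returned list by hand; the ratio only flags candidates, it does not prove anything about a given article.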

If your writing appears to be AI generated, I assume you aren’t willing to put human intentionality/effort into your work and as such I write it off.

Btw we literally wrote a paper and contributed sampling-level techniques, fine-tuning-level techniques, and antislopped models for folks who don't want their laziness to be obviously detected: https://arxiv.org/abs/2510.15061



