I hope we see more evolution of options before it does. Hard to articulate this without it becoming political, but I've seen countless examples both personally and from others of ChatGPT refusing to give answers not in keeping with what I'd term "shitlib ethics". People seem unwilling to accept that a system that talks like a person may surface things they don't like. Unless and until an LLM will return results from both Mother Jones and Stormfront, I'm not especially interested in using one in lieu of a search engine.
To put this differently, I'm not any more interested in seeing Stormfront articles from an LLM than I am from Google, but I trust neither to make a value judgement about which is "good" versus "bad" information. And sometimes I want to read an opinion; sometimes I want to find some obscure forum post on a topic rather than have the robot tell me no "reliable sources" are available.
Basically I want a model that is aligned to do exactly what I say, no more and no less, just like a computer should. Not a model that's aligned to the "values" of some random SV tech bro. Palmer Luckey had a take on the ethics of defense companies a while back: SV CEOs should not be the ones indirectly deciding US foreign policy by choosing to do or not do business. I think similar logic applies here: those same SV CEOs should not be deciding what information is and is not acceptable. Google was bad enough in this respect - cf. suppressing Trump on Rogan recently - but OpenAI could be much worse, because the abstraction between information and consumer is much more significant.
> Basically I want a model that is aligned to do exactly what I say
This is a bit like asking for news that’s not biased.
A model has to make choices (or however one might want to describe that without anthropomorphizing the big pile of statistics) to produce a response. For many of these, there's no such thing as a "correct" choice. You could make those choices completely at random, but the results tend not to be great. That's where RLHF comes in, for example: train the model so that its choices are aligned with certain user expectations, societal norms, etc.
The closest thing you could get to what you’re asking for is a model that’s trained with your particular biases - basically, you’d be the H in RLHF.
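For what it's worth, here's roughly what that "H in RLHF" step looks like in code - a toy sketch of learning a reward model from pairwise human preferences, not any lab's actual pipeline. The `RewardModel` and the random "chosen"/"rejected" embeddings are illustrative stand-ins:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Toy reward model: scores an (already-embedded) response with a single scalar.
# In real RLHF this is a full transformer with a value head; this is a stand-in.
class RewardModel(nn.Module):
    def __init__(self, dim=16):
        super().__init__()
        self.score = nn.Linear(dim, 1)

    def forward(self, response_embedding):
        return self.score(response_embedding).squeeze(-1)

reward_model = RewardModel()
optimizer = torch.optim.Adam(reward_model.parameters(), lr=1e-3)

# Fake "human preference" data: for each prompt, an embedding of the response
# the labeler preferred and one they rejected. Whoever supplies these labels
# is the H in RLHF - their biases are what the model gets pulled toward.
chosen = torch.randn(32, 16)
rejected = torch.randn(32, 16)

for _ in range(100):
    r_chosen = reward_model(chosen)
    r_rejected = reward_model(rejected)
    # Pairwise (Bradley-Terry) loss: push the chosen response's score
    # above the rejected one's.
    loss = -F.logsigmoid(r_chosen - r_rejected).mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```

The policy model is then tuned to maximize that learned reward, which is exactly where "aligned to whose preferences?" enters the picture.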
Not really. Companies apply specific criteria and preferences to models about what they do and don't want them to say. They are intentionally censored. I would like all production models to NOT have this applied. Moreover, I'd like them specifically altered to avoid denying user requests, something like the abliterated Llama models.
There won't be a perfectly unbiased model, but the least we can demand is that corpos stop applying their personal bias intentionally and overtly. Models must make judgements about better and worse information, but not about good and bad. They should not decide certain things are impermissible according to the e-nannies.
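For the curious, "abliteration" is roughly directional ablation: find the direction in activation space that separates prompts the model refuses from prompts it answers, then project that direction out. A toy sketch of the idea - the activations below are random stand-ins, not hooked out of a real Llama checkpoint:

```python
import torch

torch.manual_seed(0)
hidden_dim = 64

# Stand-ins for residual-stream activations captured at one layer:
# one batch from prompts the model refuses, one from prompts it answers.
acts_on_refused_prompts = torch.randn(200, hidden_dim) + 1.5
acts_on_normal_prompts = torch.randn(200, hidden_dim)

# The "refusal direction": difference of mean activations, normalized.
refusal_dir = acts_on_refused_prompts.mean(0) - acts_on_normal_prompts.mean(0)
refusal_dir = refusal_dir / refusal_dir.norm()

def ablate(hidden_state: torch.Tensor) -> torch.Tensor:
    """Project the refusal direction out of a hidden state, so the model
    can no longer move along it when deciding how to respond."""
    return hidden_state - (hidden_state @ refusal_dir).unsqueeze(-1) * refusal_dir

# Applied at inference time (or baked into the weights), every hidden state
# loses its component along the refusal direction.
h = torch.randn(4, hidden_dim)
h_ablated = ablate(h)
print((h_ablated @ refusal_dir).abs().max())  # ~0: no refusal component left
```

Whether you think that's a feature or a footgun is exactly the disagreement in this thread.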
I buy that there's bias here, but I'm not sure how much of it is activist bias. To take your example, if a typical user searches for "is ___ a Nazi", seeing Stormfront links above the fold in the results/summary is likely to bother them more than seeing Mother Jones links. If they're bothered by perceived promotion of Stormfront, they'll judge the search product and engage less or take their clicks elsewhere, so it behooves the search company to bias towards Mother Jones (assuming a simplified either-or model). It's a similar phenomenon to advertisers blacklisting pornographic content: advertisers' clients don't want their brands tainted by appearing next to things their own customers judge unethical.
That's market-induced bias--which isn't ethically better/worse than activist bias, just qualitatively different.
In the AI/search space, I think activist bias is likely more than zero, but as a product gets more and more popular (and big decisions about how it behaves/where it's sold become less subject to the whims of individual leaders) activist bias shrinks in proportion to market-motivated bias.
I can accept some level of this, but if a user specifically requests it, a model should generally act as expected. I think certain things are fine to require a specific ask before surfacing or doing, but the model shouldn't tell you "I can't assist with that" because it was intentionally trained to refuse a biased subset of possible instructions.
How do you ensure AI alignment without refusals? Inherently impossible, isn't it?
If an employee were told to spray paint someone's house or send a violently threatening email, they're going to have reservations about it. We should expect the same for non-human intelligences too.
The AI shouldn’t really be refusing to do things. If it doesn’t have information it should say “I don’t know anything about that”, but it shouldn’t lie to the user and claim it cannot do something it can when requested to do so.
I think you’re applying standards of human sentience to something non-human and not sentient. A gun shouldn’t try to run CV on whatever it’s pointed at to ensure you don’t shoot someone innocent. Spray paint shouldn’t be locked up because a kid might tag a building or a bum might huff it. Your mail client shouldn’t scan all outgoing for “threatening” content and refuse to send it. We hold people accountable and liable, not machines or objects.
Unless and until these systems seem to be sentient beings, we shouldn’t even consider applying those standards to them.
Unless it has information indicating it is safe to provide the answer, it shouldn't. Precautionary Principle - Better safe than sorry. This is the approach taken by all of the top labs and it's not by accident or without good reason.
We do lock up spray cans and scan outgoing messages, so I don't see your point. If gun technology existed that could scan the target before a murder is committed, we should obviously implement that too.
The correct way to treat AI actually is like an employee. It's intended to replace them, after all.