Hacker News | LaurensBER's comments

I really enjoyed using Claude, but the ever-changing limits and weird policies (subscriptions limited to Claude Code, you can't run OpenClaw, etc.) made switching a very easy choice.

OpenAI simply provides more value for the money at the moment.


You're totally allowed to use Claude for OpenClaw and you're totally able to use Claude Code with non-Anthropic models. You must be referring to the fact that you have to use an API key and cannot use the auth intended for Claude-only products, which AFAIK is the same at every AI company (with Google destroying whole Google accounts for offenders most recently).

OpenAI and Github Copilot do explicitly allow this, so do many of the new/Chinese providers such as Synthetic, Z.ai, etc.

Anthropic is the outlier here, obviously they can limit their subscriptions as they want but it's a major disadvantage compared to their competitors.


How can I retrieve an API key from ChatGPT to use my subscription in other tools then? This seems like it could be useful.

> OpenAI ... explicitly allow this

Explicit means it's stated in OpenAI docs somewhere, but I can't find it. Link?


https://x.com/thsottiaux/status/2009742187484065881

There's probably a better source somewhere but this is the one I had at hand.


That's weird, I switched away from ChatGPT because I mostly got superior results from Gemini and Claude.

give 5.4 a shot - it's strange but surprisingly good for once. Speaking as a daily Opus user.

Used Codex CLI (5.4) for the first time (had never used Codex or GPT for coding before - was using Opus 4.5 for everything), and it seems quite good. One thing I like is that it's very focused on tests: it will just start setting up unit tests for specs without you asking (whereas Opus would never do that unless asked). I like that and think it's generally good. One thing I don't like about GPT, though, is that it pauses too much between tasks, even when the immediate plan and the broader plan are all extremely well defined in AGENTS.md. It stops to say "next logical task is X," I say "yeah, go ahead," and only then does it proceed, when I'd rather it just continue. I suppose that's a preference that should be put in some document? (AGENTS.md?)
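If that preference does belong in AGENTS.md, a minimal sketch might look like the following (the section name and wording are my own invention, not an official convention):

```markdown
## Autonomy

- When the plan in this file already covers the next task, proceed to it
  without pausing to ask for confirmation.
- Only stop and ask when a step is ambiguous, destructive, or outside the
  written plan.
```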

well I have a running model (ha!) in my head about the frontier providers that's roughly like this:

- chatgpt is kinda autistic: it must follow procedures no matter what and writes in a bland, soulless, but kinda correct style. Great at research, horrible at creativity, slow at getting things done but at least getting there. Good architect, mid builder, horrible designer/writer.

- claude is the sensitive diva that is able to really produce elegant code but has to be reminded of correctness checks and quality gates repeatedly, so it arrives at something good very fast (sometimes oneshot) but then loses time for correction loops and "those details". great overall balance, but permanent helicoptering needed or else it derails into weird loops.

- grok is the maker: super fast and on target, but doesn't think as deeply as the others. It's entirely goal/achievement focused and does just enough to get there. Uniquely, it doesn't argue or self-monologue constantly about doubts or safety or ethics, but drives forward where others struggle, and faster than they do. It cannot concentrate for too long, but delivers fast. Tons of quick edits? grok it is. "Experimental" stuff that is not safe to talk about... definitely grok.

- gemini is whatever you quickly need in your GSuite, plus looking at what the others are doing and helping out with a sometimes different perspective, but beyond that it's worse than all the others.

- kimi: currently using it on the side, not bad at all so far, but also nothing distinct has crystallized in my head yet.


Tried using 5.4 xhigh/codex yesterday with very narrow direction to write Bazel rules for something. This is a pretty boilerplate-y task with specific requirements. All it had to do was produce a normal rule set such that one could write declarative statements to use them just like any other language integration. It gave back a dumpster fire, just shoehorning specific imperative build scripts into Starlark. Asked Opus 4.6 and got a normal, sane ruleset.

5.4 seems terrible at anything that's even somewhat out-of-distribution.


I got it to build a stereoscopic Metal raytracing renderer of a tesseract for the Vision Pro in less than half a day.

It surprisingly went at it progressively, starting with a basic CPU renderer and working all the way to a basic special-purpose Metal shader. Now it's cutting its teeth on adding passthrough support. YMMV.


The limits are what did it for me. They kept boasting about Opus performance and improvements, practically begging me to try it out, and when I did, it totally obliterated my usage. I'm sure it's good, but I stick to Sonnet because I've been burned badly. Never had that problem with ChatGPT, but it turns out they're just unprincipled and evil, which is a shame.

I tend to use LLMs more for research than actual coding, so I ended up going with GPT over Claude because its chat interface just seems to work better for me. That balances out Claude being slightly better at software tasks.

Have you considered using Gemini?

Google seems to be on a hot streak with their models, and, since they're playing from behind, I'd expect favorable pricing and terms. But, I don't know anyone who is using or talking about Gemini. All the chatter seems to be Anthropic vs. OpenAI.


because gemini, despite what the stats say, still produces garbage once the problem gets harder. It nails it under lab conditions, but in messy reality, or on creativity or even code quality, it's a far cry from Opus or the latest GPT 5.4, and always has been. It's pretty good inside GSuite because of the integrations, but standalone it's near worthless compared to even grok-code-fast, which doesn't think much at all (but damn it is fast). At this point Google keeps throwing AI at every wall in reach to see what sticks, which is more a kind of desperation that still pumps Wall Street highscores than a hot streak or a breakthrough. Just rapid-fire shotgun launches. No one serious talks Gemini because it's still not worth considering for real things outside shiny presentations and artificial benchmarks.

Gemini schools the other two when doing code reviews.

I used to think tokens are a commodity, but it’s becoming clear that the jagged frontier is different enough even for the easiest use case of SWE that there’s room for having two if not three providers of different foundational models. It isn’t a winner takes all, they’re all winning together. Cursor isn’t properly taking advantage of the situation yet.


My experience exactly. The more "real" the problems become, the more the other models become unsuitable compared to Claude, with the sole exceptions being DeepSeek/Kimi: while strictly w.r.t. metrics and basic tasks they are not better, they are more interesting and handle odd, totally out-of-domain stuff better than the US models. An example: code I wrote for a hypercomplex sedenion-based artificial neural network broke Claude so badly it started saying it is ChatGPT and can't evaluate/run code. Similar experience with all US models, which are characterized by being extremely brittle at the fringes, though Claude least among them. Meanwhile the Chinese models are less capable at cookie-cutter stuff but keep swinging when things get really weird and unusual. It's like the US models optimize for the lowest minima achievable, and god help you if the distribution changes, while the Chinese models seem to optimize for the flattest minima, giving poorer quality across the board but far more robust behaviour.

I've tried. It's just not very good compared to either mentioned alternative.

I can't even use 3.1 with Gemini CLI, not sure why.

What a baffling comment. Aren’t you aware of why this exodus is happening? (It’s not related to “value for the money”!) What are your feelings on that part?

It is entirely okay to weigh the Department of War thing against other criteria when choosing a service.

Agreed, but the comment should mention it. Nobody is talking about value for money right now.

I didn't mean to advocate for Anthropic, apologies.


Whatever Anthropic might or might not do with the Department of War interests me in proportion to how much I can influence it. Rounded, speaking as a European citizen, that appears to be exactly 0 to me.

ever tried living while deciding to only patronize groups that strictly align, morally and ethically, with your own personal beliefs?

I would love to, but a realistic look at that concept shows it's practically impossible.

My $0.02: Claude was already involved in underhanded shit I don't want a part of[0], and that generated little ethical response from Anthropic; I've had better luck as a $200/mo-tier customer with ChatGPT, and I don't really think that Dario claiming their newest LLM is conscious[1] on a market schedule is all that ethical, either.

[0]: https://en.wikipedia.org/wiki/Project_Maven

[1]: https://tech.yahoo.com/ai/claude/articles/anthropic-ceo-admi...


Why paint the choice as black and white? Most people are doing the best they can morally, even if they don't get it 100% right. Even living 60% in accordance with your values is better than 50%. Likewise, bucketing organizations as good or bad misses the same nuance. Choosing something that is slightly better has positive consequences despite it not being 100% good.

not the poster, but I guess that's kinda American thinking, actually believing that voting with your wallet will make any difference in this late-stage crony capitalism in a post-facts world.

realistically: AI WILL get used in the military and for killing autonomously, like it or not, believe it or not. I am also against that in principle, but I accept the fact that my opinion just doesn't matter and practice radical acceptance of reality as-is. twitter/X is also alive and kicking, despite musk and the anti-musk hate. xAI/Grok is genuinely really good too compared to OAI/Claude, a bit different but very good. At this point all the "outcries" feel like noise I just skip on principle. But it could turn up the fire under the OAI team to go aggressive feature- and pricing-wise in order to retain/increase their userbase again, which is... good, after all.


If anyone thinks Anthropic or OpenAI are the "good guys," they've already lost the plot. If you look at additional reporting on the topic, not just the Anthropic PR spin, the disagreements were much more nuanced than Anthropic portrayed them. They aren't exactly a reliable narrator here either. In fact, it seems like Amodei fumbled the deal and crashed out a bit. He's already walked back his internal memo, and is reportedly still seeking a deal with the Pentagon. I don't trust either CEO. I use their products, but if you're even leaning 51-49 on who is "less evil," I think you're giving too much slack.

Then the people mad about "mass surveillance" recommend Gemini or whatever.

They're just keeping up with the outrage news cycles.


Effectively yes. This does illustrate how hard it is to structure welfare well at a national level.

Everyone involved would be better off with a lower (or negative) income tax instead of subsidies.


> Everyone involved would be better off with a lower (or negative) income tax instead of subsidies.

That's quite wrong. Low-income earners effectively pay no income tax - after deductibles and so on - so further lowering the income tax would do absolutely nothing for them.

It'd be economic and political suicide to lower taxes during high deficits while government money is literally blown into fine dust in various wars around the world.


Refundable tax credits are a thing the government knows how to do. If a negative income tax law was written to allow refunds to people who owe net negative taxes, the IRS could do it.
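For concreteness, a Friedman-style negative income tax is just a linear formula: above the threshold you owe tax, below it the same formula yields a refund. The threshold and rate here are purely illustrative numbers, not a policy proposal:

```python
def nit_tax(income: float, threshold: float = 30_000, rate: float = 0.25) -> float:
    """Negative income tax: tax owed is rate * (income - threshold).

    Below the threshold the result is negative, i.e. a cash refund,
    which is exactly what a refundable tax credit mechanism pays out.
    """
    return rate * (income - threshold)

print(nit_tax(50_000))  # 5000.0 owed
print(nit_tax(30_000))  # 0.0 at the break-even point
print(nit_tax(10_000))  # -5000.0, i.e. a 5000 refund
```

Note how the marginal rate stays constant across the threshold, so earning more never reduces net income, unlike subsidies with hard cutoffs.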

I'm sorry to read this; I was just thinking about rereading the entire saga the other day. His words and ideas will forever live in my mind.

It's amazing how quickly Anthropic is turning into the "bad" guys.

First we couldn't use our Claude subscription with anything but Claude Code, then the limits seemed to change every week without any communication, then they banned a bunch of people (including some prominent names). Then they complained about the Chinese distilling via their API (which I'm partly sympathetic to, but let's not pretend that Anthropic invented their training data from scratch).

Then there's this half-baked offer. I mean, sure, it looks nice on paper, but given how incredibly valuable open source has been for them, and given their budget, it does seem a bit tight.


Very cool! I love the approach. OpenClaw is really cool, but there are two major things holding me back from deploying it for friends and family:

- Cybersecurity (you can't expect a non-technical person to read a skill)

- Token usage (without a flat fee subscription it'll become expensive very fast)

I understand that security is a hard problem to solve, but having a single binary + containers should definitely help! I'll keep an eye on this.


Yep. Cost is a major problem with these agents. I wonder why Mistral AI is never natively supported. It's the cheapest paid option out there.

PS: One can use Mistral's API through LiteLLM.


Don't install third-party skills in OpenClaw or Moltis, for security reasons. Their self-extending, self-evolving nature means you can customise them to create your own skills instead.


Sure, but also don't let it consume any content you didn't write, and don't give it write access to anything outside its sandbox[1]. Prompt injection is a thing, and all this molt stuff is YOLO mode for everything you give it access to.

https://simonwillison.net/2025/Jun/16/the-lethal-trifecta/

[1]: And even then, if you allow it to make web fetches, it can smuggle your private data out.


You can run it with a ChatGPT subscription (or even a local model) so it can be flat fee


I concur: open source will become more reputation-based, and no doubt, in the future, LLMs can also act as a quality gate.

I work a lot with quants (who can program but are more focused on making money than on clean code), and Opus 4.5 and Kimi 2.5 are extremely good at giving them architecture guidance. They tend to overcomplicate some things, but the result is usually miles better than what they produced without LLMs.


as we're doing anecdotes: I work with quants too

their LLM-"assisted" work seems to be roughly the same quality (i.e. bad), but now there's much more of it

not an improvement


One issue I found is that new projects are much harder to market now. shownew is flooded, and subreddits have turned their spam filters up high, so quality projects will have trouble getting eyes without significant social activity built up before sharing. The bar for introverts writing (not vibing) good OSS feels like it has gone up in an unfortunate way.


Given that there are very few benefits (that I'm aware of) to a "high score" HN account, there seems to be little incentive for karma farming.

I'm sure there are still some people who try (and/or test their pet projects), but it seems far less of an issue than on Reddit.


Astroturfing.


In all fairness there's enough Elon Musk spam in the media and on X.

I'm quite happy that HN seems to stay mostly free of Musk (irrespective of positive/negative posts).


There are some interesting AI issues with Grok. How do you edit an AI so it says the opposite of its training data because you want it to agree with your ideology? Can an AI image generator that was trained on tons of porn, which Grok was since it was trained on X, ever really be considered safe? Is Grokipedia a fair replacement for Wikipedia, which it was derived from?

But alas, we can't discuss any of this without being flagged by Musk fans. Someone commented on the thread I posted that the mods could unflag if they wanted to.


Plus, on Reddit people often delete their heavily downvoted comments (I don't know if this actually matters for a user's karma, but people sure think so), which makes it impossible to follow the entire discussion and see what was actually going on.

Nothing is as frustrating as finding what seems to be an interesting and engaging discussion, only to find several key posts missing...


Missing comments in a chain are almost always someone using a privacy scrubber.

Deletion doesn't remove negative karma, no. It just prevents more. Sorta.


It's hard to accurately measure but one advantage that the multi-agent approach has seems to be speed. I routinely see Sisyphus launching up to 4 sub agents to read/analyse a file and/or to do things in parallel.

The quality of the output depends more on the underlying LLM. GLM 4.7 isn't going to beat Opus but Opus with an orchestra seems to be faster and perhaps marginally better than with a more linear approach.

Of course this burns a lot of tokens, but with a cheap subscription like z.ai, or with a corporate budget, does it really matter?
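The speed win from sub-agents is easy to picture: it's parallel fan-out over independent files. A minimal sketch with a thread pool, where `analyze` is a hypothetical stand-in for a sub-agent call (in a real system this would be an LLM request, so the wall-clock saving comes from overlapping network waits):

```python
from concurrent.futures import ThreadPoolExecutor

def analyze(path: str) -> str:
    # Stand-in for dispatching a sub-agent to read/analyse one file.
    return f"summary of {path}"

files = ["a.py", "b.py", "c.py", "d.py"]

# Fan out to up to 4 workers, mirroring the "4 sub agents" pattern.
# pool.map returns results in input order, so merging is trivial.
with ThreadPoolExecutor(max_workers=4) as pool:
    summaries = list(pool.map(analyze, files))

print(summaries)
```

Four files that each take N seconds of waiting finish in roughly N seconds instead of 4N, which matches the observation that quality still depends on the underlying model while the orchestration mainly buys latency.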

