The worst part of this is that if they do remove Claude, and probably GPT and Gemini soon after because of the outcry, we are going to be left with our military using fucking Grok as their model, a model that's not even on par with open-source Chinese models.
I think the warfighter use cases are a distraction: a system could trivially claim that there is a human in the loop for LLM-derived kill lists. My money is on mass domestic surveillance being the true sticking point, because it's exactly what you would use an LLM for today.
> Are you Chinese? If not, I think you should prefer the people defending you to have the best tools to do so.
They already have the best and most expensive toys in the world, and they mostly seem to be waging aggressive wars with them. Perhaps if the toys weren't so shiny and didn't make it all so one-sided, they wouldn't?
This of course raises the question of whether, as an American, I have more to fear from the Chinese government or the US one... given everything happening in the Executive Branch here, that's a disappointingly hard question to answer.
I think that's an easy question to answer: obviously you don't fear the Chinese government, because you're not a Chinese citizen. You can actively talk about your disagreements with the US government; that's not a right the Chinese have.
Can you? By ICE agents' own admission on video, they have been adding people to "domestic terrorist" watchlists (just for verbally dissenting, making recordings with a phone, etc.), which are then used by Palantir to disappear people directly from their homes, even US citizens. This is the same Palantir whose CEO gleefully admits to knowing many Nazis and seems to get off on the fact that his software "kills people" (direct quote).
It shouldn't be. The US government is already sending armed and masked thugs to shoot political dissidents dead or sending them to concentration camps, threatening state governments and private companies to comply with suppressing free speech and oppressing undesirables, and openly discussing using emergency powers to suspend the next election.
What exactly is the commensurate threat from China? The real tacit threat, not abstract fears like "TikTok is Chinese mind control." What can China actually do to you, an American, that the US isn't already more capable of doing, and more likely to do?
To me it isn't even a question. Even comparing worst case scenarios - open war with China versus civil war within the US - the latter is more of a threat to citizens of the US than the former unless the nukes drop. And even then, the only nation to ever use nuclear weapons in warfare is the US.
This is the correct take. It may be a different question for people living within China, but for Americans, the US Gov is a direct threat to their lives.
If the American military were focused on defending the United States, it would be a very different beast. The 21st-century American military is a tool for transferring wealth from the public to influential parties, and for inflicting destruction on non-peer nations who pose obstacles to influential parties' interests. Defending the United States against various often-invoked hobgoblins is at best a very distant concern, closer to pure lip service than reality.
The Department of War under Trump has proven itself uninterested in defending you, the American people. All they've done so far is commit acts of aggression against supposed foreign adversaries.
I'm a natural-born American (many generations back) and firmly believe that if we ever get into a hot war with China, it will be because of American provocation, not Chinese.
I am American born and raised, and I consider our current government mass murderers whom I trust as much as I would have trusted the Nazis. It was a good thing that the Nazis did not get the A-bomb before us, and the same principle applies here. The fewer magnifiers of their power, the better. They are a scourge on human rights and the world.
Yeah, Vercel should have done this with NextJS a while ago. There is a reason why quite literally every other framework uses Vite: it's amazing, easy to use, and easy to extend.
It's greed: now that they have all the data and infrastructure, they're pulling up the ladder.
Why do you think not a single one of these labs has released an open-source model distilled from its own SOTA model?
They all preach that they want to provide AI to everyone; wouldn't this be the best way to do it? Use your SOTA model to produce a lesser but open-source model.
MiniMax, DeepSeek, and Moonshot are all releasing models for the public to use for free.
Anthropic, OpenAI, Google, etc. have been scraping information they had no right to scrape to train their models, yet when these companies pay them to scrape data, we are supposed to be worried?
Labs like Anthropic always preach that they're "trying to build AI for everyone" while releasing expensive, closed-source models.
The only reason AI is affordable at all is because of these Chinese AI labs.
Also: how can this be prevented? The AI labs can't seriously expect each lab to filter LLM-generated content from their training sets based on the source model. Leakage of AI behavior into public datasets is inevitable.
Turn the lens the other way around. By publicly posting that these models violate IP and anyone can run them, they are painting a specific political picture here…
Anthropic have been the loudest in pushing for regulatory capture, often citing "muh security" as FUD. People should care what they write on this topic, because they're not writing for us, they're writing for "the regulators". Member when the usgov placed a dude in solitary confinement because they thought he could launch nukes with a whistle? Yeah... Let's hope they don't do some cray cray stuff with open LLMs.
Anthropic make amazing coding models, kudos for that. But they should be mocked for any communication like the one linked. Boo-hoo. Deal with it, or don't, I don't care. No one will feel for you. What goes around, comes around. Etc.
Administratively, Anthropic seems to misunderstand politics. You don't get to wear the "people's champion" and "government sweetheart" hats at the same time; when push comes to shove, you'll be forced to pick a lane. We saw it with Microsoft, we saw it with Apple and Google, and now we're seeing it with OpenAI too. You can't drive down both paths at the same time.
As a member of the target audience for Claude, their messaging just leaves me confused. Are you a renegade success, or do you need the government's help? Are you a populist juggernaut, or do you hide from competition? OpenAI, for all their myriad issues, understood this from the start and stuck to the blithely profitable federal ass-kisser route.
Go free stuff! But... no one is running 400B models on their computers.
You are just giving them data instead. It's not like China is known to protect IP. Your data is going to be used against you, and we can't use Western laws to keep it safe.
But I like the option to give my data to a rando rather than one of the big five US companies that can get sued. At least the rando probably has no idea what to do with 10M of my customers' IP.
Actually... thank you for the links. Unironically.
By the way, I'm running a 400B model on my computer with 72GB VRAM: Qwen3.5-397B-A17B-GGUF/UD-Q4_K_XL, getting 13 t/s. Subjectively, I feel it runs at the level of Anthropic Claude, just slower.
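For anyone wondering how a "400B" model fits a 72GB card at all: the MoE architecture means only a fraction of the weights are read per token. A back-of-envelope sketch (my assumptions, not measurements: ~4.5 bits/weight for a Q4_K_XL GGUF, decode bound by reading active-expert weights):

```python
# Rough math for running a large MoE model locally. Assumptions (not
# measured): Q4_K_XL averages ~4.5 bits per weight, and decode speed is
# bound by reading the active experts' weights each token.

BITS_PER_WEIGHT = 4.5    # rough average for a Q4_K_XL quant
TOTAL_PARAMS = 397e9     # "397B" total parameters
ACTIVE_PARAMS = 17e9     # "A17B" active parameters per token
GB = 1e9

total_gb = TOTAL_PARAMS * BITS_PER_WEIGHT / 8 / GB    # full weight footprint
active_gb = ACTIVE_PARAMS * BITS_PER_WEIGHT / 8 / GB  # bytes read per token

print(f"full weights: ~{total_gb:.0f} GB (doesn't fit in 72 GB VRAM alone)")
print(f"active weights per token: ~{active_gb:.1f} GB")

# At the reported 13 tokens/s, the implied effective memory bandwidth
# across the VRAM + system-RAM mix:
tok_per_s = 13
print(f"implied bandwidth: ~{active_gb * tok_per_s:.0f} GB/s")
```

So the full ~223 GB of weights spill into system RAM, but each token only touches ~9.6 GB of experts, which is why low-double-digit t/s is plausible on consumer hardware.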
If you care about improvement of models, you would support the US labs here.
It costs hundreds of millions of dollars to train a frontier model. It's not just "scraping the web."
Distillation allows labs to replicate these results at 1/100th of the cost. This creates a prisoner's dilemma that incentivizes labs to withhold their models from the public.
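For context on why distillation is so cheap: the student isn't trained on raw text from scratch, it's trained to match the teacher's soft next-token distribution. A minimal sketch of that objective, with made-up logits over a toy vocabulary:

```python
import math

# Sketch of the distillation objective: the student minimizes the KL
# divergence between the teacher's softened next-token distribution and
# its own. All numbers here are illustrative, not from any real model.

def softmax(logits, temperature=1.0):
    scaled = [x / temperature for x in logits]
    m = max(scaled)                      # subtract max for stability
    exps = [math.exp(x - m) for x in scaled]
    z = sum(exps)
    return [e / z for e in exps]

def kl_divergence(p, q):
    """KL(p || q): how badly the student q approximates the teacher p."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

# Teacher (expensive frontier model) vs. student logits for one context
# over a tiny 4-token vocabulary.
teacher_logits = [2.0, 1.0, 0.2, -1.0]
student_logits = [1.5, 1.4, 0.1, -0.5]

# A distillation temperature > 1 softens both distributions, exposing the
# teacher's relative rankings of "wrong" tokens ("dark knowledge").
T = 2.0
p = softmax(teacher_logits, T)
q = softmax(student_logits, T)

loss = kl_divergence(p, q)  # the student's gradient steps minimize this
print(f"distillation loss (KL): {loss:.4f}")
```

Because the teacher's probabilities carry far more signal per example than a single "correct" token does, the student converges with a fraction of the data and compute.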
How much did it cost to produce all the data on the internet and every book ever published? Surely even the most conservative calculations put it at multiple years of planetary GDP. The same argument can be made to say that letting the big labs get away with pirating it will disincentivize people to publish anything.
I personally have stopped publishing publicly, since my research is still on the fuzzy boundary of AI's current knowledge, my website gets scraped daily, and I don't want to contribute to paid models for zero acknowledgement or compensation.
> I personally have stopped publishing publicly, since my research is still on the fuzzy boundary of AI's current knowledge, my website gets scraped daily, and I don't want to contribute to paid models for zero acknowledgement or compensation.
I don't know your work, so pardon me, but thinking about it: would gated communities be a better solution, at the very least? Say Matrix, XMPP, or IRC?
I suppose scraping bots for Matrix would be quite hard for AI companies to set up. But anyone interested in reading your content can still find the data if they want to, plus you get the additional benefit of a community of like-minded people.
Not only publishing: it has already disincentivised a huge part of what made Web 2.0, namely public APIs for accessing platform data.
It was amazing to be able to create toy projects using data from the big platforms; now they're all afraid LLM trainers will scrape their content and build a competitor on top of their moat, the data.
This reads a bit like over-moralizing to me. US labs will continue improving their models because they have to make money in a competitive market. Chinese distillations have arguably improved the status-quo, with Qwen and R1 forcing GPT-OSS to be released to the public. American businesses are competing, and American customers are getting better products because of the competitive pressure on them.
Your purported "prisoner's dilemma" hasn't happened yet to my knowledge, instead we seem to see the opposite. The high-speed development velocity has forced US labs to release more often with less nebulous results. Supporting either side will contribute to healthier competition in the long run.
> incentivizes labs to withhold their models from the public.
Does it really? How would they get revenue if they withhold their models? And doesn't economics generally say that if it's easier for your competitor to catch up, you have a higher incentive to maintain your lead?
I think that the bigger conversation to be had here is about the environmental damage - if by using distillation we can really train new models at 1% of the cost in energy, it is ethically imperative that we do this.
Seems like they actually fixed some of the problems with the model. The hallucination rate seems to be much better. Seems like they also tuned the reasoning; maybe that's where they got most of the improvements from.
The hallucination rate with the Gemini family has always been my problem with them. Over the last year they’ve made a lot of progress catching the Gemini models up to/near the frontier in general capability and intelligence, but they still felt very late 2024 in terms of hallucination rate.
Which made the Gemini models untrustworthy for anything remotely serious, at least in my eyes. If they’ve fixed this or at least significantly improved, that would be a big deal.
Don't let the benchmarks fool you. Gemini models are completely useless no matter how smart they are. Google still hasn't figured out tool calling or making the model follow instructions. They seem to only care about benchmarks and being the most intelligent model on paper. This has been a problem with Gemini since 1.0, and they still haven't fixed it.
Because it's Google, they can't build products; they only care about benchmarking.
The products they've released so far are all half-assed experiments.
Gemini 3 Pro is now being beaten by open-source models because they can't fix, or don't want to fix, the problems that make the Gemini models useless in practice.
The same for Microsoft.
Microsoft had GitHub Copilot and Microsoft Copilot, and both of them are useless compared to Claude Code and Claude Cowork.
You can have all the money in the world, but nothing is stopping you from building useless garbage.
I guess it depends on your spending. GPT-5.2 and -5.3-Codex are certainly much cheaper: you get much more from the same $20 sub. When I was using Claude as primary I would daily hit limits and have to wait vs on GPT with more usage it only happened to me once in a few months when I was vibecoding non-stop for a week or two to port my personal Windows tools to Linux with multiple other projects being worked on in parallel.
Anecdotally GPT was also smarter than Claude which prompted my move from Claude in the first place: Gemini and Claude back in October failed to get their own harness PID.
I'm glad more people are catching onto lightweight CLI tools and using skills to give LLMs more tools. It's way better than MCP. I've been doing this for a while now, and it's just the best way to get LLMs to do things with APIs built for humans.
From my perspective, this also marks the shift towards agent skills instead of MCP, since agent skills rely on CLI tools. To me, this is also better than MCP since third-party developers can easily reuse existing APIs and libraries instead of needing official MCP support.
Same! I personally released a couple of CLIs (written using Claude Code) which I regularly use for my work: logbasset (to access Scalyr logs) and sentire (to access Sentry issues). I never use them manually; I wrote them to be used by LLMs. I think they're lighter than an MCP server.
They plan to use it for "Code Mode", which means the LLM will run Python code that it writes to call tools, instead of having to load all the tool definitions up front into its context window.
The idea is that in “traditional” LLM tool calling, the entire (MCP) tool result is sent back to the LLM, even if it just needs a few fields, or is going to pass the return value into another tool without needing to see the intermediate value. Every step that depends on results from an earlier step also requires a new LLM turn, limiting parallelism and adding a lot of overhead.
With code mode, the LLM can chain tool calls, pull out specific fields, and run entire algorithms using tools with only the necessary parts of the result (or errors) going back to the LLM.
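The difference is easy to see in miniature. In the sketch below the two tool functions are hypothetical stand-ins, but the structure is the point: in classic tool calling, every raw result would round-trip through the model, while in code mode the model writes this script once and only the final summary re-enters its context.

```python
# Sketch of "code mode" with two hypothetical tools. In classic tool
# calling, each raw result below would round-trip through the model; here
# the model emits this script once, and only `summary` returns to it.

def list_orders(user_id):
    # Hypothetical tool: returns large payloads the LLM never needs to see.
    return [{"id": i, "total": 25.0 * i, "status": "shipped",
             "items": ["…"] * 50} for i in range(1, 4)]

def get_user(user_id):
    # Hypothetical tool with many fields; we'll extract just one.
    return {"id": user_id, "name": "Ada", "email": "ada@example.com",
            "address": {"city": "London"}}

user_id = 42
orders = list_orders(user_id)   # intermediate value: stays out of the context

# Chain calls and run a small algorithm without any extra model turns:
shipped_total = sum(o["total"] for o in orders if o["status"] == "shipped")
name = get_user(user_id)["name"]          # pull out the single needed field

summary = f"{name} has {len(orders)} shipped orders totaling ${shipped_total:.2f}"
print(summary)   # only this short string goes back to the LLM
```

Three tool results and an aggregation collapse into one turn, and the context only grows by one sentence instead of by every order's item list.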
I like your effort. The time savings and strict security are real and important. In modern orchestration flows, however, a subagent handles the extra processing of tool results, so the main agent's context is not polluted.
This hasn't been my experience either. I personally find the max plan is very generous for day-to-day usage. And I don't even use compact manually.
However, when I tried out the SuperPower skill and had multiple agents working on several projects at the same time, it did hit the 5-hour usage limit. But SuperPower hasn't been very useful for me and wastes a lot of tokens: trading much higher token consumption for longer unattended runs only gets you a marginal increase in performance.
So folks, if you find yourself using up tokens too quickly, you probably want to check your skills, MCPs, etc.
As a regular user, I hit these walls so often. I'm experimenting with local models and opencode, and I'm hoping to see good results with Qwen3 Coder.
It's known that Anthropic's $20 Pro subscription is a gateway plan to their $100 Max subscription, since you'll easily burn your token rate on a single prompt or two. Meanwhile, I've had ample usage testing out Codex on the basic $20 ChatGPT Plus plan without a problem.
As for Anthropic's $100 Max subscription, it's almost always better to start new sessions for tasks since a long conversation will burn your 5-hour usage limit with just a few prompts (assuming they read many files). It's also best to start planning first with Claude, providing line numbers and exact file paths prior, and drilling down the requirements before you start any implementation.
> It's known that Anthropic's $20 Pro subscription is a gateway plan to their $100 Max subscription, since you'll easily burn your token rate on a single prompt or two.
I genuinely have no idea what people mean when I read this kind of thing. Are you abusing the word "prompt" to mean "conversation"? Or are you providing a huge prompt that is meant to spawn 10 subagents and write multiple new full-stack features in one go?
For most users, the $20 Pro subscription, when used with Opus, does not hit the 5-hour limit on "a single prompt or two", i.e. 1-2 user messages.
Today I literally gave Claude a single prompt, asking it to make a plan to implement a relatively simple feature that spanned a couple of different codebases. It churned for a long time, I asked a couple of very simple follow-up questions, and then I was out of tokens. I do not consider myself to be any kind of power user at all.
The only time I've ever seen this happen is when you give it a massive codebase, without any meaningful CLAUDE.md to help it make sense of things and no explicit @-mentioning of files/folders to guide it, and then ask it for something with huge cross-cutting concerns.
> spanned a couple different codebases
There you go.
If you're looking to prevent this issue I really recommend you set up a number of AGENTS.md files, at least top-level and potentially nested ones for huge, sprawling subfolders. As well as @ mentioning the most relevant 2-3 things, even if it's folder level rather than file.
Not just for Claude, it greatly increases speed and reduces context rot for any model if they have to search less and more quickly understand where things live and how they work together.
I have a tool that scans all code files in a repo and prints the symbols (AST based), it makes orienting around easy, it can be scoped to a file or folder.
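A minimal sketch of such a tool (an assumed shape, not the commenter's actual code): parse each Python file's AST and print only the top-level symbols, so an agent can orient itself without reading whole files.

```python
import ast
import tempfile
from pathlib import Path

# Sketch of a repo symbol-map tool: walk Python files under a root and
# print top-level classes/functions with their line numbers. This is an
# assumed implementation, not the commenter's actual tool.

def symbols(path):
    """Yield (kind, name, lineno) for top-level defs/classes in one file."""
    tree = ast.parse(Path(path).read_text(), filename=str(path))
    for node in tree.body:
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            yield "def", node.name, node.lineno
        elif isinstance(node, ast.ClassDef):
            yield "class", node.name, node.lineno

def scan(root):
    """Print a symbol map, scoped to a file or folder."""
    for path in sorted(Path(root).rglob("*.py")):
        for kind, name, line in symbols(path):
            print(f"{path}:{line}: {kind} {name}")

# Demo on a throwaway directory:
with tempfile.TemporaryDirectory() as d:
    Path(d, "models.py").write_text(
        "class User:\n    pass\n\ndef save(user):\n    pass\n")
    scan(d)   # prints the class and function with their line numbers
```

The same idea generalizes to other languages via tree-sitter; the payoff is that the agent reads a few hundred lines of symbol map instead of grepping through the whole tree.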
I am on the $100 Max subscription, and I rarely hit the limit. I used to, but not anymore; then again, I stopped building two products at the same time and concentrated on finishing up the first/"easiest" one.
> you'll easily burn your token rate on a single prompt or two
My experience has been that I can usually work for a few hours before hitting a rate limit on the $20 subscription. My work time does not frequently overlap with core business hours in PDT, however. I wonder whether there is an aspect of this that is based on real-time dynamic usage.
When I used it before Christmas (free trial), it very visibly paused for a bit every so often, telling me that it was compressing/summarising its too-full context window.
I forget the exact phrasing, but it was impossible to miss unless you'd put everything in the equivalent of a Ralph loop and gone AFK or put the terminal in the background for extended periods.
However I run like 3x concurrent sessions that do multiple compacts throughout, for like 8hrs/day, and I go through a 20x subscription in about 1/2 week. So I'm extremely skeptical of these negative claims.
Edit: However I stay on top of my prompting efficiency, maybe doing some incredibly wasteful task is... wasteful?