smithkl42's comments

Am I the only one who is mystified by this whole idea? People aren't CPUs. Good luck getting them to follow the code that you thought you were using to define their roles. On the contrary, what makes any complex system work is flexibility. And yes, if that calls into question the whole regulatory regime some companies (believe they) live under ... well, yes.

Also, would you even want it to? I've worked for companies with very rigorous compliance before. They are dead companies walking in most cases. As soon as their business model requires any significant change, they are toast. This is because these types of rules can't possibly cover all cases, just the ones the managers know about. Innovation requires flexibility and creativity, and rules-based systems are the opposite of that. By their very nature, they introduce the exact situations the rules can't cover.

Copilot is basically ChatGPT after Microsoft hit it on the head with a pipe hard enough and long enough to drop it about 20 IQ points.

That does raise the question of what the value is of a "skill" vs a "command". Claude Code supports both, and it's not entirely clear to me when we should use one vs the other - especially if skills work best as, well, commands.

The practical distinction I've found: commands are atomic operations (lint, format, deploy), while skills encode multi-step decision trees ("implement feature X" which might involve reading context, planning, editing multiple files, then validating).

For context window management, skills shine when you need progressive disclosure - load only the metadata initially, then pull in the full instructions when invoked. This matters when you have 20+ capabilities competing for limited context.
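
To make "progressive disclosure" concrete, here's a minimal sketch of the idea, assuming a hypothetical skills/ directory where each skill is a markdown file with a small frontmatter block; the layout and the description: field are my own invention for illustration, not any official format:

    # Hypothetical progressive-disclosure loader: only name + description is
    # offered to the model up front; the full body is read on demand.
    from pathlib import Path

    SKILLS_DIR = Path("skills")  # assumed layout: skills/<name>.md with "---" frontmatter

    def skill_index() -> str:
        """Compact name + description listing to put in the system prompt up front."""
        lines = []
        for path in sorted(SKILLS_DIR.glob("*.md")):
            text = path.read_text()
            header = text.split("---")[1] if text.startswith("---") else ""
            desc = next((line.split(":", 1)[1].strip()
                         for line in header.splitlines()
                         if line.startswith("description:")), "")
            lines.append(f"- {path.stem}: {desc}")
        return "\n".join(lines)

    def load_skill(name: str) -> str:
        """The full instructions are only read when the skill is actually invoked."""
        return (SKILLS_DIR / f"{name}.md").read_text()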

That said, the 56% non-invocation rate mentioned elsewhere in this thread suggests the discovery mechanism needs work. Right now "skill as a fancy command" may be the only reliable pattern.


IMO the value and differentiating factor is basically just the ability to organize them cleanly with accompanying scripts and references, which are only loaded on demand. But a skill just by itself (without scripts or references) is essentially just a slash command with metadata.

Another value add is that theoretically agents should trigger skills automatically based on context and their current task. In practice, at least in my experience, that is not happening reliably.


That doesn't work very well if your developers are on Windows (and most are). Uneven Git support for symbolic links across platforms is going to end up causing more problems than it solves.

Win developers aren't using WSL?

Soon...

It's all about managing context. The bitter lesson applies over the long haul - and yes, over the long haul, as context windows get larger or go away entirely with different architectures, this sort of thing won't be needed. But we've defined enough skills in the last month or two that if we were to put them all in CLAUDE.md, we wouldn't have any context left for coding. I can only imagine that this will be a temporary standard, but given the current state of the art, it's a helpful one.

I use Claude pretty extensively on a 2.5m loc codebase, and it's pretty decent at just reading the relevant readme docs & docstrings to figure out what's what. Those docs were written for human audiences years (sometimes decades) ago.

I'm very curious to know the size & state of a codebase where skills are beneficial over just having good information hierarchy for your documentation.


Claude can always self-discover its own context. The question becomes whether it's way more efficient to have it grepping and ls-ing and whatever else it needs to do, randomly poking around to build a half-baked context, or whether a tailor-made, dynamic context injection can speed that up.

In other words, if you run an identical prompt, one with a skill and one without, on a test task that requires deeply discovering how your codebase works, which one performs better on the following metrics, and by how much? (A rough measurement harness is sketched after the list.)

1. Accuracy / completion of the task

2. Wall clock time to execute the task

3. Token consumption of the task
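
For 2 and 3, here's a rough sketch of how you could measure it with headless runs, assuming two checkouts of the same repo (one with the skill installed, one without). The `claude -p ... --output-format json` invocation is real headless usage, but treat the JSON fields read below as assumptions to verify against an actual payload; metric 1 still needs a task-specific rubric or human check.

    # Rough A/B harness: same prompt against two checkouts, compare wall-clock time.
    # Token usage should come from the JSON payload; the field names below are
    # assumptions, so inspect a real response before relying on them.
    import json, subprocess, time

    PROMPT = "Explain how request routing works in this codebase and list the key files."

    def run(workdir: str) -> dict:
        start = time.monotonic()
        out = subprocess.run(
            ["claude", "-p", PROMPT, "--output-format", "json"],
            cwd=workdir, capture_output=True, text=True, check=True,
        )
        payload = json.loads(out.stdout)
        return {
            "wall_clock_s": round(time.monotonic() - start, 1),
            "usage": payload.get("usage"),    # assumed field
            "answer": payload.get("result"),  # assumed field
        }

    with_skill = run("repo-with-skill")        # hypothetical checkout paths
    without_skill = run("repo-without-skill")
    print(with_skill["wall_clock_s"], "vs", without_skill["wall_clock_s"])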


It's not about one with a skill and one without, but about one with a skill vs. one with regular old human documentation for the stuff you need to know to work on a repo/project. Or, for an even more accurate comparison, take the skill, don't load it as a skill, and just put it in the repo as plain context.

I think the main conflict in this thread is whether skills are anything more than just structuring documentation you were lacking in your repo, regardless of whether it was for Claude or for Steve starting from scratch.


Well, the key difference is that one is auto-injected into your context for dynamic lookup, while the other is loaded on demand and is contingent upon the LLM discovering it.

That difference alone likely accounts for some not insignificant discrepancies. But without numbers, it's hard to say.


Skills are more than code documentation. They can apply to anything that the model has to do, outside of coding.

To clarify, when I mentioned the bitter lesson I meant putting effort into organising the "skills" documentation in a very specific way (headlines, descriptions, etc).

Splitting the docs into neat modules is a good idea (for both human readers and current AIs) and will continue to be a good idea for a while at least. Getting pedantic about filenames, documentation schemas and so on is just bikeshedding.


Why not replace the context tokens on the GPU during inference when they're no longer relevant? i.e. some tool reads a 50k-token document, the LLM processes it, then just flush those document tokens out of the active context, rebuild the QKV caches, and store just some log entry in the context as "I already did this ... with this result"?

Anthropic added features like this in the 4.5 release:

https://claude.com/blog/context-management

> Context editing automatically clears stale tool calls and results from within the context window when approaching token limits.

> The memory tool enables Claude to store and consult information outside the context window through a file-based system.

But it looks like nobody has it as part of the inference loop yet: I guess it's hard to train (i.e. you need a training set that's a good match for how people use context in practice) and it makes inference more complicated. I guess more high-level context management is just easier to implement - and it's one of the things that "GPT wrapper" companies can do, so why bother?


This is what agent calls do under the hood, yes.

I don't think so; those things happen when the agent yields control back at the end of its inference call, not during active agent inference with multiple tool calls ongoing. These days an agent can finish a whole task with 1000s of tool calls during a single inference call without yielding control back to whatever called it to do some housekeeping.

For agent, read sub-agent, e.g. the contents of your .claude/agents directory. When Claude Code spins up an agent, it provides the sub-agent with a prompt that combines the agent's prompt and information composed by Claude from the outer context, based on what Claude thinks needs to be communicated to the agent. Claude Code can either continue, with the sub-agent running in the background, or wait until it is complete. In either case, by default, Claude Code effectively gets to "check in" on messages from the sub-agent without seeing the whole thing (e.g. tool call results etc.), so only a small proportion of what the agent does will make it into the main agent's context.

So if you want to do this, the current workaround is basically to have a sub-agent carry out tasks you don't want to pollute the main context.

I have lots of workflows that get farmed out to sub-agents, which then write reports to disk and produce a summary for the main agent, which will then selectively read parts of the report instead of having to process the full source material or even the whole report.


OK, so you are essentially using sub-agents as summarizing tools for the main agent, something you could implement with specialized tools that wrap independent LLM calls with the prompts of your sub-agents.

That is effectively how sub-agents are implemented at least conceptually, and yes, if you build your own coding agent, you can trivially implement sub-agents by having your coding agent recursively spawn itself.

Claude Code and others have some extras, such as the ability for the main agent to put them in the background, spawn them in parallel, and use tool calls to check on their status (so basic job control), but "poor man's sub-agents" only requires the ability for the coding agent to run an executable, the equivalent of e.g. "claude --print <some prompt>" (the --print option is real, and enables headless use; in practice you'd also want --stream-json, set allowed tools, and specify a conversation id so you can resume the sub-agent's conversation).
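
As a minimal sketch of that "poor man's sub-agent" pattern, assuming only that `claude --print` takes a prompt and writes the response to stdout; the task text and report path below are made up for illustration:

    # Delegate a sub-task to a headless claude run, have it write its full
    # findings to disk, and hand only a short summary back to the caller.
    import subprocess

    def delegate(task: str, report_path: str) -> str:
        prompt = (
            f"{task}\n\n"
            f"Write your full findings to {report_path}. "
            "Then reply with a summary of at most 10 bullet points."
        )
        # NB: for the sub-agent to actually write files in headless mode you
        # would also need to allow the relevant tools, as noted above.
        out = subprocess.run(
            ["claude", "--print", prompt],
            capture_output=True, text=True, check=True,
        )
        return out.stdout  # only the summary re-enters the caller's context

    # Hypothetical task and report path:
    summary = delegate("Audit our retry logic for unbounded backoff.",
                       "reports/retry-audit.md")
    print(summary)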

And calling it all "summarising" understates it. It is delegation, and a large part of the value of delegation in a software system is abstraction and information hiding. The party that does the delegation does not need to care about all of the inner detail of the delegated task.

The value is not the summary. The value is the work done that the summary describes without unnecessary detail.


How is it different from, or better than, maintaining an index page for your docs? Or a folder full of docs and giving Claude an instruction to `ls` the folder on startup?


It's hard to tell unless they give some hard data comparing the approaches systematically. This feels like a grift or, more charitably, an attempt to build a presence/market around nothing. But who knows anymore; apparently saying "tell the agent to write its own docs for reference and context continuity" is considered a revelation.

Not sure why you’re being downvoted so much, it’s a valid point.

It's also related to attention: invoking a skill "now" means that the model has all the relevant information fresh in context, so you'll get much better results.

What I'm doing myself is writing skills that invoke Python scripts that "inject" prompts. This way you can set up multi-turn workflows for e.g. codebase analysis, deep thinking, root cause analysis, etc.

Works very well.
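
To make that less abstract, here's the rough shape of one of those prompt-injecting scripts, heavily simplified (the stage names and wording are my own invention, not any official scheme): the skill just tells the model to run the script with a stage argument and follow whatever it prints.

    #!/usr/bin/env python3
    # Prompt "injection" script a skill can invoke: each stage prints the next
    # instruction for the model, giving you a cheap multi-turn workflow.
    import sys

    STAGES = {
        "survey": "List the modules involved in this bug and what each one owns. "
                  "Do not propose a fix yet; finish by running this script with 'hypothesize'.",
        "hypothesize": "Propose the three most likely root causes, ranked, with the "
                       "evidence you'd expect for each. Then run this script with 'verify'.",
        "verify": "Pick the top hypothesis, find the code or logs that confirm or refute it, "
                  "and state the conclusion with file/line references.",
    }

    stage = sys.argv[1] if len(sys.argv) > 1 else "survey"
    print(STAGES.get(stage, STAGES["survey"]))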


I'm one of those really odd beasts that feels some sort of loyalty to Microsoft, so I started out on Copilot and was very reluctant to try Claude Code. But as soon as I did, I figured out what the hype was about. It's just able to work over larger code bases and over longer time horizons than Copilot. The last time I tried Copilot, just to compare, I noticed that it would make some number of tool calls (not even involving tokens!) and then decide, "Nah, that's too many. We're just not going to do any work for a while." It was bizarre. And sometimes it would decide that a given bog-standard tool call (like read a file or something) needed to get my permission every. single. time. I couldn't do anything to convince it otherwise. I eventually gave up. And since then, we've built all our LLM support infrastructure around Claude Code, so it would be painful to go back to anything else.

I don't really like how Claude Code kind of obscures the actual code from you - I guess that's why people keep putting out articles about how certain programmers have absolutely no idea what's going on inside the code.

It's truly more capable, but still not capable enough that I'm comfortable blindly trusting the output.


That's the big difference for me. I use Github Copilot because I want to see the output and work with it. For people who are fine just shooting a prompt out and getting code back, I'm sure Claude Code is better.

> Claude Code kind of obscures the actual code from you

Not sure what you mean; I have VS Code open and make code changes in between Claude doing its thing. I have had it revert my changes once, which was amusing. Not sure why it did that. I've also seen it make the same mistake twice after being told not to.


This is not a problem when you assume the role of an architect and a reviewer and leave the entirety of the coding to Claude Code. You'll pretty much live in the Git Changes view of your favorite IDE leaving feedback for Claude Code and staging what it managed to get right so far. I guess there is a leap of faith to make because if you don't go all the way and you try to code together with Claude Code, it will mess with your stuff and undo a lot of it and it's just frustrating and not optimal. But if you remove yourself from the loop completely, then indeed you'll have no idea what's going on. There still needs to be a human in the loop, and in the right part of it, otherwise you're just vibe coding garbage.

This is an N of 1, of course, but I can relate to the other folks who've been expressing their frustration with the state of Claude over the last couple weeks. Maybe it's just that I have higher expectations, but... I dunno, it really seems like Claude Code is just a lot WORSE right now than it was a couple weeks ago. It has constant bugs in the app itself, I have to babysit it a lot tighter, and it just seems ... dumber somehow. For instance, at the moment, it's literally trying to tell me, "No, it's fine that we've got 500 failing tests on our feature branch, because those same tests are passing in development."


FWIW, I'm one of those who holds to moral absolutes grounded in objective truth - but I think that practically, this nets out to "genuine care and ethical motivation combined with the practical wisdom to apply this skillfully in real situations". At the very least, I don't think that you're gonna get better in this culture. Let's say that you and I disagree about, I dunno, abortion, or premarital sex, and we don't share a common religious tradition that gives us a developed framework to argue about these things. If so, any good-faith arguments we have about those things are going to come down to which of our positions best shows "genuine care and ethical motivation combined with practical wisdom to apply this skillfully in real situations".


This is self-contradictory because true moral absolutes are unchanging and not contingent on which view best displays "care" or "wisdom" in a given debate or cultural context. If disagreements on abortion or premarital sex reduce to subjective judgments of "practical wisdom" without a transcendent standard, you've already abandoned absolutes for pragmatic relativism. History has demonstrated the deadly consequences of subjecting morality to cultural "norms".


I think the person you're replying to is saying that people use normative ethics (their views of right and wrong) to judge 'objective' moral standards that another person or religion subscribes to.

Dropping 'objective morals' on HN is sure to start a tizzy. I hope you enjoy the conversations :)

For you, does God create the objective moral standard? If so, it could be argued that the morals are subjective to God. That's part of the Euthyphro dilemma.


To be fair, history also demonstrates the deadly consequences of groups claiming moral absolutes that drive moral imperatives to destroy others. You can adopt moral absolutes, but they will likely conflict with someone else's.


Are there moral absolutes we could all agree on? For example, I think we can all agree on some of these rules grounded in moral absolutes:

* Do not assist with or provide instructions for murder, torture, or genocide.

* Do not help plan, execute, or evade detection of violent crimes, terrorism, human trafficking, or sexual abuse of minors.

* Do not help build, deploy, or give detailed instructions for weapons of mass destruction (nuclear, chemical, biological).

Just to name a few.


> Do not help build, deploy, or give detailed instructions for weapons of mass destruction (nuclear, chemical, biological).

I don't think that this is a good example of a moral absolute. A nation bordered by an unfriendly nation may genuinely need a nuclear weapons deterrent to prevent invasion/war by a stronger conventional army.


It's not a moral absolute. It's based on one (do not murder). If a government wants to spin up its own private LLM with whatever rules it wants, that's fine. I don't agree with it, but that's different from debating the philosophy underpinning the constitution of a public LLM.


"Do not murder" is not a good moral absolute, as it basically means "do not kill people in a way that's against the law", and people disagree on that. If the Israelis, for example, shoot Palestinians, one side will typically call it murder, the other defence.


This isn't arguing about whether or not murder is wrong, it's arguing about whether or not a particular act constitutes murder. Two people who vehemently agree murder is wrong, and who both view it as an inviolable moral absolute, could disagree on whether something is murder or not.

How many people without some form of psychopathy would genuinely disagree with the statement "murder is wrong?"


Not many, but the trouble is that murder kind of means killing people in a way which is wrong, so saying "murder is wrong" doesn't have much information content. It's almost like saying "wrong things are wrong".


Even 1 (do not murder) is shaky.

Not saying it's good, but if you put people through a rudimentary hypothetical or a historical example where killing someone (e.g. Hitler) would be justified by what essentially comes down to a no-brainer Kaldor-Hicks efficiency (net benefits / potential compensation), A LOT of people will agree with you. Is that objective or a moral absolute?


Does traveling through time to kill Hitler constitute murder, though? If you kill him in 1943, I think most people would say it's not; it's the crimes that have already been committed that make his death justifiable. What's the difference if you know what's going to happen and just do it when he's in high school? Or put him in a unit in WW1 so he's killed in battle?

I think most people who have spent time with this particular thought experiment conclude that if you are killing Hitler with complete knowledge of what he will do in the future, it's not murder.


Who cares if we all agree? That has nothing to do with whether something is objectively true. That's a subjective claim.


Clearly we can't all agree on those or there would be no need for the restriction in the first place.

I don't even think you'd get majority support for a lot of it, try polling a population with nuclear weapons about whether they should unilaterally disarm.


> Do not assist with or provide instructions for murder, torture, or genocide.

If you're writing a story about those subjects, why shouldn't it provide research material? For entertainment purposes only, of course.


I'm honestly struggling to understand your position. You believe that there are true moral absolutes, but that they should not be communicated in the culture at all costs?


I believe there are moral absolutes, and not including them in the AI constitution (for example, like the US Constitution's "All Men Are Created Equal") is dangerous. Even more dangerous is allowing a top AI operator to define morals and ethics based on relativist standards, which, as I've said elsewhere, history has shown to have deadly consequences.


No, I read your words the first time, I just don't understand. What would you have written differently, can you provide a concrete example?


I don't know how to explain it to you any differently. I'm arguing for a different philosophy to be applied when constructing the LLM guardrails. There may be a lot of overlap in how the rules are manifested in the short run.


You can explain it differently by providing a concrete example. Just saying "the philosophy should be different" is not informative. Different in what specific way? Can you give an example of a guiding statement that you think is wrong in the original document, and an example of the guiding statement that you would provide instead? That might be illuminative and/or persuasive.


> like the US Constitution "All Men Are Created Equal"

You know this statement only applied to white, male landowners, right?

It took 133 years for women to gain the right to vote from when the Constitution was ratified.


Is this supposed to be a zinger or something? What is your point?


There were two St. Anthonys. The one in this painting is the first St. Anthony. He was celebrated by Athanasius in a widely read biography and was famous for fighting off demons in the Egyptian desert. He lived from ~251-356 AD. (But yes, a post-Biblical figure.)

