Note that gpt-5 in a standard scaffold (Codex) lost to almost everyone, while in the ARTEMIS scaffold, it won. The key isn't the model itself, but the Triage Module and Sub-agents. Splitting roles into "Supervisor" (manager) and "Worker" (executor) with intermediate validation is the only viable pattern for complex tasks. This is a blueprint for any AI agent, not just in cybersec
If you can do it by splitting roles explicitly, you can fold it into a unified model too. So "scaffolding advantage" might be a thing now, but I don't expect it to stay that way.
Is this true? I mean it’s true for any specific workflow, but I am not clear it’s true for all workflows - the power set of all workflows exceeds any single architecture, in my mind.
Think of it in an end-to-end way: produce a ton of examples of final results of supervisor-worker agentic outputs and then train a model to predict those from the original user prompts straight away.
It's not true for all workflows. But many of today's custom workflows are like the magic "let's think step by step" prompt for the early LLMs. Low-hanging fruits, set to become redundant as better agentic capabilities are folded into the LLMs themselves.