The way I would approach writing specs and requirements as code would be to write a set of unit-tests against a set of abstract classes, with instances of those classes passed in as arguments to the tests. Then let someone else, maybe an AI, write the implementation as a set of concrete classes, and verify that those unit-tests pass.
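For concreteness, a minimal sketch of what I mean, in Python with pytest (the Stack example and all the names are just hypothetical illustrations, not a real spec):

    from abc import ABC, abstractmethod
    import pytest

    class Stack(ABC):
        # The abstract class the spec-tests are written against.
        @abstractmethod
        def push(self, item): ...
        @abstractmethod
        def pop(self): ...

    class ListStack(Stack):
        # One candidate implementation (human- or AI-written).
        def __init__(self):
            self._items = []
        def push(self, item):
            self._items.append(item)
        def pop(self):
            return self._items.pop()

    # Every concrete class to be verified is listed here; the same
    # tests run against each candidate in turn.
    @pytest.fixture(params=[ListStack])
    def stack(request):
        return request.param()

    def test_pop_returns_last_pushed_item(stack):
        stack.push(1)
        stack.push(2)
        assert stack.pop() == 2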
I'm not sure how well that would work in practice, nor why such an approach is not used more often than it is. But yes, the point is that some humans would still have to write those tests as code to pass to the AI to implement. So we would still need human coders to write those unit-tests/specs. Only humans can tell the AI what humans want it to do.
The problem is that a sufficiently complete black-box description of a system is far more elaborate than a white-box description of the system, or even a rigorous description of all acceptable white boxes (a proof). Unit tests contain enough information to distinguish an almost-correct system from a more correct one, but far more information is needed to even arrive at the almost-correct system. And even knowing which traits are likely to separate an almost-correct system from a correct one requires a lot of white-box knowledge.
Unit tests are the correct tool for getting from an almost-correct system to a correct one. That step is hard because it requires driving the failure rate to zero, and the lower the failure rate gets, the harder it is to reduce it further. But when your constraint is not an infinitesimally small failure rate but reaching expressiveness fast, then a naive implementation or a mathematical model is a much denser representation of the information, and thus easier to generate. In practical terms, it is much easier to encode the slightly incorrect preconception you have in your mind than to try to enumerate all the cases in which a statistically generated system might deviate from the preconception you already had in your head.
“write a set of unit-tests against a set of abstract classes, with instances of those classes passed in as arguments to the tests.”
An exhaustive set of use cases to verify vibe-coded AI-generated apps would be an app in itself. Experienced developers know which subsets of tests are critical, avoiding much work.
I agree (?) that AI vibe-coding can be a good way to produce a prototype for stakeholders, to see if the AI output is actually something they want.
The problem I see is how to evolve such a prototype toward more correct specs, or changed specs in the future, because AI output is non-deterministic -- and "vibes" are ambiguous.
Giving the AI more specs, or modified specs, means it has to re-interpret them, and since its output is non-deterministic it can re-interpret viby specs differently each time and thus diverge in a new direction.
Using unit-tests as (at least part of) the spec would be a way to keep the specs stable and unambiguous. If the AI keeps re-interpreting viby, ambiguous specs, then the specs are effectively unstable, which means the final output has a hard time converging to a stable state.
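Something like this sketch is what I imagine (the file name spec_tests.py and the env var are hypothetical): the spec suite is frozen, and every regenerated candidate has to pass it unchanged:

    import os
    import pytest

    def accept(candidate_module: str) -> bool:
        # spec_tests.py reads this env var to decide which candidate
        # implementation to import; the tests themselves never change.
        os.environ["CANDIDATE_MODULE"] = candidate_module
        return pytest.main(["spec_tests.py", "-q"]) == 0

    # Each AI regeneration is gated by the same deterministic check,
    # so the target the AI has to hit never moves.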
I've asked this before, not knowing much about AI software development: is there an LLM that, given a set of unit-tests, will generate an implementation that passes those tests? And is such a practice commonly used in the community, and if not, why not?
> Experienced developers know which subsets of tests are critical, avoiding much work.
And they do know this for programs written by other experienced developers, because they know where to expect "linearity" and where to expect steps in the output function. (Testing 0, 1, 127, 128, 255 is important; 89 and 90 likely not, unless that's part of the domain knowledge.) This is not necessarily true for statistically derived algorithm descriptions.
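To illustrate with a hypothetical clamp-to-byte function: the steps in the output sit at the byte boundaries, so those are the values worth testing:

    import pytest

    def clamp_to_byte(x: int) -> int:
        return max(0, min(255, x))

    # The output is flat or linear between boundaries, so the edges
    # carry all the risk; 89 and 90 would add nothing new.
    @pytest.mark.parametrize("x, expected", [
        (-1, 0), (0, 0), (1, 1),    # lower edge
        (127, 127), (128, 128),     # signed-byte boundary
        (255, 255), (256, 255),     # upper edge
    ])
    def test_clamp_boundaries(x, expected):
        assert clamp_to_byte(x) == expected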
https://www.commitstrip.com/en/2016/08/25/a-very-comprehensi...