> Don't have it respond with a bare true/false. Instead, ask it for a short sentence explaining whether the statement is true or false, then use a small embedding model such as SBERT to extract true/false from that sentence. We've found that GPT reasons better this way, and it is much more robust.
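For reference, here's roughly what the quoted approach could look like with the sentence-transformers package: embed GPT's sentence and pick the nearer of two true/false anchor sentences by cosine similarity. The model name and anchor wording below are placeholders, not necessarily what the parent actually used.

```python
# Minimal sketch of embedding-based true/false extraction.
# Assumes the sentence-transformers package; model and anchor
# phrasings are illustrative choices.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")

ANCHORS = {
    True: "Yes, the statement is true.",
    False: "No, the statement is false.",
}
anchor_embeddings = model.encode(list(ANCHORS.values()), convert_to_tensor=True)

def extract_bool(sentence: str) -> bool:
    """Map GPT's free-form sentence to True/False by nearest anchor."""
    emb = model.encode(sentence, convert_to_tensor=True)
    scores = util.cos_sim(emb, anchor_embeddings)[0]
    return list(ANCHORS.keys())[int(scores.argmax())]

print(extract_bool("The claim is correct: the function does return None."))  # True
```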
Have you tried just getting it to do both? It reasons far better given some space to think, so I often have it explain things first and then give the answer. You're effectively using GPT for the extraction step too.
This hugely improved the class hierarchies it was creating for me: classes were reused far more, and it picked better classes for fields too.
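A rough sketch of that "explain first, then answer" pattern, using the OpenAI Python client. The prompt wording, the `ANSWER:` convention, and the model name are just illustrative, not my exact setup:

```python
# Ask for a short explanation followed by a final "ANSWER: true/false"
# line, then extract the answer from the same response.
import re
from openai import OpenAI

client = OpenAI()

def classify(statement: str) -> bool | None:
    prompt = (
        "In one or two sentences, explain whether the statement below is "
        "true or false, then finish with a line of the form "
        "'ANSWER: true' or 'ANSWER: false'.\n\n"
        f"Statement: {statement}"
    )
    resp = client.chat.completions.create(
        model="gpt-4o-mini",  # example model name
        messages=[{"role": "user", "content": prompt}],
    )
    text = resp.choices[0].message.content
    match = re.search(r"ANSWER:\s*(true|false)", text, re.IGNORECASE)
    return match.group(1).lower() == "true" if match else None
```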
There's a benefit in having a model that can output only true/false if that's all that's acceptable, but if I were doing this myself I'd want to see how far I could get with just one model (plus the simple dev approach of rerunning it if it fails to produce a valid answer, or feeding the error message back). If it works 99% of the time, you can get away with rerunning pretty cheaply.
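The rerun-on-failure idea is just a small loop around something like the hypothetical `classify()` helper sketched above, which returns `None` when no valid answer can be extracted; the retry count here is arbitrary:

```python
# Retry a few times when the model fails to produce a parsable answer.
def classify_with_retries(statement: str, attempts: int = 3) -> bool:
    for _ in range(attempts):
        result = classify(statement)
        if result is not None:  # a valid true/false was extracted
            return result
    raise ValueError("No valid true/false answer after retries")
```

A variant is to keep the conversation and append the parse error as a follow-up message instead of starting fresh, which sometimes nudges the model into the required format.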