There's a benefit in having a model that can output only true/false if that's all that's acceptable, but if I was doing this myself I'd want to see how far I could get with just one model (and then the simple dev approach of running it again if it fails to produce a valid answer, or feeding it back with the error message). If it works 99% of the time you can get away with rerunning pretty cheaply.