Ironically, one of its few comparisons was FlashFill in Excel, which has itself been moving toward NNs, and now LLMs, for years. The paper largely skips measured, non-superficial comparisons, so I wouldn't draw comparative conclusions from their writeup.
RE Coq/Z3 vs GPUs: I think of them as different kinds of things, and thus wouldn't compare them directly. A solver like Z3 encapsulates search tricks for tasks where brute-force approaches would effectively run forever. Those tricks are essential: effective GPU acceleration requires finding a clean data-parallel formulation, and if a more naive algorithm is picked for GPU friendliness at the expense of inflating the workload asymptotically, the GPU implementation can have high utilization and big speedups over a CPU version... but still be (much) worse than Z3. So rather than comparing them as inherently different approaches, the question is whether the practically relevant core of Z3 can be GPU accelerated while preserving its overall search strategies.
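A toy illustration of that asymptotic point (not Z3's actual algorithms; the tiny CNF instance below is hypothetical, chosen so pruning is dramatic): an embarrassingly data-parallel brute force can rack up great utilization while doing exponentially more work than a sequential search that prunes.

```python
from itertools import product

n = 16
# Chain of implications x0, x0 -> x1, ..., plus ¬x_{n-1}: UNSAT.
clauses = [[1]] + [[-i, i + 1] for i in range(1, n)] + [[-n]]

def lit_true(lit, assign):
    return (lit > 0) == assign[abs(lit) - 1]

# "GPU-friendly" formulation: evaluate every clause under every one of
# the 2^n assignments. Perfectly data parallel, but exponential work.
def brute_force():
    checks = 0
    for assign in product([False, True], repeat=n):
        checks += 1
        if all(any(lit_true(l, assign) for l in c) for c in clauses):
            return True, checks
    return False, checks

# Sequential backtracking that abandons a branch as soon as some clause
# is fully falsified -- the flavor of pruning solvers rely on (unit
# propagation, clause learning, etc. go much further still).
def backtrack(assign=()):
    def falsified(c):
        return all(abs(l) <= len(assign) and not lit_true(l, assign)
                   for l in c)
    if any(falsified(c) for c in clauses):
        return False, 1          # conflict: prune this whole subtree
    if len(assign) == n:
        return True, 1
    total = 1
    for v in (False, True):
        sat, nodes = backtrack(assign + (v,))
        total += nodes
        if sat:
            return True, total
    return False, total

print(brute_force())  # (False, 65536): every assignment touched
print(backtrack())    # (False, 33): pruning visits only ~2n nodes
```

Both correctly report UNSAT, but the parallel-friendly version does 2^n checks where the pruning version does a linear number, which is the gap no amount of GPU utilization closes.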
Which takes us back to the broader point of this thread, GPUs for synthesis... LLMs are already successfully GPU accelerating program synthesis of much more than regexes. It's amazing that half of GitHub code is now written with them! Closer to this paper, folks are making inroads on GPU accelerating SAT solving, which underlies the SMT solvers that the more powerful (classical) program synthesizers use, e.g., https://github.com/muhos/ParaFROST .
That's a fair yet narrow request, more in the spirit of POPL than PLDI. As a reference point, I believe StarCoder does worse than GPT-4, and its team has over time published results on program synthesis tasks that subsume regexes. The FlashFill team has academic PL roots and would also be a good comparison, as I mentioned.
Given how easy prompt engineering is, and how easy naive search and refinement over prompts are too, it seems strange to relegate these general solvers to a brush-off related work section rather than comparing against them on benchmarks.
Calls to Coq and Z3 will be very slow and not competitive with GPU compute.