Human-Oriented Automatic Theorem Proving (wtgowers.github.io)
42 points by rck on Dec 29, 2022 | 17 comments


What I would want to see is a project for automatically converting the math literature to machine-verifiable proofs. It would have to translate natural language to formal notation, as well as fill in the gaps that are left in any math paper.

This is a difficult problem, but if solved it would enable every trick used in every published proof to be internalized into the theorem proving tooling.


Something more likely to be achievable is (semi-automatically) converting the literature to formal proof sketches, leaving some steps unfilled. Then the option of whether to accept such proofs as they are or try to make them totally rigorous can just be addressed on a case-by-case basis.

Unfortunately, modern proof assistants like Lean and Coq make this a bit harder than it could be because they're based on procedural proof scripts or complete proof terms, and a proof sketch is closer to a partial declarative proof. Since declarative proofs write down every intermediate goal, the notion of skipping some steps arises very naturally.
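For example (a hedged Lean 4 sketch, not taken from the article): in a declarative style each intermediate goal is written out with `have`, so leaving a step unfilled is as simple as replacing its justification with `sorry`:

```lean
-- Each intermediate goal is stated explicitly; `sorry` marks a step
-- deliberately left unfilled, much like a gap in a proof sketch.
example (a b : Nat) (h : a ≤ b) : a ≤ b + 1 := by
  have step1 : b ≤ b + 1 := Nat.le_succ b
  have step2 : a ≤ b + 1 := sorry  -- gap: follows from h and step1
  exact step2
```

Lean will accept this sketch (with a warning), so partially filled proofs can at least be recorded and type-checked around the gaps.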


I wondered what this could be about.

After all Interactive Theorem Proving is well established and there are great success stories (e.g. Lean Mathlib).

Who needs automatic theorem proving that mimics interactive theorem proving?

After reading the blog posts on the site:

That endeavour is really about understanding what mathematicians do and replacing them with an AI.


I'm familiar with a different kind of automated theorem proving than the one the article is about: resolution-based theorem proving. I am still very uncertain about the differences between resolution-based theorem proving and the kind of theorem proving done in proof assistants, so there is much I don't understand on the "About" page and other pages on that site (and from previous reading about proof assistants).

For example, this bit from https://wtgowers.github.io/human-style-atp/motivatedproofs.h...:

>> One potential approach to making precise the notion of what I shall call here a rabbit-free proof is to define a set of "move types" -- that is, methods of transforming a problem -- and defining a motivated proof to be one that can be generated using these move types. The idea is then to design the move types in such a way as to make it impossible to generate rabbits.

I think I misunderstand what "move types" are (is that the same as "proof steps" in other contexts? I don't understand those, either).

If I think about "transformations of a problem", with my mind stuck on resolution, all I can think of is transforming an initial formula to a new formula during a proof, by the application of one or more inference rules. But in resolution theorem proving, this kind of transformation can be achieved by successive applications of a single inference rule, the resolution rule. Briefly: given two clauses p ∨ q and ¬p ∨ r, resolution eliminates the complementary pair p and ¬p and derives the new clause q ∨ r, and the process continues until the empty clause is derived, at which point the clause set (the premises together with the negated conjecture) has been shown unsatisfiable, allowing a proof by refutation.

SLD-resolution, restricted to goal clauses and definite clauses, is sound, refutation-complete, and semi-decidable, which is pretty much the best that can be done within a formal system. But the important thing is that a single inference rule, resolution, is doing all this work, so the proof is straightforward to automate: even a dumb computer can just keep applying resolution steps until the empty clause falls out.

So I don't understand why choosing "move types" is an open problem, and why resolution can't be used to solve it. This is probably a result of my confusion and misunderstanding of what the other kind of automated theorem proving is all about in the first place.
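The single-rule character of resolution is easy to see in code. Here's a minimal propositional sketch (my own toy encoding: clauses as frozensets of literal strings, "~p" for ¬p), just to illustrate the refutation loop:

```python
# Minimal sketch of propositional resolution. Clauses are frozensets of
# literals; a positive literal is "p", its negation "~p".
def negate(lit):
    return lit[1:] if lit.startswith("~") else "~" + lit

def resolve(c1, c2):
    """All resolvents of two clauses, one per complementary pair."""
    out = []
    for lit in c1:
        if negate(lit) in c2:
            out.append(frozenset((c1 - {lit}) | (c2 - {negate(lit)})))
    return out

def refute(clauses):
    """Saturate under resolution; True iff the empty clause is derived."""
    clauses = set(clauses)
    while True:
        new = set()
        for c1 in clauses:
            for c2 in clauses:
                for r in resolve(c1, c2):
                    if not r:          # empty clause: contradiction found
                        return True
                    new.add(r)
        if new <= clauses:             # saturated, no empty clause
            return False
        clauses |= new

# p ∨ q, ¬p ∨ r, ¬q, ¬r is unsatisfiable:
print(refute([frozenset({"p", "q"}), frozenset({"~p", "r"}),
              frozenset({"~q"}), frozenset({"~r"})]))  # True
```

One rule, applied blindly until saturation or the empty clause: that's the whole prover.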

I'm not trying to be smug. I really don't know what this is all about. Can anyone please explain? I'm also probably missing a lot of the necessary background on this.


Resolution proof does a lot of backtracking, and ends up with "unintuitive" proofs as a result. One thing that's clear enough about this project is that it's trying to avoid general search as much as practically possible.


Resolution does a lot of backtracking, but it can be ignored if what is required is understanding how the proof proceeded. When the proof terminates (well, if it does :) you can look at the completed proof tree and find a justification of the proof without having to follow the proof while it backtracks in real time.

What is "general search" and why should it be avoided? I think maybe the problem is that the mechanical application of a single transformation is not justification enough for its own outcome because it's ... too simple? Or too abstract? Like defining all of arithmetic by starting from a counting axiom? Do I misunderstand?


> When the proof terminates (well, if it does :) you can look at the completed proof tree and find a justification of the proof

This is of course correct as far as it goes, but it's not what this project means by "motivation" of a proof. Their concept of "motivation" is a lot closer to "what would your reasoning process be if you were trying to derive the proof yourself". In particular, this requires avoiding needless brute-force searches. (It's probably the case that some statements can only be proven via brute-force, but if so they're of course out of scope for a project aiming for "motivated" proofs!)

The real open question AFAICT is whether the informal examples given in this project of a "motivated proof" are actually also motivated in a formal sense. It may be that statements that can be formally shown to be "easy" to derive are limited to things we would already regard as so trivial as to be totally uninteresting.

(For instance, statements of syllogistic logic are known to be very easy to prove; i.e. the computational complexity of finding such proofs is quite low, so they can be usefully applied even as one's knowledge base of "background" facts grows very large. But such proofs are also not very interesting!)


OK, just as a check to see if I'm following this thread: we have a theorem, wlog A implies B, and we'd like a proof.

A motivated proof would say A implies foo, and foo implies bar, and bar implies B, where foo and bar are small expressions using concepts common to the theory we're working with.

A searched proof would say A implies broguefleagle (so broguefleagles exist) and broguefleagle implies B, where broguefleagle is some odd expression that doesn't seem to occur anywhere else within the practice of this theory (a "rabbit out of a hat")?

What have I gotten wrong?


This is what the project seems to give as their preferred examples, yes. (Though they also seem to think of this in terms of allowed "moves", which I think are heuristically-picked rules of inference? So they're sort of playing with something that's almost but not quite entirely unlike the usual notion of a restricted logic, chosen for computational tractability.) Of course my own view is that the more useful notion, in a general sense, is simply "a proof that's easy enough to find, not just check after the fact", which might not be exactly the same.


>> This is of course correct as far as it goes, but it's not what this project means by "motivation" of a proof. Their concept of "motivation" is a lot closer to "what would your reasoning process be if you were trying to derive the proof yourself". In particular, this requires avoiding needless brute-force searches. (It's probably the case that some statements can only be proven via brute-force, but if so they're of course out of scope for a project aiming for "motivated" proofs!)

Hm, I don't know if I would characterise the search in SLD-Resolution (or anyway its implementation in Prolog) as a "brute force" search. To clarify, search in SLD-Resolution is used to find clauses whose heads unify with a goal, not to find proofs or parts of proofs. It is also naturally implemented as a depth-first search because that happens to match well with the selection rule (basically, "always go left"). But Resolution in effect comes with its own "pruning" in that as the proof proceeds the remaining goals are resolved away, and so the search space shrinks. There is also SLG-Resolution, which tables (memoises) intermediate answers and so avoids looping on left-recursion.

So the search is not brute force and dumb, like in searching a game tree. I'd say brute force in the context of finding a proof would be more along the lines of "here's a set of possible proofs, find one that works". But maybe you mean the backtracking, when you're talking about "brute force"?

Anyway I think I understand what you mean about human-like proofs, essentially. I think that's a completely different goal and motivation than actually finding actual proofs. In a sense, if I can draw an analogy to gaming again, it would be like trying to find a theory of chess, or go, that would work as a closed-form solution, instead of having to minimax and MCTS all the game trees. It's a more classic AI motivation: how do humans think (in this case, how do they prove things?) and why?

That's a great kind of motivation, if I'm right, and I fully understand its value. On the other hand, I think there's also great value in being able to mechanise the "boring" parts of a proof (and they are boring, if they can be automated away!) and letting the human brain do what it does best, which is express its ideas in a formal language. Resolution theorem-proving is great for this.

Thanks for the explanation. I think I get it a bit more now :)


SLD-Resolution is not the same as Resolution in a general sense. Prolog works with Horn clauses, and satisfiability for any conjunction of Horn clauses can be solved in polynomial time. Thus, in a sense, you're right that such proofs do not involve computationally hard "brute force".
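The polynomial-time claim can be illustrated with unit propagation (forward chaining), sketched here under an assumed encoding: a Horn clause is a (body, head) pair, with head None for clauses with no positive literal:

```python
# HORN-SAT by forward chaining: repeatedly fire clauses whose bodies
# are satisfied. Each pass either adds an atom or terminates, so the
# whole loop is polynomial in the size of the clause set.
def horn_sat(clauses):
    true_atoms = set()
    changed = True
    while changed:
        changed = False
        for body, head in clauses:
            if set(body) <= true_atoms:
                if head is None:
                    return False       # a purely negative clause fires
                if head not in true_atoms:
                    true_atoms.add(head)
                    changed = True
    return True                        # fixpoint reached: satisfiable

# p; p -> q; q -> r; plus the goal clause ¬q ∨ ¬r:
print(horn_sat([([], "p"), (["p"], "q"), (["q"], "r"),
                (["q", "r"], None)]))       # False
print(horn_sat([([], "p"), (["p"], "q")]))  # True
```

No backtracking is ever needed: the set of derived atoms only grows, which is exactly why Horn clauses are computationally tame.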


That's right! Although, Prolog is not really-really SLD-Resolution (or SLDNF-Resolution). It's all a dirty lie :)

(I mean that unlike SLD-Resolution, Prolog cuts corners because of pragmatic implementation choices made to end up with a usable programming language: e.g. it omits the so-called "occurs check" in unification, which costs soundness, and its depth-first search costs completeness. It's a long story. I'd have to check the details myself).
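For illustration, here is a minimal unification sketch with the occurs check included (assumed encoding, not Prolog's: capitalised strings are variables, tuples ("f", args...) are compound terms). Standard Prolog skips the check for speed, so X = f(X) would "succeed" there, producing a cyclic term:

```python
# First-order unification with an occurs check.
def is_var(t):
    return isinstance(t, str) and t[0].isupper()

def walk(t, subst):
    while is_var(t) and t in subst:
        t = subst[t]
    return t

def occurs(v, t, subst):
    t = walk(t, subst)
    if t == v:
        return True
    if isinstance(t, tuple):
        return any(occurs(v, a, subst) for a in t[1:])
    return False

def unify(t1, t2, subst):
    t1, t2 = walk(t1, subst), walk(t2, subst)
    if t1 == t2:
        return subst
    if is_var(t1):
        if occurs(t1, t2, subst):   # reject X = f(X): no finite unifier
            return None
        return {**subst, t1: t2}
    if is_var(t2):
        return unify(t2, t1, subst)
    if (isinstance(t1, tuple) and isinstance(t2, tuple)
            and t1[0] == t2[0] and len(t1) == len(t2)):
        for a, b in zip(t1[1:], t2[1:]):
            subst = unify(a, b, subst)
            if subst is None:
                return None
        return subst
    return None

print(unify("X", ("f", "X"), {}))         # None: occurs check fails
print(unify(("f", "X"), ("f", "a"), {}))  # {'X': 'a'}
```

(SWI-Prolog does offer `unify_with_occurs_check/2` when you want the sound behaviour.)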


Personally, I still stand by my comment on the earlier thread - the problem of defining a "motivated" proof must boil down to a proof in some well-defined logic (which could of course be a logic fragment) where finding proofs is inherently easy. It looks like this project hasn't been engaging all that much with that well-known line of research, seemingly choosing instead to focus on a variety of heuristic questions. I'm not sure what to make of that.


Consider the logic just as a sheet of paper. Sure, you DO need SOME sheet of paper to write down your proof, but that's about it.


Huh, I'd've expected that GOFM (Good Old Fashioned Maths) would've been all about figuring out some convenient theories where finding proofs is inherently easy. What's the gap between that and maths as she is done?


You may be right - this recent presentation on Complementary foundations for mathematics https://news.ycombinator.com/item?id=34197169 seems to be following a similar POV.




