Formal Abstracts – a Vision

By thales

The Formal Abstracts (FABSTRACTS) project will establish a formal abstract service that will express the results of mathematical publications in a computer-readable form that captures the semantic content of publications.

Specifically, the service will

  • give a statement of the main theorem of each published mathematical paper in a language that is both human and machine readable.
  • link each term in theorem statements to a precise definition of that term (again in human/machine readable form), and
  • ground every statement and definition in the system in some foundational system for doing mathematics (the Lean theorem prover).

What is the long-term vision for Formal Abstracts (FABSTRACTS) in mathematics?

FABSTRACTS enables machine learning in math

Many mathematicians dream that computers will someday fundamentally transform how mathematics is practiced, and FABSTRACTS expresses

in concrete terms how that transformation might be achieved.

Automation of mathematics has been a major aim of artificial intelligence from its inception. This broad aim includes all aspects of mathematical activity such as conjecturing, generating examples, finding counterexamples, computing, learning, collaborating, proving, generalizing, refereeing and checking,

searching, exploring, and transforming the literature.

For many of these activities, it is important or even essential to have a machine-readable semantic representation of mathematical

objects. This is achieved by FABSTRACTS.

We foresee tools for exploration such as a Google Earth for mathematics, providing an intuitive visual map of the entire world of mathematics, combining formal abstracts, computation, and other

networked content — all displayed at user-selected resolution.

It gets easier

Just as the FABSTRACTS project will enable AI in mathematics, the
development of AI will enable FABSTRACTS.

Looking forward beyond the challenging initial groundwork to get FABSTRACTS up and running, the generation of formal abstracts will become nearly fully automated. The production of formal abstracts

will become easier and more efficient with every passing year.

Tools for the natural language processing of mathematics are under development. As these tools develop, automation will replace the manual preparation of formal abstracts. Automation will make it feasible to construct hundreds of thousands (and eventually millions)

of formal abstracts.

For example, Kofler and Neumaier have developed a powerful parser for common mathematical language, and Neumaier is currently rewriting some of his published math books in multilingual editions so as to be parsed and translated by their software.  In other promising work, Ganesalingam’s thesis gave a brilliant linguistic analysis of the language of mathematics. The current Naproche (Natural language Proof Checking) system accepts a rich subset of mathematical language

including TeX-style formulas.

We expect mathematicians to be able to supply their own formal abstracts
in a structured English format, which will be parsed and translated to the Lean language.

We foresee future cooperation between FABSTRACTS and a major math abstract service (such as zbMATH or MathSciNet). The abstract service will provide statements of theorems from publications in a form suited for fast formal

abstraction, and with FABSTRACTS offering enhancements to those services.

Progress in document analysis will eventually permit the automated extraction of formal abstracts directly from the documents (building

on the work of M. Suzuki and collaborators at Kyushu University).

FABSTRACTS as a permanent service

FABSTRACTS will become integrated with the workflow of
several tens of thousands of mathematicians.

We foresee a TeX package that allows mathematicians to cite formal definitions and theorem statements for the convenience of readers, and

for the automatic processing of texts:

Let $M$ be a \formal{banach manifold}[fabstracts.org/10.1109/5.771073]. %TeX

We foresee further integration of FABSTRACTS with computation (in the style of the current SageTeX package, which performs SageMath

computation and graphics from within a TeX document).

We foresee the translation of formal abstracts into many languages
and formats. Lewis has constructed a two-way bridge between Mathematica and Lean. Computer programs already exist that give automated translations between some of the major proof assistants. The work of Kohlhase, Rabe, and their coworkers develops general infrastructure for the translation of formal mathematics between different systems. The point is that before we translate, we first need any formal representation,

and that is achieved by FABSTRACTS.

There are many expected applications, including training sets for machine learning, enhancement of automated theorem proving technologies, machine translation of mathematical texts into foreign languages, detecting errors in the mathematical literature, and intelligent mathematical search

tools.

Finally, we foresee that FABSTRACTS will inspire a broader movement for the increased precision in science, with mathematicians taking up the challenge to formalize the proofs of formal abstracts, and scientists finding precise ways to give formal expression to the

objects of their research.