DeepMind says its AI performs better than International Mathematical Olympiad gold medalists

An AI system developed by Google DeepMind, Google's flagship AI research lab, appears to have surpassed the average gold medalist at solving geometry problems in an international mathematics competition.

The system, called AlphaGeometry2, is an improved version of a system, AlphaGeometry, that DeepMind released last January. In a newly published study, the DeepMind researchers behind AlphaGeometry2 claim their AI can solve 84% of all geometry problems from the last 25 years of the International Mathematical Olympiad (IMO), a math competition for high school students.

Why does DeepMind care about a high school-level math competition? Well, the lab thinks the key to more capable AI might lie in discovering new ways to solve challenging geometry problems, specifically Euclidean geometry problems.

Proving mathematical theorems, or logically explaining why a theorem (for example, the Pythagorean theorem) is true, requires both reasoning and the ability to choose from a range of possible steps toward a solution. Those problem-solving skills could, if DeepMind is right, turn out to be a useful component of future general-purpose AI models.

Indeed, last summer, DeepMind demonstrated a system that combined AlphaGeometry2 with AlphaProof, an AI model for formal mathematical reasoning, to solve four out of six problems from the 2024 IMO. Beyond geometry problems, approaches like these could be extended to other areas of math and science, for example, to assist with complex engineering calculations.

AlphaGeometry2 has several core elements, including a language model from Google's Gemini family of AI models and a "symbolic engine." The Gemini model helps the symbolic engine, which uses mathematical rules to infer solutions to problems, arrive at feasible proofs for a given geometry theorem.

A diagram of a typical geometry problem from an IMO exam. Image credits: Google

Olympiad geometry problems are based on diagrams that need "constructs" to be added before they can be solved, such as points, lines, or circles. AlphaGeometry2's Gemini model predicts which constructs might be useful to add to a diagram, which the engine references to make deductions.

Essentially, AlphaGeometry2's Gemini model suggests steps and constructs in a formal mathematical language to the engine, which, following specific rules, checks those steps for logical consistency. A search algorithm allows AlphaGeometry2 to conduct multiple searches for solutions in parallel and store possibly useful findings in a common knowledge base.

AlphaGeometry2 considers a problem "solved" when it arrives at a proof that combines the Gemini model's suggestions with the symbolic engine's known principles.
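The propose-and-verify loop described above can be sketched in simplified form. The code below is an illustrative toy, not DeepMind's actual system: the single deduction rule (transitivity of parallel lines), the fact encoding, and all function names are assumptions made for the sake of the example, and the "model" is a stub that always suggests the same construct.

```python
# Toy sketch of a propose-and-verify loop in the style of AlphaGeometry2.
# Everything here is a hypothetical simplification, not DeepMind's API.

def deduce(facts: set[str]) -> set[str]:
    """A tiny 'symbolic engine': derive new facts from known facts by
    repeatedly applying one rule until nothing changes (a fixed point)."""
    derived = set(facts)
    changed = True
    while changed:
        changed = False
        # Toy rule: parallelism is transitive (a||b and b||c => a||c).
        for f1 in list(derived):
            for f2 in list(derived):
                if f1.startswith("para(") and f2.startswith("para("):
                    a, b = f1[5:-1].split(",")
                    c, d = f2[5:-1].split(",")
                    if b == c and a != d:
                        new_fact = f"para({a},{d})"
                        if new_fact not in derived:
                            derived.add(new_fact)
                            changed = True
    return derived

def propose_constructs(facts: set[str]) -> list[str]:
    """Stand-in for the Gemini model: in the real system, a language model
    predicts which auxiliary constructs to add to the diagram."""
    return ["para(l2,l3)"]  # e.g. "add a line l3 parallel to l2"

def solve(problem_facts: set[str], goal: str, max_rounds: int = 5) -> bool:
    """Alternate deduction with construct proposals until the goal is proved."""
    facts = set(problem_facts)
    for _ in range(max_rounds):
        facts = deduce(facts)
        if goal in facts:
            return True  # proof found: goal follows from facts + constructs
        facts.update(propose_constructs(facts))  # ask the "model" for help
    return goal in deduce(facts)

# Usage: l1 || l3 only becomes provable from l1 || l2 once the model
# suggests adding the construct l2 || l3.
print(solve({"para(l1,l2)"}, "para(l1,l3)"))  # True
```

The key design point the sketch illustrates: the language model only proposes; the rule-based engine alone decides what counts as proved, so hallucinated suggestions cannot produce an invalid proof.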

Owing to the complexity of translating proofs into a format AI can understand, there is a scarcity of usable geometry training data. So DeepMind created its own synthetic data to train AlphaGeometry2's language model, generating over 300 million theorems and proofs of varying complexity.

The DeepMind team selected 45 geometry problems from IMO competitions over the past 25 years (from 2000 to 2024), including linear equations and equations that require moving geometric objects around a plane. They then "translated" these into a larger set of 50 problems. (For technical reasons, some problems had to be split in two.)

According to the paper, AlphaGeometry2 solved 42 of the 50 problems, beating the average gold medalist score of 40.9.

Granted, there are limitations. A technical quirk prevents AlphaGeometry2 from solving problems with a variable number of points, nonlinear equations, and inequalities. And AlphaGeometry2 isn't technically the first AI system to reach gold-medal-level performance in geometry, although it is the first to achieve it with a problem set of this size.

AlphaGeometry2 also performed worse on another set of IMO problems. For an added challenge, the DeepMind team selected problems, 29 in total, that had been nominated for IMO exams by math experts but have not yet appeared in a competition. AlphaGeometry2 could only solve 20 of these.

Still, the study's results are likely to fuel the debate over whether AI systems should be built on symbol manipulation, that is, manipulating symbols that represent knowledge using rules, or on the ostensibly more brain-like neural networks.

AlphaGeometry2 takes a hybrid approach: its Gemini model has a neural network architecture, while its symbolic engine is rules-based.

Proponents of neural network techniques argue that intelligent behavior, from speech recognition to image generation, can emerge from nothing more than massive amounts of data and computing power. In contrast to symbolic systems, which solve tasks by defining sets of symbol-manipulating rules dedicated to particular jobs, like editing a line in word processor software, neural networks try to solve tasks through statistical approximation and learning from examples.

Neural networks are the cornerstone of powerful AI systems like OpenAI's o1 "reasoning" model. But, claim supporters of symbolic AI, they're not the be-all and end-all; symbolic AI might be better positioned to efficiently encode the world's knowledge, reason through complex scenarios, and "explain" how it arrived at an answer.

"It is striking to see the contrast between continued, spectacular progress on these kinds of benchmarks, and meanwhile, language models, including more recent ones with 'reasoning,' continuing to struggle with some simple commonsense problems," Vince Conitzer, a Carnegie Mellon University computer science professor specializing in AI, told TechCrunch. "I don't think it's all smoke and mirrors, but it illustrates that we still don't really know what behavior to expect from the next system. These systems are likely to be very impactful, so we urgently need to understand them and the risks they pose a lot better."

AlphaGeometry2 perhaps demonstrates that the two approaches, symbol manipulation and neural networks, combined are a promising path forward in the search for generalizable AI. Indeed, according to the DeepMind paper, o1, which also has a neural network architecture, couldn't solve any of the IMO problems that AlphaGeometry2 was able to answer.

That may not be the case forever, though. In the paper, the DeepMind team said it found preliminary evidence that AlphaGeometry2's language model was capable of generating partial solutions to problems without the help of the symbolic engine.

"[The] results support ideas that large language models can be self-sufficient without depending on external tools [like symbolic engines]," the DeepMind team wrote in the paper, "but until [model] speed is improved and hallucinations are completely resolved, the tools will stay essential for math applications."

