OpenAI introduces o3, a family of generative models with reasoning capabilities
December 26, 2024

OpenAI held a product launch event shortly before Christmas at which it demonstrated o3, a new generation of reasoning models that succeeds the previous o1.

The family consists of two models: the full-featured o3 and the compact o3-mini for less complex tasks.
As a reasoning model, o3 can check its own answers before responding, which improves the accuracy and quality of the information it returns. The trade-off is speed: this self-checking step makes reasoning models slower than standard ones, and depending on the complexity of the request, o3’s response latency can range from a few seconds to a few minutes.

As described by TechCrunch, OpenAI uses a “private chain of thought” to train o3 to “think before responding.” The model can reason about a given problem and plan its response in advance, performing a series of intermediate steps that help it arrive at a solution. In practice, after you enter a query, o3 pauses, works through a series of relevant intermediate conclusions while “explaining” its reasoning along the way, then selects and assembles the information it considers most accurate for the situation and returns it as the answer.
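The chain of thought itself stays hidden, but its cost shows up in API usage. Below is a minimal sketch of how this might look, assuming o3 is exposed through the OpenAI Python SDK the same way o1 is today; the model name "o3-mini" and the reasoning-token usage field are assumptions based on how current o-series models are reported, not something confirmed for this preview.

```python
# Sketch: query a reasoning model and inspect how many hidden "thinking"
# tokens it consumed before producing the visible answer.
# Assumes the OpenAI Python SDK (pip install openai) and that the compact
# model is available under the name "o3-mini" once released (assumption).
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="o3-mini",  # hypothetical model name for the compact variant
    messages=[{"role": "user", "content": "How many primes are there below 100?"}],
)

print(response.choices[0].message.content)

# The chain of thought is private, but its size is reported in usage,
# as it is today for o1-series models (assumption for o3):
details = response.usage.completion_tokens_details
print("reasoning tokens:", details.reasoning_tokens)
print("visible answer tokens:",
      response.usage.completion_tokens - details.reasoning_tokens)
```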

OpenAI also reports using a new training technique called “deliberative alignment,” which teaches the model to check whether its answers comply with the company’s safety policies.

Unlike previous versions, o3 lets you regulate how much compute the model spends on reasoning at inference time. You can choose a low, medium or high reasoning effort level: the higher the level, the better o3 performs on the task, at the cost of extra latency.
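A minimal sketch of how this could look through the API, assuming the setting is exposed as a `reasoning_effort` parameter with the values "low", "medium" and "high", as OpenAI later documented for o-series models; the exact parameter name and values for the o3 preview are assumptions.

```python
# Sketch: request different reasoning effort levels for the same prompt.
# Assumes the OpenAI Python SDK and a reasoning_effort parameter accepting
# "low", "medium" or "high" (assumption for the o3 preview).
from openai import OpenAI

client = OpenAI()

prompt = "Plan a 3-step strategy to debug a flaky integration test."

for effort in ("low", "medium", "high"):
    response = client.chat.completions.create(
        model="o3-mini",            # hypothetical model name
        reasoning_effort=effort,    # more effort = more hidden reasoning, more latency
        messages=[{"role": "user", "content": prompt}],
    )
    print(f"--- effort={effort} ---")
    print(response.choices[0].message.content)
```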

According to the company, o3:

  • Scores 22.8 percentage points higher than its predecessor o1 on SWE-Bench Verified, a benchmark that evaluates how well models solve real-world programming tasks;
  • Reaches a rating of 2727 on the Codeforces competitive programming platform;
  • Scores 96.7% on the AIME mathematics test;
  • Scores 87.7% on GPQA, a test of graduate-level biology, physics and chemistry questions;
  • Sets a record on the EpochAI Frontier Math benchmark by solving 25.2% of its problems.

o3 is not yet publicly available; it is currently offered only as a research preview. A full release is expected next year.
