
OpenAI teases new reasoning model—but don’t expect to try it soon
On the last day of “Ship-mas,” OpenAI previewed a new set of cutting-edge reasoning models called o3 and o3-mini. The Verge first reported that a new reasoning model would debut during the event.
The company won’t release the models today (and acknowledges that final results may change with more post-training). However, OpenAI is accepting applications from the research community to test these systems ahead of a public release, for which no date has been set. OpenAI launched o1 (codenamed Strawberry) in September and is jumping straight to o3, skipping o2 to avoid confusion (or a trademark conflict) with the British telecommunications company O2.
The term “reasoning” has become a common buzzword in the AI industry lately, but it basically means that the machine breaks instructions down into smaller tasks that can produce stronger results. These models typically show how they worked toward an answer, rather than just giving a final answer without explanation.
According to the company, o3 surpassed its previous performance records across the board. It beat its predecessor by 22.8 percentage points on a coding test called SWE-Bench Verified and outscored OpenAI’s chief scientist on competitive programming. The model nearly aced one of the hardest math competitions, AIME 2024, missing just one question, and achieved an 87.7% score on GPQA Diamond, a benchmark of expert-level science questions. On the toughest math and reasoning challenges that typically stump AI, o3 solved 25.2% of the problems, where no other model has exceeded 2%.
The company also announced new research on deliberative alignment, which requires AI models to work through safety decisions step by step. Rather than simply giving the model yes/no rules, this paradigm requires it to actively reason about whether a user’s request complies with OpenAI’s safety policy. The company claims that when this approach was tested on o1, the model adhered to safety guidelines better than previous models, including GPT-4.
2024-12-20 18:55:59