Alibaba’s Qwen with Questions reasoning model beats o1-preview
December 2, 2024

Chinese e-commerce giant Alibaba has released the latest model in its ever-expanding Qwen family. Known as Qwen with Questions (QwQ), it is the latest open source competitor to OpenAI’s o1 reasoning model.

Like other large reasoning models (LRMs), QwQ uses extra compute cycles during inference to review its answers and correct its errors, making it better suited for tasks that require logical reasoning and planning, such as math and coding.
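As a rough illustration of this pattern (a sketch of the general LRM idea, not QwQ’s actual pipeline), the loop below drafts an answer, asks the model to critique it, and revises until the critique passes. The llm() helper is a hypothetical stand-in for any chat-model call.

# A minimal sketch of inference-time self-review, assuming a hypothetical
# llm(prompt) helper that returns a completion string. This illustrates the
# general LRM pattern, not QwQ's actual training or inference pipeline.
def answer_with_review(question: str, max_rounds: int = 3) -> str:
    draft = llm(f"Solve step by step:\n{question}")
    for _ in range(max_rounds):
        critique = llm(
            f"Question: {question}\nDraft answer:\n{draft}\n"
            "Check every step for errors. Reply 'OK' if the answer is "
            "correct; otherwise describe the mistake."
        )
        if critique.strip().upper().startswith("OK"):
            break  # the model judges its own answer to be correct
        draft = llm(
            f"Question: {question}\nPrevious attempt:\n{draft}\n"
            f"Identified issue: {critique}\nWrite a corrected answer."
        )
    return draft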

What is Qwen with Questions (QwQ), and can it be used for commercial purposes?

Alibaba has released a 32-billion-parameter version of QwQ with a 32,000-token context window. The model is currently in preview, which means a higher-performing version is likely to follow.

According to Alibaba’s tests, QwQ beats o1-preview on the AIME and MATH benchmarks, which evaluate mathematical problem-solving abilities. It also outperforms o1-mini on GPQA, a benchmark for scientific reasoning. QwQ trails o1 on the LiveCodeBench coding benchmarks but still outperforms other frontier models such as GPT-4o and Claude 3.5 Sonnet.

Image: an example of QwQ’s output.

QwQ does not come with an accompanying paper describing the data or the process used to train the model, which makes it difficult to reproduce its results. However, since the model is open, unlike OpenAI’s o1, its “thinking process” is not hidden and can be used to understand how the model reasons when solving problems.

Alibaba has also released the model under the Apache 2.0 license, which means it can be used for commercial purposes.

‘We discovered something profound’

According to a blog post published alongside the model’s release, “Through deep exploration and countless trials, we discovered something profound: when given time to ponder, to question, and to reflect, the model’s understanding of mathematics and programming blossoms like a flower opening to the sun… This process of careful reflection and self-questioning leads to remarkable breakthroughs in solving complex problems.”

This aligns with what we know about how reasoning models work. By generating more tokens and reviewing their previous responses, the models are more likely to correct potential errors. Marco-o1, another reasoning model recently released by Alibaba, may also hold hints about how QwQ works. Marco-o1 uses Monte Carlo Tree Search (MCTS) and self-reflection at inference time to create different branches of reasoning and choose the best answers. The model was trained on a mixture of chain-of-thought (CoT) examples and synthetic data generated with MCTS algorithms.
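As a rough illustration of that branching idea (a simplified sketch under stated assumptions, not Marco-o1’s or QwQ’s actual implementation), the code below expands several candidate reasoning paths, scores them, and keeps the best. Full MCTS adds selection and value backpropagation, which are omitted here; llm() and score() are hypothetical helpers.

# A simplified sketch of inference-time branching in the spirit of the
# MCTS-plus-self-reflection approach described for Marco-o1. Full MCTS adds
# UCB-based selection and value backpropagation, omitted here; llm() and
# score() are hypothetical helpers, not Alibaba's actual API.
def search_reasoning(question: str, branches: int = 4, depth: int = 3) -> str:
    paths = [""]  # each entry is a partial chain of thought
    for _ in range(depth):
        candidates = []
        for path in paths:
            for _ in range(branches):
                step = llm(
                    f"Question: {question}\nReasoning so far:\n{path}\n"
                    "Write the next reasoning step."
                )
                candidates.append(path + "\n" + step)
        # self-reflection: keep only the highest-scored partial paths
        candidates.sort(key=lambda p: score(question, p), reverse=True)
        paths = candidates[:branches]
    return llm(f"Question: {question}\nReasoning:\n{paths[0]}\nFinal answer:")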

Alibaba notes that QwQ still has limitations, such as mixing languages or getting stuck in circular reasoning loops. The model is available for download on Hugging Face, and an online demo can be found on Hugging Face Spaces.
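For readers who want to try the released weights locally, the snippet below is a minimal sketch using the Hugging Face transformers library. The repository id Qwen/QwQ-32B-Preview is an assumption drawn from the preview naming; check the model card for the exact id, and note that a 32-billion-parameter model requires substantial GPU memory or quantization.

from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Qwen/QwQ-32B-Preview"  # assumed repository id; confirm on the model card

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, device_map="auto", torch_dtype="auto"
)

messages = [{"role": "user", "content": "How many positive divisors does 360 have?"}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

# Reasoning models emit long chains of thought, so allow a generous token budget.
output_ids = model.generate(input_ids, max_new_tokens=2048)
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))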

The age of LLMs gives way to LRMs: large reasoning models

The release of o1 has sparked growing interest in creating LRMs, even though not much is known about how the model works under the hood, other than that it uses inference-time scaling to improve its responses.

o1 now has several Chinese competitors. Chinese AI lab DeepSeek recently released R1-Lite-Preview, its o1 competitor, which is currently available only through the company’s online chat interface. R1-Lite-Preview reportedly beats o1 on several key benchmarks.

Another recently released model is LLaVA-o1, developed by researchers from multiple universities in China, which brings the inference-time reasoning paradigm to open-source vision language models (VLMs).

The focus on LRMs comes at a time of uncertainty over the future of model scaling laws. Reports indicate that AI labs such as OpenAI, Google DeepMind, and Anthropic are seeing diminishing returns from training larger models. And generating more quality training data is becoming increasingly difficult, because models are already being trained on trillions of tokens gathered from the internet.

Meanwhile, inference-time scaling offers an alternative that could provide the next breakthrough in improving the capabilities of next-generation AI models. There are reports that OpenAI is using o1 to generate synthetic reasoning data to train the next generation of its LLMs. The release of open reasoning models is likely to spur progress and make the space more competitive.


