Chinese AI company says breakthroughs enabled creating a leading-edge AI model with 11X less compute — DeepSeek’s optimizations could highlight limits of US sanctions
December 28, 2024

DeepSeek, a Chinese artificial intelligence startup, says its latest model is comparable to leading models from heavyweights such as OpenAI, Meta, and Anthropic, yet was trained with roughly 11 times less GPU compute and cost. These claims have not yet been independently verified, but the announcement suggests that while US sanctions have constrained China's access to AI hardware, researchers there are working hard to extract maximum performance from the limited hardware available, blunting the impact of the chip restrictions. The company has open-sourced the model and its weights, so independent testing should be available soon.

DeepSeek trained DeepSeek-V3, a 671-billion-parameter Mixture-of-Experts (MoE) language model, in about two months on a cluster of 2,048 Nvidia H800 GPUs, totaling roughly 2.8 million GPU-hours, according to the company's technical paper. By comparison, Meta used 11 times that compute (30.8 million GPU-hours) to train the 405-billion-parameter Llama 3 over 54 days on a cluster of 16,384 H100 GPUs.
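The "11X" headline figure follows directly from the GPU-hour totals quoted above; a quick back-of-envelope check (using the article's rounded figures, not independently verified numbers):

```python
# Sanity-check the "11X less compute" claim from the quoted GPU-hour figures.
# Figures as reported above: DeepSeek-V3 on 2,048 H800s, Llama 3 405B on 16,384 H100s.
deepseek_v3_gpu_hours = 2.8e6    # ~2.8 million H800 GPU-hours
llama3_405b_gpu_hours = 30.8e6   # 30.8 million H100 GPU-hours

ratio = llama3_405b_gpu_hours / deepseek_v3_gpu_hours
print(f"Llama 3 405B used about {ratio:.1f}x the GPU-hours of DeepSeek-V3")

# Cross-check the implied training duration for DeepSeek-V3:
# GPU-hours / (GPUs * 24 h/day) gives wall-clock days on the cluster.
days = deepseek_v3_gpu_hours / (2048 * 24)
print(f"Implied training time: about {days:.0f} days")
```

The implied duration of roughly 57 days is consistent with the "about two months" figure reported above. Note the ratio compares raw GPU-hours across different GPU types (H800 vs. H100), so it is a rough measure rather than an exact compute comparison.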

