Elon Musk says all human data for AI training ‘exhausted’
January 11, 2025

AI companies have run out of data to train their models and have “exhausted” the sum of human knowledge, Elon Musk has said.

The world’s richest man suggested that technology companies will have to turn to “synthetic” data – material created by artificial intelligence models – to build and fine-tune new systems, a process already happening with the rapidly evolving technology.

“The total sum of human knowledge has been exhausted in the training of AI. Basically, this happened last year,” said Musk, who launched his own artificial intelligence business, xAI, in 2023.

Artificial intelligence models, such as the GPT-4o model that underpins the ChatGPT chatbot, are “trained” on vast amounts of data taken from the internet, in effect learning to identify patterns in that information, which allows them, for example, to predict the next word in a sentence.
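
To make the idea of next-word prediction concrete, here is a deliberately tiny sketch in Python. It is not the training code of any system mentioned above: real models learn the same objective with neural networks over vastly larger datasets, whereas this example simply counts, in an invented three-sentence corpus, which word most often follows which.

```python
# Toy illustration of next-word prediction: count which word tends to follow
# which in a tiny corpus, then predict the most frequent continuation.
# The corpus and function names are invented for this example; real LLMs
# learn the same objective with neural networks over far larger datasets.
from collections import Counter, defaultdict

corpus = (
    "the cat sat on the mat . "
    "the cat sat on the rug . "
    "the dog chased the cat ."
).split()

# Count how often each word follows each preceding word.
follow_counts = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    follow_counts[prev][nxt] += 1

def predict_next(word: str) -> str:
    """Return the continuation seen most often after `word` in the corpus."""
    candidates = follow_counts.get(word)
    return candidates.most_common(1)[0][0] if candidates else "<unknown>"

print(predict_next("the"))  # 'cat' - the word that most often follows 'the' here
print(predict_next("sat"))  # 'on'
```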

Speaking in an interview broadcast live on his social media platform X, Musk said the “only way” to deal with the lack of raw material to train new models is to move to synthetic data generated by AI.

Speaking about the exhaustion of data sources, he said, “The only way to supplement it is with synthetic data, where… they’ll sort of write an essay or come up with a dissertation, and then evaluate themselves and… go through this process of self-learning.”
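
The loop Musk describes looks, in rough outline, like the sketch below. This is an assumption-laden illustration rather than xAI’s (or anyone’s) actual pipeline: the functions draft_essay and self_grade are invented placeholders for a model generating text and scoring its own output, and the quality threshold is arbitrary.

```python
# Hedged sketch of the self-learning loop described above: a model drafts
# synthetic text, grades its own output, and only high-scoring samples are
# kept for the next round of fine-tuning. draft_essay and self_grade are
# invented stand-ins for a real model's generate/evaluate calls.
import random

def draft_essay(topic: str) -> str:
    # Placeholder for a model generating a synthetic essay on `topic`.
    return f"A synthetic essay about {topic}."

def self_grade(essay: str) -> float:
    # Placeholder for the model scoring its own output between 0.0 and 1.0.
    return random.random()

def build_synthetic_dataset(topics, threshold=0.7):
    """Keep only self-graded samples scoring at or above `threshold`."""
    kept = []
    for topic in topics:
        essay = draft_essay(topic)
        score = self_grade(essay)
        if score >= threshold:
            kept.append({"text": essay, "score": score})
    return kept

dataset = build_synthetic_dataset(["fusion power", "protein folding", "tokenisers"])
print(f"kept {len(dataset)} synthetic samples for the next fine-tuning round")
```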

Meta, the owner of Facebook and Instagram, has used synthetic data to fine-tune its largest AI model, Llama, while Microsoft has used AI-generated content for its Phi-4 model. Google and OpenAI, the maker of ChatGPT, have also used synthetic data in their AI work.

However, Musk also warned that AI models’ habit of generating “hallucinations” (a term for inaccurate or meaningless output) poses a danger to the synthetic data process.

In the live interview, alongside Mark Penn, chairman of the advertising group Stagwell, he said hallucinations made the process of using artificial material “difficult” because “how do you know if it was … a hallucinated response or if it is a real response”.

Andrew Duncan, director of foundational AI at the UK’s Alan Turing Institute, said Musk’s comment coincides with a recent academic paper that estimated publicly available data for artificial intelligence models could run out as early as 2026. He added that over-reliance on synthetic data risks “model collapse”, a term for the deterioration in the quality of model outputs.

“When you start feeding a model synthetic material, you start to get diminishing returns,” he said, with the risk that outputs become biased and lacking in creativity.
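
A small simulation gives a feel for why repeatedly training on self-generated data can erode quality. In the hedged sketch below, each “generation” is a Gaussian model fitted to a modest sample drawn from the previous generation’s model; over many rounds the estimated spread tends to shrink, a toy analogue of the loss of diversity associated with model collapse. The numbers are arbitrary and illustrate the statistical effect only, not any real training run.

```python
# Toy simulation of model collapse: each generation fits a Gaussian to a
# small sample drawn from the previous generation's model. Over repeated
# rounds the estimated spread tends to drift downwards, i.e. the "model"
# loses diversity. Sample sizes and the seed are arbitrary illustrative choices.
import random
import statistics

random.seed(0)
mean, spread = 0.0, 1.0   # generation 0: stands in for the real data distribution
sample_size = 20          # small samples make the effect easier to see

for generation in range(1, 31):
    # Generate "synthetic data" from the current model...
    synthetic = [random.gauss(mean, spread) for _ in range(sample_size)]
    # ...then fit the next generation's model to that synthetic data.
    mean = statistics.fmean(synthetic)
    spread = statistics.stdev(synthetic)
    if generation % 10 == 0:
        print(f"generation {generation:2d}: estimated spread = {spread:.3f}")
```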

Duncan added that the rise of AI-generated online content could also lead to this material being included in AI training datasets.

High-quality data, and control over it, is one of the legal battlegrounds of the artificial intelligence boom. Last year, OpenAI admitted it would be impossible to create tools such as ChatGPT without access to copyrighted material, while creative industries and publishers are demanding compensation for the use of their work in the model training process.
