Elon Musk said artificial intelligence companies have run out of data to train their models and have “exhausted” the totality of human knowledge.
The world's richest person suggested that technology companies will need to turn to “synthetic” data — material created by AI models themselves — to build and fine-tune new systems, a shift that is already under way in the rapidly developing technology.
“The reserves of human knowledge have been used up in AI training. That basically happened last year,” said Musk, who launched his own AI business, xAI, in 2023.
AI models such as GPT-4o, which powers the ChatGPT chatbot, are “trained” on vast amounts of data taken from the internet, where they learn to find patterns in that information — allowing them to predict, for example, the next word in a sentence.
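To make the “predict the next word” idea concrete, here is a toy sketch — purely illustrative, and vastly simpler than anything GPT-4o actually does — of learning word-following patterns from a scrap of text:

```python
from collections import Counter, defaultdict

# Toy bigram "language model": count which word follows which
# in a tiny corpus, then predict the most likely next word.
corpus = "the cat sat on the mat the cat ate the fish".split()

following = defaultdict(Counter)
for current, nxt in zip(corpus, corpus[1:]):
    following[current][nxt] += 1

def predict_next(word):
    # Return the most frequently observed successor of `word`.
    return following[word].most_common(1)[0][0]

print(predict_next("the"))  # "cat" follows "the" most often in this corpus
```

Real models replace these simple counts with billions of learned parameters, but the underlying task — predicting what comes next from patterns in the training data — is the same.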
In an interview livestreamed on his social media platform X, Musk said the “only way” to counter the lack of source material for training new models was to move to synthetic data created by AI.
Regarding the exhaustion of data, he said: “The only way to supplement that is with synthetic data … writing an essay or coming up with a thesis, and then grading itself and … going through this process of self-learning.”
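Musk's generate-and-self-grade loop can be sketched loosely in code. Everything below — the scoring, the threshold, the helper names — is a hypothetical stand-in for illustration, not a real training pipeline:

```python
import random

random.seed(0)

# Hypothetical sketch of the loop Musk describes: a model generates
# candidate material, grades it, and keeps only the passing examples
# as synthetic training data for the next round.

def generate_essay(model_state):
    # Stand-in for a model producing text; this toy version attaches
    # a made-up quality score to each sample.
    quality = random.random() + model_state["skill"]
    return {"text": "synthetic essay", "quality": quality}

def self_grade(sample, threshold=0.8):
    # Stand-in grader: accept only samples above a quality bar.
    return sample["quality"] >= threshold

model_state = {"skill": 0.2}
training_set = []
for _ in range(100):
    sample = generate_essay(model_state)
    if self_grade(sample):
        training_set.append(sample)

print(f"kept {len(training_set)} of 100 synthetic samples")
```

The open problem, as Musk himself notes below, is the grader: if the model hallucinates while generating or grading, low-quality material slips into the next generation's training data.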
Meta, the owner of Facebook and Instagram, has used synthetic data to fine-tune its largest Llama AI model, while Microsoft has used AI-generated content in its Phi-4 model. Google and OpenAI, the company behind ChatGPT, also use synthetic data in their AI work.
But Musk also warned that AI models' habit of producing “hallucinations” — a term for inaccurate or nonsensical output — poses a risk to the synthetic-data process.
Speaking in the livestreamed interview with Mark Penn, chairman of the advertising group Stagwell, Musk said hallucinations had made using artificial material “challenging”, because “how do you know if it … hallucinated the answer or it's a real answer?”
Andrew Duncan, director of foundational AI at Britain's Alan Turing Institute, said Musk's comments chimed with recent academic papers estimating that publicly available data for AI models could run out as soon as 2026. He added that over-reliance on synthetic data risks “model collapse”, a term for a model whose output degrades in quality.
“When you start to feed a model synthetic material, you start to get diminishing returns,” he said, with the risk of biased output and a lack of creativity.
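The “diminishing returns” Duncan describes can be illustrated with a toy statistical loop — an assumption-laden sketch, not the Turing Institute's analysis. Each generation here fits a simple distribution only to the previous generation's synthetic output, and the fitted spread tends to drift downward over many generations, a crude analogue of lost diversity:

```python
import random
import statistics

random.seed(42)

# Loose illustration of "model collapse": fit a Gaussian to samples,
# then repeatedly re-fit to samples drawn from the previous fit.
# Each generation trains only on the last generation's output, so
# estimation noise compounds and diversity tends to erode.

mean, stdev = 0.0, 1.0  # generation 0: the "real data" distribution
spreads = [stdev]
for generation in range(30):
    samples = [random.gauss(mean, stdev) for _ in range(25)]
    mean = statistics.fmean(samples)   # re-fit on synthetic samples only
    stdev = statistics.stdev(samples)
    spreads.append(stdev)

print(f"spread: gen 0 = {spreads[0]:.2f}, gen 30 = {spreads[-1]:.2f}")
```

Because each run is random, the exact trajectory varies, but the mechanism is the point: once real data drops out of the loop, errors in one generation become the ground truth for the next.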
Duncan added that as AI-generated content proliferates online, there is also the potential for that material to be absorbed into AI training datasets.
High-quality data, and control over it, is one of the legal battlegrounds of the AI boom. OpenAI acknowledged last year that it would be impossible to create tools such as ChatGPT without access to copyrighted material, while creative industries and publishers are demanding compensation for the use of their work in the model training process.





