After a year of explosive growth, generative artificial intelligence (AI) may be facing its most serious legal threat yet.
Just before the start of the year, The New York Times sued Microsoft and OpenAI, the developer of the popular ChatGPT tool, for copyright infringement, alleging that the companies illegally used millions of the newspaper's articles to train their AI models.
The newspaper joins a number of writers and artists who have sued major technology companies in recent months for training AI on copyrighted works without permission. Many of those cases have struggled in court.
But experts believe the Times' complaint is sharper than previous AI-related copyright cases.
“I think they learned from some of their past losses,” Robert Brauneis, a professor of intellectual property law at George Washington University Law School, told The Hill.
The Times lawsuit “has a little less variation in the causes of action,” Brauneis said.
“The New York Times’ lawyers are careful to avoid throwing everything at the wall and seeing what sticks,” he added. “They’re really focused on what they think will stick.”
Transformation and reproduction
Generative AI models require vast amounts of material for training. Large language models, such as OpenAI’s ChatGPT and Microsoft’s Copilot, use that material to predict which words follow a given string of text, generating human-like responses.
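That prediction step can be illustrated with a toy sketch. The following is a hypothetical bigram model built for illustration only; real large language models are vastly more complex, but the core idea of probabilistically choosing a likely next word from observed patterns is the same:

```python
import random
from collections import Counter, defaultdict

# Toy corpus standing in for training material (real models train on billions of words).
corpus = "the model predicts the next word the model generates text".split()

# Count which word follows each word (a bigram model, far simpler than a real LLM).
follows = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    follows[prev][nxt] += 1

def predict_next(word):
    """Probabilistically pick a likely next word based on observed patterns."""
    counts = follows[word]
    if not counts:
        return None
    words, weights = zip(*counts.items())
    return random.choices(words, weights=weights)[0]

print(predict_next("the"))  # e.g. "model" (seen twice) or "next" (seen once)
```

Because the word is sampled by frequency rather than looked up, the model produces its own statistically plausible continuation rather than retrieving a stored passage, which is the "transformative" behavior discussed below.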
Shabbi Khan, co-chair of law firm Foley & Lardner’s artificial intelligence, automation and robotics group, said these AI models are typically transformative in nature.
“If it’s a general question … it’s not just doing a search, finding the right passage and reproducing that passage,” Khan explained. “It tries to probabilistically create its own version of what to say based on patterns it finds by analyzing billions of words of content.”
But in its lawsuit against OpenAI and Microsoft, the Times alleges that the AI models developed by the companies “memorize” and can sometimes reproduce portions of the newspaper’s articles.
“If individuals can access the Times’ highly valuable content through the defendants’ own products without paying a fee or going through the Times’ paywall, the likelihood that many will do so is high,” the complaint states.
“Defendants’ misconduct threatens to drive readers, including current and potential subscribers, away from The Times, undermining the Times’ ability to continue producing its current level of groundbreaking journalism and reducing its subscription, advertising, licensing and affiliate revenue,” it adds.
In response to the lawsuit, an OpenAI spokesperson said in a statement that the company respects “the rights of content creators and owners” and is “working with them to ensure they benefit from AI technology and new revenue models.”
Brauneis said some of the “most striking” parts of the Times lawsuit are its repeated examples of the AI models spitting out article content nearly verbatim.
Khan pointed out that previous copyright cases have been unable to show such direct reproduction by the models.
In recent months, courts have dismissed claims in similar cases in which plaintiffs failed to demonstrate that the output of certain AI models was substantially similar to their copyrighted works.
“I think [the Times] did a good job, probably compared to other complaints that have been filed in the past,” Khan told The Hill. “They provided excerpts from the New York Times, and frankly more than just excerpts: multiple examples of New York Times text reproduced.”
Khan suggested that courts could decide certain uses of generative AI lack sufficient transformative power, and could require companies to restrict certain prompts and outputs to keep AI models from reproducing copyrighted content.
Brauneis similarly noted that the issue could result in injunctions against the tech companies and damages owed to the Times, but stressed that it is not an insurmountable problem for generative AI.
“I think the companies will respond by developing filters that dramatically reduce the incidence of that kind of reproduced output,” he said. “So I don’t think it’s going to be a big long-term problem for these companies.”
In an October response to an inquiry from the U.S. Copyright Office, OpenAI said it has developed countermeasures to reduce the likelihood that its models will “memorize” or repeat copyrighted works verbatim.
However, the company noted that because users can phrase questions in many ways, ChatGPT may not perfectly identify and reject every request designed to retrieve output containing content used to train the model.
OpenAI says its models are also equipped with output filters that can block potentially infringing content generated despite other safeguards.
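OpenAI has not publicly detailed how those filters work. One simple approach to this general problem, shown here purely as a hypothetical sketch and not as OpenAI’s implementation, is to flag any output that shares a long verbatim run of words with protected text:

```python
def ngrams(text, n=8):
    """All n-word sequences in a text, lowercased for comparison."""
    words = text.lower().split()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}

def looks_verbatim(output, protected, n=8):
    """Flag output that shares any n-word run with the protected text."""
    return bool(ngrams(output, n) & ngrams(protected, n))

article = "the quick brown fox jumps over the lazy dog near the river bank"
copy = "he wrote the quick brown fox jumps over the lazy dog near the end"
fresh = "a fast auburn fox leapt across a sleepy hound by the water"

print(looks_verbatim(copy, article))   # True: shares an 8-word run with the article
print(looks_verbatim(fresh, article))  # False: no long verbatim overlap
```

A filter like this catches exact regurgitation but not close paraphrase, which is one reason the transformative-use question in the lawsuit matters beyond verbatim copying.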
OpenAI also emphasized in a statement Monday that memorization is a “rare bug” and claimed the Times “intentionally manipulated prompts” to get ChatGPT to regurgitate its articles.
“Even so, our models don’t typically behave the way The New York Times insinuates, which suggests they either instructed the model to regurgitate or cherry-picked their examples from many attempts,” the company said.
“Despite their claims, this misuse is not typical or allowed user activity, and is not a substitute for The New York Times,” it added. “Nonetheless, we are continually making our systems more resistant to adversarial attacks that regurgitate our training data, and we have already made significant progress in our latest models.”
How media and AI can shape each other
Carl Szabo, vice president and general counsel for tech industry group NetChoice, warned that lawsuits like the Times’ could stifle the industry.
“At the end of the day, we’re seeing a series of efforts to squeeze money out of AI developers in ways that harm the public, undermine public access to information, and in a sense defeat the purpose of copyright law, which is to advance human knowledge,” Szabo told The Hill.
Khan said a mechanism will eventually be in place for tech companies to license content such as Times articles to train their AI models.
OpenAI already has content licensing agreements with The Associated Press and with Axel Springer, the German media company that owns publications such as Politico and Business Insider.
The Times said in the lawsuit that it contacted Microsoft and OpenAI in April to raise intellectual property concerns and discuss a possible agreement, which OpenAI also acknowledged in its statement on the lawsuit.
“We are surprised and disappointed by this development, as our ongoing dialogue with The New York Times has been productive and has led to constructive progress,” a spokesperson said.
An OpenAI spokesperson added that the company “looks forward to finding mutually beneficial ways to collaborate.”
“I think most publishers will adopt that model because it brings additional revenue to the company,” Khan told The Hill. “And you can see that from the New York Times trying to get into [an agreement]. So there is a price they are willing to accept.”
Copyright 2023 Nexstar Media Inc. All rights reserved. This material may not be published, broadcast, rewritten, or redistributed.