John bites an apple and says it tastes sweet, but I neither witness the act nor hear him say it.
The New York Times writes, “John bites an apple, and says they taste sweet.” Then I read the article.
Since I have no first-hand knowledge, US copyright law limits when, in a for-profit setting, I can make statements such as “John bit an apple” or “Apples taste sweet.” I can report my own observations. It would be perfectly legal for me to say, "According to the NY Times . . . ", but the underlying content is the NY Times’ content. It belongs to them, and (subject to their licensing agreement) I cannot use it without their permission.
Yet Microsoft and OpenAI have trained ChatGPT to do just that.
They have trained ChatGPT to report, not its own observations, but those of the NYTimes, and to do so without attribution of any kind.
That appears to be at the core of a lawsuit the NY Times has filed against ChatGPT principals Microsoft and OpenAI.
New York Times Sues Microsoft and OpenAI, Alleging Copyright Infringement
News publisher says AI tools use its content without permission; tech companies have said training AI with web content is ‘fair use’
The New York Times sued Microsoft and OpenAI for alleged copyright infringement, touching off a legal fight over generative-AI technologies with far-reaching implications for the future of the news publishing business.
In a complaint filed Wednesday, the Times said the technology companies exploited its content without permission to create their AI products, including OpenAI’s humanlike chatbot ChatGPT and Microsoft’s Copilot. The tools were trained on millions of pieces of Times content, the suit said, and draw on that material to serve up answers to users’ prompts. . . .
In its complaint, the Times said it believes it is among the largest sources of proprietary information for OpenAI and Microsoft’s AI products. Their AI tools divert traffic that would otherwise go to the Times’ web properties, depriving the company of advertising, licensing and subscription revenue, the suit said.
The Times is seeking damages, in addition to asking the court to stop the tech companies from using its content and to destroy data sets that include the Times’ work. . . .
In its suit, the Times said the fair use argument shouldn’t apply, because the AI tools can serve up, almost verbatim, large chunks of text from Times news articles. . . .
Apparently changing a few words around might not be sufficient.
When the NY Times learns this and reports that, then, unless their user agreement allows it, you cannot use their material to create a billion-dollar company without giving them sufficient credit.
Of course, doing so risks causing ChatGPT etc. to lose their “mystique.”
If ChatGPT constantly revealed its sources, it would not seem like some far-future technology that attracts tens of billions of dollars in investment and might one day take over the world. It would appear as what it really is: a computer-generated version of Wikipedia.
An NYTimes subscription is $25/month if you are using it only for personal and family purposes.
Now imagine you pay $25/month for a personal subscription, then have your bot re-word the articles, and you sell 10,000 subscriptions to your re-worded articles for $10/month each.
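To put rough numbers on that hypothetical, here is a minimal sketch of the arbitrage math. All figures are the ones assumed above (the $25 subscription, the imagined 10,000 customers at $10/month), not real business data:

```python
# Hypothetical arbitrage math for the scenario above.
# All figures are assumptions from the text, not real data.

cost_per_month = 25       # one personal NYTimes subscription, $/month
subscribers = 10_000      # imagined customers for the re-worded articles
price_per_month = 10      # imagined resale price, $/month

revenue = subscribers * price_per_month
profit = revenue - cost_per_month

print(f"Monthly revenue: ${revenue:,}")  # Monthly revenue: $100,000
print(f"Monthly profit:  ${profit:,}")   # Monthly profit:  $99,975
```

In other words, under these made-up numbers, a single $25 subscription fuels $100,000/month in resale revenue, which is the shape of the business the complaint objects to.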
If you are going to build a business around regurgitating NYTimes content, you have to make that clear. You have to write and rewrite “According to the NYTimes this . . .” and “According to the NYTimes that . . .”
If ChatGPT provided proper, legal attribution, it would look and feel a lot like a computer-generated Wikipedia.
I once asked a GPS to name a good Italian restaurant near me, and its vocal response began “According to Tripadvisor . . . .”
Tripadvisor likes this, and it reveals that my GPS is not some Skynet supra-intelligence that is one day going to take over the world.
But ChatGPT pretends it is something it is not. (Did I ever mention I think Sam Altman is a compulsive liar?) Hence Tripadvisor’s net worth is $2.96 billion,
while OpenAI raises capital at a valuation of $100 billion.
Imagine that I started a new thread, and every morning, I posted some witty, intellectual joke here on the Hannity forum. After a few days, you might come to look forward to reading it. You might think I am some intelligent, humorous person.
Now, what if it turned out that all of my jokes came from a joke list I found on some website, and all I did was pay my little nephew a few bucks to write some lines of code so that every morning it would auto-post a few lines from that website onto this one?
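Such a script really would be just a few lines. Here is a minimal sketch of the hypothetical nephew's script in Python; the joke list stands in for the scraped website, and the posting step is a placeholder, since the real forum-posting mechanism is not part of this example:

```python
# Hypothetical sketch of the "nephew's script" described above.
# JOKES stands in for content scraped from some joke website, and
# post_to_forum is a placeholder for the forum's actual posting step.
import datetime

JOKES = [
    "I'm reading a book about anti-gravity. It's impossible to put down.",
    "Parallel lines have so much in common. It's a shame they'll never meet.",
    "I told my wife she was drawing her eyebrows too high. She looked surprised.",
]

def todays_joke(jokes, today=None):
    """Pick one joke per day, cycling through the list in date order."""
    today = today or datetime.date.today()
    return jokes[today.toordinal() % len(jokes)]

def post_to_forum(text):
    # Placeholder: a real script would log in and submit a post here.
    print(f"Auto-posting: {text}")

post_to_forum(todays_joke(JOKES))
```

Run from a daily scheduler (cron, say), this produces exactly the illusion described: a fresh "witty" post every morning, with no intelligence anywhere in the loop.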
The truth would be revealed, and you would see I am not intelligent at all. Same deal with ChatGPT.
What if ChatGPT is not intelligence at all, but instead is just a regurgitation machine . . . that masquerades as intelligence so Sam Altman and friends can swindle investors and customers out of tens of billions of dollars?
OpenAI recited large portions of a 2019 report based on an 18-month investigation of predatory lending in New York City’s taxi industry, the complaint said.
Here is a sample provided by the NYTimes in its complaint.
Notice the red parts are virtually identical.
Here is another example, from page 31 of the same complaint.
The complaint alleges:
. . . in 2012, The Times published a groundbreaking series examining how
outsourcing by Apple and other technology companies transformed the global economy. The series was the product of an enormous effort across three continents. Reporting this story was especially challenging because The Times was repeatedly denied both interviews and access. The Times contacted hundreds of current and former Apple executives, and ultimately secured information from more than six dozen Apple insiders. Again, GPT-4 copied this content and can recite large portions of it verbatim.
OpenAI has (finally) issued a public response to the lawsuit. (link below)
The response strikes me as a cheap attempt to change the subject.
In it, OpenAI states that
“Regurgitation” is a rare bug that we are working to drive to zero
This case, however, is not just about regurgitation (the stealing of words).
In the case, the NYTimes cites several instances where the ENTIRE STORY was exclusive to the NYTimes, and ChatGPT reported the information as if ChatGPT knew it by being “intelligent.”
The value of ChatGPT is in its “mystique,” which is ruined when it reveals that its information is little more than a Google search, like a machine-compiled Wikipedia article.