Reddit Strikes $60 Million Licensing Deal to Reportedly Train Google's AI Models

Martina Bretous


About a week ago, Bloomberg reported that Reddit had signed a major licensing deal ahead of its IPO, allowing an unnamed company to train its AI models on Reddit's data.



A new report says that company is Google, although neither party has confirmed it. If true, this would be Reddit's first content deal.

Why is every AI company looking for licensing deals?

Since the AI race started, getting access to large, quality datasets has been a top priority.

AI models are trained on data: the more data a model is trained on, the better its output tends to be. Beyond quantity, quality matters too. AI companies want access to high-quality data that their competitors ideally don’t have.

This is where publishers like Reddit come in.

For a long time, OpenAI and other AI companies were freely roaming through publishers’ data. That was until publishers like The New York Times and Reddit caught on.

Last April, Reddit said, “If you want access to an 18-year deep well of data, you’re going to have to pay up.”

The NYT, on the other hand, just said, “no.” (And they’re suing OpenAI for allegedly still doing it.)

Now, close to a year later, Google, Apple, and OpenAI have all signed licensing agreements with huge publishers worth $100+ million.

The latest to join is Reddit, which reportedly signed with Google in a deal worth $60 million annually. The deal likely has an exclusivity clause ensuring that only Google has access to this data, though that hasn’t been confirmed.

Ahead of the company's upcoming IPO, Reddit CEO Steve Huffman shared that the company had earned over $200 million in licensing deals.

“Reddit’s vast and unmatched archive of real, timely, and relevant human conversation on literally any topic is an invaluable dataset for a variety of purposes, including search, AI training, and research,” wrote Huffman in the company's S-1 filing.

This would also be a huge win for Google, which has been trying to dethrone OpenAI for years.

Should AI licensing deals come with guardrails?

Some see licensing deals as a win-win: Publishers get paid for their data while AI companies get access to large, quality datasets.

However, they also come with some drawbacks.

Social media platforms like Reddit and X are community forums where people can write just about anything, including conspiracy theories, misinformation, and hateful rhetoric.

[Image: X user disapproves of Reddit's AI licensing deal with Google]

And although Reddit does have content moderators and policies, the company only introduced a ban on hate speech 15 years after the site was founded.

Is that what AI models should be trained on?

AI companies can clean their data to filter out this type of content, but there’s no clear standard that every model is built on. So, as a consumer, I won’t know what data a model was trained on or how well that data has been “cleaned.”

So, it raises the question: Should some websites be off the table when it comes to training AI models? And what guardrails are in place to ensure AI models aren’t regurgitating the darkest content on the internet?

These questions are still up in the air.
