OpenAI and Google reportedly used transcriptions of YouTube videos to train their AI models

ooli@lemmy.world · 1 year ago

noodlejetski@lemm.ee · 1 year ago

elshandra@lemmy.world · 1 year ago

If my AI bot is as exaggerated, fake, and dense as so many YouTubers seem to be these days, I think it will find itself without communication components in a very short time.

Immersive_Matthew@sh.itjust.works · 1 year ago

We already know they used all the public information on the Internet. How is this news? If AI is going to be any use, it needs to learn from somewhere.

circuitfarmer@lemmy.sdf.org · 1 year ago

People have been used to a lot of private services for a while now. YouTube is so ubiquitous it’s almost like a utility, in that everyone always has access to it and it’s just everywhere, with no real competitor.

But all of these social media services are private, so as much as they feel like public information utilities, once you’re on one, your data isn’t your own. I think that’s the disconnect when people hear that “their data” has been used for AI training. It ceased to be their data as soon as it went on the platform, at least tacitly in the US.

There has traditionally been a public expectation of control that simply isn’t there for any of these services. The industry knows this and capitalizes on it regularly. It’s a key tenet of technofeudalism.

FarraigePlaisteach@kbin.social · 1 year ago

If they’re scraping the web, and they’re generating AI content on the web, how do they avoid training their AI on its own nonsense?