@polakkenak

polakkenak@feddit.dk · 2 days ago

In theory: Yes, future works are not yet part of the training data.

In practice: It takes months or years for an open source project (or any new technology) to take off and be considered valuable.

The other argument relies on said tech organization doing the right thing and spending the resources to train its own model (years and 100+ million) instead of simply including the cost of the lawsuit and the pending fine in its cost/benefit analysis. I’m not aware that any such tech organization (with the means) exists.

polakkenak@feddit.dk · 3 days ago

No, absolutely not. It is safe to assume that most or all code, open source and otherwise, has been part of the training data. You need look no further than the fact that some models can recite Harry Potter from memory. There is no such thing as “clean room” for AI.