18+ pluralistic OP ,
@pluralistic@mamot.fr avatar

The same is true for wire-service copy or widely syndicated texts: there might be dozens or even hundreds of copies of these works in training data, resulting in the memorization of long passages from them.

This might be infringing (we're getting into some gnarly, unprecedented territory here), but again, even if it is, it wouldn't be a big hardship for model makers to post-process their models by comparing them to the training set, deleting any inadvertent memorizations.

13/

  • All
  • Subscribed
  • Moderated
  • Favorites
  • random
  • test
  • worldmews
  • mews
  • All magazines