mcc ,
@mcc@mastodon.social avatar

2008, me: I love the idea of cryptocurrency

BITCOIN: The word "cryptocurrency" now means "financial scams based on inefficient write-only ledgers"

2018, me: I love the idea of the metaverse

FACEBOOK: The word "metaverse" now means "proprietary 3D chat programs with no soul"

2022, me: I love the idea of procedurally generated content

OPENAI: From now on people will associate that only with big corporations plagiarizing small artists and turning their work into ugly content slurry

mcc OP ,

RONALD LACEY: Again we see, Ms. McClure, there is nothing you can possess which I cannot take away.

mcc OP ,

I'm really concerned about the effect "generative AI" is going to have on the attempt to build a copyleft/commons.

As artists/coders, we saw that copyright constrains us. So we decided to make a fenced-off area where we could make copyright work for us in a limited way, with permissions for derivative works within the commons according to clear rules set out in licenses.

Now OpenAI has made a world where rules and licenses don't apply to any company with a valuation over $N billion.

mcc OP ,

(The exact value of "N" is not known yet; I assume it will be solidly fixed by some upcoming court case.)

mcc OP ,

In a world where copyleft licenses turn out to restrict only the small actors they were meant to empower, and don't apply to big bad-actor "AI" companies, what is the incentive to put your work out under a license that will only serve to make it a target for "AI" scraping?

With NFTs, we saw people taking their work private because putting something behind a clickwall/paywall was the only way to not be stolen for NFTs. I assume the same process will accelerate in an "AI" world.

mcc OP ,

Did you see this? The whole thing with "the stack".

https://post.lurk.org/@emenel/112111014479288871

Some jerks did mass scraping of open source projects, putting them in a collection called "the stack" which they specifically recommend other people use as machine learning sources. If you look at their "GitHub opt-out repository" you'll find just page after page of people asking to have their stuff removed:

https://github.com/bigcode-project/opt-out-v2/issues

(1/2)

mcc OP ,

…but wait! If you look at what they actually did (correct me if I'm wrong), they aren't actually doing any machine learning in the "stack" repo itself. The "stack" just collects zillions of repos in one place. Mirroring my content as part of a corpus of open source software, torrenting it, putting it on microfilm in a seedbank is the kind of thing I want to encourage. The problem becomes that they then suggest people create derivative works of those repos in contravention of the license. (2/2)

mcc OP ,

So… what is happening here? All these people are opting out of having their content recorded as part of a corpus of open source code. And I'll probably do the same, because "The Stack" is falsely implying people have permission to use it for ML training. But this means "The Stack" has put a knife in the heart of publicly archiving open source code at all. Future attempts to preserve OSS code will, if they base themselves on "the stack", not have any of those opted-out repositories to draw from.

mcc OP ,

Like, heck, how am I supposed to rely on my code getting preserved after I lose interest, I die, Bitbucket deletes every bit of Mercurial-hosted content it ever hosted, etc.? Am I supposed to rely on Microsoft to responsibly preserve my work? Holy crud no.

We want people to want their code widely mirrored and distributed. That was the reason for the licenses. That was the social contract. But if machine learning means the social contract is dead, why would people want their code mirrored?

gsuberland ,
@gsuberland@chaos.social avatar

@mcc I have generally come to the conclusion that this is an intended effect. All the things you feel compelled to do for the good of others, in an ordinarily altruistic sense, are essentially made impossible unless you accept that your works and your expressions will be repackaged, sold, and absorbed into commercialised datasets.

The SoaD line "manufacturing consent is the name of the game" has been in my head a lot lately.

mark ,
@mark@mastodon.fixermark.com avatar

@gsuberland @mcc One almost wonders if the end-game is to stop pulling and try pushing.

Maybe instead of trying to claw back data we've made publicly crawlable because "I wanted it visible, but not like that" we ask why any of these companies get to keep their data proprietary when it's built on ours?

Would people be more okay with all of this if the rule were "You can build a trained model off of publicly-available data, but that model must itself be publicly-available"?

mcc OP ,

@mark @gsuberland In my opinion, a trapdoor like "okay, well if copyright doesn't apply to the training data you stole, your model isn't copyrightable either" is no good. The US Gov has already said GenAI images and text are not copyrightable. It doesn't help. The thing about generative AI is it inherently takes heavy computational resources (disk space, CPU time, often-unacknowledged low-wage tagging work). Therefore, as a tool, it is inherently biased toward capital and away from individuals.

mcc OP ,

@mark @gsuberland If we say "AI is a new class of thing that is outside the copyright regime entirely", that is not a level playing field. The tool is designed in a way that inherently serves the powerful. "Machine learning models are inherently open" is the exact model I am afraid of: a world where copyright is something that applies to actors who have less than some specific amount of money, and anyone with more than that specific amount of money is liberated from it.
