
mcc

@mcc@mastodon.social

glitch girl


mcc, to random

2008, me: I love the idea of cryptocurrency

BITCOIN: The word "cryptocurrency" now means "financial scams based on inefficient write-only ledgers"

2018, me: I love the idea of the metaverse

FACEBOOK: The word "metaverse" now means "proprietary 3D chat programs with no soul"

2022, me: I love the idea of procedurally generated content

OPENAI: From now on people will associate that only with big corporations plagiarizing small artists and turning their work into ugly content slurry

mcc OP

RONALD LACEY: Again we see, Ms. McClure, there is nothing you can possess which I cannot take away.

mcc OP

I'm really concerned about the effect "generative AI" is going to have on the attempt to build a copyleft/commons.

As artists/coders, we saw that copyright constrains us. So we decided to make a fenced-off area where we could make copyright work for us in a limited way, with permissions for derivative works within the commons according to clear rules set out in licenses.

Now OpenAI has made a world where rules and licenses don't apply to any company with a valuation over $N billion.

mcc OP

(The exact value of "N" is not known yet; I assume it will be solidly fixed by some upcoming court case.)

mcc OP

In a world where copyleft licenses turn out to restrict only the small actors they were meant to empower, and don't apply to big bad-actor "AI" companies, what is the incentive to put your work out under a license that will only serve to make it a target for "AI" scraping?

With NFTs, we saw people taking their work private because putting something behind a clickwall/paywall was the only way to keep it from being stolen for NFTs. I assume the same process will accelerate in an "AI" world.

mcc OP

Did you see this? The whole thing with "the stack".

https://post.lurk.org/@emenel/112111014479288871

Some jerks did mass scraping of open source projects, putting them in a collection called "the stack" which they specifically recommend other people use as machine learning sources. If you look at their "GitHub opt-out repository" you'll find just page after page of people asking to have their stuff removed:

https://github.com/bigcode-project/opt-out-v2/issues

(1/2)

mcc OP

…but wait! If you look at what they actually did (correct me if I'm wrong), they aren't actually doing any machine learning in the "stack" repo itself. The "stack" just collects zillions of repos in one place. Mirroring my content as part of a corpus of open source software, torrenting it, putting it on microfilm in a seed bank: these are the kinds of things I want to encourage. The problem is that they then suggest people create derivative works of those repos in contravention of the license. (2/2)

mcc OP

So… what is happening here? All these people are opting out of having their content recorded as part of a corpus of open source code. And I'll probably do the same, because "The Stack" is falsely implying people have permission to use it for ML training. But this means "The Stack" has put a knife in the heart of publicly archiving open source code at all. Future attempts to preserve OSS code will, if they base themselves on "the stack", not have any of those opted-out repositories to draw from.

mcc OP

Like, heck, how am I supposed to rely on my code getting preserved after I lose interest, I die, Bitbucket deletes every bit of Mercurial-hosted content it ever hosted, etc.? Am I supposed to rely on Microsoft to responsibly preserve my work? Holy crud no.

We want people to want their code widely mirrored and distributed. That was the reason for the licenses. That was the social contract. But if machine learning means the social contract is dead, why would people want their code mirrored?

mcc OP

@mark @gsuberland In my opinion, a trapdoor like "okay, well if copyright doesn't apply to the training data you stole, your model isn't copyrightable either" is no good. The US Gov has already said GenAI images and text are not copyrightable. It doesn't help. The thing about generative AI is that it inherently demands heavy resources (disk space, CPU time, often-unacknowledged low-wage tagging work). Therefore, as a tool, it is inherently biased toward capital and away from individuals.

mcc, to random

For some time I have been frustrated with a project which, practically speaking, must use .cargo/config.toml, a Rust "feature of last resort" that the Rust devs seem to leave intentionally poorly supported, almost as if to dissuade you from using it. Just now I found a way to solve a serious and fundamental problem .cargo/config.toml normally causes, and I'm split between feeling very satisfied and really annoyed, because oh my god this solution is so ugly

https://github.com/mcclure/pocket-riscv-rs-bug/commit/a369e1f185c729056a5886ef608d8225854f0915
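
For readers who haven't touched this corner of Cargo: a minimal sketch of the kind of .cargo/config.toml involved, assuming a bare-metal RISC-V target like the linked repo's. The target triple and linker flag are illustrative guesses, not the contents of the actual commit:

```toml
# .cargo/config.toml — illustrative sketch only, not the linked repo's file.
[build]
# Assumed target triple: cross-compile everything for bare-metal RISC-V.
target = "riscv32imac-unknown-none-elf"

[target.riscv32imac-unknown-none-elf]
# Assumed flag: hand the linker a custom script, a common bare-metal need.
rustflags = ["-C", "link-arg=-Tlink.x"]
```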

mcc OP

If you look at that patch and don't understand what's happening in it: That's a normal reaction

mcc OP

Now, this solution to the problem actually introduces another problem. But that second problem turns out to be fixable by using an "unstable" feature in Rust nightly. Unfortunately, the "unstable" feature in Rust nightly introduces a third problem. I will not be elaborating on this post.

mcc OP

UPDATE: So someone on Mastodon reads my posts here, looks at what I did, and says "why don't you just use [X]?". So I look in the docs and realize [X] is indeed a far superior solution, one which is clean and does not introduce additional problems. So I try doing [X]. Long story short, it's now nearly half an hour later, and after a series of tests I believe I've found a bug in the Rust compiler; I'm setting up to file a bug on Cargo.

futurebird, to random

I wonder what would happen if one ran an LLM on data representing analog TV or radio signals for a type of video or music... then generated an analog signal and played it on a TV?

Would it make any difference in the texture, look, or sound of the resulting output?

mcc

@jakemiller @futurebird Back before OpenAI poisoned the space, I was wondering if you could use something based on hidden Markov models (which work on continuous inputs/outputs) to generate continuous outputs such as waveforms.

However, tokenizing as hattifattener recommends is probably the smarter idea, because it probably fits closer to how human brains actually interpret things like sound waves. We don't hear waveforms; we hear frequency breakdowns.
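
To make "frequency breakdowns" concrete, here is a small Rust sketch (assuming the rustfft crate; none of this is from the thread) that turns a synthetic waveform into a magnitude spectrum, the kind of representation you would tokenize instead of raw samples:

```rust
// Sketch: turn a raw waveform into a frequency breakdown (magnitude spectrum).
// Assumes the `rustfft` crate is available as a dependency.
use rustfft::{num_complex::Complex, FftPlanner};

fn main() {
    let sample_rate = 8000.0_f32;
    let n = 1024;

    // Synthetic input: a 440 Hz sine wave, standing in for "analog" audio.
    let mut buf: Vec<Complex<f32>> = (0..n)
        .map(|i| {
            let t = i as f32 / sample_rate;
            Complex::new((2.0 * std::f32::consts::PI * 440.0 * t).sin(), 0.0)
        })
        .collect();

    // In-place FFT: waveform -> complex spectrum.
    let mut planner = FftPlanner::new();
    planner.plan_fft_forward(n).process(&mut buf);

    // Magnitudes of the first half of the bins are the "frequency breakdown".
    let (peak_bin, _) = buf[..n / 2]
        .iter()
        .map(|c| c.norm())
        .enumerate()
        .max_by(|a, b| a.1.total_cmp(&b.1))
        .unwrap();
    println!("peak near {} Hz", peak_bin as f32 * sample_rate / n as f32);
}
```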

mjg59, to random

Being less flippant about this - the xz backdoor relied on a line that was present in the tarball release, but not in the git repo. Do we have any infrastructure for validating this kind of thing? (It's expected that the tarball would contain things that aren't in git - for example, the configure script doesn't exist in git, but is expected to be in the release. The problem is that extra code was injected into the configure script after it was generated)

mcc

@mjg59 It seems like one problem is that it would be legitimately difficult to validate this without introducing some sort of uniform build/packaging process to audit in the first place.

(I guess you could look at effects: specifically, test whether the tarball/thing in the package repo differs from the source, and compare against an approved list of intentional, inspected late patches for the cases where you want behavior like this?)
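
As a toy illustration of that "diff the release against the source" idea (my sketch, not anything mjg59 proposed concretely): walk an unpacked tarball and a git checkout, hash every file, and flag anything that differs or exists on only one side, minus an approved allow-list. The paths and allow-list are hypothetical, and a real tool would use a cryptographic digest rather than DefaultHasher:

```rust
// Toy sketch: compare a release tarball's tree against a git checkout,
// reporting files that differ or exist on only one side.
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::{Hash, Hasher};
use std::path::{Path, PathBuf};

// Hash every file under `root`, keyed by path relative to `root`.
fn hash_tree(root: &Path) -> std::io::Result<HashMap<PathBuf, u64>> {
    let mut out = HashMap::new();
    let mut stack = vec![root.to_path_buf()];
    while let Some(dir) = stack.pop() {
        for entry in std::fs::read_dir(&dir)? {
            let path = entry?.path();
            if path.is_dir() {
                stack.push(path);
            } else {
                let mut h = DefaultHasher::new();
                std::fs::read(&path)?.hash(&mut h);
                out.insert(path.strip_prefix(root).unwrap().to_path_buf(), h.finish());
            }
        }
    }
    Ok(out)
}

fn main() -> std::io::Result<()> {
    // Hypothetical paths; the allow-list covers intentional, inspected late
    // additions like a generated `configure` script.
    let approved = ["configure"];
    let tarball = hash_tree(Path::new("release-unpacked"))?;
    let checkout = hash_tree(Path::new("git-checkout"))?;

    for (path, digest) in &tarball {
        if approved.iter().any(|a| path.as_path() == Path::new(a)) {
            continue; // intentional, inspected difference
        }
        match checkout.get(path) {
            Some(d) if d == digest => {} // identical: fine
            Some(_) => println!("MODIFIED in release: {}", path.display()),
            None => println!("ONLY in release: {}", path.display()),
        }
    }
    for path in checkout.keys() {
        if !tarball.contains_key(path) {
            println!("ONLY in source: {}", path.display());
        }
    }
    Ok(())
}
```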

mcc

@mjg59 Like I look at this and the problem I see is not "you can use M4 to inject code into a tarball from github" but rather "our entire supply chain is a hodgepodge of dissimilar parts awkwardly gasketed together, and someone found one of the many parts of that supply chain that is out-of-sight/confusing enough that a Bad patch could be put there without anyone noticing for quite some time"
