I’m interested in a new kind of license, which I’m calling “relational source”: not about money or whether a product is commercial, but about whether there’s an actual person who wants to use the code, with some kind of AGPL-esque mechanism to ensure no mindless ingestion. Perhaps this would never work, but having AI erase the contributions of the people who put their time into doing the work breaks the spirit of everything I love about OSS.
This will kill open source. Anything of value will be derived and re-derived and re-re-derived by bad actors until no one knows which package or library to trust.
The fatal flaw of the open internet is that bad actors can exploit it with impunity. It happened with email, it happened with websites, it happened with search, and now it's happening with code. Greedy people spoil good things.
If this were true, why hasn't it happened in the last... 30 or 40 years that FOSS code has been published on the internet?
Copyright was the base protection layer. Not in the "I own it" sense, but in the "you can't take it and run with it" sense.
Its current weakening opens the door to abuses that we don't yet have the proper tools to deal with. Perhaps new ones will emerge, but we'll have to see.
Last I checked, LLMs didn’t exist until a few years ago.
Same reason fake images and videos are more common now; Photoshop existed 30 years ago.
Before LLMs you needed time and skill to do it; with AI you need less of both.
The license was MIT until two months ago.
That gives anyone the right to get the source code of that commit and do whatever.
The article does not specify whether the company is still using the code AFTER the license change.
The rest of the points are still valid.
Hopefully the spread of AI will make more people realise that everything is a derivative work. If it wasn't an AI, it was a human standing on the shoulders of giants.
The offending repository is copying files verbatim while removing the license headers from said files. That's not "standing on the shoulders of giants".
That doesn't even seem like AI, just direct copy-pasting lol
AI makes it easy for people to claim they did the work, so others are less likely to do the real work. Which means the giants won't grow.
Yeah, this is where I find the copyright argument a little weak. Because how do artisans learn their craft? By observing others' work.
Instead, I feel like the objections are (rightly) these two issues:
1. GenAI operates at a much larger scale than an individual artist. I don't think artists would have an issue with someone commissioning a portrait, say, in the style of Van Gogh (the copyright argument). They would have an issue if that artist painted 100,000 pictures a day in the style of Van Gogh.
2. Lack of giving back: some of the greatest artists have internalized great art from previous generations, and then something miraculous happens. An entirely new style emerges. They have now given back to the community that incubated them. I don't really see this same giving back with GenAI.
Edit: one other thought. Adobe used their own legally created art to train their model, and people still complain about it, so I don't buy the copyright argument if they're upset about Adobe's GenAI.
Edit 2: I'm not condoning blatant copyright infringement like what's detailed in this post.
1. If I wanted the "style of Van Gogh" I would simply download Van Gogh; why waste time and money on approximate AI? But if I want something else, then I can use AI. GenAI is really the worst infringement tool. For example, would anyone try to read a bootleg Harry Potter from an LLM to avoid paying? I don't think so.
2. LLMs will give back what you put in plus what they learned; it's your job to put in the original parts. But every so often this interaction will spark new ideas. The LLM+human team can get where neither would get alone, building on each other's ideas.
> Because how do artisans learn their craft? By observing others' work
I don't think that computer systems of any kind should have the same right to fair use that humans have.
I think humans should get fair-use carve-outs for fan art and derivative work, but AI should not.
> Lack of giving back
I disagree. There is a ton of AI-generated text, code, images, and video available completely free for people to learn from.
Copyright is a nightmare. It's just that it sounds like a gentler nightmare than hyperscaled algorithms controlled by a few.
Nothing subverts my defense of human creativity more than the cliched human defenses of AI.
For those of us who exceed the AI, it raises our value enormously. You see it in the pay of AI engineers. And in this high-interest-rate world, those of us who continue to be employed are commanding higher wages, as far as I can tell. It is a culling of the lesser-than.
One unfortunate side effect is that junior engineers who cannot immediately exceed the AI are not being hired as often. But this era echoes the dotcom boom, when very low-skilled people commanded very high wages. Universities, which have always been white-collar job training but pretended they weren't, are being impacted greatly.
https://registrar.mit.edu/stats-reports/majors-count
24% of MIT undergraduates this year are in majors with Computer Science in the title (I asked ChatGPT to calculate this from the difficult-to-parse website). A quarter of all MIT undergraduates are not being trained to be future PhD researchers; MIT, like every other school, is training the vast majority of its students for private-sector workforce jobs.
The culling is happening all over. We will likely go from roughly 4,000 colleges in America to fewer than 1,000 over the next 15 years.
This is a good thing. The cost of university degrees is far too high. We are in the midst of a vast transition. College should return to being the purview of the truly intelligent and the children of the rich, as it was for all time before WW2. This very weird experiment in human history is ending, and it cannot happen soon enough.
> College should return to being the purview of the truly intelligent and the children of the rich, as it was for all time before WW2.
Yeah, the world was a better place when it was mostly white males having that chance.
/s
35%, ignoring "secondary majors", which may or may not coincide with primary majors that also have CS in the title.
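For what it's worth, once the table is extracted the arithmetic is trivial and doesn't need ChatGPT. A minimal sketch with made-up placeholder counts; the real numbers live on the registrar page linked above:

    # Share of undergrads in majors with "Computer Science" in the title.
    # All counts below are hypothetical placeholders, NOT the real MIT data.
    majors = {
        "6-3 Computer Science and Engineering": 900,              # placeholder
        "6-4 Artificial Intelligence and Decision Making": 350,   # placeholder
        "18 Mathematics": 250,                                    # placeholder
        # ... remaining majors omitted
    }
    total_undergrads = 4500  # placeholder

    cs = sum(n for name, n in majors.items() if "Computer Science" in name)
    print(f"{cs / total_undergrads:.0%} of undergrads are in CS-titled majors")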
A racist crypto shill waxing poetic about the value of tertiary education? I'm positively enrapt; tell me more about how you exceed the AI when you can't parse a basic data table.
> College should return to being the purview of the truly intelligent and the children of the rich, as it was for all time before WW2.
You're likely correct that we're witnessing a reconsolidation of wealth and the extinction of the middle class in society, but you seem happy about this? Be careful what you wish for...
They probably think they’re one of the “truly intelligent” or a child of the rich lol
I would not want the unintelligent and non-rich to go into debt to spend 4 years at a university getting a degree in an absurd subject:
https://www.sps.nyu.edu/explore/degrees-and-programs/bs-in-h...
Please, tell me how going $300k into debt for an undergraduate degree in Tourism Studies benefits society, or the student.
Alternatively: not all middle-class jobs require a college degree. Perhaps a college degree is primarily a signalling mechanism for adherence to a bygone era of societal norms. But the price is far too high to justify it, and the market will create alternative proofs of adherence at a far cheaper price. Which is happening as we debate.
My concern now is the large number of under-employed college graduates indebted for worthless degrees, feeling pinched because the debt far surpasses their market value. This has been the case for a long time, but it has now reached the upper echelons of academia, where even Ivy League grads cannot get employment. You need to recalibrate your ire toward the correct target.
Yeah, sure, not every job should require a degree, but that doesn't justify keeping The Poors from pursuing education.
Some of us value education for its own sake, not as a prerequisite for employment.
You are assuming the only avenue to "education" is the university experience.
Some people learn best in structured class settings.
This. AI is a magnificent way to make the entire world's codebase available as a giant, cross-platform, standard library.
I welcome AI to copy my crap if that's going to help anyone in the future.
Except closed-source software, which it isn't trained on.
You forgot to mention that if things continue as they are, a very small group of people will have complete control over this giant library.
It's a concern. But there are open source models.
An open-source model, created at great expense… by a still-small cohort of people.
There are like a dozen organizations globally creating anything close to state-of-the-art models. The fact that you can use some for free on your own hardware doesn’t change that those weights were trained by a small cohort of people, with training data selected by those people, and fine-tuning and “alignment” created by those people.
Sure, you can fine-tune the smaller ones yourself, but that still leaves you at the will of the original creator.
No, there aren't.
There is open source training and inference software. And there are open weights.
Those things are not enough to reproduce the training.
Even if you had the hardware, you would not be able to recreate Llama (for example), because you don't know what data went into the training.
That's a very weird library. You can get summaries of the works, but you don't have access to the original works used in creating it. Sounds terrible, open source or not.
I find it odd that any LLM could be considered open source. Sure, the weights are available to download and use, but you can't reasonably reconstruct the model: it's impractical for an individual to gather a useful dataset or spend $5,000,000+ of GPU time on training.
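For a rough sense of where figures like $5M come from, here's a back-of-envelope sketch using the common ~6 × parameters × tokens estimate of training FLOPs; every constant below is an assumption for illustration, not a measured figure:

    # Back-of-envelope LLM training cost. All constants are assumptions.
    params = 70e9             # 70B-parameter model
    tokens = 2e12             # 2T training tokens
    train_flops = 6 * params * tokens    # common rule of thumb: ~6*N*D

    peak_flops = 3e14         # ~300 TFLOP/s bf16 per A100-class GPU (assumed)
    utilization = 0.4         # assumed real-world utilization
    price_per_gpu_hour = 2.0  # assumed rental price in dollars

    gpu_hours = train_flops / (peak_flops * utilization) / 3600
    print(f"~{gpu_hours:,.0f} GPU-hours, ~${gpu_hours * price_per_gpu_hour:,.0f}")

With those assumptions it comes out to roughly two million GPU-hours, around $4M, the same order of magnitude as the figure above.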
Distillation can extract the knowledge from an existing model into a newly trained one. That doesn't solve the cost problem, but costs are steadily coming down.
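For anyone unfamiliar: the student model is trained to match the teacher's output distribution instead of only hard labels. A minimal sketch of the classic softened-softmax distillation loss, assuming PyTorch and placeholder teacher/student models:

    import torch.nn.functional as F

    def distillation_loss(student_logits, teacher_logits, temperature=2.0):
        # Soften both distributions; push the student toward the teacher.
        # F.kl_div expects log-probabilities as input, probabilities as target.
        t = temperature
        student_logp = F.log_softmax(student_logits / t, dim=-1)
        teacher_p = F.softmax(teacher_logits / t, dim=-1)
        # The t**2 factor keeps gradients comparable across temperatures.
        return F.kl_div(student_logp, teacher_p, reduction="batchmean") * t * t

    # Usage sketch, with `teacher` frozen and `student` being trained:
    #   with torch.no_grad():
    #       teacher_logits = teacher(batch)
    #   distillation_loss(student(batch), teacher_logits).backward()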
I'll give you the only upvote you'll probably get for that sentiment around here. Enjoy your trip to -4 (Dead)!
If we step back and examine LLMs more broadly (beyond our personal use cases, beyond "economic impact", beyond the underlying computer science), what we are largely looking at is an emerging means of collaboration. I am not an expert computer scientist, and yet I can "collaborate" (I almost feel bad using this term) with expert computer scientists when my LLM helps me design my particular algorithm. I am not an expert on Indonesian surf breaks, yet I tap into an existing knowledge base when I query my LLM while planning the trip. I am very naive about a lot of things, and thankfully there are numerous ways to integrate with experts and improve my capacity to engage with whatever I am naive about, with LLMs offering the latest groundbreaking method.
This is the most appropriate lens through which to assess AI and its impact on open source, intellectual property, and other proprietary assets. Alongside this new form of collaboration comes a restructuring of power. It's not clear to me how our various societies will design this restructuring (so far we are collectively doing nearly nothing), but restructuring power is not a technical process; it is cultural and political. Engineers will only offer so much help here.
For the most part, it is up to us to collectively orchestrate the new power structure, and I am still seeing very little literature on the topic. If anyone has a reading list, please share!
> what we are largely looking at is an emerging means of collaboration.
They surpass open source ("out-open-sourcing open source") by learning skills everywhere and opening them up for anyone who needs them later.
> Please DO NOT TURST ANY WORD THEY SAY. They're very good at lingual manipulation.
I don't know if this was an intentional misspelling or not, but it's damn funny.
It is likely intentional, as the author is battling AI by every means possible. However, it comes across as funny and hopeless at the same time.
Not hard to believe. I’ve been using Claude Code and am hesitant to publish publicly because I’m concerned about copyright violations. It would be nice if there were a registry (besides GitHub) where I could compare “new” code against public repositories.
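Such a registry would probably match fingerprints rather than raw text, so trivial edits don't defeat it. A rough sketch of the standard k-gram shingling idea (the approach MOSS-style plagiarism detectors build on), with deliberately crude normalization:

    import hashlib
    import re

    def fingerprints(code: str, k: int = 5) -> set:
        # Strip comments and normalize whitespace so trivial edits don't
        # change the fingerprint set. Real tools use a proper lexer.
        code = re.sub(r"#.*|//.*", "", code)
        tokens = re.findall(r"\w+|[^\w\s]", code)
        return {
            hashlib.sha1(" ".join(tokens[i:i + k]).encode()).hexdigest()
            for i in range(len(tokens) - k + 1)
        }

    def similarity(a: str, b: str) -> float:
        # Jaccard overlap of the two fingerprint sets.
        fa, fb = fingerprints(a), fingerprints(b)
        return len(fa & fb) / len(fa | fb) if (fa or fb) else 0.0

    # A registry could index fingerprints of public repos and flag uploads
    # whose similarity to any indexed file crosses a threshold.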
This is the new reality. Information is raw entropy encoded in weights; it doesn't matter if it's text, images, video, or 3D. Assets (or what were formerly known as assets) now belong to the big labs, if they're on the internet.
The internet plus AI yields the tragedy of the commons, manifested in the digital world.
Sorry to say, but this is going to be the new normal, and it's going to be quite difficult to stop. Your moat as a creator is your personal brand and the community you build around your tools.
I just hope that means we're all allowed to feed leaked source code to our own AIs then. This is mandatory if we're to have any sort of coherent legal precedent.
This is a blatant attempt to normalize. "Bad people do unethical things, I guess we'll have to live with it and shut up" is the vibe.
The author is doing good. It's not a new normal until everybody goes quiet.
> This is a blatant attempt to normalize.
This doesn't mean anything. You have no ability to "normalize" anything. It's not an action that somebody can take.
> it's not a new normal until everybody goes quiet
Real "let me speak to your manager" energy. Nobody is waiting for you to go quiet to get on with things.
> You have no ability to "normalize" anything.
Normalisation isn't something that one person by themselves can achieve. It only happens when public opinion is swayed. How is it swayed? By people deliberately trying to sway it, like GP here.
If you are instead arguing that normalisation is not really a thing at all: What do you call the change in attitudes to people who are left-handed, disabled, or homosexual?