AI-powered open-source code laundering

(github.com)

61 points | by genkiuncle 7 hours ago

48 comments

  • laurex 2 minutes ago

    I'm interested in a new kind of license, which I'm calling "relational source": not about money or whether a product is commercial, but about whether there's an actual person who wants to use the code, with some kind of AGPL-esque mechanism to ensure no mindless ingestion. Perhaps this would never work, but it also breaks the spirit of everything I love about OSS to have AI erasing the contributions of the people who put their time into doing the work.

  • foxylad 26 minutes ago

    This will kill open source. Anything of value will be derived and re-derived and re-re-derived by bad players until no-one knows which package or library to trust.

    The fatal flaw of the open internet is that bad players can exploit it with impunity. It happened with email, it happened with websites, it happened with search, and now it's happening with code. Greedy people spoil good things.

    • awesome_dude 18 minutes ago

      If this were true, why hasn't it happened in the last... 30 or 40 years that FOSS code has been published on the internet?

      • makeitdouble 9 minutes ago

        Copyright was the base protection layer. Not in the "I own it" sense, but in the "you can't take it and run with it" sense.

        Its current weakening opens the door to abuses that we don't have the proper tools to deal with yet. Perhaps new ones will emerge, but we'll have to see.

      • ares623 13 minutes ago

        Last I checked, LLMs didn't exist until a few years ago.

      • croes 8 minutes ago

        Same reason fake images and videos are now more common. Photoshop existed 30 years ago.

        Before LLMs you needed time and skill to do it; with AI you need less of both.

  • cientifico 5 minutes ago

    The license was MIT until two months ago.

    That gives anyone the right to take the source code at that commit and do whatever they want with it.

    The article does not specify whether the company is still using the code AFTER the license change.

    The rest of the points are still valid.

  • userbinator 2 hours ago

    Hopefully the spread of AI will make more people realise that everything is a derivative work. If it wasn't an AI, it was a human standing on the shoulders of giants.

    • as1mov 37 minutes ago

      The offending repository copies files verbatim while stripping the license headers from those files. That's not "standing on the shoulders of giants".

      • typpilol 3 minutes ago

        That doesn't even seem like AI, just direct copy-pasting lol

    • croes 5 minutes ago

      AI makes it easy for people to claim they did the work, so fewer people are willing to do the real work. That means the giants won't grow.

    • smj-edison 2 hours ago

      Yeah, this is where I find the copyright argument a little weak. Because how do artisans learn their craft? By observing others' work.

      Instead, I feel like the objections are (rightly) these two issues:

      1. GenAI operates at a much larger scale than an individual artist. I don't think artists would have an issue with someone commissioning a portrait, say, in the style of Van Gogh (the copyright argument). They would have an issue if that artist painted 100,000 pictures a day in the style of Van Gogh.

      2. Lack of giving back: some of the greatest artists have internalized great art from previous generations, and then something miraculous happens. An entirely new style emerges. They have now given back to the community that incubated them. I don't really see this same giving back with GenAI.

      Edit: one other thought. Adobe used their own legally created art to train their model, and people still complain about it, so I don't buy the copyright argument if they're upset about Adobe's GenAI.

      Edit 2: I'm not condoning blatant copyright infringement like is detailed in this post.

      • visarga an hour ago

        1. If I wanted the "style of Van Gogh" I would simply download Van Gogh; why waste time and money on approximate AI? But if I want something else, then I can use AI. And GenAI is really the worst infringement tool: for example, would anyone try to read a bootleg Harry Potter out of an LLM to avoid paying? I don't think so.

        2. LLMs will give back what you put in plus what they learned; it's your job to put in the original parts. But every so often this interaction will spark new ideas. The LLM+human team can get where neither would get alone, building on each other's ideas.

      • bluefirebrand an hour ago

        > Because how do artisans learn their craft? By observing others' work

        I don't think that computer systems of any kind should have the same right to fair use that humans have

        I think humans should get fair use carve outs for fanart and derivative work, but AI should not

      • charcircuit an hour ago

        >Lack of giving back

        I disagree. There is a ton of AI-generated text, code, images, and video available completely free for people to learn from.

      • alganet 2 hours ago

        Copyright is a nightmare. It's just that it sounds like a gentler nightmare than hyperscaled algorithms controlled by a few.

    • add-sub-mul-div 2 hours ago

      Nothing subverts my defense of human creativity more than the cliched human defenses of AI.

      • monero-xmr an hour ago

        For those of us who exceed the AI, it raises our value enormously. You see it in the pay of AI engineers. But in the high-interest-rate world, those of us who continue to be employed are commanding higher wages, as far as I can tell. It is a culling of the lesser-than.

        One unfortunate side effect is that junior engineers who cannot immediately exceed the AI are not being hired as often. But this era echoes the dotcom boom, where very low-skilled people commanded very high wages. Universities, which have always been white-collar job training but pretended they weren't, are being impacted greatly.

        https://registrar.mit.edu/stats-reports/majors-count

        24% of MIT undergraduates this year have Computer Science in their major's title (I asked ChatGPT to calculate this from the difficult-to-parse website). A quarter of all MIT undergraduates are not being trained to be future PhD researchers; MIT, like all other schools, is training the vast majority of its students for private-sector workforce jobs.

        The culling is happening all over. We will likely go from around 4,000 colleges in America now to fewer than 1,000 over the next 15 years.

        This is a good thing. The cost of university degrees is far too high. We are in the midst of a vast transition. College should return to being the purview of the truly intelligent and the children of the rich, as it was for all time before WW2. This very weird experiment in human history is ending, and it cannot happen soon enough.

        • teiferer 2 minutes ago

          > College should return to being the purview of the truly intelligent and the children of the rich, as it was for all time before WW2.

          Yeah, the world was a better place when it was mostly white males having that chance.

          /s

        • card_zero 4 minutes ago

          35%, ignoring "secondary majors" which may or may not coincide with primary majors that also have CS in the title.

        • nhinck3 26 minutes ago

          A racist crypto shill waxing poetic about the value of tertiary education? I'm positively enraptured; tell me more about how you exceed the AI when you can't parse a basic data table.

        • sciencejerk 19 minutes ago

          > College should return to being the purview of the truly intelligent and the children of the rich, as it was for all time before WW2.

          You're likely correct that we're witnessing a reconsolidation of wealth and the extinction of the middle class in society, but you seem happy about this? Be careful what you wish for...

          • ares623 8 minutes ago

            They probably think they’re one of the “truly intelligent and children/parent of the rich” lol

          • monero-xmr 16 minutes ago

            Alternatively: middle-class jobs do not require a college degree. Perhaps a college degree is primarily a signalling mechanism for adherence to a bygone era of societal norms. But the price is far too high to justify it, and the market will create alternative proofs of those norms at a far cheaper price. Which is happening as we debate.

            My concern now is the large number of under-employed college graduates who are in debt for worthless degrees, feeling pinched because the debt far surpasses their market value. This has been the case for a long time, but it has now reached the upper echelons of academia, where even Ivy League grads cannot get employment. You need to recalibrate your ire toward the correct target.

            • novemp 10 minutes ago

              Yeah, sure, not every job should require a degree, but that doesn't justify keeping The Poors from pursuing education.

              Some of us value education for its own sake, not as a prerequisite for employment.

              • monero-xmr 8 minutes ago

                You are assuming the only avenue to "education" is through the university experience

                • novemp 6 minutes ago

                  Some people learn best in structured class settings.

    • hu3 2 hours ago

      This. AI is a magnificent way to make the entire world's codebase available as a giant, cross-platform standard library.

      I welcome AI to copy my crap if that's going to help anyone in the future.

      • beeflet an hour ago

        Except closed-source software, which it isn't trained on.

      • alganet 2 hours ago

        You forgot to mention that if things continue as they are, a very small group of people will have complete control over this giant library.

        • hu3 2 hours ago

          It's a concern. But there are open source models.

          • vineyardmike an hour ago

            Open source model, created at great expense… by a still small cohort of people.

            There are like a dozen organizations globally creating anything close to state of the art models. The fact that you can use some for free on your own hardware doesn’t change that those weights were trained by a small cohort of people, with training data selected by those people, and fine-tuning and “alignment” created by those people.

            Sure, you can fine-tune the smaller ones yourself, but that still leaves you at the will of the original creator.

          • alganet an hour ago

            No, there aren't.

            There is open source training and inference software. And there are open weights.

            Those things are not enough to reproduce the training.

            Even if you had the hardware, you would not be able to recreate llama (for example) because you don't know what data went into the training.

            That's a very weird library. You can get their summaries, but you don't have access to the original works used when creating it. Sounds terrible, open source or not.

          • zdwolfe an hour ago

            I find it odd that any LLM could be considered open source. Sure, the weights are available to download and use, but you can't reasonably reconstruct the model: it's impractical for an individual to gather a useful dataset or spend $5,000,000+ of GPU time on training.

            • jsight an hour ago

              Distillation can extract the knowledge from an existing model into a newly trained one. That doesn't solve the cost problem, but costs are steadily coming down.
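
              (A minimal sketch of what distillation means in practice, assuming PyTorch; this is the standard soft-target distillation loss, and the names here are illustrative, not any particular lab's code.)

                import torch.nn.functional as F

                def distillation_loss(student_logits, teacher_logits, temperature=2.0):
                    # Soften both output distributions with a temperature, then push the
                    # student's distribution toward the teacher's via KL divergence.
                    t = temperature
                    student_log_probs = F.log_softmax(student_logits / t, dim=-1)
                    teacher_probs = F.softmax(teacher_logits / t, dim=-1)
                    return F.kl_div(student_log_probs, teacher_probs,
                                    reduction="batchmean") * (t * t)

              Averaged over batches of prompts, this trains the student to mimic the teacher's outputs without ever seeing the teacher's original training data, which is why distillation sidesteps the dataset problem (though not the compute problem).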

    • CamperBob2 2 hours ago

      I'll give you the only upvote you'll probably get for that sentiment around here. Enjoy your trip to -4 (Dead)!

  • arthurofbabylon 2 hours ago

    If we step back and examine LLMs more broadly (beyond our personal use cases, beyond "economic impact", beyond the underlying computer science) what we are largely looking at is an emerging means of collaboration. I am not an expert computer scientist, and yet I can "collaborate" (I almost feel bad using this term) with expert computer scientists when my LLM helps me design my particular algorithm. I am not an expert on Indonesian surf breaks, yet I tap into an existing knowledge base when I query my LLM while planning the trip. I am very naive about a lot of things and thankfully there are numerous ways to integrate with experts and improve my capacity to engage in whatever I am naive about, LLMs offering the latest ground-breaking method.

    This is the most appropriate lens through which to assess AI and its impact on open source, intellectual property, and other proprietary assets. Alongside this new form of collaboration comes a restructuring of power. It's not clear to me how our various societies will design this restructuring (so far we are collectively doing nearly nothing) but the restructuring of these power structures is not a technical process; it is cultural and political. Engineers will only offer so much help here.

    For the most part, it is up to us to collectively orchestrate the new power structure, and I am still seeing very little literature on the topic. If anyone has a reading list, please share!

    • visarga an hour ago

      > what we are largely looking at is an emerging means of collaboration.

      They surpass open source, "out-open-sourcing open source" by learning skills everywhere and opening them up for anyone who needs them later.

  • ugh123 2 hours ago

    > Please DO NOT TURST ANY WORD THEY SAY. They're very good at lingual manipulation.

    I don't know if this was an intentional misspelling or not, but it's damn funny.

    • josfredo 22 minutes ago

      It is likely intentional, as the author is battling AI by every means possible. However, it comes across as funny and hopeless at the same time.

  • ebcode 3 hours ago

    Not hard to believe. I've been using Claude Code and am hesitant to publish publicly because I'm concerned about copyright violations. It would be nice if there were a registry (besides GitHub) where I could compare "new" code against public repositories.
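
    (Purely as a hypothetical sketch of what such a comparison could look like, with no real service behind these names: hash overlapping token "shingles" of a file and measure how much of a "new" file already appears in an index built from public repositories.)

      import hashlib
      import re

      def shingles(source: str, k: int = 8) -> set[str]:
          # Tokenize, then hash every run of k consecutive tokens.
          tokens = re.findall(r"\w+|[^\w\s]", source)
          return {
              hashlib.sha1(" ".join(tokens[i:i + k]).encode()).hexdigest()
              for i in range(max(len(tokens) - k + 1, 1))
          }

      def overlap(new_code: str, corpus_index: set[str]) -> float:
          # Fraction of the new file's shingles already present in the corpus.
          s = shingles(new_code)
          return len(s & corpus_index) / len(s) if s else 0.0

    A real registry would need a far larger index and smarter normalization, but even a local check like this could flag verbatim copies of the kind described in the post.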

  • dvrp 2 hours ago

    This is the new reality. Information is now raw entropy encoded in weights; it doesn't matter if it's text, image, video, or 3D. Assets (or what were formerly known as assets) now belong to the big labs, if they're on the internet.

    Internet plus AI implies the tragedy of the commons manifested in the digital world.

  • CuriouslyC 3 hours ago

    Sorry to say but this is going to be the new normal, and it's going to be quite difficult to stop. Your moat as a creator is your personal brand and the community you build around your tools.

    • o11c 3 hours ago

      I just hope that means we're all allowed to feed leaked source code to our own AIs then. This is mandatory if we're to have any sort of coherent legal precedent.

    • throwaway290 3 hours ago

      This is a blatant attempt to normalize it. "Bad people do unethical things, I guess we'll have to live with it and shut up" is the vibe.

      The author is doing good. It's not a new normal until everybody goes quiet.

      • pessimizer 2 hours ago

        > This is a blatant attempt to normalize it.

        This doesn't mean anything. You have no ability to "normalize" anything. It's not an action that somebody can take.

        > it's not a new normal until everybody goes quiet

        Real "let me speak to your manager" energy. Nobody is waiting for you to go quiet to get on with things.

        • akoboldfrying 24 minutes ago

          > You have no ability to "normalize" anything.

          Normalisation isn't something that one person by themselves can achieve. It only happens when public opinion is swayed. How is it swayed? By people deliberately trying to sway it, like GP here.

          If you are instead arguing that normalisation is not really a thing at all: what do you call the change in attitudes toward people who are left-handed, disabled, or homosexual?