3 comments

  • yladiz 5 hours ago

    > California lawmakers are again considering A.B. 412, a bill that would require AI developers to identify and disclose copyrighted works used to train generative AI systems.

    > The problem this year is the same as last year: it’s practically impossible to comply with this law. The bill demands information that often does not exist, and cannot realistically be obtained.

    > Its definition of “developer” extends to anyone who makes a generative AI model available to Californians.

    I get that this would burden up-and-coming companies that want to train new models, but in general I don't think it's a bad thing that a company needs to know where the material they train their model on comes from, and know its copyright status. If it's actually an impossible problem, then maybe the whole system is unworkable. Assuming that model training isn't fundamentally considered fair use, how else can you approach this problem?

    • ElevenLathe 5 hours ago

      It's wild how software BoM is taking off at the same time that LLM BoM is being declared literally impossible. IMO the threat model is roughly the same: if you can't account for the provenance of all the text in your training set, how can you say that it hasn't been poisoned?

      • Supermancho 3 hours ago

        The wet dream of copyright hoarders. Everyone must be responsible for everything they have ever typed, spoken, referenced, or processed over the internet, with regard to all enforceable copyright.

        > The copyright holder's need for control is so desperate because it is so unnatural. Tyranny requires constant effort. It breaks, it leaks. Authority is brittle. Oppression is the mask of fear.

        Come at me Big Mouse.