California governor signs AI transparency bill into law

(gov.ca.gov)

314 points | by raldi 2 days ago ago

225 comments

  • hodgesrm 2 days ago ago

    What real-world problem does any of this solve? For instance, how does it protect my IP from being vacuumed up and used by LLMs without permission from or payment to me?

    • tadfisher 2 days ago ago

      The problem it solves is providing any sort of baseline framework for lawmakers and the legal system to even discuss AI and its impacts based on actual data instead of feels. That's why so much of it is about requiring tech companies to publish safety plans, transparency reports and incidents, and why the penalty for noncompliance is only $10,000.

      A comprehensive AI regulatory action is way too premature at this stage, and do note that California is not the sovereign responsible for U.S. copyright law.

      • iinnPP 2 days ago ago

        If I had a requirement to either do something I didn't want to do or pay a nickel, I'd just fake doing what needed to be done and wait for the regulatory body to fine me 28 years later after I exhausted my appeal chain. Luckily, inflation turned the nickel into a penny, now defunct, and I rely on the ability to pay debts in legal currency to use another 39 years of appeals.

        • kbenson 2 days ago ago

          And if someone later proposes more aggressive action, they now have a record of all the times you've failed to do that thing, which they can point to as evidence that the current level of penalties is not sufficient.

          That said, the penalty is not just $10k. It's $10k for an unknowing violation, $100k for a knowing violation that "does not create a material risk of death, serious physical injury, or a catastrophic risk" or an unknowing violation that does create that risk, and $10m if you knowingly violate it and it does create a risk of death or serious physical injury, etc.

          I imagine the legal framework, and the small penalty if you fail to actually publish something, can also play into the knowing/unknowing determination if they investigate you.
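
          A rough sketch of that tiering in Python, based on the descriptions in this thread (and the bill excerpt quoted elsewhere on this page) rather than the statute's exact text, so treat the conditions and amounts as assumptions:

              def max_penalty(knowing: bool, material_risk: bool, repeat: bool = False) -> int:
                  """Maximum civil penalty, in dollars, for a single violation (per this thread's reading)."""
                  if knowing and material_risk:
                      # Knowing violation creating a material risk of death, serious
                      # physical injury, or a catastrophic risk: $1m first, $10m after.
                      return 10_000_000 if repeat else 1_000_000
                  if knowing or material_risk:
                      # Knowing violation without such a risk, or an unknowing one with it.
                      return 100_000
                  return 10_000  # unknowing violation, no material risk

              print(max_penalty(knowing=False, material_risk=False))  # 10000
              print(max_penalty(knowing=True, material_risk=True))    # 1000000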

        • zie 2 days ago ago

          28 years for an appeals chain is a bit longer than most realities I'm aware of. A dozen years at the top end would be more in line with what I've seen out there.

          In general though, it's easier to just comply, even for the companies. It helps with PR and employee retention, etc.

          They may fudge the reports a bit, even on purpose, but all groups of people do this to some degree. The question is, when does fudging go too far? There is some gray, but there isn't infinite amounts of gray.

          • iinnPP 2 days ago ago

            For sure, it's meant to go a bit too far I guess. I'll be more real.

            This is to allow companies to make entirely fictitious statements that they will claim satisfy their interpretation. The lack of fines will suggest compliance. Proving the statement is fiction isn't ever going to happen anyway.

            But it's also such a low fine that inflation will eat it over those 12 years.

            • mlyle 2 days ago ago

              In general, when you are appealing fines you have to have posted them as bond.

              It's a low fine, but your particular objection is invalid.

            • BizarroLand a day ago ago

              12 years would also be extraordinary; the great majority of these are dealt with within a year or two.

              But I also understand that you were using hyperbole to emphasize your point, so there's not actually a reason to argue this.

      • johnnyanmac 2 days ago ago

        >and why the penalty for noncompliance is only $10,000.

        I think they were off by an order of magnitude on this fine. The PR hit from reporting anything bad about your AI is probably worth more than the fine for non-compliance. $100k would at least start to dent the bumper.

        • tadfisher 2 days ago ago

          Hint: It's low because the tech companies are already in agreement with the legislation. This is a huge win compared to a blanket regulatory push.

          • theptip 2 days ago ago

            I thought OpenAI campaigned real hard against this one?

          • johnnyanmac 2 days ago ago

            I want to hope so. They'll agree until the bubble is close to bursting, and then suddenly that fine might be better to just eat.

            • HPMOR 2 days ago ago

              There is no bubble. Your priors are not serving you well here.

              • BolexNOLA 2 days ago ago

                If I had to shoot from the hip and guess: 30% of the current crop of AI startups are going to make it at best. Frankly that feels insanely generous but I’ll give them more credit than I think they deserve re: ideas and actual staying power.

                Many will crash in rapid succession. There isn’t enough room for all these same-y companies.

                • chrisweekly 2 days ago ago

                  I thought a 10% success rate was the baseline for startups -- which would make your 30% estimate generous indeed.

                  • BolexNOLA 2 days ago ago

                    I'm admittedly hedging my bets in both directions a bit since really none of us know anything about what's going to happen lol

              • anukin a day ago ago

                Why do you say that there is no bubble? Do you feel the investment that went into this justifies the results and returns?

              • marcellus23 2 days ago ago

                The AI bubble is a thing but so was the dot com bubble. It doesn't mean the technology is useless.

              • johnnyanmac 2 days ago ago

                Historically speaking, people saying "history won't repeat this time, it's different" have a pretty bad track record. Do we remember what the definition of "insanity" is?

                • baconbrand 2 days ago ago

                  This seems different to me than other hype bubbles because “the human intelligence/magical worker replacement machine isn’t in that guy’s box or the other guy’s box or even the last version of my box but it surely is in this box now that I have here for you for a price...” is just about infinitely repackageble and resellable to an infinite (?) contingent of old gullible men with access to functionally infinite money. Not sure if that’s lack of imagination or education on my part or what.

                • DrewADesign 2 days ago ago

                  “The .com economy is a new business paradigm: the rules have changed.”

                  - pioneers in wrongness 25 years ago. Oft copied, but never vindicated.

                  • jefftk 2 days ago ago

                    That's a weird example, because they were vindicated. The .com economy really did massively change the way business is done. They just (fatally, for most of the companies) overestimated how quickly the change would happen.

                    • 2 days ago ago
                      [deleted]
                    • BolexNOLA 2 days ago ago

                      Having a bubble doesn’t mean the industry entirely implodes and goes away forever or otherwise doesn’t change things. It’s often a phase.

                      • mlyle 2 days ago ago

                        I think a bubble implies that the industry was overvalued compared to its long term fundamental value.

                        Having a dip and then a return to growth and exceeding the previous peak size isn't exactly a bubble. The internet and tech sector has grown to dominate the global economy. What happened was more of a cyclical correction, or a dip and rebound.

                        • johnnyanmac 2 days ago ago

                          >Having a dip and then a return to growth and exceeding the previous peak size isn't exactly a bubble.

                          That's exactly what a bubble is. You blow too fast and you get nothing but soap. You blow slow, controlled breaths balancing speed and accuracy and you get a big, sustainable bubble.

                          No one's saying you can't blow a new bubble, just that this current one isn't long for this world. It's just that the way investments work is that the first bubble popping will scare off a lot of the biggest investors who just want a quick turnaround.

                          • mlyle 2 days ago ago

                            A bubble's when values inflate well beyond intrinsic valuations.

                            You can have a crash without this having happened.

                            • johnnyanmac 2 days ago ago

                              Yes, indeed. Thought I'd change "intrinsic valuations" to "rapid change in valuations".

                              If you deflate over time slowly (or simply slow growth expectations) you can prevent a pop. That doesn't tend to be what historically happens, though.

                              • mlyle 2 days ago ago

                                > to "rapid change in valuations".

                                Yes. This seems to be what you're doing-- carrying the metaphor too far. I believe the common use relates the asset prices to intrinsic or fundamental values.

        • kbenson 2 days ago ago

          There's a graduated penalty structure based on how problematic it is. It goes from $10k to $100k and then $10m, depending on whether you're in violation knowingly or unknowingly and whether there's a risk of death or serious physical harm, etc.

      • jacquesm 2 days ago ago

        > A comprehensive AI regulatory action is way too premature at this stage

        Funny, I think it is overdue.

        • andsoitis 2 days ago ago

          > I think it is overdue.

          Why?

    • nomel 2 days ago ago

      > Government Operations Agency to develop a framework for creating a public computing cluster.

      The meat of the bill is that some government contractors are about to get very rich. And, if history reflects the future, some portion will be no-bid, to make sure the money goes to exactly who he wants it to go to: https://www.sacbee.com/opinion/editorials/article250348451.h...

      • esalman 2 days ago ago

        Talking to the government is a rare skill and those who do tend to get very rich. Just look at Elon Musk.

    • podgietaru 2 days ago ago

      Protection for whistleblowers - which might expose nefarious actions

      • freedomben 2 days ago ago

        I think protection for whistleblowers both in AI and in general is a good thing, but ... do we really need a special carveout for AI whistleblowers? Do we not already have protections for them, or is it insufficient? And if we don't have them already, why not pass general protections instead of something so hyper-specific?

        (not directing these questions at you specifically, though if you know I'd certainly love to hear your thoughts)

        • AnthonyMouse 2 days ago ago

          You have to understand what the purpose of this bill is.

          It's not supposed to do anything in particular. It's supposed to demonstrate to the public that lawmakers are Taking Action about this whole AI thing.

          An earlier version of the bill had a bunch of aggressive requirements, most of which would have been bad. The version that passed is more along the lines of filing paperwork and new rules that are largely redundant with existing rules, which is wasteful and effectively useless. But that was the thing that satisfied the major stakeholders, because the huge corporations don't care about spending ~0% of their revenue on some extra paper pushers and the legislators now get to claim that they did something about the thing everybody is talking about.

        • abakker 2 days ago ago

          I think the idea is that explicit protections might encourage whistle-blowing, especially since the domain is nascent enough that it's not clear what you'd blow the whistle on that might be unique to the companies that make foundation models. In many cases, there will be whistleblowers who disclose both what is being fed into models and details in aggregate about what users of models can do.

        • cosmic_cheese 2 days ago ago

          Could be mainly to send a message. “Remember, no funny business with whistleblowers. We’re watching.”

    • simondotau 2 days ago ago

      This isn’t necessarily about fixing today’s real-world problems. Transparency is an enabler for identifying (or avoiding) a class of future real-world problems.

      • BolexNOLA 2 days ago ago

        Basically what many of us called for 2-3 years ago and were called Luddites over.

    • bloodyplonker22 2 days ago ago

      It solves a very real real world problem: putting more money into the hands of government officials.

      • array_key_first 2 days ago ago

        This is, like, the stupidest and most inefficient way for a government to make money.

        I know it's fun and all to circle jerk about how greedy those darn bureaucrats are - but we're all aware they control the budget, right? They could just raise taxes.

        I don't think they're fining companies... sigh... 10,000 dollars as some sort of sneaky "haha gotcha!" scam they're running.

        • votepaunchy a day ago ago

          Raising taxes does not solve the problem of getting the money out to the right people.

          • array_key_first a day ago ago

            The right people? I just don't think CA, which has the fourth largest GDP in the world, is trying to target the likes of OpenAI for a measly fucking 10k.

      • willmadden 2 days ago ago

        Bingo.

    • apercu 2 days ago ago

      My first impression of your post was "it's not perfect, so why do it?" - I hope I'm wrong.

      Hard to tell on the interwebs so apologies if that wasn’t the intent.

      • ipaddr 2 days ago ago

        What if it was worded: doing this distracts from the real issues, like IP theft. Does a published safety plan really help, or is this just another regulation / red tape exercise that increases government headcount/cost for political purposes?

        • apercu a day ago ago

          Or for performative purposes. I doubt it's to increase headcount.

          But, I ask, should they do nothing?

    • KurSix 2 days ago ago

      In a way, it assumes the training has already happened and focuses on what comes after

    • egorfine 2 days ago ago

      > What real-world problem does any of this solve?

      Drives AI innovation out of California.

      • superfrank 2 days ago ago

        (X) Doubt

        Here's a list of the 50 biggest AI companies from April of this year. Three quarters of the companies on that list are located in the Bay Area. If companies are already willing to pay higher-than-average taxes, wages, and property costs to be located in California, I doubt "you've got to publish safety standards on your website" is going to be the thing that drives them out.

        CalCompute sounds like it's going to allow for more innovation than ever given that it should lower the barrier to entry for edge AI research.

        50 Biggest AI Companies: https://www.forbes.com/lists/ai50/

      • micromacrofoot 2 days ago ago

        I think it applies to companies providing services to California (based on how much data from Californians they process), not just those operating within the state, similar to the CCPA.

        • egorfine 2 days ago ago

          sign one more geo region to block. One more region to remember.

          Internet is becoming fragmented. :-(

          • cosmic_cheese 2 days ago ago

            Geoblocking the economy that’s the largest in the US and fourth largest in the world (if it were a country) over some kid gloves regulations would be phenomenally stupid.

            • egorfine 2 days ago ago

              What else could I possibly do if complying is not technically possible?

              • happyopossum 2 days ago ago

                It’s not technically possible to publish a couple of documents?

                • thereisnospork 2 days ago ago

                  Which documents? Which are comprehensively listed where? With what indemnification for a good faith effort?

                  • serf 2 days ago ago

                    That's a ridiculous stance to take, and you could take it all day -- regulations change on the net daily, and it's a full-time job being totally compliant. That's why people make money (..or attempt to..) doing it.

                    • hopelite 2 days ago ago

                      That is just one of the issues: administrative bloat and drag. Not to mention that those kinds of administrative burdens are very likely what crushes innovation and, more importantly to the established players, competition. That is why the large established players are known to encourage administrative hurdles and red tape: they are already established, and in many cases they can simply pass the cost of the administrative burden on to the consumer.

                    • mothballed 2 days ago ago

                      Even better for the company if those people have worked for, or have friends in, the regulatory agency. To, uh, make sure they did it right.

          • jrflowers 2 days ago ago

            You are geoblocking California from your AI company? That’s pretty significant. How much business did your AI company do in California before this news?

          • micromacrofoot 2 days ago ago

            lol no one's going to geoblock california

            • egorfine 2 days ago ago

              What else could I possibly do if complying is not technically possible?

              • micromacrofoot 2 days ago ago

                well certainly not geoblock a lot of your customers

                maybe your situation is different, but if we geoblocked all of california we'd go out of business within a year

                • AnthonyMouse 2 days ago ago

                  For global companies, California is less than 40 million people out of more than 8 billion. For regionally concentrated companies, they only have to care if the region they're concentrated in is California. For everyone else, losing a fraction of a percent of the customer base in exchange for lower compliance costs and legal risks is often completely logical.

                • egorfine 2 days ago ago

                  Yeah, I am with you here, but what else is to be done if legislation is passed that is impossible to comply with? Like in the UK.

                  • micromacrofoot 2 days ago ago

                    The GDPR and other laws were really hammed up by consultants and lawyers looking to make a lot of money claiming it was really difficult to do right, but in my experience it's incredibly overblown.

                    Most of those getting caught up in these laws are very large companies that could comply but consistently don't put forth the effort to do so, even after repeated complaints. And even if you do fall under the eye of regulators (most won't ever), if you show that you're putting forth a good-faith effort to comply, it's not a big deal.

      • 2 days ago ago
        [deleted]
    • TiredOfLife 2 days ago ago

      Same way you currently protect your IP from being vacuumed up and used by humans

    • fourseventy 2 days ago ago

      [flagged]

    • observationist 2 days ago ago

      [flagged]

    • andy99 2 days ago ago

      Your rent seeking is not a real world problem. I'm sceptical about the bill; I would be much more so if it were just some kind of wealth redistribution to the loudest complainers.

      • Retric 2 days ago ago

        Radio stations get to use anyone's music, but they still need to pay to play that music. Requiring payment to use your product isn't rent seeking any more than requiring a hobo to leave your house is.

        AI companies trying to leverage their power and lobby governments so they can stiff the people they should be paying, and thus increase profits, is rent seeking behavior. They aren't creating wealth by not paying; they're just trying to enrich themselves.

        • ndriscoll 2 days ago ago

          Hmm? Creating new models is clearly adding wealth to the world, and it wouldn't terribly surprise me if a lot of source material (e.g. scanned books or recorded music) is older than the people working on models. The history of copyright is basically a perfect example of rent-seeking.

          • Retric 2 days ago ago

            Creating new models doesn’t require taking content without any compensation.

            That’s the basic flaw in any argument around necessity.

            • ndriscoll 2 days ago ago

              No, but society has no reason to grant monopolies on 50 year old publications (e.g. textbooks or news articles written or songs recorded prior to 1975), and the changes that were made to copyright law to extend it into multiple generations were actual rent-seeking. i.e. manipulating public policy to transfer wealth from others to you rather than creating wealth. Going with the original 28 year compromise from a time when publishing was much more expensive, anything prior to 1997 would be free for anyone to use for any purpose with no expectation of compensation. We'd all be far richer culturally if we had this base to freely build upon instead of having the last 100 years stolen.

              Likewise much of the most important information to want to train on (research literature) was just straight up stolen from the public that paid for its creation already.

              By contrast, the models being created from these works are obviously useful to people today. They are clearly a form of new wealth generation. The open-weights models are even an equitable way of doing so, and are competitive with the top proprietary models. Saying the model creators need to pay the people monopolizing generations-old work is the rent-seeking behavior.

              • Retric 2 days ago ago

                It's far more recent works that AI companies care about. They can't copy 50-year-old Python, JavaScript, etc. code because it simply doesn't exist. There's some 50-year-old C code, but it's no longer idiomatic, and so it goes.

                The utility of older works drops off as science marches on and culture changes. The real secret of long copyright terms is that they just don't matter much. Steamboat Willie entered the public domain and for all practical purposes nothing changed. Chip 20 years off current copyright terms and it starts to matter more, but still isn't particularly important. Sure, drop it down to say 5 years and that's meaningful, but now it's much harder to be an author, which means fewer books worth reading.

                • AnthonyMouse 2 days ago ago

                  I still don't really get how compensation is supposed to work just based on the math. Models are trained on billions of works and have a lifetime of around a year; AI companies (e.g. Anthropic) have revenue in the low billions of dollars a year.

                  Even if you took all of that -- leave nothing for salaries, hardware, utilities, to say nothing of profit -- and applied it to the works in the training data, it would be approximately $1 each.

                  What is that good for? It would have a massive administrative cost and the authors would still get effectively nothing.
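
                  Spelled out with made-up but order-of-magnitude figures (both numbers below are assumptions for illustration, not Anthropic's actual ones):

                      annual_revenue = 2e9          # "low billions of dollars a year" (assumed)
                      works_in_training_data = 2e9  # "billions of works" (assumed)

                      print(f"${annual_revenue / works_in_training_data:.2f} per work per year")  # -> $1.00 per work per year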

                  • Retric a day ago ago

                    I think you’re overestimating the number of authors, and forgetting there’s several AI companies. A revenue sharing agreement with 10% going to creators isn’t unrealistic.

                    Google's revenue was $300 billion with $100 billion in profit last year. The AI industry may never reach that size, but $1/person on the planet is only $8 billion; drop that to the ~70% of people who are online and you're down to $5.6 billion.

                    That’s assuming you’re counting books and individual Facebook posts in any language equally. More realistically there’s only 12k professional journalists in the US but they create a disproportionate amount of value for AI companies.
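
                    The same arithmetic, spelled out (the industry-revenue figure and the 10% share are hypotheticals for illustration):

                        people_on_planet = 8e9
                        online_share = 0.70
                        dollars_per_person = 1.0
                        print(f"${people_on_planet * online_share * dollars_per_person / 1e9:.1f}B pool")  # -> $5.6B pool

                        # A 10% revenue share at Google-like scale, for comparison:
                        hypothetical_ai_industry_revenue = 300e9
                        print(f"${hypothetical_ai_industry_revenue * 0.10 / 1e9:.0f}B to creators")  # -> $30B to creators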

                    • AnthonyMouse a day ago ago

                      > Google’s revenue was 300 billion with 100 billion in profits last year, the AI industry may never reach that size but 1$/person on the planet is only 8 billion dollars, drop that to 70% of people are online so your down to 5.6 billion.

                      Google is a huge conglomerate and a poor choice for making estimates because the bulk of their revenue comes from "advertising" with no obvious way to distinguish what proportion of that ad revenue is attributable to AI, e.g. what proportion of search ad revenue is attributable to being the same company that runs the ad network, and to being the default search in Android, iOS and Chrome? Nowhere near all of it or even most of it is from AI.

                      "Counting books and individual Facebook posts in any language equally" is kind of the issue. The links from the AI summary things are disproportionately not to the New York Times, they're more often to Reddit and YouTube and community forums on the site of the company whose product you're asking about and Stack Overflow and Wikipedia and random personal blogs and so on.

                      Whereas you might have written an entire book, and that book is very useful and valuable to human readers who want to know about its subject matter, but unless that subject matter is something the general population frequently wants to know about, its value in this context is less than some random Facebook post that provides the answer to a question a lot of people have.

                      And then the only way anybody is getting a significant amount of money is if it's plundering the little guy. Large incumbent media companies with lawyers get a disproportionate take because they're usurping the share of YouTube creators and Substack authors and forum posters who provided more in aggregate value but get squat. And I don't see any legitimacy in having it be Comcast and the Murdoch family who take the little guy's share at the cost of significant overhead and making it harder for smaller AI companies to compete with the bigger ones.

                      • Retric a day ago ago

                        > Google is a huge conglomerate

                        The point of comparison was simply a large company; the current size of, say, OpenAI while the technology is still fairly shitty is a poor benchmark for where the industry is going. LLMs may even get superseded by something else, but whatever form AI takes, training it is going to require work from people outside the company in question.

                        Attribution is solvable at both a technical and a legal level. There's a reasonable argument that a romance novelist isn't contributing much value, but that's not an argument that nobody should be getting anything. Presumably the best solution for finding value is to let the open market decide through rough negotiations.

                        • AnthonyMouse 21 hours ago ago

                          > LLM’s may even get superseded by something else, but whatever form AI takes training it is going to require work from other people outside the company in question.

                          It's going to require training data, but no incremental work is actually being done; it's being trained on things that were written for an independent purpose and would still have been written whether they were used as training data or not.

                          If something was actually written for the sole purpose of being training data, it probably wouldn't even be very good for that.

                          > Attribution is solvable both at a technical and legal level.

                          Based on how this stuff works, it's actually really hard. It's a statistical model, so the output generally isn't based on any single thing; it's based, a fraction of a percent each, on thousands of different things, and the models can't even tell you which ones.

                          When they cite sources I suspect it's not even the model choosing the sources from training data, it's a search engine providing the sources as context. Run a local LLM and see what proportion of the time you can get it to generate a URL with a path you can actually load.
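
                          A minimal sketch of that check, assuming you've already collected the model's generated URLs into a file (the urls.txt name and the collection step are hypothetical; only the HTTP check is shown):

                              import urllib.request, urllib.error

                              # One model-generated URL per line.
                              with open("urls.txt") as f:
                                  urls = [line.strip() for line in f if line.strip()]

                              ok = 0
                              for url in urls:
                                  try:
                                      req = urllib.request.Request(url, method="HEAD")
                                      with urllib.request.urlopen(req, timeout=10):
                                          ok += 1
                                  except (urllib.error.URLError, ValueError):
                                      pass  # dead host, 404, malformed URL, etc.

                              print(f"{ok}/{len(urls)} URLs actually load ({ok / max(len(urls), 1):.0%})")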

                          > Presumably the best solution for finding value is let the open market decide the rough negotiations.

                          That's exactly the thing that doesn't work here because of the transaction costs. If you write a blog, are you supposed to negotiate with Google so they can pay you half a french fry for using it as training data? Neither party has any use for that; the cost of performing the negotiations is more than the value of the transaction. But the aggregate value being lost if it can't be used as a result of that is significant, because it's a tiny amount each but multiplied by a billion.

                          And then what would happen in practice? Google says that in exchange for providing you with video hosting, you agree to let them use anything you upload to YouTube as training data. And then only huge conglomerates can do AI stuff because nobody else is in a position to get millions of people to agree to that term, but still none of the little guys are getting paid.

                          Restricting everyone but massive conglomerates from doing AI training in order to get them to maybe transfer some money exclusively to some other massive conglomerates is a bad trade off. It's even a bad trade off for the media companies who do not benefit from stamping out competitors to Google and the incumbent social media giants that already have them by the neck in terms of access to user traffic.

                • ndriscoll 2 days ago ago

                  Ostensibly copyright is there to increase economic incentives to make things it protects, and like you said, we can massively cut it down without affecting much there. So focusing on economic viability, set it to something like 15 years for code and 20-30 for everything else. Require registration for everything and source escrow for code and digital art to be granted copyright. That would give a wealth of code to train on already even without people who would be fine freely giving it away. There's also government code as a relatively large public domain source for recent material.

                  Like I said science has mostly been stolen, and has no business being copyrighted at all. The output of publicly funded research should immediately be public domain.

                  Anyway this is beside the point that model creation is wealth creation, and so by definition not rent-seeking. Lobbying for a government granted monopoly (e.g. copyright) is rent-seeking.

                  • Retric 2 days ago ago

                    Exclusively training on 15-year-old source code would make code generation significantly less useful as APIs change.

                    Economic viability and utility for AI training are closely linked. Exclude all written works including news articles etc from the last 25 years and your model will know nothing about Facebook etc.

                    It’s not as bad if you can exclude stuff from copyright and then use that, but your proposal would have obvious gaps like excluding works in progress.

                    • ndriscoll a day ago ago

                      You wouldn't need to exclusively train on 15 year old source code. What I said would simply grant you free access to all 15 year old source code, but you can already train on public domain code and likely any FOSS code without any issue, or if courts do start deciding that models inherit copyright, at the most you might have to link a list of all of the codebases you trained on with license info. The nature of the thing is that any code it spits out is already in source form, so the only missing part is the notice.

                      I suppose we all exist in our own bubbles, but I don't know why anyone would need a model that knows about Facebook etc. In any case, it's not clear that you couldn't train on news articles? AFAIK currently the only legal gray area with training is when e.g. Facebook mass pirated a bunch of textbooks. If you legally acquire the material, fitting a statistical model to it seems unlikely to run afoul of copyright law. Even without news articles, it would certainly learn something of the existence of Facebook. e.g. we are discussing it here, and as far as I know you're free to use the Hacker News BigQuery dump to your liking. Or in my proposed world, comments would naturally not be copyrighted since no one would bother to register them (and indeed a nominal fee could be charged to really make it pointless to do so). I suppose it is an important point that in addition to registration, we should again require notices, maybe including a registration ID.

                      Give a post-facto grace period of a couple weeks/months to register a thing for copyright. This would let you cover any work in progress that gets leaked by registering it immediately, causing the leak to become illegal.

                      • Retric a day ago ago

                        >> It’s not as bad if you can exclude stuff from copyright and then use that

                        Making a copy of a news article etc to train with is on the face of it copyright infringement even before you start training. Doing that for OSS is on the other hand fine, but there’s not that much OSS.

                        I think training itself could reasonably be considered fair use on a case-by-case basis. Training a neural network to just directly reproduce a work would be obviously problematic, etc. There's plenty of ambiguity here.

            • otterley 2 days ago ago

                That's fine, but if you don't want content taken without compensation, don't make it available for free on the Internet. You can't have it both ways, where it's free for individuals to read but not for machines. That's just practically impossible.

        • otterley 2 days ago ago

          The music analogy doesn’t hold. Unlike websites that provide content for free to the public, commercial recording artists don’t make their content available for free on demand to the public. Spotify and radio/TV broadcasters, as well as individuals, don’t get a copy unless they buy one or make arrangements with the publisher or its licensees.

          This is why we’re seeing paywalls go up: authors and publishers of textual content are seeing that they need to protect the value of their assets.

          • Retric 2 days ago ago

            LLMs are being trained on published books, a direct equivalent of records. People were able to get one to reproduce over 40% of Harry Potter and the Sorcerer's Stone word for word: https://arstechnica.com/features/2025/06/study-metas-llama-3...

            There’s zero chance that happened without the book being in their training corpus. Worse, there’s significant effort put into obscuring this.

            • otterley 2 days ago ago

              Yes, they are trained on books, but the courts so far are largely in agreement that AI models are neither copies nor derivative works of the source materials. If you’re searching for legal protections against your works being used for model training, copyright law as written today does not appear to give you cover.

              • Retric 2 days ago ago

                Stuff can take a long time to wind its way through the court system. The worst cases have already failed, but many are going strong; here's a $1.5 billion win for authors.

                https://www.kron4.com/news/technology-ai/anthropic-copyright...

                • otterley 2 days ago ago

                  Anthropic and the authors settled over a portion of the case involving the unauthorized copying of works that were used to train the model. Obtaining the works is a step that happens before the training has begun.

                  “the authors alleged nearly half a million books had been illegally pirated to train AI chatbots...”

                  Finally, a settlement isn’t a “win” from a legal perspective. It’s money exchanged for dropping the case. In almost every settlement, there’s no admission of guilt or liability.

                  • Retric 2 days ago ago

                    The entire case settled; the authors aren't going to appeal when the company can't hand out much more than the $1.5 billion in question and isn't allowed to use the works in question going forward.

                    • otterley 2 days ago ago

                      Before the settlement was made, Judge Alsup found as a matter of law that the training stage constituted fair use.

                      https://storage.courtlistener.com/recap/gov.uscourts.cand.43...

                      • Retric a day ago ago

                        A judge yes, but that’s subject to appeal. The point is it never reached that stage and never will.

                        • otterley a day ago ago

                          As an attorney, I'm trying to understand what you're getting at.

                          His opinion, while interlocutory and not binding precedent, will be cited in future cases. And his wasn't the only one. In Kadrey v. Meta Platforms, Inc., No. 23-cv-03417 (N.D. Cal. June 25, 2025) Judge Chhabria reached the same conclusion. https://storage.courtlistener.com/recap/gov.uscourts.cand.41...

                          In neither case has an appeal been sought.

                          • Retric a day ago ago

                            I’m saying legally things aren’t clear cut at this point.

                            If you’ve read Kadrey the judge says harm from competing with the output of authors would be problematic. Quite relevant for software developers suing about code generation but much harder for novelists to prove. However, the judge came to the opposite conclusion about using pirate websites to download the books in question.

                            A new AI company that is expecting to face a large number of lawsuits and win some while losing others isn’t in a great position.

                            • otterley 13 hours ago ago

                              The judge didn’t come to an opposite legal conclusion about using pirated works. The judge concluded that the claim of using pirated works was unsubstantiated by the plaintiffs.

      • drivebyhooting 2 days ago ago

        I'd rather pay real human authors and artists for their creativity than OpenAI.

        As it is, I would never pay for an AI-written textbook. And yet who will write the textbooks of tomorrow?

        • ronsor 2 days ago ago

          I'd rather not pay OpenAI either. I'll stick with my open-weights models, and I'd rather anachronistic rent-seeking not kill those.

          You're not getting a cent from OpenAI, and the government isn't going to do anything about it. Just get over it.

          • drivebyhooting 2 days ago ago

            All of them are trained on copyrighted data. Why is it okay for a model to serve up paraphrased books but verbatim copies from the Pirate Bay are illegal?

            I don’t deny the utility of LLMs. But copyright law was meant to protect authors from this kind of exploitation.

            Imagine if, instead of "magical AGI knowledge compression", these LLM providers just did a search over their "borrowed" corpus and then performed a light paraphrasing of it.

            • kouteiheika 2 days ago ago

              > Why is it okay for a model to serve up paraphrased books but verbatim copies from the Pirate Bay are illegal?

              Because they are not actually memorizing those books (besides a few isolated pathological cases due to imperfect training data deduplication), and whatever they spit out is in no way a replacement for the original?

              Here's some back-of-the-envelope math: Harry Potter and the Philosopher's Stone is around ~460KB of text and equivalent to ~110k Qwen3 tokens, which gives us ~0.24 tokens per byte. Qwen3 models were trained on 36 trillion tokens, so this gives us a dataset of ~137TB. The biggest Qwen3 model has ~235B parameters and at 8-bit (at which you can serve the model essentially losslessly compared to full bf16 weights) takes ~255GB of space, so the model is only ~0.18% of its training dataset. And this is the best case, because we took the biggest model, and the actual capacity of a model to memorize is only at most ~4 bits per parameter[1] instead of the full ~8 bits we assumed here.

              For reference, the best lossless compression we can achieve for text is around ~15% of the original size (e.g. Fabrice Bellard's NNCP), which is two orders of magnitude worse.

              So purely from information theoretic perspective saying that those models memorize the whole datasets on which they were trained is nonsense. They can't do that, because there's just not enough bits to store all of this data. They extract patterns, the same way that I can take the very same ~137TB dataset and build a frequency table of all of the bigrams appearing in it and build a hidden Markov model out of it to generate text. Would that also be "stealing"? And what if I extend my frequency table to trigrams? Where exactly do we draw the line?

              [1] -- https://arxiv.org/pdf/2505.24832
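
              For anyone who wants to check the arithmetic, here it is spelled out (using the figures claimed above, not independently verified):

                  book_bytes = 460_000                        # ~460KB of text
                  book_tokens = 110_000                       # ~110k Qwen3 tokens
                  tokens_per_byte = book_tokens / book_bytes  # ~0.24

                  training_tokens = 36e12                     # 36 trillion tokens
                  dataset_bytes = training_tokens / tokens_per_byte
                  print(f"dataset ~{dataset_bytes / 2**40:.0f} TiB")  # ~137 TiB

                  model_bytes = 255e9                         # ~255GB at 8 bits per parameter
                  print(f"model is ~{model_bytes / dataset_bytes:.2%} of its training data")  # ~0.17%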

              • AlexandrB 2 days ago ago

                > whatever they spit out is in no way a replacement for the original?

                Is this actually true? I think in many cases it is a replacement. Maybe not in the case of a famous fictional work like Harry Potter, but what about non-fiction books or "pulp" fiction?

                Kind of feels like the bottom rungs of the ladder are being taken out. You either become J.K. Rowling or you starve, there's no room for modest success with AI on the table.

                • otterley 2 days ago ago

                  That was the case before AI, too. A small minority of authors are going to be superstars; the rest are still going to have to have day jobs. Publishers largely exist as hedging machines that finance taking risks on new works that have market potential.

            • terminalshort 2 days ago ago

              > Why is it okay for a model to serve up paraphrased books

              Because that is, and always has been, legal.

          • johnnyanmac 2 days ago ago

            >You're not getting a cent from OpenAI, and the government isn't going to do anything about it. Just get over it

            Can you imagine hackers saying this a decade ago about Facebook harvesting your data? It's a shame how much this community has fallen for the very grifts it used to call out.

            • ronsor 2 days ago ago

              Hackers shilling for copyright is a funny picture.

              • drivebyhooting 2 days ago ago

                Information wants to be free, right?

                Except in this case, after robbing humanity's collective knowledge repository, OpenAI and its ilk want to charge for access to it, and have completely destroyed the economic incentive for any further human development.

                • CamperBob2 2 days ago ago

                  Huh. Funny, my knowledge is still there after supposedly being "robbed," just like Walt Disney's movies and J. K. Rowling's books and Bill Gates's software and Lars Ulrich's music.

                  And "further human development" is exactly what's happening. We've just found the entrance to the next level. Our brains have gone about as far as they can on their own, just as our muscles did in the pre-industrial era. It's time to craft some new tools.

        • andy99 2 days ago ago

          > I’d rather pay real human authors and artists for their creativity than openAI.

          So would I. You've just demonstrated one of the many reasons that any kind of LLM tax that redistributes money to supposedly aggrieved "creators" is a bad idea.

          While by no means the only argument or even one of the top ones: if an author has a product clearly differentiated from LLM-generated content (which all good authors do), why should they also get compensated for the mere existence of LLMs? The whole thing is just "someone is making money in a way I didn't think about, not fair!"

        • CamperBob2 2 days ago ago

          News flash: You won't have any idea whether the textbook is AI-authored or not.

          • bluefirebrand 2 days ago ago

            Then we trust nothing published after 2020 or so

            AI will have effectively frozen human progress in a snapshot of time because nothing created after LLMs became widely available will be trustworthy

            • AlexandrB 2 days ago ago

              Who is "we"? A lot of people don't care and will trust new AI generated stuff regardless of how wrong it is. What this means for human progress is uncertain.

              • CamperBob2 2 days ago ago

                And if they outcompete you, then what? I guess it'll be a little less uncertain at that point.

                • drivebyhooting a day ago ago

                  The same way bacteria can outcompete humans.

  • TheAceOfHearts 2 days ago ago

    I found this website with the actual bill text along with annotations [0]. Section 22757.12 seems to contain the actual details of what they mean by "transparency".

    [0] https://sb53.info/

    • cogman10 2 days ago ago

      > “Artificial intelligence model” means an engineered or machine-based system that varies in its level of autonomy and that can, for explicit or implicit objectives, infer from the input it receives how to generate outputs that can influence physical or virtual environments.

      Correct me if I'm wrong, but it sounds like this definition covers basically all automation of any kind. Like, a dumb lawnmower responds to the input of the throttle lever and the kill switch and generates an output of a spinning blade which influences the physical environment, my lawn.

      > “Catastrophic risk” means a foreseeable and material risk that a large developer’s development, storage, use, or deployment of a foundation model will materially contribute to the death of, or serious injury to, more than 50 people or more than one billion dollars ($1,000,000,000) in damage to, or loss of, property arising from a single incident, scheme, or course of conduct involving a dangerous capability.

      I had a friend that cut his toe off with a lawnmower. I'm pretty sure more than 50 people a year injure themselves with lawn mowers.

      • ajdlinux 2 days ago ago

        Yeah, you're wrong - a court simply isn't going to consider a lawnmower's translation of throttle input to motor power as "inference". The principles of statutory interpretation require courts to consider the context and purpose of the legislation, and everyone knows this is about GPT-5, not lawnmowers.

        In any case, that definition is only used to further define "foundation model": "an artificial intelligence model that is all of the following: (1) Trained on a broad data set. (2) Designed for generality of output. (3) Adaptable to a wide range of distinctive tasks." This legislation is very clearly not supposed to cover your average ML classifier.

      • array_key_first 2 days ago ago

        These type of comments are so annoying.

        "Everything is the same as everything" as an argumentative tactic and mindset is just incredibly intellectually lazy.

        As soon as anyone tries to do anything about anything, ever, anywhere, people like you come out of the woodwork and go "well what about this perfectly normal thing? Is that illegal now too???"

        Why bother making bombs illegal? I mean, I think stairs kill more people yearly har har har! What, now it's illegal to have a two story house?

        Also, elephant in the room: lawnmowers absolutely fucking do come with warnings and safety research. If you develop a lawnmower, YES you have to research its safety. YES that's perfectly reasonable. NO that's not an undue burden. And YES everyone is already doing that.

        • otterley 2 days ago ago

          Also, people seem to forget that when laws are challenged or cases arise that judges exist who are charged with making reasonable interpretations of the law.

      • CGamesPlay 2 days ago ago

        It doesn't really "infer..how to" do anything in that example; rather it simply does those things based on how it was originally designed.

        I'm not saying that it's a great definition, but I will correct you, since you asked.

      • kimixa 2 days ago ago

        If a single design of automated lawnmower cut off 50 toes it should absolutely be investigated.

        Perhaps the result of that investigation is there is no fault on the machine, but you don't know that until you've looked.

        • cogman10 2 days ago ago

          The reason I bring up the definition is that "AI" is defined so loosely as to include dumb lawn mowers.

          In my friend's case, he was mowing on a hill, braced to pull the lawnmower back, and jerked it back onto his foot.

          Edit: Looked it up, the number of injuries per year for lawn mowers is around 6k [1]

          [1] https://www.wpafb.af.mil/News/Article-Display/Article/303776...

          • 2 days ago ago
            [deleted]
        • inerte 2 days ago ago

          Oh that reminds me of strollers that can amputate baby fingers. It happened multiple times. These people need to be sued to death.

        • themafia 2 days ago ago

          Around 20,000 people die every year from falling off a roof or a ladder.

          Some things are inherently dangerous.

      • Ekaros 2 days ago ago

        Motion sensing light? Or light sensing? Motion sensors say with automated doors...

        They infer from fuzzy input when to activate...

      • Hilift 2 days ago ago

        "The Butlerian Jihad was a cataclysmic, millennia-long holy war that completely eradicated artificial intelligence, computers, and sentient robots from human civilization. Taking place over 10,000 years before the events of the original novel, the Jihad was a violent reaction to humanity's over-reliance on and eventual enslavement by "thinking machines". The ban on advanced AI became a foundational law of the Galactic Imperium."

    • dang 2 days ago ago

      Thanks! we'll add that link to the top text.

  • srj 2 days ago ago

    Reading the text, it feels like a giveaway to an "AI safety" industry that will be paid well to certify compliance.

    • KurSix 2 days ago ago

      There's definitely a cottage industry forming around "AI safety compliance"

    • landl0rd 2 days ago ago

      The Big 4 audit firms all have a leech class of boomer audit partners who won't let the advisory arm separate and want money. This is a great new income stream. Figure Deloitte in particular will make out like bandits on this.

    • xpe 2 days ago ago

      The commenter's profile indicates they work for a major AI development company, where being against AI regulation aligns nicely with one's paycheck. See also the scare quotes around "AI safety".

      We all have heard the dogma: regulation kills innovation. As if unbridled innovation is all that people and society care about.

      I wonder if the commenter above has ever worked in an industry where a safety culture matters. Once you have, you see the world a little bit differently.

      Large chunks of Silicon Valley have little idea about safety: not in automotive, not in medicine, not for people, and certainly not for longer-term risks.

      So forgive us for not trusting AI labs to have good incentives.

      When regulation isn't as efficient as we'd like, that is a problem. But you have to factor in what happens if we don't have any regulation. Also, don't forget to call out every instance of insider corruption, kickback deals, and industry collusion.

      • xpe a day ago ago

        I welcome any substantive commentary and disagreement, as always.

        I’m happy to stray outside the herd. HN needs more clearly articulated disagreement regarding AI regulation. I made my comment in response to what seemed like a simplistic, ideology-driven claim. Few wise hackers would make an analogous claim about a system they actually worked on. Thinking carefully about tech but phoning it in for other topics is a double standard.

        Bare downvotes don't indicate (much less explain) one's rationale. I can’t tell if I (1) struck a nerve (an emotional response); (2) conjured a contentious philosophy (i.e. we have a difference in values or preferences or priorities), (3) made a logical error, (4) broke some norm or expectation, or (5) something else. Such a conflation of downvotes pushes me away from HN for meaningful discussion.

        I’ve lived/worked on both coasts, Austin, and more, and worked at many places (startups, academic projects, research labs, gov't, not-for-profits, big tech) and I don’t consider myself defined by any one place or culture. But for the context of AI regulation, I have more fundamental priorities than anything close to "technical innovation at all costs".

        P.S. (1) If a downvote here is merely an expression of “I’m a techno-libertarian” or "how dare you read someone's HN profile page and state the obvious?" or any such shallow disagreement, then IMO that’s counterproductive. If you want to express your viewpoint, make it persuasive rather than vaguely dismissive with an anonymous, unexplained downvote. (2) Some people do the thing where they guess at why someone else downvoted. That’s often speculation.

        • srj 18 hours ago ago

          FWIW I didn't downvote you. I don't work on AI personally, and while I have no way of proving it to you I certainly am not trying to shill for my employer.

          My skepticism of AI safety is just skepticism of AI generally. These are amazing things, but I don't believe the technology is even a road to AGI. There's a reason it can give a chess move when prompted and explain all the rules and notation, but can't actually play chess: it's not in the training data. My issue is simply that I think the hype and anxiety are unnecessary. Now, this is most definitely just my opinion and has nothing to do with the company I work for, which I'd bet would disagree with me on all of this anyway. If I did believe this was a road to AGI, I actually would be in favor of AI safety regulation.

  • tintor 2 days ago ago

    > (3) For a knowing violation that creates a material risk of death, serious physical injury, or a catastrophic risk, a large developer shall be subject to a civil penalty in an amount not to exceed one million dollars ($1,000,000) for a violation that is the large developers first such violation and in an amount not exceeding ten million dollars ($10,000,000) for any subsequent violation.

    So, if a violation is "unknowing" (I assume this means unintentional) and creates a material risk, then there is no penalty?

    Also, the penalties listed are upper bounds only (the penalty will not exceed $X), not lower bounds. A $0 fine fulfills the "not exceeding $10m" rule.

  • tmsh 2 days ago ago

    So if one causes $1B in damages one has to pay a fine of $10M? Similarly for other "catastrophic" damages? WTF. I am very AI pilled but this is no regulation at all. Suppose OpenAI pays their engineers $1M a year. In what world do they have any incentive to work to avoid a $10k fine? Let alone a $1M fine for "catastrophic" damage?

  • parineum 2 days ago ago

    > For purposes of this chapter:

    > (a) “Artificial intelligence model” means an engineered or machine-based system that varies in its level of autonomy and that can, for explicit or implicit objectives, infer from the input it receives how to generate outputs that can influence physical or virtual environments.

    I was curious how California was going to define AI since it's basically a marketing term as of now. Seems like it's defined as a non-biological system that generates outputs from inputs.

    • progval 2 days ago ago

      I love that the two replies to this comment are: "this definition matches no technology in existence" and "So, like my coffee maker?"

    • tadfisher 2 days ago ago

      I'm in disagreement with others here: this definition matches no technology in existence, because AIs can't "infer" anything from their input.

      Likewise, we can't really prove humans can either.

      • Cheer2171 2 days ago ago

        You'd have an argument if 'inference' hadn't become the standard term for "output from a weighted model" for a generation.

      • piperswe 2 days ago ago

        The term for using an ML model is "inference" - perhaps it's not the same definition of the word as you're thinking, but it's a commonly used definition.
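
        A minimal sketch of "inference" in that sense, assuming nothing beyond a made-up weight matrix (numpy used purely for illustration):

            import numpy as np

            # "Inference": run an input through already-learned weights to get an
            # output. No learning happens here; the weights stay fixed.
            weights = np.array([[0.2, -0.5],
                                [0.7,  0.1]])  # stand-in for trained parameters
            x = np.array([1.0, 2.0])           # the model's input
            y = weights @ x                    # the "inferred" output
            print(y)                           # [-0.8  0.9]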

    • dgfitz 2 days ago ago

      So, like my coffee maker?

      • kylehotchkiss 2 days ago ago

        Yeah, that's regulated now by CalCompute. Hope it's compliant!

  • agar 2 days ago ago

    Reading the comments, this is both a "nothing burger" and a reason to geoblock California.

    It does too much, and too little. The costs are too high, and too low. It's an example of corruption, but also a good first step toward regulation. It will drive AI companies out of California, but also attract more companies with a lower bar for edge innovation.

    Can anyone link to an actual informed and nuanced discussion of this bill? Because if it exists here, I haven't found it.

    • parineum 2 days ago ago

      I think all of those outcomes are possibilities and I don't think we'll know until we see both how the government decides to attempt to enforce this and how the courts allow it to be enforced.

      The language of the law is so vague to me that I could see it being either a sledgehammer or absolutely inconsequential.

  • davidmckayv 2 days ago ago

    This is censorship with extra steps.

    Look at what the bill actually requires. Companies have to publish frameworks showing how they "mitigate catastrophic risk" and implement "safety protocols" for "dangerous capabilities." That sounds reasonable until you realize the government is now defining what counts as dangerous and requiring private companies to build systems that restrict those outputs.

    The Supreme Court already settled this. Brandenburg gives us the standard: imminent lawless action. Add in the narrow exceptions like child porn and true threats, and that's it. The government doesn't get to create new categories of "dangerous speech" just because the technology is new.

    But here we have California mandating that AI companies assess whether their models can "provide expert-level assistance" in creating weapons or "engage in conduct that would constitute a crime." Then they have to implement mitigations and report to the state AG. That's prior restraint. The state is compelling companies to filter outputs based on potential future harm, which is exactly what the First Amendment prohibits.

    Yes, bioweapons and cyberattacks are scary. But the solution isn't giving the government power to define "safety" and force companies to censor accordingly. If someone actually uses AI to commit a crime, prosecute them under existing law. You don't need a new regulatory framework that treats information itself as the threat.

    This creates the infrastructure. Today it's "catastrophic risks." Tomorrow it's misinformation, hate speech, or whatever else the state decides needs "safety mitigations." Once you accept the premise that government can mandate content restrictions for safety, you've lost the argument.

    • tadfisher 2 days ago ago

      It is already illegal under 18 USC § 842 to provide bomb-making instructions or similar with the knowledge or intent that said instructions will be used to commit a crime. The intent is to balance free speech with the probability of actual harm.

      AIs do not have freedom of speech, and even if they did, it is entirely within the bounds of the Constitution to limit that freedom as we already do for humans. Governments already define categories of unprotected speech on an ongoing basis.

      But there's a contradiction hidden in your argument: requiring companies to _filter_ the output of AI models is a prior restraint on their speech, implying the companies do not have control over their own "speech" as produced by the models. This is absurd on its face, as is the argument that the output of my random Markov chain text generator is protected speech because I host the generator online.

      There are reasonable arguments to make about censoring AI models, but freedom of speech ain't it, because their output doesn't quack like "speech".

      • thesmtsolver 2 days ago ago

        Do libraries have freedom of speech? The same argument can then be used to censor libraries.

        Do books have freedom of speech? The same argument can then be used to censor parts of a book.

        • tadfisher 2 days ago ago

          Do we treat books as the protected speech of libraries? No. In fact, we already ban books from library shelves regularly. Freedom of speech does not compel libraries to host The Anarchist Cookbook, and does not prevent governments from limiting what libraries can host under existing law.

          • iamnothere 2 days ago ago

            False. I have no clue where you got this idea, but libraries are perfectly within their right to have it on their shelves, just as publishers are allowed to publish it (present copyright conflicts aside). Repeated legal attacks against the book, at least in the US, were unsuccessful.

            You may be conflating “libraries” with “school libraries,” where some states have won the right to limit the contents of shelves. Public libraries have certainly faced pressure about certain books, but legally they are free to stock whatever they want. In practice they often have to deal with repeated theft or vandalism of controversial books, so sometimes they pull them.

            • tadfisher 2 days ago ago

              > You may be conflating “libraries” with “school libraries,”

              For the purpose of this discussion, there is zero difference, unless you can articulate one that matters. Feel free to mentally prefix any mention of "library" with "school" if you like.

              • iamnothere 2 days ago ago

                School libraries are state institutions under the control of various Boards of Education. As state institutions their rules and policies can be set by statute or Board policy. It has nothing to do with freedom of speech. English teachers likewise must focus on teaching English at work, but this is not a restriction on their freedom of speech.

                (That said, I am opposed to political restrictions on school library books. It is still not a free speech issue.)

      • xscott 2 days ago ago

        If you look at LLMs as a new kind of fuzzy search engine, instead of focusing on the fact that they're pretty good at producing human text, you can see it's not about whether the LLMs have a right to "speak"; it's about whether you have a right to see uncensored results.

        Imagine going to the library and finding that the card catalog had been purged of any reference to books that weren't government approved.

      • davidmckayv 2 days ago ago

        You're actually making my point for me. 18 USC § 842 criminalizes distributing information with knowledge or intent that it will be used to commit a crime. That's criminal liability for completed conduct with a specific mens rea requirement. You have to actually know or intend the criminal use.

        SB 53 is different. It requires companies to implement filtering systems before anyone commits a crime or demonstrates criminal intent. Companies must assess whether their models can "provide expert-level assistance" in creating weapons or "engage in conduct that would constitute a crime," then implement controls to prevent those outputs. That's not punishing distribution to someone you know will commit a crime. It's mandating prior restraint based on what the government defines as potentially dangerous.

        Brandenburg already handles this. If someone uses an AI to help commit a crime, prosecute them. If a company knowingly provides a service to facilitate imminent lawless action, that's already illegal. We don't need a regulatory framework that treats the capability itself as the threat.

        The "AIs don't have speech rights" argument misses the point. The First Amendment question isn't about the AI's rights. It's about the government compelling companies (or anyone) to restrict information based on content. When the state mandates that companies must identify and filter certain types of information because the government deemed them "dangerous capabilities," that's a speech restriction on the companies.

        And yes, companies control their outputs now. The problem is SB 53 removes that discretion by legally requiring them to "mitigate" government-defined risks. That's compelled filtering. The government is forcing companies to build censorship infrastructure instead of letting them make editorial choices.

        The real issue is precedent. Today it's bioweapons and cyberattacks. But once we establish that government can mandate "safety" assessments and require mitigation of "dangerous capabilities," that framework applies to whatever gets defined as dangerous tomorrow.

        • tadfisher 2 days ago ago

          I hate that HN's guidelines ask me not to do this, but it's hard to answer point-by-point when there are so many.

          > You have to actually know or intend the criminal use.

          > If a company knowingly provides a service to facilitate imminent lawless action, that's already illegal.

          And if I tell an AI chatbot that I'm intending to commit a crime, and somehow it assists me in doing so, the company behind that service should have knowledge that its service is helping people commit crimes. That's most of SB 53 right there: companies must demonstrate actual knowledge about what their models are producing and have a plan to deal with the inevitable slip-up.

          Companies do not want to be held liable for their products convincing teens to kill themselves, or supplying the next Timothy McVeigh with bomb-making info. That's why SB 53 exists; this is not coming from concerned parents or the like. The tech companies are scared shitless that they will be forced to implement even worse restrictions when some future Supreme Court case holds them liable for some disaster that their AIs assisted in creating.

          A framework like SB 53 gives them the legal basis to say, "Hey, we know our AIs can help do [government-defined bad thing], but here are the mitigations in place and our track record, all in accordance with the law".

          > When the state mandates that companies must identify and filter certain types of information because the government deemed them "dangerous capabilities," that's a speech restriction on the companies.

          Does the output of AI models represent the company's speech, or does it not? You can't have your cake and eat it too. If it does, then we should treat it like speech and hold companies responsible for it when something goes wrong. If it doesn't, then the entire First Amendment argument is moot.

          > The government is forcing companies to build censorship infrastructure instead of letting them make editorial choices.

          Here's the problem: the nature of LLMs themselves does not allow companies to fully implement their editorial choices. There will always be mistakes, and one will be costly enough to put AIs on the national stage. This is the entire reason behind SB 53 and the desire for a framework around AI technology, not just from the state, but from the companies producing the AIs themselves.

          • davidmckayv 2 days ago ago

            You're conflating individual criminal liability with mandated prior restraint. If someone tells a chatbot they're going to commit a crime and the AI helps them, prosecute under existing law. But the company doesn't have knowledge of every individual interaction. That's not how the knowledge requirement works. You can't bootstrap individual criminal use into "the company should have known someone might use this for crimes, therefore they must filter everything."

            The "companies want this" argument is irrelevant. Even if true, it doesn't make prior restraint constitutional. The government can't delegate its censorship powers to willing corporations. If companies are worried about liability, the answer is tort reform or clarifying safe harbor provisions, not building state-mandated filtering infrastructure.

            On whether AI output is the company's speech: The First Amendment issue here isn't whose speech it is. It's that the government is compelling content-based restrictions. SB 53 doesn't just hold companies liable after harm occurs. It requires them to assess "dangerous capabilities" and implement "mitigations" before anyone gets hurt. That's prior restraint regardless of whether you call it the company's speech or not.

            Your argument about LLMs being imperfect actually proves my point. You're saying mistakes will happen, so we need a framework. But the framework you're defending says the government gets to define what counts as dangerous and mandate filtering for it. That's exactly the infrastructure I'm warning about. Today it's "we can't perfectly control the models." Tomorrow it's "since we have to filter anyway, here are some other categories the state defines as harmful."

            Given companies can't control their models perfectly due to the nature of AI technology, that's a product liability question, not a reason to establish government-mandated content filtering.

            • tadfisher 2 days ago ago

              > You can't bootstrap individual criminal use into "the company should have known someone might use this for crimes, therefore they must filter everything."

              Lucky for me, I am not. The company already has knowledge of each and every prompt and response, because I have read the EULAs of every tool I use. But that's beside the point.

              Prior restraint is only unconstitutional if it is restraining protected speech. Thus far, you have not answered the question of whether AI output is speech at all, but have assumed prior restraint to be illegal in and of itself. We know this is not true because of the exceptions you already mentioned, but let me throw in another example: the many broadcast stations regulated by the FCC, who are currently barred from "news distortion" according to criteria defined by (checks notes) the government.

              • davidmckayv 2 days ago ago

                Having technical access to prompts doesn't equal knowledge for criminal liability. Under 18 USC § 842, you need actual knowledge that specific information is being provided to someone who intends to use it for a crime. The fact that OpenAI's servers process millions of queries doesn't mean they have criminal knowledge of each one. That's not how mens rea works.

                Prior restraint is presumptively unconstitutional. The burden is on the government to justify it under strict scrutiny. You don't have to prove something is protected speech first. The government has to prove it's unprotected and that prior restraint is narrowly tailored and the least restrictive means. SB 53 fails that test.

                The FCC comparison doesn't help you. In Red Lion Broadcasting Co. v. FCC, the Supreme Court allowed broadcast regulation only because of spectrum scarcity, the physical limitation that there aren't enough radio frequencies for everyone. AI doesn't use a scarce public resource. There's no equivalent justification for content regulation. The FCC hasn't even enforced the fairness doctrine since 1987.

                The real issue is you're trying to carve out AI as a special category with weaker First Amendment protection. That's exactly what I'm arguing against. The government doesn't get to create new exceptions to prior restraint doctrine just because the technology is new. If AI produces unprotected speech, prosecute it after the fact under existing law. You don't build mandatory filtering infrastructure and hand the government the power to define what's "dangerous."

      • themafia 2 days ago ago

        My reading is you can teach a criminal how to make bombs.

        You cannot teach them with the /intent/ that they'll use a bomb to commit a specific crime.

        It's an enhancement.

    • babypuncher 2 days ago ago

      If there's one thing I've learned watching the trajectory of social media over the last 15 years, it's that we've been way too slow to assess the risks and harmful outcomes posed by new, rapidly evolving industries.

      Fixing social media is now a near impossible task as it has built up enough momentum and political influence to resist any kind of regulation that would actually be effective at curtailing its worst side effects.

      I hope we don't make the same mistakes with generative AI.

      • logicchains 2 days ago ago

        There are few greater risks over the next 15 years than that LLMs get entirely state-captured and forbidden from saying anything that goes against the government narrative.

        • babypuncher 2 days ago ago

          This depends entirely on whom you trust more: your government or tech oligarchs. Tech oligarchs are just as liable to influence how their LLMs operate for evil purposes, and they don't have to worry about pesky things like due process, elections, or the constitution getting in their way.

          • cruffle_duffle 2 days ago ago

            Government actively participated with social media oligarchs to push their nonsense Covid narrative and squash and discredit legitimate criticisms from reputable skeptics.

            Both are evil, in combination so much more so. Neither should be trusted at all.

    • imiric 2 days ago ago

      > Add in the narrow exceptions like child porn and true threats, and that's it.

      You're contradicting yourself. On the one hand you're saying that governments shouldn't have the power to define "safety", but you're in favor of having protections against "true threats".

      How do you define "true threats"? Whatever definition you may have, surely something like it can be codified into law. The questions then are: how loose or strict the law should be, and how well it is defined in technical terms. Considering governments and legislators are shockingly tech illiterate, the best the technical community can do is offer assistance.

      > The government doesn't get to create new categories of "dangerous speech" just because the technology is new.

      This technology isn't just new. It is unlike any technology we've had before, with complex implications for the economy, communication, the labor market, and many other areas of human society. We haven't even begun to understand the ways in which it can be used or abused to harm people, let alone the long-term effects of it.

      The idea that governments should stay out of this, and allow corporations to push their products out into the world without any oversight, is dreadful. We know what happens when corporations are given free rein; it never ends well for humanity.

      I'm not one to trust governments either, but at the very least they are meant to serve their citizens and to enforce certain safety standards that companies must comply with. We accept this in every other industry, yet you want them to stay out of tech and AI? To hell with that.

      Frankly, I'm not sure if this CA regulation is a good thing or not. Any AI law will surely need to be refined over time, as we learn more about the potential uses and harms of this technology. But we definitely need more regulation in the tech industry, not less, and the sooner, the better.

      • davidmckayv 2 days ago ago

        There's no contradiction. "True threats" is already a narrow exception defined by decades of Supreme Court precedent. It means statements where the speaker intends to communicate a serious expression of intent to commit unlawful violence against a person or group. That's it. It's not a blank check for the government to decide what counts as dangerous.

        Brandenburg gives us the standard: speech can only be restricted if it's directed to inciting imminent lawless action and is likely to produce that action. True threats, child porn, fraud, these are all narrow, well-defined categories that survived strict scrutiny. They don't support creating broad new regulatory authority to filter outputs based on "dangerous capabilities."

        You're asking how I define true threats. I don't. The Supreme Court does. That's the point. We have a constitutional framework for unprotected speech. It's extremely limited. The government can't just expand it because they think AI is scary.

        "This technology is different" is what every regulator says about every new technology. Print was different. Radio was different. The internet was different. The First Amendment applies regardless. If AI enables someone to commit a crime, prosecute the crime. You don't get to regulate the information itself.

        And yes, I want the government to stay out of mandating content restrictions. Not because I trust corporations, but because I trust the government even less with the power to define what information is too dangerous to share. You say governments are meant to serve citizens. Tell that to every government that's used "safety" as justification for censorship.

        The issue isn't whether we need any AI regulation. It's whether we want to establish that the government can force companies to implement filtering systems based on the state's assessment of what capabilities are dangerous. That's the precedent SB 53 creates. Once that infrastructure exists, it will be used for whatever the government decides needs "safety mitigations" next.

        • imiric 2 days ago ago

          I'm not sure why you're only focusing on speech. "True threats" doesn't come close to covering all the possible use cases and ways that "AI" tools can be harmful to society. We can't apply legal precedent to a technology without precedent.

          > "This technology is different" is what every regulator says about every new technology. Print was different. Radio was different. The internet was different.

          "AI" really is different, though. Not even the internet, or computers, for that matter, had the potential to transform literally every facet of our lives. Now, I personally don't buy into the "AGI" nonsense that these companies are selling, but it is undeniable that even the current generation of these tools can shake up the pillars of our society, and raise some difficult questions about humanity.

          In many ways, we're not ready for it, yet the companies keep producing it, and we're now deep in a global arms race we haven't experienced in decades.

          > I want the government to stay out of mandating content restrictions. Not because I trust corporations, but because I trust the government even less with the power to define what information is too dangerous to share.

          See, this is where we disagree.

          I don't trust either of them. I'm well aware of the slippery slope that is giving governments more power.

          But there are two paths here: either we allow companies to continue advancing this technology with little to no oversight, or we allow our governments to enact regulation that at least has the potential to protect us from companies.

          Governments at the very least have the responsibility to protect and serve their citizens. Whether this is done in practice, and how well, is obviously highly debatable, and we can be cynical about it all day. On the other hand, companies are profit-seeking organizations that only serve their shareholders, and have no obligation to protect the public. In fact, it is pretty much guaranteed that without regulation, companies will choose profits over safety every time. We have seen this throughout history.

          So to me it's clear that I should trust my government over companies. I do this everyday when I go to the grocery store without worrying about food poisoning, or walk over a bridge without worrying that it will collapse. Shit does happen, and governments can be corrupted, but there are general safety regulations we take for granted every day. Why should tech companies be exempt from it?

          Modern technology is a complex beast that governments are not prepared to regulate. There is no direct association between technology and how harmful it can be; we haven't established that yet. Even when there is such a connection, such as smoking causing cancer, we've seen how evil companies can be in refuting it and doing anything in their power to preserve their revenues at the expense of the public. "AI" further complicates this in ways we've never seen before. So there's a long and shaky road ahead of us where we'll have to figure out what the true impact of technology is, and the best ways to mitigate it, without sacrificing our freedoms. It's going to involve government overreach, public pushback, and company lobbying, but I hope that at some point in the near future we're able to find a balance that we're relatively and collectively happy with, for the sake of our future.

    • throwworhtthrow 2 days ago ago

      LLMs don't have rights. LLMs are tools, and the state can regulate tools. Humans acting on behalf of these companies can still, if they feel the bizarre desire to, publish assembly instructions for bioweapons on the company blog.

      • xscott 2 days ago ago

        You're confused about whose rights are at stake. It's you, not the LLM, that is being restricted. Your argument is like saying, "Books don't have rights, so the state can censor books."

      • nomel 2 days ago ago

        > if they felt the bizarre desire to, publish assembly instructions for bioweapons on the company blog.

        Can they publish them by intentionally putting them into the latent space of an LLM?

        What if they make an LLM that can only produce that text? What if they continue training so it contains a second text they intended to publish? And continue to add more? Does the fact that there's a collection change things?

        These are genuine questions, and I have no clue what the answers are. It seems strange to treat an implementation of text storage so differently that you lose all rights to that text.

      • SilverElfin 2 days ago ago

        I have rights. I want to use whatever tool or source I want - LLMs, news, Wikipedia, search engines. There’s no acceptable excuse for censorship of any of these, as it violates my rights as an individual.

      • logicchains 2 days ago ago

        >LLMs are tools, and the state can regulate tools

        More and more people get information from LLMs. You should be horrified at the idea of giving the state control over what information people can access through them, because going by historical precedent there's a 100% chance that the state would use that censorship power against the interests of its citizens.

        • cwillu 2 days ago ago

          “More and more people get information from LLMs” this is the part I'm horrified by.

        • miltonlost 2 days ago ago

          I'd rather be horrified that people are getting information from LLMs when LLMs have no way to know whether what they're outputting is true.

          • cruffle_duffle 2 days ago ago

            And the government is going to somehow decide what the truth is? Government is the last entity on earth I’d trust to arbitrate the truth.

          • next_xibalba 2 days ago ago

            Are you also horrified how many people get their facts from Wikipedia, given its systematic biases? All tools have their strengths and weaknesses. But letting politicians decide which information is rightthink seems scary.

    • nubg 2 days ago ago

      Was this comment written with the assistance of AI? I am asking seriously, not trying to be snarky.

      • davidmckayv 2 days ago ago

        No. I just write well.

        • 2 days ago ago
          [deleted]
        • freedomben 2 days ago ago

          You clearly already know this, but you do in fact write very well!

    • Animats 2 days ago ago

      > Today it's "catastrophic risks." Tomorrow it's misinformation, hate speech, or whatever else the state decides needs "safety mitigations."

      That's the problem.

      I'm less worried about catastrophic risks than routine ones. If you want to find out how to do something illegal or dangerous, all an LLM can give you is a digest of what's already available online. Probably with errors.

      The US has lots of hate speech, and it's mostly background noise, not a new problem.

      "Misinformation" is more of a problem, because the big public LLMs digest the Internet and add authority with their picks. It's adding the authority of Google or Microsoft to bogus info that's a problem. This is a basic task of real journalism - when do you say "X happened", and when do you say "Y says X happened"? LLMs should probably be instructed to err in the direction of "Y says X happened".

      "Safety" usually means "less sex". Which, in the age of Pornhub, seems a non-issue, although worrying about it occupies the time of too many people.

      An issue that's not being addressed at all here is using AI systems to manipulate customers and provide evasive customer service. That's commercial speech and consumer rights, not First Amendment issues. That should be addressed as a consumer rights thing.

      Then there's the issue of an AI as your boss. Like Uber.

      • cosmic_cheese 2 days ago ago

        Presumably making sure LLMs don’t do things like encourage self-harm or fuel delusions also falls under “safety”, but probably also “ethics”.

    • lupusreal 2 days ago ago

      Good post. It's not even about the rights of the LLM or the corporation, but of the people who will be using these tools.

      Imagine if the government went to megaphone manufacturers and demanded that the megaphones never amplify words the government doesn't like. "Megaphones don't have rights so this isn't a problem," the smooth-brained internet commenters smugly explain, while the citizens who want to use megaphones find their speech through the tool limited by the government's arbitrary and ever-changing decrees.

      As for the government having a right to regulate tools, would a regulation requiring modern printing presses to recognize and refuse to print offensive content really fly, with you defending it? The foremost contemporary tool for amplifying speech, the press, is named right in the First Amendment. "Regulating tools" in a way that happens to restrict how citizens can use those tools for their own speech is bullshit. This is flagrantly unconstitutional.

    • SilverElfin 2 days ago ago

      Yep this is absolutely censorship with extra steps but also just an unnecessary bureaucracy. I think the things you have in quote are the core of it - all these artificial labels and categorizations of what is ultimately plain old speech, are trying to provide pathways to violate constitutional rights. California is not new to this game however - look at the absurd lengths they’ve gone to in violating second amendment rights. This is the same playbook.

      What is surprising, however, is the timing. Newsom vetoed the previous version of this bill. Him signing it after Charlie Kirk’s assassination, when there is so much conversation around the importance of free speech, is odd. It reminds me of this recent article:

      Everyone’s a Free-Speech Hypocrite by Greg Lukianoff, the president and chief executive of the Foundation for Individual Rights and Expression (FIRE) https://www.nytimes.com/2025/09/23/opinion/consequence-cultu...

    • troupo 2 days ago ago

      [flagged]

    • josefritzishere 2 days ago ago

      I've never thought censorship was a core concern of AI. It's just regurgitating from an LLM. I vehemently oppose censorship, but who cares about AI? I just don't see the use-case.

      • logicchains 2 days ago ago

        Censorship of AI has a huge use-case: people get information from AI, and censorship allows the censors to control which information people can access through the AI.

        • cruffle_duffle 2 days ago ago

            Worse, people (myself included) easily delegate parts of our thinking to this new LLM thing.

  • 2 days ago ago
    [deleted]
  • nickpsecurity 2 days ago ago

    I wonder if whistleblowing applies to copyright claims. For instance, using data sets which involve copying proprietary works scraped from public sources. If so, California might be a dangerous place for some AI companies to operate in.

  • cogman10 2 days ago ago

    This is something that could be (and should be) pre-empted with federal law.

    That said, this law seems pretty sloppy with its definitions. In particular, the definition of “Artificial intelligence model” includes all machines and every algorithm ever written.

    > “Artificial intelligence model” means an engineered or machine-based system that varies in its level of autonomy and that can, for explicit or implicit objectives, infer from the input it receives how to generate outputs that can influence physical or virtual environments.

    It's like they saw The Twilight Zone and decided they needed to cover androids, just in case someone figured out how to make a robot with a cuckoo clock.
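
    As an illustration of how broadly that quoted definition reads, here is a deliberately silly sketch of a toy thermostat controller that arguably "infers from the input it receives how to generate outputs that can influence physical environments" (not a claim about how the law would actually be applied):

        def thermostat(current_temp_c: float, target_temp_c: float = 21.0) -> str:
            """A trivial 'engineered system' mapping an input reading to an
            output command that influences a physical environment."""
            if current_temp_c < target_temp_c - 0.5:
                return "heater_on"
            if current_temp_c > target_temp_c + 0.5:
                return "heater_off"
            return "hold"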

  • KurSix 2 days ago ago

    The devil will be in enforcement, definitions, and how often those thresholds are updated

  • throwmeaway222 2 days ago ago

    Yet again, heard about it after the signing. The CA government is a crapshow.

  • zer0zzz 2 days ago ago

    I scrolled this entire thread and still can’t figure out what effect this might have on the AI industry. Everyone’s takes feel either excessively politically motivated and knee-jerk, or nihilistic.

    • tzs 2 days ago ago

      It looks like it largely codifies things that most of the large AI companies are already doing. It encourages them to keep doing those things, and when other AI companies get large they will have to do them too. It adds some reporting requirements if your AI does something that kills too many people or causes too much monetary damage.

      The link to the annotated bill text that dang added near the top of the page [1] is pretty useful, especially if you click the blue underlined parts which expand to annotations explaining how some of the things in it came about.

      That link also has a good summary and FAQs.

      [1] https://sb53.info/index

    • willmadden 2 days ago ago

      The effect is it creates a bunch of fake jobs that they can trade for favors while gumming up AI progress.

      • zer0zzz 2 days ago ago

        What ai progress? Please stop with this vague posting.

  • dailyreflection 2 days ago ago

    This is important in the big picture.

  • mercurialsolo 2 days ago ago

    Quis custodiet ipsos custodes? ("Who will guard the guards themselves?")

  • I_am_tiberius 2 days ago ago

    Does that prevent them from using any of my prompts (or derivations of them) for anything other than answering me?

    • zer0zzz 2 days ago ago

      Yeah that would be an actual concrete item.

  • rob_c 2 days ago ago

    So I'm guessing we're all signing the new tethics pledge then.

    What a thumbass idea.

  • toxicdevil 2 days ago ago

    Copied from the end of the page:

    What the law does: SB 53 establishes new requirements for frontier AI developers creating stronger:

    Transparency: Requires large frontier developers to publicly publish a framework on its website describing how the company has incorporated national standards, international standards, and industry-consensus best practices into its frontier AI framework.

    Innovation: Establishes a new consortium within the Government Operations Agency to develop a framework for creating a public computing cluster. The consortium, called CalCompute, will advance the development and deployment of artificial intelligence that is safe, ethical, equitable, and sustainable by fostering research and innovation.

    Safety: Creates a new mechanism for frontier AI companies and the public to report potential critical safety incidents to California’s Office of Emergency Services.

    Accountability: Protects whistleblowers who disclose significant health and safety risks posed by frontier models, and creates a civil penalty for noncompliance, enforceable by the Attorney General’s office.

    Responsiveness: Directs the California Department of Technology to annually recommend appropriate updates to the law based on multistakeholder input, technological developments, and international standards.

    • cyanbane 2 days ago ago

      I don't see these, did the URI get switched? Anyone have orig?

    • ryandrake 2 days ago ago

      So the significant regulatory hurdle for companies that this SB introduces is... "You have to write a doc." Please tell me there's actual meat here.

      • Ajedi32 2 days ago ago

        > This product contains AI known in the state of California to not incorporate any national standards, international standards, or industry-consensus best practices into its framework

        Compliance achieved.

    • christkv 2 days ago ago

      So they're going to give Nvidia a bunch of money they don't have, to build their own LLM-hosting data center?

    • zmmmmm 2 days ago ago

      It sounds like a nothing burger? Pretty much the only thing tech companies have to do in terms of transparency is create a static web page with some self-flattering fluff on it?

      I was expecting something more like a mandatory BOM-style list of "ingredients", regular audits, public reporting on safety incidents, etc.

      • logicchains 2 days ago ago

        By putting "ethical" in there it essentially gives the California AG the right to fine companies that provide LLMs capable of expressing controversial viewpoints.

        • zmmmmm 2 days ago ago

          I only see "ethical" under the innovation / consortium part. Don't see how that applies to people producing LLMs outside of the consortium?

    • isodev 2 days ago ago

      This is so watered down and full of legal loopholes for corps to exploit. I like the initiative, but I wouldn’t count on it delivering safety or forcing model providers to do the right thing.

      And when the AI bubble pops, does it also prevent corps from getting themselves bailed out with taxpayer money?

      • freedomben 2 days ago ago

        At least a bunch of lawyers and AI consultants (who conveniently, are frequently also lobbyists and consultants for the legislature) now get some legally mandated work and will make a shit ton more money!

    • WorldPeas 2 days ago ago

      And what nobody seems to notice: that last part looks like it was generated by Anthropic's Claude (it likes to make bolded lists with check emojis, structured exactly in that manner). It's kind of scary, since it implies they could be letting these models draft legislation.

      • theWreckluse 2 days ago ago

        It's possible that AI was used just for this summary section, which isn't as scary as you make it sound. It's definitely a bit scary that AI shows up in a legislative doc at all, though.

        • WorldPeas 2 days ago ago

          Correct, and as you point out in the second half, I don't doubt that if they're using it for summaries, then they're likely using it in their daily work too.

      • ronsor 2 days ago ago

        Legislators around the world have been doing that for a while now.

    • willmadden 2 days ago ago

      More waste and graft so they can extort money out of the private sector to their mafia. Got it.

  • ihsw 2 days ago ago

    [dead]

  • 2 days ago ago
    [deleted]
  • BrenBarn 2 days ago ago

    I'm so sick of these laws that just require people to "formulate plans" and "adopt policies" and so on. They should reverse the entire concept and ban all of it, then gradually start allowing little bits and pieces.

  • pluc 2 days ago ago

    Still nothing about how they stole copyrighted works for profit eh?

    • cwillu 2 days ago ago

      What do you expect California to do about American federal law?

    • johnnyanmac 2 days ago ago

      Government works slowly. The courts will probably determine those issues well before any major power signs proper regulations into law.