296 comments

  • molf a day ago

    It would help tremendously if OpenAI would make it possible to apply for zero data retention (ZDR). For many business needs there is no reason to store or log any request at all.

    In theory it is possible to apply (it's mentioned in multiple locations in the documentation), but in practice requests are simply ignored. I get that approval needs to be given, and that there are barriers to entry. But it seems to me they mention zero data retention only for marketing purposes.

    We have applied multiple times and have yet to receive ANY response. Reading through the forums this seems very common.

    • miles 18 hours ago

      > I get that approval needs to be given, and that there are barriers to entry.

      Why is approval necessary, and what specific barriers (before the latest ruling) prevent privacy and no logging from being the default?

      OpenAI’s assurances have long been met with skepticism by many, with the assumption that inputs are retained, analyzed, and potentially shared. For those concerned with genuine privacy, local LLMs remain essential.
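
      For reference, getting a local model running takes only a few lines these days. A minimal sketch, assuming an Ollama install with its Python client and an already-pulled model (the names here are illustrative); the prompt never leaves your machine:

          # pip install ollama  (assumes the local Ollama daemon is running)
          import ollama

          reply = ollama.chat(
              model="llama3",  # any model fetched via `ollama pull`
              messages=[{"role": "user", "content": "Keep this between us."}],
          )
          print(reply["message"]["content"])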

      • AlecSchueler 17 hours ago

        > what specific barriers (before the latest ruling) prevent privacy and no logging from being the default?

        Product development?

    • ArnoVW 20 hours ago

      My understanding is that they log for 30 days by default, for handling of bugs, and that you can request 0 days. This is from their documentation.

      • lcnPylGDnU4H9OF 16 hours ago

        > And that you can request 0 days.

        Right, but the problem they're having is that the request is ignored.

    • pclmulqdq a day ago

      The missing ingredient is money.

      • jewelry 21 hours ago

        Not just money. How are you going to handle this client's support ticket if there are no logs at all?

        • ethbr1 20 hours ago

          Don't. "We're unable to provide support for your request, because you disabled retention." Easy.

          • krisoft an hour ago

            You can still provide support if you want to. You just need to ask the user what their query was, what response they got, and what response they were expecting. You can then, as the expert, either spot their problem immediately, or run the query and see for yourself what is going on.

            Sure it is a possibility that the ticket will end up closed as “unable to reproduce”, but that is always a possibility. It is not like you have to shut off all support because that might happen.

            Plus many support requests are not about the content of the api responses but meta info surrounding them. Support can tell you that you are over the api quota limit even if the content of your prompt was not logged. They can also tell you if your request is missing a required parameter or if they have had 500 errors because of a bad update on their part.

          • hirsin 20 hours ago

            They don't care; they still want support, and most leadership teams are unwilling to stand behind a stance of telling customers no.

            • abeppu 20 hours ago

              ... but why is not responding to a request for zero retention today better than not being able to respond to a future support request? They're basically already saying no to customers who ask for this capability they claim to support; the refusal just takes the form of never responding.

    • 1vuio0pswjnm7 13 hours ago

      "You can also request zero data retention (ZDR) for eligible endpoints if you have a qualifying use-case. For details on data handling, visit our Platform Docs page."

      https://openai.com/en-GB/policies/row-privacy-policy/

      You can request it, but there is no promise the request will be granted.

      Defaults matter. Silicon Valley's defaults are not designed for privacy. They are designed for profit. OpenAI's default is retention. Outputs are saved by default.

      It is difficult to take seriously the arguments in OpenAI's memo in support of its objection to the preservation order. OpenAI already preserves outputs by default.

    • lmm 21 hours ago

      > In theory it is possible to apply (it's mentioned in multiple locations in the documentation), but in practice requests are simply ignored. I get that approval needs to be given, and that there are barriers to entry. But it seems to me they mention zero data retention only for marketing purposes.

      What's the betting that they just wrote it on the website and never actually implemented it?

      • sigmoid10 21 hours ago

        Tbf the approach seems pretty standard. Azure also only offers zero retention to vetted customers and otherwise retains data for up to 30 days to monitor and detect abuse. Since the possibilities for abuse are so high with these models, it would make sense that they don't simply give that kind of privilege to everyone - if only to cover their own legal position.

    • belter 21 hours ago

      If this stands, I don't think they can operate in the EU.

      • bunderbunder 20 hours ago

        I highly doubt this court order affects people using OpenAI services from the EU, as long as they're connecting to EU-based servers.

        • glookler 18 hours ago

          >> Does this court order violate GDPR or my rights under European or other privacy laws?

          >> We are taking steps to comply at this time because we must follow the law, but The New York Times’ demand does not align with our privacy standards. That is why we’re challenging it.

          • danielfoster 17 hours ago

            They didn’t say which law (the US judge’s order or EU law) they are complying with.

  • _jab a day ago

    > How will you store my data and who can access it?

    > The content covered by the court order is stored separately in a secure system. It’s protected under legal hold, meaning it can’t be accessed or used for purposes other than meeting legal obligations.

    > Only a small, audited OpenAI legal and security team would be able to access this data as necessary to comply with our legal obligations.

    So, by OpenAI's own admission, they are taking abundant and presumably effective steps to protect user privacy here? In the unlikely event that this data did somehow leak, I'd personally be blaming OpenAI, not the NYT.

    Some of the other language in this post, like repeatedly calling the lawsuit "baseless", really makes this just read like an unconvincing attempt at a spin piece. Nothing to see here.

    • tptacek a day ago

      No, there is a whole news cycle about how chats you delete aren't actually being deleted because of a lawsuit; they essentially have to respond. It's not an attempt to spin the lawsuit; it's about reassuring their customers.

      • VanTheBrand a day ago

        The part where they go out of their way to call the lawsuit baseless is spin, though, and mixing that with this messaging does exactly that: it presents a mixed message. The NYT lawsuit is objectively not baseless. OpenAI did train on the Times, and ChatGPT does output information from that training. That’s the basis of the lawsuit. NYT may lose, this could end up being considered fair use, it might ultimately be a flimsy basis for a lawsuit, but to say it’s baseless (and with nothing to back that up) is spin and makes this message less reassuring.

        • tptacek a day ago

          No, it's not. It's absolutely standard corporate communications. If they're fighting the lawsuit, that is essentially the only thing they can say about it. Ford Motor Company would say the same thing (well, they'd probably say "meritless and frivolous").

          • bee_rider a day ago

            Standard corporate spin, then?

            • bunderbunder 20 hours ago

              No, this isn't even close to spin; it's just a standard part of defending your case. In the US tort system you need to be constantly and publicly saying you did nothing wrong. Any wavering on that point could be used against you in court.

              • jmull 20 hours ago

                This is a funny thread. You say "No" but then restate the point with slightly different words. As if anything a company says publicly about ongoing litigation isn't spin.

                • bunderbunder 18 hours ago

                  I suppose it's down to how you define "spin". Personally I'm in favor of a definition of the term that doesn't excessively dilute it.

                  • bee_rider 17 hours ago

                    Can you share your definition? This is actually quite puzzling because as far as I know “spin” has always been associated with presenting things in a way that benefits you. Like, decades ago, they could have the show “Bill O’Reilly’s No Spin Zone” and everybody knew the premise was that they’d argue against guests who were trying to tell a “massaged” version of the story, and that they’d go for some actual truth (fwiw I thought the whole show was full of crap, but the name was not confusing or ambiguous).

                    I’m not aware of any definition of “spin” where being conventional is a defense against that accusation. Actually, that was the (imagined) value-add of the show, that conventional corporate and political messaging is heavily spun.

            • tptacek a day ago

              No? "Spin" implies there was something else they could possibly say.

              • justacrow a day ago

                They could choose to not say it

                • ethbr1 a day ago

                  Indeed. Taken to its conclusion, this thread suggests that corporations are justified in saying whatever they want in order to further their own ends.

                  Including lies.

                  I'd like to aim a little higher, maybe towards expecting correspondence with reality?

                  IOW, yes, there is no law saying OpenAI can't try to spin this. But it's still a shitty, non-factually-based choice to make.

              • mmooss a day ago

                I haven't heard that interpretation; I might call it spin of spin.

              • mrgoldenbrown a day ago

                If you're being held at gunpoint and forced to lie, your words are still a lie. Whether you were forced or not is a separate dimension.

              • bee_rider 17 hours ago

                That is unrelated to what the expression means.

        • adamsb6 a day ago

          I’m typing these words from a brain that has absorbed copyrighted works.

      • mhitza a day ago

        My understanding is that they have to keep chats based on an order, *as a result of their previous accidental deletion of potential evidence in the case*[0].

        And per their own terms they likely only delete messages "when they want to" given the big catch-alls. "What happens when you delete a chat? -> It is scheduled for permanent deletion from OpenAI's systems within 30 days, unless: It has already been de-identified and disassociated from your account"[1]

        [0] https://techcrunch.com/2024/11/22/openai-accidentally-delete...

        [1] https://help.openai.com/en/articles/8809935-how-to-delete-an...

      • ofjcihen a day ago

        They should include the part where the order is a result of them deleting things they shouldn’t have, then. You know, if this isn’t spin.

        Then again, I’m starting to think OpenAI is gathering a cult-leader-like following, where any negative comment results in devoted followers, or those with something to gain, immediately jumping to its defense no matter how flimsy the ground.

        • gruez 21 hours ago

          >They should include the part where the order is a result of them deleting things they shouldn’t have, then. You know, if this isn’t spin.

          From what I can tell from the court filings, prior to the judge's order to retain everything, the request to retain everything was coming from the plaintiff, with OpenAI objecting to the request and refusing to comply in the meantime. If so, it's a bit misleading to characterize this as "deleting things they shouldn’t have", because what they "should have" done wasn't even settled. That's a bit rich coming from someone accusing OpenAI of "spin".

      • mmooss a day ago

        > It's not an attempt to spin the lawsuit; it's about reassuring their customers.

        It can be both. It clearly spins the lawsuit - it doesn't present the NYT's side at all.

        • roywiggins 21 hours ago

          It would be extremely unusual (and likely very stupid) for the defendant in a lawsuit to post publicly that the plaintiff maybe has a point.

        • fallingknife 21 hours ago

          Why does OpenAI have any obligation to present the NYT's side?

          • mmooss 18 hours ago

            Who said 'obligation'?

      • conartist6 a day ago

        It's hard to reassure your customers if you can't address the elephant in the room. OpenAI brought this on themselves by flouting copyright law and assuring everyone else that such aggressive and probably-illegal action would be retroactively acceptable once they were too big to fail.

    • lxgr a day ago

      If the stored data is found to be relevant to the lawsuit during discovery, it becomes available to at least both parties involved and the court, as far as I understand.

    • sashank_1509 a day ago

      Obviously OpenAI’s point of view will be their point of view. They are going to call this lawsuit baseless; otherwise they would not be fighting it.

      • ivape a day ago

        To me it's pretty clear how this will play out. You will need to buy additional credits or subscriptions through these LLMs that feed payment back to the likes of the NYT and book publishers. It's all stolen. I don't even want to hear it. This company doesn't want to pay up and is willing to let users' privacy hang in the balance, drawing the case out until they get sure footing with their device launches or the like (or additional markets like enterprise, etc).

        • Workaccount2 21 hours ago

          > It's all stolen.

          LLMs are not massive archives of data. The big models are a few TB in size. No one is forgoing a NYT subscription because they can ask ChatGPT to print out NYT news stories.

          • edbaskerville 20 hours ago

            Regardless of the representation, some people are replacing news consumption generally with answers from ChatGPT.

        • fallingknife 21 hours ago

          Copyright is pretty narrowly tailored to verbatim reproduction of content so I doubt they will have to pay anything.

    • pritambarhate a day ago

      Maybe because you are not an OpenAI user. I am. I find it useful and I pay for it. I don't want my data to be retained beyond what's promised in the Terms of Use and Privacy Policy.

      I don't think the judge is equipped to handle this case if they don't understand how their order jeopardizes the privacy of millions of users worldwide who don't even care about NYT's content or bypassing their paywalls.

      • conartist6 a day ago

        You live on a pirate ship. You have no right to ignore the ethics and law of that just because you could be hurt in conflict related to piracy

      • DrillShopper 17 hours ago

        The OpenAI Privacy Policy specifically allows them to keep data as required by law.

      • mmooss a day ago

        > who don't even care about NYT's content or bypassing their paywalls.

        Whether or not you care is not relevant, and that is usually the case for customers. If a drug company resold an expensive cancer drug in violation of its maker's IP, you might say 'their order jeopardizes the health of millions of users worldwide who don't even care about Drug Co's IP.'

        If the NYT is right - I can only guess - then you are benefitting from the NYT IP. Why should you get that without their consent and for free - because you don't care?

        > jeopardizes

        ... is a strong word. I don't see much risk - the NYT isn't going to de-anonymize users and report on them, or sell the data (which probably would be illegal). They want to see if their content is being used.

    • hiddencost a day ago

      > So, by OpenAI's own admission, they are taking abundant and presumably effective steps to protect user privacy here? In the unlikely event that this data did somehow leak, I'd personally be blaming OpenAI, not the NYT.

      I am not an OpenAI stan, but this needs to be responded to.

      The first principle of information security is that all systems can be compromised and the only way to secure data is to not retain it.

      This is like saying, "Well, I know they didn't want to go skydiving, but we forced them to go skydiving, and they died because they had a stroke midair; it's their fault they died."

      Anyone who makes promises about data security is at best incompetent and at worst dishonest.

      • JohnKemeny a day ago

        > Anyone who makes promises about data security is at best incompetent and at worst dishonest.

        Shouldn't that be "at best dishonest and at worst incompetent"?

        I mean, would you rather be a competent person telling a lie or an incompetent person believing you're competent?

        • HPsquared a day ago

          An incompetent but honest person is more likely to accept correction and respond to feedback generally.

  • supriyo-biswas a day ago

    I wonder whether OpenAI legal can make the case for storing fuzzy hashes of the content, in the form of ssdeep[1] hashes or content-defined chunks[2] of said data, instead of the actual conversations themselves.

    After all, since the NYT has a very limited corpus of information, and supposedly people are generating infringing content using their APIs, said hashes can be used to check whether such content has been generated.

    I'd rather have them store nothing, but given the overly broad court order, I think this may be the best middle ground; a rough sketch of what I mean follows. Of course, I haven't read the lawsuit documents and don't know if the NYT is requesting far more, or alleging some indirect form of infringement which would invalidate my proposal.
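
    Something like this for the comparison side (a sketch assuming the Python ssdeep bindings; any locality-sensitive hash would do, and note ssdeep scores are only meaningful for inputs beyond a few hundred bytes):

        # pip install ssdeep  (bindings to libfuzzy; illustrative, not OpenAI's actual tooling)
        import ssdeep

        def fingerprint(text: str) -> str:
            # Retain only the fuzzy hash, never the conversation text itself.
            return ssdeep.hash(text)

        # One hash per article in the plaintiff's (limited) reference corpus:
        corpus_hashes = [fingerprint("...full text of an NYT article...")]

        # For each model output, keep only the best similarity score (0-100)
        # against the corpus, then discard the raw text:
        output_hash = fingerprint("...model response to some user prompt...")
        best_match = max(ssdeep.compare(output_hash, h) for h in corpus_hashes)
        print(best_match)  # high scores suggest near-verbatim reproduction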

    [1] https://ssdeep-project.github.io/ssdeep/index.html

    [2] https://joshleeb.com/posts/content-defined-chunking.html

    • paxys a day ago

      Yeah, try explaining any of these words to a lawyer or judge.

      • sthatipamala a day ago

        The judges in these technical cases can be quite sophisticated and absolutely do learn terms of art. See Oracle v. Google (Java API case)

        • anshumankmr a day ago

          I looked up the judge from that case (https://en.wikipedia.org/wiki/William_Alsup), who was a hobbyist BASIC programmer; by that standard, one would need a judge who coded MNIST as a pastime for this one.

          • king_magic a day ago

            A smart judge who is minimally tech-savvy could learn to train a model to predict MNIST in a day or two.
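
            For a sense of scale, here is roughly all it takes (a sketch using scikit-learn's small built-in digits set as a stand-in for real MNIST):

                from sklearn.datasets import load_digits
                from sklearn.linear_model import LogisticRegression
                from sklearn.model_selection import train_test_split

                # 8x8 digit images, flattened to 64 features each
                X, y = load_digits(return_X_y=True)
                X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

                clf = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
                print(clf.score(X_te, y_te))  # ~0.96 test accuracy out of the box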

      • fc417fc802 a day ago

        I thought that's what GPT was for.

      • m463 a day ago

        "you are a helpful law assistant."

      • landl0rd a day ago

        "You are a long-suffering clerk speaking to a judge who's sat the same federal bench for two decades and who believes 'everything is computer' constitutes a deep technical insight."

    • LandoCalrissian a day ago

      Trying to actively circumvent the intention of a judge's order is a pretty bad idea.

      • Aeolun a day ago

        That’s not circumvention though. The intent of the order is to be able to prove that ChatGPT regurgitates NYT content, not to read the personal communications of all ChatGPT users.

      • girvo a day ago

        Deeply, deeply so. In fact so much so that people who suggest them show they've (luckily) not had to interact with the legal system much. Judges take an incredibly dim view of that kind of thing haha

    • delusional a day ago

      I haven't been able to find any of the supporting documents, but the court order makes it seem like OpenAI has been unhelpful in producing any alternative during the conversation.

      For example, the judge seems to have asked if it would be possible to segregate data that users wanted deleted from other data, but OpenAI failed to answer. Not that they denied the request; they simply ignored it.

      I think it's quite likely that OpenAI has taken the PR route instead of seriously engaging with any way to constructively honor the request for retention of data.

    • bigyabai a day ago

      All of that does fit on a real spiffy whitepaper. Let's not fool around, though: every ChatGPT session is sent directly into an S3 bucket that some three-letter spook backs up onto their tapes every month. It's a database of candid, timestamped text interactions from a bunch of rubes who logged in with their Google account - you couldn't ask for a juicier target unless you reinvented email. Of course it's backdoored, you can't even begin to try proving me wrong.

      Maybe I'm alone, but a pinkie-promise from Sam Altman does not confer any assurances about my data to me. It's about equally as reassuring as a singing telegram from Mark Zuckerberg dancing to a song about how secure WhatsApp is.

      • landl0rd a day ago

        Of course I can't even begin trying to prove you wrong. You're making an unfalsifiable statement. You're pointing to the Russell's Teapot of sigint.

        It's well-established that the American IC, primarily NSA, collects a lot of metadata about internet traffic. There are some justifications for this and it's less bad in the age of ubiquitous TLS, but it generally sucks. However, legal protections against directly spying on the actual decrypted content of Americans are at least in theory stronger.

        Snowden's leaks mentioned the NSA tapping inter-DC links of Google and Yahoo; if they had to tap links, I doubt there's a ton of voluntary cooperation.

        I'd also point out that trying to parse the unabridged prodigious output of the SlopGenerator9000 is a really hard task unless you also use LLMs to do it.

        • tdeck a day ago

          > Snowden's leaks mentioned the NSA tapping inter-DC links of Google and Yahoo; if they had to tap links, I doubt there's a ton of voluntary cooperation.

          The laws have changed since then and it's not for the better:

          https://www.aclu.org/press-releases/congress-passing-bill-th...

          • tuckerman a day ago

            Even if the laws give them this power, I believe it would be extremely difficult for an operation like this to go unnoticed (and therefore unreported) at most of these companies. MUSCULAR [1] was able to be pulled off because of the cleartext inter-datacenter traffic which was subsequently encrypted. It's hard to see how they could pull off a similar operation without the cooperation of Google which would also entail a tremendous internal cover up.

            [1] https://en.wikipedia.org/wiki/MUSCULAR

            • onli a day ago

              Warrantlessly installed backdoors in the log system, combined with a gag order and secret courts, all "perfectly legal". Not really hard to imagine.

              • tuckerman a day ago

                You would have to gag a huge chunk of the engineers, and I just don’t think that would work without leaks. Google’s infrastructure would not make something like that easy to do clandestinely (I’m trying to avoid saying impossible, but it gets close).

                I was an SRE and SWE on technical infra at Google, specifically the logging infrastructure. I am under no gag order.

        • dmurray a day ago

          > You're pointing to the Russell's Teapot of sigint.

          If there were multiple agencies with billion dollar budgets and a belief that they had an absolute national security mandate to get a teapot into solar orbit, and to lie about it, I would believe there was enough porcelain up there to make a second asteroid belt.

        • cwillu a day ago

          > I'd also point out that trying to parse the unabridged prodigious output of the SlopGenerator9000 is a really hard task unless you also use LLMs to do it.

          The input is what's interesting.

          • Aeolun a day ago

            It doesn’t change the monumental scope of the problem though.

            Though I’m inclined to believe the US gov can if OpenAI can.

        • Yizahi a day ago

          Metadata is spying (c) Bruce Schneier

          If a CIA spook is stalking you everywhere, documenting your every visible move or interaction, you probably would call that spying. Same applies to digital.

          Also, the teapot argument can be applied in reverse. We have all these documented open digital network systems everywhere, and you want to say that one of the most unprofitable and certainly the most expensive-to-run systems is somehow protecting all user data? That belief is based on what? At least selling data is based on the evidence of the industry and on the actual ToS'es of other similar corpos.

          • jstanley a day ago

            The comment you replied to isn't saying that metadata isn't spying. It's saying that the spies generally don't have free access to content data.

        • rl3 a day ago

          >However, legal protections against directly spying on the actual decrypted content of Americans are at least in theory stronger.

          Yeah, because the definition of collection was redefined to mean accessing the full content already stored on their systems, post-interception. It wasn't considered collected until an analyst viewed it. Metadata was a laughable dog and pony show that was part of the same legal shell games at the time, over a decade ago now.

          That said, from an outsider's perspective it sounded like the IC did collectively erect robust guard rails such that access to information was generally controlled and audited. I felt like this broke down a bit once sharing 702 data with other federal agencies was expanded around the same time period.

          These days, those guard rails might be the only thing standing in the way of democracy as we know it ending in the US. AI processing applied to full-take collection is terrifying, just ask the Chinese.

        • Workaccount2 21 hours ago

          My choice conspiracy is that the three-letter agencies actively support these omnipresent, omniscient conspiracy theories, because it ultimately plays into their hand. Sorta like a Santa Claus for citizens.

          • bigyabai 18 hours ago

            > because it ultimately plays into their hand.

            How? Scared criminals aren't going to make themselves easy to find. Three-letter spooks would almost certainly prefer to smoke-test a docile population than a paranoid one.

            In fact, it kinda overwhelmingly seems like the opposite happens. Remember the 2015 San Bernardino shooting that was pushed into the national news for no reason? Remember how the FBI bloviated about how hard it was to get information from an iPhone, 3 years after Tim Cook's assent to the PRISM program?

            Stuff like this is almost certainly theater. If OpenAI perceived retention as a life-or-death issue, they would be screaming about this case from the top of their lungs. If the FBI perceived it as a life-or-death issue, we would never hear about it in our lifetimes. The dramatic and protracted public fights suggest to me that OpenAI simply wants an alibi: some sort of user story that smells like secure and private technology, but in actuality is very obviously neither.

        • zer00eyz a day ago

          > However, legal protections against directly spying on the actual decrypted content of Americans are at least in theory stronger.

          This was the point of a lot of the Five Eyes programs. It's not legal for the US to spy on its own citizens, but it isn't against the law for us to do it to the Australians... who are all too happy to reciprocate.

          > Snowden's leaks mentioned the NSA tapping inter-DC links of Google and Yahoo...

          Snowden's info wasn't really news for many of us who were paying attention in the aftermath of 9/11: https://en.wikipedia.org/wiki/Room_641A (This was huge on slashdot at the time... )

        • komali2 a day ago

          There's no way to know, but it's safer to assume.

      • 7speter a day ago

        Maybe I’m wrong, and maybe this was discussed previously, but of course OpenAI keeps our data; they use it for training!

        • nl a day ago

          As the linked page points out, you can turn this off in settings if you are an end user, or choose zero retention if you are an API user.
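
          On the API side, the per-request piece looks something like this (a sketch with the official Python SDK; the store flag only governs retention for evals/distillation, the model name is a placeholder, and full ZDR is approved per account, not per call):

              from openai import OpenAI

              client = OpenAI()  # reads OPENAI_API_KEY from the environment

              resp = client.chat.completions.create(
                  model="gpt-4o-mini",  # placeholder
                  messages=[{"role": "user", "content": "Summarize the GDPR in one line."}],
                  store=False,  # opt out of storing this completion for evals/distillation
              )
              print(resp.choices[0].message.content)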

            • justacrow a day ago

              I mean, they already stole and used all the copyrighted material they could find to train the thing; am I supposed to believe that they won't use my data just because I tick a checkbox?

              • stock_toaster a day ago

                Agreed, I have a hard time believing anything the eye-scanning crypto coin (Worldcoin or whatever) guy says at this point.

          • Jackpillar 18 hours ago

            I wish I could test drive your brain to experience a world where one believes that would stop them from stealing your data.

      • rl3 a day ago

        >Of course it's backdoored, you can't even begin to try proving me wrong.

        On the contrary.

        >Maybe I'm alone, but a pinkie-promise from Sam Altman does not confer any assurances about my data to me.

        I think you're being unduly paranoid. /s

        https://www.theverge.com/2024/6/13/24178079/openai-board-pau...

        https://www.wsj.com/tech/ai/the-real-story-behind-sam-altman...

      • farts_mckensy a day ago

        Think of all the complete garbage interactions you'd have to sift through to find anything useful from a national security standpoint. The data is practically obfuscated by virtue of its banality.

        • artursapek a day ago

          I’ve done my part cluttering it with my requests for the same banana bread recipe like 5 separate times.

        • bigyabai a day ago

          "We kill people based on metadata." - National Security Agency Gen. Michael Hayden

          Raw data with time-series significance is their absolute favorite. You might argue something like Google Maps data is "obfuscated by virtue of its banality" until you catch the right person in the wrong place. ChatGPT sessions are the same way, and it's going to be fed into aggregate surveillance systems in the way modern telecom and advertiser data is.

        • brigandish a day ago

          Search engines have been doing this since the mid-90s and have only improved; to think that any data is obfuscated by being part of some huge volume of other data is a fallacy at best.

          • farts_mckensy a day ago

            Search engines use our data for completely different purposes.

            • yunwal a day ago

              That doesn’t negate the GP’s point. It’s easy to make datasets searchable.

              • farts_mckensy 20 hours ago

                Searchable? You have to know what to search for, and you have to rule out false positives. How do you discern a person roleplaying some secret agent scenario vs. a person actually plotting something? That's not something a search function can distinguish. It requires a human to sift through that data.

                • brigandish 8 hours ago

                  > How do you discern a person roleplaying some secret agent scenario vs. a person actually plotting something?

                  Metadata and investigation.

                  > That's not something a search function can distinguish.

                  We know that it can narrow down hugely from the initial volume.

                  > It requires a human to sift through that data.

                  Yes, the point of collating, analysing, and searching data is not to make final judgements but to find targets for investigation by the available agents. That's the same reason we all use search engines: to narrow down. They never produce what we intend by intention alone; we still have to read the final results. Magic is still some way off.

                  You're acting as if we can automate humans out of the loop entirely, which would be a straw man. Is anyone saying we can get rid of the police or security agencies by using AI? Or perhaps AI will become the police, perhaps it will conduct traffic stops using driverless cars and robots? I suppose it could happen, though I'm not sure what the relevance would be here.

  • tomhow a day ago

    Related discussion:

    OpenAI slams court order to save all ChatGPT logs, including deleted chats - https://news.ycombinator.com/item?id=44185913 - June 2025 (878 comments)

  • sega_sai a day ago

    Strange smear against NYT. If NYT has a case, and the court approves that, it's bizarre to use the court order to smear NYT. If there is no case, "Open"AI will have a chance to prove its case in court.

    • lxgr a day ago

      The NYT is, in my view, exploiting a systematic weakness of the US legal system here, i.e. extremely wide-reaching discovery laws with almost no regard for the privacy of parties not involved in a given dispute, or for aspects of their lives not relevant to the dispute at hand.

      Of course it's out of self-serving interests, but I find it hard to disagree with OpenAI on this one.

      • JumpCrisscross a day ago

        > with almost no regard for the privacy of parties not involved in a given dispute

        Third-party privacy and relevance is a constant point of contention in discovery. Exhibit A: this article.

        • Timwi 2 hours ago

          Hm, this article is absolutely about a court order that exhibits “[...] almost no regard for the privacy of parties not involved in a given dispute”, so I don't get your point. If your point is that OpenAI is contesting it, that doesn't refute the original point that the legal system allows the NYT to obtain such a court order in the first place, one that then needs contesting. Ideally the privacy of uninvolved parties would be protected by the legal system, not by OpenAI.

      • thinkingtoilet a day ago

        The privacy onus is entirely on the company. If OpenAI is concerned about user privacy, then it shouldn't collect that data. End of story.

        • acheron a day ago

          …the whole point of this story is that the court is forcing them to collect the data.

          • thinkingtoilet 21 hours ago

            You're telling me you don't think OpenAI is already collecting chat logs?

            • dghlsakjg 21 hours ago

              Yes.

              In the API that is an explicit option, and the same is true in the paid consumer product. The amount of business that they stand to lose by maliciously flouting that part of their contract is in the billions.

              • const_cast 7 hours ago

                I can't remember the last time a tech company has collected less data than they admit.

                If you read the privacy policies you agree to, they have access to everything and outright admit it will be logged. That API option is merely a request, and absolutely need not be respected.

                I can't believe we're still doing this rigamarole. If the product is not specifically designed, engineered, and open-sourced to be as privacy-protecting as possible, and it's not literally running on a computer you own, you have zero expectation of privacy. Once this has been proven a million times, we don't need to prove it anymore; we can just assume, and that's a very reasonable assumption.

              • thinkingtoilet 20 hours ago

                You can trust Sam Altman. I do not.

            • Workaccount2 21 hours ago

              "I'm wrong so here is a conspiracy so I can be right again".

              Large companies lose far more by lying than they would gain from it.

          • taormina 21 hours ago

            No no, they are being forced to KEEP the data they collected. They didn't have to keep it to begin with.

            • pj_mukh 19 hours ago

              Isn't the only way to do that for ChatGPT to run locally on a machine? The moment your chat hits their server, they are legally required to store it?

      • Arainach a day ago

        What right to privacy? There is no right to have your interactions with a company (1) remain private, nor should there be. Even if there were, you agree to let OpenAI do essentially whatever they want with your data - including handing it over to the courts in response to a subpoena.

        (1) With limited, well-scoped exclusions for lawyers, medical records, etc.

        • ChadNauseam a day ago

          Given how many important interactions people have with companies in our modern age, saying "There is no right to have your interactions with a company remain private" is essentially equivalent to saying "there is no right to privacy at all". When I talk to my friends over facetime or imessage, that interaction is being mediated by Apple, as well as by my internet service provider and (I assume) many other parties.

          • wvenable a day ago

            > "There is no right to have your interactions with a company remain private" is essentially equivalent to saying "there is no right to privacy at all".

            Legally that is a correct statement.

            If you want that changed, it will require legislation.

            • HDThoreaun a day ago

              Really not so simple. Roe v. Wade was decided based on an implied right to privacy. Sure, it's been overturned, but if liberals get back on the court it will be un-overturned.

              • wvenable 19 hours ago

                Roe v Wade refers to the constitutional right to privacy under the Due Process Clause of the 14th Amendment. This is part of individual rights against the state and has nothing to do with private companies. There is no general constitutional right that guarantees privacy in interactions with private companies.

              • maketheman a day ago

                Given the current balance of the court, I'd say it's about even odds we end the entire century without ever having had a liberal court the entire time. Best reasonable case we're a solid couple of decades from it, and even that's not got great odds.

                We'd have a better chance if anyone with power were talking about court reform to make the Supreme Court justices e.g. drawn by lot for each session from the district courts, but approximately nobody is. It'd be damn good and long overdue reform, but oh well.

                And the thing is, we've already had a fairly conservative court for decades. I'm pretty likely to die, even if of old age, never having seen an actually-liberal court in the US my entire life. Like, WTF. Frankly, no wonder so much of our situation is fucked up, backwards, and authoritarianism-friendly. And (sigh) any serious attempts to fix that are basically on hold for many decades more, assuming rule of law survives that long anyway.

                [EDIT] My point, in short, is that "we still have [thing], we just have to wait for a liberal court that'll support it" is functionally indistinguishable from not having [thing].

                • fallingknife 21 hours ago

                  A liberal court will probably start drawing exceptions to 1A out of thin air like "misinformation" and "hate speech." I'd rather stick with what we have.

              • nativeit a day ago

                That’s presumably why legislation is needed?

          • whilenot-dev a day ago

            Privacy in that example would mean that no party except you and your friends can access the contents of the interaction. I wouldn't want either Apple or my ISP to have that access.

            A company like OpenAI that offers a SaaS is no such friend, and in such power dynamics (individual VS company) it's probably in your best interest to have everything public if necessary.

            • lxgr 20 hours ago

              You're always free to keep records of your ChatGPT conversations on your end.

              Why tangle the data of people with very different preferences than yours up in that?

          • Analemma_ a day ago

            > essentially equivalent to saying "there is no right to privacy at all".

            As others have said, in the United States this is, legally, completely correct: there is no right to privacy in American law. Lots of people think the Fourth Amendment is a general right to privacy, and they are wrong: the Fourth Amendment is specifically about government search and seizure, and courts have been largely consistent about saying it does not extend beyond that to e.g. relationships with private parties.

            If you want a right to privacy, you will need to advocate for laws to be changed; the ones as they exist now do not give it to you.

            • tiahura a day ago

                No, that is incorrect. See, e.g., Griswold, Lawrence, etc.

              • Terr_ a day ago

                That's a fallacy of equivocation, you're introducing a different meaning/flavor of the same word.

                As it stands today, a court case (A) affirming the right to use contraception is not equivalent to a court case (B) stating that a phone-company/ISP/site may not sell their records of your activity.

                • tiahura a day ago

                  Your response hinges on a fallacy of equivocation, but ironically, it commits one as well.

                  You conflate the absence of a statutory or regulatory regime governing private data transactions with the broader constitutional right to privacy. While it's true that the Fourth Amendment limits only state action, U.S. constitutional law, via cases like Griswold v. Connecticut and Lawrence v. Texas, clearly recognizes a substantive right to privacy, grounded in the Due Process Clause and other constitutional penumbras. This is not a semantic variant; it is a distinct and judicially enforceable right.

                  Moreover, beyond constitutional law, the common law explicitly protects privacy through torts such as intrusion upon seclusion, public disclosure of private facts, false light, and appropriation of likeness. These apply to private actors and are recognized in nearly every U.S. jurisdiction.

                  Thus, while the Constitution may not prohibit a website from selling your data, it does affirm a right to privacy in other, fundamental contexts. To deny that entirely is legally incorrect.

                  • wvenable 19 hours ago

                    You're conflating the existence of specific privacy protections in narrow legal domains with a generalized, enforceable right to privacy which doesn't exist in US law. The Constitution recognizes a substantive right to privacy, but only in carefully defined areas like reproductive choice, family autonomy, and intimate conduct, and critically only against state actors. Citing Griswold, Lawrence, and related cases does not establish a sweeping privacy right enforceable against private companies.

                    The common law torts require a high threshold of offensiveness and are adjudicated case by case in individual jurisdictions. They offer only remedies, not a proactive right to control your data.

                    The original point, that there is no general right in the US to have your interactions with a company remain private, still stands. That's not a denial of all privacy rights but a recognition that US law fails to provide comprehensive privacy protection.

                    • tiahura 18 hours ago

                      The statement I was referring to is:

                      “As others have said, in the United States this is, legally, completely correct: there is no right to privacy in American law.”

                      That is an incorrect statement. The common law torts I cited can apply in the context of a business transaction, so your statement is also incorrect.

                      If your strawman is that in the US there’s no right to privacy because there’s no blanket prohibition on talking about other people, and what they’ve been up to, then run with it.

                      • wvenable 17 hours ago

                        > The common law torts I cited can apply in the context of a business transaction, so your statement is also incorrect.

                        I completely disagree. Yes, the Prosser privacy torts exist: intrusion upon seclusion, public disclosure, false light, and appropriation. But they are highly fact-specific, hard to win, rarely litigated, not recognized in all jurisdictions, and completely reactive -- you get harmed first, maybe sue later!

                        They are utterly inadequate to protect people in the modern data economy. A website selling your purchase history? Not actionable. A company logging your AI chats? Not intrusion. These torts are not a privacy regime - they are scraps. Also, when we're talking about basic privacy rights, we are just as concerned with mundane material, not just the "highly offensive" material that the torts would apply to.

                        • tiahura 17 hours ago

                          Because in the US we value freedom and particularly freedom of speech.

                          If you don't want the grocery store telling people you buy Coke, don't shop there.

                          • wvenable 16 hours ago

                            So you've entirely given up your argument about the legal right to privacy involving private businesses?

                            • tiahura 14 hours ago

                              No, I'm saying that in many contexts it is. If, for example, someone hacked Safeway's store and downloaded your data, they'd be in trouble civilly and criminally. If you don't want Safeway to sell your data, deal with that yourself.

                              • wvenable 14 hours ago

                                That actually reinforces my point: there is no affirmative right to privacy, only reactive liability structures. If someone hacks Safeway, they’re prosecuted not because you have a constitutional or general right to privacy, but because they violated a criminal statute (e.g. the Computer Fraud and Abuse Act). That's not a privacy right -- it's a prohibition on unauthorized access.

                                As for Safeway selling your data: you're admitting that it's on the individual to opt out, negotiate, or avoid the transaction which just highlights the absence of a rights-based framework. The burden is entirely on the consumer to protect themselves, and companies can exploit that asymmetry unless narrowly constrained by statute (and even then, often with exceptions and opt-outs).

                                What you're describing isn't a right to privacy -- it's a lack of one, mitigated only by scattered laws and personal vigilance. That is precisely the problem.

                  • jcalvinowens 21 hours ago

                    In practice, the constitution says whatever the supreme court says it says.

                    While these grand theories of traditional implicit constitutional law are nice, they're pretty meaningless in a system where five individuals can (and are willing to) vote to invalidate decades of tradition on a whim.

                    I too want real laws.

            • bobmcnamara a day ago

            > "there is no right to privacy at all"

            First time?

        • fc417fc802 a day ago

          > There is no right to have your interactions with a company (1) remain private, nor should there be.

          Why should two entities not be able to have a confidential interaction if that is what they both want? Certainly a court order could supersede such a right just as it could most others provided sufficient evidence. However I would expect such things to be both highly justified and narrowly targeted.

          This specific case isn't so much about a right to privacy as it is a more general freedom to enter into contracts with others and expect those to be honored.

          • nativeit a day ago

            Hey man, wanna buy some coke? How about trade secrets? State secrets?

        • bionhoward a day ago

          It’s also a matter of competition… there are other AI services available today with various privacy policies, ranging from no training by default, to the ability to opt out of training, to the ability to turn off data retention, to e2e encryption. A lot of workloads (cough, working on private git repos) logically require private AI to make sense.

        • levocardia a day ago

          But there's a very big difference between "no company is legally required to keep your data private" and "a company that explicitly and publicly wants to protect your privacy is being legally coerced into not keeping your data private".

          • nativeit a day ago

            No room here for the company’s purely self-interested motivations?

        • 1shooner a day ago

          >(1) With limited, well-scoped exclusions for lawyers, medical records, etc.

          Is this referring to some actual legal precedent, or just your personal opinion?

        • lxgr a day ago

          That may be your or your jurisdiction's view, but such privacy rights definitely exist in many countries.

          You might have heard of the GDPR, but even before that, several countries had "privacy by default" laws on the books.

        • davedx a day ago

          Hello. I live in the EU. Have you heard of GDPR?

        • Imustaskforhelp a day ago

          But if both parties agree, then there should be the freedom to stay private.

          Your comment is dystopian given how some people treat AI as their "friend": imagine that, no matter what encrypted messaging app or whatever they use, the govt still snoops.

          • fastball a day ago

            Dealer-Client privilege.

    • visarga a day ago

      NYT wants it both ways. When they were the ones putting freelancers' articles into a database to rent, they argued against enforcing copyright and for supporting the new industry, claiming it was too hard to revert their original assumptions. Now they absolutely love copyright.

      https://harvardlawreview.org/blog/2024/04/nyt-v-openai-the-t...

      • moefh a day ago

        Another way of looking at it is that they lost that case over 20 years ago, and have been building their business model for 20 years accordingly.

        In other words, they want everyone to be forced to follow the same rules they were forced to follow 20 years ago.

    • tptacek a day ago

      They're a party to the case! Saying it's baseless isn't a "smear". There is literally nothing else they can say (other than something synonymous with "baseless", like "without merit").

      • lucianbr a day ago

        Oh they definitely can say other things. It's just that it would be inconvenient. They might lose money.

        I wonder if the laws and legal procedures are written with the general assumption that a party to a lawsuit will naturally lie if it is in their interest. And then I read articles and comments about a "trust-based society"...

        • tptacek a day ago

          I'm not taking one side or the other in the case itself, but it's lazy and superficial to suggest that the defendant in a civil suit would say anything other than that the suit has no merit. In the version of this statement where they generously interpret anything the NYT (I subscribe) says, they might as well just surrender.

          I'm not sticking up for OpenAI so much as just for decent, interesting threads here.

        • fastball a day ago

          This is the nature of the civil court system – it exists for when parties disagree.

          Why would a defendant who agrees a case has merit go to court at all? Much easier (and generally less expensive) to make the other party whole, assuming the parties agree on what "whole" is. And if they don't agree on what "whole" is, we are back to square one and of course you'd maintain that the other side's suit is baseless.

        • wilg a day ago

          > They might lose money.

          I expect it's more about them losing the _case_. Silly to expect someone fighting a lawsuit not to try to win it.

      • mmooss a day ago

        They could say nothing about the merits of the case.

    • eviks a day ago

      And if NYT has no case, but the court approves it, is that still bizarre?

    • wyager a day ago

      Lots of people abuse the legal system in various ways. They don't get a free pass just because their abuse is technically legal itself.

    • tootie 19 hours ago

      It's PR. OpenAI stole mountains of copyrighted content and are trying to make NYT look like bad guys. OpenAI would not be in the position of defending a lawsuit if they hadn't done something that is very likely illegal. OpenAI can also end this requirement right now by offering a settlement.

  • hombre_fatal 19 hours ago

    You know how it's always been a meme that you'd be mortally embarrassed if your browser history ever leaked?

    Imagine how much worse it is for your LLM chat history to leak.

    It's even worse than your private comms with humans because it's a raw look at how you are when you think you're alone, untempered by social expectations.

    • vitaflo 18 hours ago

      WTF are you asking LLMs and why would you expect any of it to be private?

      • threecheese 18 hours ago

        This product is positioned as a personal copilot, and future iterations (based on leaked plans, which may or may not be true) as a wholly integrated life assistant.

        Why would a customer expect this not to be private? How can one even know how it could be used against them, when they don’t even know what’s being collected or gleaned from collected data?

        I am following these issues closely, as I am terrified that my “assistant” will some day prevent me from obtaining employment, insurance, medical care, etc. And I’m just a non-law-breaking normie.

        A current day example would be TX state authorities using third party social/ad data to identify potentially pregnant women along with ALPR data purchased from a third party to identify any who attempt to have an out of state abortion, so they can be prosecuted. Whatever you think about that law, it is terrifying that a shift in it could find arbitrary digital signals being used against you in this way.

      • cedws 4 hours ago

        Lots of people are using ChatGPT as a therapist. I tried it, but it was too sycophantic.

      • ofjcihen 18 hours ago

        “Write a song in the style of Slipknot about my dumb inbred dogs. I love them very much but they are…reaaaaally dumb.”

        To be fair the song was intense.

      • hombre_fatal 18 hours ago

        It's not that the convos are necessarily icky.

        It's that it's like watching how someone might treat a slave when they think they're alone. And how you might talk down to or up to something that looks like another person. And how pathetic you might act when it's not doing what you want. And what level of questions you outsource to an LLM. And what things you refuse to do yourself. And how petty the tasks might be, like workshopping a stupid twitter comment before you post it. And how you copied that long text from your distraught girlfriend and asked it for some response ideas. etc. etc. etc.

        At the very least, I'd wager that it reveals that bit of true helpless patheticness inherent in all of us that we try so hard to hide.

        Show me your LLM chat history and I will learn a lot about your personality. Nothing else compares.

        • alec_irl 18 hours ago

          > how you copied that long text from your distraught girlfriend and asked it for some response ideas

          good lord, if tech were ethical then there would be mandatory reporting when someone consults an LLM to tell them how they should be responding to their intimate partner. are your skills of expression already that hobbled by chat bots?

          • lcnPylGDnU4H9OF 15 hours ago ago

            > are your skills of expression already that hobbled by chat bots?

            You have it backwards. My skills of expression were hobbled by my upbringing, and others' thoughts on self-expression allowed my skills to flourish. I wish I had a chat bot to help me understand interpersonal communication because I could have actually had good examples growing up.

            • Timwi 2 hours ago ago

              Although I'm in a similar boat as you, I don't think access to ChatGPT would have helped because it's still much too sycophantic to tell people the kinds of things they need to hear in order to learn interpersonal skills.

              If you use ChatGPT like people use /r/AmITheAsshole, you'll never get a YTA.

          • hombre_fatal 18 hours ago ago

            These are just concrete examples to get the imagination going, not an exhaustive list of the ways that you are revealing your true self in the folds of your LLM chat history.

            Note that it doesn't have to go all the way to "he gets Claude to help him win text arguments with his gf" for an uncomfortable amount of your self to be revealed by the chats.

            There is always something icky about someone observing messages you wrote in privacy, and you don't have to have particularly unsavory messages for it to be icky. Why is that?

            • alec_irl 18 hours ago ago

              i don't personally see messages with an LLM as being different from, say, terminal commands. it's a machine interface. it sounds like you're anthropomorphizing the chat bot, if you're talking to it like you would a human then i would be more worried about the implications that has for you as a person.

              • Timwi 2 hours ago ago

                Do you think there is nothing private about your terminal commands? Would you be 100% ok with bash sending all of your command lines to a corporation with a database?

              • AlecSchueler 17 hours ago ago

                What does this comment add to the conversation? It feels like a personal attack with no real rebuttal. People who anthropomorphise them will talk to them; the human-like interface is the entire selling point.

              • hombre_fatal 18 hours ago ago

                Focusing on how you anthropomorphize the LLM isn't really interacting with the point since it was one example.

                Might someone's google search history be embarrassing even though they don't treat google like a human?

        • Jackpillar 18 hours ago ago

          Might have to re-emphasize his question: what questions are you asking your LLM? Why are you responding to it and/or "treating" it differently than you would a calculator or search engine?

          • hombre_fatal 18 hours ago ago

            Because it's far more capable than a calculator or search engine and because you interact with it with conversational text, it reveals more aspects about your personality.

            Why might your search engine queries reveal more about you than your keystrokes in a calculator? Now dial that up.

            • Jackpillar 18 hours ago ago

              Sure - but I don't interact with it as if it's human, so my demeanor or attitude is neutral because I'm talking to, you know - a computer. Are you getting emotional with and reprimanding your chatbot?

              • hombre_fatal 18 hours ago ago

                I don't get why I'm receiving pushback here. How you treat the LLM was only a fraction of my examples for ways you can look pathetic if your chats were made public.

                You don't reprimand the google search box, yet your search history might still be embarrassing.

                • hackinthebochs 18 hours ago ago

                  Your points were very accurate and relevant. Some people have a serious lack of imagination. The perpetual naysayers will never have their minds changed.

                  • hombre_fatal 18 hours ago ago

                    Good god, thank you. I thought I was making an obvious, unanimous point when I wrote that first comment.

                • AlecSchueler 17 hours ago ago

                  It's so tiring to read. You're making a reasonable point. Some people can't believe that other people behave or feel differently to themselves.

  • conartist6 a day ago ago

    Hey OpenAI! In your "why is this happening" you left some bits out.

    You make it sound like they're mad at you for no reason at all. How unreasonable of them when confronted with such honorable folks as yourselves!

  • atleastoptimal a day ago ago

    I've always assumed that anything sent to any company's hosted API will be logged forever. To assume otherwise always seemed naive, like thinking that apps aren't tracking your web activity.

    • lxgr a day ago ago

      Assuming the worst is wise, settling for the worst case outcome without any fight seems foolish.

    • fragmede a day ago ago

      privacy nihilism is a decision all on its own

      • morsch a day ago ago

        I'd only call it nihilism if you are in agreement with the grandparent and then do it anyway. Other choices are pretending it's not true (denialism), or just not thinking about it (ignorance). Or you complicate your life by not uploading your private info.

      • Barrin92 20 hours ago ago

        not really, it's basically just being antifragile. Consider any corporate entity that interacts with you to be an Eldritch horror from outer space that wants to siphon your soul, because that's effectively what it is, and keep your business with it to a minimum.

        It's just realism. Protect your private data yourself; relying on companies or governments to do it for you is, as the saying goes, letting a tiger devour you up to the neck and then asking it to stop at the head.

  • energy123 a day ago ago

    > Consumer customers: You control whether your chats are used to help improve ChatGPT within settings, and this order doesn’t change that either.

    Within "settings"? Is this referring to the dark pattern of providing users with a toggle "Improve model for everyone" that doesn't actually do anything? Instead users must submit a request manually on a hard to discover off-app portal, but this dark pattern has deceived them into think they don't need to look for it.

    • sib301 a day ago ago

      Can you please elaborate?

      • energy123 a day ago ago

        To opt-out of your data being trained on, you need to go to https://privacy.openai.com and click the button "Make a Privacy Request".

        • alextheparrot 19 hours ago ago

          in the app: Settings ~> Data Controls ~> Improve the model for everyone

    • curtisblaine a day ago ago

      Yes, could you please explain why toggling "Improve model for everyone" off doesn't do anything, and provide a link to this off-app portal that you mention?

  • yoaviram a day ago ago

    >Trust and privacy are at the core of our products. We give you tools to control your data—including easy opt-outs and permanent removal of deleted ChatGPT chats (opens in a new window) and API content from OpenAI’s systems within 30 days.

    No you don't. You charge extra for privacy and list it as a feature on your enterprise plan. Not even paying Pro customers get "privacy". Also, you refuse to delete personal data included in your models and training data following numerous data protection requests.

    • that_was_good a day ago ago

      Except all users can opt out. Am I missing something?

      It says here:

      > If you are on a ChatGPT Plus, ChatGPT Pro or ChatGPT Free plan on a personal workspace, data sharing is enabled for you by default, however, you can opt out of using the data for training.

      Enterprise is just opt out by default...

      https://help.openai.com/en/articles/8983130-what-if-i-want-t...

      • bartvk a day ago ago

        Indeed. Click your profile in the top right, click on the settings icon. In Settings, select "Data Controls" (not "privacy") and then there's a setting called "Improve the model for everyone" (not "privacy" or "data sharing") and turn it off.

        • bugtodiffer a day ago ago

          so they technically kind of follow the law but make it as hard as possible?

          • bartvk a day ago ago

            Personally I feel it's okay but kinda weird. I mean, why not call it "privacy"? Gray pattern, IMHO. For example, venice.ai simply doesn't have a privacy setting because they don't use the data from chats. (They do have basic telemetry, and the setting is called "Disable Telemetry Collection".)

      • atoav a day ago ago

        Not sharing your data with other users does not mean the data of a deleted chat are gone; those are very likely two completely different mechanisms.

        And whether and how they use your data for their own purposes isn't touched by that either.

      • agos a day ago ago

        What about all the rest of the data they use for training? There's no opt-out from that.

    • baxtr a day ago ago

      This is a typical "corporate speak" / "trustwashing" statement. It's usually super vague, filled with feel-good buzzwords, with a couple of empty value statements sprinkled on top.

  • amluto a day ago ago

    It appears that the “Zero Data Retention” APIs they mention are something that customers need to request access to, and that it’s really quite hard to get this access. I’d be more impressed if any API user could use those APIs.
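
    As far as I can tell, the only knob an ordinary API key exposes is the per-request `store` flag, which merely opts a completion out of the stored-completions dashboard/evals; it is not ZDR, since standard retention still applies without a signed ZDR amendment. A minimal sketch, assuming the current openai Python SDK:

        from openai import OpenAI

        client = OpenAI()  # reads OPENAI_API_KEY from the environment

        # store=False only opts this request out of stored completions
        # (dashboard/evals); OpenAI's standard ~30-day retention for abuse
        # monitoring still applies unless your org has a ZDR amendment.
        resp = client.chat.completions.create(
            model="gpt-4o",
            messages=[{"role": "user", "content": "Hello"}],
            store=False,
        )
        print(resp.choices[0].message.content)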

    • JimDabell a day ago ago

      I believe Apple’s agreement includes this, at least when a user isn’t signed into an OpenAI account:

      > OpenAI must process your request solely for the purpose of fulfilling it and not store your request or any responses it provides unless required under applicable laws. OpenAI also must not use your request to improve or train its models.

      https://www.apple.com/legal/privacy/data/en/chatgpt-extensio...

      I wonder if we’ll end up seeing Apple dragged into this lawsuit. I’m sure after telling their users it’s private, they won’t be happy about everything getting logged, even if they do have that caveat in there about complying with laws.

      • fc417fc802 a day ago ago

        > I’m sure after telling their users it’s private, they won’t be happy about everything getting logged,

        The ZDR APIs are not and will not be logged. The linked page is clear about that.

    • singron a day ago ago

      If OpenAI cared about our privacy, ZDR would be a setting anyone could turn on.

  • nraynaud a day ago ago

    Isn't Altman collecting millions of eye scans? Since when did he care about privacy?

  • paxys a day ago ago

    > Does this court order violate GDPR or my rights under European or other privacy laws?

    > We are taking steps to comply at this time because we must follow the law, but The New York Times’ demand does not align with our privacy standards. That is why we’re challenging it.

    That's a lot of words to say "yes, we are violating GDPR".

    • 3836293648 a day ago ago

      No, they're not, because the GDPR has an explicit exception for when a court orders that a company keeps data for discovery. It'd only be a GDPR violation if it's kept after this case is over.

      • lompad a day ago ago

        This is not correct.

        > Any judgment of a court or tribunal and any decision of an administrative authority of a third country requiring a controller or processor to transfer or disclose personal data may only be recognised or enforceable in any manner if based on an international agreement, such as a mutual legal assistance treaty, in force between the requesting third country and the Union or a Member State, without prejudice to other grounds for transfer pursuant to this Chapter.

        So if, and only if, an agreement between the US and the EU allows it explicitly, it is legal. Otherwise it is not.

    • dragonwriter a day ago ago

      That's what they are trying to suggest, because they are still trying to use the GDPR as part of their argument challenging the US court order. (Kind of a longshot to get a US court to agree that the obligation of a US party to preserve evidence related to a suit in US courts, under US law, filed by another US party is mitigated by European regulations, even if they argue that such preservation would violate obligations the EU has imposed on them.)

    • kelvinjps a day ago ago

      Maybe they will not store the chats of European users?

    • esafak a day ago ago

      Could a European court not have ordered the same thing? Is there an exception for lawsuits?

      • lxgr a day ago ago

        There is, but I highly doubt a European court would have given such an order (or if they did, it would probably be axed by a higher court pretty quickly).

        There's decades of legal disputes in some European countries on whether it's even legitimate for the government to mandate your ISP or phone company to collect metadata on you for after-the-fact law enforcement searches.

        Looking at the actual data seems much more invasive than that and, in my (non-legally trained) estimate doesn't seem like it would stand a chance at least in higher courts.

        • dragonwriter a day ago ago

          > There's decades of legal disputes in some European countries on whether it's even legitimate for the government to mandate your ISP or phone company to collect metadata on you for after-the-fact law enforcement searches.

          > Looking at the actual data seems much more invasive than that

          Looking at the data isn't involved in the current order, which requires OpenAI to preserve and segregate the data that would otherwise have been deleted. The reason for segregation is that any challenges OpenAI has to providing that data in discovery will be heard before anyone other than OpenAI is ordered to have access to the data.

          This is, in fact, less invasive than the government mandating collection for speculative future uses, since it applies only to not destroying evidence already collected by OpenAI in the course of operating their business, and only for potential use, subject to other challenges by OpenAI, in the present case.

  • CjHuber a day ago ago

    Even though how they responded is definitely controversial, I'm glad that they did publish some response to it. After reading about it in the news yesterday and seeing no response from their side, I was worried that they would just stay silent.

  • WorldPeas a day ago ago

    So how is this going to impact cursor's privacy mode, which is required by many companies for compliant usage of AI editors? For the uninitiated, in the web console this looks like:

    Privacy mode (enforced across all seats)

    OpenAI Zero-data-retention (approved)

    Anthropic Zero-data-retention (approved)

    Google Vertex AI Zero-data-retention (approved)

    xAI Grok Zero-data-retention (approved)

    did this just open another can of worms?

    • qmarchi a day ago ago

      Likely, they're using OpenAI's Zero-Retention APIs where there's never data stored in the first place.

      So nothing?

      • JumpCrisscross a day ago ago

        > OpenAI's Zero-Retention APIs

        Do we know if the court order covers these?

        • brigandish a day ago ago

          Yes, follow the link at the top.

          • JumpCrisscross 17 hours ago ago

            > Yes, follow the link at the top

            OpenAI says “this does not impact API customers who are using Zero Data Retention endpoints under our ZDR amendment.”

    • 8note a day ago ago

      at least, openai zero-data-retention will by court order be full retention.

      im excited that the law is going to push for local models

      • blerb795 a day ago ago

        The linked page specifically mentions that these ZDR APIs are not impacted.

        > This does not impact API customers who are using Zero Data Retention endpoints under our ZDR amendment.

  • Kiyo-Lynn a day ago ago

    Lately I’m not even sure if the things I say on OpenAI are really mine or just part of the platform. I never used to think much when chatting, but knowing some of it might be stored for a long time makes me feel uneasy. I’m not asking for much. I just want what I delete to actually be gone.

  • dataflow a day ago ago

    > ChatGPT Enterprise and ChatGPT Edu: Your workspace admins control how long your customer content is retained. Any deleted conversations are removed from our systems within 30 days, unless we are legally required to retain them.

    I'm confused, how does this not affect Enterprise or Edu? They clearly possess the data, so what makes them different legally?

    • oxw a day ago ago

      Enterprise has an exemption granted by the judge

      > When we appeared before the Magistrate Judge on May 27, the Court clarified that ChatGPT Enterprise is excluded from preservation.

      • dataflow a day ago ago

        Oh I missed that part, thanks. I wonder why. I guess the judge assumes it isn't being used for copyright infringement, but other plans might be?

        • bee_rider a day ago ago

          No idea, but just to speculate—the court’s goal isn’t actually to scare OpenAI’s users or harm their business, right? It is to collect evidence. Maybe they just figured they don’t need to dip into that pool to get enough evidence.

        • Grikbdl a day ago ago

          Who knows, it's probably the judge's twisted idea of "that'd be too far", as if cancelling basic privacy expectations of all users everywhere wouldn't be.

  • tmaly 9 hours ago ago

    I wonder if this would affect temporary chats too?

  • Caelus9 a day ago ago

    Honestly, this incident makes me feel that it is really difficult to draw a clear line between "protecting privacy" and "obeying the law". On the one hand, I am very relieved that OpenAI stood up and said "no"; after all, we all know that these systems collect everything by default, which makes people a little panicked. But on the other hand, it sounds very strange that a court can directly say "give me all the data", even data that users explicitly deleted. Moreover, this also shows that everyone actually cares about their information and privacy now. No one wants their data to be used for anything casually.

  • mosdl a day ago ago

    It's funny that OpenAI is complaining; they don't mind saying copyright doesn't apply to them when it makes them money.

    • tptacek a day ago ago

      You mean, like, a pretty big fraction of everybody who comments on this site?

      • mmooss a day ago ago

        People here advocate for private use, not profit-making corporate use.

      • rasengan a day ago ago

        The internet is the battle of the narratives.

    • ivape a day ago ago

      In retrospect, Bezos did the smartest thing by buying the Washington Post. In retrospect, Google did a great thing by working out a deal with Reddit. Content repositories/creators are going to sue these LLM companies in the West until they make licensing agreements. If I were OpenAI, I'd work hard to spend the money they raised to literally buy out as many of these outlets as possible.

      How much could the NYT back catalog be worth? Just buy it, ask the Saudis.

  • dumbmrblah a day ago ago

    So is this for all chats going forward or does it include conversations retroactively?

    • steve_adams_86 a day ago ago

      Presumably moving forward, because otherwise the data retention policies wouldn't have been followed correctly (from what I understand)

  • wand3r a day ago ago

    Does anyone know how this can be enforced?

    The ruling and situation aside, to what degree is it possible to enforce something like this, and what are the penalties? Even in GDPR and other data protection cases, enforcement seems super hard. Directives to keep or delete data basically require system-level access, because the company can always CRUD its own data whenever it wants and in whatever way serves its interests. Data could be ordered produced to a court periodically and audited, which might catch an individual case, I guess. But there is basically no way to know without literally seizing the servers in an extreme case. Also, the consequence in most cases is a fine.

    • mmooss a day ago ago

      This isn't the executive branch of the US government, which has Constitutional powers. It's a private company, and the court can at least impose massive penalties, adverse presumptions at trial (causing them to lose), and contempt of court. Talk to a lawyer before you try something like it.

      • imiric a day ago ago

        > the court can at least enforce massive penalties

        A.k.a. the cost of doing business.

        • mmooss 18 hours ago ago

          Businesses care deeply about money. The bravado of many businesspeople these days, that they are immune to criticism, lawsuits, etc. is a bluff. It apparently works, because many people repeat it.

          • imiric 15 hours ago ago

            When fines are a small percentage of the company's revenue, they do nothing to stop them from breaking the law. So they are in fact just the cost of doing business.

            E.g. Meta has been fined billions many times, yet they keep reoffending. It's basically become a revenue stream for governments.

            • mmooss 10 hours ago ago

              > Meta has been fined billions many times, yet they keep reoffending

              They are a large company who do many things, some of which will violate the rules. Do they do it more, less, or the same as they would if there weren't fines?

              • imiric 2 hours ago ago

                That's a red herring question that's impossible to answer.

                The point is not that Meta and other companies break laws. It's that they keep breaking the same ones related to privacy. They do this because their business model depends on exploiting their users' data. Privacy laws to them are a nuisance that directly impact their revenue, so if they calculate that the revenue from their activity is greater than the fines, then it's just the cost of doing business. If, OTOH, it turns out that the amount of resources they would need to expend on fines or to comply with the laws are greater than the possible revenue, i.e. the juice is not worth the squeeze, then they simply bail out and stop doing business in that jurisdiction. But so far, even billion-dollar fines are clearly lower than their revenues.

                It's a simple numbers game, so I'm not sure what your argument is.

  • landonxjames a day ago ago

    Repeatedly calling the lawsuit baseless feels like it makes OpenAI's point a lot weaker. They obviously don't like the suit, but I don't think you can credibly argue that there aren't tricky questions around the use of copyrighted materials in training data. Pretending otherwise is disingenuous.

    • sigilis 20 hours ago ago

      They pay their lawyers and whoever made this page a lot for the express purpose of credibly arguing that it is very clearly legal and very cool to use any IP they want to train their models.

      Could you with a straight face argue that the NYT newspaper could be a surrogate girlfriend for you like a GPT can be? They maintain that it is obviously a transformative use and therefore not an infringement of copyright. You and I may disagree with this assertion, but you can see how they could see this as baseless, ridiculous, and frivolous when their livelihoods depend on that being the case.

  • mediumsmart a day ago ago

    It's a newspaper. They are sold for a price, not to one person, and they don't come with an NDA. They become part of history and society.

  • udev4096 21 hours ago ago

    The irony is palpable here

  • lxgr a day ago ago

    Does anybody know if this also applies to "temporary chats" on ChatGPT?

    Given that it's not explicitly mentioned as data not being affected, I'm assuming it is.

  • dvt a day ago ago

    > Does this court order violate GDPR or my rights under European or other privacy laws?

    > We are taking steps to comply at this time because we must follow the law, but The New York Times’ demand does not align with our privacy standards. That is why we’re challenging it.

    So basically no, lol. I wonder if we'll see the GDPR go head-to-head with Copyright Law here, that would be way more fun than OpenAI v NYT.

  • jamesgill a day ago ago

    Follow the money.

  • john2x a day ago ago

    Does this mean that if I can get ChatGPT to generate copyrighted text, they'll get in trouble?

  • vessenes a day ago ago

    This is a massive overreach. Not in the nature of the request ("please don't destroy data that might contain proof my case is strong") but in its scale, and that scale makes it an overreach by the judge. And shame on NYT for asking.

    This request also equals: "Please keep a backup of every Senator's private chats, every Senator's spouse's private chats, every military commander's personal chats, every politician in a foreign country, forever."

    There is no way that data will stay safe forever. There is no way that, once such a facility is built, it will not be used constantly, by governments all over the world.

    The NYT case seems to currently be on whether or not OpenAI users use ChatGPT to circumvent paywalls. Maybe they do, although when the suit was filed, 3.5 was definitely not a reliable witness to what NYT articles were about. There are 400 million MAUs at ChatGPT - more than the population of the US.

    To my mind there's three tranches of information that we could find out:

    1. People's primary use case for ChatGPT is to get NYT articles for free. Therefore oAI is a bad actor making a tool that largely got profitable off infringing NYT's copyright.

    2. Some core segment used/uses it for infringement purposes; not a lot, but it's a use case that sells licenses.

    3. This happens, but just vanishingly rarely compared to most use cases of the tool.

    I'd imagine different rulings and orders to cure in each of these circumstances, but why is it that the court needs to know any more than some percentages?

    Assuming a 10k-token system prompt, 500 tokens of chat, 400mm people, and five chats a week, that comes to roughly 67 terabytes of data per week(!). No metadata, just ASCII output.
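
    Back-of-envelope (all inputs are the guesses above, plus an assumed ~3 bytes of stored ASCII per token; none of these are OpenAI's real numbers):

        # Rough weekly retention volume under the assumptions stated above.
        users = 400_000_000              # ~ChatGPT MAUs
        chats_per_week = 5               # guess
        tokens_per_chat = 10_000 + 500   # system prompt + chat
        bytes_per_token = 3.2            # ~3 bytes of ASCII per token (guess)

        total = users * chats_per_week * tokens_per_chat * bytes_per_token
        print(f"{total / 1e12:.0f} TB/week")  # -> 67 TB/week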

    Nobody, ever, will read all of this. In fact, it would take about 24 hours for a Seagate drive to just push all the bytes down a bus, much less process any of it. Why not agree on representative searches, get a team to spot check data, and go from there?

    Personally, I would guess the percentage of "infringement" use cases, IF it is even infringement to get an AI to verbatim quote a news article while it is NOT infringement for Cloudflare to give a verbatim quote of a news article, is going to be tiny, tiny, tiny.

    NYT should back the fuck off, remember it's supposed to be a force for good in the world and not be the cause of massive possible downstream harm to people all over the world.

    • DrillShopper 17 hours ago ago

      > There is no way that data will stay safe forever. There is no way that, once such a facility is built, it will not be used constantly, by governments all over the world.

      That's on OpenAI for deciding to retain this data in the first place. They could just not have done that. That was a choice, their choice, and therefore they're responsible for it.

    • fallingknife 21 hours ago ago

      It's obviously 3 because the entire point of the NYT is that it's a newspaper and probably 99% of their traffic is from articles new enough that they haven't had time to go into the training data. So anybody who wanted to use ChatGPT to breach the NYT paywall couldn't get any new articles. Also there are so many other ways to breach a paywall that you would have to be insane to try to do it through prompt engineering ChatGPT. The whole case is a scam and I hope the court makes them pay OpenAI's legal fees.

  • delusional a day ago ago

    I have no time for this circus.

    The technology anarchists in this thread need perspective. This is fundamentally a case about the legality of this product. In the extreme case, this will render the whole product category of "llm trained on copyrighted content" illegal. In that case, you will have been part of a copyright infringement on a truly massive scale. The users of these tools do NOT deserve privacy in the light of the crimes alleged.

    You do not get to claim to protect the privacy of the customers of your illegal venture.

  • kingkawn a day ago ago

    Once the data is kept, it is only a matter of time until a new must-try use for it is born

  • dangus a day ago ago

    I think the court order doesn’t quite go against as many norms as OpenAI is claiming. It’s very reasonable to retain data pertinent to a case, and NYT’s case almost certainly revolves around finding out copyright infringement damages, which are calculated based on the number of violations (how many users queried ChatGPT and were returned verbatim copyrighted material from NYT).

    If you don’t retain that data you’re destroying evidence for the case.

    It’s not like the data is going to be given to anyone, it’s only gong to be used for limited legal purposes for the lawsuit (as OpenAI confirms in this article).

    And honestly, OpenAI should have just not used copyrighted data illegally and they would have never had this problem. I saw NYT’s filing and it had very compelling evidence that you could get ChatGPT to distribute verbatim copyrighted text from the Times without citation.

    • lxgr a day ago ago

      It absolutely goes against norms in many countries other than the US, and the data of residents/citizens of these countries are affected too.

      > It’s not like the data is going to be given to anyone, it’s only gong to be used for limited legal purposes for the lawsuit (as OpenAI confirms in this article).

      Nobody other than both parties to the case, their lawyers, the court, and whatever case file storage system they use. In my view, that's already way too much given the amount and value of this data.

      • dangus a day ago ago

        Countries other than the US aren't part of this lawsuit. ChatGPT operates in the US under US law. I don't know if they have separated data storage for other countries.

        I don't believe you would be considered to be violating the GDPR if you are complying with another court order, because you are presumably making a best effort to comply with the GDPR besides that court order.

        You're saying it's unreasonable to store data somewhere for a pending court case? Conceptually you're saying that you can't preserve data for trials because the filing cabinets might see the information. That's ridiculous, if that was true then it would be impossible to perform discovery and get anything done in court.

        • lxgr 20 hours ago ago

          > I don't believe you would be considered to be violating the GDPR if you are complying with another court order, because you are presumably making a best effort to comply with the GDPR besides that court order.

          It most likely depends on the exact circumstances. I could absolutely imagine a European court deciding that, sorry, but if you have to answer to a court decision incompatible with European privacy laws, you can't offer services to European residents anymore.

          > You're saying it's unreasonable to store data somewhere for a pending court case?

          I'm saying it can be, depending on how much personal and/or unrelated data gets tangled up in it. That seems to be the case here.

          > Conceptually you're saying that you can't preserve data for trials because the filing cabinets might see the information.

          I'm only saying that there should be proportionality. A court having access to all facts relevant to a case is important, but it's not the only important thing in the world.

          Otherwise, we could easily end up with a Dirk-Gently-esque court that, based on the principle that everything is connected to everything, will just demand access to all the data in the world.

          • dangus 19 hours ago ago

            The scope of the data access required by the court is being worked out via due process. That’s why there’s an appeal system. OpenAI is just grandstanding in a public forum so that their customers don’t defect.

            When it comes to GDPR, US courts have generally taken the stance that it does not override discovery obligations:

            Ironburg Inventions, Ltd. v. Valve Corp.

            Finjan, Inc. v. Zscaler, Inc.

            Corel Software, LLC v. Microsoft

            Rollins Ranches, LLC v. Watson

            In none of these cases was a GDPR fine issued.

    • tptacek a day ago ago

      > And honestly, OpenAI should have just not used copyrighted data illegally and they would have never had this problem

      The whole dispute in the lawsuit is whether they did anything unlawful, so saying "just do what the NYT wanted you to do" isn't interesting.

      • dangus a day ago ago

        No, you're misinterpreting how information discovery and the court system works.

        The NYT made an argument to a judge about what they think is going on and how they think the copyright infringement is taking place and harming them. In their filings and hearings they present the reasoning and evidence they have that leads them to believe that a violation is occurring. The court makes a judgment on whether or not to order OpenAI to preserve and disclose information relevant to the case to the court.

        It's not "just do what NYT wanted you to do," it's "do what the court orders you to do based on a lawsuit brought by a plaintiff and argued to the court."

        I suggest you read the court filing: https://nytco-assets.nytimes.com/2023/12/NYT_Complaint_Dec20...

    • danenania a day ago ago

      Putting the merits of this specific case and positive vs. negative sentiments toward OpenAI aside, this tactic seems like it can be used to destroy any business or organization with customers who place a high value on privacy—without actually going through due process and winning a lawsuit.

      Imagine a lawsuit against Signal that claimed some nefarious activity, harmful to the plaintiff, was occurring broadly in chats. The plaintiff can claim, like NYT, that it might be necessary to examine private chats in the future to make a determination about some aspect of the lawsuit, and the judge can then order Signal to find a way to retain all chats for potential review.

      However you feel about OpenAI, this is not a good precedent for user privacy and security.

      • fc417fc802 a day ago ago

      That's not entirely fair. The argument isn't "users are using the service to break the law" but rather "the service is facilitating law-breaking". To fix your Signal analogy, suppose you could use the chat interface to request copyrighted material from the operator.

        • charcircuit a day ago ago

          That doesn't change the outcome: either way the app ends up handing over the plain-text messages of everyone, including the chat history of every user.

          • fc417fc802 a day ago ago

            Right. But requiring logs due to suspicion that the service itself is actively violating the law is entirely different from doing so on the basis that end users might be up to no good entirely independently.

            Also OpenAI was never E2EE to begin with. They were already retaining logs for some period of time.

            My personal view is that the court order is overly broad and disregards potential impacts on end users but it's nonetheless important to be accurate about what is and isn't happening here.

          • dangus a day ago ago

            Again, keep in mind that we are talking about case-limited analysis of that data within the privacy of the court system.

            For example, if the trial happens to find data that some chats include crimes committed by users in their private chats, the court can't just send police to your door based on that information since the information is only being used in the context of an intellectual property lawsuit.

            Remember that privacy rights are legitimate rights but they change a lot when you're in the context of an investigation/court proceeding. E.g., the right of police to enter and search your home changes a lot when they get a court issued warrant.

            The whole point of E2EE services from the perspective of privacy-concious customers is that a court can get a warrant for data from those companies but they'll only be able to produce encrypted blobs with no access to decryption keys. OpenAI was always a not-E2EE service, so customers have to expect that a court order could surface their data to someone else's eyes at some point.
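
            A toy illustration in Python with the `cryptography` package (nothing like Signal's real protocol): in an E2EE design the provider only ever stores ciphertext, so a court order against the provider can compel blobs it cannot decrypt.

                from cryptography.fernet import Fernet

                key = Fernet.generate_key()  # lives only on the user's devices
                ciphertext = Fernet(key).encrypt(b"private message")

                # The provider stores only `ciphertext`. A subpoena can compel
                # this blob, but without `key` it is undecryptable; only the
                # endpoints can recover the plaintext.
                print(Fernet(key).decrypt(ciphertext))  # b'private message'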

      • dangus a day ago ago

        I'm confused at how you think that NYT isn't going through due process and attempting to win a lawsuit.

        The court isn't saying "preserve this data forever and ever and compromise everyone's privacy," they're saying "preserve this data for the purposes of this court while we perform an investigation."

        IMO, the NYT has a very good argument here that the only way to determine the scope of the copyright infringement is to analyze requests and responses made by every single customer. Like I said in my original comment, the remedies for copyright infringement are on a per-infringement basis. E.g., every time someone on LimeWire downloads Song 2 by Blur from your PC, you've committed one instance of copyright infringement. My interpretation is that NYT wants the court to find out how many times customers have received ChatGPT responses that include verbatim New York Times content.

  • 6510 a day ago ago

    The harm this is doing and will do (regardless) seems to exceed the value of the NYT.

    If a company is subject to a US court order that violates EU law, the company could face legal consequences in the EU for non-compliance with EU law.

    The GDPR mandates specific consent and legal bases for processing data, including sharing it.

    Assuming it is legal to share it for legal purposes, one can't sufficiently anonymize the data: it needs to be accompanied by user data that allows requests to download it and to have it deleted.

    I wonder what the fine would be if they just delete it per user agreement.

    I also wonder: could one, in the US, legally promise customers they may delete their data, then choose to keep it indefinitely and share it with others?

  • FireBeyond a day ago ago

    Sure, OpenAI, I will absolutely trust you.

    > The content covered by the court order is stored separately in a secure system. It’s protected under legal hold, meaning it can’t be accessed or used for purposes other than meeting legal obligations.

    That's horse shit and OpenAI knows it. It means no such thing. A legal hold is just a 'preservation order'. It says absolutely nothing about other access or use.

    • mmooss a day ago ago

      OpenAI's other policies, and other laws and regulations, do have such requirements. Are they nullified because the data is held under a court order?

      • mrguyorama 18 hours ago ago

        "The judge and court need to view this information to actually pass justice and decide the case" almost always supersedes other laws.

        The GDPR does not say that you can never be proven to have done something wrong in a court of law.

        • mmooss 17 hours ago ago

          Right. The GGP says the information could be used for other purposes.

    • fragmede a day ago ago

      why is it horse shit that OpenAI is saying they've put the files in a cabinet that only legal has access to?

      • FireBeyond 20 hours ago ago

        They are saying a “legal hold” means that they have to keep the data but don’t worry they’re not allowed to use it or access it for any other reason.

        A legal hold requires no such thing and there would be no such requirement in it. They are perfectly free to access and use it for any reason.

  • vanattab a day ago ago

    Protect our privacy? Or protect their right to piracy?

    • charrondev a day ago ago

      I mean, the court is ordering them to retain user conversations at least until resolution of the court case (in case there are copyrighted responses being generated?).

      So user privacy is definitely implicated.

    • NBJack a day ago ago

      Agreed. I don't buy the spin.

  • tiahura a day ago ago

    Every concerned ChatGPT user should file an emergency motion to intervene and request for stay of the order. ChatGPT can help you draft the motion and proposed order, just give it a copy of the discovery order. The SDNY has a very helpful pro se hotline.

    The order the judge issued is irresponsible. Maybe OpenAI did get too cute in its discovery responses, but the remedy isn't to trample the rights of third parties.

  • junto a day ago ago

    This is disingenuous from OpenAI.

    They are being challenged because NYT believes that ChatGPT was trained with copyrighted data.

    NYT naively pushes to find a way to prove that NYT data is being surfaced in user chats, and how often.

    OpenAI spins that into "NYT is invading user privacy."

    It’s quite transparent as to what they are doing here.

  • throwaway6e8f a day ago ago

    Agent-1, I want to legally retain all customer data indefinitely but I'm worried about a backlash from the public. Also, I'm having a bunch of problems with the NYT accusing us of copyright violation. Give me a strategy to resolve these issues so that I win in the long term.