Looks great. Not sure how big the market is between "need max privacy, need on-prem" and "don't care, just use what is cheap/popular", though.
Can you talk about how this relates to, and is differentiated from, what Apple claimed to do at their last WWDC? They called it "Private Cloud Compute". (To be clear, after 11 months, this is still "announced", with no implementation anywhere, as far as I can see.)
Here is their blog post on Apple Security, dated June 10: https://security.apple.com/blog/private-cloud-compute/
EDIT: JUST found the tinfoil blog post on exactly this topic. https://tinfoil.sh/blog/2025-01-30-how-do-we-compare
Anecdotal, but I work at a company offering an SMB product with LLM features. One of the first questions asked on any demo or sales call is what the privacy model for the LLM is, how the data is used, who has access to it, and whether those features can be disabled.
There are a few stories on the 'max privacy' stuff. One goes like this: two companies each have something private and need to combine them without letting the other see their half; for example, a bank with customer transactions and a company with analytics software it doesn't want to share. A system like this lets the bank run its transaction data through that analytics software without anyone being able to see either the transaction data or the software. The next level up is when two banks need to combine their transaction data to spot fraud, and you've now got three parties involved on one server.
Private Cloud Compute has been in use since iOS 18 was released.
It seems that PCC indeed went live with 18.1 - though not in Europe (which is where I am located). Thanks for the heads up, I will look into this further.
Companies like Edgeless Systems have been building open-source confidential computing for cloud and AI for years, and in 2024 they published a comparison with Apple Private Cloud Compute. https://www.edgeless.systems/blog/apple-private-cloud-comput...
Private Cloud Compute has been live in production for 8 months
It seems that PCC indeed went live with 18.1 - though not in Europe (which is where I am located). Thanks for the heads up, I will look into this further.
How large do you wager your moat to be? Confidential computing is something all major cloud providers either have or are about to have, and from there it's a very small step to offer LLMs under the same umbrella. First-mover advantage is of course considerable, but I can't help but feel that this market will very quickly be swallowed by the hyperscalers.
Cloud providers aren't going to care too much about this.
I have worked for many enterprise companies, e.g. banks who are trialling AI, and none of them have any use for something like this. The entire foundation of the IT industry is based on trusting the privacy and security policies of Azure, AWS and GCP, and in the decades since they've been around I haven't heard of a single example of them breaking this.
The proposition here is to tell a company that they can trust Azure with their banking websites, identity services and data engineering workloads but not for their model services. It just doesn't make any sense. And instead I should trust a YC startup who statistically is going to be gone in a year and will likely have their own unique set of security and privacy issues.
Also you have the issue of smaller open-source models, e.g. DeepSeek R1, lagging far behind the bigger ones, so you're giving me an unnecessary privacy attestation at the expense of a model that would give me far better accuracy and performance.
> Cloud providers aren't going to care too much about this. ... [E]nterprise companies e.g. banks ... and none of them have any use for something like this.
As a former CTO of the world's largest bank and cloud architect at the world's largest hedge fund, this is exactly the opposite of my experience with both regulated finance enterprises and the CSPs vying to serve them.
> The entire foundation of the IT industry is based on trusting the privacy and security policies of Azure, AWS and GCP. And in the decades since they've been around not heard of a single example of them breaking this.
On the contrary, many global banks design for the assumption that the CSP is hostile. What happened to Coinbase's customers over the past few months shows why your vendor's insider threat is your threat and your customers' threat.
Granted, this annoys CSPs who wish regulators would just let banks "adopt" the CSP's controls and call it a day.
Unfortunately for CSP sales teams — certainly this could change with recent regulator policy changes — the regulator wins. Until very recently, only one CSP offered controls sufficient to assure your own data privacy beyond a CSP's pinky-swears. AWS Nitro Enclaves can provide a key component in that assurance, using deployment models such as tinfoil.
I suspect Nvidia has done a lot of the heavy lifting to make this work, but it's still not trivial to wire CPU and GPU confidential compute together.
Being gobbled by the hyperscalers may well be the plan. Reasonable bet.
GCP has confidential VMs with H100 GPUs; I'm not sure if Google would be interested. And they get a huge discount buying GPUs in bulk. The trade-off between cost and privacy is obvious for most users, imo.
This. Big tech providers already offer confidential inference today.
Yes Azure has! They have very different trust assumptions though. We wrote about this here https://tinfoil.sh/blog/2025-01-30-how-do-we-compare
Last I checked, only Azure offered the Nvidia-specific confidential compute extensions. I'm likely out of date - a quick Google search was inconclusive.
Have GCP and AWS started offering this for GPUs?
GCP, yes: https://cloud.google.com/confidential-computing/confidential...
Azure and GCP offer Confidential VMs, which remove trust from the cloud provider. We're trying to also remove trust in the service provider (aka ourselves). One example: when you use Azure or GCP, by default, the service operator can SSH into the VM. We cannot SSH into our inference server, and you can check that's true.
But nobody wants you as a service provider. Everyone wants Gemini, OpenAI, etc., which are significantly better than the far smaller and less capable model you will be able to afford to host.
And you make this claim that the cloud provider can SSH into the VM but (a) nobody serious exposes SSH ports in Production and (b) there is no documented evidence of this ever happening.
We're not competing with Gemini or OpenAI or the big cloud providers. For instance, Google is partnering with NVIDIA to ship Gemini on-prem to regulated industries in a CC environment to protect their model weights as well as for additional data privacy on-prem: https://blogs.nvidia.com/blog/google-cloud-next-agentic-ai-r...
We're simply trying to bring similar capabilities to other companies. Inference is just our first product.
>cloud provider can SSH into the VM
The point we were making was that CC was traditionally used to remove trust from cloud providers, but not the application provider. We are further removing trust from ourselves (as the application provider), and we can enable our customers (who could be other startups or neoclouds) to remove trust from themselves and prove that to their customers.
You are providing the illusion of trust though.
There are a multitude of components between my app and your service. You have secured one of them, arguably the least important. But you can't provide any guarantees over, say, the API server my requests are going through, or your networking stack, which someone, e.g. a government, could MITM.
I don't know anything about "secure enclaves", but I assume that this part is sorted out. It should be possible to use HTTP with it, I imagine. If not, yeah, it is totally dumb from a conceptual standpoint.
Confidential computing as a technology will become (and should be) commoditized, so the value add comes down to security and UX. We don't want to be a confidential computing company; we want to use the right tool for the job of building private & verifiable AI. If that becomes FHE in a few years, then we will use that. We are starting with easy-to-use inference, but our goal is to have any AI application be provably private.
Does this not require one to trust the hardware? I'm not an expert in hardware root of trust, etc., but if Intel (or whatever chip maker) decides to just sign code that doesn't do what they say it does (coerced or otherwise), or someone finds a vuln, would that not defeat the whole purpose?
I'm not entirely sure this is different than "security by contract", except the contracts get bigger and have more technology around them?
We have to trust the hardware manufacturer (Intel/AMD/NVIDIA) designed their chips to execute the instructions we inspect, so we're assuming trust in vendor silicon either way.
The real benefit of confidential computing is to extend that trust to the source code too (the inference server, OS, firmware).
Maybe one day we’ll have truly open hardware ;)
Isn't this not the case for FHE? (I understand that FHE is not practically viable as you guys mention in the OP.)
Yeah not the case for FHE. But yes, not practically viable. We would be happy to switch as soon as it is.
Hi Nate. I routinely use your various networking-related FOSS tools. Surprising to see you now working in the AI infrastructure space, let alone co-founding a startup funded by YC! Tinfoil looks über neat. All the best (:
> Maybe one day we'll have truly open hardware
At least the RoT/SE if nothing else: https://opentitan.org/
Love OpenTitan! RISC-V all the way, babe! The team is stacked: several of my labmates now work there.
I agree, it's lifting trust to the manufacturer (which could still be an improvement over the cloud status quo).
Another (IMO more likely) scenario is someone finds a hardware vulnerability (or leaked signing keys) that lets them achieve a similar outcome.
The only way to guarantee privacy in cloud computing is via homomorphic encryption.
This approach relies too much on trust.
If you have data you are seriously sensitive about, its better for you to run models locally on air gapped instances.
If you think this is overkill, just see what happened to Coinbase recently. [0]
[0]: https://www.cnbc.com/2025/05/15/coinbase-says-hackers-bribed...
Yeah, totally agree with you. We would love to use FHE as soon as it's practical. And if you have the money and infra expertise to deploy air gapped LLMs locally, you should absolutely do that. We're trying to do the best we can with today's technology, in a way that is cheap and accessible to most people.
> The only way to guarantee privacy in cloud computing is via homomorphic encryption
No. The only way is to not use cloud computing at all and go on-premise.
Which is what companies around the world do today for security or privacy critical workloads.
> The only way is to not use cloud computing at all and go on-premise.
This point of view may be based on a lack of information about how global finance handles security and privacy critical workloads in high-end cloud.
Global banks and the CSPs that serve them have by and large solved this problem by the late 2010s - early 2020s.
While much of the work is not published, you can look for presentations at AWS re:Invent from e.g. Goldman Sachs or others willing to share about it, talking about cryptographic methods, enclaves, formal reasoning over not just code but things like reachability, and so on, to see the edges of what's being done in this space.
Just noticed Tinfoil runs DeepSeek-R1 "70b". Technically this is not the original 671B DeepSeek R1; it's just a Llama 70B model trained on DeepSeek R1 outputs (a "distillation").
Tinfoil hat on: say you are compelled to comply with a FISA warrant and hand over LLM data; is it technically possible? What about an Australian- or UK-style "please add a backdoor"?
I see you have to trust NVIDIA etc., so maybe there are such backdoors.
An attacker would need to compromise our build pipeline to publish a backdoored VM image [1] and extract key material to forge an attestation from the hardware [2]. The build process publishes a hash of the code to Sigstore’s transparency log [3], which would make the attack auditable.
That said, a sufficiently resourced attacker wouldn’t need to inject a backdoor at all. If the attacker already possesses the keys (e.g. the attacker IS the hardware manufacturer, or they’ve coerced the manufacturer to hand the keys over), then they would just need to gain access to the host server (which we control) to get access to the hypervisor, then use their keys to read memory or launch a new enclave with a forged attestation. We're planning on writing a much more detailed blog post about "how to hack ourselves" in the future.
We actually plan to do an experiment at DEF CON, likely next year, where we give SSH access to a test machine running the enclave and have people try to exfiltrate data from inside the enclave while keeping the machine running.
[1] https://github.com/tinfoilsh/cvmimage
[2] https://arxiv.org/abs/2108.04575
[3] https://github.com/tinfoilsh/cvmimage/attestations
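For readers wondering what "auditable" means in practice, here is a minimal Python sketch of the client-side cross-check: the measurement signed by the hardware must match the code hash the build pipeline published to the transparency log. The field name and helper semantics are illustrative assumptions, not Tinfoil's actual verifier (that lives in the open-source repos linked elsewhere in this thread).

    import hmac

    # Minimal sketch, not Tinfoil's verifier. Assumptions:
    #   attestation   - document returned by the enclave, signed by the CPU/GPU
    #                   vendor's hardware keys; its "measurement" field is a hash
    #                   of the running VM image (vendor signature check omitted here)
    #   logged_digest - the image hash published to Sigstore's transparency log
    #                   by the build pipeline
    def check_measurement(attestation: dict, logged_digest: str) -> None:
        measurement = attestation["measurement"]
        # The running code must be exactly what was publicly logged at build time.
        if not hmac.compare_digest(measurement, logged_digest):
            raise ValueError("enclave is running code that was never published")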
What's your revenue model?
The pricing page implies you're basically reselling access to confidential-wrapped AI instances.
Since you rightly open-sourced the code (AGPL) is there anything stopping the cloud vendors from running and selling access to their own instances of your server-side magic?
Is your secret sauce the tooling to spin up and manage instances and ease customer UX? Do you aim to attract an ecosystem of turnkey, confidential applications running on your platform?
Do you envision an exit strategy that sells said secret sauce and customers to a cloud provider or confidential computing middleware provider?
Ps. Congrats on the launch.
>Since you rightly open-sourced the code (AGPL) is there anything stopping the cloud vendors from running and selling access to their own instances of your server-side magic?
Sure, they can do that. Despite being open source, CC mode on GPUs is quite difficult to work with, especially when you start thinking about secrets management, observability, etc., so we'd actually like to work with smaller cloud providers who want to provide this as a service and become competitive with the big clouds.
>Is your secret sauce the tooling to spin up and manage instances and ease customer UX?
Pretty much. Confidential computing has been around a while, and we still don’t see widespread adoption of it, largely because of the difficulty. If we're successful, we absolutely expect there to be a healthy ecosystem of competitors both cloud provider and startup.
>Do you envision an exit strategy that sells that secret sauce to a cloud provider or confidential computing middleware provider?
We're not really trying to be a confidential computing provider, but rather a verifiably private layer for AI. Which means we will try to make integration points as seamless as possible. For inference, that meant OpenAI API-compatible client SDKs; we will eventually do the same for training/post-training, MCP/OpenAI Agents SDK, etc. We want our integration points to be closely compatible with existing pipelines.
> Confidential computing has been around a while, and we still don’t see widespread adoption of it, largely because of the difficulty
This is not the reason at all. Complexity and difficulty are inherent to large companies.
It's because it is a very low priority in an environment where, for example, there are tens of thousands of libraries in use, dozens of which will be in production with active CVEs. And there are many examples of similar security and risk management issues that companies have to deal with.
Worrying about the integrity of the hardware or not trusting my cloud provider, who has all my data in their S3 buckets anyway (encrypted using their keys), is not high on my list of concerns. And if it were, I would simply run on-premise anyway.
Technically my wife would be a perfect customer because we literally just prototyped your solution at home. But I'm confused.
For context:
My wife does leadership coaching and recently used vanilla GPT-4o via ChatGPT to summarize a transcript of an hour-long conversation.
Then, last weekend we thought... "Hey, let's test local LLMs for more privacy control. The open source models must be pretty good in 2025."
So I installed Ollama + Open WebUI plus the models on a 128GB MacBook Pro.
I am genuinely dumbfounded by the actual results we got today comparing ChatGPT/GPT-4o vs. Llama 4, Llama 3.3, Llama 3.2, DeepSeek R1 and Gemma.
In short: Compared to our reference GPT-4o output, none (as in NONE, zero, zilch, nil) of the above-mentioned open source models were able to create even a basic summary based on the exact same prompt + text.
The open source summaries were offensively bad. It felt like reading the most bland, generic and idiotic SEO slop I've read since I last used Google. None of the obvious topics were part of the summary. Just blah. I tested this with 5 models to boot!
I'm not an OpenAI fan per se, but if this is truly open-source SOTA, then we shouldn't even mention Llama 4 or the others in the same breath as the newer OpenAI models.
What do you think?
Ollama does heavily quantize models and has a very short context window by default, but this has not been my experience with unquantized, full-context versions of Llama 3.3 70B and particularly DeepSeek R1, and that is reflected in the benchmarks. For instance, I used DeepSeek R1 671B as my daily driver for several months, and it was on par with o1 and unquestionably better than GPT-4o (o3 is certainly better than all of them, but typically we've seen open-source models catch up within 6-9 months).
Please shoot me an email at tanya@tinfoil.sh, would love to work through your use cases.
Excited to see someone finally doing this! I can imagine folks with sensitive model weights being especially interested.
Do you run into rate limits or other issues with TLS cert issuance? One problem we had when doing this before is that each spinup of the enclave must generate a fresh public key, so it needs a fresh, publicly trusted TLS cert. Do you have a workaround for that, or do you just have the enclaves run for long enough that it doesn’t matter?
We actually run into the rate limit issue often, particularly while spinning up new enclaves during debugging. We plan on moving to HPKE (https://www.rfc-editor.org/rfc/rfc9180.html) over the next couple of months. This will let us generate keys inside the enclave and encrypt the payload to enclave-specific keys, while letting us terminate TLS in a proxy outside the enclave. All the data is still encrypted to the enclave using HPKE (and still verifiable).
This will let us fix the rate limit issue.
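For the curious, a rough sketch of what that HPKE flow could look like from the client side is below. The function names (verify_attestation, hpke_seal, hpke_open, post) are hypothetical placeholders, not Tinfoil's SDK; the point is that the request is sealed to a key that only exists inside the verified enclave, so a TLS-terminating proxy in front of it never sees plaintext.

    # Illustrative sketch with hypothetical helpers -- not Tinfoil's actual API.
    # Assumes verify_attestation() checks the hardware signature and code
    # measurement, then returns the HPKE public key generated inside the enclave,
    # and hpke_seal()/hpke_open() implement RFC 9180 single-shot encryption.
    def send_private_request(enclave_url: str, prompt: bytes) -> bytes:
        enclave_pubkey = verify_attestation(enclave_url)
        enc, ciphertext = hpke_seal(enclave_pubkey, prompt)
        # The proxy outside the enclave only sees (enc, ciphertext); it can
        # load-balance and terminate TLS without learning the prompt.
        reply = post(enclave_url, {"enc": enc, "ct": ciphertext})
        return hpke_open(reply)   # decrypt the enclave's response

With that design, TLS becomes transport hygiene rather than the confidentiality boundary, which is what makes front-end proxying and load balancing possible without widening trust.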
NVIDIA shared open-source solutions for confidential AI as early as mid-2024: https://developer.nvidia.com/blog/advancing-security-for-lar...
This is fantastic. One rarely discussed use case is avoiding overzealous "alignment" - you want models to help advance your goals without arbitrary refusals for benign inputs. Why would I want Anthropic or OpenAI to have filtering authority over my queries? Consider OpenRouter ToS - "you agree not to use the Service [..] in violation of any applicable AI Model Terms": not sure if they actually enforce it but, of course, I'd want hardware security attestations that they can't monitor or censor my inputs. Open models should be like utilities - the provider supplies the raw capability (e.g., electrons or water or inference), while usage responsibility remains entirely with the end user.
That's a big reason why we started Tinfoil and why we use it ourselves. I love the utilities analogy, something that is deeply integrated in business and personal use cases (like the Internet or AI) needs to have verifiable policies and options for data confidentiality.
Great work! I'm interested to know where the GPU servers are located. Are they in the US; do you run your own datacenter or rent servers on the hyperscalers?
Yes, in the US right now. We don't run our own datacenters, though we sometimes consider it in a moment of frustration when the provider is not able to get the correct hardware configuration and firmware versions. Currently renting bare metal servers from neoclouds. We can't use hyperscalers because we need bare metal access to the machine.
Thanks, that's great to know. BTW, does a user need to trust the neoclouds, in case they install malicious hardware/firmware/software on the servers?
That's the best part, you don't. You only need to trust NVIDIA and AMD/Intel. Modulo difficult-to-mount physical attacks and side channels, which we wrote more about here: https://tinfoil.sh/blog/2025-05-15-side-channels
So impressive - cloud AI that is verifiable with zero trust assumptions is going to be game-changing regardless of the industry application. Looks like it could be used by anyone for making anything trustworthy.
> with zero trust assumptions
It's not that though. Not close. You are trusting the chip maker, whose process is secret (actually worse, it's almost certainly shared with the state).
We do have to trust the chip maker until open hardware catches up [1].
[1] https://news.ycombinator.com/item?id=43997856
Even if you had open hardware, how would you even know a chip you have sitting in front of you was fabricated correctly?
Check out the incredible work by Bunnie to make this possible at home: https://www.bunniestudios.com/blog/2024/iris-infra-red-in-si...
> the client fetches a signed document from the enclave which includes a hash of the running code signed
Why couldn't the enclave claim to be running an older hash?
This is enforced by the hardware (that's where the root of trust goes back to NVIDIA+AMD). The hardware will only send back signed enclave hashes of the code it's running and cannot be coerced by us (or anyone else) into responding with a fake or old measurement.
That's impressive, congrats. You've taken the "verifiable security" concept to the next level. I'm working on a similar concept, without the "verifiable" part... trust remains to be built, but adding RAG and fine-tuned models to the use of open-source LLMs, deployed in the cloud: https://gptsafe.ai/
Is there a frozen client that someone could audit for assurance, then repeatedly use with your TEE-hosted backend?
If instead users must use your web-served client code each time, you could subtly alter that over time or per-user, in ways unlikely to be detected by casual users, who'd then again be required to trust you (Tinfoil), rather than the goal of only having to trust the design & chip manufacturer.
Yes, we have a customer who is indeed interested in having a frozen client for their app, which we're making possible. We currently have not frozen our client because we're in the early days and want to be able to iterate quickly on functionality. But happy to do so on a case-by-case basis for customers.
> rather than the goal on only having to trust the design & chip-manufacturer
If you'd rather self-host, then the HazyResearch Lab at Stanford recently announced a FOSS e2ee implementation ("Minions") for Inference: https://hazyresearch.stanford.edu/blog/2025-05-12-security / https://github.com/HazyResearch/Minions
How is running a model on-prem more expensive than in the cloud? Are you including training costs?
Edit: perhaps because you don't need the model to be available all the time? In that case, yeah, the cloud can be cheaper.
> https://docs.tinfoil.sh/verification/attestation-architectur...
I tried taking a look at your documentation, but the site search is very slow and laggy in Firefox.
Interesting, we haven't noticed that (on Firefox as well). We'll look into it!
It looks like it might be the blur effect in a VM with no Firefox video acceleration. Also, email to support@tinfoil.sh (from "contact" link) just bounced back to me.
Ah we don't have support@tinfoil.sh set up yet. Can you try contact@tinfoil.sh?
Set up *@ and sort it later. Ask an intern to monitor that box after lunch for a while. Catchall. You probably know this, but for anyone else thinking of doing email for their business.
For example if you do tools or RAG you probably ought have abuse@ as well, even though only 4 people will think to email that.
Ha we didn't think of that, thanks for the tip!
Does the secure enclave also perform the TLS encryption on data leaving the enclave?
Also, if you're decoding TLS on the enclave, wouldn't that imply that you're parsing HTTP and JSON on the GPU itself? Very interesting if true.
The verified trust boundary extends from the CPU to GPU [1], and TLS encrypts all data to/from the enclave and client so we can't see anything in the clear.
HTTP parsing and application logic happens on the CPU like normal. The GPU runs CUDA just like any other app, after its integrity is verified by the CPU. Data on the PCIe bus between the CPU and GPU is encrypted too.
[1] https://github.com/NVIDIA/nvtrust/blob/main/guest_tools/atte...
Could you talk more about how this works? I don't think the linked article gives enough detail on how the trust boundary extends from CPU to GPU.
Does the CPU have the ability to see unencrypted data?
The keys are generated on the CPU and never leave the enclave, but the data is decrypted on the CPU so it hits the registers in plaintext.
When the enclave starts, the CPU does a few things (sketched below):
1. The CPU does a key exchange with the GPU (in confidential compute mode [1]) to derive a key to encrypt data over PCIe
2. The CPU verifies the integrity of the GPU against NVIDIA's root of trust [2]
[1] https://developer.nvidia.com/blog/confidential-computing-on-...
[2] https://github.com/tinfoilsh/cvmimage/blob/b65ced8796e8a8687...
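Put together, the startup sequence above looks roughly like the following pseudocode. The function names are placeholders standing in for the SEV-SNP/TDX and nvtrust tooling, not a real API.

    # Pseudocode sketch of the boot-time flow (placeholder names, not a real API).
    def enclave_startup(gpu):
        # The CPU TEE (SEV-SNP/TDX) has already measured the VM image and can
        # produce a signed attestation report over that measurement on demand.

        # 1. Key exchange with the GPU in confidential-compute mode, so all
        #    PCIe traffic between CPU and GPU is encrypted with a session key.
        session_key = spdm_key_exchange(gpu)

        # 2. Verify the GPU's attestation report against NVIDIA's root of trust
        #    before sending it any plaintext data.
        if not verify_against_nvidia_root(gpu.get_attestation_report()):
            raise RuntimeError("GPU failed attestation; refusing to start")

        # Only then does the inference server start: decrypted prompts exist in
        # CPU registers/memory inside the enclave and cross PCIe only under
        # the session key.
        start_inference_server(session_key)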
You're not terminating the TLS connection from the client anywhere besides the enclave? How do you load balance or front end all of this effectively?
>You're not terminating the TLS connection from the client anywhere besides the enclave?
Yes.
>How do you load balance or front end all of this effectively?
We don't, at least not yet. That's why all our model endpoints have different subdomains. In the next couple of months, we're planning to generate a keypair inside the enclave using HPKE that will be used to encrypt the data, as I described in this comment: https://news.ycombinator.com/item?id=43996849
This is an incredibly robust solution to a really pressing problem for a lot of individuals/orgs who want to use/deploy reasonably powerful LLMs without paying through the nose for hardware. Others have mentioned the hyperscalers have solutions that make some amount of sense (Azure confidential computing, AWS nitro enclaves) but if you read a bit more about Tinfoil, it is clear they want to operate with far less explicit user trust (and thus much better security). This team is setting the standard for provably private LLM inference, and to me, it makes other solutions seem half-baked by comparison. Props to this talented group of people.
Hasn't iExec (a French company) been doing this for years? What's your competitive advantage or moat, considering they are the first mover?
Their GTM doesn't include a $ in front of their company acronym.
I think there is similarity to https://www.anjuna.io/ and https://www.opaque.co/ here. I've heard of these, never iExec.
CPU-based TEEs (AWS Nitro Enclaves, AMD SEV, Intel TDX) have been around for a few years, but aren’t widely used because they are more akin to primitives than fully baked security solutions. We are trying to make this as user friendly and self serve as possible, with full verifiability by open sourcing the entire server that runs inside the enclave. So far we have not found any end to end verifiably private solution on the market that we could just sign up for to try, which was a big reason we started Tinfoil in the first place. We also strongly believe that verifiably private AI should be the norm, so the more players in the space, the better because a missing piece is market awareness and convincing folks this is actually possible and real.
Been building something along these lines for a while. At Qbix, we call it our QBOX. Full stack, using Nix for the base, and Nitro attestation. No SSH. We have the exact same approach — cron running and only downloading signed scripts and binaries from endpoints. But there is a lot more… Would be great to connect and maybe join forces.
Want to connect some time next Tuesday or Wednesday? https://calendly.com/qbix/meeting
Yes, excited to connect, scheduled a call! We used Nitro back in December when we were prototyping but moved to NVIDIA CC because we wanted to support LLMs.
Heh. I sold Tinfoil Security to synopsys in January 2020.
We should chat. :)
Aye comrade, shot you an email :)
How do you see this comparing to things like Amazon Bedrock, where it runs OSS models in my own infra?
Bedrock has strong contractual guarantees, but it's still only a legal contract and runs on AWS infra. This is certainly okay for many use cases; we're trying to build for users who want verifiable privacy guarantees beyond legal contracts.
We're also doing more than pure inference, and trying to work with other companies who want to provide their users additional verifiability and confidentiality guarantees by running their entire private data processing pipeline on our platform.
great name. good idea if it works.
Here is a marketing campaign for you to prove that secure enclaves work.
Host a machine on the internet. Allow competitors to sign up to receive root ssh credentials. Offer a $10K prize if they are able to determine plaintext inputs and outputs over a given time period (say one month).
A bit of a strawman, but a competition like this might help build confidence.
That's exactly our plan for Defcon next year as Nate just mentioned: https://news.ycombinator.com/item?id=44000103
But making it a public competition is a fantastic idea.
Would it be possible to run something like vLLM or TensorRT-LLM with Tinfoil?
We're already using vLLM as our inference server for our standard models. We can run whatever inference server is needed for custom deployments.
This is a great concept but I think "Enterprise-Ready Security" and your competitive comparison chart are kind of misleading. Yes, zero trust is huge. But, virtually everyone who has a use case for max privacy AI, has that use case because of compliance and IP concerns. Enterprise-Ready Security doesn't mean sigstore or zero trust, it means you have both the security at a technical level as well as certification by an auditor that you do.
You aren't enterprise-ready, because to address those concerns you need the laundry list of compliance certs: SOC 2 Type 2, ISO 27001/27002 and 9001, HIPAA, GDPR, CMMC, FedRAMP, NIST, etc.
We're going through the audit process for SOC 2 right now and are planning on doing HIPAA soon.
As a user, can I host the attestation server myself?
All attestation verification happens client side. We have verifiers in Python [1] and Go [2] (which FFIs to our other SDKs like WASM and Swift). We push all the verification logic to the client so the verification process is entirely transparent and auditable.
[1] https://github.com/tinfoilsh/tinfoil-python [2] https://github.com/tinfoilsh/verifier
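As a usage-level illustration (hypothetical helper and endpoint names, illustrative model id; see the linked repos for the real SDK surface), the intended shape is: verify the enclave client-side first, then talk to it like any OpenAI-compatible endpoint.

    from openai import OpenAI   # the inference endpoints are OpenAI API-compatible

    ENDPOINT = "https://deepseek.example.tinfoil.sh"   # illustrative subdomain

    # 1. Client-side verification: check the hardware-signed attestation and that
    #    the measurement matches the hash in the transparency log. This is what
    #    the Python/Go verifiers above implement; verify_enclave() here is a
    #    hypothetical wrapper that raises if any check fails. (Pinning the
    #    verified identity to the TLS connection is elided.)
    verify_enclave(ENDPOINT)

    # 2. Then call the endpoint like any OpenAI-compatible API.
    client = OpenAI(base_url=f"{ENDPOINT}/v1", api_key="...")
    reply = client.chat.completions.create(
        model="deepseek-r1-70b",   # illustrative model id
        messages=[{"role": "user", "content": "Summarize this transcript ..."}],
    )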
I love the brand and logo
Are you HIPAA compliant?
Not yet. We're about one week away from SOC 2, and will pursue HIPAA, which is arguably easier, next.
Also curious about the potential users of your product: do you target individual users, small businesses, or large enterprises? Pursuing SOC 2 and HIPAA makes me think about the large ones; but aren't they already happy using hyperscalers?
Not to mention GCP and Azure both have confidential GPU offerings. How do you compete against them, as well as some startups mentioned in other comments like Edgeless Systems and Opaque Systems?