As someone who's been doing Infra stuff for two decades, I find this very exciting. There is a lot of mindless BS we have to deal with because of shitty tools and services, and AI could save us a lot of time that we'd rather spend creating meaningful value.
There is still benefit for non-Infra people. But non-Infra people don't understand system design, so the benefits are limited. Imagine a "mechanic AI". Yes, you could ask it all sorts of mechanic questions, and maybe it could even do some work on the car. But if you wanted to, say, replace the entire engine with a different one, that is a systemic change with farther-reaching implications than an AI will explain, much less perform competently. You need a mechanic to stop you and say: uh, no, please don't change the engine; explain to me what you're trying to do and I'll help you find a better solution. Then you need a real mechanic to manage changing the tires on the moving bus so it doesn't crash into the school. But having an AI could make all of that go smoother for the mechanic.
Another thing I'd love to see more of is people asking the AI for advice. Most devs seem to avoid asking Infra people for architectural/design advice. This leads to them putting together a system using their limited knowledge, and it turns out to be an inferior design to what an Infra person would have suggested. Hopefully they will ask AI for advice in the future.
Glad you find it interesting. A surprising way people are using us right now: people who are technical but don't have deep infrastructure expertise are asking datafruit questions about how things should be done.
Something we’ve been dealing with is trying to get the agents to not over-complicate their designs, because they have a tendency to do so. But with good prompting they can be very helpful assistants!
Yeah it's def gonna be hard. So much of engineering is an amalgam of contexts, restrictions, intentions, best practice, and what you can get away with. An agent honed by a team of experts to keep all those things in mind (and force the user to answer important questions) would be invaluable.
Might be good to train multiple "personalities": one's a startup codebro that will tell you the easiest way to do anything; another will only give you the best practice and won't let you cheat yourself. Let the user decide who they want advice from.
Going further: input the business's requirements first, let that help decide? Just today I was on a call where somebody wants to manually deploy a single EC2 instance to run a big service. My first question is, if it goes down and it takes 2+ days to bring it back, is the business okay with that? That'll change my advice.
Yes definitely! That's why we believe the agents, for the time being, will act as great junior devs that you can offload work onto; as they get better, they can slowly be promoted into more active roles.
The personalities approach sounds fun to experiment with. I'm wondering if you could use SAEs to scan for a "startup codebro" feature in language models. Alas, this isn't something we'll get to look into unless we decide that fine-tuning our own models is the best way to make them better. For now we are betting on in-context learning.
Business requirements are also incredibly valuable. Notion, Slack, and Confluence hold a lot of context, but it can be hard to find. This is something that I think the subagents architecture is great for though.
I can see the value, but to do the things you're describing, the AI needs to be given fairly highly-privileged credentials.
> Right now, Datafruit receives read-only access to your infrastructure
> "Grant @User write access to analytics S3 bucket for 24 hours" > -> Creates temporary IAM role, sends least-privilege credentials, auto-revokes tomorrow
These statements directly conflict with one another.
So it needs "iam:CreateRole," "iam:AttachPolicy," and other similar permissions. Those are not "read-only." And, they make it effectively admin in the account.
What safeguards are in place to make sure it doesn't delete other roles, or make production-impacting changes?
Ahh, to clarify: changes like granting users access would be done by our agent modifying IaC, so you would still have to manually apply the changes. Making every potentially destructive change an IaC change keeps humans in the loop. This admittedly makes the agents a little more annoying to work with, but safer.
So you’re modifying Terraform? How is your tool better than just using an AI-enabled IDE and asking it to apply the change?
How is the auto-revoke handled? Will it require human intervention to merge a PR/apply the Terraform configuration, or will it do it automatically?
Lots of people have asked us this! We try to do more than an AI-enabled IDE by giving the agent access to your infrastructure and observability tools. So you can query your AWS account, get information about metrics over the past few days, etc. We also plan to integrate with more DevOps tools as our customers ask for them. And we try to be less like an IDE and more like an autonomous agent. We've noticed that DevOps engineers actually like being engineers and enjoy some infrastructure tasks, while there are others they would rather automate away. Not sure if you have experienced this sentiment?
Also, auto-revoke right now can be handled by creating a role in Terraform that can be assumed and expires after a certain time. But we’re exploring deeper integrations with identity providers like Okta to handle this better.
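A rough sketch of what that Terraform pattern can look like (the role name, principal ARN, and bucket below are invented placeholders, not our actual output): the trust policy itself refuses sts:AssumeRole once a timestamp computed at apply time has passed, using the hashicorp/time provider.

    terraform {
      required_providers {
        aws  = { source = "hashicorp/aws" }
        time = { source = "hashicorp/time" }
      }
    }

    # Expiry timestamp: 24 hours after this change is applied.
    resource "time_offset" "grant_expiry" {
      offset_hours = 24
    }

    # Temporary role the user can assume; the trust policy rejects
    # sts:AssumeRole once the expiry timestamp has passed.
    resource "aws_iam_role" "temp_analytics_write" {
      name = "temp-analytics-write" # placeholder name
      assume_role_policy = jsonencode({
        Version = "2012-10-17"
        Statement = [{
          Effect    = "Allow"
          Principal = { AWS = "arn:aws:iam::111122223333:user/example-user" } # placeholder
          Action    = "sts:AssumeRole"
          Condition = {
            DateLessThan = { "aws:CurrentTime" = time_offset.grant_expiry.rfc3339 }
          }
        }]
      })
    }

    # Least-privilege write access scoped to the one bucket.
    resource "aws_iam_role_policy" "analytics_write" {
      name = "analytics-s3-write"
      role = aws_iam_role.temp_analytics_write.id
      policy = jsonencode({
        Version = "2012-10-17"
        Statement = [{
          Effect = "Allow"
          Action = ["s3:PutObject", "s3:GetObject", "s3:ListBucket"]
          Resource = [
            "arn:aws:s3:::example-analytics-bucket", # placeholder bucket
            "arn:aws:s3:::example-analytics-bucket/*"
          ]
        }]
      })
    }

One caveat: an expired trust condition only blocks new AssumeRole calls; credentials from sessions already established keep working until their session duration runs out, and the role itself still needs a follow-up change to actually be removed.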
IMO it is a smart decision to implement this as a self-hosted system, and have the AI make PRs against the IaC configuration - for devops matters, human-in-the-loop is a high priority. I'm curious how well this would work if I'm using Pulumi or the AWS CDK (both are well-known to LLMs).
I consulted for an early stage company that was trying to do this during the GPT-3 era. Despite the founders' stellar reputation and impressive startup pedigree, it was exceedingly difficult to get customers to provide meaningful read access to their AWS infrastructure, let alone the ability to make changes.
LLMs are pretty awesome at Terraform, probably because there is just so much training data. They are also pretty good, to a somewhat lesser extent, at the AWS CDK and Pulumi, but I think giving them access to documentation is what makes them the most accurate. Without good documentation the models start to hallucinate a bit.
And yeah, we are noticing that it’s difficult to convince people to give us access to their infrastructure. I hope that a BYOC model will help with that.
Congrats on the launch. As a former CI build engineer, I'm very curious about this and look forward to watching your progress. One question:
> we’ve talked to a couple of startups where the Claude Code + AWS CLI combo has taken their infra down
Do you care to share what language model(s) you use?
Thank you! We currently use mainly Claude Sonnet, with Opus for more difficult tasks. We experimented with GPT-5 when it came out, but we need to run more experiments to see if it's better. Better evals are something we're working on before we experiment too much with different models!
> (1) automated infrastructure audits— agents periodically scan your environment to find cost optimization opportunities, detect infrastructure drift, and validate your infra against compliance requirements.
Why does that need an AI? I’m pretty sure many tools for those things exist, and they predate LLMs.
Glad you mentioned this! We do use open source rule-based scanners internally to make it more deterministic. This is also a new feature, and we'd probably want to integrate with existing tools rather than competing with them. We do think there are some benefits of using LLMs though.
I think the power language models introduce is the ability to integrate app code more tightly with the infrastructure. They can read YAML, shell scripts, or ad-hoc wiki policies and map them to compliance checks, for example.
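As a sketch of what that mapping might produce (the bucket name, variable, and policy ID are invented), an ad-hoc wiki rule like "analytics data must never be world-readable" can become a plan-time precondition, so the policy is enforced before anything is applied:

    variable "analytics_acl" {
      type    = string
      default = "private"
    }

    resource "aws_s3_bucket" "analytics" {
      bucket = "example-analytics-bucket" # invented name
    }

    # Hypothetical wiki policy "Data Handling #3" mapped to a plan-time check.
    # (Newer AWS providers may also require ownership controls for bucket ACLs.)
    resource "aws_s3_bucket_acl" "analytics" {
      bucket = aws_s3_bucket.analytics.id
      acl    = var.analytics_acl

      lifecycle {
        precondition {
          condition     = !contains(["public-read", "public-read-write"], var.analytics_acl)
          error_message = "Wiki policy 'Data Handling #3': analytics data must never be world-readable."
        }
      }
    }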
I think you are underestimating the nuances of non-FAANG infrastructure. Also, based on my previous experience, you will meet developer resistance (maybe AI can help you beat that). By being broad you are also competing with purpose-built solutions like FinOps and DevSecOps tools, which also seem to have agents now.
At the end of the day, it is workflow automation. I would rather pick a SOAR or AI-SOC product where automation like this is very common, e.g. BlinkOps or Torq.
That's fair. For what it's worth, our agents are being used by small startups in the YC batch and they have been helpful for them.
We have not spent as much time working in the security space, and I do think that purpose-built solutions are better if you only care about security. We are purposefully trying to stay broad, which might mean that our agents lack depth in specific verticals.
I wouldn't index on these startups. The people who would pay big bucks are in enterprise. That's largely your market.
Totally agree, enterprise is where the most $ is to be made, but from what we've found, enterprises care a lot about a vendor doing one specific thing very well. This has been something we've been thinking about. For now we've enjoyed working with startups, as they have very interesting challenges that only appear at smaller scale.
I'm very excited about your company. Would be fun to chat about GTM with you guys.
Would love to! I think I just found and added you on LinkedIn
Congrats on the launch! Excited to see you guys adopt a BYOC distribution model
thank you!
Really great stuff! Congrats on the launch!
There have been a lot of attempts to make products like these, but this kind of product almost always has one problem: nobody is really sure about the access privileges it requires to operate, or what it does on its backend with those privileges.
That's a fair point. For us, we give it read-only privileges, which gives the agent the context of your infrastructure without the capability to break things. But I do see a world where we give it more access with additional safeguards.
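Concretely, that grant takes the standard shape below (the account ID, external ID, and role name are placeholders, not our real values): a cross-account role that trusts the vendor account, gated by an external ID, with only the AWS managed ReadOnlyAccess policy attached.

    # Cross-account role the agent assumes; it can read, not write.
    resource "aws_iam_role" "agent_readonly" {
      name = "agent-readonly" # placeholder
      assume_role_policy = jsonencode({
        Version = "2012-10-17"
        Statement = [{
          Effect    = "Allow"
          Principal = { AWS = "arn:aws:iam::123456789012:root" } # vendor account, placeholder
          Action    = "sts:AssumeRole"
          Condition = { StringEquals = { "sts:ExternalId" = "example-external-id" } }
        }]
      })
    }

    # AWS managed read-only policy; no mutating permissions.
    resource "aws_iam_role_policy_attachment" "agent_readonly" {
      role       = aws_iam_role.agent_readonly.name
      policy_arn = "arn:aws:iam::aws:policy/ReadOnlyAccess"
    }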
As an SRE/Ops person: *sigh* *checks the founder list and starts internally screaming*
YC, you want the founders of these companies to have 10 years working at Ford Motor Company. It's all the more reason I want to write my blog article, "FAANG, please STFU. I wish I could be focused on 100k requests per second, but instead I'm dealing with engineers who have no idea why their ORM is creating terrible queries. Please stop telling them about GraphQL."
"Grant @User write access to analytics S3 bucket for 24 hours" Can the user even have access to this? Do they need write access or can't understand why they are getting errors on read? What happens when they forget in 30 days they asked your LLM for access and now their application does not work because they decided to borrow this S3 bucket instead of asking for one of their own. Yes this happened.
"Find where this secret is used so I can rotate it without downtime" Well, unless you are scanning all our Github repos, Kubernetes secret and containers, you are going to miss the fact this secret was manually loaded into Kubernetes/loaded into flat file in Docker container or stored in some random secret manager none of us are even aware of.
""Why did database costs spike yesterday?" -> Identifies expensive queries, shows optimization options, implements fixes
How? Likely it's because of a bad schema or a lack of understanding of ORMs. The fix is going to be a PR somewhere to a dev who probably doesn't understand what they're reviewing.
Most of our headaches come from the fact that devs almost never give a shit about Ops, their bosses don't give a shit about Ops, and Ops is desperately trying to keep this train, which is on fire, from derailing. We don't need AI YOLOing more stuff into prod; we need AI to tell their bosses what the downtime they are causing is costing the company so maybe, just maybe, they will actually care.
These are fair criticisms. I will say, while each of these examples is a challenging problem for agents to carry out, I do believe they can be solved, especially with tighter integration with app code.
We are always trying to learn from our customers' feedback. What we've learned so far is that infra setups are all extremely different, and what works for some companies doesn't work for others. There are also vastly different company cultures around ops. Some companies value their ops team a lot; other companies burden them with way too much work. Our goal is to make that burden a little lighter :)
Translation: The AWS interface is so horrendously complicated that we now need an AI to navigate it.
Also, as a daily AI user (claude code / codex subs), I'm not sure I want YOLO AIs anywhere near my infra.
We use Azure and sometimes Hetzner. I don't think Azure is a bad product, but it sometimes amazes me just how many different ways they can let you buy something as simple as a "load balancer". Azure obviously has a lot of services that Hetzner does not, but for 95% of what we need in our cloud infra, Hetzner does just fine and it's soooooooo much simpler.
I don't mind letting AIs help with infra, but only with the configs and infra-as-code files, and it will never have any form of access to anything outside its little box. It's significantly faster at writing out the port ranges for an FTP (don't ask) ingress than I am by hand.
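(For the curious, this is the kind of block I mean. The names and the passive port range below are made up, since the range is whatever your FTP server is configured to use.)

    # Passive-mode FTP needs the control port plus the whole passive data
    # range opened; tedious to type by hand, trivial for the AI to draft.
    resource "azurerm_network_security_rule" "ftp_ingress" {
      name                        = "allow-ftp-passive"   # made-up name
      priority                    = 200
      direction                   = "Inbound"
      access                      = "Allow"
      protocol                    = "Tcp"
      source_port_range           = "*"
      destination_port_ranges     = ["21", "30000-30100"] # passive range is server-specific
      source_address_prefix       = "VirtualNetwork"      # tighten to your real client CIDRs
      destination_address_prefix  = "*"
      resource_group_name         = "example-rg"          # made-up
      network_security_group_name = "example-nsg"         # made-up
    }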
AWS has created a whole economy of companies whose job is to make the dashboard more tolerable. Hopefully our agents help with that haha.
> Translation: The AWS interface is so horrendously complicated that we now need an AI to navigate it.
that's because infrastructure is complicated. the AWS console isn't that bad (it's not great, and you should just use terraform whenever possible because clickops is dull, error-prone work); there's just a lot to know in order to deploy infrastructure cost-effectively.
this is more like "we don't want to hire infra engineers who know what they're doing so here's a tool to make suggestions that a knowledgeable engineer would make, vet and apply. just Trust Us."
I've always heard the theory that if you're not ashamed of your launch announcement then you've launched too late, but a page with just "Book a Call" stretches plausibility as to who could possibly be in the target demographic.
I know dang is going to shake his finger at me for this, but come on.
Also:
> AWS emulator
isn't doing you any favors. I, too, have tried localstack and I can tell you firsthand it is not an AWS emulator. That doesn't even get into the fact that AWS is not DevOps, so what's up: is it AWS-only, or does it have GCP emulation too?
That's my whole point about the leading observation: without proper expectation management, how could anyone who spots this Launch HN possibly know if they should spend the time to book a call with you?
I'm not sure I understand the criticism here, but let me try to address what I think you (might?) mean, and I hope it doesn't come across as shaking a finger!
You're right that the bar is higher for Launch HNs (I wrote about this here: https://news.ycombinator.com/item?id=39633270) - but it's not uncommon for a startup to have a working product and real customers and yet have a home page that just says "book a call".
For some early-stage startups it makes sense to focus on iterating rapidly based on feedback from a few customers, and to defer building what used to be called the "whole product" (including self-serve features, a complete website, etc.) until later. It's simply about prioritizing higher-risk things and deferring lower-risk things.
I believe this is especially true for enterprise products, since deployment, onboarding, etc. are more complex and usually require personal interaction (at least in the early stages).
In such cases, a Launch HN can still make sense because the startup is real, the product is real, and there are real customers. But since the product can't be tried out publicly, I tell the founders they need a good demo video, and I usually tell them to add to their text an explanation of why the product isn't publicly available yet, as well as an invitation to contact them if people want to know more or want to be an early adopter. (You'll notice that both of those things are present in the text above!)
https://www.adafruit.com/trademarks
https://www.uspto.gov/trademarks/search/likelihood-confusion
> Trademarks don’t have to be identical to be confusingly similar. Instead, they could just be similar in sound, appearance, or meaning, or could create a similar commercial impression.