Cloud providers in general haven't gone very far toward providing hooks for validation.
It seems like it would be easier for cloud providers to implement the equivalent of a dry-run flag in their API calls, one that validates whether the call would succeed (even as a best-effort determination), which tools like Terraform could then use during planning and dependency-tree generation.
Instead, you have platform providers like AzureRM that squint at the supplied objects and guess whether they look valid, which causes a ton of failures at actual apply time. For instance, if you try to create storage with a redundancy level that isn't supported by the region you're adding it to, Terraform will pass the plan stage, but applying the resource will fail because the region doesn't support that level of redundancy.
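For concreteness, here is a hedged sketch of the kind of config that can plan cleanly and still fail at apply. The names are made up, the region is just a stand-in, and whether a given region supports a given redundancy level changes over time:

```hcl
resource "azurerm_resource_group" "example" {
  name     = "rg-redundancy-demo"
  location = "westindia" # stand-in for a region without zone-redundant storage
}

resource "azurerm_storage_account" "example" {
  name                     = "stredundancydemo"
  resource_group_name      = azurerm_resource_group.example.name
  location                 = azurerm_resource_group.example.location
  account_tier             = "Standard"
  account_replication_type = "GZRS" # the provider can't verify regional support at plan time
}
```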
There are countless other examples in a similar vein, all of which could be resolved if API providers had a dry-run flag.
The most confusing part of terraform for me is that terraform's view of the infrastructure is a singleton config file that is often stored in that very infrastructure. And then you have to share that somehow with your team and be very careful that no one gets it out of sync.
Why don't cloud providers have a nice way for tools like TF to query the current state of the infra? Maybe they do and I'm doing IaC wrong?
At $WORK we have a Git repo set up by the devops team, where we can manage our junk by creating Terraform resources in our main AWS account.
The state however is always stored in a _separate AWS account_ that only the devops team can manage. I find this to be a reasonable way of working with TF. I agree that it is confusing though, because one is using $PROVIDER to both create things and manage those things at the same time, but conceptually from TF’s perspective they are very different things.
There are three things: the code, the recorded state of the infra at the time you applied the code, and the actual state at some point in the future (which may have drifted). You store the code in git and the recorded state (which contains unique IDs, ARNs, etc.) in a bucket, and you read the "actual state" the next time you run a plan, which is how you detect drift.
These days people store the state in Terraform Cloud or Spacelift or env0 or whatever. It doesn't have to be the same infra you deployed.
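For reference, pointing Terraform at a remote state bucket is just a small backend block; the bucket, key, and table names here are hypothetical:

```hcl
terraform {
  backend "s3" {
    bucket         = "my-team-terraform-state"            # can live in a separate "state" account
    key            = "prod/networking/terraform.tfstate"
    region         = "eu-west-1"
    dynamodb_table = "terraform-locks"                    # optional state locking
    encrypt        = true
  }
}
```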
If you were a lunatic, you could skip the state backend entirely and just let Terraform create state files in the code directory, then check the file into git with all those secrets and unique IDs, etc.
One big reason I tend to build on GCP instead of AWS is that it's much easier to use with Terraform. GCP's APIs are generally defined as semantic units, while AWS has ad-hoc resources that get strung together by the console or CLIs, not the APIs. For example, a k8s cluster in AWS takes a dozen resources, while in GCP it's just one.
There are third-party (I think) Terraform modules that try to abstract the AWS world into an easier-to-use interface, but they can't really solve the underlying problem: in the end Terraform manages resources, and orchestrating changes, including deletion, across a dozen resources is much harder than across a single one.
GCP is huge, so I wouldn't be surprised if there are also problematic units there with less clean definitions. But I would still argue that there are cloud providers that present a reasonable view into their infra for IaC.
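To illustrate the granularity difference mentioned above (names and sizes here are arbitrary, and real setups usually add more options):

```hcl
# GCP: one resource stands up a working cluster.
resource "google_container_cluster" "demo" {
  name               = "demo-cluster"
  location           = "europe-west1"
  initial_node_count = 3
}

# AWS: the rough equivalent typically spans aws_eks_cluster, aws_eks_node_group,
# several aws_iam_role / aws_iam_role_policy_attachment resources, plus the
# VPC, subnets, and security groups they depend on.
```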
> Why don't cloud providers have a nice way for tools like TF to query the current state of the infra? Maybe they do and I'm doing IaC wrong?
This is technically how Ansible works. Here's an extensive list of modules that deploy resources in various public clouds: https://docs.ansible.com/projects/ansible/2.9/modules/list_o...
That said, it looks like Ansible has deprecated those modules, and that seems fair; I haven't actually heard of anyone deploying infrastructure in a public cloud with Ansible in years. It found its niche in image generation and systems management. Almost all modern tools like Terraform, Pulumi, and even CloudFormation (albeit under the hood) keep a state file.
I think there are actively maintained modules: https://docs.ansible.com/projects/ansible/latest/collections...
At work we use Ansible to setup Route53 records for infrastructure hosted elsewhere. Not sure if that counts as infrastructure.
> The most confusing part of terraform for me is that terraform's view of the infrastructure is a singleton config file that is often stored in that very infrastructure.
These folks also have an article about that: https://newsletter.masterpoint.io/p/how-to-bootstrap-your-st...
That article is way overkill. One should just manually create the backend storage (S3 bucket or whatever you use). No reason to faff about with the steps in the article.
The reason not to create the bucket manually is that you want to ensure you don't have any click-ops resources that you can't track. If you manually create anything, it's not in code, and therefore the rest of the team doesn't know where it lives, who created it, or when.
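One common way to keep even the state bucket in code is a tiny bootstrap config, sketched here with hypothetical names: run it once with local state, then add the backend block and migrate with `terraform init -migrate-state`.

```hcl
# Bootstrap config: the state bucket is itself managed in code.
resource "aws_s3_bucket" "tf_state" {
  bucket = "my-team-terraform-state"
}

resource "aws_s3_bucket_versioning" "tf_state" {
  bucket = aws_s3_bucket.tf_state.id
  versioning_configuration {
    status = "Enabled"
  }
}
```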
This is excellent advice.
When you have a hammer… as the expression goes. It's crazy how many times, even knowing this, I have to catch myself and step back. IaC is a contextually different way of thinking and it's easy to get lost.
There are three things:
* Your terraform code
* The state Terraform holds, which is what it thinks your infrastructure state is
* The actual state of your infrastructure
>Why don't cloud providers have a nice way for tools like TF to query the current state of the infra?
A Terraform provider is code that queries the targeted resources through whatever APIs they provide. I guess you could argue these APIs could be better, faster, or more tuned toward infrastructure management... but gathering state from whatever resources it manages is one of the core things Terraform does. I'm not sure what you're asking for.
For the plan file to be updated to the state of the world in a non-confusing way, so that apply does the right thing without a chance that it's going to blow things up.
That is really up to the writer of the provider (very often the service vendor itself) to have the provider code correctly model how the service works. It very often doesn't, and happily lets you plan, error-free, something that will then fail during apply.
It's not an API issue but a Terraform provider issue, with missing or incomplete code (e.g. https://github.com/hashicorp/terraform-provider-aws).
> Why don't cloud providers have a nice way for tools like TF to query the current state of the infra?
They do! In fact, this is my greatest pet peeve with TF: it adds state when it's not needed.
I was doing infra-as-code without TF on AWS a long time ago. It went like this:
AWS has tag-on-create now, making this sort of code reliable. Before that, you could do the same with instance idempotency tokens. GCP also has tags.

I am not a fan of abbreviations; this article didn't even have Terraform written out once.
I assumed it was going to be about TensorFlow.
Sorry, with Terraform and OpenTofu both using “TF”, I default to that so that articles and my writing pertain to both.
Most tools, frameworks and articles in IT, SaaS in particular, are about spinning up things. It is what people find exciting.
Work a few years in Ops and you learn that spinning up things is not a big part of your work. It's maintenance, such as deleting stuff.
Unfortunately this process is the hardest, and there's very little to help you do it right. Many tools, frameworks and vendors don't even have proper support for it.
Some even recommend 'rinse and repeat' instead of adjusting what you have - and this method is not great if you value uptime, nor if you have state that you want to preserve, such as customer data :-)
Deleting stuff, shutting services down, turning off servers - those are hard tasks in IT.
My acid test for provisioning automation products is asking: Can it rename deployed resources?
Practically none can, even in market segments where this is highly relevant. For example: user identity and access management products. Women get married and change their name all the time!
The next level up is the ability to rename a container such as an organisational unit or a security group.
Then, products that can rearrange a hierarchy to accommodate a merger, split, or a new layer of management. This obviously needs to preserve the data. “Immutable infrastructure” where everything is recreated from scratch and the original is dropped is cheating.
I’ve only ever seen one such provisioning tool, the rest don’t even begin to approach this level of capability.
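Terraform is a decent illustration of the gap, as a hedged example (resource names and IDs here are placeholders): a `moved` block can rename a resource's address in code and state without touching the deployed object, but renaming the deployed object itself usually forces replacement because the underlying API has no rename call.

```hcl
# Rename in code/state only; the real security group is untouched.
moved {
  from = aws_security_group.old_name
  to   = aws_security_group.new_name
}

resource "aws_security_group" "new_name" {
  name   = "app-sg-v2"                  # changing this forces destroy-and-recreate
  vpc_id = "vpc-0123456789abcdef0"      # hypothetical
}
```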
I love how terraform can describe what I’ve got. Sort of. Assuming I or my colleagues or my noob customers don’t modify resources on the same account.
I don’t love how unreliable providers are, even for creating resources. Clouds like DigitalOcean will 429 throttle me for making too many plans in a row with only 100+ resources. Sometimes the plan goes through, but the apply fails. Sometimes halfway through.
I’d rather use a cloud-specific API, unless I’m certain of the quality of the specific terraform provider.
Because TF lacks a way to describe sequential state transitions in rare cases, e.g. termination protection in AWS.
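A hedged illustration of that case (AMI and names are placeholders): with protection enabled, a plain `terraform destroy` is rejected by the API; you first have to apply a change that flips the flag and only then destroy, a two-step sequence the config itself can't express.

```hcl
resource "aws_instance" "db" {
  ami                     = "ami-0123456789abcdef0" # placeholder
  instance_type           = "t3.micro"
  disable_api_termination = true # must be set to false and applied before a destroy can succeed
}
```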
Hell, let's talk about why ^c'ing the plan phase sucks.
"Because referential integrity is a thing, and if you don't have all dependencies either explicitly declared or implicitly determinable in your plan, your cloud provider is going to enforce it for you."