The only thing that spending time with Cloudformation teaches me is how much it makes me prefer doing things with Terraform. I think Cloudformation is considerably better than nothing and it was great when there were no alternatives, but that was a while ago.
Terraform is terrible compared to Cloudformation. Its selling point is multi-cloud support, but you'll never get it: clouds are too different.
- A good CF template is 10x less code for the same solution.
- No corrupted state problems.
- Native tool, supporting all properties of resources
Writing good CF templates takes good AWS knowledge and systems thinking: you group resources that belong together, and it actually teaches you good architecting.
I think Terraform's multi-cloud support is a bit better than Cloudformation's. Jokes aside, I don't think the multi-cloud part is really the biggest selling point. The biggest selling points, for me, are:
- Much better than Cloudformation at telling you what it's going to change before you apply the changes and the ability to record those changes. (much better than those dreaded 'conditional' changes)
- The ability to import changes if you found some that were done outside of Terraform. It's not perfect, or easy, but mostly doable.
- The ability to look at the code, the state file and the plan to get a good representation of what's actually deployed.
Those three are more significant than they look, and together they make sure you:
- Don't get into a situation where automation is broken and you can only recover by rebuilding the stack.
- Don't get unexpected downtime because a change replaces a resource unexpectedly.
- Can track, record and manage changes in easy-to-read diffs and plans.
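To make the "unexpected replacement" point concrete, here is a toy Python sketch of what a plan step conceptually does: compare the recorded state with the desired config and label each change as a create, destroy, in-place update, or replacement. The attribute names and the `FORCES_REPLACEMENT` set are made up for illustration; real providers decide this per resource type, and this is not how Terraform is actually implemented.

```python
# Toy model of a "plan": diff recorded state against desired config.
# FORCES_REPLACEMENT is a hypothetical set of attributes, for illustration only.
FORCES_REPLACEMENT = {"availability_zone", "engine"}


def plan(state: dict, desired: dict) -> list:
    """Return one human-readable action per resource that would change."""
    actions = []
    for name in sorted(set(state) | set(desired)):
        old, new = state.get(name), desired.get(name)
        if old is None:
            actions.append(f"+ create {name}")
        elif new is None:
            actions.append(f"- destroy {name}")
        elif old != new:
            # Attributes whose values differ between state and config.
            changed = sorted(k for k in set(old) | set(new) if old.get(k) != new.get(k))
            detail = ", ".join(changed)
            if FORCES_REPLACEMENT & set(changed):
                actions.append(f"-/+ replace {name} (changed: {detail})")
            else:
                actions.append(f"~ update {name} (changed: {detail})")
    return actions
```

The useful part is the last branch: a reviewer sees "replace" before anything is applied, rather than discovering the downtime afterwards.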
The changesets feature of CloudFormation allows you to do most of what you mention here. Also take a look at resource deletion policies and Lambda custom resources.
Unless they've fixed it, though, it didn't work well in certain situations, like with nested stacks, and it often doesn't provide nearly the same level of detail as to what EXACTLY is changing and why.
Would you be willing to share a CF template that is 10x less than the equivalent Terraform? It has been my experience that Terraform is much less verbose and much more reusable via modules. The CloudFormation I have seen has always seemed excessive and quite convoluted for simple tasks.
Can’t get much less verbose than cloudformation yaml in combination with usage of only required parameters for a resource. For example, write a cloudformation yaml template that creates an automatically named S3 bucket.
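For reference, a template of that kind needs only a `Resources` section with one resource and its `Type`; with no `BucketName` property set, CloudFormation generates the bucket name. A minimal Python sketch that builds the JSON form of such a template follows (the logical ID `Bucket` is an arbitrary choice for illustration):

```python
import json


def minimal_bucket_template() -> str:
    """Build a CloudFormation template (JSON form) with one auto-named S3 bucket."""
    template = {
        "Resources": {
            # "Bucket" is just a logical ID; no Properties means no explicit name.
            "Bucket": {"Type": "AWS::S3::Bucket"}
        }
    }
    return json.dumps(template, indent=2)


print(minimal_bucket_template())
```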
Terraform gives a consistent _workflow_ across clouds, not a consistent codebase. I know personally of many teams using Terraform for significant multi-cloud deployments consisting of thousands of resources. Several switched from CloudFormation and saw their codebase size dramatically reduce.
Furthermore, the other comments on this post should disabuse you of the notion that there are "no corrupted state" problems with cloud formation: they happen all the time.
Disclosure: I worked on Terraform for quite some time at HashiCorp and am still a (community) maintainer of the AWS provider.
Food for thought, but I've never had CloudFormation break because a patch level upgrade changed a regular expression to 1) disallow previously legal inputs, and 2) disallow inputs AWS allows. I've also never had it forget a resource (probably due to race conditions when deleting failed resources).
CFN has its warts, but I full-stop don't trust HashiCorp's operations or their attempt at a SDLC and I wouldn't trust my business's health to them as a company (and if the results of using it weren't enough, their clownshoes sales team's bad attempts to upsell would cinch it).
Totally agree. I don’t care how much less verbose terraform may be (this claim is questionable IMHO). The most important part of infrastructure engineering is being able to debug and fix things quickly by isolating issues to the smallest possible domain. The additional layer of highly unstable terraform source code does not decrease the debugging surface area.
Serious question: how do you go about debugging a CloudFormation stack that's in a broken state, without involving AWS support?
I mean, it's weird, because I agree with your statement that "The most important part of infrastructure engineering is being able to debug and fix things quickly by isolating issues to the smallest possible domain." And that's why I so strongly prefer Terraform, because I actually have control over the state file, how Terraform interacts with it, and I can move things in and out, and change things in-situ if necessary.
Weird. I haven't compared byte sizes of CF templates and Terraform code, but just as far as readability, HCL works a lot better than YAML. YMMV.
As for state, I'm not sure what you did to corrupt your state, but we have used Terraform to manage thousands of resources across dozens of AWS accounts for the past three years and haven't had any state corruption, except when a human messed up editing a state file by hand. Obviously in that case you back things up first (or hopefully you are using remote state with some kind of versioning). But the fact that you are _able_ to manipulate the state with the CLI tool, or by hand in extreme cases, is itself a huge advantage over CloudFormation, which has no such capability.
As for coverage, my experience has been that Terraform often has coverage of new resource types and properties _before_ CloudFormation. And it's extremely rare for any new features to take very long to show up in the AWS provider. Anything significant is usually picked up in 2-3 weeks from the API release at most.
I'm skeptical of your 10x less code claim. You can definitely get into broken state problems in CloudFormation - with no recourse but to blow it all away and start over. And despite being a native tool, CloudFormation support for new features and services in AWS is often spotty/missing.
That said, my experience has been that both CloudFormation and Terraform are irritating, just in different ways; they both are warty.
I do ultimately prefer Terraform - even in a single-cloud setup.
Some specific services (namely Data Pipeline) aren’t supported in Terraform. However, some parameters, like Enhanced VPC Routing in Redshift clusters, are supported by Terraform but not CloudFormation.
The rule of thumb that you should generally stick to CloudFormation if you are full bore invested into AWS has some truth.
My issues with CloudFormation are the lack of control over rollbacks, missing features for existing and mature services like the above, and being forced to use custom resources for anything that vaguely resembles coding, such as IP address math functions, which Terraform does just fine.
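As an example of the IP address math in question, Terraform has a built-in `cidrsubnet(prefix, newbits, netnum)` function that carves a numbered subnet out of a larger block. Below is a rough Python approximation of what it computes, using only the standard `ipaddress` module. It is a sketch for illustration, not Terraform's implementation, and edge-case behaviour may differ.

```python
import ipaddress


def cidrsubnet(prefix: str, newbits: int, netnum: int) -> str:
    """Return subnet number `netnum` after extending `prefix` by `newbits` bits."""
    base = ipaddress.ip_network(prefix)
    new_prefixlen = base.prefixlen + newbits
    if newbits < 0 or new_prefixlen > base.max_prefixlen:
        raise ValueError("newbits does not fit in the address length")
    if not 0 <= netnum < 2 ** newbits:
        raise ValueError("netnum does not fit in newbits")
    # Place netnum in the bits immediately after the original prefix.
    shift = base.max_prefixlen - new_prefixlen
    address = int(base.network_address) + (netnum << shift)
    # Reuse the base's class so IPv6 prefixes stay IPv6.
    return str(type(base)((address, new_prefixlen)))


print(cidrsubnet("10.0.0.0/16", 8, 2))  # 10.0.2.0/24
```

Doing the same thing in a CloudFormation template of that era meant either hard-coding the subnets as parameters or writing a Lambda-backed custom resource.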
Terraform is a much nicer way to deploy your CF templates than with the AWS CLI though. You can get the best of both worlds by deploying the templates with Terraform.
The AWS CLI for CF is simple and consistent with other AWS CLIs. You can also use one of the language-specific SDKs such as boto3, or use AWS CodePipeline to create/update stacks.
I want to love Terraform but it's such a horrible platform to code on:
* Error messages are overly verbose yet cryptic, and sometimes even unrelated to the actual error raised by the cloud provider. Couple that with the lack of line numbers or any other helpful identifier apart from the unnecessarily long module hierarchy, and debugging those scripts is a massive exercise in frustration that usually wastes far more time than should ever be necessary.
* HCL is a hateful "language". The fact you cannot order stuff procedurally means you're constantly running into dependency issues on larger deployments. And don't even get me started on the "count" kludge to work around a lack of proper iteration.
* There is a lack of internal consistency in the support of different methods. E.g. "count" doesn't always work with all resource types. Some resources cannot have properties defined with variables.
* Calling modules requires so much bootstrapping code. It's just painful.
I get that Terraform is the best we have for multi-provider deployments, but their idea to create a superset of JSON only to then compile that back down to JSON anyway was such a poor decision in my personal opinion. I get that the point was to have something that was accessible to non-programmers while still expressive enough for developers to use; however, what they've created instead is a monstrous language that is too complex for the former group and too irrational for the latter.
I've been very tempted to write my own Terraform alternative based on my experiences using it (and CloudFormation). I even already have another programming language that I've written a parser for and that would be well suited to this type of application. But my time is pretty limited at the moment, so I struggle on with Terraform.
Funny thing is, HCL isn't actually a superset of JSON. For example,
{
"foo": { "bar": 1 }
}
can't be represented in HCL. (Even Consul had to add a kludgy hack to support HCL config as a result.) Instead, they call HCL "JSON-compatible", which I think means JSON can be written to represent any equivalent HCL structure (HCL is a subset, essentially).
That said, you might be interested in Terraform 0.12 [0], which will be using some new HCL v2 that actually has first-class expressions and dynamic blocks (for loops). And, finally, a ternary operator that short-circuits. Unfortunately, the dynamic block stuff looks like it's based around for-loops and doesn't support just regular if-statements... but we'll see where that goes.
Thanks for the link. I've not yet had a chance to play with Terraform 0.12 but from what I've read it does sound like HCL v2 is definitely a step in the right direction. However, so long as it's primarily a data serialisation format I think I'm going to take issue with writing Terraform code in it, because sometimes you just need to express something procedurally. Maybe I'm just in the minority here? Or maybe I've just been spoilt with tools like Puppet and Bash, but I can't help feeling that HCL is a step backwards in terms of expressiveness.
I wasn't aware of the JSON subset / Consul problem though. That's really interesting to read. It's funny because back when I was building a test Consul cluster I did wonder why JSON was used for config instead of HCL. I guess now I know why.
I agree with you 100%. I'm pretty excited about HCL v2 - already simple stuff like a short-circuiting ternary operator makes my life easier (no more weird joins/splats with conditional resources in outputs). Hopefully further improvements are implemented on top of the 0.12 changes.
Otherwise, much like you, I'm tempted to write a Terraform frontend that interfaces with existing providers...
A lot of problems similar to your complaints exist for CloudFormation as well. Instead of complaining about the language warts, you’ll be forced to use an even more cumbersome set of primitives to access arrays and hashes, and the messages will only get less sensible as you look for either YAML whitespace errors or inevitably write a converter to avoid using JSON. CloudFormation limits like number of parameters and outputs start to be a real pain to scale to support a production environment beyond simple demo stacks, and Terraform has more issues scaling with team members concurrently trying to modify system state due to its less strict modularization / locking model.
Infrastructure as code tooling is all very primitive compared to what we take for granted writing most other traditional software but it will take some time and maybe another generation to do it well.
I don't know where you got the impression that I was arguing CloudFormation is better than Terraform, but I assure you that wasn't the argument I was making. In fact, I made the same points you made in reply to another commenter in this discussion.
I know Terraform is the best tool we currently have (I even explicitly stated that in my previous post) but that doesn't mean there isn't still massive room for improvement. Starting with the deprecation of HCL, in my personal opinion.
You might want to give cfn-builder[0] a try. I'm biased because I wrote it but I find that it's a good way to write and maintain my CFN templates. Also it's written in simple Nodejs and is easy to expand for your own needs.
CloudFormation isn't really good enough because it's AWS specific and doesn't track changes to the state. Plus it just uses data serialisation formats as well so doesn't even address the core problems I raised with HCL. What I really want is Puppet for Infrastructure; the closest I've used to that is Terraform but the syntax isn't quite there...yet.
Plus the pace of development in nodejs worries me. All too often I've run into issues where modules have changed and broken things downstream. When you're running infrastructure as code you really want to be damn sure your tooling is going to be consistent for years to come and I really don't have that faith in nodejs. Sure, if you're a JavaScript developer you can manage it easily enough, but if you're DevOps who rarely touches JS then you really want your tools to be low maintenance. So to that end I wouldn't consider any nodejs projects for any serious production work given the kind of customers I work for (high availability stuff for some major brands). It might be fine but it's just not worth the risk.
What I don't like about Terraform is that the state of your resources is stored locally, which might lead to consistency problems depending on your setup. With CloudFormation state is handled by CloudFormation itself, so you can be confident that stack updates operate on the latest state.
That's only the default. Terraform supports storing state remotely, with locking. Most folks I know who use Terraform at scale recommend using this feature (if they store state at all).
Then choosing where and how to store state using Terraform becomes another point of possible inconsistency in your infrastructure and thing to worry about. In CloudFormation, the state is coupled with the service, and you don't have to worry about it. It just works.
Thanks for the information. Apparently my knowledge about state handling in Terraform did predate the introduction of remote state. I should've checked that first.
You can store the state remotely, and there are many options for state storage too. You can use S3 or DynamoDB, or even create your own web service that accepts webhooks from Terraform.
Of course you can implement/set up syncing of state on your own, but that's something you don't have to worry about when using CloudFormation. I'm not sure how the options for Terraform handle race conditions when trying to deploy updates in parallel, but again, something you don't have to worry about with CloudFormation.
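On the race-condition question: the usual answer with remote state is locking, i.e. a run takes a lock before touching the state, and a second concurrent run fails fast instead of overwriting it. Here is a toy in-memory Python model of that idea. The class and method names are hypothetical and this is not Terraform's code; a real backend would rely on something like a conditional write in the storage service rather than a process-local mutex.

```python
import threading


class StateLockError(Exception):
    """Raised when a caller tries to use state it has not locked."""


class LockingStateStore:
    """Toy remote-state store: only the current lock holder may write."""

    def __init__(self):
        self._mutex = threading.Lock()  # guards the fields below
        self._holder = None
        self._state = {}

    def acquire(self, who: str) -> None:
        with self._mutex:
            if self._holder is not None:
                raise StateLockError(f"state is locked by {self._holder}")
            self._holder = who

    def release(self, who: str) -> None:
        with self._mutex:
            if self._holder != who:
                raise StateLockError(f"{who} does not hold the lock")
            self._holder = None

    def write(self, who: str, new_state: dict) -> None:
        with self._mutex:
            if self._holder != who:
                raise StateLockError(f"{who} must acquire the lock before writing")
            self._state = dict(new_state)

    def read(self) -> dict:
        with self._mutex:
            return dict(self._state)
```

With this shape, two parallel deploys cannot both write: the second `acquire` raises until the first run calls `release`.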
Lines of code are irrelevant when you are in the middle of a production deploy and you just want the highest visibility into what is going on. In CloudFormation, there are only two possible places for resource state: 1) the actual resource state in AWS and 2) the desired state stored in CloudFormation.