I just want to note that Packer (http://www.packer.io) fits perfectly into the model of immutable infrastructure. Packer is an open source tool for automatically creating machine images (perhaps for multiple platforms).
The idea of quickly and easily creating these master images in a way that doesn't slow down agility to change infrastructure is crucial, and Packer enables that.
As far as I know, Disnix is just for Nix. Packer creates machine images for multiple platforms. It would make sense, perhaps, to use Disnix if you were building a machine image for NixOS. But for an Ubuntu machine, you can't really. Unless I'm mistaken.
Conceptually, it's good. But in practice, it's just an analogy. Immutability in programming languages is enforced at compile time and run time. Something like:
val x = 1;
x = 2;
is an actual error and the compiler/runtime will give a "you can't do that".
Immutable infrastructure, on the other hand, would require read-only enforcement on all aspects of configuration. It can be done, but not easily, and someone always has root. So what you're really talking about is the ability to reconstruct an environment programmatically from scratch, with no manual intervention, which is laudable but not exactly immutable.
This is the sort of thing that makes me appreciate the 12 Factor App idea (http://www.12factor.net). Rather than trying to make configuration immutable, make it impossible. Don't rely on the existence/continuity of a filesystem at all. Don't use configuration files.
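A minimal sketch of what that looks like in practice: configuration injected through the process environment rather than through files on disk. (The variable names and values below are invented for illustration; nothing here comes from the original comment.)

```shell
# Invented example values, in the 12-factor style: config lives in the
# environment, so there is no on-disk configuration to mutate at all.
export DATABASE_URL="postgres://app:secret@db.internal:5432/app"
export LOG_LEVEL="info"

# The app reads these with getenv()/os.environ instead of parsing a
# config file. Launching it is just:
echo "starting app with LOG_LEVEL=$LOG_LEVEL"
```

Rebuilding the environment then means re-running the launcher with a fresh set of variables; there is no file for a 3am hack to land in.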
There is a kind of read-only enforcement when all configuration is produced by automation - you learn, often painfully, that any writes (i.e. manual hacks) are liable to be rewritten, without warning and at arbitrary times.
There's no error message telling you you can't hack an extra vhost into httpd.conf, but once you've been paged at 3am because your hack disappeared, you pretty quickly learn to supply your own error message.
Heroku's an extreme example of this - the dyno filesystem is perfectly writable, but it's definitely going away within 24 hours.
(The large-company version of "all configuration is produced by automation", which produces a surprisingly similar result, is "all configuration is produced by another team".)
It's actually fairly easy. I have a system that has hundreds of thousands of JVMs running this way. The way to go about this is to make doing the right thing easier than doing the wrong thing.
The first piece of the puzzle is to present configuration to runtime processes in a read-only manner and enforce this rule from the remote side (e.g. an S3 bucket where the machine only has a read-only key, or a read-only NFS mount). For added points you can make the OS image read-only (but that's another post).
The second piece is to make sure that you have easy build automation that can push into that repository easily and reliably. This is really key - you have to have a programmatic way of doing things.
I think that's fundamentally different. Breaking immutability in a language by stepping outside the language is still programmatic. Breaking it via administration is a human decision.
In programming, it's a conscious decision to break immutability (assuming a language that supports it). In administration, it's a conscious decision to enforce it.
And it's really bad practice (at least in Haskell) to truly break immutability. Uses of unsafePerformIO are strongly urged to be "observationally immutable, pure, safe, transparent".
Chad claims that they throw away the SSH keys to the machine, so they don't actually have root. Or maybe that was a goal that hasn't been reached yet; I don't remember exactly (Chad gave a talk in Berlin on this and related topics).
You could mount /etc and /usr as read-only, and have the upgrade process briefly remount it read-write.
Then you won't accidentally edit something, and it will be very obvious that there is a reason for that read-only mount...
The idea of applying purely functional programming concepts to deployment is not new, see the Nix package manager and the NixOS Linux distribution built on top of it (http://nixos.org/). This paper explicitly makes the link with functional programming: http://nixos.org/~eelco/pubs/nixos-jfp-final.pdf
I see that Wunderlist uses EC2. So I speculate that they create a fresh machine image for each new revision of the application, and also use immutable machine images for bits of infrastructure such as the database.
But how do we put the idea of immutable infrastructure into practice on bare-metal servers? One option would be to set up one's own virtualization infrastructure, such as Xen or KVM. But that would undermine the I/O performance and resource consolidation advantages of running on bare metal. Docker looks promising; we just need to discover the best practices for running specific kinds of services (e.g. web applications) on top of it.
Jails have been around for a while in FreeBSD and they are often used for this exact purpose; the concept has only really entered the mainstream due to widespread usage of EC2/OpenStack and now Docker (which is basically FreeBSD Jails + ezjail, but for Linux and a little different on some other fronts).
Nothing beats a bare metal machine with FreeBSD + ZFS pool + ezjail; the really cool thing about ZFS is that you can build a "base jail" then just use ZFS to make snapshots of the base jail when creating new jails. So with that you can create versioned base jails that are running an upgraded ports tree, or more generally, an upgraded world build!
It also makes transition between application versions easy because you can run a separate database jail(s) (maybe different versions of the db?) and have a jail for app v1 running while a jail for app v2 is running at the same time.
What I would love to see is a tool for abstracting this deployment pattern on-top of ezjail. I've been considering doing this myself but my time is stretched too thin :(
> What I would love to see is a tool for abstracting this deployment pattern on-top of ezjail. I've been considering doing this myself but my time is stretched too thin :(
This is basically what we're doing with Docker. We started with lxc but it can and will be ported to other OS virtualization backends, including jails and zones.
Create a new base image from a running container:
CONTAINER_A=$(docker run -d ubuntu apt-get install curl)
docker wait $CONTAINER_A
docker commit $CONTAINER_A shykes/my-ubuntu-with-curl
CONTAINER_B=$(docker run -d shykes/my-ubuntu-with-curl curl --help)
Share your new image for the rest of the world to enjoy:
docker push shykes/my-ubuntu-with-curl
Transition between application versions:
V1=$(docker run -d shykes/myapp:v1)
V2=$(docker run -d shykes/myapp:v2)
docker stop $V1
Transition between database versions (sharing of persistent data):
V1=$(docker run -d shykes/mydb:v1)
V2=$(docker run -d -volumes-from=$V1 shykes/mydb:v2)
docker stop $V1
Interesting. The one core OS feature that FreeBSD seems to be missing is a server-class virtualization technology, like KVM on Linux. Please correct me if I'm wrong on that.
Of course, Illumos has zones, ZFS, KVM, and DTrace. But it's much less widely used than even FreeBSD.
> the really cool thing about ZFS is that you can build a "base jail" then just use ZFS to make snapshots of the base jail when creating new jails
btrfs also supports this, and that support is built into some LXC commands (lxc-clone, for instance.) Docker just relies on filesystem overlays and bind-mounts instead, though, so it doesn't really have need for full persisted-to-disk subtrees of the instantiated version (it's more like it holds the base version and the persisted delta on disk, and only brings them together in memory.)
http://docker.io is how I'd do this as soon as they have a stable version for production. Pick a common (or at least mostly common) base OS for every server, configure your entire system on top of an LXC container, badda bing, badda boom. Quick, low profile, repeatable, redistributable, etc etc etc.
I like the concept. But where I work, our infrastructure is currently hosted on leased dedicated servers, where (last time I checked) OS reloads require manual intervention from the hosting provider and cost something. So I guess the burning down and rebuilding would have to be done on a layer above the base OS, e.g. Docker containers.
On the one hand, the phrase "snowflake server" encapsulates my observations about long-lived servers that are upgraded and extended in place.
On the other hand: "Perfect replication is the enemy of any robust system." -- http://en.wikiquote.org/wiki/Daniel_Suarez (read the other quotes from the same chapter for context)
But this "immutable infrastructure" idea doesn't mean that the infrastructure is perfectly replicated across machines. It aims for perfect replication through time. That's something quite different.
You have a point. Indeed, the practice of building specialized machine or container images, which seems to be the best way of achieving immutable infrastructure, may make servers more heterogeneous, since each image will contain only the pieces necessary to perform its specific function. Done right, immutable infrastructure may be more resilient in the face of parasites.
I love that article. Now I use the term "snowflake server" all the time as a blunt criticism, and I find it's really sinking in on the management and ops people who hear it. Great analogy.
Been doing this for years at all levels, and it really works. It allows me as a single dev to spend less than five hours (amortized) a week on a decent sized infrastructure to support my business. This includes building multiple racks of custom servers coloed at multiple DCs, networking, private clouds, server admin, hardware maintenance, upgrades, deploys, etc... and it saves me 100k+ per year over AWS and having to hire another person.
I make use of polyglot persistence. Graph, columnar, document, relational, key-value... each used when they are the right tool for the job. Each scaling horizontally. It gets easy to blow them away. :)
How do you do this outside of an environment like AWS? Do you have to reboot a whole physical server with a new OS image every time you change something?
I've been watching Joyent and SmartOS in particular with interest for a while.
A few questions to satisfy my curiosity.
First, I'm assuming that you're running SmartOS on your own dedicated servers, not a public cloud like Joyent. Is that correct? If not, the rest doesn't apply.
Do you use Ansible in the global zone, in individual non-global zones, or both?
Do you build a new zone for each deploy of an application and each new version of a piece of infrastructure? I imagine this is the way to achieve the idea of immutable infrastructure.
And one other little thing: Do you boot SmartOS via a USB key, PXE, or something else?
I use Ansible for both global and non-global zones. You are correct: both the deploys and the new pieces get new zones (the old ones are destroyed after the migration). The idea of raising entire new zones for simple migrations sounds like a pain in the butt on multiple fronts. In actuality, it is much easier. Zero-downtime migrations too.
I personally use USB keys; however, it is expected that any one zone and/or server could fail at any moment. Servers are dirt cheap if you do it right. I bet some pay more in RAM costs per month than an entire node costs me with 32GB RAM, E3-1230v3, SSD, etc. There are multiple server types. The E3s are the non-persistent processing nodes... 3RU chassis, 8 possible nodes in said chassis... each about 1k with the specs above, and they are about 40 ECUs per. Each microcloud is typically around 10RU (1/4 rack). Place at separate datacenters and you get some interesting stuff. I have a whole different approach to the homebase DC. Stuff that REALLY reduces costs.
This is essentially what I do with my own server, but largely because I'm too dim to trust myself to upgrade things well.
That being said, he's abstracting the problem a little bit:
> If you absolutely know a system has been created via automation and never changed since the moment of creation, most of the problems I describe above disappear. Need to upgrade? No problem. Build a new, upgraded system and throw the old one away. New app revision? Same thing. Build a server (or image) with a new revision and throw away the old ones.
The key here is created via automation. You get a choice, I think:
* You can spend time/energy/worry/entropy on making sure the server is up to date
* You can spend time/energy/worry/entropy on making sure the automated process is up to date
The second one is a little easier, to be sure, though depending on the setup it could still involve a lot of research, and it's certainly more expensive.
But hey, if you're in a position where trading some money to buy some time is an option and you have the cash to burn, I'd buy the time at every chance.
We've been doing this for ages where I work. New EC2 AMIs are built inside a disk image. We can pretty easily use all of the standard tools through chroot, so installing packages is relatively simple and flexible.
Instances are ready for use immediately after booting, and deploying new code is as simple as rebuilding the image and booting it. We can also easily scale up most forms of servers by simply booting more and adding them to the proxy as they come up.
The only downside here is that firefighting still kind of remains a two step process. Even after investing a decent bit of time optimizing the process, it still takes 5-10 minutes to build and upload an image. From there it takes another 5-30 minutes to get EC2 to boot it, so there's a strong temptation to tweak things on the server. I've taken to patching, testing, and committing the fix on the tag for that release so I can simply apply a diff to get the server up to date. Not perfect, but it goes a long way to prevent regressions.
No, mostly because it's inconvenient. Making the root filesystem read only means having to track down everything that tries to write to it and fix mounts for them.
After a few years of working with this, I kind of wish we'd bitten that bullet at the start. The AMIs we produce have a root filesystem that's only like 2 gigs so it goes across the network faster. 2 gigs isn't a lot of space when something decides it wants to log or write out temp files in weird places.
And I find NixOS's 'nixos-rebuild build-vm' command quite handy for testing changes.
On the other hand, immutability of the OS image doesn't mean you can easily roll back after an upgrade, especially if you've been running the new version for a while, because you might still have mutable state (see below).
Sure, if you store all your configuration in Nix, then it is possible to rebuild the old configuration as it was, but your mutable state is not managed by Nix.
One such example is database files. You've upgraded to the new configuration, and perhaps even had a script to auto-upgrade from your old schema.
Now you want to go back: what should happen to the database? Should it be entirely rolled back (from backups) to an old snapshot? But then you lose all the new data.
Should you roll back the schema change? Again, what happens to the new data that doesn't fit the old schema?
Another example of mutable state is remote nodes that you depend on for a particular service. You can roll back your local machine to an old version, but will it still be able to talk to your new remote nodes?
It seems that the only way to handle this is to make sure you are always at least 1 major version backwards compatible, so you can freely upgrade/roll back machines that are part of a cluster.
I think the point made here is quite interesting: instead of relying on (Puppet, Chef, ...) to apply "deltas", you always generate the whole system from scratch. This has a couple of advantages:
* You make sure everything is automated
* You can always trust that the resulting system would still work if it had to be rebuilt from scratch
Most of the next machine would have the same state as the old one:
; Machine -> Machine
; Makes the next machine
(define (next-machine m)
  (make-machine
    (machine-name m)
    (machine-cpu m)
    (machine-storage m)
    (update-os CURRENT-OS)
    (machine-applications m)))
I think I must be missing something obvious here, but how can you do this if you use something with state like a database? How do you create a production system from scratch without losing all the data?
So you deploy a new physical server to the rack? Bundle the OS image with the deploy? Ghost it with the OS image before deploying? The article is light on explaining the "how" part.
Disclaimer: I wrote Packer.