
Google’s scale is insane, but this shows how fragile even the biggest clouds can be. Hope they drop the full technical details soon.


The problem, which I've only had enough of a taste of to see how untenable it would be at that scale, is that with enough feature toggles, and enough ways to partition their rollouts (%, region, or AZ cutoffs), you eventually spend most of your time shepherding rollouts, or coordinating with other people so you don't impinge on theirs, instead of writing the code to roll out.

Rollout fatigue should be respected, even feared. It will insidiously tempt people to skip steps. And failing that, it will blur together in your mind the last twenty times you did this procedure, and you will forget if you ran step 5 before you were interrupted by someone. You will remember having done step 5, but you won't remember if that was ten minutes ago, or yesterday.

It's the reason I keep writing tools to force a checklist, or a prompted sequence. If I didn't check off step 5 I still need to do it. And I'm not even in operations.
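
A minimal sketch of what such a forced checklist could look like, assuming Python and a small JSON state file. The `Checklist` class, its method names, and the state file layout are all hypothetical and only illustrate the idea: each step is recorded with a timestamp when it is checked off, so after an interruption the tool, not your memory, says whether step 5 happened and when.

```python
import json
import time
from pathlib import Path


class Checklist:
    """Hypothetical helper: persist which steps are done, and when."""

    def __init__(self, steps, state_path):
        self.steps = list(steps)
        self.state_path = Path(state_path)
        # Maps step text -> Unix timestamp of completion.
        self.done = {}
        if self.state_path.exists():
            # Resume from whatever was recorded before an interruption.
            self.done = json.loads(self.state_path.read_text())

    def next_step(self):
        """Return (number, step) for the first unchecked step, or None."""
        for number, step in enumerate(self.steps, start=1):
            if step not in self.done:
                return number, step
        return None

    def check_off(self, step, now=None):
        """Record a step as done; refuse to skip an earlier unchecked step."""
        pending = self.next_step()
        if pending is None or pending[1] != step:
            raise ValueError(f"out of order: next step is {pending}")
        self.done[step] = time.time() if now is None else now
        self.state_path.write_text(json.dumps(self.done))


def run_interactively(checklist):
    """Prompt for each remaining step until the checklist is complete."""
    while (pending := checklist.next_step()) is not None:
        number, step = pending
        answer = input(f"Step {number}: {step} -- done? [y/N] ")
        if answer.strip().lower() != "y":
            print("Stopping; rerun to resume from this step.")
            return
        checklist.check_off(step)
    print("All steps checked off.")
```

Calling `run_interactively(Checklist(steps, "rollout_state.json"))` with your own list of steps would walk through them in order. Because the state lives on disk, rerunning after an interruption picks up at the first step that was never checked off.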


They did! It was already discussed here: https://news.ycombinator.com/item?id=44274563.


Thanks. This article appears to be no more than an AI-slop summary of the official Google report (plus, of course, some advertising tacked on).



