Hacker News

> Companies can architect their backends to be able to fail back to another region in case of outage, and either don't test it or don't bother to have it in place because they can just blame Amazon, and don't otherwise have an SLA for their service.

My CI was down for 2 hours this morning, despite not even running on AWS. That host holds a set of credentials we use to call AssumeRole and push to an S3 bucket, where a Lambda replicates the objects to buckets in other regions. All our IAM calls were failing due to this outage, even though we have zero resources deployed in us-east-1 (we're European).



You likely used a us-east-1 IAM endpoint instead of a regionalized one ( https://aws.amazon.com/blogs/security/how-to-use-regional-aw... ). We've been using it, and we're not experiencing any issues whatsoever in us-east-2.
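For anyone checking their own setup: the regional STS endpoints follow a predictable pattern, so pinning a client to one is straightforward. A minimal sketch (the boto3 usage is in comments and not executed; the endpoint format is the one described in the linked AWS post):

```python
# Sketch: pin STS to a regional endpoint instead of the global,
# us-east-1-backed https://sts.amazonaws.com one.

def regional_sts_endpoint(region: str) -> str:
    """Regional STS endpoints follow the sts.<region>.amazonaws.com pattern."""
    return f"https://sts.{region}.amazonaws.com"

print(regional_sts_endpoint("eu-west-1"))  # https://sts.eu-west-1.amazonaws.com

# With boto3 you can pin the client explicitly (not executed here):
#   import boto3
#   sts = boto3.client("sts",
#                      region_name="eu-west-1",
#                      endpoint_url=regional_sts_endpoint("eu-west-1"))
# or set AWS_STS_REGIONAL_ENDPOINTS=regional in the environment so the
# SDK resolves the regional endpoint on its own.
```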

One thing that AWS should do is provide an easier way to detect these hidden dependencies. You can do that with CloudTrail if you know how to do it (filter operations by region and check that none are in us-east-1), but a more explicit service would be nice.
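A minimal version of that CloudTrail check, assuming you've already exported events (via LookupEvents or the trail's S3 logs): group calls by `awsRegion` and flag anything landing outside the regions you believe you deploy to. The home-region set below is an assumption; the field names are the standard CloudTrail record layout.

```python
from collections import Counter

def unexpected_regions(events, home_regions=frozenset({"eu-west-1", "eu-central-1"})):
    """Count CloudTrail events whose awsRegion is outside the regions
    we think we deploy to -- hidden dependencies show up here."""
    hits = Counter()
    for event in events:
        region = event.get("awsRegion", "unknown")
        if region not in home_regions:
            hits[(region, event.get("eventSource", "?"))] += 1
    return hits

# Toy records in the standard CloudTrail field layout:
events = [
    {"awsRegion": "eu-west-1", "eventSource": "s3.amazonaws.com"},
    {"awsRegion": "us-east-1", "eventSource": "sts.amazonaws.com"},
    {"awsRegion": "us-east-1", "eventSource": "iam.amazonaws.com"},
]
print(unexpected_regions(events))
# us-east-1 calls to sts/iam are exactly the hidden dependency in question
```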


We did indeed.

The problem was we couldn't log into CloudTrail, or the console at all, to identify that, because IAM Identity Center is single-region. That was a setup recommended by AWS, and blessed by our (army of) SRE teams.


But you can run TWO Identity Centers in different regions for the price of one(1)! IAM IDC is just a regular application hosted on AWS infrastructure; there's really nothing special about it.

Hindsight is 20/20, of course. Still, it's good practice to audit CloudTrail periodically for unexpected regional dependencies.

(1) offer void for services that run on AWS.


Indeed. I also realized this morning that you're not the person I originally replied to, and I had read your (actually helpful) response in the context of the original post, which was "people are happy to just blame AWS when they're down".

Either way, we would have only made it one step further in our CI, as the next step is to build a container with a base image from Docker Hub, and that was down too. The idea of running a multi-region Nexus repository to avoid Docker Hub outages for my 14-person engineering team seems like overkill!


The easiest way to provide some resilience to the build process is to add a pull-through cache using AWS ECR. It might backfire due to egress costs, though, if you're building outside the AWS infrastructure.
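For the record, the cache is one API call plus rewriting your image references. The sketch below shows the reference rewrite; the account ID, region, and the `docker-hub` repository prefix are assumptions, and the rule itself would be created with ECR's `CreatePullThroughCacheRule` against `registry-1.docker.io` (Docker Hub as an upstream also needs credentials stored in Secrets Manager).

```python
def ecr_cached_image(image: str, account: str, region: str,
                     prefix: str = "docker-hub") -> str:
    """Rewrite a Docker Hub reference so pulls go through an ECR
    pull-through cache instead of hitting Docker Hub directly."""
    # Bare "nginx:1.25"-style refs live under library/ on Docker Hub.
    if "/" not in image.split(":")[0]:
        image = f"library/{image}"
    return f"{account}.dkr.ecr.{region}.amazonaws.com/{prefix}/{image}"

print(ecr_cached_image("nginx:1.25", "123456789012", "eu-west-1"))
# -> 123456789012.dkr.ecr.eu-west-1.amazonaws.com/docker-hub/library/nginx:1.25

# Creating the rule itself (not executed here) looks roughly like:
#   import boto3
#   boto3.client("ecr", region_name="eu-west-1").create_pull_through_cache_rule(
#       ecrRepositoryPrefix="docker-hub",
#       upstreamRegistryUrl="registry-1.docker.io",
#       credentialArn="<secretsmanager ARN holding Docker Hub credentials>",
#   )
```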

It's actually an interesting exercise to enumerate _all_ the external dependencies. But yeah, avoiding every one of them is probably not worth the effort for the vast majority of users.



