> assuming you have enough expertise in-house to run it. Most small to mid-sized...

klodolph · on Dec 2, 2021

For a small company, that's sensible. For a mid-size company, I think it's short-sighted. I'd argue that mid-size tech companies need the expertise that comes with running more of their own stack. If you don't have the expertise, then you're going to be dealing more with support engineers to solve your problems, and you won't have as much insight into your tech stack as you'd want.

There's a big difference between the experience of opening a support ticket to fix a problem with your database and paging one of your salaried engineers to fix it--someone who actually runs the service to begin with.

kqr · on Dec 2, 2021

The way I've heard it phrased best is the way (I think) Allen Ward put it.

If a component is high cost, tightly integrated with the rest of the system, and difficult to design, then you need to be an expert on it in order to evaluate suppliers and make a good purchasing decision.

To become an expert, you have to build it for a while yourself. And once you have built it for a while and become an expert, you might as well keep that up, unless there's a strong economic incentive not to.

raffraffraff · on Dec 2, 2021

I've seen engineers build and run a clustered database and make a total pig's ear out of it. Every day was another outage (or the continuation of yesterday's outage). Several times we were a hair's breadth from actual irrevocable customer data loss. Just because you got a few of your salaried engineers to build a DB doesn't mean that they had the correct knowledge and respect for data to do it properly.

klodolph · on Dec 2, 2021

I’ve seen people make a mess out of SaaS offerings that are supposed to be foolproof. If we assume that we’re hiring incompetent engineers, we can make a mess out of any situation.

CoffeeOnWrite · on Dec 2, 2021

You’re both right: most companies want to hire the least competent (least costly) engineer capable of doing the job. Then they have mixed luck in hiring and wind up with a distribution of engineers, including some plain incompetent ones. They barely have enough competent engineers to build their core product, let alone perform undifferentiated technical operations on self-managed infrastructure.

I agree with you that if we assume all our engineers are strictly competent, then that gives us a major advantage over our vendors, and tilts the scales to build over buy.

raffraffraff · on Dec 2, 2021

I don't think it's strictly about competence; it's more about specialism. In my personal experience, which you can take with a large grain of salt, what happens is that an engineering manager with a certain type of career history tends to look down on any type of operations work and anyone who does it. I have heard things that come off sounding like this: "We hire world-class computer scientists from prestigious Ivy League schools with FAANG experience, so why do I need to hire some one-trick-pony DB guy, when my engineers could build their own databases from scratch?"

fatbird · on Dec 2, 2021

If people still had witty quotes in email signatures, I would totally steal your last sentence.

ethbr0 · on Dec 2, 2021

How are mid-sized companies going to afford to retain that expertise, once they've trained it?

Or, to put it another way, mid-sized company X can never pay as much as hyperscale company Y for Z expertise. Because utilization at mid will always be less than at hyper.

I'm all for retaining in-house expertise, but it's fundamentally a return-on-capital optimization problem, albeit one where your capital asset has the ability to walk out the door. So in-housing the core and using managed services for everything else is a more reasonable bargain.

klodolph · on Dec 2, 2021

The expertise in question isn’t really used at “hyperscale” companies. Not everyone wants to work at large companies anyway. And the expertise isn’t that difficult to train or acquire (compared to getting another software engineer).

> Or, to put it another way, mid-sized company X can never pay as much as hyperscale company Y for Z expertise. Because utilization at mid will always be less than at hyper.

Being able to run a PostgreSQL cluster is a handy skill but won’t get you hired at Google.

Google can afford to pay an engineer to make cool stuff like a faster malloc implementation or analyze the best possible way to encode data to be stored on disk. That’s because making Google’s malloc 1% faster will pay someone’s salary many times over, or likewise for saving 1% CPU in your disk servers.

But running your own database cluster and container orchestration ain’t exactly rocket science these days, and running your own stack means you have an expert in-house when the shit hits the fan—and that’s when your expert earns their salary, many times over.

raffraffraff · on Dec 2, 2021

One root cause I've seen over and over during database outages is insufficient IO. It's caused by the other most common root cause: lack of actual expertise. Until you've been through the wringer, it's easy to lack respect for data: how important it is, how fucking large it is, and how long it takes to do basically anything with a few TB of it. If you didn't hire a real expert, and your DB-for-a-day guy miscalculated the spec for your bare metal clusters, you'll find it extremely hard to magic up faster hardware during a 4am full outage when Europe wakes up. If you're on cloud, at least you have the flexibility to increase block storage IOPS or quickly reboot a node on beefier hardware or faster network connections. But suddenly the "cheap" DIY database is a lot more expensive than you budgeted for, and the phrase "nobody needs a dedicated DB guy" starts sounding really short-sighted.
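To put a rough number on "how long it takes to do basically anything with a few TB", here is a back-of-the-envelope sketch. The function name and the throughput figures are hypothetical, chosen only for illustration; they are not measurements from the comment above, and real restores are usually slower than raw sequential throughput suggests.

```python
def transfer_hours(terabytes: float, throughput_mb_per_s: float) -> float:
    """Hours needed to read or copy `terabytes` of data at a sustained rate.

    Uses decimal units (1 TB = 1e12 bytes, 1 MB = 1e6 bytes).
    """
    if terabytes < 0 or throughput_mb_per_s <= 0:
        raise ValueError("size must be >= 0 and throughput must be > 0")
    seconds = (terabytes * 1e12) / (throughput_mb_per_s * 1e6)
    return seconds / 3600


# Hypothetical sustained rates, for illustration only.
for rate in (100, 200, 1000):
    print(f"3 TB at {rate} MB/s: {transfer_hours(3, rate):.1f} hours")
```

Under these assumed rates, a 3 TB copy takes hours rather than minutes unless the storage sustains on the order of 1 GB/s, which is why an under-specified cluster hurts so much mid-outage.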

klodolph · on Dec 2, 2021

I’m talking about managing your own software on IaaS as an alternative to running off SaaS. You seem to think I’m talking about ditching the cloud entirely!

raffraffraff · on Dec 2, 2021

Not at all! I did say: "If you're on cloud, at least you have the flexibility to..."

zrm · on Dec 2, 2021

> Because utilization at mid will always be less than at hyper.

Will it?

Economies of scale have diminishing returns. If you have one engineer who is 20% busy, your engineer cost per unit will be five times higher than a company five times your size who keeps that one person 100% busy.

If you have four engineers who are all 100% busy, compared to a company 250 times your size with 1000 engineers, your cost per unit is equivalent.
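A minimal sketch of the arithmetic above, assuming every engineer costs the same salary and that one fully busy engineer produces one "unit" of useful work. Both assumptions and the function name are illustrative, not something stated in the comment.

```python
def cost_per_unit(engineers: int, utilization: float, salary: float = 1.0) -> float:
    """Salary cost per unit of useful work.

    Assumes one engineer at 100% utilization produces exactly one unit.
    """
    if engineers <= 0 or not 0 < utilization <= 1:
        raise ValueError("need engineers > 0 and 0 < utilization <= 1")
    total_cost = engineers * salary
    useful_work = engineers * utilization
    return total_cost / useful_work


small_idle = cost_per_unit(engineers=1, utilization=0.2)    # one engineer, 20% busy
small_busy = cost_per_unit(engineers=4, utilization=1.0)    # four engineers, fully busy
huge_busy = cost_per_unit(engineers=1000, utilization=1.0)  # 250x larger, fully busy

print(small_idle, small_busy, huge_busy)
```

In this simplified model headcount cancels out and only utilization matters: the 20%-busy engineer costs five times as much per unit, while the four-person and thousand-person teams come out equal.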

mellavora · on Dec 2, 2021

Hey, your math makes sense, and is a good illustrative example, but in the real world if your schedule has your engineers 100% busy, then you have no slack to handle emergencies/unexpected events.

I'd target 60-70% utilization of engineer time. The remaining 30-40% is for stuff which can be dropped when needed (refactoring, non-critical/experimental projects, personal learning, ...). And if you have good engineers, they probably have a really good idea of how to prioritize for filling that 30-40% so as to move the company forward.

klodolph · on Dec 2, 2021

The same applies to machines!

Queue length grows sharply as resource utilization approaches 100%, under reasonable assumptions. If you think of low CPU utilization as "wasted" CPU and try to fill it up, you can completely kill your service's ability to respond to requests quickly.

I like to make this analogy when talking to people who feel like they are not working hard enough. If you fill up an engineer's time with high-priority work, you get the same problem as if you fill up a machine's CPU with high-priority tasks... you get a system that cannot respond quickly, a system that spends a lot of time overloaded.

It's a fun exercise to try and calculate the relationship between utilization and expected queue length.

Plus, those batch jobs are still important, they're just not as time-sensitive.
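One way to try the exercise mentioned above is the textbook M/M/1 model: a single server with random (Poisson) arrivals and exponentially distributed service times. That model is an assumption picked here for simplicity, since the comment only says "reasonable assumptions". Under it, the expected number of jobs waiting is ρ² / (1 − ρ), where ρ is utilization.

```python
def expected_queue_length(utilization: float) -> float:
    """Mean number of jobs waiting (excluding the one in service) in an M/M/1 queue.

    Formula: rho^2 / (1 - rho), valid for 0 <= rho < 1.
    """
    if not 0 <= utilization < 1:
        raise ValueError("utilization must be in [0, 1)")
    return utilization ** 2 / (1 - utilization)


for rho in (0.5, 0.7, 0.9, 0.99):
    waiting = expected_queue_length(rho)
    print(f"{rho:.0%} busy: {waiting:.1f} jobs waiting on average")
```

Under this model the average backlog is about half a job at 50% utilization, about 8 jobs at 90%, and about 98 jobs at 99%, so the last few percent of "wasted" capacity are what keep response times short.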

darkwater · on Dec 3, 2021

> Or, to put it another way, mid-sized company X can never pay as much as hyperscale company Y for Z expertise.

Hyperscale companies usually invent the wheel internally and maybe release it as open source a few years later.

taneq · on Dec 2, 2021

Even for a small company in the software space, running your own self-hosted OSS stack isn't a big ask. I've spent ~80 hours over the past year setting up and running our stack. It's nothing too crazy[1] but it covers our needs with no ongoing fees and, more importantly, no outside dependencies to rely on.

[1] Proxmox hosting a virtual server running Ubuntu, providing Kimai, WeKan, Mattermost, Jitsi, Mediawiki, OpenVPN access, some Samba shares, source control, and comprehensive backups.

jokethrowaway · on Dec 2, 2021

You think hiring AWS engineers is cheaper than hiring some old-style sysadmin who can manage any server?

Most likely you'll end up spending less in ops. The only problem may be finding talent, as everyone goes the AWS way so they can charge more.