
Yes, but also no: I see what you're saying from a high level, but my point is that the devil's in the details and that you can't know all the details of each domain; the best you can do is be aware that those details are there. Just being exposed to even a subset of the high-level material is pushing the limits of a 4-year undergrad curriculum.

> What is a SQL query planner but a compiler?

Kind of? Lexing, sure, and to some extent IR, but an execution plan is not built from an instruction set, especially when you start talking distributed. Even if you ignore the distributed scenario, there's still a huge difference between understanding indices/columnar-storage/row-storage and implementing PGO or LTO.
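
For a sense of what I mean (operator names illustrative, not any particular engine): a plan is a tree of relational operators, each free to choose its own physical strategy, rather than a flat stream of opcodes.

  # Toy illustration: an execution plan as an operator tree, not an opcode list.
  plan = {
      "op": "HashJoin", "on": "orders.user_id = users.id",
      "children": [
          {"op": "SeqScan",   "table": "orders", "filter": "total > 100"},
          {"op": "IndexScan", "table": "users",  "index": "users_pkey"},
      ],
  }

  def explain(node, depth=0):
      # Print each operator with its physical details, indented by depth.
      detail = {k: v for k, v in node.items() if k not in ("op", "children")}
      print("  " * depth + node["op"] + "  " + str(detail))
      for child in node.get("children", []):
          explain(child, depth + 1)

  explain(plan)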

Plus, there's also a lot of domain complexity here: are you aiming for just ANSI compat, or are you going deeper and trying to be compatible with specific SQL Server / Oracle / Postgres features?

> Is an OS scheduler really that different from an HPC workload manager, or your GPU's compute-shader arbitrator?

Knowing little about HPC or GPUs, all I can say is that yes, there's absolutely a difference between the kinds of issues a kernel, hypervisor, and distributed scheduler have to deal with. Hypervisors need to integrate with higher-level RBACs, for one; distributed schedulers also need to be able to evict bad machines from the same system.

Telemetry also changes massively: if you're dealing with a kernel/hypervisor, stats from tools like strace etc. are useful. If you're distributed? You now have multiple sources of cardinality to worry about (hostname, network device, microservice release ID).
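
To give a sense of scale (all the numbers below are made up): every label you attach to a metric multiplies the number of time series you have to store and query.

  # Back-of-the-envelope: label cardinality multiplies out per metric.
  from math import prod

  labels = {
      "hostname": 5_000,       # machines in the fleet
      "network_device": 4,     # NICs per host
      "release_id": 50,        # microservice versions live at once
      "endpoint": 30,          # RPC methods
  }

  print(f"{prod(labels.values()):,} time series for a single metric")  # 30,000,000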

> Is RDMA against an NVMe storage pool fundamentally new to you if you've already written hierarchical chunk persistence+caching for a distributed key-value store?

Again, yes (at least I assume so; I'm not entirely sure what an "NVMe storage pool" is, but I assume you're talking about the protocol that SSDs speak); e.g. what's your resilience story against a network failure (someone cutting a fiber line, or losing a subgraph of a Clos fabric)?



> but my point is that the devil's in the details and that you can't know all the details of each domain; the best you can do is be aware that those details are there.

IMHO you can do slightly better — you can know enough about isomorphic decisions you've made in other domains to drive design in a new domain, without necessarily understanding implementation.

(Especially "design" in the sense of "where does the shear-layer exist where I would expect to find libraries implementing the logic below that layer" — allowing you to understand that there should exist a library to solve constrained problem X, and so you should go looking for such a library as a starting point to solving the problem; rather than solving half of problem X, and then looking for a library to solve overly-constrained sub-problem Y.)

Or, to put it another way: knowledge of strategy is portable, even if knowledge of tactics is not. An Army General would — in terms of pure skill — pick up the job of being a Naval Admiral pretty quickly, because there's a lot about high-level strategy that isn't domain specific. And that experience — combined with hard-won domain experience in other domains — could guide them on absorbing the new domain's lower-level tactics very quickly.

Or, another analogy: linguists learn languages more easily than non-linguists, because 1. it turns out that, past a certain point, languages are more similar than they are different; but more importantly, 2. there are portable high-level abstractions that you begin to pick up from experience that can accelerate your understanding of a new language, but which you can also be taught academically, skipping the need to derive those concepts yourself. (You still need practical domain knowledge in at least some domains to root those concepts in, but you can do that in your own head when studying the concepts in a book, rather than "in anger" in the field.)

This belief — that there exists portable high-level strategic knowledge to software engineering — is exactly why some companies promote programmers into product managers. A product manager will be tasked to drive the strategy for writing a new piece of software, potentially in domains they've never worked in. They won't be implementing this software themselves — so they don't actually need to understand the domain with high-enough precision to implement it. But they'll nevertheless be relied upon to make architectural decisions, buy-vs-build decisions, etc. — precisely because these decisions can be made using the portable high-level strategic knowledge of software engineering.

(Though, once again, this knowledge must be rooted in something, which is why I don't really believe in CS / SWEng as programs people should be taking right at the start of their careers. The best time to learn them is when you already have 10 years' programming experience under your belt. They should be treated as "officer's finishing school", to turn an experienced apprentice 'programmer' into a professional Software Engineer.)

On to specific hand-wavy rebuttals, since my original rhetorical questioning didn't do a particularly good job of making clear what I meant:

> but an execution plan is not built from an instruction set

What is a logical-level write-ahead log, but a long bytecode program that replicas execute? What does executing SQL do, but compile your intent into a sequence of atomic+linearized commands — an instruction stream — in said write-ahead log?

(Alright, in a multi-master scenario this isn't as true. But you can get back to this property if your DBMS is using a CRDT event-streaming model; and AFAIK all the scalable multi-master DBMSes do something approximating that, or a limited subset of that.)
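
To make that concrete, here's a rough Python sketch of the framing I mean: a replica treating a logical WAL as a little program it steps through. Every name, opcode, and field here is invented for illustration; real DBMSes (Postgres logical decoding, MySQL's binlog, etc.) have much richer record formats.

  # Sketch only: the "WAL as instruction stream" analogy, not any real DBMS format.
  from dataclasses import dataclass

  @dataclass
  class WalRecord:
      lsn: int    # log sequence number: a total order, i.e. the program counter
      op: str     # "INSERT" / "UPSERT" / "DELETE" -- the "opcode"
      table: str
      row: dict   # the "operands"

  class Replica:
      def __init__(self):
          self.tables = {}       # table name -> {primary key: row}
          self.applied_lsn = 0   # how far through the "program" we've executed

      def apply(self, rec: WalRecord):
          # Records are applied strictly in LSN order, like stepping a PC.
          assert rec.lsn == self.applied_lsn + 1
          t = self.tables.setdefault(rec.table, {})
          if rec.op in ("INSERT", "UPSERT"):
              t[rec.row["id"]] = rec.row
          elif rec.op == "DELETE":
              t.pop(rec.row["id"], None)
          self.applied_lsn = rec.lsn

  # The planner+executor's job, in this framing, is to turn declarative SQL
  # into the linearized stream of WalRecords that every replica replays.
  replica = Replica()
  replica.apply(WalRecord(1, "INSERT", "users", {"id": 1, "name": "ada"}))
  replica.apply(WalRecord(2, "DELETE", "users", {"id": 1}))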

> Hypervisors need to integrate with higher-level RBACs, for one; distributed schedulers also need to be able to evict bad machines from the same system.

Perhaps you haven't dealt with machines of sufficient scale—the abstractions really do come back together! The IBM z/OS kernel knows about draining+evicting hotplug NUMA nodes as schedulable resources (and doing health-checks to automatically enable that), in exactly the same way that Mosix or TORQUE or kube-scheduler knows about draining+evicting machines as schedulable resources (and doing health-checks to automatically enable that.)

Even regular microcomputer OS kernels have some level of support for this, too — though you might be surprised why it's there. It doesn't exist for the sake of CPU hotplug+healthcheck; but rather for the sake of power efficiency. An overheating CPU core is, at least temporarily, a "bad" CPU core, and on CPU paradigms like big.LITTLE, the entire "big" core-complex might be drained+evicted as a response! That code, although different in purpose, shares a lot algorithmically with what the distributed schedulers are doing — to the point that one could easily be repurposed to drive the other. (IIRC, the Xeon Phi did exactly this.)
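
If it helps, here's a toy Python sketch of the shared shape I'm gesturing at. The health-check and placement policy are made up; the point is just that the "node" being drained can equally be a core complex or a fleet machine.

  # Toy sketch of the shared drain/evict pattern; not any real scheduler's API.
  from dataclasses import dataclass, field

  @dataclass
  class Node:
      name: str
      healthy: bool = True
      tasks: list = field(default_factory=list)

  class Scheduler:
      def __init__(self, nodes):
          self.nodes = {n.name: n for n in nodes}

      def drain(self, name):
          # Stop placing work on a node and migrate what's already there.
          bad = self.nodes[name]
          bad.healthy = False
          for task in list(bad.tasks):
              bad.tasks.remove(task)
              self.place(task)

      def place(self, task):
          # Naive placement: least-loaded healthy node.
          target = min((n for n in self.nodes.values() if n.healthy),
                       key=lambda n: len(n.tasks))
          target.tasks.append(task)

  # The same loop works whether the nodes are big.LITTLE core complexes being
  # drained for thermals, or fleet machines being drained for maintenance.
  sched = Scheduler([Node("big-cluster"), Node("little-cluster")])
  sched.place("render-thread")
  sched.drain("big-cluster")   # e.g. overheating: evict and re-place the work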

> e.g. what's your resilience story against a network failure (someone cutting a fiber line, or losing a subgraph of a Clos fabric)?

My point in mentioning hierarchical persistence+caching for a distributed key-value store is that these same problems also occur there. From the perspective of a client downstream of these systems, a network partition in tiered storage is a network partition in tiered storage: you have to make the same decisions — often involving (differently-tuned implementations of) the same algorithms! — in the solution domain for both; and, perhaps more importantly, you have to expose the same information to the client in both cases, to allow the client to make decisions. (Needing to capture and make this information available will heavily constrain your design in both cases, forcing some of the similarity.)
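
Hand-wavy sketch of what I mean (every name below is invented): whether the tiers are RAM/SSD/remote object store or local cache/peer replica/authoritative store, a read has to fall through the tiers, decide what to do on a partition, and tell the caller how degraded the answer is, rather than hiding it.

  # Sketch: a tiered read that surfaces partition/staleness info to the client.
  from dataclasses import dataclass
  from typing import Optional

  class Unreachable(Exception):
      pass

  @dataclass
  class ReadResult:
      value: Optional[bytes]
      served_from: str      # which tier answered
      possibly_stale: bool  # surfaced so the *client* can decide what to do

  def tiered_read(key, tiers):
      # tiers: ordered (name, get_fn) pairs, fastest cache first, authoritative last.
      saw_partition = False
      for i, (name, get) in enumerate(tiers):
          authoritative = (i == len(tiers) - 1)
          try:
              value = get(key)
          except Unreachable:
              saw_partition = True   # same decision point as a cut fiber line
              continue
          if value is not None:
              # A cache hit is possibly stale; an authoritative hit is not.
              return ReadResult(value, served_from=name,
                                possibly_stale=not authoritative)
      # Nothing reachable had the key: surface the partition, let the client decide.
      return ReadResult(None, served_from="none", possibly_stale=saw_partition)

  def remote_get(key):
      raise Unreachable("fiber cut")   # simulate the network partition

  result = tiered_read("k1", [("ram", {}.get), ("remote", remote_get)])
  print(result)   # value=None, served_from='none', possibly_stale=True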


> you can know enough about isomorphic decisions you've made in other domains to drive design in a new domain, without necessarily understanding implementation.

That's certainly true!

I'm not at 5 years in the industry yet myself, but I've definitely started to notice these patterns - the strategy-vs-tactics framing is a nice way to describe it :)

> Perhaps you haven't dealt with machines of sufficient scale—the abstractions really do come back together! The IBM z/OS kernel knows about draining+evicting hotplug NUMA nodes as schedulable resources (and doing health-checks to automatically enable that), in exactly the same way that Mosix or TORQUE or kube-scheduler knows about draining+evicting machines as schedulable resources (and doing health-checks to automatically enable that.)

Oh, this is interesting. My experience has been as a Google-internal user of Borg, where, although the fleet is large, most of the machines are individually small enough that it's easier for us to simply evict a machine from the fleet than to block a core. (The whole "commodity server" strategy; z/OS follows a very different architecture, but I see exactly what you mean there.)



