Ask HN: Why hasn't the cloud killed the mainframe?
26 points by ak_111 on Aug 31, 2023 | 85 comments
I was surprised to see how few mainframes the Big Tech companies that handle huge transaction volumes (Netflix, Meta, Google, ...) use relative to legacy industries (banking, retail, insurance, ...)

This makes me suspect that the real reasons mainframes continue to exist are industry inertia, vendor lock-in, or even legacy code, rather than any performance/cost advantage.



First of all, I find it kinda funny that you call banking, retail and insurance "legacy industries".

I would rather be without Netflix and Google than without banking and food... but to each their own.

While some of it is inertia (mostly owing to the fact that rewriting truly large applications is hard and expensive), there is also the point that most of those industries cannot easily handle "eventually consistent" data.

Not all transactions are created equal; the hardest usually have a set of requirements called ACID.

ACID in the classic RDBMS is not a random choice, but driven by real requirements of its users (database users in the business sense, i.e. applications, not users as people). The ACID properties are REALLY hard to deliver at scale in a distributed system with high throughput. Think of the transaction rate of the Bitcoin network (500k/day with many, many "servers") vs. Visa (500M+/day); the latter is basically driven by two (!) large mainframes about 50 km apart, the last I heard any technical details.
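A quick back-of-envelope check on those daily figures (rough averages only; peak rates are of course much higher):

```python
# Back-of-envelope conversion of the daily figures above to per-second rates.
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400

bitcoin_tx_per_day = 500_000
visa_tx_per_day = 500_000_000

print(f"bitcoin: ~{bitcoin_tx_per_day / SECONDS_PER_DAY:.1f} tx/s")  # ~5.8
print(f"visa:    ~{visa_tx_per_day / SECONDS_PER_DAY:,.0f} tx/s")    # ~5,787
```

So the averages differ by three orders of magnitude, and that gap is while Visa is holding the stricter consistency guarantees.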

None of the companies you mention need strict ACID, as nobody will complain if different users see slightly different truths; hence scaling writes is fairly easy.


I have no expertise in this area, but two counterarguments popped into my head:

1: I wonder how many transactions the largest e.g. Postgres clusters (or other classic RDBMS) handle per day. 500M+/day doesn't seem that incredibly high?

2: Google Spanner, which I would classify as cloudy, promises ACID guarantees at global, distributed scale. Couldn't that be used?

I've listened to a Swedish developer podcast where they interviewed an old school mainframe developer in the banking sector. He brought up similar points about the scale and correctness of database transactions, and it didn't feel convincing to me.

What does Paypal, Klarna, or even maybe Amazon give up by not using mainframes? Does any company founded in the last 10-15-20 years use mainframes? If not, does that mean that "modern" companies can't compete in these high-demand industries like retail or insurance?

I think it's much more in the inertia-point, the cost of rewriting these enormous applications is simply too large.


>What does Paypal, Klarna, or even maybe Amazon give up by not using mainframes?

Simplicity. Instead of having an enormous team to maintain multi-million-line Kubernetes configuration files that automatically spawn thousands of servers, and another enormous team trying to work around the CAP problem, you go to IBM, tell them "if this mainframe goes down, you're dead, here's half a billion in cash", and you run all of your software on a single, massive machine.

Sure, it adds other problems, but to be fair I'd rather deal with a mainframe than yet another microservice that doesn't work because Omega Star doesn't handle ISO timestamps and blocks Galactus.

>Does any company founded in the last 10-15-20 years use mainframes?

The biggest issues with mainframes are:

- High initial costs (although that's been changing)

- Nobody knows how to work on IBM Z mainframes

- The current zeitgeist about having a billion servers spread throughout the world because it's really important to have an edge CDN server for your cat trading card game

These industries didn't care about that because they could absorb these high initial costs, had the knowledge of the people building the mainframes, and were already highly centralized. Decentralization just adds more problems, it doesn't fix anything.

>I think it's much more in the inertia-point, the cost of rewriting these enormous applications is simply too large.

Rewriting just because it's not following the latest trend is garbage. These applications work. They're not going to work better because you're using Spanner now.


Remember, inertia goes both ways once you have a large system "in the cloud" or otherwise "distributed" (even if only in your own shed).

Rewriting your system to a mainframe architecture is equally as expensive.

Let's say you save a hundred megabucks per year by going cloud (I am really not convinced that you save any money at all, but let's say). That is, what, 1,500 man-years of work? There's no way you can rewrite even a simple "banking" system in that amount of work, and the second problem is that you need to feature-freeze the old system (and hence your business) while you convert. So maybe 5 to 10 years of lost competitive power on top of that.

Also, remember these are not "begin work; select * from table where id = 123; commit;" transactions. These have maybe 50 to 200 queries (selects and updates) in each transaction: does the paying party have the funds, is the receiving party blacklisted (a terror organisation, for example), does this look like money laundering... etc., plus a very detailed logging requirement by law. All of these usually MUST be in the same "snapshot" (in the RDBMS sense).
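A toy sketch of what such a multi-step transaction looks like (all table and check names here are made up for illustration; a real payment system would have dozens more queries). The point is that the checks, the balance updates, and the audit log entry must all land atomically, in one consistent snapshot:

```python
import sqlite3

# isolation_level=None -> autocommit mode, so we control BEGIN/COMMIT explicitly.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.executescript("""
    CREATE TABLE accounts (id INTEGER PRIMARY KEY, balance INTEGER);
    CREATE TABLE blacklist (account_id INTEGER);
    CREATE TABLE audit_log (entry TEXT);
    INSERT INTO accounts VALUES (1, 100), (2, 0);
""")

def payment(src, dst, amount):
    conn.execute("BEGIN IMMEDIATE")  # all checks below see one consistent snapshot
    try:
        (bal,) = conn.execute(
            "SELECT balance FROM accounts WHERE id=?", (src,)).fetchone()
        if bal < amount:
            raise ValueError("insufficient funds")
        if conn.execute(
                "SELECT 1 FROM blacklist WHERE account_id=?", (dst,)).fetchone():
            raise ValueError("recipient blacklisted")
        # ...a real system adds AML heuristics, limits, fees: dozens more queries...
        conn.execute("UPDATE accounts SET balance = balance - ? WHERE id=?",
                     (amount, src))
        conn.execute("UPDATE accounts SET balance = balance + ? WHERE id=?",
                     (amount, dst))
        conn.execute("INSERT INTO audit_log VALUES (?)",
                     (f"{src}->{dst}: {amount}",))
        conn.execute("COMMIT")  # checks, updates and log become visible together
    except Exception:
        conn.execute("ROLLBACK")  # atomicity: nothing partial ever lands
        raise

payment(1, 2, 25)
balances = dict(conn.execute("SELECT id, balance FROM accounts"))
print(balances)  # {1: 75, 2: 25}
```

SQLite on a laptop obviously isn't a mainframe; the sketch only shows why "transactions per second" is meaningless without knowing how much work each transaction does.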

It makes no sense to talk about "transaction rates" without intimate knowledge of what a transaction does, as marketing departments especially have a tendency to use the simplest possible "transactions" to get a large number.

And in the end, it is only "money" they might lose, and any choice you make makes some future choices easier (and some harder). That is called path dependency.


I agree, it's obviously very hard to compare transaction rates, and I also agree that I have a hard time seeing companies currently using mainframes recoup the cost of migrating. If it works, it works.

But.

> Rewriting your system to a mainframe architecture is equally as expensive.

There was a new bank mentioned in this thread that actually started on mainframes from scratch, but other than that I've never heard of any "modern" fintech (or really any) company introducing mainframes. Organisations actually rewriting functioning systems TO mainframes must be almost unheard of (in the last 10-20 years at least).

If System Z, COBOL and DB2 are so obviously superior, why are so many successful new competitors, in industries where those technologies are the norm among older companies, choosing not to use them?

I'm not saying banks should rewrite their stuff in Node.js (or Deno, even better of course); it makes sense for them to stay.

I just have a hard time believing that mainframe systems are so technically impressive, to the point where some people claim it's almost impossible to build a similar system on non-mainframe technologies.


The software on mainframes shines mainly in reliability and in the fact that the machines have been built for money transactions from the start. For example, "decimal" math (think Python's decimal module) is as inexpensive as float math thanks to hardware support.
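As a toy illustration of why decimal arithmetic matters for money (Python's decimal module does in software what the mainframe hardware does cheaply): binary floats cannot represent most decimal fractions exactly, so errors accumulate.

```python
from decimal import Decimal

# The classic example: binary floats can't represent 0.1, 0.2 or 0.3 exactly.
print(0.1 + 0.2 == 0.3)                                    # False
print(Decimal("0.1") + Decimal("0.2") == Decimal("0.3"))   # True

# Summing one cent a million times: the float total drifts, the Decimal
# total is exact.
float_total = sum([0.01] * 1_000_000)
decimal_total = sum([Decimal("0.01")] * 1_000_000, Decimal(0))
print(float_total == 10000.0)          # False (accumulated rounding error)
print(decimal_total)                   # 10000.00
```

On commodity CPUs the Decimal version is much slower than the float one; the commenter's point is that on a mainframe the decimal path is hardware-supported, so you don't pay that penalty for correctness.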

The machines themselves are impressive hardware-wise and reliability-wise; for example, you can swap mainboards one by one in a full frame without ever taking down the machine (think RAID at the mainboard level, RAIMB?).

But the high start-up cost makes most startups go the other way. I am not convinced that scaling vertically is cheaper than scaling horizontally if you need the ACID guarantees... but it is hard to say.

The reason us old dogs say it is hard (not impossible) is the single-image and ACID requirements. There is no good way to do that distributed (look up the CAP theorem).

So having a massive computer (with double-digit terabytes of memory AND cache, and truly massive I/O pipes) just makes building the need-to-work stuff simpler.

As an example, a few years ago I attended (on my own money) a mainframe conference (I don't do mainframe work in my day job). At that time the machine had more bandwidth to its roughly 200 PCIe adapters than a top-of-the-line Intel CPU had between the L1 cache and the compute cores, which meant that, given enough SSDs, you could move more data into the system from disk than you could move into an Intel CPU from its cache...

Also, you can run two mainframes in lockstep (as long as they are less than 50 km apart), which means that if one of them dies during a transaction (in itself extremely rare), the other can complete it without the application being any the wiser... Try that in the cloud :)


I've worked at three banks and it's not about cost. It's because they aren't stupid.

Young developers often think that banks, insurance companies etc. should just rewrite these "legacy" systems because it will bring all of these magical benefits with no risk. Whereas older developers who have worked on (a) mission-critical applications, (b) major re-platforming efforts and (c) projects in a highly regulated industry know the score.

Doing just one is hard. Doing all three at the same time is suicidal. The chance of project success is basically in the single digits. And the risk of failure is billions in lost revenue and your future prospects in the company and within the broader industry ruined.


Not so sure; experienced tech managers are also very wary of vendor lock-in and tech debt, which mainframes give you in spades.


The vendor lock-in is a feature to them. They're not tech companies. They're banks. They're insurers. Lock-in means they can send a lot of cash to someone and the problem gets fixed, which is the only thing they care about. And the good news is, cash is also a thing they have a lot of. The cost of their tech infrastructure is a blip on the radar compared to payroll and the cost of their physical spaces.


I assume from this comment that you've never worked in the enterprise.

Because (a) major decisions like choosing a mainframe are not made by tech managers and (b) every company is built around vendor lock-in.

Who do you think companies like Atlassian, Oracle, Salesforce etc. sell to?


It's a little off to think that a mainframe was "chosen". Software and the companies that write it and support it "chose" the hardware for you.


Google internally announced a while back that Bigtable (which powers Spanner etc.) hit 1B queries/second -- there definitely exist systems with far larger scale (though admittedly this is with lower atomicity requirements and probably includes reads etc.).


VISA does 500m+ transactions per day and Spanner does 1B queries per day, but it's quite unlikely that what a transaction means on VISA is the same as what a query means on Spanner.


Spanner does over 1B queries per SECOND (but your other point still stands of course)


If it is readonly shared-nothing queries, I am sure the hardware I have in my flat can do that as well..

The hard parts are updates in a shared system with a single consistent "view" requirement..


The hardware in your flat definitely cannot do 1 billion queries per second -- that requires a massive, global system, which can probably also support a shared system with ~100,000x fewer queries.


That sounds so impressive I googled it. It's actually 6 billion queries per second: https://id.cloud-ace.com/how-youtube-uses-bigtable-to-power-...

But then I mulled it over for a while, and it occurred to me that SQLite likely does orders of magnitude more than that planet-wide.

Spanner's 1 billion per second is more impressive: https://cloud.google.com/blog/topics/developers-practitioner..., assuming it's returning a consistent view across many tables. But the SQLite comparison still stands.

Visa claims 24,000 TPS, but in reality runs at a tenth of that. It would be interesting to see if Spanner could process the same 2,400 transactions per second. SQLite definitely can't.


Amazon does use mainframes. They aren't a payment processor and don't handle that part of the processing. That's why the likes of VISA get to charge their fee.


Google Cloud Bigtable and DynamoDB both appear to have ACID -- I don't see why mainframes would be better for this than cloud.

Bitcoin is slow because of the many servers, not in spite of them. Because of the design of the network, every server needs to receive every transaction, and servers need to be able to be pretty small, which limits the transaction rate.


Both BigTable and DynamoDB only support eventual consistency. That is a big asterisk in ACID for those technologies.


I don't think that's true?

Bigtable: https://cloud.google.com/bigtable/docs/replication-overview

> When using replication, reads and writes to the same cluster are consistent, and between different clusters, reads and writes are eventually consistent. If an instance does not use replication, Bigtable provides strong consistency, because all reads and writes are sent to the same cluster.

DynamoDB: https://docs.aws.amazon.com/amazondynamodb/latest/developerg...

> Both tables and LSIs provide two read consistency options: eventually consistent (default) and strongly consistent reads

> Eventually consistent reads are half the cost of strongly consistent reads
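A toy model of the distinction those docs describe (this is not the real Bigtable or DynamoDB API; all names here are made up): a strongly consistent read is served from the up-to-date copy, while an eventually consistent read may hit a replica that hasn't applied the latest writes yet.

```python
class ToyReplicatedStore:
    """Minimal sketch of strong vs. eventual read consistency."""

    def __init__(self):
        self.primary = {}   # always up to date
        self.replica = {}   # lags behind until replicate() runs
        self.pending = []   # writes not yet shipped to the replica

    def write(self, key, value):
        self.primary[key] = value
        self.pending.append((key, value))

    def read(self, key, consistent=False):
        if consistent:
            return self.primary.get(key)   # strongly consistent read
        return self.replica.get(key)       # eventually consistent (maybe stale)

    def replicate(self):
        # Asynchronous replication, modeled as an explicit step.
        for key, value in self.pending:
            self.replica[key] = value
        self.pending.clear()

store = ToyReplicatedStore()
store.write("balance", 100)
print(store.read("balance", consistent=True))   # 100
print(store.read("balance"))                    # None -- replica hasn't caught up
store.replicate()
print(store.read("balance"))                    # 100 -- "eventually"
```

The real systems hide the replication step behind the network, which is exactly why the strongly consistent option costs more: it has to be routed to (or coordinated with) the authoritative copy.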


I heard that mainframes have redundancy built-in. /s


It’s not a technical problem. IBM and whoever owns other “undead” platforms aren’t dumb; they price the stuff high enough to print money, but low enough to make migrating a poor return.

In a big enterprise, the mainframes give CIOs leverage for other stuff too: they sell at high margin, and IBM will “give away” or subsidize other services by moving money around in the back end.


"Legacy industry" is a sort of standard term; not sure why you found it funny. It's not meant to say they are less important.

From a quick Google: "Legacy industries are those that have been around for a long time. These industries dominate a specific market and have not always had a positive approach to innovative ideas."


Legacy is mainly a marketing term for someone wanting a piece of an existing, big market. It also usually implies that you expect it to be closed down and replaced...


Well, this is Hacker News; when you label something "legacy", like code, it's never a nice thing or a sign of respect to the product/creators, rather the contrary. TBH it's also the first time I'm hearing this term; it's simply not common, not even here, and much less among the general population.

And I have to strongly agree with OP: I couldn't care less about the fate of the FAANGs of these days, but I do care about those 'legacy' businesses tremendously.

As for the original topic: if it has worked for 3-4 decades, don't be the stupid guy who changes it. Tremendous risk to the core business with little to gain.


Your comment doesn't make sense. Why, on Hacker News, should I show more respect to Walmart and JP Morgan than to Google or Apple by calling the former a legacy industry and the latter Big Tech? And why are you so worked up about it?

Relying on the existence of one vendor, with highly unportable and unmaintainable code, carries its own risk, and my post is asking whether it justifies the cost.


It also gives you the opportunity to use your mainframe's advantages to the very max: not the "common feature set" but the very best features you can get out of your tech choice.

I do the same: I have run PostgreSQL for more than two decades now and I don't care about portability to any other database, all of which I consider inferior (and yes, I do follow most of their releases).


"These industries dominate a specific market and have not always had a positive approach to innovative ideas"

Google is itself starting to sound a bit like that!


I’m not the GP, but I found it funny because the term misuses the word "legacy". It doesn’t fit other usage of the word or the dictionary definition. I didn’t look it up because I didn’t think to; it looks like a normal use of an adjective, not a term of art.


Legacy means highly valuable and proven. Non-legacy means unreliable and highly unlikely to ever make money. Legacy software runs the world. Without it western civilisation would collapse.


The cloud is great for elasticity; scenarios where compute or storage demands can surge extremely and unexpectedly, or at short notice, are core "cloud territory".

When you have reliability/continuity as a top business requirement and ACID transactions (e.g. billing) must be processed at scale, then mainframes shine.

The argument "never change a running system" is not wrong, but it does not on its own explain the existence of mainframes; there are non-legacy scenarios where using a mainframe is the most reasonable choice. Finally, "cloud" as a term for outsourced compute/storage capacity can also apply to mainframes; see e.g. IBM's pricing brief at https://www.ibm.com/downloads/cas/YM94KV6N - some own their mainframe, some rent it, like a private or public (internal or external) cloud.


Your statement and question are different things. It's also worth pointing out that "mainframe" can mean different things. Here I will use the definition of a large central machine (or cluster) designed for HPC.

> Why hasn't the cloud killed the mainframe?

A mainframe is typically a very powerful machine/cluster. You may be able to get those in the cloud (although I doubt you can get a single machine with >32 TB RAM, for example), but why pay a premium to the cloud provider on top of the base cost?

> Why do newer companies not use mainframes

Some do, but your examples have been engineered to scale horizontally instead of vertically, i.e. using lots of less powerful machines instead of a single big machine.

> Are mainframes used because of lock-in

In some cases, sure. But there are many cases where big powerful machines are needed (where you can't have any of the issues that come with distributed computing, i.e. CAP trade-offs).


The word mainframe is often used ambiguously. In my experience it could mean:

System Z - as I understand it, this is descended from IBM's earliest computers like System/360. Typically apps are written in COBOL and PL/I, tied together with JCL. “Newer” apps use DB2, but there are older non-relational databases as well.

System P - an IBM alternative server architecture based on the Power chipset. I think these mostly run UNIX, but I’ve never been hands on.

AS/400 - technically an IBM minicomputer, but similarly esoteric, with a bespoke OS and some nifty functionality for business apps (very SQL-centric).

HP Tandem - traditionally used in high availability transaction processing applications.

Others? I think most of the other surviving old stuff is basically UNIX

Typically people are thinking of System Z. I think both the inertia and scalable ACID explanations are relevant. That said, I’m not sure how DB2 on System Z compares to say Oracle RAC.


I've never heard the p-series called a mainframe, and while it's been a while, I've both used and sold them (as a side effect of that company's app often being sold bundled with a p-series).

While there's technically overlap in performance between the lowest-spec Z and the higher-end P, and these days lots of performance overlap between the P and X ranges, the P-series is called "midrange" by everyone I've ever met.

The Power hardware has been the basis for both the AS/400 and the P-series/AIX line for a long time now, although you can buy a p-series with Linux too.

As an aside, AIX, with the exception of its volume manager (which irks me deeply), is an incredibly usable Unix. I mean, you can really manage the whole thing via the SMIT UI without ever touching the command line. Not my preference, but you can. Some things, like CPU usage reporting, are still better on AIX (core utilisation via hyperthreads, for example).

I'm still not a fan as such, mostly I'm annoyed by how good it is, but credit where credit is due.

Of course there are some people (usually older, but not always) who will call any server, or anything in a server room, "the mainframe", in the same way that any desktop tower gets called "the CPU" (technically correct by some broader definitions, but still...), or they'll call whichever office app they primarily use "Microsoft". You need to take that into account when communicating with people, but the incorrect usage of "mainframe" seems pretty rare these days, simply because mainframes themselves are rare... and people in businesses who have them can generally distinguish.


> but why pay a premium to rent instead of buy?

I think that most customers lease their mainframe. IIRC IBM will often ship more hardware than the customer requires, with additional CPU/memory etc. being activated if a licence is purchased.


For sure, that's true of a lot of equipment :) I was more making the point that the cloud provider isn't a charity and will be charging a premium on top of that base cost. I've updated to better reflect this.


You can also pay for additional compute on demand with IBM’s mainframes, bursting above your steady state resource rate.


No, the questions are related. I am asking: does the mainframe really deliver better cost/performance than the cloud (some mainframe prices are quite huge without even accounting for maintenance and recruitment costs)? Is that why it is still a significant market?


Consider also the investment needed for the move; horror stories about that sort of move have surfaced. The cloud might be cheaper, but moving to it is uncertain, whereas current systems simply work.

And the cloud is not the only option: instead of a mainframe, you could also build your own data centres and get most of what the cloud offers. CAPEX vs. OPEX is a thing, but these are not startups that need to scale, or the scaling limits are pretty well known.


> I am asking does the mainframe really deliver better cost/performance than the cloud

It is irrelevant because those companies don't care about cost/performance.

They care about risks e.g. operational, security, regulatory etc.

And I would trust a mainframe over the cloud for those criteria any day of the week.


Different industries have different amount of willingness to endure foolishness. For at least the first 10 years, if not the first 20, any new technology is mostly foolishness. Some technology never even makes it out of that timeframe, and just disappears before then. It takes a long time to figure out which parts of a new technology are actually useful, and which parts are just foolishness. A small SaaS startup has a relatively high tolerance for foolishness. A giant bank with billions of dollars under management has close to zero tolerance.

When you say "performance/cost reasons" do you mean "just the reasons that I am familiar with and am competent to judge myself", or do you mean "all of the plethora of reasons I am clueless about, and that only an experienced actuary, accountant, or project manager from the industry has the background to judge"? I bet you're just thinking of things from your own (likely very limited) perspective. Anybody who has run even a very small business by themselves quickly finds out that a lot of the costs are hidden, subtle, and not noticed by outsiders. Can you even imagine what sorts of surprise costs are involved in running a bank?! You may think buying a cloud database from Microsoft makes total sense, and has an obvious ROI. You may also have never run a business larger than mowing your neighbors' lawns.

(I don't know you. Maybe you are the CTO of an international telecom firm. Maybe I am rude to assume your background. That doesn't change my answer, though.)


This comment ought to be turned into a macro, and automatically added to the replies every time this topic comes up. Spot on.


Also, if you are in banking or insurance and so on, do you really want to put your trust in someone else? The cloud going down is not uncommon, but what if there is catastrophic loss of data? Or their security somehow breaks and everything, or even just the last day, is wiped?

Not that legacy will do better, but keeping your fate in your own hands does feel better. And these are pretty proven systems at this point; if they were going to fail, they most likely would have already.


See my response to another comment. If you use a mainframe provider you are far more beholden to your vendor than if you were using a cloud service provider and engineered it right.


One of the new banks in my country (founded in 2018) decided to use a mainframe for all their core functionality. They actually have a statement about it on their website:

"""

Many factors from cost to regulatory requirements to cryptography and other security requirements play a role in making a decision to run a core banking platform in the cloud. Whilst the cloud is good for managing certain services (which we indeed use), it becomes a challenge to manage a bank’s core banking platform.

Once infrastructure is in a cloud, it is outsourced to that cloud provider. That means that the business is bound by those agreements covering aspects such as scalability, usage, capacity-on-demand and disaster recovery. These costs could grow exponentially as volumes grow. By using our own infrastructure for our core banking platform, we have full control over these factors.

Our chosen mainframe solution provides the ability for us to grow exponentially, while controlling all factors (CPU, Memory, Disk, Network, DR, Remote capability, etc) allowing us to manage the environment efficiently and effectively.

"""


For anyone who's wondering which bank this is, it appears to be Bank Zero in South Africa: https://www.bankzero.co.za/faqs/

I also found this extremely brief archived article about their setup, running LinuxONE on IBM Z mainframe(s?): https://www.businesslive.co.za/bd/companies/financial-servic...

Finally, Wikipedia links this article with mostly pictures: https://mybroadband.co.za/news/cloud-hosting/283199-bank-zer...


The main reason is almost certainly legacy code and integrations which are hard, risky and most of all expensive to change.

No business in their right mind would want to be on out-of-date, proprietary technology with hard-to-source skills, but the cost and effort of migration is enormous. There are scare stories of SAP migrations costing billions of dollars, and I assume a mainframe migration could be a multiple of that.

What would be interesting is if they let actual enlightened techies size up and run these projects, rather than giving them to Accenture and the like. Maybe it wouldn’t be insurmountable then?


The risk analysis for conversion projects comes out with a negative result every time.

You spend a lot of money, and the best case is you end up with the same services as before.


Something like the Square Kilometre Array (SKA) projects are best served by having their own dedicated quiver of mainframes, coupled with dedicated instrument-to-supercomputing-centre fibre bundles.

I used the phrase "quiver of mainframes" as the compute tasks are best served by a cluster of architectures: dedicated RAM in the tens of terabytes, dedicated fast local storage in the petabytes, large clusters of compute nodes optimised for pipelined throughput, others optimised for hypercube deep-mesh computations, and other architectures again purely for graphical representation, etc.

See: https://pawsey.org.au/supercomputing/


Now, this is a very naive conclusion, stemming from an utter lack of experience of how the real world, or even real large systems, work. The true value of these “old” legacy systems is in the years of development, debugging, and validation of how they work, including statistical guarantees that they work the way they were intended to. Hardware choices are thus very much secondary, dictated by how well they serve this preservation of investment in complex systems. Rewriting them is a huge economic undertaking, not at all justified by transitioning to the latest vogue technology buzzword peddled by Big Tech PR.


They also aren't exactly legacy if the performance continues to improve in such a manner that they remain impossibly far ahead of any relevant competition.

That's really the part of this whole (tiring and old) conversation that irks me the most: the total lack of willingness to come up with a better definition of legacy than "not in line with the trends that work for companies mostly interested in serving media content to consumers".

Legacy is a system that is still around that cannot be defended on its technical merits. Mainframes can and are constantly defended solely on their technical merits, and the "competition" doesn't appear to even want to try to compete there. It's always about moving the things that can be moved off platform, which is fine in the eyes of IBM, they'll even help you do it. That's why they offer much lower cost on zIIPs, zAAPs, IFLs, etc in comparison to general purpose processors running native workloads.


Or perhaps some industries would rather have servers on premises than trust an external provider? Could be part of the inertia (mainframe being the old way of doing things), but if there is any safety, confidentiality, or trust issue, relying on servers you don’t own may be frowned upon.


IMO companies that use mainframes are so beholden to their vendor (for maintenance, upgrades, consulting, ...) that the trust they must place in their external provider is 10x bigger than what you would need with a cloud service, particularly from a business-continuity angle.

Put another way: if IBM said they were closing shop in a month's time, or were to stop upgrading a particular product line, it would royally screw many of their clients, who would start scrambling for alternative solutions (although I would imagine significant notice periods are baked into their contracts).

But if you use the cloud and have done it right, it is much easier to migrate from one cloud service to another.


If IBM does that to you, they are dead too, because nobody will trust them anymore with their bread and butter. Basically a MAD-type relationship.


I think part of the answer is that it is because of the software, not the hardware. The hard lock-in in the banking/insurance industry is to things like COBOL/DB2. They happen to run on mainframes, but that's not where most of the migration cost would be. Cloud providers AFAIK don't support this type of software stack, and migrating huge transaction-processing backends to other stacks is considered prohibitive from what I've heard.


The maintenance budget for the central administrative system at IBM in 1985 was $4B (about $9.6B in current dollars). The idea of moving something like this to the cloud is hard to comprehend. Even if it could be done, it would take years and cost maybe $100B. And what would the ROI be versus building new applications or businesses? These are business-critical systems built to run in a specific environment; it is very hard to justify moving them to another platform.


Well, that is until the old system inevitably fails and has larger and larger outages. The critical knowledge needed to keep it running is lost with each generation.


A mainframe is your own on-premises AWS.

(IBM should send me money for this tagline).

Seriously though, there are businesses where sending the data to a third party is absolutely impossible. You can spend a lot of money building your own private cloud... or, you can buy a mainframe, which will be cheaper, and if your industry is mainframe-friendly, there will be a lot of support and accumulated experience.


Even for on-premise, I think a better solution these days is to build your own cluster rather than depend on IBM or the few vendors that supply mainframes.

IMO the theory, technology, stacks, libraries, ... for distributed computing have evolved so much over the last decades that, if you really had to be on-premise, it is easier to create your own on-premise EC2 than to rely on IBM to supply you with mainframes.


If your business is not in core tech, building your own multi-region cluster or a private cloud is outside your competence. It is quite a challenging task to start with, with unpredictable costs. Many HN readers do not understand this, as they have always worked in core tech.

And if you rely on systems integrators to do that, it will be quite expensive, and you are not sure if this particular integrator will be around in the next 30 years. (Amazon will probably be around, which is why AWS is popular, but sometimes public cloud is not an option at all).

In this context, mainframes are quite competitive. Their costs are predictable (and comparable to, if not less than, private cloud offers from systems integrators), IBM will probably be around for long or at least acquired by somebody who will carry the support contracts (like Sun -> Oracle), and they will provide full technical support, again with predictable costs.


Interesting take, makes sense.


I'm not sure the question is quite right.

Google and Amazon don't use "the cloud" in the normal sense of a third-party public offering. They own the infrastructure that is also used by other customers as a cloud so in one sense it is private cloud but closer to on-prem than anything else.

Many other large businesses run on-prem instead of "the cloud" because the cost savings of an op-ex cloud system start to diminish when you already have your own infrastructure/networks/specialist staff, as these large businesses do. Again, this is often a mix of traditional on-prem infrastructure and private clouds that offer some sharing of resources to applications that do not need an entire physical server.

Netflix, I think, is a mixture of on-prem and public cloud, but I'm not sure.

So I'm not sure if you are asking "why mainframes instead of distributed e.g. microservices systems" or "why on-prem instead of using the cloud".


I would say "vendor lock-in or even legacy code". It seems to me that the systems that use mainframes have business continuity and stability as their main goal. If it works, there is no sense in migrating. I think that for systems that use mainframes, money is not the issue, and on top of that, migration costs would be huge.


I know that some of them tried to rewrite large systems but failed. You are right that it is not just a money issue, since they could spend hundreds of millions of dollars on the rewrite and then basically scrap it.

It shows that they did want to switch but it was harder than they thought.


The mainframe ecosystem (both hardware and software) is sufficiently different from modern commodity hardware that ports of large applications off the mainframe are highly risky or impossible. For organizations which adopted the mainframe way back when, the cost of staying on the mainframe is often lower than the cost of migrating to something else. In many cases, a migration is likely to mean rebuilding a decades-old codebase from scratch.


I can't think of any valid arguments for moving anything important away from a well understood and proven-reliable system.


in the case of mainframes the main reason would be that nobody wants to learn cobol anymore and maintainers of legacy code are literally dying out.


I think the issue is more exactly "nobody wants to learn cobol at that price". Lots of people found a passion for JavaScript and React in recent years due to the salaries.


It is simply about a wholly different level of reliability that mainframes provide.

They are not at all trash or a product of corruption or nepotism, as many tend to think. That stuff works all the time, and has since ~1960, while we serfs spend our lives fixing bugs resulting from never-ending updates in that hodgepodge of javascript libraries our "efficient", "cheap", "FOSS-based" products are made of.

We all don't use mainframes because marginal or downright dodgy business cases of our products simply won't pay for it, thus we are stuck in this race to the bottom.


> We all don't use mainframes because marginal or downright dodgy business cases of our products simply won't pay for it, thus we are stuck in this race to the bottom.

Not just because of the expensive hardware, but also because it's all but impossible to get your hands on them as a developer. With Java, JS, Python, .NET, whatever companies have an insanely large pool of people to choose from... quite a few self-taught, some who went to university and studied CS, some who went to more or less decent bootcamps, some who "grew into" programming from other roles. On top of that, last I heard (admittedly a decade ago) the tooling to work on mainframe systems is just as old as the code they're running, so no modern IDEs, debuggers and whatnot.

With mainframes however, companies have to spend their own money, and quite a lot of it, to get trained developers, and on top of that pay a hefty premium for those who submit to fossilized tooling willingly.


IBM's provided a free developer course, including a free z/OS account, for at least a decade now: https://www.ibm.com/z/resources/zxplore

Going through the challenges is worth it just to see how the mainframe way of doing things works, even if you never plan to use a mainframe in your career. There's a lot of core concepts and cultural touchpoints that help you build more reliable systems everywhere else.


Mainframes are really impressive.

It feels like the non-mainframe world decided not to put effort into hardware reliability and tries to fix it at the architecture and software level, which is kinda sad.


Not really given the results and price savings.


I don't even work in banking or insurance, but we still have regulations and certifications that require us to have an on-prem level of security.


That question can't be answered without defining "the" cloud (a marketing term if anything) and "the" mainframe (there can be multiple). It's conceivable that IBM rents out z/OS machine capacity as a "cloud service".


It's an emulated s390x on x64, but yes, you can spin up emulated z/OS images in their own LPARs just fine in IBM cloud. You can get a real Linux LPAR on mainframe hardware, too. For a while, it was even part of the free tier.


Decades of business logic resides in the mainframe, and the risk of moving that into commodity hardware is in the billions (maybe trillions?)


Big tech products are decoupled from the physical world and ultimately not critical for society to function, so the workloads are highly variable. Mainframes are a good fit when you have workloads that don't vary by several orders of magnitude on a daily basis, and you need an ecosystem that treats reliability like a religion.

If you haven't used z/OS or worked in a z/OS shop, you will never understand how fundamentally different mainframe environments are. Sure, JCL is weird and ISPF doesn't make it clear how much raw power and how many OS features are available under the hood, but once you've worked on a real production application maintained by a good team, none of the syntax or UX stuff matters anymore. The abstract representation of the entire application and everything that supports it starts to live in your head and you start to realize how incredibly backwards a lot of modern cloud scale stuff is.

The dirty little secret of a lot of z/OS shops is that they are already using modern cloud environments on their existing hardware. All the productivity gains of Github Copilot + Visual Studio Code, Python, Rails, are all available on the same high speed, high reliability, high capacity frame where you already keep all your data sets. You can even spin up a Linux LPAR if you don't want to run on top of z/OS directly. The node.js people can whip out Next prototypes and slurp in 500 megs of questionable NPM modules on top of either OS, too.

While "modern" environments focus on bolting APIs together using web technologies, a typical mainframe environment simply hosts everything in the same place and uses the data sets on disk or the database as the interface between applications. This allows you to do all kinds of stuff that's fundamentally impossible in a cloud environment. You can instantly give an application a 100% consistent view of a checkpoint of a 200 terabyte data set, while it's being used by 3 other applications pounding out a million IOPS, and remaining failover ready to the other hosts in your metro sysplex. Many sites are sized well enough that when the LTO robot backs it up while all this other stuff is going on, there's not even a blip in your transaction latency numbers.

There is no cloud vendor, and probably will never be a cloud vendor capable of providing support for all of these features, even if they had them in the first place. I've worked on a service that's white box resold by all three major cloud vendors, and believe me when I tell you that there is no way you can get everyone involved in every component that might be contributing to, say, a weird block storage problem on a conference call. You just can't. You'll get escalated to someone who might have a chance at maybe doing some initial isolation to narrow down the scope of what's causing it, but that's ridiculous if you're a bank or a nationwide grocery store. Your EBS call is returning a HTTP 302 today? There's no documentation for that, not at the level you're going to want when it's breaking production at a factory that employs thousands of people. None of those industries use any of these fad technologies for anything important for exactly that reason.

Mainframes, on the other hand, have these kinds of support considerations built into nearly every component of the environment. The error message culture alone eliminates all the frantic googling that most non-mainframe engineers are used to doing when they get error messages. The OS and applications are expected to emit error and warning codes, which translate to actual English text that tell you what happened, and what to do about it. If necessary, IBM support can remotely JTAG any component in the frame after it's been automatically taken out of service.

Your SAN vendor can't point fingers at your server vendor, because they're the same vendor. You can't even call the network vendor for your ToR switch, because it doesn't exist; the frame has multiple internal PCIe, Infiniband, and IBM coupling adapter backbones, and you don't need a switch because you can simply add 96 WAN-capable 10 gig ports to each frame. There's no separate cluster interconnect transceiver vendor for the optics, that's IBM too.

If any of this detailed documentation refers to some operating system data structure you've never heard of, well, those are all documented in the MVS data area manuals, volumes 1-4, with convenient "eyecatcher" human-readable four character strings so you can quickly identify them in a hexdump of the operating system's memory. Nearly every single thing the OS outputs is also documented in the system message manuals, volumes 1-10. Don't even need to bookmark it, everything's immediately available via the master documentation index at ibm.com/docs/zos.

There is absolutely no comparable equivalent to any of these supportability-oriented resources in any other supported software ecosystem that exists today. Just go look at the manual for DFSORT, which is the z/OS equivalent to the Unix sort utility. Seriously, go look at it:

https://www.ibm.com/docs/en/SSLTBW_2.5.0/pdf/icem100_v2r5.pd...

There's over a hundred pages explaining what to do for any of the runtime messages the utility can output. There's no handwavy "undefined behavior" like you get with C and Unix. IBM has thought through every possible thing that can go wrong with a sort utility, with paragraphs of supplemental information and advice for many of them. If you use the online facilities to look it up on the mainframe itself, it will conveniently ask you if you want to print out the relevant documentation on the printer closest to you (yes, it knows where it is.) Most of the 3rd party application software is the same way; this level of documentation and programmer attention to detail is simply part of the culture.

People say IBM's mainframe support is really good, and in my experience it usually has been, especially when you're hard down and they get their best people on the phone immediately, but no one ever mentions that the best thing about their support is that they do all this behind the scenes work so you can support yourself using the documentation, without having to call them. Meanwhile, things which are extremely basic, fundamental, excruciatingly well-documented operations in z/OS continue to be totally impossible in every single other operating system ecosystem, like "is this file in the page cache, and if not, why?" Imagine trying to answer that question without having to break out bpftrace or WinDBG in kernel debugging mode, and even then you'll have to go read the source code or load ntoskrnl into Ghidra to figure out where the data structures are. The worst thing about all this is that a lot of people in the industry think that it's normal and they're patting themselves on the back for being oh so clever at knowing how to do that.
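
For what it's worth, the Linux answer to the "is this file in the page cache" question is mmap(2) plus mincore(2), and it's exactly the kind of thing you have to cobble together yourself. A rough, Linux-only sketch (illustrative; `page_cache_residency` is a made-up helper, and it goes through ctypes because Python has no native mincore binding):

```python
import ctypes
import mmap
import os

libc = ctypes.CDLL("libc.so.6", use_errno=True)

def page_cache_residency(path):
    """Return (resident_pages, total_pages) for a file via mmap + mincore."""
    size = os.path.getsize(path)
    if size == 0:
        return 0, 0
    npages = (size + mmap.PAGESIZE - 1) // mmap.PAGESIZE
    with open(path, "rb") as f:
        # ACCESS_COPY gives a private, writable mapping so ctypes can
        # take its address; untouched pages still reflect the page cache.
        mm = mmap.mmap(f.fileno(), size, access=mmap.ACCESS_COPY)
        buf = ctypes.c_char.from_buffer(mm)     # pin the mapping's address
        vec = (ctypes.c_ubyte * npages)()       # one status byte per page
        rc = libc.mincore(ctypes.c_void_p(ctypes.addressof(buf)),
                          ctypes.c_size_t(size), vec)
        err = ctypes.get_errno()
        del buf                                 # release the buffer export
        mm.close()
    if rc != 0:
        raise OSError(err, "mincore failed")
    # Low bit of each byte set => that page is resident in memory.
    return sum(b & 1 for b in vec), npages
```

And even then, this only tells you *whether* a page is resident, not *why* it isn't; for that you really are off to bpftrace and kernel source, as the comment says.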

Once you've been exposed to the mainframe way of doing application development, it ruins you for how most of the rest of the industry prioritizes feature velocity and how their focus is growing DAUs for people looking at cat pictures, or have PR campaigns about how it wasn't actually lying, technically speaking, when their other PR campaigns said the car can drive itself, and none of that had anything to do with killing those drivers when they took it at face value. Oh, by the way, the car can fart now, did you know?

All of this narcissistic Silicon Valley tech industry bullshit starts to feel, quite frankly, kind of disrespectful and like you're in a business relationship with juvenile clowns if you're trying to run a business that people are depending on for something that actually matters. Building things for support and for uptime isn't sexy, is bad for growth, and doesn't require a team of rockstar SREs who dress and act like fighter pilots. There is no "module of the week" for CICS. There will never be an "oh-my-TSO" package with emoji and themeable color schemes. If you go watch "For All Mankind," the 1970's era TSO prompts in the TV show look exactly the same as they do when you fire up your 3270 emulator on your iPhone today and remote into your frame. Everything about the mainframe ecosystem is exactly the kind of boring you want when your applications are supporting real people, spending real hard-earned money, on real products and services. That's why the parts of the tech industry that are constantly in the news don't use any of this technology: it works.


I've heard similar arguments several times, and every single time these kinds of arguments go up in flames when costs are brought up. Even discounting the "elasticity" that building applications on the cloud gives you, it's simply far cheaper to build on the cloud. Folks fail to mention that the kind of all-round support from IBM that you mention comes at a significant cost that is unaffordable to most businesses. It's probably cheaper to debug esoteric Linux issues than to call in IBM support. In fact, IBM knows this and has moved a significant chunk of their support team abroad to reduce costs and remain affordable. In my experience, these support teams are usually of poorer quality.


What an excellent answer. Nailed it!


Mainframes are more efficient at some tasks. Most tasks aren't particularly suited for mainframes, but the ones that are become a lot more expensive if you try to put stuff into the cloud.

You're also often dealing with very sensitive information. Zenbleed style attacks are an unacceptable risk, so you'll need separate hardware from the rest of the cloud anyway. Maybe you can find a data center that's reliable and secure enough to put all of that sensitive data, but there's a good chance you'll be wiring up your basement with fiber optics if you're dealing with finance.

The cloud, as in "other people's computers", is horrifically expensive. If you're going to set up mass throughput systems, you'd better start your own data center. This is expensive as well, but it's not impossible.

If you want scalability to reduce power consumption and deal with burst workloads, you'll have to separate out your processing systems from your storage systems. Your average data center probably runs a lot of iSCSI or similar remote disk tech, possibly based on some kind of software wrapper at the cost of latency and performance but with the benefit of quickly swapping drives and expanding capacity.

Then you'll have to architect your software, of course; if you're doing batch work, you'll probably want to distribute programs in batches and run similar programs on similar chips, making optimal use of data locality and CPU cache. Maybe do the whole Hadoop thing depending on your workload.

You'll also want to figure out maintenance. You can hire a team of your own techs dealing with replacements or upgrades, but in many cases hiring external talent for a limited amount of time per month is probably cheaper and saves you the effort of keeping your workers trained.

When a machine fails and the SKU you've selected has gone out of production you'll need to figure out a replacement. This doesn't have to be a problem, but servers are fickle things and you'll probably want to use something that works with the other server vendors' tools, so you're stuck buying hardware from a limited number of suppliers in a limited number of configurations.

For management, you'll want to pretend all the computers are part of the same system. Whether you pick Kubernetes or OpenStack, you want central control to save you the headache of a million dashboards.

When you're done, you've built yourself a mainframe, except you've spent your own money on R&D and need to spend money to upgrade your hardware rather than buying licenses for the rented machines your mainframe vendor already shipped to you, waiting to be licensed. Have you saved money compared to buying overpriced mainframe hardware? Tough to say, mainframe hardware is often better at its job than normal servers. You'll save money on developers, as you no longer need to keep the old COBOL around, but when that happens you've probably paid more than you've saved when the decade long project to rewrite the backend to another language is finally done.

Mainframes are just computers good at batch jobs. They're not better or worse than regular computers, they're just different.


Mainframe will be with us for a long time yet. The reasons are many and complex.

Lack of impetus

There are plenty of neobanks out there that have built scalable, secure infrastructures using modern development practices in the cloud. But despite growing rapidly, none of them has the scale to challenge the big banks. Similarly in areas like the airline industry, the big airlines are all old and back-end technology is rarely the deciding factor in whether or not one airline is more efficient than another. There simply isn't as strong an impetus to change as you'd expect.

Risk

There is very little incentive for any given executive at one of these firms to take the risk involved in staking their reputation on a big technology migration. Equally, there's nothing quite like a failed transformation project to destroy the careers of those associated with it. If you think mainframe is antediluvian, take a look at the ERP software the same companies are running. Layer upon layer of legacy with custom code built to manage myriad edge-cases that nobody understands anymore. Why take the risk when you can build a new system that integrates with the mainframe using (for example) a modern database that gives you a modern transactional API while micro-batching updates back to the mainframe? The incentives are all to create additional cruft.
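
The micro-batching pattern mentioned above can be sketched in a few lines: buffer writes destined for the host system and flush them in small batches when a size or time threshold is hit. This is illustrative only — `MicroBatcher` and `flush_fn` are made-up names, and `flush_fn` stands in for whatever MQ or file-transfer mechanism actually feeds the mainframe:

```python
import time
from collections import deque

class MicroBatcher:
    """Buffer records and flush them in small batches, either when the
    batch is full or when the oldest buffered record exceeds a deadline."""

    def __init__(self, flush_fn, max_batch=100, max_wait_s=0.5):
        self.flush_fn = flush_fn      # callable receiving a list of records
        self.max_batch = max_batch
        self.max_wait_s = max_wait_s
        self.buffer = deque()
        self.oldest = None            # arrival time of oldest buffered record

    def submit(self, record):
        if not self.buffer:
            self.oldest = time.monotonic()
        self.buffer.append(record)
        if (len(self.buffer) >= self.max_batch
                or time.monotonic() - self.oldest >= self.max_wait_s):
            self.flush()

    def flush(self):
        if self.buffer:
            batch = list(self.buffer)
            self.buffer.clear()
            self.flush_fn(batch)
```

The new system serves reads and writes through its own transactional API, while the batcher trickles updates back to the system of record on the mainframe — which is precisely how the cruft accumulates.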

Outsourcing

Most if not all of the big banks, airlines etc. have outsourced considerable parts of their operations over the years. In doing so, institutional knowledge was shifted out of the business into those outsourcers. The outsourcers in turn have little incentive to drive transformation of the mainframe given that a move to cloud sees their revenue deriving from the infrastructure management go to near zero. The outsourcers don't even have to act in bad faith for this to be a major problem. McKinsey and the rest thrive on complexity and by advising clients to outsource, they layered organizational and contractual complexity on the technology complexity, making the problem of transformation increasingly irreducible.

After risk, outsourcing is probably the most important factor since it is extremely difficult to create outsourced structures which maintain and develop an organic link between those responsible for business processes, and those responsible for technology. The result is an ever growing pile of sclerotic processes, dysfunctional governance bodies and uni-functional teams (often themselves outsourced to different parties for competitive purposes) that purport to control but which really just create complexity.

Outsourcing has served to worsen the organizational complexity that most mainframe users already suffered from. The result is a situation in which any programme of work to get off mainframe becomes fearsomely complex. I've worked in places which would have regular meetings of large parts of the company to try to coordinate major business process change in a single area. I've seen companies nearly break themselves trying to bring a single outsourced business function back in house. The question is why, when they're so incredibly inefficient and inflexible, they aren't competed away. That's a different question on which I have my own opinions, but this comment is too long already.

Knowledge

The loss of COBOL and other mainframe technology knowledge is real. I remember working at a bank in the EU around 2010 where I sat with a bunch of elderly gentlemen (walking sticks were a theme) who had been contracted back into the bank to develop integration between an ancient mainframe application and something modern the bank was building.

But that stereotype aside (there are surprising numbers of younger mainframe experts in India thanks to outsourcing), the problem is real, particularly when it comes to migration of software from mainframe to cloud using modern development practices. Any migration away from mainframe software requires understanding the whole technology stack and more importantly, how that stack interacts with the equally complex stack of business processes.

AI code interpretation and generation might take a COBOL program and translate it into modern code, or even help re-architect it using modern principles. But without that understanding of the business processes as well as the up and downstream dependencies in their many forms, anything other than piecemeal change looks terrifying to anyone who might try to move away from mainframe.

IBM

The fact is that mainframe is an effective technology stack. But more importantly, IBM has become extremely good at both keeping it up to date while also owning the best ways of modernizing it.

They're good at making sure they control the path away from mainframe. The best, simplest and lowest risk approaches to getting off legacy code on mainframe are either developed by or bought by IBM. By enabling Linux on mainframe and providing straightforward migration paths from legacy code to that platform, IBM (and its many partners) ensures that modernization of mainframe for the most part means staying on mainframe. This has gone through multiple phases and taken lots of forms over the years but really, IBM has done a stupendous job of ensuring that the future of mainframe is usually mainframe.

The advent of AI code interpretation and generation is another example of this. IBM has already announced their own AI tooling to help customers make the migration to mainframe Linux faster and smoother: https://newsroom.ibm.com/2023-08-22-IBM-Unveils-watsonx-Gene....

The challenge for any AI startup or professional services company wanting to help customers move away from mainframe is that the people best placed to sell those tools are... IBM and its partners.

Might the situation change?

AI code interpretation and generation is getting better all the time. LLM context sizes are growing rapidly. The possibility of fine-tuning a code-generation model using a business' own source code is there. It's even possible that businesses who no longer have source code can use AI to analyze and decompose binaries. The days when AI can analyze a whole software infrastructure, re-architect it and re-write it whole-cloth are coming. But even with those tools, the organizational layering, process cruft and generalized loss of institutional knowledge is going to make elimination of mainframe a long-term, high-risk project.

This is not to say that it won't happen. But technology change can only ever happen successfully at the rate an organization is able to change along with it. The organizations which still use mainframe tend to be the biggest, most complex and sclerotic organizations on the planet. IBM is going to be enjoying the benefits of what it built decades ago for decades to come.


because cloud is the mainframe


... from Wish.



