I would be comparing how someone like Intel did during dot.com instead of Pets.c...

haldujai · on May 26, 2023

> There is a chance that AMD, Intel, maybe Google etc. catch up with Nvidia in a year or two and data center GPUs become a commodity (clearly the entry bar should be lower than what it was for x86 CPUs back in) and what happens then?

Realistically, there is next to zero chance Intel (especially given the Arc catastrophe and foundry capabilities) or AMD (laundry list of reasons) catchup within 2 years.

Safe bet Google's TPUv5 will be competitive with the H100, as the v4 was with the A100, but their offering clearly hasn't impacted market share thus far and there is no indication Google intends to make their chips available outside of GCP.

With that said I also agree the current valuation seems too high, but I highly doubt there is a serious near-term competitor. I think it is more likely that current growth projections are too aggressive and demand will subside before they grow into their valuation, especially as the space evolves with open source foundation models and techniques come out (like LoRA/PEFT) that substantially reduce demand for the latest chips.

vineyardmike · on May 26, 2023

> there is no indication Google intends to make their chips available outside of GCP.

1. You can buy mini versions of their chips through Coral (coral.ai). But yea, they’d never sell them externally as long as there exists a higher-margin advantage to selling software on top of them, and chips have supply constraints.

2. Google can sell VMs with the tensor chips attached, like GPUs. Most organizations with budgets that’d impact things will be using the cloud. If Apple/MSFT/AWS/Goog/Meta start serious building their own chips, NVidia could be left out of the top end.

haldujai · on May 26, 2023

> Google can sell VMs with the tensor chips attached, like GPUs.

They have already been doing this for quite a while now and even when offered free via TRC barely anyone uses TPUs. There is nothing to suggest that Google as an organization is shifting focus to be the HPC cloud provider for the world.

As it stands TPU cloud access really seems ancillary to their own internal needs.

> If Apple/MSFT/AWS/Goog/Meta start serious building their own chips, NVidia could be left out of the top end.

That's a big "if", especially within two years, given that this chip design/manufacturing isn't really a core business interest for any of those companies (other than Google which has massive internal need and potentially Apple who have never indicated interest in being a cloud provider).

They certainly could compete with Nvidia for the top-end, but it would be really hard and how much would the vertical integration actually benefit their bottom line? A 2048 GPU SuperPOD is what, like 30M?

There's also the risk that the not-always-friendly DoJ gets anti-trusty if a cloud provider has a massive advantage and is locking the HW in their walled garden.

fragmede · on May 26, 2023

> barely anyone uses TPUs

What are you basing that on? I'm not aware of GCP having released any numbers on their usage.

haldujai · on May 27, 2023

Anecdotal data warning but for context my research is in medical informatics and I've quite extensively followed publications on transformers dating back to the early BERT variants (including non-medical).

I'm making that statement as my experience (easily several hundreds of publications read or reviewed over 3 years) is that it is very uncommon to see TPU's mentioned or TRC acknowledged in any non-Google transformer paper (especially major publications) dating back to the early BERT family of models despite the fact that Google is very generous with research credits (they'll give out preemptible v3-32s and v3-64s for 14 days with little question, presumably upgraded now as I haven't asked for credits in a while).

Fully acknowledge this isn't quality evidence to back my claim and I'm happy to be proven wrong but I'm very confident a literature review would support this as when I tried to use TPUs myself I couldn't find much.

This doesn't account for industry use, there is probably a non-insignificant amount of enterprise customers still using AutoML (I can think of a few at least) which I believe uses the TPU cloud but I would be surprised if many use TPU nodes directly outside of Jax shops like cohere and anyone still using TF.

PyTorch XLA has just breaks too much otherwise and when I last tried to use it in January of this year there was still quite a significant throughput reduction on TPUs. Additionally when using nodes there is a steeper learning curve on the ops side (VM, storage, Stackdriver logging) that make working with them harder than spinning up a A100x8 which is relatively cheap, cheaper than the GCP learning curve for sure.

sangnoir · on May 27, 2023

> Anecdotal data warning but for context my research is in medical informatics

Isn't Medical Informatics inherently biased against the cloud? That's my uninformed guess as an outsider.

bagacrap · on May 28, 2023

It seems like one of these two things must be true:

A) Nvidia's TAM is not really what the stock is priced foe

B) Google will try to enter this market and compete

Either way NVDA looks perilously pricey, not that that is very predictive of anything (see TSLA).

brutuscat · on May 27, 2023

So what, are they pathological layers?

https://youtu.be/XVcKLetqf3U

The Intel® Data Center GPU Max Series outperforms Nvidia H100 PCIe card by an average of 30% on diverse workloads1, while independent software vendor Ansys shows a 50% speedup for the Max Series GPU over H100 on AI-accelerated HPC applications.2 The Xeon Max Series CPU, the only x86 processor with high bandwidth memory, exhibits a 65% improvement over AMD’s Genoa processor on the High Performance Conjugate Gradients (HPCG) benchmark1, using less power. High memory bandwidth has been noted as among the most desired features for HPC customers.3 4th Gen Intel Xeon Scalable processors – the most widely used in HPC – deliver a 50% average speedup over AMD’s Milan4, and energy company BP’s newest 4th Gen Xeon HPC cluster provides an 8x increase in performance over its previous-generation processors with improved energy efficiency.2 The Gaudi2 deep learning accelerator performs competitively on deep learning training and inference, with up to 2.4x faster performance than Nvidia A100.

https://www.intel.com/content/www/us/en/newsroom/news/intel-...

worrycue · on May 27, 2023

> next to zero chance Intel (especially given the Arc catastrophe and foundry capabilities)

Arc is manufactured using TSMC N6.

Intel originally wanted to use Intel 4 but it wasn’t ready yet. Maybe the next batch of GPUs assuming Meteor Lake and their other CPUs don’t consume all the Intel 4 capacity.

Also Arc hardware-wise is fine for what it is and the process node it’s using - N6 isn’t a leading edge node to my knowledge. Drivers are unfortunately something that’s going to take time to fix up - there is no way around this.

haldujai · on May 27, 2023

Agree but Intel has yet to show they can successfully make a high end GPU, and they're heavily invested in Arc at the moment.

Given Intel 4 is launching at the end of the year I would expect their focus will be on catching up wth AMD on CPUs and the next-gen Arc GPUs. Assuming everything goes well with their yields and they have extra foundry time (which they won't be using as part of IFS) will they have the institutional energy/capital/will to open a new software+hardware battle in a market the entrenched Nvidia will fight to the death for?

It seems extremely unlikely to me within 1-2 years.

lbrindze · on May 27, 2023

I wouldn’t discount AMD just yet. They closed quite a big gap in the server cpu market against Intel, most probably due to better leadership and management. I wouldn’t be surprised if they are able to pull that trick a second time with GPUs. 2 years isn’t short but it isn’t that long either.

People have said AAPL was overvalued perennially as long as I remember yet their market performance seems to ignore these opinions.

On the other hand, a big part of it also comes down to the tool chain, and NVIDIA owns CUDA. Until OpenCL or other gpu platforms catch up, it seems like NVIDIA can continue to corner the gpu market at large.

bagacrap · on May 28, 2023

Nvidia seems like a tougher competitor to oust than Intel.

> People have said AAPL was overvalued perennially

Yes but they were saying this when AAPL's p/e ratio was in the low teens and now it's near 30. it was never near the insanity that is NVDA. I will grant that there's a lot of uncertainty about the future, but there's immense optimism baked in right now. It will be hard to live up to.

adventured · on May 26, 2023

Arc has already caught up to Nvidia. The latest Nvidia GPUs are a disaster (the 4060ti is being universally mocked for its very pathetic performance), they're intentionally royally screwing their customers.

The A750 and A770 are tremendous GPUs and compete very well with anything Nvidia has in those brackets (and Intel is willing to hammer Nvidia on price, as witnessed by the latest price cuts on the A750). Drivers have rapidly improved in the past few quarters. It's likely given how Nvidia has chosen to aggressively mistreat its customers that Intel will surpass them on value proposition with Battlemage.

antiherovelo · on May 26, 2023

You’re talking about consumer grade graphics, not AI processing, and you’re talking about cheap, not performant.

There is no significant competition to the NVIDIA A100 and H100 for machine learning.

fennecfoxy · on May 30, 2023

Now that I think it's right of them to do, but all consumer Nvidia products are overpriced to hell and have been for a long time, now.

The reason is because they can get away with it, because there's so much demand for their product. Were Nvidia to see AMD release a 4090 equivalent at half the price they need only reduce their own ridiculous prices and take less of a profit margin.

haldujai · on May 26, 2023

> anything Nvidia has in those brackets

This being the operative part of the statement. If we're talking top-end GPUs it's not even close.

> Intel is willing to hammer Nvidia on price

They also have no choice, Intel's spend on Arc has been tremendous (which is what I mean by catastrophe, everything I've read suggests this will be a huge loss for Intel). I doubt they have much taste for another loss-leader in datacenter-level GPUs right now, if they even have the manufacturing capacity.

dekhn · on May 26, 2023

the 4060ti is an entry board, it's designed to be cheap not fast. I believe this pattern was also true for 3060 and 2060.

pixl97 · on May 26, 2023

>and what happens then?

Most likely, all their prices go up...

I mean, your first instinct is to say, "but how could all their prices so up, they'll steal value from each other", but that's not necessarily true. If AI starts solving useful problems, and especially if it starts requiring multi-modality to do so, I would expect the total GPU processing demand to increase by 10,000-100,000X that we have now.

Now, you're going to say "What's going to pay for this massive influx of GPU power by corporations". And my reply would be "Corporations not having to pay for your health insurance any longer".

x3sphere · on May 26, 2023

I'm not sure AMD will catch up to Nvidia. Obviously there are a lot of traders betting on that right now, given that AMD has started to rally in response to Nvidia. However after all this time NV still commands like 80% share of the gaming GPU market despite AMD often (not always) releasing competitive cards. Gaming GPUs are already a commodity - why hasn't AMD caught up there?

I mean, maybe it's not a fair comparison but I don't see why the datacenter/GPGPU market won't end up the same way. Nvidia is notorious for trying to lock in users with proprietary tech too, though people don't seem to mind.

kgwgk · on May 26, 2023

> Did Intel ever ‘grow’ into their massively overvalued valuation? No.. their stock never even reached it’s September, 2000 peak yet.

If you take dividends into account it did break even a few years ago, at least in nominal terms.

Cisco and Sun Microsystems may be even better comparables though.