Pi 5 overclocking: Silicon Lottery

kloch · on March 13, 2024

I remember overclocking the 486DX2-66 in the early 90's. I got the idea after reading my brother's Intel data book and noticed that while the max clock speed was speced at 66 MHz, all of the timing diagrams implied it could run to 80. I borrowed a variable speed clock generator and sure enough it was stable at 80, and started to crash at around 82MHz.

When I started to help friends overclock theirs, I quickly realized the "silicon lottery" variance. Some would only run reliably at 78 or 76 MHz. I bought a bunch of fixed frequency clock generators (that were drop-in replacements for the original on the motherboard) in 2MHz increments due to the variance.

This was back before CPU's had heat sinks or fans, so we quickly figured out that adding those gave better margins. We even made some 10-LED bar temperature display that had a thermocouple glued to the CPU case and indicated 10 degree C increments (green=0-60c, yellow=70-80c, red=90-100c).

gorkish · on March 13, 2024

I remember overclocking my calculator (TI-85, 1992, Z80 CPU) ... its LO was a 2.7K/22pF RC oscillator which gave it an approximately 2.5MHz clock. To get this type of oscillator to speed up you'd normally lower the capacitance a bit.

The reason that this story is interesting is that in most cases you could just yank C9 entirely and with nothing more than a resistor between the clock pins, you'd get a roughly 300% performance increase. I guess the parasitic capacitance was enough to still oscillate a bit although mostly it would have been random. Looking back, this was basically a CPU being clocked with 50mhz noise and still running happily! Amazing!

lloydatkinson · on March 13, 2024

Not quite the same but I once was bored enough to keep trying to see how low power a solar powered calculator could work with.

I held my hand over the solar panel at various lengths until the screen cut out and while doing this I just kept hitting random keys and while lifting my hand up.

One day when I did this I must have hit the one in a million chance. It started rapidly counting up by itself!

I think I only got it to happen once more. I suspect the fluctuating voltage and it trying to do calculations while I was pressing keys was just enough to get some gates latched into the wrong state, somehow.

gorkish · on March 14, 2024

Glitching the power to force a CPU to misbehave at the right time is a time honored hacking technique! Great story

bobim · on March 13, 2024

Good memories... I ended up adding a switch under the cells cover because the mod was just draining these way faster. But curve plotting was finally snappy.

hinkley · on March 13, 2024

That was before binning really got to be a business model. Of course once a production line was stable, they could generate more high end chips than they actually needed, and so the chip you bought from bin 3 might actually be a bin 2 chip. It always seemed like AMD was really conservative that way, which is why hobbyists loved them.

I have a recollection of a guy who got a 486 DX-33 up to 133 MHz by putting the entire computer in mineral oil and floating chunks of dry ice in it. Watch out for asphyxiation.

t0mas88 · on March 14, 2024

Early GPUs also had this, where you could turn it into a higher end model with just software changes.

fragmede · on March 14, 2024

Or hardware changes. Pop off a resistor on your $300 graphics card to turn it into a $2000 Quadro card.

winslow · on March 13, 2024

That sounds like a lot of fun. Do you happen to have any photos of you tinkering especially with the 10-LED bar temp displays?

epakai · on March 13, 2024

I have a ST486-DX2-66GS (1998), and I found it is unstable at the stock 66MHz. I actually have to run it at 80MHz to prevent random freezes.

MuffinFlavored · on March 13, 2024

> The result of that 3.0 GHz overclock? A marginally-improved Geekbench 6 score of 1662, versus 1507 with no OC. To achieve that 10% speedup, it ate up about 20% more power, so efficiency-wise, it's not worth it.

What effect would running an overclock like this permanently have on longevity? Is it even worth thinking about "longevity" of chips?

Obviously stability suffers but for example... how much? Author was able to get a Geekbench 6 benchmark to pass. If they tried 100 times, would it be expected that a non-zero amount would fail?

sweetjuly · on March 13, 2024

Unfortunately, most of the useful data you want about wearout is locked behind tech NDAs so nobody will be able to offer specifics here.

But in general, the way you tend to end up with wearout is through electron migration ("electron wind") which is damage to interconnects from electrons slamming into metal atoms over and over and slowly ripping apart the wires. Modeling electron migration correctly is really hard, but (from memory) a general relationship is that voltage linearly increases failure rate and temperature exponentially increases it. The constants for these models are determined empirically and, of course, are NDA'd.

In general, I wouldn't worry about it. The MTTF for semiconductors is already very high even under awful temperatures 100+ C, and so as long as you cool it properly you'll be fine.

> would it be expected that a non-zero amount would fail?

The failures you describe here are going to be due to setup time violations. These issues shouldn't be transient (assuming identical temperature and voltage) since the performance characteristics of an individual device don't really change over time. Of course, the issues can seem transient as the failures may not actually always cause noticeable corruption (maybe you generate a wrong FPU result but that specific path is only exercised under rare uarch conditions).

So, it's a non-answer, but: yes and no. Maybe your chip is perfectly okay and never has any violations at the parameters you selected. Maybe it does. Neither you (nor in fact the manufacturer, though they do have a better chance since they know the process/design) can ever really be sure--all that's left is empirical burn in testing and hoping for the best :)

magicalhippo · on March 13, 2024

I can't find the references right now, but I recall reading that longevity of microcontrollers and similar could also be somewhat accurately modeled by the Arrhenius equation[1], meaning a 10C increase in operating temperature would result in roughly half the expected lifetime.

[1]: https://en.wikipedia.org/wiki/Arrhenius_equation

sweetjuly · on March 13, 2024

I believe you might be thinking of Black's equation? [1]. It's one equation which attempts to model the failure rate due to electron migration. It isn't a physical model but it seems to, with the right constants, fit reasonably well. The 10C=>halving life time is going to depend on the constants though.

[1] https://en.wikipedia.org/wiki/Black's_equation

magicalhippo · on March 13, 2024

Ah, must be. Though it wasn't mentioned by name in the source[1] I found earlier the equation seems to match. Instead they attribute it to Arrhenius, but I guess Black's equation is a special case.

[1]: https://www.ti.com/lit/an/snva509a/snva509a.pdf

mafuyu · on March 13, 2024

From what I remember from school, the extremely rough rule of thumb, specifically for thermal effects on the silicon itself, is that +10C will halve the lifespan of the chip. When you try to push a chip with an OC, the power/perf gets highly nonlinear, so you end up making tradeoffs here. Chip vendors like Intel and AMD do a lot of testing and validation to pick power curves that will meet the warranty specs of the chip, but they do have some wiggle room.

There’s a whole bunch of other failure modes that aren’t captured by the 10C rule. It’s more for estimating chip failure due to things like electromigration. You can observe this if you run a desktop CPU overclocked for many years. I had a 2600k that I had to keep bumping the OC down on, and jt eventually bit the dust after a decade.

rodgerd · on March 13, 2024

I once attended a talk by one of the people at Weta Digital about how they ran their datacentres; they worked out that they could save six figure sums per month by running their aircon lower and blades hotter; HP were prepared to keep the blades in warranty for a three degree bump but no more.

techwiz137 · on March 13, 2024

Of course! These are like BGA or something similar, the solder will crack after some number of on/off/cool/heat cycles.

If voltage is raised we enter the realm of electromigration, though not sure how relevant it is for such a minuscule OC.

As for stability, yes. If the voltage is not sufficient there will be stability issues which will require further raising it, thus raising temps and requiring more power which you can't be sure if you could deliver. And then of course electromigration.

0x457 · on March 13, 2024

> Of course! These are like BGA or something similar, the solder will crack after some number of on/off/cool/heat cycles.

This was mostly happening during the transition to lead-free solder. Today, component should fail earlier than BGA.

latchkey · on March 13, 2024

My experience with running GPUs is that overclocking tends to go with undervolting and it has zero impact on longevity of the chips themselves. Other components like power supplies, with consumable or hand made things, like hand soldered components, are what end up failing.

We had cards in the worst of the worst environments and they ran fine for years on end.

bayindirh · on March 13, 2024

I'd not be so sure, actually. Because we have seen other processors on the systems, like RAID or Ethernet cards go "insane" after some years. No overheat, no physical stress, nothing. Normal if a bit too much (HPC) work.

Reboot the system, device just disappears, never to be never seen again. It generally starts after ~6 year mark.

Sometimes device starts to corrupt things silently, but not always. However they too disappear after some time.

Oh, sometimes GPUs do that, too.

magicalhippo · on March 13, 2024

If the device has a mean time to failure of say 5 years when running close to thermal limits, then running it at 20 degree C lower operating temperature turns that into 20 years as mentioned in the sibling comment[1].

Thus expected lifetime quickly becomes long enough that it's effectively not an issue for CPUs and GPUs if you provide sufficient cooling.

[1]: https://news.ycombinator.com/item?id=39697289

bayindirh · on March 15, 2024

> Thus expected lifetime quickly becomes long enough that it's effectively not an issue for CPUs and GPUs if you provide sufficient cooling.

Both yes and no.

I still have an old AMD Athlon XP system, which works at 2200MHz (200x11), which is completely out of spec for that generation of AMD systems (2200MHz parts had 166MHz bus), and it still performs as on day one since it's not overclocked and cooled well.

On the other hand, we change parts which fry because they feel like it even they are not even close to their thermal limits, because they're kept in well cooled data center.

Sometimes, things go bzzt even without extreme heat. It's really interesting. Something is working at full throttle with no problems, you update a couple of things, reboot, and the device is gone for good.

latchkey · on March 14, 2024

The point being that you don't know which component on the board failed. If you look at the GPU chip itself, it might be just fine and it was just a capacitor that blew.

bayindirh · on March 14, 2024

The GPUs we change arrive on a large board which hosts multiple GPUs with SXM interface. GPU itself arrives with its heat sink only, and we only change the GPU itself. Board is never replaced.

Same for the RAID card. The processor has a couple of failure modes (no cache or no card), both directly related to RAID processor itself. Again same for the Ethernet cards we fry. They lose their MAC addresses, all pointing to in silica problems.

latchkey · on March 15, 2024

Good point. I've also heard that there are fairly high rate of failures with high end nvidia stuff. I wonder why that is.

bee_rider · on March 13, 2024

> To achieve that 10% speedup, it ate up about 20% more power, so efficiency-wise, it's not worth it.

There’s a trade off between single threaded performance and power, right? I’d expect the increase in power cost to be between the performance increase squared or cubed. If you expect a one-to-one trade it is never worth it to increase frequency, haha.

The universe will give you throughput at a fair rate, but it is very stingy about latency, in general.

latchkey · on March 13, 2024

I ran 150,000 GPUs that were individually tuned for maximum performance.

Silicon lottery, where the chip was on the wafer (edges tend to be less reliable), manufacturing batches, component batches, heat, cooling, power supplies, etc... the list goes on and on...

It is understated how much all of this is a huge impactful thing on performance and stability.

smellf · on March 13, 2024

> RK3588-based SBC

Anyone know which SBCs use this chip?

geerlingguy · on March 13, 2024

Radxa Rock 5 model B, Turing Pi RK1, Orange Pi 5 (and Plus); there are a few others but those are the models I have purchased and tested. All are more efficient/faster... but also more expensive and less supported. Though RK3599 and 3588 SoCs have both been some of the most widely supported out of Rockchip for Linux applications. They still lack compared to Pi's support though.

mort96 · on March 13, 2024

The NanoPi R6C/R6S as well.

The rk3588 is a nice chip, but support just isn't there yet if you want to do anything with the GPU. The "Panthor" GPU driver, which is the FOSS driver which supports its GPU, was just merged in to Linux and mesa this month[1] (yay!) which means you're probably gonna have to build your own kernel if you want it.

The old mali proprietary driver is borderline unusable on anything remotely modern, only really working on Linux 5.10 and special X11 builds with legacy features re-enabled.

It's crazy that the rk3588 has been on the market for many years at this point and is just now starting to be usable on Linux, but it's exciting that things are taking shape.

[1] https://www.collabora.com/news-and-blog/news-and-events/rele...

rcarmo · on March 14, 2024

I’ve been tracking that and NPU support. Neither are… OK at this point.

lenerdenator · on March 13, 2024

> They still lack compared to Pi's support though.

This should be the central lesson learned from the Raspberry Pi by open-source projects.

There will be faster, there will be smaller, there will be cheaper. But if the user can go on the web and find the _exact_ thing they're looking to do spelled out, they'll buy that product, every time.

wmf · on March 13, 2024

Now imagine if RPi applied their magic to slightly newer hardware so there was no need to mess around with poorly-supported Allwinner/Rockchip/Mediatek boards.

sitzkrieg · on March 13, 2024

or if they actually open sourced anything... or if you could buy the broadcom chips directly, or or or

rcarmo · on March 14, 2024

I have been testing quite a few: https://taoofmac.com/space/blog/2024/02/10/2000

TehCorwiz · on March 13, 2024

My Pi 5 is still back ordered. :(

geerlingguy · on March 13, 2024

They've been on the shelf (well, behind the counter) at Micro Center for a month or so now, never seen them out of stock. It seems like all the models are available at one or two retailers at minimum now, via rpilocator.com.

Are you in the US, Canada, or EU? Outside of those places, there may still be some delays in getting stock to meet demand.

duffyjp · on March 13, 2024

Oof, $80 is encroaching on Aliexpress N100 Mini PCs and used "Tiny/Mini/Micro" territory.

I have the original and updated Pi Zero W-- unbelievable bargains at ~$15 but if I needed any horsepower I think I'd rather have an x86_64 so I can run whatever.

bobim · on March 13, 2024

Idle consumption made me lean toward a rk3588 vs a n100. Half the single thread performance though. Supported by dietpi, so no issue with the os.

teamonkey · on March 13, 2024

https://rpilocator.com/?cat=PI5

natebc · on March 13, 2024

I ordered one from pishop.us just last Friday that arrived via USPS on Monday morning. Might look around some and see if you can just order from a different vendor.

TehCorwiz · on March 13, 2024

Yeah, I ordered from Sparkfun. I've had good experiences with them in the past, but it's a little frustrating seeing it available elsewhere. I ended up reaching out about it.

techwiz137 · on March 13, 2024

For the time it took them to develop it, the performance is really lackluster. I would've definitely wanted an OrangePi 5 equivalent performance, but with the mature software of the RPI.

cute_boi · on March 14, 2024

One thing I don't understand why would people buy raspberry pi instead of mini pc. Mini PC seems to be small, better and more versatile than pi.

squarefoot · on March 14, 2024

Mini PCs don't come with GPIOs, which are one of the main reasons that make small SBCs a better choice for some uses. Not a big problem though as there are cheap add on boards bringing GPIO ports to every device. Example: https://www.hardkernel.com/shop/usb-io-board/

The downside is that code becomes slightly more complicated, the upside is that it makes easier to replace the Mini PC with a totally different one, or even emulate it in a VM then connecting the GPIO board using USB pass through.

Rethinking GPIO can also open more possibilities such as putting them beyond a network (say Arduino+ENC28j60 and similar ones for Ethernet, ESP32 for wireless, etc); of course having them outside the main CPU will imply some speed and latency issues, although I'm sure they would remain unnoticed for many non critical use cases.

My question would rather be why the RPi when there are better SBCs out there, but that is my personal taste and the Armbian and DietPi communities being more than enough for me.

cricalix · on March 14, 2024

You know, he has a video on that too. If you want GPIO, a tiny PC is not the answer (for example).

AnthonBerg · on March 14, 2024

I need to try one of these USB GPIO thingies. Add GPIO to, well, any computer?

Adafruit FT232H Breakout - General Purpose USB to GPIO, SPI, I2C - USB C & Stemma QT: https://www.adafruit.com/product/2264

Adafruit MCP2221A Breakout - General Purpose USB to GPIO ADC I2C - Stemma QT / Qwiic: https://www.adafruit.com/product/4471

moffkalast · on March 13, 2024

Speaking of Geerling testing Pi 5 things, is it just me or is it super weird that the uPCity PCIe breakout board hasn't been put on sale yet? I think it's almost half a year since he got it working in a pre-release state.

geerlingguy · on March 13, 2024

I keep pushing Pineberry Pi to release that—apparently they're still working on it—and 52Pi pre-announced something too...

But so far there's no straight PCIe expansion board available to purchase yet. It is a slimmer market than 'NVMe on top' or other more standard use cases, but it's one I think could expand as people do weird things with the Pi 5.

moffkalast · on March 14, 2024

Oh cool I haven't heard of that one, I see you have it listed: https://pipci.jeffgeerling.com/hats/52pi-pcie-x1-slot-hat.ht...

Looks way more compact and well thought out in terms of mounting so definitely seems like the better option once it's released.

tuetuopay · on March 13, 2024

I recently went on the lookout for this board and it is still unavailable. All I could find were M.2 adapters. I guess the market for full-size PCIe is tiny compared to the M.2 one.

Anyways, I ended up making my own and it works great. No fuss, no wait, no complications.

moffkalast · on March 13, 2024

Ah you mean going the Pi Port -> M.2 hat -> M.2 to PCIe riser route? I suppose that should work. Any bandwidth issues?

tuetuopay · on March 13, 2024

No, I literally made my own hand-soldered Pi Port -> PCIe adapter. That's to plug a 2x25Gbps programmable NIC, so no bandwidth needed because the NIC does all the work. All I needed was something to power up the nic :D

As for bandwidth, well, it's one lane of PCIe gen 2. This won't win any races but can be useful to access exotic hardware not available in usb or if you don't care about bandwidth. (e.g. HBA with many drives for mass storage without speed requirement).

stu2010 · on March 13, 2024

How much power is the Pi port capable of delivering, or are you sending additional power to the PCIe adapter from somewhere else?

What SmartNIC are you using? Most SmartNICs that I'm aware of suck a decent amount of power, many more require significant external airflow. Are you using the Mikrotik active cooled one? https://mikrotik.com/product/ccr2004_1g_2xs_pcie

tuetuopay · on March 13, 2024

The Pi port can deliver 5 or 10W max at 5V IIRC. So I'm not using it :D

The 12V comes from an external power supply through a barrel jack, from which I also derive the 3.3V rail. The Pi provides no power whatsoever. I should publish the design files somewhere.

As for the NIC, it's a Netronome Agilio-CX which is fully programmable using eBPF and such.