Ah, adventures in 'not quite as random as you thought' land.
1) I remember building a game in the '90s where the locations of resources on a 2D game map were generated by successive calls to random() (and then were probably mod 1024'd or something). But when we looked at the locations, they were clearly in a line pattern. Huh? After a while of confused debugging, we eventually discovered that the RNG was a linear congruential generator, which, when used the way we did to generate N-tuples, produces points that tend to lie on a small number of (N-1)-dimensional hyperplanes! Oops.
2) I remember a big debate with a statistician at a company we were selling data analytics software to. We were putting random data elements into random groups to support certain types of statistical analysis. He was adamant that the randomization we were doing would introduce bias into the data and showed us similar tests he had done that "proved it". He told us that his company would be dropping our product because they couldn't trust it. I told him that there was no way bias was introduced and I took a closer look at his tests. It turns out the RNG in whatever tool he was using was just not very good and it had apparently shaken his very understanding of statistics :) Luckily, I told him that we were using MD5 for randomness (once bitten!) and that if he simply redid his experiments with MD5 he would see the bias disappear. To say he was skeptical would be an understatement, but, thankfully for our contract, his tests didn't actually crack MD5, all bias was gone, and we got a very generous email from the skeptic informing us that he would be trusting our software for everything from now on.
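The line pattern in (1) is Marsaglia's classic "random numbers fall mainly in the planes" result. A small sketch of how it shows up, using RANDU (a notoriously bad historical LCG) as a stand-in for whatever generator that game actually used:

```python
# RANDU: the infamous LCG x' = 65539 * x mod 2^31.
# Because 65539 = 2^16 + 3, consecutive outputs satisfy
# z - 6y + 9x = 0 (mod 2^31), so all triples fall on just a
# handful of parallel planes in the unit cube.
def randu(seed, n):
    x, out = seed, []
    for _ in range(n):
        x = (65539 * x) % 2**31
        out.append(x)
    return out

vals = randu(1, 3000)
triples = list(zip(vals[0::3], vals[1::3], vals[2::3]))

# Every consecutive triple satisfies the plane relation exactly:
for x, y, z in triples:
    assert (9 * x - 6 * y + z) % 2**31 == 0
```

The same kind of lattice structure exists (less dramatically) for any LCG, which is exactly why resource locations built from successive calls line up.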
> After a while of confused debugging, we eventually discovered that the RNG was a linear congruential generator, which, when used the way we did to generate N-tuples, tends to produce points that tend to lie on a small number of N-1 dimensional surfaces! Oops.
Kudos to the skeptic at least for being willing to accept, and declare, that he was wrong in the first place.
There are many skeptics who would rather disregard others' assertions than admit that their own testing/experimental/verification methodologies might be flawed.
Indeed. It was many years ago, but I clearly remember he used a lot of volume and language on the initial call. It was followed up a few days later by an equally memorable note that started with 'HERE IS WHERE I EAT CROW' in all caps. The sales guys were sure we had lost the account :)
Thankfully the awful linear congruential generators of the past have mostly been killed off by now. The Mersenne Twister isn't ideal (kind of slow and uses a bunch of memory), but it is pretty solid. The xoshiro family is also pretty widely used now in programming languages, and it's a major improvement over LCGs on all fronts (except speed, where they're roughly tied).
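For reference, the xoshiro generators really are tiny. A sketch of xoshiro256++ ported from Blackman and Vigna's public-domain C reference (the seed values below are arbitrary; real code should seed the state with something like splitmix64):

```python
MASK64 = (1 << 64) - 1

def rotl(x, k):
    # 64-bit left rotation
    return ((x << k) | (x >> (64 - k))) & MASK64

class Xoshiro256pp:
    """xoshiro256++: 256 bits of state, a few shift/rotate/xor ops per output."""
    def __init__(self, s0, s1, s2, s3):
        self.s = [s0 & MASK64, s1 & MASK64, s2 & MASK64, s3 & MASK64]

    def next(self):
        s = self.s
        result = (rotl((s[0] + s[3]) & MASK64, 23) + s[0]) & MASK64
        t = (s[1] << 17) & MASK64
        s[2] ^= s[0]
        s[3] ^= s[1]
        s[1] ^= s[2]
        s[0] ^= s[3]
        s[2] ^= t
        s[3] = rotl(s[3], 45)
        return result

rng = Xoshiro256pp(1, 2, 3, 4)
out = [rng.next() for _ in range(3)]
```

The whole state is four 64-bit words, versus the Mersenne Twister's ~2.5 KB.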
Have they gotten one that implements well on a GPU yet? Last I read, many people doing stochastic filters (resampling, etc.) were generating noise CPU-side and shipping it over to the device. But that has been a while.
> I ended up implementing the Fowler–Noll–Vo 1a hash, which is known for trading some quality for increased performance. It was designed to be fast
This is a surprisingly resistant belief. It may have been true once, but today FNV is not even close to being a fast hash function. Check the SMHasher benchmarks:
FNV1a is almost 2x slower than blake3, a cryptographic hash function! It's more than 10x slower than modern hashers such as Farm, City, Spooky, ..., all of which have far better statistical quality than FNV1.
There is no good reason to use an FNV function nowadays.
xxHash has some great benchmarks for various hash functions. FNV is still competitive for small inputs. Most hash functions are built to have high throughput when hashing hundreds of bytes or more. xxHash in particular has an explicit mode switch from "small data" to "big data" around a couple hundred bytes (it varies by platform and compiler).
It's hard to do both small and big data correctly, and FNV is one of the few that optimizes for small data.
For moderately small data, larger than around 4 bytes, xxh3 beats it handily. But for extremely small sizes, FNV is still the winner.
Honestly people should probably just try to switch to xxh3 if performance is a concern, but FNV is certainly competitive for integer-size keys.
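For concreteness, the entire 64-bit FNV-1a function fits in a few lines, which is a big part of why it keeps getting reached for. A straightforward sketch using the standard published constants:

```python
FNV64_OFFSET = 0xcbf29ce484222325  # FNV-1a 64-bit offset basis
FNV64_PRIME = 0x100000001b3        # FNV-1a 64-bit prime
MASK64 = (1 << 64) - 1

def fnv1a_64(data: bytes) -> int:
    # One xor + one multiply per input byte: trivially small code,
    # but also inherently serial, which is why it loses badly on
    # longer inputs to hashes that consume 8+ bytes per step.
    h = FNV64_OFFSET
    for b in data:
        h = ((h ^ b) * FNV64_PRIME) & MASK64
    return h
```

By construction, the empty input hashes to the offset basis, and each extra byte costs one multiply, so for keys of a handful of bytes there's almost nothing to beat.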
A function that mixes the bits of an integer. It's a major component in all modern hash functions (they read a few bytes from the input, add them to the state, mix the state, and continue).
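As an illustration, one widely used mixer of this kind is the splitmix64 finalizer (constants from Steele, Lea, and Flood's SplitMix paper); the alternating xor-shift and multiply rounds spread each input bit across the whole output word:

```python
MASK64 = (1 << 64) - 1

def mix64(x: int) -> int:
    """splitmix64-style finalizer: xor-shift / multiply rounds on a 64-bit word."""
    x &= MASK64
    x = ((x ^ (x >> 30)) * 0xBF58476D1CE4E5B9) & MASK64
    x = ((x ^ (x >> 27)) * 0x94D049BB133111EB) & MASK64
    return x ^ (x >> 31)
```

Every step is invertible, so the mixer is a bijection on 64-bit values: nearby inputs map to wildly different outputs without ever colliding.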
The best reason to use FNV is its small code size, which pays off in icache-critical code such as hash tables. That's why I added the hash table benchmarks to SMHasher.
The cycles/map numbers show that FNV1a is very good with hash tables, 4x faster than the blake3 monstrosity.
Thanks for bringing this up! Someone replied mentioning implementation simplicity, and indeed that was an important factor that I forgot to mention in the post.
Although the most important factor may have been that FNV is just easier to find. Looking for a hash algorithm is hard: the names are weird and opaque, and it's hard to find a user-friendly ranking or clear directions on how to pick one.
See also "GPU Random Numbers via the Tiny Encryption Algorithm" by Zafar, Olano, and Curtis, in HPG '10: Proceedings of the Conference on High Performance Graphics.
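The idea in that paper, roughly, is to run a few rounds of TEA over a (thread/pixel index, seed) pair so every GPU thread gets its own random bits with no shared state. A hedged sketch (the key constants and round count below are illustrative choices, not necessarily the paper's exact parameters):

```python
M32 = 0xFFFFFFFF

def tea_hash(v0: int, v1: int, rounds: int = 8) -> tuple:
    """A few Feistel rounds of TEA used as a stateless 2x32-bit hash."""
    # Fixed, arbitrary key: for hashing (not crypto) any constants work.
    k0, k1, k2, k3 = 0xA341316C, 0xC8013EA4, 0xAD90777D, 0x7E95761E
    delta, total = 0x9E3779B9, 0
    for _ in range(rounds):
        total = (total + delta) & M32
        v0 = (v0 + ((((v1 << 4) + k0) ^ (v1 + total) ^ ((v1 >> 5) + k1)))) & M32
        v1 = (v1 + ((((v0 << 4) + k2) ^ (v0 + total) ^ ((v0 >> 5) + k3)))) & M32
    return v0, v1

# Each thread hashes e.g. (pixel_index, frame_number) independently:
r0, r1 = tea_hash(42, 0)
```

Because the rounds form a Feistel permutation, distinct inputs can never collide, and the per-thread cost is just a fixed number of adds/shifts/xors, which maps well to GPU hardware.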
If you'd like to see an implementation of something that uses Perlin noise, check out this generative art project I did: https://dannyking.uk/artwork/colororbs
Perlin noise is how the lines flow together rather than crossing each other during the animations.
I have a technical write up that explains how it works at the bottom.
Maybe I am missing something, but wouldn't a regular random number generator with a defined seed work just as well? If the problem is determinism, then using a deterministic seed should give the same result.
If all you want to do is to generate a texture by looping over all pixels in sequence, then yes, you can use a RNG. But in many use cases you need to evaluate a noise function at an arbitrary (x, y) position (or (x, y, z) or even higher dimension) which means you need to hash the coordinates to get the pseudorandomness you want.
A desirable property here is being able to generate only some part of the terrain, but if you're using a seeded PRNG then you have to generate the same bits in the same order every time. You could solve this with chunking, but it's not trivial.
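A sketch of the coordinate-hashing alternative: pack (seed, x, y) into bytes and hash them, so any position can be queried in any order with no stream state. The `noise_at` helper and the byte layout are made up for illustration, and FNV-1a is used only because it's short; any decent mixer works the same way:

```python
import struct

MASK64 = (1 << 64) - 1

def fnv1a_64(data: bytes) -> int:
    # Standard 64-bit FNV-1a: xor a byte in, multiply by the FNV prime.
    h = 0xcbf29ce484222325
    for b in data:
        h = ((h ^ b) * 0x100000001b3) & MASK64
    return h

def noise_at(seed: int, x: int, y: int) -> float:
    """Deterministic pseudorandom value in [0, 1) for any coordinate."""
    key = struct.pack("<qqq", seed, x, y)
    # Top 53 bits -> float in [0, 1), exactly representable.
    return (fnv1a_64(key) >> 11) / float(1 << 53)
```

Unlike a seeded stream, `noise_at(seed, -5000, 99999)` costs the same whether or not you ever evaluated any other coordinate, which is exactly what chunked terrain generation needs.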
This same hash is often used in Windows malware to avoid embedding the plaintext names of functions that are commonly used for malicious purposes, like `WriteProcessMemory`.
"I couldn’t rely on simple randomness since terrain generation has to be deterministic, especially for multiplayer games."
You want the results to be repeatable. Now, you could seed with something, but this has tradeoffs (e.g., you have to draw the same sequence of values in the same order to get the same results out).