Ah, adventures in 'not quite as random as you thought' land.
1) I remember building a game in the '90s where the locations of resources on a 2D game map were generated by successive calls to random() (and then were probably mod 1024'd or something). But when we looked at the locations, they were clearly in a line pattern. Huh? After a while of confused debugging, we eventually discovered that the RNG was a linear congruential generator, which, when used the way we did to generate N-tuples, produces points that tend to lie on a small number of (N-1)-dimensional hyperplanes! Oops.
2) I remember a big debate with a statistician at a company we were selling data analytics software to. We were putting random data elements into random groups to support certain types of statistical analysis. He was adamant that the randomization we were doing would introduce bias into the data and showed us similar tests he had done that "proved it". He told us that his company would be dropping our product because they couldn't trust it. I told him that there was no way bias was introduced and I took a closer look at his tests. It turns out the RNG in whatever tool he was using was just not very good and it had apparently shaken his very understanding of statistics :) Luckily, I told him that we were using MD5 for randomness (once bitten!) and that if he simply redid his experiments with MD5 he would see the bias disappear. To say he was skeptical would be an understatement, but, thankfully for our contract, his tests didn't actually crack MD5, all bias was gone, and we got a very generous email from the skeptic informing us that he would be trusting our software for everything from now on.
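The line pattern in (1) is Marsaglia's classic "random numbers fall mainly in the planes" result. A small sketch of how it shows up, using RANDU (a notoriously bad historical LCG) as a stand-in for whatever generator that game actually used:

```python
# RANDU: the infamous LCG x' = 65539 * x mod 2^31.
# Because 65539 = 2^16 + 3, consecutive outputs satisfy
# z - 6y + 9x = 0 (mod 2^31), so all triples fall on just a
# handful of parallel planes in the unit cube.
def randu(seed, n):
    x, out = seed, []
    for _ in range(n):
        x = (65539 * x) % 2**31
        out.append(x)
    return out

vals = randu(1, 3000)
triples = list(zip(vals[0::3], vals[1::3], vals[2::3]))

# Every consecutive triple satisfies the plane relation exactly:
for x, y, z in triples:
    assert (9 * x - 6 * y + z) % 2**31 == 0
```

The same kind of lattice structure exists (less dramatically) for any LCG, which is exactly why resource locations built from successive calls line up.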
> After a while of confused debugging, we eventually discovered that the RNG was a linear congruential generator, which, when used the way we did to generate N-tuples, tends to produce points that tend to lie on a small number of N-1 dimensional surfaces! Oops.
Kudos to the skeptic at least for being willing to accept, and declare, that he was wrong in the first place.
There are many skeptics who would rather disregard others' assertions than admit that their own testing/experimental/verification methodologies might be flawed.
Indeed. It was many years ago, but I clearly remember he used a lot of volume and language on the initial call. It was followed up a few days later by an equally memorable note that started with 'HERE IS WHERE I EAT CROW' in all caps. The sales guys were sure we had lost the account :)
Thankfully the awful linear congruential generators of the past have mostly been killed off by now. The Mersenne Twister isn't ideal (kind of slow and uses a bunch of memory), but it is pretty solid. The xoshiro family is also pretty widely used now in programming languages, and it's a major improvement over LCGs on all fronts (except speed, where they're roughly tied).
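For reference, the xoshiro generators really are tiny. A sketch of xoshiro256++ ported from Blackman and Vigna's public-domain C reference (the seed values below are arbitrary; real code should seed the state with something like splitmix64):

```python
MASK64 = (1 << 64) - 1

def rotl(x, k):
    # 64-bit left rotation
    return ((x << k) | (x >> (64 - k))) & MASK64

class Xoshiro256pp:
    """xoshiro256++: 256 bits of state, a few shift/rotate/xor ops per output."""
    def __init__(self, s0, s1, s2, s3):
        self.s = [s0 & MASK64, s1 & MASK64, s2 & MASK64, s3 & MASK64]

    def next(self):
        s = self.s
        result = (rotl((s[0] + s[3]) & MASK64, 23) + s[0]) & MASK64
        t = (s[1] << 17) & MASK64
        s[2] ^= s[0]
        s[3] ^= s[1]
        s[1] ^= s[2]
        s[0] ^= s[3]
        s[2] ^= t
        s[3] = rotl(s[3], 45)
        return result

rng = Xoshiro256pp(1, 2, 3, 4)
out = [rng.next() for _ in range(3)]
```

The whole state is four 64-bit words, versus the Mersenne Twister's ~2.5 KB.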
Have they gotten one that implements well on a GPU yet? Last I read, many people doing stochastic filters (resampling, etc.) were generating noise CPU-side and shipping it over to the device. But that has been a while.
> I ended up implementing the Fowler–Noll–Vo 1a hash, which is known for trading some quality for increased performance. It was designed to be fast
This is a surprisingly resistant belief. It may have been true once, but today FNV is not even close to being a fast hash function. Check the SMHasher benchmarks:
FNV1a is almost 2x slower than blake3, a cryptographic hash function! It's more than 10x slower than modern hashers such as Farm, City, Spooky, ..., all of which have far better statistical quality than FNV1.
There is no good reason to use an FNV function nowadays.
xxHash has some great benchmarks for various hash functions. FNV is still competitive for small inputs. Most hash functions are built to have high throughput when hashing hundreds of bytes or more. xxHash in particular has an explicit mode switch from "small data" to "big data" around a couple hundred bytes (it varies by platform and compiler).
It's hard to do both small and big data correctly, and FNV is one of the few that optimizes for small data.
For moderately small data, larger than around 4 bytes, xxh3 beats it handily. But for extremely small sizes, FNV is still the winner.
Honestly people should probably just try to switch to xxh3 if performance is a concern, but FNV is certainly competitive for integer-size keys.
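For concreteness, the entire 64-bit FNV-1a function fits in a few lines, which is a big part of why it keeps getting reached for. A straightforward sketch using the standard published constants:

```python
FNV64_OFFSET = 0xcbf29ce484222325  # FNV-1a 64-bit offset basis
FNV64_PRIME = 0x100000001b3        # FNV-1a 64-bit prime
MASK64 = (1 << 64) - 1

def fnv1a_64(data: bytes) -> int:
    # One xor + one multiply per input byte: trivially small code,
    # but also inherently serial, which is why it loses badly on
    # longer inputs to hashes that consume 8+ bytes per step.
    h = FNV64_OFFSET
    for b in data:
        h = ((h ^ b) * FNV64_PRIME) & MASK64
    return h
```

By construction, the empty input hashes to the offset basis, and each extra byte costs one multiply, so for keys of a handful of bytes there's almost nothing to beat.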
A function that mixes the bits of an integer. It's a major component in all modern hash functions (they read a few bytes from the input, add them to the state, mix the state, and continue).
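As an illustration, one widely used mixer of this kind is the splitmix64 finalizer (constants from Steele, Lea, and Flood's SplitMix paper); the alternating xor-shift and multiply rounds spread each input bit across the whole output word:

```python
MASK64 = (1 << 64) - 1

def mix64(x: int) -> int:
    """splitmix64-style finalizer: xor-shift / multiply rounds on a 64-bit word."""
    x &= MASK64
    x = ((x ^ (x >> 30)) * 0xBF58476D1CE4E5B9) & MASK64
    x = ((x ^ (x >> 27)) * 0x94D049BB133111EB) & MASK64
    return x ^ (x >> 31)
```

Every step is invertible, so the mixer is a bijection on 64-bit values: nearby inputs map to wildly different outputs without ever colliding.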
The best reason to use FNV is its small code size, which pays off in icache-critical code such as hash tables. That's why I added the hash table benchmarks to SMHasher.
The cycles/map numbers show that FNV1a is very good with hash tables, 4x faster than the blake3 monstrosity.
Thanks for bringing this up! Someone replied mentioning implementation simplicity, and indeed that was an important factor that I forgot to mention in the post.
Although the most important factor may have been that FNV is just easier to find. Looking for a hash algorithm is hard: the names are weird and opaque, and it's hard to find a user-friendly ranking or clear directions on how to pick one.
See also "GPU Random Numbers via the Tiny Encryption Algorithm" by Zafar, Olano, and Curtis, in HPG '10: Proceedings of the Conference on High Performance Graphics.
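The idea in that paper, roughly, is to run a few rounds of TEA over a (thread/pixel index, seed) pair so every GPU thread gets its own random bits with no shared state. A hedged sketch (the key constants and round count below are illustrative choices, not necessarily the paper's exact parameters):

```python
M32 = 0xFFFFFFFF

def tea_hash(v0: int, v1: int, rounds: int = 8) -> tuple:
    """A few Feistel rounds of TEA used as a stateless 2x32-bit hash."""
    # Fixed, arbitrary key: for hashing (not crypto) any constants work.
    k0, k1, k2, k3 = 0xA341316C, 0xC8013EA4, 0xAD90777D, 0x7E95761E
    delta, total = 0x9E3779B9, 0
    for _ in range(rounds):
        total = (total + delta) & M32
        v0 = (v0 + ((((v1 << 4) + k0) ^ (v1 + total) ^ ((v1 >> 5) + k1)))) & M32
        v1 = (v1 + ((((v0 << 4) + k2) ^ (v0 + total) ^ ((v0 >> 5) + k3)))) & M32
    return v0, v1

# Each thread hashes e.g. (pixel_index, frame_number) independently:
r0, r1 = tea_hash(42, 0)
```

Because the rounds form a Feistel permutation, distinct inputs can never collide, and the per-thread cost is just a fixed number of adds/shifts/xors, which maps well to GPU hardware.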
If you'd like to see an implementation of something that uses Perlin noise, check out this generative art project I did: https://dannyking.uk/artwork/colororbs
Perlin noise is how the lines flow together rather than crossing each other during the animations.
I have a technical write up that explains how it works at the bottom.
Maybe I am missing something, but wouldn't a regular random number generator with a defined seed work just as well? If the problem is determinism, then using a deterministic seed should give the same result.
If all you want to do is to generate a texture by looping over all pixels in sequence, then yes, you can use a RNG. But in many use cases you need to evaluate a noise function at an arbitrary (x, y) position (or (x, y, z) or even higher dimension) which means you need to hash the coordinates to get the pseudorandomness you want.
A desirable property here is being able to generate only some part of the terrain, but if you're using a seeded PRNG then you have to generate the same bits in the same order every time. You could solve this with chunking, but it's not trivial.
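A sketch of the coordinate-hashing alternative: pack (seed, x, y) into bytes and hash them, so any position can be queried in any order with no stream state. The `noise_at` helper and the byte layout are made up for illustration, and FNV-1a is used only because it's short; any decent mixer works the same way:

```python
import struct

MASK64 = (1 << 64) - 1

def fnv1a_64(data: bytes) -> int:
    # Standard 64-bit FNV-1a: xor a byte in, multiply by the FNV prime.
    h = 0xcbf29ce484222325
    for b in data:
        h = ((h ^ b) * 0x100000001b3) & MASK64
    return h

def noise_at(seed: int, x: int, y: int) -> float:
    """Deterministic pseudorandom value in [0, 1) for any coordinate."""
    key = struct.pack("<qqq", seed, x, y)
    # Top 53 bits -> float in [0, 1), exactly representable.
    return (fnv1a_64(key) >> 11) / float(1 << 53)
```

Unlike a seeded stream, `noise_at(seed, -5000, 99999)` costs the same whether or not you ever evaluated any other coordinate, which is exactly what chunked terrain generation needs.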
This same hash is often used in Windows malware to avoid embedding the plaintext names of functions that are commonly used for malicious purposes, like `WriteProcessMemory`.
"I couldn’t rely on simple randomness since terrain generation has to be deterministic, especially for multiplayer games."
You want the results to be repeatable. Now, you could seed with something, but this has tradeoffs (e.g., you have to draw the same sequence of values in the same order to get the same results out).