Right, but if the universe of possible characters grows much larger than the window size (as it does in many applications), we want a method that only pays in terms of the window size, not the universe size.
So you use a sparse bitmap, i.e. a hash set. The size of the hash set is bounded by the size of the window, so this should be pretty efficient.
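For concreteness, a rough sketch of that approach (the byte-string input and the function name are illustrative assumptions, not something from the thread):

```rust
use std::collections::HashSet;

/// Returns the index just past the first window of `len` bytes that are all
/// distinct, or None if no such window exists. The set is rebuilt per window
/// but never holds more than `len` entries, so memory scales with the window,
/// not with the alphabet.
fn first_unique_window(data: &[u8], len: usize) -> Option<usize> {
    data.windows(len)
        .position(|w| {
            let mut seen = HashSet::with_capacity(len);
            // `insert` returns false on a repeat, so `all` bails out early.
            w.iter().all(|&b| seen.insert(b))
        })
        .map(|start| start + len)
}
```

For example, `first_unique_window(b"abcabcdefg", 4)` returns `Some(7)`, since `"abcd"` is the first run of four distinct bytes.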
If you have a huge string, a huge alphabet, and also a huge window, then probabilistic techniques start to make sense. But let's not run while we can successfully walk!
Sure. We can always use a hash set, but then we need chaining or some other collision-resolution policy. The whole point of this exercise is to avoid that and get something really fast and truly constant time :-)
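For contrast, a minimal sketch of the fixed-bitmap idea being aimed at here, under the assumption that the alphabet fits in a machine word (lowercase ASCII and the names below are illustrative, not from the thread):

```rust
/// Sliding-window uniqueness over a small alphabet (bytes 'a'..='z' assumed),
/// using one u32 as a parity bitmap: a byte seen an even number of times in
/// the window leaves its bit clear, an odd number of times sets it, so the
/// popcount equals the window length exactly when every byte is distinct.
fn first_unique_window_bitmap(data: &[u8], len: usize) -> Option<usize> {
    if len == 0 || data.len() < len {
        return None;
    }
    let bit = |b: u8| 1u32 << (b - b'a');
    // Seed the mask with the first window, then slide one byte at a time.
    let mut mask = data[..len].iter().fold(0u32, |m, &b| m ^ bit(b));
    for end in len..=data.len() {
        if mask.count_ones() as usize == len {
            return Some(end);
        }
        if end == data.len() {
            break;
        }
        mask ^= bit(data[end - len]); // byte leaving the window
        mask ^= bit(data[end]);       // byte entering the window
    }
    None
}
```

Each slide costs two XORs and a popcount, with no hashing and no collision handling, which is the "truly constant time" per step being described.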
In that case, you might still want to use a similar trick as a first pass (like a Bloom filter), and then re-check any window that is potentially unique with the HashSet. This should still be faster than using the HashSet for every window.
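One way to read that, sketched under illustrative assumptions (byte input, a 64-bit filter, a trivial multiplicative hash): run the cheap bitmap trick on a hash of each character as a pre-check, and only fall back to the exact HashSet when the filter cannot certify the window.

```rust
use std::collections::HashSet;

/// Cheap pre-filter: map each byte into a 64-bit bitmap with a trivial hash.
/// If no bit is ever hit twice, the window is certainly all-distinct, because
/// a true duplicate always collides with itself. If a bit is hit twice, that
/// may be a real duplicate or just two distinct bytes sharing a bucket, so we
/// fall back to an exact HashSet check for that window only.
fn window_all_distinct(w: &[u8]) -> bool {
    // Illustrative hash: top 6 bits of a multiplicative mix, giving a bucket in 0..64.
    let hash = |b: u8| (b as u64).wrapping_mul(0x9E37_79B9_7F4A_7C15) >> 58;
    let mut filter = 0u64;
    for &b in w {
        let bit = 1u64 << hash(b);
        if filter & bit != 0 {
            // Ambiguous: verify exactly.
            let mut seen = HashSet::with_capacity(w.len());
            return w.iter().all(|&x| seen.insert(x));
        }
        filter |= bit;
    }
    true
}

fn first_unique_window_filtered(data: &[u8], len: usize) -> Option<usize> {
    data.windows(len)
        .position(|w| window_all_distinct(w))
        .map(|start| start + len)
}
```

A 64-bit filter only pays off for windows well under 64 characters; a real version would size the filter relative to the window. Whether this beats the plain HashSet loop depends on how often the fallback fires on the actual data.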