It's a cool property for sure, but in reality it's pretty difficult to implement...

psychphysic · on June 19, 2023

Can you explain this? How can it be indistinguishable from random noise? What does that even mean? And why is it even important/useful? Surely even if it looks like noise anyone can see that you are communicating with the DHT network?

rainsford · on June 20, 2023

The idea is that from the point of view of an observer able to see every byte of data your protocol puts on the network, there is no way for them to tell whether you are actually speaking the protocol or just exchanging random bytes. Basically this is an extension of the idea that encrypted data should look completely random to someone who doesn't know the encryption key, just applied to an entire network protocol. Achieving this means every single byte your protocol puts on the wire needs to be either encrypted with a key known only to the participants or entirely random. As you can imagine, it's pretty hard to actually do, which is why most protocols don't work this way.
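
To make the "every single byte" point concrete, here is a minimal Python sketch of how little it takes to break the property. The framing (a plaintext 2-byte length followed by ciphertext) and all function names are made up for illustration and are not taken from any real protocol; os.urandom stands in for ciphertext, which should look random to anyone without the key.

```python
import os
import struct

def naive_frame(ciphertext: bytes) -> bytes:
    """Hypothetical framing: plaintext 2-byte big-endian length, then ciphertext."""
    return struct.pack(">H", len(ciphertext)) + ciphertext

def length_prefix_matches(frame: bytes) -> bool:
    """Observer's test: do the first two bytes equal the length of the rest?"""
    if len(frame) < 2:
        return False
    (claimed,) = struct.unpack(">H", frame[:2])
    return claimed == len(frame) - 2

def hit_rate(frames) -> float:
    """Fraction of frames that pass the observer's test."""
    frames = list(frames)
    return sum(length_prefix_matches(f) for f in frames) / len(frames)

# Frames from the hypothetical protocol vs. plain random bytes of the same size.
protocol_frames = [naive_frame(os.urandom(64)) for _ in range(1000)]
random_frames = [os.urandom(66) for _ in range(1000)]

print(hit_rate(protocol_frames))  # 1.0: every frame passes the observer's test
print(hit_rate(random_frames))    # near 0: a match is a 1-in-65536 accident
```

Even though the payload is fully "encrypted" here, the two plaintext length bytes alone let the observer separate the protocol from noise after a handful of frames. That is the kind of leak the property rules out.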

While this is not a universally useful property, it can be valuable in situations where even being able to detect a device is using a particular protocol is a problem. A good example is using an anonymization network like Tor in a repressive country like China. Even if your data is protected by Tor, you probably don't want the authorities to know you're using Tor at all.

As you said, one of the biggest problems with this is that even if the protocol itself is perfect, it's not worth much if the network participants are known and communicating with them is itself evidence you're using the protocol. The solution to that would either be making the participants non-public and hard to discover (so an observer doesn't know you're talking to a network participant) or having participants do a lot of things other than participate in the network you're trying to hide. Tor for example takes the former approach with non-advertised "bridges" that you have to know about via some out of band method (e.g. someone emails one to you).

In practice, this indistinguishability property is becoming less useful even when it works, given the ubiquity of "normal" encrypted protocols like SSL/TLS. Arguably just using TLS is far better than trying to look like random noise even if you're trying to hide, since random noise on a network is much less common than TLS and probably more of a red flag these days.
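
As a rough illustration of why pure noise can stand out, here is a sketch of the kind of heuristic an observer might apply. It is an assumption for the sake of example, not a description of any real censorship system: the function names, the 7.5 bits/byte threshold, and the 1024-byte minimum are all arbitrary choices. The only real-world detail used is the 5-byte TLS record header (content type, two version bytes, two length bytes).

```python
import math
import os
from collections import Counter

# TLS record content types: change_cipher_spec, alert, handshake, application_data
TLS_CONTENT_TYPES = {20, 21, 22, 23}

def looks_like_tls_record(data: bytes) -> bool:
    """Rough check of the 5-byte TLS record header; a heuristic, not a parser."""
    if len(data) < 5:
        return False
    content_type, major, minor = data[0], data[1], data[2]
    length = int.from_bytes(data[3:5], "big")
    # 2**14 + 2048 is the ciphertext record size limit in TLS 1.2.
    return (content_type in TLS_CONTENT_TYPES and major == 3 and minor <= 4
            and length <= 2**14 + 2048)

def byte_entropy(data: bytes) -> float:
    """Empirical Shannon entropy in bits per byte (8.0 is the maximum)."""
    counts = Counter(data)
    total = len(data)
    return -sum(c / total * math.log2(c / total) for c in counts.values())

def classify(data: bytes, threshold: float = 7.5) -> str:
    """Label traffic as TLS, unrecognized high-entropy, or something else."""
    if looks_like_tls_record(data):
        return "tls"
    if len(data) >= 1024 and byte_entropy(data) > threshold:
        return "unknown high-entropy"
    return "other"

print(classify(b"\x17\x03\x03\x00\x40" + os.urandom(64)))  # TLS-shaped record
print(classify(b"\x00" + os.urandom(4095)))                # headerless noise
```

The TLS-shaped sample blends into a very large crowd, while the headerless noise lands in a much smaller "unknown high-entropy" bucket that an observer can single out for closer attention.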

sohle · on June 20, 2023

This is a great explanation, and the point about TLS is well taken as well. If you want to go for that level of misdirection, then depending on your threat model you might consider e.g. using a remote proxy and wrapping your session to it in TLS.

The tricky thing about baking TLS in at the protocol level is that it brings its baggage with it. This is not necessarily a bad thing, but it makes the design more complex to reason about. In particular, it is arguably overkill when you're not planning on using certificates, as is the case here. Just compare the number of steps here: https://tls13.xargs.org/ to any of the patterns here: http://www.noiseprotocol.org/noise.html#interactive-handshak...

Another thing worth mentioning regarding indistinguishability from randomness is the impact of metadata. Even if the bytes you send on the wire look meaningless, there's still the size of the message, the spacing between messages, the time of day, etc. Any of these channels can carry signal, and it is very hard, if not impossible, to get rid of those signals completely.
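
For the message-size channel specifically, one common mitigation is to pad every message up to one of a few fixed sizes before encrypting it. The sketch below is a minimal illustration under assumptions of my own: the bucket sizes, the 2-byte inner length field, and the function names are hypothetical. The padding is meant to happen before encryption, so the inner length field is never visible on the wire.

```python
import os

BUCKETS = (256, 512, 1024, 4096)  # hypothetical on-the-wire sizes

def pad_to_bucket(message: bytes) -> bytes:
    """Return 2-byte length + message + random filler, sized to the smallest bucket that fits."""
    needed = len(message) + 2
    for size in BUCKETS:
        if needed <= size:
            filler = os.urandom(size - needed)
            return len(message).to_bytes(2, "big") + message + filler
    raise ValueError("message too large for the largest bucket")

def unpad(padded: bytes) -> bytes:
    """Recover the original message using the inner length field."""
    n = int.from_bytes(padded[:2], "big")
    return padded[2:2 + n]

short = pad_to_bucket(b"ping")
longer = pad_to_bucket(b"x" * 200)
print(len(short), len(longer))  # 256 256: both fall into the same bucket
print(unpad(short))             # b'ping'
```

This only coarsens the size signal: an observer still learns which bucket a message fell into, and padding does nothing about timing or time of day. It also costs bandwidth, which is part of why these signals are so hard to remove completely.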

That said, I still think the original goal of indistinguishability is worthwhile, because if you can force the passive adversary to move from perfectly accurate methods (e.g. fingerprinting message contents) to imperfect ones (e.g. guessing the protocol from message timing), that seems like a win to me.