For what it's worth, I wonder the same thing, and I think it's not as obvious as others suggest. E.g., if you have an autoencoder for a one-hot encoding, you're essentially learning a pair of nonlinear maps that approximately invert each other and that map a high-dimensional space to a low-dimensional one. You could imagine that it could instead learn something like a dense bit packing with a QAM Gray code[0]. In a one-hot encoding the dot product between any two distinct tokens is zero, so your transformations can't be learning to preserve it.
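To make that concrete, here's a minimal sketch (with a made-up toy vocabulary) showing that one-hot dot products carry no similarity structure at all: every pair of distinct tokens is orthogonal, no matter how related the words are.

```python
# Hypothetical toy vocabulary; with one-hot encodings, "cat" is exactly as
# far from "kitten" as it is from "carburetor".
vocab = ["cat", "kitten", "carburetor"]

def one_hot(token):
    # Standard one-hot: a vector with a single 1 at the token's index.
    return [1.0 if t == token else 0.0 for t in vocab]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

print(dot(one_hot("cat"), one_hot("kitten")))      # 0.0
print(dot(one_hot("cat"), one_hot("carburetor")))  # 0.0
print(dot(one_hot("cat"), one_hot("cat")))         # 1.0
```

So there is literally nothing for the encoder/decoder pair to "preserve" in the input geometry; any similarity structure has to come from somewhere else.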
Somewhat naively, I might speculate that for, e.g., sequence prediction, even if you had some efficient packing of the space like that to maximally separate individual tokens, it's still advantageous to learn an encoding in which synonyms are clustered, so that a single error doesn't cause mispredictions for the rest of the sequence.
I suppose the point, then, is that the structure exists in the latent space of language itself, and your coordinate maps pull it back to your naive encoding rather than preserving a structure that exists a priori on the naive encoding. I.e., you can't take dot products in the two spaces and expect them to be related; you need to map forward into latent space and take the dot product there, and that defines a (nonlinear) measure of similarity on your original space. Then the question is why the latent space has geometry, and there I guess the point is that it's not maximally information dense, so the geometry exists in the redundancy. So perhaps it is obvious after all!
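The "pull back through the encoder" idea can be sketched in a few lines. This is purely illustrative: the encoder `f` here is just a fixed random nonlinear map standing in for a trained network, not anything learned.

```python
import math
import random

# Stand-in "encoder": a random linear map followed by tanh. A real encoder
# would be trained; this only illustrates the shape of the construction.
random.seed(0)
DIM_IN, DIM_LATENT = 8, 3
W = [[random.gauss(0, 1) for _ in range(DIM_IN)] for _ in range(DIM_LATENT)]

def f(x):
    return [math.tanh(sum(w * xi for w, xi in zip(row, x))) for row in W]

def latent_similarity(x, y):
    # Pulled-back similarity: map forward into latent space, then take the
    # dot product there. This is a nonlinear similarity on the input space.
    fx, fy = f(x), f(y)
    return sum(a * b for a, b in zip(fx, fy))

e0 = [1.0 if i == 0 else 0.0 for i in range(DIM_IN)]
e1 = [1.0 if i == 1 else 0.0 for i in range(DIM_IN)]
print(sum(a * b for a, b in zip(e0, e1)))  # 0.0 in the naive space
print(latent_similarity(e0, e1))           # generally nonzero in latent space
```

The point is just that the two similarity measures are different objects: the naive dot product is identically zero across distinct one-hot vectors, while the pulled-back one can be anything the map makes it.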
I think my comment was not worded properly. I was thinking "geometric properties = linear properties"; what I really should have asked is:
Why does the latent space have geometric properties such that we can use functions like cosine similarity to compare points?
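For concreteness, this is the cosine similarity in question: it compares two latent vectors by the angle between them, ignoring magnitude.

```python
import math

def cosine_similarity(u, v):
    # cos(theta) = <u, v> / (|u| * |v|)
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

print(cosine_similarity([1.0, 2.0], [2.0, 4.0]))  # 1.0 (same direction)
print(cosine_similarity([1.0, 0.0], [0.0, 1.0]))  # 0.0 (orthogonal)
```

The question, then, is why a trained network's latent vectors end up arranged so that this angle is semantically meaningful at all.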
So during training, the signal is mapped to a latent space that minimizes the error of the objective function as much as possible.
Many applications already use a cosine similarity function at the end of the network, so it's obvious why those work. I reviewed other cost functions such as triplet loss; they use Euclidean distances, so I guess it makes sense why the geometric properties exist there too.
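A quick sketch of the standard triplet loss makes this point visible: the loss is defined directly in terms of Euclidean distances in the embedding space, so minimizing it forces the geometry to be meaningful (anchors end up near positives and far from negatives). The embeddings below are toy values, not outputs of any real model.

```python
import math

def euclidean(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def triplet_loss(anchor, positive, negative, margin=1.0):
    # Standard hinge form: push the anchor at least `margin` closer to the
    # positive than to the negative; zero loss once that constraint holds.
    return max(0.0, euclidean(anchor, positive) - euclidean(anchor, negative) + margin)

# Toy embeddings: positive near the anchor, negative far away.
print(triplet_loss([0.0, 0.0], [0.1, 0.0], [3.0, 0.0]))  # 0.0 (constraint satisfied)
print(triplet_loss([0.0, 0.0], [2.0, 0.0], [2.1, 0.0]))  # positive loss
```

Because the objective only ever "sees" the embeddings through distances, distance is exactly the thing training shapes.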
Regarding "and there I guess the point is it's not maximally information dense, so the geometry exists in the redundancy": what does "maximally information dense" mean? I still don't quite get it.
[0] https://en.wikipedia.org/wiki/File:16QAM_Gray_Coded.svg