Nah. Least squares is optimal under a normal distribution, but that does not mean it stops working when the distribution is not normal. It might even be optimal for some other distributions; I just cannot remember, and I have to watch the frying pan.
Least squares is disastrously bad for anything with a fat tail (e.g., power law decay). The reason is that in these cases one or two outliers will dominate the sum of squared residuals, dragging the fit toward them.
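A toy illustration of that point (the data values here are made up): with one fat-tail outlier, the least-squares location estimate (the mean) gets dragged far off, while the L1 estimate (the median) barely moves.

```python
import numpy as np

# Five well-behaved observations plus one fat-tail outlier.
x = np.array([1.0, 1.1, 0.9, 1.2, 0.8, 100.0])

# The mean minimizes sum((x - x0)^2); the single outlier dominates it.
print(x.mean())      # → 17.5

# The median minimizes sum(|x - x0|); it is barely affected.
print(np.median(x))  # → 1.05
```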
It's not optimal for any other distribution. For a general error density g, maximum likelihood says to maximize sum(log(g(x - x0))) (or equivalently prod(g(x - x0))) - and this reduces to least squares only if log(g(u)) = C - D*u^2 for some D > 0, i.e., only when g is Gaussian.
I don't know of any references; most of this is just stuff that falls out pretty easily once you try to do the math, and that's probably faster than reading a book.
I.e., set up the maximum likelihood problem, take logs, and you immediately get least squares for Gaussian g(x). If g(x) = exp(-|x|), you get L1 minimization. Other distributions give you other objectives.
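A quick numerical check of that correspondence (a sketch, with synthetic Gaussian data and a simple grid search standing in for a proper optimizer): minimizing the Gaussian negative log-likelihood over a location parameter lands on the sample mean (the least-squares answer), while minimizing the Laplace exp(-|x|) negative log-likelihood lands on the median (the L1 answer).

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(loc=3.0, scale=1.0, size=1000)

# Negative log-likelihoods for a location parameter x0, dropping constants.
def nll_gauss(x0):    # g Gaussian => quadratic in the residuals
    return np.sum((x - x0) ** 2)

def nll_laplace(x0):  # g(u) = exp(-|u|)/2 => L1 in the residuals
    return np.sum(np.abs(x - x0))

# Crude grid search over plausible locations.
grid = np.linspace(x.min(), x.max(), 10001)
x0_gauss = grid[np.argmin([nll_gauss(t) for t in grid])]
x0_laplace = grid[np.argmin([nll_laplace(t) for t in grid])]

# Gaussian MLE ~ mean (least squares); Laplace MLE ~ median (L1).
print(abs(x0_gauss - x.mean()) < 1e-2)      # True
print(abs(x0_laplace - np.median(x)) < 1e-2)  # True
```

The grid search is only for transparency; in practice you would hand the same objectives to an optimizer.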