Nah. Least squares is optimal under a normal distribution, but that does not mean it stops working when the distribution is not normal. It might even be optimal for some other distributions; I just cannot remember, and I have to watch the frying pan.
Least squares is disastrously bad for anything with a fat tail (e.g., power law decay). The reason is that in these cases one or two outliers will dominate the sum of squared residuals, dragging the fit toward them.
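A toy illustration of that point (the data values here are made up): with one fat-tail outlier, the least-squares location estimate (the mean) gets dragged far off, while the L1 estimate (the median) barely moves.

```python
import numpy as np

# Five well-behaved observations plus one fat-tail outlier.
x = np.array([1.0, 1.1, 0.9, 1.2, 0.8, 100.0])

# The mean minimizes sum((x - x0)^2); the single outlier dominates it.
print(x.mean())      # → 17.5

# The median minimizes sum(|x - x0|); it is barely affected.
print(np.median(x))  # → 1.05
```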
It's not optimal for any other distribution. For a general error density g, maximum likelihood says to maximize sum(log(g(x - x0))) (or equivalently prod(g(x - x0))) - and this reduces to least squares only if log(g(u)) = C - D*u^2 for some D > 0, i.e., only when g is Gaussian.
I don't know of any references; most of this is just stuff that falls out pretty easily once you try to do the math, and that's probably faster than reading a book.
I.e., set up the maximum likelihood problem, take logs, and you immediately get least squares for Gaussian g(x). If g(x) = exp(-|x|), you get L1 minimization. Other distributions give you other objectives.
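A quick numerical check of that correspondence (a sketch, with synthetic Gaussian data and a simple grid search standing in for a proper optimizer): minimizing the Gaussian negative log-likelihood over a location parameter lands on the sample mean (the least-squares answer), while minimizing the Laplace exp(-|x|) negative log-likelihood lands on the median (the L1 answer).

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(loc=3.0, scale=1.0, size=1000)

# Negative log-likelihoods for a location parameter x0, dropping constants.
def nll_gauss(x0):    # g Gaussian => quadratic in the residuals
    return np.sum((x - x0) ** 2)

def nll_laplace(x0):  # g(u) = exp(-|u|)/2 => L1 in the residuals
    return np.sum(np.abs(x - x0))

# Crude grid search over plausible locations.
grid = np.linspace(x.min(), x.max(), 10001)
x0_gauss = grid[np.argmin([nll_gauss(t) for t in grid])]
x0_laplace = grid[np.argmin([nll_laplace(t) for t in grid])]

# Gaussian MLE ~ mean (least squares); Laplace MLE ~ median (L1).
print(abs(x0_gauss - x.mean()) < 1e-2)      # True
print(abs(x0_laplace - np.median(x)) < 1e-2)  # True
```

The grid search is only for transparency; in practice you would hand the same objectives to an optimizer.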