I've struggled with the motivation for this too, and over time I've come to the conclusion that there is a much better 1-sentence explanation:
Cost (in the real world) is generally a quadratic quantity.
If this isn't obvious, think of all the formulas you've seen for work (or energy), which is the ultimate notion of "cost" in the real world: W = mv^2/2 (kinetic energy), W = CV^2/2 (energy stored in a capacitor), W = ɛE^2/2 (electric field energy density), etc.
Furthermore, the nice thing about energy (a frequent notion of cost) is that it is independent of the (orthonormal) basis you measure it in. Put another way, since energy is generally the squared length of a vector, its value doesn't change no matter how you rotate or reflect the coordinate system you view the vector in.
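To make that basis-invariance concrete, here is a minimal numerical sketch (my own illustration, not part of the original argument, with arbitrary made-up data): an orthogonal change of basis leaves the squared length of an error vector unchanged, while a non-quadratic penalty such as the sum of absolute values is generally not preserved.

```python
# Minimal sketch: squared length is invariant under an orthogonal change of basis,
# whereas the L1 "cost" (sum of absolute values) is not.
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=4)                       # an arbitrary "error" vector

# Build a random orthogonal matrix Q via QR decomposition (Q @ Q.T ~ identity).
Q, _ = np.linalg.qr(rng.normal(size=(4, 4)))
y = Q @ x                                    # the same vector seen in a rotated basis

print(np.sum(x**2), np.sum(y**2))            # squared length: identical
print(np.sum(np.abs(x)), np.sum(np.abs(y)))  # L1 cost: generally differs
```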
Obviously, we're not always trying to minimize work or energy, and our variables of interest aren't always the linear quantities whose squares give an energy, but both are true surprisingly often, and so this is a nice motivation for defining cost to be quadratic.
Squaring the error makes sense to me: with a squared penalty, an approximation in which a sample is 2 units off is 4 times worse than one in which a sample is 1 unit off. Building your approximation to minimize the square of the error is therefore the more intuitive choice to me.
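For a concrete picture of that intuition, here is a minimal sketch (my own, with made-up data and a hypothetical linear model) of fitting a line by minimizing the sum of squared errors; the squared penalty is exactly what makes a 2-unit residual count 4 times as much as a 1-unit one.

```python
# Minimal least-squares sketch: choose slope a and intercept b to minimize
# sum((y - (a*x + b))**2), the sum of squared residuals.
import numpy as np

x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([0.1, 1.9, 4.2, 5.8, 8.1])      # roughly y = 2x, with noise

A = np.column_stack([x, np.ones_like(x)])    # design matrix for a*x + b
(a, b), *_ = np.linalg.lstsq(A, y, rcond=None)

residuals = y - (a * x + b)
print(a, b, np.sum(residuals**2))            # fitted slope/intercept and the quadratic cost
```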