> The mere fact of that seems to contradict our current understanding of entropy
What part of our understanding does it contradict? The second law of thermodynamics says that entropy increases with time; this seems entirely consistent with a low-entropy past.
One explanation for why the universe began in a low-entropy state is that such a state has a very short description length. (This is a bit of a truism, since description length is a measure of entropy.) Basically, imagine that the universe is a simulation, with an initial state described by an initialization routine that sets up the simulation to run. If the initial state has high entropy, that initialization routine would need to be very long and detailed to describe exactly the location of every electron, neutrino, etc. If the initial state has very low entropy, that initialization routine is very short. If there's a reason to think that a short program is "more probable" than any particular very long program, then that would explain a low-entropy initial condition.
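As a rough toy version of that argument (a sketch, not a claim about real cosmology): the snippet below compares two hypothetical "initialization routines" for a 1000-cell universe. The names, the cell count, and the prior that weights a program by 2 to the minus its length in bits are all assumptions made up for illustration.

```python
import random

N = 1000  # number of cells in the toy "universe" (hypothetical size)

# Low-entropy initial state: every cell is zero.
ordered_state = bytes(N)

# High-entropy initial state: every cell set independently.
# A seeded PRNG stands in for true randomness here; strictly speaking the
# seed plus generator is itself a short program, so this is only a stand-in.
rng = random.Random(0)
random_state = bytes(rng.randrange(256) for _ in range(N))

# Each "initialization routine" is Python source that rebuilds its state.
ordered_init = "bytes(1000)"
random_init = repr(random_state)  # has to spell out every cell explicitly

# Check that both routines really reproduce their states.
assert eval(ordered_init) == ordered_state
assert eval(random_init) == random_state

# Assumed prior: weight(program) = 2 ** -(length in bits), 8 bits per char.
# log2 of weight(ordered_init) / weight(random_init):
log2_weight_ratio = 8 * (len(random_init) - len(ordered_init))

print(len(ordered_init), len(random_init), log2_weight_ratio)
```

Under that assumed prior, the short routine is favored over this one particular long routine by thousands of bits, which is the sense of "more probable" used above.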
Another explanation is that, if the universe random-walks through all possible configurations, the "past" will still always look lower-entropy than the "future", for any little life-form that occupies that universe, because that life-form's memories will be much more likely to be correlated with the lower-entropy state. (It would have been nice for the article to go into this detail, but it's rarely discussed).
Still another explanation is provided by the Many-Worlds interpretation of QM. Again the "big bang" is akin to initializing the wavefunction of the universe to something very simple and compact like a constant function, which as a whole evolves unitarily; the complexity and increasing entropy arise within particular branches of that wavefunction, where an observer requires an ever-longer description length to identify their particular branch.
A higher entropy state has a longer description length.
For example, let's say I have a magic electron microscope that can scan and record the exact position and velocity of each particle in some 1-cubic-micron volume, to within Heisenberg uncertainty limits and some finite digitization precision.
If my sample is a 1-cubic-micron volume of flawless monocrystalline silicon at 0 Kelvin, I can 'zip' my recording and transmit that description in a much shorter sentence (in fact, I just sent it to you!) than if my sample is a cubic micron of room-temperature saltwater (whose macrostate I just described, but whose microstate I did not).
If you care about describing the details, you can compress your description better if it's a low-entropy state.
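You can try a crude version of the 'zip' experiment yourself. This is only a sketch: the byte strings below are made-up stand-ins for the two recordings (a periodic pattern for the crystal, pseudo-random bytes for the saltwater), and zlib's compressed size is just a rough upper bound on description length, not the entropy itself.

```python
import random
import zlib

N = 100_000  # bytes in each toy "recording" (hypothetical size)

# Low-entropy stand-in: a perfectly periodic "crystal" pattern.
crystal = bytes([0, 1, 2, 3] * (N // 4))

# High-entropy stand-in: independent pseudo-random bytes for the "liquid".
rng = random.Random(0)
liquid = bytes(rng.randrange(256) for _ in range(N))

# Compressed size in bytes at the highest zlib compression level.
crystal_zipped = len(zlib.compress(crystal, 9))
liquid_zipped = len(zlib.compress(liquid, 9))

print("crystal:", crystal_zipped, "bytes")
print("liquid: ", liquid_zipped, "bytes")
```

The periodic recording should shrink to a tiny fraction of its original size, while the random one barely shrinks at all.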
But of course, cosmology is full of more mundane explanations for how the maximum possible entropy of the universe can grow with time, so that even a state that starts out near that maximum suddenly has a lot of room to increase further.