e^loss. It's a bad name for a confusing concept: Loss. (e^loss is just another way of plotting loss, after all.)
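To make the "just another way of plotting loss" point concrete, here's a rough sketch. It assumes the loss is a mean per-token cross-entropy measured in nats (if yours is in bits, you'd use 2^loss instead). The `exp_loss` helper and the checkpoint numbers are made up for illustration, not taken from any real run.

```python
import math

def exp_loss(loss):
    """Return e^loss for a loss measured in nats (assumption: natural-log cross-entropy)."""
    return math.exp(loss)

# Hypothetical loss values logged at successive checkpoints (invented for illustration).
losses = [3.2, 2.9, 2.7, 2.7, 2.6]
curve = [exp_loss(x) for x in losses]

# exp is monotonic, so both curves rank the checkpoints identically.
rank_by_loss = sorted(range(len(losses)), key=lambda i: losses[i])
rank_by_exp = sorted(range(len(curve)), key=lambda i: curve[i])
same_ranking = rank_by_loss == rank_by_exp

for step, (l, p) in enumerate(zip(losses, curve)):
    print(f"checkpoint {step}: loss={l:.2f}  e^loss={p:.2f}")
```

Since the transform never reorders checkpoints, the e^loss plot carries no information the loss plot doesn't already have; it only stretches the vertical axis.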
Loss isn't the whole story -- the steepest slope during training often produces the worst-quality language models. You want a nice, gentle downward slope.
SubsimulatorGPT2 (https://reddit.com/r/subsimulatorgpt2) continued to improve in terms of human evaluation even though the loss stayed flat for over a week.