> I think Stable Diffusion was first to release a SOTA model (August 2022) that worked locally, not in language but image generation, but it set the tone for Meta. LLaMA only came in February 2023.
Sure, I was only covering LLMs though. If I wanted to cover image generation models and tools as well, the comment would be double its size.
> Mistral is not a finetune of LLaMA, it is a model trained from scratch. Also, Mistral was most of the time better than LLaMA during this period.
Oh, that's right. IIRC it just used the Llama 2 architecture with sliding window attention added.
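
For anyone reading along, here's a minimal numpy sketch of the mask sliding window attention applies (sizes made up, nothing Mistral-specific): each query attends only to itself and the previous `window - 1` tokens instead of the full causal prefix.

```python
import numpy as np

def sliding_window_mask(seq_len, window):
    """Boolean mask: True where query i may attend to key j."""
    i = np.arange(seq_len)[:, None]  # query positions
    j = np.arange(seq_len)[None, :]  # key positions
    # Causal (j <= i) and within the last `window` positions (j > i - window).
    return (j <= i) & (j > i - window)

print(sliding_window_mask(6, 3).astype(int))
```

The banded structure is the whole trick: attention cost per token becomes O(window) instead of O(sequence length), with stacked layers still letting information propagate further than one window.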
> Quantization does not improve accuracy, except if you trade off precision for longer context maybe, but not on similar prompts. It is like JPEG compression, the original is always better for a specific image, but for the same byte size you get more resolution from JPEG than say... a PNG.
I'm well aware of how quantization works. I meant that quantization methods were increasingly able to retain accuracy, e.g. methods that quantize less important weights more heavily, improving accuracy at the same LLM size.
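
To illustrate the idea, here's a toy numpy sketch (not any real method; GPTQ and AWQ, for instance, estimate importance from second-order or activation statistics rather than the raw magnitude proxy used below): keep the largest-magnitude 10% of weights at 8 bits, push the rest down to 4 bits, and compare against uniform 4-bit.

```python
import numpy as np

def quantize_dequantize(w, bits):
    """Symmetric uniform quantization of an array to `bits` bits, then back."""
    levels = 2 ** (bits - 1) - 1  # e.g. 7 for 4-bit, 127 for 8-bit
    m = np.abs(w).max()
    scale = m / levels if m > 0 else 1.0
    return np.round(w / scale).clip(-levels, levels) * scale

rng = np.random.default_rng(0)
w = rng.normal(size=4096).astype(np.float32)

# Uniform 4-bit baseline.
uniform4 = quantize_dequantize(w, 4)

# Mixed precision: top 10% of weights by magnitude stay at 8 bits.
important = np.abs(w) >= np.quantile(np.abs(w), 0.9)
mixed = np.empty_like(w)
mixed[important] = quantize_dequantize(w[important], 8)
mixed[~important] = quantize_dequantize(w[~important], 4)

print("uniform 4-bit MSE:", np.mean((w - uniform4) ** 2))
print("mixed 4/8-bit MSE:", np.mean((w - mixed) ** 2))
```

The mixed version averages ~4.4 bits per weight, so it's slightly larger than pure 4-bit, but the reconstruction error drops considerably; real methods get a similar effect at matched size with smarter importance estimates.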