Here's a summary of what's happened the past couple of years and what tools are out there.
After ChatGPT was released, there was a lot of hype in the space, but open source was far behind. IIRC the best open foundation LLM at the time was GPT-2, which was two generations behind.
A while later Meta released LLaMA[1], a well-trained base foundation model, which brought an explosion to open source. It was soon implemented in the Hugging Face Transformers library[2] and the weights were spread across the Hugging Face website for anyone to use.
At first, it was difficult to run locally. Few developers had the hardware or money to run it. It required too much RAM, and IIRC Meta's original implementation didn't support running on the CPU, but developers soon came up with methods to make it smaller via quantization. The biggest project for this was Llama.cpp[3], which is probably still the biggest open source project today for running LLMs locally. Hugging Face Transformers also added quantization support through bitsandbytes[4].
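To give a sense of what that looks like today, here's a minimal sketch of loading a model in 4-bit through Transformers and bitsandbytes. The model ID and generation settings are just placeholders, not a specific recommendation:

```python
# Minimal sketch: load an LLM in 4-bit with Transformers + bitsandbytes.
# The model ID is illustrative; any causal LM on the Hugging Face Hub works,
# provided you have access to the weights and enough VRAM.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "meta-llama/Llama-2-7b-chat-hf"  # assumption: a gated repo you have access to

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,                     # store weights in 4-bit
    bnb_4bit_quant_type="nf4",             # NormalFloat4 quantization
    bnb_4bit_compute_dtype=torch.float16,  # do the actual math in fp16
)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb_config,
    device_map="auto",                     # spread layers across available devices
)

inputs = tokenizer("Explain quantization in one sentence.", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```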
Over the next few months there was rapid development in open source. Quantization techniques improved which meant LLaMA was able to run with less and less RAM with greater and greater accuracy on more and more systems. Tools came out that were capable of finetuning LLaMA, and hundreds of finetunes followed, trained on instruction-following, RLHF, and chat datasets, which drastically increased accuracy even further. During this time, Stanford's Alpaca, LMSYS's Vicuna, Microsoft's Wizard, 01.AI's Yi, Mistral, and a few others made their way onto the open LLM scene with some very good LLaMA finetunes.
A new inference engine (software for running LLMs, like Llama.cpp, Transformers, etc.) called vLLM[5] came out which could run LLMs more efficiently than was previously possible in open source, thanks to techniques like PagedAttention and continuous batching. Soon it would even get good AMD (ROCm) support, making it possible for those with AMD GPUs to run open LLMs locally and with relative efficiency.
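As a rough illustration, offline batch generation with vLLM's Python API looks something like this; the model ID is again just a placeholder:

```python
# Minimal sketch: offline batch generation with vLLM.
# The model ID is illustrative; vLLM pulls the weights from the Hugging Face Hub.
from vllm import LLM, SamplingParams

llm = LLM(model="meta-llama/Llama-2-7b-chat-hf")  # assumption: a model you can access
params = SamplingParams(temperature=0.7, max_tokens=128)

prompts = [
    "Explain what an inference engine does.",
    "Why is batching important for LLM serving?",
]
for output in llm.generate(prompts, params):
    print(output.outputs[0].text)
```

vLLM also ships an OpenAI-compatible HTTP server if you'd rather query it from other tools instead of embedding it in a script.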
Then Meta released Llama 2[6]. Llama 2 was by far the best open LLM of its time: it shipped with RLHF instruction finetunes for chat and with human evaluation data that put its open LLM leadership beyond doubt. Existing tools like Llama.cpp and Hugging Face Transformers quickly added support, and users had access to the best LLM open source had to offer.
At this point in time, despite all the advancements, it was still difficult to run LLMs. Llama.cpp and Transformers were great engines for running LLMs, but the setup process was difficult and required a lot of time. You had to find the best LLM, quantize it in the best way for your computer (or figure out how to identify and download a pre-quantized one from Hugging Face), set up whatever engine you wanted, figure out how to use your quantized LLM with that engine, fix any mistakes you made along the way, and finally figure out how to prompt your specific LLM in a chat-like format.
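For concreteness, this is roughly what the "use your quantized LLM with the engine" step looks like today using the llama-cpp-python bindings around Llama.cpp, assuming you've already downloaded a quantized GGUF file. The file path and Llama-2-style prompt template below are placeholders; the right template depends on the specific model:

```python
# Minimal sketch: run a pre-quantized GGUF model with llama-cpp-python.
# The model path is a placeholder; the prompt template must match your model.
from llama_cpp import Llama

llm = Llama(
    model_path="./llama-2-7b-chat.Q4_K_M.gguf",  # assumption: a quantized file you downloaded
    n_ctx=4096,        # context window
    n_gpu_layers=-1,   # offload all layers to the GPU if one is available
)

prompt = "[INST] Write a haiku about quantization. [/INST]"
result = llm(prompt, max_tokens=128, temperature=0.7)
print(result["choices"][0]["text"])
```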
However, tools started coming out to make this process significantly easier. The first one of these that I remember was GPT4All[7]. GPT4All was a wrapper around Llama.cpp that made it easy to install, easy to select the LLM you want (pre-quantized options available through a built-in download manager), and easy to use through a chat UI. This significantly reduced the barrier to entry for those who were interested in using LLMs.
The second project that I remember was Ollama[8]. Also a wrapper around Llama.cpp, Ollama gave most of what GPT4All had to offer but in an even simpler way. Today, I believe Ollama is bigger than GPT4All although I think it's missing some of the higher-level features of GPT4All.
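For reference, Ollama exposes a local HTTP API (by default on port 11434), so once you've pulled a model you can call it from any language. A minimal sketch, assuming the daemon is running and a model named "llama3" has already been pulled:

```python
# Minimal sketch: call a locally running Ollama server over its HTTP API.
# Assumes the Ollama daemon is running and the model has been pulled
# beforehand (e.g. `ollama pull llama3`).
import requests

response = requests.post(
    "http://localhost:11434/api/generate",
    json={
        "model": "llama3",
        "prompt": "In one sentence, what does an inference engine do?",
        "stream": False,  # return a single JSON object instead of a token stream
    },
)
print(response.json()["response"])
```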
Another important tool that came out during this time is Exllama[9] (now ExLlamaV2). Exllama is an inference engine focused on modern consumer Nvidia GPUs, with advanced quantization support based on GPTQ. It is probably the best inference engine for squeezing performance out of consumer Nvidia GPUs.
Months later, Nvidia came out with another new inference engine called TensorRT-LLM[10]. TensorRT-LLM is capable of running most LLMs and does so with extreme efficiency; it is the most efficient open source inference engine that exists for Nvidia GPUs. However, it also has the most difficult setup process of any inference engine and is made primarily for production use cases and Nvidia data center GPUs, so don't expect it to work on your personal computer.
With the rumors of GPT-4 being a Mixture of Experts (MoE) LLM, research breakthroughs in MoE, and some small MoE LLMs coming out, interest in MoE LLMs was at an all-time high. The company Mistral, which had proven itself in the past with very impressive LLaMA finetunes, capitalized on this interest by releasing Mixtral 8x7b[11], the best accuracy-for-its-size LLM that the local LLM community had seen to date. Eventually MoE support was added to all the major inference engines, and Mixtral became a very popular mid-to-large sized LLM.
Cohere released their own LLM as well, called Command R+[12], built specifically for RAG-related tasks with a context length of 128k. It's quite large and doesn't stand out on most general benchmarks, but it has some interesting RAG features no other open LLM has.
More recently, Llama 3[13] was released, which, like previous Llama releases, blew every other open LLM out of the water. The smallest version (Llama 3 8b) has the best accuracy for its size of any open LLM, and the largest version released so far (Llama 3 70b) beats every other open LLM on almost every metric.
Less than a month ago, Google released Gemma 2[14], the largest version of which performs very well under human evaluation despite being less than half the size of Llama 3 70b, but only decently on automated benchmarks.
If you're looking for a tool to get started running LLMs locally, I'd go with either Ollama or GPT4All. They make the process about as painless as possible. I believe GPT4All has more built-in features, like using your local documents for RAG, but you can also pair Ollama with something like Open WebUI[15] to get the same functionality.
If you want to get into the weeds a bit and extract some more performance out of your machine, I'd go with Llama.cpp, Exllama, or vLLM, depending upon your system. If you have a normal consumer Nvidia GPU, I'd go with Exllama. If you have an AMD GPU that supports ROCm 5.7 or 6.0, I'd go with vLLM. For anything else, including running on your CPU or an M-series Mac, I'd go with Llama.cpp. TensorRT-LLM only makes sense if you have a data center Nvidia GPU like the A100, V100, A10, or H100.
I think Stable Diffusion (August 2022) was the first locally runnable SOTA model to be released, not in language but in image generation, but it set the tone for Meta. LLaMA only came in February 2023.
> The company Mistral had proven itself in the past with very impressive LLaMA finetunes
Mistral is not a finetune of LLaMA; it is a model trained from scratch. Also, Mistral was better than LLaMA most of the time during this period.
> Quantization techniques improved which meant LLaMA was able to run with less and less RAM with greater and greater accuracy
Quantization does not improve accuracy, except maybe if you trade off precision for longer context, but not on similar prompts. It is like JPEG compression: the original is always better for a specific image, but for the same byte size you get more resolution from JPEG than from, say, a PNG.
> I think Stable Diffusion (August 2022) was the first locally runnable SOTA model to be released, not in language but in image generation, but it set the tone for Meta. LLaMA only came in February 2023.
Sure, I was only covering LLMs though. If I wanted to cover image generation models and tools as well, the comment would be double its size.
> Mistral is not a finetune of LLaMA; it is a model trained from scratch. Also, Mistral was better than LLaMA most of the time during this period.
Oh, that's right. IIRC it just used the Llama 2 architecture with sliding window attention added.
> Quantization does not improve accuracy, except maybe if you trade off precision for longer context, but not on similar prompts. It is like JPEG compression: the original is always better for a specific image, but for the same byte size you get more resolution from JPEG than from, say, a PNG.
I'm well aware of how quantization works. I meant that quantization methods became increasingly able to retain accuracy, such as methods that quantize less important weights more heavily, improving accuracy at the same model size.
All the main IDE-integrated ones seem very much on par (Copilot, Sourcegraph Cody, Continue.dev), with cursor.sh liked by some as it has a code-assistant-first UI.
I've personally gone back to the browser with Claude 3.5 Sonnet (and the Projects + Artifacts features), as it is one of the most industrious ones, and I really like the UX of Artifacts, plus it integrates new code well into existing code you paste into it.
In the end I think it also often comes down to which languages/frameworks you are using and how well the LLM/product handles them, so I'd still recommend testing around. E.g. some of the main frameworks I work with daily went through big refactors/interface changes 1-2 years ago, and I stopped using ChatGPT because it had a strong tendency to produce code based on the old interfaces/paradigms.
Aider[0] is also quite interesting, especially when it comes to more significant refactorings across the codebase, and it has gotten quite good at that with the last few bigger model releases, but it takes some time to get used to and doesn't have good IDE integration.
I've been following the state of things, but I'm not sure which ones are the best. There's Meta's CodeLlama[1], Mistral's Codestral[2], DeepSeek AI's DeepSeek-Coder-V2-Instruct[3], CodeGemma[4], Alibaba's CodeQwen[5], and Microsoft's WizardCoder[6].
I'm pretty sure CodeLlama is out of date now. I've heard DeepSeek's LLMs are good, and DeepSeek-Coder-V2-Instruct was released recently. Given its good reputation and massive size (236b), I'd guess it is the best coding LLM, but if it isn't trained efficiently for that size, maybe Codestral and Codestral Mamba come close.
I don't think the best open coding LLMs are close to GitHub Copilot, but I could be wrong since I'm just relaying information I've heard secondhand.
[1] https://ai.meta.com/blog/large-language-model-llama-meta-ai/
[2] https://github.com/huggingface/transformers
[3] https://github.com/ggerganov/llama.cpp
[4] https://github.com/bitsandbytes-foundation/bitsandbytes
[5] https://github.com/vllm-project/vllm
[6] https://ai.meta.com/blog/llama-2/
[7] https://www.nomic.ai/gpt4all
[8] http://ollama.ai/
[9] https://github.com/turboderp/exllamav2
[10] https://github.com/NVIDIA/TensorRT-LLM
[11] https://mistral.ai/news/mixtral-of-experts/
[12] https://cohere.com/blog/command-r-plus-microsoft-azure
[13] https://ai.meta.com/blog/meta-llama-3/
[14] https://blog.google/technology/developers/google-gemma-2/
[15] https://github.com/open-webui/open-webui