Size-wise it has more in common with GPT-3 than GPT-4, but in reality it's based on Vicuna/LLaMA, which is roughly 10x smaller than either, so as far as the LLM part goes it's not mini-anything - it's just straight-up Vicuna 13B.
The model as a whole is just BLIP-2 with a larger linear layer and Vicuna swapped in as the LLM. If you look at their code, it literally reuses the entire BLIP-2 encoder (the Salesforce code).
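Roughly, the wiring looks like the sketch below: a frozen ViT + Q-Former from BLIP-2, one trainable linear projection into the LLM's embedding space, and a frozen Vicuna on top. This is just my reading of the setup - the class names, placeholder modules, and dimensions (768 for the Q-Former, 5120 for Vicuna 13B) are illustrative assumptions, not the authors' actual code.

    import torch
    import torch.nn as nn

    class MiniGPT4Sketch(nn.Module):
        def __init__(self, qformer_dim=768, llm_dim=5120):
            super().__init__()
            # Frozen BLIP-2 visual side (ViT + Q-Former); stand-ins here,
            # in the real repo these come from the Salesforce/LAVIS code.
            self.visual_encoder = nn.Identity()
            self.qformer = nn.Identity()
            # The one trainable piece: project Q-Former outputs into the
            # LLM's embedding space.
            self.llm_proj = nn.Linear(qformer_dim, llm_dim)
            # Frozen LLM (Vicuna 13B, LLaMA-based, in the real model).
            self.llm = nn.Identity()

        def forward(self, image_feats):
            # image_feats: (batch, num_query_tokens, qformer_dim)
            q_out = self.qformer(self.visual_encoder(image_feats))
            # Only this projection gets gradients during training.
            visual_tokens = self.llm_proj(q_out)  # (batch, queries, llm_dim)
            # The projected "visual tokens" get prepended to the text prompt
            # embeddings before being fed to the frozen LLM.
            return self.llm(visual_tokens)

    # quick shape check
    model = MiniGPT4Sketch()
    print(model(torch.randn(2, 32, 768)).shape)  # torch.Size([2, 32, 5120])

So the novelty is basically in training that single projection layer (plus the prompt/finetuning data), not in the architecture itself.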
https://arxiv.org/pdf/2301.12597.pdf