
Rumor has it that they weren't trained "from scratch" the way US labs would, i.e. Chinese labs benefited from government-"procured" IP (the US $B models) in order to train their $M models. I also understand there to be real innovation in the many-expert MoE architecture on top of that. Would love to hear a more technical understanding from someone who does more than repeat rumors, though.

