However, it's still slow as tar. When I ran the BLOOM model, I think my inference speed was about 1 token per minute.
See: https://towardsdatascience.com/run-bloom-the-largest-open-ac...
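For reference, here's a minimal sketch of how one might measure that tokens-per-minute figure with Hugging Face transformers. The model name, offload settings, and prompt are my assumptions, not the article's exact setup (the full 176B checkpoint needs accelerate's CPU/disk offloading to fit on a single machine, which is largely why generation crawls):

    import time
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_name = "bigscience/bloom"  # assumption: the full 176B checkpoint
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForCausalLM.from_pretrained(
        model_name,
        device_map="auto",        # shard/offload across GPU, CPU RAM, and disk
        offload_folder="offload", # spill weights that don't fit in memory
    )

    prompt = "The fastest way to run a 176B-parameter model is"
    inputs = tokenizer(prompt, return_tensors="pt")

    start = time.time()
    outputs = model.generate(**inputs, max_new_tokens=10)
    elapsed = time.time() - start

    # Count only the newly generated tokens, then convert to tokens/minute.
    new_tokens = outputs.shape[-1] - inputs["input_ids"].shape[-1]
    print(f"{new_tokens / (elapsed / 60):.2f} tokens/min")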