Hacker News

I'm surprised transfer learning via fine-tuning large transformer models hasn't taken off more in the public consciousness a la image recognition models. In my experience, the results can be staggering with very small amounts of training data.
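For anyone unfamiliar with the idea, transfer learning can be sketched in miniature: freeze a "pretrained" feature extractor and train only a small head on a handful of labelled examples. Everything below (the feature function, the toy data, the label rule) is invented purely for illustration; real fine-tuning of a transformer would go through a library such as Hugging Face's transformers rather than hand-rolled gradient descent.

```python
import math

def pretrained_features(x):
    # Stand-in for a frozen pretrained network: fixed, never updated.
    return [math.sin(3 * x), math.cos(2 * x), x * x]

def train_head(data, lr=0.5, epochs=200):
    # Logistic-regression head trained on top of the frozen features.
    w, b = [0.0, 0.0, 0.0], 0.0
    for _ in range(epochs):
        for x, y in data:
            f = pretrained_features(x)
            z = sum(wi * fi for wi, fi in zip(w, f)) + b
            z = max(-30.0, min(30.0, z))   # clamp to avoid overflow
            p = 1.0 / (1.0 + math.exp(-z))
            g = p - y                      # gradient of the log loss
            w = [wi - lr * g * fi for wi, fi in zip(w, f)]
            b -= lr * g
    return w, b

def predict(w, b, x):
    f = pretrained_features(x)
    return 1 if sum(wi * fi for wi, fi in zip(w, f)) + b > 0 else 0

# Toy task: label is 1 if |x| > 1, else 0 -- learnable from the x*x feature.
# Six examples is all the head needs, which is the point of transfer learning.
data = [(-2.0, 1), (-1.5, 1), (-0.5, 0), (0.0, 0), (0.5, 0), (2.0, 1)]
w, b = train_head(data)
```

The pretrained features do the heavy lifting, so the head converges on six examples; that's the "staggering results from very small training data" effect in its simplest form.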


It has taken off. Pretty much everyone and their grandma has fine-tuned GPT-2 to generate all kinds of stuff, even poetry: https://www.gwern.net/GPT-2


It's a very new thing. Pretrained ImageNet models only became widely available around 2012, after AlexNet. Pretrained transformer models have been released much more recently.

Also, transformers have a fixed context window (a few hundred tokens for the early pretrained models), so they're only useful for short text, not full documents (AFAIK).
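A common workaround for that fixed context window (not mentioned in the thread, and the window/stride sizes below are just illustrative) is to split a long document into overlapping chunks and feed each chunk through the model separately:

```python
def chunk_tokens(tokens, window=512, stride=256):
    """Split a long token sequence into overlapping windows.

    window: maximum tokens the model can see at once (illustrative value).
    stride: step between window starts; stride < window gives overlap,
            so no sentence is cut off at a boundary without context.
    """
    chunks = []
    start = 0
    while True:
        chunks.append(tokens[start:start + window])
        if start + window >= len(tokens):
            break       # last window reaches the end of the document
        start += stride
    return chunks

# A 1000-token "document" becomes three overlapping 512-token windows.
chunks = chunk_tokens(list(range(1000)))
```

Each chunk's predictions (or embeddings) then have to be aggregated back into a document-level result, which is where this approach gets lossy compared with a model that could attend over the whole document.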


Are transformer models the text version of ImageNet models? This is the first time I'm hearing the term.


Google (and I assume most ML providers) offers transfer learning on image models: https://cloud.google.com/vision/automl/docs/



