I'm late to the party, but what stops us from training this SD model on audio spectrograms? Then you could tell it "some Mozart-style violin for 5 seconds, add drums in the background." The spectrogram is then translated back to sound, and suddenly you're a very decent music writer.
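The spectrogram-to-sound step is the only non-obvious part, so here's a minimal round-trip sketch with librosa: audio in, log-mel spectrogram "image" out, then back to audio via Griffin-Lim phase estimation. File names are placeholders, and the STFT parameters are just common defaults, not anything a real spectrogram-SD model would be committed to:

```python
import numpy as np
import librosa
import soundfile as sf

# load 5 s of a clip (path is a placeholder)
y, sr = librosa.load("mozart_violin.wav", sr=22050, duration=5.0)

# forward: mel power spectrogram, log-scaled like a grayscale image
S = librosa.feature.melspectrogram(y=y, sr=sr, n_fft=2048,
                                   hop_length=512, n_mels=128)
S_db = librosa.power_to_db(S, ref=np.max)  # what the model would see/generate

# inverse: undo the dB scaling, then recover phase with Griffin-Lim
S_power = librosa.db_to_power(S_db, ref=np.max(S))
y_hat = librosa.feature.inverse.mel_to_audio(S_power, sr=sr, n_fft=2048,
                                             hop_length=512)
sf.write("reconstructed.wav", y_hat, sr)
```

The catch is that the spectrogram throws away phase, so Griffin-Lim reconstruction sounds noticeably degraded; you'd probably want a neural vocoder for listenable output.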
With img2img you could give it an audio file's spectrogram, call it "S", and say "music in S style, but with flute".
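That maps directly onto the existing diffusers img2img pipeline, if such a spectrogram-trained checkpoint existed. A hedged sketch below; "spectro-sd" is a made-up checkpoint name and the file names are placeholders:

```python
import torch
from PIL import Image
from diffusers import StableDiffusionImg2ImgPipeline

# hypothetical SD checkpoint fine-tuned on spectrograms
pipe = StableDiffusionImg2ImgPipeline.from_pretrained(
    "spectro-sd", torch_dtype=torch.float16
).to("cuda")

# spectrogram of clip "S", rendered as an image
init = Image.open("clip_S_melspec.png").convert("RGB").resize((512, 512))

# low strength keeps S's structure (melody/rhythm); the prompt steers timbre
out = pipe(prompt="music in S style, but with flute",
           image=init, strength=0.4, guidance_scale=7.5).images[0]
out.save("flute_variant_melspec.png")
```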
mp3 density: ~30 s per 1 MB (instrumental music with some accompaniment). jpg density: ~12 M pixels per MB (trees and some landscaping). I'd argue music carries a lot more information, if we can compare seconds with pixels. IMHO, OpenAI didn't do a great job: a small dataset and a limited model.
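Back-of-envelope, using the two figures above (decimal megabytes assumed, since the post doesn't specify):

```python
MB = 1_000_000

# mp3: ~30 s of music per 1 MB  ->  bitrate
mp3_kbps = MB * 8 / 30 / 1000            # ~267 kbps

# jpg: ~12 M pixels per 1 MB  ->  bits per pixel
jpg_bpp = MB * 8 / 12_000_000            # ~0.67 bits/pixel

# size comparison: 1 s of audio vs one 512x512 SD canvas
samples_per_sec = 44_100                 # CD sample rate
sd_canvas_pixels = 512 * 512             # ~262 k pixels

print(f"mp3 ~{mp3_kbps:.0f} kbps, jpg ~{jpg_bpp:.2f} bits/pixel")
```

So a single second of audio already burns dozens of kilobytes, while a whole 512x512 canvas fits in well under a tenth of a megabyte at that jpg quality; that's the sense in which music is the denser signal here.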