
I'm late to the party, but what stops anyone from training this SD model on audio spectrograms? Then you'd tell it "some Mozart-style violin for 5 seconds, add drums in the background." The spectrogram is then translated back to sound, and suddenly you're a very decent music writer.

With img2txt you could give it an audio file, call it "S", and tell it "music in S style, but with flute".
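To make the "audio as an image" idea concrete, here is a minimal sketch of turning a waveform into a magnitude spectrogram, using only the Python standard library. The function name, frame size, hop, and test tone are all made up for illustration; a real pipeline would use an FFT library and probably a mel scale. Note that this keeps magnitudes only, so going back from the image to sound would additionally need some form of phase estimation (Griffin-Lim is one common choice), which is not shown.

```python
import cmath
import math

def magnitude_spectrogram(samples, frame_size=64, hop=32):
    """Return a list of frames; each frame holds magnitudes for bins 0..frame_size//2.

    Hypothetical helper: a naive DFT per Hann-windowed frame, fine for tiny inputs only.
    """
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / frame_size)
              for n in range(frame_size)]
    frames = []
    for start in range(0, len(samples) - frame_size + 1, hop):
        chunk = [samples[start + n] * window[n] for n in range(frame_size)]
        row = []
        for k in range(frame_size // 2 + 1):
            acc = sum(chunk[n] * cmath.exp(-2j * math.pi * k * n / frame_size)
                      for n in range(frame_size))
            row.append(abs(acc))  # phase is discarded here
        frames.append(row)
    return frames

# Assumed test signal: 0.1 s of a 1000 Hz sine sampled at 8000 Hz.
SAMPLE_RATE = 8000
TONE_HZ = 1000
tone = [math.sin(2 * math.pi * TONE_HZ * t / SAMPLE_RATE) for t in range(800)]

spec = magnitude_spectrogram(tone)  # rows = time frames, columns = frequency bins
peak_bin = max(range(len(spec[0])), key=lambda k: spec[0][k])
peak_hz = peak_bin * SAMPLE_RATE / 64  # bin index -> frequency for frame_size=64
```

Each row of `spec` is one column of pixels in the spectrogram image, and the brightest bin of the first frame lands on the tone's frequency. That 2D grid is what an image model would be trained on.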



OpenAI did this, but it doesn't sound great. I think it's because sound has less information, so the brain is very picky.


MP3 density: 30 sec per 1 MB (some instrumental music with background). JPG density: 12M pixels per MB (trees and some landscaping). I'd argue music has a lot more information, if we can compare seconds with pixels. IMHO, OpenAI didn't do a great job: a small dataset and a limited model.
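For what it's worth, the back-of-envelope numbers above work out as follows. This is only arithmetic on the two densities quoted in the comment, assuming 1 MB = 10^6 bytes; the variable names are made up, and compressed size is at best a rough proxy for information content.

```python
# Figures quoted in the comment above.
mp3_seconds_per_mb = 30
jpg_pixels_per_mb = 12_000_000

# Assumption: 1 MB = 10^6 bytes = 8 * 10^6 bits.
MB_BITS = 8 * 1_000_000

mp3_kbit_per_second = MB_BITS / mp3_seconds_per_mb / 1000
jpg_bits_per_pixel = MB_BITS / jpg_pixels_per_mb

# How many JPG pixels take the same space as one second of MP3.
pixels_per_second_of_audio = jpg_pixels_per_mb / mp3_seconds_per_mb

print(f"MP3: {mp3_kbit_per_second:.1f} kbit/s")
print(f"JPG: {jpg_bits_per_pixel:.2f} bits/pixel")
print(f"1 s of audio ~ {pixels_per_second_of_audio:,.0f} pixels")
```

So, by these figures, one second of that MP3 costs as many bytes as roughly 400,000 pixels of that JPG, or about a 632x632 image.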



