
Agreed! Those will be much more fun, and we plan to support them. Right now, though, we're focused on making the base model slightly better; once that's solid, we can add all of those controls (à la ControlNet with Stable Diffusion).


But that's not easy — it's the real challenge here, given how many text-to-audio models are already out there. It's far from solved for Stable Diffusion too: ControlNet is pretty bad. Just try taking a photo of an empty room and asking an image model to add furniture, or to change a wall colour, or to restyle an existing photo to match the style of another, and so on. We are very far from being able to truly control the output of AI models, which is exactly what a DAW excels at. I'd start with an AI-powered DAW rather than text-to-audio and try to add controls to it. It's like Cursor vs Lovable, if you get my drift.



