I don't think it meets your ask of "solve this particularly well", but the unlimited plans in video generation that I'm familiar with have a fast/slow queue system, which effectively limits the plan. These queue systems also seem to be tiered: you get N fast-queue items, then X items in a tier-one slow queue, Y in a tier-two slow queue, and so on. On the backend this is probably just a weighted priority queue, where the number of requests a user makes within some time window determines a scaling factor on their weight.
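A minimal sketch of that kind of weighted priority queue, assuming a one-hour sliding window, made-up tier thresholds, and a "virtual deadline" priority (arrival time plus weight times a base delay) so slow-tier jobs are pushed back but never starved. `TieredQueue`, `TIERS`, `BASE_DELAY`, and the other names are hypothetical; this is not any provider's actual implementation.

```python
import heapq
import itertools
from collections import defaultdict, deque

WINDOW_SECONDS = 3600.0  # hypothetical: count requests over the last hour
BASE_DELAY = 10.0        # hypothetical: seconds of virtual delay per unit of weight

# Hypothetical tiers as (cumulative request limit in window, weight).
# A higher weight pushes a job further back in the queue.
TIERS = [
    (10, 1.0),    # requests 1-10: fast queue
    (50, 4.0),    # requests 11-50: tier-one slow queue
    (200, 16.0),  # requests 51-200: tier-two slow queue
]
OVERFLOW_WEIGHT = 64.0   # everything past the last tier


class TieredQueue:
    def __init__(self, window=WINDOW_SECONDS):
        self.window = window
        self.history = defaultdict(deque)  # user -> timestamps of recent requests
        self.heap = []                     # entries: (virtual_deadline, seq, user, job)
        self.seq = itertools.count()       # tie-breaker keeps FIFO order on equal deadlines

    def _weight(self, user, now):
        """Drop timestamps outside the window, then map the remaining count to a weight."""
        stamps = self.history[user]
        while stamps and stamps[0] <= now - self.window:
            stamps.popleft()
        for limit, weight in TIERS:
            if len(stamps) < limit:
                return weight
        return OVERFLOW_WEIGHT

    def submit(self, user, job, now):
        weight = self._weight(user, now)
        self.history[user].append(now)
        # Virtual deadline: heavier users wait longer, but every deadline is finite,
        # so slow-tier work is delayed rather than starved.
        deadline = now + weight * BASE_DELAY
        heapq.heappush(self.heap, (deadline, next(self.seq), user, job))

    def pop(self):
        """Return the (user, job) with the earliest virtual deadline."""
        _, _, user, job = heapq.heappop(self.heap)
        return user, job
```

With these numbers, a user's 11th request in an hour gets a deadline 40 seconds out instead of 10, so a light user who arrives a second later is served first. Because the priority is a deadline rather than a strict tier ordering, a slow-tier job that has waited long enough still outranks fresh fast-queue jobs.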
I think this is a good start: X high-speed queries per hour, then unlimited low-priority ones after that. Do you know of any specific companies doing this that we could look at?
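One way to implement "X high-speed queries per hour, then unlimited low-priority ones" is a token bucket instead of a fixed hourly counter. This is a swapped-in technique, not something from the thread: the fast lane refills gradually at X per hour, so users don't get a fresh burst at the top of every hour. `FastLaneBucket` and `lane()` are hypothetical names, and X is the `per_hour` argument.

```python
class FastLaneBucket:
    """Token bucket that hands out up to `per_hour` fast-lane slots per hour.

    Requests are never rejected: once the bucket is empty they are routed to
    the low-priority lane until tokens refill.
    """

    def __init__(self, per_hour, now=0.0):
        self.capacity = float(per_hour)   # burst size: one hour's allowance
        self.rate = per_hour / 3600.0     # tokens regained per second
        self.tokens = self.capacity       # start with a full bucket
        self.last = now

    def lane(self, now):
        # Refill in proportion to elapsed time, never above capacity.
        elapsed = max(0.0, now - self.last)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return "fast"
        return "slow"
```

With `per_hour=3`, the first three requests go to the fast lane and the fourth drops to the slow lane; half an hour later about 1.5 tokens have refilled, so one more request gets the fast lane.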
Remember that you’ve also got a nice natural limitation here: if it’s a hobbyist and not a (commercial) API consumer, there’s only so fast they can listen to the output. Even if they’re rapidly tweaking knobs in a DAW, you can use the play/pause signal to help prioritize the queue, depending on how expensive it is to serialize the GPU state and rehydrate it later. You also might not need to finish a generation until playback reaches that point, so you can reorder the queue quite freely. For example, if the user skips after ten seconds, you might not need to generate the rest until they try to play that track again, and when they do, you usually have enough time before they reach the previous stopping point to generate a few more sections.
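A minimal sketch of using playback position to order the work, assuming audio is generated in fixed 10-second sections and only up to 30 seconds ahead of the playhead. It is earliest-deadline-first scheduling: the next job is the missing section the listener will reach soonest, and paused tracks contribute nothing until they resume. `Track`, `next_section`, and both constants are hypothetical, and the sketch ignores the cost of serializing and restoring GPU state.

```python
import math
from dataclasses import dataclass, field

SECTION_SECONDS = 10.0    # hypothetical: audio is generated in 10 s sections
LOOKAHEAD_SECONDS = 30.0  # hypothetical: generate at most this far past the playhead


@dataclass
class Track:
    track_id: str
    length: float                            # total track length in seconds
    playhead: float = 0.0                    # current playback position in seconds
    playing: bool = False
    done: set = field(default_factory=set)   # indices of sections already generated


def next_section(tracks):
    """Return (track_id, section_index) with the least slack, or None if nothing is due.

    Slack is the time until the playhead reaches the section's start; negative slack
    means the listener is already inside an ungenerated section.
    """
    best = None
    for t in tracks:
        if not t.playing:
            continue  # paused tracks wait until the user presses play again
        first = int(t.playhead // SECTION_SECONDS)
        horizon = min(t.playhead + LOOKAHEAD_SECONDS, t.length)
        last = math.ceil(horizon / SECTION_SECONDS)  # exclusive upper bound
        for idx in range(first, last):
            if idx in t.done:
                continue
            slack = idx * SECTION_SECONDS - t.playhead
            if best is None or slack < best[0]:
                best = (slack, t.track_id, idx)
            break  # later sections of this track only have more slack
    return None if best is None else best[1:]
```

If a listener stops after ten seconds, nothing past the lookahead window is ever enqueued for that track; when they press play again, only the sections near the playhead come back into contention.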
It might also be helpful to come up with some ways to segment customers so that “prosumer” users get faster “cold starts” (so they can iterate faster) at the expense of sometimes having to wait for generation to resume.
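The prosumer trade-off can be written down as a ranking rule. This sketch assumes two hypothetical segments and a made-up rank table: prosumer cold starts jump the queue, while their follow-on sections yield to everyone else, so a prosumer may occasionally wait for playback to resume. `Segment`, `Job`, `rank`, and `schedule` are illustrative names only.

```python
from dataclasses import dataclass
from enum import Enum


class Segment(Enum):
    PROSUMER = "prosumer"
    HOBBYIST = "hobbyist"


@dataclass
class Job:
    name: str
    segment: Segment
    cold_start: bool  # True for the first section of a new generation
    slack: float      # seconds until playback needs this section (lower = more urgent)


def rank(job):
    """Hypothetical policy: prosumer cold starts go first, prosumer continuations go last."""
    if job.segment is Segment.PROSUMER:
        return 0 if job.cold_start else 2
    return 1


def schedule(jobs):
    # Order by segment-based rank, then by playback urgency within a rank.
    return sorted(jobs, key=lambda j: (rank(j), j.slack))
```

Keeping slack as the secondary key means the segment rule only decides between classes of work; within a class, whatever the listener will hear soonest still goes first.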