The killer app in this space will be when someone figures out how to extract a vocal model from existing recordings of singers. Vocaloid already synthesizes singing quite well, but a human singer has to go into a studio and sing a long list of standard phrases to build the singer model. The next step will be to feed existing singing into a system that extracts a model usable for synthesis.
I can imagine a near future in which an RNN is trained to have an especially lovely reading voice. A large number of people might become convinced that the RNN is a person and develop empathy toward it.
It makes me wonder why we haven't seen (or at least heard about the development of) an operating system with a purely voice-based UI built on an RNN... especially after Her came out.
I understand it's hard but it also sounds like a fun project for people with the relevant know-how.
Andrew Ng gave a keynote at GTC in which he talked about bringing Baidu's Deep Speech technology to phones (for accessibility). You betcha they're working on it!
Counter-point: the RIAA would so love to eliminate the human element from the process; after all, the most profitable artists are the dead ones. If you've read about hit-machine-in-human-form Max Martin, then you're aware the guy is totally obsessive about his approach (balanced lines, best vocal lines, etc.), and if he could be in charge of a fake singer by way of programming, I think he'd be unstoppable.
Combine that with the extremely realistic 3D renderings available today and you'd be able to make whatever VR... fantasies you wanted. Fun and exciting.
No, you can't copyright a voice. That's come up with "cover" bands. A cover band can sound like the original; they just can't claim to be the original. ("Compare to the ingredients in Elvis.")
A cover band has to license the underlying composition, but not the recording they're covering. (This means ASCAP gets royalties, but the RIAA does not.) In the US, there's a compulsory license for compositions, and you can record and distribute any song by paying a relatively modest fee set by law.
This is just automating the cover band industry.
In ten years or less, this will be a common feature in DJ consoles, and we'll hear songs from musician A as if performed by musician B.
Nice. Mechanical Turks might be a cheaper option at the creativity level I see in many Fortune 500 companies' advertisements (see IBM's boardroom ads).
The RIAA is so going to hate this.