
Anybody experimenting with this for piano: I highly recommend the Google 'Onsets and Frames' algorithm as embodied in their demo:

https://piano-scribe.glitch.me/

I built something similar that is a lot faster, but in a large-scale test the Google software handily outperforms mine (92% accuracy versus 87% or so, and that is a huge difference because it translates into ~30% fewer errors).



Wow, the Onsets and Frames algorithm is insanely interesting. It's like run-length encoding of vertical and horizontal structures: onsets as 0-dimensional points in time, activations as 1-dimensional lines in time. But.. hm.. why stop at such low-dimensional structures..! :^)
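
To make that points/lines picture concrete, here's a toy decoder in the spirit of the Onsets and Frames inference rule: a note may only begin where the onset head fires, and it sustains while the frame head stays active. (A minimal sketch; the matrix shapes and thresholds here are my assumptions, not the paper's exact values.)

  import numpy as np

  def decode_notes(onsets, frames, onset_thresh=0.5, frame_thresh=0.5):
      """Toy decoder: onsets and frames are (time, pitch) probability
      matrices. A note starts only where the onset head fires, then
      sustains while the frame activation holds."""
      notes = []  # (pitch, start_frame, end_frame)
      n_time, n_pitch = frames.shape
      for p in range(n_pitch):
          t = 0
          while t < n_time:
              if onsets[t, p] >= onset_thresh:
                  start = t
                  t += 1  # the onset frame itself counts as active
                  # Sustain as long as the frame activation holds.
                  while t < n_time and frames[t, p] >= frame_thresh:
                      t += 1
                  notes.append((p, start, t))
              else:
                  t += 1
      return notes

  # Tiny smoke test with random "predictions".
  rng = np.random.default_rng(0)
  print(decode_notes(rng.random((100, 88)), rng.random((100, 88)))[:3])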


There has since been follow-up work to extend Onsets & Frames to multi-instrument music: https://magenta.tensorflow.org/transcription-with-transforme...


Shame that it uses quadratically scaling transformers - there are many sub-quadratic transformer variants that work as well or better (https://github.com/lucidrains?tab=repositories) - because that 4-second sub-sample limitation seems quite unlike how most people experience music. Interesting, though. I wonder if I could take a stab at this..
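
For context, here's a minimal sketch of why standard attention forces short windows. (The frame rate and sizes are illustrative assumptions, not the paper's numbers.)

  import numpy as np

  def attention_scores(q, k):
      # Dot-product attention materializes an (n, n) score matrix,
      # so memory and compute grow quadratically in sequence length n.
      return (q @ k.T) / np.sqrt(q.shape[-1])

  # At an assumed ~100 spectrogram frames/second, a 4 s window is
  # ~400 tokens (160k scores), while a 3-minute piece is ~18,000
  # tokens (~324M scores) -- hence the short sub-samples.
  d = 64
  q = np.random.randn(400, d)
  k = np.random.randn(400, d)
  print(attention_scores(q, k).shape)  # (400, 400)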

Also interesting that absolute timing of onsets worked better than relative timing - that seems kind of bizarre to me, since, when I listen to music, it is never in absolute terms (e.g. "wow, I just loved how this connects to the start of the 12th bar" vs. "wow, I loved that transition from what was playing 2 bars ago").
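
To illustrate the distinction, here are hypothetical token streams for the same three notes under the two schemes. (The token names are made up for illustration and are not the paper's actual vocabulary.)

  # Absolute: each event is stamped with wall-clock position.
  absolute = ["time=0.00", "note_on=60",
              "time=0.50", "note_on=64",
              "time=1.25", "note_on=67"]

  # Relative: each event is offset from the previous one.
  relative = ["shift=0.00", "note_on=60",
              "shift=0.50", "note_on=64",
              "shift=0.75", "note_on=67"]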

Another thing on relative timing: when I listen to music, very nuanced, gradual, and intentional deviations of tempo have significant sentimental effects for me - which suggests that you need a 'covariant' description of how the tempo changes over time. So not only do you need relative timing of events, you also need relative timing of the relative timing of events (see the numeric sketch after the examples below).

Some examples:

- Jonny Greenwood's Phantom Thread II from the Phantom Thread soundtrack [0]

- the breakdown in Holy Other's amazing "Touch" [1], where the song basically grinds to a halt before releasing all the pent-up emotional potential energy.

[0] https://www.youtube.com/watch?v=ztFmXwJDkBY, especially just before the violin starts at 1:04

[1] https://www.youtube.com/watch?v=OwyXSmTk9as, around 2:20
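
Here's a minimal numeric sketch of that "timing of the timing" idea; the onset times are invented to mimic a ritardando.

  import numpy as np

  # Onset times (seconds) of a slowdown: each beat a bit longer.
  onsets = np.array([0.0, 0.50, 1.02, 1.58, 2.20, 2.90])

  ioi = np.diff(onsets)          # relative timing: inter-onset intervals
  tempo = 60.0 / ioi             # local tempo in BPM
  tempo_change = np.diff(tempo)  # "relative timing of the relative
                                 # timing": how the tempo itself drifts
  print(tempo)          # falling BPM values: 120, 115.4, 107.1, ...
  print(tempo_change)   # consistently negative -> gradual slowdown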


There are already tools that can estimate the variation of tempo over time (rubato). Librosa's "tempo" function does the job well for some types of music - it can even give you a 3D/heatmap plot with the likelihood of every tempo value at every moment: https://librosa.org/doc/main/generated/librosa.beat.tempo.ht...
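
A minimal sketch of both uses: the per-frame tempo curve comes from beat.tempo with aggregate=None, and the heatmap comes from the related tempogram feature. (The file path is a placeholder; exact behavior may vary by librosa version.)

  import librosa
  import librosa.display
  import matplotlib.pyplot as plt

  # Placeholder path: any recording works.
  y, sr = librosa.load("performance.wav")

  # Onset strength envelope drives both estimators.
  oenv = librosa.onset.onset_strength(y=y, sr=sr)

  # Per-frame tempo: aggregate=None disables the default global
  # averaging, so you see the tempo drift over time.
  dtempo = librosa.beat.tempo(onset_envelope=oenv, sr=sr, aggregate=None)

  # Tempogram: likelihood of each tempo value at each moment
  # (the heatmap described above).
  tgram = librosa.feature.tempogram(onset_envelope=oenv, sr=sr)

  fig, ax = plt.subplots()
  librosa.display.specshow(tgram, sr=sr, x_axis="time", y_axis="tempo", ax=ax)
  ax.plot(librosa.times_like(dtempo, sr=sr), dtempo,
          color="w", linewidth=1.5, label="local tempo (BPM)")
  ax.legend(loc="upper right")
  plt.show()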

Rubato is everywhere in classical music, and understanding rubato is an essential part of any automatic transcription system that aims to show you notes in musically meaningful units of time.


I would test whether a longer sub-window actually helps before worrying too much about this. I doubt it matters in practice. Whether it matches how people experience music is rather immaterial.


I just tried this and it works very very nicely! Thank you for sharing!


I just tried this with Sexy Sadie and the result was awful.



