It's not an issue of warmup time, it's an issue of JIT compilation.
On my server (AMD EPYC 7252):
1) base time of the Java program from the repo is 3.23s (which is ~2x worse than the one on the linked page, so I assume my CPU is about 2x slower, and the corresponding best C++ result will be ~450ms)
2) if you measure from inside the Java program you get 3.17s (so about 60ms of overhead)
3) but if you run it 10 times (inside the same Java program) you cut this time to 1570ms
It's still much slower than the C++ version, but it's between Rust and Go. And this is not me optimizing anything, it's only measuring things correctly.
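For anyone who wants to try the same kind of measurement, here is a minimal sketch of timing repeated runs inside one JVM, so that later runs execute JIT-compiled code. The `workload()` method is a hypothetical stand-in I made up, not the code from the repo, and the class and method names are assumptions as well. For anything serious a harness like JMH is probably a better choice than a hand-rolled loop.

```java
import java.util.concurrent.TimeUnit;

public class RepeatedTiming {
    // Written after every run so the JIT cannot discard the workload as dead code.
    static volatile long sink;

    // Hypothetical stand-in for the benchmark's real work (NOT the repo's code).
    static long workload() {
        long sum = 0;
        for (int i = 0; i < 20_000_000; i++) {
            sum += (i ^ (i >>> 3)) % 7;
        }
        return sum;
    }

    // Runs the workload `runs` times in the same JVM and returns
    // the wall-clock time of each run in milliseconds.
    static long[] timeRuns(int runs) {
        long[] millis = new long[runs];
        for (int r = 0; r < runs; r++) {
            long start = System.nanoTime();
            sink = workload();
            millis[r] = TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
        }
        return millis;
    }

    public static void main(String[] args) {
        long[] millis = timeRuns(10);
        for (int r = 0; r < millis.length; r++) {
            System.out.println("run " + (r + 1) + ": " + millis[r] + " ms");
        }
    }
}
```

The first run or two include interpretation and compilation; the later ones are closer to what the compiled code can do. Comparing the last run against the whole-process time from `time java ...` shows how much of the total is not the computation itself.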
update: running the vector version of the Java code from the same repo brings the runtime down to 392ms, which is literally the fastest out of all solutions, including C++.
update2: ran the C++ version on the same hardware and it takes 400ms, so I would say it's fair to call C++ and vectorized Java on par (and given the "allows vectorization" comment in the cpp code, I assume that's the best one can get out of it).