I call bullshit. Researchers have been trying to make auto-parallelizing compilers for decades without anything substantive making it to industry.
Even in functional languages, auto-parallelization hasn't worked well because of the coarseness issue: it's difficult for compilers to figure out what to run in multiple threads and what to run single-threaded because of tradeoffs with inter-thread communication.
Have you used parallel extensions? They aren't always faster, and can often over-commit resources.
It's just like the GPU: you need massive amounts of data to overcome the communication overhead.
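A rough sketch of the overhead point, using Array.Parallel.map from FSharp.Core (the sizes and the trivial per-element function are my own illustrative assumptions, not measurements): on a small array with cheap work per element, the parallel version tends to lose to the sequential one because scheduling and fork/join costs dominate.

    // Sketch only: sizes and 'cheap' are illustrative assumptions.
    let small = Array.init 1000 id
    let big   = Array.init 10000000 id

    let cheap x = x + 1   // far too little work per element to pay for scheduling

    let seqSmall = small |> Array.map cheap            // usually fastest on small data
    let parSmall = small |> Array.Parallel.map cheap   // overhead often makes this slower
    let parBig   = big   |> Array.Parallel.map cheap   // parallelism only starts to win at scale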
That F# can outperform C on numeric code should tell you that the optimizations available to functional code far exceed those available to languages that poke bits.
This is just one more micro-optimization to compensate for what a bad idea it is to poke beads on an abacus rather than use the mathematical rigor available in calculus.
Do you have a citation for this? I love F# and I've written some fairly pointer-intensive code with it, but the CLR's code-gen is pretty bloody abysmal[1]. F# does more optimizations than C#, but most of the heavy lifting is left up to the CLR/JIT (obviously).
In fact, I find I have to experiment with F#'s inline feature for high-perf code, because the CLR optimizer works far better on big functions than on inlining and optimizing calls to smaller ones. Even basic things like eliminating the allocation of a tuple when it's obvious you're immediately deconstructing it aren't done. F# doesn't even lift lambdas, as of 2.0.
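For instance (a made-up micro-example of mine, not from any benchmark): a function that returns a tuple which the caller immediately deconstructs still pays for the tuple allocation, and marking it inline is the workaround I end up reaching for.

    // Illustrative only: divMod is a hypothetical helper, not real library code.
    let divMod (a: int) (b: int) = a / b, a % b        // allocates a Tuple<int,int>

    let useDivMod x y =
        let q, r = divMod x y                          // deconstructed immediately,
        q + r                                          // yet the allocation isn't elided

    // 'inline' makes the F# compiler expand the body at the call site,
    // which in practice lets the tuple disappear after optimization.
    let inline divModFast (a: int) (b: int) = a / b, a % b

    let useDivModFast x y =
        let q, r = divModFast x y
        q + r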
F# doesn't even do fusion like Haskell, so for high-perf code, you're often giving up nice functional code and resorting to loops and mutability. Just check the F# stdlib implementation.
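Concretely, something like this (a toy example of my own, not from the stdlib): the idiomatic pipeline allocates an intermediate array per stage because nothing fuses them, so the hot-path version degenerates into a mutable loop.

    // Toy example: the pipeline builds one intermediate array per stage.
    let sumOfSquaresOfEvens (xs: int[]) =
        xs
        |> Array.filter (fun x -> x % 2 = 0)   // intermediate array #1
        |> Array.map (fun x -> x * x)          // intermediate array #2
        |> Array.sum

    // The hand-"fused" version you end up writing for hot paths.
    let sumOfSquaresOfEvensLoop (xs: int[]) =
        let mutable acc = 0
        for x in xs do
            if x % 2 = 0 then acc <- acc + x * x
        acc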
So while I really do love F# and enjoy writing in it, "faster than C" is not really applicable. A: You'll be writing C-in-F# (which is fine, if most of your app isn't that way), B: the codegen isn't remotely competitive with a modern C compiler.
[1] Edit: OK, that's an exaggeration, but I mean in comparison to what you'll get out of a C compiler for equivalent, low-level code.
Well, Java has also been shown to outperform C on numeric code. I don't exactly see that as a ringing endorsement of garbage collection and dynamic compilation. I suspect the same is true of the numeric benchmarks you have in mind.
Yes, I have used them heavily, and just like any other tool you need to know when to use it. You've misunderstood my statement, and that's probably my fault for being succinct. I wasn't saying that parallelism is the be-all and end-all solution, rather that I wouldn't shrug the research off as a fallacy when the researchers have a history of delivering.
Another problem, even in functional code, is that data dependencies tend to be linear. It's difficult to write code that can be automatically parallelized.
Actually the problem is a bit more complicated than that. It is often quite easy to write a program with a large degree of parallelism in its data dependencies. The problem is to actually implement an execution strategy with sufficiently low overhead that it speeds up execution. Communication overheads and poor memory locality often overwhelm any gains due to parallel execution.
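To make the dependency point concrete (my own toy example): a running total is a chain where each step needs the previous one, while an element-wise map has no cross-element dependencies at all, and even the latter only pays off once the per-element work outweighs the communication cost.

    // Illustrative sketch of linear vs. independent data dependencies.
    let runningTotal (xs: int[]) =
        xs |> Array.scan (+) 0                       // step i depends on step i-1: inherently sequential

    let squares (xs: int[]) =
        xs |> Array.Parallel.map (fun x -> x * x)    // no cross-element dependencies, trivially parallel
                                                     // (but only worth it when each element is expensive enough)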
I second the call of bullshit. Though I've only skimmed the paper, I don't see any concrete performance numbers. I only see this:
"The group has written sev- eral million lines of code, including: core libraries (includ- ing collections with polymorphism over element permis- sions and data-parallel operations when safe), a webserver, a high level optimizing compiler, and an MPEG decoder. These and other applications written in the source language are performance-competitive with established implementa- tions on standard benchmarks; we mention this not because our language design is focused on performance, but merely to point out that heavy use of reference immutability, includ- ing removing mutable static/global state, has not come at the cost of performance in the experience of the Microsoft team."
"we mention this not because our language design is focused on performance, but merely to point out that heavy use of reference immutability, ...has not come at the cost of performance in the experience of the Microsoft team."
Well, sort of, but then the moment you run into hazards [1], bad branch predictions [2] or any other problems, the CPU will either stall the pipeline a few cycles or just flush the whole thing, so it's not like it's a magic solution just waiting to be adapted.
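A rough way to see the misprediction cost (the classic sorted-vs-unsorted experiment; the array size and threshold here are my own assumptions): the same loop over the same data runs noticeably slower when the branch outcome is unpredictable, because each misprediction flushes the pipeline.

    // Sketch only: time each call with a stopwatch or benchmark harness to see the gap.
    let rng = System.Random(42)
    let data = Array.init 1000000 (fun _ -> rng.Next(256))

    let sumAbove128 (xs: int[]) =
        let mutable acc = 0L
        for x in xs do
            if x >= 128 then acc <- acc + int64 x    // data-dependent branch
        acc

    let sumUnsorted = sumAbove128 data               // random data: unpredictable branch, many flushes
    Array.sortInPlace data
    let sumSorted   = sumAbove128 data               // sorted data: predictable branch, far fewer stalls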
I think you're right about the industry part, but during a research internship I did two years ago I worked on SAC [1], which is a high-level, array-based functional language that does implicit parallelization. Okay, it's not really a general-purpose programming language, more of a DSL for numerically intensive computation, but the auto-parallelization does work.
Nope. Note the use of "auto" in the OP's message. With GCD the developer has to manually indicate where parallelism happens. There are numerous other systems for doing that.
The hard problem is making the compiler automatically figure it out, and as OP says they haven't done so usefully (yet).