I call bullshit. Researchers have been trying to make auto-parallelizing compilers for decades without anything substantive making it to industry.
Even in functional languages, auto-parallelization hasn't worked well because of the coarseness issue: it's difficult for compilers to figure out what to run in multiple threads and what to run single-threaded because of tradeoffs with inter-thread communication.
Have you used parallel extensions? They aren't always faster, and can often over-commit resources.
It's just like the GPU: you need massive amounts of data to overcome the communication overhead.
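A rough sketch of the overhead point, using Array.Parallel.map from FSharp.Core (the sizes and the trivial per-element function are my own illustrative assumptions, not measurements): on a small array with cheap work per element, the parallel version tends to lose to the sequential one because scheduling and fork/join costs dominate.

    // Sketch only: sizes and 'cheap' are illustrative assumptions.
    let small = Array.init 1000 id
    let big   = Array.init 10000000 id

    let cheap x = x + 1   // far too little work per element to pay for scheduling

    let seqSmall = small |> Array.map cheap            // usually fastest on small data
    let parSmall = small |> Array.Parallel.map cheap   // overhead often makes this slower
    let parBig   = big   |> Array.Parallel.map cheap   // parallelism only starts to win at scale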
That F# can outperform C on numeric code should tell you that the optimizations available to functional code far exceed those available to languages that poke bits.
This is just one more micro-optimization to compensate for what a bad idea it is to poke beads on an abacus rather than use the mathematical rigor available in calculus.
Do you have a citation for this? I love F# and I've written some fairly pointer-intensive code with it, but the CLR's code-gen is pretty bloody abysmal[1]. F# does more optimizations than C#, but most of the heavy lifting is left up to the CLR/JIT (obviously).
In fact, I find I have to experiment with F#'s inline feature for high-perf code, because the CLR optimizer works far better on big functions than on inlining and optimizing calls to smaller ones. Even basic things like eliminating the allocation of a tuple when it's obvious you're immediately deconstructing it aren't done. F# doesn't even lift lambdas, as of 2.0.
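For instance (a made-up micro-example of mine, not from any benchmark): a function that returns a tuple which the caller immediately deconstructs still pays for the tuple allocation, and marking it inline is the workaround I end up reaching for.

    // Illustrative only: divMod is a hypothetical helper, not real library code.
    let divMod (a: int) (b: int) = a / b, a % b        // allocates a Tuple<int,int>

    let useDivMod x y =
        let q, r = divMod x y                          // deconstructed immediately,
        q + r                                          // yet the allocation isn't elided

    // 'inline' makes the F# compiler expand the body at the call site,
    // which in practice lets the tuple disappear after optimization.
    let inline divModFast (a: int) (b: int) = a / b, a % b

    let useDivModFast x y =
        let q, r = divModFast x y
        q + r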
F# doesn't even do fusion like Haskell, so for high-perf code, you're often giving up nice functional code and resorting to loops and mutability. Just check the F# stdlib implementation.
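Concretely, something like this (a toy example of my own, not from the stdlib): the idiomatic pipeline allocates an intermediate array per stage because nothing fuses them, so the hot-path version degenerates into a mutable loop.

    // Toy example: the pipeline builds one intermediate array per stage.
    let sumOfSquaresOfEvens (xs: int[]) =
        xs
        |> Array.filter (fun x -> x % 2 = 0)   // intermediate array #1
        |> Array.map (fun x -> x * x)          // intermediate array #2
        |> Array.sum

    // The hand-"fused" version you end up writing for hot paths.
    let sumOfSquaresOfEvensLoop (xs: int[]) =
        let mutable acc = 0
        for x in xs do
            if x % 2 = 0 then acc <- acc + x * x
        acc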
So while I really do love F# and enjoy writing in it, "faster than C" is not really applicable. A: You'll be writing C-in-F# (which is fine, if most of your app isn't that way), B: the codegen isn't remotely competitive with a modern C compiler.
[1] Edit: OK, that's an exaggeration, but I mean in comparison to what you'll get out of a C compiler for equivalent, low-level code.
Well, Java has also been shown to outperform C on numeric code. I don't exactly see that as a ringing endorsement of garbage collection and dynamic compilation. I suspect the same is true of the numeric benchmarks you have in mind.
Yes, I have used them heavily, and just like any other tool you need to know when to use it. You've misunderstood my statement, and that's probably my fault for being succinct. I wasn't saying that parallelism is the be-all and end-all solution, rather that I wouldn't shrug the research off as a fallacy when the researchers have a history of delivering.
Another problem, even in functional code, is that data dependencies tend to be linear. It's difficult to write code that can be automatically parallelized.
Actually the problem is a bit more complicated than that. It is often quite easy to write a program with a large degree of parallelism in its data dependencies. The problem is to actually implement an execution strategy with sufficiently low overhead that it speeds up execution. Communication overheads and poor memory locality often overwhelm any gains due to parallel execution.
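To make the dependency point concrete (my own toy example): a running total is a chain where each step needs the previous one, while an element-wise map has no cross-element dependencies at all, and even the latter only pays off once the per-element work outweighs the communication cost.

    // Illustrative sketch of linear vs. independent data dependencies.
    let runningTotal (xs: int[]) =
        xs |> Array.scan (+) 0                       // step i depends on step i-1: inherently sequential

    let squares (xs: int[]) =
        xs |> Array.Parallel.map (fun x -> x * x)    // no cross-element dependencies, trivially parallel
                                                     // (but only worth it when each element is expensive enough)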
I second the call of bullshit. Though I've only skimmed the paper, I don't see any concrete performance numbers. I only see this:
"The group has written sev- eral million lines of code, including: core libraries (includ- ing collections with polymorphism over element permis- sions and data-parallel operations when safe), a webserver, a high level optimizing compiler, and an MPEG decoder. These and other applications written in the source language are performance-competitive with established implementa- tions on standard benchmarks; we mention this not because our language design is focused on performance, but merely to point out that heavy use of reference immutability, includ- ing removing mutable static/global state, has not come at the cost of performance in the experience of the Microsoft team."
"we mention this not because our language design is focused on performance, but merely to point out that heavy use of reference immutability, ...has not come at the cost of performance in the experience of the Microsoft team."
Well, sort of, but then the moment you run into hazards [1], bad branch predictions [2] or any other problems, the CPU will either stall the pipeline a few cycles or just flush the whole thing, so it's not like it's a magic solution just waiting to be adapted.
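A rough way to see the misprediction cost (the classic sorted-vs-unsorted experiment; the array size and threshold here are my own assumptions): the same loop over the same data runs noticeably slower when the branch outcome is unpredictable, because each misprediction flushes the pipeline.

    // Sketch only: time each call with a stopwatch or benchmark harness to see the gap.
    let rng = System.Random(42)
    let data = Array.init 1000000 (fun _ -> rng.Next(256))

    let sumAbove128 (xs: int[]) =
        let mutable acc = 0L
        for x in xs do
            if x >= 128 then acc <- acc + int64 x    // data-dependent branch
        acc

    let sumUnsorted = sumAbove128 data               // random data: unpredictable branch, many flushes
    Array.sortInPlace data
    let sumSorted   = sumAbove128 data               // sorted data: predictable branch, far fewer stalls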
I think you're right about the industry part, but during a research internship I did two years ago I worked on SAC [1], which is a high-level, array-based functional language that does implicit parallelization. Okay, it's not really a general-purpose programming language, more of a DSL for numerically intensive computation, but the auto-parallelization does work.
Nope. Note the use of "auto" in the OP's message. With GCD the developer has to manually indicate where parallelism happens. There are numerous other systems for doing that.
The hard problem is making the compiler automatically figure it out, and as OP says they haven't done so usefully (yet).