Chris Lattner on Swift and dynamic dispatch (swift.org)
183 points by _quhg on Dec 13, 2015 | 81 comments


The part of this (really excellent) writeup that caught my attention was this:

someone writing a bootloader or firmware can stick to using Swift structs and have a simple guarantee of no dynamic overhead or runtime dependence

There seems to be a pretty strong implication here that Swift could be used to write firmware/bootloaders, and other low level constructs - including operating systems. Has anyone worked with Swift yet on that type of project?


Except for the minor part that it's not true.

The performance variations in Swift are (as of now) much larger and less predictable than the overhead of dynamic messaging in Objective-C, and the latter can always be removed by using IMP-caching or converting to a C function.


The claim is not about Swift in general, but of a limited subset of the language: "someone writing a boot loader or firmware can stick to using Swift structs and have a simple guarantee of no dynamic overhead or runtime dependence"

Based on what I know about Swift (reading Apple's manual, a tiny bit of experimenting, and decent knowledge about how compilers work), I would expect Swift to compile purely statically, like C, for this case. Performance may still be lower because the compiler is newer, but I wouldn't expect performance variations.

And of course, the standard library may not be optimal for this use case. The implementation of strings, for example, may be too dynamic for embedded work.


If you're going to limit yourself to a subset of a language, then any "language" can offer that guarantee. But it's really not that language any longer.


My measurements were also with a limited subset: integers, floats and arrays.

Depending on compiler optimization settings, you get from 30% to 1000x (yes, 3 orders of magnitude!) slower than C. Yes, technically that's not "dynamic" overhead, but from where I stand that's still orders of magnitude more variability and unpredictability (for reasonable values of "predictability") than even byte-coded Smalltalk.


Unbounded integers are much slower than wrap-around integers that C uses. So, unless you are comparing some kind of Swift int64 type with C ints, you are comparing apples to oranges. The reason is that for every "x + 1" expression, Swift has to ensure that the value doesn't overflow the size of a hardware register. C of course has no such qualms.


So integers are also not in the 'restricted subset'? Like arrays? Ooookay...

Anyway, Swift's integers are not unbounded, just checked.


I don't know. Apparently integer arithmetic can cause overflow exceptions, which I assume requires the runtime environment, which would be outside of the restricted subset. I don't think Swift can be used to profitably write a bootloader, but we'll see.


The operation + on integer types maps to a built-in method which performs addition with an overflow check. At the assembly level this looks like an "add" followed by a "jo" (jump if overflow) to a trap such as the ud2 undefined instruction.

If you want wraparound addition you can use &+ which removes the overflow check.
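A tiny sketch of the difference (Int8 chosen just to make the wrap visible):

    var x: Int8 = 127
    x = x &+ 1        // wrapping add: no overflow check, x is now -128
    print(x)          // -128
    // x = x + 1      // the ordinary "+" is overflow-checked and would trap here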


Exceptions can't be caught in Swift, so not much runtime needed except an interface to kill(2) or plain old exit. :-)


If you're using arrays, you've departed from the limited subset in question. They're structs on the outside but complicated on the inside.


I wonder why the designers of Swift decided to make arrays structs on the outside but complex on the inside. Why not make them objects, i.e. reference types, as in Java and .NET?


Because having collections be value types makes them really nice to work with.
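For example (a small sketch; the Point type and the values are made up):

    struct Point { var x = 0, y = 0 }

    let a = [Point(), Point()]
    var b = a            // value semantics: b behaves as an independent copy
    b[0].x = 42
    print(a[0].x)        // 0 -- mutating b never touches a

Under the hood the copy is deferred via copy-on-write, which is a big part of the "complicated on the inside" mentioned above.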


I've seen increasing levels of unpredictability at a surface level with all languages as compilers have improved. The difference between a constant expression being folded or not can easily be of the order of 1000x (once it has been propagated through loops etc.). If you've still got the source for your tests then it's well worth looking at the IR emitted and the assembly produced from that to see where those differences are coming from.


Yes, with enough information, sometimes the compiler can compute the result of a loop and then discard the loop itself.

I made sure this wasn't happening.


I'd be very interested to know what that benchmark is, because you're either doing something very wrong, or it isn't relevant to the systems programming topic being discussed. A 1000x perf difference should be easy to show.


Just summing floats in an array, compiled without optimization, and yes, that surprised me.

My Postscript interpreter written in Objective-C was 2x faster. :-)

With optimization that went down to around 10x and wouldn't budge, even with -Ounchecked, UnsafeMutablePointer, varying an indexed for-loop and reduce. As I wrote, it's not just the absolute perf., it's the variability and unpredictability.
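For reference, "summing floats in an array" means roughly this shape of loop (a sketch in Swift 2-era syntax, not the exact harness):

    let values = [Float](count: 10_000_000, repeatedValue: 0.5)
    var sum: Float = 0
    for v in values {
        sum += v
    }
    print(sum)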


>Just summing floats in an array, compiled without optimization,

Ok, so you're both doing it wrong and doing something irrelevant to that level of systems programming. Measuring the performance of non-optimized builds will always be pretty meaningless (even C compilers can have radical perf swings from release to release at -O0), and reporting that as a rebuttal of whether Swift is relevant for "bootloader or firmware" is just outright trying to mislead people.

> With optimization that went down to around 10x and wouldn't budge, even with -Ounchecked, UnsafeMutablePointer, varying an indexed for-loop and reduce.

Beyond what you said, I suspect that all of these impressions were formed with Swift 1. Believe it or not, things have improved a lot since then. On an "add array of floats" benchmark, the vectorizer kicking in or not is a 4x or more performance difference. This should be fixed in Swift 2, but more to the point, this is a benchmark irrelevant to "boot loaders and firmware".

-Chris


> doing something irrelevant to that level of systems programming

Except: no. I reported the float-array result because it was literally the 2nd test I ran and I hadn't gotten around to integer arrays yet. I was expecting this to be a complete doozy, one of the things Swift should be acing by now. Of course I am aware that you wouldn't use floats in systems programming, but as you should know, floats today are simple machine-level scalars that present little if any overhead to the CPU compared to integers. That is, this test uses a few basic machine operations: indexed memory access, loops, scalar arithmetic.

Whether those scalars are floats or integers should be largely irrelevant to the outcome of the test and therefore the test is a good enough (if not perfect) proxy for the types of operations you would see in systems programming.

Are you seriously claiming that the results would be different if the scalars in question had been integers instead of floats? Seriously??

If you had made such a claim, you'd be wrong: I've now run the same test with integer arrays and the result is exactly the same. Next you are going to tell me that integers are irrelevant to systems programming?

> [non-optimized builds] and reporting that as a rebuttal of whether swift is relevant for "bootloader or firmware" is just outright trying to mislead people.

No, it is you who is misleading people by completely mischaracterizing what I wrote. I wrote that performance varies from 30% worse to 1000x worse "depending on optimization level". This is exactly what is happening. (Of course, for this precise test it is 10x worse to 1000x worse, but who's counting?)

Claiming that non-optimized builds are irrelevant is also disingenuous at best, because non-optimized is the default setting for non-release builds in Xcode. Having this type of performance (variance) means that debug builds are effectively unusable: an operation that would take 100ms in a release build would take close to two minutes in a debug build.

Yes, other languages also have different performance levels between debug and non-debug builds, but size matters. 2-10x is something you can deal with, 1000x is not.

Furthermore, the problem is not so much the 1000x worse performance (although that is funny); it is the difference between optimized and non-optimized builds. The optimizer is not part of the language spec, meaning you can't rely on it. In the next compiler release the compiler can change a little bit so that it won't optimize one part of the code that you needed to run fast. Or you change the code a little bit in a way that the optimizer doesn't like, and the effect is the same. And of course the compiler doesn't tell you which optimizations ran or did not run, so you are dealing with the blackest of black boxes. One day your code is fast, the next day it is not.

This is not predictable performance, and today predictability is often more important than raw speed for performance work. See my Jitterdämmerung post[1] or The Death of Optimizing Compilers[2].

> Beyond what you said, I suspect that all of these impressions were formed with Swift 1

You suspect wrong, all this was using the newest Xcode tools (7.1.1 previously, just repeated with 7.2):

    Apple Swift version 2.1.1 (swiftlang-700.1.101.15 clang-700.1.81)
    Target: x86_64-apple-darwin15.2.0
And by suspecting wrong, you kinda make my point for me, so thank you, Chris! Swift's performance model is so amazingly unpredictable that even the language's creator, compiler- and optimizer-wizard extraordinaire can't figure it out. How are mere mortals supposed to do so?

When doing systems programming and embedded programming, predictability is paramount. I remember the story of the railway engineer who scoured the planet for remaining stocks of 8085 processors, because he thought he understood the bugs in that particular processor.

Now sure, you certainly can use Swift for these types of tasks despite all the issues. But I suspect you'd be better off using FORTH.

Marcel

[1] http://blog.metaobject.com/2015/10/jitterdammerung.html

[2] http://cr.yp.to/talks/2015.04.16/slides-djb-20150416-a4.pdf


Gist?


‘It’s designed to scale from “hello, world” to an entire operating system.’

https://developer.apple.com/library/ios/documentation/Swift/...


"Swift is a successor to both the C and Objective-C languages."

https://developer.apple.com/swift/

It is only a matter of trying and seeing what might still be missing in terms of toolchain.


People interested in Swift at this level should also read Lattner’s comment on /r/rust about the possibility of extending Swift to include Rust’s statically checked ownership and borrowing concepts:

https://www.reddit.com/r/rust/comments/3vadg8/swift_is_open_...


If you read that, though, it is not something that's a high priority for the team. Basically, 90% of devs won't need or want some of these features. Unless Apple decides to make a focused effort around recoding the core of OS X in Swift, which would require these changes, it's hard to see when the language would gain these options.

Don't get me wrong, Swift is an awesome language, but I see Rust as maintaining a position as a systems language equivalent to C, while Swift will possibly start supplanting Java/Go/C++/Python use cases.


Historically, languages that have tried to move from automatic memory management to opt-in manual memory management haven't had a lot of success in doing so. Java direct buffers don't let you do anything more than simple flat heap allocations, and you can't really get away from GC. D hasn't been able to shed a practical dependency on the GC, despite years and years of work. There was some early work to try to use Go in places where GC isn't acceptable, but in practice you can't really avoid it. And so on.

Interestingly, the reverse is also true. C++ does not make a very good managed language: despite the occasional claim to the contrary, the lack of memory safety and the library ecosystem mean you can't just use shared_ptr "for everything" and forget about memory management in practice. Rust is also in a similar situation: borrowing is front and center, and the language doesn't really try to make managed programming palatable: you have to explicitly clone reference-counted pointers, and you need Cell/RefCell if you want to mutate their contents.

A lot of this has to do with the library ecosystem. Even if the language supports manual memory management, if the package ecosystem doesn't you can't really program with it in practice. I don't know of any language (with one possible exception below) that has effectively subdivided its package landscape into managed and unmanaged ecosystems.

Objective-C is probably one of the few exceptions where you can really mix and match the two worlds. That's because there are effectively two library ecosystems: the C/C++ ecosystem on one hand, and the native Objective-C system on the other. It's a special case because Objective-C/Objective-C++ basically mashes two popular languages together while maintaining full compatibility with the existing C/C++ library landscape. I don't know if you can replicate this without source-level C and C++ compatibility (which Swift drops).

The upshot of this, in my mind, is that assuming history repeats itself, multiple languages are probably going to thrive. There won't be one language to rule them all, and large software stacks will mostly continue to consist of components written in multiple languages. Which is, of course, great for us compiler developers, and programming language innovation in general. :)


I get your points, and they're totally valid criticisms of Rust, but I see them as learning curve issues as opposed to oppressive things that are incompatible with certain development practices.

In C it's easy to pick up the language, but managing memory properly is very hard, if not completely unclear in certain cases; C++ is similar.

Rust does have direct support for FFI with C in both directions. Objective-C can easily incorporate C, but it is not easy to go the other direction because of the runtime requirements. The safety of Rust requires learning some patterns to develop properly, but it is quite honestly amazing in terms of the guarantees you get.


I'm not criticizing Rust. I'm on the Rust core team and I think it's the best choice out there if you want or need the performance that low-level systems programming offers. In particular, I think it's pretty much an unmitigated improvement over C++. ;)

I just happen to like dynamic languages too, as well as managed languages like Swift.


Then I guess we're in total agreement ;)


As someone who would like to see C replaced by something better and who has always been a fan of Wirth-influenced languages, I think Swift is actually in a better position to achieve that than Rust.

The reason is that Apple, like the OS vendors that adopted C and later C++ before it, can allow itself to push the language onto developers regardless of what they think they should use.

All the systems languages that have survived in the mainstream have done so by being adopted by an OS vendor.


Apple isn't going to stop supporting C and C++ on their platforms, so it can't "push" Swift at the systems level the same way it can push Swift at the application level on iOS.

I also disagree with your repeated claims that systems languages can only survive when they are pushed by an OS vendor. C and C++ dominate for a host of historical reasons (which I'm sure you're well acquainted with). Meanwhile, Rust, at a low level, is such a transparent skin over C and C++'s low-level semantics that it can almost be considered an alternate syntax for those languages (albeit an alternate syntax with extreme static analyzability).

The takeaway should not be "systems languages only succeed when pushed by an OS vendor", it should be "programming languages tend to succeed when backed by a large corporate vendor and when they are used to write widely-used software that people come to rely on". Swift is guaranteed a bright future by dint of being the application programming language of choice for iOS. Rust's long-term fate likely hinges on Servo, but it's too soon to say whether that's the only large project that has a chance of entrenching Rust (give it six more months at least to see what nascent Rust projects are gathering in the post-1.0 calm).


> I also disagree with your repeated claims that systems languages can only survive when they are pushed by an OS vendor.

I was there when Assembly was the only viable choice for targeting home computers, so excuse me if my life experience has made me a bit biased.

So when I see Apple going with Swift, Google and Microsoft pimping up C++ and contributing to the C++ Core Guidelines, I get a deja-vu feeling.

I hope that Rust does get adoption, that killer feature project that blows wind into its sails.


> excuse me if my life experience has made me a bit biased.

I appreciate the breadth of your experience, but I think that you're overlooking the context of modern systems languages. :) The systems languages of old were wedded to their intended operating systems (indeed, the motivating product was usually the OS itself, with the language being but a necessary feature; Unix/C is no exception here). As history ran its course and those OSes fell by the wayside, those languages were naturally marginalized (did there even exist usable Mesa/Modula-2/etc. compilers for Unix?).

Rust is different because it is not beholden to the fate of any single operating system. Not only is it cross-platform by intent, it has hitched its cart to a world-class retargetable backend for the dominant systems languages of today, meaning that platform availability will never be a problem.

There are still plenty of reasons why Rust could fail, of course. But Rust does not require the OS vendors' permission to exist on their platforms (well, except for on non-jailbroken iOS, anyway), and the Rust devs are not especially concerned with those platforms integrating Rust into their kernels (though I'm sure they wouldn't mind).


Out of interest, in what way do you consider Swift a Wirth-influenced language?

Go seems a closer cousin to Wirth's languages. Its strict type coercion, value copying semantics, package system, lack of OO, use of ":=", type declarations with "type", "var", GC, colons, etc. are all reminiscent of Wirth's languages, especially Oberon. (Robert Griesemer worked with Oberon under Wirth at ETH Zürich, so there's a clear relationship.)

Another highly wirthian language is Nim, though its influence comes via Delphi/ObjectPascal.


> Out of interest, in what way do you consider Swift a Wirth-influenced language?

In the sense that systems programming languages shouldn't sacrifice safety, except when needed, and should be explicit about it with constructs like unsafe/SYSTEM and so on.

Swift is very explicit when doing C-style tricks, e.g. UnsafePointer<>.
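For instance, even a simple C-style buffer spells the unsafety out at every step (a small sketch in Swift 2-era syntax; the values are made up):

    let buffer = UnsafeMutablePointer<Int>.alloc(4)   // later Swift: .allocate(capacity: 4)
    defer { buffer.dealloc(4) }                       // later Swift: .deallocate()
    for i in 0..<4 {
        buffer[i] = i * i                             // pointer subscripting, no bounds checks
    }
    print(buffer[3])                                  // 9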

There is more to the Wirth school of languages than just syntax.


The only "safe" Wirth language I know about is Oberon-2, which has bounds-checking of arrays, sentinels and runtime-checking of pointers — though no compile-time checking. Is this what you're referring to?

I like Wirth, but his type systems were always super simple, and as far as I know, he never innovated in compile-time safety the way that, say, Rust or Haskell do. Again, Go seems a better analogue; anything that needs to deal with unsafe pointers needs to use the "unsafe" package.


Not sure what you mean by runtime vs. compile-time checking of pointers.

Wirth's type systems weren't always super simple, not when compared with C-style ones.

Wirth designed Pascal, Modula-2, Oberon, Oberon-2, Active Oberon and Oberon-07.

He contributed to Extended Pascal, Object Pascal and Component Pascal.

He also worked on Algol compilers and some of its dialects, and on Mesa and Cedar while at Xerox PARC.

His work had an influence on Ada, Modula-2+ and Modula-3.

Some of the type-safe things that Modula-2 already offered in 1978, and that C still doesn't do today:

- Type-safe enumerations

- Bounds-checked arrays

- Arrays indexed by enumerations

- Type-safe sets

- Range-typed integers

- No implicit conversions

- Reference parameters instead of pointers for function/procedure calls

- Tagged records

- Low-level coding/assembly intrinsics require use of the SYSTEM module

The only thing lacking was automatic memory management, which we later introduced in the Oberon family.

However, as you can see from the list, there is a whole class of errors that simply isn't possible.

If you like to research old stuff, have a look at the Burroughs B5000 system, which used Algol as its systems programming language in 1961.

Common to his languages and the Xerox PARC ones was the idea that you can do systems programming in higher-level languages, provided the language offers some unsafe path that has to be explicitly marked.

But type safety should come before the "performance at any cost" mantra.

Something also mentioned by Hoare, in relation to Algol, in his 1980 ACM Turing Award lecture:

"A consequence of this principle is that every occurrence of every subscript of every subscripted variable was on every occasion checked at run time against both the upper and the lower declared bounds of the array. Many years later we asked our customers whether they wished us to provide an option to switch off these checks in the interest of efficiency on production runs. Unanimously, they urged us not to - they already knew how frequently subscript errors occur on production runs where failure to detect them could be disastrous. I note with fear and horror that even in 1980, language designers and users have not learned this lesson. In any respectable branch of engineering, failure to observe such elementary precautions would have long been against the law. "

The problem is that younger devs never learned any of this and probably the only thing they know about Wirth languages is some braindead Pascal compiler.


Oberon-2 had runtime-checking of pointers: It could check the validity of a pointer and immediately panic rather than return garbage or crash with a page fault. Similarly with arrays.

Wirth did some nice things with type safety, of course (I've always liked his type-safe set/range types), but he never was much into type systems. For example, his set and range types were hardwired into the language; you couldn't build a set-like type yourself, or build something that acted as a range, or overload any of the standard functions such as inc() or min(), or indeed define any type coercion. With the exception of Oberon-2's (limited and highly Go-like) notion of abstract classes, nothing was polymorphic; every type was only "itself" and its purpose was always entirely about holding data. This is what I mean by his type systems always being "simple". He was attacking practical engineering problems, not type-theory problems.

Go is exactly the same. Swift isn't; it has generics and sum types and protocols and OO and pattern matching and lambdas and a bunch of other stuff that Wirth would probably find too complex (his last few languages tended towards mercilessly removing features).

I don't disagree that Wirth's languages are more advanced than C, though!


> Oberon-2 had runtime-checking of pointers: It could check the validity of a pointer and immediately panic rather than return garbage or crash with a page fault. Similarly with arrays.

Ah, but this was a property of any sane systems programming language with automatic memory management.

> With the exception of Oberon-2's ....

How well do you know Active Oberon, Zonnon and Component Pascal?

They extend Oberon(-2) with abstract classes, generics, tasks (active objects), type extensions, method definition signatures.

Also, many consider his work an influence on Ada and on Modula-2's successors, whose type systems are not very far from C++'s.

Incidentally, he became disillusioned with these language variants and went with a minimalist view (Oberon-07) that makes even Go's type system look complex.


Sure, Active Oberon is pretty advanced, and Modula-3 had all sorts of things, including generics — but to my knowledge Wirth was not directly involved in those languages (or in Zonnon or Component Pascal), and he was unhappy with their complexity. (He apparently prefers the entire language's grammar to fit on a single screen.)


> in what way do you consider Swift a Wirth-influenced language?

It looks to me like you misunderstood the post.


How so? He wrote: "As someone that [...] has always been a fan for Wirth influenced languages, Swift [is in a better position]." Why is the sentence premised on Wirth, if it's unrelated to Swift?


Because Wirth designed/influenced languages for system programming that I consider better than C.

Algol, Mesa, Object Pascal, Modula-2, Modula-3, Oberon, Oberon-2, Active Oberon, Oberon-07, Ada

All of them were used to write full operating systems for a few years.

However, except for Ada, all of them suffered from not being adopted by OS vendors that were able to deliver a mainstream OS, as the mainstream OS vendors adopted C to replace Assembly (Object Pascal, in Apple's case) and eventually C++.


I feel like Swift reduces the runtime availability of dynamic dispatch by default... so for things like live programming or runtime hacking Swift is worse than Obj-C.

I envision a future similar to the "Smalltalk dream", where one can build apps live, with minimal recompilation, and where dynamic dispatch should be the default. I think this scenario is less ideal for "systems programming", but for areas where one is snapping together UI components and data sources I think it's ideal. Perhaps someone here can come up with good counterexamples why this isn't the case?

Finally, I think static dispatch, as is favored by Swift, tends to paint one into a corner down the road... but it's possible Swift has enough leeway so that this isn't the case?


Snapping together UI components and data sources has existed for a long time, but it always seems to fail to be popular outside of niche uses. That's often due to the overhead of runtimes and the unsuitability of generic UI components when you get into the nitty gritty of the application being designed. Of course, that doesn't mean it can't be done, just an observation that the attempts I've seen over the past 20 years or so haven't been very good, ultimately.

I've often been curious why programming by manipulating metaphors on screen is seen as an ultimate goal in the evolution of programming languages. It seems a bit like creating a book by sticking pre-generated paragraphs together.

On your final point - static dispatch doesn't prevent you from making your own dynamic dispatch mechanism in the language.
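For example, a dispatch table built out of plain closures already gives you a runtime-chosen target (a toy Swift sketch; the names are made up):

    var handlers: [String: (Int) -> Int] = [:]
    handlers["double"] = { $0 * 2 }
    handlers["square"] = { $0 * $0 }

    let name = "square"            // imagine this arrives at runtime
    if let f = handlers[name] {
        print(f(7))                // 49
    }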


> It seems a bit like creating a book by sticking pre-generated paragraphs together.

But books are not tools or machines, whereas software applications are (not counting art & entertainment, generally.)

Nobody creates every screw and electrical component from scratch when designing and assembling other tools and machines.

"Programming by manipulating metaphors on screen" would be more akin to using predefined mathematical symbols and formulas, and just putting in the numbers and variables related to the problem you want to solve or the task you want to perform.

Even most popular genres of games could be made entirely from wiring up predesigned components together in a visual environment, without writing any code at all. You would just supply your own graphics and sound and other content.


I think your vision of these tools is far more sophisticated and capable than the current reality. It's certainly been an industry dream to produce tools that work at the high level you've described, but they have very limited domains and thus far, less than impressive results.


> It seems a bit like creating a book by sticking pre-generated paragraphs together.

A poor metaphor. Maybe more like "creating a book by joining high-level expressive words together instead of having to define every concept from scratch for every book"


The funny thing is, this type of programming for just about any type of program can be done right now if you are willing to let go of total live programming at the granular expression level and use it for the coarser structure of a program. A lot of ideas seem to have failed due to people trying to force them into being a silver bullet that must be applied 100% consistently down to the base constructs.


I'm definitely a novice programmer but that entire post is a foreign language to me.

On what level do these concepts impact me? Performance? Managing large code base? Easy, hard, impossible to do something?


Chris Lattner is a compiler author (LLVM/Clang, Swift), so this post is most definitely talking about compiler semantics: pretty low-level stuff that isn't really visible to the end user (the programmer using the language), except in C++.

When a compiler is compiling a program and you make a function call (let's say `int x = obj.compute()`), the compiler has to know where the code for that function is. In C (and C++ to an extent), this is easy: functions aren't very fancy and the compiler can just go to that place in memory, which doesn't change during runtime. These types of functions are called "static functions" and the compiler's method of calling them is called "static dispatch". Since the code that runs is very predictable, it's easy for the compiler to optimize the function call.

In other languages (like Ruby, Java, Swift), a statement like `int x = obj.compute()` has multiple meanings if `obj`'s class can be subclassed. For example, if you have an inheritance structure like (Dog, Cat) > Mammal > Animal, `obj.compute()` could mean the compute function on Animal, Mammal, Dog or Cat. At compile time, the compiler may have no way of knowing which definition of `compute()` to call. So during runtime, it will analyze the type information and call the correct function. These are called "virtual functions" and they are called with "dynamic dispatch". Because there are multiple functions that could run, actually predicting what will run on the machine is hard.

Dynamically dispatched function calls are generally slower than static ones, and a compiler would prefer static calls. Chris goes over the many different methods languages use to try to make dynamic calls faster - and Swift, thanks to its simple programming model, (AFAIK) gets "smart" about function calls and can make statically dispatched calls where it can (Java can do this too, but what makes Swift "cool" is that it doesn't need a just-in-time compiler (JIT) to do so).

What all this means to you (the end user) is that you can use all these fancy functions in Swift without having to worry about performance. The overhead of a function call may be something you have never considered (I certainly don't think about it), but I'm sure it's something compiler geeks obsess over.
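To make that concrete, a tiny Swift sketch (the types are made up, and in practice the optimizer may well devirtualize the class call when it can prove the concrete type):

    class Animal {
        func compute() -> Int { return 0 }
    }
    class Dog: Animal {
        override func compute() -> Int { return 1 }
    }

    struct Counter {
        func compute() -> Int { return 2 }
    }

    let obj: Animal = Dog()
    let x = obj.compute()   // dynamic dispatch: which compute() runs is decided at runtime (Dog's here)
    let c = Counter()
    let y = c.compute()     // static dispatch: struct methods can't be overridden, so the
                            // compiler knows the exact function at compile time
    print(x, y)             // 1 2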

I may be 100% wrong here; I'm not a compiler author, I can only make a guess from C/C++ experience, and I've never used Swift before.


Thank you for the explanation!


great explanation, thank you


When you call a function f(x) it may, broadly, be the case that either the compiler knows exactly which function f is at compile time or f may be determined using information such as which x is being passed in—something only known during runtime execution.

Requiring that the compiler be able to determine what f is at compile time is "static dispatch", and it enables powerful compilers to work hard, ahead of time, to vastly improve program efficiency. Without that information, a program either has to look up which f to call every time during execution (dynamic dispatch overhead) or use a just-in-time compiler to optimize out this lookup when, at runtime, it has the information required to know that f is always going to be looked up the same way.

Notice that this is still runtime overhead as you have to bundle a compiler alongside every program, but it's runtime overhead only once. Java is famous for this JIT behavior: expensive runtime calls have to be called enough times to pass the "burn in" period where they get optimized away and then run quickly.

Languages can choose from various points on this spectrum. Lattner, the designer of Swift, is reviewing the choice he made in Swift, which pairs some static features against some dynamic features. It's a fine balance to make, and he claims that Swift's choice is particularly nice since it provides user-predictable performance (a) without completely eliminating all dynamic dispatch and (b) without requiring a just-in-time compiler.


It's talking about the trade-off between dynamic dispatch and more static constructs. Basically it has to do with when and how overridden functions, class methods, struct members etc. are chosen.

Generally, static dispatch is a lot faster, but dynamic dispatch makes polymorphism a lot easier, which is an important feature of object oriented programming.

Really, the details of the different language designs are mostly only important if what you are doing is highly performance-critical or if you are limited by your runtime (i.e., writing a bootloader or bare-metal OS code/firmware).


It's about the performance of selecting which function to call:

https://en.wikipedia.org/wiki/Dynamic_dispatch


Novice or not, the post shouldn't be foreign language to you. I mean, if you just start out it's ok if it is, but eventually you should learn about that stuff.

Now, if you only program with languages like Javascript or Python, then you can maybe ignore it, since those have just one method to do those things.

Otherwise, there's nothing that low-level in 80% of the post. It's not like the low-level intricacies of IEEE floating point, or heavy internal compiler workings. Stuff like static or dynamic dispatch is something all programmers can benefit from knowing about, especially how it's handled in the language that they use.


How does the presence of C++'s placement new() affect devirtualization opportunities? I'm sure Chris knows what he's talking about, so I'd like to hear more.


I was going to write a response but bdash over on reddit did a great job, so I'll just link that: https://www.reddit.com/r/programming/comments/3wla9f/chris_l...

One interesting note is that, the last time I looked into this, the compiler _is_ allowed to assume that multiple sequential vtable calls do not replace the vtable, allowing the vtable lookup to be cached. But that assumption does not hold across a call to an arbitrary function that is external to the compilation unit.


In practice neither GCC nor Clang seems willing to believe that the vtable is immutable. See https://goo.gl/y0cx1r, for example, which shows only the first of two calls being devirtualized. The call to fprintf within B::work is sufficient to cause both compilers to reload the vtable pointer, preventing devirtualization of the second call.

Interestingly enough, when a use of placement new is visible to the compiler it can prevent the conservative behavior mentioned above. In https://goo.gl/Y8rwYG the vtable store that placement new conceptually generates allows Clang to devirtualize the second virtual call. GCC doesn't appear to catch this case.


Thanks for digging in! I saw the same thing with clang last year, in the context of Emscripten, where we were hoping to eliminate the redundant vtable loads as a JS code size optimization.


Can someone elaborate on what methods a JIT uses to boost performance as compared with AOT compilation? Do modern JIT compilers use statistics about what code actually got called to optimize call dispatch? Where can I find more info about this topic?


You should really watch the "A JVM Does That?" video by Cliff Click, which is just excellent, but the brief answer is this:

Java has global knowledge of all loaded classes which implement an interface. So in the easy case, for a given call site, only one class implements a given interface. Wonderful: devirtualize it (direct calls, inlining if the methods are small). Compared to C++, you skip the vtable memory load and the jump.

They're also smart enough that, even if multiple classes implement an interface, they count who gets called at a call site and speculatively devirtualize (i.e. devirtualize or inline with a guard check). This is a big piece of why they're so much faster than C++.

Imagine this situation: I have an interface for probability distributions, and that interface is implemented by e.g. gaussian, exponential, weibull, etc. At run time, I draw a million times from one of them, then I pick another by asking the user, draw a million times, etc. So there's a tight loop that bangs on that interface over and over. In C++, you cannot devirtualize that; each call to random->getRand() involves a vtable lookup and a jump. Java will notice that you picked, e.g., gaussian, so the call site will be devirtualized, skipping the vtable lookup and possibly the jump. Then, later, when the user says to draw a million times from poisson, it will deoptimize, notice that the call site is monomorphic with a different implementation, and devirtualize it again. Inlining code based on usage, and being able to change that inlining at runtime, is often a very effective performance-improvement technique.

Now the ugly bit, and why the JVM is one hell of a piece of engineering: you can load classes at any time during the life of a Java program, so even if at the start there is only one known class implementing a given interface, that can change. Therefore, you have to be able to go back to already-generated code and revirtualize/deoptimize it. But this magic all just works.


https://www.youtube.com/watch?v=uL2D3qzHtqY

A JVM Does That? Google Tech Talk by Cliff Click


JITs do know the call target. This is a quote from "A JVM does that?"[0]:

  C++ avoids virtual calls – because they are slow
  ● Java embraces them – and makes them fast
  ● Well, mostly fast – JIT's do Class Hierarchy Analysis
  ● CHA turns most virtual calls into static calls
  ● JVM detects new classes loaded, adjusts CHA
  – May need to re-JIT
  ● When CHA fails to make the call static, inline caches
  ● When IC's fail, virtual calls are back to being slow
----

There are several misconceptions about Java in the write-up. The real reason there won't be a Java bootloader is the need for the Java runtime and the need to GC. Unoptimized virtual calls won't affect any bootloader for that matter (and you can/should use static and/or private methods for the inner loops).

[0]:https://www.youtube.com/watch?v=uL2D3qzHtqY


He doesn't say you won't see a Java bootloader because of unoptimized virtual calls; he says it's because the runtime is too big. CL: It also means that Java doesn’t “scale down” well to small embedded systems that can’t support a JIT, like a bootloader.


OTOH, since the compilation model assumes a JIT, this means that purely “AOT” static compilers (which have no profile information, no knowledge of class loaders, etc) necessarily produce inferior code. It also means that Java doesn’t “scale down” well to small embedded systems that can’t support a JIT, like a bootloader.

This is the quote. He talks about the JIT only; no runtime is mentioned.


He might, somewhat tersely, be conflating the JIT and the 'runtime' as a whole - see the mention of class loaders - but his meaning is clear: it's about size rather than performance. 'It also means' etc.


> it's about size rather than performance.

The tradeoff, really: what he's saying is that you can JIT java to get speed but you'll lose size (resident), or you can AOT-compile java to improve size but then you lose speed, because calls may not be possible to devirtualize (unless you have PGO maybe? You'd lose some speed and size to guards and duplicated dispatch, though).


You can't really AOT compile java and have it be java, and it seems fairly obvious to me he's talking about size and runtime-related things, into which he throws a JIT (i.e. the compilation model 'assumes' a JIT). I think he's just being slightly loose with his terminology because he's just replying on a mailing list without expectation that random nerds on an internet forum will be talmudically dissecting his semiotic intent.


> You can't really AOT compile java and have it be java

Of course you can.

> and it seems fairly obvious to me he's talking about size and runtime-related things

I'm not sure why you're so intent on reinterpreting his rather plain words (and insulting people when they don't agree with your broad reinterpretation).

Lattner is literally just noting that an AOT-compiled Java program has no knowledge of Java's dynamic semantics (class loaders et al.), so it can't devirtualize calls and must necessarily pay the dynamic dispatch cost, in the same way that e.g. messages in Objective-C can't be devirtualized. In essence, an AOT Java is slightly better than a straight interpreter, but not by much.

The whole essay is about static versus dynamic dispatch and how languages provide for those; it's not exactly a stretch to assume that when Lattner talks about inferior code, that's what he's talking about: bootloader = no JIT = pervasive dynamic dispatch = slow. You don't have to bring your own pet issues into that.


Not nearly as bad as an interpreter, especially if you do whole program optimization. In fact, Android switched from the Dalvik JIT to ART, which compiles AOT-ish.


> You can't really AOT compile java and have it be java

IBM J9, Aonix PERC, Excelsior JET, JamaicaVM, RoboVM are just a few of the commercial JVMs that also AOT compile Java.

Oracle is also planning to have AOT as an option in future versions of the compiler.


I don't think you can have a JIT without a runtime. Something has to decide when it is 'just in time' to compile a piece of code, and something has to run the piece before it gets JITted. Or am I overlooking a narrower definition of 'runtime'?


Maybe I'm missing something, because people have implemented JVMs on embedded systems, including GC. An example is leJOS, which runs on the Lego Mindstorms, including the original RCX with its 8-bit microcontroller and 32K of RAM.


Read about polymorphic inline caches which Java and Javascript JITs do.


Here's a good, if old, quick intro to some of the stuff the JVM does:

http://blog.headius.com/2008/05/power-of-jvm.html

Choice section:

The JVM is basically a dynamic language runtime. Because all calls in Java are virtual (meaning subclass methods of the same name and parameters always override parent class methods), and because new code can be loaded into the system at any time, the JVM must deal with nearly-dynamic call paths all the time. In order to make this perform, the JVM always runs code through an interpreter for a short time, very much like JRuby does. While interpreting, it gathers information about the calls being made, 'try' blocks that immediately wrap throws, null checks that never fail, and so on. And when it finally decides to JIT that bytecode into native machine code, it makes a bunch of guesses based on that profiled information; methods can be inlined, throws can be turned into jumps, null checks can be eliminated (with appropriate guards elsewhere)...on and on the list of optimizations goes (and I've heard from JVM engineers that they've only started to scratch the surface).

This is where the call site optimizations get their second boost. Because JRuby's and Groovy's call sites now move the target of the invocation much closer to the site where it's being invoked, the JVM can actually inline a dynamic call right into the calling method. Or in Groovy's case, it can inline much of the reflected call path, maybe right up to the actual target. So because Groovy has now added the same call site optimization we use in JRuby, it gets a double boost from both eliminating the dispatch overhead and making it easier for the JVM to optimize.

Of course there's a catch. Even if you call a given method on type A a thousand times, somewhere down the road you may get passed an instance of type B that extends and overrides methods from A. What happens if you've already inlined A's method when B comes along? Here again the JVM shines. Because the JVM is essentially a dynamic language runtime under the covers, it remains ever-vigilant, watching for exactly these sorts of events to happen. And here's the really cool part: when situations change, the JVM can deoptimize.

This is a crucial detail. Many other runtimes can only do their optimization once. C compilers must do it all ahead of time, during the build. Some allow you to profile your application and feed that into subsequent builds, but once you've released a piece of code it's essentially as optimized as it will ever get. Other VM-like systems like the CLR do have a JIT phase, but it happens early in execution (maybe before the system even starts executing) and doesn't ever happen again. The JVM's ability to deoptimize and return to interpretation gives it room to be optimistic...room to make ambitious guesses and gracefully fall back to a safe state, to try again later.


Does this really buy you that much? For most programs the runtime is going to be dominated by loop execution. If you're in a situation where all the objects have the same type and the compiler could switch in static dispatch, the computer's branch predictor should catch on very quickly and the performance gains won't be that high. In cases where you have heterogeneous objects, the branch predictor won't be able to catch on, but the compiler will be forced to stick with dynamic dispatch anyway.

I suppose there is a bigger gain where you can inline static functions or optimize the calling convention.


Inlining is often called the mother of all optimizations. On its own, it doesn't gain you much compared to any other optimization, but once it's done it enables much, much more.

There's a reason languages like C that default to static dispatch on value types are so much faster without much effort.


Good writeup. What is a checked downcast?


In this context, it's a runtime check inserted by the compiler to be sure that an object retrieved via a generic interface matches the more specific type the caller expects.


A cast from an abstract interface to a concrete type, which signals if the concrete type is wrong.

In C++, `dynamic_cast` is checked (it returns null if the cast is incorrect), while the other casts are unchecked at runtime (they do have some static checking/limitations, except for C-style casts, where there's no check whatsoever).
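In Swift terms, it's the check behind as? and as! (a small sketch; the types are made up):

    class Shape {}
    class Circle: Shape { let radius = 1.0 }
    class Square: Shape {}

    let s: Shape = Circle()
    if let c = s as? Circle {      // checked downcast: yields nil if s isn't actually a Circle
        print(c.radius)            // 1.0
    }
    // let q = s as! Square        // also checked, but traps at runtime instead of returning nil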



