
> The webassembly exploit part of the chain bums me out (I was always afraid of stuff like this when I was working on the design for it) but it's pretty uninteresting, really. The simple sort of bug you get when you insist on writing stuff in C++.

I really hope people don't think WebAssembly is at fault for this; this vulnerability is no different from any other memory corruption bug you would find in the JS interpreter or the CSS parser or whatever.



Well, WebAssembly's primary near-term contribution will be introducing the world of C++ exploits to web apps, which are already groaning under the load of XSS, XSRF, path traversal, SSRF, etc. Adding double frees, use-after-frees, and buffer overflows on top doesn't seem ideal.

As for the rest, well, it'd be nice if there were any sort of plan to make Blink safer. I know about Oilpan, but what Mozilla is doing with Rust is impressive. The JVM guys are working on rewriting the JVM in Java. What's Blink's plan to make its own code safer? Sandboxing alone?


Microsoft is also taking steps to incrementally rewrite the .NET runtime in C#.

https://www.infoq.com/articles/virtual-panel-dotnet-future

And the D guys have been rewriting dmd, the reference compiler, in D.


What exploits are you specifically worried about with wasm?

The ones you call out in this post don’t have the same impact as native, even when it’s C or C++ compiled to wasm.


1. http://foo.bar.com/url?q=<base64 encoded stuff>

2. wasm program parses q

3. stack smash occurs

4. ROP chain is used to gain code execution

5. user cookie is stolen

6. attacker now controls your account

I don't know enough about wasm to know if it has some special mitigations for this but when I looked at it, wasm seemed closer to a CPU emulation than a high level language VM. Flat memory space, no GC, no pointer maps.


WASM memory is a block of linear memory specific to a module (and they only allow one memory instance right now). It can be imported/exported to other modules, but there is no sandbox escape (in theory). For the web backend, it's just backed by a Uint8Array IIRC. It's all userland. If anything escapes the WASM interpreter/compiler, it is the fault of the interpreter/compiler (as is the case here), not the fault of the WASM bytecode itself, which has no escape mechanism. Think of a WASM VM just like a JS VM. Even though it may appear low level just because it can JIT better/cleaner, it operates in the same arena as JIT'd JS (at least for the web target).
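A rough sketch of that in plain JS (no actual module needed): the memory a module sees is just a buffer the host owns, and everything the module does stays inside it.

```javascript
// Sketch: a module's linear memory is an ordinary ArrayBuffer from the
// host's point of view; wasm code can only ever address bytes inside it.
const memory = new WebAssembly.Memory({ initial: 1 }); // 1 page = 64 KiB
const view = new Uint8Array(memory.buffer);

view[0] = 0xff;           // host and module share the same bytes
console.log(view.length); // 65536 -- one 64 KiB page, nothing outside it
```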


You don't need to escape a sandbox when the application has access to all the user's data.

The attack surface of a gmail implemented in C++-compiled-to-wasm is almost certainly going to be larger than a gmail implemented in JS, because the runtime environment is vulnerable to double frees and heap corruption and other attacks, even if it won't escape the browser sandbox. My gmail tab basically has access to my entire life.


I don't understand. In the gmail example, the attack surface to whom, a malicious email sender? As in, something being handled by wasm in the browser has a better chance at XSS than if it were handled with JS? Why would untrusted content like that be handled by a client-side language anyway? Whether it is wasm, JS, wasm-interpreted-by-a-JS-interpreter, JS-interpreted-by-a-C++-interpreter, wasm-interpreted-by-a-C++-interpreter, or whatever, the risks are similar. If you are talking about untrusted wasm or JS scripts accessing things inside the same sandbox, that's a different vector, and it's less about the size of the attack surface and more about the introduction of the vector in the first place.


Simple example (though not something I think the Gmail team would actually ship): I want to load a .png file that's attached to an email. If I decide to use a build of libpng I control (for example, to work around broken color profile support in browsers), a bug in that libpng build could allow privilege escalation within the tab to get access to my gmail contacts or send emails. Bugs in loaders for image file formats are not unheard of, and people treat image files as innocuous.


Ah, that example makes it clear. I'll ignore the obvious point that a libpng written in JS could have a similar problem. So the libpng WASM would have its memory instance (probably imported from the caller) and functions would reference that memory. It's not like regular RAM, where an overflowed memory write could clobber executable instructions. Code is separate from memory. There is no eval. There is a "call_indirect" which can call a function by a dynamic function index, but what dangerous function would libpng even import? You can't execute memory or anything.

I can see some site DoSing though, where you use the equivalent of a png-bomb to blow up CPU in the parser, but that's true of any not-meticulously-written client-side parser of untrusted input.

So while you can toy with memory and maybe even affect the function pointer before a subsequent indirect call, it's nowhere near as dangerous to the outside-of-wasm world as raw CPU instructions. I can see an issue where the caller that imports libpng and exports his memory to it might have something secret in that memory... hopefully multi-memory and GC-like structs and whatnot can make passing and manipulating large blocks more like params than shared mem (with all of shared mem's faults).


It's a bit extreme, but the reality is that a lot of production libraries tend to pull in imports that are as dangerous as eval, because the scope of the library is enormous, or it's actively supposed to interact with the DOM or JS. At that point, if someone can more trivially exploit it with a double free or buffer overflow, you've increased your security risk relative to JS (because overflowing a Uint8Array is basically never going to result in arbitrary function invocation).

The way function addresses are sequential in wasm tables (and deterministic) also means it is probably easier to get to the function you want once you get code execution.
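A quick sketch of why that's true, using just the JS-side table API: "function pointers" in wasm are dense small indices into a table, not randomized machine addresses.

```javascript
// Sketch: valid function "pointers" are just 0..table.length-1, so an
// attacker who controls one can trivially enumerate every target.
const table = new WebAssembly.Table({ initial: 4, element: "anyfunc" });
console.log(table.length); // 4 -- the only valid "pointers" are 0, 1, 2, 3
console.log(table.get(0)); // null until a module populates the slot
```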


WASM resulted in adding a lot of new API surface to JS, like thread-shared buffers and the upcoming atomics. This requires quite a few new lines of native code in the implementation, significantly increasing the attack surface. Another thing is that WASM makes code faster, so exploiting timing bugs or cache leaks gets easier.
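For a sense of what that new surface looks like from JS (a sketch of the shared-memory path; all of this is new native code in the engine):

```javascript
// Sketch: a shared wasm memory is backed by a SharedArrayBuffer, which
// multiple threads can then touch through the Atomics API.
const shared = new WebAssembly.Memory({ initial: 1, maximum: 1, shared: true });
const i32 = new Int32Array(shared.buffer); // shared.buffer is a SharedArrayBuffer

Atomics.store(i32, 0, 42);
console.log(Atomics.load(i32, 0)); // 42
```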


My sibling comment is correct; the only way this can happen is an interpreter bug. Bugs happen, but they can happen in JS too. I think you’re assuming things the spec doesn’t allow.


"Controls your account" is possible without ever exploiting an interpreter bug or escaping the sandbox. Your account credentials are usually available inside the current tab.


I’m not sure what you mean, specifically, here. Or at least, how wasm is somehow worse than JavaScript in this regard, which is the baseline here.

In fact, it should be better, given the static declaration of external calls that can be inspected.


The example I gave in another comment holds here: let's say I want to load PNGs and I'm fed up with color profile bugs in browsers' image decoders (sigh...), so I decide to compile a known-good build of libpng or stb_image to wasm. Now someone finds a png decoder exploit that works against my build. If I'm not cautious about my imports, they can escalate privilege out of my wasm library and then take control over my gmail.

Ideally wasm libraries will always be narrowly scoped and good about what they import, but there will definitely be broadly scoped libraries that import a ton of dangerous stuff, and there will be some that import a function that is effectively eval because they don't want to declare a thousand imports by hand.
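To make that concrete, here's a sketch using the 8-byte empty module; `env.run_js` is an invented import name, but it shows the point: whatever the host hands over at instantiation is the module's entire capability set, so one sloppy import is enough.

```javascript
// Sketch: instantiate a (trivial, empty) module with a deliberately
// dangerous import. A real library wired up this way would let any
// memory-corruption exploit inside it reach eval.
const emptyModule = new Uint8Array(
  [0x00, 0x61, 0x73, 0x6d, 0x01, 0x00, 0x00, 0x00] // "\0asm" + version 1
);

const imports = {
  env: {
    run_js: (src) => eval(src), // effectively eval, handed to the module
  },
};

const instance = new WebAssembly.Instance(
  new WebAssembly.Module(emptyModule), imports
);
console.log(Object.keys(instance.exports).length); // 0 -- empty module
```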

It's certainly possible for JS libraries to have these kinds of vulnerabilities, but it's hard for me to imagine how a JS PNG decoder would end up with the same sort of attack possible on it since it's parsing binary data into pixel buffers. At worst, you'd crash it.
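Roughly why: a JS decoder's "pointers" are just typed-array indices, and out-of-bounds access fails closed. A sketch:

```javascript
// Sketch: indexing past the end of a typed array yields undefined
// (or throws, for DataView) -- it never reads a neighbouring object.
const pixels = new Uint8Array(4);
console.log(pixels[100]); // undefined, not someone else's memory

const view = new DataView(pixels.buffer);
let threw = false;
try {
  view.getUint32(100); // out of bounds
} catch (e) {
  threw = true; // RangeError: at worst you crash the decoder
}
console.log(threw); // true
```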


> I gave in another comment

Yeah, sorry about the duplication here, I'm extremely interested in this specific topic.

> Now someone finds a png decoder exploit that works against my build.

I think this is the part I don't get. Specifically, how would an exploit work within wasm? That is, the wasm environment is different from native; memory is bounds-checked, for example. Basically, I 100% agree that some security bugs are logic bugs, but take the above stack smash, for example: that can't happen, in my understanding. Again, modulo interpreter bugs, like any sandboxing technique.

> it's hard for me to imagine how a JS PNG decoder would end up with the same sort of attack possible on it since it's parsing binary data into pixel buffers. At worst, you'd crash it.

It's hard for me to imagine how wasm is any different than JS here.


How does wasm stop stack smashes? I can see that if it's not a von Neumann machine i.e. code is in a different memory space to data, it'd be harder, but that doesn't seem really compatible with C/C++?

Just in general if I have an arbitrary memory write primitive inside the wasm memory space, how much control over the program can I obtain?


wasm does some stack isolation so that a function's local stack variables are not adjacent in memory to things like the return address, and loads to/from stack slots are special instructions that reference slot identifiers. Most stack operations aren't arbitrary memory reads/writes from addresses, so it's not possible to overflow them and corrupt other values.

The caveat is that not everything native apps put on the stack can currently be stored in wasm's safe stack, so applications often put a secondary stack inside their heap. This will also happen if you're - for example - passing large structs around as arguments. You can smash the heap stack if you manage to find an exploit, and if function pointers or other important data are stored there, you can turn that into an attack.

It's absolutely the case that a large subset of stack smashing attacks don't work on wasm, because of the safety properties. Some of them will still work though. The way function pointers work in wasm raises the risk profile a bit if you manage to get control over the value of a function pointer, since function pointer values are extremely easy to predict.
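A simulation of that heap-stack smash in plain JS over a wasm-style linear memory (the layout, `BUF`, and `FUNC_SLOT` are invented for illustration):

```javascript
// Sketch: a buffer and a stored "function pointer" (a table index) sit
// side by side in linear memory. An oversized write stays in-bounds as
// far as wasm is concerned, so nothing traps -- the pointer just changes.
const memory = new WebAssembly.Memory({ initial: 1 });
const mem = new Uint8Array(memory.buffer);

const BUF = 0;        // 16-byte buffer on the module's in-heap stack
const FUNC_SLOT = 16; // a table index stored right after it

mem[FUNC_SLOT] = 3;   // legitimately points at function #3

// A C-style memcpy with a bad length overflows the buffer by one byte...
const attackerInput = new Uint8Array(17).fill(0x41);
attackerInput[16] = 7;
mem.set(attackerInput, BUF);

console.log(mem[FUNC_SLOT]); // 7 -- a later call_indirect now hits slot 7
```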


Thanks.

I am curious how JIT compilation policies will affect this. Wasm is a new bytecode format with no mature JIT compilers yet. I wonder how many safety properties the compilers will assume. For instance, if the wasm VM only really tries to stop wasm code escaping its own sandbox, then I guess compiling all stack ops down to a unified stack, C-style, is a perfectly legitimate approach as long as the sandbox properties can be maintained. I don't think wasm is claiming it will make all C/C++ code hackproof.


Yup, and thanks for this; this helps me understand what I was missing, specifically "applications often put a secondary stack inside their heap".


I recently read a paper about security exploits in WebGL, thanks to bugs on shader compilers and drivers.

"Automated Testing of Graphics Shader Compilers"

http://multicore.doc.ic.ac.uk/publications/oopsla-17.html


> I really hope people don't think WebAssembly is at fault for this

Nah, I think it's pretty clear GP meant "when you insist on writing interpreters/compilers in C++" not that C++ was compiled into wasm.


Yeah, sorry for being unclear - that is what I meant. I don't see wasm as at fault here, it's just a bummer that this new attack surface was introduced by writing the wasm implementation in C++ instead of memory-safe languages. It's not something so complex that it really needs to be C++.

Most (all?) browser wasm backends function by just generating the internal IR used by the existing JS runtime, so it's not especially necessary to write the loader/generator in C++. The generated native module(s) are often cached, also, which diminishes the importance of making the generator fast at the cost of safety.

I wrote all the original encoder and decoder prototypes in JS for this reason - you can make it fast enough, and the browser already has a high-performance environment in which you can run that decoder. When the result is already being cached I think writing this in C++ is a dangerous premature optimization.

Similarly it's common to write decoders as a bunch of switch statements and bitpacking, which creates a lot of duplication and space for bugs to hide. You can build these things generally out of a smaller set of robust primitives to limit attack surface, but that wasn't done here either, despite my best efforts.



