Snabb: 100 Gbit/s pure software switching using Lua (2019) (github.com/snabbco)
133 points by pdmccormick on June 9, 2020 | 38 comments


I would love to see an IXIA or Spirent line-rate RFC 2544 test with IMIX packet sizes. Most servers and OS stacks cannot do line-rate 100 Gbit/s today without a ton of tweaking; Netflix has a few blog posts on it. In my current company we moved our NAS to 2x100G, but the build servers at 10G and 25G (Ubuntu) cannot max a link in their normal operation (which includes pulling 4 GB VM images). There is some low-hanging fruit here, tweaking-wise, that we can do, but...

The NAS is TrueNAS with SSDs and all the ZFS goodness (ZIL, etc.). I can build hand-crafted iperf streams between it and another 100G server that hit line rate (including overhead).

The reason for the 100G is that there are racks and racks of build servers that pull down their images, so in aggregate we can hit 100G with a fully non-blocking Clos fabric.


You're kind of mixing different things here. Endpoints (like Netflix, or your NAS and build servers) have completely different characteristics than switches/routers. Likewise, interrupt-driven in-kernel networking is very different from, and roughly 10x slower than, polling-based kernel bypass. Doing 10G routing per core with polling was demonstrated years ago, so 100G really isn't an extraordinary performance claim.
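For a sense of what "polling-based" means in the Snabb world: apps are plain Lua objects whose push/pull methods the engine calls in a tight busy loop, shuttling packets between ring-buffer links with no interrupts or syscalls in the hot path. A rough, simplified sketch from memory (not Snabb's actual engine code; the link names and new/push signatures are illustrative, see core/link.lua and core/app.lua in the repo for the real API):

    -- Minimal Snabb-style app: the engine repeatedly calls push(),
    -- which drains whatever arrived on the input link since the last
    -- "breath" of the poll loop.
    local link = require("core.link")

    Forwarder = {}
    Forwarder.__index = Forwarder

    function Forwarder:new ()
       return setmetatable({}, Forwarder)
    end

    function Forwarder:push ()
       local input, output = self.input.rx, self.output.tx
       -- No blocking, no wakeups: just move packets until the
       -- input runs dry or the output fills up.
       while not link.empty(input) and not link.full(output) do
          link.transmit(output, link.receive(input))
       end
    end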


> The reason for the 100G is that there are racks and racks of build servers that pull down their images, so in aggregate we can hit 100G with a fully non-blocking Clos fabric.

It may not pan out in practice, but in theory isn't that the perfect use case for P2P image distribution?


You could probably do it with BTSync if you don't want to build your own. Or there's also Syncthing, which, unlike BTSync, is open source.


I was thinking ipfs, but yeah there's lots of ways to skin this cat.


Is there a hardware write up anywhere? Would be very interested to read up on it.


As an alternative, back in the mid-2000s there was a multicast system image distribution tool called flamethrowerd. It was used to netboot clusters of Linux & Mac machines. Probably abandoned, but still available for download.[1] I wonder what happened to that tool, or if there are any good modern replacements in common use. (It was a single purpose UDP multicast file distribution tool.)

[1]: https://sourceforge.net/projects/systemimager/files/flamethr...


Something "fast" named "snabb", from Switzerland. I bet they get their kicks out of triggering Swedish people.

Snabb is Swedish for "fast". Switzerland is often confused with Sweden and vice versa, stereotypically by the under-educated kind of Americans.

Grr. :]


This is the first thing I noticed! I guess the reason is maybe just that the creator used to live in Sweden (as indicated in the comments below), but I would like to believe this is some inside joke among the Swiss workers. As a Swede, I find it hilarious. :)

edit: Looking at the older threads linked above I found this:

https://news.ycombinator.com/item?id=8009309

Oh well I still choose to believe the Swiss are trolling the Swedes. :)


Their logo also seems to be a tiger! :-D

https://en.wikipedia.org/wiki/En_svensk_tiger


Well the creator of Snabb lives in Sweden.


He used to, but I believe he now resides in Switzerland.


Moved back to Sweden recently, as it happens :)


Sorry about the timing :)


Two letters should be enough for every country!


It can be confusing how the country code for Sweden is .se (from "Sverige"?) but the language code for Swedish is sv (from "svenska"?).


Similarly, ja_JP, ne_NP, nb_NO, my_MM, sl_SI.


Luxembourg is .lu, but the language code is LB. It causes all kinds of funny bugs when people set their language code to 'lu' and end up with things like date strings automatically translated to Luba-Katanga, a very melodic and completely different language spoken in the Democratic Republic of the Congo.


.sv is El Salvador; both SE and SV were assigned in the same 1974 standard.


Yeah, and as a language code, "se" refers to Northern Sami.


Interesting, thanks for sharing. It's always nice to see Lua in the wild. This headline immediately reminded me of this[0] talk from CCC 2018 comparing different high level languages for writing userspace network drivers.

[0]: https://media.ccc.de/v/35c3-9670-safe_and_secure_drivers_in_...



Just out of curiosity, what kind of speeds can be achieved by a pure software switch using say C or pure assembly?


We demonstrated 1 Tb/s back in 2017 on a high-end dual-socket Broadwell server, and performance has increased steadily since then (thanks to both HW and SW).

It is hard to compare without knowing packet sizes (64-byte? IMIX? 1500-byte?), CPU type, number of cores, and workload (L2 patch? L2 switching with MAC learning and 1M entries? IPv6 forwarding with 500K routes? etc.).

This is why we have the FD.io CSIT project, hosted by the Linux Foundation, where we can compare lots of interesting, automated, reproducible benchmarking scenarios in a controlled environment on various HW platforms: https://docs.fd.io/csit/master/report/


Around the same. VPP gets some benefit from prefetching which you might classify as inline asm.


C probably won't give you much difference. Assembly, carefully tweaked, might - the kind of stuff like Quake issuing an FDIV every 16 pixels in its software renderer, where the assembly intimately matches various aspects of the microarchitecture.


What are you basing this on exactly?


Despite years of propaganda, C is not well matched to CPUs currently in use (quite the opposite in places), and the typical optimizations don't necessarily work when dealing with the external I/O that you need to do in a switch.

Essentially, even if you write in C, reaching higher speeds will involve using "assembly masquerading as C" rather than depending on compiler optimizations.

Also, Snabb uses LuaJIT, which already generates quite tight code, so the performance gap that I suspect some imagine just isn't that wide.
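To make that concrete: a big reason LuaJIT can generate tight code for this workload is the FFI. Packet memory is described as C structs, so field accesses JIT-compile to plain loads and stores with no boxing or table lookups. A toy illustration (not taken from the Snabb source):

    local ffi = require("ffi")
    local bit = require("bit")

    -- Describe the Ethernet header as a C struct; 'h.type' below
    -- compiles down to a single 16-bit load from packet memory.
    ffi.cdef[[
    struct eth_header {
       uint8_t  dst[6];
       uint8_t  src[6];
       uint16_t type;   /* big-endian on the wire */
    };
    ]]

    local function ethertype (ptr)
       local h = ffi.cast("struct eth_header *", ptr)
       local t = h.type
       -- byte-swap, since x86 is little-endian
       return bit.bor(bit.lshift(bit.band(t, 0xff), 8), bit.rshift(t, 8))
    end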


>C is not well matched to CPUs + >depending on compiler optimizations + >Snabb uses LuaJIT .. quite tight code

==

You can write great C-based systems and avoid assembly if you a) know, always, what your compiler is doing and b) know, always, what your CPU is doing...


My point was that the same optimizations that made C fast break down when you need to take into account the often intricate dance between CPU caches, memory, the I/O bus, etc. - so unless you go into CPU-model-specific assembly tweaks, just using C might not bring you as much benefit.

Is it possible to do better? Yes. Would it count as "normal C"? I would say not really (and if we say yes, then CL code on SBCL with huge custom VOPs counts too).


I'd be interested to learn about significantly complex, nontrivial systems of say 100K to 1M LOC scale that require reasoning about every single instruction from the perspective of every other instruction, in order for the system to work.


You do not do that. Instead you optimize the short, critical part.

Of course it does not apply to everything (you need a few hotspots), but it is quite common: audio/video codecs, scientific computation, games, crypto... and even networking.


And with Lua (and C, and C++) it's pretty easy to manage the complexity. Just put things where they belong.


I think the answer here is that LuaJIT is fast, and that well-written native programs would still be faster - not that C "isn't well matched to CPUs". Modern optimization is more about memory access patterns than anything else, with SIMD and concurrency beyond that. Focusing on assembly is really not the apex it used to be. For starters, CPUs have multiple integer and floating-point units, and they get scheduled in an out-of-order core. Out-of-order execution is as much about keeping the various units busy as it is about issuing loads as soon as possible to avoid stalling.

I think if you are going to claim that C and C derivatives aren't actually fast, and that the idea that they are is due to "propaganda", then you should back that up with something concrete, because it goes against a lot of established expertise.


Could someone explain where in the stack Snabb sits vs. something like BPF? http://www.brendangregg.com/blog/2019-12-02/bpf-a-new-type-o...

It seems that Snabb uses pflua at its core, which claims to be faster than BPF ( https://github.com/snabbco/snabb/blob/b9da7caa1928256a3b4908... )
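If it helps, the pflua model looks roughly like this (API recalled from the pflua README, so check the repo for the exact signatures): the pcap filter expression is compiled to Lua source, which LuaJIT then turns into machine code, so there is no BPF bytecode interpreter in the hot path.

    -- Hypothetical usage sketch of pflua's compile_filter
    local pf = require("pf")

    -- Compiles the pcap expression to a Lua function, which
    -- LuaJIT trace-compiles to native code.
    local match = pf.compile_filter("tcp port 80")

    -- 'ptr' points at the packet bytes, 'len' is its length
    local function is_http (ptr, len)
       return match(ptr, len)
    end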


BPF's claim to fame is that it is considered secure enough that it can be used to run user-provided code in kernel space.

Snabb side-steps the kernel by providing the performance aspects of kernel-like access while staying securely in user space.


Currently that user is almost always the administrator (root). BPF is more safe than secure, if that distinction makes some sense.

So it's great because the eBPF verifier makes sure that whatever the user loads doesn't blow up the kernel immediately; it's very restricted (no infinite loops, so not Turing-complete, can only call predefined API functions, etc.). But it still gets JITed into machine code, and it still has access to a vast trove of kernel gadgets and data. So it's like a power tool that was designed to be almost impossible to cut your own limbs off with.

But it's still a power tool; it's still not something you give to any random passer-by.


eBPF has had many exploits, the most recent of which was just a few months ago: https://www.thezdi.com/blog/2020/4/8/cve-2020-8835-linux-ker...

Original BPF was safe by design (no verifier needed) but slow and far more restrictive, and even then there were exploits over the years. When eBPF and then eBPF JIT became a thing it was entirely predictable what would happen.




