If I understand correctly, user namespace support was the major thing missing for the secure use of Linux Containers (LXC). This should bring widespread, extremely rapid container-based virtualization under Linux closer to reality. Red Hat is nominally backing this via libvirt, but mainly offers paravirt-based solutions. IBM, which authored a lot of the kernel code, seems to be interested in the same stuff for large-scale servers. We live in interesting times.
Container-based virtualization is already here, but without user namespaces root privilege in the container implies root privilege in the host. That's no longer the case; interesting times indeed.
Maybe. Remember it is new code. In principle it should be no worse than constraining a user, but there are still risks of kernel compromise (there was one just the other day). I would check whether you use any kernel modules that are not in the main tree, though. Also, I don't believe it is full root, i.e. it can only do operations that have been whitelisted (e.g. creating other namespaces); otherwise you could just use mknod and overwrite the host hard drive. So code you want to run as root may not work. You can do things like open low-numbered ports, though, if you open a network namespace, as long as the host sets up bridging for it.
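To make that concrete, here's a rough C sketch (untested, and assumes a kernel with CONFIG_USER_NS and a distro that permits unprivileged user namespaces) of what "root that isn't really root" looks like: unshare into new user and network namespaces, map uid 0 to your unprivileged uid, and watch mknod of a block device still get refused:

```c
/* Rough sketch: unshare into new user + network namespaces as an
 * unprivileged user, map uid 0 inside to our real uid outside, and show
 * that "root" in there still can't mknod a block device. */
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <sched.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/sysmacros.h>
#include <unistd.h>

int main(void)
{
    uid_t outer = geteuid();
    char map[64];
    int fd;

    /* New user namespace: full capabilities, but only over that namespace.
     * New network namespace: just a loopback; the host can bridge in a veth. */
    if (unshare(CLONE_NEWUSER | CLONE_NEWNET) == -1) {
        perror("unshare");
        return 1;
    }

    /* Map uid 0 inside the namespace to our unprivileged uid outside. */
    snprintf(map, sizeof(map), "0 %u 1", (unsigned)outer);
    fd = open("/proc/self/uid_map", O_WRONLY);
    if (fd == -1 || write(fd, map, strlen(map)) == -1) {
        perror("uid_map");
        return 1;
    }
    close(fd);

    printf("euid inside the namespace: %u\n", (unsigned)geteuid()); /* 0 */

    /* Not real root: creating a device node for the host's disk is refused. */
    if (mknod("/tmp/fake-sda", S_IFBLK | 0600, makedev(8, 0)) == -1)
        printf("mknod: %s (as expected)\n", strerror(errno));

    return 0;
}
```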
I've had the same sentiment for many years... Linux is indeed a miracle.
But after spending a lot of time with it recently (e.g. getting LXC running last night), I've concluded that it would be very hard to design a system with a security API that is worse than Linux.
The issue is that Linus doesn't design ANYTHING. He doesn't believe in design; he only believes in evolution.
Unix was designed, whereas Linux is mostly a bunch of code bolted on top of Unix. It's not sustainable in the long term. Someone needs to actually design something eventually, so there is a stable base for more evolution.
Spend some time looking through these:
- traditional Unix ACL-based security
- traditional resource limits
- chroot (not secure, but used as a "part" of many security solutions)
- capabilities
- seccomp
- LSM-based
  - SELinux
  - AppArmor
  - ...
- LXC
  - cgroups
  - namespaces (apparently completed with this kernel release)
  - LXC user space tools
- ptrace sandboxing
  - (at least a dozen projects use this)
- user mode linux
And you'll realize it's just a huge mess. I'm sure the complexity makes Linux measurably more insecure in practice. Or it just provides employment for a lot of people -- who knows.
There's never going to be a way to clean this all up, since people are relying on all of it.
I don't have that much experience with the alternatives; I'm sure they're messy in their own right. (I've used many OSes, but not security-wise.) But this definitely has me looking towards FreeBSD and such. Too bad it is more expensive on EC2.
I mentioned Minix 3 here before -- it's probably a pipe dream, but being a microkernel, it seems like a good basis for a future secure Unix. It actually was designed in some sense.
From what I gather people take the existence of root escalation exploits on Linux for granted. If that weren't so (and it shouldn't be with a microkernel), then traditional Unix security might actually cover a lot of cases that all these hacks on top are patching up.
EDIT: Also, Linux should look to DJB for guidance. Out of all the hairiness, how do you even do this on Linux (or any Unix)? http://cr.yp.to/unix/disablenetwork.html It just seems crazy.
With the disclaimer that I very much have a dog in the fight, you might want to look at illumos[1] and its distributions like SmartOS[2] and OmniOS[3]. It has a secure, robust container model (with a hat tip to FreeBSD jails for providing inspiration over a decade ago) and a mature least-privilege model that minimizes attack surface -- not to mention ZFS, DTrace, KVM and other goodies. At the very least, you can take solace in knowing that others share your desire for cleaner alternatives...
Thanks for the links; I had heard of SmartOS but not known much about the technology.
What sort of disappointed me about LXC is that you end up with an init process and 7 or 8 children of it in each container. I am more interested in sandboxing at the level of a single process. In a lot of cases you just want to run somebody else's Python code and look at its stdout; you don't need to spin up init and family to do that.
There are a hundred and one projects like this but most of them seem half-baked.
Capsicum [1] looks like what I'm interested in; there seemed to be effort around a Linux port a couple years ago but I don't think it happened. Does Illumos/SmartOS provide anything like this?
You don't need init in each container, and the encouraged model of having a whole distro in a container is bonkers. Play around with clone(2)/unshare(2) directly and it is fairly simple. All you need to know about pid 1 is that if it terminates your namespace goes away, and orphan processes will reparent to it (and some signals are blocked). If you have a single process then this doesn't really matter. You can do all this from Python, I expect; I have done it all from Lua with no issues.
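Something like this rough sketch (not production code; it needs root or CAP_SYS_ADMIN unless you also throw in CLONE_NEWUSER as above) is all it takes to run a single command as pid 1 of its own PID and mount namespaces, with no init and no distro image:

```c
/* Rough sketch: run a single command as pid 1 of fresh PID and mount
 * namespaces via clone(2). */
#define _GNU_SOURCE
#include <sched.h>
#include <stdio.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/wait.h>

static char child_stack[1024 * 1024];

static int child(void *arg)
{
    char **argv = arg;
    /* In here we are pid 1: when we exit the namespace goes away, and any
     * orphaned descendants get reparented to us in the meantime. */
    printf("pid inside the namespace: %d\n", (int)getpid());
    execvp(argv[0], argv);
    perror("execvp");
    return 127;
}

int main(int argc, char **argv)
{
    pid_t pid;
    int status;

    if (argc < 2) {
        fprintf(stderr, "usage: %s command [args...]\n", argv[0]);
        return 1;
    }

    /* The stack grows down, so pass the top of the buffer. */
    pid = clone(child, child_stack + sizeof(child_stack),
                CLONE_NEWPID | CLONE_NEWNS | SIGCHLD, &argv[1]);
    if (pid == -1) {
        perror("clone");
        return 1;
    }

    waitpid(pid, &status, 0);
    return WIFEXITED(status) ? WEXITSTATUS(status) : 1;
}
```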
OK, from what I understand "LXC" is basically the user-space tools that give you the distro in the container... it's more of a VM model.
But yeah, I think I just need the underlying cgroups, and possibly some of the namespaces. Although I don't care all that much if untrusted code can see what processes are running, just as long as it can't affect them.
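For the cgroups half, the raw interface is just a filesystem. A quick sketch, assuming the v1 memory controller is mounted at /sys/fs/cgroup/memory (the mount point is distro-dependent, the group name "sandbox" is made up, and this normally needs root):

```c
/* Quick sketch of the raw cgroup (v1) interface: make a group under the
 * memory controller, cap it at 64 MB, and move ourselves into it before
 * exec'ing untrusted code. */
#include <stdio.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

static int write_str(const char *path, const char *value)
{
    FILE *f = fopen(path, "w");
    if (!f)
        return -1;
    int rc = (fputs(value, f) >= 0) ? 0 : -1;
    if (fclose(f) != 0)
        rc = -1;
    return rc;
}

int main(void)
{
    char pid[32];

    /* Creating the directory creates the group. */
    if (mkdir("/sys/fs/cgroup/memory/sandbox", 0755) == -1)
        perror("mkdir");   /* EEXIST is fine if the group already exists */

    if (write_str("/sys/fs/cgroup/memory/sandbox/memory.limit_in_bytes",
                  "67108864") == -1)
        perror("memory.limit_in_bytes");

    snprintf(pid, sizeof(pid), "%d", (int)getpid());
    if (write_str("/sys/fs/cgroup/memory/sandbox/tasks", pid) == -1)
        perror("tasks");

    /* From here on this process and its children are capped at 64 MB;
     * exec the untrusted code next. */
    return 0;
}
```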
Just curious what you were using containers for from Lua? Sounds interesting.
I started using them largely for testing netlink code, as it is much easier to create some isolated network devices than risk messing about with the real ones. This is part of a fairly comprehensive Linux binding for Lua https://github.com/justincormack/ljsyscall
Have you read Spender's (the grsec maintainer) recent comments on Linux security development practices? They provided interesting insight imho, even if they were a bit flame-y.
Unix was designed? You wish. The very early parts of Unix were designed, but pretty much everything after that was bolted on under the banner of "worse is better".
For disabling the network, as linked, the new seccomp mechanism is probably what you need, as you can blacklist the socket system calls (or better, whitelist). But it is pretty new.
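Roughly like this (a blacklist sketch only, x86_64-specific since 32-bit x86 multiplexes through socketcall(2), and needs Linux >= 3.5; a real sandbox should whitelist instead):

```c
/* Blacklist sketch: a seccomp-BPF filter that makes socket(2) fail with
 * EACCES, so the process can't create new network sockets. */
#include <errno.h>
#include <stddef.h>
#include <stdio.h>
#include <sys/prctl.h>
#include <sys/socket.h>
#include <sys/syscall.h>
#include <linux/audit.h>
#include <linux/filter.h>
#include <linux/seccomp.h>

#ifndef PR_SET_NO_NEW_PRIVS
#define PR_SET_NO_NEW_PRIVS 38
#endif

int main(void)
{
    struct sock_filter filter[] = {
        /* Bail out if we are not on the arch this filter was written for. */
        BPF_STMT(BPF_LD | BPF_W | BPF_ABS, offsetof(struct seccomp_data, arch)),
        BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, AUDIT_ARCH_X86_64, 1, 0),
        BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_KILL),
        /* Make socket(2) fail with EACCES; allow everything else. */
        BPF_STMT(BPF_LD | BPF_W | BPF_ABS, offsetof(struct seccomp_data, nr)),
        BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, __NR_socket, 0, 1),
        BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ERRNO | EACCES),
        BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ALLOW),
    };
    struct sock_fprog prog = {
        .len = sizeof(filter) / sizeof(filter[0]),
        .filter = filter,
    };

    /* Required so an unprivileged process is allowed to install a filter. */
    if (prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0) ||
        prctl(PR_SET_SECCOMP, SECCOMP_MODE_FILTER, &prog)) {
        perror("prctl");
        return 1;
    }

    if (socket(AF_INET, SOCK_STREAM, 0) == -1)
        perror("socket");   /* fails with EACCES from here on */
    return 0;
}
```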
I have high hopes for the automatic NUMA balancing work. We are getting a lot of cores per socket, each capable of generating a lot of memory traffic, and the disparity between local and remote access, which was already pretty large, continues to grow.
That said, the scheduler does pretty well; it beats manual binding unless you put a lot of experimentation into the binding.
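For comparison, this is the kind of manual binding the automatic balancer is supposed to make unnecessary (a sketch using libnuma, link with -lnuma; node 0 is just an example):

```c
/* Sketch of manual NUMA binding with libnuma: pin the thread to one node
 * and allocate its memory on that node, so every access stays local. */
#include <stdio.h>
#include <numa.h>

int main(void)
{
    size_t len = 64 * 1024 * 1024;

    if (numa_available() == -1) {
        fprintf(stderr, "no NUMA support on this machine\n");
        return 1;
    }

    /* Run on node 0 and take memory from node 0; the automatic balancer
     * tries to achieve the same locality without this hand-holding. */
    numa_run_on_node(0);
    char *buf = numa_alloc_onnode(len, 0);
    if (!buf) {
        fprintf(stderr, "numa_alloc_onnode failed\n");
        return 1;
    }

    for (size_t i = 0; i < len; i += 4096)
        buf[i] = 1;          /* touch pages so they are actually placed */

    numa_free(buf, len);
    return 0;
}
```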
Does anyone else love reading these, even though they are mostly way over your head? There is something about kernel release notes that makes them fascinating...
3.7.x has been borderline breaking on my laptop for a month now and I'm fairly sure that the issues I've experienced affect a large number of users (excessive heat, low battery life on Sandy/Ivy-Bridge based notebooks). Compiling 3.8 as I'm typing this.
Pretty excited about seeing F2FS in mobile devices eventually, as mobile devices still need all the I/O performance they can get, considering most manufacturers are using cheap flash storage, and even the high-end ones aren't that fast.
At the moment it's had some reports of syncing issues (e.g. not syncing when it should) and a few performance oddities. All that being said, I'm not sure I've seen any reports of outright corruption, but it is fairly new compared to the ext family. I'd say stick with ext4/ext3 if you're using the desktop SSD for anything serious; if not, go for it -- it doesn't look like it'll kill your cat and/or wife.
Oh, I don't care if it eats my root install. I've got a "reinstall everything and reconfigure everything" script that gets me back to 99% after a new install. Plus, I keep my home partition on a "totally stable TM" RAID10 BTRFS array, so I don't mind a bit of risk.
_So far_ my experience has been btrfs is stable, meaning I haven't had any data corruption and TRIM works great. That being said I have multiple backups and images of the system :)
Storing file data inside inodes is a really great feature -- it should make file-system-as-DB approaches usable in a lot of cases where they didn't make sense before.
It's human nature to tool worship. I am no exception. But please, let's worship tools that enable radical innovation. The Linux kernel was that tool--in 1999. It's still improving, but it's not worthy of this much attention. It merely allows existing innovation to work better. It's like celebrating the latest Xeon processor--cool, yes, but not worth this much collective distraction.
Sad. This had Thunderbolt hotplug working on my Mac at one RC, and then it got removed and didn't get added back in. The Thunderbolt Ethernet adapter does work, but it was kernel panicking yesterday on rc7. I'll have to see tonight if it's fixed.
To others, where would I report such an issue if it's still broken?