How tcmalloc Works

bcantrill · on Nov 10, 2014

Interesting stuff. These guys reached some of the same conclusions about cache management that we reached when implementing per-thread caching for libumem[1] (in particular, around summing all per-thread caches and managing that number). It would be interesting to benchmark these two allocators; we do some dynamic code generation that allows cache sizes to be tuned without sacrificing performance.

[1] http://dtrace.org/blogs/rm/2012/07/16/per-thread-caching-in-...
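The idea of summing all per-thread caches and managing that one number can be illustrated with a toy simulation. This is only a sketch under stated assumptions, not how libumem or tcmalloc implements it: the class names `CacheBudget` and `ThreadCache`, the single size class, and the "return to central heap" policy are all hypothetical.

```python
import threading


class CacheBudget:
    """Hypothetical shared counter: bytes held across all per-thread caches."""

    def __init__(self, max_total_bytes):
        self.max_total_bytes = max_total_bytes
        self.total_bytes = 0
        self._lock = threading.Lock()

    def try_reserve(self, nbytes):
        # Admit the bytes only if the global sum stays within the cap.
        with self._lock:
            if self.total_bytes + nbytes > self.max_total_bytes:
                return False
            self.total_bytes += nbytes
            return True

    def release(self, nbytes):
        with self._lock:
            self.total_bytes -= nbytes


class ThreadCache:
    """Hypothetical per-thread free list for a single block size."""

    def __init__(self, budget, block_size):
        self.budget = budget
        self.block_size = block_size
        self.free_blocks = []

    def free(self, block):
        # Cache the block locally only while the summed total allows it;
        # otherwise pretend to hand it back to a central heap.
        if self.budget.try_reserve(self.block_size):
            self.free_blocks.append(block)
            return "cached"
        return "returned_to_central"

    def alloc(self):
        # Fast path: reuse a cached block and shrink the global sum.
        if self.free_blocks:
            self.budget.release(self.block_size)
            return self.free_blocks.pop()
        # Slow path: a fresh bytearray stands in for the central heap.
        return bytearray(self.block_size)
```

The point of the design is that no single cache has a fixed limit; each one can grow as long as the total across all threads stays under one cap.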

EliRivers · on Nov 10, 2014

"This blog post is incredibly long."

It's not. It's really not. I don't know whether to feel slighted that the author assumes my attention span is so pitiful, or bemused that the author feels the need to mention the length of the post at all. I suppose someone who feels the need to open a post with the new hip way of saying "Summary" or "Abstract" has already given up on his audience anyway.

AceJohnny2 · on Nov 10, 2014

Indeed. Instead of being encouraging, it ends up sounding condescending.

personZ · on Nov 11, 2014

Given how it is mentioned at the outset and conclusion, I almost have to think it's stated ironically or something.

thrownaway2424 · on Nov 10, 2014

IMHO one of the nicest things about tcmalloc isn't the performance, it's the profiling. It samples your allocations and records the stack trace where an object was allocated, and records that information over the life of your process. This can be invaluable when tracking leaks or performance problems suspected to be due to excessive new and delete.

http://gperftools.googlecode.com/svn/trunk/doc/heapprofile.h...
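As a rough illustration of the sampling approach described above, here is a toy model in Python. It is a sketch, not tcmalloc's code: `SamplingHeapProfiler`, its method names, and the fixed byte-count interval are assumptions made for the example (real samplers typically randomize the interval to avoid bias).

```python
import traceback
from collections import Counter


class SamplingHeapProfiler:
    """Hypothetical profiler: samples roughly one allocation per N bytes."""

    def __init__(self, sample_interval_bytes):
        self.sample_interval_bytes = sample_interval_bytes
        self._bytes_until_sample = sample_interval_bytes
        self.live = {}  # object id -> (size, stack) for sampled live objects

    def on_alloc(self, obj_id, size):
        # Count down allocated bytes; most allocations take this cheap path.
        self._bytes_until_sample -= size
        if self._bytes_until_sample > 0:
            return False
        # Interval exhausted: record the caller's stack for this allocation.
        self._bytes_until_sample = self.sample_interval_bytes
        frames = traceback.extract_stack()[:-1]  # drop the on_alloc frame
        stack = tuple(f"{f.name}:{f.lineno}" for f in frames)
        self.live[obj_id] = (size, stack)
        return True

    def on_free(self, obj_id):
        # Freed objects leave the table, so only leaks and long-lived data remain.
        self.live.pop(obj_id, None)

    def report(self):
        # Sum sampled live bytes per allocation site, largest first.
        totals = Counter()
        for size, stack in self.live.values():
            totals[stack] += size
        return totals.most_common()
```

Because only sampled allocations pay for the stack capture, the per-allocation cost stays small, and the table of live sampled objects points at the call sites that are still holding memory.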

jpfr · on Nov 10, 2014

Have you tried valgrind for this?

thrownaway2424 · on Nov 11, 2014

Yes and my experience is that valgrind is a tremendously slow way to do one-off debugging of a suspected memory leak. The instrumentation in tcmalloc is really different as it has almost no cost (depending on the size of the system you might want to adjust the sample parameter for highly multithreaded programs) and is running at all times so you can use it to troubleshoot in production. When I've used valgrind the program was so slow it wasn't the kind of thing you could put under a live workload.

jpfr · on Nov 11, 2014

True. Valgrind slows things considerably.

In the end, what to use is a matter of workflow and convenience. Good to know how tcmalloc makes debugging memleaks easier.

Just for completeness, a third option would be a tracing tool that hooks into the kernel syscalls (e.g. the lttng project). Such tools have nearly zero performance penalty as well.

suprjami · on Nov 10, 2014

James is one of the two hosts of the Real Talk podcast, which is by far the best podcast I've ever listened to. Such a shame they only made half a dozen episodes. Check it out: http://realtalk.io/

orange_sharpie · on Nov 10, 2014

I actually found their podcasts to be extremely rudimentary. I listened to week 3, where they discuss an article on "high scalability". I feel these guys try too hard to sound like hipsters, and are lacking any fundamental training in computer science. Every three minutes, Joe would state "I don't know what a unix kernel is"...or "I don't know what having a large application on a 4 core kernel locking up is".

To be completely honest, I was so excited when I saw your link for a "technical podcast". I thought, "hey, I finally have something with a lot of content to listen to on my way to work!". There were expectations that weren't met...

kawsper · on Nov 10, 2014

That webpage just says "bye". Did they shut it down?

general_failure · on Nov 10, 2014

http://goog-perftools.sourceforge.net/doc/tcmalloc.html offered fewer architectural insights