How tcmalloc Works (jamesgolick.com)
63 points by luu on Nov 10, 2014 | 12 comments


Interesting stuff. These guys reached some of the same conclusions about cache management that we did when implementing per-thread caching for libumem[1] (in particular, around summing the sizes of all per-thread caches and managing that total). It would be interesting to benchmark the two allocators; we do some dynamic code generation that allows cache sizes to be tuned without sacrificing performance.

[1] http://dtrace.org/blogs/rm/2012/07/16/per-thread-caching-in-...
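
To make the "sum all per-thread caches and manage that number" idea concrete, here is a minimal single-threaded sketch. It is not how libumem or tcmalloc implement it; the class names (`CentralHeap`, `ThreadCache`), the byte counters, and the spill policy are all assumptions chosen for illustration, and a real allocator would need atomics or locks around the shared total.

```python
class CentralHeap:
    """Hypothetical shared pool standing in for the allocator's global free lists."""

    def __init__(self, total_cache_limit):
        self.total_cache_limit = total_cache_limit  # cap on the sum of all thread caches
        self.total_cached = 0                       # bytes currently held by thread caches
        self.spilled = 0                            # bytes handed back to the shared pool


class ThreadCache:
    """Hypothetical per-thread cache that checks the global total before growing."""

    def __init__(self, central):
        self.central = central
        self.cached = 0  # bytes held by this thread's cache

    def free(self, size):
        """Keep the freed bytes locally only if the global budget allows it."""
        central = self.central
        if central.total_cached + size <= central.total_cache_limit:
            self.cached += size
            central.total_cached += size
        else:
            # Budget exhausted: return the memory to the shared pool instead.
            central.spilled += size

    def alloc(self, size):
        """Return True if the request was served from this thread's cache."""
        if self.cached >= size:
            self.cached -= size
            self.central.total_cached -= size
            return True
        return False  # a real allocator would fall back to the central heap here


central = CentralHeap(total_cache_limit=100)
a, b = ThreadCache(central), ThreadCache(central)
a.free(60)  # fits in the budget, stays in thread A's cache
b.free(60)  # would push the total to 120, so it spills to the shared pool
```

The point of tracking one total rather than a fixed limit per thread is that an idle thread's unused budget can be consumed by a busy one.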


This blog post is incredibly long.

It's not. It's really not. I don't know whether to feel slighted that the author assumes my attention span is so pitiful, or bemused that the author feels the need to mention the length of the post at all. I suppose someone who feels the need to open a post with the new hip way of saying "Summary" or "Abstract" has already given up on his audience anyway.


Indeed. Instead of being encouraging, it ends up sounding condescending.


Given how it is mentioned at the outset and conclusion, I almost have to think it's stated ironically or something.


IMHO one of the nicest things about tcmalloc isn't the performance, it's the profiling. It samples your allocations, records the stack trace where each sampled object was allocated, and keeps that information over the life of your process. This can be invaluable when tracking down leaks or performance problems suspected to be due to excessive new and delete.

http://gperftools.googlecode.com/svn/trunk/doc/heapprofile.h...
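
A rough sketch of the sampling idea described above, in Python rather than tcmalloc's C++: count allocated bytes and capture a stack trace only when a byte threshold is crossed. The `SamplingProfiler` class, its `on_alloc` hook, and the fixed (non-randomized) interval are assumptions made for illustration, not gperftools' actual implementation or API.

```python
import traceback
from collections import Counter


class SamplingProfiler:
    """Hypothetical profiler: records a stack roughly once per `interval` bytes."""

    def __init__(self, sample_interval_bytes):
        self.interval = sample_interval_bytes
        self.bytes_until_sample = sample_interval_bytes
        self.samples = Counter()  # call stack (tuple of function names) -> sample count

    def on_alloc(self, size):
        """Call on every allocation; returns True if this allocation was sampled."""
        self.bytes_until_sample -= size
        if self.bytes_until_sample > 0:
            return False  # cheap path: just a subtraction and a comparison
        self.bytes_until_sample = self.interval
        # Drop the last frame, which is on_alloc itself.
        frames = traceback.extract_stack()[:-1]
        self.samples[tuple(frame.name for frame in frames)] += 1
        return True
```

Because most allocations take only the cheap path, the overhead stays low, and a larger interval lowers it further at the cost of coarser data.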


Have you tried valgrind for this?


Yes, and in my experience valgrind is a tremendously slow way to do one-off debugging of a suspected memory leak. The instrumentation in tcmalloc is quite different: it has almost no cost (though, depending on the size of the system, you might want to adjust the sample parameter for highly multithreaded programs) and it runs at all times, so you can use it to troubleshoot in production. When I've used valgrind, the program was so slow that it wasn't the kind of thing you could put under a live workload.


True. Valgrind slows things down considerably.

In the end, what to use is a matter of workflow and convenience. Good to know how tcmalloc makes debugging memleaks easier.

Just for completeness, a third option would be a tracing tool that hooks into kernel syscalls (e.g. the LTTng project). Such tools have nearly zero performance penalty as well.


James is one of the two hosts of the Real Talk podcast, which is by far the best podcast I've ever listened to. Such a shame they only made half a dozen episodes. Check it out: http://realtalk.io/


I actually found their podcasts to be extremely rudimentary. I listened to week 3, where they discuss an article on "high scalability". I feel these guys try too hard to sound like hipsters and lack any fundamental training in computer science. Every three minutes, Joe would state "I don't know what a unix kernel is"... or "I don't know what having a large application on a 4 core kernel locking up is".

To be completely honest, I was so excited when I saw your link to a "technical podcast". I thought, "hey, I finally have something with a lot of content to listen to on my way to work!". My expectations weren't met...


That webpage just says "bye". Did they shut it down?