
Were these guys a very small shop? Any info on which exchanges they bid on? The RTB company I worked at was considered "small", and we were handling around 300,000 requests/second between just Google's and Facebook's exchanges, with a 15ms cap on the time each request could spend in our systems.


Hi, this article is from 2012; we're listening to much more inventory these days.


That's great!


I suspect a lot of people are interested in the tech used to handle 300k QPS, and I wish you could give more details about it.


I'll write a blog post on it later called "So you want to build a realtime bidding system?" and submit it to HN. :)


I did this little presentation on doing a toy adserver with python: http://slides.com/dorianhoxha/ad-serving-with-python-uwsgi-l...

It wasn't RTB (just a normal adserver).

Wonder what your comments would be?


I gave your presentation a quick look.

Maxmind - good. Not like there are many choices here, though. :)

Instead of "waking up to sync logs", consider using something like NSQ to emit events as they happen. You can scale the number of servers/processes generating messages and the number of workers consuming those messages (and committing them to your database) very easily.

You could also replace writing the transaction log with an NSQ event, which lets you avoid building and scaling log-shipping machinery yourself.
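
To make this concrete, here's a minimal sketch of emitting events to NSQ as they happen, using the go-nsq client. It assumes an nsqd running at localhost:4150; the topic name and event fields are hypothetical, since the presentation's actual data model isn't shown:

    package main

    import (
        "encoding/json"
        "log"
        "time"

        "github.com/nsqio/go-nsq"
    )

    // ImpressionEvent is a hypothetical event payload; field names are
    // illustrative, not from the original presentation.
    type ImpressionEvent struct {
        AdID      string    `json:"ad_id"`
        UserID    string    `json:"user_id"`
        Timestamp time.Time `json:"ts"`
    }

    func main() {
        producer, err := nsq.NewProducer("localhost:4150", nsq.NewConfig())
        if err != nil {
            log.Fatal(err)
        }
        defer producer.Stop()

        ev := ImpressionEvent{AdID: "ad-123", UserID: "u-456", Timestamp: time.Now()}
        body, err := json.Marshal(ev)
        if err != nil {
            log.Fatal(err)
        }

        // Publish the event immediately instead of appending to a log file;
        // workers subscribed to the topic consume and commit to the database
        // at their own pace, and both sides scale independently.
        if err := producer.Publish("ad_events", body); err != nil {
            log.Fatal(err)
        }
    }

On the consuming side you'd attach however many workers you need with nsq.NewConsumer and a handler that does the database commit.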

We precalculated which ads a given user was eligible for, and when a bid request came in, a separate process was queried for the info on the ad to show. We never had to do anything funky to handle geotargeting exclusion at scale.
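
A minimal sketch of that precomputed-eligibility idea, assuming a rebuild step that periodically maps a user segment to its eligible ad IDs (geotargeting exclusions already applied); the segment keys and ad IDs are hypothetical:

    package main

    import (
        "fmt"
        "sync/atomic"
    )

    // eligibility holds a map from user segment (e.g. country + interest
    // bucket) to the ads that segment may see. It is rebuilt offline and
    // swapped in atomically, so a bid-time lookup is a single map read.
    var eligibility atomic.Value // holds map[string][]string

    func rebuild() {
        // In a real system this would be computed from campaign targeting
        // rules; hard-coded here for illustration.
        eligibility.Store(map[string][]string{
            "US:sports": {"ad-1", "ad-7"},
            "DE:autos":  {"ad-3"},
        })
    }

    func eligibleAds(segment string) []string {
        m := eligibility.Load().(map[string][]string)
        return m[segment]
    }

    func main() {
        rebuild()
        fmt.Println(eligibleAds("US:sports")) // [ad-1 ad-7]
    }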

Instead of having your adserver connect to a database, have a separate process generate a working set (as JSON or whatever you fancy), compress it, and ship it to the adserver periodically. The adserver can then do a straight load from the file every minute or at whatever interval you'd like. If the file's mtime is too old, raise an alert and stop serving ads if necessary. Keeping things separate and simple lets you scale more simply. Our working sets averaged about 2 GB uncompressed and could be loaded in a few seconds (C++/JSON and later Go + JSON).
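
Here's a minimal sketch of that reload loop in Go. The file path, reload interval, and staleness threshold are all assumptions, and the payload is decoded into a generic map for illustration; a real adserver would use a typed working-set struct:

    package main

    import (
        "compress/gzip"
        "encoding/json"
        "log"
        "os"
        "sync/atomic"
        "time"
    )

    const (
        workingSetPath = "/var/lib/adserver/working_set.json.gz" // hypothetical
        reloadEvery    = time.Minute
        maxAge         = 10 * time.Minute
    )

    var workingSet atomic.Value // holds map[string]any

    func load() error {
        info, err := os.Stat(workingSetPath)
        if err != nil {
            return err
        }
        if age := time.Since(info.ModTime()); age > maxAge {
            // Stale file: raise an alert and keep the old working set;
            // a real server might also stop serving ads here.
            log.Printf("ALERT: working set is %v old", age)
            return nil
        }

        f, err := os.Open(workingSetPath)
        if err != nil {
            return err
        }
        defer f.Close()

        gz, err := gzip.NewReader(f)
        if err != nil {
            return err
        }
        defer gz.Close()

        var ws map[string]any
        if err := json.NewDecoder(gz).Decode(&ws); err != nil {
            return err
        }
        workingSet.Store(ws) // atomic swap; request handlers never block
        return nil
    }

    func main() {
        for {
            if err := load(); err != nil {
                log.Printf("working set reload failed: %v", err)
            }
            time.Sleep(reloadEvery)
        }
    }

The atomic swap is the key design choice: requests keep reading the old working set until the new one is fully parsed, so a slow or failed load never affects serving latency.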

Seems like it was a fun project and I hope you learned a lot!


At a previous job we did something similar with separate file sets describing ad inventory, but kept them in a native binary format and mmap'd them.


How do you do load balancing there? Meaning, can one nginx/haproxy box handle that many requests?


Anycast will spread incoming requests across N physical machines; then you can do your layer 3 load balancing, then your SSL termination and HTTP load balancing.

We didn't use anycast, though. I suggested it to our CTO many times and it would have saved us over $5,000/mo in DNS costs, but it never got done.


Even if a single box were capable of handling that many requests, you should never have just one box as the main entry point to your service (for failover).

I also suspect he's referring to 300k distributed across multiple datacenters.


Yes, you are correct. The 300k was the total across all datacenters: 85% of that was in the US, 10% in Europe, and 5% in Asia.



