
Yeah, it's surprising how small it is. 560M views a month. I had a not-that-popular VoIP company, and at peak we were hitting that many calls per day. And each call got its routing info over HTTP from fewer than 9 servers. (Not totally trivial instructions: they had to check several datasets with hundreds of millions of rows.)
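To put rough numbers on that comparison, here's a back-of-the-envelope sketch. It assumes a 30-day month and perfectly even load (real traffic is peakier), and it uses 9 servers as an upper bound on the "fewer than 9" figure, so the per-server number is a floor, not a measurement.

```python
SECONDS_PER_DAY = 24 * 60 * 60

def per_second(events: float, days: float) -> float:
    """Average events per second, assuming load is spread evenly."""
    return events / (days * SECONDS_PER_DAY)

# 560M page views spread over an assumed 30-day month.
site_rate = per_second(560e6, 30)

# 560M calls in a single day (the VoIP peak mentioned above).
voip_rate = per_second(560e6, 1)

# Assumption: 9 servers, the upper bound of "fewer than 9".
per_server = voip_rate / 9

print(f"site: {site_rate:.0f} views/s")
print(f"voip: {voip_rate:.0f} calls/s")
print(f"voip per server (at least): {per_server:.0f} lookups/s")
```

So that's roughly 216 views/s for the site versus about 6,481 calls/s, or at least 720 routing lookups/s per server on average.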

Also, their data size of a TB or so means they can easily ("cheaply" enough) shove it into RAM, using a database like VoltDB (horizontal scaling/sharding, which seems like it'd be a perfect fit for vote recording).

There's still fun stuff outside of large sites. I'm writing a network search engine (record every packet and index it all) and even for a small $10M telco, it can easily be a TB a week in signaling, plus a few TB of audio (but we don't tend to record every audio stream).



.. Do the customers know you're recording calls?


It's a product for troubleshooting (my customers are carriers), and call recording is either on a selective basis (e.g. record the next n calls from this IP that's having trouble) or for monitoring (record 1% of calls, check for quality issues, and alarm if needed). But some companies may be interested in having more extensive recordings for whatever reason. Perhaps as an easy way to offer call recordings for call centers, for instance. Or I could see robodialling carriers wanting permanent, full audio logs of many calls to prove they are complying with laws.
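The two recording modes described above (next n calls from a given IP, plus a 1% random sample) could be sketched roughly like this. It's purely illustrative: `RecordingPolicy`, `record_next`, and `should_record` are made-up names, not the actual product's API, and the IP in the usage lines is a documentation address.

```python
import random

class RecordingPolicy:
    """Hypothetical per-call recording decision (illustrative only)."""

    def __init__(self, sample_rate=0.01, rng=None):
        self.sample_rate = sample_rate      # fraction of calls sampled for monitoring
        self.rng = rng or random.Random()   # injectable for repeatable tests
        self.remaining = {}                 # ip -> targeted calls still to record

    def record_next(self, ip, n):
        """Troubleshooting mode: record the next n calls from this IP."""
        if n > 0:
            self.remaining[ip] = n

    def should_record(self, ip):
        """Decide whether to record a new call arriving from ip."""
        left = self.remaining.get(ip, 0)
        if left > 0:
            # Targeted recording takes priority; use up one slot.
            if left == 1:
                del self.remaining[ip]
            else:
                self.remaining[ip] = left - 1
            return True
        # Monitoring mode: random sample of everything else.
        return self.rng.random() < self.sample_rate

policy = RecordingPolicy(sample_rate=0.01)
policy.record_next("192.0.2.10", 3)
decisions = [policy.should_record("192.0.2.10") for _ in range(3)]
print(decisions)
```

The first three calls from the watched IP are always recorded; after that it falls back to the 1% sample like any other source.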

You'd be surprised how many intermediaries a call will go through. Sometimes even calling a neighbor, on landlines, will hit a tiny VoIP company in the middle. It's fair to assume, in such cases, random techies can record all sorts of end user calls without any real oversight. But I'm unaware of anyone actually bothering with this; usually the signalling provides all you need to figure problems out.

The only routine call recording I'm aware of is for 911 calls, for obvious reasons.

Anyways, my only point was there's lots of data even outside top websites and you can still have fun writing cool algorithms and systems. E.g. one of my todos is to implement SSE-optimized indexes. Maybe only a few percent boost, but a fun and fairly justifiable use of time :)



