
The article didn't address the question of why AT&T has phone records going back so many years. I doubt that five year old data about a customer would be useful for marketing, considering that they have the most recent data. Nor is old data useful for billing once the time period for contesting a phone bill has elapsed. Network capacity planning could be done with anonymized or aggregated data. So it would seem that the only reason why they hang on to all this data is because the government asks them to, or because they can make money selling it to the government (or both).

Now, think about what adverse effects such old data could have on justice. For example, let's say a friend of mine from school, who I spent a lot of time talking to on the phone a decade ago, decided this year to become a drug dealer. The government could start investigating me based on this stale data, despite not having any reasonable suspicion that I was involved in any crime.

I think that what we really need is a privacy-oriented phone company built on the model of DuckDuckGo, which doesn't keep any data about customers beyond what's necessary for billing purposes. But maybe the U.S. already has laws that would make such an ethical phone company illegal.



" … let's say a friend of mine from school, who I spent a lot of time talking to on the phone a decade ago, decided this year to become a drug dealer. The government could start investigating me based on this stale data, despite not having any reasonable suspicion that I was involved in any crime."

Next time you're pulled over for a "random traffic stop" or some inexplicably-enforced minor violation (brake lights out, rolling through a stop sign, singled out in a stream of traffic doing ~10 over on an interstate), I assume the words "parallel construction" will loom large in your mind – along with the memory of that old girlfriend's junkie ex, and your college room-mate's best pal who used to deal weed…


Theory #1 – Because deleting stuff at scale is harder than you think. If you've got structured data (think "relational database, much of which is pretty much guaranteed to be append-only") and you've optimised its on-disk structure for reporting (think "highly indexed, possibly even with both the index data and the transaction data being stored in ways that take advantage of known query patterns and physical disk geometry") – removing data from it is likely to be _much_ more effort than just flipping a "deleted" flag and continuing to expand your storage pool. This is even more true if you'd also need to consider replicated copies/archives/snapshots/backups. Facebook/Google/Twitter et al store everything, not just because they think it'll make them more valuable to advertisers, but because deleting data from distributed/sharded/backed-up/archived databases is more expensive than just marking it "deleted" and leaving it in place.
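To make the "deleted flag" point concrete, here is a minimal sketch of a toy append-only store. Everything in it (the `AppendOnlyLog` class, its method names, the sample records) is made up for illustration and is not how any real carrier or database stores data. It only shows that a soft delete is a cheap bookkeeping step, while actually reclaiming the space means rewriting the stored rows.

```python
from dataclasses import dataclass, field


@dataclass
class AppendOnlyLog:
    """Toy append-only store (hypothetical): rows are never edited in place."""
    rows: list = field(default_factory=list)
    tombstones: set = field(default_factory=set)

    def append(self, record: dict) -> int:
        self.rows.append(record)
        return len(self.rows) - 1  # the row id is just the position in the log

    def soft_delete(self, row_id: int) -> None:
        # Cheap: records a tombstone, touches no stored data.
        self.tombstones.add(row_id)

    def visible(self) -> list:
        return [r for i, r in enumerate(self.rows) if i not in self.tombstones]

    def compact(self) -> None:
        # Expensive: rewrites the whole log, and surviving rows get new ids,
        # so anything keyed on the old ids would have to be rebuilt as well.
        self.rows = self.visible()
        self.tombstones.clear()


log = AppendOnlyLog()
ids = [log.append({"caller": f"555-01{n:02d}", "year": 1990 + n}) for n in range(5)]
log.soft_delete(ids[0])
log.soft_delete(ids[1])

visible_after_soft_delete = len(log.visible())  # rows a query would see
stored_after_soft_delete = len(log.rows)        # rows still physically stored
log.compact()
stored_after_compact = len(log.rows)
```

After the two soft deletes, queries see three rows but all five are still stored; only `compact()` actually removes them.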

Theory #2 – The NSA (or its predecessor or a related agency) have been paying them to store whatever they can since whenever it became possible. If you're prepared to ignore the privacy implications, it's obvious that some _tiny_ percentage of that data will become useful for law enforcement purposes. Unfortunately – when the "privacy implications" are considered in terms of "Is _your_ privacy more important than _my_ career?" – it's pretty clear that a very powerful cohort of law enforcement and intelligence agency decision makers say "Hell no! I just need one or two more big successes and I'll get that promotion I want! Of course it's worth monitoring every single person on the planet to give me a shot at the executive bathroom and an office with a door!" (or, less cynically but equal in consequence "Should I listen to every phone conversation in America, if it might possibly mean we can stop the next 9-11? Yeah, I think it might be…").


"Because deleting stuff at scale is harder than you think."

If you've got an age-based retention policy, and you've built in the capacity to delete the data, not really. Periodic purge. I'm not saying this is trivial, but it's not hard. The system was clearly designed not to delete the data.
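As a rough sketch of what such a periodic purge could look like, assuming a single SQLite table with an indexed timestamp column: the table name `call_records`, the column names, and the two-year `RETENTION` window are all hypothetical, and a real system at carrier scale would obviously look very different.

```python
import sqlite3
from datetime import datetime, timedelta, timezone

RETENTION = timedelta(days=730)  # hypothetical policy: keep roughly two years


def purge_expired(conn: sqlite3.Connection, now: datetime) -> int:
    """Delete records older than the retention window; return how many went."""
    cutoff = int((now - RETENTION).timestamp())
    cur = conn.execute("DELETE FROM call_records WHERE started_at < ?", (cutoff,))
    conn.commit()
    return cur.rowcount


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE call_records (caller TEXT, callee TEXT, started_at INTEGER)"
)
# Index on the timestamp so the purge does not need a full table scan.
conn.execute("CREATE INDEX idx_started_at ON call_records (started_at)")

now = datetime(2013, 9, 1, tzinfo=timezone.utc)
ages_in_days = [10, 400, 800, 3650]  # made-up sample data
conn.executemany(
    "INSERT INTO call_records VALUES (?, ?, ?)",
    [
        ("555-0100", "555-0199", int((now - timedelta(days=d)).timestamp()))
        for d in ages_in_days
    ],
)
conn.commit()

purged = purge_expired(conn, now)
remaining = conn.execute("SELECT COUNT(*) FROM call_records").fetchone()[0]
```

Run from a scheduler, `purge_expired` removes the two records older than the window and leaves the two recent ones.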

Your theory #2 holds much more water.


The optimist in me wants to argue that there may be OPEX vs CAPEX reasons to have not invested in "building in the capacity to delete", or possibly even tax implications of development budgets vs ongoing maintenance which might reasonably explain why you'd choose to build such a system as "append-only, archive indefinitely". The pessimist in me fears you're correct…


Re: theory #1 - there are commodity solutions to the queryable data warehouse problem, e.g. Vertica or Greenplum. Looking at these can give you an idea of what might be hard. You've just described Vertica pretty accurately (column-oriented storage, materialised on disk for particular query patterns, ad-hoc deletion comparatively costly). And ageing out in Vertica is easy - you just drop a partition. I can't imagine a custom system would be built without some provision for this.
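A sketch of the "drop a partition" idea, emulated in SQLite with one table per year. This is not Vertica syntax or the Vertica API; the `calls_<year>` naming scheme and the helper functions are assumptions made up for the example. The point is only that when data is laid out by age, expiring a year is one `DROP` instead of a row-by-row delete.

```python
import sqlite3


def partition_name(year: int) -> str:
    # Hypothetical naming scheme: one table per calendar year.
    return f"calls_{int(year)}"


def insert_call(conn: sqlite3.Connection, year: int, caller: str, callee: str) -> None:
    table = partition_name(year)
    conn.execute(f"CREATE TABLE IF NOT EXISTS {table} (caller TEXT, callee TEXT)")
    conn.execute(f"INSERT INTO {table} VALUES (?, ?)", (caller, callee))


def list_partitions(conn: sqlite3.Connection) -> list:
    rows = conn.execute(
        "SELECT name FROM sqlite_master "
        "WHERE type = 'table' AND name GLOB 'calls_[0-9]*' ORDER BY name"
    ).fetchall()
    return [r[0] for r in rows]


def drop_older_than(conn: sqlite3.Connection, oldest_year_to_keep: int) -> list:
    """Drop every yearly partition older than the given year."""
    dropped = []
    for table in list_partitions(conn):
        year = int(table.split("_")[1])
        if year < oldest_year_to_keep:
            conn.execute(f"DROP TABLE {table}")  # whole partition goes at once
            dropped.append(table)
    conn.commit()
    return dropped


conn = sqlite3.connect(":memory:")
for year in (1987, 1999, 2012):  # made-up sample data
    insert_call(conn, year, "555-0100", "555-0199")
conn.commit()

dropped = drop_older_than(conn, 2008)
remaining = list_partitions(conn)
```

Here the 1987 and 1999 partitions are dropped and only `calls_2012` remains.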


Sure, if you were building it from scratch today, but Vertica was founded in 2005 - the article talks about data going back to 1987. Cynical-me suspects the NSA has been "requiring" them to retain all that data since before it became easy/practical to expire it properly.


27 years ago was 1986.

Are you sure that storing 27 years' worth of data was easier and/or more cost-effective than deleting it for the bulk of that time frame?


The article says that AT&T has data not just on its customers but on any traffic that passes through its switches. I suspect that's much of it.


Because the government requires they retain it.


I wonder if you could do it as an MVNO.


If you operated an MVNO (Mobile Virtual Network Operator), wouldn't your traffic still be going over AT&T's or some other carrier's network? They'd have access to the information whether you wanted them to or not.


I don't know if they keep billing data for MVNO customers. And a $500mm/yr MVNO would have the power to negotiate contractual terms that regular individual or business customers couldn't. That's not the same as showing up with pitchforks at execs' houses in the middle of the night when the terms are violated, but it could at least mean huge payments are required.


From my brief research, they still handle all of the SS7 part of things as an MVNO, so even if they don't have 'billing' information, they definitely have the call origination/termination records/history.


Money.



