
I think big data has made math sexy. What's actually happening in the market is that applied statistics and operations research are being sold to small and medium-sized businesses under the guise of "big data", with the aim of providing them applied mathematical tools.


Statistics involves checking modeling assumptions. A lot of what I've seen from the big data people is rote application of algorithms to the exclusion of understanding and checking modeling assumptions.

While it's nice that the big data craze is making statistics more popular in the mainstream press, it's important that statistics not become just an application of numerical methods without consideration of the underlying assumptions. I stress this because, in my experience, it has been largely underappreciated.
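
To make "checking assumptions" concrete, here's a minimal sketch of the kind of residual diagnostics that tend to get skipped. The data is synthetic and the choice of statsmodels/scipy is just an assumption for illustration, not a prescription:

    # Fit an ordinary least squares model, then look at the residuals instead
    # of stopping at R^2. Toy data only.
    import numpy as np
    import statsmodels.api as sm
    from scipy import stats
    from statsmodels.stats.stattools import durbin_watson

    rng = np.random.default_rng(0)
    x = rng.uniform(0, 10, size=200)
    y = 3.0 * x + rng.normal(scale=2.0, size=200)   # synthetic data that happens to satisfy OLS assumptions

    X = sm.add_constant(x)                          # intercept + slope
    fit = sm.OLS(y, X).fit()
    resid = fit.resid

    # The checks a "just run the algorithm" approach tends to skip:
    print(fit.rsquared)                             # fit quality alone says little
    print(stats.shapiro(resid))                     # rough check of residual normality
    print(np.corrcoef(fit.fittedvalues, np.abs(resid))[0, 1])  # hint of heteroscedasticity
    print(durbin_watson(resid))                     # autocorrelation in residuals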


This is why I am unconvinced by the prefab products currently available. No matter how much you "automate" things, the fact is that you need a human brain, and a decent and careful one at that, to do anything worthwhile. I don't think the majority of companies understand this.


Your comment reminded me of this recent Krugman post which makes a similar point about economics.

http://krugman.blogs.nytimes.com/2013/05/11/harpooning-ben-b...


There are a huge number of useful machine learning techniques that don't have checkable "modelling assumptions" per se, just good performance on given tasks (decision trees, for instance, are really difficult to think about in terms of underlying statistical properties). Heck, even most statistical models are demonstrably false for any given application, yet simultaneously very useful.
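
For contrast with the residual-checking style above, here is a sketch of that "judged by performance, not assumptions" kind of validation. The dataset, hyperparameters, and use of scikit-learn are all illustrative assumptions:

    # Validate a model by predictive performance on held-out data rather than
    # by distributional assumptions: cross-validate a decision tree.
    from sklearn.datasets import load_iris
    from sklearn.model_selection import cross_val_score
    from sklearn.tree import DecisionTreeClassifier

    X, y = load_iris(return_X_y=True)
    tree = DecisionTreeClassifier(max_depth=3, random_state=0)

    # No residual plots or normality tests here; the only question asked is
    # "does it predict well on data it hasn't seen?"
    scores = cross_val_score(tree, X, y, cv=5)
    print(scores.mean(), scores.std())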


Reminds me of a quote I read somewhere about simulation models (specifically agent-based modelling), heavily paraphrased: "A lot of models are great random number generators." In other words, garbage in, garbage out.

I suspect a lot of these people doing "big data models" are, as you say, ignoring the importance of having solid assumptions.

Oh well, that's in part exactly what brought about the financial collapse: a bunch of kids got a formula (Black-Scholes), believed blindly in its magic powers, and applied it to everything. Fast forward several years and we've got what everybody knows.
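
For reference, the formula being alluded to is simple to write down, which is part of the problem: the closed form hides assumptions (lognormal returns, constant volatility) that blind application ignores. A sketch of the standard Black-Scholes European call price, with purely illustrative numbers:

    # Standard Black-Scholes price of a European call. The inputs below are
    # made up for illustration; the point is how much the formula assumes.
    from math import exp, log, sqrt
    from statistics import NormalDist

    def black_scholes_call(spot, strike, rate, vol, t):
        """Call price under Black-Scholes assumptions (lognormal returns, constant vol)."""
        d1 = (log(spot / strike) + (rate + 0.5 * vol**2) * t) / (vol * sqrt(t))
        d2 = d1 - vol * sqrt(t)
        N = NormalDist().cdf
        return spot * N(d1) - strike * exp(-rate * t) * N(d2)

    # Illustrative only: $100 stock, $105 strike, 2% rate, 20% vol, 1 year.
    print(black_scholes_call(100.0, 105.0, 0.02, 0.20, 1.0))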


"Big data" also checks model assumptions, if only if by monitoring whether or not acting on the information moves a business metric.

Statistics emphasizes inference over prediction, but either one, when done right, validates assumptions.

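A sketch of what "monitoring whether acting on the information moves a business metric" can look like in practice: compare a control group against a group where the model-driven action was taken. The counts and the two-proportion z-test framing are assumptions for illustration:

    # Did acting on the model move the metric? Compare conversion rates between
    # a control group and a treated group with a two-proportion z-test.
    # All counts below are invented.
    from math import sqrt
    from statistics import NormalDist

    def two_proportion_z(success_a, n_a, success_b, n_b):
        """z statistic and two-sided p-value for a difference in proportions."""
        p_a, p_b = success_a / n_a, success_b / n_b
        pooled = (success_a + success_b) / (n_a + n_b)
        se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
        z = (p_b - p_a) / se
        p_value = 2 * (1 - NormalDist().cdf(abs(z)))
        return z, p_value

    # Control: 480 conversions out of 10,000; treated: 540 out of 10,000.
    print(two_proportion_z(480, 10_000, 540, 10_000))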


I meant checking assumptions not just to see whether acting on the big data moved a business metric, but also to confirm that the model makes sense from a statistical perspective.

A lot of statistical work in business doesn't bother to check modeling assumptions. Models are chosen based on whether they've been used in the past and what the team is familiar with.

I don't doubt that big data (as we call it now) will one day rule. Ronald Fisher would keel over if he saw the size of the datasets we work with nonchalantly on a daily basis. A dataset of 150 points (the size of the Iris data) is laughable these days.

My reservation with big data is that the technologies are often unnecessary for the size of the tasks being done. Other than a few data scientists working on truly large projects, most of the big data talk I hear comes from people who aren't fighting in the trenches (execs, marketing, journalists).


It's still amazing what businesses are able to accomplish with summing, counting, percentage of total, % change period over period, average, median, min, max.
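
For concreteness, those unglamorous operations in code. The sales figures are made up and pandas is just an assumed convenience, not the point:

    # Sums, counts, share of total, period-over-period % change, and the usual
    # summary stats on made-up monthly sales data.
    import pandas as pd

    sales = pd.DataFrame({
        "month":   ["2013-01", "2013-02", "2013-03", "2013-04"],
        "region":  ["east", "east", "west", "west"],
        "revenue": [120.0, 150.0, 90.0, 110.0],
    })

    total = sales["revenue"].sum()
    by_region = sales.groupby("region")["revenue"].agg(
        ["sum", "count", "mean", "median", "min", "max"])
    by_region["pct_of_total"] = 100 * by_region["sum"] / total

    # Period-over-period change on the monthly totals.
    monthly = sales.groupby("month")["revenue"].sum().sort_index()
    mom_change = 100 * monthly.pct_change()

    print(by_region)
    print(mom_change)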


It's even more amazing how few businesses are able to compute those operations.


Add in bonuses based on those numbers and it's amazing any consistency exists in their calculation. In practice, you're only allowed a consistent and analytically defensible system if no one's bonus depends on the process being obfuscated. This is why a lot of "big data" is oriented around generating new ideas and new numbers, rather than fixing existing systems and data...


This is exactly right. I'm a member of INFORMS (the operations research professional society), and I can report that a staggering amount of ink has been spilled over the last few years about how to capitalize on the recent "Analytics" and "Big Data" trends.

On the one hand, people are starting to realize that quantitative analysis can help their businesses (mind-blowing, right?); on the other hand, so much of what you see about "analytics" and "big data" is nonsensical jargon. You have two camps within the OR world: people who want to ride this bandwagon all the way to the bank, and people who want to refocus on getting the message out about what OR really is.

The bandwagon-riders have succeeded to some extent. INFORMS has created a monthly "Analytics" magazine[1], launched an Analytics Certification[2] (its first professional certification), and so on.

The other camp has a legitimate concern that OR already has an "identity crisis" (operations research vs. management science vs. systems engineering vs. industrial engineering vs. applied math vs. applied statistics, and so on). INFORMS has spent millions trying to get business people to just be aware that it exists. The fear is that hitching our wagon to these trends will just be another blow to our profile when these fad words are replaced by the next big thing.

[1] http://analytics-magazine.org/ (you can get a good feel for the type of content in this publication just by reading the article titles...)

[2] https://www.informs.org/Build-Your-Career/Analytics-Certific...


Not a bad assessment of what seems to be going on.



