
It's random forests ... each tree is trained on a subset of the data. You can split a massive dataset into chunks and train each tree independently. That sidesteps the "big data" hangup.

If you look at the scikit-learn implementation, each tree emits a normalised probability vector for each prediction; those vectors are simply averaged to get the aggregate prediction, so it's not very difficult to do yourself.
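A minimal sketch of the chunked training and aggregation described above, assuming scikit-learn is available. The three-way split and the toy dataset are stand-ins for shards on separate machines; the aggregation follows scikit-learn's behaviour of averaging the normalised per-tree probability vectors.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

# Toy stand-in for a massive dataset.
X, y = make_classification(n_samples=3000, n_features=10, random_state=0)

# Split into independent chunks and train one tree per chunk.
chunks = zip(np.array_split(X, 3), np.array_split(y, 3))
trees = [
    DecisionTreeClassifier(max_features="sqrt", random_state=i).fit(cx, cy)
    for i, (cx, cy) in enumerate(chunks)
]

# Aggregate: average the normalised probability vectors from every tree,
# then take the argmax as the ensemble's predicted class.
proba = np.mean([t.predict_proba(X[:5]) for t in trees], axis=0)
pred = proba.argmax(axis=1)
print(proba.shape)
```

Since each `fit` call only ever sees its own chunk, the per-tree training steps can run on separate machines with no coordination beyond collecting the fitted trees at the end.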

Regardless, that's still a batch learning technique. For big data you really want an incremental learner.



The training subset for each tree can still be quite large. Note that most of the implementations failed on their 12 GB dataset.

Although I'm a big believer in streaming/online machine learning, it's not necessarily the best solution. There are many cases when batch is the better option, especially for big data. Anything historical, really.


I was thinking the same thing. Ensemble methods like this scale very well with the number of machines, with only moderate coordination effort.



