It's random forests ... each tree is trained on a subset of the data. You can split a massive dataset into chunks and train each tree independently. That sidesteps the "big data" hangup.
If you look at the scikit-learn implementation, each tree emits a normalised probability vector for each prediction, and those vectors are simply averaged to get the aggregate prediction, so it's not very difficult to do yourself.
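A minimal sketch of the idea, assuming the data arrives as an iterable of (X, y) chunks and that every chunk contains all classes (so the probability columns line up across forests):

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def train_on_chunks(chunks):
    """Fit one independent forest per (X, y) chunk."""
    forests = []
    for X, y in chunks:
        rf = RandomForestClassifier(n_estimators=50)
        rf.fit(X, y)
        forests.append(rf)
    return forests

def aggregate_predict(forests, X):
    """Average the normalised probability vectors across forests
    (soft voting), then take the argmax as the prediction."""
    probs = np.mean([rf.predict_proba(X) for rf in forests], axis=0)
    return np.argmax(probs, axis=1)
```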
Regardless, you are still applying a batch learning technique. For big data you want an incremental learner.
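For contrast, a sketch of the incremental approach using scikit-learn's partial_fit interface (the chunk iterable is assumed, as above); the full dataset never has to fit in memory:

```python
from sklearn.linear_model import SGDClassifier

def train_incrementally(chunks, classes):
    clf = SGDClassifier(loss="log_loss")  # logistic regression fit by SGD
    for X, y in chunks:
        # the full label set must be supplied, since an individual
        # chunk may be missing some classes
        clf.partial_fit(X, y, classes=classes)
    return clf
```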
The training subset for each tree can still be quite large. Note that most of the implementations failed on their 12 GB dataset.
Although I'm a big believer in streaming/online machine learning, it's not necessarily the best solution. There are many cases when batch is the better option, especially for big data. Anything historical, really.
We have been working hard to reduce computation time and memory footprint (though there is still a lot of room for improvement on that side).
(Unfortunately, I cannot run your benchmarks myself, because the compiled version of WiseRF requires a newer version of glibc than the one on my cluster, and crashes.)
Question: Why do I have to implement hyperparameter selection?
For me, the promise of in-the-cloud machine learning is that I can call a 'train' method and specify a single hyperparameter: the training budget (i.e. $). Perhaps also the maximum time before I am returned a trained model.
This is exactly what we're enabling with our ML Platform (currently in private beta). Such a system needs to be built on top of fast & scalable ML technology with smart & efficient tuning/optimization.
Would love to hear about your use cases & get you on the beta.
There are different ways of finding optimal hyperparameters, and while a cloud system might be capable of providing a mechanism that figures this out on its own, this will generally be less efficient...
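One simple way to make "budget is the only hyperparameter" concrete is budget-constrained random search; a sketch below, swapping the dollar budget for wall-clock seconds and using plain cross-validated random search rather than anything smarter:

```python
import time
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

def train_with_budget(X, y, budget_seconds, seed=0):
    """Sample random hyperparameter settings until the time budget
    runs out; return the best model refit on all the data."""
    rng = np.random.default_rng(seed)
    best_score, best_model = -np.inf, None
    deadline = time.monotonic() + budget_seconds
    while time.monotonic() < deadline:
        params = {
            "n_estimators": int(rng.integers(10, 200)),
            "max_depth": int(rng.integers(2, 20)),
        }
        model = RandomForestClassifier(**params)
        score = cross_val_score(model, X, y, cv=3).mean()
        if score > best_score:
            best_score, best_model = score, model.fit(X, y)
    return best_model
```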