Friday, November 2, 2012

Is online (machine) learning necessary?

This is a question on Quora that I answered a while back.
I used to ask this question myself, but in retrospect, I clearly didn't understand machine learning at that time. 

An ML system consists of two parts: a (re)trainer, or learner, and a predictor.

In most applications, the predictor needs to be real-time. For example, as soon as an email arrives, a spam detector needs to classify it as spam or not spam. Web search ranking, pattern recognition, and similar applications require the same responsiveness.

Making the predictor real-time is easy: it simply applies the model function, with the coefficients produced by the learner, to the new input (e.g., a new email or a new search query).
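A minimal sketch of such a predictor, assuming the model is logistic regression (a common choice for spam filtering, though the post does not say which model is used). The feature names and coefficient values are hypothetical stand-ins for whatever the learner produced.

```python
import math

# Hypothetical coefficients handed over by the offline learner.
# Feature names and values are made up for illustration.
BIAS = -1.5
WEIGHTS = {"contains_free": 2.0, "num_links": 0.3, "from_known_contact": -3.0}


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def spam_probability(features):
    """Apply the trained model: sigmoid(bias + sum of weight * feature value)."""
    z = BIAS
    for name, value in features.items():
        z += WEIGHTS.get(name, 0.0) * value  # unseen features contribute nothing
    return sigmoid(z)


def is_spam(features, threshold=0.5):
    return spam_probability(features) >= threshold


email = {"contains_free": 1.0, "num_links": 4.0}
print(is_spam(email))  # True: -1.5 + 2.0*1 + 0.3*4 = 1.7 > 0
```

Scoring an email costs one weighted sum and one sigmoid, which is why prediction stays cheap no matter how expensive training was.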

The learner, however, typically doesn't need to be real-time. Loosely speaking, its output captures the behavior and/or tastes of users, among other factors. These factors do change over time, but slowly, so it is usually fine to retrain an ML model every day or even every month.
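To make the split concrete, here is a minimal sketch of a batch trainer that would be rerun from scratch on a schedule (say, nightly) over all the data collected so far. It assumes a logistic-regression model fit by full-batch gradient descent; the function names `train_batch` and `predict`, the learning rate, and the epoch count are illustrative choices, not anything specified in this post.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_batch(examples, num_features, epochs=200, lr=0.1):
    """Fit logistic regression by full-batch gradient descent on the log loss.

    examples: list of (x, y) pairs, x a list of floats, y in {0, 1}.
    Returns (bias, weights). Every call starts from zero and sees all data.
    """
    bias, weights = 0.0, [0.0] * num_features
    n = len(examples)
    for _ in range(epochs):
        grad_b, grad_w = 0.0, [0.0] * num_features
        for x, y in examples:
            # Gradient of the log loss w.r.t. z is (predicted probability - label).
            err = sigmoid(bias + sum(w * xi for w, xi in zip(weights, x))) - y
            grad_b += err
            for j, xi in enumerate(x):
                grad_w[j] += err * xi
        # One step along the average gradient over the whole dataset.
        bias -= lr * grad_b / n
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
    return bias, weights


def predict(bias, weights, x):
    return sigmoid(bias + sum(w * xi for w, xi in zip(weights, x)))


# Hypothetical nightly job: retrain on everything collected so far, then
# hand the new coefficients to the real-time predictor.
history = [([-2.0], 0), ([-1.0], 0), ([1.0], 1), ([2.0], 1)]
bias, weights = train_batch(history, num_features=1)
```

Because this trainer runs offline, it can afford many passes over the full history; only the resulting `bias` and `weights` need to reach the predictor.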

The learning part can be time-critical when your model is still dumb because it was trained on little data, and you need to make it smarter as soon as possible.


Here's another way to look at it. Think of a human learner. I digest (see, hear, read, etc.) so many new things every day. Yet I rarely convert what I digest into experience (and act on that updated experience) even on a weekly basis, let alone in real time.

The fact that I can't immediately convert what I see into experience is totally fine. In fact, if I could retrain myself in a non-heuristic way even once a week, I would be much more intelligent than I am today.


Online learning is an area of ML that focuses on applications that need real-time (or incremental) training.
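For contrast, a minimal sketch of incremental training: logistic regression updated with plain stochastic gradient descent, one labeled example at a time, as each one arrives. This is the standard textbook technique, not the new approach to online logistic regression mentioned in this post; the class name, the learning rate, and the simulated stream are illustrative assumptions.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


class OnlineLogisticRegression:
    """Logistic regression trained incrementally with plain SGD."""

    def __init__(self, num_features, lr=0.1):
        self.bias = 0.0
        self.weights = [0.0] * num_features
        self.lr = lr

    def predict_proba(self, x):
        return sigmoid(self.bias + sum(w * xi for w, xi in zip(self.weights, x)))

    def update(self, x, y):
        """One SGD step on the log loss of a single labeled example (y in {0, 1})."""
        err = self.predict_proba(x) - y
        self.bias -= self.lr * err
        self.weights = [w - self.lr * err * xi for w, xi in zip(self.weights, x)]


# Simulated stream: the model is updated as each labeled example arrives,
# so it can serve predictions at any point in between.
model = OnlineLogisticRegression(num_features=1)
stream = [([-2.0], 0), ([-1.0], 0), ([1.0], 1), ([2.0], 1)] * 50
for x, y in stream:
    model.update(x, y)
```

Each update touches only one example, so it costs about as much as a single prediction rather than a pass over the whole history.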

I am very interested in the technical challenges in this field, and am even working on a new approach to online logistic regression (*). However, I think online learning will have little impact in the near future. (This opinion is controversial, I know.)