Wednesday, December 11, 2013

Highlights of NIPS 2013


Overall

  1. Deep Learning (also called Deep Neural Networks, or DNNs) was again the trendiest topic of the conference. Its workshop was perhaps twice as big as last year's (or more) and was packed for most of the day. Interestingly, Mark Zuckerberg of Facebook stopped by for a Q&A and then a panel discussion, mostly to announce Facebook's new AI Lab. For technical highlights, see below.
  2. Distributed machine learning was another topic of great interest.
  3. There are growing markets and interest in predictive analytics on sensor data (e.g., activity detection on mobile phones or wearable devices), in ML for Health Care, and in ML for Education.
  4. And there is certainly badass research in other areas that I missed. Among the topics I'm interested in, however, Optimization (particularly non-convex optimization) hasn't made much progress.

Deep Neural Nets

  • Natural Language Processing. Applying DNNs to NLP was the theme of this year's deep learning research. This is natural, because NLP is the holy grail of machine learning research and DNNs have already convincingly demonstrated their power in Computer Vision and Speech Recognition. There was some cool work on DNNs for NLP, such as parsing with Compositional Vector Grammars by a team at Stanford (led by Richard Socher) and the Word2Vec project at Google (led by Tomas Mikolov); a toy illustration of the word-vector arithmetic behind Word2Vec appears after this list. I think this is just the beginning, though.
  • Computer Vision. A new record on the ImageNet benchmark was set by Matt Zeiler et al., although it is not a big improvement over the previous record set by Alex Krizhevsky et al. (see their famous paper). Although Matt's work received a lot of attention from the community (come on, he set a new record), I was slightly disappointed. The spirit of his paper is understanding convolutional neural networks, but he did not explain why his network (a customized version of Alex's network) yields better results. He also wasn't able to rigorously explain certain mysteries of training neural networks, such as why rectified linear units work so well.
  • Non-convex Optimization. This is the topic I care about most in DNNs, because training a DNN is a non-convex optimization problem and current techniques, at best, only find a local minimum. Here's an experiment my team did for activity detection on mobile devices using sensor data. After extracting features with PCA and feeding them into a neural network, we got very good results. We then simulated the PCA feature extraction by adding another layer to the neural network, expecting the optimized weights of that first layer to be at least as good as the PCA weights. However, we got worse results, which indicates that the optimizer converged to a not-so-good local optimum (see the sketch after this list).
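
For concreteness, here is a minimal sketch of the equivalence that the experiment above relied on: a PCA projection is just an affine map, so one extra linear layer can represent it exactly, and learning that layer should in principle do no worse than fixing it to the PCA weights. The code uses synthetic data and scikit-learn purely for illustration; our actual pipeline and data are not shown here.

```python
# A runnable check that a PCA projection can be written as a linear layer.
# Synthetic data; scikit-learn stands in for whatever PCA implementation is used.
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.RandomState(0)
X = rng.randn(1000, 20)              # stand-in for extracted sensor features

pca = PCA(n_components=5).fit(X)
X_pca = pca.transform(X)             # input of the "PCA then neural net" pipeline

# The same projection expressed as a linear layer with weights W and bias b.
W = pca.components_.T                # shape (20, 5)
b = -pca.mean_.dot(W)
X_layer = X.dot(W) + b               # what a linear first layer could compute

print(np.allclose(X_pca, X_layer))   # True: the extra layer can express PCA exactly
```

Since the extra layer can represent the PCA projection exactly, the network with that layer should be at least as good; the fact that it trained to something worse is what points to a poor local optimum.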
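
To make the Word2Vec mention above a bit more concrete, here is a toy illustration of the word-vector arithmetic it popularized. The vectors below are made up for illustration only; a real model learns them from a large corpus.

```python
# Toy demo of word-vector arithmetic (king - man + woman ~ queen).
# These 4-dimensional vectors are invented; real embeddings have hundreds of
# dimensions and are learned from data.
import numpy as np

vectors = {
    "king":  np.array([0.8, 0.7, 0.1, 0.4]),
    "queen": np.array([0.8, 0.1, 0.7, 0.4]),
    "man":   np.array([0.2, 0.9, 0.1, 0.1]),
    "woman": np.array([0.2, 0.1, 0.9, 0.1]),
    "apple": np.array([0.1, 0.2, 0.2, 0.9]),
}

def cosine(a, b):
    return a.dot(b) / (np.linalg.norm(a) * np.linalg.norm(b))

# The famous analogy: king - man + woman should land near queen.
target = vectors["king"] - vectors["man"] + vectors["woman"]
candidates = [w for w in vectors if w not in ("king", "man", "woman")]
print(max(candidates, key=lambda w: cosine(vectors[w], target)))  # -> queen
```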

Distributed Machine Learning

There were many interesting posters/talks on the topic of distributed and large-scale machine learning at NIPS this year (such as this, this, and this workshop). However, what really excited me is not the work presented at NIPS but Spark, an Apache project built on top of HDFS for running iterative computations in a distributed fashion. (It's a shame that I wasn't aware of this project earlier.)

The cool thing about Spark is that it keeps the good part of Hadoop (i.e., HDFS) and re-engineers the rest. In particular:
  1. Iterative algorithms, which are the norm in ML, can keep their working data in memory for their whole lifetime; there is no more reading from and writing to disk on every iteration (sketched below).
  2. Support for more operators beyond Map and Reduce, which can be piped together and lazily evaluated (also sketched below). See more here. In a sense, Spark is similar to Parallel LINQ, but on HDFS data.
These are the main reasons why serious machine learning practitioners don't use Hadoop beyond HDFS. Yes, stay away from the Hadoop hype: HDFS is cool; the rest of Hadoop is not.
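
To illustrate point 1, here is a minimal sketch of batch gradient descent for logistic regression on a cached RDD. It assumes a local PySpark setup; the file path, feature count, number of iterations, and step size are made up, and labels are assumed to be +1/-1.

```python
# Sketch: iterative ML on Spark -- the data is read once, cached in memory,
# and every iteration works against the cached copy instead of the disk.
import numpy as np
from pyspark import SparkContext

sc = SparkContext("local", "LogisticRegressionSketch")

def parse(line):
    vals = np.array([float(x) for x in line.split(",")])
    return vals[0], vals[1:]          # (label, features), label in {+1, -1}

points = sc.textFile("hdfs:///data/train.csv").map(parse).cache()

w = np.zeros(10)                      # assumes 10 features per example
for _ in range(20):                   # 20 passes of batch gradient descent
    grad = points.map(
        lambda p: (1.0 / (1.0 + np.exp(-p[0] * p[1].dot(w))) - 1.0) * p[0] * p[1]
    ).reduce(lambda a, b: a + b)
    w -= 0.1 * grad

print(w)
```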
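
And a sketch of point 2: transformations chain together like a pipeline and nothing is computed until an action forces evaluation. The log path and the filtering logic are invented for illustration.

```python
# Sketch: piped, lazily evaluated operators beyond Map and Reduce.
from pyspark import SparkContext

sc = SparkContext("local", "LazyPipelineSketch")

# Each transformation below only extends a lazy execution plan ...
error_word_counts = (sc.textFile("hdfs:///logs/2013-12-*")
                       .filter(lambda line: "ERROR" in line)
                       .flatMap(lambda line: line.split())
                       .map(lambda word: (word, 1))
                       .reduceByKey(lambda a, b: a + b))

# ... and only this action triggers the actual distributed computation.
print(error_word_counts.take(10))
```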

Machine Learning on Sensor Data

To be (never) updated ...