- Deep Learning (or Deep Neural Network or DNN) is again the trendiest topic of the conference. Its workshop session was perhaps twice as big as last year's (or more) and it was packed for most of the day. Interestingly, Mark Zuckerberg of Facebook stopped by (bringing security guards with him) for a Q&A and then a panel discussion session. His visit was mostly to announce Facebook's new AI Lab, which focuses on "long-term" deep learning research, and to promote AI research at FB. For technical highlights, see below.
- Distributed machine learning is another topic of huge interest.
- There is a growing market for, and interest in, predictive analytics on sensor data (e.g. activity detection on mobile phones or wearable devices).
- And there is certainly bad-ass research in other areas which I have missed. Among the topics of my interest, however, Optimization (particularly non-convex optimization) hasn't made much progress.
Deep Neural Nets
- Natural Language Processing. Application of DNN in NLP is the theme of the deep learning research this year. This is natural because NLP is the holy grail of machine learning research, and DNN has already convincingly demonstrated its power in Computer Vision and Speech Recognition. There was some cool research using DNN in NLP, such as Compositional Natural Language Parsing with Compositional Vector Grammars by a team at Stanford (led by Richard Socher) and the Word2Vec project at Google (led by Tomas Mikolov). I think that this is just the beginning though.
- Computer Vision. A new benchmark for ImageNet has been established by Matt Zeiler et al., although it's not a big improvement over the previous record set by Alex Krizhevsky et al. (see their famous paper). Although Matt's work received a lot of attention from the community (come on, he set a new record), I was slightly disappointed. The spirit of his paper is about understanding convolutional neural networks, but he did not explain why his network (which is a customized version of Alex's network) yields better results. He also wasn't able to rigorously explain certain mysteries in training neural networks (such as why rectified linear units work so well).
- Non-convex Optimization. This is the topic that I care about the most in DNN because training a DNN is a non-convex optimization problem. The current techniques only try to find a local minimum, at best. Here's an experiment that my team did for activity detection on mobile devices using sensor data. After extracting features using PCA and feeding the extracted features into a neural network, we got very good results. We then simulated the PCA feature extraction by adding another layer to the neural network. We expected the optimized weights of that first layer to be identical to the PCA weights, if not better, since the PCA projection is one of the solutions the new layer can represent. However, we got worse results. This indicates that the optimizer converged to a not-so-good local optimum.
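The reasoning behind that expectation can be sketched in a few lines. This is a minimal illustration, not the setup my team used: the data is synthetic, all names (`W_pca`, `W_layer`, `b_layer`) are made up for the example, and it assumes the extra layer is linear (no activation). Under those assumptions, a PCA projection is exactly a linear layer with particular weights and bias, so a perfect optimizer could always recover it.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for sensor features: 200 samples, 6 correlated channels.
latent = rng.normal(size=(200, 2))
mixing = rng.normal(size=(2, 6))
X = latent @ mixing + 0.05 * rng.normal(size=(200, 6))

# PCA via SVD of the centered data.
mean = X.mean(axis=0)
Xc = X - mean
_, _, Vt = np.linalg.svd(Xc, full_matrices=False)
k = 2
W_pca = Vt[:k].T        # shape (6, k): top-k principal directions as columns
Z_pca = Xc @ W_pca      # PCA features

# The same projection written as a linear layer: z = x @ W + b.
W_layer = W_pca
b_layer = -mean @ W_pca
Z_layer = X @ W_layer + b_layer

# Both routes produce the same features, up to floating-point error.
max_diff = np.max(np.abs(Z_pca - Z_layer))
print("max difference:", max_diff)
```

Since the PCA weights are one point in the weight space of the extended network, getting worse results after training says something about the optimizer, not about what the network can represent.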
From the scientific standpoint, the sexy part about DNN is that it can model very complicated machine learning tasks without doing too much feature engineering. In practice, though, the optimizer only finds local optima, and that's why training DNNs still requires a significant amount of engineering. Until there's a breakthrough in global optimization techniques for non-convex problems, or some convex remodeling of DNN, DNN will be just another periodic trend.
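How much a local method depends on its starting point is easy to see on a toy problem. The sketch below is only an illustration: the one-dimensional function `f`, the learning rate, and the starting points are all made up for the example and have nothing to do with a real network. It runs plain gradient descent on a non-convex function from two different initializations.

```python
def f(x):
    # A simple non-convex function with two minima (hypothetical example).
    return x**4 - 3 * x**2 + x


def grad_f(x):
    # Derivative of f.
    return 4 * x**3 - 6 * x + 1


def gradient_descent(x0, lr=0.01, steps=1000):
    # Plain gradient descent with a fixed step size.
    x = x0
    for _ in range(steps):
        x -= lr * grad_f(x)
    return x


x_left = gradient_descent(-2.0)   # start on the left side
x_right = gradient_descent(2.0)   # start on the right side

print("from -2.0:", x_left, f(x_left))
print("from +2.0:", x_right, f(x_right))
```

The run started at -2.0 settles near x ≈ -1.30, which is the global minimum, while the run started at 2.0 settles near x ≈ 1.13, a worse local minimum. Same function, same algorithm; only the initialization differs.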
Distributed Machine Learning
The cool thing about Spark is that it inherits the good part of Hadoop (HDFS) and re-engineers the rest. In particular:
- Iterative algorithms, which are the norm in ML, can keep their data in memory for their whole lifetime. No more reading from / writing to the disk for each iteration.
- Support for more operators, beyond Mapping and Reducing, which can be piped and lazily evaluated. See more here. In a sense, Spark is similar to Parallel LINQ but on HDFS data.
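The two points above can be illustrated with a toy in plain Python. To be clear, this is not Spark's API: `LazyDataset`, `load_from_disk`, and the little averaging loop are hypothetical names invented for this sketch, and everything runs on a single machine. It only shows the idea of piping operators that do nothing until a result is requested, and of caching a dataset in memory so an iterative algorithm reads the source once.

```python
from functools import reduce


class LazyDataset:
    """Toy lazily evaluated, cacheable dataset (illustration only, not Spark)."""

    def __init__(self, compute):
        self._compute = compute        # zero-argument callable returning an iterator
        self._cache_enabled = False
        self._cached = None

    def _iter(self):
        if self._cache_enabled:
            if self._cached is None:
                self._cached = list(self._compute())   # materialize once
            return iter(self._cached)
        return self._compute()

    def map(self, fn):
        return LazyDataset(lambda: (fn(x) for x in self._iter()))

    def filter(self, pred):
        return LazyDataset(lambda: (x for x in self._iter() if pred(x)))

    def cache(self):
        self._cache_enabled = True
        return self

    def reduce(self, fn):
        # An "action": this is what finally triggers evaluation.
        return reduce(fn, self._iter())


reads = {"count": 0}


def load_from_disk():
    # Stand-in for an expensive read; counts how often it is called.
    reads["count"] += 1
    return iter(range(1, 11))


# Piped operators: nothing is read yet.
points = LazyDataset(load_from_disk).filter(lambda x: x % 2 == 0).map(float).cache()
reads_before_action = reads["count"]

# A small iterative algorithm: move w toward the mean of the points.
n = 5
w = 0.0
for _ in range(5):
    total = points.map(lambda x: x - w).reduce(lambda a, b: a + b)
    w += 0.5 * total / n

print("reads before any action:", reads_before_action)
print("reads after 5 iterations:", reads["count"])
print("w:", w)
```

Without the `cache()` call, every iteration would go back to `load_from_disk`, which is roughly the disk round trip per iteration that plain MapReduce forces on iterative algorithms.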
The not-so-cool thing is that the whole Hadoop technology stack is in Java. Not sure whether we should wait for (or initiate) a .NET version of Spark or convert our ML code base to Java/Scala.
Machine Learning on Sensor Data
To be updated ...