Monday, July 14, 2014

ICML 2014 Highlights 2: On Deep Learning and Language Modeling

Previously: Highlights #1 - On ML Fundamentals

Deep Learning and Language Modeling

Image classification seems to be a thing of the past. The current wave of DL research is all about language modeling. Here are some interesting works on this front.

Distributed Representations of Sentences and Documents

(by Quoc Le and Tomas Mikolov)

This is an extension from word vectors to sentence and paragraph vectors.

  • All word/sentence/paragraph vector representation training is unsupervised
  • In all cases (word, sentence, or paragraph), the training is done by solving a meta problem: trying to predict the next word.
  • Suppose that a paragraph can be represented by a vector, not necessarily of the same length as the word vectors.
  • In predicting the next word, the paragraph vector and the previous words are joined (via concatenation or averaging) to form a new vector.
  • There are two stages: training and inference
  • Training
    • Paragraph and word representations are trained together
    • The objective of this stage is to build a dictionary of word vectors
  • Inference
    • Freeze the word vectors and run gradient descent on the paragraph vector (remember that we're training for the task of predicting the next word)
    • Quoc commented that the inference for each new paragraph takes about 30 minutes, which is not acceptable in practice. FYI, their implementation was done in C++, not MATLAB.
Still unclear
  • What is the semantics of concatenating or averaging word (and paragraph) vectors?
  • Is training to predict the next (or middle) word the best mechanism to learn the vector representation of a sentence or paragraph?
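The inference recipe above (freeze the word vectors, run gradient descent on the paragraph vector alone, with next-word prediction as the objective) can be sketched in a few lines of numpy. This is an illustrative toy, not the paper's implementation: the vocabulary, dimensions, two-word context, and learning rate are all made-up assumptions, and the word vectors here are random rather than pre-trained.

```python
import numpy as np

rng = np.random.default_rng(0)

vocab = ["the", "cat", "sat", "on", "mat"]
idx = {w: i for i, w in enumerate(vocab)}
V, d_w, d_p = len(vocab), 8, 8                # toy sizes (assumed)

W = rng.normal(0, 0.1, (V, d_w))              # word vectors (the shared dictionary)
U = rng.normal(0, 0.1, (V, d_p + 2 * d_w))    # softmax weights over the concatenation

def predict_next(par_vec, context):
    """Concatenate paragraph vector with the context word vectors; softmax over vocab."""
    h = np.concatenate([par_vec] + [W[idx[w]] for w in context])
    logits = U @ h
    e = np.exp(logits - logits.max())
    return e / e.sum()

# Inference for a new paragraph: W and U stay frozen; only d_new is updated.
d_new = rng.normal(0, 0.1, d_p)
target = idx["sat"]                           # pretend "sat" follows "the cat" here
p_before = predict_next(d_new, ["the", "cat"])
for _ in range(100):
    p = predict_next(d_new, ["the", "cat"])
    grad_h = U.T @ (np.eye(V)[target] - p)    # d log p[target] / d h
    d_new += 0.5 * grad_h[:d_p]               # update only the paragraph slice of h
p_after = predict_next(d_new, ["the", "cat"])
```

In the paper's training stage, W, U, and the training paragraphs' vectors would all be learned jointly; only the inference step for an unseen paragraph is sketched here.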

Multimodal Neural Language Models

(by Ruslan Salakhutdinov, Richard Zemel, and their student Ryan Kiros - Univ. of Toronto)

They investigated an image-text multimodal language model. There are two applications of this multimodal model:
  1. Retrieve images given complex sentences or queries
  2. Generate description and tags given an image
They have a demo; I've tried it briefly and here's my impression:

  • The description generation is very off for many images that I've tried (a variety of people, outdoor scenes, objects, etc.)
  • However, the tag generation seems quite good.

Workshop keynote: Learning to Represent Natural Language

(By Yoshua Bengio)

This is a survey talk about Deep Learning applied to NLP, which touches on many topics within Deep Learning.
  • Successful representation of words. However, sentence/paragraph representation is still a challenging problem.
  • Deep architectures are more expressive: some functions compactly represented with k layers may require exponential sizes if using just 2 layers.
  • DL should be thought of as representation learning (not some magical AI technology that is overhyped by media)
    • Its true power would come in multi-task learning. Unfortunately, there's no real-world example yet to show this power.
    • Relational learning
      • Traditional ML treats data as a matrix
      • In relational learning: data come from multiple sources, with different schemas
      • Relational learning tries to learn the shared representations
  • Curriculum learning
    • Humans and animals learn much better when the examples are not randomly presented but organized in a meaningful order which introduces gradually more concepts, and gradually more complex ones.
  • Optimization challenges in DL: the problem is non-convex => many local minima; even worse: the problem is perhaps ill-conditioned.
  • Recurrent neural net tricks: if non-recurrent DNN papers are a garden full of tricks, then the recurrent neural net literature is a zoo of tricks. Too many to list here in a meaningful way.
  • Machine translation: my favorite one
    • Machine translation is Bengio's current focus; he is looking to radically improve machine translation through deep learning.
    • He proposed the encoder-decoder framework for machine translation. I'm sold on this approach. Here're a few captured slides about this approach.

    • He also composed a list of papers on neural nets for machine translation here. Cool!
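The encoder-decoder idea can be sketched with a toy numpy RNN: the encoder compresses the source sentence into one fixed-length vector, and the decoder unrolls from that vector to emit target tokens. Everything below is an illustrative assumption (vocab sizes, dimensions, random untrained weights, greedy decoding); it shows the data flow, not a working translator.

```python
import numpy as np

rng = np.random.default_rng(1)

V_src, V_tgt, d = 6, 7, 16                    # toy vocab sizes and hidden size (assumed)
Ex = rng.normal(0, 0.1, (V_src, d))           # source token embeddings
Ey = rng.normal(0, 0.1, (V_tgt, d))           # target token embeddings
enc_Wx, enc_Wh = rng.normal(0, 0.1, (d, d)), rng.normal(0, 0.1, (d, d))
dec_Wx, dec_Wh = rng.normal(0, 0.1, (d, d)), rng.normal(0, 0.1, (d, d))
Wout = rng.normal(0, 0.1, (V_tgt, d))         # projects decoder state to target vocab

def rnn_step(Wx, Wh, x, h):
    return np.tanh(Wx @ x + Wh @ h)

def translate(src_ids, max_len=5, bos=0):
    # Encoder: fold the whole source sentence into a single fixed vector c.
    h = np.zeros(d)
    for i in src_ids:
        h = rnn_step(enc_Wx, enc_Wh, Ex[i], h)
    c = h
    # Decoder: start from c and greedily emit target tokens.
    out, y, h = [], bos, c
    for _ in range(max_len):
        h = rnn_step(dec_Wx, dec_Wh, Ey[y], h)
        y = int(np.argmax(Wout @ h))
        out.append(y)
    return out

toks = translate([1, 2, 3])
```

The key design point is the bottleneck vector c: the decoder sees the source only through that single vector, which is what makes the framework end-to-end trainable.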

Take home for our team
  • Implement RNN as soon as possible. It is the state-of-the-art for sequence labeling, speech recognition, machine translation, and language understanding.
  • Which RNN to implement: long short-term memory (LSTM) RNN (confirmed to work best by Bengio and Li Deng)
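For reference, the LSTM cell recommended above can be sketched in numpy as follows. This is a minimal single-layer sketch with made-up sizes and random inputs; a real implementation would add biases, batching, and backpropagation through time.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(params, x, h, c):
    """One LSTM step: input, forget, and output gates plus a candidate cell update."""
    Wi, Wf, Wo, Wc = params
    z = np.concatenate([x, h])                # gate inputs: current x and previous h
    i = sigmoid(Wi @ z)                       # input gate: how much new info to write
    f = sigmoid(Wf @ z)                       # forget gate: how much old cell to keep
    o = sigmoid(Wo @ z)                       # output gate: how much cell to expose
    c_new = f * c + i * np.tanh(Wc @ z)       # cell state carries long-range memory
    h_new = o * np.tanh(c_new)
    return h_new, c_new

rng = np.random.default_rng(2)
d_in, d_h = 4, 8                              # toy input/hidden sizes (assumed)
params = [rng.normal(0, 0.1, (d_h, d_in + d_h)) for _ in range(4)]
h = c = np.zeros(d_h)
for t in range(10):                           # unroll over a random toy sequence
    h, c = lstm_step(params, rng.normal(size=d_in), h, c)
```

The additive cell update (`f * c + ...`) is what lets gradients flow over long sequences, which is why LSTM tends to beat plain RNNs on the tasks listed above.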