This tutorial covers the basic motivation, ideas and theory of Gaussian Processes (GPs) and several applications to natural language processing tasks. GPs are a powerful modelling framework combining kernels and Bayesian inference, and are recognised as state-of-the-art for many machine learning tasks. The tutorial focuses primarily on regression and classification, both fundamental techniques in widespread use in the NLP community.

We argue that the GP framework offers many benefits over commonly used machine learning frameworks, such as linear models (logistic regression, least squares regression) and support vector machines (SVMs). GPs are fully Bayesian, giving a posterior over the variables of interest, and their probabilistic formulation allows them to be incorporated into larger graphical models, unlike SVMs. Moreover, the properties of the Gaussian distribution mean that GP regression supports analytic formulations of the posterior and predictive distributions, avoiding the approximation errors that plague the approximate inference techniques in common use for Bayesian models (e.g. MCMC, variational Bayes). Overall, GPs provide an elegant, flexible and simple means of probabilistic inference.

GPs have been actively researched since the early 2000s and are now reaching maturity: the fundamental theory and practice are well understood, and research now focuses on applications and on improved inference algorithms, e.g. for scaling inference to large and high-dimensional datasets. Several open-source packages (e.g. GPy and GPML) make GPs easy to use in many applications (a short GPy sketch follows the outline below). This tutorial presents the main ideas and theory behind GPs and recent applications to NLP, emphasising their potential for widespread application across many NLP tasks.
  • GP Regression
    • The Gaussian distribution
    • GPs as a prior over functions
    • Kernels
  • NLP Applications
    • Sparse GPs: Predicting user impact
    • Multi-output GPs: Modelling multi-annotator data
    • Model selection: Identifying temporal patterns in word frequencies
  • Further topics
    • Non-conjugate likelihoods: classification, counts and ranking
    • Structured prediction
The tutorial assumes a basic understanding of probabilistic inference, calculus and linear algebra.
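As a concrete illustration of the GPy package mentioned above, here is a minimal GP regression sketch on toy data. The dataset, the RBF kernel choice and the hyperparameter settings are illustrative assumptions for this page only, not part of the tutorial materials.

```python
import numpy as np
import GPy

# Toy 1-D regression data (illustrative assumption): noisy sine observations.
# GPy expects 2-D arrays of shape (n_points, n_dims).
rng = np.random.RandomState(0)
X = rng.uniform(-3.0, 3.0, (20, 1))
y = np.sin(X) + 0.1 * rng.randn(20, 1)

# Squared-exponential (RBF) kernel; variance and lengthscale are
# hyperparameters, fit below by maximising the marginal likelihood.
kernel = GPy.kern.RBF(input_dim=1, variance=1.0, lengthscale=1.0)

# GP regression with a Gaussian likelihood: the posterior and predictive
# distributions are available in closed form, e.g. the predictive mean is
# k(x*, X) (K + sigma^2 I)^{-1} y.
model = GPy.models.GPRegression(X, y, kernel)
model.optimize(messages=False)   # type-II maximum likelihood for hyperparameters

# Predictive mean and variance at new inputs.
X_new = np.linspace(-3.0, 3.0, 100)[:, None]
mean, var = model.predict(X_new)
print(mean[:3], var[:3])
```

The same model-kernel pattern carries over to the tutorial's other settings (sparse GPs, multi-output GPs, non-Gaussian likelihoods), which GPy also supports.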

Materials

You can download the ACL slides or the newer ALTA slides. Note that some slides are borrowed from Neil Lawrence, Richard Turner and Daniel Preotiuc-Pietro, who kindly agreed to share their materials.

Papers

Here are links to my papers relevant to the tutorial.
Joint Emotion Analysis via Multi-task Gaussian Processes
Daniel Beck, Trevor Cohn and Lucia Specia. In Proceedings of EMNLP (short papers), 2014.
Abstract PDF Code
Predicting and characterising user impact on Twitter
Vasileios Lampos, Nikolaos Aletras, Daniel Preoţiuc-Pietro and Trevor Cohn. In Proceedings of EACL, 2014.
Abstract PDF
Reducing annotation effort for quality estimation via active learning
Daniel Beck, Lucia Specia and Trevor Cohn. In Proceedings of ACL (short papers), 2013.
PDF
Modelling annotator bias with multi-task Gaussian Processes: An application to machine translation quality estimation
Trevor Cohn and Lucia Specia. In Proceedings of ACL, 2013.
PDF
A temporal model of text periodicities using Gaussian Processes
Daniel Preotiuc-Pietro and Trevor Cohn. In Proceedings of EMNLP, 2013.
PDF