Language and Document Analysis: Motivating Latent Variable Models

Tutorial by

Wray Buntine
(NICTA, Helsinki Institute of IT)

Monday 7 September (Room Jupiter 2, Hotel Golf)

Abstract:

This tutorial first covers a variety of aspects of text and document modeling in order to motivate the use of topic models. It will briefly consider named entity recognition, natural language processing, and information retrieval, and look at some standard aspects of these problems. This will not be extensive coverage but rather a brief look at some of the issues, problems, methods, and software. Second, the tutorial will present the recent area of topic models, for instance Latent Dirichlet Allocation, Non-negative Matrix Factorisation, and related methods. These will be reviewed, with examples given using the author’s software DCA. Some basic experimental methods and variations will also be covered. This part will draw on material from the author’s papers here as well as from the broader literature.