How to Combine Variational Inference and MCMC and Get the Best of Both Worlds

Speaker:

Francisco J. R. Ruiz

Date:

February 4th 2020


Approximating the posterior of a probabilistic model is the central goal of Bayesian inference, and variational inference (VI) and Markov chain Monte Carlo (MCMC) are two of the main tools for doing so. Each method has its own advantages: MCMC is asymptotically exact, while VI is typically faster and is amenable to amortization, which allows for further speed-ups in latent variable models. In this talk, we present a method that combines MCMC and VI, leveraging the advantages of both approaches. Specifically, we improve the variational distribution by running a few MCMC steps initialized from it. To make inference tractable, we introduce the variational contrastive divergence (VCD), a new divergence that replaces the standard Kullback-Leibler (KL) divergence used in VI. The VCD captures a notion of discrepancy between the initial variational distribution and its improved version (obtained after running the MCMC steps), and it converges asymptotically, as the number of MCMC steps grows, to the symmetrized KL divergence between the variational distribution and the posterior of interest. The VCD objective can be optimized efficiently with respect to the variational parameters via stochastic optimization. We show experimentally that optimizing the VCD leads to better predictive performance on two latent variable models: logistic matrix factorization and variational autoencoders (VAEs).
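
The idea of improving a variational distribution with a few MCMC steps can be illustrated with a toy sketch. The code below is not the method presented in the talk and does not compute the VCD. It only assumes a hypothetical one-dimensional target N(2, 0.5^2), a deliberately mismatched initial distribution q = N(0, 1), and a plain random-walk Metropolis kernel. The names log_target and refine_with_mh are made up for this illustration.

```python
import numpy as np


def log_target(z):
    # Unnormalized log-density of a toy "posterior", N(2, 0.5^2).
    # This target is an assumption chosen purely for illustration.
    return -0.5 * ((z - 2.0) / 0.5) ** 2


def refine_with_mh(z0, n_steps, step_size, rng):
    # Run a few random-walk Metropolis steps on every sample in parallel.
    z = z0.copy()
    for _ in range(n_steps):
        proposal = z + step_size * rng.standard_normal(z.shape)
        log_accept = log_target(proposal) - log_target(z)
        accept = np.log(rng.uniform(size=z.shape)) < log_accept
        z = np.where(accept, proposal, z)
    return z


rng = np.random.default_rng(0)

# Samples from the initial variational distribution q = N(0, 1).
z_init = rng.standard_normal(5000)

# "Improved" samples: the same draws after a handful of MCMC steps.
z_improved = refine_with_mh(z_init, n_steps=5, step_size=0.5, rng=rng)

# Distance of each sample mean from the target mean (2.0).
gap_init = abs(z_init.mean() - 2.0)
gap_improved = abs(z_improved.mean() - 2.0)
print(f"gap before MCMC: {gap_init:.3f}, after: {gap_improved:.3f}")
```

Even a handful of steps moves the samples toward the target, which is the sense in which the refined distribution is "improved". In the method described in the abstract, the discrepancy between the initial and improved distributions is what drives the update of the variational parameters; that part is not sketched here.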


Francisco J. R. Ruiz is a Research Scientist at DeepMind in London. Before joining DeepMind, he was a Postdoctoral Research Fellow in the Department of Computer Science at Columbia University and in the Engineering Department at the University of Cambridge, where he held a Marie Skłodowska-Curie Individual Fellowship in the context of the E.U. Horizon 2020 program. Francisco completed his Ph.D. in 2015 and M.Sc. in 2012, both from the University Carlos III in Madrid. His research focuses on statistical machine learning; in particular, his interests include approximate Bayesian inference, probabilistic models for discrete data, topic modeling, generative models, and time series models.