active: mid 2011 to present

# Binary Log-Linear Models

And Information-Theoretic Priors

In my pursuit of simple, elegant machine learning algorithms, I came across Boltzmann machines, probably somewhere around 2010. The idea seemed mysterious at first, especially since I was used to deterministic, directed graph-based models (e.g. feedforward neural networks), whereas these were probabilistic, undirected models. But the more I studied them, the more I was intrigued, and the more connections I found to other areas of machine learning, statistics, and physics.

One useful generalization of Boltzmann machines is the set of binary "log-linear models," whose log probabilities are, up to a normalization term, linear functions of the parameters. Log-linear models are a flexible language for representing structure (or lack thereof) among variables, and they include all kinds of undirected graph-based models (e.g. Boltzmann machines, Markov networks, Ising models) as special cases.
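To make the definition concrete, here is a minimal sketch of a binary log-linear model over three variables, with one feature per variable and one per pair (the kind of structure a small Boltzmann machine without hidden units would have). The function names `features` and `log_linear_distribution`, the choice of features, and the parameter values are all illustrative assumptions, not part of any derivation described here. The normalization is done by brute-force enumeration of all states, which is only feasible for a handful of variables.

```python
import itertools
import math


def features(x):
    """Illustrative feature vector for a binary state x:
    each variable on its own, then each pairwise product."""
    singles = list(x)
    pairs = [x[i] * x[j] for i, j in itertools.combinations(range(len(x)), 2)]
    return singles + pairs


def log_linear_distribution(theta, n):
    """Return {state: probability} with p(x) proportional to exp(theta . f(x)).

    Enumerates all 2**n binary states, so this is only a toy for small n.
    """
    states = list(itertools.product([0, 1], repeat=n))
    # The unnormalized log probability is linear in the parameters theta.
    scores = [sum(t * f for t, f in zip(theta, features(x))) for x in states]
    # Log of the normalization term, computed stably via log-sum-exp.
    top = max(scores)
    log_z = top + math.log(sum(math.exp(s - top) for s in scores))
    return {x: math.exp(s - log_z) for x, s in zip(states, scores)}


n = 3
# Three single-variable weights followed by three pairwise weights
# (arbitrary values chosen for illustration).
theta = [0.5, -0.2, 0.1, 1.0, 0.0, -0.7]
probs = log_linear_distribution(theta, n)
```

Because the all-zeros state has a zero feature vector, the log ratio `log(probs[x] / probs[(0, 0, 0)])` equals the linear score `theta . f(x)` for every state `x`, which is the sense in which the log probabilities are linear in the parameters.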

In terms of learning these powerful models from data, I'm mainly focused on Bayesian methods, e.g. sampling the posterior and predictive distributions, or at least optimizing parameters over the posterior. These approaches of course require a well-defined prior distribution. Choosing a prior can be difficult for high-dimensional models, where less intuition is available. Fortunately, information theory provides a useful framework for defining priors that are optimal for prediction. This approach, which is well-studied in the field of objective Bayesian inference, leads to a particular prior with an interesting mathematical form and a wide range of useful properties. My goal here is to provide a principled default solution to the problem of overfitting in high-dimensional parameter spaces.

Deriving information-theoretic priors for log-linear models, and finding a way to make them computationally practical, have been my main focus for several years now. I am close to finishing a few key derivations, at which point I will write up the resulting math and begin running software experiments. As this has been quite a long process (from initial inspiration to where I am now), I am excited to see the results. More to come...