Bayesian Generalized Linear Models
This lecture briefly introduces how to fit a generalized linear model from a Bayesian perspective. Prior specification proceeds similarly to Bayesian linear regression, and JAGS can be used to sample from the posterior with little extra effort.
Although we will be discussing a fairly simple example, the GLM framework can be combined with topics from earlier lectures, such as using random intercepts to pool data at the group level.
Sparrow example
This motivating example comes from Hoff’s book, page 244. Younger male sparrows may or may not nest during a mating season, based on their physical characteristics. Researchers have recorded the nesting success of 43 young male sparrows of the same age, as well as their wingspan in centimeters.




The goal is to model the relationship between nesting success and wingspan. The response data is binary.
It seems we tend to observe greater wingspans among the sparrows that successfully nested.


The response is no longer a continuous variable, so the normal distribution doesn’t seem appropriate here.
Logistic regression model
We observe the binary response variable \(Y_i \in \{0, 1\}\), indicating whether sparrow \(i\) has successfully nested. For the sampling model, we can think of this as a Binomial experiment with a single trial:
\begin{gathered} Y_i \mid \theta_i \sim \text{Ber}(\theta_i) \end{gathered}
where \(\theta_i\) is the probability of nesting success for sparrow \(i\) and is constrained to be between 0 and 1. We also assume the responses are conditionally independent given the probabilities of success. The probability mass function is thus:
\begin{gathered} p(y_i \mid \theta_i) = \theta_i^{y_i} (1 - \theta_i)^{1 - y_i}, \quad y_i \in \{0, 1\} \end{gathered}
The question is how to relate nesting success to wingspan, the covariate in this model. One idea is to treat this as a linear regression model, e.g. \(Y_i \mid \theta_i \sim \text{Ber}(\theta_i)\), where
\begin{equation} \theta_i = \beta_1 + \beta_2 x_i \tag{1} \end{equation}
The problem is that \(\theta_i\) is a probability, so it must lie between 0 and 1. The regression function in Eq. (1) places no constraints on \(\beta_1\), \(\beta_2\), or a positive \(x_i\), so there is no guarantee that for a sparrow with wingspan \(x_i\) we get a valid nesting probability \(\theta_i\) between 0 and 1.
From a Bayesian perspective, we could construct a prior distribution on \(\beta_1\) and \(\beta_2\) that forces \(\theta_i\) to lie between 0 and 1, but this is not a principled way to address the issue.
We may consider resolving this issue by mapping \(\theta_i\) to the real number line, i.e. from \(-\infty\) to \(\infty\), through the following transformation:
\begin{gathered} g(\theta_i) = \log\left(\frac{\theta_i}{1 - \theta_i}\right) \end{gathered}
Here \(g\) is the logit or log-odds function. Note that:
\begin{gathered} \theta_i \rightarrow 0 \implies g(\theta_i) \rightarrow -\infty \\ \theta_i \rightarrow 1 \implies g(\theta_i) \rightarrow +\infty \end{gathered}
\(g\) maps probabilities \(\theta_i\) to the real number line, which is exactly what we wanted! The issue could then be fixed by modeling the transformed probability linearly:
\begin{gathered} g(\theta_i) = \log\left(\frac{\theta_i}{1 - \theta_i}\right) = \beta_1 + \beta_2 x_i \end{gathered}
By doing this, we no longer need constraints on \(\beta_1\), \(\beta_2\), or \(x_i\). As for prior models, we no longer need priors on the \(\theta_i\)'s, but we still need priors on \(\beta_1\) and \(\beta_2\). Assuming we don't have much information, we can place diffuse (large-variance) normal priors on both coefficients.
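To make the mapping concrete, here is a small numerical sketch; the helper names `logit` and `inv_logit` and the coefficient values are ours, not from the lecture:

```python
import math

def logit(theta):
    """Map a probability in (0, 1) to the whole real line (its log-odds)."""
    return math.log(theta / (1 - theta))

def inv_logit(eta):
    """Map any real number back to a valid probability in (0, 1)."""
    return 1 / (1 + math.exp(-eta))

# No matter what beta1, beta2, or x are, the implied theta is a valid probability:
beta1, beta2, x = -11.0, 0.9, 13.0  # made-up values for illustration
theta = inv_logit(beta1 + beta2 * x)
print(0 < theta < 1)  # → True
```

The two functions are inverses of each other, which is what lets us move freely between the unconstrained regression scale and the probability scale.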
Generalized linear models
The previous section is an example of what we call generalized linear models. GLMs are used to relate a linear function of the predictor variables to a response variable \(Y\) that is not normally distributed. The basic idea is that we specify a sampling model for the response variable \(Y_i\) with parameter \(\eta_i\), and use a link function \(g\) to connect the parameter to the \(p\) predictors, i.e.,
\begin{gathered} g(\eta_i) = \beta_1 + \sum_{j=2}^{p+1} \beta_j x_{ij} \end{gathered}
The motivation behind this is that the parameter space of \(\eta_i\) might not be all real numbers, while the right-hand side can take any real value. So \(g\) (and its inverse^{1}) maps between the linear predictor and the appropriate parameter space.
There are many examples of GLMs, and two of the most common are logistic regression and Poisson regression. Logistic regression is used for binary (0/1) response data and uses a logit link. Poisson regression is used for count response data, and a log link is used:
\begin{gathered} Y_i \mid \theta_i \sim \text{Pois}(\theta_i) \\ g(\theta_i) = \log(\theta_i) = \beta_1 + \sum_{j=2}^{p+1} \beta_j x_{ij} \end{gathered}
The only constraint here is \(\theta_i > 0\), as it represents the mean number of events occurring.
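A quick numerical check of the log link, with made-up coefficients: the linear predictor can be any real number, yet the implied Poisson mean \(\exp(\eta)\) is always positive.

```python
import math

beta1, beta2 = -2.0, 0.5  # hypothetical coefficients, for illustration only
for x in (-10.0, 0.0, 10.0):
    eta = beta1 + beta2 * x  # unconstrained linear predictor
    theta = math.exp(eta)    # inverse log link: mean count, always > 0
    print(f"eta = {eta:6.1f} -> theta = {theta:.4f}")
```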
Back to sparrows
The full Bayesian logistic regression model for the sparrow data is given below. For \(i = 1, \cdots, n\), the sampling model is given by:
\begin{gathered} Y_i \mid \theta_i \sim \text{Ber}(\theta_i) \\ \log\left(\frac{\theta_i}{1-\theta_i} \right) = \beta_1 + \beta_2 x_i \end{gathered}
The prior model places independent diffuse normal priors on \(\beta_1\) and \(\beta_2\).
We can draw approximate samples from the posterior using MCMC, and fitting GLMs is fairly simple in JAGS.
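Although the lecture fits this model in JAGS, the mechanics of the fit can be illustrated with a minimal random-walk Metropolis sampler, as covered in the earlier Metropolis-Hastings lecture. Everything below is a sketch under stated assumptions: the wingspans are simulated rather than the real 43 measurements, and the \(\mathcal{N}(0, 10^2)\) priors and proposal scales are our own choices:

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in data: the real 43 wingspan measurements are in Hoff, p. 244;
# here we simulate values of a similar magnitude.
n = 43
x = rng.normal(12.5, 1.0, size=n)
true_beta = np.array([-11.0, 0.9])
y = rng.binomial(1, 1 / (1 + np.exp(-(true_beta[0] + true_beta[1] * x))))

def log_post(beta):
    """Log posterior: Bernoulli log-likelihood plus N(0, 10^2) priors."""
    eta = beta[0] + beta[1] * x
    loglik = np.sum(y * eta - np.logaddexp(0.0, eta))  # log p(y | beta)
    logprior = -0.5 * np.sum(beta**2) / 100.0          # diffuse normal priors
    return loglik + logprior

# Random-walk Metropolis
beta = np.zeros(2)
samples = np.empty((5000, 2))
for s in range(5000):
    proposal = beta + rng.normal(0.0, [1.0, 0.1])  # hand-tuned step sizes
    if np.log(rng.uniform()) < log_post(proposal) - log_post(beta):
        beta = proposal
    samples[s] = beta

post = samples[1000:]  # discard burn-in
print("posterior means:", post.mean(axis=0))
```

A dedicated sampler like JAGS handles the proposal tuning and convergence diagnostics for us, which is why it is the tool of choice in the lecture.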


Numerical summaries of the posterior regression coefficients are given below.
| Parameter | Posterior mean | Posterior sd | 95% posterior CI |
|-----------|----------------|--------------|------------------|
| $\beta_1$ | -11.01 | 4.91 | (-21.37, -2.03) |
| $\beta_2$ | 0.87 | 0.38 | (0.17, 1.68) |
Note that the 95% credible interval for \(\beta_2\) doesn't include 0, which is strong evidence that wingspan is related to nesting success.
Interpretation of the coefficients is still possible, just somewhat more complicated than in linear regression. \(E(\beta_1 \mid y) = -11.01\) means that for a wingspan of \(x_i = 0\), the log-odds of nesting success is \(-11.01\), i.e. the success probability is essentially zero:
\begin{gathered} \theta_i = \frac{e^{-11.01}}{1 + e^{-11.01}} \approx 0 \end{gathered}
(Of course, no sparrow has a wingspan near zero, so this is an extrapolation.)
\(E(\beta_2 \mid y) = 0.87\) means that for a 1 cm increase in wingspan, we expect the log-odds of nesting success to increase by 0.87. Equivalently, the odds are expected to increase by a factor of \(\exp(0.87) \approx 2.39\).
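The arithmetic behind these interpretations, using the posterior means from the table:

```python
import math

b1, b2 = -11.01, 0.87  # posterior means of the coefficients

# Odds ratio for a 1 cm increase in wingspan:
odds_ratio = math.exp(b2)
print(round(odds_ratio, 2))  # → 2.39

# Implied success probability at x = 0 (an extrapolation; shown only
# to illustrate the scale of the intercept):
p0 = 1 / (1 + math.exp(-b1))
print(p0)  # ≈ 1.65e-05
```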
Predictions
Suppose we have a new sparrow with a wingspan of 13.7 cm. Do we expect this sparrow to nest successfully? Given \(x^* = 13.7\), one way to answer this question is to look at its posterior probability of nesting success, based on the observed data.
We may compute \(\theta_{\text{pred}} = \frac{e^{\beta_1 + \beta_2 x^*}}{1 + e^{\beta_1 + \beta_2 x^*}}\) from the posterior draws of \(\beta_1\) and \(\beta_2\). The posterior mean is 0.71 and the 95% CI is (0.52, 0.87), which suggests it is more likely than not that this particular sparrow will nest.
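Plugging the posterior means into the inverse-logit closely matches the reported posterior mean (the full posterior of \(\theta_{\text{pred}}\) would transform every MCMC draw of \((\beta_1, \beta_2)\), not just their means):

```python
import math

b1, b2 = -11.01, 0.87  # posterior means from the table
x_star = 13.7          # wingspan of the new sparrow (cm)

eta = b1 + b2 * x_star
theta_pred = 1 / (1 + math.exp(-eta))  # inverse-logit
print(round(theta_pred, 2))  # → 0.71
```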
Notes
We can build GLMs with multiple predictors, and use similar ideas from Bayesian linear regression to think about model selection, e.g. DIC. We can also account for random effects by combining GLMs with ideas from hierarchical modeling. Keeping the sampling model unchanged, the new regression function is:
\begin{gathered} \log\left(\frac{\theta_i}{1-\theta_i} \right) = \beta_1 + \beta_2 x_i + \alpha_{s_i} \end{gathered}
where \(s_i\) represents the nesting location of sparrow \(i\), and the \(\alpha_{s_i}\) are the random intercepts. The “location-level model” is:
\begin{gathered} \alpha_1, \cdots, \alpha_m \mid \tau^2 \overset{iid}{\sim} \mathcal{N}(0, \tau^2) \end{gathered}
with a prior placed on the between-location variance \(\tau^2\).
Priors for \(\beta_1\) and \(\beta_2\) are also the same as before.
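To make the random-intercept structure concrete, here is a small numerical sketch; the location labels and \(\alpha\) values are made up for illustration, and the coefficients are the posterior means from the earlier table:

```python
import math

beta1, beta2 = -11.01, 0.87
alpha = {"meadow": 0.4, "forest": -0.4}  # hypothetical random intercepts

def success_prob(x, location):
    """Inverse-logit of the random-intercept linear predictor."""
    eta = beta1 + beta2 * x + alpha[location]
    return 1 / (1 + math.exp(-eta))

# Same wingspan, different nesting locations -> different probabilities:
for loc in ("meadow", "forest"):
    print(loc, round(success_prob(13.7, loc), 2))  # meadow ≈ 0.79, forest ≈ 0.62
```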
There are entire courses devoted to Bayesian GLMs, and this lecture is only meant to be a simple introduction to the topic.
One requirement of link functions is they should be invertible. ↩︎