## Dice, Polls & Dirichlet Multinomials
[ Category: Development (Python) | Posted by: 店小二03 | Date: 2019 | Author: 红领巾 ]

Photo by Jonathan Petersson on Unsplash

As part of a longer-term project to learn Bayesian statistics, I’m currently reading Bayesian Data Analysis, 3rd Edition by Andrew Gelman, John Carlin, Hal Stern, David Dunson, Aki Vehtari, and Donald Rubin, commonly known as BDA3. Although I’ve been using Bayesian statistics and probabilistic programming languages, like PyMC3, in projects for the last year or so, this book forces me to go beyond a pure practitioner’s approach to modeling, while still delivering very practical value.

Below are a few takeaways from the earlier chapters of the book that I found interesting. They are meant to inspire others to learn about Bayesian statistics, without trying to be overly formal about the math. If something doesn’t look 100% right to the trained mathematicians in the room, please let me know, or just squint a little harder. ;)

We’ll cover:

- Some common conjugate distributions
- An example of the Dirichlet-Multinomial distribution using dice rolls
- Two examples involving polling data from BDA3

Conjugate Distributions

In Chapter 2 of the book, the authors introduce several choices for prior probability distributions, along with the concept of conjugate distributions in section 2.4.

From Wikipedia

In Bayesian probability theory, if the posterior distributions p(θ | x) are in the same probability distribution family as the prior probability distribution p(θ), the prior and posterior are then called conjugate distributions, and the prior is called a conjugate prior for the likelihood function.

John Cook has a helpful diagram on his website that shows some common families of conjugate distributions:

[Figure: Conjugate Priors]

Conjugate distributions are a very important concept in probability theory, largely because of some nice mathematical properties that make computing the posterior more tractable. Even with increasingly better computational tools, such as MCMC, models based on conjugate distributions remain advantageous: the posterior is available in closed form, so no sampling is needed.

Beta-Binomial

One of the better-known examples of conjugate distributions is the Beta-Binomial distribution, which is often used to model series of coin flips (the ever-present topic in posts about probability).

While the Binomial distribution represents the number of successes in a series of Bernoulli trials, the Beta distribution here represents the prior probability distribution of the probability of success for each trial.

Thus, the probability p of a coin landing on heads is modeled as Beta distributed (with parameters α and β), while the number of heads y is assumed to follow a Binomial distribution with parameters n (the number of flips) and the Beta-distributed p, thus creating the link.

$$p \sim \mathrm{Beta}(\alpha, \beta)$$

$$y \sim \mathrm{Binomial}(n, p)$$
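
To make the conjugacy concrete, here is a minimal sketch of the Beta-Binomial update. The prior parameters and the coin-flip counts are made up for illustration and do not come from BDA3. Because the Beta prior is conjugate to the Binomial likelihood, observing y heads in n flips turns a Beta(α, β) prior into a Beta(α + y, β + n − y) posterior, with no sampling required.

```python
from scipy import stats

# Hypothetical prior and data, chosen only for illustration
alpha_prior, beta_prior = 2.0, 2.0   # Beta(2, 2): mild prior belief in a fair coin
n_flips, n_heads = 10, 7             # pretend we observed 7 heads in 10 flips

# Conjugate update: add heads to alpha and tails to beta
alpha_post = alpha_prior + n_heads
beta_post = beta_prior + (n_flips - n_heads)
posterior = stats.beta(alpha_post, beta_post)

posterior_mean = posterior.mean()            # alpha_post / (alpha_post + beta_post)
ci_low, ci_high = posterior.interval(0.95)   # central 95% credible interval

print(f"Posterior: Beta({alpha_post}, {beta_post})")
print(f"Posterior mean of p: {posterior_mean:.3f}")
print(f"95% credible interval: ({ci_low:.3f}, {ci_high:.3f})")
```

The prior parameters act like pseudo-counts: Beta(2, 2) behaves roughly as if we had already seen two heads and two tails before flipping the coin.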

Gamma-Poisson

Another often-used conjugate distribution is the Gamma-Poisson distribution, so named because the rate parameter λ that parameterizes the Poisson distribution is modeled as a Gamma distribution:

$$\lambda \sim \mathrm{Gamma}(k, \theta)$$

$$y \sim \mathrm{Poisson}(\lambda)$$

The discrete Poisson distribution is often used for count data, such as store customers, eCommerce orders, or website visits. The Gamma distribution is a useful model for the rate λ at which these events occur, since it takes only positive continuous values but is otherwise quite flexible in its parameterization.

[Figure: Gamma Distributions]

The Gamma-Poisson distribution is also known as the Negative-Binomial distribution, which we can think of as a mixture of Poisson distributions.
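
The mixture interpretation can be checked numerically. The sketch below assumes that Gamma(k, θ) uses the shape/scale parameterization (the text above does not say which one is meant) and uses arbitrary values for k and θ. It integrates the Poisson probability over the Gamma-distributed rate and compares the result with SciPy’s negative binomial, which under this assumption has parameters n = k and p = 1 / (1 + θ).

```python
import numpy as np
from scipy import stats
from scipy.integrate import quad

# Hypothetical Gamma shape and scale, chosen only for illustration
k, theta = 3.0, 2.0

def gamma_poisson_pmf(y, k, theta):
    """Marginal P(y): average the Poisson pmf over a Gamma(k, theta) rate."""
    def integrand(lam):
        return stats.poisson.pmf(y, lam) * stats.gamma.pdf(lam, a=k, scale=theta)
    value, _ = quad(integrand, 0, np.inf)
    return value

ys = np.arange(0, 15)

# Poisson probabilities mixed over the Gamma-distributed rate
mixture = np.array([gamma_poisson_pmf(y, k, theta) for y in ys])

# Negative binomial with n = k and p = 1 / (1 + theta)
neg_binom = stats.nbinom.pmf(ys, n=k, p=1.0 / (1.0 + theta))

max_abs_diff = np.max(np.abs(mixture - neg_binom))
print(f"Largest difference between the two pmfs: {max_abs_diff:.2e}")
```

The two sets of probabilities should agree up to numerical integration error, which is what “a mixture of Poisson distributions” means here.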

If you find this confusing, you’re not alone, and maybe you’ll start to appreciate why so often we try to approximate things using the good old Normal distribution…

Dirichlet-Multinomial

A perhaps even more interesting yet seemingly less talked-about example of conjugate distributions is the Dirichlet-Multinomial distribution, introduced in Chapter 3 of BDA3.

One way to think about the Dirichlet-Multinomial distribution is that while the Multinomial (i.e. multiple choices) distribution is a generalization of the Binomial distribution (i.e. binary choice), the Dirichlet distribution is a generalization of the Beta distribution. That is, while the Beta distribution models our uncertainty about a single probability p, the Dirichlet models the probabilities of multiple, mutually exclusive choices. It is parameterized by a vector a, referred to as the concentration parameter, which represents the weights for each choice (we’ll see more on that later).

In other words, think of coins for the Beta-Binomial distribution and dice for the Dirichlet-Multinomial distribution.

$$\theta \sim \mathrm{Dirichlet}(a)$$

$$y \sim \mathrm{Multinomial}(n, \theta)$$
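
As a rough sketch of what these two lines mean, the snippet below draws one set of face probabilities θ from a Dirichlet and then simulates die rolls from the matching Multinomial. The concentration vector (all ones), the number of rolls, and the random seed are arbitrary choices for illustration. The last step shows the conjugate update: the posterior is again a Dirichlet whose concentration is the prior concentration plus the observed counts.

```python
import numpy as np

rng = np.random.default_rng(42)  # seed chosen arbitrarily for reproducibility

# Hypothetical setup: a six-sided die with one pseudo-count per face
a = np.ones(6)
n_rolls = 600

# Generative model: theta ~ Dirichlet(a), y ~ Multinomial(n, theta)
theta = rng.dirichlet(a)              # one draw of the six face probabilities
y = rng.multinomial(n_rolls, theta)   # number of times each face came up

# Conjugate update: posterior is Dirichlet(a + y)
a_post = a + y
theta_post_mean = a_post / a_post.sum()

print("Sampled theta:      ", np.round(theta, 3))
print("Observed counts:    ", y)
print("Posterior mean of θ:", np.round(theta_post_mean, 3))
```

With 600 rolls and only six pseudo-counts in the prior, the posterior mean should land close to the sampled θ, because the data dominate the prior.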

In the wild, the Dirichlet distribution these days shows up most often in topic modeling in natural language processing, where it’s commonly used as part of a Latent Dirichlet Allocation (or LDA) model, which is a fancy way of saying we’re trying to figure out the probability of an article belonging to a certain topic given its content.

However, for our purposes, let’s look at the Dirichlet-Multinomial in the context of simple multiple choices, and let’s start by throwing dice as a motivating example.

Throwing Dice

(If you want to try out the code snippets here, you’ll need to import the relevant Python libraries first. Or you can follow along with the Jupyter notebook accompanying this article.)

```python
import numpy as np
from scipy import stats
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import pymc3 as pm
```
