Negative Binomial Mixture in PyMC
Solution 1:
You have correctly implemented a Bayesian estimation of a mixture of three distributions, but the MCMC model gives wrong-looking values.
The problem is that category is not converging quickly enough, and the parameters means, alphas, and dd run away from the good values before category decides which points belong to which distribution.
import numpy as np
import pymc as mc

s = 1000  # samples per component (value assumed; not shown in this answer)
data = np.atleast_2d(list(mc.rnegative_binomial(100., 10., size=s)) +
                     list(mc.rnegative_binomial(200., 1000., size=s)) +
                     list(mc.rnegative_binomial(300., 1000., size=s))).T
nsamples = 10000
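The model itself is not reproduced in this answer. For context, here is a minimal PyMC 2 sketch of the kind of mixture being discussed; the names dd, means, alphas, category, and mcmc match the ones referenced above, but the exact priors, the deterministic indexing, and the sampler setup are assumptions, not the asker's original code:

n = 3                       # number of mixture components
ndata = data.shape[0]

dd = mc.Dirichlet('dd', theta=(1.,) * n)                  # mixture weights
category = mc.Categorical('category', p=dd, size=ndata)   # latent labels
means = mc.Uniform('means', lower=0., upper=data.max(), size=n)
alphas = mc.Gamma('alphas', alpha=1, beta=.0001, size=n)

@mc.deterministic
def mu(category=category, means=means):
    # pick each point's mean according to its current label
    return means[category]

@mc.deterministic
def alpha(category=category, alphas=alphas):
    # pick each point's dispersion according to its current label
    return alphas[category]

obs = mc.NegativeBinomial('obs', mu=mu, alpha=alpha,
                          value=data.ravel(), observed=True)

mcmc = mc.MCMC([dd, category, means, alphas, mu, alpha, obs])
mcmc.sample(nsamples)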
You can see that the posterior for category is wrong by visualizing it:
import matplotlib.pyplot as plt

# Group points by the component their posterior-mean label rounds to.
G = [data[np.nonzero(np.round(mcmc.trace("category")[:].mean(axis=0)) == i)]
     for i in range(0, 3)]
plt.hist(G, bins=30, stacked=True)
Expectation-maximization is the classic approach to stabilize the latent variables, but you can also use the results of the quick-and-dirty k-means fit to provide initial values for the MCMC:
category = mc.Categorical('category', p=dd, size=ndata, value=kme.labels_)
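If kme is not already in scope, a minimal sketch of producing it with scikit-learn's KMeans (the choice of library is an assumption; any hard clustering of the data that yields integer labels would do):

from sklearn.cluster import KMeans

# Hard-cluster the data into 3 groups; kme.labels_ then seeds value= above.
kme = KMeans(n_clusters=3).fit(data)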
Then the estimates converge to reasonable-looking values.
For your priors on the alphas, you can just use the same distribution for all of them:
alphas = mc.Gamma('alphas', alpha=1, beta=.0001, size=n)
This problem is not specific to the negative binomial distribution; Dirichlet mixtures of normal distributions fail in the same way. It comes from having a high-dimensional categorical variable that MCMC samplers explore inefficiently.