
Creating A Best Fit Probability Distribution From Pdf Sample Coordinates With Scipy

Problem: I have data points indicating coordinates sampled from a probability distribution (in this case we will assume a discrete probability distribution function). We are essent

Solution 1:

Calling your PDF f(x):

If your data really represents {x, f(x)} then you could try simply optimizing for the parameters of f using e.g. https://docs.scipy.org/doc/scipy/reference/generated/scipy.optimize.leastsq.html#scipy.optimize.leastsq
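As a rough sketch of that least-squares route, assuming for illustration that f is a two-parameter lognormal PDF: the x and y arrays below are hypothetical stand-ins generated from known parameters, not the data from the question. The parameters are optimized on a log scale so that both stay positive during the search.

```python
import numpy as np
from scipy.optimize import leastsq
from scipy.stats import lognorm

# Hypothetical {x, f(x)} pairs: PDF values from a known lognormal,
# standing in for the real data, which is not reproduced here.
x = np.arange(1, 40, dtype=float)
y = lognorm.pdf(x, 0.5, scale=8.0)

def residuals(log_params, x, y):
    # Work with log-parameters so sigma and scale remain positive.
    sigma, scale = np.exp(log_params)
    return lognorm.pdf(x, sigma, scale=scale) - y

# Crude starting point: sigma = 1, scale = location of the largest PDF value.
start = np.log([1.0, x[np.argmax(y)]])
log_fit, ier = leastsq(residuals, start, args=(x, y))
sigma_hat, scale_hat = np.exp(log_fit)
print(sigma_hat, scale_hat)
```

With real, noisy data the recovered parameters will depend on the starting point, so it may be worth trying a few initial guesses.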

If your data on the other hand are samples from the probability distribution, i.e. your data looks like {x} but each x is chosen with probability f(x), then you should try Markov Chain Monte Carlo to estimate f. There are several choices for Python:

https://pystan.readthedocs.io/en/latest/

http://docs.pymc.io/notebooks/getting_started.html#Model-fitting
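To give an idea of what those libraries do for you, here is a minimal hand-written random-walk Metropolis sampler in plain NumPy. It is only a sketch under several assumptions: the samples are hypothetical draws from a lognormal, the model is a two-parameter lognormal with flat priors on mu and log(sigma), and the step size and burn-in length are arbitrary choices. PyStan or PyMC would replace all of this with a proper sampler and diagnostics.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical samples {x}; in practice this would be your observed data.
samples = rng.lognormal(mean=2.0, sigma=0.5, size=2000)
log_s = np.log(samples)

def log_posterior(mu, log_sigma):
    # Lognormal log-likelihood (up to a constant) with flat priors
    # on mu and log_sigma.
    sigma = np.exp(log_sigma)
    return -log_s.size * log_sigma - np.sum((log_s - mu) ** 2) / (2 * sigma ** 2)

current = np.array([0.0, 0.0])  # arbitrary start: mu = 0, sigma = 1
current_lp = log_posterior(*current)
chain = []
for _ in range(6000):
    proposal = current + rng.normal(scale=0.02, size=2)
    proposal_lp = log_posterior(*proposal)
    # Accept with probability min(1, posterior ratio).
    if np.log(rng.uniform()) < proposal_lp - current_lp:
        current, current_lp = proposal, proposal_lp
    chain.append(current)

chain = np.array(chain[2000:])  # discard burn-in
mu_hat = chain[:, 0].mean()
sigma_hat = np.exp(chain[:, 1]).mean()
print(mu_hat, sigma_hat)
```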

Solution 2:

I think your data represents a pdf {x, y = pdf(x)} since sum(y) = 1. When we plot your data with a slight correction x = list(range(39)) we get a curve similar to a lognormal (?).

import matplotlib.pyplot as plt

# y is the list of pdf values from the question
x = list(range(39))
plt.plot(x, y)
plt.show()


One trick you could use to avoid optimisation algorithms is to transform your data into a sample since each y[i] is proportional to the frequency of x[i] . In other words, if you want a 'perfect' sample S of size N, each x[i] will appear N * y[i] times.

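import numpy as np

N = 20000  # size of the synthetic sample
# number of copies of each x[i], proportional to its probability y[i]
n_times = [int(y[i] * N) for i in range(len(y))]
S = np.repeat(x, n_times)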
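Here is a small self-contained check of this trick on a hypothetical five-point pmf (not the data from the question). It rounds the counts instead of truncating them, an assumption on my part that avoids losing a point when y[i] * N lands just below an integer, and then confirms that the frequencies in S reproduce y.

```python
import numpy as np

# Hypothetical discrete pdf, used only to illustrate the transformation.
x = np.arange(5)
y = np.array([0.1, 0.4, 0.3, 0.15, 0.05])

N = 20000
# Round to the nearest integer count rather than truncating.
n_times = np.rint(y * N).astype(int)
S = np.repeat(x, n_times)

# Empirical frequency of each x[i] in the synthetic sample.
freq = np.bincount(S, minlength=len(x)) / S.size
print(S.size, freq)
```

The larger N is, the smaller the rounding error on each count, so the sample frequencies approach the original y values.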

All that remains to be done is to fit a LogNormal distribution to S. Personally, I am used to the OpenTURNS library. You just need to format S as an ot.Sample by reshaping it into N points of dimension 1:

import openturns as ot

sample = ot.Sample([[p] for p in S])
fitdist = ot.LogNormalFactory().build(sample)
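If you would rather stay within SciPy, scipy.stats.lognorm.fit does a comparable maximum-likelihood fit. This is a sketch of an alternative, not the OpenTURNS method above, and S here is a hypothetical continuous sample rather than the one built from the question's data. In SciPy's parametrization the shape is sigmaLog, loc plays the role of gamma, and scale is exp(muLog). The shift is pinned to zero with floc=0 to keep the example stable; dropping that argument lets SciPy estimate the shift as well.

```python
import numpy as np
from scipy.stats import lognorm

rng = np.random.default_rng(0)

# Hypothetical sample standing in for S.
S = rng.lognormal(mean=1.6, sigma=0.45, size=20000)

# fit returns (shape, loc, scale); floc=0 fixes the shift (gamma) at zero.
sigma_log, gamma, scale = lognorm.fit(S, floc=0)
mu_log = np.log(scale)
print(mu_log, sigma_log, gamma)
```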

fitdist is an "ot.Distribution"; you can print it to see its parameters:

print(fitdist)
# Output: LogNormal(muLog = 1.62208, sigmaLog = 0.45679, gamma = -1.79583)

or plot both curves using the built-in fitdist.computePDF method, which takes an ot.Sample as its argument:

plt.plot(x, y)
plt.plot(x, fitdist.computePDF(ot.Sample([[p] for p in x])))

