Creating A Best Fit Probability Distribution From Pdf Sample Coordinates With Scipy
Solution 1:
Calling your PDF f(x): if your data really represents {x, f(x)} pairs, then you could try simply optimizing for the parameters of f, using e.g. scipy.optimize.leastsq:
https://docs.scipy.org/doc/scipy/reference/generated/scipy.optimize.leastsq.html#scipy.optimize.leastsq
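As a rough sketch of that first case, the snippet below fits a two-parameter lognormal pdf to {x, f(x)} pairs with scipy.optimize.leastsq. The lognormal family, the synthetic data, the starting guess, and the log(sigma) parametrisation (used only to keep sigma positive) are all assumptions made for illustration; they are not taken from the question.

```python
import numpy as np
from scipy.optimize import leastsq
from scipy.stats import lognorm

# Hypothetical {x, f(x)} data: a lognormal pdf evaluated on a grid.
true_sigma, true_mu = 0.5, 1.6          # made-up values for the demo
x = np.linspace(0.5, 20.0, 40)
y = lognorm.pdf(x, true_sigma, scale=np.exp(true_mu))

def residuals(params, x, y):
    """Model pdf minus observed pdf values, for params = (log sigma, mu)."""
    log_sigma, mu = params
    return lognorm.pdf(x, np.exp(log_sigma), scale=np.exp(mu)) - y

# leastsq minimises the sum of squared residuals starting from x0.
fitted, ier = leastsq(residuals, x0=(0.0, 1.0), args=(x, y))
sigma_hat, mu_hat = np.exp(fitted[0]), fitted[1]
print(f"sigma = {sigma_hat:.3f}, mu = {mu_hat:.3f}")
```

With real data you would replace x and y by your own arrays and swap in whichever pdf family you believe f belongs to.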
If, on the other hand, your data are samples from the probability distribution, i.e. your data looks like {x} but each x is chosen with probability f(x), then you should try Markov Chain Monte Carlo to estimate f. There are several choices for Python:
https://pystan.readthedocs.io/en/latest/
http://docs.pymc.io/notebooks/getting_started.html#Model-fitting
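To give a feel for what those libraries do, here is a minimal hand-written random-walk Metropolis sampler in plain NumPy. It is only a sketch under several assumptions that are not in the original answer: the samples are taken to be lognormal, the data are synthetic, the priors on mu and log(sigma) are flat, and the step size and chain length are picked by hand. PyStan or PyMC would handle tuning and diagnostics for you.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: 500 draws from a lognormal with known parameters.
true_mu, true_sigma = 1.5, 0.5
data = rng.lognormal(mean=true_mu, sigma=true_sigma, size=500)
log_data = np.log(data)

def log_posterior(mu, log_sigma):
    """Log-posterior of (mu, log sigma) under flat priors, up to a constant."""
    sigma = np.exp(log_sigma)
    # log(data) is normal with mean mu and standard deviation sigma.
    return -log_data.size * log_sigma - 0.5 * np.sum((log_data - mu) ** 2) / sigma**2

def metropolis(n_steps=20000, step=0.05):
    """Random-walk Metropolis over (mu, log sigma)."""
    chain = np.empty((n_steps, 2))
    current = np.array([0.0, 0.0])
    current_lp = log_posterior(*current)
    for i in range(n_steps):
        proposal = current + step * rng.normal(size=2)
        proposal_lp = log_posterior(*proposal)
        # Accept with probability min(1, posterior ratio).
        if np.log(rng.uniform()) < proposal_lp - current_lp:
            current, current_lp = proposal, proposal_lp
        chain[i] = current
    return chain

chain = metropolis()
kept = chain[5000:]                      # discard burn-in
mu_hat = kept[:, 0].mean()
sigma_hat = np.exp(kept[:, 1]).mean()
print(f"mu = {mu_hat:.2f}, sigma = {sigma_hat:.2f}")
```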
Solution 2:
I think your data represents a pdf {x, y = pdf(x)} since sum(y) = 1.
When we plot your data with a slight correction, x = list(range(39)), we get a curve similar to a lognormal (?).
import matplotlib.pyplot as plt
# y is the list of 39 pdf values from the question
x = list(range(39))
plt.plot(x, y)
One trick you could use to avoid optimisation algorithms is to transform your data into a sample, since each y[i] is proportional to the frequency of x[i]. In other words, if you want a 'perfect' sample S of size N, each x[i] will appear N * y[i] times.
import numpy as np
N = 20000
# number of times each x[i] is repeated in the sample
n_times = [int(y[i] * N) for i in range(len(y))]
S = np.repeat(x, n_times)
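A tiny worked example may make the trick easier to follow. The helper name pdf_to_sample and the three-point pdf are hypothetical. The sketch also rounds the counts with np.rint instead of truncating with int(), which is a small deviation from the code above: int() always rounds down, so the resulting sample can come out slightly shorter than N.

```python
import numpy as np

def pdf_to_sample(x, y, n):
    """Repeat each x[i] about n * y[i] times (hypothetical helper)."""
    counts = np.rint(np.asarray(y) * n).astype(int)
    return np.repeat(x, counts)

# Toy pdf on three points; the weights sum to 1.
x_toy = [0, 1, 2]
y_toy = [0.25, 0.5, 0.25]
S_toy = pdf_to_sample(x_toy, y_toy, n=8)
print(S_toy)  # [0 0 1 1 1 1 2 2]
```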
All that remains to be done is to fit a LogNormal distribution to S. Personally, I am used to the OpenTURNS library. You just need to format S as an ot.Sample by reshaping it into N points of dimension 1:
import openturns as ot
sample = ot.Sample([[p] for p in S])
fitdist = ot.LogNormalFactory().build(sample)
fitdist is an ot.Distribution; you can print it to see its parameters
print(fitdist)
>>> LogNormal(muLog = 1.62208, sigmaLog = 0.45679, gamma = -1.79583)
or plot both curves using the built-in fitdist.computePDF method, which takes an ot.Sample as its argument
plt.plot(x, y)
plt.plot(x, fitdist.computePDF(ot.Sample([[p] for p in x])))
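Since the question asks about SciPy, the same fit can probably be done without OpenTURNS via scipy.stats.lognorm.fit. The sketch below is not the method used above; it runs on a synthetic stand-in for S (the parameter values and the sample size are made up). SciPy returns (shape, loc, scale), which correspond to OpenTURNS' sigmaLog = shape, gamma = loc and muLog = log(scale).

```python
import numpy as np
from scipy.stats import lognorm

rng = np.random.default_rng(42)

# Hypothetical stand-in for S: draws from a shifted lognormal.
true_sigma, true_gamma, true_mu = 0.45, -1.8, 1.6
S_demo = lognorm.rvs(true_sigma, loc=true_gamma, scale=np.exp(true_mu),
                     size=5000, random_state=rng)

# Maximum-likelihood fit with all three parameters left free.
shape, loc, scale = lognorm.fit(S_demo)
sigma_log, gamma, mu_log = shape, loc, np.log(scale)
print(f"muLog = {mu_log:.3f}, sigmaLog = {sigma_log:.3f}, gamma = {gamma:.3f}")

# Evaluate the fitted pdf on a grid, e.g. to overlay it on the original curve.
grid = np.linspace(S_demo.min(), S_demo.max(), 200)
fitted_pdf = lognorm.pdf(grid, shape, loc=loc, scale=scale)
```

A three-parameter lognormal fit can be sensitive to the starting point, so it is worth checking the fitted curve against the data rather than trusting the numbers blindly.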