Pandas Representative Sampling Across Multiple Columns
I have a dataframe that represents a population, with each column denoting a different quality/characteristic of a person. How can I get a sample of that dataframe/population that is representative across several of those columns at once?
Solution 1:
Create a combined feature column, compute the relative frequency of each combination, and pass those frequencies as weights when drawing the sample:
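# Combine the three characteristics into one tuple-valued column
df["combined"] = list(zip(df["favourite_colour"],
                          df["favourite_knight"],
                          df["favourite_quality"]))
# Relative frequency of each combination in the population
combined_weight = df['combined'].value_counts(normalize=True)
# Attach each row's combination frequency as its sampling weight
df['combined_weight'] = df['combined'].apply(lambda x: combined_weight[x])
# Draw 140 rows using those weights
df_sample = df.sample(140, weights=df['combined_weight'])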
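A possible alternative, not part of the answer above, is proportional stratified sampling: group by the columns of interest and draw the same fraction from every group, so each combination keeps roughly its population share. This may be worth comparing against the weighted draw, because rows from common combinations are both more numerous and more heavily weighted there, which tends to over-represent them. The following is a minimal sketch, assuming pandas 1.1 or later (for `DataFrameGroupBy.sample`); the helper name `stratified_sample`, the toy `population` and its counts are all made up for illustration.

```python
import itertools

import pandas as pd

# Hypothetical toy population: 8 combinations with known sizes (1400 rows).
colours = ["blue", "yellow"]
knights = ["Lancelot", "Galahad"]
qualities = ["brave", "pure"]
counts = [500, 300, 200, 150, 100, 80, 50, 20]

rows = []
for combo, count in zip(itertools.product(colours, knights, qualities), counts):
    rows.extend([combo] * count)

cols = ["favourite_colour", "favourite_knight", "favourite_quality"]
population = pd.DataFrame(rows, columns=cols)


def stratified_sample(df, columns, n, random_state=None):
    """Draw about n rows, sampling the same fraction from every combination.

    Hypothetical helper: each group contributes round(frac * group_size)
    rows, so the total can differ slightly from n and very small groups
    may contribute no rows at all.
    """
    frac = n / len(df)
    grouped = df.groupby(columns, group_keys=False)
    return grouped.sample(frac=frac, random_state=random_state)


df_sample = stratified_sample(population, cols, n=140, random_state=0)

# Compare each combination's share in the sample and in the population.
comparison = pd.DataFrame({
    "population_share": population.groupby(cols).size() / len(population),
    "sample_share": df_sample.groupby(cols).size() / len(df_sample),
})
print(comparison)
```

With real data the group sizes rarely divide evenly, so the sample shares will only approximate the population shares, and combinations with very few rows may need separate handling (for example, a minimum number of rows per group).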