Get Rows Based On A Condition And Separate Them Into Subsets
I am trying to subset a dataset based on a condition and pick the rows until the next row that meets the condition is reached. Condition: if column A == 0, column B should start with 'a'. Datas
Solution 1:
Maybe try with cumsum as well:
{x: y.to_dict('list') for x, y in df.groupby(df['A'].eq(0).cumsum())}
Out[87]:
{1: {'A': [0, 1, 2, 3], 'B': ['aa', 'ss', 'dd', 'ff']},
2: {'A': [0, 1, 2, 3], 'B': ['ee', 'ff', 'bb', 'gg']},
3: {'A': [0, 1, 2], 'B': ['rr', 'hh', 'ww']},
4: {'A': [0, 1], 'B': ['jj', 'll']}}
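To show what the cumsum is doing, here is a self-contained sketch. The sample data is an assumption: it is rebuilt from the Out[87] listing above because the question's own dataset is cut off. Each A == 0 row is True, and the running sum of those booleans gives every row the id of the block it belongs to.

```python
import pandas as pd

# Hypothetical sample data, rebuilt from the Out[87] listing above.
df = pd.DataFrame({
    'A': [0, 1, 2, 3, 0, 1, 2, 3, 0, 1, 2, 0, 1],
    'B': ['aa', 'ss', 'dd', 'ff', 'ee', 'ff', 'bb', 'gg',
          'rr', 'hh', 'ww', 'jj', 'll'],
})

# True on every row that starts a new block (A == 0).
starts = df['A'].eq(0)

# cumsum turns the booleans into running block ids: 1, 1, 1, 1, 2, ...
labels = starts.cumsum()

# One dict of column lists per block, keyed by block id.
result = {k: g.to_dict(orient='list') for k, g in df.groupby(labels)}

print(labels.tolist())
print(result)
```

Note that this version only looks at column A; the 'a' prefix on column B plays no part in where a block starts.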
Solution 2:
Do a cumsum on the condition to identify the groups, then groupby:
groups = (df['A'].eq(0) & df['B'].str.startswith('a')).cumsum()
{k:v.to_dict(orient='list') for k,v in df.groupby(groups)}
Output:
{1: {'A': [0, 1, 2, 3], 'B': ['aa', 'ss', 'dd', 'ff']},
2: {'A': [0, 1, 2, 3], 'B': ['ae', 'ff', 'bb', 'gg']},
3: {'A': [0, 1, 2, 0, 1], 'B': ['ar', 'hh', 'ww', 'jj', 'll']}}
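A minimal sketch of the same idea, assuming sample data reconstructed from the output above (the real dataset is not shown in full). Because a block only starts where both parts of the condition hold, the A == 0 row with B == 'jj' does not open a new group, so 'jj' and 'll' stay in group 3.

```python
import pandas as pd

# Hypothetical sample data matching the Solution 2 output above.
df = pd.DataFrame({
    'A': [0, 1, 2, 3, 0, 1, 2, 3, 0, 1, 2, 0, 1],
    'B': ['aa', 'ss', 'dd', 'ff', 'ae', 'ff', 'bb', 'gg',
          'ar', 'hh', 'ww', 'jj', 'll'],
})

# A block starts only where A == 0 AND B starts with 'a'.
is_start = df['A'].eq(0) & df['B'].str.startswith('a')

# Running count of starts = group id for every row.
groups = is_start.cumsum()

result = {k: v.to_dict(orient='list') for k, v in df.groupby(groups)}

print(groups.tolist())
print(result)
```

If the frame does not begin with a qualifying row, the leading rows get label 0 from the cumsum and end up in an extra group 0.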
Solution 3:
This answers the question's revision of 2020-11-04 19:29:39Z. Later additions/edits to the question or additional requirements in the comments will not be considered.
First find the desired rows and select them into a new dataframe. Then group those rows and convert each group to a dict.
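# 2 = block start that qualifies, 1 = block start that does not, 0 = other rows;
# the zeros are forward-filled so each row inherits the value of its block start
g = (df.A.eq(0).astype(int) + df.B.str.startswith('a')).mask(lambda s: s.eq(0)).ffill() - 1
df_BeqA = df[g.astype('bool')]
{x: y.to_dict('list') for x, y in df_BeqA.groupby(df_BeqA.A.eq(0).cumsum() - 1)}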
Out:
{0: {'A': [0, 1, 2, 3], 'B': ['aa', 'ss', 'dd', 'ff']},
1: {'A': [0, 1, 2], 'B': ['ar', 'hh', 'ww']}}
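A different technique that gives the same result on this data: split on every A == 0 row, then keep only the blocks whose first B value starts with 'a'. This swaps the forward-fill arithmetic for `groupby(...).transform('first')`; it is a sketch, not the answer's own method, and the sample data is assumed from the output above.

```python
import pandas as pd

# Hypothetical sample data matching the Solution 3 output above.
df = pd.DataFrame({
    'A': [0, 1, 2, 3, 0, 1, 2, 3, 0, 1, 2, 0, 1],
    'B': ['aa', 'ss', 'dd', 'ff', 'ee', 'ff', 'bb', 'gg',
          'ar', 'hh', 'ww', 'jj', 'll'],
})

# Every A == 0 row opens a new block.
block = df['A'].eq(0).cumsum()

# Broadcast each block's first B value to all of its rows, then test it.
first_b = df.groupby(block)['B'].transform('first')
keep = first_b.str.startswith('a')

kept = df[keep]

# Group the surviving rows by their original block id and
# renumber the blocks 0, 1, ... in order of appearance.
result = {i: v.to_dict(orient='list')
          for i, (_, v) in enumerate(kept.groupby(block[keep]))}

print(result)
```

Deciding per block, rather than per row, means a non-start row whose B happens to begin with 'a' cannot split or drop part of a block.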