
Get Rows Based On A Condition And Separate Them Into Subsets

I am trying to subset a dataset based on a condition, picking up rows until the next row that meets the condition. Condition: if column A == 0, column B should start with 'a'. Datas

Solution 1:

Maybe try cumsum as well:

{x: y.to_dict('list') for x, y in df.groupby(df['A'].eq(0).cumsum())}
Out[87]: 
{1: {'A': [0, 1, 2, 3], 'B': ['aa', 'ss', 'dd', 'ff']},
 2: {'A': [0, 1, 2, 3], 'B': ['ee', 'ff', 'bb', 'gg']},
 3: {'A': [0, 1, 2], 'B': ['rr', 'hh', 'ww']},
 4: {'A': [0, 1], 'B': ['jj', 'll']}}
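For anyone who wants to run this, here is a minimal self-contained sketch. The question's original table is not shown above, so the sample `df` below is an assumption reconstructed from the printed output, and the names `block_id` and `subsets` are only illustrative.

```python
import pandas as pd

# Hypothetical sample data, rebuilt from the output shown above.
df = pd.DataFrame({
    "A": [0, 1, 2, 3, 0, 1, 2, 3, 0, 1, 2, 0, 1],
    "B": ["aa", "ss", "dd", "ff", "ee", "ff", "bb", "gg",
          "rr", "hh", "ww", "jj", "ll"],
})

# eq(0) is True on every row where A == 0; the running sum of those
# True values gives each block of rows its own label (1, 2, 3, ...).
block_id = df["A"].eq(0).cumsum()

# One dict of column lists per block.
subsets = {key: part.to_dict("list") for key, part in df.groupby(block_id)}
print(subsets)
```

Note that this splits on every A == 0 row and does not look at column B at all.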

Solution 2:

Do a cumsum on the condition to identify the groups, then groupby:

groups = (df['A'].eq(0) & df['B'].str.startswith('a')).cumsum()

{k:v.to_dict(orient='list') for k,v in df.groupby(groups)}

Output:

{1: {'A': [0, 1, 2, 3], 'B': ['aa', 'ss', 'dd', 'ff']},
 2: {'A': [0, 1, 2, 3], 'B': ['ae', 'ff', 'bb', 'gg']},
 3: {'A': [0, 1, 2, 0, 1], 'B': ['ar', 'hh', 'ww', 'jj', 'll']}}
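To see why the last group above contains two A == 0 rows, it can help to look at the intermediate labels. This is a sketch on assumed sample data (rebuilt from the output above, since the question's table is not shown); `is_start` and `labels` are illustrative names.

```python
import pandas as pd

# Hypothetical sample data, rebuilt from the output shown above.
df = pd.DataFrame({
    "A": [0, 1, 2, 3, 0, 1, 2, 3, 0, 1, 2, 0, 1],
    "B": ["aa", "ss", "dd", "ff", "ae", "ff", "bb", "gg",
          "ar", "hh", "ww", "jj", "ll"],
})

# A row opens a new group only if A == 0 AND B starts with 'a'.
is_start = df["A"].eq(0) & df["B"].str.startswith("a")

# The running sum increases only on those rows, so every other row
# inherits the label of the most recent valid start.
groups = is_start.cumsum()
labels = groups.tolist()

print(df.assign(is_start=is_start, group=groups))
```

The row `(0, 'jj')` has A == 0 but fails the `startswith('a')` test, so it does not open a new group and stays attached to group 3.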

Solution 3:

This answers this question's revision 2020-11-04 19:29:39Z. Later additions/edits to the question or additional requirements in the comments will not be considered.

First find the desired rows and select them into a new dataframe. Group the rows and convert them to dicts.

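# Score rows (2 = A == 0 and B starts with 'a', 1 = only one holds, 0 = neither), forward-fill the zeros,
# then subtract 1 so rows following a score of 2 become 1 (kept) and rows following a score of 1 become 0 (dropped).
# Note: the method= argument of replace() is deprecated as of pandas 2.1.
g = (df.A.eq(0).astype(int) + df.B.str.startswith('a')).replace(0, method='ffill') - 1
df_BeqA = df[g.astype('bool')]

{x: y.to_dict('list') for x, y in df_BeqA.groupby(df_BeqA.A.eq(0).cumsum() - 1)}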

Out:

{0: {'A': [0, 1, 2, 3], 'B': ['aa', 'ss', 'dd', 'ff']},
 1: {'A': [0, 1, 2], 'B': ['ar', 'hh', 'ww']}}
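The same filtering can also be sketched without `replace(..., method='ffill')`, by labelling the blocks first and then keeping only the blocks whose starting row is valid. This is an alternative formulation, not the answer's own code; the sample `df` is an assumption rebuilt from the outputs above, and `block_id`, `valid_start`, `keep` and `subsets` are illustrative names.

```python
import pandas as pd

# Hypothetical sample data, rebuilt from the outputs shown above.
df = pd.DataFrame({
    "A": [0, 1, 2, 3, 0, 1, 2, 3, 0, 1, 2, 0, 1],
    "B": ["aa", "ss", "dd", "ff", "ee", "ff", "bb", "gg",
          "ar", "hh", "ww", "jj", "ll"],
})

# Label each block of rows that begins at an A == 0 row.
block_id = df["A"].eq(0).cumsum()

# True only on rows that satisfy the full starting condition.
valid_start = df["A"].eq(0) & df["B"].str.startswith("a")

# Broadcast to every row of a block: keep the block if it contains
# a valid starting row.
keep = valid_start.groupby(block_id).transform("any")
kept = df[keep]

# Renumber the surviving blocks from 0 and convert each to a dict of lists.
subsets = {
    new_key: part.to_dict("list")
    for new_key, (_, part) in enumerate(kept.groupby(block_id[keep]))
}
print(subsets)
```

Unlike the score-based version, a B value that happens to start with 'a' on a row where A is not 0 does not affect which blocks are kept here.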
