Combine Or Iterate Pandas Rows On Specific Columns
I am struggling to figure this row by row iteration out in pandas. I have a dataset that contains chat conversations between 2 parties. I would like to combine the dataset to row b
Solution 1:
You could groupby on consecutive runs of line_by, then use agg to take the first id and line_by, the latest timestamp, and ''.join the line_text of each run:
In [1918]: (df.groupby((df.line_by != df.line_by.shift()).cumsum(), as_index=False)
              .agg({'id': 'first', 'timestamp': 'last', 'line_by': 'first',
                    'line_text': ''.join}))
Out[1918]:
   timestamp               line_text    id  line_by
0    02:54.3             Text Line 1  1234  Person1
1    03:47.0  Text Line 2Text Line 3  1234  Person2
2    05:46.2  Text Line 4Text Line 5  1234  Person1
3    06:44.5             Text Line 6  9876  Person2
4    07:27.6             Text Line 7  9876  Person1
5    10:20.3  Text Line 8Text Line 9  9876  Person2
Details
In [1919]: (df.line_by != df.line_by.shift()).cumsum()
Out[1919]:
0 1
1 2
2 2
3 3
4 3
5 4
6 5
7 6
8 6
Name: line_by, dtype: int32
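The labels above come from three small steps: shift the column down by one row, compare it with itself, and take a running sum of the changes. Here is a minimal sketch of those steps on the line_by values alone. The variable names starts_new_run and run_id are my own, not part of the original answer.

```python
import pandas as pd

# The line_by column from the sample data, in order.
line_by = pd.Series(
    ["Person1", "Person2", "Person2", "Person1", "Person1",
     "Person2", "Person1", "Person2", "Person2"],
    name="line_by",
)

# True wherever the speaker differs from the previous row. The first row is
# always True because shift() leaves a missing value there.
starts_new_run = line_by != line_by.shift()

# Running total of the True values: every consecutive run of the same
# speaker ends up with its own integer label.
run_id = starts_new_run.cumsum()

print(run_id.tolist())  # prints [1, 2, 2, 3, 3, 4, 5, 6, 6]
```

Rows that share a label are the ones groupby will merge into a single row.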
In [1920]: df
Out[1920]:
     id  timestamp  line_by    line_text
0  1234    02:54.3  Person1  Text Line 1
1  1234    03:23.8  Person2  Text Line 2
2  1234    03:47.0  Person2  Text Line 3
3  1234    04:46.8  Person1  Text Line 4
4  1234    05:46.2  Person1  Text Line 5
5  9876    06:44.5  Person2  Text Line 6
6  9876    07:27.6  Person1  Text Line 7
7  9876    08:17.5  Person2  Text Line 8
8  9876    10:20.3  Person2  Text Line 9
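If you want to try this outside an IPython session, here is a self-contained sketch that rebuilds the sample frame and applies the same idea. It is a variation rather than the exact call above: it assumes a pandas version with named aggregation (0.25 or later), gives the grouping key the hypothetical name run_id, and drops that key with reset_index so it cannot clash with the line_by column.

```python
import pandas as pd

# Sample data copied from the frame shown above.
df = pd.DataFrame({
    "id": [1234] * 5 + [9876] * 4,
    "timestamp": ["02:54.3", "03:23.8", "03:47.0", "04:46.8", "05:46.2",
                  "06:44.5", "07:27.6", "08:17.5", "10:20.3"],
    "line_by": ["Person1", "Person2", "Person2", "Person1", "Person1",
                "Person2", "Person1", "Person2", "Person2"],
    "line_text": [f"Text Line {i}" for i in range(1, 10)],
})

# Label each consecutive run of the same speaker (run_id is an assumed name).
run_id = (df["line_by"] != df["line_by"].shift()).cumsum().rename("run_id")

# One output row per run: first id and speaker, last timestamp, joined text.
combined = (
    df.groupby(run_id)
      .agg(
          id=("id", "first"),
          timestamp=("timestamp", "last"),
          line_by=("line_by", "first"),
          line_text=("line_text", "".join),
      )
      .reset_index(drop=True)
)

print(combined)
```

Swapping "".join for " ".join would put a space between the merged lines. Note that runs are detected from line_by only, so if the same person could speak last in one conversation and first in the next, you would probably also want to start a new run whenever id changes.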