Find A Subset Of Columns Based On Another Dataframe?
I'm collecting heart rate data across time for multiple subjects. Different events occur during the course of the data collection, so the start of each event is recorded elsewhere.
Solution 1:
I was able to put together a function that I think works for this, but it assumes that the columns don't change order and that no more get added. If the shape of the df changes, the function would need to be updated to match.
First, I merged your example_g_table and example_s_table so that everything is in one dataframe.
import pandas as pd
df = pd.merge(left=example_g_table, right=example_s_table, on=['Date_Time', 'CID'], how='left')
Date_Time CID 0 1 2 3 4 5 event_1 event_2 event_3
0 4/20/21 4:20 302 0 1.0 2.0 3.0 4.0 5.0 0 2 3
1 2/17/21 9:20 135 1 1.4 1.8 2.0 8.0 10.0 0 1 4
2 2/17/21 9:20 111 4 5.0 5.1 5.2 5.3 5.4 3 4 5
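For anyone who wants to reproduce the merge, here is a minimal sketch that rebuilds the two tables from the merged output above. The split is an assumption on my part (heart-rate samples in example_g_table, event indices in example_s_table), as is the use of integer column labels 0 to 5; the original tables may be laid out differently.

```python
import pandas as pd

# Assumed layout: one table holds the heart-rate samples (columns 0..5) ...
example_g_table = pd.DataFrame({
    "Date_Time": ["4/20/21 4:20", "2/17/21 9:20", "2/17/21 9:20"],
    "CID": [302, 135, 111],
    0: [0.0, 1.0, 4.0],
    1: [1.0, 1.4, 5.0],
    2: [2.0, 1.8, 5.1],
    3: [3.0, 2.0, 5.2],
    4: [4.0, 8.0, 5.3],
    5: [5.0, 10.0, 5.4],
})

# ... and the other holds the sample index at which each event starts.
example_s_table = pd.DataFrame({
    "Date_Time": ["4/20/21 4:20", "2/17/21 9:20", "2/17/21 9:20"],
    "CID": [302, 135, 111],
    "event_1": [0, 0, 3],
    "event_2": [2, 1, 4],
    "event_3": [3, 4, 5],
})

# A left merge keeps every row of the sample table and attaches its events.
df = pd.merge(left=example_g_table, right=example_s_table,
              on=["Date_Time", "CID"], how="left")
print(df)
```

Both keys are needed here because two subjects share the same Date_Time.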
Now we write a function that reads the values of event_2 and event_3 and returns the average of the sample columns from the event_2 position through the event_3 position, inclusive. We will later run df.apply with axis=1 on this, so it takes in just one row at a time, as a Series.
def func(row):
    event_2 = row['event_2']
    event_3 = row['event_3']
    start = int(event_2 + 2)  # assumes the column called 0 is the third column (position 2 when counting from 0), column 1 is the fourth, etc.
    end = int(event_3 + 2)  # same offset as above
    total = sum(row.iloc[start:end+1])  # this line is the key: it sums the values of the columns from start through end, inclusive
    avg = total / (end - start + 1)  # (end - start + 1) is the count of columns in our range
    return avg
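The offset of 2 is what ties the function to the column order. As a sketch of one way around that, the variant below selects the sample columns by label instead of by position. It assumes the sample columns are labelled with the integers 0, 1, 2, ... (if they are strings such as '0', the labels would need converting), and func_by_label is a name I made up for illustration.

```python
import pandas as pd

def func_by_label(row):
    # Event columns hold the label of the first and last sample to include.
    first = int(row["event_2"])
    last = int(row["event_3"])
    labels = list(range(first, last + 1))
    # .loc selects by column label, so extra or reordered columns do not matter.
    # The row has mixed types (object dtype), hence the explicit float cast.
    return row.loc[labels].astype(float).mean()

# One row shaped like the merged dataframe, for a quick check.
row = pd.Series({
    "Date_Time": "2/17/21 9:20", "CID": 135,
    0: 1.0, 1: 1.4, 2: 1.8, 3: 2.0, 4: 8.0, 5: 10.0,
    "event_1": 0, "event_2": 1, "event_3": 4,
})
result = func_by_label(row)
print(result)  # mean of 1.4, 1.8, 2.0 and 8.0
```

It can be passed to df.apply(..., axis=1) in the same way as the positional version.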
Last, we run df.apply on this to get our new column.
df['avg'] = df.apply(func,axis=1)
df
Date_Time CID 0 1 2 3 4 5 event_1 event_2 event_3 avg
0 4/20/21 4:20 302 0 1.0 2.0 3.0 4.0 5.0 0 2 3 2.50
1 2/17/21 9:20 135 1 1.4 1.8 2.0 8.0 10.0 0 1 4 3.30
2 2/17/21 9:20 111 4 5.0 5.1 5.2 5.3 5.4 3 4 5 5.35
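If the dataframe gets large, a row-wise apply can be slow. Here is a sketch of a vectorized alternative using a NumPy boolean mask. It is not part of the approach above, just another way to get the same numbers. It assumes integer column labels 0 to 5 and that event_3 is never smaller than event_2 (otherwise the window is empty and the division yields nan).

```python
import numpy as np
import pandas as pd

# The merged dataframe from above, rebuilt so this snippet runs on its own.
df = pd.DataFrame({
    "Date_Time": ["4/20/21 4:20", "2/17/21 9:20", "2/17/21 9:20"],
    "CID": [302, 135, 111],
    0: [0.0, 1.0, 4.0], 1: [1.0, 1.4, 5.0], 2: [2.0, 1.8, 5.1],
    3: [3.0, 2.0, 5.2], 4: [4.0, 8.0, 5.3], 5: [5.0, 10.0, 5.4],
    "event_1": [0, 0, 3], "event_2": [2, 1, 4], "event_3": [3, 4, 5],
})

sample_cols = [0, 1, 2, 3, 4, 5]                    # assumed sample column labels
values = df[sample_cols].to_numpy(dtype=float)      # shape (n_rows, n_samples)
positions = np.arange(len(sample_cols))             # 0, 1, ..., 5

# Column vectors so they broadcast against positions, one window per row.
start = df["event_2"].to_numpy()[:, None]
end = df["event_3"].to_numpy()[:, None]
mask = (positions >= start) & (positions <= end)    # True inside each row's window

# Sum only the values inside the window, then divide by the window length.
avg_vectorized = np.where(mask, values, 0.0).sum(axis=1) / mask.sum(axis=1)
df["avg"] = avg_vectorized
print(df[["CID", "event_2", "event_3", "avg"]])
```

The avg column should match the apply-based result in the table above.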