
Find A Subset Of Columns Based On Another Dataframe?

I'm collecting heart rate data across time for multiple subjects. Different events occur during the course of the data collection, so the start of each event is recorded elsewhere.

Solution 1:

I put together a function that I think works for this, but it assumes that the columns keep their current order and that no columns are added. If the shape of the df changes, the function would need to be updated to match.

First, I merged your example_g_table and example_s_table into a single dataframe.

import pandas as pd

df = pd.merge(left=example_g_table, right=example_s_table, on=['Date_Time','CID'], how='left')
       Date_Time    CID 0   1   2   3   4   5   event_1 event_2 event_3
0   4/20/21 4:20    302 0   1.0 2.0 3.0 4.0 5.0     0   2          3
1   2/17/21 9:20    135 1   1.4 1.8 2.0 8.0 10.0    0   1          4
2   2/17/21 9:20    111 4   5.0 5.1 5.2 5.3 5.4     3   4          5
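To reproduce the merge, here is a minimal sketch. The contents of the two input tables are an assumption, reconstructed from the merged output above: I am guessing that example_g_table holds the numbered heart-rate columns, that example_s_table holds the event columns, and that the numbered column labels are integers rather than strings.

```python
import pandas as pd

# Hypothetical inputs, reconstructed from the merged output shown above.
example_g_table = pd.DataFrame({
    "Date_Time": ["4/20/21 4:20", "2/17/21 9:20", "2/17/21 9:20"],
    "CID": [302, 135, 111],
    0: [0, 1, 4],
    1: [1.0, 1.4, 5.0],
    2: [2.0, 1.8, 5.1],
    3: [3.0, 2.0, 5.2],
    4: [4.0, 8.0, 5.3],
    5: [5.0, 10.0, 5.4],
})
example_s_table = pd.DataFrame({
    "Date_Time": ["4/20/21 4:20", "2/17/21 9:20", "2/17/21 9:20"],
    "CID": [302, 135, 111],
    "event_1": [0, 0, 3],
    "event_2": [2, 1, 4],
    "event_3": [3, 4, 5],
})

# Left merge keeps every row of example_g_table and attaches its event columns.
df = pd.merge(
    left=example_g_table,
    right=example_s_table,
    on=["Date_Time", "CID"],
    how="left",
)
print(df)
```

Merging on both Date_Time and CID matters here, because two subjects share the same timestamp.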

Now we define a function that reads event_2 and event_3 from a row and returns the average of the values in the numbered columns from event_2 through event_3, inclusive. We will later run df.apply on this with axis=1, so it receives one row at a time, as a Series.

def func(row):
    event_2 = row['event_2']
    event_3 = row['event_3']
    start = int(event_2 + 2)  # the column called 0 sits at position 2 (after Date_Time and CID), column 1 at position 3, etc.
    end = int(event_3 + 2)  # same offset as above
    total = sum(row.iloc[start:end+1])  # this line is the key: it sums the values at positions start through end, inclusive
    avg = total/(end-start+1)  # (end-start+1) is the count of values in our range
    return avg
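If the column order might change, one possible variation is to select by column label instead of by position. This is only a sketch, not part of the function above: the name avg_between_events is hypothetical, and it assumes the numbered columns are labelled with the integers 0 to 5 (not the strings "0" to "5").

```python
import pandas as pd

def avg_between_events(row):
    # Column labels (not positions) from event_2 through event_3, inclusive.
    labels = list(range(int(row["event_2"]), int(row["event_3"]) + 1))
    # .loc selects by label, so reordered or extra columns do not shift the result.
    return row.loc[labels].astype(float).mean()

# One hypothetical row with the same values as the second row of the merged table.
row = pd.Series({
    "Date_Time": "2/17/21 9:20", "CID": 135,
    0: 1, 1: 1.4, 2: 1.8, 3: 2.0, 4: 8.0, 5: 10.0,
    "event_1": 0, "event_2": 1, "event_3": 4,
})
result = avg_between_events(row)
print(result)
```

It would be used the same way as the positional version, with df.apply(avg_between_events, axis=1).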

Last, we run df.apply on this to get our new column.

df['avg'] = df.apply(func,axis=1)
df
       Date_Time    CID 0   1   2   3   4   5   event_1 event_2 event_3 avg
0   4/20/21 4:20    302 0   1.0 2.0 3.0 4.0 5.0     0   2          3    2.50
1   2/17/21 9:20    135 1   1.4 1.8 2.0 8.0 10.0    0   1          4    3.30
2   2/17/21 9:20    111 4   5.0 5.1 5.2 5.3 5.4     3   4          5    5.35
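For larger tables, df.apply with axis=1 can be slow because it calls the function once per row. A vectorized version with a NumPy mask is one alternative. The sketch below assumes integer column labels 0 to 5 and rebuilds a small hypothetical frame with the same values as the output above so that it runs on its own.

```python
import numpy as np
import pandas as pd

# Hypothetical merged frame with the same values as the output above.
df = pd.DataFrame({
    "CID": [302, 135, 111],
    0: [0, 1, 4],
    1: [1.0, 1.4, 5.0],
    2: [2.0, 1.8, 5.1],
    3: [3.0, 2.0, 5.2],
    4: [4.0, 8.0, 5.3],
    5: [5.0, 10.0, 5.4],
    "event_2": [2, 1, 4],
    "event_3": [3, 4, 5],
})

value_cols = [0, 1, 2, 3, 4, 5]  # assumed integer column labels
values = df[value_cols].to_numpy(dtype=float)
positions = np.arange(len(value_cols))  # 0..5, matching the labels

# Column vectors so each row's bounds broadcast against all positions.
start = df["event_2"].to_numpy()[:, None]
end = df["event_3"].to_numpy()[:, None]
mask = (positions >= start) & (positions <= end)

# Sum only the masked values, then divide by how many were selected per row.
df["avg"] = (values * mask).sum(axis=1) / mask.sum(axis=1)
print(df["avg"].tolist())
```

This version does not handle missing values in the numbered columns; a NaN anywhere in a row would propagate into that row's average.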
