
Grouping Messages By Time Intervals

I'm trying to group outgoing messages by the 1-second interval in which they were sent. I'm currently calculating time latency with this: def time_deltas(infile): entries = (line.sp

Solution 1:

Assuming you want to group your data by the whole second in which each message was issued, we can use two facts: your data is ordered, and int(out_ts) truncates the timestamp to the second, which makes it a natural grouping key.

The simplest way to do the grouping is to use itertools.groupby:

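from itertools import groupby

data = get_time_deltas(INFILE)  # entries must already be sorted by timestamp
get_key = lambda x: int(x[0])  # function to get group key from data
# groupby only merges *consecutive* entries that share a key
bins = [(k, list(g)) for k, g in groupby(data, get_key)]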

bins will be a list of tuples where the first value is the key (an integer, e.g. 82438 for timestamps of the form 082438.*) and the second value is a list of the data entries issued during that second.
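A small self-contained sketch of the same grouping, using made-up (timestamp, latency) tuples in place of get_time_deltas(INFILE), and a hypothetical named key function second_key instead of the lambda. The last lines show why the input must be sorted: groupby starts a new group whenever the key changes, so an out-of-order entry splits its second into two groups.

```python
from itertools import groupby

def second_key(entry):
    """Hypothetical key function: truncate the timestamp to the whole second."""
    return int(entry[0])

# Made-up (timestamp, latency) tuples, already in ascending timestamp order.
data = [
    (82438.120, 0.004),
    (82438.560, 0.003),
    (82438.910, 0.006),
    (82439.050, 0.002),
    (82441.700, 0.005),
]

bins = [(k, list(g)) for k, g in groupby(data, second_key)]

for sec, entries in bins:
    print(sec, len(entries))
# 82438 3
# 82439 1
# 82441 1

# Unsorted input: second 1 shows up as two separate groups.
unsorted_keys = [k for k, _ in groupby([(1.2, 0), (2.5, 0), (1.9, 0)], second_key)]
print(unsorted_keys)  # [1, 2, 1]
```

Note that second 82440 has no entry at all: groupby never produces empty groups.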

Example usage:

# print out the number of messages for each second
for sec, data in bins:
    print('{0} --- {1}'.format(sec, len(data)))

# write (sec, msg_per_sec) out to CSV file
import csv
with open("test.csv", "w", newline="") as f:  # newline="" avoids blank rows on Windows (Python 3)
    csv.writer(f).writerows((s, len(d)) for s, d in bins)

# get average message per second
message_counts = [len(d) for s, d in bins]
avg_msg_per_second = float(sum(message_counts)) / len(message_counts)
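Because groupby only yields seconds that contain at least one message, the average computed this way is taken over non-empty seconds only. If quiet seconds should count as zero, one option is to divide by the full span of keys instead. The sketch below assumes the keys are consecutive integer seconds (e.g. seconds since some epoch); HHMMSS-style keys such as 82459 and 82500 are not consecutive, so the span arithmetic would be wrong for them. average_including_empty_seconds is a hypothetical helper and the sample data is made up.

```python
from itertools import groupby

def average_including_empty_seconds(bins):
    """Average messages per second over the whole key range, counting
    seconds with no messages as zero. Assumes bins is sorted by key and
    the keys are plain consecutive seconds, not HHMMSS clock values."""
    if not bins:
        return 0.0
    first_sec, last_sec = bins[0][0], bins[-1][0]
    total = sum(len(entries) for _, entries in bins)
    return float(total) / (last_sec - first_sec + 1)

# Made-up (timestamp, latency) tuples, sorted by timestamp.
sample = [(100.1, 0.2), (100.7, 0.3), (102.4, 0.1)]
bins = [(k, list(g)) for k, g in groupby(sample, lambda x: int(x[0]))]

non_empty_avg = float(sum(len(d) for _, d in bins)) / len(bins)
print(non_empty_avg)                          # 1.5: 3 messages over 2 non-empty seconds
print(average_including_empty_seconds(bins))  # 1.0: 3 messages over seconds 100..102
```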

P.S. In this example, bins is a list so that the order of the data is preserved. If you need to look up entries by second, consider using an OrderedDict instead.
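A minimal sketch of the OrderedDict variant, using made-up sorted data. It builds the same (key, entries) pairs but allows lookup by second while keeping the seconds in order. On Python 3.7+ a plain dict also preserves insertion order, so dict(...) would behave the same here.

```python
from collections import OrderedDict
from itertools import groupby

# Made-up (timestamp, latency) tuples, sorted by timestamp.
data = [(5.1, 0.2), (5.8, 0.1), (7.3, 0.4)]

# Same pairs as the bins list, but keyed by the truncated second.
bins_by_sec = OrderedDict((k, list(g)) for k, g in groupby(data, lambda x: int(x[0])))

print(len(bins_by_sec[5]))     # 2 messages in second 5
print(bins_by_sec.get(6, []))  # [] because no messages were sent in second 6
print(list(bins_by_sec))       # [5, 7], in the original order
```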


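Note that it is relatively straightforward to adapt the solution to group by multiples of seconds, as long as the timestamps count plain seconds. For example, to count messages per minute (60 seconds), change the get_key function to:

get_key = lambda x: int(x[0] / 60)  # truncate timestamp to the minute

If the timestamps are HHMMSS-encoded clock times such as 082438.*, dividing by 60 does not yield minutes; use int(x[0]) // 100 to drop the seconds digits.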
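The same idea can be wrapped in a small key factory for any window width. This is a sketch assuming the timestamps count plain seconds; make_key is a hypothetical helper, not part of itertools, and the data is made up.

```python
from itertools import groupby

def make_key(width):
    """Return a key function mapping a (timestamp, ...) entry to the index
    of the width-second window it falls in (timestamps in plain seconds)."""
    def key(entry):
        return int(entry[0] // width)
    return key

# Made-up timestamps in seconds, already sorted.
data = [(59.9, 'a'), (60.0, 'b'), (119.5, 'c'), (125.0, 'd')]

per_minute = [(k, len(list(g))) for k, g in groupby(data, make_key(60))]
print(per_minute)  # [(0, 1), (1, 2), (2, 1)]
```

With width=1 this reduces to the original per-second key.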

Solution 2:

This is easier if you don't build a grid of time intervals and locate each message in it with bisection.

Instead, transform each timestamp into a single number: the index of the interval it falls in.

from collections import defaultdict

def map_time_to_interval_number(epoch, times):
    for t in times:
        delta_t = (t - epoch).total_seconds()  # seconds elapsed since epoch
        interval = int(delta_t // 50)  # 0 for the first 50-second window, 1 for the next, ...
        yield interval, t

counts = defaultdict(int)
epoch = min(data)  # data must be a list of datetimes (min() would exhaust a generator)
for interval, _ in map_time_to_interval_number(epoch, data):
    counts[interval] += 1

The interval will be an integer: 0 is the first 50-second interval, 1 is the second, and so on.
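A self-contained sketch of this approach with made-up datetime values (the start date is arbitrary). It uses timedelta.total_seconds() and floor division so that each interval number is an int. Intervals with no messages simply do not appear in counts.

```python
from collections import defaultdict
from datetime import datetime, timedelta

def map_time_to_interval_number(epoch, times):
    """Yield (interval number, time) pairs for 50-second intervals from epoch."""
    for t in times:
        delta_t = (t - epoch).total_seconds()
        yield int(delta_t // 50), t

# Made-up send times, in seconds after an arbitrary start.
start = datetime(2024, 1, 1, 8, 24, 0)
data = [start + timedelta(seconds=s) for s in (0, 10, 49.9, 50, 99, 100, 175)]

counts = defaultdict(int)
epoch = min(data)
for interval, _ in map_time_to_interval_number(epoch, data):
    counts[interval] += 1

print(sorted(counts.items()))  # [(0, 3), (1, 2), (2, 1), (3, 1)]
```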

You can reconstruct the start time of each interval from its number, since interval n begins at epoch + 50 * n seconds.
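A minimal sketch of that reconstruction, assuming the 50-second width used in this answer; interval_bounds and INTERVAL_WIDTH are hypothetical names, and the epoch value is made up.

```python
from datetime import datetime, timedelta

INTERVAL_WIDTH = timedelta(seconds=50)  # assumed: same width as in the answer

def interval_bounds(epoch, n):
    """Return the (start, end) datetimes covered by interval number n,
    where start is inclusive and end is exclusive."""
    start = epoch + n * INTERVAL_WIDTH
    return start, start + INTERVAL_WIDTH

epoch = datetime(2024, 1, 1, 8, 24, 0)  # made-up value standing in for min(data)
print(interval_bounds(epoch, 2))
# (datetime.datetime(2024, 1, 1, 8, 25, 40), datetime.datetime(2024, 1, 1, 8, 26, 30))
```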

