Grouping Messages By Time Intervals
Solution 1:
Assuming you want to group your data into 1-second intervals aligned on the second, we can use the fact that your data is ordered and that int(out_ts) truncates the timestamp to the second, which gives us a grouping key.
The simplest way to do the grouping is to use itertools.groupby:
from itertools import groupby
data = get_time_deltas(INFILE)
get_key = lambda x: int(x[0]) # function to get group key from data
bins = [(k, list(g)) for k, g in groupby(data, get_key)]
bins will be a list of tuples where the first value in each tuple is the key (an integer, e.g. 082438) and the second value is a list of the data entries that were issued during that second (with timestamp = 082438.*).
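To make the shape of the result concrete, here is a minimal self-contained sketch. The `sample` list and the `second_of` helper are made up for illustration and stand in for whatever `get_time_deltas` actually returns; the only assumption is that each entry starts with a numeric timestamp and that the entries are already sorted by time.

```python
from itertools import groupby

# Hypothetical sample data: (timestamp, payload) tuples, already sorted by time.
sample = [
    (82438.10, "a"),
    (82438.75, "b"),
    (82439.20, "c"),
    (82441.05, "d"),
]

def second_of(entry):
    """Return the whole-second part of the entry's timestamp."""
    return int(entry[0])

# groupby only merges adjacent items with equal keys, so the input must be sorted.
sample_bins = [(key, list(group)) for key, group in groupby(sample, second_of)]

for key, entries in sample_bins:
    print(key, len(entries))
```

Note that a second with no messages (82440 here) produces no bin at all, which matters if you later average over the number of bins.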
Example usage:
# print out the number of messages for each second
for sec, entries in bins:
    print('{0} --- {1}'.format(sec, len(entries)))
# write (sec, msg_per_sec) out to CSV file
import csv
with open("test.csv", "w") as f:
    csv.writer(f).writerows((s, len(d)) for s, d in bins)
# get average messages per second (over the seconds that have at least one message)
message_counts = [len(d) for s, d in bins]
avg_msg_per_second = float(sum(message_counts)) / len(message_counts)
P.S. In this example, a list was used for bins so that the order of the data is maintained. If you need random access to the data, consider using an OrderedDict instead.
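As a rough sketch of that alternative, the same groupby output can be fed straight into an OrderedDict, which keeps insertion order while allowing lookup by key. The `sample` data and the name `bins_by_second` are hypothetical.

```python
from collections import OrderedDict
from itertools import groupby

# Hypothetical sample data: (timestamp, payload) tuples, already sorted by time.
sample = [(82438.10, "a"), (82438.75, "b"), (82439.20, "c")]

# Keys are the truncated seconds; values are the entries issued in that second.
bins_by_second = OrderedDict(
    (key, list(group)) for key, group in groupby(sample, lambda e: int(e[0]))
)

print(bins_by_second[82438])          # direct lookup by second
print(bins_by_second.get(82440, []))  # seconds without messages are simply absent
```

This only works cleanly because the input is sorted: each key then appears in exactly one group, so no earlier entry is overwritten.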
Note that it is relatively straightforward to adapt the solution to group by multiples of seconds. For example, to group messages per minute (60 seconds), change the get_key function to:
get_key = lambda x: int(x[0] / 60) # truncate to the minute; assumes x[0] is a number of seconds
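If you need several bucket widths, one option is a small factory that builds the key function for a given width. This is only a sketch: `make_key` and the `sample` data are hypothetical, and it assumes the timestamps are plain numbers of seconds.

```python
from itertools import groupby

def make_key(width_seconds):
    """Build a key function mapping a timestamp in seconds to its bucket index."""
    def key(entry):
        return int(entry[0] // width_seconds)
    return key

# Hypothetical sample data: (seconds, payload) tuples, already sorted by time.
sample = [(5.0, "a"), (59.9, "b"), (60.0, "c"), (125.3, "d")]

# Count the messages that fall into each 60-second bucket.
per_minute = [(k, len(list(g))) for k, g in groupby(sample, make_key(60))]
print(per_minute)
```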
Solution 2:
This is easier if you don't build your grid of time intervals with bisection.
Instead, do this: transform each timestamp into a single interval number.
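from collections import defaultdict
def map_time_to_interval_number(epoch, times):
    for t in times:
        delta = t - epoch
        # total elapsed seconds since epoch, including the fractional part
        delta_t = delta.days*60*60*24 + delta.seconds + delta.microseconds/1000000.0
        interval = int(delta_t // 50)  # index of the 50-second interval containing t
        yield interval, t
counts = defaultdict(int)
epoch = min(data)  # earliest timestamp; interval 0 starts here
for interval, t in map_time_to_interval_number(epoch, data):
    counts[interval] += 1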
The interval will be an integer: 0 is the first 50-second interval, 1 is the second 50-second interval, and so on.
You can reconstruct the starting timestamp of an interval from its number, knowing that each interval is 50 seconds wide and that interval 0 begins at epoch.
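A minimal sketch of that reconstruction, assuming the timestamps are datetime objects (as the timedelta arithmetic above suggests). The helper name `interval_start` and the example epoch value are made up for illustration.

```python
from datetime import datetime, timedelta

INTERVAL_SECONDS = 50  # interval width used in this solution

def interval_start(epoch, interval):
    """Return the datetime at which the given interval number begins."""
    return epoch + timedelta(seconds=interval * INTERVAL_SECONDS)

# Hypothetical epoch; in practice this would be min(data).
example_epoch = datetime(2011, 1, 1, 8, 24, 38)

print(interval_start(example_epoch, 0))  # start of the first interval
print(interval_start(example_epoch, 3))  # 150 seconds after the epoch
```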