Python Pandas Change Duplicate Timestamp To Unique
I have a file containing duplicate timestamps, at most two for each timestamp. They are not really duplicates; the second timestamp of each pair just needs a millisecond offset added to it.
Solution 1:
Setup
In [69]: df = DataFrame(dict(time=x))
In [70]: df
Out[70]:
                 time
0 2013-01-01 09:01:00
1 2013-01-01 09:01:00
2 2013-01-01 09:01:01
3 2013-01-01 09:01:01
4 2013-01-01 09:01:02
5 2013-01-01 09:01:02
6 2013-01-01 09:01:03
7 2013-01-01 09:01:03
8 2013-01-01 09:01:04
9 2013-01-01 09:01:04
Find the locations where the difference in time from the previous row is 0 seconds
In [71]: mask = (df.time-df.time.shift()) == np.timedelta64(0,'s')
In [72]: mask
Out[72]:
0    False
1     True
2    False
3     True
4    False
5     True
6    False
7     True
8    False
9     True
Name: time, dtype: bool
Set those locations to use an offset of 5 milliseconds (in your question you used 500, but it could be anything). This requires numpy >= 1.7. (Note that this syntax will be changing in pandas 0.13 to allow a more direct df.loc[mask,'time'] += pd.offsets.Milli(5).)
In [73]: df.loc[mask,'time'] = df.time[mask].apply(lambda x: x + pd.offsets.Milli(5))
In [74]: df
Out[74]:
                        time
0        2013-01-01 09:01:00
1 2013-01-01 09:01:00.005000
2        2013-01-01 09:01:01
3 2013-01-01 09:01:01.005000
4        2013-01-01 09:01:02
5 2013-01-01 09:01:02.005000
6        2013-01-01 09:01:03
7 2013-01-01 09:01:03.005000
8        2013-01-01 09:01:04
9 2013-01-01 09:01:04.005000
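For readers on a recent pandas, the same idea can probably be written without apply. The following is a minimal sketch, assuming a current pandas release where pd.Timedelta and in-place addition through .loc are available; the sample data is made up to mirror the setup above, and the 5 ms offset is as arbitrary as before.

```python
import pandas as pd

# Sample data mirroring the setup above: each second appears twice.
times = pd.to_datetime([
    "2013-01-01 09:01:00", "2013-01-01 09:01:00",
    "2013-01-01 09:01:01", "2013-01-01 09:01:01",
    "2013-01-01 09:01:02", "2013-01-01 09:01:02",
])
df = pd.DataFrame({"time": times})

# True where a row carries the same timestamp as the row before it.
# The first row's diff is NaT, which compares as False.
mask = df["time"].diff() == pd.Timedelta(0)

# Shift only the flagged rows by 5 milliseconds.
df.loc[mask, "time"] += pd.Timedelta(milliseconds=5)

print(df)
```

As in the answer above, this relies on there being at most two identical timestamps in a row; a third repeat would collide with the first.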
Solution 2:
So this algorithm should work very well... I'm just having a hell of a time with numpy's datetime datatypes.
In [154]: df
Out[154]:
                  0
0  2011/1/4 9:14:00
1  2011/1/4 9:15:00
2  2011/1/4 9:15:01
3  2011/1/4 9:15:01
4  2011/1/4 9:15:02
5  2011/1/4 9:15:02
6  2011/1/4 9:15:03
7  2011/1/4 9:15:03
8  2011/1/4 9:15:04
In [155]: ((dt.diff() == 0) * .005)
Out[155]:
0    0.000
1    0.000
2    0.000
3    0.005
4    0.000
5    0.005
6    0.000
7    0.005
8    0.000
Name: 0, dtype: float64
And the idea is to add those two together. Of course, one is datetime64 and the other is float64. For whatever reason, np.timedelta64 doesn't operate on arrays? Anyway, if you can sort out the dtype issues, that will work.
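One way the dtype mismatch might be sorted out is to turn the 0/1 flags into a timedelta Series before adding. This is only a sketch, assuming a recent pandas where pd.to_timedelta accepts an integer Series with a unit; the sample data and the names is_repeat, offsets and unique_dt are invented for illustration, and the offset is expressed as 5 ms rather than 0.005 s.

```python
import pandas as pd

# Sample data shaped like the frame above, parsed into datetime64.
dt = pd.Series(pd.to_datetime([
    "2011-01-04 09:14:00", "2011-01-04 09:15:00",
    "2011-01-04 09:15:01", "2011-01-04 09:15:01",
    "2011-01-04 09:15:02", "2011-01-04 09:15:02",
]))

# 1 where a row repeats the previous timestamp, 0 elsewhere.
is_repeat = (dt.diff() == pd.Timedelta(0)).astype("int64")

# Convert the flags into timedeltas: 0 ms or 5 ms per row.
offsets = pd.to_timedelta(is_repeat * 5, unit="ms")

# datetime64 + timedelta64 adds elementwise, so no dtype clash remains.
unique_dt = dt + offsets

print(unique_dt)
```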
Solution 3:
Assuming, as you have shown in your example, that they are sequential:
lasttimestamp = None
for ts in readtimestamp(infile):  # I will leave this to you
    if ts == lasttimestamp:
        ts += inc_by  # and this
    lasttimestamp = ts
    writetimestamp(outfile, ts)  # and this too
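To make the loop concrete, here is a small self-contained sketch using only the standard library. The generator name bump_repeats, the in-memory list standing in for the input file, and the 5 ms default increment are all assumptions for illustration; reading and writing the actual file is still left out.

```python
from datetime import datetime, timedelta


def bump_repeats(timestamps, inc_by=timedelta(milliseconds=5)):
    """Yield each timestamp, adding inc_by when it equals the previous one yielded."""
    last = None
    for ts in timestamps:
        if ts == last:
            ts += inc_by  # second of a pair: nudge it forward
        last = ts
        yield ts


# Stand-in for timestamps read sequentially from a file.
raw = [
    datetime(2013, 1, 1, 9, 1, 0),
    datetime(2013, 1, 1, 9, 1, 0),
    datetime(2013, 1, 1, 9, 1, 1),
    datetime(2013, 1, 1, 9, 1, 1),
]
result = list(bump_repeats(raw))

for ts in result:
    print(ts.isoformat(timespec="milliseconds"))
```

Like the loop above, this assumes no timestamp appears more than twice in a row.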