Python Pandas Change Duplicate Timestamp To Unique
I have a file containing duplicate timestamps, at most two for each timestamp. They are not really duplicates; the second timestamp of each pair just needs a millisecond offset added to it.
Solution 1:
Setup
In [69]: df = DataFrame(dict(time=x))
In [70]: df
Out[70]:
                 time
0 2013-01-01 09:01:00
1 2013-01-01 09:01:00
2 2013-01-01 09:01:01
3 2013-01-01 09:01:01
4 2013-01-01 09:01:02
5 2013-01-01 09:01:02
6 2013-01-01 09:01:03
7 2013-01-01 09:01:03
8 2013-01-01 09:01:04
9 2013-01-01 09:01:04
Find the locations where the difference in time from the previous row is 0 seconds
In [71]: mask = (df.time-df.time.shift()) == np.timedelta64(0,'s')
In [72]: mask
Out[72]:
0    False
1     True
2    False
3     True
4    False
5     True
6    False
7     True
8    False
9     True
Name: time, dtype: bool
Set those locations to use an offset of 5 milliseconds (in your question you used 500, but it could be anything). This requires numpy >= 1.7. (Note that this syntax will be changing in pandas 0.13 to allow a more direct df.loc[mask,'time'] += pd.offsets.Milli(5).)
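For readers on a recent pandas, here is a minimal self-contained sketch of the same mask-and-offset idea using the more direct in-place addition mentioned above. It assumes a pandas version where `Series.diff()` works on datetime columns, and it swaps in `pd.Timedelta` for `pd.offsets.Milli`. The sample values are illustrative, not the asker's real data.

```python
import pandas as pd

# Illustrative data shaped like the frame in this answer (assumed values).
times = pd.to_datetime([
    "2013-01-01 09:01:00", "2013-01-01 09:01:00",
    "2013-01-01 09:01:01", "2013-01-01 09:01:01",
    "2013-01-01 09:01:02",
])
df = pd.DataFrame({"time": times})

# True where a row carries the same timestamp as the row before it.
# The first row compares against NaT, which evaluates to False.
mask = df["time"].diff() == pd.Timedelta(0)

# Shift only the flagged rows by 5 ms, in place.
df.loc[mask, "time"] += pd.Timedelta(milliseconds=5)
```

After this, `df["time"].is_unique` can be used as a quick check that no repeated timestamps remain.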
In [73]: df.loc[mask,'time'] = df.time[mask].apply(lambda x: x + pd.offsets.Milli(5))
In [74]: df
Out[74]:
                        time
0 2013-01-01 09:01:00
1 2013-01-01 09:01:00.005000
2 2013-01-01 09:01:01
3 2013-01-01 09:01:01.005000
4 2013-01-01 09:01:02
5 2013-01-01 09:01:02.005000
6 2013-01-01 09:01:03
7 2013-01-01 09:01:03.005000
8 2013-01-01 09:01:04
9 2013-01-01 09:01:04.005000
Solution 2:
So this algorithm should work very well... I'm just having a hell of a time with numpy's datetime datatypes.
In [154]: df
Out[154]:
                  0
0  2011/1/4 9:14:00
1  2011/1/4 9:15:00
2  2011/1/4 9:15:01
3  2011/1/4 9:15:01
4  2011/1/4 9:15:02
5  2011/1/4 9:15:02
6  2011/1/4 9:15:03
7  2011/1/4 9:15:03
8  2011/1/4 9:15:04
In [155]: ((dt.diff() == 0) * .005)
Out[155]:
0    0.000
1    0.000
2    0.000
3    0.005
4    0.000
5    0.005
6    0.000
7    0.005
8    0.000
Name: 0, dtype: float64
And the idea is to add those two together. Of course, one is datetime64 and the other is float64. For whatever reasons, np.timedelta64 doesn't operate on arrays? Anyway if you can sort out the dtype issues that will work.
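One possible way to sort out the dtype mismatch, sketched under the assumption of a recent pandas: keep the comparison as a boolean Series, convert it to an integer count of milliseconds, and turn that into a timedelta Series with `pd.to_timedelta` before adding. The names `is_repeat`, `offsets` and `unique_dt` are made up for this illustration, and the sample timestamps are written in ISO form to avoid any parsing ambiguity.

```python
import pandas as pd

# Illustrative timestamps resembling the series shown above (assumed values).
dt = pd.Series(pd.to_datetime([
    "2011-01-04 09:14:00", "2011-01-04 09:15:00",
    "2011-01-04 09:15:01", "2011-01-04 09:15:01",
    "2011-01-04 09:15:02", "2011-01-04 09:15:02",
]))

# Boolean flag: True where the timestamp repeats the previous row.
is_repeat = dt.diff() == pd.Timedelta(0)

# Convert the flags to a timedelta Series (0 ms or 5 ms) instead of floats,
# so both operands of the addition are datetime-like.
offsets = pd.to_timedelta(is_repeat.astype("int64") * 5, unit="ms")

unique_dt = dt + offsets
```

The addition is vectorized, so no per-row `apply` is needed.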
Solution 3:
Assuming, as you have shown in your example, that they are sequential:
lasttimestamp = None
for ts in readtimestamp(infile):  # I will leave this to you
    if ts == lasttimestamp:
        ts += inc_by  # and this
    lasttimestamp = ts
    writetimestamp(outfile, ts)  # and this too
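The loop above leaves reading, writing and the increment to the reader. A runnable sketch of the same idea using only the standard library might look like the following. The generator name `make_unique` and the 5 ms default for `inc_by` are assumptions for illustration, and it relies on the question's premise that a timestamp appears at most twice in a row.

```python
from datetime import datetime, timedelta


def make_unique(timestamps, inc_by=timedelta(milliseconds=5)):
    """Yield each timestamp, bumping one that equals the value read just before it.

    Assumes the input is sequential and a value repeats at most twice in a row.
    """
    last_seen = None
    for ts in timestamps:
        if ts == last_seen:
            # Second occurrence of the same timestamp: offset it.
            yield ts + inc_by
        else:
            yield ts
        last_seen = ts


# Usage with a small in-memory list standing in for the input file.
raw = [
    datetime(2013, 1, 1, 9, 1, 0),
    datetime(2013, 1, 1, 9, 1, 0),
    datetime(2013, 1, 1, 9, 1, 1),
]
result = list(make_unique(raw))
```

Because it is a generator, the same function can be fed timestamps parsed line by line from a file without loading everything into memory.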