Making Pandas Work With Pendulum
Solution 1:
What would be the canonical way to create a custom to_<something> method (in this case a to_pendulum() method) which would be able to convert a Series of date strings directly to Pendulum objects?
After looking through the Pendulum API a bit, I must say I'm impressed with what they've done. Unfortunately, I don't think Pendulum and pandas can work together (at least, not with the current latest version of pandas, v0.21).
The most important reason is that pandas does not natively support Pendulum as a datatype. The natively supported datatypes (np.int64, np.float64 and np.datetime64) all support vectorisation in some form. Pendulum objects do not, so you are not going to get a shred of performance improvement from holding them in a dataframe over, say, a vanilla loop and list. If anything, calling apply on a Series of Pendulum objects is going to be slower (because of all the API overhead).
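To make the difference concrete, here is a minimal sketch that uses only pandas and the standard library. The sample strings and the variable names (native, boxed) are made up for illustration. It contrasts a natively typed datetime column, which exposes the vectorised .dt accessor, with an object-dtype column, where every element has to be visited through apply.

```python
import pandas as pd
from datetime import datetime

strings = ["2017-11-09 18:43:45", "2017-11-09 20:15:27", "2017-11-10 00:09:40"]

# Native path: pandas stores these as a datetime64 column, so the
# .dt accessor can work on the whole column at once.
native = pd.to_datetime(pd.Series(strings))
hours_native = native.dt.hour

# Object path: each element is an ordinary Python object, so the only
# way to reach its attributes is a per-element call through apply.
boxed = pd.Series(
    [datetime.strptime(x, "%Y-%m-%d %H:%M:%S") for x in strings],
    dtype=object,
)
hours_boxed = boxed.apply(lambda d: d.hour)

print(native.dtype, boxed.dtype)
print(hours_native.tolist(), hours_boxed.tolist())
```

Both paths give the same hours. The object column simply gets there one Python call at a time, which is the situation a Series of Pendulum objects would be in.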
Another reason is that Pendulum is a subclass of datetime:

    import pendulum
    from datetime import datetime

    isinstance(pendulum.now(), datetime)
    True

This is important because, as mentioned above, datetime is a supported datatype, so pandas will attempt to coerce datetime objects to its own native datetime format, Timestamp. Here's an example.
    print(s)
    0    2017-11-09 18:43:45
    1    2017-11-09 20:15:27
    2    2017-11-09 22:29:00
    3    2017-11-09 23:42:34
    4    2017-11-10 00:09:40
    5    2017-11-10 00:23:14
    6    2017-11-10 03:32:17
    7    2017-11-10 10:59:24
    8    2017-11-10 11:12:59
    9    2017-11-10 13:49:09

    s = s.apply(pendulum.parse)
    s
    0   2017-11-09 18:43:45+00:00
    1   2017-11-09 20:15:27+00:00
    2   2017-11-09 22:29:00+00:00
    3   2017-11-09 23:42:34+00:00
    4   2017-11-10 00:09:40+00:00
    5   2017-11-10 00:23:14+00:00
    6   2017-11-10 03:32:17+00:00
    7   2017-11-10 10:59:24+00:00
    8   2017-11-10 11:12:59+00:00
    9   2017-11-10 13:49:09+00:00
    Name: timestamp, dtype: datetime64[ns, <TimezoneInfo [UTC, GMT, +00:00:00, STD]>]

    s[0]
    Timestamp('2017-11-09 18:43:45+0000', tz='<TimezoneInfo [UTC, GMT, +00:00:00, STD]>')

    type(s[0])
    pandas._libs.tslib.Timestamp

So, with some difficulty (involving dtype=object), you could load Pendulum objects into dataframes. Here's how you'd do that:
    v = np.vectorize(pendulum.parse)
    s = pd.Series(v(s), dtype=object)
    s
    0    2017-11-09T18:43:45+00:00
    1    2017-11-09T20:15:27+00:00
    2    2017-11-09T22:29:00+00:00
    3    2017-11-09T23:42:34+00:00
    4    2017-11-10T00:09:40+00:00
    5    2017-11-10T00:23:14+00:00
    6    2017-11-10T03:32:17+00:00
    7    2017-11-10T10:59:24+00:00
    8    2017-11-10T11:12:59+00:00
    9    2017-11-10T13:49:09+00:00

    s[0]
    <Pendulum [2017-11-09T18:43:45+00:00]>

However, this is essentially useless, because calling any Pendulum method (via apply) will now not only be super slow, but will also end up with the result being coerced to Timestamp again: an exercise in futility.
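If Pendulum objects are still needed downstream, one possible workaround (not something pandas provides, just a sketch) is to leave the Series alone and convert at the boundary into a plain Python list, which pandas never gets a chance to coerce. The helper name to_pendulum and its parse parameter are hypothetical. The parser is injectable so that the helper can be tried with any string-to-datetime callable, and pendulum.parse is only imported when no parser is supplied.

```python
import pandas as pd


def to_pendulum(series, parse=None):
    """Convert a Series of date strings to a plain list of parsed objects.

    Hypothetical helper for illustration. When `parse` is omitted,
    pendulum.parse is used, which requires Pendulum to be installed.
    """
    if parse is None:
        import pendulum  # imported lazily; only needed for the default parser
        parse = pendulum.parse
    # Return a list rather than a Series on purpose: a list is outside
    # pandas' control, so nothing is coerced back to Timestamp.
    return [parse(value) for value in series]


s = pd.Series(["2017-11-09 18:43:45", "2017-11-10 13:49:09"], name="timestamp")

# Usage, assuming Pendulum is installed:
#     moments = to_pendulum(s)
#     later = [m.add(days=1) for m in moments]
```

This gives up the Series container entirely, which is consistent with the point above: once the values are Pendulum objects, pandas has nothing to offer over a list and a loop.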