SQL/Python: Transform Data from a CSV into a Table with a Different Schema, with Conditions
Solution 1:
As discussed in the comments, you can easily accomplish this with the COPY command and a temporary table to hold the data from the file.
Create a temporary table with the structure of your CSV. Note that all columns are of type TEXT: this makes the copy faster because validation is minimised.
CREATE TEMP TABLE temptable
( id        TEXT,
  type      TEXT,
  sum_cost  TEXT,
  date_time TEXT );
Use COPY to load the file into this table. If the file is on the database server, use COPY; if it is on a client machine, use psql's \COPY. Change the delimiter if your file uses a different one.
\COPY temptable FROM '/somepath/mydata.csv' WITH DELIMITER ',' CSV HEADER;
Now, simply run an INSERT INTO ... SELECT, using expressions for the various transformations.
INSERT INTO maintable
  ( _id, start_time, end_time, pound_cost, euro_cost, count )
SELECT id,
       date_time::timestamp - INTERVAL '1 HOUR',
       date_time::timestamp - INTERVAL '30 MINUTES',
       CASE type WHEN 'pound' THEN sum_cost::numeric ELSE 0 END,
       CASE type WHEN 'euro'  THEN sum_cost::numeric ELSE 0 END,  -- you have not specified what happens to USD; adjust as required
       1 AS count  -- hardcoded based on your info; not sure what it actually means
  FROM temptable t;
Now, the data is in your main table:
select * from maintable;
 _id |     start_time      |      end_time       | pound_cost | euro_cost | count
-----+---------------------+---------------------+------------+-----------+-------
 a1  | 2019-04-21 09:50:06 | 2019-04-21 10:20:06 |        500 |         0 |     1
 b1  | 2019-04-21 09:40:00 | 2019-04-21 10:10:00 |          0 |       100 |     1
 c1  | 2019-04-21 10:00:00 | 2019-04-21 10:30:00 |        650 |         0 |     1
 d1  | 2019-04-20 23:30:00 | 2019-04-21 00:00:00 |          0 |         0 |     1
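Since the question mentions Python as well, the same two-step approach (bulk load into a staging table, then a transforming INSERT ... SELECT) could also be driven from a script. Below is a rough sketch using SQLAlchemy; the connection string and file path are placeholders, and the server-side COPY assumes the file is readable by the database server itself (for a client-side file, stream it instead, e.g. with psql's \COPY).
from sqlalchemy import create_engine, text

engine = create_engine('postgresql+psycopg2://user:password@localhost/mydb')  # placeholder DSN

with engine.begin() as conn:
    # Staging table with all-text columns, mirroring the CSV.
    conn.execute(text("""
        CREATE TEMP TABLE temptable
        ( id TEXT, type TEXT, sum_cost TEXT, date_time TEXT );
    """))
    # Server-side COPY: the file path is resolved on the database server.
    conn.execute(text("""
        COPY temptable FROM '/somepath/mydata.csv' WITH (FORMAT csv, HEADER true, DELIMITER ',');
    """))
    # Transform while inserting into the main table.
    conn.execute(text("""
        INSERT INTO maintable (_id, start_time, end_time, pound_cost, euro_cost, count)
        SELECT id,
               date_time::timestamp - INTERVAL '1 HOUR',
               date_time::timestamp - INTERVAL '30 MINUTES',
               CASE type WHEN 'pound' THEN sum_cost::numeric ELSE 0 END,
               CASE type WHEN 'euro'  THEN sum_cost::numeric ELSE 0 END,
               1
          FROM temptable;
    """))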
Solution 2:
Here's how you might reshape the data to your specification with pandas:
import os
import pandas as pd
import datetime as dt
dir = r'C:\..\..'
csv_name = 'my_raw_data.csv'
full_path = os.path.join(dir, csv_name)
data = pd.read_csv(full_path)
def process_df(dataframe=data):
    df1 = dataframe.copy(deep=True)
    df1['date_time'] = pd.to_datetime(df1['date_time'])
    df1['count'] = 1

    ### Maybe get unique types to list for future needs
    _types = df1['type'].unique().tolist()

    ### Process time-series shifts
    df1['start_time'] = df1['date_time'] - dt.timedelta(hours=1, minutes=0)
    df1['end_time'] = df1['date_time'] - dt.timedelta(hours=0, minutes=50)

    ### Create conditional masks for the dataframe
    pound_type = df1['type'] == 'pound'
    euro_type = df1['type'] == 'euro'

    ### Subsection the dataframe by currency; concatenate the results
    df_p = df1[pound_type]
    df_e = df1[euro_type]
    df = pd.concat([df_p, df_e]).reset_index(drop=True)

    ### Add conditional cost columns: keep sum_cost where the type matches, else 0
    df['pound_cost'] = [c if t == 'pound' else 0 for t, c in zip(df['type'], df['sum_cost'])]
    df['euro_cost'] = [c if t == 'euro' else 0 for t, c in zip(df['type'], df['sum_cost'])]

    ### Manually input desired field arrangement
    fin_cols = [
        'id',
        'start_time',
        'end_time',
        'pound_cost',
        'euro_cost',
        'count',
    ]

    ### Return formatted dataframe
    return df.reindex(columns=fin_cols).copy(deep=True)
data1 = process_df()
Output:
   id           start_time             end_time  pound_cost  euro_cost  count
0  a1  2019-04-21 09:50:06  2019-04-21 10:00:06         500          0      1
1  c1  2019-04-21 10:00:00  2019-04-21 10:10:00         650          0      1
2  b1  2019-04-21 09:40:00  2019-04-21 09:50:00           0        100      1
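As a side note, the two conditional cost columns could also be built without Python-level list comprehensions by using numpy.where, which is usually faster on larger frames. A minimal standalone sketch; the tiny demo frame only mirrors the type/sum_cost columns used above:
import numpy as np
import pandas as pd

# Demo frame with the same column names as the CSV above.
demo = pd.DataFrame({'type': ['pound', 'euro'], 'sum_cost': [500, 100]})

# Vectorised version of the conditionals: take sum_cost where the type matches, else 0.
demo['pound_cost'] = np.where(demo['type'] == 'pound', demo['sum_cost'], 0)
demo['euro_cost'] = np.where(demo['type'] == 'euro', demo['sum_cost'], 0)
print(demo)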
To load to the main SQL table, you'd have to get a connection with SQLAlchemy or pyodbc. Then, assuming the data types match, you should be able to use pandas.DataFrame.to_sql() with if_exists='append' to add the rows.
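For example, with SQLAlchemy that could look roughly like the sketch below. The connection string is a placeholder, and the rename assumes the main table uses _id for its key column as in Solution 1; adjust both to your actual schema.
from sqlalchemy import create_engine

engine = create_engine('postgresql+psycopg2://user:password@localhost/mydb')  # placeholder DSN

# Align column names with the target table (Solution 1's table calls the key _id).
to_load = data1.rename(columns={'id': '_id'})

# if_exists='append' adds rows to the existing table; index=False drops the pandas index.
to_load.to_sql('maintable', engine, if_exists='append', index=False)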