Sql/python: Transform Data From Csv And Into Table With Different Schema With Condition

So, I have a csv file containing data like this: id type sum_cost date_time -------------------------------------------------- a1 pound 500 201

Solution 1:

As discussed in the comments, you can accomplish this with the COPY command and a temporary table that holds the data from the file.

Create a temporary table with the structure of your CSV. Note that all columns are of the text data type; this makes the copy faster because validation is minimised.

CREATE TEMP TABLE temptable
      ( id        TEXT,
        type      TEXT,
        sum_cost  TEXT,
        date_time TEXT );

Use COPY to load the file into this table. If the file is on the database server, use COPY; if it is on a client machine, use psql's \copy. Change the delimiter if your file uses a different one.

\copy temptable FROM '/somepath/mydata.csv' WITH DELIMITER ',' CSV HEADER;

Now, simply run an INSERT INTO .. SELECT using expressions for various transformations.

INSERT INTO maintable (
          _id, start_time, end_time, pound_cost, euro_cost, count )
SELECT id,
     date_time::timestamp - INTERVAL '1 HOUR',
     date_time::timestamp - INTERVAL '30 MINUTES',
  CASE type
      WHEN 'pound' THEN sum_cost::numeric ELSE 0 END,
  CASE type WHEN 'euro' THEN sum_cost::numeric
      -- you have not specified what happens to USD, use it as required.
      ELSE 0 END,
   1 AS count       -- I have hardcoded it based on your info, not sure what it actually means
FROM temptable t;

Now the data is in your main table:

select * from maintable;

 _id |     start_time      |      end_time       | pound_cost | euro_cost | count
-----+---------------------+---------------------+------------+-----------+-------
 a1  | 2019-04-21 09:50:06 | 2019-04-21 10:20:06 |        500 |         0 |     1
 b1  | 2019-04-21 09:40:00 | 2019-04-21 10:10:00 |          0 |       100 |     1
 c1  | 2019-04-21 10:00:00 | 2019-04-21 10:30:00 |        650 |         0 |     1
 d1  | 2019-04-20 23:30:00 | 2019-04-21 00:00:00 |          0 |         0 |     1
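The same stage-then-transform pattern can be tried without a PostgreSQL server. The sketch below is only an illustration: it swaps in Python's built-in sqlite3 module, so the `::timestamp` casts and `INTERVAL` arithmetic are replaced by SQLite's `datetime()` modifiers and `CAST`. The sample rows are assumptions reverse-engineered from the result above (the `usd` type and the cost of `d1` in particular are placeholders), and `load_and_transform` is a hypothetical helper name.

```python
import csv
import io
import sqlite3

# Assumed sample data, reconstructed from the query result shown above.
# The type and cost of the d1 row are placeholders.
SAMPLE_CSV = """id,type,sum_cost,date_time
a1,pound,500,2019-04-21 10:50:06
b1,euro,100,2019-04-21 10:40:00
c1,pound,650,2019-04-21 11:00:00
d1,usd,300,2019-04-21 00:30:00
"""


def load_and_transform(csv_text):
    """Stage CSV rows as text, then reshape them with INSERT ... SELECT."""
    conn = sqlite3.connect(":memory:")
    # Staging table: every column is TEXT, mirroring temptable.
    conn.execute(
        "CREATE TABLE temptable (id TEXT, type TEXT, sum_cost TEXT, date_time TEXT)"
    )
    conn.execute(
        "CREATE TABLE maintable (_id TEXT, start_time TEXT, end_time TEXT, "
        "pound_cost NUMERIC, euro_cost NUMERIC, count INTEGER)"
    )

    # Stand-in for COPY: read the CSV and bulk-insert it unchanged.
    reader = csv.reader(io.StringIO(csv_text))
    next(reader)  # skip the header row
    conn.executemany("INSERT INTO temptable VALUES (?, ?, ?, ?)", reader)

    # SQLite equivalents of the casts and interval arithmetic used above.
    conn.execute(
        """
        INSERT INTO maintable (_id, start_time, end_time, pound_cost, euro_cost, count)
        SELECT id,
               datetime(date_time, '-1 hour'),
               datetime(date_time, '-30 minutes'),
               CASE type WHEN 'pound' THEN CAST(sum_cost AS INTEGER) ELSE 0 END,
               CASE type WHEN 'euro' THEN CAST(sum_cost AS INTEGER) ELSE 0 END,
               1
        FROM temptable
        """
    )
    rows = conn.execute("SELECT * FROM maintable ORDER BY _id").fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    for row in load_and_transform(SAMPLE_CSV):
        print(row)
```

Keeping the staging table all-text means type conversion happens in one place, the SELECT, which is also where a rule for other currencies could be added.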

Solution 2:

Here's how you might be able to reshape data for your specification:

import os
import pandas as pd
import datetime as dt

# Avoid naming the variable `dir`, which would shadow the built-in
data_dir = r'C:\..\..'
csv_name = 'my_raw_data.csv'
full_path = os.path.join(data_dir, csv_name)
data = pd.read_csv(full_path)

def process_df(dataframe=data):
    df1 = dataframe.copy(deep=True)
    df1['date_time'] = pd.to_datetime(df1['date_time'])
    df1['count'] = 1

    ### Maybe get unique types to list for future needs
    _types = df1['type'].unique().tolist()

    ### Process time-series shifts
    df1['start_time'] = df1['date_time'] - dt.timedelta(hours=1, minutes=0)
    df1['end_time'] = df1['date_time'] - dt.timedelta(hours=0, minutes=50)

    ## Create conditional masks for the dataframe
    pound_type = df1['type'] == 'pound'
    euro_type = df1['type'] == 'euro'

    ### Subsection each dataframe by currency; concatenate results
    df_p = df1[pound_type]
    df_e = df1[euro_type]
    df = pd.concat([df_p, df_e]).reset_index(drop=True)

    ### add conditional columns: keep sum_cost only for the matching currency
    df['pound_cost'] = df['sum_cost'].where(df['type'] == 'pound', 0)
    df['euro_cost'] = df['sum_cost'].where(df['type'] == 'euro', 0)

    ### Manually input desired field arrangement
    fin_cols = [
        'id',
        'start_time',
        'end_time',
        'pound_cost',
        'euro_cost',
        'count',
        ]

    ### Return formatted dataframe
    return df.reindex(columns=fin_cols).copy(deep=True)

data1 = process_df()

Output:

   id          start_time            end_time  pound_cost  euro_cost  count
0  a1 2019-04-21 09:50:06 2019-04-21 10:00:06         500          0      1
1  c1 2019-04-21 10:00:00 2019-04-21 10:10:00         650          0      1
2  b1 2019-04-21 09:40:00 2019-04-21 09:50:00           0        100      1

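To load into the main SQL table, you'd need a database connection, for example a SQLAlchemy engine (pyodbc can serve as the underlying driver). Then, assuming the column names and data types match, you should be able to use pandas.DataFrame.to_sql() with if_exists='append' to add the rows. Note that DataFrame.append() only concatenates DataFrames in memory and was removed in pandas 2.0, so it cannot write to a database.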
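One way to sketch that loading step is with `DataFrame.to_sql(..., if_exists='append')`. The example below uses an in-memory SQLite connection so that it runs anywhere; for PostgreSQL or SQL Server you would pass a SQLAlchemy engine instead. The table layout, the sample row, and the helper name `append_to_maintable` are assumptions made for illustration, and timestamps are kept as strings to sidestep driver-specific datetime handling.

```python
import sqlite3

import pandas as pd


def append_to_maintable(df, conn, table="maintable"):
    """Append the rows of df to an existing table and return the row count."""
    # The reshaped frame has an 'id' column, while the target table uses '_id'.
    out = df.rename(columns={"id": "_id"})
    # if_exists="append" adds rows to the existing table instead of replacing it.
    out.to_sql(table, conn, if_exists="append", index=False)
    return len(out)


# Assumed target table, modelled on the maintable columns used earlier.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE maintable (_id TEXT, start_time TEXT, end_time TEXT, "
    "pound_cost NUMERIC, euro_cost NUMERIC, count INTEGER)"
)

# Hypothetical one-row sample in the shape returned by process_df().
sample = pd.DataFrame(
    {
        "id": ["a1"],
        "start_time": ["2019-04-21 09:50:06"],
        "end_time": ["2019-04-21 10:00:06"],
        "pound_cost": [500],
        "euro_cost": [0],
        "count": [1],
    }
)

inserted = append_to_maintable(sample, conn)
```

Passing index=False keeps the DataFrame's integer index from being written as an extra column that the target table does not have.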
