I have a long series of .csv files that I want to import into a local database. I believe my query is correct, but the DATE and TIMESTAMP columns fail to parse. PostgreSQL seems to expect an ISO-style format such as "yyyy-mm-dd", but my data uses a different one: "dd/mm/yyyy".

I read online and in other Stack Overflow answers that one can SET the datestyle to something different, but that it's not recommended.

Is there a way to specify the format of the columns to import? Also, I do not need to import all columns from the csv file: can I leave some out?

Details

First, I wrote the code to create the table (sorry if column names are in Italian, but it's not important):

CREATE TABLE IF NOT EXISTS bikes (
    bici INT,
    tipo_bici VARCHAR(20),
    cliente_anonimizzato INT,
    data_riferimento_prelievo DATE,
    data_prelievo TIMESTAMP,
    numero_stazione_prelievo INT,
    nome_stazione_prelievo TEXT,
    slot_prelievo SMALLINT,
    data_riferimento_restituzione DATE,
    data_restituzione TIMESTAMP,
    numero_stazione_restituzione INT,
    nome_stazione_restituzione TEXT,
    slot_restituzione SMALLINT,
    durata VARCHAR(10),
    distanza_totale REAL,
    co2_evitata REAL,
    calorie_consumate REAL,
    penalità CHAR(2)
);

Then I add the query to copy data into the table:

COPY bikes(
    bici,
    tipo_bici,
    cliente_anonimizzato,
    data_riferimento_prelievo,
    data_prelievo,
    numero_stazione_prelievo,
    nome_stazione_prelievo,
    slot_prelievo,
    data_riferimento_restituzione,
    data_restituzione,
    numero_stazione_restituzione,
    nome_stazione_restituzione,
    slot_restituzione,
    durata,
    distanza_totale,
    co2_evitata,
    calorie_consumate,
    penalità
)
FROM '/Users/luca/tesi/data/2019q3.csv'
DELIMITER ','
CSV HEADER;

The code seems fine, except the following error pops up:

ERROR:  date/time field value out of range: "31/07/2019"
HINT:  Perhaps you need a different "datestyle" setting.
CONTEXT:  COPY bikes, line 25296, column data_riferimento_restituzione: "31/07/2019"
SQL state: 22008

How can I specify the format to parse in the CREATE TABLE portion of the code? Also, I do not actually need all the columns of this csv; how do I leave some out? I tried to list only the ones I need, but I get an import error:

ERROR:  extra data after last expected column

1 Answer

Set datestyle to ISO, DMY, and your dates will be parsed as you want. There is nothing wrong with setting that parameter - do it with SET right before you COPY.
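
As a minimal sketch of that advice, assuming the table and file path from the question: wrapping the import in a transaction and using SET LOCAL is a choice made for this sketch, not something the answer requires. It keeps the changed datestyle from outliving the import.

```sql
BEGIN;

-- SET LOCAL lasts only until the end of this transaction;
-- a plain SET would last for the rest of the session.
SET LOCAL datestyle TO iso, dmy;

-- Day-first values such as 31/07/2019 are now accepted.
-- The column list is omitted because the CSV columns are assumed
-- to match the table definition in order.
COPY bikes
FROM '/Users/luca/tesi/data/2019q3.csv'
WITH (FORMAT csv, HEADER true);

COMMIT;
```

If you prefer the plain SET from the answer, drop the BEGIN/COMMIT pair and replace SET LOCAL with SET.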

COPY has no way to skip columns from the CSV file. Add the extra columns to the table and drop them afterwards; that is cheap.
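
A sketch of that workaround, assuming purely for illustration that co2_evitata and calorie_consumate are the columns you do not need: load the full file into the table as defined in the question, then drop the unwanted columns in one statement.

```sql
-- Run this after COPY has loaded every CSV column into bikes.
-- The two columns named here are a hypothetical choice.
ALTER TABLE bikes
    DROP COLUMN co2_evitata,
    DROP COLUMN calorie_consumate;
```

DROP COLUMN does not rewrite the table; it only marks the columns as dropped, which is why it is cheap.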

  • Dropping a column might be cheap, but it has severe consequences, because you can never really get rid of it. Not even a VACUUM FULL will clear the entries in pg_attribute. So if this is run often, it might result in a "too many columns" error. For a one-time import this is completely irrelevant, though. I really wish it were possible to teach COPY to ignore excess columns at the end. Commented Apr 30, 2021 at 5:53
  • @a_horse_without_name True. I didn't test it, but what about a view with an INSTEAD OF INSERT trigger as the target? The performance would of course suffer. Commented Apr 30, 2021 at 6:03
  • Hi Laurenz, thank you for your help! Everything worked. The line I added was SET datestyle TO iso, dmy; and it was fine. As a reference, I looked at the official documentation for SET. Commented Apr 30, 2021 at 15:23