
I started studying SQL a month ago. I'm trying to import a .CSV file, but I have a problem.

In that file, some numbers are written in quotes and use a comma as the decimal separator.

In Excel, if I try to change the commas to points, I break the format and the data; if instead I try to change the commas to points directly in the .CSV, it causes problems because the comma is also the field delimiter.

Can someone help me?

I would like to understand whether it is advisable to set all the table's columns to a text datatype and then convert the data I need, or whether it is better to do it in Excel and, if so, how.

  • If you can ask the person providing the file to add text qualifiers to each field, that will solve most of this. That means putting double quotes (or some other text value) around each value to separate the columns, so you don't have this issue. It could look like this: "COL1", "COL2", "COL 3 WITH , CommaInIt", "Col4", and the comma between the double quotes would not cause issues.
    – Brad
    Commented Mar 4 at 17:33
  • I got this file from Kaggle, just to practice SQL. Commented Mar 4 at 17:43
  • See my answer here stackoverflow.com/questions/78022060/… for one suggestion. In that case the final result was going to be a timestamp; you can ignore that part. The important part is setting lc_numeric to a locale that recognizes , as the decimal separator and then using to_number(). Commented Mar 4 at 17:55
  • You may need to look at the regional settings on the PC; the decimal character is set there. stackoverflow.com/questions/11421260/csv-decimal-dot-in-excel Also, be aware that only so many rows are analyzed to determine the data type of a column. If you have an alphanumeric column with all numbers at the beginning, you may have blank cells in the column later on.
    – Jan
    Commented Mar 4 at 19:26
  • Please provide enough code so others can better understand or reproduce the problem.
    – Community Bot
    Commented Mar 6 at 2:33
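
The lc_numeric + to_number() suggestion from the comments can be sketched as follows. The locale name de_DE.utf8 is an assumption here; use any comma-decimal locale that is actually installed on your server.

```sql
-- Sketch of the lc_numeric / to_number() approach (assumes a comma-decimal
-- locale such as de_DE.utf8 is available; adjust to your system):
set lc_numeric = 'de_DE.utf8';

-- The D template pattern follows lc_numeric, so ',' is read as the
-- decimal separator and '10,50' is converted to the numeric 10.50.
select to_number('10,50', '99999D99');
```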

1 Answer


My suggestion is to use an intermediate table. I have been using this approach for some time and, although somewhat verbose, it works well. So here it is.

  1. Create an intermediate table with all fields as text. Here is an example with three fields.
create table intermediate_t (field_a text, field_b text, field_c text);
  2. Populate the intermediate table using copy. You may need extra parameters such as encoding, null, etc. A foreign table as the intermediate is another viable option.
copy intermediate_t from 'c:\temp\test_data.txt'
with (
 format 'csv', delimiter ',', quote '"'
);

The sample data file c:\temp\test_data.txt is shown below. Note that there are no spaces after the delimiter commas: in CSV format, a quote only starts a quoted field when it immediately follows the delimiter.

Onion soup,12.30,false
Creme caramel,"10,50",true
Fish and chips,15.00,false
Creme brulee,"6,20",true
Mercimek corbasi,12.00,false
  3. Select from the intermediate table and format/cast the column values as needed. You have the full power of SQL at your disposal, no matter how complex the transformation might be.
select field_a, 
    case when field_b ~ ',' then replace(field_b,',','.') else field_b end::numeric field_b,
    field_c::boolean
from intermediate_t;

You may also create your target table or a view using the above query.

create table target_t as
select field_a, 
    case when field_b ~ ',' then replace(field_b,',','.') else field_b end::numeric field_b,
    field_c::boolean
from intermediate_t;  

field_a           field_b  field_c
Onion soup        12.30    false
Creme caramel     10.50    true
Fish and chips    15.00    false
Creme brulee      6.20     true
Mercimek corbasi  12.00    false
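
If you prefer not to materialize a second table, a view keeps the raw text in intermediate_t and exposes typed columns on the fly. The view name target_v below is just an example.

```sql
-- Sketch of the view option mentioned above (target_v is a hypothetical name):
create view target_v as
select field_a,
    case when field_b ~ ',' then replace(field_b,',','.') else field_b end::numeric field_b,
    field_c::boolean
from intermediate_t;
```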
