
I started studying SQL a month ago. I'm trying to import a .CSV file, but I have a problem.

In that file, some numbers are written in quotes and use a comma as the decimal separator.

In Excel, if I try to change the commas to points, I break the format and the data; if instead I try to change the commas to points directly in the .CSV, it causes problems because the comma is also the field delimiter.

Can someone help me?

I would like to understand whether it is advisable to set all the table's columns to a text datatype and then convert the data I need, or whether it is better to do it in Excel and, if so, how.

  • If you can ask the person providing the file to add text qualifiers to each field, that will solve most of this. That means putting double quotes (or some other text value) around each value to separate the columns, so you don't have this issue. It could look like this: "COL1", "COL2", "COL 3 WITH , CommaInIt", "Col4", and the comma between the double quotes would not cause issues.
    – Brad
    Commented Mar 4 at 17:33
  • I got this file from Kaggle, just to practice SQL. Commented Mar 4 at 17:43
  • See my answer here stackoverflow.com/questions/78022060/… for one suggestion. In that case the final result was going to be a timestamp; you can ignore that part. The important part is setting lc_numeric to a locale that recognizes , as the decimal separator and then using to_number(). Commented Mar 4 at 17:55
  • You may need to look at the regional settings on the PC; the decimal character is set there. stackoverflow.com/questions/11421260/csv-decimal-dot-in-excel Also, be aware that only so many rows are analyzed to determine the data type of a column. If you have an alphanumeric column with all numbers at the beginning, you may have blank cells in the column later on.
    – Jan
    Commented Mar 4 at 19:26
  • Please provide enough code so others can better understand or reproduce the problem.
    – Community Bot
    Commented Mar 6 at 2:33
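
The lc_numeric + to_number() suggestion from the comments can be sketched as follows. The locale name de_DE.utf8 is an assumption here; use any comma-decimal locale that is actually installed on your server.

```sql
-- Sketch of the lc_numeric / to_number() approach (assumes a comma-decimal
-- locale such as de_DE.utf8 is available; adjust to your system):
set lc_numeric = 'de_DE.utf8';

-- The D template pattern follows lc_numeric, so ',' is read as the
-- decimal separator and '10,50' is converted to the numeric 10.50.
select to_number('10,50', '99999D99');
```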

1 Answer


My suggestion is to use an intermediate table. I have been using this approach for some time and, although somewhat verbose, it works well. So here it is.

  1. Create an intermediate table with all fields as text. Here is an example with three fields.
create table intermediate_t (field_a text, field_b text, field_c text);
  2. Populate the intermediate table using copy. You may need extra parameters such as encoding, null, etc. A foreign table as the intermediate is another viable option.
copy intermediate_t from 'c:\temp\test_data.txt'
with (
 format 'csv', delimiter ',', quote '"'
);

The sample data file c:\temp\test_data.txt is shown below. Note that there are no spaces after the delimiter commas: in CSV format, a quote only starts a quoted field when it immediately follows the delimiter.

Onion soup,12.30,false
Creme caramel,"10,50",true
Fish and chips,15.00,false
Creme brulee,"6,20",true
Mercimek corbasi,12.00,false
  3. Select from the intermediate table and format/cast the column values as needed. You have the full power of SQL at your disposal, no matter how complex the transformation might be.
select field_a, 
    case when field_b ~ ',' then replace(field_b,',','.') else field_b end::numeric field_b,
    field_c::boolean
from intermediate_t;

You may also create your target table or a view using the above query.

create table target_t as
select field_a, 
    case when field_b ~ ',' then replace(field_b,',','.') else field_b end::numeric field_b,
    field_c::boolean
from intermediate_t;  

field_a           field_b  field_c
Onion soup        12.30    false
Creme caramel     10.50    true
Fish and chips    15.00    false
Creme brulee      6.20     true
Mercimek corbasi  12.00    false
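
If you prefer not to materialize a second table, a view keeps the raw text in intermediate_t and exposes typed columns on the fly. The view name target_v below is just an example.

```sql
-- Sketch of the view option mentioned above (target_v is a hypothetical name):
create view target_v as
select field_a,
    case when field_b ~ ',' then replace(field_b,',','.') else field_b end::numeric field_b,
    field_c::boolean
from intermediate_t;
```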
