
This is my stock market csv data:

Date,Open,High,Low,Close,Adj Close,Volume
43283,511,514.950012,503.5,512.599976,512.599976,261839
43284,512.599976,520,509.700012,512,512,332619
43285,512,515.950012,507.950012,514.299988,514.299988,173621
43286,515.549988,517.5,509.399994,510.899994,510.899994,117474
43287,510.049988,516.5,510.049988,514.25,514.25,82106
43290,514.200012,528.5,514.200012,523.650024,523.650024,322861
43291,530,534.900024,522.099976,532.549988,532.549988,404132
43292,533.400024,541.75,531,536.599976,536.599976,267510
43293,539.450012,545,535.25,537.25,537.25,254942
43294,540,540.799988,520.5,523.900024,523.900024,240378
43297,524,529.75,518.549988,523.099976,523.099976,191192
43298,523,540,519.799988,538.049988,538.049988,213308
43299,542.349976,542.799988,515.849976,524.200012,524.200012,557333
43300,528,536.900024,518.849976,527.299988,527.299988,201716
43301,527.599976,536.450012,524.950012,534.450012,534.450012,156703
43304,534.5,544.950012,531.049988,540.799988,540.799988,209083
43305,542.950012,549,538.450012,546,546,216217
43306,547,547.5,529.450012,531.849976,531.849976,145508
43307,537,543.900024,527,541.650024,541.650024,547093
43308,545,555,538,553.650024,553.650024,540695
43311,555,570,551.099976,568.450012,568.450012,564010
43312,582,584.950012,548,550.099976,550.099976,942588
43313,552.450012,555.549988,538.650024,544.900024,544.900024,440881

I am trying to load this stock market data CSV file in a Jupyter notebook using

import numpy as np

np.loadtxt(r"C:\Users\Souro\Downloads\Data.csv",delimiter=",")

but it raises the following error when run:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-54-6552d575b229> in <module>
----> 1 np.loadtxt(r"C:\Users\Souro\Downloads\Data.csv",delimiter=",")

c:\python3.7.2\lib\site-packages\numpy\lib\npyio.py in loadtxt(fname, dtype, comments, delimiter, converters, skiprows, usecols, unpack, ndmin, encoding, max_rows)
   1139         # converting the data
   1140         X = None
-> 1141         for x in read_data(_loadtxt_chunksize):
   1142             if X is None:
   1143                 X = np.array(x, dtype)

c:\python3.7.2\lib\site-packages\numpy\lib\npyio.py in read_data(chunk_size)
   1066 
   1067             # Convert each value according to its column and store
-> 1068             items = [conv(val) for (conv, val) in zip(converters, vals)]
   1069 
   1070             # Then pack it according to the dtype's nesting

c:\python3.7.2\lib\site-packages\numpy\lib\npyio.py in <listcomp>(.0)
   1066 
   1067             # Convert each value according to its column and store
-> 1068             items = [conv(val) for (conv, val) in zip(converters, vals)]
   1069 
   1070             # Then pack it according to the dtype's nesting

c:\python3.7.2\lib\site-packages\numpy\lib\npyio.py in floatconv(x)
    773         if '0x' in x:
    774             return float.fromhex(x)
--> 775         return float(x)
    776 
    777     typ = dtype.type

ValueError: could not convert string to float: '"Date"'

How can I get rid of this error?

  • What you're seeing is a byte order marker in some kind of unicode format. I'd suggest trying the encoding parameter to np.loadtxt() - potential values would be "utf-8-sig" and "utf-16", I think. Commented May 13, 2020 at 17:13
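The commenter's suggestion can be sketched as below. This is a minimal sketch using an in-memory sample in place of the real file (the path in the comment is a stand-in); note that even with the right encoding, the header line itself still cannot be parsed as a float, so skiprows=1 is also needed:

```python
import io
import numpy as np

# A small in-memory sample standing in for the CSV file (header + two rows).
sample = (
    "Date,Open,High,Low,Close,Adj Close,Volume\n"
    "43283,511,514.950012,503.5,512.599976,512.599976,261839\n"
    "43284,512.599976,520,509.700012,512,512,332619\n"
)

# skiprows=1 skips the header line, which is what raises
# "could not convert string to float". On the real file this would be:
# np.loadtxt(r"C:\Users\Souro\Downloads\Data.csv", delimiter=",",
#            skiprows=1, encoding="utf-8-sig")
data = np.loadtxt(io.StringIO(sample), delimiter=",", skiprows=1)
print(data.shape)  # (2, 7)
```

The encoding="utf-8-sig" argument makes NumPy strip a UTF-8 byte order mark if one is present at the start of the file; it is harmless when no BOM is present.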

1 Answer


The problem might arise from meta-text at the top of the .csv or .txt file: text that is not part of the data itself but was copied in when the file's content was exported from somewhere.

When your data is not too large, I think it is better to first read the text into an array, then split it and load it into a DataFrame.

import csv

arrays = []
path = "C:\\Users\\Souro\\Downloads\\AXISBANK.csv"
with open(path, 'r') as f:
    reader = csv.reader(f)
    for row in reader:
        row = [cell.replace('\\', '') for cell in row]  # delete stray backslashes
        arrays.append(row)

Then take a look at arrays[:10] to find where the metadata ends, drop the unwanted rows, and convert arrays into a DataFrame. For instance:

import pandas as pd

arrays = arrays[9:]  # drop the metadata rows (adjust the index to your file)
df = pd.DataFrame(arrays[1:], columns=arrays[0])  # arrays[0] holds the column names
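One step worth adding: every cell read this way is still a string, so the numeric columns should be converted before doing any arithmetic. A small self-contained sketch (the rows here are stand-ins for the arrays built above):

```python
import pandas as pd

# Stand-in for `arrays` after the metadata rows have been dropped:
# a header row followed by data rows, every cell still a string.
rows = [
    ["Date", "Open", "Close"],
    ["43283", "511", "512.599976"],
    ["43284", "512.599976", "512"],
]
df = pd.DataFrame(rows[1:], columns=rows[0])

# Convert the string cells to numbers; errors="coerce" turns any
# leftover non-numeric cell into NaN instead of raising.
df = df.apply(pd.to_numeric, errors="coerce")
print(df.dtypes)
```

After this conversion, column operations such as df["Close"].mean() behave as expected.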

About your comments:

If you print each row, you will find a backslash at the end of it, so with

.replace('\\', '')

we substitute each backslash with the empty string (''), effectively deleting it. Why '\\'? In a normal string literal the backslash introduces an escape sequence (e.g. '\n' is a newline character), so a literal backslash has to be escaped as '\\'. Raw strings avoid most of this (r'a\b' works), but a raw string cannot end in a lone backslash: r'\' is a syntax error.
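A quick illustration of the escaping rules described above:

```python
# '\\' in a string literal is a single backslash character.
s = "a\\b"
print(len(s))               # 3, not 4: 'a', '\', 'b'
print(s.replace("\\", ""))  # 'ab'

# A raw string spells the same thing without doubling.
assert r"a\b" == "a\\b"
```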

And

open('text.txt', 'r')

opens the file text.txt in read-only mode (r).

  • Can you please explain the above code row = str(row).replace('\\', ' ') and what is with open('text.txt', 'r')? Commented Jul 1, 2019 at 7:22
  • Please rewrite the code in your answer, I am getting an error. My file path is "C:\\Users\\Souro\\Downloads\\AXISBANK.csv" Commented Jul 1, 2019 at 7:32
  • I answered your comments in my answer. Commented Jul 1, 2019 at 11:17
