
In a CSV file I have data representing the date, open, close, high, low, and volume for a particular stock. The data is stored in the following format:

20150601 000000;1.094990;1.095010;1.094990;1.094990;0

I am attempting to use the following code to extract the data into a numpy array so I can analyze it using algorithms. However, when converting the date I do not get the correct date.

Can anyone identify the error that I am making?

datefunc = lambda x: mdates.date2num(datetime.strptime(x, '%y%m%d%H%M %f'))
date,high,low,open,close,volume = np.loadtxt('DAT_ASCII_EURUSD_M1_201506.csv',unpack=True, 
                              delimiter=';',
                              converters={0:datefunc})

Any help is much appreciated.

  • Is your sample line incorrect? Also what is mdates.date2num? Commented Jun 30, 2015 at 21:53
  • I suspect he has done import matplotlib.dates as mdates. Commented Jun 30, 2015 at 21:59
  • Your date format is also incorrect. Commented Jun 30, 2015 at 21:59
  • What would be the correct date format? Commented Jun 30, 2015 at 22:00
  • It would be '%Y%m%d', but you cannot have datetimes and floats in the same array. I think pandas would be pretty useful. Commented Jun 30, 2015 at 22:01

1 Answer


Your date format is incorrect: it needs to be year, month and day, "%Y%m%d". You also cannot have datetime objects and floats in the same plain array, but a structured array allows you to have mixed types.

If mdates.date2num returns a float, using the correct format should work, provided your lines are ;-delimited:

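from datetime import datetime
import matplotlib.dates as mdates
import numpy as np

# Convert the date string to matplotlib's float date representation.
datefunc = lambda x: mdates.date2num(datetime.strptime(x, '%Y%m%d'))

a = np.loadtxt('in.csv', delimiter=';',
               converters={0: datefunc})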

Which would output:

[  7.35750000e+05   0.00000000e+00   1.09499000e+00   1.09501000e+00
1.09499000e+00   1.09499000e+00   0.00000000e+00]

Your example input line has seven elements, so unpacking it into six names will raise an error. If that is a typo it will be fine, but if not you will need to fix it.
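To see the unpacking rule in isolation, here is a minimal sketch on hypothetical in-memory data (not the original file). With unpack=True, loadtxt returns one array per column, so the number of names on the left has to match the number of columns exactly.

```python
import io
import numpy as np

# Hypothetical two-row sample with six ';'-separated numeric columns.
sample = "1;1.1;1.2;1.3;1.4;0\n2;2.1;2.2;2.3;2.4;0\n"

# unpack=True transposes the result: one array per column.
columns = np.loadtxt(io.StringIO(sample), delimiter=";", unpack=True)
print(columns.shape)  # (number of columns, number of rows)

# Six names for six columns: this works.
date, high, low, open_, close, volume = columns

# Five names for six columns: this raises ValueError.
try:
    a, b, c, d, e = columns
except ValueError as exc:
    print("unpack failed:", exc)
```

The same ValueError appears in the other direction, when the file has fewer columns than there are names.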

If you have mixed types you could use a structured array with genfromtxt:

from datetime import datetime
import numpy as np
datefunc = lambda x: datetime.strptime(x, '%Y%m%d')
a = np.genfromtxt('in.csv', delimiter=';',
                  converters={0: datefunc},
                  dtype='object, float, float, float, float, float',
                  names=["date", "high", "low", "open", "close", "volume"])

print(a["date"])
print(a["high"])
print(a["low"])
print(a["open"])
print(a["close"])
print(a["volume"])

2015-06-01 00:00:00
0.0
1.09499
1.09501
1.09499
1.09499

This presumes your input is actually delimited by ; and does not have spaces like you have in your sample line.
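If the first field really does contain a space, as in the question's sample line, one option is to parse each line by hand and build the structured array directly. The following is only a sketch: it assumes the six digits after the space are a clock time (HHMMSS), reuses the field names from above, and stores the timestamp as datetime64[s] rather than as a Python object. The parse_row helper and record_dtype name are made up for illustration.

```python
from datetime import datetime
import numpy as np

# Sample rows in the question's format; a real script would read them from the file.
rows = [
    "20150601 000000;1.094990;1.095010;1.094990;1.094990;0",
    "20150601 000100;1.094990;1.094990;1.094920;1.094940;0",
]

# Field names follow the answer; datetime64[s] keeps one-second resolution.
record_dtype = [
    ("date", "datetime64[s]"),
    ("high", "f8"), ("low", "f8"), ("open", "f8"), ("close", "f8"),
    ("volume", "f8"),
]

def parse_row(line):
    """Split one ';'-delimited line into a tuple matching record_dtype."""
    first, *rest = line.strip().split(";")
    # Assumption: the six digits after the space are HHMMSS.
    stamp = datetime.strptime(first, "%Y%m%d %H%M%S")
    return (np.datetime64(stamp, "s"), *map(float, rest))

a = np.array([parse_row(line) for line in rows], dtype=record_dtype)

print(a["date"])
print(a["high"])
```

Each column is still available by name, and the date column supports ordinary comparisons and differences between timestamps.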


6 Comments

Thank you for correcting my format and for fixing the mixed data type issue. However, when I use this method of conversion I get the following error: names=["date", "high", "low", "open", "close", "volume"]) TypeError: loadtxt() got an unexpected keyword argument 'names'
@Jerryberry123, you need to use genfromtxt for that, my mistake
Thank you! However the date is formatted as year month day followed by a space then the millisecond value {20150601 000000;1.094990;1.095010;1.094990;1.094990;0} {20150601 000100;1.094990;1.094990;1.094920;1.094940;0} {20150601 000200;1.094940;1.095060;1.094890;1.095050;0} {20150601 000300;1.095090;1.095130;1.095050;1.095060;0}
Ah ok now it all makes sense, change to '%Y%m%d %f'
Thanks all working now. Although the data is represented as datetime.datetime(2015, 6, 1, 0, 0, 0, 100). Should it be in that format or in the 2015-06-01 000000 format?
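A note on the 100 in datetime.datetime(2015, 6, 1, 0, 0, 0, 100): %f is the microsecond directive, so "000100" is read as 100 microseconds. If the six digits are meant as a clock time (HHMMSS, which is an assumption about this data file), %H%M%S would be the matching directives. A small sketch comparing the two:

```python
from datetime import datetime

stamp = "20150601 000100"  # first field of the second sample row

# %f reads the six digits as microseconds, so "000100" becomes 100 microseconds.
as_fraction = datetime.strptime(stamp, "%Y%m%d %f")

# Assumption: the six digits are HHMMSS, i.e. 00:01:00.
as_clock = datetime.strptime(stamp, "%Y%m%d %H%M%S")

print(repr(as_fraction))
print(repr(as_clock))

# The repr is only how Python displays the object; strftime controls the text form.
print(as_clock.strftime("%Y-%m-%d %H%M%S"))
```

The datetime.datetime(...) form is just the repr of the object, so nothing needs converting for analysis; format it with strftime only when a particular text layout is wanted.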
