
I was practicing importing stock market data from Google Finance into a pandas DataFrame:

import pandas as pd
from pandas import Series

path = 'http://www.google.com/finance/historical?cid=542029859096076&startdate=Sep+22%2C+2001&enddate=Sep+20%2C+2016&num=30&ei=3HvhV4n3D8XGmAGp4q74Ag&output=csv'
df = pd.read_csv(path)

So far so good: df shows the complete data set I need.

However, when I select a particular column, such as

df['Date']

Python raises the error below:

Traceback (most recent call last):

  File "<ipython-input-31-cb486dd31fbc>", line 1, in <module>
    df['Date']

  File "/Users/Username/anaconda/lib/python3.5/site-packages/pandas/core/frame.py", line 1997, in __getitem__
    return self._getitem_column(key)

  File "/Users/Username/anaconda/lib/python3.5/site-packages/pandas/core/frame.py", line 2004, in _getitem_column
    return self._get_item_cache(key)

  File "/Users/Username/anaconda/lib/python3.5/site-packages/pandas/core/generic.py", line 1350, in _get_item_cache
    values = self._data.get(item)

  File "/Users/Username/anaconda/lib/python3.5/site-packages/pandas/core/internals.py", line 3290, in get
    loc = self.items.get_loc(item)

  File "/Users/Username/anaconda/lib/python3.5/site-packages/pandas/indexes/base.py", line 1947, in get_loc
    return self._engine.get_loc(self._maybe_cast_indexer(key))

  File "pandas/index.pyx", line 137, in pandas.index.IndexEngine.get_loc (pandas/index.c:4154)

  File "pandas/index.pyx", line 159, in pandas.index.IndexEngine.get_loc (pandas/index.c:4018)

  File "pandas/hashtable.pyx", line 675, in pandas.hashtable.PyObjectHashTable.get_item (pandas/hashtable.c:12368)

  File "pandas/hashtable.pyx", line 683, in pandas.hashtable.PyObjectHashTable.get_item (pandas/hashtable.c:12322)

KeyError: 'Date'

On the other hand, other columns such as df['High'] work fine. Is there any way I can fix this issue?

  • It works fine when I try. Commented Sep 20, 2016 at 18:29
  • (Based on MaxU's answer, it probably works fine because I use Python 3.5). Commented Sep 20, 2016 at 18:33
  • @MaxU yes, it worked: imgur.com/tmRLNQu Commented Sep 20, 2016 at 18:38
  • @MaxU I also upgraded to 0.19.0rc1, could that be the reason? Commented Sep 20, 2016 at 18:40
  • @ayhan, thank you for the hint! I've added an excerpt from what's new 0.19.0 to my answer... Commented Sep 20, 2016 at 18:48

1 Answer

This CSV file starts with a BOM (byte order mark) signature, so try reading it this way:

df = pd.read_csv(path, encoding='utf-8-sig')
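To see why the encoding matters, here is a minimal sketch using only the standard library. The sample bytes are made up for illustration (they are not the real Google Finance download); the point is just that decoding with 'utf-8' keeps the BOM as the character U+FEFF, while 'utf-8-sig' strips it.

```python
import codecs

# Hypothetical sample standing in for the first bytes of a BOM-prefixed CSV file.
raw = codecs.BOM_UTF8 + b"Date,Open,High\n20-Sep-16,1.0,2.0\n"

plain = raw.decode("utf-8")      # BOM survives as the character U+FEFF
clean = raw.decode("utf-8-sig")  # BOM is removed during decoding

# Split the header row into column names for each decoding.
header_plain = plain.splitlines()[0].split(",")
header_clean = clean.splitlines()[0].split(",")

print(header_plain)  # ['\ufeffDate', 'Open', 'High']
print(header_clean)  # ['Date', 'Open', 'High']
```

With the plain 'utf-8' decoding, the first column name is '\ufeffDate' rather than 'Date', so a lookup by 'Date' cannot match.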

How to easily identify this problem (thanks to @jezrael's hint):

In [11]: print(df.columns.tolist())
['\ufeffDate', 'Open', 'High', 'Low', 'Close', 'Volume']

Pay attention to the first column: its name is '\ufeffDate', not 'Date', which is why df['Date'] raises a KeyError.
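If the data has already been loaded with the BOM stuck to the first column name, one possible workaround (not part of the original answer, just a sketch) is to strip the character from the column labels afterwards. The small DataFrame below is a hypothetical stand-in for the mis-parsed data.

```python
import pandas as pd

# Hypothetical frame mimicking a header that was parsed with the BOM attached.
df = pd.DataFrame({"\ufeffDate": ["20-Sep-16"], "High": [2.0]})

# Remove a leading U+FEFF from every column label.
df.columns = df.columns.str.lstrip("\ufeff")

# The column can now be selected by its expected name.
dates = df["Date"]
print(df.columns.tolist())
```

Re-reading the file with encoding='utf-8-sig' is still the cleaner fix, since it handles the problem at the source.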

NOTE: as @ayhan noticed, starting with version 0.19.0 pandas takes care of this automatically. From the 0.19.0 release notes:

Bug in pd.read_csv() which caused BOM files to be incorrectly parsed by not ignoring the BOM GH4793


2 Comments

Hey, thanks! It works fine this way. Can you explain a bit more about why it makes a difference, or point me to some sources about the BOM signature? Thanks again.
It is nicer and easier to see if you use print(df.columns.tolist()), +1
