Pandas throwing "Error tokenizing data. C error" while loading data sets from URL

Question

I am working on the Titanic competition to get hands-on experience with data science and machine learning. I tried to load the datasets from GitHub, but pandas threw the following error:

ParserError: Error tokenizing data. C error: Expected 1 fields in line 32, saw 2

Following the advice of other SO users, I added the skiprows=1 parameter to my pd.read_csv() calls to skip the first row, but it didn't work.

import pandas as pd

train_dataset = pd.read_csv("https://github.com/oo92/titanic-files/blob/master/train.csv", skiprows=1)
test_dataset = pd.read_csv("https://github.com/oo92/titanic-files/blob/master/test.csv", skiprows=1)
ground_truths = pd.read_csv("https://github.com/oo92/titanic-files/blob/master/gender_submission.csv", skiprows=1)

train_dataset.head()

Danny · Accepted Answer · 2019-05-09 14:49:37Z

The URL you are passing points to a GitHub repository page, which is an HTML webpage and does not return CSV data. Click the 'Raw' button on GitHub and pass the resulting URL instead, which in your case is:

test = pd.read_csv('https://raw.githubusercontent.com/oo92/Titanic-Kaggle/master/test.csv')
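If you would rather not click through to 'Raw' for every file, the rewrite can be done in code. The following is a minimal sketch, not part of the original answer: to_raw_url and read_github_csv are hypothetical helper names, and it assumes the usual github.com/<owner>/<repo>/blob/<branch>/<path> layout. It does not check whether the repository actually exists under that name.

```python
from urllib.parse import urlsplit, urlunsplit


def to_raw_url(blob_url: str) -> str:
    """Turn a github.com '/blob/' page URL into its raw.githubusercontent.com form.

    Hypothetical helper: URLs that do not match the expected layout are
    returned unchanged.
    """
    parts = urlsplit(blob_url)
    segments = parts.path.strip("/").split("/")
    # Expected layout: <owner>/<repo>/blob/<branch>/<path to file>
    if parts.netloc != "github.com" or len(segments) < 5 or segments[2] != "blob":
        return blob_url
    owner, repo, _blob, *rest = segments
    raw_path = "/" + "/".join([owner, repo, *rest])
    return urlunsplit((parts.scheme, "raw.githubusercontent.com", raw_path, "", ""))


def read_github_csv(blob_url: str):
    """Load a CSV from a GitHub page URL by first converting it to the raw URL."""
    import pandas as pd  # imported here so to_raw_url works without pandas installed

    # No skiprows is needed: the raw URL returns the CSV text itself,
    # so the header row is parsed normally.
    return pd.read_csv(to_raw_url(blob_url))
```

With this in place, read_github_csv("https://github.com/oo92/titanic-files/blob/master/train.csv") would request the raw file rather than the HTML page, provided the repository is reachable at that address.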

Thank you. But Spyder doesn't let me print the head with train_dataset.head(). I have to explicitly place it inside a print() call.

– Andros Adrianopolos, commented May 10, 2019 at 1:26
