I have been exploring the Titanic dataset. I am trying to create a DataFrame that holds the ages of the passengers who survived the sinking and of those who didn't, in two separate columns.

    import pandas as pd

    train = pd.read_csv('train.csv')
    test = pd.read_csv('test.csv')
    whole = pd.concat([train, test])
    df = pd.DataFrame({'survived': whole['Age'][whole['Survived'] == 1],
                       'died': whole['Age'][whole['Survived'] == 0]})

But I am getting this error:

    pandas.indexes.base.InvalidIndexError: Reindexing only valid with uniquely valued Index objects

What am I doing wrong?

  • It runs without an error on pandas 0.20.1. Commented May 28, 2017 at 18:00
  • Change whole = pd.concat([train, test]) to whole = pd.concat([train, test]).reset_index(drop=True). Commented May 28, 2017 at 18:00
  • @Nain Yes, it worked. Can you explain what the problem was? Commented May 28, 2017 at 18:07
  • @ayhan I was using pandas version 0.19.2. Upgrading to 0.20.1 did not work for me. Commented May 28, 2017 at 18:08

1 Answer

Make this change in your code:

    whole = pd.concat([train, test]).reset_index(drop=True)
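As a rough illustration of why resetting the index helps: pd.concat keeps each frame's original row labels, so the combined frame carries duplicate labels, and building a DataFrame from a dict of Series has to align those Series on their index. The sketch below uses two small made-up frames as hypothetical stand-ins for train.csv and test.csv (the values are invented, and the test frame is assumed to lack a Survived column). It does not try to reproduce the exact error, which, judging by the comments on the question, may depend on the pandas version.

```python
import pandas as pd

# Hypothetical stand-ins for train.csv and test.csv (values are made up).
train = pd.DataFrame({'Age': [22.0, 38.0, 26.0], 'Survived': [0, 1, 1]})
test = pd.DataFrame({'Age': [34.0, 47.0]})  # assumed to have no 'Survived' column

# Plain concat keeps the original row labels, so 0 and 1 each appear twice.
whole = pd.concat([train, test])
print(whole.index.tolist())
print(whole.index.is_unique)

# Resetting the index gives every row a unique label.
fixed = pd.concat([train, test]).reset_index(drop=True)
print(fixed.index.is_unique)

# With unique labels the two Series can be aligned into one frame.
df = pd.DataFrame({'survived': fixed['Age'][fixed['Survived'] == 1],
                   'died': fixed['Age'][fixed['Survived'] == 0]})
print(df)
```

Each row of df has a value in only one of the two columns and NaN in the other, because a given passenger label belongs to either the survived or the died selection.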

3 Comments

We can use pd.concat([train, test], ignore_index=True) instead.
@MaxU This works too. What happens when you set ignore_index to True?
pd.concat will create a new default index (equivalent to np.arange(len(concatenated_df))) for you, so it does not need to join the two existing indexes only to drop the result and create a new one.
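A minimal sketch comparing the two approaches, using small made-up frames (the names a and b and their values are hypothetical). It only checks that both calls produce the same frame with a fresh 0..n-1 index; it says nothing about their relative speed.

```python
import pandas as pd

# Hypothetical example frames; each starts with its own default index.
a = pd.DataFrame({'Age': [22.0, 38.0]})
b = pd.DataFrame({'Age': [34.0, 47.0, 62.0]})

# Let concat build a fresh default index directly.
via_ignore = pd.concat([a, b], ignore_index=True)

# Concatenate with the original labels, then discard them.
via_reset = pd.concat([a, b]).reset_index(drop=True)

print(via_ignore.index.tolist())
print(via_ignore.equals(via_reset))
```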
