python, pandas: InvalidIndexError when creating dataframe

Question

I have been exploring the Titanic dataset. I am trying to create a DataFrame that holds the ages of the passengers who survived the sinking and of those who didn't, in two separate columns.

    import pandas as pd

    train = pd.read_csv('train.csv')
    test = pd.read_csv('test.csv')
    whole = pd.concat([train, test])
    # One column of ages per outcome
    df = pd.DataFrame({'survived': whole['Age'][whole['Survived'] == 1],
                       'died': whole['Age'][whole['Survived'] == 0]})

But I am getting this error:

    pandas.indexes.base.InvalidIndexError: Reindexing only valid with uniquely valued Index objects

What am I doing wrong?

Change whole = pd.concat([train, test]) to whole = pd.concat([train, test]).reset_index(drop=True).
– enterML, Commented May 28, 2017 at 18:00
@ayhan I was using pandas version 0.19.2. Upgrading to 0.20.1 did not fix it for me.
– Sounak, Commented May 28, 2017 at 18:08

enterML · Accepted Answer · 2017-05-28 18:23:43Z

3

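Reset the index after concatenating: whole = pd.concat([train, test]).reset_index(drop=True)

pd.concat keeps the row labels of both train and test, so the combined index contains duplicate labels (0, 1, 2, ... each appear twice). That non-unique index is what the error message is complaining about. reset_index(drop=True) discards the old labels and gives every row a fresh, unique one.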
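A minimal sketch of the effect, assuming two tiny made-up frames in place of train.csv and test.csv (the values and the Survived column in the second frame are illustrative only, not the real data):

```python
import pandas as pd

# Hypothetical stand-ins for train.csv and test.csv (values are made up).
train = pd.DataFrame({"Age": [22.0, 38.0, 26.0], "Survived": [0, 1, 1]})
test = pd.DataFrame({"Age": [34.5, 47.0], "Survived": [0, 1]})

# Plain concat keeps each frame's own row labels, so 0 and 1 appear twice.
dup = pd.concat([train, test])
print(dup.index.tolist())   # [0, 1, 2, 0, 1]
print(dup.index.is_unique)  # False

# Dropping the old labels gives every row a unique label again.
whole = pd.concat([train, test]).reset_index(drop=True)
print(whole.index.is_unique)  # True

# With a unique index, the two Age selections align without trouble;
# rows missing from one column are filled with NaN.
df = pd.DataFrame({"survived": whole["Age"][whole["Survived"] == 1],
                   "died": whole["Age"][whole["Survived"] == 0]})
print(df)
```

Checking index.is_unique right after a concat is a cheap way to spot this problem before it surfaces as a reindexing error further down.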


3 Comments

MaxU - stand with Ukraine Over a year ago

we can use: pd.concat([train, test], ignore_index=True) instead ;)

Sounak Over a year ago

@MaxU This works too. What happens when you set ignore_index to True?

MaxU - stand with Ukraine Over a year ago

pd.concat will create a new default index (np.arange(len(concatenated_df))) for you, so it does not need to join the two existing indexes only to drop the result and create a new one.
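To illustrate the comment above, here is a small sketch comparing the two spellings. The two frames are hypothetical stand-ins with made-up ages, not the real Titanic files:

```python
import pandas as pd

# Hypothetical stand-ins for train.csv and test.csv (values are made up).
train = pd.DataFrame({"Age": [22.0, 38.0, 26.0]})
test = pd.DataFrame({"Age": [34.5, 47.0]})

# Build the default 0..n-1 index directly during concatenation.
a = pd.concat([train, test], ignore_index=True)

# Concatenate with the old labels first, then throw them away.
b = pd.concat([train, test]).reset_index(drop=True)

print(a.index.tolist())  # [0, 1, 2, 3, 4]
print(a.equals(b))       # True
```

Both approaches end up with the same frame; ignore_index=True just skips the intermediate index.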
