I don't understand some code from Kaggle's solution.
Here is an example of the data:
PassengerId,Survived,Pclass,Name,Sex,Age,SibSp,Parch,Ticket,Fare,Cabin,Embarked
1,0,3,"Braund, Mr. Owen Harris",male,22,1,0,A/5 21171,7.25,,S
2,1,1,"Cumings, Mrs. John Bradley (Florence Briggs Thayer)",female,38,1,0,PC 17599,71.2833,C85,C
3,1,3,"Heikkinen, Miss. Laina",female,26,0,0,STON/O2. 3101282,7.925,,S
The goal is to extract an array containing only the women, and they do it like this:
# data contains all the passengers
women_only_stats = data[0::,4] == "female"
females_data = data[women_only_stats]
print(females_data[0]) # Prints the first row of the women-only subset.
I understand that women_only_stats will be an array of True and False values, which is the result of evaluating the expression data[0::,4] == "female".
What I do not understand is why data[women_only_stats] is an array of only women. How does numpy evaluate that?
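To make the question concrete, here is a minimal sketch that reproduces the behavior with just the three sample rows above. How `data` gets built here (`csv.reader` on an in-memory string, then `np.array`) is my own assumption for illustration and may not match what the tutorial does exactly.

```python
import csv
import io

import numpy as np

# The three sample rows from above (header omitted).
raw = '''1,0,3,"Braund, Mr. Owen Harris",male,22,1,0,A/5 21171,7.25,,S
2,1,1,"Cumings, Mrs. John Bradley (Florence Briggs Thayer)",female,38,1,0,PC 17599,71.2833,C85,C
3,1,3,"Heikkinen, Miss. Laina",female,26,0,0,STON/O2. 3101282,7.925,,S
'''

# Assumption: data is a 2-D NumPy array of strings, one row per passenger.
data = np.array(list(csv.reader(io.StringIO(raw))))

# Column 4 is Sex; comparing it to a string gives one boolean per row.
women_only_stats = data[0::, 4] == "female"
print(women_only_stats)        # [False  True  True]

# Indexing with the boolean array keeps only the rows where it is True.
females_data = data[women_only_stats]
print(females_data.shape)      # (2, 12)
print(females_data[0][3])      # Cumings, Mrs. John Bradley (Florence Briggs Thayer)
```

So the boolean array seems to act as a row-by-row filter rather than as a list of integer positions, and that is the part I would like explained.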