8

I have a NumPy array:

z = np.array(['Iris-setosa', 'Iris-setosa', 'Iris-setosa', 'Iris-setosa', 'Iris-versicolor', 'Iris-versicolor', 'Iris-versicolor', 'Iris-virginica', 'Iris-virginica', 'Iris-virginica'])

I want to replace the labels as follows

Iris-setosa - 0
Iris-versicolor - 1
Iris-virginica - 2

so that I can apply logistic regression.

The final output should look like

z = [0, 0, ..., 1, 1, ..., 2, 2, ...]

Is there a simple way to do this without iterating through the array and replacing each value one by one?

3
  • 1
    Not exactly what you want, but maybe another idea: pd.Series(z, dtype="category"), see pandas.pydata.org/pandas-docs/stable/categorical.html Commented Feb 18, 2018 at 15:00
  • Your example is ambiguous. Are the strings supposed to be numbered in order of appearance or substituted with a given value? Commented Feb 18, 2018 at 15:15
  • The fact that you want to subsequently apply logistic regression does not make this a machine-learning question; please do not spam the tag (removed) Commented Feb 18, 2018 at 23:22

4 Answers 4

14

Use factorize:

import pandas as pd

a = pd.factorize(z)[0].tolist()
print(a)
# [0, 0, 0, 0, 1, 1, 1, 2, 2, 2]

Or numpy.unique:

import numpy as np

a = np.unique(z, return_inverse=True)[1].tolist()
print(a)
# [0, 0, 0, 0, 1, 1, 1, 2, 2, 2]
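
One difference worth keeping in mind: np.unique numbers the labels in sorted order, while pd.factorize (with its default sort=False) numbers them in order of first appearance. For the array in the question both give the same result, because the labels already appear in alphabetical order. A minimal sketch with a shuffled, made-up input (not the asker's data) shows the np.unique behaviour and how to keep the label array around for decoding:

```python
import numpy as np

# Hypothetical shuffled input, used only to illustrate the ordering.
z = np.array(['Iris-virginica', 'Iris-setosa', 'Iris-versicolor', 'Iris-setosa'])

# labels: sorted unique values; codes: position of each element of z in labels.
labels, codes = np.unique(z, return_inverse=True)
print(labels.tolist())  # ['Iris-setosa', 'Iris-versicolor', 'Iris-virginica']
print(codes.tolist())   # [2, 0, 1, 0]

# Indexing labels with codes maps the integers back to the original strings.
decoded = labels[codes]
print(decoded.tolist() == z.tolist())  # True
```

Keeping labels is useful later, for example to translate predicted class indices back into species names.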

1 Comment

@Sanjay - Glad I could help!
11

You can use a dictionary:

my_dict = {'Iris-setosa': 0, 'Iris-versicolor': 1, 'Iris-virginica': 2}

Then use a list comprehension:

z = [my_dict[zi] for zi in z]
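
If the result is needed as a NumPy array again (for example, to pass to a model), the comprehension can be wrapped in np.array. This is a small sketch assuming z is the array from the question; the name encoded is only illustrative:

```python
import numpy as np

z = np.array(['Iris-setosa', 'Iris-setosa', 'Iris-setosa', 'Iris-setosa',
              'Iris-versicolor', 'Iris-versicolor', 'Iris-versicolor',
              'Iris-virginica', 'Iris-virginica', 'Iris-virginica'])

my_dict = {'Iris-setosa': 0, 'Iris-versicolor': 1, 'Iris-virginica': 2}

# Look up each label in the dictionary; a label missing from my_dict
# raises KeyError, which makes unexpected values easy to spot.
encoded = np.array([my_dict[label] for label in z])
print(encoded.tolist())  # [0, 0, 0, 0, 1, 1, 1, 2, 2, 2]
```

The explicit dictionary gives full control over which integer each label receives, at the cost of having to list every label by hand.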

2 Comments

That really helped. I needed to convert it from a NumPy array to a list before doing the operation.
This syntactic sugar is useful for me right now.
0

Are you trying to count the number of occurrences of each label, since you are doing logistic regression?

If so, you can use the following as well.

import collections
z = ['Iris-setosa', 'Iris-setosa', 'Iris-setosa', 'Iris-setosa', 'Iris-versicolor', 'Iris-versicolor', 'Iris-versicolor', 'Iris-virginica', 'Iris-virginica', 'Iris-virginica']
print(collections.Counter(z))

It prints:

Counter({'Iris-setosa': 4, 'Iris-versicolor': 3, 'Iris-virginica': 3})

If you want to print it in a different format, you can do the following:

import collections
z = ['Iris-setosa', 'Iris-setosa', 'Iris-setosa', 'Iris-setosa', 'Iris-versicolor', 'Iris-versicolor', 'Iris-versicolor', 'Iris-virginica', 'Iris-virginica', 'Iris-virginica']
counts = collections.Counter(z)  # count once instead of once per label
for item, count in counts.items():
    print(item, count)

The output will be

Iris-setosa 4
Iris-versicolor 3
Iris-virginica 3

Comments

-1
[list(set(z)).index(val) for val in z]

Simply put: build a set from z to get only the unique values, convert that set to a list so it can be indexed, then use a list comprehension to build the final list. If you have a very large list of strings, I would suggest assigning list(set(z)) to a variable outside the list comprehension.
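
Note that a set has no guaranteed order, so the integers produced this way do not necessarily follow the order of appearance or alphabetical order, and they may differ between runs. A possible variation, sketched here with illustrative names (labels, index_of), sorts the unique values first and builds a lookup dictionary so that each label is located in constant time:

```python
z = ['Iris-setosa', 'Iris-setosa', 'Iris-setosa', 'Iris-setosa',
     'Iris-versicolor', 'Iris-versicolor', 'Iris-versicolor',
     'Iris-virginica', 'Iris-virginica', 'Iris-virginica']

# Sorting the unique values makes the numbering deterministic.
labels = sorted(set(z))

# Map each label to its position, avoiding a list.index() scan per element.
index_of = {label: i for i, label in enumerate(labels)}

encoded = [index_of[val] for val in z]
print(encoded)  # [0, 0, 0, 0, 1, 1, 1, 2, 2, 2]
```

With sorted labels, Iris-setosa is always 0, Iris-versicolor 1, and Iris-virginica 2.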

2 Comments

I got the output as [2, 2, 2, 2, 0, 0, 0, 1, 1, 1, 2], but shouldn't Iris-setosa be set to 0?
How about this: [list(np.unique(z)).index(val) for val in z]
