Replace a string numpy array with a number

Question

I have a numpy array

z = np.array(['Iris-setosa', 'Iris-setosa', 'Iris-setosa', 'Iris-setosa', 'Iris-versicolor', 'Iris-versicolor', 'Iris-versicolor', 'Iris-virginica', 'Iris-virginica', 'Iris-virginica'])

I want to replace

Iris-setosa - 0
Iris-versicolor - 1
Iris-virginica - 2

to apply logistic regression.

The final output should look like

z = [0, 0, ..., 1, 1, ..., 2, 2, ...]

Is there a simple way to do this without iterating through the array and calling replace on each element?

Not exactly what you want, but maybe another idea: pd.Series(z, dtype="category"), see pandas.pydata.org/pandas-docs/stable/categorical.html
– stephan, Commented Feb 18, 2018 at 15:00
Your example is ambiguous. Are the strings supposed to be numbered in order of appearance or substituted with a given value?
– Mr. T, Commented Feb 18, 2018 at 15:15
The fact that you want to subsequently apply logistic regression does not make this a machine-learning question; please do not spam the tag (removed)
– desertnaut, Commented Feb 18, 2018 at 23:22

jezrael · Accepted Answer · 2018-02-18 15:09:38Z

14

Use pandas.factorize, which numbers the labels in order of first appearance:

import pandas as pd

a = pd.factorize(z)[0].tolist()
print(a)
[0, 0, 0, 0, 1, 1, 1, 2, 2, 2]

Or numpy.unique with return_inverse=True, which numbers the labels in sorted order:

import numpy as np

a = np.unique(z, return_inverse=True)[1].tolist()
print(a)
[0, 0, 0, 0, 1, 1, 1, 2, 2, 2]
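
Both calls return the same codes for the array in the question because its labels already appear in alphabetical order. As a minimal sketch of where they diverge, the snippet below uses a hypothetical shuffled array named labels (an assumption for illustration, not data from the question):

```python
import numpy as np
import pandas as pd

# Hypothetical input: first-seen order differs from alphabetical order.
labels = np.array(['Iris-virginica', 'Iris-setosa', 'Iris-versicolor', 'Iris-setosa'])

# factorize: codes follow the order in which each label first appears.
codes_seen, uniques_seen = pd.factorize(labels)

# unique with return_inverse: codes follow the sorted order of the labels.
uniques_sorted, codes_sorted = np.unique(labels, return_inverse=True)

print(codes_seen.tolist())    # [0, 1, 2, 1]
print(codes_sorted.tolist())  # [2, 0, 1, 0]

# Indexing the unique labels with the codes recovers the original strings.
restored = uniques_sorted[codes_sorted]
print(restored.tolist())
```

If a specific label must map to a specific number regardless of order, an explicit dictionary is the safer choice.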

edited Feb 18, 2018 at 15:09

answered Feb 18, 2018 at 14:55

jezrael


1 Comment

jezrael Over a year ago

@Sanjay - Glad can help!

Jean-François Fabre · Accepted Answer · 2018-02-18 15:08:13Z

11

You can use a dictionary:

my_dict = {'Iris-setosa': 0, 'Iris-versicolor': 1, 'Iris-virginica': 2}

Then use a list comprehension:

z = [my_dict[zi] for zi in z]
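
When the input is a NumPy array and an array is wanted back, the same lookup might be sketched as follows. The shortened z and the name encoded are assumptions for illustration; the array is converted to plain Python strings with tolist() before the lookup.

```python
import numpy as np

# Shortened, illustrative version of the array from the question.
z = np.array(['Iris-setosa', 'Iris-setosa', 'Iris-versicolor', 'Iris-virginica'])
my_dict = {'Iris-setosa': 0, 'Iris-versicolor': 1, 'Iris-virginica': 2}

# tolist() yields plain Python str values; a label missing from my_dict
# raises KeyError, which makes unexpected classes visible early.
encoded = np.array([my_dict[zi] for zi in z.tolist()])

print(encoded.tolist())  # [0, 0, 1, 2]
```

Unlike the automatic encoders, this keeps full control over which number each class receives.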

edited Feb 18, 2018 at 15:08

Jean-François Fabre♦


answered Feb 18, 2018 at 14:49

shayelk


2 Comments

Sanjay Over a year ago

That really helped. I need to convert it from numpy array to list before doing the operation.

exaulo Over a year ago

this syntactic sugar is useful right now for me

Cyber Square Professional · Accepted Answer · 2018-02-18 15:12:19Z

Are you trying to count the number of occurrences of each label because you are doing logistic regression?

If so, you can also use the following.

import collections
z = ['Iris-setosa', 'Iris-setosa', 'Iris-setosa', 'Iris-setosa','Iris-versicolor', 'Iris-versicolor', 'Iris-versicolor','Iris-virginica', 'Iris-virginica', 'Iris-virginica']
print (collections.Counter(z))

It will print as below:

Counter({'Iris-setosa': 4, 'Iris-versicolor': 3, 'Iris-virginica': 3})

If you want to print in another way, you can do the following:

import collections
z = ['Iris-setosa', 'Iris-setosa', 'Iris-setosa', 'Iris-setosa','Iris-versicolor', 'Iris-versicolor', 'Iris-versicolor','Iris-virginica', 'Iris-virginica', 'Iris-virginica']
# Build the Counter once and iterate over its (label, count) pairs
for item, count in collections.Counter(z).items():
    print(item, count)

The output will be

Iris-setosa 4
Iris-versicolor 3
Iris-virginica 3

Louis Barto · Accepted Answer · 2018-02-18 14:58:21Z

-1

[list(set(z)).index(val) for val in z]

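Simply put: build a set from z to get only the unique values, convert that set to a list so it can be indexed, then use a list comprehension to produce the final list. Note that a set has no guaranteed order, so the numbers assigned this way will not necessarily follow the order of appearance or alphabetical order. If you have a very large list of strings, assign list(set(z)) to a variable outside the list comprehension so it is not rebuilt for every element.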
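
A reproducible variant of this idea could sort the unique values and build the lookup once. This is a sketch under stated assumptions, not the answer's original code; the shortened z and the names label_to_index and codes are introduced here for illustration.

```python
# Shortened, illustrative version of the list from the question.
z = ['Iris-setosa', 'Iris-setosa', 'Iris-versicolor', 'Iris-virginica', 'Iris-virginica']

# Sorting the unique labels makes the numbering reproducible; the dictionary
# is built once, so each element costs one lookup instead of a list.index scan.
label_to_index = {label: i for i, label in enumerate(sorted(set(z)))}

codes = [label_to_index[val] for val in z]
print(codes)  # [0, 0, 1, 2, 2]
```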

answered Feb 18, 2018 at 14:58

Louis Barto


2 Comments

Sruthi Over a year ago

I got the output as [2, 2, 2, 2, 0, 0, 0, 1, 1, 1, 2], but shouldn't Iris-setosa be set to 0?

Louis Barto Over a year ago

How about this: [list(np.unique(z)).index(val) for val in z]
