Convert a numpy array of strings to an index array

Question

I have an array as follows:

strArray = np.array(['ab','abc','ab','bca','ab','m-2','bca'])

This example is a short array of short strings, but the real array and its strings are much longer, with many repetitions, and it takes up too much space.

Is there a function that takes this array and returns two outputs: a dictionary of the unique strings, and a copy of strArray in which each string is replaced by its integer identifier? For example:

keyArray, intArray = some_function(strArray)
print(keyArray) # output: { 0:'ab', 1:'abc', 2:'bca', 3:'m-2' }
print(intArray) # output: [ 0, 1, 0, 2, 0, 3, 2 ]

Alternatively, I would settle for intArray alone, so that I have a smaller array that is easier to work with. The original strings would be useful, but not at the expense of size, speed, or ease of use.

Divakar · Accepted Answer · 2019-11-15 07:21:27Z

5

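We can use np.unique with the return_inverse=True argument, which also returns, for each element, its index into the sorted array of unique values: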

In [16]: unq,tags = np.unique(strArray, return_inverse=True)

In [17]: dict(zip(range(len(unq)),unq))
Out[17]: {0: 'ab', 1: 'abc', 2: 'bca', 3: 'm-2'}

In [18]: tags
Out[18]: array([0, 1, 0, 2, 0, 3, 2])
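
For convenience, the same call could be wrapped in a function with the shape the question asks for. This is only a minimal sketch: the name encode_strings is hypothetical (it is not a NumPy function), and the integer codes follow the sorted order of the unique strings, which matches order of first appearance in this example but will not in general.

```python
import numpy as np

def encode_strings(str_array):
    """Hypothetical helper: return (key dict, integer codes) for a string array."""
    # np.unique returns the sorted unique values; return_inverse=True also
    # returns, for each input element, its index into that sorted array.
    unq, tags = np.unique(str_array, return_inverse=True)
    # Map each integer code to the string it stands for.
    key_dict = dict(enumerate(unq))
    return key_dict, tags

strArray = np.array(['ab', 'abc', 'ab', 'bca', 'ab', 'm-2', 'bca'])
keyArray, intArray = encode_strings(strArray)

# Indexing the unique values with the codes rebuilds the original array.
unq = np.array(list(keyArray.values()))
restored = unq[intArray]
```

Keeping unq as an array rather than a dictionary makes the reverse lookup a single indexing operation (unq[intArray]), so the dictionary is mainly useful for display.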


1 Comment

AOK Over a year ago

That's perfect. Thank you
