I have an array as follows:

strArray = np.array(['ab','abc','ab','bca','ab','m-2','bca'])

This example uses a short array of short strings, but in practice both the strings and the array are much longer, with many repetitions, and take up too much space.

Is there a function that takes this array and returns two objects: a dictionary of the unique strings, and a copy of strArray in which each string is replaced by its integer identifier?

keyArray, intArray = some_function(strArray)
print(keyArray) # output: { 0:'ab', 1:'abc', 2:'bca', 3:'m-2' }
print(intArray) # output: [ 0, 1, 0, 2, 0, 3, 2 ]

Alternatively, I would settle for just intArray, so that I have a smaller array that is easier to work with. Recovering the original strings would be useful, but not at the cost of size, speed, or ease of use.

1 Answer

We can use np.unique with the return_inverse argument:

In [16]: unq,tags = np.unique(strArray, return_inverse=True)

In [17]: dict(zip(range(len(unq)),unq))
Out[17]: {0: 'ab', 1: 'abc', 2: 'bca', 3: 'm-2'}

In [18]: tags
Out[18]: array([0, 1, 0, 2, 0, 3, 2])
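
Since the question is about size, here is a minimal sketch that wraps the np.unique call in a function, shrinks the integer codes to the smallest integer dtype that fits, and rebuilds the original strings by indexing. The name encode_strings and the extra unq return value are assumptions for illustration, not part of NumPy or of the question's some_function.

```python
import numpy as np

def encode_strings(str_array):
    """Hypothetical helper: replace each string with an integer code."""
    # unq holds the sorted unique strings; tags[i] is the index into unq
    # of the string at position i of str_array (1-D input assumed).
    unq, tags = np.unique(str_array, return_inverse=True)
    # np.unique returns platform-sized integers; downcast to the smallest
    # integer dtype that can hold the largest code.
    tags = tags.astype(np.min_scalar_type(len(unq) - 1))
    # Dictionary from integer code to plain Python string, as in the question.
    key = dict(enumerate(unq.tolist()))
    return key, unq, tags

strArray = np.array(['ab', 'abc', 'ab', 'bca', 'ab', 'm-2', 'bca'])
key, unq, tags = encode_strings(strArray)

# Indexing the unique strings with the codes rebuilds the original array.
restored = unq[tags]
```

Keeping unq as an array alongside the dictionary makes the reverse lookup a single indexing operation. Note that the codes follow the sorted order of the unique strings, not their order of first appearance.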

1 Comment

That's perfect. Thank you
