
I'm having a problem sorting a numpy array that has numbers as strings. I need to keep these as strings because there are other words after the integers.

It's sorting negative numbers in reverse order:

>>> import numpy as np
>>> a = np.array(["3", "-2", "-1", "0", "2"])
>>> a.sort()
>>> a
array(['-1', '-2', '0', '2', '3'], dtype='|S2')

I would have expected the output to be:

array(['-2', '-1', '0', '2', '3'], dtype='|S2')

Any suggestions?
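The ordering above is ordinary lexicographic (character-by-character) string comparison, not a NumPy bug. A short Python 3 sketch (where this array's dtype shows as <U2 rather than |S2) that illustrates why, and shows what sorting by integer value gives when every element is a bare integer string:

```python
import numpy as np

a = np.array(["3", "-2", "-1", "0", "2"])

# NumPy sorts string arrays lexicographically (character by character).
lexical = np.sort(a).tolist()
print(lexical)  # ['-1', '-2', '0', '2', '3']

# "-1" and "-2" share the leading "-", so the tie is broken by "1" < "2".
print("-1" < "-2")  # True

# "-" (code point 45) sorts before the digits (48-57), so every negative
# string lands ahead of every non-negative one.
print(ord("-"), ord("0"))  # 45 48

# Sorting on the integer value instead gives numeric order.
numeric = sorted(a.tolist(), key=int)
print(numeric)  # ['-2', '-1', '0', '2', '3']
```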

  • So you are keeping two types of data in a single string? Doesn't seem particularly suited to numpy. Commented Oct 3, 2011 at 18:05
  • "I need to keep these as strings because there are other words after the integers". So you have a string like "76 trombones", and you want to treat it like the number 76 followed by the word "trombones"? Then do that. Parse the strings and create 2-tuples of (number, rest of string). Commented Oct 4, 2011 at 0:11
  • No, it's not well-behaved. Sometimes it's a number and string, sometimes it's just a string. The "natural sorting" approach works. Commented Oct 4, 2011 at 0:33
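Along the lines of the 2-tuple suggestion above, here is a minimal sketch that parses each string into (number, rest of string) and also copes with the "just a string" case. The names split_key and LEADING_INT are hypothetical, and it assumes any integer appears at the start of the string; strings without one are placed after the numeric ones, alphabetically.

```python
import re
import numpy as np

# Hypothetical pattern: an optional leading integer, then the remaining text.
LEADING_INT = re.compile(r"\s*(-?\d+)\s*(.*)")

def split_key(s):
    """Hypothetical sort key for strings like '76 trombones' or 'word'.

    Strings starting with an integer sort first, by value, then by the
    remaining text. Strings with no leading integer sort after them.
    The leading 0/1 flag ensures an int is never compared with a str.
    """
    m = LEADING_INT.match(s)
    if m:
        return (0, int(m.group(1)), m.group(2))
    return (1, 0, s)

a = np.array(["3", "-2 up", "76 trombones", "-1 word", "word", "-2"])
result = np.array(sorted(a.tolist(), key=split_key))
print(result.tolist())
# ['-2', '-2 up', '-1 word', '3', '76 trombones', 'word']
```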

2 Answers


You could use natural sorting:

import numpy as np
import re

def atoi(text):
    try:
        return int(text)
    except ValueError:
        return text

def natural_keys(text):
    '''
    alist.sort(key=natural_keys) sorts in human order
    http://nedbatchelder.com/blog/200712/human_sorting.html
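    '''
    # Raw string avoids an invalid-escape warning for \d; the capture group
    # keeps the (optionally negative) integers in the split result.
    return [atoi(c) for c in re.split(r'(-?\d+)', text)]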

a = np.array(["3", "-2", "-1", "0", "2", "word"])
print(sorted(a.tolist(), key=natural_keys))
# ['-2', '-1', '0', '2', '3', 'word']

a = np.array(["3", "-2", "-1", "0", "2", "word", "-1 word", "-2 up"])
print(sorted(a.tolist(), key=natural_keys))
# ['-2', '-2 up', '-1', '-1 word', '0', '2', '3', 'word']
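To see what the natural_keys key actually compares, and to get a NumPy array back rather than the plain list that sorted() returns, here is a small self-contained sketch using the same two functions (with the regex written as a raw string). One caveat worth knowing: any "-" directly before a digit is read as a minus sign, so a hypothetical "item-2" splits into ['item', -2, ''].

```python
import re
import numpy as np

def atoi(text):
    # Convert integer pieces (including negatives) to int; leave the rest as str.
    try:
        return int(text)
    except ValueError:
        return text

def natural_keys(text):
    # The capture group keeps the matched integers in re.split's output,
    # so the pieces alternate str, int, str, ... and always compare cleanly.
    return [atoi(c) for c in re.split(r'(-?\d+)', text)]

print(natural_keys("-1 word"))  # ['', -1, ' word']
print(natural_keys("word"))     # ['word']

a = np.array(["3", "-2", "-1", "0", "2", "word", "-1 word", "-2 up"])
# sorted() returns a list; wrap it in np.array to get an ndarray back.
a = np.array(sorted(a, key=natural_keys))
print(a.tolist())  # ['-2', '-2 up', '-1', '-1 word', '0', '2', '3', 'word']
```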

3 Comments

That will get the wrong order if you try to sort ["-1 word", "-2 up"], which is what I think the OP meant by "other words after the integers".
I posted the output when the array contains ["-1 word", "-2 up"]. I think the order is correct, no?
You're right. I misread your regex. Looks good to me, depending on how you want to handle the case where no integer appears at the beginning of a string! (Mine raises a ValueError.)

Assuming there's a space between the integer and the other words, if a were a regular Python list you'd do:

a.sort(key=lambda s: int(s.split()[0]))

Not sure what the equivalent is in numpy (I don't see a way to specify a key), but one option is to convert to a list, sort it, and convert back to an array.
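numpy.sort does not take a key argument, but numpy.argsort can do the same job without a round trip through a sorted list: compute the integer keys, argsort them, and index the original array with the result. A minimal sketch under the same assumption as the list version (every element starts with an integer followed by whitespace or nothing; otherwise int() raises ValueError):

```python
import numpy as np

a = np.array(["3", "-2 up", "-1 word", "0", "2"])

# Compute the sort keys separately, since np.sort has no key= argument.
keys = np.array([int(s.split()[0]) for s in a])

# argsort gives the indices that would sort the keys; use them to reorder a.
# kind="stable" keeps elements with equal leading integers in their original order.
a = a[np.argsort(keys, kind="stable")]
print(a.tolist())  # ['-2 up', '-1 word', '0', '2', '3']
```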

1 Comment

Your version works with sorted(a, key = lambda s: int(s.split()[0])). Thanks!
