
I would like to extract only the numbers contained in a string. Can isdigit() and split() be combined for this purpose, or is there a simpler/faster way?

Example:

m = ['How to extract only number 122', 'The number 35 must be extracted', '1052 must be extracted']

Desired output:

numbers = [122, 35, 1052]
text = ['How to extract only number', 'The number must be extracted', 'must be extracted']

My code:

text = []
numbers = []
temp_numbers = []
for i in range(len(m)):
    text.append([word for word in m[i].split() if not word.isdigit()])
    temp_numbers.append([int(word) for word in m[i].split() if word.isdigit()])
for i in range(len(m)):
    text[i] = ' '.join(text[i])
for elem in temp_numbers:
    numbers.extend(elem)

print(text)
print(numbers)
  • You could omit ==True and ==False and factor out the common for word in m[i].split() if word.isdigit(), but other than that this looks as simple as it can get.
    – mkrieger1
    Commented Aug 29, 2022 at 16:03
  • This has been addressed here: stackoverflow.com/questions/19715303/…
    – JDR
    Commented Aug 29, 2022 at 16:07
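
The split()/isdigit() approach from the question can be condensed so that each string is split only once and both lists are filled in the same loop. This is a minimal sketch, assuming every number is a separate whitespace-delimited word (a token such as "35," or "abc123" would be treated as text):

```python
m = ['How to extract only number 122', 'The number 35 must be extracted', '1052 must be extracted']

text = []
numbers = []
for sentence in m:
    words = sentence.split()  # split once and reuse the result for both lists
    # words made up entirely of digits are collected as ints
    numbers.extend(int(word) for word in words if word.isdigit())
    # all remaining words are joined back into a sentence
    text.append(' '.join(word for word in words if not word.isdigit()))

print(text)
print(numbers)
```

Iterating over m directly also removes the need for range(len(m)) and for the intermediate temp_numbers list.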

3 Answers

2

Import the re (regular expression) module:

import re

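If you want to extract all numbers:

numbers = []
texts = []
for string in m:
    # findall returns strings, so convert each match to int
    numbers.extend(int(n) for n in re.findall(r"\d+", string))
    # remove the numbers, then collapse the whitespace they leave behind
    texts.append(" ".join(re.sub(r"\d+", "", string).split()))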

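If you want to extract only the first number of each string:

numbers = []
texts = []
for string in m:
    match = re.search(r"\d+", string)
    if match:  # skip strings that contain no number
        numbers.append(int(match.group()))
    # remove the numbers, then collapse the whitespace they leave behind
    texts.append(" ".join(re.sub(r"\d+", "", string).split()))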
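
The two regex steps above could also be wrapped in one helper that reuses a single compiled pattern. This is only a sketch: the names NUMBER and split_numbers are hypothetical, and it assumes numbers are plain runs of digits (no signs or decimal points):

```python
import re

# Compile once so the same pattern is reused for finding and removing numbers
NUMBER = re.compile(r"\d+")


def split_numbers(string):
    """Return (list of ints found in string, string with the numbers removed)."""
    found = [int(n) for n in NUMBER.findall(string)]
    # Remove the numbers, then collapse the whitespace they leave behind
    cleaned = " ".join(NUMBER.sub("", string).split())
    return found, cleaned


m = ['How to extract only number 122', 'The number 35 must be extracted', '1052 must be extracted']

numbers = []
text = []
for string in m:
    found, cleaned = split_numbers(string)
    numbers.extend(found)
    text.append(cleaned)

print(numbers)
print(text)
```

Unlike the split()/isdigit() approach, this also picks up numbers that are attached to other characters, such as "35," or "abc123".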
  • Why not use the same pattern twice?
    – mkrieger1
    Commented Aug 29, 2022 at 16:07
1

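If m is a list of strings, you can loop through each string, check whether the current character is a digit, and if so append it to a temporary string that is converted to an int at the end. Note that this joins all digits of a string into one number, so it only works if each string contains a single number.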

For loop solution:

m = ['How to extract only number 122', 'The number 35 must be extracted', '1052 must be extracted']

numbers = []

for string in m:
    # Presuming m only contains strings
    temp_num = ""

    for char in string:
        if char.isdigit():
            temp_num += char

    # int("") raises ValueError, so skip strings without any digits
    if temp_num:
        numbers.append(int(temp_num))

List comprehension solution (note that this appends each digit as a separate element, e.g. 1, 2, 2 instead of 122):

m = ['How to extract only number 122', 'The number 35 must be extracted', '1052 must be extracted']

numbers = [int(char) for string in m for char in string if char.isdigit()]
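
If whole numbers rather than single digits are wanted without regular expressions, one option is itertools.groupby, which groups consecutive characters by whether they are digits. A minimal sketch, swapping in groupby for the character loop above:

```python
from itertools import groupby

m = ['How to extract only number 122', 'The number 35 must be extracted', '1052 must be extracted']

# groupby yields (key, group) pairs; the key is True for a run of digits
numbers = [
    int(''.join(group))
    for string in m
    for is_digit, group in groupby(string, key=str.isdigit)
    if is_digit
]

print(numbers)  # [122, 35, 1052]
```

Because each run of digits becomes its own group, a string containing several numbers yields several list entries instead of one concatenated value.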

Hope this helped. Also, if you only need the values of an iterable (e.g. a list), just use for varname in iterable; it's faster and cleaner than indexing with range(len(...)).

If you need both index and the value, use for index, varname in enumerate(iterable).
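
For illustration, the two loop forms look like this on the list m from the question (the variable names are arbitrary):

```python
m = ['How to extract only number 122', 'The number 35 must be extracted', '1052 must be extracted']

# Values only: iterate over the list directly
lengths = []
for string in m:
    lengths.append(len(string))

# Index and value together: enumerate yields (index, value) pairs
indexed = []
for index, string in enumerate(m):
    indexed.append((index, string))

print(lengths)
print(indexed)
```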

0
nums_list = []
m = ["How to extract only number 122", "The number 35 must be extracted", "1052 must be extracted"]
for i in m:
    new_l = i.split(" ")
    for j in new_l:
        if j.isdigit():
            nums_list.append(int(j))
print(nums_list)

Output:

[122, 35, 1052]
