
I am trying to match all phone numbers from text.

https://pythex.org/?regex=%5C%2B%3F(%5Cd*)%5Cs%3F%5C(%3F(%5Cd*)%5C)%3F%5Cs%3F(%5Cd*)%5B%5Cs-%5D%3F(%5Cd*)%5Cs%3F(%5Cd*)%5Cs%3F(%5Cd*)%5Cs%3F&test_string=(510)%20588-3915%0A%2B1%20(510)%20879-4700%0A%2B1(888)654-0143%0A%2B1(919)277-2172%0A%2B1(866)707-7709%0A%2B1(919)597-7014%0A%2B44%20(0)%2020%208435%206555%0A%2B44%20(0)%2020%208435%206555%0A%2B33%201%2070%2070%2096%2061%0A%2B41%20(44)%20595%2094%2001%0A%2B32%20(9)%20277%2094%2021%0A%2B34%20(0)%20931%20790%20659%0A045%204750666%0A%2B41%2044%20595%2094%2001%0A%2B31%20(0)%2020%20262%203824%20okay.2%0A%2B31%20478-511014%0A%2B32%209%20277%2094%2021%0A%2B91%20900%20133%205555&ignorecase=0&multiline=0&dotall=0&verbose=0

This regex works well when I test it on a regex-matching website, but when I use it in actual code it gives the wrong result:

>>> import re
>>> text = 'my phone is +31 478-511014 and +91 900 133 5555'
>>> mobile = re.findall(r'\+?(\d*)\s?\(?(\d*)\)?\s?(\d*)[\s-]?(\d*)\s?(\d*)\s?(\d*)\s?', text)
>>> mobile
[('', '', '', '', '', ''), ('', '', '', '', '', ''), ('', '', '', '', '', ''), ('', '', '', '', '', ''), ('', '', '', '', '', ''), ('', '', '', '', '', ''), ('', '', '', '', '', ''), ('', '', '', '', '', ''), ('', '', '', '', '', ''), ('', '', '', '', '', ''), ('', '', '', '', '', ''), ('', '', '', '', '', ''), ('31', '478', '', '511014', '', ''), ('', '', '', '', '', ''), ('', '', '', '', '', ''), ('', '', '', '', '', ''), ('', '', '', '', '', ''), ('91', '900', '133', '5555', '', ''), ('', '', '', '', '', '')]

Am I doing something wrong?

  • If I enter your string ('my phone is +31 478-511014 and +91 900 133 5555') on the website it gives me pretty much the same result. Commented Nov 8, 2017 at 13:32
  • All the pattern parts are optional, it matches empty strings before each non-matching sequence and whitespace chunks. Re-write to match at least some digits. If you need help, please post the pattern requirements. Or filter out blanks (demo). Commented Nov 8, 2017 at 13:34
  • To implement the above suggestion you could change the first \d* to \d+ Commented Nov 8, 2017 at 13:38
  • And forgot to add to the above comment: all (...) create items in the resulting list of tuples, you need to either remove the unnecessary groups or turn those necessary ones into non-capturing. Or use re.finditer as in my demo. Although I have not considered + and -, so it is not a final answer. Commented Nov 8, 2017 at 13:40
  • Thanks. It matches on the website, but when I try it programmatically it returns each number broken into pieces, like [('31', '478', '', '511014', '', ''), ('91', '900', '133', '5555', '', '')]. How can I get the exact phone number as the result? Commented Nov 8, 2017 at 13:41
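
The two suggestions in the comments above (use re.finditer and filter out blanks) can be combined as in the sketch below. It keeps the question's pattern unchanged and is only an illustration of those suggestions; the variable names are made up for this example.

```python
import re

text = 'my phone is +31 478-511014 and +91 900 133 5555'

# The pattern from the question, unchanged: every part is optional,
# so it also produces empty and whitespace-only matches.
pattern = re.compile(r'\+?(\d*)\s?\(?(\d*)\)?\s?(\d*)[\s-]?(\d*)\s?(\d*)\s?(\d*)\s?')

# group(0) is the whole matched text, no matter how many capturing groups exist.
# Stripping and then dropping empty strings removes the blank matches.
numbers = [m.group(0).strip() for m in pattern.finditer(text) if m.group(0).strip()]

print(numbers)  # ['+31 478-511014', '+91 900 133 5555']
```

Filtering hides the blank matches but does not stop the regex from producing them, so requiring at least one digit in the pattern is probably the cleaner fix.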

2 Answers


This one should work if you are having issues with yours:

[(+\d]\d[-\d\s()]*\d

3 Comments

It gives sre_constants.error: bad character range
Edited, - should be 1st in square brackets, and regular brackets don't need to be escaped.
Thanks. Can you please help me use the regex I mentioned in the question?
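
As a rough check of the edited pattern from this answer, the sketch below runs it against the sample sentence from the question. Because the pattern has no capturing groups, re.findall returns the whole matched strings rather than tuples. The variable names are only for illustration.

```python
import re

text = 'my phone is +31 478-511014 and +91 900 133 5555'

# Pattern from this answer: an opening "(", "+" or digit, then a digit,
# then any run of digits, whitespace, "-" and parentheses, ending on a digit.
pattern = r'[(+\d]\d[-\d\s()]*\d'

# No capturing groups, so findall returns full matches as strings.
found = re.findall(pattern, text)
print(found)  # ['+31 478-511014', '+91 900 133 5555']

# A number that starts with a parenthesis is matched as a whole as well.
print(re.findall(pattern, 'call (510) 588-3915 today'))  # ['(510) 588-3915']
```

One caveat: \s also matches newlines, so two numbers separated only by whitespace (for example, one per line) would be merged into a single match.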

It works if you change one of the * quantifiers to + (here, the first one), so that every match must contain at least one digit:

mobile = re.findall(r'\+?(\d+)\s?\(?(\d*)\)?\s?(\d*)[\s-]?(\d*)\s?(\d*)\s?(\d*)\s?', text)
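
A short sketch of what this change gives. With the capturing groups kept, re.findall still returns one tuple per number. Removing the parentheses around each \d part (a variation that is not part of this answer) makes it return whole strings instead. The names grouped and ungrouped are only for illustration.

```python
import re

text = 'my phone is +31 478-511014 and +91 900 133 5555'

# Pattern with the first \d* changed to \d+ and all regex escapes in place.
grouped = r'\+?(\d+)\s?\(?(\d*)\)?\s?(\d*)[\s-]?(\d*)\s?(\d*)\s?(\d*)\s?'
mobile = re.findall(grouped, text)
print(mobile)
# [('31', '478', '', '511014', '', ''), ('91', '900', '133', '5555', '', '')]

# Same pattern without capturing groups: findall returns the full match text.
ungrouped = r'\+?\d+\s?\(?\d*\)?\s?\d*[\s-]?\d*\s?\d*\s?\d*\s?'
whole = [s.strip() for s in re.findall(ungrouped, text)]  # strip trailing space
print(whole)
# ['+31 478-511014', '+91 900 133 5555']
```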
