129

What is the best pure Python implementation to check whether a string contains ANY letters from the alphabet?

string_1 = "(555).555-5555"
string_2 = "(555) 555 - 5555 ext. 5555"

Here string_1 would return False because it contains no letters of the alphabet, and string_2 would return True because it contains letters.

4
  • 3
    Should this be limited to the English a-z alphabet only? Should 'special' characters from other alphabets, like German, be taken into account? Commented Jan 31, 2012 at 0:35
  • Is there any chance that you will receive unicode? Or just plain ascii roman letters? Commented Jan 31, 2012 at 0:39
  • Nice timing there :) Anyway, check this similar question out if you need help testing strings with unicode characters. Commented Jan 31, 2012 at 0:44
  • 2
    Limited to English a/z alphabet only and only plain ascii roman letters :) Commented Jan 31, 2012 at 17:15

7 Answers

169

Regex should be a fast approach:

re.search('[a-zA-Z]', the_string)
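A minimal sketch of how this call could be wrapped into a True/False check. The names ASCII_LETTER and contains_ascii_letter are hypothetical, chosen only for illustration. re.search returns a Match object when a letter is found and None otherwise, so bool() turns the result into a plain boolean.

```python
import re

# Precompiled pattern that matches any single ASCII letter.
# (ASCII_LETTER and contains_ascii_letter are illustrative names.)
ASCII_LETTER = re.compile(r"[a-zA-Z]")


def contains_ascii_letter(text):
    """Return True if text contains at least one letter a-z or A-Z."""
    # search() yields a Match object or None; bool() collapses that to True/False.
    return bool(ASCII_LETTER.search(text))


print(contains_ascii_letter("(555).555-5555"))              # False
print(contains_ascii_letter("(555) 555 - 5555 ext. 5555"))  # True
```

Compiling the pattern once is optional here; it mainly helps when the check runs many times.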

10 Comments

Regex certainly seems a bit overkill. any(c.isalpha() for c in string_1) is deliciously Pythonic.
@Joseph No, it is not. This regex is far more readable than your expression. Also, what does isalpha even mean? It behaves completely differently in Python 2 and Python 3. Is Chinese part of the alphabet? If not, you are blindly matching it with your generator on Python 3 (or on Python 2 for unicode strings!). If you want Pythonic, here it is: "Simple is better than complex." And check the OP's comment above: he wants only the Roman alphabet to be matched.
I think Joseph's answer is perfectly readable and it's certainly faster than an additional import; plus you don't have to remember the order of arguments in re.search
In case anyone else is wondering what the return value is, you get a Match object if there is a match, or None if there isn't. So this is compatible with an if re.search(...) pattern.
@JBernardo Knowing which module to import from is not a triviality. It should at least be mentioned: the regular expression operations come from the re module, so you need import re (Python 2.7 to 3.9.5).
116

How about:

>>> string_1 = "(555).555-5555"
>>> string_2 = "(555) 555 - 5555 ext. 5555"
>>> any(c.isalpha() for c in string_1)
False
>>> any(c.isalpha() for c in string_2)
True
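In Python 3, str.isalpha() returns True for letters in any script, not only a-z. Since the asker's follow-up comment limits the check to plain ASCII letters, a hedged variant of the same any() idea is sketched below. It tests membership in string.ascii_letters instead; the helper name has_ascii_letter and the Korean sample string are assumptions made for illustration.

```python
import string

# Set of the 52 ASCII letters, for fast membership tests.
ASCII_LETTERS = frozenset(string.ascii_letters)


def has_ascii_letter(text):
    """Return True if text contains at least one of a-z or A-Z."""
    # any() stops at the first character that is an ASCII letter.
    return any(ch in ASCII_LETTERS for ch in text)


korean = "전화번호"  # sample text with no ASCII letters
print(any(c.isalpha() for c in korean))  # True: isalpha() accepts Hangul
print(has_ascii_letter(korean))          # False
print(has_ascii_letter("(555) 555 - 5555 ext. 5555"))  # True
```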

8 Comments

Would set(string_1) be more efficient?
@Rik. You mean converting string_1 to a set before testing it? No, it won't be more efficient. That is guaranteed to touch every character at least once, while I believe the any function will short-circuit (stop) when it encounters the first True.
This code will be somewhat slow because it requires a function call per char. Converting to set may or may not reduce function calls, but adds some overhead.
@JBernardo: timeit suggests it's about an order of magnitude slower than a compiled regex and takes only about 66% more time than a non-compiled one. That's well within my "I hate regular expressions" limits.
Sure: and if you use "(555).555-5555 ext. 5555"*1000 you're back to comparable speeds because of the short-circuiting. I much prefer writing in Python to writing regular expressions, which I find hard to debug unless they're trivial, and I'm not going to give up on writing clear Python unless performance requirements demand it.
29

You can use islower() on your string to see if it contains some lowercase letters (amongst other characters). Combine it with isupper() using or to also check whether it contains some uppercase letters:

Below, there are letters in the string, so the test yields True:

>>> z = "(555) 555 - 5555 ext. 5555"
>>> z.isupper() or z.islower()
True

Below, there are no letters in the string, so the test yields False:

>>> z= "(555).555-5555"
>>> z.isupper() or z.islower()
False

Not to be mixed up with isalpha() which returns True only if all characters are letters, which isn't what you want.

Note that Barm's answer completes mine nicely, since mine doesn't handle the mixed case well.

6 Comments

I like that this will test if it CONTAINS letters, not just test if input is ALL letters.
@Cornbeetle yes, that kind of really answers the question after all those years, thanks
Very nice way to put this. How is it in terms of efficiency? Better than regex?
There are no Python loops involved, so the efficiency is good. I didn't compare it with regex, but I suppose it's slightly faster, especially in the initialization phase, because there's no regex to compile.
doesn't handle mixed case, that's stated in the answer
20

I liked the answer provided by @jean-françois-fabre, but it is incomplete.
His approach will work, but only if the text contains purely lower- or uppercase letters:

>>> text = "(555).555-5555 extA. 5555"
>>> text.islower()
False
>>> text.isupper()
False

The better approach is to first upper- or lowercase your string and then check.

>>> string1 = "(555).555-5555 extA. 5555"
>>> string2 = '555 (234) - 123.32   21'

>>> string1.upper().isupper()
True
>>> string2.upper().isupper()
False
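As a rough sketch, the check can be wrapped in a small helper (has_cased_letter is a hypothetical name). One caveat worth keeping in mind: isupper() looks at any cased character, so accented letters such as "é" also count, while scripts without case, such as Hangul, do not. If only a-z and A-Z should match, this is not an exact equivalent of the regex approach.

```python
def has_cased_letter(text):
    """Return True if text contains at least one cased character."""
    # After upper(), every cased character is uppercase, so isupper()
    # is True exactly when at least one cased character is present.
    return text.upper().isupper()


print(has_cased_letter("(555).555-5555 extA. 5555"))  # True
print(has_cased_letter("555 (234) - 123.32   21"))    # False
print(has_cased_letter("é"))                          # True: not limited to ASCII
print(has_cased_letter("한글"))                        # False: Hangul has no case
```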

Comments

14

I tested each of the above methods for checking whether a given string contains any letters and measured the average processing time per string on a standard computer.

~250 ns for

import re

~3 µs for

re.search('[a-zA-Z]', string)

~6 µs for

any(c.isalpha() for c in string)

~850 ns for

string.upper().isupper()


Contrary to what was alleged, importing re takes negligible time, and searching with re takes only about half the time of iterating with isalpha(), even for a relatively small string.
Hence, for larger strings and greater counts, re would be significantly more efficient.

But converting the string to a single case and then checking that case (i.e. either upper().isupper() or lower().islower()) wins here. In every run it is significantly faster than re.search(), and it doesn't require any additional imports.
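The exact benchmark setup is not given, so the following is only a sketch of how such a comparison could be reproduced with timeit. The sample string, the repetition count, and the candidates dictionary are assumptions, and absolute numbers will differ between machines and Python versions. The lambdas add a small, roughly equal call overhead to every candidate.

```python
import re
import timeit

sample = "(555) 555 - 5555 ext. 5555"  # assumed test input
pattern = re.compile(r"[a-zA-Z]")
REPEAT = 100_000                       # assumed number of calls per candidate

# Each candidate returns True when the sample contains a letter.
candidates = {
    "re.search": lambda: re.search("[a-zA-Z]", sample) is not None,
    "compiled regex": lambda: pattern.search(sample) is not None,
    "any(isalpha)": lambda: any(c.isalpha() for c in sample),
    "upper().isupper()": lambda: sample.upper().isupper(),
}

for name, func in candidates.items():
    seconds = timeit.timeit(func, number=REPEAT)
    # Convert total seconds into nanoseconds per call.
    print(f"{name:>18}: {seconds / REPEAT * 1e9:8.0f} ns per call")
```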

2 Comments

You can also compile the regex for further optimization: alpha_regex = re.compile('[a-zA-Z]'), then later alpha_regex.search(string).
Not to mention that isalpha() doesn't work out well for multiple languages. I was looking for this because I wanted to check whether a string that is expected to be Korean contains any English letters, and the isalpha() method returns True for every Korean string.
11

You can use a regular expression like this:

import re

print(re.search('[a-zA-Z]+', string))

Comments

1

You can also do this in addition:

import re
string = '24234ww'
val = re.search('[a-zA-Z]+', string)
val[0].isalpha()  # True, because the matched text consists only of letters
print(val[0])  # prints the first run of letters that matched

Also note that if val is None, the search did not find a match, and val[0] would raise a TypeError.
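A minimal sketch of guarding against the no-match case before indexing the result. The helper name first_letter_run is hypothetical.

```python
import re


def first_letter_run(text):
    """Return the first run of ASCII letters in text, or None if there is none."""
    match = re.search(r"[a-zA-Z]+", text)
    # Only read the matched text when the search actually succeeded.
    return match.group(0) if match is not None else None


print(first_letter_run("24234ww"))         # ww
print(first_letter_run("(555).555-5555"))  # None
```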

Comments
