All Questions
10 questions
1
vote
2
answers
404
views
regex returning true with incorrect Japanese characters
I am using a form to check whether a postal code in the Japanese format has been entered.
I realized today that some information got through, even though it should never have "passed" the ...
1
vote
2
answers
195
views
Why can't Python 2's re module identify the u'®' character?
I have a string that I want to run re.sub on in Python 2. I tried the following statement, and it worked:
>>> import re
>>> re.sub(u"[™®]", "", u"a™b®c")
'abc'
But when I tried the ...
11
votes
4
answers
1k
views
Treat an emoji as one character in a regex [duplicate]
Here's a small example:
reg = ur"((?P<initial>[+\-👍])(?P<rest>.+?))$"
(In both cases the file has -*- coding: utf-8 -*-)
In Python 2:
re.match(reg, u"👍hello").groupdict()
# => {u'...
15
votes
3
answers
5k
views
Regex to Match Horizontal White Spaces
I need a regex in Python 2 that matches only horizontal whitespace, not newlines.
\s matches all whitespace, including newlines.
>>> re.sub(r"\s", "", "line 1.\nline 2\n&...
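One pattern often suggested for this is a negated character class. The sketch below is written for Python 3 (an assumption, since the question targets Python 2), and the names `HORIZONTAL_WS` and `strip_horizontal_ws` are hypothetical. Note that `[^\S\r\n]` still matches vertical tab and form feed, which may or may not count as "horizontal" for a given use case.

```python
import re

# [^\S\r\n] reads as "not a non-whitespace character, and not CR or LF",
# i.e. any whitespace character except the two line-break characters.
HORIZONTAL_WS = re.compile(r"[^\S\r\n]+")

def strip_horizontal_ws(text):
    """Remove spaces, tabs, and similar characters while keeping line breaks."""
    return HORIZONTAL_WS.sub("", text)

print(repr(strip_horizontal_ws("line 1.\nline 2\n")))
```

If only spaces and tabs matter, the simpler class `[ \t]+` may be enough.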
3
votes
1
answer
972
views
Find all matches for Unicode characters in a string in Python
import re
b="united thats weak. See ya 👋"
print b.decode('utf-8') #output: u'united thats weak. See ya \U0001f44b'
print re.findall(r'[\U0001f600-\U0001f650]',b.decode('utf-8'),flags=re....
1
vote
2
answers
261
views
Python regex is having problems finding a special unicode character
I am currently parsing through some old exams to determine the frequency of the questions (because many questions would resurface at this year's exam). I am using pyperclip to get the input for the re....
4
votes
2
answers
407
views
How to build a regular vocabulary of emoticons in python?
I have a list of emoticon codes in a plain-text file, UTF32.red.codes. The content of the file is:
\U0001F600
\U0001F601
\U0001F602
\U0001F603
\U0001F604
\U0001F605
\U0001F606
\...
0
votes
1
answer
76
views
Python unicode regex issue
Why does this work:
>>> ss
u'\U0001f300'
>>> r = re.compile(u"[u'\U0001F300-\U0001F5FF']+", re.UNICODE)
>>> r.search(ss) # this works
<_sre.SRE_Match object at ...
1
vote
4
answers
293
views
Unicode search not working
Consider this.
# -*- coding: utf-8 -*-
data = "cdbsb \xe2\x80\xa6 abc"
print data
#prints cdbsb … abc
^
print re.findall(ur"[\u2026]", data )
Why can't re find this Unicode character? ...
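A likely cause, assuming `data` holds UTF-8 encoded bytes as its escape sequences suggest: the pattern is Unicode text while the subject is a byte string, so the ellipsis character never appears in the subject as a single code point. A minimal Python 3 sketch of decoding before searching follows; the names `text` and `matches` are illustrative only.

```python
import re

# The three bytes \xe2\x80\xa6 are the UTF-8 encoding of U+2026 (horizontal ellipsis).
data = b"cdbsb \xe2\x80\xa6 abc"

# Decode to text first so pattern and subject are both Unicode strings.
text = data.decode("utf-8")

matches = re.findall("[\u2026]", text)
print(matches)
```

In Python 2 the equivalent step would be `data.decode('utf-8')` before calling `re.findall` with the `ur"..."` pattern.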
0
votes
1
answer
847
views
Parsing a log file using Python and storing its valid values in a database using SQLite
Hi, I am a newbie to Python. I am creating a small program that parses the load log file of a particular website and stores the valid data in a particular field of a database, but some of the fields have ...