2

Suppose I have a string like this:

x = 'Wish she could have told me herself. @NicoleScherzy #nicolescherzinger #OneLove #myfav #MyQueen :heavy_black_heart::heavy_black_heart: some string too :smiling_face:' 

From it, I want to extract:

:heavy_black_heart:
:smiling_face:

To do that, I tried the following:

import re
result = re.search(':(.*?):', x)
result.group()

It only gives me ':heavy_black_heart:'. How can I make it return every match? If possible, I want to store them in a dictionary once I have found all of them.

4
  • Maybe set(re.findall(r':[^:]+:', x)) will do? Not sure what there might be between :, maybe r':\w+:' will work better. Commented Sep 14, 2017 at 12:13
  • @WiktorStribiżew for the example, it works, but I couldn't understand why you're not sure Commented Sep 14, 2017 at 12:20
  • See my answer with some explanations. Actually, you have not provided all the requirements, just two examples, that is why I said I was not sure. Commented Sep 14, 2017 at 12:23
  • Do you really want to match ::? As I said, you did not post exact specs. If you need to match any chars inside :...: that are not whitespaces, use :[^\s:]+: - see my updated answer. Commented Sep 14, 2017 at 12:48

5 Answers

3

print(re.findall(':.*?:', x)) does the job.

Output:
[':heavy_black_heart:', ':heavy_black_heart:', ':smiling_face:']

To remove the duplicates, use:

res = re.findall(':.*?:', x)
# A set drops the duplicates; sorted() returns a list in a stable order
unique = set(res)
print(sorted(unique))

Output:
[':heavy_black_heart:', ':smiling_face:']
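Since the question asks for a dictionary, one possible sketch is to count each token with collections.Counter, which is a dict subclass. This reuses the pattern from this answer; the name counts is only illustrative and not part of the original answer.

```python
import re
from collections import Counter

x = ('Wish she could have told me herself. @NicoleScherzy #nicolescherzinger '
     '#OneLove #myfav #MyQueen :heavy_black_heart::heavy_black_heart: '
     'some string too :smiling_face:')

# Map each :name: token to the number of times it occurs in the string.
counts = Counter(re.findall(':.*?:', x))
print(dict(counts))
```

If only the distinct tokens matter and not the counts, the set shown above is enough.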


7 Comments

re.MULTILINE is not doing anything with the pattern since there are no ^ and $ to modify the behavior of. re.match only searches for a match at the beginning of the string.
Now, you do not have : in the matches.
Check now @WiktorStribiżew
You do not need any capturing group, remove ( and ). It will still match :: (not sure it is expected).
Thanks for pointing that out. The capturing parentheses are removed. No, it won't match ::
2

You seem to want to match smilies, that is, some symbols between two : characters. The .*? can match zero symbols, so your regex can match ::, which is probably not what you want. Besides, re.search returns only one match, the first; to get multiple matches, you usually use re.findall or re.finditer.

I think you need

set(re.findall(r':[^:]+:', x))

or, if you only need to match word chars between the two colons:

set(re.findall(r':\w+:', x))

or, if you want to match any non-whitespace chars between the two colons:

set(re.findall(r':[^\s:]+:', x))

re.findall finds all non-overlapping occurrences, and set removes the duplicates.

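Each pattern matches a :, then one or more chars of the allowed kind, and then a closing :. The allowed kind is any char other than : ([^:]+), a letter, digit or _ (\w+), or any char that is neither whitespace nor : ([^\s:]+).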

>>> import re
>>> x = 'Wish she could have told me herself. @NicoleScherzy #nicolescherzinger #OneLove #myfav #MyQueen :heavy_black_heart::heavy_black_heart: some string too :smiling_face:'
>>> print(set(re.findall(r':[^:]+:', x)))
{':smiling_face:', ':heavy_black_heart:'}
>>> 
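To see where the patterns differ, here is a small sketch on a made-up input that contains an empty :: pair and a stray colon in ordinary prose. The string s and the variable names are hypothetical and chosen only for illustration.

```python
import re

# Hypothetical input: an empty '::' pair plus a colon used as punctuation.
s = 'empty :: pair, note: see this :thumbs_up: and :smiling_face:'

lazy = re.findall(r':.*?:', s)          # the pattern from the question
not_colon = re.findall(r':[^:]+:', s)   # 1+ chars other than ':'
word_only = re.findall(r':\w+:', s)     # 1+ letters, digits or '_'
no_space = re.findall(r':[^\s:]+:', s)  # 1+ chars, no whitespace, no ':'

for name, found in [('lazy', lazy), ('not_colon', not_colon),
                    ('word_only', word_only), ('no_space', no_space)]:
    print(name, found)
```

On this input the lazy pattern pairs colons that do not belong together, and [^:]+ still spans the prose between two unrelated colons. Only the \w+ and [^\s:]+ variants return just the two smilies. Which one is right depends on what may appear between the colons.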


0

Try this regex:

:([a-z0-9:A-Z_]+):

2 Comments

When I try it, it produces ':heavy_black_heart::heavy_black_heart:' which isn't what I want
@zwlayer It returns that match because : is inside the character class and + is a greedy quantifier. The class therefore consumes as many chars as possible, including the inner :, and then backtracks only as far as the last : that follows the letters, digits and _.
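The behaviour described in the comment above can be reproduced with a short sketch. The shortened input string and the variable names are illustrative assumptions; the second pattern is simply the same character class with the : removed.

```python
import re

x = ':heavy_black_heart::heavy_black_heart: some string too :smiling_face:'

# With ':' inside the class, the greedy '+' runs across the inner '::'.
greedy = re.search(r':([a-z0-9:A-Z_]+):', x).group()

# Without ':' in the class, each token is matched separately.
fixed = re.findall(r':[a-zA-Z0-9_]+:', x)

print(greedy)
print(fixed)
```

Dropping the colon from the class makes the pattern equivalent in spirit to :\w+: for ASCII input.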
0
import re
x = 'Wish she could have told me herself. @NicoleScherzy #nicolescherzinger #OneLove #myfav #MyQueen :heavy_black_heart::heavy_black_heart: some string too :smiling_face:' 
print(set(re.findall(':.*?:', x)))

output:

{':heavy_black_heart:', ':smiling_face:'}


0

Just for fun, here's a simple solution without regex. It splits around ':' and keeps the elements with odd index:

>>> text = 'Wish she could have told me herself. @NicoleScherzy #nicolescherzinger #OneLove #myfav #MyQueen :heavy_black_heart::heavy_black_heart: some string too :smiling_face:'
>>> text.split(':')[1::2]
['heavy_black_heart', 'heavy_black_heart', 'smiling_face']
>>> set(text.split(':')[1::2])
set(['heavy_black_heart', 'smiling_face'])
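One caveat worth sketching: the split approach assumes that every colon in the text delimits a smiley. With a stray colon, the odd-index slice picks up the wrong pieces. The input below is a hypothetical example, compared against a regex with a capturing group.

```python
import re

# Hypothetical input where one colon is plain punctuation.
text = 'note: I like :smiling_face: a lot'

# The stray colon shifts the indices, so the slice returns prose fragments.
by_split = text.split(':')[1::2]

# A regex that requires word chars between the colons is not affected.
by_regex = re.findall(r':(\w+):', text)

print(by_split)
print(by_regex)
```

For input like the string in the question, where colons appear only around smilies, the split version works as shown above.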

