Regular expressions for extracting text inside curly brackets [closed]

Question

Closed. This question needs to be more focused. It is not currently accepting answers.

Closed 2 days ago.

I'm working with some marked-up text and I need to extract information from it for later use. I want to use regular expressions through Python's re module, but I can't construct the right expression. I have two situations:

Text in the format string="{some text}{other text 1}{other text 2}". Here I use the regexp "\\{(.*?)\\}", but I obtain:
```
>>> import re
>>> string = "{some text}{other text 1}{other text 2}"
>>> elements = re.split("\\{(.*?)\\}", string)
>>> print(elements)
['', 'some text', '', 'other text 1', '', 'other text 2', '']
```
I can't understand why the empty strings appear in positions 0, 2, 4 and 6. If I edit my original string to string="}{some text}{other text 1}{other text 2}{" and use the regexp "\\}\\{(.*?)\\}\\{", I obtain:
```
>>> string = "}{some text}{other text 1}{other text 2}{"
>>> elements = re.split("\\}\\{(.*?)\\}\\{", string)
>>> print(elements)
['', 'some text', 'other text 1', 'other text 2', '']
```
the internal empty strings in the output disappear, but not the first and last. How should I construct the regular expression to obtain only the elements inside the brackets?

Text in the format string="some text {other text}". In this case I need to extract "some text" and also "other text". Here I don't know how to proceed.

Can someone help me, please?

Why did you expect otherwise? Split gives you what's between the matches and, if any, the groups captured from the matches: docs.python.org/3/library/re.html#re.split. It's like doing "1,2,3".split(",") and asking why there are numbers in the result.
– jonrsharpe, Commented 2 days ago
How about re.findall(r"\{([^}]+)\}",string). Notice the r"" for a raw string, so you do not have to escape \ . Also [^}]+ is cleaner than .*? since it can never overstep a closing curly bracket.
– DuesserBaest, Commented 2 days ago
This question is similar to: What exactly is a "raw string regex" and how can you use it?. If you believe it’s different, please edit the question, make it clear how it’s different and/or how the answers on that question are not helpful for your problem.
– ti7, Commented 2 days ago
I assume it was closed as they are asking 3 questions: 1. How to extract text between braces 2. Why does split produce empty strings 3. How to tokenize text between braces and not between braces. There are also many existing answers for all these questions, so voting as a duplicate would also be valid.
– jqurious, Commented yesterday
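
The behaviour described in the first comment can be reproduced directly. The following is only a small sketch using the asker's first string; the variable names `between` and `interleaved` are made up for illustration. It shows that the empty strings are the (empty) gaps between adjacent matches, and that the captured groups are interleaved with those gaps.

```python
import re

string = "{some text}{other text 1}{other text 2}"

# Without a capturing group, split returns only the text between matches.
# The three matches are adjacent and cover the whole string, so every gap is empty.
between = re.split(r"\{.*?\}", string)
print(between)  # ['', '', '', '']

# With a capturing group, the captured text is inserted between those gaps.
interleaved = re.split(r"\{(.*?)\}", string)
print(interleaved)  # ['', 'some text', '', 'other text 1', '', 'other text 2', '']

# For this pattern the odd positions hold the groups and the even positions the gaps.
print(interleaved[1::2])  # ['some text', 'other text 1', 'other text 2']
```

Slicing works here because the pattern has exactly one group; for plain extraction, re.findall() is usually the simpler tool.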

ti7 · Accepted Answer · 2025-11-28 12:25:31Z

1

A good strategy is often to use raw strings (r"", which is realistically always worth considering for regex) and re.escape() for complicated inputs, building your regex in parts if you need to:

```
>>> import re
>>> s = re.escape(r"a [complex]* string's literal value") + ", " + r"exact.* or escaped\?"
>>> print(s)
a\ \[complex\]\*\ string's\ literal\ value, exact.* or escaped\?
```

Then use re.findall() or re.finditer() to get every match.

.findall() directly returns a list of strings: the entire match if the pattern has no groups, the group's text if it has one group, or a tuple of group texts if it has several. .finditer() instead returns an iterator that yields Match instances, which also carry the span and every group. Which one is more useful depends on the situation.

```
>>> data = "}{some text}{other text 1}{other text 2}{"
>>> RE_braces = re.compile(r"\{([^\}]+)\}")  # { group(not '}', 1 or more) }
>>> RE_braces.findall(data)  # match -> group -> string
['some text', 'other text 1', 'other text 2']
>>> next(RE_braces.finditer(data))  # for match in RE.finditer(): ...
<re.Match object; span=(1, 12), match='{some text}'>
```
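
The second situation in the question ("some text {other text}") is not covered above. One possible approach, offered only as a sketch and not as part of the original answer, is to match either a braced chunk or a run of text outside braces with .finditer(). The names `RE_tokens` and `tokenize` are hypothetical, and the sketch assumes braces are not nested and that surrounding whitespace should be stripped.

```python
import re

# Hypothetical pattern: group 1 is text inside braces, group 2 is text outside braces.
RE_tokens = re.compile(r"\{([^{}]*)\}|([^{}]+)")

def tokenize(text):
    """Return the braced and unbraced pieces of text, in order, stripped."""
    tokens = []
    for match in RE_tokens.finditer(text):
        inside, outside = match.groups()
        # Exactly one of the two groups participates in each match.
        piece = inside if inside is not None else outside
        piece = piece.strip()
        if piece:  # skip empty braces and whitespace-only gaps
            tokens.append(piece)
    return tokens

print(tokenize("some text {other text}"))  # ['some text', 'other text']
print(tokenize("{some text}{other text 1}{other text 2}"))
# ['some text', 'other text 1', 'other text 2']
```

Using .finditer() rather than .findall() keeps access to the Match object, so the sketch could be extended to record whether each piece came from inside or outside the braces.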

6 Comments

sln 2 days ago

Raw strings and regex escaping are not part of this question. Since you point it out, the target string should be made raw as well. And it's a distraction to escape curly braces \{ \} when they are not in the form of a range quantifier {m,n}. Your regex should be in this form re.findall(r"{([^\}]+)}", r"}{some text}{other text 1}{other text 2}{") if you follow your own advice. Also you should either provide a solution to part 2 of his question or note that you are omitting that.

Casimir et Hippolyte 2 days ago

The curly bracket is escaped in the character class. You have been distracted.

sln 2 days ago

Indeed ... ... . {([^}]+)}

ti7 2 days ago

@sln Raw strings are just an aid for string creation, and I highly recommend starting with good escaping here to help cut away the scope where bugs can hide! You may be conflating them with binary inputs b"", where the pattern and input types do need to match (re.match("a", b"a") -> TypeError; they can be converted with .encode()/.decode())! Further, I think providing good technique here is more important than an overly specific answer, as there is not enough of the text body to completely understand their problem. Still, perhaps they would be satisfied with list(filter(None, re.split(r"[\}\{]", s)))

sln 2 days ago

I'm just on the regex side, not interested in absorbing language ambiguities in string escaping. That is not the topic here. You should check out the String tag if that's your focus here.

sln 2 days ago

There is plenty of information to provide answers to both parts of his question. It's all about regex, not any filters.
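
For readers following the thread, here is a short sketch of two points raised in ti7's comment: the split-and-filter one-liner, and the difference between raw strings (only a spelling of an ordinary str) and bytes (a different type). The variable name `parts` is illustrative, and the sample string is the asker's second one.

```python
import re

s = "}{some text}{other text 1}{other text 2}{"

# Splitting on either brace leaves empty strings; filter(None, ...) drops them.
parts = list(filter(None, re.split(r"[{}]", s)))
print(parts)  # ['some text', 'other text 1', 'other text 2']

# A raw literal and an escaped literal produce the same ordinary str object.
print(r"\{" == "\\{")  # True

# A str pattern cannot be applied to a bytes input.
try:
    re.match("a", b"a")
except TypeError as exc:
    print("TypeError:", exc)

# Encoding the pattern (or decoding the input) makes the types agree.
print(re.match("a".encode(), b"a") is not None)  # True
```

Note that the split-and-filter approach also discards braced chunks that are legitimately empty, which may or may not be wanted.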
