
I have the input text below, from which I want to extract the following values with a regex. I am primarily looking for a regex that works in VBA for Excel, but I don't think it makes much difference if the regex is for PCRE:

1- The number after num --> 3025285000

2- The ip addresses --> 111.111.112.223 and 13.100.1.11

3- The Uplink and Downlink values --> 139161 and 6269538

4- The date and time after xTime --> 2019-07-22 18:09:55 -03:00

The input text looks like this:

{ num:{3025285000} }
{ ipadd:{iPadd:{iPv4add:{111.111.112.223} } } }
{ ipval:{iPadd:{13.100.1.11} } }
{ :{abc:{-} ddf:{-} mainVALUplink:{139161} mainVALDownlink:{6269538} kppacRR:{bbckdo} xTime:{2019-07-22 18:09:55 -03:00 } ppwo:{-} wwe:{-} iiurur:{qCI:{8} wie:{-} iiwww:{-} oop:{-} } } }

I've been trying the following regex (you can check it here: https://regex101.com/r/J9kGMy/1):

\B\d+\b|\B{\d+\..+\b}|dataVolumeGPRS(Up|Down)link:\B\d+|xTime:{\B\d+

But the matches are either incomplete or include extra characters. The current matches are:

Match 1: Full match 8-17: 025285000 (should be 3025285000)
Match 2: Full match 45-62: {111.111.112.223} (should be 111.111.112.223)
Match 3: Full match 84-97: {13.100.1.11} (should be 13.100.1.11)
Match 4: Full match 138-143: 39161 (should be 139161)
Match 5: Full match 163-169: 269538 (should be 6269538)
Match 6: Full match 196-199: 019 (should be 2019-07-22 18:09:55 -03:00)
Match 7: Full match 201-202: 7
Match 8: Full match 204-205: 2
Match 9: Full match 207-208: 8
Match 10: Full match 210-211: 9
Match 11: Full match 213-214: 5
Match 12: Full match 217-218: 3
Match 13: Full match 220-221: 0
Match 14: Full match 305-307: 04

  • If you want to use an alternation to get those fields, you could make the match more specific by matching the text in front and capturing the value in a capturing group: num:{(\d+)}|iPadd:{([^{}]+)}|iPv4add:{([^{}]+)}|xTime:{([^{}]+)} (regex101.com/r/EmeggH/1). Is the order of the input always the same, or can the position of the fields vary? (Commented Jul 27, 2019 at 8:25)

2 Answers


In the pattern that you tried, you are using an alternation with |, but the alternatives are not tied to any field name, so they match digits anywhere in the text.
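The first-digit problem in Match 1 of the question can be reproduced in a few lines. This is only a sketch: Python's re module stands in for the VBA/PCRE engine (an assumption), and PARTIAL, ANCHORED and LINE are illustrative names. The position between { and 3 is a word boundary, so \B rejects it and the match can only start at the second digit; anchoring on num: avoids that.

```python
import re

# Python's re stands in for the VBA/PCRE engine here (assumption).
# First alternative of the question's pattern: digits that do not start at a word boundary.
PARTIAL = re.compile(r"\B\d+\b")
# Hypothetical fix: anchor the digits to the field name and capture them in group 1.
ANCHORED = re.compile(r"num:\{(\d+)\}")

LINE = "{ num:{3025285000} }"

# Between "{" and "3" there is a word boundary, so \B fails there and the
# match can only begin at the following digit.
print(PARTIAL.search(LINE).group(), PARTIAL.search(LINE).span())  # 025285000 (8, 17)
print(ANCHORED.search(LINE).group(1))                              # 3025285000
```

The offsets 8-17 are the same ones regex101 reports for Match 1 in the question.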

If you want to match your values that way, you could make the match more specific by matching num, iPadd, iPv4add or xTime and then using a capturing group to capture what is between { and }.

Inside the capturing group you could either match 1+ digits using (\d+), or use a negated character class ([^{}]+) to match all characters except { and }.

The result will have 4 capturing groups.

num:{(\d+)}|iPadd:{([^{}]+)}|iPv4add:{([^{}]+)}|xTime:{([^{}]+)}
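As a rough illustration of how the four groups behave, here is a Python sketch (Python's re standing in for the VBA/PCRE engine, which is an assumption; matches_with_group and SAMPLE are hypothetical names). Each match fills exactly one group, so m.lastindex tells you which field was found.

```python
import re

# Python's re module is used only as a stand-in for the VBA/PCRE engine (assumption).
# The answer's four alternatives; braces escaped so they are unambiguously literal.
PATTERN = re.compile(
    r"num:\{(\d+)\}|iPadd:\{([^{}]+)\}|iPv4add:\{([^{}]+)\}|xTime:\{([^{}]+)\}"
)

# The question's input, one line per field group.
SAMPLE = "\n".join([
    "{ num:{3025285000} }",
    "{ ipadd:{iPadd:{iPv4add:{111.111.112.223} } } }",
    "{ ipval:{iPadd:{13.100.1.11} } }",
    "{ :{abc:{-} ddf:{-} mainVALUplink:{139161} mainVALDownlink:{6269538} "
    "kppacRR:{bbckdo} xTime:{2019-07-22 18:09:55 -03:00 } ppwo:{-} wwe:{-} "
    "iiurur:{qCI:{8} wie:{-} iiwww:{-} oop:{-} } } }",
])


def matches_with_group(text):
    """Return (group number, captured text) for every match, in input order."""
    result = []
    for m in PATTERN.finditer(text):
        # Only one of the four groups takes part in each match; lastindex is its number.
        result.append((m.lastindex, m.group(m.lastindex)))
    return result


for group_no, value in matches_with_group(SAMPLE):
    print(group_no, repr(value))
```

With the question's input this yields groups 1, 3, 2 and 4 in that order. The xTime value comes back as '2019-07-22 18:09:55 -03:00 ' with a trailing space, because [^{}]+ also matches the space before the closing }, so you may want to trim it.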


3 Comments

Hello, thanks for your help and solution. It seems to work for PCRE. The only thing is that it doesn't capture the uplink or downlink values. I think it could be similar to how you captured num, right?
@GerCas That is correct; it would look like num:{(\d+)}|iPadd:{([^{}]+)}|iPv4add:{([^{}]+)}|xTime:{([^{}]+)}|mainVALUplink:{(\d+)}|mainVALDownlink:{(\d+)} (regex101.com/r/YImYKM/2). Is the data always in this order, with all the values present and the newlines in consistent places? If so, you could use a more efficient match.
Thanks for the addition. Actually, the different strings are in different columns of the same line, and there are several lines with the same format. That's why I was thinking of using a single regex to remove the unneeded data, and why I was trying the | (or) sign, i.e. alternation, as you said.
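Since each record is described as a single line with the fields in different columns, one possible way to use the six-group pattern from the comment is to run it over each line and collect the captured values into a dictionary. This is a Python sketch under assumptions: Python's re stands in for VBA, FIELDS, extract_record and extract_lines are hypothetical names, and ROW is just the question's input joined onto one line.

```python
import re

# Python's re stands in for the VBA/PCRE engine (assumption).
# The six alternatives from the comment, braces escaped.
PATTERN = re.compile(
    r"num:\{(\d+)\}"
    r"|iPadd:\{([^{}]+)\}"
    r"|iPv4add:\{([^{}]+)\}"
    r"|xTime:\{([^{}]+)\}"
    r"|mainVALUplink:\{(\d+)\}"
    r"|mainVALDownlink:\{(\d+)\}"
)

# Hypothetical labels for groups 1..6, in pattern order.
FIELDS = ("num", "ip", "ip", "xTime", "uplink", "downlink")

# The question's input joined onto a single line, treated as one record.
ROW = (
    "{ num:{3025285000} } { ipadd:{iPadd:{iPv4add:{111.111.112.223} } } } "
    "{ ipval:{iPadd:{13.100.1.11} } } "
    "{ :{abc:{-} ddf:{-} mainVALUplink:{139161} mainVALDownlink:{6269538} "
    "kppacRR:{bbckdo} xTime:{2019-07-22 18:09:55 -03:00 } ppwo:{-} wwe:{-} "
    "iiurur:{qCI:{8} wie:{-} iiwww:{-} oop:{-} } } }"
)


def extract_record(line):
    """Map each field label to the list of values found in one line."""
    record = {}
    for m in PATTERN.finditer(line):
        # Exactly one group takes part in each match; lastindex is its number.
        label = FIELDS[m.lastindex - 1]
        # strip() removes the trailing space that [^{}]+ keeps for xTime.
        record.setdefault(label, []).append(m.group(m.lastindex).strip())
    return record


def extract_lines(text):
    """Apply extract_record to every non-empty line (one record per line)."""
    return [extract_record(line) for line in text.splitlines() if line.strip()]


print(extract_record(ROW))
```

Keeping one group per alternative is what makes the lastindex lookup work; if an alternative ever gained a second group, the label mapping would need to change.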

Since each expected result is on a line of its own, the logical solution would be to split the input line-by-line, and then apply simpler individual regexes to them to capture what is needed.
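The line-by-line approach could look roughly like this. It is a sketch, not the answer's code: Python's re stands in for VBA, FIELD_PATTERNS and extract_by_line are hypothetical names, and the per-field patterns are a simplification written for the sample input only.

```python
import re

# Python's re stands in for the VBA/PCRE engine (assumption).
# Hypothetical per-field patterns; each one is simple because it only looks for one field.
FIELD_PATTERNS = {
    "num": re.compile(r"num:\{(\d+)\}"),
    "ip": re.compile(r"(?:iPv4add|iPadd):\{([\d.]+)\}"),
    "uplink": re.compile(r"mainVALUplink:\{(\d+)\}"),
    "downlink": re.compile(r"mainVALDownlink:\{(\d+)\}"),
    # Lazy capture plus \s* so the space before "}" is not included.
    "xTime": re.compile(r"xTime:\{\s*([^{}]*?)\s*\}"),
}

SAMPLE = "\n".join([
    "{ num:{3025285000} }",
    "{ ipadd:{iPadd:{iPv4add:{111.111.112.223} } } }",
    "{ ipval:{iPadd:{13.100.1.11} } }",
    "{ :{abc:{-} ddf:{-} mainVALUplink:{139161} mainVALDownlink:{6269538} "
    "kppacRR:{bbckdo} xTime:{2019-07-22 18:09:55 -03:00 } ppwo:{-} wwe:{-} "
    "iiurur:{qCI:{8} wie:{-} iiwww:{-} oop:{-} } } }",
])


def extract_by_line(text):
    """Split the input into lines and apply every field pattern to each line."""
    result = {name: [] for name in FIELD_PATTERNS}
    for line in text.splitlines():
        for name, pattern in FIELD_PATTERNS.items():
            # findall returns the single capturing group's text for each match.
            result[name].extend(pattern.findall(line))
    return result


print(extract_by_line(SAMPLE))
```

Each pattern stays short, and adding or dropping a field is a one-line change to the dictionary.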

However, the following regex, though lengthy, works as expected:

num:\s*{\s*(\d+)\s*}[\s\S]+?(?:ip\S+:{\s*([\d\.]+)\s*}[\s\S]+?)[\s\S]+?(?:ip\S+:{\s*([\d\.]+)\s*}[\s\S]+?)[\s\S]+?Uplink:\s*{\s*(\d+)\s*}[\s\S]+?Downlink:\s*{\s*(\d+)}\s*[\s\S]+xTime:{\s*([\s\d-:]*?)\s*}

Each capture group contains the required data.
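To check the capture groups outside regex101, the pattern above can also be run from Python. This is a sketch under two assumptions: Python's re stands in for the PCRE/VBA engine, and [\s\d-:] is written as [\s\d\-:] because Python rejects \d-: as a bad character range (the braces are also escaped, which does not change their meaning). PATTERN, SAMPLE and extract_fields are illustrative names.

```python
import re

# Python's re stands in for the PCRE/VBA engine (assumption).
# Same pattern as the answer, split over several lines; only the escaping differs:
# braces are escaped and the "-" in [\s\d-:] is escaped for Python.
PATTERN = re.compile(
    r"num:\s*\{\s*(\d+)\s*\}[\s\S]+?"
    r"(?:ip\S+:\{\s*([\d\.]+)\s*\}[\s\S]+?)[\s\S]+?"
    r"(?:ip\S+:\{\s*([\d\.]+)\s*\}[\s\S]+?)[\s\S]+?"
    r"Uplink:\s*\{\s*(\d+)\s*\}[\s\S]+?"
    r"Downlink:\s*\{\s*(\d+)\}\s*[\s\S]+"
    r"xTime:\{\s*([\s\d\-:]*?)\s*\}"
)

SAMPLE = "\n".join([
    "{ num:{3025285000} }",
    "{ ipadd:{iPadd:{iPv4add:{111.111.112.223} } } }",
    "{ ipval:{iPadd:{13.100.1.11} } }",
    "{ :{abc:{-} ddf:{-} mainVALUplink:{139161} mainVALDownlink:{6269538} "
    "kppacRR:{bbckdo} xTime:{2019-07-22 18:09:55 -03:00 } ppwo:{-} wwe:{-} "
    "iiurur:{qCI:{8} wie:{-} iiwww:{-} oop:{-} } } }",
])


def extract_fields(text):
    """Return the six groups (num, ip1, ip2, uplink, downlink, xTime), or None."""
    m = PATTERN.search(text)
    return m.groups() if m else None


print(extract_fields(SAMPLE))
```

Because the whole record is matched in one go, the groups always come back in the same positions, but the pattern relies on the fields appearing in this order.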


3 Comments

Hi CinCout. Thanks for your solution. I see that your regex doesn't use alternation (|). I thought different matches were only possible with alternation. What would your solution look like if it also had to match the uplink and downlink values?
@GerCas I updated the answer to include Uplink and Downlink.
Thanks so much, it works fine. I only accepted the other answer because it is shorter.
