
I am trying to parse a log file and replace all the IP addresses in it. Replacing them is straightforward, but I would like each replacement to keep track of which IP address it stands for, so that the same address always gets the same label. For example, consider this log file:

abcdef 192.168.1.1
kbckdbc 10.10.10.10
abcdef 192.168.1.1
yuosdj 100.100.100.100

I would like the output to be:

abcdef IP_1
kbckdbc IP_2
abcdef IP_1
yuosdj IP_3

How can I achieve this?

Here is what I have so far:

import re

ip_list = []
_IP_RE = re.compile(r"[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}")
counter = 0

with open('logfile.txt', 'r') as f1:
    for line in f1:
        # use the compiled pattern instead of repeating the regex
        for matchedip in _IP_RE.findall(line):
            if matchedip in ip_list:
                # already seen: reuse its 1-based position in the list
                matchedip = '<IP_Address_' + str(ip_list.index(matchedip) + 1) + '>'
            else:
                counter = counter + 1
                ip_list.append(matchedip)
                matchedip = '<IP_Address_' + str(counter) + '>'
            print(matchedip)

Here is a test file:

2018-09-13 19:00:00,317 INFO  -util.SSHUtil: Waiting for channel close
2018-09-13 19:00:01,317 INFO  -util.SSHUtil: Waiting for channel close
2018-09-13 19:00:01,891 INFO  -filters.BasicAuthFilter: Client IP:192.168.100.98
2018-09-13 19:00:01,891 INFO  -filters.BasicAuthFilter: Validating token ... 
2018-09-13 19:00:01,892 INFO  -authentication.Tokenization: Token:192.168.100.98:20180913_183401is present in map
2018-09-13 19:00:01,892 INFO  -configure.ConfigStatusCollector: status.
2018-09-13 19:00:01,909 INFO  -filters.BasicAuthFilter: Client IP:192.168.100.98
2018-09-13 19:00:01,909 INFO  -filters.BasicAuthFilter: Validating token ... 
2018-09-13 19:00:01,910 INFO  -authentication.Tokenization: Token:192.168.100.98:20180913_183401is present in map
2018-09-13 19:00:01,910 INFO  -restadapter.ConfigStatusService: configuration status.
2018-09-13 19:00:01,910 INFO  -configure.Collector: Getting configuration status.
2018-09-13 19:00:02,318 INFO  -util.SSHUtil: Processing the ssh command execution results standard output.
2018-09-13 19:00:02,318 INFO  -util.SSHUtil: Processing the ssh command execution standard error.
2018-09-13 19:00:02,318 INFO  -util.SSHUtil: Remote command using SSH execution status: Host     : [10.2.251.129]   User     : [root]   Password : [***********]    Command  : [shell ntpdate -u 132.132.0.88]  STATUS   : [0]
2018-09-13 19:00:02,318 INFO  -util.SSHUtil:    STDOUT   : [Shell access is granted to root
            14 Sep 01:00:01 ntpdate[16063]: adjust time server 132.132.0.88 offset 0.353427 sec
]
2018-09-13 19:00:02,318 INFO  -util.SSHUtil:    STDERR   : []
2018-09-13 19:00:02,318 INFO  -util.SSHUtil: Successfully executed remote command using SSH.
2018-09-13 19:00:02,318 INFO  Successfully executed the command on VCenter :10.2.251.129
  • Not an answer but a suggestion: since IPv4 addresses are 32-bit numbers, would it be a problem to encode the number itself in each identifier (as hex, base36, whatever)? For instance, 192.168.1.1 (0xc0a80101) would become IP_c0a80101 (hex) or IP_1hge135 (base36). Not as human-readable as IP_1, IP_2, etc., but deterministic and consistent.
    – xbug
    Commented Sep 25, 2018 at 2:16
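
A minimal sketch of the encoding suggested in this comment, assuming Python 3 and the standard `ipaddress` module. The names `to_base36` and `ip_token` are made up for illustration. Note that such a token can be decoded back to the address, so it labels the IP consistently rather than hiding it.

```python
import ipaddress

DIGITS = "0123456789abcdefghijklmnopqrstuvwxyz"

def to_base36(n):
    """Encode a non-negative integer in base 36 (hypothetical helper)."""
    if n == 0:
        return "0"
    out = []
    while n:
        n, r = divmod(n, 36)
        out.append(DIGITS[r])
    return "".join(reversed(out))

def ip_token(ip, base=16):
    """Build a deterministic label from the 32-bit value of an IPv4 address."""
    n = int(ipaddress.IPv4Address(ip))  # raises ValueError for an invalid address
    body = format(n, "08x") if base == 16 else to_base36(n)
    return "IP_" + body

print(ip_token("192.168.1.1"))           # IP_c0a80101
print(ip_token("192.168.1.1", base=36))  # IP_1hge135
```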

2 Answers


You can use a dictionary:

val = {}
result = ''
count = 1
content = """
abcdef 192.168.1.1
kbckdbc 10.10.10.10
abcdef 192.168.1.1
yuosdj 100.100.100.100
sadfsdf 192.168.1.1
newstuff 342.344.23.2
yuosdj 100.100.100.100
"""

data = [i.split() for i in filter(None, content.split('\n'))]
for a, b in data:
  if b not in val:
    result += f'{a} {count}\n'
    val[b] = count
    count += 1
  else:
    result += f'{a} {val[b]}\n'

print(result)

Output:

abcdef 1
kbckdbc 2
abcdef 1
yuosdj 3
sadfsdf 1
newstuff 4
yuosdj 3

Edit: to update the IPs in the file, you can use re:

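import typing, re

# raw string, so the backslashes reach the regex engine unchanged
IP_PATTERN = r'\d+\.\d+\.\d+\.\d+'

def change_ips(ips:typing.List[str]) -> typing.Generator[str, None, None]:
   val = {}
   count = 1
   for i in ips:
     if i not in val:
       yield f'IP_{count}'
       val[i] = count
       count += 1
     else:
       yield f'IP_{val[i]}'


# read the whole file first, then reopen it for writing
with open('filename.txt') as f:
  content = f.read()

# caveat: this relies on str.format, so a literal '{' or '}' in the log would break it
with open('filename.txt', 'w') as f1:
  f1.write(re.sub(IP_PATTERN, '{}', content).format(*change_ips(re.findall(IP_PATTERN, content))))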

Output:

2018-09-13 19:00:00,317 INFO  -util.SSHUtil: Waiting for channel close
2018-09-13 19:00:01,317 INFO  -util.SSHUtil: Waiting for channel close
2018-09-13 19:00:01,891 INFO  -filters.BasicAuthFilter: Client IP:IP_1
2018-09-13 19:00:01,891 INFO  -filters.BasicAuthFilter: Validating token ... 
2018-09-13 19:00:01,892 INFO  -authentication.Tokenization: Token:IP_1:20180913_183401is present in map
2018-09-13 19:00:01,892 INFO  -configure.ConfigStatusCollector: status.
2018-09-13 19:00:01,909 INFO  -filters.BasicAuthFilter: Client IP:IP_1
2018-09-13 19:00:01,909 INFO  -filters.BasicAuthFilter: Validating token ... 
2018-09-13 19:00:01,910 INFO  -authentication.Tokenization: Token:IP_1:20180913_183401is present in map
2018-09-13 19:00:01,910 INFO  -restadapter.ConfigStatusService: configuration status.
2018-09-13 19:00:01,910 INFO  -configure.Collector: Getting configuration status.
2018-09-13 19:00:02,318 INFO  -util.SSHUtil: Processing the ssh command execution results standard output.
2018-09-13 19:00:02,318 INFO  -util.SSHUtil: Processing the ssh command execution standard error.
2018-09-13 19:00:02,318 INFO  -util.SSHUtil: Remote command using SSH execution status: Host     : [IP_2]   User     : [root]   Password : [***********]    Command  : [shell ntpdate -u IP_3]  STATUS   : [0]
2018-09-13 19:00:02,318 INFO  -util.SSHUtil:    STDOUT   : [Shell access is granted to root
            14 Sep 01:00:01 ntpdate[16063]: adjust time server IP_3 offset 0.353427 sec
]
2018-09-13 19:00:02,318 INFO  -util.SSHUtil:    STDERR   : []
2018-09-13 19:00:02,318 INFO  -util.SSHUtil: Successfully executed remote command using SSH.
2018-09-13 19:00:02,318 INFO  Successfully executed the command on VCenter :IP_2
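
If the log can contain literal `{` or `}`, the `str.format` step would fail. A rough alternative, assuming the same loose pattern, is to pass a function to `re.sub` so each match is replaced directly. `mask_ips` and its `label` parameter are hypothetical names, not part of the answer above.

```python
import re

# Same loose IPv4 pattern as above; it does not check that octets are <= 255.
IP_RE = re.compile(r"\d+\.\d+\.\d+\.\d+")

def mask_ips(text, label="IP_{}"):
    """Replace each distinct IP with a numbered label, in order of first appearance."""
    seen = {}  # IP string -> 1-based index

    def repl(match):
        ip = match.group(0)
        # setdefault stores the next index only the first time an IP is seen
        idx = seen.setdefault(ip, len(seen) + 1)
        return label.format(idx)

    return IP_RE.sub(repl, text), seen

masked, mapping = mask_ips("Host [10.2.251.129] {cmd} 132.132.0.88\nVCenter :10.2.251.129\n")
print(masked)
print(mapping)
```

Passing `label="<IP_Address_{}>"` should give the `<IP_Address_1>` style asked for in the question, and the returned mapping can be kept if the original addresses need to be looked up later.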
  • And where am I using the regex here? Also, does it work if you have two or three IPs per line?
    – PanDe
    Commented Sep 25, 2018 at 2:37
  • @PetPan You do not need regex for this problem. The solution will work with multiple different IPs. Please see my recent edit.
    – Ajax1234
    Commented Sep 25, 2018 at 2:42
  • I take it back, this solution doesn't work properly for me. I have updated the original post with a relevant test file.
    – PanDe
    Commented Sep 25, 2018 at 5:11
  • @PetPan Sorry for the late response. What is your desired output from your new input? Does the data after INFO take the place of the "IP"?
    – Ajax1234
    Commented Sep 25, 2018 at 5:15
  • No, just the IP part needs to be masked as <IP_address_1>, <IP_address_2>, etc.
    – PanDe
    Commented Sep 25, 2018 at 5:22

Here is my suggestion. This will work if there is only one IP per line.

import re

ip2idx = {}
_IP_RE = re.compile(r"[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}")
counter = 1

with open('logfile.txt') as f1:
    for line in f1:
        line = line.rstrip()
        # use the compiled regex
        m = _IP_RE.search(line)
        if m:
            ip = m.group(0)
            idx = ip2idx.get(ip)
            if idx is None:
                ip2idx[ip] = counter
                idx = counter
                counter += 1
            # replace only the matched span with its label
            line = line[:m.start()] + 'IP_' + str(idx) + line[m.end():]

        # print every line, including those without an IP
        print(line)
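
For lines that hold several IPs (as in the test file from the question), the same slicing idea can be extended with `finditer`. This is only a sketch under the same regex assumption, and `mask_lines` is a hypothetical name.

```python
import re

IP_RE = re.compile(r"[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}")

def mask_lines(lines):
    """Yield each line with every IP replaced; lines without an IP pass through."""
    ip2idx = {}  # shared across lines so labels stay consistent
    for line in lines:
        line = line.rstrip("\n")
        parts = []
        last = 0
        for m in IP_RE.finditer(line):
            idx = ip2idx.setdefault(m.group(0), len(ip2idx) + 1)
            parts.append(line[last:m.start()])  # text before this match
            parts.append("IP_" + str(idx))
            last = m.end()
        parts.append(line[last:])  # text after the last match
        yield "".join(parts)

sample = [
    "Host : [10.2.251.129] Command : [shell ntpdate -u 132.132.0.88]\n",
    "STDERR : []\n",
    "VCenter :10.2.251.129\n",
]
for out in mask_lines(sample):
    print(out)
```

Because `mask_lines` accepts any iterable of strings, an open file object can be passed in place of `sample` to process a log without reading it all into memory.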
