I want to find repeated copies of a config section within a partition dump (a binary file), using a pattern and a 'magic' header. The config section always starts with 202 '0xff' bytes followed by the 4 bytes '\x00\x00\x23\x27'. The script should identify the different copies of the configuration within the partition and print the addresses (byte offsets) where occurrences of the pattern start. I adapted an existing Python script for my pattern, but it doesn't work; it just throws errors because it mixes bytes with strings. How can I fix this script?

#!/usr/bin/env python3
import re
import mmap
import sys

magic = '\xff' * 202
pattern = magic + '\x00\x00\x23\x27'

fh = open(sys.argv[1], "r+b")
mf = mmap.mmap(fh.fileno(), 0)
mf.seek(0)
fh.seek(0)
for occurence in re.finditer(pattern, mf):
    print(occurence.start())
mf.close()
fh.close()

errors:

$ ./matcher.py dump.bin
Traceback (most recent call last):
  File "/home/eviecomp/BC2UTILS/dump_previous_profile/./matcher.py", line 13, in <module>
    for occurence in re.finditer(pattern, mf):
  File "/usr/lib/python3.9/re.py", line 248, in finditer
    return _compile(pattern, flags).finditer(string)
TypeError: cannot use a string pattern on a bytes-like object

[Image: pattern and magic]


1 Answer


While re can search byte strings (as the error message says, the pattern just needs to be a bytes object, not a str), it seems overkill here. For a fixed byte sequence, mmap's own find method is enough:

#!/usr/bin/env python3
import mmap
from sys import argv

# NOTE: important to use `b''` literals!
magic = b'\xff' * 202
pattern = magic + b'\x00\x00\x23\x27'


with open(argv[1], "r+b") as fh:
  with mmap.mmap(fh.fileno(), 0) as mm:
    pos = -1
    while -1 != (pos := mm.find(pattern, pos + 1)):
      print(pos)

Or, in more modern Python style, you can wrap the search in a generator and iterate over the matches:

from mmap import mmap
from typing import Generator
from sys import argv

def positions(mm: mmap, pattern: bytes) -> Generator[int, None, None]:
  pos = -1
  while -1 != (pos := mm.find(pattern, pos + 1)):
    yield pos

pattern = b'\xff' * 202 + b'\x00\x00\x23\x27'

with open(argv[1], "r+b") as lfile:
  with mmap(lfile.fileno(), 0) as mapping:
    all_positions = ", ".join(f"{pos:#0x}" for pos in positions(mapping, pattern))

print(all_positions)
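
For completeness, here is a minimal sketch of the re-based approach from the question with the str/bytes mix-up fixed. The helper name find_with_re and the synthetic sample buffer are made up for illustration; only the pattern comes from the question. Note that re.finditer reports non-overlapping matches, which should make no difference for this particular pattern.

```python
import re

# Same pattern as in the question, but built from bytes literals:
# 202 bytes of 0xff followed by the 4-byte marker.
PATTERN = b'\xff' * 202 + b'\x00\x00\x23\x27'


def find_with_re(data):
    """Return the start offset of every non-overlapping match of PATTERN."""
    # re.escape protects bytes that happen to be regex metacharacters
    # (0x23 is '#', for example).
    regex = re.compile(re.escape(PATTERN))
    return [match.start() for match in regex.finditer(data)]


# Small synthetic buffer standing in for a real dump (hypothetical data).
sample = b'\x00' * 16 + PATTERN + b'\x11' * 32 + PATTERN
print(find_with_re(sample))
```

Since the original traceback shows that re treats the mmap object as bytes-like, passing the mapping instead of a bytes object should work the same way.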
  • The 1st variant gives:
    $ ./matcher.py dump.bin
    Traceback (most recent call last):
      File "/home/eviecomp/BC2UTILS/dump_previous_profile/./matcher.py", line 9, in <module>
        with open(sys.argv[1], "r+b") as fh:
    NameError: name 'sys' is not defined
    and the 2nd variant:
    $ ./matcher.py dump.bin
    Traceback (most recent call last):
      File "/home/eviecomp/BC2UTILS/dump_previous_profile/./matcher.py", line 12, in <module>
        with open("largefile", "r+b") as lfile:
    FileNotFoundError: [Errno 2] No such file or directory: 'largefile'
    – minto
    Commented Jul 6, 2023 at 21:53
  • Sorry! I was assuming your Python was fluent enough to replace strings by argv entries yourself as needed (and literally pointed out by the second error message). Fixed that. Commented Jul 6, 2023 at 21:56
  • The 1st variant works as expected, thank you! The 2nd variant, however, prints nothing and raises no errors. There is probably a typo in the pattern: b'x2ff'. It should be b'xff', but fixing this doesn't change anything.
    – minto
    Commented Jul 7, 2023 at 10:09
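
One way to narrow down a variant that prints nothing is to run the same search against a small file whose contents are known. The sketch below is an assumption-laden sanity check, not part of the answer: scan_file and the synthetic file layout are invented here, and the file is mapped read-only (mmap.ACCESS_READ) because nothing is written.

```python
import mmap
import os
import tempfile

PATTERN = b'\xff' * 202 + b'\x00\x00\x23\x27'


def positions(mm, pattern):
    """Yield the offset of each occurrence of pattern in the mapping."""
    pos = -1
    while -1 != (pos := mm.find(pattern, pos + 1)):
        yield pos


def scan_file(path):
    """Return all offsets of PATTERN in the file at path (hypothetical helper)."""
    with open(path, "rb") as fh:
        with mmap.mmap(fh.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            # Materialize the results before the mapping is closed.
            return list(positions(mm, PATTERN))


# Build a synthetic "dump" containing two copies at known offsets.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b'\x00' * 100 + PATTERN + b'\xaa' * 50 + PATTERN)
    path = tmp.name

try:
    found = scan_file(path)
finally:
    os.remove(path)

print([f"{offset:#x}" for offset in found])
```

If this reports both offsets but the real dump yields nothing, the search code is likely fine and the pattern itself is the thing to compare against the actual bytes in the file.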
