I am trying to convert some code from Python 2 to Python 3. In this section of the code, data is received from a socket. 'data' is initialized as an empty string and each received chunk is concatenated onto it. This is where the Python 2 to 3 distinction matters. In Python 3, the 'received' variable is of type bytes, so it seems it has to be converted to a string first with str(). However, str() needs an encoding parameter. What would the default encoding be in Python 2? I've tried several encodings (latin-1 and others), but the result never seems to match the magic value that is defined in the code and checked after unpacking.

"\xffSMB" seems to decode correctly, while "\xfeSMB" does not.

I have adjusted the non_polling_read function and the NetBIOSSessionPacket class as follows. Full modified source code available on GitHub.

    def non_polling_read(self, read_length, timeout):
        data = b''
        bytes_left = read_length

        while bytes_left > 0:
            try:
                ready, _, _ = select.select([self._sock.fileno()], [], [], timeout)

                if not ready:
                    raise NetBIOSTimeout

                received = self._sock.recv(bytes_left)
                if len(received) == 0:
                    raise NetBIOSError('Error while reading from remote', ERRCLASS_OS, None)

                data = data + received
                bytes_left = read_length - len(data)
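            except select.error as ex:
                # In Python 3, select.error is an alias of OSError, which is not subscriptable; use ex.errno
                if ex.errno != errno.EINTR and ex.errno != errno.EAGAIN:
                    raise NetBIOSError('Error occurs while reading from remote', ERRCLASS_OS, ex.errno)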
        return data

class NetBIOSSessionPacket:
    def __init__(self, data=0):
        self.type = 0x0
        self.flags = 0x0
        self.length = 0x0
        if data == 0:
            self._trailer = b''
        else:
            try:
                self.type = data[0]
                if self.type == NETBIOS_SESSION_MESSAGE:
                    self.length = data[1] << 16 | (unpack('!H', data[2:4])[0])
                else:
                    self.flags = data[1]
                    self.length = unpack('!H', data[2:4])[0]
                self._trailer = data[4:]
            except Exception as e:
                import traceback
                traceback.print_exc()
                raise NetBIOSError('Wrong packet format ')
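
For reference, here is a minimal standalone sketch of the same header parsing on a Python 3 bytes object. It is not the library's code: parse_session_header is a hypothetical helper name, and the value 0x00 for NETBIOS_SESSION_MESSAGE is an assumption taken from the NetBIOS session service format. The point it illustrates is that indexing bytes in Python 3 already yields an int, so no ord() or str() conversion is needed.

```python
import struct

# Assumption: session message type is 0x00, as in the NetBIOS session service.
NETBIOS_SESSION_MESSAGE = 0x00


def parse_session_header(data: bytes):
    """Split a NetBIOS session packet into (type, flags, length, trailer).

    Hypothetical helper mirroring the logic of NetBIOSSessionPacket.__init__.
    """
    if len(data) < 4:
        raise ValueError('Wrong packet format: header needs 4 bytes')

    packet_type = data[0]  # indexing bytes gives an int in Python 3
    flags = 0
    if packet_type == NETBIOS_SESSION_MESSAGE:
        # The second byte holds the high bits of the length field.
        length = (data[1] << 16) | struct.unpack('!H', data[2:4])[0]
    else:
        flags = data[1]
        length = struct.unpack('!H', data[2:4])[0]

    # The trailer stays bytes; it is never decoded to str.
    return packet_type, flags, length, data[4:]
```

Calling it with b'\x00\x00\x00\x04\xffSMB' returns the trailer b'\xffSMB' unchanged, which is what the later signature check receives.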

When I start the server and issue 'smbclient -L 127.0.0.1 -d 4' from the commandline, the server first creates a libs.nmb.NetBIOSTCPSession which appears to be working well. Once it tries to unwrap the libs.nmb.NetBIOSSessionPacket, it throws an exception.

Traceback (most recent call last):
  File "/root/PycharmProjects/HoneySMB/libs/smbserver.py", line 3975, in processRequest
    packet = smb.NewSMBPacket(data=data)
  File "/root/PycharmProjects/HoneySMB/libs/smb.py", line 690, in __init__
    Structure.__init__(self, **kargs)
  File "/root/PycharmProjects/HoneySMB/libs/structure.py", line 77, in __init__
    self.fromString(data)
  File "/root/PycharmProjects/HoneySMB/libs/structure.py", line 144, in fromString
    self[field[0]] = self.unpack(field[1], data[:size], dataClassOrCode = dataClassOrCode, field = field[0])
  File "/root/PycharmProjects/HoneySMB/libs/structure.py", line 288, in unpack
    raise Exception("Unpacked data doesn't match constant value %r should be %r" % (data, answer))
Exception: ("Unpacked data doesn't match constant value b'\\xffSMB' should be 'ÿSMB'", 'When unpacking field \'Signature | "ÿSMB | b\'\\xffSMBr\\x00\\x00\\x00\\x00\\x18C\\xc8\\x00\\x00\\x00\\x00\\x00\\x00\\x00\\x00\\x00\\x00\\x00\\x00\\x00\\x00\\xfe\\xff\\x00\\x00\\x00\\x00\\x00\\xb1\\x00\\x02PC NETWORK PROGRAM 1.0\\x00\\x02MICROSOFT NETWORKS 1.03\\x00\\x02MICROSOFT NETWORKS 3.0\\x00\\x02LANMAN1.0\\x00\\x02LM1.2X002\\x00\\x02DOS LANMAN2.1\\x00\\x02LANMAN2.1\\x00\\x02Samba\\x00\\x02NT LANMAN 1.0\\x00\\x02NT LM 0.12\\x00\\x02SMB 2.002\\x00\\x02SMB 2.???\\x00\'[:4]\'')

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/root/PycharmProjects/HoneySMB/libs/smbserver.py", line 3597, in handle
    resp = self.__SMB.processRequest(self.__connId, p.get_trailer())
  File "/root/PycharmProjects/HoneySMB/libs/smbserver.py", line 3979, in processRequest
    packet = smb2.SMB2Packet(data=data)
  File "/root/PycharmProjects/HoneySMB/libs/smb3structs.py", line 435, in __init__
    Structure.__init__(self,data)
  File "/root/PycharmProjects/HoneySMB/libs/structure.py", line 77, in __init__
    self.fromString(data)
  File "/root/PycharmProjects/HoneySMB/libs/structure.py", line 144, in fromString
    self[field[0]] = self.unpack(field[1], data[:size], dataClassOrCode = dataClassOrCode, field = field[0])
  File "/root/PycharmProjects/HoneySMB/libs/structure.py", line 288, in unpack
    raise Exception("Unpacked data doesn't match constant value %r should be %r" % (data, answer))
Exception: ("Unpacked data doesn't match constant value b'\\xffSMB' should be 'þSMB'", 'When unpacking field \'ProtocolID | "þSMB | b\'\\xffSMBr\\x00\\x00\\x00\\x00\\x18C\\xc8\\x00\\x00\\x00\\x00\\x00\\x00\\x00\\x00\\x00\\x00\\x00\\x00\\x00\\x00\\xfe\\xff\\x00\\x00\\x00\\x00\\x00\\xb1\\x00\\x02PC NETWORK PROGRAM 1.0\\x00\\x02MICROSOFT NETWORKS 1.03\\x00\\x02MICROSOFT NETWORKS 3.0\\x00\\x02LANMAN1.0\\x00\\x02LM1.2X002\\x00\\x02DOS LANMAN2.1\\x00\\x02LANMAN2.1\\x00\\x02Samba\\x00\\x02NT LANMAN 1.0\\x00\\x02NT LM 0.12\\x00\\x02SMB 2.002\\x00\\x02SMB 2.???\\x00\'[:4]\'')

Now it's obvious why this throws an exception. After all, 0xFF does NOT equal ÿ or þ. The question is: why is 0xFF the value it tries to put there in the first place, when it should be two different values?

The original Python 2 code seems to be quite close to C (using unpack and importing cstring). Is there an obvious benefit to this here, or could it be done more simply?

My actual question is: in the original code, there is no reference to any encoding anywhere. So how is the encoding determined? Where does it translate 0xFF to ÿ?

  • Welcome to Stack Overflow. Please read How to Ask and minimal reproducible example. We must have code in the question itself - not simply linked - which we can copy and paste, without modification, in order to see the exact problem directly. Make sure it is clear: exactly what happens when you run the code? Exactly what should happen instead (i.e., show the desired output), and how is that different? Commented Jul 20, 2022 at 5:29
  • "In Python3, the 'received' variable is of type Bytes and thus needs to be converted to string first via the use of str()." This is a misunderstanding. Regardless of the Python version, the data you read from the socket will be a raw sequence of bytes. The difference is that 2.x will incorrectly pretend that this data can be considered a string, while 3.x will force you to treat it as what it actually is. If you are repeatedly receiving raw byte data and want to concatenate it, then the solution is not to try and figure out an encoding, but to start with an empty bytes object. Commented Jul 20, 2022 at 5:32
  • Updating code from 2.x to work in 3.x is not a mechanical process, especially code that makes heavy use of both text strings and raw data blobs. It requires considerable thought and design and planning. Commented Jul 20, 2022 at 5:34
  • Please edit the question to limit it to a specific problem with enough detail to identify an adequate answer.
    – Community Bot
    Commented Jul 20, 2022 at 5:58
  • The encoding of a series of bytes is defined by the party sending the data on the socket, the receiver just has to be aware of the encoding to correctly interpret the data. So, Python itself doesn't apply a default re-encoding to received data, as it has no way of knowing the original encoding. You get the data as raw bytes, and you'll have to know the correct encoding the sender used to correctly interpret the received bytes.
    – Grismar
    Commented Jul 20, 2022 at 7:53
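
The point made in the comments can be sketched as follows. This is only an illustration, not the HoneySMB code: SMB1_MAGIC, SMB2_MAGIC and detect_dialect are hypothetical names. It assumes the expected constants are declared as bytes literals, so the received data is compared byte for byte and never decoded. It also shows where ÿ comes from: in Python 3 the str literal '\xff' is the code point U+00FF, which prints as ÿ, while in Python 2 the same literal was a one-byte string.

```python
# Hypothetical constants: the signatures written as bytes, not str.
SMB1_MAGIC = b'\xffSMB'
SMB2_MAGIC = b'\xfeSMB'


def detect_dialect(payload: bytes) -> str:
    """Classify a payload by its first four bytes without decoding it."""
    magic = payload[:4]
    if magic == SMB1_MAGIC:
        return 'SMB1'
    if magic == SMB2_MAGIC:
        return 'SMB2'
    return 'unknown'


# In Python 3, a str literal '\xff' is the character U+00FF (ÿ), not a byte.
text_magic = '\xffSMB'
same_text = (text_magic == 'ÿSMB')            # True: same code points

# bytes and str never compare equal, whatever their contents.
bytes_equal_text = (SMB1_MAGIC == text_magic)  # False

# latin-1 maps each byte 0x00..0xFF to the code point with the same number,
# which is why decoding with it turns 0xFF into ÿ and 0xFE into þ.
decoded = SMB1_MAGIC.decode('latin-1')
```

With bytes constants, a payload starting with b'\xffSMB' is classified as SMB1 and one starting with b'\xfeSMB' as SMB2. The second traceback in the question is then expected behaviour rather than an encoding problem, since the captured packet starts with 0xFF and is simply not an SMB2 packet.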
