8
\$\begingroup\$

So as a fun weekend project, I decided to implement from scratch without AI assistance a (probably useless) Python class for handling basic arithmetic (addition, subtraction, multiplication and division) between arbitrary-precision natural numbers in any user-defined base system.

natural.py

# TODO reimplement everything in c++, base 2^30 (use 30 bit digits), karatsuba algorithm, better division algorithm

import copy

binary = '01'
octal = '01234567'
decimal = '0123456789'
hexadecimal = '0123456789abcdef'
base64 = 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/'

class Natural:
    """
    A class to represent a nonnegative integer of arbitrary size in any base system.

    Attributes:
        digits (list): List of integers where each integer represents position in base string. It represents our arbitrary-sized nonnegative integer starting with least significant digit.
        base (str): String of arbitrary symbols. It represents a custom alphabet for our base system.
        b (int): Length of base string.
    """
    def __init__(self, input, base=decimal):
        """
        Args:
            input (Union[str, int, list]): Raw number in string, nonnegative integer, or list of chars (least significant digit must be stored first in list). If input of type str or list of chars, it can be empty or assumed to contain chars from base.
            base (str): String of arbitrary symbols. Defaults to decimal.

        Raises:
            ValueError: Raised when input is of wrong type.
        """
        self.digits = []
        self.base = base
        self.b = len(base)
        if isinstance(input, str):
            if input == '':
                self.digits.append(0)
            else:
                for i in range(len(input) - 1, -1, -1):
                    self.digits.append(self.base[:self.b].index(input[i]))
        elif isinstance(input, int):
            if input == 0:
                self.digits.append(0)
            else:
                while input > 0:
                    self.digits.append(input % self.b)
                    input //= self.b
        elif isinstance(input, list):
            self.digits = input
        else:
            raise ValueError('Your input is not of type str or int or list.')
    def __add__(self, other):
        """Performs a simple grade school addition between this Natural and other and returns its sum (inputs remain unchanged).

        Args:
            other (Natural): Another Natural to be added to this Natural.

        Returns:
            Natural: Sum of this Natural and other.
        """
        result = []
        top, bottom = (self, other) if self > other else (other, self)
        carry = False
        i = 0
        while i < bottom.size():
            x = top.digits[i] + bottom.digits[i]
            if carry:
                x += 1
                carry = False
            if x >= self.b:
                carry = True
            result.append(x - self.b if carry else x)
            i += 1
        if top.size() != bottom.size():
            while i < top.size():
                x = top.digits[i]
                if carry:
                    x += 1
                    carry = False
                if x >= self.b:
                    carry = True
                result.append(x - self.b if carry else x)
                i += 1
        if carry:
            result.append(1)
        return Natural(result, self.base)
    def __sub__(self, other):
        """Performs a simple grade school subtraction between this Natural and other and returns its difference (inputs remain unchanged).

        Args:
            other (Natural): Another Natural to be subtracted from this Natural. It must be less than or equal to this Natural.

        Raises:
            ValueError: Raised when this Natural is smaller than other.

        Returns:
            Natural: Difference of this Natural and other.
        """
        if self < other:
            raise ValueError('other is too big')
        result = []
        top, bottom = copy.deepcopy(self), other
        i = 0
        while i < bottom.size():
            x = top.digits[i] - bottom.digits[i]
            if x < 0:
                x += self.b
                # need to borrow...
                for j in range(i + 1, top.size()):
                    if top.digits[j] > 0:
                        top.digits[j] -= 1
                        for k in range(j - 1, i, -1):
                            top.digits[k] = self.b - 1
                        break
            result.append(x)
            i += 1
        if top.size() != bottom.size():
            while i < top.size():
                result.append(top.digits[i])
                i += 1
        difference = Natural(result, self.base)
        difference._strip_leading_zeros()
        return difference
    def __mul__(self, other):
        """Performs a simple grade school multiplication between this Natural and other and returns its product (inputs remain unchanged).

        Args:
            other (Natural): Another Natural to be multiplied with this Natural.

        Returns:
            Natural: Product of this Natural and other.
        """
        total = Natural(0, self.base)
        top, bottom = (self, other) if self > other else (other, self)
        a = 0 # carry
        for i in range(bottom.size()):
            result = []
            for k in range(i):
                result.append(0)
            for j in range(top.size()):
                x = top.digits[j] * bottom.digits[i] + a
                a, b = x // self.b, x % self.b
                result.append(b)
            if a > 0:
                result.append(a)
            a = 0
            total = total + Natural(result, self.base)
        total._strip_leading_zeros()
        return total
    def __pow__(self, n):
        """Returns this Natural raised to power of n.

        Args:
            n (int): Exponent represented as nonnegative integer.

        Raises:
            ValueError: Raised when n is less than zero.

        Returns:
            Natural: Result of exponentiation.
        """
        if n < 0:
            raise ValueError('n < 0')
        r = Natural(1, self.base)
        if n == 0:
            return r
        for i in range(n):
            r = r * self
        return r
    def __truediv__(self, other):
        """Performs a simple grade school division between this Natural and other and returns quotient and remainder (inputs remain unchanged).

        Args:
            other (Natural): Divisor which is nonzero.

        Raises:
            ValueError: Raised when divisor happens to be zero.

        Returns:
            tuple[Natural, Natural]: A tuple containing quotient and remainder.
        """
        if other.is_zero():
            raise ValueError('divisor cannot be zero')
        result = [0]
        dividend, divisor = self, other
        remainder = Natural(0, self.base)
        one = Natural(1, self.base)
        for i in range(dividend.size() - 1, -1, -1):
            remainder.digits.insert(0, dividend.digits[i])
            remainder._strip_leading_zeros()
            if divisor > remainder:
                result.insert(0, 0)
            else:
                k = copy.deepcopy(one)
                while divisor * k <= remainder:
                    k = k + one
                k = k - one
                result.insert(0, k.digits[0])
                remainder = remainder - divisor * k
        quotient = Natural(result, self.base)
        quotient._strip_leading_zeros()
        return quotient, remainder
    def __eq__(self, other):
        """Checks if this Natural is equal to other.

        Args:
            other (Natural): Another Natural to be compared with this Natural.

        Returns:
            bool: True if this Natural is equal to other.
        """
        if self.size() != other.size():
            return False
        for i in range(self.size()):
            if self[i] != other[i]:
                return False
        return True
    def __ne__(self, other):
        """Checks if this Natural is not equal to other.

        Args:
            other (Natural): Another Natural to be compared with this Natural.

        Returns:
            bool: True if this Natural is not equal to other.
        """
        return not self == other
    def __lt__(self, other):
        """Checks if this Natural is less than other.

        Args:
            other (Natural): Another Natural to be compared with this Natural.

        Returns:
            bool: True if this Natural is less than other.
        """
        if self.size() < other.size():
            return True
        if self.size() > other.size():
            return False
        for i in range(self.size() - 1, -1, -1):
            if self.digits[i] < other.digits[i]:
                return True
            if self.digits[i] > other.digits[i]:
                return False
        return False
    def __le__(self, other):
        """Checks if this Natural is less than or equal to other.

        Args:
            other (Natural): Another Natural to be compared with this Natural.

        Returns:
            bool: True if this Natural is less than or equal to other.
        """
        return self < other or self == other
    def __gt__(self, other):
        """Checks if this Natural is greater than other.

        Args:
            other (Natural): Another Natural to be compared with this Natural.

        Returns:
            bool: True if this Natural is greater than other.
        """
        if self.size() > other.size():
            return True
        if self.size() < other.size():
            return False
        for i in range(self.size() - 1, -1, -1):
            if self.digits[i] > other.digits[i]:
                return True
            if self.digits[i] < other.digits[i]:
                return False
        return False
    def __ge__(self, other):
        """Checks if this Natural is greater than or equal to other.

        Args:
            other (Natural): Another Natural to be compared with this Natural.

        Returns:
            bool: True if this Natural is greater than or equal to other.
        """
        return self > other or self == other
    def __getitem__(self, i):
        """Returns ith digit of this Natural.

        Args:
            i (int): Desired position.

        Returns:
            str: ith digit of this Natural.
        """
        return self.base[self.digits[i]]
    def __str__(self):
        """Returns human-readable string form of this Natural.

        Returns:
            str: String representation of this Natural.
        """
        s = ''
        for i in range(self.size()):
            s = self[i] + s
        return s
    def _strip_leading_zeros(self):
        """Internal method to strip leading zeros from digits.
        """
        zeros = 0
        while zeros < self.size()-1 and self.digits[self.size() - 1 - zeros] == 0:
            zeros += 1
        self.digits = self.digits[:self.size() - zeros]
    def size(self):
        """Returns the number of digits of this Natural in its base system.

        Returns:
            int: Number of digits.
        """
        return len(self.digits)
    def is_zero(self):
        """Returns true if this Natural is zero in its base system.

        Returns:
            bool: Indicates whether it is zero or not.
        """
        return self.size() == 1 and self.digits[0] == 0
    def change_base(self, newbase):
        """Changes underlying base to newbase and modifies digits of this Natural accordingly.

        Args:
            newbase (str): New base string.
        """
        if newbase != self.base:
            v = 0
            for i in range(self.size()):
                v += self.digits[i] * self.b**i
            self.base = newbase
            self.b = len(newbase)
            self.digits = []
            while v > 0:
                self.digits.append(v % self.b)
                v //= self.b

natural_test.py

import copy
import operator

from natural import Natural, binary, octal, decimal, hexadecimal, base64

def testBase10(op, maxn, flag=False):
    if flag:
        for i in range(maxn + 1):
            for j in range(i + 1):
                a = Natural(i)
                b = Natural(j)
                _a = copy.deepcopy(a)
                _b = copy.deepcopy(b)
                result = op(a, b)
                if result.__str__() != (op(i, j)).__str__():
                    print(f'Yuh oh, for {a} and {b} with {op}, we have {result} instead of {op(i, j)}')
                    assert False
                if a != _a or b != _b:
                    print(f'Yuh oh, for {op}, inputs {_a} and {_b} got modified to {a} and {b}')
                    assert False
        print(f'Base 10 test passed for {op} :)')
    else:
        for i in range(maxn + 1):
            for j in range(maxn + 1):
                a = Natural(i)
                b = Natural(j)
                _a = copy.deepcopy(a)
                _b = copy.deepcopy(b)
                result = op(a, b)
                if result.__str__() != (op(i, j)).__str__():
                    print(f'Yuh oh, for {a} and {b} with {op}, we have {result} instead of {op(i, j)}')
                    assert False
                if a != _a or b != _b:
                    print(f'Yuh oh, for {op}, inputs {_a} and {_b} got modified to {a} and {b}')
                    assert False
        print(f'Base 10 test passed for {op} :)')

testBase10(operator.add, 100)
testBase10(operator.sub, 100, True)
testBase10(operator.mul, 100)

def testBase16(op, maxn, flag=False):
    if flag:
        for i in range(maxn + 1):
            for j in range(i + 1):
                a = Natural(i, hexadecimal)
                b = Natural(j, hexadecimal)
                _a = copy.deepcopy(a)
                _b = copy.deepcopy(b)
                result = op(a, b)
                if result.__str__() != hex(op(i, j))[2:]:
                    print(f'Yuh oh, for {a} and {b} with {op}, we have {result} instead of {hex(op(i, j))[2:]}')
                    assert False
                if a != _a or b != _b:
                    print(f'Yuh oh, for {op}, inputs {_a} and {_b} got modified to {a} and {b}')
                    assert False
        print(f'Base 16 test passed for {op} :)')
    else:
        for i in range(maxn + 1):
            for j in range(maxn + 1):
                a = Natural(i, hexadecimal)
                b = Natural(j, hexadecimal)
                _a = copy.deepcopy(a)
                _b = copy.deepcopy(b)
                result = op(a, b)
                if result.__str__() != hex(op(i, j))[2:]:
                    print(f'Yuh oh, for {a} and {b} with {op}, we have {result} instead of {hex(op(i, j))[2:]}')
                    assert False
                if a != _a or b != _b:
                    print(f'Yuh oh, for {op}, inputs {_a} and {_b} got modified to {a} and {b}')
                    assert False
        print(f'Base 16 test passed for {op} :)')

testBase16(operator.add, 100)
testBase16(operator.sub, 100, True)
testBase16(operator.mul, 100)

def testBase2(op, maxn, flag=False):
    if flag:
        for i in range(maxn + 1):
            for j in range(i + 1):
                a = Natural(i, binary)
                b = Natural(j, binary)
                _a = copy.deepcopy(a)
                _b = copy.deepcopy(b)
                result = op(a, b)
                if result.__str__() != bin(op(i, j))[2:]:
                    print(f'Yuh oh, for {a} and {b} with {op}, we have {result} instead of {bin(op(i, j))[2:]}')
                    assert False
                if a != _a or b != _b:
                    print(f'Yuh oh, for {op}, inputs {_a} and {_b} got modified to {a} and {b}')
                    assert False
        print(f'Base 2 test passed for {op} :)')
    else:
        for i in range(maxn + 1):
            for j in range(maxn + 1):
                a = Natural(i, binary)
                b = Natural(j, binary)
                _a = copy.deepcopy(a)
                _b = copy.deepcopy(b)
                result = op(a, b)
                if result.__str__() != bin(op(i, j))[2:]:
                    print(f'Yuh oh, for {a} and {b} with {op}, we have {result} instead of {bin(op(i, j))[2:]}')
                    assert False
                if a != _a or b != _b:
                    print(f'Yuh oh, for {op}, inputs {_a} and {_b} got modified to {a} and {b}')
                    assert False
        print(f'Base 2 test passed for {op} :)')

testBase2(operator.add, 100)
testBase2(operator.sub, 100, True)
testBase2(operator.mul, 100)

for i in range(101):
    for j in range(1, 101):
        a = Natural(i)
        b = Natural(j)
        _a = copy.deepcopy(a)
        _b = copy.deepcopy(b)
        q, r = a / b
        if q.__str__() != (i//j).__str__() or r.__str__() != (i%j).__str__():
            print(f'hmm, dividing {a} by {b} results in q={q} and r={r} (expected: q={i // j} and r={i % j})')
            assert False
        if a != _a or b != _b:
            print(f'inputs {_a} and {_b} got modified to {a} and {b}')
            assert False
print('base 10 division test: pass :)')

for i in range(101):
    for j in range(1, 101):
        a = Natural(i, hexadecimal)
        b = Natural(j, hexadecimal)
        _a = copy.deepcopy(a)
        _b = copy.deepcopy(b)
        q, r = a / b
        if q.__str__() != hex(i//j)[2:] or r.__str__() != hex(i%j)[2:]:
            print(f'hmm, dividing {a} by {b} results in q={q} and r={r} (expected: q={hex(i//j)[2:]} and r={hex(i%j)[2:]})')
            assert False
        if a != _a or b != _b:
            print(f'inputs {_a} and {_b} got modified to {a} and {b}')
            assert False
print('base 16 division test: pass :)')

for i in range(101):
    for j in range(1, 101):
        a = Natural(i, binary)
        b = Natural(j, binary)
        _a = copy.deepcopy(a)
        _b = copy.deepcopy(b)
        q, r = a / b
        if q.__str__() != bin(i//j)[2:] or r.__str__() != bin(i%j)[2:]:
            print(f'hmm, dividing {a} by {b} results in q={q} and r={r} (expected: q={bin(i//j)[2:]} and r={bin(i%j)[2:]})')
            assert False
        if a != _a or b != _b:
            print(f'inputs {_a} and {_b} got modified to {a} and {b}')
            assert False
print('base 2 division test: pass :)')

x = Natural(143785)
print(x)
x.change_base(hexadecimal)
print(x)
x.change_base(binary)
print(x)
x.change_base(octal)
print(x)
x.change_base(decimal)
print(x)

# https://cs.stackexchange.com/a/10495/146253

weird = 'αβγδρζξ'

b = Natural('βξ', weird)
x1 = Natural('ββ', weird)
x2 = Natural('βζ', weird)
x3 = Natural('α', weird)
x4 = Natural('ξ', weird)
c = x1 + x2*b + x3*b**2 + x4*b**3

print(c)
c.change_base('0123456789XYZ')
print(c)

a = Natural('60Z8', '0123456789XYZ')
a.change_base(weird)
print(a)

x = Natural(1343435783, base64)
y = Natural(2456, base64)
q, r = x / y
print(f'q={q}, r={r}')

x = Natural(1343435783, binary)
y = Natural(2456, binary)
q, r = x / y
print(f'q={q}, r={r}')

q.change_base(base64)
r.change_base(base64)

print(f'q={q}, r={r}')

I am requesting a code review for both files above. Here are some of my specific concerns:

  • I wonder if I documented my class well. I am following Google-style documentation.
  • Is there anything to improve in my natural_test.py?
  • Are my tests comprehensive enough?
  • What are possible ways to optimize my code, in particular my code for methods __add__, __sub__, __mul__ and __div__.

Any other comments or suggestions are welcome!

\$\endgroup\$
8
  • 7
    \$\begingroup\$ I don't want to be all nitpicky, but natural numbers can't vary in precision, or have "arbitrary precision". The precision of natural numbers is always "1". I think you mean arbitrary size? \$\endgroup\$ Commented Jan 6 at 15:35
  • 1
    \$\begingroup\$ Feels like the change_base implementation technically "cheats" as v can grow arbitrarily large - while that's okay with Python's int (I think), it's basically using a different "arbitrarily large int", if that makes any sense, instead of doing everything digit by digit as other functions do. \$\endgroup\$ Commented Jan 7 at 13:07
  • 6
    \$\begingroup\$ I understand this is an educational project, but I would like to mention that the Python built-in int type is already an arbitrary width integer type. Even though it is fixed base-2 not arbitrary base (so base changing functionality is missing). See for instance here stackoverflow.com/questions/538551/… \$\endgroup\$ Commented Jan 7 at 15:13
  • 2
    \$\begingroup\$ Talking about bases of numbers confuses confuses internal representation with input/output. Bases only matter when interacting with humans. \$\endgroup\$ Commented Jan 8 at 14:58
  • 2
    \$\begingroup\$ @DJClayworth If you take 'precision' to include 'significant digits', then it seems fine to me? \$\endgroup\$ Commented Jan 8 at 18:53

6 Answers 6

10
\$\begingroup\$

More DRY

Without even delving deep into the code, a cursory overview shows that here is hardly any difference between testBase2, testBase10 and testBase16. It would make sense to merge them to a single, generic function with an additional argument for the base. Your class already has a base argument.

Even if you were writing unit tests, I do not really see a need for separate functions. Did I say unit tests? Yes, that would be nice and you are almost there. Suggested reading for pytest: How to parametrize fixtures and test functions.

Layout

Add line spacing between functions. Inside a class: one line. I like the docstrings.

Class properties

Consider this:

@property
def size(self):
    return len(self.digits)

Then, inside and outside your class you can omit the empty parentheses eg self.size.

As @toolic already mentioned, the string module provides some useful constants. Also some lesser-known ones:

string.hexdigits
'0123456789abcdefABCDEF'
string.octdigits
'01234567'

If you look inside the source code of that module, they are defined like this:

digits = '0123456789'
hexdigits = digits + 'abcdef' + 'ABCDEF'
octdigits = '01234567'

So nothing fancy here.

\$\endgroup\$
9
\$\begingroup\$

First of all, I'd like to congratulate you on writing this without using any AIs! Whatever you learn by doing it the hard way, will surely pay dividends later on. In my own experience, current LLMs can be useful when learning a completely new language, but are still extremely brittle for any kind of non-standard algorithms (and even sometimes for code that should be simple). But I would recommend letting (for instance) CoPilot review your code. Code reviews by CoPilot are often extremely helpful. I'd also recommend starting to use a linter, like pylint, flake8 or black and using type annotations (with mypy). Doing this will have an immediate impact on your coding style.

import copy

binary = '01'
octal = '01234567'
decimal = '0123456789'
hexadecimal = '0123456789abcdef'
base64 = 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/'

You don't need those constants. Also, in any kind of larger program, using global constants almost invariably leads to bugs. To minimize the risk of bugs, and for readability, it's better to very clearly indicate which names refer to constants, for instance, by putting them in ALL CAPS. The only constant that is special in this code is the base64 one, since with that you introduce your own convention of representation integers in base64 (so that should be documented or at least hinted at in the doc string of the class).

class Natural:

"Natural" is not a very good name. You mean BigInt. The natural numbers are the non-negative integers. The name BigInt would make clear what the purpose of this class is (to support integers of in principle arbitrary size).

"""
A class to represent a nonnegative integer of arbitrary size in any base system.

Attributes:
    digits (list): List of integers where each integer represents position in base string. It represents our arbitrary-sized nonnegative integer starting with least significant digit.
    base (str): String of arbitrary symbols. It represents a custom alphabet for our base system.
    b (int): Length of base string.
"""

It's a bit strange to not also support negative integers, but that's a choice. The attributes of this class should really be hidden since they are implementation details. There is no need for a user of the class to know about them. If so, then they should also not be documented.

In serious code you might at most want to document them in the form of comments, if their meaning or use is not clear from the code itself. Note also that in production it can be extremely annoying to your coworkers -- or your future self -- to document or comment on things that are obvious. As rule of thumb you should ask yourself:

Will I still understand this code when I read it back half a year from now?

And then

If not, shouldn't I then rewrite it so it's clearer?! Or is it just a tiny unclarity or ambiguity that can be solved by adding a short comment?

def __init__(self, input, base=decimal):

You are using a built-in function ("input") as argument name. DO NOT DO THIS. (Did I say that loud enough?) Unfortunately Python allows this, but it's extremely bad style to use built-in function names as argument or variable names. It makes the code harder to read, and can lead to various other issues. Just don't do it. (If you'd have done this in a job interview with me, it would all by itself result in dismissal of you as candidate.)

To specify the base, it seems simpler and clearer to use a single integer as base (that would also conform to how we talk of a base in math, in other words: you're here unnecessarily increasing the mental load of whoever reads the code).

        self.digits = []
        self.base = base
        self.b = len(base)
        if isinstance(input, str):
            if input == '':
                self.digits.append(0)
            else:
                for i in range(len(input) - 1, -1, -1):
                    self.digits.append(self.base[:self.b].index(input[i]))
        elif isinstance(input, int):
            if input == 0:
                self.digits.append(0)
            else:
                while input > 0:
                    self.digits.append(input % self.b)
                    input //= self.b
        elif isinstance(input, list):
            self.digits = input
        else:
            raise ValueError('Your input is not of type str or int or list.')

Okay. First the edge conditions. It's unclear why you would allow an empty string as input arg, but not for instance None. Also, apparently you allow empty lists as input. But your internal represenation of that (digits = []) differs from the representation if the input is an empty string (digits = [0]). So, that's inconsistent. Inconsistencies like this lead to bugs or crashes later on.

Generally, if you have a bunch of "if ... else ..." statements like this, the code will be more readable if you use a match statement instead. (You can import builtins, and match type(x) with case builtins.int, etc.)

To convert an int into a list of digits, it's also clearer if you simply use list(str(x)). In your handling of int inputs, I don't really know anymore what you're doing or trying to do. The code allows an input like 12345 (as int) with base 2. That will surely generate nonsensical results.

For the str-typed inputs, of course the code can also easily crash. While for the list-typed inputs, the input is simply always accepted. So, all this really needs some proper error checking.

The most important issue I have with the code so far is that there is no uniform internal representation of these big ints. That is, the internal representation in the class stays too close to the "surface" representation of the input sequence. This implies that the actual operators (add, sub etc) will also become a lot more complicated. It seems better to me to select a uniform input representation, say with internal constant base 2**32. (Of course, you could also simply use one Python int, since those can become arbitrarily large, but that kind of defeats the purpose of the exercise.) Doing so, you also already don't need to store the base anymore (it's merely an input arg). The only internal field would then be 'digits' (a list of normal integers, 0 <= x < 2**32). For each of the input types (str, list, int, or None) I would then write a little separate conversion helper function that also does proper error checking, so the match code can be tidy and neat. You actually only need one conversion function:

match type(digits):
    case builtins.str:
         self.digits = convert(list(digits), base)
    case builtins.int:
         self.digits = convert(list(reversed(str(digits))), base)
    case builtins.list:
         self.digits = convert(digits, base)
    case _:
        raise TypeError("your error msg here") 

It perhaps a little more appropriate to raise a TypeError, but ValueError is also often used in similar situations.

I'd consider simplifying the constructor code more and pushing all logic into one top-level conversion function that calls the helper function for the different cases. So, in the constructor you would just have:

self.digits = convert(digits, base)

Function 'convert' could be made a static function of the class. Doing so, makes the __init__ simpler, and makes it easier to write unit tests.

In a program like this, type annotations can help to keep the code clear and make it more readable. So, I'd consider using them.

\$\endgroup\$
4
  • 2
    \$\begingroup\$ BigUint would be more appropriate since the representation of natural numbers wouldn't involve signedness. \$\endgroup\$ Commented Jan 7 at 22:03
  • \$\begingroup\$ @Abion47 - I agree. Thanks for the suggestiong/correction! \$\endgroup\$ Commented Jan 7 at 22:56
  • \$\begingroup\$ I don't see a problem with the name Natural; it seems to go well in hand with python's Fraction and Decimal. \$\endgroup\$ Commented Jan 9 at 10:52
  • \$\begingroup\$ @Stef - I agree that this is a rather subjective (minor) criticism. My own main principle in trying to name important classes is to make the name as self-explanatory as possible given the purpose of the class. As part of a general lib of number classes "Natural" would also work, but it would still not hint at the pretty specific purpose of supporting arbitrarily sized numbers. \$\endgroup\$ Commented Jan 9 at 13:31
6
\$\begingroup\$

A few observations/suggestions.

Pattern-matching in constructor

I think we can clean up your constructor using pattern-matching. Docstring elided for brevity.

    def __init__(self, input, base=decimal):
        
        self.digits = []
        self.base = base
        self.b = len(base)
        
        match input:
            case '' | 0:
                self.digits.append(0)
            case str(_):
                for i in range(len(input) - 1, -1, -1):
                    self.digits.append(self.base[:self.b].index(input[i]))
            case int(_):
                while input > 0:
                    self.digits.append(input % self.b)
                    input //= self.b
            case list(_):
                self.digits = input
            case _:
                raise ValueError('Your input is not of type str or int or list.')         

I also think that the following might be simplified:

            case str(_):
                for i in range(len(input) - 1, -1, -1):
                    self.digits.append(self.base[:self.b].index(input[i]))

Using reversed:

            case str(_):
                for i in reversed(range(len(input))):
                    self.digits.append(self.base[:self.b].index(input[i]))

We could simply use a slice to reverse the input, but if it's very long you may not wish to create a copy of the list.

            case str(_):
                for d in input[::-1]:
                    self.digits.append(self.base[:self.b].index(d)

Why docstrings are a good idea

In the following, what is flag? I don't see anything that explains what it does.

def testBase10(op, maxn, flag=False):

Further, you have two large chunks of code in each branch of this code. Maybe this really calls for two functions? Except that they appear identical, or at least nearly so. Is this duplication really necessary?

Loops

Setting i to 0, then looping up to a known value, unconditionally incrementing by a constant factor at the end of the loop sure looks like a place to use a for loop over a range.

        result = []
        top, bottom = (self, other) if self > other else (other, self)
        carry = False
        i = 0
        while i < bottom.size():
            x = top.digits[i] + bottom.digits[i]
            if carry:
                x += 1
                carry = False
            if x >= self.b:
                carry = True
            result.append(x - self.b if carry else x)
            i += 1
        result = []
        top, bottom = (self, other) if self > other else (other, self)
        carry = False

        for i in range(bottom.size()):
            x = top.digits[i] + bottom.digits[i]

            if carry:
                x += 1
                carry = False

            if x >= self.b:
                carry = True

            result.append(x - self.b if carry else x)

Further, since you have defined __gt__:

        top, bottom = sorted([self, other], reverse=True)

__str__

Calling a.__str__() looks really un-Pythonic vs. str(a).

String construction

        s = ''
        for i in range(self.size()):
            s = self[i] + s
        return s

Let's use a generator expression and a reversed range.

        return ''.join(self[i] for i in reversed(range(self.size())))

Code organization

First off, imports. Then classes and functions. Then the code you want to execute. But you should be wrapping that in a function and then conditionally calling it.

E.g.

if __name__ == '__main__':
    main()

You may further wish to have your testing code in a separate file.

\$\endgroup\$
2
  • \$\begingroup\$ Thank you for your suggestions but regarding your suggestion for loops, I have a reason for writing this way: I need to use i in the next part of code. \$\endgroup\$ Commented Jan 6 at 5:24
  • 2
    \$\begingroup\$ i is going to be equal to bottom.size() when the loop exits. \$\endgroup\$ Commented Jan 6 at 5:44
5
\$\begingroup\$

A few suggestions, especially the last one concerning a different approach to implementation:

Error Messaging

You have in your initializer:

                for i in range(len(input) - 1, -1, -1):
                    self.digits.append(self.base[:self.b].index(input[i]))

You are clearly appending the characters of string input in reverse order to self.digits. But if any character in input is not found in self.base, the call to str.index will raise a ValueError: substring not found exception, which is not particularly meaningful to the client. Also, given how self.base and self.b are related, self.base[:self.b] can be simplified to just self.base. I would replace the above code with:

                for ch in input[::-1]:
                    idx = self.base.find(ch)
                    if idx == -1:
                        raise ValueError(
                            f'Input character {ch!r} is not valid for your specified base.'
                        )
                    self.digits.append(idx)

Naming

Your class is named Natural, but natural what? Wouldn't NaturalNumber be a better name? I would suggest that you rename variable input since it is the name of a built-in function. Likewise, decimal is the name of a standard library module. But all of the bases you have defined at global scope should have names such as DECIMAL_BASE_CHARACTERS. That is, they should be capitalized and appended with _BASE_CHARACTERS to make their semantics clearer and to let us know that they are some pre-defined constants that can be used by clients. You might make them class attributes of NaturalNumber.

PEP 8 Guidelines

You have some very long lines, especially in your docstrings, that could be broken up to keep their length <= 80. You should also have blank lines between your method definitions.

Testing

Include a test where for a given base the input contains an invalid character. This should raise a ValueError exception with a suitable message. Catch the exception and compare its message with what you would expect.

Implementation

Consider the following:

weird = 'αβγδρζξ'

b = Natural('βξ', weird)
x1 = Natural('ββ', weird)
x2 = Natural('βζ', weird)
x3 = Natural('α', weird)
x4 = Natural('ξ', weird)
c = x1 + x2*b + x3*b**2 + x4*b**3

print('weired rexult =', c)
c.change_base(decimal)
print('converted to decimal:', c)

b.change_base(decimal)
x1.change_base(decimal)
x2.change_base(decimal)
x3.change_base(decimal)
x4.change_base(decimal)
c = x1 + x2*b + x3*b**2 + x4*b**3
print('result when first converted to decimal:', c)
c.change_base(weird)
print('result when converted back to weird:', c)

Prints:

weired rexult = ζδξγρ
converted to decimal: 13346
result when first converted to decimal: 13346
result when converted back to weird: ζδξγρ

This suggests that a totally different approach could be taken where the initializer converts the input to a decimal base stored as an int. Then any operation is performed using Python's integer math. The __repr__ method then converts the internal int representation back to a string using the original base.

\$\endgroup\$
3
\$\begingroup\$

natural.py

Layout

The docstrings are fantastic. One thing I would change is that some of them are too long. For example:

digits (list): List of integers where each integer represents position in base string. It represents our arbitrary-sized nonnegative integer starting with least significant digit.

This is more reasonable:

digits (list): List of integers where each integer represents position in base string. 
               It represents our arbitrary-sized nonnegative integer starting with least significant digit.

Also consider using type hints to describe input and return types of the functions to make the code even more self-documenting.

The functions should be separated by a blank line. The black program can be used to automatically reformat the code for you.

Comments

The "TODO" comment should be removed. You can keep track of future enhancements in another file in you version control system.

Naming

The PEP 8 style guide recommends all caps for constants. For example, hexadecimal would be HEXADECIMAL.

The variable named input in the __init__ function is the same name as a Python built-in function. This can be confusing. To eliminate the confusion, rename the variable as something more specific. The first clue is that "input" has special coloring (syntax highlighting) in the question, as it does when I copy the code into my editor.

The variable name b is not too descriptive. A longer, more meaningful name would be useful. Also consider creating another function to set base along with its length.

In the change_base function, v is also a short name that does not convey much meaning.

string

Long strings like this are error-prone:

ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789

The string.ascii_letters function provides this for you.

string.digits also gives you 1234567890.

Simpler

In the __pow__ function, this line:

r = r * self

can be simplified as:

r *= self

The same is true of:

k = k + one
k = k - one

natural_test.py

Layout

Move the functions to the top after the import lines. Having them in the middle of the code interrupts the natural flow of the code (from a human readability standpoint).

Documentation

The functions could use docstrings as in your class, especially to describe the inputs. The input flag is too generic.

Magic number

The number 100 is used several times. It would be helpful to use a named constant. That would also allow the code to be more scalable.

DRY

In the testBase10 functions, this line is repeated twice:

print(f'Base 10 test passed for {op} :)')

Since it is the last line in each branch of the if/else, it can be moved after the whole if/else and out-dented.

if flag:

else:

print(f'Base 10 test passed for {op} :)')

There may be other opportunities to reduce repetition.

\$\endgroup\$
3
\$\begingroup\$

Performance

Storing an integer as a list of base B digits makes converting to a string maybe slightly faster than storing an int and converting it to string on demand (in __repr__/__str__/__format__). But it makes the math multiple orders of magnitude worse. I'd be a bit surprised if a meaningful size number (say, 64 bits of data each) took less than 100x longer to multiply done the way you're doing it than the same numbers represented as a pair of plain Python ints would take to multiply.

The correct way to do this is to store the value as an int, along with the base. Convert to a string on demand (it might be a little slower, but not meaningfully so most of the time; you just don't stringify numbers that often), and do the actual math with native ints (which are already arbitrary precision) far more efficiently.

Better code reuse and cross-type compatibility

Right now, you've implemented the forward binary operations, e.g. __add__, __mul__, which means two Naturals can work with each other. But frankly, especially once you're storing a single int for the value internally, there's very little reason not to support working with other numeric types.

There's a lot of finicky things to get right to properly support other numeric types, subclasses of your own type, etc., so that both somenumber + plainint and plainint + somenumber both work. Luckily, the CPython folks wrote the fractions.Fraction class as an exemplar for implementing your own numeric types without the massive code duplication a naive approach would involve (implementing __add__ and __radd__ individually, then doing the same for __mul__ and __rmul__, etc., etc.). Take a look at fractions.Fraction's source code, specifically how they implement all the binary operators via a single implementation for each operation that is then converted into both forward and reverse operators using the _operator_fallbacks helper. Note how _operator_fallbacks tests for concrete types that it knows how to handle for sure in the forward method (called first when you're on the left-hand side and the right side is not a subclass of your type, so you only want to do it if you're sure you're correct, and otherwise should return NotImplemented to let the right hand side try), while the reverse method is uses the number ABC tower to test for protocols (because by the time you're called on the right side, it means the left-side didn't know how to handle the operation, so you're the last chance, and should do it if you can). Once that's written, the actual operations can be much simpler, because they know the types have already been checked and converted, so they just need to do the raw math, not the argument parsing work.

\$\endgroup\$

You must log in to answer this question.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.