9
\$\begingroup\$

The purpose of this code is to find the smallest semiprime \$s = a b\$ satisfying a bunch of conditions stated in the Math.SE question What is the smallest "prime" semiprime?. The conditions are:

  • \$a\$, \$b\$ are prime numbers,
  • the reversal of \$s = a b\$ is a prime number,
  • both concatenations of \$a\$ and \$b\$ are prime numbers,
  • all six concatenations of \$a\$, \$b\$, \$s\$ are prime numbers.

The code assumes that there is at least one solution \$s \le 140000000000\$ (this is known from an earlier search) and finds the smallest solution. The code uses the simplification \$a \equiv b \equiv 5 \pmod{6}\$, \$11 \le a < b\$, which is derived in this Math.SE answer. The library sympy is used for primality testing; search took several hours, and I'm interested how can this code be optimised; any other feedback is welcome too.

import sympy

M = 140000000000     # current upper bound on s = a * b
a_choices = [ 11, ]  # growing sorted list of primes
b = 17
while 11 * b <= M:   # b increases, M doesn't increase
    repr_b = str(b)
    # try all prime a such that 11 <= a < b and a * b <= M
    for a in a_choices:
        if a * b > M:
            break
        repr_a = str(a)
        # both concatenations of a, b must be prime numbers
        repr_ab = repr_a + repr_b
        repr_ba = repr_b + repr_a
        if sympy.isprime(int(repr_ab)) and sympy.isprime(int(repr_ba)):
            s = a * b
            repr_s = str(s)
            # the reversal of s = a * b must be a prime number
            if sympy.isprime(int(''.join(reversed(repr_s)))):
                # all concatenations of a, b, s must be prime numbers
                if ( sympy.isprime(int(repr_s + repr_ab)) and
                     sympy.isprime(int(repr_s + repr_ba)) and
                     sympy.isprime(int(repr_ab + repr_s)) and
                     sympy.isprime(int(repr_ba + repr_s)) and
                     sympy.isprime(int(repr_a + repr_s + repr_b)) and
                     sympy.isprime(int(repr_b + repr_s + repr_a)) ):
                    # output a result and update the upper bound
                    M = s
                    print('found {} = {} * {}'.format(s, a, b))
    # find the next prime b = 5 (mod 6), and (if needed) append the previous prime b to a_choices
    b_old = b
    b += 6
    while not sympy.isprime(b):
        b += 6
    if b_old * b <= M:
        a_choices.append(b_old)
# report the smallest solution
print('smallest solution: {} = {}'.format(
    M, ' * '.join(map(str, sympy.factorint(M, multiple = True)))))

The output was:

found 137800963117 = 42299 * 3257783
smallest solution: 137800963117 = 42299 * 3257783

The answer by toolic suggests it would be helpful to add a comment explaining the line b = 17. I do not edit the above code to not invalidate answers; the loop on \$b\$ begins with 17 since 17 is the next prime \$\equiv 5\pmod 6\$ after 11.

\$\endgroup\$
5
  • 1
    \$\begingroup\$ Can you add the output of your program to your post? \$\endgroup\$ Commented Oct 5 at 14:22
  • \$\begingroup\$ Why Python? C or C++ will surely cut down on that runtime. \$\endgroup\$ Commented Oct 5 at 15:43
  • \$\begingroup\$ I added the output. @Reinderien I used Python to be able to use sympy.isprime on concatenations of two (5...7)-digit primes and their product and I was willing to let it run for several hours (but I'm still interested in whether it could be made to run much faster). \$\endgroup\$ Commented Oct 5 at 15:54
  • \$\begingroup\$ I assume almost all the time is spent in the six longest isprime checks? If you can find efficient ways to avoid calling isprime (that aren't already included as optimizations in isprime), that would be the only really efficient way to optimize it imo. \$\endgroup\$ Commented Oct 6 at 8:59
  • 2
    \$\begingroup\$ @Reinderien: Nowadays, Python is one of the faster options for heavy math stuff, because almost all the runtime is spent in libraries like sympy, numpy, and co. which are the same heavily-optimized, parallelized, and vectorized FORTRAN / assembly / C / C++ code you would write by hand anyway. I suspect the biggest gains are going to be algorithmitic rather than brute-force math operations. \$\endgroup\$ Commented Oct 8 at 7:01

4 Answers 4

7
\$\begingroup\$

As you only use isprime and factorint from sympy it would clean up your code to just import those.

from sympy import isprime, factorint

In this loop, b and M are invariant.

    for a in a_choices:
        if a * b > M:
            break

So we can safely express this logic using itertools.takewhile.

    for a in itertools.takewhile(lambda x: x * b <= M, a_choices):

Though you likely don't anticipate doing this, you should probably break this up into some functions, wrap the "action" of your script in a main function, and then conditionally called that function.

if __name__ == '__main__':
    main()

This will allow you to import this file into another program and use those functions you've created without running the program itself automatically.

\$\endgroup\$
7
\$\begingroup\$

any other feedback is welcome too

Here are some general coding style recommendations.

Documentation

The PEP 8 style guide recommends adding a docstring at the top of the code to describe its purpose.

"""
The purpose of this code is to find the smallest semiprime s=ab
satisfying a bunch of conditions. The conditions are:

list of conditions here

"""

Comments

There are a lot of helpful comments in the code.

It would also be helpful to add a comment explaining why you chose the number 17:

b = 17

If the code can be run with other values, it would be a good idead to take input from the user.

Simpler

Large integers are easier to read using underscores. Change:

140000000000

to:

140_000_000_000

Scientific notation works too.

This line:

print('found {} = {} * {}'.format(s, a, b))

is simpler using an f-string:

print(f'found {s} = {a} * {b}')

Since these values are large integers (like 137800963117), and they can be hard to read, you could format the output using commas:

print(f'found {s:,} = {a:,} * {b:,}')

This would print:

found 137,800,963,117 = 42,299 * 3,257,783

Layout

The indentation is great and consistent.

It would be nice to have a few blank lines added, especially after the while loop.

\$\endgroup\$
7
\$\begingroup\$

To do this a lot faster:

You want to know whether say nine numbers are all primes. The obvious way of checking this is to test if the first number is prime, if it is prime then check the second number and so on.

Now a primality check for x takes quite a while if x is indeed prime. Checking even numbers, or numbers divisible by 3, can be done much much faster. Less than one in six numbers are not divisible by 2, 3, 5, 7. 11, 13, 17, 19 or 23. So you can check that your nine numbers are not divisible by these nine numbers. That filters out all but about one in 12 million sets of numbers very quickly, most likely much quicker than just testing one number for primality.

Eventually you will find nine numbers that are actually primes and you have to do nine primality tests, but this will be very very rare.

Now the speed of converting strings to integers used to be quite irrelevant compared to the primality tests. That’s not the case anymore so it’s better to store the numbers, and a power of 10. For example if we concatenate b and a = 365 then the concatenation is b * 1000 + 365.

And last, unless s starts with 1, 3, 7 or 9, it’s reversal cannot be a prime.

\$\endgroup\$
5
\$\begingroup\$

repr

        repr_a = str(a)

I found repr_a a confusing name for this, given that we didn't call repr().


        and sympy.isprime(int(repr_a + repr_s + repr_b))

Optimizing int -> str is uninteresting, since .isprime() is where the cycles get used. I would find

        and sympy.isprime(int(f"{a}{s}{b}"))

a more natural way to express it, and easier to read.

\$\endgroup\$

You must log in to answer this question.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.