
Are there any known subquadratic algorithms for calculating the floor of the square root of an $n$-bit integer?

The naive algorithm would be something like

def sqrt(x):
    """Digit-by-digit (binary) square root: returns floor(sqrt(x))."""
    r = 0
    i = x.bit_length() // 2
    while i >= 0:
        # Cost of setting bit i of r: (r + 2^i)^2 - r^2 = 2^(i+1) r + 2^(2i)
        inc = (r << (i + 1)) + (1 << (i * 2))
        if inc <= x:
            x -= inc
            r += 1 << i
        i -= 1
    return r

This takes $O(n)$ iterations, each one involving additions that take $O(n)$ time, so it's $O(n^2)$ time overall. Is there anything faster? I know that there are special multiplication algorithms that do better than quadratic time, but I can't find anything for square roots.

  • My answer to something related might help: cs.stackexchange.com/a/37338/12052. The only problem is that part of the necessary equation would need to be found empirically, to tweak its accuracy. Commented Jan 26, 2015 at 18:55
  • @FrancescoGramano: Sorry, I don't think that helps. Commented Jan 26, 2015 at 21:55
  • By the way, is this sub-quadratic requirement part of a bigger problem? Because the difference between the simple quadratic and the complicated sub-quadratic might not be that big in practice. Or is it just of theoretical interest? Commented Jan 27, 2015 at 16:51
  • @Aryabhata Sorry, I didn't see your comment earlier. No, it's not part of a bigger problem, just curiosity. Commented Jan 28, 2015 at 15:41

2 Answers


You can use Newton's method or any of a number of other methods for finding approximations to roots of the polynomial $p(x) = x^2 - c$.

The rate of convergence for Newton's method will be quadratic, which means that the number of bits that are correct doubles in each iteration. This means $O(\lg n)$ iterations of Newton's method suffice.

Each iteration of Newton's method computes

$$x_{j+1} = x_j - \frac{x_j^2 - c}{2x_j} = \frac{x_j}{2} + \frac{c}{2x_j}.$$

The bit complexity of multiplication is $\tilde{O}(b \lg b)$ to multiply two $b$-bit integers (ignoring $\lg \lg b$ factors). The bit complexity of division (to $b$ bits of precision) is the same. Therefore, each iteration can be computed in $\tilde{O}(n \lg n)$ operations. Multiplying by the $O(\lg n)$ iterations, we find that the overall running time to compute the square root to $n$ bits of precision is $\tilde{O}(n (\lg n)^2)$. This is subquadratic.

I think a more careful analysis shows that this can be improved to $\tilde{O}(n \lg n)$ running time (by taking into account that we only need to know each $x_j$ to within about $2^j$ bits of precision, rather than $n$ bits of precision). However, even the more basic analysis already shows a running time that is clearly subquadratic.
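
For concreteness, here is a minimal all-integer sketch of this approach in Python. The function name, the initial guess, and the stopping rule are my own choices, not part of the analysis above; a production version would also grow the working precision per iteration as just described. Starting from a guess that is at least $\sqrt{c}$, the iterates decrease monotonically to $\lfloor \sqrt{c} \rfloor$, so the first non-decreasing step signals convergence.

def isqrt_newton(c):
    # Floor of the square root of a nonnegative integer c via Newton's method.
    if c < 2:
        return c
    # Initial guess: a power of two that is >= sqrt(c), from the bit length.
    x = 1 << ((c.bit_length() + 1) // 2)
    while True:
        y = (x + c // x) // 2  # Newton step for p(x) = x^2 - c, in integers
        if y >= x:
            return x  # sequence stopped decreasing: x = floor(sqrt(c))
        x = y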

  • In binary, one also has a great initial guess via the identity $x^{1/2} = 2^{\frac{1}{2} \log_2 x}$. Instead of computing the log, one can approximate $\log_2 x$ by the number of digits in $x$. E.g., $\log_2 101011 \approx 6$. Commented Jan 27, 2015 at 0:00
  • @D.W.: But aren't we looking for an integer square root? If you do the Newton's method iteration using only integer arithmetic, then we need some additional justification for the $O(\log n)$ claim, don't we? Otherwise, we are assuming sufficiently large precision already... Sorry if I am missing something obvious. Commented Jan 27, 2015 at 1:20
  • @D.W.: "The rate of convergence for Newton's method" won't be quadratic if $c = 0$, and I don't know what happens for values of $c$ that aren't non-negative reals. Your estimate for the bit complexity of multiplication is tighter than your following remark suggests. Also, we "need to know each $x_j$ to within about" $2^j$ "bits of precision". Commented Jan 27, 2015 at 4:37
  • @Aryabhata: We're not quite "looking for an integer square root"; we're looking for "the floor of the square root". You're right about the integer arithmetic issue, although the same bit complexities hold for floating-point operations. Commented Jan 27, 2015 at 4:41
  • @RickyDemer, yes, $c=0$ is a special case, because then the root of $p(x)$ has multiplicity 2, but when $c>0$ the root has multiplicity 1, so Newton's method does have quadratic convergence. I assume no one would use Newton's method to compute the square root of $c=0$ (the square root of zero is obviously zero). So what are you trying to say? Is your comment a trivial one, addressed by adding "special-case the square root of zero" to my answer, or is there something deeper here that I'm missing? Commented Jan 27, 2015 at 6:22

One problem with Newton's method is that each iteration requires a division, and division is the slowest of the basic integer operations.

Newton's method for the reciprocal square root, however, doesn't. If $x$ is the number for which you want to find $\frac{1}{\sqrt x}$, iterate:

$$r_{i+1} = \frac{1}{2} r_i (3 - x r_i^2)$$

This is often expressed as:

$$w_i = r_i^2$$ $$d_i = 1 - w_i x$$ $$r_{i+1} = r_i + \frac{r_i d_i}{2}$$

That's three multiplication operations. The division by two can be implemented as a shift-right.
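
As a quick sanity check, here is that iteration in plain Python floats. The starting value and the iteration count are my own choices; the demo assumes $1 \le x < 4$ so that the fixed initial guess $r_0 = 1/2$ lies in the iteration's basin of convergence.

def rsqrt_float(x, iterations=6):
    # Newton iteration for 1/sqrt(x): multiplications and a halving only.
    # Assumes 1 <= x < 4, so the fixed guess r = 0.5 converges.
    r = 0.5
    for _ in range(iterations):
        w = r * r
        d = 1.0 - w * x
        r = r + (r * d) / 2  # the "/ 2" becomes a right-shift in integer form
    return r

print(rsqrt_float(2.0))  # ~0.7071067811865476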

Now the problem is that $r$ is not an integer. However, you can represent it as one by implementing floating point manually, with a bunch of shift operations to compensate where appropriate.

First, let's rescale $x$:

$$x' = 2^{-2e} x$$

where we would like $x'$ to be greater than, but close to, $1$. If we run the above algorithm on $x'$ instead of $x$, we find $r = \frac{1}{\sqrt x'}$. Then, $\sqrt{x} = 2^e r x'$.
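
In code, one convenient choice of $e$ (mine, for illustration) comes straight from the bit length:

e = (x.bit_length() - 1) // 2  # largest e with 2^(2e) <= x
# Then x' = x / 2^(2e) satisfies 1 <= x' < 4.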

Now let's split $r$ into a mantissa and exponent:

$$r_i = 2^{-e_i} r'_i$$

where $r'_i$ is an integer. Intuitively, $e_i$ represents the precision of the answer.

We know that Newton's method roughly doubles the number of accurate significant digits. So we can choose:

$$e_{i+1} = 2e_i$$

With a little manipulation, we find:

$$e_{i+1} = 2e_i$$ $$w_i = {r'_i}^2$$ $$x'_i = \frac{x}{2^{2e - e_{i+1}}}$$ $$d_i = 2^{e_{i+1}} - \frac{w_i x'_i}{2^{e_{i+1}}}$$ $$r'_{i+1} = 2^{e_i} r'_i + \frac{r'_i d_i}{2^{e_i + 1}}$$

At every iteration:

$$\sqrt{x} \approx \frac{r'_i x}{2^{e + e_i}}$$

As an example, let's try calculating the square root of $x = 2^{63}$. We happen to know that the answer is $2^{31}\sqrt{2}$. The reciprocal square root is $\frac{1}{\sqrt{2}} 2^{-31}$, so we'll set $e = 31$ (this is the scale of the problem) and for our initial guess we'll pick $r'_0 = 3$ and $e_0 = 2$. (That is, we pick $\frac{3}{4}$ for our initial estimate to $\frac{1}{\sqrt{2}}$.)

Then:

$$e_1 = 4, r'_1 = 11$$ $$e_2 = 8, r'_2 = 180$$ $$e_3 = 16, r'_3 = 46338$$ $$e_4 = 32, r'_4 = 3037000481$$

We can work out when to stop iterating by comparing $e_i$ to $e$; if I've calculated correctly, $e_i > 2e$ should be good enough. We'll stop here, though, and find:

$$\sqrt{2^{63}} \approx \frac{3037000481 \times 2^{63}}{2^{31+32}} = 3037000481$$

The correct integer square root is $3037000499$, so we're pretty close. We could do another iteration, or do an optimised final iteration which doesn't double $e_i$. The details are left as an exercise.
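
Here is a runnable Python sketch of the whole scheme. The function name, the fixed initial guess $r'_0 = 3$, $e_0 = 2$, the stopping rule, and the final floor-correction loop are my own choices; a tuned implementation would do the optimised final iteration instead.

def isqrt_rsqrt(x):
    # Floor square root via the scaled reciprocal-square-root iteration.
    if x < 4:
        return 0 if x == 0 else 1
    # Rescale: largest e with 2^(2e) <= x, so x' = x / 2^(2e) is in [1, 4).
    e = (x.bit_length() - 1) // 2
    # Initial guess r'_0 / 2^(e_0) = 3/4; it converges for any x' in [1, 4).
    rp, ei = 3, 2
    while ei <= 2 * e:
        ei2 = 2 * ei                                      # e_{i+1} = 2 e_i
        w = rp * rp                                       # w_i = r'_i ^ 2
        shift = 2 * e - ei2
        xs = (x >> shift) if shift >= 0 else (x << -shift)  # x'_i
        d = (1 << ei2) - ((w * xs) >> ei2)                # d_i
        rp = (rp << ei) + ((rp * d) >> (ei + 1))          # r'_{i+1}
        ei = ei2
    s = (rp * x) >> (e + ei)  # sqrt(x) ~ r'_i x / 2^(e + e_i)
    # The estimate can be off by a few ulps; correct it to the exact floor.
    while s * s > x:
        s -= 1
    while (s + 1) * (s + 1) <= x:
        s += 1
    return s

print(isqrt_rsqrt(2**63))  # 3037000499

Run on $x = 2^{63}$, its first four iterations reproduce the $(e_i, r'_i)$ values tabulated above.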

To analyse the complexity of this method, note that multiplying two $b$-bit integers takes $O(b \log b)$ operations (counting word operations; see the comments below). However, we have arranged things so that $r'_i < 2^{e_i}$. So the multiplication to calculate $w_i$ multiplies two $e_i$-bit numbers to produce an $e_{i+1}$-bit number, and the other two multiplications multiply two $e_{i+1}$-bit numbers to produce a $2e_{i+1}$-bit number.

In each case, the number of operations per iteration is $O(e_i \log e_i)$, and there are $O(\log e)$ iterations. The final multiplication takes $O(2e \log 2e)$ operations. So the overall complexity is $O(e \log^2 e)$ operations, which is subquadratic in the number of bits of $x$. That ticks all the boxes.

However, this analysis hides an important principle which everyone working with large integers should keep in mind: because multiplication is superlinear in the number of bits, any multiplication should only be performed on integers whose size roughly matches the current precision (and, I might add, you should try to multiply numbers of similar magnitude together). Using integers larger than that is a waste of effort. Constant factors matter, and for large integers, they matter a lot.

As a final observation, two of the multiplications are of the form $\frac{ab}{2^c}$. Clearly it's wasteful to compute all the bits of $ab$ only to throw $c$ of them away with a right shift. Implementing a smart multiplication method which takes this into account is also left as an exercise.
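
To make the waste concrete, the naive version of that primitive looks like this (the name mulshift is hypothetical):

def mulshift(a, b, c):
    # Naive: computes every bit of a*b, then discards the low c bits.
    return (a * b) >> c

A smarter version would avoid generating the discarded bits in the first place, for instance by truncating $a$ and $b$ before multiplying; that introduces an error of a few ulps, which this algorithm tolerates anyway.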

  • This is great stuff. One comment, though: isn't the bit complexity of division asymptotically approximately the same as the bit complexity of multiplication? So you're talking about a constant-factor improvement, not an asymptotic improvement, right? That wasn't entirely clear from your answer. Commented Jan 27, 2015 at 6:18
  • You say that multiplying two $b$-bit integers takes $O(b \lg b)$ bit operations. I think the correct answer is something like $O(b \lg b \, (\lg \lg b)^{O(1)})$ (right?). You might want to indicate that you are ignoring poly-log-log factors (e.g., by putting a tilde over your big O, or something). Commented Jan 27, 2015 at 6:19
  • @D.W.: No, he says that "multiplying two $b$-bit integers takes $O(b\log b)$ operations." The word "bit" only appears once in that; otherwise I would've already pointed that out. Commented Jan 27, 2015 at 6:21
  • It is a matter of constant factors, yes. The best large-integer division algorithms use a technique very similar to the whole algorithm, such as Newton–Raphson iteration with the effective precision doubling on each iteration. A Newton–Raphson loop within a Newton–Raphson loop piles on the constant factors! Ricky Demer is correct; I was thinking in the word RAM model. I probably should have mentioned this. Commented Jan 27, 2015 at 13:31
