From: mschoene@Math.RWTH-Aachen.DE (Martin Schoenert) Newsgroups: sci.math Subject: Re: Square root not using Newton's method Date: 21 Mar 95 08:35:26 GMT Mark_Day@powertalk.apple.com (Mark Day) writes: I'm trying to find an algorithm to compute the square root of a multiple precision fixed-point number. Solutions for multiple precision integers would be fine, too; I can adapt it for fixed-point. I'm trying to optimize for space, not speed. I've found several multiple precision packages, but they all use the Newton-Raphson iteration which needs some multiple precision temporaries. Ideally, I'd like to use just two multiple precision numbers: the argument and the root. Assume that I can perform a single precision square root. Knuth's _The Art of Computer Programming: Seminumerical Algorithms_ has an exercise that hints that such an algorithm exists and is similar to long division. He says to model it after doing square roots long hand, but I can't remember that technique. The best I could come up with requires a multiple precision remainder. This probably should be put in the FAQ. I have sent this several times now to various people. Sorry if some of that stuff is trivial for you. The first method is somewhat notorius. It is usually called the school method or division like method. It works as follows. In each step extend the remainder by two more digits from the integer A. Divide this extended remainder by 20 times the approximation of the root found so far. This is our first guess for the next digit. Now try to subtract 20 times the approximation times the next digit plus the square of the next digit from the extended remainder. If this would give a negative remainder, the first guess for the next digit was one to large and we have to correct it. Perhaps an example makes this clearer. 80 00 00 00 00 = 8 9 4 4 2 80 - 8*8 -- 16 00 1600 / (20*8) = 10 (- 20*8*10+10*10 too large, correction to 9) - 20*8*9 + 9*9 ----- 79 00 7900 / (20*89) = 4 - 20*89*4 + 4*4 -------- 7 64 00 76400 / (20*894) = 4 - 20*894*4 + 4*4 ----------- 48 64 00 486400 / (20*8944) = 2 - 20*8944*2 + 2*2 -------------- 12 86 36 Thus 8000000000 = 89442^2 + 128636. A (recursive) program that implements this method looks as follows (where B is the base, 10 in the above example). # split A in upper and lower part A == A_1 B^2 + A_0; # compute the upper half of the root recursively X_1 := Floor( Root( A_1 ) ); R := A - (X_1 B)^2; # now compute the approximation for the rest of the root X_0 := R / (2 X_1 B); S := R - 2 X_0 X_1 B; # this approximation may be off by one, correct if neccessary if S < X_0^2 then X_0 := X_0 - 1; S := S + 2 X_1 B; # now S is R - 2 X_0 X_1 B again fi; # then with X := X_1 B + X_0; T := S - X_0^2; # we have X == Floor( Root( A ) ); T == A - X^2; Note that we must have B/2 <= X_1, otherwise several correction steps may be neccessary. This is of course only a problem in the first step and can be solved for example by computing the top two digits of the root with a different method, e.g., table lookup. To compute the root of an integer with 2n digits this method needs n^2/2 multiplications of digits. It is the same number of multiplications needed to square a n digit integer if one uses the binomial expansion trick. Note that it is a factor of 2 better than the number of multiplications needed to divide a 2n digit integer by a n digit integer (though the overhead for the division is smaller so the running time is about the same). To prove that this algorithm is correct one has to show that T is indeed A - X^2 (very simple) and that 0 <= T < 2X+1 (this can be done by induction, using that 0 <= R < (2 X_1 + 1) B^2 and then computing the possible ranges of all intermediate values). If one analyses the method carefull, one notices that it also works with higher powers of the base. That is, let BB be a power of BASE such that BB^4/4 <= A and A < BB^4 (the first condition may require that we multiply A by some power of 4 first, which we correct by dividing the root by the corresponding power of 2 later). # split A in upper and lower part A == A_1 BB^2 + A_0; # compute the upper half of the root recursively X_1 := Floor( Root( A_1 ) ); R := A - (X_1 BB)^2; # now compute the approximation for the rest of the root X_0 := R / (2 X_1 BB); S := R - 2 X_0 X_1 BB; # this approximation may be off by one, correct if neccessary if S < X_0^2 then X_0 := X_0 - 1; S := S + 2 X_1 BB; # now S is R - 2 X_0 X_1 BB again fi; # then with X := X_1 BB + X_0; T := S - X_0^2; # we have X == Floor( Root( A ) ); T == A - X^2; This can be made a little bit faster than the first method for sufficiently large integers, if one uses fast methods for the computation of X_0^2 (for example Karatsuba). For example on a DECstation 5000/120 (mips R3000 @35MHz) one can compute 10000 decimal digits of sqrt(2) in about 400 msec. It is easy to see that this is a fairly straightforward translation of Newton's method. The difference is that the above method doubles the precision in each step, while Newton's method always computes with the full precision. Newton's method goes as follows. # X is the first approximation, Y the second, we need: root <= Y < X X := A; Y := 2^( NrBits(A) / 2 + 1 ) - 1; # do Newton iterations until the approximations stop decreasing while Y < X do X := Y; Y := (A/X + X) / 2; od; # now X is the integer part of the root Floor( Root( A ) ) The number of correct bits doubles with each iteration, so for an integer with 1000 decimal digits (approx 3323 bits) it performs 12 divisions. You can program this method, such that it only keeps the remainder R in each step (e.g., something that is smaller than A) and writes down the root. Of course, if you do want to keep A, then you need extra space for the remainder. Hope this helps, Martin. -- .- .-. - .. -. .-.. --- ...- . ... .- -. -. .. -.- .- Martin Sch"onert, Martin.Schoenert@Math.RWTH-Aachen.DE, +49 241 804551 Lehrstuhl D f"ur Mathematik, Templergraben 64, RWTH, 52056 Aachen, Germany