Square Root Theory and Algorithms – Scott Dattalo

Overview

This article covers two classes of square root algorithms suited for microcontrollers: the classical iterative method (similar to long division) and Newton's Method. The focus is on implementations that avoid hardware multiplication — essential for 8-bit PICs.

Decimal Square Root (Long Division Method)

The classical algorithm works digit by digit, similar to long division:

Starting at the decimal point, pair off the digits
Find the largest perfect square less than or equal to the leftmost pair. Its square root is the first digit of the result
Bring down the next pair; multiply current result by 20; find next digit d such that (20·current + d)·d ≤ remainder
Subtract and repeat

Example: \(\sqrt{31415.92653}\)

      1  7  7. 2  ...
   -----------------
   ) 03 14 15.92 65
     -1
     --
     2 14         (1×20=20, find d: (20+7)×7=189 ≤ 214) → d=7
     -1 89
     ------
     25 15        (17×20=340, find d: (340+7)×7=2429 ≤ 2515) → d=7
     -24 29
     ------
     86 92        (177×20=3540, find d: (3540+2)×2=7084 ≤ 8692) → d=2
     -70 84
     ------
     16 08  ...   → √31415.92653 ≈ 177.2...

Binary Square Root — Restoring Method

The binary algorithm builds the result one bit at a time. At each step, it appends 01 to the developed root and subtracts. If the remainder is positive, the new bit is 1; if negative, the bit is 0 and the remainder is restored:

\text{If } (r - \text{trial}) \ge 0: \text{ bit} = 1,\; r \leftarrow r - \text{trial}

\text{If } (r - \text{trial}) < 0: \text{ bit} = 0,\; \text{restore } r

Example: \(\sqrt{\mathtt{01011111}_2}\)

        1  0  0  1.
   -----------------
   ) 01 01 11 11.00
     -1
     ---- → positive: bit = 1
     00 01
     -1 01  (developed root "1", append 01)
     -----
     11 00  → negative: bit = 0, restore
     +1 01
     ------
     00 01 11
       -10 01  (developed root "10", append 01)
     ---------
     11 11 10  → negative: bit = 0, restore
       +10 01
     ---------
        01 11 11
        -1 00 01  (developed root "100", append 01)
       ----------
         0 11 10  → positive: bit = 1 → root = 1001₂ = 9

Iterative Algorithm (Squaring Method)

Build the result bit by bit from the MSB, squaring at each step:

unsigned char isqrt(unsigned int N) {
    unsigned int x = 0, j;
    for(j = 1 << 7; j > 0; j >>= 1) {
        x = x + j;
        if(x * x > N)
            x = x - j;   // Undo if overshot
    }
    return (unsigned char)x;
}

This works well for processors with hardware multiply. For PICs without multiply, we can avoid squaring by exploiting the polynomial expansion. For an 8-bit number \(y\):

y^2 = \sum_{i,k} b_i b_k \cdot 2^{i+k}

Since each bit \(b_i \in \{0,1\}\), we have \(b_i^2 = b_i\), and the iterative update becomes:

Optimized algorithm without multiply

s1 = 1 << 14;   // Start with MSB pair
s2 = 0;
N  = input;

do {
    if((s1 + s2) <= N) {
        N  = N - s2 - s1;
        s2 = s2 | (s1 << 1);
    }
    s1 = s1 >> 1;
    N  = N  << 1;   // Roll left (handle overflow bit!)
} while(s1 > (1 << 6));

return (s2 >> 8);   // Result in upper byte

Key trick — rolling instead of shiftingWhen shifting N left, an overflow bit may occur. The solution is to use a roll (rotate) operation instead of a simple left shift, preserving the overflow bit in the LSB position. The working arithmetic always operates on bits 8–15 of N.

Newton's Method

Newton's iterative formula for square roots:

x_{n+1} = \frac{1}{2}\left(x_n + \frac{N}{x_n}\right)

Starting from an initial guess \(x_0\), this converges quadratically (doubles correct digits each iteration). It requires hardware division, making it ideal for processors with FPUs (like a Pentium) but impractical for most 8-bit microcontrollers.

Algorithm selection guide
• PIC/AVR (no hardware multiply): Use the optimized shift-and-subtract iterative method
• ARM Cortex-M (hardware multiply): Use the squaring iterative method
• x86/x64 with FPU: Use Newton's method or the hardware FSQRT instruction