Division by constant signed integers

The code accompanying this article can be found in a github repository.

Division is a relatively slow operation. When the divisor is constant, the division can be optimized significantly. In division by constant unsigned integers I explored how this can be done for unsigned integers. In this follow-up article, I cover how we can optimize division by constant signed integers. This article essentially provides the same information as [1].

To optimize signed division, we want to round nd\frac{n}{d} toward zero. This presents some challenges which are different than those for optimizing unsigned division. We are only dealing with numbers with a magnitude of at most 2N12^{N - 1}, and we will see that this means we can always use the round-up method. However, the rounding presents a challenge, since we want to round up when the dividend nn is negative. Most of the complexity in the implementation comes from correctly handling the rounding.

Mathematical background

Preliminaries

I will assume that we are working on an NN-bit machine which can efficiently compute the full 2N2N-bit product of two NN-bit signed integers. I will use the notation UN\mathbb{U}_N​ for the set of unsigned integers that can be represented with NN bits:
UN={0,1,...,2N1} \mathbb{U}_N = \{ 0, 1, ..., 2^N - 1 \}

Likewise, I will use the notation SN\mathbb{S}_N for the set of signed integers that can be represented with NN bits:
SN={2N1,2N1+1,...,2N11} \mathbb{S}_N = \{ -2^{N - 1}, -2^{N - 1} + 1, ..., 2^{N - 1} - 1 \}

When AA and BB are sets, the set ABA \setminus B denotes the set of elements of AA that are not in BB.

For some real number xRx \in \mathbb{R}, I will denote the absolute value of xx by x|x|. That is:
x={xwhen x0xotherwise |x| = \begin{cases} x & \text{when $x \geq 0$} \\ -x & \text{otherwise} \end{cases}

I will use the notation x\lfloor x \rfloor for the biggest integer smaller than or equal to xx, and x\lceil x \rceil for the smallest integer bigger than or equal to xx. I will use [x][ x ] to denote the value of xx when rounded toward zero. That is
[x]={xwhen x<0xwhen x0 [x] = \begin{cases} \lceil x \rceil & \text{when $x < 0$} \\ \lfloor x \rfloor & \text{when $x \geq 0$} \end{cases}

I will use the notation sgn(x)\text{sgn}(x) for the sign function:
sgn(x)={1when x<00when x=01when x>0 \text{sgn}(x) = \begin{cases} -1 & \text{when $x < 0$} \\ 0 & \text{when $x = 0$} \\ 1 & \text{when $x > 0$} \end{cases}

Finally, I will use the notation 1P1_P where PP is a predicate to denote the characteristic function:
1P={0when P is false1when P is true 1_P = \begin{cases} 0 & \text{when $P$ is false} \\ 1 & \text{when $P$ is true} \end{cases}

Signed division

The following result will come in handy for negative dividends.

Lemma: Let nZ,dN+n \in \mathbb{Z}, d \in \mathbb{N}_+, and xRx \in \mathbb{R}. When
n1dx<nd \frac{n - 1}{d} \leq x < \frac{n}{d}

then
x=nd1 \lfloor x \rfloor = \left \lceil \frac{n}{d} \right \rceil - 1

Proof: Set n1d=a+bd\frac{n - 1}{d} = a + \frac{b}{d} with aZa \in \mathbb{Z}, b{0,1,...,d1}b \in \{ 0, 1, ..., d - 1 \}. Then nd=a+b+1d\frac{n}{d} = a + \frac{b + 1}{d}. Now n1d=a\lfloor \frac{n - 1}{d} \rfloor = a and nd=a+1\lceil \frac{n}{d} \rceil = a + 1 so that nd=n1d+1\lceil \frac{n}{d} \rceil = \lfloor \frac{n - 1}{d} \rfloor + 1. So n1dx<n1d+1\lfloor \frac{n - 1}{d} \rfloor \leq \lfloor x \rfloor < \lfloor \frac{n - 1}{d} \rfloor + 1. Since x\lfloor x \rfloor is an integer, it follows that x=n1d=nd1x = \lfloor \frac{n - 1}{d} \rfloor = \lceil \frac{n}{d} \rceil - 1.
\square

Lemma 1: Let d,m,NN+d, m, N \in \mathbb{N}_+. If
2N1+<md2N1++2 2^{N - 1 + \ell} < m \cdot d \leq 2^{N - 1 + \ell} + 2^\ell

then
mn2N1+=nd1\left \lfloor \frac{m \cdot n}{2^{N - 1 + \ell}} \right \rfloor = \left \lceil \frac{n}{d} \right \rceil - 1

for all negative nSNn \in \mathbb{S}_N.

Proof: Multiply 2N1+<md2N1++22^{N - 1 + \ell} < m \cdot d \leq 2^{N - 1 + \ell} + 2^\ell by nd2N+\frac{n}{d \cdot 2^{N + \ell}}. Remember that nn is negative, so the inequality 'flips' and we get nd+n2N11dmn2N1+<nd\frac{n}{d} + \frac{n}{2^{N - 1}} \cdot \frac{1}{d} \leq \frac{m \cdot n}{2^{N - 1 + \ell}} < \frac{n}{d}. Now, using that 2N1<n-2^{N - 1} < n we see that 1n2N1-1 \leq \frac{n}{2^{N - 1}}, so we have nd1dmn2N1+<nd\frac{n}{d} - \frac{1}{d} \leq \frac{m \cdot n}{2^{N - 1 + \ell}} < \frac{n}{d}. The result now follows from lemma 1.
\square

Theorem 2: Let dd and NN be integers with N>0N > 0 and define =log2(d)\ell = \lceil \log_2(|d|) \rceil and m=2N1+d+1m = \lfloor \frac{2^{N - 1 + \ell}}{|d|} \rfloor + 1. Then mUNUN1m \in \mathbb{U}_N \setminus \mathbb{U}_{N - 1} and
[nd]=sgn(d)(mn2N1++1n<0)[ \frac{n}{d} ] = \text{sgn}(d) \cdot \left(\left \lfloor \frac{m \cdot n}{2^{N - 1 + \ell}} \right \rfloor + 1_{n < 0} \right)

for all nSNn \in \mathbb{S}_N.

Proof: When n0n \geq 0 we have mn2N1++1n<0=mn2N1+\left \lfloor \frac{m \cdot n}{2^{N - 1 + \ell}} \right \rfloor + 1_{n < 0} = \left \lfloor \frac{m \cdot n}{2^{N - 1 + \ell}} \right \rfloor. By replacing NN by N1N - 1 in theorem 3 we see that mn2N1+=nd=[nd]\left \lfloor \frac{m \cdot n}{2^{N - 1 + \ell}} \right \rfloor = \lfloor \frac{n}{|d|} \rfloor = [ \frac{n}{|d|} ] for all nUN1n \in \mathbb{U}_{N - 1}, so this also holds for all positive nSNn \in \mathbb{S}_N. When n<0n < 0 we have 1n<0=11_{n < 0} = 1. From lemma 2 we see that mn2N1+=nd1\left \lfloor \frac{m \cdot n}{2^{N - 1 + \ell}} \right \rfloor = \left \lceil \frac{n}{|d|} \right \rceil - 1, so that mn2N1++1=nd=[nd]\left \lfloor \frac{m \cdot n}{2^{N - 1 + \ell}} \right \rfloor + 1 = \lceil \frac{n}{|d|} \rceil = [ \frac{n}{|d|} ] for all negative nSNn \in \mathbb{S}_N. Combining these results we see that [nd]=mn2N1++1n<0[ \frac{n}{|d|} ] = \left \lfloor \frac{m \cdot n}{2^{N - 1 + \ell}} \right \rfloor + 1_{n < 0}. Using [nd]=sgn(d)[nd][ \frac{n}{d} ] = \text{sgn}(d) \cdot [ \frac{n}{|d|} ] the result follows.
\square

Implementation

In this section, I use the uint and sint datatypes, which are an NN-bit unsigned integer and an NN-bit signed integer, respectively. I try to provide a general strategy that should work well on most instruction set architectures. Variations in the implementation might give a more efficient result. In general, you should always benchmark your implementation if performance is critical.

While theorem 2 in the previous section seems to provide a straightforward method to compute the quotient [nd][ \frac{n}{d} ] for any n,dSNn, d \in \mathbb{S}_N, there is one subtlety we glanced over. In theorem 4, we use the 2N2N-bit expression mnm \cdot n, where mUNm \in \mathbb{U}_N is an unsigned value and nSNn \in \mathbb{S}_N is a signed value. While most processors have instructions to compute the full 2N2N-bit product of two NN-bit unsigned integers or two NN-bit signed integers, most processors do not provide an instruction to compute the 2N2N-bit product of an NN-bit unsigned integer and an NN-bit signed integer.

While it is also possible to compute the product mnm \cdot n by first extending mm and nn to 2N2N-bit signed values and computing the product of those extended values, this is less efficient.

Computing the product of an unsigned and a signed value

In this section, consider mm and nn to be NN-bit bit strings. So these variables no longer represent a number, but purely a series of bits, which can hold a zero or a one. So we write m=mN1mN2...m1m0m = m_{N - 1} m_{N - 2} ... m_1 m_0, where mN1,...,m0{0,1}m_{N - 1}, ..., m_0 \in \{ 0, 1 \} are the individual bits.

Now, we can provide a string mm with a value when we interpret it as either an unsigned value (m)u(m)_\text{u}, or as a signed value (m)s(m)_\text{s}. These interpretations are defined as
(m)u=k=0N12kmk (m)_\text{u} = \sum_{k = 0}^{N - 1} 2^k m_k

and
(m)s=2N1mN1+k=0N22kmk (m)_\text{s} = -2^{N - 1} m_{N - 1} + \sum_{k = 0}^{N - 2} 2^k m_k

We see that (m)u=(m)s(m)_\text{u} = (m)_\text{s} when mN1=0m_{N - 1} = 0 and (m)u=(m)s+2N(m)_\text{u} = (m)_\text{s} + 2^N when mN1=1m_{N - 1} = 1. So when mN1=0m_{N - 1} = 0 we have
(m)u(n)s=(m)s(n)s (m)_\text{u} \cdot (n)_\text{s} = (m)_\text{s} \cdot (n)_\text{s}

So in this case we can just use signed multiplication. When mN1=1m_{N - 1} = 1 we have

(m)u(n)s=((m)s+2N)(n)s=(m)s(n)s+2N(n)s (m)_\text{u} \cdot (n)_\text{s} = ((m)_\text{s} + 2^N) \cdot (n)_\text{s} = (m)_\text{s} \cdot (n)_\text{s} + 2^N \cdot (n)_\text{s}

So, the upper NN bits of the product (m)u(n)s(m)_\text{u} \cdot (n)_\text{s} equals (m)s(n)s2N+(n)s\lfloor \frac{(m)_\text{s} \cdot (n)_\text{s}}{2^N} \rfloor + (n)_\text{s}. This expression can be evaluated by multiplying mm and nn as if they where signed numbers, taking the upper NN bits, and adding nn to this.

Runtime optimization

Consider the following code:

sint d = read_divisor();

for (int i = 0; i < size; i++) {
	quotient[i] = dividend[i] / d;
}

The value of the divisor d is not known at compile time, but once it is read at runtime, it does not change. As such, we consider d to be a runtime constant, and we can optimize this code in the following way:

sint d = read_divisor();
divdata_t divisor_data = precompute(divisor);

for (int i = 0; i < size; i++) {
	quotient[i] = fast_divide(dividend[i], divisor_data);
}

Now, the divdata_t datatype needs to hold mm, the number of bits to shift, and some field to indicate that we should negate the result [nd][\frac{n}{|d|}] when dd is negative:

typedef struct {
	uint mul;
	uint shift;
	bool negative;
} divdata_t;

Now, we compute \ell such that mm always has the most significant bit set. This way we can always compute the upper NN bits of the product mnm \cdot n in fast_divide by taking the signed product, taking the upper NN bit, and adding nn to this.

The precomputation is now a relatively straightforward implementation of theorem 4:

sdivdata_t precompute(sint d) {
	sdivdata_t divdata;
	uint d_abs = abs(d);
	
	// Compute ceil(log2(d_abs))
	uint l = floor_log2(d_abs);
	if ((1 << l) < d_abs) l++;

	// Handle case |d| = 1
	if (dabs == 1) l = 1;

	// Compute m = floor(2^(N - 1 + l) / d) + 1
	uint m = (((big_uint)1) << (N - 1 + l)) / d_abs + 1;
	
	divdata.mul = m;
	divdata.negative = d < 0;
	divdata.shift = l - 1;
	return divdata;
}

It should be noted that in the fast_divide function, the right shift by N1+N - 1 + \ell is implemented by taking the upper NN bits of the product mnm \cdot n, and shifting this right by 1\ell - 1 bits. If =0\ell = 0 this is not possible, since in this case we need to shift right by N1N - 1 bits, but taking the NN upper bits is already equivalent to a right shift of NN bits. We can fix this by simply setting =1\ell = 1 when d=1|d| = 1. In this case the expression for mm becomes 2N+12^N + 1, which will overflow to simply 11. Now, while theorem 4 doesn't hold anymore for m=1m = 1, the calculation of the product mnm \cdot n in fast_divide assumes that the most significant bit of mm is set, so we will end up with the correct value. In fact, setting m=0m = 0 would work as well.

The fast_divide function has a lot of steps, but every step should be understandable.

sint fast_divide(sint n, sdivdata_t dd) {
	big_sint full_signed_product = ((big_sint)n) * (sint)dd.mul;
	sint high_word_of_signed_product = full_signed_product >> N;
	sint high_word_of_unsigned_product = high_word_of_signed_product + n;
	sint rounded_down_quotient = high_word_of_unsigned_product >> dd.shift;
	sint quotient_rounded_toward_zero = rounded_down_quotient - (n >> (N - 1));
	if (dd.negative) {
		quotient_rounded_toward_zero = -quotient_rounded_toward_zero;
	}
	return quotient_rounded_toward_zero;
}

Compile-time optimization

In this section, I will consider how to generate optimized code for division by compile-time constant signed integers.

Most of the tricks that are applicable to calculate a quotient of unsigned integers efficiently also apply to signed integers, although we might have to do some work to handle negative integers. Of course, a division by one can be ignored and a division by minus one is equivalent to a negation. For some instruction-set architectures, it might be beneficial to implement a special case for big divisors with an absolute value of more than 2N12^{N - 1}. In this case, the value of the quotient [nd][ \frac{n}{d} ] is sgn(n)\text{sgn}(n) when nd|n| \geq d and zero otherwise.

expression_t div_by_const_sint(const sint d, expression_t n) {
	if (d == 1) return n;
	if (d == -1) return neg(n);
	uint d_abs = abs(d);
	if (is_power_of_two(d_abs)) return div_by_const_signed_power_of_two(n, d);
	return div_fixpoint(d, n);
}

Let us first consider the case where d=2|d| = 2^\ell is a power of two. If we do an arithmetic right shift by \ell bits, the result will be correct when nn is positive. However, this will round down the quotient when nn is negative. In this case we can add 212^\ell - 1 to nn in order to round up.

So, we would like to have an expression which equals 212^\ell - 1 when nn is negative and 00 otherwise, so we can simply add this to nn. The value 212^\ell - 1 consists of \ell consecutive ones in the binary representation. It can be created by doing an arithmetic right shift by \ell bits on nn, and shifting the result right by NN - \ell bits (with a normal right shift). The first shift produces the \ell ones in the \ell most significant bits when nn is negative (these are zero bits otherwise), the second shift puts them in the least significant bit positions.

expression_t div_by_const_signed_power_of_two(expression_t n, sint d) {
	uint d_abs = abs(d);
	int l = floor_log2(d_abs);
	
	// addme equals 2^l - 1 when n is negative and 0 otherwise
	// We need to add this to n to round towards zero.
	expression_t addme = shr(sar(n, constant(l - 1)), constant(N - l));
	
	expression_t result = sar(add(n, addme), constant(l));
	if (d < 0) result = neg(result);
	return result;
}

References

[1]: Division by Invariant Integers using Multiplication, Torbjörn Granlund and Peter L. Montgomery, 1994.