# Division by constant signed integers

The code accompanying this article can be found in a GitHub repository.

Division is a relatively slow operation. When the divisor is constant, the division can be optimized significantly. In [1] I explored how this can be done for unsigned integers. In this follow-up article, I cover how we can optimize division by constant signed integers. This article should be read as a continuation of [1]. As far as I know, the information in this article was first presented in [2].

I assume that, as in most programming languages, the result of signed division is rounded toward zero. This presents some challenges that differ from those for optimizing unsigned division. We are only dealing with numbers with a magnitude of at most $2^{N-1}$, and we will see that this means we can always use the round-up method as described in [1]. The challenge consists of efficiently rounding up the quotient when $n$ is negative.

## Mathematical background

### Preliminaries

I will assume that we are working on an $N$-bit machine which can efficiently compute the full $2N$-bit product of two $N$-bit signed integers. I will use the notation $U_{N}$ for the set of unsigned integers that can be represented with $N$ bits:

$$U_N = \{0, 1, ..., 2^N - 1\}$$

Likewise, I will use the notation $S_N$ for the set of signed integers that can be represented with $N$ bits:

$$S_N = \{-2^{N-1}, -2^{N-1} + 1, ..., 2^{N-1} - 1\}$$

When $A$ and $B$ are sets, the set $A \setminus B$ denotes the set of elements of $A$ that are not in $B$.

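For some real number $x \in \mathbb{R}$, I will denote the absolute value of $x$ by $|x|$. That is:

$$|x| = \begin{cases} x & \text{when } x \geq 0 \\ -x & \text{otherwise} \end{cases}$$

I will use the notation $\lfloor x \rfloor$ for the biggest integer smaller than or equal to $x$, and $\lceil x \rceil$ for the smallest integer bigger than or equal to $x$. I will use $[x]$ to denote the value of $x$ when rounded toward zero. That is

$$[x] = \begin{cases} \lceil x \rceil & \text{when } x < 0 \\ \lfloor x \rfloor & \text{when } x \geq 0 \end{cases}$$

I will use the notation $\text{sgn}(x)$ for the *sign function*:

$$\text{sgn}(x) = \begin{cases} -1 & \text{when } x < 0 \\ 0 & \text{when } x = 0 \\ 1 & \text{when } x > 0 \end{cases}$$

Finally, I will use the notation $1_P$ where $P$ is a predicate to denote the *characteristic function*:

$$1_P = \begin{cases} 1 & \text{when } P \text{ holds} \\ 0 & \text{otherwise} \end{cases}$$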

### Signed division

To simplify things, I will assume that the divisor $d$ is positive. When $d$ is negative, we can use $[\frac{n}{-d}] = -[\frac{n}{d}]$. This means that to compute $[\frac{n}{d}]$ we can simply compute the quotient $[\frac{n}{|d|}]$ and negate it when $d$ is negative.

So, assuming that $d$ is positive, we want to evaluate

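$$\left[ \frac{n}{d} \right] = \begin{cases} \lceil \frac{n}{d} \rceil & \text{when } n < 0 \\ \lfloor \frac{n}{d} \rfloor & \text{when } n \geq 0 \end{cases}$$

Since the rounding is defined separately for negative and nonnegative dividends, it is natural to look at these cases separately.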
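To make the case distinction concrete, here is a minimal C sketch that builds floor and ceiling division from the language's own division and then combines them as in the definition above. It assumes C99 or later, where `/` truncates toward zero and `%` takes the sign of the dividend. The function names are hypothetical and only serve as illustration.

```c
#include <stdint.h>

/* Floor of n / d for d > 0, built from C's truncating division. */
int32_t floor_div(int32_t n, int32_t d) {
    int32_t q = n / d;      /* truncates toward zero (C99 and later) */
    if (n % d < 0) q--;     /* negative remainder: truncation rounded up */
    return q;
}

/* Ceiling of n / d for d > 0. */
int32_t ceil_div(int32_t n, int32_t d) {
    int32_t q = n / d;
    if (n % d > 0) q++;     /* positive remainder: truncation rounded down */
    return q;
}

/* [n / d] for d > 0: ceiling for negative n, floor for nonnegative n. */
int32_t trunc_div(int32_t n, int32_t d) {
    return n < 0 ? ceil_div(n, d) : floor_div(n, d);
}
```

For example, `trunc_div(-7, 2)` takes the ceiling branch and agrees with the built-in `-7 / 2`, while `floor_div(-7, 2)` is one smaller.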

When $n$ is nonnegative this is essentially an unsigned division, since we assumed that $d$ is positive. Since $n$ is a nonnegative $N$-bit signed integer, we know that the most significant bit will be zero. So $n$ can be represented as an $(N-1)$-bit unsigned integer. With the results from [1], it is straightforward to find an expression that can be efficiently evaluated and equals $[\frac{n}{d}]$ when $n$ is nonnegative and $d$ is positive.

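**Corollary 16**: Let $d, N \in \mathbb{N}$ with $d > 0$, $\ell = \lceil \log_2(d) \rceil$, and $m = \lceil \frac{2^{N-1+\ell}}{d} \rceil$. Then $m \in U_N$ and

$$\left\lfloor \frac{m \cdot n}{2^{N-1+\ell}} \right\rfloor = \left\lfloor \frac{n}{d} \right\rfloor$$

for every nonnegative $n \in S_N$.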

**Proof**: Observe that if $n \in S_N$ is nonnegative, then $n \in U_{N-1}$. The result now follows by replacing $N$ by $N-1$ and $m_{up}$ by $m$ in theorem 4.

$□$
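As a small worked check of corollary 16, take the illustrative values $N = 8$ and $d = 3$ (chosen here for the example). Then $\ell = \lceil \log_2(3) \rceil = 2$ and

$$m = \left\lceil \frac{2^{8-1+2}}{3} \right\rceil = \left\lceil \frac{512}{3} \right\rceil = 171$$

For $n = 100$ and for the largest nonnegative dividend $n = 127$ this gives

$$\left\lfloor \frac{171 \cdot 100}{512} \right\rfloor = \left\lfloor \frac{17100}{512} \right\rfloor = 33 = \left\lfloor \frac{100}{3} \right\rfloor, \qquad \left\lfloor \frac{171 \cdot 127}{512} \right\rfloor = \left\lfloor \frac{21717}{512} \right\rfloor = 42 = \left\lfloor \frac{127}{3} \right\rfloor$$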

We will now take this same expression and consider what happens when $n$ is negative. We will proceed by proving results that are analogues of those for the unsigned case.

The following lemma is a result complementary to lemma 1.

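**Lemma 17**: *Let $n \in \mathbb{Z}$, $d \in \mathbb{N}_+$, and $x \in \mathbb{R}$. When*

$$\frac{n-1}{d} \leq x < \frac{n}{d}$$

*then*

$$\lfloor x \rfloor = \left\lceil \frac{n}{d} \right\rceil - 1$$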

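**Proof**: Set $\frac{n-1}{d} = a + \frac{b}{d}$ with $a \in \mathbb{Z}$, $b \in \{0, 1, ..., d-1\}$. Then $\frac{n}{d} = a + \frac{b+1}{d}$. Now $\lfloor \frac{n-1}{d} \rfloor = a$ and $\lceil \frac{n}{d} \rceil = a + 1$ so that $\lceil \frac{n}{d} \rceil = \lfloor \frac{n-1}{d} \rfloor + 1$. So $\lfloor \frac{n-1}{d} \rfloor \leq \lfloor x \rfloor < \lfloor \frac{n-1}{d} \rfloor + 1$. Since $\lfloor x \rfloor$ is an integer, it follows that $\lfloor x \rfloor = \lfloor \frac{n-1}{d} \rfloor = \lceil \frac{n}{d} \rceil - 1$.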

$□$

The following result is complementary to corollary 16.

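**Theorem 18**: *Let $d, m, N \in \mathbb{N}_+$ and $\ell \in \mathbb{N}_0$. If*

$$2^{N-1+\ell} < m \cdot d \leq 2^{N-1+\ell} + 2^\ell$$

*then*

$$\left\lfloor \frac{m \cdot n}{2^{N-1+\ell}} \right\rfloor = \left\lceil \frac{n}{d} \right\rceil - 1$$

*for all negative $n \in S_N$.*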

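**Proof**: Multiply $2^{N-1+\ell} < m \cdot d \leq 2^{N-1+\ell} + 2^\ell$ by $\frac{n}{d \cdot 2^{N-1+\ell}}$. Remember that $n$ is negative, so the inequality ‘flips’ and we get $\frac{n}{d} + \frac{n}{2^{N-1}} \cdot \frac{1}{d} \leq \frac{m \cdot n}{2^{N-1+\ell}} < \frac{n}{d}$. Now, using $|n| \leq 2^{N-1}$ we see that $-1 \leq \frac{n}{2^{N-1}}$, so we have $\frac{n-1}{d} \leq \frac{m \cdot n}{2^{N-1+\ell}} < \frac{n}{d}$. The result now follows from lemma 17.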

$□$
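A short worked example may help here, again with the illustrative values $N = 8$, $d = 3$, $\ell = 2$ and $m = 171$ (chosen for this example). The condition of the theorem holds, since

$$2^{9} = 512 < 171 \cdot 3 = 513 \leq 512 + 2^2 = 516$$

For $n = -100$ and for $n = -3$, where $d$ divides $n$, we get

$$\left\lfloor \frac{171 \cdot (-100)}{512} \right\rfloor = \left\lfloor \frac{-17100}{512} \right\rfloor = -34 = \left\lceil \frac{-100}{3} \right\rceil - 1, \qquad \left\lfloor \frac{171 \cdot (-3)}{512} \right\rfloor = \left\lfloor \frac{-513}{512} \right\rfloor = -2 = \left\lceil \frac{-3}{3} \right\rceil - 1$$

In both cases adding one to the left-hand side gives the quotient rounded toward zero.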

This theorem allows us to prove an analogue of theorem 3 for signed numbers.

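**Lemma 19**: Let $d \in S_N$ with $d > 0$ and $d$ not a power of two, $\ell = \lceil \log_2(d) \rceil$, and $m_{up} = \lceil \frac{2^{N-1+\ell}}{d} \rceil$. Then

$$\left\lfloor \frac{m_{up} \cdot n}{2^{N-1+\ell}} \right\rfloor = \left\lceil \frac{n}{d} \right\rceil - 1$$

for every negative $n \in S_N$.

**Proof**: Since $m_{up} \cdot d = \lceil \frac{2^{N-1+\ell}}{d} \rceil \cdot d$ is just $2^{N-1+\ell}$ rounded up to the nearest multiple of $d$, we have $2^{N-1+\ell} \leq m_{up} \cdot d$. This inequality is strict, because $d$ is not a power of two and hence does not divide $2^{N-1+\ell}$. We also have $\lceil \frac{2^{N-1+\ell}}{d} \rceil - \frac{2^{N-1+\ell}}{d} < 1$. Multiplying by $d$ and adding $2^{N-1+\ell}$ gives $m_{up} \cdot d < 2^{N-1+\ell} + d \leq 2^{N-1+\ell} + 2^{\lceil \log_2(d) \rceil} = 2^{N-1+\ell} + 2^\ell$. So we have $2^{N-1+\ell} < m_{up} \cdot d \leq 2^{N-1+\ell} + 2^\ell$. By theorem 18, it follows that $\lfloor \frac{m_{up} \cdot n}{2^{N-1+\ell}} \rfloor = \lceil \frac{n}{d} \rceil - 1$ for all negative $n \in S_N$.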

$□$

With these results, it is easy to derive an expression which equals the rounded quotient without the restriction that $d$ is positive. The following theorem is the main result of this article.

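**Theorem 20**: *Let $d$ and $N$ be integers with $d \neq 0$ and $N > 0$, and define $\ell = \lceil \log_2(|d|) \rceil$ and $m = \lfloor \frac{2^{N-1+\ell}}{|d|} \rfloor + 1$. Then*

$$\left[ \frac{n}{d} \right] = \text{sgn}(d) \cdot \left( \left\lfloor \frac{m \cdot n}{2^{N-1+\ell}} \right\rfloor + 1_{n<0} \right)$$

*for all $n \in S_N$.*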

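**Proof**: First, observe that $m \cdot |d|$ is simply the first multiple of $|d|$ larger than $2^{N-1+\ell}$. Since the range $(2^{N-1+\ell}, 2^{N-1+\ell} + 2^\ell]$ contains $2^\ell = 2^{\lceil \log_2(|d|) \rceil} \geq |d|$ consecutive integers, there must be at least one multiple of $|d|$ in it. So we have $2^{N-1+\ell} < m \cdot |d| \leq 2^{N-1+\ell} + 2^\ell$. For nonnegative $n \in S_N$ we have $n < 2^{N-1}$, so multiplying this inequality by $\frac{n}{|d| \cdot 2^{N-1+\ell}}$ gives $\frac{n}{|d|} \leq \frac{m \cdot n}{2^{N-1+\ell}} < \frac{n+1}{|d|}$, which implies $\lfloor \frac{m \cdot n}{2^{N-1+\ell}} \rfloor + 1_{n<0} = \lfloor \frac{m \cdot n}{2^{N-1+\ell}} \rfloor = \lfloor \frac{n}{|d|} \rfloor$. Using theorem 18, we see that $\lfloor \frac{m \cdot n}{2^{N-1+\ell}} \rfloor + 1_{n<0} = \lfloor \frac{m \cdot n}{2^{N-1+\ell}} \rfloor + 1 = \lceil \frac{n}{|d|} \rceil$ for negative $n \in S_N$. So $[\frac{n}{|d|}] = \lfloor \frac{m \cdot n}{2^{N-1+\ell}} \rfloor + 1_{n<0}$ for all $n \in S_N$. Using $[\frac{n}{d}] = \text{sgn}(d) \cdot [\frac{n}{|d|}]$ the result follows.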

$□$

**Note**: While this theorem seems like everything you need for implementation, there is a subtle detail swept under the rug. The product $m \cdot n$ is a $2N$-bit product of an $N$-bit unsigned number and an $N$-bit signed number. Most processors do not have support for this operation. The evaluation of this product is considered in the section about implementation.
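The formula of theorem 20 can be checked exhaustively for a small word size. The sketch below assumes $N = 8$ and evaluates $\text{sgn}(d) \cdot (\lfloor \frac{m \cdot n}{2^{N-1+\ell}} \rfloor + 1_{n<0})$ in 32-bit arithmetic, so the mixed signed/unsigned product mentioned in the note is not an issue here. All names are hypothetical, and the floor is computed with division rather than a shift to avoid relying on how the compiler shifts negative values.

```c
#include <stdint.h>

enum { N = 8 };  /* assumed word size, small enough to test every input */

/* Floor of a / 2^s, using division so negative values are handled portably. */
int32_t floor_shift(int32_t a, int s) {
    int32_t p = (int32_t)1 << s;
    int32_t q = a / p;
    if (a % p < 0) q--;
    return q;
}

/* sgn(d) * (floor(m * n / 2^(N - 1 + l)) + 1_{n < 0}) for nonzero d in S_N. */
int32_t theorem20_quotient(int32_t n, int32_t d) {
    int32_t d_abs = d < 0 ? -d : d;
    int l = 0;
    while (((int32_t)1 << l) < d_abs) l++;                  /* l = ceil(log2(|d|)) */
    int32_t m = ((int32_t)1 << (N - 1 + l)) / d_abs + 1;    /* magic constant */
    int32_t q = floor_shift(m * n, N - 1 + l) + (n < 0 ? 1 : 0);
    return d < 0 ? -q : q;
}

/* Returns 1 when the formula matches C's truncating n / d for every n in S_N. */
int theorem20_holds_for(int32_t d) {
    for (int32_t n = -128; n <= 127; n++) {
        if (theorem20_quotient(n, d) != n / d) return 0;
    }
    return 1;
}
```

Calling `theorem20_holds_for(d)` for every nonzero `d` from `-128` to `127` covers all divisor and dividend pairs for this word size.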

In [1], we saw that there exists a simple trick to reduce the ‘magic constant’ $m$. This same trick works for signed division as well.

**Theorem 21**: *Let $N \in \mathbb{N}_+$, $d, m \in U_N$ with $d > 0$, and let $\ell \in \mathbb{N}_0$ be such that $\ell \leq \lceil \log_2(d) \rceil - 1$. Suppose that the tuple $(\ell, m) \in \mathbb{N}_0 \times U_N$ satisfies the condition of theorem 18:*

$$2^{N-1+\ell} < m \cdot d \leq 2^{N-1+\ell} + 2^\ell$$

*If $m$ is even, then the tuple $(\ell', m') = (\ell - 1, \frac{m}{2})$ satisfies the same condition. If $m$ is odd, then there exists no smaller $m$ that satisfies the condition.*

**Proof**: Suppose that $m$ satisfies the condition of theorem 18. In this case, we have $2^{N-1+\ell} < m \cdot d \leq 2^{N-1+\ell} + 2^\ell$. It is easy to see that when $m$ is even and $\ell \geq 1$ all expressions in the inequality are even, so we can divide by two and see that $2^{N-2+\ell} < \frac{m}{2} \cdot d \leq 2^{N-2+\ell} + 2^{\ell-1}$. The case for the condition of lemma 5 is analogous.

Suppose that there is a smaller pair $\ell', m'$ that satisfies the condition $2^{N-1+\ell'} < m' \cdot d \leq 2^{N-1+\ell'} + 2^{\ell'}$. By multiplying the whole thing by $2^{\ell-\ell'}$, we see that $2^{N-1+\ell} < 2^{\ell-\ell'} \cdot m' \cdot d \leq 2^{N-1+\ell} + 2^\ell$. The set $\{2^{N-1+\ell}, 2^{N-1+\ell} + 1, ..., 2^{N-1+\ell} + 2^\ell\}$ has $2^\ell + 1$ elements. We have $2^\ell + 1 \leq 2^{\lceil \log_2(d) \rceil - 1} + 1 \leq d$, so there can only be one multiple of $d$ in this set, which is $m \cdot d$. So we have $m = 2^{\ell-\ell'} \cdot m'$, so $m$ must be even.

$□$
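The reduction step can be sketched in a few lines of C. The struct and function names below are hypothetical. The loop halves $m$ and decrements $\ell$ as long as $m$ is even and $\ell \geq 1$, which is exactly the situation in which dividing the inequality by two preserves the condition of theorem 18.

```c
#include <stdint.h>

/* A (multiplier, shift exponent) pair; hypothetical type for illustration. */
typedef struct {
    uint32_t m;
    int l;
} magic_t;

/* Halve m and decrement l while m is even and l is at least 1. */
magic_t reduce_magic(magic_t p) {
    while (p.l > 0 && (p.m & 1u) == 0) {
        p.m >>= 1;
        p.l--;
    }
    return p;
}
```

For instance, with the illustrative values $N = 8$ and $d = 9$ we get $\ell = 4$ and $m = \lfloor \frac{2048}{9} \rfloor + 1 = 228$, which reduces in two steps to $\ell = 2$ and $m = 57$. Indeed $512 < 57 \cdot 9 = 513 \leq 516$.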

## Implementation

In this section, I use the `uint` and `sint` datatypes, which are an $N$-bit unsigned integer and an $N$-bit signed integer, respectively. I try to provide a general strategy that should work well on most instruction set architectures. Variations in the implementation might give a more efficient result. In general, you should always benchmark your implementation if performance is critical.

While theorem 20 in the previous section seems to provide a straightforward method to compute the quotient $[\frac{n}{d}]$ for any $n, d \in S_N$, there is one subtlety we glossed over. In theorem 20, we use the $2N$-bit expression $m \cdot n$, where $m \in U_N$ is an unsigned value and $n \in S_N$ is a signed value. While most processors have instructions to compute the full $2N$-bit product of two $N$-bit unsigned integers or two $N$-bit signed integers, most processors do not provide an instruction to compute the $2N$-bit product of an $N$-bit unsigned integer and an $N$-bit signed integer.

While it is also possible to compute the product $m⋅n$ by first extending $m$ and $n$ to $2N$-bit signed values and computing the product of those extended values, this is less efficient.

### Computing the product of an unsigned and a signed value

In this section, consider $m$ and $n$ to be $N$-bit bit strings. So these variables no longer represent a number, but purely a series of bits, which can hold a zero or a one. So we write $m=m_{N−1}m_{N−2}...m_{1}m_{0}$, where $m_{N−1},...,m_{0}∈{0,1}$ are the individual bits.

Now, we can provide a string $m$ with a value when we interpret it as either an unsigned value $(m)_{u}$, or as a signed value $(m)_{s}$. These interpretations are defined as

$$(m)_u = \sum_{k=0}^{N-1} 2^k m_k$$

and

$$(m)_s = -2^{N-1} m_{N-1} + \sum_{k=0}^{N-2} 2^k m_k$$

We see that $(m)_u = (m)_s$ when $m_{N-1} = 0$ and $(m)_u = (m)_s + 2^N$ when $m_{N-1} = 1$. So when $m_{N-1} = 0$ we have

$$(m)_u \cdot (n)_s = (m)_s \cdot (n)_s$$

So in this case we can just use signed multiplication. When $m_{N-1} = 1$ we have

$$(m)_u \cdot (n)_s = ((m)_s + 2^N) \cdot (n)_s = (m)_s \cdot (n)_s + 2^N \cdot (n)_s$$

So, the upper $N$ bits of the product $(m)_u \cdot (n)_s$ equal $\lfloor \frac{(m)_s \cdot (n)_s}{2^N} \rfloor + (n)_s$. This expression can be evaluated by multiplying $m$ and $n$ as if they were signed numbers, taking the upper $N$ bits, and adding $n$ to this.
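As an illustration, the following C sketch applies this identity for $N = 32$ and compares it with a reference that computes the mixed product directly in 64-bit arithmetic. The function names are hypothetical. The sketch assumes that `>>` on a negative signed value is an arithmetic shift and that converting an out-of-range value to `int32_t` wraps around; both hold on mainstream compilers but are implementation-defined in ISO C.

```c
#include <stdint.h>

/* Upper 32 bits of (m)_u * (n)_s, assuming bit 31 of m is set.
   Uses only a signed 32 x 32 -> 64 bit multiplication. */
int32_t mulhi_unsigned_signed(uint32_t m, int32_t n) {
    int64_t signed_product = (int64_t)(int32_t)m * n;   /* (m)_s * (n)_s */
    int32_t high = (int32_t)(signed_product >> 32);     /* upper 32 bits */
    return high + n;                                    /* adds the 2^32 * (n)_s term */
}

/* Reference: compute (m)_u * (n)_s in 64 bits and take the upper half. */
int32_t mulhi_reference(uint32_t m, int32_t n) {
    return (int32_t)(((int64_t)m * n) >> 32);
}
```

The reference version corresponds to extending both operands to $2N$ bits first, which is the less efficient alternative mentioned earlier.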

### Runtime optimization

Consider the following code:

```
sint d = read_divisor();
for (int i = 0; i < size; i++) {
    quotient[i] = dividend[i] / d;
}

The value of the divisor `d` is not known at compile time, but once it is read at runtime, it does not change. As such, we consider `d` to be a runtime constant, and we can optimize this code in the following way:

```
sint d = read_divisor();
// Precompute the magic constant once, outside the loop.
divdata_t divisor_data = precompute(d);
for (int i = 0; i < size; i++) {
    quotient[i] = fast_divide(dividend[i], divisor_data);
}
```

Now, the `divdata_t` datatype needs to hold $m$, the number of bits to shift, and some field to indicate that we should negate the result $[\frac{n}{|d|}]$ when $d$ is negative:

```
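typedef struct {
    uint mul;       // magic constant m
    uint shift;     // shift applied after taking the upper N bits (l - 1)
    bool negative;  // true when d < 0, so the result must be negated
} divdata_t;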
```

Now, we compute $\ell$ such that $m$ always has the most significant bit set. This way we can always compute the upper $N$ bits of the product $m \cdot n$ in `fast_divide` by taking the signed product, taking the upper $N$ bits, and adding $n$ to this.

The precomputation is now a relatively straightforward implementation of theorem 20:

```
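divdata_t precompute(sint d) {
    divdata_t divdata;
    // |d| in unsigned arithmetic, so d = -2^(N - 1) does not overflow
    uint d_abs = d < 0 ? -(uint)d : (uint)d;
    // Compute l = ceil(log2(d_abs))
    uint l = floor_log2(d_abs);
    if (((uint)1 << l) < d_abs) l++;
    // Handle case |d| = 1: use l = 1 so that the shift l - 1 is not negative
    if (d_abs == 1) l = 1;
    // Compute m = floor(2^(N - 1 + l) / d_abs) + 1; big_uint is a 2N-bit type.
    // For |d| = 1 this is 2^N + 1, which wraps to 1 when stored in a uint.
    uint m = (uint)((((big_uint)1) << (N - 1 + l)) / d_abs + 1);
    divdata.mul = m;
    divdata.negative = d < 0;
    divdata.shift = l - 1;
    return divdata;
}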
```

It should be noted that in the `fast_divide` function, the right shift by $N-1+\ell$ is implemented by taking the upper $N$ bits of the product $m \cdot n$, and shifting this right by $\ell-1$ bits. If $\ell=0$ this is not possible, since in this case we need to shift right by $N-1$ bits, but taking the $N$ upper bits is already equivalent to a right shift of $N$ bits. We can fix this by simply setting $\ell=1$ when $|d|=1$. In this case the expression for $m$ becomes $2^N+1$, which does not fit in $N$ bits and wraps around to $1$. The stored value $1$ does not satisfy the condition of theorem 18 by itself. However, `fast_divide` always adds $n$ to the upper word of the signed product, because it assumes that the most significant bit of $m$ is set. For the stored value $1$ this has the effect of multiplying by $2^N+1$, so we still end up with the correct value.

The `fast_divide` function has a lot of steps, but every step should be understandable.

```
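sint fast_divide(sint n, divdata_t dd) {
    // Multiply as if mul were signed; big_sint is a 2N-bit signed type.
    big_sint full_signed_product = ((big_sint)n) * (sint)dd.mul;
    // >> on signed values is assumed to be an arithmetic shift.
    sint high_word_of_signed_product = full_signed_product >> N;
    // Adding n corrects for the most significant bit of mul (worth 2^N * n).
    sint high_word_of_unsigned_product = high_word_of_signed_product + n;
    sint rounded_down_quotient = high_word_of_unsigned_product >> dd.shift;
    // n >> (N - 1) is -1 for negative n, so this adds 1 when n < 0.
    sint quotient_rounded_toward_zero = rounded_down_quotient - (n >> (N - 1));
    if (dd.negative) {
        quotient_rounded_toward_zero = -quotient_rounded_toward_zero;
    }
    return quotient_rounded_toward_zero;
}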
```
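For readers who want to try this out, here is a sketch of the same two functions specialized to $N = 32$ with fixed-width C types. The names `divdata32_t`, `precompute32` and `fast_divide32` are hypothetical. The sketch assumes `d` is nonzero, that `>>` on negative signed values is an arithmetic shift, and that conversion of out-of-range values to `int32_t` wraps around (implementation-defined in ISO C, but true on mainstream compilers). Additions that may wrap are done in unsigned arithmetic to avoid signed overflow.

```c
#include <stdint.h>
#include <stdbool.h>

typedef struct {
    uint32_t mul;      /* magic constant m (low 32 bits) */
    uint32_t shift;    /* l - 1 */
    bool negative;     /* true when d < 0 */
} divdata32_t;

divdata32_t precompute32(int32_t d) {
    divdata32_t dd;
    uint32_t d_abs = d < 0 ? 0u - (uint32_t)d : (uint32_t)d;
    /* l = ceil(log2(d_abs)), except that l = 1 when d_abs = 1 */
    uint32_t l = 1;
    while (((uint32_t)1 << l) < d_abs) l++;
    /* m = floor(2^(31 + l) / d_abs) + 1, truncated to 32 bits */
    dd.mul = (uint32_t)((((uint64_t)1) << (31 + l)) / d_abs + 1);
    dd.shift = l - 1;
    dd.negative = d < 0;
    return dd;
}

int32_t fast_divide32(int32_t n, divdata32_t dd) {
    /* Signed product, upper 32 bits */
    int64_t prod = (int64_t)n * (int32_t)dd.mul;
    int32_t hi = (int32_t)(prod >> 32);
    /* Add n to account for the top bit of mul; unsigned add wraps safely */
    int32_t sum = (int32_t)((uint32_t)hi + (uint32_t)n);
    int32_t q = sum >> dd.shift;
    /* n >> 31 is -1 for negative n, so this adds 1 when n < 0 */
    q = (int32_t)((uint32_t)q - (uint32_t)(n >> 31));
    return dd.negative ? (int32_t)(0u - (uint32_t)q) : q;
}
```

A typical use is `divdata32_t dd = precompute32(d);` once, followed by `fast_divide32(n, dd)` for each dividend. The combination `n = INT32_MIN`, `d = -1` has no representable quotient, just as with the built-in division.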

### Compile-time optimization

In this section, I will consider how to generate optimized code for division by compile-time constant signed integers.

Most of the tricks that are applicable to calculate a quotient of unsigned integers efficiently also apply to signed integers, although we might have to do some work to handle negative integers. Of course, a division by one can be ignored and a division by minus one is equivalent to a negation. For some instruction-set architectures, it might be beneficial to implement a special case for big divisors with an absolute value of more than $2^{N-2}$. In this case, the magnitude of the quotient is at most one: the value of the quotient $[\frac{n}{d}]$ is $\text{sgn}(n) \cdot \text{sgn}(d)$ when $|n| \geq |d|$ and zero otherwise.

```
expression_t div_by_const_sint(const sint d, expression_t n) {
    if (d == 1) return n;
    if (d == -1) return neg(n);
    // |d| in unsigned arithmetic, so d = -2^(N - 1) does not overflow
    uint d_abs = d < 0 ? -(uint)d : (uint)d;
    if (is_power_of_two(d_abs)) return div_by_const_signed_power_of_two(n, d);
    return div_fixpoint(d, n);
}
```

Let us first consider the case where $|d| = 2^\ell$ is a power of two. If we do an arithmetic right shift by $\ell$ bits, the result will be correct when $n$ is nonnegative. However, this will round down the quotient when $n$ is negative. In this case we can add $2^\ell - 1$ to $n$ in order to round up.

So, we would like to have an expression which equals $2^\ell - 1$ when $n$ is negative and $0$ otherwise, so we can simply add this to $n$. The value $2^\ell - 1$ consists of $\ell$ consecutive ones in the binary representation. It can be created by doing an arithmetic right shift by $\ell - 1$ bits on $n$, and shifting the result right by $N - \ell$ bits (with a normal right shift). The first shift produces the $\ell$ ones in the $\ell$ most significant bits when $n$ is negative (these are zero bits otherwise), the second shift puts them in the least significant bit positions.

```
expression_t div_by_const_signed_power_of_two(expression_t n, sint d) {
    // |d| in unsigned arithmetic, so d = -2^(N - 1) does not overflow
    uint d_abs = d < 0 ? -(uint)d : (uint)d;
    int l = floor_log2(d_abs);
    // addme equals 2^l - 1 when n is negative and 0 otherwise
    // We need to add this to n to round towards zero.
    expression_t addme = shr(sar(n, constant(l - 1)), constant(N - l));
    expression_t result = sar(add(n, addme), constant(l));
    if (d < 0) result = neg(result);
    return result;
}
```
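The generated expression can also be written out directly. The sketch below does so for $N = 32$ and a positive divisor $2^\ell$ with $1 \leq \ell \leq 31$. The function name is hypothetical, and the code assumes that `>>` on a negative `int32_t` is an arithmetic shift, which is implementation-defined in ISO C but holds on mainstream compilers.

```c
#include <stdint.h>

/* n / 2^l rounded toward zero, for 1 <= l <= 31. */
int32_t div_pow2(int32_t n, int l) {
    /* Arithmetic shift: the top l bits become copies of the sign bit. */
    uint32_t sign_bits = (uint32_t)(n >> (l - 1));
    /* Logical shift: move those l bits to the bottom, giving 2^l - 1 or 0. */
    int32_t addme = (int32_t)(sign_bits >> (32 - l));
    /* Adding 2^l - 1 before the shift rounds negative quotients up. */
    return (n + addme) >> l;
}
```

For a negative divisor $-2^\ell$ the result would additionally be negated, as in the code generator above.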

## References

[1] Division by constant unsigned integers, Ruben van Nieuwpoort, 2020.

[2] Division by Invariant Integers using Multiplication, Torbjörn Granlund and Peter L. Montgomery, 1994.