Division by constant integers
This post is a survey of the methods to optimize integer division.
The methods in this post originate from the classic 1991 paper “Division by Invariant Integers using Multiplication” by Torbjörn Granlund and Peter L. Montgomery. In particular, the paper presents what in this post is called the “round-up method”. This method was further popularized by Henry S. Warren in his 2002 book “Hacker’s Delight”. In the 2005 paper “N-Bit Unsigned Division Via N-Bit Multiply-Add” by Arch D. Robinson presents what in this post is called the “round-down method”. In episode I and episode III of the series of articles cleverly named “The Labor of Division”, ridiculous_fish both independently discovers the round-down method, and clearly describes both methods.
What does this post add to the already existing literature? For one, it serves as a reference, combining the results in a rigorous and structured way. I am presenting the results that power the different methods (theorem 2 and 3) in a slightly simpler and more unified way that is consistent with the original presentation in the paper “Division by Invariant Integers using Multiplication”.
Overview
First, the round-up method is presented, which shows that for every -bit divisor there is a constant so that the upper bits of equal for every -bit unsigned . For certain “uncooperative” divisors the constant will not fit in bits, in which case the method is feasible but less efficient. To improve the efficiency, the round-down method is presented, which is similar to the round-up method but more efficient for uncooperative divisors. Then, we will discuss how the method can be applied to signed integers.
Intuition
To understand why these multiplicative methods work, let’s build some intuition. The idea is to choose an integer constant, , that approximates . This way, effectively represents in binary form. Multiplying by shifts the fractional bits to more significant positions, allowing them to be captured as an integer.
Simply put, since we have , we expect that . So, it isn’t to far-fetched to expect that . In the next section we’ll see under which conditions this equality holds.
If this doesn’t click yet, consider an example in base 10 instead. Pick and instead of , we’ll pick and . Picking we see that . This equals , so indeed we have .
Round-up method
We start with deriving a theorem that shows the method is correct. To prove this theorem, we’ll need the following lemma.
Lemma 1: Suppose that and are integers with , and is a real number. If then .
Proof: We have . After taking the floor of all terms and considering that the right-hand side is an integer, we get , which implies .
Now, we are ready to prove the result that justifies the round-up method.
Theorem 2: Let , , be nonnegative integers with and
then for every integer with .
Proof: Multiplying the inequality by we get . We have , so that . It follows that . By lemma 1, it follows that for all integers with .
So if we can find , satisfying , we can compute as . An that satisfies exists only if and only if the interval contains a multiple of .
If , the interval will contain or more successive integers and we can be certain that the interval contains a multiple of . Since is just the first multiple of that is at least as large as , we can always set ; if there an exists that satisfies , this will do so too.
Corollary 3: If and , then for every integer with .
However, we would like to have so that fits in an -bit word, but this requires . So, this we can’t guarantee this, and indeed we see that for some divisors there only exist , with .
So, concretely, how can we find and ? First we pick and set . For calculating this way we have , so we have to check that it satisfies .
We can also see that if is even, we can divide we can divide all terms in by two and get the valid equation . So if is even we can replace by and by , and will still hold. In some cases, this allows us to find a smaller constant that still satisfies .
We will call a divisor for which an , satisfying exists cooperative, because they allow us to easily compute the product for -bit unsigned integers . Divisors that are not cooperative are called uncooperative; for them it is more tricky and less efficient to calculate the product . We will consider both cases in the following subsections.
Cooperative divisors
To make the last section more concrete, let’s optimize the following function.
uint32_t div9(uint32_t n) {
return n / 9;
}
Using the strategy outlined in the previous section, we set and compute . Working modulo we see that equals . We have , so the test succeeds and is a cooperative divisor.
However, we see that is even. We can divide it by two to get , which is also even. Dividing by two once more gives . Adjusting for this, we only need to shift right by bits, instead of .
So the optimized version of the function is
uint32_t div9(uint32_t n) {
return (((uint64_t)954437177) * n) >> 33;
}
Indeed, GCC compiles div9
and div9_opt
to the same assembly. For example in x86_64
we get:
div9:
mov eax, edi
mul rax, rax, 954437177
shr rax, 33
ret
For 32-bit instruction sets, the shift is usually implemented by taking the high 32-bit word of the full 64-bit and performing a right shift by one on that. For example, in 32-bit RISC-V:
div9:
li a5, 954437177
mulhu a0, a0, a5
srli a0, a0, 1
ret
Uncooperative divisors
Let’s take the function from the last section but instead divide by :
uint32_t div7(uint32_t n) {
return n / 7;
}
Doing the same, we compute , . Now, we compute to be . So the test fails and and we see that is an uncooperative divisor.
This means we can’t use the value we just calculated for and instead we have to set . We get . This is a 33-bit value and we have to resort to some tricks to evaluate without overflow.
A first idea is to set and compute the product in two 32-bit words as . However, this doesn’t work since in general the product of a 32-bit number and a 33-bit number does not fit in 64 bits. So we need to take care that the addition of to does not overflow.
This can be done by evaluating as ((((mn >> N) - n) >> 1) + n) >> (l - 1)
, where mn
is the full -bit product.
So the optimized version of the function is
uint32_t div7(uint32_t n) {
// compute the high word of the product m' * n
uint32_t mn = (((uint64_t)613566757) * n) >> 32;
// compute (m * n) >> 1 in an overflow-safe way
uint32_t mn2 = ((n - mn) >> 1) + mn;
// shift right by l - 1 bits
return mn2 >> 2;
}
Indeed, GCC gives the same assembly. In x86_64
:
div7:
mov eax, edi
imul rax, rax, 613566757
shr rax, 32
sub edi, eax
shr edi
add eax, edi
shr eax, 2
ret
Or in RISC-V
:
div7:
li a5, 613566757
mulhu a5, a0, a5
sub a0, a0, a5
srli a0, a0, 1
add a0, a0, a5
srli a0, a0, 2
ret
Even uncooperative divisors
Suppose we have an even uncooperative divisor of the form , so that . We can implement a division by as a division by – which can be implemented efficiently with a bit shift – followed by a division by . Now, will be a uncooperative divisor too, but since the dividend will only be an -bit number, the constant will fit in bits. This only costs us a single bit shift, which is more efficient than implementing the overflow-avoiding product from last section.
As an example, consider the following function:
uint32_t div28(uint32_t n) {
return n / 28;
}
Following the procedure as before, we set and test . We see that so the test fails and 28 is an uncooperative divisor.
So we note that we have an even uncooperative divisor and set . We now need to divide , a -bit number, by . If would be a cooperative divisor, then would be too, so we know that is an uncooperative divisor. So we set , and get .
Note that is even, so we can use instead and decrease . We get , , . The optimized version of the function is as follows:
uint32_t div28(uint32_t n) {
n = n >> 2;
return (((uint64_t)613566757) * n) >> 32;
}
TODO: add godbolt links
In x86_64
assembly:
div28:
mov eax, edi
shr eax, 2
imul rax, rax, 613566757
shr rax, 32
ret
In RISC-V
:
div28:
li a5, 613566757
srli a0, a0, 2
mulhu a0, a0, a5
ret
Round-down method
The following theorem powers the round-down method. I present the result in a form analogous to theorem 2.
Theorem 3: Let , , be nonnegative integers with and
then for every integer with .
Proof: Multiplying the inequality by we get . Since we have and the condition of lemma 1 is satisfied. So we have for all integers with .
The computation of is very similar to the round-up method: We take and compute . The difference is of course that we round down instead of up (which can hardly be a surprise, given the name of the method). So it’s only the increment that can overflow.
For the evaluation of the expression we do need to be careful to avoid overflows again. It can happen that , and in this case naively computing will overflow. The product needs to be computed with a “widening” multiplication as before (sometimes you can get away with a “compute high word of product” instruction, if the target architecture has one).
Possible strategies to avoid overflow include:
- Compute the full -bit product (not just the high -bit word), and add to that.
- Use a saturating increment on . The implementation is architecture-dependent, but it can usually be done with an increment, followed by a single instruction that subtracts the overflow bit.
In most cases the second approach seems better. The exception is when the division is done on -bit integers but the target architecture is -bit (typically this happens when dividing 32-bit integers on a 64-bit procesors). In this case, we can just use a -bit word to store without worrying about overflow.
Correctness of the saturating-increment approach
When we use the saturating-increment approach, we calculate for every . For , incrementing would lead to an overflow, and we don’t increment. This means that instead of we calculate . I claim that when is an uncooperative divisor.
We only have when is a divisor of . But in this case, we have , which means that equals modulo . So divides , and the condition for theorem 2 holds.
Example
Checking for overflow is a bit messy in pure C. I’ll assume we use GCC and use the __builtin_uadd_overflow
builtin for this. Details can be found here. To check for overflow, we use it like this:
// add unsigned integers x and y
if (__builtin_uadd_overflow(x, y, &result)) {
// there was an overflow, result equals the low 32 bits of x + y
} else {
// there was no overflow, result equals x + y
}
We’ll now implement an optimized version of the following function.
uint32_t div7(uint32_t n) {
return n / 7;
}
With we compute . We test if is a cooperative divisor by checking if with . The left-hand side is 5 and the right-hand side is 4, so is an uncooperative divisor.
So we use the round-down method and set so that .
Now, as discussed in the previous section, when we have so that we can use a saturating increment to compute when and otherwise.
So we can write an optimized version:
uint32_t div7(uint32_t n) {
if (__builtin_uadd_overflow(n, 1, &n)) {
n--;
}
return (((uint64_t)2454267026) * n) >> 34;
}
RISC-V:
div7:
# saturating increment a0
sltiu a5,a0,-1
add a0,a5,a0
# get high word of m * (n + 1)
# (or m * n when n = 2^32 - 1)
li a4, 2454267026
mulhu a0,a0,a4
# shift right
srli a0,a0,2
ret
In x86_64 we get
div7:
# move first arg (n) to eax
mov eax, edi
# saturating increment on n
add eax, 1
sbb eax, 0
# get high word of m * (n + 1)
# (or m * n when n = 2^32 - 1)
mov edx, 2454267026
mul rax, rdx
# shift right
shr rax, 34
ret
Since x64_64
is a 64-bit architecture, we can use a 64-bit word for n + 1
without having to worry about overflow.
uint32_t div7(uint32_t n32) {
uint64_t n = n32;
return (((uint64_t)2454267026) * (n + 1)) >> 34;
}
This makes the resulting assembly ever so slightly more elegant (and I suspect more efficient as well, although I haven’t benchmarked it):
div7:
# move first arg (n) to eax/rax
mov eax, edi
# increment rax/n
add rax, 1
# load m * (n + 1) into rax
mov edx, 2454267026
mul rax, rdx
# shift rax right
shr rax, 34
ret
Signed integers
For nonnegative signed integers, we can use the same expression as we did for the round-up method. In fact, since the maximum magnitude of signed integers is smaller than the maximum magnitude of unsigned integers, it is the case that for every divisor we can find an -bit magic number such that
for all integers with . Proving this is straightforward using a variation of the proof of theorem 2.
Theorem 4: Let , , be nonnegative integers with and
then for every integer with .
Proof: When the result is trivial. When , we can multiply the inequality by to get . We have , so that . It follows that . By lemma 1, it follows that for all integers with .
But how we do handle the case where the quotient is negative? In most programming languages, integer division rounds towards zero. That means that when the quotient is negative, the expression should be rounded up instead of down.
Another subtlety that we have to handle, is that is an unsigned number while is a signed number. Processors generally have instructions to multiply unsigned integers or signed integers, but not to multiply a signed integer by an unsigned one. So we have to figure out how we can evaluate this product.
For now, I will assume that the divisor is positive. I will address the case where is negative later – it will turn out to be quite simple to handle.
Rounding up instead of down
We have the identity when and are integers. Using this, we can evaluate as .
If is not a power of two we have
Since and are consecutive multiples of , cannot be one. So also cannot be a multiple of , and for an -bit signed integers , can not be a multiple of , given that has the highest number of two’s in its factorization.
So is not an integer and we have
when is not a power of two. This last expression is potentially more efficient to evaluate.
So in practice we want to compute
when is not a power of two. An efficient way of adding is by subtracting n >> (N - 1)
. This expression will be when is negative, and otherwise. By swapping the operands of the subtraction, we can negate the result “for free”.
When is a power of two this will not work, and we need to explicitly negate the result. This can be done in a single instruction on most architectures.
Computing the product of a signed and an unsigned integer
If we set , the value with is an -bit unsigned integer with the most significant bit set. When is even, we can use and use
This is easier to evaluate since we can now simply compute as a signed product.
When is odd, we do not have this option. Let be the value obtained by interpreting the bits of (represented as an unsigned integer) as a signed integer. That is, if , then
So we have so that
That is, we can use a signed product to evaluate , and add to the high word of the product. The result will be .
Examples
Example 1: Dividing by 7
For a simple example, consider the following function:
int32_t div7(int32_t n) {
return n / 7;
}
With we find , . This is an odd number, so we can’t simplify. Since is a 32-bit number, we’ll have to evaluate the high word of the product mn_hi
as mn_signed_hi + n
. Here, mn_signed_hi
is the product of and , and is the interpretation of as a signed word. When we interpret as a signed number, we get , so this is what we use in the multiplication, instead of the unsigned value of .
We proceed by shifting right the rest of the bits. We have already shifted the product by bits, so we need to shift right by additional bits. Finally, we correct the rounding by adding n >> 31
.
All in all, we get
int32_t div7(int32_t n) {
int32_t m = -1840700269;
// evaluate the high word of m * n
int32_t mn_signed_hi = ((int64_t)m * n) >> 32;
int32_t mn_hi = mn_signed_hi + n;
// shift remaining bits and correct rounding
return (mn_hi >> 2) - (n >> 31);
}
This results in the following assembly:
div7:
# move n to ecx
mov ecx, DWORD PTR [esp+4]
# move m to eax
mov eax, -1840700269
# move m*n to edx:eax
imul ecx
# add n and store in eax
add eax, [edx+ecx]
# shift right by two and add (n << 31)
sar ecx, 31
sar eax, 2
sub eax, ecx
ret
Also, GCC sometimes prefers to use a lea
instruction instead of a simple addition. This is the only difference between the assembly generated by the two C functions above.
To divide by -7 we can now simply swap the operands of the subtraction in the return
statement. However, GCC seems to have difficulties generating efficient assembly for this, and the resulting assembly is definitely not equivalent to the assembly that results from compiling int32_t div7(int32_t n) { return n / -7; }
.
Example 2: Dividing by 9
Now, consider the following function:
int32_t div9(int32_t n) {
return n / 9;
}
Setting , we get . This is even so we can decrease to and get . This is again even so we can decrease once more to get and we get .
This fits in a signed 32-bit integer and we can proceed as before, except that we can use a simple signed product to evaluate .
int32_t div9(int32_t n) {
int32_t m = 954437177;
// evaluate the high word of m * n
int32_t mn_hi = ((int64_t)m * n) >> 32;
// shift remaining bits and correct rounding
return (mn_hi >> 2) - (n >> 31);
}
Example 3: Dividing by 4096
Consider the following function:
int32_t div4096(int32_t n) {
return n / 4096;
}
If we divide by a power of two we simply need to add d - 1
to n
to fix the rounding. Otherwise, the division is a simple shift:
int32_t div4096(int32_t) {
if (n < 0) {
n += 4095;
}
return n >> 12;
}
In practice, GCC compiles both versions to the same assembly which uses a cmovs
instruction (conditional move when the sign bit is not set):
div4096:
# move n to eax
mov eax, DWORD PTR [esp+4]
# add 4095 to eax if it is negative
test eax, eax
lea edx, [eax+4095]
cmovs eax, edx
# arithmetic shift right by 12 bits
# (i.e. signed divide by 4096)
sar eax, 12
ret