lua-users home
lua-l archive


>> idiv r64 is 56 uops [...]
>> div r64 is 33 uops [...]

> Precisely. As the comment there explains: "If integer fits as a
> non-negative int, compute an int remainder, which is faster.
> Otherwise, use an unsigned-integer remainder, which uses all bits and
> ensures a non-negative result."

...*if* you're on the hardware the original code was written for.

I didn't see the original post.  The quote above, though, makes it look
as though this is talking about hashint() from ltable.c.
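
For readers without the source to hand, here is a minimal sketch of the
pattern the quoted comment describes.  It is not the actual ltable.c
code: the names sketch_Integer, sketch_Unsigned and bucket_for() are
hypothetical, and fixing the integer type at 64 bits is an assumption
(real Lua chooses its integer types in luaconf.h).

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

/* Hypothetical stand-ins for lua_Integer / lua_Unsigned (assumed 64-bit). */
typedef int64_t  sketch_Integer;
typedef uint64_t sketch_Unsigned;

/* Map an integer key to a bucket index in [0, nbuckets). */
unsigned int bucket_for(sketch_Integer key, unsigned int nbuckets)
{
    sketch_Unsigned ukey = (sketch_Unsigned)key;  /* well-defined wraparound */

    assert(nbuckets > 0 && nbuckets <= (unsigned int)INT_MAX);

    if (ukey <= (sketch_Unsigned)INT_MAX) {
        /* Key fits as a non-negative int: use a plain int remainder. */
        return (unsigned int)((int)ukey % (int)nbuckets);
    }
    /* Otherwise use an unsigned remainder over all bits of the key;
       the result is non-negative by construction. */
    return (unsigned int)(ukey % nbuckets);
}
```

With these definitions, bucket_for(10, 7) is 3, and bucket_for(-1, 7)
is 1, since -1 converts to 2^64 - 1 and that is congruent to 1 mod 7.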

As for the specific question in the Subject:, it probably is because,
on some machine the author was familiar with, the comment is accurate:
% on signed int is faster than % on whatever lua_Unsigned was
configured to map to when the measurement was done.  (This is
particularly likely to be true on a 32-bit machine with 64-bit

Signed % is not necessarily faster than unsigned %; this is severely
hardware- (and in some cases compiler-, or compiler-flags-, or even
OS-) dependent.  There are likely machines where unsigned % is _faster_
than signed.  (I know of no specific example offhand, but then I know
timings for no hardware offhand; given the variety of hardware, I'd be
astonished if there weren't any.  The comment gives no reason to think
its author surveyed even the hardware existing at the time to back that
statement.)  The URL specifically says x86_64 and thus is
probably[$] not relevant unless the original post was specifically
asking about x86_64 (and, even then, given that it's giving precise
micro-op counts, it is probably[$] correct for at most only one
particular implementation of x86_64).

[$] I haven't gone through the hoop-jumping necessary to actually get a
copy of that page.

There are also (no handwave here - I've seen such spec sheets) machines
where the speed of % depends more on the particular divisor in question
than on whether it's signed or unsigned.  There certainly are systems
without hardware division where software division acts that way.  There
are likely also machines where the cost of the test and conditional
outweighs any speed difference in %.  This is particularly likely to be
true on very-high-end machines; they tend to have fast division and
heavy penalties for mis-predicted branches - indeed, measuring the
speed of that code on higher-end hardware is likely to depend on branch
prediction accuracy more than on % speed.
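
To make the trade-off concrete: for a positive divisor that fits in an
int, the test-and-branch form and a plain unsigned remainder give the
same answer for every key, so choosing between them is purely a matter
of measuring on the target machine.  The sketch below checks that
equivalence over a mix of small and large keys.  All names here are
hypothetical, 64-bit keys are an assumption, and it deliberately does no
timing, since any numbers would apply only to the machine they were
taken on.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

/* Variant with the test and conditional: int remainder when the key fits. */
unsigned int mod_branching(uint64_t ukey, unsigned int m)
{
    assert(m > 0 && m <= (unsigned int)INT_MAX);
    if (ukey <= (uint64_t)INT_MAX)
        return (unsigned int)((int)ukey % (int)m);
    return (unsigned int)(ukey % m);
}

/* Variant without the branch: always an unsigned remainder. */
unsigned int mod_unsigned_only(uint64_t ukey, unsigned int m)
{
    assert(m > 0);
    return (unsigned int)(ukey % m);
}

/* Count keys on which the two variants disagree (expected: none). */
int count_mismatches(unsigned int m)
{
    uint64_t x = 88172645463325252ULL;  /* arbitrary nonzero xorshift seed */
    int mismatches = 0;
    int i;

    for (i = 0; i < 100000; i++) {
        uint64_t key;
        /* xorshift64 step: cheap pseudo-random 64-bit values */
        x ^= x << 13;
        x ^= x >> 7;
        x ^= x << 17;
        /* Alternate full-width keys with keys shifted down to 31 bits,
           so that both paths of mod_branching() get exercised. */
        key = (i & 1) ? x : (x >> 33);
        if (mod_branching(key, m) != mod_unsigned_only(key, m))
            mismatches++;
    }
    return mismatches;
}
```

Wrapping either variant in a timing loop, with keys that are all small,
all large, or unpredictably mixed, is one way to see how much of the
measured cost is the branch rather than the %.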

But there are other nonportabilities too, more severe than just
performance issues; examples include log2maxs() and point2int()....

/~\ The ASCII				  Mouse
\ / Ribbon Campaign
 X  Against HTML
/ \ Email!	     7D C8 61 52 5D E7 2D 39  4E F1 31 3E E8 B3 27 4B