- it just depends on an integer log2 function, which is easily implemented by reading bits directly from the exponent field of the "double", using only integer arithmetic (see the sketch after this list).
- it avoids all data-dependent tests (and a modulo 2 is just reading a single bit)
- it can be efficiently implemented in hardware (in an FPU) to support x^x for any non-integer floating-point value of x, with the best precision and excellent speed (and instead of the 6 recursive calls needed for IEEE 64-bit doubles, you just need 6 stages of arithmetic gates; for IEEE 32-bit floats you need only 5 stages, and for 128-bit long doubles, 7 stages)
- it can be implemented in high-precision math libraries, in any language: the cost will be proportional to log2(precision bits wanted for the result) if the number of correct bits doubles at each recursion, or linearly proportional to the precision bits if you gain a single bit of precision at each stage. It is also naturally immune to timing attacks, since the running time is predetermined by the precision required in the result.
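To make the first two points concrete, here is a minimal C sketch (the function name is mine; it assumes a positive, normalized IEEE 754 double and ignores zeros, subnormals, infinities and NaNs):

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* floor(log2(x)) for a positive, normalized double, read directly from
 * the 11-bit exponent field of the IEEE 754 bit pattern: no loop, no test.
 * (And "modulo 2" of an integer n is just (n & 1) -- a single bit read.) */
static int ilog2_double(double x) {
    uint64_t bits;
    memcpy(&bits, &x, sizeof bits);             /* safe type-punning */
    return (int)((bits >> 52) & 0x7FF) - 1023;  /* remove the exponent bias */
}

int main(void) {
    printf("%d %d %d\n", ilog2_double(1.0), ilog2_double(10.0),
           ilog2_double(0.3));                  /* prints: 0 3 -2 */
    return 0;
}
```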
And similar tricks can probably be used to implement various one-argument math primitives (sine, cosine, log/ln, tan, log10, exp...) that can be approximated with arbitrary precision and excellent performance, over a known range of source values where the function is strictly monotonic (strictly increasing or decreasing, so that convergence is guaranteed), using a binary search and some bit fiddling (like the modulo 2) as I described, so that their execution time (or number of stages in hardware) depends only on the needed precision (and will be constant for the common fixed precisions used by all IEEE floats). The number of precision bits you gain at each step depends only on the value of the derivative on that range (which should be > 1/2 so that you gain at least 1 bit at each step; if the lower bound of the derivative is higher, you get more bits at once and need fewer stages to reach the final result). A minimal sketch of that scheme follows.
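Here is that scheme in C, reduced to its core (the names and the cube-root example are mine; for clarity the loop uses an ordinary branch, where a hardened constant-time version would replace it with a bit mask):

```c
#include <stdio.h>

/* Invert a strictly increasing function f on [lo, hi] by binary search.
 * The loop runs a fixed number of passes (one result bit per pass), so
 * the running time depends only on the precision asked for, never on
 * the input value. */
static double invert_monotonic(double (*f)(double), double target,
                               double lo, double hi, int bits) {
    for (int i = 0; i < bits; ++i) {
        double mid = 0.5 * (lo + hi);
        if (f(mid) < target)   /* plain branch, kept for readability */
            lo = mid;
        else
            hi = mid;
    }
    return 0.5 * (lo + hi);
}

static double cube(double y) { return y * y * y; }

int main(void) {
    /* Example: cube root of 10 on [1, 4]; 53 passes saturate a double. */
    printf("%.17g\n", invert_monotonic(cube, 10.0, 1.0, 4.0, 53));
    return 0;
}
```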
All the rest of the implementation is determining a suitable range where the function is monotonic and has the best derivative values; everything else uses only integer arithmetic operations (addition/subtraction, multiplication/negation, Euclidean division/remainder) and binary ops which are even simpler in hardware (bitand, bitor, bitxor, bitnot, bitshifts).
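As an illustration of that "integer-only" claim (my own example, not the x^x solution itself), the classic bit-by-bit integer square root is the same binary-search idea: it decides one result bit per pass using nothing but additions/subtractions, comparisons and shifts, in a fixed number of passes:

```c
#include <stdint.h>
#include <stdio.h>

/* Integer square root as a binary search on the bits of the result:
 * only subtraction, comparison and shifts, and always exactly 32
 * passes (one per result bit), so the timing is constant. */
static uint32_t isqrt64(uint64_t n) {
    uint64_t res = 0;
    for (uint64_t bit = 1ULL << 62; bit != 0; bit >>= 2) {
        if (n >= res + bit) {
            n   -= res + bit;
            res  = (res >> 1) + bit;   /* this result bit is a 1 */
        } else {
            res >>= 1;                 /* this result bit is a 0 */
        }
    }
    return (uint32_t)res;
}

int main(void) {
    printf("%u\n", isqrt64((uint64_t)1 << 62));  /* prints 2147483648 */
    return 0;
}
```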
So this "optimization" problem is not "stupid" but is a very serious one that can find many practical applications. I wonder if such technics are not already patented by chip makers like Intel, IBM, AMD, or by math library and tools providers (like Mathlab or proprietary Fortran and C/C++ libraries optimized for perfomance, precision and security). But if not, I revendicate the authorship to open it to the world... unless someone demonstrates that he found this trick before (may be the author of this problem already knows that solution and its possible important applications?)