lua-l archive



> Why use powers of ten?

I used 1000000000 because:
a) it is below 2^32, so it works on 32-bit systems (sizeof(unsigned long)==4)
b) a decimal number is easier to print (C printf with %u%09u) without even using 64-bit specifiers
c) it's big enough that there is only about one debug hook call per minute (depending on your system, of course), so the performance impact is small
d) it worked very well in the tests, and I prefer to ship a good product today over a perfect product tomorrow
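Point b) can be illustrated with a minimal C sketch. The helper name format_count is hypothetical; it assumes the count is kept as two parts, count = billions * 1000000000 + rest with rest < 1000000000 (for example billion_steps and 1000000000 minus the current hook count), each of which fits in an unsigned long even on 32-bit systems:

```c
#include <stdio.h>
#include <stddef.h>

/* Hypothetical helper: formats count = billions * 1000000000 + rest
   (rest < 1000000000) as decimal text, using only 32-bit-safe
   printf conversions (no 64-bit specifier needed).
   Returns the number of characters written, as snprintf does. */
static int format_count(char *buf, size_t size,
                        unsigned long billions, unsigned long rest)
{
    if (billions == 0)
        return snprintf(buf, size, "%lu", rest);  /* no leading zeros */
    /* high part as-is, low part zero-padded to exactly 9 digits */
    return snprintf(buf, size, "%lu%09lu", billions, rest);
}
```

The billions == 0 branch matters: printing "%lu%09lu" unconditionally would emit leading zeros, e.g. "0000000123" for a count of 123.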

These design choices I made for my system and my requirements may not be the optimal choices for other systems and different requirements.

Another interesting performance indicator could be to use clock_gettime(CLOCK_THREAD_CPUTIME_ID)/GetThreadTimes().
Or even a CPU performance counter, if the product only ships on one particular CPU ... but that's all beyond the scope of Lua.
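For the thread CPU time idea, a minimal POSIX-only sketch could look like this. thread_cpu_ns is a hypothetical helper name; on Windows, GetThreadTimes() would take its place:

```c
#define _POSIX_C_SOURCE 200809L   /* expose clock_gettime() and CLOCK_THREAD_CPUTIME_ID */
#include <stdint.h>
#include <time.h>

/* Hypothetical helper: CPU time consumed so far by the calling thread,
   in nanoseconds, or UINT64_MAX if the clock is unavailable. */
static uint64_t thread_cpu_ns(void)
{
    struct timespec ts;
    if (clock_gettime(CLOCK_THREAD_CPUTIME_ID, &ts) != 0)
        return UINT64_MAX;
    /* combine seconds and nanoseconds into one 64-bit value */
    return (uint64_t)ts.tv_sec * 1000000000u + (uint64_t)ts.tv_nsec;
}
```

Reading it before and after a lua_pcall() gives the CPU time the thread spent in that call, including time spent inside C functions called from Lua, which an instruction count does not reflect.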


On Mon, May 3, 2021 at 6:23 PM Flyer31 Test <flyer31@googlemail.com> wrote:
Hi Philippe,
In principle I agree with you; I am often very sparing with any calculations, even with floating-point usage. But these new STM32H7 parts are SO fast (the STM32G4 too) that it really no longer makes sense to think about such things for too long.

Only on the "really small" parts like the STM32G0 (Cortex-M0) can you already be happy if a hardware division instruction exists at all. But for now, for God's sake, I have NO ambition or need to implement something like user scripting / Lua support on these small Cortex-M0 parts :).


On Mon, May 3, 2021 at 5:22 PM Philippe Verdy <verdyp@gmail.com> wrote:
Why use powers of ten? Isn't multiplication by powers of two faster (using 2^30 = 1073741824, the power of 2 nearest to the largest power of 10 representable as a positive signed 32-bit value)?

And on small platforms without native 64-bit integers and without a barrel shifter, it may be even faster to use 2^24 = 16777216 (this allows various compiler optimizations using only 8-bit, 16-bit or 32-bit registers, without requiring any shifting). The purpose is to minimize the cycles spent in such debugging calls, so that instrumenting and profiling apps stays accurate. Such micro-optimizations usually don't matter, except for instrumentation and debugging: if you need to count millions of events, every cycle spent in each call adds to the final time, and you want the instrumented run to deviate as little as possible from the non-debug version so that you select the right hotspots.

If you use it, for example, to measure the number of opcodes processed by the Lua VM, such a hook will be called extremely often, and you may want to avoid costly processing units like ALUs or barrel shifters in your debugging code, as these units can be delayed in the processor's internal pipelines, causing additional wait cycles. This applies even to modern multicore CPUs that have fewer ALUs than pipelines or cores (e.g. with Intel and AMD "hyperthreading"), or when running on some GPUs, even if they have fast ALUs. Using a multiple of 8 bits for the shift count also helps avoid compiling loops with conditional counters, since the shift can easily be inlined (without generating many native instructions in the processor's code cache).

Now on STM32, at least, native 32-bit code will be used, and 64-bit arithmetic is limited to just the addition in getcount(), which can be implemented in pure 32-bit code with 3-4 instructions (and without a costly call to an external arithmetic support library needed by the C compiler, which will attempt to inline the generated code as much as possible).
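The power-of-two suggestion can be sketched as plain arithmetic. This is an illustration, not code from the thread: PERIOD_SHIFT, HOOK_PERIOD, total_count_pow2 and total_count_dec are hypothetical names, periods stands for the overflow counter incremented by the hook, and remaining stands for the value returned by the lua_getcurrenthookcount() extension discussed in this thread:

```c
#include <stdint.h>

/* Hypothetical constants: the hook would be installed with
   lua_sethook(L, count_hook, LUA_MASKCOUNT, HOOK_PERIOD).
   2^30 still fits in lua_sethook's int count parameter. */
#define PERIOD_SHIFT 30u
#define HOOK_PERIOD  (1ul << PERIOD_SHIFT)   /* 1073741824 */

/* periods:   number of times the count hook has fired
   remaining: instructions still missing before the next hook call */
static uint64_t total_count_pow2(uint32_t periods, uint32_t remaining)
{
    /* periods * 2^30 becomes a shift; the low part is one subtraction */
    return ((uint64_t)periods << PERIOD_SHIFT) + (HOOK_PERIOD - remaining);
}

/* The decimal variant used in the thread, for comparison:
   needs a 64-bit multiplication by 1000000000. */
static uint64_t total_count_dec(uint32_t periods, uint32_t remaining)
{
    return (uint64_t)periods * 1000000000u + (1000000000u - remaining);
}
```

The trade-off is point b) from the reply above: a total built from 2^30 periods no longer splits into decimal digits at the period boundary, so printing it needs either a 64-bit printf specifier (e.g. PRIu64 from <inttypes.h>) or a 64-bit division.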

On Mon, May 3, 2021 at 4:06 PM Flyer31 Test <flyer31@googlemail.com> wrote:
Wow, very helpful, thank you.

Concerning my Windows software, I am probably too lazy at the moment to give up using the DLL (and as long as I use a DLL, I would like to stick to the "common Lua DLL").

But I will now also do an IoT application for STM32 controllers (32-bit), and there I will quite surely implement this (for that application I use my "own Lua subset" anyway). I think it is generally quite nice and useful to get the hook count more or less "exactly", as the user can then quite easily see how stable his Lua runs... some users will like this, I am sure.

On Sun, May 2, 2021 at 7:23 PM bel <bel2125@gmail.com> wrote:
Meanwhile I found a code snippet that uses a tiny extension to ldebug.c:

LUA_API int lua_getcurrenthookcount(lua_State *L) {return L->hookcount;}

This is the only change to the Lua core; the remaining code uses the official API:

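#include <stdint.h>
#include "lua.h"

LUA_API int lua_getcurrenthookcount(lua_State *L); /* the ldebug.c extension above */

struct tUserData {           /* your userdata structure (ud passed to lua_newstate) */
  uint64_t billion_steps;    /* counter for the hook overflows */
};

static void count_hook(lua_State *L, lua_Debug *ar) {
  void *ud;
  (void)ar;
  lua_getallocf(L, &ud);                            /* get the ud pointer from lua_newstate */
  struct tUserData *UD = (struct tUserData *)ud;
  UD->billion_steps++;                              /* one more billion instructions done */
  lua_sethook(L, count_hook, LUA_MASKCOUNT, 1000000000); /* register handler again */
}

uint64_t getcount(lua_State *L) {
  void *ud;
  lua_getallocf(L, &ud);                            /* same ud lookup as in the hook */
  struct tUserData *UD = (struct tUserData *)ud;
  unsigned long l = (unsigned long)lua_getcurrenthookcount(L); /* missing for the next billion */
  return UD->billion_steps * 1000000000u + 1000000000u - l;      /* total */
}

/* install once, e.g. right after lua_newstate(): lua_sethook(L, count_hook, LUA_MASKCOUNT, 1000000000); */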


I don't know if this is accessible without any changes to the Lua core, but this change is minimal.
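To check that the getcount() formula is exact, a Lua-free simulation can model the countdown with a small period. It assumes, as the snippet does, that the hook count is decremented once per VM instruction and that, when it reaches zero, the hook fires and the countdown restarts from the full period. sim_state, sim_step, sim_getcount and SIM_PERIOD are hypothetical names:

```c
#include <stdint.h>

#define SIM_PERIOD 1000u   /* small stand-in for 1000000000 so the simulation runs fast */

/* Hypothetical model of the two pieces of state the snippet relies on. */
struct sim_state {
    uint32_t hookcount;      /* what lua_getcurrenthookcount() would return */
    uint64_t billion_steps;  /* incremented by count_hook */
};

/* Models one VM instruction: decrement the countdown; on reaching zero,
   restart it from the full period and run the "hook". */
static void sim_step(struct sim_state *s)
{
    if (--s->hookcount == 0) {
        s->hookcount = SIM_PERIOD;   /* countdown restarts */
        s->billion_steps++;          /* what count_hook does */
    }
}

/* Same formula as getcount(), with SIM_PERIOD in place of 1000000000. */
static uint64_t sim_getcount(const struct sim_state *s)
{
    return s->billion_steps * SIM_PERIOD + SIM_PERIOD - s->hookcount;
}
```

Starting from { SIM_PERIOD, 0 }, sim_getcount() equals the number of sim_step() calls made so far, including right at the period boundaries.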


On Sun, May 2, 2021 at 6:19 PM Flyer31 Test <flyer31@googlemail.com> wrote:
Hi Bel,
For the past few weeks I somehow always had problems replying on this mailing list, but now it looks like it might work.

... this is what I am doing now: I use a debug hook with n=100 and count these hook events (one per 100 instructions). Generally that is quite OK; I just thought it might be nicer to give this value more exactly. But I will survive an accuracy of ±100 instructions, that is "quite OK". Anyway, I am so impressed and happy with how flawlessly Lua works together with my software, and especially with the really smart and nice error info possibilities, that I cannot really complain about Lua. (I compare it to JavaScript, which I used before: it was really bloated, and it was really difficult for me to give users nice error messages in case of script errors.)

On Sat, May 1, 2021 at 6:11 PM bel <bel2125@gmail.com> wrote:


What about using a hook with a high count value and reading the hook count before and after your calls?



On Sat, May 1, 2021 at 1:37 PM nerditation <nerditation@outlook.com> wrote:
On 2021/4/14 22:47, Flyer31 Test wrote:
> Hi,
>
[...]
>
> Is it possible to read out the current value of this Lua command counter in an easy way in C or Lua?
>

if by "command counter" you mean the number of VM instructions (or OPCODEs), the answer is no.
one obvious reason there is no such API is that the VM is an "implementation detail", and as far
as I know, Lua does not keep track of it at all.

if you insist on using it, you must patch the VM and do it yourself, but I strongly advise against this idea,
and I don't think this counter would make much sense anyway.

the VM OPCODEs are designed to be easily mapped to semantics of the Lua language, and easy to implement
efficiently. it is not meant to be used as performance measurement.

because some of the OPCODEs represent very simple computations, such as a single arithmetic addition
or logical shift, while other OPCODEs might represent quite complex (thus resource-heavy) operations,
such as the generic `for` loop or creating a closure, the count of OPCODEs executed/interpreted
by the VM is meaningless for most users.