- Subject: Re: Towards a faster interpreter
- From: Sean Conner <sean@...>
- Date: Thu, 8 Dec 2016 15:36:24 -0500
It was thus said that the Great Dibyendu Majumdar once stated:
> On 8 December 2016 at 18:46, Sean Conner <sean@conman.org> wrote:
> > It was thus said that the Great Roberto Ierusalimschy once stated:
> >> > I have been thinking of ways of making the Ravi interpreter faster
> >> > without having to resort to assembly code etc. So far one of the areas
> >> > I have not really looked at is how to improve the bytecode decoding.
> >> > The main issue is that the operands B and C require an extra bit
> >> > compared to operand A, so we cannot make them all 8 bits...
> >>
> >> Given a 32- or 64-bit machine, why decoding 8-bit operands would be
> >> faster/better than 9-bit ones?
> >
> > I would think less shifting, but not being 100% sure, I decided to test my
> > assumptions. I wrote:
> >
> > void split8(unsigned int *dest,unsigned int op)
> > {
> >   dest[0] = (op >> 24) & 0xFF;
> >   dest[1] = (op >> 16) & 0xFF;
> >   dest[2] = (op >>  8) & 0xFF;
> >   dest[3] = (op      ) & 0xFF;
> > }
>
> Perhaps one of the masks could be eliminated?
Well, here's the assembly (gcc -O3 -fomit-frame-pointer) for the above:

   0:   89 f0                   mov    eax,esi
   2:   48 89 f2                mov    rdx,rsi
   5:   c1 e8 18                shr    eax,0x18
   8:   89 07                   mov    DWORD PTR [rdi],eax
   a:   89 f0                   mov    eax,esi
   c:   81 e6 ff 00 00 00       and    esi,0xff
  12:   c1 e8 10                shr    eax,0x10
  15:   89 77 0c                mov    DWORD PTR [rdi+0xc],esi
  18:   25 ff 00 00 00          and    eax,0xff
  1d:   89 47 04                mov    DWORD PTR [rdi+0x4],eax
  20:   0f b6 c6                movzx  eax,dh
  23:   89 47 08                mov    DWORD PTR [rdi+0x8],eax
  26:   c3                      ret

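The compiler dropped the explicit AND for two of the fields: op >> 24 needs
no mask after the shift, and (op >> 8) & 0xFF collapsed into a single movzx
from dh. In the source the masks are all there for clarity. Using bit-fields
generated this code: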

   0:   40 0f b6 c6             movzx  eax,sil
   4:   48 89 f2                mov    rdx,rsi
   7:   89 07                   mov    DWORD PTR [rdi],eax
   9:   0f b6 c6                movzx  eax,dh
   c:   89 47 04                mov    DWORD PTR [rdi+0x4],eax
   f:   89 f0                   mov    eax,esi
  11:   c1 ee 18                shr    esi,0x18
  14:   c1 e8 10                shr    eax,0x10
  17:   89 77 0c                mov    DWORD PTR [rdi+0xc],esi
  1a:   0f b6 c0                movzx  eax,al
  1d:   89 47 08                mov    DWORD PTR [rdi+0x8],eax
  20:   c3                      ret

Which I suspect is optimum (given the x86-64 calling convention) and I'm
having a hard time seeing any wasted instructions here. The difference is
only one instruction (12 here versus 13 for the shift-and-mask version), so
I would say use whichever one you think is clearer.
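
For reference, the bit-field source is not shown above. The following is only
a guess at what such a version might look like; the names ops8, f0..f3 and
split8bf are made up for illustration. It assumes four 8-bit fields in one
32-bit word, passed by value. Bit-field ordering is implementation-defined;
GCC on x86-64 allocates from the least significant bit, which would match the
listing above storing the low byte (sil) into dest[0]. That is the reverse of
the order split8() uses, where dest[0] holds op >> 24.

```c
/* Hypothetical bit-field layout: four 8-bit operands in one 32-bit word.
 * Field order within the word is implementation-defined; with GCC on
 * x86-64, f0 occupies the least significant byte. */
struct ops8
{
  unsigned int f0 : 8;
  unsigned int f1 : 8;
  unsigned int f2 : 8;
  unsigned int f3 : 8;
};

/* Copy each field into its own unsigned int.  No explicit shifts or
 * masks appear in the source; the compiler picks them from the layout. */
void split8bf(unsigned int *dest, struct ops8 op)
{
  dest[0] = op.f0;
  dest[1] = op.f1;
  dest[2] = op.f2;
  dest[3] = op.f3;
}
```

Whether this generates exactly the listing above will depend on the compiler
version and flags.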
-spc