lua-users home
lua-l archive



Tom Spilman wrote:
[...]
    if ( key[0] == 'x' && key[1] == 0 ) { // faster than strcmp()
[...]
 I'm trying to speed up the function by the way as it's still a bit slower
than a normal table index operation.  I'm considering removing type checking
in lclass_checkobject<>(), but aside from that does anyone have any other
suggestions to speed this up?

This will only produce a tiny improvement, but switch is your friend:

	if (key[0] != '\0')
	{
		/* unsigned char casts avoid sign extension of bytes >= 0x80 */
		switch ((unsigned char)key[0] | ((unsigned char)key[1] << 8))
		{
			case 'x':
				/* do X thing */
				break;

			case 'y':
				/* do Y thing */
				break;
		}
	}
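Here is a compilable version of the fragment above, offered as a sketch. The function name `dispatch_key` and its return codes are made up for illustration. It relies on the same trick: because `key[0]` is not the terminator, `key[1]` is guaranteed to exist, so packing both bytes lets `case 'x'` match the string "x" but not "xy".

```c
/* Hypothetical dispatcher (name and return codes are illustrative):
 * returns 1 for "x", 2 for "y" and 0 for anything else. */
int dispatch_key(const char *key)
{
    if (key[0] != '\0') {
        /* key[0] is not the terminator, so key[1] exists (at worst it is
         * the NUL). For a one-character key the packed value is just the
         * character itself, so case 'x' matches "x" but not "xy".
         * The unsigned char casts avoid sign extension of bytes >= 0x80. */
        switch ((unsigned char)key[0] | ((unsigned)(unsigned char)key[1] << 8)) {
        case 'x':
            return 1;
        case 'y':
            return 2;
        }
    }
    return 0;
}
```

With this, `dispatch_key("x")` returns 1, while `dispatch_key("xy")` and `dispatch_key("")` both return 0.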

Using switch rather than a series of ifs will allow the compiler to generate an inline binary tree, or a calculated jump table, or something similar. This can be a lot faster than a series of ifs, even for small numbers of tests. Doing this:

	cmp r0, #'x'
	beq do_x
	cmp r0, #'y'
	beq do_y
	routine_to_do_everything_else
	b skip_to_end
do_x:
	routine_to_do_x
	b skip_to_end
do_y:
	routine_to_do_y
skip_to_end:

...is much kinder on the cache than:

	cmp r0, #'x'
	bne skip_x
	routine_to_do_x
skip_x:
	cmp r0, #'y'
	bne skip_y
	routine_to_do_y
skip_y:

In addition, this last piece of code, which is what a series of ifs will almost certainly generate, will cause a pipeline flush at pretty much every branch.
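To check what your own compiler does with the two shapes, you can compile a pair of equivalent functions with something like `gcc -O2 -S` and compare the assembly. The function names below are invented for this sketch. What each one compiles to depends on the compiler version, flags and target CPU, so the difference described above may or may not show up.

```c
/* switch form: the compiler is free to emit a jump table, a binary
 * search, or a chain of compares as it sees fit. */
int classify_switch(int c)
{
    switch (c) {
    case 'x': return 1;
    case 'y': return 2;
    case 'z': return 3;
    default:  return 0;
    }
}

/* "series of ifs" form, mirroring the second assembly listing: each test
 * is independent, so every comparison runs even after one has matched. */
int classify_ifs(int c)
{
    int r = 0;
    if (c == 'x') r = 1;
    if (c == 'y') r = 2;
    if (c == 'z') r = 3;
    return r;
}
```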

I've successfully used this technique to compare strings of up to seven bytes with a switch, reading them as a 64-bit long long so that the terminating NUL fits in the same word:

	switch (unaligned_longlong_read(ptr))
	{
		case 'string1':
			/* do something */
		...
	}
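The `case 'string1':` line relies on gcc-specific behaviour. According to the GCC manual, a multi-character constant has type int, and if it has more characters than fit in an int, gcc warns and ignores the excess leading characters. With a 32-bit int, only the last four characters of 'string1' would take part in the comparison. Below is a more portable sketch of the same idea. It is not the code from this post, and `KEY7`, `pack_key` and `lookup_key` are hypothetical names. The key is built arithmetically from the string's bytes and stops at the terminator, so it never reads past the end of the string, never does an unaligned load, and gives the same value on any endianness.

```c
#include <stdint.h>

/* Build a 64-bit case constant from up to seven characters (pass 0 for
 * unused positions). Character i goes into byte i, counting from the least
 * significant byte. Built only from integer constants, the result is an
 * integer constant expression and can be used as a case label. */
#define KEY7(a, b, c, d, e, f, g)                  \
    ((uint64_t)(unsigned char)(a)         |        \
     ((uint64_t)(unsigned char)(b) << 8)  |        \
     ((uint64_t)(unsigned char)(c) << 16) |        \
     ((uint64_t)(unsigned char)(d) << 24) |        \
     ((uint64_t)(unsigned char)(e) << 32) |        \
     ((uint64_t)(unsigned char)(f) << 40) |        \
     ((uint64_t)(unsigned char)(g) << 48))

/* Pack up to the first eight bytes of a NUL-terminated string the same way,
 * stopping at the terminator. A string of eight or more characters gets a
 * non-zero top byte, so it can never equal a KEY7 constant. */
static uint64_t pack_key(const char *s)
{
    uint64_t v = 0;
    for (int i = 0; i < 8 && s[i] != '\0'; i++)
        v |= (uint64_t)(unsigned char)s[i] << (8 * i);
    return v;
}

/* Hypothetical dispatcher with example keys. */
int lookup_key(const char *s)
{
    switch (pack_key(s)) {
    case KEY7('s', 't', 'r', 'i', 'n', 'g', '1'): return 1;
    case KEY7('c', 'o', 'l', 'o', 'r', 0, 0):     return 2;
    case KEY7('x', 0, 0, 0, 0, 0, 0):             return 3;
    default:                                      return 0;
    }
}
```

For example, `lookup_key("string1")` returns 1, and `lookup_key("string12")` returns 0 because its eighth byte is non-zero.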

However, there are some gotchas:

* You mustn't read past the end of the string, because you may overrun the memory block and cause a segmentation fault. That's why you need the if statement in the first example.

* You must either read each byte individually and chop them together, as in the first example, or use deep magic to get an unaligned read, as in the second.

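* I use gcc for everything, and gcc supports multi-character constants such as 'ab'. Their values are implementation-defined in standard C, so other compilers may give them different values (packing the bytes in a different order, for instance). In any case, you've got to be careful of endianness issues. One trick is to do: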

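	#define STRINGCONST(s) \
		((unsigned int)(unsigned char)(s)[0] | \
		 ((unsigned int)(unsigned char)(s)[1] << 8) | \
		 ((unsigned int)(unsigned char)(s)[2] << 16) | \
		 ((unsigned int)(unsigned char)(s)[3] << 24))
	unsigned int s = STRINGCONST("RIFF");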

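The compiler will optimise this into a constant load, and it's 100% legal as an initialiser. Subscripting a string literal isn't an integer constant expression, though, so it can't portably be used as a case label. The first character ends up in the least significant byte, so the value matches the four bytes read as a little-endian 32-bit word. That is exactly what you need when decoding RIFF files, since RIFF is a little-endian format. Very handy.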
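As a sketch of the RIFF use case, the block below checks whether an in-memory buffer starts with a RIFF/WAVE header. `read_le32` and `is_riff_wave` are hypothetical helpers, and the macro is a fixed-width restatement of `STRINGCONST`. The layout assumed is the standard one: bytes 0-3 are "RIFF", bytes 4-7 the chunk size, and bytes 8-11 the form type. Assembling the word byte by byte avoids unaligned loads and gives the same result on big- and little-endian hosts.

```c
#include <stddef.h>
#include <stdint.h>

/* Same packing as STRINGCONST: s[0] lands in the least significant byte. */
#define STRINGCONST(s)                              \
    ((uint32_t)(unsigned char)(s)[0]         |      \
     ((uint32_t)(unsigned char)(s)[1] << 8)  |      \
     ((uint32_t)(unsigned char)(s)[2] << 16) |      \
     ((uint32_t)(unsigned char)(s)[3] << 24))

/* Read a little-endian 32-bit value one byte at a time. */
static uint32_t read_le32(const unsigned char *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Does the buffer start with "RIFF", <size>, "WAVE"? */
int is_riff_wave(const unsigned char *buf, size_t len)
{
    return len >= 12 &&
           read_le32(buf) == STRINGCONST("RIFF") &&
           read_le32(buf + 8) == STRINGCONST("WAVE");
}
```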

This message brought to you by the late-night committee for anally retentive code optimisation.

--
[insert interesting .sig here]