- Subject: beautiful assembly (luajit)
- From: Dimiter 'malkia' Stanev <malkia@...>
- Date: Tue, 19 Jul 2011 15:25:25 -0700
So I'm working on some vector & matrix library code (simple stuff, probably
for games), and was amazed at the code luajit generates (as long as
one avoids allocating stuff here and there).
Granted, hand-tuned C/C++ with inlining, blah blah, can maybe beat
it, but with lots of hurdles both for the implementor and for the people
who later use it (I was once in hell with heavily templatized C++ code that
required a custom gcc compiler that our studio supported for
Playstation2 code and shipped games with... duh).
To make it even more beautiful, I took care of not allocating vec3()
around, by making my v3 api take into account a register that can be
reused (I did the same with Common Lisp a while back and it worked pretty
well; it requires some discipline, but eventually good code gets written).
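A minimal sketch of that reusable-register style, using plain Lua tables so it runs without the actual v3math module. The names vec3, sub and dot below are stand-ins made up for illustration, not the library's API; the only point is that each operation writes into a caller-supplied r instead of allocating a new result.

```lua
-- Hypothetical stand-ins for illustration; not the real lib/v3math API.
local function vec3(x, y, z)
  return { [0] = x or 0, [1] = y or 0, [2] = z or 0 }
end

-- r = a - b, written into the caller-supplied table r (nothing allocated).
local function sub(r, a, b)
  r[0], r[1], r[2] = a[0] - b[0], a[1] - b[1], a[2] - b[2]
  return r
end

local function dot(a, b)
  return a[0]*b[0] + a[1]*b[1] + a[2]*b[2]
end

local r = vec3()               -- the "register": allocated once, outside the loop
local a, b = vec3(4, 6, 8), vec3(1, 2, 3)
local total = 0
for i = 1, 1000 do
  local d = sub(r, a, b)       -- d is r itself, reused on every iteration
  total = total + dot(d, d)
end
print(total)
```

Because Lua evaluates the whole right-hand side of a multiple assignment before storing anything, passing the same table as both r and one of the operands is safe in this sketch.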
And somehow I prefer function calls over operators, dunno why. It makes
it clearer for me when dealing with matrices/vectors.
Now I think I'm ready to port a lot of quake code to straight lua,
just for fun and pleasure.
Anyway, here is the code:
# distanceSquaredToSegment.lua
local v3 = require( "lib/v3math" )
local min, max = math.min, math.max
-- r = v1*k + v2
function v3.madd( r, v1, k, v2 )
  r[0], r[1], r[2] = v1[0]*k + v2[0], v1[1]*k + v2[1], v1[2]*k + v2[2]
  return r
end

local function distance_squared(r, a, b)
  return v3.mag(v3.sub(r, a, b))
end

local function distance_squared_to_segment( r, point, segment_start,
                                            segment_dir, segment_length )
  local dir = v3.sub( r, point, segment_start )
  -- clamp the projection onto the segment to [0, segment_length]
  local dot = max( 0, min( segment_length, v3.dot( dir, segment_dir ) ) )
  local prj = v3.madd( r, segment_dir, dot, segment_start )
  local dsq = v3.mag( v3.sub( r, point, prj ))
  return dsq
end

local function test()
  local segment_start = v3.new()
  local segment_dir = v3.new()
  local segment_length = 10
  local point = v3.new()
  local sum = 0
  local r = v3.new()   -- the single scratch vector reused by every call
  for i=0,10000 do
    sum = sum + distance_squared_to_segment( r, point, segment_start,
                                             segment_dir, segment_length )
  end
end

test()
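As a sanity check on the geometry, here is a self-contained sketch of the same computation that can be run without v3math. It assumes plain 1-based Lua tables and a unit-length direction vector, and the helper name dist_sq_to_segment is made up for this example; it is not part of the library.

```lua
local min, max = math.min, math.max

-- Hypothetical helper for illustration: squared distance from point p to the
-- segment that starts at s, runs along unit direction d, and has length len.
local function dist_sq_to_segment(p, s, d, len)
  -- vector from the segment start to the point
  local px, py, pz = p[1] - s[1], p[2] - s[2], p[3] - s[3]
  -- projection onto the direction, clamped to [0, len]
  local t = max(0, min(len, px*d[1] + py*d[2] + pz*d[3]))
  -- closest point on the segment
  local qx, qy, qz = s[1] + d[1]*t, s[2] + d[2]*t, s[3] + d[3]*t
  local dx, dy, dz = p[1] - qx, p[2] - qy, p[3] - qz
  return dx*dx + dy*dy + dz*dz
end

local s, d = {0, 0, 0}, {1, 0, 0}
print(dist_sq_to_segment({5, 3, 0}, s, d, 10))   -- point beside the segment
print(dist_sq_to_segment({-2, 0, 0}, s, d, 10))  -- point before the start
print(dist_sq_to_segment({13, 4, 0}, s, d, 10))  -- point past the end
```

The three calls cover the three cases the clamp distinguishes: the projection lands inside the segment, gets clamped to the start, or gets clamped to the end.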
And the assembly:
# ./luajit -jdump samples/v3math/distanceSquaredToSegment.lua | tail -n 50
->LOOP:
b910fe6a movsd xmm14, [rcx+0x8]
b910fe70 subsd xmm14, [rdx+0x8]
b910fe76 movsd xmm12, [rcx+0x10]
b910fe7c subsd xmm12, [rdx+0x10]
b910fe82 movsd xmm13, [rcx+0x18]
b910fe88 subsd xmm13, [rdx+0x18]
b910fe8e movsd [rax+0x18], xmm13
b910fe94 movsd [rax+0x10], xmm12
b910fe9a movsd [rax+0x8], xmm14
b910fea0 movsd xmm15, [rbx+0x8]
b910fea6 mulsd xmm14, xmm15
b910feab movsd xmm6, [rbx+0x10]
b910feb0 mulsd xmm12, xmm6
b910feb5 addsd xmm12, xmm14
b910feba movsd xmm14, [rbx+0x18]
b910fec0 mulsd xmm13, xmm14
b910fec5 addsd xmm13, xmm12
b910feca minsd xmm13, xmm1
b910fecf maxsd xmm13, xmm0
b910fed4 mulsd xmm15, xmm13
b910fed9 addsd xmm15, [rdx+0x8]
b910fedf mulsd xmm6, xmm13
b910fee4 addsd xmm6, [rdx+0x10]
b910fee9 mulsd xmm13, xmm14
b910feee addsd xmm13, [rdx+0x18]
b910fef4 movsd [rax+0x18], xmm13
b910fefa movsd [rax+0x10], xmm6
b910feff movsd [rax+0x8], xmm15
b910ff05 movsd xmm14, [rcx+0x8]
b910ff0b subsd xmm14, xmm15
b910ff10 movsd xmm15, [rcx+0x10]
b910ff16 subsd xmm15, xmm6
b910ff1b movsd xmm6, [rcx+0x18]
b910ff20 subsd xmm6, xmm13
b910ff25 movsd [rax+0x18], xmm6
b910ff2a movsd [rax+0x10], xmm15
b910ff30 movsd [rax+0x8], xmm14
b910ff36 mulsd xmm14, xmm14
b910ff3b mulsd xmm15, xmm15
b910ff40 addsd xmm15, xmm14
b910ff45 mulsd xmm6, xmm6
b910ff49 addsd xmm6, xmm15
b910ff4e addsd xmm7, xmm6
b910ff52 add edi, +0x01
b910ff55 cmp edi, 0x2710
b910ff5b jle 0x1b910fe6a ->LOOP
b910ff61 jmp 0x1b9100028 ->6
---- TRACE 2 stop -> loop
Simply beautiful!
Thank you Mike!
the v3math is here:
https://raw.github.com/malkia/ufo/master/lib/v3math.lua