- Subject: Re: LuaJIT performance
- From: Mike Pall <mikelu-0908@...>
- Date: Fri, 14 Aug 2009 18:15:59 +0200
Rob Kendrick wrote:
> > Ok, but you may be in for a nasty surprise: the 3GS has an ARM
> > Cortex-A8 CPU which only has VFPlite. This is actually a step back
> > from the previous models which had an ARM 1176JZ(F)-S with a full
> > VFP unit. And since the vector mode of VFP is officially deprecated,
> > you're in for more surprises in the future.
>
> Have you got a citation for this? My guy inside ARM is of the opinion
> that NEON is a /superset/ of VFP, and that there are three types for
> VFP; none, partial, and full. Nothing on the market implements full,
> and partial is only missing a handful of instructions.
NEON can only do *single precision* floating-point. But we need
*double precision* floating-point operations for Lua (and for
JavaScript, too). Only VFP can do double-precision. So in that
sense NEON is certainly not a superset of VFP.
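To make the single vs. double distinction concrete: Lua's default number type (lua_Number) is a C double, and JavaScript numbers are IEEE 754 doubles. The minimal sketch below is not from the original post. It uses Python floats, which are also IEEE 754 doubles, as a stand-in, and round-trips a value through 32-bit single precision with the standard `struct` module. It shows that 2^24 + 1 is already not exactly representable in single precision, while doubles represent every integer up to 2^53 exactly.

```python
import struct

def to_float32(x: float) -> float:
    """Round an IEEE 754 double to single precision and widen it back."""
    return struct.unpack("<f", struct.pack("<f", x))[0]

n = 2**24 + 1  # 16777217: the first positive integer a float32 cannot hold exactly

# Double precision (what Lua and JavaScript numbers use) keeps the value exact.
print(float(n) == n)            # True

# Single precision (all that NEON offers) rounds it to the nearest float32.
print(to_float32(float(n)))     # 16777216.0
print(to_float32(float(n)) == n)  # False
```

An interpreter whose numbers silently lose integer precision above 16777216 would break ordinary code such as loop counters and table indexes, which is why single-precision hardware alone cannot back a Lua or JavaScript number type.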
The more important difference between the VFP versions
is that VFPlite is non-pipelined and has rather high latencies.
Actually it looks like they've purged the term VFPlite from their
product literature -- can't imagine why. :-)
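Why a non-pipelined unit hurts so much can be shown with a deliberately simplified cycle model. This is not from the original post. It assumes a stream of independent operations, one issue per cycle on a pipelined unit, no other stalls, and a hypothetical latency of 10 cycles (not a measured Cortex-A8 figure). A pipelined unit overlaps the operations. A non-pipelined unit must finish each one before starting the next.

```python
def pipelined_cycles(n_ops: int, latency: int) -> int:
    """Fully pipelined unit: a new independent op can issue every cycle."""
    if n_ops == 0:
        return 0
    # The first result arrives after `latency` cycles, then one more per cycle.
    return latency + (n_ops - 1)

def non_pipelined_cycles(n_ops: int, latency: int) -> int:
    """Non-pipelined unit: each op must complete before the next one starts."""
    return n_ops * latency

LATENCY = 10  # hypothetical latency in cycles, chosen only for illustration
for n in (1, 10, 100):
    print(n, pipelined_cycles(n, LATENCY), non_pipelined_cycles(n, LATENCY))
```

Under these assumptions, 100 independent operations take 109 cycles on the pipelined unit but 1000 cycles on the non-pipelined one. For a chain of dependent operations, pipelining would not help either design, so real code falls somewhere in between.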
The Cortex-A8 in the iPhone 3GS definitely only has non-pipelined
VFP. Someone found that out the hard way:
http://diaryofagraphicsprogrammer.blogspot.com/2008/11/iphone-arm-vfp-code.html#c7064614874794429950
Note that this was comparing single-precision FP performance (yes,
you should use NEON for that). But it only gets worse with
double-precision FP. I've already said that softfp suddenly looks
like an attractive option.
About the vector part of VFP being deprecated:
http://forums.arm.com/index.php?showtopic=13053&pid=31161&st=0&#entry31161
http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.dui0204i/Chdehgeh.html
Quoting: "The use of VFP vector mode is deprecated ..."
Well, not that this was a particularly useful feature. The VFP
vector mode is not true SIMD. It's about quickly issuing multiple
operations in succession. But turning it on and off involved a
pipeline flush, and programming it was quite tricky. I guess it
wasn't popular outside of hand-coded assembly.
Given the sad state of floating-point support for ARM devices in
the past, it's about time they get their act together. I just
don't see anything in the published specs which indicates that
we'll see good *double-precision* floating-point performance in
ARM-based mobile devices anytime soon.
This will hurt them badly in the future, when mobile devices
run JavaScript all day. I bet Intel is ready to jump in ...
--Mike