Re: LuaJIT performance

lua-l archive


Subject: Re: LuaJIT performance
From: Mike Pall <mikelu-0908@...>
Date: Fri, 14 Aug 2009 18:15:59 +0200

Rob Kendrick wrote:
> > Ok, but you may be in for a nasty surprise: the 3GS has an ARM
> > Cortex-A8 CPU which only has VFPlite. This is actually a step back
> > from the previous models which had an ARM 1176JZ(F)-S with a full
> > VFP unit. And since the vector mode of VFP is officially deprecated,
> > you're in for more surprises in the future.
> 
> Have you got a citation for this?  My guy inside ARM is of the opinion
> that NEON is a /superset/ of VFP, and that there are three types for
> VFP; none, partial, and full.  Nothing on the market implements full,
> and partial is only missing a handful of instructions.

NEON can only do *single-precision* floating-point. But we need
*double-precision* floating-point operations for Lua (and for
JavaScript, too). Only VFP can do double-precision. So in that
sense NEON is certainly not a superset of VFP.
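
To make the precision point concrete, here is a minimal sketch (in Python, not
Lua, and not taken from any ARM or LuaJIT code). The helper name `to_single`
is made up for illustration; it rounds a double to the nearest
single-precision value using the standard `struct` module. It shows that an
integer just above 2^24 survives as a double but not as a single, which is why
a language whose only number type is a double cannot simply fall back to
single-precision hardware.

```python
import struct


def to_single(x: float) -> float:
    """Round an IEEE 754 double to the nearest single-precision value.

    Hypothetical helper for illustration: packs the value as a 32-bit
    float and unpacks it again, which forces the rounding.
    """
    return struct.unpack("<f", struct.pack("<f", x))[0]


# A double has a 53-bit significand, a single only 24 bits.
big_int = 2**24 + 1                 # 16777217
as_double = float(big_int)          # still exact as a double
as_single = to_single(as_double)    # no longer exact as a single

print(int(as_double))   # 16777217
print(int(as_single))   # 16777216
```

Any counter or array index stored as a number would silently go wrong past
2^24 if the arithmetic were done in single precision.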

The more important difference between the different VFP versions
is that VFPlite is non-pipelined and has rather high latencies.
Actually it looks like they've purged the term VFPlite from their
product literature -- can't imagine why. :-)
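
As a rough illustration of why the pipelining matters more than the
instruction set, here is a back-of-the-envelope cycle model. The function
names and the latency value are assumptions chosen for the example, not
measured figures for any ARM core; the model only covers a run of independent
operations and ignores issue restrictions and dependencies.

```python
def cycles_non_pipelined(n_ops: int, latency: int) -> int:
    """Each operation occupies the unit for its full latency."""
    return n_ops * latency


def cycles_pipelined(n_ops: int, latency: int) -> int:
    """A new independent operation can start every cycle.

    The first result arrives after `latency` cycles, then one more
    result completes per cycle.
    """
    if n_ops == 0:
        return 0
    return latency + (n_ops - 1)


# Hypothetical latency, for illustration only.
ASSUMED_LATENCY = 10

print(cycles_non_pipelined(100, ASSUMED_LATENCY))  # 1000
print(cycles_pipelined(100, ASSUMED_LATENCY))      # 109
```

Under these assumptions the gap approaches a factor equal to the latency once
there are enough independent operations to keep a pipelined unit busy.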

The Cortex-A8 in the iPhone 3GS definitely only has non-pipelined
VFP. Someone found that out the hard way:

  http://diaryofagraphicsprogrammer.blogspot.com/2008/11/iphone-arm-vfp-code.html#c7064614874794429950

Note that this was comparing single-precision FP performance (yes,
you should use NEON for that). But it only gets worse with
double-precision FP. I've already said that softfp suddenly looks
like an attractive option.

About the vector part of VFP being deprecated:

  http://forums.arm.com/index.php?showtopic=13053&pid=31161&st=0&#entry31161
  http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.dui0204i/Chdehgeh.html

Quoting: "The use of VFP vector mode is deprecated ..."

Well, not that this was a particularly useful feature. The VFP
vector mode is not true SIMD. It's about quickly issuing multiple
operations in succession. But turning it on and off involved a
pipeline flush and programming it was quite tricky. I guess it
wasn't popular outside of handcoded assembly.

Given the sad state of floating-point support for ARM devices in
the past, it's about time they get their act together. I just
don't see anything in the published specs which indicates that
we'll see good *double-precision* floating-point performance in
ARM-based mobile devices anytime soon.

This will hurt them badly in the future when mobile devices
run JavaScript all day. I bet Intel is ready to jump in ...

--Mike
