lua-l archive



On Thu, Aug 21, 2014 at 04:54:03PM +0200, Jonas Thiem wrote:
> Also is it a good idea to depend on a "sane locale" for something that
> is possibly security relevant? (not a rhetorical question, I am not sure
> what a "not sane" locale is and how easy it can be butchered by an
> attacker)

At the end of the day everything is security relevant.

> Because of my lack of knowledge on things, I wrote up this lengthy
> function which is supposed to behave like a gsub with plain=True:

There's a general solution to this problem: reference manuals. Even though
I've probably accumulated well over the 10,000 hours that it takes to
become an "expert" as a programmer, and in particular in the C language, I
always keep a copy open of the relevant standard or reference when I'm
working, and make it a habit to consult the documentation if there's any
shadow of a doubt in my mind about the behavior of some routine.

I'm not talking HOWTOs, blog posts, or third-party reference sites (e.g.
cplusplus.com or w3schools.com), but the primary source documentation: ISO
C99 standard (the free N1256 PDF, the last draft published by ISO WG14, which is
for all intents and purposes identical to the official standard), ISO C11 standard (free
N1570 PDF), SUSv4 Issue 7/IEEE Std 1003.1, 2013 (Open Group's free, official
HTML format), Lua 5.1 reference manual, Lua 5.2 reference manual, etc.

I also keep copies (or links to copies) of the relevant source code. If
there's something ambiguous or curious about a GNU extension, I look at the
code in glibc. OS X? http://opensource.apple.com/source/Libc.

Just the other day I revisited my clock_gettime(CLOCK_MONOTONIC) emulation
on OS X. Stack Overflow and other sites are full of cargo cult "answers",
often conflicting, about whether to use mach_absolute_time or
clock_get_time. Part of the issue is that when the consumer-grade multicore
revolution began ~10 years ago, word began spreading around developer
communities that the RDTSC instruction that everybody was using to benchmark
their code wasn't invariant across multiple cores. That was good advice
then.

However, people keep reciting that same caveat, even though only a year or
two after RDTSC became an issue both AMD and Intel redesigned their CPUs to
make RDTSC invariant across cores. But nobody ever bothers to check the
documentation, so people keep repeating the story that you can't reliably
use RDTSC on multicore hardware without pinning the process. This is classic
cargo culting, where people pass on received advice without ever
understanding the history or verifying whether it ever was or continues to
be correct.

Because Apple has only ever shipped x86 hardware with an invariant TSC
(Intel changed their design before Apple ditched PowerPC),
mach_absolute_time has never been susceptible to TSC variance. (AFAICT on
PowerPC mach_absolute_time read a global shared page updated by a hardware
timer, so it was also invariant.)

I recently became curious about mach_absolute_time on ARM, however. So the
first thing I did was download the ARM instruction reference manuals for
ARMv7-A and ARMv8-A (free registration). And because I can't actually
confirm whether iOS uses the ARM TSC to implement mach_absolute_time, I also
asked some Apple engineer friends to look into it, just to cover my bases.

Anyhow, moral of the story is to always check the reference manual, and make
a habit of it so it doesn't feel like a burden. Assumptions are the real
security issue. And we're all prone to making them because our brains are
optimized to be lazy about such things. Even if :gsub had a special flag, you
might just as well not have realized you needed to use it.

Also, FWIW, I always explicitly set the locale in my programs to "C" using
setlocale. If you want consistent and dependable behavior, you often have to
be explicit about it. System- and process-wide locales made more sense
before the Internet exploded. Systems were much more isolated back then, and
didn't regularly process data generated elsewhere.