On Wed, Feb 8, 2012 at 7:00 PM, Matthew Wild <mwild1@gmail.com> wrote:
> On 7 February 2012 07:26, Roberto Ierusalimschy <roberto@inf.puc-rio.br> wrote:
>>> The main question I suppose is:  is the resulting user code, using
>>> mostly ordinary string functions plus a little minimal utf8 tweaking,
>>> going to be significantly uglier/harder-to-maintain/confusing, to the
>>> point where using a heavier-weight abstraction might be worthwhile?
>>>
>>> My suspicion is that for most apps, the answer is no...
>>
>> You are my idol :)
>
> Indeed. For what it's worth, XMPP (based on a certain subset of XML)
> uses only UTF-8 encoding. This means that nearly all strings in
> Prosody are UTF-8 encoded. Yet we have no standard UTF-8 string
> library/API,

Yes, you do. You depend on expat to ensure inbound UTF-8 correctness.
Switch to the pure-Lua fallback parser and you lose that. I don't see
anything obvious to keep remote servers from aborting connections to
you when a similar "see, I don't need a UTF-8 library" local bot
accidentally puts an ISO 8859-1 string into status or a MUC topic or
something. I could be missing something.
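
To make the point concrete, here is a minimal sketch of the kind of check that expat is otherwise doing on your behalf. The function name is_valid_utf8 is hypothetical and is not part of Prosody, expat, or any library mentioned in this thread. It only tests well-formedness per RFC 3629 and does not apply any further character restrictions that XML or XMPP may impose.

```lua
-- Hypothetical helper (an assumption for illustration, not an existing API).
-- Returns true if s is well-formed UTF-8, otherwise false plus the byte
-- offset where the first bad sequence starts.
local function is_valid_utf8(s)
  local i, n = 1, #s
  while i <= n do
    local b = s:byte(i)
    local len, min, cp
    if b < 0x80 then
      len, min, cp = 1, 0, b                  -- ASCII
    elseif b >= 0xC2 and b <= 0xDF then
      len, min, cp = 2, 0x80, b % 32          -- 110xxxxx
    elseif b >= 0xE0 and b <= 0xEF then
      len, min, cp = 3, 0x800, b % 16         -- 1110xxxx
    elseif b >= 0xF0 and b <= 0xF4 then
      len, min, cp = 4, 0x10000, b % 8        -- 11110xxx
    else
      -- Stray continuation byte, or a lead byte that can never be valid
      -- (0xC0, 0xC1, 0xF5..0xFF).
      return false, i
    end
    if i + len - 1 > n then
      return false, i                         -- sequence is truncated
    end
    for j = i + 1, i + len - 1 do
      local c = s:byte(j)
      if c < 0x80 or c > 0xBF then
        return false, i                       -- not a continuation byte
      end
      cp = cp * 64 + (c - 0x80)
    end
    -- Reject overlong encodings, UTF-16 surrogates, and values past U+10FFFF.
    if cp < min or cp > 0x10FFFF or (cp >= 0xD800 and cp <= 0xDFFF) then
      return false, i
    end
    i = i + len
  end
  return true
end
```

With this, "caf\195\169" (UTF-8) passes, while "caf\233" (the same word in ISO 8859-1) is rejected at byte 4, which is the case the hypothetical local bot would trip over.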

My guess is people writing pure XML processors are generally in better
shape than the rest of the world because they do less string
manipulation, and they do less string manipulation because failure to
process XML in structured form means instant abort of readers. Well,
there are some people who don't manipulate in infoset form, but they
tend to end up in discussions about how to use "&" in a string
argument. :-)

> I understand entirely that some applications *will* need to do
> operations on unicode strings all over the place. A text editor would
> be a good example, for instance.

Anybody manipulating text nodes in XML when the contents did not
originate in XML needs to be pretty careful. But again, this problem
tends to be self-limiting because of mandated reader behavior.

> Like everything else in Lua, I
> think the application developer can safely make the decision.

I think that's way too strong. Lua makes many decisions (no, you can't
have mutable strings; no, you can't have __index that triggers on
every access; yes, strings will end up as numbers sometimes) and
although you can simulate alternatives or implement them in external C
you eventually reach a point where you're just exercising the ability
of Turing-complete environments to be universal. You are not changing
Lua-the-language. Unless you are, but once you start patching the Lua
source, it's not really Lua at that point--or rather, it's no longer
merely an application developer decision as there is now a language
developer on board.
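
For readers who have not run into these, a small sketch of the three decisions listed above, using only standard Lua. The variable names are illustrative. The proxy table shows the usual way to simulate "__index on every access", which is exactly the kind of workaround that leaves the language itself unchanged.

```lua
-- 1. __index is consulted only when the key is absent from the table.
local hits = 0
local t = setmetatable({ present = 1 }, {
  __index = function(_, key) hits = hits + 1; return nil end,
})
local _ = t.present   -- key exists, so the metamethod is not called
local _ = t.missing   -- key is absent, so the metamethod runs once

-- Simulation: keep the proxy empty so that every read misses and is
-- forwarded to the real storage table.
local reads = 0
local store = { present = 1 }
local proxy = setmetatable({}, {
  __index = function(_, key) reads = reads + 1; return store[key] end,
  __newindex = function(_, key, value) store[key] = value end,
})
local a = proxy.present
local b = proxy.present   -- both reads go through __index

-- 2. Strings are immutable: "editing" one builds a new string.
local s = "hello"
local edited = s:gsub("^h", "j")   -- first result of gsub; s is unchanged

-- 3. Strings are coerced to numbers in arithmetic.
local sum = "10" + 1
```

After this runs, hits is 1, reads is 2, s is still "hello" while edited is "jello", and sum compares equal to 11.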

Jay