lua-users home
lua-l archive



Can you elaborate on the benefits of programming language
localization?  I am quite skeptical about this.  Learning a
programming language is like learning a new language, and IMO the
English keywords hardly make it more difficult to learn.  But learning
a localized programming language makes it harder to apply the
knowledge one has gathered beyond private projects, or to transfer it
to other programming languages.  It's also much harder to find help.

I'm less fundamentally skeptical concerning the internationalization
of error messages, though there are problems as well - like finding
help by feeding a search engine with a localized error message.  I
usually use `LANG=C` when I want help on errors, so this is not a big
problem, at least not on *nix systems.

The bigger problem with internationalization is that it is good in
theory but, alas, in practice translations are frequently hair-raising,
confusing and misleading.  Commercial applications might do slightly
better on average than open source applications, depending on their
budget.  Quality assurance is hard, as maintainers don't understand
all the languages.  My impression is that in many cases, well-meaning
enthusiasts without deeper knowledge of the technical terminology in
the respective domain just do a quick translation without too much
thought.  I looked at a Babylscript translation that is a perfect
example of that.  I hope no one ever attempted, or will attempt, to
learn JavaScript using that translation.  Translation is hard and
requires quite some thought.  In programming language design
especially, much thought should have been spent on the choice of the
original keywords.

So, for people to really benefit from translations, we need true
experts in both languages, in the subject being translated, and in the
respective technical terminologies of both languages.  And I'm not at
all convinced that one does people a favor by translating programming
languages.  I think it becomes harder rather than easier to learn the
language, because learning resources are very limited or nonexistent.
There is a plethora of books and free material on e.g. JavaScript in
many languages, some of it excellent, but probably hardly any material
for translated versions of JavaScript/Babylscript, if any at all.

Sorry for the negativity - I hope my reasoning has been rational and
inoffensive.  I'd encourage translations of messages, with the advice
to put a thorough review process in place before unleashing things on
confused users.

Thomas W.



2016-11-20 22:20 GMT+01:00 mathieu stumpf guntz <psychoslave@culture-libre.org>:
> Hello, if you are interested in Lua, internationalization, and possibly
> programming language localization, you might be interested in this thread.
>
> A bit of background
>
> You can skip this section if you are less interested in the human story
> and more interested in the technical aspects of Lua internationalization.
>
> My initial motivation was to have a programming language that uses only
> phonetic signs.
>
> Well, one way to do that is to assign a unique phonetic value to each sign
> usable in the programming language. For example in Lua, which currently can
> only use ASCII tokens, a 256-sign mapping is enough (string content aside).
> That's rather easy, and you can even easily have a monosyllabic map. For
> example, my native language has 16 vowels (V) and 20 consonants (C), which
> is more than enough to make a CV (or VC) syllable for each ASCII element. A
> quick and dummy mapping would just assign them following some arbitrary
> order, and a somewhat less dummy solution would try to make a mapping with
> some mnemonic relation to the sign usually denoted in ASCII. But even then,
> to my mind that would remain a rather impractical solution for anything
> useful.
>
> So I looked for a spoken language which has a phonetic transcription and
> possibly, as a bonus, morpho-syntactic properties friendly to a
> programming language use case. In this regard Lojban might be a good
> choice, I guess; at least it passed within my radar. But another aspect of
> usefulness is the number of speakers. On that point, without forgetting
> the previous one, Esperanto makes a better candidate. So I began to write
> research projects about it, namely Fabeleblaĵo and Algoritmaro, but in my
> native language, as I didn't feel skilled enough in Esperanto at the time.
>
> More recently I transferred some courses from the International Academy of
> Sciences San Marino, which uses Esperanto as a common working language.
> Indeed, as I wanted to begin translating my still-in-progress work on
> Fabeleblaĵo and Algoritmaro into Esperanto, I discovered that the
> Esperanto version of Wikiversity is still in beta. I'm trying to change
> that by adding some courses, and in the process creating useful wiki
> templates and filing feedback and tickets. The latter are grouped under
> the Wikimedia Phabricator Eliso tag (Eliso stands for Esperanto kaj LIbera
> Scio, i.e. Esperanto and free knowledge). For now, I have completely
> finished the wikification of only one course, the one on
> internationalization.
>
> I also made an Esperanto localization for Babylscript. I'm not completely
> satisfied with this solution, as JS, at least the version implemented in
> the Rhino branch from which Babylscript is derived, doesn't even allow you
> to import other scripts, which is a huge restriction. Then, as a Wikimedia
> contributor, I met Lua, which is used there to create frontend-editable
> modules, as you may know. So came the wish to make an Esperantist version
> of Lua.
>
> Lupa and Mallupa
>
> Those who are really only interested in Lua internationalization might
> skip this section and its subsections. It mainly focuses on presenting
> (still in progress) projects which so far have taken an approach of direct
> localization to Esperanto, the problems encountered, and the solutions
> used or considered.
>
> Lupa
>
> So far Lupa aims to provide an Esperantist version of Lua. At first I just
> wanted to make it a pure Esperanto version of Lua. Now, thanks to Luiz
> Henrique de Figueiredo's advice and implementation suggestions, I have
> already shifted from a complete replacement of the keywords to a more
> backward-compatible approach which only adds aliases for the various
> built-in tokens. The current implementation is not in a sane state; for
> example, a simple single : will make lupe (the Lua interpreter
> counterpart) crash.
>
> Still, it already makes it possible to write little pieces of code like
> se 1 plus 1 egalas 2 tiam printu 'tio estas bona aritmetiko' hop, and it
> works.
>
> As Esperanto has a very regular grammar, unlike most spoken languages out
> there, parsing it is a rather practicable task. Even without such support,
> you can already make most statements coincide with semantically sound
> Esperanto sentences, if you choose your tokens carefully. That's another
> driving criterion behind the list of lexeme translations on the project
> wiki. For example one can write the statement tabelo['enigo'] = 3 as
> tabelo kies 'enigo' ero iĝu 3, the latter also being a plain Esperanto
> sentence meaning "table whose element 'entry' shall become 3". This
> Esperanto version is a bit longer than its graphemo-ideographic mix-up
> counterpart, but keep in mind that the tokens are only aliased, so one can
> also use the former mix-up. Also note that plain Esperanto offers shorter
> ways to express the same thing, like tabelenigiĝu 3, or, in a more
> parser-friendly version which is still valid Esperanto, tabel-enig-iĝu 3.
> But of course, that kind of syntax can't be handled within the scope of
> mere relexicalisation.
>
> Even sticking to the scope of "static aliases only", there are still some
> problems in localizing Lua toward Esperanto. First, Lua doesn't provide
> support for Unicode in identifiers and other built-in tokens. Esperanto
> does have a set of non-ASCII characters in its usual written form, namely
> ĉ, ĝ, ĥ, ĵ, ŝ and ŭ. But when it's not possible to use them, it's a
> recognized practice to append -h or -x to the letter stripped of its
> diacritic. As "x" isn't part of the Esperanto alphabet, it's less
> problematic regarding possible lexeme collisions. So far, Lupa uses the -x
> notation to circumvent the script encoding limitation.
>
> A minor problem is that, as far as Esperanto is concerned, numbers
> normally use a comma as the decimal separator rather than a dot, at least
> if you refer to the most authoritative sources. It's minor in the sense
> that in practice usage varies, and not every Esperantist takes great care
> of typographic "subtleties". On the technical side it's more annoying, as
> 3,14 already has a well-defined, completely different meaning in Lua.
> Babylscript, for example, proposes space-separated commas to resolve the
> ambiguity in the similar case of French. As far as I'm concerned, I would
> rather use a token like plie (and ... as well, and also, together with) as
> the list separator operator. From a broader internationalization
> perspective, the number recognition of the lexer would require far more
> thought to support more diverse numeral systems, such as १.६ for Hindi.
>
> Future development of Lupa should somewhat reverse its approach, so as to
> modify the official interpreter as little as possible. Hence the Lua-i18n
> project presented below, which should focus on providing
> internationalization facilities, ideally with an approach that allows
> building on top of it other tools flexible enough to support some
> syntactic changes. Lupa could then base its later evolutions on top of
> this Lua-i18n.
>
> Mallupa
>
> While Lupa modifies Lua directly, Mallupa just translates a localized
> dialect into a plain old Lua script. Currently it uses ltokenp, which
> itself reuses the Lua lexer, to retrieve lexemes. And it includes a Lupa
> dialect, which already provides more features than Lupa. As the main part
> of the code is in Lua, it makes development far easier. On the other hand
> it comes with its cons: being a source-to-source compiler, it makes
> debugging harder due to the additional layer of translation.
>
> As it relies on the Lua lexer, there are some inflections which still
> can't be handled. In particular I wanted to add support for the numeral
> suffix -a on digits, which makes sense for table locations. But a string
> like 1a will be taken as a malformed number by the lexer, and so will
> never reach the dialect converter script. To avoid that, either the lexer
> has to be changed, or the project has to rely on another lexer.
>
> Lua-i18n
>
> So, Lua-i18n is focused on providing internationalization facilities
> while modifying the official Lua release as little as possible.
>
> Some related issues have been added and described on the project page:
>
> Internationalization of built in messages
> Internationalization of built in tokens
> Unicode support
>
> For the last one, Luiz suggested the following to me:
>
> A hack to allow unicode identifiers is to set chars over 128 be letters.
> You can do this by editing lctype.c.
> Ask in the mailing list about this.
>
> He also provided me the attached file, with this comment:
>
> Here is what I had in mind for a token filter in C. This piece of C code
> centralizes all needed changes. Just add <<#include "proxy.c">> just
> before the definition of luaX_next in llex.c. That's the only change in
> the whole Lua C code.
>
> So, so far I can't say that I lack help or a path that needs deeper
> exploration, and I thank Luiz again for all this. But still, if you are
> interested in Lua-i18n and have any advice, comment, or question, please
> feel free to reply here or add it to the relevant project issue tracker.
>
>
> Kind regards,
> Mathieu