On 21/11/2016 at 08:08, TW wrote:
As said, I'm in fact more interested in an Esperanto localization; though I have no fundamental opposition to localization into various native languages, it's not my main interest. Esperanto is far easier to learn than many, if not all, other spoken languages out there. In fact, it seems that, in the same amount of time spent learning only one language such as English, French or German, learning Esperanto first and then another language makes people more skilled than those who studied the latter alone. I won't spend too much time advocating this topic here; you can look at Wikipedia's Esperanto article for more sources, and at the Esperanto myth-busting essays of Claude Piron, which give you the point of view of someone who worked as a translator for the UN. Note that I'm open to further discussion on this topic, but if I'm not mistaken, this list is not the fittest place for such a conversation, is it?

> Can you elaborate on the benefits of programming language localization? I am quite skeptical about this. Learning a programming language is like learning a new language and IMO it hardly makes it more difficult to learn the English keywords. But learning a localized programming language makes it harder to apply one's gathered knowledge beyond private projects or other programming languages. It's also much harder to find help.

For more (mostly academic) sources about the use of localized programming languages and compilers, here are a few links:
I didn't use other systems over the last years, but I doubt there are many systems which don't have environment variables, are there?

> I'm less fundamentally skeptical concerning the internationalization of error messages, though there are problems as well - like finding help by feeding a search engine with a localized error message.

I usually use `LANG=C` when I want help on errors, so this is not a big problem, at least not on *nix systems.

I agree. Now, note that this formulation tacitly implies that the opposite situation is better, and with that I don't agree. This is just a situation where you have to choose between two solutions which both have their pros and cons. Sure, giving full latitude to diversity always comes with a cost, but so does the cult of standardized monoculture. Again, I'm not encouraging further debate over this on this mailing list. Please answer in private or suggest a fitter public channel. This holds for anything in this answer which is not directly related to Lua-i18n, Lupa, or Mallupa, unless a wide consensus is expressed to do otherwise in some way.

> The bigger problem with internationalization is that in theory it is good, but alas, in practice, translations are frequently hair-raising, confusing and misleading.

Again, I do agree, and in my translations I take time to produce not only a translation which is relevant in the given context, but one which also provides a coherent lexicon, so that semantic proximity is reflected by lexical proximity. I also take the length of the lexemes into account, preferring shorter words where the lexeme stays meaningful, and longer lexemes where the existing shorter options are meaningless. For Babylscript's Esperanto translation, you might consult the token translation document and the error message document, which both provide explanations of the proposed choices, along with other alternatives. If you have feedback to improve these documents, please leave it in the documents themselves. For the other Babylscript translations I'm not involved, but a concrete example of what you mean would be welcome.

> Commercial applications might be doing slightly better on average than open source applications, depending on their budget. Quality assurance is hard as maintainers don't understand all the languages. My impression is that in many cases, well-meaning enthusiasts without deeper knowledge of the technical terminology in the respective domain just do a quick translation without too much thought. I looked at a Babylscript translation that is a perfect example of that. I hope no one ever attempted to learn or will learn JavaScript using that translation. Translation is hard and requires quite some thought. In programming language design, especially much thought should have been spent on the choice of the original keywords.

To my mind, that sounds more like a chicken-and-egg problem than a fundamental problem of localized programming languages.

> So, to really let people benefit from translations, true experts in both languages, the subject to translate and the respective technical terminologies in both languages are needed. And I'm not at all convinced that one does people a favor by translating programming languages. I think it becomes harder rather than easier to learn the language because learning resources are very limited or not existent at all. There is a plethora of books and free material on e.g. JavaScript in many languages, some of it excellent, but probably hardly any material for translated versions of JavaScript/Babylscript, if any at all. No offense.
Constructive criticism, comments and questions are always welcome. Thank you for having taken the time to give some feedback.

> Sorry for the negativity - I hope my reasoning has been rational and unoffensive. I'd encourage translations of messages with advice to install a thorough review process before unleashing things on confused users.
>
> Thomas W.

2016-11-20 22:20 GMT+01:00 mathieu stumpf guntz <psychoslave@culture-libre.org>:

Hello, if you are interested in Lua, internationalization and possibly programming language localization, you might be interested in this thread.

A bit of background

You can skip this section if you are less interested in the human story and more interested in the technical aspects of Lua internationalization.

My initial motivation was to have a programming language that only uses phonetic signs. One way to do that is to assign a unique phonetic value to each sign usable in the programming language. For example in Lua, which currently accepts only ASCII tokens, a 256-sign mapping is enough (string contents apart). That is rather easy, and you can even easily make the map monosyllabic. For example, my native language has 16 vowels (V) and 20 consonants (C), which is more than enough to give a CV (or VC) syllable to each ASCII code. A quick and dumb mapping would just assign them in some arbitrary order, and a somewhat less dumb solution would try to build a mapping with some mnemonic relation to the sign usually associated with each ASCII code; a sketch of the dumb variant is given below.
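To make the idea concrete, here is a minimal Lua sketch of such a "quick and dumb" mapping. The vowel and consonant inventories below are illustrative placeholders rather than the phonemes of any particular language; a 16x16 grid already covers all 256 byte values exactly.

    -- Illustrative sketch: give every byte 0..255 a unique CV syllable.
    -- The two inventories below are placeholders, not a real phoneme
    -- list; 16 consonants x 16 vowels = 256 syllables, exactly enough.
    local vowels     = { "a","e","i","o","u","y","aa","ee","ii","oo","uu","ai","au","ei","eu","oi" }
    local consonants = { "b","d","f","g","j","k","l","m","n","p","r","s","t","v","z","w" }

    local syllable = {}  -- byte value -> syllable
    for code = 0, 255 do
      local c = consonants[math.floor(code / 16) + 1]
      local v = vowels[code % 16 + 1]
      syllable[code] = c .. v
    end

    -- Spell out a piece of source code, one syllable per byte.
    local function phonetize(source)
      local out = {}
      for i = 1, #source do
        out[#out + 1] = syllable[source:byte(i)]
      end
      return table.concat(out, " ")
    end

    print(phonetize("x = 1"))  -- five syllables, one per ASCII byte

A mnemonic variant would simply order the grid less arbitrarily, for instance so that related signs (brackets, arithmetic operators) share a consonant.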
But even then, to my mind that would remain a rather impractical solution for anything useful. So I looked for a spoken language which has a phonetic transcription and would ideally offer, as a bonus, morpho-syntactic properties friendly to a programming language use case. In this regard Lojban might be a good choice, I guess; at least it passed within my radar. But another aspect of usefulness is the number of speakers. On that point, without forgetting the previous one, Esperanto makes a better candidate. So I began to write research projects about it, namely Fabeleblaĵo and Algoritmaro, but in my native language, as I didn't feel skilled enough in Esperanto at the time. More recently I transferred some courses of the International Academy of Sciences San Marino, which uses Esperanto as a common working language. Indeed, as I wanted to begin translating my still-in-progress work on Fabeleblaĵo and Algoritmaro to Esperanto, I discovered that the Esperanto version of Wikiversity is still in beta. I'm trying to change that by adding some courses, and in the process creating useful wiki templates and filing feedback and tickets. The latter are grouped under the Eliso tag on Wikimedia Phabricator (Eliso stands for "Esperanto kaj LIbera Scio", i.e. "Esperanto and free knowledge"). For now, I have completely finished only one wikification, the course on internationalization. I also made an Esperanto localization for Babylscript. I'm not completely satisfied with this solution, as JS, at least the version implemented in the Rhino branch from which Babylscript is derived, doesn't even allow you to import other scripts, which is a huge restriction. Then, as a Wikimedia contributor, I met Lua, which is used there to create frontend-editable modules, as you may know. So came the wish to make an Esperantist version of Lua.

Lupa and Mallupa

For those who are really only interested in Lua internationalization, you may skip this section and its subsections. It mainly focuses on presenting (still in progress) projects which so far took the more direct approach of localization to Esperanto, the problems encountered, and the solutions used or considered.

Lupa

So far, Lupa aims to provide an Esperantist version of Lua. At first I just wanted to make it a pure Esperanto version of Lua. Now, thanks to Luiz Henrique de Figueiredo's advice and implementation suggestions, I have already shifted from a complete replacement of keywords to a more backward-compatible approach which only provides aliases for the miscellaneous built-in tokens. The current implementation is not in a sane state; for example, a simple single `:` will make lupe (the Lua interpreter counterpart) crash. Still, it already makes it possible to write small pieces of code like `se 1 plus 1 egalas 2 tiam printu 'tio estas bona aritmetiko' hop`, and it works.

As Esperanto has a very regular grammar, unlike most spoken languages out there, parsing it is a rather practicable task. Even without such support, you can already make most statements coincide with semantically sound Esperanto sentences, if you choose your tokens carefully. That's another driving criterion behind the list of lexeme translations on the project wiki. For example, one can write the statement `tabelo['enigo'] = 3` as `tabelo kies 'enigo' ero iĝu 3`, the latter also being a plain Esperanto sentence meaning "table whose 'entry' element shall become 3". This Esperanto version is a bit longer than its grapho-ideographic mixup counterpart, but keep in mind that the tokens are only aliased, so one can still use the former mixup. Also note that plain Esperanto offers shorter ways to express the same thing, like `tabelenigiĝu 3`, or, in a more parser-friendly form which is still valid Esperanto, `tabel-enig-iĝu 3`. But of course, that kind of syntax can't be treated within the scope of mere relexicalisation.
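To illustrate what such a relexicalisation amounts to, here is a minimal Lua sketch in the spirit of the source-to-source approach that Mallupa takes (presented further down). The alias table is merely guessed from the example above; the actual lexeme list lives on the project wiki.

    -- Minimal sketch of "static aliases only": rewrite aliased words
    -- and hand the result to the stock Lua interpreter.  The aliases
    -- are guesses for illustration, not the project's official list.
    local aliases = {
      se = "if", tiam = "then", hop = "end",
      plus = "+", egalas = "==", printu = "print",
    }

    local function relexicalize(source)
      -- Naive word-by-word gsub: it would also rewrite aliased words
      -- inside string literals, which is why a real implementation
      -- reuses a proper lexer, as Mallupa does with ltokenp.
      return (source:gsub("[%a_][%w_]*", aliases))
    end

    local src = "se 1 plus 1 egalas 2 tiam printu 'tio estas bona aritmetiko' hop"
    print(relexicalize(src))
    --> if 1 + 1 == 2 then print 'tio estas bona aritmetiko' end
    assert(load(relexicalize(src)))()  -- Lua 5.2+: prints the sentence

Note that because the tokens are aliased rather than replaced, the rewritten output and plain Lua source can coexist in the same code base.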
Even sticking to the scope of "static aliases only", there are still some problems in localizing Lua toward Esperanto. First, Lua doesn't provide support for Unicode in identifiers and other built-in tokens. Esperanto does have a set of non-ASCII characters in its usual written form, namely ĉ, ĝ, ĥ, ĵ, ŝ and ŭ. But when it's not possible to use them, it's a recognized practice to append -h or -x to the letter stripped of its diacritic (e.g. `cx` for `ĉ`). As "x" isn't part of the Esperanto alphabet, it is the less problematic of the two regarding possible lexeme collisions. So far, Lupa uses the -x notation to circumvent the script encoding limitation.

A minor problem is that, as far as Esperanto is concerned, numbers normally use a comma as decimal separator rather than a dot, at least if you refer to the most authoritative sources. It's minor in the sense that, in practice, usage varies, and not every Esperantist takes great care of such typographic "subtleties". On the technical side, it's more annoying, as `3,14` already has a well-defined, completely different meaning in Lua: the comma is a list separator, so `print(3,14)` prints the two values 3 and 14. Babylscript, for example, proposes to use a space-surrounded comma to resolve the ambiguity in the similar case of French. As far as I'm concerned, I would rather use a token like `plie` ("and ... as well", "and also", "together with") as the list separator operator. From a broader internationalization perspective, the number recognition of the lexer would require far more thought to support more diverse numbering systems, such as १.६ for Hindi.

Future development of Lupa should somewhat reverse its approach, so as to modify the official interpreter as little as possible. Hence the Lua-i18n project presented below, which should focus on providing internationalization facilities, ideally with an approach that allows building on top of it other tools flexible enough to support some syntactic changes. Lupa could then base its later evolutions on Lua-i18n.

Mallupa

While Lupa modifies Lua directly, Mallupa just translates a localized dialect to a plain old Lua script. Currently it uses ltokenp, which itself reuses the Lua lexer, to retrieve lexemes. And it includes a Lupa dialect which already provides more features than Lupa. As the main part of the code is in Lua, development is far easier. On the other hand it comes with its cons: it's a source-to-source compiler, so it makes debugging harder due to the additional layer of translation. And as it relies on the Lua lexer, there are some flexions which still can't be performed. In particular, I wanted to add support for the numeral suffix "-a" on digits, which makes sense for table locations. But a string like `1a` will be taken as a malformed number by the lexer, and it will never reach the dialect converter script. To avoid that, either the lexer should be changed, or the project should rely on another lexer.

Lua-i18n

So, Lua-i18n is focused on providing internationalization facilities while modifying the official Lua release as little as possible. Some related issues have been added and described on the project page:

- Internationalization of built-in messages
- Internationalization of built-in tokens
- Unicode support

For the last one, Luiz suggested the following:

    A hack to allow unicode identifiers is to set chars over 128 to be letters. You can do this by editing lctype.c. Ask in the mailing list about this.

He also provided me the attached file with this comment:

    Here is what I had in mind for a token filter in C. This piece of C code centralizes all needed changes. Just add <<#include "proxy.c">> just before the definition of luaX_next in llex.c. That's the only change in the whole Lua C code.

With such a hack, an identifier like `ĉevalo` becomes acceptable, since every byte in the UTF-8 encoding of its non-ASCII letters is above 128. So, so far I can't say that I lack help or a path that needs deeper exploring, and I thank Luiz again for all this. But still, if you are interested in Lua-i18n, or have any advice, comment, or question, please feel free to reply here or to add it to the relevant project issue tracker.

Kind regards,

Mathieu