- Subject: Re: is string.gmatch(), string.upper() 7-bit ascii only?
- From: Marc Balmer <marc@...>
- Date: Thu, 7 Apr 2016 16:40:08 +0200
> On 07.04.2016 at 15:27, Roberto Ierusalimschy <email@example.com> wrote:
>> I am trying to manipulate text with umlauts. string.upper() does not produce the upper-case versions of umlauts such as ä, ö, ü.
>> Also, the %g pattern does not match these umlauts when used in string.gmatch().
>> Is there anything that can be done about it? Or, am I making a stupid mistake?
> The string library assumes one-byte character encodings.
> If you are using an 8-bit encoding (e.g., LATIN 1), then these should
> work, given a proper locale. Otherwise (e.g., UTF-8), you will need an
> external library.
Well, at least on an Ubuntu 14.04 system it does not work. Of course, I don't blame Lua if the OS-supplied toupper() C function doesn't do the job:
$ sudo locale-gen de_CH
Lua 5.3.2 Copyright (C) 1994-2015 Lua.org, PUC-Rio
> = os.setlocale('de_CH.ISO-8859-1')
> = string.upper('äöü')
(Expected: 'ÄÖÜ')
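
For the UTF-8 case Roberto mentions, here is a minimal sketch of a workaround that needs no external library. It assumes Lua 5.3 (for the utf8 library and \u{...} escapes) and UTF-8 encoded input. The mapping table and the helper name upper_umlauts are made up for illustration and cover only the three umlauts from the test above. It also prints the byte length of the test string and the return value of os.setlocale(), which is nil when the requested locale cannot be set.

```lua
-- Sketch only: assumes Lua 5.3 and UTF-8 encoded text.
-- Hand-written mapping; covers just the three umlauts from the test.
local upper_map = {
  ["\u{E4}"] = "\u{C4}",  -- ä -> Ä
  ["\u{F6}"] = "\u{D6}",  -- ö -> Ö
  ["\u{FC}"] = "\u{DC}",  -- ü -> Ü
}

-- Hypothetical helper: single-byte characters go through string.upper(),
-- multi-byte characters are looked up in the table above.
local function upper_umlauts(s)
  -- utf8.charpattern matches exactly one UTF-8 byte sequence
  return (s:gsub(utf8.charpattern, function(ch)
    if #ch == 1 then
      return ch:upper()
    end
    -- a nil result makes gsub keep the original character
    return upper_map[ch]
  end))
end

local s = "\u{E4}\u{F6}\u{FC}"   -- 'äöü' encoded as UTF-8
print(#s, utf8.len(s))           -- byte length vs. number of characters
print(upper_umlauts(s))

-- os.setlocale() returns the locale name, or nil if it could not be set
print(os.setlocale("de_CH.ISO-8859-1"))
```

If the terminal is UTF-8, each umlaut typed at the prompt is two bytes, so string.upper() would not recognise it even with an ISO-8859-1 locale in effect. That may or may not be what happened in the session above.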