lua-l archive

Hi List,

Yet another new release of ICU4Lua. First of all, this one is based on
ICU 4.2, instead of 4.0 as in previous versions.

The LuaForge files:

I made a change to the matching engine used by icu.ustring.match,
icu.utf8.match et al. By default, the character classes still match only
ASCII characters. For example, %a matches only [a-zA-Z]. To match the full
Unicode set of letters, use %!a instead. This applies to all the other
character classes as well: prepend an exclamation mark to the class letter to
use the Unicode version.
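For comparison, the ASCII default matches how standard Lua patterns already behave: they work on bytes, and in the default "C" locale %a recognises only [a-zA-Z]. This minimal sketch uses only the standard string library (not ICU4Lua) to show a UTF-8 "é" stopping a %a match:

```lua
-- Standard Lua only (no ICU4Lua): patterns operate on bytes, and in the
-- default "C" locale %a matches only the ASCII letters [a-zA-Z].
local word = "caf\195\169"              -- "café" in UTF-8; 'é' is bytes 0xC3 0xA9
local ascii_letters = word:match("^%a+")
print(ascii_letters)                     --> caf
```

With ICU4Lua's %!a class, the whole word would presumably match instead.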

Below is a brief overview of the ICU functionality now wrapped by ICU4Lua. It
does not list all functions, just a subset of the most notable new ones. More
complete documentation is included in the release files.


icu.convert(string, current_encoding, new_encoding)
    Convert a Lua string encoded in one encoding to another.
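To make the idea concrete without ICU, here is a pure-Lua sketch of one narrow case, ISO-8859-1 (Latin-1) to UTF-8, where every byte value equals its Unicode code point. The helper latin1_to_utf8 is hypothetical and not part of ICU4Lua; icu.convert itself goes through ICU's converters and is not limited to this pair.

```lua
-- Hypothetical helper, not part of ICU4Lua: converts ISO-8859-1 to UTF-8.
-- Latin-1 byte values equal their Unicode code points (U+0000..U+00FF).
local function latin1_to_utf8(s)
  -- ASCII bytes (< 0x80) are copied unchanged; bytes 0x80-0xFF become
  -- a two-byte UTF-8 sequence: 110000xx 10xxxxxx.
  return (s:gsub("[\128-\255]", function(c)
    local b = c:byte()
    return string.char(0xC0 + math.floor(b / 64), 0x80 + b % 64)
  end))
end

print(latin1_to_utf8("caf\233") == "caf\195\169")  --> true
```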

    The name of the default codepage as detected by ICU.

    Open a collator for the given locale, which must be a Lua string
      (e.g. "de").
    If the collator could not be opened, returns nil and an error message.

icu.collator.strength(col[, new_value])
    Either sets the strength of the collator, or returns the current strength
      setting if no new value is given.
    Valid strength values are:

icu.collator.lessthan(col, a, b)
icu.collator.lessorequal(col, a, b)
icu.collator.equals(col, a, b)
    Functions for comparing ustrings a and b with the given collator.
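A typical use for a less-than comparison is sorting. Lua's table.sort accepts a two-argument comparator, so a collator could presumably be bound into a closure such as function(a, b) return icu.collator.lessthan(col, a, b) end (an untested assumption). The runnable sketch below uses a plain caseless comparison as a stand-in, since it needs only standard Lua:

```lua
-- Stand-in comparator using standard Lua only. With ICU4Lua one would
-- presumably pass: function(a, b) return icu.collator.lessthan(col, a, b) end
local function caseless_lessthan(a, b)
  return a:lower() < b:lower()
end

local words = { "banana", "Apple", "cherry" }
table.sort(words, caseless_lessthan)
print(table.concat(words, " "))  --> Apple banana cherry
```

Unlike string.lower, a collator applies the locale's own ordering rules (for example, for accented letters), which is the reason to prefer it for user-visible sorting.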

    Open a StringPrep profile object. type can be one of:

icu.stringprep.prepare(profile, ustr)
    Prepare the given ustring according to the StringPrep profile.
    Returns either the prepared ustring, or nil and an error message.

    Internationalizing Domain Names in Applications (IDNA) transformations.
    All take and return a ustring. toascii and tounicode convert
    individual domain labels (e.g. "www", "lua" or "org"), while
    idntoascii and idntounicode convert full domain names (e.g. "www.lua.org").
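The label-versus-domain distinction can be pictured as applying a per-label transformation to each dot-separated part of a name. This is a simplified standard-Lua sketch: the helper map_labels is hypothetical, it splits only on the ASCII '.', whereas IDNA also treats a few non-ASCII full stops as separators, and string.upper merely stands in for a label transformation such as toascii.

```lua
-- Hypothetical helper: apply a per-label function to every label of a
-- domain name and join the results back with '.'. Empty labels are skipped.
local function map_labels(domain, fn)
  local out = {}
  for label in domain:gmatch("[^.]+") do
    out[#out + 1] = fn(label)
  end
  return table.concat(out, ".")
end

print(map_labels("www.lua.org", string.upper))  --> WWW.LUA.ORG
```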

Regular Expressions
icu.regex.compile(pattern[, flags])
    Creates a new compiled regex pattern object. For details on the syntax
      supported by ICU, see <>
    pattern can be a ustring or a Lua string. If a Lua string, it is expected
      to be encoded in the default codepage.
    Supported flags include:
        i  (Case insensitive)
        x  (Comments mode)
        s  (The dot '.' matches all characters, including newlines)
        m  (Multiline mode)

icu.regex.match(regex, text[, start_index])
    Find the first place where the regex matches the given text, optionally
      starting the search at the given start index (one-based).
    The returned value is either false, if no match was found, or a match
      object that contains these named fields:
    * value: The matching substring, as a ustring
    * start, stop: Substring indices in the source text (one-based, inclusive)
    ...also, match[1] to match[n] are captures within the match, which have
      the same named fields. match[0] is the match itself again.
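The one-based, inclusive start/stop convention is the same one Lua's own string.find uses. As a pure-Lua sketch, with a hypothetical helper find_match that uses Lua patterns rather than ICU regexes, a table with the same shape as the described match object could be built like this:

```lua
-- Hypothetical helper: returns false, or a table shaped like the match
-- object described above (value, start, stop, and [0] = the match itself).
-- Captures ([1]..[n]) are omitted to keep the sketch short.
local function find_match(text, pattern, init)
  local start, stop = text:find(pattern, init)
  if not start then
    return false
  end
  local m = { value = text:sub(start, stop), start = start, stop = stop }
  m[0] = m
  return m
end

local m = find_match("lua-l archive", "%a+", 7)
print(m.value, m.start, m.stop)  --> archive  7  13
```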

icu.regex.gmatch(regex, text)
    Returns an iterator over all of the matches found for a compiled regular
      expression, designed to be used in a for-loop:

    for match in icu.regex.gmatch(myRegex, inputText) do
        -- the "match" object is the same as described in the documentation
        -- for icu.regex.match()
    end
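Standard Lua's string.gmatch follows the same for-loop shape. In this sketch (plain Lua patterns, not ICU), empty position captures () recover one-based start and stop indices comparable to the match object's fields; since the second () yields the position just past the match, the stop index is that value minus one.

```lua
-- Standard Lua only: "()" captures the current position, so each iteration
-- yields the start index, the word, and the index just past its end.
local found = {}
for start, word, after in ("to be or not"):gmatch("()(%a+)()") do
  found[#found + 1] = string.format("%s %d-%d", word, start, after - 1)
end
print(table.concat(found, ", "))  --> to 1-2, be 4-5, or 7-8, not 10-12
```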

icu.regex.replace(regex, text, replacement)
    Find all places where the given regular expression matches in text,
      replace them with a new value, and return the result.
    text must be a ustring, and replacement must be one of the following:
    * A ustring. You can use $0, $1, $2 etc to use captured substrings from
      the match (and $$ for a literal dollar sign).
    * A table. It will be indexed with the entire matching substring (as a
      ustring), and the value found must be either a new ustring or nil/false.
    * A function. It will be called with a single parameter, a match object
      as described in the documentation for icu.regex.match(). It must
      return either a ustring or nil/false.
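These three replacement kinds closely mirror what Lua's built-in string.gsub accepts, with some differences: gsub writes captures as %1 rather than $1, looks tables up by the first capture (or the whole match when there are no captures), and passes captures to a function instead of a match object. In gsub a nil/false result keeps the original text; that ICU4Lua's nil/false means the same is an assumption. A standard-Lua comparison:

```lua
-- Standard Lua string.gsub with the three replacement kinds.
local text = "hello world"

-- 1. String: %1 refers to the first capture (ICU4Lua uses $1).
local r1 = text:gsub("(%a+)", "<%1>")

-- 2. Table: with no captures, the whole match is the key;
--    a missing (nil) value keeps the original text.
local r2 = text:gsub("%a+", { hello = "hi" })

-- 3. Function: returning nil keeps the original text.
local r3 = text:gsub("%a+", function(w)
  if w == "world" then return "Lua" end
end)

print(r1)  --> <hello> <world>
print(r2)  --> hi world
print(r3)  --> hello Lua
```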

icu.regex.split(regex, text[, maximum])
    Returns an array of the substrings found by splitting text (which must be
      a ustring) using the given regex, with an optional maximum number of
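For comparison, a split with an optional maximum can be sketched in standard Lua. The helper split is hypothetical and uses Lua patterns rather than ICU regexes. It assumes that maximum caps the number of substrings returned, with the last one holding the unsplit remainder; since the description above is cut off, this is an assumption.

```lua
-- Hypothetical helper: split text on a Lua pattern, returning an array.
-- If maximum is given, at most that many substrings are returned and the
-- last one contains the rest of the text (assumed semantics).
local function split(text, sep_pattern, maximum)
  local parts, pos = {}, 1
  while not (maximum and #parts >= maximum - 1) do
    local s, e = text:find(sep_pattern, pos)
    if not s then break end
    if e < s then
      error("separator pattern must not match the empty string")
    end
    parts[#parts + 1] = text:sub(pos, s - 1)
    pos = e + 1
  end
  parts[#parts + 1] = text:sub(pos)
  return parts
end

print(table.concat(split("a, b,c", ",%s*"), "|"))     --> a|b|c
print(table.concat(split("a, b,c", ",%s*", 2), "|"))  --> a|b,c
```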


That's it for now, thanks for reading.